Abstract: A processor-implemented method of personalizing a speech recognition model includes: obtaining statistical information of first scaling vectors combined with a base model for speech recognition; obtaining utterance data of a user; and generating a personalized speech recognition model by modifying a second scaling vector combined with the base model based on the utterance data of the user and the statistical information.
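The abstract does not specify how the statistical information or the scaling vectors are represented. As a purely illustrative sketch — assuming the scaling vectors are plain numeric vectors, the "statistical information" is a population mean over the first scaling vectors, and the utterance data yields a gradient — a personalization step of this general shape might look like:

```python
import numpy as np

def personalize_scaling_vector(first_vectors, second_vector, user_gradient,
                               lr=0.1, reg=0.5):
    """Illustrative sketch: nudge the second scaling vector using a gradient
    derived from the user's utterance data, while regularizing it toward the
    mean of the first scaling vectors (the 'statistical information')."""
    stats_mean = np.mean(first_vectors, axis=0)  # statistics of first vectors
    # Gradient step from user data, pulled toward the population statistics
    # so the personalized model does not drift too far from the base model.
    return second_vector - lr * user_gradient - reg * (second_vector - stats_mean)
```

All function and parameter names here are hypothetical; the patent itself does not disclose this update rule.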
Abstract: This relates to intelligent automated assistants and, more specifically, to intelligent context sharing and task performance among a collection of devices with intelligent automated assistant capabilities. An example method includes, at a first electronic device participating in a context-sharing group associated with a first location: receiving a user voice input; receiving, from a context collector, an aggregate context of the context-sharing group; providing at least a portion of the aggregate context and data corresponding to the user voice input to a remote device; receiving, from the remote device, a command to perform one or more tasks and a device identifier corresponding to a second electronic device; and transmitting the command to the second electronic device based on the device identifier, wherein the command causes the second electronic device to perform the one or more tasks.
Abstract: An electronic device includes a touchscreen display, a microphone, at least one speaker, a processor and a memory which stores instructions that cause the processor to receive a user utterance including a request for performing a task with the electronic device, to transmit data associated with the user utterance to an external server, to receive a response from the external server including sample utterances representative of an intent of the user utterance and the sample utterances being selected by the external server based on the user utterance, to display the sample utterances on the touchscreen display, to receive a user input to select one of the sample utterances, and to perform the task by causing the electronic device to follow a sequence of states associated with the selected one of the sample utterances.
Type:
Grant
Filed:
January 9, 2018
Date of Patent:
November 9, 2021
Inventors:
Kyoung Gu Woo, Kyu Young Kim, Hyun Jin Park, Injong Rhee, Woo Up Kwon, Dong Ho Jang
Abstract: A system for handling errors during automatic speech recognition by processing a potentially defective utterance to determine an alternate, potentially successful utterance. The system processes the N-best ASR hypotheses corresponding to the defective utterance using a trained model to generate a word-level feature vector. The word-level feature vector is processed using a sequence-to-sequence architecture to determine the alternate utterance.
Type:
Grant
Filed:
March 25, 2019
Date of Patent:
October 26, 2021
Assignee:
Amazon Technologies, Inc.
Inventors:
Alireza Roshan Ghias, Sean William Jewell, Chenlei Guo
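The abstract does not say what goes into the word-level feature vector. As a minimal illustrative sketch — assuming one simple feature, the fraction of N-best hypotheses that agree on each word of the top hypothesis — such a feature could be computed like this:

```python
def word_level_features(nbest):
    """Illustrative sketch: for each word in the top ASR hypothesis, compute
    the fraction of N-best hypotheses that also contain that word, a crude
    per-word confidence feature. (The patent's actual features come from a
    trained model and are not disclosed in the abstract.)"""
    top_words = nbest[0].split()
    hyp_sets = [set(h.split()) for h in nbest]
    return [sum(w in s for s in hyp_sets) / len(nbest) for w in top_words]
```

In the patented system these features would feed a sequence-to-sequence model; that stage is omitted here.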
Abstract: A computer-implemented method for providing contextual outputs based on varying coordinates related to a target event is provided. The method is implemented by a processor. The processor receives the target event based on a user input and the varying coordinates. The processor then executes a semantic analysis and relationships building operation on the varying coordinates and the target event to generate proposals and provides the contextual outputs based on the proposals. Note that the semantic analysis and relationships building operation reduces recommendation errors and increases the computing efficiency of the processor with regard to providing the contextual outputs.
Type:
Grant
Filed:
March 6, 2019
Date of Patent:
October 26, 2021
Assignee:
INTERNATIONAL BUSINESS MACHINES CORPORATION
Abstract: A plurality of images captured using a camera included in a robotic system are analyzed. A spatial map is generated using a sensor included in the robotic system. A semantic location map is generated using at least the analyzed plurality of captured images and the generated spatial map. A natural language input referencing a desired product item is received from a user. A speech recognition result is recognized from the natural language input and sent to a reasoning engine. In response to sending the recognized speech recognition result, one or more commands for the robotic system are received from the reasoning engine. The received one or more commands are performed and feedback to the user based on at least one of the one or more commands is provided.
Type:
Grant
Filed:
December 20, 2018
Date of Patent:
October 19, 2021
Inventors:
Run Cui, Won Taek Chung, Hye Jun Yu, Hong Shik Shinn
Abstract: Systems and processes for operating an intelligent automated assistant are provided. In one example process, a speech input is received from a user. In response to determining that the speech input corresponds to a user intent of obtaining information associated with a user experience of the user, one or more parameters referencing a user experience of the user are identified. Metadata associated with the referenced user experience is obtained from an experiential data structure. Based on the metadata, one or more media items associated with the referenced user experience are retrieved. The one or more media items are output together.
Type:
Grant
Filed:
August 7, 2018
Date of Patent:
October 12, 2021
Assignee:
Apple Inc.
Inventors:
Marcos Regis Vescovi, Eric M. G. Circlaeys, Richard Warren, Jeffrey Traer Bernstein, Matthaeus Krenn
Abstract: A multi-channel signal encoding method and an encoder, where the encoding method includes obtaining a multi-channel signal of a current frame, determining an initial multi-channel parameter of the current frame, determining a difference parameter based on the initial multi-channel parameter of the current frame and multi-channel parameters of previous K frames of the current frame, where the difference parameter represents a difference between the initial multi-channel parameter of the current frame and the multi-channel parameters of the previous K frames, and K is an integer greater than or equal to one, determining a multi-channel parameter of the current frame based on the difference parameter and a characteristic parameter of the current frame, and encoding the multi-channel signal based on the multi-channel parameter of the current frame. Hence, the method and the encoder ensure better accuracy of inter-channel information of a multi-channel signal.
Type:
Grant
Filed:
February 11, 2019
Date of Patent:
September 28, 2021
Assignee:
HUAWEI TECHNOLOGIES CO., LTD.
Inventors:
Zexin Liu, Xingtao Zhang, Haiting Li, Lei Miao
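The abstract describes computing a difference parameter against the previous K frames and using it to determine the current frame's multi-channel parameter. As an illustrative sketch — assuming a scalar per-frame parameter and a simple delta-versus-full encoding decision, neither of which is specified in the abstract — the idea could look like:

```python
def choose_encoding(current_param, prev_params, threshold=0.1):
    """Illustrative sketch: compute the difference between the current
    frame's multi-channel parameter and the mean over the previous K frames.
    A small difference suggests the parameter is stable and can be coded
    compactly as a delta; a large one suggests coding the full value."""
    k = len(prev_params)                      # K >= 1 previous frames
    mean_prev = sum(prev_params) / k
    diff = current_param - mean_prev          # the difference parameter
    mode = "delta" if abs(diff) < threshold else "full"
    return diff, mode
```

The threshold and the delta/full decision are hypothetical; the patent conditions the choice on a "characteristic parameter" whose definition the abstract does not give.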
Abstract: Dialog visualizations are created to enable analysis of interactions between a user and a speech recognition system used to implement user commands. Spoken commands from the user may be classified, along with system responses to the spoken commands, to enable aggregation of communication exchanges that form dialog. This data may then be used to create a dialog visualization. The dialog visualization may enable an analyst to visually explore different branches of the interactions represented in the dialog visualization. The dialog visualization may show a trajectory of the dialog, which may be explored in an interactive manner by the analyst.
Type:
Grant
Filed:
December 22, 2014
Date of Patent:
August 17, 2021
Assignee:
Amazon Technologies, Inc.
Inventors:
Vikas Jain, Shishir Sridhar Bharathi, Giuseppe Pino Di Fabbrizio, Ling Hu, Sumedha Arvind Kshirsagar, Shamitha Somashekar, John Daniel Thimsen, Tudor Toma
Abstract: Validating belief states of an artificial intelligence system includes providing a question answering service; detecting a negative sentiment of a user to an answer transmitted to a device associated with the user; and responsive to detecting the negative sentiment, detecting that the answer relates to a topic on which there is controversy. Next, a new belief state is added to the question answering service based on the controversy, and an updated answer is transmitted to the device, wherein the updated answer is based on the new belief state.
Type:
Grant
Filed:
June 5, 2018
Date of Patent:
August 17, 2021
Assignee:
INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors:
Aysu Ezen Can, Brendan Bull, Scott R. Carrier, Dwi Sianto Mansjur
Abstract: A device includes a memory configured to store category labels associated with categories of a natural language processing library. A processor is configured to analyze input audio data to generate a text string and to perform natural language processing on at least the text string to generate an output text string including an action associated with a first device, a speaker, a location, or a combination thereof. The processor is configured to compare the input audio data to audio data of the categories to determine whether the input audio data matches any of the categories and, in response to determining that the input audio data does not match any of the categories: create a new category label, associate the new category label with at least a portion of the output text string, update the categories with the new category label, and generate a notification indicating the new category label.
Type:
Grant
Filed:
May 4, 2018
Date of Patent:
August 17, 2021
Assignee:
QUALCOMM Incorporated
Inventors:
Erik Visser, Fatemeh Saki, Yinyi Guo, Sunkuk Moon, Lae-Hoon Kim, Ravi Choudhary
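The match-or-create flow in this abstract can be sketched compactly. Assuming, purely for illustration, that categories are stored as embedding vectors and compared by cosine similarity (the abstract does not specify the comparison):

```python
import numpy as np

def match_or_create(categories, embedding, threshold=0.8):
    """Illustrative sketch: compare an input audio embedding to the stored
    category embeddings; if none matches, create a new category label,
    update the library, and flag that a notification should be generated."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    for label, ref in categories.items():
        if cos(embedding, ref) >= threshold:
            return label, False               # existing category matched
    new_label = f"category_{len(categories)}"  # hypothetical naming scheme
    categories[new_label] = embedding          # update the library
    return new_label, True                     # True -> notify of new label
```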
Abstract: An electronic device includes a microphone obtaining an audio signal, a memory in which a speaker model is stored, and at least one processor. The at least one processor is configured to obtain a voice signal from the audio signal, to compare the voice signal with the speaker model to verify a user, and, if a verification result indicates that the user corresponds to a pre-enrolled speaker, to perform an operation corresponding to the obtained voice signal.
Type:
Grant
Filed:
January 9, 2018
Date of Patent:
July 27, 2021
Inventors:
Young Woo Lee, Ho Seon Shin, Sang Hoon Lee
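The verification step in this abstract — compare the voice signal with a stored speaker model and act only for a pre-enrolled speaker — reduces to a similarity test. A minimal sketch, assuming embedding vectors and cosine similarity (neither is stated in the abstract):

```python
import numpy as np

def verify_speaker(voice_embedding, speaker_model, threshold=0.7):
    """Illustrative sketch: accept the utterance only if the similarity
    between the voice embedding and the enrolled speaker model clears a
    threshold; the threshold value here is hypothetical."""
    sim = float(np.dot(voice_embedding, speaker_model) /
                (np.linalg.norm(voice_embedding) * np.linalg.norm(speaker_model)))
    return sim >= threshold
```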
Abstract: Methods and systems for training one or more neural networks for transcription and for transcribing a media file using the trained one or more neural networks are provided. One of the methods includes: segmenting the media file into a plurality of segments; extracting, using a first neural network, audio features of a first and second segment of the plurality of segments; and identifying, using a second neural network, a best-candidate engine for each of the first and second segments based at least on audio features of the first and second segments. A best-candidate engine is a neural network having a highest predicted transcription accuracy among a collection of neural networks.
Type:
Grant
Filed:
January 8, 2019
Date of Patent:
June 22, 2021
Inventors:
Peter Nguyen, David Kettler, Karl Schwamb, Chad Steelberg
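The per-segment engine-selection step in this abstract is essentially an argmax over predicted accuracies. A minimal sketch, with a hypothetical `accuracy_predictor` callable standing in for the patent's second neural network:

```python
def pick_best_engines(segment_features, accuracy_predictor, engines):
    """Illustrative sketch: for each segment's audio features, pick the
    transcription engine whose predicted accuracy on those features is
    highest. The predictor is assumed, not taken from the patent."""
    return [max(engines, key=lambda e: accuracy_predictor(e, feats))
            for feats in segment_features]
```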
Abstract: Techniques for identifying content displayed by a content presentation system associated with a physical environment, detecting an audible expression by a user located within the physical environment, and storing information associated with the audible expression in relation to the displayed content are disclosed.
Abstract: A user, such as an elderly person, may be assisted by an assistance device in his or her caregiving environment that operates in conjunction with one or more server computers. The assistance device may execute a schedule of assistance actions where each assistance action is associated with a time and is executed at that time to assist the user. An assistance action may present an input request to a user, process a voice input of the user, and analyze the voice input to determine that the voice input corresponds to a negative response event, a positive response event, or a non-response event. Based on the categorization of one or more voice inputs as negative response events, positive response events, or non-response events, it may be determined to notify a caregiver of the user, for example where the user has not responded to a number of assistance actions.
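The three-way categorization of voice inputs described above (negative response, positive response, non-response) can be sketched with a trivial keyword rule. This is an assumption for illustration only; the patent does not disclose how responses are analyzed:

```python
def classify_response(transcript):
    """Illustrative sketch: categorize a voice input as a positive response
    event, negative response event, or non-response event using a toy
    keyword rule (the actual analysis method is not given in the abstract)."""
    if not transcript or not transcript.strip():
        return "non_response"                 # user said nothing usable
    tokens = set(transcript.lower().split())
    if tokens & {"no", "not", "never", "nope"}:
        return "negative"
    return "positive"
```

A caregiver notification would then be triggered by an accumulation of negative or non-response events, as the abstract describes.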
Abstract: A playout delay adjustment method includes: adjusting a playout delay surplus based on a difference value between a first playout delay obtained in a first scheme and a second playout delay obtained in a second scheme and determining an adaptation type of a current frame according to whether a previous frame is an active frame; and when the determined adaptation type is signal-based adaptation, performing time scale modification (TSM) according to an adaptation scheme determined according to a comparison result between the first playout delay and the second playout delay and a comparison result between a target delay and the first playout delay.
Type:
Grant
Filed:
September 5, 2016
Date of Patent:
June 1, 2021
Assignees:
SAMSUNG ELECTRONICS CO., LTD., INDUSTRY-UNIVERSITY COOPERATION FOUNDATION HANYANG UNIVERSITY ERICA CAMPUS
Abstract: Systems and methods are provided for an automated speech recognition system. A microphone records a keyword spoken by a user, and a front end divides the recorded keyword into a plurality of subunits, each containing a segment of recorded audio, and extracts a set of features from each of the plurality of subunits. A decoder assigns one of a plurality of content classes to each of the plurality of subunits according to at least the extracted set of features for each subunit. A quality evaluation component calculates a score representing a quality of the keyword from the content classes assigned to the plurality of subunits.
Type:
Grant
Filed:
September 15, 2017
Date of Patent:
June 1, 2021
Assignee:
Texas Instruments Incorporated
Inventors:
Tarkesh Pande, Lorin Paul Netsch, David Patrick Magee
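The final scoring step in this abstract — deriving a keyword quality score from the content classes assigned to its subunits — can be sketched as a simple proportion. The class names and the scoring rule below are hypothetical; the abstract does not define either:

```python
def keyword_quality(subunit_classes,
                    informative=frozenset({"vowel", "fricative", "nasal"})):
    """Illustrative sketch: score keyword quality as the fraction of
    subunits assigned to phonetically informative content classes, on the
    intuition that a keyword dominated by silence or noise is a poor one."""
    if not subunit_classes:
        return 0.0
    return sum(c in informative for c in subunit_classes) / len(subunit_classes)
```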
Abstract: A voice identification method comprises: obtaining audio data, and extracting an audio feature of the audio data; determining whether a voice identification feature having a similarity with the audio feature above a preset matching threshold exists in an associated feature library; and in response to determining that the voice identification feature exists in the associated feature library, updating, by using the audio feature, the voice identification feature obtained through matching.
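The match-then-update loop in this last abstract can be sketched directly. Assuming embedding vectors, cosine similarity, and an exponential-moving-average update (all illustrative choices; the abstract specifies none of them):

```python
import numpy as np

def identify_and_update(feature_library, audio_feature, threshold=0.75, alpha=0.1):
    """Illustrative sketch: find the library feature most similar to the
    extracted audio feature; if it clears the matching threshold, refresh
    that stored feature with a moving average of the new audio feature."""
    best_id, best_sim = None, -1.0
    for fid, ref in feature_library.items():
        sim = float(np.dot(ref, audio_feature) /
                    (np.linalg.norm(ref) * np.linalg.norm(audio_feature)))
        if sim > best_sim:
            best_id, best_sim = fid, sim
    if best_sim >= threshold:
        # Update the matched voice identification feature using the new audio.
        feature_library[best_id] = ((1 - alpha) * feature_library[best_id]
                                    + alpha * audio_feature)
        return best_id
    return None  # no sufficiently similar feature in the associated library
```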