Recognition Patents (Class 704/231)
  • Patent number: 11024296
    Abstract: Systems and methods are described herein for providing media guidance. Control circuitry may receive a first voice input and access a database of topics to identify a first topic associated with the first voice input. A user interface may generate a first response to the first voice input, and subsequent to generating the first response, the control circuitry may receive a second voice input. The control circuitry may determine a match between the second voice input and an interruption input, such as a period of silence or a keyword or phrase such as “Ahh,” “Umm,” or “Hmm.” The user interface may generate a second response that is associated with a second topic related to the first topic. By interrupting the conversation and changing the subject from time to time, media guidance systems can appear more intelligent and human.
    Type: Grant
    Filed: March 11, 2020
    Date of Patent: June 1, 2021
    Assignee: Rovi Guides, Inc.
    Inventors: Charles Dawes, Walter R. Klappert
  • Patent number: 11024302
    Abstract: Systems and methods are provided for an automated speech recognition system. A microphone records a keyword spoken by a user, and a front end divides the recorded keyword into a plurality of subunits, each containing a segment of recorded audio, and extracts a set of features from each of the plurality of subunits. A decoder assigns one of a plurality of content classes to each of the plurality of subunits according to at least the extracted set of features for each subunit. A quality evaluation component calculates a score representing a quality of the keyword from the content classes assigned to the plurality of subunits.
    Type: Grant
    Filed: September 15, 2017
    Date of Patent: June 1, 2021
    Assignee: Texas Instruments Incorporated
    Inventors: Tarkesh Pande, Lorin Paul Netsch, David Patrick Magee
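The pipeline in the abstract above (divide a recorded keyword into subunits, extract features, classify each subunit, score the keyword from the class sequence) can be sketched roughly as follows. This is a toy illustration, not the patented method: the subunit size, the mean-amplitude "feature," the threshold classifier, and the speech-fraction score are all invented stand-ins.

```python
def split_subunits(samples, size=4):
    """Divide recorded samples into fixed-size subunits."""
    return [samples[i:i + size] for i in range(0, len(samples), size)]

def extract_features(subunit):
    """Toy feature: mean absolute amplitude of the subunit."""
    return sum(abs(s) for s in subunit) / len(subunit)

def classify(feature):
    """Assign a content class ('speech' or 'silence') per subunit."""
    return "speech" if feature > 0.1 else "silence"

def keyword_quality(samples):
    """Fraction of subunits classified as speech: a crude quality score."""
    classes = [classify(extract_features(u)) for u in split_subunits(samples)]
    return classes.count("speech") / len(classes)

# A recording whose second half is near-silence gets a middling score.
score = keyword_quality([0.5, 0.4, 0.6, 0.3, 0.0, 0.01, 0.02, 0.0])
```

A real decoder would assign richer content classes (e.g., phone-like units) from spectral features, but the data flow from subunits through per-subunit classes to a single keyword score is the same.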
  • Patent number: 11024312
    Abstract: A voice recognition apparatus includes a communication part configured to communicate with a voice recognition server, a voice receiver configured to receive a user's voice signal, a storage part configured to store guide information comprising at least an example command for voice recognition; and a controller. The controller is configured to generate a guide image comprising at least a part of the example command, transmit the received user's voice signal to the voice recognition server through the communication part in response to receiving the user's voice signal by the voice receiver, and update the stored guide information based on update information received through the communication part.
    Type: Grant
    Filed: March 11, 2020
    Date of Patent: June 1, 2021
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Jong-cheol Park, Do-wan Kim, Sang-shin Park
  • Patent number: 11024306
    Abstract: The present disclosure is generally directed to the generation of voice-activated data flows in interconnected network. The voice-activated data flows can include input audio signals that include a request and are detected at a client device. The client device can transmit the input audio signal to a data processing system, where the input audio signal can be parsed and passed to the data processing system of a service provider to fulfill the request in the input audio signal. The present solution is configured to conserve network resources by reducing the number of network transmissions needed to fulfill a request.
    Type: Grant
    Filed: September 14, 2018
    Date of Patent: June 1, 2021
    Assignee: GOOGLE LLC
    Inventors: Gaurav Bhaya, Ulas Kirazci, Bradley Abrams, Adam Coimbra, Ilya Firman, Carey Radebaugh
  • Patent number: 11024316
    Abstract: Computer-implemented method and system for receiving and processing one or more moment-associating elements. For example, the computer-implemented method includes receiving the one or more moment-associating elements, transforming the one or more moment-associating elements into one or more pieces of moment-associating information, and transmitting at least one piece of the one or more pieces of moment-associating information.
    Type: Grant
    Filed: May 3, 2019
    Date of Patent: June 1, 2021
    Assignee: Otter.ai, Inc.
    Inventors: Yun Fu, Simon Lau, Kaisuke Nakajima, Julius Cheng, Sam Song Liang, James Mason Altreuter, Kean Kheong Chin, Zhenhao Ge, Hitesh Anand Gupta, Xiaoke Huang, James Francis McAteer, Brian Francis Williams, Tao Xing
  • Patent number: 11011167
    Abstract: A communication system includes a pair of speech recognition devices that are capable of communicating with each other, each of the speech recognition devices including a speech input section into which speech is input, a speech recognition section that recognizes speech input to the speech input section, and a speech output section that outputs speech. The communication system also includes an information generation section that generates notification information corresponding to speech recognized by the speech recognition section in one speech recognition device from out of the pair of speech recognition devices, and a speech output control section that performs control to output notification speech corresponding to the notification information at a specific timing from the speech output section of the other speech recognition device from out of the pair of speech recognition devices.
    Type: Grant
    Filed: January 8, 2019
    Date of Patent: May 18, 2021
    Assignee: Toyota Jidosha Kabushiki Kaisha
    Inventors: Hideki Kobayashi, Akihiro Muguruma, Yukiya Sugiyama, Shota Higashihara, Riho Matsuo, Naoki Yamamuro
  • Patent number: 11011162
    Abstract: The technology disclosed relates to performing speech recognition for a plurality of different devices or devices in a plurality of conditions. This includes storing a plurality of acoustic models associated with different devices or device conditions, receiving speech audio including natural language utterances, receiving metadata indicative of a device type or device condition, selecting an acoustic model from the plurality in dependence upon the received metadata, and employing the selected acoustic model to recognize speech from the natural language utterances included in the received speech audio. Each of speech recognition and the storage of acoustic models can be performed locally by devices or on a network-connected server. Also provided is a platform and interface, used by device developers to select, configure, and/or train acoustic models for particular devices and/or conditions.
    Type: Grant
    Filed: June 1, 2018
    Date of Patent: May 18, 2021
    Assignee: SOUNDHOUND, INC.
    Inventors: Mehul Patel, Keyvan Mohajer
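Selecting an acoustic model in dependence upon received metadata, as the abstract above describes, amounts to a keyed lookup with a fallback. A minimal sketch, with an invented model registry and metadata keys:

```python
# Hypothetical registry mapping (device type, condition) to a stored model.
ACOUSTIC_MODELS = {
    ("car", "noisy"): "am_car_noisy_v2",
    ("car", "quiet"): "am_car_quiet_v1",
    ("speaker", "far-field"): "am_speaker_farfield_v3",
}

def select_acoustic_model(metadata, default="am_generic"):
    """Pick the acoustic model matching the device metadata, else a default."""
    key = (metadata.get("device_type"), metadata.get("condition"))
    return ACOUSTIC_MODELS.get(key, default)

# An in-car microphone under noisy conditions gets the matching model;
# an unknown device falls back to the generic model.
model = select_acoustic_model({"device_type": "car", "condition": "noisy"})
```

Whether the lookup runs on the device or on a network-connected server, the contract is the same: metadata in, model identifier out.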
  • Patent number: 11004445
    Abstract: In one embodiment, a smartwatch includes a processor and a memory storing instructions to be executed in the processor. The instructions are configured to cause the processor to obtain input comprising voice information; determine whether the voice information comprises interrogative keyword; and determine that the voice information is interrogative information in response to determining that the voice information comprises interrogative keyword. The instructions are configured to cause the processor to determine whether reply information corresponding to the interrogative information can be obtained from a memory of the smartwatch; and send the interrogative information to a server through a wireless network in response to determining that the reply information corresponding to the interrogative information cannot be obtained from the memory of the smartwatch.
    Type: Grant
    Filed: May 27, 2017
    Date of Patent: May 11, 2021
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Yizu Feng, Bin Li
  • Patent number: 11004458
    Abstract: Provided are a method and an apparatus for determining an encoding mode for improving the quality of a reconstructed audio signal. A method of determining an encoding mode includes determining one from among a plurality of encoding modes including a first encoding mode and a second encoding mode as an initial encoding mode in correspondence to characteristics of an audio signal, and if there is an error in the determination of the initial encoding mode, generating a modified encoding mode by modifying the initial encoding mode to a third encoding mode.
    Type: Grant
    Filed: October 4, 2019
    Date of Patent: May 11, 2021
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Ki-hyun Choo, Anton Victorovich Porov, Konstantin Sergeevich Osipov, Nam-suk Lee
  • Patent number: 11004454
    Abstract: Techniques for updating voice profiles used to perform user recognition are described. A system may use clustering techniques to update voice profiles. When the system receives audio data representing a spoken user input, the system may store the audio data. Periodically, the system may recall, from storage, audio data (representing previous user inputs). The system may identify clusters of the audio data, with each cluster including similar or identical speech characteristics. The system may determine a cluster is substantially similar to an existing voice profile. If this occurs, the system may create an updated voice profile using the original voice profile and the cluster of audio data.
    Type: Grant
    Filed: November 6, 2018
    Date of Patent: May 11, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Sundararajan Srinivasan, Arindam Mandal, Krishna Subramanian, Spyridon Matsoukas, Aparna Khare, Rohit Prasad
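The update step described above (find a cluster of stored utterances, check it against an existing voice profile, and blend the two if they match) can be sketched with simple vector arithmetic. The cosine threshold, the blend weight, and the treatment of profiles as flat embedding vectors are all assumptions for illustration:

```python
import math

def centroid(vectors):
    """Mean vector of a cluster of utterance embeddings."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def update_profile(profile, cluster, threshold=0.9, alpha=0.5):
    """If the cluster centroid is close enough to the profile, blend them;
    otherwise the cluster likely belongs to another speaker."""
    c = centroid(cluster)
    if cosine(profile, c) < threshold:
        return profile  # substantially dissimilar: leave the profile as-is
    return [(1 - alpha) * p + alpha * x for p, x in zip(profile, c)]

profile = [1.0, 0.0]
cluster = [[0.9, 0.1], [1.1, -0.1]]  # centroid matches the profile
updated = update_profile(profile, cluster)
```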
  • Patent number: 11005838
    Abstract: Systems, methods, and other embodiments associated with a monitoring process for event detection and notification transmission are described. In one embodiment, a method includes configuring a monitoring process with a matching rule used to evaluate data sources of an enterprise computing environment to determine if an event has occurred. The example method may also include executing the monitoring process to identify a set of subscribers and establish a trust relationship. The example method may also include, for each subscriber, executing the monitoring process to impersonate a subscriber, execute the matching rule upon data sources accessible to the subscriber to perform a test as to whether the event has occurred, and transmit a message of the event if the event occurred.
    Type: Grant
    Filed: May 15, 2018
    Date of Patent: May 11, 2021
    Assignee: Oracle International Corporation
    Inventors: Michael Tebben, Haiyan Wang, Nicole Laurent, Qiu Zhong, Aaron Johnson, Darryl M. Shakespeare
  • Patent number: 10991369
    Abstract: A system and method for obtaining structured information from a conversation including receiving a first input from a user, determining a first set of slots filled based on the first input using natural language processing and a non-linear slot filling algorithm, determining a first conversation based on the first set of slots filled, determining a first empty slot associated with the first conversation, prompting the user for a second input, the second input associated with the first empty slot, filling the first empty slot using natural language processing and the non-linear slot filling algorithm, determining that the slots associated with the first conversation are filled, and, responsive to determining that the slots associated with the first conversation are filled, initiating an action associated with the conversation.
    Type: Grant
    Filed: January 31, 2019
    Date of Patent: April 27, 2021
    Inventors: Hristo Borisov, Boyko Karadzhov, Ivan Atanasov, Georgi Varzonovtsev
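The slot-filling loop above can be sketched as follows: extract whatever slots the user's input fills (in any order, hence "non-linear"), prompt for the first empty slot, and fire the action once the schema is complete. The trip-booking schema and the `slot=value` mini-parser are invented stand-ins for real NLP extraction:

```python
# Hypothetical slot schema for a trip-booking conversation.
SLOTS = ["origin", "destination", "date"]

def extract_slots(text):
    """Stand-in for NLP slot extraction: parse 'slot=value' pairs."""
    found = {}
    for token in text.split():
        if "=" in token:
            name, value = token.split("=", 1)
            if name in SLOTS:
                found[name] = value
    return found

def next_prompt(filled):
    """Prompt for the first empty slot, or signal the action when all are filled."""
    for slot in SLOTS:
        if slot not in filled:
            return f"Please provide {slot}"
    return "ACTION: book trip"

# The user fills slots out of order; the system prompts only for what is missing.
filled = extract_slots("destination=Paris date=2021-06-01")
prompt = next_prompt(filled)            # asks for the origin
filled.update(extract_slots("origin=Boston"))
done = next_prompt(filled)              # all slots filled: initiate the action
```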
  • Patent number: 10990902
    Abstract: A method, system, and computer program product for learning a recognition model for recognition processing. The method includes preparing one or more examples for learning, each of which includes an input segment, an additional segment adjacent to the input segment and an assigned label. The input segment and the additional segment are extracted from an original training data. A classification model is trained, using the input segment and the additional segment in the examples, to initialize parameters of the classification model so that extended segments including the input segment and the additional segment are reconstructed from the input segment. Then, the classification model is tuned to predict a target label, using the input segment and the assigned label in the examples, based on the initialized parameters. At least a portion of the obtained classification model is included in the recognition model.
    Type: Grant
    Filed: September 25, 2019
    Date of Patent: April 27, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Gakuto Kurata
  • Patent number: 10984788
    Abstract: An automatic speech recognition (ASR) system includes at least one processor and a memory storing instructions.
    Type: Grant
    Filed: March 29, 2018
    Date of Patent: April 20, 2021
    Assignee: BlackBerry Limited
    Inventor: Darrin Kenneth John Fry
  • Patent number: 10984801
    Abstract: AM and LM parameters to be used for adapting an ASR model are derived for each audio segment of an audio stream comprising multiple audio programs. A set of identifiers, including a speaker identifier, a speaker domain identifier and a program domain identifier, is obtained for each audio segment. The set of identifiers are used to select most suitable AM and LM parameters for the particular audio segment. The embodiments enable provision of maximum constraints on the AMs and LMs and enable adaptation of the ASR model on the fly for audio streams of multiple audio programs, such as broadcast audio. This means that the embodiments enable selecting AM and LM parameters that are most suitable in terms of ASR performance for each audio segment.
    Type: Grant
    Filed: May 8, 2017
    Date of Patent: April 20, 2021
    Assignee: Telefonaktiebolaget LM Ericsson (publ)
    Inventors: Volodya Grancharov, Erlendur Karlsson, Sigurdur Sverrisson, Maxim Teslenko, Konstantinos Vandikas, Aneta Vulgarakis Feljan
  • Patent number: 10978048
    Abstract: An apparatus comprising one or more processors, a communication circuit, and a memory for storing instructions, which when executed, performs a method of recognizing a user utterance. The method comprises: receiving first data associated with a user utterance, performing a first determination to determine whether the user utterance includes the first data and a specified word, performing a second determination to determine whether the first data includes the specified word, transmitting the first data to an external server, receiving a text generated from the first data by the external server, performing a third determination to determine whether the received text matches the specified word, and determining whether to activate the voice-based input system based on the third determination.
    Type: Grant
    Filed: May 23, 2018
    Date of Patent: April 13, 2021
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Tae Jin Lee, Young Woo Lee, Seok Yeong Jung, Chakladar Subhojit, Jae Hoon Jeong, Jun Hui Kim, Jae Geun Lee, Hyun Woong Lim, Soo Min Kang, Eun Hye Shin, Seong Min Je
  • Patent number: 10978192
    Abstract: Techniques for documenting a clinical procedure involve transcribing audio data comprising audio of one or more clinical personnel speaking while performing the clinical procedure. Examples of applicable clinical procedures include sterile procedures such as surgical procedures, as well as non-sterile procedures such as those conventionally involving a core code reporter. The transcribed audio data may be analyzed to identify relevant information for documenting the clinical procedure, and a text report including the relevant information documenting the clinical procedure may be automatically generated.
    Type: Grant
    Filed: January 22, 2019
    Date of Patent: April 13, 2021
    Assignee: Nuance Communications, Inc.
    Inventor: Mariana Casella dos Santos
  • Patent number: 10971160
    Abstract: A user device (e.g., voice assistant device, voice enabled device, smart device, computing device, etc.) may receive/detect audio content (e.g., speech, etc.) that includes a wake word and/or words similar to a wake word. The user device may require a wake word, a portion of the wake word, or words similar to the wake word to be detected prior to interacting with a user. The user device may, based on characteristics of the audio content, determine if the audio content originates from an authorized user. The user device may decrease and/or increase scrutiny applied to wake word detection based on whether audio content originates from an authorized user.
    Type: Grant
    Filed: November 13, 2018
    Date of Patent: April 6, 2021
    Assignee: Comcast Cable Communications, LLC
    Inventors: Hans Sayyadi, Nima Bina
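The variable-scrutiny idea above reduces to shifting the detection threshold by speaker identity: an authorized user's borderline utterance wakes the device, while the same audio from an unknown source does not. The base threshold and the adjustment delta below are illustrative values, not from the patent:

```python
BASE_THRESHOLD = 0.7  # hypothetical baseline wake-word confidence threshold

def detection_threshold(is_authorized, delta=0.15):
    """Authorized speakers get less scrutiny (a lower threshold);
    unrecognized audio gets more (a higher threshold)."""
    return BASE_THRESHOLD - delta if is_authorized else BASE_THRESHOLD + delta

def wake_word_detected(score, is_authorized):
    """Compare the detector's confidence score to the adjusted threshold."""
    return score >= detection_threshold(is_authorized)

# A borderline score of 0.75 wakes the device only for an authorized user.
from_owner = wake_word_detected(0.75, is_authorized=True)
from_tv = wake_word_detected(0.75, is_authorized=False)
```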
  • Patent number: 10957330
    Abstract: Systems and methods for control of vehicles are provided. A computer-implemented method in example embodiments may include receiving, at a computing system comprising one or more processors positioned in a vehicle, voice data from one or more audio sensors positioned in the vehicle. The system can determine whether configuration of a reference voiceprint for a speech processing system of the vehicle is authorized based at least in part on performance data associated with the vehicle. In response to determining that configuration of the reference voiceprint is authorized, a first reference voiceprint based on the received voice data can be stored and the speech processing system configured to authenticate input voice data for a first set of voice commands based on the reference voiceprint.
    Type: Grant
    Filed: May 31, 2019
    Date of Patent: March 23, 2021
    Assignee: GE Aviation Systems Limited
    Inventors: Stefan Alexander Schwindt, Barry Foye
  • Patent number: 10957446
    Abstract: Systems, methods, and computer readable storage medium for providing a genericized medical device architecture common to a plurality of medical devices are disclosed. The architecture may comprise at least one diagnostics module associated with at least one of the plurality of medical devices, wherein the at least one diagnostics module is configured to monitor an operational status of the at least one medical device. At least one hardware abstraction layer may be associated with at least one of the plurality of medical devices, and may be configured to provide abstracted access to hardware of the at least one medical device.
    Type: Grant
    Filed: August 8, 2016
    Date of Patent: March 23, 2021
    Assignee: Johnson & Johnson Surgical Vision, Inc.
    Inventors: Hou Man Chong, Edith W. Fung, Timothy L. Hunter, Deep K. Mehta
  • Patent number: 10950240
    Abstract: There is provided an information processing device and an information processing method that enable a desired voice recognition result to be easily obtained. The information processing device includes a presentation control unit that controls the separation of a voice recognition result at the time of presentation, on the basis of context relating to the voice recognition. The present technology can be applied, for example, to an information processing device that independently performs voice recognition, a server that performs voice recognition in response to a call from a client and transmits the recognition result to the client, or the client that requests voice recognition from the server, receives the recognition result from the server, and presents the recognition result.
    Type: Grant
    Filed: August 14, 2017
    Date of Patent: March 16, 2021
    Assignee: SONY CORPORATION
    Inventors: Yuhei Taki, Shinichi Kawano
  • Patent number: 10949283
    Abstract: A computer-implemented method is presented for detecting anomalies in dynamic datasets generated in a cloud computing environment. The method includes monitoring a plurality of cloud servers receiving a plurality of data points, employing a two-level clustering training module to generate micro-clusters from the plurality of data points, each of the micro-clusters representing a set of original data from the plurality of data points, employing a detecting module to detect normal data points, abnormal data points, and unknown data points from the plurality of data points via a detection model, employing an evolving module using a different evolving mechanism for each of the normal, abnormal, and unknown data points to evolve the detection model, and generating a system report displayed on a user interface, the system report summarizing the micro-cluster information.
    Type: Grant
    Filed: November 6, 2018
    Date of Patent: March 16, 2021
    Assignee: International Business Machines Corporation
    Inventors: Jia Wei Yang, Fan Jing Meng
  • Patent number: 10943606
    Abstract: Detecting the end-point of a user's voice command or utterance with high accuracy is critical in an automatic speech recognition (ASR)-based human-machine interface. If an ASR system incorrectly detects an end-point of utterance and transmits this incomplete sentence to other processing blocks for further processing, it is likely the processed result would lead to incorrect interpretation. A method includes selecting a first semantic network based on context of the audio signal and more accurately detecting the end-point of the user's utterance included in the audio signal based on the first semantic network and also based on at least one timeout threshold associated with the first semantic network.
    Type: Grant
    Filed: April 12, 2018
    Date of Patent: March 9, 2021
    Assignee: QUALCOMM Incorporated
    Inventors: Paras Surendra Doshi, Ayush Agarwal, Shri Prakash
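The context-dependent end-pointing above can be sketched as choosing a timeout threshold per context and declaring the end of the utterance once trailing silence exceeds it. The context-to-timeout table stands in for the patent's semantic networks, and all timing values are invented:

```python
# Hypothetical timeout thresholds (ms) per conversational context.
TIMEOUTS_MS = {"navigation": 800, "dictation": 1500, "default": 1000}

def endpoint_reached(context, trailing_silence_ms, sentence_complete):
    """End-point when silence exceeds the context timeout; a hypothesis that
    already forms a complete sentence is end-pointed on a shorter timeout."""
    timeout = TIMEOUTS_MS.get(context, TIMEOUTS_MS["default"])
    if sentence_complete:
        timeout //= 2  # complete sentences need less trailing silence
    return trailing_silence_ms >= timeout

# Dictation tolerates long pauses mid-sentence; a finished navigation
# command is end-pointed quickly.
mid_dictation = endpoint_reached("dictation", 1000, sentence_complete=False)
done_command = endpoint_reached("navigation", 500, sentence_complete=True)
```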
  • Patent number: 10944767
    Abstract: Mechanisms are provided for training a classifier to identify adversarial input data. A neural network processes original input data representing a plurality of non-adversarial original input data and mean output learning logic determines a mean response for each intermediate layer of the neural network based on results of processing the original input data. The neural network processes adversarial input data and layer-wise comparison logic compares, for each intermediate layer of the neural network, a response generated by the intermediate layer based on processing the adversarial input data, to the mean response associated with the intermediate layer, to thereby generate a distance metric for the intermediate layer. The layer-wise comparison logic generates a vector output based on the distance metrics that is used to train a classifier to identify adversarial input data based on responses generated by intermediate layers of the neural network.
    Type: Grant
    Filed: February 1, 2018
    Date of Patent: March 9, 2021
    Assignee: International Business Machines Corporation
    Inventors: Gaurav Goswami, Sharathchandra Pankanti, Nalini K. Ratha, Richa Singh, Mayank Vatsa
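The layer-wise comparison above boils down to two steps: record a mean activation per intermediate layer over clean (non-adversarial) data, then describe a new input by its distance from each layer's mean. The resulting vector is what trains the downstream classifier. A minimal numeric sketch with toy two-layer activations:

```python
import math

def mean_responses(per_layer_activations):
    """Mean activation vector for each layer over the clean dataset.
    per_layer_activations[l] is a list of activation vectors for layer l."""
    means = []
    for layer in per_layer_activations:
        n = len(layer)
        means.append([sum(v[i] for v in layer) / n for i in range(len(layer[0]))])
    return means

def distance_vector(sample_layers, means):
    """Euclidean distance to the clean mean, one entry per layer."""
    return [
        math.sqrt(sum((a - m) ** 2 for a, m in zip(layer, mean)))
        for layer, mean in zip(sample_layers, means)
    ]

clean = [[[0.0, 0.0], [2.0, 0.0]],   # layer 1 activations on clean inputs
         [[1.0, 1.0], [1.0, 3.0]]]   # layer 2 activations on clean inputs
means = mean_responses(clean)                          # [[1.0, 0.0], [1.0, 2.0]]
d = distance_vector([[4.0, 4.0], [1.0, 2.0]], means)   # far at layer 1 only
```

A large entry in `d` at some layer signals that the input drives that layer far from its normal operating region, which is the cue the trained classifier uses to flag adversarial inputs.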
  • Patent number: 10936641
    Abstract: A faster and more streamlined system for providing summary and analysis of large amounts of communication data is described. System and methods are disclosed that employ an ontology to automatically summarize communication data and present the summary to the user in a form that does not require the user to listen to the communication data. In one embodiment, the summary is presented as written snippets, or short fragments, of relevant communication data that capture the meaning of the data relating to a search performed by the user. Such snippets may be based on theme and meaning unit identification.
    Type: Grant
    Filed: May 21, 2018
    Date of Patent: March 2, 2021
    Assignee: VERINT SYSTEMS LTD.
    Inventors: Roni Romano, Galia Zacay, Rahm Fehr
  • Patent number: 10930287
    Abstract: In some embodiments, an exemplary inventive system for improving computer speed and accuracy of automatic speech transcription includes at least components of: a computer processor configured to perform: generating a recognition model specification for a plurality of distinct speech-to-text transcription engines; where each distinct speech-to-text transcription engine corresponds to a respective distinct speech recognition model; receiving at least one audio recording representing a speech of a person; segmenting the audio recording into a plurality of audio segments; determining a respective distinct speech-to-text transcription engine to transcribe a respective audio segment; receiving, from the respective transcription engine, a hypothesis for the respective audio segment; accepting the hypothesis to remove a need to submit the respective audio segment to another distinct speech-to-text transcription engine, resulting in the improved computer speed and the accuracy of automatic speech transcription; and ge
    Type: Grant
    Filed: December 3, 2018
    Date of Patent: February 23, 2021
    Inventors: Tejas Shastry, Matthew Goldey, Svyat Vergun
  • Patent number: 10930271
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using neural networks. A feature vector that models audio characteristics of a portion of an utterance is received. Data indicative of latent variables of multivariate factor analysis is received. The feature vector and the data indicative of the latent variables is provided as input to a neural network. A candidate transcription for the utterance is determined based on at least an output of the neural network.
    Type: Grant
    Filed: September 17, 2019
    Date of Patent: February 23, 2021
    Inventors: Andrew W. Senior, Ignacio Lopez Moreno
  • Patent number: 10916242
    Abstract: The present invention relates to the field of intelligent recognition, and discloses an intent recognition method based on a deep learning network, addressing the technical problem that the accuracy of intent recognition is low.
    Type: Grant
    Filed: March 26, 2020
    Date of Patent: February 9, 2021
    Assignee: NANJING SILICON INTELLIGENCE TECHNOLOGY CO., LTD.
    Inventors: Huapeng Sima, Ao Yao
  • Patent number: 10908873
    Abstract: A system and method for confirming a voice command of a media playback device is disclosed. The method includes receiving an instruction of a voice command and producing an audio confirmation of the command. A confirmation may be playing a media context item associated with the command, playing a verbal confirmation phrase, or playing a non-verbal audio cue.
    Type: Grant
    Filed: May 7, 2018
    Date of Patent: February 2, 2021
    Assignee: Spotify AB
    Inventors: Emma-Camelia Gosu, Daniel Bromand, Karl Humphreys
  • Patent number: 10892996
    Abstract: Systems and processes for operating an intelligent automated assistant are provided. In one example process, an event associated with an audio input is detected with a first process. In accordance with a detection of the event, a delay value associated with an electronic device is determined. The delay value corresponds to a time required to determine, with a second process, whether the audio input includes a spoken trigger. In accordance with a determination that the delay value exceeds a threshold, the delay value is broadcast during a first advertising session, and determination is made, during a second advertising session, whether the electronic device is to respond to the audio input. In accordance with a determination that the threshold is not exceeded, a determination is made, during the first advertising session, whether the electronic device is to respond to the audio input or wait for the second advertising session.
    Type: Grant
    Filed: August 31, 2018
    Date of Patent: January 12, 2021
    Assignee: Apple Inc.
    Inventor: Kurt Piersol
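The arbitration scheme above can be sketched as a two-session protocol: a device whose trigger-detection delay exceeds the threshold broadcasts its delay and defers the decision, while a fast device decides in the first advertising session; in the second session, the device with the smallest delay responds. The threshold and delay figures are illustrative:

```python
THRESHOLD_MS = 100  # hypothetical delay threshold

def first_session_action(delay_ms):
    """Decide now if fast enough; otherwise advertise the delay and wait
    for the second advertising session."""
    if delay_ms > THRESHOLD_MS:
        return ("broadcast_delay", delay_ms)
    return ("decide_now", delay_ms)

def elect_responder(delays_by_device):
    """Second session: the device with the smallest delay responds."""
    return min(delays_by_device, key=delays_by_device.get)

delays = {"watch": 250, "phone": 40}
actions = {device: first_session_action(ms) for device, ms in delays.items()}
responder = elect_responder(delays)  # the phone, having the smallest delay
```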
  • Patent number: 10885918
    Abstract: A system, method and computer program is provided for generating customized text representations of audio commands. A first speech recognition module may be used for generating a first text representation of an audio command based on a general language grammar. A second speech recognition module may be used for generating a second text representation of the audio command, the second module including a custom language grammar that may include contacts for a particular user. Entity extraction is applied to the second text representation and the entities are checked against a file containing user-specific language. If the entities are found in the user-specific language, the two text representations may be fused into a combined text representation and named entity recognition may be performed again to extract further entities.
    Type: Grant
    Filed: September 18, 2014
    Date of Patent: January 5, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Wilson Hsu, Kaheer Suleman, Joshua Pantony
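A crude way to picture the fusion step above: wherever the custom-grammar engine produced a word found in the user's personal vocabulary (e.g., a contact name), prefer it over the general engine's guess. The word-aligned transcripts, contact list, and substitution rule below are invented for illustration; the patent's fusion is more general:

```python
def fuse(general_words, custom_words, personal_vocab):
    """Prefer the custom engine's word wherever it found personal language,
    assuming the two transcriptions are word-aligned."""
    fused = []
    for g, c in zip(general_words, custom_words):
        fused.append(c if c in personal_vocab else g)
    return " ".join(fused)

contacts = {"Kaheer"}                      # hypothetical user contact list
general = "call car here now".split()      # general grammar mishears the name
custom = "call Kaheer here now".split()    # custom grammar knows the contact
combined = fuse(general, custom, contacts)  # "call Kaheer here now"
```

Named entity recognition would then run again on `combined` to extract the recognized contact as an entity.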
  • Patent number: 10887462
    Abstract: A computing system, method and non-transitory computer readable memory are provided, to assist an agent during a client interaction between the agent and a client over a communications channel. An agent station may generate a graphic user interface (GUI) of the client interaction during the client interaction, the GUI displaying a currently identified keyword and one or more interaction phases, each interaction phase having a respective current phase score for the client interaction. A keyword and associated keyword information from the client interaction may be received, including phase and corresponding phase score information, and the GUI updated with the currently identified keyword and newly received phase information accounting for the received corresponding phase score information. A situation report may be generated for a designated party, the situation report including an agent identification, and client interaction information including comments entered by the agent relating to the client interaction.
    Type: Grant
    Filed: April 9, 2019
    Date of Patent: January 5, 2021
    Assignee: West Corporation
    Inventors: Daniel A. Coyer, Ryan L. Techlin, Jeremy T. Tellock, Dennis C. White, Shelley A. Wildenberg
  • Patent number: 10872598
    Abstract: Embodiments of a production-quality text-to-speech (TTS) system constructed from deep neural networks are described. System embodiments comprise five major building blocks: a segmentation model for locating phoneme boundaries, a grapheme-to-phoneme conversion model, a phoneme duration prediction model, a fundamental frequency prediction model, and an audio synthesis model. For embodiments of the segmentation model, phoneme boundary detection was performed with deep neural networks using Connectionist Temporal Classification (CTC) loss. For embodiments of the audio synthesis model, a variant of WaveNet was created that requires fewer parameters and trains faster than the original. By using a neural network for each component, system embodiments are simpler and more flexible than traditional TTS systems, where each component requires laborious feature engineering and extensive domain expertise. Inference with system embodiments may be performed faster than real time.
    Type: Grant
    Filed: January 29, 2018
    Date of Patent: December 22, 2020
    Assignee: Baidu USA LLC
    Inventors: Sercan O. Arik, Mike Chrzanowski, Adam Coates, Gregory Diamos, Andrew Gibiansky, John Miller, Andrew Ng, Jonathan Raiman, Shubhahrata Sengupta, Mohammad Shoeybi
  • Patent number: 10867595
    Abstract: Described herein are systems and methods for generating natural language sentences with sequence-to-sequence (Seq2Seq) models with attention. The Seq2Seq models may be implemented in applications such as machine translation, image captioning, and speech recognition. Performance has further been improved by leveraging unlabeled data, often in the form of a language model. Disclosed herein are “Cold Fusion” architecture embodiments that leverage a pre-trained language model during training. The Seq2Seq models with Cold Fusion embodiments are able to better utilize language information, enjoying faster convergence, better generalization, and almost complete transfer to a new domain while using less labeled training data.
    Type: Grant
    Filed: March 6, 2018
    Date of Patent: December 15, 2020
    Assignee: Baidu USA LLC
    Inventors: Anuroop Sriram, Heewoo Jun, Sanjeev Satheesh, Adam Coates
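The core Cold Fusion idea, gating a pre-trained language model's features into the decoder state during training, can be sketched in a few lines. The weight shapes, the sigmoid gate, and the concatenation layout below are illustrative assumptions for the sketch, not the patented architecture.

```python
import math
import random

random.seed(0)
d = 4  # hidden size (arbitrary for this sketch)

# Hypothetical frozen LM hidden state and Seq2Seq decoder state.
h_lm = [random.gauss(0, 1) for _ in range(d)]
s_dec = [random.gauss(0, 1) for _ in range(d)]

# Fusion-gate weights (these would be learned jointly with the decoder).
W_gate = [[random.gauss(0, 1) for _ in range(2 * d)] for _ in range(d)]

def cold_fusion_state(s, h):
    """Gate the LM features, then concatenate them with the decoder state."""
    x = s + h  # concatenation of decoder state and LM features
    gate = [1 / (1 + math.exp(-sum(w * v for w, v in zip(row, x))))
            for row in W_gate]
    return s + [g * hv for g, hv in zip(gate, h)]  # fed to the output layer

fused = cold_fusion_state(s_dec, h_lm)
print(len(fused))  # 8: decoder state plus gated LM features
```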
  • Patent number: 10867600
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for hotword trigger suppression are disclosed. In one aspect, a method includes the actions of receiving, by a microphone of a computing device, audio corresponding to playback of an item of media content, the audio including an utterance of a predefined hotword that is associated with performing an operation on the computing device. The actions further include processing the audio. The actions further include in response to processing the audio, suppressing performance of the operation on the computing device.
    Type: Grant
    Filed: October 31, 2017
    Date of Patent: December 15, 2020
    Assignee: Google LLC
    Inventors: Alexander H. Gruenstein, Johan Schalkwyk, Matthew Sharifi
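The suppression decision amounts to recognizing that the captured audio is media playback rather than a live utterance, and skipping the operation if so. A toy sketch, with the audio processing reduced to exact fingerprint set membership (an assumption for illustration; the patent's actual audio processing is not specified in the abstract):

```python
def should_suppress(hotword_detected, audio_fingerprint, media_fingerprints):
    """Suppress the hotword-triggered operation when the captured audio's
    fingerprint matches known media content."""
    return hotword_detected and audio_fingerprint in media_fingerprints

# Hypothetical fingerprints of known broadcast content.
known_media = {"ad-campaign-spot-3"}
print(should_suppress(True, "ad-campaign-spot-3", known_media))  # True
print(should_suppress(True, "live-room-audio", known_media))     # False
```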
  • Patent number: 10860333
    Abstract: Embodiments of the present disclosure seek to mitigate the timing issues of prior approaches by performing the NVMe device reset and post-reset re-initialization in parallel. In embodiments, the NVMe device reset and re-initialization operations are logically divided into front-end and back-end operations that may be carried out in parallel. Upon receipt of a reset command from the host, the NVMe device carries out front-end reset operations and, in parallel, performs back-end re-initialization operations. Once the front-end reset operations are complete, or after a predetermined period of time, the NVMe device reports to the host that the device reset is complete while back-end operations continue. Once all reset and re-initialization operations are complete, the NVMe device may resume conducting I/O instructions from the host.
    Type: Grant
    Filed: October 14, 2019
    Date of Patent: December 8, 2020
    Assignee: WESTERN DIGITAL TECHNOLOGIES, INC.
    Inventor: Shay Benisty
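The timing relationship in the abstract, reporting reset completion once the front-end finishes while back-end re-initialization continues, can be illustrated with two threads. The sleep durations and event names are invented for the sketch; a real NVMe controller would implement this in firmware, not host-side Python.

```python
import threading
import time

def front_end_reset(done):
    time.sleep(0.01)   # stand-in for aborting commands and clearing queues
    done.set()         # after this, the device may report reset completion

def back_end_reinit(done):
    time.sleep(0.3)    # stand-in for slower internal re-initialization
    done.set()

front_done, back_done = threading.Event(), threading.Event()
threads = [
    threading.Thread(target=front_end_reset, args=(front_done,)),
    threading.Thread(target=back_end_reinit, args=(back_done,)),
]
for t in threads:
    t.start()

front_done.wait()                        # host sees "reset complete" here...
reported_early = not back_done.is_set()  # ...while back-end work continues
for t in threads:
    t.join()
print(reported_early)  # True: completion was reported before reinit finished
```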
  • Patent number: 10860801
    Abstract: A method includes extracting a keyword and a slot from a natural language input, where the slot includes information. The method includes determining whether the keyword corresponds to one of a plurality of formation groups. In response to determining that the keyword corresponds to a specific formation group, the method includes updating metadata of the specific formation group with the information of the slot. In response to determining that the keyword does not correspond to any of the formation groups, the method includes determining whether the keyword corresponds to one of a plurality of clusters. In response to determining that the keyword corresponds to a specific cluster, the method includes updating the specific cluster with the information of the slot. In response to determining that the keyword does not correspond to any of the clusters, the method includes creating an additional formation group that includes the keyword and the slot.
    Type: Grant
    Filed: January 15, 2019
    Date of Patent: December 8, 2020
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Anil Yadav, Melvin Lobo, Chutian Wang
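The abstract's decision cascade (matching formation group, else matching cluster, else a new formation group) maps directly onto a few conditionals. The dict-based data structures below are assumptions chosen for brevity, not the patented representation.

```python
def route_keyword(keyword, slot, formation_groups, clusters):
    """Decision cascade: update a matching formation group, else a
    matching cluster, else create a new formation group."""
    if keyword in formation_groups:
        formation_groups[keyword].update(slot)
    elif keyword in clusters:
        clusters[keyword].update(slot)
    else:
        formation_groups[keyword] = dict(slot)

groups = {"trip": {"city": "Oslo"}}
clusters = {"food": {}}
route_keyword("trip", {"date": "Friday"}, groups, clusters)  # updates group metadata
route_keyword("food", {"dish": "pasta"}, groups, clusters)   # updates the cluster
route_keyword("hike", {"trail": "ridge"}, groups, clusters)  # new formation group
print(groups)
```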
  • Patent number: 10854193
    Abstract: Methods, apparatuses, devices and computer-readable storage media for real-time speech recognition are provided. The method includes: based on an input speech signal, obtaining truncating information for truncating a sequence of features of the speech signal; based on the truncating information, truncating the sequence of features into a plurality of subsequences; and for each subsequence in the plurality of subsequences, obtaining a real-time recognition result through an attention mechanism.
    Type: Grant
    Filed: February 6, 2019
    Date of Patent: December 1, 2020
    Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.
    Inventors: Xiaoyin Fu, Jinfeng Bai, Zhijie Chen, Mingxin Liang, Xu Chen, Lei Jia
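The truncate-then-attend flow above can be sketched as splitting the feature sequence at the truncating positions and emitting one partial result per subsequence. The recognizer is abstracted into a callable; the real system's attention model and truncation criteria are not specified in the abstract.

```python
def truncate_features(features, boundaries):
    """Split the feature sequence at the given truncating positions."""
    subsequences, start = [], 0
    for b in boundaries:
        subsequences.append(features[start:b])
        start = b
    subsequences.append(features[start:])
    return [s for s in subsequences if s]

def recognize_streaming(features, boundaries, attend):
    """Emit one partial recognition result per subsequence, in order."""
    return [attend(sub) for sub in truncate_features(features, boundaries)]

features = list(range(10))  # ten stand-in feature frames
partials = recognize_streaming(features, [3, 7],
                               lambda s: f"<{len(s)} frames>")
print(partials)  # ['<3 frames>', '<4 frames>', '<3 frames>']
```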
  • Patent number: 10853584
    Abstract: Methods, apparatuses, and computer program products are described herein that are configured to express a time in an output text. In some example embodiments, a method is provided that comprises identifying a time period to be described linguistically in an output text. The method of this embodiment may also include identifying a communicative context for the output text. The method of this embodiment may also include determining one or more temporal reference frames that are applicable to the time period and a domain defined by the communicative context. The method of this embodiment may also include generating a phrase specification that linguistically describes the time period based on a descriptor that is defined by a temporal reference frame of the one or more temporal reference frames. In some examples, the descriptor specifies a time window that is inclusive of at least a portion of the time period to be described linguistically.
    Type: Grant
    Filed: April 19, 2019
    Date of Patent: December 1, 2020
    Assignee: ARRIA DATA2TEXT LIMITED
    Inventors: Gowri Somayajulu Sripada, Neil Burnett
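The descriptor-selection step can be sketched as scanning applicable reference frames for a descriptor whose time window overlaps the period. The day-part frame and overlap test below are invented examples; the patent's frames and phrase specifications are richer than this.

```python
def describe_time(start_hour, end_hour, reference_frames):
    """Return a descriptor whose window covers at least part of the time
    period; fall back to an explicit range if no frame applies."""
    for frame in reference_frames:
        for descriptor, (w_start, w_end) in frame.items():
            if w_start <= end_hour and start_hour <= w_end:
                return descriptor  # window inclusive of part of the period
    return f"from {start_hour}:00 to {end_hour}:00"

# Hypothetical temporal reference frame: parts of the day.
day_parts = {"morning": (6, 12), "afternoon": (12, 18), "evening": (18, 23)}
print(describe_time(7, 9, [day_parts]))  # "morning"
print(describe_time(2, 4, [day_parts]))  # falls back to the explicit range
```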
  • Patent number: 10852720
    Abstract: Embodiments are disclosed for an example vehicle or driver assistance system for a vehicle. The example vehicle or driver assistance system includes an in-vehicle computing system of a vehicle, the in-vehicle computing system comprising an external device interface communicatively connecting the in-vehicle computing system to a mobile device, an inter-vehicle system communication module communicatively connecting the in-vehicle computing system to one or more vehicle systems of the vehicle, a processor, and a storage device storing instructions executable by the processor to receive a first command from the mobile device via the external device interface and perform a series of actions on the vehicle system until receiving a second command from the mobile device, both the first command and the second command being recognized by the mobile device based on one or more of voice commands issued by a user of the mobile device and biometric analysis.
    Type: Grant
    Filed: February 10, 2016
    Date of Patent: December 1, 2020
    Assignee: HARMAN INTERNATIONAL INDUSTRIES, INCORPORATED
    Inventor: Yogesh Devidas Dusane
  • Patent number: 10846522
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating predictions for whether a target person is speaking during a portion of a video. In one aspect, a method includes obtaining one or more images which each depict a mouth of a given person at a respective time point. The images are processed using an image embedding neural network to generate a latent representation of the images. Audio data corresponding to the images is processed using an audio embedding neural network to generate a latent representation of the audio data. The latent representation of the images and the latent representation of the audio data are processed using a recurrent neural network to generate a prediction for whether the given person is speaking.
    Type: Grant
    Filed: October 16, 2018
    Date of Patent: November 24, 2020
    Assignee: Google LLC
    Inventors: Sourish Chaudhuri, Ondrej Klejch, Joseph Edward Roth
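The embed-fuse-recur structure of the abstract can be sketched with toy linear maps standing in for the trained embedding networks and recurrent cell. Every weight and dimension below is an arbitrary assumption for illustration.

```python
import math
import random

random.seed(1)
D = 3  # embedding size for the sketch

def rand_mat(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Hypothetical frozen embedding networks reduced to single linear maps.
W_img, W_aud = rand_mat(D, 6), rand_mat(D, 4)
W_rnn = rand_mat(D, 3 * D)  # recurrent cell over [z_img; z_aud; h]
w_out = [random.gauss(0, 1) for _ in range(D)]

def speaking_probability(image_feats, audio_feats):
    """Embed each timestep's image and audio, fuse them through a toy
    recurrent cell, and squash the final state to a probability."""
    h = [0.0] * D
    for img, aud in zip(image_feats, audio_feats):
        z = matvec(W_img, img) + matvec(W_aud, aud) + h  # concatenated input
        h = [math.tanh(v) for v in matvec(W_rnn, z)]
    logit = sum(w * v for w, v in zip(w_out, h))
    return 1 / (1 + math.exp(-logit))

frames = [[random.gauss(0, 1) for _ in range(6)] for _ in range(5)]
audio = [[random.gauss(0, 1) for _ in range(4)] for _ in range(5)]
p = speaking_probability(frames, audio)
print(0.0 < p < 1.0)  # True: output is a valid probability
```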
  • Patent number: 10847162
    Abstract: Multi-modal speech localization is achieved using image data captured by one or more cameras and audio data captured by a microphone array. Audio data captured by each microphone of the array is transformed to obtain a frequency domain representation that is discretized in a plurality of frequency intervals. Image data captured by each camera is used to determine a positioning of each human face. Input data is provided to a previously trained audio source localization classifier, including the frequency domain representation of the audio data captured by each microphone and the positioning of each human face captured by each camera, in which the positioning of each human face represents a candidate audio source. Based on the input data, the classifier indicates an identified audio source that is estimated to be the human face from which the audio data originated.
    Type: Grant
    Filed: June 27, 2018
    Date of Patent: November 24, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Eyal Krupka, Xiong Xiao
  • Patent number: 10846699
    Abstract: Embodiments of the invention are directed to systems and methods for biometrics transaction processing. A location of a device associated with a user may be determined. A reference to a biometric data model associated with the user stored within a database may be retrieved, based at least in part on the location. Biometric data may be received from the user. Using the reference, the biometric data may be compared to the biometric data model stored within the database. A determination may be made whether the user is authenticated for the transaction based on the comparing step.
    Type: Grant
    Filed: October 5, 2018
    Date of Patent: November 24, 2020
    Assignee: Visa International Service Association
    Inventors: John F. Sheets, Kim R. Wagner, Mark A. Nelsen
  • Patent number: 10847176
    Abstract: A computer-implemented method includes receiving, at a microphone of a voice-controlled device, a speech input; generating an electrical signal having a first gain level that is below a gain threshold for audible detection by a user; transmitting the electrical signal to a speaker; and detecting, by the microphone, an audio signal that includes a combination of ambient noise and a probe audio signal, wherein the probe audio signal is output by the speaker based on the electrical signal. The method further includes determining a power level of the probe audio signal and determining a state of a display of the device based on the power level of the probe audio signal.
    Type: Grant
    Filed: March 12, 2018
    Date of Patent: November 24, 2020
    Assignee: Amazon Technologies, Inc.
    Inventors: Trausti Thor Kristjansson, Srivatsan Kandadai, Mark Lawrence, Balsa Laban, Anna Chen Santos, Joseph Pedro Tavares, Miroslav Ristic, Valere Joseph Vanderschaegen
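The final step, inferring a state from the probe's measured power, reduces to a threshold test. The sketch below simplifies the abstract's power-level decision to a fixed dB margin over the noise floor; the margin, units, and the exact probe-to-display mapping are assumptions for illustration.

```python
def display_state_from_probe(mic_power_db, noise_floor_db, margin_db=3.0):
    """If the (inaudibly quiet) probe tone comes back from the speaker
    well above the noise floor, infer that the device path is active."""
    return "on" if mic_power_db - noise_floor_db > margin_db else "off"

print(display_state_from_probe(-40.0, -60.0))  # "on": probe clearly detected
print(display_state_from_probe(-59.0, -60.0))  # "off": probe lost in noise
```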
  • Patent number: 10847144
    Abstract: Differences are identified, at the lexical unit and/or phrase level, between time-varying corpora. A corpus for a time period of interest is compared with a reference corpus. N-grams are generated for both the corpus of interest and the reference corpus, and numbers of occurrences are counted. An average number of occurrences is determined for each n-gram of the reference corpus. A difference value, between the number of occurrences in the corpus of interest and the average number of occurrences, is determined and normalized. N-grams can be selected for display, or for further processing, on the basis of the normalized difference value. Further processing can include selecting a sample period. A plurality of reference corpora are produced, where the begin time for each sub-corpus of the plurality of reference corpora differs from the begin time for the corpus of interest by an integer multiple of the sample period. A Word Cloud visualization is shown.
    Type: Grant
    Filed: September 15, 2015
    Date of Patent: November 24, 2020
    Assignee: NetBase Solutions, Inc.
    Inventors: Jens Erik Tellefsen, Ranjeet Singh Bhatia
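The count-average-difference-normalize pipeline can be sketched directly. The normalization below (dividing by the larger of the two counts) is one possible choice made for the sketch; the patent's normalization scheme is not specified in the abstract.

```python
from collections import Counter

def ngrams(tokens, n=2):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def trending_scores(corpus_of_interest, reference_corpora, n=2):
    """Normalized difference between each n-gram's count in the period of
    interest and its average count across the reference corpora."""
    counts = Counter(ngrams(corpus_of_interest, n))
    ref_counts = [Counter(ngrams(ref, n)) for ref in reference_corpora]
    scores = {}
    for gram, c in counts.items():
        avg = sum(rc[gram] for rc in ref_counts) / len(ref_counts)
        scores[gram] = (c - avg) / max(c, avg, 1)  # one possible normalization
    return scores

interest = "new phone new phone launch".split()
references = ["old phone review".split(), "phone battery test".split()]
scores = trending_scores(interest, references)
print(scores[("new", "phone")])  # 1.0: occurs twice now, never in references
```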
  • Patent number: 10848544
    Abstract: The present disclosure relates to systems and processes for efficiently communicating mapping application data between electronic devices. In one example, a first electronic device can act as a proxy between a second electronic device and a map server by receiving a first request for map data from the second user device, determining a set of supplemental data to add to the first request to generate a complete second request for map data, and transmitting the second request to a map server. The first electronic device can receive the requested map data from the map server and transmit the received map data to the second electronic device. In another example, the first electronic device can act as a navigation server for the second electronic device by initially transmitting a full set of route data to the second electronic device and subsequently transmitting route update messages to the second electronic device.
    Type: Grant
    Filed: September 1, 2015
    Date of Patent: November 24, 2020
    Assignee: Apple Inc.
    Inventors: Aroon Pahwa, Matthew B. Ball
  • Patent number: 10841649
    Abstract: Example apparatus disclosed herein include a return path data classifier to classify a first viewing period associated with segments of return path data received from a set top box into tuning classifications based on the segments of the return path data; calculate a total reported tuning duration for the first viewing period when the first viewing period is classified as live or playback tuning; and compare the total reported tuning duration to a duration threshold to determine whether the segments of return path data associated with the first viewing period are valid. The example apparatus also includes a return path data rectifier to rectify missing tuning data associated with a second viewing period based on tuning data included in the segments of return path data associated with the first viewing period when the segments of the return path data associated with the first viewing period are determined to be valid.
    Type: Grant
    Filed: October 4, 2018
    Date of Patent: November 17, 2020
    Assignee: The Nielsen Company (US), LLC
    Inventors: Balachander Shankar, Jonathan Sullivan, Molly Poppie, John Charles Coughlin, Paul Chimenti, Rachel Worth Olson, Samantha M. Mowrer, David J. Kurzynski, Remy Spoentgen, Christine Heiss, Shuangxing Chen
  • Patent number: 10839158
    Abstract: A method includes detecting, by sensors, a current context associated with an electronic device. The method includes dynamically loading a neural network and selected features into a phrase-spotting audio front-end (AFE) processor. The neural network is configured, based on the current context, with at least one domain having an associated set of trigger words. The method includes detecting audio content that matches a trigger word from among the sets of trigger words associated with the at least one selected domain. The method includes, in response to detecting audio content that matches the trigger word, outputting a wake-up signal to an application processor (AP). The AFE processor utilizes fewer computational resources than the AP. The method includes, in response to receiving the wake-up signal, the AP waking up and performing additional computation based on the matching trigger word. The method includes outputting results of the additional computation to an output device.
    Type: Grant
    Filed: January 25, 2019
    Date of Patent: November 17, 2020
    Assignee: Motorola Mobility LLC
    Inventors: Zhengping Ji, Rachid Alameh, Michael E. Russell
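The context-conditioned domain loading and trigger spotting can be sketched as a filter over a domain table followed by a match scan. The domain names, context predicates, and trigger words below are invented for the example; a real AFE would run a loaded neural network and raise a hardware wake-up signal rather than return a tuple.

```python
def select_domains(context, domain_table):
    """Load only the trigger-word domains whose condition matches the
    sensed context (the context sensing itself is out of scope here)."""
    return {name: words for name, (cond, words) in domain_table.items()
            if cond(context)}

def spot_trigger(audio_words, active_domains):
    """Return (domain, trigger word) for the first match, else None."""
    for word in audio_words:
        for domain, triggers in active_domains.items():
            if word in triggers:
                return domain, word  # would signal the AP to wake up
    return None

domain_table = {
    "driving": (lambda c: c["in_vehicle"], {"navigate", "call"}),
    "home":    (lambda c: not c["in_vehicle"], {"lights", "music"}),
}
active = select_domains({"in_vehicle": True}, domain_table)
print(spot_trigger(["please", "navigate", "home"], active))
```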
  • Patent number: 10831824
    Abstract: A device includes a transceiver, a storage device, and a processor. The transceiver receives an audio segment from a remote device, receives a request to communicate the audio segment to another remote device, and communicates the audio segment to the other remote device in response to the request, the audio segment including at least one audio feature extracted from audio recorded by the device. The storage device stores the audio segment. The processor retrieves the audio segment from the storage device in response to the request to communicate the audio segment to the other remote device.
    Type: Grant
    Filed: July 1, 2019
    Date of Patent: November 10, 2020
    Assignee: Koye Corp.
    Inventors: Bosko Ilic, Vanja Jovicevic, Nemanja Zbiljic, Stefan Brajkovic
  • Patent number: 10832668
    Abstract: Techniques for dynamically maintaining speech processing data on a local device for frequently input commands are described. One or more devices receive speech processing data specific to one or more commands associated with system input frequencies satisfying an input frequency threshold. The device(s) then receives input audio corresponding to an utterance and generate input audio data corresponding thereto. The device(s) performs speech recognition processing on input audio data to generate input text data using a portion of the received speech processing data. The device(s) determines a probability score associated with the input text data and determines the probability score satisfies a threshold probability score. The device(s) then performs natural language processing on the input text data to determine the command using a portion of the speech processing data. The device(s) then outputs audio data responsive to the command.
    Type: Grant
    Filed: September 19, 2017
    Date of Patent: November 10, 2020
    Assignee: Amazon Technologies, Inc.
    Inventors: David William Devries, Rajesh Mittal
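The frequency-gated local cache described above can be sketched as a counter plus a threshold that controls which commands get local speech-processing data. The class name, threshold value, and stand-in "model" strings are invented for the example; the probability-score check from the abstract is omitted for brevity.

```python
class LocalSpeechCache:
    """Hold speech-processing data only for commands whose input frequency
    meets the threshold; other commands would fall back to a remote system."""

    def __init__(self, frequency_threshold):
        self.threshold = frequency_threshold
        self.frequencies = {}
        self.local_models = {}

    def observe(self, command):
        """Count an input; load stand-in local data once it is frequent enough."""
        self.frequencies[command] = self.frequencies.get(command, 0) + 1
        if self.frequencies[command] >= self.threshold:
            self.local_models[command] = f"asr+nlu data for {command!r}"

    def handles_locally(self, command):
        return command in self.local_models

cache = LocalSpeechCache(frequency_threshold=3)
for _ in range(3):
    cache.observe("play music")
cache.observe("weather")
print(cache.handles_locally("play music"),
      cache.handles_locally("weather"))  # True False
```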