Recognition Patents (Class 704/231)
  • Patent number: 11024296
    Abstract: Systems and methods are described herein for providing media guidance. Control circuitry may receive a first voice input and access a database of topics to identify a first topic associated with the first voice input. A user interface may generate a first response to the first voice input, and subsequent to generating the first response, the control circuitry may receive a second voice input. The control circuitry may determine a match between the second voice input and an interruption input, such as a period of silence or a keyword or phrase such as “Ahh,” “Umm,” or “Hmm.” The user interface may generate a second response that is associated with a second topic related to the first topic. By interrupting the conversation and changing the subject from time to time, media guidance systems can appear more intelligent and human.
    Type: Grant
    Filed: March 11, 2020
    Date of Patent: June 1, 2021
    Assignee: Rovi Guides, Inc.
    Inventors: Charles Dawes, Walter R. Klappert
  • Patent number: 11024302
    Abstract: Systems and methods are provided for an automated speech recognition system. A microphone records a keyword spoken by a user, and a front end divides the recorded keyword into a plurality of subunits, each containing a segment of recorded audio, and extracts a set of features from each of the plurality of subunits. A decoder assigns one of a plurality of content classes to each of the plurality of subunits according to at least the extracted set of features for each subunit. A quality evaluation component calculates a score representing a quality of the keyword from the content classes assigned to the plurality of subunits.
    Type: Grant
    Filed: September 15, 2017
    Date of Patent: June 1, 2021
    Assignee: Texas Instruments Incorporated
    Inventors: Tarkesh Pande, Lorin Paul Netsch, David Patrick Magee
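The pipeline in the abstract above (divide a recorded keyword into subunits, extract features, classify each subunit, score the keyword from the class sequence) can be sketched roughly as follows. This is a toy illustration, not the patented method: the subunit size, the mean-amplitude "feature," the threshold classifier, and the speech-fraction score are all invented stand-ins.

```python
def split_subunits(samples, size=4):
    """Divide recorded samples into fixed-size subunits."""
    return [samples[i:i + size] for i in range(0, len(samples), size)]

def extract_features(subunit):
    """Toy feature: mean absolute amplitude of the subunit."""
    return sum(abs(s) for s in subunit) / len(subunit)

def classify(feature):
    """Assign a content class ('speech' or 'silence') per subunit."""
    return "speech" if feature > 0.1 else "silence"

def keyword_quality(samples):
    """Fraction of subunits classified as speech: a crude quality score."""
    classes = [classify(extract_features(u)) for u in split_subunits(samples)]
    return classes.count("speech") / len(classes)

# A recording whose second half is near-silence gets a middling score.
score = keyword_quality([0.5, 0.4, 0.6, 0.3, 0.0, 0.01, 0.02, 0.0])
```

A real decoder would assign richer content classes (e.g., phone-like units) from spectral features, but the data flow from subunits through per-subunit classes to a single keyword score is the same.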
  • Patent number: 11024312
    Abstract: A voice recognition apparatus includes a communication part configured to communicate with a voice recognition server, a voice receiver configured to receive a user's voice signal, a storage part configured to store guide information comprising at least an example command for voice recognition; and a controller. The controller is configured to generate a guide image comprising at least a part of the example command, transmit the received user's voice signal to the voice recognition server through the communication part in response to receiving the user's voice signal by the voice receiver, and update the stored guide information based on update information received through the communication part.
    Type: Grant
    Filed: March 11, 2020
    Date of Patent: June 1, 2021
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Jong-cheol Park, Do-wan Kim, Sang-shin Park
  • Patent number: 11024306
    Abstract: The present disclosure is generally directed to the generation of voice-activated data flows in interconnected network. The voice-activated data flows can include input audio signals that include a request and are detected at a client device. The client device can transmit the input audio signal to a data processing system, where the input audio signal can be parsed and passed to the data processing system of a service provider to fulfill the request in the input audio signal. The present solution is configured to conserve network resources by reducing the number of network transmissions needed to fulfill a request.
    Type: Grant
    Filed: September 14, 2018
    Date of Patent: June 1, 2021
    Assignee: GOOGLE LLC
    Inventors: Gaurav Bhaya, Ulas Kirazci, Bradley Abrams, Adam Coimbra, Ilya Firman, Carey Radebaugh
  • Patent number: 11024316
    Abstract: Computer-implemented method and system for receiving and processing one or more moment-associating elements. For example, the computer-implemented method includes receiving the one or more moment-associating elements, transforming the one or more moment-associating elements into one or more pieces of moment-associating information, and transmitting at least one piece of the one or more pieces of moment-associating information.
    Type: Grant
    Filed: May 3, 2019
    Date of Patent: June 1, 2021
    Assignee: Otter.ai, Inc.
    Inventors: Yun Fu, Simon Lau, Kaisuke Nakajima, Julius Cheng, Sam Song Liang, James Mason Altreuter, Kean Kheong Chin, Zhenhao Ge, Hitesh Anand Gupta, Xiaoke Huang, James Francis McAteer, Brian Francis Williams, Tao Xing
  • Patent number: 11011167
    Abstract: A communication system includes a pair of speech recognition devices that are capable of communicating with each other, each of the speech recognition devices including a speech input section into which speech is input, a speech recognition section that recognizes speech input to the speech input section, and a speech output section that outputs speech. The communication system also includes an information generation section that generates notification information corresponding to speech recognized by the speech recognition section in one speech recognition device from out of the pair of speech recognition devices, and a speech output control section that performs control to output notification speech corresponding to the notification information at a specific timing from the speech output section of the other speech recognition device from out of the pair of speech recognition devices.
    Type: Grant
    Filed: January 8, 2019
    Date of Patent: May 18, 2021
    Assignee: Toyota Jidosha Kabushiki Kaisha
    Inventors: Hideki Kobayashi, Akihiro Muguruma, Yukiya Sugiyama, Shota Higashihara, Riho Matsuo, Naoki Yamamuro
  • Patent number: 11011162
    Abstract: The technology disclosed relates to performing speech recognition for a plurality of different devices or devices in a plurality of conditions. This includes storing a plurality of acoustic models associated with different devices or device conditions, receiving speech audio including natural language utterances, receiving metadata indicative of a device type or device condition, selecting an acoustic model from the plurality in dependence upon the received metadata, and employing the selected acoustic model to recognize speech from the natural language utterances included in the received speech audio. Each of speech recognition and the storage of acoustic models can be performed locally by devices or on a network-connected server. Also provided is a platform and interface, used by device developers to select, configure, and/or train acoustic models for particular devices and/or conditions.
    Type: Grant
    Filed: June 1, 2018
    Date of Patent: May 18, 2021
    Assignee: SOUNDHOUND, INC.
    Inventors: Mehul Patel, Keyvan Mohajer
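Selecting an acoustic model in dependence upon received metadata, as the abstract above describes, amounts to a keyed lookup with a fallback. A minimal sketch, with an invented model registry and metadata keys:

```python
# Hypothetical registry mapping (device type, condition) to a stored model.
ACOUSTIC_MODELS = {
    ("car", "noisy"): "am_car_noisy_v2",
    ("car", "quiet"): "am_car_quiet_v1",
    ("speaker", "far-field"): "am_speaker_farfield_v3",
}

def select_acoustic_model(metadata, default="am_generic"):
    """Pick the acoustic model matching the device metadata, else a default."""
    key = (metadata.get("device_type"), metadata.get("condition"))
    return ACOUSTIC_MODELS.get(key, default)

# An in-car microphone under noisy conditions gets the matching model;
# an unknown device falls back to the generic model.
model = select_acoustic_model({"device_type": "car", "condition": "noisy"})
```

Whether the lookup runs on the device or on a network-connected server, the contract is the same: metadata in, model identifier out.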
  • Patent number: 11004445
    Abstract: In one embodiment, a smartwatch includes a processor and a memory storing instructions to be executed in the processor. The instructions are configured to cause the processor to obtain input comprising voice information; determine whether the voice information comprises interrogative keyword; and determine that the voice information is interrogative information in response to determining that the voice information comprises interrogative keyword. The instructions are configured to cause the processor to determine whether reply information corresponding to the interrogative information can be obtained from a memory of the smartwatch; and send the interrogative information to a server through a wireless network in response to determining that the reply information corresponding to the interrogative information cannot be obtained from the memory of the smartwatch.
    Type: Grant
    Filed: May 27, 2017
    Date of Patent: May 11, 2021
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Yizu Feng, Bin Li
  • Patent number: 11004458
    Abstract: Provided are a method and an apparatus for determining an encoding mode for improving the quality of a reconstructed audio signal. A method of determining an encoding mode includes determining one from among a plurality of encoding modes including a first encoding mode and a second encoding mode as an initial encoding mode in correspondence to characteristics of an audio signal, and if there is an error in the determination of the initial encoding mode, generating a modified encoding mode by modifying the initial encoding mode to a third encoding mode.
    Type: Grant
    Filed: October 4, 2019
    Date of Patent: May 11, 2021
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Ki-hyun Choo, Anton Victorovich Porov, Konstantin Sergeevich Osipov, Nam-suk Lee
  • Patent number: 11004454
    Abstract: Techniques for updating voice profiles used to perform user recognition are described. A system may use clustering techniques to update voice profiles. When the system receives audio data representing a spoken user input, the system may store the audio data. Periodically, the system may recall, from storage, audio data (representing previous user inputs). The system may identify clusters of the audio data, with each cluster including similar or identical speech characteristics. The system may determine a cluster is substantially similar to an existing voice profile. If this occurs, the system may create an updated voice profile using the original voice profile and the cluster of audio data.
    Type: Grant
    Filed: November 6, 2018
    Date of Patent: May 11, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Sundararajan Srinivasan, Arindam Mandal, Krishna Subramanian, Spyridon Matsoukas, Aparna Khare, Rohit Prasad
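The update step described above (find a cluster of stored utterances, check it against an existing voice profile, and blend the two if they match) can be sketched with simple vector arithmetic. The cosine threshold, the blend weight, and the treatment of profiles as flat embedding vectors are all assumptions for illustration:

```python
import math

def centroid(vectors):
    """Mean vector of a cluster of utterance embeddings."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def update_profile(profile, cluster, threshold=0.9, alpha=0.5):
    """If the cluster centroid is close enough to the profile, blend them;
    otherwise the cluster likely belongs to another speaker."""
    c = centroid(cluster)
    if cosine(profile, c) < threshold:
        return profile  # substantially dissimilar: leave the profile as-is
    return [(1 - alpha) * p + alpha * x for p, x in zip(profile, c)]

profile = [1.0, 0.0]
cluster = [[0.9, 0.1], [1.1, -0.1]]  # centroid matches the profile
updated = update_profile(profile, cluster)
```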
  • Patent number: 11005838
    Abstract: Systems, methods, and other embodiments associated with a monitoring process for event detection and notification transmission are described. In one embodiment, a method includes configuring a monitoring process with a matching rule used to evaluate data sources of an enterprise computing environment to determine if an event has occurred. The example method may also include executing the monitoring process to identify a set of subscribers and establish a trust relationship. The example method may also include, for each subscriber, executing the monitoring process to impersonate a subscriber, execute the matching rule upon data sources accessible to the subscriber to perform a test as to whether the event has occurred, and transmit a message of the event if the event occurred.
    Type: Grant
    Filed: May 15, 2018
    Date of Patent: May 11, 2021
    Assignee: Oracle International Corporation
    Inventors: Michael Tebben, Haiyan Wang, Nicole Laurent, Qiu Zhong, Aaron Johnson, Darryl M. Shakespeare
  • Patent number: 10991369
    Abstract: A system and method for obtaining structured information from a conversation including receiving a first input from a user, determining a first set of slots filled based on the first input using natural language processing and a non-linear slot filling algorithm, determining a first conversation based on the first set of slots filled, determining a first empty slot associated with the first conversation, prompting the user for a second input, the second input associated with the first empty slot, filling the first empty slot using natural language processing and the non-linear slot filling algorithm, determining that the slots associated with the first conversation are filled, and, responsive to determining that the slots associated with the first conversation are filled, initiating an action associated with the conversation.
    Type: Grant
    Filed: January 31, 2019
    Date of Patent: April 27, 2021
    Inventors: Hristo Borisov, Boyko Karadzhov, Ivan Atanasov, Georgi Varzonovtsev
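The slot-filling loop above can be sketched as follows: extract whatever slots the user's input fills (in any order, hence "non-linear"), prompt for the first empty slot, and fire the action once the schema is complete. The trip-booking schema and the `slot=value` mini-parser are invented stand-ins for real NLP extraction:

```python
# Hypothetical slot schema for a trip-booking conversation.
SLOTS = ["origin", "destination", "date"]

def extract_slots(text):
    """Stand-in for NLP slot extraction: parse 'slot=value' pairs."""
    found = {}
    for token in text.split():
        if "=" in token:
            name, value = token.split("=", 1)
            if name in SLOTS:
                found[name] = value
    return found

def next_prompt(filled):
    """Prompt for the first empty slot, or signal the action when all are filled."""
    for slot in SLOTS:
        if slot not in filled:
            return f"Please provide {slot}"
    return "ACTION: book trip"

# The user fills slots out of order; the system prompts only for what is missing.
filled = extract_slots("destination=Paris date=2021-06-01")
prompt = next_prompt(filled)            # asks for the origin
filled.update(extract_slots("origin=Boston"))
done = next_prompt(filled)              # all slots filled: initiate the action
```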
  • Patent number: 10990902
    Abstract: A method, system, and computer program product for learning a recognition model for recognition processing. The method includes preparing one or more examples for learning, each of which includes an input segment, an additional segment adjacent to the input segment and an assigned label. The input segment and the additional segment are extracted from an original training data. A classification model is trained, using the input segment and the additional segment in the examples, to initialize parameters of the classification model so that extended segments including the input segment and the additional segment are reconstructed from the input segment. Then, the classification model is tuned to predict a target label, using the input segment and the assigned label in the examples, based on the initialized parameters. At least a portion of the obtained classification model is included in the recognition model.
    Type: Grant
    Filed: September 25, 2019
    Date of Patent: April 27, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: Gakuto Kurata
  • Patent number: 10984788
    Abstract: An automatic speech recognition (ASR) system includes at least one processor and a memory storing instructions.
    Type: Grant
    Filed: March 29, 2018
    Date of Patent: April 20, 2021
    Assignee: BlackBerry Limited
    Inventor: Darrin Kenneth John Fry
  • Patent number: 10984801
    Abstract: AM and LM parameters to be used for adapting an ASR model are derived for each audio segment of an audio stream comprising multiple audio programs. A set of identifiers, including a speaker identifier, a speaker domain identifier and a program domain identifier, is obtained for each audio segment. The set of identifiers are used to select most suitable AM and LM parameters for the particular audio segment. The embodiments enable provision of maximum constraints on the AMs and LMs and enable adaptation of the ASR model on the fly for audio streams of multiple audio programs, such as broadcast audio. This means that the embodiments enable selecting AM and LM parameters that are most suitable in terms of ASR performance for each audio segment.
    Type: Grant
    Filed: May 8, 2017
    Date of Patent: April 20, 2021
    Assignee: Telefonaktiebolaget LM Ericsson (publ)
    Inventors: Volodya Grancharov, Erlendur Karlsson, Sigurdur Sverrisson, Maxim Teslenko, Konstantinos Vandikas, Aneta Vulgarakis Feljan
  • Patent number: 10978048
    Abstract: An apparatus comprising one or more processors, a communication circuit, and a memory for storing instructions, which when executed, performs a method of recognizing a user utterance. The method comprises: receiving first data associated with a user utterance, performing a first determination to determine whether the user utterance includes the first data and a specified word, performing a second determination to determine whether the first data includes the specified word, transmitting the first data to an external server, receiving a text generated from the first data by the external server, performing a third determination to determine whether the received text matches the specified word, and determining whether to activate the voice-based input system based on the third determination.
    Type: Grant
    Filed: May 23, 2018
    Date of Patent: April 13, 2021
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Tae Jin Lee, Young Woo Lee, Seok Yeong Jung, Chakladar Subhojit, Jae Hoon Jeong, Jun Hui Kim, Jae Geun Lee, Hyun Woong Lim, Soo Min Kang, Eun Hye Shin, Seong Min Je
  • Patent number: 10978192
    Abstract: Techniques for documenting a clinical procedure involve transcribing audio data comprising audio of one or more clinical personnel speaking while performing the clinical procedure. Examples of applicable clinical procedures include sterile procedures such as surgical procedures, as well as non-sterile procedures such as those conventionally involving a core code reporter. The transcribed audio data may be analyzed to identify relevant information for documenting the clinical procedure, and a text report including the relevant information documenting the clinical procedure may be automatically generated.
    Type: Grant
    Filed: January 22, 2019
    Date of Patent: April 13, 2021
    Assignee: Nuance Communications, Inc.
    Inventor: Mariana Casella dos Santos
  • Patent number: 10971160
    Abstract: A user device (e.g., voice assistant device, voice enabled device, smart device, computing device, etc.) may receive/detect audio content (e.g., speech, etc.) that includes a wake word and/or words similar to a wake word. The user device may require a wake word, a portion of the wake word, or words similar to the wake word to be detected prior to interacting with a user. The user device may, based on characteristics of the audio content, determine if the audio content originates from an authorized user. The user device may decrease and/or increase scrutiny applied to wake word detection based on whether audio content originates from an authorized user.
    Type: Grant
    Filed: November 13, 2018
    Date of Patent: April 6, 2021
    Assignee: Comcast Cable Communications, LLC
    Inventors: Hans Sayyadi, Nima Bina
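The variable-scrutiny idea above reduces to shifting the detection threshold by speaker identity: an authorized user's borderline utterance wakes the device, while the same audio from an unknown source does not. The base threshold and the adjustment delta below are illustrative values, not from the patent:

```python
BASE_THRESHOLD = 0.7  # hypothetical baseline wake-word confidence threshold

def detection_threshold(is_authorized, delta=0.15):
    """Authorized speakers get less scrutiny (a lower threshold);
    unrecognized audio gets more (a higher threshold)."""
    return BASE_THRESHOLD - delta if is_authorized else BASE_THRESHOLD + delta

def wake_word_detected(score, is_authorized):
    """Compare the detector's confidence score to the adjusted threshold."""
    return score >= detection_threshold(is_authorized)

# A borderline score of 0.75 wakes the device only for an authorized user.
from_owner = wake_word_detected(0.75, is_authorized=True)
from_tv = wake_word_detected(0.75, is_authorized=False)
```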
  • Patent number: 10957330
    Abstract: Systems and methods for control of vehicles are provided. A computer-implemented method in example embodiments may include receiving, at a computing system comprising one or more processors positioned in a vehicle, voice data from one or more audio sensors positioned in the vehicle. The system can determine whether configuration of a reference voiceprint for a speech processing system of the vehicle is authorized based at least in part on performance data associated with the vehicle. In response to determining that configuration of the reference voiceprint is authorized, a first reference voiceprint based on the received voice data can be stored and the speech processing system configured to authenticate input voice data for a first set of voice commands based on the reference voiceprint.
    Type: Grant
    Filed: May 31, 2019
    Date of Patent: March 23, 2021
    Assignee: GE Aviation Systems Limited
    Inventors: Stefan Alexander Schwindt, Barry Foye
  • Patent number: 10957446
    Abstract: Systems, methods, and computer readable storage medium for providing a genericized medical device architecture common to a plurality of medical devices are disclosed. The architecture may comprise at least one diagnostics module associated with at least one of the plurality of medical devices, wherein the at least one diagnostics module is configured to monitor an operational status of the at least one medical device. At least one hardware abstraction layer may be associated with at least one of the plurality of medical devices, and may be configured to provide abstracted access to hardware of the at least one medical device.
    Type: Grant
    Filed: August 8, 2016
    Date of Patent: March 23, 2021
    Assignee: Johnson & Johnson Surgical Vision, Inc.
    Inventors: Hou Man Chong, Edith W. Fung, Timothy L. Hunter, Deep K. Mehta
  • Patent number: 10950240
    Abstract: There is provided an information processing device and an information processing method that enable a desired voice recognition result to be easily obtained. The information processing device includes a presentation control unit that controls the separation of a voice recognition result at the time of presentation, on the basis of context relating to the voice recognition. The present technology can be applied, for example, to an information processing device that independently performs voice recognition, a server that performs voice recognition in response to a call from a client and transmits the recognition result to the client, or the client that requests voice recognition from the server, receives the recognition result from the server, and presents the recognition result.
    Type: Grant
    Filed: August 14, 2017
    Date of Patent: March 16, 2021
    Assignee: SONY CORPORATION
    Inventors: Yuhei Taki, Shinichi Kawano
  • Patent number: 10949283
    Abstract: A computer-implemented method is presented for detecting anomalies in dynamic datasets generated in a cloud computing environment. The method includes monitoring a plurality of cloud servers receiving a plurality of data points, employing a two-level clustering training module to generate micro-clusters from the plurality of data points, each of the micro-clusters representing a set of original data from the plurality of data points, employing a detecting module to detect normal data points, abnormal data points, and unknown data points from the plurality of data points via a detection model, employing an evolving module using a different evolving mechanism for each of the normal, abnormal, and unknown data points to evolve the detection model, and generating a system report displayed on a user interface, the system report summarizing the micro-cluster information.
    Type: Grant
    Filed: November 6, 2018
    Date of Patent: March 16, 2021
    Assignee: International Business Machines Corporation
    Inventors: Jia Wei Yang, Fan Jing Meng
  • Patent number: 10943606
    Abstract: Detecting the end-point of a user's voice command or utterance with high accuracy is critical in an automatic speech recognition (ASR)-based human-machine interface. If an ASR system incorrectly detects an end-point of utterance and transmits this incomplete sentence to other processing blocks for further processing, it is likely the processed result would lead to incorrect interpretation. A method includes selecting a first semantic network based on context of the audio signal and more accurately detecting the end-point of the user's utterance included in the audio signal based on the first semantic network and also based on at least one timeout threshold associated with the first semantic network.
    Type: Grant
    Filed: April 12, 2018
    Date of Patent: March 9, 2021
    Assignee: QUALCOMM Incorporated
    Inventors: Paras Surendra Doshi, Ayush Agarwal, Shri Prakash
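The context-dependent end-pointing above can be sketched as choosing a timeout threshold per context and declaring the end of the utterance once trailing silence exceeds it. The context-to-timeout table stands in for the patent's semantic networks, and all timing values are invented:

```python
# Hypothetical timeout thresholds (ms) per conversational context.
TIMEOUTS_MS = {"navigation": 800, "dictation": 1500, "default": 1000}

def endpoint_reached(context, trailing_silence_ms, sentence_complete):
    """End-point when silence exceeds the context timeout; a hypothesis that
    already forms a complete sentence is end-pointed on a shorter timeout."""
    timeout = TIMEOUTS_MS.get(context, TIMEOUTS_MS["default"])
    if sentence_complete:
        timeout //= 2  # complete sentences need less trailing silence
    return trailing_silence_ms >= timeout

# Dictation tolerates long pauses mid-sentence; a finished navigation
# command is end-pointed quickly.
mid_dictation = endpoint_reached("dictation", 1000, sentence_complete=False)
done_command = endpoint_reached("navigation", 500, sentence_complete=True)
```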
  • Patent number: 10944767
    Abstract: Mechanisms are provided for training a classifier to identify adversarial input data. A neural network processes original input data representing a plurality of non-adversarial original input data and mean output learning logic determines a mean response for each intermediate layer of the neural network based on results of processing the original input data. The neural network processes adversarial input data and layer-wise comparison logic compares, for each intermediate layer of the neural network, a response generated by the intermediate layer based on processing the adversarial input data, to the mean response associated with the intermediate layer, to thereby generate a distance metric for the intermediate layer. The layer-wise comparison logic generates a vector output based on the distance metrics that is used to train a classifier to identify adversarial input data based on responses generated by intermediate layers of the neural network.
    Type: Grant
    Filed: February 1, 2018
    Date of Patent: March 9, 2021
    Assignee: International Business Machines Corporation
    Inventors: Gaurav Goswami, Sharathchandra Pankanti, Nalini K. Ratha, Richa Singh, Mayank Vatsa
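The layer-wise comparison above boils down to two steps: record a mean activation per intermediate layer over clean (non-adversarial) data, then describe a new input by its distance from each layer's mean. The resulting vector is what trains the downstream classifier. A minimal numeric sketch with toy two-layer activations:

```python
import math

def mean_responses(per_layer_activations):
    """Mean activation vector for each layer over the clean dataset.
    per_layer_activations[l] is a list of activation vectors for layer l."""
    means = []
    for layer in per_layer_activations:
        n = len(layer)
        means.append([sum(v[i] for v in layer) / n for i in range(len(layer[0]))])
    return means

def distance_vector(sample_layers, means):
    """Euclidean distance to the clean mean, one entry per layer."""
    return [
        math.sqrt(sum((a - m) ** 2 for a, m in zip(layer, mean)))
        for layer, mean in zip(sample_layers, means)
    ]

clean = [[[0.0, 0.0], [2.0, 0.0]],   # layer 1 activations on clean inputs
         [[1.0, 1.0], [1.0, 3.0]]]   # layer 2 activations on clean inputs
means = mean_responses(clean)                          # [[1.0, 0.0], [1.0, 2.0]]
d = distance_vector([[4.0, 4.0], [1.0, 2.0]], means)   # far at layer 1 only
```

A large entry in `d` at some layer signals that the input drives that layer far from its normal operating region, which is the cue the trained classifier uses to flag adversarial inputs.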
  • Patent number: 10936641
    Abstract: A faster and more streamlined system for providing summary and analysis of large amounts of communication data is described. System and methods are disclosed that employ an ontology to automatically summarize communication data and present the summary to the user in a form that does not require the user to listen to the communication data. In one embodiment, the summary is presented as written snippets, or short fragments, of relevant communication data that capture the meaning of the data relating to a search performed by the user. Such snippets may be based on theme and meaning unit identification.
    Type: Grant
    Filed: May 21, 2018
    Date of Patent: March 2, 2021
    Assignee: VERINT SYSTEMS LTD.
    Inventors: Roni Romano, Galia Zacay, Rahm Fehr
  • Patent number: 10930287
    Abstract: In some embodiments, an exemplary inventive system for improving computer speed and accuracy of automatic speech transcription includes at least components of: a computer processor configured to perform: generating a recognition model specification for a plurality of distinct speech-to-text transcription engines; where each distinct speech-to-text transcription engine corresponds to a respective distinct speech recognition model; receiving at least one audio recording representing a speech of a person; segmenting the audio recording into a plurality of audio segments; determining a respective distinct speech-to-text transcription engine to transcribe a respective audio segment; receiving, from the respective transcription engine, a hypothesis for the respective audio segment; accepting the hypothesis to remove a need to submit the respective audio segment to another distinct speech-to-text transcription engine, resulting in the improved computer speed and the accuracy of automatic speech transcription; and ge
    Type: Grant
    Filed: December 3, 2018
    Date of Patent: February 23, 2021
    Inventors: Tejas Shastry, Matthew Goldey, Svyat Vergun
  • Patent number: 10930271
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using neural networks. A feature vector that models audio characteristics of a portion of an utterance is received. Data indicative of latent variables of multivariate factor analysis is received. The feature vector and the data indicative of the latent variables is provided as input to a neural network. A candidate transcription for the utterance is determined based on at least an output of the neural network.
    Type: Grant
    Filed: September 17, 2019
    Date of Patent: February 23, 2021
    Inventors: Andrew W. Senior, Ignacio Lopez Moreno
  • Patent number: 10916242
    Abstract: The present invention relates to the field of intelligent recognition, and discloses an intent recognition method based on a deep learning network, addressing the technical problem that the accuracy of intent recognition is low.
    Type: Grant
    Filed: March 26, 2020
    Date of Patent: February 9, 2021
    Assignee: NANJING SILICON INTELLIGENCE TECHNOLOGY CO., LTD.
    Inventors: Huapeng Sima, Ao Yao
  • Patent number: 10908873
    Abstract: A system and method for confirming a voice command of a media playback device is disclosed. The method includes receiving an instruction of a voice command and producing an audio confirmation of the command. A confirmation may be playing a media context item associated with the command, playing a verbal confirmation phrase, or playing a non-verbal audio cue.
    Type: Grant
    Filed: May 7, 2018
    Date of Patent: February 2, 2021
    Assignee: Spotify AB
    Inventors: Emma-Camelia Gosu, Daniel Bromand, Karl Humphreys
  • Patent number: 10892996
    Abstract: Systems and processes for operating an intelligent automated assistant are provided. In one example process, an event associated with an audio input is detected with a first process. In accordance with a detection of the event, a delay value associated with an electronic device is determined. The delay value corresponds to a time required to determine, with a second process, whether the audio input includes a spoken trigger. In accordance with a determination that the delay value exceeds a threshold, the delay value is broadcast during a first advertising session, and determination is made, during a second advertising session, whether the electronic device is to respond to the audio input. In accordance with a determination that the threshold is not exceeded, a determination is made, during the first advertising session, whether the electronic device is to respond to the audio input or wait for the second advertising session.
    Type: Grant
    Filed: August 31, 2018
    Date of Patent: January 12, 2021
    Assignee: Apple Inc.
    Inventor: Kurt Piersol
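The arbitration scheme above can be sketched as a two-session protocol: a device whose trigger-detection delay exceeds the threshold broadcasts its delay and defers the decision, while a fast device decides in the first advertising session; in the second session, the device with the smallest delay responds. The threshold and delay figures are illustrative:

```python
THRESHOLD_MS = 100  # hypothetical delay threshold

def first_session_action(delay_ms):
    """Decide now if fast enough; otherwise advertise the delay and wait
    for the second advertising session."""
    if delay_ms > THRESHOLD_MS:
        return ("broadcast_delay", delay_ms)
    return ("decide_now", delay_ms)

def elect_responder(delays_by_device):
    """Second session: the device with the smallest delay responds."""
    return min(delays_by_device, key=delays_by_device.get)

delays = {"watch": 250, "phone": 40}
actions = {device: first_session_action(ms) for device, ms in delays.items()}
responder = elect_responder(delays)  # the phone, having the smallest delay
```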
  • Patent number: 10885918
    Abstract: A system, method and computer program is provided for generating customized text representations of audio commands. A first speech recognition module may be used for generating a first text representation of an audio command based on a general language grammar. A second speech recognition module may be used for generating a second text representation of the audio command, the second module including a custom language grammar that may include contacts for a particular user. Entity extraction is applied to the second text representation and the entities are checked against a file containing user-specific language. If the entities are found in the user-specific language, the two text representations may be fused into a combined text representation and named entity recognition may be performed again to extract further entities.
    Type: Grant
    Filed: September 18, 2014
    Date of Patent: January 5, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Wilson Hsu, Kaheer Suleman, Joshua Pantony
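A crude way to picture the fusion step above: wherever the custom-grammar engine produced a word found in the user's personal vocabulary (e.g., a contact name), prefer it over the general engine's guess. The word-aligned transcripts, contact list, and substitution rule below are invented for illustration; the patent's fusion is more general:

```python
def fuse(general_words, custom_words, personal_vocab):
    """Prefer the custom engine's word wherever it found personal language,
    assuming the two transcriptions are word-aligned."""
    fused = []
    for g, c in zip(general_words, custom_words):
        fused.append(c if c in personal_vocab else g)
    return " ".join(fused)

contacts = {"Kaheer"}                      # hypothetical user contact list
general = "call car here now".split()      # general grammar mishears the name
custom = "call Kaheer here now".split()    # custom grammar knows the contact
combined = fuse(general, custom, contacts)  # "call Kaheer here now"
```

Named entity recognition would then run again on `combined` to extract the recognized contact as an entity.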
  • Patent number: 10887462
    Abstract: A computing system, method and non-transitory computer readable memory are provided, to assist an agent during a client interaction between the agent and a client over a communications channel. An agent station may generate a graphic user interface (GUI) of the client interaction during the client interaction, the GUI displaying a currently identified keyword and one or more interaction phases, each interaction phase having a respective current phase score for the client interaction. A keyword and associated keyword information from the client interaction may be received, including phase and corresponding phase score information, and the GUI updated with the currently identified keyword and newly received phase information accounting for the received corresponding phase score information. A situation report may be generated for a designated party, the situation report including an agent identification, and client interaction information including comments entered by the agent relating to the client interaction.
    Type: Grant
    Filed: April 9, 2019
    Date of Patent: January 5, 2021
    Assignee: West Corporation
    Inventors: Daniel A. Coyer, Ryan L. Techlin, Jeremy T. Tellock, Dennis C. White, Shelley A. Wildenberg
  • Patent number: 10872598
    Abstract: Embodiments of a production-quality text-to-speech (TTS) system constructed from deep neural networks are described. System embodiments comprise five major building blocks: a segmentation model for locating phoneme boundaries, a grapheme-to-phoneme conversion model, a phoneme duration prediction model, a fundamental frequency prediction model, and an audio synthesis model. For embodiments of the segmentation model, phoneme boundary detection was performed with deep neural networks using Connectionist Temporal Classification (CTC) loss. For embodiments of the audio synthesis model, a variant of WaveNet was created that requires fewer parameters and trains faster than the original. By using a neural network for each component, system embodiments are simpler and more flexible than traditional TTS systems, where each component requires laborious feature engineering and extensive domain expertise. Inference with system embodiments may be performed faster than real time.
    Type: Grant
    Filed: January 29, 2018
    Date of Patent: December 22, 2020
    Assignee: Baidu USA LLC
    Inventors: Sercan O. Arik, Mike Chrzanowski, Adam Coates, Gregory Diamos, Andrew Gibiansky, John Miller, Andrew Ng, Jonathan Raiman, Shubhahrata Sengupta, Mohammad Shoeybi
  • Patent number: 10867595
    Abstract: Described herein are systems and methods for generating natural language sentences with sequence-to-sequence (Seq2Seq) models with attention. The Seq2Seq models may be implemented in applications such as machine translation, image captioning, and speech recognition. Performance has further been improved by leveraging unlabeled data, often in the form of a language model. Disclosed herein are “Cold Fusion” architecture embodiments that leverage a pre-trained language model during training. The Seq2Seq models with Cold Fusion embodiments are able to better utilize language information, enjoying faster convergence, better generalization, and almost complete transfer to a new domain while using less labeled training data.
    Type: Grant
    Filed: March 6, 2018
    Date of Patent: December 15, 2020
    Assignee: Baidu USA LLC
    Inventors: Anuroop Sriram, Heewoo Jun, Sanjeev Satheesh, Adam Coates
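The core Cold Fusion idea, gating a pre-trained language model's features into the decoder state during training, can be sketched in a few lines. The weight shapes, the sigmoid gate, and the concatenation layout below are illustrative assumptions for the sketch, not the patented architecture.

```python
import math
import random

random.seed(0)
d = 4  # hidden size (arbitrary for this sketch)

# Hypothetical frozen LM hidden state and Seq2Seq decoder state.
h_lm = [random.gauss(0, 1) for _ in range(d)]
s_dec = [random.gauss(0, 1) for _ in range(d)]

# Fusion-gate weights (these would be learned jointly with the decoder).
W_gate = [[random.gauss(0, 1) for _ in range(2 * d)] for _ in range(d)]

def cold_fusion_state(s, h):
    """Gate the LM features, then concatenate them with the decoder state."""
    x = s + h  # concatenation of decoder state and LM features
    gate = [1 / (1 + math.exp(-sum(w * v for w, v in zip(row, x))))
            for row in W_gate]
    return s + [g * hv for g, hv in zip(gate, h)]  # fed to the output layer

fused = cold_fusion_state(s_dec, h_lm)
print(len(fused))  # 8: decoder state plus gated LM features
```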
  • Patent number: 10867600
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for hotword trigger suppression are disclosed. In one aspect, a method includes the actions of receiving, by a microphone of a computing device, audio corresponding to playback of an item of media content, the audio including an utterance of a predefined hotword that is associated with performing an operation on the computing device. The actions further include processing the audio. The actions further include in response to processing the audio, suppressing performance of the operation on the computing device.
    Type: Grant
    Filed: October 31, 2017
    Date of Patent: December 15, 2020
    Assignee: Google LLC
    Inventors: Alexander H. Gruenstein, Johan Schalkwyk, Matthew Sharifi
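The suppression decision amounts to recognizing that the captured audio is media playback rather than a live utterance, and skipping the operation if so. A toy sketch, with the audio processing reduced to exact fingerprint set membership (an assumption for illustration; the patent's actual audio processing is not specified in the abstract):

```python
def should_suppress(hotword_detected, audio_fingerprint, media_fingerprints):
    """Suppress the hotword-triggered operation when the captured audio's
    fingerprint matches known media content."""
    return hotword_detected and audio_fingerprint in media_fingerprints

# Hypothetical fingerprints of known broadcast content.
known_media = {"ad-campaign-spot-3"}
print(should_suppress(True, "ad-campaign-spot-3", known_media))  # True
print(should_suppress(True, "live-room-audio", known_media))     # False
```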
  • Patent number: 10860333
    Abstract: Embodiments of the present disclosure seek to mitigate the timing issues of prior approaches by performing the NVMe device reset and post-reset re-initialization in parallel. In embodiments, the NVMe device reset and re-initialization operations are logically divided into front-end and back-end operations that may be carried out in parallel. Upon receipt of a reset command from the host, the NVMe device carries out front-end reset operations and, in parallel, performs back-end re-initialization operations. Once the front-end reset operations are complete, or after a predetermined period of time, the NVMe device reports to the host that the device reset is complete while back-end operations continue. Once all reset and re-initialization operations are complete, the NVMe device may resume conducting I/O instructions from the host.
    Type: Grant
    Filed: October 14, 2019
    Date of Patent: December 8, 2020
    Assignee: WESTERN DIGITAL TECHNOLOGIES, INC.
    Inventor: Shay Benisty
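The timing relationship in the abstract, reporting reset completion once the front-end finishes while back-end re-initialization continues, can be illustrated with two threads. The sleep durations and event names are invented for the sketch; a real NVMe controller would implement this in firmware, not host-side Python.

```python
import threading
import time

def front_end_reset(done):
    time.sleep(0.01)   # stand-in for aborting commands and clearing queues
    done.set()         # after this, the device may report reset completion

def back_end_reinit(done):
    time.sleep(0.3)    # stand-in for slower internal re-initialization
    done.set()

front_done, back_done = threading.Event(), threading.Event()
threads = [
    threading.Thread(target=front_end_reset, args=(front_done,)),
    threading.Thread(target=back_end_reinit, args=(back_done,)),
]
for t in threads:
    t.start()

front_done.wait()                        # host sees "reset complete" here...
reported_early = not back_done.is_set()  # ...while back-end work continues
for t in threads:
    t.join()
print(reported_early)  # True: completion was reported before reinit finished
```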
  • Patent number: 10860801
    Abstract: A method includes extracting a keyword and a slot from a natural language input, where the slot includes information. The method includes determining whether the keyword corresponds to one of a plurality of formation groups. In response to determining that the keyword corresponds to a specific formation group, the method includes updating metadata of the specific formation group with the information of the slot. In response to determining that the keyword does not correspond to any of the formation groups, the method includes determining whether the keyword corresponds to one of a plurality of clusters. In response to determining that the keyword corresponds to a specific cluster, the method includes updating the specific cluster with the information of the slot. In response to determining that the keyword does not correspond to any of the clusters, the method includes creating an additional formation group that includes the keyword and the slot.
    Type: Grant
    Filed: January 15, 2019
    Date of Patent: December 8, 2020
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Anil Yadav, Melvin Lobo, Chutian Wang
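The abstract's decision cascade (matching formation group, else matching cluster, else a new formation group) maps directly onto a few conditionals. The dict-based data structures below are assumptions chosen for brevity, not the patented representation.

```python
def route_keyword(keyword, slot, formation_groups, clusters):
    """Decision cascade: update a matching formation group, else a
    matching cluster, else create a new formation group."""
    if keyword in formation_groups:
        formation_groups[keyword].update(slot)
    elif keyword in clusters:
        clusters[keyword].update(slot)
    else:
        formation_groups[keyword] = dict(slot)

groups = {"trip": {"city": "Oslo"}}
clusters = {"food": {}}
route_keyword("trip", {"date": "Friday"}, groups, clusters)  # updates group metadata
route_keyword("food", {"dish": "pasta"}, groups, clusters)   # updates the cluster
route_keyword("hike", {"trail": "ridge"}, groups, clusters)  # new formation group
print(groups)
```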
  • Patent number: 10854193
    Abstract: Methods, apparatuses, devices and computer-readable storage media for real-time speech recognition are provided. The method includes: based on an input speech signal, obtaining truncating information for truncating a sequence of features of the speech signal; based on the truncating information, truncating the sequence of features into a plurality of subsequences; and for each subsequence in the plurality of subsequences, obtaining a real-time recognition result through an attention mechanism.
    Type: Grant
    Filed: February 6, 2019
    Date of Patent: December 1, 2020
    Assignee: BAIDU ONLINE NETWORK TECHNOLOGY (BEIJING) CO., LTD.
    Inventors: Xiaoyin Fu, Jinfeng Bai, Zhijie Chen, Mingxin Liang, Xu Chen, Lei Jia
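The truncate-then-attend flow above can be sketched as splitting the feature sequence at the truncating positions and emitting one partial result per subsequence. The recognizer is abstracted into a callable; the real system's attention model and truncation criteria are not specified in the abstract.

```python
def truncate_features(features, boundaries):
    """Split the feature sequence at the given truncating positions."""
    subsequences, start = [], 0
    for b in boundaries:
        subsequences.append(features[start:b])
        start = b
    subsequences.append(features[start:])
    return [s for s in subsequences if s]

def recognize_streaming(features, boundaries, attend):
    """Emit one partial recognition result per subsequence, in order."""
    return [attend(sub) for sub in truncate_features(features, boundaries)]

features = list(range(10))  # ten stand-in feature frames
partials = recognize_streaming(features, [3, 7],
                               lambda s: f"<{len(s)} frames>")
print(partials)  # ['<3 frames>', '<4 frames>', '<3 frames>']
```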
  • Patent number: 10853584
    Abstract: Methods, apparatuses, and computer program products are described herein that are configured to express a time in an output text. In some example embodiments, a method is provided that comprises identifying a time period to be described linguistically in an output text. The method of this embodiment may also include identifying a communicative context for the output text. The method of this embodiment may also include determining one or more temporal reference frames that are applicable to the time period and a domain defined by the communicative context. The method of this embodiment may also include generating a phrase specification that linguistically describes the time period based on a descriptor that is defined by a temporal reference frame of the one or more temporal reference frames. In some examples, the descriptor specifies a time window that is inclusive of at least a portion of the time period to be described linguistically.
    Type: Grant
    Filed: April 19, 2019
    Date of Patent: December 1, 2020
    Assignee: ARRIA DATA2TEXT LIMITED
    Inventors: Gowri Somayajulu Sripada, Neil Burnett
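The descriptor-selection step can be sketched as scanning applicable reference frames for a descriptor whose time window overlaps the period. The day-part frame and overlap test below are invented examples; the patent's frames and phrase specifications are richer than this.

```python
def describe_time(start_hour, end_hour, reference_frames):
    """Return a descriptor whose window covers at least part of the time
    period; fall back to an explicit range if no frame applies."""
    for frame in reference_frames:
        for descriptor, (w_start, w_end) in frame.items():
            if w_start <= end_hour and start_hour <= w_end:
                return descriptor  # window inclusive of part of the period
    return f"from {start_hour}:00 to {end_hour}:00"

# Hypothetical temporal reference frame: parts of the day.
day_parts = {"morning": (6, 12), "afternoon": (12, 18), "evening": (18, 23)}
print(describe_time(7, 9, [day_parts]))  # "morning"
print(describe_time(2, 4, [day_parts]))  # falls back to the explicit range
```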
  • Patent number: 10852720
    Abstract: Embodiments are disclosed for an example vehicle or driver assistance system for a vehicle. The example vehicle or driver assistance system includes an in-vehicle computing system of a vehicle, the in-vehicle computing system comprising an external device interface communicatively connecting the in-vehicle computing system to a mobile device, an inter-vehicle system communication module communicatively connecting the in-vehicle computing system to one or more vehicle systems of the vehicle, a processor, and a storage device storing instructions executable by the processor to receive a first command from the mobile device via the external device interface and perform a series of actions on the vehicle system until receiving a second command from the mobile device, both the first command and the second command being recognized by the mobile device based on one or more of voice commands issued by a user of the mobile device and biometric analysis.
    Type: Grant
    Filed: February 10, 2016
    Date of Patent: December 1, 2020
    Assignee: HARMAN INTERNATIONAL INDUSTRIES, INCORPORATED
    Inventor: Yogesh Devidas Dusane
  • Patent number: 10846522
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating predictions for whether a target person is speaking during a portion of a video. In one aspect, a method includes obtaining one or more images which each depict a mouth of a given person at a respective time point. The images are processed using an image embedding neural network to generate a latent representation of the images. Audio data corresponding to the images is processed using an audio embedding neural network to generate a latent representation of the audio data. The latent representation of the images and the latent representation of the audio data are processed using a recurrent neural network to generate a prediction for whether the given person is speaking.
    Type: Grant
    Filed: October 16, 2018
    Date of Patent: November 24, 2020
    Assignee: Google LLC
    Inventors: Sourish Chaudhuri, Ondrej Klejch, Joseph Edward Roth
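The embed-fuse-recur structure of the abstract can be sketched with toy linear maps standing in for the trained embedding networks and recurrent cell. Every weight and dimension below is an arbitrary assumption for illustration.

```python
import math
import random

random.seed(1)
D = 3  # embedding size for the sketch

def rand_mat(rows, cols):
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

# Hypothetical frozen embedding networks reduced to single linear maps.
W_img, W_aud = rand_mat(D, 6), rand_mat(D, 4)
W_rnn = rand_mat(D, 3 * D)  # recurrent cell over [z_img; z_aud; h]
w_out = [random.gauss(0, 1) for _ in range(D)]

def speaking_probability(image_feats, audio_feats):
    """Embed each timestep's image and audio, fuse them through a toy
    recurrent cell, and squash the final state to a probability."""
    h = [0.0] * D
    for img, aud in zip(image_feats, audio_feats):
        z = matvec(W_img, img) + matvec(W_aud, aud) + h  # concatenated input
        h = [math.tanh(v) for v in matvec(W_rnn, z)]
    logit = sum(w * v for w, v in zip(w_out, h))
    return 1 / (1 + math.exp(-logit))

frames = [[random.gauss(0, 1) for _ in range(6)] for _ in range(5)]
audio = [[random.gauss(0, 1) for _ in range(4)] for _ in range(5)]
p = speaking_probability(frames, audio)
print(0.0 < p < 1.0)  # True: output is a valid probability
```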
  • Patent number: 10847162
    Abstract: Multi-modal speech localization is achieved using image data captured by one or more cameras and audio data captured by a microphone array. Audio data captured by each microphone of the array is transformed to obtain a frequency domain representation that is discretized in a plurality of frequency intervals. Image data captured by each camera is used to determine a positioning of each human face. Input data is provided to a previously trained audio source localization classifier, including the frequency domain representation of the audio data captured by each microphone and the positioning of each human face captured by each camera, in which the positioning of each human face represents a candidate audio source. Based on the input data, the classifier indicates an identified audio source that is estimated to be the human face from which the audio data originated.
    Type: Grant
    Filed: June 27, 2018
    Date of Patent: November 24, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Eyal Krupka, Xiong Xiao
  • Patent number: 10846699
    Abstract: Embodiments of the invention are directed to systems and methods for biometrics transaction processing. A location of a device associated with a user may be determined. A reference to a biometric data model associated with the user stored within a database may be retrieved, based at least in part on the location. Biometric data may be received from the user. Using the reference, the biometric data may be compared to the biometric data model stored within the database. A determination may be made whether the user is authenticated for the transaction based on the comparing step.
    Type: Grant
    Filed: October 5, 2018
    Date of Patent: November 24, 2020
    Assignee: Visa International Service Association
    Inventors: John F. Sheets, Kim R. Wagner, Mark A. Nelsen
  • Patent number: 10847176
    Abstract: A computer-implemented method includes receiving, at a microphone of a voice-controlled device, a speech input; generating an electrical signal having a first gain level that is below a gain threshold for audible detection by a user; transmitting the electrical signal to a speaker; and detecting, by the microphone, an audio signal that includes a combination of ambient noise and a probe audio signal, wherein the probe audio signal is output by the speaker based on the electrical signal. The method further includes determining a power level of the probe audio signal and determining a state of a display of the device based on the power level of the probe audio signal.
    Type: Grant
    Filed: March 12, 2018
    Date of Patent: November 24, 2020
    Assignee: Amazon Technologies, Inc.
    Inventors: Trausti Thor Kristjansson, Srivatsan Kandadai, Mark Lawrence, Balsa Laban, Anna Chen Santos, Joseph Pedro Tavares, Miroslav Ristic, Valere Joseph Vanderschaegen
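The final step, inferring a state from the probe's measured power, reduces to a threshold test. The sketch below simplifies the abstract's power-level decision to a fixed dB margin over the noise floor; the margin, units, and the exact probe-to-display mapping are assumptions for illustration.

```python
def display_state_from_probe(mic_power_db, noise_floor_db, margin_db=3.0):
    """If the (inaudibly quiet) probe tone comes back from the speaker
    well above the noise floor, infer that the device path is active."""
    return "on" if mic_power_db - noise_floor_db > margin_db else "off"

print(display_state_from_probe(-40.0, -60.0))  # "on": probe clearly detected
print(display_state_from_probe(-59.0, -60.0))  # "off": probe lost in noise
```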
  • Patent number: 10847144
    Abstract: Differences are identified, at the lexical unit and/or phrase level, between time-varying corpora. A corpus for a time period of interest is compared with a reference corpus. N-grams are generated for both the corpus of interest and the reference corpus, and numbers of occurrences are counted. An average number of occurrences is determined for each n-gram of the reference corpus. A difference value, between the number of occurrences in the corpus of interest and the average number of occurrences, is determined and normalized. N-grams can be selected for display, or for further processing, on the basis of the normalized difference value. Further processing can include selecting a sample period. A plurality of reference corpora are produced, where the begin time for each sub-corpus of the plurality of reference corpora differs from the begin time for the corpus of interest by an integer multiple of the sample period. A Word Cloud visualization is shown.
    Type: Grant
    Filed: September 15, 2015
    Date of Patent: November 24, 2020
    Assignee: NetBase Solutions, Inc.
    Inventors: Jens Erik Tellefsen, Ranjeet Singh Bhatia
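The count-average-difference-normalize pipeline can be sketched directly. The normalization below (dividing by the larger of the two counts) is one possible choice made for the sketch; the patent's normalization scheme is not specified in the abstract.

```python
from collections import Counter

def ngrams(tokens, n=2):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def trending_scores(corpus_of_interest, reference_corpora, n=2):
    """Normalized difference between each n-gram's count in the period of
    interest and its average count across the reference corpora."""
    counts = Counter(ngrams(corpus_of_interest, n))
    ref_counts = [Counter(ngrams(ref, n)) for ref in reference_corpora]
    scores = {}
    for gram, c in counts.items():
        avg = sum(rc[gram] for rc in ref_counts) / len(ref_counts)
        scores[gram] = (c - avg) / max(c, avg, 1)  # one possible normalization
    return scores

interest = "new phone new phone launch".split()
references = ["old phone review".split(), "phone battery test".split()]
scores = trending_scores(interest, references)
print(scores[("new", "phone")])  # 1.0: occurs twice now, never in references
```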
  • Patent number: 10848544
    Abstract: The present disclosure relates to systems and processes for efficiently communicating mapping application data between electronic devices. In one example, a first electronic device can act as a proxy between a second electronic device and a map server by receiving a first request for map data from the second user device, determining a set of supplemental data to add to the first request to generate a complete second request for map data, and transmitting the second request to a map server. The first electronic device can receive the requested map data from the map server and transmit the received map data to the second electronic device. In another example, the first electronic device can act as a navigation server for the second electronic device by initially transmitting a full set of route data to the second electronic device and subsequently transmitting route update messages to the second electronic device.
    Type: Grant
    Filed: September 1, 2015
    Date of Patent: November 24, 2020
    Assignee: Apple Inc.
    Inventors: Aroon Pahwa, Matthew B. Ball
  • Patent number: 10841649
    Abstract: Example apparatus disclosed herein include a return path data classifier to classify a first viewing period associated with segments of return path data received from a set top box into tuning classifications based on the segments of the return path data; calculate a total reported tuning duration for the first viewing period when the first viewing period is classified as live or playback tuning; and compare the total reported tuning duration to a duration threshold to determine whether the segments of return path data associated with the first viewing period are valid. The example apparatus also includes a return path data rectifier to rectify missing tuning data associated with a second viewing period based on tuning data included in the segments of return path data associated with the first viewing period when the segments of the return path data associated with the first viewing period are determined to be valid.
    Type: Grant
    Filed: October 4, 2018
    Date of Patent: November 17, 2020
    Assignee: The Nielsen Company (US), LLC
    Inventors: Balachander Shankar, Jonathan Sullivan, Molly Poppie, John Charles Coughlin, Paul Chimenti, Rachel Worth Olson, Samantha M. Mowrer, David J. Kurzynski, Remy Spoentgen, Christine Heiss, Shuangxing Chen
  • Patent number: 10839158
    Abstract: A method includes detecting, by sensors, a current context associated with an electronic device. The method includes dynamically loading a neural network and selected features into a phrase-spotting audio front-end (AFE) processor. The neural network is configured, based on the current context, with at least one domain having an associated set of trigger words. The method includes detecting audio content that matches a trigger word from among the sets of trigger words associated with the at least one selected domain. The method includes, in response to detecting audio content that matches the trigger word, outputting a wake-up signal to an application processor (AP). The AFE processor utilizes fewer computational resources than the AP. The method includes, in response to receiving the wake-up signal, the AP waking up and performing additional computation based on the matching trigger word. The method includes outputting results of the additional computation to an output device.
    Type: Grant
    Filed: January 25, 2019
    Date of Patent: November 17, 2020
    Assignee: Motorola Mobility LLC
    Inventors: Zhengping Ji, Rachid Alameh, Michael E. Russell
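The context-conditioned domain loading and trigger spotting can be sketched as a filter over a domain table followed by a match scan. The domain names, context predicates, and trigger words below are invented for the example; a real AFE would run a loaded neural network and raise a hardware wake-up signal rather than return a tuple.

```python
def select_domains(context, domain_table):
    """Load only the trigger-word domains whose condition matches the
    sensed context (the context sensing itself is out of scope here)."""
    return {name: words for name, (cond, words) in domain_table.items()
            if cond(context)}

def spot_trigger(audio_words, active_domains):
    """Return (domain, trigger word) for the first match, else None."""
    for word in audio_words:
        for domain, triggers in active_domains.items():
            if word in triggers:
                return domain, word  # would signal the AP to wake up
    return None

domain_table = {
    "driving": (lambda c: c["in_vehicle"], {"navigate", "call"}),
    "home":    (lambda c: not c["in_vehicle"], {"lights", "music"}),
}
active = select_domains({"in_vehicle": True}, domain_table)
print(spot_trigger(["please", "navigate", "home"], active))
```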
  • Patent number: 10831824
    Abstract: A device includes a transceiver, a storage device, and a processor. The transceiver receives an audio segment from a remote device, receives a request to communicate the audio segment to another remote device, and communicates the audio segment to the other remote device in response to the request, the audio segment including at least one audio feature extracted from audio recorded by the device. The storage device stores the audio segment. The processor retrieves the audio segment from the storage device in response to the request to communicate the audio segment to the other remote device.
    Type: Grant
    Filed: July 1, 2019
    Date of Patent: November 10, 2020
    Assignee: Koye Corp.
    Inventors: Bosko Ilic, Vanja Jovicevic, Nemanja Zbiljic, Stefan Brajkovic
  • Patent number: 10832668
    Abstract: Techniques for dynamically maintaining speech processing data on a local device for frequently input commands are described. One or more devices receive speech processing data specific to one or more commands associated with system input frequencies satisfying an input frequency threshold. The device(s) then receives input audio corresponding to an utterance and generate input audio data corresponding thereto. The device(s) performs speech recognition processing on input audio data to generate input text data using a portion of the received speech processing data. The device(s) determines a probability score associated with the input text data and determines the probability score satisfies a threshold probability score. The device(s) then performs natural language processing on the input text data to determine the command using a portion of the speech processing data. The device(s) then outputs audio data responsive to the command.
    Type: Grant
    Filed: September 19, 2017
    Date of Patent: November 10, 2020
    Assignee: Amazon Technologies, Inc.
    Inventors: David William Devries, Rajesh Mittal
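The frequency-gated local cache described above can be sketched as a counter plus a threshold that controls which commands get local speech-processing data. The class name, threshold value, and stand-in "model" strings are invented for the example; the probability-score check from the abstract is omitted for brevity.

```python
class LocalSpeechCache:
    """Hold speech-processing data only for commands whose input frequency
    meets the threshold; other commands would fall back to a remote system."""

    def __init__(self, frequency_threshold):
        self.threshold = frequency_threshold
        self.frequencies = {}
        self.local_models = {}

    def observe(self, command):
        """Count an input; load stand-in local data once it is frequent enough."""
        self.frequencies[command] = self.frequencies.get(command, 0) + 1
        if self.frequencies[command] >= self.threshold:
            self.local_models[command] = f"asr+nlu data for {command!r}"

    def handles_locally(self, command):
        return command in self.local_models

cache = LocalSpeechCache(frequency_threshold=3)
for _ in range(3):
    cache.observe("play music")
cache.observe("weather")
print(cache.handles_locally("play music"),
      cache.handles_locally("weather"))  # True False
```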