Probability Patents (Class 704/240)

Reducing insertion errors in neural transducer-based automatic speech recognition

Patent number: 12367866

Abstract: Techniques for training a neural transducer-based automatic speech recognition model to be robust against background additive noise and thereby reducing insertion errors. In one aspect, a method of training an automatic speech recognition model includes: generating a modified training data set from an initial training dataset by concatenating one-word utterances with a preceding or a succeeding sentence in the initial training dataset based on a duration of silence between the one-word utterances and the preceding or the succeeding sentence; and training the automatic speech recognition model using the modified training data set.

Type: Grant

Filed: January 12, 2023

Date of Patent: July 22, 2025

Assignee: International Business Machines Corporation

Inventor: Takashi Fukuda
Using corrections, of automated assistant functions, for training of on-device machine learning models

Patent number: 12272360

Abstract: Processor(s) of a client device can: receive sensor data that captures environmental attributes of an environment of the client device; process the sensor data using a machine learning model to generate a predicted output that dictates whether one or more currently dormant automated assistant functions are activated; making a decision as to whether to trigger the one or more currently dormant automated assistant functions; subsequent to making the decision, determining that the decision was incorrect; and in response to determining that the determination was incorrect, generating a gradient based on comparing the predicted output to ground truth output. In some implementations, the generated gradient is used, by processor(s) of the client device, to update weights of the on-device speech recognition model. In some implementations, the generated gradient is additionally or alternatively transmitted to a remote system for use in remote updating of global weights of a global speech recognition model.

Type: Grant

Filed: May 7, 2024

Date of Patent: April 8, 2025

Assignee: GOOGLE LLC

Inventors: Françoise Beaufays, Rajiv Mathews, Dragan Zivkovic, Kurt Partridge, Andrew Hard
Multi-device output management

Patent number: 12249333

Abstract: A system is provided for modifying how an output is presented via a multi-device synchronous configuration based on detecting a speech characteristic in the user input. For example, if the user whispers a request, then the system may temporarily modify how the responsive output is presented to the user via multiple devices. In one example, the system may lower the volume on all devices presented the output. In another example, the system may present the output via a single device rather than multiple devices. The system may also determine to operate in an alternate output mode based on certain non-audio data.

Type: Grant

Filed: April 5, 2024

Date of Patent: March 11, 2025

Assignee: Amazon Technologies, Inc.

Inventor: Ezekiel Wade Sanborn de Asis
Language model biasing system

Patent number: 12183328

Abstract: Methods, systems, and apparatus for receiving audio data corresponding to a user utterance and context data, identifying an initial set of one or more n-grams from the context data, generating an expanded set of one or more n-grams based on the initial set of n-grams, adjusting a language model based at least on the expanded set of n-grams, determining one or more speech recognition candidates for at least a portion of the user utterance using the adjusted language model, adjusting a score for a particular speech recognition candidate determined to be included in the expanded set of n-grams, determining a transcription of user utterance that includes at least one of the one or more speech recognition candidates, and providing the transcription of the user utterance for output.

Type: Grant

Filed: May 16, 2023

Date of Patent: December 31, 2024

Assignee: Google LLC

Inventors: Petar Aleksic, Pedro J. Moreno Mengibar
Improving custom keyword spotting system accuracy with text-to-speech-based data augmentation

Patent number: 12159627

Abstract: The present disclosure provides methods and apparatus for optimizing a keyword spotting system. A set of utterance texts including a given keyword may be generated. A set of speech signals corresponding to the set of utterance texts may be synthesized. An acoustic model in the keyword spotting system may be optimized with at least a part of speech signals in the set of speech signals and utterance texts in the set of utterance texts corresponding to the at least a part of speech signals.

Type: Grant

Filed: June 12, 2020

Date of Patent: December 3, 2024

Assignee: Microsoft Technology Licensing, LLC

Inventors: Yao Tian, Yuija Xiao, Edward Lin, Lei He, Hui Zhu
Privacy-preserving social interaction measurement

Patent number: 12119019

Abstract: Various systems, devices, and methods for social interaction measurement that preserve privacy are presented. An audio signal can be captured using a microphone. The audio signal can be processed using an audio-based machine learning model that is trained to detect the presence of speech. The audio signal can be discarded such that content of the audio signal is not stored after the audio signal is processed using the machine learning model. An indication of whether speech is present within the audio signal can be output based at least in part on processing the audio signal using the audio-based machine learning model.

Type: Grant

Filed: January 18, 2022

Date of Patent: October 15, 2024

Assignee: Google LLC

Inventors: Julian Maclaren, Karolis Misiunas, Vahe Tshitoyan, Brian Foo, Kelly Dobson
Sound source localization with co-located sensor elements

Patent number: 12075210

Abstract: A system includes a plurality of acoustic sensor elements co-located with one another, each acoustic sensor element of the plurality of acoustic sensor elements being configured to generate a signal representative of sound incident upon the plurality of acoustic sensor elements, and a processor configured to determine data indicative of a location of a source of the sound based on the signals representative of the incident sound. The plurality of acoustic sensor elements include a directional acoustic sensor element configured to generate a signal representative of a directional component of the sound.

Type: Grant

Filed: October 5, 2020

Date of Patent: August 27, 2024

Assignee: Soundskrit Inc.

Inventors: Frederic Francois Leon Lepoutre, Sahil Kumar Gupta, Lucas Henrique Teixeira Carneiro
Speculative execution of dataflow program nodes

Patent number: 12032964

Abstract: A computer-implemented method is presented. The method comprises sequentially receiving a plurality of utterance prefixes, each sequentially received utterance prefix derived from a progressively longer incomplete portion of a full user utterance. For each sequentially received utterance prefix, a complete dataflow program is predicted based on the utterance prefix. The complete dataflow program includes a plurality of program nodes to be executed to satisfy the full user utterance. One or more program nodes are selected from the predicted complete dataflow program to speculatively execute based on at least the utterance prefix.

Type: Grant

Filed: July 28, 2022

Date of Patent: July 9, 2024

Assignee: Microsoft Technology Licensing, LLC

Inventors: Jason Michael Eisner, Samuel McIntire Thomson, Michael Jack Newman, Emmanouil Antonios Platanios, Jiawei Zhou
Multi-device output management based on speech characteristics

Patent number: 12002469

Abstract: A system is provided for modifying how an output is presented via a multi-device synchronous configuration based on detecting a speech characteristic in the user input. For example, if the user whispers a request, then the system may temporarily modify how the responsive output is presented to the user via multiple devices. In one example, the system may lower the volume on all devices presented the output. In another example, the system may present the output via a single device rather than multiple devices. The system may also determine to operate in a alternate output mode based on certain non-audio data.

Type: Grant

Filed: July 5, 2023

Date of Patent: June 4, 2024

Assignee: Amazon Technologies, Inc.

Inventor: Ezekiel Wade Sanborn de Asis
Dynamically adapting assistant responses

Patent number: 11875790

Abstract: Techniques are disclosed that enable dynamically adapting an automated assistant response using a dynamic familiarity measure. Various implementations process received user input to determine at least one intent, and generate a familiarity measure by processing intent specific parameters and intent agnostic parameters using a machine learning model. An automated assistant response is then determined that is based on the intent and that is based on the familiarity measure. The assistant response is responsive to the user input, and is adapted to the familiarity measure. For example, the assistant response can be more abbreviated and/or more resource efficient as the familiarity measure becomes more indicative of familiarity.

Type: Grant

Filed: March 1, 2019

Date of Patent: January 16, 2024

Assignee: GOOGLE LLC

Inventors: Tanya Kraljic, Miranda Callahan, Yi Sing Alex Loh, Darla Sharp
Touch-based method for user authentication

Patent number: 11861933

Abstract: A touch-based method for user authentication includes a training stage and an authentication stage. The training stage includes: generating, by a touch interface, a plurality of training touch parameters; and generating, by a processor, a training heat map according to the plurality of training touch parameters. The authentication stage includes: generating, by the touch interface, a plurality of testing touch parameters; generating, by the processor, a testing heat map according to the plurality of testing touch parameters; comparing, by the processor, the testing heat map with the training heat map to generate an error map; and generating, by the processor, one of pass signal and fail signal according to the error map.

Type: Grant

Filed: December 10, 2021

Date of Patent: January 2, 2024

Assignees: INVENTEC (PUDONG) TECHNOLOGY CORPORATION, INVENTEC CORPORATION

Inventor: Trista Pei-Chun Chen
High-quality non-parallel many-to-many voice conversion

Patent number: 11854562

Abstract: A method (and structure and computer product) to permit zero-shot voice conversion with non-parallel data includes receiving source speaker speech data as input data into a content encoder of a style transfer autoencoder system, the content encoder providing a source speaker disentanglement of the source speaker speech data by reducing speaker style information of the input source speech data while retaining content information and receiving target speaker input speech as input data into a target speaker encoder. The output of the content encoder and the target speaker encoder are combined in a decoder of the style transfer autoencoder, and the output of the decoder provides the content information of the input source speech data in a style of the target speaker speech information.

Type: Grant

Filed: May 14, 2019

Date of Patent: December 26, 2023

Assignee: International Business Machines Corporation

Inventors: Yang Zhang, Shiyu Chang
Multi-device output management based on speech characteristics

Patent number: 11783833

Abstract: A system is provided for modifying how an output is presented via a multi-device synchronous configuration based on detecting a speech characteristic in the user input. For example, if the user whispers a request, then the system may temporarily modify how the responsive output is presented to the user via multiple devices. In one example, the system may lower the volume on all devices presented the output. In another example, the system may present the output via a single device rather than multiple devices. The system may also determine to operate in a alternate output mode based on certain non-audio data.

Type: Grant

Filed: June 27, 2022

Date of Patent: October 10, 2023

Assignee: Amazon Technologies, Inc.

Inventor: Ezekiel Wade Sanborn de Asis
Processing multiple intents from an audio stream in a virtual reality application

Patent number: 11776560

Abstract: A method for processing multiple intents from an audio stream in a virtual reality application may include multiple steps, including: receiving a stream of words as a first utterance; processing the first utterance before the stream of words is fully received; based on the processing, determining a first intent from the first utterance before the stream of words is fully received; determining occurrence of a pause after the first utterance; and receiving a second stream of words as a second utterance, the second stream being received after the determined pause.

Type: Grant

Filed: October 13, 2022

Date of Patent: October 3, 2023

Assignee: Health Scholars Inc.

Inventors: Brian Philip Gillett, Akmal Hisyam Idris, James Oliver Lussier, Dustin Richard Parham, Kit Lee Burgess
Selective text prediction for electronic messaging

Patent number: 11755834

Abstract: A computing system is described that includes user interface components configured to receive typed user input; and one or more processors. The one or more processors are configured to: receive, by a computing system and at a first time, a first portion of text typed by a user in an electronic message being edited; predict, based on the first portion of text, a first candidate portion of text to follow the first portion of text; output, for display, the predicted first candidate portion of text for optional selection to append to the first portion of text; determine, at a second time that is after the first time, that the electronic message is directed to a sensitive topic; and responsive to determining that the electronic message is directed to a sensitive topic, refrain from outputting subsequent candidate portions of text for optional selection to append to text in the electronic message.

Type: Grant

Filed: December 22, 2017

Date of Patent: September 12, 2023

Assignee: Google LLC

Inventors: Paul Roland Lambert, Timothy Youngjin Sohn, Jacqueline Amy Tsay, Gagan Bansal, Cole Austin Bevis, Kaushik Roy, Justin Tzi-jay Lu, Katherine Anna Evans, Tobias Bosch, Yinan Wang, Matthew Vincent Dierker, Gregory Russell Bullock, Ettore Randazzo, Tobias Kaufmann, Yonghui Wu, Benjamin N. Lee, Xu Chen, Brian Strope, Yun-hsuan Sung, Do Kook Choe, Rami Eid Sammouf Al-Rfou'
Systems and methods for automatic speech recognition based on graphics processing units

Patent number: 11741967

Abstract: An automatic speech recognition system and a method thereof are provided. The system includes an encoder and a decoder. The encoder comprises a plurality of encoder layers. At least one encoder layer includes a plurality of encoder sublayers fused into one or more encoder kernels. The system further comprises a first pair of ping-pong buffers communicating with the one or more encoder kernels. The decoder comprises a plurality of decoder layers. At least one decoder layer includes a plurality of decoder sublayers fused into one or more decoder kernels. The decoder receives a decoder output related to the encoder output and generates a decoder output. The encoder sends the decoder output to a beam search kernel.

Type: Grant

Filed: January 4, 2021

Date of Patent: August 29, 2023

Assignee: KWAI INC.

Inventors: Yongxiong Ren, Heng Liu, Yang Liu, Lingzhi Liu, Jie Li, Yuanyuan Zhao, Xiaorui Wang
Predictive injection of conversation fillers for assistant systems

Patent number: 11704900

Abstract: In one embodiment, a method includes, by a client system, receiving, at the client system, a first user input, processing by the client system, the first user input to provide an initial response by identifying one or more entities referenced by the first user input and providing, by the client system, the initial response, where the initial response includes a conversational filler referencing at least one of the one or more identified entities, processing the first user input to provide a complete response by identifying, by the client system, one or more intents and one or more slots associated with the first user input based on a semantic analysis by a natural-language understanding module, and providing, by the client system, the complete response subsequent to the initial response, where the complete response is based on the one or more intents and the one or more slots.

Type: Grant

Filed: February 7, 2022

Date of Patent: July 18, 2023

Assignee: Meta Platforms, Inc.

Inventors: Emmanouil Koukoumidis, Michael Robert Hanson, Mohsen M Agsen
Language and grammar model adaptation using model weight data

Patent number: 11705116

Abstract: Systems and methods described herein relate to adapting a language model for automatic speech recognition (ASR) for a new set of words. Instead of retraining the ASR models, language models and grammar models, the system only modifies one grammar model and ensures its compatibility with the existing models in the ASR system.

Type: Grant

Filed: August 18, 2021

Date of Patent: July 18, 2023

Assignee: Amazon Technologies, Inc.

Inventors: Ankur Gandhe, Ariya Rastrow, Gautam Tiwari, Ashish Vishwanath Shenoy, Chun Chen
Language model biasing system

Patent number: 11682383

Abstract: Methods, systems, and apparatus for receiving audio data corresponding to a user utterance and context data, identifying an initial set of one or more n-grams from the context data, generating an expanded set of one or more n-grams based on the initial set of n-grams, adjusting a language model based at least on the expanded set of n-grams, determining one or more speech recognition candidates for at least a portion of the user utterance using the adjusted language model, adjusting a score for a particular speech recognition candidate determined to be included in the expanded set of n-grams, determining a transcription of user utterance that includes at least one of the one or more speech recognition candidates, and providing the transcription of the user utterance for output.

Type: Grant

Filed: June 2, 2021

Date of Patent: June 20, 2023

Assignee: Google LLC

Inventors: Petar Aleksic, Pedro J. Moreno Mengibar
Processing multimodal user input for assistant systems

Patent number: 11676220

Abstract: In one embodiment, a method includes receiving a user input based on a plurality of modalities at the client system, wherein at least one of the modalities of the user input is a visual modality, determining one or more subjects and one or more attributes associated with the one or more subjects, respectively, based on the visual modality of the user input, resolving one or more entities corresponding to the one or more subjects based on the determined one or more attributes, and presenting a communication content at the client system responsive to the user input, wherein the communication content comprises information associated with executing results of one or more tasks corresponding to the one or more resolved entities.

Type: Grant

Filed: January 25, 2021

Date of Patent: June 13, 2023

Assignee: Meta Platforms, Inc.

Inventors: Vivek Natarajan, Shawn C. P. Mei, Zhengping Zuo
Word correction using automatic speech recognition (ASR) incremental response

Patent number: 11651775

Abstract: An exemplary automatic speech recognition (ASR) system may receive an audio input including a segment of speech. The segment of speech may be independently processed by general ASR and domain-specific ASR to generate multiple ASR results. A selection between the multiple ASR results may be performed based on respective confidence levels for the general ASR and domain-specific ASR. As incremental ASR is performed, a composite result may be generated based on general ASR and domain-specific ASR.

Type: Grant

Filed: July 26, 2021

Date of Patent: May 16, 2023

Assignee: ROVI GUIDES, INC.

Inventor: Jeffry Copps Robert Jose
Control apparatus, voice interaction apparatus, voice recognition server, and program

Patent number: 11587554

Abstract: The control system includes a calculation unit configured to control a voice interaction system including voice recognition models, in which the calculation unit instructs, when a conversation with a target person is started, the voice interaction system to first perform voice recognition and response generation by one voice recognition model tentatively selected from among the voice recognition models, determines a voice recognition model estimated to be optimal among the voice recognition models held in the voice interaction system based on results of the voice recognition of a speech made by the target person in a voice recognition server, and instructs, when the voice recognition model estimated to be optimal is different from the one voice recognition model tentatively selected, the voice interaction system to switch the voice recognition model to the one estimated to be optimal and to perform voice recognition and response generation.

Type: Grant

Filed: December 16, 2019

Date of Patent: February 21, 2023

Assignee: TOYOTA JIDOSHA KABUSHIKI KAISHA

Inventor: Narimasa Watanabe
Anomaly pattern detection system and method

Patent number: 11580005

Abstract: Provided is an anomaly pattern detection system including an anomaly detection device connected to one or more servers. The anomaly detection device may include an anomaly detector configured to model input data by considering all of the input data as normal patterns, and detect an anomaly pattern from the input data based on the modeling result.

Type: Grant

Filed: November 8, 2019

Date of Patent: February 14, 2023

Inventor: Jang Won Suh
Electronic apparatus and control method thereof

Patent number: 11568254

Abstract: An electronic apparatus is provided.

Type: Grant

Filed: December 26, 2019

Date of Patent: January 31, 2023

Assignee: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Dongsoo Lee, Sejung Kwon, Parichay Kapoor, Byeoungwook Kim
Content output management based on speech quality

Patent number: 11562739

Abstract: Techniques for ensuring content output to a user conforms to a quality of the user's speech, even when a speechlet or skill ignores the speech's quality, are described. When a system receives speech, the system determines an indicator of the speech's quality (e.g., whispered, shouted, fast, slow, etc.) and persists the indicator in memory. When the system receives output content from a speechlet or skill, the system checks whether the output content is in conformity with the speech quality indicator. If the content conforms to the speech quality indicator, the system may cause the content to be output to the user without further manipulation. But, if the content does not conform to the speech quality indicator, the system may manipulate the content to render it in conformity with the speech quality indicator and output the manipulated content to the user.

Type: Grant

Filed: February 10, 2020

Date of Patent: January 24, 2023

Assignee: Amazon Technologies, Inc.

Inventors: Andrew Smith, Christopher Schindler, Karthik Ramakrishnan, Rohit Prasad, Michael George, Rafal Kuklinski
Speaker diartzation using an end-to-end model

Patent number: 11545157

Abstract: Techniques are described for training and/or utilizing an end-to-end speaker diarization model. In various implementations, the model is a recurrent neural network (RNN) model, such as an RNN model that includes at least one memory layer, such as a long short-term memory (LSTM) layer. Audio features of audio data can be applied as input to an end-to-end speaker diarization model trained according to implementations disclosed herein, and the model utilized to process the audio features to generate, as direct output over the model, speaker diarization results. Further, the end-to-end speaker diarization model can be a sequence-to-sequence model, where the sequence can have variable length. Accordingly, the model can be utilized to generate speaker diarization results for any of various length audio segments.

Type: Grant

Filed: April 15, 2019

Date of Patent: January 3, 2023

Assignee: GOOGLE LLC

Inventors: Quan Wang, Yash Sheth, Ignacio Lopez Moreno, Li Wan
Method and apparatus for recognizing a voice

Patent number: 11508356

Abstract: Disclosed are a speech recognition method and a speech recognition device, in which speech recognition is performed by executing an artificial intelligence (AI) algorithm and/or a machine learning algorithm provided therein. According to an embodiment of the present disclosure, the speech recognition method includes buffering an inputted spoken utterance, determining whether a preset wake-up word is present in the spoken utterance by comparing the buffered spoken utterance to the preset wake-up word, and in response to the preset wake-up word in the spoken utterance, activating a speech recognition function and isolating, from the spoken utterance, a spoken sentence as a voice command without the wake-up word, and processing the spoken sentence and outputting a processing result.

Type: Grant

Filed: September 10, 2019

Date of Patent: November 22, 2022

Assignee: LG ELECTRONICS INC.

Inventor: Jong Hoon Chae
Mapping natural language to queries using a query grammar

Patent number: 11442932

Abstract: Systems and methods for mapping natural language to queries using a query grammar are described. For example, methods may include generating, based on a string, a set of tokens of a database syntax; generating a query graph for the set of tokens using a finite state machine representing a query grammar, wherein nodes of the finite state machine represent token types, directed edges of the finite state machine represent valid transitions between token types in the query grammar, vertices of the query graph correspond to respective tokens of the set of tokens, and directed edges of the query graph represent a transition between two tokens in a sequencing of the tokens; determining, based on the query graph, a sequence of the tokens in the set of tokens, forming a database query; and invoking a search of a database using a query based on the database query to obtain search results.

Type: Grant

Filed: July 16, 2019

Date of Patent: September 13, 2022

Assignee: ThoughtSpot, Inc.

Inventors: Nikhil Yadav, Ravi Tandon
Unified communications call routing and decision based on integrated analytics-driven database and aggregated data

Patent number: 11425252

Abstract: Exemplary aspects involve a data-communications apparatus or system communicate over a broadband network with a plurality of remotely-located data-communications circuits respectively associated with a plurality of remotely-situated client entities. The system includes a unified-communications and call center (UC-CC) platform that processes incoming data-communication interactions including different types of digitally-represented communications among which are incoming call, and that is integrated with a memory circuit including a database of information sets. Each of the information sets includes experience data corresponding to past incoming data-communication interactions processed by the platform, and with aggregated and organized data based on data collected in previous incoming interactions.

Type: Grant

Filed: July 21, 2021

Date of Patent: August 23, 2022

Assignee: 8x8, Inc.

Inventors: Bryan R. Martin, Matt Taylor, Manu Mukerji
Maintainable and scalable pipeline for automatic speech recognition language modeling

Patent number: 11410658

Abstract: Audio data saved at the end of client interactions are sampled, analyzed for pauses in speech, and sliced into stretches of acoustic data containing human speech between those pauses. The acoustic data are accompanied by machine transcripts made by VoiceAI. A suitable distribution of data useful for training and testing are stipulated during data sampling by applying certain filtering criteria. The resulting datasets are sent for transcription by a human transcriber team. The human transcripts are retrieved, some post-transcription processing and cleaning are performed, and the results are added to datastores for training and testing an acoustic model.

Type: Grant

Filed: October 29, 2019

Date of Patent: August 9, 2022

Assignee: Dialpad, Inc.

Inventors: Eddie Yee Tak Ma, James Palmer, Kevin James, Etienne Manderscheid
Multi-device output management based on speech characteristics

Patent number: 11393471

Abstract: A system is provided for modifying how an output is presented via a multi-device synchronous configuration based on detecting a speech characteristic in the user input. For example, if the user whispers a request, then the system may temporarily modify how the responsive output is presented to the user via multiple devices. In one example, the system may lower the volume on all devices presented the output. In another example, the system may present the output via a single device rather than multiple devices. The system may also determine to operate in a alternate output mode based on certain non-audio data.

Type: Grant

Filed: March 30, 2020

Date of Patent: July 19, 2022

Assignee: Amazon Technologies, Inc.

Inventor: Ezekiel Wade Sanborn de Asis
Information processing apparatus, information processing method, and program

Patent number: 11388480

Abstract: An information processing apparatus according to the present technology includes a reception unit, a first generation unit, a collection unit, and a second generation unit. The reception unit receives a content. The first generation unit analyzes the received content and generates one or more pieces of analysis information related to the content. The collection unit collects content information related to the content on a network on the basis of the one or more pieces of generated analysis information. The second generation unit generates an utterance sentence on the basis of at least one of the one or more pieces of analysis information and the collected content information.

Type: Grant

Filed: September 28, 2016

Date of Patent: July 12, 2022

Inventor: Hideo Nagasaka
Artificial intelligence apparatus for recognizing speech of user and method for the same

Patent number: 11367438

Abstract: An embodiment of the present invention provides an artificial intelligence (AI) apparatus for recognizing a speech of a user, the artificial intelligence apparatus includes a memory to store a speech recognition model and a processor to obtain a speech signal for a user speech, to convert the speech signal into a text using the speech recognition model, to measure a confidence level for the conversion, to perform a control operation corresponding to the converted text if the measured confidence level is greater than or equal to a reference value, and to provide feedback for the conversion if the measured confidence level is less than the reference value.

Type: Grant

Filed: May 16, 2019

Date of Patent: June 21, 2022

Assignee: LG ELECTRONICS INC.

Inventors: Jaehong Kim, Hyoeun Kim, Hangil Jeong, Heeyeon Choi
Method, apparatus, device and computer readable storage medium for recognizing and decoding voice based on streaming attention model

Patent number: 11355113

Abstract: A method, apparatus, device, and computer readable storage medium for recognizing and decoding a voice based on a streaming attention model are provided. The method may include generating a plurality of acoustic paths for decoding the voice using the streaming attention model, and then merging acoustic paths with identical last syllables of the plurality of acoustic paths to obtain a plurality of merged acoustic paths. The method may further include selecting a preset number of acoustic paths from the plurality of merged acoustic paths as retained candidate acoustic paths. Embodiments of the present disclosure present a concept that acoustic score calculating of a current voice fragment is only affected by its last voice fragment and has nothing to do with earlier voice history, and merge acoustic paths with the identical last syllables of the plurality of candidate acoustic paths.

Type: Grant

Filed: March 9, 2020

Date of Patent: June 7, 2022

Assignee: Baidu Online Network Technology (Beijing) Co., Ltd.

Inventors: Junyao Shao, Sheng Qian, Lei Jia
Hybrid learning-based and statistical processing techniques for voice activity detection

Patent number: 11341988

Abstract: A hybrid machine learning-based and DSP statistical post-processing technique is disclosed for voice activity detection. The hybrid technique may use a DNN model with a small context window to estimate the probability of speech by frames. The DSP statistical post-processing stage operates on the frame-based speech probabilities from the DNN model to smooth the probabilities and to reduce transitions between speech and non-speech states. The hybrid technique may estimate the soft decision on detected speech in each frame based on the smoothed probabilities, generate a hard decision using a threshold, detect a complete utterance that may include brief pauses, and estimate the end point of the utterance. The hybrid voice activity detection technique may incorporate a target directional probability estimator to estimate the direction of the speech source. The DSP statistical post-processing module may use the direction of the speech source to inform the estimates of the voice activity.

Type: Grant

Filed: September 23, 2019

Date of Patent: May 24, 2022

Assignee: APPLE INC.

Inventors: Ramin Pishehvar, Feiping Li, Ante Jukic, Mehrez Souden, Joshua D. Atkins
Neural machine translation adaptation

Patent number: 11341340

Abstract: Adapters for neural machine translation systems. A method includes determining a set of similar n-grams that are similar to a source n-gram, and each similar n-gram and the source n-gram is in a first language; determining, for each n-gram in the set of similar n-grams, a target n-gram is a translation of the similar n-gram in the first language to the target n-gram in the second language; generating a source encoding of the source n-gram, and, for each target n-gram determined from the set of similar n-grams determined for the source n-gram, a target encoding of the target n-gram and a conditional source target memory that is an encoding of each of the target encodings; providing, as input to a first prediction model, the source encoding and the condition source target memory; and generating a predicted translation of the source n-gram from the first language to the second language.

Type: Grant

Filed: October 1, 2019

Date of Patent: May 24, 2022

Assignee: Google LLC

Inventors: Ankur Bapna, Ye Tian, Orhan Firat
Selection biasing

Patent number: 11334182

Abstract: In some implementations, data indicating a touch received on a proximity-sensitive display is received while the proximity-sensitive display is presenting one or more items. In one aspect, the techniques describe may involve a process for disambiguating touch selections of hypothesized items, such as text or graphical objects that have been generated based on input data, on a proximity-sensitive display. This process may allow a user to more easily select hypothesized items that the user may wish to correct, by determining whether a touch received through the proximity-sensitive display represents a selection of each hypothesized item based at least on a level of confidence that the hypothesized item accurately represents the input data.

Type: Grant

Filed: December 10, 2019

Date of Patent: May 17, 2022

Assignee: Google LLC

Inventors: Jakob Nicolaus Foerster, Diego Melendo Casado, Glen Shires
Phone-based sub-word units for end-to-end speech recognition

Patent number: 11328731

Abstract: System and methods for identifying a text word from a spoken utterance are provided. An ensemble BPE system that includes a phone BPE system and a character BPE system receives a spoken utterance. Both BPE systems include a multi-level language model (LM) and an acoustic model. The phone BPE system identifies first words from the spoken utterance and determine a first score for each first word. The first words are converted into character sequences. The character BPE model converts the character sequences into second words and determines a second score for each second word. For each word from the first words that matches a word in the second words the first and second scores are combined. The text word is the word with a highest score.

Type: Grant

Filed: June 17, 2020

Date of Patent: May 10, 2022

Assignee: salesforce.com, inc.

Inventors: Weiran Wang, Yingbo Zhou, Caiming Xiong
Dynamic interpolation for hybrid language models

Patent number: 11295732

Abstract: In order to improve the accuracy of ASR, an utterance is transcribed using a plurality of language models, such as for example, an N-gram language model and a neural language model. The language models are trained separately. They each output a probability score or other figure of merit for a partial transcription hypothesis. Model scores are interpolated to determine a hybrid score. While recognizing an utterance, interpolation weights are chosen or updated dynamically, in the specific context of processing. The weights are based on dynamic variables associated with the utterance, the partial transcription hypothesis, or other aspects of context.

Type: Grant

Filed: August 1, 2019

Date of Patent: April 5, 2022

Assignee: SoundHound, Inc.

Inventors: Steffen Holm, Terry Kong, Kiran Garaga Lokeswarappa
Information processing device and information processing method for determining a user type based on performed speech

Patent number: 11289099

Abstract: An information processing device including processing circuitry is provided. The processing circuitry configured to receive voice information regarding a voice of a user collected by a specific microphone of a plurality of microphones. The processing circuitry is configured to determine the user identified on a basis of the voice information regarding the voice of the user collected by the specific microphone among the plurality of microphones to be a specific type of user that has performed speech a predefined number of times or more within at least a certain period of time. Further, the processing circuitry is configured to control a message to be output to the user via a speaker corresponding to the specific microphone based on the user being determined to be the specific type of user.

Type: Grant

Filed: August 4, 2017

Date of Patent: March 29, 2022

Assignee: SONY CORPORATION

Inventor: Keigo Ihara
Audio signal encoding method and audio signal decoding method, and encoder and decoder performing the same

Patent number: 11276413

Abstract: Disclosed are an audio signal encoding method and audio signal decoding method, and an encoder and decoder performing the same. The audio signal encoding method includes applying an audio signal to a training model including N autoencoders provided in a cascade structure, encoding an output result derived through the training model, and generating a bitstream with respect to the audio signal based on the encoded output result.

Type: Grant

Filed: August 16, 2019

Date of Patent: March 15, 2022

Assignees: Electronics and Telecommunications Research Institute, THE TRUSTEES OF INDIANA UNIVERSITY

Inventors: Mi Suk Lee, Jongmo Sung, Minje Kim, Kai Zhen
Methods and systems for improving hardware resiliency during serial processing tasks in distributed computer networks

Patent number: 11243810

Abstract: The system uses the non-repudiatory persistence of blockchain technology to store all task statuses and results across the distributed computer network in an immutable blockchain database. Coupled with the resiliency of the stored data, the system may determine a sequence of processing tasks for a given processing request and use the sequence to detect and/or predict failures. Accordingly, in the event of a detected system failure, the system may recover the results prior to the failure, minimizing disruptions to processing the request and improving hardware resiliency.

Type: Grant

Filed: January 22, 2021

Date of Patent: February 8, 2022

Assignee: The Bank of New York Mellon

Inventors: Sanjay Kumar Stribady, Saket Sharma, Gursel Taskale
System and method of text zoning

Patent number: 11217252

Abstract: A method of zoning a transcription of audio data includes separating the transcription of audio data into a plurality of utterances. A that each word in an utterances is a meaning unit boundary is calculated. The utterance is split into two new utterances at a work with a maximum calculated probability. At least one of the two new utterances that is shorter than a maximum utterance threshold is identified as a meaning unit.

Type: Grant

Filed: August 28, 2019

Date of Patent: January 4, 2022

Assignee: VERINT SYSTEMS INC.

Inventors: Roni Romano, Yair Horesh, Jeremie Dreyfuss
Voice communication targeting user interface

Patent number: 11204685

Abstract: User interfaces may enable users to initiate voice-communications with voice-controlled devices via a Wi-Fi network or other network via an Internet Protocol (IP) address. The user interfaces may include controls to enable users to initiate voice communications, such as Voice over Internet Protocol (VoIP) calls, with devices that do not have connectivity with traditional mobile telephone networks, such as traditional circuit transmissions of a Public Switched Telephone Network (PSTN). For example, the user interface may enable initiating a voice communication with a voice-controlled device that includes network connectivity via a home Wi-Fi network. The user interfaces may indicate availability of devices and/or contacts for voice communications and/or recent activity of devices or contact.

Type: Grant

Filed: February 21, 2020

Date of Patent: December 21, 2021

Assignee: Amazon Technologies, Inc.

Inventors: Blair Harold Beebe, Katherine Ann Baker, David Michael Rowell, Peter Chin
Smart communications assistant with audio interface

Patent number: 11178082

Abstract: Methods, systems, and computer programs are presented for a smart communications assistant with an audio interface. One method includes an operation for getting messages addressed to a user. The messages are from one or more message sources and each message comprising message data that includes text. The method further includes operations for analyzing the message data to determine a meaning of each message, for generating a score for each message based on the respective message data and the meaning of the message, and for generating a textual summary for the messages based on the message scores and the meaning of the messages. A speech summary is created based on the textual summary and the speech summary is then sent to a speaker associated with the user. The audio interface further allows the user to verbally request actions for the messages.

Type: Grant

Filed: November 15, 2019

Date of Patent: November 16, 2021

Assignee: Microsoft Technology Licensing, LLC

Inventors: Nikrouz Ghotbi, August Niehaus, Sachin Venugopalan, Aleksandar Antonijevic, Tvrtko Tadic, Vashutosh Agrawal, Lisa Stifelman
System and method for uninterrupted application awakening and speech recognition

Patent number: 11164584

Abstract: Systems and methods are provided for application awakening and speech recognition. Such system may comprise a microphone configured to record an audio in an audio queue. The system may further comprise a processor configured to monitor the audio queue for an awakening phrase, in response to detecting the awakening phrase, obtain an audio segment from the audio queue, and transmit the obtained audio segment to a server. The recording of the audio may be continuous from a beginning of the awakening phrase to an end of the audio segment.

Type: Grant

Filed: September 9, 2019

Date of Patent: November 2, 2021

Assignee: Beijing DiDi Infinity Technology and Development Co., Ltd.

Inventors: Liting Guo, Gangtao Hu
Augmented intent and entity extraction using pattern recognition interstitial regular expressions

Patent number: 11164564

Abstract: According to certain embodiments, a system comprises interface circuitry and processing circuitry. The processing circuitry receives an input via the interface circuitry. The input is based on an utterance of a user, and the processing circuitry uses a probabilistic engine to determine one or more candidate intents associated with the utterance. The processing circuitry determines a number of the one or more candidate intents that exceed a threshold. If the number of candidate intents that exceed the threshold does not equal one, the processing circuitry uses a deterministic engine to compare the input to a set of regular expression patterns. If the input matches one of the regular expression patterns, the processing circuitry uses the matching regular expression pattern to determine the intent of the utterance. The interface circuitry communicates the intent of the utterance as an output.

Type: Grant

Filed: June 12, 2020

Date of Patent: November 2, 2021

Assignee: Bank of America Corporation

Inventors: Donatus Asumu, Bhargav Aditya Ayyagari
Dialog based speech recognition

Patent number: 11151332

Abstract: Embodiments provide for dialog based speech recognition by clustering a plurality of nodes comprising a dialog tree into at least a first cluster and a second cluster; creating a first dataset of natural language sentences for the first cluster and a second dataset of natural language sentences for the second cluster; generating a first specialized language model (LM) associated with the first cluster based on the first dataset; and generating a second specialized LM associated with the second cluster based on the second dataset, wherein the first specialized LM is different from the second specialized LM.

Type: Grant

Filed: March 7, 2019

Date of Patent: October 19, 2021

Assignee: International Business Machines Business

Inventors: Julio Nogima, Marcelo C. Grave, Claudio S. Pinhanez
Remote and local predictions

Patent number: 11115463

Abstract: The description relates to predicting terms based on text inputted by a user. One example includes a computing device comprising a processor configured to send, over a communications network, the text to a remote prediction engine. The processor is configured to send the text to a local prediction engine stored at the computing device, and to monitor for a local predicted term from the local prediction engine and a remote predicted term from the remote prediction engine, in response to the sent text. The computing device includes a user interface configured to present a final predicted term to the user such that the user is able to select the final term. The processor is configured to form the final predicted term using either the remote predicted term or the local predicted term on the basis of a time interval running from the time at which the user input the text.

Type: Grant

Filed: November 22, 2016

Date of Patent: September 7, 2021

Assignee: Microsoft Technology Licensing, LLC

Inventors: Adam John Cudworth, Alexander Gautam Primavesi, Piotr Jerzy Holc, Joseph Charles Woodward
Audio analytics for natural language processing

Patent number: 11094316

Abstract: A device includes a memory configured to store category labels associated with categories of a natural language processing library. A processor is configured to analyze input audio data to generate a text string and to perform natural language processing on at least the text string to generate an output text string including an action associated with a first device, a speaker, a location, or a combination thereof. The processor is configured to compare the input audio data to audio data of the categories to determine whether the input audio data matches any of the categories and, in response to determining that the input audio data does not match any of the categories: create a new category label, associate the new category label with at least a portion of the output text string, update the categories with the new category label, and generate a notification indicating the new category label.

Type: Grant

Filed: May 4, 2018

Date of Patent: August 17, 2021

Assignee: QUALCOMM Incorporated

Inventors: Erik Visser, Fatemeh Saki, Yinyi Guo, Sunkuk Moon, Lae-Hoon Kim, Ravi Choudhary

1 2 3 4 5 … next