Patents by Inventor Tom O'Malley

Tom O'Malley has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO). Brief illustrative code sketches keyed to each publication follow the listing.

  • Publication number: 20230298609
Abstract: A method for training a generalized automatic speech recognition model for joint acoustic echo cancellation, speech enhancement, and voice separation includes receiving a plurality of training utterances paired with corresponding training contextual signals. The training contextual signals include a training contextual noise signal including noise prior to the corresponding training utterance, a training reference audio signal, and a training speaker vector including voice characteristics of a target speaker that spoke the corresponding training utterance. The method also includes training, using a contextual signal dropout strategy, a contextual frontend processing model on the training utterances to learn how to predict enhanced speech features. Here, the contextual signal dropout strategy uses a predetermined probability to drop out each of the training contextual signals during training of the contextual frontend processing model.
    Type: Application
    Filed: February 19, 2023
    Publication date: September 21, 2023
    Applicant: Google LLC
    Inventors: Tom O'Malley, Quan Wang, Arun Narayanan
  • Publication number: 20230298591
    Abstract: A computer-implemented method includes receiving a sequence of acoustic frames corresponding to an utterance and generating a reference speaker embedding for the utterance. The method also includes receiving a target speaker embedding for a target speaker and generating feature-wise linear modulation (FiLM) parameters including a scaling vector and a shifting vector based on the target speaker embedding. The method also includes generating an affine transformation output that scales and shifts the reference speaker embedding based on the FiLM parameters. The method also includes generating a classification output indicating whether the utterance was spoken by the target speaker based on the affine transformation output.
    Type: Application
    Filed: March 17, 2023
    Publication date: September 21, 2023
    Applicant: Google LLC
    Inventors: Shaojin Ding, Rajeev Rikhye, Qiao Liang, Yanzhang He, Quan Wang, Arun Narayanan, Tom O'Malley, Ian McGraw
  • Publication number: 20230298612
Abstract: A multichannel neural frontend speech enhancement model for speech recognition includes a speech cleaner, a stack of self-attention blocks, each having a multi-headed self-attention mechanism, and a masking layer. The speech cleaner receives, as input, a multichannel noisy input signal and a multichannel contextual noise signal, and generates, as output, a single channel cleaned input signal. The stack of self-attention blocks receives, as input, at an initial block of the stack of self-attention blocks, a stacked input including the single channel cleaned input signal and a single channel noisy input signal, and generates, as output, from a final block of the stack of self-attention blocks, an un-masked output. The masking layer receives, as input, the single channel noisy input signal and the un-masked output, and generates, as output, enhanced input speech features corresponding to a target utterance.
    Type: Application
    Filed: February 20, 2023
    Publication date: September 21, 2023
    Applicant: Google LLC
    Inventors: Joseph Caroselli, Arun Narayanan, Tom O'Malley
  • Publication number: 20230038982
Abstract: A method for automatic speech recognition using joint acoustic echo cancellation, speech enhancement, and voice separation includes receiving, at a contextual frontend processing model, input speech features corresponding to a target utterance. The method also includes receiving, at the contextual frontend processing model, at least one of a reference audio signal, a contextual noise signal including noise prior to the target utterance, or a speaker embedding including voice characteristics of a target speaker that spoke the target utterance. The method further includes processing, using the contextual frontend processing model, the input speech features and the at least one of the reference audio signal, the contextual noise signal, or the speaker embedding to generate enhanced speech features.
    Type: Application
    Filed: December 14, 2021
    Publication date: February 9, 2023
    Applicant: Google LLC
    Inventors: Arun Narayanan, Tom O'Malley, Quan Wang, Alex Park, James Walker, Nathan David Howard, Yanzhang He, Chung-Cheng Chiu
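
Illustrative code sketches

Publication 20230298609 describes training a contextual frontend processing model with a contextual signal dropout strategy. The sketch below is a minimal illustration of that idea, assuming each training example carries the three contextual signals as optional fields; the function name apply_contextual_dropout, the field names, and the 0.5 drop probability are assumptions for illustration, not details from the filing.

```python
import random

# Minimal sketch of a contextual-signal dropout strategy (assumed names/values):
# each of the three training contextual signals is independently dropped with a
# predetermined probability before the example reaches the frontend model, so
# the model learns to predict enhanced speech features even with partial context.

DROP_PROBABILITY = 0.5  # assumed; the abstract only says "predetermined probability"

CONTEXT_KEYS = ("contextual_noise_signal", "reference_audio_signal", "speaker_vector")

def apply_contextual_dropout(example, drop_probability=DROP_PROBABILITY):
    """Return a copy of the training example with each contextual signal
    independently dropped (set to None) with the given probability."""
    dropped = dict(example)
    for key in CONTEXT_KEYS:
        if random.random() < drop_probability:
            dropped[key] = None
    return dropped

# Usage: some training examples end up missing some or all of their context.
example = {
    "utterance_features": [0.1, 0.2, 0.3],      # the paired training utterance
    "contextual_noise_signal": [0.0, 0.1],      # noise captured before the utterance
    "reference_audio_signal": [0.5, 0.4],       # e.g. device playback reference
    "speaker_vector": [0.9, 0.8],               # target speaker's voice characteristics
}
train_example = apply_contextual_dropout(example)
```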
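
Publication 20230298591 generates feature-wise linear modulation (FiLM) parameters from a target speaker embedding and uses them to scale and shift a reference speaker embedding before classification. The NumPy sketch below follows that data flow; the 256-dimensional embeddings, the random stand-in weights, and the sigmoid classifier are assumptions rather than details from the filing.

```python
import numpy as np

EMBED_DIM = 256                      # assumed embedding dimensionality
rng = np.random.default_rng(0)
W_scale = 0.01 * rng.standard_normal((EMBED_DIM, EMBED_DIM))   # stand-in learned weights
W_shift = 0.01 * rng.standard_normal((EMBED_DIM, EMBED_DIM))
w_classify = 0.01 * rng.standard_normal(EMBED_DIM)

def film_parameters(target_embedding):
    """FiLM parameters (scaling vector, shifting vector) from the target speaker embedding."""
    gamma = target_embedding @ W_scale   # scaling vector
    beta = target_embedding @ W_shift    # shifting vector
    return gamma, beta

def verify(reference_embedding, target_embedding):
    """Scale and shift the reference embedding feature-wise, then classify."""
    gamma, beta = film_parameters(target_embedding)
    affine_output = gamma * reference_embedding + beta   # feature-wise affine transform
    logit = affine_output @ w_classify
    return 1.0 / (1.0 + np.exp(-logit))  # probability the target speaker spoke the utterance

# Usage with random stand-in embeddings.
score = verify(rng.standard_normal(EMBED_DIM), rng.standard_normal(EMBED_DIM))
```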
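
Publication 20230298612 specifies a three-stage pipeline: a speech cleaner, a stack of self-attention blocks, and a masking layer. The sketch below wires stand-in NumPy functions together in that order to show the hand-offs between stages; the toy single-head attention, the sigmoid mask, and the use of channel 0 as the single-channel noisy signal are all assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def speech_cleaner(multichannel_noisy, multichannel_context_noise):
    # Stand-in cleaner: average across channels, then subtract the mean context noise.
    return multichannel_noisy.mean(axis=0) - multichannel_context_noise.mean(axis=0)

def self_attention_stack(stacked_input, num_blocks=4):
    # Stand-in for the stack of self-attention blocks (one toy head per block).
    hidden = stacked_input
    for _ in range(num_blocks):
        attention = softmax(hidden @ hidden.T)   # (frames, frames) attention weights
        hidden = attention @ hidden              # (frames, features)
    return hidden                                # the "un-masked output"

def masking_layer(single_channel_noisy, unmasked_output):
    # Derive a [0, 1] mask from the un-masked output and apply it to the noisy input.
    mask = 1.0 / (1.0 + np.exp(-unmasked_output[:, : single_channel_noisy.shape[1]]))
    return mask * single_channel_noisy           # enhanced input speech features

# Wiring: cleaner -> self-attention stack -> masking layer.
channels, frames, feats = 4, 50, 80
noisy = np.random.randn(channels, frames, feats)          # multichannel noisy input
context_noise = np.random.randn(channels, frames, feats)  # multichannel contextual noise

cleaned = speech_cleaner(noisy, context_noise)            # single channel cleaned signal
single_channel_noisy = noisy[0]                           # assumption: channel 0
stacked = np.concatenate([cleaned, single_channel_noisy], axis=-1)  # stacked input
unmasked = self_attention_stack(stacked)
enhanced_features = masking_layer(single_channel_noisy, unmasked)
```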
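
Publication 20230038982 covers the inference side of the contextual frontend: input speech features arrive together with at least one of the reference audio signal, the contextual noise signal, or the speaker embedding. The interface-level sketch below mirrors that "at least one of" contract; the function name and the pass-through body are placeholders, not the patented model.

```python
def contextual_frontend(input_speech_features,
                        reference_audio_signal=None,
                        contextual_noise_signal=None,
                        speaker_embedding=None):
    """Placeholder contextual frontend: accepts whichever contextual signals are
    available and returns the speech features unchanged; a real model would
    condition an enhancement network on the available context."""
    context = [signal for signal in (reference_audio_signal,
                                     contextual_noise_signal,
                                     speaker_embedding) if signal is not None]
    if not context:
        raise ValueError("at least one contextual signal is required")
    return input_speech_features  # stand-in for enhanced speech features

# Usage: here only the speaker embedding is supplied.
enhanced = contextual_frontend([0.1, 0.2, 0.3], speaker_embedding=[0.5] * 8)
```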