Patents by Inventor Ruoming Pang

Ruoming Pang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Emitting word timings with end-to-end models

Patent number: 12361927

Abstract: A method includes receiving a training example that includes audio data representing a spoken utterance and a ground truth transcription. For each word in the spoken utterance, the method also includes inserting a placeholder symbol before the respective word identifying a respective ground truth alignment for a beginning and an end of the respective word, determining a beginning word piece and an ending word piece, and generating a first constrained alignment for the beginning word piece and a second constrained alignment for the ending word piece. The first constrained alignment is aligned with the ground truth alignment for the beginning of the respective word and the second constrained alignment is aligned with the ground truth alignment for the ending of the respective word. The method also includes constraining an attention head of a second pass decoder by applying the first and second constrained alignments.

Type: Grant

Filed: May 31, 2024

Date of Patent: July 15, 2025

Assignee: Google LLC

Inventors: Tara N. Sainath, Basilio Garcia Castillo, David Rybach, Trevor Strohman, Ruoming Pang
Neural architecture search with factorized hierarchical search space

Patent number: 12293276

Abstract: The present disclosure is directed to an automated neural architecture search approach for designing new neural network architectures such as, for example, resource-constrained mobile CNN models. In particular, the present disclosure provides systems and methods to perform neural architecture search using a novel factorized hierarchical search space that permits layer diversity throughout the network, thereby striking the right balance between flexibility and search space size. The resulting neural architectures are able to be run relatively faster and using relatively fewer computing resources (e.g., less processing power, less memory usage, less power consumption, etc.), all while remaining competitive with or even exceeding the performance (e.g., accuracy) of current state-of-the-art mobile-optimized models.

Type: Grant

Filed: February 1, 2024

Date of Patent: May 6, 2025

Assignee: GOOGLE LLC

Inventors: Mingxing Tan, Quoc Le, Bo Chen, Vijay Vasudevan, Ruoming Pang
SYNTHESIS OF SPEECH FROM TEXT IN A VOICE OF A TARGET SPEAKER USING NEURAL NETWORKS

Publication number: 20250095630

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech synthesis. The methods, systems, and apparatus include actions of obtaining an audio representation of speech of a target speaker, obtaining input text for which speech is to be synthesized in a voice of the target speaker, generating a speaker vector by providing the audio representation to a speaker encoder engine that is trained to distinguish speakers from one another, generating an audio representation of the input text spoken in the voice of the target speaker by providing the input text and the speaker vector to a spectrogram generation engine that is trained using voices of reference speakers to generate audio representations, and providing the audio representation of the input text spoken in the voice of the target speaker for output.

Type: Application

Filed: December 2, 2024

Publication date: March 20, 2025

Applicant: Google LLC

Inventors: Ye Jia, Zhifeng Chen, Yonghui Wu, Jonathan Shen, Ruoming Pang, Ron J. Weiss, Ignacio Lopez Moreno, Fei Ren, Yu Zhang, Quan Wang, Patrick An Phu Nguyen
Language Agnostic Multilingual End-To-End Streaming On-Device ASR System

Publication number: 20250095634

Abstract: A method includes receiving a sequence of acoustic frames characterizing one or more utterances as input to a multilingual automated speech recognition (ASR) model. The method also includes generating a higher order feature representation for a corresponding acoustic frame. The method also includes generating a hidden representation based on a sequence of non-blank symbols output by a final softmax layer. The method also includes generating a probability distribution over possible speech recognition hypotheses based on the hidden representation generated by the prediction network at each of the plurality of output steps and the higher order feature representation generated by the encoder at each of the plurality of output steps. The method also includes predicting an end of utterance (EOU) token at an end of each utterance. The method also includes classifying each acoustic frame as either speech, initial silence, intermediate silence, or final silence.

Type: Application

Filed: December 2, 2024

Publication date: March 20, 2025

Applicant: Google LLC

Inventors: Bo Li, Tara N. Sainath, Ruoming Pang, Shuo-yiin Chang, Qiumin Xu, Trevor Strohman, Vince Chen, Qiao Liang, Heguang Liu, Yanzhang He, Parisa Haghani, Sameer Bidichandani
HARDWARE-OPTIMIZED NEURAL ARCHITECTURE SEARCH

Publication number: 20250077833

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining an architecture for a task neural network that is configured to perform a particular machine learning task on a target set of hardware resources. When deployed on a target set of hardware, such as a collection of datacenter accelerators, the task neural network may be capable of performing the particular machine learning task with enhanced accuracy and speed.

Type: Application

Filed: August 30, 2024

Publication date: March 6, 2025

Inventors: Sheng Li, Norman Paul Jouppi, Quoc V. Le, Mingxing Tan, Ruoming Pang, Liqun Cheng, Andrew Li
ASYNCHRONOUS DISTRIBUTED DATA FLOW FOR MACHINE LEARNING WORKLOADS

Publication number: 20250053444

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for distributing machine learning workloads, e.g., computations for training a neural network or computing an inference using a neural network, across multiple hardware accelerators.

Type: Application

Filed: August 23, 2024

Publication date: February 13, 2025

Inventors: Jeffrey Adgate Dean, Sudip Roy, Michael Acheson Isard, Aakanksha Chowdhery, Brennan Saeta, Chandramohan Amyangot Thekkath, Daniel William Hurt, Hyeontaek Lim, Laurent El Shafey, Parker Edward Schuh, Paul Ronald Barham, Ruoming Pang, Ryan Sepassi, Sanjay Ghemawat, Yonghui Wu
Co-Training of Action Recognition Machine Learning Models

Publication number: 20250037426

Abstract: A method includes obtaining video datasets each including pairs of a training video and a ground-truth action classification of the training video. The method also includes generating an action recognition model that includes a shared encoder model and action classification heads. A number of the action classifications heads may be equal to a number of the video datasets, and each action classification head may be configured to, based on an output of the shared encoder model, classify training videos sampled from a corresponding video dataset. The method also includes determining, by the action recognition model and for each training video sampled from the video datasets, an inferred action classification. The method further includes determining a loss value based on the inferred action classifications and the ground-truth action classifications, and adjusting parameters of the action recognition model based on the loss value.

Type: Application

Filed: December 9, 2022

Publication date: January 30, 2025

Inventors: Bowen Zhang, Jiahui Yu, Christopher Fifty, Wei Han, Andrew M. Dai, Ruoming Pang, Fei Sha
Optimizing inference performance for conformer

Patent number: 12190869

Abstract: A computer-implemented method includes receiving a sequence of acoustic frames as input to an automatic speech recognition (ASR) model. Here, the ASR model includes a causal encoder and a decoder. The method also includes generating, by the causal encoder, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The method also includes generating, by the decoder, a first probability distribution over possible speech recognition hypotheses. Here, the causal encoder includes a stack of causal encoder layers each including a Recurrent Neural Network (RNN) Attention-Performer module that applies linear attention.

Type: Grant

Filed: September 29, 2022

Date of Patent: January 7, 2025

Assignee: Google LLC

Inventors: Tara N. Sainath, Rami Botros, Anmol Gulati, Krzysztof Choromanski, Ruoming Pang, Trevor Strohman, Weiran Wang, Jiahui Yu
Language agnostic multilingual end-to-end streaming on-device ASR system

Patent number: 12183322

Abstract: A method includes receiving a sequence of acoustic frames characterizing one or more utterances as input to a multilingual automated speech recognition (ASR) model. The method also includes generating a higher order feature representation for a corresponding acoustic frame. The method also includes generating a hidden representation based on a sequence of non-blank symbols output by a final softmax layer. The method also includes generating a probability distribution over possible speech recognition hypotheses based on the hidden representation generated by the prediction network at each of the plurality of output steps and the higher order feature representation generated by the encoder at each of the plurality of output steps. The method also includes predicting an end of utterance (EOU) token at an end of each utterance. The method also includes classifying each acoustic frame as either speech, initial silence, intermediate silence, or final silence.

Type: Grant

Filed: September 22, 2022

Date of Patent: December 31, 2024

Assignee: Google LLC

Inventors: Bo Li, Tara N. Sainath, Ruoming Pang, Shuo-yiin Chang, Qiumin Xu, Trevor Strohman, Vince Chen, Qiao Liang, Heguang Liu, Yanzhang He, Parisa Haghani, Sameer Bidichandani
Transducer-Based Streaming Deliberation for Cascaded Encoders

Publication number: 20240428786

Abstract: A method includes receiving a sequence of acoustic frames and generating, by a first encoder, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The method also includes generating, by a first pass transducer decoder, a first pass speech recognition hypothesis for a corresponding first higher order feature representation and generating, by a text encoder, a text encoding for a corresponding first pass speech recognition hypothesis. The method also includes generating, by a second encoder, a second higher order feature representation for a corresponding first higher order feature representation. The method also includes generating, by a second pass transducer decoder, a second pass speech recognition hypothesis using a corresponding second higher order feature representation and a corresponding text encoding.

Type: Application

Filed: September 6, 2024

Publication date: December 26, 2024

Applicant: Google LLC

Inventors: Ke Hu, Tara N. Sainath, Arun Narayanan, Ruoming Pang, Trevor Strohman
Synthesis of speech from text in a voice of a target speaker using neural networks

Patent number: 12175963

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech synthesis. The methods, systems, and apparatus include actions of obtaining an audio representation of speech of a target speaker, obtaining input text for which speech is to be synthesized in a voice of the target speaker, generating a speaker vector by providing the audio representation to a speaker encoder engine that is trained to distinguish speakers from one another, generating an audio representation of the input text spoken in the voice of the target speaker by providing the input text and the speaker vector to a spectrogram generation engine that is trained using voices of reference speakers to generate audio representations, and providing the audio representation of the input text spoken in the voice of the target speaker for output.

Type: Grant

Filed: November 30, 2023

Date of Patent: December 24, 2024

Assignee: Google LLC

Inventors: Ye Jia, Zhifeng Chen, Yonghui Wu, Jonathan Shen, Ruoming Pang, Ron J. Weiss, Ignacio Lopez Moreno, Fei Ren, Yu Zhang, Quan Wang, Patrick An Phu Nguyen
TWO-PASS END TO END SPEECH RECOGNITION

Publication number: 20240420687

Abstract: Two-pass automatic speech recognition (ASR) models can be used to perform streaming on-device ASR to generate a text representation of an utterance captured in audio data. Various implementations include a first-pass portion of the ASR model used to generate streaming candidate recognition(s) of an utterance captured in audio data. For example, the first-pass portion can include a recurrent neural network transformer (RNN-T) decoder. Various implementations include a second-pass portion of the ASR model used to revise the streaming candidate recognition(s) of the utterance and generate a text representation of the utterance. For example, the second-pass portion can include a listen attend spell (LAS) decoder. Various implementations include a shared encoder shared between the RNN-T decoder and the LAS decoder.

Type: Application

Filed: August 26, 2024

Publication date: December 19, 2024

Inventors: Tara N. Sainath, Yanzhang He, Bo Li, Arun Narayanan, Ruoming Pang, Antoine Jean Bruguier, Shuo-yiin Chang, Wei Li
Cascaded encoders for simplified streaming and non-streaming ASR

Patent number: 12154581

Abstract: An automated speech recognition (ASR) model includes a first encoder, a second encoder, and a decoder. The first encoder receives, as input, a sequence of acoustic frames, and generates, at each of a plurality of output steps, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The second encoder receives, as input, the first higher order feature representation generated by the first encoder at each of the plurality of output steps, and generates, at each of the plurality of output steps, a second higher order feature representation for a corresponding first higher order feature frame. The decoder receives, as input, the second higher order feature representation generated by the second encoder at each of the plurality of output steps, and generates, at each of the plurality of time steps, a first probability distribution over possible speech recognition hypotheses.

Type: Grant

Filed: April 21, 2021

Date of Patent: November 26, 2024

Assignee: Google LLC

Inventors: Arun Narayanan, Tara Sainath, Chung-Cheng Chiu, Ruoming Pang, Rohit Prabhavalkar, Jiahui Yu, Ehsan Variani, Trevor Strohman
Synthesizing speech from text using neural networks

Patent number: 12148444

Abstract: Methods, systems, and computer program products for generating, from an input character sequence, an output sequence of audio data representing the input character sequence. The output sequence of audio data includes a respective audio output sample for each of a number of time steps. One example method includes, for each of the time steps: generating a mel-frequency spectrogram for the time step by processing a representation of a respective portion of the input character sequence using a decoder neural network; generating a probability distribution over a plurality of possible audio output samples for the time step by processing the mel-frequency spectrogram for the time step using a vocoder neural network; and selecting the audio output sample for the time step from the possible audio output samples in accordance with the probability distribution.

Type: Grant

Filed: April 5, 2021

Date of Patent: November 19, 2024

Assignee: Google LLC

Inventors: Yonghui Wu, Jonathan Shen, Ruoming Pang, Ron J. Weiss, Michael Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, Russell John Wyatt Skerry-Ryan, Ryan M. Rifkin, Ioannis Agiomyrgiannakis
EFFICIENT STREAMING NON-RECURRENT ON-DEVICE END-TO-END MODEL

Publication number: 20240371363

Abstract: An ASR model includes a first encoder configured to receive a sequence of acoustic frames and generate a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The ASR model also includes a second encoder configured to receive the first higher order feature representation generated by the first encoder at each of the plurality of output steps and generate a second higher order feature representation for a corresponding first higher order feature frame. The ASR model also includes a decoder configured to receive the second higher order feature representation generated by the second encoder at each of the plurality of output steps and generate a first probability distribution over possible speech recognition hypothesis. The ASR model also includes a language model configured to receive the first probability distribution over possible speech hypothesis and generate a rescored probability distribution.

Type: Application

Filed: July 15, 2024

Publication date: November 7, 2024

Applicant: Google LLC

Inventors: Tara Sainath, Arun Narayanan, Rami Botros, Yanzhang He, Ehsan Variani, Cyril Allauzen, David Rybach, Ruoming Pang, Trevor Strohman
Convolution-Augmented Transformer Models

Publication number: 20240362453

Abstract: Systems and methods can utilize a conformer model to process a data set for various data processing tasks, including, but not limited to, speech recognition, sound separation, protein synthesis determination, video or other image set analysis, and natural language processing. The conformer model can use feed-forward blocks, a self-attention block, and a convolution block to process data to learn global interactions and relative-offset-based local correlations of the input data.

Type: Application

Filed: July 8, 2024

Publication date: October 31, 2024

Inventors: Anmol Gulati, Weikeng Qin, Zhengdong Zhang, Ruoming Pang, Niki Parmar, Jiahui Yu, Wei Han, Chung-Cheng Chiu, Yu Zhang, Yonghui Wu, Shibo Wang
Hardware-optimized neural architecture search

Patent number: 12131244

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining an architecture for a task neural network that is configured to perform a particular machine learning task on a target set of hardware resources. When deployed on a target set of hardware, such as a collection of datacenter accelerators, the task neural network may be capable of performing the particular machine learning task with enhanced accuracy and speed.

Type: Grant

Filed: September 30, 2020

Date of Patent: October 29, 2024

Assignee: Google LLC

Inventors: Sheng Li, Norman Paul Jouppi, Quoc V. Le, Mingxing Tan, Ruoming Pang, Liqun Cheng, Andrew Li
Transducer-based streaming deliberation for cascaded encoders

Patent number: 12118988

Abstract: A method includes receiving a sequence of acoustic frames and generating, by a first encoder, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The method also includes generating, by a first pass transducer decoder, a first pass speech recognition hypothesis for a corresponding first higher order feature representation and generating, by a text encoder, a text encoding for a corresponding first pass speech recognition hypothesis. The method also includes generating, by a second encoder, a second higher order feature representation for a corresponding first higher order feature representation. The method also includes generating, by a second pass transducer decoder, a second pass speech recognition hypothesis using a corresponding second higher order feature representation and a corresponding text encoding.

Type: Grant

Filed: September 19, 2022

Date of Patent: October 15, 2024

Assignee: Google LLC

Inventors: Ke Hu, Tara N. Sainath, Arun Narayanan, Ruoming Pang, Trevor Strohman
Asynchronous distributed data flow for machine learning workloads

Patent number: 12112198

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for distributing machine learning workloads, e.g., computations for training a neural network or computing an inference using a neural network, across multiple hardware accelerators.

Type: Grant

Filed: December 15, 2022

Date of Patent: October 8, 2024

Assignee: Google LLC

Inventors: Jeffrey Adgate Dean, Sudip Roy, Michael Acheson Isard, Aakanksha Chowdhery, Brennan Saeta, Chandramohan Amyangot Thekkath, Daniel William Hurt, Hyeontaek Lim, Laurent El Shafey, Parker Edward Schuh, Paul Ronald Barham, Ruoming Pang, Ryan Sepassi, Sanjay Ghemawat, Yonghui Wu
Emitting Word Timings with End-to-End Models

Publication number: 20240321263

Abstract: A method includes receiving a training example that includes audio data representing a spoken utterance and a ground truth transcription. For each word in the spoken utterance, the method also includes inserting a placeholder symbol before the respective word identifying a respective ground truth alignment for a beginning and an end of the respective word, determining a beginning word piece and an ending word piece, and generating a first constrained alignment for the beginning word piece and a second constrained alignment for the ending word piece. The first constrained alignment is aligned with the ground truth alignment for the beginning of the respective word and the second constrained alignment is aligned with the ground truth alignment for the ending of the respective word. The method also includes constraining an attention head of a second pass decoder by applying the first and second constrained alignments.

Type: Application

Filed: May 31, 2024

Publication date: September 26, 2024

Applicant: Google LLC

Inventors: Tara N. Sainath, Basilio Garcia Castillo, David Rybach, Trevor Strohman, Ruoming Pang

1 2 3 4 5 next