Patents by Inventor Ruoming Pang

Ruoming Pang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Convolution-Augmented Transformer Models

Publication number: 20220207321

Abstract: Systems and methods can utilize a conformer model to process a data set for various data processing tasks, including, but not limited to, speech recognition, sound separation, protein synthesis determination, video or other image set analysis, and natural language processing. The conformer model can use feed-forward blocks, a self-attention block, and a convolution block to process data to learn global interactions and relative-offset-based local correlations of the input data.

Type: Application

Filed: December 31, 2020

Publication date: June 30, 2022

Inventors: Anmol Gulati, Ruoming Pang, Niki Parmar, Jiahui Yu, Wei Han, Chung-Cheng Chiu, Yu Zhang, Yonghui Wu, Shibo Wang, Weikeng Qin, Zhengdong Zhang
Unsupervised Learning of Disentangled Speech Content and Style Representation

Publication number: 20220189456

Abstract: A linguistic content and speaking style disentanglement model includes a content encoder, a style encoder, and a decoder. The content encoder is configured to receive input speech as input and generate a latent representation of linguistic content for the input speech as output. The content encoder is trained to disentangle speaking style information from the latent representation of linguistic content. The style encoder is configured to receive the input speech as input and generate a latent representation of speaking style for the input speech as output. The style encoder is trained to disentangle linguistic content information from the latent representation of speaking style. The decoder is configured to generate output speech based on the latent representation of linguistic content for the input speech and the latent representation of speaking style for the same or different input speech.

Type: Application

Filed: November 18, 2021

Publication date: June 16, 2022

Applicant: Google LLC

Inventors: Ruoming Pang, Andros Tjandra, Yu Zhang, Shigeki Karita
Fast Emit Low-latency Streaming ASR with Sequence-level Emission Regularization

Publication number: 20220122586

Abstract: A computer-implemented method of training a streaming speech recognition model includes receiving, as input to the streaming speech recognition model, a sequence of acoustic frames. The streaming speech recognition model is configured to learn an alignment probability between the sequence of acoustic frames and an output sequence of vocabulary tokens. The vocabulary tokens include a plurality of label tokens and a blank token. At each output step, the method includes determining a first probability of emitting one of the label tokens and determining a second probability of emitting the blank token. The method also includes generating the alignment probability at a sequence level based on the first probability and the second probability. The method also includes applying a tuning parameter to the alignment probability at the sequence level to maximize the first probability of emitting one of the label tokens.

Type: Application

Filed: September 9, 2021

Publication date: April 21, 2022

Applicant: Google LLC

Inventors: Jiahui Yu, Chung-cheng Chiu, Bo Li, Shuo-yiin Chang, Tara Sainath, Wei Han, Anmol Gulati, Yanzhang He, Arun Narayanan, Yonghui Wu, Ruoming Pang
Cascaded Encoders for Simplified Streaming and Non-Streaming ASR

Publication number: 20220122622

Abstract: An automated speech recognition (ASR) model includes a first encoder, a second encoder, and a decoder. The first encoder receives, as input, a sequence of acoustic frames, and generates, at each of a plurality of output steps, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The second encoder receives, as input, the first higher order feature representation generated by the first encoder at each of the plurality of output steps, and generates, at each of the plurality of output steps, a second higher order feature representation for a corresponding first higher order feature frame. The decoder receives, as input, the second higher order feature representation generated by the second encoder at each of the plurality of output steps, and generates, at each of the plurality of time steps, a first probability distribution over possible speech recognition hypotheses.

Type: Application

Filed: April 21, 2021

Publication date: April 21, 2022

Applicant: Google LLC

Inventors: Arun Narayanan, Tara Sainath, Chung-Cheng Chiu, Ruoming Pang, Rohit Prabhavalkar, Jiahui Yu, Ehsan Variani, Trevor Strohman
Neural Architecture Search with Factorized Hierarchical Search Space

Publication number: 20220101090

Abstract: The present disclosure is directed to an automated neural architecture search approach for designing new neural network architectures such as, for example, resource-constrained mobile CNN models. In particular, the present disclosure provides systems and methods to perform neural architecture search using a novel factorized hierarchical search space that permits layer diversity throughout the network, thereby striking the right balance between flexibility and search space size. The resulting neural architectures are able to be run relatively faster and using relatively fewer computing resources (e.g., less processing power, less memory usage, less power consumption, etc.), all while remaining competitive with or even exceeding the performance (e.g., accuracy) of current state-of-the-art mobile-optimized models.

Type: Application

Filed: October 6, 2021

Publication date: March 31, 2022

Inventors: Mingxing Tan, Quoc V. Le, Bo Chen, Vijay Vasudevan, Ruoming Pang
HARDWARE-OPTIMIZED NEURAL ARCHITECTURE SEARCH

Publication number: 20220019869

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining an architecture for a task neural network that is configured to perform a particular machine learning task on a target set of hardware resources. When deployed on a target set of hardware, such as a collection of datacenter accelerators, the task neural network may be capable of performing the particular machine learning task with enhanced accuracy and speed.

Type: Application

Filed: September 30, 2020

Publication date: January 20, 2022

Inventors: Sheng Li, Norman Paul Jouppi, Quoc V. Le, Mingxing Tan, Ruoming Pang, Liqun Cheng, Andrew Li
Emitting Word Timings with End-to-End Models

Publication number: 20210350794

Abstract: A method includes receiving a training example that includes audio data representing a spoken utterance and a ground truth transcription. For each word in the spoken utterance, the method also includes inserting a placeholder symbol before the respective word identifying a respective ground truth alignment for a beginning and an end of the respective word, determining a beginning word piece and an ending word piece, and generating a first constrained alignment for the beginning word piece and a second constrained alignment for the ending word piece. The first constrained alignment is aligned with the ground truth alignment for the beginning of the respective word and the second constrained alignment is aligned with the ground truth alignment for the ending of the respective word. The method also includes constraining an attention head of a second pass decoder by applying the first and second constrained alignments.

Type: Application

Filed: March 17, 2021

Publication date: November 11, 2021

Applicant: Google LLC

Inventors: Tara N. Sainath, Basi Garcia, David Rybach, Trevor Strohman, Ruoming Pang
SYNTHESIZING SPEECH FROM TEXT USING NEURAL NETWORKS

Publication number: 20210295858

Abstract: Methods, systems, and computer program products for generating, from an input character sequence, an output sequence of audio data representing the input character sequence. The output sequence of audio data includes a respective audio output sample for each of a number of time steps. One example method includes, for each of the time steps: generating a mel-frequency spectrogram for the time step by processing a representation of a respective portion of the input character sequence using a decoder neural network; generating a probability distribution over a plurality of possible audio output samples for the time step by processing the mel-frequency spectrogram for the time step using a vocoder neural network; and selecting the audio output sample for the time step from the possible audio output samples in accordance with the probability distribution.

Type: Application

Filed: April 5, 2021

Publication date: September 23, 2021

Inventors: Yonghui Wu, Jonathan Shen, Ruoming Pang, Ron J. Weiss, Michael Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, Russell John Wyatt Skerry-Ryan, Ryan M. Rifkin, Ioannis Agiomyrgiannakis
Deliberation Model-Based Two-Pass End-To-End Speech Recognition

Publication number: 20210225369

Abstract: A method of performing speech recognition using a two-pass deliberation architecture includes receiving a first-pass hypothesis and an encoded acoustic frame and encoding the first-pass hypothesis at a hypothesis encoder. The first-pass hypothesis is generated by a recurrent neural network (RNN) decoder model for the encoded acoustic frame. The method also includes generating, using a first attention mechanism attending to the encoded acoustic frame, a first context vector, and generating, using a second attention mechanism attending to the encoded first-pass hypothesis, a second context vector.

Type: Application

Filed: January 14, 2021

Publication date: July 22, 2021

Applicant: Google LLC

Inventors: Ke Hu, Tara N. Sainath, Ruoming Pang, Rohit Prakash Prabhavalkar
Synthesis of Speech from Text in a Voice of a Target Speaker Using Neural Networks

Publication number: 20210217404

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech synthesis. The methods, systems, and apparatus include actions of obtaining an audio representation of speech of a target speaker, obtaining input text for which speech is to be synthesized in a voice of the target speaker, generating a speaker vector by providing the audio representation to a speaker encoder engine that is trained to distinguish speakers from one another, generating an audio representation of the input text spoken in the voice of the target speaker by providing the input text and the speaker vector to a spectrogram generation engine that is trained using voices of reference speakers to generate audio representations, and providing the audio representation of the input text spoken in the voice of the target speaker for output.

Type: Application

Filed: May 17, 2019

Publication date: July 15, 2021

Applicant: Google LLC

Inventors: Ye Jia, Zhifeng Chen, Yonghui Wu, Jonathan Shen, Ruoming Pang, Ron J. Weiss, Ignacio Lopez Moreno, Fei Ren, Yu Zhang, Quan Wang, Patrick Nguyen
Synthesizing speech from text using neural networks

Patent number: 10971170

Abstract: Methods, systems, and computer program products for generating, from an input character sequence, an output sequence of audio data representing the input character sequence. The output sequence of audio data includes a respective audio output sample for each of a number of time steps. One example method includes, for each of the time steps: generating a mel-frequency spectrogram for the time step by processing a representation of a respective portion of the input character sequence using a decoder neural network; generating a probability distribution over a plurality of possible audio output samples for the time step by processing the mel-frequency spectrogram for the time step using a vocoder neural network; and selecting the audio output sample for the time step from the possible audio output samples in accordance with the probability distribution.

Type: Grant

Filed: August 8, 2018

Date of Patent: April 6, 2021

Assignee: Google LLC

Inventors: Yonghui Wu, Jonathan Shen, Ruoming Pang, Ron J. Weiss, Michael Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, Russell John Wyatt Skerry-Ryan, Ryan M. Rifkin, Ioannis Agiomyrgiannakis
Using Context Information With End-to-End Models for Speech Recognition

Publication number: 20200357388

Abstract: A method includes receiving audio data encoding an utterance, processing, using a speech recognition model, the audio data to generate speech recognition scores for speech elements, and determining context scores for the speech elements based on context data indicating a context for the utterance. The method also includes executing, using the speech recognition scores and the context scores, a beam search decoding process to determine one or more candidate transcriptions for the utterance. The method also includes selecting a transcription for the utterance from the one or more candidate transcriptions.

Type: Application

Filed: March 24, 2020

Publication date: November 12, 2020

Applicant: Google LLC

Inventors: Ding Zhao, Bo Li, Ruoming Pang, Tara N. Sainath, David Rybach, Deepti Bhatia, Zelin Wu
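
The abstract above describes combining speech recognition scores with context scores during beam search. The following is a minimal sketch of that idea only, not the method claimed in the application. It assumes the recognizer's scores are per-step log-probabilities and that the context score is a plain additive bonus. The names `beam_search`, `context_bonus`, and `weight`, and the additive combination itself, are illustrative assumptions.

```python
import math


def beam_search(step_scores, context_bonus, beam_size=2, weight=1.0):
    """Toy beam search that adds a context score to each recognition score.

    step_scores: list of dicts, one per output step, mapping a candidate
        token to its log-probability (stand-in for the recognizer's scores).
    context_bonus: hypothetical callable (prefix, token) -> float giving an
        additive score for tokens favored by the current context.
    """
    beams = [((), 0.0)]  # (token prefix, accumulated score)
    for scores in step_scores:
        candidates = []
        for prefix, total in beams:
            for token, logp in scores.items():
                bonus = weight * context_bonus(prefix, token)
                candidates.append((prefix + (token,), total + logp + bonus))
        # Keep only the highest-scoring hypotheses for the next step.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    return beams


# Assumed context: the user's contact list contains "john".
contacts = {"john"}


def contact_bonus(prefix, token):
    return 0.5 if token in contacts else 0.0


steps = [
    {"call": math.log(0.6), "tall": math.log(0.4)},
    {"jon": math.log(0.55), "john": math.log(0.45)},
]
best_with_context = beam_search(steps, contact_bonus)[0][0]
best_without_context = beam_search(steps, contact_bonus, weight=0.0)[0][0]
```

In this toy input the acoustically likelier "jon" wins when the context weight is zero, while the contact-list bonus promotes "john" when it is applied. The final selection step in the abstract corresponds to taking the first entry of the returned beam.
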
Neural Architecture Search with Factorized Hierarchical Search Space

Publication number: 20200143227

Abstract: The present disclosure is directed to an automated neural architecture search approach for designing new neural network architectures such as, for example, resource-constrained mobile CNN models. In particular, the present disclosure provides systems and methods to perform neural architecture search using a novel factorized hierarchical search space that permits layer diversity throughout the network, thereby striking the right balance between flexibility and search space size. The resulting neural architectures are able to be run relatively faster and using relatively fewer computing resources (e.g., less processing power, less memory usage, less power consumption, etc.), all while remaining competitive with or even exceeding the performance (e.g., accuracy) of current state-of-the-art mobile-optimized models.

Type: Application

Filed: January 28, 2019

Publication date: May 7, 2020

Inventors: Mingxing Tan, Quoc Le, Bo Chen, Vijay Vasudevan, Ruoming Pang
TRAINING MACHINE LEARNING MODELS USING ADAPTIVE TRANSFER LEARNING

Publication number: 20200104710

Abstract: A method for training a target neural network on a target machine learning task is described.

Type: Application

Filed: September 27, 2019

Publication date: April 2, 2020

Inventors: Vijay Vasudevan, Ruoming Pang, Quoc V. Le, Daiyi Peng, Jiquan Ngiam, Simon Kornblith
SYNTHESIZING SPEECH FROM TEXT USING NEURAL NETWORKS

Publication number: 20200051583

Abstract: Methods, systems, and computer program products for generating, from an input character sequence, an output sequence of audio data representing the input character sequence. The output sequence of audio data includes a respective audio output sample for each of a number of time steps. One example method includes, for each of the time steps: generating a mel-frequency spectrogram for the time step by processing a representation of a respective portion of the input character sequence using a decoder neural network; generating a probability distribution over a plurality of possible audio output samples for the time step by processing the mel-frequency spectrogram for the time step using a vocoder neural network; and selecting the audio output sample for the time step from the possible audio output samples in accordance with the probability distribution.

Type: Application

Filed: August 8, 2018

Publication date: February 13, 2020

Inventors: Yonghui Wu, Jonathan Shen, Ruoming Pang, Ron J. Weiss
Efficient indexing and searching of access control listed documents

Patent number: 9479508

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for storing a plurality of documents in computer-readable memory, each document of the plurality of documents having a corresponding access control list (ACL), each ACL defining a plurality of users that are authorized to access a respective document, generating an index based on the plurality of users, the index comprising a plurality of partitions, each partition corresponding to a user of the plurality of users, and, for each document of the plurality of documents: ranking the users of the plurality of users, selecting a user as an indexing user based on the ranking, and storing the document in a partition of the index, the partition corresponding to the indexing user.

Type: Grant

Filed: August 24, 2015

Date of Patent: October 25, 2016

Assignee: Google Inc.

Inventors: Jeffrey Korn, Ruoming Pang, David Held, Dhyanesh Harishchandra Damania
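
The indexing scheme in the abstract above (rank a document's authorized users, pick one as the indexing user, store the document in that user's partition) can be sketched as follows. The abstract does not say how users are ranked, so this sketch assumes a hypothetical ranking by how many documents each user can access, with ties broken by user id. The function name `build_acl_index` and the input layout are also assumptions.

```python
from collections import defaultdict


def build_acl_index(acls):
    """Partition documents by a single "indexing user" chosen per document.

    acls: dict mapping a document id to the non-empty set of user ids
        authorized to access it (the document's ACL).
    Returns a dict mapping each indexing user to the documents stored in
    that user's partition.
    """
    # Assumed ranking signal: number of documents each user can access.
    access_counts = defaultdict(int)
    for users in acls.values():
        for user in users:
            access_counts[user] += 1

    index = defaultdict(list)  # one partition per indexing user
    for doc_id, users in sorted(acls.items()):
        # Rank this document's users; the top-ranked one indexes the document.
        ranked = sorted(users, key=lambda u: (-access_counts[u], u))
        indexing_user = ranked[0]
        index[indexing_user].append(doc_id)
    return dict(index)


acls = {
    "d1": {"alice", "bob"},
    "d2": {"bob"},
    "d3": {"alice", "carol"},
}
index = build_acl_index(acls)
# index == {"alice": ["d1", "d3"], "bob": ["d2"]}
```

Each document is stored exactly once, in one partition, rather than once per authorized user. The abstract does not cover how a search consults the partitions, so that part is left out here.
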
Monitoring of negative feedback systems

Patent number: 9336310

Abstract: A computer-implemented method for identifying on-line comments as being legitimate or illegitimate is disclosed. The method includes receiving a comment directed to a piece of on-line content, randomly determining whether to review the comment manually, and providing for review of information regarding the comment by a manual reviewer if a determination is made to manually review the comment. The chance of determining whether to review the comment manually is dependent on outcomes of prior manual reviews of received comments.

Type: Grant

Filed: July 6, 2010

Date of Patent: May 10, 2016

Assignee: Google Inc.

Inventors: Mark M. Sandler, Nir Ailon, Raoul Sam Daruwala, Ruoming Pang
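
The abstract above describes randomly deciding whether to review a comment manually, with the chance depending on the outcomes of earlier manual reviews. A minimal sketch of that feedback loop follows. The specific update rule (review at the observed rate of illegitimate comments, clamped between a floor and a ceiling) is an assumption made for illustration, as are the class name `ReviewSampler` and its parameters.

```python
import random


class ReviewSampler:
    """Decides at random whether a comment goes to a manual reviewer."""

    def __init__(self, min_rate=0.05, max_rate=1.0, rng=None):
        self.min_rate = min_rate  # assumed floor so sampling never stops
        self.max_rate = max_rate
        self.reviewed = 0         # manual reviews completed so far
        self.illegitimate = 0     # reviews that found an illegitimate comment
        self.rng = rng or random.Random()

    def review_probability(self):
        # With no prior reviews there is no evidence, so review everything.
        if self.reviewed == 0:
            return self.max_rate
        observed = self.illegitimate / self.reviewed
        return min(self.max_rate, max(self.min_rate, observed))

    def should_review(self):
        # Random determination; the chance reflects prior review outcomes.
        return self.rng.random() < self.review_probability()

    def record_outcome(self, was_illegitimate):
        self.reviewed += 1
        if was_illegitimate:
            self.illegitimate += 1


sampler = ReviewSampler(rng=random.Random(0))
for outcome in (True, False, False, False):
    sampler.record_outcome(outcome)
probability_after_four = sampler.review_probability()
```

Under this assumed rule, a run of clean reviews lowers the sampling rate toward the floor, and newly found illegitimate comments raise it again.
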
EFFICIENT INDEXING AND SEARCHING OF ACCESS CONTROL LISTED DOCUMENTS

Publication number: 20150365418

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for storing a plurality of documents in computer-readable memory, each document of the plurality of documents having a corresponding access control list (ACL), each ACL defining a plurality of users that are authorized to access a respective document, generating an index based on the plurality of users, the index comprising a plurality of partitions, each partition corresponding to a user of the plurality of users, and, for each document of the plurality of documents: ranking the users of the plurality of users, selecting a user as an indexing user based on the ranking, and storing the document in a partition of the index, the partition corresponding to the indexing user.

Type: Application

Filed: August 24, 2015

Publication date: December 17, 2015

Inventors: Jeffrey Korn, Ruoming Pang, David Held, Dhyanesh Harishchandra Damania
Efficient indexing and searching of access control listed documents

Patent number: 9152736

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for storing a plurality of documents in computer-readable memory, each document of the plurality of documents having a corresponding access control list (ACL), each ACL defining a plurality of users that are authorized to access a respective document, generating an index based on the plurality of users, the index comprising a plurality of partitions, each partition corresponding to a user of the plurality of users, and, for each document of the plurality of documents: ranking the users of the plurality of users, selecting a user as an indexing user based on the ranking, and storing the document in a partition of the index, the partition corresponding to the indexing user.

Type: Grant

Filed: March 7, 2012

Date of Patent: October 6, 2015

Assignee: Google Inc.

Inventors: Jeffrey Korn, Ruoming Pang, David Held, Dhyanesh Harishchandra Damania
Monitoring of Negative Feedback Systems

Publication number: 20150195295

Abstract: A computer-implemented method for identifying on-line comments as being legitimate or illegitimate is disclosed. The method includes receiving a comment directed to a piece of on-line content, randomly determining whether to review the comment manually, and providing for review of information regarding the comment by a manual reviewer if a determination is made to manually review the comment. The chance of determining whether to review the comment manually is dependent on outcomes of prior manual reviews of received comments.

Type: Application

Filed: July 6, 2010

Publication date: July 9, 2015

Inventors: Mark M. Sandler, Nir Ailon, Raoul Sam Daruwala, Ruoming Pang
