Patents by Inventor Ruoming Pang
Ruoming Pang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20220207321
Abstract: Systems and methods can utilize a conformer model to process a data set for various data processing tasks, including, but not limited to, speech recognition, sound separation, protein synthesis determination, video or other image set analysis, and natural language processing. The conformer model can use feed-forward blocks, a self-attention block, and a convolution block to process data to learn global interactions and relative-offset-based local correlations of the input data.
Type: Application
Filed: December 31, 2020
Publication date: June 30, 2022
Inventors: Anmol Gulati, Ruoming Pang, Niki Parmar, Jiahui Yu, Wei Han, Chung-Cheng Chiu, Yu Zhang, Yonghui Wu, Shibo Wang, Weikeng Qin, Zhengdong Zhang
-
Publication number: 20220189456
Abstract: A linguistic content and speaking style disentanglement model includes a content encoder, a style encoder, and a decoder. The content encoder is configured to receive input speech as input and generate a latent representation of linguistic content for the input speech as output. The content encoder is trained to disentangle speaking style information from the latent representation of linguistic content. The style encoder is configured to receive the input speech as input and generate a latent representation of speaking style for the input speech as output. The style encoder is trained to disentangle linguistic content information from the latent representation of speaking style. The decoder is configured to generate output speech based on the latent representation of linguistic content for the input speech and the latent representation of speaking style for the same or different input speech.
Type: Application
Filed: November 18, 2021
Publication date: June 16, 2022
Applicant: Google LLC
Inventors: Ruoming Pang, Andros Tjandra, Yu Zhang, Shigeki Karita
-
Publication number: 20220122586
Abstract: A computer-implemented method of training a streaming speech recognition model includes receiving, as input to the streaming speech recognition model, a sequence of acoustic frames. The streaming speech recognition model is configured to learn an alignment probability between the sequence of acoustic frames and an output sequence of vocabulary tokens. The vocabulary tokens include a plurality of label tokens and a blank token. At each output step, the method includes determining a first probability of emitting one of the label tokens and determining a second probability of emitting the blank token. The method also includes generating the alignment probability at a sequence level based on the first probability and the second probability. The method also includes applying a tuning parameter to the alignment probability at the sequence level to maximize the first probability of emitting one of the label tokens.
Type: Application
Filed: September 9, 2021
Publication date: April 21, 2022
Applicant: Google LLC
Inventors: Jiahui Yu, Chung-cheng Chiu, Bo Li, Shuo-yiin Chang, Tara Sainath, Wei Han, Anmol Gulati, Yanzhang He, Arun Narayanan, Yonghui Wu, Ruoming Pang
-
Publication number: 20220122622
Abstract: An automated speech recognition (ASR) model includes a first encoder, a second encoder, and a decoder. The first encoder receives, as input, a sequence of acoustic frames, and generates, at each of a plurality of output steps, a first higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. The second encoder receives, as input, the first higher order feature representation generated by the first encoder at each of the plurality of output steps, and generates, at each of the plurality of output steps, a second higher order feature representation for a corresponding first higher order feature frame. The decoder receives, as input, the second higher order feature representation generated by the second encoder at each of the plurality of output steps, and generates, at each of the plurality of time steps, a first probability distribution over possible speech recognition hypotheses.
Type: Application
Filed: April 21, 2021
Publication date: April 21, 2022
Applicant: Google LLC
Inventors: Arun Narayanan, Tara Sainath, Chung-Cheng Chiu, Ruoming Pang, Rohit Prabhavalkar, Jiahui Yu, Ehsan Variani, Trevor Strohman
-
Publication number: 20220101090
Abstract: The present disclosure is directed to an automated neural architecture search approach for designing new neural network architectures such as, for example, resource-constrained mobile CNN models. In particular, the present disclosure provides systems and methods to perform neural architecture search using a novel factorized hierarchical search space that permits layer diversity throughout the network, thereby striking the right balance between flexibility and search space size. The resulting neural architectures are able to be run relatively faster and using relatively fewer computing resources (e.g., less processing power, less memory usage, less power consumption, etc.), all while remaining competitive with or even exceeding the performance (e.g., accuracy) of current state-of-the-art mobile-optimized models.
Type: Application
Filed: October 6, 2021
Publication date: March 31, 2022
Inventors: Mingxing Tan, Quoc V. Le, Bo Chen, Vijay Vasudevan, Ruoming Pang
-
Publication number: 20220019869
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for determining an architecture for a task neural network that is configured to perform a particular machine learning task on a target set of hardware resources. When deployed on a target set of hardware, such as a collection of datacenter accelerators, the task neural network may be capable of performing the particular machine learning task with enhanced accuracy and speed.
Type: Application
Filed: September 30, 2020
Publication date: January 20, 2022
Inventors: Sheng Li, Norman Paul Jouppi, Quoc V. Le, Mingxing Tan, Ruoming Pang, Liqun Cheng, Andrew Li
-
Publication number: 20210350794
Abstract: A method includes receiving a training example that includes audio data representing a spoken utterance and a ground truth transcription. For each word in the spoken utterance, the method also includes inserting a placeholder symbol before the respective word identifying a respective ground truth alignment for a beginning and an end of the respective word, determining a beginning word piece and an ending word piece, and generating a first constrained alignment for the beginning word piece and a second constrained alignment for the ending word piece. The first constrained alignment is aligned with the ground truth alignment for the beginning of the respective word and the second constrained alignment is aligned with the ground truth alignment for the ending of the respective word. The method also includes constraining an attention head of a second pass decoder by applying the first and second constrained alignments.
Type: Application
Filed: March 17, 2021
Publication date: November 11, 2021
Applicant: Google LLC
Inventors: Tara N. Sainath, Basi Garcia, David Rybach, Trevor Strohman, Ruoming Pang
-
Publication number: 20210295858
Abstract: Methods, systems, and computer program products for generating, from an input character sequence, an output sequence of audio data representing the input character sequence. The output sequence of audio data includes a respective audio output sample for each of a number of time steps. One example method includes, for each of the time steps: generating a mel-frequency spectrogram for the time step by processing a representation of a respective portion of the input character sequence using a decoder neural network; generating a probability distribution over a plurality of possible audio output samples for the time step by processing the mel-frequency spectrogram for the time step using a vocoder neural network; and selecting the audio output sample for the time step from the possible audio output samples in accordance with the probability distribution.
Type: Application
Filed: April 5, 2021
Publication date: September 23, 2021
Inventors: Yonghui Wu, Jonathan Shen, Ruoming Pang, Ron J. Weiss, Michael Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, Russell John Wyatt Skerry-Ryan, Ryan M. Rifkin, Ioannis Agiomyrgiannakis
-
Publication number: 20210225369
Abstract: A method of performing speech recognition using a two-pass deliberation architecture includes receiving a first-pass hypothesis and an encoded acoustic frame and encoding the first-pass hypothesis at a hypothesis encoder. The first-pass hypothesis is generated by a recurrent neural network (RNN) decoder model for the encoded acoustic frame. The method also includes generating, using a first attention mechanism attending to the encoded acoustic frame, a first context vector, and generating, using a second attention mechanism attending to the encoded first-pass hypothesis, a second context vector.
Type: Application
Filed: January 14, 2021
Publication date: July 22, 2021
Applicant: Google LLC
Inventors: Ke Hu, Tara N. Sainath, Ruoming Pang, Rohit Prakash Prabhavalkar
-
Publication number: 20210217404
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech synthesis. The methods, systems, and apparatus include actions of obtaining an audio representation of speech of a target speaker, obtaining input text for which speech is to be synthesized in a voice of the target speaker, generating a speaker vector by providing the audio representation to a speaker encoder engine that is trained to distinguish speakers from one another, generating an audio representation of the input text spoken in the voice of the target speaker by providing the input text and the speaker vector to a spectrogram generation engine that is trained using voices of reference speakers to generate audio representations, and providing the audio representation of the input text spoken in the voice of the target speaker for output.
Type: Application
Filed: May 17, 2019
Publication date: July 15, 2021
Applicant: Google LLC
Inventors: Ye Jia, Zhifeng Chen, Yonghui Wu, Jonathan Shen, Ruoming Pang, Ron J. Weiss, Ignacio Lopez Moreno, Fei Ren, Yu Zhang, Quan Wang, Patrick Nguyen
-
Patent number: 10971170
Abstract: Methods, systems, and computer program products for generating, from an input character sequence, an output sequence of audio data representing the input character sequence. The output sequence of audio data includes a respective audio output sample for each of a number of time steps. One example method includes, for each of the time steps: generating a mel-frequency spectrogram for the time step by processing a representation of a respective portion of the input character sequence using a decoder neural network; generating a probability distribution over a plurality of possible audio output samples for the time step by processing the mel-frequency spectrogram for the time step using a vocoder neural network; and selecting the audio output sample for the time step from the possible audio output samples in accordance with the probability distribution.
Type: Grant
Filed: August 8, 2018
Date of Patent: April 6, 2021
Assignee: Google LLC
Inventors: Yonghui Wu, Jonathan Shen, Ruoming Pang, Ron J. Weiss, Michael Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, Russell John Wyatt Skerry-Ryan, Ryan M. Rifkin, Ioannis Agiomyrgiannakis
-
Publication number: 20200357388
Abstract: A method includes receiving audio data encoding an utterance, processing, using a speech recognition model, the audio data to generate speech recognition scores for speech elements, and determining context scores for the speech elements based on context data indicating a context for the utterance. The method also includes executing, using the speech recognition scores and the context scores, a beam search decoding process to determine one or more candidate transcriptions for the utterance. The method also includes selecting a transcription for the utterance from the one or more candidate transcriptions.
Type: Application
Filed: March 24, 2020
Publication date: November 12, 2020
Applicant: Google LLC
Inventors: Ding Zhao, Bo Li, Ruoming Pang, Tara N. Sainath, David Rybach, Deepti Bhatia, Zelin Wu
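Illustration (not part of the patent record): the abstract above combines speech recognition scores with context scores inside a beam search. The following is a minimal sketch of that idea, assuming log-probability recognition scores, an additive context bonus, and a fixed beam width. The names `beam_search`, `contact_bonus`, and `CONTACTS`, and all of the scores, are hypothetical and are not taken from the application.

```python
import math

def beam_search(step_scores, context_score, beam_width=2):
    """Keep the best `beam_width` partial transcriptions at each step.

    step_scores: list of dicts mapping a speech element to its
        recognition log-probability at that output step.
    context_score: function (prefix, element) -> additive bonus.
        Adding the two scores is an assumption of this sketch.
    """
    beams = [((), 0.0)]  # (prefix of elements, accumulated score)
    for scores in step_scores:
        candidates = []
        for prefix, total in beams:
            for element, rec_score in scores.items():
                combined = total + rec_score + context_score(prefix, element)
                candidates.append((prefix + (element,), combined))
        # Highest combined score first, then prune to the beam width.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams

# Hypothetical context data: the user's contact list contains "mom".
CONTACTS = {"mom"}

def contact_bonus(prefix, element):
    # Boost elements that match the context; the value 1.0 is arbitrary.
    return 1.0 if element in CONTACTS else 0.0

step_scores = [
    {"call": math.log(0.6), "tall": math.log(0.4)},
    {"mom": math.log(0.5), "tom": math.log(0.5)},
]
candidates = beam_search(step_scores, contact_bonus)
best_transcription = " ".join(candidates[0][0])
```

In this toy input, "mom" and "tom" have equal recognition scores, so the context bonus alone decides between them, and the final selection step simply takes the top-scoring candidate.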
-
Publication number: 20200143227
Abstract: The present disclosure is directed to an automated neural architecture search approach for designing new neural network architectures such as, for example, resource-constrained mobile CNN models. In particular, the present disclosure provides systems and methods to perform neural architecture search using a novel factorized hierarchical search space that permits layer diversity throughout the network, thereby striking the right balance between flexibility and search space size. The resulting neural architectures are able to be run relatively faster and using relatively fewer computing resources (e.g., less processing power, less memory usage, less power consumption, etc.), all while remaining competitive with or even exceeding the performance (e.g., accuracy) of current state-of-the-art mobile-optimized models.
Type: Application
Filed: January 28, 2019
Publication date: May 7, 2020
Inventors: Mingxing Tan, Quoc Le, Bo Chen, Vijay Vasudevan, Ruoming Pang
-
Publication number: 20200104710
Abstract: A method for training a target neural network on a target machine learning task is described.
Type: Application
Filed: September 27, 2019
Publication date: April 2, 2020
Inventors: Vijay Vasudevan, Ruoming Pang, Quoc V. Le, Daiyi Peng, Jiquan Ngiam, Simon Kornblith
-
Publication number: 20200051583
Abstract: Methods, systems, and computer program products for generating, from an input character sequence, an output sequence of audio data representing the input character sequence. The output sequence of audio data includes a respective audio output sample for each of a number of time steps. One example method includes, for each of the time steps: generating a mel-frequency spectrogram for the time step by processing a representation of a respective portion of the input character sequence using a decoder neural network; generating a probability distribution over a plurality of possible audio output samples for the time step by processing the mel-frequency spectrogram for the time step using a vocoder neural network; and selecting the audio output sample for the time step from the possible audio output samples in accordance with the probability distribution.
Type: Application
Filed: August 8, 2018
Publication date: February 13, 2020
Inventors: Yonghui Wu, Jonathan Shen, Ruoming Pang, Ron J. Weiss
-
Patent number: 9479508
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for storing a plurality of documents in computer-readable memory, each document of the plurality of documents having a corresponding access control list (ACL), each ACL defining a plurality of users that are authorized to access a respective document, generating an index based on the plurality of users, the index comprising a plurality of partitions, each partition corresponding to a user of the plurality of users, and, for each document of the plurality of documents: ranking the users of the plurality of users, selecting a user as an indexing user based on the ranking, and storing the document in a partition of the index, the partition corresponding to the indexing user.
Type: Grant
Filed: August 24, 2015
Date of Patent: October 25, 2016
Assignee: Google Inc.
Inventors: Jeffrey Korn, Ruoming Pang, David Held, Dhyanesh Harishchandra Damania
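Illustration (not part of the patent record): the abstract above stores each document in the index partition of one "indexing user" chosen by ranking the users on its ACL. The sketch below shows that flow under a loud assumption: the abstract does not state the ranking criterion, so users are ranked here by how few documents they can access, with the user name as a tie-break. The function name `build_index` and the sample data are hypothetical.

```python
from collections import Counter, defaultdict

def build_index(acls):
    """Assign each document to the partition of one indexing user.

    acls: dict mapping a document id to the set of users authorized
        to access it (each set is assumed to be non-empty).
    Returns a dict mapping a user to the document ids in that
    user's partition.
    """
    # Count how many documents each user is authorized to access.
    access_counts = Counter(user for users in acls.values() for user in users)
    index = defaultdict(list)
    for doc_id in sorted(acls):
        # Assumed ranking: fewest accessible documents first, then name.
        ranked = sorted(acls[doc_id], key=lambda u: (access_counts[u], u))
        indexing_user = ranked[0]
        index[indexing_user].append(doc_id)
    return dict(index)

acls = {
    "doc1": {"alice", "bob"},
    "doc2": {"alice"},
    "doc3": {"alice", "carol"},
}
index = build_index(acls)
```

With this assumed ranking, documents shared with a heavy user such as "alice" tend to land in the partitions of lighter users, which keeps any single partition from growing disproportionately. Other ranking criteria would fit the abstract equally well.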
-
Patent number: 9336310
Abstract: A computer-implemented method for identifying on-line comments as being legitimate or illegitimate is disclosed. The method includes receiving a comment directed to a piece of on-line content, randomly determining whether to review the comment manually, and providing for review of information regarding the comment by a manual reviewer if a determination is made to manually review the comment. The chance of determining whether to review the comment manually is dependent on outcomes of prior manual reviews of received comments.
Type: Grant
Filed: July 6, 2010
Date of Patent: May 10, 2016
Assignee: Google Inc.
Inventors: Mark M. Sandler, Nir Ailon, Raoul Sam Daruwala, Ruoming Pang
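Illustration (not part of the patent record): the abstract above describes randomly choosing comments for manual review, with a review chance that depends on earlier review outcomes. The sketch below is one possible reading, assuming a simple additive update: the chance rises after an illegitimate finding and falls after a legitimate one. The class name `ReviewSampler`, the step size, and the bounds are all hypothetical.

```python
import random

class ReviewSampler:
    """Randomly select comments for manual review.

    The update rule (fixed step up or down, clamped to a range) is an
    assumption of this sketch, not the method claimed in the patent.
    """

    def __init__(self, probability=0.5, step=0.1, floor=0.05,
                 ceiling=1.0, seed=None):
        self.probability = probability
        self.step = step
        self.floor = floor
        self.ceiling = ceiling
        self._rng = random.Random(seed)

    def should_review(self):
        # Random decision at the current review probability.
        return self._rng.random() < self.probability

    def record_outcome(self, was_legitimate):
        # Legitimate findings lower the review chance; illegitimate raise it.
        if was_legitimate:
            self.probability = max(self.floor, self.probability - self.step)
        else:
            self.probability = min(self.ceiling, self.probability + self.step)

sampler = ReviewSampler(probability=0.5, step=0.1, seed=0)
sampler.record_outcome(was_legitimate=False)  # chance rises to about 0.6
after_bad_review = sampler.probability
sampler.record_outcome(was_legitimate=True)   # back to about 0.5
sampler.record_outcome(was_legitimate=True)   # down to about 0.4
after_good_reviews = sampler.probability
```

Keeping a nonzero floor means some comments are always sampled, so the reviewer can still notice a change in comment quality after a long run of legitimate findings.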
-
Publication number: 20150365418
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for storing a plurality of documents in computer-readable memory, each document of the plurality of documents having a corresponding access control list (ACL), each ACL defining a plurality of users that are authorized to access a respective document, generating an index based on the plurality of users, the index comprising a plurality of partitions, each partition corresponding to a user of the plurality of users, and, for each document of the plurality of documents: ranking the users of the plurality of users, selecting a user as an indexing user based on the ranking, and storing the document in a partition of the index, the partition corresponding to the indexing user.
Type: Application
Filed: August 24, 2015
Publication date: December 17, 2015
Inventors: Jeffrey Korn, Ruoming Pang, David Held, Dhyanesh Harishchandra Damania
-
Patent number: 9152736
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for storing a plurality of documents in computer-readable memory, each document of the plurality of documents having a corresponding access control list (ACL), each ACL defining a plurality of users that are authorized to access a respective document, generating an index based on the plurality of users, the index comprising a plurality of partitions, each partition corresponding to a user of the plurality of users, and, for each document of the plurality of documents: ranking the users of the plurality of users, selecting a user as an indexing user based on the ranking, and storing the document in a partition of the index, the partition corresponding to the indexing user.
Type: Grant
Filed: March 7, 2012
Date of Patent: October 6, 2015
Assignee: Google Inc.
Inventors: Jeffrey Korn, Ruoming Pang, David Held, Dhyanesh Harishchandra Damania
-
Publication number: 20150195295
Abstract: A computer-implemented method for identifying on-line comments as being legitimate or illegitimate is disclosed. The method includes receiving a comment directed to a piece of on-line content, randomly determining whether to review the comment manually, and providing for review of information regarding the comment by a manual reviewer if a determination is made to manually review the comment. The chance of determining whether to review the comment manually is dependent on outcomes of prior manual reviews of received comments.
Type: Application
Filed: July 6, 2010
Publication date: July 9, 2015
Inventors: Mark M. Sandler, Nir Ailon, Raoul Sam Daruwala, Ruoming Pang