Patents by Inventor Navdeep Jaitly

Navdeep Jaitly has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

END-TO-END TEXT-TO-SPEECH CONVERSION

Publication number: 20250078809

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating speech from text. One of the systems includes one or more computers and one or more storage devices storing instructions that when executed by one or more computers cause the one or more computers to implement: a sequence-to-sequence recurrent neural network configured to: receive a sequence of characters in a particular natural language, and process the sequence of characters to generate a spectrogram of a verbal utterance of the sequence of characters in the particular natural language; and a subsystem configured to: receive the sequence of characters in the particular natural language, and provide the sequence of characters as input to the sequence-to-sequence recurrent neural network to obtain as output the spectrogram of the verbal utterance of the sequence of characters in the particular natural language.

Type: Application

Filed: November 18, 2024

Publication date: March 6, 2025

Inventors: Samuel Bengio, Yuxuan Wang, Zongheng Yang, Zhifeng Chen, Yonghui Wu, Ioannis Agiomyrgiannakis, Ron J. Weiss, Navdeep Jaitly, Ryan M. Rifkin, Robert Andrew James Clark, Quoc V. Le, Russell J. Ryan, Ying Xiao
Sequence modeling using imputation

Patent number: 12242818

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for sequence modeling. One of the methods includes receiving an input sequence having a plurality of input positions; determining a plurality of blocks of consecutive input positions; processing the input sequence using a neural network to generate a latent alignment, comprising, at each of a plurality of input time steps: receiving a partial latent alignment from a previous input time step; selecting an input position in each block, wherein the token at the selected input position of the partial latent alignment in each block is a mask token; and processing the partial latent alignment and the input sequence using the neural network to generate a new latent alignment, wherein the new latent alignment comprises, at the selected input position in each block, an output token or a blank token; and generating, using the latent alignment, an output sequence.

Type: Grant

Filed: February 8, 2021

Date of Patent: March 4, 2025

Assignee: Google LLC

Inventors: William Chan, Chitwan Saharia, Geoffrey E. Hinton, Mohammad Norouzi, Navdeep Jaitly
End-to-end text-to-speech conversion

Patent number: 12190860

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating speech from text. One of the systems includes one or more computers and one or more storage devices storing instructions that when executed by one or more computers cause the one or more computers to implement: a sequence-to-sequence recurrent neural network configured to: receive a sequence of characters in a particular natural language, and process the sequence of characters to generate a spectrogram of a verbal utterance of the sequence of characters in the particular natural language; and a subsystem configured to: receive the sequence of characters in the particular natural language, and provide the sequence of characters as input to the sequence-to-sequence recurrent neural network to obtain as output the spectrogram of the verbal utterance of the sequence of characters in the particular natural language.

Type: Grant

Filed: November 21, 2023

Date of Patent: January 7, 2025

Assignee: Google LLC

Inventors: Samuel Bengio, Yuxuan Wang, Zongheng Yang, Zhifeng Chen, Yonghui Wu, Ioannis Agiomyrgiannakis, Ron J. Weiss, Navdeep Jaitly, Ryan M. Rifkin, Robert Andrew James Clark, Quoc V. Le, Russell J. Ryan, Ying Xiao
SPEECH RECOGNITION WITH SEQUENCE-TO-SEQUENCE MODELS

Publication number: 20240420686

Abstract: A method for performing speech recognition using sequence-to-sequence models includes receiving audio data for an utterance and providing features indicative of acoustic characteristics of the utterance as input to an encoder. The method also includes processing an output of the encoder using an attender to generate a context vector, generating speech recognition scores using the context vector and a decoder trained using a training process, and generating a transcription for the utterance using word elements selected based on the speech recognition scores. The transcription is provided as an output of the ASR system.

Type: Application

Filed: August 26, 2024

Publication date: December 19, 2024

Applicant: Google LLC

Inventors: Rohit Prakash Prabhavalkar, Zhifeng Chen, Bo Li, Chung-Cheng Chiu, Kanury Kanishka Rao, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Michiel A. U. Bacchiani, Tara N. Sainath, Jan Kazimierz Chorowski, Anjuli Patricia Kannan, Ekaterina Gonina, Patrick An Phu Nguyen
Synthesizing speech from text using neural networks

Patent number: 12148444

Abstract: Methods, systems, and computer program products for generating, from an input character sequence, an output sequence of audio data representing the input character sequence. The output sequence of audio data includes a respective audio output sample for each of a number of time steps. One example method includes, for each of the time steps: generating a mel-frequency spectrogram for the time step by processing a representation of a respective portion of the input character sequence using a decoder neural network; generating a probability distribution over a plurality of possible audio output samples for the time step by processing the mel-frequency spectrogram for the time step using a vocoder neural network; and selecting the audio output sample for the time step from the possible audio output samples in accordance with the probability distribution.

Type: Grant

Filed: April 5, 2021

Date of Patent: November 19, 2024

Assignee: Google LLC

Inventors: Yonghui Wu, Jonathan Shen, Ruoming Pang, Ron J. Weiss, Michael Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, Russell John Wyatt Skerry-Ryan, Ryan M. Rifkin, Ioannis Agiomyrgiannakis
Speech recognition with sequence-to-sequence models

Patent number: 12106749

Abstract: A method for performing speech recognition using sequence-to-sequence models includes receiving audio data for an utterance and providing features indicative of acoustic characteristics of the utterance as input to an encoder. The method also includes processing an output of the encoder using an attender to generate a context vector, generating speech recognition scores using the context vector and a decoder trained using a training process, and generating a transcription for the utterance using word elements selected based on the speech recognition scores. The transcription is provided as an output of the ASR system.

Type: Grant

Filed: September 20, 2021

Date of Patent: October 1, 2024

Assignee: Google LLC

Inventors: Rohit Prakash Prabhavalkar, Zhifeng Chen, Bo Li, Chung-cheng Chiu, Kanury Kanishka Rao, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Michiel A. u. Bacchiani, Tara N. Sainath, Jan Kazimierz Chorowski, Anjuli Patricia Kannan, Ekaterina Gonina, Patrick An Phu Nguyen
Speech recognition with attention-based recurrent neural networks

Patent number: 12100391

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for speech recognition. One method includes obtaining an input acoustic sequence, the input acoustic sequence representing an utterance, and the input acoustic sequence comprising a respective acoustic feature representation at each of a first number of time steps; processing the input acoustic sequence using a first neural network to convert the input acoustic sequence into an alternative representation for the input acoustic sequence; processing the alternative representation for the input acoustic sequence using an attention-based Recurrent Neural Network (RNN) to generate, for each position in an output sequence order, a set of substring scores that includes a respective substring score for each substring in a set of substrings; and generating a sequence of substrings that represent a transcription of the utterance.

Type: Grant

Filed: October 7, 2021

Date of Patent: September 24, 2024

Assignee: Google LLC

Inventors: William Chan, Navdeep Jaitly, Quoc V. Le, Oriol Vinyals, Noam M. Shazeer
END-TO-END TEXT-TO-SPEECH CONVERSION

Publication number: 20240127791

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating speech from text. One of the systems includes one or more computers and one or more storage devices storing instructions that when executed by one or more computers cause the one or more computers to implement: a sequence-to-sequence recurrent neural network configured to: receive a sequence of characters in a particular natural language, and process the sequence of characters to generate a spectrogram of a verbal utterance of the sequence of characters in the particular natural language; and a subsystem configured to: receive the sequence of characters in the particular natural language, and provide the sequence of characters as input to the sequence-to-sequence recurrent neural network to obtain as output the spectrogram of the verbal utterance of the sequence of characters in the particular natural language.

Type: Application

Filed: November 21, 2023

Publication date: April 18, 2024

Inventors: Samuel Bengio, Yuxuan Wang, Zongheng Yang, Zhifeng Chen, Yonghui Wu, Ioannis Agiomyrgiannakis, Ron J. Weiss, Navdeep Jaitly, Ryan M. Rifkin, Robert Andrew James Clark, Quoc V. Le, Russell J. Ryan, Ying Xiao
Training recurrent neural networks to generate sequences

Patent number: 11954594

Abstract: This document generally describes a neural network training system, including one or more computers, that trains a recurrent neural network (RNN) to receive an input, e.g., an input sequence, and to generate a sequence of outputs from the input sequence. In some implementations, training can include, for each position after an initial position in a training target sequence, selecting a preceding output of the RNN to provide as input to the RNN at the position, including determining whether to select as the preceding output (i) a true output in a preceding position in the output order or (ii) a value derived from an output of the RNN for the preceding position in an output order generated in accordance with current values of the parameters of the recurrent neural network.

Type: Grant

Filed: May 10, 2021

Date of Patent: April 9, 2024

Assignee: Google LLC

Inventors: Samy Bengio, Oriol Vinyals, Navdeep Jaitly, Noam M. Shazeer
End-to-end text-to-speech conversion

Patent number: 11862142

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating speech from text. One of the systems includes one or more computers and one or more storage devices storing instructions that when executed by one or more computers cause the one or more computers to implement: a sequence-to-sequence recurrent neural network configured to: receive a sequence of characters in a particular natural language, and process the sequence of characters to generate a spectrogram of a verbal utterance of the sequence of characters in the particular natural language; and a subsystem configured to: receive the sequence of characters in the particular natural language, and provide the sequence of characters as input to the sequence-to-sequence recurrent neural network to obtain as output the spectrogram of the verbal utterance of the sequence of characters in the particular natural language.

Type: Grant

Filed: August 2, 2021

Date of Patent: January 2, 2024

Assignee: Google LLC

Inventors: Samuel Bengio, Yuxuan Wang, Zongheng Yang, Zhifeng Chen, Yonghui Wu, Ioannis Agiomyrgiannakis, Ron J. Weiss, Navdeep Jaitly, Ryan M. Rifkin, Robert Andrew James Clark, Quoc V. Le, Russell J. Ryan, Ying Xiao
GENERATING STRUCTURED TEXT CONTENT USING SPEECH RECOGNITION MODELS

Publication number: 20230386652

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for speech recognition. One method includes obtaining an input acoustic sequence, the input acoustic sequence representing one or more utterances; processing the input acoustic sequence using a speech recognition model to generate a transcription of the input acoustic sequence, wherein the speech recognition model comprises a domain-specific language model; and providing the generated transcription of the input acoustic sequence as input to a domain-specific predictive model to generate structured text content that is derived from the transcription of the input acoustic sequence.

Type: Application

Filed: August 15, 2023

Publication date: November 30, 2023

Inventors: Christopher S. Co, Navdeep Jaitly, Lily Hao Yi Peng, Katherine Irene Chou, Ananth Sankar
Generating structured text content using speech recognition models

Patent number: 11763936

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for speech recognition. One method includes obtaining an input acoustic sequence, the input acoustic sequence representing one or more utterances; processing the input acoustic sequence using a speech recognition model to generate a transcription of the input acoustic sequence, wherein the speech recognition model comprises a domain-specific language model; and providing the generated transcription of the input acoustic sequence as input to a domain-specific predictive model to generate structured text content that is derived from the transcription of the input acoustic sequence.

Type: Grant

Filed: December 4, 2020

Date of Patent: September 19, 2023

Assignee: Google LLC

Inventors: Christopher S. Co, Navdeep Jaitly, Lily Hao Yi Peng, Katherine Irene Chou, Ananth Sankar
Recurrent neural networks for online sequence generation

Patent number: 11625572

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating a target sequence from a source sequence. In one aspect, the system includes a recurrent neural network configured to, at each time step, receive an input for the time step and process the input to generate a progress score and a set of output scores; and a subsystem configured to, at each time step, generate the recurrent neural network input and provide the input to the recurrent neural network; determine, from the progress score, whether or not to emit a new output at the time step; and, in response to determining to emit a new output, select an output using the output scores and emit the selected output as the output at a next position in the output order.

Type: Grant

Filed: May 3, 2018

Date of Patent: April 11, 2023

Assignee: Google LLC

Inventors: Chung-Cheng Chiu, Navdeep Jaitly, John Dieterich Lawson, George Jay Tucker
SEQUENCE MODELING USING IMPUTATION

Publication number: 20230075716

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for sequence modeling. One of the methods includes receiving an input sequence having a plurality of input positions; determining a plurality of blocks of consecutive input positions; processing the input sequence using a neural network to generate a latent alignment, comprising, at each of a plurality of input time steps: receiving a partial latent alignment from a previous input time step; selecting an input position in each block, wherein the token at the selected input position of the partial latent alignment in each block is a mask token; and processing the partial latent alignment and the input sequence using the neural network to generate a new latent alignment, wherein the new latent alignment comprises, at the selected input position in each block, an output token or a blank token; and generating, using the latent alignment, an output sequence.

Type: Application

Filed: February 8, 2021

Publication date: March 9, 2023

Inventors: William Chan, Chitwan Saharia, Geoffrey E. Hinton, Mohammad Norouzi, Navdeep Jaitly
GENERATING OUTPUT SEQUENCES FROM INPUT SEQUENCES USING NEURAL NETWORKS

Publication number: 20220138531

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating output sequences from input sequences. One of the methods includes obtaining an input sequence having a first number of inputs arranged according to an input order; processing each input in the input sequence using an encoder recurrent neural network to generate a respective encoder hidden state for each input in the input sequence; and generating an output sequence having a second number of outputs arranged according to an output order, each output in the output sequence being selected from the inputs in the input sequence, comprising, for each position in the output order: generating a softmax output for the position using the encoder hidden states that is a pointer into the input sequence; and selecting an input from the input sequence as the output at the position using the softmax output.

Type: Application

Filed: January 13, 2022

Publication date: May 5, 2022

Inventors: Oriol Vinyals, Navdeep Jaitly
SPEECH RECOGNITION WITH ATTENTION-BASED RECURRENT NEURAL NETWORKS

Publication number: 20220028375

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media for speech recognition. One method includes obtaining an input acoustic sequence, the input acoustic sequence representing an utterance, and the input acoustic sequence comprising a respective acoustic feature representation at each of a first number of time steps; processing the input acoustic sequence using a first neural network to convert the input acoustic sequence into an alternative representation for the input acoustic sequence; processing the alternative representation for the input acoustic sequence using an attention-based Recurrent Neural Network (RNN) to generate, for each position in an output sequence order, a set of substring scores that includes a respective substring score for each substring in a set of substrings; and generating a sequence of substrings that represent a transcription of the utterance.

Type: Application

Filed: October 7, 2021

Publication date: January 27, 2022

Applicant: Google LLC

Inventors: William Chan, Navdeep Jaitly, Quoc V. Le, Oriol Vinyals, Noam M. Shazeer
Generating output sequences from input sequences using neural networks

Patent number: 11227206

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating output sequences from input sequences. One of the methods includes obtaining an input sequence having a first number of inputs arranged according to an input order; processing each input in the input sequence using an encoder recurrent neural network to generate a respective encoder hidden state for each input in the input sequence; and generating an output sequence having a second number of outputs arranged according to an output order, each output in the output sequence being selected from the inputs in the input sequence, comprising, for each position in the output order: generating a softmax output for the position using the encoder hidden states that is a pointer into the input sequence; and selecting an input from the input sequence as the output at the position using the softmax output.

Type: Grant

Filed: August 27, 2019

Date of Patent: January 18, 2022

Assignee: Google LLC

Inventors: Oriol Vinyals, Navdeep Jaitly
SPEECH RECOGNITION WITH SEQUENCE-TO-SEQUENCE MODELS

Publication number: 20220005465

Abstract: A method for performing speech recognition using sequence-to-sequence models includes receiving audio data for an utterance and providing features indicative of acoustic characteristics of the utterance as input to an encoder. The method also includes processing an output of the encoder using an attender to generate a context vector, generating speech recognition scores using the context vector and a decoder trained using a training process, and generating a transcription for the utterance using word elements selected based on the speech recognition scores. The transcription is provided as an output of the ASR system.

Type: Application

Filed: September 20, 2021

Publication date: January 6, 2022

Applicant: Google LLC

Inventors: Rohit Prakash Prabhavalkar, Zhifeng Chen, Bo Li, Chung-cheng Chiu, Kanury Kanishka Rao, Yonghui Wu, Ron J. Weiss, Navdeep Jaitly, Michiel A.u. Bacchiani, Tara N. Sainath, Jan Kazimierz Chorowski, Anjuli Patricia Kannan, Ekaterina Gonina, Patrick An Phu Nguyen
Generating target sequences from input sequences using partial conditioning

Patent number: 11195521

Abstract: A system can be configured to perform tasks such as converting recorded speech to a sequence of phonemes that represent the speech, converting an input sequence of graphemes into a target sequence of phonemes, translating an input sequence of words in one language into a corresponding sequence of words in another language, or predicting a target sequence of words that follow an input sequence of words in a language (e.g., a language model). In a speech recognizer, the RNN system may be used to convert speech to a target sequence of phonemes in real-time so that a transcription of the speech can be generated and presented to a user, even before the user has completed uttering the entire speech input.

Type: Grant

Filed: February 4, 2020

Date of Patent: December 7, 2021

Assignee: Google LLC

Inventors: Navdeep Jaitly, Quoc V. Le, Oriol Vinyals, Samuel Bengio, Ilya Sutskever
END-TO-END TEXT-TO-SPEECH CONVERSION

Publication number: 20210366463

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating speech from text. One of the systems includes one or more computers and one or more storage devices storing instructions that when executed by one or more computers cause the one or more computers to implement: a sequence-to-sequence recurrent neural network configured to: receive a sequence of characters in a particular natural language, and process the sequence of characters to generate a spectrogram of a verbal utterance of the sequence of characters in the particular natural language; and a subsystem configured to: receive the sequence of characters in the particular natural language, and provide the sequence of characters as input to the sequence-to-sequence recurrent neural network to obtain as output the spectrogram of the verbal utterance of the sequence of characters in the particular natural language.

Type: Application

Filed: August 2, 2021

Publication date: November 25, 2021

Inventors: Samuel Bengio, Yuxuan Wang, Zongheng Yang, Zhifeng Chen, Yonghui Wu, Ioannis Agiomyrgiannakis, Ron J. Weiss, Navdeep Jaitly, Ryan M. Rifkin, Robert Andrew James Clark, Quoc V. Le, Russell J. Ryan, Ying Xiao

1 2 3 4 next