Patents by Inventor Mike Chrzanowski
Mike Chrzanowski has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11836596
Abstract: A system including one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to implement a memory and memory-based neural network is described. The memory is configured to store a respective memory vector at each of a plurality of memory locations in the memory. The memory-based neural network is configured to: at each of a plurality of time steps: receive an input; determine an update to the memory, wherein determining the update comprising applying an attention mechanism over the memory vectors in the memory and the received input; update the memory using the determined update to the memory; and generate an output for the current time step using the updated memory.
Type: Grant
Filed: November 30, 2020
Date of Patent: December 5, 2023
Assignee: DeepMind Technologies Limited
Inventors: Mike Chrzanowski, Jack William Rae, Ryan Faulkner, Theophane Guillaume Weber, David Nunes Raposo, Adam Anthony Santoro
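The per-time-step loop in this abstract (receive an input, attend over the memory vectors and the input, update the memory, emit an output) can be illustrated with a minimal NumPy sketch. Everything below is an assumption made for illustration and is not taken from the patent: the single-head dot-product attention, the residual update, the mean-pooled output, and the names `memory_step`, `w_q`, `w_k`, `w_v`, and `w_out` are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def memory_step(memory, x, w_q, w_k, w_v, w_out):
    """One time step of a hypothetical attention-based memory update.

    memory: (N, D) array, one memory vector per memory location.
    x:      (D,) input received at this time step.
    Returns the updated memory (N, D) and an output vector.
    """
    # Attend over the memory vectors together with the received input.
    mem_and_input = np.vstack([memory, x[None, :]])   # (N + 1, D)
    queries = memory @ w_q                            # (N, D)
    keys = mem_and_input @ w_k                        # (N + 1, D)
    values = mem_and_input @ w_v                      # (N + 1, D)

    scores = queries @ keys.T / np.sqrt(queries.shape[-1])  # (N, N + 1)
    weights = softmax(scores, axis=-1)

    # Assumption: the attention result is applied as a residual update.
    update = weights @ values                         # (N, D)
    new_memory = memory + update

    # Assumption: the output is a linear readout of the mean memory vector.
    output = new_memory.mean(axis=0) @ w_out
    return new_memory, output


# Usage: run the memory over a short random input sequence.
rng = np.random.default_rng(0)
num_slots, dim, out_dim = 4, 8, 3
memory = np.zeros((num_slots, dim))
w_q, w_k, w_v = (rng.normal(scale=0.1, size=(dim, dim)) for _ in range(3))
w_out = rng.normal(scale=0.1, size=(dim, out_dim))

for x in rng.normal(size=(5, dim)):
    memory, output = memory_step(memory, x, w_q, w_k, w_v, w_out)
```

Including the input among the keys and values is one simple way to let each memory location draw on both the existing memory and the new input; the patented system may combine them differently.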
-
Patent number: 11705107
Abstract: Embodiments of a production-quality text-to-speech (TTS) system constructed from deep neural networks are described. System embodiments comprise five major building blocks: a segmentation model for locating phoneme boundaries, a grapheme-to-phoneme conversion model, a phoneme duration prediction model, a fundamental frequency prediction model, and an audio synthesis model. For embodiments of the segmentation model, phoneme boundary detection was performed with deep neural networks using Connectionist Temporal Classification (CTC) loss. For embodiments of the audio synthesis model, a variant of WaveNet was created that requires fewer parameters and trains faster than the original. By using a neural network for each component, system embodiments are simpler and more flexible than traditional TTS systems, where each component requires laborious feature engineering and extensive domain expertise. Inference with system embodiments may be performed faster than real time.
Type: Grant
Filed: October 1, 2020
Date of Patent: July 18, 2023
Assignee: Baidu USA LLC
Inventors: Sercan O. Arik, Mike Chrzanowski, Adam Coates, Gregory Diamos, Andrew Gibiansky, John Miller, Andrew Ng, Jonathan Raiman, Shubhahrata Sengupta, Mohammad Shoeybi
-
Publication number: 20220108680
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for synthesizing audio data from text data using duration prediction. One of the methods includes processing an input text sequence that includes a respective text element at each of multiple input time steps using a first neural network to generate a modified input sequence comprising, for each input time step, a representation of the corresponding text element in the input text sequence; processing the modified input sequence using a second neural network to generate, for each input time step, a predicted duration of the corresponding text element in the output audio sequence; upsampling the modified input sequence according to the predicted durations to generate an intermediate sequence comprising a respective intermediate element at each of a plurality of intermediate time steps; and generating an output audio sequence using the intermediate sequence.
Type: Application
Filed: October 1, 2021
Publication date: April 7, 2022
Inventors: Yu Zhang, Isaac Elias, Byungha Chun, Ye Jia, Yonghui Wu, Mike Chrzanowski, Jonathan Shen
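The upsampling step in this abstract can be illustrated with a short sketch. It assumes the simplest possible scheme, in which each predicted duration is an integer count of intermediate time steps and each text-element representation is repeated that many times; the abstract does not say how the upsampling is actually performed, and the function name `upsample_by_duration` is hypothetical.

```python
from typing import List, Sequence


def upsample_by_duration(
    representations: Sequence[Sequence[float]],
    durations: Sequence[int],
) -> List[List[float]]:
    """Repeat each representation for its predicted number of time steps.

    Assumption: durations are non-negative integers measured in
    intermediate time steps, and upsampling is plain repetition.
    """
    if len(representations) != len(durations):
        raise ValueError("need exactly one duration per input time step")

    intermediate: List[List[float]] = []
    for vector, duration in zip(representations, durations):
        if duration < 0:
            raise ValueError("durations must be non-negative")
        # Copy the vector so intermediate elements do not alias each other.
        intermediate.extend(list(vector) for _ in range(duration))
    return intermediate


# Usage: three text elements with predicted durations of 2, 0, and 3 steps.
encoded = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
frames = upsample_by_duration(encoded, [2, 0, 3])
# len(frames) == 5: the first vector twice, then the third vector three times.
```

The length of the intermediate sequence equals the sum of the predicted durations, which is what lets the predicted durations set the length of the audio generated from it.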
-
Publication number: 20210081795
Abstract: A system including one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to implement a memory and memory-based neural network is described. The memory is configured to store a respective memory vector at each of a plurality of memory locations in the memory. The memory-based neural network is configured to: at each of a plurality of time steps: receive an input; determine an update to the memory, wherein determining the update comprising applying an attention mechanism over the memory vectors in the memory and the received input; update the memory using the determined update to the memory; and generate an output for the current time step using the updated memory.
Type: Application
Filed: November 30, 2020
Publication date: March 18, 2021
Inventors: Mike Chrzanowski, Jack William Rae, Ryan Faulkner, Theophane Guillaume Weber, David Nunes Raposo, Adam Anthony Santoro
-
Publication number: 20210027762
Abstract: Embodiments of a production-quality text-to-speech (TTS) system constructed from deep neural networks are described. System embodiments comprise five major building blocks: a segmentation model for locating phoneme boundaries, a grapheme-to-phoneme conversion model, a phoneme duration prediction model, a fundamental frequency prediction model, and an audio synthesis model. For embodiments of the segmentation model, phoneme boundary detection was performed with deep neural networks using Connectionist Temporal Classification (CTC) loss. For embodiments of the audio synthesis model, a variant of WaveNet was created that requires fewer parameters and trains faster than the original. By using a neural network for each component, system embodiments are simpler and more flexible than traditional TTS systems, where each component requires laborious feature engineering and extensive domain expertise. Inference with system embodiments may be performed faster than real time.
Type: Application
Filed: October 1, 2020
Publication date: January 28, 2021
Applicant: Baidu USA LLC
Inventors: Sercan O. ARIK, Mike CHRZANOWSKI, Adam COATES, Gregory DIAMOS, Andrew GIBIANSKY, John MILLER, Andrew NG, Jonathan RAIMAN, Shubhahrata SENGUPTA, Mohammad SHOEYBI
-
Patent number: 10872598
Abstract: Embodiments of a production-quality text-to-speech (TTS) system constructed from deep neural networks are described. System embodiments comprise five major building blocks: a segmentation model for locating phoneme boundaries, a grapheme-to-phoneme conversion model, a phoneme duration prediction model, a fundamental frequency prediction model, and an audio synthesis model. For embodiments of the segmentation model, phoneme boundary detection was performed with deep neural networks using Connectionist Temporal Classification (CTC) loss. For embodiments of the audio synthesis model, a variant of WaveNet was created that requires fewer parameters and trains faster than the original. By using a neural network for each component, system embodiments are simpler and more flexible than traditional TTS systems, where each component requires laborious feature engineering and extensive domain expertise. Inference with system embodiments may be performed faster than real time.
Type: Grant
Filed: January 29, 2018
Date of Patent: December 22, 2020
Assignee: Baidu USA LLC
Inventors: Sercan O. Arik, Mike Chrzanowski, Adam Coates, Gregory Diamos, Andrew Gibiansky, John Miller, Andrew Ng, Jonathan Raiman, Shubhahrata Sengupta, Mohammad Shoeybi
-
Patent number: 10853725
Abstract: A system including one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to implement a memory and memory-based neural network is described. The memory is configured to store a respective memory vector at each of a plurality of memory locations in the memory. The memory-based neural network is configured to: at each of a plurality of time steps: receive an input; determine an update to the memory, wherein determining the update comprising applying an attention mechanism over the memory vectors in the memory and the received input; update the memory using the determined update to the memory; and generate an output for the current time step using the updated memory.
Type: Grant
Filed: May 17, 2019
Date of Patent: December 1, 2020
Assignee: DeepMind Technologies Limited
Inventors: Mike Chrzanowski, Jack William Rae, Ryan Faulkner, Theophane Guillaume Weber, David Nunes Raposo, Adam Anthony Santoro
-
Publication number: 20190354858
Abstract: A system including one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to implement a memory and memory-based neural network is described. The memory is configured to store a respective memory vector at each of a plurality of memory locations in the memory. The memory-based neural network is configured to: at each of a plurality of time steps: receive an input; determine an update to the memory, wherein determining the update comprising applying an attention mechanism over the memory vectors in the memory and the received input; update the memory using the determined update to the memory; and generate an output for the current time step using the updated memory.
Type: Application
Filed: May 17, 2019
Publication date: November 21, 2019
Inventors: Mike Chrzanowski, Jack William Rae, Ryan Faulkner, Theophane Guillaume Weber, David Nunes Raposo, Adam Anthony Santoro
-
Patent number: 10332509
Abstract: Embodiments of end-to-end deep learning systems and methods are disclosed to recognize speech of vastly different languages, such as English or Mandarin Chinese. In embodiments, the entire pipelines of hand-engineered components are replaced with neural networks, and the end-to-end learning allows handling a diverse variety of speech including noisy environments, accents, and different languages. Using a trained embodiment and an embodiment of a batch dispatch technique with GPUs in a data center, an end-to-end deep learning system can be inexpensively deployed in an online setting, delivering low latency when serving users at scale.
Type: Grant
Filed: November 21, 2016
Date of Patent: June 25, 2019
Assignee: Baidu USA, LLC
Inventors: Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Erich Elsen, Jesse Engel, Christopher Fougner, Xu Han, Awni Hannun, Ryan Prenger, Sanjeev Satheesh, Shubhabrata Sengupta, Dani Yogatama, Chong Wang, Jun Zhan, Zhenyao Zhu, Dario Amodei
-
Patent number: 10319374
Abstract: Embodiments of end-to-end deep learning systems and methods are disclosed to recognize speech of vastly different languages, such as English or Mandarin Chinese. In embodiments, the entire pipelines of hand-engineered components are replaced with neural networks, and the end-to-end learning allows handling a diverse variety of speech including noisy environments, accents, and different languages. Using a trained embodiment and an embodiment of a batch dispatch technique with GPUs in a data center, an end-to-end deep learning system can be inexpensively deployed in an online setting, delivering low latency when serving users at scale.
Type: Grant
Filed: November 21, 2016
Date of Patent: June 11, 2019
Assignee: Baidu USA, LLC
Inventors: Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Erich Elsen, Jesse Engel, Christopher Fougner, Xu Han, Awni Hannun, Ryan Prenger, Sanjeev Satheesh, Shubhabrata Sengupta, Dani Yogatama, Chong Wang, Jun Zhan, Zhenyao Zhu, Dario Amodei
-
Publication number: 20180247636
Abstract: Embodiments of a production-quality text-to-speech (TTS) system constructed from deep neural networks are described. System embodiments comprise five major building blocks: a segmentation model for locating phoneme boundaries, a grapheme-to-phoneme conversion model, a phoneme duration prediction model, a fundamental frequency prediction model, and an audio synthesis model. For embodiments of the segmentation model, phoneme boundary detection was performed with deep neural networks using Connectionist Temporal Classification (CTC) loss. For embodiments of the audio synthesis model, a variant of WaveNet was created that requires fewer parameters and trains faster than the original. By using a neural network for each component, system embodiments are simpler and more flexible than traditional TTS systems, where each component requires laborious feature engineering and extensive domain expertise. Inference with system embodiments may be performed faster than real time.
Type: Application
Filed: January 29, 2018
Publication date: August 30, 2018
Applicant: Baidu USA LLC
Inventors: Sercan O. ARIK, Mike CHRZANOWSKI, Adam COATES, Gregory DIAMOS, Andrew GIBIANSKY, John MILLER, Andrew NG, Jonathan RAIMAN, Shubhahrata SENGUPTA, Mohammad SHOEYBI
-
Publication number: 20170148433
Abstract: Embodiments of end-to-end deep learning systems and methods are disclosed to recognize speech of vastly different languages, such as English or Mandarin Chinese. In embodiments, the entire pipelines of hand-engineered components are replaced with neural networks, and the end-to-end learning allows handling a diverse variety of speech including noisy environments, accents, and different languages. Using a trained embodiment and an embodiment of a batch dispatch technique with GPUs in a data center, an end-to-end deep learning system can be inexpensively deployed in an online setting, delivering low latency when serving users at scale.
Type: Application
Filed: November 21, 2016
Publication date: May 25, 2017
Applicant: Baidu USA LLC
Inventors: Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Erich Elsen, Jesse Engel, Christopher Fougner, Xu Han, Awni Hannun, Ryan Prenger, Sanjeev Satheesh, Shubhabrata Sengupta, Dani Yogatama, Chong Wang, Jun Zhan, Zhenyao Zhu, Dario Amodei
-
Publication number: 20170148431
Abstract: Embodiments of end-to-end deep learning systems and methods are disclosed to recognize speech of vastly different languages, such as English or Mandarin Chinese. In embodiments, the entire pipelines of hand-engineered components are replaced with neural networks, and the end-to-end learning allows handling a diverse variety of speech including noisy environments, accents, and different languages. Using a trained embodiment and an embodiment of a batch dispatch technique with GPUs in a data center, an end-to-end deep learning system can be inexpensively deployed in an online setting, delivering low latency when serving users at scale.
Type: Application
Filed: November 21, 2016
Publication date: May 25, 2017
Applicant: Baidu USA LLC
Inventors: Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Erich Elsen, Jesse Engel, Christopher Fougner, Xu Han, Awni Hannun, Ryan Prenger, Sanjeev Satheesh, Shubhabrata Sengupta, Dani Yogatama, Chong Wang, Jun Zhan, Zhenyao Zhu, Dario Amodei