Patents by Inventor Jesse Engel

Jesse Engel has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

GENERATING MUSIC FROM IMAGES USING GENERATIVE NEURAL NETWORKS

Publication number: 20250349276

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating an audio signal. One of the methods includes receiving an input image; processing, using one or more generative neural networks, the input image to generate a music caption describing one or more audio features corresponding to the input image; and processing, using an audio generative neural network, the music caption to generate an audio signal described by the music caption.

Type: Application

Filed: May 13, 2024

Publication date: November 13, 2025

Inventors: Timo Immanuel Denk, Jesse Engel, Christian Frank
GENERATING AUDIO USING AUTO-REGRESSIVE GENERATIVE NEURAL NETWORKS

Publication number: 20250266035

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a prediction of an audio signal. One of the methods includes receiving a request to generate an audio signal conditioned on an input; processing the input using an embedding neural network to map the input to one or more embedding tokens; generating a semantic representation of the audio signal; generating, using one or more generative neural networks and conditioned on at least the semantic representation and the embedding tokens, an acoustic representation of the audio signal; and processing at least the acoustic representation using a decoder neural network to generate the prediction of the audio signal.

Type: Application

Filed: May 2, 2025

Publication date: August 21, 2025

Inventors: Andrea Agostinelli, Timo Immanuel Denk, Antoine Caillon, Neil Zeghidour, Jesse Engel, Mauro Verzetti, Christian Frank, Zalán Borsos, Matthew Sharifi, Adam Joseph Roberts, Marco Tagliasacchi
Generating audio using auto-regressive generative neural networks

Patent number: 12322380

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a prediction of an audio signal. One of the methods includes receiving a request to generate an audio signal conditioned on an input; processing the input using an embedding neural network to map the input to one or more embedding tokens; generating a semantic representation of the audio signal; generating, using one or more generative neural networks and conditioned on at least the semantic representation and the embedding tokens, an acoustic representation of the audio signal; and processing at least the acoustic representation using a decoder neural network to generate the prediction of the audio signal.

Type: Grant

Filed: January 12, 2024

Date of Patent: June 3, 2025

Assignee: Google LLC

Inventors: Andrea Agostinelli, Timo Immanuel Denk, Antoine Caillon, Neil Zeghidour, Jesse Engel, Mauro Verzetti, Christian Frank, Zalán Borsos, Matthew Sharifi, Adam Joseph Roberts, Marco Tagliasacchi
Machine-Learned Differentiable Digital Signal Processing

Publication number: 20240395277

Abstract: Systems and methods of the present disclosure are directed toward digital signal processing using machine-learned differentiable digital signal processors. For example, embodiments of the present disclosure may include differentiable digital signal processors within the training loop of a machine-learned model (e.g., for gradient-based training). Advantageously, systems and methods of the present disclosure provide high quality signal processing using smaller models than prior systems, thereby reducing energy costs (e.g., storage and/or processing costs) associated with performing digital signal processing.

Type: Application

Filed: August 1, 2024

Publication date: November 28, 2024

Inventors: Jesse Engel, Adam Roberts, Chenjie Gu, Lamtharn Hantrakul
Machine-learned differentiable digital signal processing

Patent number: 12080311

Abstract: Systems and methods of the present disclosure are directed toward digital signal processing using machine-learned differentiable digital signal processors. For example, embodiments of the present disclosure may include differentiable digital signal processors within the training loop of a machine-learned model (e.g., for gradient-based training). Advantageously, systems and methods of the present disclosure provide high quality signal processing using smaller models than prior systems, thereby reducing energy costs (e.g., storage and/or processing costs) associated with performing digital signal processing.

Type: Grant

Filed: June 29, 2023

Date of Patent: September 3, 2024

Assignee: GOOGLE LLC

Inventors: Jesse Engel, Adam Roberts, Chenjie Gu, Lamtharn Hantrakul
GENERATING AUDIO USING AUTO-REGRESSIVE GENERATIVE NEURAL NETWORKS

Publication number: 20240233713

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a prediction of an audio signal. One of the methods includes receiving a request to generate an audio signal conditioned on an input; processing the input using an embedding neural network to map the input to one or more embedding tokens; generating a semantic representation of the audio signal; generating, using one or more generative neural networks and conditioned on at least the semantic representation and the embedding tokens, an acoustic representation of the audio signal; and processing at least the acoustic representation using a decoder neural network to generate the prediction of the audio signal.

Type: Application

Filed: January 12, 2024

Publication date: July 11, 2024

Inventors: Andrea Agostinelli, Timo Immanuel Denk, Antoine Caillon, Neil Zeghidour, Jesse Engel, Mauro Verzetti, Christian Frank, Zalán Borsos, Matthew Sharifi, Adam Joseph Roberts, Marco Tagliasacchi
GENERATING AUDIO USING AUTO-REGRESSIVE GENERATIVE NEURAL NETWORKS

Publication number: 20240079001

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a prediction of an audio signal. One of the methods includes receiving a request to generate an audio signal conditioned on an input; processing the input using an embedding neural network to map the input to one or more embedding tokens; generating a semantic representation of the audio signal; generating, using one or more generative neural networks and conditioned on at least the semantic representation and the embedding tokens, an acoustic representation of the audio signal; and processing at least the acoustic representation using a decoder neural network to generate the prediction of the audio signal.

Type: Application

Filed: September 7, 2023

Publication date: March 7, 2024

Inventors: Andrea Agostinelli, Timo Immanuel Denk, Antoine Caillon, Neil Zeghidour, Jesse Engel, Mauro Verzetti, Christian Frank, Zalán Borsos, Matthew Sharifi, Adam Joseph Roberts
Generating audio using auto-regressive generative neural networks

Patent number: 11915689

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a prediction of an audio signal. One of the methods includes receiving a request to generate an audio signal conditioned on an input; processing the input using an embedding neural network to map the input to one or more embedding tokens; generating a semantic representation of the audio signal; generating, using one or more generative neural networks and conditioned on at least the semantic representation and the embedding tokens, an acoustic representation of the audio signal; and processing at least the acoustic representation using a decoder neural network to generate the prediction of the audio signal.

Type: Grant

Filed: September 7, 2023

Date of Patent: February 27, 2024

Assignee: Google LLC

Inventors: Andrea Agostinelli, Timo Immanuel Denk, Antoine Caillon, Neil Zeghidour, Jesse Engel, Mauro Verzetti, Christian Frank, Zalán Borsos, Matthew Sharifi, Adam Joseph Roberts, Marco Tagliasacchi
Machine-Learned Differentiable Digital Signal Processing

Publication number: 20230343348

Abstract: Systems and methods of the present disclosure are directed toward digital signal processing using machine-learned differentiable digital signal processors. For example, embodiments of the present disclosure may include differentiable digital signal processors within the training loop of a machine-learned model (e.g., for gradient-based training). Advantageously, systems and methods of the present disclosure provide high quality signal processing using smaller models than prior systems, thereby reducing energy costs (e.g., storage and/or processing costs) associated with performing digital signal processing.

Type: Application

Filed: June 29, 2023

Publication date: October 26, 2023

Inventors: Jesse Engel, Adam Roberts, Chenjie Gu, Lamtharn Hantrakul
Machine-learned differentiable digital signal processing

Patent number: 11735197

Abstract: Systems and methods of the present disclosure are directed toward digital signal processing using machine-learned differentiable digital signal processors. For example, embodiments of the present disclosure may include differentiable digital signal processors within the training loop of a machine-learned model (e.g., for gradient-based training). Advantageously, systems and methods of the present disclosure provide high quality signal processing using smaller models than prior systems, thereby reducing energy costs (e.g., storage and/or processing costs) associated with performing digital signal processing.

Type: Grant

Filed: July 7, 2020

Date of Patent: August 22, 2023

Assignee: GOOGLE LLC

Inventors: Jesse Engel, Adam Roberts, Chenjie Gu, Lamtharn Hantrakul
GENERATING SEQUENCES OF DATA ELEMENTS USING CROSS-ATTENTION OPERATIONS

Publication number: 20230244907

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating a sequence of data elements that includes a respective data element at each position in a sequence of positions. In one aspect, a method includes: for each position after a first position in the sequence of positions: obtaining a current sequence of data element embeddings that includes a respective data element embedding of each data element at a position that precedes the current position, obtaining a sequence of latent embeddings, and processing: (i) the current sequence of data element embeddings, and (ii) the sequence of latent embeddings, using a neural network to generate the data element at the current position. The neural network includes a sequence of neural network blocks including: (i) a cross-attention block, (ii) one or more self-attention blocks, and (iii) an output block.

Type: Application

Filed: January 30, 2023

Publication date: August 3, 2023

Inventors: Curtis Glenn-Macway Hawthorne, Andrew Coulter Jaegle, Catalina-Codruta Cangea, Sebastian Borgeaud Dit Avocat, Charlie Thomas Curtis Nash, Mateusz Malinowski, Sander Etienne Lea Dieleman, Oriol Vinyals, Matthew Botvinick, Ian Stuart Simon, Hannah Rachel Sheahan, Neil Zeghidour, Jean-Baptiste Alayrac, Joao Carreira, Jesse Engel
Machine-Learned Differentiable Digital Signal Processing

Publication number: 20220013132

Abstract: Systems and methods of the present disclosure are directed toward digital signal processing using machine-learned differentiable digital signal processors. For example, embodiments of the present disclosure may include differentiable digital signal processors within the training loop of a machine-learned model (e.g., for gradient-based training). Advantageously, systems and methods of the present disclosure provide high quality signal processing using smaller models than prior systems, thereby reducing energy costs (e.g., storage and/or processing costs) associated with performing digital signal processing.

Type: Application

Filed: July 7, 2020

Publication date: January 13, 2022

Inventors: Jesse Engel, Adam Roberts, Chenjie Gu, Lamtharn Hantrakul
Systems and methods for a multi-core optimized recurrent neural network

Patent number: 10832120

Abstract: Systems and methods for a multi-core optimized Recurrent Neural Network (RNN) architecture are disclosed. The various architectures affect communication and synchronization operations according to the Multi-Bulk-Synchronous-Parallel (MBSP) model for a given processor. The resulting family of network architectures, referred to as MBSP-RNNs, perform similarly to a conventional RNNs having the same number of parameters, but are substantially more efficient when mapped onto a modern general purpose processor. Due to the large gain in computational efficiency, for a fixed computational budget, MBSP-RNNs outperform RNNs at applications such as end-to-end speech recognition.

Type: Grant

Filed: April 5, 2016

Date of Patent: November 10, 2020

Assignee: Baidu USA LLC

Inventors: Gregory Diamos, Awni Hannun, Bryan Catanzaro, Dario Amodei, Erich Elsen, Jesse Engel, Shubhabrata Sengupta
End-to-end speech recognition

Patent number: 10332509

Abstract: Embodiments of end-to-end deep learning systems and methods are disclosed to recognize speech of vastly different languages, such as English or Mandarin Chinese. In embodiments, the entire pipelines of hand-engineered components are replaced with neural networks, and the end-to-end learning allows handling a diverse variety of speech including noisy environments, accents, and different languages. Using a trained embodiment and an embodiment of a batch dispatch technique with GPUs in a data center, an end-to-end deep learning system can be inexpensively deployed in an online setting, delivering low latency when serving users at scale.

Type: Grant

Filed: November 21, 2016

Date of Patent: June 25, 2019

Assignee: Baidu USA, LLC

Inventors: Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Erich Elsen, Jesse Engel, Christopher Fougner, Xu Han, Awni Hannun, Ryan Prenger, Sanjeev Satheesh, Shubhabrata Sengupta, Dani Yogatama, Chong Wang, Jun Zhan, Zhenyao Zhu, Dario Amodei
Deployed end-to-end speech recognition

Patent number: 10319374

Abstract: Embodiments of end-to-end deep learning systems and methods are disclosed to recognize speech of vastly different languages, such as English or Mandarin Chinese. In embodiments, the entire pipelines of hand-engineered components are replaced with neural networks, and the end-to-end learning allows handling a diverse variety of speech including noisy environments, accents, and different languages. Using a trained embodiment and an embodiment of a batch dispatch technique with GPUs in a data center, an end-to-end deep learning system can be inexpensively deployed in an online setting, delivering low latency when serving users at scale.

Type: Grant

Filed: November 21, 2016

Date of Patent: June 11, 2019

Assignee: Baidu USA, LLC

Inventors: Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Erich Elsen, Jesse Engel, Christopher Fougner, Xu Han, Awni Hannun, Ryan Prenger, Sanjeev Satheesh, Shubhabrata Sengupta, Dani Yogatama, Chong Wang, Jun Zhan, Zhenyao Zhu, Dario Amodei
Generating music with deep neural networks

Patent number: 10068557

Abstract: The present disclosure provides systems and methods that include or otherwise leverage a machine-learned neural synthesizer model. Unlike a traditional synthesizer which generates audio from hand-designed components like oscillators and wavetables, the neural synthesizer model can use deep neural networks to generate sounds at the level of individual samples. Learning directly from data, the neural synthesizer model can provide intuitive control over timbre and dynamics and enable exploration of new sounds that would be difficult or impossible to produce with a hand-tuned synthesizer. As one example, the neural synthesizer model can be a neural synthesis autoencoder that includes an encoder model that learns embeddings descriptive of musical characteristics and an autoregressive decoder model that is conditioned on the embedding to autoregressively generate musical waveforms that have the musical characteristics one audio sample at a time.

Type: Grant

Filed: August 23, 2017

Date of Patent: September 4, 2018

Assignee: Google LLC

Inventors: Jesse Engel, Mohammad Norouzi, Karen Simonyan, Adam Roberts, Cinjon Resnick, Sander Etienne Lea Dieleman, Douglas Eck
SYSTEMS AND METHODS FOR A MULTI-CORE OPTIMIZED RECURRENT NEURAL NETWORK

Publication number: 20170169326

Abstract: Systems and methods for a multi-core optimized Recurrent Neural Network (RNN) architecture are disclosed. The various architectures affect communication and synchronization operations according to the Multi-Bulk-Synchronous-Parallel (MBSP) model for a given processor. The resulting family of network architectures, referred to as MBSP-RNNs, perform similarly to a conventional RNNs having the same number of parameters, but are substantially more efficient when mapped onto a modern general purpose processor. Due to the large gain in computational efficiency, for a fixed computational budget, MBSP-RNNs outperform RNNs at applications such as end-to-end speech recognition.

Type: Application

Filed: April 5, 2016

Publication date: June 15, 2017

Applicant: Baidu USA LLC

Inventors: Gregory Diamos, Awni Hannun, Bryan Catanzaro, Dario Amodei, Erich Elsen, Jesse Engel, Shubhabrata Sengupta
DEPLOYED END-TO-END SPEECH RECOGNITION

Publication number: 20170148433

Abstract: Embodiments of end-to-end deep learning systems and methods are disclosed to recognize speech of vastly different languages, such as English or Mandarin Chinese. In embodiments, the entire pipelines of hand-engineered components are replaced with neural networks, and the end-to-end learning allows handling a diverse variety of speech including noisy environments, accents, and different languages. Using a trained embodiment and an embodiment of a batch dispatch technique with GPUs in a data center, an end-to-end deep learning system can be inexpensively deployed in an online setting, delivering low latency when serving users at scale.

Type: Application

Filed: November 21, 2016

Publication date: May 25, 2017

Applicant: Baidu USA LLC

Inventors: Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Erich Elsen, Jesse Engel, Christopher Fougner, Xu Han, Awni Hannun, Ryan Prenger, Sanjeev Satheesh, Shubhabrata Sengupta, Dani Yogatama, Chong Wang, Jun Zhan, Zhenyao Zhu, Dario Amodei
END-TO-END SPEECH RECOGNITION

Publication number: 20170148431

Abstract: Embodiments of end-to-end deep learning systems and methods are disclosed to recognize speech of vastly different languages, such as English or Mandarin Chinese. In embodiments, the entire pipelines of hand-engineered components are replaced with neural networks, and the end-to-end learning allows handling a diverse variety of speech including noisy environments, accents, and different languages. Using a trained embodiment and an embodiment of a batch dispatch technique with GPUs in a data center, an end-to-end deep learning system can be inexpensively deployed in an online setting, delivering low latency when serving users at scale.

Type: Application

Filed: November 21, 2016

Publication date: May 25, 2017

Applicant: Baidu USA LLC

Inventors: Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Erich Elsen, Jesse Engel, Christopher Fougner, Xu Han, Awni Hannun, Ryan Prenger, Sanjeev Satheesh, Shubhabrata Sengupta, Dani Yogatama, Chong Wang, Jun Zhan, Zhenyao Zhu, Dario Amodei