Patents by Inventor Neil Zeghidour

Neil Zeghidour has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Generating audio using auto-regressive generative neural networks

Patent number: 12322380

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a prediction of an audio signal. One of the methods includes receiving a request to generate an audio signal conditioned on an input; processing the input using an embedding neural network to map the input to one or more embedding tokens; generating a semantic representation of the audio signal; generating, using one or more generative neural networks and conditioned on at least the semantic representation and the embedding tokens, an acoustic representation of the audio signal; and processing at least the acoustic representation using a decoder neural network to generate the prediction of the audio signal.

Type: Grant

Filed: January 12, 2024

Date of Patent: June 3, 2025

Assignee: Google LLC

Inventors: Andrea Agostinelli, Timo Immanuel Denk, Antoine Caillon, Neil Zeghidour, Jesse Engel, Mauro Verzetti, Christian Frank, Zalán Borsos, Matthew Sharifi, Adam Joseph Roberts, Marco Tagliasacchi

SEMI-SUPERVISED TEXT-TO-SPEECH BY GENERATING SEMANTIC AND ACOUSTIC REPRESENTATIONS

Publication number: 20250157456

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating an audio signal from input text. In one aspect, a method comprises receiving a request to convert input text into an audio signal, wherein the input text comprises multiple tokenized text inputs, generating, using a first generative neural network, a semantic representation of the tokenized text inputs comprising semantic tokens representing semantic content of the tokenized text inputs, each semantic token being selected from a vocabulary of semantic tokens, generating, using a second generative neural network and conditioned on at least the semantic representation, an acoustic representation of the semantic representation comprising one or more respective acoustic tokens representing acoustic properties of the audio signal, and processing the acoustic representation using a decoder neural network to generate the audio signal.

Type: Application

Filed: January 26, 2024

Publication date: May 15, 2025

Inventors: Evgeny Kharitonov, Damien Vincent, Zalán Borsos, Raphaël Marinier, Olivier Claude Pietquin, Matthew Sharifi, Marco Tagliasacchi, Neil Zeghidour

GENERATING CODED DATA REPRESENTATIONS USING NEURAL NETWORKS AND VECTOR QUANTIZERS

Publication number: 20250131932

Abstract: Methods, systems and apparatus, including computer programs encoded on computer storage media. According to one aspect, there is provided a method comprising: receiving a new input; processing the new input using an encoder neural network to generate a feature vector representing the new input; and generating a coded representation of the feature vector using a sequence of vector quantizers that are each associated with a respective codebook of code vectors, wherein the coded representation of the feature vector identifies a plurality of code vectors, including a respective code vector from the codebook of each vector quantizer, that define a quantized representation of the feature vector.

Type: Application

Filed: December 6, 2024

Publication date: April 24, 2025

Inventors: Neil Zeghidour, Marco Tagliasacchi, Dominik Roblek
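The abstract above describes coding a feature vector with a sequence of vector quantizers, each contributing one code vector from its own codebook. One common way to realize such a sequence is residual quantization, where each quantizer codes what the previous ones left over. The sketch below assumes that residual scheme, which the abstract does not state; the function names and the toy codebooks are hypothetical, and the encoder neural network that would produce the feature vector is omitted.

```python
def nearest(vector, codebook):
    """Return the index of the code vector closest to `vector` (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((v - c) ** 2 for v, c in zip(vector, codebook[i])),
    )


def rvq_encode(feature, codebooks):
    """Pick one code index per quantizer; each quantizer codes the remaining residual."""
    residual = list(feature)
    indices = []
    for codebook in codebooks:
        i = nearest(residual, codebook)
        indices.append(i)
        # Assumption: later quantizers see the error left by earlier ones.
        residual = [r - c for r, c in zip(residual, codebook[i])]
    return indices


def rvq_decode(indices, codebooks):
    """Sum the selected code vectors to obtain the quantized feature vector."""
    quantized = [0.0] * len(codebooks[0][0])
    for i, codebook in zip(indices, codebooks):
        quantized = [q + c for q, c in zip(quantized, codebook[i])]
    return quantized


# Hypothetical toy codebooks: a coarse one followed by a finer one.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, -0.25]],
]
codes = rvq_encode([1.3, 0.9], codebooks)  # [1, 1]
approx = rvq_decode(codes, codebooks)      # [1.25, 1.0]
```

The coded representation here is just the list of indices, one per codebook; the quantized vector is recovered by looking the indices up and summing.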
Separating speech by source in audio recordings by predicting isolated audio signals conditioned on speaker representations

Patent number: 12236970

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing speech separation. One of the methods includes obtaining a recording comprising speech from a plurality of speakers; processing the recording using a speaker neural network having speaker parameter values and configured to process the recording in accordance with the speaker parameter values to generate a plurality of per-recording speaker representations, each speaker representation representing features of a respective identified speaker in the recording; and processing the per-recording speaker representations and the recording using a separation neural network having separation parameter values and configured to process the recording and the speaker representations in accordance with the separation parameter values to generate, for each speaker representation, a respective predicted isolated audio signal that corresponds to speech of one of the speakers in the recording.

Type: Grant

Filed: October 17, 2022

Date of Patent: February 25, 2025

Assignee: Google LLC

Inventors: Neil Zeghidour, David Grangier

USING MACHINE LEARNING AND DISCRETE TOKENS TO ESTIMATE DIFFERENT SOUND SOURCES FROM AUDIO MIXTURES

Publication number: 20250054500

Abstract: A system and method are disclosed. Audio input comprising the mixed audio signals is received by one or more client devices. The audio input is converted into a plurality of discrete tokens. A plurality of sound sources, each corresponding to a subset of discrete tokens of a plurality of subsets of discrete tokens, is determined using a trained machine learning model.

Type: Application

Filed: August 13, 2023

Publication date: February 13, 2025

Inventors: Hakan Erdogan, Scott Thomas Wisdom, John Hershey, Zalán Borsos, Marco Tagliasacchi, Neil Zeghidour, Xuankai Chang

COMPRESSING AUDIO WAVEFORMS USING A STRUCTURED LATENT SPACE

Publication number: 20250022477

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an encoder neural network and a decoder neural network. In one aspect, a method includes obtaining a first initial audio waveform and a first noisy audio waveform, obtaining a second initial audio waveform and a second noisy audio waveform, processing the first noisy audio waveform and the second noisy audio waveform using an encoder neural network, generating a blended embedding by concatenating: (i) clean feature dimensions from an embedding of the first noisy audio waveform, and (ii) noise feature dimensions from an embedding of the second noisy audio waveform, processing the blended embedding using a decoder neural network to generate a reconstructed audio waveform, determining gradients of an objective function; and updating parameter values of the encoder neural network and the decoder neural network using the gradients.

Type: Application

Filed: March 16, 2023

Publication date: January 16, 2025

Inventors: Ahmed Omran, Neil Zeghidour, Zalán Borsos, Félix de Chaumont Quitry, Marco Tagliasacchi
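The blending step in the abstract above, which concatenates clean feature dimensions from one embedding with noise feature dimensions from another, can be illustrated with a short sketch. It assumes that the first `num_clean_dims` entries of each embedding are the clean dimensions and the rest are the noise dimensions; that layout, the function name, and the parameter name are hypothetical and are not taken from the application.

```python
def blend_embeddings(first, second, num_clean_dims):
    """Concatenate clean dims of `first` with noise dims of `second`.

    Assumption: dims [0, num_clean_dims) hold clean content and the
    remaining dims hold noise; the source does not specify the layout.
    """
    if len(first) != len(second):
        raise ValueError("embeddings must have the same length")
    if not 0 <= num_clean_dims <= len(first):
        raise ValueError("num_clean_dims is out of range")
    return list(first[:num_clean_dims]) + list(second[num_clean_dims:])


# Toy embeddings of two noisy waveforms (values are made up).
blended = blend_embeddings([1, 2, 3, 4], [9, 8, 7, 6], num_clean_dims=3)
```

In the described method the blended embedding would then be passed to the decoder neural network; that step and the training objective are left out here.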
Spatial audio recording from home assistant devices

Patent number: 12200465

Abstract: The technology generally relates to spatial audio communication between devices. For example, a first device and a second device may be connected via a communication link. The first device may capture audio signals in an environment through two or more microphones. The first device may encode the captured audio with spatial configuration data. The first device may transmit the encoded audio via the communication link to the second device. The second device may decode the encoded audio into binaural or ambisonic audio to be output by one or more speakers of the second device. The binaural or ambisonic audio may be converted into spatial audio to be output. The second device may output the binaural or spatial audio to create an immersive listening experience.

Type: Grant

Filed: May 19, 2022

Date of Patent: January 14, 2025

Assignee: Google LLC

Inventors: Rajeev Conrad Nongpiur, Qian Zhang, Andrew James Sutter, Kung-Wei Liu, Jihan Li, Hélène Bahu, Leonardo Kusumo, Sze Chie Lim, Marco Tagliasacchi, Neil Zeghidour, Michael Takezo Chinen

Generating coded data representations using neural networks and vector quantizers

Patent number: 12198710

Abstract: Methods, systems and apparatus, including computer programs encoded on computer storage media. According to one aspect, there is provided a method comprising: receiving a new input; processing the new input using an encoder neural network to generate a feature vector representing the new input; and generating a coded representation of the feature vector using a sequence of vector quantizers that are each associated with a respective codebook of code vectors, wherein the coded representation of the feature vector identifies a plurality of code vectors, including a respective code vector from the codebook of each vector quantizer, that define a quantized representation of the feature vector.

Type: Grant

Filed: December 29, 2023

Date of Patent: January 14, 2025

Assignee: Google LLC

Inventors: Neil Zeghidour, Marco Tagliasacchi, Dominik Roblek

Learning Strides in Convolutional Neural Networks

Publication number: 20250005354

Abstract: A method of training a machine learning model includes receiving training data for the machine learning model, wherein the training data comprises a plurality of batches. The method also includes applying a downsampling layer of the machine learning model to the plurality of batches of the training data to determine a stride comprising a learnable parameter for the downsampling layer. Applying the downsampling layer of the machine learning model to a batch of the training data includes projecting an input in a spatial domain to a Fourier domain, constructing a mask in the Fourier domain based on a current value of the stride and dimensions of the input, applying the mask as a low-pass filter to the projected input to produce a tensor in the Fourier domain, cropping the tensor based on the mask, and transforming the cropped tensor to the spatial domain.

Type: Application

Filed: October 5, 2022

Publication date: January 2, 2025

Inventors: Neil Zeghidour, Rachid Riad, Olivier Teboul, David Grangier
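The forward pass in the abstract above (project to the Fourier domain, mask, crop, transform back) can be sketched for a one-dimensional input. This is a simplified illustration under stated assumptions: it uses a hard 0/1 mask and a naive DFT, and it does not learn the stride. A learnable stride would presumably need a smooth mask so that gradients can flow to it, which is not shown. All function names are hypothetical.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform (unnormalized)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse DFT, normalized by the spectrum length."""
    m = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / m) for k in range(m)) / m
        for t in range(m)
    ]


def spectral_downsample(x, stride):
    """Downsample `x` by `stride` (assumed >= 1) via low-pass masking and cropping."""
    n = len(x)
    m = max(1, round(n / stride))            # output length set by the stride
    kept = range(-(m // 2), m - m // 2)      # signed frequencies inside the pass band

    spectrum = dft(x)                        # spatial domain -> Fourier domain
    mask = [0.0] * n                         # hard mask built from stride and input size
    for f in kept:
        mask[f % n] = 1.0
    filtered = [s * w for s, w in zip(spectrum, mask)]

    cropped = [0j] * m                       # keep only the pass band
    for f in kept:
        cropped[f % m] = filtered[f % n]

    # Back to the spatial domain; m / n rescales for the shorter transform.
    return [(m / n) * value.real for value in idft(cropped)]


halved = spectral_downsample([3.0] * 8, stride=2.0)
```

Because the output length is `round(n / stride)`, a non-integer stride such as 1.6 still yields a valid output size, which is what makes a continuous stride parameter meaningful in this formulation.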
AUDIO-FOCUS FOR AMBIENT NOISE CANCELLATION

Publication number: 20240428818

Abstract: A method including identifying an audio capture device and a target direction associated with the audio capture device, detecting first audio associated with the target direction, enhancing the first audio using a machine learning model configured to detect audio associated with the target direction, optionally, detecting second audio associated with a direction different from the target direction, and optionally, diminishing the second audio using the machine learning model.

Type: Application

Filed: June 21, 2024

Publication date: December 26, 2024

Inventors: Rajeev Nongpiur, Neil Zeghidour, Marco Tagliasacchi

PERFORMING TASKS USING GENERATIVE NEURAL NETWORKS

Publication number: 20240428056

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing tasks. One of the methods includes obtaining a sequence of input tokens, where each token is selected from a vocabulary of tokens that includes text tokens and audio tokens, and wherein the sequence of input tokens includes tokens that describe a task to be performed and data for performing the task; generating a sequence of embeddings by embedding each token in the sequence of input tokens in an embedding space; and processing the sequence of embeddings using a language model neural network to generate a sequence of output tokens for the task, where each token is selected from the vocabulary.

Type: Application

Filed: June 21, 2024

Publication date: December 26, 2024

Inventors: Paul Kishan Rubenstein, Matthew Sharifi, Alexandru Tudor, Chulayuth Asawaroengchai, Duc Dung Nguyen, Marco Tagliasacchi, Neil Zeghidour, Zalán Borsos, Christian Frank, Dalia Salem Hassan Fahmy Elbadawy, Hannah Raphaelle Muckenhirn, Dirk Ryan Padfield, Damien Vincent, Evgeny Kharitonov, Michelle Dana Tadmor, Mihajlo Velimirovic, Feifan Chen, Victoria Zayats

Machine-Learned Models for Generation of Musical Accompaniments Based on Input Vocals

Publication number: 20240395233

Abstract: Training data comprising a plurality of training pairs is obtained. Each training pair comprises instrumental audio data and vocal audio data separated from audio data of a musical work of a respective plurality of musical works. For one or more training pairs of the plurality of training pairs, the vocal audio data is processed with machine-learned model(s) of a machine-learned generative audio model grouping to obtain a vocal intermediate representation for the vocal audio data. The instrumental audio data is processed with a pre-trained encoding model to obtain an instrumental intermediate representation for the instrumental audio data. A loss function is evaluated that evaluates a difference between the vocal intermediate representation and the instrumental intermediate representation. Values of parameters of a machine-learned model of the machine-learned generative audio model grouping are modified based on the loss function.

Type: Application

Filed: May 22, 2024

Publication date: November 28, 2024

Inventors: Adam Joseph Roberts, Jesse Hart Engel, Ian Stuart Simon, Andrea Agostinelli, Neil Zeghidour, Christopher James Donahue, Antoine Caillon

GENERATING AUDIO USING AUTO-REGRESSIVE GENERATIVE NEURAL NETWORKS

Publication number: 20240371366

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a prediction of an audio signal. One of the methods includes receiving a request to generate an audio signal; obtaining a semantic representation of the audio signal; generating, using one or more generative neural networks and conditioned on at least the semantic representation, an acoustic representation of the audio signal; and processing at least the acoustic representation using a decoder neural network to generate the prediction of the audio signal.

Type: Application

Filed: May 14, 2024

Publication date: November 7, 2024

Inventors: Neil Zeghidour, David Grangier, Marco Tagliasacchi, Raphaël Marinier, Olivier Teboul, Zalán Borsos

LEARNING NEURAL NETWORK ARCHITECTURES BY BACKPROPAGATION USING DIFFERENTIABLE MASKS

Publication number: 20240296331

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for jointly learning the architecture of a neural network during the training of the neural network. In particular, the architecture of the neural network is learned using differentiable parametric masks.

Type: Application

Filed: February 8, 2024

Publication date: September 5, 2024

Inventors: David Wilson Romero Guzman, Neil Zeghidour

GENERATING AUDIO USING AUTO-REGRESSIVE GENERATIVE NEURAL NETWORKS

Publication number: 20240233713

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a prediction of an audio signal. One of the methods includes receiving a request to generate an audio signal conditioned on an input; processing the input using an embedding neural network to map the input to one or more embedding tokens; generating a semantic representation of the audio signal; generating, using one or more generative neural networks and conditioned on at least the semantic representation and the embedding tokens, an acoustic representation of the audio signal; and processing at least the acoustic representation using a decoder neural network to generate the prediction of the audio signal.

Type: Application

Filed: January 12, 2024

Publication date: July 11, 2024

Inventors: Andrea Agostinelli, Timo Immanuel Denk, Antoine Caillon, Neil Zeghidour, Jesse Engel, Mauro Verzetti, Christian Frank, Zalán Borsos, Matthew Sharifi, Adam Joseph Roberts, Marco Tagliasacchi

Generating audio using auto-regressive generative neural networks

Patent number: 12020138

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a prediction of an audio signal. One of the methods includes receiving a request to generate an audio signal; obtaining a semantic representation of the audio signal; generating, using one or more generative neural networks and conditioned on at least the semantic representation, an acoustic representation of the audio signal; and processing at least the acoustic representation using a decoder neural network to generate the prediction of the audio signal.

Type: Grant

Filed: September 7, 2023

Date of Patent: June 25, 2024

Assignee: Google LLC

Inventors: Neil Zeghidour, David Grangier, Marco Tagliasacchi, Raphaël Marinier, Olivier Teboul, Zalán Borsos

GENERATING CODED DATA REPRESENTATIONS USING NEURAL NETWORKS AND VECTOR QUANTIZERS

Publication number: 20240185870

Abstract: Methods, systems and apparatus, including computer programs encoded on computer storage media. According to one aspect, there is provided a method comprising: receiving a new input; processing the new input using an encoder neural network to generate a feature vector representing the new input; and generating a coded representation of the feature vector using a sequence of vector quantizers that are each associated with a respective codebook of code vectors, wherein the coded representation of the feature vector identifies a plurality of code vectors, including a respective code vector from the codebook of each vector quantizer, that define a quantized representation of the feature vector.

Type: Application

Filed: December 29, 2023

Publication date: June 6, 2024

Inventors: Neil Zeghidour, Marco Tagliasacchi, Dominik Roblek

Compressing audio waveforms using neural networks and vector quantizers

Patent number: 11990148

Abstract: Methods, systems and apparatus, including computer programs encoded on computer storage media. One of the methods includes receiving an audio waveform that includes a respective audio sample for each of a plurality of time steps, processing the audio waveform using an encoder neural network to generate a plurality of feature vectors representing the audio waveform, generating a respective coded representation of each of the plurality of feature vectors using a plurality of vector quantizers that are each associated with a respective codebook of code vectors, wherein the respective coded representation of each feature vector identifies a plurality of code vectors, including a respective code vector from the codebook of each vector quantizer, that define a quantized representation of the feature vector, and generating a compressed representation of the audio waveform by compressing the respective coded representation of each of the plurality of feature vectors.

Type: Grant

Filed: February 6, 2023

Date of Patent: May 21, 2024

Assignee: Google LLC

Inventors: Neil Zeghidour, Marco Tagliasacchi, Dominik Roblek

END-TO-END SPEECH DIARIZATION VIA ITERATIVE SPEAKER EMBEDDING

Publication number: 20240144957

Abstract: A method includes receiving an input audio signal corresponding to utterances spoken by multiple speakers. The method also includes encoding the input audio signal into a sequence of T temporal embeddings. During each of a plurality of iterations each corresponding to a respective speaker of the multiple speakers, the method includes selecting a respective speaker embedding for the respective speaker by determining a probability that the corresponding temporal embedding includes a presence of voice activity by a single new speaker for which a speaker embedding was not previously selected during a previous iteration and selecting the respective speaker embedding for the respective speaker as the temporal embedding. The method also includes, at each time step, predicting a respective voice activity indicator for each respective speaker of the multiple speakers based on the respective speaker embeddings selected during the plurality of iterations and the temporal embedding.

Type: Application

Filed: December 19, 2023

Publication date: May 2, 2024

Applicant: Google LLC

Inventors: David Grangier, Neil Zeghidour, Olivier Teboul

GENERATING AUDIO USING AUTO-REGRESSIVE GENERATIVE NEURAL NETWORKS

Publication number: 20240079001

Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a prediction of an audio signal. One of the methods includes receiving a request to generate an audio signal conditioned on an input; processing the input using an embedding neural network to map the input to one or more embedding tokens; generating a semantic representation of the audio signal; generating, using one or more generative neural networks and conditioned on at least the semantic representation and the embedding tokens, an acoustic representation of the audio signal; and processing at least the acoustic representation using a decoder neural network to generate the prediction of the audio signal.

Type: Application

Filed: September 7, 2023

Publication date: March 7, 2024

Inventors: Andrea Agostinelli, Timo Immanuel Denk, Antoine Caillon, Neil Zeghidour, Jesse Engel, Mauro Verzetti, Christian Frank, Zalán Borsos, Matthew Sharifi, Adam Joseph Roberts
