Patents by Inventor Neil Zeghidour
Neil Zeghidour has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240144957Abstract: A method includes receiving an input audio signal corresponding to utterances spoken by multiple speakers. The method also includes encoding the input audio signal into a sequence of T temporal embeddings. During each of a plurality of iterations each corresponding to a respective speaker of the multiple speakers, the method includes selecting a respective speaker embedding for the respective speaker by determining a probability that the corresponding temporal embedding includes a presence of voice activity by a single new speaker for which a speaker embedding was not previously selected during a previous iteration and selecting the respective speaker embedding for the respective speaker as the temporal embedding. The method also includes, at each time step, predicting a respective voice activity indicator for each respective speaker of the multiple speakers based on the respective speaker embeddings selected during the plurality of iterations and the temporal embedding.Type: ApplicationFiled: December 19, 2023Publication date: May 2, 2024Applicant: Google LLCInventors: David Grangier, Neil Zeghidour, Oliver Teboul
-
Publication number: 20240078412Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a prediction of an audio signal. One of the methods includes receiving a request to generate an audio signal; obtaining a semantic representation of the audio signal; generating, using one or more generative neural networks and conditioned on at least the semantic representation, an acoustic representation of the audio signal; and processing at least the acoustic representation using a decoder neural network to generate the prediction of the audio signal.Type: ApplicationFiled: September 7, 2023Publication date: March 7, 2024Inventors: Neil Zeghidour, David Grangier, Marco Tagliasacchi, Raphaël Marinier, Olivier Teboul, Zalán Borsos
-
Publication number: 20240079001Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a prediction of an audio signal. One of the methods includes receiving a request to generate an audio signal conditioned on an input; processing the input using an embedding neural network to map the input to one or more embedding tokens; generating a semantic representation of the audio signal; generating, using one or more generative neural networks and conditioned on at least the semantic representation and the embedding tokens, an acoustic representation of the audio signal; and processing at least the acoustic representation using a decoder neural network to generate the prediction of the audio signal.Type: ApplicationFiled: September 7, 2023Publication date: March 7, 2024Inventors: Andrea Agostinelli, Timo Immanuel Denk, Antoine Caillon, Neil Zeghidour, Jesse Engel, Mauro Verzetti, Christian Frank, Zalán Borsos, Matthew Sharifi, Adam Joseph Roberts
-
Patent number: 11915689Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a prediction of an audio signal. One of the methods includes receiving a request to generate an audio signal conditioned on an input; processing the input using an embedding neural network to map the input to one or more embedding tokens; generating a semantic representation of the audio signal; generating, using one or more generative neural networks and conditioned on at least the semantic representation and the embedding tokens, an acoustic representation of the audio signal; and processing at least the acoustic representation using a decoder neural network to generate the prediction of the audio signal.Type: GrantFiled: September 7, 2023Date of Patent: February 27, 2024Assignee: Google LLCInventors: Andrea Agostinelli, Timo Immanuel Denk, Antoine Caillon, Neil Zeghidour, Jesse Engel, Mauro Verzetti, Christian Frank, Zalán Borsos, Matthew Sharifi, Adam Joseph Roberts, Marco Tagliasacchi
-
Patent number: 11887623Abstract: A method includes receiving an input audio signal corresponding to utterances spoken by multiple speakers. The method also includes encoding the input audio signal into a sequence of T temporal embeddings. During each of a plurality of iterations each corresponding to a respective speaker of the multiple speakers, the method includes selecting a respective speaker embedding for the respective speaker by determining a probability that the corresponding temporal embedding includes a presence of voice activity by a single new speaker for which a speaker embedding was not previously selected during a previous iteration and selecting the respective speaker embedding for the respective speaker as the temporal embedding. The method also includes, at each time step, predicting a respective voice activity indicator for each respective speaker of the multiple speakers based on the respective speaker embeddings selected during the plurality of iterations and the temporal embedding.Type: GrantFiled: June 22, 2021Date of Patent: January 30, 2024Assignee: Google LLCInventors: David Grangier, Neil Zeghidour, Oliver Teboul
-
Publication number: 20230379645Abstract: The technology generally relates to spatial audio communication between devices. For example, a first device and a second device may be connected via a communication link. The first device may capture audio signals in an environment through two or more microphones. The first device may encode the captured audio with spatial configuration data. The first device may transmit the encoded audio via the communication link to the second device. The second device may decode the encoded audio into binaural or ambisonic audio to be output by one or more speakers of the second device. The binaural or ambisonic audio may be converted into spatial audio to be output. The second device may output the binaural or spatial audio to create an immersive listening experience.Type: ApplicationFiled: May 19, 2022Publication date: November 23, 2023Inventors: Rajeev Conrad Nongpiur, Qian Zhang, Andrew James Sutter, Kung-Wei Liu, Jihan Li, Hélène Bahu, Leonardo Kusumo, Sze Chie Lim, Marco Tagliasacchi, Neil Zeghidour, Michael Takezo Chinen
-
Publication number: 20230377561Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing audio inputs using a learned audio frontend machine learning model that processes the audio input to generate a representation of the audio input. The representation can then be processed by an audio understanding model to generate a respective output for each of one or more audio understanding tasks.Type: ApplicationFiled: October 4, 2021Publication date: November 23, 2023Inventors: Neil Zeghidour, Olivier Teboul, Félix de Chaumont Quitry, Marco Tagliasacchi
-
Publication number: 20230244907Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating a sequence of data elements that includes a respective data element at each position in a sequence of positions. In one aspect, a method includes: for each position after a first position in the sequence of positions: obtaining a current sequence of data element embeddings that includes a respective data element embedding of each data element at a position that precedes the current position, obtaining a sequence of latent embeddings, and processing: (i) the current sequence of data element embeddings, and (ii) the sequence of latent embeddings, using a neural network to generate the data element at the current position. The neural network includes a sequence of neural network blocks including: (i) a cross-attention block, (ii) one or more self-attention blocks, and (iii) an output block.Type: ApplicationFiled: January 30, 2023Publication date: August 3, 2023Inventors: Curtis Glenn-Macway Hawthorne, Andrew Coulter Jaegle, Catalina-Codruta Cangea, Sebastian Borgeaud Dit Avocat, Charlie Thomas Curtis Nash, Mateusz Malinowski, Sander Etienne Lea Dieleman, Oriol Vinyals, Matthew Botvinick, Ian Stuart Simon, Hannah Rachel Sheahan, Neil Zeghidour, Jean-Baptiste Alayrac, Joao Carreira, Jesse Engel
-
Publication number: 20230186927Abstract: Methods, systems and apparatus, including computer programs encoded on computer storage media. One of the methods includes receiving an audio waveform that includes a respective audio sample for each of a plurality of time steps, processing the audio waveform using an encoder neural network to generate a plurality of feature vectors representing the audio waveform, generating a respective coded representation of each of the plurality of feature vectors using a plurality of vector quantizers that are each associated with a respective codebook of code vectors, wherein the respective coded representation of each feature vector identifies a plurality of code vectors, including a respective code vector from the codebook of each vector quantizer, that define a quantized representation of the feature vector, and generating a compressed representation of the audio waveform by compressing the respective coded representation of each of the plurality of feature vectors.Type: ApplicationFiled: February 6, 2023Publication date: June 15, 2023Inventors: Neil Zeghidour, Marco Tagliasacchi, Dominik Roblek
-
Publication number: 20230112265Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing speech separation. One of the methods includes obtaining a recording comprising speech from a plurality of speakers; processing the recording using a speaker neural network having speaker parameter values and configured to process the recording in accordance with the speaker parameter values to generate a plurality of per-recording speaker representations, each speaker representation representing features of a respective identified speaker in the recording; and processing the per-recording speaker representations and the recording using a separation neural network having separation parameter values and configured to process the recording and the speaker representations in accordance with the separation parameter values to generate, for each speaker representation, a respective predicted isolated audio signal that corresponds to speech of one of the speakers in the recording.Type: ApplicationFiled: October 17, 2022Publication date: April 13, 2023Inventors: Neil Zeghidour, David Grangier
-
Patent number: 11600282Abstract: Methods, systems and apparatus, including computer programs encoded on computer storage media. One of the methods includes receiving an audio waveform that includes a respective audio sample for each of a plurality of time steps, processing the audio waveform using an encoder neural network to generate a plurality of feature vectors representing the audio waveform, generating a respective coded representation of each of the plurality of feature vectors using a plurality of vector quantizers that are each associated with a respective codebook of code vectors, wherein the respective coded representation of each feature vector identifies a plurality of code vectors, including a respective code vector from the codebook of each vector quantizer, that define a quantized representation of the feature vector, and generating a compressed representation of the audio waveform by compressing the respective coded representation of each of the plurality of feature vectors.Type: GrantFiled: July 1, 2022Date of Patent: March 7, 2023Assignee: Google LLCInventors: Neil Zeghidour, Marco Tagliasacchi, Dominik Roblek
-
Publication number: 20230019128Abstract: Methods, systems and apparatus, including computer programs encoded on computer storage media. One of the methods includes receiving an audio waveform that includes a respective audio sample for each of a plurality of time steps, processing the audio waveform using an encoder neural network to generate a plurality of feature vectors representing the audio waveform, generating a respective coded representation of each of the plurality of feature vectors using a plurality of vector quantizers that are each associated with a respective codebook of code vectors, wherein the respective coded representation of each feature vector identifies a plurality of code vectors, including a respective code vector from the codebook of each vector quantizer, that define a quantized representation of the feature vector, and generating a compressed representation of the audio waveform by compressing the respective coded representation of each of the plurality of feature vectors.Type: ApplicationFiled: July 1, 2022Publication date: January 19, 2023Inventors: Neil Zeghidour, Marco Tagliasacchi, Dominik Roblek
-
Publication number: 20220375492Abstract: A method includes receiving an input audio signal corresponding to utterances spoken by multiple speakers. The method also includes encoding the input audio signal into a sequence of T temporal embeddings. During each of a plurality of iterations each corresponding to a respective speaker of the multiple speakers, the method includes selecting a respective speaker embedding for the respective speaker by determining a probability that the corresponding temporal embedding includes a presence of voice activity by a single new speaker for which a speaker embedding was not previously selected during a previous iteration and selecting the respective speaker embedding for the respective speaker as the temporal embedding. The method also includes, at each time step, predicting a respective voice activity indicator for each respective speaker of the multiple speakers based on the respective speaker embeddings selected during the plurality of iterations and the temporal embedding.Type: ApplicationFiled: June 22, 2021Publication date: November 24, 2022Applicant: Google LLCInventors: David Grangier, Neil Zeghidour, Oliver Teboul
-
Patent number: 11475909Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing speech separation. One of the methods includes obtaining a recording comprising speech from a plurality of speakers; processing the recording using a speaker neural network having speaker parameter values and configured to process the recording in accordance with the speaker parameter values to generate a plurality of per-recording speaker representations, each speaker representation representing features of a respective identified speaker in the recording; and processing the per-recording speaker representations and the recording using a separation neural network having separation parameter values and configured to process the recording and the speaker representations in accordance with the separation parameter values to generate, for each speaker representation, a respective predicted isolated audio signal that corresponds to speech of one of the speakers in the recording.Type: GrantFiled: February 8, 2021Date of Patent: October 18, 2022Assignee: Google LLCInventors: Neil Zeghidour, David Grangier
-
Publication number: 20210249027Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing speech separation. One of the methods includes obtaining a recording comprising speech from a plurality of speakers; processing the recording using a speaker neural network having speaker parameter values and configured to process the recording in accordance with the speaker parameter values to generate a plurality of per-recording speaker representations, each speaker representation representing features of a respective identified speaker in the recording; and processing the per-recording speaker representations and the recording using a separation neural network having separation parameter values and configured to process the recording and the speaker representations in accordance with the separation parameter values to generate, for each speaker representation, a respective predicted isolated audio signal that corresponds to speech of one of the speakers in the recording.Type: ApplicationFiled: February 8, 2021Publication date: August 12, 2021Inventors: Neil Zeghidour, David Grangier