Patents by Inventor Marco Tagliasacchi
Marco Tagliasacchi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240153514
Abstract: Apparatus and methods related to enhancement of audio content are provided. An example method includes receiving, by a computing device and via a communications network interface, a compressed audio data frame, wherein the compressed audio data frame is received after transmission over a communications network. The method further includes decompressing the compressed audio data frame to extract an audio waveform. The method also includes predicting, by applying a neural network to the audio waveform, an enhanced version of the audio waveform, wherein the neural network has been trained on (i) a ground truth sample comprising unencoded audio waveforms prior to compression by an audio encoder, and (ii) a training dataset comprising decoded audio waveforms after compression of the unencoded audio waveforms by the audio encoder. The method additionally includes providing, by an audio output component of the computing device, the enhanced version of the audio waveform.
Type: Application
Filed: March 5, 2021
Publication date: May 9, 2024
Inventors: Omer Ahmed Siddig Osman, Dominik Roblek, Yunpeng Li, Marco Tagliasacchi, Oleg Rybakov, Victor Ungureanu, Eric Giguere
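As a rough illustration of the training setup this abstract describes (pairs of ground-truth waveforms and their decoded, degraded counterparts), here is a toy NumPy sketch. The gain-plus-noise "codec", the linear least-squares "enhancer", and all array sizes are illustrative stand-ins, not the patented method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training data: ground-truth unencoded waveforms, and the same
# waveforms after a lossy encode/decode round trip (here simulated by
# attenuation plus noise).
clean = rng.standard_normal((64, 256))
decoded = clean * 0.8 + 0.05 * rng.standard_normal(clean.shape)

def frame_features(x, taps=5):
    # Sliding windows of the decoded signal serve as model input.
    pad = np.pad(x, ((0, 0), (taps - 1, 0)))
    return np.stack([pad[:, i:i + x.shape[1]] for i in range(taps)], axis=-1)

X = frame_features(decoded).reshape(-1, 5)   # (samples, taps)
y = clean.reshape(-1)                        # target: ground-truth samples

# "Training" the enhancer: fit a linear filter that maps decoded audio
# back toward the ground-truth waveform (a stand-in for the neural net).
w, *_ = np.linalg.lstsq(X, y, rcond=None)

enhanced = frame_features(decoded) @ w
err_before = np.mean((decoded - clean) ** 2)
err_after = np.mean((enhanced - clean) ** 2)
assert err_after < err_before
```

The point of the sketch is only the supervision signal: the model never sees labels, just (decoded, original) pairs produced by running a codec over clean audio.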
-
Publication number: 20240078412
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a prediction of an audio signal. One of the methods includes receiving a request to generate an audio signal; obtaining a semantic representation of the audio signal; generating, using one or more generative neural networks and conditioned on at least the semantic representation, an acoustic representation of the audio signal; and processing at least the acoustic representation using a decoder neural network to generate the prediction of the audio signal.
Type: Application
Filed: September 7, 2023
Publication date: March 7, 2024
Inventors: Neil Zeghidour, David Grangier, Marco Tagliasacchi, Raphaël Marinier, Olivier Teboul, Zalán Borsos
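The staged generation the abstract describes (semantic representation, then an acoustic representation conditioned on it, then a decoder) can be sketched as a data-flow skeleton. The three stand-in functions below are placeholders for large trained models; token counts, vocabulary sizes, and the hop length are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

def semantic_model(n_tokens=8, vocab=32):
    # Stage 1: coarse "semantic" tokens capturing long-term structure
    # (placeholder: random tokens instead of a trained model's output).
    return rng.integers(0, vocab, size=n_tokens)

def acoustic_model(semantic_tokens, codebooks=4, vocab=64):
    # Stage 2: acoustic tokens generated conditioned on the semantic
    # tokens; here one token per codebook per semantic step.
    return rng.integers(0, vocab, size=(len(semantic_tokens), codebooks))

def codec_decoder(acoustic_tokens, hop=160):
    # Stage 3: a neural codec decoder maps acoustic tokens to waveform;
    # this placeholder just emits `hop` silent samples per token step.
    return np.zeros(acoustic_tokens.shape[0] * hop)

semantic = semantic_model()
acoustic = acoustic_model(semantic)
waveform = codec_decoder(acoustic)
assert waveform.shape[0] == len(semantic) * 160
```

Each stage consumes the previous stage's output as conditioning, which is the structural idea the claim language is describing.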
-
Patent number: 11915689
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a prediction of an audio signal. One of the methods includes receiving a request to generate an audio signal conditioned on an input; processing the input using an embedding neural network to map the input to one or more embedding tokens; generating a semantic representation of the audio signal; generating, using one or more generative neural networks and conditioned on at least the semantic representation and the embedding tokens, an acoustic representation of the audio signal; and processing at least the acoustic representation using a decoder neural network to generate the prediction of the audio signal.
Type: Grant
Filed: September 7, 2023
Date of Patent: February 27, 2024
Assignee: Google LLC
Inventors: Andrea Agostinelli, Timo Immanuel Denk, Antoine Caillon, Neil Zeghidour, Jesse Engel, Mauro Verzetti, Christian Frank, Zalán Borsos, Matthew Sharifi, Adam Joseph Roberts, Marco Tagliasacchi
-
Publication number: 20230419989
Abstract: Example methods include receiving training data comprising a plurality of audio clips and a plurality of textual descriptions of audio. The methods include generating a shared representation comprising a joint embedding. An audio embedding of a given audio clip is within a threshold distance of a text embedding of a textual description of the given audio clip. The methods include generating, based on the joint embedding, a conditioning vector and training, based on the conditioning vector, a neural network to: receive (i) an input audio waveform, and (ii) an input comprising one or more of an input textual description of a target audio source in the input audio waveform, or an audio sample of the target audio source, separate audio corresponding to the target audio source from the input audio waveform, and output the separated audio corresponding to the target audio source in response to the receiving of the input.
Type: Application
Filed: June 24, 2022
Publication date: December 28, 2023
Inventors: Beat Gfeller, Kevin Ian Kilgour, Marco Tagliasacchi, Aren Jansen, Scott Thomas Wisdom, Qingqing Huang
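The key property in this abstract is the joint embedding: the audio embedding of a clip lies within a threshold distance of the text embedding of its description, so a text embedding can serve as a conditioning vector that "points at" the matching audio. A toy sketch of that property, with random unit vectors standing in for trained encoder outputs:

```python
import numpy as np

rng = np.random.default_rng(2)

def unit(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Stand-ins for trained encoders: four "text" embeddings, and paired
# "audio" embeddings nudged to lie near their own description (the
# within-a-threshold-distance property from the abstract).
text_emb = unit(rng.standard_normal((4, 16)))
audio_emb = unit(text_emb + 0.05 * rng.standard_normal((4, 16)))

# Conditioning vector for clip 0: the embedding of its textual description.
cond = text_emb[0]

# The paired audio clip is the nearest neighbor of its description in the
# shared space, so the conditioning vector identifies the target source.
dists = np.linalg.norm(audio_emb - cond, axis=-1)
assert dists[0] < 0.5
assert np.argmin(dists) == 0
```

In the actual system this conditioning vector feeds a separation network; the sketch only shows why text and audio conditioning are interchangeable once the embedding space is shared.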
-
Publication number: 20230395087
Abstract: Example implementations of the present disclosure relate to machine learning for microphone style transfer, for example, to facilitate augmentation of audio data such as speech data to improve robustness of machine learning models trained on the audio data. Systems and methods for microphone style transfer can include one or more machine-learned microphone models trained to obtain and augment signal data to mimic characteristics of signal data obtained from a target microphone. The systems and methods can include a speech enhancement network for enhancing a sample before the style transfer. The augmentation output can then be utilized for a variety of downstream tasks.
Type: Application
Filed: October 15, 2021
Publication date: December 7, 2023
Inventors: Marco Tagliasacchi, Beat Gfeller, Yunpeng Li, Zalán Borsos
-
Publication number: 20230379645
Abstract: The technology generally relates to spatial audio communication between devices. For example, a first device and a second device may be connected via a communication link. The first device may capture audio signals in an environment through two or more microphones. The first device may encode the captured audio with spatial configuration data. The first device may transmit the encoded audio via the communication link to the second device. The second device may decode the encoded audio into binaural or ambisonic audio to be output by one or more speakers of the second device. The binaural or ambisonic audio may be converted into spatial audio to be output. The second device may output the binaural or spatial audio to create an immersive listening experience.
Type: Application
Filed: May 19, 2022
Publication date: November 23, 2023
Inventors: Rajeev Conrad Nongpiur, Qian Zhang, Andrew James Sutter, Kung-Wei Liu, Jihan Li, Hélène Bahu, Leonardo Kusumo, Sze Chie Lim, Marco Tagliasacchi, Neil Zeghidour, Michael Takezo Chinen
-
Publication number: 20230377561
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing audio inputs using a learned audio frontend machine learning model that processes the audio input to generate a representation of the audio input. The representation can then be processed by an audio understanding model to generate a respective output for each of one or more audio understanding tasks.
Type: Application
Filed: October 4, 2021
Publication date: November 23, 2023
Inventors: Neil Zeghidour, Olivier Teboul, Félix de Chaumont Quitry, Marco Tagliasacchi
-
Patent number: 11756530
Abstract: Example embodiments relate to techniques for training artificial neural networks or other machine-learning encoders to accurately predict the pitch of input audio samples in a semitone or otherwise logarithmically-scaled pitch space. An example method may include generating, from a sample of audio data, two training samples by applying two different pitch shifts to the sample of audio training data. This can be done by converting the sample of audio data into the frequency domain and then shifting the transformed data. These known shifts are then compared to the predicted pitches generated by applying the two training samples to the encoder. The encoder is then updated based on the comparison, such that the relative pitch output by the encoder is improved with respect to accuracy. One or more audio samples, labeled with absolute pitch values, can then be used to calibrate the relative pitch values generated by the trained encoder.
Type: Grant
Filed: September 25, 2020
Date of Patent: September 12, 2023
Assignee: Google LLC
Inventors: Marco Tagliasacchi, Mihajlo Velimirovic, Matthew Sharifi, Dominik Roblek, Christian Frank, Beat Gfeller
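The self-supervision signal here is that two versions of the same audio, shifted by a known number of semitones, should produce pitch predictions that differ by exactly that amount. A toy sketch, with a resampling-based shift and an FFT-peak "encoder" standing in for the frequency-domain shift and neural encoder of the abstract:

```python
import numpy as np

sr = 8000
t = np.arange(sr) / sr

def pitch_shift(x, semitones):
    # Shift pitch by resampling: a simple stand-in for the
    # frequency-domain shift described in the abstract.
    factor = 2 ** (semitones / 12)
    idx = np.arange(len(x)) * factor
    keep = idx < len(x) - 1
    return np.interp(idx[keep], np.arange(len(x)), x)

def encoder(x):
    # Toy "encoder": pitch as log2 of the dominant FFT bin, so pitch
    # differences live in a logarithmic (semitone-like) space.
    spec = np.abs(np.fft.rfft(x, n=sr))
    return 12 * np.log2(np.argmax(spec[1:]) + 1)

x = np.sin(2 * np.pi * 220 * t)
a, b = pitch_shift(x, 0), pitch_shift(x, 7)  # known 7-semitone shift

# Self-supervised check: the difference between the two predicted
# pitches should match the known relative shift.
predicted_shift = encoder(b) - encoder(a)
assert abs(predicted_shift - 7) < 0.5
```

In training, the mismatch between `predicted_shift` and the known shift would drive the encoder update; a few absolutely-labeled samples later anchor the relative scale, as the abstract notes.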
-
Publication number: 20230186927
Abstract: Methods, systems and apparatus, including computer programs encoded on computer storage media. One of the methods includes receiving an audio waveform that includes a respective audio sample for each of a plurality of time steps, processing the audio waveform using an encoder neural network to generate a plurality of feature vectors representing the audio waveform, generating a respective coded representation of each of the plurality of feature vectors using a plurality of vector quantizers that are each associated with a respective codebook of code vectors, wherein the respective coded representation of each feature vector identifies a plurality of code vectors, including a respective code vector from the codebook of each vector quantizer, that define a quantized representation of the feature vector, and generating a compressed representation of the audio waveform by compressing the respective coded representation of each of the plurality of feature vectors.
Type: Application
Filed: February 6, 2023
Publication date: June 15, 2023
Inventors: Neil Zeghidour, Marco Tagliasacchi, Dominik Roblek
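The multi-quantizer coding in this abstract matches the residual vector quantization pattern: each quantizer has its own codebook and quantizes the residual left over by the previous stage, so each feature vector's coded representation is one code index per quantizer. A toy sketch with random, untrained codebooks (sizes and the zero-code trick are illustrative choices, not from the patent):

```python
import numpy as np

rng = np.random.default_rng(3)

dim, codebook_size, n_quantizers = 8, 16, 4
codebooks = rng.standard_normal((n_quantizers, codebook_size, dim))
codebooks[:, 0] = 0.0  # a zero code lets a stage pass its residual through

def rvq_encode(feature):
    # Each quantizer codes the residual of the previous stage, yielding
    # one code index per codebook: the "coded representation".
    residual, codes = feature.copy(), []
    for cb in codebooks:
        idx = int(np.argmin(np.linalg.norm(cb - residual, axis=1)))
        codes.append(idx)
        residual = residual - cb[idx]
    return codes

def rvq_decode(codes):
    # Reconstruction sums the selected code vector from every codebook.
    return sum(cb[i] for cb, i in zip(codebooks, codes))

feature = rng.standard_normal(dim)
codes = rvq_encode(feature)
recon = rvq_decode(codes)

# Adding quantizers can only refine the approximation (the zero code
# guarantees no stage makes the residual worse).
one_stage = codebooks[0][codes[0]]
assert len(codes) == n_quantizers
assert np.linalg.norm(feature - recon) <= np.linalg.norm(feature - one_stage) + 1e-9
```

The stream of per-stage indices is what would then be entropy-compressed into the final bitstream; dropping later indices degrades quality gracefully, which is what makes this structure attractive for variable-bitrate codecs.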
-
Publication number: 20230085596
Abstract: Systems and methods for training a machine-learned model are provided. A method can include obtaining an unlabeled audio signal, sampling the unlabeled audio signal to select one or more sampled slices, inputting the one or more sampled slices into a machine-learned model, receiving, as an output of the machine-learned model, one or more determined characteristics associated with the audio signal, determining a loss function for the machine-learned model based at least in part on a difference between the one or more determined characteristics and one or more corresponding ground truth characteristics of the audio signal, and training the machine-learned model from end to end based at least in part on the loss function. The one or more determined characteristics can include one or more reconstructed portions of the audio signal temporally adjacent to the one or more sampled slices or an estimated distance between two sampled slices.
Type: Application
Filed: November 14, 2022
Publication date: March 16, 2023
Inventors: Beat Gfeller, Dominik Roblek, Félix de Chaumont Quitry, Marco Tagliasacchi
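The self-supervised recipe here is that the training target is free: sample two slices from an unlabeled signal and use their temporal gap (or the adjacent samples) as ground truth. A minimal sketch of that pair-sampling step, with slice length and signal size chosen arbitrarily for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)

# An unlabeled signal: no human annotations anywhere.
signal = rng.standard_normal(1000)
slice_len = 50

def sample_pair(x):
    # Draw two slice positions; their temporal gap is the
    # self-supervised "ground truth characteristic" the model must
    # predict, obtained for free from the sampling process itself.
    i, j = sorted(rng.integers(0, len(x) - slice_len, size=2))
    gap = j - i
    return x[i:i + slice_len], x[j:j + slice_len], gap

a, b, gap = sample_pair(signal)
assert a.shape == b.shape == (slice_len,)
assert 0 <= gap < len(signal)
```

A model fed `(a, b)` and trained to regress `gap` (or to reconstruct the samples adjacent to a slice) learns temporal structure end to end without labels, which is the point of the claimed method.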
-
Patent number: 11600282
Abstract: Methods, systems and apparatus, including computer programs encoded on computer storage media. One of the methods includes receiving an audio waveform that includes a respective audio sample for each of a plurality of time steps, processing the audio waveform using an encoder neural network to generate a plurality of feature vectors representing the audio waveform, generating a respective coded representation of each of the plurality of feature vectors using a plurality of vector quantizers that are each associated with a respective codebook of code vectors, wherein the respective coded representation of each feature vector identifies a plurality of code vectors, including a respective code vector from the codebook of each vector quantizer, that define a quantized representation of the feature vector, and generating a compressed representation of the audio waveform by compressing the respective coded representation of each of the plurality of feature vectors.
Type: Grant
Filed: July 1, 2022
Date of Patent: March 7, 2023
Assignee: Google LLC
Inventors: Neil Zeghidour, Marco Tagliasacchi, Dominik Roblek
-
Publication number: 20230013370
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing an input audio waveform using a generator neural network to generate an output audio waveform. In one aspect, a method comprises: receiving an input audio waveform; processing the input audio waveform using an encoder neural network to generate a set of feature vectors representing the input audio waveform; and processing the set of feature vectors representing the input audio waveform using a decoder neural network to generate an output audio waveform that comprises a respective output audio sample for each of a plurality of output time steps.
Type: Application
Filed: July 1, 2022
Publication date: January 19, 2023
Inventors: Yunpeng Li, Marco Tagliasacchi, Dominik Roblek, Félix de Chaumont Quitry, Beat Gfeller, Hannah Raphaelle Muckenhirn, Victor Ungureanu, Oleg Rybakov, Karolis Misiunas, Zalán Borsos
-
Publication number: 20230019128
Abstract: Methods, systems and apparatus, including computer programs encoded on computer storage media. One of the methods includes receiving an audio waveform that includes a respective audio sample for each of a plurality of time steps, processing the audio waveform using an encoder neural network to generate a plurality of feature vectors representing the audio waveform, generating a respective coded representation of each of the plurality of feature vectors using a plurality of vector quantizers that are each associated with a respective codebook of code vectors, wherein the respective coded representation of each feature vector identifies a plurality of code vectors, including a respective code vector from the codebook of each vector quantizer, that define a quantized representation of the feature vector, and generating a compressed representation of the audio waveform by compressing the respective coded representation of each of the plurality of feature vectors.
Type: Application
Filed: July 1, 2022
Publication date: January 19, 2023
Inventors: Neil Zeghidour, Marco Tagliasacchi, Dominik Roblek
-
Publication number: 20220383112
Abstract: A system including a multi-task adapter neural network for performing multiple machine learning tasks is described. The adapter neural network is configured to receive a shared input for the machine learning tasks, and process the shared input to generate, for each of the machine learning tasks, a respective predicted output. The adapter neural network includes (i) a shared encoder configured to receive the shared input and to process the shared input to extract shared feature representations for the machine learning tasks, and (ii) multiple task-adapter encoders, each of the task-adapter encoders being associated with a respective machine learning task in the machine learning tasks and configured to: receive the shared input, receive the shared feature representations from the shared encoder, and process the shared input and the shared feature representations to generate the respective predicted output for the respective machine learning task.
Type: Application
Filed: September 23, 2020
Publication date: December 1, 2022
Inventors: Marco Tagliasacchi, Félix de Chaumont Quitry, Dominik Roblek
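The wiring described here is concrete enough to sketch: one shared encoder feeds every task, and each task adapter consumes both the raw shared input and the shared features. The NumPy sketch below uses random linear layers and invented task names purely to show that topology; nothing about layer types or sizes comes from the patent.

```python
import numpy as np

rng = np.random.default_rng(5)

d_in, d_shared, d_out = 10, 6, 3
W_shared = rng.standard_normal((d_in, d_shared))

# One adapter head per task (hypothetical task names).
tasks = {
    "task_a": rng.standard_normal((d_in + d_shared, d_out)),
    "task_b": rng.standard_normal((d_in + d_shared, d_out)),
}

def forward(x):
    # Shared encoder: extracts features common to all tasks.
    shared = np.tanh(x @ W_shared)
    outputs = {}
    for name, W_task in tasks.items():
        # Each task adapter receives the shared input AND the shared
        # feature representation, per the abstract.
        adapter_in = np.concatenate([x, shared])
        outputs[name] = adapter_in @ W_task
    return outputs

x = rng.standard_normal(d_in)
preds = forward(x)
assert set(preds) == {"task_a", "task_b"}
assert all(p.shape == (d_out,) for p in preds.values())
```

Because the adapters also see the raw input, a task is not limited to whatever the shared encoder chose to keep, which is the usual motivation for this skip-style layout.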
-
Patent number: 11501787
Abstract: Systems and methods for training a machine-learned model are provided. A method can include obtaining an unlabeled audio signal, sampling the unlabeled audio signal to select one or more sampled slices, inputting the one or more sampled slices into a machine-learned model, receiving, as an output of the machine-learned model, one or more determined characteristics associated with the audio signal, determining a loss function for the machine-learned model based at least in part on a difference between the one or more determined characteristics and one or more corresponding ground truth characteristics of the audio signal, and training the machine-learned model from end to end based at least in part on the loss function. The one or more determined characteristics can include one or more reconstructed portions of the audio signal temporally adjacent to the one or more sampled slices or an estimated distance between two sampled slices.
Type: Grant
Filed: August 22, 2019
Date of Patent: November 15, 2022
Assignee: Google LLC
Inventors: Beat Gfeller, Dominik Roblek, Félix de Chaumont Quitry, Marco Tagliasacchi
-
Publication number: 20220343896
Abstract: Example embodiments relate to techniques for training artificial neural networks or other machine-learning encoders to accurately predict the pitch of input audio samples in a semitone or otherwise logarithmically-scaled pitch space. An example method may include generating, from a sample of audio data, two training samples by applying two different pitch shifts to the sample of audio training data. This can be done by converting the sample of audio data into the frequency domain and then shifting the transformed data. These known shifts are then compared to the predicted pitches generated by applying the two training samples to the encoder. The encoder is then updated based on the comparison, such that the relative pitch output by the encoder is improved with respect to accuracy. One or more audio samples, labeled with absolute pitch values, can then be used to calibrate the relative pitch values generated by the trained encoder.
Type: Application
Filed: September 25, 2020
Publication date: October 27, 2022
Inventors: Marco Tagliasacchi, Mihajlo Velimirovic, Matthew Sharifi, Dominik Roblek, Christian Frank, Beat Gfeller
-
Publication number: 20220059117
Abstract: Examples relate to on-device non-semantic representation fine-tuning for speech classification. A computing system may obtain audio data having a speech portion and train a neural network to learn a non-semantic speech representation based on the speech portion of the audio data. The computing system may evaluate performance of the non-semantic speech representation based on a set of benchmark tasks corresponding to a speech domain and perform a fine-tuning process on the non-semantic speech representation based on one or more downstream tasks. The computing system may further generate a model based on the non-semantic representation and provide the model to a mobile computing device. The model is configured to operate locally on the mobile computing device.
Type: Application
Filed: August 24, 2020
Publication date: February 24, 2022
Inventors: Joel Shor, Ronnie Maor, Oran Lang, Omry Tuval, Marco Tagliasacchi, Ira Shavitt, Félix de Chaumont Quitry, Dotan Emanuel, Aren Jansen
-
Publication number: 20210056980
Abstract: Systems and methods for training a machine-learned model are provided. A method can include obtaining an unlabeled audio signal, sampling the unlabeled audio signal to select one or more sampled slices, inputting the one or more sampled slices into a machine-learned model, receiving, as an output of the machine-learned model, one or more determined characteristics associated with the audio signal, determining a loss function for the machine-learned model based at least in part on a difference between the one or more determined characteristics and one or more corresponding ground truth characteristics of the audio signal, and training the machine-learned model from end to end based at least in part on the loss function. The one or more determined characteristics can include one or more reconstructed portions of the audio signal temporally adjacent to the one or more sampled slices or an estimated distance between two sampled slices.
Type: Application
Filed: August 22, 2019
Publication date: February 25, 2021
Inventors: Beat Gfeller, Dominik Roblek, Félix de Chaumont Quitry, Marco Tagliasacchi
-
Patent number: 9161048
Abstract: A method of transmitting video data related to a sequence of video frames, includes: encoding the video frames according to a first predictive encoding to generate encoded video data, the encoded video data including a prediction error based on the difference between a portion of a current video frame in the sequence and a first predictor thereof based on a first preceding video frame in the sequence; generating auxiliary video data related to the portion of the current video frame; and transmitting the encoded video data and the auxiliary video data to a receiver, the encoded video data being transmitted over a first channel, and the auxiliary video data being transmitted over a second channel. The step of generating auxiliary video data includes calculating a correlation between the first predictor and a predetermined second predictor based on a second preceding video frame in the sequence, the second preceding video frame preceding in the sequence the first preceding video frame.
Type: Grant
Filed: June 30, 2006
Date of Patent: October 13, 2015
Assignees: Telecom Italia S.p.A., Politecnico di Milano
Inventors: Giovanni Cordara, Marco Tagliasacchi, Stefano Tubaro
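The predictive-encoding step at the core of this claim — transmit the prediction error between a block of the current frame and its predictor from the preceding frame — can be shown in a few lines. The frame sizes and motion-free predictor below are toy simplifications; the claimed method additionally sends correlation-based auxiliary data over a second channel, which this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy frames: the current frame differs from the preceding one only by
# small per-pixel changes, as is typical between consecutive frames.
prev_frame = rng.integers(0, 256, size=(8, 8)).astype(float)
curr_frame = prev_frame + rng.integers(-3, 4, size=(8, 8))

# Encoder side: the predictor is the preceding frame (no motion here),
# and only the prediction error is encoded and transmitted.
prediction_error = curr_frame - prev_frame

# Decoder side: add the received error back onto the same predictor.
reconstructed = prev_frame + prediction_error

assert np.allclose(reconstructed, curr_frame)
# The residual has a far smaller dynamic range than the raw frame,
# which is why it is cheaper to code.
assert np.abs(prediction_error).max() <= 3
```

The auxiliary second-channel data in the patent exists to help the receiver cope with loss of the primary stream, by characterizing how well an older frame would substitute as a predictor.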
-
Publication number: 20130304401
Abstract: An apparatus for locating the point of impact of a body on a surface comprises detecting means (12; 12a, 12b, 12c, 12d) adapted to detect pressure waves generated by the interaction of said body with said surface, and a processing unit (15) operatively connected to said detecting means (12; 12a, 12b, 12c, 12d); the detecting means (12; 12a, 12b, 12c, 12d) and the processing unit (14, 15) are configured for detecting and processing power values associated with said pressure waves so as to calculate the position of said point of impact as a function of the aforementioned power values. Also described is the related method.
Type: Application
Filed: January 30, 2012
Publication date: November 14, 2013
Applicant: TECHNOGYM S.P.A.
Inventors: Stefano Tubaro, Augusto Sarti, Marco Tagliasacchi, Fabio Antonacci, Fulvio Crivellaro, Gabriele Genovese
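To make the idea of locating an impact "as a function of power values" concrete, here is a toy sketch: assume received power falls off as 1/d² from the impact point, then search for the position whose predicted power pattern best matches the measurements. The inverse-square model, sensor layout, and grid search are illustrative assumptions, not the apparatus's actual algorithm.

```python
import numpy as np

# Four pressure-wave detectors at the corners of a unit surface,
# and a known impact point used to synthesize measurements.
sensors = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
true_impact = np.array([0.3, 0.7])

def powers(point, source_power=1.0):
    # Assumed model: received power decays as 1/d^2 from the impact.
    d2 = np.sum((sensors - point) ** 2, axis=1)
    return source_power / d2

measured = powers(true_impact)

# Grid search over candidate impact points. Comparing NORMALIZED power
# patterns makes the unknown source power cancel out.
grid = np.linspace(0.05, 0.95, 91)
best, best_err = None, np.inf
for gx in grid:
    for gy in grid:
        cand = powers(np.array([gx, gy]))
        err = np.linalg.norm(cand / cand.sum() - measured / measured.sum())
        if err < best_err:
            best, best_err = np.array([gx, gy]), err

assert np.linalg.norm(best - true_impact) < 0.02
```

The essential point the sketch captures is that power ratios alone constrain position: each pair of sensors pins the impact to a curve, and multiple sensors intersect those curves at the impact point.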