Patents by Inventor Marco Tagliasacchi
Marco Tagliasacchi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240153514
Abstract: Apparatus and methods related to enhancement of audio content are provided. An example method includes receiving, by a computing device and via a communications network interface, a compressed audio data frame, wherein the compressed audio data frame is received after transmission over a communications network. The method further includes decompressing the compressed audio data frame to extract an audio waveform. The method also includes predicting, by applying a neural network to the audio waveform, an enhanced version of the audio waveform, wherein the neural network has been trained on (i) a ground truth sample comprising unencoded audio waveforms prior to compression by an audio encoder, and (ii) a training dataset comprising decoded audio waveforms after compression of the unencoded audio waveforms by the audio encoder. The method additionally includes providing, by an audio output component of the computing device, the enhanced version of the audio waveform.
Type: Application
Filed: March 5, 2021
Publication date: May 9, 2024
Inventors: Omer Ahmed Siddig Osman, Dominik Roblek, Yunpeng Li, Marco Tagliasacchi, Oleg Rybakov, Victor Ungureanu, Eric Giguere
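As a rough illustration of the training setup this abstract describes (pairs of ground-truth waveforms and their decoded, degraded counterparts), here is a toy NumPy sketch. The gain-plus-noise "codec", the linear least-squares "enhancer", and all array sizes are illustrative stand-ins, not the patented method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training data: ground-truth unencoded waveforms, and the same
# waveforms after a lossy encode/decode round trip (here simulated by
# attenuation plus noise).
clean = rng.standard_normal((64, 256))
decoded = clean * 0.8 + 0.05 * rng.standard_normal(clean.shape)

def frame_features(x, taps=5):
    # Sliding windows of the decoded signal serve as model input.
    pad = np.pad(x, ((0, 0), (taps - 1, 0)))
    return np.stack([pad[:, i:i + x.shape[1]] for i in range(taps)], axis=-1)

X = frame_features(decoded).reshape(-1, 5)   # (samples, taps)
y = clean.reshape(-1)                        # target: ground-truth samples

# "Training" the enhancer: fit a linear filter that maps decoded audio
# back toward the ground-truth waveform (a stand-in for the neural net).
w, *_ = np.linalg.lstsq(X, y, rcond=None)

enhanced = frame_features(decoded) @ w
err_before = np.mean((decoded - clean) ** 2)
err_after = np.mean((enhanced - clean) ** 2)
assert err_after < err_before
```

The point of the sketch is only the supervision signal: the model never sees labels, just (decoded, original) pairs produced by running a codec over clean audio.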
-
Publication number: 20240078412
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a prediction of an audio signal. One of the methods includes receiving a request to generate an audio signal; obtaining a semantic representation of the audio signal; generating, using one or more generative neural networks and conditioned on at least the semantic representation, an acoustic representation of the audio signal; and processing at least the acoustic representation using a decoder neural network to generate the prediction of the audio signal.
Type: Application
Filed: September 7, 2023
Publication date: March 7, 2024
Inventors: Neil Zeghidour, David Grangier, Marco Tagliasacchi, Raphaël Marinier, Olivier Teboul, Zalán Borsos
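The staged generation the abstract describes (semantic representation, then an acoustic representation conditioned on it, then a decoder) can be sketched as a data-flow skeleton. The three stand-in functions below are placeholders for large trained models; token counts, vocabulary sizes, and the hop length are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

def semantic_model(n_tokens=8, vocab=32):
    # Stage 1: coarse "semantic" tokens capturing long-term structure
    # (placeholder: random tokens instead of a trained model's output).
    return rng.integers(0, vocab, size=n_tokens)

def acoustic_model(semantic_tokens, codebooks=4, vocab=64):
    # Stage 2: acoustic tokens generated conditioned on the semantic
    # tokens; here one token per codebook per semantic step.
    return rng.integers(0, vocab, size=(len(semantic_tokens), codebooks))

def codec_decoder(acoustic_tokens, hop=160):
    # Stage 3: a neural codec decoder maps acoustic tokens to waveform;
    # this placeholder just emits `hop` silent samples per token step.
    return np.zeros(acoustic_tokens.shape[0] * hop)

semantic = semantic_model()
acoustic = acoustic_model(semantic)
waveform = codec_decoder(acoustic)
assert waveform.shape[0] == len(semantic) * 160
```

Each stage consumes the previous stage's output as conditioning, which is the structural idea the claim language is describing.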
-
Patent number: 11915689
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a prediction of an audio signal. One of the methods includes receiving a request to generate an audio signal conditioned on an input; processing the input using an embedding neural network to map the input to one or more embedding tokens; generating a semantic representation of the audio signal; generating, using one or more generative neural networks and conditioned on at least the semantic representation and the embedding tokens, an acoustic representation of the audio signal; and processing at least the acoustic representation using a decoder neural network to generate the prediction of the audio signal.
Type: Grant
Filed: September 7, 2023
Date of Patent: February 27, 2024
Assignee: Google LLC
Inventors: Andrea Agostinelli, Timo Immanuel Denk, Antoine Caillon, Neil Zeghidour, Jesse Engel, Mauro Verzetti, Christian Frank, Zalán Borsos, Matthew Sharifi, Adam Joseph Roberts, Marco Tagliasacchi
-
Publication number: 20230419989
Abstract: Example methods include receiving training data comprising a plurality of audio clips and a plurality of textual descriptions of audio. The methods include generating a shared representation comprising a joint embedding. An audio embedding of a given audio clip is within a threshold distance of a text embedding of a textual description of the given audio clip. The methods include generating, based on the joint embedding, a conditioning vector and training, based on the conditioning vector, a neural network to: receive (i) an input audio waveform, and (ii) an input comprising one or more of an input textual description of a target audio source in the input audio waveform, or an audio sample of the target audio source, separate audio corresponding to the target audio source from the input audio waveform, and output the separated audio corresponding to the target audio source in response to the receiving of the input.
Type: Application
Filed: June 24, 2022
Publication date: December 28, 2023
Inventors: Beat Gfeller, Kevin Ian Kilgour, Marco Tagliasacchi, Aren Jansen, Scott Thomas Wisdom, Qingqing Huang
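The key property in this abstract is the joint embedding: the audio embedding of a clip lies within a threshold distance of the text embedding of its description, so a text embedding can serve as a conditioning vector that "points at" the matching audio. A toy sketch of that property, with random unit vectors standing in for trained encoder outputs:

```python
import numpy as np

rng = np.random.default_rng(2)

def unit(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

# Stand-ins for trained encoders: four "text" embeddings, and paired
# "audio" embeddings nudged to lie near their own description (the
# within-a-threshold-distance property from the abstract).
text_emb = unit(rng.standard_normal((4, 16)))
audio_emb = unit(text_emb + 0.05 * rng.standard_normal((4, 16)))

# Conditioning vector for clip 0: the embedding of its textual description.
cond = text_emb[0]

# The paired audio clip is the nearest neighbor of its description in the
# shared space, so the conditioning vector identifies the target source.
dists = np.linalg.norm(audio_emb - cond, axis=-1)
assert dists[0] < 0.5
assert np.argmin(dists) == 0
```

In the actual system this conditioning vector feeds a separation network; the sketch only shows why text and audio conditioning are interchangeable once the embedding space is shared.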
-
Publication number: 20230395087
Abstract: Example implementations of the present disclosure relate to machine learning for microphone style transfer, for example, to facilitate augmentation of audio data such as speech data to improve robustness of machine learning models trained on the audio data. Systems and methods for microphone style transfer can include one or more machine-learned microphone models trained to obtain and augment signal data to mimic characteristics of signal data obtained from a target microphone. The systems and methods can include a speech enhancement network for enhancing a sample before the style transfer. The augmentation output can then be utilized for a variety of downstream tasks.
Type: Application
Filed: October 15, 2021
Publication date: December 7, 2023
Inventors: Marco Tagliasacchi, Beat Gfeller, Yunpeng Li, Zalán Borsos
-
Publication number: 20230379645
Abstract: The technology generally relates to spatial audio communication between devices. For example, a first device and a second device may be connected via a communication link. The first device may capture audio signals in an environment through two or more microphones. The first device may encode the captured audio with spatial configuration data. The first device may transmit the encoded audio via the communication link to the second device. The second device may decode the encoded audio into binaural or ambisonic audio to be output by one or more speakers of the second device. The binaural or ambisonic audio may be converted into spatial audio to be output. The second device may output the binaural or spatial audio to create an immersive listening experience.
Type: Application
Filed: May 19, 2022
Publication date: November 23, 2023
Inventors: Rajeev Conrad Nongpiur, Qian Zhang, Andrew James Sutter, Kung-Wei Liu, Jihan Li, Hélène Bahu, Leonardo Kusumo, Sze Chie Lim, Marco Tagliasacchi, Neil Zeghidour, Michael Takezo Chinen
-
Publication number: 20230377561
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing audio inputs using a learned audio frontend machine learning model that processes the audio input to generate a representation of the audio input. The representation can then be processed by an audio understanding model to generate a respective output for each of one or more audio understanding tasks.
Type: Application
Filed: October 4, 2021
Publication date: November 23, 2023
Inventors: Neil Zeghidour, Olivier Teboul, Félix de Chaumont Quitry, Marco Tagliasacchi
-
Patent number: 11756530
Abstract: Example embodiments relate to techniques for training artificial neural networks or other machine-learning encoders to accurately predict the pitch of input audio samples in a semitone or otherwise logarithmically-scaled pitch space. An example method may include generating, from a sample of audio data, two training samples by applying two different pitch shifts to the sample of audio training data. This can be done by converting the sample of audio data into the frequency domain and then shifting the transformed data. These known shifts are then compared to the predicted pitches generated by applying the two training samples to the encoder. The encoder is then updated based on the comparison, such that the relative pitch output by the encoder is improved with respect to accuracy. One or more audio samples, labeled with absolute pitch values, can then be used to calibrate the relative pitch values generated by the trained encoder.
Type: Grant
Filed: September 25, 2020
Date of Patent: September 12, 2023
Assignee: Google LLC
Inventors: Marco Tagliasacchi, Mihajlo Velimirovic, Matthew Sharifi, Dominik Roblek, Christian Frank, Beat Gfeller
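The self-supervision signal here is that two versions of the same audio, shifted by a known number of semitones, should produce pitch predictions that differ by exactly that amount. A toy sketch, with a resampling-based shift and an FFT-peak "encoder" standing in for the frequency-domain shift and neural encoder of the abstract:

```python
import numpy as np

sr = 8000
t = np.arange(sr) / sr

def pitch_shift(x, semitones):
    # Shift pitch by resampling: a simple stand-in for the
    # frequency-domain shift described in the abstract.
    factor = 2 ** (semitones / 12)
    idx = np.arange(len(x)) * factor
    keep = idx < len(x) - 1
    return np.interp(idx[keep], np.arange(len(x)), x)

def encoder(x):
    # Toy "encoder": pitch as log2 of the dominant FFT bin, so pitch
    # differences live in a logarithmic (semitone-like) space.
    spec = np.abs(np.fft.rfft(x, n=sr))
    return 12 * np.log2(np.argmax(spec[1:]) + 1)

x = np.sin(2 * np.pi * 220 * t)
a, b = pitch_shift(x, 0), pitch_shift(x, 7)  # known 7-semitone shift

# Self-supervised check: the difference between the two predicted
# pitches should match the known relative shift.
predicted_shift = encoder(b) - encoder(a)
assert abs(predicted_shift - 7) < 0.5
```

In training, the mismatch between `predicted_shift` and the known shift would drive the encoder update; a few absolutely-labeled samples later anchor the relative scale, as the abstract notes.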
-
Publication number: 20230186927
Abstract: Methods, systems and apparatus, including computer programs encoded on computer storage media. One of the methods includes receiving an audio waveform that includes a respective audio sample for each of a plurality of time steps, processing the audio waveform using an encoder neural network to generate a plurality of feature vectors representing the audio waveform, generating a respective coded representation of each of the plurality of feature vectors using a plurality of vector quantizers that are each associated with a respective codebook of code vectors, wherein the respective coded representation of each feature vector identifies a plurality of code vectors, including a respective code vector from the codebook of each vector quantizer, that define a quantized representation of the feature vector, and generating a compressed representation of the audio waveform by compressing the respective coded representation of each of the plurality of feature vectors.
Type: Application
Filed: February 6, 2023
Publication date: June 15, 2023
Inventors: Neil Zeghidour, Marco Tagliasacchi, Dominik Roblek
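The multi-quantizer coding in this abstract matches the residual vector quantization pattern: each quantizer has its own codebook and quantizes the residual left over by the previous stage, so each feature vector's coded representation is one code index per quantizer. A toy sketch with random, untrained codebooks (sizes and the zero-code trick are illustrative choices, not from the patent):

```python
import numpy as np

rng = np.random.default_rng(3)

dim, codebook_size, n_quantizers = 8, 16, 4
codebooks = rng.standard_normal((n_quantizers, codebook_size, dim))
codebooks[:, 0] = 0.0  # a zero code lets a stage pass its residual through

def rvq_encode(feature):
    # Each quantizer codes the residual of the previous stage, yielding
    # one code index per codebook: the "coded representation".
    residual, codes = feature.copy(), []
    for cb in codebooks:
        idx = int(np.argmin(np.linalg.norm(cb - residual, axis=1)))
        codes.append(idx)
        residual = residual - cb[idx]
    return codes

def rvq_decode(codes):
    # Reconstruction sums the selected code vector from every codebook.
    return sum(cb[i] for cb, i in zip(codebooks, codes))

feature = rng.standard_normal(dim)
codes = rvq_encode(feature)
recon = rvq_decode(codes)

# Adding quantizers can only refine the approximation (the zero code
# guarantees no stage makes the residual worse).
one_stage = codebooks[0][codes[0]]
assert len(codes) == n_quantizers
assert np.linalg.norm(feature - recon) <= np.linalg.norm(feature - one_stage) + 1e-9
```

The stream of per-stage indices is what would then be entropy-compressed into the final bitstream; dropping later indices degrades quality gracefully, which is what makes this structure attractive for variable-bitrate codecs.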
-
Publication number: 20230085596
Abstract: Systems and methods for training a machine-learned model are provided. A method can include obtaining an unlabeled audio signal, sampling the unlabeled audio signal to select one or more sampled slices, inputting the one or more sampled slices into a machine-learned model, receiving, as an output of the machine-learned model, one or more determined characteristics associated with the audio signal, determining a loss function for the machine-learned model based at least in part on a difference between the one or more determined characteristics and one or more corresponding ground truth characteristics of the audio signal, and training the machine-learned model from end to end based at least in part on the loss function. The one or more determined characteristics can include one or more reconstructed portions of the audio signal temporally adjacent to the one or more sampled slices or an estimated distance between two sampled slices.
Type: Application
Filed: November 14, 2022
Publication date: March 16, 2023
Inventors: Beat Gfeller, Dominik Roblek, Félix de Chaumont Quitry, Marco Tagliasacchi
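The self-supervised recipe here is that the training target is free: sample two slices from an unlabeled signal and use their temporal gap (or the adjacent samples) as ground truth. A minimal sketch of that pair-sampling step, with slice length and signal size chosen arbitrarily for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)

# An unlabeled signal: no human annotations anywhere.
signal = rng.standard_normal(1000)
slice_len = 50

def sample_pair(x):
    # Draw two slice positions; their temporal gap is the
    # self-supervised "ground truth characteristic" the model must
    # predict, obtained for free from the sampling process itself.
    i, j = sorted(rng.integers(0, len(x) - slice_len, size=2))
    gap = j - i
    return x[i:i + slice_len], x[j:j + slice_len], gap

a, b, gap = sample_pair(signal)
assert a.shape == b.shape == (slice_len,)
assert 0 <= gap < len(signal)
```

A model fed `(a, b)` and trained to regress `gap` (or to reconstruct the samples adjacent to a slice) learns temporal structure end to end without labels, which is the point of the claimed method.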
-
Patent number: 11600282
Abstract: Methods, systems and apparatus, including computer programs encoded on computer storage media. One of the methods includes receiving an audio waveform that includes a respective audio sample for each of a plurality of time steps, processing the audio waveform using an encoder neural network to generate a plurality of feature vectors representing the audio waveform, generating a respective coded representation of each of the plurality of feature vectors using a plurality of vector quantizers that are each associated with a respective codebook of code vectors, wherein the respective coded representation of each feature vector identifies a plurality of code vectors, including a respective code vector from the codebook of each vector quantizer, that define a quantized representation of the feature vector, and generating a compressed representation of the audio waveform by compressing the respective coded representation of each of the plurality of feature vectors.
Type: Grant
Filed: July 1, 2022
Date of Patent: March 7, 2023
Assignee: Google LLC
Inventors: Neil Zeghidour, Marco Tagliasacchi, Dominik Roblek
-
Publication number: 20230013370
Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing an input audio waveform using a generator neural network to generate an output audio waveform. In one aspect, a method comprises: receiving an input audio waveform; processing the input audio waveform using an encoder neural network to generate a set of feature vectors representing the input audio waveform; and processing the set of feature vectors representing the input audio waveform using a decoder neural network to generate an output audio waveform that comprises a respective output audio sample for each of a plurality of output time steps.
Type: Application
Filed: July 1, 2022
Publication date: January 19, 2023
Inventors: Yunpeng Li, Marco Tagliasacchi, Dominik Roblek, Félix de Chaumont Quitry, Beat Gfeller, Hannah Raphaelle Muckenhirn, Victor Ungureanu, Oleg Rybakov, Karolis Misiunas, Zalán Borsos
-
Publication number: 20230019128
Abstract: Methods, systems and apparatus, including computer programs encoded on computer storage media. One of the methods includes receiving an audio waveform that includes a respective audio sample for each of a plurality of time steps, processing the audio waveform using an encoder neural network to generate a plurality of feature vectors representing the audio waveform, generating a respective coded representation of each of the plurality of feature vectors using a plurality of vector quantizers that are each associated with a respective codebook of code vectors, wherein the respective coded representation of each feature vector identifies a plurality of code vectors, including a respective code vector from the codebook of each vector quantizer, that define a quantized representation of the feature vector, and generating a compressed representation of the audio waveform by compressing the respective coded representation of each of the plurality of feature vectors.
Type: Application
Filed: July 1, 2022
Publication date: January 19, 2023
Inventors: Neil Zeghidour, Marco Tagliasacchi, Dominik Roblek
-
Publication number: 20220383112
Abstract: A system including a multi-task adapter neural network for performing multiple machine learning tasks is described. The adapter neural network is configured to receive a shared input for the machine learning tasks, and process the shared input to generate, for each of the machine learning tasks, a respective predicted output. The adapter neural network includes (i) a shared encoder configured to receive the shared input and to process the shared input to extract shared feature representations for the machine learning tasks, and (ii) multiple task-adapter encoders, each of the task-adapter encoders being associated with a respective machine learning task in the machine learning tasks and configured to: receive the shared input, receive the shared feature representations from the shared encoder, and process the shared input and the shared feature representations to generate the respective predicted output for the respective machine learning task.
Type: Application
Filed: September 23, 2020
Publication date: December 1, 2022
Inventors: Marco Tagliasacchi, Félix de Chaumont Quitry, Dominik Roblek
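The wiring described here is concrete enough to sketch: one shared encoder feeds every task, and each task adapter consumes both the raw shared input and the shared features. The NumPy sketch below uses random linear layers and invented task names purely to show that topology; nothing about layer types or sizes comes from the patent.

```python
import numpy as np

rng = np.random.default_rng(5)

d_in, d_shared, d_out = 10, 6, 3
W_shared = rng.standard_normal((d_in, d_shared))

# One adapter head per task (hypothetical task names).
tasks = {
    "task_a": rng.standard_normal((d_in + d_shared, d_out)),
    "task_b": rng.standard_normal((d_in + d_shared, d_out)),
}

def forward(x):
    # Shared encoder: extracts features common to all tasks.
    shared = np.tanh(x @ W_shared)
    outputs = {}
    for name, W_task in tasks.items():
        # Each task adapter receives the shared input AND the shared
        # feature representation, per the abstract.
        adapter_in = np.concatenate([x, shared])
        outputs[name] = adapter_in @ W_task
    return outputs

x = rng.standard_normal(d_in)
preds = forward(x)
assert set(preds) == {"task_a", "task_b"}
assert all(p.shape == (d_out,) for p in preds.values())
```

Because the adapters also see the raw input, a task is not limited to whatever the shared encoder chose to keep, which is the usual motivation for this skip-style layout.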
-
Patent number: 11501787
Abstract: Systems and methods for training a machine-learned model are provided. A method can include obtaining an unlabeled audio signal, sampling the unlabeled audio signal to select one or more sampled slices, inputting the one or more sampled slices into a machine-learned model, receiving, as an output of the machine-learned model, one or more determined characteristics associated with the audio signal, determining a loss function for the machine-learned model based at least in part on a difference between the one or more determined characteristics and one or more corresponding ground truth characteristics of the audio signal, and training the machine-learned model from end to end based at least in part on the loss function. The one or more determined characteristics can include one or more reconstructed portions of the audio signal temporally adjacent to the one or more sampled slices or an estimated distance between two sampled slices.
Type: Grant
Filed: August 22, 2019
Date of Patent: November 15, 2022
Assignee: Google LLC
Inventors: Beat Gfeller, Dominik Roblek, Félix de Chaumont Quitry, Marco Tagliasacchi
-
Publication number: 20220343896
Abstract: Example embodiments relate to techniques for training artificial neural networks or other machine-learning encoders to accurately predict the pitch of input audio samples in a semitone or otherwise logarithmically-scaled pitch space. An example method may include generating, from a sample of audio data, two training samples by applying two different pitch shifts to the sample of audio training data. This can be done by converting the sample of audio data into the frequency domain and then shifting the transformed data. These known shifts are then compared to the predicted pitches generated by applying the two training samples to the encoder. The encoder is then updated based on the comparison, such that the relative pitch output by the encoder is improved with respect to accuracy. One or more audio samples, labeled with absolute pitch values, can then be used to calibrate the relative pitch values generated by the trained encoder.
Type: Application
Filed: September 25, 2020
Publication date: October 27, 2022
Inventors: Marco Tagliasacchi, Mihajlo Velimirovic, Matthew Sharifi, Dominik Roblek, Christian Frank, Beat Gfeller
-
Publication number: 20220059117
Abstract: Examples relate to on-device non-semantic representation fine-tuning for speech classification. A computing system may obtain audio data having a speech portion and train a neural network to learn a non-semantic speech representation based on the speech portion of the audio data. The computing system may evaluate performance of the non-semantic speech representation based on a set of benchmark tasks corresponding to a speech domain and perform a fine-tuning process on the non-semantic speech representation based on one or more downstream tasks. The computing system may further generate a model based on the non-semantic representation and provide the model to a mobile computing device. The model is configured to operate locally on the mobile computing device.
Type: Application
Filed: August 24, 2020
Publication date: February 24, 2022
Inventors: Joel Shor, Ronnie Maor, Oran Lang, Omry Tuval, Marco Tagliasacchi, Ira Shavitt, Félix de Chaumont Quitry, Dotan Emanuel, Aren Jansen
-
Publication number: 20210056980
Abstract: Systems and methods for training a machine-learned model are provided. A method can include obtaining an unlabeled audio signal, sampling the unlabeled audio signal to select one or more sampled slices, inputting the one or more sampled slices into a machine-learned model, receiving, as an output of the machine-learned model, one or more determined characteristics associated with the audio signal, determining a loss function for the machine-learned model based at least in part on a difference between the one or more determined characteristics and one or more corresponding ground truth characteristics of the audio signal, and training the machine-learned model from end to end based at least in part on the loss function. The one or more determined characteristics can include one or more reconstructed portions of the audio signal temporally adjacent to the one or more sampled slices or an estimated distance between two sampled slices.
Type: Application
Filed: August 22, 2019
Publication date: February 25, 2021
Inventors: Beat Gfeller, Dominik Roblek, Félix de Chaumont Quitry, Marco Tagliasacchi
-
Patent number: 9161048
Abstract: A method of transmitting video data related to a sequence of video frames, includes: encoding the video frames according to a first predictive encoding to generate encoded video data, the encoded video data including a prediction error based on the difference between a portion of a current video frame in the sequence and a first predictor thereof based on a first preceding video frame in the sequence; generating auxiliary video data related to the portion of the current video frame; and transmitting the encoded video data and the auxiliary video data to a receiver, the encoded video data being transmitted over a first channel, and the auxiliary video data being transmitted over a second channel. The step of generating auxiliary video data includes calculating a correlation between the first predictor and a predetermined second predictor based on a second preceding video frame in the sequence, the second preceding video frame preceding in the sequence the first preceding video frame.
Type: Grant
Filed: June 30, 2006
Date of Patent: October 13, 2015
Assignees: Telecom Italia S.p.A., Politecnico di Milano
Inventors: Giovanni Cordara, Marco Tagliasacchi, Stefano Tubaro
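The predictive-encoding step at the core of this claim — transmit the prediction error between a block of the current frame and its predictor from the preceding frame — can be shown in a few lines. The frame sizes and motion-free predictor below are toy simplifications; the claimed method additionally sends correlation-based auxiliary data over a second channel, which this sketch omits.

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy frames: the current frame differs from the preceding one only by
# small per-pixel changes, as is typical between consecutive frames.
prev_frame = rng.integers(0, 256, size=(8, 8)).astype(float)
curr_frame = prev_frame + rng.integers(-3, 4, size=(8, 8))

# Encoder side: the predictor is the preceding frame (no motion here),
# and only the prediction error is encoded and transmitted.
prediction_error = curr_frame - prev_frame

# Decoder side: add the received error back onto the same predictor.
reconstructed = prev_frame + prediction_error

assert np.allclose(reconstructed, curr_frame)
# The residual has a far smaller dynamic range than the raw frame,
# which is why it is cheaper to code.
assert np.abs(prediction_error).max() <= 3
```

The auxiliary second-channel data in the patent exists to help the receiver cope with loss of the primary stream, by characterizing how well an older frame would substitute as a predictor.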
-
Publication number: 20130304401
Abstract: An apparatus for locating the point of impact of a body on a surface comprises detecting means (12; 12a, 12b, 12c, 12d) adapted to detect pressure waves generated by the interaction of said body with said surface, and a processing unit (15) operatively connected to said detecting means (12; 12a, 12b, 12c, 12d); the detecting means (12; 12a, 12b, 12c, 12d) and the processing unit (14, 15) are configured for detecting and processing power values associated with said pressure waves so as to calculate the position of said point of impact as a function of the aforementioned power values. Also described is the related method.
Type: Application
Filed: January 30, 2012
Publication date: November 14, 2013
Applicant: TECHNOGYM S.P.A.
Inventors: Stefano Tubaro, Augusto Sarti, Marco Tagliasacchi, Fabio Antonacci, Fulvio Crivellaro, Gabriele Genovese
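To make the idea of locating an impact "as a function of power values" concrete, here is a toy sketch: assume received power falls off as 1/d² from the impact point, then search for the position whose predicted power pattern best matches the measurements. The inverse-square model, sensor layout, and grid search are illustrative assumptions, not the apparatus's actual algorithm.

```python
import numpy as np

# Four pressure-wave detectors at the corners of a unit surface,
# and a known impact point used to synthesize measurements.
sensors = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
true_impact = np.array([0.3, 0.7])

def powers(point, source_power=1.0):
    # Assumed model: received power decays as 1/d^2 from the impact.
    d2 = np.sum((sensors - point) ** 2, axis=1)
    return source_power / d2

measured = powers(true_impact)

# Grid search over candidate impact points. Comparing NORMALIZED power
# patterns makes the unknown source power cancel out.
grid = np.linspace(0.05, 0.95, 91)
best, best_err = None, np.inf
for gx in grid:
    for gy in grid:
        cand = powers(np.array([gx, gy]))
        err = np.linalg.norm(cand / cand.sum() - measured / measured.sum())
        if err < best_err:
            best, best_err = np.array([gx, gy]), err

assert np.linalg.norm(best - true_impact) < 0.02
```

The essential point the sketch captures is that power ratios alone constrain position: each pair of sensors pins the impact to a curve, and multiple sensors intersect those curves at the impact point.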