Patents by Inventor Marco Tagliasacchi

Marco Tagliasacchi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240153514
Abstract: Apparatus and methods related to enhancement of audio content are provided. An example method includes receiving, by a computing device and via a communications network interface, a compressed audio data frame, wherein the compressed audio data frame is received after transmission over a communications network. The method further includes decompressing the compressed audio data frame to extract an audio waveform. The method also includes predicting, by applying a neural network to the audio waveform, an enhanced version of the audio waveform, wherein the neural network has been trained on (i) a ground truth sample comprising unencoded audio waveforms prior to compression by an audio encoder, and (ii) a training dataset comprising decoded audio waveforms after compression of the unencoded audio waveforms by the audio encoder. The method additionally includes providing, by an audio output component of the computing device, the enhanced version of the audio waveform.
    Type: Application
    Filed: March 5, 2021
    Publication date: May 9, 2024
    Inventors: Omer Ahmed Siddig Osman, Dominik Roblek, Yunpeng Li, Marco Tagliasacchi, Oleg Rybakov, Victor Ungureanu, Eric Giguere
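The training setup in the abstract above pairs codec-degraded waveforms with their clean originals. A minimal sketch of that dataset construction, with hypothetical `encode`/`decode` stand-ins for a real audio codec:

```python
def build_enhancement_dataset(clean_waveforms, encode, decode):
    """Pair each codec round-trip (the network's input) with the clean
    waveform it came from (the ground truth), so the enhancement model
    learns to undo coding loss. `encode`/`decode` are placeholders for
    a real codec."""
    return [(decode(encode(w)), w) for w in clean_waveforms]
```

Here a toy "codec" that rounds samples to one decimal illustrates the pairing: `build_enhancement_dataset([[0.12, 0.98]], lambda w: [round(x, 1) for x in w], lambda w: w)` yields the degraded/clean pair `([0.1, 1.0], [0.12, 0.98])`.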
  • Publication number: 20240078412
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a prediction of an audio signal. One of the methods includes receiving a request to generate an audio signal; obtaining a semantic representation of the audio signal; generating, using one or more generative neural networks and conditioned on at least the semantic representation, an acoustic representation of the audio signal; and processing at least the acoustic representation using a decoder neural network to generate the prediction of the audio signal.
    Type: Application
    Filed: September 7, 2023
    Publication date: March 7, 2024
    Inventors: Neil Zeghidour, David Grangier, Marco Tagliasacchi, Raphaël Marinier, Olivier Teboul, Zalán Borsos
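The abstract above describes a staged pipeline: a semantic representation conditions a generative model that produces an acoustic representation, which a decoder turns into audio. A minimal sketch of that control flow (the three model callables are hypothetical stand-ins, not the patent's actual networks):

```python
def staged_audio_generation(request, semantic_model, acoustic_model, decoder):
    """Three-stage sketch: semantic tokens condition the generation of
    acoustic tokens, which a decoder maps to the predicted audio signal."""
    semantic_tokens = semantic_model(request)       # coarse, content-level
    acoustic_tokens = acoustic_model(semantic_tokens)  # fine, conditioned
    return decoder(acoustic_tokens)                 # waveform prediction
```

With toy stand-ins, e.g. `staged_audio_generation("drum loop", lambda r: [len(r)], lambda s: [t * 2 for t in s], lambda a: sum(a))`, the request flows through all three stages in order.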
  • Patent number: 11915689
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating a prediction of an audio signal. One of the methods includes receiving a request to generate an audio signal conditioned on an input; processing the input using an embedding neural network to map the input to one or more embedding tokens; generating a semantic representation of the audio signal; generating, using one or more generative neural networks and conditioned on at least the semantic representation and the embedding tokens, an acoustic representation of the audio signal; and processing at least the acoustic representation using a decoder neural network to generate the prediction of the audio signal.
    Type: Grant
    Filed: September 7, 2023
    Date of Patent: February 27, 2024
    Assignee: Google LLC
    Inventors: Andrea Agostinelli, Timo Immanuel Denk, Antoine Caillon, Neil Zeghidour, Jesse Engel, Mauro Verzetti, Christian Frank, Zalán Borsos, Matthew Sharifi, Adam Joseph Roberts, Marco Tagliasacchi
  • Publication number: 20230419989
    Abstract: Example methods include receiving training data comprising a plurality of audio clips and a plurality of textual descriptions of audio. The methods include generating a shared representation comprising a joint embedding. An audio embedding of a given audio clip is within a threshold distance of a text embedding of a textual description of the given audio clip. The methods include generating, based on the joint embedding, a conditioning vector and training, based on the conditioning vector, a neural network to: receive (i) an input audio waveform, and (ii) an input comprising one or more of an input textual description of a target audio source in the input audio waveform, or an audio sample of the target audio source, separate audio corresponding to the target audio source from the input audio waveform, and output the separated audio corresponding to the target audio source in response to the receiving of the input.
    Type: Application
    Filed: June 24, 2022
    Publication date: December 28, 2023
    Inventors: Beat Gfeller, Kevin Ian Kilgour, Marco Tagliasacchi, Aren Jansen, Scott Thomas Wisdom, Qingqing Huang
  • Publication number: 20230395087
    Abstract: Example implementations of the present disclosure relate to machine learning for microphone style transfer, for example, to facilitate augmentation of audio data such as speech data to improve robustness of machine learning models trained on the audio data. Systems and methods for microphone style transfer can include one or more machine-learned microphone models trained to obtain and augment signal data to mimic characteristics of signal data obtained from a target microphone. The systems and methods can include a speech enhancement network for enhancing a sample before the style transfer. The augmentation output can then be utilized for a variety of downstream tasks.
    Type: Application
    Filed: October 15, 2021
    Publication date: December 7, 2023
    Inventors: Marco Tagliasacchi, Beat Gfeller, Yunpeng Li, Zalán Borsos
  • Publication number: 20230379645
    Abstract: The technology generally relates to spatial audio communication between devices. For example, a first device and a second device may be connected via a communication link. The first device may capture audio signals in an environment through two or more microphones. The first device may encode the captured audio with spatial configuration data. The first device may transmit the encoded audio via the communication link to the second device. The second device may decode the encoded audio into binaural or ambisonic audio to be output by one or more speakers of the second device. The binaural or ambisonic audio may be converted into spatial audio to be output. The second device may output the binaural or spatial audio to create an immersive listening experience.
    Type: Application
    Filed: May 19, 2022
    Publication date: November 23, 2023
    Inventors: Rajeev Conrad Nongpiur, Qian Zhang, Andrew James Sutter, Kung-Wei Liu, Jihan Li, Hélène Bahu, Leonardo Kusumo, Sze Chie Lim, Marco Tagliasacchi, Neil Zeghidour, Michael Takezo Chinen
  • Publication number: 20230377561
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for processing audio inputs using a learned audio frontend machine learning model that processes the audio input to generate a representation of the audio input. The representation can then be processed by an audio understanding model to generate a respective output for each of one or more audio understanding tasks.
    Type: Application
    Filed: October 4, 2021
    Publication date: November 23, 2023
    Inventors: Neil Zeghidour, Olivier Teboul, Félix de Chaumont Quitry, Marco Tagliasacchi
  • Patent number: 11756530
Abstract: Example embodiments relate to techniques for training artificial neural networks or other machine-learning encoders to accurately predict the pitch of input audio samples in a semitone or otherwise logarithmically-scaled pitch space. An example method may include generating, from a sample of audio data, two training samples by applying two different pitch shifts to the sample of audio training data. This can be done by converting the sample of audio data into the frequency domain and then shifting the transformed data. These known shifts are then compared to the predicted pitches generated by applying the two training samples to the encoder. The encoder is then updated based on the comparison, such that the relative pitch output by the encoder is improved with respect to accuracy. One or more audio samples, labeled with absolute pitch values, can then be used to calibrate the relative pitch values generated by the trained encoder.
    Type: Grant
    Filed: September 25, 2020
    Date of Patent: September 12, 2023
    Assignee: Google LLC
    Inventors: Marco Tagliasacchi, Mihajlo Velimirovic, Matthew Sharifi, Dominik Roblek, Christian Frank, Beat Gfeller
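The self-supervised recipe above needs two differently pitch-shifted views of one clip plus a loss tying the encoder's predicted pitch difference to the known shift difference. A crude sketch, rolling FFT bins as the frequency-domain shift (a real system would shift logarithmically, i.e. in semitones; the function names are illustrative):

```python
import numpy as np

def make_pitch_shift_pair(audio, shift_a, shift_b):
    """Create two pitch-shifted views of one clip by rolling its
    spectrum by two different known amounts."""
    spectrum = np.fft.rfft(audio)
    view_a = np.fft.irfft(np.roll(spectrum, shift_a), n=len(audio))
    view_b = np.fft.irfft(np.roll(spectrum, shift_b), n=len(audio))
    return view_a, view_b

def relative_pitch_loss(pred_a, pred_b, shift_a, shift_b):
    """The encoder's predicted pitch difference should recover the
    known difference between the two applied shifts."""
    return ((pred_a - pred_b) - (shift_a - shift_b)) ** 2
```

When the encoder's predictions differ by exactly the applied shift difference, the loss is zero; absolute calibration comes later from a few labeled samples, as the abstract notes.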
  • Publication number: 20230186927
    Abstract: Methods, systems and apparatus, including computer programs encoded on computer storage media. One of the methods includes receiving an audio waveform that includes a respective audio sample for each of a plurality of time steps, processing the audio waveform using an encoder neural network to generate a plurality of feature vectors representing the audio waveform, generating a respective coded representation of each of the plurality of feature vectors using a plurality of vector quantizers that are each associated with a respective codebook of code vectors, wherein the respective coded representation of each feature vector identifies a plurality of code vectors, including a respective code vector from the codebook of each vector quantizer, that define a quantized representation of the feature vector, and generating a compressed representation of the audio waveform by compressing the respective coded representation of each of the plurality of feature vectors.
    Type: Application
    Filed: February 6, 2023
    Publication date: June 15, 2023
    Inventors: Neil Zeghidour, Marco Tagliasacchi, Dominik Roblek
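The abstract above describes a cascade of vector quantizers, each with its own codebook, whose chosen code vectors jointly define the quantized feature. A minimal residual-quantization sketch of that idea (a simplification of the described system, which also includes the encoder network and entropy coding):

```python
import numpy as np

def residual_vector_quantize(feature, codebooks):
    """Each quantizer picks the code vector nearest to the residual left
    by the previous stages; the coded representation is one index per
    codebook, and summing the chosen code vectors gives the quantized
    feature."""
    residual = np.asarray(feature, dtype=float).copy()
    indices = []
    quantized = np.zeros_like(residual)
    for codebook in codebooks:
        idx = int(np.argmin(np.linalg.norm(codebook - residual, axis=1)))
        indices.append(idx)
        quantized += codebook[idx]
        residual -= codebook[idx]
    return indices, quantized
```

With a coarse codebook followed by a fine one, the second stage refines the first stage's approximation, which is why stacking small codebooks can match the precision of one exponentially larger codebook.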
  • Publication number: 20230085596
Abstract: Systems and methods for training a machine-learned model are provided. A method can include obtaining an unlabeled audio signal, sampling the unlabeled audio signal to select one or more sampled slices, inputting the one or more sampled slices into a machine-learned model, receiving, as an output of the machine-learned model, one or more determined characteristics associated with the audio signal, determining a loss function for the machine-learned model based at least in part on a difference between the one or more determined characteristics and one or more corresponding ground truth characteristics of the audio signal, and training the machine-learned model from end to end based at least in part on the loss function. The one or more determined characteristics can include one or more reconstructed portions of the audio signal temporally adjacent to the one or more sampled slices or an estimated distance between two sampled slices.
    Type: Application
    Filed: November 14, 2022
    Publication date: March 16, 2023
    Inventors: Beat Gfeller, Dominik Roblek, Félix de Chaumont Quitry, Marco Tagliasacchi
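A key point in the abstract above is that the ground-truth targets come from the unlabeled signal itself, e.g. the temporal distance between two sampled slices. A small sketch of that target construction (the function name is illustrative):

```python
import random

def sample_slice_pair(signal, slice_len, rng=random):
    """Draw two slices from an unlabeled signal; their temporal distance
    is a ground-truth characteristic that requires no manual labels."""
    max_start = len(signal) - slice_len
    a = rng.randint(0, max_start)
    b = rng.randint(0, max_start)
    return signal[a:a + slice_len], signal[b:b + slice_len], abs(a - b)
```

A model trained to predict this distance from the slice contents alone is pushed to learn temporally informative features, which is the self-supervision the method exploits.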
  • Patent number: 11600282
    Abstract: Methods, systems and apparatus, including computer programs encoded on computer storage media. One of the methods includes receiving an audio waveform that includes a respective audio sample for each of a plurality of time steps, processing the audio waveform using an encoder neural network to generate a plurality of feature vectors representing the audio waveform, generating a respective coded representation of each of the plurality of feature vectors using a plurality of vector quantizers that are each associated with a respective codebook of code vectors, wherein the respective coded representation of each feature vector identifies a plurality of code vectors, including a respective code vector from the codebook of each vector quantizer, that define a quantized representation of the feature vector, and generating a compressed representation of the audio waveform by compressing the respective coded representation of each of the plurality of feature vectors.
    Type: Grant
    Filed: July 1, 2022
    Date of Patent: March 7, 2023
    Assignee: Google LLC
    Inventors: Neil Zeghidour, Marco Tagliasacchi, Dominik Roblek
  • Publication number: 20230013370
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing an input audio waveform using a generator neural network to generate an output audio waveform. In one aspect, a method comprises: receiving an input audio waveform; processing the input audio waveform using an encoder neural network to generate a set of feature vectors representing the input audio waveform; and processing the set of feature vectors representing the input audio waveform using a decoder neural network to generate an output audio waveform that comprises a respective output audio sample for each of a plurality of output time steps.
    Type: Application
    Filed: July 1, 2022
    Publication date: January 19, 2023
    Inventors: Yunpeng Li, Marco Tagliasacchi, Dominik Roblek, Félix de Chaumont Quitry, Beat Gfeller, Hannah Raphaelle Muckenhirn, Victor Ungureanu, Oleg Rybakov, Karolis Misiunas, Zalán Borsos
  • Publication number: 20230019128
    Abstract: Methods, systems and apparatus, including computer programs encoded on computer storage media. One of the methods includes receiving an audio waveform that includes a respective audio sample for each of a plurality of time steps, processing the audio waveform using an encoder neural network to generate a plurality of feature vectors representing the audio waveform, generating a respective coded representation of each of the plurality of feature vectors using a plurality of vector quantizers that are each associated with a respective codebook of code vectors, wherein the respective coded representation of each feature vector identifies a plurality of code vectors, including a respective code vector from the codebook of each vector quantizer, that define a quantized representation of the feature vector, and generating a compressed representation of the audio waveform by compressing the respective coded representation of each of the plurality of feature vectors.
    Type: Application
    Filed: July 1, 2022
    Publication date: January 19, 2023
    Inventors: Neil Zeghidour, Marco Tagliasacchi, Dominik Roblek
  • Publication number: 20220383112
    Abstract: A system including a multi-task adapter neural network for performing multiple machine learning tasks is described. The adapter neural network is configured to receive a shared input for the machine learning tasks, and process the shared input to generate, for each of the machine learning tasks, a respective predicted output. The adapter neural network includes (i) a shared encoder configured to receive the shared input and to process the shared input to extract shared feature representations for the machine learning tasks, and (ii) multiple task-adapter encoders, each of the task-adapter encoders being associated with a respective machine learning task in the machine learning tasks and configured to: receive the shared input, receive the shared feature representations from the shared encoder, and process the shared input and the shared feature representations to generate the respective predicted output for the respective machine learning task.
    Type: Application
    Filed: September 23, 2020
    Publication date: December 1, 2022
    Inventors: Marco Tagliasacchi, Félix de Chaumont Quitry, Dominik Roblek
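The adapter architecture above routes one shared input through a shared encoder, then gives each task adapter both the raw input and the shared features. A structural sketch with callables standing in for the actual neural networks:

```python
class MultiTaskAdapter:
    """Sketch of the described layout: one shared encoder extracts common
    features, and each task adapter receives both the raw shared input
    and those shared features to produce its task's prediction."""

    def __init__(self, shared_encoder, task_adapters):
        self.shared_encoder = shared_encoder
        self.task_adapters = task_adapters  # maps task name -> adapter fn

    def predict(self, shared_input):
        shared_features = self.shared_encoder(shared_input)
        return {task: adapter(shared_input, shared_features)
                for task, adapter in self.task_adapters.items()}
```

Because every adapter sees the same shared features, most computation is amortized across tasks, while each task keeps a small head that can also fall back on the raw input.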
  • Patent number: 11501787
Abstract: Systems and methods for training a machine-learned model are provided. A method can include obtaining an unlabeled audio signal, sampling the unlabeled audio signal to select one or more sampled slices, inputting the one or more sampled slices into a machine-learned model, receiving, as an output of the machine-learned model, one or more determined characteristics associated with the audio signal, determining a loss function for the machine-learned model based at least in part on a difference between the one or more determined characteristics and one or more corresponding ground truth characteristics of the audio signal, and training the machine-learned model from end to end based at least in part on the loss function. The one or more determined characteristics can include one or more reconstructed portions of the audio signal temporally adjacent to the one or more sampled slices or an estimated distance between two sampled slices.
    Type: Grant
    Filed: August 22, 2019
    Date of Patent: November 15, 2022
Assignee: Google LLC
    Inventors: Beat Gfeller, Dominik Roblek, Félix de Chaumont Quitry, Marco Tagliasacchi
  • Publication number: 20220343896
Abstract: Example embodiments relate to techniques for training artificial neural networks or other machine-learning encoders to accurately predict the pitch of input audio samples in a semitone or otherwise logarithmically-scaled pitch space. An example method may include generating, from a sample of audio data, two training samples by applying two different pitch shifts to the sample of audio training data. This can be done by converting the sample of audio data into the frequency domain and then shifting the transformed data. These known shifts are then compared to the predicted pitches generated by applying the two training samples to the encoder. The encoder is then updated based on the comparison, such that the relative pitch output by the encoder is improved with respect to accuracy. One or more audio samples, labeled with absolute pitch values, can then be used to calibrate the relative pitch values generated by the trained encoder.
    Type: Application
    Filed: September 25, 2020
    Publication date: October 27, 2022
Inventors: Marco Tagliasacchi, Mihajlo Velimirovic, Matthew Sharifi, Dominik Roblek, Christian Frank, Beat Gfeller
  • Publication number: 20220059117
    Abstract: Examples relate to on-device non-semantic representation fine-tuning for speech classification. A computing system may obtain audio data having a speech portion and train a neural network to learn a non-semantic speech representation based on the speech portion of the audio data. The computing system may evaluate performance of the non-semantic speech representation based on a set of benchmark tasks corresponding to a speech domain and perform a fine-tuning process on the non-semantic speech representation based on one or more downstream tasks. The computing system may further generate a model based on the non-semantic representation and provide the model to a mobile computing device. The model is configured to operate locally on the mobile computing device.
    Type: Application
    Filed: August 24, 2020
    Publication date: February 24, 2022
    Inventors: Joel Shor, Ronnie Maor, Oran Lang, Omry Tuval, Marco Tagliasacchi, Ira Shavitt, Felix de Chaumont Quitry, Dotan Emanuel, Aren Jansen
  • Publication number: 20210056980
Abstract: Systems and methods for training a machine-learned model are provided. A method can include obtaining an unlabeled audio signal, sampling the unlabeled audio signal to select one or more sampled slices, inputting the one or more sampled slices into a machine-learned model, receiving, as an output of the machine-learned model, one or more determined characteristics associated with the audio signal, determining a loss function for the machine-learned model based at least in part on a difference between the one or more determined characteristics and one or more corresponding ground truth characteristics of the audio signal, and training the machine-learned model from end to end based at least in part on the loss function. The one or more determined characteristics can include one or more reconstructed portions of the audio signal temporally adjacent to the one or more sampled slices or an estimated distance between two sampled slices.
    Type: Application
    Filed: August 22, 2019
    Publication date: February 25, 2021
    Inventors: Beat Gfeller, Dominik Roblek, Félix de Chaumont Quitry, Marco Tagliasacchi
  • Patent number: 9161048
    Abstract: A method of transmitting video data related to a sequence of video frames, includes: encoding the video frames according to a first predictive encoding to generate encoded video data, the encoded video data including a prediction error based on the difference between a portion of a current video frame in the sequence and a first predictor thereof based on a first preceding video frame in the sequence; generating auxiliary video data related to the portion of the current video frame; and transmitting the encoded video data and the auxiliary video data to a receiver, the encoded video data being transmitted over a first channel, and the auxiliary video data being transmitted over a second channel. The step of generating auxiliary video data includes calculating a correlation between the first predictor and a predetermined second predictor based on a second preceding video frame in the sequence, the second preceding video frame preceding in the sequence the first preceding video frame.
    Type: Grant
    Filed: June 30, 2006
    Date of Patent: October 13, 2015
    Assignees: Telecom Italia S.p.A., Politecnico di Milano
    Inventors: Giovanni Cordara, Marco Tagliasacchi, Stefano Tubaro
  • Publication number: 20130304401
    Abstract: An apparatus for locating the point of impact of a body on a surface comprises detecting means (12; 12a, 12b, 12c, 12d) adapted to detect pressure waves generated by the interaction of said body with said surface, and a processing unit (15) operatively connected to said detecting means (12; 12a, 12b, 12c, 12d); the detecting means (12; 12a, 12b, 12c, 12d) and the processing unit (14, 15) are configured for detecting and processing power values associated with said pressure waves so as to calculate the position of said point of impact as a function of the aforementioned power values. Also described is the related method.
    Type: Application
    Filed: January 30, 2012
    Publication date: November 14, 2013
    Applicant: TECHNOGYM S.P.A.
Inventors: Stefano Tubaro, Augusto Sarti, Marco Tagliasacchi, Fabio Antonacci, Fulvio Crivellaro, Gabriele Genovese
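The abstract above locates an impact from the power values of the detected pressure waves. A toy illustration of power-based localization, assuming a 1/r² decay model and a grid search over candidate points (both assumptions are this sketch's, not the patent's exact method):

```python
import numpy as np

def locate_impact(sensor_xy, powers, candidates):
    """Pick the candidate impact point whose predicted 1/r^2 power
    pattern at the sensors best matches the measured powers, up to a
    common (unknown) source-strength scale factor."""
    best_point, best_err = None, np.inf
    for point in candidates:
        r2 = np.sum((sensor_xy - point) ** 2, axis=1)
        predicted = 1.0 / np.maximum(r2, 1e-9)
        # least-squares fit of the unknown scale, then compare patterns
        scale = powers @ predicted / (predicted @ predicted)
        err = np.sum((powers - scale * predicted) ** 2)
        if err < best_err:
            best_point, best_err = point, err
    return best_point
```

Fitting the scale factor per candidate makes the match depend only on the *ratios* of the sensor powers, so the source strength of the impact need not be known.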