Patents by Inventor Alejandro LUEBS
Alejandro LUEBS has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20230368804
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for coding speech using neural networks. One of the methods includes obtaining a bitstream of parametric coder parameters characterizing spoken speech; generating, from the parametric coder parameters, a conditioning sequence; generating a reconstruction of the spoken speech that includes a respective speech sample at each of a plurality of decoder time steps, comprising, at each decoder time step: processing a current reconstruction sequence using an auto-regressive generative neural network, wherein the auto-regressive generative neural network is configured to process the current reconstruction to compute a score distribution over possible speech sample values, and wherein the processing comprises conditioning the auto-regressive generative neural network on at least a portion of the conditioning sequence; and sampling a speech sample from the possible speech sample values.
Type: Application
Filed: May 8, 2023
Publication date: November 16, 2023
Inventors: Willem Bastiaan Kleijn, Jan K. Skoglund, Alejandro Luebs, Sze Chie Lim
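The decoding loop this abstract describes can be sketched as follows. The scoring rule below is a toy placeholder for the auto-regressive generative neural network, and the names (`decode`, `num_levels`) are illustrative, not from the patent; only the loop structure (conditioning, score distribution, sampling per time step) mirrors the claim.

```python
import numpy as np

def softmax(scores):
    e = np.exp(scores - scores.max())
    return e / e.sum()

def decode(conditioning, num_levels=8, seed=0):
    """At each decoder time step, turn the current reconstruction plus
    that step's conditioning value into a score distribution over the
    possible sample values, then draw one sample from it."""
    rng = np.random.default_rng(seed)
    reconstruction = []
    for cond in conditioning:              # one conditioning value per step
        scores = np.full(num_levels, -2.0)
        scores[cond % num_levels] = 3.0    # conditioning biases the scores
        if reconstruction:                 # autoregressive term: favour continuity
            scores[reconstruction[-1]] += 1.0
        probs = softmax(scores)
        reconstruction.append(int(rng.choice(num_levels, p=probs)))
    return reconstruction

samples = decode([0, 1, 2, 3, 4, 5])   # one speech sample per decoder step
```

A real decoder would condition a trained network (e.g. a WaveNet-style model) on the parametric coder parameters instead of the hand-written scoring above.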
-
Patent number: 11756561
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating discrete latent representations of input audio data. Only the discrete latent representation needs to be transmitted from an encoder system to a decoder system in order for the decoder system to be able to effectively decode, i.e., reconstruct, the input audio data.
Type: Grant
Filed: February 17, 2022
Date of Patent: September 12, 2023
Assignee: DeepMind Technologies Limited
Inventors: Cristina Garbacea, Aaron Gerard Antonius van den Oord, Yazhe Li, Sze Chie Lim, Alejandro Luebs, Oriol Vinyals, Thomas Chadwick Walters
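The core idea, transmitting only discrete codebook indices, can be illustrated with a minimal vector-quantization sketch. The tiny hand-written codebook stands in for one that would be learned jointly with the encoder and decoder; nothing here is the patented model itself.

```python
import numpy as np

# Toy codebook of discrete embedding vectors (a learned codebook in practice).
codebook = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])

def encode(latents):
    """Map each continuous encoder output to the index of its nearest
    codebook vector; only these integer indices cross the channel."""
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

def decode(indices):
    """Look the indices back up in the shared codebook and reconstruct
    from the quantised embeddings."""
    return codebook[indices]

z = np.array([[0.1, 0.9], [0.8, 0.2]])   # continuous encoder outputs
indices = encode(z)                      # discrete representation: [1, 2]
reconstruction = decode(indices)         # decoder-side embeddings
```

Because each index addresses a shared codebook, the bitrate is log2(codebook size) bits per latent rather than the cost of the continuous vector.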
-
Patent number: 11676613
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for coding speech using neural networks. One of the methods includes obtaining a bitstream of parametric coder parameters characterizing spoken speech; generating, from the parametric coder parameters, a conditioning sequence; generating a reconstruction of the spoken speech that includes a respective speech sample at each of a plurality of decoder time steps, comprising, at each decoder time step: processing a current reconstruction sequence using an auto-regressive generative neural network, wherein the auto-regressive generative neural network is configured to process the current reconstruction to compute a score distribution over possible speech sample values, and wherein the processing comprises conditioning the auto-regressive generative neural network on at least a portion of the conditioning sequence; and sampling a speech sample from the possible speech sample values.
Type: Grant
Filed: May 27, 2021
Date of Patent: June 13, 2023
Assignee: Google LLC
Inventors: Willem Bastiaan Kleijn, Jan K. Skoglund, Alejandro Luebs, Sze Chie Lim
-
Publication number: 20220319527
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating discrete latent representations of input audio data. Only the discrete latent representation needs to be transmitted from an encoder system to a decoder system in order for the decoder system to be able to effectively decode, i.e., reconstruct, the input audio data.
Type: Application
Filed: February 17, 2022
Publication date: October 6, 2022
Inventors: Cristina Garbacea, Aaron Gerard Antonius van den Oord, Yazhe Li, Sze Chie Lim, Alejandro Luebs, Oriol Vinyals, Thomas Chadwick Walters
-
Patent number: 11257507
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating discrete latent representations of input audio data. Only the discrete latent representation needs to be transmitted from an encoder system to a decoder system in order for the decoder system to be able to effectively decode, i.e., reconstruct, the input audio data.
Type: Grant
Filed: January 17, 2020
Date of Patent: February 22, 2022
Assignee: DeepMind Technologies Limited
Inventors: Cristina Garbacea, Aaron Gerard Antonius van den Oord, Yazhe Li, Sze Chie Lim, Alejandro Luebs, Oriol Vinyals, Thomas Chadwick Walters
-
Publication number: 20210366495
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for coding speech using neural networks. One of the methods includes obtaining a bitstream of parametric coder parameters characterizing spoken speech; generating, from the parametric coder parameters, a conditioning sequence; generating a reconstruction of the spoken speech that includes a respective speech sample at each of a plurality of decoder time steps, comprising, at each decoder time step: processing a current reconstruction sequence using an auto-regressive generative neural network, wherein the auto-regressive generative neural network is configured to process the current reconstruction to compute a score distribution over possible speech sample values, and wherein the processing comprises conditioning the auto-regressive generative neural network on at least a portion of the conditioning sequence; and sampling a speech sample from the possible speech sample values.
Type: Application
Filed: May 27, 2021
Publication date: November 25, 2021
Inventors: Willem Bastiaan Kleijn, Jan K. Skoglund, Alejandro Luebs, Sze Chie Lim
-
Patent number: 11024321
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for coding speech using neural networks. One of the methods includes obtaining a bitstream of parametric coder parameters characterizing spoken speech; generating, from the parametric coder parameters, a conditioning sequence; generating a reconstruction of the spoken speech that includes a respective speech sample at each of a plurality of decoder time steps, comprising, at each decoder time step: processing a current reconstruction sequence using an auto-regressive generative neural network, wherein the auto-regressive generative neural network is configured to process the current reconstruction to compute a score distribution over possible speech sample values, and wherein the processing comprises conditioning the auto-regressive generative neural network on at least a portion of the conditioning sequence; and sampling a speech sample from the possible speech sample values.
Type: Grant
Filed: November 30, 2018
Date of Patent: June 1, 2021
Assignee: Google LLC
Inventors: Willem Bastiaan Kleijn, Jan K. Skoglund, Alejandro Luebs, Sze Chie Lim
-
Publication number: 20200234725
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating discrete latent representations of input audio data. Only the discrete latent representation needs to be transmitted from an encoder system to a decoder system in order for the decoder system to be able to effectively decode, i.e., reconstruct, the input audio data.
Type: Application
Filed: January 17, 2020
Publication date: July 23, 2020
Inventors: Cristina Garbacea, Aaron Gerard Antonius van den Oord, Yazhe Li, Sze Chie Lim, Alejandro Luebs, Oriol Vinyals, Thomas Chadwick Walters
-
Publication number: 20200176004
Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for coding speech using neural networks. One of the methods includes obtaining a bitstream of parametric coder parameters characterizing spoken speech; generating, from the parametric coder parameters, a conditioning sequence; generating a reconstruction of the spoken speech that includes a respective speech sample at each of a plurality of decoder time steps, comprising, at each decoder time step: processing a current reconstruction sequence using an auto-regressive generative neural network, wherein the auto-regressive generative neural network is configured to process the current reconstruction to compute a score distribution over possible speech sample values, and wherein the processing comprises conditioning the auto-regressive generative neural network on at least a portion of the conditioning sequence; and sampling a speech sample from the possible speech sample values.
Type: Application
Filed: November 30, 2018
Publication date: June 4, 2020
Inventors: Willem Bastiaan Kleijn, Jan K. Skoglund, Alejandro Luebs, Sze Chie Lim
-
Patent number: 10110187
Abstract: Mixture model based soft-clipping detection includes receiving input audio samples, generating soft-clipping information indicating whether the input audio samples include soft-clipping distortion, and outputting the soft-clipping information. Generating the soft-clipping information includes fitting a mixture model to the input audio samples, wherein fitting the mixture model to the input audio samples includes generating a fitted mixture model, such that the fitted mixture model has fitted parameters, and evaluating a soft-clipping distortion metric based on the parameters of the fitted mixture model, wherein evaluating the soft-clipping distortion metric includes identifying a soft-clipping distortion value.
Type: Grant
Filed: June 26, 2017
Date of Patent: October 23, 2018
Assignee: GOOGLE LLC
Inventors: Alejandro Luebs, Fritz Obermeyer
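The pipeline (fit a mixture model, then derive a distortion value from its fitted parameters) can be sketched with a two-component Gaussian mixture on sample magnitudes. Both the EM fit and the final metric (weight of the component nearest full scale) are illustrative choices; the abstract does not specify the mixture family or the metric.

```python
import numpy as np

def fit_mixture(x, iters=30):
    """Tiny EM fit of a two-component 1-D Gaussian mixture, a stand-in
    for the mixture model fitted to the input audio samples."""
    mu = np.array([np.percentile(x, 25.0), np.percentile(x, 90.0)])
    var = np.array([x.var(), x.var()]) + 1e-6
    w = np.array([0.5, 0.5])
    for _ in range(iters):
        # E-step: per-sample component responsibilities
        p = w * np.exp(-(x[:, None] - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
        r = p / p.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, variances
        n = r.sum(axis=0)
        w = n / len(x)
        mu = (r * x[:, None]).sum(axis=0) / n
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / n + 1e-6
    return w, mu, var

def soft_clipping_value(audio):
    """Illustrative distortion value: the weight of the component whose
    mean sits highest, i.e. how much probability mass piles up near
    full scale, as soft clipping tends to cause."""
    w, mu, _ = fit_mixture(np.abs(np.asarray(audio, float)))
    return float(w[np.argmax(mu)])

t = np.linspace(0.0, 1.0, 4000)
clipped = np.tanh(3.0 * 0.8 * np.sin(2.0 * np.pi * 50.0 * t))  # soft-clipped tone
value = soft_clipping_value(clipped)
```

Soft clipping compresses peaks toward a saturation level, so magnitudes concentrate there; a mixture fit exposes that concentration through its fitted weights and means.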
-
Patent number: 10045137
Abstract: Techniques of performing acoustic echo cancellation involve providing a bi-magnitude filtering operation that performs a first filtering operation when a magnitude of an incoming audio signal to be output from a loudspeaker is less than a specified threshold and a second filtering operation when the magnitude of the incoming audio signal is greater than the threshold. The first filtering operation may take the form of a convolution between the incoming audio signal and a first impulse response function. The second filtering operation may take the form of a convolution between a nonlinear function of the incoming audio signal and a second impulse response function. For such a convolution, the bi-magnitude filtering operation involves providing, as the incoming audio signal, samples of the incoming audio signal over a specified window of time. The first and second impulse response functions may be determined from an input signal input into a microphone.
Type: Grant
Filed: June 30, 2017
Date of Patent: August 7, 2018
Assignee: Google LLC
Inventors: Jan Skoglund, Yiteng Huang, Alejandro Luebs
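The two-path convolution structure can be sketched directly from the abstract. The choice of tanh as the nonlinear function and the specific impulse responses are assumptions for illustration; the patent leaves both open.

```python
import numpy as np

def bi_magnitude_filter(x, h_low, h_high, threshold):
    """Split the loudspeaker signal by instantaneous magnitude: samples
    below the threshold drive a linear convolution with h_low, samples
    at or above it drive a convolution of a nonlinear function of the
    signal (tanh here) with h_high. The sum is the echo estimate."""
    x = np.asarray(x, float)
    low = np.where(np.abs(x) < threshold, x, 0.0)
    high = np.where(np.abs(x) >= threshold, np.tanh(x), 0.0)
    echo_low = np.convolve(low, h_low)[: len(x)]
    echo_high = np.convolve(high, h_high)[: len(x)]
    return echo_low + echo_high   # estimate to subtract from the microphone signal

x = np.array([0.1, -0.2, 0.3, 1.5, -1.8])
echo = bi_magnitude_filter(x, h_low=[1.0, 0.5], h_high=[0.3], threshold=1.0)
```

Using a nonlinear path only for large magnitudes matches the physics: loudspeakers behave roughly linearly at low drive levels and distort near their excursion limits.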
-
Publication number: 20180007482
Abstract: Techniques of performing acoustic echo cancellation involve providing a bi-magnitude filtering operation that performs a first filtering operation when a magnitude of an incoming audio signal to be output from a loudspeaker is less than a specified threshold and a second filtering operation when the magnitude of the incoming audio signal is greater than the threshold. The first filtering operation may take the form of a convolution between the incoming audio signal and a first impulse response function. The second filtering operation may take the form of a convolution between a nonlinear function of the incoming audio signal and a second impulse response function. For such a convolution, the bi-magnitude filtering operation involves providing, as the incoming audio signal, samples of the incoming audio signal over a specified window of time. The first and second impulse response functions may be determined from an input signal input into a microphone.
Type: Application
Filed: June 30, 2017
Publication date: January 4, 2018
Inventors: Jan Skoglund, Yiteng Huang, Alejandro Luebs
-
Publication number: 20170221502
Abstract: Existing post-filtering methods for microphone array speech enhancement have two common deficiencies. First, they assume that noise is either white or diffuse and cannot deal with point interferers. Second, they estimate the post-filter coefficients using only two microphones at a time, performing averaging over all the microphone pairs, yielding a suboptimal solution. The provided method describes a post-filtering solution that implements signal models which handle white noise, diffuse noise, and point interferers. The method also implements a globally optimized least-squares approach over the microphones in a microphone array, providing a more optimal solution than existing conventional methods. Experimental results demonstrate the described method outperforming conventional methods in various acoustic scenarios.
Type: Application
Filed: February 3, 2016
Publication date: August 3, 2017
Applicant: Google Inc.
Inventors: Yiteng HUANG, Alejandro LUEBS, Jan SKOGLUND, Willem Bastiaan KLEIJN
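The "globally optimized least-squares" contrast with pairwise averaging can be illustrated with a deliberately simplified model: suppose each microphone pair's observed cross-PSD is the desired-signal PSD plus diffuse noise scaled by that pair's known coherence. Stacking one equation per pair and solving once is the global approach; the model below is an assumption for illustration, and the patented signal models additionally cover point interferers.

```python
import numpy as np

def estimate_psds(cross_psds, diffuse_coherence):
    """Solve ONE least-squares problem over all microphone pairs for
    the desired-signal PSD S and diffuse-noise PSD N, under the toy
    per-pair model C_ij = S + N * Gamma_ij, where Gamma_ij is the
    known diffuse-field coherence for that pair's spacing."""
    b = np.asarray(cross_psds, float)
    A = np.column_stack([np.ones_like(b), np.asarray(diffuse_coherence, float)])
    (S, N), *_ = np.linalg.lstsq(A, b, rcond=None)   # joint fit, not averaging
    return S, N

# Synthetic check: three pairs generated from S = 2.0, N = 1.0
gammas = [0.9, 0.5, 0.2]
observed = [2.0 + 1.0 * g for g in gammas]
S, N = estimate_psds(observed, gammas)
```

With every pair constraining the same unknowns simultaneously, the estimate uses all the array geometry at once instead of averaging independent two-microphone estimates.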
-
Patent number: 9721582
Abstract: Existing post-filtering methods for microphone array speech enhancement have two common deficiencies. First, they assume that noise is either white or diffuse and cannot deal with point interferers. Second, they estimate the post-filter coefficients using only two microphones at a time, performing averaging over all the microphone pairs, yielding a suboptimal solution. The provided method describes a post-filtering solution that implements signal models which handle white noise, diffuse noise, and point interferers. The method also implements a globally optimized least-squares approach over the microphones in a microphone array, providing a more optimal solution than existing conventional methods. Experimental results demonstrate the described method outperforming conventional methods in various acoustic scenarios.
Type: Grant
Filed: February 3, 2016
Date of Patent: August 1, 2017
Assignee: GOOGLE INC.
Inventors: Yiteng Huang, Alejandro Luebs, Jan Skoglund, Willem Bastiaan Kleijn
-
Patent number: 9721580
Abstract: Provided are methods and systems for providing situation-dependent transient noise suppression for audio signals. Different strategies (e.g., levels of aggressiveness) of transient suppression and signal restoration are applied to audio signals associated with participants in a video/audio conference depending on whether or not each participant is speaking (e.g., whether a voiced segment or an unvoiced/non-speech segment of audio is present). If no participants are speaking or there is an unvoiced/non-speech sound present, a more aggressive strategy for transient suppression and signal restoration is utilized. On the other hand, where voiced audio is detected (e.g., a participant is speaking), the methods and systems apply a softer, less aggressive suppression and restoration process.
Type: Grant
Filed: March 31, 2014
Date of Patent: August 1, 2017
Assignee: Google Inc.
Inventors: Jan Skoglund, Alejandro Luebs
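The situation-dependent switch reduces to a per-frame gain decision, sketched below. The gain values and the upstream voiced/transient detectors feeding this function are illustrative assumptions, not taken from the patent.

```python
def suppress(frames, voiced, transient, soft_gain=0.5, aggressive_gain=0.1):
    """Situation-dependent transient suppression: frames flagged as
    transient get a mild gain while voiced speech is present and a much
    more aggressive gain otherwise; non-transient frames pass through."""
    out = []
    for frame, v, t in zip(frames, voiced, transient):
        if not t:
            out.append(frame)                    # no transient: leave untouched
        elif v:
            out.append(frame * soft_gain)        # speaker active: suppress gently
        else:
            out.append(frame * aggressive_gain)  # no speech: suppress hard
    return out

gains = suppress([1.0, 1.0, 1.0],
                 voiced=[True, False, True],
                 transient=[True, True, False])  # -> [0.5, 0.1, 1.0]
```

Keeping suppression soft during voiced segments protects speech quality, while keystrokes and other transients in non-speech segments can be attenuated heavily without audible cost.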
-
Publication number: 20150279386
Abstract: Provided are methods and systems for providing situation-dependent transient noise suppression for audio signals. Different strategies (e.g., levels of aggressiveness) of transient suppression and signal restoration are applied to audio signals associated with participants in a video/audio conference depending on whether or not each participant is speaking (e.g., whether a voiced segment or an unvoiced/non-speech segment of audio is present). If no participants are speaking or there is an unvoiced/non-speech sound present, a more aggressive strategy for transient suppression and signal restoration is utilized. On the other hand, where voiced audio is detected (e.g., a participant is speaking), the methods and systems apply a softer, less aggressive suppression and restoration process.
Type: Application
Filed: March 31, 2014
Publication date: October 1, 2015
Applicant: Google Inc.
Inventors: Jan SKOGLUND, Alejandro LUEBS