Patents by Inventor Michiel A. U. Bacchiani

Michiel A. U. Bacchiani has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20190115013
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using complex linear projection are disclosed. In one aspect, a method includes the actions of receiving audio data corresponding to an utterance. The method further includes generating frequency domain data using the audio data. The method further includes processing the frequency domain data using complex linear projection. The method further includes providing the processed frequency domain data to a neural network trained as an acoustic model. The method further includes generating a transcription for the utterance that is determined based at least on output that the neural network provides in response to receiving the processed frequency domain data.
    Type: Application
    Filed: October 26, 2018
    Publication date: April 18, 2019
    Inventors: Samuel Bengio, Mirko Visontai, Christopher Walter George Thornton, Michiel A. U. Bacchiani, Tara N. Sainath, Ehsan Variani, Izhak Shafran
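The complex linear projection frontend described in the abstract above can be sketched as follows. This is a minimal illustration, not the patented implementation: the complex weight matrix is random here, whereas the patent learns it jointly with the neural-network acoustic model, and the log-magnitude feature step is one plausible reading of "processing the frequency domain data using complex linear projection."

```python
import numpy as np

def clp_features(audio_frame, weights, eps=1e-7):
    """Log-magnitude features via complex linear projection.

    audio_frame: 1-D real array (one windowed frame of samples).
    weights: complex matrix of shape (num_filters, num_fft_bins);
             learned jointly with the acoustic model in the patent,
             random here for illustration.
    """
    spectrum = np.fft.rfft(audio_frame)        # frequency-domain data
    projected = weights @ spectrum             # complex linear projection
    return np.log(np.abs(projected) + eps)     # real-valued features for the NN

rng = np.random.default_rng(0)
frame = rng.standard_normal(400)               # one 25 ms frame at 16 kHz
w = rng.standard_normal((40, 201)) + 1j * rng.standard_normal((40, 201))
feats = clp_features(frame, w)
print(feats.shape)                             # (40,)
```

The 201 frequency bins come from `rfft` of a 400-sample frame; the 40 output filters play the role of a learned filterbank.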
  • Patent number: 10140980
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using complex linear projection are disclosed. In one aspect, a method includes the actions of receiving audio data corresponding to an utterance. The method further includes generating frequency domain data using the audio data. The method further includes processing the frequency domain data using complex linear projection. The method further includes providing the processed frequency domain data to a neural network trained as an acoustic model. The method further includes generating a transcription for the utterance that is determined based at least on output that the neural network provides in response to receiving the processed frequency domain data.
    Type: Grant
    Filed: December 21, 2016
    Date of Patent: November 27, 2018
    Assignee: Google LLC
    Inventors: Samuel Bengio, Mirko Visontai, Christopher Walter George Thornton, Michiel A. U. Bacchiani, Tara N. Sainath, Ehsan Variani, Izhak Shafran
  • Publication number: 20180268812
    Abstract: Systems and methods are described for improving endpoint detection of a voice query submitted by a user. In some implementations, synchronized video data and audio data are received. A sequence of frames of the video data that includes images corresponding to lip movement on a face is determined. The audio data is endpointed based on first audio data that corresponds to a first frame of the sequence of frames and second audio data that corresponds to a last frame of the sequence of frames. A transcription of the endpointed audio data is generated by an automated speech recognizer. The generated transcription is then provided for output.
    Type: Application
    Filed: March 14, 2017
    Publication date: September 20, 2018
    Inventors: Chanwoo Kim, Rajeev Conrad Nongpiur, Michiel A. U. Bacchiani
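The audio-visual endpointing idea above can be sketched in a few lines. Assumptions are labeled in the comments: the patent does not specify the lip-movement detector, so its per-video-frame boolean output is taken as given, and the trimming uses the first and last moving frames as the endpoints.

```python
def endpoint_audio(audio, sample_rate, lip_moving, video_fps):
    """Trim audio to the span where lip movement is detected.

    lip_moving: per-video-frame booleans from a (hypothetical) visual
    lip-movement detector; the patent leaves the detector unspecified.
    The trimmed audio runs from the start of the first moving frame to
    the end of the last one.
    """
    moving = [i for i, m in enumerate(lip_moving) if m]
    if not moving:
        return audio[:0]                       # no lip movement: empty clip
    start = int(moving[0] / video_fps * sample_rate)
    end = int((moving[-1] + 1) / video_fps * sample_rate)
    return audio[start:end]

audio = list(range(16000))                     # one second of dummy samples
clip = endpoint_audio(audio, 16000, [False, True, True, False], video_fps=4)
print(len(clip))                               # 8000: frames 1-2 of 4 fps video
```

In a real system the endpointed samples would then be passed to the automated speech recognizer for transcription.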
  • Publication number: 20180261204
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, by a first sequence-training speech model, a first batch of training frames that represent speech features of first training utterances; obtaining, by the first sequence-training speech model, one or more first neural network parameters; determining, by the first sequence-training speech model, one or more optimized first neural network parameters based on (i) the first batch of training frames and (ii) the one or more first neural network parameters; obtaining, by a second sequence-training speech model, a second batch of training frames that represent speech features of second training utterances; obtaining one or more second neural network parameters; and determining, by the second sequence-training speech model, one or more optimized second neural network parameters based on (i) the second batch of training frames and (ii) the one or more second neural network parameters.
    Type: Application
    Filed: March 2, 2018
    Publication date: September 13, 2018
    Inventors: Georg Heigold, Erik McDermott, Vincent O. Vanhoucke, Andrew W. Senior, Michiel A. U. Bacchiani
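The per-replica update structure in the claims above (each sequence-training model takes its own batch and its own parameters and produces its own optimized parameters) can be illustrated with a toy stand-in. Real sequence training uses lattice-based criteria such as sMBR over utterances; here a least-squares gradient step on a linear model merely shows the shape of the two-replica computation.

```python
import numpy as np

def sgd_step(params, batch_frames, targets, lr=0.1):
    """One training update for a linear 'model' replica.

    A toy stand-in for a sequence-training step: the replica receives a
    batch of training frames plus current parameters and returns
    optimized parameters, mirroring the claim structure.
    """
    preds = batch_frames @ params
    grad = batch_frames.T @ (preds - targets) / len(batch_frames)
    return params - lr * grad

rng = np.random.default_rng(1)
params = np.zeros(3)                                   # shared starting point
batch1, y1 = rng.standard_normal((8, 3)), rng.standard_normal(8)
batch2, y2 = rng.standard_normal((8, 3)), rng.standard_normal(8)
p1 = sgd_step(params, batch1, y1)                      # first replica, first batch
p2 = sgd_step(params, batch2, y2)                      # second replica, second batch
print(p1.shape, p2.shape)
```

In a distributed setup the two replicas' optimized parameters would subsequently be reconciled (e.g. via a parameter server), a step outside this sketch.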
  • Publication number: 20180197534
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.
    Type: Application
    Filed: December 20, 2017
    Publication date: July 12, 2018
    Inventors: Bo Li, Ron J. Weiss, Michiel A. U. Bacchiani, Tara N. Sainath, Kevin William Wilson
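The adaptive beamforming actions above (predict one filter per channel from both channels, then combine into a single channel) amount to filter-and-sum with input-dependent filters. The sketch below is hypothetical in two ways: the patent predicts filters with a neural network, whereas `toy_filter_net` here derives impulse-like taps from channel statistics, and the combined channel would feed an acoustic model not shown.

```python
import numpy as np

def adaptive_filter_and_sum(ch1, ch2, filter_net, taps=5):
    """Filter-and-sum beamforming with filters predicted from the input.

    filter_net stands in for the neural network of the patent: it maps
    both channels to one set of FIR taps per channel.
    """
    f1, f2 = filter_net(ch1, ch2, taps)
    # convolve each channel with its predicted filter, then sum channels
    return np.convolve(ch1, f1, mode="same") + np.convolve(ch2, f2, mode="same")

def toy_filter_net(ch1, ch2, taps):
    # hypothetical: scale a center-tap impulse by inter-channel agreement
    scale = 1.0 / (1.0 + np.std(np.asarray(ch1) - np.asarray(ch2)))
    f = np.zeros(taps)
    f[taps // 2] = scale
    return f, f

rng = np.random.default_rng(2)
c1 = rng.standard_normal(160)                  # first microphone channel
c2 = c1 + 0.1 * rng.standard_normal(160)       # second mic: same source + noise
combined = adaptive_filter_and_sum(c1, c2, toy_filter_net)
print(combined.shape)                          # (160,)
```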
  • Patent number: 10019985
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, by a first sequence-training speech model, a first batch of training frames that represent speech features of first training utterances; obtaining, by the first sequence-training speech model, one or more first neural network parameters; determining, by the first sequence-training speech model, one or more optimized first neural network parameters based on (i) the first batch of training frames and (ii) the one or more first neural network parameters; obtaining, by a second sequence-training speech model, a second batch of training frames that represent speech features of second training utterances; obtaining one or more second neural network parameters; and determining, by the second sequence-training speech model, one or more optimized second neural network parameters based on (i) the second batch of training frames and (ii) the one or more second neural network parameters.
    Type: Grant
    Filed: April 22, 2014
    Date of Patent: July 10, 2018
    Assignee: Google LLC
    Inventors: Georg Heigold, Erik McDermott, Vincent O. Vanhoucke, Andrew W. Senior, Michiel A. U. Bacchiani
  • Publication number: 20180174575
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using complex linear projection are disclosed. In one aspect, a method includes the actions of receiving audio data corresponding to an utterance. The method further includes generating frequency domain data using the audio data. The method further includes processing the frequency domain data using complex linear projection. The method further includes providing the processed frequency domain data to a neural network trained as an acoustic model. The method further includes generating a transcription for the utterance that is determined based at least on output that the neural network provides in response to receiving the processed frequency domain data.
    Type: Application
    Filed: December 21, 2016
    Publication date: June 21, 2018
    Inventors: Samuel Bengio, Mirko Visontai, Christopher Walter George Thornton, Michiel A. U. Bacchiani, Tara N. Sainath, Ehsan Variani, Izhak Shafran
  • Patent number: 9886949
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.
    Type: Grant
    Filed: December 28, 2016
    Date of Patent: February 6, 2018
    Assignee: Google Inc.
    Inventors: Bo Li, Ron J. Weiss, Michiel A. U. Bacchiani, Tara N. Sainath, Kevin William Wilson
  • Publication number: 20170278513
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.
    Type: Application
    Filed: December 28, 2016
    Publication date: September 28, 2017
    Inventors: Bo Li, Ron J. Weiss, Michiel A. U. Bacchiani, Tara N. Sainath, Kevin William Wilson
  • Patent number: 9697826
    Abstract: Methods, including computer programs encoded on a computer storage medium, for enhancing the processing of audio waveforms for speech recognition using various neural network processing techniques. In one aspect, a method includes: receiving multiple channels of audio data corresponding to an utterance; convolving each of multiple filters, in a time domain, with each of the multiple channels of audio waveform data to generate convolution outputs, wherein the multiple filters have parameters that have been learned during a training process that jointly trains the multiple filters and trains a deep neural network as an acoustic model; combining, for each of the multiple filters, the convolution outputs for the filter for the multiple channels of audio waveform data; inputting the combined convolution outputs to the deep neural network trained jointly with the multiple filters; and providing a transcription for the utterance that is determined.
    Type: Grant
    Filed: July 8, 2016
    Date of Patent: July 4, 2017
    Assignee: Google Inc.
    Inventors: Tara N. Sainath, Ron J. Weiss, Kevin William Wilson, Andrew W. Senior, Arun Narayanan, Yedid Hoshen, Michiel A. U. Bacchiani
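The multichannel time-domain frontend claimed above can be sketched directly: convolve each learned filter with each channel's raw waveform, then combine the per-channel outputs for each filter. As a simplification, the filters here are random rather than jointly trained with the deep neural network, and the combination is a plain sum.

```python
import numpy as np

def multichannel_conv_features(channels, filters):
    """Time-domain convolution frontend over raw waveforms.

    channels: array of shape (num_channels, num_samples).
    filters: array of shape (num_filters, filter_len); jointly trained
    with the acoustic-model DNN in the patent, random here.
    Returns per-filter convolution outputs summed across channels.
    """
    outs = []
    for f in filters:
        conv = [np.convolve(ch, f, mode="valid") for ch in channels]
        outs.append(np.sum(conv, axis=0))      # combine channels per filter
    return np.stack(outs)

rng = np.random.default_rng(3)
chans = rng.standard_normal((2, 400))          # two-channel utterance segment
filts = rng.standard_normal((8, 25))           # eight 25-tap filters
feats = multichannel_conv_features(chans, filts)
print(feats.shape)                             # (8, 376)
```

The stacked outputs (one row per filter) would then be the input to the jointly trained deep neural network.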
  • Patent number: 9620145
    Abstract: The technology described herein can be embodied in a method that includes receiving an audio signal encoding a portion of an utterance, and providing, to a first neural network, data corresponding to the audio signal. The method also includes generating, by a processor, data representing a transcription for the utterance based on an output of the first neural network. The first neural network is trained using features of multiple context-dependent states, the context-dependent states being derived from a plurality of context-independent states provided by a second neural network.
    Type: Grant
    Filed: May 20, 2014
    Date of Patent: April 11, 2017
    Assignee: Google Inc.
    Inventors: Michiel A. U. Bacchiani, David Rybach
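The relationship between context-independent and context-dependent states in the abstract above can be illustrated with a simple triphone-style labeling. This is a hypothetical scheme: the patent derives and clusters context-dependent states from a second network's context-independent outputs, whereas the sketch just attaches left and right context to each CI state.

```python
def context_dependent_labels(ci_states):
    """Attach left/right context to context-independent state labels.

    A simplified illustration of deriving context-dependent states from
    a context-independent sequence; the patent additionally clusters
    these states before training the first network on their features.
    """
    padded = ["sil"] + list(ci_states) + ["sil"]   # pad with silence context
    return [f"{padded[i-1]}-{padded[i]}+{padded[i+1]}"
            for i in range(1, len(padded) - 1)]

labels = context_dependent_labels(["k", "ae", "t"])
print(labels)   # ['sil-k+ae', 'k-ae+t', 'ae-t+sil']
```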
  • Publication number: 20160322055
    Abstract: Methods, including computer programs encoded on a computer storage medium, for enhancing the processing of audio waveforms for speech recognition using various neural network processing techniques. In one aspect, a method includes: receiving multiple channels of audio data corresponding to an utterance; convolving each of multiple filters, in a time domain, with each of the multiple channels of audio waveform data to generate convolution outputs, wherein the multiple filters have parameters that have been learned during a training process that jointly trains the multiple filters and trains a deep neural network as an acoustic model; combining, for each of the multiple filters, the convolution outputs for the filter for the multiple channels of audio waveform data; inputting the combined convolution outputs to the deep neural network trained jointly with the multiple filters; and providing a transcription for the utterance that is determined.
    Type: Application
    Filed: July 8, 2016
    Publication date: November 3, 2016
    Inventors: Tara N. Sainath, Ron J. Weiss, Kevin William Wilson, Andrew W. Senior, Arun Narayanan, Yedid Hoshen, Michiel A. U. Bacchiani
  • Publication number: 20150127327
    Abstract: The technology described herein can be embodied in a method that includes receiving an audio signal encoding a portion of an utterance, and providing, to a first neural network, data corresponding to the audio signal. The method also includes generating, by a processor, data representing a transcription for the utterance based on an output of the first neural network. The first neural network is trained using features of multiple context-dependent states, the context-dependent states being derived from a plurality of context-independent states provided by a second neural network.
    Type: Application
    Filed: May 20, 2014
    Publication date: May 7, 2015
    Applicant: Google Inc.
    Inventors: Michiel A. U. Bacchiani, David Rybach
  • Publication number: 20150127337
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for obtaining, by a first sequence-training speech model, a first batch of training frames that represent speech features of first training utterances; obtaining, by the first sequence-training speech model, one or more first neural network parameters; determining, by the first sequence-training speech model, one or more optimized first neural network parameters based on (i) the first batch of training frames and (ii) the one or more first neural network parameters; obtaining, by a second sequence-training speech model, a second batch of training frames that represent speech features of second training utterances; obtaining one or more second neural network parameters; and determining, by the second sequence-training speech model, one or more optimized second neural network parameters based on (i) the second batch of training frames and (ii) the one or more second neural network parameters.
    Type: Application
    Filed: April 22, 2014
    Publication date: May 7, 2015
    Applicant: Google Inc.
    Inventors: Georg Heigold, Erik McDermott, Vincent O. Vanhoucke, Andrew W. Senior, Michiel A. U. Bacchiani
  • Patent number: 8069043
    Abstract: Disclosed are systems and methods for providing a spoken dialog system using meta-data to build language models to improve speech processing. Meta-data is generally defined as data outside received speech; for example, meta-data may be a customer profile having a name, address and purchase history of a caller to a spoken dialog system. The method comprises building tree clusters from meta-data and estimating a language model using the built tree clusters. The language model may be used by various modules in the spoken dialog system, such as the automatic speech recognition module and/or the dialog management module. Building the tree clusters from the meta-data may involve generating projections from the meta-data and further may comprise computing counts as a result of unigram tree clustering and then building both unigram trees and higher-order trees from the meta-data as well as computing node distances within the built trees that are used for estimating the language model.
    Type: Grant
    Filed: June 3, 2010
    Date of Patent: November 29, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Michiel A. U. Bacchiani, Brian E. Roark
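The core idea above, estimating a language model from caller meta-data such as a customer profile, can be shown in miniature. This is a deliberately simplified stand-in: the patent builds unigram and higher-order tree clusters with node distances, while the sketch just estimates unigram probabilities from the meta-data text and interpolates them with a base model.

```python
from collections import Counter

def metadata_unigram_lm(metadata_text, base_lm, weight=0.5):
    """Interpolate a meta-data unigram estimate with a base LM.

    metadata_text: e.g. a caller's name and address fields, flattened.
    base_lm: dict mapping words to base unigram probabilities.
    A simplified stand-in for the tree-cluster estimation in the patent.
    """
    counts = Counter(metadata_text.split())
    total = sum(counts.values())
    vocab = set(base_lm) | set(counts)
    return {w: weight * counts.get(w, 0) / total
               + (1 - weight) * base_lm.get(w, 0.0)
            for w in vocab}

base = {"hello": 0.5, "world": 0.5}                # toy base unigram model
lm = metadata_unigram_lm("john smith main street", base)
print(round(sum(lm.values()), 6))                  # 1.0: still a distribution
```

Boosting the probability of profile words like a caller's name and street is exactly what lets the recognizer handle tokens that a generic model would miss.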
  • Patent number: 7996224
    Abstract: Systems and methods relate to generating a language model for use in, for example, a spoken dialog system or some other application. The method comprises building a class-based language model, generating at least one sequence network and replacing class labels in the class-based language model with the at least one sequence network. In this manner, placeholders or tokens associated with classes can be inserted into the models at training time and word/phone networks can be built based on meta-data information at test time. Finally, the placeholder token can be replaced with the word/phone networks at run time to improve recognition of difficult words such as proper names.
    Type: Grant
    Filed: October 29, 2004
    Date of Patent: August 9, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Michiel A. U. Bacchiani, Sameer Raj Maskey, Brian E. Roark, Richard William Sproat
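The run-time replacement step described above, swapping a class-label placeholder for a meta-data-derived network, can be sketched as a token expansion. The class token `$NAME` and the single-best expansion are illustrative assumptions; the patent splices full word/phone networks into the class-based model rather than one word sequence.

```python
def expand_class_tokens(lm_path, class_networks):
    """Replace class placeholders in an LM token path with word sequences.

    lm_path: token list from a class-based language model, where tokens
    like "$NAME" stand for a class. class_networks maps each class label
    to a word sequence built from meta-data at test time (here just one
    best expansion, rather than a full word/phone network).
    """
    out = []
    for tok in lm_path:
        out.extend(class_networks.get(tok, [tok]))  # non-class tokens pass through
    return out

networks = {"$NAME": ["michiel", "bacchiani"]}      # built from caller meta-data
path = ["call", "$NAME", "now"]
print(expand_class_tokens(path, networks))
# ['call', 'michiel', 'bacchiani', 'now']
```

Inserting the placeholder at training time and resolving it at run time is what lets the recognizer handle hard-to-predict words such as proper names.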
  • Publication number: 20100241430
    Abstract: Disclosed are systems and methods for providing a spoken dialog system using meta-data to build language models to improve speech processing. Meta-data is generally defined as data outside received speech; for example, meta-data may be a customer profile having a name, address and purchase history of a caller to a spoken dialog system. The method comprises building tree clusters from meta-data and estimating a language model using the built tree clusters. The language model may be used by various modules in the spoken dialog system, such as the automatic speech recognition module and/or the dialog management module. Building the tree clusters from the meta-data may involve generating projections from the meta-data and further may comprise computing counts as a result of unigram tree clustering and then building both unigram trees and higher-order trees from the meta-data as well as computing node distances within the built trees that are used for estimating the language model.
    Type: Application
    Filed: June 3, 2010
    Publication date: September 23, 2010
    Applicant: AT&T Intellectual Property II, L.P., via transfer from AT&T Corp.
    Inventors: Michiel A. U. Bacchiani, Brian E. Roark