Patents by Inventor James Droppo

James Droppo has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Computing system for training neural networks

Patent number: 11049006

Abstract: Techniques and constructs can reduce the time required to determine solutions to optimization problems such as training of neural networks. Modifications to a computational model can be determined by a plurality of nodes operating in parallel. Quantized modification values can be transmitted between the nodes to reduce the volume of data to be transferred. The quantized values can be as small as one bit each. Quantization-error values can be stored and used in quantizing subsequent modifications. The nodes can operate in parallel and overlap computation and data transfer to further reduce the time required to determine solutions. The quantized values can be partitioned and each node can aggregate values for a corresponding partition.
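
The abstract combines one-bit quantization of model modifications with stored quantization error. The sketch below shows that idea in plain Python, assuming each value is reduced to its sign and reconstructed as the mean of its sign group. The reconstruction rule and the function names are illustrative assumptions, not the patented method, and partitioning and aggregation across nodes are omitted.

```python
def one_bit_quantize(gradient, error):
    """Quantize (gradient + carried-over error) to one bit per value.

    Returns the bits, the two reconstruction values, and the new error
    to be added to the next gradient before it is quantized.
    """
    corrected = [g + e for g, e in zip(gradient, error)]
    bits = [v >= 0.0 for v in corrected]
    pos = [v for v, bit in zip(corrected, bits) if bit]
    neg = [v for v, bit in zip(corrected, bits) if not bit]
    # Assumed reconstruction rule: the mean of each sign group.
    scale_pos = sum(pos) / len(pos) if pos else 0.0
    scale_neg = sum(neg) / len(neg) if neg else 0.0
    recon = dequantize(bits, scale_pos, scale_neg)
    # The quantization error is stored and fed into the next quantization step.
    new_error = [v - r for v, r in zip(corrected, recon)]
    return bits, scale_pos, scale_neg, new_error


def dequantize(bits, scale_pos, scale_neg):
    """Expand one-bit values back to floats on the receiving node."""
    return [scale_pos if bit else scale_neg for bit in bits]


grad = [0.5, -0.2, 0.1, -0.4]
bits, sp, sn, err = one_bit_quantize(grad, [0.0] * len(grad))
# bits == [True, False, True, False]; err is passed into the next call
```

Only the bits and the two scale values need to be transmitted. Because the residual error is carried forward, small components that round away in one step still accumulate and eventually affect the model.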

Type: Grant

Filed: September 12, 2014

Date of Patent: June 29, 2021

Assignee: Microsoft Technology Licensing, LLC

Inventors: John Langford, Gang Li, Frank Torsten Bernd Seide, James Droppo, Dong Yu

Advancing word-based speech recognition processing

Patent number: 10629193

Abstract: Non-limiting examples of the present disclosure describe advancements in acoustic-to-word modeling that improve accuracy in speech recognition processing through the replacement of out-of-vocabulary (OOV) tokens. During the decoding of speech signals, better accuracy in speech recognition processing is achieved through training and implementation of multiple different solutions that present enhanced speech recognition models. In one example, a hybrid neural network model for speech recognition processing combines a word-based neural network model as a primary model and a character-based neural network model as an auxiliary model. The primary word-based model emits a word sequence, and the output of the character-based auxiliary model is consulted at any segment where the word-based model emits an OOV token. In another example, a mixed-unit speech recognition model is developed and trained to generate a mixed word and character sequence during decoding of a speech signal without requiring generation of OOV tokens.
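
The hybrid primary/auxiliary idea can be sketched as a merge step over time-aligned outputs. This assumes both models produce segments with frame boundaries and that the OOV symbol is `<unk>`. Both the segment format and the symbol are hypothetical choices made for illustration.

```python
OOV = "<unk>"  # hypothetical OOV symbol emitted by the word model


def merge_hybrid(word_segments, char_segments):
    """Replace OOV tokens from the primary word model with the auxiliary
    character model's output over the same time span.

    word_segments: list of (start_frame, end_frame, word)
    char_segments: list of (start_frame, end_frame, char)
    """
    output = []
    for start, end, word in word_segments:
        if word != OOV:
            output.append(word)
            continue
        # Consult the character model only inside the OOV segment.
        spelled = "".join(c for s, e, c in char_segments if s >= start and e <= end)
        output.append(spelled if spelled else word)  # keep <unk> if nothing aligned
    return output
```

In this sketch, in-vocabulary words pass through unchanged, so the character model affects only the spans where the word model gave up.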

Type: Grant

Filed: March 9, 2018

Date of Patent: April 21, 2020

Assignee: Microsoft Technology Licensing, LLC

Inventors: Guoli Ye, James Droppo, Jinyu Li, Rui Zhao, Yifan Gong

Multi-talker speech recognizer

Patent number: 10460727

Abstract: Various systems and methods for multi-talker speech separation and recognition are disclosed herein. In one example, a system includes a memory and a processor to process mixed speech audio received from a microphone. In an example, the processor can also separate the mixed speech audio using permutation invariant training, wherein a criterion of the permutation invariant training is defined on an utterance of the mixed speech audio. In an example, the processor can also generate a plurality of separated streams for submission to a speech decoder.
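
Defining the permutation invariant training criterion on the whole utterance means that one speaker-to-output assignment is chosen for all frames. The following is a minimal sketch of such an utterance-level loss, assuming mean squared error over flattened per-stream values. Real systems typically apply it to spectrogram masks, and the function names here are illustrative.

```python
from itertools import permutations


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def upit_loss(estimates, references):
    """Utterance-level PIT: try every output-to-speaker assignment once for
    the whole utterance and keep the cheapest one.

    estimates, references: lists of S streams, each a flat list of values.
    Returns (best_loss, best_permutation).
    """
    best = None
    for perm in permutations(range(len(references))):
        loss = sum(mse(estimates[i], references[p]) for i, p in enumerate(perm)) / len(perm)
        if best is None or loss < best[0]:
            best = (loss, perm)
    return best
```

Because the permutation is fixed per utterance rather than per frame, each output stream stays attached to a single talker. This is what allows the separated streams to be passed directly to a speech decoder.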

Type: Grant

Filed: May 23, 2017

Date of Patent: October 29, 2019

Assignee: Microsoft Technology Licensing, LLC

Inventors: James Droppo, Xuedong Huang, Dong Yu

Advancing word-based speech recognition processing

Publication number: 20190279614

Abstract: Non-limiting examples of the present disclosure describe advancements in acoustic-to-word modeling that improve accuracy in speech recognition processing through the replacement of out-of-vocabulary (OOV) tokens. During the decoding of speech signals, better accuracy in speech recognition processing is achieved through training and implementation of multiple different solutions that present enhanced speech recognition models. In one example, a hybrid neural network model for speech recognition processing combines a word-based neural network model as a primary model and a character-based neural network model as an auxiliary model. The primary word-based model emits a word sequence, and the output of the character-based auxiliary model is consulted at any segment where the word-based model emits an OOV token. In another example, a mixed-unit speech recognition model is developed and trained to generate a mixed word and character sequence during decoding of a speech signal without requiring generation of OOV tokens.

Type: Application

Filed: March 9, 2018

Publication date: September 12, 2019

Inventors: Guoli Ye, James Droppo, Jinyu Li, Rui Zhao, Yifan Gong

Multi-talker speech recognizer

Publication number: 20180254040

Abstract: Various systems and methods for multi-talker speech separation and recognition are disclosed herein. In one example, a system includes a memory and a processor to process mixed speech audio received from a microphone. In an example, the processor can also separate the mixed speech audio using permutation invariant training, wherein a criterion of the permutation invariant training is defined on an utterance of the mixed speech audio. In an example, the processor can also generate a plurality of separated streams for submission to a speech decoder.

Type: Application

Filed: May 23, 2017

Publication date: September 6, 2018

Applicant: Microsoft Technology Licensing, LLC

Inventors: James Droppo, Xuedong Huang, Dong Yu

Computing system for training neural networks

Publication number: 20170308789

Abstract: Techniques and constructs can reduce the time required to determine solutions to optimization problems such as training of neural networks. Modifications to a computational model can be determined by a plurality of nodes operating in parallel. Quantized modification values can be transmitted between the nodes to reduce the volume of data to be transferred. The quantized values can be as small as one bit each. Quantization-error values can be stored and used in quantizing subsequent modifications. The nodes can operate in parallel and overlap computation and data transfer to further reduce the time required to determine solutions. The quantized values can be partitioned and each node can aggregate values for a corresponding partition.

Type: Application

Filed: September 12, 2014

Publication date: October 26, 2017

Inventors: John Langford, Gang Li, Frank Torsten Bernd Seide, James Droppo, Dong Yu

Mixed speech recognition

Patent number: 9779727

Abstract: The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample. The method also includes training a second neural network to recognize the speech signal spoken by the speaker with a lower level of the speech characteristic from the mixed speech sample. Additionally, the method includes decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals considering the probability that a specific frame is a switching point of the speech characteristic.
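
The switching-point term in the joint decoder can be illustrated with a simplified two-state Viterbi search. This sketch assumes the per-frame joint log-likelihoods of the two networks have already been combined into one score per assignment. State 0 means speaker A is the higher-level talker, and state 1 means speaker B is. Every frame switches with probability `p_switch`. The actual decoder works over the recognition state sequences of both networks. This collapsed version only shows how a switching probability enters the search.

```python
import math


def decode_switching(frame_scores, p_switch):
    """Viterbi search over which speaker holds the higher level of the
    speech characteristic in each frame (simplified, hypothetical setup).

    frame_scores[t][z]: joint log-likelihood of frame t under assignment z.
    Returns (best_assignment_path, best_log_score).
    """
    stay, switch = math.log(1.0 - p_switch), math.log(p_switch)
    score = list(frame_scores[0])  # uniform prior over the first assignment
    back = []
    for t in range(1, len(frame_scores)):
        new_score, ptr = [], []
        for z in (0, 1):
            from_same = score[z] + stay
            from_other = score[1 - z] + switch  # this frame is a switching point
            if from_same >= from_other:
                new_score.append(from_same + frame_scores[t][z])
                ptr.append(z)
            else:
                new_score.append(from_other + frame_scores[t][z])
                ptr.append(1 - z)
        score = new_score
        back.append(ptr)
    z = 0 if score[0] >= score[1] else 1
    best = score[z]
    path = [z]
    for ptr in reversed(back):  # follow back-pointers to the first frame
        z = ptr[z]
        path.append(z)
    return path[::-1], best
```

A small `p_switch` makes the decoder keep the current assignment unless the network evidence for a swap is strong.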

Type: Grant

Filed: December 30, 2016

Date of Patent: October 3, 2017

Assignee: Microsoft Technology Licensing, LLC

Inventors: Dong Yu, Chao Weng, Michael L. Seltzer, James Droppo

Mixed speech recognition

Publication number: 20170110120

Abstract: The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample. The method also includes training a second neural network to recognize the speech signal spoken by the speaker with a lower level of the speech characteristic from the mixed speech sample. Additionally, the method includes decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals considering the probability that a specific frame is a switching point of the speech characteristic.

Type: Application

Filed: December 30, 2016

Publication date: April 20, 2017

Applicant: Microsoft Technology Licensing, LLC

Inventors: Dong Yu, Chao Weng, Michael L. Seltzer, James Droppo

Mixed speech recognition

Patent number: 9558742

Abstract: The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample. The method also includes training a second neural network to recognize the speech signal spoken by the speaker with a lower level of the speech characteristic from the mixed speech sample. Additionally, the method includes decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals considering the probability that a specific frame is a switching point of the speech characteristic.

Type: Grant

Filed: June 8, 2016

Date of Patent: January 31, 2017

Assignee: Microsoft Technology Licensing, LLC

Inventors: Dong Yu, Chao Weng, Michael L. Seltzer, James Droppo

Mixed speech recognition

Publication number: 20160284348

Abstract: The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample. The method also includes training a second neural network to recognize the speech signal spoken by the speaker with a lower level of the speech characteristic from the mixed speech sample. Additionally, the method includes decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals considering the probability that a specific frame is a switching point of the speech characteristic.

Type: Application

Filed: June 8, 2016

Publication date: September 29, 2016

Applicant: Microsoft Technology Licensing, LLC

Inventors: Dong Yu, Chao Weng, Michael L. Seltzer, James Droppo

Mixed speech recognition

Patent number: 9390712

Abstract: The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample. The method also includes training a second neural network to recognize the speech signal spoken by the speaker with a lower level of the speech characteristic from the mixed speech sample. Additionally, the method includes decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals considering the probability that a specific frame is a switching point of the speech characteristic.

Type: Grant

Filed: March 24, 2014

Date of Patent: July 12, 2016

Assignee: Microsoft Technology Licensing, LLC

Inventors: Dong Yu, Chao Weng, Michael L. Seltzer, James Droppo

Mixed speech recognition

Publication number: 20150269933

Abstract: The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample. The method also includes training a second neural network to recognize the speech signal spoken by the speaker with a lower level of the speech characteristic from the mixed speech sample. Additionally, the method includes decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals considering the probability that a specific frame is a switching point of the speech characteristic.

Type: Application

Filed: March 24, 2014

Publication date: September 24, 2015

Inventors: Dong Yu, Chao Weng, Michael L. Seltzer, James Droppo

Method and apparatus using harmonic-model-based front end for robust speech recognition

Patent number: 7516067

Abstract: A system and method are provided that reduce noise in speech signals. The system and method decompose a noisy speech signal into a harmonic component and a residual component. The harmonic component and residual component are then combined as a sum to form a noise-reduced value. In some embodiments, the sum is a weighted sum where the harmonic component is multiplied by a scaling factor. In some embodiments, the noise-reduced value is used in speech recognition.
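
The harmonic/residual decomposition and the weighted sum can be sketched for a single frame. The sketch assumes the pitch period (in samples) is known and divides the frame length, so that the harmonic sinusoids are orthogonal over the frame and can be fitted by simple projection. Pitch tracking and the choice of the scaling factor `alpha` are outside the scope of this illustration.

```python
import math


def harmonic_residual(frame, period, num_harmonics):
    """Split a frame into a harmonic part and a residual part.

    Assumes (hypothetically) that `period` divides len(frame) and that
    num_harmonics < period / 2, so the projections below are exact.
    """
    n_samples = len(frame)
    harmonic = [0.0] * n_samples
    for h in range(1, num_harmonics + 1):
        w = 2.0 * math.pi * h / period  # angular frequency of harmonic h
        a = 2.0 / n_samples * sum(x * math.cos(w * n) for n, x in enumerate(frame))
        b = 2.0 / n_samples * sum(x * math.sin(w * n) for n, x in enumerate(frame))
        for n in range(n_samples):
            harmonic[n] += a * math.cos(w * n) + b * math.sin(w * n)
    residual = [x - hx for x, hx in zip(frame, harmonic)]
    return harmonic, residual


def noise_reduced(harmonic, residual, alpha):
    """Weighted sum: the harmonic part is scaled by alpha, the residual is kept."""
    return [alpha * hx + r for hx, r in zip(harmonic, residual)]
```

With `alpha = 1.0` the original frame is recovered. Values above 1.0 emphasize the voiced, harmonic structure relative to the residual.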

Type: Grant

Filed: August 25, 2003

Date of Patent: April 7, 2009

Assignee: Microsoft Corporation

Inventors: Michael Seltzer, James Droppo, Alejandro Acero

Noise robust speech recognition with a switching linear dynamic model

Patent number: 7418383

Abstract: A unified, nonlinear, non-stationary, stochastic model is disclosed for estimating and removing effects of background noise on speech cepstra. Generally stated, the model is a union of dynamic system equations for speech and noise, and a model describing how speech and noise are mixed. Embodiments also pertain to related methods for enhancement.
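
The three parts of the model (speech dynamics, noise dynamics, and a mixing model) can be made concrete with a toy generative simulation. This sketch works on a single scalar log-spectral value rather than full cepstra. It uses Markov-switching linear dynamics for speech, a random walk for non-stationary noise, and the common phase-free log-domain mixing rule. All parameter names and values are illustrative assumptions. The sketch only samples from such a model and does not perform the enhancement itself.

```python
import math
import random


def log_add(x, n):
    """Mixing model: y = log(exp(x) + exp(n)) = x + log(1 + exp(n - x))."""
    m = max(x, n)
    return m + math.log1p(math.exp(-abs(x - n)))


def simulate_sldm(num_frames, trans, A, b, q, noise_step, seed=0):
    """Sample a toy switching linear dynamic model.

    trans[s]: transition weights out of regime s; A[s], b[s], q[s]: speech
    dynamics and process variance in regime s; noise_step: noise drift std.
    """
    rng = random.Random(seed)
    s, x, n = 0, b[0], 0.0
    states, speech, noise, observed = [], [], [], []
    for _ in range(num_frames):
        # Discrete regime switches according to a Markov chain.
        s = rng.choices(range(len(trans)), weights=trans[s])[0]
        # Speech evolves linearly within the current regime.
        x = A[s] * x + b[s] + rng.gauss(0.0, math.sqrt(q[s]))
        # Noise drifts slowly as a random walk (non-stationary).
        n = n + rng.gauss(0.0, noise_step)
        states.append(s)
        speech.append(x)
        noise.append(n)
        observed.append(log_add(x, n))
    return states, speech, noise, observed
```

Removing noise under this kind of model means inferring `speech` from `observed`. The nonlinearity in `log_add` is what makes that inference step nontrivial.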

Type: Grant

Filed: September 3, 2004

Date of Patent: August 26, 2008

Assignee: Microsoft Corporation

Inventors: James Droppo, Alejandro Acero

Method of determining uncertainty associated with acoustic distortion-based noise reduction

Publication number: 20070106504

Abstract: A method and apparatus are provided for determining uncertainty in noise reduction based on a parametric model of speech distortion. The method is first used to reduce noise in a noisy signal. In particular, noise is reduced from a representation of a portion of a noisy signal to produce a representation of a cleaned signal by utilizing an acoustic environment model. The uncertainty associated with the noise reduction process is then computed. In one embodiment, the uncertainty of the noise reduction process is used, in conjunction with the noise-reduced signal, to decode a pattern state.
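
The two outputs of this method are a cleaned value and the uncertainty of that cleaning step. The simplest model that produces both is shown below. It substitutes a linear additive Gaussian model (y = x + n, with Gaussian clean speech and Gaussian noise) for the parametric distortion model in the abstract. The posterior mean is the cleaned value, and the posterior variance is the uncertainty.

```python
def denoise_with_uncertainty(y, mu_x, var_x, mu_n, var_n):
    """MMSE estimate of clean x from noisy y = x + n under Gaussian priors.

    Returns (cleaned_value, uncertainty), where uncertainty is the posterior
    variance var_x * var_n / (var_x + var_n).
    """
    gain = var_x / (var_x + var_n)
    x_hat = mu_x + gain * (y - mu_x - mu_n)
    uncertainty = gain * var_n
    return x_hat, uncertainty
```

The uncertainty is largest when speech and noise variances are comparable. A decoder can then use it to discount frames in which the cleaning step was less reliable.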

Type: Application

Filed: December 20, 2006

Publication date: May 10, 2007

Applicant: Microsoft Corporation

Inventors: Li Deng, Alejandro Acero, James Droppo

Multi-sensory speech enhancement using a speech-state model

Publication number: 20060293887

Abstract: A method and apparatus determine a likelihood of a speech state based on an alternative sensor signal and an air conduction microphone signal. The likelihood of the speech state is used, together with the alternative sensor signal and the air conduction microphone signal, to estimate a clean speech value for a clean speech signal.
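
Combining a speech-state likelihood with two sensors can be illustrated with a deliberately simplified model. The sketch assumes two states (speech and silence), a scalar clean value x ~ N(0, var_x) during speech and x = 0 during silence, an air microphone y = x + v, and an alternative sensor b = h*x + w with Gaussian noises. The state posterior weights the clean estimate. All parameter names are hypothetical.

```python
import math


def gauss2_logpdf(u, v, c11, c12, c22):
    """Log density of a zero-mean 2-D Gaussian with covariance [[c11, c12], [c12, c22]]."""
    det = c11 * c22 - c12 * c12
    quad = (c22 * u * u - 2.0 * c12 * u * v + c11 * v * v) / det
    return -0.5 * (quad + math.log(det)) - math.log(2.0 * math.pi)


def estimate_clean(y, b, h, var_x, var_v, var_w, prior_speech=0.5):
    """Return (P(speech | y, b), estimated clean speech value)."""
    # Likelihood of both observations under each speech state.
    ll_speech = gauss2_logpdf(y, b, var_x + var_v, h * var_x, h * h * var_x + var_w)
    ll_silence = gauss2_logpdf(y, b, var_v, 0.0, var_w)
    a = math.log(prior_speech) + ll_speech
    c = math.log(1.0 - prior_speech) + ll_silence
    m = max(a, c)
    p_speech = math.exp(a - m) / (math.exp(a - m) + math.exp(c - m))
    # Under speech, fuse air-microphone and sensor evidence by precision.
    precision = 1.0 / var_x + 1.0 / var_v + h * h / var_w
    x_speech = (y / var_v + h * b / var_w) / precision
    # Under silence the clean value is 0, so the expectation scales by p_speech.
    return p_speech, p_speech * x_speech
```

Near-zero readings on both sensors favor the silence state and pull the estimate toward zero. Strong, consistent readings favor speech and yield the fused estimate.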

Type: Application

Filed: June 28, 2005

Publication date: December 28, 2006

Applicant: Microsoft Corporation

Inventors: Zhengyou Zhang, Zicheng Liu, Alejandro Acero, Amarnag Subramanya, James Droppo

Noise reduction using correction vectors based on dynamic aspects of speech and noise normalization

Publication number: 20060206321

Abstract: A method and apparatus are provided for reducing noise in a signal. Under one aspect of the invention, a correction vector is selected based on a noisy feature vector that represents a noisy signal. The selected correction vector incorporates dynamic aspects of pattern signals. The selected correction vector is then added to the noisy feature vector to produce a cleaned feature vector. In other aspects of the invention, a noise value is produced from an estimate of the noise in a noisy signal. The noise value is subtracted from a value representing a portion of the noisy signal to produce a noise-normalized value. The noise-normalized value is used to select a correction value that is added to the noise-normalized value to produce a cleaned noise-normalized value. The noise value is then added to the cleaned noise-normalized value to produce a cleaned value representing a portion of a cleaned signal.
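
The noise-normalized correction path described above follows four steps: subtract the noise value, select a correction, add the correction, and add the noise value back. The sketch below assumes a codebook of (centroid, correction) pairs trained offline and selects the entry whose centroid is nearest in squared Euclidean distance. Feature vectors can stack static and delta terms to carry the dynamic aspects. The codebook format and the selection rule are illustrative assumptions.

```python
def nearest(codebook, z):
    """Index of the codebook entry whose centroid is closest to z."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - c) ** 2 for a, c in zip(z, codebook[k][0])))


def clean_feature(y, noise, codebook):
    """y, noise: feature vectors (e.g. static + delta terms).
    codebook: list of (centroid, correction_vector) pairs."""
    z = [a - n for a, n in zip(y, noise)]                 # noise-normalized value
    k = nearest(codebook, z)                              # choose a correction vector
    z_clean = [a + r for a, r in zip(z, codebook[k][1])]  # cleaned normalized value
    return [a + n for a, n in zip(z_clean, noise)]        # add the noise value back
```

Normalizing by the noise estimate before selecting the correction lets one codebook serve different noise levels.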

Type: Application

Filed: May 5, 2006

Publication date: September 14, 2006

Applicant: Microsoft Corporation

Inventors: James Droppo, Li Deng, Alejandro Acero

Method of noise reduction based on dynamic aspects of speech

Publication number: 20060206322

Abstract: A system and method are provided that reduce noise in pattern recognition signals. To do this, embodiments of the present invention utilize a prior model of dynamic aspects of clean speech together with one or both of a prior model of static aspects of clean speech, and an acoustic model that indicates the relationship between clean speech, noisy speech and noise. In one embodiment, components of a noise-reduced feature vector are produced by forming a weighted sum of predicted values from the prior model of dynamic aspects of clean speech, the prior model of static aspects of clean speech and the acoustic-environmental model.
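
One natural way to form the weighted sum of predictions is to weight each Gaussian prediction by its precision (inverse variance). The sketch assumes that weighting for a single feature dimension. The dynamic prior predicts the current frame from the previous cleaned frame plus a mean delta. A crude linear acoustic model predicts clean speech as the noisy value minus a noise estimate. Both models, and the function names, are assumptions for illustration.

```python
def fuse_predictions(predictions):
    """predictions: list of (mean, variance) pairs for one feature dimension.
    Returns the precision-weighted mean and its variance."""
    precisions = [1.0 / var for _, var in predictions]
    total = sum(precisions)
    mean = sum(p * m for p, (m, _) in zip(precisions, predictions)) / total
    return mean, 1.0 / total


def denoise_frame(prev_clean, noisy, delta_mean, delta_var,
                  static_mean, static_var, noise_est, obs_var):
    """Combine dynamic-prior, static-prior and acoustic-model predictions."""
    dynamic = (prev_clean + delta_mean, delta_var)  # x_t ~ x_{t-1} + delta
    static = (static_mean, static_var)              # clean-speech prior
    acoustic = (noisy - noise_est, obs_var)         # hypothetical linear model
    return fuse_predictions([dynamic, static, acoustic])
```

A prediction with a small variance dominates the sum. For example, a confident dynamic prior smooths the output across frames.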

Type: Application

Filed: May 12, 2006

Publication date: September 14, 2006

Applicant: Microsoft Corporation

Inventors: Li Deng, James Droppo, Alejandro Acero

Method of pattern recognition using noise reduction uncertainty

Publication number: 20060206325

Abstract: A method and apparatus are provided for using the uncertainty of a noise-removal process during pattern recognition. In particular, noise is removed from a representation of a portion of a noisy signal to produce a representation of a cleaned signal. Meanwhile, an uncertainty associated with the noise removal is computed and used with the representation of the cleaned signal to modify a probability for a phonetic state in the recognition system. In particular embodiments, the uncertainty is used to modify a probability distribution by increasing the variance of each Gaussian distribution by an amount equal to the estimated variance of the cleaned signal; the modified distribution is then used in decoding the phonetic state sequence in a pattern recognition task.
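
The variance-increase step can be shown for a diagonal-covariance Gaussian mixture. The sketch adds the per-dimension uncertainty of the cleaned features to every Gaussian's variance before scoring a phonetic state. The mixture format is an illustrative assumption.

```python
import math


def gaussian_loglik(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def state_loglik(cleaned, uncertainty, mixture):
    """Score cleaned features against one phonetic state.

    mixture: list of (weight, means, variances) with diagonal covariances.
    Each Gaussian's variance is increased by the per-dimension uncertainty.
    """
    terms = []
    for weight, means, variances in mixture:
        ll = math.log(weight)
        for x, u, m, v in zip(cleaned, uncertainty, means, variances):
            ll += gaussian_loglik(x, m, v + u)
        terms.append(ll)
    top = max(terms)  # log-sum-exp over mixture components
    return top + math.log(sum(math.exp(t - top) for t in terms))
```

With zero uncertainty this reduces to ordinary mixture scoring. Frames with high uncertainty get flatter likelihoods, so unreliable cleaned values influence the state decision less.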

Type: Application

Filed: May 16, 2006

Publication date: September 14, 2006

Applicant: Microsoft Corporation

Inventors: James Droppo, Alejandro Acero, Li Deng

Method and apparatus for reducing noise corruption from an alternative sensor signal during multi-sensory speech enhancement

Publication number: 20060178880

Abstract: A method and apparatus classify a portion of an alternative sensor signal as either containing noise or not containing noise. The portions of the alternative sensor signal that are classified as containing noise are not used to estimate a portion of a clean speech signal and the channel response associated with the alternative sensor. The portions of the alternative sensor signal that are classified as not containing noise are used to estimate a portion of a clean speech signal and the channel response associated with the alternative sensor.
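
The classify-then-exclude logic can be sketched per frame. The classification rule here is hypothetical: a frame is flagged as noise when the alternative sensor's energy exceeds `ratio_threshold` times the air microphone's energy, as might happen with contact noise that the air microphone does not pick up. The channel response h (sensor ≈ h × air) is then estimated by least squares over the unflagged frames only, using the air-microphone signal as a stand-in for clean speech, which is also an assumption.

```python
def classify_and_estimate(air, sensor, ratio_threshold):
    """air, sensor: per-frame values (e.g. one frequency bin).

    Returns (noisy_flags, h); h is None if no frame is usable.
    """
    # Hypothetical rule: sensor energy far above air energy suggests sensor noise.
    noisy = [b * b > ratio_threshold * y * y for y, b in zip(air, sensor)]
    usable = [(y, b) for y, b, bad in zip(air, sensor, noisy) if not bad]
    den = sum(y * y for y, _ in usable)
    # Least-squares channel response over frames classified as noise-free.
    h = sum(y * b for y, b in usable) / den if den > 0 else None
    return noisy, h
```

Excluding the flagged frames keeps a single burst of sensor noise from biasing the channel estimate for the whole utterance.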

Type: Application

Filed: February 4, 2005

Publication date: August 10, 2006

Applicant: Microsoft Corporation

Inventors: Zhengyou Zhang, Amarnag Subramanya, James Droppo, Zicheng Liu
