Patents by Inventor James Droppo

James Droppo has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11049006
    Abstract: Techniques and constructs can reduce the time required to determine solutions to optimization problems such as training of neural networks. Modifications to a computational model can be determined by a plurality of nodes operating in parallel. Quantized modification values can be transmitted between the nodes to reduce the volume of data to be transferred. The quantized values can be as small as one bit each. Quantization-error values can be stored and used in quantizing subsequent modifications. The nodes can operate in parallel and overlap computation and data transfer to further reduce the time required to determine solutions. The quantized values can be partitioned and each node can aggregate values for a corresponding partition.
    Type: Grant
    Filed: September 12, 2014
    Date of Patent: June 29, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: John Langford, Gang Li, Frank Torsten Bernd Seide, James Droppo, Dong Yu
  • Patent number: 10629193
    Abstract: Non-limiting examples of the present disclosure describe advancements in acoustic-to-word modeling that improve accuracy in speech recognition processing through the replacement of out-of-vocabulary (OOV) tokens. During the decoding of speech signals, better accuracy in speech recognition processing is achieved through training and implementation of multiple different solutions that present enhanced speech recognition models. In one example, a hybrid neural network model for speech recognition processing combines a word-based neural network model as a primary model and a character-based neural network model as an auxiliary model. The primary word-based model emits a word sequence, and an output of character-based auxiliary model is consulted at a segment where the word-based model emits an OOV token. In another example, a mixed unit speech recognition model is developed and trained to generate a mixed word and character sequence during decoding of a speech signal without requiring generation of OOV tokens.
    Type: Grant
    Filed: March 9, 2018
    Date of Patent: April 21, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Guoli Ye, James Droppo, Jinyu Li, Rui Zhao, Yifan Gong
  • Patent number: 10460727
    Abstract: Various systems and methods for multi-talker speech separation and recognition are disclosed herein. In one example, a system includes a memory and a processor to process mixed speech audio received from a microphone. In an example, the processor can also separate the mixed speech audio using permutation invariant training, wherein a criterion of the permutation invariant training is defined on an utterance of the mixed speech audio. In an example, the processor can also generate a plurality of separated streams for submission to a speech decoder.
    Type: Grant
    Filed: May 23, 2017
    Date of Patent: October 29, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: James Droppo, Xuedong Huang, Dong Yu
  • Publication number: 20190279614
    Abstract: Non-limiting examples of the present disclosure describe advancements in acoustic-to-word modeling that improve accuracy in speech recognition processing through the replacement of out-of-vocabulary (OOV) tokens. During the decoding of speech signals, better accuracy in speech recognition processing is achieved through training and implementation of multiple different solutions that present enhanced speech recognition models. In one example, a hybrid neural network model for speech recognition processing combines a word-based neural network model as a primary model and a character-based neural network model as an auxiliary model. The primary word-based model emits a word sequence, and an output of character-based auxiliary model is consulted at a segment where the word-based model emits an OOV token. In another example, a mixed unit speech recognition model is developed and trained to generate a mixed word and character sequence during decoding of a speech signal without requiring generation of OOV tokens.
    Type: Application
    Filed: March 9, 2018
    Publication date: September 12, 2019
    Inventors: Guoli Ye, James Droppo, Jinyu Li, Rui Zhao, Yifan Gong
  • Publication number: 20180254040
    Abstract: Various systems and methods for multi-talker speech separation and recognition are disclosed herein. In one example, a system includes a memory and a processor to process mixed speech audio received from a microphone. In an example, the processor can also separate the mixed speech audio using permutation invariant training, wherein a criterion of the permutation invariant training is defined on an utterance of the mixed speech audio. In an example, the processor can also generate a plurality of separated streams for submission to a speech decoder.
    Type: Application
    Filed: May 23, 2017
    Publication date: September 6, 2018
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: James Droppo, Xuedong Huang, Dong Yu
  • Publication number: 20170308789
    Abstract: Techniques and constructs can reduce the time required to determine solutions to optimization problems such as training of neural networks. Modifications to a computational model can be determined by a plurality of nodes operating in parallel. Quantized modification values can be transmitted between the nodes to reduce the volume of data to be transferred. The quantized values can be as small as one bit each. Quantization-error values can be stored and used in quantizing subsequent modifications. The nodes can operate in parallel and overlap computation and data transfer to further reduce the time required to determine solutions. The quantized values can be partitioned and each node can aggregate values for a corresponding partition.
    Type: Application
    Filed: September 12, 2014
    Publication date: October 26, 2017
    Inventors: John Langford, Gang Li, Frank Torsten Bernd Seide, James Droppo, Dong Yu
  • Patent number: 9779727
    Abstract: The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample. The method also includes training a second neural network to recognize the speech signal spoken by the speaker with a lower level of the speech characteristic from the mixed speech sample. Additionally, the method includes decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals considering the probability that a specific frame is a switching point of the speech characteristic.
    Type: Grant
    Filed: December 30, 2016
    Date of Patent: October 3, 2017
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Dong Yu, Chao Weng, Michael L. Seltzer, James Droppo
  • Publication number: 20170110120
    Abstract: The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample. The method also includes training a second neural network to recognize the speech signal spoken by the speaker with a lower level of the speech characteristic from the mixed speech sample. Additionally, the method includes decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals considering the probability that a specific frame is a switching point of the speech characteristic.
    Type: Application
    Filed: December 30, 2016
    Publication date: April 20, 2017
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Dong Yu, Chao Weng, Michael L. Seltzer, James Droppo
  • Patent number: 9558742
    Abstract: The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample. The method also includes training a second neural network to recognize the speech signal spoken by the speaker with a lower level of the speech characteristic from the mixed speech sample. Additionally, the method includes decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals considering the probability that a specific frame is a switching point of the speech characteristic.
    Type: Grant
    Filed: June 8, 2016
    Date of Patent: January 31, 2017
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Dong Yu, Chao Weng, Michael L. Seltzer, James Droppo
  • Publication number: 20160284348
    Abstract: The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample. The method also includes training a second neural network to recognize the speech signal spoken by the speaker with a lower level of the speech characteristic from the mixed speech sample. Additionally, the method includes decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals considering the probability that a specific frame is a switching point of the speech characteristic.
    Type: Application
    Filed: June 8, 2016
    Publication date: September 29, 2016
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Dong Yu, Chao Weng, Michael L. Seltzer, James Droppo
  • Patent number: 9390712
    Abstract: The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample. The method also includes training a second neural network to recognize the speech signal spoken by the speaker with a lower level of the speech characteristic from the mixed speech sample. Additionally, the method includes decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals considering the probability that a specific frame is a switching point of the speech characteristic.
    Type: Grant
    Filed: March 24, 2014
    Date of Patent: July 12, 2016
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Dong Yu, Chao Weng, Michael L. Seltzer, James Droppo
  • Publication number: 20150269933
    Abstract: The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample. The method also includes training a second neural network to recognize the speech signal spoken by the speaker with a lower level of the speech characteristic from the mixed speech sample. Additionally, the method includes decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals considering the probability that a specific frame is a switching point of the speech characteristic.
    Type: Application
    Filed: March 24, 2014
    Publication date: September 24, 2015
    Inventors: Dong Yu, Chao Weng, Michael L. Seltzer, James Droppo
  • Patent number: 7516067
    Abstract: A system and method are provided that reduce noise in speech signals. The system and method decompose a noisy speech signal into a harmonic component and a residual component. The harmonic component and residual component are then combined as a sum to form a noise-reduced value. In some embodiments, the sum is a weighted sum where the harmonic component is multiplied by a scaling factor. In some embodiments, the noise-reduced value is used in speech recognition.
    Type: Grant
    Filed: August 25, 2003
    Date of Patent: April 7, 2009
    Assignee: Microsoft Corporation
    Inventors: Michael Seltzer, James Droppo, Alejandro Acero
  • Patent number: 7418383
    Abstract: A unified, nonlinear, non-stationary, stochastic model is disclosed for estimating and removing effects of background noise on speech cepstra. Generally stated, the model is a union of dynamic system equations for speech and noise, and a model describing how speech and noise are mixed. Embodiments also pertain to related methods for enhancement.
    Type: Grant
    Filed: September 3, 2004
    Date of Patent: August 26, 2008
    Assignee: Microsoft Corporation
    Inventors: James Droppo, Alejandro Acero
  • Publication number: 20070106504
    Abstract: A method and apparatus are provided for determining uncertainty in noise reduction based on a parametric model of speech distortion. The method is first used to reduce noise in a noisy signal. In particular, noise is reduced from a representation of a portion of a noisy signal to produce a representation of a cleaned signal by utilizing an acoustic environment model. The uncertainty associated with the noise reduction process is then computed. In one embodiment, the uncertainty of the noise reduction process is used, in conjunction with the noise-reduced signal, to decode a pattern state.
    Type: Application
    Filed: December 20, 2006
    Publication date: May 10, 2007
    Applicant: Microsoft Corporation
    Inventors: Li Deng, Alejandro Acero, James Droppo
  • Publication number: 20060293887
    Abstract: A method and apparatus determine a likelihood of a speech state based on an alternative sensor signal and an air conduction microphone signal. The likelihood of the speech state is used, together with the alternative sensor signal and the air conduction microphone signal, to estimate a clean speech value for a clean speech signal.
    Type: Application
    Filed: June 28, 2005
    Publication date: December 28, 2006
    Applicant: Microsoft Corporation
    Inventors: Zhengyou Zhang, Zicheng Liu, Alejandro Acero, Amarnag Subramanya, James Droppo
  • Publication number: 20060206322
    Abstract: A system and method are provided that reduce noise in pattern recognition signals. To do this, embodiments of the present invention utilize a prior model of dynamic aspects of clean speech together with one or both of a prior model of static aspects of clean speech, and an acoustic model that indicates the relationship between clean speech, noisy speech and noise. In one embodiment, components of a noise-reduced feature vector are produced by forming a weighted sum of predicted values from the prior model of dynamic aspects of clean speech, the prior model of static aspects of clean speech and the acoustic-environmental model.
    Type: Application
    Filed: May 12, 2006
    Publication date: September 14, 2006
    Applicant: Microsoft Corporation
    Inventors: Li Deng, James Droppo, Alejandro Acero
  • Publication number: 20060206325
    Abstract: A method and apparatus are provided for using the uncertainty of a noise-removal process during pattern recognition. In particular, noise is removed from a representation of a portion of a noisy signal to produce a representation of a cleaned signal. In the meantime, an uncertainty associated with the noise removal is computed and is used with the representation of the cleaned signal to modify a probability for a phonetic state in the recognition system. In particular embodiments, the uncertainty is used to modify a probability distribution, by increasing the variance in each Gaussian distribution by the amount equal to the estimated variance of the cleaned signal, which is used in decoding the phonetic state sequence in a pattern recognition task.
    Type: Application
    Filed: May 16, 2006
    Publication date: September 14, 2006
    Applicant: Microsoft Corporation
    Inventors: James Droppo, Alejandro Acero, Li Deng
  • Publication number: 20060206321
    Abstract: A method and apparatus are provided for reducing noise in a signal. Under one aspect of the invention, a correction vector is selected based on a noisy feature vector that represents a noisy signal. The selected correction vector incorporates dynamic aspects of pattern signals. The selected correction vector is then added to the noisy feature vector to produce a cleaned feature vector. In other aspects of the invention, a noise value is produced from an estimate of the noise in a noisy signal. The noise value is subtracted from a value representing a portion of the noisy signal to produce a noise-normalized value. The noise-normalized value is used to select a correction value that is added to the noise-normalized value to produce a cleaned noise-normalized value. The noise value is then added to the cleaned noise-normalized value to produce a cleaned value representing a portion of a cleaned signal.
    Type: Application
    Filed: May 5, 2006
    Publication date: September 14, 2006
    Applicant: Microsoft Corporation
    Inventors: James Droppo, Li Deng, Alejandro Acero
  • Publication number: 20060178880
    Abstract: A method and apparatus classify a portion of an alternative sensor signal as either containing noise or not containing noise. The portions of the alternative sensor signal that are classified as containing noise are not used to estimate a portion of a clean speech signal and the channel response associated with the alternative sensor. The portions of the alternative sensor signal that are classified as not containing noise are used to estimate a portion of a clean speech signal and the channel response associated with the alternative sensor.
    Type: Application
    Filed: February 4, 2005
    Publication date: August 10, 2006
    Applicant: Microsoft Corporation
    Inventors: Zhengyou Zhang, Amarnag Subramanya, James Droppo, Zicheng Liu
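
The abstract of patent 11049006 (first entry above) describes quantizing model modifications to as little as one bit per value and storing the quantization error for use in later rounds. The following is a minimal sketch of that general idea only. The abstract does not specify reconstruction values, data layout, or how nodes exchange data, so the function names `quantize_one_bit` and `dequantize`, the mean-magnitude `scale`, and the use of plain Python lists are all assumptions made for illustration and should not be read as the patented method.

```python
def quantize_one_bit(update, residual):
    """Quantize `update` to one bit per value, carrying the error forward.

    Returns (bits, scale, new_residual). A 1 bit reconstructs to +scale and
    a 0 bit to -scale. The mean-magnitude scale is an illustrative assumption.
    """
    # Add the error left over from the previous round before quantizing.
    corrected = [u + r for u, r in zip(update, residual)]
    if not corrected:
        return [], 0.0, []
    scale = sum(abs(c) for c in corrected) / len(corrected)
    bits = [1 if c >= 0 else 0 for c in corrected]
    reconstructed = dequantize(bits, scale)
    # Store what quantization lost so the next round can compensate for it.
    new_residual = [c - q for c, q in zip(corrected, reconstructed)]
    return bits, scale, new_residual


def dequantize(bits, scale):
    """Rebuild approximate values from the one-bit codes (hypothetical helper)."""
    return [scale if b else -scale for b in bits]


# Usage: one round starting from a zero residual.
residual = [0.0, 0.0, 0.0]
bits, scale, residual = quantize_one_bit([0.5, -0.25, 0.75], residual)
# bits is [1, 0, 1], scale is 0.5, residual is [0.0, 0.25, 0.25]
```

In this sketch only `bits` and `scale` would need to be transmitted between nodes, while `residual` stays local and is folded into the next update, so the error from one round is not discarded.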