Patents by Inventor James Droppo
James Droppo has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11049006
Abstract: Techniques and constructs can reduce the time required to determine solutions to optimization problems such as training of neural networks. Modifications to a computational model can be determined by a plurality of nodes operating in parallel. Quantized modification values can be transmitted between the nodes to reduce the volume of data to be transferred. The quantized values can be as small as one bit each. Quantization-error values can be stored and used in quantizing subsequent modifications. The nodes can operate in parallel and overlap computation and data transfer to further reduce the time required to determine solutions. The quantized values can be partitioned and each node can aggregate values for a corresponding partition.
Type: Grant
Filed: September 12, 2014
Date of Patent: June 29, 2021
Assignee: Microsoft Technology Licensing, LLC
Inventors: John Langford, Gang Li, Frank Torsten Bernd Seide, James Droppo, Dong Yu
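The abstract above describes transmitting quantized model updates, as small as one bit per value, and storing the quantization error for use in the next step. The following Python sketch illustrates that error-feedback idea under stated assumptions: updates are plain lists of floats, a value's sign selects its bit, and each bit is decoded as the mean of the values it stands for. The function names `quantize_1bit`, `dequantize`, and `aggregate_partition` are hypothetical and are not taken from the patent.

```python
from typing import List, Sequence, Tuple


def quantize_1bit(
    update: Sequence[float], error: Sequence[float]
) -> Tuple[List[int], float, float, List[float]]:
    """Quantize (update + carried-over error) to one bit per value.

    Returns the bits, the reconstruction values for bit 1 and bit 0, and the
    new quantization error to add to the next update (error feedback).
    """
    corrected = [u + e for u, e in zip(update, error)]
    bits = [1 if v >= 0.0 else 0 for v in corrected]
    pos = [v for v, b in zip(corrected, bits) if b]
    neg = [v for v, b in zip(corrected, bits) if not b]
    # Assumption: each bit decodes to the mean of the values it represents.
    hi = sum(pos) / len(pos) if pos else 0.0
    lo = sum(neg) / len(neg) if neg else 0.0
    decoded = dequantize(bits, hi, lo)
    new_error = [v - d for v, d in zip(corrected, decoded)]
    return bits, hi, lo, new_error


def dequantize(bits: Sequence[int], hi: float, lo: float) -> List[float]:
    """Map bits back to floats using the two reconstruction values."""
    return [hi if b else lo for b in bits]


def aggregate_partition(
    node_updates: Sequence[Sequence[float]], start: int, end: int
) -> List[float]:
    """Sum all nodes' dequantized updates over the partition [start, end)
    that this node is responsible for aggregating."""
    return [sum(u[i] for u in node_updates) for i in range(start, end)]
```

Because the stored error is added back before the next quantization, detail that one step rounds away is not lost; it accumulates until it changes a transmitted bit.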
-
Patent number: 10629193
Abstract: Non-limiting examples of the present disclosure describe advancements in acoustic-to-word modeling that improve accuracy in speech recognition processing through the replacement of out-of-vocabulary (OOV) tokens. During the decoding of speech signals, better accuracy in speech recognition processing is achieved through training and implementation of multiple different solutions that present enhanced speech recognition models. In one example, a hybrid neural network model for speech recognition processing combines a word-based neural network model as a primary model and a character-based neural network model as an auxiliary model. The primary word-based model emits a word sequence, and an output of the character-based auxiliary model is consulted at a segment where the word-based model emits an OOV token. In another example, a mixed unit speech recognition model is developed and trained to generate a mixed word and character sequence during decoding of a speech signal without requiring generation of OOV tokens.
Type: Grant
Filed: March 9, 2018
Date of Patent: April 21, 2020
Assignee: Microsoft Technology Licensing, LLC
Inventors: Guoli Ye, James Droppo, Jinyu Li, Rui Zhao, Yifan Gong
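The hybrid example in the abstract keeps the word model's output and falls back to the character model only where the word model emits an OOV token. A minimal Python sketch of that merge step follows, assuming the two models' outputs have already been aligned segment by segment; the token string `<unk>` and the name `hybrid_decode` are hypothetical. The mixed-unit alternative, which avoids OOV tokens altogether, is not shown.

```python
from typing import List, Sequence

OOV_TOKEN = "<unk>"  # hypothetical OOV symbol; real systems define their own


def hybrid_decode(
    word_tokens: Sequence[str], char_hypotheses: Sequence[str]
) -> List[str]:
    """Keep the word model's output, but where it emits OOV_TOKEN, use the
    character model's hypothesis for the same segment instead.

    Assumes char_hypotheses[i] is already aligned to word_tokens[i]; how the
    segments are aligned is outside this sketch.
    """
    if len(word_tokens) != len(char_hypotheses):
        raise ValueError("word and character segments must be aligned 1:1")
    return [
        chars if word == OOV_TOKEN else word
        for word, chars in zip(word_tokens, char_hypotheses)
    ]
```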
-
Patent number: 10460727
Abstract: Various systems and methods for multi-talker speech separation and recognition are disclosed herein. In one example, a system includes a memory and a processor to process mixed speech audio received from a microphone. In an example, the processor can also separate the mixed speech audio using permutation invariant training, wherein a criterion of the permutation invariant training is defined on an utterance of the mixed speech audio. In an example, the processor can also generate a plurality of separated streams for submission to a speech decoder.
Type: Grant
Filed: May 23, 2017
Date of Patent: October 29, 2019
Assignee: Microsoft Technology Licensing, LLC
Inventors: James Droppo, Xuedong Huang, Dong Yu
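Defining the permutation invariant training criterion "on an utterance" means one output-to-speaker assignment is chosen for the whole utterance rather than frame by frame. The Python sketch below computes such a criterion, assuming each stream is a flattened sequence of features (for example magnitude-spectrum frames) and that mean squared error is the per-pair loss; both choices and the function names are assumptions for illustration. Only the criterion is shown, not the network or its training loop.

```python
import itertools
from typing import List, Sequence, Tuple


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def utterance_pit_loss(
    estimates: List[Sequence[float]], references: List[Sequence[float]]
) -> Tuple[float, Tuple[int, ...]]:
    """Utterance-level permutation invariant training criterion.

    Every assignment of output streams to reference speakers is scored over
    the whole utterance, and the lowest-loss assignment is kept, so a single
    permutation applies to all frames of the utterance.
    """
    best_loss, best_perm = float("inf"), tuple()
    for perm in itertools.permutations(range(len(references))):
        loss = sum(mse(estimates[i], references[p]) for i, p in enumerate(perm))
        loss /= len(perm)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm
```

The search over all permutations grows factorially with the number of talkers, which is manageable for the two- or three-talker mixtures this kind of system typically targets.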
-
Publication number: 20190279614
Abstract: Non-limiting examples of the present disclosure describe advancements in acoustic-to-word modeling that improve accuracy in speech recognition processing through the replacement of out-of-vocabulary (OOV) tokens. During the decoding of speech signals, better accuracy in speech recognition processing is achieved through training and implementation of multiple different solutions that present enhanced speech recognition models. In one example, a hybrid neural network model for speech recognition processing combines a word-based neural network model as a primary model and a character-based neural network model as an auxiliary model. The primary word-based model emits a word sequence, and an output of the character-based auxiliary model is consulted at a segment where the word-based model emits an OOV token. In another example, a mixed unit speech recognition model is developed and trained to generate a mixed word and character sequence during decoding of a speech signal without requiring generation of OOV tokens.
Type: Application
Filed: March 9, 2018
Publication date: September 12, 2019
Inventors: Guoli Ye, James Droppo, Jinyu Li, Rui Zhao, Yifan Gong
-
Publication number: 20180254040
Abstract: Various systems and methods for multi-talker speech separation and recognition are disclosed herein. In one example, a system includes a memory and a processor to process mixed speech audio received from a microphone. In an example, the processor can also separate the mixed speech audio using permutation invariant training, wherein a criterion of the permutation invariant training is defined on an utterance of the mixed speech audio. In an example, the processor can also generate a plurality of separated streams for submission to a speech decoder.
Type: Application
Filed: May 23, 2017
Publication date: September 6, 2018
Applicant: Microsoft Technology Licensing, LLC
Inventors: James Droppo, Xuedong Huang, Dong Yu
-
Publication number: 20170308789
Abstract: Techniques and constructs can reduce the time required to determine solutions to optimization problems such as training of neural networks. Modifications to a computational model can be determined by a plurality of nodes operating in parallel. Quantized modification values can be transmitted between the nodes to reduce the volume of data to be transferred. The quantized values can be as small as one bit each. Quantization-error values can be stored and used in quantizing subsequent modifications. The nodes can operate in parallel and overlap computation and data transfer to further reduce the time required to determine solutions. The quantized values can be partitioned and each node can aggregate values for a corresponding partition.
Type: Application
Filed: September 12, 2014
Publication date: October 26, 2017
Inventors: John Langford, Gang Li, Frank Torsten Bernd Seide, James Droppo, Dong Yu
-
Patent number: 9779727
Abstract: The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample. The method also includes training a second neural network to recognize the speech signal spoken by the speaker with a lower level of the speech characteristic from the mixed speech sample. Additionally, the method includes decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals considering the probability that a specific frame is a switching point of the speech characteristic.
Type: Grant
Filed: December 30, 2016
Date of Patent: October 3, 2017
Assignee: Microsoft Technology Licensing, LLC
Inventors: Dong Yu, Chao Weng, Michael L. Seltzer, James Droppo
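The decoding step in this abstract weighs the probability that a frame is a switching point, that is, a frame where the speaker with the higher level of the characteristic (for example, energy) changes. A simplified Python sketch of that idea is a two-state Viterbi search, assuming the two networks' outputs have already been combined into a joint log-likelihood per frame for each of two assignments (speaker A has the higher level, or speaker B does) and that the switching probability `p_switch` is a constant. The function name, input layout, and constant switching probability are assumptions for illustration; the patented decoder works over full speech-recognition state spaces.

```python
import math
from typing import List, Sequence, Tuple


def decode_switch_points(
    frame_scores: Sequence[Tuple[float, float]], p_switch: float
) -> List[int]:
    """Best per-frame assignment under a constant switching probability.

    frame_scores[t][s] is the joint log-likelihood at frame t under
    assignment s (0: speaker A has the higher level, 1: speaker B does).
    p_switch must lie strictly between 0 and 1. A change between consecutive
    entries of the returned path marks a switching point.
    """
    stay, switch = math.log(1.0 - p_switch), math.log(p_switch)
    score = list(frame_scores[0])  # equal prior over the two assignments
    back: List[List[int]] = []
    for t in range(1, len(frame_scores)):
        new_score, ptr = [], []
        for s in (0, 1):
            from_same = score[s] + stay
            from_other = score[1 - s] + switch
            if from_same >= from_other:
                new_score.append(from_same + frame_scores[t][s])
                ptr.append(s)
            else:
                new_score.append(from_other + frame_scores[t][s])
                ptr.append(1 - s)
        score = new_score
        back.append(ptr)
    # Trace back from the best final assignment.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]
```

A small `p_switch` makes the decoder keep the current assignment unless the evidence for a switch is strong, which suppresses spurious frame-by-frame flips.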
-
Publication number: 20170110120
Abstract: The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample. The method also includes training a second neural network to recognize the speech signal spoken by the speaker with a lower level of the speech characteristic from the mixed speech sample. Additionally, the method includes decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals considering the probability that a specific frame is a switching point of the speech characteristic.
Type: Application
Filed: December 30, 2016
Publication date: April 20, 2017
Applicant: Microsoft Technology Licensing, LLC
Inventors: Dong Yu, Chao Weng, Michael L. Seltzer, James Droppo
-
Patent number: 9558742
Abstract: The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample. The method also includes training a second neural network to recognize the speech signal spoken by the speaker with a lower level of the speech characteristic from the mixed speech sample. Additionally, the method includes decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals considering the probability that a specific frame is a switching point of the speech characteristic.
Type: Grant
Filed: June 8, 2016
Date of Patent: January 31, 2017
Assignee: Microsoft Technology Licensing, LLC
Inventors: Dong Yu, Chao Weng, Michael L. Seltzer, James Droppo
-
Publication number: 20160284348
Abstract: The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample. The method also includes training a second neural network to recognize the speech signal spoken by the speaker with a lower level of the speech characteristic from the mixed speech sample. Additionally, the method includes decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals considering the probability that a specific frame is a switching point of the speech characteristic.
Type: Application
Filed: June 8, 2016
Publication date: September 29, 2016
Applicant: Microsoft Technology Licensing, LLC
Inventors: Dong Yu, Chao Weng, Michael L. Seltzer, James Droppo
-
Patent number: 9390712
Abstract: The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample. The method also includes training a second neural network to recognize the speech signal spoken by the speaker with a lower level of the speech characteristic from the mixed speech sample. Additionally, the method includes decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals considering the probability that a specific frame is a switching point of the speech characteristic.
Type: Grant
Filed: March 24, 2014
Date of Patent: July 12, 2016
Assignee: Microsoft Technology Licensing, LLC
Inventors: Dong Yu, Chao Weng, Michael L. Seltzer, James Droppo
-
Publication number: 20150269933
Abstract: The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample. The method also includes training a second neural network to recognize the speech signal spoken by the speaker with a lower level of the speech characteristic from the mixed speech sample. Additionally, the method includes decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals considering the probability that a specific frame is a switching point of the speech characteristic.
Type: Application
Filed: March 24, 2014
Publication date: September 24, 2015
Inventors: Dong Yu, Chao Weng, Michael L. Seltzer, James Droppo
-
Patent number: 7516067
Abstract: A system and method are provided that reduce noise in speech signals. The system and method decompose a noisy speech signal into a harmonic component and a residual component. The harmonic component and residual component are then combined as a sum to form a noise-reduced value. In some embodiments, the sum is a weighted sum where the harmonic component is multiplied by a scaling factor. In some embodiments, the noise-reduced value is used in speech recognition.
Type: Grant
Filed: August 25, 2003
Date of Patent: April 7, 2009
Assignee: Microsoft Corporation
Inventors: Michael Seltzer, James Droppo, Alejandro Acero
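The decomposition and weighted sum in this abstract can be sketched in Python for a single frame. The sketch assumes the pitch period is already known and is a whole number of samples, and that the frame spans whole pitch periods; under those assumptions the harmonic sinusoids are orthogonal over the frame, so projecting onto each one gives the least-squares harmonic fit. The pitch estimate, the number of harmonics, the scaling factor `alpha`, and the function name are all hypothetical inputs, not details from the patent.

```python
import math
from typing import List, Sequence, Tuple


def harmonic_plus_residual(
    frame: Sequence[float], period: int, n_harmonics: int, alpha: float
) -> Tuple[List[float], List[float], List[float]]:
    """Split a frame into harmonic and residual parts and recombine them.

    Assumes len(frame) is a multiple of `period` (pitch period in samples)
    and n_harmonics < period / 2, so the harmonics are orthogonal over the
    frame and per-harmonic projection equals a least-squares fit.
    Returns (alpha * harmonic + residual, harmonic, residual).
    """
    n = len(frame)
    harmonic = [0.0] * n
    for k in range(1, n_harmonics + 1):
        w = 2.0 * math.pi * k / period
        # Project the frame onto the cosine and sine at the k-th harmonic.
        a = 2.0 / n * sum(x * math.cos(w * i) for i, x in enumerate(frame))
        b = 2.0 / n * sum(x * math.sin(w * i) for i, x in enumerate(frame))
        for i in range(n):
            harmonic[i] += a * math.cos(w * i) + b * math.sin(w * i)
    residual = [x - h for x, h in zip(frame, harmonic)]
    # Weighted sum: only the harmonic component is scaled.
    reduced = [alpha * h + r for h, r in zip(harmonic, residual)]
    return reduced, harmonic, residual
```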
-
Patent number: 7418383
Abstract: A unified, nonlinear, non-stationary, stochastic model is disclosed for estimating and removing effects of background noise on speech cepstra. Generally stated, the model is a union of dynamic system equations for speech and noise, and a model describing how speech and noise are mixed. Embodiments also pertain to related methods for enhancement.
Type: Grant
Filed: September 3, 2004
Date of Patent: August 26, 2008
Assignee: Microsoft Corporation
Inventors: James Droppo, Alejandro Acero
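The "model describing how speech and noise are mixed" is nonlinear because speech and noise add in the power domain but the features are logarithmic. A commonly used form, assumed here rather than quoted from the patent, writes the noisy log power in each frequency bin as y = log(exp(x) + exp(n)) for clean-speech log power x and noise log power n, ignoring the phase term; cepstra would follow from a DCT of these log spectra, which the sketch omits. The function name is hypothetical.

```python
import math
from typing import List, Sequence


def noisy_log_spectrum(
    clean: Sequence[float], noise: Sequence[float]
) -> List[float]:
    """Per-bin noisy log power y = log(exp(x) + exp(n)).

    Assumes speech and noise powers add in each frequency bin (the phase
    term is ignored). Written as max + log1p(exp(-|x - n|)) so that large
    log powers do not overflow exp().
    """
    return [
        max(x, n) + math.log1p(math.exp(-abs(x - n)))
        for x, n in zip(clean, noise)
    ]
```

The nonlinearity is what makes estimating and removing the noise effect hard: where x is much larger than n, y is close to x, and where n dominates, y is close to n regardless of the speech.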
-
Publication number: 20070106504
Abstract: A method and apparatus are provided for determining uncertainty in noise reduction based on a parametric model of speech distortion. The method is first used to reduce noise in a noisy signal. In particular, noise is reduced from a representation of a portion of a noisy signal to produce a representation of a cleaned signal by utilizing an acoustic environment model. The uncertainty associated with the noise reduction process is then computed. In one embodiment, the uncertainty of the noise reduction process is used, in conjunction with the noise-reduced signal, to decode a pattern state.
Type: Application
Filed: December 20, 2006
Publication date: May 10, 2007
Applicant: Microsoft Corporation
Inventors: Li Deng, Alejandro Acero, James Droppo
-
Publication number: 20060293887
Abstract: A method and apparatus determine a likelihood of a speech state based on an alternative sensor signal and an air conduction microphone signal. The likelihood of the speech state is used, together with the alternative sensor signal and the air conduction microphone signal, to estimate a clean speech value for a clean speech signal.
Type: Application
Filed: June 28, 2005
Publication date: December 28, 2006
Applicant: Microsoft Corporation
Inventors: Zhengyou Zhang, Zicheng Liu, Alejandro Acero, Amarnag Subramanya, James Droppo
-
Publication number: 20060206322
Abstract: A system and method are provided that reduce noise in pattern recognition signals. To do this, embodiments of the present invention utilize a prior model of dynamic aspects of clean speech together with one or both of a prior model of static aspects of clean speech, and an acoustic model that indicates the relationship between clean speech, noisy speech and noise. In one embodiment, components of a noise-reduced feature vector are produced by forming a weighted sum of predicted values from the prior model of dynamic aspects of clean speech, the prior model of static aspects of clean speech and the acoustic-environmental model.
Type: Application
Filed: May 12, 2006
Publication date: September 14, 2006
Applicant: Microsoft Corporation
Inventors: Li Deng, James Droppo, Alejandro Acero
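One way to form the weighted sum of predicted values described above, assumed here rather than taken from the application, is to treat each model's prediction for a feature component as a Gaussian with a mean and a variance and to weight each mean by its precision (inverse variance), so that more confident models count for more. The function name `fuse_predictions` is hypothetical.

```python
from typing import Sequence, Tuple


def fuse_predictions(preds: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Precision-weighted combination of (mean, variance) predictions.

    Each tuple is one model's prediction for a single feature component,
    e.g. from the dynamic prior, the static prior, and the
    acoustic-environment model. Variances must be positive.
    Returns the fused mean and its variance.
    """
    precisions = [1.0 / var for _, var in preds]
    total = sum(precisions)
    mean = sum(p * m for p, (m, _) in zip(precisions, preds)) / total
    return mean, 1.0 / total
```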
-
Publication number: 20060206325
Abstract: A method and apparatus are provided for using the uncertainty of a noise-removal process during pattern recognition. In particular, noise is removed from a representation of a portion of a noisy signal to produce a representation of a cleaned signal. In the meantime, an uncertainty associated with the noise removal is computed and is used with the representation of the cleaned signal to modify a probability for a phonetic state in the recognition system. In particular embodiments, the uncertainty is used to modify a probability distribution, by increasing the variance in each Gaussian distribution by an amount equal to the estimated variance of the cleaned signal, which is used in decoding the phonetic state sequence in a pattern recognition task.
Type: Application
Filed: May 16, 2006
Publication date: September 14, 2006
Applicant: Microsoft Corporation
Inventors: James Droppo, Alejandro Acero, Li Deng
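The variance adjustment in the last sentence of this abstract can be written out directly: when scoring the cleaned features against a phonetic state's Gaussian, the state's variance is increased by the estimated variance of the cleaned signal. The Python sketch below assumes diagonal-covariance Gaussians and a per-dimension uncertainty variance; the function name is hypothetical.

```python
import math
from typing import Sequence


def gaussian_loglik_with_uncertainty(
    x_hat: Sequence[float],
    mean: Sequence[float],
    var: Sequence[float],
    uncertainty_var: Sequence[float],
) -> float:
    """log N(x_hat; mean, var + uncertainty_var) for a diagonal Gaussian.

    x_hat is the cleaned feature vector, (mean, var) describe one Gaussian
    of a phonetic state, and uncertainty_var is the estimated variance of
    the cleaned signal, added to the model variance dimension by dimension.
    """
    total = 0.0
    for xi, mi, vi, ui in zip(x_hat, mean, var, uncertainty_var):
        v = vi + ui  # widened variance
        total += -0.5 * (math.log(2.0 * math.pi * v) + (xi - mi) ** 2 / v)
    return total
```

With zero uncertainty this reduces to the ordinary Gaussian log-likelihood; where noise removal is unreliable, the widened variance keeps a poorly cleaned frame from dominating the choice of phonetic state.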
-
Publication number: 20060206321
Abstract: A method and apparatus are provided for reducing noise in a signal. Under one aspect of the invention, a correction vector is selected based on a noisy feature vector that represents a noisy signal. The selected correction vector incorporates dynamic aspects of pattern signals. The selected correction vector is then added to the noisy feature vector to produce a cleaned feature vector. In other aspects of the invention, a noise value is produced from an estimate of the noise in a noisy signal. The noise value is subtracted from a value representing a portion of the noisy signal to produce a noise-normalized value. The noise-normalized value is used to select a correction value that is added to the noise-normalized value to produce a cleaned noise-normalized value. The noise value is then added to the cleaned noise-normalized value to produce a cleaned value representing a portion of a cleaned signal.
Type: Application
Filed: May 5, 2006
Publication date: September 14, 2006
Applicant: Microsoft Corporation
Inventors: James Droppo, Li Deng, Alejandro Acero
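The noise-normalized aspect of this abstract is a fixed sequence of steps: subtract the noise value, select a correction from the normalized value, add the correction, then add the noise value back. The Python sketch below follows those steps; the selection rule (nearest centroid in a small codebook, each centroid paired with a correction vector) and the function name `clean_feature` are assumptions for illustration, since the abstract does not say how the correction value is selected.

```python
from typing import List, Sequence


def clean_feature(
    y: Sequence[float],
    noise: Sequence[float],
    centroids: Sequence[Sequence[float]],
    corrections: Sequence[Sequence[float]],
) -> List[float]:
    """Noise-normalized correction of one noisy feature vector.

    centroids[k] and corrections[k] form a hypothetical codebook: the
    correction whose centroid is nearest the noise-normalized value is used.
    """
    # 1. Subtract the noise value to get the noise-normalized value.
    z = [yi - ni for yi, ni in zip(y, noise)]
    # 2. Select a correction (assumed rule: nearest centroid, squared distance).
    best = min(
        range(len(centroids)),
        key=lambda k: sum((zi - ci) ** 2 for zi, ci in zip(z, centroids[k])),
    )
    # 3. Add the correction to get the cleaned noise-normalized value.
    z_clean = [zi + ri for zi, ri in zip(z, corrections[best])]
    # 4. Add the noise value back to get the cleaned value.
    return [zc + ni for zc, ni in zip(z_clean, noise)]
```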
-
Publication number: 20060178880
Abstract: A method and apparatus classify a portion of an alternative sensor signal as either containing noise or not containing noise. The portions of the alternative sensor signal that are classified as containing noise are not used to estimate a portion of a clean speech signal and the channel response associated with the alternative sensor. The portions of the alternative sensor signal that are classified as not containing noise are used to estimate a portion of a clean speech signal and the channel response associated with the alternative sensor.
Type: Application
Filed: February 4, 2005
Publication date: August 10, 2006
Applicant: Microsoft Corporation
Inventors: Zhengyou Zhang, Amarnag Subramanya, James Droppo, Zicheng Liu
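The effect of the classification on channel estimation can be sketched as follows: frames whose alternative-sensor portion was classified as noisy are simply left out of the estimate. The Python sketch assumes the classifier's decisions are given as booleans (the classifier itself is not shown), that a reference spectrum such as a clean-speech estimate is available per frame, and that the channel is a scalar gain per frequency bin fitted by least squares. All of these, and the function name, are assumptions for illustration.

```python
from typing import List, Sequence


def estimate_channel(
    alt_frames: Sequence[Sequence[float]],
    ref_frames: Sequence[Sequence[float]],
    is_noisy: Sequence[bool],
) -> List[float]:
    """Least-squares gain per bin with alt ~= H * ref, skipping noisy frames.

    is_noisy[t] is True when frame t of the alternative sensor signal was
    classified as containing noise; such frames are excluded from the fit.
    """
    n_bins = len(alt_frames[0])
    h = []
    for k in range(n_bins):
        num = sum(
            b[k] * r[k]
            for b, r, bad in zip(alt_frames, ref_frames, is_noisy)
            if not bad
        )
        den = sum(r[k] ** 2 for r, bad in zip(ref_frames, is_noisy) if not bad)
        h.append(num / den if den > 0 else 0.0)
    return h
```

Excluding the noisy frames matters because a single corrupted frame (the third frame in a case like alt = 100 against ref = 1) would otherwise pull the least-squares gain far from the value the clean frames support.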