Patents by Inventor Michael L. Seltzer
Michael L. Seltzer has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10558909
Abstract: A neural network is structured to connect the input values of an input set, at each level, to that level's output using a linear bypass connection. The linear bypass connection passes the input values to the output without applying a non-linear function to them.
Type: Grant
Filed: December 28, 2015
Date of Patent: February 11, 2020
Assignee: Microsoft Technology Licensing, LLC
Inventors: James G. Droppo, Pegah Ghahremani, Michael L. Seltzer
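The bypass idea in this abstract can be sketched in a few lines. The ReLU nonlinearity and the matrix names `W`, `b`, `B` below are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def bypass_layer(x, W, b, B):
    """One layer with a linear bypass: the nonlinear path f(Wx + b)
    is summed with a purely linear path Bx that skips the nonlinearity."""
    nonlinear = np.maximum(0.0, W @ x + b)  # ReLU path (illustrative choice)
    linear = B @ x                          # bypass: no non-linear function applied
    return nonlinear + linear

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
W, b, B = rng.standard_normal((3, 4)), rng.standard_normal(3), rng.standard_normal((3, 4))
y = bypass_layer(x, W, b, B)
print(y.shape)  # (3,)
```

Because the bypass term is purely linear, gradients can flow through it unattenuated, which is the usual motivation for this kind of connection.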
-
Patent number: 9824684
Abstract: A sequence recognition system comprises a prediction component configured to receive a set of observed features from a signal to be recognized and to output a prediction output indicative of a predicted recognition based on the set of observed features. The sequence recognition system also comprises a classification component configured to receive the prediction output and to output a label indicative of recognition of the signal based on the prediction output.
Type: Grant
Filed: December 22, 2014
Date of Patent: November 21, 2017
Assignee: Microsoft Technology Licensing, LLC
Inventors: Dong Yu, Yu Zhang, Michael L. Seltzer, James G. Droppo
-
Patent number: 9786284
Abstract: This document describes various techniques for dual-band speech encoding. In some embodiments, a first type of speech feature is received from a remote entity, an estimate of a second type of speech feature is determined based on the first type of speech feature, the estimate of the second type of speech feature is provided to a speech recognizer, speech-recognition results based on the estimate of the second type of speech feature are received from the speech recognizer, and the speech-recognition results are transmitted to the remote entity.
Type: Grant
Filed: August 14, 2014
Date of Patent: October 10, 2017
Assignee: Microsoft Technology Licensing, LLC
Inventors: Alejandro Acero, James G. Droppo, III, Michael L. Seltzer
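One way to picture the feature-estimation step is a mapping learned from paired examples of the two feature types. The patent does not specify the estimator, so the linear least-squares map and the synthetic data below are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
# Paired training data: first-type features (e.g. narrowband) and
# second-type features (e.g. wideband). Shapes: (frames, dims). Synthetic here.
narrow = rng.standard_normal((200, 13))
true_map = rng.standard_normal((13, 20))
wide = narrow @ true_map + 0.01 * rng.standard_normal((200, 20))

# Learn a linear estimator of the second feature type from the first
# (one simple choice; the patent leaves the estimator unspecified).
M, *_ = np.linalg.lstsq(narrow, wide, rcond=None)

def estimate_second_type(first_type_features):
    """Estimate second-type speech features from first-type features."""
    return first_type_features @ M

est = estimate_second_type(narrow)
print(np.mean((est - wide) ** 2))  # small reconstruction error
```

The estimated second-type features would then be handed to the speech recognizer in place of features the remote entity never transmitted.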
-
Patent number: 9779727
Abstract: The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample. The method also includes training a second neural network to recognize the speech signal spoken by the speaker with a lower level of the speech characteristic from the mixed speech sample. Additionally, the method includes decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals considering the probability that a specific frame is a switching point of the speech characteristic.
Type: Grant
Filed: December 30, 2016
Date of Patent: October 3, 2017
Assignee: Microsoft Technology Licensing, LLC
Inventors: Dong Yu, Chao Weng, Michael L. Seltzer, James Droppo
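The decoding step can be illustrated with a toy Viterbi search: each frame carries a log-likelihood under the two possible network-to-talker assignments, and changing assignment mid-utterance pays a switching-probability penalty. The setup below is a simplified sketch with synthetic scores, not the patent's decoder:

```python
import numpy as np

def joint_decode(frame_loglik, p_switch=0.1):
    """frame_loglik: (T, 2) log-likelihoods of each mixed frame under the
    two possible network-to-talker assignments. Viterbi chooses a per-frame
    assignment, paying log(p_switch) whenever the assignment changes."""
    T = frame_loglik.shape[0]
    log_sw, log_st = np.log(p_switch), np.log(1.0 - p_switch)
    score = frame_loglik[0].copy()
    back = np.zeros((T, 2), dtype=int)
    for t in range(1, T):
        prev, new = score, np.empty(2)
        for s in (0, 1):
            stay = prev[s] + log_st          # keep current assignment
            switch = prev[1 - s] + log_sw    # this frame is a switching point
            back[t, s] = s if stay >= switch else 1 - s
            new[s] = max(stay, switch) + frame_loglik[t, s]
        score = new
    path = [int(np.argmax(score))]           # backtrace the best assignment path
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# Synthetic scores: assignment 0 fits the first half, assignment 1 the second.
ll = np.array([[0.0, -5.0]] * 3 + [[-5.0, 0.0]] * 3)
print(joint_decode(ll))  # [0, 0, 0, 1, 1, 1]
```

A lower `p_switch` makes the decoder more reluctant to flip assignments, trading responsiveness for stability.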
-
Publication number: 20170185887
Abstract: A neural network is structured to connect the input values of an input set, at each level, to that level's output using a linear bypass connection. The linear bypass connection passes the input values to the output without applying a non-linear function to them.
Type: Application
Filed: December 28, 2015
Publication date: June 29, 2017
Inventors: James G. Droppo, Pegah Ghahremani, Michael L. Seltzer
-
Publication number: 20170110120
Abstract: The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample. The method also includes training a second neural network to recognize the speech signal spoken by the speaker with a lower level of the speech characteristic from the mixed speech sample. Additionally, the method includes decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals considering the probability that a specific frame is a switching point of the speech characteristic.
Type: Application
Filed: December 30, 2016
Publication date: April 20, 2017
Applicant: Microsoft Technology Licensing, LLC
Inventors: Dong Yu, Chao Weng, Michael L. Seltzer, James Droppo
-
Patent number: 9558742
Abstract: The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample. The method also includes training a second neural network to recognize the speech signal spoken by the speaker with a lower level of the speech characteristic from the mixed speech sample. Additionally, the method includes decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals considering the probability that a specific frame is a switching point of the speech characteristic.
Type: Grant
Filed: June 8, 2016
Date of Patent: January 31, 2017
Assignee: Microsoft Technology Licensing, LLC
Inventors: Dong Yu, Chao Weng, Michael L. Seltzer, James Droppo
-
Publication number: 20160284348
Abstract: The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample. The method also includes training a second neural network to recognize the speech signal spoken by the speaker with a lower level of the speech characteristic from the mixed speech sample. Additionally, the method includes decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals considering the probability that a specific frame is a switching point of the speech characteristic.
Type: Application
Filed: June 8, 2016
Publication date: September 29, 2016
Applicant: Microsoft Technology Licensing, LLC
Inventors: Dong Yu, Chao Weng, Michael L. Seltzer, James Droppo
-
Patent number: 9390712
Abstract: The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample. The method also includes training a second neural network to recognize the speech signal spoken by the speaker with a lower level of the speech characteristic from the mixed speech sample. Additionally, the method includes decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals considering the probability that a specific frame is a switching point of the speech characteristic.
Type: Grant
Filed: March 24, 2014
Date of Patent: July 12, 2016
Assignee: Microsoft Technology Licensing, LLC
Inventors: Dong Yu, Chao Weng, Michael L. Seltzer, James Droppo
-
Publication number: 20160140956
Abstract: A sequence recognition system comprises a prediction component configured to receive a set of observed features from a signal to be recognized and to output a prediction output indicative of a predicted recognition based on the set of observed features. The sequence recognition system also comprises a classification component configured to receive the prediction output and to output a label indicative of recognition of the signal based on the prediction output.
Type: Application
Filed: December 22, 2014
Publication date: May 19, 2016
Inventors: Dong Yu, Yu Zhang, Michael L. Seltzer, James G. Droppo
-
Patent number: 9324321
Abstract: The adaptation and personalization of a deep neural network (DNN) model for automatic speech recognition is provided. An utterance which includes speech features for one or more speakers may be received in ASR tasks such as voice search or short message dictation. A decomposition approach may then be applied to an original matrix in the DNN model. In response to applying the decomposition approach, the original matrix may be converted into multiple new matrices which are smaller than the original matrix. A square matrix may then be added to the new matrices. Speaker-specific parameters may then be stored in the square matrix. The DNN model may then be adapted by updating the square matrix. This process may be applied to all of a number of original matrices in the DNN model. The adapted DNN model may include fewer parameters than the original DNN model.
Type: Grant
Filed: March 7, 2014
Date of Patent: April 26, 2016
Assignee: Microsoft Technology Licensing, LLC
Inventors: Jian Xue, Jinyu Li, Dong Yu, Michael L. Seltzer, Yifan Gong
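A rough sketch of the decomposition idea, assuming a truncated SVD as the decomposition (the abstract does not name one) and illustrative shapes and names (`k`, `A`, `B`, `S`): the original matrix is split into two smaller factors, a small square matrix initialized to identity is inserted between them, and only that square matrix would be updated per speaker:

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.standard_normal((512, 512))          # an original DNN weight matrix

# Decompose W into smaller matrices via truncated SVD.
k = 64                                       # kept rank (illustrative)
U, s, Vt = np.linalg.svd(W, full_matrices=False)
A = U[:, :k] * s[:k]                         # (512, k)
B = Vt[:k, :]                                # (k, 512)

# Insert a k-by-k square matrix, initialized to identity, between the factors.
S = np.eye(k)                                # speaker-specific parameters live here

def adapted_layer(x):
    """Forward pass through the decomposed layer: x -> A @ S @ B @ x."""
    return A @ (S @ (B @ x))

# Adapting to a speaker updates only S: k*k parameters instead of 512*512.
print(S.size, "adapted parameters vs", W.size, "original")
```

With identity `S` the layer reproduces the rank-`k` approximation of the original matrix, so adaptation starts from the speaker-independent behavior.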
-
Publication number: 20150269933
Abstract: The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample. The method also includes training a second neural network to recognize the speech signal spoken by the speaker with a lower level of the speech characteristic from the mixed speech sample. Additionally, the method includes decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals considering the probability that a specific frame is a switching point of the speech characteristic.
Type: Application
Filed: March 24, 2014
Publication date: September 24, 2015
Inventors: Dong Yu, Chao Weng, Michael L. Seltzer, James Droppo
-
Publication number: 20150255061
Abstract: The adaptation and personalization of a deep neural network (DNN) model for automatic speech recognition is provided. An utterance which includes speech features for one or more speakers may be received in ASR tasks such as voice search or short message dictation. A decomposition approach may then be applied to an original matrix in the DNN model. In response to applying the decomposition approach, the original matrix may be converted into multiple new matrices which are smaller than the original matrix. A square matrix may then be added to the new matrices. Speaker-specific parameters may then be stored in the square matrix. The DNN model may then be adapted by updating the square matrix. This process may be applied to all of a number of original matrices in the DNN model. The adapted DNN model may include fewer parameters than the original DNN model.
Type: Application
Filed: March 7, 2014
Publication date: September 10, 2015
Applicant: Microsoft Corporation
Inventors: Jian Xue, Jinyu Li, Dong Yu, Michael L. Seltzer, Yifan Gong
-
Publication number: 20140358525
Abstract: This document describes various techniques for dual-band speech encoding. In some embodiments, a first type of speech feature is received from a remote entity, an estimate of a second type of speech feature is determined based on the first type of speech feature, the estimate of the second type of speech feature is provided to a speech recognizer, speech-recognition results based on the estimate of the second type of speech feature are received from the speech recognizer, and the speech-recognition results are transmitted to the remote entity.
Type: Application
Filed: August 14, 2014
Publication date: December 4, 2014
Inventors: Alejandro Acero, James G. Droppo, III, Michael L. Seltzer
-
Patent number: 8818797
Abstract: This document describes various techniques for dual-band speech encoding. In some embodiments, a first type of speech feature is received from a remote entity, an estimate of a second type of speech feature is determined based on the first type of speech feature, the estimate of the second type of speech feature is provided to a speech recognizer, speech-recognition results based on the estimate of the second type of speech feature are received from the speech recognizer, and the speech-recognition results are transmitted to the remote entity.
Type: Grant
Filed: December 23, 2010
Date of Patent: August 26, 2014
Assignee: Microsoft Corporation
Inventors: Alejandro Acero, James G. Droppo, III, Michael L. Seltzer
-
Patent number: 8532985
Abstract: A warped spectral estimate of an original audio signal can be used to encode a representation of a fine estimate of the original signal. The representation of the warped spectral estimate and the representation of the fine estimate can be sent to a speech recognition system. The representation of the warped spectral estimate can be passed to a speech recognition engine, where it may be used for speech recognition. The representation of the warped spectral estimate can also be used along with the representation of the fine estimate to reconstruct a representation of the original audio signal.
Type: Grant
Filed: December 3, 2010
Date of Patent: September 10, 2013
Assignee: Microsoft Corporation
Inventors: Michael L. Seltzer, James G. Droppo, Henrique S. Malvar, Alejandro Acero, Xing Fan
-
Patent number: 8515096
Abstract: The quality of sound recorded from a plurality of people speaking at the same time is improved by incorporating prior knowledge into an independent component analysis (ICA) separating algorithm. More particularly, prior knowledge is defined as a probability distribution according to some prior situation (e.g., prior distribution of people in a room). A mixture of sounds (e.g., mixture of voices) from a plurality of sources (e.g., people) captured by one or more recording devices (e.g., microphones) is separated into individual components (e.g., individual voices from respective people) by applying a maximum a posteriori (MAP) ICA algorithm which incorporates prior knowledge of the respective sources (e.g., location of sources) directly into the MAP ICA algorithm, thereby allowing recovery of independent underlying sounds associated with individual sources from the mixture.
Type: Grant
Filed: June 18, 2008
Date of Patent: August 20, 2013
Assignee: Microsoft Corporation
Inventors: Michael L. Seltzer, Graham Taylor, Alejandro Acero
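As a loose illustration of folding a prior into ICA: the sketch below adds a gradient-of-log-prior term to a standard Infomax-style natural-gradient ICA update. The Gaussian pull toward `W_prior` is a simplified stand-in for the patent's source-location prior, and the mixed signals are synthetic:

```python
import numpy as np

def map_ica(X, W_prior, prior_weight=0.01, lr=0.01, iters=500):
    """Infomax-style ICA with a MAP term: natural-gradient updates of the
    unmixing matrix W, pulled toward W_prior (a stand-in for prior
    knowledge about the sources, e.g. their expected positions)."""
    n, T = X.shape
    W = np.eye(n)
    for _ in range(iters):
        Y = W @ X
        g = np.tanh(Y)                        # score function for super-Gaussian sources
        grad = (np.eye(n) - g @ Y.T / T) @ W  # natural gradient of the log-likelihood
        grad -= prior_weight * (W - W_prior)  # gradient of the log-prior (MAP term)
        W += lr * grad
    return W

# Two synthetic super-Gaussian (Laplacian) sources, linearly mixed.
rng = np.random.default_rng(3)
S = rng.laplace(size=(2, 2000))
A = np.array([[1.0, 0.6], [0.4, 1.0]])
X = A @ S
W = map_ica(X, W_prior=np.eye(2))
recovered = W @ X  # components are far less correlated than the mixtures
```

Without the prior term this reduces to ordinary Infomax ICA; the `prior_weight` knob controls how strongly the prior situation constrains the solution.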
-
Patent number: 8379891
Abstract: Sound signals to be output from a loudspeaker array are modified by a plurality of filters designed according to an unconstrained optimization procedure to improve overall performance (e.g., power, directivity) of the loudspeaker array. More particularly, respective filters are configured to receive a signal to be output to a plurality of loudspeakers. Upon receiving the signal, the respective filters individually modify the received signal according to the results of the unconstrained optimization procedure and then output the individually modified signals to respective loudspeakers. The unconstrained optimization procedure takes into account manufacturing tolerances and individually enhances the signal output to each of a plurality of individual loudspeakers within an array to achieve an overall improvement in performance.
Type: Grant
Filed: June 4, 2008
Date of Patent: February 19, 2013
Assignee: Microsoft Corporation
Inventors: Ivan J. Tashev, James G. Droppo, Michael L. Seltzer, Alejandro Acero
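The per-loudspeaker filter design can be illustrated with an unconstrained least-squares solve at a single frequency: choose loudspeaker weights that minimize the squared error to a desired beam at a set of control points. The free-field geometry, regularizer, and all names below are assumptions for the sketch, not the patent's procedure:

```python
import numpy as np

def design_filters(G, d, reg=1e-3):
    """Unconstrained least-squares design of per-loudspeaker weights w:
    minimize ||G w - d||^2 + reg * ||w||^2, where G[m, s] is the transfer
    function from loudspeaker s to control point m and d is the desired
    response at the control points. Closed-form ridge solution."""
    S = G.shape[1]
    return np.linalg.solve(G.conj().T @ G + reg * np.eye(S), G.conj().T @ d)

# Toy free-field example at one frequency: 8 speakers on a line,
# control points over a semicircle, beam aimed broadside (90 degrees).
speakers = np.linspace(-0.5, 0.5, 8)             # speaker x positions (m)
angles = np.linspace(0, np.pi, 19)               # control-point directions
k = 2 * np.pi * 1000 / 343                       # wavenumber at 1 kHz
# Far-field steering: phase depends on speaker position and direction.
G = np.exp(1j * k * np.outer(np.cos(angles), speakers))
d = np.exp(-((angles - np.pi / 2) ** 2) / 0.1)   # desired beam at 90 degrees
w = design_filters(G, d)
response = np.abs(G @ w)                         # strongest near 90 degrees
```

Because the problem is unconstrained, perturbing `G` to model manufacturing tolerances changes only the data of the same closed-form solve, which is part of the appeal of this formulation.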
-
Publication number: 20120166186
Abstract: This document describes various techniques for dual-band speech encoding. In some embodiments, a first type of speech feature is received from a remote entity, an estimate of a second type of speech feature is determined based on the first type of speech feature, the estimate of the second type of speech feature is provided to a speech recognizer, speech-recognition results based on the estimate of the second type of speech feature are received from the speech recognizer, and the speech-recognition results are transmitted to the remote entity.
Type: Application
Filed: December 23, 2010
Publication date: June 28, 2012
Applicant: Microsoft Corporation
Inventors: Alejandro Acero, James G. Droppo, III, Michael L. Seltzer
-
Publication number: 20120143599
Abstract: A warped spectral estimate of an original audio signal can be used to encode a representation of a fine estimate of the original signal. The representation of the warped spectral estimate and the representation of the fine estimate can be sent to a speech recognition system. The representation of the warped spectral estimate can be passed to a speech recognition engine, where it may be used for speech recognition. The representation of the warped spectral estimate can also be used along with the representation of the fine estimate to reconstruct a representation of the original audio signal.
Type: Application
Filed: December 3, 2010
Publication date: June 7, 2012
Applicant: Microsoft Corporation
Inventors: Michael L. Seltzer, James G. Droppo, Henrique S. Malvar, Alejandro Acero, Xing Fan