Patents by Inventor Michael L. Seltzer

Michael L. Seltzer has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10558909
    Abstract: A neural network is structured to connect the input values of an input set, at each level, to that level's output using a linear bypass connection. The linear bypass connection passes the input values, to the output, without applying a non-linear function to them.
    Type: Grant
    Filed: December 28, 2015
    Date of Patent: February 11, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: James G. Droppo, Pegah Ghahremani, Michael L. Seltzer
  • Patent number: 9824684
    Abstract: A sequence recognition system comprises a prediction component configured to receive a set of observed features from a signal to be recognized and to output a prediction output indicative of a predicted recognition based on the set of observed features. The sequence recognition system also comprises a classification component configured to receive the prediction output and to output a label indicative of recognition of the signal based on the prediction output.
    Type: Grant
    Filed: December 22, 2014
    Date of Patent: November 21, 2017
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Dong Yu, Yu Zhang, Michael L. Seltzer, James G. Droppo
  • Patent number: 9786284
    Abstract: This document describes various techniques for dual-band speech encoding. In some embodiments, a first type of speech feature is received from a remote entity, an estimate of a second type of speech feature is determined based on the first type of speech feature, the estimate of the second type of speech feature is provided to a speech recognizer, speech-recognition results based on the estimate of the second type of speech feature are received from the speech recognizer, and the speech-recognition results are transmitted to the remote entity.
    Type: Grant
    Filed: August 14, 2014
    Date of Patent: October 10, 2017
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Alejandro Acero, James G. Droppo, III, Michael L. Seltzer
  • Patent number: 9779727
    Abstract: The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample. The method also includes training a second neural network to recognize the speech signal spoken by the speaker with a lower level of the speech characteristic from the mixed speech sample. Additionally, the method includes decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals considering the probability that a specific frame is a switching point of the speech characteristic.
    Type: Grant
    Filed: December 30, 2016
    Date of Patent: October 3, 2017
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Dong Yu, Chao Weng, Michael L. Seltzer, James Droppo
  • Publication number: 20170185887
    Abstract: A neural network is structured to connect the input values of an input set, at each level, to that level's output using a linear bypass connection. The linear bypass connection passes the input values, to the output, without applying a non-linear function to them.
    Type: Application
    Filed: December 28, 2015
    Publication date: June 29, 2017
    Inventors: James G. Droppo, Pegah Ghahremani, Michael L. Seltzer
  • Publication number: 20170110120
    Abstract: The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample. The method also includes training a second neural network to recognize the speech signal spoken by the speaker with a lower level of the speech characteristic from the mixed speech sample. Additionally, the method includes decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals considering the probability that a specific frame is a switching point of the speech characteristic.
    Type: Application
    Filed: December 30, 2016
    Publication date: April 20, 2017
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Dong Yu, Chao Weng, Michael L. Seltzer, James Droppo
  • Patent number: 9558742
    Abstract: The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample. The method also includes training a second neural network to recognize the speech signal spoken by the speaker with a lower level of the speech characteristic from the mixed speech sample. Additionally, the method includes decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals considering the probability that a specific frame is a switching point of the speech characteristic.
    Type: Grant
    Filed: June 8, 2016
    Date of Patent: January 31, 2017
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Dong Yu, Chao Weng, Michael L. Seltzer, James Droppo
  • Publication number: 20160284348
    Abstract: The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample. The method also includes training a second neural network to recognize the speech signal spoken by the speaker with a lower level of the speech characteristic from the mixed speech sample. Additionally, the method includes decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals considering the probability that a specific frame is a switching point of the speech characteristic.
    Type: Application
    Filed: June 8, 2016
    Publication date: September 29, 2016
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Dong Yu, Chao Weng, Michael L. Seltzer, James Droppo
  • Patent number: 9390712
    Abstract: The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample. The method also includes training a second neural network to recognize the speech signal spoken by the speaker with a lower level of the speech characteristic from the mixed speech sample. Additionally, the method includes decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals considering the probability that a specific frame is a switching point of the speech characteristic.
    Type: Grant
    Filed: March 24, 2014
    Date of Patent: July 12, 2016
    Assignee: Microsoft Technology Licensing, LLC.
    Inventors: Dong Yu, Chao Weng, Michael L. Seltzer, James Droppo
  • Publication number: 20160140956
    Abstract: A sequence recognition system comprises a prediction component configured to receive a set of observed features from a signal to be recognized and to output a prediction output indicative of a predicted recognition based on the set of observed features. The sequence recognition system also comprises a classification component configured to receive the prediction output and to output a label indicative of recognition of the signal based on the prediction output.
    Type: Application
    Filed: December 22, 2014
    Publication date: May 19, 2016
    Inventors: Dong Yu, Yu Zhang, Michael L. Seltzer, James G. Droppo
  • Patent number: 9324321
    Abstract: The adaptation and personalization of a deep neural network (DNN) model for automatic speech recognition is provided. An utterance which includes speech features for one or more speakers may be received in ASR tasks such as voice search or short message dictation. A decomposition approach may then be applied to an original matrix in the DNN model. In response to applying the decomposition approach, the original matrix may be converted into multiple new matrices which are smaller than the original matrix. A square matrix may then be added to the new matrices. Speaker-specific parameters may then be stored in the square matrix. The DNN model may then be adapted by updating the square matrix. This process may be applied to all of a number of original matrices in the DNN model. The adapted DNN model may include a reduced number of parameters than those received in the original DNN model.
    Type: Grant
    Filed: March 7, 2014
    Date of Patent: April 26, 2016
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jian Xue, Jinyu Li, Dong Yu, Michael L. Seltzer, Yifan Gong
  • Publication number: 20150269933
    Abstract: The claimed subject matter includes a system and method for recognizing mixed speech from a source. The method includes training a first neural network to recognize the speech signal spoken by the speaker with a higher level of a speech characteristic from a mixed speech sample. The method also includes training a second neural network to recognize the speech signal spoken by the speaker with a lower level of the speech characteristic from the mixed speech sample. Additionally, the method includes decoding the mixed speech sample with the first neural network and the second neural network by optimizing the joint likelihood of observing the two speech signals considering the probability that a specific frame is a switching point of the speech characteristic.
    Type: Application
    Filed: March 24, 2014
    Publication date: September 24, 2015
    Inventors: Dong Yu, Chao Weng, Michael L. Seltzer, James Droppo
  • Publication number: 20150255061
    Abstract: The adaptation and personalization of a deep neural network (DNN) model for automatic speech recognition is provided. An utterance which includes speech features for one or more speakers may be received in ASR tasks such as voice search or short message dictation. A decomposition approach may then be applied to an original matrix in the DNN model. In response to applying the decomposition approach, the original matrix may be converted into multiple new matrices which are smaller than the original matrix. A square matrix may then be added to the new matrices. Speaker-specific parameters may then be stored in the square matrix. The DNN model may then be adapted by updating the square matrix. This process may be applied to all of a number of original matrices in the DNN model. The adapted DNN model may include a reduced number of parameters than those received in the original DNN model.
    Type: Application
    Filed: March 7, 2014
    Publication date: September 10, 2015
    Applicant: MICROSOFT CORPORATION
    Inventors: Jian Xue, Jinyu Li, Dong Yu, Michael L. Seltzer, Yifan Gong
  • Publication number: 20140358525
    Abstract: This document describes various techniques for dual-band speech encoding. In some embodiments, a first type of speech feature is received from a remote entity, an estimate of a second type of speech feature is determined based on the first type of speech feature, the estimate of the second type of speech feature is provided to a speech recognizer, speech-recognition results based on the estimate of the second type of speech feature are received from the speech recognizer, and the speech-recognition results are transmitted to the remote entity.
    Type: Application
    Filed: August 14, 2014
    Publication date: December 4, 2014
    Inventors: Alejandro Acero, James G. Droppo, III, Michael L. Seltzer
  • Patent number: 8818797
    Abstract: This document describes various techniques for dual-band speech encoding. In some embodiments, a first type of speech feature is received from a remote entity, an estimate of a second type of speech feature is determined based on the first type of speech feature, the estimate of the second type of speech feature is provided to a speech recognizer, speech-recognition results based on the estimate of the second type of speech feature are received from the speech recognizer, and the speech-recognition results are transmitted to the remote entity.
    Type: Grant
    Filed: December 23, 2010
    Date of Patent: August 26, 2014
    Assignee: Microsoft Corporation
    Inventors: Alejandro Acero, James G. Droppo, III, Michael L. Seltzer
  • Patent number: 8532985
    Abstract: A warped spectral estimate of an original audio signal can be used to encode a representation of a fine estimate of the original signal. The representation of the warped spectral estimate and the representation of the fine estimate can be sent to a speech recognition system. The representation of the warped spectral estimate can be passed to a speech recognition engine, where it may be used for speech recognition. The representation of the warped spectral estimate can also be used along with the representation of the fine estimate to reconstruct a representation of the original audio signal.
    Type: Grant
    Filed: December 3, 2010
    Date of Patent: September 10, 2013
    Assignee: Microsoft Coporation
    Inventors: Michael L. Seltzer, James G. Droppo, Henrique S. Malvar, Alejandro Acero, Xing Fan
  • Patent number: 8515096
    Abstract: The quality of sound recorded from a plurality of people speaking at the same time is improved by incorporating prior knowledge into an independent component analysis (ICA) separating algorithm. More particularly, prior knowledge is defined as a probability distribution according to some prior situation (e.g., prior distribution of people in a room). A mixture of sounds (e.g., mixture of voices) from a plurality of sources (e.g., people) captured by one or more recording devices (e.g., microphones) is separated into individual components (e.g., individual voices from respective people) by applying an maximum a posteriori (MAP) ICA algorithm which incorporates prior knowledge of the respective sources (e.g., location of sources) directly into the MAP ICA algorithm thereby allowing recovery of independent underlying sounds associated with individual sources from the mixture.
    Type: Grant
    Filed: June 18, 2008
    Date of Patent: August 20, 2013
    Assignee: Microsoft Corporation
    Inventors: Michael L. Seltzer, Graham Taylor, Alejandro Acero
  • Patent number: 8379891
    Abstract: Sound signals to be output from a loudspeaker array are modified by a plurality of filters designed according to an unconstrained optimization procedure to improve overall performance (e.g., power, directivity) of the loudspeaker array. More particularly, respective filters are configured to receive a signal to be output to a plurality of loudspeakers. Upon receiving the signal, the respective filters individually modify the received signal according to the results of the unconstrained optimization procedure and then output the individually modified signals to respective loudspeakers. The unconstrained optimization procedure takes into account manufacturing tolerances and individually enhances the signal output to each of a plurality of individual loudspeakers within an array to achieve an overall improvement in performance.
    Type: Grant
    Filed: June 4, 2008
    Date of Patent: February 19, 2013
    Assignee: Microsoft Corporation
    Inventors: Ivan J. Tashev, James G. Droppo, Michael L. Seltzer, Alejandro Acero
  • Publication number: 20120166186
    Abstract: This document describes various techniques for dual-band speech encoding. In some embodiments, a first type of speech feature is received from a remote entity, an estimate of a second type of speech feature is determined based on the first type of speech feature, the estimate of the second type of speech feature is provided to a speech recognizer, speech-recognition results based on the estimate of the second type of speech feature are received from the speech recognizer, and the speech-recognition results are transmitted to the remote entity.
    Type: Application
    Filed: December 23, 2010
    Publication date: June 28, 2012
    Applicant: Microsoft Corporation
    Inventors: Alejandro Acero, James G. Droppo, III, Michael L. Seltzer
  • Publication number: 20120143599
    Abstract: A warped spectral estimate of an original audio signal can be used to encode a representation of a fine estimate of the original signal. The representation of the warped spectral estimate and the representation of the fine estimate can be sent to a speech recognition system. The representation of the warped spectral estimate can be passed to a speech recognition engine, where it may be used for speech recognition. The representation of the warped spectral estimate can also be used along with the representation of the fine estimate to reconstruct a representation of the original audio signal.
    Type: Application
    Filed: December 3, 2010
    Publication date: June 7, 2012
    Applicant: Microsoft Corporation
    Inventors: Michael L. Seltzer, James G. Droppo, Henrique S. Malvar, Alejandro Acero, Xing Fan