Patents by Inventor Michael Lewis Seltzer

Michael Lewis Seltzer has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11646017
    Abstract: In one embodiment, a method includes accessing a machine-learning model configured to generate an encoding for an utterance by using a module to process data associated with each segment of the utterance in a series of iterations, performing operations associated with an i-th segment during an n-th iteration by the module, which include receiving an input comprising input contextual embeddings generated for the i-th segment in a preceding iteration and a memory bank storing memory vectors generated in the preceding iteration for segments preceding the i-th segment, generating attention outputs and a memory vector based on keys, values, and queries generated using the input, and generating output contextual embeddings for the i-th segment based on the attention outputs, providing the memory vector to the module for performing operations associated with the i-th segment in a next iteration, and performing speech recognition by decoding the encoding of the utterance.
    Type: Grant
    Filed: March 5, 2021
    Date of Patent: May 9, 2023
    Assignee: Meta Platforms, Inc.
    Inventors: Yangyang Shi, Yongqiang Wang, Chunyang Wu, Ching-Feng Yeh, Julian Yui-Hin Chan, Qiaochu Zhang, Duc Hoang Le, Michael Lewis Seltzer
  • Patent number: 10885900
    Abstract: Improvements in speech recognition in a new domain are provided via the student/teacher training of models for different speech domains. A student model for a new domain is created based on the teacher model trained in an existing domain. The student model is trained in parallel to the operation of the teacher model, with inputs in the new and existing domains respectively, to develop a neural network that is adapted to recognize speech in the new domain. The data in the new domain may exclude transcription labels but rather are parallelized with the data analyzed in the existing domain analyzed by the teacher model. The outputs from the teacher model are compared with the outputs of the student model and the differences are used to adjust the parameters of the student model to better recognize speech in the second domain.
    Type: Grant
    Filed: August 11, 2017
    Date of Patent: January 5, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Jinyu Li, Michael Lewis Seltzer, Xi Wang, Rui Zhao, Yifan Gong
  • Publication number: 20190051290
    Abstract: Improvements in speech recognition in a new domain are provided via the student/teacher training of models for different speech domains. A student model for a new domain is created based on the teacher model trained in an existing domain. The student model is trained in parallel to the operation of the teacher model, with inputs in the new and existing domains respectively, to develop a neural network that is adapted to recognize speech in the new domain. The data in the new domain may exclude transcription labels but rather are parallelized with the data analyzed in the existing domain analyzed by the teacher model. The outputs from the teacher model are compared with the outputs of the student model and the differences are used to adjust the parameters of the student model to better recognize speech in the second domain.
    Type: Application
    Filed: August 11, 2017
    Publication date: February 14, 2019
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Jinyu Li, Michael Lewis Seltzer, Xi Wang, Rui Zhao, Yifan Gong
  • Patent number: 9984678
    Abstract: Various technologies described herein pertain to adapting a speech recognizer to input speech data. A first linear transform can be selected from a first set of linear transforms based on a value of a first variability source corresponding to the input speech data, and a second linear transform can be selected from a second set of linear transforms based on a value of a second variability source corresponding to the input speech data. The linear transforms in the first and second sets can compensate for the first variability source and the second variability source, respectively. Moreover, the first linear transform can be applied to the input speech data to generate intermediate transformed speech data, and the second linear transform can be applied to the intermediate transformed speech data to generate transformed speech data. Further, speech can be recognized based on the transformed speech data to obtain a result.
    Type: Grant
    Filed: March 23, 2012
    Date of Patent: May 29, 2018
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Michael Lewis Seltzer, Alejandro Acero
  • Patent number: 9009039
    Abstract: Technologies are described herein for noise adaptive training to achieve robust automatic speech recognition. Through the use of these technologies, a noise adaptive training (NAT) approach may use both clean and corrupted speech for training. The NAT approach may normalize the environmental distortion as part of the model training. A set of underlying “pseudo-clean” model parameters may be estimated directly. This may be done without point estimation of clean speech features as an intermediate step. The pseudo-clean model parameters learned from the NAT technique may be used with a Vector Taylor Series (VTS) adaptation. Such adaptation may support decoding noisy utterances during the operating phase of an automatic voice recognition system.
    Type: Grant
    Filed: June 12, 2009
    Date of Patent: April 14, 2015
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Michael Lewis Seltzer, James Garnet Droppo, Ozlem Kalinli, Alejandro Acero
  • Patent number: 8793065
    Abstract: Oftentimes individuals have a number of objectives to complete while traveling in a vehicle. The objectives can be arranged automatically and an associated route can be produced such that the objectives can be completed in an effective manner. Data related to the objectives can be collected, such as a traffic pattern on paths near a location where the objective is to take place. Locations for the objectives to be completed can be determined automatically as well as provided by a user. Analysis of the collected data can take place and, based on a result of the analysis, an efficient route is produced.
    Type: Grant
    Filed: February 19, 2008
    Date of Patent: July 29, 2014
    Assignee: Microsoft Corporation
    Inventors: Michael Lewis Seltzer, Neil W. Black, Jeffrey D. Couckuyt, Ivan J. Tashev, John C. Krumm, Ruston Panabaker
  • Patent number: 8793066
    Abstract: A user can be compensated for taking detours from a projected route. Commonly, the reason for the compensation is that the user will be subjected to advertising, the user will pass by an establishment she is likely to visit, or to ease traffic congestion. Analysis of an area takes place and monetization opportunities are determined based upon the results of the analysis. A route between at least about two locations can be altered such that the user is provided a reward, commonly in an optimized manner.
    Type: Grant
    Filed: December 14, 2007
    Date of Patent: July 29, 2014
    Assignee: Microsoft Corporation
    Inventors: Ruston Panabaker, John C. Krumm, Jeffrey D. Couckuyt, Ivan J. Tashev, Michael Lewis Seltzer, Neil W. Black
  • Patent number: 8700394
    Abstract: Described is a technology by which a speech recognizer is adapted to perform in noisy environments using linear spline interpolation to approximate the nonlinear relationship between clean speech, noise, and noisy speech. Linear spline parameters that minimize the error between predicted noisy features and actual noisy features are learned from training data, along with variance data that reflect regression errors. Also described is compensating for linear channel distortion and updating noise and channel parameters during speech recognition decoding.
    Type: Grant
    Filed: March 24, 2010
    Date of Patent: April 15, 2014
    Assignee: Microsoft Corporation
    Inventors: Michael Lewis Seltzer, Kaustubh Prakash Kalgaonkar, Alejandro Acero
  • Publication number: 20140067387
    Abstract: Scalar operations for model adaptation or feature enhancement may be utilized for recognizing an utterance during automatic speech recognition in a noisy environment. An utterance including distorted speech generated from a transmission source for delivery to a receiver may be received by a computer. The distorted speech may be caused by the noisy environment and channel distortion. Computations using scalar operations in the form of an algorithm may then be performed for recognizing the utterance. As a result of performing all of the computations with scalar operations, computational complexity is very small in comparison to matrix and vector operations. Vector Taylor Series with diagonal Jacobian approximation may also be utilized as a distortion-model-based noise robust algorithm with scalar operations.
    Type: Application
    Filed: September 5, 2012
    Publication date: March 6, 2014
    Applicant: Microsoft Corporation
    Inventors: Jinyu Li, Michael Lewis Seltzer, Yifan Gong
  • Publication number: 20130253930
    Abstract: Various technologies described herein pertain to adapting a speech recognizer to input speech data. A first linear transform can be selected from a first set of linear transforms based on a value of a first variability source corresponding to the input speech data, and a second linear transform can be selected from a second set of linear transforms based on a value of a second variability source corresponding to the input speech data. The linear transforms in the first and second sets can compensate for the first variability source and the second variability source, respectively. Moreover, the first linear transform can be applied to the input speech data to generate intermediate transformed speech data, and the second linear transform can be applied to the intermediate transformed speech data to generate transformed speech data. Further, speech can be recognized based on the transformed speech data to obtain a result.
    Type: Application
    Filed: March 23, 2012
    Publication date: September 26, 2013
    Applicant: Microsoft Corporation
    Inventors: Michael Lewis Seltzer, Alejandro Acero
  • Patent number: 8473198
    Abstract: When users travel to an intended destination, a plurality of information can be beneficial to assist their travel. If a person is traveling to a crowded event, then information can be provided such that congested traffic areas can be provided. There can be financial opportunities available in relation to providing information concerning an intended destination. An advertiser can pay money to have information played about the advertiser when it relates to the intended destination. Furthermore, a user can pay money for detailed data concerning an intended location, such as where cheapest parking is located.
    Type: Grant
    Filed: December 14, 2007
    Date of Patent: June 25, 2013
    Assignee: Microsoft Corporation
    Inventors: John C. Krumm, Ruston Panabaker, Jeffrey D. Couckuyt, Ivan J. Tashev, Michael Lewis Seltzer, Neil W. Black
  • Patent number: 8401206
    Abstract: Described is an audio signal processing technology in which an adaptive beamformer processes input signals from microphones based on an estimate received from a pre-filter. The adaptive beamformer may compute its parameters (e.g., weights) for each frame based on the estimate, via a magnitude-domain objective function or log-magnitude-domain objective function. The pre-filter may include a time invariant beamformer and/or a non-linear spatial filter, and/or may include a spectral filter. The computed parameters may be adjusted based on a constraint, which may be selectively applied only at desired times.
    Type: Grant
    Filed: January 15, 2009
    Date of Patent: March 19, 2013
    Assignee: Microsoft Corporation
    Inventors: Michael Lewis Seltzer, Ivan Jelev Tashev
  • Patent number: 8213635
    Abstract: An audio signal is received that might include keyboard noise and speech. The audio signal is digitized and transformed from a time domain to a frequency domain. The transformed audio is analyzed to determine whether there is likelihood that keystroke noise is present. If it is determined there is high likelihood that the audio signal contains keystroke noise, a determination is made as to whether a keyboard event occurred around the time of the likely keystroke noise. If it is determined that a keyboard event occurred around the time of the likely keystroke noise, a determination is made as to whether speech is present in the audio signal around the time of the likely keystroke noise. If no speech is present, the keystroke noise is suppressed in the audio signal. If speech is detected in the audio signal or if the keystroke noise abates, the suppression gain is removed from the audio signal.
    Type: Grant
    Filed: December 5, 2008
    Date of Patent: July 3, 2012
    Assignee: Microsoft Corporation
    Inventors: Qin Li, Michael Lewis Seltzer, Chao He
  • Patent number: 8090532
    Abstract: As a pedestrian travels, various difficulties can be encountered, such as traveling through an unsafe neighborhood or being in an open area that is subject to harsh temperatures. A route can be developed for a person taking into account factors that specifically affect a pedestrian. Moreover, the route can alter as a situation of a user changes; for instance, if a user wants to add a stop along a route.
    Type: Grant
    Filed: December 14, 2007
    Date of Patent: January 3, 2012
    Assignee: Microsoft Corporation
    Inventors: Ivan J. Tashev, Jeffrey D. Couckuyt, Neil W. Black, John C. Krumm, Ruston Panabaker, Michael Lewis Seltzer
  • Patent number: 8065078
    Abstract: The presentation of location information to a user that is distracted by traveling can result in the user quickly forgetting, or never even comprehending, key parts of the location information, such as the street number. Identification can be made of intersections and points of interest near the user's destination, which can then be provided instead of, or in addition to, the address, thereby increasing user comprehension and retention, especially when distracted. Map data can be parsed into addresses, intersections and points of interest databases. These databases can be accessed to identify proximate intersections and points of interest, which can then be filtered and subsequently ranked to identify one intersection, one point of interest, or both, that can be presented to the user to aid the user in comprehending and retaining the location information even when distracted.
    Type: Grant
    Filed: August 10, 2007
    Date of Patent: November 22, 2011
    Assignee: Microsoft Corporation
    Inventors: Ivan Tashev, Michael Lewis Seltzer, Yun-Cheng Ju, Alex Acero
  • Patent number: 8060297
    Abstract: A user can intend to travel between different locations and employ different traveling manners to reach an intended travel destination. At different points, different devices can be employed for disclosing a route. For instance, as a user walks, a route can be integrated into a personal electronic device, such as a cellular telephone. An evaluation can take place that due to specific route details, for example detailed text, a particular device would be superior for presentment over another.
    Type: Grant
    Filed: December 14, 2007
    Date of Patent: November 15, 2011
    Assignee: Microsoft Corporation
    Inventors: Jeffrey D. Couckuyt, Neil W. Black, John C. Krumm, Ruston Panabaker, Ivan J. Tashev, Michael Lewis Seltzer
  • Publication number: 20110238416
    Abstract: Described is a technology by which a speech recognizer is adapted to perform in noisy environments using linear spline interpolation to approximate the nonlinear relationship between clean speech, noise, and noisy speech. Linear spline parameters that minimize the error between predicted noisy features and actual noisy features are learned from training data, along with variance data that reflect regression errors. Also described is compensating for linear channel distortion and updating noise and channel parameters during speech recognition decoding.
    Type: Application
    Filed: March 24, 2010
    Publication date: September 29, 2011
    Applicant: Microsoft Corporation
    Inventors: Michael Lewis Seltzer, Kaustubh Prakash Kalgaonkar, Alejandro Acero
  • Publication number: 20100318354
    Abstract: Technologies are described herein for noise adaptive training to achieve robust automatic speech recognition. Through the use of these technologies, a noise adaptive training (NAT) approach may use both clean and corrupted speech for training. The NAT approach may normalize the environmental distortion as part of the model training. A set of underlying “pseudo-clean” model parameters may be estimated directly. This may be done without point estimation of clean speech features as an intermediate step. The pseudo-clean model parameters learned from the NAT technique may be used with a Vector Taylor Series (VTS) adaptation. Such adaptation may support decoding noisy utterances during the operating phase of an automatic voice recognition system.
    Type: Application
    Filed: June 12, 2009
    Publication date: December 16, 2010
    Applicant: Microsoft Corporation
    Inventors: Michael Lewis Seltzer, James Garnet Droppo, Ozlem Kalinli, Alejandro Acero
  • Publication number: 20100177908
    Abstract: Described is an audio signal processing technology in which an adaptive beamformer processes input signals from microphones based on an estimate received from a pre-filter. The adaptive beamformer may compute its parameters (e.g., weights) for each frame based on the estimate, via a magnitude-domain objective function or log-magnitude-domain objective function. The pre-filter may include a time invariant beamformer and/or a non-linear spatial filter, and/or may include a spectral filter. The computed parameters may be adjusted based on a constraint, which may be selectively applied only at desired times.
    Type: Application
    Filed: January 15, 2009
    Publication date: July 15, 2010
    Applicant: Microsoft Corporation
    Inventors: Michael Lewis Seltzer, Ivan Jelev Tashev
  • Publication number: 20100145689
    Abstract: An audio signal is received that might include keyboard noise and speech. The audio signal is digitized and transformed from a time domain to a frequency domain. The transformed audio is analyzed to determine whether there is likelihood that keystroke noise is present. If it is determined there is high likelihood that the audio signal contains keystroke noise, a determination is made as to whether a keyboard event occurred around the time of the likely keystroke noise. If it is determined that a keyboard event occurred around the time of the likely keystroke noise, a determination is made as to whether speech is present in the audio signal around the time of the likely keystroke noise. If no speech is present, the keystroke noise is suppressed in the audio signal. If speech is detected in the audio signal or if the keystroke noise abates, the suppression gain is removed from the audio signal.
    Type: Application
    Filed: December 5, 2008
    Publication date: June 10, 2010
    Applicant: Microsoft Corporation
    Inventors: Qin Li, Michael Lewis Seltzer, Chao He
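
The decision flow summarized in the keystroke-noise abstract above (publication 20100145689, granted as 8213635) can be illustrated with a rough sketch. This is not the patented implementation: the `Frame` fields, the `suppression_gains` function, the likelihood threshold, and the gain values are all hypothetical names and numbers chosen for illustration, and the frequency-domain analysis, keyboard-event lookup, and speech detector are assumed to have already produced the per-frame flags.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    """Per-frame inputs; every field here is an assumed, precomputed value."""
    keystroke_likelihood: float  # hypothetical score in [0, 1] from frequency-domain analysis
    keyboard_event_nearby: bool  # True if a key event was reported around this frame's time
    speech_present: bool         # output of some speech detector (not specified here)


def suppression_gains(
    frames: List[Frame],
    likelihood_threshold: float = 0.5,  # assumed value, not from the abstract
    suppressed_gain: float = 0.1,       # assumed attenuation applied to keystroke noise
) -> List[float]:
    """Return one gain per frame: attenuate only when all three checks agree."""
    gains = []
    for frame in frames:
        suppress = (
            frame.keystroke_likelihood >= likelihood_threshold  # keystroke noise is likely
            and frame.keyboard_event_nearby                     # and a key event confirms it
            and not frame.speech_present                        # and no speech would be damaged
        )
        # If speech is detected or the keystroke noise abates, the gain returns to 1.0.
        gains.append(suppressed_gain if suppress else 1.0)
    return gains
```

In this sketch a frame is attenuated only when the keystroke likelihood is high, a keyboard event occurred nearby, and no speech is present; any other combination leaves the audio untouched, which mirrors the order of checks in the abstract.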