Patents by Inventor Michael Lewis Seltzer

Michael Lewis Seltzer has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Efficient memory transformer based acoustic model for low latency streaming speech recognition

Patent number: 11646017

Abstract: In one embodiment, a method includes accessing a machine-learning model configured to generate an encoding for an utterance by using a module to process data associated with each segment of the utterance in a series of iterations, performing operations associated with an i-th segment during an n-th iteration by the module, which include receiving an input comprising input contextual embeddings generated for the i-th segment in a preceding iteration and a memory bank storing memory vectors generated in the preceding iteration for segments preceding the i-th segment, generating attention outputs and a memory vector based on keys, values, and queries generated using the input, and generating output contextual embeddings for the i-th segment based on the attention outputs, providing the memory vector to the module for performing operations associated with the i-th segment in a next iteration, and performing speech recognition by decoding the encoding of the utterance.

Type: Grant

Filed: March 5, 2021

Date of Patent: May 9, 2023

Assignee: Meta Platforms, Inc.

Inventors: Yangyang Shi, Yongqiang Wang, Chunyang Wu, Ching-Feng Yeh, Julian Yui-Hin Chan, Qiaochu Zhang, Duc Hoang Le, Michael Lewis Seltzer
Domain adaptation in speech recognition via teacher-student learning

Patent number: 10885900

Abstract: Improvements in speech recognition in a new domain are provided via the student/teacher training of models for different speech domains. A student model for a new domain is created based on the teacher model trained in an existing domain. The student model is trained in parallel to the operation of the teacher model, with inputs in the new and existing domains respectively, to develop a neural network that is adapted to recognize speech in the new domain. The data in the new domain may exclude transcription labels but rather are parallelized with the data in the existing domain analyzed by the teacher model. The outputs from the teacher model are compared with the outputs of the student model and the differences are used to adjust the parameters of the student model to better recognize speech in the new domain.

Type: Grant

Filed: August 11, 2017

Date of Patent: January 5, 2021

Assignee: Microsoft Technology Licensing, LLC

Inventors: Jinyu Li, Michael Lewis Seltzer, Xi Wang, Rui Zhao, Yifan Gong
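
The comparison step in the abstract above can be illustrated with a minimal sketch. It assumes the teacher and student outputs are compared with a KL divergence over softmax posteriors, and it stands in for a real network by treating the student's output logits as the adjustable parameters. The function names, the learning rate, and the choice of KL divergence are illustrative assumptions, not details taken from the patent.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ts_loss_and_grad(teacher_logits, student_logits):
    """Return KL(teacher || student) and its gradient w.r.t. the student logits.

    The teacher sees existing-domain input; the student sees the parallel
    new-domain input. No transcription labels are needed.
    """
    p = softmax(teacher_logits)   # teacher posteriors (targets)
    q = softmax(student_logits)   # student posteriors
    loss = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    grad = [qi - pi for pi, qi in zip(p, q)]  # d(loss)/d(student logit)
    return loss, grad

def adapt_step(student_logits, teacher_logits, lr=0.5):
    """One gradient step that moves the student toward the teacher (lr is hypothetical)."""
    _, grad = ts_loss_and_grad(teacher_logits, student_logits)
    return [s - lr * g for s, g in zip(student_logits, grad)]
```

Repeated calls to `adapt_step` shrink the divergence; in a real system the gradient would be back-propagated into the student network's weights rather than applied to the logits directly.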
DOMAIN ADAPTATION IN SPEECH RECOGNITION VIA TEACHER-STUDENT LEARNING

Publication number: 20190051290

Abstract: Improvements in speech recognition in a new domain are provided via the student/teacher training of models for different speech domains. A student model for a new domain is created based on the teacher model trained in an existing domain. The student model is trained in parallel to the operation of the teacher model, with inputs in the new and existing domains respectively, to develop a neural network that is adapted to recognize speech in the new domain. The data in the new domain may exclude transcription labels but rather are parallelized with the data in the existing domain analyzed by the teacher model. The outputs from the teacher model are compared with the outputs of the student model and the differences are used to adjust the parameters of the student model to better recognize speech in the new domain.

Type: Application

Filed: August 11, 2017

Publication date: February 14, 2019

Applicant: Microsoft Technology Licensing, LLC

Inventors: Jinyu Li, Michael Lewis Seltzer, Xi Wang, Rui Zhao, Yifan Gong
Factored transforms for separable adaptation of acoustic models

Patent number: 9984678

Abstract: Various technologies described herein pertain to adapting a speech recognizer to input speech data. A first linear transform can be selected from a first set of linear transforms based on a value of a first variability source corresponding to the input speech data, and a second linear transform can be selected from a second set of linear transforms based on a value of a second variability source corresponding to the input speech data. The linear transforms in the first and second sets can compensate for the first variability source and the second variability source, respectively. Moreover, the first linear transform can be applied to the input speech data to generate intermediate transformed speech data, and the second linear transform can be applied to the intermediate transformed speech data to generate transformed speech data. Further, speech can be recognized based on the transformed speech data to obtain a result.

Type: Grant

Filed: March 23, 2012

Date of Patent: May 29, 2018

Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC

Inventors: Michael Lewis Seltzer, Alejandro Acero
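
The two-stage transform selection in the abstract above can be sketched as follows. This assumes, purely for illustration, that the first variability source is the speaker and the second is the acoustic environment. The lookup tables, their keys, and the matrix values are hypothetical.

```python
def apply_linear(matrix, vec):
    """Multiply a square matrix (list of rows) by a feature vector."""
    return [sum(a * x for a, x in zip(row, vec)) for row in matrix]

# Hypothetical transform sets, one per variability source.
SPEAKER_TRANSFORMS = {
    "speaker_a": [[1.0, 0.0], [0.0, 1.0]],
    "speaker_b": [[2.0, 0.0], [0.0, 3.0]],
}
ENVIRONMENT_TRANSFORMS = {
    "quiet": [[1.0, 0.0], [0.0, 1.0]],
    "car": [[0.5, 0.0], [0.0, 0.5]],
}

def adapt_features(frame, speaker, environment):
    """Select one transform per source and apply them in sequence."""
    first = SPEAKER_TRANSFORMS[speaker]           # chosen by the first source's value
    second = ENVIRONMENT_TRANSFORMS[environment]  # chosen by the second source's value
    intermediate = apply_linear(first, frame)     # intermediate transformed data
    return apply_linear(second, intermediate)     # final transformed data

transformed = adapt_features([1.0, 2.0], "speaker_b", "car")
```

Because the transforms are factored, a speaker transform estimated in one environment can be reused with a different environment transform; the transformed features would then be passed to the recognizer.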
Noise adaptive training for speech recognition

Patent number: 9009039

Abstract: Technologies are described herein for noise adaptive training to achieve robust automatic speech recognition. Through the use of these technologies, a noise adaptive training (NAT) approach may use both clean and corrupted speech for training. The NAT approach may normalize the environmental distortion as part of the model training. A set of underlying “pseudo-clean” model parameters may be estimated directly. This may be done without point estimation of clean speech features as an intermediate step. The pseudo-clean model parameters learned from the NAT technique may be used with a Vector Taylor Series (VTS) adaptation. Such adaptation may support decoding noisy utterances during the operating phase of an automatic voice recognition system.

Type: Grant

Filed: June 12, 2009

Date of Patent: April 14, 2015

Assignee: Microsoft Technology Licensing, LLC

Inventors: Michael Lewis Seltzer, James Garnet Droppo, Ozlem Kalinli, Alejandro Acero
Route monetization

Patent number: 8793066

Abstract: A user can be compensated for taking detours from a projected route. Commonly, the compensation is offered because the user will be subjected to advertising, because the user will pass by an establishment she is likely to visit, or because the detour eases traffic congestion. Analysis of an area takes place and monetization opportunities are determined based upon the results of the analysis. A route between at least two locations can be altered such that the user is provided a reward, commonly in an optimized manner.

Type: Grant

Filed: December 14, 2007

Date of Patent: July 29, 2014

Assignee: Microsoft Corporation

Inventors: Ruston Panabaker, John C. Krumm, Jeffrey D. Couckuyt, Ivan J. Tashev, Michael Lewis Seltzer, Neil W. Black
Route-based activity planner

Patent number: 8793065

Abstract: Oftentimes individuals have a number of objectives to complete while traveling in a vehicle. The objectives can be arranged automatically and an associated route can be produced such that the objectives can be completed in an effective manner. Data related to the objectives can be collected, such as a traffic pattern on paths near a location where an objective is to take place. Locations for the objectives to be completed can be determined automatically as well as provided by the user. Analysis of the collected data can take place and, based on a result of the analysis, an efficient route is produced.

Type: Grant

Filed: February 19, 2008

Date of Patent: July 29, 2014

Assignee: Microsoft Corporation

Inventors: Michael Lewis Seltzer, Neil W. Black, Jeffrey D. Couckuyt, Ivan J. Tashev, John C. Krumm, Ruston Panabaker
Acoustic model adaptation using splines

Patent number: 8700394

Abstract: Described is a technology by which a speech recognizer is adapted to perform in noisy environments using linear spline interpolation to approximate the nonlinear relationship between clean speech, noise, and noisy speech. Linear spline parameters that minimize the error between predicted noisy features and actual noisy features are learned from training data, along with variance data that reflect regression errors. Also described is compensating for linear channel distortion and updating noise and channel parameters during speech recognition decoding.

Type: Grant

Filed: March 24, 2010

Date of Patent: April 15, 2014

Assignee: Microsoft Corporation

Inventors: Michael Lewis Seltzer, Kaustubh Prakash Kalgaonkar, Alejandro Acero
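
To give a feel for linear spline interpolation of the clean/noise/noisy relationship mentioned in the abstract above, here is a minimal sketch. It assumes the commonly used log-spectral mismatch model y = x + log(1 + exp(n - x)), which the abstract does not state. Where the patent learns spline parameters from training data, this sketch simply samples the assumed formula at evenly spaced knots. All names and the knot spacing are hypothetical.

```python
import math
from bisect import bisect_right

def exact_offset(u):
    """Numerically stable log(1 + exp(u)), where u = noise - clean."""
    return max(u, 0.0) + math.log1p(math.exp(-abs(u)))

# Hypothetical knots every 0.5 units on [-10, 10]; values sampled, not learned.
KNOTS = [-10.0 + 0.5 * i for i in range(41)]
VALUES = [exact_offset(k) for k in KNOTS]

def spline_offset(u):
    """Piecewise-linear approximation of exact_offset."""
    if u <= KNOTS[0]:
        return VALUES[0]                        # speech dominates: offset near 0
    if u >= KNOTS[-1]:
        return u + (VALUES[-1] - KNOTS[-1])     # noise dominates: offset near u
    i = bisect_right(KNOTS, u) - 1              # index of the segment containing u
    x0, x1 = KNOTS[i], KNOTS[i + 1]
    t = (u - x0) / (x1 - x0)
    return VALUES[i] + t * (VALUES[i + 1] - VALUES[i])

def predict_noisy(clean, noise):
    """Predict a noisy log-spectral feature from clean and noise features."""
    return clean + spline_offset(noise - clean)
```

Each segment is a straight line, so within a segment the noisy feature is a linear function of the clean and noise features, which is what makes this kind of approximation convenient for model adaptation.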
Utilizing Scalar Operations for Recognizing Utterances During Automatic Speech Recognition in Noisy Environments

Publication number: 20140067387

Abstract: Scalar operations for model adaptation or feature enhancement may be utilized for recognizing an utterance during automatic speech recognition in a noisy environment. An utterance including distorted speech generated from a transmission source for delivery to a receiver, may be received by a computer. The distorted speech may be caused by the noisy environment and channel distortion. Computations using scalar operations in the form of an algorithm may then be performed for recognizing the utterance. As a result of performing all of the computations with scalar operations, computational complexity is very small in comparison to matrix and vector operations. Vector Taylor Series with diagonal Jacobian approximation may also be utilized as a distortion-model-based noise robust algorithm with scalar operations.

Type: Application

Filed: September 5, 2012

Publication date: March 6, 2014

Applicant: MICROSOFT CORPORATION

Inventors: Jinyu Li, Michael Lewis Seltzer, Yifan Gong
FACTORED TRANSFORMS FOR SEPARABLE ADAPTATION OF ACOUSTIC MODELS

Publication number: 20130253930

Abstract: Various technologies described herein pertain to adapting a speech recognizer to input speech data. A first linear transform can be selected from a first set of linear transforms based on a value of a first variability source corresponding to the input speech data, and a second linear transform can be selected from a second set of linear transforms based on a value of a second variability source corresponding to the input speech data. The linear transforms in the first and second sets can compensate for the first variability source and the second variability source, respectively. Moreover, the first linear transform can be applied to the input speech data to generate intermediate transformed speech data, and the second linear transform can be applied to the intermediate transformed speech data to generate transformed speech data. Further, speech can be recognized based on the transformed speech data to obtain a result.

Type: Application

Filed: March 23, 2012

Publication date: September 26, 2013

Applicant: MICROSOFT CORPORATION

Inventors: Michael Lewis Seltzer, Alejandro Acero
Additional content based on intended travel destination

Patent number: 8473198

Abstract: When users travel to an intended destination, a plurality of information can be beneficial to assist their travel. If a person is traveling to a crowded event, then information about congested traffic areas can be provided. There can be financial opportunities available in relation to providing information concerning an intended destination. An advertiser can pay money to have information played about the advertiser when it relates to the intended destination. Furthermore, a user can pay money for detailed data concerning an intended location, such as where the cheapest parking is located.

Type: Grant

Filed: December 14, 2007

Date of Patent: June 25, 2013

Assignee: Microsoft Corporation

Inventors: John C. Krumm, Ruston Panabaker, Jeffrey D. Couckuyt, Ivan J. Tashev, Michael Lewis Seltzer, Neil W. Black
Adaptive beamformer using a log domain optimization criterion

Patent number: 8401206

Abstract: Described is an audio signal processing technology in which an adaptive beamformer processes input signals from microphones based on an estimate received from a pre-filter. The adaptive beamformer may compute its parameters (e.g., weights) for each frame based on the estimate, via a magnitude-domain objective function or log-magnitude-domain objective function. The pre-filter may include a time invariant beamformer and/or a non-linear spatial filter, and/or may include a spectral filter. The computed parameters may be adjusted based on a constraint, which may be selectively applied only at desired times.

Type: Grant

Filed: January 15, 2009

Date of Patent: March 19, 2013

Assignee: Microsoft Corporation

Inventors: Michael Lewis Seltzer, Ivan Jelev Tashev
Keystroke sound suppression

Patent number: 8213635

Abstract: An audio signal is received that might include keyboard noise and speech. The audio signal is digitized and transformed from a time domain to a frequency domain. The transformed audio is analyzed to determine whether there is a likelihood that keystroke noise is present. If it is determined there is a high likelihood that the audio signal contains keystroke noise, a determination is made as to whether a keyboard event occurred around the time of the likely keystroke noise. If it is determined that a keyboard event occurred around the time of the likely keystroke noise, a determination is made as to whether speech is present in the audio signal around the time of the likely keystroke noise. If no speech is present, the keystroke noise is suppressed in the audio signal. If speech is detected in the audio signal or if the keystroke noise abates, the suppression gain is removed from the audio signal.

Type: Grant

Filed: December 5, 2008

Date of Patent: July 3, 2012

Assignee: Microsoft Corporation

Inventors: Qin Li, Michael Lewis Seltzer, Chao He
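
The chain of decisions in the abstract above can be sketched as a per-frame gain function. The thresholds, the time window, the gain value, and the function name are hypothetical; the abstract does not specify them, and the likelihood and speech detectors are assumed to exist upstream.

```python
def suppression_gain(keystroke_likelihood, seconds_since_key_event, speech_present,
                     likelihood_threshold=0.8, event_window=0.1, suppressed_gain=0.1):
    """Return the gain to apply to the current audio frame.

    keystroke_likelihood: score in [0, 1] from frequency-domain analysis.
    seconds_since_key_event: time offset to the nearest keyboard event, or None.
    speech_present: output of a voice activity detector for this frame.
    """
    # Step 1: is keystroke noise likely in the transformed audio?
    keystroke_likely = keystroke_likelihood >= likelihood_threshold
    # Step 2: did a keyboard event occur around the same time?
    key_event_nearby = (seconds_since_key_event is not None
                        and abs(seconds_since_key_event) <= event_window)
    # Step 3: suppress only when no speech would be damaged.
    if keystroke_likely and key_event_nearby and not speech_present:
        return suppressed_gain
    # Speech detected or keystroke noise abated: remove the suppression gain.
    return 1.0
```

For example, `suppression_gain(0.9, 0.02, False)` attenuates the frame, while the same call with `speech_present=True` leaves it untouched.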
Pedestrian route production

Patent number: 8090532

Abstract: As a pedestrian travels, various difficulties can be encountered, such as traveling through an unsafe neighborhood or being in an open area that is subject to harsh temperatures. A route can be developed for a person taking into account factors that specifically affect a pedestrian. Moreover, the route can alter as a situation of a user changes; for instance, if a user wants to add a stop along a route.

Type: Grant

Filed: December 14, 2007

Date of Patent: January 3, 2012

Assignee: Microsoft Corporation

Inventors: Ivan J. Tashev, Jeffrey D. Couckuyt, Neil W. Black, John C. Krumm, Ruston Panabaker, Michael Lewis Seltzer
Conveying locations in spoken dialog systems

Patent number: 8065078

Abstract: The presentation of location information to a user that is distracted by traveling can result in the user quickly forgetting, or never even comprehending, key parts of the location information, such as the street number. Identification can be made of intersections and points of interest near the user's destination, which can then be provided instead of, or in addition to, the address, thereby increasing user comprehension and retention, especially when distracted. Map data can be parsed into addresses, intersections and points of interest databases. These databases can be accessed to identify proximate intersections and points of interest, which can then be filtered and subsequently ranked to identify one intersection, one point of interest, or both, that can be presented to the user to aid the user in comprehending and retaining the location information even when distracted.

Type: Grant

Filed: August 10, 2007

Date of Patent: November 22, 2011

Assignee: Microsoft Corporation

Inventors: Ivan Tashev, Michael Lewis Seltzer, Yun-Cheng Ju, Alex Acero
Route transfer between devices

Patent number: 8060297

Abstract: A user can intend to travel between different locations and employ different traveling manners to reach an intended travel destination. At different points, different devices can be employed for disclosing a route. For instance, as a user walks, a route can be integrated into a personal electronic device, such as a cellular telephone. An evaluation can determine that, due to specific route details (for example, detailed text), a particular device would be superior to another for presentation.

Type: Grant

Filed: December 14, 2007

Date of Patent: November 15, 2011

Assignee: Microsoft Corporation

Inventors: Jeffrey D. Couckuyt, Neil W. Black, John C. Krumm, Ruston Panabaker, Ivan J. Tashev, Michael Lewis Seltzer
Acoustic Model Adaptation Using Splines

Publication number: 20110238416

Abstract: Described is a technology by which a speech recognizer is adapted to perform in noisy environments using linear spline interpolation to approximate the nonlinear relationship between clean speech, noise, and noisy speech. Linear spline parameters that minimize the error between predicted noisy features and actual noisy features are learned from training data, along with variance data that reflect regression errors. Also described is compensating for linear channel distortion and updating noise and channel parameters during speech recognition decoding.

Type: Application

Filed: March 24, 2010

Publication date: September 29, 2011

Applicant: MICROSOFT CORPORATION

Inventors: Michael Lewis Seltzer, Kaustubh Prakash Kalgaonkar, Alejandro Acero
NOISE ADAPTIVE TRAINING FOR SPEECH RECOGNITION

Publication number: 20100318354

Abstract: Technologies are described herein for noise adaptive training to achieve robust automatic speech recognition. Through the use of these technologies, a noise adaptive training (NAT) approach may use both clean and corrupted speech for training. The NAT approach may normalize the environmental distortion as part of the model training. A set of underlying “pseudo-clean” model parameters may be estimated directly. This may be done without point estimation of clean speech features as an intermediate step. The pseudo-clean model parameters learned from the NAT technique may be used with a Vector Taylor Series (VTS) adaptation. Such adaptation may support decoding noisy utterances during the operating phase of an automatic voice recognition system.

Type: Application

Filed: June 12, 2009

Publication date: December 16, 2010

Applicant: Microsoft Corporation

Inventors: Michael Lewis Seltzer, James Garnet Droppo, Ozlem Kalinli, Alejandro Acero
ADAPTIVE BEAMFORMER USING A LOG DOMAIN OPTIMIZATION CRITERION

Publication number: 20100177908

Abstract: Described is an audio signal processing technology in which an adaptive beamformer processes input signals from microphones based on an estimate received from a pre-filter. The adaptive beamformer may compute its parameters (e.g., weights) for each frame based on the estimate, via a magnitude-domain objective function or log-magnitude-domain objective function. The pre-filter may include a time invariant beamformer and/or a non-linear spatial filter, and/or may include a spectral filter. The computed parameters may be adjusted based on a constraint, which may be selectively applied only at desired times.

Type: Application

Filed: January 15, 2009

Publication date: July 15, 2010

Applicant: Microsoft Corporation

Inventors: Michael Lewis Seltzer, Ivan Jelev Tashev
KEYSTROKE SOUND SUPPRESSION

Publication number: 20100145689

Abstract: An audio signal is received that might include keyboard noise and speech. The audio signal is digitized and transformed from a time domain to a frequency domain. The transformed audio is analyzed to determine whether there is a likelihood that keystroke noise is present. If it is determined there is a high likelihood that the audio signal contains keystroke noise, a determination is made as to whether a keyboard event occurred around the time of the likely keystroke noise. If it is determined that a keyboard event occurred around the time of the likely keystroke noise, a determination is made as to whether speech is present in the audio signal around the time of the likely keystroke noise. If no speech is present, the keystroke noise is suppressed in the audio signal. If speech is detected in the audio signal or if the keystroke noise abates, the suppression gain is removed from the audio signal.

Type: Application

Filed: December 5, 2008

Publication date: June 10, 2010

Applicant: Microsoft Corporation

Inventors: Qin Li, Michael Lewis Seltzer, Chao He
