Patents by Inventor Michael Lewis Seltzer
Michael Lewis Seltzer has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11646017
Abstract: In one embodiment, a method includes accessing a machine-learning model configured to generate an encoding for an utterance by using a module to process data associated with each segment of the utterance in a series of iterations, performing operations associated with an i-th segment during an n-th iteration by the module, which include receiving an input comprising input contextual embeddings generated for the i-th segment in a preceding iteration and a memory bank storing memory vectors generated in the preceding iteration for segments preceding the i-th segment, generating attention outputs and a memory vector based on keys, values, and queries generated using the input, and generating output contextual embeddings for the i-th segment based on the attention outputs, providing the memory vector to the module for performing operations associated with the i-th segment in a next iteration, and performing speech recognition by decoding the encoding of the utterance.
Type: Grant
Filed: March 5, 2021
Date of Patent: May 9, 2023
Assignee: Meta Platforms, Inc.
Inventors: Yangyang Shi, Yongqiang Wang, Chunyang Wu, Ching-Feng Yeh, Julian Yui-Hin Chan, Qiaochu Zhang, Duc Hoang Le, Michael Lewis Seltzer
-
Patent number: 10885900
Abstract: Improvements in speech recognition in a new domain are provided via the student/teacher training of models for different speech domains. A student model for a new domain is created based on the teacher model trained in an existing domain. The student model is trained in parallel to the operation of the teacher model, with inputs in the new and existing domains respectively, to develop a neural network that is adapted to recognize speech in the new domain. The data in the new domain may exclude transcription labels but rather are parallelized with the data analyzed in the existing domain analyzed by the teacher model. The outputs from the teacher model are compared with the outputs of the student model and the differences are used to adjust the parameters of the student model to better recognize speech in the second domain.
Type: Grant
Filed: August 11, 2017
Date of Patent: January 5, 2021
Assignee: Microsoft Technology Licensing, LLC
Inventors: Jinyu Li, Michael Lewis Seltzer, Xi Wang, Rui Zhao, Yifan Gong
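The student/teacher idea in the abstract above can be illustrated with a minimal sketch. It is not the patented method: it assumes a single-layer softmax classifier in place of a neural network, one parallel pair of feature vectors, and a KL-divergence comparison of the two models' outputs. The names `train_student`, `softmax`, `linear`, and `kl_divergence` are hypothetical.

```python
import math


def softmax(logits):
    """Convert logits to a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Multiply a weight matrix (list of rows) by a feature vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def kl_divergence(p, q):
    """KL(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def train_student(teacher_w, x_existing, x_new, steps=200, lr=0.5):
    """Adapt a student to new-domain input using only teacher outputs.

    x_existing and x_new are assumed to be a parallel pair: the same
    utterance frame observed in the existing and the new domain.
    No transcription label is used anywhere.
    """
    # Assumption: the student starts as a copy of the teacher.
    student_w = [row[:] for row in teacher_w]
    teacher_post = softmax(linear(teacher_w, x_existing))
    losses = []
    for _ in range(steps):
        student_post = softmax(linear(student_w, x_new))
        losses.append(kl_divergence(teacher_post, student_post))
        # Gradient of KL(teacher || student) w.r.t. the student logits.
        diff = [s - t for s, t in zip(student_post, teacher_post)]
        student_w = [
            [w - lr * d * xi for w, xi in zip(row, x_new)]
            for row, d in zip(student_w, diff)
        ]
    return student_w, losses
```

In this sketch the only training signal is the difference between the teacher's and the student's output distributions, which mirrors the abstract's statement that the new-domain data may exclude transcription labels.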
-
Publication number: 20190051290
Abstract: Improvements in speech recognition in a new domain are provided via the student/teacher training of models for different speech domains. A student model for a new domain is created based on the teacher model trained in an existing domain. The student model is trained in parallel to the operation of the teacher model, with inputs in the new and existing domains respectively, to develop a neural network that is adapted to recognize speech in the new domain. The data in the new domain may exclude transcription labels but rather are parallelized with the data analyzed in the existing domain analyzed by the teacher model. The outputs from the teacher model are compared with the outputs of the student model and the differences are used to adjust the parameters of the student model to better recognize speech in the second domain.
Type: Application
Filed: August 11, 2017
Publication date: February 14, 2019
Applicant: Microsoft Technology Licensing, LLC
Inventors: Jinyu Li, Michael Lewis Seltzer, Xi Wang, Rui Zhao, Yifan Gong
-
Patent number: 9984678
Abstract: Various technologies described herein pertain to adapting a speech recognizer to input speech data. A first linear transform can be selected from a first set of linear transforms based on a value of a first variability source corresponding to the input speech data, and a second linear transform can be selected from a second set of linear transforms based on a value of a second variability source corresponding to the input speech data. The linear transforms in the first and second sets can compensate for the first variability source and the second variability source, respectively. Moreover, the first linear transform can be applied to the input speech data to generate intermediate transformed speech data, and the second linear transform can be applied to the intermediate transformed speech data to generate transformed speech data. Further, speech can be recognized based on the transformed speech data to obtain a result.
Type: Grant
Filed: March 23, 2012
Date of Patent: May 29, 2018
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Michael Lewis Seltzer, Alejandro Acero
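The two-stage adaptation described in the abstract above can be sketched as follows. This is an illustration under assumptions, not the patented implementation: each transform is taken to be a plain matrix, each set of transforms is a dictionary keyed by the value of its variability source, and the example sources ("environment" and "speaker") and the names `apply_transform` and `adapt_features` are hypothetical.

```python
def apply_transform(matrix, frame):
    """Apply a linear transform (list of rows) to one feature frame."""
    return [sum(m * x for m, x in zip(row, frame)) for row in matrix]


def adapt_features(frames, first_set, first_value, second_set, second_value):
    """Select one transform per variability source and apply them in cascade."""
    first = first_set[first_value]      # e.g. chosen by environment
    second = second_set[second_value]   # e.g. chosen by speaker
    intermediate = [apply_transform(first, f) for f in frames]
    return [apply_transform(second, f) for f in intermediate]


# Hypothetical transform sets; real ones would be estimated from data.
environment_transforms = {"car": [[2.0, 0.0], [0.0, 2.0]]}
speaker_transforms = {"spk1": [[1.0, 1.0], [0.0, 1.0]]}

adapted = adapt_features(
    [[1.0, 2.0]], environment_transforms, "car", speaker_transforms, "spk1"
)
```

The adapted frames would then be passed to the recognizer. Keeping one set of transforms per variability source means the sets can be combined freely rather than estimating a separate transform for every environment and speaker pairing.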
-
Patent number: 9009039
Abstract: Technologies are described herein for noise adaptive training to achieve robust automatic speech recognition. Through the use of these technologies, a noise adaptive training (NAT) approach may use both clean and corrupted speech for training. The NAT approach may normalize the environmental distortion as part of the model training. A set of underlying “pseudo-clean” model parameters may be estimated directly. This may be done without point estimation of clean speech features as an intermediate step. The pseudo-clean model parameters learned from the NAT technique may be used with a Vector Taylor Series (VTS) adaptation. Such adaptation may support decoding noisy utterances during the operating phase of an automatic voice recognition system.
Type: Grant
Filed: June 12, 2009
Date of Patent: April 14, 2015
Assignee: Microsoft Technology Licensing, LLC
Inventors: Michael Lewis Seltzer, James Garnet Droppo, Ozlem Kalinli, Alejandro Acero
-
Patent number: 8793066
Abstract: A user can be compensated for taking detours from a projected route. Commonly, the reason for the compensation is that the user will be subjected to advertising, the user will pass by an establishment she is likely to visit, or to ease traffic congestion. Analysis of an area takes place and monetization opportunities are determined based upon the results of the analysis. A route between at least about two locations can be altered such that the user is provided a reward, commonly in an optimized manner.
Type: Grant
Filed: December 14, 2007
Date of Patent: July 29, 2014
Assignee: Microsoft Corporation
Inventors: Ruston Panabaker, John C. Krumm, Jeffrey D. Couckuyt, Ivan J. Tashev, Michael Lewis Seltzer, Neil W. Black
-
Patent number: 8793065
Abstract: Oftentimes individuals have a number of objectives to complete while traveling in a vehicle. The objectives can be arranged automatically and an associated route can be produced such that the objectives can be completed in an effective manner. Data related to the objectives can be collected, such as a traffic pattern on paths near a location where the objective is to take place. Locations for the objectives to be completed can be determined automatically as well as provided by a user. Analysis of the collected data can take place and based on a result of the analysis, an efficient route is produced.
Type: Grant
Filed: February 19, 2008
Date of Patent: July 29, 2014
Assignee: Microsoft Corporation
Inventors: Michael Lewis Seltzer, Neil W. Black, Jeffrey D. Couckuyt, Ivan J. Tashev, John C. Krumm, Ruston Panabaker
-
Patent number: 8700394
Abstract: Described is a technology by which a speech recognizer is adapted to perform in noisy environments using linear spline interpolation to approximate the nonlinear relationship between clean speech, noise, and noisy speech. Linear spline parameters that minimize the error between predicted noisy features and actual noisy features are learned from training data, along with variance data that reflect regression errors. Also described is compensating for linear channel distortion and updating noise and channel parameters during speech recognition decoding.
Type: Grant
Filed: March 24, 2010
Date of Patent: April 15, 2014
Assignee: Microsoft Corporation
Inventors: Michael Lewis Seltzer, Kaustubh Prakash Kalgaonkar, Alejandro Acero
-
Publication number: 20140067387
Abstract: Scalar operations for model adaptation or feature enhancement may be utilized for recognizing an utterance during automatic speech recognition in a noisy environment. An utterance including distorted speech generated from a transmission source for delivery to a receiver may be received by a computer. The distorted speech may be caused by the noisy environment and channel distortion. Computations using scalar operations in the form of an algorithm may then be performed for recognizing the utterance. As a result of performing all of the computations with scalar operations, computational complexity is very small in comparison to matrix and vector operations. Vector Taylor Series with diagonal Jacobian approximation may also be utilized as a distortion-model-based noise robust algorithm with scalar operations.
Type: Application
Filed: September 5, 2012
Publication date: March 6, 2014
Applicant: MICROSOFT CORPORATION
Inventors: Jinyu Li, Michael Lewis Seltzer, Yifan Gong
-
Publication number: 20130253930
Abstract: Various technologies described herein pertain to adapting a speech recognizer to input speech data. A first linear transform can be selected from a first set of linear transforms based on a value of a first variability source corresponding to the input speech data, and a second linear transform can be selected from a second set of linear transforms based on a value of a second variability source corresponding to the input speech data. The linear transforms in the first and second sets can compensate for the first variability source and the second variability source, respectively. Moreover, the first linear transform can be applied to the input speech data to generate intermediate transformed speech data, and the second linear transform can be applied to the intermediate transformed speech data to generate transformed speech data. Further, speech can be recognized based on the transformed speech data to obtain a result.
Type: Application
Filed: March 23, 2012
Publication date: September 26, 2013
Applicant: MICROSOFT CORPORATION
Inventors: Michael Lewis Seltzer, Alejandro Acero
-
Patent number: 8473198
Abstract: When users travel to an intended destination, a plurality of information can be beneficial to assist their travel. If a person is traveling to a crowded event, then information can be provided such that congested traffic areas can be provided. There can be financial opportunities available in relation to providing information concerning an intended destination. An advertiser can pay money to have information played about the advertiser when it relates to the intended destination. Furthermore, a user can pay money for detailed data concerning an intended location, such as where cheapest parking is located.
Type: Grant
Filed: December 14, 2007
Date of Patent: June 25, 2013
Assignee: Microsoft Corporation
Inventors: John C. Krumm, Ruston Panabaker, Jeffrey D. Couckuyt, Ivan J. Tashev, Michael Lewis Seltzer, Neil W. Black
-
Patent number: 8401206
Abstract: Described is an audio signal processing technology in which an adaptive beamformer processes input signals from microphones based on an estimate received from a pre-filter. The adaptive beamformer may compute its parameters (e.g., weights) for each frame based on the estimate, via a magnitude-domain objective function or log-magnitude-domain objective function. The pre-filter may include a time invariant beamformer and/or a non-linear spatial filter, and/or may include a spectral filter. The computed parameters may be adjusted based on a constraint, which may be selectively applied only at desired times.
Type: Grant
Filed: January 15, 2009
Date of Patent: March 19, 2013
Assignee: Microsoft Corporation
Inventors: Michael Lewis Seltzer, Ivan Jelev Tashev
-
Patent number: 8213635
Abstract: An audio signal is received that might include keyboard noise and speech. The audio signal is digitized and transformed from a time domain to a frequency domain. The transformed audio is analyzed to determine whether there is likelihood that keystroke noise is present. If it is determined there is high likelihood that the audio signal contains keystroke noise, a determination is made as to whether a keyboard event occurred around the time of the likely keystroke noise. If it is determined that a keyboard event occurred around the time of the likely keystroke noise, a determination is made as to whether speech is present in the audio signal around the time of the likely keystroke noise. If no speech is present, the keystroke noise is suppressed in the audio signal. If speech is detected in the audio signal or if the keystroke noise abates, the suppression gain is removed from the audio signal.
Type: Grant
Filed: December 5, 2008
Date of Patent: July 3, 2012
Assignee: Microsoft Corporation
Inventors: Qin Li, Michael Lewis Seltzer, Chao He
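The chain of checks in the abstract above can be sketched as a per-frame decision function. This is only an illustration: the likelihood threshold, the suppression gain value, and the names `suppression_gain` and `process_frame` are assumptions, and the detectors that would produce the likelihood, keyboard-event, and speech flags are not shown.

```python
def suppression_gain(keystroke_likelihood, keyboard_event_nearby, speech_present,
                     likelihood_threshold=0.8, suppressed_gain=0.1):
    """Return the gain to apply to one audio frame.

    A gain of 1.0 leaves the frame untouched. The threshold and the
    suppressed gain are hypothetical values chosen for illustration.
    """
    # Step 1: is keystroke noise likely in this frame?
    if keystroke_likelihood < likelihood_threshold:
        return 1.0
    # Step 2: did the keyboard report an event around this time?
    if not keyboard_event_nearby:
        return 1.0
    # Step 3: never suppress while speech is present.
    if speech_present:
        return 1.0
    return suppressed_gain


def process_frame(samples, keystroke_likelihood, keyboard_event_nearby, speech_present):
    """Scale one frame of samples by the decided gain."""
    gain = suppression_gain(keystroke_likelihood, keyboard_event_nearby, speech_present)
    return [gain * s for s in samples]
```

Because the gain is recomputed for every frame, it returns to 1.0 as soon as speech is detected or the keystroke likelihood drops, which corresponds to the abstract's removal of the suppression gain.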
-
Patent number: 8090532
Abstract: As a pedestrian travels, various difficulties can be encountered, such as traveling through an unsafe neighborhood or being in an open area that is subject to harsh temperatures. A route can be developed for a person taking into account factors that specifically affect a pedestrian. Moreover, the route can alter as a situation of a user changes; for instance, if a user wants to add a stop along a route.
Type: Grant
Filed: December 14, 2007
Date of Patent: January 3, 2012
Assignee: Microsoft Corporation
Inventors: Ivan J. Tashev, Jeffrey D. Couckuyt, Neil W. Black, John C. Krumm, Ruston Panabaker, Michael Lewis Seltzer
-
Patent number: 8065078
Abstract: The presentation of location information to a user that is distracted by traveling can result in the user quickly forgetting, or never even comprehending, key parts of the location information, such as the street number. Identification can be made of intersections and points of interest near the user's destination, which can then be provided instead of, or in addition to, the address, thereby increasing user comprehension and retention, especially when distracted. Map data can be parsed into addresses, intersections and points of interest databases. These databases can be accessed to identify proximate intersections and points of interest, which can then be filtered and subsequently ranked to identify one intersection, one point of interest, or both, that can be presented to the user to aid the user in comprehending and retaining the location information even when distracted.
Type: Grant
Filed: August 10, 2007
Date of Patent: November 22, 2011
Assignee: Microsoft Corporation
Inventors: Ivan Tashev, Michael Lewis Seltzer, Yun-Cheng Ju, Alex Acero
-
Patent number: 8060297
Abstract: A user can intend to travel between different locations and employ different traveling manners to reach an intended travel destination. At different points, different devices can be employed for disclosing a route. For instance, as a user walks, a route can be integrated into a personal electronic device, such as a cellular telephone. An evaluation can take place that due to specific route details, for example detailed text, a particular device would be superior for presentment over another.
Type: Grant
Filed: December 14, 2007
Date of Patent: November 15, 2011
Assignee: Microsoft Corporation
Inventors: Jeffrey D. Couckuyt, Neil W. Black, John C. Krumm, Ruston Panabaker, Ivan J. Tashev, Michael Lewis Seltzer
-
Publication number: 20110238416
Abstract: Described is a technology by which a speech recognizer is adapted to perform in noisy environments using linear spline interpolation to approximate the nonlinear relationship between clean speech, noise, and noisy speech. Linear spline parameters that minimize the error between predicted noisy features and actual noisy features are learned from training data, along with variance data that reflect regression errors. Also described is compensating for linear channel distortion and updating noise and channel parameters during speech recognition decoding.
Type: Application
Filed: March 24, 2010
Publication date: September 29, 2011
Applicant: MICROSOFT CORPORATION
Inventors: Michael Lewis Seltzer, Kaustubh Prakash Kalgaonkar, Alejandro Acero
-
Publication number: 20100318354
Abstract: Technologies are described herein for noise adaptive training to achieve robust automatic speech recognition. Through the use of these technologies, a noise adaptive training (NAT) approach may use both clean and corrupted speech for training. The NAT approach may normalize the environmental distortion as part of the model training. A set of underlying “pseudo-clean” model parameters may be estimated directly. This may be done without point estimation of clean speech features as an intermediate step. The pseudo-clean model parameters learned from the NAT technique may be used with a Vector Taylor Series (VTS) adaptation. Such adaptation may support decoding noisy utterances during the operating phase of an automatic voice recognition system.
Type: Application
Filed: June 12, 2009
Publication date: December 16, 2010
Applicant: Microsoft Corporation
Inventors: Michael Lewis Seltzer, James Garnet Droppo, Ozlem Kalinli, Alejandro Acero
-
Publication number: 20100177908
Abstract: Described is an audio signal processing technology in which an adaptive beamformer processes input signals from microphones based on an estimate received from a pre-filter. The adaptive beamformer may compute its parameters (e.g., weights) for each frame based on the estimate, via a magnitude-domain objective function or log-magnitude-domain objective function. The pre-filter may include a time invariant beamformer and/or a non-linear spatial filter, and/or may include a spectral filter. The computed parameters may be adjusted based on a constraint, which may be selectively applied only at desired times.
Type: Application
Filed: January 15, 2009
Publication date: July 15, 2010
Applicant: Microsoft Corporation
Inventors: Michael Lewis Seltzer, Ivan Jelev Tashev
-
Publication number: 20100145689
Abstract: An audio signal is received that might include keyboard noise and speech. The audio signal is digitized and transformed from a time domain to a frequency domain. The transformed audio is analyzed to determine whether there is likelihood that keystroke noise is present. If it is determined there is high likelihood that the audio signal contains keystroke noise, a determination is made as to whether a keyboard event occurred around the time of the likely keystroke noise. If it is determined that a keyboard event occurred around the time of the likely keystroke noise, a determination is made as to whether speech is present in the audio signal around the time of the likely keystroke noise. If no speech is present, the keystroke noise is suppressed in the audio signal. If speech is detected in the audio signal or if the keystroke noise abates, the suppression gain is removed from the audio signal.
Type: Application
Filed: December 5, 2008
Publication date: June 10, 2010
Applicant: Microsoft Corporation
Inventors: Qin Li, Michael Lewis Seltzer, Chao He