Patents by Inventor Michael Levit
Michael Levit has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11676576
Abstract: Systems and methods are provided for acquiring training data and building an organizational-based language model based on the training data. Organizational data is generated via one or more applications associated with an organization, and the collected organizational data is aggregated and filtered into training data that is used for training an organizational-based language model for speech processing.
Type: Grant
Filed: August 11, 2021
Date of Patent: June 13, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventors: Ziad Al Bawab, Anand U Desai, Cem Aksoylar, Michael Levit, Xin Meng, Shuangyu Chang, Suyash Choudhury, Dhiresh Rawal, Tao Li, Rishi Girish, Marcus Jager, Ananth Rampura Sheshagiri Rao
-
Publication number: 20220392432
Abstract: Systems and methods for speech recognition correction include receiving a voice recognition input from an individual user and using a trained error correction model to add a new alternative result to a results list based on the received voice input processed by a voice recognition system. The error correction model is trained using contextual information corresponding to the individual user. The contextual information comprises a plurality of historical user correction logs, a plurality of personal class definitions, and an application context. A re-ranker re-ranks the results list with the new alternative result and a top result from the re-ranked results list is output.
Type: Application
Filed: June 8, 2021
Publication date: December 8, 2022
Inventors: Issac ALPHONSO, Tasos ANASTASAKOS, Michael LEVIT, Nitin AGARWAL
-
Publication number: 20220382973
Abstract: A computer implemented method includes receiving a natural language utterance, generating multiple alternative N-gram contexts for evaluating a next word in the natural language utterance, selecting N-gram context candidates from the multiple alternative N-gram contexts comprising different sets of N-1 words in the natural language utterance for selecting a next word in the natural language utterance, and providing the N-gram context candidates for creating a transcript of the natural language utterance.
Type: Application
Filed: May 28, 2021
Publication date: December 1, 2022
Inventors: Michael LEVIT, Cem Aksoylar
-
Publication number: 20220013109
Abstract: Provided is a system and method for acquiring training data and building an organizational-based language model based on the training data. In one example, the method may include collecting organizational data that is generated via one or more applications associated with an organization, aggregating the collected organizational data with previously collected organizational data to generate aggregated organizational training data, training an organizational-based language model for speech processing based on the aggregated organizational training data, and storing the trained organizational-based language model.
Type: Application
Filed: August 11, 2021
Publication date: January 13, 2022
Inventors: Ziad AL BAWAB, Anand U. DESAI, Cem AKSOYLAR, Michael LEVIT, Xin MENG, Shuangyu CHANG, Suyash CHOUDHURY, Dhiresh RAWAL, Tao LI, Rishi GIRISH, Marcus JAGER, Ananth Rampura SHESHAGIRI RAO
-
Patent number: 11120788
Abstract: Provided is a system and method for acquiring training data and building an organizational-based language model based on the training data. In one example, the method may include collecting organizational data that is generated via one or more applications associated with an organization, aggregating the collected organizational data with previously collected organizational data to generate aggregated organizational training data, training an organizational-based language model for speech processing based on the aggregated organizational training data, and storing the trained organizational-based language model.
Type: Grant
Filed: June 27, 2019
Date of Patent: September 14, 2021
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Ziad Al Bawab, Anand U Desai, Cem Aksoylar, Michael Levit, Xin Meng, Shuangyu Chang, Suyash Choudhury, Dhiresh Rawal, Tao Li, Rishi Girish, Marcus Jager, Ananth Rampura Sheshagiri Rao
-
Publication number: 20200349920
Abstract: Provided is a system and method for acquiring training data and building an organizational-based language model based on the training data. In one example, the method may include collecting organizational data that is generated via one or more applications associated with an organization, aggregating the collected organizational data with previously collected organizational data to generate aggregated organizational training data, training an organizational-based language model for speech processing based on the aggregated organizational training data, and storing the trained organizational-based language model.
Type: Application
Filed: June 27, 2019
Publication date: November 5, 2020
Inventors: Ziad AL BAWAB, Anand U DESAI, Cem AKSOYLAR, Michael LEVIT, Xin MENG, Shuangyu CHANG, Suyash CHOUDHURY, Dhiresh RAWAL, Tao LI, Rishi GIRISH, Marcus JAGER, Ananth Rampura SHESHAGIRI RAO
-
Patent number: 10497367
Abstract: The customization of language modeling components for speech recognition is provided. A list of language modeling components may be made available by a computing device. A hint may then be sent to a recognition service provider for combining the multiple language modeling components from the list. The hint may be based on a number of different domains. A customized combination of the language modeling components based on the hint may then be received from the recognition service provider.
Type: Grant
Filed: December 22, 2016
Date of Patent: December 3, 2019
Assignee: Microsoft Technology Licensing, LLC
Inventors: Michael Levit, Hernan Guelman, Shuangyu Chang, Sarangarajan Parthasarathy, Benoit Dumoulin
-
Patent number: 10192545
Abstract: A computer system for language modeling may collect training data from one or more information sources, generate a spoken corpus containing text of transcribed speech, and generate a typed corpus containing typed text. The computer system may derive feature vectors from the spoken corpus, analyze the typed corpus to determine feature vectors representing items of typed text, and generate an unspeakable corpus by filtering the typed corpus to remove each item of typed text represented by a feature vector that is within a similarity threshold of a feature vector derived from the spoken corpus. The computer system may derive feature vectors from the unspeakable corpus and train a classifier to perform discriminative data selection for language modeling based on the feature vectors derived from the spoken corpus and the feature vectors derived from the unspeakable corpus.
Type: Grant
Filed: June 5, 2017
Date of Patent: January 29, 2019
Assignee: Microsoft Technology Licensing, LLC
Inventors: Michael Levit, Shuangyu Chang, Benoit Dumoulin
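The filtering step in the abstract above (dropping typed items whose feature vector falls within a similarity threshold of a spoken-corpus vector) can be illustrated with a minimal sketch. The bag-of-words features, the cosine measure, the 0.5 threshold, and the names `featurize`, `cosine`, and `unspeakable_corpus` are all assumptions made for illustration; the abstract does not specify any of them.

```python
import math
from collections import Counter


def featurize(text):
    """Hypothetical feature extractor: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def unspeakable_corpus(typed, spoken, threshold=0.5):
    """Keep only typed items that are not similar to any spoken item."""
    spoken_vecs = [featurize(s) for s in spoken]
    kept = []
    for item in typed:
        vec = featurize(item)
        # An item survives only if it is below the threshold for every spoken vector.
        if all(cosine(vec, sv) < threshold for sv in spoken_vecs):
            kept.append(item)
    return kept


spoken = ["call my mom", "what is the weather today"]
typed = ["call my mom now", "http www example com login", "weather today"]
result = unspeakable_corpus(typed, spoken)
print(result)
```

The surviving items would then serve as the negative class when training the classifier that the abstract mentions; that training step is not shown here.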
-
Patent number: 9972311
Abstract: Systems and methods are provided for optimizing language models for in-domain applications through an iterative, joint-modeling approach that expresses training material as alternative representations of higher-level tokens, such as named entities and carrier phrases. From a first language model, an in-domain training corpus may be represented as a set of alternative parses of tokens. Statistical information determined from these parsed representations may be used to produce a second (or updated) language model, which is further optimized for the domain. The second language model may be used to determine another alternative parsed representation of the corpus for a next iteration, and the statistical information determined from this representation may be used to produce a third (or further updated) language model. Through each iteration, a language model may be determined that is further optimized for the domain.
Type: Grant
Filed: May 7, 2014
Date of Patent: May 15, 2018
Assignee: Microsoft Technology Licensing, LLC
Inventors: Michael Levit, Sarangarajan Parthasarathy, Andreas Stolcke
-
Patent number: 9922654
Abstract: An incremental speech recognition system. The incremental speech recognition system incrementally decodes a spoken utterance using an additional utterance decoder only when the additional utterance decoder is likely to add significant benefit to the combined result. The available utterance decoders are ordered in a series based on accuracy, performance, diversity, and other factors. A recognition management engine coordinates decoding of the spoken utterance by the series of utterance decoders, combines the decoded utterances, and determines whether additional processing is likely to significantly improve the recognition result. If so, the recognition management engine engages the next utterance decoder and the cycle continues. If the accuracy cannot be significantly improved, the result is accepted and decoding stops.
Type: Grant
Filed: December 13, 2016
Date of Patent: March 20, 2018
Assignee: Microsoft Technology Licensing, LLC
Inventors: Shuangyu Chang, Michael Levit, Abhik Lahiri, Barlas Oguz, Benoit Dumoulin
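The control loop described in the abstract above (run decoders in order, combine results, stop once further decoding is unlikely to help) might look roughly like the sketch below. Everything specific is an assumption: the decoders are stub functions returning a `(text, confidence)` pair, the combination rule simply keeps the most confident hypothesis, and the stopping rule is a fixed `good_enough` confidence threshold standing in for whatever benefit estimate the patented system uses.

```python
def incremental_recognize(audio, decoders, good_enough=0.9):
    """Run decoders in order and stop once the combined result is confident enough.

    Returns (best_text, best_confidence, number_of_decoders_used).
    """
    best_text, best_conf = None, 0.0
    used = 0
    for decode in decoders:
        text, conf = decode(audio)
        used += 1
        # Combination rule (assumed): keep the single most confident hypothesis.
        if conf > best_conf:
            best_text, best_conf = text, conf
        # Stopping rule (assumed): accept the result once it is confident enough.
        if best_conf >= good_enough:
            break
    return best_text, best_conf, used


# Hypothetical stub decoders, ordered from cheapest to most expensive.
def fast_decoder(audio):
    return ("play some jazz", 0.6)


def accurate_decoder(audio):
    return ("play some jazz", 0.95)


def expensive_decoder(audio):
    raise RuntimeError("should not be reached in this example")


result = incremental_recognize(
    b"raw-audio", [fast_decoder, accurate_decoder, expensive_decoder]
)
print(result)
```

In this example the third decoder is never invoked, which is the point of the incremental design: the cost of an additional decoder is paid only when the earlier ones leave the result uncertain.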
-
Publication number: 20170270912
Abstract: A computer system for language modeling may collect training data from one or more information sources, generate a spoken corpus containing text of transcribed speech, and generate a typed corpus containing typed text. The computer system may derive feature vectors from the spoken corpus, analyze the typed corpus to determine feature vectors representing items of typed text, and generate an unspeakable corpus by filtering the typed corpus to remove each item of typed text represented by a feature vector that is within a similarity threshold of a feature vector derived from the spoken corpus. The computer system may derive feature vectors from the unspeakable corpus and train a classifier to perform discriminative data selection for language modeling based on the feature vectors derived from the spoken corpus and the feature vectors derived from the unspeakable corpus.
Type: Application
Filed: June 5, 2017
Publication date: September 21, 2017
Applicant: Microsoft Technology Licensing, LLC
Inventors: Michael Levit, Shuangyu Chang, Benoit Dumoulin
-
Patent number: 9761220
Abstract: A computer system for language modeling may collect training data from one or more information sources, generate a spoken corpus containing text of transcribed speech, and generate a typed corpus containing typed text. The computer system may derive feature vectors from the spoken corpus, analyze the typed corpus to determine feature vectors representing items of typed text, and generate an unspeakable corpus by filtering the typed corpus to remove each item of typed text represented by a feature vector that is within a similarity threshold of a feature vector derived from the spoken corpus. The computer system may derive feature vectors from the unspeakable corpus and train a classifier to perform discriminative data selection for language modeling based on the feature vectors derived from the spoken corpus and the feature vectors derived from the unspeakable corpus.
Type: Grant
Filed: May 13, 2015
Date of Patent: September 12, 2017
Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
Inventors: Michael Levit, Shuangyu Chang, Benoit Dumoulin
-
Patent number: 9734826
Abstract: Optimized language models are provided for in-domain applications through an iterative, joint-modeling approach that interpolates a language model (LM) from a number of component LMs according to interpolation weights optimized for a target domain. The component LMs may include class-based LMs, and the interpolation may be context-specific or context-independent. Through iterative processes, the component LMs may be interpolated and used to express training material as alternative representations or parses of tokens. Posterior probabilities may be determined for these parses and used for determining new (or updated) interpolation weights for the LM components, such that a combination or interpolation of component LMs is further optimized for the domain. The component LMs may be merged, according to the optimized weights, into a single, combined LM, for deployment in an application scenario.
Type: Grant
Filed: March 11, 2015
Date of Patent: August 15, 2017
Assignee: Microsoft Technology Licensing, LLC
Inventors: Michael Levit, Sarangarajan Parthasarathy, Andreas Stolcke, Shuangyu Chang
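As a simplified illustration of re-estimating interpolation weights from posterior probabilities, the sketch below runs the standard expectation-maximization update for context-independent linear interpolation. It is not the patented method: it computes posteriors over components per token rather than over alternative parses, and the function name `em_interpolation_weights` and the toy probabilities are made up for the example.

```python
def em_interpolation_weights(component_probs, iterations=50):
    """Estimate linear interpolation weights with EM.

    component_probs[i][k] is the probability that component LM k assigns to
    the i-th token of in-domain held-out text (given its history).
    """
    n_components = len(component_probs[0])
    weights = [1.0 / n_components] * n_components  # start from uniform weights
    for _ in range(iterations):
        totals = [0.0] * n_components
        for probs in component_probs:
            # Mixture probability of this token under the current weights.
            mix = sum(w * p for w, p in zip(weights, probs))
            # E-step: posterior responsibility of each component for the token.
            for k, (w, p) in enumerate(zip(weights, probs)):
                totals[k] += w * p / mix
        # M-step: new weight is the average responsibility over all tokens.
        weights = [t / len(component_probs) for t in totals]
    return weights


# Toy data (made up): component 0 fits the in-domain tokens better than component 1.
held_out = [(0.2, 0.01), (0.3, 0.02), (0.1, 0.05), (0.25, 0.01)]
weights = em_interpolation_weights(held_out)
print(weights)
```

Because the responsibilities for each token sum to one, the updated weights always sum to one, so they can be used directly to merge the components into a single interpolated model.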
-
Patent number: 9672202
Abstract: Various components provide options to re-format an input based on one or more contexts. The input is received that has been submitted to an application (e.g., messaging application, mobile application, word-processing application, web browser, search tool, etc.), and one or more outputs are identified that are possibilities to be provided as options for re-formatting. A respective score of each output is determined by applying a statistical model to a respective combination of the input and each output, the respective score comprising a plurality of context scores that quantify a plurality of contexts of the respective combination. Exemplary contexts include historical-user contexts, domain contexts, and general contexts. One or more suggested outputs are selected from among the one or more outputs based on the respective scores and are provided as options to re-format the input.
Type: Grant
Filed: March 20, 2014
Date of Patent: June 6, 2017
Assignee: Microsoft Technology Licensing, LLC
Inventors: Issac Alphonso, Nick Kibre, Michael Levit, Sarangarajan Parthasarathy
-
Publication number: 20170103753
Abstract: The customization of language modeling components for speech recognition is provided. A list of language modeling components may be made available by a computing device. A hint may then be sent to a recognition service provider for combining the multiple language modeling components from the list. The hint may be based on a number of different domains. A customized combination of the language modeling components based on the hint may then be received from the recognition service provider.
Type: Application
Filed: December 22, 2016
Publication date: April 13, 2017
Applicant: Microsoft Technology Licensing, LLC
Inventors: Michael Levit, Hernan Guelman, Shuangyu Chang, Sarangarajan Parthasarathy, Benoit Dumoulin
-
Publication number: 20170092275
Abstract: An incremental speech recognition system. The incremental speech recognition system incrementally decodes a spoken utterance using an additional utterance decoder only when the additional utterance decoder is likely to add significant benefit to the combined result. The available utterance decoders are ordered in a series based on accuracy, performance, diversity, and other factors. A recognition management engine coordinates decoding of the spoken utterance by the series of utterance decoders, combines the decoded utterances, and determines whether additional processing is likely to significantly improve the recognition result. If so, the recognition management engine engages the next utterance decoder and the cycle continues. If the accuracy cannot be significantly improved, the result is accepted and decoding stops.
Type: Application
Filed: December 13, 2016
Publication date: March 30, 2017
Applicant: Microsoft Technology Licensing, LLC
Inventors: Shuangyu Chang, Michael Levit, Abhik Lahiri, Barlas Oguz, Benoit Dumoulin
-
Patent number: 9552817
Abstract: An incremental speech recognition system. The incremental speech recognition system incrementally decodes a spoken utterance using an additional utterance decoder only when the additional utterance decoder is likely to add significant benefit to the combined result. The available utterance decoders are ordered in a series based on accuracy, performance, diversity, and other factors. A recognition management engine coordinates decoding of the spoken utterance by the series of utterance decoders, combines the decoded utterances, and determines whether additional processing is likely to significantly improve the recognition result. If so, the recognition management engine engages the next utterance decoder and the cycle continues. If the accuracy cannot be significantly improved, the result is accepted and decoding stops.
Type: Grant
Filed: March 19, 2014
Date of Patent: January 24, 2017
Assignee: Microsoft Technology Licensing, LLC
Inventors: Shuangyu Chang, Michael Levit, Abhik Lahiri, Barlas Oguz, Benoit Dumoulin
-
Patent number: 9542931
Abstract: On a computing device a speech utterance is received from a user. The speech utterance is a section of a speech dialog that includes a plurality of speech utterances. One or more features from the speech utterance are identified. Each identified feature from the speech utterance is a specific characteristic of the speech utterance. One or more features from the speech dialog are identified. Each identified feature from the speech dialog is associated with one or more events in the speech dialog. The one or more events occur prior to the speech utterance. One or more identified features from the speech utterance and one or more identified features from the speech dialog are used to calculate a confidence score for the speech utterance.
Type: Grant
Filed: October 23, 2014
Date of Patent: January 10, 2017
Assignee: Microsoft Technology Licensing, LLC
Inventors: Michael Levit, Bruce Melvin Buntschuh
-
Patent number: 9529794
Abstract: The customization of language modeling components for speech recognition is provided. A list of language modeling components may be made available by a computing device. A hint may then be sent to a recognition service provider for combining the multiple language modeling components from the list. The hint may be based on a number of different domains. A customized combination of the language modeling components based on the hint may then be received from the recognition service provider.
Type: Grant
Filed: March 27, 2014
Date of Patent: December 27, 2016
Assignee: Microsoft Technology Licensing, LLC
Inventors: Michael Levit, Hernan Guelman, Shuangyu Chang, Sarangarajan Parthasarathy, Benoit Dumoulin
-
Publication number: 20160336006
Abstract: A computer system for language modeling may collect training data from one or more information sources, generate a spoken corpus containing text of transcribed speech, and generate a typed corpus containing typed text. The computer system may derive feature vectors from the spoken corpus, analyze the typed corpus to determine feature vectors representing items of typed text, and generate an unspeakable corpus by filtering the typed corpus to remove each item of typed text represented by a feature vector that is within a similarity threshold of a feature vector derived from the spoken corpus. The computer system may derive feature vectors from the unspeakable corpus and train a classifier to perform discriminative data selection for language modeling based on the feature vectors derived from the spoken corpus and the feature vectors derived from the unspeakable corpus.
Type: Application
Filed: May 13, 2015
Publication date: November 17, 2016
Applicant: Microsoft Technology Licensing, LLC
Inventors: Michael Levit, Shuangyu Chang, Benoit Dumoulin