Patents by Inventor Shuangyu Chang
Shuangyu Chang has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10679610
Abstract: A method for eyes-off training of a dictation system includes translating an audio signal featuring speech audio of a speaker into an initial recognized text using a previously-trained general language model. The initial recognized text is provided to the speaker for error correction. The audio signal is re-translated into an updated recognized text using a specialized language model biased to recognize words included in the corrected text. The general language model is retrained in an “eyes-off” manner, based on the audio signal and the updated recognized text.
Type: Grant
Filed: July 16, 2018
Date of Patent: June 9, 2020
Assignee: Microsoft Technology Licensing, LLC
Inventors: Hemant Malhotra, Shuangyu Chang, Pradip Kumar Fatehpuria
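The biasing step in this abstract (re-decoding with a language model that favors words from the user's corrected text) can be illustrated with a toy, hypothetical sketch. This is not the patented implementation; the unigram scoring, the `bias_weight` parameter, and all names below are illustrative assumptions.

```python
from collections import Counter

def biased_score(candidate_words, base_counts, bias_words, bias_weight=5.0):
    """Score a candidate transcript under a smoothed unigram model,
    boosting words that appeared in the user's corrected text."""
    total = sum(base_counts.values())
    score = 0.0
    for w in candidate_words:
        # Add-one smoothed unigram probability from the general model.
        freq = (base_counts.get(w, 0) + 1) / (total + len(base_counts) + 1)
        if w in bias_words:
            freq *= bias_weight  # bias toward the corrected vocabulary
        score += freq
    return score

# General model prefers "cat"; the user's correction contained "hat".
base = Counter({"the": 10, "cat": 3, "sat": 2, "hat": 1})
corrected = {"hat"}
a = ["the", "cat"]
b = ["the", "hat"]
```

With the bias applied, the candidate containing the corrected word "hat" outscores the one the general model would otherwise prefer, which is the intuition behind re-translating against a biased model.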
-
Patent number: 10650811
Abstract: Disclosed in various examples are methods, systems, and machine-readable mediums for providing improved computer-implemented speech recognition by detecting and correcting speech recognition errors during a speech session. The system recognizes repeated speech commands from a user in a speech session that are similar or identical to each other. To correct these repeated errors, the system creates a customized language model that is then utilized by the language modeler to produce a refined prediction of the meaning of the repeated speech commands. The custom language model may comprise clusters of similar past predictions of speech commands from the speech session of the user.
Type: Grant
Filed: March 13, 2018
Date of Patent: May 12, 2020
Assignee: Microsoft Technology Licensing, LLC
Inventors: Meryem Pinar Donmez Ediz, Ranjitha Gurunath Kulkarni, Shuangyu Chang, Nitin Kamra
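The clustering of similar past command predictions described here can be sketched, purely as a toy illustration, with edit-distance-based grouping. The threshold value and the greedy clustering strategy are assumptions for the example, not details from the patent.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def cluster_repeats(commands, threshold=2):
    """Greedily group past command predictions that are within
    `threshold` edits of a cluster's first member."""
    clusters = []
    for cmd in commands:
        for cluster in clusters:
            if edit_distance(cmd, cluster[0]) <= threshold:
                cluster.append(cmd)
                break
        else:
            clusters.append([cmd])
    return clusters

# Two near-identical repeated commands land in one cluster.
clusters = cluster_repeats(["play jazz", "play jass", "set alarm"])
```

A cluster with multiple members signals a repeated, likely misrecognized command, which is the cue the abstract describes for building a custom language model.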
-
Publication number: 20200020319
Abstract: A method for eyes-off training of a dictation system includes translating an audio signal featuring speech audio of a speaker into an initial recognized text using a previously-trained general language model. The initial recognized text is provided to the speaker for error correction. The audio signal is re-translated into an updated recognized text using a specialized language model biased to recognize words included in the corrected text. The general language model is retrained in an “eyes-off” manner, based on the audio signal and the updated recognized text.
Type: Application
Filed: July 16, 2018
Publication date: January 16, 2020
Applicant: Microsoft Technology Licensing, LLC
Inventors: Hemant Malhotra, Shuangyu Chang, Pradip Kumar Fatehpuria
-
Patent number: 10497367
Abstract: The customization of language modeling components for speech recognition is provided. A list of language modeling components may be made available by a computing device. A hint may then be sent to a recognition service provider for combining the multiple language modeling components from the list. The hint may be based on a number of different domains. A customized combination of the language modeling components based on the hint may then be received from the recognition service provider.
Type: Grant
Filed: December 22, 2016
Date of Patent: December 3, 2019
Assignee: Microsoft Technology Licensing, LLC
Inventors: Michael Levit, Hernan Guelman, Shuangyu Chang, Sarangarajan Parthasarathy, Benoit Dumoulin
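The domain-based "hint" mechanism can be illustrated with a hypothetical sketch in which the hint shifts interpolation mass toward matching components. The 80/20 split, the field names, and the function signature are all assumptions made for the example.

```python
def combine_components(components, hint_domains, hint_mass=0.8):
    """Assign interpolation weights to language modeling components,
    concentrating `hint_mass` on components whose domain matches the hint."""
    matched = [c for c in components if c["domain"] in hint_domains]
    others = [c for c in components if c["domain"] not in hint_domains]
    if not matched or not others:
        # No useful hint signal: fall back to a uniform combination.
        return {c["name"]: 1.0 / len(components) for c in components}
    weights = {}
    for c in matched:
        weights[c["name"]] = hint_mass / len(matched)
    for c in others:
        weights[c["name"]] = (1.0 - hint_mass) / len(others)
    return weights

components = [
    {"name": "sms", "domain": "messaging"},
    {"name": "web-search", "domain": "web"},
    {"name": "voicemail", "domain": "messaging"},
]
weights = combine_components(components, {"messaging"})
```

The weights always sum to one, so the result remains a valid interpolated model regardless of how many components the hint matches.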
-
Publication number: 20190287519
Abstract: Disclosed in various examples are methods, systems, and machine-readable mediums for providing improved computer-implemented speech recognition by detecting and correcting speech recognition errors during a speech session. The system recognizes repeated speech commands from a user in a speech session that are similar or identical to each other. To correct these repeated errors, the system creates a customized language model that is then utilized by the language modeler to produce a refined prediction of the meaning of the repeated speech commands. The custom language model may comprise clusters of similar past predictions of speech commands from the speech session of the user.
Type: Application
Filed: March 13, 2018
Publication date: September 19, 2019
Inventors: Meryem Pinar Donmez Ediz, Ranjitha Gurunath Kulkarni, Shuangyu Chang, Nitin Kamra
-
Patent number: 10192545
Abstract: A computer system for language modeling may collect training data from one or more information sources, generate a spoken corpus containing text of transcribed speech, and generate a typed corpus containing typed text. The computer system may derive feature vectors from the spoken corpus, analyze the typed corpus to determine feature vectors representing items of typed text, and generate an unspeakable corpus by filtering the typed corpus to remove each item of typed text represented by a feature vector that is within a similarity threshold of a feature vector derived from the spoken corpus. The computer system may derive feature vectors from the unspeakable corpus and train a classifier to perform discriminative data selection for language modeling based on the feature vectors derived from the spoken corpus and the feature vectors derived from the unspeakable corpus.
Type: Grant
Filed: June 5, 2017
Date of Patent: January 29, 2019
Assignee: Microsoft Technology Licensing, LLC
Inventors: Michael Levit, Shuangyu Chang, Benoit Dumoulin
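The "unspeakable corpus" filtering step (removing typed items whose feature vectors fall within a similarity threshold of any spoken-corpus vector) can be sketched with cosine similarity over toy 2-D vectors. The similarity measure, the threshold value, and the brute-force comparison are assumptions for illustration; the patent does not specify them here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def unspeakable_corpus(typed_vecs, spoken_vecs, threshold=0.9):
    """Keep only typed items that are NOT within the similarity
    threshold of any spoken-corpus feature vector."""
    return [i for i, tv in enumerate(typed_vecs)
            if all(cosine(tv, sv) < threshold for sv in spoken_vecs)]

# Typed item 0 is nearly parallel to the spoken vector (removed);
# typed item 1 is orthogonal to it (kept as "unspeakable").
spoken = [(1.0, 0.0)]
typed = [(1.0, 0.1), (0.0, 1.0)]
kept = unspeakable_corpus(typed, spoken)
```

The surviving items then serve as negative examples for training the discriminative data-selection classifier the abstract describes.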
-
Publication number: 20180330725
Abstract: A method for priming an extensible speech recognition system comprises receiving audio language input from a user. The method also comprises receiving an indication that the audio language input is associated with a first language-based intelligent agent. The first language-based intelligent agent is associated with a first grammar set that is specific to the first language-based intelligent agent. Additionally, the method comprises matching one or more spoken words or phrases within the audio language input to text-based words or phrases within a general grammar set associated with a speech recognition system and the first grammar set. The first grammar set is associated with a higher match bias than the general grammar set, such that the speech recognition system is more likely to match the one or more spoken words or phrases to the text-based words or phrases within the first grammar set.
Type: Application
Filed: August 18, 2017
Publication date: November 15, 2018
Inventors: Padma Varadharajan, Shuangyu Chang, Khuram Shahid, Meryem Pinar Donmez Ediz, Nitin Agarwal
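The "higher match bias" idea can be illustrated with a toy matcher that scores phrases by word overlap and multiplies agent-specific entries by a bias factor. The overlap scoring and the bias value are hypothetical choices for the sketch, not details from the application.

```python
def match_phrase(spoken, agent_grammar, general_grammar, agent_bias=2.0):
    """Pick the best-matching grammar phrase for a list of spoken words,
    giving agent-specific grammar entries a higher match bias."""
    best, best_score = None, 0.0
    for phrase in agent_grammar | general_grammar:
        words = phrase.split()
        overlap = len(set(spoken) & set(words)) / max(len(words), 1)
        # Agent-specific entries are multiplied by the bias factor.
        score = overlap * (agent_bias if phrase in agent_grammar else 1.0)
        if score > best_score:
            best, best_score = phrase, score
    return best

# The general grammar contains an exact match, but the bias makes the
# agent-specific phrase win anyway.
agent = {"call mom now"}
general = {"call mom"}
best = match_phrase(["call", "mom"], agent, general)
```

This mirrors the abstract's point: with the bias in place, the recognizer is more likely to resolve the utterance against the active agent's grammar.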
-
Patent number: 10019984
Abstract: Techniques and technologies for diagnosing speech recognition errors are described. In an example implementation, a system for diagnosing speech recognition errors may include an error detection module configured to determine that a speech recognition result is at least partially erroneous, and a recognition error diagnostics module. The recognition error diagnostics module may be configured to (a) perform a first error analysis of the at least partially erroneous speech recognition result to provide a first error analysis result; (b) perform a second error analysis of the at least partially erroneous speech recognition result to provide a second error analysis result; and (c) determine at least one category of recognition error associated with the at least partially erroneous speech recognition result based on a combination of the first error analysis result and the second error analysis result.
Type: Grant
Filed: February 27, 2015
Date of Patent: July 10, 2018
Assignee: Microsoft Technology Licensing, LLC
Inventors: Shiun-Zu Kuo, Thomas Reutter, Yifan Gong, Mark T. Hanson, Ye Tian, Shuangyu Chang, Jonathan Hamaker, Qi Miao, Yuancheng Tu
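Combining two independent analyses into an error category can be sketched with a toy diagnoser: one analysis compares lengths (an insertion/deletion signal), the other compares token overlap (a substitution signal). The specific analyses and category names are illustrative assumptions, not the patented ones.

```python
def diagnose(result, reference):
    """Combine two simple, independent error analyses of a recognition
    result into a single error category (toy sketch)."""
    hyp, ref = result.split(), reference.split()
    # First analysis: length comparison flags dropped or extra words.
    length_delta = len(hyp) - len(ref)
    # Second analysis: token overlap flags swapped-in wrong words.
    overlap = len(set(hyp) & set(ref)) / max(len(ref), 1)
    if length_delta < 0:
        return "deletion"
    if length_delta > 0:
        return "insertion"
    return "substitution" if overlap < 1.0 else "correct"

# A dropped word and a swapped word fall into different categories.
d1 = diagnose("the cat", "the cat sat")
d2 = diagnose("the hat sat", "the cat sat")
```

Neither signal alone distinguishes all cases; combining them, as the abstract describes, yields a more specific category than either analysis by itself.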
-
Patent number: 9922654
Abstract: An incremental speech recognition system. The incremental speech recognition system incrementally decodes a spoken utterance using an additional utterance decoder only when the additional utterance decoder is likely to add significant benefit to the combined result. The available utterance decoders are ordered in a series based on accuracy, performance, diversity, and other factors. A recognition management engine coordinates decoding of the spoken utterance by the series of utterance decoders, combines the decoded utterances, and determines whether additional processing is likely to significantly improve the recognition result. If so, the recognition management engine engages the next utterance decoder and the cycle continues. If the accuracy cannot be significantly improved, the result is accepted and decoding stops.
Type: Grant
Filed: December 13, 2016
Date of Patent: March 20, 2018
Assignee: Microsoft Technology Licensing, LLC
Inventors: Shuangyu Chang, Michael Levit, Abhik Lahiri, Barlas Oguz, Benoit Dumoulin
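The stop-early control loop described here can be illustrated with a hypothetical sketch: decoders run in their ordered series, and decoding halts once the best confidence clears a target, on the assumption that further decoders would not add significant benefit. The confidence threshold and decoder interface are invented for the example.

```python
def incremental_decode(audio, decoders, confidence_target=0.9):
    """Run an ordered series of utterance decoders, stopping as soon as
    the combined result is unlikely to improve significantly."""
    best_text, best_conf = None, 0.0
    used = 0
    for decode in decoders:
        text, conf = decode(audio)
        used += 1
        if conf > best_conf:
            best_text, best_conf = text, conf
        if best_conf >= confidence_target:
            break  # the next decoder is unlikely to add significant benefit
    return best_text, used

# Stand-in decoders ordered cheapest-first; the third is never engaged.
decoders = [
    lambda audio: ("helo wrld", 0.60),
    lambda audio: ("hello world", 0.95),
    lambda audio: ("hello, world", 0.99),
]
text, used = incremental_decode(b"<audio bytes>", decoders)
```

Only two of the three decoders run, which is the cost saving the incremental design aims for.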
-
Publication number: 20170270912
Abstract: A computer system for language modeling may collect training data from one or more information sources, generate a spoken corpus containing text of transcribed speech, and generate a typed corpus containing typed text. The computer system may derive feature vectors from the spoken corpus, analyze the typed corpus to determine feature vectors representing items of typed text, and generate an unspeakable corpus by filtering the typed corpus to remove each item of typed text represented by a feature vector that is within a similarity threshold of a feature vector derived from the spoken corpus. The computer system may derive feature vectors from the unspeakable corpus and train a classifier to perform discriminative data selection for language modeling based on the feature vectors derived from the spoken corpus and the feature vectors derived from the unspeakable corpus.
Type: Application
Filed: June 5, 2017
Publication date: September 21, 2017
Applicant: Microsoft Technology Licensing, LLC
Inventors: Michael Levit, Shuangyu Chang, Benoit Dumoulin
-
Patent number: 9761220
Abstract: A computer system for language modeling may collect training data from one or more information sources, generate a spoken corpus containing text of transcribed speech, and generate a typed corpus containing typed text. The computer system may derive feature vectors from the spoken corpus, analyze the typed corpus to determine feature vectors representing items of typed text, and generate an unspeakable corpus by filtering the typed corpus to remove each item of typed text represented by a feature vector that is within a similarity threshold of a feature vector derived from the spoken corpus. The computer system may derive feature vectors from the unspeakable corpus and train a classifier to perform discriminative data selection for language modeling based on the feature vectors derived from the spoken corpus and the feature vectors derived from the unspeakable corpus.
Type: Grant
Filed: May 13, 2015
Date of Patent: September 12, 2017
Assignee: Microsoft Technology Licensing, LLC
Inventors: Michael Levit, Shuangyu Chang, Benoit Dumoulin
-
Patent number: 9734826
Abstract: Optimized language models are provided for in-domain applications through an iterative, joint-modeling approach that interpolates a language model (LM) from a number of component LMs according to interpolation weights optimized for a target domain. The component LMs may include class-based LMs, and the interpolation may be context-specific or context-independent. Through iterative processes, the component LMs may be interpolated and used to express training material as alternative representations or parses of tokens. Posterior probabilities may be determined for these parses and used for determining new (or updated) interpolation weights for the LM components, such that a combination or interpolation of component LMs is further optimized for the domain. The component LMs may be merged, according to the optimized weights, into a single, combined LM, for deployment in an application scenario.
Type: Grant
Filed: March 11, 2015
Date of Patent: August 15, 2017
Assignee: Microsoft Technology Licensing, LLC
Inventors: Michael Levit, Sarangarajan Parthasarathy, Andreas Stolcke, Shuangyu Chang
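The iterative weight update via posterior probabilities resembles classic EM estimation of mixture weights, which can be sketched in miniature. This is a generic EM sketch under that assumption, not the patented joint-modeling procedure; the data layout (`component_probs[t][k]` = probability component `k` assigns to token `t`) is invented for the example.

```python
def em_interpolation_weights(component_probs, iterations=20):
    """Estimate interpolation weights for component LMs by EM on
    per-token component probabilities from held-out text."""
    k = len(component_probs[0])
    w = [1.0 / k] * k  # start from a uniform interpolation
    for _ in range(iterations):
        counts = [0.0] * k
        for probs in component_probs:
            mix = sum(wi * pi for wi, pi in zip(w, probs))
            for i, (wi, pi) in enumerate(zip(w, probs)):
                counts[i] += wi * pi / mix  # posterior responsibility
        w = [c / len(component_probs) for c in counts]
    return w

# Three held-out tokens, all better explained by component 0.
w = em_interpolation_weights([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]])
```

Each iteration re-weights components by how much of the held-out data they explain, so the interpolation drifts toward the components that fit the target domain, as the abstract describes.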
-
Publication number: 20170103753
Abstract: The customization of language modeling components for speech recognition is provided. A list of language modeling components may be made available by a computing device. A hint may then be sent to a recognition service provider for combining the multiple language modeling components from the list. The hint may be based on a number of different domains. A customized combination of the language modeling components based on the hint may then be received from the recognition service provider.
Type: Application
Filed: December 22, 2016
Publication date: April 13, 2017
Applicant: Microsoft Technology Licensing, LLC
Inventors: Michael Levit, Hernan Guelman, Shuangyu Chang, Sarangarajan Parthasarathy, Benoit Dumoulin
-
Publication number: 20170092275
Abstract: An incremental speech recognition system. The incremental speech recognition system incrementally decodes a spoken utterance using an additional utterance decoder only when the additional utterance decoder is likely to add significant benefit to the combined result. The available utterance decoders are ordered in a series based on accuracy, performance, diversity, and other factors. A recognition management engine coordinates decoding of the spoken utterance by the series of utterance decoders, combines the decoded utterances, and determines whether additional processing is likely to significantly improve the recognition result. If so, the recognition management engine engages the next utterance decoder and the cycle continues. If the accuracy cannot be significantly improved, the result is accepted and decoding stops.
Type: Application
Filed: December 13, 2016
Publication date: March 30, 2017
Applicant: Microsoft Technology Licensing, LLC
Inventors: Shuangyu Chang, Michael Levit, Abhik Lahiri, Barlas Oguz, Benoit Dumoulin
-
Patent number: 9552817
Abstract: An incremental speech recognition system. The incremental speech recognition system incrementally decodes a spoken utterance using an additional utterance decoder only when the additional utterance decoder is likely to add significant benefit to the combined result. The available utterance decoders are ordered in a series based on accuracy, performance, diversity, and other factors. A recognition management engine coordinates decoding of the spoken utterance by the series of utterance decoders, combines the decoded utterances, and determines whether additional processing is likely to significantly improve the recognition result. If so, the recognition management engine engages the next utterance decoder and the cycle continues. If the accuracy cannot be significantly improved, the result is accepted and decoding stops.
Type: Grant
Filed: March 19, 2014
Date of Patent: January 24, 2017
Assignee: Microsoft Technology Licensing, LLC
Inventors: Shuangyu Chang, Michael Levit, Abhik Lahiri, Barlas Oguz, Benoit Dumoulin
-
Patent number: 9529794
Abstract: The customization of language modeling components for speech recognition is provided. A list of language modeling components may be made available by a computing device. A hint may then be sent to a recognition service provider for combining the multiple language modeling components from the list. The hint may be based on a number of different domains. A customized combination of the language modeling components based on the hint may then be received from the recognition service provider.
Type: Grant
Filed: March 27, 2014
Date of Patent: December 27, 2016
Assignee: Microsoft Technology Licensing, LLC
Inventors: Michael Levit, Hernan Guelman, Shuangyu Chang, Sarangarajan Parthasarathy, Benoit Dumoulin
-
Publication number: 20160336006
Abstract: A computer system for language modeling may collect training data from one or more information sources, generate a spoken corpus containing text of transcribed speech, and generate a typed corpus containing typed text. The computer system may derive feature vectors from the spoken corpus, analyze the typed corpus to determine feature vectors representing items of typed text, and generate an unspeakable corpus by filtering the typed corpus to remove each item of typed text represented by a feature vector that is within a similarity threshold of a feature vector derived from the spoken corpus. The computer system may derive feature vectors from the unspeakable corpus and train a classifier to perform discriminative data selection for language modeling based on the feature vectors derived from the spoken corpus and the feature vectors derived from the unspeakable corpus.
Type: Application
Filed: May 13, 2015
Publication date: November 17, 2016
Applicant: Microsoft Technology Licensing, LLC
Inventors: Michael Levit, Shuangyu Chang, Benoit Dumoulin
-
Publication number: 20160267905
Abstract: Optimized language models are provided for in-domain applications through an iterative, joint-modeling approach that interpolates a language model (LM) from a number of component LMs according to interpolation weights optimized for a target domain. The component LMs may include class-based LMs, and the interpolation may be context-specific or context-independent. Through iterative processes, the component LMs may be interpolated and used to express training material as alternative representations or parses of tokens. Posterior probabilities may be determined for these parses and used for determining new (or updated) interpolation weights for the LM components, such that a combination or interpolation of component LMs is further optimized for the domain. The component LMs may be merged, according to the optimized weights, into a single, combined LM, for deployment in an application scenario.
Type: Application
Filed: March 11, 2015
Publication date: September 15, 2016
Inventors: Michael Levit, Sarangarajan Parthasarathy, Andreas Stolcke, Shuangyu Chang
-
Publication number: 20160253989
Abstract: Techniques and technologies for diagnosing speech recognition errors are described. In an example implementation, a system for diagnosing speech recognition errors may include an error detection module configured to determine that a speech recognition result is at least partially erroneous, and a recognition error diagnostics module. The recognition error diagnostics module may be configured to (a) perform a first error analysis of the at least partially erroneous speech recognition result to provide a first error analysis result; (b) perform a second error analysis of the at least partially erroneous speech recognition result to provide a second error analysis result; and (c) determine at least one category of recognition error associated with the at least partially erroneous speech recognition result based on a combination of the first error analysis result and the second error analysis result.
Type: Application
Filed: February 27, 2015
Publication date: September 1, 2016
Inventors: Shiun-Zu Kuo, Thomas Reutter, Yifan Gong, Mark T. Hanson, Ye Tian, Shuangyu Chang, Jon Hamaker, Qi Miao, Yuancheng Tu
-
Patent number: 9299342
Abstract: Query history expansion may be provided. Upon receiving a spoken query from a user, an adapted language model may be applied to convert the spoken query to text. The adapted language model may comprise a plurality of queries interpolated from the user's previous queries and queries associated with other users. The spoken query may be executed and the results of the spoken query may be provided to the user.
Type: Grant
Filed: July 23, 2015
Date of Patent: March 29, 2016
Assignee: Microsoft Technology Licensing, LLC
Inventors: Shuangyu Chang, Michael Levit, Bruce Melvin Buntschuh
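Interpolating a model from the user's own query history and other users' queries can be sketched, as a toy illustration, with a weighted unigram mixture. The unigram representation and the `user_weight` value are assumptions for the example, not details from the patent.

```python
from collections import Counter

def adapted_unigram(user_queries, other_queries, user_weight=0.7):
    """Interpolate a unigram model from a user's previous queries and
    queries associated with other users."""
    def normalize(queries):
        counts = Counter(w for q in queries for w in q.split())
        total = sum(counts.values())
        return {w: n / total for w, n in counts.items()}

    user_p = normalize(user_queries)
    other_p = normalize(other_queries)
    vocab = set(user_p) | set(other_p)
    # Linear interpolation of the two distributions.
    return {w: user_weight * user_p.get(w, 0.0)
               + (1.0 - user_weight) * other_p.get(w, 0.0)
            for w in vocab}

# The user's own history dominates, but global queries still contribute.
model = adapted_unigram(["jazz radio"], ["news today"])
```

Because both components are proper distributions, the interpolated model still sums to one while tilting recognition toward vocabulary the user has actually queried before.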