Patents by Inventor Venkatesh Nagesha
Venkatesh Nagesha has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10592604
Abstract: Techniques for inverse text normalization are provided. In some examples, speech input is received and a spoken-form text representation of the speech input is generated. The spoken-form text representation includes a token sequence. A feature representation is determined for the spoken-form text representation and a sequence of labels is determined based on the feature representation. The sequence of labels is assigned to the token sequence and specifies a plurality of edit operations to perform on the token sequence. Each edit operation of the plurality of edit operations corresponds to one of a plurality of predetermined types of edit operations. A written-form text representation of the speech input is generated by applying the plurality of edit operations to the token sequence in accordance with the sequence of labels. A task responsive to the speech input is performed using the generated written-form text representation.
Type: Grant
Filed: June 29, 2018
Date of Patent: March 17, 2020
Assignee: Apple Inc.
Inventors: Ernest J. Pusateri, Bharat Ram Ambati, Elizabeth S. Brooks, Donald R. McAllaster, Venkatesh Nagesha, Ondrej Platek
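The label-driven editing described in the abstract for patent 10592604 can be illustrated with a minimal Python sketch. The label fields (`rewrite`, `capitalize`, `join_prev`) and the function name are hypothetical; the abstract says only that each edit operation belongs to a predetermined set of types. In the patent the labels are determined from a feature representation; here they are written by hand.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EditLabel:
    """One label per token. The fields are illustrative edit types (assumptions)."""
    rewrite: Optional[str] = None  # replace the token text with this string
    capitalize: bool = False       # capitalize the (possibly rewritten) token
    join_prev: bool = False        # attach to the previous token without a space


def apply_labels(tokens: List[str], labels: List[EditLabel]) -> str:
    """Apply one edit label to each spoken-form token to build written-form text."""
    if len(tokens) != len(labels):
        raise ValueError("each token needs exactly one label")
    written = ""
    for token, label in zip(tokens, labels):
        text = label.rewrite if label.rewrite is not None else token
        if label.capitalize:
            text = text.capitalize()
        if written and not label.join_prev:
            written += " "
        written += text
    return written


# Hand-written labels standing in for the output of a trained labeling model.
tokens = ["february", "twenty", "five"]
labels = [
    EditLabel(capitalize=True),
    EditLabel(rewrite="2"),
    EditLabel(rewrite="5", join_prev=True),
]
written_form = apply_labels(tokens, labels)
print(written_form)
```

Because each label acts on a single token, the written form is produced in one left-to-right pass over the token sequence.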
-
Publication number: 20190278841
Abstract: Techniques for inverse text normalization are provided. In some examples, speech input is received and a spoken-form text representation of the speech input is generated. The spoken-form text representation includes a token sequence. A feature representation is determined for the spoken-form text representation and a sequence of labels is determined based on the feature representation. The sequence of labels is assigned to the token sequence and specifies a plurality of edit operations to perform on the token sequence. Each edit operation of the plurality of edit operations corresponds to one of a plurality of predetermined types of edit operations. A written-form text representation of the speech input is generated by applying the plurality of edit operations to the token sequence in accordance with the sequence of labels. A task responsive to the speech input is performed using the generated written-form text representation.
Type: Application
Filed: June 29, 2018
Publication date: September 12, 2019
Inventors: Ernest J. PUSATERI, Bharat Ram AMBATI, Elizabeth S. BROOKS, Donald R. MCALLASTER, Venkatesh NAGESHA, Ondrej PLATEK
-
Patent number: 10062374
Abstract: According to some aspects, a method of training a transformation component using a trained acoustic model comprising first parameters having respective first values established during training of the acoustic model using first training data is provided. The method comprises using at least one computer processor to perform coupling the transformation component to a portion of the acoustic model, the transformation component comprising second parameters, and training the transformation component by determining, for the second parameters, respective second values using second training data input to the transformation component and processed by the acoustic model, wherein the acoustic model retains the first parameters having the respective first values throughout training of the transformation component.
Type: Grant
Filed: July 18, 2014
Date of Patent: August 28, 2018
Assignee: Nuance Communications, Inc.
Inventors: Xiaoqiang Xiao, Chengyuan Ma, Venkatesh Nagesha
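The idea in the abstract for patent 10062374, training only the transformation component while the acoustic model keeps its parameters, can be sketched with a toy NumPy example. Everything here is an assumption made for illustration: the "acoustic model" is a single fixed linear map `W`, the transformation component is a square matrix `A` applied to the input, and the loss is mean squared error on synthetic data. The abstract does not specify any of these.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "acoustic model": first parameters, never updated below.
W = rng.normal(size=(3, 4))
W_before = W.copy()

# Transformation component: second parameters, initialised to the identity.
A = np.eye(4)

# Synthetic "second training data" (inputs X, targets Y).
X = rng.normal(size=(64, 4))
target_A = np.eye(4) + 0.1 * rng.normal(size=(4, 4))
Y = X @ target_A.T @ W.T


def loss(A_matrix: np.ndarray) -> float:
    """Mean squared error of the frozen model applied to transformed inputs."""
    pred = X @ A_matrix.T @ W.T
    return float(np.mean((pred - Y) ** 2))


initial_loss = loss(A)
learning_rate = 0.02
for _ in range(200):
    err = X @ A.T @ W.T - Y                 # shape (64, 3)
    grad_A = 2.0 * W.T @ err.T @ X / err.size  # gradient with respect to A only
    A -= learning_rate * grad_A             # W is deliberately left untouched
final_loss = loss(A)

print(initial_loss, final_loss)
```

Only `A` receives gradient updates, so the values in `W` after training are identical to those before it.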
-
Patent number: 9858038
Abstract: A method is described for user correction of speech recognition results. A speech recognition result for a given unknown speech input is displayed to a user. A user selection is received of a portion of the recognition result needing to be corrected. For each of multiple different recognition data sources, a ranked list of alternate recognition choices is determined which correspond to the selected portion. The alternate recognition choices are concatenated or interleaved together and duplicate choices removed to form a single ranked output list of alternate recognition choices, which is displayed to the user. The method may be adaptive over time to derive preferences that can then be leveraged in the ordering of one choice list or across choice lists.
Type: Grant
Filed: February 1, 2013
Date of Patent: January 2, 2018
Assignee: Nuance Communications, Inc.
Inventors: Olivier Divay, Joev Dubach, Venkatesh Nagesha, Allan Gold
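The interleave-and-deduplicate step in the abstract for patent 9858038 can be sketched in a few lines of Python. The function name and the round-robin order are assumptions; the abstract says only that the lists are concatenated or interleaved and that duplicates are removed.

```python
from itertools import zip_longest
from typing import List


def interleave_unique(*ranked_lists: List[str]) -> List[str]:
    """Merge ranked choice lists round-robin, keeping the first copy of each choice."""
    seen = set()
    merged = []
    for tier in zip_longest(*ranked_lists):  # one choice per source at each rank
        for choice in tier:
            if choice is not None and choice not in seen:
                seen.add(choice)
                merged.append(choice)
    return merged


# Two hypothetical recognition data sources proposing alternates for one selection.
source_a = ["their", "there", "they're"]
source_b = ["there", "the air"]
merged_choices = interleave_unique(source_a, source_b)
print(merged_choices)
```

Interleaving by rank keeps each source's top choice near the top of the merged list, whereas plain concatenation would favor whichever source comes first.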
-
Patent number: 9721561
Abstract: In a speech recognition system, deep neural networks (DNNs) are employed in phoneme recognition. While DNNs typically provide better phoneme recognition performance than other techniques, such as Gaussian mixture models (GMM), adapting a DNN to a particular speaker is a real challenge. According to at least one example embodiment, speech data and corresponding speaker data are both applied as input to a DNN. In response, the DNN generates a prediction of a phoneme based on the input speech data and the corresponding speaker data. The speaker data may be generated from the corresponding speech data.
Type: Grant
Filed: December 5, 2013
Date of Patent: August 1, 2017
Assignee: Nuance Communications, Inc.
Inventors: Yun Tang, Venkatesh Nagesha, Xing Fan
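As a rough illustration of the abstract for patent 9721561, the sketch below feeds both a speech frame and a speaker vector into a small network. Combining the two inputs by concatenation, the layer sizes, and the random untrained weights are all assumptions; the abstract states only that both kinds of data are applied as input to the DNN.

```python
import numpy as np


def phoneme_posteriors(frame, speaker_vec, W1, b1, W2, b2):
    """Forward pass of a one-hidden-layer network over [speech frame ; speaker vector]."""
    x = np.concatenate([frame, speaker_vec])  # assumed way of combining the inputs
    hidden = np.maximum(0.0, W1 @ x + b1)     # ReLU hidden layer
    logits = W2 @ hidden + b2
    logits = logits - logits.max()            # stabilise the softmax
    probs = np.exp(logits)
    return probs / probs.sum()


rng = np.random.default_rng(0)
frame_dim, speaker_dim, hidden_dim, n_phonemes = 13, 4, 8, 5  # illustrative sizes

# Random weights stand in for a trained model.
W1 = rng.normal(size=(hidden_dim, frame_dim + speaker_dim))
b1 = np.zeros(hidden_dim)
W2 = rng.normal(size=(n_phonemes, hidden_dim))
b2 = np.zeros(n_phonemes)

frame = rng.normal(size=frame_dim)          # one frame of speech features
speaker_vec = rng.normal(size=speaker_dim)  # per-speaker data, fixed across frames

posteriors = phoneme_posteriors(frame, speaker_vec, W1, b1, W2, b2)
print(posteriors.argmax())
```

The same `speaker_vec` would accompany every frame from a given speaker, which lets one network condition its prediction on the speaker without separate per-speaker weights.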
-
Patent number: 9269349
Abstract: An automatic speech recognition dictation application is described that includes a dictation module for performing automatic speech recognition in a dictation session with a speaker user to determine representative text corresponding to input speech from the speaker user. A post-processing module develops a session level metric correlated to verbatim recognition error rate of the dictation session, and determines if recognition performance degraded during the dictation session based on a comparison of the session metric to a baseline metric.
Type: Grant
Filed: May 24, 2012
Date of Patent: February 23, 2016
Assignee: Nuance Communications, Inc.
Inventors: Xiaoqiang Xiao, Venkatesh Nagesha
-
Publication number: 20160019884
Abstract: According to some aspects, a method of training a transformation component using a trained acoustic model comprising first parameters having respective first values established during training of the acoustic model using first training data is provided. The method comprises using at least one computer processor to perform coupling the transformation component to a portion of the acoustic model, the transformation component comprising second parameters, and training the transformation component by determining, for the second parameters, respective second values using second training data input to the transformation component and processed by the acoustic model, wherein the acoustic model retains the first parameters having the respective first values throughout training of the transformation component.
Type: Application
Filed: July 18, 2014
Publication date: January 21, 2016
Inventors: Xiaoqiang Xiao, Chengyuan Ma, Venkatesh Nagesha
-
Publication number: 20150161994
Abstract: In a speech recognition system, deep neural networks (DNNs) are employed in phoneme recognition. While DNNs typically provide better phoneme recognition performance than other techniques, such as Gaussian mixture models (GMM), adapting a DNN to a particular speaker is a real challenge. According to at least one example embodiment, speech data and corresponding speaker data are both applied as input to a DNN. In response, the DNN generates a prediction of a phoneme based on the input speech data and the corresponding speaker data. The speaker data may be generated from the corresponding speech data.
Type: Application
Filed: December 5, 2013
Publication date: June 11, 2015
Applicant: Nuance Communications, Inc.
Inventors: Yun Tang, Venkatesh Nagesha, Xing Fan
-
Patent number: 9037463
Abstract: A method for speech recognition is described that uses an initial recognizer to perform an initial speech recognition pass on an input speech utterance to determine an initial recognition result corresponding to the input speech utterance, and a reliability measure reflecting a per word reliability of the initial recognition result. For portions of the initial recognition result where the reliability of the result is low, a re-evaluation recognizer is used to perform a re-evaluation recognition pass on the corresponding portions of the input speech utterance to determine a re-evaluation recognition result corresponding to the re-evaluated portions of the input speech utterance. The initial recognizer and the re-evaluation recognizer are complementary so as to make different recognition errors. A final recognition result is determined based on the re-evaluation recognition result if any, and otherwise based on the initial recognition result.
Type: Grant
Filed: May 27, 2010
Date of Patent: May 19, 2015
Assignee: Nuance Communications, Inc.
Inventors: Daniel Willett, Venkatesh Nagesha
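The two-pass flow in the abstract for patent 9037463 can be sketched as below. This is a simplification under stated assumptions: the audio is taken to be already split into one segment per word, the recognizers are stand-in lookup tables, and the 0.5 reliability threshold is arbitrary. None of these details come from the abstract.

```python
from typing import Callable, List, Tuple


def recognize_two_pass(
    segments: List[str],
    initial: Callable[[str], Tuple[str, float]],
    reevaluate: Callable[[str], str],
    threshold: float = 0.5,
) -> List[str]:
    """Keep reliable first-pass words; re-recognize the low-reliability portions."""
    final = []
    for segment in segments:
        word, reliability = initial(segment)
        if reliability < threshold:
            word = reevaluate(segment)  # second, complementary recognizer
        final.append(word)
    return final


# Toy recognizers keyed by hypothetical segment ids instead of real audio.
first_pass = {"seg1": ("recognize", 0.9), "seg2": ("speech", 0.3), "seg3": ("today", 0.8)}
second_pass = {"seg2": "beach"}

final_words = recognize_two_pass(
    ["seg1", "seg2", "seg3"],
    initial=lambda seg: first_pass[seg],
    reevaluate=lambda seg: second_pass[seg],
)
print(final_words)
```

Only the low-reliability segment reaches the second recognizer, so the extra cost of the re-evaluation pass is paid only where the first pass is doubtful.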
-
Publication number: 20140223310
Abstract: A method is described for user correction of speech recognition results. A speech recognition result for a given unknown speech input is displayed to a user. A user selection is received of a portion of the recognition result needing to be corrected. For each of multiple different recognition data sources, a ranked list of alternate recognition choices is determined which correspond to the selected portion. The alternate recognition choices are concatenated or interleaved together and duplicate choices removed to form a single ranked output list of alternate recognition choices, which is displayed to the user. The method may be adaptive over time to derive preferences that can then be leveraged in the ordering of one choice list or across choice lists.
Type: Application
Filed: February 1, 2013
Publication date: August 7, 2014
Applicant: Nuance Communications, Inc.
Inventors: Olivier Divay, Joev Dubach, Venkatesh Nagesha, Allan Gold
-
Patent number: 8768695
Abstract: A computer-implemented arrangement is described for performing cepstral mean normalization (CMN) in automatic speech recognition. A current CMN function is stored in a computer memory as a previous CMN function. The current CMN function is updated based on a current audio input to produce an updated CMN function. The updated CMN function is used to process the current audio input to produce a processed audio input. Automatic speech recognition of the processed audio input is performed to determine representative text. If the audio input is not recognized as representative text, the updated CMN function is replaced with the previous CMN function.
Type: Grant
Filed: June 13, 2012
Date of Patent: July 1, 2014
Assignee: Nuance Communications, Inc.
Inventors: Yun Tang, Venkatesh Nagesha
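The store, update, and roll-back sequence in the abstract for patent 8768695 can be sketched as a small class. The exponential moving average used to update the mean, the class name, and the convention that the recognizer returns `None` on failure are assumptions for illustration; the abstract does not say how the CMN function is updated.

```python
import numpy as np


class RollbackCMN:
    """Cepstral mean normalization that reverts its update when recognition fails."""

    def __init__(self, dim: int, alpha: float = 0.9):
        self.mean = np.zeros(dim)  # the "current CMN function" is just a running mean
        self.alpha = alpha         # smoothing factor (an assumed update rule)

    def process(self, frames: np.ndarray, recognize):
        previous = self.mean.copy()  # store current as previous
        self.mean = self.alpha * self.mean + (1 - self.alpha) * frames.mean(axis=0)
        text = recognize(frames - self.mean)  # recognize the normalized input
        if text is None:                      # not recognized as text: roll back
            self.mean = previous
        return text


cmn = RollbackCMN(dim=2, alpha=0.5)

speech = np.array([[2.0, 4.0], [4.0, 8.0]])
accepted = cmn.process(speech, recognize=lambda feats: "hello")
mean_after_accept = cmn.mean.copy()

noise = np.array([[100.0, 100.0]])
rejected = cmn.process(noise, recognize=lambda feats: None)

print(mean_after_accept, cmn.mean)
```

The roll-back keeps non-speech input such as a cough or a door slam from skewing the normalization applied to later utterances.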
-
Publication number: 20130339014
Abstract: A computer-implemented arrangement is described for performing cepstral mean normalization (CMN) in automatic speech recognition. A current CMN function is stored in a computer memory as a previous CMN function. The current CMN function is updated based on a current audio input to produce an updated CMN function. The updated CMN function is used to process the current audio input to produce a processed audio input. Automatic speech recognition of the processed audio input is performed to determine representative text. If the audio input is not recognized as representative text, the updated CMN function is replaced with the previous CMN function.
Type: Application
Filed: June 13, 2012
Publication date: December 19, 2013
Applicant: NUANCE COMMUNICATIONS, INC.
Inventors: Yun Tang, Venkatesh Nagesha
-
Publication number: 20130317820
Abstract: An automatic speech recognition dictation application is described that includes a dictation module for performing automatic speech recognition in a dictation session with a speaker user to determine representative text corresponding to input speech from the speaker user. A post-processing module develops a session level metric correlated to verbatim recognition error rate of the dictation session, and determines if recognition performance degraded during the dictation session based on a comparison of the session metric to a baseline metric.
Type: Application
Filed: May 24, 2012
Publication date: November 28, 2013
Applicant: NUANCE COMMUNICATIONS, INC.
Inventors: Xiaoqiang Xiao, Venkatesh Nagesha
-
Publication number: 20120259627
Abstract: A method for speech recognition is described that uses an initial recognizer to perform an initial speech recognition pass on an input speech utterance to determine an initial recognition result corresponding to the input speech utterance, and a reliability measure reflecting a per word reliability of the initial recognition result. For portions of the initial recognition result where the reliability of the result is low, a re-evaluation recognizer is used to perform a re-evaluation recognition pass on the corresponding portions of the input speech utterance to determine a re-evaluation recognition result corresponding to the re-evaluated portions of the input speech utterance. The initial recognizer and the re-evaluation recognizer are complementary so as to make different recognition errors. A final recognition result is determined based on the re-evaluation recognition result if any, and otherwise based on the initial recognition result.
Type: Application
Filed: May 27, 2010
Publication date: October 11, 2012
Applicant: NUANCE COMMUNICATIONS, INC.
Inventors: Daniel Willett, Venkatesh Nagesha
-
Patent number: 6151575
Abstract: A source-adapted model for use in speech recognition is generated by defining a linear relationship between a first element of an initial model and a first element of the source-adapted model. Thereafter, speech data that corresponds to the first element of the initial model is assembled from a set of speech data for a particular source associated with the source-adapted model. A linear transform that maps between the assembled speech data and the first element of the initial model is then determined. Finally, a first element of the source-adapted model is produced from the first element of the initial model using the linear transform.
Type: Grant
Filed: October 28, 1997
Date of Patent: November 21, 2000
Assignee: Dragon Systems, Inc.
Inventors: Michael Jack Newman, Laurence S. Gillick, Venkatesh Nagesha
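A rough sketch of the linear-transform adaptation in the abstract for patent 6151575 follows. It assumes that the model elements are Gaussian mean vectors, that the speech data assembled for each element is summarized by its average, and that one shared affine transform is fitted by ordinary least squares. These choices are illustrative and are not taken from the patent.

```python
import numpy as np


def fit_affine(initial_means: np.ndarray, speaker_means: np.ndarray):
    """Least-squares fit of A and b such that A @ mean + b approximates the source data."""
    ones = np.ones((initial_means.shape[0], 1))
    design = np.hstack([initial_means, ones])  # shape (K, d + 1)
    coeffs, *_ = np.linalg.lstsq(design, speaker_means, rcond=None)
    return coeffs[:-1].T, coeffs[-1]           # A has shape (d, d), b has shape (d,)


# Elements of the initial model (hypothetical 2-D means).
initial_means = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

# Averages of the source's speech data aligned to each element (synthetic here).
true_A = np.array([[1.1, 0.0], [0.2, 0.9]])
true_b = np.array([0.5, -0.3])
speaker_means = initial_means @ true_A.T + true_b

A, b = fit_affine(initial_means, speaker_means)
adapted_means = initial_means @ A.T + b  # elements of the source-adapted model
print(A, b)
```

Because the transform is shared, it can also be applied to model elements for which the source supplied little or no speech data.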