Patents by Inventor Venkatesh Nagesha

Venkatesh Nagesha has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Inverse text normalization for automatic speech recognition

Patent number: 10592604

Abstract: Techniques for inverse text normalization are provided. In some examples, speech input is received and a spoken-form text representation of the speech input is generated. The spoken-form text representation includes a token sequence. A feature representation is determined for the spoken-form text representation and a sequence of labels is determined based on the feature representation. The sequence of labels is assigned to the token sequence and specifies a plurality of edit operations to perform on the token sequence. Each edit operation of the plurality of edit operations corresponds to one of a plurality of predetermined types of edit operations. A written-form text representation of the speech input is generated by applying the plurality of edit operations to the token sequence in accordance with the sequence of labels. A task responsive to the speech input is performed using the generated written-form text representation.
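
As a rough illustration of the label-driven editing this abstract describes, the sketch below applies one label per token to turn spoken-form text into written form. The three edit types, the `join_prev` flag, and the `apply_labels` helper are hypothetical; the abstract says only that the edit types are predetermined, and here the labels are written by hand rather than predicted from a feature representation.

```python
from typing import List, Optional, Tuple

# Hypothetical edit types; the abstract only states that the types are predetermined.
KEEP, DELETE, REPLACE = "keep", "delete", "replace"

# (edit type, replacement text or None, attach to the previous output token?)
Label = Tuple[str, Optional[str], bool]


def apply_labels(tokens: List[str], labels: List[Label]) -> str:
    """Apply one edit label per spoken-form token and return written-form text."""
    if len(tokens) != len(labels):
        raise ValueError("need exactly one label per token")
    out: List[str] = []
    for token, (op, text, join_prev) in zip(tokens, labels):
        if op == DELETE:
            continue  # drop the token entirely
        if op == REPLACE:
            if text is None:
                raise ValueError("replace label needs replacement text")
            piece = text
        elif op == KEEP:
            piece = token
        else:
            raise ValueError(f"unknown edit type: {op}")
        if join_prev and out:
            out[-1] += piece  # no space between this piece and the previous one
        else:
            out.append(piece)
    return " ".join(out)


# Hand-written labels standing in for the output of a trained labeler.
spoken = ["twenty", "five", "percent"]
labels = [(REPLACE, "2", False), (REPLACE, "5", True), (REPLACE, "%", True)]
written = apply_labels(spoken, labels)
print(written)
```

With these labels the three spoken tokens collapse into a single written token; a real system would predict the labels from the token sequence instead of listing them by hand.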

Type: Grant

Filed: June 29, 2018

Date of Patent: March 17, 2020

Assignee: Apple Inc.

Inventors: Ernest J. Pusateri, Bharat Ram Ambati, Elizabeth S. Brooks, Donald R. McAllaster, Venkatesh Nagesha, Ondrej Platek

Inverse text normalization for automatic speech recognition

Publication number: 20190278841

Abstract: Techniques for inverse text normalization are provided. In some examples, speech input is received and a spoken-form text representation of the speech input is generated. The spoken-form text representation includes a token sequence. A feature representation is determined for the spoken-form text representation and a sequence of labels is determined based on the feature representation. The sequence of labels is assigned to the token sequence and specifies a plurality of edit operations to perform on the token sequence. Each edit operation of the plurality of edit operations corresponds to one of a plurality of predetermined types of edit operations. A written-form text representation of the speech input is generated by applying the plurality of edit operations to the token sequence in accordance with the sequence of labels. A task responsive to the speech input is performed using the generated written-form text representation.

Type: Application

Filed: June 29, 2018

Publication date: September 12, 2019

Inventors: Ernest J. Pusateri, Bharat Ram Ambati, Elizabeth S. Brooks, Donald R. McAllaster, Venkatesh Nagesha, Ondrej Platek

Methods and apparatus for training a transformation component

Patent number: 10062374

Abstract: According to some aspects, a method of training a transformation component using a trained acoustic model comprising first parameters having respective first values established during training of the acoustic model using first training data is provided. The method comprises using at least one computer processor to perform coupling the transformation component to a portion of the acoustic model, the transformation component comprising second parameters, and training the transformation component by determining, for the second parameters, respective second values using second training data input to the transformation component and processed by the acoustic model, wherein the acoustic model retains the first parameters having the respective first values throughout training of the transformation component.
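
The idea of training only the coupled component while the acoustic model keeps its values can be sketched with a deliberately tiny stand-in. Everything here is an assumption made for illustration: the "acoustic model" is a one-dimensional linear function with fixed `W` and `B`, the transformation component is an affine input transform with parameters `a` and `c`, and training is plain gradient descent on squared error.

```python
from typing import List, Tuple

# First parameters: a toy "acoustic model" y = W * x + B, treated as already trained.
# They are read during training below but never updated.
W, B = 2.0, 1.0


def acoustic_model(x: float) -> float:
    return W * x + B


def train_transform(
    data: List[Tuple[float, float]], lr: float = 0.02, steps: int = 2000
) -> Tuple[float, float]:
    """Fit the second parameters (a, c) of an input transform x -> a * x + c."""
    a, c = 1.0, 0.0  # start from the identity transform
    n = len(data)
    for _ in range(steps):
        grad_a = grad_c = 0.0
        for x, target in data:
            # The second training data passes through the transform, then the frozen model.
            err = acoustic_model(a * x + c) - target
            grad_a += 2.0 * err * W * x / n  # chain rule through the frozen model
            grad_c += 2.0 * err * W / n
        a -= lr * grad_a
        c -= lr * grad_c
    return a, c


# Synthetic second training data generated with a known transform (a=0.5, c=-1.0).
second_data = [(x, acoustic_model(0.5 * x - 1.0)) for x in (0.0, 1.0, 2.0, 3.0, 4.0)]
a, c = train_transform(second_data)
print(round(a, 4), round(c, 4))
```

Only `a` and `c` change inside the loop; `W` and `B` appear in the gradient but keep their original values throughout.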

Type: Grant

Filed: July 18, 2014

Date of Patent: August 28, 2018

Assignee: Nuance Communications, Inc.

Inventors: Xiaoqiang Xiao, Chengyuan Ma, Venkatesh Nagesha

Correction menu enrichment with alternate choices and generation of choice lists in multi-pass recognition systems

Patent number: 9858038

Abstract: A method is described for user correction of speech recognition results. A speech recognition result for a given unknown speech input is displayed to a user. A user selection is received of a portion of the recognition result needing to be corrected. For each of multiple different recognition data sources, a ranked list of alternate recognition choices is determined which correspond to the selected portion. The alternate recognition choices are concatenated or interleaved together and duplicate choices removed to form a single ranked output list of alternate recognition choices, which is displayed to the user. The method may be adaptive over time to derive preferences that can then be leveraged in the ordering of one choice list or across choice lists.
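
One way to picture the interleave-and-deduplicate step is the short sketch below. The function name `interleave_choices` and the sample word lists are hypothetical, and the adaptive reordering mentioned at the end of the abstract is not modeled.

```python
from itertools import zip_longest
from typing import List, Sequence


def interleave_choices(ranked_lists: Sequence[Sequence[str]]) -> List[str]:
    """Interleave several ranked choice lists, keeping the first copy of each choice."""
    merged: List[str] = []
    seen = set()
    # zip_longest walks rank 1 of every source, then rank 2, and so on,
    # padding exhausted lists with None.
    for tier in zip_longest(*ranked_lists):
        for choice in tier:
            if choice is not None and choice not in seen:
                seen.add(choice)
                merged.append(choice)
    return merged


# Hypothetical alternates for one selected portion, from two recognition data sources.
first_pass = ["their", "there", "they're"]
second_pass = ["there", "the air"]
menu = interleave_choices([first_pass, second_pass])
print(menu)
```

Concatenation, the other merge strategy the abstract names, would visit each source list in full before moving to the next instead of alternating by rank.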

Type: Grant

Filed: February 1, 2013

Date of Patent: January 2, 2018

Assignee: Nuance Communications, Inc.

Inventors: Olivier Divay, Joev Dubach, Venkatesh Nagesha, Allan Gold

Method and apparatus for speech recognition using neural networks with speaker adaptation

Patent number: 9721561

Abstract: In a speech recognition system, deep neural networks (DNNs) are employed in phoneme recognition. While DNNs typically provide better phoneme recognition performance than other techniques, such as Gaussian mixture models (GMM), adapting a DNN to a particular speaker is a real challenge. According to at least one example embodiment, speech data and corresponding speaker data are both applied as input to a DNN. In response, the DNN generates a prediction of a phoneme based on the input speech data and the corresponding speaker data. The speaker data may be generated from the corresponding speech data.

Type: Grant

Filed: December 5, 2013

Date of Patent: August 1, 2017

Assignee: Nuance Communications, Inc.

Inventors: Yun Tang, Venkatesh Nagesha, Xing Fan

Automatic methods to predict error rates and detect performance degradation

Patent number: 9269349

Abstract: An automatic speech recognition dictation application is described that includes a dictation module for performing automatic speech recognition in a dictation session with a speaker user to determine representative text corresponding to input speech from the speaker user. A post-processing module develops a session level metric correlated to verbatim recognition error rate of the dictation session, and determines if recognition performance degraded during the dictation session based on a comparison of the session metric to a baseline metric.

Type: Grant

Filed: May 24, 2012

Date of Patent: February 23, 2016

Assignee: Nuance Communications, Inc.

Inventors: Xiaoqiang Xiao, Venkatesh Nagesha

Methods and apparatus for training a transformation component

Publication number: 20160019884

Abstract: According to some aspects, a method of training a transformation component using a trained acoustic model comprising first parameters having respective first values established during training of the acoustic model using first training data is provided. The method comprises using at least one computer processor to perform coupling the transformation component to a portion of the acoustic model, the transformation component comprising second parameters, and training the transformation component by determining, for the second parameters, respective second values using second training data input to the transformation component and processed by the acoustic model, wherein the acoustic model retains the first parameters having the respective first values throughout training of the transformation component.

Type: Application

Filed: July 18, 2014

Publication date: January 21, 2016

Inventors: Xiaoqiang Xiao, Chengyuan Ma, Venkatesh Nagesha

Method and Apparatus for Speech Recognition Using Neural Networks with Speaker Adaptation

Publication number: 20150161994

Abstract: In a speech recognition system, deep neural networks (DNNs) are employed in phoneme recognition. While DNNs typically provide better phoneme recognition performance than other techniques, such as Gaussian mixture models (GMM), adapting a DNN to a particular speaker is a real challenge. According to at least one example embodiment, speech data and corresponding speaker data are both applied as input to a DNN. In response, the DNN generates a prediction of a phoneme based on the input speech data and the corresponding speaker data. The speaker data may be generated from the corresponding speech data.

Type: Application

Filed: December 5, 2013

Publication date: June 11, 2015

Applicant: Nuance Communications, Inc.

Inventors: Yun Tang, Venkatesh Nagesha, Xing Fan

Efficient exploitation of model complementariness by low confidence re-scoring in automatic speech recognition

Patent number: 9037463

Abstract: A method for speech recognition is described that uses an initial recognizer to perform an initial speech recognition pass on an input speech utterance to determine an initial recognition result corresponding to the input speech utterance, and a reliability measure reflecting a per word reliability of the initial recognition result. For portions of the initial recognition result where the reliability of the result is low, a re-evaluation recognizer is used to perform a re-evaluation recognition pass on the corresponding portions of the input speech utterance to determine a re-evaluation recognition result corresponding to the re-evaluated portions of the input speech utterance. The initial recognizer and the re-evaluation recognizer are complementary so as to make different recognition errors. A final recognition result is determined based on the re-evaluation recognition result if any, and otherwise based on the initial recognition result.
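
The two-pass control flow in this abstract can be sketched as follows. The `Word` record, the 0.5 threshold, and the dictionary-backed second recognizer are all hypothetical stand-ins; real recognizers operate on audio rather than on string segment identifiers.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Word:
    text: str          # hypothesis from the initial recognizer
    confidence: float  # per-word reliability in [0, 1]
    segment: str       # stand-in for the audio span behind this word


def rescore(
    initial: List[Word],
    reevaluate: Callable[[str], str],
    threshold: float = 0.5,  # assumed cut-off for "low" reliability
) -> List[str]:
    """Re-recognize only the low-confidence portions; keep the rest unchanged."""
    final: List[str] = []
    for word in initial:
        if word.confidence < threshold:
            # Second pass runs only on the portion the first pass was unsure about.
            final.append(reevaluate(word.segment))
        else:
            final.append(word.text)
    return final


def toy_second_pass(segment: str) -> str:
    # Toy re-evaluation recognizer; it only knows the one segment below.
    return {"audio-2": "speech"}[segment]


initial = [Word("recognize", 0.9, "audio-1"), Word("beach", 0.3, "audio-2")]
result = rescore(initial, toy_second_pass)
print(result)
```

Because the second recognizer is called only for low-confidence words, its cost scales with the uncertain portion of the utterance rather than with the whole input.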

Type: Grant

Filed: May 27, 2010

Date of Patent: May 19, 2015

Assignee: Nuance Communications, Inc.

Inventors: Daniel Willett, Venkatesh Nagesha

Correction Menu Enrichment with Alternate Choices and Generation of Choice Lists in Multi-Pass Recognition Systems

Publication number: 20140223310

Abstract: A method is described for user correction of speech recognition results. A speech recognition result for a given unknown speech input is displayed to a user. A user selection is received of a portion of the recognition result needing to be corrected. For each of multiple different recognition data sources, a ranked list of alternate recognition choices is determined which correspond to the selected portion. The alternate recognition choices are concatenated or interleaved together and duplicate choices removed to form a single ranked output list of alternate recognition choices, which is displayed to the user. The method may be adaptive over time to derive preferences that can then be leveraged in the ordering of one choice list or across choice lists.

Type: Application

Filed: February 1, 2013

Publication date: August 7, 2014

Applicant: Nuance Communications, Inc.

Inventors: Olivier Divay, Joev Dubach, Venkatesh Nagesha, Allan Gold

Channel normalization using recognition feedback

Patent number: 8768695

Abstract: A computer-implemented arrangement is described for performing cepstral mean normalization (CMN) in automatic speech recognition. A current CMN function is stored in a computer memory as a previous CMN function. The current CMN function is updated based on a current audio input to produce an updated CMN function. The updated CMN function is used to process the current audio input to produce a processed audio input. Automatic speech recognition of the processed audio input is performed to determine representative text. If the audio input is not recognized as representative text, the updated CMN function is replaced with the previous CMN function.
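
The save, update, recognize, and roll back sequence can be sketched as below. The `FeedbackCMN` class, the exponential smoothing update with weight `alpha`, and the convention that the recognizer returns `None` on failure are assumptions made for illustration; the abstract does not say how the CMN function is represented or updated.

```python
from typing import Callable, List, Optional, Sequence

Frame = Sequence[float]


class FeedbackCMN:
    """Running cepstral mean that is rolled back when recognition fails."""

    def __init__(self, dim: int, alpha: float = 0.9) -> None:
        self.mean = [0.0] * dim
        self.alpha = alpha  # smoothing weight; an assumed update rule

    def process(
        self,
        frames: List[Frame],
        recognize: Callable[[List[List[float]]], Optional[str]],
    ) -> Optional[str]:
        if not frames:
            raise ValueError("need at least one frame")
        previous = list(self.mean)  # store the current function as the previous one
        utt_mean = [sum(col) / len(frames) for col in zip(*frames)]
        # Update the current mean from the current audio input.
        self.mean = [
            self.alpha * m + (1.0 - self.alpha) * u
            for m, u in zip(self.mean, utt_mean)
        ]
        # Normalize the input with the updated mean, then attempt recognition.
        normalized = [[x - m for x, m in zip(frame, self.mean)] for frame in frames]
        text = recognize(normalized)
        if text is None:
            self.mean = previous  # not recognized: discard the update
        return text


cmn = FeedbackCMN(dim=2, alpha=0.5)
accepted = cmn.process([[2.0, 4.0], [4.0, 8.0]], lambda feats: "hello")
mean_after_speech = list(cmn.mean)
rejected = cmn.process([[100.0, 100.0]], lambda feats: None)
mean_after_noise = list(cmn.mean)
print(mean_after_speech, mean_after_noise)
```

The second call models a burst of non-speech audio: recognition fails, so the running mean is left exactly as it was after the first utterance.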

Type: Grant

Filed: June 13, 2012

Date of Patent: July 1, 2014

Assignee: Nuance Communications, Inc.

Inventors: Yun Tang, Venkatesh Nagesha

Channel Normalization Using Recognition Feedback

Publication number: 20130339014

Abstract: A computer-implemented arrangement is described for performing cepstral mean normalization (CMN) in automatic speech recognition. A current CMN function is stored in a computer memory as a previous CMN function. The current CMN function is updated based on a current audio input to produce an updated CMN function. The updated CMN function is used to process the current audio input to produce a processed audio input. Automatic speech recognition of the processed audio input is performed to determine representative text. If the audio input is not recognized as representative text, the updated CMN function is replaced with the previous CMN function.

Type: Application

Filed: June 13, 2012

Publication date: December 19, 2013

Applicant: Nuance Communications, Inc.

Inventors: Yun Tang, Venkatesh Nagesha

Automatic Methods to Predict Error Rates and Detect Performance Degradation

Publication number: 20130317820

Abstract: An automatic speech recognition dictation application is described that includes a dictation module for performing automatic speech recognition in a dictation session with a speaker user to determine representative text corresponding to input speech from the speaker user. A post-processing module develops a session level metric correlated to verbatim recognition error rate of the dictation session, and determines if recognition performance degraded during the dictation session based on a comparison of the session metric to a baseline metric.

Type: Application

Filed: May 24, 2012

Publication date: November 28, 2013

Applicant: Nuance Communications, Inc.

Inventors: Xiaoqiang Xiao, Venkatesh Nagesha

Efficient Exploitation of Model Complementariness by Low Confidence Re-Scoring in Automatic Speech Recognition

Publication number: 20120259627

Abstract: A method for speech recognition is described that uses an initial recognizer to perform an initial speech recognition pass on an input speech utterance to determine an initial recognition result corresponding to the input speech utterance, and a reliability measure reflecting a per word reliability of the initial recognition result. For portions of the initial recognition result where the reliability of the result is low, a re-evaluation recognizer is used to perform a re-evaluation recognition pass on the corresponding portions of the input speech utterance to determine a re-evaluation recognition result corresponding to the re-evaluated portions of the input speech utterance. The initial recognizer and the re-evaluation recognizer are complementary so as to make different recognition errors. A final recognition result is determined based on the re-evaluation recognition result if any, and otherwise based on the initial recognition result.

Type: Application

Filed: May 27, 2010

Publication date: October 11, 2012

Applicant: Nuance Communications, Inc.

Inventors: Daniel Willett, Venkatesh Nagesha

Rapid adaptation of speech models

Patent number: 6151575

Abstract: A source-adapted model for use in speech recognition is generated by defining a linear relationship between a first element of an initial model and a first element of the source-adapted model. Thereafter, speech data that corresponds to the first element of the initial model is assembled from a set of speech data for a particular source associated with the source-adapted model. A linear transform that maps between the assembled speech data and the first element of the initial model is then determined. Finally, a first element of the source-adapted model is produced from the first element of the initial model using the linear transform.
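
To give a feel for adapting model elements with a linear transform, the sketch below swaps in a simple one-dimensional least-squares affine fit: it estimates `adapted = a * initial + b` from the model elements that have speaker data and applies the result to every element. This is a simplification chosen for illustration, not the procedure claimed in the patent, and the element names, values, and the `fit_affine` helper are hypothetical.

```python
from typing import Dict, List, Tuple


def fit_affine(pairs: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares fit of y = a * x + b over (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    if sxx == 0.0:
        raise ValueError("need at least two distinct initial values")
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    a = sxy / sxx
    return a, my - a * mx


# Hypothetical initial model: one scalar mean per model element.
initial_means: Dict[str, float] = {"aa": 1.0, "iy": 2.0, "uw": 3.0}
# Speech data from one source, assembled per element; "uw" has no data yet.
speaker_data: Dict[str, List[float]] = {"aa": [3.0, 3.0], "iy": [5.0]}

# Pair each initial mean with the average of the speaker data assembled for it.
pairs = [
    (initial_means[name], sum(obs) / len(obs)) for name, obs in speaker_data.items()
]
a, b = fit_affine(pairs)

# Produce source-adapted elements from the initial elements using the transform.
adapted = {name: a * mean + b for name, mean in initial_means.items()}
print(adapted)
```

Because the transform is shared, the element with no speaker data ("uw") is still moved toward the new source, which is what makes this style of adaptation usable with little data.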

Type: Grant

Filed: October 28, 1997

Date of Patent: November 21, 2000

Assignee: Dragon Systems, Inc.

Inventors: Michael Jack Newman, Laurence S. Gillick, Venkatesh Nagesha