Patents by Inventor Venkatesh Nagesha

Venkatesh Nagesha has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10592604
    Abstract: Techniques for inverse text normalization are provided. In some examples, speech input is received and a spoken-form text representation of the speech input is generated. The spoken-form text representation includes a token sequence. A feature representation is determined for the spoken-form text representation and a sequence of labels is determined based on the feature representation. The sequence of labels is assigned to the token sequence and specifies a plurality of edit operations to perform on the token sequence. Each edit operation of the plurality of edit operations corresponds to one of a plurality of predetermined types of edit operations. A written-form text representation of the speech input is generated by applying the plurality of edit operations to the token sequence in accordance with the sequence of labels. A task responsive to the speech input is performed using the generated written-form text representation.
    Type: Grant
    Filed: June 29, 2018
    Date of Patent: March 17, 2020
    Assignee: Apple Inc.
    Inventors: Ernest J. Pusateri, Bharat Ram Ambati, Elizabeth S. Brooks, Donald R. McAllaster, Venkatesh Nagesha, Ondrej Platek
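The edit-operation scheme described in the abstract above can be sketched in a few lines. This is a minimal illustration, not the patented method: the abstract's learned labeler (features, label prediction) is replaced here by a hand-written label sequence, and the label names and spacing convention are invented for the example.

```python
# Minimal sketch of inverse text normalization as per-token edit
# operations. The label sequence would come from a trained model; here it
# is given by hand, and the operation names are illustrative.

def apply_edits(tokens, labels):
    """Apply one predetermined edit operation per token and join."""
    pieces = []
    for tok, label in zip(tokens, labels):
        kind = label[0]
        if kind == "copy":                       # keep the spoken token
            pieces.append((tok, True))
        elif kind == "rewrite":                  # substitute a written form
            pieces.append((label[1], label[2]))  # (text, space-after?)
        elif kind == "delete":                   # drop the token
            continue
    text = "".join(p + (" " if space else "") for p, space in pieces)
    return text.strip()

spoken = ["it", "costs", "twenty", "three", "dollars"]
labels = [
    ("copy",), ("copy",),
    ("rewrite", "$2", False),   # "twenty" -> "$2", no space after
    ("rewrite", "3", True),     # "three"  -> "3"
    ("delete",),                # "dollars" absorbed into the "$" sign
]
print(apply_edits(spoken, labels))  # -> "it costs $23"
```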
  • Publication number: 20190278841
    Abstract: Techniques for inverse text normalization are provided. In some examples, speech input is received and a spoken-form text representation of the speech input is generated. The spoken-form text representation includes a token sequence. A feature representation is determined for the spoken-form text representation and a sequence of labels is determined based on the feature representation. The sequence of labels is assigned to the token sequence and specifies a plurality of edit operations to perform on the token sequence. Each edit operation of the plurality of edit operations corresponds to one of a plurality of predetermined types of edit operations. A written-form text representation of the speech input is generated by applying the plurality of edit operations to the token sequence in accordance with the sequence of labels. A task responsive to the speech input is performed using the generated written-form text representation.
    Type: Application
    Filed: June 29, 2018
    Publication date: September 12, 2019
    Inventors: Ernest J. Pusateri, Bharat Ram Ambati, Elizabeth S. Brooks, Donald R. McAllaster, Venkatesh Nagesha, Ondrej Platek
  • Patent number: 10062374
    Abstract: According to some aspects, a method of training a transformation component using a trained acoustic model comprising first parameters having respective first values established during training of the acoustic model using first training data is provided. The method comprises using at least one computer processor to perform coupling the transformation component to a portion of the acoustic model, the transformation component comprising second parameters, and training the transformation component by determining, for the second parameters, respective second values using second training data input to the transformation component and processed by the acoustic model, wherein the acoustic model retains the first parameters having the respective first values throughout training of the transformation component.
    Type: Grant
    Filed: July 18, 2014
    Date of Patent: August 28, 2018
    Assignee: Nuance Communications, Inc.
    Inventors: Xiaoqiang Xiao, Chengyuan Ma, Venkatesh Nagesha
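The key constraint in the abstract above — the acoustic model keeps its first parameter values while only the coupled transformation component is trained on second data — can be illustrated with toy linear models. Everything here (shapes, the linear "acoustic model", squared-error training) is an illustrative stand-in, not the patented system.

```python
import numpy as np

# Sketch of training a transformation component against a frozen model:
# W (the "acoustic model") retains its values; only T is updated by
# gradient descent on adaptation data. All models and data are toy.

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))      # trained acoustic-model weights (frozen)
T = np.eye(3)                        # transformation component (trainable)

X = rng.standard_normal((100, 3))    # second (adaptation) training data
A = rng.standard_normal((3, 3))      # hidden "true" feature shift
Y = (X @ A.T) @ W.T                  # targets generated through that shift

W_before = W.copy()
for _ in range(500):                 # update T only; W is never touched
    pred = (X @ T.T) @ W.T
    grad_T = W.T @ (pred - Y).T @ X / len(X)
    T -= 0.05 * grad_T

loss = float(np.mean(((X @ T.T) @ W.T - Y) ** 2))
print(f"final loss: {loss:.4f}")
assert np.array_equal(W, W_before)   # acoustic model retained its values
```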
  • Patent number: 9858038
    Abstract: A method is described for user correction of speech recognition results. A speech recognition result for a given unknown speech input is displayed to a user. A user selection is received of a portion of the recognition result needing to be corrected. For each of multiple different recognition data sources, a ranked list of alternate recognition choices is determined which correspond to the selected portion. The alternate recognition choices are concatenated or interleaved together and duplicate choices removed to form a single ranked output list of alternate recognition choices, which is displayed to the user. The method may be adaptive over time to derive preferences that can then be leveraged in the ordering of one choice list or across choice lists.
    Type: Grant
    Filed: February 1, 2013
    Date of Patent: January 2, 2018
    Assignee: Nuance Communications, Inc.
    Inventors: Olivier Divay, Joev Dubach, Venkatesh Nagesha, Allan Gold
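The interleave-and-deduplicate step in the abstract above is simple to sketch. The round-robin interleaving order and the source names are assumptions for illustration; the patent also covers plain concatenation and adaptive reordering, which this sketch omits.

```python
# Sketch of merging per-source ranked alternate-choice lists into one
# ranked list: interleave round-robin, drop duplicates on first sight.

def merge_choices(*ranked_lists):
    """Interleave ranked lists by rank, keeping each choice once."""
    merged, seen = [], set()
    for rank in range(max(map(len, ranked_lists), default=0)):
        for choices in ranked_lists:
            if rank < len(choices) and choices[rank] not in seen:
                seen.add(choices[rank])
                merged.append(choices[rank])
    return merged

lattice_alts = ["there", "their", "they're"]     # illustrative source 1
dictionary_alts = ["their", "there's"]           # illustrative source 2
print(merge_choices(lattice_alts, dictionary_alts))
# -> ['there', 'their', "there's", "they're"]
```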
  • Patent number: 9721561
    Abstract: In a speech recognition system, deep neural networks (DNNs) are employed in phoneme recognition. While DNNs typically provide better phoneme recognition performance than other techniques, such as Gaussian mixture models (GMM), adapting a DNN to a particular speaker is a real challenge. According to at least one example embodiment, speech data and corresponding speaker data are both applied as input to a DNN. In response, the DNN generates a prediction of a phoneme based on the input speech data and the corresponding speaker data. The speaker data may be generated from the corresponding speech data.
    Type: Grant
    Filed: December 5, 2013
    Date of Patent: August 1, 2017
    Assignee: Nuance Communications, Inc.
    Inventors: Yun Tang, Venkatesh Nagesha, Xing Fan
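The core idea in the abstract above — feeding speaker data alongside the speech features as joint DNN input — reduces to concatenation at the input layer. The tiny random-weight network below is only a shape-level illustration; layer sizes, weights, and the speaker vector are all placeholders.

```python
import numpy as np

# Sketch of applying speech data and speaker data together as DNN input:
# the speaker vector is concatenated onto each acoustic frame before the
# forward pass. Weights are random placeholders, not a trained model.

rng = np.random.default_rng(1)
n_feat, n_spk, n_hidden, n_phones = 40, 10, 32, 5

W1 = rng.standard_normal((n_hidden, n_feat + n_spk)) * 0.1
W2 = rng.standard_normal((n_phones, n_hidden)) * 0.1

def predict_phoneme(frame, speaker_vec):
    """Forward pass over the joint input [acoustic frame ; speaker vector]."""
    x = np.concatenate([frame, speaker_vec])   # joint network input
    h = np.maximum(0.0, W1 @ x)                # ReLU hidden layer
    return int(np.argmax(W2 @ h))              # predicted phoneme index

frame = rng.standard_normal(n_feat)
speaker = rng.standard_normal(n_spk)           # e.g. derived from the speech
print(predict_phoneme(frame, speaker))
```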
  • Patent number: 9269349
    Abstract: An automatic speech recognition dictation application is described that includes a dictation module for performing automatic speech recognition in a dictation session with a speaker user to determine representative text corresponding to input speech from the speaker user. A post-processing module develops a session level metric correlated to verbatim recognition error rate of the dictation session, and determines if recognition performance degraded during the dictation session based on a comparison of the session metric to a baseline metric.
    Type: Grant
    Filed: May 24, 2012
    Date of Patent: February 23, 2016
    Assignee: Nuance Communications, Inc.
    Inventors: Xiaoqiang Xiao, Venkatesh Nagesha
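The post-processing check in the abstract above compares a session-level metric against a baseline. In this sketch the metric is the fraction of low-confidence words — an invented stand-in for whatever metric correlates with verbatim error rate — and the threshold and margin values are arbitrary.

```python
# Sketch of degradation detection: compute a session-level proxy metric
# and flag the session if it exceeds a baseline by some margin. The
# low-confidence-fraction metric and all constants are illustrative.

def session_metric(word_confidences, threshold=0.5):
    """Fraction of words recognized with low confidence."""
    low = sum(1 for c in word_confidences if c < threshold)
    return low / len(word_confidences)

def degraded(word_confidences, baseline, margin=0.1):
    """True if the session metric exceeds the baseline by the margin."""
    return session_metric(word_confidences) > baseline + margin

session = [0.9, 0.4, 0.3, 0.8, 0.2, 0.95]   # per-word confidences
print(session_metric(session))               # 3 of 6 below 0.5 -> 0.5
print(degraded(session, baseline=0.2))       # 0.5 > 0.3 -> True
```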
  • Publication number: 20160019884
    Abstract: According to some aspects, a method of training a transformation component using a trained acoustic model comprising first parameters having respective first values established during training of the acoustic model using first training data is provided. The method comprises using at least one computer processor to perform coupling the transformation component to a portion of the acoustic model, the transformation component comprising second parameters, and training the transformation component by determining, for the second parameters, respective second values using second training data input to the transformation component and processed by the acoustic model, wherein the acoustic model retains the first parameters having the respective first values throughout training of the transformation component.
    Type: Application
    Filed: July 18, 2014
    Publication date: January 21, 2016
    Inventors: Xiaoqiang Xiao, Chengyuan Ma, Venkatesh Nagesha
  • Publication number: 20150161994
    Abstract: In a speech recognition system, deep neural networks (DNNs) are employed in phoneme recognition. While DNNs typically provide better phoneme recognition performance than other techniques, such as Gaussian mixture models (GMM), adapting a DNN to a particular speaker is a real challenge. According to at least one example embodiment, speech data and corresponding speaker data are both applied as input to a DNN. In response, the DNN generates a prediction of a phoneme based on the input speech data and the corresponding speaker data. The speaker data may be generated from the corresponding speech data.
    Type: Application
    Filed: December 5, 2013
    Publication date: June 11, 2015
    Applicant: Nuance Communications, Inc.
    Inventors: Yun Tang, Venkatesh Nagesha, Xing Fan
  • Patent number: 9037463
    Abstract: A method for speech recognition is described that uses an initial recognizer to perform an initial speech recognition pass on an input speech utterance to determine an initial recognition result corresponding to the input speech utterance, and a reliability measure reflecting a per word reliability of the initial recognition result. For portions of the initial recognition result where the reliability of the result is low, a re-evaluation recognizer is used to perform a re-evaluation recognition pass on the corresponding portions of the input speech utterance to determine a re-evaluation recognition result corresponding to the re-evaluated portions of the input speech utterance. The initial recognizer and the re-evaluation recognizer are complementary so as to make different recognition errors. A final recognition result is determined based on the re-evaluation recognition result if any, and otherwise based on the initial recognition result.
    Type: Grant
    Filed: May 27, 2010
    Date of Patent: May 19, 2015
    Assignee: Nuance Communications, Inc.
    Inventors: Daniel Willett, Venkatesh Nagesha
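The two-pass flow in the abstract above can be sketched at the word level: keep first-pass words whose reliability clears a threshold, and re-decode the rest with a second recognizer. The dictionary-lookup "second pass", the threshold, and the example words are all placeholders for real recognizers.

```python
# Sketch of reliability-gated two-pass recognition: low-reliability words
# from the initial pass are re-decoded by a complementary recognizer and
# the final result merges both passes. Recognizers here are stand-ins.

def final_result(initial, reliability, reevaluate, threshold=0.6):
    """Keep reliable first-pass words; re-decode unreliable ones."""
    out = []
    for word, score in zip(initial, reliability):
        if score >= threshold:
            out.append(word)              # trust the initial pass
        else:
            out.append(reevaluate(word))  # second, complementary pass
    return " ".join(out)

# Placeholder second-pass recognizer (identity fallback on lookup miss).
corrections = {"wreck": "recognize", "beach": "speech"}
second_pass = lambda w: corrections.get(w, w)

initial = ["wreck", "a", "nice", "beach"]
reliability = [0.3, 0.9, 0.8, 0.4]       # per-word reliability measure
print(final_result(initial, reliability, second_pass))
# -> "recognize a nice speech"
```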
  • Publication number: 20140223310
    Abstract: A method is described for user correction of speech recognition results. A speech recognition result for a given unknown speech input is displayed to a user. A user selection is received of a portion of the recognition result needing to be corrected. For each of multiple different recognition data sources, a ranked list of alternate recognition choices is determined which correspond to the selected portion. The alternate recognition choices are concatenated or interleaved together and duplicate choices removed to form a single ranked output list of alternate recognition choices, which is displayed to the user. The method may be adaptive over time to derive preferences that can then be leveraged in the ordering of one choice list or across choice lists.
    Type: Application
    Filed: February 1, 2013
    Publication date: August 7, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: Olivier Divay, Joev Dubach, Venkatesh Nagesha, Allan Gold
  • Patent number: 8768695
    Abstract: A computer-implemented arrangement is described for performing cepstral mean normalization (CMN) in automatic speech recognition. A current CMN function is stored in a computer memory as a previous CMN function. The current CMN function is updated based on a current audio input to produce an updated CMN function. The updated CMN function is used to process the current audio input to produce a processed audio input. Automatic speech recognition of the processed audio input is performed to determine representative text. If the audio input is not recognized as representative text, the updated CMN function is replaced with the previous CMN function.
    Type: Grant
    Filed: June 13, 2012
    Date of Patent: July 1, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Yun Tang, Venkatesh Nagesha
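The save-update-rollback cycle in the abstract above can be sketched with a running cepstral mean. The exponential-average update, the subtraction-based normalization, and the stand-in `recognized` callback are illustrative choices, not the patented formulation.

```python
import numpy as np

# Sketch of checkpoint-and-rollback CMN: the current CMN function (a
# running mean) is saved before each update and restored if recognition
# of the normalized audio fails. Update rule and shapes are illustrative.

class RollbackCMN:
    def __init__(self, dim, alpha=0.9):
        self.mean = np.zeros(dim)   # current CMN function
        self.alpha = alpha          # smoothing for the running mean

    def process(self, frames, recognized):
        previous = self.mean.copy()                   # store current CMN
        self.mean = (self.alpha * self.mean
                     + (1 - self.alpha) * frames.mean(axis=0))  # update
        normalized = frames - self.mean               # apply updated CMN
        if not recognized(normalized):
            self.mean = previous                      # roll back on failure
        return normalized

cmn = RollbackCMN(dim=2)
frames = np.ones((5, 2))
cmn.process(frames, recognized=lambda x: False)       # recognition fails
print(cmn.mean)                                       # restored to previous
```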
  • Publication number: 20130339014
    Abstract: A computer-implemented arrangement is described for performing cepstral mean normalization (CMN) in automatic speech recognition. A current CMN function is stored in a computer memory as a previous CMN function. The current CMN function is updated based on a current audio input to produce an updated CMN function. The updated CMN function is used to process the current audio input to produce a processed audio input. Automatic speech recognition of the processed audio input is performed to determine representative text. If the audio input is not recognized as representative text, the updated CMN function is replaced with the previous CMN function.
    Type: Application
    Filed: June 13, 2012
    Publication date: December 19, 2013
    Applicant: Nuance Communications, Inc.
    Inventors: Yun Tang, Venkatesh Nagesha
  • Publication number: 20130317820
    Abstract: An automatic speech recognition dictation application is described that includes a dictation module for performing automatic speech recognition in a dictation session with a speaker user to determine representative text corresponding to input speech from the speaker user. A post-processing module develops a session level metric correlated to verbatim recognition error rate of the dictation session, and determines if recognition performance degraded during the dictation session based on a comparison of the session metric to a baseline metric.
    Type: Application
    Filed: May 24, 2012
    Publication date: November 28, 2013
    Applicant: Nuance Communications, Inc.
    Inventors: Xiaoqiang Xiao, Venkatesh Nagesha
  • Publication number: 20120259627
    Abstract: A method for speech recognition is described that uses an initial recognizer to perform an initial speech recognition pass on an input speech utterance to determine an initial recognition result corresponding to the input speech utterance, and a reliability measure reflecting a per word reliability of the initial recognition result. For portions of the initial recognition result where the reliability of the result is low, a re-evaluation recognizer is used to perform a re-evaluation recognition pass on the corresponding portions of the input speech utterance to determine a re-evaluation recognition result corresponding to the re-evaluated portions of the input speech utterance. The initial recognizer and the re-evaluation recognizer are complementary so as to make different recognition errors. A final recognition result is determined based on the re-evaluation recognition result if any, and otherwise based on the initial recognition result.
    Type: Application
    Filed: May 27, 2010
    Publication date: October 11, 2012
    Applicant: Nuance Communications, Inc.
    Inventors: Daniel Willett, Venkatesh Nagesha
  • Patent number: 6151575
    Abstract: A source-adapted model for use in speech recognition is generated by defining a linear relationship between a first element of an initial model and a first element of the source-adapted model. Thereafter, speech data that corresponds to the first element of the initial model is assembled from a set of speech data for a particular source associated with the source-adapted model. A linear transform that maps between the assembled speech data and the first element of the initial model is then determined. Finally, a first element of the source-adapted model is produced from the first element of the initial model using the linear transform.
    Type: Grant
    Filed: October 28, 1997
    Date of Patent: November 21, 2000
    Assignee: Dragon Systems, Inc.
    Inventors: Michael Jack Newman, Laurence S. Gillick, Venkatesh Nagesha
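The linear-transform adaptation in the last abstract above (an MLLR-style idea) can be sketched as a least-squares fit: estimate a transform between the initial model's elements and the speaker's assembled data, then apply it to produce the adapted elements. The toy means, the shared transform across components, and the least-squares estimator are illustrative assumptions.

```python
import numpy as np

# Sketch of source adaptation via a linear transform: fit A so that the
# initial model means map onto the speaker's assembled data means, then
# apply A to produce the source-adapted means. All values are toy data.

initial_means = np.array([[1.0, 0.0],
                          [0.0, 1.0],
                          [1.0, 1.0]])              # initial model elements
speaker_means = np.array([[1.2, 0.1],
                          [0.1, 1.3],
                          [1.3, 1.4]])              # assembled speaker data

# Least-squares solve for A with initial_means @ A ~= speaker_means.
A, *_ = np.linalg.lstsq(initial_means, speaker_means, rcond=None)
adapted_means = initial_means @ A                   # source-adapted model

print(np.round(adapted_means, 2))
```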