Patents by Inventor Rupeshkumar Rasiklal MEHTA

Rupeshkumar Rasiklal MEHTA has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11978434
    Abstract: A computer-implemented technique identifies terms in an original reference transcription and original ASR output results that are considered valid variants of each other, even though these terms have different textual forms. Based on this finding, the technique produces a normalized reference transcription and normalized ASR output results in which valid variants are assigned the same textual form. In some implementations, the technique uses the normalized text to develop a model for an ASR system. For example, the technique may generate a word error rate (WER) measure by comparing the normalized reference transcription with the normalized ASR output results, and use the WER measure as guidance in developing the model. Some aspects of the technique involve identifying occasions in which a term can be properly split into component parts. Other aspects can identify other ways in which two terms may vary in spelling, but nonetheless remain valid variants.
    Type: Grant
    Filed: September 29, 2021
    Date of Patent: May 7, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Satarupa Guha, Ankur Gupta, Rahul Ambavat, Rupeshkumar Rasiklal Mehta
  • Patent number: 11972758
    Abstract: A training-stage technique trains a language model for use in an ASR system. The technique includes: obtaining a training corpus that includes a sequence of terms; determining that an original term in the training corpus is not present in a dictionary resource; segmenting the original term into two or more sub-terms using a segmentation resource; determining that the segmentation of the original term into the two or more sub-terms is a valid segmentation, based on two or more validity tests; and training the language model based on the terms that have been identified. A computer-implemented inference-stage technique applies the language model to produce ASR output results. The inference-stage technique merges a sub-term with a preceding term if these two terms are separated by no more than a prescribed interval of time.
    Type: Grant
    Filed: September 29, 2021
    Date of Patent: April 30, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Rahul Ambavat, Ankur Gupta, Rupeshkumar Rasiklal Mehta
  • Publication number: 20240005912
    Abstract: Systems, methods, and computer-readable storage devices are disclosed for improved recognition of multiple languages in audio data. One method including: receiving a trained split head multilingual neural network model, the trained split head multilingual neural network model including shared acoustic model layers and a plurality of projection layers, each projection layer of the plurality of projection layers corresponding to a language that the trained split head multilingual neural network model recognizes; receiving audio data, the audio data including speech in a plurality of languages in the audio data, the speech in the plurality of languages corresponding the language recognized by a projection layer of the plurality of projection layers of the trained split head multilingual neural network model; and classifying one or more languages of the speech of the audio data using the trained split head multilingual neural network model.
    Type: Application
    Filed: June 29, 2022
    Publication date: January 4, 2024
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Purvi AGRAWAL, Vikas JOSHI, Basil ABRAHAM, Tejaswi SEERAM, Rupeshkumar Rasiklal MEHTA
  • Publication number: 20230290345
    Abstract: An automatic speech recognition (ASR) system recognizes speech expressed in different languages. The ASR system includes a language-agnostic encoding component and prediction component. A language-specific joint analysis system generates first-language probabilities for symbols of a first language and second-language probabilities for symbols of a second language, based on outputs generated by the encoding component and the prediction component. The ASR system then modifies the probabilities produced by the joint analysis system by language-specific weighting information that, in turn, is produced by an attention system. This yields modified first-language probabilities and modified second-language probabilities. Finally, the ASR system predicts an updated instance of label information based on the modified first-language probabilities and the modified second-language probabilities. The ASR system can be successfully applied to recognize an utterance that combines words in two or more languages.
    Type: Application
    Filed: March 8, 2022
    Publication date: September 14, 2023
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Vikas JOSHI, Purvi AGRAWAL, Rupeshkumar Rasiklal MEHTA, Aditya Rajesh PATIL
  • Publication number: 20230111824
    Abstract: A text to speech (TTS) model is trained based on training data including text samples. The text samples are provided to a text embedding model for outputting text embeddings for the text samples. The text embeddings are clustered into several clusters of text embeddings. The several clusters are representative of variations in emotion. The TTS model is then trained based upon the several clusters of text embeddings. Upon being trained, the TTS model is configured to receive text input and output a spoken utterance that corresponds to the text input. The TTS model is configured to output the spoken utterance with emotion. The emotion is based upon the text input and the training of the TTS model.
    Type: Application
    Filed: February 22, 2022
    Publication date: April 13, 2023
    Inventors: Arijit MUKHERJEE, Shubham BANSAL, Sandeepkumar SATPAL, Rupeshkumar Rasiklal MEHTA
  • Publication number: 20230102338
    Abstract: A training-stage technique trains a language model for use in an ASR system. The technique includes: obtaining a training corpus that includes a sequence of terms; determining that an original term in the training corpus is not present in a dictionary resource; segmenting the original term into two or more sub-terms using a segmentation resource; determining that the segmentation of the original term into the two or more sub-terms is a valid segmentation, based on two or more validity tests; and training the language model based on the terms that have been identified. A computer-implemented inference-stage technique applies the language model to produce ASR output results. The inference-stage technique merges a sub-term with a preceding term if these two terms are separated by no more than a prescribed interval of time.
    Type: Application
    Filed: September 29, 2021
    Publication date: March 30, 2023
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Rahul AMBAVAT, Ankur GUPTA, Rupeshkumar Rasiklal MEHTA
  • Publication number: 20230099732
    Abstract: A computing system obtains text that includes words and provides the text as input to an emotional classifier model that has been trained based upon emotional classification. The computing system obtains a textual embedding of the computer-readable text as output of the emotional classifier model. The computing system generates a phoneme sequence based upon the words of the text. The computing system, generates, by way of an encoder of a text to speech (TTS) model, a phoneme encoding based upon the phoneme sequence. The computing system provides the textual embedding and the phoneme encoding as input to a decoder of the TTS model. The computing system causes speech that includes the words to be played over a speaker based upon output of the decoder of the TTS model, where the speech reflects an emotion underlying the text due to the textual embedding provided to the encoder.
    Type: Application
    Filed: November 11, 2021
    Publication date: March 30, 2023
    Inventors: Arijit MUKHERJEE, Shubham BANSAL, Sandeepkumar SATPAL, Rupeshkumar Rasiklal MEHTA
  • Publication number: 20230094511
    Abstract: A computer-implemented technique identifies terms in an original reference transcription and original ASR output results that are considered valid variants of each other, even though these terms have different textual forms. Based on this finding, the technique produces a normalized reference transcription and normalized ASR output results in which valid variants are assigned the same textual form. In some implementations, the technique uses the normalized text to develop a model for an ASR system. For example, the technique may generate a word error rate (WER) measure by comparing the normalized reference transcription with the normalized ASR output results, and use the WER measure as guidance in developing the model. Some aspects of the technique involve identifying occasions in which a term can be properly split into component parts. Other aspects can identify other ways in which two terms may vary in spelling, but nonetheless remain valid variants.
    Type: Application
    Filed: September 29, 2021
    Publication date: March 30, 2023
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Satarupa GUHA, Ankur GUPTA, Rahul AMBAVAT, Rupeshkumar Rasiklal MEHTA
  • Publication number: 20220358910
    Abstract: A computing system obtains features that have been extracted from an acoustic signal, where the acoustic signal comprises spoken words uttered by a user. The computing system performs automatic speech recognition (ASR) based upon the features and a language model (LM) generated based upon expanded pattern data. The expanded pattern data includes a name of an entity and a search term, where the entity belongs to a segment identified in a knowledge base. The search term has been included in queries for entities belonging to the segment. The computing system identifies a sequence of words corresponding to the features based upon results of the ASR. The computing system transmits computer-readable text to a search engine, where the text includes the sequence of words.
    Type: Application
    Filed: May 6, 2021
    Publication date: November 10, 2022
    Inventors: Ankur GUPTA, Satarupa GUHA, Rupeshkumar Rasiklal MEHTA, Issac John ALPHONSO, Anastasios ANASTASAKOS, Shuangyu CHANG