Patents by Inventor Rupeshkumar Rasiklal MEHTA
Rupeshkumar Rasiklal MEHTA has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11978434
Abstract: A computer-implemented technique identifies terms in an original reference transcription and original ASR output results that are considered valid variants of each other, even though these terms have different textual forms. Based on this finding, the technique produces a normalized reference transcription and normalized ASR output results in which valid variants are assigned the same textual form. In some implementations, the technique uses the normalized text to develop a model for an ASR system. For example, the technique may generate a word error rate (WER) measure by comparing the normalized reference transcription with the normalized ASR output results, and use the WER measure as guidance in developing the model. Some aspects of the technique involve identifying occasions in which a term can be properly split into component parts. Other aspects can identify other ways in which two terms may vary in spelling, but nonetheless remain valid variants.
Type: Grant
Filed: September 29, 2021
Date of Patent: May 7, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Satarupa Guha, Ankur Gupta, Rahul Ambavat, Rupeshkumar Rasiklal Mehta
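The normalize-then-score procedure described in this abstract can be illustrated with a minimal sketch. The variant table, tokenization, and WER formula below are invented for illustration and are not the patented technique:

```python
# Toy variant table: both textual forms are treated as the same word.
# The entries are hypothetical examples, not from the patent.
VARIANTS = {"ok": "okay", "e-mail": "email"}

def normalize(tokens):
    """Assign valid variants a single canonical textual form."""
    return [VARIANTS.get(t.lower(), t.lower()) for t in tokens]

def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance / reference length."""
    d = [[0] * (len(hypothesis) + 1) for _ in range(len(reference) + 1)]
    for i in range(len(reference) + 1):
        d[i][0] = i
    for j in range(len(hypothesis) + 1):
        d[0][j] = j
    for i in range(1, len(reference) + 1):
        for j in range(1, len(hypothesis) + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(reference)][len(hypothesis)] / max(len(reference), 1)

ref = normalize("please send the e-mail ok".split())
hyp = normalize("please send the email okay".split())
# After normalization the valid variants match, so wer(ref, hyp) is 0.0.
```

Without the normalization step, the same pair would score two substitution errors even though both transcripts are arguably correct.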
-
Patent number: 11972758
Abstract: A training-stage technique trains a language model for use in an ASR system. The technique includes: obtaining a training corpus that includes a sequence of terms; determining that an original term in the training corpus is not present in a dictionary resource; segmenting the original term into two or more sub-terms using a segmentation resource; determining that the segmentation of the original term into the two or more sub-terms is a valid segmentation, based on two or more validity tests; and training the language model based on the terms that have been identified. A computer-implemented inference-stage technique applies the language model to produce ASR output results. The inference-stage technique merges a sub-term with a preceding term if these two terms are separated by no more than a prescribed interval of time.
Type: Grant
Filed: September 29, 2021
Date of Patent: April 30, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Rahul Ambavat, Ankur Gupta, Rupeshkumar Rasiklal Mehta
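The inference-stage merge rule (glue a sub-term to the preceding term when the time gap between them is small) can be sketched as follows. The `##` sub-term marker, the tuple layout, and the 50 ms threshold are assumptions for illustration, not the patent's notation:

```python
def merge_subterms(tokens, max_gap=0.05):
    """Merge timed sub-terms back into whole words.

    tokens: list of (text, start_time, end_time) with times in seconds;
    a leading "##" marks a sub-term produced by segmentation (hypothetical
    convention). A sub-term is merged into the preceding term only when
    the gap between them does not exceed max_gap.
    """
    merged = []
    for text, start, end in tokens:
        if (text.startswith("##") and merged
                and start - merged[-1][2] <= max_gap):
            prev_text, prev_start, _ = merged[-1]
            merged[-1] = (prev_text + text[2:], prev_start, end)
        else:
            merged.append((text, start, end))
    return [t for t, _, _ in merged]
```

For example, `("health", 0.0, 0.30)` followed by `("##care", 0.33, 0.60)` merges into `"healthcare"`, while the same pair separated by half a second stays apart.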
-
Publication number: 20240005912
Abstract: Systems, methods, and computer-readable storage devices are disclosed for improved recognition of multiple languages in audio data. One method includes: receiving a trained split head multilingual neural network model, the trained model including shared acoustic model layers and a plurality of projection layers, each projection layer corresponding to a language that the model recognizes; receiving audio data that includes speech in a plurality of languages, each language corresponding to a projection layer of the trained model; and classifying one or more languages of the speech of the audio data using the trained model.
Type: Application
Filed: June 29, 2022
Publication date: January 4, 2024
Applicant: Microsoft Technology Licensing, LLC
Inventors: Purvi AGRAWAL, Vikas JOSHI, Basil ABRAHAM, Tejaswi SEERAM, Rupeshkumar Rasiklal MEHTA
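The split-head arrangement (shared acoustic layers feeding one projection layer per language) can be sketched with a toy model. The single shared layer, the dimensions, and the random initialization are illustrative choices, not the architecture claimed in the publication:

```python
import random

class SplitHeadModel:
    """Toy split-head network: shared layers feed per-language heads."""

    def __init__(self, feat_dim, hidden_dim, vocab_sizes, seed=0):
        rnd = random.Random(seed)
        def matrix(rows, cols):
            return [[rnd.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]
        # Acoustic layer shared across every language.
        self.shared = matrix(hidden_dim, feat_dim)
        # One projection head per supported language.
        self.heads = {lang: matrix(size, hidden_dim)
                      for lang, size in vocab_sizes.items()}

    @staticmethod
    def _matvec(m, v):
        return [sum(w * x for w, x in zip(row, v)) for row in m]

    def forward(self, features, lang):
        # Shared encoding (ReLU), then the language-specific projection.
        hidden = [max(0.0, h) for h in self._matvec(self.shared, features)]
        return self._matvec(self.heads[lang], hidden)

model = SplitHeadModel(feat_dim=4, hidden_dim=8,
                       vocab_sizes={"en": 5, "hi": 7})
```

The design point is that only the small head differs per language, so adding a language does not duplicate the acoustic layers.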
-
Publication number: 20230290345
Abstract: An automatic speech recognition (ASR) system recognizes speech expressed in different languages. The ASR system includes a language-agnostic encoding component and prediction component. A language-specific joint analysis system generates first-language probabilities for symbols of a first language and second-language probabilities for symbols of a second language, based on outputs generated by the encoding component and the prediction component. The ASR system then modifies the probabilities produced by the joint analysis system by language-specific weighting information that, in turn, is produced by an attention system. This yields modified first-language probabilities and modified second-language probabilities. Finally, the ASR system predicts an updated instance of label information based on the modified first-language probabilities and the modified second-language probabilities. The ASR system can be successfully applied to recognize an utterance that combines words in two or more languages.
Type: Application
Filed: March 8, 2022
Publication date: September 14, 2023
Applicant: Microsoft Technology Licensing, LLC
Inventors: Vikas JOSHI, Purvi AGRAWAL, Rupeshkumar Rasiklal MEHTA, Aditya Rajesh PATIL
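The weighting step (scale each language's symbol probabilities by attention-derived language weights, then predict over the combined label set) can be sketched as below. The softmax over raw attention scores and the concatenated label space are assumptions made for this illustration:

```python
import math

def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def combine_language_probs(p_lang1, p_lang2, attention_scores):
    """Weight each language's symbol probabilities and pick the best label.

    attention_scores: raw per-language scores (hypothetical stand-in for
    the attention system's output); returns (best_label_index, modified
    probabilities over the concatenated symbol set).
    """
    w1, w2 = softmax(attention_scores)
    modified = [w1 * p for p in p_lang1] + [w2 * p for p in p_lang2]
    return modified.index(max(modified)), modified
```

When the attention system leans toward the second language, a second-language symbol wins even if the first language's top symbol had a higher raw probability.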
-
Publication number: 20230111824
Abstract: A text to speech (TTS) model is trained based on training data including text samples. The text samples are provided to a text embedding model for outputting text embeddings for the text samples. The text embeddings are clustered into several clusters of text embeddings. The several clusters are representative of variations in emotion. The TTS model is then trained based upon the several clusters of text embeddings. Upon being trained, the TTS model is configured to receive text input and output a spoken utterance that corresponds to the text input. The TTS model is configured to output the spoken utterance with emotion. The emotion is based upon the text input and the training of the TTS model.
Type: Application
Filed: February 22, 2022
Publication date: April 13, 2023
Inventors: Arijit MUKHERJEE, Shubham BANSAL, Sandeepkumar SATPAL, Rupeshkumar Rasiklal MEHTA
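The clustering step can be illustrated with plain k-means over toy 2-D "embeddings". The publication does not specify the clustering algorithm; k-means, the deterministic initialization, and the synthetic data below are assumptions:

```python
def kmeans(points, k, iters=20):
    """Plain k-means standing in for clustering text embeddings by emotion."""
    # Deterministic init for the sketch: first and last points as centers.
    centers = [list(points[0]), list(points[-1])] if k == 2 \
        else [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: sum((a - b) ** 2
                                            for a, b in zip(p, centers[i])))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep empty ones).
        centers = [[sum(dim) / len(c) for dim in zip(*c)] if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Toy 2-D "text embeddings": two emotion groups around (0, 0) and (10, 10).
embeddings = [[0.1 * i, 0.1 * i] for i in range(5)] + \
             [[10 + 0.1 * i, 10 + 0.1 * i] for i in range(5)]
centers, clusters = kmeans(embeddings, k=2)
```

Each resulting cluster would then supply training examples for one emotional rendering style.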
-
Publication number: 20230102338
Abstract: A training-stage technique trains a language model for use in an ASR system. The technique includes: obtaining a training corpus that includes a sequence of terms; determining that an original term in the training corpus is not present in a dictionary resource; segmenting the original term into two or more sub-terms using a segmentation resource; determining that the segmentation of the original term into the two or more sub-terms is a valid segmentation, based on two or more validity tests; and training the language model based on the terms that have been identified. A computer-implemented inference-stage technique applies the language model to produce ASR output results. The inference-stage technique merges a sub-term with a preceding term if these two terms are separated by no more than a prescribed interval of time.
Type: Application
Filed: September 29, 2021
Publication date: March 30, 2023
Applicant: Microsoft Technology Licensing, LLC
Inventors: Rahul AMBAVAT, Ankur GUPTA, Rupeshkumar Rasiklal MEHTA
-
Publication number: 20230099732
Abstract: A computing system obtains text that includes words and provides the text as input to an emotional classifier model that has been trained based upon emotional classification. The computing system obtains a textual embedding of the text as output of the emotional classifier model. The computing system generates a phoneme sequence based upon the words of the text. The computing system generates, by way of an encoder of a text to speech (TTS) model, a phoneme encoding based upon the phoneme sequence. The computing system provides the textual embedding and the phoneme encoding as input to a decoder of the TTS model. The computing system causes speech that includes the words to be played over a speaker based upon output of the decoder of the TTS model, where the speech reflects an emotion underlying the text due to the textual embedding provided to the decoder.
Type: Application
Filed: November 11, 2021
Publication date: March 30, 2023
Inventors: Arijit MUKHERJEE, Shubham BANSAL, Sandeepkumar SATPAL, Rupeshkumar Rasiklal MEHTA
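The data flow in this abstract (emotional embedding and phoneme encoding both fed to the decoder) can be sketched with stub components. Every function below is a placeholder invented for illustration; none of them is the trained model the publication describes:

```python
def emotion_embedding(text):
    """Placeholder emotional classifier: returns a tiny 'embedding'."""
    return [1.0, 0.0] if "!" in text else [0.0, 1.0]

def to_phonemes(text):
    """Placeholder grapheme-to-phoneme step."""
    return [c for c in text.lower() if c.isalpha()]

def encode(phonemes):
    """Placeholder TTS encoder over the phoneme sequence."""
    return [float(ord(p)) / 128.0 for p in phonemes]

def decode(phoneme_encoding, text_embedding):
    """Placeholder decoder: consumes both inputs, yields 'audio frames'."""
    bias = text_embedding[0] - text_embedding[1]
    return [x + bias for x in phoneme_encoding]

def synthesize(text):
    """End-to-end flow: classifier and encoder outputs meet at the decoder."""
    return decode(encode(to_phonemes(text)), emotion_embedding(text))
```

The point of the sketch is only the wiring: the emotional signal bypasses the encoder and conditions the decoder directly, so the same phoneme sequence can be rendered with different emotion.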
-
Publication number: 20230094511
Abstract: A computer-implemented technique identifies terms in an original reference transcription and original ASR output results that are considered valid variants of each other, even though these terms have different textual forms. Based on this finding, the technique produces a normalized reference transcription and normalized ASR output results in which valid variants are assigned the same textual form. In some implementations, the technique uses the normalized text to develop a model for an ASR system. For example, the technique may generate a word error rate (WER) measure by comparing the normalized reference transcription with the normalized ASR output results, and use the WER measure as guidance in developing the model. Some aspects of the technique involve identifying occasions in which a term can be properly split into component parts. Other aspects can identify other ways in which two terms may vary in spelling, but nonetheless remain valid variants.
Type: Application
Filed: September 29, 2021
Publication date: March 30, 2023
Applicant: Microsoft Technology Licensing, LLC
Inventors: Satarupa GUHA, Ankur GUPTA, Rahul AMBAVAT, Rupeshkumar Rasiklal MEHTA
-
Publication number: 20220358910
Abstract: A computing system obtains features that have been extracted from an acoustic signal, where the acoustic signal comprises spoken words uttered by a user. The computing system performs automatic speech recognition (ASR) based upon the features and a language model (LM) generated based upon expanded pattern data. The expanded pattern data includes a name of an entity and a search term, where the entity belongs to a segment identified in a knowledge base. The search term has been included in queries for entities belonging to the segment. The computing system identifies a sequence of words corresponding to the features based upon results of the ASR. The computing system transmits computer-readable text to a search engine, where the text includes the sequence of words.
Type: Application
Filed: May 6, 2021
Publication date: November 10, 2022
Inventors: Ankur GUPTA, Satarupa GUHA, Rupeshkumar Rasiklal MEHTA, Issac John ALPHONSO, Anastasios ANASTASAKOS, Shuangyu CHANG
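The construction of "expanded pattern data" (cross entity names from a knowledge-base segment with search terms observed in queries for that segment) can be sketched as follows. The segment names, entity names, search terms, and the two word orders generated are invented examples, not data from the publication:

```python
def expand_patterns(segments, segment_search_terms):
    """Build query-like text patterns for LM training.

    segments: {segment: [entity names]} from a knowledge base;
    segment_search_terms: {segment: [terms seen in queries for entities
    of that segment]}. Each entity is crossed with each of its segment's
    terms, in both word orders (an assumption of this sketch).
    """
    patterns = []
    for segment, entities in segments.items():
        for entity in entities:
            for term in segment_search_terms.get(segment, []):
                patterns.append(f"{entity} {term}")
                patterns.append(f"{term} {entity}")
    return patterns

patterns = expand_patterns(
    {"restaurants": ["Contoso Cafe", "Fabrikam Diner"]},
    {"restaurants": ["near me", "opening hours"]},
)
```

Training the LM on such patterns raises the probability of entity-plus-search-term word sequences the ASR system would otherwise rarely see in running text.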