Patents by Inventor Rupeshkumar Rasiklal MEHTA

Rupeshkumar Rasiklal MEHTA has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Developing an automatic speech recognition system using normalization

Patent number: 11978434

Abstract: A computer-implemented technique identifies terms in an original reference transcription and original ASR output results that are considered valid variants of each other, even though these terms have different textual forms. Based on this finding, the technique produces a normalized reference transcription and normalized ASR output results in which valid variants are assigned the same textual form. In some implementations, the technique uses the normalized text to develop a model for an ASR system. For example, the technique may generate a word error rate (WER) measure by comparing the normalized reference transcription with the normalized ASR output results, and use the WER measure as guidance in developing the model. Some aspects of the technique involve identifying occasions in which a term can be properly split into component parts. Other aspects can identify other ways in which two terms may vary in spelling, but nonetheless remain valid variants.

Type: Grant

Filed: September 29, 2021

Date of Patent: May 7, 2024

Assignee: Microsoft Technology Licensing, LLC

Inventors: Satarupa Guha, Ankur Gupta, Rahul Ambavat, Rupeshkumar Rasiklal Mehta
Enhancing ASR system performance for agglutinative languages

Patent number: 11972758

Abstract: A training-stage technique trains a language model for use in an ASR system. The technique includes: obtaining a training corpus that includes a sequence of terms; determining that an original term in the training corpus is not present in a dictionary resource; segmenting the original term into two or more sub-terms using a segmentation resource; determining that the segmentation of the original term into the two or more sub-terms is a valid segmentation, based on two or more validity tests; and training the language model based on the terms that have been identified. A computer-implemented inference-stage technique applies the language model to produce ASR output results. The inference-stage technique merges a sub-term with a preceding term if these two terms are separated by no more than a prescribed interval of time.

Type: Grant

Filed: September 29, 2021

Date of Patent: April 30, 2024

Assignee: Microsoft Technology Licensing, LLC

Inventors: Rahul Ambavat, Ankur Gupta, Rupeshkumar Rasiklal Mehta
ACOUSTIC MODEL FOR MULTILINGUAL SPEECH RECOGNITION

Publication number: 20240005912

Abstract: Systems, methods, and computer-readable storage devices are disclosed for improved recognition of multiple languages in audio data. One method including: receiving a trained split head multilingual neural network model, the trained split head multilingual neural network model including shared acoustic model layers and a plurality of projection layers, each projection layer of the plurality of projection layers corresponding to a language that the trained split head multilingual neural network model recognizes; receiving audio data, the audio data including speech in a plurality of languages in the audio data, the speech in the plurality of languages corresponding the language recognized by a projection layer of the plurality of projection layers of the trained split head multilingual neural network model; and classifying one or more languages of the speech of the audio data using the trained split head multilingual neural network model.

Type: Application

Filed: June 29, 2022

Publication date: January 4, 2024

Applicant: Microsoft Technology Licensing, LLC

Inventors: Purvi AGRAWAL, Vikas JOSHI, Basil ABRAHAM, Tejaswi SEERAM, Rupeshkumar Rasiklal MEHTA
Code-Mixed Speech Recognition Using Attention and Language-Specific Joint Analysis

Publication number: 20230290345

Abstract: An automatic speech recognition (ASR) system recognizes speech expressed in different languages. The ASR system includes a language-agnostic encoding component and prediction component. A language-specific joint analysis system generates first-language probabilities for symbols of a first language and second-language probabilities for symbols of a second language, based on outputs generated by the encoding component and the prediction component. The ASR system then modifies the probabilities produced by the joint analysis system by language-specific weighting information that, in turn, is produced by an attention system. This yields modified first-language probabilities and modified second-language probabilities. Finally, the ASR system predicts an updated instance of label information based on the modified first-language probabilities and the modified second-language probabilities. The ASR system can be successfully applied to recognize an utterance that combines words in two or more languages.

Type: Application

Filed: March 8, 2022

Publication date: September 14, 2023

Applicant: Microsoft Technology Licensing, LLC

Inventors: Vikas JOSHI, Purvi AGRAWAL, Rupeshkumar Rasiklal MEHTA, Aditya Rajesh PATIL
COMPUTING SYSTEM FOR UNSUPERVISED EMOTIONAL TEXT TO SPEECH TRAINING

Publication number: 20230111824

Abstract: A text to speech (TTS) model is trained based on training data including text samples. The text samples are provided to a text embedding model for outputting text embeddings for the text samples. The text embeddings are clustered into several clusters of text embeddings. The several clusters are representative of variations in emotion. The TTS model is then trained based upon the several clusters of text embeddings. Upon being trained, the TTS model is configured to receive text input and output a spoken utterance that corresponds to the text input. The TTS model is configured to output the spoken utterance with emotion. The emotion is based upon the text input and the training of the TTS model.

Type: Application

Filed: February 22, 2022

Publication date: April 13, 2023

Inventors: Arijit MUKHERJEE, Shubham BANSAL, Sandeepkumar SATPAL, Rupeshkumar Rasiklal MEHTA
Enhancing ASR System Performance for Agglutinative Languages

Publication number: 20230102338

Abstract: A training-stage technique trains a language model for use in an ASR system. The technique includes: obtaining a training corpus that includes a sequence of terms; determining that an original term in the training corpus is not present in a dictionary resource; segmenting the original term into two or more sub-terms using a segmentation resource; determining that the segmentation of the original term into the two or more sub-terms is a valid segmentation, based on two or more validity tests; and training the language model based on the terms that have been identified. A computer-implemented inference-stage technique applies the language model to produce ASR output results. The inference-stage technique merges a sub-term with a preceding term if these two terms are separated by no more than a prescribed interval of time.

Type: Application

Filed: September 29, 2021

Publication date: March 30, 2023

Applicant: Microsoft Technology Licensing, LLC

Inventors: Rahul AMBAVAT, Ankur GUPTA, Rupeshkumar Rasiklal MEHTA
COMPUTING SYSTEM FOR DOMAIN EXPRESSIVE TEXT TO SPEECH

Publication number: 20230099732

Abstract: A computing system obtains text that includes words and provides the text as input to an emotional classifier model that has been trained based upon emotional classification. The computing system obtains a textual embedding of the computer-readable text as output of the emotional classifier model. The computing system generates a phoneme sequence based upon the words of the text. The computing system, generates, by way of an encoder of a text to speech (TTS) model, a phoneme encoding based upon the phoneme sequence. The computing system provides the textual embedding and the phoneme encoding as input to a decoder of the TTS model. The computing system causes speech that includes the words to be played over a speaker based upon output of the decoder of the TTS model, where the speech reflects an emotion underlying the text due to the textual embedding provided to the encoder.

Type: Application

Filed: November 11, 2021

Publication date: March 30, 2023

Inventors: Arijit MUKHERJEE, Shubham BANSAL, Sandeepkumar SATPAL, Rupeshkumar Rasiklal MEHTA
Developing an Automatic Speech Recognition System Using Normalization

Publication number: 20230094511

Abstract: A computer-implemented technique identifies terms in an original reference transcription and original ASR output results that are considered valid variants of each other, even though these terms have different textual forms. Based on this finding, the technique produces a normalized reference transcription and normalized ASR output results in which valid variants are assigned the same textual form. In some implementations, the technique uses the normalized text to develop a model for an ASR system. For example, the technique may generate a word error rate (WER) measure by comparing the normalized reference transcription with the normalized ASR output results, and use the WER measure as guidance in developing the model. Some aspects of the technique involve identifying occasions in which a term can be properly split into component parts. Other aspects can identify other ways in which two terms may vary in spelling, but nonetheless remain valid variants.

Type: Application

Filed: September 29, 2021

Publication date: March 30, 2023

Applicant: Microsoft Technology Licensing, LLC

Inventors: Satarupa GUHA, Ankur GUPTA, Rahul AMBAVAT, Rupeshkumar Rasiklal MEHTA
SCALABLE ENTITIES AND PATTERNS MINING PIPELINE TO IMPROVE AUTOMATIC SPEECH RECOGNITION

Publication number: 20220358910

Abstract: A computing system obtains features that have been extracted from an acoustic signal, where the acoustic signal comprises spoken words uttered by a user. The computing system performs automatic speech recognition (ASR) based upon the features and a language model (LM) generated based upon expanded pattern data. The expanded pattern data includes a name of an entity and a search term, where the entity belongs to a segment identified in a knowledge base. The search term has been included in queries for entities belonging to the segment. The computing system identifies a sequence of words corresponding to the features based upon results of the ASR. The computing system transmits computer-readable text to a search engine, where the text includes the sequence of words.

Type: Application

Filed: May 6, 2021

Publication date: November 10, 2022

Inventors: Ankur GUPTA, Satarupa GUHA, Rupeshkumar Rasiklal MEHTA, Issac John ALPHONSO, Anastasios ANASTASAKOS, Shuangyu CHANG