Patents by Inventor Kshitiz Kumar
Kshitiz Kumar has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12014728
Abstract: A computer-implemented method classifies an input corresponding to multiple different kinds of input. The method includes obtaining a set of features from the input, providing the set of features to multiple different models to generate state predictions, generating a set of state-dependent predicted weights, and combining the state predictions from the multiple models based on the state-dependent predicted weights for classification of the set of features.
Type: Grant
Filed: March 25, 2019
Date of Patent: June 18, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Kshitiz Kumar, Yifan Gong
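The combination step in this abstract lends itself to a short illustration. The sketch below weights each model's per-state predictions with state-dependent weights and sums them; the shapes, the weight values, and the final renormalization are assumptions made for the example, not details taken from the patent.

```python
import numpy as np

def combine_state_predictions(predictions, state_weights):
    """Combine per-state predictions from multiple models using
    state-dependent weights (illustrative interpretation only).

    predictions:   list of (num_states,) arrays, one per model
    state_weights: (num_models, num_states) array; column s holds the
                   weights applied to each model's prediction for state s
    """
    preds = np.stack(predictions)                   # (num_models, num_states)
    combined = (state_weights * preds).sum(axis=0)  # weighted sum per state
    return combined / combined.sum()                # renormalize to a distribution

# Illustrative use: two models, three states, weights favoring model 0 on state 0.
p1 = np.array([0.7, 0.2, 0.1])
p2 = np.array([0.3, 0.4, 0.3])
w = np.array([[0.8, 0.5, 0.5],
              [0.2, 0.5, 0.5]])
print(combine_state_predictions([p1, p2], w))
```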
-
Patent number: 11929076
Abstract: Disclosed speech recognition techniques improve user-perceived latency while maintaining accuracy by: receiving an audio stream, in parallel, by a primary (e.g., accurate) speech recognition engine (SRE) and a secondary (e.g., fast) SRE; generating, with the primary SRE, a primary result; generating, with the secondary SRE, a secondary result; appending the secondary result to a word list; and merging the primary result into the secondary result in the word list. Combining output from the primary and secondary SREs into a single decoder as described herein improves user-perceived latency while maintaining or improving accuracy, among other advantages.
Type: Grant
Filed: December 1, 2022
Date of Patent: March 12, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Hosam Adel Khalil, Emilian Stoimenov, Christopher Hakan Basoglu, Kshitiz Kumar, Jian Wu
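The word-list mechanics the abstract outlines can be pictured with a small sketch: fast secondary hypotheses are appended for immediate display, then overwritten when the slower primary hypothesis arrives. The class below is a toy interpretation under that assumption; the names and the exact merge policy are illustrative, not the patented decoder.

```python
class MergedDecoder:
    """Toy sketch of merging fast (secondary) and accurate (primary)
    recognizer output into a single word list. Illustrative only."""

    def __init__(self):
        self.word_list = []        # words shown to the user so far
        self.confirmed_len = 0     # prefix already confirmed by the primary SRE

    def on_secondary_result(self, words):
        # Append fast hypotheses beyond the confirmed prefix for low latency.
        self.word_list = self.word_list[:self.confirmed_len] + list(words)

    def on_primary_result(self, words):
        # Merge: replace the unconfirmed suffix with the accurate hypothesis
        # and advance the confirmed prefix.
        self.word_list = self.word_list[:self.confirmed_len] + list(words)
        self.confirmed_len = len(self.word_list)

dec = MergedDecoder()
dec.on_secondary_result(["hello", "word"])  # fast, possibly wrong
dec.on_primary_result(["hello", "world"])   # accurate, merged in
print(dec.word_list)                        # ['hello', 'world']
```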
-
Publication number: 20230401392
Abstract: A data processing system is implemented for receiving speech data for a plurality of languages, and determining letters from the speech data. The data processing system also implements normalizing the speech data by applying linguistic-based rules for Latin-based languages on the determined letters, building a computer model using the normalized speech data, fine-tuning the computer model using additional speech data, and recognizing words in a target language using the fine-tuned computer model.
Type: Application
Filed: June 9, 2022
Publication date: December 14, 2023
Applicant: Microsoft Technology Licensing, LLC
Inventors: Kshitiz Kumar, Jian Wu, Bo Ren, Tianyu Wu, Fahimeh Bahmaninezhad, Edward C. Lin, Xiaoyang Chen, Changliang Liu
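One plausible reading of "linguistic-based rules for Latin-based languages" is normalization of letter variants across languages. The sketch below shows one such rule, folding case and stripping diacritics via Unicode decomposition; it is only an assumed example of the kind of rule involved, since the actual rules are not given in this listing.

```python
import unicodedata

def normalize_letters(text):
    """Example normalization rule for Latin-based scripts: decompose
    accented letters and drop combining marks, so e.g. 'é' and 'e' map
    to the same symbol across languages. An assumed illustration, not
    the application's actual rule set."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(normalize_letters("Café"))   # cafe
print(normalize_letters("naïve"))  # naive
```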
-
Patent number: 11620992
Abstract: A method of enhancing an automated speech recognition confidence classifier includes receiving a set of baseline confidence features from one or more decoded words, deriving word embedding confidence features from the baseline confidence features, joining the baseline confidence features with word embedding confidence features to create a feature vector, and executing the confidence classifier to generate a confidence score, wherein the confidence classifier is trained with a set of training examples having labeled features corresponding to the feature vector.
Type: Grant
Filed: March 31, 2021
Date of Patent: April 4, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventors: Kshitiz Kumar, Anastasios Anastasakos, Yifan Gong
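The feature-joining step here is mechanical enough to sketch: baseline confidence features and word-embedding-derived features are concatenated into one vector, which a trained classifier maps to a score. A logistic model stands in for the classifier below, and the feature values and placeholder weights are assumptions for the example.

```python
import numpy as np

def confidence_score(baseline_features, embedding_features, weights, bias):
    """Join baseline confidence features with word-embedding-derived
    features and score the joined vector. A logistic model stands in
    for the trained classifier; weights/bias would come from training
    on labeled examples."""
    feature_vector = np.concatenate([baseline_features, embedding_features])
    logit = feature_vector @ weights + bias
    return 1.0 / (1.0 + np.exp(-logit))  # confidence in [0, 1]

baseline = np.array([0.82, -1.3, 0.05])         # e.g., acoustic/LM confidence features
embedding = np.array([0.11, -0.42, 0.93, 0.2])  # features derived from word embeddings
w = np.zeros(7); b = 0.0                        # placeholder "trained" parameters
print(confidence_score(baseline, embedding, w, b))  # 0.5 with placeholder weights
```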
-
Publication number: 20230102295
Abstract: Disclosed speech recognition techniques improve user-perceived latency while maintaining accuracy by: receiving an audio stream, in parallel, by a primary (e.g., accurate) speech recognition engine (SRE) and a secondary (e.g., fast) SRE; generating, with the primary SRE, a primary result; generating, with the secondary SRE, a secondary result; appending the secondary result to a word list; and merging the primary result into the secondary result in the word list. Combining output from the primary and secondary SREs into a single decoder as described herein improves user-perceived latency while maintaining or improving accuracy, among other advantages.
Type: Application
Filed: December 1, 2022
Publication date: March 30, 2023
Inventors: Hosam Adel Khalil, Emilian Stoimenov, Christopher Hakan Basoglu, Kshitiz Kumar, Jian Wu
-
Patent number: 11532312
Abstract: Disclosed speech recognition techniques improve user-perceived latency while maintaining accuracy by: receiving an audio stream, in parallel, by a primary (e.g., accurate) speech recognition engine (SRE) and a secondary (e.g., fast) SRE; generating, with the primary SRE, a primary result; generating, with the secondary SRE, a secondary result; appending the secondary result to a word list; and merging the primary result into the secondary result in the word list. Combining output from the primary and secondary SREs into a single decoder as described herein improves user-perceived latency while maintaining or improving accuracy, among other advantages.
Type: Grant
Filed: December 15, 2020
Date of Patent: December 20, 2022
Assignee: Microsoft Technology Licensing, LLC
Inventors: Hosam Adel Khalil, Emilian Stoimenov, Christopher Hakan Basoglu, Kshitiz Kumar, Jian Wu
-
Publication number: 20220189467
Abstract: Disclosed speech recognition techniques improve user-perceived latency while maintaining accuracy by: receiving an audio stream, in parallel, by a primary (e.g., accurate) speech recognition engine (SRE) and a secondary (e.g., fast) SRE; generating, with the primary SRE, a primary result; generating, with the secondary SRE, a secondary result; appending the secondary result to a word list; and merging the primary result into the secondary result in the word list. Combining output from the primary and secondary SREs into a single decoder as described herein improves user-perceived latency while maintaining or improving accuracy, among other advantages.
Type: Application
Filed: December 15, 2020
Publication date: June 16, 2022
Inventors: Hosam Adel Khalil, Emilian Stoimenov, Christopher Hakan Basoglu, Kshitiz Kumar, Jian Wu
-
Patent number: 11158305
Abstract: Generally discussed herein are devices, systems, and methods for wake word verification. A method can include receiving, at a server, a message from a device indicating that an utterance of a user-defined wake word was detected at the device, the message including (a) audio samples or features extracted from the audio samples and (b) data indicating the user-defined wake word; retrieving or generating, at the server, a custom decoding graph for the user-defined wake word, wherein the decoding graph and a static portion of the wake word verification model together form a custom wake word verification model for the user-defined wake word; executing the wake word verification model to determine a likelihood that the wake word was uttered; and providing a message to the device indicating whether the wake word was uttered based on the determined likelihood.
Type: Grant
Filed: July 25, 2019
Date of Patent: October 26, 2021
Assignee: Microsoft Technology Licensing, LLC
Inventors: Khuram Shahid, Kshitiz Kumar, Teng Yi, Veljko Miljanic, Huaming Wang, Yifan Gong, Hosam Adel Khalil
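The server-side flow in this abstract (receive message, retrieve or generate a decoding graph, score, reply) can be sketched in a few lines. Everything below is a stand-in: the graph builder, the static model, and the toy scoring rule are hypothetical, meant only to show the retrieve-or-generate caching pattern and the accept/reject reply.

```python
def build_decoding_graph(wake_word):
    # Hypothetical stand-in: a real system would compile the wake word's
    # phone sequences into a decoding graph.
    return {"word": wake_word, "symbols": list(wake_word)}

class StaticVerificationModel:
    """Stand-in for the static portion of the verification model."""
    def score(self, features, graph):
        # Toy likelihood: fraction of graph symbols present in the features.
        hits = sum(1 for s in graph["symbols"] if s in features)
        return hits / max(1, len(graph["symbols"]))

def verify_wake_word(message, graph_cache, model, threshold=0.5):
    """Sketch of the server-side verification flow; names and scoring
    logic are illustrative, not the patented method."""
    wake_word = message["wake_word"]
    graph = graph_cache.get(wake_word)
    if graph is None:                       # retrieve-or-generate step
        graph = build_decoding_graph(wake_word)
        graph_cache[wake_word] = graph
    likelihood = model.score(message["features"], graph)
    return {"wake_word_uttered": likelihood >= threshold,
            "likelihood": likelihood}

cache = {}
msg = {"wake_word": "hey", "features": ["h", "e", "y"]}
print(verify_wake_word(msg, cache, StaticVerificationModel()))
```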
-
Publication number: 20210272557
Abstract: A method of enhancing an automated speech recognition confidence classifier includes receiving a set of baseline confidence features from one or more decoded words, deriving word embedding confidence features from the baseline confidence features, joining the baseline confidence features with word embedding confidence features to create a feature vector, and executing the confidence classifier to generate a confidence score, wherein the confidence classifier is trained with a set of training examples having labeled features corresponding to the feature vector.
Type: Application
Filed: March 31, 2021
Publication date: September 2, 2021
Inventors: Kshitiz Kumar, Anastasios Anastasakos, Yifan Gong
-
Patent number: 10991365
Abstract: A method of enhancing an automated speech recognition confidence classifier includes receiving a set of baseline confidence features from one or more decoded words, deriving word embedding confidence features from the baseline confidence features, joining the baseline confidence features with word embedding confidence features to create a feature vector, and executing the confidence classifier to generate a confidence score, wherein the confidence classifier is trained with a set of training examples having labeled features corresponding to the feature vector.
Type: Grant
Filed: April 8, 2019
Date of Patent: April 27, 2021
Assignee: Microsoft Technology Licensing, LLC
Inventors: Kshitiz Kumar, Anastasios Anastasakos, Yifan Gong
-
Publication number: 20200349925
Abstract: Generally discussed herein are devices, systems, and methods for wake word verification. A method can include receiving, at a server, a message from a device indicating that an utterance of a user-defined wake word was detected at the device, the message including (a) audio samples or features extracted from the audio samples and (b) data indicating the user-defined wake word; retrieving or generating, at the server, a custom decoding graph for the user-defined wake word, wherein the decoding graph and a static portion of the wake word verification model together form a custom wake word verification model for the user-defined wake word; executing the wake word verification model to determine a likelihood that the wake word was uttered; and providing a message to the device indicating whether the wake word was uttered based on the determined likelihood.
Type: Application
Filed: July 25, 2019
Publication date: November 5, 2020
Inventors: Khuram Shahid, Kshitiz Kumar, Teng Yi, Veljko Miljanic, Huaming Wang, Yifan Gong, Hosam Adel Khalil
-
Publication number: 20200320985
Abstract: A method of enhancing an automated speech recognition confidence classifier includes receiving a set of baseline confidence features from one or more decoded words, deriving word embedding confidence features from the baseline confidence features, joining the baseline confidence features with word embedding confidence features to create a feature vector, and executing the confidence classifier to generate a confidence score, wherein the confidence classifier is trained with a set of training examples having labeled features corresponding to the feature vector.
Type: Application
Filed: April 8, 2019
Publication date: October 8, 2020
Inventors: Kshitiz Kumar, Anastasios Anastasakos, Yifan Gong
-
Publication number: 20200312307
Abstract: A computer-implemented method classifies an input corresponding to multiple different kinds of input. The method includes obtaining a set of features from the input, providing the set of features to multiple different models to generate state predictions, generating a set of state-dependent predicted weights, and combining the state predictions from the multiple models based on the state-dependent predicted weights for classification of the set of features.
Type: Application
Filed: March 25, 2019
Publication date: October 1, 2020
Inventors: Kshitiz Kumar, Yifan Gong
-
Patent number: 10706852
Abstract: The described technology provides arbitration between speech recognition results generated by different automatic speech recognition (ASR) engines, such as ASR engines trained according to different language or acoustic models. The system includes an arbitrator that selects between a first speech recognition result representing an acoustic utterance as transcribed by a first ASR engine and a second speech recognition result representing the acoustic utterance as transcribed by a second ASR engine. This selection is based on a set of confidence features that is initially used by the first ASR engine or the second ASR engine to generate the first and second speech recognition results.
Type: Grant
Filed: November 13, 2015
Date of Patent: July 7, 2020
Assignee: Microsoft Technology Licensing, LLC
Inventors: Kshitiz Kumar, Hosam Khalil, Yifan Gong, Ziad Al-Bawab, Chaojun Liu
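A minimal sketch of arbitration between two engines, assuming each result carries the confidence features its engine produced. The fixed weighted-sum scoring rule and the feature names are invented for the example; the patent's arbitrator selects on these features via a trained model rather than a hand-set rule.

```python
def arbitrate(result_a, result_b):
    """Pick the speech recognition result whose confidence features
    score higher. The scoring rule below is an assumed illustration."""
    def score(features):
        # Assumed feature names; real engines expose their own features.
        return (0.7 * features["acoustic_confidence"]
                + 0.3 * features["lm_confidence"])
    return result_a if score(result_a["features"]) >= score(result_b["features"]) else result_b

server = {"text": "play jazz music",
          "features": {"acoustic_confidence": 0.91, "lm_confidence": 0.85}}
on_device = {"text": "play chess music",
             "features": {"acoustic_confidence": 0.74, "lm_confidence": 0.80}}
print(arbitrate(server, on_device)["text"])  # play jazz music
```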
-
Patent number: 10235994
Abstract: The technology described herein uses a modular model to process speech. A deep learning based acoustic model comprises a stack of different types of neural network layers. The sub-modules of a deep learning based acoustic model can be used to represent distinct non-phonetic acoustic factors, such as accent origins (e.g., native, non-native), speech channels (e.g., mobile, Bluetooth, desktop), speech application scenarios (e.g., voice search, short message dictation), and speaker variation (e.g., individual speakers or clustered speakers). The technology described herein uses certain sub-modules in a first context and a second group of sub-modules in a second context.
Type: Grant
Filed: June 30, 2016
Date of Patent: March 19, 2019
Assignee: Microsoft Technology Licensing, LLC
Inventors: Yan Huang, Chaojun Liu, Kshitiz Kumar, Kaustubh Prakash Kalgaonkar, Yifan Gong
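The sub-module selection the abstract describes can be illustrated with a toy container that assembles a layer stack per context. The layer names and context keys below are placeholders; a real model would swap trained neural-network layers, not strings.

```python
class ModularAcousticModel:
    """Toy sketch of swapping sub-modules by non-phonetic context
    (accent, channel, application scenario). Illustrative only."""

    def __init__(self, shared_layers, submodules):
        self.shared_layers = shared_layers  # layers used in every context
        self.submodules = submodules        # context value -> specialized layers

    def layers_for(self, accent, channel, scenario):
        # Assemble the stack: shared layers plus one sub-module per factor.
        stack = list(self.shared_layers)
        for factor in (accent, channel, scenario):
            stack.extend(self.submodules[factor])
        return stack

model = ModularAcousticModel(
    shared_layers=["input", "shared_lstm"],
    submodules={"native": ["accent_native"], "non-native": ["accent_nonnative"],
                "bluetooth": ["channel_bt"], "desktop": ["channel_desktop"],
                "voice_search": ["app_search"], "dictation": ["app_dictation"]})
print(model.layers_for("non-native", "bluetooth", "dictation"))
```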
-
Patent number: 10115393
Abstract: A computer-readable speaker-adapted speech engine acoustic model can be generated. The generating of the acoustic model can include performing speaker-specific adaptation of one or more layers of the model to produce one or more adaptive layers comprising layer weights, with the speaker-specific adaptation comprising a data size reduction technique. The data size reduction technique can be threshold value adaptation, corner area adaptation, diagonal-based quantization, adaptive matrix reduction, or a combination of these reduction techniques. The speaker-adapted speech engine model can be accessed and used in performing speech recognition on computer-readable audio speech input via a computerized speech recognition engine.
Type: Grant
Filed: October 31, 2016
Date of Patent: October 30, 2018
Assignee: Microsoft Technology Licensing, LLC
Inventors: Kshitiz Kumar, Chaojun Liu, Yifan Gong
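Of the reduction techniques the abstract names, threshold value adaptation is the easiest to illustrate: keep only speaker-specific weight changes whose magnitude exceeds a threshold, so the stored per-speaker delta is sparse. The sketch below shows that general idea under assumed shapes and thresholds; the patented procedure may differ in its specifics.

```python
import numpy as np

def threshold_adapt(base_weights, adapted_weights, threshold=0.01):
    """Keep only speaker-specific weight deltas above a magnitude
    threshold, making the per-speaker adaptation data sparse and small.
    An assumed illustration of threshold value adaptation."""
    delta = adapted_weights - base_weights
    mask = np.abs(delta) >= threshold
    sparse_delta = np.where(mask, delta, 0.0)  # small changes dropped
    return base_weights + sparse_delta, mask.mean()

base = np.random.randn(4, 4)
adapted = base + 0.02 * np.random.randn(4, 4)
reduced, kept_fraction = threshold_adapt(base, adapted, threshold=0.02)
print(f"kept {kept_fraction:.0%} of weight deltas")
```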
-
Patent number: 9997161
Abstract: The described technology provides normalization of speech recognition confidence classifier (CC) scores that maintains the accuracy of acceptance metrics. A speech recognition CC score quantitatively represents the correctness of decoded utterances in a defined range (e.g., [0,1]). An operating threshold is associated with a confidence classifier, such that utterance recognitions having scores exceeding the operating threshold are deemed acceptable. However, when a speech recognition engine, an acoustic model, and/or other parameters are updated by the platform, the correct-accept (CA) versus false-accept (FA) profile can change such that the application software's operating threshold is no longer valid or as accurate.
Type: Grant
Filed: September 11, 2015
Date of Patent: June 12, 2018
Assignee: Microsoft Technology Licensing, LLC
Inventors: Yifan Gong, Chaojun Liu, Kshitiz Kumar
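One way to picture the normalization goal, keeping an application's fixed operating threshold meaningful after an engine update, is a piecewise-linear map that sends the updated classifier's threshold back to the original one. The mapping below is an assumed illustration, not the patented transform.

```python
def normalize_score(score, old_threshold, new_threshold):
    """Piecewise-linear score normalization: the updated classifier's
    operating threshold maps exactly to the original one, so deployed
    applications can keep their existing accept/reject cutoff.
    An assumed illustration only."""
    if score <= new_threshold:
        # Stretch/compress [0, new_threshold] onto [0, old_threshold].
        return score * old_threshold / new_threshold
    # Stretch/compress (new_threshold, 1] onto (old_threshold, 1].
    return old_threshold + (score - new_threshold) * (1.0 - old_threshold) / (1.0 - new_threshold)

# The updated engine accepts at 0.62, but deployed apps still test against 0.70.
print(normalize_score(0.62, old_threshold=0.70, new_threshold=0.62))  # 0.70
print(normalize_score(0.90, old_threshold=0.70, new_threshold=0.62))  # ~0.92, still accepted
```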
-
Publication number: 20170256254
Abstract: The technology described herein uses a modular model to process speech. A deep learning based acoustic model comprises a stack of different types of neural network layers. The sub-modules of a deep learning based acoustic model can be used to represent distinct non-phonetic acoustic factors, such as accent origins (e.g., native, non-native), speech channels (e.g., mobile, Bluetooth, desktop), speech application scenarios (e.g., voice search, short message dictation), and speaker variation (e.g., individual speakers or clustered speakers). The technology described herein uses certain sub-modules in a first context and a second group of sub-modules in a second context.
Type: Application
Filed: June 30, 2016
Publication date: September 7, 2017
Inventors: Yan Huang, Chaojun Liu, Kshitiz Kumar, Kaustubh Prakash Kalgaonkar, Yifan Gong
-
Publication number: 20170140759
Abstract: The described technology provides arbitration between speech recognition results generated by different automatic speech recognition (ASR) engines, such as ASR engines trained according to different language or acoustic models. The system includes an arbitrator that selects between a first speech recognition result representing an acoustic utterance as transcribed by a first ASR engine and a second speech recognition result representing the acoustic utterance as transcribed by a second ASR engine. This selection is based on a set of confidence features that is initially used by the first ASR engine or the second ASR engine to generate the first and second speech recognition results.
Type: Application
Filed: November 13, 2015
Publication date: May 18, 2017
Inventors: Kshitiz Kumar, Hosam Khalil, Yifan Gong, Ziad Al-Bawab, Chaojun Liu
-
Publication number: 20170076725
Abstract: The described technology provides normalization of speech recognition confidence classifier (CC) scores that maintains the accuracy of acceptance metrics. A speech recognition CC score quantitatively represents the correctness of decoded utterances in a defined range (e.g., [0,1]). An operating threshold is associated with a confidence classifier, such that utterance recognitions having scores exceeding the operating threshold are deemed acceptable. However, when a speech recognition engine, an acoustic model, and/or other parameters are updated by the platform, the correct-accept (CA) versus false-accept (FA) profile can change such that the application software's operating threshold is no longer valid or as accurate.
Type: Application
Filed: September 11, 2015
Publication date: March 16, 2017
Inventors: Kshitiz Kumar, Yifan Gong, Chaojun Liu