Patents by Inventor Kshitiz Kumar

Kshitiz Kumar has filed for patents to protect the following inventions. This listing includes pending patent applications as well as patents already granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11929076
    Abstract: Disclosed speech recognition techniques improve user-perceived latency while maintaining accuracy by: receiving an audio stream, in parallel, by a primary (e.g., accurate) speech recognition engine (SRE) and a secondary (e.g., fast) SRE; generating, with the primary SRE, a primary result; generating, with the secondary SRE, a secondary result; appending the secondary result to a word list; and merging the primary result into the secondary result in the word list. Combining output from the primary and secondary SREs into a single decoder as described herein improves user-perceived latency while maintaining or improving accuracy, among other advantages.
    Type: Grant
    Filed: December 1, 2022
    Date of Patent: March 12, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Hosam Adel Khalil, Emilian Stoimenov, Christopher Hakan Basoglu, Kshitiz Kumar, Jian Wu
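The two-engine decoding described in this abstract can be sketched as a word list that surfaces the fast (secondary) SRE's output immediately and later merges in the accurate (primary) SRE's output. The `Word` type, its timing fields, and the overlap-based merge rule below are illustrative assumptions, not the patented implementation:

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    start: float   # start time in seconds
    end: float     # end time in seconds
    final: bool    # True once confirmed by the primary SRE

def append_secondary(word_list, words):
    """Append fast (secondary) SRE output so the user sees text immediately."""
    word_list.extend(Word(w.text, w.start, w.end, final=False) for w in words)

def merge_primary(word_list, primary_words):
    """Replace provisional secondary words that overlap the primary result's
    time span, keeping the primary (more accurate) words and marking them final."""
    if not primary_words:
        return word_list
    start = primary_words[0].start
    end = primary_words[-1].end
    kept = [w for w in word_list if w.final or w.end <= start or w.start >= end]
    merged = kept + [Word(w.text, w.start, w.end, final=True) for w in primary_words]
    merged.sort(key=lambda w: w.start)
    return merged
```

In this sketch the secondary result is shown as soon as it arrives, which is where the latency win comes from; the slower primary result then overwrites only the span it covers, so already-finalized words are never churned.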
  • Publication number: 20230401392
    Abstract: A data processing system is implemented for receiving speech data for a plurality of languages, and determining letters from the speech data. The data processing system also implements normalizing the speech data by applying linguistic based rules for Latin-based languages on the determined letters, building a computer model using the normalized speech data, fine-tuning the computer model using additional speech data, and recognizing words in a target language using the fine-tuned computer model.
    Type: Application
    Filed: June 9, 2022
    Publication date: December 14, 2023
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Kshitiz Kumar, Jian Wu, Bo Ren, Tianyu Wu, Fahimeh Bahmaninezhad, Edward C. Lin, Xiaoyang Chen, Changliang Liu
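The abstract does not spell out its "linguistic based rules"; one plausible reading for Latin-based languages, sketched here purely as an assumption, is case folding plus diacritic stripping so letters shared across languages map to a single form:

```python
import unicodedata

def normalize_letters(text: str) -> str:
    """Fold case and strip combining diacritics so letters shared across
    Latin-based languages map to one form (e.g. 'é', 'È', 'e' -> 'e')."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

Normalizing determined letters this way would let speech data from multiple Latin-script languages share one symbol inventory before model building.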
  • Patent number: 11620992
    Abstract: A method of enhancing an automated speech recognition confidence classifier includes receiving a set of baseline confidence features from one or more decoded words, deriving word embedding confidence features from the baseline confidence features, joining the baseline confidence features with word embedding confidence features to create a feature vector, and executing the confidence classifier to generate a confidence score, wherein the confidence classifier is trained with a set of training examples having labeled features corresponding to the feature vector.
    Type: Grant
    Filed: March 31, 2021
    Date of Patent: April 4, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Kshitiz Kumar, Anastasios Anastasakos, Yifan Gong
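The joining step in this abstract amounts to feature concatenation followed by a trained classifier. This sketch uses a plain logistic scorer with caller-supplied weights to show the shape of the computation; the actual classifier form and features are not specified here:

```python
import math

def join_features(baseline, embedding):
    """Concatenate baseline confidence features with word-embedding
    confidence features into a single feature vector."""
    return list(baseline) + list(embedding)

def confidence_score(features, weights, bias):
    """Logistic confidence classifier: maps the joined feature vector
    to a confidence score in [0, 1]."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))
```

In practice the weights and bias would come from training on examples whose labels mark each decoded word as correct or incorrect.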
  • Publication number: 20230102295
    Abstract: Disclosed speech recognition techniques improve user-perceived latency while maintaining accuracy by: receiving an audio stream, in parallel, by a primary (e.g., accurate) speech recognition engine (SRE) and a secondary (e.g., fast) SRE; generating, with the primary SRE, a primary result; generating, with the secondary SRE, a secondary result; appending the secondary result to a word list; and merging the primary result into the secondary result in the word list. Combining output from the primary and secondary SREs into a single decoder as described herein improves user-perceived latency while maintaining or improving accuracy, among other advantages.
    Type: Application
    Filed: December 1, 2022
    Publication date: March 30, 2023
    Inventors: Hosam Adel Khalil, Emilian Stoimenov, Christopher Hakan Basoglu, Kshitiz Kumar, Jian Wu
  • Patent number: 11532312
    Abstract: Disclosed speech recognition techniques improve user-perceived latency while maintaining accuracy by: receiving an audio stream, in parallel, by a primary (e.g., accurate) speech recognition engine (SRE) and a secondary (e.g., fast) SRE; generating, with the primary SRE, a primary result; generating, with the secondary SRE, a secondary result; appending the secondary result to a word list; and merging the primary result into the secondary result in the word list. Combining output from the primary and secondary SREs into a single decoder as described herein improves user-perceived latency while maintaining or improving accuracy, among other advantages.
    Type: Grant
    Filed: December 15, 2020
    Date of Patent: December 20, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Hosam Adel Khalil, Emilian Stoimenov, Christopher Hakan Basoglu, Kshitiz Kumar, Jian Wu
  • Publication number: 20220189467
    Abstract: Disclosed speech recognition techniques improve user-perceived latency while maintaining accuracy by: receiving an audio stream, in parallel, by a primary (e.g., accurate) speech recognition engine (SRE) and a secondary (e.g., fast) SRE; generating, with the primary SRE, a primary result; generating, with the secondary SRE, a secondary result; appending the secondary result to a word list; and merging the primary result into the secondary result in the word list. Combining output from the primary and secondary SREs into a single decoder as described herein improves user-perceived latency while maintaining or improving accuracy, among other advantages.
    Type: Application
    Filed: December 15, 2020
    Publication date: June 16, 2022
    Inventors: Hosam Adel Khalil, Emilian Stoimenov, Christopher Hakan Basoglu, Kshitiz Kumar, Jian Wu
  • Patent number: 11158305
    Abstract: Generally discussed herein are devices, systems, and methods for wake word verification. A method can include receiving, at a server, a message from a device indicating that an utterance of a user-defined wake word was detected at the device, the message including (a) audio samples or features extracted from the audio samples and (b) data indicating the user-defined wake word, retrieving or generating, at the server, a custom decoding graph for the user-defined wake word, wherein the decoding graph and a static portion of a wake word verification model form a custom wake word verification model for the user-defined wake word, executing the wake word verification model to determine a likelihood that the wake word was uttered, and providing a message to the device indicating whether the wake word was uttered based on the determined likelihood.
    Type: Grant
    Filed: July 25, 2019
    Date of Patent: October 26, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Khuram Shahid, Kshitiz Kumar, Teng Yi, Veljko Miljanic, Huaming Wang, Yifan Gong, Hosam Adel Khalil
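The server-side flow in this abstract can be sketched as a cached per-wake-word decoding graph combined with a scoring step and an operating threshold. The wake word "hey contoso", the graph representation, and the toy scorer below are all invented for illustration; a real system would compile phonetic units into a decoding graph and run static-model inference:

```python
# Hypothetical cache of custom decoding graphs, keyed by wake word.
_graph_cache = {}

def get_decoding_graph(wake_word: str):
    """Retrieve or generate the custom decoding graph for a user-defined
    wake word; generation happens once, then the graph is reused."""
    if wake_word not in _graph_cache:
        # Placeholder: a real system compiles phonetic units into a graph.
        _graph_cache[wake_word] = tuple(wake_word.lower().split())
    return _graph_cache[wake_word]

def score_against_graph(features, graph):
    """Toy stand-in for inference by the static verification model."""
    return min(1.0, sum(features) / max(len(graph), 1))

def verify(features, wake_word: str, threshold: float = 0.5) -> bool:
    """Score the likelihood that the wake word was uttered and compare it
    against an operating threshold to accept or reject the detection."""
    graph = get_decoding_graph(wake_word)
    likelihood = score_against_graph(features, graph)
    return likelihood >= threshold
```

The accept/reject bool is what the server would send back to the device in its response message.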
  • Publication number: 20210272557
    Abstract: A method of enhancing an automated speech recognition confidence classifier includes receiving a set of baseline confidence features from one or more decoded words, deriving word embedding confidence features from the baseline confidence features, joining the baseline confidence features with word embedding confidence features to create a feature vector, and executing the confidence classifier to generate a confidence score, wherein the confidence classifier is trained with a set of training examples having labeled features corresponding to the feature vector.
    Type: Application
    Filed: March 31, 2021
    Publication date: September 2, 2021
    Inventors: Kshitiz Kumar, Anastasios Anastasakos, Yifan Gong
  • Patent number: 10991365
    Abstract: A method of enhancing an automated speech recognition confidence classifier includes receiving a set of baseline confidence features from one or more decoded words, deriving word embedding confidence features from the baseline confidence features, joining the baseline confidence features with word embedding confidence features to create a feature vector, and executing the confidence classifier to generate a confidence score, wherein the confidence classifier is trained with a set of training examples having labeled features corresponding to the feature vector.
    Type: Grant
    Filed: April 8, 2019
    Date of Patent: April 27, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Kshitiz Kumar, Anastasios Anastasakos, Yifan Gong
  • Publication number: 20200349925
    Abstract: Generally discussed herein are devices, systems, and methods for wake word verification. A method can include receiving, at a server, a message from a device indicating that an utterance of a user-defined wake word was detected at the device, the message including (a) audio samples or features extracted from the audio samples and (b) data indicating the user-defined wake word, retrieving or generating, at the server, a custom decoding graph for the user-defined wake word, wherein the decoding graph and a static portion of a wake word verification model form a custom wake word verification model for the user-defined wake word, executing the wake word verification model to determine a likelihood that the wake word was uttered, and providing a message to the device indicating whether the wake word was uttered based on the determined likelihood.
    Type: Application
    Filed: July 25, 2019
    Publication date: November 5, 2020
    Inventors: Khuram Shahid, Kshitiz Kumar, Teng Yi, Veljko Miljanic, Huaming Wang, Yifan Gong, Hosam Adel Khalil
  • Publication number: 20200320985
    Abstract: A method of enhancing an automated speech recognition confidence classifier includes receiving a set of baseline confidence features from one or more decoded words, deriving word embedding confidence features from the baseline confidence features, joining the baseline confidence features with word embedding confidence features to create a feature vector, and executing the confidence classifier to generate a confidence score, wherein the confidence classifier is trained with a set of training examples having labeled features corresponding to the feature vector.
    Type: Application
    Filed: April 8, 2019
    Publication date: October 8, 2020
    Inventors: Kshitiz Kumar, Anastasios Anastasakos, Yifan Gong
  • Publication number: 20200312307
    Abstract: A computer implemented method classifies an input corresponding to multiple different kinds of input. The method includes obtaining a set of features from the input, providing the set of features to multiple different models to generate state predictions, generating a set of state-dependent predicted weights, and combining the state predictions from the multiple models, based on the state-dependent predicted weights for classification of the set of features.
    Type: Application
    Filed: March 25, 2019
    Publication date: October 1, 2020
    Inventors: Kshitiz Kumar, Yifan Gong
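The combination step in this abstract can be sketched as a per-state weighted sum of each model's predictions. The list-of-lists layout is an assumption; how the state-dependent weights are predicted is not specified here:

```python
def combine_predictions(state_predictions, state_weights):
    """Combine state predictions from multiple models using
    state-dependent predicted weights.

    state_predictions: list over models, each a list of per-state scores
    state_weights: matching list of per-state weights for each model
    """
    n_states = len(state_predictions[0])
    combined = []
    for s in range(n_states):
        total = sum(state_weights[m][s] * state_predictions[m][s]
                    for m in range(len(state_predictions)))
        combined.append(total)
    return combined
```

Because the weights vary by state, a model that is strong on some states and weak on others contributes only where it is reliable.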
  • Patent number: 10706852
    Abstract: The described technology provides arbitration between speech recognition results generated by different automatic speech recognition (ASR) engines, such as ASR engines trained according to different language or acoustic models. The system includes an arbitrator that selects between a first speech recognition result representing an acoustic utterance as transcribed by a first ASR engine and a second speech recognition result representing the acoustic utterance as transcribed by a second ASR engine. This selection is based on a set of confidence features that is initially used by the first ASR engine or the second ASR engine to generate the first and second speech recognition results.
    Type: Grant
    Filed: November 13, 2015
    Date of Patent: July 7, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Kshitiz Kumar, Hosam Khalil, Yifan Gong, Ziad Al-Bawab, Chaojun Liu
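The arbitrator described in this abstract can be sketched as a binary decision over the confidence features the engines already produced. The linear scorer and weights here are illustrative, not the patented model:

```python
def arbitrate(result_a, result_b, confidence_features, weights, bias=0.0):
    """Select between two ASR transcriptions of the same utterance using
    the confidence features their engines produced: a linear arbitrator
    scores the feature vector and picks engine A when the score is
    non-negative, engine B otherwise."""
    score = bias + sum(w * f for w, f in zip(weights, confidence_features))
    return result_a if score >= 0 else result_b
```

Reusing the engines' own confidence features means arbitration adds no extra decoding pass, only a cheap scoring step.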
  • Patent number: 10235994
    Abstract: The technology described herein uses a modular model to process speech. A deep learning based acoustic model comprises a stack of different types of neural network layers. The sub-modules of a deep learning based acoustic model can be used to represent distinct non-phonetic acoustic factors, such as accent origins (e.g. native, non-native), speech channels (e.g. mobile, bluetooth, desktop etc.), speech application scenario (e.g. voice search, short message dictation etc.), and speaker variation (e.g. individual speakers or clustered speakers), etc. The technology described herein uses certain sub-modules in a first context and a second group of sub-modules in a second context.
    Type: Grant
    Filed: June 30, 2016
    Date of Patent: March 19, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yan Huang, Chaojun Liu, Kshitiz Kumar, Kaustubh Prakash Kalgaonkar, Yifan Gong
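Context-dependent sub-module selection might look like the following sketch, where a shared layer stack is extended with the sub-modules matching each non-phonetic factor. The registry keys and layer names are invented for illustration:

```python
# Hypothetical registry of interchangeable sub-modules, keyed by the
# non-phonetic factor they model (accent, channel, scenario, speaker, ...).
SUBMODULES = {
    ("accent", "native"): "accent_native_layers",
    ("accent", "non-native"): "accent_nonnative_layers",
    ("channel", "mobile"): "channel_mobile_layers",
    ("channel", "bluetooth"): "channel_bluetooth_layers",
}

def assemble_model(shared_stack, context):
    """Build the layer stack for one context by appending the sub-modules
    that match each non-phonetic factor in the context."""
    stack = list(shared_stack)
    for factor, value in sorted(context.items()):
        module = SUBMODULES.get((factor, value))
        if module is not None:
            stack.append(module)
    return stack
```

The same shared stack thus serves a first context (say, native accent on mobile) and a second context by swapping only the factor-specific sub-modules.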
  • Patent number: 10115393
    Abstract: A computer-readable speaker-adapted speech engine acoustic model can be generated. The generating of the acoustic model can include performing speaker-specific adaptation of one or more layers of the model to produce one or more adaptive layers comprising layer weights, with the speaker-specific adaptation comprising a data size reduction technique. The data size reduction technique can be threshold value adaptation, corner area adaptation, diagonal-based quantization, adaptive matrix reduction, or a combination of these reduction techniques. The speaker-adapted speech engine model can be accessed and used in performing speech recognition on computer-readable audio speech input via a computerized speech recognition engine.
    Type: Grant
    Filed: October 31, 2016
    Date of Patent: October 30, 2018
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Kshitiz Kumar, Chaojun Liu, Yifan Gong
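Of the reduction techniques listed, threshold value adaptation is the easiest to sketch: store only the adapted weights whose change from the speaker-independent baseline exceeds a threshold. The sparse-dict representation below is an assumption about how such deltas could be stored:

```python
def threshold_adapt(base_weights, adapted_weights, threshold):
    """Keep only the adapted weights whose change from the speaker-
    independent baseline exceeds the threshold; everything else falls
    back to the base layer, shrinking per-speaker storage."""
    return {i: w for i, (b, w) in enumerate(zip(base_weights, adapted_weights))
            if abs(w - b) > threshold}

def apply_deltas(base_weights, sparse):
    """Reconstruct the speaker-adapted layer from the sparse overrides."""
    return [sparse.get(i, b) for i, b in enumerate(base_weights)]
```

Since most weights move little during speaker-specific adaptation, the retained set can be much smaller than the full layer.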
  • Patent number: 9997161
    Abstract: The described technology provides normalization of speech recognition confidence classifier (CC) scores that maintains the accuracy of acceptance metrics. A speech recognition CC scores quantitatively represents the correctness of decoded utterances in a defined range (e.g., [0,1]). An operating threshold is associated with a confidence classifier, such that utterance recognitions having scores exceeding the operating threshold are deemed acceptable. However, when a speech recognition engine, an acoustic model, and/or other parameters are updated by the platform, the correct-accept (CA) versus false-accept (FA) profile can change such that the application software's operating threshold is no longer valid or as accurate.
    Type: Grant
    Filed: September 11, 2015
    Date of Patent: June 12, 2018
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yifan Gong, Chaojun Liu, Kshitiz Kumar
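One way to realize such normalization (an assumption here, not the patented method) is a piecewise-linear remap that pins the updated model's operating point to the application's original threshold, so accept/reject decisions survive engine and acoustic-model updates:

```python
def normalize_score(score, new_threshold, old_threshold):
    """Piecewise-linear remap of a [0, 1] confidence score so that the
    updated model's operating point (new_threshold) lands exactly on the
    application's original operating threshold (old_threshold)."""
    if score <= new_threshold:
        return score * old_threshold / new_threshold
    span_in = 1.0 - new_threshold
    span_out = 1.0 - old_threshold
    return old_threshold + (score - new_threshold) * span_out / span_in
```

Scores at the new operating point map to the old threshold, and the endpoints 0 and 1 are preserved, so the application's fixed threshold keeps its intended correct-accept/false-accept behavior.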
  • Publication number: 20170256254
    Abstract: The technology described herein uses a modular model to process speech. A deep learning based acoustic model comprises a stack of different types of neural network layers. The sub-modules of a deep learning based acoustic model can be used to represent distinct non-phonetic acoustic factors, such as accent origins (e.g. native, non-native), speech channels (e.g. mobile, bluetooth, desktop etc.), speech application scenario (e.g. voice search, short message dictation etc.), and speaker variation (e.g. individual speakers or clustered speakers), etc. The technology described herein uses certain sub-modules in a first context and a second group of sub-modules in a second context.
    Type: Application
    Filed: June 30, 2016
    Publication date: September 7, 2017
    Inventors: Yan Huang, Chaojun Liu, Kshitiz Kumar, Kaustubh Prakash Kalgaonkar, Yifan Gong
  • Publication number: 20170140759
    Abstract: The described technology provides arbitration between speech recognition results generated by different automatic speech recognition (ASR) engines, such as ASR engines trained according to different language or acoustic models. The system includes an arbitrator that selects between a first speech recognition result representing an acoustic utterance as transcribed by a first ASR engine and a second speech recognition result representing the acoustic utterance as transcribed by a second ASR engine. This selection is based on a set of confidence features that is initially used by the first ASR engine or the second ASR engine to generate the first and second speech recognition results.
    Type: Application
    Filed: November 13, 2015
    Publication date: May 18, 2017
    Inventors: Kshitiz Kumar, Hosam Khalil, Yifan Gong, Ziad Al-Bawab, Chaojun Liu