Patents by Inventor Kshitiz Kumar

Kshitiz Kumar has filed for patents to protect the following inventions. This listing includes pending patent applications as well as patents already granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11929076
    Abstract: Disclosed speech recognition techniques improve user-perceived latency while maintaining accuracy by: receiving an audio stream, in parallel, by a primary (e.g., accurate) speech recognition engine (SRE) and a secondary (e.g., fast) SRE; generating, with the primary SRE, a primary result; generating, with the secondary SRE, a secondary result; appending the secondary result to a word list; and merging the primary result into the secondary result in the word list. Combining output from the primary and secondary SREs into a single decoder as described herein improves user-perceived latency while maintaining or improving accuracy, among other advantages.
    Type: Grant
    Filed: December 1, 2022
    Date of Patent: March 12, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Hosam Adel Khalil, Emilian Stoimenov, Christopher Hakan Basoglu, Kshitiz Kumar, Jian Wu
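The two-engine decoding described in this abstract can be sketched as a word list that surfaces the fast (secondary) SRE's output immediately and later merges in the accurate (primary) SRE's output. The `Word` type, its timing fields, and the overlap-based merge rule below are illustrative assumptions, not the patented implementation:

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    start: float   # start time in seconds
    end: float     # end time in seconds
    final: bool    # True once confirmed by the primary SRE

def append_secondary(word_list, words):
    """Append fast (secondary) SRE output so the user sees text immediately."""
    word_list.extend(Word(w.text, w.start, w.end, final=False) for w in words)

def merge_primary(word_list, primary_words):
    """Replace provisional secondary words that overlap the primary result's
    time span, keeping the primary (more accurate) words and marking them final."""
    if not primary_words:
        return word_list
    start = primary_words[0].start
    end = primary_words[-1].end
    kept = [w for w in word_list if w.final or w.end <= start or w.start >= end]
    merged = kept + [Word(w.text, w.start, w.end, final=True) for w in primary_words]
    merged.sort(key=lambda w: w.start)
    return merged
```

In this sketch the secondary result is shown as soon as it arrives, which is where the latency win comes from; the slower primary result then overwrites only the span it covers, so already-finalized words are never churned.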
  • Publication number: 20230401392
    Abstract: A data processing system is implemented for receiving speech data for a plurality of languages, and determining letters from the speech data. The data processing system also implements normalizing the speech data by applying linguistic based rules for Latin-based languages on the determined letters, building a computer model using the normalized speech data, fine-tuning the computer model using additional speech data, and recognizing words in a target language using the fine-tuned computer model.
    Type: Application
    Filed: June 9, 2022
    Publication date: December 14, 2023
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Kshitiz Kumar, Jian Wu, Bo Ren, Tianyu Wu, Fahimeh Bahmaninezhad, Edward C. Lin, Xiaoyang Chen, Changliang Liu
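The abstract does not spell out its "linguistic based rules"; one plausible reading for Latin-based languages, sketched here purely as an assumption, is case folding plus diacritic stripping so letters shared across languages map to a single form:

```python
import unicodedata

def normalize_letters(text: str) -> str:
    """Fold case and strip combining diacritics so letters shared across
    Latin-based languages map to one form (e.g. 'é', 'È', 'e' -> 'e')."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

Normalizing determined letters this way would let speech data from multiple Latin-script languages share one symbol inventory before model building.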
  • Patent number: 11620992
    Abstract: A method of enhancing an automated speech recognition confidence classifier includes receiving a set of baseline confidence features from one or more decoded words, deriving word embedding confidence features from the baseline confidence features, joining the baseline confidence features with word embedding confidence features to create a feature vector, and executing the confidence classifier to generate a confidence score, wherein the confidence classifier is trained with a set of training examples having labeled features corresponding to the feature vector.
    Type: Grant
    Filed: March 31, 2021
    Date of Patent: April 4, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Kshitiz Kumar, Anastasios Anastasakos, Yifan Gong
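The joining step in this abstract amounts to feature concatenation followed by a trained classifier. This sketch uses a plain logistic scorer with caller-supplied weights to show the shape of the computation; the actual classifier form and features are not specified here:

```python
import math

def join_features(baseline, embedding):
    """Concatenate baseline confidence features with word-embedding
    confidence features into a single feature vector."""
    return list(baseline) + list(embedding)

def confidence_score(features, weights, bias):
    """Logistic confidence classifier: maps the joined feature vector
    to a confidence score in [0, 1]."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))
```

In practice the weights and bias would come from training on examples whose labels mark each decoded word as correct or incorrect.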
  • Publication number: 20230102295
    Abstract: Disclosed speech recognition techniques improve user-perceived latency while maintaining accuracy by: receiving an audio stream, in parallel, by a primary (e.g., accurate) speech recognition engine (SRE) and a secondary (e.g., fast) SRE; generating, with the primary SRE, a primary result; generating, with the secondary SRE, a secondary result; appending the secondary result to a word list; and merging the primary result into the secondary result in the word list. Combining output from the primary and secondary SREs into a single decoder as described herein improves user-perceived latency while maintaining or improving accuracy, among other advantages.
    Type: Application
    Filed: December 1, 2022
    Publication date: March 30, 2023
    Inventors: Hosam Adel Khalil, Emilian Stoimenov, Christopher Hakan Basoglu, Kshitiz Kumar, Jian Wu
  • Patent number: 11532312
    Abstract: Disclosed speech recognition techniques improve user-perceived latency while maintaining accuracy by: receiving an audio stream, in parallel, by a primary (e.g., accurate) speech recognition engine (SRE) and a secondary (e.g., fast) SRE; generating, with the primary SRE, a primary result; generating, with the secondary SRE, a secondary result; appending the secondary result to a word list; and merging the primary result into the secondary result in the word list. Combining output from the primary and secondary SREs into a single decoder as described herein improves user-perceived latency while maintaining or improving accuracy, among other advantages.
    Type: Grant
    Filed: December 15, 2020
    Date of Patent: December 20, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Hosam Adel Khalil, Emilian Stoimenov, Christopher Hakan Basoglu, Kshitiz Kumar, Jian Wu
  • Publication number: 20220189467
    Abstract: Disclosed speech recognition techniques improve user-perceived latency while maintaining accuracy by: receiving an audio stream, in parallel, by a primary (e.g., accurate) speech recognition engine (SRE) and a secondary (e.g., fast) SRE; generating, with the primary SRE, a primary result; generating, with the secondary SRE, a secondary result; appending the secondary result to a word list; and merging the primary result into the secondary result in the word list. Combining output from the primary and secondary SREs into a single decoder as described herein improves user-perceived latency while maintaining or improving accuracy, among other advantages.
    Type: Application
    Filed: December 15, 2020
    Publication date: June 16, 2022
    Inventors: Hosam Adel Khalil, Emilian Stoimenov, Christopher Hakan Basoglu, Kshitiz Kumar, Jian Wu
  • Patent number: 11158305
    Abstract: Generally discussed herein are devices, systems, and methods for wake word verification. A method can include receiving, at a server, a message from a device indicating that an utterance of a user-defined wake word was detected at the device, the message including (a) audio samples or features extracted from the audio samples and (b) data indicating the user-defined wake word, retrieving or generating, at the server, a custom decoding graph for the user-defined wake word, wherein the decoding graph and a static portion of a wake word verification model form a custom wake word verification model for the user-defined wake word, executing the wake word verification model to determine a likelihood that the wake word was uttered, and providing a message to the device indicating whether the wake word was uttered based on the determined likelihood.
    Type: Grant
    Filed: July 25, 2019
    Date of Patent: October 26, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Khuram Shahid, Kshitiz Kumar, Teng Yi, Veljko Miljanic, Huaming Wang, Yifan Gong, Hosam Adel Khalil
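The server-side flow in this abstract can be sketched as a cached per-wake-word decoding graph combined with a scoring step and an operating threshold. The wake word "hey contoso", the graph representation, and the toy scorer below are all invented for illustration; a real system would compile phonetic units into a decoding graph and run static-model inference:

```python
# Hypothetical cache of custom decoding graphs, keyed by wake word.
_graph_cache = {}

def get_decoding_graph(wake_word: str):
    """Retrieve or generate the custom decoding graph for a user-defined
    wake word; generation happens once, then the graph is reused."""
    if wake_word not in _graph_cache:
        # Placeholder: a real system compiles phonetic units into a graph.
        _graph_cache[wake_word] = tuple(wake_word.lower().split())
    return _graph_cache[wake_word]

def score_against_graph(features, graph):
    """Toy stand-in for inference by the static verification model."""
    return min(1.0, sum(features) / max(len(graph), 1))

def verify(features, wake_word: str, threshold: float = 0.5) -> bool:
    """Score the likelihood that the wake word was uttered and compare it
    against an operating threshold to accept or reject the detection."""
    graph = get_decoding_graph(wake_word)
    likelihood = score_against_graph(features, graph)
    return likelihood >= threshold
```

The accept/reject bool is what the server would send back to the device in its response message.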
  • Publication number: 20210272557
    Abstract: A method of enhancing an automated speech recognition confidence classifier includes receiving a set of baseline confidence features from one or more decoded words, deriving word embedding confidence features from the baseline confidence features, joining the baseline confidence features with word embedding confidence features to create a feature vector, and executing the confidence classifier to generate a confidence score, wherein the confidence classifier is trained with a set of training examples having labeled features corresponding to the feature vector.
    Type: Application
    Filed: March 31, 2021
    Publication date: September 2, 2021
    Inventors: Kshitiz Kumar, Anastasios Anastasakos, Yifan Gong
  • Patent number: 10991365
    Abstract: A method of enhancing an automated speech recognition confidence classifier includes receiving a set of baseline confidence features from one or more decoded words, deriving word embedding confidence features from the baseline confidence features, joining the baseline confidence features with word embedding confidence features to create a feature vector, and executing the confidence classifier to generate a confidence score, wherein the confidence classifier is trained with a set of training examples having labeled features corresponding to the feature vector.
    Type: Grant
    Filed: April 8, 2019
    Date of Patent: April 27, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Kshitiz Kumar, Anastasios Anastasakos, Yifan Gong
  • Publication number: 20200349925
    Abstract: Generally discussed herein are devices, systems, and methods for wake word verification. A method can include receiving, at a server, a message from a device indicating that an utterance of a user-defined wake word was detected at the device, the message including (a) audio samples or features extracted from the audio samples and (b) data indicating the user-defined wake word, retrieving or generating, at the server, a custom decoding graph for the user-defined wake word, wherein the decoding graph and a static portion of a wake word verification model form a custom wake word verification model for the user-defined wake word, executing the wake word verification model to determine a likelihood that the wake word was uttered, and providing a message to the device indicating whether the wake word was uttered based on the determined likelihood.
    Type: Application
    Filed: July 25, 2019
    Publication date: November 5, 2020
    Inventors: Khuram Shahid, Kshitiz Kumar, Teng Yi, Veljko Miljanic, Huaming Wang, Yifan Gong, Hosam Adel Khalil
  • Publication number: 20200320985
    Abstract: A method of enhancing an automated speech recognition confidence classifier includes receiving a set of baseline confidence features from one or more decoded words, deriving word embedding confidence features from the baseline confidence features, joining the baseline confidence features with word embedding confidence features to create a feature vector, and executing the confidence classifier to generate a confidence score, wherein the confidence classifier is trained with a set of training examples having labeled features corresponding to the feature vector.
    Type: Application
    Filed: April 8, 2019
    Publication date: October 8, 2020
    Inventors: Kshitiz Kumar, Anastasios Anastasakos, Yifan Gong
  • Publication number: 20200312307
    Abstract: A computer implemented method classifies an input corresponding to multiple different kinds of input. The method includes obtaining a set of features from the input, providing the set of features to multiple different models to generate state predictions, generating a set of state-dependent predicted weights, and combining the state predictions from the multiple models, based on the state-dependent predicted weights for classification of the set of features.
    Type: Application
    Filed: March 25, 2019
    Publication date: October 1, 2020
    Inventors: Kshitiz Kumar, Yifan Gong
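The combination step in this abstract can be sketched as a per-state weighted sum of each model's predictions. The list-of-lists layout is an assumption; how the state-dependent weights are predicted is not specified here:

```python
def combine_predictions(state_predictions, state_weights):
    """Combine state predictions from multiple models using
    state-dependent predicted weights.

    state_predictions: list over models, each a list of per-state scores
    state_weights: matching list of per-state weights for each model
    """
    n_states = len(state_predictions[0])
    combined = []
    for s in range(n_states):
        total = sum(state_weights[m][s] * state_predictions[m][s]
                    for m in range(len(state_predictions)))
        combined.append(total)
    return combined
```

Because the weights vary by state, a model that is strong on some states and weak on others contributes only where it is reliable.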
  • Patent number: 10706852
    Abstract: The described technology provides arbitration between speech recognition results generated by different automatic speech recognition (ASR) engines, such as ASR engines trained according to different language or acoustic models. The system includes an arbitrator that selects between a first speech recognition result representing an acoustic utterance as transcribed by a first ASR engine and a second speech recognition result representing the acoustic utterance as transcribed by a second ASR engine. This selection is based on a set of confidence features that is initially used by the first ASR engine or the second ASR engine to generate the first and second speech recognition results.
    Type: Grant
    Filed: November 13, 2015
    Date of Patent: July 7, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Kshitiz Kumar, Hosam Khalil, Yifan Gong, Ziad Al-Bawab, Chaojun Liu
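The arbitrator described in this abstract can be sketched as a binary decision over the confidence features the engines already produced. The linear scorer and weights here are illustrative, not the patented model:

```python
def arbitrate(result_a, result_b, confidence_features, weights, bias=0.0):
    """Select between two ASR transcriptions of the same utterance using
    the confidence features their engines produced: a linear arbitrator
    scores the feature vector and picks engine A when the score is
    non-negative, engine B otherwise."""
    score = bias + sum(w * f for w, f in zip(weights, confidence_features))
    return result_a if score >= 0 else result_b
```

Reusing the engines' own confidence features means arbitration adds no extra decoding pass, only a cheap scoring step.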
  • Patent number: 10235994
    Abstract: The technology described herein uses a modular model to process speech. A deep learning based acoustic model comprises a stack of different types of neural network layers. The sub-modules of a deep learning based acoustic model can be used to represent distinct non-phonetic acoustic factors, such as accent origins (e.g. native, non-native), speech channels (e.g. mobile, bluetooth, desktop etc.), speech application scenario (e.g. voice search, short message dictation etc.), and speaker variation (e.g. individual speakers or clustered speakers), etc. The technology described herein uses certain sub-modules in a first context and a second group of sub-modules in a second context.
    Type: Grant
    Filed: June 30, 2016
    Date of Patent: March 19, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yan Huang, Chaojun Liu, Kshitiz Kumar, Kaustubh Prakash Kalgaonkar, Yifan Gong
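Context-dependent sub-module selection might look like the following sketch, where a shared layer stack is extended with the sub-modules matching each non-phonetic factor. The registry keys and layer names are invented for illustration:

```python
# Hypothetical registry of interchangeable sub-modules, keyed by the
# non-phonetic factor they model (accent, channel, scenario, speaker, ...).
SUBMODULES = {
    ("accent", "native"): "accent_native_layers",
    ("accent", "non-native"): "accent_nonnative_layers",
    ("channel", "mobile"): "channel_mobile_layers",
    ("channel", "bluetooth"): "channel_bluetooth_layers",
}

def assemble_model(shared_stack, context):
    """Build the layer stack for one context by appending the sub-modules
    that match each non-phonetic factor in the context."""
    stack = list(shared_stack)
    for factor, value in sorted(context.items()):
        module = SUBMODULES.get((factor, value))
        if module is not None:
            stack.append(module)
    return stack
```

The same shared stack thus serves a first context (say, native accent on mobile) and a second context by swapping only the factor-specific sub-modules.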
  • Patent number: 10115393
    Abstract: A computer-readable speaker-adapted speech engine acoustic model can be generated. The generating of the acoustic model can include performing speaker-specific adaptation of one or more layers of the model to produce one or more adaptive layers comprising layer weights, with the speaker-specific adaptation comprising a data size reduction technique. The data size reduction technique can be threshold value adaptation, corner area adaptation, diagonal-based quantization, adaptive matrix reduction, or a combination of these reduction techniques. The speaker-adapted speech engine model can be accessed and used in performing speech recognition on computer-readable audio speech input via a computerized speech recognition engine.
    Type: Grant
    Filed: October 31, 2016
    Date of Patent: October 30, 2018
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Kshitiz Kumar, Chaojun Liu, Yifan Gong
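Of the reduction techniques listed, threshold value adaptation is the easiest to sketch: store only the adapted weights whose change from the speaker-independent baseline exceeds a threshold. The sparse-dict representation below is an assumption about how such deltas could be stored:

```python
def threshold_adapt(base_weights, adapted_weights, threshold):
    """Keep only the adapted weights whose change from the speaker-
    independent baseline exceeds the threshold; everything else falls
    back to the base layer, shrinking per-speaker storage."""
    return {i: w for i, (b, w) in enumerate(zip(base_weights, adapted_weights))
            if abs(w - b) > threshold}

def apply_deltas(base_weights, sparse):
    """Reconstruct the speaker-adapted layer from the sparse overrides."""
    return [sparse.get(i, b) for i, b in enumerate(base_weights)]
```

Since most weights move little during speaker-specific adaptation, the retained set can be much smaller than the full layer.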
  • Patent number: 9997161
    Abstract: The described technology provides normalization of speech recognition confidence classifier (CC) scores that maintains the accuracy of acceptance metrics. A speech recognition CC scores quantitatively represents the correctness of decoded utterances in a defined range (e.g., [0,1]). An operating threshold is associated with a confidence classifier, such that utterance recognitions having scores exceeding the operating threshold are deemed acceptable. However, when a speech recognition engine, an acoustic model, and/or other parameters are updated by the platform, the correct-accept (CA) versus false-accept (FA) profile can change such that the application software's operating threshold is no longer valid or as accurate.
    Type: Grant
    Filed: September 11, 2015
    Date of Patent: June 12, 2018
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yifan Gong, Chaojun Liu, Kshitiz Kumar
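One way to realize such normalization (an assumption here, not the patented method) is a piecewise-linear remap that pins the updated model's operating point to the application's original threshold, so accept/reject decisions survive engine and acoustic-model updates:

```python
def normalize_score(score, new_threshold, old_threshold):
    """Piecewise-linear remap of a [0, 1] confidence score so that the
    updated model's operating point (new_threshold) lands exactly on the
    application's original operating threshold (old_threshold)."""
    if score <= new_threshold:
        return score * old_threshold / new_threshold
    span_in = 1.0 - new_threshold
    span_out = 1.0 - old_threshold
    return old_threshold + (score - new_threshold) * span_out / span_in
```

Scores at the new operating point map to the old threshold, and the endpoints 0 and 1 are preserved, so the application's fixed threshold keeps its intended correct-accept/false-accept behavior.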
  • Publication number: 20170256254
    Abstract: The technology described herein uses a modular model to process speech. A deep learning based acoustic model comprises a stack of different types of neural network layers. The sub-modules of a deep learning based acoustic model can be used to represent distinct non-phonetic acoustic factors, such as accent origins (e.g. native, non-native), speech channels (e.g. mobile, bluetooth, desktop etc.), speech application scenario (e.g. voice search, short message dictation etc.), and speaker variation (e.g. individual speakers or clustered speakers), etc. The technology described herein uses certain sub-modules in a first context and a second group of sub-modules in a second context.
    Type: Application
    Filed: June 30, 2016
    Publication date: September 7, 2017
    Inventors: Yan Huang, Chaojun Liu, Kshitiz Kumar, Kaustubh Prakash Kalgaonkar, Yifan Gong
  • Publication number: 20170140759
    Abstract: The described technology provides arbitration between speech recognition results generated by different automatic speech recognition (ASR) engines, such as ASR engines trained according to different language or acoustic models. The system includes an arbitrator that selects between a first speech recognition result representing an acoustic utterance as transcribed by a first ASR engine and a second speech recognition result representing the acoustic utterance as transcribed by a second ASR engine. This selection is based on a set of confidence features that is initially used by the first ASR engine or the second ASR engine to generate the first and second speech recognition results.
    Type: Application
    Filed: November 13, 2015
    Publication date: May 18, 2017
    Inventors: Kshitiz Kumar, Hosam Khalil, Yifan Gong, Ziad Al-Bawab, Chaojun Liu