Patents by Inventor Kshitiz Kumar

Kshitiz Kumar has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240362506
    Abstract: This disclosure describes one or more implementations of a video inference system that utilizes machine-learning models to efficiently and flexibly process digital videos utilizing various improved video inference architectures. For example, the video inference system provides a framework for improving digital video processing by increasing the efficiency of both central processing units (CPUs) and graphics processing units (GPUs). In one example, the video inference system utilizes a first video inference architecture to reduce the number of computing resources needed to inference digital videos by analyzing multiple digital videos utilizing sets of CPU/GPU containers along with parallel pipeline processing. In a further example, the video inference system utilizes a second video inference architecture that facilitates multiple CPUs to preprocess multiple digital videos in parallel as well as a GPU to continuously, sequentially, and efficiently inference each of the digital videos.
    Type: Application
    Filed: July 12, 2024
    Publication date: October 31, 2024
    Inventors: Akhilesh Kumar, Xiaozhen Xue, Daniel Miranda, Nicolas Huynh Thien, Kshitiz Garg
  • Publication number: 20240323465
    Abstract: The present disclosure relates to a system, a method and a computer-readable medium for tagging live streaming data. The method includes generating a first intermediate tag for the live streaming program, generating a second intermediate tag for the live streaming program, and determining a final tag for the live streaming program according to the first intermediate tag and the second intermediate tag. The present disclosure can categorize contents in a more granular and precise way.
    Type: Application
    Filed: June 4, 2024
    Publication date: September 26, 2024
    Inventors: Hemanth Kumar AJARU, Kshitiz YADAV, Durgesh KUMAR, Hardik TANEJA, Shih Bo LIN
  • Patent number: 12067499
    Abstract: This disclosure describes one or more implementations of a video inference system that utilizes machine-learning models to efficiently and flexibly process digital videos utilizing various improved video inference architectures. For example, the video inference system provides a framework for improving digital video processing by increasing the efficiency of both central processing units (CPUs) and graphics processing units (GPUs). In one example, the video inference system utilizes a first video inference architecture to reduce the number of computing resources needed to inference digital videos by analyzing multiple digital videos utilizing sets of CPU/GPU containers along with parallel pipeline processing. In a further example, the video inference system utilizes a second video inference architecture that facilitates multiple CPUs to preprocess multiple digital videos in parallel as well as a GPU to continuously, sequentially, and efficiently inference each of the digital videos.
    Type: Grant
    Filed: November 2, 2020
    Date of Patent: August 20, 2024
    Assignee: Adobe Inc.
    Inventors: Akhilesh Kumar, Xiaozhen Xue, Daniel Miranda, Nicolas Huynh Thien, Kshitiz Garg
  • Patent number: 12014728
    Abstract: A computer-implemented method classifies an input corresponding to multiple different kinds of input. The method includes obtaining a set of features from the input, providing the set of features to multiple different models to generate state predictions, generating a set of state-dependent predicted weights, and combining the state predictions from the multiple models, based on the state-dependent predicted weights, for classification of the set of features.
    Type: Grant
    Filed: March 25, 2019
    Date of Patent: June 18, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Kshitiz Kumar, Yifan Gong
  • Patent number: 11929076
    Abstract: Disclosed speech recognition techniques improve user-perceived latency while maintaining accuracy by: receiving an audio stream, in parallel, by a primary (e.g., accurate) speech recognition engine (SRE) and a secondary (e.g., fast) SRE; generating, with the primary SRE, a primary result; generating, with the secondary SRE, a secondary result; appending the secondary result to a word list; and merging the primary result into the secondary result in the word list. Combining output from the primary and secondary SREs into a single decoder as described herein improves user-perceived latency while maintaining or improving accuracy, among other advantages.
    Type: Grant
    Filed: December 1, 2022
    Date of Patent: March 12, 2024
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Hosam Adel Khalil, Emilian Stoimenov, Christopher Hakan Basoglu, Kshitiz Kumar, Jian Wu
  • Publication number: 20230401392
    Abstract: A data processing system is implemented for receiving speech data for a plurality of languages, and determining letters from the speech data. The data processing system also implements normalizing the speech data by applying linguistic based rules for Latin-based languages on the determined letters, building a computer model using the normalized speech data, fine-tuning the computer model using additional speech data, and recognizing words in a target language using the fine-tuned computer model.
    Type: Application
    Filed: June 9, 2022
    Publication date: December 14, 2023
    Applicant: Microsoft Technology Licensing, LLC
    Inventors: Kshitiz KUMAR, Jian WU, Bo REN, Tianyu WU, Fahimeh BAHMANINEZHAD, Edward C. LIN, Xiaoyang CHEN, Changliang LIU
  • Patent number: 11620992
    Abstract: A method of enhancing an automated speech recognition confidence classifier includes receiving a set of baseline confidence features from one or more decoded words, deriving word embedding confidence features from the baseline confidence features, joining the baseline confidence features with word embedding confidence features to create a feature vector, and executing the confidence classifier to generate a confidence score, wherein the confidence classifier is trained with a set of training examples having labeled features corresponding to the feature vector.
    Type: Grant
    Filed: March 31, 2021
    Date of Patent: April 4, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Kshitiz Kumar, Anastasios Anastasakos, Yifan Gong
  • Publication number: 20230102295
    Abstract: Disclosed speech recognition techniques improve user-perceived latency while maintaining accuracy by: receiving an audio stream, in parallel, by a primary (e.g., accurate) speech recognition engine (SRE) and a secondary (e.g., fast) SRE; generating, with the primary SRE, a primary result; generating, with the secondary SRE, a secondary result; appending the secondary result to a word list; and merging the primary result into the secondary result in the word list. Combining output from the primary and secondary SREs into a single decoder as described herein improves user-perceived latency while maintaining or improving accuracy, among other advantages.
    Type: Application
    Filed: December 1, 2022
    Publication date: March 30, 2023
    Inventors: Hosam Adel KHALIL, Emilian STOIMENOV, Christopher Hakan BASOGLU, Kshitiz KUMAR, Jian WU
  • Patent number: 11532312
    Abstract: Disclosed speech recognition techniques improve user-perceived latency while maintaining accuracy by: receiving an audio stream, in parallel, by a primary (e.g., accurate) speech recognition engine (SRE) and a secondary (e.g., fast) SRE; generating, with the primary SRE, a primary result; generating, with the secondary SRE, a secondary result; appending the secondary result to a word list; and merging the primary result into the secondary result in the word list. Combining output from the primary and secondary SREs into a single decoder as described herein improves user-perceived latency while maintaining or improving accuracy, among other advantages.
    Type: Grant
    Filed: December 15, 2020
    Date of Patent: December 20, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Hosam Adel Khalil, Emilian Stoimenov, Christopher Hakan Basoglu, Kshitiz Kumar, Jian Wu
  • Publication number: 20220189467
    Abstract: Disclosed speech recognition techniques improve user-perceived latency while maintaining accuracy by: receiving an audio stream, in parallel, by a primary (e.g., accurate) speech recognition engine (SRE) and a secondary (e.g., fast) SRE; generating, with the primary SRE, a primary result; generating, with the secondary SRE, a secondary result; appending the secondary result to a word list; and merging the primary result into the secondary result in the word list. Combining output from the primary and secondary SREs into a single decoder as described herein improves user-perceived latency while maintaining or improving accuracy, among other advantages.
    Type: Application
    Filed: December 15, 2020
    Publication date: June 16, 2022
    Inventors: Hosam Adel KHALIL, Emilian STOIMENOV, Christopher Hakan BASOGLU, Kshitiz KUMAR, Jian WU
  • Patent number: 11158305
    Abstract: Generally discussed herein are devices, systems, and methods for wake word verification. A method can include receiving, at a server, a message from a device indicating that an utterance of a user-defined wake word was detected at the device, the message including (a) audio samples or features extracted from the audio samples and (b) data indicating the user-defined wake word, retrieving or generating, at the server, a custom decoding graph for the user-defined wake word, wherein the decoding graph and the static portion of the wake word verification model form a custom wake word verification model for the user-defined wake word, executing the wake word verification model to determine a likelihood that the wake word was uttered, and providing a message to the device indicating whether the wake word was uttered based on the determined likelihood.
    Type: Grant
    Filed: July 25, 2019
    Date of Patent: October 26, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Khuram Shahid, Kshitiz Kumar, Teng Yi, Veljko Miljanic, Huaming Wang, Yifan Gong, Hosam Adel Khalil
  • Publication number: 20210272557
    Abstract: A method of enhancing an automated speech recognition confidence classifier includes receiving a set of baseline confidence features from one or more decoded words, deriving word embedding confidence features from the baseline confidence features, joining the baseline confidence features with word embedding confidence features to create a feature vector, and executing the confidence classifier to generate a confidence score, wherein the confidence classifier is trained with a set of training examples having labeled features corresponding to the feature vector.
    Type: Application
    Filed: March 31, 2021
    Publication date: September 2, 2021
    Inventors: Kshitiz Kumar, Anastasios Anastasakos, Yifan Gong
  • Patent number: 10991365
    Abstract: A method of enhancing an automated speech recognition confidence classifier includes receiving a set of baseline confidence features from one or more decoded words, deriving word embedding confidence features from the baseline confidence features, joining the baseline confidence features with word embedding confidence features to create a feature vector, and executing the confidence classifier to generate a confidence score, wherein the confidence classifier is trained with a set of training examples having labeled features corresponding to the feature vector.
    Type: Grant
    Filed: April 8, 2019
    Date of Patent: April 27, 2021
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Kshitiz Kumar, Anastasios Anastasakos, Yifan Gong
  • Publication number: 20200349925
    Abstract: Generally discussed herein are devices, systems, and methods for wake word verification. A method can include receiving, at a server, a message from a device indicating that an utterance of a user-defined wake word was detected at the device, the message including (a) audio samples or features extracted from the audio samples and (b) data indicating the user-defined wake word, retrieving or generating, at the server, a custom decoding graph for the user-defined wake word, wherein the decoding graph and the static portion of the wake word verification model form a custom wake word verification model for the user-defined wake word, executing the wake word verification model to determine a likelihood that the wake word was uttered, and providing a message to the device indicating whether the wake word was uttered based on the determined likelihood.
    Type: Application
    Filed: July 25, 2019
    Publication date: November 5, 2020
    Inventors: Khuram Shahid, Kshitiz Kumar, Teng Yi, Veljko Miljanic, Huaming Wang, Yifan Gong, Hosam Adel Khalil
  • Publication number: 20200320985
    Abstract: A method of enhancing an automated speech recognition confidence classifier includes receiving a set of baseline confidence features from one or more decoded words, deriving word embedding confidence features from the baseline confidence features, joining the baseline confidence features with word embedding confidence features to create a feature vector, and executing the confidence classifier to generate a confidence score, wherein the confidence classifier is trained with a set of training examples having labeled features corresponding to the feature vector.
    Type: Application
    Filed: April 8, 2019
    Publication date: October 8, 2020
    Inventors: Kshitiz Kumar, Anastasios Anastasakos, Yifan Gong
  • Publication number: 20200312307
    Abstract: A computer-implemented method classifies an input corresponding to multiple different kinds of input. The method includes obtaining a set of features from the input, providing the set of features to multiple different models to generate state predictions, generating a set of state-dependent predicted weights, and combining the state predictions from the multiple models, based on the state-dependent predicted weights, for classification of the set of features.
    Type: Application
    Filed: March 25, 2019
    Publication date: October 1, 2020
    Inventors: Kshitiz Kumar, Yifan Gong
  • Patent number: 10706852
    Abstract: The described technology provides arbitration between speech recognition results generated by different automatic speech recognition (ASR) engines, such as ASR engines trained according to different language or acoustic models. The system includes an arbitrator that selects between a first speech recognition result representing an acoustic utterance as transcribed by a first ASR engine and a second speech recognition result representing the acoustic utterance as transcribed by a second ASR engine. This selection is based on a set of confidence features that is initially used by the first ASR engine or the second ASR engine to generate the first and second speech recognition results.
    Type: Grant
    Filed: November 13, 2015
    Date of Patent: July 7, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Kshitiz Kumar, Hosam Khalil, Yifan Gong, Ziad Al-Bawab, Chaojun Liu
  • Patent number: 10235994
    Abstract: The technology described herein uses a modular model to process speech. A deep learning based acoustic model comprises a stack of different types of neural network layers. The sub-modules of a deep learning based acoustic model can be used to represent distinct non-phonetic acoustic factors, such as accent origins (e.g., native, non-native), speech channels (e.g., mobile, Bluetooth, desktop, etc.), speech application scenario (e.g., voice search, short message dictation, etc.), and speaker variation (e.g., individual speakers or clustered speakers). The technology described herein uses certain sub-modules in a first context and a second group of sub-modules in a second context.
    Type: Grant
    Filed: June 30, 2016
    Date of Patent: March 19, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yan Huang, Chaojun Liu, Kshitiz Kumar, Kaustubh Prakash Kalgaonkar, Yifan Gong
  • Patent number: 10115393
    Abstract: A computer-readable speaker-adapted speech engine acoustic model can be generated. The generating of the acoustic model can include performing speaker-specific adaptation of one or more layers of the model to produce one or more adaptive layers comprising layer weights, with the speaker-specific adaptation comprising a data size reduction technique. The data size reduction technique can be threshold value adaptation, corner area adaptation, diagonal-based quantization, adaptive matrix reduction, or a combination of these reduction techniques. The speaker-adapted speech engine model can be accessed and used in performing speech recognition on computer-readable audio speech input via a computerized speech recognition engine.
    Type: Grant
    Filed: October 31, 2016
    Date of Patent: October 30, 2018
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Kshitiz Kumar, Chaojun Liu, Yifan Gong
  • Patent number: 9997161
    Abstract: The described technology provides normalization of speech recognition confidence classifier (CC) scores that maintains the accuracy of acceptance metrics. A speech recognition CC score quantitatively represents the correctness of decoded utterances in a defined range (e.g., [0,1]). An operating threshold is associated with a confidence classifier, such that utterance recognitions having scores exceeding the operating threshold are deemed acceptable. However, when a speech recognition engine, an acoustic model, and/or other parameters are updated by the platform, the correct-accept (CA) versus false-accept (FA) profile can change such that the application software's operating threshold is no longer valid or as accurate.
    Type: Grant
    Filed: September 11, 2015
    Date of Patent: June 12, 2018
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yifan Gong, Chaojun Liu, Kshitiz Kumar
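
Illustrative note on Patent number 12014728 (listed above): the abstract describes combining state predictions from several models using state-dependent predicted weights. The following is a minimal sketch of one way such a combination could look, not the patented method itself. The function name `combine_state_predictions`, the per-state weighted average, and all numeric values are assumptions made for illustration; the abstract does not specify how the weights are produced or applied.

```python
def combine_state_predictions(model_posteriors, state_weights):
    """Blend per-state predictions from several models (hypothetical helper).

    model_posteriors[m][s]: score model m assigns to state s.
    state_weights[m][s]:    assumed weight given to model m for state s.
    Returns a list of combined state scores that sums to 1.
    """
    num_models = len(model_posteriors)
    num_states = len(model_posteriors[0])
    combined = []
    for s in range(num_states):
        # Weighted average across models, using the weights for this state only.
        weight_sum = sum(state_weights[m][s] for m in range(num_models))
        mixed = sum(
            state_weights[m][s] * model_posteriors[m][s]
            for m in range(num_models)
        ) / weight_sum
        combined.append(mixed)
    # Renormalize so the combined scores form a distribution over states.
    total = sum(combined)
    return [c / total for c in combined]


# Made-up example: two models, three states.
model_posteriors = [
    [0.7, 0.2, 0.1],  # model A
    [0.3, 0.4, 0.3],  # model B
]
state_weights = [
    [0.9, 0.5, 0.2],  # trust in model A, per state
    [0.1, 0.5, 0.8],  # trust in model B, per state
]
combined = combine_state_predictions(model_posteriors, state_weights)
predicted_state = max(range(len(combined)), key=combined.__getitem__)
print(combined, predicted_state)
```

In this sketch the weights vary by state, so a model that is trusted for one state can be largely ignored for another; a single global weight per model would not capture that.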