Patents by Inventor Kshitiz Kumar
Kshitiz Kumar has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240362506
Abstract: This disclosure describes one or more implementations of a video inference system that utilizes machine-learning models to efficiently and flexibly process digital videos utilizing various improved video inference architectures. For example, the video inference system provides a framework for improving digital video processing by increasing the efficiency of both central processing units (CPUs) and graphics processing units (GPUs). In one example, the video inference system utilizes a first video inference architecture to reduce the number of computing resources needed to inference digital videos by analyzing multiple digital videos utilizing sets of CPU/GPU containers along with parallel pipeline processing. In a further example, the video inference system utilizes a second video inference architecture that facilitates multiple CPUs to preprocess multiple digital videos in parallel as well as a GPU to continuously, sequentially, and efficiently inference each of the digital videos.
Type: Application
Filed: July 12, 2024
Publication date: October 31, 2024
Inventors: Akhilesh Kumar, Xiaozhen Xue, Daniel Miranda, Nicolas Huynh Thien, Kshitiz Garg
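The second architecture the abstract describes (several CPUs preprocessing videos in parallel, with a single GPU inferencing the results one after another) can be sketched roughly as below. `preprocess` and `infer` are illustrative placeholders, not the patented implementation:

```python
import queue
import threading

def preprocess(frame):
    """Stand-in for a CPU-bound decode/resize step."""
    return frame * 2

def infer(item):
    """Stand-in for a GPU model forward pass."""
    return item + 1

def run_pipeline(videos, ):
    """CPU worker threads preprocess all videos in parallel and push the
    results onto a shared queue; a single "GPU" consumer then drains the
    queue and inferences each item sequentially."""
    work = queue.Queue()

    def cpu_worker(video):
        work.put(preprocess(video))

    threads = [threading.Thread(target=cpu_worker, args=(v,)) for v in videos]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    work.put(None)  # sentinel: all preprocessing is finished

    results = []
    while True:
        item = work.get()
        if item is None:
            break
        results.append(infer(item))
    return results
```

The queue decouples the parallel producers from the single sequential consumer, which is the core of keeping the GPU continuously fed.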
-
Publication number: 20240323465
Abstract: The present disclosure relates to a system, a method and a computer-readable medium for tagging live streaming data. The method includes generating a first intermediate tag for the live streaming program, generating a second intermediate tag for the live streaming program, and determining a final tag for the live streaming program according to the first intermediate tag and the second intermediate tag. The present disclosure can categorize contents in a more granular and precise way.
Type: Application
Filed: June 4, 2024
Publication date: September 26, 2024
Inventors: Hemanth Kumar AJARU, Kshitiz YADAV, Durgesh KUMAR, Hardik TANEJA, Shih Bo LIN
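The abstract does not specify how the two intermediate tags are combined; a minimal sketch under the assumption of a simple agreement rule (which is an illustration, not the claimed method) might look like:

```python
def final_tag(first_intermediate, second_intermediate, fallback="other"):
    """Determine a final tag from two intermediate tags.
    Assumed rule: if the two taggers agree, keep the tag;
    otherwise fall back to a generic category."""
    if first_intermediate == second_intermediate:
        return first_intermediate
    return fallback
```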
-
Patent number: 12067499
Abstract: This disclosure describes one or more implementations of a video inference system that utilizes machine-learning models to efficiently and flexibly process digital videos utilizing various improved video inference architectures. For example, the video inference system provides a framework for improving digital video processing by increasing the efficiency of both central processing units (CPUs) and graphics processing units (GPUs). In one example, the video inference system utilizes a first video inference architecture to reduce the number of computing resources needed to inference digital videos by analyzing multiple digital videos utilizing sets of CPU/GPU containers along with parallel pipeline processing. In a further example, the video inference system utilizes a second video inference architecture that facilitates multiple CPUs to preprocess multiple digital videos in parallel as well as a GPU to continuously, sequentially, and efficiently inference each of the digital videos.
Type: Grant
Filed: November 2, 2020
Date of Patent: August 20, 2024
Assignee: Adobe Inc.
Inventors: Akhilesh Kumar, Xiaozhen Xue, Daniel Miranda, Nicolas Huynh Thien, Kshitiz Garg
-
Patent number: 12014728
Abstract: A computer implemented method classifies an input corresponding to multiple different kinds of input. The method includes obtaining a set of features from the input, providing the set of features to multiple different models to generate state predictions, generating a set of state-dependent predicted weights, and combining the state predictions from the multiple models, based on the state-dependent predicted weights for classification of the set of features.
Type: Grant
Filed: March 25, 2019
Date of Patent: June 18, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Kshitiz Kumar, Yifan Gong
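The combination step the abstract describes (weighting each model's state predictions by state-dependent weights) reduces to a weighted sum; a minimal sketch, with the feature extraction and weight prediction left as assumptions:

```python
def combine_predictions(state_preds, weights):
    """state_preds: one score vector per model; weights: a
    state-dependent weight per model (assumed to sum to 1).
    Returns the per-class weighted combination of the predictions."""
    num_classes = len(state_preds[0])
    combined = [0.0] * num_classes
    for preds, w in zip(state_preds, weights):
        for i, p in enumerate(preds):
            combined[i] += w * p
    return combined
```

Because the weights are themselves predicted per input state, the system can lean on whichever model is more reliable for the current kind of input.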
-
Patent number: 11929076
Abstract: Disclosed speech recognition techniques improve user-perceived latency while maintaining accuracy by: receiving an audio stream, in parallel, by a primary (e.g., accurate) speech recognition engine (SRE) and a secondary (e.g., fast) SRE; generating, with the primary SRE, a primary result; generating, with the secondary SRE, a secondary result; appending the secondary result to a word list; and merging the primary result into the secondary result in the word list. Combining output from the primary and secondary SREs into a single decoder as described herein improves user-perceived latency while maintaining or improving accuracy, among other advantages.
Type: Grant
Filed: December 1, 2022
Date of Patent: March 12, 2024
Assignee: Microsoft Technology Licensing, LLC
Inventors: Hosam Adel Khalil, Emilian Stoimenov, Christopher Hakan Basoglu, Kshitiz Kumar, Jian Wu
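The append-then-merge flow in the abstract (show the fast engine's words immediately, then replace them with the accurate engine's hypothesis) can be sketched as follows; the exact merge logic here is an illustrative assumption:

```python
def merge_results(word_list, secondary_words, primary_words):
    """Append the fast (secondary) SRE's words for low perceived
    latency, then overwrite that span with the accurate (primary)
    SRE's hypothesis once it arrives."""
    start = len(word_list)
    word_list.extend(secondary_words)  # displayed to the user right away
    # Merge: replace the provisional span with the primary result.
    word_list[start:start + len(secondary_words)] = primary_words
    return word_list
```

The user sees a provisional transcript almost immediately, and the final word list carries the primary engine's accuracy.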
-
Publication number: 20230401392
Abstract: A data processing system is implemented for receiving speech data for a plurality of languages, and determining letters from the speech data. The data processing system also implements normalizing the speech data by applying linguistic based rules for Latin-based languages on the determined letters, building a computer model using the normalized speech data, fine-tuning the computer model using additional speech data, and recognizing words in a target language using the fine-tuned computer model.
Type: Application
Filed: June 9, 2022
Publication date: December 14, 2023
Applicant: Microsoft Technology Licensing, LLC
Inventors: Kshitiz KUMAR, Jian WU, Bo REN, Tianyu WU, Fahimeh BAHMANINEZHAD, Edward C. LIN, Xiaoyang CHEN, Changliang LIU
-
Patent number: 11620992
Abstract: A method of enhancing an automated speech recognition confidence classifier includes receiving a set of baseline confidence features from one or more decoded words, deriving word embedding confidence features from the baseline confidence features, joining the baseline confidence features with word embedding confidence features to create a feature vector, and executing the confidence classifier to generate a confidence score, wherein the confidence classifier is trained with a set of training examples having labeled features corresponding to the feature vector.
Type: Grant
Filed: March 31, 2021
Date of Patent: April 4, 2023
Assignee: Microsoft Technology Licensing, LLC
Inventors: Kshitiz Kumar, Anastasios Anastasakos, Yifan Gong
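The feature-joining and scoring steps can be sketched as below. The patent does not fix the classifier family, so the linear-plus-sigmoid scorer here is an assumption for illustration:

```python
import math

def build_feature_vector(baseline_feats, embedding_feats):
    """Join baseline confidence features with word-embedding-derived
    confidence features into one vector for the classifier."""
    return list(baseline_feats) + list(embedding_feats)

def confidence_score(feature_vector, weights, bias=0.0):
    """Illustrative linear confidence classifier: weighted sum of the
    joined features, squashed to a (0, 1) confidence score."""
    z = bias + sum(w * f for w, f in zip(weights, feature_vector))
    return 1.0 / (1.0 + math.exp(-z))
```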
-
Publication number: 20230102295
Abstract: Disclosed speech recognition techniques improve user-perceived latency while maintaining accuracy by: receiving an audio stream, in parallel, by a primary (e.g., accurate) speech recognition engine (SRE) and a secondary (e.g., fast) SRE; generating, with the primary SRE, a primary result; generating, with the secondary SRE, a secondary result; appending the secondary result to a word list; and merging the primary result into the secondary result in the word list. Combining output from the primary and secondary SREs into a single decoder as described herein improves user-perceived latency while maintaining or improving accuracy, among other advantages.
Type: Application
Filed: December 1, 2022
Publication date: March 30, 2023
Inventors: Hosam Adel KHALIL, Emilian STOIMENOV, Christopher Hakan BASOGLU, Kshitiz KUMAR, Jian WU
-
Patent number: 11532312
Abstract: Disclosed speech recognition techniques improve user-perceived latency while maintaining accuracy by: receiving an audio stream, in parallel, by a primary (e.g., accurate) speech recognition engine (SRE) and a secondary (e.g., fast) SRE; generating, with the primary SRE, a primary result; generating, with the secondary SRE, a secondary result; appending the secondary result to a word list; and merging the primary result into the secondary result in the word list. Combining output from the primary and secondary SREs into a single decoder as described herein improves user-perceived latency while maintaining or improving accuracy, among other advantages.
Type: Grant
Filed: December 15, 2020
Date of Patent: December 20, 2022
Assignee: Microsoft Technology Licensing, LLC
Inventors: Hosam Adel Khalil, Emilian Stoimenov, Christopher Hakan Basoglu, Kshitiz Kumar, Jian Wu
-
Publication number: 20220189467
Abstract: Disclosed speech recognition techniques improve user-perceived latency while maintaining accuracy by: receiving an audio stream, in parallel, by a primary (e.g., accurate) speech recognition engine (SRE) and a secondary (e.g., fast) SRE; generating, with the primary SRE, a primary result; generating, with the secondary SRE, a secondary result; appending the secondary result to a word list; and merging the primary result into the secondary result in the word list. Combining output from the primary and secondary SREs into a single decoder as described herein improves user-perceived latency while maintaining or improving accuracy, among other advantages.
Type: Application
Filed: December 15, 2020
Publication date: June 16, 2022
Inventors: Hosam Adel KHALIL, Emilian STOIMENOV, Christopher Hakan BASOGLU, Kshitiz KUMAR, Jian WU
-
Patent number: 11158305
Abstract: Generally discussed herein are devices, systems, and methods for wake word verification. A method can include receiving, at a server, a message from a device indicating that an utterance of a user-defined wake word was detected at the device, the message including (a) audio samples or features extracted from the audio samples and (b) data indicating the user-defined wake word, retrieving or generating, at the server, a custom decoding graph for the user-defined wake word, wherein the decoding graph and the static portion of the wake word verification model form a custom wake word verification model for the user-defined wake word, executing the wake word verification model to determine a likelihood that the wake word was uttered, and providing a message to the device indicating whether the wake word was uttered based on the determined likelihood.
Type: Grant
Filed: July 25, 2019
Date of Patent: October 26, 2021
Assignee: Microsoft Technology Licensing, LLC
Inventors: Khuram Shahid, Kshitiz Kumar, Teng Yi, Veljko Miljanic, Huaming Wang, Yifan Gong, Hosam Adel Khalil
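The final decision step (turn the verification model's likelihood into an accept/reject message for the device) can be sketched as follows; the threshold value and message shape are illustrative assumptions:

```python
def verify_wake_word(likelihood, threshold=0.5):
    """Server-side verification decision: accept the device's on-device
    wake-word detection only if the custom verification model's
    likelihood clears the operating threshold."""
    return {
        "wake_word_uttered": likelihood >= threshold,
        "likelihood": likelihood,
    }
```

The heavy lifting (building the custom decoding graph and scoring the audio) happens server-side, so the device only needs a lightweight detector plus this yes/no reply.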
-
Publication number: 20210272557
Abstract: A method of enhancing an automated speech recognition confidence classifier includes receiving a set of baseline confidence features from one or more decoded words, deriving word embedding confidence features from the baseline confidence features, joining the baseline confidence features with word embedding confidence features to create a feature vector, and executing the confidence classifier to generate a confidence score, wherein the confidence classifier is trained with a set of training examples having labeled features corresponding to the feature vector.
Type: Application
Filed: March 31, 2021
Publication date: September 2, 2021
Inventors: Kshitiz Kumar, Anastasios Anastasakos, Yifan Gong
-
Patent number: 10991365
Abstract: A method of enhancing an automated speech recognition confidence classifier includes receiving a set of baseline confidence features from one or more decoded words, deriving word embedding confidence features from the baseline confidence features, joining the baseline confidence features with word embedding confidence features to create a feature vector, and executing the confidence classifier to generate a confidence score, wherein the confidence classifier is trained with a set of training examples having labeled features corresponding to the feature vector.
Type: Grant
Filed: April 8, 2019
Date of Patent: April 27, 2021
Assignee: Microsoft Technology Licensing, LLC
Inventors: Kshitiz Kumar, Anastasios Anastasakos, Yifan Gong
-
Publication number: 20200349925
Abstract: Generally discussed herein are devices, systems, and methods for wake word verification. A method can include receiving, at a server, a message from a device indicating that an utterance of a user-defined wake word was detected at the device, the message including (a) audio samples or features extracted from the audio samples and (b) data indicating the user-defined wake word, retrieving or generating, at the server, a custom decoding graph for the user-defined wake word, wherein the decoding graph and the static portion of the wake word verification model form a custom wake word verification model for the user-defined wake word, executing the wake word verification model to determine a likelihood that the wake word was uttered, and providing a message to the device indicating whether the wake word was uttered based on the determined likelihood.
Type: Application
Filed: July 25, 2019
Publication date: November 5, 2020
Inventors: Khuram Shahid, Kshitiz Kumar, Teng Yi, Veljko Miljanic, Huaming Wang, Yifan Gong, Hosam Adel Khalil
-
Publication number: 20200320985
Abstract: A method of enhancing an automated speech recognition confidence classifier includes receiving a set of baseline confidence features from one or more decoded words, deriving word embedding confidence features from the baseline confidence features, joining the baseline confidence features with word embedding confidence features to create a feature vector, and executing the confidence classifier to generate a confidence score, wherein the confidence classifier is trained with a set of training examples having labeled features corresponding to the feature vector.
Type: Application
Filed: April 8, 2019
Publication date: October 8, 2020
Inventors: Kshitiz Kumar, Anastasios Anastasakos, Yifan Gong
-
Publication number: 20200312307
Abstract: A computer implemented method classifies an input corresponding to multiple different kinds of input. The method includes obtaining a set of features from the input, providing the set of features to multiple different models to generate state predictions, generating a set of state-dependent predicted weights, and combining the state predictions from the multiple models, based on the state-dependent predicted weights for classification of the set of features.
Type: Application
Filed: March 25, 2019
Publication date: October 1, 2020
Inventors: Kshitiz Kumar, Yifan Gong
-
Patent number: 10706852
Abstract: The described technology provides arbitration between speech recognition results generated by different automatic speech recognition (ASR) engines, such as ASR engines trained according to different language or acoustic models. The system includes an arbitrator that selects between a first speech recognition result representing an acoustic utterance as transcribed by a first ASR engine and a second speech recognition result representing the acoustic utterance as transcribed by a second ASR engine. This selection is based on a set of confidence features that is initially used by the first ASR engine or the second ASR engine to generate the first and second speech recognition results.
Type: Grant
Filed: November 13, 2015
Date of Patent: July 7, 2020
Assignee: Microsoft Technology Licensing, LLC
Inventors: Kshitiz Kumar, Hosam Khalil, Yifan Gong, Ziad Al-Bawab, Chaojun Liu
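The arbitration idea can be sketched as below. The patent bases the selection on a set of confidence features; this sketch collapses those features to a single confidence value per result, an illustrative simplification:

```python
def arbitrate(result_a, result_b):
    """Select between two ASR hypotheses for the same utterance,
    using each engine's confidence (stand-in for the fuller set of
    confidence features the patent describes)."""
    if result_a["confidence"] >= result_b["confidence"]:
        return result_a
    return result_b
```

Usage: feed both engines the same audio, then return `arbitrate(engine1_result, engine2_result)["text"]` to the caller.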
-
Patent number: 10235994
Abstract: The technology described herein uses a modular model to process speech. A deep learning based acoustic model comprises a stack of different types of neural network layers. The sub-modules of a deep learning based acoustic model can be used to represent distinct non-phonetic acoustic factors, such as accent origins (e.g. native, non-native), speech channels (e.g. mobile, bluetooth, desktop etc.), speech application scenario (e.g. voice search, short message dictation etc.), and speaker variation (e.g. individual speakers or clustered speakers), etc. The technology described herein uses certain sub-modules in a first context and a second group of sub-modules in a second context.
Type: Grant
Filed: June 30, 2016
Date of Patent: March 19, 2019
Assignee: Microsoft Technology Licensing, LLC
Inventors: Yan Huang, Chaojun Liu, Kshitiz Kumar, Kaustubh Prakash Kalgaonkar, Yifan Gong
-
Patent number: 10115393
Abstract: A computer-readable speaker-adapted speech engine acoustic model can be generated. The generating of the acoustic model can include performing speaker-specific adaptation of one or more layers of the model to produce one or more adaptive layers comprising layer weights, with the speaker-specific adaptation comprising a data size reduction technique. The data size reduction technique can be threshold value adaptation, corner area adaptation, diagonal-based quantization, adaptive matrix reduction, or a combination of these reduction techniques. The speaker-adapted speech engine model can be accessed and used in performing speech recognition on computer-readable audio speech input via a computerized speech recognition engine.
Type: Grant
Filed: October 31, 2016
Date of Patent: October 30, 2018
Assignee: Microsoft Technology Licensing, LLC
Inventors: Kshitiz Kumar, Chaojun Liu, Yifan Gong
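Of the reduction techniques listed, threshold value adaptation is the simplest to sketch: small-magnitude weights in the speaker-adapted layer are zeroed so the per-speaker delta stores (and transmits) far less data. The threshold value here is an illustrative assumption:

```python
def threshold_adapt(layer_weights, threshold=0.01):
    """Zero out adaptive-layer weights whose magnitude falls below the
    threshold. Zeroed entries compress well, shrinking the data that
    must be stored per speaker. (One of several reduction techniques
    the patent names; the others are not sketched here.)"""
    return [w if abs(w) >= threshold else 0.0 for w in layer_weights]
```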
-
Patent number: 9997161
Abstract: The described technology provides normalization of speech recognition confidence classifier (CC) scores that maintains the accuracy of acceptance metrics. A speech recognition CC score quantitatively represents the correctness of decoded utterances in a defined range (e.g., [0,1]). An operating threshold is associated with a confidence classifier, such that utterance recognitions having scores exceeding the operating threshold are deemed acceptable. However, when a speech recognition engine, an acoustic model, and/or other parameters are updated by the platform, the correct-accept (CA) versus false-accept (FA) profile can change such that the application software's operating threshold is no longer valid or as accurate.
Type: Grant
Filed: September 11, 2015
Date of Patent: June 12, 2018
Assignee: Microsoft Technology Licensing, LLC
Inventors: Yifan Gong, Chaojun Liu, Kshitiz Kumar
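One way to realize the normalization the abstract motivates is a piecewise-linear remap of the [0, 1] score so that the updated engine's decision threshold lands exactly on the application's original operating threshold. This particular mapping is an illustrative assumption, not necessarily the claimed one:

```python
def normalize_score(score, new_threshold, old_threshold):
    """Remap a [0, 1] confidence score from an updated engine so that
    new_threshold maps to old_threshold, keeping the application's
    hard-coded operating threshold valid after the engine update."""
    if score <= new_threshold:
        # Stretch/compress [0, new_threshold] onto [0, old_threshold].
        return score * old_threshold / new_threshold
    # Stretch/compress (new_threshold, 1] onto (old_threshold, 1].
    return old_threshold + (score - new_threshold) * (1.0 - old_threshold) / (1.0 - new_threshold)
```

With this remap, a recognition the new engine accepts at its own threshold still crosses the application's old threshold, so acceptance behavior is preserved without touching the application.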