Patents by Inventor Chaojun Liu

Chaojun Liu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11705117
    Abstract: Acoustic feature frames are collected into two batches. The second batch is collected in response to detection of a word hypothesis output by a speech recognition network that received the first batch, and the number of acoustic feature frames of the second batch is equal to a second batch size greater than the first batch size. The second batch is also input to the speech recognition network for processing.
    Type: Grant
    Filed: October 13, 2021
    Date of Patent: July 18, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Hosam A. Khalil, Emilian Y. Stoimenov, Yifan Gong, Chaojun Liu, Christopher H. Basoglu, Amit K. Agarwal, Naveen Parihar, Sayan Pathak
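The two-stage batching idea above can be sketched as follows. This is an illustrative reading of the abstract, not the patented implementation: `batch_frames` feeds a small first batch of frames to a recognizer and switches to a larger batch size once the recognizer reports a word hypothesis. The function names, batch sizes, and the recognizer callback are assumptions.

```python
def batch_frames(frames, recognizer, first_size=4, second_size=16):
    """Yield batches of acoustic frames, growing the batch size after
    the recognizer first emits a word hypothesis (returns True)."""
    size = first_size
    i = 0
    batches = []
    while i < len(frames):
        batch = frames[i:i + size]
        batches.append(batch)
        i += len(batch)
        if recognizer(batch):       # word hypothesis detected
            size = second_size      # subsequent batches are larger
    return batches
```

Starting small keeps first-word latency low; growing the batch once recognition is underway amortizes per-batch overhead.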
  • Publication number: 20220068269
    Abstract: Embodiments may include collection of a first batch of acoustic feature frames of an audio signal, the number of acoustic feature frames of the first batch equal to a first batch size, input of the first batch to a speech recognition network, collection, in response to detection of a word hypothesis output by the speech recognition network, of a second batch of acoustic feature frames of the audio signal, the number of acoustic feature frames of the second batch equal to a second batch size greater than the first batch size, and input of the second batch to the speech recognition network.
    Type: Application
    Filed: October 13, 2021
    Publication date: March 3, 2022
    Inventors: Hosam A. KHALIL, Emilian Y. STOIMENOV, Yifan GONG, Chaojun LIU, Christopher H. BASOGLU, Amit K. AGARWAL, Naveen PARIHAR, Sayan PATHAK
  • Patent number: 11183178
    Abstract: Embodiments may include collection of a first batch of acoustic feature frames of an audio signal, the number of acoustic feature frames of the first batch equal to a first batch size, input of the first batch to a speech recognition network, collection, in response to detection of a word hypothesis output by the speech recognition network, of a second batch of acoustic feature frames of the audio signal, the number of acoustic feature frames of the second batch equal to a second batch size greater than the first batch size, and input of the second batch to the speech recognition network.
    Type: Grant
    Filed: January 27, 2020
    Date of Patent: November 23, 2021
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Hosam A. Khalil, Emilian Y. Stoimenov, Yifan Gong, Chaojun Liu, Christopher H. Basoglu, Amit K. Agarwal, Naveen Parihar, Sayan Pathak
  • Publication number: 20210217410
    Abstract: Embodiments may include collection of a first batch of acoustic feature frames of an audio signal, the number of acoustic feature frames of the first batch equal to a first batch size, input of the first batch to a speech recognition network, collection, in response to detection of a word hypothesis output by the speech recognition network, of a second batch of acoustic feature frames of the audio signal, the number of acoustic feature frames of the second batch equal to a second batch size greater than the first batch size, and input of the second batch to the speech recognition network.
    Type: Application
    Filed: January 27, 2020
    Publication date: July 15, 2021
    Inventors: Hosam A. KHALIL, Emilian Y. STOIMENOV, Yifan GONG, Chaojun LIU, Christopher H. BASOGLU, Amit K. AGARWAL, Naveen PARIHAR, Sayan PATHAK
  • Patent number: 10706852
    Abstract: The described technology provides arbitration between speech recognition results generated by different automatic speech recognition (ASR) engines, such as ASR engines trained according to different language or acoustic models. The system includes an arbitrator that selects between a first speech recognition result representing an acoustic utterance as transcribed by a first ASR engine and a second speech recognition result representing the acoustic utterance as transcribed by a second ASR engine. This selection is based on a set of confidence features that is initially used by the first ASR engine or the second ASR engine to generate the first and second speech recognition results.
    Type: Grant
    Filed: November 13, 2015
    Date of Patent: July 7, 2020
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Kshitiz Kumar, Hosam Khalil, Yifan Gong, Ziad Al-Bawab, Chaojun Liu
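A minimal sketch of the arbitration step described above, assuming a simple weighted score over confidence features; the feature names and weights are invented for illustration and are not taken from the patent.

```python
def arbitrate(result_a, result_b, weights):
    """Return the ASR result whose weighted confidence-feature score is higher."""
    def score(result):
        return sum(w * result["features"].get(name, 0.0)
                   for name, w in weights.items())
    return result_a if score(result_a) >= score(result_b) else result_b
```

In practice an arbitrator of this kind would typically be a trained classifier over the confidence features rather than a fixed linear score.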
  • Patent number: 10235994
    Abstract: The technology described herein uses a modular model to process speech. A deep learning based acoustic model comprises a stack of different types of neural network layers. The sub-modules of a deep learning based acoustic model can be used to represent distinct non-phonetic acoustic factors, such as accent origins (e.g., native, non-native), speech channels (e.g., mobile, Bluetooth, desktop), speech application scenarios (e.g., voice search, short message dictation), and speaker variation (e.g., individual speakers or clustered speakers). The technology described herein uses one group of sub-modules in a first context and a second group of sub-modules in a second context.
    Type: Grant
    Filed: June 30, 2016
    Date of Patent: March 19, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yan Huang, Chaojun Liu, Kshitiz Kumar, Kaustubh Prakash Kalgaonkar, Yifan Gong
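The context-dependent sub-module selection described above can be sketched with a simple registry keyed by non-phonetic factors. The registry contents, factor names, and default module are all invented for the example.

```python
# Hypothetical registry mapping (accent, channel) contexts to acoustic-model
# sub-modules; real systems would hold network layers, not strings.
MODULES = {
    ("native", "mobile"): "am_native_mobile",
    ("non-native", "mobile"): "am_nonnative_mobile",
    ("native", "desktop"): "am_native_desktop",
}

def select_submodule(accent, channel, default="am_generic"):
    """Pick the sub-module matching the current non-phonetic acoustic factors,
    falling back to a shared generic module when no specialized one exists."""
    return MODULES.get((accent, channel), default)
```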
  • Publication number: 20190022566
    Abstract: Air filtration media and methods of processing the same are described herein. One method of processing an air filtration medium includes mixing an adsorption material, a polymer material, and a reinforcement material, compressing the mixture, and heating the mixture.
    Type: Application
    Filed: February 29, 2016
    Publication date: January 24, 2019
    Inventors: Chaojun Liu, Li Wang
  • Patent number: 10115393
    Abstract: A computer-readable speaker-adapted speech engine acoustic model can be generated. The generating of the acoustic model can include performing speaker-specific adaptation of one or more layers of the model to produce one or more adaptive layers comprising layer weights, with the speaker-specific adaptation comprising a data size reduction technique. The data size reduction technique can be threshold value adaptation, corner area adaptation, diagonal-based quantization, adaptive matrix reduction, or a combination of these reduction techniques. The speaker-adapted speech engine model can be accessed and used in performing speech recognition on computer-readable audio speech input via a computerized speech recognition engine.
    Type: Grant
    Filed: October 31, 2016
    Date of Patent: October 30, 2018
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Kshitiz Kumar, Chaojun Liu, Yifan Gong
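One of the data size reduction techniques named above, threshold value adaptation, can be sketched as storing only the per-speaker weight deltas whose magnitude exceeds a threshold. This is an illustrative interpretation; the threshold value and sparse-delta representation are assumptions.

```python
def threshold_adapt(base_weights, adapted_weights, threshold=0.05):
    """Keep only weight deltas whose magnitude exceeds the threshold,
    shrinking the per-speaker adaptation data that must be stored."""
    deltas = {}
    for i, (b, a) in enumerate(zip(base_weights, adapted_weights)):
        d = a - b
        if abs(d) > threshold:
            deltas[i] = d
    return deltas

def apply_deltas(base_weights, deltas):
    """Reconstruct the speaker-adapted weights from the sparse deltas."""
    return [b + deltas.get(i, 0.0) for i, b in enumerate(base_weights)]
```

The trade-off is the usual one in sparsification: a higher threshold stores less per-speaker data at the cost of a less faithful adapted model.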
  • Patent number: 9997161
    Abstract: The described technology provides normalization of speech recognition confidence classifier (CC) scores that maintains the accuracy of acceptance metrics. A speech recognition CC score quantitatively represents the correctness of decoded utterances in a defined range (e.g., [0,1]). An operating threshold is associated with a confidence classifier, such that utterance recognitions having scores exceeding the operating threshold are deemed acceptable. However, when a speech recognition engine, an acoustic model, and/or other parameters are updated by the platform, the correct-accept (CA) versus false-accept (FA) profile can change such that the application software's operating threshold is no longer valid or as accurate.
    Type: Grant
    Filed: September 11, 2015
    Date of Patent: June 12, 2018
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Yifan Gong, Chaojun Liu, Kshitiz Kumar
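The normalization problem above can be sketched with a piecewise-linear remapping that makes the updated model's operating point land where the application's fixed threshold expects it. The mapping form is an assumption made for illustration, not the patented method.

```python
def normalize_score(score, old_threshold, new_threshold):
    """Map confidence scores in [0, 1] so that new_threshold (the updated
    model's operating point) is sent to old_threshold (the fixed threshold
    the application software was calibrated against)."""
    if score <= new_threshold:
        return score * old_threshold / new_threshold
    return old_threshold + (score - new_threshold) * \
        (1.0 - old_threshold) / (1.0 - new_threshold)
```

With this mapping, application code can keep its original accept/reject threshold unchanged across engine updates.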
  • Publication number: 20170256254
    Abstract: The technology described herein uses a modular model to process speech. A deep learning based acoustic model comprises a stack of different types of neural network layers. The sub-modules of a deep learning based acoustic model can be used to represent distinct non-phonetic acoustic factors, such as accent origins (e.g., native, non-native), speech channels (e.g., mobile, Bluetooth, desktop), speech application scenarios (e.g., voice search, short message dictation), and speaker variation (e.g., individual speakers or clustered speakers). The technology described herein uses one group of sub-modules in a first context and a second group of sub-modules in a second context.
    Type: Application
    Filed: June 30, 2016
    Publication date: September 7, 2017
    Inventors: Yan HUANG, Chaojun LIU, Kshitiz KUMAR, Kaustubh Prakash KALGAONKAR, Yifan GONG
  • Publication number: 20170140759
    Abstract: The described technology provides arbitration between speech recognition results generated by different automatic speech recognition (ASR) engines, such as ASR engines trained according to different language or acoustic models. The system includes an arbitrator that selects between a first speech recognition result representing an acoustic utterance as transcribed by a first ASR engine and a second speech recognition result representing the acoustic utterance as transcribed by a second ASR engine. This selection is based on a set of confidence features that is initially used by the first ASR engine or the second ASR engine to generate the first and second speech recognition results.
    Type: Application
    Filed: November 13, 2015
    Publication date: May 18, 2017
    Inventors: Kshitiz Kumar, Hosam Khalil, Yifan Gong, Ziad Al-Bawab, Chaojun Liu
  • Publication number: 20170076725
    Abstract: The described technology provides normalization of speech recognition confidence classifier (CC) scores that maintains the accuracy of acceptance metrics. A speech recognition CC score quantitatively represents the correctness of decoded utterances in a defined range (e.g., [0,1]). An operating threshold is associated with a confidence classifier, such that utterance recognitions having scores exceeding the operating threshold are deemed acceptable. However, when a speech recognition engine, an acoustic model, and/or other parameters are updated by the platform, the correct-accept (CA) versus false-accept (FA) profile can change such that the application software's operating threshold is no longer valid or as accurate.
    Type: Application
    Filed: September 11, 2015
    Publication date: March 16, 2017
    Inventors: Kshitiz Kumar, Yifan Gong, Chaojun Liu
  • Publication number: 20160307565
    Abstract: Aspects of the technology described herein relate to a new type of deep neural network (DNN). The new DNN is described herein as a deep neural support vector machine (DNSVM). Traditional DNNs use the multinomial logistic regression (softmax activation) at the top layer and underlying layers for training. The new DNN instead uses a support vector machine (SVM) as one or more layers, including the top layer. The technology described herein can use one of two training algorithms to train the DNSVM to learn parameters of SVM and DNN in the maximum-margin criteria. The first training method is a frame-level training. In the frame-level training, the new model is shown to be related to the multi-class SVM with DNN features. The second training method is the sequence-level training. The sequence-level training is related to the structured SVM with DNN features and HMM state transition features.
    Type: Application
    Filed: February 16, 2016
    Publication date: October 20, 2016
    Inventors: CHAOJUN LIU, KAISHENG YAO, YIFAN GONG, SHIXIONG ZHANG
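The core change the DNSVM makes at the top layer, replacing softmax with a maximum-margin objective, can be illustrated with a Crammer-Singer style multi-class hinge loss over per-class scores from the DNN. A margin of 1 is assumed here for illustration.

```python
def multiclass_hinge_loss(scores, label, margin=1.0):
    """Multi-class hinge loss: zero when the correct class's score beats
    every other class's score by at least the margin."""
    correct = scores[label]
    worst = max(margin + s - correct
                for c, s in enumerate(scores) if c != label)
    return max(0.0, worst)
```

Frame-level DNSVM training, as the abstract notes, relates to exactly this: a multi-class SVM objective applied on top of DNN features.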
  • Patent number: 9280969
    Abstract: Techniques and systems for training an acoustic model are described. In an embodiment, a technique for training an acoustic model includes dividing a corpus of training data that includes transcription errors into N parts, and on each part, decoding an utterance with an incremental acoustic model and an incremental language model to produce a decoded transcription. The technique may further include inserting silence between a pair of words into the decoded transcription and aligning an original transcription corresponding to the utterance with the decoded transcription according to time for each part. The technique may further include selecting a segment from the utterance having at least Q contiguous matching aligned words, and training the incremental acoustic model with the selected segment. The trained incremental acoustic model may then be used on a subsequent part of the training data. Other embodiments are described and claimed.
    Type: Grant
    Filed: June 10, 2009
    Date of Patent: March 8, 2016
    Assignee: MICROSOFT TECHNOLOGY LICENSING, LLC
    Inventors: Jinyu Li, Yifan Gong, Chaojun Liu, Kaisheng Yao
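The data-selection step described above (keeping segments with at least Q contiguous matching aligned words) can be sketched as follows. A simple position-wise alignment is assumed here; the patent's alignment is time-based.

```python
def select_segments(original, decoded, q=3):
    """Return runs of at least q contiguous positions where the original
    and decoded transcriptions agree; these segments are trusted for
    retraining despite transcription errors elsewhere."""
    segments, run = [], []
    for o, d in zip(original, decoded):
        if o == d:
            run.append(o)
        else:
            if len(run) >= q:
                segments.append(run)
            run = []
    if len(run) >= q:
        segments.append(run)
    return segments
```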
  • Patent number: 8306819
    Abstract: Techniques for enhanced automatic speech recognition are described. An enhanced ASR system may be operative to generate an error correction function. The error correction function may represent a mapping between a supervised set of parameters and an unsupervised training set of parameters generated using a same set of acoustic training data, and apply the error correction function to an unsupervised testing set of parameters to form a corrected set of parameters used to perform speaker adaptation. Other embodiments are described and claimed.
    Type: Grant
    Filed: March 9, 2009
    Date of Patent: November 6, 2012
    Assignee: Microsoft Corporation
    Inventors: Chaojun Liu, Yifan Gong
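The error correction function above, a mapping from unsupervised to supervised adaptation parameters, can be sketched as a least-squares linear fit over paired parameter estimates. The linear form is an assumption for illustration; the patent does not commit to a specific functional form here.

```python
def fit_correction(unsup, sup):
    """Fit y = a*x + b by ordinary least squares over paired parameters
    estimated from the same acoustic training data."""
    n = len(unsup)
    mx = sum(unsup) / n
    my = sum(sup) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(unsup, sup))
    var = sum((x - mx) ** 2 for x in unsup)
    a = cov / var
    b = my - a * mx
    return a, b

def correct(params, a, b):
    """Apply the learned correction to unsupervised parameter estimates."""
    return [a * x + b for x in params]
```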
  • Patent number: 7953915
    Abstract: Disclosed is an interrupt dispatching system and method in a multi-core processor environment. The processor includes an interrupt dispatcher and N cores capable of interrupt handling which are divided into a plurality of groups of cores, where N is a positive integer greater than one. The method generates a token in response to an arriving interrupt; determines a group of cores to be preferentially used to handle the interrupt as a hot group in accordance with the interrupt; and sends the token to the hot group, determines sequentially from the first core in the hot group whether an interrupt dispatch termination condition is satisfied, and determines the current core as a response core to be used to handle the interrupt upon determining satisfaction of the interrupt dispatch termination condition. With the invention, delay in responding to an interrupt by the processor is reduced providing optimized performance of the processor.
    Type: Grant
    Filed: March 26, 2009
    Date of Patent: May 31, 2011
    Assignee: International Business Machines Corporation
    Inventors: Yi Ge, ChaoJun Liu, Wen Bo Shen, Yuan Ping
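The token-based dispatch loop described above can be sketched as follows: the token visits the hot group first and stops at the first core satisfying the termination condition. Here the condition is simply "core is idle", and the fallback choice is invented; both are assumptions for illustration.

```python
def dispatch(groups, hot_group, is_idle):
    """Return (group, core) chosen to handle the interrupt: scan the hot
    group first, then the remaining groups, stopping at the first core
    that satisfies the dispatch termination condition (idle here)."""
    order = [hot_group] + [g for g in groups if g != hot_group]
    for g in order:
        for core in groups[g]:
            if is_idle(core):                 # termination condition met
                return g, core
    return hot_group, groups[hot_group][0]    # fallback: first core of hot group
```

Scanning the hot group first is what keeps the expected interrupt-response delay low when that group's cores are usually available.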
  • Publication number: 20100318355
    Abstract: Techniques and systems for training an acoustic model are described. In an embodiment, a technique for training an acoustic model includes dividing a corpus of training data that includes transcription errors into N parts, and on each part, decoding an utterance with an incremental acoustic model and an incremental language model to produce a decoded transcription. The technique may further include inserting silence between a pair of words into the decoded transcription and aligning an original transcription corresponding to the utterance with the decoded transcription according to time for each part. The technique may further include selecting a segment from the utterance having at least Q contiguous matching aligned words, and training the incremental acoustic model with the selected segment. The trained incremental acoustic model may then be used on a subsequent part of the training data. Other embodiments are described and claimed.
    Type: Application
    Filed: June 10, 2009
    Publication date: December 16, 2010
    Applicant: MICROSOFT CORPORATION
    Inventors: Jinyu Li, Yifan Gong, Chaojun Liu, Kaisheng Yao
  • Publication number: 20100228548
    Abstract: Techniques for enhanced automatic speech recognition are described. An enhanced ASR system may be operative to generate an error correction function. The error correction function may represent a mapping between a supervised set of parameters and an unsupervised training set of parameters generated using a same set of acoustic training data, and apply the error correction function to an unsupervised testing set of parameters to form a corrected set of parameters used to perform speaker adaptation. Other embodiments are described and claimed.
    Type: Application
    Filed: March 9, 2009
    Publication date: September 9, 2010
    Applicant: MICROSOFT CORPORATION
    Inventors: Chaojun Liu, Yifan Gong
  • Publication number: 20090248934
    Abstract: Disclosed is an interrupt dispatching system and method in a multi-core processor environment. The processor includes an interrupt dispatcher and N cores capable of interrupt handling which are divided into a plurality of groups of cores, where N is a positive integer greater than one. The method generates a token in response to an arriving interrupt; determines a group of cores to be preferentially used to handle the interrupt as a hot group in accordance with the interrupt; and sends the token to the hot group, determines sequentially from the first core in the hot group whether an interrupt dispatch termination condition is satisfied, and determines the current core as a response core to be used to handle the interrupt upon determining satisfaction of the interrupt dispatch termination condition. With the invention, delay in responding to an interrupt by the processor is reduced providing optimized performance of the processor.
    Type: Application
    Filed: March 26, 2009
    Publication date: October 1, 2009
    Applicant: International Business Machines Corporation
    Inventors: Yi Ge, ChaoJun Liu, Wen Bo Shen, Yuan Ping
  • Publication number: 20090064008
    Abstract: A graphic user interface system for use with a content based retrieval system includes an active display having display areas. For example, the display areas include a main area providing an overview of database contents by displaying representative samples of the database contents. The display areas also include one or more query areas into which one or more of the representative samples can be moved from the main area by a user employing gesture based interaction. A query formulation module employs the one or more representative samples moved into the query area to provide feedback to the content based retrieval system.
    Type: Application
    Filed: August 31, 2007
    Publication date: March 5, 2009
    Applicant: MATSUSHITA ELECTRIC INDUSTRIAL CO., LTD.
    Inventors: Chaojun Liu, Luca Rigazio, Peter Veprek, David Kryze, Steve Pearson