Patents by Inventor Xiaobo Pi

Xiaobo Pi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Voice barge-in in telephony speech recognition

Patent number: 8473290

Abstract: An interactive voice response system is described that supports full duplex data transfer to enable the playing of a voice prompt to a user of a telephony system while the system listens for voice barge-in from the user. The system includes a speech detection module that may utilize various criteria such as frame energy magnitude and duration thresholds to detect speech. The system also includes an automatic speech recognition engine. When the automatic speech recognition engine recognizes a segment of speech, a feature extraction module may be used to subtract a prompt echo spectrum, which corresponds to the currently playing voice prompt, from an echo-dirtied speech spectrum recorded by the system. In order to improve spectrum subtraction, an estimation of the time delay between the echo-dirtied speech and the prompt echo may also be performed.
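
As an illustration only, the speech detection criteria named in the abstract (frame energy magnitude and a duration threshold) can be sketched as follows. This is not taken from the patent: the function names, the mean-squared-amplitude energy measure, and the threshold values are all assumptions chosen for the example.

```python
from typing import List, Optional


def frame_energy(frame: List[float]) -> float:
    """Mean squared amplitude of one frame of samples (assumed energy measure)."""
    return sum(s * s for s in frame) / len(frame) if frame else 0.0


def detect_speech_onset(
    frames: List[List[float]],
    energy_threshold: float,  # hypothetical tuning value
    min_frames: int,          # hypothetical duration threshold, in frames
) -> Optional[int]:
    """Return the index of the first frame of a run of at least `min_frames`
    consecutive frames whose energy exceeds `energy_threshold`, else None."""
    run_start = None
    run_length = 0
    for i, frame in enumerate(frames):
        if frame_energy(frame) > energy_threshold:
            if run_length == 0:
                run_start = i
            run_length += 1
            if run_length >= min_frames:
                return run_start
        else:
            run_length = 0  # a quiet frame ends the current run
    return None


quiet = [0.01, -0.01, 0.01, -0.01]
loud = [0.5, -0.5, 0.5, -0.5]
frames = [quiet, loud, quiet, loud, loud, loud, quiet]
onset = detect_speech_onset(frames, energy_threshold=0.1, min_frames=3)
print(onset)  # prints 3: the single loud frame at index 1 is too short to count
```

The duration threshold is what keeps a short burst, such as a click on the line, from being treated as barge-in.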

Type: Grant

Filed: August 25, 2008

Date of Patent: June 25, 2013

Assignee: Intel Corporation

Inventors: Xiaobo Pi, Ying Jia

Audio-visual feature fusion and support vector machine useful for continuous speech recognition

Patent number: 7472063

Abstract: A speech recognition method includes several embodiments describing application of support vector machine analysis to a mouth region. Lip position can be accurately determined and used in conjunction with synchronous or asynchronous audio data to enhance speech recognition probabilities.

Type: Grant

Filed: December 19, 2002

Date of Patent: December 30, 2008

Assignee: Intel Corporation

Inventors: Ara V. Nefian, Xiaobo Pi, Luhong Liang, Xiaoxing Liu, Yibao Zhao

Voice barge-in in telephony speech recognition

Publication number: 20080310601

Abstract: An interactive voice response system is described that supports full duplex data transfer to enable the playing of a voice prompt to a user of a telephony system while the system listens for voice barge-in from the user. The system includes a speech detection module that may utilize various criteria such as frame energy magnitude and duration thresholds to detect speech. The system also includes an automatic speech recognition engine. When the automatic speech recognition engine recognizes a segment of speech, a feature extraction module may be used to subtract a prompt echo spectrum, which corresponds to the currently playing voice prompt, from an echo-dirtied speech spectrum recorded by the system. In order to improve spectrum subtraction, an estimation of the time delay between the echo-dirtied speech and the prompt echo may also be performed.

Type: Application

Filed: August 25, 2008

Publication date: December 18, 2008

Inventors: Xiaobo Pi, Ying Jia

Coupled hidden Markov model (CHMM) for continuous audiovisual speech recognition

Patent number: 7454342

Abstract: Method and apparatus for an audiovisual continuous speech recognition (AVCSR) system using a coupled hidden Markov model (CHMM) are described herein. In one aspect, an exemplary process includes receiving an audio data stream and a video data stream, and performing continuous speech recognition based on the audio and video data streams using a plurality of hidden Markov models (HMMs), a node of each of the HMMs at a time slot being subject to one or more nodes of related HMMs at a preceding time slot. Other methods and apparatuses are also described.
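
The coupling described in the abstract, where a node of each HMM at one time slot depends on nodes of the related HMMs at the preceding time slot, can be illustrated with a generic two-chain forward pass. This is a minimal sketch under stated assumptions, not the patented method: the two chains (audio and video), the table layout, and every name below are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

State = Tuple[int, int]  # (audio state, video state)


def chmm_forward(
    init_a: List[float],                # P(a_0)
    init_v: List[float],                # P(v_0)
    trans_a: Dict[State, List[float]],  # P(a_t | a_{t-1}, v_{t-1})
    trans_v: Dict[State, List[float]],  # P(v_t | a_{t-1}, v_{t-1})
    obs_a: List[List[float]],           # obs_a[t][a]: audio likelihood at time t
    obs_v: List[List[float]],           # obs_v[t][v]: video likelihood at time t
) -> float:
    """Likelihood of the observations under a two-chain coupled HMM."""
    states = list(product(range(len(init_a)), range(len(init_v))))
    alpha = {
        (a, v): init_a[a] * init_v[v] * obs_a[0][a] * obs_v[0][v]
        for a, v in states
    }
    for t in range(1, len(obs_a)):
        # Each chain's next state is conditioned on BOTH chains' previous states.
        alpha = {
            (a, v): obs_a[t][a] * obs_v[t][v] * sum(
                alpha[prev] * trans_a[prev][a] * trans_v[prev][v]
                for prev in states
            )
            for a, v in states
        }
    return sum(alpha.values())


trans_a = {(0, 0): [0.9, 0.1], (0, 1): [0.6, 0.4],
           (1, 0): [0.3, 0.7], (1, 1): [0.1, 0.9]}
trans_v = {(0, 0): [0.8, 0.2], (0, 1): [0.2, 0.8],
           (1, 0): [0.7, 0.3], (1, 1): [0.1, 0.9]}
ones = [[1.0, 1.0]] * 3  # uninformative observations over three time slots
likelihood = chmm_forward([0.5, 0.5], [0.5, 0.5], trans_a, trans_v, ones, ones)
print(likelihood)
```

With uninformative observation likelihoods (all ones), the result should equal 1.0 up to floating-point rounding, which is a quick sanity check that the transition tables are proper distributions.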

Type: Grant

Filed: March 19, 2003

Date of Patent: November 18, 2008

Assignee: Intel Corporation

Inventors: Ara Victor Nefian, Xiaoxing Liu, Xiaobo Pi, Luhong Liang, Yibao Zhao

Voice barge-in in telephony speech recognition

Patent number: 7437286

Abstract: An interactive voice response system is described that supports full duplex data transfer to enable the playing of a voice prompt to a user of a telephony system while the system listens for voice barge-in from the user. The system includes a speech detection module that may utilize various criteria such as frame energy magnitude and duration thresholds to detect speech. The system also includes an automatic speech recognition engine. When the automatic speech recognition engine recognizes a segment of speech, a feature extraction module may be used to subtract a prompt echo spectrum, which corresponds to the currently playing voice prompt, from an echo-dirtied speech spectrum recorded by the system. In order to improve spectrum subtraction, an estimation of the time delay between the echo-dirtied speech and the prompt echo may also be performed.
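
The last two steps of the abstract, estimating the delay between the recorded signal and the prompt and then subtracting the prompt echo spectrum, might look roughly like the sketch below. It assumes a plain cross-correlation search for the delay and magnitude spectra that have already been computed elsewhere; neither choice comes from the patent, and all names are hypothetical.

```python
from typing import List


def estimate_delay(recorded: List[float], prompt: List[float], max_delay: int) -> int:
    """Return the lag in [0, max_delay] (in samples) that maximizes the
    cross-correlation between the recorded signal and the prompt."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_delay + 1):
        score = sum(
            recorded[n + lag] * prompt[n]
            for n in range(len(prompt))
            if n + lag < len(recorded)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def subtract_spectrum(
    recorded_mag: List[float],
    echo_mag: List[float],
    floor: float = 0.0,  # assumed floor so magnitudes never go negative
) -> List[float]:
    """Subtract the echo magnitude from the recorded magnitude, bin by bin."""
    return [max(r - e, floor) for r, e in zip(recorded_mag, echo_mag)]


prompt = [1.0, -2.0, 3.0, -1.0]
# Recorded signal: the prompt echoed at half amplitude, two samples late.
recorded = [0.0, 0.0] + [0.5 * s for s in prompt] + [0.0]
delay = estimate_delay(recorded, prompt, max_delay=3)
print(delay)  # prints 2

cleaned = subtract_spectrum([4.0, 1.0, 3.0], [1.0, 2.0, 0.5])
print(cleaned)  # prints [3.0, 0.0, 2.5]
```

In a fuller pipeline the estimated delay would be used to pick the prompt frame that lines up with each recorded frame before the two spectra are compared.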

Type: Grant

Filed: December 27, 2000

Date of Patent: October 14, 2008

Assignee: Intel Corporation

Inventors: Xiaobo Pi, Ying Jia

High-order entropy error functions for neural classifiers

Patent number: 7346497

Abstract: An automatic speech recognition system comprising a speech decoder to resolve phone and word level information, a vector generator to generate information vectors on which a confidence measure is based by a neural network classifier (ANN). An error signal is designed which is not subject to false saturation or over specialization. The error signal is integrated into an error function which is back propagated through the ANN.

Type: Grant

Filed: May 8, 2001

Date of Patent: March 18, 2008

Assignee: Intel Corporation

Inventors: Xiaobo Pi, Ying Jia

Method and apparatus for rejection of speech recognition results in accordance with confidence level

Patent number: 7072750

Abstract: An automatic speech recognition system for continuous speech recognition of vocabulary words for an auto-attendant system providing hands-free telephone calling and utilizing a vocabulary comprising numbers or names of people to be called, using known techniques for automatic speech recognition models of word sequencing, resulting in high confidence levels of recognition.

Type: Grant

Filed: May 8, 2001

Date of Patent: July 4, 2006

Assignee: Intel Corporation

Inventors: Xiaobo Pi, Ying Jia

Audio-visual speaker identification using coupled hidden Markov models

Publication number: 20050027530

Abstract: A phoneme and a viseme of a person may be modeled using a coupled hidden Markov model. The coupled hidden Markov model and a second model may be compared to identify the person.

Type: Application

Filed: July 31, 2003

Publication date: February 3, 2005

Inventors: Tieyan Fu, Xiaoxing Liu, Luhong Liang, Xiaobo Pi, Ara Nefian

High-order entropy error functions for neural classifiers

Publication number: 20050015251

Abstract: An automatic speech recognition system comprising a speech decoder to resolve phone and word level information, a vector generator to generate information vectors on which a confidence measure is based by a neural network classifier (ANN). An error signal is designed which is not subject to false saturation or over specialization. The error signal is integrated into an error function which is back propagated through the ANN.

Type: Application

Filed: May 8, 2001

Publication date: January 20, 2005

Inventors: Xiaobo Pi, Ying Jia

Coupled hidden Markov model (CHMM) for continuous audiovisual speech recognition

Publication number: 20040186718

Abstract: Method and apparatus for an audiovisual continuous speech recognition (AVCSR) system using a coupled hidden Markov model (CHMM) are described herein. In one aspect, an exemplary process includes receiving an audio data stream and a video data stream, and performing continuous speech recognition based on the audio and video data streams using a plurality of hidden Markov models (HMMs), a node of each of the HMMs at a time slot being subject to one or more nodes of related HMMs at a preceding time slot. Other methods and apparatuses are also described.

Type: Application

Filed: March 19, 2003

Publication date: September 23, 2004

Inventors: Ara Victor Nefian, Xiaoxing Liu, Xiaobo Pi, Luhong Liang, Yibao Zhao

Visual feature extraction procedure useful for audiovisual continuous speech recognition

Publication number: 20040122675

Abstract: A speech recognition method includes several embodiments describing application of support vector machine analysis to a mouth region. Lip position can be accurately determined and used in conjunction with synchronous or asynchronous audio data to enhance speech recognition probabilities.

Type: Application

Filed: December 19, 2002

Publication date: June 24, 2004

Inventors: Ara Victor Nefian, Xiaobo Pi, Luhong Liang, Xiaoxing Liu, Yibao Zhao

Method and apparatus for rejection of speech recognition results in accordance with confidence level

Publication number: 20040015357

Abstract: An automatic speech recognition system for continuous speech recognition of vocabulary words for an auto-attendant system providing hands-free telephone calling and utilizing a vocabulary comprising numbers or names of people to be called, using known techniques for automatic speech recognition models of word sequencing, resulting in high confidence levels of recognition.

Type: Application

Filed: June 10, 2003

Publication date: January 22, 2004

Inventors: Xiaobo Pi, Ying Jia

Face recognition procedure useful for audiovisual speech recognition

Publication number: 20030212552

Abstract: A visual feature extraction method includes application of multiclass linear discriminant analysis to the mouth region. Lip position can be accurately determined and used in conjunction with synchronous or asynchronous audio data to enhance speech recognition probabilities.

Type: Application

Filed: May 9, 2002

Publication date: November 13, 2003

Inventors: Lu Hong Liang, Xiaobo Pi, Xiaoxing Liu, Crusoe Mao, Ara V. Nefian

Voice barge-in in telephony speech recognition

Publication number: 20030158732

Abstract: An interactive voice response system is described that supports full duplex data transfer to enable the playing of a voice prompt to a user of a telephony system while the system listens for voice barge-in from the user. The system includes a speech detection module that may utilize various criteria such as frame energy magnitude and duration thresholds to detect speech. The system also includes an automatic speech recognition engine. When the automatic speech recognition engine recognizes a segment of speech, a feature extraction module may be used to subtract a prompt echo spectrum, which corresponds to the currently playing voice prompt, from an echo-dirtied speech spectrum recorded by the system. In order to improve spectrum subtraction, an estimation of the time delay between the echo-dirtied speech and the prompt echo may also be performed.

Type: Application

Filed: March 25, 2003

Publication date: August 21, 2003

Inventors: Xiaobo Pi, Ying Jia

Method and system for joint optimization of feature and model space transformation of a speech recognition system

Publication number: 20030139926

Abstract: Methods for processing speech data are described herein. In one aspect of the invention, an exemplary method includes receiving a speech data stream, performing a Mel Frequency Cepstral Coefficients (MFCC) feature extraction on the speech data stream, optimizing feature space transformation (FST), optimizing model space transformation (MST) based on the FST, and performing recognition decoding based on the FST and the MST, generating a word sequence. Other methods and apparatuses are also described.

Type: Application

Filed: January 23, 2002

Publication date: July 24, 2003

Inventors: Ying Jia, Xiaobo Pi, Yonghong Yan