Patents by Inventor Yonghong Yan

Yonghong Yan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Program endpoint time detection apparatus and method, and program information retrieval system

Patent number: 9009054

Abstract: This invention relates to retrieval for multimedia content, and provides a program endpoint time detection apparatus for detecting an endpoint time of a program by performing processing on audio signals of said program, comprising an audio classification unit for classifying said audio signals into a speech signal portion and a non-speech signal portion; a keyword retrieval unit for retrieving, as a candidate endpoint keyword, an endpoint keyword indicating start or end of the program from said speech signal portion; a content analysis unit for performing content analysis on context of the candidate endpoint keyword retrieved by the keyword retrieval unit to determine whether the candidate endpoint keyword is a valid endpoint keyword; and a program endpoint time determination unit for performing statistics analysis based on the retrieval result of said keyword retrieval unit and the determination result of said content analysis unit, and determining the endpoint time of the program.

Type: Grant

Filed: October 28, 2010

Date of Patent: April 14, 2015

Assignees: Sony Corporation, Institute of Acoustics, Chinese Academy of Sciences

Inventors: Kun Liu, Weiguo Wu, Li Lu, Qingwei Zhao, Yonghong Yan, Hongbin Suo
Method and system for expanding a word graph to a phone graph based on a cross-word acoustical model to improve continuous speech recognition

Patent number: 8260614

Abstract: A method and system that expands a word graph to a phone graph. An unknown speech signal is received. A word graph is generated based on an application task or based on information extracted from the unknown speech signal. The word graph is expanded into a phone graph. The unknown speech signal is recognized using the phone graph. The phone graph can be based on a cross-word acoustical model to improve continuous speech recognition. By expanding a word graph into a phone graph, the phone graph can consume less memory than the word graph and can greatly reduce the computation cost of the decoding process compared with the word graph, thus improving system performance. Furthermore, the continuous speech recognition error rate can be reduced by using the phone graph, which provides a more accurate graph for continuous speech recognition.

Type: Grant

Filed: September 28, 2000

Date of Patent: September 4, 2012

Assignee: Intel Corporation

Inventors: Qingwei Zhao, Zhiwei Lin, Yonghong Yan
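
As a rough illustration of the word-graph-to-phone-graph expansion described in the abstract above, the following Python sketch replaces each word edge with a chain of phone edges. The lexicon, the edge representation, and the function name are assumptions made for this example; the sketch does not model the cross-word acoustical model or the memory savings the patent claims.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon (word -> phone sequence); not taken from the patent.
LEXICON: Dict[str, List[str]] = {
    "call": ["k", "ao", "l"],
    "home": ["hh", "ow", "m"],
    "tom": ["t", "aa", "m"],
}

WordEdge = Tuple[int, int, str]   # (from_node, to_node, word)
PhoneEdge = Tuple[int, int, str]  # (from_node, to_node, phone)


def expand_to_phone_graph(
    word_edges: List[WordEdge], num_nodes: int
) -> Tuple[List[PhoneEdge], int]:
    """Replace each word edge with a chain of phone edges.

    Word-graph node ids are kept; intermediate nodes receive fresh ids.
    Returns the phone edges and the total number of nodes.
    """
    phone_edges: List[PhoneEdge] = []
    next_node = num_nodes
    for src, dst, word in word_edges:
        phones = LEXICON[word]
        current = src
        for i, phone in enumerate(phones):
            is_last = i == len(phones) - 1
            target = dst if is_last else next_node
            if not is_last:
                next_node += 1
            phone_edges.append((current, target, phone))
            current = target
    return phone_edges, next_node


# Word graph: 0 -call-> 1, then either 1 -home-> 2 or 1 -tom-> 2.
word_graph = [(0, 1, "call"), (1, 2, "home"), (1, 2, "tom")]
phone_graph, total_nodes = expand_to_phone_graph(word_graph, num_nodes=3)
```

A decoder would then search `phone_graph` directly instead of looking up pronunciations for each word edge during decoding.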
PROGRAM ENDPOINT TIME DETECTION APPARATUS AND METHOD, AND PROGRAM INFORMATION RETRIEVAL SYSTEM

Publication number: 20110106531

Abstract: This invention relates to retrieval for multimedia content, and provides a program endpoint time detection apparatus for detecting an endpoint time of a program by performing processing on audio signals of said program, comprising an audio classification unit for classifying said audio signals into a speech signal portion and a non-speech signal portion; a keyword retrieval unit for retrieving, as a candidate endpoint keyword, an endpoint keyword indicating start or end of the program from said speech signal portion; a content analysis unit for performing content analysis on context of the candidate endpoint keyword retrieved by the keyword retrieval unit to determine whether the candidate endpoint keyword is a valid endpoint keyword; and a program endpoint time determination unit for performing statistics analysis based on the retrieval result of said keyword retrieval unit and the determination result of said content analysis unit, and determining the endpoint time of the program.

Type: Application

Filed: October 28, 2010

Publication date: May 5, 2011

Applicants: SONY CORPORATION, Institute of Acoustics, Chinese Academy of Sciences

Inventors: Kun LIU, Weiguo Wu, Li Lu, Qingwei Zhao, Yonghong Yan, Hongbin Suo
Method, apparatus, and system for building context dependent models for a large vocabulary continuous speech recognition (LVCSR) system

Patent number: 7587321

Abstract: According to one aspect of the invention, a method is provided in which a set of multiple mixture monophone models is created and trained to generate a set of multiple mixture context dependent models. A set of single mixture triphone models is created and trained to generate a set of context dependent models. Corresponding states of the triphone models are clustered to obtain a set of tied states based on a decision tree clustering process. Parameters of the context dependent models are estimated using a data dependent maximum a posteriori (MAP) adaptation method in which parameters of the tied states of the context dependent models are derived by adapting corresponding parameters of the context independent models using the training data associated with the respective tied states.

Type: Grant

Filed: May 8, 2001

Date of Patent: September 8, 2009

Assignee: Intel Corporation

Inventors: Xiaoxing Liu, Baosheng Yuan, Yonghong Yan
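
The MAP adaptation step in the abstract above can be sketched with the standard textbook MAP update for a Gaussian mean. This is an assumption for illustration only: the patent's "data dependent" variant may differ, the scalar features are a simplification, and the prior weight `tau` is a hypothetical value.

```python
from typing import Sequence


def map_adapt_mean(prior_mean: float, frames: Sequence[float], tau: float = 10.0) -> float:
    """Standard MAP update of a Gaussian mean (scalar case).

    prior_mean: mean taken from the context independent (monophone) model.
    frames: training observations aligned to one tied state.
    tau: prior weight (hypothetical); a larger tau trusts the prior more.
    """
    n = len(frames)
    return (tau * prior_mean + sum(frames)) / (tau + n)


# With little data the estimate stays near the prior mean;
# with more data it moves toward the sample mean.
few = map_adapt_mean(0.0, [2.0, 2.0])     # (0 + 4) / (10 + 2)
many = map_adapt_mean(0.0, [2.0] * 90)    # (0 + 180) / (10 + 90)
```

The useful property for tied states with sparse training data is that the estimate falls back to the context independent parameter when no frames are available.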
Speaker adaptation using weighted feedback

Patent number: 7580836

Abstract: In some embodiments, the invention includes calculating estimated weights for identified errors in recognition of utterances. Sections of the utterances are marked as being misrecognized and the corresponding estimated weights are associated with these sections of the utterances. The weighted sections of the utterances are used to convert a speaker independent model to a speaker dependent model.

Type: Grant

Filed: June 15, 2000

Date of Patent: August 25, 2009

Assignee: Intel Corporation

Inventor: Yonghong Yan
Method and system to scale down a decision tree-based hidden markov model (HMM) for speech recognition

Patent number: 7472064

Abstract: A method and system are provided in which a decision tree-based model (“general model”) is scaled down (“trim-down”) for a given task. The trim-down model can be adapted for the given task using task specific data. The general model can be based on a hidden Markov model (HMM). By allowing a decision tree-based acoustic model (“general model”) to be scaled according to the vocabulary of the given task, the general model can be configured dynamically into a trim-down model, which can be used to improve speech recognition performance and reduce system resource utilization. Furthermore, the trim-down model can be adapted/adjusted according to task specific data, e.g., task vocabulary, model size, or other like task specific data.

Type: Grant

Filed: September 30, 2000

Date of Patent: December 30, 2008

Assignee: Intel Corporation

Inventors: Qing Guo, Yonghong Yan, Baosheng Yuan
Method and system for building a domain specific statistical language model from rule based grammar specifications

Patent number: 7346495

Abstract: A method and system providing a statistical representation from rule-based grammar specifications. The language model is generated by obtaining a statistical representation of a rule-based language model and combining it with a statistical representation of a statistical language model for use as a final language model. The language model may be enhanced by applying smoothing and/or adapting for use as the final language model.

Type: Grant

Filed: September 30, 2000

Date of Patent: March 18, 2008

Assignee: Intel Corporation

Inventors: Yibao Zhao, Yonghong Yan, Zhiwei Lin
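
One way to picture the combination step in the abstract above is linear interpolation of two bigram models, one estimated from sentences generated by a rule-based grammar and one estimated from a corpus. Linear interpolation, the toy grammar, the toy corpus, and all names below are assumptions for this sketch; the abstract does not state how the two representations are combined.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple

Bigram = Tuple[str, str]


def bigram_probs(sentences: List[List[str]]) -> Dict[Bigram, float]:
    """Maximum-likelihood bigram probabilities P(w2 | w1) from tokenized sentences."""
    pair_counts: Counter = Counter()
    left_counts: Counter = Counter()
    for words in sentences:
        padded = ["<s>"] + words + ["</s>"]
        for w1, w2 in zip(padded, padded[1:]):
            pair_counts[(w1, w2)] += 1
            left_counts[w1] += 1
    return {pair: c / left_counts[pair[0]] for pair, c in pair_counts.items()}


def interpolate(
    rule_lm: Dict[Bigram, float], corpus_lm: Dict[Bigram, float], lam: float = 0.5
) -> Dict[Bigram, float]:
    """Linear interpolation: lam * rule-based model + (1 - lam) * corpus model.

    Contexts seen by only one model are left under-normalized here;
    a real system would apply smoothing or back-off.
    """
    pairs = set(rule_lm) | set(corpus_lm)
    return {p: lam * rule_lm.get(p, 0.0) + (1 - lam) * corpus_lm.get(p, 0.0) for p in pairs}


# Hypothetical rule "<verb> <object>", expanded into every sentence it allows.
grammar_slots = [["call", "email"], ["tom", "home"]]
rule_sentences = [list(s) for s in product(*grammar_slots)]
corpus_sentences = [["call", "tom"], ["call", "tom", "now"]]

rule_lm = bigram_probs(rule_sentences)
corpus_lm = bigram_probs(corpus_sentences)
final_lm = interpolate(rule_lm, corpus_lm, lam=0.5)
```

In this toy setup the grammar contributes word pairs the corpus never contains (for example "call home"), while the corpus contributes continuations the grammar does not allow (for example "tom now").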
Method and system for using rule-based knowledge to build a class-based domain specific statistical language model

Patent number: 7275033

Abstract: A method and system for providing a class-based statistical language model representation from rule-based knowledge is disclosed. The class-based language model is generated from a statistical representation of a class-based rule net. A class-based rule net is generated using the domain-related rules with words replaced with their corresponding class-tags that are manually defined. The class-based statistical representation from the class-based rule net is combined with a class-based statistical representation from a statistical language model to generate a language model. The language model is enhanced by smoothing/adapting with general-purpose and/or domain-related corpus for use as the final language model. A two-pass search algorithm is applied for speech decoding.

Type: Grant

Filed: September 30, 2000

Date of Patent: September 25, 2007

Assignee: Intel Corporation

Inventors: Yibao Zhao, Yonghong Yan, Zhiwei Lin
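
The class-tag replacement mentioned in the abstract above can be sketched as a simple lookup. The class table and example rules are hypothetical and only show how several word-level rules collapse into one class-level rule; building the rule net and estimating statistics from it are not shown.

```python
from typing import Dict, List

# Hypothetical, manually defined word classes.
WORD_TO_CLASS: Dict[str, str] = {
    "tom": "<NAME>",
    "ann": "<NAME>",
    "monday": "<DAY>",
    "friday": "<DAY>",
}


def to_class_tags(words: List[str]) -> List[str]:
    """Replace each word that belongs to a class with its class tag."""
    return [WORD_TO_CLASS.get(w, w) for w in words]


rules = [
    ["call", "tom", "on", "monday"],
    ["call", "ann", "on", "friday"],
]
class_rules = [to_class_tags(r) for r in rules]
unique_class_rules = {tuple(r) for r in class_rules}
```

Both word-level rules map to the same class-level sequence, so statistics gathered at the class level are shared across all members of each class.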
Method, apparatus, and system for bottom-up tone integration to Chinese continuous speech recognition system

Patent number: 7181391

Abstract: According to one aspect of the invention, a method is provided in which knowledge about tone characteristics of a tonal syllabic language is used to model speech at various levels in a bottom-up speech recognition structure. The various levels in the bottom-up recognition structure include the acoustic level, the phonetic level, the word level, and the sentence level. At the acoustic level, pitch is treated as a continuous acoustic variable and pitch information extracted from the speech signal is included as a feature component of feature vectors. At the phonetic level, main vowels having the same phonetic structure but different tones are defined and modeled as different phonemes. At the word level, a set of tone change rules is used to build the transcription for training data and the pronunciation lattice for decoding. At the sentence level, a set of sentence ending words with light tone is also added to the system vocabulary.

Type: Grant

Filed: September 30, 2000

Date of Patent: February 20, 2007

Assignee: Intel Corporation

Inventors: Ying Jia, Yonghong Yan, Baosheng Yuan
Search method based on single triphone tree for large vocabulary continuous speech recognizer

Patent number: 6980954

Abstract: A search method based on a single triphone tree for a large vocabulary continuous speech recognizer is disclosed in which speech signals are received. Tokens are propagated in a phonetic tree to integrate a language model to recognize the received speech signals. By propagating tokens, which are preserved in tree nodes and record the path history, a single triphone tree can be used in a one pass searching process, thereby reducing speech recognition processing time and system resource use.

Type: Grant

Filed: September 30, 2000

Date of Patent: December 27, 2005

Assignee: Intel Corporation

Inventors: Quingwei Zhao, Zhiwei Lin, Yonghong Yan, Baosheng Yuan
Method, apparatus, and system for building context dependent models for a large vocabulary continuous speech recognition (LVCSR) system

Publication number: 20050228666

Abstract: According to one aspect of the invention, a method is provided in which a set of multiple mixture monophone models is created and trained to generate a set of multiple mixture context dependent models. A set of single mixture triphone models is created and trained to generate a set of context dependent models. Corresponding states of the triphone models are clustered to obtain a set of tied states based on a decision tree clustering process. Parameters of the context dependent models are estimated using a data dependent maximum a posteriori (MAP) adaptation method in which parameters of the tied states of the context dependent models are derived by adapting corresponding parameters of the context independent models using the training data associated with the respective tied states.

Type: Application

Filed: May 8, 2001

Publication date: October 13, 2005

Inventors: Xiaoxing Liu, Baosheng Yuan, Yonghong Yan
Acoustic modeling using a two-level decision tree in a speech recognition system

Patent number: 6789063

Abstract: In some embodiments, the invention involves receiving phonetic samples and assembling a two-level phonetic decision tree structure using the phonetic samples. The decision tree has multiple leaf node levels each having at least one state, wherein at least one node in a second level is assigned a Gaussian of a node in the first level, but the at least one node in the second level has a weight computed for it.

Type: Grant

Filed: September 1, 2000

Date of Patent: September 7, 2004

Assignee: Intel Corporation

Inventor: Yonghong Yan
Selective merging of segments separated in response to a break in an utterance

Patent number: 6601028

Abstract: In some embodiments, the invention involves a method including segmenting an utterance into at least a first segment and a second segment, wherein a boundary between the first and second segments corresponds to a break in the utterance. The method further includes selecting potential hypothetical paths of potential words in the first and second segments that cross the boundary. The method also includes applying a language model to the potential hypothetical paths crossing to determine whether to merge the first and second segments and to apply decoding to the merged segments.

Type: Grant

Filed: August 25, 2000

Date of Patent: July 29, 2003

Assignee: Intel Corporation

Inventor: Yonghong Yan
Method and system for joint optimization of feature and model space transformation of a speech recognition system

Publication number: 20030139926

Abstract: Methods for processing speech data are described herein. In one aspect of the invention, an exemplary method includes receiving a speech data stream, performing a Mel Frequency Cepstral Coefficients (MFCC) feature extraction on the speech data stream, optimizing feature space transformation (FST), optimizing model space transformation (MST) based on the FST, and performing recognition decoding based on the FST and the MST, generating a word sequence. Other methods and apparatuses are also described.

Type: Application

Filed: January 23, 2002

Publication date: July 24, 2003

Inventors: Ying Jia, Xiaobo Pi, Yonghong Yan
Method and system for integrating long-span language model into speech recognition system

Publication number: 20030061046

Abstract: A system is described for recognizing continuous speech based on an M-gram language model. The system includes a lexical tree having a number of nodes, a buffer having a number of entries, and a merging task to merge tokens to form a merged token list. The system decodes an input speech by propagating tokens along a number of different paths within the lexical tree. Each token contains information relating to a probability score and a word path history. The merging task is configured (1) to access a token list containing a group of tokens that have propagated to the current state from a number of transition states, (2) to place tokens into an appropriate entry in the buffer according to a hash value, and (3) to merge tokens with the same sequence of word candidates.

Type: Application

Filed: September 27, 2001

Publication date: March 27, 2003

Inventors: Qingwei Zhao, Jielin Pan, Yonghong Yan, Chunrong Lai
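
The three-step merging task in the abstract above can be sketched as follows. The `Token` class, the bucket count, and the choice to keep the higher-scoring token when two histories match are assumptions for illustration; the abstract only says that tokens with the same sequence of word candidates are merged.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Token:
    score: float               # log-probability score of the path
    history: Tuple[str, ...]   # word path history


def merge_tokens(tokens: List[Token], num_buckets: int = 8) -> List[Token]:
    """Merge tokens that carry the same word history.

    Each token is placed into a buffer entry chosen by a hash of its history,
    so only tokens within one entry need to be compared. When two tokens
    share a history, the higher-scoring one is kept (an assumption).
    """
    buckets: List[List[Token]] = [[] for _ in range(num_buckets)]
    for tok in tokens:
        bucket = buckets[hash(tok.history) % num_buckets]
        for i, kept in enumerate(bucket):
            if kept.history == tok.history:
                if tok.score > kept.score:
                    bucket[i] = tok
                break
        else:
            # No token with this history in the entry yet.
            bucket.append(tok)
    return [tok for bucket in buckets for tok in bucket]


# Three tokens arrive at the same state; two share a word history.
incoming = [
    Token(-4.0, ("call", "tom")),
    Token(-2.5, ("call", "tom")),
    Token(-3.0, ("call", "home")),
]
merged = merge_tokens(incoming)
best_by_history = {tok.history: tok.score for tok in merged}
```

The order of `merged` depends on the hash values, so callers that need a stable order would sort the result, for example by score.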