Markov Patents (Class 704/256)

Hidden Markov model (HMM) (EPO) (Class 704/256.1)

Speech recognition method

Patent number: 6917919

Abstract: A speech recognition method is described in which a basic set of models is adapted to a current speaker on the basis of the speaker's already observed speech data. The basic set of models comprises models for different acoustic units. The models are described each by a plurality of model parameters. The basic set of models is then represented by a supervector in a high-dimensional vector space (model space), the supervector being formed by a concatenation of the plurality of the model parameters of the models of the basic set of models. The adaptation of this basic set of models to the speaker is effected in the model space by means of a MAP method in which an asymmetric distribution in the model space is selected as an a priori distribution for the MAP method.

Type: Grant

Filed: September 24, 2001

Date of Patent: July 12, 2005

Assignee: Koninklijke Philips Electronics, N.V.

Inventor: Henrik Botterweck
Error correction in speech recognition by correcting text around selected area

Patent number: 6912498

Abstract: Correcting incorrect text associated with recognition errors in computer-implemented speech recognition includes receiving a selection of a word from a recognized utterance. The selection indicates a bound of a portion of the recognized utterance to be corrected. A first recognition correction is produced based on a comparison between a first alternative transcript and the recognized utterance. A second recognition correction is produced based on a comparison between a second alternative transcript and the recognized utterance. The duration of the first recognition correction differs from the duration of the second recognition correction. A portion of the recognition result that is replaced with one of the first recognition correction and the second recognition correction includes at one bound a word indicated by the selection and extends for the duration of the one of the first recognition correction and the second recognition correction with which the portion is replaced.

Type: Grant

Filed: May 2, 2001

Date of Patent: June 28, 2005

Assignee: ScanSoft, Inc.

Inventors: Daniell Stevens, Robert Roth, Joel M. Gould, Michael J. Newman, Dean Sturtevant, Charles E. Ingold, David Abrahams, Allan Gold
Method for identifying a momentary acoustic scene, application of said method, and a hearing device

Patent number: 6910013

Abstract: The invention relates first of all to a method for identifying a transient acoustic scene, said method including the extraction, during an extraction phase, of characteristic features from an acoustic signal captured by at least one microphone (2a, 2b), and the identification, during an identification phase, of the transient acoustic scene on the basis of the extracted characteristics. According to the invention, at least auditory-based characteristics are identified in the extraction phase. Also specified are an application of the method per this invention and a hearing device.

Type: Grant

Filed: January 5, 2001

Date of Patent: June 21, 2005

Assignee: Phonak AG

Inventors: Sylvia Allegro, Michael Büchler
Method for calculating HMM output probability and speech recognition apparatus

Patent number: 6901365

Abstract: The invention enables even a CPU having low processing performance to find an HMM output probability by simplifying arithmetic operations. The dimensions of an input vector are grouped into several sets, and tables are created for the sets. When an output probability is calculated, codes corresponding to the first dimension to the n-th dimension of the input vector are sequentially obtained, and for each code, by referring to the corresponding table, output values for each table are obtained. By substituting the output values for each table into a formula for finding an output probability, the output probability is found.

Type: Grant

Filed: September 19, 2001

Date of Patent: May 31, 2005

Assignee: Seiko Epson Corporation

Inventor: Yasunaga Miyazawa
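
The table-lookup idea in the abstract of patent 6901365 can be illustrated with a minimal sketch. It assumes a diagonal-covariance Gaussian per state, one table per dimension (the abstract groups several dimensions per table), and a nearest-centroid scalar quantizer; the function names and the codebooks are hypothetical and are not taken from the patent.

```python
import math

def build_tables(means, variances, codebooks):
    """Precompute, per dimension, the Gaussian log-density at each centroid."""
    tables = []
    for mu, var, centroids in zip(means, variances, codebooks):
        tables.append([
            -0.5 * (math.log(2 * math.pi * var) + (c - mu) ** 2 / var)
            for c in centroids
        ])
    return tables

def quantize(x, centroids):
    """Return the code (index) of the centroid nearest to x."""
    return min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))

def log_output_prob(vector, codebooks, tables):
    """Sum precomputed table entries instead of evaluating Gaussians at run time."""
    return sum(
        table[quantize(x, centroids)]
        for x, centroids, table in zip(vector, codebooks, tables)
    )

# Hypothetical two-dimensional example with three centroids per dimension.
codebooks = [[-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0]]
tables = build_tables([0.0, 0.0], [1.0, 1.0], codebooks)
print(log_output_prob([0.1, 0.9], codebooks, tables))
```

At recognition time only the quantizer and the additions remain, which is the kind of saving the abstract attributes to the tables.
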
Voice actuation with contextual learning for intelligent machine control

Patent number: 6895380

Abstract: An interactive voice actuated control system for a testing machine such as a tensile testing machine is described. Voice commands are passed through a user-command predictor and integrated with a graphical user interface control panel to allow hands-free operation. The user-command predictor learns operator command patterns on-line and predicts the most likely next action. It assists less experienced operators by recommending the next command, and it adds robustness to the voice command interpreter by verbally asking the operator to repeat unlikely commanded actions. The voice actuated control system applies to industrial machines whose normal operation is characterized by a nonrandom series of commands.

Type: Grant

Filed: March 2, 2001

Date of Patent: May 17, 2005

Assignee: Electro Standards Laboratories

Inventor: Raymond Sepe, Jr.
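
The user-command predictor in the abstract of patent 6895380 learns command patterns on-line and flags unlikely commands. The abstract does not say how; the sketch below assumes a first-order Markov model over command names, and the class name, method names, and threshold are hypothetical.

```python
from collections import Counter, defaultdict

class CommandPredictor:
    """Counts which command follows which, updated one command at a time."""

    def __init__(self):
        self.counts = defaultdict(Counter)
        self.previous = None

    def observe(self, command):
        """Record the transition from the previous command to this one."""
        if self.previous is not None:
            self.counts[self.previous][command] += 1
        self.previous = command

    def predict(self):
        """Return the most frequent follower of the last command, or None."""
        followers = self.counts.get(self.previous)
        if not followers:
            return None
        return followers.most_common(1)[0][0]

    def is_unlikely(self, command, threshold=0.1):
        """True if the command rarely followed the last one (assumed threshold)."""
        followers = self.counts.get(self.previous)
        if not followers:
            return False
        return followers[command] / sum(followers.values()) < threshold

predictor = CommandPredictor()
for cmd in ["clamp", "load", "start", "clamp", "load"]:
    predictor.observe(cmd)
print(predictor.predict())
```

A command for which `is_unlikely` returns True is the case in which the abstract has the system ask the operator to repeat.
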
Phonetic data processing system and method

Patent number: 6895377

Abstract: A phonetic data processing system processes phonetic stream data to produce a set of semantic data, using a context-free rich semantic grammar database (RSG DB) that includes a grammar tree, comprised of sub-trees, representing words and phrases. A phonetic searcher accepts the phonetic estimates and searches the RSG DB to produce a best word list, which is processed by a semantic parser, using the RSG DB, to produce a semantic tree instance, including all valid interpretations of the phonetic stream. An application accesses a semantic tree evaluator to interpret the semantic tree instance according to a context to produce a final linguistic interpretation of the phonetic stream, which is returned to the application.

Type: Grant

Filed: March 23, 2001

Date of Patent: May 17, 2005

Assignee: Eliza Corporation

Inventors: John Kroeker, Oleg Boulanov, Andrey Yelpatov
Speech recognition system with barge-in capability

Patent number: 6882973

Abstract: A voice processing system includes a speech recognition facility with barge-in. The system plays out a prompt to a caller, who starts to provide their spoken response while the prompt is still being played out. The system performs speech recognition on this response to determine a corresponding text, which is then subjected to lexical analysis. This tests whether the text satisfies one or more conditions, for example, including one or more words from a predefined set of task words. If this is found to be the case, the playing out of the prompt is terminated (i.e. barge-in is effected); otherwise, the playing out of the prompt is continued, essentially as if the caller had not interrupted.

Type: Grant

Filed: October 25, 2000

Date of Patent: April 19, 2005

Assignee: International Business Machines Corporation

Inventor: John Brian Pickering
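
The lexical test in the abstract of patent 6882973 can be sketched as a membership check on the recognized text. The task-word set and the function name below are hypothetical, and the speech recognizer itself is assumed to have already produced the text.

```python
# Hypothetical task words; a real system would load these per application.
TASK_WORDS = {"balance", "transfer", "operator"}

def should_barge_in(recognized_text, task_words=TASK_WORDS):
    """Return True if the recognized text contains at least one task word."""
    words = recognized_text.lower().split()
    return any(word.strip(".,!?") in task_words for word in words)

print(should_barge_in("Um, transfer please"))
print(should_barge_in("hold on a second"))
```

When the function returns False, the prompt keeps playing, as if the caller had not spoken.
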
Pattern recognition with criterion for output from selected model to trigger succeeding models

Patent number: 6871177

Abstract: A method and apparatus of recognizing a pattern comprising a sequence of sub-patterns includes a set of possible patterns being modelled by a network of sub-pattern models. One or more initial software model objects are instantiated first. As these models produce outputs, succeeding model objects are instantiated if they have not already been instantiated. However, the succeeding model objects are only instantiated if a triggering model output meets a predetermined criterion. This ensures that the processing required is maintained at a manageable level. If the models comprise finite state networks, pruning of internal states may also be performed. The criterion applied to this pruning is less harsh than that applied when determining whether to instantiate a succeeding model.

Type: Grant

Filed: October 27, 1998

Date of Patent: March 22, 2005

Assignee: British Telecommunications public limited company

Inventors: Simon A Hovell, Mark Wright, Simon P. A Ringland
Hearing prosthesis with automatic classification of the listening environment

Patent number: 6862359

Abstract: A hearing prosthesis that automatically adjusts itself to a surrounding listening environment by applying Hidden Markov Models is provided. In one aspect, classification results are utilized to support automatic parameter adjustment of a parameter or parameters of a predetermined signal processing algorithm executed by processing means of the hearing prosthesis. According to another aspect, features vectors extracted from a digital input signal of the hearing prosthesis and processed by the Hidden Markov Models represent substantially level and/or absolute spectrum shape independent signal features of the digital input signal. This level independent property of the extracted features vectors provides robust classification results in real-life acoustic environments.

Type: Grant

Filed: May 29, 2002

Date of Patent: March 1, 2005

Assignee: GN ReSound A/S

Inventors: Nils Peter Nordqvist, Arne Leijon
Training apparatus and method

Patent number: 6853962

Abstract: Training apparatus for training a user to engage in transactions (e.g. a foreign language conversation) with another person whom the apparatus is arranged to simulate, the apparatus comprising: an input for receiving input dialogue from a user; a lexical store containing data relating to individual words of said input dialogue; a rule store containing rules specifying grammatically allowable relationships between words of said input dialogue; a transaction store containing data relating to allowable transactions between said user and said person; a processor arranged to process the input dialogue to recognise the occurrence therein of words contained in said lexical store in the relationships specified by the rules contained in said rule store in accordance with the data specified in the transaction store, and to generate output dialogue indicating when correct input dialogue has been recognised; and an output device for making the output dialogue available to the user.

Type: Grant

Filed: September 11, 1997

Date of Patent: February 8, 2005

Assignee: British Telecommunications public limited company

Inventor: Stephen C Appleby
Methods and apparatus for training a pattern recognition system using maximal rank likelihood as an optimization function

Patent number: 6850888

Abstract: A method and apparatus are disclosed for training a pattern recognition system, such as a speech recognition system, using an improved objective function. The concept of rank likelihood, previously applied only to the decoding process, is applied in a novel manner to the parameter estimation of the training phase of a pattern recognition system. The disclosed objective function is based on a pseudo-rank likelihood that not only maximizes the likelihood of an observation for the correct class, but also minimizes the likelihoods of the observation for all other classes, such that the discrimination between classes is maximized. A training process is disclosed that utilizes the pseudo-rank likelihood objective function to identify model parameters that will result in a pattern recognizer with the lowest possible recognition error rate. The discrete nature of the rank-based rank likelihood objective function is transformed to allow the parameter estimations to be optimized during the training phase.

Type: Grant

Filed: October 6, 2000

Date of Patent: February 1, 2005

Assignee: International Business Machines Corporation

Inventors: Yuqing Gao, Yongxin Li, Michael Alan Picheny
Method for analyzing spatially-varying noise in seismic data using Markov chains

Patent number: 6847921

Abstract: The invention is a method for analyzing spatially-varying noise in seismic data. Transitions between data values at adjacent data locations in a seismic data set are represented by Markov chains. Transition probability matrices are constructed from the Markov chains. Data values are predicted from the calculated transition probabilities. Noise values are determined from the predicted data values.

Type: Grant

Filed: April 7, 2003

Date of Patent: January 25, 2005

Assignee: ExxonMobil Upstream Research Company

Inventors: Alex Woronow, John F. Schuette, Chrysanthe S. Munn
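
The steps in the abstract of patent 6847921 (transitions, transition probability matrix, prediction, noise) can be sketched on a one-dimensional trace. The sketch assumes the data values are already discretized into integer states, predicts the expected next state, and takes the noise estimate to be the residual; these choices and all function names are assumptions, not details from the patent.

```python
def transition_matrix(states, n_states):
    """Row-normalized counts of transitions between adjacent samples."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix

def predict_next(matrix, state):
    """Expected next state index given the current state."""
    return sum(j * p for j, p in enumerate(matrix[state]))

def residuals(states, matrix):
    """Observed value minus predicted value at each adjacent location."""
    return [b - predict_next(matrix, a) for a, b in zip(states, states[1:])]

trace = [0, 1, 0, 1, 2, 1]  # hypothetical discretized amplitudes
matrix = transition_matrix(trace, 3)
print(matrix)
print(residuals(trace, matrix))
```
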
Pattern recognition using an observable operator model

Patent number: 6845357

Abstract: Data structures, systems, and methods are aspects of pattern recognition using observable operator models (OOMs). OOMs are more efficient than Hidden Markov Models (HMMs). A data structure for an OOM has characteristic events, an initial distribution vector, a probability transition matrix, an occurrence count matrix, and at least one observable operator. System applications include computer systems, cellular phones, wearable computers, home control systems, fire safety or security systems, PDAs, and flight systems. A method of pattern recognition comprises training OOMs, receiving unknown input, computing matching probabilities, selecting the maximum probability, and displaying the match. A method of speech recognition comprises sampling a first input stream, performing a spectral analysis, clustering, training OOMs, and recognizing speech using the OOMs.

Type: Grant

Filed: July 24, 2001

Date of Patent: January 18, 2005

Assignee: Honeywell International Inc.

Inventors: Ravindra K. Shetty, Venkatesan Thyagarajan
Discriminative training of hidden Markov models for continuous speech recognition

Publication number: 20040267530

Abstract: Methods are given for improving discriminative training of hidden Markov models for continuous speech recognition. In one approach, discriminatively trained mixture models are interpolated with maximum likelihood trained mixture models. In another approach, segmentation and recognition results from one set of models are reused to discriminatively train a second set of models. For example, segmentation and recognition results from detailed match models are mapped and used to discriminatively train fast match models. In addition, gradients for the standard deviation of mixture components are clipped based on the statistics of the gradients. Pronunciation of words may also be used to determine the “incorrect” recognition hypothesis.

Type: Application

Filed: November 21, 2003

Publication date: December 30, 2004

Inventors: Chuang He, Jianxiong Wu, Vlad Sejnoha
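
Two of the techniques named in the abstract of publication 20040267530 are interpolation between discriminatively trained and maximum likelihood trained parameters, and gradient clipping based on gradient statistics. The sketch below assumes a linear interpolation with a single weight and a clipping range of mean plus or minus a multiple of the standard deviation. The abstract specifies neither, so both rules and the function names are hypothetical.

```python
import statistics

def interpolate(discriminative, maximum_likelihood, weight):
    """Blend two parameter vectors; weight=1.0 keeps the discriminative values."""
    return [weight * d + (1.0 - weight) * m
            for d, m in zip(discriminative, maximum_likelihood)]

def clip_gradients(gradients, num_std=2.0):
    """Clamp each gradient to mean +/- num_std standard deviations (assumed rule)."""
    mean = statistics.mean(gradients)
    std = statistics.pstdev(gradients)
    low, high = mean - num_std * std, mean + num_std * std
    return [min(max(g, low), high) for g in gradients]

print(interpolate([1.0, 3.0], [3.0, 1.0], 0.25))
print(clip_gradients([0.0, 0.0, 0.0, 0.0, 10.0], num_std=1.0))
```
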
Method and array for introducing temporal correlation in hidden Markov models for speech recognition

Patent number: 6832190

Abstract: In the recognition of spoken language, phonemes of the language are modelled by hidden Markov models. A modified hidden Markov model includes a conditional probability of a feature vector dependent on chronologically preceding feature vectors and, optionally, additionally comprises a conditional probability of a respectively current status. A global search for recognizing a word sequence in the spoken language is implemented with the modified hidden Markov model.

Type: Grant

Filed: November 10, 2000

Date of Patent: December 14, 2004

Assignee: Siemens Aktiengesellschaft

Inventors: Jochen Junkawitsch, Harald Höge
Process for implementing a speech recognizer, the related recognizer and process for speech recognition

Patent number: 6832191

Abstract: To implement a speech recognizer for a language in conditions of substantial unavailability of related speech training material the first step (1,2) is, based on related speech training material, a multilingual speech recognizer (2) for a plurality of known languages. The recognizer for such given language (5) is then implemented by interpolation (4) starting from the said multilingual recognizer (2). The recognizer (5) generated in this fashion is susceptible of being subsequently refined based on related speech training material acquired online (4) during later use (FIG.

Type: Grant

Filed: August 28, 2000

Date of Patent: December 14, 2004

Assignee: Telecom Italia Lab S.p.A.

Inventors: Alessandra Frasca, Giorgio Micca, Enrico Palme
Acoustic model creation method as well as acoustic model creation apparatus and speech recognition apparatus

Publication number: 20040236577

Abstract: To provide an acoustic model which can absorb the fluctuation of a phonemic environment in an interval longer than a syllable, with the number of parameters of the acoustic model suppressed to be small, a phoneme-connected syllable HMM/syllable-connected HMM set is generated in such a way that a phoneme-connected syllable HMM set corresponding to individual syllables is generated by combining phoneme HMMs. A preliminary experiment is conducted using the phoneme-connected syllable HMM set and training speech data. Any misrecognized syllable and the preceding syllable of the misrecognized syllable are checked using results of a preliminary experiment syllable label data. The combination between a correct answer syllable for the misrecognized syllable and the preceding syllable of the misrecognized syllable is extracted as a syllable connection. A syllable-connected HMM corresponding to this syllable connection is added into the phoneme-connected syllable HMM set.

Type: Application

Filed: March 8, 2004

Publication date: November 25, 2004

Applicant: Seiko Epson Corporation

Inventors: Masanobu Nishitani, Yasunaga Miyazawa, Hiroshi Matsumoto, Kazumasa Yamamoto
Speech recognition apparatus and method performing speech recognition with feature parameter preceding lead voiced sound as feature parameter of lead consonant

Patent number: 6823304

Abstract: A lead consonant buffer stores a feature parameter preceding a lead voiced sound detected by a voiced sound detector as a feature parameter of a lead consonant. A matching processing unit performs matching processing of a feature parameter of a lead consonant stored in the lead consonant buffer with a feature parameter of a registered pattern. Hence, the matching processing unit can perform matching processing reflecting information on a lead consonant even when no lead consonant can be detected due to a noise.

Type: Grant

Filed: July 19, 2001

Date of Patent: November 23, 2004

Assignee: Renesas Technology Corp.

Inventor: Masahiko Ikeda
Speech recognition accuracy in a multimodal input system

Patent number: 6823308

Abstract: A speech recognition method for use in a multimodal input system comprises receiving a multimodal input comprising digitized speech as a first modality input and data in at least one further modality input. Features in the speech and in the data in at least one further modality are identified. The identified features in the speech and in the data are used in the recognition of words by comparing the identified features with states in models for the words. The models have states for the recognition of speech and for words having features in at least one further modality associated with the words, the models also have states for the recognition of events in the further modality or each further modality.

Type: Grant

Filed: February 16, 2001

Date of Patent: November 23, 2004

Assignee: Canon Kabushiki Kaisha

Inventors: Robert Alexander Keiller, Nicolas David Fortescue
Methods and apparatus for generating, updating and distributing speech recognition models

Patent number: 6823306

Abstract: Techniques for generating, distributing, and using speech recognition models are described. A shared speech processing facility is used to support speech recognition for a wide variety of devices with limited capabilities including business computer systems, personal data assistants, etc., which are coupled to the speech processing facility via a communications channel, e.g., the Internet. Devices with audio capture capability record and transmit to the speech processing facility, via the Internet, digitized speech and receive speech processing services, e.g., speech recognition model generation and/or speech recognition services, in response. The Internet is used to return speech recognition models and/or information identifying recognized words or phrases. Thus, the speech processing facility can be used to provide speech recognition capabilities to devices without such capabilities and/or to augment a device's speech processing capability.

Type: Grant

Filed: November 30, 2000

Date of Patent: November 23, 2004

Assignee: Telesector Resources Group, Inc.

Inventors: Craig Reding, Suzi Levas
Method and system for the reduction of processing time in a speech recognition system using the hidden Markov model

Patent number: 6801892

Abstract: Disclosed is a speech recognition method in a speech recognition apparatus for applying speech recognition to a voice signal applied thereto. The input voice signal is converted from an analog to a digital signal and sequences of feature vectors are extracted based upon the digital signal (S12). A search space is defined by the sequences of feature vectors and an HMM (16) prepared beforehand for each unit of speech. The search space allows a transition between HMMs only in specific feature-vector sequences. A search is conducted in this space to find an optimum path for which the largest acoustic likelihood regarding the voice signal is obtained to find the result of recognition (S14), and this result is output (S15).

Type: Grant

Filed: March 27, 2001

Date of Patent: October 5, 2004

Assignee: Canon Kabushiki Kaisha

Inventor: Hiroki Yamamoto
Method and apparatus for segmenting a multi-media program based upon audio events

Patent number: 6801895

Abstract: The present invention provides for a method and apparatus for segmenting a multi-media program based upon audio events. In an embodiment a method of classifying an audio stream is provided. This method includes receiving an audio stream, sampling the audio stream at a predetermined rate, and then combining a predetermined number of samples into a clip. A plurality of features are then determined for the clip and are analyzed using a linear approximation algorithm. The clip is then characterized based upon the results of the analysis conducted with the linear approximation algorithm.

Type: Grant

Filed: December 6, 1999

Date of Patent: October 5, 2004

Assignee: AT&T Corp.

Inventors: Qian Huang, Zhu Liu
Method for enhancing recognition probability in voice recognition systems

Patent number: 6801890

Abstract: The invention relates to a method for enhancing recognition probability in voice recognition systems. According to the inventive method, selective post-training of the already stored homonymic term is carried out after inputting a term to be recognized. This makes it possible to improve the speaker-dependent recognition rate even in environments with prevailing acoustic interference.

Type: Grant

Filed: November 13, 2000

Date of Patent: October 5, 2004

Assignees: DeTeMobil, Deutsche Telekom MobilNet GmbH, Deutsche Telekom AG

Inventors: Ulrich Kauschke, Herbert Roland Rast, Fred Runge
System and method for Cantonese speech recognition using an optimized phone set

Publication number: 20040193418

Abstract: The present invention comprises a system and method for implementing a Cantonese speech recognizer with an optimized phone set, and may include a recognizer configured to compare input speech data to phone strings from a vocabulary dictionary that is implemented according to an optimized Cantonese phone set. The optimized Cantonese phone set may be implemented with a phonetic technique to separately include consonantal phones and vocalic phones. For reasons of system efficiency, the optimized Cantonese phone set may preferably be implemented in a compact manner to include only a minimum required number of consonantal phones and vocalic phones to accurately represent Cantonese speech during the speech recognition procedure.

Type: Application

Filed: March 24, 2003

Publication date: September 30, 2004

Applicant: Sony Corporation and Sony Electronics Inc.

Inventors: Michael Emonts, Xavier Menendez-Pidal, Lex Olorenshaw
Cascaded hidden Markov model for meta-state estimation

Publication number: 20040193419

Abstract: A method and system for training an audio analyzer (114) to identify asynchronous segments of audio types using sample data sets, the sample data sets being representative of audio signals for which segmentation is desired. The system and method then label asynchronous segments of audio samples, collected at the target site, into a plurality of categories by cascading hidden Markov models (HMM). The cascaded HMMs consist of 2 stages, the output of the first stage HMM (208) being transformed and used as observation inputs to the second stage HMM (212). This cascaded HMM approach allows for modeling processes with complex temporal characteristics by using training data. It also contains a flexible framework that allows for segments of varying duration. The system and method are particularly useful in identifying and separating segments of the human voice for voice recognition systems from other audio such as music.

Type: Application

Filed: March 31, 2003

Publication date: September 30, 2004

Inventors: Steven F. Kimball, Joanne Como
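
The two-stage cascade in the abstract of publication 20040193419 can be sketched with a standard Viterbi decoder applied twice. The sketch assumes discrete observations, uses the first-stage state labels unchanged as second-stage observations (the abstract says only that the output is "transformed"), and uses made-up probabilities throughout.

```python
import math

def viterbi(observations, start, trans, emit):
    """Most likely state path of a discrete HMM given as nested lists."""
    n = len(start)
    log = lambda p: math.log(p) if p > 0 else float("-inf")
    score = [log(start[s]) + log(emit[s][observations[0]]) for s in range(n)]
    back = []
    for obs in observations[1:]:
        prev, pointers, score = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + log(trans[r][s]))
            pointers.append(best)
            score.append(prev[best] + log(trans[best][s]) + log(emit[s][obs]))
        back.append(pointers)
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Hypothetical parameters: stage 1 follows the frames, stage 2 is stickier.
start = [0.5, 0.5]
stage1 = viterbi([0, 0, 1, 0, 0, 1, 1, 1], start,
                 [[0.7, 0.3], [0.3, 0.7]], [[0.9, 0.1], [0.1, 0.9]])
stage2 = viterbi(stage1, start,
                 [[0.95, 0.05], [0.05, 0.95]], [[0.8, 0.2], [0.2, 0.8]])
print(stage1)
print(stage2)
```

With these numbers the second stage smooths away the isolated label in the first-stage output, which is one way a cascade can yield segments of varying duration.
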
Coupled hidden Markov model (CHMM) for continuous audiovisual speech recognition

Publication number: 20040186718

Abstract: Method and apparatus for an audiovisual continuous speech recognition (AVCSR) system using a coupled hidden Markov model (CHMM) are described herein. In one aspect, an exemplary process includes receiving an audio data stream and a video data stream, and performing continuous speech recognition based on the audio and video data streams using a plurality of hidden Markov models (HMMs), a node of each of the HMMs at a time slot being subject to one or more nodes of related HMMs at a preceding time slot. Other methods and apparatuses are also described.

Type: Application

Filed: March 19, 2003

Publication date: September 23, 2004

Inventors: Ara Victor Nefian, Xiaoxing Liu, Xiaobo Pi, Luhong Liang, Yibao Zhao
System for reconstruction of symbols in a sequence

Publication number: 20040186717

Abstract: A method of reconstructing a damaged sequence of symbols where some symbols are missing is provided in which statistical parameters of the sequence are used with confidence windowing techniques to quickly and efficiently reconstruct the damaged sequence to its original form. Confidence windowing techniques are provided that are equivalent to generalized hidden semi-Markov models but which are more easily used to determine the most likely missing symbol at a given point in the damaged sequence being reconstructed. The method can be used to reconstruct communications consisting of speech, music, digital transmission symbols and others having a bounded symbol set which can be described by statistical behaviors in the symbol stream.

Type: Application

Filed: March 17, 2003

Publication date: September 23, 2004

Applicant: Rensselaer Polytechnic Institute

Inventors: Michael Savic, Michael Moore
Speech recognition using model parameters dependent on acoustic environment

Publication number: 20040181409

Abstract: To make speech recognition robust in a noisy environment, a variable parameter Gaussian mixture HMM (GMHMM) is described which extends existing HMMs by allowing HMM parameters to change as a function of a continuous variable that depends on the environment. Specifically, in one embodiment the function is a polynomial and the environment is described by signal-to-noise ratio. The use of the parameter functions improves the HMM discriminability during multi-condition training. In the recognition process, a set of HMM parameters is instantiated according to the parameter functions, based on the current environment. The model parameters are estimated using the Expectation-Maximization algorithm for the variable parameter GMHMM.

Type: Application

Filed: March 11, 2003

Publication date: September 16, 2004

Inventors: Yifan Gong, Xiaodong Cui
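
The instantiation step in the abstract of publication 20040181409 (HMM parameters as a polynomial function of signal-to-noise ratio) can be sketched for Gaussian means. The coefficient layout, function names, and numbers are hypothetical; estimating the coefficients with Expectation-Maximization is outside this sketch.

```python
def polynomial_mean(coefficients, snr_db):
    """Evaluate c0 + c1*snr + c2*snr**2 + ... using Horner's rule."""
    value = 0.0
    for c in reversed(coefficients):
        value = value * snr_db + c
    return value

def instantiate_means(mean_coefficients, snr_db):
    """Turn per-dimension polynomial coefficients into concrete means."""
    return [polynomial_mean(c, snr_db) for c in mean_coefficients]

# Hypothetical two-dimensional Gaussian whose means depend on SNR in dB.
coefficients = [[1.0, 0.5, 0.1], [2.0, 0.0, 0.0]]
print(instantiate_means(coefficients, 10.0))
```
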
Modelling and processing filled pauses and noises in speech recognition

Publication number: 20040181410

Abstract: A speech recognition system recognizes filled pause utterances made by a speaker. In one embodiment, an ergodic model is used to acoustically model filled pauses that provides flexibility allowing varying utterances of the filled pauses to be made. The ergodic HMM model can also be used for other types of noise such as but not limited to breathing, keyboard operation, microphone noise, laughter, door openings and/or closings, or any other noise occurring in the environment of the user or made by the user. Similarly, silence can be modeled using an ergodic HMM model. Recognition can be used with N-gram, context-free grammar or hybrid language models.

Type: Application

Filed: March 13, 2003

Publication date: September 16, 2004

Applicant: Microsoft Corporation

Inventor: Mei-Yuh Hwang
Block synchronous decoding

Publication number: 20040176956

Abstract: A pattern recognition system and method are provided. Aspects of the invention are particularly useful in combination with multi-state Hidden Markov Models. Pattern recognition is effected by processing Hidden Markov Model Blocks. This block-processing allows the processor to perform more operations upon data while such data is in cache memory. By so increasing cache locality, aspects of the invention provide significantly improved pattern recognition speed.

Type: Application

Filed: March 4, 2003

Publication date: September 9, 2004

Applicant: Microsoft Corporation

Inventors: William H. Rockenbeck, Julian J. Odell
Acoustic modeling using a two-level decision tree in a speech recognition system

Patent number: 6789063

Abstract: In some embodiments, the invention involves receiving phonetic samples and assembling a two-level phonetic decision tree structure using the phonetic samples. The decision tree has multiple leaf node levels each having at least one state, wherein at least one node in a second level is assigned a Gaussian of a node in the first level, but the at least one node in the second level has a weight computed for it.

Type: Grant

Filed: September 1, 2000

Date of Patent: September 7, 2004

Assignee: Intel Corporation

Inventor: Yonghong Yan
Phoneme-delta based speech compression

Patent number: 6789066

Abstract: An arrangement is provided for compressing speech data. Speech data is compressed based on a phoneme stream, detected from the speech data, and a delta stream, determined based on the difference between the speech data and a speech signal stream, generated using the phoneme stream with respect to a voice font. The compressed speech data is decompressed into a decompressed phoneme stream and a decompressed delta stream from which the speech data is recovered.

Type: Grant

Filed: September 25, 2001

Date of Patent: September 7, 2004

Assignee: Intel Corporation

Inventors: Stephen Junkins, Chris L. Gorman
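
The phoneme-plus-delta scheme in the abstract of patent 6789066 can be sketched as a lossless round trip on integer samples. Phoneme detection is assumed to have happened already, and the voice font here is a hypothetical lookup table from phoneme to a fixed sample sequence.

```python
# Hypothetical voice font: each phoneme maps to a short sample sequence.
VOICE_FONT = {"a": [3, 5, 3], "t": [0, 9, 0]}

def synthesize(phonemes, voice_font=VOICE_FONT):
    """Concatenate the voice-font samples for each phoneme."""
    samples = []
    for phoneme in phonemes:
        samples.extend(voice_font[phoneme])
    return samples

def compress(speech, phonemes):
    """Keep the phoneme stream plus the difference from the synthetic signal."""
    synthetic = synthesize(phonemes)
    delta = [s - y for s, y in zip(speech, synthetic)]
    return phonemes, delta

def decompress(phonemes, delta):
    """Resynthesize from the phonemes and add the delta back."""
    return [y + d for y, d in zip(synthesize(phonemes), delta)]

speech = [0, 8, 1, 3, 6, 3]
phonemes, delta = compress(speech, ["t", "a"])
print(delta)
print(decompress(phonemes, delta) == speech)
```

The delta stream is small when the synthetic signal tracks the speech closely, which is where any compression gain would come from.
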
Method and system for generating squeezed acoustic models for specialized speech recognizer

Patent number: 6789061

Abstract: Computer-based methods and systems are provided for automatically generating, from a first speech recognizer, a second speech recognizer such that the second speech recognizer is tailored to a certain application and requires reduced resources compared to the first speech recognizer. The invention exploits the first speech recognizer's set of states si and set of probability density functions (pdfs) assembling output probabilities for an observation of a speech frame in said states si. The invention teaches a first step of generating a set of states of the second speech recognizer reduced to a subset of states of the first speech recognizer being distinctive of the certain application. The invention teaches a second step of generating a set of probability density functions of the second speech recognizer reduced to a subset of probability density functions of the first speech recognizer being distinctive of the certain application.

Type: Grant

Filed: August 14, 2000

Date of Patent: September 7, 2004

Assignee: International Business Machines Corporation

Inventors: Volker Fischer, Siegfried Kunzmann, Claire Waast-Ricard
System and method for speech verification using out-of-vocabulary models

Patent number: 6778959

Abstract: A system and method for speech verification using out-of-vocabulary models includes a speech recognizer that has a model bank with system vocabulary word models, a garbage model, and one or more noise models. The model bank may reject an utterance or other sound as an invalid vocabulary word when the model bank identifies the utterance or other sound as corresponding to the garbage model or the noise models. Initial noise models may be selectively combined into a pre-determined number of final noise model clusters to effectively reduce the number of noise models that are utilized by the model bank of the speech recognizer to verify system vocabulary words.

Type: Grant

Filed: October 18, 2000

Date of Patent: August 17, 2004

Assignees: Sony Corporation, Sony Electronics Inc.

Inventors: Duanpei Wu, Lex Olorenshaw, Xavier Menendez-Pidal, Ruxin Chen
Computer method and apparatus for segmenting text streams

Patent number: 6772120

Abstract: Computer method and apparatus for segmenting text streams is disclosed. Given is an input text stream formed of a series of words. A probability member provides working probabilities that a group of words is of a topic selected from a plurality of predetermined topics. The probability member accounts for relationships between words. A processing module receives the input text stream and using the probability member determines probability of certain words in the input text stream being of a same topic. As such, the processing module segments the input text stream into single topic groupings of words, where each grouping is of a respective single topic.

Type: Grant

Filed: November 21, 2000

Date of Patent: August 3, 2004

Assignee: Hewlett-Packard Development Company, L.P.

Inventors: Pedro J. Moreno, David M. Blei
Audio-Assisted segmentation and browsing of news videos

Publication number: 20040143434

Abstract: A method segments and summarizes a news video using both audio and visual features extracted from the video. The summaries can be used to quickly browse the video to locate topics of interest. A generalized sound recognition hidden Markov model (HMM) framework for joint segmentation and classification of the audio signal of the news video is used. The HMM provides not only a classification label for each audio segment, but also compact state duration histogram descriptors.

Type: Application

Filed: January 17, 2003

Publication date: July 22, 2004

Inventors: Ajay Divakaran, Regunathan Radhakrishnan
Method of speech recognition using hidden trajectory hidden markov models

Publication number: 20040143435

Abstract: A method of speech recognition is provided that determines a production-related value, vocal-tract resonance frequencies in particular, for a state at a particular frame based on the production-related values associated with two preceding frames using a recursion. The production-related value is used to determine a probability distribution of the observed feature vector for the state. A probability for an observed value received for the frame is then determined from the probability distribution. Under one embodiment, the production-related value is determined using a noise-free recursive definition for the value. Use of the recursion substantially improves the decoding speed. When the decoding algorithm is applied to training data with known phonetic transcripts, forced alignment is created which improves the phone segmentation obtained from the prior art.

Type: Application

Filed: January 21, 2003

Publication date: July 22, 2004

Inventors: Li Deng, Jian-lai Zhou, Frank Torsten Bernd Seide
Gaussian model-based dynamic time warping system and method for speech processing

Publication number: 20040122672

Abstract: The Gaussian Dynamic Time Warping model provides a hierarchical statistical model for representing an acoustic pattern. The first layer of the model represents the general acoustic space; the second layer represents each speaker space and the third layer represents the temporal structure information contained in each enrollment speech utterance, based on equally-spaced time intervals. These three layers are hierarchically developed: the second layer is derived from the first, and the third layer is derived from the second. The model is useful in speech processing applications, particularly in applications such as word and speaker recognition, using a spotting recognition mode.

Type: Application

Filed: December 18, 2002

Publication date: June 24, 2004

Inventors: Jean-Francois Bonastre, Philippe Morin, Jean-Claude Junqua
System and method for automatic voice recognition using mapping

Patent number: 6754629

Abstract: A method and system that combines voice recognition engines and resolves differences between the results of individual voice recognition engines using a mapping function. Speaker independent voice recognition engines and speaker-dependent voice recognition engines are combined. Hidden Markov Model (HMM) engines and Dynamic Time Warping (DTW) engines are combined.

Type: Grant

Filed: September 8, 2000

Date of Patent: June 22, 2004

Assignee: Qualcomm Incorporated

Inventors: Yingyong Qi, Ning Bi, Harinath Garudadri
Speech recognition apparatus

Publication number: 20040117187

Abstract: An assembly of word models produced from a word model producer is sent to a matching object word selector to select one word model as matching object from them. A word matching processor judges whether or not a score of a path root of a present state serving as matching object is within a predetermined range being set based on a maximum value of the score, which is memorized in a maximum value memory buffer connected to the word matching processor. When the score of the path root is within the above-described range, the score of this path root is designated as count object and a cumulative score is obtained. On the other hand, when the score of the path root is outside the above-described range, calculation of score for the state of the matching object is omitted.

Type: Application

Filed: July 7, 2003

Publication date: June 17, 2004

Applicant: Renesas Technology Corp.

Inventor: Masahiko Ikeda
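
The pruning rule described in the abstract for publication 20040117187 can be illustrated with a short sketch. It assumes log-domain scores where higher is better and a hypothetical `beam_width` that defines the "predetermined range" below the stored maximum; the function name, argument layout, and the use of negative infinity for skipped states are illustrative choices, not details from the application.

```python
import math

def prune_and_accumulate(prev_scores, local_scores, beam_width):
    """One time step of a beam-pruned score update (illustrative only).

    prev_scores:  log score of the path leading into each state
    local_scores: log score of the current frame in each state
    beam_width:   allowed distance below the best path score (assumption)
    """
    best = max(prev_scores)  # plays the role of the stored maximum score
    new_scores = []
    for prev, local in zip(prev_scores, local_scores):
        if prev >= best - beam_width:
            # Path score is within range: accumulate the frame score.
            new_scores.append(prev + local)
        else:
            # Path score is out of range: skip the score calculation.
            new_scores.append(-math.inf)
    return new_scores
```

For example, `prune_and_accumulate([-1.0, -2.0, -10.0], [-0.5, -0.5, -0.5], 5.0)` keeps the first two states and drops the third, whose path score lies more than 5.0 below the maximum.
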
Method of creating acoustic model and speech recognition device

Publication number: 20040111263

Abstract: The invention provides an acoustic model creating method that can reduce the number of parameters and optimize the Gaussian distribution number for respective states constituting an HMM in order to create an HMM having high recognition ability. HMM sets in which the Gaussian distribution numbers of the respective states constituting the respective syllable HMMs are set from one to the maximum distribution number (the distribution number of which is 64) are trained using training speech data, and the respective states of the respective HMMs are Viterbi-aligned with the training speech data corresponding to the HMMs using a syllable HMM set to the maximum distribution number among the trained syllable HMM sets. Then, a description length computing unit computes a description length for the respective states of the respective HMMs using the alignment data, and a state selecting unit selects a state having the distribution number the description length of which is minimum.

Type: Application

Filed: September 17, 2003

Publication date: June 10, 2004

Applicant: SEIKO EPSON CORPORATION

Inventors: Masanobu Nishitani, Yasunaga Miyazawa, Hiroshi Matsumoto, Kazumasa Yamamoto
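
The selection step in the abstract for publication 20040111263 (pick, for each state, the distribution number with the minimum description length) can be sketched as below. The abstract does not give the description-length formula, so this sketch assumes the common two-part form, negative log likelihood plus half the parameter count times the log of the number of aligned frames. All names and the `params_per_gaussian` parameter are hypothetical.

```python
import math

def description_length(log_likelihood, num_params, num_frames):
    # Assumed two-part criterion: data cost plus model cost.
    return -log_likelihood + 0.5 * num_params * math.log(num_frames)

def select_mixture_count(candidates, num_frames, params_per_gaussian):
    """Pick the Gaussian count with the smallest description length.

    candidates: dict mapping a Gaussian count to the log likelihood of
                the frames aligned to this state under that many Gaussians
    """
    return min(
        candidates,
        key=lambda m: description_length(
            candidates[m], m * params_per_gaussian, num_frames
        ),
    )
```

With this criterion, a larger mixture is chosen only when its gain in likelihood outweighs the cost of its extra parameters.
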
Method for computer-supported speech recognition, speech recognition system and control device for controlling a technical system and telecommunications device

Publication number: 20040102974

Abstract: The speech recognition rate which is necessary is determined for a selected speech recognition application. The information content of the feature vector components which is at least necessary to ensure the speech recognition rate is determined using a stored speech recognition rate information. The number of necessary feature vector components which is necessary to make available the determined information content is determined and the speech recognition is carried out using feature vectors with the determined required number of feature vector components.

Type: Application

Filed: September 16, 2003

Publication date: May 27, 2004

Inventors: Michael Kustner, Ralf Sambeth
Information search method and apparatus using Inverse Hidden Markov Model

Patent number: 6735588

Abstract: An information search method and apparatus employ an Inverse Hidden Markov Model (IHMM) for stochastically searching for a reference information model among a plurality of predetermined reference information models obtained by training that best matches unknown information which is expressed by a Hidden Markov Model (HMM) chain. The method and apparatus find an optimal path in a HMM state lattice using a minimum unlikelihood score, rather than a maximum likelihood score, and using a Viterbi algorithm, to recognize unknown information, so that unnecessary computations are avoided. The method and apparatus can be used for finding the most likely path through a vocabulary network for a given utterance.

Type: Grant

Filed: May 14, 2001

Date of Patent: May 11, 2004

Assignees: Samsung Electronics Co., Ltd., Sungkyunkwan University

Inventors: Bo-Sung Kim, Jun-dong Cho, Young-hoon Chang, Sun-hee Park
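
The idea in the abstract for patent 6735588 of searching a state lattice for the path with the minimum score can be illustrated with a Viterbi pass over costs. This sketch treats "unlikelihood" as a negative log probability, which is an assumption; the abstract does not define the score, and the function below is a generic minimum-cost Viterbi search rather than the patented IHMM procedure.

```python
def viterbi_min_cost(init_cost, trans_cost, emit_cost):
    """Find the minimum-cost state path through a lattice.

    init_cost[s]     : cost of starting in state s
    trans_cost[r][s] : cost of moving from state r to state s
    emit_cost[t][s]  : cost of the observation at time t in state s
    (all costs are assumed to be negative log probabilities)
    """
    n_states = len(init_cost)
    cost = [init_cost[s] + emit_cost[0][s] for s in range(n_states)]
    back = []  # back[t-1][s] = best predecessor of state s at time t
    for t in range(1, len(emit_cost)):
        new_cost, pointers = [], []
        for s in range(n_states):
            r_best = min(range(n_states),
                         key=lambda r: cost[r] + trans_cost[r][s])
            new_cost.append(cost[r_best] + trans_cost[r_best][s]
                            + emit_cost[t][s])
            pointers.append(r_best)
        cost = new_cost
        back.append(pointers)
    # Trace back from the cheapest final state.
    state = min(range(n_states), key=lambda s: cost[s])
    total = cost[state]
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path, total
```

Minimizing summed costs in this form selects the same path as maximizing the product of the corresponding probabilities.
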
Generating realistic facial animation from speech

Patent number: 6735566

Abstract: A system for learning a mapping between time-varying signals is used to drive facial animation directly from speech, without laborious voice track analysis. The system learns dynamical models of facial and vocal action from observations of a face and the facial gestures made while speaking. Instead of depending on heuristic intermediate representations such as phonemes or visemes, the system trains hidden Markov models to obtain its own optimal representation of vocal and facial action. An entropy-minimizing training technique using an entropic prior ensures that these models contain sufficient dynamical information to synthesize realistic facial motion to accompany new vocal performances. In addition, they can make optimal use of context to handle ambiguity and relatively long-lasting facial co-articulation effects.

Type: Grant

Filed: October 9, 1998

Date of Patent: May 11, 2004

Assignee: Mitsubishi Electric Research Laboratories, Inc.

Inventor: Matthew E. Brand
Systems and methods for providing interactive speaker identification training

Publication number: 20040083104

Abstract: A system (100) provides speaker identification training. The system (100) generates speaker models and receives audio segments. The system (100) identifies speakers corresponding to the audio segments based on the speaker models. At least one of the audio segments has an unidentified or misidentified speaker (i.e., an audio segment whose speaker cannot be accurately identified). The system (100) presents, to a user, audio segments that include an audio segment whose speaker is unidentified or misidentified and receives, from the user, the name of the unidentified or misidentified speaker. The system (100) may use this information to subsequently identify the unidentified or misidentified speaker by name for future audio segments.

Type: Application

Filed: October 16, 2003

Publication date: April 29, 2004

Inventors: Daben Liu, Francis G. Kubala
Method and system for training of a classifier

Patent number: 6728674

Abstract: A method and a system for corrective training of speech models include changing a weight of a data sample whenever a data sample is incorrectly associated with a classifier and retraining each classifier with the weights.

Type: Grant

Filed: July 31, 2000

Date of Patent: April 27, 2004

Assignee: Intel Corporation

Inventor: Meir Griniasty
Method and apparatus for probabilistic recognition using small number of state clusters

Patent number: 6725195

Abstract: Probabilistic recognition using clusters and simple probability functions provides improved performance by employing a limited number of clusters each using a relatively large number of simple probability functions. The simple probability functions for each of the limited number of state clusters are greater in number than the limited number of state clusters.

Type: Grant

Filed: October 22, 2001

Date of Patent: April 20, 2004

Assignee: SRI International

Inventors: Ananth Sankar, Venkata Ramana Rao Gadde
Method of training an automatic speech recognizer

Patent number: 6714910

Abstract: Provided is a method of training an automatic speech recognizer, said speech recognizer using acoustic models and/or speech models, wherein speech data is collected during a training phase and used to improve the acoustic models, said method comprising: during the training phase, providing speech utterances that are predefined to a user by means of a game, wherein the game has predefined rules to enable a user to provide certain utterances; and providing the utterances by the user for training the speech recognizer.

Type: Grant

Filed: June 26, 2000

Date of Patent: March 30, 2004

Assignee: Koninklijke Philips Electronics, N.V.

Inventors: Georg Rose, Joseph Hubertus Eggen, Bartel Marinus Van Der Sluis
Voice recognition apparatus and voice recognition method

Publication number: 20040059576

Abstract: The present invention relates to a speech recognition apparatus for recognizing speeches of a plurality of users with high accuracy. An adapting unit 12 detects a best transformation function for adapting an input speech to an acoustic model from at least one transformation function based on the transformation results which are obtained by transforming the input speech by at least one transformation function stored in a storing unit 13, and allocates the input speech to the best transformation function. Further, the adapting unit 12 updates the transformation function to which the new input speech is allocated by all the input speeches allocated to the transformation function. A selecting unit 14 selects the transformation function used for transforming the input speech from at least one transformation function stored in the storing unit 13. A transforming unit 5 transforms the input speech by the selected transformation function.

Type: Application

Filed: October 20, 2003

Publication date: March 25, 2004

Inventor: Helmut Lucke
Vector fixed-lag algorithm for decoding input symbols

Patent number: 6708149

Abstract: The present invention discloses an apparatus and method of decoding information received over a noisy communications channel to determine the intended transmitted information. The present invention uses a vector fixed-lag algorithm to determine the probabilities of the intended transmitted information. The algorithm is implemented by multiplying an initial state vector with a matrix containing information about the communications channel. The product is then recursively multiplied by the matrix τ times, using the new product with each recursive multiplication and the forward information is stored for a fixed period of time, τ. The final product is multiplied with a unity column vector yielding a probability of a possible input. The estimated input is the input having the largest probability.

Type: Grant

Filed: April 30, 2001

Date of Patent: March 16, 2004

Assignee: AT&T Corp.

Inventor: William Turin
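
The matrix recursion described in the abstract for patent 6708149 can be sketched roughly as follows. This is one possible reading, not the patented algorithm: it assumes a row-vector forward state, one channel matrix per candidate input at the current time, and a list of τ further matrices for the lagged observations. The helper names and the data layout are hypothetical.

```python
def vec_mat(v, m):
    """Multiply row vector v by matrix m (lists of floats)."""
    return [sum(v[i] * m[i][j] for i in range(len(v)))
            for j in range(len(m[0]))]

def fixed_lag_estimate(alpha, candidate_mats, lag_mats):
    """Score each candidate input after looking ahead over the lag.

    alpha:          forward state vector before the current symbol
    candidate_mats: dict mapping a candidate input to its channel matrix
    lag_mats:       the tau matrices for the following observations
    """
    scores = {}
    for symbol, mat in candidate_mats.items():
        v = vec_mat(alpha, mat)   # initial vector times channel matrix
        for m in lag_mats:        # recursive multiplication, tau times
            v = vec_mat(v, m)
        scores[symbol] = sum(v)   # product with a unity column vector
    best = max(scores, key=scores.get)
    return best, scores
```

Summing the entries of the final vector is equivalent to multiplying it by a column vector of ones, and the estimate is the candidate with the largest resulting score.
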
