Abstract: The invention provides a method of speech recognition comprising the steps of receiving a signal comprising one or more spoken words, extracting a spoken word from the signal using a Hidden Markov Model, passing the spoken word to a plurality of word models, one or more of the word models based on a Hidden Markov Model, determining the word model most likely to represent the spoken word, and outputting the word model representing the spoken word. The invention also provides a related speech recognition system and a speech recognition computer program.
Type:
Application
Filed:
September 5, 2003
Publication date:
March 4, 2004
Inventors:
Nikola Kirilov Kasabov, Waleed Habib Abdulla
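The core step of the method in the abstract above, scoring a spoken word against several HMM word models and selecting the most likely one, can be sketched with a toy discrete-observation forward algorithm. This is a minimal illustration only; the model names and parameters below are hypothetical, not taken from the patent.

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM.
    pi: initial state probabilities, A: transition matrix, B: emission probs."""
    alpha = pi * B[:, obs[0]]          # initialize with the first observation
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # propagate, then absorb the next observation
    return np.log(alpha.sum())

def best_word_model(obs, word_models):
    """Return the word whose HMM assigns the highest likelihood to obs."""
    return max(word_models,
               key=lambda w: forward_log_likelihood(obs, *word_models[w]))
```

With two single-state toy models, one favoring symbol 0 and one favoring symbol 1, an utterance consisting mostly of 0s selects the first model.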
Abstract: Client speaker locations in a speaker space are used to generate speech models for comparison with test speaker data or test speaker speech models. The speaker space can be constructed using training speakers that are entirely separate from the population of client speakers, or from client speakers, or from a mix of training and client speakers. Reestimation of the speaker space based on client environment information is also provided to improve the likelihood that the client data will fall within the speaker space. During enrollment of the clients into the speaker space, additional client speech can be obtained when predetermined conditions are met. The speaker distribution can also be used in the client enrollment step.
Type:
Grant
Filed:
July 5, 2000
Date of Patent:
February 24, 2004
Assignee:
Matsushita Electric Industrial Co., Ltd.
Inventors:
Roland Kuhn, Olivier Thyes, Patrick Nguyen, Jean-Claude Junqua, Robert Boman
Abstract: A method for use in a speech recognition system in which a speech waveform to be modelled is represented by a set of feature extracted parameters in the time domain, the method comprising dividing individual ones of one or more of said feature extracted parameters to provide for each divided feature extracted parameter a plurality of frequency channels, and demodulating at least one of the plurality of frequency channels to provide at least one corresponding baseband frequency signal.
Abstract: A technique for audio searches by statistical pattern matching is disclosed. The audio to be located is processed for feature extraction and decoded using a maximum likelihood (“ML”) search. A left-right Hidden Markov Model (“HMM”) is constructed from the ML state sequence. Transition probabilities are defined as normalized state occupancies from the most likely state sequence of the decoding operation. Utterance duration is measured from the search sample. Other model parameters are gleaned from an acoustic model. A ML search of an audio corpus is conducted with respect to the HMM and a garbage model. New start states are added at each frame. Low scoring and long state sequences (with respect to the search sample duration) are discarded at each frame. Locations where scores of the new model are higher than those of the garbage model are marked as potential matches. The highest scoring matches are presented as results.
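One reading of the "transition probabilities defined as normalized state occupancies" step above: a state occupied for n frames in the decoded sequence gets a self-loop probability of (n−1)/n and an exit probability of 1/n. The sketch below follows that interpretation, which is ours and is not spelled out in the abstract.

```python
from collections import Counter

def left_right_transitions(state_seq):
    """Estimate self-loop/advance probabilities for a left-right HMM from
    state occupancies in a decoded most-likely state sequence."""
    occ = Counter(state_seq)
    trans = {}
    for s, n in occ.items():
        # n frames in a state imply n-1 self-loops and one exit transition
        trans[s] = {"self": (n - 1) / n, "next": 1 / n}
    return trans
```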
Abstract: A differential compression technique is disclosed for compressing individual speaker models, such as Gaussian mixture models, by computing a delta model from the difference between an individual speaker model and a baseline model. Further compression may be applied to the delta model to reduce the large storage requirements generally attributed to speaker models.
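The delta-model idea above, storing a speaker model as a further-compressible difference from a baseline, can be sketched for GMM mean vectors. The quantization step size is an arbitrary illustration, not a value from the patent.

```python
import numpy as np

def delta_compress(speaker_means, baseline_means, step=0.05):
    """Encode a speaker model's means as quantized deltas from a baseline;
    small integer codes compress well with a generic entropy coder."""
    delta = speaker_means - baseline_means
    return np.round(delta / step).astype(np.int16)

def delta_expand(codes, baseline_means, step=0.05):
    """Reconstruct approximate speaker means from delta codes and the baseline."""
    return baseline_means + codes.astype(np.float64) * step
```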
Abstract: A method and system that improves voice recognition by improving the storage of voice recognition (VR) templates. The improved storage means that more VR models can be stored in memory; the more VR models stored, the more robust and therefore the more accurate the VR system. Lossy compression techniques are used to compress VR models. In one embodiment, A-law compression and A-law expansion are used to compress and expand VR models. In another embodiment, Mu-law compression and Mu-law expansion are used. VR models are compressed during a training process and expanded during voice recognition.
Abstract: An arrangement is provided for an embedded coupled hidden Markov model. To train an embedded coupled hidden Markov model, training data is first segmented into uniform segments at the different layers of the model. At each layer, a uniform segment corresponds to a state of a coupled hidden Markov model at that layer. An optimal segmentation is generated at the lower layer based on the uniform segmentation and is then used to update the parameters of the models associated with the states of the coupled hidden Markov models at the lower layer. The updated model parameters at the lower layer are then used to update the model parameters associated with states at the super layer.
Abstract: A method and system that combines voice recognition engines and resolves any differences between the results of individual voice recognition engines. A speaker independent (SI) Hidden Markov Model (HMM) engine, a speaker independent Dynamic Time Warping (DTW-SI) engine and a speaker dependent Dynamic Time Warping (DTW-SD) engine are combined. Combining and resolving the results of these engines results in a system with better recognition accuracy and lower rejection rates than using the results of only one engine.
Type:
Grant
Filed:
July 18, 2000
Date of Patent:
December 30, 2003
Assignee:
Qualcomm Incorporated
Inventors:
Harinath Garudadri, David Puig Oses, Ning Bi, Yingyong Qi
Abstract: Methods and systems for recognizing speech include receiving information reflecting the speech, determining at least one broad-class of the received information, classifying the received information based on the determined broad-class, selecting a model based on the classification of the received information, and recognizing the speech using the selected model and the received information.
Type:
Application
Filed:
June 13, 2002
Publication date:
December 18, 2003
Applicant:
INDUSTRIAL TECHNOLOGY RESEARCH INSTITUTE
Abstract: Audio coding processes like quantization can cause spectral components of an encoded audio signal to be set to zero, creating spectral holes in the signal. These spectral holes can degrade the perceived quality of audio signals that are reproduced by audio coding systems. An improved decoder avoids or reduces the degradation by filling the spectral holes with synthesized spectral components. An improved encoder may also be used to realize further improvements in the decoder.
Type:
Application
Filed:
June 17, 2002
Publication date:
December 18, 2003
Inventors:
Michael Mead Truman, Grant Allen Davidson, Matthew Conrad Fellers, Mark Stuart Vinton, Matthew Aubrey Watson, Charles Quito Robinson
Abstract: An adaptive speech recognition method with noise compensation is disclosed. In speech recognition, optimal equalization factors for the feature vectors of a plurality of speech frames corresponding to each probability density function in a speech model are determined based on the plurality of speech frames of the input speech and the speech model. The parameters of the speech model are adapted by the optimal equalization factor and a bias compensation vector, which corresponds to and is retrieved by the optimal equalization factor. The optimal equalization factor is provided to adjust a distance of the mean vector in the speech model. The bias compensation vector is provided to adjust a direction change of the mean vector in the speech model.
Abstract: A method and apparatus is provided for identifying patterns from a series of feature vectors representing a time-varying signal. The method and apparatus use both a frame-based model and a segment model in a unified framework. The frame-based model determines the probability of an individual feature vector given a frame state. The segment model determines the probability of sub-sequences of feature vectors given a single segment state. The probabilities from the frame-based model and the segment model are then combined to form a single path score that is indicative of the probability of a sequence of patterns. Another aspect of the invention is the use of a frame-based model and a segment model to segment feature vectors during model training. Under this aspect of the invention, the frame-based model and the segment model are used together to identify probabilities associated with different segmentations.
Abstract: Speech recognition is performed by matching, for each speech frame of the inputted speech, a characteristic quantity of the inputted speech against a composite HMM obtained by synthesizing a speech HMM (hidden Markov model) and a noise HMM.
Type:
Application
Filed:
March 14, 2003
Publication date:
December 4, 2003
Applicant:
International Business Machines Corporation
Abstract: An improved transformation method uses an initial set of Hidden Markov Models (HMMs) trained on a large amount of speech recorded in a low-noise environment R, providing rich information on co-articulation and speaker variation, together with a smaller database in a noisier target environment T. A set H of HMMs is trained with data provided in the low-noise environment R, and the utterances in the noisy environment T are transcribed phonetically using set H of HMMs. The transcribed segments are grouped into a set of classes C. For each subclass c of classes C, the transformation Φc is found that maximizes the likelihood of utterances in T, given H. The HMMs are transformed and the steps repeated until the likelihood stabilizes.
Abstract: A true/false judgment on a result of speech recognition is made with high accuracy using less processing. By comparing acoustic models HMMsb against the feature vector sequence V(n) of utterances, a recognition result RCG specifying the acoustic model HMMsb having the maximum likelihood, a first score FSCR indicating the value of the maximum likelihood, and a second score SSCR indicating the value of the second-highest likelihood are found. Then, by comparing an evaluation value FSCR×(FSCR−SSCR), based on the first score FSCR and the second score SSCR, with a pre-set threshold value THD, a true/false judgment on the recognition result RCG is made. When the recognition result RCG is judged to be true, speaker adaptation is applied to the acoustic models HMMsb; when it is judged to be false, speaker adaptation is not applied. It is thus possible to improve the accuracy of speaker adaptation.
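The accept/reject rule in the abstract above reduces to a one-line comparison; the threshold value is application-specific and the function name is ours:

```python
def accept_recognition(fscr, sscr, thd):
    """True/false judgment on a recognition result:
    accept when FSCR * (FSCR - SSCR) exceeds the threshold THD."""
    return fscr * (fscr - sscr) > thd
```

A large gap between the best and second-best likelihoods, scaled by the best likelihood itself, signals a trustworthy result; only then would speaker adaptation be applied.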
Abstract: A speech recognition device comprises an HMM model database which prestores keyword HMMs representing the feature patterns of keywords to be recognized, a likelihood calculator which calculates the likelihood of an extracted feature value of a speech signal in each frame by comparing it with the keyword HMMs and designated-speech HMMs, an extraneous-speech likelihood setting device which sets the extraneous-speech likelihood based on the calculated likelihood of a match with the designated-speech HMMs, a matching processor which performs a matching process based on the calculated likelihood and the extraneous-speech likelihood, and a determining device which determines the keywords contained in the spontaneous speech based on the matching process.
Abstract: A speech recognition method includes the use of synchronous or asynchronous audio and video data to enhance speech recognition probabilities. A two-stream coupled hidden Markov model is trained and used to identify speech. At least one stream is derived from audio data and a second stream is derived from mouth pattern data. Gestural or other suitable data streams can optionally be combined to reduce speech recognition error rates in noisy environments.
Abstract: A speech recognition method includes the use of synchronous or asynchronous audio and video data to enhance speech recognition probabilities. A two-stream factorial hidden Markov model is trained and used to identify speech. At least one stream is derived from audio data and a second stream is derived from mouth pattern data. Gestural or other suitable data streams can optionally be combined to reduce speech recognition error rates in noisy environments.
Abstract: A speech recognition apparatus comprises a speech analyzer which extracts feature patterns of spontaneous speech divided into frames; a keyword model database which prestores keyword models representing the feature patterns of a plurality of keywords to be recognized; a garbage model database which prestores feature patterns of components of extraneous speech to be identified; a first likelihood calculator which calculates the likelihood of feature values based on the feature patterns of each frame and the keywords; and a second likelihood calculator which calculates the likelihood of feature values based on the feature patterns of each frame and the extraneous speech. The device recognizes keywords contained in the spontaneous speech by calculating a cumulative likelihood based on the calculated likelihoods, adding a predetermined correction value in the second likelihood calculator.
Abstract: The invention provides a method and apparatus for automatically generating a summary or key phrase for a song. The song, or a portion thereof, is digitized and converted into a sequence of feature vectors, such as mel-frequency cepstral coefficients (MFCCs). The feature vectors are then processed in order to decipher the song's structure. Those sections that correspond to different structural elements are then marked with corresponding labels. Once the song is labeled, various heuristics are applied to select a key phrase corresponding to the song's summary. For example, the system may identify the label that appears most frequently within the song, and then select the longest duration of that label as the summary.
Type:
Grant
Filed:
April 7, 2000
Date of Patent:
October 14, 2003
Assignee:
Hewlett-Packard Development Company, L.P.
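The label-based heuristic in the song-summary abstract above (pick the most frequent section label, then its longest contiguous run) can be sketched over a per-frame label sequence; the labels and function name are hypothetical illustrations:

```python
from collections import Counter
from itertools import groupby

def key_phrase(labels):
    """Return (most frequent label, (start index, length) of its longest run)."""
    most_common = Counter(labels).most_common(1)[0][0]
    best = (0, 0)  # (start, length) of the longest run seen so far
    i = 0
    for label, run in groupby(labels):
        n = len(list(run))
        if label == most_common and n > best[1]:
            best = (i, n)
        i += n
    return most_common, best
```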
Abstract: An estimate of a clean speech vector, typically a Mel-Frequency Cepstral Coefficient (MFCC) vector, given its noisy observation is provided. The method makes use of two Gaussian mixtures. The first is trained on clean speech and the second is derived from the first using noise samples. The method gives an estimate of a clean speech feature vector as the conditional expectation of clean speech given an observed noisy vector.
Abstract: Systems and methods for identifying the N-best strings of a weighted automaton. A potential for each state of an input automaton to a set of destination states of the input automaton is first determined. Then, the N-best paths are found in the result of an on-the-fly determinization of the input automaton. Only the portion of the input automaton needed to identify the N-best paths is determinized. As the input automaton is determinized, a potential for each new state of the partially determinized automaton is determined and is used in identifying the N-best paths of the determinized automaton, which correspond exactly to the N-best strings of the input automaton.
Abstract: In many real applications, such as voice control in vehicles, there is the problem that the users change relatively frequently. The question then arises: which is the correct data set for the current user? The invention provides a process making it possible, automatically and for the duration of operation of the system, to recognize whether the speaker changes, or which (speaker-dependent) data set is correct for the actual user. This task is solved by a speech recognition system based on a so-called Semi-Continuous Hidden Markov Model (SCHMM). Codebooks are produced, a normal distribution is represented, and speaker-specific data sets are stored in addition to a so-called baseline data set; the inventive speech recognition system correlates the speech signal by means of vector quantization with the speaker-independent and the speaker-dependent codebooks, making it possible to ascertain the identity of the speaker.
Type:
Application
Filed:
March 3, 2003
Publication date:
October 2, 2003
Inventors:
Fritz Class, Udo Haiber, Alfred Kaltenmeier
Abstract: A speech recognition method and system utilize an acoustic model that is capable of providing probabilities for both a large acoustic unit and an acoustic sub-unit. Each of these probabilities describes the likelihood of a set of feature vectors from a series of feature vectors representing a speech signal. The large acoustic unit is formed from a plurality of acoustic sub-units. At least one sub-unit probability and at least one large-unit probability from the acoustic model are used by a decoder to generate a score for a sequence of hypothesized words. When combined, the acoustic sub-units associated with all of the sub-unit probabilities used to determine the score span fewer than all of the feature vectors in the series of feature vectors. An overlapping decoding technique is also provided.
Abstract: A model generation unit (17) is provided. The model generation unit includes an alignment module (80) arranged to receive pairs of sequences of parameter frame vectors from a buffer (16) and to perform dynamic time warping of the parameter frame vectors to align corresponding parts of the pair of utterances. A consistency checking module (82) is provided to determine whether the aligned parameter frame vectors correspond to the same word. If this is the case, the aligned parameter frame vectors are passed to a clustering module (84) which groups the parameter frame vectors into a number of clusters. Whilst clustering the parameter frame vectors, the clustering module (84) determines for each grouping an objective function calculating the best fit of a model to the clusters per degrees of freedom of that model.
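The dynamic time warping step used by the alignment module above can be sketched in its textbook form. This plain 1-D version is for illustration; a real system would align parameter frame vectors using a vector distance:

```python
import math

def dtw(a, b, dist=lambda x, y: abs(x - y)):
    """Minimal dynamic time warping cost between two sequences."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # local distance plus the cheapest of the three allowed moves
            D[i][j] = dist(a[i - 1], b[j - 1]) + min(
                D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

A repeated frame in one utterance aligns to a single frame in the other at no extra cost, which is exactly what makes DTW robust to differences in speaking rate.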
Abstract: A speech recognition system is disclosed including a model generation unit (20) and a speech recognition unit (22). When signals are received from a microphone (7) the model generation unit (20) utilises the signals to generate hidden Markov models that are stored in a hidden Markov model database (24). Subsequently, when utterances are to be recognised, the speech recognition unit (22) utilises the stored hidden Markov models to associate an utterance with a word. When a new hidden Markov model is generated by the model generation unit (20) the new hidden Markov model is processed by a confusability checker (26) against the hidden Markov models already stored in the database (24). A value indicative of the likelihood of utterances corresponding to the new model being confused with previously stored models is determined by the confusability checker (26) directly from the parameters for the new hidden Markov model and the other hidden Markov models stored in the database (24).
Abstract: An automatic speech recognition system for the condition in which an incoming caller's speech is quiet and a resulting echo (of a loudly playing prompt) can cause the residual (the portion of the echo remaining even after echo cancellation) to be of the magnitude of the incoming speech input. Such loud echoes can falsely trigger the speech recognition system and interfere with the recognition of valid input speech. An echo model has proven to alleviate this fairly common problem and to be effective in eliminating such false triggering. Further, this automatic speech recognition system's enhanced recognition of valid speech was provided within an existing hidden Markov modeling framework.
Type:
Grant
Filed:
August 31, 2000
Date of Patent:
August 12, 2003
Assignee:
Lucent Technologies Inc.
Inventors:
Rathinavelu Chengalvarayan, Richard Harry Ketchum, Anand Rangaswamy Setlur, David Lynn Thomson
Abstract: Embodiments of the present invention provide a spoken language interface to an information database. A grammars database based on the entries contained in the information database may be generated. The entries in the grammars database may be a compact representation of the entries in the information database. An index database based on the entries contained in the information database may be generated. The grammars database and the index database may be updated periodically based on updated entries contained in the information database. A recognized result of a user's communication based on the updated grammars database may be generated. The updated index database may be searched for a list of matching entries that match the recognized result. The list of matching entries may be output.
Type:
Application
Filed:
December 31, 2002
Publication date:
August 7, 2003
Inventors:
Esther Levin, Susan Boyce, Brian Helfrich, Yevgeniy Lyudovyk, Robert Burke, Ilija Zeljkovic
Abstract: A maximum a posteriori (MAP) processor employs a block processing technique for the MAP algorithm to provide a parallel architecture that allows for multiple-word memory read/write processing and voltage scaling of a given circuit implementation. The block processing technique forms a merged trellis with states having modified branch inputs to provide the parallel structure. When block processing occurs, the trellis may be modified to show transitions from the oldest state at time k−N to the present state at time k. For the merged trellis, the number of states remains the same, but each state receives 2^N input transitions instead of two. Branch metrics associated with the transitions in the merged trellis are cumulative, and are employed for the update process of forward and backward probabilities by the MAP algorithm.
Type:
Application
Filed:
January 22, 2002
Publication date:
July 24, 2003
Inventors:
Thaddeus J. Gabara, Inkyu Lee, Maria Luisa Lopez-Vallejo, Syed Mujtaba
Abstract: In a text recognition system, the computational efficiency of a text line image decoding operation is improved by utilizing the characteristic of a graph known as the cut set. The branches of the data structure that represents the image are initially labeled with estimated scores. When estimated scores are used, the decoding operation must perform iteratively on a text line before producing the best path through the data structure. After each iteration, nodes in the best path are re-scored with actual scores. The decoding operation incorporates an operating mode called skip mode.
Type:
Grant
Filed:
May 12, 2000
Date of Patent:
July 15, 2003
Inventors:
Thomas P. Minka, Dan S. Bloomberg, Ashok C. Popat
Abstract: An apparatus for voice-activated control of an electrical device comprises a receiving arrangement for receiving audio data generated by a user. A voice recognition arrangement is provided for determining whether the received audio data is a command word for controlling the electrical device. The voice recognition arrangement includes a microprocessor for comparing the received audio data with voice recognition data previously stored in the voice recognition arrangement. The voice recognition arrangement generates at least one control signal based on the comparison when the comparison reaches a predetermined threshold value. A power control controls the power delivered to the electrical device. The power control is responsive to the at least one control signal generated by the voice recognition arrangement for operating the electrical device in response to the at least one audio command generated by the user.
Abstract: The present invention is a method and apparatus to determine a similarity measure between first and second patterns. First and second storages store first and second feature vectors which represent the first and second patterns, respectively. A similarity estimator is coupled to the first and second storages to compute a similarity probability of the first and second feature vectors using a piecewise linear probability density function (PDF). The similarity probability corresponds to the similarity measure.
Abstract: A method of signal modelling comprises inputting to a statistical signal modelling system the output of a deterministic modelling system to thereby effect a reduction in the overall computational overhead.
Abstract: A keyword recognition system for speaker dependent, dynamic time warping (DTW) recognition systems uses all of the trained word templates in the system, (keyword and vocabulary), to determine if an utterance is a keyword utterance or not. The utterance is selected as the keyword if a keyword score indicates a significant match to the keyword template and if the keyword score indicates a better match than do the entirety of scores to the vocabulary word templates.
Abstract: Unweighted finite state automata may be used in speech recognition systems, but considerably reduce the speed and accuracy of the speech recognition system. Unfortunately, developing a suitable training corpus for a speech recognition task is time consuming and expensive, if it is even possible. Additionally, it is unlikely that a training corpus could adequately reflect the various probabilities for the word and/or phoneme combinations. Accordingly, such very-large-vocabulary speech recognition systems often must be used in an unweighted state. The directed graph optimizing systems and methods determine the shortest distances between source and end nodes of a weighted directed graph. These various directed graph optimizing systems and methods also reweight the directed graph based on the determined shortest distances, so that the weights are, for example, front weighted. Accordingly, searches through the directed graph that are based on the total weights of the paths taken will be more efficient.
Abstract: The invention relates to pre-processing of a pronunciation dictionary for compression in a data processing device, the pronunciation dictionary comprising at least one entry, the entry comprising a sequence of character units and a sequence of phoneme units. According to one aspect of the invention the sequence of character units and the sequence of phoneme units are aligned using a statistical algorithm. The aligned sequence of character units and aligned sequence of phoneme units are interleaved by inserting each phoneme unit at a predetermined location relative to the corresponding character unit.
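The interleaving step described above can be sketched for an aligned grapheme/phoneme pair. The `"_"` epsilon marker for characters with no aligned phoneme is our assumption, not notation from the patent:

```python
def interleave(chars, phonemes):
    """Interleave an aligned character sequence with its phoneme sequence,
    inserting each phoneme immediately after its corresponding character."""
    assert len(chars) == len(phonemes), "sequences must be aligned first"
    out = []
    for c, p in zip(chars, phonemes):
        out.append(c)
        if p != "_":          # skip epsilon (no phoneme aligned to this char)
            out.append(p)
    return "".join(out)
```

Interleaving places each phoneme at a predictable offset from its character, which is what makes the combined string compress better than the two sequences stored separately.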
Abstract: A large vocabulary speech recognizer including a combined weighted network of transducers reflecting fully expanded context-dependent modeling of pronunciations and language that can be used with a single-pass Viterbi or other decoder based on sequences of labels provided by feature analysis of input speech.
Abstract: A method and system of performing confidence measurement in a speech recognition system includes receiving an utterance of input speech and creating a near-miss pattern, or near-miss list, of possible word entries for the utterance. Each word entry includes an associated value of the probability that the utterance corresponds to the word entry. The near-miss list of possible word entries is compared with corresponding stored near-miss confidence templates. Each word in the vocabulary (or keyword list) has a near-miss confidence template, which includes a list of word entries, and each word entry in each list includes an associated value. The confidence measure for a particular hypothesis word is performed based on the comparison of the values in the near-miss list of possible word entries with the values of the corresponding near-miss confidence template.
Type:
Grant
Filed:
November 13, 1998
Date of Patent:
May 27, 2003
Assignee:
Microsoft Corporation
Inventors:
Hsiao-Wuen Hon, Asela J. R. Gunawardana
Abstract: An information processing apparatus inputs a document having a plurality of input items, and displays it using an information display unit. An active input item is discriminated from the plurality of input items in accordance with the display state of the document. A specific grammar corresponding to the discriminated active input item is selected from a grammar holding unit for holding a plurality of types of grammars, and the selected grammar is used in a speech recognition process.
Abstract: An HMM-based text-to-phoneme parser uses probability information within a probability database to generate one or more phoneme strings for a written input word. Techniques for training the text-to-phoneme parser are provided.
Abstract: A speech recognition system is trained to be sensitive not only to the actual spoken text, but also to the manner in which the text is spoken, for example, whether something is said confidently, or hesitatingly. In the preferred embodiment, this is achieved by using a Hidden Markov Model (HMM) as the recognition engine, and training the HMM to recognise different styles of input. This approach finds particular application in the telephony voice processing environment, where short caller responses need to be recognised, and the system can then react in a fashion appropriate to the tone or manner in which the caller has spoken.
Type:
Application
Filed:
December 20, 2002
Publication date:
May 8, 2003
Applicant:
International Business Machines Corporation
Abstract: A low-complexity speaker verification system that employs universal cohort models and automatic score thresholding. The universal cohort models are generated using a simplified cohort model generating scheme. In certain embodiments of the invention, a simplified hidden Markov modeling (HMM) scheme is used to generate the cohort models. In addition, the low-complexity speaker verification system is trained by its various users. The total number of users may be modified over time as required by the specific application, and the universal cohort models may be updated accordingly to accommodate the new users. The present invention employs a combination of universal cohort modeling and thresholding to ensure high performance.
Abstract: A method and apparatus is provided for using multiple feature streams in speech recognition. In the method and apparatus, a feature extractor generates at least two feature vectors for a segment of an input signal. A decoder then generates a path score that is indicative of the probability that a word is represented by the input signal. The path score is generated by selecting the best feature vector to use for each segment. For each segment, the corresponding part in the path score for that segment is based in part on a chosen segment score that is selected from a group of at least two segment scores. The segment scores each represent a separate probability that a particular segment unit (e.g. senone, phoneme, diphone, triphone, or word) appears in that segment of the input signal. Although each segment score in the group relates to the same segment unit, the scores are based on different feature vectors for the segment.
Abstract: In speech recognition based on an HMM, in which recognition is performed by carrying out vector quantization and obtaining an output probability by table reference, the amount of computation and the use of memory are minimized while a high recognition ability is achieved.
Abstract: Briefly, in accordance with one embodiment of the invention, a recognition system may modify speech models using noise in an input signal that is received prior to a speech sample in the input sample.
Abstract: The improved noise adaptation technique applies a linear or non-linear transformation to the set of Jacobian matrices corresponding to an initial noise condition. An α-adaptation parameter or artificial intelligence operation is employed in a linear or non-linear way to increase the adaptation bias added to the speech models. This corrects shortcomings of conventional Jacobian adaptation, which tends to underestimate the effect of noise. The improved adaptation technique is further enhanced by a reduced-dimensionality principal component analysis technique that reduces the computational burden, making the adaptation technique beneficial in embedded recognition systems.
Type:
Grant
Filed:
April 18, 2000
Date of Patent:
March 4, 2003
Assignee:
Matsushita Electric Industrial Co., Ltd.
Inventors:
Christophe Cerisara, Luca Rigazio, Robert Boman, Jean-Claude Junqua
Abstract: A processor-based system may utilize a remote control unit which not only allows mouse input commands to be provided to the processor-based system but also includes a microphone and a speech engine for decoding spoken commands and providing code for presenting the commands to the processor-based unit. The processor-based system may provide information to the remote control unit about the vocabulary currently being used by applications active on the processor-based system. This allows the speech engine in the remote control unit to focus on a more limited vocabulary, increasing the accuracy of the speech recognition function and decreasing the capabilities necessary in the remote control unit based speech engine.
Abstract: The discriminative clustering technique tests a provided set of Gaussian distributions corresponding to an acoustic vector space. A distance metric, such as the Bhattacharyya distance, is used to assess which distributions are sufficiently proximal to be merged into a new distribution. Merging is accomplished by computing the centroid of the new distribution by minimizing the Bhattacharyya distance between the parameters of the Gaussian distributions being merged.
Type:
Grant
Filed:
November 29, 1999
Date of Patent:
February 25, 2003
Assignee:
Matsushita Electric Industrial Co., Ltd.
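The Bhattacharyya distance between two Gaussians, used by the discriminative clustering abstract above to decide which distributions to merge, follows a standard formula; the count-weighted, moment-matched merge shown here is one common choice, not necessarily the patented centroid computation:

```python
import numpy as np

def bhattacharyya(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two multivariate Gaussians."""
    cov = 0.5 * (cov1 + cov2)
    diff = mu1 - mu2
    term1 = 0.125 * diff @ np.linalg.solve(cov, diff)
    term2 = 0.5 * np.log(np.linalg.det(cov) /
                         np.sqrt(np.linalg.det(cov1) * np.linalg.det(cov2)))
    return term1 + term2

def merge(mu1, cov1, n1, mu2, cov2, n2):
    """Moment-matched centroid of two Gaussians weighted by their counts."""
    n = n1 + n2
    mu = (n1 * mu1 + n2 * mu2) / n
    cov = (n1 * (cov1 + np.outer(mu1 - mu, mu1 - mu)) +
           n2 * (cov2 + np.outer(mu2 - mu, mu2 - mu))) / n
    return mu, cov
```

Two identical Gaussians are at distance zero, so any threshold on the distance naturally marks them as mergeable.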
Abstract: A character recognition system is disclosed. In a feature extraction parameter storage section 22, a transformation matrix for reducing the number of dimensions of the feature parameters and a codebook for quantization are stored. In an HMM storage section 23, the constitution and parameters of a Hidden Markov Model (HMM) for character string expression are stored. A feature extraction section 32 scans a word image given from an image storage means from left to right in a predetermined cycle with a slit having a width sufficiently smaller than the character width, and thus outputs a feature symbol at each predetermined timing. A matching section 33 matches the feature symbol row with a probability-maximization HMM state, thereby recognizing the character string.
Abstract: A huge vocabulary speech recognition system for recognizing a sequence of spoken words, having an input means for receiving a time-sequential input pattern representative of the sequence of spoken words. The system further includes a plurality of large vocabulary speech recognizers each being associated with a respective, different large vocabulary recognition model. Each of the recognition models is targeted to a specific part of the huge vocabulary. The system comprises a controller operative to direct the input pattern to a plurality of the speech recognizers and to select a recognized word sequence from the word sequences recognized by the plurality of speech recognizers.
Type:
Grant
Filed:
August 9, 1999
Date of Patent:
February 25, 2003
Assignee:
Koninklijke Philips Electronics N.V.
Inventors:
Eric Thelen, Stefan Besling, Meinhard Ullrich