Markov Patents (Class 704/256)
  • Publication number: 20040044531
    Abstract: The invention provides a method of speech recognition comprising the steps of receiving a signal comprising one or more spoken words, extracting a spoken word from the signal using a Hidden Markov Model, passing the spoken word to a plurality of word models, one or more of the word models based on a Hidden Markov Model, determining the word model most likely to represent the spoken word, and outputting the word model representing the spoken word. The invention also provides a related speech recognition system and a speech recognition computer program.
    Type: Application
    Filed: September 5, 2003
    Publication date: March 4, 2004
    Inventors: Nikola Kirilov Kasabov, Waleed Habib Abdulla
  • Patent number: 6697778
    Abstract: Client speaker locations in a speaker space are used to generate speech models for comparison with test speaker data or test speaker speech models. The speaker space can be constructed using training speakers that are entirely separate from the population of client speakers, or from client speakers, or from a mix of training and client speakers. Reestimation of the speaker space based on client environment information is also provided to improve the likelihood that the client data will fall within the speaker space. During enrollment of the clients into the speaker space, additional client speech can be obtained when predetermined conditions are met. The speaker distribution can also be used in the client enrollment step.
    Type: Grant
    Filed: July 5, 2000
    Date of Patent: February 24, 2004
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Roland Kuhn, Olivier Thyes, Patrick Nguyen, Jean-Claude Junqua, Robert Boman
  • Patent number: 6691090
    Abstract: A method for use in a speech recognition system in which a speech waveform to be modelled is represented by a set of feature extracted parameters in the time domain, the method comprising dividing individual ones of one or more of said feature extracted parameters to provide for each divided feature extracted parameter a plurality of frequency channels, and demodulating at least one of the plurality of frequency channels to provide at least one corresponding baseband frequency signal.
    Type: Grant
    Filed: October 24, 2000
    Date of Patent: February 10, 2004
    Assignee: Nokia Mobile Phones Limited
    Inventors: Kari Laurila, Jilei Tian
  • Publication number: 20040024599
    Abstract: A technique for audio searches by statistical pattern matching is disclosed. The audio to be located is processed for feature extraction and decoded using a maximum likelihood (“ML”) search. A left-right Hidden Markov Model (“HMM”) is constructed from the ML state sequence. Transition probabilities are defined as normalized state occupancies from the most likely state sequence of the decoding operation. Utterance duration is measured from the search sample. Other model parameters are gleaned from an acoustic model. A ML search of an audio corpus is conducted with respect to the HMM and a garbage model. New start states are added at each frame. Low scoring and long state sequences (with respect to the search sample duration) are discarded at each frame. Locations where scores of the new model are higher than those of the garbage model are marked as potential matches. The highest scoring matches are presented as results.
    Type: Application
    Filed: July 31, 2002
    Publication date: February 5, 2004
    Applicant: Intel Corporation
    Inventor: Michael E. Deisher
  • Publication number: 20040015358
    Abstract: A differential compression technique is disclosed for compressing individual speaker models, such as Gaussian mixture models, by computing a delta model from the difference between an individual speaker model and a baseline model. Further compression may be applied to the delta model to reduce the large storage requirements generally attributed to speaker models.
    Type: Application
    Filed: January 2, 2003
    Publication date: January 22, 2004
    Applicant: Massachusetts Institute of Technology
    Inventor: Douglas A. Reynolds
  • Patent number: 6681207
    Abstract: A method and system that improves voice recognition by improving storage of voice recognition (VR) templates. The improved storage means that more VR models can be stored in memory. The more VR models that are stored in memory, the more robust the VR system and therefore the more accurate the VR system. Lossy compression techniques are used to compress VR models. In one embodiment, A-law compression and A-law expansion are used to compress and expand VR models. In another embodiment, Mu-law compression and Mu-law expansion are used to compress and expand VR models. VR models are compressed during a training process and they are expanded during voice recognition.
    Type: Grant
    Filed: January 12, 2001
    Date of Patent: January 20, 2004
    Assignee: Qualcomm Incorporated
    Inventor: Harinath Garudadri
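
    Illustration: the Mu-law embodiment described above can be sketched as follows. This is a minimal sketch, not the patented method. It assumes model parameters are already normalized to [-1, 1], uses the telephony constant mu = 255 and 8-bit codes (neither is stated in the abstract), and all function names are hypothetical.

```python
import math

MU = 255.0  # assumed companding constant; the abstract does not specify one


def mu_law_compress(x, mu=MU):
    """Map x in [-1, 1] to [-1, 1] with logarithmic (Mu-law) companding."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def compress_model(params, bits=8):
    """Compand each parameter, then quantize it to a signed integer code (lossy)."""
    levels = 2 ** (bits - 1) - 1
    return [round(mu_law_compress(p) * levels) for p in params]


def expand_model(codes, bits=8):
    """Recover approximate parameters from the integer codes."""
    levels = 2 ** (bits - 1) - 1
    return [mu_law_expand(c / levels) for c in codes]


# Hypothetical template values: compress at training time, expand at recognition time.
template = [-1.0, -0.25, 0.0, 0.01, 0.5, 1.0]
codes = compress_model(template)
restored = expand_model(codes)
```

    Each parameter is stored as one small integer, and the logarithmic curve spends more of the code range on values near zero, so small parameters keep more relative precision than uniform quantization would give them.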
  • Publication number: 20040002863
    Abstract: An arrangement is provided for embedded coupled hidden Markov model. To train an embedded coupled hidden Markov model, training data is first segmented into uniform segments at different layers of the embedded coupled hidden Markov model. At each layer, a uniform segment corresponds to a state of a coupled hidden Markov model at that layer. An optimal segmentation is generated at the lower layer based on the uniform segmentation and is then used to update parameters of models associated with the states of coupled hidden Markov models at lower layer. The updated model parameters at the lower layer are then used to update the model parameters associated with states at the super layer.
    Type: Application
    Filed: June 27, 2002
    Publication date: January 1, 2004
    Applicant: Intel Corporation
    Inventor: Ara V. Nefian
  • Patent number: 6671669
    Abstract: A method and system that combines voice recognition engines and resolves any differences between the results of individual voice recognition engines. A speaker independent (SI) Hidden Markov Model (HMM) engine, a speaker independent Dynamic Time Warping (DTW-SI) engine and a speaker dependent Dynamic Time Warping (DTW-SD) engine are combined. Combining and resolving the results of these engines results in a system with better recognition accuracy and lower rejection rates than using the results of only one engine.
    Type: Grant
    Filed: July 18, 2000
    Date of Patent: December 30, 2003
    Assignee: Qualcomm Incorporated
    Inventors: Harinath Garudadri, David Puig Oses, Ning Bi, Yingyong Qi
  • Publication number: 20030233233
    Abstract: Methods and systems for recognizing speech include receiving information reflecting the speech, determining at least one broad-class of the received information, classifying the received information based on the determined broad-class, selecting a model based on the classification of the received information, and recognizing the speech using the selected model and the received information.
    Type: Application
    Filed: June 13, 2002
    Publication date: December 18, 2003
    Applicant: INDUSTRIAL TECHNOLOGY RESEARCH INSTITUTE
    Inventor: Wei-Tyng Hong
  • Publication number: 20030233234
    Abstract: Audio coding processes like quantization can cause spectral components of an encoded audio signal to be set to zero, creating spectral holes in the signal. These spectral holes can degrade the perceived quality of audio signals that are reproduced by audio coding systems. An improved decoder avoids or reduces the degradation by filling the spectral holes with synthesized spectral components. An improved encoder may also be used to realize further improvements in the decoder.
    Type: Application
    Filed: June 17, 2002
    Publication date: December 18, 2003
    Inventors: Michael Mead Truman, Grant Allen Davidson, Matthew Conrad Fellers, Mark Stuart Vinton, Matthew Aubrey Watson, Charles Quito Robinson
  • Patent number: 6662160
    Abstract: An adaptive speech recognition method with noise compensation is disclosed. In speech recognition, optimal equalization factors for feature vectors of a plurality of speech frames corresponding to each probability density function in a speech model are determined based on the plurality of speech frames of the input speech and the speech model. The parameters of the speech model are adapted by the optimal equalization factor and a bias compensation vector, which is corresponding to and retrieved by the optimal equalization factor. The optimal equalization factor is provided to adjust a distance of the mean vector in the speech model. The bias compensation vector is provided to adjust a direction change of the mean vector in the speech model.
    Type: Grant
    Filed: October 26, 2000
    Date of Patent: December 9, 2003
    Assignee: Industrial Technology Research Inst.
    Inventors: Jen-Tzung Chien, Kuo-Kuan Wu, Po-Cheng Chen
  • Patent number: 6662158
    Abstract: A method and apparatus is provided for identifying patterns from a series of feature vectors representing a time-varying signal. The method and apparatus use both a frame-based model and a segment model in a unified framework. The frame-based model determines the probability of an individual feature vector given a frame state. The segment model determines the probability of sub-sequences of feature vectors given a single segment state. The probabilities from the frame-based model and the segment model are then combined to form a single path score that is indicative of the probability of a sequence of patterns. Another aspect of the invention is the use of a frame-based model and a segment model to segment feature vectors during model training. Under this aspect of the invention, the frame-based model and the segment model are used together to identify probabilities associated with different segmentations.
    Type: Grant
    Filed: April 27, 2000
    Date of Patent: December 9, 2003
    Assignee: Microsoft Corporation
    Inventors: Hsiao-Wuen Hon, Kuansan Wang
  • Publication number: 20030225581
    Abstract: Speech recognition is performed by matching between a characteristic quantity of an inputted speech and a composite HMM obtained by synthesizing a speech HMM (hidden Markov model) and a noise HMM for each speech frame of the inputted speech by use of the composite HMM.
    Type: Application
    Filed: March 14, 2003
    Publication date: December 4, 2003
    Applicant: International Business Machines Corporation
    Inventors: Tetsuya Takiguchi, Masafumi Nishimura
  • Patent number: 6658385
    Abstract: An improved transformation method uses an initial set of Hidden Markov Models (HMMs) trained on a large amount of speech recorded in a low noise environment R to provide rich information on co-articulation and speaker variation and a smaller database in a more noisy target environment T. A set H of HMMs is trained with data provided in the low noise environment R and the utterances in the noisy environment T are transcribed phonetically using set H of HMMs. The transcribed segments are grouped into a set of Classes C. For each subclass c of Classes C, the transformation Φc is found to maximize the likelihood of the utterances in T, given H. The HMMs are transformed and steps repeated until likelihood stabilizes.
    Type: Grant
    Filed: February 10, 2000
    Date of Patent: December 2, 2003
    Assignee: Texas Instruments Incorporated
    Inventors: Yifan Gong, John J. Godfrey
  • Publication number: 20030220791
    Abstract: A true/false judgment on a result of speech recognition is made with high accuracy using a smaller volume of processing. By comparing acoustic models HMMsb against the feature vector sequence V(n) of utterances, a recognition result RCG specifying the acoustic model HMMsb having the maximum likelihood, a first score FSCR indicating the value of the maximum likelihood, and a second score SSCR indicating the value of the second highest likelihood are found. Then, by comparing an evaluation value FSCR×(FSCR−SSCR) based on the first score FSCR and the second score SSCR with a pre-set threshold value THD, a true/false judgment on the recognition result RCG is made. When the recognition result RCG is judged as being true, speaker adaptation is applied to the acoustic models HMMsb, and when the recognition result RCG is judged as being false, speaker adaptation is not applied to the acoustic models HMMsb. It is thus possible to improve the accuracy of speaker adaptation.
    Type: Application
    Filed: April 25, 2003
    Publication date: November 27, 2003
    Applicant: Pioneer Corporation
    Inventor: Soichi Toyama
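
    Illustration: the judgment described above reduces to one comparison. The sketch below is hypothetical. It assumes the result counts as true when the evaluation value exceeds THD (the abstract only says the two are compared) and treats the scores as plain numbers; the function names and the `adapt` callback are placeholders, not part of the publication.

```python
def evaluation_value(fscr, sscr):
    """FSCR x (FSCR - SSCR): large when the best score is high and well separated."""
    return fscr * (fscr - sscr)


def judge_recognition(fscr, sscr, thd):
    """Return True when the recognition result is judged true.

    Assumption: 'true' means the evaluation value is strictly above THD.
    """
    return evaluation_value(fscr, sscr) > thd


def maybe_adapt(fscr, sscr, thd, adapt):
    """Run speaker adaptation only for results judged true; report whether it ran."""
    if judge_recognition(fscr, sscr, thd):
        adapt()
        return True
    return False
```

    With a first score of 0.9 and a threshold of 0.5, a runner-up at 0.2 gives an evaluation value of about 0.63 and passes, while a runner-up at 0.85 gives about 0.045 and is rejected, so adaptation is skipped for the ambiguous result.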
  • Publication number: 20030220792
    Abstract: A speech recognition device comprises an HMM model database which prestores keyword HMMs which represent feature patterns of keywords to be recognized, likelihood calculator which calculates the likelihood of an extracted feature value of a speech signal in each frame by comparing it with keyword HMMs and designated-speech HMMs, extraneous-speech likelihood setting device which sets extraneous-speech likelihood based on the calculated likelihood of a match with the designated-speech HMMs, matching processor which performs a matching process based on the calculated likelihood and the extraneous-speech likelihood, and determining device which determines the keywords contained in the spontaneous speech based on the matching process.
    Type: Application
    Filed: May 19, 2003
    Publication date: November 27, 2003
    Applicant: Pioneer Corporation
    Inventors: Hajime Kobayashi, Soichi Toyama
  • Publication number: 20030212557
    Abstract: A speech recognition method includes use of synchronous or asynchronous audio and video data to enhance speech recognition probabilities. A two stream coupled hidden Markov model is trained and used to identify speech. At least one stream is derived from audio data and a second stream is derived from mouth pattern data. Gestural or other suitable data streams can optionally be combined to reduce speech recognition error rates in noisy environments.
    Type: Application
    Filed: May 9, 2002
    Publication date: November 13, 2003
    Inventor: Ara V. Nefian
  • Publication number: 20030212556
    Abstract: A speech recognition method includes use of synchronous or asynchronous audio and video data to enhance speech recognition probabilities. A two stream factorial hidden Markov model is trained and used to identify speech. At least one stream is derived from audio data and a second stream is derived from mouth pattern data. Gestural or other suitable data streams can optionally be combined to reduce speech recognition error rates in noisy environments.
    Type: Application
    Filed: May 9, 2002
    Publication date: November 13, 2003
    Inventor: Ara V. Nefian
  • Publication number: 20030200086
    Abstract: A speech recognition apparatus comprises a speech analyzer which extracts feature patterns of spontaneous speech divided into frames; a keyword model database which prestores keyword which represent feature patterns of a plurality of keywords to be recognized; a garbage model database which prestores feature patterns of components of extraneous speech to be identified; and a first likelihood calculator which calculates likelihood of feature values based on feature values patterns of each frame and keywords; a second likelihood calculator which calculates likelihood of feature values based on feature values patterns of each frame and extraneous speech. The device recognizes keywords contained in the spontaneous speech by calculating cumulative likelihood based on the calculated likelihood adding a predetermined correction value in the second likelihood calculator.
    Type: Application
    Filed: April 15, 2003
    Publication date: October 23, 2003
    Applicant: PIONEER CORPORATION
    Inventors: Yoshihiro Kawazoe, Hajime Kobayashi
  • Patent number: 6633845
    Abstract: The invention provides a method and apparatus for automatically generating a summary or key phrase for a song. The song, or a portion thereof, is digitized and converted into a sequence of feature vectors, such as mel-frequency cepstral coefficients (MFCCs). The feature vectors are then processed in order to decipher the song's structure. Those sections that correspond to different structural elements are then marked with corresponding labels. Once the song is labeled, various heuristics are applied to select a key phrase corresponding to the song's summary. For example, the system may identify the label that appears most frequently within the song, and then select the longest duration of that label as the summary.
    Type: Grant
    Filed: April 7, 2000
    Date of Patent: October 14, 2003
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Beth Teresa Logan, Stephen Mingyu Chu
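
    Illustration: the example heuristic in the abstract (most frequent label, then its longest continuous stretch) can be sketched as below. The sketch assumes the song has already been reduced to one structural label per frame; the function name and the frame-index return format are hypothetical.

```python
from collections import Counter
from itertools import groupby


def select_key_phrase(labels):
    """Return (label, start, end) for the longest run of the most frequent label.

    `labels` holds one structural label per frame; `end` is exclusive.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    most_common_label = Counter(labels).most_common(1)[0][0]

    best = None
    position = 0
    for label, run in groupby(labels):
        length = len(list(run))
        is_longer = best is None or length > best[2] - best[1]
        if label == most_common_label and is_longer:
            best = (label, position, position + length)
        position += length
    return best


# Hypothetical labelling: verse "A", chorus "B", outro "C".
frames = list("AABBBBAAABBBBBBCC")
print(select_key_phrase(frames))  # ('B', 9, 15)
```

    Label "B" covers 10 of the 17 frames, and its second run (frames 9 to 14) is the longer of its two runs, so that stretch is chosen as the summary.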
  • Patent number: 6633842
    Abstract: An estimate of a clean speech vector, typically Mel-Frequency Cepstral Coefficient (MFCC), given its noisy observation is provided. The method makes use of two Gaussian mixtures. The first one is trained on clean speech and the second is derived from the first one using some noise samples. The method gives an estimate of a clean speech feature vector as the conditional expectancy of clean speech given an observed noisy vector.
    Type: Grant
    Filed: September 21, 2000
    Date of Patent: October 14, 2003
    Assignee: Texas Instruments Incorporated
    Inventor: Yifan Gong
  • Publication number: 20030187644
    Abstract: Systems and methods for identifying the N-best strings of a weighted automaton. A potential for each state of an input automaton to a set of destination states of the input automaton is first determined. Then, the N-best paths are found in the result of an on-the-fly determinization of the input automaton. Only the portion of the input automaton needed to identify the N-best paths is determinized. As the input automaton is determinized, a potential for each new state of the partially determinized automaton is determined and is used in identifying the N-best paths of the determinized automaton, which correspond exactly to the N-best strings of the input automaton.
    Type: Application
    Filed: November 21, 2002
    Publication date: October 2, 2003
    Inventors: Mehryar Mohri, Michael Dennis Riley
  • Publication number: 20030187645
    Abstract: In many real applications such as voice control in vehicles there is the problem that the users change relatively frequently. Then the question arises: which is the correct data set for the current user? The invention provides a process making it possible automatically for the duration of operation of the system to recognize whether the speaker changes, or which (speaker dependent) data set is correct for the actual user. This task is solved by a speech recognition system which is based on a so-called Semi-Continuous Hidden Markov Model (SCHMM). Codebooks are produced, normal distribution is represented, speaker-specific data sets are stored in addition to a so-called base-line data set, and the inventive speech recognition system correlates the speech signal by means of vector quantization with the speaker-independent and the speaker-dependent codebooks, making it possible to ascertain the identity of the speaker.
    Type: Application
    Filed: March 3, 2003
    Publication date: October 2, 2003
    Inventors: Fritz Class, Udo Haiber, Alfred Kaltenmeier
  • Patent number: 6629073
    Abstract: A speech recognition method and system utilize an acoustic model that is capable of providing probabilities for both a large acoustic unit and an acoustic sub-unit. Each of these probabilities describes the likelihood of a set of feature vectors from a series of feature vectors representing a speech signal. The large acoustic unit is formed from a plurality of acoustic sub-units. At least one sub-unit probability and at least one large unit probability from the acoustic model are used by a decoder to generate a score for a sequence of hypothesized words. When combined, the acoustic sub-units associated with all of the sub-unit probabilities used to determine the score span fewer than all of the feature vectors in the series of feature vectors. An overlapping decoding technique is also provided.
    Type: Grant
    Filed: April 27, 2000
    Date of Patent: September 30, 2003
    Assignee: Microsoft Corporation
    Inventors: Hsiao-Wuen Hon, Kuansan Wang
  • Publication number: 20030163313
    Abstract: A model generation unit (17) is provided. The model generation unit includes an alignment module (80) arranged to receive pairs of sequences of parameter frame vectors from a buffer (16) and to perform dynamic time warping of the parameter frame vectors to align corresponding parts of the pair of utterances. A consistency checking module (82) is provided to determine whether the aligned parameter frame vectors correspond to the same word. If this is the case the aligned parameter frame vectors are passed to a clustering module (84) which groups the parameter frame vectors into a number of clusters. Whilst clustering the parameter frame vectors, the clustering module (84) determines for each grouping an objective function calculating the best fit of a model to the clusters per degrees of freedom of that model.
    Type: Application
    Filed: November 6, 2002
    Publication date: August 28, 2003
    Applicant: Canon Kabushiki Kaisha
    Inventor: David Llewellyn Rees
  • Publication number: 20030163312
    Abstract: A speech recognition system is disclosed including a model generation unit (20) and a speech recognition unit (22). When signals are received from a microphone (7) the model generation unit (20) utilises the signals to generate hidden Markov models that are stored in a hidden Markov model database (24). Subsequently, when utterances are to be recognised, the speech recognition unit (22) utilises the stored hidden Markov models to associate an utterance with a word. When a new hidden Markov model is generated by the model generation unit (20) the new hidden Markov model is processed by a confusability checker (26) against the hidden Markov models already stored in the database (24). A value indicative of the likelihood of utterances corresponding to the new model being confused with previously stored models is determined by the confusability checker (26) directly from the parameters for the new hidden Markov model and the other hidden Markov models stored in the database (24).
    Type: Application
    Filed: November 6, 2002
    Publication date: August 28, 2003
    Applicant: Canon Kabushiki Kaisha
    Inventor: Andrea Sorrentino
  • Patent number: 6606595
    Abstract: An automatic speech recognition system for the condition that an incoming caller's speech is quiet and a resulting echo (of a loud playing prompt) can cause the residual (the portion of the echo remaining after even echo cancellation) to be of the magnitude of the incoming speech input. Such loud echoes can falsely trigger the speech recognition system and interfere with the recognition of valid input speech. An echo model has been proven to alleviate this fairly common problem and to be effective in eliminating such false triggering. Further, this enhancement of the recognition of valid speech was provided within an existing hidden Markov modeling framework.
    Type: Grant
    Filed: August 31, 2000
    Date of Patent: August 12, 2003
    Assignee: Lucent Technologies Inc.
    Inventors: Rathinavelu Chengalvarayan, Richard Harry Ketchum, Anand Rangaswamy Setlur, David Lynn Thomson
  • Publication number: 20030149566
    Abstract: Embodiments of the present invention provide a spoken language interface to an information database. A grammars database based on the entries contained in the information database may be generated. The entries in the grammars database may be a compact representation of the entries in the information database. An index database based on the entries contained in the information database may be generated. The grammars database and the index database may be updated periodically based on updated entries contained in the information database. A recognized result of a user's communication based on the updated grammars database may be generated. The updated index database may be searched for a list of matching entries that match the recognized result. The list of matching entries may be output.
    Type: Application
    Filed: December 31, 2002
    Publication date: August 7, 2003
    Inventors: Esther Levin, Susan Boyce, Brian Helfrich, Yevgeniy Lyudovyk, Robert Burke, Ilija Zeljkovic
  • Publication number: 20030139927
    Abstract: A maximum a posteriori (MAP) processor employs a block processing technique for the MAP algorithm to provide a parallel architecture that allows for multiple word memory read/write processing and voltage scaling of a given circuit implementation. The block processing technique forms a merged trellis with states having modified branch inputs to provide the parallel structure. When block processing occurs, the trellis may be modified to show transitions from the oldest state at time k−N to the present state at time k. For the merged trellis, the number of states remains the same, but each state receives 2^N input transitions instead of the two input transitions. Branch metrics associated with the transitions in the merged trellis are cumulative, and are employed for the update process of forward and backward probabilities by the MAP algorithm.
    Type: Application
    Filed: January 22, 2002
    Publication date: July 24, 2003
    Inventors: Thaddeus J. Gabara, Inkyu Lee, Maria Luisa Lopez-Vallejo, Syed Mujtaba
  • Patent number: 6594393
    Abstract: In a text recognition system, the computational efficiency of a text line image decoding operation is improved by utilizing the characteristic of a graph known as the cut set. The branches of the data structure that represents the image are initially labeled with estimated scores. When estimated scores are used, the decoding operation must perform iteratively on a text line before producing the best path through the data structure. After each iteration, nodes in the best path are re-scored with actual scores. The decoding operation incorporates an operating mode called skip mode.
    Type: Grant
    Filed: May 12, 2000
    Date of Patent: July 15, 2003
    Inventors: Thomas P. Minka, Dan S. Bloomberg, Ashok C. Popat
  • Patent number: 6594630
    Abstract: An apparatus for voice-activated control of an electrical device comprises a receiving arrangement for receiving audio data generated by a user. A voice recognition arrangement is provided for determining whether the received audio data is a command word for controlling the electrical device. The voice recognition arrangement includes a microprocessor for comparing the received audio data with voice recognition data previously stored in the voice recognition arrangement. The voice recognition arrangement generates at least one control signal based on the comparison when the comparison reaches a predetermined threshold value. A power control controls power delivered to the electrical device. The power control is responsive to at least one control signal generated by the voice recognition arrangement for operating the electrical device in response to the at least one audio command generated by the user.
    Type: Grant
    Filed: November 19, 1999
    Date of Patent: July 15, 2003
    Assignee: Voice Signal Technologies, Inc.
    Inventors: Igor Zlokarnik, Daniel Lawrence Roth
  • Patent number: 6594392
    Abstract: The present invention is a method and apparatus to determine a similarity measure between first and second patterns. First and second storages store first and second feature vectors which represent the first and second patterns, respectively. A similarity estimator is coupled to the first and second storages to compute a similarity probability of the first and second feature vectors using a piecewise linear probability density function (PDF). The similarity probability corresponds to the similarity measure.
    Type: Grant
    Filed: May 17, 1999
    Date of Patent: July 15, 2003
    Assignee: Intel Corporation
    Inventor: Umberto Santoni
  • Publication number: 20030130846
    Abstract: A method of signal modelling comprises inputting to a statistical signal modelling system the output of a deterministic modelling system to thereby effect a reduction in the overall computational overhead.
    Type: Application
    Filed: October 29, 2002
    Publication date: July 10, 2003
    Inventor: Reginald Alfred King
  • Patent number: 6591237
    Abstract: A keyword recognition system for speaker dependent, dynamic time warping (DTW) recognition systems uses all of the trained word templates in the system, (keyword and vocabulary), to determine if an utterance is a keyword utterance or not. The utterance is selected as the keyword if a keyword score indicates a significant match to the keyword template and if the keyword score indicates a better match than do the entirety of scores to the vocabulary word templates.
    Type: Grant
    Filed: December 13, 1999
    Date of Patent: July 8, 2003
    Assignee: Intel Corporation
    Inventor: Adoram Erell
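
    Illustration: the two-part acceptance rule in the abstract can be sketched as below. The sketch assumes the scores are DTW distances (lower means a closer match) and that a "significant match" means the keyword distance is at or below a fixed limit; both are assumptions, and the names are hypothetical.

```python
def is_keyword_utterance(keyword_distance, vocabulary_distances, max_distance):
    """Accept the utterance as the keyword only if both conditions hold.

    1. The keyword template matches significantly (distance at or below max_distance).
    2. The keyword template matches better than every vocabulary word template.
    Distances are assumed to be DTW costs, so smaller is better.
    """
    significant = keyword_distance <= max_distance
    beats_vocabulary = all(keyword_distance < d for d in vocabulary_distances)
    return significant and beats_vocabulary
```

    Using the vocabulary templates as competitors means an utterance that is merely close to the keyword is still rejected when some ordinary vocabulary word explains it better.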
  • Patent number: 6587844
    Abstract: Unweighted finite state automata may be used in speech recognition systems, but considerably reduce the speed and accuracy of the speech recognition system. Unfortunately, developing a suitable training corpus for a speech recognition task is time consuming and expensive, if it is even possible. Additionally, it is unlikely that a training corpus could adequately reflect the various probabilities for the word and/or phoneme combinations. Accordingly, such very-large-vocabulary speech recognition systems often must be used in an unweighted state. The directed graph optimizing systems and methods determine the shortest distances between source and end nodes of a weighted directed graph. These various directed graph optimizing systems and methods also reweight the directed graph based on the determined shortest distances, so that the weights are, for example, front weighted. Accordingly, searches through the directed graph that are based on the total weights of the paths taken will be more efficient.
    Type: Grant
    Filed: February 1, 2000
    Date of Patent: July 1, 2003
    Assignee: AT&T Corp.
    Inventor: Mehryar Mohri
  • Publication number: 20030120482
    Abstract: The invention relates to pre-processing of a pronunciation dictionary for compression in a data processing device, the pronunciation dictionary comprising at least one entry, the entry comprising a sequence of character units and a sequence of phoneme units. According to one aspect of the invention the sequence of character units and the sequence of phoneme units are aligned using a statistical algorithm. The aligned sequence of character units and aligned sequence of phoneme units are interleaved by inserting each phoneme unit at a predetermined location relative to the corresponding character unit.
    Type: Application
    Filed: November 11, 2002
    Publication date: June 26, 2003
    Inventor: Jilei Tian
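
    Illustration: the interleaving step can be sketched as below. The sketch assumes the statistical alignment has already produced character and phoneme sequences of equal length, and that the "predetermined location" is immediately after the corresponding character; the abstract fixes neither detail, and the function names are hypothetical.

```python
def interleave(characters, phonemes):
    """Place each phoneme unit directly after its aligned character unit."""
    if len(characters) != len(phonemes):
        raise ValueError("aligned sequences must have the same length")
    merged = []
    for character, phoneme in zip(characters, phonemes):
        merged.append(character)
        merged.append(phoneme)
    return merged


def deinterleave(units):
    """Recover the aligned character and phoneme sequences."""
    return units[0::2], units[1::2]


# Hypothetical aligned entry for the word "cat".
entry = interleave(["c", "a", "t"], ["k", "ae", "t"])
print(entry)  # ['c', 'k', 'a', 'ae', 't', 't']
```

    Keeping each phoneme next to the character that predicts it makes neighbouring symbols more strongly correlated, which is the property a later compression stage can exploit.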
  • Patent number: 6574597
    Abstract: A large vocabulary speech recognizer including a combined weighted network of transducers reflecting fully expanded context-dependent modeling of pronunciations and language that can be used with a single-pass Viterbi or other coder based on sequences of labels provided by feature analysis of input speech.
    Type: Grant
    Filed: February 11, 2000
    Date of Patent: June 3, 2003
    Assignee: AT&T Corp.
    Inventors: Mehryar Mohri, Michael Dennis Riley
  • Patent number: 6571210
    Abstract: A method and system of performing confidence measure in a speech recognition system includes receiving an utterance of input speech and creating a near-miss pattern or a near-miss list of possible word entries for the utterance. Each word entry includes an associated value of probability that the utterance corresponds to the word entry. The near-miss list of possible word entries is compared with corresponding stored near-miss confidence templates. Each word in the vocabulary (or keyword list) has a near-miss confidence template, which includes a list of word entries, and each word entry in each list includes an associated value. Confidence measure for a particular hypothesis word is performed based on the comparison of the values in the near-miss list of possible word entries with the values of the corresponding near-miss confidence template.
    Type: Grant
    Filed: November 13, 1998
    Date of Patent: May 27, 2003
    Assignee: Microsoft Corporation
    Inventors: Hsiao-Wuen Hon, Asela J. R. Gunawardana
  • Publication number: 20030097264
    Abstract: An information processing apparatus inputs a document having a plurality of input items, and displays it using an information display unit. An active input item is discriminated from the plurality of input items in accordance with the display state of the document. A specific grammar corresponding to the discriminated active input item is selected from a grammar holding unit for holding a plurality of types of grammars, and the selected grammar is used in a speech recognition process.
    Type: Application
    Filed: November 7, 2002
    Publication date: May 22, 2003
    Applicant: CANON KABUSHIKI KAISHA
    Inventors: Tetsuo Kosaka, Takaya Ueda, Fumiaki Ito, Hiroki Yamamoto, Yuji Ikeda
  • Publication number: 20030088416
    Abstract: An HMM-based text-to-phoneme parser uses probability information within a probability database to generate one or more phoneme strings for a written input word. Techniques for training the text-to-phoneme parser are provided.
    Type: Application
    Filed: November 6, 2001
    Publication date: May 8, 2003
    Applicant: D.S.P.C. TECHNOLOGIES LTD.
    Inventor: Meir Griniasty
  • Publication number: 20030088409
    Abstract: A speech recognition system is trained to be sensitive not only to the actual spoken text, but also to the manner in which the text is spoken, for example, whether something is said confidently, or hesitatingly. In the preferred embodiment, this is achieved by using a Hidden Markov Model (HMM) as the recognition engine, and training the HMM to recognise different styles of input. This approach finds particular application in the telephony voice processing environment, where short caller responses need to be recognised, and the system can then react in a fashion appropriate to the tone or manner in which the caller has spoken.
    Type: Application
    Filed: December 20, 2002
    Publication date: May 8, 2003
    Applicant: International Business Machines Corporation
    Inventor: Robert Harris
  • Patent number: 6556969
    Abstract: A low complexity speaker verification system that employs universal cohort models and automatic score thresholding. The universal cohort models are generated using a simplified cohort model generating scheme. In certain embodiments of the invention, a simplified hidden Markov modeling (HMM) scheme is used to generate the cohort models. In addition, the low complexity speaker verification system is trained by various users of the low complexity speaker verification system. The total number of users of the low complexity speaker verification system may be modified over time as required by the specific application, and the universal cohort models may be updated accordingly to accommodate the new users. The present invention employs a combination of universal cohort modeling and thresholding to ensure high performance.
    Type: Grant
    Filed: September 30, 1999
    Date of Patent: April 29, 2003
    Assignee: Conexant Systems, Inc.
    Inventors: Khaled Assaleh, Ayman Asadi
  • Patent number: 6542866
    Abstract: A method and apparatus is provided for using multiple feature streams in speech recognition. In the method and apparatus, a feature extractor generates at least two feature vectors for a segment of an input signal. A decoder then generates a path score that is indicative of the probability that a word is represented by the input signal. The path score is generated by selecting the best feature vector to use for each segment. For each segment, the corresponding part in the path score for that segment is based in part on a chosen segment score that is selected from a group of at least two segment scores. The segment scores each represent a separate probability that a particular segment unit (e.g. senone, phoneme, diphone, triphone, or word) appears in that segment of the input signal. Although each segment score in the group relates to the same segment unit, the scores are based on different feature vectors for the segment.
    Type: Grant
    Filed: September 22, 1999
    Date of Patent: April 1, 2003
    Assignee: Microsoft Corporation
    Inventors: Li Jiang, Xuedong Huang
  • Publication number: 20030061044
    Abstract: In speech recognition based on HMM, in which speech recognition is performed by performing vector quantization and obtaining an output probability by table reference, the amount of computation and use of memory area are minimized while achieving a high ability of recognition.
    Type: Application
    Filed: July 18, 2002
    Publication date: March 27, 2003
    Applicant: SEIKO EPSON CORPORATION
    Inventors: Yasunaga Miyazawa, Hiroshi Hasegawa
  • Publication number: 20030055645
    Abstract: Briefly, in accordance with one embodiment of the invention, a recognition system may modify speech models using noise in an input signal that is received prior to a speech sample in the input sample.
    Type: Application
    Filed: September 18, 2001
    Publication date: March 20, 2003
    Inventor: Meir Griniasty
  • Patent number: 6529872
    Abstract: The improved noise adaptation technique employs a linear or non-linear transformation to the set of Jacobian matrices corresponding to an initial noise condition. An α-adaptation parameter or artificial intelligence operation is employed in a linear or non-linear way to increase the adaptation bias added to the speech models. This corrects shortcomings of conventional Jacobian adaptation, which tends to underestimate the effect of noise. The improved adaptation technique is further enhanced by a reduced dimensionality, principal component analysis technique that reduces the computational burden, making the adaptation technique beneficial in embedded recognition systems.
    Type: Grant
    Filed: April 18, 2000
    Date of Patent: March 4, 2003
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Christophe Cerisara, Luca Rigazio, Robert Boman, Jean-Claude Junqua
  • Patent number: 6526381
    Abstract: A processor-based system may utilize a remote control unit which not only allows mouse input commands to be provided to the processor-based system but also includes a microphone and a speech engine for decoding spoken commands and providing code for presenting the commands to the processor-based unit. The processor-based system may provide information to the remote control unit about the vocabulary currently being used by applications active on the processor-based system. This allows the speech engine in the remote control unit to focus on a more limited vocabulary, increasing the accuracy of the speech recognition function and decreasing the capabilities necessary in the remote control unit based speech engine.
    Type: Grant
    Filed: September 30, 1999
    Date of Patent: February 25, 2003
    Assignee: Intel Corporation
    Inventor: Andrew T. Wilson
  • Patent number: 6526379
    Abstract: The discriminative clustering technique tests a provided set of Gaussian distributions corresponding to an acoustic vector space. A distance metric, such as the Bhattacharyya distance, is used to assess which distributions are sufficiently proximal to be merged into a new distribution. Merging is accomplished by computing the centroid of the new distribution by minimizing the Bhattacharyya distance between the parameters of the Gaussian distributions being merged.
    Type: Grant
    Filed: November 29, 1999
    Date of Patent: February 25, 2003
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Luca Rigazio, Brice Tsakam, Jean-Claude Junqua
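    Illustration (not part of the patent text): a minimal sketch of the distance test the abstract describes, assuming Gaussians with diagonal covariances given as (means, variances) pairs. The function names and the threshold value are hypothetical, and the centroid computation used for the actual merge is not shown. For diagonal Gaussians the Bhattacharyya distance is the sum over dimensions of (m1 - m2)^2 / (8 v) + 0.5 ln(v / sqrt(v1 v2)), with v = (v1 + v2) / 2.

```python
import math

def bhattacharyya_diag(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    dist = 0.0
    for m1, v1, m2, v2 in zip(mean1, var1, mean2, var2):
        v = 0.5 * (v1 + v2)  # averaged variance for this dimension
        dist += 0.125 * (m1 - m2) ** 2 / v
        dist += 0.5 * math.log(v / math.sqrt(v1 * v2))
    return dist

def mergeable_pairs(gaussians, threshold):
    """Index pairs whose distance falls below `threshold` (merge candidates)."""
    pairs = []
    for i in range(len(gaussians)):
        for j in range(i + 1, len(gaussians)):
            (m1, v1), (m2, v2) = gaussians[i], gaussians[j]
            if bhattacharyya_diag(m1, v1, m2, v2) < threshold:
                pairs.append((i, j))
    return pairs

# Made-up one-dimensional Gaussians; the first two are close together.
gaussians = [([0.0], [1.0]), ([0.1], [1.0]), ([5.0], [1.0])]
candidates = mergeable_pairs(gaussians, threshold=0.05)
```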
  • Patent number: 6526170
    Abstract: A character recognition system is disclosed. In a feature extraction parameter storage section 22, a transformation matrix for reducing a number of dimensions of feature parameters and a codebook for quantization are stored. In an HMM storage section 23, a constitution and parameters of a Hidden Markov Model (HMM) for character string expression are stored. A feature extraction section 32 scans a word image given from an image storage means from left to right in a predetermined cycle with a slit having a width sufficiently smaller than the character width and thus outputs a feature symbol at each predetermined timing. A matching section 33 matches a feature symbol row and a probability maximization HMM state, thereby recognizing the character string.
    Type: Grant
    Filed: December 13, 1994
    Date of Patent: February 25, 2003
    Assignee: NEC Corporation
    Inventor: Shinji Matsumoto
  • Patent number: 6526380
    Abstract: A huge vocabulary speech recognition system for recognizing a sequence of spoken words, having an input means for receiving a time-sequential input pattern representative of the sequence of spoken words. The system further includes a plurality of large vocabulary speech recognizers each being associated with a respective, different large vocabulary recognition model. Each of the recognition models is targeted to a specific part of the huge vocabulary. The system comprises a controller operative to direct the input pattern to a plurality of the speech recognizers and to select a recognized word sequence from the word sequences recognized by the plurality of speech recognizers.
    Type: Grant
    Filed: August 9, 1999
    Date of Patent: February 25, 2003
    Assignee: Koninklijke Philips Electronics N.V.
    Inventors: Eric Thelen, Stefan Besling, Meinhard Ullrich