Markov Patents (Class 704/256)
  • Patent number: 7292981
    Abstract: A method for predicting a misrecognition in a speech recognition system is based on the insight that variations in a speech input signal differ depending on whether the signal originates from a speech or a non-speech event. The method comprises steps for receiving a speech input signal, extracting at least one signal variation feature of the speech input signal, and applying a signal variation meter to the speech input signal to derive a signal variation measure.
    Type: Grant
    Filed: October 4, 2004
    Date of Patent: November 6, 2007
    Assignee: Sony Deutschland GmbH
    Inventors: Thomas Kemp, Yin Hay Lam, Krzysztof Marasek
  • Patent number: 7277850
    Abstract: Disclosed is a system and method of decomposing a lattice transition matrix into a block diagonal matrix. The method is applicable to automatic speech recognition but can be used in other contexts as well, such as parsing and named entity extraction. The method normalizes the topology of any input graph according to a canonical form.
    Type: Grant
    Filed: April 2, 2003
    Date of Patent: October 2, 2007
    Assignee: AT&T Corp.
    Inventors: Dilek Z. Hakkani-Tur, Giuseppe Riccardi
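The abstract does not disclose the canonical-form algorithm itself, but one simple way a transition matrix becomes block diagonal is by reordering states along the connected components of the lattice graph. The sketch below illustrates that idea only, not the patented method; `block_diagonal_order` and its input format (a dense 0/1 adjacency matrix) are assumptions.

```python
from collections import defaultdict

def block_diagonal_order(adj):
    """Group graph nodes into weakly connected components so that
    reordering the transition matrix by this order yields a
    block-diagonal layout (one block per component)."""
    n = len(adj)
    # Build an undirected view of the transition structure.
    neighbors = defaultdict(set)
    for i in range(n):
        for j in range(n):
            if adj[i][j]:
                neighbors[i].add(j)
                neighbors[j].add(i)
    seen, order, blocks = set(), [], []
    for start in range(n):
        if start in seen:
            continue
        component, stack = [], [start]
        seen.add(start)
        # Depth-first collection of one component.
        while stack:
            u = stack.pop()
            component.append(u)
            for v in neighbors[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        blocks.append(sorted(component))
        order.extend(sorted(component))
    return order, blocks
```

Permuting both rows and columns of the transition matrix by the returned order places each component's transitions in its own diagonal block.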
  • Patent number: 7272561
    Abstract: Each word to be recognized is represented by gender-specific hidden Markov models that are stored in a ROM 6 along with output probability functions and preset transition probabilities. A speech recognizer 4 determines an occurrence probability of a feature parameter sequence detected by a feature value detector 3 using the hidden Markov models. The speech recognizer 4 determines the occurrence probability by giving each word a state sequence of one hidden Markov model common to the gender-specific hidden Markov models, multiplying each preset pair of an output probability function value and a transition probability together among the output probability functions and transition probabilities stored in the ROM 6, selecting the largest product as the probability of each state of the common hidden Markov model, determining the occurrence probability based on the selected product, and recognizing the input speech based on the occurrence probability thus determined.
    Type: Grant
    Filed: July 13, 2001
    Date of Patent: September 18, 2007
    Assignee: Asahi Kasei Kabushiki Kaisha
    Inventors: Toshiyuki Miyazaki, Yoji Ishikawa
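The per-state max-product combination described in the abstract can be sketched as follows. This is a loose illustration, not the patented decoder: the `gender_models` structure (a list of dicts holding per-state output-probability functions and preset transition probabilities) is invented for the example.

```python
def combined_state_score(observation, gender_models, state):
    """For one state of the shared HMM, score an observation under each
    gender-specific model (output probability x preset transition
    probability) and keep the largest product, per the abstract."""
    best = 0.0
    for model in gender_models:
        out_p = model["output"][state](observation)
        trans_p = model["trans"][state]
        best = max(best, out_p * trans_p)
    return best
```

Taking the max per state is what lets a single common state sequence stand in for both gender-specific models.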
  • Patent number: 7269558
    Abstract: For a given sentence grammar, speech recognizers are often required to decode M sets of HMMs, each of which models a specific acoustic environment. To match input acoustic observations to each of the environments, recognition search methods typically require a network of M sub-networks. A new speech recognition search method is described here that needs a network only the size of a single sub-network yet gives the same recognition performance, thus reducing the memory requirement for network storage by (M−1)/M.
    Type: Grant
    Filed: July 26, 2001
    Date of Patent: September 11, 2007
    Assignee: Texas Instruments Incorporated
    Inventor: Yifan Gong
  • Patent number: 7269556
    Abstract: Pattern recognition, wherein a sequence of feature vectors is formed from a digitized incoming signal, the feature vectors comprising feature vector components, and at least one feature vector is compared with templates of candidate patterns by computing a distortion measure. A control signal based on at least one time-dependent variable of the recognition process is formulated, and the distortion measure is computed using only a subset of the vector components of the feature vector, the subset being chosen in accordance with said control signal. This reduces the computational complexity of the computation, as the dimensionality of the vectors involved in the computation is effectively reduced. Although such a dimension reduction decreases the computational need, it has been found not to significantly impair the classification performance.
    Type: Grant
    Filed: March 26, 2003
    Date of Patent: September 11, 2007
    Assignee: Nokia Corporation
    Inventors: Imre Kiss, Marcel Vasilache
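A minimal sketch of computing the distortion measure over only a subset of the feature vector components, with the subset chosen from a control signal. The helper names and the load-based selection rule are illustrative assumptions, not the patented scheme.

```python
def masked_distortion(feature, template, subset):
    """Squared-Euclidean distortion computed only over the vector
    components selected by the control signal (here, an index subset),
    which effectively reduces the dimensionality of the comparison."""
    return sum((feature[i] - template[i]) ** 2 for i in subset)

def choose_subset(load, dim):
    """Illustrative control signal: under higher load, keep only the
    lower-order components (assumed here to be the most informative)."""
    keep = max(1, int(dim * (1.0 - load)))
    return list(range(keep))
```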
  • Patent number: 7266236
    Abstract: The present invention provides a method and apparatus for accelerated handwritten symbol recognition in a pen based tablet computer. In one embodiment, handwritten symbols are translated into machine readable characters using special purpose hardware. In one embodiment, the special purpose hardware is a recognition processing unit (RPU) which performs feature extraction and recognition. A user inputs the handwritten symbols and software recognition engine preprocesses the input to a reduced form. The data from the preprocessor is sent to the RPU which performs feature extraction and recognition. In one embodiment, the RPU has memory and the RPU operates on data in its memory. In one embodiment, the RPU uses a hidden Markov model (HMM) as a finite state machine that assigns probabilities to a symbol state based on the preprocessed data from the handwritten symbol. In another embodiment, the RPU recognizes collections of symbols, termed “wordlets,” in addition to individual symbols.
    Type: Grant
    Filed: May 3, 2001
    Date of Patent: September 4, 2007
    Assignee: California Institute of Technology
    Inventors: Kevin Hickerson, Uri Eden
  • Patent number: 7266497
    Abstract: Systems and methods for automatically segmenting speech inventories. A set of Hidden Markov Models (HMMs) are initialized using bootstrap data. The HMMs are next re-estimated and aligned to produce phone labels. The phone boundaries of the phone labels are then corrected using spectral boundary correction. Optionally, this process of using the spectral-boundary-corrected phone labels as input instead of the bootstrap data is performed iteratively in order to further reduce mismatches between manual labels and phone labels assigned by the HMM approach.
    Type: Grant
    Filed: January 14, 2003
    Date of Patent: September 4, 2007
    Assignee: AT&T Corp.
    Inventors: Alistair D. Conkie, Yeon-Jun Kim
  • Patent number: 7263484
    Abstract: An improved method and apparatus is disclosed which uses probabilistic techniques to map an input search string to a prestored audio file and to recognize certain portions of a search string phonetically. An improved interface is disclosed which permits users to input search strings in linguistic or phonetic form, or a combination of both, and also allows logic functions to be specified by indicating how far separated specific phonemes are in time.
    Type: Grant
    Filed: March 5, 2001
    Date of Patent: August 28, 2007
    Assignee: Georgia Tech Research Corporation
    Inventors: Peter S. Cardillo, Mark A. Clements, William E. Price
  • Patent number: 7260532
    Abstract: A model generation unit (17) is provided. The model generation unit includes an alignment module (80) arranged to receive pairs of sequences of parameter frame vectors from a buffer (16) and to perform dynamic time warping of the parameter frame vectors to align corresponding parts of the pair of utterances. A consistency checking module (82) is provided to determine whether the aligned parameter frame vectors correspond to the same word. If this is the case, the aligned parameter frame vectors are passed to a clustering module (84), which groups the parameter frame vectors into a number of clusters. Whilst clustering the parameter frame vectors, the clustering module (84) determines for each grouping an objective function calculating the best fit of a model to the clusters per degree of freedom of that model.
    Type: Grant
    Filed: November 6, 2002
    Date of Patent: August 21, 2007
    Assignee: Canon Kabushiki Kaisha
    Inventor: David Llewellyn Rees
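Dynamic time warping, which the alignment module performs on pairs of parameter frame vectors, can be sketched with the standard cumulative-cost recursion (a textbook version, not the patent's implementation):

```python
def dtw_align(a, b, dist):
    """Dynamic time warping of two parameter-frame sequences; returns
    the minimum cumulative alignment cost between corresponding parts."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # frame repeated in a
                                 cost[i][j - 1],      # frame repeated in b
                                 cost[i - 1][j - 1])  # one-to-one match
    return cost[n][m]
```

A low cost indicates the two utterances warp onto each other well, which is the precondition the consistency check exploits.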
  • Patent number: 7233891
    Abstract: A method, computer program product, and apparatus for parsing consecutive sentences which includes tokenizing the words of the sentence and putting them through an iterative inductive processor. The processor has access to at least a first and second set of rules. The rules narrow the possible syntactic interpretations for the words in the sentence. After exhausting application of the first set of rules, the program moves to the second set of rules. The program reiterates back and forth between the sets of rules until no further reductions in the syntactic interpretation can be made. Thereafter, deductive token merging is performed if needed.
    Type: Grant
    Filed: March 25, 2003
    Date of Patent: June 19, 2007
    Assignee: Virtual Research Associates, Inc.
    Inventors: Douglas G. Bond, Churl Oh
  • Patent number: 7231352
    Abstract: The speech recognition rate that is necessary for a selected speech recognition application is determined. The information content of the feature vector components that is at least necessary to ensure this speech recognition rate is determined using stored speech recognition rate information. The number of feature vector components necessary to make available the determined information content is determined, and the speech recognition is carried out using feature vectors with the determined required number of feature vector components.
    Type: Grant
    Filed: September 16, 2003
    Date of Patent: June 12, 2007
    Assignee: Infineon Technologies AG
    Inventors: Michael Küstner, Ralf Sambeth
  • Patent number: 7216066
    Abstract: A method is presented comprising assigning each of a plurality of segments comprising a received corpus to a node in a data structure denoting dependencies between nodes, and calculating a transitional probability between each of the nodes in the data structure.
    Type: Grant
    Filed: February 22, 2006
    Date of Patent: May 8, 2007
    Assignee: Microsoft Corporation
    Inventors: Shuo Di, Kai-Fu Lee, Lee-Feng Chien, Zheng Chen, Jianfeng Gao
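The two steps of the claim, assigning corpus segments to nodes and calculating transitional probabilities between them, reduce in the simplest bigram case to counting adjacent pairs. A minimal sketch under that assumption:

```python
from collections import Counter

def transition_probabilities(segments):
    """Estimate P(next | current) between corpus segments by counting
    adjacent pairs, a simple bigram dependency structure."""
    pair_counts = Counter(zip(segments, segments[1:]))
    # Each non-final segment contributes one outgoing transition.
    unigram = Counter(segments[:-1])
    return {(a, b): c / unigram[a] for (a, b), c in pair_counts.items()}
```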
  • Patent number: 7209883
    Abstract: A speech recognition method includes use of synchronous or asynchronous audio and video data to enhance speech recognition probabilities. A two-stream factorial hidden Markov model is trained and used to identify speech. At least one stream is derived from audio data and a second stream is derived from mouth pattern data. Gestural or other suitable data streams can optionally be combined to reduce speech recognition error rates in noisy environments.
    Type: Grant
    Filed: May 9, 2002
    Date of Patent: April 24, 2007
    Assignee: Intel Corporation
    Inventor: Ara V. Nefian
  • Patent number: 7181391
    Abstract: According to one aspect of the invention, a method is provided in which knowledge about the tone characteristics of a tonal syllabic language is used to model speech at various levels in a bottom-up speech recognition structure. The various levels in the bottom-up recognition structure include the acoustic level, the phonetic level, the word level, and the sentence level. At the acoustic level, pitch is treated as a continuous acoustic variable, and pitch information extracted from the speech signal is included as a feature component of feature vectors. At the phonetic level, main vowels having the same phonetic structure but different tones are defined and modeled as different phonemes. At the word level, a set of tone change rules is used to build transcriptions for training data and a pronunciation lattice for decoding. At the sentence level, a set of sentence-ending words with light tone is also added to the system vocabulary.
    Type: Grant
    Filed: September 30, 2000
    Date of Patent: February 20, 2007
    Assignee: Intel Corporation
    Inventors: Ying Jia, Yonghong Yan, Baosheng Yuan
  • Patent number: 7165028
    Abstract: A speech recognizer operating in both ambient noise (additive distortion) and microphone changes (convolutive distortion) is provided. For each utterance to be recognized the recognizer system adapts HMM mean vectors with noise estimates calculated from pre-utterance pause and a channel estimate calculated using an Estimation Maximization algorithm from previous utterances.
    Type: Grant
    Filed: September 20, 2002
    Date of Patent: January 16, 2007
    Assignee: Texas Instruments Incorporated
    Inventor: Yifan Gong
  • Patent number: 7136802
    Abstract: Methods for processing speech data are described herein. In one aspect of the invention, an exemplary method includes receiving a text sentence comprising a plurality of words, each of the plurality of words having a part-of-speech (POS) tag, generating a POS sequence based on the POS tag of each of the plurality of words, detecting a prosodic phrase break through a recurrent neural network (RNN) based on the POS sequence, and generating a prosodic phrase boundary based on the prosodic phrase break. Other methods and apparatuses are also described.
    Type: Grant
    Filed: January 16, 2002
    Date of Patent: November 14, 2006
    Assignee: Intel Corporation
    Inventors: Zhiwei Ying, Xiaohua Shi
  • Patent number: 7133535
    Abstract: A novel method for synchronizing the lips of a sketched face to an input voice. The approach is to use training video as much as possible when the input voice is similar to the training voice sequences. Initially, face sequences are clustered from video segments; then, by making use of sub-sequence Hidden Markov Models, a correlation between speech signals and face shape sequences is built. Through this re-use of video, the discontinuity between two consecutive output faces is decreased, and accurate and realistic synthesized animations are obtained. The lip synchronization system and method can synthesize faces from input audio in real time without noticeable delay. Since acoustic feature data calculated from the audio is used directly to drive the system, without considering its phonemic representation, the method can adapt to any kind of voice, language or sound.
    Type: Grant
    Filed: December 21, 2002
    Date of Patent: November 7, 2006
    Assignee: Microsoft Corp.
    Inventors: Ying Huang, Stephen Ssu-te Lin, Baining Guo, Heung-Yeung Shum
  • Patent number: 7127393
    Abstract: A method and apparatus are provided for automatically recognizing words of spoken speech using a computer-based speech recognition system according to a dynamic semantic model. In an embodiment, the speech recognition system recognizes speech and generates one or more word strings, each of which is a hypothesis of the speech, and creates and stores a probability value or score for each of the word strings. The word strings are ordered by probability value. The speech recognition system also creates and stores, for each of the word strings, one or more keyword-value pairs that represent semantic elements and semantic values of the semantic elements for the speech that was spoken. One or more dynamic semantic rules are defined that specify how a probability value of a word string should be modified based on information about external conditions, facts, or the environment of the application in relation to the semantic values of that word string.
    Type: Grant
    Filed: February 10, 2003
    Date of Patent: October 24, 2006
    Assignee: Speech Works International, Inc.
    Inventors: Michael S. Phillips, Etienne Barnard, Jean-Guy Dahan, Michael J. Metzger
  • Patent number: 7120580
    Abstract: An apparatus and a concomitant method for speech recognition. In one embodiment, the present method is referred to as a “Dynamic Noise Compensation” (DNC) method where the method estimates the models for noisy speech using models for clean speech and a noise model. Specifically, the model for the noisy speech is estimated by interpolation between the clean speech model and the noise model. This approach reduces computational cycles and does not require large memory capacity.
    Type: Grant
    Filed: August 15, 2001
    Date of Patent: October 10, 2006
    Assignee: SRI International
    Inventors: Venkata Ramana Rao Gadde, Horacio Franco, John Butzberger
  • Patent number: 7103541
    Abstract: A system and method facilitating signal enhancement utilizing mixture models is provided. The invention includes a signal enhancement adaptive system having a speech model, a noise model and a plurality of adaptive filter parameters. The signal enhancement adaptive system employs probabilistic modeling to perform signal enhancement of a plurality of windowed frequency transformed input signals received, for example, for an array of microphones. The signal enhancement adaptive system incorporates information about the statistical structure of speech signals. The signal enhancement adaptive system can be embedded in an overall enhancement system which also includes components of signal windowing and frequency transformation.
    Type: Grant
    Filed: June 27, 2002
    Date of Patent: September 5, 2006
    Assignee: Microsoft Corporation
    Inventors: Hagai Attias, Li Deng
  • Patent number: 7092883
    Abstract: Systems and methods for determining word confidence scores. Speech recognition systems generate a word lattice for speech input. Posterior probabilities of the words in the word lattice are determined using a forward-backward algorithm. Next, time slots are defined for the word lattice, and for all transitions that at least partially overlap a particular time slot, the posterior probabilities of transitions that have the same word label are combined for those transitions. The combined posterior probabilities are used as confidence scores. A local entropy can be computed on the competitor transitions of a particular time slot and also used as a confidence score.
    Type: Grant
    Filed: September 25, 2002
    Date of Patent: August 15, 2006
    Assignee: AT&T
    Inventors: Roberto Gretter, Giuseppe Riccardi
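The slot-based combination of posteriors, and the local entropy over competitors, can be sketched roughly as below. The transition tuple format `(start, end, word, posterior)` is an assumption; computing the lattice posteriors themselves (the forward-backward step) is omitted.

```python
import math
from collections import defaultdict

def slot_confidences(transitions, slot_start, slot_end):
    """Combine posterior probabilities of all lattice transitions that
    overlap a time slot and share the same word label; the combined
    mass serves as that word's confidence score for the slot."""
    scores = defaultdict(float)
    for start, end, word, posterior in transitions:
        if start < slot_end and end > slot_start:   # interval overlap
            scores[word] += posterior
    return dict(scores)

def local_entropy(scores):
    """Entropy over the competing words in a slot (mass renormalized);
    low entropy suggests a single dominant hypothesis."""
    total = sum(scores.values())
    return -sum((p / total) * math.log(p / total)
                for p in scores.values() if p > 0)
```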
  • Patent number: 7089185
    Abstract: An arrangement is provided for an embedded coupled hidden Markov model. To train an embedded coupled hidden Markov model, training data is first segmented into uniform segments at the different layers of the model. At each layer, a uniform segment corresponds to a state of a coupled hidden Markov model at that layer. An optimal segmentation is generated at the lower layer based on the uniform segmentation and is then used to update the parameters of the models associated with the states of the coupled hidden Markov models at the lower layer. The updated model parameters at the lower layer are then used to update the model parameters associated with states at the super layer.
    Type: Grant
    Filed: June 27, 2002
    Date of Patent: August 8, 2006
    Assignee: Intel Corporation
    Inventor: Ara V Nefian
  • Patent number: 7085720
    Abstract: The invention concerns a method of task classification using morphemes, which operates on the task objective of a user. The morphemes may be generated by clustering salient sub-morphemes, selected from training speech, that are semantically and syntactically similar. The method may include detecting morphemes present in the user's input communication, and making task-type classification decisions based on the detected morphemes. The morphemes may be verbal and/or non-verbal.
    Type: Grant
    Filed: October 18, 2000
    Date of Patent: August 1, 2006
    Assignee: AT&T Corp.
    Inventors: Allen Louis Gorin, Dijana Petrovska-Delacretaz, Giuseppe Riccardi, Jeremy Huntley Wright
  • Patent number: 7076422
    Abstract: A speech recognition system recognizes filled-pause utterances made by a speaker. In one embodiment, an ergodic model is used to acoustically model filled pauses, providing flexibility that allows varying utterances of the filled pauses. The ergodic HMM can also be used for other types of noise, such as but not limited to breathing, keyboard operation, microphone noise, laughter, door openings and/or closings, or any other noise occurring in the environment of the user or made by the user. Similarly, silence can be modeled using an ergodic HMM. Recognition can be used with N-gram, context-free grammar or hybrid language models.
    Type: Grant
    Filed: March 13, 2003
    Date of Patent: July 11, 2006
    Assignee: Microsoft Corporation
    Inventor: Mei-Yuh Hwang
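An ergodic HMM differs from the usual left-to-right topology in that every state can transition to every other, which suits noises with no fixed temporal order. A minimal sketch of such a transition matrix; the uniform off-diagonal split is an illustrative choice, not the patent's parameterization:

```python
def ergodic_transitions(n_states, self_loop=0.5):
    """Fully connected (ergodic) transition matrix: each state keeps a
    self-loop probability and spreads the remainder evenly over all
    other states, so any state order is reachable."""
    off = (1.0 - self_loop) / (n_states - 1)
    return [[self_loop if i == j else off for j in range(n_states)]
            for i in range(n_states)]
```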
  • Patent number: 7069215
    Abstract: Finite-state systems and methods allow multiple input streams to be parsed and integrated by a single finite-state device. These systems and methods not only address multimodal recognition, but are also able to encode semantics and syntax into a single finite-state device. The finite-state device provides models for recognizing multimodal inputs, such as speech and gesture, and composes the meaning content from the various input streams into a single semantic representation. Compared to conventional multimodal recognition systems, finite-state systems and methods allow for compensation among the various input streams. Finite-state systems and methods allow one input stream to dynamically alter a recognition model used for another input stream, and can reduce the computational complexity of multidimensional multimodal parsing.
    Type: Grant
    Filed: July 12, 2001
    Date of Patent: June 27, 2006
    Assignee: AT&T Corp.
    Inventors: Srinivas Bangalore, Michael J. Johnston
  • Patent number: 7062433
    Abstract: A method of speech recognition with compensation is provided by modifying HMM models trained on clean speech with cepstral mean normalization. A mean MFCC vector is calculated over all speech utterances in the clean database. This mean MFCC vector is added to the original models. An estimate of the background noise is determined for a given speech utterance, and the model mean vectors adapted to the noise are determined. The mean vector of the noisy data over the noisy speech space is determined and removed from the model mean vectors adapted to noise to obtain the target model.
    Type: Grant
    Filed: January 18, 2002
    Date of Patent: June 13, 2006
    Assignee: Texas Instruments Incorporated
    Inventor: Yifan Gong
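Cepstral mean normalization, which the method applies to the clean training speech, amounts to subtracting the per-dimension mean MFCC vector from every frame. A generic sketch (not the patent's exact compensation procedure):

```python
def cepstral_mean_normalize(frames):
    """Subtract the per-dimension mean MFCC vector from every frame,
    removing stationary convolutive (channel) distortion. Returns the
    normalized frames and the mean vector that was removed."""
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in frames], mean
```

The returned mean vector is the quantity the abstract says is added back to the original models.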
  • Patent number: 7058576
    Abstract: The invention relates to speech recognition based on HMM, in which speech recognition is performed by performing vector quantization and obtaining an output probability by table reference, and the amount of computation and use of memory area are minimized while achieving a high ability of recognition. Exemplary codebooks used for vector quantization can be provided as follows: if phonemes are used as subwords, codebooks for respective phonemes, such that a codebook CB1 is a codebook for a phoneme /a/ and a codebook CB2 is a codebook for a phoneme /i/, and these codebooks are associated with respective phoneme HMMs.
    Type: Grant
    Filed: July 18, 2002
    Date of Patent: June 6, 2006
    Assignee: Seiko Epson Corporation
    Inventors: Yasunaga Miyazawa, Hiroshi Hasegawa
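Looking up the nearest codeword in the codebook tied to a particular phoneme HMM can be sketched as below; the data layout (a dict of per-phoneme codeword lists) is an assumption for illustration.

```python
def quantize(frame, codebooks, phoneme):
    """Vector-quantize a frame against the codebook associated with one
    phoneme HMM, returning the index of the nearest codeword. The index
    can then be used for table lookup of the output probability."""
    book = codebooks[phoneme]
    def sqdist(codeword):
        return sum((x - y) ** 2 for x, y in zip(frame, codeword))
    return min(range(len(book)), key=lambda i: sqdist(book[i]))
```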
  • Patent number: 7050974
    Abstract: A speech communication system comprising a speech input terminal and a speech recognition apparatus which can communicate with each other through a wired or wireless communication network, wherein the speech input terminal comprises a speech input unit, a unit for creating environment information for speech recognition, which is unique to the speech input terminal or represents its operation state, and a communication control unit for transmitting the environment information to the speech recognition apparatus, and the speech recognition apparatus executes speech recognition processing on the basis of the environment information.
    Type: Grant
    Filed: September 13, 2000
    Date of Patent: May 23, 2006
    Assignee: Canon Kabushiki Kaisha
    Inventors: Yasuhiro Komori, Masayuki Yamada
  • Patent number: 7035802
    Abstract: The dynamic programming technique employs a lexical tree that is encoded in computer memory as a flat representation in which the nodes of each generation occupy contiguous memory locations. The traversal algorithm employs a set of traversal rules whereby nodes of a given generation are processed before the parent nodes of that generation. The deepest child generation is processed first and traversal among nodes of each generation proceeds in the same topological direction.
    Type: Grant
    Filed: July 31, 2000
    Date of Patent: April 25, 2006
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Luca Rigazio, Patrick Nguyen
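The traversal rule, children of a generation processed before their parents with the deepest generation first, can be sketched over a flat generation-by-generation layout. The `(node_id, parent_id)` encoding is an assumed stand-in for the patent's contiguous memory representation.

```python
def traverse_bottom_up(generations):
    """Visit a flat lexical tree generation by generation, deepest
    generation first, so every node is processed before its parent.
    Each generation is a contiguous list of (node_id, parent_id)
    pairs, mirroring the flat in-memory layout."""
    visited = []
    for gen in reversed(generations):          # deepest generation first
        for node_id, _parent in gen:           # contiguous within a generation
            visited.append(node_id)
    return visited
```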
  • Patent number: 7035801
    Abstract: A method of determining the language of a text message received by a mobile telecommunications device includes receiving an input text message at a mobile telecommunications device; analyzing the input text message using language information stored in the mobile telecommunications device; selecting, from a group of languages defined by the language information, a most likely language for the input text message; and outputting, from the mobile telecommunications device, speech signals corresponding to the input text message in the selected language.
    Type: Grant
    Filed: September 5, 2001
    Date of Patent: April 25, 2006
    Assignee: Telefonaktiebolaget L M Ericsson (publ)
    Inventor: Alberto Jimenez-Feltström
  • Patent number: 7027979
    Abstract: A method and apparatus for speech reconstruction within a distributed speech recognition system is provided herein. Missing MFCCs are reconstructed and utilized to generate speech. Particularly, partial recovery of the missing MFCCs is achieved by exploiting the dependence of the missing MFCCs on the transmitted pitch period P as well as on the transmitted MFCCs. Harmonic magnitudes are then obtained from the transmitted and reconstructed MFCCs, and the speech is reconstructed utilizing these harmonic magnitudes.
    Type: Grant
    Filed: January 14, 2003
    Date of Patent: April 11, 2006
    Assignee: Motorola, Inc.
    Inventor: Tenkasi Ramabadran
  • Patent number: 7024360
    Abstract: A method of reconstructing a damaged sequence of symbols where some symbols are missing is provided in which statistical parameters of the sequence are used with confidence windowing techniques to quickly and efficiently reconstruct the damaged sequence to its original form. Confidence windowing techniques are provided that are equivalent to generalized hidden semi-Markov models but which are more easily used to determine the most likely missing symbol at a given point in the damaged sequence being reconstructed. The method can be used to reconstruct communications consisting of speech, music, digital transmission symbols and others having a bounded symbol set which can be described by statistical behaviors in the symbol stream.
    Type: Grant
    Filed: March 17, 2003
    Date of Patent: April 4, 2006
    Assignee: Rensselaer Polytechnic Institute
    Inventors: Michael Savic, Michael Moore
  • Patent number: 7020587
    Abstract: The generation and management of a language model data structure include assigning each segment of a received corpus to a node in a data structure that denotes dependencies between the respective nodes. A transitional probability between each of the nodes in the data structure is calculated. A frequency of occurrence is calculated for each item of the respective segments, and those nodes of the data structure associated with items that do not meet a minimum frequency of occurrence threshold are removed. The data structure may be managed across a system memory of a computer system and an extended memory of the computer system.
    Type: Grant
    Filed: June 30, 2000
    Date of Patent: March 28, 2006
    Assignee: Microsoft Corporation
    Inventors: Shuo Di, Kai-Fu Lee, Lee-Feng Chien, Zheng Chen, Jianfeng Gao
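The frequency-of-occurrence pruning step can be sketched independently of the rest of the pipeline; the function below simply drops items that fall below the minimum-count threshold before any transition structure is built (an illustrative reading, not the patented implementation).

```python
from collections import Counter

def prune_by_frequency(segments, min_count):
    """Remove items that do not meet the minimum frequency-of-occurrence
    threshold, shrinking the language model data structure."""
    counts = Counter(segments)
    kept = {s for s, c in counts.items() if c >= min_count}
    return [s for s in segments if s in kept]
```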
  • Patent number: 7016837
    Abstract: An initial combination HMM 16 is generated from a voice HMM 10 having multiplicative distortions and an initial noise HMM of additive noise, and at the same time, a Jacobian matrix J is calculated by a Jacobian matrix calculating section 19. Noise variation Namh (cep), in which an estimated value Ha^(cep) of the multiplicative distortions that are obtained from voice that is actually uttered, additive noise Na(cep) that is obtained in a non-utterance period, and additive noise Nm(cep) of the initial noise HMM 17 are combined, is multiplied by a Jacobian matrix, wherein the result of the multiplication and initial combination HMM 16 are combined, and an adaptive HMM 26 is generated. Thereby, an adaptive HMM 26 that is matched to the observation value series RNah(cep) generated from actual utterance voice can be generated in advance.
    Type: Grant
    Filed: September 18, 2001
    Date of Patent: March 21, 2006
    Assignee: Pioneer Corporation
    Inventors: Hiroshi Seo, Mitsuya Komamura, Soichi Toyama
  • Patent number: 7013273
    Abstract: A system and associated method of converting audio data from a television signal into textual data for display as a closed caption on a display device is provided. The audio data is decoded and audio speech signals are filtered from the audio data. The audio speech signals are parsed into phonemes by a speech recognition module. The parsed phonemes are grouped into words and sentences using a database of words corresponding to the grouped phonemes. The words are converted into text data, which is formatted for presentation on the display device as closed-captioned textual data.
    Type: Grant
    Filed: March 29, 2001
    Date of Patent: March 14, 2006
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventor: Michael Kahn
  • Patent number: 7013276
    Abstract: Predicting speech recognizer confusion where utterances can be represented by any combination of text form and audio file. The utterances are represented with an intermediate representation that directly reflects their acoustic characteristics. Text representations of the utterances can be used directly to predict confusability without access to audio-file examples of the utterances. First embodiment: two text utterances are represented as strings of phonemes, and the least cost of transforming one string into the other serves as a confusability measure. Second embodiment: two utterances are represented with an intermediate representation of sequences of acoustic events, based on phonetic capabilities of speakers obtained from acoustic signals of the utterances, and the acoustic events are compared. Confusability of the utterances is predicted according to the formula 2K/T, where K is the number of matched acoustic events and T is the total number of acoustic events.
    Type: Grant
    Filed: October 5, 2001
    Date of Patent: March 14, 2006
    Assignee: Comverse, Inc.
    Inventors: Corine A. Bickley, Lawrence A. Denenberg
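The abstract gives the confusability formula 2K/T explicitly. The sketch below instantiates it with K taken as the longest common subsequence of the two acoustic-event sequences and T as the total event count across both utterances; both readings are assumptions, since the abstract does not define how events are matched.

```python
def confusability(events_a, events_b):
    """Predicted confusability 2K/T: K matched acoustic events (here,
    the longest common subsequence) over T, the total number of events
    in both utterances. Identical sequences score 1.0."""
    n, m = len(events_a), len(events_b)
    # Standard LCS dynamic program.
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(m):
            if events_a[i] == events_b[j]:
                lcs[i + 1][j + 1] = lcs[i][j] + 1
            else:
                lcs[i + 1][j + 1] = max(lcs[i][j + 1], lcs[i + 1][j])
    k = lcs[n][m]
    return 2 * k / (n + m)
```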
  • Patent number: 7003456
    Abstract: A computer-based method of routing a message to a system includes receiving a message, and processing the message using large-vocabulary continuous speech recognition to generate a string of text corresponding to the message. The method includes generating a confidence estimate of the string of text corresponding to the message and comparing the confidence estimate to a predetermined threshold. If the confidence estimate satisfies the predetermined threshold, the string of text is forwarded to the system. If the confidence estimate does not satisfy the predetermined threshold, the information relating to the message is forwarded to a transcriptionist. The message may include one or more utterances. Each utterance in the message may be separately or jointly processed. In this way, a confidence estimate may be generated and evaluated for each utterance or for the whole message. Information relating to each utterance may be separately or jointly forwarded based on the results of the generation and evaluation.
    Type: Grant
    Filed: June 12, 2001
    Date of Patent: February 21, 2006
    Inventors: Laurence S. Gillick, Robert Roth, Linda Manganaro, Barbara R. Peskin, David C. Petty, Ashwin Rao
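    The threshold-based routing logic described above reduces to a simple branch. A minimal sketch, assuming callables for the recognizer and the confidence scorer; the function names and the threshold value are illustrative, not from the patent.

    ```python
    def route_message(transcribe, confidence, message, threshold=0.85):
        """Transcribe the message, score the transcript, and forward either
        the text (confident) or the raw message (for human transcription)."""
        text = transcribe(message)
        score = confidence(text, message)
        if score >= threshold:
            return ("system", text)            # confident: forward the text
        return ("transcriptionist", message)   # not confident: human fallback
    ```

    Per-utterance routing would simply apply the same function to each utterance of a segmented message.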
  • Patent number: 7003460
    Abstract: In speech recognition, the phonemes of a language are modelled by a hidden Markov model, whereby each state of the hidden Markov model is described by a probability density function. For speech recognition with a modified vocabulary, the probability density function is split into a first and a second probability density function. As a result, it is possible to compensate for variations in the speaking habits of a speaker, or to add a new word to the vocabulary of the speech recognition unit while assuring that the new word is distinguished with adequate quality from the words already present in the speech recognition unit and is thus recognized.
    Type: Grant
    Filed: May 3, 1999
    Date of Patent: February 21, 2006
    Assignee: Siemens Aktiengesellschaft
    Inventors: Udo Bub, Harald Höge
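    One common way to split a single Gaussian density into two is to perturb the mean along the standard deviation and halve the mixture weight. The sketch below shows that generic mixture-splitting heuristic for a one-dimensional density; it is an illustrative stand-in, not the patent's specific splitting procedure, and `eps` is an assumed perturbation factor.

    ```python
    import math

    def split_gaussian(mean, var, weight, eps=0.2):
        """Split one Gaussian (mean, variance, weight) into two components:
        means offset by +/- eps standard deviations, weight halved."""
        std = math.sqrt(var)
        g1 = (mean - eps * std, var, weight / 2)
        g2 = (mean + eps * std, var, weight / 2)
        return g1, g2
    ```

    After splitting, each component can be re-estimated on its own data, letting the two densities drift apart to cover a speaker's variation or a new word.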
  • Patent number: 6996525
    Abstract: A method for selecting a speech recognizer from a number of speech recognizers in a speech recognition system. The speech recognition system receives an audio stream from an application and derives enabling information. It then enables at least some of the speech recognizers and receives their results. It derives selection information, uses it to select the best speech recognizer and its results, and returns those results to the application.
    Type: Grant
    Filed: June 15, 2001
    Date of Patent: February 7, 2006
    Assignee: Intel Corporation
    Inventors: Steven M. Bennett, Andrew V. Anderson
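    The selection step above can be sketched by running the enabled recognizers and picking the highest-confidence result. Each recognizer is modeled as a callable returning `(text, confidence)`; confidence is one plausible form of "selection information", and all names are hypothetical.

    ```python
    def select_best(recognizers, audio):
        """Run each enabled recognizer on the audio and return the name,
        text, and confidence of the best-scoring result."""
        results = [(name, rec(audio)) for name, rec in recognizers.items()]
        best_name, (text, conf) = max(results, key=lambda r: r[1][1])
        return best_name, text, conf
    ```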
  • Patent number: 6990442
    Abstract: The present invention provides a parsing technique wherein a parsing process provides feedback to a tokenizer to select an appropriate sub-tokenizer process corresponding to a grammar rule being implemented by the current parsing state. Each parsing state will select a corresponding sub-tokenizer process to tokenize a corresponding portion of an input stream for a message to be parsed. Each sub-tokenizer process is preferably unique and configured to provide only tokens capable of being processed by the grammar rule being implemented in the corresponding parser state. If the input string cannot be tokenized as required by the corresponding grammar rule implemented by the parser state, an error message is delivered. The parser process will move from one state to another, based on processing the respective tokens, until the input stream for the message is completely parsed.
    Type: Grant
    Filed: September 21, 2001
    Date of Patent: January 24, 2006
    Assignee: Nortel Networks Limited
    Inventor: James F. Davis
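    The state-to-sub-tokenizer dispatch described above can be sketched as a lookup table: each parser state selects a sub-tokenizer that only emits tokens its grammar rule can accept, and an error is raised otherwise. State names and tokenizer rules here are entirely hypothetical.

    ```python
    # Each parser state maps to a sub-tokenizer restricted to that
    # state's grammar rule; None signals an untokenizable fragment.
    SUB_TOKENIZERS = {
        "expect_digits": lambda s: s.split() if s.split() and all(w.isdigit() for w in s.split()) else None,
        "expect_words":  lambda s: s.split() if s.split() and all(w.isalpha() for w in s.split()) else None,
    }

    def tokenize_for_state(state, fragment):
        """Tokenize an input fragment using the sub-tokenizer selected by
        the current parser state, delivering an error on failure."""
        tokens = SUB_TOKENIZERS[state](fragment)
        if tokens is None:
            raise ValueError(f"cannot tokenize {fragment!r} in state {state}")
        return tokens
    ```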
  • Patent number: 6980954
    Abstract: A search method based on a single triphone tree for a large vocabulary continuous speech recognizer is disclosed, in which speech signals are received. Tokens are propagated in a phonetic tree to integrate a language model for recognizing the received speech signals. By propagating tokens, which are preserved in tree nodes and record the path history, a single triphone tree can be used in a one-pass search process, thereby reducing speech recognition processing time and system resource use.
    Type: Grant
    Filed: September 30, 2000
    Date of Patent: December 27, 2005
    Assignee: Intel Corporation
    Inventors: Quingwei Zhao, Zhiwei Lin, Yonghong Yan, Baosheng Yuan
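    Token passing of the kind described above can be sketched as tokens carrying an accumulated log score and a path history, advanced along tree arcs with only the best token kept per node (Viterbi pruning). This is a minimal generic sketch, not the patent's search; class and field names are illustrative.

    ```python
    import math

    class Token:
        """A search token recording accumulated log probability and the
        path history of nodes visited."""
        def __init__(self, log_prob=0.0, history=()):
            self.log_prob = log_prob
            self.history = history

    def propagate(tokens_at_node, arc_log_probs):
        """Advance tokens along outgoing arcs; for each successor node,
        keep only the highest-scoring token (one-pass Viterbi pruning)."""
        best = {}
        for tok in tokens_at_node:
            for node, lp in arc_log_probs.items():
                cand = Token(tok.log_prob + lp, tok.history + (node,))
                if node not in best or cand.log_prob > best[node].log_prob:
                    best[node] = cand
        return best
    ```

    Because each surviving token already records its history, the best word sequence is read directly off the winning token at the end of the utterance.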
  • Patent number: 6970819
    Abstract: The principal object of this invention is to provide a suitable method of controlling closing length for phonemes that have a closing interval (such as unvoiced plosive consonants), and thereby an improved rule-based speech synthesis device. A phoneme type judgement part 201 judges whether the phoneme in question is a vowel or a consonant and, in the case of a consonant, whether it is one preceded by a closing interval. It then operates a vowel length estimation part 202 when it judges that the phoneme is a vowel, a consonant length estimation part 205 when it judges that the phoneme is a consonant, and additionally a closing length estimation part 208 when it has judged that the phoneme is preceded by a closing interval, whereby the respective time lengths are estimated. The estimated time lengths are then set by a vowel length setting part 203, a consonant length setting part 206 and a closing length setting part 209, respectively.
    Type: Grant
    Filed: October 27, 2000
    Date of Patent: November 29, 2005
    Assignee: Oki Electric Industry Co., Ltd.
    Inventor: Yukio Tabei
  • Patent number: 6963837
    Abstract: An attribute-based speech recognition system is described. A speech pre-processor receives input speech and produces a sequence of acoustic observations representative of the input speech. A database of context-dependent acoustic models characterizes the probability of a given sequence of sounds producing the sequence of acoustic observations; each acoustic model includes phonetic attributes and suprasegmental non-phonetic attributes. A finite state language model characterizes the probability of a given sequence of words being spoken. A one-pass decoder compares the sequence of acoustic observations to the acoustic models and the language model, and outputs at least one word sequence representative of the input speech.
    Type: Grant
    Filed: October 6, 2000
    Date of Patent: November 8, 2005
    Assignee: Multimodal Technologies, Inc.
    Inventors: Michael Finke, Jurgen Fritsch, Detleff Koll, Alex Waibel
  • Patent number: 6950796
    Abstract: The invention provides a Hidden Markov Model (132) based automated speech recognition system (100) that dynamically adapts to changing background noise by detecting long pauses in speech and, for each pause, processing the background noise during the pause to extract a feature vector that characterizes it, identifying the Gaussian mixture component of the noise states that most closely matches the extracted feature vector, and updating the mean of that component so that it more closely matches the extracted feature vector, and consequently the current noise environment. Alternatively, the process may also be applied to refine the Gaussian mixtures associated with the other emitting states of the Hidden Markov Model.
    Type: Grant
    Filed: November 5, 2001
    Date of Patent: September 27, 2005
    Assignee: Motorola, Inc.
    Inventors: Changxue Ma, Yuan-Jun Wei
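    The adaptation step described above — match the closest mixture component, then pull its mean toward the observed noise feature — can be sketched as follows. The squared-Euclidean distance and the interpolation rate are illustrative assumptions, not the patent's exact update rule, and the mixture is simplified to a list of mean vectors.

    ```python
    def adapt_noise_model(mixture_means, feature, rate=0.1):
        """Find the mixture component whose mean is closest to the noise
        feature vector and move that mean toward the feature in place.
        Returns the index of the updated component."""
        def dist(mean):
            return sum((m - f) ** 2 for m, f in zip(mean, feature))
        idx = min(range(len(mixture_means)), key=lambda i: dist(mixture_means[i]))
        mean = mixture_means[idx]
        mixture_means[idx] = [m + rate * (f - m) for m, f in zip(mean, feature)]
        return idx
    ```

    Repeating this at every detected pause keeps the noise states tracking the current acoustic environment.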
  • Patent number: 6937981
    Abstract: A multiplicative distortion Hm(cep) is subtracted from a voice HMM 5, a multiplicative distortion Ha(cep) of the uttered voice is subtracted from a noise HMM 6, and the subtraction results Sm(cep) and {Nm(cep) − Ha(cep)} are combined with each other to form a combined HMM 18 in the cepstrum domain. A cepstrum R^a(cep), obtained by subtracting the multiplicative distortion Ha(cep) from the cepstrum Ra(cep) of the uttered voice, is compared with the distribution R^m(cep) of the combined HMM 18 in the cepstrum domain, and the combined HMM with the maximum likelihood is output as the voice recognition result.
    Type: Grant
    Filed: September 18, 2001
    Date of Patent: August 30, 2005
    Assignee: Pioneer Corporation
    Inventors: Hiroshi Seo, Mitsuya Komamura, Soichi Toyama
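    The subtractions above rely on a standard property: a multiplicative (channel) distortion in the spectral domain becomes additive in the cepstral domain, so it can be removed by vector subtraction. A minimal sketch under that assumption; the function name is illustrative.

    ```python
    def remove_channel(cepstrum, channel_cep):
        """Remove a multiplicative spectral distortion from a cepstral
        vector by elementwise subtraction of the channel cepstrum."""
        return [c - h for c, h in zip(cepstrum, channel_cep)]
    ```

    The same operation underlies each of the subtractions Sm(cep), Nm(cep) − Ha(cep), and Ra(cep) − Ha(cep) named in the abstract.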
  • Patent number: 6937983
    Abstract: The present invention discloses a computer-implemented method for understanding queries or commands spoken by users in natural language utterances similar to those people use spontaneously to communicate. More precisely, the invention discloses a method that identifies user queries or commands from the general information in spoken utterances directly within the speech recognition system, rather than by a post-process as is conventionally used. In a preparation phase, a vocabulary of items representing data and semantic identifiers is created, along with a syntax module holding valid combinations of items. When the system is in use, a user utterance is first discretized into a plurality of basic speech units, which are compared to the items in the vocabulary, and a combination of items is selected according to the evaluation from the syntax module in order to generate the most likely sequence of items representative of the user utterance.
    Type: Grant
    Filed: October 15, 2001
    Date of Patent: August 30, 2005
    Assignee: International Business Machines Corporation
    Inventor: Juan Rojas Romero
  • Patent number: 6931374
    Abstract: A method is disclosed that includes 1) defining a switching state space model for a continuous-valued hidden production-related parameter and the observed speech acoustics, and 2) approximating a posterior probability that provides the likelihood of a sequence of the hidden production-related parameters and a sequence of speech units based on a sequence of observed input values. In approximating the posterior probability, the boundaries of the speech units are not fixed but are optimally determined. Under one embodiment, a mixture-of-Gaussians approximation is used; in another embodiment, an HMM posterior approximation is used.
    Type: Grant
    Filed: April 1, 2003
    Date of Patent: August 16, 2005
    Assignee: Microsoft Corporation
    Inventors: Hagai Attias, Leo Jingyu Lee, Li Deng
  • Patent number: 6928409
    Abstract: A speech recognition system (10) having a sampler block (12) and a feature extractor block (14) for extracting time domain and spectral domain parameters from a spoken input speech into a feature vector. A polynomial expansion block (16) generates polynomial coefficients from the feature vector. A correlator block (20), a sequence vector block (22), an HMM table (24) and a Viterbi block (26) perform the actual speech recognition based on the speech units stored in a speech unit table (18) and the HMM word models stored in the HMM table (24). The HMM word model that produces the highest probability is determined to be the word that was spoken.
    Type: Grant
    Filed: May 31, 2001
    Date of Patent: August 9, 2005
    Assignee: Freescale Semiconductor, Inc.
    Inventors: David L. Barron, William Chunhung Yip
  • Patent number: 6922489
    Abstract: A method of interpreting an image using a statistical or probabilistic interpretation model is disclosed. The image has contextual information associated with it. The method comprises the following steps: providing the contextual information associated with the image for analysis; analyzing the contextual information to identify predetermined features relating to the image; and biasing the statistical or probabilistic interpretation model in accordance with the identified features.
    Type: Grant
    Filed: October 29, 1998
    Date of Patent: July 26, 2005
    Assignees: Canon Kabushiki Kaisha, Canon Information Systems Research Australia Pty. Ltd.
    Inventors: Alison Joan Lennon, Delphine Anh Dao Le
  • Patent number: 6922668
    Abstract: This invention relates to an improved method and apparatus for speaker recognition. In this invention, prior to comparing feature vectors derived from speech with a stored reference model, the feature vectors are processed by applying a speaker-dependent transform that matches the characteristics of a particular speaker's vocal tract. Features derived from speech with characteristics very dissimilar to those of the speaker on which the transform is dependent may be severely distorted by the transform, whereas features from speech with similar characteristics will be distorted far less.
    Type: Grant
    Filed: February 25, 2000
    Date of Patent: July 26, 2005
    Assignee: British Telecommunications Public Limited Company
    Inventor: Simon N. Downey