Markov Patents (Class 704/256)
  • Patent number: 6917919
    Abstract: A speech recognition method is described in which a basic set of models is adapted to a current speaker on the basis of speech data already observed from that speaker. The basic set of models comprises models for different acoustic units, each described by a plurality of model parameters. The basic set of models is represented by a supervector in a high-dimensional vector space (model space), the supervector being formed by concatenating the model parameters of all models of the basic set. The adaptation of this basic set of models to the speaker is carried out in the model space by means of a MAP method in which an asymmetric distribution in the model space is selected as the a priori distribution.
    Type: Grant
    Filed: September 24, 2001
    Date of Patent: July 12, 2005
    Assignee: Koninklijke Philips Electronics, N.V.
    Inventor: Henrik Botterweck
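The supervector idea above lends itself to a short illustration. The sketch below concatenates per-model parameters into one long vector and applies a generic MAP-style interpolation toward speaker statistics; the prior weight `tau`, the use of Gaussian means as the parameters, and the symmetric form of the update are illustrative assumptions and do not reproduce the asymmetric prior claimed in patent 6917919.

```python
import numpy as np

def build_supervector(models):
    """Concatenate the parameters of all acoustic-unit models
    into a single high-dimensional supervector."""
    return np.concatenate([m.ravel() for m in models])

def map_adapt(prior_supervector, speaker_stats, counts, tau=10.0):
    """Toy MAP-style update: interpolate the prior supervector with
    per-parameter speaker statistics, weighted by occupation counts.
    tau is an assumed prior weight; the patent's asymmetric prior
    is not modelled here."""
    counts = np.asarray(counts, dtype=float)
    return (counts * speaker_stats + tau * prior_supervector) / (counts + tau)

# Example: three 2-dimensional Gaussian mean vectors -> 6-dim supervector
models = [np.array([0.0, 1.0]), np.array([2.0, 2.0]), np.array([-1.0, 0.5])]
prior = build_supervector(models)
speaker_means = prior + 0.3                      # pretend speaker-specific estimates
occupancy = np.array([50, 5, 0.5]).repeat(2)     # per-parameter occupation counts
print(map_adapt(prior, speaker_means, occupancy))
```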
  • Patent number: 6912498
    Abstract: Correcting incorrect text associated with recognition errors in computer-implemented speech recognition includes receiving a selection of a word from a recognized utterance. The selection indicates a bound of a portion of the recognized utterance to be corrected. A first recognition correction is produced based on a comparison between a first alternative transcript and the recognized utterance. A second recognition correction is produced based on a comparison between a second alternative transcript and the recognized utterance. The duration of the first recognition correction differs from the duration of the second recognition correction. The portion of the recognition result that is replaced with one of the first recognition correction and the second recognition correction includes at one bound the word indicated by the selection and extends for the duration of the recognition correction with which the portion is replaced.
    Type: Grant
    Filed: May 2, 2001
    Date of Patent: June 28, 2005
    Assignee: ScanSoft, Inc.
    Inventors: Daniell Stevens, Robert Roth, Joel M. Gould, Michael J. Newman, Dean Sturtevant, Charles E. Ingold, David Abrahams, Allan Gold
  • Patent number: 6910013
    Abstract: The invention relates first of all to a method for identifying a transient acoustic scene, the method including the extraction, during an extraction phase, of characteristic features from an acoustic signal captured by at least one microphone (2a, 2b), and the identification, during an identification phase, of the transient acoustic scene on the basis of the extracted features. According to the invention, at least auditory-based features are extracted in the extraction phase. An application of the method according to the invention and a hearing device are also specified.
    Type: Grant
    Filed: January 5, 2001
    Date of Patent: June 21, 2005
    Assignee: Phonak AG
    Inventors: Sylvia Allegro, Michael Büchler
  • Patent number: 6901365
    Abstract: The invention enables even a CPU with low processing performance to find an HMM output probability by simplifying the arithmetic operations. The dimensions of an input vector are grouped into several sets, and a table is created for each set. When an output probability is calculated, codes corresponding to the first through n-th dimensions of the input vector are obtained sequentially, and for each code the corresponding table is referenced to obtain an output value for that table. By substituting the output values from each table into a formula for finding an output probability, the output probability is found.
    Type: Grant
    Filed: September 19, 2001
    Date of Patent: May 31, 2005
    Assignee: Seiko Epson Corporation
    Inventor: Yasunaga Miyazawa
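The table-lookup simplification described in patent 6901365 can be pictured with a small sketch. The split of the input vector into dimension groups, the nearest-neighbour codebooks, and the log-domain summation below are all assumptions chosen for illustration, not details taken from the patent.

```python
import numpy as np

def quantize(sub_vector, codebook):
    """Return the code (index of the nearest codebook entry) for one
    group of input-vector dimensions."""
    dists = np.linalg.norm(codebook - sub_vector, axis=1)
    return int(np.argmin(dists))

def output_log_prob(x, dim_groups, codebooks, tables):
    """Sum precomputed per-table log outputs instead of evaluating a
    full Gaussian mixture; this is the kind of simplification the
    abstract attributes to low-power CPUs."""
    total = 0.0
    for group, codebook, table in zip(dim_groups, codebooks, tables):
        code = quantize(x[group], codebook)
        total += table[code]           # precomputed log output value
    return total

# Illustrative 4-dimensional input split into two 2-dimensional groups
rng = np.random.default_rng(0)
dim_groups = [slice(0, 2), slice(2, 4)]
codebooks = [rng.normal(size=(8, 2)), rng.normal(size=(8, 2))]
tables = [rng.normal(size=8), rng.normal(size=8)]  # stand-in log values
x = rng.normal(size=4)
print(output_log_prob(x, dim_groups, codebooks, tables))
```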
  • Patent number: 6895380
    Abstract: An interactive voice actuated control system for a testing machine such as a tensile testing machine is described. Voice commands are passed through a user-command predictor and integrated with a graphical user interface control panel to allow hands-free operation. The user-command predictor learns operator command patterns on-line and predicts the most likely next action. It assists less experienced operators by recommending the next command, and it adds robustness to the voice command interpreter by verbally asking the operator to repeat unlikely commanded actions. The voice actuated control system applies to industrial machines whose normal operation is characterized by a nonrandom series of commands.
    Type: Grant
    Filed: March 2, 2001
    Date of Patent: May 17, 2005
    Assignee: Electro Standards Laboratories
    Inventor: Raymond Sepe, Jr.
  • Patent number: 6895377
    Abstract: A phonetic data processing system processes phonetic stream data to produce a set of semantic data, using a context-free rich semantic grammar database (RSG DB) that includes a grammar tree, comprised of sub-trees, representing words and phrases. A phonetic searcher accepts the phonetic estimates and searches the RSG DB to produce a best word list, which is processed by a semantic parser, using the RSG DB, to produce a semantic tree instance, including all valid interpretations of the phonetic stream. An application accesses a semantic tree evaluator to interpret the semantic tree instance according to a context to produce a final linguistic interpretation of the phonetic stream, which is returned to the application.
    Type: Grant
    Filed: March 23, 2001
    Date of Patent: May 17, 2005
    Assignee: Eliza Corporation
    Inventors: John Kroeker, Oleg Boulanov, Andrey Yelpatov
  • Patent number: 6882973
    Abstract: A voice processing system includes a speech recognition facility with barge-in. The system plays out a prompt to a caller, who starts to provide a spoken response while the prompt is still being played out. The system performs speech recognition on this response to determine a corresponding text, which is then subjected to lexical analysis. This tests whether the text satisfies one or more conditions, for example, including one or more words from a predefined set of task words. If this is found to be the case, the playing out of the prompt is terminated (i.e. barge-in is effected); otherwise, the playing out of the prompt is continued, essentially as if the caller had not interrupted.
    Type: Grant
    Filed: October 25, 2000
    Date of Patent: April 19, 2005
    Assignee: International Business Machines Corporation
    Inventor: John Brian Pickering
  • Patent number: 6871177
    Abstract: A method and apparatus of recognizing a pattern comprising a sequence of sub-patterns includes a set of possible patterns being modelled by a network of sub-pattern models. One or more initial software model objects are instantiated first. As these models produce outputs, succeeding model objects are instantiated if they have not already been instantiated. However, the succeeding model objects are only instantiated if a triggering model output meets a predetermined criterion. This ensures that the processing required is maintained at a manageable level. If the models comprise finite state networks, pruning of internal states may also be performed. The criterion applied to this pruning is less harsh than that applied when determining whether to instantiate a succeeding model.
    Type: Grant
    Filed: October 27, 1998
    Date of Patent: March 22, 2005
    Assignee: British Telecommunications public limited company
    Inventors: Simon A Hovell, Mark Wright, Simon P. A Ringland
  • Patent number: 6862359
    Abstract: A hearing prosthesis is provided that automatically adjusts itself to the surrounding listening environment by applying Hidden Markov Models. In one aspect, classification results are utilized to support automatic adjustment of a parameter or parameters of a predetermined signal processing algorithm executed by processing means of the hearing prosthesis. According to another aspect, feature vectors extracted from a digital input signal of the hearing prosthesis and processed by the Hidden Markov Models represent signal features of the digital input signal that are substantially independent of level and/or absolute spectrum shape. This level-independent property of the extracted feature vectors provides robust classification results in real-life acoustic environments.
    Type: Grant
    Filed: May 29, 2002
    Date of Patent: March 1, 2005
    Assignee: GN ReSound A/S
    Inventors: Nils Peter Nordqvist, Arne Leijon
  • Patent number: 6853962
    Abstract: Training apparatus for training a user to engage in transactions (e.g. a foreign language conversation) with another person whom the apparatus is arranged to simulate, the apparatus comprising: an input for receiving input dialogue from a user; a lexical store containing data relating to individual words of said input dialogue; a rule store containing rules specifying grammatically allowable relationships between words of said input dialogue; a transaction store containing data relating to allowable transactions between said user and said person; a processor arranged to process the input dialogue to recognise the occurrence therein of words contained in said lexical store in the relationships specified by the rules contained in said rule store in accordance with the data specified in the transaction store, and to generate output dialogue indicating when correct input dialogue has been recognised; and an output device for making the output dialogue available to the user.
    Type: Grant
    Filed: September 11, 1997
    Date of Patent: February 8, 2005
    Assignee: British Telecommunications public limited company
    Inventor: Stephen C Appleby
  • Patent number: 6850888
    Abstract: A method and apparatus are disclosed for training a pattern recognition system, such as a speech recognition system, using an improved objective function. The concept of rank likelihood, previously applied only to the decoding process, is applied in a novel manner to the parameter estimation of the training phase of a pattern recognition system. The disclosed objective function is based on a pseudo-rank likelihood that not only maximizes the likelihood of an observation for the correct class, but also minimizes the likelihoods of the observation for all other classes, such that the discrimination between classes is maximized. A training process is disclosed that utilizes the pseudo-rank likelihood objective function to identify model parameters that will result in a pattern recognizer with the lowest possible recognition error rate. The discrete nature of the rank-based likelihood objective function is transformed to allow the parameter estimates to be optimized during the training phase.
    Type: Grant
    Filed: October 6, 2000
    Date of Patent: February 1, 2005
    Assignee: International Business Machines Corporation
    Inventors: Yuqing Gao, Yongxin Li, Michael Alan Picheny
  • Patent number: 6847921
    Abstract: The invention is a method for analyzing spatially-varying noise in seismic data. Transitions between data values at adjacent data locations in a seismic data set are represented by Markov chains. Transition probability matrices are constructed from the Markov chains. Data values are predicted from the calculated transition probabilities. Noise values are determined from the predicted data values.
    Type: Grant
    Filed: April 7, 2003
    Date of Patent: January 25, 2005
    Assignee: ExxonMobil Upstream Research Company
    Inventors: Alex Woronow, John F. Schuette, Chrysanthe S. Munn
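A toy version of the workflow in patent 6847921 above, with invented details: quantize a trace into discrete levels, count transitions between adjacent locations to build a transition probability matrix, predict each value from its predecessor, and take the residual as a noise estimate. The quantization scheme, smoothing, and expected-value predictor are assumptions.

```python
import numpy as np

def transition_matrix(symbols, n_states):
    """Estimate a row-stochastic transition matrix by counting
    transitions between quantized values at adjacent locations."""
    counts = np.ones((n_states, n_states))        # +1 smoothing (assumed)
    for a, b in zip(symbols[:-1], symbols[1:]):
        counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def predict_next(symbols, P, levels):
    """Predict each value from its predecessor as the expected level
    under the transition probabilities."""
    preds = [levels[symbols[0]]]
    for prev in symbols[:-1]:
        preds.append(float(P[prev] @ levels))
    return np.array(preds)

# Quantize a synthetic trace into 8 amplitude levels
trace = np.sin(np.linspace(0, 20, 200)) + 0.1 * np.random.default_rng(1).normal(size=200)
levels = np.linspace(trace.min(), trace.max(), 8)
symbols = np.argmin(np.abs(trace[:, None] - levels[None, :]), axis=1)

P = transition_matrix(symbols, 8)
predicted = predict_next(symbols, P, levels)
noise_estimate = levels[symbols] - predicted      # residual ~ spatially varying noise
print(noise_estimate[:5])
```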
  • Patent number: 6845357
    Abstract: Data structures, systems, and methods are aspects of pattern recognition using observable operator models (OOMs). OOMs are more efficient than Hidden Markov Models (HMMs). A data structure for an OOM has characteristic events, an initial distribution vector, a probability transition matrix, an occurrence count matrix, and at least one observable operator. System applications include computer systems, cellular phones, wearable computers, home control systems, fire safety or security systems, PDAs, and flight systems. A method of pattern recognition comprises training OOMs, receiving unknown input, computing matching probabilities, selecting the maximum probability, and displaying the match. A method of speech recognition comprises sampling a first input stream, performing a spectral analysis, clustering, training OOMs, and recognizing speech using the OOMs.
    Type: Grant
    Filed: July 24, 2001
    Date of Patent: January 18, 2005
    Assignee: Honeywell International Inc.
    Inventors: Ravindra K. Shetty, Venkatesan Thyagarajan
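The observable operator model referenced in patent 6845357 computes the probability of a symbol sequence by applying one operator matrix per observed symbol to an initial distribution vector and summing the result. The minimal sketch below uses made-up 2x2 operators; the training and clustering steps mentioned in the abstract are not shown.

```python
import numpy as np

def sequence_probability(operators, w0, sequence):
    """P(a1..an) = 1^T * tau_{an} * ... * tau_{a1} * w0 for an
    observable operator model (OOM)."""
    state = np.array(w0, dtype=float)
    for symbol in sequence:
        state = operators[symbol] @ state
    return float(state.sum())

# A tiny 2-state OOM over symbols {0, 1}; the operators sum to a matrix
# with unit column sums so that probabilities over all sequences of a
# given length sum to 1 (illustrative numbers only).
tau0 = np.array([[0.4, 0.1],
                 [0.1, 0.3]])
tau1 = np.array([[0.3, 0.2],
                 [0.2, 0.4]])
w0 = np.array([0.6, 0.4])            # initial distribution vector

print(sequence_probability({0: tau0, 1: tau1}, w0, [0, 1, 1]))
```

Pattern recognition then proceeds as the abstract describes: compute this matching probability under each trained OOM and select the model with the maximum value.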
  • Publication number: 20040267530
    Abstract: Methods are given for improving discriminative training of hidden Markov models for continuous speech recognition. In one approach, discriminatively trained mixture models are interpolated with maximum likelihood trained mixture models. In another approach, segmentation and recognition results from one set of models are reused to discriminatively train a second set of models. For example, segmentation and recognition results from detailed match models are mapped and used to discriminatively train fast match models. In addition, gradients for the standard deviation of mixture components are clipped based on the statistics of the gradients. Pronunciation of words may also be used to determine the “incorrect” recognition hypothesis.
    Type: Application
    Filed: November 21, 2003
    Publication date: December 30, 2004
    Inventors: Chuang He, Jianxiong Wu, Vlad Sejnoha
  • Patent number: 6832190
    Abstract: In the recognition of spoken language, phonemes of the language are modelled by hidden Markov models. A modified hidden Markov model includes a conditional probability of a feature vector dependent on chronologically preceding feature vectors and, optionally, additionally comprises a conditional probability of the respective current state. A global search for recognizing a word sequence in the spoken language is implemented with the modified hidden Markov model.
    Type: Grant
    Filed: November 10, 2000
    Date of Patent: December 14, 2004
    Assignee: Siemens Aktiengesellschaft
    Inventors: Jochen Junkawitsch, Harald Höge
  • Patent number: 6832191
    Abstract: To implement a speech recognizer for a language for which related speech training material is substantially unavailable, the first step (1, 2) is to build, based on related speech training material, a multilingual speech recognizer (2) for a plurality of known languages. The recognizer for the given language (5) is then implemented by interpolation (4) starting from the said multilingual recognizer (2). The recognizer (5) generated in this fashion can subsequently be refined based on related speech training material acquired online (4) during later use.
    Type: Grant
    Filed: August 28, 2000
    Date of Patent: December 14, 2004
    Assignee: Telecom Italia Lab S.p.A.
    Inventors: Alessandra Frasca, Giorgio Micca, Enrico Palme
  • Publication number: 20040236577
    Abstract: To provide an acoustic model that can absorb the fluctuation of a phonemic environment over an interval longer than a syllable, while keeping the number of parameters of the acoustic model small, a phoneme-connected syllable HMM/syllable-connected HMM set is generated as follows. A phoneme-connected syllable HMM set corresponding to individual syllables is generated by combining phoneme HMMs. A preliminary experiment is conducted using the phoneme-connected syllable HMM set and training speech data. Any misrecognized syllable and the syllable preceding it are checked using the results of the preliminary experiment and syllable label data. The combination of the correct syllable for the misrecognized syllable and the preceding syllable is extracted as a syllable connection. A syllable-connected HMM corresponding to this syllable connection is added to the phoneme-connected syllable HMM set.
    Type: Application
    Filed: March 8, 2004
    Publication date: November 25, 2004
    Applicant: Seiko Epson Corporation
    Inventors: Masanobu Nishitani, Yasunaga Miyazawa, Hiroshi Matsumoto, Kazumasa Yamamoto
  • Patent number: 6823304
    Abstract: A lead consonant buffer stores a feature parameter preceding a lead voiced sound detected by a voiced sound detector as a feature parameter of a lead consonant. A matching processing unit performs matching processing of a feature parameter of a lead consonant stored in the lead consonant buffer with a feature parameter of a registered pattern. Hence, the matching processing unit can perform matching processing that reflects information on a lead consonant even when no lead consonant can be detected due to noise.
    Type: Grant
    Filed: July 19, 2001
    Date of Patent: November 23, 2004
    Assignee: Renesas Technology Corp.
    Inventor: Masahiko Ikeda
  • Patent number: 6823308
    Abstract: A speech recognition method for use in a multimodal input system comprises receiving a multimodal input comprising digitized speech as a first modality input and data in at least one further modality input. Features in the speech and in the data in at least one further modality are identified. The identified features in the speech and in the data are used in the recognition of words by comparing the identified features with states in models for the words. The models have states for the recognition of speech and for words having features in at least one further modality associated with the words; the models also have states for the recognition of events in the further modality or each further modality.
    Type: Grant
    Filed: February 16, 2001
    Date of Patent: November 23, 2004
    Assignee: Canon Kabushiki Kaisha
    Inventors: Robert Alexander Keiller, Nicolas David Fortescue
  • Patent number: 6823306
    Abstract: Techniques for generating, distributing, and using speech recognition models are described. A shared speech processing facility is used to support speech recognition for a wide variety of devices with limited capabilities including business computer systems, personal data assistants, etc., which are coupled to the speech processing facility via a communications channel, e.g., the Internet. Devices with audio capture capability record and transmit to the speech processing facility, via the Internet, digitized speech and receive speech processing services, e.g., speech recognition model generation and/or speech recognition services, in response. The Internet is used to return speech recognition models and/or information identifying recognized words or phrases. Thus, the speech processing facility can be used to provide speech recognition capabilities to devices without such capabilities and/or to augment a device's speech processing capability.
    Type: Grant
    Filed: November 30, 2000
    Date of Patent: November 23, 2004
    Assignee: Telesector Resources Group, Inc.
    Inventors: Craig Reding, Suzi Levas
  • Patent number: 6801892
    Abstract: Disclosed is a speech recognition method in a speech recognition apparatus for applying speech recognition to a voice signal applied thereto. The input voice signal is converted from an analog to a digital signal and sequences of feature vectors are extracted based upon the digital signal (S12). A search space is defined by the sequences of feature vectors and an HMM (16) prepared beforehand for each unit of speech. The search space allows a transition between HMMs only in specific feature-vector sequences. A search is conducted in this space to find the optimum path for which the largest acoustic likelihood regarding the voice signal is obtained, to find the result of recognition (S14), and this result is output (S15).
    Type: Grant
    Filed: March 27, 2001
    Date of Patent: October 5, 2004
    Assignee: Canon Kabushiki Kaisha
    Inventor: Hiroki Yamamoto
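The search step described in patent 6801892 builds on the standard Viterbi dynamic-programming search over an HMM state lattice. The generic sketch below shows that search with made-up scores; the patent's specific restriction on transitions between HMMs is not reproduced.

```python
import numpy as np

def viterbi(log_obs, log_trans, log_init):
    """Find the state path with maximal accumulated acoustic likelihood.
    log_obs:   (T, N) per-frame log-likelihoods of each of N states
    log_trans: (N, N) log transition probabilities
    log_init:  (N,)   log initial-state probabilities"""
    T, N = log_obs.shape
    score = log_init + log_obs[0]
    backptr = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_trans          # (from_state, to_state)
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + log_obs[t]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1], float(score.max())

# Tiny 3-state left-to-right example with made-up numbers
rng = np.random.default_rng(2)
log_obs = np.log(rng.dirichlet(np.ones(3), size=6))
log_trans = np.log(np.array([[0.8, 0.2, 0.0],
                             [0.0, 0.8, 0.2],
                             [0.0, 0.0, 1.0]]) + 1e-12)
log_init = np.log(np.array([1.0, 1e-12, 1e-12]))
print(viterbi(log_obs, log_trans, log_init))
```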
  • Patent number: 6801895
    Abstract: The present invention provides for a method and apparatus for segmenting a multi-media program based upon audio events. In an embodiment, a method of classifying an audio stream is provided. This method includes receiving an audio stream, sampling the audio stream at a predetermined rate, and then combining a predetermined number of samples into a clip. A plurality of features are then determined for the clip and are analyzed using a linear approximation algorithm. The clip is then characterized based upon the results of the analysis conducted with the linear approximation algorithm.
    Type: Grant
    Filed: December 6, 1999
    Date of Patent: October 5, 2004
    Assignee: AT&T Corp.
    Inventors: Qian Huang, Zhu Liu
  • Patent number: 6801890
    Abstract: The invention relates to a method for enhancing recognition probability in voice recognition systems. According to the inventive method, selective post-training of the already stored homonymic term is carried out after inputting a term to be recognized. This makes it possible to improve the speaker-dependent recognition rate even in environments with prevailing acoustic interference.
    Type: Grant
    Filed: November 13, 2000
    Date of Patent: October 5, 2004
    Assignees: DeTeMobil, Deutsche Telekom MobilNet GmbH, Deutsche Telekom AG
    Inventors: Ulrich Kauschke, Herbert Roland Rast, Fred Runge
  • Publication number: 20040193418
    Abstract: The present invention comprises a system and method for implementing a Cantonese speech recognizer with an optimized phone set, and may include a recognizer configured to compare input speech data to phone strings from a vocabulary dictionary that is implemented according to an optimized Cantonese phone set. The optimized Cantonese phone set may be implemented with a phonetic technique to separately include consonantal phones and vocalic phones. For reasons of system efficiency, the optimized Cantonese phone set may preferably be implemented in a compact manner to include only a minimum required number of consonantal phones and vocalic phones to accurately represent Cantonese speech during the speech recognition procedure.
    Type: Application
    Filed: March 24, 2003
    Publication date: September 30, 2004
    Applicant: Sony Corporation and Sony Electronics Inc.
    Inventors: Michael Emonts, Xavier Menendez-Pidal, Lex Olorenshaw
  • Publication number: 20040193419
    Abstract: A method and system for training an audio analyzer (114) to identify asynchronous segments of audio types using sample data sets, the sample data sets being representative of audio signals for which segmentation is desired. The system and method then label asynchronous segments of audio samples, collected at the target site, into a plurality of categories by cascading hidden Markov models (HMM). The cascaded HMMs consist of 2 stages, the output of the first stage HMM (208) being transformed and used as observation inputs to the second stage HMM (212). This cascaded HMM approach allows for modeling processes with complex temporal characteristics by using training data. It also contains a flexible framework that allows for segments of varying duration. The system and method are particularly useful in identifying and separating segments of the human voice for voice recognition systems from other audio such as music.
    Type: Application
    Filed: March 31, 2003
    Publication date: September 30, 2004
    Inventors: Steven F. Kimball, Joanne Como
  • Publication number: 20040186718
    Abstract: Method and apparatus for an audiovisual continuous speech recognition (AVCSR) system using a coupled hidden Markov model (CHMM) are described herein. In one aspect, an exemplary process includes receiving an audio data stream and a video data stream, and performing continuous speech recognition based on the audio and video data streams using a plurality of hidden Markov models (HMMs), a node of each of the HMMs at a time slot being subject to one or more nodes of related HMMs at a preceding time slot. Other methods and apparatuses are also described.
    Type: Application
    Filed: March 19, 2003
    Publication date: September 23, 2004
    Inventors: Ara Victor Nefian, Xiaoxing Liu, Xiaobo Pi, Luhong Liang, Yibao Zhao
  • Publication number: 20040186717
    Abstract: A method of reconstructing a damaged sequence of symbols where some symbols are missing is provided in which statistical parameters of the sequence are used with confidence windowing techniques to quickly and efficiently reconstruct the damaged sequence to its original form. Confidence windowing techniques are provided that are equivalent to generalized hidden semi-Markov models but which are more easily used to determine the most likely missing symbol at a given point in the damaged sequence being reconstructed. The method can be used to reconstruct communications consisting of speech, music, digital transmission symbols and others having a bounded symbol set which can be described by statistical behaviors in the symbol stream.
    Type: Application
    Filed: March 17, 2003
    Publication date: September 23, 2004
    Applicant: Rensselaer Polytechnic Institute
    Inventors: Michael Savic, Michael Moore
  • Publication number: 20040181409
    Abstract: To make speech recognition robust in a noisy environment, a variable parameter Gaussian Mixture HMM is described which extends existing HMMs by allowing HMM parameters to change as a function of a continuous variable that depends on the environment. Specifically, in one embodiment the function is a polynomial and the environment is described by the signal-to-noise ratio. The use of the parameter functions improves the HMM discriminability during multi-condition training. In the recognition process, a set of HMM parameters is instantiated according to the parameter functions, based on the current environment. The model parameters are estimated using the Expectation-Maximization algorithm for the variable parameter GMHMM.
    Type: Application
    Filed: March 11, 2003
    Publication date: September 16, 2004
    Inventors: Yifan Gong, Xiaodong Cui
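The central idea of publication 20040181409, a model parameter expressed as a polynomial function of the signal-to-noise ratio and instantiated for the current environment, can be sketched briefly. The class below fits a second-order polynomial per mean component by ordinary least squares, which is only a stand-in for the EM-based estimation the application describes; the class name and all numbers are invented.

```python
import numpy as np

class VariableParameterMean:
    """Gaussian mean modelled as a polynomial in the SNR (in dB).
    The application estimates the coefficients with EM during
    multi-condition training; ordinary least squares is used here
    purely for illustration."""
    def __init__(self, order=2):
        self.order = order
        self.coeffs = None                     # shape (order+1, dim)

    def fit(self, snrs, means):
        X = np.vander(np.asarray(snrs), self.order + 1)   # polynomial design matrix
        self.coeffs, *_ = np.linalg.lstsq(X, np.asarray(means), rcond=None)

    def instantiate(self, snr):
        """Return the mean to use for the current environment."""
        return np.vander([snr], self.order + 1)[0] @ self.coeffs

# Means observed at three training SNR conditions (made-up 2-dim data)
vpm = VariableParameterMean(order=2)
vpm.fit(snrs=[5.0, 15.0, 25.0],
        means=[[1.0, 0.2], [1.4, 0.1], [1.5, 0.05]])
print(vpm.instantiate(snr=10.0))
```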
  • Publication number: 20040181410
    Abstract: A speech recognition system recognizes filled pause utterances made by a speaker. In one embodiment, an ergodic model is used to acoustically model filled pauses, providing flexibility that allows varying utterances of the filled pauses to be made. The ergodic HMM model can also be used for other types of noise such as, but not limited to, breathing, keyboard operation, microphone noise, laughter, door openings and/or closings, or any other noise occurring in the environment of the user or made by the user. Similarly, silence can be modeled using an ergodic HMM model. Recognition can be used with N-gram, context-free grammar or hybrid language models.
    Type: Application
    Filed: March 13, 2003
    Publication date: September 16, 2004
    Applicant: Microsoft Corporation
    Inventor: Mei-Yuh Hwang
  • Publication number: 20040176956
    Abstract: A pattern recognition system and method are provided. Aspects of the invention are particularly useful in combination with multi-state Hidden Markov Models. Pattern recognition is effected by processing Hidden Markov Model Blocks. This block-processing allows the processor to perform more operations upon data while such data is in cache memory. By so increasing cache locality, aspects of the invention provide significantly improved pattern recognition speed.
    Type: Application
    Filed: March 4, 2003
    Publication date: September 9, 2004
    Applicant: Microsoft Corporation
    Inventors: William H. Rockenbeck, Julian J. Odell
  • Patent number: 6789063
    Abstract: In some embodiments, the invention involves receiving phonetic samples and assembling a two-level phonetic decision tree structure using the phonetic samples. The decision tree has multiple leaf node levels each having at least one state, wherein at least one node in a second level is assigned a Gaussian of a node in the first level, but the at least one node in the second level has a weight computed for it.
    Type: Grant
    Filed: September 1, 2000
    Date of Patent: September 7, 2004
    Assignee: Intel Corporation
    Inventor: Yonghong Yan
  • Patent number: 6789066
    Abstract: An arrangement is provided for compressing speech data. Speech data is compressed based on a phoneme stream, detected from the speech data, and a delta stream, determined based on the difference between the speech data and a speech signal stream, generated using the phoneme stream with respect to a voice font. The compressed speech data is decompressed into a decompressed phoneme stream and a decompressed delta stream from which the speech data is recovered.
    Type: Grant
    Filed: September 25, 2001
    Date of Patent: September 7, 2004
    Assignee: Intel Corporation
    Inventors: Stephen Junkins, Chris L. Gorman
  • Patent number: 6789061
    Abstract: Computer-based methods and systems are provided for automatically generating, from a first speech recognizer, a second speech recognizer such that the second speech recognizer is tailored to a certain application and requires reduced resources compared to the first speech recognizer. The invention exploits the first speech recognizer's set of states si and set of probability density functions (pdfs) assembling output probabilities for an observation of a speech frame in said states si. The invention teaches a first step of generating a set of states of the second speech recognizer reduced to a subset of states of the first speech recognizer being distinctive of the certain application. The invention teaches a second step of generating a set of probability density functions of the second speech recognizer reduced to a subset of probability density functions of the first speech recognizer being distinctive of the certain application.
    Type: Grant
    Filed: August 14, 2000
    Date of Patent: September 7, 2004
    Assignee: International Business Machines Corporation
    Inventors: Volker Fischer, Siegfried Kunzmann, Claire Waast-Ricard
  • Patent number: 6778959
    Abstract: A system and method for speech verification using out-of-vocabulary models includes a speech recognizer that has a model bank with system vocabulary word models, a garbage model, and one or more noise models. The model bank may reject an utterance or other sound as an invalid vocabulary word when the model bank identifies the utterance or other sound as corresponding to the garbage model or the noise models. Initial noise models may be selectively combined into a pre-determined number of final noise model clusters to effectively reduce the number of noise models that are utilized by the model bank of the speech recognizer to verify system vocabulary words.
    Type: Grant
    Filed: October 18, 2000
    Date of Patent: August 17, 2004
    Assignees: Sony Corporation, Sony Electronics Inc.
    Inventors: Duanpei Wu, Lex Olorenshaw, Xavier Menendez-Pidal, Ruxin Chen
  • Patent number: 6772120
    Abstract: Computer method and apparatus for segmenting text streams is disclosed. Given is an input text stream formed of a series of words. A probability member provides working probabilities that a group of words is of a topic selected from a plurality of predetermined topics. The probability member accounts for relationships between words. A processing module receives the input text stream and using the probability member determines probability of certain words in the input text stream being of a same topic. As such, the processing module segments the input text stream into single topic groupings of words, where each grouping is of a respective single topic.
    Type: Grant
    Filed: November 21, 2000
    Date of Patent: August 3, 2004
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Pedro J. Moreno, David M. Blei
  • Publication number: 20040143434
    Abstract: A method segments and summarizes a news video using both audio and visual features extracted from the video. The summaries can be used to quickly browse the video to locate topics of interest. A generalized sound recognition hidden Markov model (HMM) framework for joint segmentation and classification of the audio signal of the news video is used. The HMM not only provides a classification label for each audio segment, but also provides compact state duration histogram descriptors.
    Type: Application
    Filed: January 17, 2003
    Publication date: July 22, 2004
    Inventors: Ajay Divakaran, Regunathan Radhakrishnan
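The compact state duration histogram descriptors mentioned in publication 20040143434 can be illustrated with a short sketch: after decoding, each audio segment is summarized by counting, for every HMM state, how long the state path stayed in that state before leaving it. The representation below (a fixed-size histogram per state) is an assumption for illustration.

```python
import numpy as np

def state_duration_histogram(state_path, n_states, max_duration=20):
    """Build a per-state duration histogram: for each HMM state, count
    how often the decoded path stayed in that state for 1, 2, ... frames
    before leaving it."""
    hist = np.zeros((n_states, max_duration), dtype=int)
    run_state, run_len = state_path[0], 1
    for s in state_path[1:]:
        if s == run_state:
            run_len += 1
        else:
            hist[run_state, min(run_len, max_duration) - 1] += 1
            run_state, run_len = s, 1
    hist[run_state, min(run_len, max_duration) - 1] += 1
    return hist

# Decoded state sequence for one audio segment (made-up labels 0..2)
path = [0, 0, 0, 1, 1, 2, 2, 2, 2, 0, 0]
print(state_duration_histogram(path, n_states=3))
```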
  • Publication number: 20040143435
    Abstract: A method of speech recognition is provided that determines a production-related value, vocal-tract resonance frequencies in particular, for a state at a particular frame based on the production-related values associated with two preceding frames using a recursion. The production-related value is used to determine a probability distribution of the observed feature vector for the state. A probability for an observed value received for the frame is then determined from the probability distribution. Under one embodiment, the production-related value is determined using a noise-free recursive definition for the value. Use of the recursion substantially improves the decoding speed. When the decoding algorithm is applied to training data with known phonetic transcripts, forced alignment is created which improves the phone segmentation obtained from the prior art.
    Type: Application
    Filed: January 21, 2003
    Publication date: July 22, 2004
    Inventors: Li Deng, Jian-Iai Zhou, Frank Torsten Bernd Seide
  • Publication number: 20040122672
    Abstract: The Gaussian Dynamic Time Warping model provides a hierarchical statistical model for representing an acoustic pattern. The first layer of the model represents the general acoustic space; the second layer represents each speaker space; and the third layer represents the temporal structure information contained in each enrollment speech utterance, based on equally-spaced time intervals. These three layers are hierarchically developed: the second layer is derived from the first, and the third layer is derived from the second. The model is useful in speech processing applications, particularly in applications such as word and speaker recognition using a spotting recognition mode.
    Type: Application
    Filed: December 18, 2002
    Publication date: June 24, 2004
    Inventors: Jean-Francois Bonastre, Philippe Morin, Jean-Claude Junqua
  • Patent number: 6754629
    Abstract: A method and system that combines voice recognition engines and resolves differences between the results of individual voice recognition engines using a mapping function. Speaker independent voice recognition engines and speaker-dependent voice recognition engines are combined. Hidden Markov Model (HMM) engines and Dynamic Time Warping (DTW) engines are combined.
    Type: Grant
    Filed: September 8, 2000
    Date of Patent: June 22, 2004
    Assignee: Qualcomm Incorporated
    Inventors: Yingyong Qi, Ning Bi, Harinath Garudadri
  • Publication number: 20040117187
    Abstract: An assembly of word models produced by a word model producer is sent to a matching object word selector to select one word model as the matching object. A word matching processor judges whether or not the score of a path root of a present state serving as matching object is within a predetermined range that is set based on a maximum value of the score, which is memorized in a maximum value memory buffer connected to the word matching processor. When the score of the path root is within the above-described range, the score of this path root is designated as a count object and a cumulative score is obtained. On the other hand, when the score of the path root is outside the above-described range, calculation of the score for the state of the matching object is omitted.
    Type: Application
    Filed: July 7, 2003
    Publication date: June 17, 2004
    Applicant: Renesas Technology Corp.
    Inventor: Masahiko Ikeda
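The pruning rule in publication 20040117187 amounts to a beam search: a path root's score is only extended when it lies within a fixed range of the best score recorded so far. The sketch below is a schematic rendering of that rule; the beam width, score sign convention, and dictionary bookkeeping are assumptions.

```python
def prune_and_accumulate(path_scores, local_scores, beam_width=50.0):
    """Keep only path roots whose score is within `beam_width` of the
    current maximum; skip the score calculation for the rest, as the
    abstract above describes."""
    best = max(path_scores.values())            # maximum-value memory buffer
    cumulative = {}
    for state, score in path_scores.items():
        if score < best - beam_width:
            continue                            # outside the range: omit calculation
        cumulative[state] = score + local_scores[state]
    return cumulative

# Made-up log-domain scores for four candidate path roots
path_scores = {"s0": -120.0, "s1": -135.0, "s2": -210.0, "s3": -128.0}
local_scores = {"s0": -10.0, "s1": -12.0, "s2": -8.0, "s3": -15.0}
print(prune_and_accumulate(path_scores, local_scores))   # s2 is pruned
```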
  • Publication number: 20040111263
    Abstract: The invention provides an acoustic model creating method that can reduce the number of parameters and optimize the Gaussian distribution number for the respective states constituting an HMM, in order to create an HMM having high recognition ability. HMM sets in which the Gaussian distribution numbers of the respective states constituting the respective syllable HMMs are set from one to the maximum distribution number (here, 64) are trained using training speech data, and the respective states of the respective HMMs are Viterbi-aligned with the training speech data corresponding to the HMMs, using the syllable HMM set with the maximum distribution number among the trained syllable HMM sets. A description length computing unit then computes a description length for the respective states of the respective HMMs using the alignment data, and a state selecting unit selects the state having the distribution number whose description length is minimum.
    Type: Application
    Filed: September 17, 2003
    Publication date: June 10, 2004
    Applicant: SEIKO EPSON CORPORATION
    Inventors: Masanobu Nishitani, Yasunaga Miyazawa, Hiroshi Matsumoto, Kazumasa Yamamoto
  • Publication number: 20040102974
    Abstract: The speech recognition rate that is necessary for a selected speech recognition application is determined. The information content of the feature vector components that is at least necessary to ensure this speech recognition rate is determined using stored speech recognition rate information. The number of feature vector components necessary to provide the determined information content is determined, and the speech recognition is carried out using feature vectors with the determined required number of feature vector components.
    Type: Application
    Filed: September 16, 2003
    Publication date: May 27, 2004
    Inventors: Michael Kustner, Ralf Sambeth
  • Patent number: 6735588
    Abstract: An information search method and apparatus employ an Inverse Hidden Markov Model (IHMM) for stochastically searching for a reference information model among a plurality of predetermined reference information models obtained by training that best matches unknown information which is expressed by a Hidden Markov Model (HMM) chain. The method and apparatus find an optimal path in a HMM state lattice using a minimum unlikelihood score, rather than a maximum likelihood score, and using a Viterbi algorithm, to recognize unknown information, so that unnecessary computations are avoided. The method and apparatus can be used for finding the most likely path through a vocabulary network for a given utterance.
    Type: Grant
    Filed: May 14, 2001
    Date of Patent: May 11, 2004
    Assignees: Samsung Electronics Co., Ltd., Sungkyunkwan University
    Inventors: Bo-Sung Kim, Jun-dong Cho, Young-hoon Chang, Sun-hee Park
  • Patent number: 6735566
    Abstract: A system for learning a mapping between time-varying signals is used to drive facial animation directly from speech, without laborious voice track analysis. The system learns dynamical models of facial and vocal action from observations of a face and the facial gestures made while speaking. Instead of depending on heuristic intermediate representations such as phonemes or visemes, the system trains hidden Markov models to obtain its own optimal representation of vocal and facial action. An entropy-minimizing training technique using an entropic prior ensures that these models contain sufficient dynamical information to synthesize realistic facial motion to accompany new vocal performances. In addition, they can make optimal use of context to handle ambiguity and relatively long-lasting facial co-articulation effects.
    Type: Grant
    Filed: October 9, 1998
    Date of Patent: May 11, 2004
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventor: Matthew E. Brand
  • Publication number: 20040083104
    Abstract: A system (100) provides speaker identification training. The system (100) generates speaker models and receives audio segments. The system (100) identifies speakers corresponding to the audio segments based on the speaker models. At least one of the audio segments has an unidentified or misidentified speaker (i.e., an audio segment whose speaker cannot be accurately identified). The system (100) presents, to a user, audio segments that include an audio segment whose speaker is unidentified or misidentified and receives, from the user, the name of the unidentified or misidentified speaker. The system (100) may use this information to subsequently identify the unidentified or misidentified speaker by name for future audio segments.
    Type: Application
    Filed: October 16, 2003
    Publication date: April 29, 2004
    Inventors: Daben Liu, Francis G. Kubala
  • Patent number: 6728674
    Abstract: A method and a system for corrective training of speech models include changing the weight of a data sample whenever the data sample is incorrectly associated with a classifier, and retraining each classifier with the weights.
    Type: Grant
    Filed: July 31, 2000
    Date of Patent: April 27, 2004
    Assignee: Intel Corporation
    Inventor: Meir Griniasty
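The corrective training loop in patent 6728674, increase the weight of misclassified samples and retrain with the new weights, resembles boosting-style reweighting. The sketch below shows one such reweighting pass under assumed details (a multiplicative boost factor and renormalization); it is not the patent's exact update.

```python
def reweight(samples, weights, labels, classify, boost=2.0):
    """One corrective pass: increase the weight of every sample the
    current classifier gets wrong, then renormalize. The multiplicative
    boost factor is an assumption, not the patent's update rule."""
    new_weights = [w * boost if classify(x) != y else w
                   for x, w, y in zip(samples, weights, labels)]
    total = sum(new_weights)
    return [w / total for w in new_weights]

# Toy 1-D example with a deliberately misplaced decision threshold
samples = [0.2, 0.8, 0.4, 0.9]
labels = ["a", "b", "a", "b"]
weights = [0.25, 0.25, 0.25, 0.25]
classify = lambda x: "a" if x < 0.3 else "b"   # misclassifies the 0.4 sample
print(reweight(samples, weights, labels, classify))
```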
  • Patent number: 6725195
    Abstract: Probabilistic recognition using clusters and simple probability functions provides improved performance by employing a limited number of clusters each using a relatively large number of simple probability functions. The simple probability functions for each of the limited number of state clusters are greater in number than the limited number of state clusters.
    Type: Grant
    Filed: October 22, 2001
    Date of Patent: April 20, 2004
    Assignee: SRI International
    Inventors: Ananth Sankar, Venkata Ramana Rao Gadde
  • Patent number: 6714910
    Abstract: Provided is a method of training an automatic speech recognizer, said speech recognizer using acoustic models and/or speech models, wherein speech data is collected during a training phase and used to improve the acoustic models, said method comprising: during the training phase, providing speech utterances that are predefined to a user by means of a game, wherein the game has predefined rules to enable a user to provide certain utterances; and providing the utterances by the user for training the speech recognizer.
    Type: Grant
    Filed: June 26, 2000
    Date of Patent: March 30, 2004
    Assignee: Koninklijke Philips Electronics, N.V.
    Inventors: Georg Rose, Joseph Hubertus Eggen, Bartel Marinus Van Der Sluis
  • Publication number: 20040059576
    Abstract: The present invention relates to a speech recognition apparatus for recognizing speeches of a plurality of users with high accuracy. An adapting unit 12 detects a best transformation function for adapting an input speech to an acoustic model from at least one transformation function based on the transformation results which are obtained by transforming the input speech by at least one transformation function stored in a storing unit 13, and allocates the input speech to the best transformation function. Further, the adapting unit 12 updates the transformation function to which the new input speech is allocated by all the input speeches allocated to the transformation function. A selecting unit 14 selects the transformation function used for transforming the input speech from at least one transformation function stored in the storing unit 13. A transforming unit 5 transforms the input speech by the selected transformation function.
    Type: Application
    Filed: October 20, 2003
    Publication date: March 25, 2004
    Inventor: Helmut Lucke
  • Patent number: 6708149
    Abstract: The present invention discloses an apparatus and method of decoding information received over a noisy communications channel to determine the intended transmitted information. The present invention uses a vector fixed-lag algorithm to determine the probabilities of the intended transmitted information. The algorithm is implemented by multiplying an initial state vector with a matrix containing information about the communications channel. The product is then recursively multiplied by the matrix τ times, using the new product with each recursive multiplication, and the forward information is stored for a fixed period of time, τ. The final product is multiplied with a unity column vector, yielding a probability of a possible input. The estimated input is the input having the largest probability.
    Type: Grant
    Filed: April 30, 2001
    Date of Patent: March 16, 2004
    Assignee: AT&T Corp.
    Inventor: William Turin
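A rough sketch of the matrix recursion in patent 6708149 above: an initial state vector is repeatedly multiplied by a channel matrix for a fixed lag τ, the result is collapsed with a unity (all-ones) column vector, and the candidate input with the largest resulting probability is chosen. The per-symbol channel matrices, the lag, and the scoring loop below are invented for illustration.

```python
import numpy as np

def fixed_lag_estimate(initial_state, channel_matrices, lag):
    """Score each candidate input symbol by propagating the state vector
    through its channel matrix for `lag` steps and collapsing with a
    unity column vector; return the symbol with the largest score."""
    ones = np.ones(len(initial_state))
    scores = {}
    for symbol, M in channel_matrices.items():
        state = initial_state
        for _ in range(lag):                    # recursive multiplication, tau times
            state = M @ state
        scores[symbol] = float(state @ ones)    # probability-like score
    return max(scores, key=scores.get), scores

# Two candidate inputs over a 2-state channel model (illustrative numbers)
channel = {
    0: np.array([[0.7, 0.2], [0.1, 0.6]]),
    1: np.array([[0.3, 0.4], [0.5, 0.2]]),
}
initial = np.array([0.5, 0.5])
print(fixed_lag_estimate(initial, channel, lag=3))
```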