Markov Patents (Class 704/256)
-
Patent number: 6917919
Abstract: A speech recognition method is described in which a basic set of models is adapted to a current speaker on the basis of speech data already observed from that speaker. The basic set of models comprises models for different acoustic units, each described by a plurality of model parameters. The basic set of models is represented by a supervector in a high-dimensional vector space (model space), formed by concatenating the model parameters of all models in the set. The adaptation of this basic set of models to the speaker is carried out in the model space by a MAP method in which an asymmetric distribution in the model space is chosen as the a priori distribution.
Type: Grant
Filed: September 24, 2001
Date of Patent: July 12, 2005
Assignee: Koninklijke Philips Electronics, N.V.
Inventor: Henrik Botterweck
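Here is a minimal sketch of the two steps in this abstract: building the supervector and applying a MAP update in model space. It assumes each model is reduced to a mean vector. It also approximates the "asymmetric" prior with a Gaussian whose variance differs from dimension to dimension. That closed-form per-dimension update and the function names are illustrative choices, not the patented method.

```python
from typing import List, Sequence


def supervector(models: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate the parameters of every model into one supervector."""
    return [p for model in models for p in model]


def map_adapt(prior_mean: Sequence[float], prior_var: Sequence[float],
              obs_mean: Sequence[float], obs_var: Sequence[float],
              n_obs: int) -> List[float]:
    """Per-dimension MAP estimate of a Gaussian mean under a Gaussian prior.

    Assumption: the prior variance differs per dimension (an anisotropic
    stand-in for the patent's asymmetric prior), so dimensions with a wide
    prior move further toward the speaker's data.
    """
    adapted = []
    for m0, v0, xbar, v in zip(prior_mean, prior_var, obs_mean, obs_var):
        precision = 1.0 / v0 + n_obs / v          # posterior precision
        adapted.append((m0 / v0 + n_obs * xbar / v) / precision)
    return adapted
```

With one observation, a dimension whose prior variance equals the data variance moves halfway toward the speaker's mean. A dimension with a much wider prior moves almost all the way.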
-
Patent number: 6912498
Abstract: Correcting incorrect text associated with recognition errors in computer-implemented speech recognition includes receiving a selection of a word from a recognized utterance. The selection indicates a bound of a portion of the recognized utterance to be corrected. A first recognition correction is produced based on a comparison between a first alternative transcript and the recognized utterance. A second recognition correction is produced based on a comparison between a second alternative transcript and the recognized utterance. The duration of the first recognition correction differs from that of the second. The portion of the recognition result that is replaced with one of the two recognition corrections includes, at one bound, the word indicated by the selection, and extends for the duration of the recognition correction with which it is replaced.
Type: Grant
Filed: May 2, 2001
Date of Patent: June 28, 2005
Assignee: ScanSoft, Inc.
Inventors: Daniell Stevens, Robert Roth, Joel M. Gould, Michael J. Newman, Dean Sturtevant, Charles E. Ingold, David Abrahams, Allan Gold
-
Patent number: 6910013
Abstract: The invention relates first of all to a method for identifying a transient acoustic scene. The method includes extracting, during an extraction phase, characteristic features from an acoustic signal captured by at least one microphone (2a, 2b), and identifying, during an identification phase, the transient acoustic scene on the basis of the extracted features. According to the invention, at least auditory-based features are identified in the extraction phase. An application of the method and a hearing device are also specified.
Type: Grant
Filed: January 5, 2001
Date of Patent: June 21, 2005
Assignee: Phonak AG
Inventors: Sylvia Allegro, Michael Büchler
-
Patent number: 6901365
Abstract: The invention enables even a CPU with low processing performance to compute an HMM output probability by simplifying the arithmetic operations. The dimensions of an input vector are grouped into several sets, and a table is created for each set. When an output probability is calculated, codes corresponding to the first through n-th dimensions of the input vector are obtained in sequence. For each code, the corresponding table is consulted to obtain that table's output value. The output probability is then found by substituting the output values of all tables into a formula for the output probability.
Type: Grant
Filed: September 19, 2001
Date of Patent: May 31, 2005
Assignee: Seiko Epson Corporation
Inventor: Yasunaga Miyazawa
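The table scheme can be sketched as follows. The sketch assumes a diagonal-covariance Gaussian, so the log output probability splits into a sum over dimensions. It also assumes a small set of scalar quantization levels shared by all dimensions. Each table stores the precomputed partial log density for every code combination in its group. Scoring a vector then needs only quantization, table lookups, and additions. The group layout, quantization levels, and function names are hypothetical.

```python
import itertools
import math


def gauss_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def quantize(x, levels):
    """Code = index of the nearest quantization level."""
    return min(range(len(levels)), key=lambda i: abs(levels[i] - x))


def build_tables(groups, levels, means, variances):
    """For each group of dimensions, precompute the summed log density of every
    combination of codes in that group (done once, offline)."""
    tables = []
    for group in groups:
        table = {}
        for codes in itertools.product(range(len(levels)), repeat=len(group)):
            table[codes] = sum(gauss_logpdf(levels[c], means[d], variances[d])
                               for c, d in zip(codes, group))
        tables.append(table)
    return tables


def log_output_prob(vec, groups, levels, tables):
    """Quantize each dimension, look up each group's table, and add the results."""
    codes = [quantize(v, levels) for v in vec]
    return sum(table[tuple(codes[d] for d in group)]
               for group, table in zip(groups, tables))
```

Grouping trades memory for speed. A group of g dimensions with L levels needs L**g entries, and one lookup in it replaces g density evaluations.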
-
Patent number: 6895380
Abstract: An interactive voice actuated control system for a testing machine such as a tensile testing machine is described. Voice commands are passed through a user-command predictor and integrated with a graphical user interface control panel to allow hands-free operation. The user-command predictor learns operator command patterns on-line and predicts the most likely next action. It assists less experienced operators by recommending the next command, and it adds robustness to the voice command interpreter by verbally asking the operator to repeat unlikely commanded actions. The voice actuated control system applies to industrial machines whose normal operation is characterized by a nonrandom series of commands.
Type: Grant
Filed: March 2, 2001
Date of Patent: May 17, 2005
Assignee: Electro Standards Laboratories
Inventor: Raymond Sepe, Jr.
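One way to build an online command predictor is a first-order Markov model over commands. It counts which command follows which and recommends the most frequent successor. It also flags commands whose estimated probability falls below a threshold, so the system can ask the operator to repeat them. The first-order model, the class name, and the `min_prob` threshold are assumptions, not details from the patent.

```python
from collections import Counter, defaultdict


class CommandPredictor:
    """First-order Markov model over operator commands, learned online."""

    def __init__(self, min_prob=0.1):
        self.counts = defaultdict(Counter)  # previous command -> Counter of next commands
        self.min_prob = min_prob
        self.last = None

    def observe(self, command):
        """Record an executed command and update the transition counts."""
        if self.last is not None:
            self.counts[self.last][command] += 1
        self.last = command

    def predict(self):
        """Most frequent next command after the last one, or None if unknown."""
        row = self.counts.get(self.last)
        return row.most_common(1)[0][0] if row else None

    def is_unlikely(self, command):
        """True if command has rarely followed the last command (ask to repeat)."""
        row = self.counts.get(self.last)
        if not row:
            return False
        return row[command] / sum(row.values()) < self.min_prob
```

For example, after repeated load, start, stop, unload cycles, `predict()` recommends "start" after "load". A spoken "unload" at that point would be flagged as unlikely.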
-
Patent number: 6895377
Abstract: A phonetic data processing system processes phonetic stream data to produce a set of semantic data, using a context-free rich semantic grammar database (RSG DB) that includes a grammar tree, comprised of sub-trees, representing words and phrases. A phonetic searcher accepts the phonetic estimates and searches the RSG DB to produce a best word list. A semantic parser then processes this list, using the RSG DB, to produce a semantic tree instance that includes all valid interpretations of the phonetic stream. An application accesses a semantic tree evaluator to interpret the semantic tree instance according to a context, producing a final linguistic interpretation of the phonetic stream that is returned to the application.
Type: Grant
Filed: March 23, 2001
Date of Patent: May 17, 2005
Assignee: Eliza Corporation
Inventors: John Kroeker, Oleg Boulanov, Andrey Yelpatov
-
Patent number: 6882973
Abstract: A voice processing system includes a speech recognition facility with barge-in. The system plays out a prompt to a caller, who starts to give a spoken response while the prompt is still being played out. The system performs speech recognition on this response to determine a corresponding text, which is then subjected to lexical analysis. This tests whether the text satisfies one or more conditions, for example including one or more words from a predefined set of task words. If so, playing out of the prompt is terminated (i.e. barge-in is effected); otherwise, playing out of the prompt continues, essentially as if the caller had not interrupted.
Type: Grant
Filed: October 25, 2000
Date of Patent: April 19, 2005
Assignee: International Business Machines Corporation
Inventor: John Brian Pickering
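The lexical test that decides whether to stop the prompt can be as simple as checking the recognized text against a set of task words. This sketch assumes whitespace tokenization, case-insensitive matching, and a task-word set given in lowercase. The function name is hypothetical, and a real system may apply richer conditions, as the abstract allows.

```python
import string


def should_barge_in(recognized_text, task_words):
    """Return True if the caller's recognized response contains a task word,
    meaning the prompt should be cut off; otherwise playback continues."""
    words = {w.strip(string.punctuation) for w in recognized_text.lower().split()}
    return not words.isdisjoint(task_words)
```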
-
Patent number: 6871177
Abstract: A method and apparatus for recognizing a pattern comprising a sequence of sub-patterns, in which a set of possible patterns is modelled by a network of sub-pattern models. One or more initial software model objects are instantiated first. As these models produce outputs, succeeding model objects are instantiated if they have not already been instantiated. However, a succeeding model object is instantiated only if the triggering model output meets a predetermined criterion, which keeps the required processing at a manageable level. If the models comprise finite state networks, internal states may also be pruned. The criterion applied to this pruning is less harsh than the one used to decide whether to instantiate a succeeding model.
Type: Grant
Filed: October 27, 1998
Date of Patent: March 22, 2005
Assignee: British Telecommunications public limited company
Inventors: Simon A Hovell, Mark Wright, Simon P. A Ringland
-
Patent number: 6862359
Abstract: A hearing prosthesis is provided that automatically adjusts itself to the surrounding listening environment by applying Hidden Markov Models. In one aspect, classification results are used to support automatic adjustment of one or more parameters of a predetermined signal processing algorithm executed by the processing means of the hearing prosthesis. According to another aspect, the feature vectors extracted from a digital input signal of the hearing prosthesis and processed by the Hidden Markov Models represent signal features that are substantially independent of level and/or absolute spectrum shape. This level-independent property of the extracted feature vectors provides robust classification results in real-life acoustic environments.
Type: Grant
Filed: May 29, 2002
Date of Patent: March 1, 2005
Assignee: GN ReSound A/S
Inventors: Nils Peter Nordqvist, Arne Leijon
-
Patent number: 6853962
Abstract: Training apparatus for training a user to engage in transactions (e.g. a foreign language conversation) with another person whom the apparatus is arranged to simulate. The apparatus comprises:
- an input for receiving input dialogue from a user;
- a lexical store containing data relating to individual words of said input dialogue;
- a rule store containing rules specifying grammatically allowable relationships between words of said input dialogue;
- a transaction store containing data relating to allowable transactions between said user and said person;
- a processor arranged to process the input dialogue to recognise the occurrence therein of words contained in said lexical store, in the relationships specified by the rules contained in said rule store, in accordance with the data specified in the transaction store, and to generate output dialogue indicating when correct input dialogue has been recognised; and
- an output device for making the output dialogue available to the user.
Type: Grant
Filed: September 11, 1997
Date of Patent: February 8, 2005
Assignee: British Telecommunications public limited company
Inventor: Stephen C Appleby
-
Patent number: 6850888
Abstract: A method and apparatus are disclosed for training a pattern recognition system, such as a speech recognition system, using an improved objective function. The concept of rank likelihood, previously applied only to the decoding process, is applied in a novel manner to parameter estimation in the training phase of a pattern recognition system. The disclosed objective function is based on a pseudo-rank likelihood. It not only maximizes the likelihood of an observation for the correct class but also minimizes the likelihoods of the observation for all other classes, so that discrimination between classes is maximized. A training process is disclosed that uses the pseudo-rank likelihood objective function to identify model parameters that yield a pattern recognizer with the lowest possible recognition error rate. The discrete nature of the rank-based likelihood objective function is transformed so that the parameter estimates can be optimized during the training phase.
Type: Grant
Filed: October 6, 2000
Date of Patent: February 1, 2005
Assignee: International Business Machines Corporation
Inventors: Yuqing Gao, Yongxin Li, Michael Alan Picheny
-
Patent number: 6847921
Abstract: The invention is a method for analyzing spatially-varying noise in seismic data. Transitions between data values at adjacent data locations in a seismic data set are represented by Markov chains. Transition probability matrices are constructed from the Markov chains. Data values are predicted from the calculated transition probabilities. Noise values are determined from the predicted data values.
Type: Grant
Filed: April 7, 2003
Date of Patent: January 25, 2005
Assignee: ExxonMobil Upstream Research Company
Inventors: Alex Woronow, John F. Schuette, Chrysanthe S. Munn
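A one-dimensional sketch of these steps works in four stages:

1. Treat a trace as a sequence of discretized values (states).
2. Estimate the transition probability matrix from adjacent samples.
3. Predict each sample as the expected next state given its predecessor.
4. Take the residual (actual minus predicted) as noise.

Several choices here are assumptions: working along a single trace, using the conditional expectation as the predictor, and the function names. The patent addresses spatially varying noise across full seismic data sets.

```python
from collections import defaultdict


def transition_matrix(states):
    """Estimate P(next state | current state) from adjacent samples."""
    counts = defaultdict(lambda: defaultdict(int))
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    matrix = {}
    for current, row in counts.items():
        total = sum(row.values())
        matrix[current] = {nxt: c / total for nxt, c in row.items()}
    return matrix


def residual_noise(states, matrix):
    """Predict each sample as the expected next state given its predecessor;
    the noise estimate is actual minus predicted."""
    noise = []
    for prev, actual in zip(states, states[1:]):
        predicted = sum(s * p for s, p in matrix[prev].items())
        noise.append(actual - predicted)
    return noise
```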
-
Patent number: 6845357
Abstract: Data structures, systems, and methods are aspects of pattern recognition using observable operator models (OOMs). OOMs are more efficient than Hidden Markov Models (HMMs). A data structure for an OOM has characteristic events, an initial distribution vector, a probability transition matrix, an occurrence count matrix, and at least one observable operator. System applications include computer systems, cellular phones, wearable computers, home control systems, fire safety or security systems, PDAs, and flight systems. A method of pattern recognition comprises training OOMs, receiving unknown input, computing matching probabilities, selecting the maximum probability, and displaying the match. A method of speech recognition comprises sampling a first input stream, performing a spectral analysis, clustering, training OOMs, and recognizing speech using the OOMs.
Type: Grant
Filed: July 24, 2001
Date of Patent: January 18, 2005
Assignee: Honeywell International Inc.
Inventors: Ravindra K. Shetty, Venkatesan Thyagarajan
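The core OOM computation applies one observable operator per symbol to an initial vector. The probability of a sequence a1 ... an is the sum of the entries of tau_an ... tau_a1 w0. As an assumption for illustration, this sketch builds the operators from an ordinary HMM using tau_a = T^T diag(O[:, a]), which yields an equivalent OOM. The patent instead trains OOMs directly. Classification then picks the model with the highest matching probability, as the abstract describes. All names are hypothetical.

```python
def mat_vec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def hmm_to_oom(trans, emit, symbols):
    """Build one observable operator per symbol from HMM parameters.

    tau_a[j][i] = trans[i][j] * emit[i][a]: emit symbol a in state i,
    then move from state i to state j.
    """
    n = len(trans)
    return {sym: [[trans[i][j] * emit[i][a] for i in range(n)] for j in range(n)]
            for a, sym in enumerate(symbols)}


def sequence_prob(ops, w0, seq):
    """P(seq) = sum of the entries of tau_{a_n} ... tau_{a_1} w0."""
    w = list(w0)
    for sym in seq:
        w = mat_vec(ops[sym], w)
    return sum(w)


def classify(models, seq):
    """Pick the model (name -> (ops, w0)) with the highest probability for seq."""
    return max(models, key=lambda name: sequence_prob(*models[name], seq))
```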
-
Publication number: 20040267530
Abstract: Methods are given for improving discriminative training of hidden Markov models for continuous speech recognition. In one approach, discriminatively trained mixture models are interpolated with maximum likelihood trained mixture models. In another approach, segmentation and recognition results from one set of models are reused to discriminatively train a second set of models. For example, segmentation and recognition results from detailed match models are mapped and used to discriminatively train fast match models. In addition, gradients for the standard deviation of mixture components are clipped based on the statistics of the gradients. Pronunciation of words may also be used to determine the “incorrect” recognition hypothesis.
Type: Application
Filed: November 21, 2003
Publication date: December 30, 2004
Inventors: Chuang He, Jianxiong Wu, Vlad Sejnoha
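Two of these ideas are easy to sketch. The first is linear interpolation between discriminatively trained and maximum-likelihood parameters. The second is clipping standard-deviation gradients to a band derived from the gradients' own statistics. Both the interpolation weight and the "mean plus or minus k standard deviations" clipping band are assumptions, since the abstract does not state them.

```python
import statistics


def interpolate(disc_params, ml_params, weight):
    """Blend discriminatively trained and ML-trained parameters (weight on disc)."""
    return [weight * d + (1.0 - weight) * m for d, m in zip(disc_params, ml_params)]


def clip_gradients(grads, k=2.0):
    """Clip each gradient to mean +/- k * (population) standard deviation."""
    mu = statistics.fmean(grads)
    sd = statistics.pstdev(grads)
    lo, hi = mu - k * sd, mu + k * sd
    return [min(max(g, lo), hi) for g in grads]
```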
-
Method and array for introducing temporal correlation in hidden Markov models for speech recognition
Patent number: 6832190
Abstract: In the recognition of spoken language, phonemes of the language are modelled by hidden Markov models. A modified hidden Markov model includes a conditional probability of a feature vector dependent on chronologically preceding feature vectors and, optionally, additionally comprises a conditional probability of the respective current state. A global search for recognizing a word sequence in the spoken language is implemented with the modified hidden Markov model.
Type: Grant
Filed: November 10, 2000
Date of Patent: December 14, 2004
Assignee: Siemens Aktiengesellschaft
Inventors: Jochen Junkawitsch, Harald Höge
-
Patent number: 6832191
Abstract: To implement a speech recognizer for a language for which related speech training material is substantially unavailable, the first step (1, 2) is to build, from related speech training material, a multilingual speech recognizer (2) for a plurality of known languages. The recognizer (5) for the given language is then implemented by interpolation (4) starting from this multilingual recognizer (2). The recognizer (5) generated in this fashion can subsequently be refined based on related speech training material acquired online (4) during later use (FIG.
Type: Grant
Filed: August 28, 2000
Date of Patent: December 14, 2004
Assignee: Telecom Italia Lab S.p.A.
Inventors: Alessandra Frasca, Giorgio Micca, Enrico Palme
-
Publication number: 20040236577
Abstract: The aim is an acoustic model that can absorb fluctuations of the phonemic environment over intervals longer than a syllable while keeping the number of acoustic-model parameters small. To this end, a phoneme-connected syllable HMM/syllable-connected HMM set is generated as follows. A phoneme-connected syllable HMM set corresponding to individual syllables is generated by combining phoneme HMMs. A preliminary experiment is conducted using the phoneme-connected syllable HMM set and training speech data. Each misrecognized syllable and the syllable preceding it are checked using the results of the preliminary experiment and syllable label data. The combination of the correct syllable for the misrecognized syllable and its preceding syllable is extracted as a syllable connection. A syllable-connected HMM corresponding to this syllable connection is added to the phoneme-connected syllable HMM set.
Type: Application
Filed: March 8, 2004
Publication date: November 25, 2004
Applicant: Seiko Epson Corporation
Inventors: Masanobu Nishitani, Yasunaga Miyazawa, Hiroshi Matsumoto, Kazumasa Yamamoto
-
Patent number: 6823304
Abstract: A lead consonant buffer stores a feature parameter preceding a lead voiced sound detected by a voiced sound detector as a feature parameter of a lead consonant. A matching processing unit matches the feature parameter of a lead consonant stored in the lead consonant buffer against a feature parameter of a registered pattern. The matching processing unit can therefore perform matching that reflects information on a lead consonant even when no lead consonant can be detected because of noise.
Type: Grant
Filed: July 19, 2001
Date of Patent: November 23, 2004
Assignee: Renesas Technology Corp.
Inventor: Masahiko Ikeda
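The lead consonant buffer can be pictured as a fixed-length history of feature frames that is frozen when the first voiced frame appears. This sketch assumes a caller-supplied voicing test (standing in for the voiced sound detector) and a buffer length of `k` frames. Both are hypothetical, as is the function name.

```python
from collections import deque


def lead_consonant_frames(frames, is_voiced, k):
    """Return (up to k frames preceding the first voiced frame, that voiced frame).

    If no voiced frame is found, returns the last k frames and None.
    """
    buffer = deque(maxlen=k)  # oldest frames drop out automatically
    for frame in frames:
        if is_voiced(frame):
            return list(buffer), frame
        buffer.append(frame)
    return list(buffer), None
```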
-
Patent number: 6823308
Abstract: A speech recognition method for use in a multimodal input system comprises receiving a multimodal input comprising digitized speech as a first modality input and data in at least one further modality input. Features in the speech and in the data of the at least one further modality are identified. The identified features are used to recognize words by comparing them with states in models for the words. The models have states for the recognition of speech. For words that have associated features in at least one further modality, the models also have states for the recognition of events in the or each further modality.
Type: Grant
Filed: February 16, 2001
Date of Patent: November 23, 2004
Assignee: Canon Kabushiki Kaisha
Inventors: Robert Alexander Keiller, Nicolas David Fortescue
-
Patent number: 6823306
Abstract: Techniques for generating, distributing, and using speech recognition models are described. A shared speech processing facility supports speech recognition for a wide variety of devices with limited capabilities, including business computer systems, personal data assistants, etc., which are coupled to the facility via a communications channel, e.g. the Internet. Devices with audio capture capability record digitized speech and transmit it to the speech processing facility via the Internet. In response, they receive speech processing services, e.g. speech recognition model generation and/or speech recognition services. The Internet is used to return speech recognition models and/or information identifying recognized words or phrases. Thus, the speech processing facility can provide speech recognition capabilities to devices without them and/or augment a device's speech processing capability.
Type: Grant
Filed: November 30, 2000
Date of Patent: November 23, 2004
Assignee: Telesector Resources Group, Inc.
Inventors: Craig Reding, Suzi Levas
-
Patent number: 6801892
Abstract: Disclosed is a speech recognition method in a speech recognition apparatus that applies speech recognition to an input voice signal. The input voice signal is converted from analog to digital form, and sequences of feature vectors are extracted from the digital signal (S12). A search space is defined by the sequences of feature vectors and an HMM (16) prepared beforehand for each unit of speech. This search space allows a transition between HMMs only in specific feature-vector sequences. A search is conducted in this space for the optimum path, i.e. the one yielding the largest acoustic likelihood for the voice signal, to obtain the recognition result (S14), which is then output (S15).
Type: Grant
Filed: March 27, 2001
Date of Patent: October 5, 2004
Assignee: Canon Kabushiki Kaisha
Inventor: Hiroki Yamamoto
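The search for the optimum path is, in essence, a Viterbi search. This sketch shows a generic log-domain Viterbi decoder over a single discrete-output HMM. It does not model the patent's restriction that transitions between HMMs occur only in specific feature-vector sequences. The toy parameters in the check below are hypothetical.

```python
import math


def _log(p):
    """Log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, start, trans, emit):
    """Return (best log probability, best state path) for a discrete HMM."""
    n = len(start)
    delta = [_log(start[s]) + _log(emit[s][obs[0]]) for s in range(n)]
    backpointers = []
    for o in obs[1:]:
        ptr, new_delta = [], []
        for j in range(n):
            # Best predecessor state for reaching state j at this frame.
            best_i = max(range(n), key=lambda i: delta[i] + _log(trans[i][j]))
            ptr.append(best_i)
            new_delta.append(delta[best_i] + _log(trans[best_i][j]) + _log(emit[j][o]))
        backpointers.append(ptr)
        delta = new_delta
    last = max(range(n), key=lambda s: delta[s])
    path = [last]
    for ptr in reversed(backpointers):  # trace the best path backwards
        path.append(ptr[path[-1]])
    return delta[last], path[::-1]
```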
-
Patent number: 6801895
Abstract: The present invention provides a method and apparatus for segmenting a multimedia program based on audio events. In one embodiment, a method of classifying an audio stream is provided. The method includes receiving an audio stream, sampling it at a predetermined rate, and combining a predetermined number of samples into a clip. A plurality of features is then determined for the clip and analyzed using a linear approximation algorithm. The clip is then characterized based on the results of this analysis.
Type: Grant
Filed: December 6, 1999
Date of Patent: October 5, 2004
Assignee: AT&T Corp.
Inventors: Qian Huang, Zhu Liu
-
Patent number: 6801890
Abstract: The invention relates to a method for enhancing recognition probability in voice recognition systems. According to the inventive method, selective post-training of the already stored homonymic term is carried out after a term to be recognized has been input. This makes it possible to improve the speaker-dependent recognition rate even in environments with prevailing acoustic interference.
Type: Grant
Filed: November 13, 2000
Date of Patent: October 5, 2004
Assignees: DeTeMobil, Deutsche Telekom MobilNet GmbH, Deutsche Telekom AG
Inventors: Ulrich Kauschke, Herbert Roland Rast, Fred Runge
-
Publication number: 20040193418
Abstract: The present invention comprises a system and method for implementing a Cantonese speech recognizer with an optimized phone set. It may include a recognizer configured to compare input speech data to phone strings from a vocabulary dictionary implemented according to an optimized Cantonese phone set. The optimized Cantonese phone set may be implemented with a phonetic technique that separately includes consonantal phones and vocalic phones. For system efficiency, the optimized Cantonese phone set may preferably be kept compact, including only the minimum number of consonantal and vocalic phones required to accurately represent Cantonese speech during the speech recognition procedure.
Type: Application
Filed: March 24, 2003
Publication date: September 30, 2004
Applicant: Sony Corporation and Sony Electronics Inc.
Inventors: Michael Emonts, Xavier Menendez-Pidal, Lex Olorenshaw
-
Publication number: 20040193419
Abstract: A method and system for training an audio analyzer (114) to identify asynchronous segments of audio types using sample data sets, the sample data sets being representative of audio signals for which segmentation is desired. The system and method then label asynchronous segments of audio samples, collected at the target site, into a plurality of categories by cascading hidden Markov models (HMMs). The cascaded HMMs consist of two stages, the output of the first-stage HMM (208) being transformed and used as observation inputs to the second-stage HMM (212). This cascaded HMM approach allows processes with complex temporal characteristics to be modeled using training data. It also provides a flexible framework that allows for segments of varying duration. The system and method are particularly useful for identifying segments of human voice for voice recognition systems and separating them from other audio such as music.
Type: Application
Filed: March 31, 2003
Publication date: September 30, 2004
Inventors: Steven F. Kimball, Joanne Como
-
Publication number: 20040186718
Abstract: A method and apparatus for an audiovisual continuous speech recognition (AVCSR) system using a coupled hidden Markov model (CHMM) are described herein. In one aspect, an exemplary process includes receiving an audio data stream and a video data stream and performing continuous speech recognition on them using a plurality of hidden Markov models (HMMs). In these HMMs, a node of each HMM at a given time slot is conditioned on one or more nodes of related HMMs at the preceding time slot. Other methods and apparatuses are also described.
Type: Application
Filed: March 19, 2003
Publication date: September 23, 2004
Inventors: Ara Victor Nefian, Xiaoxing Liu, Xiaobo Pi, Luhong Liang, Yibao Zhao
-
Publication number: 20040186717
Abstract: A method is provided for reconstructing a damaged sequence of symbols in which some symbols are missing. Statistical parameters of the sequence are used with confidence windowing techniques to reconstruct the damaged sequence to its original form quickly and efficiently. The confidence windowing techniques are equivalent to generalized hidden semi-Markov models but are easier to use for determining the most likely missing symbol at a given point in the sequence being reconstructed. The method can be used to reconstruct communications consisting of speech, music, digital transmission symbols, and others that have a bounded symbol set and can be described by statistical behaviors in the symbol stream.
Type: Application
Filed: March 17, 2003
Publication date: September 23, 2004
Applicant: Rensselaer Polytechnic Institute
Inventors: Michael Savic, Michael Moore
-
Publication number: 20040181409
Abstract: To make speech recognition robust in noisy environments, a variable-parameter Gaussian Mixture HMM (GMHMM) is described. It extends existing HMMs by allowing HMM parameters to change as a function of a continuous variable that depends on the environment. Specifically, in one embodiment the function is a polynomial and the environment is described by the signal-to-noise ratio. The use of parameter functions improves HMM discriminability during multi-condition training. In the recognition process, a set of HMM parameters is instantiated from the parameter functions based on the current environment. The model parameters are estimated using the Expectation-Maximization algorithm for the variable-parameter GMHMM.
Type: Application
Filed: March 11, 2003
Publication date: September 16, 2004
Inventors: Yifan Gong, Xiaodong Cui
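In the polynomial embodiment, each Gaussian parameter is a polynomial in the SNR, and a concrete Gaussian is instantiated for the current SNR before scoring. This sketch assumes diagonal covariances, an SNR expressed in dB, and a variance floor to keep variances positive. The coefficient layout and function names are hypothetical, and EM training of the coefficients is not shown.

```python
import math


def poly(coeffs, x):
    """Evaluate sum(coeffs[k] * x**k) with Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def instantiate_gaussian(mean_coeffs, var_coeffs, snr_db, var_floor=1e-6):
    """Instantiate a diagonal Gaussian for the current SNR.

    mean_coeffs[d] and var_coeffs[d] hold polynomial coefficients for dimension d.
    """
    means = [poly(c, snr_db) for c in mean_coeffs]
    variances = [max(poly(c, snr_db), var_floor) for c in var_coeffs]
    return means, variances


def log_likelihood(x, means, variances):
    """Log density of feature vector x under the instantiated diagonal Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, means, variances))
```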
-
Publication number: 20040181410
Abstract: A speech recognition system recognizes filled-pause utterances made by a speaker. In one embodiment, an ergodic model is used to acoustically model filled pauses, which provides the flexibility to allow for varying utterances of filled pauses. The ergodic HMM can also be used for other types of noise, such as, but not limited to, breathing, keyboard operation, microphone noise, laughter, doors opening and/or closing, or any other noise occurring in the user's environment or made by the user. Similarly, silence can be modeled using an ergodic HMM. Recognition can be used with N-gram, context-free grammar, or hybrid language models.
Type: Application
Filed: March 13, 2003
Publication date: September 16, 2004
Applicant: Microsoft Corporation
Inventor: Mei-Yuh Hwang
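What makes the filler model ergodic is its transition topology: every state can reach every other state. As a result, filled pauses of any length or internal ordering can be absorbed. This sketch assumes uniform off-diagonal transitions and scores sequences with a discrete-output forward algorithm. The number of states, the self-loop probability, and the function names are hypothetical.

```python
def ergodic_transitions(n_states, self_loop):
    """Fully connected transition matrix: self_loop on the diagonal, the rest
    spread evenly over all other states."""
    other = (1.0 - self_loop) / (n_states - 1)
    return [[self_loop if i == j else other for j in range(n_states)]
            for i in range(n_states)]


def forward_prob(obs, start, trans, emit):
    """Total probability of obs under a discrete-output HMM (forward algorithm)."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
                 for j in range(n)]
    return sum(alpha)
```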
-
Publication number: 20040176956
Abstract: A pattern recognition system and method are provided. Aspects of the invention are particularly useful in combination with multi-state Hidden Markov Models. Pattern recognition is effected by processing Hidden Markov Model Blocks. This block processing allows the processor to perform more operations on data while the data is in cache memory. By increasing cache locality in this way, aspects of the invention provide significantly improved pattern recognition speed.
Type: Application
Filed: March 4, 2003
Publication date: September 9, 2004
Applicant: Microsoft Corporation
Inventors: William H. Rockenbeck, Julian J. Odell
-
Patent number: 6789063
Abstract: In some embodiments, the invention involves receiving phonetic samples and assembling a two-level phonetic decision tree structure from them. The decision tree has multiple leaf node levels, each having at least one state. At least one node in the second level is assigned a Gaussian of a node in the first level, but has its own computed weight.
Type: Grant
Filed: September 1, 2000
Date of Patent: September 7, 2004
Assignee: Intel Corporation
Inventor: Yonghong Yan
-
Patent number: 6789066
Abstract: An arrangement is provided for compressing speech data. Speech data is compressed based on two streams. The first is a phoneme stream detected from the speech data. The second is a delta stream: the difference between the speech data and a speech signal stream generated from the phoneme stream with respect to a voice font. The compressed speech data is decompressed into a decompressed phoneme stream and a decompressed delta stream, from which the speech data is recovered.
Type: Grant
Filed: September 25, 2001
Date of Patent: September 7, 2004
Assignee: Intel Corporation
Inventors: Stephen Junkins, Chris L. Gorman
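The compression scheme amounts to storing the phoneme stream plus a residual against what a voice font would synthesize from it. This sketch makes three assumptions: the "voice font" is a toy mapping from each phoneme to a fixed list of integer samples, the synthesized and original signals have equal length, and samples are integers so the round trip is lossless. All names are hypothetical.

```python
def synthesize(phonemes, voice_font):
    """Concatenate the voice-font waveform for each phoneme."""
    return [s for ph in phonemes for s in voice_font[ph]]


def compress(speech, phonemes, voice_font):
    """Return (phoneme stream, delta stream) for the speech samples."""
    synth = synthesize(phonemes, voice_font)
    if len(synth) != len(speech):
        raise ValueError("synthesized and original lengths differ")
    return list(phonemes), [x - y for x, y in zip(speech, synth)]


def decompress(phonemes, delta, voice_font):
    """Recover speech samples from the phoneme and delta streams."""
    return [y + d for y, d in zip(synthesize(phonemes, voice_font), delta)]
```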
-
Patent number: 6789061
Abstract: Computer-based methods and systems are provided for automatically generating a second speech recognizer from a first speech recognizer. The second speech recognizer is tailored to a certain application and requires fewer resources than the first. The invention exploits the first speech recognizer's set of states s_i and its set of probability density functions (pdfs), which assemble output probabilities for an observation of a speech frame in said states s_i. In a first step, the set of states of the second speech recognizer is generated by reducing it to a subset of the first recognizer's states that is distinctive of the certain application. In a second step, the set of probability density functions of the second speech recognizer is generated by reducing it to a subset of the first recognizer's probability density functions that is distinctive of the certain application.
Type: Grant
Filed: August 14, 2000
Date of Patent: September 7, 2004
Assignee: International Business Machines Corporation
Inventors: Volker Fischer, Siegfried Kunzmann, Claire Waast-Ricard
-
Patent number: 6778959
Abstract: A system and method for speech verification using out-of-vocabulary models includes a speech recognizer with a model bank. The model bank contains system vocabulary word models, a garbage model, and one or more noise models. It may reject an utterance or other sound as an invalid vocabulary word when it identifies the utterance or sound as corresponding to the garbage model or the noise models. Initial noise models may be selectively combined into a predetermined number of final noise model clusters. This reduces the number of noise models the model bank uses to verify system vocabulary words.
Type: Grant
Filed: October 18, 2000
Date of Patent: August 17, 2004
Assignees: Sony Corporation, Sony Electronics Inc.
Inventors: Duanpei Wu, Lex Olorenshaw, Xavier Menendez-Pidal, Ruxin Chen
-
Patent number: 6772120
Abstract: A computer method and apparatus for segmenting text streams are disclosed. The input is a text stream formed of a series of words. A probability member provides working probabilities that a group of words belongs to a topic selected from a plurality of predetermined topics, and it accounts for relationships between words. A processing module receives the input text stream and, using the probability member, determines the probability that certain words in the stream belong to the same topic. In this way, the processing module segments the input text stream into groupings of words, each of a single respective topic.
Type: Grant
Filed: November 21, 2000
Date of Patent: August 3, 2004
Assignee: Hewlett-Packard Development Company, L.P.
Inventors: Pedro J. Moreno, David M. Blei
-
Publication number: 20040143434
Abstract: A method segments and summarizes a news video using both audio and visual features extracted from the video. The summaries can be used to browse the video quickly and locate topics of interest. A generalized sound recognition hidden Markov model (HMM) framework is used for joint segmentation and classification of the audio signal of the news video. The HMM provides not only a classification label for each audio segment but also compact state duration histogram descriptors.
Type: Application
Filed: January 17, 2003
Publication date: July 22, 2004
Inventors: Ajay Divakaran, Regunathan Radhakrishnan
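Given a decoded state sequence, a state duration histogram records, for each state, how often runs of each length occur. This is a minimal sketch. It assumes the input is a per-frame list of state indices produced by an HMM decoder, and the function name is hypothetical.

```python
from collections import Counter, defaultdict
from itertools import groupby


def duration_histograms(state_path):
    """Map each state to a Counter of {run length in frames: number of runs}."""
    histograms = defaultdict(Counter)
    for state, run in groupby(state_path):  # consecutive frames in the same state
        histograms[state][sum(1 for _ in run)] += 1
    return dict(histograms)
```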
-
Publication number: 20040143435
Abstract: A method of speech recognition is provided that determines a production-related value, in particular vocal-tract resonance frequencies, for a state at a particular frame. The value is computed with a recursion from the production-related values of the two preceding frames. The production-related value is used to determine a probability distribution of the observed feature vector for the state. A probability for an observed value received for the frame is then determined from this distribution. Under one embodiment, the production-related value is determined using a noise-free recursive definition. Use of the recursion substantially improves decoding speed. When the decoding algorithm is applied to training data with known phonetic transcripts, it creates a forced alignment that improves on the phone segmentation obtained with prior-art methods.
Type: Application
Filed: January 21, 2003
Publication date: July 22, 2004
Inventors: Li Deng, Jian-Lai Zhou, Frank Torsten Bernd Seide
-
Publication number: 20040122672Abstract: The Gaussian Dynamic Time Warping model provides a hierarchical statistical model for representing an acoustic pattern. The first layer of the model represents the general acoustic space; the second layer represents each speaker space; and the third layer represents the temporal structure information contained in each enrollment speech utterance, based on equally spaced time intervals. These three layers are developed hierarchically: the second layer is derived from the first, and the third layer is derived from the second. The model is useful in speech processing applications, particularly word and speaker recognition using a spotting recognition mode.Type: ApplicationFiled: December 18, 2002Publication date: June 24, 2004Inventors: Jean-Francois Bonastre, Philippe Morin, Jean-Claude Junqua
-
Patent number: 6754629Abstract: A method and system that combine voice recognition engines and resolve differences between the results of individual voice recognition engines using a mapping function. Speaker-independent voice recognition engines and speaker-dependent voice recognition engines are combined. Hidden Markov Model (HMM) engines and Dynamic Time Warping (DTW) engines are combined.Type: GrantFiled: September 8, 2000Date of Patent: June 22, 2004Assignee: Qualcomm IncorporatedInventors: Yingyong Qi, Ning Bi, Harinath Garudadri
-
Publication number: 20040117187Abstract: An assembly of word models produced by a word model producer is sent to a matching object word selector, which selects one word model from them as the matching object. A word matching processor judges whether or not the score of a path root of a present state serving as the matching object is within a predetermined range set based on a maximum value of the score, which is memorized in a maximum value memory buffer connected to the word matching processor. When the score of the path root is within this range, the score of this path root is designated as a count object and a cumulative score is obtained. When the score of the path root is outside the range, calculation of the score for the state of the matching object is omitted.Type: ApplicationFiled: July 7, 2003Publication date: June 17, 2004Applicant: Renesas Technology Corp.Inventor: Masahiko Ikeda
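The pruning rule in this abstract is a form of beam search. A minimal sketch, assuming log-domain scores where higher is better and a fixed beam width (the names `prune_and_accumulate` and `beam` are hypothetical): only states whose path score lies within `beam` of the current maximum are extended and accumulated; the rest are skipped.

```python
import math


def prune_and_accumulate(path_scores, local_scores, beam):
    """Beam-pruned score accumulation for one frame.

    path_scores: {state: cumulative log score of the best path into it}
    local_scores: {state: log score of the current frame in that state}
    beam: states scoring more than `beam` below the best are skipped.
    Returns new cumulative scores for the surviving states only.
    """
    best = max(path_scores.values())  # role of the stored maximum value
    threshold = best - beam
    updated = {}
    for state, score in path_scores.items():
        if score < threshold:
            continue  # outside the range: score calculation is omitted
        updated[state] = score + local_scores.get(state, -math.inf)
    return updated
```

With path scores of -10, -12 and -30 and a beam of 5, only the first two states are extended; the third is dropped without computing its frame score.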
-
Publication number: 20040111263Abstract: The invention provides an acoustic model creating method that can reduce the number of parameters and optimize the Gaussian distribution number for the respective states constituting an HMM, in order to create an HMM with high recognition ability. HMM sets in which the Gaussian distribution numbers of the respective states constituting the respective syllable HMMs are set from one up to the maximum distribution number (here, 64) are trained using training speech data, and the respective states of the respective HMMs are Viterbi-aligned with the training speech data corresponding to those HMMs, using the syllable HMM set with the maximum distribution number among the trained syllable HMM sets. Then, a description length computing unit computes a description length for the respective states of the respective HMMs using the alignment data, and a state selecting unit selects the state having the distribution number whose description length is minimum.Type: ApplicationFiled: September 17, 2003Publication date: June 10, 2004Applicant: SEIKO EPSON CORPORATIONInventors: Masanobu Nishitani, Yasunaga Miyazawa, Hiroshi Matsumoto, Kazumasa Yamamoto
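Choosing the Gaussian count per state by minimum description length can be sketched as below. The abstract does not give the description-length formula; this sketch assumes the common MDL form DL = -log L + (k/2) log N, where k is the number of free parameters of a diagonal-covariance mixture, L the likelihood of the frames aligned to the state and N their count. The function names and the input dictionary are hypothetical.

```python
import math


def num_params(n_gauss, dim):
    """Free parameters of a diagonal-covariance Gaussian mixture."""
    return n_gauss * 2 * dim + (n_gauss - 1)  # means + variances + weights


def description_length(log_likelihood, n_gauss, dim, n_frames):
    """MDL criterion: data cost plus parameter cost (natural log)."""
    return -log_likelihood + 0.5 * num_params(n_gauss, dim) * math.log(n_frames)


def select_gaussian_count(loglik_by_count, dim, n_frames):
    """Pick the mixture size whose description length is minimal.

    loglik_by_count: {n_gauss: total log-likelihood of the frames
    Viterbi-aligned to this state, under the n_gauss-component model}.
    """
    return min(
        loglik_by_count,
        key=lambda m: description_length(loglik_by_count[m], m, dim, n_frames),
    )
```

The parameter penalty grows with the mixture size, so a larger mixture is kept only when its likelihood gain outweighs the extra parameters.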
-
Publication number: 20040102974Abstract: The speech recognition rate required for a selected speech recognition application is determined. Using stored speech recognition rate information, the minimum information content of the feature vector components needed to ensure that recognition rate is determined. The number of feature vector components needed to provide the determined information content is then determined, and speech recognition is carried out using feature vectors with that required number of feature vector components.Type: ApplicationFiled: September 16, 2003Publication date: May 27, 2004Inventors: Michael Kustner, Ralf Sambeth
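The two look-ups described (required rate to required information content, then information content to a number of components) can be sketched as below. This assumes the stored rate information is a table of (information content, recognition rate) pairs and that each feature-vector component contributes a known amount of information, taken in order of importance; the table layout and function names are hypothetical.

```python
def required_information(rate_table, target_rate):
    """Smallest stored information content whose rate meets the target.

    rate_table: list of (information_content, recognition_rate) pairs.
    """
    feasible = [info for info, rate in rate_table if rate >= target_rate]
    if not feasible:
        raise ValueError("target recognition rate not reachable")
    return min(feasible)


def required_components(component_info, needed_info):
    """Number of leading components whose summed information reaches needed_info.

    component_info: information contributed by each component, in the
    order components are kept (most informative first).
    """
    total = 0.0
    for k, info in enumerate(component_info, start=1):
        total += info
        if total >= needed_info:
            return k
    raise ValueError("not enough components for the required information")
```

Recognition would then run on feature vectors truncated to the returned number of components.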
-
Patent number: 6735588Abstract: An information search method and apparatus employ an Inverse Hidden Markov Model (IHMM) for stochastically searching, among a plurality of predetermined reference information models obtained by training, for the reference information model that best matches unknown information expressed by a Hidden Markov Model (HMM) chain. The method and apparatus find an optimal path in an HMM state lattice using a minimum unlikelihood score, rather than a maximum likelihood score, and a Viterbi algorithm to recognize unknown information, so that unnecessary computations are avoided. The method and apparatus can be used for finding the most likely path through a vocabulary network for a given utterance.Type: GrantFiled: May 14, 2001Date of Patent: May 11, 2004Assignees: Samsung Electronics Co., Ltd., Sungkyunkwan UniversityInventors: Bo-Sung Kim, Jun-dong Cho, Young-hoon Chang, Sun-hee Park
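Scoring with a minimum "unlikelihood" can be read as running Viterbi on costs (negative log probabilities) and keeping the smallest cumulative cost, which selects the same path as maximizing likelihood. The sketch below assumes a small discrete HMM given as nested lists and illustrates only that equivalence; it is not the IHMM search itself, and `min_cost_viterbi` is a hypothetical name.

```python
import math


def min_cost_viterbi(init, trans, emit, observations):
    """Viterbi decoding that minimizes cumulative negative log probability.

    init[i], trans[i][j], emit[i][o] are probabilities of a discrete HMM.
    Returns (best_state_path, total_cost).
    """
    def cost(p):
        return math.inf if p == 0 else -math.log(p)

    n = len(init)
    costs = [cost(init[i]) + cost(emit[i][observations[0]]) for i in range(n)]
    backpointers = []
    for o in observations[1:]:
        new_costs, pointers = [], []
        for j in range(n):
            # cheapest predecessor = most likely predecessor
            prev = min(range(n), key=lambda i: costs[i] + cost(trans[i][j]))
            new_costs.append(costs[prev] + cost(trans[prev][j]) + cost(emit[j][o]))
            pointers.append(prev)
        costs = new_costs
        backpointers.append(pointers)
    # trace back from the cheapest final state
    last = min(range(n), key=lambda i: costs[i])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1], costs[last]
```

With the textbook two-state example (start [0.6, 0.4], transitions [[0.7, 0.3], [0.4, 0.6]], emissions [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]], observations 0, 1, 2) it returns the path [0, 0, 1], whose probability exp(-cost) is 0.01512, the same answer a maximum-likelihood Viterbi gives.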
-
Patent number: 6735566Abstract: A system for learning a mapping between time-varying signals is used to drive facial animation directly from speech, without laborious voice track analysis. The system learns dynamical models of facial and vocal action from observations of a face and the facial gestures made while speaking. Instead of depending on heuristic intermediate representations such as phonemes or visemes, the system trains hidden Markov models to obtain its own optimal representation of vocal and facial action. An entropy-minimizing training technique using an entropic prior ensures that these models contain sufficient dynamical information to synthesize realistic facial motion to accompany new vocal performances. In addition, they can make optimal use of context to handle ambiguity and relatively long-lasting facial co-articulation effects.Type: GrantFiled: October 9, 1998Date of Patent: May 11, 2004Assignee: Mitsubishi Electric Research Laboratories, Inc.Inventor: Matthew E. Brand
-
Publication number: 20040083104Abstract: A system (100) provides speaker identification training. The system (100) generates speaker models and receives audio segments. The system (100) identifies speakers corresponding to the audio segments based on the speaker models. At least one of the audio segments has an unidentified or misidentified speaker (i.e., an audio segment whose speaker cannot be accurately identified). The system (100) presents, to a user, audio segments that include an audio segment whose speaker is unidentified or misidentified and receives, from the user, the name of the unidentified or misidentified speaker. The system (100) may use this information to subsequently identify the unidentified or misidentified speaker by name for future audio segments.Type: ApplicationFiled: October 16, 2003Publication date: April 29, 2004Inventors: Daben Liu, Francis G. Kubala
-
Patent number: 6728674Abstract: A method and a system for corrective training of speech models includes changing the weight of a data sample whenever that data sample is incorrectly associated with a classifier, and retraining each classifier with the weights.Type: GrantFiled: July 31, 2000Date of Patent: April 27, 2004Assignee: Intel CorporationInventor: Meir Griniasty
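The reweight-and-retrain loop can be shown with a deliberately tiny classifier. This sketch assumes 1-D features, a nearest-weighted-mean classifier per class and a multiplicative weight increase (`boost`) for each misclassified sample; these are illustrative stand-ins, not the patent's speech models or its actual update rule.

```python
def train_means(samples, labels, weights):
    """Weighted mean per class (stand-in for retraining each classifier)."""
    sums, totals = {}, {}
    for x, y, w in zip(samples, labels, weights):
        sums[y] = sums.get(y, 0.0) + w * x
        totals[y] = totals.get(y, 0.0) + w
    return {y: sums[y] / totals[y] for y in sums}


def classify(means, x):
    """Assign x to the class with the nearest mean."""
    return min(means, key=lambda y: abs(x - means[y]))


def corrective_training(samples, labels, rounds=5, boost=2.0):
    """Raise the weight of each misclassified sample, then retrain."""
    weights = [1.0] * len(samples)
    means = train_means(samples, labels, weights)
    for _ in range(rounds):
        errors = [classify(means, x) != y for x, y in zip(samples, labels)]
        if not any(errors):
            break  # every sample is correctly associated
        weights = [w * boost if e else w for w, e in zip(weights, errors)]
        means = train_means(samples, labels, weights)
    return means, weights
```

On samples [0, 1, 2, 3, 10] labeled a, a, a, b, b, the sample at 3 is initially assigned to class a; after its weight is doubled twice, class b's mean moves to 4.4 and the sample is classified correctly.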
-
Patent number: 6725195Abstract: Probabilistic recognition using clusters and simple probability functions provides improved performance by employing a limited number of clusters each using a relatively large number of simple probability functions. The simple probability functions for each of the limited number of state clusters are greater in number than the limited number of state clusters.Type: GrantFiled: October 22, 2001Date of Patent: April 20, 2004Assignee: SRI InternationalInventors: Ananth Sankar, Venkata Ramana Rao Gadde
-
Patent number: 6714910Abstract: Provided is a method of training an automatic speech recognizer, said speech recognizer using acoustic models and/or speech models, wherein speech data is collected during a training phase and used to improve the acoustic models, said method comprising: during the training phase, providing speech utterances that are predefined to a user by means of a game, wherein the game has predefined rules to enable a user to provide certain utterances; and providing the utterances by the user for training the speech recognizer.Type: GrantFiled: June 26, 2000Date of Patent: March 30, 2004Assignee: Koninklijke Philips Electronics, N.V.Inventors: Georg Rose, Joseph Hubertus Eggen, Bartel Marinus Van Der Sluis
-
Publication number: 20040059576Abstract: The present invention relates to a speech recognition apparatus for recognizing the speech of a plurality of users with high accuracy. An adapting unit 12 detects, from at least one transformation function stored in a storing unit 13, the best transformation function for adapting an input speech to an acoustic model, based on the results obtained by transforming the input speech with each such transformation function, and allocates the input speech to the best transformation function. Further, the adapting unit 12 updates the transformation function to which the new input speech is allocated, using all the input speech allocated to that transformation function. A selecting unit 14 selects the transformation function used for transforming the input speech from the at least one transformation function stored in the storing unit 13. A transforming unit 5 transforms the input speech by the selected transformation function.Type: ApplicationFiled: October 20, 2003Publication date: March 25, 2004Inventor: Helmut Lucke
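The allocate-then-update loop can be sketched with the simplest possible transformation functions. This assumes 1-D features, a single acoustic-model mean and bias-only transformations (x - b); the function names are hypothetical. Unlike the abstract, which updates only the transformation that received the new input, this sketch re-estimates every transformation in one batch.

```python
def best_transform(biases, model_mean, x):
    """Index of the bias transform that brings x closest to the model mean."""
    return min(range(len(biases)), key=lambda k: abs((x - biases[k]) - model_mean))


def adapt(inputs, biases, model_mean):
    """Allocate each input to its best transform, then re-estimate each
    transform from all inputs allocated to it."""
    allocated = {k: [] for k in range(len(biases))}
    for x in inputs:
        allocated[best_transform(biases, model_mean, x)].append(x)
    # new bias = mean offset of the allocated inputs from the model mean;
    # a transform with no allocated inputs keeps its old bias
    new_biases = [
        sum(xs) / len(xs) - model_mean if xs else biases[k]
        for k, xs in allocated.items()
    ]
    return new_biases, allocated
```

Each transformation then acts like a per-speaker-group normalization: inputs from a speaker whose features are offset by about 5 are routed to the second transform and shifted back toward the model.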
-
Patent number: 6708149Abstract: The present invention discloses an apparatus and method of decoding information received over a noisy communications channel to determine the intended transmitted information. The present invention uses a vector fixed-lag algorithm to determine the probabilities of the intended transmitted information. The algorithm is implemented by multiplying an initial state vector by a matrix containing information about the communications channel. The product is then recursively multiplied by the matrix τ times, using the new product in each recursive multiplication, and the forward information is stored for a fixed period of time, τ. The final product is multiplied by a unity column vector, yielding the probability of a possible input. The estimated input is the input having the largest probability.Type: GrantFiled: April 30, 2001Date of Patent: March 16, 2004Assignee: AT&T Corp.Inventor: William Turin
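The abstract leaves the channel matrix unspecified. The sketch below assumes a channel with hidden states described by one n x n matrix per (input, output) pair, with i.i.d. inputs of known prior; the τ look-ahead outputs use the matrix averaged over the unknown inputs. The names `fixed_lag_decide`, `vec_mat` and `channel` are hypothetical. It mirrors the structure described (start vector, τ matrix products, product with an all-ones column vector, largest probability wins) rather than reproducing the patented algorithm.

```python
def vec_mat(v, m):
    """Row vector times matrix."""
    return [sum(v[i] * m[i][j] for i in range(len(v))) for j in range(len(m[0]))]


def fixed_lag_decide(alpha, prior, channel, y_now, y_future):
    """Estimate the current input symbol using tau future channel outputs.

    alpha: forward state vector before the current symbol (row vector).
    prior: {x: Pr(input x)}.
    channel: {(x, y): n x n matrix}, entry [i][j] =
             Pr(next state j, output y | state i, input x).
    y_now: current channel output; y_future: the next tau outputs.
    Returns (best_input, {x: normalized probability}).
    """
    n = len(alpha)

    def marginal(y):
        # matrix for output y, averaged over the unknown input
        return [[sum(prior[x] * channel[(x, y)][i][j] for x in prior)
                 for j in range(n)] for i in range(n)]

    scores = {}
    for x in prior:
        start = [[prior[x] * e for e in row] for row in channel[(x, y_now)]]
        v = vec_mat(alpha, start)
        for y in y_future:  # tau recursive multiplications
            v = vec_mat(v, marginal(y))
        scores[x] = sum(v)  # product with the all-ones column vector
    total = sum(scores.values())
    probs = {x: s / total for x, s in scores.items()}
    return max(probs, key=probs.get), probs
```

For a memoryless binary symmetric channel with crossover 0.1 (a single state), receiving 1 yields Pr(input = 1) = 0.9 regardless of the look-ahead outputs; with more than one channel state, the future outputs shift the decision.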