Specialized Equations Or Comparisons Patents (Class 704/236)

Method and apparatus for using coding scheme selection patterns in a predictive speech coder to reduce sensitivity to frame error conditions

Patent number: 6438518

Abstract: A method and apparatus for using coding scheme selection patterns in a predictive speech coder to reduce sensitivity to frame error conditions includes a speech coder configured to select from among various predictive coding modes. After a predefined number of speech frames have been predictively coded, the speech coder codes one frame with a nonpredictive coding mode or a mildly predictive coding mode. The predefined number of frames can be determined in advance from the subjective standpoint of a listener. The predefined number of frames may be varied periodically. An average coding bit rate may be maintained for the speech coder by ensuring that an average coding bit rate is maintained for each successive pattern, or group, of predictively coded speech frames including at least one nonpredictively coded or mildly predictively coded speech frame.

Type: Grant

Filed: October 28, 1999

Date of Patent: August 20, 2002

Assignee: Qualcomm Incorporated

Inventors: Sharath Manjunath, Andrew P. Dejaco, Arasanipalai K. Ananthapadmanabhan, Eddie Lun Tik Choy
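
The selection pattern described in the abstract of patent 6438518 can be illustrated with a minimal sketch. It assumes a fixed predefined number of predictive frames followed by exactly one nonpredictive frame; the function name `coding_modes` and the mode labels are hypothetical, and the patent's varying patterns, mildly predictive mode, and bit-rate control are not modeled.

```python
from itertools import islice


def coding_modes(period):
    """Yield one coding mode per frame.

    Assumption for illustration: `period` predictively coded frames are
    always followed by a single nonpredictively coded frame.
    """
    if period < 1:
        raise ValueError("period must be at least 1")
    predictive_run = 0
    while True:
        if predictive_run == period:
            # Break the prediction chain so a lost frame cannot
            # corrupt decoding indefinitely.
            yield "nonpredictive"
            predictive_run = 0
        else:
            yield "predictive"
            predictive_run += 1


# Modes for the first eight frames with a pattern of three predictive frames.
modes = list(islice(coding_modes(3), 8))
```

Resetting the counter after each nonpredictive frame bounds how far a frame error can propagate, which is the sensitivity the abstract aims to reduce.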
System and method for relating syntax and semantics for a conversational speech application

Publication number: 20020095286

Abstract: A conversation manager processes spoken utterances from a user of a computer. The conversation manager includes a semantics analysis module and a syntax manager. A domain model that is used in processing the spoken utterances includes an ontology (i.e., world view for the relevant domain of the spoken utterances), lexicon, and syntax definitions. The syntax manager combines the ontology, lexicon, and syntax definitions to generate a grammatic specification. The semantics module uses the grammatic specification and the domain model to develop a set of frames (i.e., internal representation of the spoken utterance). The semantics module then develops a set of propositions from the set of frames. The conversation manager then uses the set of propositions in further processing to provide a reply to the spoken utterance.

Type: Application

Filed: October 25, 2001

Publication date: July 18, 2002

Applicant: International Business Machines Corporation

Inventors: Steven I. Ross, Robert C. Armes, Julie F. Alweis, Elizabeth A. Brownholtz, Jeffrey G. MacAllister
Information apparatus for dispatching output phrase to remote terminal in response to input sound

Patent number: 6421644

Abstract: An information apparatus is constructed for notifying output information to a remote terminal in response to an input signal of a sound. In the information apparatus, a first memory block memorizes characteristic data representing characteristics of various sounds. A second memory block memorizes various items of output information in correspondence to the characteristic data of the various sounds such that each item of the output information is associated with each sound. An input device collects a sound to provide an input signal of the collected sound. An analyzer device extracts characteristic data from the input signal of the collected sound. A controller device operates according to the extracted characteristic data for addressing the first memory block and the second memory block to identify the item of the output information corresponding to the collected sound. A transmitter device transmits the identified item of the output information to the remote terminal.

Type: Grant

Filed: July 26, 1999

Date of Patent: July 16, 2002

Assignee: Yamaha Corporation

Inventor: Hiromi Okitsu
Speech recognition method using confidence measure evaluation

Patent number: 6421640

Abstract: The invention relates to a method of automatically recognizing speech utterances, in which a recognition result is evaluated by means of a first confidence measure and a plurality of second confidence measures determined for a recognition result is automatically combined for determining the first confidence measure. To reduce the resultant error rate in the assessment of the correctness of a recognition result, the method is characterized in that the determination of the parameters weighting the combination of the second confidence measures is based on a minimization of a cross-entropy-error measure. A further improvement is achieved by means of a post-processing operation based on the maximization of the Gardner-Derrida error function.

Type: Grant

Filed: September 13, 1999

Date of Patent: July 16, 2002

Assignee: Koninklijke Philips Electronics N.V.

Inventors: Jannes G. A. Dolfing, Andreas Wendemuth
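
As a rough illustration of the abstract of patent 6421640, the sketch below combines several second confidence measures into one first confidence measure, with the combination weights fitted by minimizing a cross-entropy error. The logistic combination, the plain gradient descent, the training data, and the names `combine` and `train_weights` are all assumptions made for this example; the Gardner-Derrida post-processing step is not shown.

```python
import math


def combine(measures, weights, bias):
    """Map several confidence measures to one value in (0, 1)."""
    z = bias + sum(w * m for w, m in zip(weights, measures))
    return 1.0 / (1.0 + math.exp(-z))


def train_weights(features, labels, rate=0.5, epochs=500):
    """Fit weights by gradient descent on the mean cross-entropy error."""
    n = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            # For a logistic output, the cross-entropy gradient is (p - y) * x.
            err = combine(x, weights, bias) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        weights = [w - rate * g / n for w, g in zip(weights, grad_w)]
        bias -= rate * grad_b / n
    return weights, bias


# Hypothetical training set: two confidence measures per recognition result,
# label 1 for a correct result and 0 for an incorrect one.
features = [[0.9, 0.8], [0.8, 0.9], [0.2, 0.1], [0.1, 0.3]]
labels = [1, 1, 0, 0]
weights, bias = train_weights(features, labels)
high = combine([0.85, 0.85], weights, bias)
low = combine([0.15, 0.15], weights, bias)
```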
Error derived scores for detection systems

Patent number: 6418409

Abstract: A method for use in a detection unit that produces a score to be converted into a binary decision via the setting of a threshold is a method for generating the score as an error-derived score such that the threshold is a tolerable one-sided error probability. The method includes the steps of generating a primary score that is a monotonic function of the posterior probability, obtaining a distribution of primary scores of input signals that ought to lead to a particular binary decision, and translating, based on the distribution, the primary score of a current input signal to the error-derived score.

Type: Grant

Filed: October 26, 1999

Date of Patent: July 9, 2002

Assignee: Persay Inc.

Inventor: Yaakov Metzger
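
The translation step in the abstract of patent 6418409 can be sketched as follows, assuming the reference distribution is a list of primary scores from inputs that ought to be rejected and that the one-sided error probability is estimated as an empirical tail fraction. The names `make_error_score` and `impostor_scores` and the sample values are hypothetical.

```python
from bisect import bisect_left


def make_error_score(reference_scores):
    """Build a function mapping a primary score to an error-derived score.

    Assumption: `reference_scores` are primary scores of inputs that
    should be rejected. The error-derived score is the fraction of them
    that are at least as high as the given primary score, i.e. the
    empirical false-accept rate if the threshold sat at that score.
    """
    ordered = sorted(reference_scores)
    total = len(ordered)

    def error_score(primary):
        return (total - bisect_left(ordered, primary)) / total

    return error_score


impostor_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
to_error = make_error_score(impostor_scores)

# With error-derived scores, the threshold is the tolerable error rate itself.
tolerable_error = 0.1
accept = to_error(0.95) <= tolerable_error
```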
Discriminative gaussian mixture models for speaker verification

Patent number: 6411930

Abstract: Speaker identification is performed using a single Gaussian mixture model (GMM) for multiple speakers—referred to herein as a Discriminative Gaussian mixture model (DGMM). A likelihood sum of the single GMM is factored into two parts, one of which depends only on the Gaussian mixture model, and the other of which is a discriminative term. The discriminative term allows for the use of a binary classifier, such as a support vector machine (SVM). In one embodiment of the invention, a voice messaging system incorporates a DGMM to identify the speaker who generated a message, if that speaker is a member of a chosen list of target speakers, or to identify the speaker as a “non-target” otherwise.

Type: Grant

Filed: February 12, 1999

Date of Patent: June 25, 2002

Assignee: Lucent Technologies Inc.

Inventor: Christopher John Burges
Speech processing apparatus and method for noise masking

Patent number: 6411925

Abstract: A speech processing apparatus is provided in which the distribution of energy with frequency within each frame of an input speech signal is determined and any energy components which are less than a masking level determined relative to the maximum energy within the frame are made equal to the masking level.

Type: Grant

Filed: September 30, 1999

Date of Patent: June 25, 2002

Assignee: Canon Kabushiki Kaisha

Inventor: Robert Alexander Keiller
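
The masking operation in the abstract of patent 6411925 can be sketched in a few lines. The sketch assumes each frame is already a list of per-frequency energies and that the masking level sits a fixed number of decibels below the frame maximum; the function name `mask_frame` and the parameter `depth_db` are hypothetical.

```python
def mask_frame(energies, depth_db=30.0):
    """Raise every energy component below the masking level up to that level.

    Assumption: the masking level is `depth_db` decibels below the
    largest energy in the frame.
    """
    peak = max(energies)
    masking_level = peak * 10 ** (-depth_db / 10)
    return [max(energy, masking_level) for energy in energies]


# A frame whose weakest component lies far below the peak.
masked = mask_frame([100.0, 1.0, 0.001], depth_db=20.0)
```

Components at or above the masking level pass through unchanged, so only the low-energy part of the spectrum, where noise dominates, is flattened.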
Automatic speech recognition using segmented curves of individual speech components having arc lengths generated along space-time trajectories

Patent number: 6401064

Abstract: A method and apparatus for speech recognition using Markov processes on curves are presented. The method and apparatus operate such that input speech utterances are received and represented as multidimensional curves. The curve is split into acoustic segments representing different components based on initial model estimates. The segments are used to create a new statistical model for the curve. The process may be reiterated to produce a more precise statistical model for recognition. As a result, feature vectors are extracted from input speech and contribute to a recognition score in proportion to their arc length. The arc lengths are weighted to minimize recognition errors due to variations in speaking rate. In addition, more importance is attached to short-lived but non-stationary sounds, such as consonants.

Type: Grant

Filed: May 24, 2001

Date of Patent: June 4, 2002

Assignee: AT&T Corp.

Inventor: Lawrence Kevin Saul
Easily tunable auditory-based speech signal feature extraction method and apparatus for use in automatic speech recognition

Publication number: 20020062211

Abstract: A method and apparatus for extracting speech features from a speech signal in which the linear frequency spectrum data, as generated, for example, by a conventional frequency transform, is first converted to logarithmic frequency spectrum data having frequency data distributed on a substantially logarithmic (rather than linear) frequency scale. Then, a plurality of digital auditory filters is applied to the resultant logarithmic frequency spectrum data, each of these filters having a substantially similar shape, but centered at different points on the logarithmic frequency scale. Because each of the filters has a similar shape, the feature extraction approach of the present invention advantageously can be easily modified or tuned by adjusting each of the filters in a coordinated manner, with the adjustment of only a handful of filter parameters.

Type: Application

Filed: April 2, 2001

Publication date: May 23, 2002

Inventors: Qi P. Li, Olivier Siohan, Frank Kao-Ping Soong
Methods for speech processing

Publication number: 20020055838

Abstract: The invention relates to a method for speech processing in which input variables containing speech features are mapped onto output variables. In the mapping process, the input variables are weighted and/or identical maps are produced for different sets of input variables and at least one output variable.

Type: Application

Filed: September 24, 2001

Publication date: May 9, 2002

Inventors: Achim Mueller, Hans-Georg Zimmermann
Method for recognizing speech

Publication number: 20020046024

Abstract: A method for recognizing speech is proposed wherein the process of recognition is started using the starting acoustic model (SAM) and wherein the current acoustic model (CAM) is modified by removing or cancelling model function mixture components (MFMjk) which are negligible for the description of the speaking behaviour and quality of the current speaker. Therefore, the size of the acoustic model (SAM, CAM) is reduced by adaptation to the current speaker enabling fast performance and increased recognition efficiency.

Type: Application

Filed: September 5, 2001

Publication date: April 18, 2002

Inventors: Ralf Kompe, Silke Goronzy
Penalized maximum likelihood estimation methods, the Baum-Welch algorithm and diagonal balancing of symmetric matrices for the training of acoustic models in speech recognition

Patent number: 6374216

Abstract: A nonparametric family of density functions formed by histogram estimators for modeling acoustic vectors is used in automatic recognition of speech. A Gaussian kernel is set forth in the density estimator. When the densities are found for all the basic sounds in a training stage, an acoustic vector is assigned to a phoneme label corresponding to the highest likelihood for the basis of the decoding of acoustic vectors into text.

Type: Grant

Filed: September 27, 1999

Date of Patent: April 16, 2002

Assignee: International Business Machines Corporation

Inventors: Charles A. Micchelli, Peder A. Olsen
System, method and article of manufacture for an emotion detection system improving emotion recognition

Patent number: 6353810

Abstract: A voice signal and an emotion associated therewith are provided. Then, the emotion associated with the voice signal is determined in an automated manner and subsequently stored. Next, a user-determined emotion associated with the voice signal is determined by a user and received. The automatically determined emotion and the user-determined emotion are then compared.

Type: Grant

Filed: August 31, 1999

Date of Patent: March 5, 2002

Assignee: Accenture LLP

Inventor: Valery A. Petrushin
Voice recognition device

Publication number: 20020010581

Abstract: A voice recognition device, where at least two input signals are routed in parallel via respective, separate channels to a recognition device having a feature extraction device for forming feature vectors, a transformation device for forming transformed feature vectors, and having a subsequent classification unit that classifies the supplied, transformed feature vectors and emits output signals corresponding to the determined classes. A high rate of recognition at a relatively low expenditure for the design and processing is achieved in that the feature extraction device has feature extraction stages separately arranged in the individual channels, the feature extraction stages being connected at their outputs to the shared transformation device.

Type: Application

Filed: June 13, 2001

Publication date: January 24, 2002

Inventors: Stephan Euler, Andreas Korthauer
Adaptation system and method for E-commerce and V-commerce applications

Patent number: 6341264

Abstract: Electronic commerce (E-commerce) and Voice commerce (V-commerce) proceeds by having the user speak into the system. The user's speech is converted by speech recognizer into a form required by the transaction processor that effects the electronic commerce operation. A dimensionality reduction processor converts the user's input speech into a reduced dimensionality set of values termed eigenvoice parameters. These parameters are compared with a set of previously stored eigenvoice parameters representing a speaker population (the eigenspace representing speaker space) and the comparison is used by the speech model adaptation system to rapidly adapt the speech recognizer to the user's speech characteristics. The user's eigenvoice parameters are also stored for subsequent use by the speaker verification and speaker identification modules.

Type: Grant

Filed: February 25, 1999

Date of Patent: January 22, 2002

Assignee: Matsushita Electric Industrial Co., Ltd.

Inventors: Roland Kuhn, Jean-Claude Junqua
Bitstream-based feature extraction method for a front-end speech recognizer

Publication number: 20010044718

Abstract: A feature extraction process for use in a wireless communication system provides automatic speech recognition based on both spectral envelope and voicing information. The shape of the spectral envelope is used to determine the LSPs of the incoming bitstream and the adaptive gain coefficients and fixed gain coefficients are used to generate the “voiced” and “unvoiced” feature parameter information.

Type: Application

Filed: December 5, 2000

Publication date: November 22, 2001

Inventors: Richard Vandervoort Cox, Hong Kook Kim
Method and apparatus for clustering-based signal segmentation

Patent number: 6314392

Abstract: In a computerized method, a continuous signal is segmented in order to determine statistically stationary units of the signal. The continuous signal is sampled at periodic intervals to produce a timed sequence of digital samples. Fixed numbers of adjacent digital samples are grouped into a plurality of disjoint sets or frames. A statistical distance between adjacent frames is determined. The adjacent sets are merged into a larger set of samples or cluster if the statistical distance is less than a predetermined threshold. In an iterative process, the statistical distance between the adjacent sets is determined, and as long as the distance is less than the predetermined threshold, the sets are iteratively merged to segment the signal into statistically stationary units.

Type: Grant

Filed: September 20, 1996

Date of Patent: November 6, 2001

Assignee: Digital Equipment Corporation

Inventors: Brian S. Eberman, William D. Goldenthal
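
The iterative merging in the abstract of patent 6314392 can be sketched as below. The absolute difference of set means stands in for the statistical distance, which the abstract does not specify, and a trailing partial frame is simply dropped; both choices, and the function name `segment`, are assumptions made for illustration.

```python
from statistics import mean


def segment(samples, frame_size, threshold):
    """Merge adjacent frames into clusters while they stay statistically close.

    Assumption: distance between two adjacent sets is the absolute
    difference of their means.
    """
    # Group fixed numbers of adjacent samples into disjoint frames.
    clusters = [
        list(samples[i:i + frame_size])
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]
    merged = True
    while merged:
        merged = False
        i = 0
        while i < len(clusters) - 1:
            if abs(mean(clusters[i]) - mean(clusters[i + 1])) < threshold:
                # Close enough: absorb the right neighbor and compare again.
                clusters[i] = clusters[i] + clusters.pop(i + 1)
                merged = True
            else:
                i += 1
    return clusters


signal = [0, 0, 0, 0, 5, 5, 5, 5, 5, 5, 0, 0]
units = segment(signal, frame_size=2, threshold=1)
unit_lengths = [len(unit) for unit in units]
```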
Speech recognition using both time encoding and HMM in parallel

Patent number: 6301562

Abstract: A speech recognition method that combines time encoding and hidden Markov approaches. The speech is input and encoded using time encoding, such as TESPAR. A hidden Markov model generates scores; the scores are used to determine the speech element; and the result is output.

Type: Grant

Filed: April 27, 2000

Date of Patent: October 9, 2001

Assignee: New Transducers Limited

Inventors: Henry Azima, Charalampos Ferekidis, Sean Kavanagh
Hierarchial subband linear predictive cepstral features for HMM-based speech recognition

Patent number: 6292776

Abstract: A method and apparatus for first training and then recognizing speech. The method and apparatus use subband cepstral features to improve the recognition string accuracy rates for speech inputs.

Type: Grant

Filed: March 12, 1999

Date of Patent: September 18, 2001

Assignee: Lucent Technologies Inc.

Inventor: Rathinavelu Chengalvarayan
Methods and apparatus for performing speech recognition using acoustic models which are improved through an interactive process

Patent number: 6263308

Abstract: Automated methods and apparatus for synchronizing audio and text data, e.g., in the form of electronic files, representing audio and text expressions of the same work or information are described. Also described are automated methods of detecting errors and other discrepancies between the audio and text versions of the same work. A speech recognition operation is performed on the audio data initially using a speaker independent acoustic model. The recognized text in addition to audio time stamps are produced by the speech recognition operation. The recognized text is compared to the text in text data to identify correctly recognized words. The acoustic model is then retrained using the correctly recognized text and corresponding audio segments from the audio data transforming the initial acoustic model into a speaker trained acoustic model. The retrained acoustic model is then used to perform an additional speech recognition operation on the audio data.

Type: Grant

Filed: March 20, 2000

Date of Patent: July 17, 2001

Assignee: Microsoft Corporation

Inventors: David E. Heckerman, Fileno A. Alleva, Robert L. Rounthwaite, Daniel Rosen, Mei-Yuh Hwang, Yoram Yaacovi, John L. Manferdelli
Maximum likelihood method for finding an adapted speaker model in eigenvoice space

Patent number: 6263309

Abstract: A set of speaker dependent models is trained upon a comparatively large number of training speakers, one model per speaker, and model parameters are extracted in a predefined order to construct a set of supervectors, one per speaker. Principal component analysis is then performed on the set of supervectors to generate a set of eigenvectors that define an eigenvoice space. If desired, the number of vectors may be reduced to achieve data compression. Thereafter, a new speaker provides adaptation data from which a supervector is constructed by constraining this supervector to be in the eigenvoice space based on a maximum likelihood estimation. The resulting coefficients in the eigenspace of this new speaker may then be used to construct a new set of model parameters from which an adapted model is constructed for that speaker. Environmental adaptation may be performed by including environmental variations in the training data.

Type: Grant

Filed: April 30, 1998

Date of Patent: July 17, 2001

Assignee: Matsushita Electric Industrial Co., Ltd.

Inventors: Patrick Nguyen, Roland Kuhn, Jean-Claude Junqua
Mobile phone having speaker dependent voice recognition method and apparatus

Patent number: 6260012

Abstract: An apparatus and method for performing improved speech recognition in a communication terminal, e.g., a mobile phone with a hands-free voice dialing function. In a speech recognition mode, a user's input speech such as a desired called party name, number or a phone command, is converted to feature data and compared to individual pre-stored feature data sets corresponding to pre-recorded speech obtained during a registration process. Difference values representing the respective differences between the current user's input speech and the respective data sets are computed. A first closest (most similar) and second closest feature data set correspond to the first smallest and second smallest difference values so obtained. A closeness threshold is computed as the sum of a small, predetermined threshold and a differential value between the first and second difference values.

Type: Grant

Filed: March 1, 1999

Date of Patent: July 10, 2001

Assignee: Samsung Electronics Co., LTD

Inventor: Joung-Kyou Park
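
The threshold computation in the abstract of patent 6260012 can be sketched as follows. Squared Euclidean distance is assumed as the difference value, and the names `closeness_threshold`, `templates`, and `base_threshold` are hypothetical. How the threshold feeds the final accept or reject decision is not stated in the abstract, so the sketch stops at computing it.

```python
def closeness_threshold(input_features, templates, base_threshold):
    """Return the closest entry, the second closest, and the closeness threshold.

    Assumption: the difference value between two feature sets is their
    squared Euclidean distance.
    """
    differences = sorted(
        (sum((a - b) ** 2 for a, b in zip(input_features, features)), name)
        for name, features in templates.items()
    )
    (first_diff, closest), (second_diff, runner_up) = differences[0], differences[1]
    # Sum of a small predetermined threshold and the gap between the two
    # smallest difference values.
    return closest, runner_up, base_threshold + (second_diff - first_diff)


# Hypothetical feature sets stored during registration.
templates = {
    "call home": [0.0, 0.0],
    "call office": [3.0, 4.0],
    "redial": [6.0, 8.0],
}
closest, runner_up, threshold = closeness_threshold([0.0, 1.0], templates, 0.5)
```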
Speech recognition and teaching apparatus able to rapidly adapt to difficult speech of children and foreign speakers

Patent number: 6253181

Abstract: The recognizer tests input utterances using a confidence measure to select words of high recognition confidence for use in the adaptation process. Adaptation is performed rapidly using a priori knowledge about the class of speakers who will be using the system. This a priori knowledge can be expressed using eigenvoice basis vectors that capture information about the entire targeted user population. The dialogue system may also use the confidence measure to output a pronunciation example to the user, based on the confidence that the system has in the results of recognition, given the different possibilities that can be recognized. The dialogue system may also provide voiced prompts that teach the user how to correctly pronounce words.

Type: Grant

Filed: January 22, 1999

Date of Patent: June 26, 2001

Assignee: Matsushita Electric Industrial Co., Ltd.

Inventor: Jean-Claude Junqua
Method and system for measurement of speech distortion from samples of telephonic voice signals

Patent number: 6246978

Abstract: A system that provides measurements of speech distortion that correspond closely to user perceptions of speech distortion is provided. The system calculates and analyzes first and second discrete derivatives to detect and determine the incidence of change in the voice waveform that would not have been made by human articulation because natural voice signals change at a limited rate. Statistical analysis is performed of both the first and second discrete derivatives to detect speech distortion by looking at the distribution of the signals. For example, the kurtosis of the signals is analyzed as well as the number of times these values exceed a predetermined threshold. Additionally, the number of times the first derivative data is less than a predetermined low value is analyzed to provide a level of speech distortion and clipping of the signal due to lost data packets.

Type: Grant

Filed: May 18, 1999

Date of Patent: June 12, 2001

Assignee: MCI WorldCom, Inc.

Inventor: William C. Hardy
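
The statistics named in the abstract of patent 6246978 can be computed with a short sketch. It assumes the voice signal is a plain list of sample values; the helper names and the threshold value are hypothetical, and mapping these statistics to a distortion level is left out because the abstract does not give the rule.

```python
def discrete_derivative(samples):
    """First discrete derivative: differences between consecutive samples."""
    return [later - earlier for earlier, later in zip(samples, samples[1:])]


def kurtosis(values):
    """Fourth central moment divided by the squared variance."""
    n = len(values)
    center = sum(values) / n
    variance = sum((v - center) ** 2 for v in values) / n
    if variance == 0:
        raise ValueError("kurtosis is undefined for constant data")
    return sum((v - center) ** 4 for v in values) / (n * variance ** 2)


def exceed_count(values, threshold):
    """Number of values whose magnitude exceeds the threshold."""
    return sum(1 for v in values if abs(v) > threshold)


# A smooth ramp with one abrupt spike that natural articulation would not produce.
samples = [0, 1, 2, 3, 10, 3, 2, 1, 0]
first = discrete_derivative(samples)
second = discrete_derivative(first)
first_kurtosis = kurtosis(first)
first_exceed = exceed_count(first, 5)
second_exceed = exceed_count(second, 5)
```

A heavy-tailed derivative distribution, reflected in a high kurtosis and many threshold crossings, indicates changes faster than the limited rate of natural voice signals.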
Factorial packing method and apparatus for information coding

Patent number: 6236960

Abstract: An improved speech coder takes advantage of the fact that any given pulse combination can be uniquely described by the following four properties: number of degenerate pulses, signs of pulses, positions of pulses, and pulse magnitudes. In accordance with the invention, a four stage iterative classification of the pulse combinations, where each stage groups the pulse combinations by one of these four properties, is performed. The process starts with the number of pulses, then determines the total number of possible sign combinations, pulse position combinations, and pulse magnitude combinations. This flexibility allows for the sign combinations to be grouped in the last stage. Since the number of sign combinations is always a power of two, leaving the sign combinations for last along with appropriately ordering the elements in the previous three stages allows the signs to be coded by independent bits, in turn allowing for error protection of those bits.

Type: Grant

Filed: August 6, 1999

Date of Patent: May 22, 2001

Assignee: Motorola, Inc.

Inventors: Weimin Peng, Edgardo Manuel Cruz Zeno, James Patrick Ashley
Path link passing speech recognition with vocabulary node being capable of simultaneously processing plural path links

Patent number: 6230128

Abstract: A path link passing speech recognition system and method recognizes input connected speech. The recognition system has a plurality of vocabulary nodes associated with word representation models, at least one of the vocabulary nodes of the network being able to process more than one path link simultaneously, so allowing for more than one recognition result.

Type: Grant

Filed: November 21, 1995

Date of Patent: May 8, 2001

Assignee: British Telecommunications public limited company

Inventor: Samuel Gavin Smyth
Word-spotting speech recognition device and system

Patent number: 6230126

Abstract: A device for speech recognition includes a dictionary which stores features of recognition objects. The device further includes a matching unit which compares features of input speech with the features of the recognition objects, and a dictionary updating unit which updates time lengths of phonemics in the dictionary based on the input speech when the matching unit finds substantial similarities between the input speech and one of the recognition objects.

Type: Grant

Filed: December 17, 1998

Date of Patent: May 8, 2001

Assignee: Ricoh Company, Ltd.

Inventor: Masaru Kuroda
Method for integrating computer processes with an interface controlled by voice actuated grammars

Patent number: 6208972

Abstract: According to the present invention, a method for integrating processes with a multi-faceted human centered interface is provided. The interface is facilitated to implement a hands free, voice driven environment to control processes and applications. A natural language model is used to parse voice initiated commands and data, and to route those voice initiated inputs to the required applications or processes. The use of an intelligent context based parser allows the system to intelligently determine what processes are required to complete a task which is initiated using natural language. A single window environment provides an interface which is comfortable to the user by preventing distracting windows from appearing. The single window has a plurality of facets which allow distinct viewing areas. Each facet has an independent process routing its outputs thereto. As other processes are activated, each facet can reshape itself to bring a new process into one of the viewing areas.

Type: Grant

Filed: December 23, 1998

Date of Patent: March 27, 2001

Inventors: Richard Grant, Pedro E. McGregor
Synthesis subband filter in MPEG-II audio decoding

Patent number: 6199039

Abstract: An MPEG-II audio decoder with a synthesis subband filter includes a fast IMDCT (Inverse Modified Discrete Cosine Transform) module and an IPQMF (Inverse Pseudo Quadrature Mirror Filter) module. The fast IMDCT module involves a butterfly stage of input subband samples which requires only about ¼ the amount of multiplier-accumulate computation of the ISO suggested method. The IPQMF module involves an efficient memory configuration which requires only half the size of the standard synthesis subband filter bank.

Type: Grant

Filed: August 3, 1998

Date of Patent: March 6, 2001

Assignee: National Science Council

Inventors: Liang-Gee Chen, Tsung-Han Tsai, Yuan-Chen Liu
Matching algorithm for isolated speech recognition

Patent number: 6195639

Abstract: The present invention provides a system and method for improving conventional, isolated word, speech recognition systems. According to exemplary embodiments of the present invention, a pattern matching algorithm is provided that permits an unknown speech signal to be recognized with fewer memory access operations compared to conventional techniques. The pattern matching algorithm performs multiple successive calculations on speech reference data retrieved from memory to thereby reduce the number of times that the same data is retrieved. By reducing the number of memory access operations, the throughput of the speech recognition system can be increased. As an alternative, the pattern matching algorithm allows for an increase in the size of the speech recognition system's vocabulary.

Type: Grant

Filed: May 28, 1999

Date of Patent: February 27, 2001

Assignee: Telefonaktiebolaget LM Ericsson (publ)

Inventors: Alberto Jimenez Feltström, Jim Rasmusson
Multiresolutional classifier with training system and method

Patent number: 6192353

Abstract: An improved method and system for training and classifying using a low complexity and high accuracy multiresolutional polynomial classifier (412) is presented. A method of training a multiresolutional polynomial classifier which reduces the complexity of existing classifiers allows models representing subgroups of classes to easily be created. The models which represent subgroups of classes are applied to an unidentified input to produce a coarse classification of the unidentified input using a low order classifier. Once a coarse classification of the unidentified input is performed, a more detailed classification is performed using another low complexity classifier.

Type: Grant

Filed: February 9, 1998

Date of Patent: February 20, 2001

Assignee: Motorola, Inc.

Inventors: Khaled Assaleh, William Michael Campbell, John Eric Kleider
Topic indexing method

Patent number: 6185531

Abstract: A method for improving the association of articles of information or stories with topics associated with specific subjects (subject topics) and with a general topic of words that are not associated with any subject. The inventive method is trained using Hidden Markov Models (HMM) to represent each story, with each state in the HMM representing each topic. A standard Expectation and Maximization algorithm, as is known in this art field, can be used to maximize the expected likelihood to the method relating the words associated with each topic to that topic. In the method, the probability that each word in a story is related to a subject topic is determined and evaluated, and the subject topics with the lowest probability are discarded. The remaining subject topics are evaluated and a sub-set of subject topics with the highest probabilities over all the words in a story are considered to be the “correct” subject topic set.

Type: Grant

Filed: January 9, 1998

Date of Patent: February 6, 2001

Assignee: GTE Internetworking Incorporated

Inventors: Richard M. Schwartz, Toru Imai
Method and apparatus for detecting voice activity

Patent number: 6182035

Abstract: A voice activity detector that implements a fast wavelet transformation using filter pairs. A quadrature high pass filter provides an output signal corresponding to the upper half of the Nyquist frequency and a quadrature low pass filter provides an output signal corresponding to the lower half of the Nyquist frequency. The quadrature high pass filter is useful for catching and isolating transients in the input signal and the quadrature low pass filter is useful for fine frequency analysis. The voice activity detector can utilize multiple decomposition levels that are arranged in a pyramid or tree formation to increase the reliability of the voice activity decision. For example, the output of the quadrature low pass filter can be further decomposed using a second pair of filters. The voice activity decision can be generated by comparing a signal power estimate for the output of the filter pairs to threshold levels that are specific for each filter or frequency range.

Type: Grant

Filed: March 26, 1998

Date of Patent: January 30, 2001

Assignee: Telefonaktiebolaget LM Ericsson (publ)

Inventor: Fisseha Mekuria
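
The two-level decomposition in the abstract of patent 6182035 can be sketched with the simplest wavelet filter pair. The Haar pair used here is a stand-in for the quadrature filters in the patent, the rule that any band exceeding its threshold signals voice activity is an assumption, and the names `haar_split`, `band_power`, and `voice_active` are hypothetical.

```python
def haar_split(samples):
    """Split a signal into lower-half and upper-half bands with a Haar filter pair."""
    pairs = list(zip(samples[0::2], samples[1::2]))
    low = [(a + b) / 2 for a, b in pairs]
    high = [(a - b) / 2 for a, b in pairs]
    return low, high


def band_power(band):
    """Mean squared value of a band, used as its signal power estimate."""
    return sum(v * v for v in band) / len(band)


def voice_active(frame, thresholds):
    """Decide voice activity from per-band power estimates.

    Assumption: activity is declared when any band exceeds its own threshold.
    """
    low1, high1 = haar_split(frame)
    # Second decomposition level: split the low-pass output again.
    low2, high2 = haar_split(low1)
    bands = {"high1": high1, "high2": high2, "low2": low2}
    return any(band_power(bands[name]) > thresholds[name] for name in bands)


thresholds = {"high1": 0.1, "high2": 0.1, "low2": 0.1}
silent = voice_active([0.0] * 8, thresholds)
active = voice_active([1.0, -1.0] * 4, thresholds)
```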
Rapid adaptation of speech models

Patent number: 6151575

Abstract: A source-adapted model for use in speech recognition is generated by defining a linear relationship between a first element of an initial model and a first element of the source-adapted model. Thereafter, speech data that corresponds to the first element of the initial model is assembled from a set of speech data for a particular source associated with the source-adapted model. A linear transform that maps between the assembled speech data and the first element of the initial model is then determined. Finally, a first element of the source-adapted model is produced from the first element of the initial model using the linear transform.

Type: Grant

Filed: October 28, 1997

Date of Patent: November 21, 2000

Assignee: Dragon Systems, Inc.

Inventors: Michael Jack Newman, Laurence S. Gillick, Venkatesh Nagesha
Identification-function calculator, identification-function calculating method, identification unit, identification method, and speech recognition system

Patent number: 6134525

Abstract: A discriminant or identification function is used for pattern recognition in which the highest performance can be offered when adaptation is made. Learning is carried out while a discriminant or identification function is adapted to a learning sample. For example, a standard pattern of the character "A" used as an identification function is learned such that when the character "A" slanting in the right or left direction is input, the standard pattern of the character "A" is rotated (adapted) in accordance with the slanting of the input learning sample.

Type: Grant

Filed: October 21, 1998

Date of Patent: October 17, 2000

Assignee: Sony Corporation

Inventor: Naoto Iwahashi
Method and apparatus for discriminative utterance verification using multiple confidence measures

Patent number: 6125345

Abstract: A multiple confidence measures subsystem of an automated speech recognition system allows otherwise independent confidence measures to be integrated and used for both training and testing on a consistent basis. Speech to be recognized is input to a speech recognizer and a recognition verifier of the multiple confidence measures subsystem. The speech recognizer generates one or more confidence measures. The speech recognizer preferably generates a misclassification error (MCE) distance as one of the confidence measures. The recognized speech output by the speech recognizer is input to the recognition verifier, which outputs one or more confidence measures. The recognition verifier preferably outputs a misverification error (MVE) distance as one of the confidence measures. The confidence measures output by the speech recognizer and the recognition verifier are normalized and then input to an integrator.

Type: Grant

Filed: September 19, 1997

Date of Patent: September 26, 2000

Assignee: AT&T Corporation

Inventors: Piyush C. Modi, Mazin G. Rahim
Check-sum based method and apparatus for performing speech recognition

Patent number: 6122612

Abstract: A method and apparatus for matching at least a first input identifier with a reference identifier. A user provides an input identifier into a system, and the system produces a recognized identifier based on the input identifier. The system of the present invention performs a check-sum operation to determine whether the recognized identifier was recognized correctly. If the check-sum operation reveals that the recognized identifier is incorrect, the system of the present invention generates a plurality of substitute identifiers. The substitute identifiers are compared to a set of pre-stored reference identifiers. If a match is found between a reference identifier and a substitute identifier, the matched reference identifier is selected as corresponding to the input identifier provided by the user.

Type: Grant

Filed: November 20, 1997

Date of Patent: September 19, 2000

Assignee: AT&T Corp

Inventor: Randy G. Goldberg
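
The check-then-substitute flow in the abstract above can be sketched as follows. The check-sum scheme (last digit equals the sum of the other digits modulo 10), the confusion table, the direct lookup when the check-sum passes, and all names are illustrative assumptions; the abstract does not specify any of them.

```python
# Hypothetical pairs of digits a recognizer might confuse with each other.
CONFUSABLE = {"5": "9", "9": "5", "1": "7", "7": "1"}

def checksum_ok(identifier):
    """Assumed scheme: the last digit equals the sum of the other digits mod 10."""
    *body, check = (int(c) for c in identifier)
    return sum(body) % 10 == check

def substitutes(identifier):
    """Yield identifiers that differ from the input by one confusable character."""
    for i, ch in enumerate(identifier):
        if ch in CONFUSABLE:
            yield identifier[:i] + CONFUSABLE[ch] + identifier[i + 1:]

def resolve(recognized, references):
    """Return the matching reference identifier, or None if nothing matches."""
    if checksum_ok(recognized):
        return recognized if recognized in references else None
    for candidate in substitutes(recognized):
        if candidate in references:
            return candidate
    return None
```

For example, with `references = {"1236"}`, the misrecognized input `"7236"` fails the check-sum, and the single-character substitute `"1236"` matches the stored reference.
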
Method for training a speech recognition system and an apparatus for practising the method, in particular, a portable telephone apparatus

Patent number: 6078883

Abstract: For training a speech recognition to a multi-item repertoire, the following steps are executed: a speech item is presented by a user person, and the distinctivity thereof in the repertoire is asserted. Under control of a distinctivity found the speech item is inserted into the repertoire. These steps are repeated until reaching repertoire sufficiency. In particular, the asserting determines a likeness among the actually presented speech item and all items already in the repertoire, wherein undue likeness with one particular stored item creates a contingency procedure. This implies offering to the user a choice between ignoring the actually presented speech item and alternatively inserting the actually presented speech item at a price of deleting the particular stored item.

Type: Grant

Filed: December 17, 1997

Date of Patent: June 20, 2000

Assignee: U.S. Philips Corporation

Inventors: Benoit Guilhaumon, Gilles Miet
Methods and apparatus for discriminative training and adaptation of pronunciation networks

Patent number: 6076053

Abstract: A speech recognition method comprises the steps of using given speech data and the N-best algorithm to generate alternative pronunciations and then merging the obtained pronunciations into a pronunciation networks structure; using additional parameters to characterize a pronunciation network for a particular word; optimizing the parameters of the pronunciation networks using a minimum classification error criterion that maximizes a discrimination between different pronunciation networks; and adapting parameters of the pronunciation networks by, first, adjusting probabilities of the possible pronunciations that may be generated by the pronunciation network for a word claimed to be a true one and, second, to correct weights for all of the pronunciation networks by using the adjusted probabilities.

Type: Grant

Filed: May 21, 1998

Date of Patent: June 13, 2000

Assignee: Lucent Technologies Inc.

Inventors: Biing-Hwang Juang, Filipp E. Korkmazskiy
Adaptive speech recognition with selective input data to a speech classifier

Patent number: 6044343

Abstract: One embodiment of a speech recognition system is organized with speech input signal preprocessing and feature extraction followed by a fuzzy matrix quantizer (FMQ) designed with respective codebook sets at multiple signal to noise ratios. The FMQ quantizes various training words from a set of vocabulary words and produces observation sequences O output data to train hidden Markov model (HMM) processes $\lambda_j$ and produces fuzzy distance measure output data for each vocabulary word codebook. A fuzzy Viterbi algorithm is used by a processor to compute maximum likelihood probabilities $PR(O \mid \lambda_j)$ for each vocabulary word. The fuzzy distance measures and maximum likelihood probabilities are mixed in a variety of ways to preferably optimize speech recognition accuracy and speech recognition speed performance.

Type: Grant

Filed: June 27, 1997

Date of Patent: March 28, 2000

Assignee: Advanced Micro Devices, Inc.

Inventors: Lin Cong, Safdar M. Asghar
Hierarchical labeler in a speech recognition system

Patent number: 6023673

Abstract: A speech coding apparatus and method uses a hierarchy of prototype sets to code an utterance while consuming fewer computing resources. The value of at least one feature of an utterance is measured during each of a series of successive time intervals to produce a series of feature vector signals representing the feature values. A plurality of level subsets of prototype vector signals is computed, wherein each prototype vector signal in a higher level subset is associated with at least one prototype vector signal in a lower level subset. Each level subset contains a plurality of prototype vector signals, with lower level subsets containing more prototypes than higher level subsets. The closeness of the feature value of the first feature vector signal is compared to the parameter values of prototype vector signals in the first level subset of prototype vector signals to obtain a ranked list of prototype match scores for the first feature vector signal and each prototype vector signal in the first level subset.

Type: Grant

Filed: June 4, 1997

Date of Patent: February 8, 2000

Assignee: International Business Machines Corporation

Inventors: Raimo Bakis, David Nahamoo, Michael Alan Picheny, Jan Sedivy
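
The two-level search in the abstract above can be illustrated with a minimal sketch. It assumes scalar features, absolute difference as the closeness measure, and a `keep` parameter for how many top-level matches to expand; these simplifications and all names are hypothetical rather than taken from the patent.

```python
def rank(feature, prototypes):
    """Return (distance, name) pairs sorted from closest to farthest."""
    return sorted((abs(feature - value), name) for name, value in prototypes.items())

def label(feature, top_level, children, low_level, keep=1):
    """Score the small top level first, then only the children of the best matches."""
    best_parents = [name for _, name in rank(feature, top_level)[:keep]]
    candidates = {c: low_level[c] for p in best_parents for c in children[p]}
    return rank(feature, candidates)[0][1]
```

The saving comes from never scoring lower-level prototypes whose parent was ranked outside the top `keep` entries, at the cost of occasionally missing the globally closest prototype.
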
Line spectral frequencies and energy features in a robust signal recognition system

Patent number: 6009391

Abstract: One embodiment of a speech recognition system is organized with speech input signal preprocessing and feature extraction followed by a fuzzy matrix quantizer (FMQ). Frames of the speech input signal are represented in a matrix by a vector f of line spectral pair frequencies and energy coefficients and are fuzzy matrix quantized to respective vector f entries of a matrix codeword in a codebook of the FMQ. The energy coefficients include the original energy and the first and second derivatives of the original energy which increase recognition accuracy by, for example, being generally distinctive speech input signal parameters and providing noise signal suppression especially when the noise signal has a relatively constant energy over at least two time frame intervals. To reduce data while maintaining sufficient resolution, the energy coefficients may be normalized and logarithmically represented. A distance measure between f and f, d(f, f), is defined as ##EQU1## where the constants $\alpha_1$, $\alpha$…

Type: Grant

Filed: August 6, 1997

Date of Patent: December 28, 1999

Assignee: Advanced Micro Devices, Inc.

Inventors: Safdar M. Asghar, Lin Cong
Method and system of adapting speech recognition models to speaker environment

Patent number: 6003002

Abstract: The method and system of adapting speech recognition models to a speaker environment may comprise receiving a spoken password (52) and getting a set of speaker independent (SI) speech recognition models (54). A mapping sequence may be determined for the spoken password (56). Using the mapping sequence, a speaker ID may be identified (58). A transform may be determined (66) between the SI speech recognition models and the spoken password using the mapping sequence. Speaker adapted (SA) speech recognition models may be generated (68) by applying the transform to SI speech recognition models. A speech input may be recognized (70) by applying the SA speech recognition models.

Type: Grant

Filed: December 29, 1997

Date of Patent: December 14, 1999

Assignee: Texas Instruments Incorporated

Inventor: Lorin P. Netsch
Speech compression by speech recognition

Patent number: 5987405

Abstract: A method of transmitting speech signals with reduced bandwidth requirements. With this invention an original speech signal is first converted to a textual representation, and a facsimile of the original speech is determined from the textual representation. Then a minimum error turn is derived from the difference between the original speech signal and the facsimile of the original speech signal. The minimum error turn is then compressed, and it is this compressed minimum error turn, along with the textual representation, that is transmitted on the communications medium. At the receiving end, the textual representation and the difference representation are split through a demultiplexer. The textual representation is then passed through a synthesizer while the difference representation is passed through a mapper.

Type: Grant

Filed: June 24, 1997

Date of Patent: November 16, 1999

Assignee: International Business Machines Corporation

Inventors: David Frederick Bantz, Robert Joseph Zavrel, Jr.
Evaluation of media content in media files

Patent number: 5983176

Abstract: A method and apparatus for searching for multimedia files in a distributed database and for displaying results of the search based on the context and content of the multimedia files.

Type: Grant

Filed: April 30, 1997

Date of Patent: November 9, 1999

Assignee: Magnifi, Inc.

Inventors: Eric M. Hoffert, Karl Cremin, Leo Degen
Apparatus and method for performing model estimation utilizing a discriminant measure

Patent number: 5970239

Abstract: Method for performing acoustic model estimation to optimize classification accuracy on speaker derived feature vectors with respect to a plurality of classes corresponding to phones to which a plurality of acoustic models respectively correspond comprises: (a) initializing an acoustic model for each phone; (b) evaluating the merit of the acoustic model initialized for each phone utilizing an objective function having a two component discriminant measure capable of characterizing each phone whereby a first component is defined as a probability that the model for the phone assigns to feature vectors from the phone and a second component is defined as a probability that the model for the phone assigns to feature vectors from other phones; (c) adapting the model for selected phones so as to increase the first component for the phone or decrease the second component for the phone, the adapting step yielding a new model for each selected phone; (d) evaluating the merit of the new models for each phone adapted in st

Type: Grant

Filed: August 11, 1997

Date of Patent: October 19, 1999

Assignee: International Business Machines Corporation

Inventors: Lalit Rai Bahl, Mukund Padmanabhan
Speech recognition training

Patent number: 5963906

Abstract: A method and system performs speech recognition training using Hidden Markov Models. Initially, preprocessed speech signals that include a plurality of observations are stored by the system. Initial Hidden Markov Model (HMM) parameters are then assigned. Summations are then calculated using modified equations derived substantially from the following equations, wherein $u \le v < w$: $P(x_u^w) = P(x_u^v)\,P(x_{v+1}^w)$ and $\Omega_{ij}(x_u^w) = \Omega_{ij}(x_u^v)\,P(x_{v+1}^w) + P(x_u^v)\,\Omega_{ij}(x_{v+1}^w)$. The calculated summations are then used to perform HMM parameter reestimation. The system then determines whether the HMM parameters have converged. If they have, the HMM parameters are then stored. However, if the HMM parameters have not converged, the system again calculates summations, performs HMM parameter reestimation using the summations, and determines whether the parameters have converged.

Type: Grant

Filed: May 20, 1997

Date of Patent: October 5, 1999

Assignee: AT&T Corp

Inventor: William Turin
Method and arrangement for adaptation of data models

Patent number: 5960392

Abstract: A method and an arrangement for adapting data models in adaptive speaker verification systems or similar adaptive systems using models based on data collected from a person, system or process during a certain time period. A plurality of different model units are used in the same speaker verification system. The verification system is put into operation using a simple model unit requiring a small amount of speech data. During the use, more speech data is collected continuously. This speech data is used to further train either (1) only more complex model units, or (2) both the simple model unit already in operation and the more complex model units. At suitable intervals, a comparison is made of the performance capacities of the model units. Once a more complex model unit yields a more reliable verification result, the more complex model unit is assigned to take over the verification in the operating situation. The more complex model unit may be put into operation either instantaneously or gradually, e.g.

Type: Grant

Filed: August 15, 1997

Date of Patent: September 28, 1999

Assignee: Telia Research AB

Inventors: Erik Sundberg, Hakan Melin
Speech processing apparatus and method using a noise-adaptive PMC model

Patent number: 5956679

Abstract: A speech processing apparatus includes a noise model production device for extracting a non-speech interval from input speech data and producing a noise model by using the data of the extracted interval. The apparatus also includes a composite distribution production device for dividing the distributions of a speech model into a plurality of groups, producing a composite distribution of each group, and determining the positional relationship of each distribution within each group. In addition, the apparatus includes a memory for storing each composite distribution and the positional relationship of each distribution within the group, and a PMC conversion device for PMC-converting each produced composite distribution. Also provided is a noise-adaptive speech model production device for producing a noise-adaptive speech model on the basis of the composite distribution which is PMC-converted by the PMC conversion device and the positional relationship stored by the memory.

Type: Grant

Filed: December 2, 1997

Date of Patent: September 21, 1999

Assignee: Canon Kabushiki Kaisha

Inventors: Yasuhiro Komori, Hiroki Yamamoto
Method of recognizing a sequence of words and device for carrying out the method

Patent number: 5946655

Abstract: When a language model is to be used for the recognition of a speech signal and the vocabulary is composed as a tree, the language model value cannot be taken into account before the word end. Customarily, after each word end the comparison with a tree root is started anew, be it with a score which has been increased by the language model value so that the threshold value for the scores at which hypotheses are terminated must be high and hence many, even unattractive hypotheses remain active for a prolonged period of time. In order to avoid this, in accordance with the invention a correction value is added to the score for at least a part of the nodes of the vocabulary tree; the sum of the correction values on the path to a word then may not be greater than the language model value for the relevant word. As a result, for each test signal the scores of all hypotheses are of a comparable order of magnitude.

Type: Grant

Filed: March 29, 1995

Date of Patent: August 31, 1999

Assignee: U.S. Philips Corporation

Inventors: Volker Steinbiss, Bach-Hiep Tran, Hermann Ney
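
The correction values described in the abstract above can be sketched under some assumptions. Scores are treated as costs (lower is better), the vocabulary tree is a letter trie, and the correction at a node is the increase in the smallest language-model cost reachable below it, with the root contributing 0. This min-based choice is one way to satisfy the stated constraint, not necessarily the patent's; the names are hypothetical.

```python
def build_trie(lm_cost):
    """Build a letter trie; '$' marks a word end and stores its language-model cost."""
    root = {}
    for word, cost in lm_cost.items():
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = cost
    return root

def best_cost(node):
    """Smallest language-model cost of any word reachable from this node."""
    return min(v if k == "$" else best_cost(v) for k, v in node.items())

def corrections(word, root):
    """Correction added at each node on the path to `word` (root contributes 0)."""
    out, node, prev = [], root, 0.0
    for ch in word:
        node = node[ch]
        cur = best_cost(node)
        out.append(cur - prev)  # only the increase over the parent's best cost
        prev = cur
    return out
```

Because the corrections telescope, their sum along the path to a word equals the best cost reachable from the word's final node, which never exceeds that word's own language-model cost. Any remainder would be added at the word end.
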
