Patents by Inventor Peter V. de Souza

Peter V. de Souza has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Enhanced likelihood computation using regression in a speech recognition system

Patent number: 6493667

Abstract: In order to achieve low error rates in a speech recognition system, for example, in a system employing rank-based decoding, we discriminate the most confusable incorrect leaves from the correct leaf by lowering their ranks. That is, we increase the likelihood of the correct leaf of a frame, while decreasing the likelihoods of the confusable leaves. In order to do this, we use the auxiliary information from the prediction of the neighboring frames to augment the likelihood computation of the current frame. We then use the residual errors in the predictions of neighboring frames to discriminate between the correct (best) and incorrect leaves of a given frame. We present a new methodology that incorporates prediction error likelihoods into the overall likelihood computation to improve the rank position of the correct leaf.

Type: Grant

Filed: August 5, 1999

Date of Patent: December 10, 2002

Assignee: International Business Machines Corporation

Inventors: Peter V. de Souza, Yuqing Gao, Michael Picheny, Bhuvana Ramabhadran
Method and apparatus for tone-sensitive acoustic modeling

Patent number: 5884261

Abstract: Tone-sensitive acoustic models are generated by first generating acoustic vectors which represent the input data. The input data is separated into multiple frames and an acoustic vector is generated for each frame which represents the input data over its corresponding frame. A tone-sensitive parameter is then generated for each of the frames which indicates the tone of the input data at its corresponding frame. Tone-sensitive parameters are generated in accordance with two embodiments. First, a pitch detector may be used to calculate a pitch for each of the frames. If a pitch cannot be detected for a particular frame, then a pitch is created for that frame based on the pitch values of surrounding frames. Second, the cross covariance between the autocorrelation coefficients for each frame and its successive frame may be generated and used as the tone-sensitive parameter.

Type: Grant

Filed: July 7, 1994

Date of Patent: March 16, 1999

Assignee: Apple Computer, Inc.

Inventors: Peter V. de Souza, Adam B. Fineberg, Hsiao-Wuen Hon, Baosheng Yuan
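
The first embodiment in the abstract above creates a pitch for frames where none was detected, based on surrounding frames. A minimal sketch of that idea follows. The abstract does not say how the surrounding values are combined, so linear interpolation between the nearest detected frames is an assumption here, and the function name `fill_missing_pitch` is hypothetical.

```python
from typing import List, Optional


def fill_missing_pitch(pitch: List[Optional[float]]) -> List[float]:
    """Fill frames with no detected pitch (None) from neighboring frames.

    Assumption (not stated in the abstract): interior gaps are linearly
    interpolated between the nearest detected frames; gaps at the start
    or end copy the nearest detected value.
    """
    known = [i for i, p in enumerate(pitch) if p is not None]
    if not known:
        raise ValueError("no frame has a detected pitch")

    filled: List[float] = []
    for i, p in enumerate(pitch):
        if p is not None:
            filled.append(float(p))
            continue
        # Nearest detected frames on each side, if any.
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled.append(float(pitch[right]))
        elif right is None:
            filled.append(float(pitch[left]))
        else:
            t = (i - left) / (right - left)
            filled.append(pitch[left] + t * (pitch[right] - pitch[left]))
    return filled


# Per-frame pitch values in Hz; None marks frames where detection failed.
track = fill_missing_pitch([None, 100.0, None, 120.0, None])
```

Other fill rules (for example, a median of nearby frames) would fit the abstract equally well; interpolation is used only because it is simple to read.
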
Speech recognition using dynamic features

Patent number: 5615299

Abstract: A speech recognition technique utilizes a set of N different principal discriminant matrices. Each principal discriminant matrix is associated with a distinct class. The class is an indication of the proximity of a speech segment to neighboring phones. A technique for speech encoding includes arranging a speech signal into a series of frames. A feature vector is derived which represents the speech signal for a speech segment or series of speech segments for each frame. A set of N different projected vectors are generated for each frame, by multiplying the principal discriminant matrices by the vector. This speech encoding technique is capable of being used in speech recognition systems by utilizing models, in which each model transition is tagged with one of the N classes. The projected vector is utilized with the corresponding tag to compute the probability that at least one particular speech part is present in said frame.

Type: Grant

Filed: June 20, 1994

Date of Patent: March 25, 1997

Assignee: International Business Machines Corporation

Inventors: Lalit R. Bahl, Peter V. de Souza, Ponani Gopalakrishnan, Michael A. Picheny
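
The projection step in the abstract above (N class-specific matrices, each multiplied by the frame's feature vector) can be sketched as follows. The class names, matrix values, and the helper names `project` and `project_all` are illustrative assumptions; the abstract does not give dimensions or say how the matrices are trained.

```python
from typing import Dict, List

Matrix = List[List[float]]


def project(matrix: Matrix, vector: List[float]) -> List[float]:
    """Multiply one discriminant matrix by the frame's feature vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def project_all(matrices: Dict[str, Matrix],
                vector: List[float]) -> Dict[str, List[float]]:
    """Produce one projected vector per class for a single frame."""
    return {cls: project(m, vector) for cls, m in matrices.items()}


# Hypothetical example with N = 2 classes projecting 3 dimensions down to 2.
matrices = {
    "class_a": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    "class_b": [[0.5, 0.5, 0.0], [0.0, 0.0, 1.0]],
}
projected = project_all(matrices, [2.0, 4.0, 6.0])

# A model transition tagged "class_b" would score the frame using
# projected["class_b"], as the abstract describes for tagged transitions.
```
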
Speech recognizer having a speech coder for an acoustic match based on context-dependent speech-transition acoustic models

Patent number: 5333236

Abstract: A speech coding apparatus compares the closeness of the feature value of a feature vector signal of an utterance to the parameter values of prototype vector signals to obtain prototype match scores for the feature vector signal and each prototype vector signal. The speech coding apparatus stores a plurality of speech transition models representing speech transitions. At least one speech transition is represented by a plurality of different models. Each speech transition model has a plurality of model outputs, each comprising a prototype match score for a prototype vector signal. Each model output has an output probability. A model match score for a first feature vector signal and each speech transition model comprises the output probability for at least one prototype match score for the first feature vector signal and a prototype vector signal.

Type: Grant

Filed: September 10, 1992

Date of Patent: July 26, 1994

Assignee: International Business Machines Corporation

Inventors: Lalit R. Bahl, Peter V. De Souza, Ponani S. Gopalakrishnan, Michael A. Picheny
Speech coding apparatus having speaker dependent prototypes generated from nonuser reference data

Patent number: 5278942

Abstract: A speech coding apparatus and method for use in a speech recognition apparatus and method. The value of at least one feature of an utterance is measured during each of a series of successive time intervals to produce a series of feature vector signals representing the feature values. A plurality of prototype vector signals, each having at least one parameter value and a unique identification value are stored. The closeness of the feature vector signal is compared to the parameter values of the prototype vector signals to obtain prototype match scores for the feature value signal and each prototype vector signal. The identification value of the prototype vector signal having the best prototype match score is output as a coded representation signal of the feature vector signal. Speaker-dependent prototype vector signals are generated from both synthesized training vector signals and measured training vector signals.

Type: Grant

Filed: December 5, 1991

Date of Patent: January 11, 1994

Assignee: International Business Machines Corporation

Inventors: Lalit R. Bahl, Jerome R. Bellegarda, Peter V. De Souza, Ponani S. Gopalakrishnan, Arthur J. Nadas, David Nahamoo, Michael A. Picheny
Context-dependent speech recognizer using estimated next word context

Patent number: 5233681

Abstract: A speech recognition apparatus and method estimates the next word context for each current candidate word in a speech hypothesis. An initial model of each speech hypothesis comprises a model of a partial hypothesis of zero or more words followed by a model of a candidate word. An initial hypothesis score for each speech hypothesis comprises an estimate of the closeness of a match between the initial model of the speech hypothesis and a sequence of coded representations of the utterance. The speech hypotheses having the best initial hypothesis scores form an initial subset. For each speech hypothesis in the initial subset, the word which is most likely to follow the speech hypothesis is estimated. A revised model of each speech hypothesis in the initial subset comprises a model of the partial hypothesis followed by a revised model of the candidate word. The revised candidate word model is dependent at least on the word which is estimated to be most likely to follow the speech hypothesis.

Type: Grant

Filed: April 24, 1992

Date of Patent: August 3, 1993

Assignee: International Business Machines Corporation

Inventors: Lalit R. Bahl, Peter V. De Souza, Ponani S. Gopalakrishnan, Michael A. Picheny
Speech recognition apparatus having a speech coder outputting acoustic prototype ranks

Patent number: 5222146

Abstract: A speech coding and speech recognition apparatus. The value of at least one feature of an utterance is measured over each of a series of successive time intervals to produce a series of feature vector signals. The closeness of the feature value of each feature vector signal to the parameter value of each of a set of prototype vector signals is determined to obtain prototype match scores for each vector signal and each prototype vector signal. For each feature vector signal, first-rank and second-rank scores are associated with the prototype vector signals having the best and second best prototype match scores, respectively. For each feature vector signal, at least the identification value and the rank score of the first-ranked and second-ranked prototype vector signals are output as a coded utterance representation signal of the feature vector signal, to produce a series of coded utterance representation signals.

Type: Grant

Filed: October 23, 1991

Date of Patent: June 22, 1993

Assignee: International Business Machines Corporation

Inventors: Lalit R. Bahl, Peter V. De Souza, Ponani S. Gopalakrishnan, Michael A. Picheny
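
The coding scheme in the abstract above (score every prototype against a feature vector, then output the identifiers and ranks of the best and second-best matches) can be sketched as follows. The abstract does not define the match score, so squared Euclidean distance with smaller meaning closer is an assumption; the prototype identifiers and the function name `code_frame` are hypothetical.

```python
from typing import Dict, List, Tuple


def squared_distance(a: List[float], b: List[float]) -> float:
    """Assumed match score: squared Euclidean distance (smaller is better)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def code_frame(feature: List[float],
               prototypes: Dict[str, List[float]]) -> List[Tuple[str, int]]:
    """Return (prototype id, rank) for the two best-matching prototypes."""
    scored = sorted(
        (squared_distance(feature, params), pid)
        for pid, params in prototypes.items()
    )
    return [(pid, rank) for rank, (_, pid) in enumerate(scored[:2], start=1)]


prototypes = {"P1": [0.0, 0.0], "P2": [1.0, 1.0], "P3": [5.0, 5.0]}

# Coding a series of frames yields a series of coded representations.
coded = [code_frame(f, prototypes) for f in ([0.9, 1.2], [4.0, 4.5])]
```

Keeping the rank of each prototype rather than only the single best identifier is the point of the scheme: the recognizer sees how well the runners-up matched, not just the winner.
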
Apparatus and method of grouping utterances of a phoneme into context-dependent categories based on sound-similarity for automatic speech recognition

Patent number: 5195167

Abstract: Symbol feature values and contextual feature values of each event in a training set of events are measured. At least two pairs of complementary subsets of observed events are selected. In each pair of complementary subsets of observed events, one subset has contextual features with values in a set $C_n$, and the other subset has contextual features with values in a set $\bar{C}_n$, where the sets $C_n$ and $\bar{C}_n$ are complementary sets of contextual feature values. For each subset of observed events, the similarity values of the symbol features of the observed events in the subsets are calculated. For each pair of complementary sets of observed events, a "goodness of fit" is the sum of the symbol feature value similarity of the subsets. The sets of contextual feature values associated with the subsets of observed events having the best "goodness of fit" are identified and form context-dependent bases for grouping the observed events into two output sets.

Type: Grant

Filed: April 17, 1992

Date of Patent: March 16, 1993

Assignee: International Business Machines Corporation

Inventors: Lalit R. Bahl, Peter V. De Souza, Ponani S. Gopalakrishnan, David Nahamoo, Michael A. Picheny
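
The selection procedure in the abstract above can be sketched as a search over candidate context sets: split the events by whether their context falls in the set or in its complement, score each side for similarity, sum the two scores, and keep the best split. The abstract does not define the similarity measure, so the negative sum of squared deviations from the subset mean is an assumption here, as are the scalar symbol feature, the function names, and the toy data.

```python
from typing import List, Set, Tuple

Event = Tuple[str, float]  # (contextual feature value, symbol feature value)


def similarity(values: List[float]) -> float:
    """Assumed similarity: negative sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return -sum((v - mean) ** 2 for v in values)


def best_context_split(events: List[Event],
                       candidates: List[Set[str]]) -> Tuple[Set[str], float]:
    """Pick the candidate context set whose split has the best goodness of fit."""
    best = None
    for context_set in candidates:
        inside = [v for ctx, v in events if ctx in context_set]
        outside = [v for ctx, v in events if ctx not in context_set]
        if not inside or not outside:
            continue  # not a usable pair of complementary subsets
        fit = similarity(inside) + similarity(outside)
        if best is None or fit > best[1]:
            best = (context_set, fit)
    if best is None:
        raise ValueError("no candidate splits the events into two subsets")
    return best


# Toy data: each event has a context label and a measured symbol value.
events = [("p", 1.0), ("b", 1.0), ("s", 5.0), ("z", 5.0)]
chosen, fit = best_context_split(events, [{"p", "b"}, {"p", "s"}])
```

Here the split on `{"p", "b"}` is chosen because both resulting subsets are internally uniform, while the split on `{"p", "s"}` mixes dissimilar events on each side.
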
Speaker-independent label coding apparatus

Patent number: 5182773

Abstract: The present invention is related to speech recognition and particularly to a new type of vector quantizer and a new vector quantization technique in which the error rate of associating a sound with an incoming speech signal is drastically reduced. To achieve this end, the present invention technique groups the feature vectors in a space into different prototypes at least two of which represent a class of sound. Each of the prototypes may in turn have a number of subclasses or partitions. Each of the prototypes and their subclasses may be assigned respective identifying values. To identify an incoming speech feature vector, at least one of the feature values of the incoming feature vector is compared with the different values of the respective prototypes, or the subclasses of the prototypes.

Type: Grant

Filed: March 22, 1991

Date of Patent: January 26, 1993

Assignee: International Business Machines Corporation

Inventors: Lalit R. Bahl, Michael A. Picheny, David Nahamoo, Peter V. de Souza
Method and apparatus for modeling words with multi-arc Markov models

Patent number: 5129001

Abstract: Modeling a word is done by concatenating a series of elemental models to form a word model. At least one elemental model in the series is a composite elemental model formed by combining the starting states of at least first and second primitive elemental models. Each primitive elemental model represents a speech component. The primitive elemental models are combined by a weighted combination of their parameters in proportion to the values of the weighting factors. To tailor the word model to closely represent variations in the pronunciation of the word, the word is uttered a plurality of times by a plurality of different speakers. Constructing word models from composite elemental models, and constructing composite elemental models from primitive elemental models enables word models to represent many variations in the pronunciation of a word.

Type: Grant

Filed: April 25, 1990

Date of Patent: July 7, 1992

Assignee: International Business Machines Corporation

Inventors: Lalit R. Bahl, Jerome R. Bellegarda, Peter V. De Souza, Ponani S. Gopalakrishnan, David Nahamoo, Michael A. Picheny
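
The weighted combination described in the abstract above can be sketched as a parameter-wise weighted average of primitive elemental models. Representing each model as a dictionary of named parameters, normalizing by the sum of the weights, and the function name `combine_models` are all assumptions made for illustration; the abstract does not specify the parameterization.

```python
from typing import Dict, List


def combine_models(models: List[Dict[str, float]],
                   weights: List[float]) -> Dict[str, float]:
    """Combine primitive models parameter by parameter.

    Assumption: each parameter of the composite model is the weighted
    average of the matching parameters, in proportion to the weights.
    """
    if len(models) != len(weights) or not models:
        raise ValueError("need one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {
        name: sum(w * m[name] for w, m in zip(weights, models)) / total
        for name in models[0]
    }


# Two hypothetical primitive models with two parameters each.
first = {"stay": 0.25, "advance": 0.75}
second = {"stay": 0.75, "advance": 0.25}

# The first primitive model is weighted three times as heavily as the second.
composite = combine_models([first, second], [3.0, 1.0])
```
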