Patents by Inventor Sankar Basu

Sankar Basu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Speech driven lip synthesis using viseme based hidden markov models

Patent number: 6366885

Abstract: A method of speech driven lip synthesis which applies viseme based training models to units of visual speech. The audio data is grouped into a smaller number of visually distinct visemes rather than the larger number of phonemes. These visemes then form the basis for a Hidden Markov Model (HMM) state sequence or the output nodes of a neural network. During the training phase, audio and visual features are extracted from input speech, which is then aligned according to the apparent viseme sequence with the corresponding audio features being used to calculate the HMM state output probabilities or the output of the neural network. During the synthesis phase, the acoustic input is aligned with the most likely viseme HMM sequence (in the case of an HMM based model) or with the nodes of the network (in the case of a neural network based system), which is then used for animation.
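
Illustrative sketch (not taken from the patent text): the two ideas in the abstract, grouping phonemes into fewer visemes and aligning audio frames with the most likely viseme state sequence, might look roughly like this in Python. The phoneme-to-viseme table, the function names, and the use of a plain Viterbi decoder over precomputed log-probabilities are all assumptions made for illustration.

```python
import math

# Hypothetical many-to-one grouping of phonemes into visemes (illustrative only).
PHONEME_TO_VISEME = {
    "p": "bilabial", "b": "bilabial", "m": "bilabial",
    "f": "labiodental", "v": "labiodental",
    "aa": "open", "ae": "open",
}

def to_visemes(phonemes):
    """Map a phoneme sequence onto the smaller viseme inventory."""
    return [PHONEME_TO_VISEME[p] for p in phonemes]

def viterbi(log_emit, log_trans, log_init):
    """Return the most likely state index per frame.

    log_emit[t][s]  : log-probability of audio frame t under viseme state s
    log_trans[r][s] : log-probability of moving from state r to state s
    log_init[s]     : log-probability of starting in state s
    """
    n_states = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    back = []  # back[t-1][s] = best predecessor of state s at frame t
    for t in range(1, len(log_emit)):
        prev = score
        pointers, score = [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] + log_trans[r][s])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][s] + log_emit[t][s])
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

In a synthesis setting, the decoded state indices would select the mouth shapes to animate; how the emission probabilities are trained is outside this sketch.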

Type: Grant

Filed: August 27, 1999

Date of Patent: April 2, 2002

Assignee: International Business Machines Corporation

Inventors: Sankar Basu, Tanveer Atzal Faruquie, Chalapathy V. Neti, Nitendra Rajput, Andrew William Senior, L. Venkata Subramaniam, Ashish Verma

Nongaussian density estimation for the classification of acoustic feature vectors in speech recognition

Patent number: 6269334

Abstract: A statistical modeling paradigm for automatic machine recognition of speech uses mixtures of nongaussian statistical probability densities, which provides improved recognition accuracy. Speech is modeled by building probability densities from functions of the form exp(−t^(α/2)) for t ≥ 0 and α > 0. Mixture components are constructed from different univariate functions. The mixture model is used in a maximum likelihood model of speech data.
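
Illustrative sketch (not taken from the patent text): a univariate density built from exp(−t^(α/2)), with t taken as the squared standardized distance ((x − μ)/s)², and a mixture of such components scored by log-likelihood. The parameterization, the normalizing constant α / (2 s Γ(1/α)), and all function names are assumptions for illustration; the patent's multivariate construction is not reproduced here.

```python
import math

def power_exp_pdf(x, mu, scale, alpha):
    """Density proportional to exp(-t**(alpha/2)), with t = ((x - mu)/scale)**2.

    alpha = 2 gives a Gaussian shape (variance scale**2 / 2);
    alpha = 1 gives a Laplacian shape.
    """
    t = ((x - mu) / scale) ** 2
    norm = alpha / (2.0 * scale * math.gamma(1.0 / alpha))
    return norm * math.exp(-(t ** (alpha / 2.0)))

def mixture_pdf(x, components):
    """components: iterable of (weight, mu, scale, alpha); weights sum to 1."""
    return sum(w * power_exp_pdf(x, mu, s, a) for w, mu, s, a in components)

def log_likelihood(data, components):
    """Log-likelihood of the data under the mixture (the quantity to maximize)."""
    return sum(math.log(mixture_pdf(x, components)) for x in data)
```

For example, `mixture_pdf(0.3, [(0.5, 0.0, 1.0, 1.0), (0.5, 1.0, 2.0, 2.0)])` evaluates an equal-weight mixture of one Laplacian-shaped and one Gaussian-shaped component.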

Type: Grant

Filed: June 25, 1998

Date of Patent: July 31, 2001

Assignee: International Business Machines Corporation

Inventors: Sankar Basu, Charles A. Micchelli

Wavelet-based energy binning cepstal features for automatic speech recognition

Patent number: 6253175

Abstract: Systems and methods for processing acoustic speech signals which utilize the wavelet transform (and alternatively, the Fourier transform) as a fundamental tool. The method essentially involves “synchrosqueezing” spectral component data obtained by performing a wavelet transform (or Fourier transform) on digitized speech signals. In one aspect, spectral components of the synchrosqueezed plane are dynamically tracked via a K-means clustering algorithm. The amplitude, frequency and bandwidth of each of the components are, thus, extracted. The cepstrum generated from this information is referred to as “K-mean Wastrum.” In another aspect, the result of the K-means clustering process is further processed to limit the set of primary components to formants. The resulting features are referred to as “formant-based wastrum.” Formants are interpolated in unvoiced regions and the contribution of the unvoiced, turbulent part of the spectrum is added.
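
Illustrative sketch (not taken from the patent text): only the clustering step is shown, assuming the spectral peaks of one frame have already been extracted as frequency and amplitude pairs. An amplitude-weighted one-dimensional K-means then yields an amplitude, center frequency, and bandwidth per component. The function name, the weighting scheme, and the definition of bandwidth as a weighted standard deviation are assumptions; synchrosqueezing and cepstrum computation are not implemented here.

```python
import math

def kmeans_components(freqs, amps, centers, n_iter=20):
    """Amplitude-weighted 1-D K-means over the spectral peaks of one frame.

    freqs, amps : peak frequencies and their amplitudes (same length)
    centers     : initial center frequencies, one per component
    Returns a list of (amplitude, frequency, bandwidth) tuples, one per component.
    """
    centers = list(centers)

    def assign():
        # Assignment step: each peak joins its nearest center.
        return [min(range(len(centers)), key=lambda k: abs(f - centers[k]))
                for f in freqs]

    for _ in range(n_iter):
        labels = assign()
        # Update step: move each center to the amplitude-weighted mean of its peaks.
        for k in range(len(centers)):
            weight = sum(a for a, lab in zip(amps, labels) if lab == k)
            if weight > 0:
                centers[k] = sum(a * f for f, a, lab in zip(freqs, amps, labels)
                                 if lab == k) / weight

    labels = assign()
    components = []
    for k, center in enumerate(centers):
        weight = sum(a for a, lab in zip(amps, labels) if lab == k)
        if weight > 0:
            var = sum(a * (f - center) ** 2
                      for f, a, lab in zip(freqs, amps, labels) if lab == k) / weight
        else:
            var = 0.0  # empty cluster: no spread to report
        components.append((weight, center, math.sqrt(var)))
    return components
```

Tracking components dynamically across frames could reuse the previous frame's centers as the initial `centers` argument, though that choice is also an assumption of this sketch.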

Type: Grant

Filed: November 30, 1998

Date of Patent: June 26, 2001

Assignee: International Business Machines Corporation

Inventors: Sankar Basu, Stephane H. Maes

Methods and apparatus for audio-visual speaker recognition and utterance verification

Patent number: 6219640

Abstract: Methods and apparatus for performing speaker recognition comprise processing a video signal associated with an arbitrary content video source and processing an audio signal associated with the video signal. Then, an identification and/or verification decision is made based on the processed audio signal and the processed video signal. Various decision making embodiments may be employed including, but not limited to, a score combination approach, a feature combination approach, and a re-scoring approach. In another aspect of the invention, a method of verifying a speech utterance comprises processing a video signal associated with a video source and processing an audio signal associated with the video signal. Then, the processed audio signal is compared with the processed video signal to determine a level of correlation between the signals. This is referred to as unsupervised utterance verification.
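
Illustrative sketch (not taken from the patent text): of the decision approaches named in the abstract, score combination is the simplest to show. The version below assumes each modality has already produced one score per enrolled speaker (higher is better) and fuses them with a fixed weight; the weighting rule, the threshold test, and all names are assumptions for illustration.

```python
def combine_scores(audio_scores, video_scores, audio_weight=0.5):
    """Weighted sum of per-speaker scores from the audio and video modalities."""
    return {
        spk: audio_weight * audio_scores[spk]
             + (1.0 - audio_weight) * video_scores[spk]
        for spk in audio_scores
    }

def identify(audio_scores, video_scores, audio_weight=0.5):
    """Identification: pick the enrolled speaker with the best combined score."""
    combined = combine_scores(audio_scores, video_scores, audio_weight)
    return max(combined, key=combined.get)

def verify(claimed, audio_scores, video_scores, threshold, audio_weight=0.5):
    """Verification: accept the claimed identity if its combined score clears a threshold."""
    combined = combine_scores(audio_scores, video_scores, audio_weight)
    return combined[claimed] >= threshold
```

Shifting `audio_weight` toward 1.0 trusts the audio channel more, which might suit clean audio with poor video, and the reverse for noisy audio.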

Type: Grant

Filed: August 6, 1999

Date of Patent: April 17, 2001

Assignee: International Business Machines Corporation

Inventors: Sankar Basu, Homayoon S. M. Beigi, Stephane Herman Maes, Benoit Emmanuel Ghislain Maison, Chalapathy Venkata Neti, Andrew William Senior

