Patents by Inventor Sankar Basu

Sankar Basu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Adaptive probabilistic query expansion

Patent number: 7437349

Abstract: A method, system and computer program for adaptively processing a query search. An expanding operation is utilized to expand the query into sub-queries, wherein at least one of the sub-queries is expanded probabilistically. A retrieving operation retrieves the results of the sub-queries, and a merging operation is used to merge the sub-query results into a search result. An adapting operation is configured to modify the search such that the relevance of the search result is increased when the search is repeated.

Type: Grant

Filed: May 10, 2002

Date of Patent: October 14, 2008

Assignee: International Business Machines Corporation

Inventors: Sankar Basu, Milind R. Naphade, John R. Smith
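The four operations named in the abstract above (expanding, retrieving, merging, adapting) can be illustrated with a minimal sketch. It is not the patented method: the toy inverted index, the `expansions` table of related terms with inclusion probabilities, the weighted-sum merge, and the `rate` update rule are all assumptions made for illustration.

```python
import random

def expand(query, expansions, rng):
    """Return (sub_query, weight) pairs: the original term plus sampled related terms."""
    subs = [(query, 1.0)]
    for term, prob in expansions.get(query, {}).items():
        if rng.random() < prob:  # probabilistic expansion (assumed sampling rule)
            subs.append((term, prob))
    return subs

def retrieve(term, index):
    """Look up the documents for one sub-query in a toy inverted index."""
    return index.get(term, set())

def merge(sub_results):
    """Merge sub-query results by summing the weights of the sub-queries that hit each document."""
    scores = {}
    for weight, docs in sub_results:
        for doc in docs:
            scores[doc] = scores.get(doc, 0.0) + weight
    return sorted(scores, key=lambda d: (-scores[d], d))

def search(query, index, expansions, rng):
    """Expand, retrieve and merge in one pass."""
    subs = expand(query, expansions, rng)
    return merge((w, retrieve(t, index)) for t, w in subs)

def adapt(query, index, expansions, relevant, rate=0.5):
    """Move each expansion probability toward 1 if it retrieved a relevant document, else toward 0."""
    for term, prob in list(expansions.get(query, {}).items()):
        target = 1.0 if retrieve(term, index) & relevant else 0.0
        expansions[query][term] = prob + rate * (target - prob)

# Hypothetical data for illustration only.
index = {"car": {"d1"}, "automobile": {"d1", "d2"}, "cart": {"d3"}}
expansions = {"car": {"automobile": 0.5, "cart": 0.5}}
results = search("car", index, expansions, random.Random(0))
adapt("car", index, expansions, relevant={"d2"})
```

After the feedback step, a repeated search for "car" is more likely to include "automobile" and less likely to include "cart", which is the sense in which relevance improves when the search is repeated.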
Method and apparatus for audio-visual speech detection and recognition

Patent number: 6816836

Abstract: Techniques for providing speech recognition comprise the steps of processing a video signal associated with an arbitrary content video source, processing an audio signal associated with the video signal, and recognizing at least a portion of the processed audio signal, using at least a portion of the processed video signal, to generate an output signal representative of the audio signal.

Type: Grant

Filed: August 30, 2002

Date of Patent: November 9, 2004

Assignee: International Business Machines Corporation

Inventors: Sankar Basu, Philippe Christian de Cuetos, Stephane Herman Maes, Chalapathy Venkata Neti, Andrew William Senior
Method and apparatus for active annotation of multimedia content

Publication number: 20040205482

Abstract: Semantic indexing and retrieval of multimedia content requires that the content is sufficiently annotated. However, the great volumes of multimedia data and diversity of labels make annotation a difficult and costly process. Disclosed is an annotation framework in which supervised training with partially labeled data is facilitated using active learning. The system trains a classifier with a small set of labeled data and subsequently updates the classifier by selecting a subset of the available data-set according to optimization criteria. The process results in propagation of labels to unlabeled data and greatly facilitates the user in annotating large amounts of multimedia content.

Type: Application

Filed: January 24, 2002

Publication date: October 14, 2004

Applicant: International Business Machines Corporation

Inventors: Sankar Basu, Ching-Yung Lin, Milind R. Naphade, John R. Smith, Belle L. Tseng
Impulsivity estimates of mixtures of the power exponential distributions in speech modeling

Patent number: 6804648

Abstract: A parametric family of multivariate density functions formed by mixture models from univariate functions of the type exp(−|x|^β) for modeling acoustic feature vectors is used in automatic recognition of speech. The parameter β is used to measure the non-Gaussian nature of the data. β is estimated from the input data using a maximum likelihood criterion. There is a balance between β and the number of data points that must be satisfied for efficient estimation.

Type: Grant

Filed: March 25, 1999

Date of Patent: October 12, 2004

Assignee: International Business Machines Corporation

Inventors: Sankar Basu, Charles A. Micchelli, Peder A. Olsen
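As a rough illustration of estimating the exponent by maximum likelihood, the sketch below fits a single univariate density proportional to exp(−|x|^β). It is a simplification under stated assumptions, not the patented procedure: the scale is fixed at 1, there is no mixture, β is chosen by grid search, and the helper `laplace_quantiles` is a hypothetical deterministic stand-in for real acoustic data.

```python
import math

def log_density(x, beta):
    """Log of p(x) = beta / (2 * Gamma(1/beta)) * exp(-|x|**beta), with unit scale (assumed)."""
    return math.log(beta) - math.log(2.0) - math.lgamma(1.0 / beta) - abs(x) ** beta

def log_likelihood(data, beta):
    """Total log-likelihood of the data under exponent beta."""
    return sum(log_density(x, beta) for x in data)

def estimate_beta(data, candidates):
    """Pick the candidate exponent with the highest log-likelihood (simple grid search)."""
    return max(candidates, key=lambda b: log_likelihood(data, b))

def laplace_quantiles(n):
    """Evenly spaced quantiles of a unit-scale Laplace density, i.e. the beta = 1 case."""
    points = []
    for i in range(n):
        u = (i + 0.5) / n
        points.append(math.log(2.0 * u) if u < 0.5 else -math.log(2.0 * (1.0 - u)))
    return points

candidates = [0.5 + 0.1 * k for k in range(26)]  # 0.5, 0.6, ..., 3.0
beta_hat = estimate_beta(laplace_quantiles(2000), candidates)
```

With β = 2 the density is Gaussian and with β = 1 it is Laplacian, so a fitted β below 2 indicates heavier-than-Gaussian tails in the data.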
Adaptive probabilistic query expansion

Publication number: 20030212666

Abstract: A method, system and computer program for adaptively processing a query search. An expanding operation is utilized to expand the query into sub-queries, wherein at least one of the sub-queries is expanded probabilistically. A retrieving operation retrieves the results of the sub-queries, and a merging operation is used to merge the sub-query results into a search result. An adapting operation is configured to modify the search such that the relevance of the search result is increased when the search is repeated.

Type: Application

Filed: May 10, 2002

Publication date: November 13, 2003

Inventors: Sankar Basu, Milind R. Naphade, John R. Smith
Late integration in audio-visual continuous speech recognition

Patent number: 6633844

Abstract: The combination of audio and video speech recognition in a manner to improve the robustness of speech recognition systems in noisy environments. Contemplated are methods and apparatus in which a video signal associated with a video source and an audio signal associated with the video signal are processed, the most likely viseme associated with the audio signal and video signal is determined and, thereafter, the most likely phoneme associated with the audio signal and video signal is determined.

Type: Grant

Filed: December 2, 1999

Date of Patent: October 14, 2003

Assignee: International Business Machines Corporation

Inventors: Ashish Verma, Sankar Basu, Chalapathy Neti
Maximum entropy and maximum likelihood criteria for feature selection from multivariate data

Patent number: 6609094

Abstract: Improvements in speech recognition systems are achieved by considering projections of the high dimensional data on lower dimensional subspaces, subsequently by estimating the univariate probability densities via known univariate techniques, and then by reconstructing the density in the original higher dimensional space from the collection of univariate densities so obtained. The reconstructed density is by no means unique unless further restrictions on the estimated density are imposed. The variety of choices of candidate univariate densities as well as the choices of subspaces on which to project the data including their number further add to this non-uniqueness. Probability density functions are then considered that maximize a certain optimality criterion as a solution to this problem. Specifically, those probability density functions that either maximize the entropy functional, or alternatively, the likelihood associated with the data are considered.

Type: Grant

Filed: May 22, 2000

Date of Patent: August 19, 2003

Assignee: International Business Machines Corporation

Inventors: Sankar Basu, Charles A. Micchelli, Peder Olsen
Methods and apparatus for audio-visual speech detection and recognition

Patent number: 6594629

Abstract: In a first aspect of the invention, methods and apparatus for providing speech recognition comprise the steps of processing a video signal associated with an arbitrary content video source, processing an audio signal associated with the video signal, and decoding the processed audio signal in conjunction with the processed video signal to generate a decoded output signal representative of the audio signal. In a second aspect of the invention, methods and apparatus for providing speech detection in accordance with a speech recognition system comprise the steps of processing a video signal associated with a video source to detect whether one or more features associated with the video signal are representative of speech, and processing an audio signal associated with the video signal in accordance with the speech recognition system to generate a decoded output signal representative of the audio signal when the one or more features associated with the video signal are representative of speech.

Type: Grant

Filed: August 6, 1999

Date of Patent: July 15, 2003

Assignee: International Business Machines Corporation

Inventors: Sankar Basu, Philippe Christian de Cuetos, Stephane Herman Maes, Chalapathy Venkata Neti, Andrew William Senior
Method and apparatus for audio-visual speech detection and recognition

Publication number: 20030018475

Abstract: Techniques for providing speech recognition comprise the steps of processing a video signal associated with an arbitrary content video source, processing an audio signal associated with the video signal, and recognizing at least a portion of the processed audio signal, using at least a portion of the processed video signal, to generate an output signal representative of the audio signal.

Type: Application

Filed: August 30, 2002

Publication date: January 23, 2003

Applicant: International Business Machines Corporation

Inventors: Sankar Basu, Philippe Christian de Cuetos, Stephane Herman Maes, Chalapathy Venkata Neti, Andrew William Senior
Speech driven lip synthesis using viseme based hidden markov models

Patent number: 6366885

Abstract: A method of speech driven lip synthesis which applies viseme based training models to units of visual speech. The audio data is grouped into a smaller number of visually distinct visemes rather than the larger number of phonemes. These visemes then form the basis for a Hidden Markov Model (HMM) state sequence or the output nodes of a neural network. During the training phase, audio and visual features are extracted from input speech, which is then aligned according to the apparent viseme sequence with the corresponding audio features being used to calculate the HMM state output probabilities or the output of the neural network. During the synthesis phase, the acoustic input is aligned with the most likely viseme HMM sequence (in the case of an HMM based model) or with the nodes of the network (in the case of a neural network based system), which is then used for animation.

Type: Grant

Filed: August 27, 1999

Date of Patent: April 2, 2002

Assignee: International Business Machines Corporation

Inventors: Sankar Basu, Tanveer Afzal Faruquie, Chalapathy V. Neti, Nitendra Rajput, Andrew William Senior, L. Venkata Subramaniam, Ashish Verma
Nongaussian density estimation for the classification of acoustic feature vectors in speech recognition

Patent number: 6269334

Abstract: A statistical modeling paradigm for automatic machine recognition of speech uses mixtures of nongaussian statistical probability densities which provide improved recognition accuracy. Speech is modeled by building probability densities from functions of the form exp(−t^(α/2)) for t ≥ 0 and α > 0. Mixture components are constructed from different univariate functions. The mixture model is used in a maximum likelihood model of speech data.

Type: Grant

Filed: June 25, 1998

Date of Patent: July 31, 2001

Assignee: International Business Machines Corporation

Inventors: Sankar Basu, Charles A. Micchelli
Wavelet-based energy binning cepstral features for automatic speech recognition

Patent number: 6253175

Abstract: Systems and methods for processing acoustic speech signals which utilize the wavelet transform (and alternatively, the Fourier transform) as a fundamental tool. The method essentially involves “synchrosqueezing” spectral component data obtained by performing a wavelet transform (or Fourier transform) on digitized speech signals. In one aspect, spectral components of the synchrosqueezed plane are dynamically tracked via a K-means clustering algorithm. The amplitude, frequency and bandwidth of each of the components are, thus, extracted. The cepstrum generated from this information is referred to as “K-mean Wastrum.” In another aspect, the result of the K-means clustering process is further processed to limit the set of primary components to formants. The resulting features are referred to as “formant-based wastrum.” Formants are interpolated in unvoiced regions and the contribution of the unvoiced turbulent part of the spectrum is added.

Type: Grant

Filed: November 30, 1998

Date of Patent: June 26, 2001

Assignee: International Business Machines Corporation

Inventors: Sankar Basu, Stephane H. Maes
Methods and apparatus for audio-visual speaker recognition and utterance verification

Patent number: 6219640

Abstract: Methods and apparatus for performing speaker recognition comprise processing a video signal associated with an arbitrary content video source and processing an audio signal associated with the video signal. Then, an identification and/or verification decision is made based on the processed audio signal and the processed video signal. Various decision making embodiments may be employed including, but not limited to, a score combination approach, a feature combination approach, and a re-scoring approach. In another aspect of the invention, a method of verifying a speech utterance comprises processing a video signal associated with a video source and processing an audio signal associated with the video signal. Then, the processed audio signal is compared with the processed video signal to determine a level of correlation between the signals. This is referred to as unsupervised utterance verification.

Type: Grant

Filed: August 6, 1999

Date of Patent: April 17, 2001

Assignee: International Business Machines Corporation

Inventors: Sankar Basu, Homayoon S. M. Beigi, Stephane Herman Maes, Benoit Emmanuel Ghislain Maison, Chalapathy Venkata Neti, Andrew William Senior
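The score combination approach mentioned in the abstract above can be sketched as a weighted sum of per-speaker audio and video scores. This is only one plausible reading: the linear fusion rule, the `audio_weight` parameter, the threshold test and the example scores are assumptions for illustration, not details taken from the patent.

```python
def fuse_scores(audio_scores, video_scores, audio_weight=0.7):
    """Weighted sum of per-speaker scores; speakers missing from either modality are ignored."""
    common = audio_scores.keys() & video_scores.keys()
    return {
        speaker: audio_weight * audio_scores[speaker]
        + (1.0 - audio_weight) * video_scores[speaker]
        for speaker in common
    }

def identify(audio_scores, video_scores, audio_weight=0.7):
    """Identification: return the speaker with the highest fused score."""
    fused = fuse_scores(audio_scores, video_scores, audio_weight)
    return max(fused, key=fused.get)  # raises ValueError if no speaker has both scores

def verify(claimed, audio_scores, video_scores, threshold, audio_weight=0.7):
    """Verification: accept the claimed identity if its fused score reaches the threshold."""
    fused = fuse_scores(audio_scores, video_scores, audio_weight)
    return fused.get(claimed, float("-inf")) >= threshold

# Hypothetical log-likelihood style scores (higher is better).
audio = {"alice": -10.0, "bob": -12.0}
video = {"alice": -9.0, "bob": -3.0}
best_balanced = identify(audio, video, audio_weight=0.7)
best_audio_heavy = identify(audio, video, audio_weight=0.9)
```

In this toy example the decision changes with the weight, which shows why the relative trust placed in each modality matters, for instance when the audio channel is noisy.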