Patents by Inventor Sankar Basu

Sankar Basu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 7437349
    Abstract: A method, system and computer program for adaptively processing a query search. An expanding operation is utilized to expand the query into sub-queries, wherein at least one of the sub-queries is expanded probabilistically. A retrieving operation retrieves the results of the sub-queries, and a merging operation is used to merge the sub-query results into a search result. An adapting operation is configured to modify the search such that the relevance of the search result is increased when the search is repeated.
    Type: Grant
    Filed: May 10, 2002
    Date of Patent: October 14, 2008
    Assignee: International Business Machines Corporation
    Inventors: Sankar Basu, Milind R. Naphade, John R. Smith
  • Patent number: 6816836
    Abstract: Techniques for providing speech recognition comprise the steps of processing a video signal associated with an arbitrary content video source, processing an audio signal associated with the video signal, and recognizing at least a portion of the processed audio signal, using at least a portion of the processed video signal, to generate an output signal representative of the audio signal.
    Type: Grant
    Filed: August 30, 2002
    Date of Patent: November 9, 2004
    Assignee: International Business Machines Corporation
    Inventors: Sankar Basu, Philippe Christian de Cuetos, Stephane Herman Maes, Chalapathy Venkata Neti, Andrew William Senior
  • Publication number: 20040205482
    Abstract: Semantic indexing and retrieval of multimedia content requires that the content is sufficiently annotated. However, the great volumes of multimedia data and diversity of labels make annotation a difficult and costly process. Disclosed is an annotation framework in which supervised training with partially labeled data is facilitated using active learning. The system trains a classifier with a small set of labeled data and subsequently updates the classifier by selecting a subset of the available data-set according to optimization criteria. The process results in propagation of labels to unlabeled data and greatly facilitates the user in annotating large amounts of multimedia content.
    Type: Application
    Filed: January 24, 2002
    Publication date: October 14, 2004
    Applicant: International Business Machines Corporation
    Inventors: Sankar Basu, Ching-Yung Lin, Milind R. Naphade, John R. Smith, Belle L. Tseng
  • Patent number: 6804648
    Abstract: A parametric family of multivariate density functions, formed by mixture models from univariate functions of the type exp(−|x|^β), is used to model acoustic feature vectors in automatic recognition of speech. The parameter β is used to measure the non-Gaussian nature of the data. β is estimated from the input data using a maximum likelihood criterion. There is a balance between β and the number of data points that must be satisfied for efficient estimation.
    Type: Grant
    Filed: March 25, 1999
    Date of Patent: October 12, 2004
    Assignee: International Business Machines Corporation
    Inventors: Sankar Basu, Charles A. Micchelli, Peder A. Olsen
  • Publication number: 20030212666
    Abstract: A method, system and computer program for adaptively processing a query search. An expanding operation is utilized to expand the query into sub-queries, wherein at least one of the sub-queries is expanded probabilistically. A retrieving operation retrieves the results of the sub-queries, and a merging operation is used to merge the sub-query results into a search result. An adapting operation is configured to modify the search such that the relevance of the search result is increased when the search is repeated.
    Type: Application
    Filed: May 10, 2002
    Publication date: November 13, 2003
    Inventors: Sankar Basu, Milind R. Naphade, John R. Smith
  • Patent number: 6633844
    Abstract: The combination of audio and video speech recognition in a manner to improve the robustness of speech recognition systems in noisy environments. Contemplated are methods and apparatus in which a video signal associated with a video source and an audio signal associated with the video signal are processed, the most likely viseme associated with the audio signal and video signal is determined and, thereafter, the most likely phoneme associated with the audio signal and video signal is determined.
    Type: Grant
    Filed: December 2, 1999
    Date of Patent: October 14, 2003
    Assignee: International Business Machines Corporation
    Inventors: Ashish Verma, Sankar Basu, Chalapathy Neti
  • Patent number: 6609094
    Abstract: Improvements in speech recognition systems are achieved by considering projections of the high dimensional data on lower dimensional subspaces, subsequently by estimating the univariate probability densities via known univariate techniques, and then by reconstructing the density in the original higher dimensional space from the collection of univariate densities so obtained. The reconstructed density is by no means unique unless further restrictions on the estimated density are imposed. The variety of choices of candidate univariate densities, as well as the choices of subspaces on which to project the data, including their number, further add to this non-uniqueness. Probability density functions are then considered that maximize a certain optimality criterion as a solution to this problem. Specifically, those probability density functions that either maximize the entropy functional or, alternatively, the likelihood associated with the data are considered.
    Type: Grant
    Filed: May 22, 2000
    Date of Patent: August 19, 2003
    Assignee: International Business Machines Corporation
    Inventors: Sankar Basu, Charles A. Micchelli, Peder Olsen
  • Patent number: 6594629
    Abstract: In a first aspect of the invention, methods and apparatus for providing speech recognition comprise the steps of processing a video signal associated with an arbitrary content video source, processing an audio signal associated with the video signal, and decoding the processed audio signal in conjunction with the processed video signal to generate a decoded output signal representative of the audio signal. In a second aspect of the invention, methods and apparatus for providing speech detection in accordance with a speech recognition system comprise the steps of processing a video signal associated with a video source to detect whether one or more features associated with the video signal are representative of speech, and processing an audio signal associated with the video signal in accordance with the speech recognition system to generate a decoded output signal representative of the audio signal when the one or more features associated with the video signal are representative of speech.
    Type: Grant
    Filed: August 6, 1999
    Date of Patent: July 15, 2003
    Assignee: International Business Machines Corporation
    Inventors: Sankar Basu, Philippe Christian de Cuetos, Stephane Herman Maes, Chalapathy Venkata Neti, Andrew William Senior
  • Publication number: 20030018475
    Abstract: Techniques for providing speech recognition comprise the steps of processing a video signal associated with an arbitrary content video source, processing an audio signal associated with the video signal, and recognizing at least a portion of the processed audio signal, using at least a portion of the processed video signal, to generate an output signal representative of the audio signal.
    Type: Application
    Filed: August 30, 2002
    Publication date: January 23, 2003
    Applicant: International Business Machines Corporation
    Inventors: Sankar Basu, Philippe Christian de Cuetos, Stephane Herman Maes, Chalapathy Venkata Neti, Andrew William Senior
  • Patent number: 6366885
    Abstract: A method of speech driven lip synthesis which applies viseme based training models to units of visual speech. The audio data is grouped into a smaller number of visually distinct visemes rather than the larger number of phonemes. These visemes then form the basis for a Hidden Markov Model (HMM) state sequence or the output nodes of a neural network. During the training phase, audio and visual features are extracted from input speech, which is then aligned according to the apparent viseme sequence with the corresponding audio features being used to calculate the HMM state output probabilities or the output of the neural network. During the synthesis phase, the acoustic input is aligned with the most likely viseme HMM sequence (in the case of an HMM based model) or with the nodes of the network (in the case of a neural network based system), which is then used for animation.
    Type: Grant
    Filed: August 27, 1999
    Date of Patent: April 2, 2002
    Assignee: International Business Machines Corporation
    Inventors: Sankar Basu, Tanveer Atzal Faruquie, Chalapathy V. Neti, Nitendra Rajput, Andrew William Senior, L. Venkata Subramaniam, Ashish Verma
  • Patent number: 6269334
    Abstract: A statistical modeling paradigm for automatic machine recognition of speech uses mixtures of non-Gaussian statistical probability densities, which provides improved recognition accuracy. Speech is modeled by building probability densities from functions of the form exp(−t^(α/2)) for t ≥ 0 and α > 0. Mixture components are constructed from different univariate functions. The mixture model is used in a maximum likelihood model of speech data.
    Type: Grant
    Filed: June 25, 1998
    Date of Patent: July 31, 2001
    Assignee: International Business Machines Corporation
    Inventors: Sankar Basu, Charles A. Micchelli
  • Patent number: 6253175
    Abstract: Systems and methods for processing acoustic speech signals which utilize the wavelet transform (and alternatively, the Fourier transform) as a fundamental tool. The method essentially involves “synchrosqueezing” spectral component data obtained by performing a wavelet transform (or Fourier transform) on digitized speech signals. In one aspect, spectral components of the synchrosqueezed plane are dynamically tracked via a K-means clustering algorithm. The amplitude, frequency and bandwidth of each of the components are, thus, extracted. The cepstrum generated from this information is referred to as “K-mean Wastrum.” In another aspect, the result of the K-means clustering process is further processed to limit the set of primary components to formants. The resulting features are referred to as “formant-based wastrum.” Formants are interpolated in unvoiced regions and the contribution of the unvoiced turbulent part of the spectrum is added.
    Type: Grant
    Filed: November 30, 1998
    Date of Patent: June 26, 2001
    Assignee: International Business Machines Corporation
    Inventors: Sankar Basu, Stephane H. Maes
  • Patent number: 6219640
    Abstract: Methods and apparatus for performing speaker recognition comprise processing a video signal associated with an arbitrary content video source and processing an audio signal associated with the video signal. Then, an identification and/or verification decision is made based on the processed audio signal and the processed video signal. Various decision making embodiments may be employed including, but not limited to, a score combination approach, a feature combination approach, and a re-scoring approach. In another aspect of the invention, a method of verifying a speech utterance comprises processing a video signal associated with a video source and processing an audio signal associated with the video signal. Then, the processed audio signal is compared with the processed video signal to determine a level of correlation between the signals. This is referred to as unsupervised utterance verification.
    Type: Grant
    Filed: August 6, 1999
    Date of Patent: April 17, 2001
    Assignee: International Business Machines Corporation
    Inventors: Sankar Basu, Homayoon S. M. Beigi, Stephane Herman Maes, Benoit Emmanuel Ghislain Maison, Chalapathy Venkata Neti, Andrew William Senior