Patents by Inventor Chalapathy Venkata Neti
Chalapathy Venkata Neti has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 7295979
Abstract: Bootstrapping of a system from one language to another often works well when the two languages share a similar acoustic space. However, when the new language has sounds that do not occur in the language from which the bootstrapping is to be done, bootstrapping does not produce good initial models and the new language data is not properly aligned to these models. The present invention provides techniques to generate context-dependent labeling of the new language data using the recognition system of another language. This labeled data is then used to generate models for the new language phones.
Type: Grant
Filed: February 22, 2001
Date of Patent: November 13, 2007
Assignee: International Business Machines Corporation
Inventors: Chalapathy Venkata Neti, Nitendra Rajput, L. Venkata Subramaniam, Ashish Verma
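The labeling step described in the abstract can be sketched as follows. This is a minimal illustration, not the patented method: the phone inventory, the `PHONE_MAP` table, and the `source_align` stand-in for a trained source-language aligner are all hypothetical.

```python
# Sketch of bootstrapping new-language phone labels from a
# source-language recognizer, assuming a hand-made phone mapping.

# Hypothetical mapping from source-language phones to new-language phones.
PHONE_MAP = {"AA": "aa", "IY": "ii", "T": "t", "D": "d"}

def source_align(utterance_id):
    """Stand-in for a trained source-language aligner: returns
    (source_phone, start_frame, end_frame) triples.  Here it just
    replays a canned alignment keyed by the utterance id."""
    canned = {
        "utt1": [("AA", 0, 10), ("T", 10, 18), ("IY", 18, 30)],
    }
    return canned[utterance_id]

def relabel(utterance_id):
    """Map a source-language alignment onto new-language phone labels.
    Segments whose source phone has no counterpart are dropped; those
    phones would need new-language-only modelling."""
    labels = []
    for phone, start, end in source_align(utterance_id):
        if phone in PHONE_MAP:
            labels.append((PHONE_MAP[phone], start, end))
    return labels

print(relabel("utt1"))  # [('aa', 0, 10), ('t', 10, 18), ('ii', 18, 30)]
```

The relabeled segments would then serve as the seed alignment for training the new-language phone models.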
-
Patent number: 7251603
Abstract: Techniques for performing audio-visual speech recognition, with improved recognition performance, in a degraded visual environment. For example, in one aspect of the invention, a technique for use in accordance with an audio-visual speech recognition system for improving a recognition performance thereof includes the steps/operations of: (i) selecting between an acoustic-only data model and an acoustic-visual data model based on a condition associated with a visual environment; and (ii) decoding at least a portion of an input spoken utterance using the selected data model. Advantageously, during periods of degraded visual conditions, the audio-visual speech recognition system is able to decode (recognize) input speech data using audio-only data, thus avoiding recognition inaccuracies that may result from performing speech recognition based on acoustic-visual data models and degraded visual data.
Type: Grant
Filed: June 23, 2003
Date of Patent: July 31, 2007
Assignee: International Business Machines Corporation
Inventors: Jonathan H. Connell, Norman Haas, Etienne Marcheret, Chalapathy Venkata Neti, Gerasimos Potamianos
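The model-selection step (i) above can be sketched as a simple switch on a visual-condition score. The quality metric, its inputs, and the 0.5 threshold are illustrative assumptions, not details from the patent.

```python
# Sketch of model selection for audio-visual speech recognition:
# fall back to an audio-only model when the visual channel is degraded.

def visual_quality(face_confidence, illumination):
    """Toy visual-condition score in [0, 1]: limited by the weaker
    of face-tracking confidence and scene illumination."""
    return min(face_confidence, illumination)

def choose_model(face_confidence, illumination, threshold=0.5):
    """Return which model family should decode the utterance."""
    if visual_quality(face_confidence, illumination) >= threshold:
        return "audio-visual"
    return "audio-only"

print(choose_model(0.9, 0.8))  # audio-visual
print(choose_model(0.9, 0.2))  # audio-only: poor lighting degrades video
```

The decoder would then run the selected model family over the utterance, so degraded video never corrupts the acoustic evidence.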
-
Patent number: 6964023
Abstract: Systems and methods are provided for performing focus detection, referential ambiguity resolution and mood classification in accordance with multi-modal input data, in varying operating conditions, in order to provide an effective conversational computing environment for one or more users.
Type: Grant
Filed: February 5, 2001
Date of Patent: November 8, 2005
Assignee: International Business Machines Corporation
Inventors: Stephane Herman Maes, Chalapathy Venkata Neti
-
Publication number: 20040260554
Abstract: Techniques for performing audio-visual speech recognition, with improved recognition performance, in a degraded visual environment. For example, in one aspect of the invention, a technique for use in accordance with an audio-visual speech recognition system for improving a recognition performance thereof includes the steps/operations of: (i) selecting between an acoustic-only data model and an acoustic-visual data model based on a condition associated with a visual environment; and (ii) decoding at least a portion of an input spoken utterance using the selected data model. Advantageously, during periods of degraded visual conditions, the audio-visual speech recognition system is able to decode (recognize) input speech data using audio-only data, thus avoiding recognition inaccuracies that may result from performing speech recognition based on acoustic-visual data models and degraded visual data.
Type: Application
Filed: June 23, 2003
Publication date: December 23, 2004
Applicant: International Business Machines Corporation
Inventors: Jonathan H. Connell, Norman Haas, Etienne Marcheret, Chalapathy Venkata Neti, Gerasimos Potamianos
-
Patent number: 6816836
Abstract: Techniques for providing speech recognition comprise the steps of processing a video signal associated with an arbitrary content video source, processing an audio signal associated with the video signal, and recognizing at least a portion of the processed audio signal, using at least a portion of the processed video signal, to generate an output signal representative of the audio signal.
Type: Grant
Filed: August 30, 2002
Date of Patent: November 9, 2004
Assignee: International Business Machines Corporation
Inventors: Sankar Basu, Philippe Christian de Cuetos, Stephane Herman Maes, Chalapathy Venkata Neti, Andrew William Senior
-
Publication number: 20040111432
Abstract: An apparatus and method for analyzing multimedia content to identify the presence of audio, visual and textual cues that together correspond to one or more high-level semantics are provided. The apparatus and method make use of one or more analysis models that are trained to analyze audio, visual and textual portions of multimedia content to generate scores associated with the audio, visual and textual portions with respect to various high-level semantic concepts. These scores are used to generate a vector of scores. The apparatus is trained with regard to relationships between audio, visual and textual scores to thereby take the vector of scores generated for the multimedia content and classify the multimedia content into one or more high-level semantic concepts. Based on the scores for the various audio, video and textual portions of the multimedia content, a level of certainty regarding the high-level semantic concepts may be generated.
Type: Application
Filed: December 10, 2002
Publication date: June 10, 2004
Applicant: International Business Machines Corporation
Inventors: Hugh William Adams, Giridharan Iyengar, Ching-Yung Lin, Milind R. Naphade, Chalapathy Venkata Neti, Harriet Jane Nock, John Richard Smith, Belle L. Tseng
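The score-vector fusion described in the abstract can be sketched with a weighted combination per concept. The weights, bias, and threshold are hypothetical stand-ins; the patent learns the cross-modality relationships from training data rather than fixing them by hand.

```python
# Sketch of fusing per-modality concept scores into a vector and
# classifying a high-level semantic concept, with a certainty level.

def concept_score(audio, visual, textual,
                  weights=(0.3, 0.5, 0.2), bias=0.0):
    """Weighted fusion of modality scores for one semantic concept."""
    vector = (audio, visual, textual)
    return sum(w * s for w, s in zip(weights, vector)) + bias

def classify(audio, visual, textual, threshold=0.5):
    """Decide concept presence and report a crude certainty level
    (distance of the fused score from the decision boundary)."""
    score = concept_score(audio, visual, textual)
    present = score >= threshold
    certainty = round(abs(score - threshold), 3)
    return present, certainty

print(classify(0.9, 0.8, 0.4))  # (True, 0.25)
```

In the patented apparatus the fused decision would be made jointly over many concepts, with one trained relationship model per concept.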
-
Patent number: 6594629
Abstract: In a first aspect of the invention, methods and apparatus for providing speech recognition comprise the steps of processing a video signal associated with an arbitrary content video source, processing an audio signal associated with the video signal, and decoding the processed audio signal in conjunction with the processed video signal to generate a decoded output signal representative of the audio signal. In a second aspect of the invention, methods and apparatus for providing speech detection in accordance with a speech recognition system comprise the steps of processing a video signal associated with a video source to detect whether one or more features associated with the video signal are representative of speech, and processing an audio signal associated with the video signal in accordance with the speech recognition system to generate a decoded output signal representative of the audio signal when the one or more features associated with the video signal are representative of speech.
Type: Grant
Filed: August 6, 1999
Date of Patent: July 15, 2003
Assignee: International Business Machines Corporation
Inventors: Sankar Basu, Philippe Christian de Cuetos, Stephane Herman Maes, Chalapathy Venkata Neti, Andrew William Senior
-
Publication number: 20030018475
Abstract: Techniques for providing speech recognition comprise the steps of processing a video signal associated with an arbitrary content video source, processing an audio signal associated with the video signal, and recognizing at least a portion of the processed audio signal, using at least a portion of the processed video signal, to generate an output signal representative of the audio signal.
Type: Application
Filed: August 30, 2002
Publication date: January 23, 2003
Applicant: International Business Machines Corporation
Inventors: Sankar Basu, Philippe Christian de Cuetos, Stephane Herman Maes, Chalapathy Venkata Neti, Andrew William Senior
-
Publication number: 20020152068
Abstract: Bootstrapping of a system from one language to another often works well when the two languages share a similar acoustic space. However, when the new language has sounds that do not occur in the language from which the bootstrapping is to be done, bootstrapping does not produce good initial models and the new language data is not properly aligned to these models. The present invention provides techniques to generate context-dependent labeling of the new language data using the recognition system of another language. This labeled data is then used to generate models for the new language phones.
Type: Application
Filed: February 22, 2001
Publication date: October 17, 2002
Applicant: International Business Machines Corporation
Inventors: Chalapathy Venkata Neti, Nitendra Rajput, L. Venkata Subramaniam, Ashish Verma
-
Publication number: 20020135618
Abstract: Systems and methods are provided for performing focus detection, referential ambiguity resolution and mood classification in accordance with multi-modal input data, in varying operating conditions, in order to provide an effective conversational computing environment for one or more users.
Type: Application
Filed: February 5, 2001
Publication date: September 26, 2002
Applicant: International Business Machines Corporation
Inventors: Stephane Herman Maes, Chalapathy Venkata Neti
-
Patent number: 6219640
Abstract: Methods and apparatus for performing speaker recognition comprise processing a video signal associated with an arbitrary content video source and processing an audio signal associated with the video signal. Then, an identification and/or verification decision is made based on the processed audio signal and the processed video signal. Various decision making embodiments may be employed including, but not limited to, a score combination approach, a feature combination approach, and a re-scoring approach. In another aspect of the invention, a method of verifying a speech utterance comprises processing a video signal associated with a video source and processing an audio signal associated with the video signal. Then, the processed audio signal is compared with the processed video signal to determine a level of correlation between the signals. This is referred to as unsupervised utterance verification.
Type: Grant
Filed: August 6, 1999
Date of Patent: April 17, 2001
Assignee: International Business Machines Corporation
Inventors: Sankar Basu, Homayoon S. M. Beigi, Stephane Herman Maes, Benoit Emmanuel Ghislain Maison, Chalapathy Venkata Neti, Andrew William Senior
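Of the decision-making embodiments listed, the score combination approach is the simplest to illustrate: per-speaker audio and video scores are blended before the identity decision. The speaker names, scores, and the 0.7 weight below are made up for the sketch.

```python
# Sketch of the score-combination approach to audio-visual speaker
# identification: linearly blend per-speaker modality scores, then
# pick the speaker with the highest combined score.

def combine_scores(audio_scores, video_scores, audio_weight=0.7):
    """Linearly combine per-speaker audio and video scores."""
    return {
        spk: audio_weight * audio_scores[spk]
             + (1.0 - audio_weight) * video_scores[spk]
        for spk in audio_scores
    }

def identify(audio_scores, video_scores):
    """Identification decision: the speaker with the best combined score."""
    combined = combine_scores(audio_scores, video_scores)
    return max(combined, key=combined.get)

audio = {"alice": 0.6, "bob": 0.9}
video = {"alice": 0.95, "bob": 0.2}
# Audio favours bob, but strong video evidence tips the decision to alice.
print(identify(audio, video))
```

The feature-combination and re-scoring embodiments would instead merge modalities earlier (at the feature level) or later (by re-ranking candidate decisions).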
-
Patent number: 5953701
Abstract: A method of gender dependent speech recognition includes the steps of identifying phone state models common to both genders, identifying gender specific phone state models, identifying a gender of a speaker and recognizing acoustic data from the speaker.
Type: Grant
Filed: January 22, 1998
Date of Patent: September 14, 1999
Assignee: International Business Machines Corporation
Inventors: Chalapathy Venkata Neti, Salim Estephan Roukos