Patents by Inventor Vaibhava Goel

Vaibhava Goel has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20180068654
    Abstract: A computer-implemented method according to one embodiment includes estimating a speaker dependent acoustic model utilizing test speech data and maximum likelihood linear regression (MLLR), transforming labeled speech data to create transformed speech data, utilizing the speaker dependent acoustic model and a linear transformation, and adjusting a deep neural network (DNN) acoustic model, utilizing the transformed speech data.
    Type: Application
    Filed: September 7, 2016
    Publication date: March 8, 2018
    Inventors: Xiaodong Cui, Vaibhava Goel
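    A minimal sketch of the adaptation idea (shapes, data, and the moment-matching transform are illustrative assumptions; a real system estimates the MLLR transform by maximizing likelihood under the speaker dependent model):
    ```python
    import numpy as np

    # Affine feature transform x' = A x + b that matches the labeled data's
    # mean/covariance to the test speaker's, standing in for an MLLR estimate.
    rng = np.random.default_rng(0)
    test = rng.normal(loc=1.0, scale=2.0, size=(500, 13))     # test speech
    labeled = rng.normal(loc=0.0, scale=1.0, size=(800, 13))  # labeled speech

    def stats(X):
        """Mean and Cholesky factor of the covariance."""
        mu = X.mean(axis=0)
        L = np.linalg.cholesky(np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1]))
        return mu, L

    mu_t, L_t = stats(test)
    mu_l, L_l = stats(labeled)

    # Whiten the labeled features, then re-color with the test speaker's stats.
    A = L_t @ np.linalg.inv(L_l)
    b = mu_t - A @ mu_l
    transformed = labeled @ A.T + b   # speaker-adapted, labels unchanged

    # 'transformed' would then be used to fine-tune (adjust) the DNN
    # acoustic model with ordinary backpropagation.
    ```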
  • Patent number: 9912909
    Abstract: A method of combining data streams from fixed audio-visual sensors with data streams from personal mobile devices including, forming a communication link with at least one of one or more personal mobile devices; receiving at least one of an audio data stream and/or a video data stream from the at least one of the one or more personal mobile devices; determining the quality of the at least one of the audio data stream and/or the video data stream, wherein the audio data stream and/or the video data stream having a quality above a threshold quality is retained; and combining the retained audio data stream and/or the video data stream with the data streams from the fixed audio-visual sensors.
    Type: Grant
    Filed: December 6, 2016
    Date of Patent: March 6, 2018
    Assignee: International Business Machines Corporation
    Inventors: Stanley Chen, Kenneth W. Church, Vaibhava Goel, Lidia L. Mangu, Etienne Marcheret, Bhuvana Ramabhadran, Laurence P. Sansone, Abhinav Sethy, Samuel Thomas
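    A toy illustration of the quality-gated combination (the Stream type, the quality scores, and the threshold are assumptions made for the sketch):
    ```python
    from dataclasses import dataclass

    @dataclass
    class Stream:
        source: str       # e.g. "fixed-mic-1", "phone-42"
        kind: str         # "audio" or "video"
        quality: float    # e.g. an SNR-derived score in [0, 1]

    def combine(fixed_streams, mobile_streams, threshold=0.6):
        """Retain mobile streams above the quality threshold, merge with fixed."""
        retained = [s for s in mobile_streams if s.quality > threshold]
        return list(fixed_streams) + retained

    fixed = [Stream("fixed-mic-1", "audio", 0.9)]
    mobile = [Stream("phone-42", "audio", 0.7), Stream("phone-7", "video", 0.3)]
    print([s.source for s in combine(fixed, mobile)])  # phone-7 is dropped
    ```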
  • Publication number: 20180027213
    Abstract: A method of combining data streams from fixed audio-visual sensors with data streams from personal mobile devices including, forming a communication link with at least one of one or more personal mobile devices; receiving at least one of an audio data stream and/or a video data stream from the at least one of the one or more personal mobile devices; determining the quality of the at least one of the audio data stream and/or the video data stream, wherein the audio data stream and/or the video data stream having a quality above a threshold quality is retained; and combining the retained audio data stream and/or the video data stream with the data streams from the fixed audio-visual sensors.
    Type: Application
    Filed: October 2, 2017
    Publication date: January 25, 2018
    Inventors: Stanley Chen, Kenneth W. Church, Vaibhava Goel, Lidia L. Mangu, Etienne Marcheret, Bhuvana Ramabhadran, Laurence P. Sansone, Abhinav Sethy, Samuel Thomas
  • Publication number: 20180025729
    Abstract: Aspects described herein are directed towards methods, computing devices, systems, and computer-readable media that apply scattering operations to extracted visual features of audiovisual input to generate predictions regarding the speech status of a subject. Visual scattering coefficients generated according to one or more aspects described herein may be used as input to a neural network operative to generate the predictions regarding the speech status of the subject. Predictions generated based on the visual features may be combined with predictions based on audio input associated with the visual features. In some embodiments, the extracted visual features may be combined with the audio input to generate a combined feature vector for use in generating predictions.
    Type: Application
    Filed: June 30, 2017
    Publication date: January 25, 2018
    Inventors: Etienne Marcheret, Josef Vopicka, Vaibhava Goel
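    A rough sketch of the visual half of the pipeline (random filters and weights stand in for a trained wavelet bank and network; the averaging fusion rule is an assumption):
    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    def scattering_coeffs(x, n_filters=8, width=9):
        """First-order scattering: |x * psi_k|, then low-pass (mean) pooling."""
        filters = rng.normal(size=(n_filters, width))
        return np.array([np.abs(np.convolve(x, psi, mode="same")).mean()
                         for psi in filters])

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    visual = rng.normal(size=64)          # e.g. a mouth-region feature vector
    s = scattering_coeffs(visual)

    w, b = rng.normal(size=s.size), 0.0   # stand-in for a trained network
    p_visual = sigmoid(w @ s + b)         # P(speaking | visual)
    p_audio = 0.8                         # stand-in audio-model prediction

    p_fused = 0.5 * p_visual + 0.5 * p_audio   # simple average fusion
    print(f"P(speaking) = {p_fused:.2f}")
    ```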
  • Patent number: 9824683
    Abstract: A method of augmenting training data includes converting a feature sequence of a source speaker determined from a plurality of utterances within a transcript to a feature sequence of a target speaker under the same transcript, training a speaker-dependent acoustic model for the target speaker for corresponding speaker-specific acoustic characteristics, estimating a mapping function between the feature sequence of the source speaker and the speaker-dependent acoustic model of the target speaker, and mapping each utterance from each speaker in a training set using the mapping function to multiple selected target speakers in the training set.
    Type: Grant
    Filed: December 22, 2015
    Date of Patent: November 21, 2017
    Assignee: International Business Machines Corporation
    Inventors: Xiaodong Cui, Vaibhava Goel, Brian E. D. Kingsbury
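    A small sketch of the mapping step (frame alignment and the affine form of the mapping are assumptions; the patent estimates the mapping against the target speaker's acoustic model):
    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    T, D = 300, 13

    src = rng.normal(size=(T, D))                       # source-speaker features
    A_true = 0.1 * rng.normal(size=(D, D)) + np.eye(D)
    tgt = src @ A_true.T + 0.5 + 0.05 * rng.normal(size=(T, D))  # same transcript

    # Least-squares estimate of x_tgt ~ A x_src + b from frame-aligned pairs.
    X = np.hstack([src, np.ones((T, 1))])
    W, *_ = np.linalg.lstsq(X, tgt, rcond=None)

    def map_to_target(utterance):
        """Convert one utterance's features into the target speaker's space."""
        Xu = np.hstack([utterance, np.ones((len(utterance), 1))])
        return Xu @ W

    # Each training utterance gains a speaker-converted copy per target speaker.
    augmented = [map_to_target(u) for u in (src[:100], src[100:200])]
    print(len(augmented), augmented[0].shape)
    ```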
  • Patent number: 9761221
    Abstract: According to some aspects, a method of classifying speech recognition results is provided, using a neural network comprising a plurality of interconnected network units, each network unit having one or more weight values, the method comprising using at least one computer, performing acts of providing a first vector as input to a first network layer comprising one or more network units of the neural network, transforming, by a first network unit of the one or more network units, the input vector to produce a plurality of values, the transformation being based at least in part on a plurality of weight values of the first network unit, sorting the plurality of values to produce a sorted plurality of values, and providing the sorted plurality of values as input to a second network layer of the neural network.
    Type: Grant
    Filed: August 20, 2015
    Date of Patent: September 12, 2017
    Assignee: Nuance Communications, Inc.
    Inventors: Steven John Rennie, Vaibhava Goel
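    A compact sketch of the sorted-activation idea (layer sizes and weights are arbitrary stand-ins):
    ```python
    import numpy as np

    rng = np.random.default_rng(3)

    x = rng.normal(size=16)            # the first vector, input to layer 1
    W1 = rng.normal(size=(32, 16))     # weight values of the first network unit
    values = W1 @ x                    # transform -> a plurality of values
    sorted_values = np.sort(values)    # sort before handing to the next layer

    W2 = rng.normal(size=(4, 32))
    layer2_out = np.maximum(W2 @ sorted_values, 0.0)   # second network layer
    print(layer2_out)
    ```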
  • Patent number: 9721559
    Abstract: A method of augmenting training data includes converting a feature sequence of a source speaker determined from a plurality of utterances within a transcript to a feature sequence of a target speaker under the same transcript, training a speaker-dependent acoustic model for the target speaker for corresponding speaker-specific acoustic characteristics, estimating a mapping function between the feature sequence of the source speaker and the speaker-dependent acoustic model of the target speaker, and mapping each utterance from each speaker in a training set using the mapping function to multiple selected target speakers in the training set.
    Type: Grant
    Filed: April 17, 2015
    Date of Patent: August 1, 2017
    Assignee: International Business Machines Corporation
    Inventors: Xiaodong Cui, Vaibhava Goel, Brian E. D. Kingsbury
  • Publication number: 20170200446
    Abstract: A method of augmenting training data includes converting a feature sequence of a source speaker determined from a plurality of utterances within a transcript to a feature sequence of a target speaker under the same transcript, training a speaker-dependent acoustic model for the target speaker for corresponding speaker-specific acoustic characteristics, estimating a mapping function between the feature sequence of the source speaker and the speaker-dependent acoustic model of the target speaker, and mapping each utterance from each speaker in a training set using the mapping function to multiple selected target speakers in the training set.
    Type: Application
    Filed: December 22, 2015
    Publication date: July 13, 2017
    Inventors: Xiaodong Cui, Vaibhava Goel, Brian E. D. Kingsbury
  • Patent number: 9697833
    Abstract: Aspects described herein are directed towards methods, computing devices, systems, and computer-readable media that apply scattering operations to extracted visual features of audiovisual input to generate predictions regarding the speech status of a subject. Visual scattering coefficients generated according to one or more aspects described herein may be used as input to a neural network operative to generate the predictions regarding the speech status of the subject. Predictions generated based on the visual features may be combined with predictions based on audio input associated with the visual features. In some embodiments, the extracted visual features may be combined with the audio input to generate a combined feature vector for use in generating predictions.
    Type: Grant
    Filed: August 25, 2015
    Date of Patent: July 4, 2017
    Assignee: Nuance Communications, Inc.
    Inventors: Etienne Marcheret, Josef Vopicka, Vaibhava Goel
  • Publication number: 20170154264
    Abstract: A method, executed by a computer, includes monitoring a conversation between a plurality of meeting participants, identifying a conversational focus within the conversation, generating at least one question corresponding to the conversational focus, and retrieving at least one answer corresponding to the at least one question. A computer system and computer program product corresponding to the method are also disclosed herein.
    Type: Application
    Filed: November 30, 2015
    Publication date: June 1, 2017
    Inventors: Stanley Chen, Kenneth W. Church, Robert G. Farrell, Vaibhava Goel, Lidia L. Mangu, Etienne Marcheret, Michael A. Picheny, Bhuvana Ramabhadran, Laurence P. Sansone, Abhinav Sethy, Samuel Thomas
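    A toy pipeline in the spirit of the method (focus detection by word frequency, the question template, and the lookup table are all stand-ins):
    ```python
    from collections import Counter

    STOPWORDS = {"the", "a", "is", "we", "to", "of", "and", "what", "about", "need"}
    KNOWLEDGE = {"latency": "P99 latency last week was 120 ms."}  # toy answer store

    def conversational_focus(turns):
        """Most frequent content word across recent turns."""
        words = [w.lower().strip(".,?") for t in turns for w in t.split()]
        return Counter(w for w in words if w not in STOPWORDS).most_common(1)[0][0]

    turns = ["We need to talk about latency.", "Latency is hurting the demo."]
    focus = conversational_focus(turns)
    question = f"What is known about {focus}?"
    print(question, "->", KNOWLEDGE.get(focus, "No answer found."))
    ```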
  • Publication number: 20170150100
    Abstract: A method of combining data streams from fixed audio-visual sensors with data streams from personal mobile devices including, forming a communication link with at least one of one or more personal mobile devices; receiving at least one of an audio data stream and/or a video data stream from the at least one of the one or more personal mobile devices; determining the quality of the at least one of the audio data stream and/or the video data stream, wherein the audio data stream and/or the video data stream having a quality above a threshold quality is retained; and combining the retained audio data stream and/or the video data stream with the data streams from the fixed audio-visual sensors.
    Type: Application
    Filed: December 6, 2016
    Publication date: May 25, 2017
    Inventors: Stanley Chen, Kenneth W. Church, Vaibhava Goel, Lidia L. Mangu, Etienne Marcheret, Bhuvana Ramabhadran, Laurence P. Sansone, Abhinav Sethy, Samuel Thomas
  • Patent number: 9646001
    Abstract: Operation of an automated dialog system is described using a source language to conduct a real time human machine dialog process with a human user using a target language. A user query in the target language is received and automatically machine translated into the source language. An automated reply of the dialog process is then delivered to the user in the target language. If the dialog process reaches an initial assistance state, a first human agent using the source language is provided to interact in real time with the user in the target language by machine translation to continue the dialog process. Then if the dialog process reaches a further assistance state, a second human agent using the target language is provided to interact in real time with the user in the target language to continue the dialog process.
    Type: Grant
    Filed: September 19, 2011
    Date of Patent: May 9, 2017
    Assignee: Nuance Communications, Inc.
    Inventors: Ruhi Sarikaya, Vaibhava Goel, David Nahamoo, Réal Tremblay, Bhuvana Ramabhadran, Osamuyimen Stewart
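    A toy state machine for the escalation ladder described above (the state names and the needs_help trigger are assumptions):
    ```python
    def handle_turn(state, needs_help):
        """Advance one assistance state per unresolved turn."""
        if state == "automated" and needs_help:
            return "mt_agent"      # source-language agent, replies machine-translated
        if state == "mt_agent" and needs_help:
            return "native_agent"  # target-language agent, no translation needed
        return state

    state = "automated"
    for needs_help in (False, True, True):
        state = handle_turn(state, needs_help)
    print(state)  # native_agent
    ```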
  • Patent number: 9640186
    Abstract: Deep scattering spectral features are extracted from an acoustic input signal to generate a deep scattering spectral feature representation of the acoustic input signal. The deep scattering spectral feature representation is input to a speech recognition engine. The acoustic input signal is decoded based on at least a portion of the deep scattering spectral feature representation input to a speech recognition engine.
    Type: Grant
    Filed: May 2, 2014
    Date of Patent: May 2, 2017
    Assignee: International Business Machines Corporation
    Inventors: Petr Fousek, Vaibhava Goel, Brian E. D. Kingsbury, Etienne Marcheret, Shay Maymon, David Nahamoo, Vijayaditya Peddinti, Bhuvana Ramabhadran, Tara N. Sainath
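    A rough sketch of a two-order ("deep") scattering cascade over one audio window (random filters stand in for the wavelet bank):
    ```python
    import numpy as np

    rng = np.random.default_rng(4)

    def scatter(signal, bank):
        """Modulus of each filter response: |signal * psi|."""
        return [np.abs(np.convolve(signal, psi, mode="same")) for psi in bank]

    audio = rng.normal(size=400)                         # one analysis window
    order1 = scatter(audio, rng.normal(size=(6, 16)))    # first-order paths
    order2 = [r for o1 in order1
                for r in scatter(o1, rng.normal(size=(3, 8)))]  # second order

    # Low-pass (here: mean) each path for stable coefficients; the resulting
    # vector would be fed to the speech recognition engine for decoding.
    features = np.array([p.mean() for p in order1 + order2])
    print(features.shape)   # (6 + 6*3,) = (24,)
    ```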
  • Patent number: 9626621
    Abstract: A method for training a deep neural network (DNN), comprises receiving and formatting speech data for the training, performing Hessian-free sequence training (HFST) on a first subset of a plurality of subsets of the speech data, and iteratively performing the HFST on successive subsets of the plurality of subsets of the speech data, wherein iteratively performing the HFST comprises reusing information from at least one previous iteration.
    Type: Grant
    Filed: July 7, 2015
    Date of Patent: April 18, 2017
    Assignee: International Business Machines Corporation
    Inventors: Pierre Dognin, Vaibhava Goel
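    A sketch of the reuse idea on a stand-in quadratic objective (a real system optimizes a sequence criterion over DNN weights): conjugate gradient, the inner loop of Hessian-free training, is warm-started from the previous subset's solution.
    ```python
    import numpy as np

    rng = np.random.default_rng(5)
    D = 20
    w = np.zeros(D)

    def subset_problem(seed):
        """Per-subset stand-in loss 0.5 w'Hw - g'w (H symmetric positive definite)."""
        r = np.random.default_rng(seed)
        M = r.normal(size=(D, D))
        return M @ M.T + np.eye(D), r.normal(size=D)

    def cg(hvp, b, x0, iters=10):
        """Conjugate gradient for H x = b, warm-started at x0."""
        x, r = x0.copy(), b - hvp(x0)
        p = r.copy()
        for _ in range(iters):
            Hp = hvp(p)
            alpha = (r @ r) / (p @ Hp)
            x = x + alpha * p
            r_new = r - alpha * Hp
            p = r_new + ((r_new @ r_new) / (r @ r)) * p
            r = r_new
        return x

    step = np.zeros(D)                     # carried across subsets: the "reuse"
    for seed in range(5):                  # iterate over data subsets
        H, g = subset_problem(seed)
        step = cg(lambda v: H @ v, g - H @ w, x0=step)   # warm-started CG
        w = w + step
    print(round(float(np.linalg.norm(H @ w - g)), 3))    # last subset's residual
    ```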
  • Publication number: 20170061966
    Abstract: Aspects described herein are directed towards methods, computing devices, systems, and computer-readable media that apply scattering operations to extracted visual features of audiovisual input to generate predictions regarding the speech status of a subject. Visual scattering coefficients generated according to one or more aspects described herein may be used as input to a neural network operative to generate the predictions regarding the speech status of the subject. Predictions generated based on the visual features may be combined with predictions based on audio input associated with the visual features. In some embodiments, the extracted visual features may be combined with the audio input to generate a combined feature vector for use in generating predictions.
    Type: Application
    Filed: August 25, 2015
    Publication date: March 2, 2017
    Inventors: Etienne Marcheret, Josef Vopicka, Vaibhava Goel
  • Patent number: 9584758
    Abstract: A method of combining data streams from fixed audio-visual sensors with data streams from personal mobile devices including, forming a communication link with at least one of one or more personal mobile devices; receiving at least one of an audio data stream and/or a video data stream from the at least one of the one or more personal mobile devices; determining the quality of the at least one of the audio data stream and/or the video data stream, wherein the audio data stream and/or the video data stream having a quality above a threshold quality is retained; and combining the retained audio data stream and/or the video data stream with the data streams from the fixed audio-visual sensors.
    Type: Grant
    Filed: November 25, 2015
    Date of Patent: February 28, 2017
    Assignee: International Business Machines Corporation
    Inventors: Stanley Chen, Kenneth W. Church, Vaibhava Goel, Lidia L. Mangu, Etienne Marcheret, Bhuvana Ramabhadran, Laurence P. Sansone, Abhinav Sethy, Samuel Thomas
  • Publication number: 20170053644
    Abstract: According to some aspects, a method of classifying speech recognition results is provided, using a neural network comprising a plurality of interconnected network units, each network unit having one or more weight values, the method comprising using at least one computer, performing acts of providing a first vector as input to a first network layer comprising one or more network units of the neural network, transforming, by a first network unit of the one or more network units, the input vector to produce a plurality of values, the transformation being based at least in part on a plurality of weight values of the first network unit, sorting the plurality of values to produce a sorted plurality of values, and providing the sorted plurality of values as input to a second network layer of the neural network.
    Type: Application
    Filed: August 20, 2015
    Publication date: February 23, 2017
    Applicant: Nuance Communications, Inc.
    Inventors: Steven John Rennie, Vaibhava Goel
  • Publication number: 20170040016
    Abstract: A method of augmenting training data includes converting a feature sequence of a source speaker determined from a plurality of utterances within a transcript to a feature sequence of a target speaker under the same transcript, training a speaker-dependent acoustic model for the target speaker for corresponding speaker-specific acoustic characteristics, estimating a mapping function between the feature sequence of the source speaker and the speaker-dependent acoustic model of the target speaker, and mapping each utterance from each speaker in a training set using the mapping function to multiple selected target speakers in the training set.
    Type: Application
    Filed: April 17, 2015
    Publication date: February 9, 2017
    Inventors: Xiaodong Cui, Vaibhava Goel, Brian E. D. Kingsbury
  • Patent number: 9483728
    Abstract: A method for training a deep neural network (DNN), comprises receiving and formatting speech data for the training, performing Hessian-free sequence training (HFST) on a first subset of a plurality of subsets of the speech data, and iteratively performing the HFST on successive subsets of the plurality of subsets of the speech data, wherein iteratively performing the HFST comprises reusing information from at least one previous iteration.
    Type: Grant
    Filed: October 30, 2014
    Date of Patent: November 1, 2016
    Assignee: International Business Machines Corporation
    Inventors: Pierre Dognin, Vaibhava Goel
  • Publication number: 20160314789
    Abstract: Methods and apparatus for using visual information to facilitate a speech recognition process. The method comprises dividing received audio information into a plurality of audio frames, determining for each of the plurality of audio frames, whether the audio information in the audio frame comprises speech from the foreground speaker, wherein the determining is based, at least in part, on received visual information, and transmitting the audio frame to an automatic speech recognition (ASR) engine for speech recognition when it is determined that the audio frame comprises speech from the foreground speaker.
    Type: Application
    Filed: April 27, 2015
    Publication date: October 27, 2016
    Applicant: Nuance Communications, Inc.
    Inventors: Etienne Marcheret, Josef Vopicka, Vaibhava Goel
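    A minimal sketch of the gating loop (the frame size, the motion scores, and the 0.5 threshold are assumptions; the real determination uses a model of the foreground speaker's visual appearance):
    ```python
    import numpy as np

    rng = np.random.default_rng(6)
    FRAME = 160                                  # 10 ms of audio at 16 kHz

    audio = rng.normal(size=FRAME * 8)           # received audio information
    frames = audio.reshape(-1, FRAME)            # divide into audio frames
    mouth_motion = rng.random(len(frames))       # per-frame visual evidence

    def send_to_asr(frame):
        print(f"ASR engine gets a frame of {frame.size} samples")

    for frame, motion in zip(frames, mouth_motion):
        if motion > 0.5:                         # foreground speaker talking?
            send_to_asr(frame)
    ```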