Patents by Inventor Vaibhava Goel

Vaibhava Goel has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20180068654
    Abstract: A computer-implemented method according to one embodiment includes estimating a speaker dependent acoustic model utilizing test speech data and maximum likelihood linear regression (MLLR), transforming labeled speech data to create transformed speech data, utilizing the speaker dependent acoustic model and a linear transformation, and adjusting a deep neural network (DNN) acoustic model, utilizing the transformed speech data.
    Type: Application
    Filed: September 7, 2016
    Publication date: March 8, 2018
    Inventors: Xiaodong Cui, Vaibhava Goel
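    A minimal sketch of the adaptation idea (shapes, data, and the moment-matching transform are illustrative assumptions; a real system estimates the MLLR transform by maximizing likelihood under the speaker dependent model):
    ```python
    import numpy as np

    # Affine feature transform x' = A x + b that matches the labeled data's
    # mean/covariance to the test speaker's, standing in for an MLLR estimate.
    rng = np.random.default_rng(0)
    test = rng.normal(loc=1.0, scale=2.0, size=(500, 13))     # test speech
    labeled = rng.normal(loc=0.0, scale=1.0, size=(800, 13))  # labeled speech

    def stats(X):
        """Mean and Cholesky factor of the covariance."""
        mu = X.mean(axis=0)
        L = np.linalg.cholesky(np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1]))
        return mu, L

    mu_t, L_t = stats(test)
    mu_l, L_l = stats(labeled)

    # Whiten the labeled features, then re-color with the test speaker's stats.
    A = L_t @ np.linalg.inv(L_l)
    b = mu_t - A @ mu_l
    transformed = labeled @ A.T + b   # speaker-adapted, labels unchanged

    # 'transformed' would then be used to fine-tune (adjust) the DNN
    # acoustic model with ordinary backpropagation.
    ```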
  • Patent number: 9912909
    Abstract: A method of combining data streams from fixed audio-visual sensors with data streams from personal mobile devices including, forming a communication link with at least one of one or more personal mobile devices; receiving at least one of an audio data stream and/or a video data stream from the at least one of the one or more personal mobile devices; determining the quality of the at least one of the audio data stream and/or the video data stream, wherein the audio data stream and/or the video data stream having a quality above a threshold quality is retained; and combining the retained audio data stream and/or the video data stream with the data streams from the fixed audio-visual sensors.
    Type: Grant
    Filed: December 6, 2016
    Date of Patent: March 6, 2018
    Assignee: International Business Machines Corporation
    Inventors: Stanley Chen, Kenneth W. Church, Vaibhava Goel, Lidia L. Mangu, Etienne Marcheret, Bhuvana Ramabhadran, Laurence P. Sansone, Abhinav Sethy, Samuel Thomas
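    A toy illustration of the quality-gated combination (the Stream type, the quality scores, and the threshold are assumptions made for the sketch):
    ```python
    from dataclasses import dataclass

    @dataclass
    class Stream:
        source: str       # e.g. "fixed-mic-1", "phone-42"
        kind: str         # "audio" or "video"
        quality: float    # e.g. an SNR-derived score in [0, 1]

    def combine(fixed_streams, mobile_streams, threshold=0.6):
        """Retain mobile streams above the quality threshold, merge with fixed."""
        retained = [s for s in mobile_streams if s.quality > threshold]
        return list(fixed_streams) + retained

    fixed = [Stream("fixed-mic-1", "audio", 0.9)]
    mobile = [Stream("phone-42", "audio", 0.7), Stream("phone-7", "video", 0.3)]
    print([s.source for s in combine(fixed, mobile)])  # phone-7 is dropped
    ```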
  • Publication number: 20180027213
    Abstract: A method of combining data streams from fixed audio-visual sensors with data streams from personal mobile devices including, forming a communication link with at least one of one or more personal mobile devices; receiving at least one of an audio data stream and/or a video data stream from the at least one of the one or more personal mobile devices; determining the quality of the at least one of the audio data stream and/or the video data stream, wherein the audio data stream and/or the video data stream having a quality above a threshold quality is retained; and combining the retained audio data stream and/or the video data stream with the data streams from the fixed audio-visual sensors.
    Type: Application
    Filed: October 2, 2017
    Publication date: January 25, 2018
    Inventors: Stanley Chen, Kenneth W. Church, Vaibhava Goel, Lidia L. Mangu, Etienne Marcheret, Bhuvana Ramabhadran, Laurence P. Sansone, Abhinav Sethy, Samuel Thomas
  • Publication number: 20180025729
    Abstract: Aspects described herein are directed towards methods, computing devices, systems, and computer-readable media that apply scattering operations to extracted visual features of audiovisual input to generate predictions regarding the speech status of a subject. Visual scattering coefficients generated according to one or more aspects described herein may be used as input to a neural network operative to generate the predictions regarding the speech status of the subject. Predictions generated based on the visual features may be combined with predictions based on audio input associated with the visual features. In some embodiments, the extracted visual features may be combined with the audio input to generate a combined feature vector for use in generating predictions.
    Type: Application
    Filed: June 30, 2017
    Publication date: January 25, 2018
    Inventors: Etienne Marcheret, Josef Vopicka, Vaibhava Goel
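    A rough sketch of the visual half of the pipeline (random filters and weights stand in for a trained wavelet bank and network; the averaging fusion rule is an assumption):
    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    def scattering_coeffs(x, n_filters=8, width=9):
        """First-order scattering: |x * psi_k|, then low-pass (mean) pooling."""
        filters = rng.normal(size=(n_filters, width))
        return np.array([np.abs(np.convolve(x, psi, mode="same")).mean()
                         for psi in filters])

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    visual = rng.normal(size=64)          # e.g. a mouth-region feature vector
    s = scattering_coeffs(visual)

    w, b = rng.normal(size=s.size), 0.0   # stand-in for a trained network
    p_visual = sigmoid(w @ s + b)         # P(speaking | visual)
    p_audio = 0.8                         # stand-in audio-model prediction

    p_fused = 0.5 * p_visual + 0.5 * p_audio   # simple average fusion
    print(f"P(speaking) = {p_fused:.2f}")
    ```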
  • Patent number: 9824683
    Abstract: A method of augmenting training data includes converting a feature sequence of a source speaker determined from a plurality of utterances within a transcript to a feature sequence of a target speaker under the same transcript, training a speaker-dependent acoustic model for the target speaker for corresponding speaker-specific acoustic characteristics, estimating a mapping function between the feature sequence of the source speaker and the speaker-dependent acoustic model of the target speaker, and mapping each utterance from each speaker in a training set using the mapping function to multiple selected target speakers in the training set.
    Type: Grant
    Filed: December 22, 2015
    Date of Patent: November 21, 2017
    Assignee: International Business Machines Corporation
    Inventors: Xiaodong Cui, Vaibhava Goel, Brian E. D. Kingsbury
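    A small sketch of the mapping step (frame alignment and the affine form of the mapping are assumptions; the patent estimates the mapping against the target speaker's acoustic model):
    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    T, D = 300, 13

    src = rng.normal(size=(T, D))                       # source-speaker features
    A_true = 0.1 * rng.normal(size=(D, D)) + np.eye(D)
    tgt = src @ A_true.T + 0.5 + 0.05 * rng.normal(size=(T, D))  # same transcript

    # Least-squares estimate of x_tgt ~ A x_src + b from frame-aligned pairs.
    X = np.hstack([src, np.ones((T, 1))])
    W, *_ = np.linalg.lstsq(X, tgt, rcond=None)

    def map_to_target(utterance):
        """Convert one utterance's features into the target speaker's space."""
        Xu = np.hstack([utterance, np.ones((len(utterance), 1))])
        return Xu @ W

    # Each training utterance gains a speaker-converted copy per target speaker.
    augmented = [map_to_target(u) for u in (src[:100], src[100:200])]
    print(len(augmented), augmented[0].shape)
    ```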
  • Patent number: 9761221
    Abstract: According to some aspects, a method of classifying speech recognition results is provided, using a neural network comprising a plurality of interconnected network units, each network unit having one or more weight values, the method comprising using at least one computer, performing acts of providing a first vector as input to a first network layer comprising one or more network units of the neural network, transforming, by a first network unit of the one or more network units, the input vector to produce a plurality of values, the transformation being based at least in part on a plurality of weight values of the first network unit, sorting the plurality of values to produce a sorted plurality of values, and providing the sorted plurality of values as input to a second network layer of the neural network.
    Type: Grant
    Filed: August 20, 2015
    Date of Patent: September 12, 2017
    Assignee: Nuance Communications, Inc.
    Inventors: Steven John Rennie, Vaibhava Goel
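    A compact sketch of the sorted-activation idea (layer sizes and weights are arbitrary stand-ins):
    ```python
    import numpy as np

    rng = np.random.default_rng(3)

    x = rng.normal(size=16)            # the first vector, input to layer 1
    W1 = rng.normal(size=(32, 16))     # weight values of the first network unit
    values = W1 @ x                    # transform -> a plurality of values
    sorted_values = np.sort(values)    # sort before handing to the next layer

    W2 = rng.normal(size=(4, 32))
    layer2_out = np.maximum(W2 @ sorted_values, 0.0)   # second network layer
    print(layer2_out)
    ```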
  • Patent number: 9721559
    Abstract: A method of augmenting training data includes converting a feature sequence of a source speaker determined from a plurality of utterances within a transcript to a feature sequence of a target speaker under the same transcript, training a speaker-dependent acoustic model for the target speaker for corresponding speaker-specific acoustic characteristics, estimating a mapping function between the feature sequence of the source speaker and the speaker-dependent acoustic model of the target speaker, and mapping each utterance from each speaker in a training set using the mapping function to multiple selected target speakers in the training set.
    Type: Grant
    Filed: April 17, 2015
    Date of Patent: August 1, 2017
    Assignee: International Business Machines Corporation
    Inventors: Xiaodong Cui, Vaibhava Goel, Brian E. D. Kingsbury
  • Publication number: 20170200446
    Abstract: A method of augmenting training data includes converting a feature sequence of a source speaker determined from a plurality of utterances within a transcript to a feature sequence of a target speaker under the same transcript, training a speaker-dependent acoustic model for the target speaker for corresponding speaker-specific acoustic characteristics, estimating a mapping function between the feature sequence of the source speaker and the speaker-dependent acoustic model of the target speaker, and mapping each utterance from each speaker in a training set using the mapping function to multiple selected target speakers in the training set.
    Type: Application
    Filed: December 22, 2015
    Publication date: July 13, 2017
    Inventors: Xiaodong Cui, Vaibhava Goel, Brian E. D. Kingsbury
  • Patent number: 9697833
    Abstract: Aspects described herein are directed towards methods, computing devices, systems, and computer-readable media that apply scattering operations to extracted visual features of audiovisual input to generate predictions regarding the speech status of a subject. Visual scattering coefficients generated according to one or more aspects described herein may be used as input to a neural network operative to generate the predictions regarding the speech status of the subject. Predictions generated based on the visual features may be combined with predictions based on audio input associated with the visual features. In some embodiments, the extracted visual features may be combined with the audio input to generate a combined feature vector for use in generating predictions.
    Type: Grant
    Filed: August 25, 2015
    Date of Patent: July 4, 2017
    Assignee: Nuance Communications, Inc.
    Inventors: Etienne Marcheret, Josef Vopicka, Vaibhava Goel
  • Publication number: 20170154264
    Abstract: A method, executed by a computer, includes monitoring a conversation between a plurality of meeting participants, identifying a conversational focus within the conversation, generating at least one question corresponding to the conversational focus, and retrieving at least one answer corresponding to the at least one question. A computer system and computer program product corresponding to the method are also disclosed herein.
    Type: Application
    Filed: November 30, 2015
    Publication date: June 1, 2017
    Inventors: Stanley Chen, Kenneth W. Church, Robert G. Farrell, Vaibhava Goel, Lidia L. Mangu, Etienne Marcheret, Michael A. Picheny, Bhuvana Ramabhadran, Laurence P. Sansone, Abhinav Sethy, Samuel Thomas
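    A toy pipeline in the spirit of the method (focus detection by word frequency, the question template, and the lookup table are all stand-ins):
    ```python
    from collections import Counter

    STOPWORDS = {"the", "a", "is", "we", "to", "of", "and", "what", "about", "need"}
    KNOWLEDGE = {"latency": "P99 latency last week was 120 ms."}  # toy answer store

    def conversational_focus(turns):
        """Most frequent content word across recent turns."""
        words = [w.lower().strip(".,?") for t in turns for w in t.split()]
        return Counter(w for w in words if w not in STOPWORDS).most_common(1)[0][0]

    turns = ["We need to talk about latency.", "Latency is hurting the demo."]
    focus = conversational_focus(turns)
    question = f"What is known about {focus}?"
    print(question, "->", KNOWLEDGE.get(focus, "No answer found."))
    ```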
  • Publication number: 20170150100
    Abstract: A method of combining data streams from fixed audio-visual sensors with data streams from personal mobile devices including, forming a communication link with at least one of one or more personal mobile devices; receiving at least one of an audio data stream and/or a video data stream from the at least one of the one or more personal mobile devices; determining the quality of the at least one of the audio data stream and/or the video data stream, wherein the audio data stream and/or the video data stream having a quality above a threshold quality is retained; and combining the retained audio data stream and/or the video data stream with the data streams from the fixed audio-visual sensors.
    Type: Application
    Filed: December 6, 2016
    Publication date: May 25, 2017
    Inventors: Stanley Chen, Kenneth W. Church, Vaibhava Goel, Lidia L. Mangu, Etienne Marcheret, Bhuvana Ramabhadran, Laurence P. Sansone, Abhinav Sethy, Samuel Thomas
  • Patent number: 9646001
    Abstract: Operation of an automated dialog system is described using a source language to conduct a real time human machine dialog process with a human user using a target language. A user query in the target language is received and automatically machine translated into the source language. An automated reply of the dialog process is then delivered to the user in the target language. If the dialog process reaches an initial assistance state, a first human agent using the source language is provided to interact in real time with the user in the target language by machine translation to continue the dialog process. Then if the dialog process reaches a further assistance state, a second human agent using the target language is provided to interact in real time with the user in the target language to continue the dialog process.
    Type: Grant
    Filed: September 19, 2011
    Date of Patent: May 9, 2017
    Assignee: Nuance Communications, Inc.
    Inventors: Ruhi Sarikaya, Vaibhava Goel, David Nahamoo, Réal Tremblay, Bhuvana Ramabhadran, Osamuyimen Stewart
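    A toy state machine for the escalation ladder described above (the state names and the needs_help trigger are assumptions):
    ```python
    def handle_turn(state, needs_help):
        """Advance one assistance state per unresolved turn."""
        if state == "automated" and needs_help:
            return "mt_agent"      # source-language agent, replies machine-translated
        if state == "mt_agent" and needs_help:
            return "native_agent"  # target-language agent, no translation needed
        return state

    state = "automated"
    for needs_help in (False, True, True):
        state = handle_turn(state, needs_help)
    print(state)  # native_agent
    ```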
  • Patent number: 9640186
    Abstract: Deep scattering spectral features are extracted from an acoustic input signal to generate a deep scattering spectral feature representation of the acoustic input signal. The deep scattering spectral feature representation is input to a speech recognition engine. The acoustic input signal is decoded based on at least a portion of the deep scattering spectral feature representation input to a speech recognition engine.
    Type: Grant
    Filed: May 2, 2014
    Date of Patent: May 2, 2017
    Assignee: International Business Machines Corporation
    Inventors: Petr Fousek, Vaibhava Goel, Brian E. D. Kingsbury, Etienne Marcheret, Shay Maymon, David Nahamoo, Vijayaditya Peddinti, Bhuvana Ramabhadran, Tara N. Sainath
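    A rough sketch of a two-order ("deep") scattering cascade over one audio window (random filters stand in for the wavelet bank):
    ```python
    import numpy as np

    rng = np.random.default_rng(4)

    def scatter(signal, bank):
        """Modulus of each filter response: |signal * psi|."""
        return [np.abs(np.convolve(signal, psi, mode="same")) for psi in bank]

    audio = rng.normal(size=400)                         # one analysis window
    order1 = scatter(audio, rng.normal(size=(6, 16)))    # first-order paths
    order2 = [r for o1 in order1
                for r in scatter(o1, rng.normal(size=(3, 8)))]  # second order

    # Low-pass (here: mean) each path for stable coefficients; the resulting
    # vector would be fed to the speech recognition engine for decoding.
    features = np.array([p.mean() for p in order1 + order2])
    print(features.shape)   # (6 + 6*3,) = (24,)
    ```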
  • Patent number: 9626621
    Abstract: A method for training a deep neural network (DNN), comprises receiving and formatting speech data for the training, performing Hessian-free sequence training (HFST) on a first subset of a plurality of subsets of the speech data, and iteratively performing the HFST on successive subsets of the plurality of subsets of the speech data, wherein iteratively performing the HFST comprises reusing information from at least one previous iteration.
    Type: Grant
    Filed: July 7, 2015
    Date of Patent: April 18, 2017
    Assignee: International Business Machines Corporation
    Inventors: Pierre Dognin, Vaibhava Goel
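    A sketch of the reuse idea on a stand-in quadratic objective (a real system optimizes a sequence criterion over DNN weights): conjugate gradient, the inner loop of Hessian-free training, is warm-started from the previous subset's solution.
    ```python
    import numpy as np

    rng = np.random.default_rng(5)
    D = 20
    w = np.zeros(D)

    def subset_problem(seed):
        """Per-subset stand-in loss 0.5 w'Hw - g'w (H symmetric positive definite)."""
        r = np.random.default_rng(seed)
        M = r.normal(size=(D, D))
        return M @ M.T + np.eye(D), r.normal(size=D)

    def cg(hvp, b, x0, iters=10):
        """Conjugate gradient for H x = b, warm-started at x0."""
        x, r = x0.copy(), b - hvp(x0)
        p = r.copy()
        for _ in range(iters):
            Hp = hvp(p)
            alpha = (r @ r) / (p @ Hp)
            x = x + alpha * p
            r_new = r - alpha * Hp
            p = r_new + ((r_new @ r_new) / (r @ r)) * p
            r = r_new
        return x

    step = np.zeros(D)                     # carried across subsets: the "reuse"
    for seed in range(5):                  # iterate over data subsets
        H, g = subset_problem(seed)
        step = cg(lambda v: H @ v, g - H @ w, x0=step)   # warm-started CG
        w = w + step
    print(round(float(np.linalg.norm(H @ w - g)), 3))    # last subset's residual
    ```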
  • Publication number: 20170061966
    Abstract: Aspects described herein are directed towards methods, computing devices, systems, and computer-readable media that apply scattering operations to extracted visual features of audiovisual input to generate predictions regarding the speech status of a subject. Visual scattering coefficients generated according to one or more aspects described herein may be used as input to a neural network operative to generate the predictions regarding the speech status of the subject. Predictions generated based on the visual features may be combined with predictions based on audio input associated with the visual features. In some embodiments, the extracted visual features may be combined with the audio input to generate a combined feature vector for use in generating predictions.
    Type: Application
    Filed: August 25, 2015
    Publication date: March 2, 2017
    Inventors: Etienne Marcheret, Josef Vopicka, Vaibhava Goel
  • Patent number: 9584758
    Abstract: A method of combining data streams from fixed audio-visual sensors with data streams from personal mobile devices including, forming a communication link with at least one of one or more personal mobile devices; receiving at least one of an audio data stream and/or a video data stream from the at least one of the one or more personal mobile devices; determining the quality of the at least one of the audio data stream and/or the video data stream, wherein the audio data stream and/or the video data stream having a quality above a threshold quality is retained; and combining the retained audio data stream and/or the video data stream with the data streams from the fixed audio-visual sensors.
    Type: Grant
    Filed: November 25, 2015
    Date of Patent: February 28, 2017
    Assignee: International Business Machines Corporation
    Inventors: Stanley Chen, Kenneth W. Church, Vaibhava Goel, Lidia L. Mangu, Etienne Marcheret, Bhuvana Ramabhadran, Laurence P. Sansone, Abhinav Sethy, Samuel Thomas
  • Publication number: 20170053644
    Abstract: According to some aspects, a method of classifying speech recognition results is provided, using a neural network comprising a plurality of interconnected network units, each network unit having one or more weight values, the method comprising using at least one computer, performing acts of providing a first vector as input to a first network layer comprising one or more network units of the neural network, transforming, by a first network unit of the one or more network units, the input vector to produce a plurality of values, the transformation being based at least in part on a plurality of weight values of the first network unit, sorting the plurality of values to produce a sorted plurality of values, and providing the sorted plurality of values as input to a second network layer of the neural network.
    Type: Application
    Filed: August 20, 2015
    Publication date: February 23, 2017
    Applicant: Nuance Communications, Inc.
    Inventors: Steven John Rennie, Vaibhava Goel
  • Publication number: 20170040016
    Abstract: A method of augmenting training data includes converting a feature sequence of a source speaker determined from a plurality of utterances within a transcript to a feature sequence of a target speaker under the same transcript, training a speaker-dependent acoustic model for the target speaker for corresponding speaker-specific acoustic characteristics, estimating a mapping function between the feature sequence of the source speaker and the speaker-dependent acoustic model of the target speaker, and mapping each utterance from each speaker in a training set using the mapping function to multiple selected target speakers in the training set.
    Type: Application
    Filed: April 17, 2015
    Publication date: February 9, 2017
    Inventors: Xiaodong Cui, Vaibhava Goel, Brian E. D. Kingsbury
  • Patent number: 9483728
    Abstract: A method for training a deep neural network (DNN), comprises receiving and formatting speech data for the training, performing Hessian-free sequence training (HFST) on a first subset of a plurality of subsets of the speech data, and iteratively performing the HFST on successive subsets of the plurality of subsets of the speech data, wherein iteratively performing the HFST comprises reusing information from at least one previous iteration.
    Type: Grant
    Filed: October 30, 2014
    Date of Patent: November 1, 2016
    Assignee: International Business Machines Corporation
    Inventors: Pierre Dognin, Vaibhava Goel
  • Publication number: 20160314789
    Abstract: Methods and apparatus for using visual information to facilitate a speech recognition process. The method comprises dividing received audio information into a plurality of audio frames, determining for each of the plurality of audio frames, whether the audio information in the audio frame comprises speech from the foreground speaker, wherein the determining is based, at least in part, on received visual information, and transmitting the audio frame to an automatic speech recognition (ASR) engine for speech recognition when it is determined that the audio frame comprises speech from the foreground speaker.
    Type: Application
    Filed: April 27, 2015
    Publication date: October 27, 2016
    Applicant: Nuance Communications, Inc.
    Inventors: Etienne Marcheret, Josef Vopicka, Vaibhava Goel
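    A minimal sketch of the gating loop (the frame size, the motion scores, and the 0.5 threshold are assumptions; the real determination uses a model of the foreground speaker's visual appearance):
    ```python
    import numpy as np

    rng = np.random.default_rng(6)
    FRAME = 160                                  # 10 ms of audio at 16 kHz

    audio = rng.normal(size=FRAME * 8)           # received audio information
    frames = audio.reshape(-1, FRAME)            # divide into audio frames
    mouth_motion = rng.random(len(frames))       # per-frame visual evidence

    def send_to_asr(frame):
        print(f"ASR engine gets a frame of {frame.size} samples")

    for frame, motion in zip(frames, mouth_motion):
        if motion > 0.5:                         # foreground speaker talking?
            send_to_asr(frame)
    ```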