Patents by Inventor Tara N. Sainath

Tara N. Sainath has filed for patents to protect the following inventions. This listing includes both pending patent applications and patents already granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10403269
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for processing audio waveforms. In some implementations, a time-frequency feature representation is generated based on audio data. The time-frequency feature representation is input to an acoustic model comprising a trained artificial neural network. The trained artificial neural network comprises a frequency convolution layer, a memory layer, and one or more hidden layers. An output that is based on output of the trained artificial neural network is received. A transcription is provided, where the transcription is determined based on the output of the acoustic model.
    Type: Grant
    Filed: March 25, 2016
    Date of Patent: September 3, 2019
    Inventors: Tara N. Sainath, Ron J. Weiss, Andrew W. Senior, Kevin William Wilson
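The time-frequency front end in this abstract can be illustrated with a minimal sketch. The function below is an assumption-laden stand-in, not the patented model: it frames a signal and takes per-frame DFT magnitudes, where a production front end would add windowing, FFTs, and mel filter banks before the neural acoustic model.

```python
import cmath
import math

def time_frequency_features(signal, frame_len=8, hop=4):
    """Toy time-frequency representation: per-frame DFT magnitudes.

    A minimal stand-in for the spectrogram-style representation the
    abstract refers to as input to the acoustic model.
    """
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        mags = []
        for k in range(frame_len // 2 + 1):  # non-negative frequency bins
            acc = sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n, x in enumerate(frame))
            mags.append(abs(acc))
        frames.append(mags)
    return frames

# A pure tone that completes two cycles per 8-sample frame peaks in bin 2.
tone = [math.cos(2 * math.pi * 2 * n / 8) for n in range(16)]
feats = time_frequency_features(tone)
peak_bin = max(range(len(feats[0])), key=lambda k: feats[0][k])
```

Each row of `feats` would then be fed to the convolutional and memory layers the abstract describes.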
  • Publication number: 20190259409
    Abstract: This specification describes computer-implemented methods and systems. One method includes receiving, by a neural network of a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal. The first raw audio signal and the second raw audio signal describe audio occurring at a same period of time. The method further includes generating, by a spatial filtering layer of the neural network, a spatial filtered output using the first data and the second data, and generating, by a spectral filtering layer of the neural network, a spectral filtered output using the spatial filtered output. Generating the spectral filtered output comprises processing frequency-domain data representing the spatial filtered output. The method still further includes processing, by one or more additional layers of the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.
    Type: Application
    Filed: February 19, 2019
    Publication date: August 22, 2019
    Inventors: Ehsan Variani, Kevin William Wilson, Ron J. Weiss, Tara N. Sainath, Arun Narayanan
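The spatial-then-spectral pipeline described above can be sketched at toy scale. This is an illustrative approximation, not the claimed method: the spatial layer here is a fixed weighted, delayed sum of two channels (a miniature filter-and-sum beamformer, where the real system learns its filters), and the spectral layer is a plain 1-D convolution in the time domain rather than the frequency-domain processing the abstract specifies.

```python
def spatial_filter(ch1, ch2, w1=0.5, w2=0.5, delay=0):
    """Toy spatial filtering layer: a weighted, delayed sum of two
    channels. Negative delay advances ch2 relative to ch1."""
    out = []
    for n in range(len(ch1)):
        x2 = ch2[n - delay] if 0 <= n - delay < len(ch2) else 0.0
        out.append(w1 * ch1[n] + w2 * x2)
    return out

def spectral_filter(signal, kernel):
    """Toy spectral filtering layer: a 1-D convolution applied to the
    spatially filtered output."""
    k = len(kernel)
    return [sum(kernel[j] * signal[n - j] for j in range(k) if 0 <= n - j)
            for n in range(len(signal))]

# Two microphones hear the same ramp; the second lags by one sample.
ch1 = [0.0, 1.0, 2.0, 3.0]
ch2 = [0.0, 0.0, 1.0, 2.0]
aligned = spatial_filter(ch1, ch2, delay=-1)  # advance ch2 by one sample
smoothed = spectral_filter(aligned, [0.5, 0.5])
```

In the described system, additional network layers would consume `smoothed` to predict sub-word units.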
  • Patent number: 10360901
    Abstract: Techniques for learning front-end speech recognition parameters as part of training a neural network classifier include obtaining an input speech signal, and applying front-end speech recognition parameters to extract features from the input speech signal. The extracted features may be fed through a neural network to obtain an output classification for the input speech signal, and an error measure may be computed for the output classification through comparison of the output classification with a known target classification. Back propagation may be applied to adjust one or more of the front-end parameters as one or more layers of the neural network, based on the error measure.
    Type: Grant
    Filed: December 5, 2014
    Date of Patent: July 23, 2019
    Assignee: Nuance Communications, Inc.
    Inventors: Tara N. Sainath, Brian E. D. Kingsbury, Abdel-rahman Mohamed, Bhuvana Ramabhadran
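The key idea above, treating front-end parameters as trainable network layers and updating them by back propagation, can be reduced to a one-parameter sketch. Everything below is a hypothetical simplification: the "front end" is a single gain applied to the input, and the "classifier" is an identity layer, so the error gradient flows straight back into the front-end parameter.

```python
def train_front_end_gain(samples, targets, lr=0.1, steps=50):
    """Sketch of learning a front-end parameter jointly with the
    network: gradient descent on a squared error with respect to a
    single front-end gain g."""
    g = 0.0  # front-end parameter, learned rather than hand-tuned
    for _ in range(steps):
        grad = 0.0
        for x, t in zip(samples, targets):
            y = g * x                 # front end feeds the "network"
            grad += 2 * (y - t) * x   # d(error)/dg via the chain rule
        g -= lr * grad / len(samples)
    return g

# Targets are exactly 3x the inputs, so the learned gain converges near 3.
g = train_front_end_gain([1.0, 2.0, 3.0], [3.0, 6.0, 9.0])
```

The same chain-rule step, applied through many layers, is what lets back propagation adjust front-end parameters based on the classification error measure.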
  • Patent number: 10339921
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for using neural networks. One of the methods includes receiving, by a neural network in a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal, the first raw audio signal and the second raw audio signal for the same period of time, generating, by a spatial filtering convolutional layer in the neural network, a spatial filtered output using the first data and the second data, generating, by a spectral filtering convolutional layer in the neural network, a spectral filtered output using the spatial filtered output, and processing, by one or more additional layers in the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.
    Type: Grant
    Filed: January 4, 2016
    Date of Patent: July 2, 2019
    Assignee: Google LLC
    Inventors: Tara N. Sainath, Ron J. Weiss, Kevin William Wilson
  • Publication number: 20190115013
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using complex linear projection are disclosed. In one aspect, a method includes the actions of receiving audio data corresponding to an utterance. The method further includes generating frequency domain data using the audio data. The method further includes processing the frequency domain data using complex linear projection. The method further includes providing the processed frequency domain data to a neural network trained as an acoustic model. The method further includes generating a transcription for the utterance that is determined based at least on output that the neural network provides in response to receiving the processed frequency domain data.
    Type: Application
    Filed: October 26, 2018
    Publication date: April 18, 2019
    Inventors: Samuel Bengio, Mirko Visontai, Christopher Walter George Thornton, Michiel A.U. Bacchiani, Tara N. Sainath, Ehsan Variani, Izhak Shafran
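A complex linear projection can be sketched directly from the abstract's steps: complex-weighted sums over frequency-domain bins, compressed to real-valued features for the acoustic model. The weights below are fixed for illustration only; in the described approach they would be trained with the network, and the exact projection and compression details are assumptions here.

```python
import math

def complex_linear_projection(freq_bins, weights):
    """Toy complex linear projection: each output is the log-magnitude
    of a complex-weighted sum over the frequency bins, summarizing a
    complex spectrum with one learned projection per output."""
    out = []
    for w_row in weights:
        acc = sum(w * z for w, z in zip(w_row, freq_bins))
        out.append(math.log(abs(acc) + 1e-7))  # compress dynamic range
    return out

# A spectrum with energy only in bin 1, and projections that pick out
# bin 1 and the (empty) bin 0 respectively.
spectrum = [0 + 0j, 2 + 2j, 0 + 0j, 0 + 0j]
weights = [[0j, 1 + 0j, 0j, 0j],
           [1 + 0j, 0j, 0j, 0j]]
proj = complex_linear_projection(spectrum, weights)
```

The first projection sees the occupied bin and produces a large feature; the second sees only the empty bin and bottoms out near the floor term.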
  • Patent number: 10229700
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for detecting voice activity. In one aspect, a method includes actions of receiving, by a neural network included in an automated voice activity detection system, a raw audio waveform, processing, by the neural network, the raw audio waveform to determine whether the audio waveform includes speech, and providing, by the neural network, a classification of the raw audio waveform indicating whether the raw audio waveform includes speech.
    Type: Grant
    Filed: January 4, 2016
    Date of Patent: March 12, 2019
    Assignee: Google LLC
    Inventors: Tara N. Sainath, Gabor Simko, Maria Carolina Parada San Martin, Ruben Zazo Candil
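The input/output contract of the voice activity detector above (raw waveform in, speech/non-speech classification out) can be shown with a deliberately non-neural stand-in. The energy threshold rule below is only an illustration of the contract; the patent describes a neural network making this decision directly from the raw waveform.

```python
def classify_waveform(waveform, frame_len=4, threshold=0.1):
    """Energy-threshold stand-in for a voice activity detector:
    emits one speech/non-speech label per frame of raw samples."""
    labels = []
    for start in range(0, len(waveform) - frame_len + 1, frame_len):
        frame = waveform[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        labels.append(energy > threshold)
    return labels

# Silence, then a loud burst, then near-silence again.
wave = [0.0] * 4 + [0.5, -0.5, 0.5, -0.5] + [0.01] * 4
labels = classify_waveform(wave)
```

A learned detector replaces the threshold rule but keeps the same per-frame classification output.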
  • Patent number: 10224058
    Abstract: This specification describes computer-implemented methods and systems. One method includes receiving, by a neural network of a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal. The first raw audio signal and the second raw audio signal describe audio occurring at a same period of time. The method further includes generating, by a spatial filtering layer of the neural network, a spatial filtered output using the first data and the second data, and generating, by a spectral filtering layer of the neural network, a spectral filtered output using the spatial filtered output. Generating the spectral filtered output comprises processing frequency-domain data representing the spatial filtered output. The method still further includes processing, by one or more additional layers of the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.
    Type: Grant
    Filed: November 14, 2016
    Date of Patent: March 5, 2019
    Assignee: Google LLC
    Inventors: Ehsan Variani, Kevin William Wilson, Ron J. Weiss, Tara N. Sainath, Arun Narayanan
  • Patent number: 10180974
    Abstract: Systems and methods for generating content corresponding to an event are provided. A method for generating content corresponding to an event, comprises defining a plurality of sub-events of the event, classifying one or more actual occurrences in the event into one or more of the sub-events, monitoring behavior of one or more users to determine areas of the event of interest to the one or more users, linking the one or more users to the one or more classified actual occurrences based on the areas of the event of interest, and generating content for the one or more classified actual occurrences.
    Type: Grant
    Filed: September 16, 2014
    Date of Patent: January 15, 2019
    Assignee: International Business Machines Corporation
    Inventors: Aleksandr Y. Aravkin, Carlos H. Cardonha, Sasha P. Caskey, Dimitri Kanevsky, Tara N. Sainath
  • Patent number: 10140371
    Abstract: Approaches for translating a transliterated search query are provided. An approach includes receiving a search query containing a transliterated word. The approach also includes determining a source language corresponding to the transliterated word. The approach further includes converting the transliterated word to a word in the source language. The approach additionally includes translating the word in the source language to a word in a target language. The approach also includes performing a search using the word in the target language.
    Type: Grant
    Filed: July 18, 2017
    Date of Patent: November 27, 2018
    Assignee: International Business Machines Corporation
    Inventors: Sasha P. Caskey, Rick A. Hamilton, II, Dimitri Kanevsky, Tara N. Sainath
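The four-step pipeline in this abstract (detect the source language, convert the transliteration back to the source script, translate, then search) can be sketched with lookup tables. The tables below are illustrative stand-ins for the statistical transliteration and translation models a real system would use; the words and mappings are hypothetical examples.

```python
def transliterated_search(query_word, translit_map, translation_map):
    """Toy pipeline for the abstract's steps: determine the source
    language of a transliterated word, convert it to the source
    script, translate it to the target language, and return the
    term that would be used for the search."""
    for source_lang, entries in translit_map.items():
        if query_word in entries:
            source_word = entries[query_word]   # back to source script
            target_word = translation_map[source_lang][source_word]
            return {"source_lang": source_lang,
                    "source_word": source_word,
                    "search_term": target_word}
    return None  # no source language matched

# "privet" is a Latin-script transliteration of a Russian word.
translit_map = {"ru": {"privet": "привет"}}
translation_map = {"ru": {"привет": "hello"}}
result = transliterated_search("privet", translit_map, translation_map)
```

The returned `search_term` is what the final search step would query with.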
  • Patent number: 10140980
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using complex linear projection are disclosed. In one aspect, a method includes the actions of receiving audio data corresponding to an utterance. The method further includes generating frequency domain data using the audio data. The method further includes processing the frequency domain data using complex linear projection. The method further includes providing the processed frequency domain data to a neural network trained as an acoustic model. The method further includes generating a transcription for the utterance that is determined based at least on output that the neural network provides in response to receiving the processed frequency domain data.
    Type: Grant
    Filed: December 21, 2016
    Date of Patent: November 27, 2018
    Assignee: Google LLC
    Inventors: Samuel Bengio, Mirko Visontai, Christopher Walter George Thornton, Michiel A. U. Bacchiani, Tara N. Sainath, Ehsan Variani, Izhak Shafran
  • Patent number: 10078636
    Abstract: A method and system presents a sensory trait to a person for providing a human-sense perceivable representation of an aspect of an event. Information associated with an event can be received which has a first aspect being perceivable by a human via a first human sense at a distance from the event. A second aspect of the event is imperceivable by the human via a second human sense at the distance from the event. A query can be sent to a database for a representation of the first aspect of the event. A response to the query can be received which identifies the representation of the second aspect. The representation of the second aspect can be outputted in a manner that is perceivable by the human via the second human sense, while the human perceives the first aspect via the first human sense at the distance from the event.
    Type: Grant
    Filed: July 18, 2014
    Date of Patent: September 18, 2018
    Assignee: International Business Machines Corporation
    Inventors: Aleksandr Y. Aravkin, Dimitri Kanevsky, Peter K. Malkin, Tara N. Sainath
  • Patent number: 10068492
    Abstract: Computing system resources are controlled based on the behavioral attributes associated with users of the computing system. These behavioral attributes are monitored in real time and through a historical log, and behavioral attributes that fall outside pre-determined preferred behavioral parameters are detected. Access by the computing system user to computing system resources contained in a preferred and habitually accessed computing system resource set associated with the computing system user are adjusted in response to the detection of the behavioral attribute outside the pre-determined preferred behavioral parameters.
    Type: Grant
    Filed: January 8, 2016
    Date of Patent: September 4, 2018
    Assignee: International Business Machines Corporation
    Inventors: Sasha P. Caskey, Dimitri Kanevsky, Sameer Maskey, Tara N. Sainath
  • Patent number: 10056075
    Abstract: A method for training a deep neural network, comprises receiving and formatting speech data for the training, preconditioning a system of equations to be used for analyzing the speech data in connection with the training by using a non-fixed point quasi-Newton preconditioning scheme, and employing flexible Krylov subspace solvers in response to variations in the preconditioning scheme for different iterations of the training.
    Type: Grant
    Filed: December 9, 2016
    Date of Patent: August 21, 2018
    Assignee: International Business Machines Corporation
    Inventors: Lior Horesh, Brian E. D. Kingsbury, Tara N. Sainath
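The Krylov-subspace solvers mentioned above can be grounded with the textbook member of that family. Plain conjugate gradient, shown below on a tiny symmetric positive-definite system, is a simplified stand-in: the abstract pairs *flexible* Krylov solvers with a preconditioner that changes between training iterations, which this sketch omits.

```python
def conjugate_gradient(A, b, iters=10):
    """Plain conjugate gradient for A x = b, with A a symmetric
    positive-definite matrix given as a list of rows."""
    n = len(b)
    x = [0.0] * n
    r = b[:]          # residual b - A x, since x starts at zero
    p = r[:]
    for _ in range(iters):
        rr = sum(ri * ri for ri in r)
        if rr < 1e-12:
            break     # residual is effectively zero
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rr / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        beta = sum(ri * ri for ri in r) / rr
        p = [r[i] + beta * p[i] for i in range(n)]
    return x

# Solve [[4,1],[1,3]] x = [1,2]; the exact solution is [1/11, 7/11].
x = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
```

For an n-by-n system, CG converges in at most n iterations in exact arithmetic, which is why Krylov methods suit the large systems that arise in network training.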
  • Publication number: 20180197534
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.
    Type: Application
    Filed: December 20, 2017
    Publication date: July 12, 2018
    Inventors: Bo Li, Ron J. Weiss, Michiel A.U. Bacchiani, Tara N. Sainath, Kevin William Wilson
  • Patent number: 10007887
    Abstract: A possible failure of a first device may be identified. Whether a user of the first device has a scheduled meeting to be held within a time range of the possible failure, may be determined by accessing calendar information. Responsive to determining that the user has the scheduled meeting to be held within the time range of the possible failure, at least one other participant of the scheduled meeting may be determined by accessing the calendar information, a contact address for said at least one other participant may be determined, and information may be transferred to the at least one other participant via the contact address.
    Type: Grant
    Filed: May 23, 2014
    Date of Patent: June 26, 2018
    Assignee: International Business Machines Corporation
    Inventors: Robert G. Farrell, Dimitri Kanevsky, Peter K. Malkin, Tara N. Sainath
  • Publication number: 20180174575
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for speech recognition using complex linear projection are disclosed. In one aspect, a method includes the actions of receiving audio data corresponding to an utterance. The method further includes generating frequency domain data using the audio data. The method further includes processing the frequency domain data using complex linear projection. The method further includes providing the processed frequency domain data to a neural network trained as an acoustic model. The method further includes generating a transcription for the utterance that is determined based at least on output that the neural network provides in response to receiving the processed frequency domain data.
    Type: Application
    Filed: December 21, 2016
    Publication date: June 21, 2018
    Inventors: Samuel Bengio, Mirko Visontai, Christopher Walter George Thornton, Michiel A.U. Bacchiani, Tara N. Sainath, Ehsan Variani, Izhak Shafran
  • Patent number: 9984104
    Abstract: In a method for generating a searchable index from an analysis of a software application, one or more processors receive a first software application. The one or more processors determine that a first source code of the first software application is inaccessible. The one or more processors stimulate the first software application. The one or more processors analyze textual data resulting from the stimulation of the first software application. The one or more processors classify one or more images resulting from the stimulation of the first software application. The one or more processors index the analyzed textual data and the classified one or more images resulting from the stimulation of the first software application.
    Type: Grant
    Filed: January 13, 2016
    Date of Patent: May 29, 2018
    Assignee: International Business Machines Corporation
    Inventors: Aleksandr Y. Aravkin, Sasha P. Caskey, Ossama S. Emam, Dimitri Kanevsky, Tara N. Sainath
  • Patent number: 9984683
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for automatic speech recognition using multi-dimensional models. In some implementations, audio data that describes an utterance is received. A transcription for the utterance is determined using an acoustic model that includes a neural network having first memory blocks for time information and second memory blocks for frequency information. The transcription for the utterance is provided as output of an automated speech recognizer.
    Type: Grant
    Filed: July 22, 2016
    Date of Patent: May 29, 2018
    Assignee: Google LLC
    Inventors: Bo Li, Tara N. Sainath
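The idea of separate memory over time and over frequency can be sketched without LSTMs. The exponential moving averages below are a loose analogy chosen for brevity: they show state being carried along each axis of a time-frequency grid, where the patented model uses LSTM memory blocks with gates in place of the fixed decay.

```python
def two_axis_memory(grid, alpha=0.5):
    """Run an exponential moving average along the time axis and,
    separately, along the frequency axis of a time-frequency grid,
    mimicking distinct memory for time and frequency information."""
    T, F = len(grid), len(grid[0])
    time_mem = [[0.0] * F for _ in range(T)]
    freq_mem = [[0.0] * F for _ in range(T)]
    for f in range(F):            # state carried along the time axis
        state = 0.0
        for t in range(T):
            state = alpha * state + (1 - alpha) * grid[t][f]
            time_mem[t][f] = state
    for t in range(T):            # state carried along the frequency axis
        state = 0.0
        for f in range(F):
            state = alpha * state + (1 - alpha) * grid[t][f]
            freq_mem[t][f] = state
    return time_mem, freq_mem

# A 2-frame, 2-bin grid with energy only in the first frequency bin.
grid = [[1.0, 0.0], [1.0, 0.0]]
tm, fm = two_axis_memory(grid)
```

The two memory maps capture different structure: `tm` accumulates evidence across frames per bin, `fm` across bins per frame.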
  • Publication number: 20180068675
    Abstract: This specification describes computer-implemented methods and systems. One method includes receiving, by a neural network of a speech recognition system, first data representing a first raw audio signal and second data representing a second raw audio signal. The first raw audio signal and the second raw audio signal describe audio occurring at a same period of time. The method further includes generating, by a spatial filtering layer of the neural network, a spatial filtered output using the first data and the second data, and generating, by a spectral filtering layer of the neural network, a spectral filtered output using the spatial filtered output. Generating the spectral filtered output comprises processing frequency-domain data representing the spatial filtered output. The method still further includes processing, by one or more additional layers of the neural network, the spectral filtered output to predict sub-word units encoded in both the first raw audio signal and the second raw audio signal.
    Type: Application
    Filed: November 14, 2016
    Publication date: March 8, 2018
    Inventors: Ehsan Variani, Kevin William Wilson, Ron J. Weiss, Tara N. Sainath, Arun Narayanan
  • Patent number: 9886949
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for neural network adaptive beamforming for multichannel speech recognition are disclosed. In one aspect, a method includes the actions of receiving a first channel of audio data corresponding to an utterance and a second channel of audio data corresponding to the utterance. The actions further include generating a first set of filter parameters for a first filter based on the first channel of audio data and the second channel of audio data and a second set of filter parameters for a second filter based on the first channel of audio data and the second channel of audio data. The actions further include generating a single combined channel of audio data. The actions further include inputting the audio data to a neural network. The actions further include providing a transcription for the utterance.
    Type: Grant
    Filed: December 28, 2016
    Date of Patent: February 6, 2018
    Assignee: Google Inc.
    Inventors: Bo Li, Ron J. Weiss, Michiel A. U. Bacchiani, Tara N. Sainath, Kevin William Wilson
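The adaptive beamforming flow above, in which filter parameters are derived from the two input channels and then used to produce a single combined channel, can be approximated with classical signal processing. The delay estimator below is a crude stand-in for the filter-prediction network in the abstract, which regresses filter taps instead of a single integer delay.

```python
def estimate_delay(ch1, ch2, max_delay=3):
    """Pick the integer delay that best aligns ch2 to ch1 by
    maximizing their cross-correlation."""
    best_d, best_score = 0, float("-inf")
    for d in range(-max_delay, max_delay + 1):
        score = sum(ch1[n] * ch2[n - d]
                    for n in range(len(ch1)) if 0 <= n - d < len(ch2))
        if score > best_score:
            best_d, best_score = d, score
    return best_d

def combine(ch1, ch2, delay):
    """Delay-and-sum the two channels into the single combined
    channel that would feed the recognition network."""
    return [0.5 * ch1[n]
            + 0.5 * (ch2[n - delay] if 0 <= n - delay < len(ch2) else 0.0)
            for n in range(len(ch1))]

# ch2 is ch1 delayed by two samples, as from a farther microphone.
ch1 = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
ch2 = [0.0, 0.0] + ch1[:-2]
d = estimate_delay(ch1, ch2)
mono = combine(ch1, ch2, d)
```

The combined channel `mono` reproduces the source where both channels overlap, which is the signal the abstract then transcribes.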