Patents by Inventor Hasim Sak

Hasim Sak has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 9292489
    Abstract: An automatic speech recognition (ASR) system and method are provided for using sub-lexical language models together with word level pronunciation lexicons. These approaches operate by introducing a transduction between sequences of sub-lexical units and sequences of words.
    Type: Grant
    Filed: April 3, 2013
    Date of Patent: March 22, 2016
    Assignee: Google Inc.
    Inventors: Hasim Sak, Murat Saraclar
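The transduction between sub-lexical units and words can be pictured with a toy convention in which a leading "+" marks a unit that continues the previous word; the marker and helper below are illustrative assumptions, not the patented transduction:

```python
def units_to_words(units, cont="+"):
    """Join sub-lexical units into words; a leading `cont` marker
    means the unit attaches to the word before it."""
    words = []
    for u in units:
        if u.startswith(cont) and words:
            words[-1] += u[len(cont):]
        else:
            words.append(u)
    return words

# Toy Turkish-style example: "kitap" + "+lar" -> "kitaplar" ("books").
result = units_to_words(["kitap", "+lar", "ev"])
```

A real system would apply this mapping as a finite-state transduction composed with the recognizer's search network rather than as a post-processing pass.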
  • Publication number: 20160035344
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying the language of a spoken utterance. One of the methods includes receiving a plurality of audio frames that collectively represent at least a portion of a spoken utterance; processing the plurality of audio frames using a long short-term memory (LSTM) neural network to generate a respective language score for each of a plurality of languages, wherein the respective language score for each of the plurality of languages represents a likelihood that the spoken utterance was spoken in the language; and classifying the spoken utterance as being spoken in one of the plurality of languages using the language scores.
    Type: Application
    Filed: August 4, 2015
    Publication date: February 4, 2016
    Inventors: Javier Gonzalez-Dominguez, Hasim Sak, Ignacio Lopez Moreno
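The frame-level scoring and utterance-level classification described above can be sketched as follows; the `frame_scorer` callback stands in for the LSTM and is an assumption of this sketch:

```python
import math

def classify_language(frames, frame_scorer, languages):
    """Softmax-normalize per-frame language scores, average them
    over the utterance, and return the highest-scoring language."""
    totals = {lang: 0.0 for lang in languages}
    for frame in frames:
        scores = frame_scorer(frame)
        z = sum(math.exp(s) for s in scores.values())
        for lang in languages:
            totals[lang] += math.exp(scores[lang]) / z
    return max(totals, key=totals.get)

# Toy scorer: positive frame values look like "en", negative like "tr".
lang = classify_language([1.0, 2.0, 0.5],
                         lambda f: {"en": f, "tr": -f},
                         ["en", "tr"])
```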
  • Publication number: 20150356075
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating representations of input sequences. One of the methods includes receiving a grapheme sequence, the grapheme sequence comprising a plurality of graphemes arranged according to an input order; processing the sequence of graphemes using a long short-term memory (LSTM) neural network to generate an initial phoneme sequence from the grapheme sequence, the initial phoneme sequence comprising a plurality of phonemes arranged according to an output order; and generating a phoneme representation of the grapheme sequence from the initial phoneme sequence generated by the LSTM neural network, wherein generating the phoneme representation comprises removing, from the initial phoneme sequence, phonemes in one or more positions in the output order.
    Type: Application
    Filed: June 2, 2015
    Publication date: December 10, 2015
    Inventors: Kanury Kanishka Rao, Fuchun Peng, Hasim Sak, Francoise Beaufays
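The final step — removing phonemes at certain positions of the initial output sequence — resembles CTC-style collapsing; under that assumption, a minimal sketch:

```python
def collapse(initial_phonemes, blank="<b>"):
    """Merge consecutive repeats and drop blank symbols, CTC-style."""
    out, prev = [], None
    for p in initial_phonemes:
        if p != blank and p != prev:
            out.append(p)
        prev = p
    return out

# An LSTM emitting one label per input position over-generates;
# collapsing recovers the phoneme representation, e.g. for "cat":
phonemes = collapse(["k", "k", "<b>", "ae", "ae", "<b>", "t"])
```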
  • Patent number: 9208779
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for creating a static language model from a mixture of n-gram language models. One of the methods includes receiving a set of development sentences W, receiving a set of language models GM, determining a set of n-gram language model weights λM based on the development sentences W and the set of language models GM, determining a set of sentence cluster weights λC, each of the sentence cluster weights corresponding to a cluster in a set of sentence clusters, each cluster in the set of sentence clusters associated with at least one sentence from the set of development sentences W, and generating a language model from the set of language models GM, the set of n-gram language model weights λM, the set of sentence clusters, and the set of sentence cluster weights λC.
    Type: Grant
    Filed: September 6, 2013
    Date of Patent: December 8, 2015
    Assignee: Google Inc.
    Inventors: Hasim Sak, Cyril Georges Luc Allauzen
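The mixture in this abstract can be pictured as linear interpolation, with the sentence-cluster weights collapsing per-cluster model weights into one static weight vector; this is a sketch of the combination step only, not the patented weight estimation:

```python
def combine_weights(per_cluster_model_weights, cluster_weights):
    """Collapse per-cluster model-weight vectors into one static vector."""
    n = len(next(iter(per_cluster_model_weights.values())))
    combined = [0.0] * n
    for cluster, mw in per_cluster_model_weights.items():
        for i, w in enumerate(mw):
            combined[i] += cluster_weights[cluster] * w
    return combined

def mixture_prob(models, weights, context, word):
    """P(word | context) under a weighted mixture of component LMs."""
    return sum(w * m(context, word) for m, w in zip(models, weights))

weights = combine_weights({"news": [0.8, 0.2], "chat": [0.2, 0.8]},
                          {"news": 0.5, "chat": 0.5})
p = mixture_prob([lambda c, w: 0.5, lambda c, w: 0.1],
                 weights, ("the",), "cat")
```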
  • Publication number: 20150340034
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for recognizing speech using neural networks. One of the methods includes receiving an audio input; processing the audio input using an acoustic model to generate a respective phoneme score for each of a plurality of phoneme labels; processing one or more of the phoneme scores using an inverse pronunciation model to generate a respective grapheme score for each of a plurality of grapheme labels; and processing one or more of the grapheme scores using a language model to generate a respective text label score for each of a plurality of text labels.
    Type: Application
    Filed: May 22, 2015
    Publication date: November 26, 2015
    Inventors: Johan Schalkwyk, Francoise Beaufays, Hasim Sak, John Giannandrea
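Treating each stage as a probabilistic mapping over score dictionaries, the acoustic-to-grapheme-to-text cascade can be sketched as below; the dict-based mappings are an illustrative simplification, not the patented models:

```python
def push(scores, mapping):
    """One cascade stage: out[y] = sum over x of scores[x] * mapping[x][y]."""
    out = {}
    for x, s in scores.items():
        for y, p in mapping.get(x, {}).items():
            out[y] = out.get(y, 0.0) + s * p
    return out

def recognize(phoneme_scores, inverse_pronunciation, language_model):
    grapheme_scores = push(phoneme_scores, inverse_pronunciation)
    return push(grapheme_scores, language_model)

# Toy example: the phoneme "f" may be spelled "ph" or "f".
text_scores = recognize({"f": 1.0},
                        {"f": {"ph": 0.5, "f": 0.5}},
                        {"ph": {"phone": 1.0}, "f": {"fone": 1.0}})
```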
  • Publication number: 20150170640
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating representations of acoustic sequences. One of the methods includes: receiving an acoustic sequence, the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; processing the acoustic feature representation at an initial time step using an acoustic modeling neural network; for each subsequent time step of the plurality of time steps: receiving an output generated by the acoustic modeling neural network for a preceding time step, generating a modified input from the output generated by the acoustic modeling neural network for the preceding time step and the acoustic representation for the time step, and processing the modified input using the acoustic modeling neural network to generate an output for the time step; and generating a phoneme representation for the utterance from the outputs for each of the time steps.
    Type: Application
    Filed: December 3, 2014
    Publication date: June 18, 2015
    Inventors: Hasim Sak, Andrew W. Senior
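The feedback loop in this abstract — the previous output re-entering the network alongside the next frame — can be sketched with plain lists standing in for feature vectors; the concatenation convention and stub model are assumptions:

```python
def run_with_feedback(frames, model, initial_output):
    """At each step, prepend the previous output to the current
    frame's features and feed the result back through the model."""
    output, outputs = initial_output, []
    for frame in frames:
        modified_input = output + frame   # previous output + features
        output = model(modified_input)
        outputs.append(output)
    return outputs

# Stub model: the "output" is just the sum of its inputs.
outs = run_with_feedback([[1.0], [2.0]], lambda v: [sum(v)], [0.0])
```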
  • Publication number: 20150161991
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating phoneme representations of acoustic sequences using projection sequences. One of the methods includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; for each of the plurality of time steps, processing the acoustic feature representation through each of one or more long short-term memory (LSTM) layers; and for each of the plurality of time steps, processing the recurrent projected output generated by the highest LSTM layer for the time step using an output layer to generate a set of scores for the time step.
    Type: Application
    Filed: December 2, 2014
    Publication date: June 11, 2015
    Inventors: Hasim Sak, Andrew W. Senior
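The recurrent projection described here (often called LSTMP) feeds a low-dimensional linear projection of the cell output back as the recurrent state; a sketch with a stub cell, where the stub and matrix shapes are assumptions:

```python
def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def projected_step(lstm_cell, W_proj, x, r_prev, c_prev):
    """One LSTM-with-projection step: the state fed back is the
    projected output r, not the full cell output m."""
    m, c = lstm_cell(x, r_prev, c_prev)   # standard LSTM cell
    r = matvec(W_proj, m)                 # recurrent projection
    return r, c

# Stub cell: adds the recurrent state's sum to each input feature.
def stub_cell(x, r, c):
    return [xi + sum(r) for xi in x], c

r, c = projected_step(stub_cell, [[1.0, 1.0]], [1.0, 2.0], [0.5], [0.0])
```

The payoff is that the recurrent matrices scale with the projection size rather than the (larger) cell size, which cuts parameters and computation.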
  • Publication number: 20150073788
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for creating a static language model from a mixture of n-gram language models. One of the methods includes receiving a set of development sentences W, receiving a set of language models GM, determining a set of n-gram language model weights λM based on the development sentences W and the set of language models GM, determining a set of sentence cluster weights λC, each of the sentence cluster weights corresponding to a cluster in a set of sentence clusters, each cluster in the set of sentence clusters associated with at least one sentence from the set of development sentences W, and generating a language model from the set of language models GM, the set of n-gram language model weights λM, the set of sentence clusters, and the set of sentence cluster weights λC.
    Type: Application
    Filed: September 6, 2013
    Publication date: March 12, 2015
    Applicant: Google Inc.
    Inventors: Hasim Sak, Cyril Georges Luc Allauzen
  • Publication number: 20140278407
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for language modeling of complete language sequences. Training data indicating language sequences is accessed, and counts for a number of times each language sequence occurs in the training data are determined. A proper subset of the language sequences is selected, and a first component of a language model is trained. The first component includes first probability data for assigning scores to the selected language sequences. A second component of the language model is trained based on the training data, where the second component includes second probability data for assigning scores to language sequences that are not included in the selected language sequences. Adjustment data that normalizes the second probability data with respect to the first probability data is generated, and the first component, the second component, and the adjustment data are stored.
    Type: Application
    Filed: May 2, 2013
    Publication date: September 18, 2014
    Applicant: Google Inc.
    Inventors: Ciprian I. Chelba, Hasim Sak, Johan Schalkwyk
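The two components and the adjustment data can be sketched as exact relative frequencies for the selected sequences plus renormalized backoff mass for everything else; the structure below is an illustrative reading of the abstract, not the patented model:

```python
class TwoComponentLM:
    def __init__(self, selected_counts, total_count, backoff):
        # First component: exact probabilities for frequent sequences.
        self.exact = {s: c / total_count for s, c in selected_counts.items()}
        # Adjustment data: mass left over for the second component.
        self.leftover = 1.0 - sum(self.exact.values())
        self.backoff = backoff  # second component (e.g. an n-gram model)

    def prob(self, sequence):
        if sequence in self.exact:
            return self.exact[sequence]
        return self.leftover * self.backoff(sequence)

lm = TwoComponentLM({"hello world": 6, "good morning": 2}, 10,
                    backoff=lambda s: 0.1)
```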
  • Publication number: 20140149119
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for transcribing utterances into written text are disclosed. The methods, systems, and apparatus include actions of obtaining a lexicon model that maps phones to spoken text and obtaining a language model that assigns probabilities to written text. The actions further include generating a transducer that maps the written text to the spoken text, the transducer mapping multiple items of the written text to an item of the spoken text. Additionally, the actions include constructing a decoding network for transcribing utterances into written text by composing the lexicon model, the inverse of the transducer, and the language model.
    Type: Application
    Filed: March 14, 2013
    Publication date: May 29, 2014
    Applicant: Google Inc.
    Inventors: Hasim Sak, Francoise Beaufays
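The composition of the lexicon model, the inverse of the transducer, and the language model can be pictured with plain set-valued relations instead of weighted finite-state transducers — a deliberate simplification of the decoding network:

```python
def invert(t):
    """Invert a relation given as {input: set(outputs)}."""
    inv = {}
    for x, ys in t.items():
        for y in ys:
            inv.setdefault(y, set()).add(x)
    return inv

def compose(a, b):
    """x -> z iff x -a-> y and y -b-> z for some y."""
    out = {}
    for x, ys in a.items():
        for y in ys:
            for z in b.get(y, set()):
                out.setdefault(x, set()).add(z)
    return out

lexicon = {"t uw": {"two"}}                        # phones -> spoken text
written_to_spoken = {"2": {"two"}, "two": {"two"}}  # written -> spoken
decoding = compose(lexicon, invert(written_to_spoken))
```

Composing with the inverse lets one spoken form ("two") recover every written form ("2", "two") that verbalizes to it, which is the point of the decoding network.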