Patents by Inventor Hasim Sak

Hasim Sak has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 9292489
    Abstract: An automatic speech recognition (ASR) system and method are provided for using sub-lexical language models together with word level pronunciation lexicons. These approaches operate by introducing a transduction between sequences of sub-lexical units and sequences of words.
    Type: Grant
    Filed: April 3, 2013
    Date of Patent: March 22, 2016
    Assignee: Google Inc.
    Inventors: Hasim Sak, Murat Saraclar
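The transduction between sub-lexical units and words can be pictured with a toy convention in which a leading "+" marks a unit that continues the previous word; the marker and helper below are illustrative assumptions, not the patented transduction:

```python
def units_to_words(units, cont="+"):
    """Join sub-lexical units into words; a leading `cont` marker
    means the unit attaches to the word before it."""
    words = []
    for u in units:
        if u.startswith(cont) and words:
            words[-1] += u[len(cont):]
        else:
            words.append(u)
    return words

# Toy Turkish-style example: "kitap" + "+lar" -> "kitaplar" ("books").
result = units_to_words(["kitap", "+lar", "ev"])
```

A real system would apply this mapping as a finite-state transduction composed with the recognizer's search network rather than as a post-processing pass.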
  • Publication number: 20160035344
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for identifying the language of a spoken utterance. One of the methods includes receiving a plurality of audio frames that collectively represent at least a portion of a spoken utterance; processing the plurality of audio frames using a long short-term memory (LSTM) neural network to generate a respective language score for each of a plurality of languages, wherein the respective language score for each of the plurality of languages represents a likelihood that the spoken utterance was spoken in the language; and classifying the spoken utterance as being spoken in one of the plurality of languages using the language scores.
    Type: Application
    Filed: August 4, 2015
    Publication date: February 4, 2016
    Inventors: Javier Gonzalez-Dominguez, Hasim Sak, Ignacio Lopez Moreno
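The frame-level scoring and utterance-level classification described above can be sketched as follows; the `frame_scorer` callback stands in for the LSTM and is an assumption of this sketch:

```python
import math

def classify_language(frames, frame_scorer, languages):
    """Softmax-normalize per-frame language scores, average them
    over the utterance, and return the highest-scoring language."""
    totals = {lang: 0.0 for lang in languages}
    for frame in frames:
        scores = frame_scorer(frame)
        z = sum(math.exp(s) for s in scores.values())
        for lang in languages:
            totals[lang] += math.exp(scores[lang]) / z
    return max(totals, key=totals.get)

# Toy scorer: positive frame values look like "en", negative like "tr".
lang = classify_language([1.0, 2.0, 0.5],
                         lambda f: {"en": f, "tr": -f},
                         ["en", "tr"])
```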
  • Publication number: 20150356075
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating representations of input sequences. One of the methods includes receiving a grapheme sequence, the grapheme sequence comprising a plurality of graphemes arranged according to an input order; processing the sequence of graphemes using a long short-term memory (LSTM) neural network to generate an initial phoneme sequence from the grapheme sequence, the initial phoneme sequence comprising a plurality of phonemes arranged according to an output order; and generating a phoneme representation of the grapheme sequence from the initial phoneme sequence generated by the LSTM neural network, wherein generating the phoneme representation comprises removing, from the initial phoneme sequence, phonemes in one or more positions in the output order.
    Type: Application
    Filed: June 2, 2015
    Publication date: December 10, 2015
    Inventors: Kanury Kanishka Rao, Fuchun Peng, Hasim Sak, Francoise Beaufays
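The final step — removing phonemes at certain positions of the initial output sequence — resembles CTC-style collapsing; under that assumption, a minimal sketch:

```python
def collapse(initial_phonemes, blank="<b>"):
    """Merge consecutive repeats and drop blank symbols, CTC-style."""
    out, prev = [], None
    for p in initial_phonemes:
        if p != blank and p != prev:
            out.append(p)
        prev = p
    return out

# An LSTM emitting one label per input position over-generates;
# collapsing recovers the phoneme representation, e.g. for "cat":
phonemes = collapse(["k", "k", "<b>", "ae", "ae", "<b>", "t"])
```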
  • Patent number: 9208779
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for creating a static language model from a mixture of n-gram language models. One of the methods includes receiving a set of development sentences W, receiving a set of language models GM, determining a set of n-gram language model weights λM based on the development sentences W and the set of language models GM, determining a set of sentence cluster weights λC, each of the sentence cluster weights corresponding to a cluster in a set of sentence clusters, each cluster in the set of sentence clusters associated with at least one sentence from the set of development sentences W, and generating a language model from the set of language models GM, the set of n-gram language model weights λM, the set of sentence clusters, and the set of sentence cluster weights λC.
    Type: Grant
    Filed: September 6, 2013
    Date of Patent: December 8, 2015
    Assignee: Google Inc.
    Inventors: Hasim Sak, Cyril Georges Luc Allauzen
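The mixture in this abstract can be pictured as linear interpolation, with the sentence-cluster weights collapsing per-cluster model weights into one static weight vector; this is a sketch of the combination step only, not the patented weight estimation:

```python
def combine_weights(per_cluster_model_weights, cluster_weights):
    """Collapse per-cluster model-weight vectors into one static vector."""
    n = len(next(iter(per_cluster_model_weights.values())))
    combined = [0.0] * n
    for cluster, mw in per_cluster_model_weights.items():
        for i, w in enumerate(mw):
            combined[i] += cluster_weights[cluster] * w
    return combined

def mixture_prob(models, weights, context, word):
    """P(word | context) under a weighted mixture of component LMs."""
    return sum(w * m(context, word) for m, w in zip(models, weights))

weights = combine_weights({"news": [0.8, 0.2], "chat": [0.2, 0.8]},
                          {"news": 0.5, "chat": 0.5})
p = mixture_prob([lambda c, w: 0.5, lambda c, w: 0.1],
                 weights, ("the",), "cat")
```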
  • Publication number: 20150340034
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for recognizing speech using neural networks. One of the methods includes receiving an audio input; processing the audio input using an acoustic model to generate a respective phoneme score for each of a plurality of phoneme labels; processing one or more of the phoneme scores using an inverse pronunciation model to generate a respective grapheme score for each of a plurality of grapheme labels; and processing one or more of the grapheme scores using a language model to generate a respective text label score for each of a plurality of text labels.
    Type: Application
    Filed: May 22, 2015
    Publication date: November 26, 2015
    Inventors: Johan Schalkwyk, Francoise Beaufays, Hasim Sak, John Giannandrea
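Treating each stage as a probabilistic mapping over score dictionaries, the acoustic-to-grapheme-to-text cascade can be sketched as below; the dict-based mappings are an illustrative simplification, not the patented models:

```python
def push(scores, mapping):
    """One cascade stage: out[y] = sum over x of scores[x] * mapping[x][y]."""
    out = {}
    for x, s in scores.items():
        for y, p in mapping.get(x, {}).items():
            out[y] = out.get(y, 0.0) + s * p
    return out

def recognize(phoneme_scores, inverse_pronunciation, language_model):
    grapheme_scores = push(phoneme_scores, inverse_pronunciation)
    return push(grapheme_scores, language_model)

# Toy example: the phoneme "f" may be spelled "ph" or "f".
text_scores = recognize({"f": 1.0},
                        {"f": {"ph": 0.5, "f": 0.5}},
                        {"ph": {"phone": 1.0}, "f": {"fone": 1.0}})
```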
  • Publication number: 20150170640
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating representations of acoustic sequences. One of the methods includes: receiving an acoustic sequence, the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; processing the acoustic feature representation at an initial time step using an acoustic modeling neural network; for each subsequent time step of the plurality of time steps: receiving an output generated by the acoustic modeling neural network for a preceding time step, generating a modified input from the output generated by the acoustic modeling neural network for the preceding time step and the acoustic representation for the time step, and processing the modified input using the acoustic modeling neural network to generate an output for the time step; and generating a phoneme representation for the utterance from the outputs for each of the time steps.
    Type: Application
    Filed: December 3, 2014
    Publication date: June 18, 2015
    Inventors: Hasim Sak, Andrew W. Senior
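The feedback loop in this abstract — the previous output re-entering the network alongside the next frame — can be sketched with plain lists standing in for feature vectors; the concatenation convention and stub model are assumptions:

```python
def run_with_feedback(frames, model, initial_output):
    """At each step, prepend the previous output to the current
    frame's features and feed the result back through the model."""
    output, outputs = initial_output, []
    for frame in frames:
        modified_input = output + frame   # previous output + features
        output = model(modified_input)
        outputs.append(output)
    return outputs

# Stub model: the "output" is just the sum of its inputs.
outs = run_with_feedback([[1.0], [2.0]], lambda v: [sum(v)], [0.0])
```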
  • Publication number: 20150161991
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating phoneme representations of acoustic sequences using projection sequences. One of the methods includes receiving an acoustic sequence, the acoustic sequence representing an utterance, and the acoustic sequence comprising a respective acoustic feature representation at each of a plurality of time steps; for each of the plurality of time steps, processing the acoustic feature representation through each of one or more long short-term memory (LSTM) layers; and for each of the plurality of time steps, processing the recurrent projected output generated by the highest LSTM layer for the time step using an output layer to generate a set of scores for the time step.
    Type: Application
    Filed: December 2, 2014
    Publication date: June 11, 2015
    Inventors: Hasim Sak, Andrew W. Senior
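The recurrent projection described here (often called LSTMP) feeds a low-dimensional linear projection of the cell output back as the recurrent state; a sketch with a stub cell, where the stub and matrix shapes are assumptions:

```python
def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def projected_step(lstm_cell, W_proj, x, r_prev, c_prev):
    """One LSTM-with-projection step: the state fed back is the
    projected output r, not the full cell output m."""
    m, c = lstm_cell(x, r_prev, c_prev)   # standard LSTM cell
    r = matvec(W_proj, m)                 # recurrent projection
    return r, c

# Stub cell: adds the recurrent state's sum to each input feature.
def stub_cell(x, r, c):
    return [xi + sum(r) for xi in x], c

r, c = projected_step(stub_cell, [[1.0, 1.0]], [1.0, 2.0], [0.5], [0.0])
```

The payoff is that the recurrent matrices scale with the projection size rather than the (larger) cell size, which cuts parameters and computation.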
  • Publication number: 20150073788
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for creating a static language model from a mixture of n-gram language models. One of the methods includes receiving a set of development sentences W, receiving a set of language models GM, determining a set of n-gram language model weights λM based on the development sentences W and the set of language models GM, determining a set of sentence cluster weights λC, each of the sentence cluster weights corresponding to a cluster in a set of sentence clusters, each cluster in the set of sentence clusters associated with at least one sentence from the set of development sentences W, and generating a language model from the set of language models GM, the set of n-gram language model weights λM, the set of sentence clusters, and the set of sentence cluster weights λC.
    Type: Application
    Filed: September 6, 2013
    Publication date: March 12, 2015
    Applicant: Google Inc.
    Inventors: Hasim Sak, Cyril Georges Luc Allauzen
  • Publication number: 20140278407
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for language modeling of complete language sequences. Training data indicating language sequences is accessed, and counts for a number of times each language sequence occurs in the training data are determined. A proper subset of the language sequences is selected, and a first component of a language model is trained. The first component includes first probability data for assigning scores to the selected language sequences. A second component of the language model is trained based on the training data, where the second component includes second probability data for assigning scores to language sequences that are not included in the selected language sequences. Adjustment data that normalizes the second probability data with respect to the first probability data is generated, and the first component, the second component, and the adjustment data are stored.
    Type: Application
    Filed: May 2, 2013
    Publication date: September 18, 2014
    Applicant: Google Inc.
    Inventors: Ciprian I. Chelba, Hasim Sak, Johan Schalkwyk
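The two components and the adjustment data can be sketched as exact relative frequencies for the selected sequences plus renormalized backoff mass for everything else; the structure below is an illustrative reading of the abstract, not the patented model:

```python
class TwoComponentLM:
    def __init__(self, selected_counts, total_count, backoff):
        # First component: exact probabilities for frequent sequences.
        self.exact = {s: c / total_count for s, c in selected_counts.items()}
        # Adjustment data: mass left over for the second component.
        self.leftover = 1.0 - sum(self.exact.values())
        self.backoff = backoff  # second component (e.g. an n-gram model)

    def prob(self, sequence):
        if sequence in self.exact:
            return self.exact[sequence]
        return self.leftover * self.backoff(sequence)

lm = TwoComponentLM({"hello world": 6, "good morning": 2}, 10,
                    backoff=lambda s: 0.1)
```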
  • Publication number: 20140149119
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for transcribing utterances into written text are disclosed. The methods, systems, and apparatus include actions of obtaining a lexicon model that maps phones to spoken text and obtaining a language model that assigns probabilities to written text. The actions further include generating a transducer that maps the written text to the spoken text, the transducer mapping multiple items of the written text to an item of the spoken text. Additionally, the actions include constructing a decoding network for transcribing utterances into written text by composing the lexicon model, the inverse of the transducer, and the language model.
    Type: Application
    Filed: March 14, 2013
    Publication date: May 29, 2014
    Applicant: Google Inc.
    Inventors: Hasim Sak, Francoise Beaufays
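The composition of the lexicon model, the inverse of the transducer, and the language model can be pictured with plain set-valued relations instead of weighted finite-state transducers — a deliberate simplification of the decoding network:

```python
def invert(t):
    """Invert a relation given as {input: set(outputs)}."""
    inv = {}
    for x, ys in t.items():
        for y in ys:
            inv.setdefault(y, set()).add(x)
    return inv

def compose(a, b):
    """x -> z iff x -a-> y and y -b-> z for some y."""
    out = {}
    for x, ys in a.items():
        for y in ys:
            for z in b.get(y, set()):
                out.setdefault(x, set()).add(z)
    return out

lexicon = {"t uw": {"two"}}                        # phones -> spoken text
written_to_spoken = {"2": {"two"}, "two": {"two"}}  # written -> spoken
decoding = compose(lexicon, invert(written_to_spoken))
```

Composing with the inverse lets one spoken form ("two") recover every written form ("2", "two") that verbalizes to it, which is the point of the decoding network.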