Patents by Inventor Ciprian I. Chelba

Ciprian I. Chelba has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 9786269
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for language modeling of complete language sequences. Training data indicating language sequences is accessed, and counts for a number of times each language sequence occurs in the training data are determined. A proper subset of the language sequences is selected, and a first component of a language model is trained. The first component includes first probability data for assigning scores to the selected language sequences. A second component of the language model is trained based on the training data, where the second component includes second probability data for assigning scores to language sequences that are not included in the selected language sequences. Adjustment data that normalizes the second probability data with respect to the first probability data is generated, and the first component, the second component, and the adjustment data are stored.
    Type: Grant
    Filed: May 2, 2013
    Date of Patent: October 10, 2017
    Assignee: Google Inc.
    Inventors: Ciprian I. Chelba, Hasim Sak, Johan Schalkwyk
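    The two-component model in this abstract can be illustrated with a minimal sketch: the first component holds explicit probabilities for a selected proper subset of frequent sequences, the second covers everything else, and an adjustment constant renormalizes the second component against the first. The subset-selection rule (top-k by count) and the unigram-style second component are assumptions for illustration, not details from the patent.

    ```python
    from collections import Counter

    def train_two_component_lm(training_sequences, top_k=2):
        counts = Counter(training_sequences)
        total = sum(counts.values())

        # First component: explicit probabilities for the selected subset
        # (here assumed to be the top_k most frequent sequences).
        selected = dict(counts.most_common(top_k))
        first = {seq: c / total for seq, c in selected.items()}

        # Second component: a simple relative-frequency model over all data.
        second = {seq: c / total for seq, c in counts.items()}

        # Adjustment data: scale the second component so that, over sequences
        # NOT covered by the first component, it fills exactly the probability
        # mass the first component leaves unassigned.
        mass_first = sum(first.values())
        mass_rest = sum(p for s, p in second.items() if s not in first)
        adjustment = (1.0 - mass_first) / mass_rest if mass_rest else 0.0
        return first, second, adjustment

    def score(seq, first, second, adjustment):
        # Selected sequences are scored by the first component; all others
        # by the normalized second component.
        if seq in first:
            return first[seq]
        return adjustment * second.get(seq, 0.0)
    ```

    With this normalization the scores over all observed sequences again sum to one, which is the role the abstract assigns to the adjustment data.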
  • Patent number: 9612726
    Abstract: In one example, a method includes: receiving from a first user interface a first input from a first user specifying a first particular instant in a video other than a beginning of the video; in response to the first input, generating by one or more computer systems first data for inclusion in a link to the video, the first data representing the first particular instant in the video and being operable automatically to direct playback of the video at a second user interface to start at the first particular instant in the video in response to a second user selecting the link at the second user interface; and communicating the first data to a link generator for inclusion in the link to the video.
    Type: Grant
    Filed: December 28, 2013
    Date of Patent: April 4, 2017
    Assignee: Google Inc.
    Inventor: Ciprian I. Chelba
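    The mechanism described here, encoding a start instant other than the beginning of the video into a shareable link, can be sketched in a few lines. The query-parameter name `t` and the URL layout are assumptions for illustration only; the patent does not specify a wire format.

    ```python
    from urllib.parse import urlencode

    def link_with_start_instant(base_url, seconds):
        # Generate first data for inclusion in a link: a representation of
        # the chosen instant that directs playback at the receiving
        # interface to start there rather than at the beginning.
        if seconds <= 0:
            return base_url  # the beginning of the video needs no extra data
        return f"{base_url}?{urlencode({'t': int(seconds)})}"
    ```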
  • Patent number: 9405849
    Abstract: A method identifies pairs of first and second command inputs from respective user device sessions for which the first and second operation data are indicative of a first operation failure and a second operation success. The first operation data indicate a first operation performed on data from a first resource property in response to the first command input, and the second operation data indicate a second operation performed on data from a second resource property in response to the second command input. The system determines, from the identified pairs of first and second command inputs, command inputs for which a parsing rule that is associated with the second operation is to be generated.
    Type: Grant
    Filed: March 29, 2016
    Date of Patent: August 2, 2016
    Assignee: Google Inc.
    Inventors: Jakob D. Uszkoreit, John Blitzer, Engin Cinar Sahin, Rahul Gupta, Dekang Lin, Fernando Pereira, Ciprian I. Chelba
  • Patent number: 9336771
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for using non-parametric models in speech recognition. In some implementations, speech data is accessed. The speech data represents utterances of a particular phonetic unit occurring in a particular phonetic context, and the speech data includes values for multiple dimensions. Boundaries are determined for a set of quantiles for each of the multiple dimensions. Models for the distribution of values within the quantiles are generated. A multidimensional probability function is generated. Data indicating the boundaries of the quantiles, the models for the distribution of values in the quantiles, and the multidimensional probability function are stored.
    Type: Grant
    Filed: May 16, 2013
    Date of Patent: May 10, 2016
    Assignee: Google Inc.
    Inventor: Ciprian I. Chelba
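    The quantile step in this abstract, determining boundaries for a set of quantiles along each dimension of the speech data, can be sketched as follows. Equal-count quantiles are an assumption; the patent covers non-parametric models generally.

    ```python
    def quantile_boundaries(values, num_quantiles=4):
        # Interior boundaries that split one dimension of the speech data
        # into num_quantiles equal-count bins.
        ordered = sorted(values)
        n = len(ordered)
        return [ordered[i * n // num_quantiles] for i in range(1, num_quantiles)]

    def quantile_index(x, boundaries):
        # Locate the quantile whose within-quantile model should score x.
        for i, b in enumerate(boundaries):
            if x < b:
                return i
        return len(boundaries)
    ```

    A per-quantile distribution model and a multidimensional probability function over the per-dimension quantile assignments would then be built on top of these boundaries.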
  • Patent number: 9330195
    Abstract: A method identifies pairs of first and second command inputs from respective user device sessions for which the first and second operation data are indicative of a first operation failure and a second operation success. The first operation data indicate a first operation performed on data from a first resource property in response to the first command input, and the second operation data indicate a second operation performed on data from a second resource property in response to the second command input. The system determines, from the identified pairs of first and second command inputs, command inputs for which a parsing rule that is associated with the second operation is to be generated.
    Type: Grant
    Filed: June 25, 2013
    Date of Patent: May 3, 2016
    Assignee: Google Inc.
    Inventors: Jakob D. Uszkoreit, John Blitzer, Engin Cinar Sahin, Rahul Gupta, Dekang Lin, Fernando Pereira, Ciprian I. Chelba
  • Patent number: 9299339
    Abstract: A language processing system identifies sequential command inputs in user session data stored in logs. Each sequential command input is a first command input followed by a second command input. The system determines user actions in response to each command input. For the second command input, an action was taken at the user device in response to the command input, and there is no parsing rule associated with the action that parses to the first command input. If there is a sufficient co-occurrence of the first and second command inputs and the resulting action in the logs, then a parsing rule for the action may be augmented with a rule for the first command input.
    Type: Grant
    Filed: June 25, 2013
    Date of Patent: March 29, 2016
    Assignee: Google Inc.
    Inventors: Jakob D. Uszkoreit, Percy Liang, Daniel M. Bikel, Ciprian I. Chelba
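    The co-occurrence test in this abstract can be sketched by counting, across sessions, how often an unparsed first command is immediately followed by a second command that did trigger an action. The session data layout and the threshold are assumptions for illustration.

    ```python
    from collections import Counter

    def propose_rule_augmentations(sessions, min_count=2):
        # sessions: lists of (command_input, resulting_action or None) pairs,
        # in the order the commands were issued.
        cooccur = Counter()
        for commands in sessions:
            for (first_cmd, first_action), (_, second_action) in zip(commands, commands[1:]):
                # First command produced no action; the follow-up did.
                if first_action is None and second_action is not None:
                    cooccur[(first_cmd, second_action)] += 1
        # Sufficient co-occurrence suggests augmenting the action's parsing
        # rule with a rule covering the first command input.
        return {pair for pair, c in cooccur.items() if c >= min_count}
    ```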
  • Publication number: 20150371633
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for using non-parametric models in speech recognition. In some implementations, speech data is accessed. The speech data represents utterances of a particular phonetic unit occurring in a particular phonetic context, and the speech data includes values for multiple dimensions. Boundaries are determined for a set of quantiles for each of the multiple dimensions. Models for the distribution of values within the quantiles are generated. A multidimensional probability function is generated. Data indicating the boundaries of the quantiles, the models for the distribution of values in the quantiles, and the multidimensional probability function are stored.
    Type: Application
    Filed: May 16, 2013
    Publication date: December 24, 2015
    Inventor: Ciprian I. Chelba
  • Patent number: 8990692
    Abstract: In one example, a method includes: receiving from a first user interface a first input from a first user specifying a first particular instant in a video other than a beginning of the video; in response to the first input, generating by one or more computer systems first data for inclusion in a link to the video, the first data representing the first particular instant in the video and being operable automatically to direct playback of the video at a second user interface to start at the first particular instant in the video in response to a second user selecting the link at the second user interface; and communicating the first data to a link generator for inclusion in the link to the video.
    Type: Grant
    Filed: March 26, 2009
    Date of Patent: March 24, 2015
    Assignee: Google Inc.
    Inventor: Ciprian I. Chelba
  • Patent number: 8959014
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training acoustic models. Speech data and data identifying a transcription for the speech data are received. A phonetic representation for the transcription is accessed. Training sequences are identified for a particular phone in the phonetic representation. Each of the training sequences includes a different set of contextual phones surrounding the particular phone. A partitioning key is identified based on a sequence of phones that occurs in each of the training sequences. A processing module to which the identified partitioning key is assigned is selected. Data identifying the training sequences and a portion of the speech data are transmitted to the selected processing module.
    Type: Grant
    Filed: June 29, 2012
    Date of Patent: February 17, 2015
    Assignee: Google Inc.
    Inventors: Peng Xu, Fernando Pereira, Ciprian I. Chelba
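    The partitioning-key step in this abstract resembles sharding in a distributed training job: a sub-sequence of phones shared by all training sequences for the same phone determines which processing module receives them. The key length and hashing scheme below are assumptions for illustration.

    ```python
    def partitioning_key(training_sequence, central_index, key_length=2):
        # Derive the key from a sub-sequence of phones that occurs in every
        # training sequence for the particular phone (here: the central
        # phone plus its immediate right neighbor).
        return tuple(training_sequence[central_index:central_index + key_length])

    def select_module(key, num_modules):
        # All training sequences sharing a key are routed to one module,
        # so each module sees every sequence it needs for its phones.
        return hash(key) % num_modules
    ```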
  • Publication number: 20140278407
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for language modeling of complete language sequences. Training data indicating language sequences is accessed, and counts for a number of times each language sequence occurs in the training data are determined. A proper subset of the language sequences is selected, and a first component of a language model is trained. The first component includes first probability data for assigning scores to the selected language sequences. A second component of the language model is trained based on the training data, where the second component includes second probability data for assigning scores to language sequences that are not included in the selected language sequences. Adjustment data that normalizes the second probability data with respect to the first probability data is generated, and the first component, the second component, and the adjustment data are stored.
    Type: Application
    Filed: May 2, 2013
    Publication date: September 18, 2014
    Applicant: Google Inc.
    Inventors: Ciprian I. Chelba, Hasim Sak, Johan Schalkwyk
  • Publication number: 20140074470
    Abstract: Methods, systems, and apparatus, including computer programs encoded on computer storage media, for improved pronunciation. One of the methods includes receiving data that represents an audible pronunciation of the name of an individual from a user device. The method includes identifying one or more other users that are members of a social circle of which the individual is a member. The method includes identifying one or more devices associated with the other users. The method also includes providing information that identifies the individual and the data representing the audible pronunciation to the one or more identified devices.
    Type: Application
    Filed: July 23, 2013
    Publication date: March 13, 2014
    Applicant: Google Inc.
    Inventors: Martin Jansche, Mark Edward Epstein, Ciprian I. Chelba
  • Patent number: 8521523
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting training data. In one aspect, a method comprises: selecting a target out of vocabulary rate; selecting a target percentage of user sessions; and determining a minimum training data freshness for a vocabulary of words, the minimum training data freshness corresponding to the target percentage of user sessions experiencing the target out of vocabulary rate.
    Type: Grant
    Filed: August 24, 2012
    Date of Patent: August 27, 2013
    Assignee: Google Inc.
    Inventors: Maryam Garrett, Ciprian I. Chelba
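    The selection criterion in this abstract can be sketched as a search over candidate freshness levels. Here freshness is assumed to be measured as training-data age in days (smaller is fresher), with per-session OOV rates precomputed for each candidate vocabulary; both assumptions are for illustration only.

    ```python
    def minimum_freshness(session_oov_by_age, target_oov, target_fraction):
        # session_oov_by_age maps a training-data age (days) to the
        # per-session out-of-vocabulary rates observed with a vocabulary
        # built from data that old. Return the oldest (least demanding)
        # age at which at least target_fraction of user sessions still
        # experience an OOV rate at or below target_oov.
        feasible = []
        for age, session_rates in session_oov_by_age.items():
            ok = sum(1 for r in session_rates if r <= target_oov) / len(session_rates)
            if ok >= target_fraction:
                feasible.append(age)
        return max(feasible) if feasible else None
    ```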
  • Patent number: 8515746
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for selecting training data. In an aspect, a method comprises: selecting a target out of vocabulary rate; selecting a target percentage of user sessions; and determining a minimum training data collection duration for a vocabulary of words, the minimum training data collection duration corresponding to the target percentage of user sessions experiencing the target out of vocabulary rate.
    Type: Grant
    Filed: August 24, 2012
    Date of Patent: August 20, 2013
    Assignee: Google Inc.
    Inventors: Maryam Garrett, Ciprian I. Chelba
  • Patent number: 8515745
    Abstract: Methods, systems, and apparatus for selecting training data. In an aspect, a method comprises: obtaining search session data comprising search sessions that include search queries, wherein each search query comprises words; determining a threshold out of vocabulary rate indicating a rate at which a word in a search query is not included in a vocabulary; determining a threshold session out of vocabulary rate, the session out of vocabulary rate indicating a rate at which search sessions have an out of vocabulary rate that meets the threshold out of vocabulary rate; selecting a vocabulary of words that, for a set of test data, has a session out of vocabulary rate that meets the threshold session out of vocabulary rate, the vocabulary of words being selected from the one or more words included in each of the search queries included in the search sessions.
    Type: Grant
    Filed: August 24, 2012
    Date of Patent: August 20, 2013
    Assignee: Google Inc.
    Inventors: Maryam Garrett, Ciprian I. Chelba
  • Patent number: 8494850
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for recognizing speech using a variable length of context. Speech data and data identifying a candidate transcription for the speech data are received. A phonetic representation for the candidate transcription is accessed. Multiple test sequences are extracted for a particular phone in the phonetic representation. Each of the multiple test sequences includes a different set of contextual phones surrounding the particular phone. Data indicating that an acoustic model includes data corresponding to one or more of the multiple test sequences is received. From among the one or more test sequences, the test sequence that includes the highest number of contextual phones is selected. A score for the candidate transcription is generated based on the data from the acoustic model that corresponds to the selected test sequence.
    Type: Grant
    Filed: June 29, 2012
    Date of Patent: July 23, 2013
    Assignee: Google Inc.
    Inventors: Ciprian I. Chelba, Peng Xu, Fernando Pereira
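    The selection rule in this abstract, back off to the longest context the acoustic model actually has data for, can be sketched directly. Representing test sequences as phone tuples keyed into a dictionary is an assumption for illustration.

    ```python
    def select_test_sequence(test_sequences, acoustic_model):
        # Keep only the test sequences the model has data for, then pick
        # the one with the highest number of contextual phones.
        available = [seq for seq in test_sequences if seq in acoustic_model]
        return max(available, key=len) if available else None
    ```

    The candidate transcription would then be scored using the model data stored under the selected sequence.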
  • Patent number: 8453058
    Abstract: An audio shortcut may involve an audio command being used to represent a sequence of one or more inputs on a client device. When the client device receives the audio command, the client device may automatically perform the sequence of one or more inputs, as if this sequence were entered manually. If a threshold number of client devices share the same or a similar audio shortcut with a server device, the server device may make this audio shortcut available to additional client devices.
    Type: Grant
    Filed: February 20, 2012
    Date of Patent: May 28, 2013
    Assignee: Google Inc.
    Inventors: Noah B. Coccaro, Ciprian I. Chelba
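    The server-side sharing rule in this abstract, make a shortcut available once a threshold number of devices register the same one, can be sketched as a counter keyed on the (command, input sequence) pair. The class shape and exact-match comparison are assumptions; the abstract also allows "similar" shortcuts to count together.

    ```python
    from collections import Counter

    class ShortcutServer:
        def __init__(self, threshold=3):
            self.threshold = threshold
            self.counts = Counter()

        def register(self, command, input_sequence):
            # A client device reports an audio command standing in for a
            # sequence of one or more manual inputs.
            self.counts[(command, tuple(input_sequence))] += 1

        def available_shortcuts(self):
            # Shortcuts shared by at least `threshold` devices are offered
            # to additional client devices.
            return {cmd: list(seq) for (cmd, seq), n in self.counts.items()
                    if n >= self.threshold}
    ```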
  • Publication number: 20130006612
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training acoustic models. Speech data and data identifying a transcription for the speech data are received. A phonetic representation for the transcription is accessed. Training sequences are identified for a particular phone in the phonetic representation. Each of the training sequences includes a different set of contextual phones surrounding the particular phone. A partitioning key is identified based on a sequence of phones that occurs in each of the training sequences. A processing module to which the identified partitioning key is assigned is selected. Data identifying the training sequences and a portion of the speech data are transmitted to the selected processing module.
    Type: Application
    Filed: June 29, 2012
    Publication date: January 3, 2013
    Applicant: GOOGLE INC.
    Inventors: Peng Xu, Fernando Pereira, Ciprian I. Chelba
  • Publication number: 20130006623
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for recognizing speech using a variable length of context. Speech data and data identifying a candidate transcription for the speech data are received. A phonetic representation for the candidate transcription is accessed. Multiple test sequences are extracted for a particular phone in the phonetic representation. Each of the multiple test sequences includes a different set of contextual phones surrounding the particular phone. Data indicating that an acoustic model includes data corresponding to one or more of the multiple test sequences is received. From among the one or more test sequences, the test sequence that includes the highest number of contextual phones is selected. A score for the candidate transcription is generated based on the data from the acoustic model that corresponds to the selected test sequence.
    Type: Application
    Filed: June 29, 2012
    Publication date: January 3, 2013
    Applicant: GOOGLE INC.
    Inventors: Ciprian I. Chelba, Peng Xu, Fernando Pereira
  • Patent number: 7860314
    Abstract: A method and apparatus are provided for adapting an exponential probability model. In a first stage, a general-purpose background model is built from background data by determining a set of model parameters for the probability model based on a set of background data. The background model parameters are then used to define a prior model for the parameters of an adapted probability model that is adapted and more specific to an adaptation data set of interest. The adaptation data set is generally of much smaller size than the background data set. A second set of model parameters are then determined for the adapted probability model based on the set of adaptation data and the prior model.
    Type: Grant
    Filed: October 29, 2004
    Date of Patent: December 28, 2010
    Assignee: Microsoft Corporation
    Inventors: Ciprian I. Chelba, Alejandro Acero
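    The two-stage adaptation in this abstract is the classic MAP setup: the background parameters define a prior for the adapted model's parameters. A minimal sketch of the adapted objective, assuming a Gaussian prior centered at the background parameters (the prior family and `sigma2` knob are illustrative choices, not claimed details):

    ```python
    def adapted_objective(lam, lam_background, log_likelihood, sigma2=1.0):
        # Adaptation-data log-likelihood plus a Gaussian prior centered at
        # the background parameters: with little adaptation data, the
        # penalty keeps the adapted model close to the background model.
        prior_penalty = sum((l - l0) ** 2
                            for l, l0 in zip(lam, lam_background)) / (2.0 * sigma2)
        return log_likelihood(lam) - prior_penalty
    ```

    Maximizing this objective over `lam` yields the second set of model parameters; as `sigma2` grows, the prior weakens and the adapted model tracks the adaptation data more closely.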
  • Patent number: 7831428
    Abstract: A speech segment is indexed by identifying at least two alternative word sequences for the speech segment. For each word in the alternative sequences, information is placed in an entry for the word in the index. Speech units are eliminated from entries in the index based on a comparison of a probability that the word appears in the speech segment and a threshold value.
    Type: Grant
    Filed: November 9, 2005
    Date of Patent: November 9, 2010
    Assignee: Microsoft Corporation
    Inventors: Ciprian I. Chelba, Alejandro Acero, Jorge F. Silva Sanchez
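    The indexing-and-pruning scheme in this abstract can be sketched as follows: every word hypothesized by any of the alternative word sequences gets an index entry, and entries whose probability of actually appearing in the segment falls below a threshold are eliminated. The flat (word, probability) input format and max-combination of duplicate hypotheses are assumptions for illustration.

    ```python
    def build_index(segment_id, word_hypotheses, threshold=0.1):
        # word_hypotheses: (word, probability) pairs drawn from the
        # alternative word sequences recognized for this speech segment.
        best = {}
        for word, prob in word_hypotheses:
            best[word] = max(prob, best.get(word, 0.0))
        # Prune entries by comparing each word's probability of appearing
        # in the segment against the threshold.
        return {w: (segment_id, p) for w, p in best.items() if p >= threshold}
    ```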