Patents by Inventor Fadi Biadsy

Fadi Biadsy has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11335324
    Abstract: A method for training a speech conversion model personalized for a target speaker with atypical speech includes obtaining a plurality of transcriptions in a set of spoken training utterances and obtaining a plurality of unspoken training text utterances. Each spoken training utterance is spoken by a target speaker associated with atypical speech and includes a corresponding transcription paired with a corresponding non-synthetic speech representation. The method also includes adapting, using the set of spoken training utterances, a text-to-speech (TTS) model to synthesize speech in a voice of the target speaker and that captures the atypical speech. For each unspoken training text utterance, the method also includes generating, as output from the adapted TTS model, a synthetic speech representation that includes the voice of the target speaker and that captures the atypical speech. The method also includes training the speech conversion model based on the synthetic speech representations.
    Type: Grant
    Filed: August 31, 2020
    Date of Patent: May 17, 2022
    Assignee: Google LLC
    Inventors: Fadi Biadsy, Liyang Jiang, Pedro J. Moreno Mengibar, Andrew Rosenberg
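The training pipeline this abstract describes (adapt a TTS model to the target speaker, synthesize speech for unspoken text, train on the synthetic examples) can be sketched as follows. This is an illustrative outline only; all class and method names are invented stand-ins, not the patent's implementation.

```python
class TTSModel:
    """Stand-in for a text-to-speech model that can be personalized."""

    def __init__(self):
        self.adapted = False

    def adapt(self, spoken_utterances):
        # Fine-tune on (transcription, non-synthetic speech) pairs so the
        # model captures the target speaker's voice and atypical speech.
        self.adapted = True

    def synthesize(self, text):
        # Return a synthetic speech representation in the adapted voice.
        assert self.adapted, "adapt the TTS model before synthesizing"
        return f"synthetic-speech({text})"


def build_training_set(spoken_utterances, unspoken_texts):
    tts = TTSModel()
    tts.adapt(spoken_utterances)  # step 1: personalize the TTS model
    # step 2: synthesize speech for each unspoken text utterance; the
    # speech conversion model would then be trained on these pairs.
    return [(text, tts.synthesize(text)) for text in unspoken_texts]
```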
  • Publication number: 20220122579
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for end to end speech conversion are disclosed. In one aspect, a method includes the actions of receiving first audio data of a first utterance of one or more first terms spoken by a user. The actions further include providing the first audio data as an input to a model that is configured to receive first given audio data in a first voice and output second given audio data in a synthesized voice without performing speech recognition on the first given audio data. The actions further include receiving second audio data of a second utterance of the one or more first terms spoken in the synthesized voice. The actions further include providing, for output, the second audio data of the second utterance of the one or more first terms spoken in the synthesized voice.
    Type: Application
    Filed: November 26, 2019
    Publication date: April 21, 2022
    Applicant: Google LLC
    Inventors: Fadi Biadsy, Ron J. Weiss, Aleksandar Kracun, Pedro J. Moreno Mengibar
  • Publication number: 20220068257
    Abstract: A method for training a speech conversion model personalized for a target speaker with atypical speech includes obtaining a plurality of transcriptions in a set of spoken training utterances and obtaining a plurality of unspoken training text utterances. Each spoken training utterance is spoken by a target speaker associated with atypical speech and includes a corresponding transcription paired with a corresponding non-synthetic speech representation. The method also includes adapting, using the set of spoken training utterances, a text-to-speech (TTS) model to synthesize speech in a voice of the target speaker and that captures the atypical speech. For each unspoken training text utterance, the method also includes generating, as output from the adapted TTS model, a synthetic speech representation that includes the voice of the target speaker and that captures the atypical speech. The method also includes training the speech conversion model based on the synthetic speech representations.
    Type: Application
    Filed: August 31, 2020
    Publication date: March 3, 2022
    Applicant: Google LLC
    Inventors: Fadi Biadsy, Liyang Jiang, Pedro J. Moreno Mengibar, Andrew Rosenberg
  • Patent number: 11164566
    Abstract: Methods and systems for automatic speech recognition and methods and systems for training acoustic language models are disclosed. In accordance with one automatic speech recognition method, an acoustic input data set is analyzed to identify portions of the input data set that conform to a general language and to identify portions of the input data set that conform to at least one dialect of the general language. In addition, a general language model and at least one dialect language model are applied to the input data set to perform speech recognition by dynamically selecting between the models in accordance with each of the identified portions. Further, speech recognition results obtained in accordance with the application of the models are output.
    Type: Grant
    Filed: May 7, 2018
    Date of Patent: November 2, 2021
    Assignee: International Business Machines Corporation
    Inventors: Fadi Biadsy, Lidia Mangu, Hagen Soltau
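The dynamic selection step in this abstract (tag each portion of the input as general language or dialect, then score it with the matching language model) can be sketched as below. The classifier and models are toy stand-ins; a real system would identify dialect from acoustic or phonotactic cues.

```python
def classify_portion(portion):
    # Stand-in dialect identifier: here, a naming convention marks
    # dialect portions. Real systems would classify from the audio.
    return "dialect" if portion.endswith("_dx") else "general"


def recognize(portions, models):
    # Dynamically select the general or dialect model per portion.
    results = []
    for portion in portions:
        model = models[classify_portion(portion)]
        results.append(model(portion))
    return results


models = {
    "general": lambda p: f"general-decode({p})",
    "dialect": lambda p: f"dialect-decode({p})",
}
```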
  • Publication number: 20210209315
    Abstract: The present disclosure provides systems and methods that train and use machine-learned models such as, for example, sequence-to-sequence models, to perform direct and text-free speech-to-speech translation. In particular, aspects of the present disclosure provide an attention-based sequence-to-sequence neural network which can directly translate speech from one language into speech in another language, without relying on an intermediate text representation.
    Type: Application
    Filed: March 7, 2020
    Publication date: July 8, 2021
    Inventors: Ye Jia, Zhifeng Chen, Yonghui Wu, Melvin Johnson, Fadi Biadsy, Ron Weiss, Wolfgang Macherey
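At the heart of the attention-based sequence-to-sequence model this abstract describes is an attention step: the decoder scores each encoder state of the source speech, softmaxes the scores, and forms a context vector, with no intermediate text representation anywhere. A minimal sketch with illustrative values:

```python
import math

def attend(decoder_state, encoder_states):
    # Dot-product scores between the decoder state and each encoder state.
    scores = [sum(d * e for d, e in zip(decoder_state, enc))
              for enc in encoder_states]
    # Numerically stable softmax over the scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [x / total for x in exps]
    # Context vector: attention-weighted sum of encoder states.
    context = [sum(w * enc[i] for w, enc in zip(weights, encoder_states))
               for i in range(len(encoder_states[0]))]
    return context, weights
```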
  • Publication number: 20210020170
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for language models using domain-specific model components. In some implementations, context data for an utterance is obtained. A domain-specific model component is selected from among multiple domain-specific model components of a language model based on the non-linguistic context of the utterance. A score for a candidate transcription for the utterance is generated using the selected domain-specific model component and a baseline model component of the language model that is domain-independent. A transcription for the utterance is determined using the score, and the transcription is provided as output of an automated speech recognition system.
    Type: Application
    Filed: October 1, 2020
    Publication date: January 21, 2021
    Applicant: Google LLC
    Inventors: Fadi Biadsy, Diamantino Antonio Caseiro
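The scoring this abstract describes, combining a domain-independent baseline component with a domain-specific component chosen from non-linguistic context, might look like the sketch below. The log-probability tables are toy values, and the log-linear interpolation is one common combination rule, not a quote from the patent.

```python
import math

# Toy baseline (domain-independent) log probabilities.
baseline = {"play music": math.log(0.02), "pay music": math.log(0.01)}

# Toy domain-specific model components, keyed by non-linguistic context.
domain_components = {
    "media_app": {"play music": math.log(0.3)},
    "banking_app": {"pay music": math.log(0.05)},
}

def score(candidate, context_domain, weight=0.5):
    component = domain_components[context_domain]
    base = baseline.get(candidate, math.log(1e-6))
    dom = component.get(candidate, math.log(1e-6))
    # Interpolate the baseline and the selected domain component.
    return (1 - weight) * base + weight * dom
```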
  • Patent number: 10832664
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for language models using domain-specific model components. In some implementations, context data for an utterance is obtained. A domain-specific model component is selected from among multiple domain-specific model components of a language model based on the non-linguistic context of the utterance. A score for a candidate transcription for the utterance is generated using the selected domain-specific model component and a baseline model component of the language model that is domain-independent. A transcription for the utterance is determined using the score, and the transcription is provided as output of an automated speech recognition system.
    Type: Grant
    Filed: August 21, 2017
    Date of Patent: November 10, 2020
    Assignee: Google LLC
    Inventors: Fadi Biadsy, Diamantino Antonio Caseiro
  • Publication number: 20190244610
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating expressions associated with voice commands. The methods, systems, and apparatus include actions of obtaining segments of one or more expressions associated with a voice command. Further actions include combining the segments into a candidate expression and scoring the candidate expression using a text corpus. Additional actions include selecting the candidate expression as an expression associated with the voice command based on the scoring of the candidate expression.
    Type: Application
    Filed: January 25, 2019
    Publication date: August 8, 2019
    Inventors: Fadi Biadsy, Pedro J. Moreno Mengibar
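The flow in this abstract (combine expression segments into candidate expressions, score each against a text corpus, select the best) can be sketched as follows. The segment lists and corpus counts are invented, and corpus frequency stands in for whatever scoring the patent covers.

```python
from itertools import product

# Toy segments of expressions associated with a voice command.
segments = [["set", "create"], ["an alarm", "a timer"], ["for 7 am"]]

# Toy corpus statistics used to score candidates.
corpus_counts = {"set an alarm for 7 am": 120, "create a timer for 7 am": 40}

def best_expression(segments, corpus_counts):
    # Combine one choice from each segment slot into a candidate.
    candidates = [" ".join(parts) for parts in product(*segments)]
    # Score by corpus frequency; unseen candidates score 0.
    return max(candidates, key=lambda c: corpus_counts.get(c, 0))
```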
  • Publication number: 20190156820
    Abstract: Methods and systems for automatic speech recognition and methods and systems for training acoustic language models are disclosed. In accordance with one automatic speech recognition method, an acoustic input data set is analyzed to identify portions of the input data set that conform to a general language and to identify portions of the input data set that conform to at least one dialect of the general language. In addition, a general language model and at least one dialect language model are applied to the input data set to perform speech recognition by dynamically selecting between the models in accordance with each of the identified portions. Further, speech recognition results obtained in accordance with the application of the models are output.
    Type: Application
    Filed: May 7, 2018
    Publication date: May 23, 2019
    Inventors: Fadi Biadsy, Lidia Mangu, Hagen Soltau
  • Patent number: 10134394
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, relating to generating log-linear models. In some implementations, n-gram parameter values derived from an n-gram language model are obtained. N-gram features for a log-linear language model are determined based on the n-grams corresponding to the obtained n-gram parameter values. A weight for each of the determined n-gram features is determined, where the weight is determined based on (i) an n-gram parameter value that is derived from the n-gram language model and that corresponds to a particular n-gram, and (ii) an n-gram parameter value that is derived from the n-gram language model and that corresponds to an n-gram that is a sub-sequence within the particular n-gram. A log-linear language model having the determined n-gram features is generated, where the determined n-gram features in the log-linear language model have weights that are initialized based on the determined weights.
    Type: Grant
    Filed: May 11, 2015
    Date of Patent: November 20, 2018
    Assignee: Google LLC
    Inventors: Diamantino Antonio Caseiro, Fadi Biadsy
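The weight initialization this abstract outlines sets each n-gram feature's weight from the n-gram's own parameter value and that of a sub-sequence (lower-order) n-gram. The difference-of-log-probabilities form below is one natural reading, shown with toy values; it is not quoted from the patent.

```python
import math

# Toy n-gram log probabilities derived from an n-gram language model.
ngram_logprob = {
    ("the", "cat"): math.log(0.2),   # bigram p(cat | the)
    ("cat",): math.log(0.05),        # unigram p(cat)
}

def init_weight(ngram):
    # Weight = log p(full n-gram) - log p(sub-sequence n-gram), so the
    # log-linear model initially reproduces the n-gram model's scores.
    sub = ngram[1:]  # drop the oldest history word
    return ngram_logprob[ngram] - ngram_logprob[sub]
```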
  • Patent number: 9966064
    Abstract: Methods and systems for automatic speech recognition and methods and systems for training acoustic language models are disclosed. In accordance with one automatic speech recognition method, an acoustic input data set is analyzed to identify portions of the input data set that conform to a general language and to identify portions of the input data set that conform to at least one dialect of the general language. In addition, a general language model and at least one dialect language model is applied to the input data set to perform speech recognition by dynamically selecting between the models in accordance with each of the identified portions. Further, speech recognition results obtained in accordance with the application of the models is output.
    Type: Grant
    Filed: July 18, 2012
    Date of Patent: May 8, 2018
    Assignee: International Business Machines Corporation
    Inventors: Fadi Biadsy, Lidia Mangu, Hagen Soltau
  • Publication number: 20180053502
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for language models using domain-specific model components. In some implementations, context data for an utterance is obtained. A domain-specific model component is selected from among multiple domain-specific model components of a language model based on the non-linguistic context of the utterance. A score for a candidate transcription for the utterance is generated using the selected domain-specific model component and a baseline model component of the language model that is domain-independent. A transcription for the utterance is determined using the score, and the transcription is provided as output of an automated speech recognition system.
    Type: Application
    Filed: August 21, 2017
    Publication date: February 22, 2018
    Inventors: Fadi Biadsy, Diamantino Antonio Caseiro
  • Patent number: 9842592
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for language models using non-linguistic context. In some implementations, context data indicating non-linguistic context for the utterance is received. Based on the context data, feature scores for one or more non-linguistic features are generated. The feature scores for the non-linguistic features are provided to a language model trained to process scores for non-linguistic features. The output from the language model is received, and a transcription for the utterance is determined using the output of the language model.
    Type: Grant
    Filed: February 12, 2014
    Date of Patent: December 12, 2017
    Assignee: Google Inc.
    Inventors: Fadi Biadsy, Pedro J. Moreno Mengibar
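Turning non-linguistic context into feature scores for a language model, as this abstract describes, might look like the sketch below. The feature set (app, time of day) and the additive combination are illustrative assumptions.

```python
def context_features(context):
    # Map non-linguistic context signals to numeric feature scores.
    return {
        "is_maps_app": 1.0 if context.get("app") == "maps" else 0.0,
        "is_morning": 1.0 if context.get("hour", 12) < 12 else 0.0,
    }

def score_transcription(base_lm_score, context, weights):
    # Adjust the linguistic LM score by the weighted non-linguistic
    # feature scores; a trained model would learn these weights.
    feats = context_features(context)
    return base_lm_score + sum(weights[name] * value
                               for name, value in feats.items())
```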
  • Patent number: 9805713
    Abstract: Systems and methods for addressing missing features in models are provided. In some implementations, a model configured to indicate likelihoods of different outcomes is accessed. The model includes a respective score for each of a plurality of features, and each feature corresponds to an outcome in an associated context. It is determined that the model does not include a score for a feature corresponding to a potential outcome in a particular context. A score is determined for the potential outcome in the particular context based on the scores for one or more features in the model that correspond to different outcomes in the particular context. The model and the score are used to determine a likelihood of occurrence of the potential outcome.
    Type: Grant
    Filed: April 8, 2015
    Date of Patent: October 31, 2017
    Assignee: Google Inc.
    Inventors: Fadi Biadsy, Diamantino Antonio Caseiro
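The backoff this abstract describes, deriving a score for a missing (outcome, context) feature from the scores of features sharing that context, can be sketched as below. Averaging is one plausible rule; the patent covers the general idea, not this exact formula.

```python
# Toy model: scores keyed by (outcome, context) features.
model_scores = {
    ("sunny", "july"): 2.0,
    ("rain", "july"): -1.0,
}

def score_for(outcome, context):
    if (outcome, context) in model_scores:
        return model_scores[(outcome, context)]
    # Missing feature: back off to the scores of features that
    # correspond to different outcomes in the same context.
    same_context = [s for (o, c), s in model_scores.items() if c == context]
    return sum(same_context) / len(same_context)
```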
  • Publication number: 20160275946
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, relating to generating log-linear models. In some implementations, n-gram parameter values derived from an n-gram language model are obtained. N-gram features for a log-linear language model are determined based on the n-grams corresponding to the obtained n-gram parameter values. A weight for each of the determined n-gram features is determined, where the weight is determined based on (i) an n-gram parameter value that is derived from the n-gram language model and that corresponds to a particular n-gram, and (ii) an n-gram parameter value that is derived from the n-gram language model and that corresponds to an n-gram that is a sub-sequence within the particular n-gram. A log-linear language model having the determined n-gram features is generated, where the determined n-gram features in the log-linear language model have weights that are initialized based on the determined weights.
    Type: Application
    Filed: May 11, 2015
    Publication date: September 22, 2016
    Inventors: Diamantino Antonio Caseiro, Fadi Biadsy
  • Publication number: 20160267904
    Abstract: Systems and methods for addressing missing features in models are provided. In some implementations, a model configured to indicate likelihoods of different outcomes is accessed. The model includes a respective score for each of a plurality of features, and each feature corresponds to an outcome in an associated context. It is determined that the model does not include a score for a feature corresponding to a potential outcome in a particular context. A score is determined for the potential outcome in the particular context based on the scores for one or more features in the model that correspond to different outcomes in the particular context. The model and the score are used to determine a likelihood of occurrence of the potential outcome.
    Type: Application
    Filed: April 8, 2015
    Publication date: September 15, 2016
    Inventors: Fadi Biadsy, Diamantino Antonio Caseiro
  • Patent number: 9412365
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, relating to enhanced maximum entropy models. In some implementations, data indicating a candidate transcription for an utterance and a particular context for the utterance are received. A maximum entropy language model is obtained. Feature values are determined for n-gram features and backoff features of the maximum entropy language model. The feature values are input to the maximum entropy language model, and an output is received from the maximum entropy language model. A transcription for the utterance is selected from among a plurality of candidate transcriptions based on the output from the maximum entropy language model. The selected transcription is provided to a client device.
    Type: Grant
    Filed: March 24, 2015
    Date of Patent: August 9, 2016
    Assignee: Google Inc.
    Inventors: Fadi Biadsy, Brian E. Roark
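A sketch of extracting the two feature types this abstract names: an n-gram feature when the model knows the n-gram, and a backoff feature when the full n-gram is unseen. The vocabulary of known n-grams is toy data, and the feature encoding is an assumption for illustration.

```python
# Toy set of n-grams known to the maximum entropy model.
known_ngrams = {("new", "york"), ("york", "city")}

def features(tokens, order=2):
    feats = []
    for i in range(len(tokens) - order + 1):
        ngram = tuple(tokens[i:i + order])
        if ngram in known_ngrams:
            feats.append(("ngram", ngram))
        else:
            # Backoff feature: records the history whose continuation
            # was not a known n-gram.
            feats.append(("backoff", ngram[:-1]))
    return feats
```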
  • Patent number: 9324323
    Abstract: Speech recognition techniques may include: receiving audio; identifying one or more topics associated with audio; identifying language models in a topic space that correspond to the one or more topics, where the language models are identified based on proximity of a representation of the audio to representations of other audio in the topic space; using the language models to generate recognition candidates for the audio, where the recognition candidates have scores associated therewith that are indicative of a likelihood of a recognition candidate matching the audio; and selecting a recognition candidate for the audio based on the scores.
    Type: Grant
    Filed: December 14, 2012
    Date of Patent: April 26, 2016
    Assignee: Google Inc.
    Inventors: Daniel M. Bikel, Kapil R. Thadani, Fernando Pereira, Maria Shugrina, Fadi Biadsy
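Selecting language models by proximity in a topic space, as this abstract describes, can be sketched by representing the audio and each model as topic vectors and ranking models by cosine similarity. The vectors below are invented for illustration.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy topic-space representations of available language models.
model_topics = {
    "sports_lm": [0.9, 0.1],
    "finance_lm": [0.1, 0.9],
}

def nearest_models(audio_vector, k=1):
    # Rank models by proximity of their topic vectors to the audio's.
    ranked = sorted(model_topics,
                    key=lambda m: cosine(audio_vector, model_topics[m]),
                    reverse=True)
    return ranked[:k]
```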
  • Patent number: 9318128
    Abstract: Methods and systems for facilitating development of voice-enabled applications are provided. The method may comprise receiving, at a computing device, a plurality of actions associated with a given application, parameters associated with each respective action, and example instructions responsive to respective actions. The method may also comprise determining candidate instructions based on the actions, parameters, and example instructions. Each candidate instruction may comprise one or more grammars recognizable by a voice interface for the given application. The method may further comprise the computing device receiving respective acceptance information for each candidate instruction, and comparing at least a portion of the respective acceptance information with a stored acceptance information log comprising predetermined acceptance information so as to determine a correlation.
    Type: Grant
    Filed: February 1, 2013
    Date of Patent: April 19, 2016
    Assignee: Google Inc.
    Inventors: Mark Edward Epstein, Pedro J. Moreno Mengibar, Fadi Biadsy
  • Patent number: 9293136
    Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving audio data that corresponds to an utterance, obtaining a first transcription of the utterance that was generated using a limited speech recognizer. The limited speech recognizer includes a speech recognizer that includes a language model that is trained over a limited speech recognition vocabulary that includes one or more terms from a voice command grammar, but that includes fewer than all terms of an expanded grammar. A second transcription of the utterance is obtained that was generated using an expanded speech recognizer. The expanded speech recognizer includes a speech recognizer that includes a language model that is trained over an expanded speech recognition vocabulary that includes all of the terms of the expanded grammar. The utterance is classified based at least on a portion of the first transcription or the second transcription.
    Type: Grant
    Filed: June 1, 2015
    Date of Patent: March 22, 2016
    Assignee: Google Inc.
    Inventors: Petar Aleksic, Pedro J. Moreno Mengibar, Fadi Biadsy