Patents by Inventor Fadi Biadsy
Fadi Biadsy has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11335324Abstract: A method for training a speech conversion model personalized for a target speaker with atypical speech includes obtaining a plurality of transcriptions in a set of spoken training utterances and obtaining a plurality of unspoken training text utterances. Each spoken training utterance is spoken by a target speaker associated with atypical speech and includes a corresponding transcription paired with a corresponding non-synthetic speech representation. The method also includes adapting, using the set of spoken training utterances, a text-to-speech (TTS) model to synthesize speech in a voice of the target speaker and that captures the atypical speech. For each unspoken training text utterance, the method also includes generating, as output from the adapted TTS model, a synthetic speech representation that includes the voice of the target speaker and that captures the atypical speech. The method also includes training the speech conversion model based on the synthetic speech representations.Type: GrantFiled: August 31, 2020Date of Patent: May 17, 2022Assignee: Google LLCInventors: Fadi Biadsy, Liyang Jiang, Pedro J. Moreno Mengibar, Andrew Rosenberg
-
Publication number: 20220122579Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for end to end speech conversion are disclosed. In one aspect, a method includes the actions of receiving first audio data of a first utterance of one or more first terms spoken by a user. The actions further include providing the first audio data as an input to a model that is configured to receive first given audio data in a first voice and output second given audio data in a synthesized voice without performing speech recognition on the first given audio data. The actions further include receiving second audio data of a second utterance of the one or more first terms spoken in the synthesized voice. The actions further include providing, for output, the second audio data of the second utterance of the one or more first terms spoken in the synthesized voice.Type: ApplicationFiled: November 26, 2019Publication date: April 21, 2022Applicant: Google LLCInventors: Fadi Biadsy, Ron J. Weiss, Aleksandar Kracun, Pedro J. Moreno Mengibar
-
Publication number: 20220068257Abstract: A method for training a speech conversion model personalized for a target speaker with atypical speech includes obtaining a plurality of transcriptions in a set of spoken training utterances and obtaining a plurality of unspoken training text utterances. Each spoken training utterance is spoken by a target speaker associated with atypical speech and includes a corresponding transcription paired with a corresponding non-synthetic speech representation. The method also includes adapting, using the set of spoken training utterances, a text-to-speech (TTS) model to synthesize speech in a voice of the target speaker and that captures the atypical speech. For each unspoken training text utterance, the method also includes generating, as output from the adapted TTS model, a synthetic speech representation that includes the voice of the target speaker and that captures the atypical speech. The method also includes training the speech conversion model based on the synthetic speech representations.Type: ApplicationFiled: August 31, 2020Publication date: March 3, 2022Applicant: Google LLCInventors: Fadi Biadsy, Liyang Jiang, Pedro J. Moreno Mengibar, Andrew Rosenberg
-
Patent number: 11164566Abstract: Methods and systems for automatic speech recognition and methods and systems for training acoustic language models are disclosed. In accordance with one automatic speech recognition method, an acoustic input data set is analyzed to identify portions of the input data set that conform to a general language and to identify portions of the input data set that conform to at least one dialect of the general language. In addition, a general language model and at least one dialect language model is applied to the input data set to perform speech recognition by dynamically selecting between the models in accordance with each of the identified portions. Further, speech recognition results obtained in accordance with the application of the models is output.Type: GrantFiled: May 7, 2018Date of Patent: November 2, 2021Assignee: International Business Machines CorporationInventors: Fadi Biadsy, Lidia Mangu, Hagen Soltau
-
Publication number: 20210209315Abstract: The present disclosure provides systems and methods that train and use machine-learned models such as, for example, sequence-to-sequence models, to perform direct and text-free speech-to-speech translation. In particular, aspects of the present disclosure provide an attention-based sequence-to-sequence neural network which can directly translate speech from one language into speech in another language, without relying on an intermediate text representation.Type: ApplicationFiled: March 7, 2020Publication date: July 8, 2021Inventors: Ye Jia, Zhifeng Chen, Yonghui Wu, Melvin Johnson, Fadi Biadsy, Ron Weiss, Wolfgang Macherey
-
Publication number: 20210020170Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for language models using domain-specific model components. In some implementations, context data for an utterance is obtained. A domain-specific model component is selected from among multiple domain-specific model components of a language model based on the non-linguistic context of the utterance. A score for a candidate transcription for the utterance is generated using the selected domain-specific model component and a baseline model component of the language model that is domain-independent. A transcription for the utterance is determined using the score the transcription is provided as output of an automated speech recognition system.Type: ApplicationFiled: October 1, 2020Publication date: January 21, 2021Applicant: Google LLCInventors: Fadi Biadsy, Diamantino Antionio Caseiro
-
Patent number: 10832664Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for language models using domain-specific model components. In some implementations, context data for an utterance is obtained. A domain-specific model component is selected from among multiple domain-specific model components of a language model based on the non-linguistic context of the utterance. A score for a candidate transcription for the utterance is generated using the selected domain-specific model component and a baseline model component of the language model that is domain-independent. A transcription for the utterance is determined using the score the transcription is provided as output of an automated speech recognition system.Type: GrantFiled: August 21, 2017Date of Patent: November 10, 2020Assignee: Google LLCInventors: Fadi Biadsy, Diamantino Antionio Caseiro
-
Publication number: 20190244610Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating expressions associated with voice commands. The methods, systems, and apparatus include actions of obtaining segments of one or more expressions associated with a voice command. Further actions include combining the segments into a candidate expression and scoring the candidate expression using a text corpus. Additional actions include selecting the candidate expression as an expression associated with the voice command based on the scoring of the candidate expression.Type: ApplicationFiled: January 25, 2019Publication date: August 8, 2019Inventors: Fadi Biadsy, Pedro J. Moreno Mengibar
-
Publication number: 20190156820Abstract: Methods and systems for automatic speech recognition and methods and systems for training acoustic language models are disclosed. In accordance with one automatic speech recognition method, an acoustic input data set is analyzed to identify portions of the input data set that conform to a general language and to identify portions of the input data set that conform to at least one dialect of the general language. In addition, a general language model and at least one dialect language model is applied to the input data set to perform speech recognition by dynamically selecting between the models in accordance with each of the identified portions. Further, speech recognition results obtained in accordance with the application of the models is output.Type: ApplicationFiled: May 7, 2018Publication date: May 23, 2019Inventors: FADI BIADSY, LIDIA MANGU, HAGEN SOLTAU
-
Patent number: 10134394Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, relating to generating log-linear models. In some implementations, n-gram parameter values derived from an n-gram language model are obtained. N-gram features for a log-linear language model are determined based on the n-grams corresponding to the obtained n-gram parameter values. A weight for each of the determined n-gram features is determined, where the weight is determined based on (i) an n-gram parameter value that is derived from the n-gram language model and that corresponds to a particular n-gram, and (ii) an n-gram parameter value that is derived from the n-gram language model and that corresponds to an n-gram that is a sub-sequence within the particular n-gram. A log-linear language model having the determined n-gram features is generated, where the determined n-gram features in the log-linear language model have weights that are initialized based on the determined weights.Type: GrantFiled: May 11, 2015Date of Patent: November 20, 2018Assignee: Google LLCInventors: Diamantino Antonio Caseiro, Fadi Biadsy
-
Patent number: 9966064Abstract: Methods and systems for automatic speech recognition and methods and systems for training acoustic language models are disclosed. In accordance with one automatic speech recognition method, an acoustic input data set is analyzed to identify portions of the input data set that conform to a general language and to identify portions of the input data set that conform to at least one dialect of the general language. In addition, a general language model and at least one dialect language model is applied to the input data set to perform speech recognition by dynamically selecting between the models in accordance with each of the identified portions. Further, speech recognition results obtained in accordance with the application of the models is output.Type: GrantFiled: July 18, 2012Date of Patent: May 8, 2018Assignee: International Business Machines CorporationInventors: Fadi Biadsy, Lidia Mangu, Hagen Soltau
-
Publication number: 20180053502Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for language models using domain-specific model components. In some implementations, context data for an utterance is obtained. A domain-specific model component is selected from among multiple domain-specific model components of a language model based on the non-linguistic context of the utterance. A score for a candidate transcription for the utterance is generated using the selected domain-specific model component and a baseline model component of the language model that is domain-independent. A transcription for the utterance is determined using the score the transcription is provided as output of an automated speech recognition system.Type: ApplicationFiled: August 21, 2017Publication date: February 22, 2018Inventors: Fadi Biadsy, Diamantino Antonio Caseiro
-
Patent number: 9842592Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for language models using non-linguistic context. In some implementations, context data indicating non-linguistic context for the utterance is received. Based on the context data, feature scores for one or more non-linguistic features are generated. The feature scores for the non-linguistic features are provided to a language model trained to process scores for non-linguistic features. The output from the language model is received, and a transcription for the utterance is determined using the output of the language model.Type: GrantFiled: February 12, 2014Date of Patent: December 12, 2017Assignee: Google Inc.Inventors: Fadi Biadsy, Pedro J. Moreno Mengibar
-
Patent number: 9805713Abstract: Systems and methods for addressing missing features in models are provided. In some implementations, a model configured to indicate likelihoods of different outcomes is accessed. The model includes a respective score for each of a plurality of features, and each feature corresponds to an outcome in an associated context. It is determined that the model does not include a score for a feature corresponding to a potential outcome in a particular context. A score is determined for the potential outcome in the particular context based on the scores for one or more features in the model that correspond to different outcomes in the particular context. The model and the score are used to determine a likelihood of occurrence of the potential outcome.Type: GrantFiled: April 8, 2015Date of Patent: October 31, 2017Assignee: Google Inc.Inventors: Fadi Biadsy, Diamantino Antonio Caseiro
-
Publication number: 20160275946Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, relating to generating log-linear models. In some implementations, n-gram parameter values derived from an n-gram language model are obtained. N-gram features for a log-linear language model are determined based on the n-grams corresponding to the obtained n-gram parameter values. A weight for each of the determined n-gram features is determined, where the weight is determined based on (i) an n-gram parameter value that is derived from the n-gram language model and that corresponds to a particular n-gram, and (ii) an n-gram parameter value that is derived from the n-gram language model and that corresponds to an n-gram that is a sub-sequence within the particular n-gram. A log-linear language model having the determined n-gram features is generated, where the determined n-gram features in the log-linear language model have weights that are initialized based on the determined weights.Type: ApplicationFiled: May 11, 2015Publication date: September 22, 2016Inventors: Diamantino Antonio Caseiro, Fadi Biadsy
-
Publication number: 20160267904Abstract: Systems and methods for addressing missing features in models are provided. In some implementations, a model configured to indicate likelihoods of different outcomes is accessed. The model includes a respective score for each of a plurality of features, and each feature corresponds to an outcome in an associated context. It is determined that the model does not include a score for a feature corresponding to a potential outcome in a particular context. A score is determined for the potential outcome in the particular context based on the scores for one or more features in the model that correspond to different outcomes in the particular context. The model and the score are used to determine a likelihood of occurrence of the potential outcome.Type: ApplicationFiled: April 8, 2015Publication date: September 15, 2016Inventors: Fadi Biadsy, Diamantino Antonio Caseiro
-
Patent number: 9412365Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, relating to enhanced maximum entropy models. In some implementations, data indicating a candidate transcription for an utterance and a particular context for the utterance are received. A maximum entropy language model is obtained. Feature values are determined for n-gram features and backoff features of the maximum entropy language model. The feature values are input to the maximum entropy language model, and an output is received from the maximum entropy language model. A transcription for the utterance is selected from among a plurality of candidate transcriptions based on the output from the maximum entropy language model. The selected transcription is provided to a client device.Type: GrantFiled: March 24, 2015Date of Patent: August 9, 2016Assignee: Google Inc.Inventors: Fadi Biadsy, Brian E. Roark
-
Patent number: 9324323Abstract: Speech recognition techniques may include: receiving audio; identifying one or more topics associated with audio; identifying language models in a topic space that correspond to the one or more topics, where the language models are identified based on proximity of a representation of the audio to representations of other audio in the topic space; using the language models to generate recognition candidates for the audio, where the recognition candidates have scores associated therewith that are indicative of a likelihood of a recognition candidate matching the audio; and selecting a recognition candidate for the audio based on the scores.Type: GrantFiled: December 14, 2012Date of Patent: April 26, 2016Assignee: Google Inc.Inventors: Daniel M. Bikel, Kapil R. Thadini, Fernando Pereira, Maria Shugrina, Fadi Biadsy
-
Patent number: 9318128Abstract: Methods and systems for facilitating development of voice-enabled applications are provided. The method may comprise receiving, at a computing device, a plurality of actions associated with a given application, parameters associated with each respective action, and example instructions responsive to respective actions. The method may also comprise determining candidate instructions based on the actions, parameters, and example instructions. Each candidate instruction may comprise one or more grammars recognizable by a voice interface for the given application. The method may further comprise the computing device receiving respective acceptance information for each candidate instruction, and comparing at least a portion of the respective acceptance information with a stored acceptance information log comprising predetermined acceptance information so as to determine a correlation.Type: GrantFiled: February 1, 2013Date of Patent: April 19, 2016Assignee: Google Inc.Inventors: Mark Edward Epstein, Pedro J. Moreno Mengibar, Fadi Biadsy
-
Patent number: 9293136Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving audio data that corresponds to an utterance, obtaining a first transcription of the utterance that was generated using a limited speech recognizer. The limited speech recognizer includes a speech recognizer that includes a language model that is trained over a limited speech recognition vocabulary that includes one or more terms from a voice command grammar, but that includes fewer than all terms of an expanded grammar. A second transcription of the utterance is obtained that was generated using an expanded speech recognizer. The expanded speech recognizer includes a speech recognizer that includes a language model that is trained over an expanded speech recognition vocabulary that includes all of the terms of the expanded grammar. The utterance is classified based at least on a portion of the first transcription or the second transcription.Type: GrantFiled: June 1, 2015Date of Patent: March 22, 2016Assignee: Google Inc.Inventors: Petar Aleksic, Pedro J. Moreno Mengibar, Fadi Biadsy