Patents by Inventor Fadi Biadsy

Fadi Biadsy has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Synthesized data augmentation using voice conversion and speech recognition models

Patent number: 11335324

Abstract: A method for training a speech conversion model personalized for a target speaker with atypical speech includes obtaining a plurality of transcriptions in a set of spoken training utterances and obtaining a plurality of unspoken training text utterances. Each spoken training utterance is spoken by a target speaker associated with atypical speech and includes a corresponding transcription paired with a corresponding non-synthetic speech representation. The method also includes adapting, using the set of spoken training utterances, a text-to-speech (TTS) model to synthesize speech in a voice of the target speaker and that captures the atypical speech. For each unspoken training text utterance, the method also includes generating, as output from the adapted TTS model, a synthetic speech representation that includes the voice of the target speaker and that captures the atypical speech. The method also includes training the speech conversion model based on the synthetic speech representations.

Type: Grant

Filed: August 31, 2020

Date of Patent: May 17, 2022

Assignee: Google LLC

Inventors: Fadi Biadsy, Liyang Jiang, Pedro J. Moreno Mengibar, Andrew Rosenberg

END-TO-END SPEECH CONVERSION

Publication number: 20220122579

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for end to end speech conversion are disclosed. In one aspect, a method includes the actions of receiving first audio data of a first utterance of one or more first terms spoken by a user. The actions further include providing the first audio data as an input to a model that is configured to receive first given audio data in a first voice and output second given audio data in a synthesized voice without performing speech recognition on the first given audio data. The actions further include receiving second audio data of a second utterance of the one or more first terms spoken in the synthesized voice. The actions further include providing, for output, the second audio data of the second utterance of the one or more first terms spoken in the synthesized voice.

Type: Application

Filed: November 26, 2019

Publication date: April 21, 2022

Applicant: Google LLC

Inventors: Fadi Biadsy, Ron J. Weiss, Aleksandar Kracun, Pedro J. Moreno Mengibar

Synthesized Data Augmentation Using Voice Conversion and Speech Recognition Models

Publication number: 20220068257

Abstract: A method for training a speech conversion model personalized for a target speaker with atypical speech includes obtaining a plurality of transcriptions in a set of spoken training utterances and obtaining a plurality of unspoken training text utterances. Each spoken training utterance is spoken by a target speaker associated with atypical speech and includes a corresponding transcription paired with a corresponding non-synthetic speech representation. The method also includes adapting, using the set of spoken training utterances, a text-to-speech (TTS) model to synthesize speech in a voice of the target speaker and that captures the atypical speech. For each unspoken training text utterance, the method also includes generating, as output from the adapted TTS model, a synthetic speech representation that includes the voice of the target speaker and that captures the atypical speech. The method also includes training the speech conversion model based on the synthetic speech representations.

Type: Application

Filed: August 31, 2020

Publication date: March 3, 2022

Applicant: Google LLC

Inventors: Fadi Biadsy, Liyang Jiang, Pedro J. Moreno Mengibar, Andrew Rosenberg

Dialect-specific acoustic language modeling and speech recognition

Patent number: 11164566

Abstract: Methods and systems for automatic speech recognition and methods and systems for training acoustic language models are disclosed. In accordance with one automatic speech recognition method, an acoustic input data set is analyzed to identify portions of the input data set that conform to a general language and to identify portions of the input data set that conform to at least one dialect of the general language. In addition, a general language model and at least one dialect language model are applied to the input data set to perform speech recognition by dynamically selecting between the models in accordance with each of the identified portions. Further, speech recognition results obtained in accordance with the application of the models are output.

Type: Grant

Filed: May 7, 2018

Date of Patent: November 2, 2021

Assignee: International Business Machines Corporation

Inventors: Fadi Biadsy, Lidia Mangu, Hagen Soltau

Direct Speech-to-Speech Translation via Machine Learning

Publication number: 20210209315

Abstract: The present disclosure provides systems and methods that train and use machine-learned models such as, for example, sequence-to-sequence models, to perform direct and text-free speech-to-speech translation. In particular, aspects of the present disclosure provide an attention-based sequence-to-sequence neural network which can directly translate speech from one language into speech in another language, without relying on an intermediate text representation.

Type: Application

Filed: March 7, 2020

Publication date: July 8, 2021

Inventors: Ye Jia, Zhifeng Chen, Yonghui Wu, Melvin Johnson, Fadi Biadsy, Ron Weiss, Wolfgang Macherey

LANGUAGE MODELS USING DOMAIN-SPECIFIC MODEL COMPONENTS

Publication number: 20210020170

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for language models using domain-specific model components. In some implementations, context data for an utterance is obtained. A domain-specific model component is selected from among multiple domain-specific model components of a language model based on the non-linguistic context of the utterance. A score for a candidate transcription for the utterance is generated using the selected domain-specific model component and a baseline model component of the language model that is domain-independent. A transcription for the utterance is determined using the score, and the transcription is provided as output of an automated speech recognition system.

Type: Application

Filed: October 1, 2020

Publication date: January 21, 2021

Applicant: Google LLC

Inventors: Fadi Biadsy, Diamantino Antonio Caseiro

Automated speech recognition using language models that selectively use domain-specific model components

Patent number: 10832664

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for language models using domain-specific model components. In some implementations, context data for an utterance is obtained. A domain-specific model component is selected from among multiple domain-specific model components of a language model based on the non-linguistic context of the utterance. A score for a candidate transcription for the utterance is generated using the selected domain-specific model component and a baseline model component of the language model that is domain-independent. A transcription for the utterance is determined using the score, and the transcription is provided as output of an automated speech recognition system.

Type: Grant

Filed: August 21, 2017

Date of Patent: November 10, 2020

Assignee: Google LLC

Inventors: Fadi Biadsy, Diamantino Antonio Caseiro

FACTOR GRAPH FOR SEMANTIC PARSING

Publication number: 20190244610

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for generating expressions associated with voice commands. The methods, systems, and apparatus include actions of obtaining segments of one or more expressions associated with a voice command. Further actions include combining the segments into a candidate expression and scoring the candidate expression using a text corpus. Additional actions include selecting the candidate expression as an expression associated with the voice command based on the scoring of the candidate expression.

Type: Application

Filed: January 25, 2019

Publication date: August 8, 2019

Inventors: Fadi Biadsy, Pedro J. Moreno Mengibar

DIALECT-SPECIFIC ACOUSTIC LANGUAGE MODELING AND SPEECH RECOGNITION

Publication number: 20190156820

Abstract: Methods and systems for automatic speech recognition and methods and systems for training acoustic language models are disclosed. In accordance with one automatic speech recognition method, an acoustic input data set is analyzed to identify portions of the input data set that conform to a general language and to identify portions of the input data set that conform to at least one dialect of the general language. In addition, a general language model and at least one dialect language model are applied to the input data set to perform speech recognition by dynamically selecting between the models in accordance with each of the identified portions. Further, speech recognition results obtained in accordance with the application of the models are output.

Type: Application

Filed: May 7, 2018

Publication date: May 23, 2019

Inventors: Fadi Biadsy, Lidia Mangu, Hagen Soltau

Speech recognition using log-linear model

Patent number: 10134394

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, relating to generating log-linear models. In some implementations, n-gram parameter values derived from an n-gram language model are obtained. N-gram features for a log-linear language model are determined based on the n-grams corresponding to the obtained n-gram parameter values. A weight for each of the determined n-gram features is determined, where the weight is determined based on (i) an n-gram parameter value that is derived from the n-gram language model and that corresponds to a particular n-gram, and (ii) an n-gram parameter value that is derived from the n-gram language model and that corresponds to an n-gram that is a sub-sequence within the particular n-gram. A log-linear language model having the determined n-gram features is generated, where the determined n-gram features in the log-linear language model have weights that are initialized based on the determined weights.

Type: Grant

Filed: May 11, 2015

Date of Patent: November 20, 2018

Assignee: Google LLC

Inventors: Diamantino Antonio Caseiro, Fadi Biadsy

Dialect-specific acoustic language modeling and speech recognition

Patent number: 9966064

Abstract: Methods and systems for automatic speech recognition and methods and systems for training acoustic language models are disclosed. In accordance with one automatic speech recognition method, an acoustic input data set is analyzed to identify portions of the input data set that conform to a general language and to identify portions of the input data set that conform to at least one dialect of the general language. In addition, a general language model and at least one dialect language model are applied to the input data set to perform speech recognition by dynamically selecting between the models in accordance with each of the identified portions. Further, speech recognition results obtained in accordance with the application of the models are output.

Type: Grant

Filed: July 18, 2012

Date of Patent: May 8, 2018

Assignee: International Business Machines Corporation

Inventors: Fadi Biadsy, Lidia Mangu, Hagen Soltau

LANGUAGE MODELS USING DOMAIN-SPECIFIC MODEL COMPONENTS

Publication number: 20180053502

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for language models using domain-specific model components. In some implementations, context data for an utterance is obtained. A domain-specific model component is selected from among multiple domain-specific model components of a language model based on the non-linguistic context of the utterance. A score for a candidate transcription for the utterance is generated using the selected domain-specific model component and a baseline model component of the language model that is domain-independent. A transcription for the utterance is determined using the score, and the transcription is provided as output of an automated speech recognition system.

Type: Application

Filed: August 21, 2017

Publication date: February 22, 2018

Inventors: Fadi Biadsy, Diamantino Antonio Caseiro

Language models using non-linguistic context

Patent number: 9842592

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for language models using non-linguistic context. In some implementations, context data indicating non-linguistic context for the utterance is received. Based on the context data, feature scores for one or more non-linguistic features are generated. The feature scores for the non-linguistic features are provided to a language model trained to process scores for non-linguistic features. The output from the language model is received, and a transcription for the utterance is determined using the output of the language model.

Type: Grant

Filed: February 12, 2014

Date of Patent: December 12, 2017

Assignee: Google Inc.

Inventors: Fadi Biadsy, Pedro J. Moreno Mengibar

Addressing missing features in models

Patent number: 9805713

Abstract: Systems and methods for addressing missing features in models are provided. In some implementations, a model configured to indicate likelihoods of different outcomes is accessed. The model includes a respective score for each of a plurality of features, and each feature corresponds to an outcome in an associated context. It is determined that the model does not include a score for a feature corresponding to a potential outcome in a particular context. A score is determined for the potential outcome in the particular context based on the scores for one or more features in the model that correspond to different outcomes in the particular context. The model and the score are used to determine a likelihood of occurrence of the potential outcome.

Type: Grant

Filed: April 8, 2015

Date of Patent: October 31, 2017

Assignee: Google Inc.

Inventors: Fadi Biadsy, Diamantino Antonio Caseiro

SPEECH RECOGNITION USING LOG-LINEAR MODEL

Publication number: 20160275946

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, relating to generating log-linear models. In some implementations, n-gram parameter values derived from an n-gram language model are obtained. N-gram features for a log-linear language model are determined based on the n-grams corresponding to the obtained n-gram parameter values. A weight for each of the determined n-gram features is determined, where the weight is determined based on (i) an n-gram parameter value that is derived from the n-gram language model and that corresponds to a particular n-gram, and (ii) an n-gram parameter value that is derived from the n-gram language model and that corresponds to an n-gram that is a sub-sequence within the particular n-gram. A log-linear language model having the determined n-gram features is generated, where the determined n-gram features in the log-linear language model have weights that are initialized based on the determined weights.

Type: Application

Filed: May 11, 2015

Publication date: September 22, 2016

Inventors: Diamantino Antonio Caseiro, Fadi Biadsy

Addressing Missing Features in Models

Publication number: 20160267904

Abstract: Systems and methods for addressing missing features in models are provided. In some implementations, a model configured to indicate likelihoods of different outcomes is accessed. The model includes a respective score for each of a plurality of features, and each feature corresponds to an outcome in an associated context. It is determined that the model does not include a score for a feature corresponding to a potential outcome in a particular context. A score is determined for the potential outcome in the particular context based on the scores for one or more features in the model that correspond to different outcomes in the particular context. The model and the score are used to determine a likelihood of occurrence of the potential outcome.

Type: Application

Filed: April 8, 2015

Publication date: September 15, 2016

Inventors: Fadi Biadsy, Diamantino Antonio Caseiro

Enhanced maximum entropy models

Patent number: 9412365

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, relating to enhanced maximum entropy models. In some implementations, data indicating a candidate transcription for an utterance and a particular context for the utterance are received. A maximum entropy language model is obtained. Feature values are determined for n-gram features and backoff features of the maximum entropy language model. The feature values are input to the maximum entropy language model, and an output is received from the maximum entropy language model. A transcription for the utterance is selected from among a plurality of candidate transcriptions based on the output from the maximum entropy language model. The selected transcription is provided to a client device.

Type: Grant

Filed: March 24, 2015

Date of Patent: August 9, 2016

Assignee: Google Inc.

Inventors: Fadi Biadsy, Brian E. Roark

Speech recognition using topic-specific language models

Patent number: 9324323

Abstract: Speech recognition techniques may include: receiving audio; identifying one or more topics associated with audio; identifying language models in a topic space that correspond to the one or more topics, where the language models are identified based on proximity of a representation of the audio to representations of other audio in the topic space; using the language models to generate recognition candidates for the audio, where the recognition candidates have scores associated therewith that are indicative of a likelihood of a recognition candidate matching the audio; and selecting a recognition candidate for the audio based on the scores.

Type: Grant

Filed: December 14, 2012

Date of Patent: April 26, 2016

Assignee: Google Inc.

Inventors: Daniel M. Bikel, Kapil R. Thadini, Fernando Pereira, Maria Shugrina, Fadi Biadsy

Methods and systems for determining instructions for applications that are recognizable by a voice interface

Patent number: 9318128

Abstract: Methods and systems for facilitating development of voice-enabled applications are provided. The method may comprise receiving, at a computing device, a plurality of actions associated with a given application, parameters associated with each respective action, and example instructions responsive to respective actions. The method may also comprise determining candidate instructions based on the actions, parameters, and example instructions. Each candidate instruction may comprise one or more grammars recognizable by a voice interface for the given application. The method may further comprise the computing device receiving respective acceptance information for each candidate instruction, and comparing at least a portion of the respective acceptance information with a stored acceptance information log comprising predetermined acceptance information so as to determine a correlation.

Type: Grant

Filed: February 1, 2013

Date of Patent: April 19, 2016

Assignee: Google Inc.

Inventors: Mark Edward Epstein, Pedro J. Moreno Mengibar, Fadi Biadsy

Multiple recognizer speech recognition

Patent number: 9293136

Abstract: The subject matter of this specification can be embodied in, among other things, a method that includes receiving audio data that corresponds to an utterance, obtaining a first transcription of the utterance that was generated using a limited speech recognizer. The limited speech recognizer includes a speech recognizer that includes a language model that is trained over a limited speech recognition vocabulary that includes one or more terms from a voice command grammar, but that includes fewer than all terms of an expanded grammar. A second transcription of the utterance is obtained that was generated using an expanded speech recognizer. The expanded speech recognizer includes a speech recognizer that includes a language model that is trained over an expanded speech recognition vocabulary that includes all of the terms of the expanded grammar. The utterance is classified based at least on a portion of the first transcription or the second transcription.

Type: Grant

Filed: June 1, 2015

Date of Patent: March 22, 2016

Assignee: Google Inc.

Inventors: Petar Aleksic, Pedro J. Moreno Mengibar, Fadi Biadsy
