Patents by Inventor Fadi Biadsy

Fadi Biadsy has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240135117
    Abstract: The present disclosure relates to a streaming speech-to-speech conversion model, where an encoder runs in real time while a user is speaking; then, after the speaking stops, a decoder generates output audio in real time. A streaming-based approach produces an acceptable delay with minimal loss in conversion quality when compared to other non-streaming server-based models. A hybrid model approach combines look-ahead in the encoder with a non-causal stacker and non-causal self-attention.
    Type: Application
    Filed: October 23, 2023
    Publication date: April 25, 2024
    Applicant: Google LLC
    Inventors: Oleg Rybakov, Fadi Biadsy
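The streaming idea in this abstract (a causal encoder that gains context from a small, fixed look-ahead at the cost of a bounded delay) can be sketched as follows; the frame buffering and the window averaging used as the "encoding" are illustrative assumptions, not the patent's architecture:

```python
from collections import deque

def streaming_encoder(frames, look_ahead=2):
    """Toy causal encoder with a look-ahead buffer: each output is
    emitted only once `look_ahead` future frames have arrived, trading
    a fixed delay for extra context. The averaging is a placeholder."""
    buf = deque()
    for frame in frames:
        buf.append(frame)
        if len(buf) > look_ahead:
            # encode the oldest frame using itself plus buffered future frames
            window = list(buf)
            yield sum(window) / len(window)
            buf.popleft()
    # flush remaining frames once speaking stops
    while buf:
        window = list(buf)
        yield sum(window) / len(window)
        buf.popleft()

out = list(streaming_encoder([1.0, 2.0, 3.0, 4.0], look_ahead=2))
```

With a look-ahead of 2, the first encoded frame is delayed by two input frames, and the tail is flushed after the final frame arrives.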
  • Publication number: 20240127807
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for language models using domain-specific model components. In some implementations, context data for an utterance is obtained. A domain-specific model component is selected from among multiple domain-specific model components of a language model based on the non-linguistic context of the utterance. A score for a candidate transcription for the utterance is generated using the selected domain-specific model component and a baseline model component of the language model that is domain-independent. A transcription for the utterance is determined using the score, and the transcription is provided as output of an automated speech recognition system.
    Type: Application
    Filed: December 21, 2023
    Publication date: April 18, 2024
    Applicant: Google LLC
    Inventors: Fadi Biadsy, Diamantino Antonio Caseiro
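As a rough illustration of the selection-and-scoring flow this abstract describes, the sketch below combines a domain-independent baseline score with a domain component chosen from non-linguistic context (here, a hypothetical foreground-app hint); the score tables, names, and log-linear weighting are all assumptions:

```python
# Toy log-probability tables (assumed values, for illustration only)
BASELINE = {"play some music": -4.0, "play sum music": -7.5}

DOMAIN_COMPONENTS = {
    "media_app": {"play some music": -1.0, "play sum music": -9.0},
    "navigation": {"play some music": -8.0, "play sum music": -8.5},
}

def select_component(context):
    """Pick a domain-specific component from non-linguistic context
    (e.g. which app is in the foreground), or None if no match."""
    return DOMAIN_COMPONENTS.get(context.get("app"))

def score(candidate, context, domain_weight=0.5):
    """Combine the domain-independent baseline with the selected
    domain component (an assumed log-linear interpolation)."""
    base = BASELINE.get(candidate, -20.0)
    component = select_component(context)
    if component is None:
        return base
    return base + domain_weight * component.get(candidate, -20.0)

def transcribe(candidates, context):
    return max(candidates, key=lambda c: score(c, context))

best = transcribe(["play some music", "play sum music"], {"app": "media_app"})
```

With no context match, scoring falls back to the baseline component alone.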
  • Publication number: 20240021190
    Abstract: A method for training a sub-model for contextual biasing for speech recognition includes obtaining a base speech recognition model trained on non-biased data. The method includes obtaining a set of training utterances representative of a particular domain, each training utterance in the set of training utterances including audio data characterizing the training utterance and a ground truth transcription of the training utterance. The method further includes, for each corresponding training utterance in the set of training utterances, determining, using an embedding encoder, a corresponding document embedding from the ground truth transcription of the corresponding training utterance. The method includes training, using the corresponding document embeddings determined from the ground truth transcriptions of the set of training utterances, a sub-model to bias the base speech recognition model to recognize speech in the particular domain.
    Type: Application
    Filed: July 18, 2022
    Publication date: January 18, 2024
    Applicant: Google LLC
    Inventors: Fadi Biadsy, Pedro Jose Moreno Mengibar
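A minimal sketch of the data-preparation step in this abstract: derive a document embedding from each ground-truth transcription, then pool them into a domain vector a biasing sub-model could be trained against. The character-sum hashing embedder and the "pharmacy" examples are stand-ins for the learned embedding encoder:

```python
def embed(text, dim=8):
    """Deterministic bag-of-words 'document embedding'; a placeholder
    for the learned embedding encoder in the abstract."""
    vec = [0.0] * dim
    words = text.lower().split()
    for word in words:
        vec[sum(ord(c) for c in word) % dim] += 1.0
    n = max(len(words), 1)
    return [v / n for v in vec]

# (audio file, ground-truth transcription) pairs for a hypothetical domain
training_utterances = [
    ("audio_001.wav", "refill my prescription"),
    ("audio_002.wav", "schedule a flu shot"),
]

doc_embeddings = [embed(text) for _, text in training_utterances]

# pool the per-utterance embeddings into one domain vector that a
# biasing sub-model could be conditioned on
domain_vector = [sum(col) / len(doc_embeddings) for col in zip(*doc_embeddings)]
```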
  • Patent number: 11875789
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for language models using domain-specific model components. In some implementations, context data for an utterance is obtained. A domain-specific model component is selected from among multiple domain-specific model components of a language model based on the non-linguistic context of the utterance. A score for a candidate transcription for the utterance is generated using the selected domain-specific model component and a baseline model component of the language model that is domain-independent. A transcription for the utterance is determined using the score, and the transcription is provided as output of an automated speech recognition system.
    Type: Grant
    Filed: December 20, 2022
    Date of Patent: January 16, 2024
    Assignee: Google LLC
    Inventors: Fadi Biadsy, Diamantino Antonio Caseiro
  • Publication number: 20230395061
    Abstract: A method for turn detection in a speech-to-speech model includes receiving, as input to the speech-to-speech (S2S) model, a sequence of acoustic frames corresponding to an utterance. The method further includes, at each of a plurality of output steps, generating, by an audio encoder of the S2S model, a higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames, and determining, by a turn detector of the S2S model, based on the higher order feature representation generated by the audio encoder at the corresponding output step, whether the utterance is at a breakpoint at the corresponding output step. When the turn detector determines that the utterance is at the breakpoint, the method includes synthesizing a sequence of output audio frames output by a speech decoder of the S2S model into a time-domain audio waveform of synthesized speech representing the utterance spoken by the user.
    Type: Application
    Filed: May 17, 2023
    Publication date: December 7, 2023
    Applicant: Google LLC
    Inventors: Fadi Biadsy, Oleg Rybakov
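The per-frame loop in this abstract (encoder feature per acoustic frame, turn detector flags a breakpoint, synthesis triggered only then) can be sketched as below; the energy thresholding is an assumed stand-in for the learned turn detector, and the features are placeholders:

```python
def encode(frame):
    """Placeholder 'higher order feature representation' per frame."""
    return abs(frame)

def detect_turn(features, silence_threshold=0.1, min_silence=3):
    """Declare a breakpoint once the last `min_silence` features look
    like silence (an assumed proxy for the learned turn detector)."""
    tail = features[-min_silence:]
    return len(tail) == min_silence and all(f < silence_threshold for f in tail)

def process(frames):
    """Stream frames through the encoder; return the output step at
    which synthesis of the decoder's audio would be triggered."""
    feats = []
    for i, frame in enumerate(frames):
        feats.append(encode(frame))
        if detect_turn(feats):
            return i
    return None

breakpoint_idx = process([0.5, 0.6, 0.02, 0.01, 0.0, 0.4])
```

Here the breakpoint fires at the third consecutive low-energy frame, before the trailing speech frame ever arrives.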
  • Patent number: 11823685
    Abstract: A method includes receiving acoustic features of a first utterance spoken by a first user that speaks with typical speech and processing the acoustic features of the first utterance using a general speech recognizer to generate a first transcription of the first utterance. The operations also include analyzing the first transcription of the first utterance to identify one or more bias terms in the first transcription and biasing the alternative speech recognizer on the one or more bias terms identified in the first transcription. The operations also include receiving acoustic features of a second utterance spoken by a second user that speaks with atypical speech and processing, using the alternative speech recognizer biased on the one or more terms identified in the first transcription, the acoustic features of the second utterance to generate a second transcription of the second utterance.
    Type: Grant
    Filed: January 25, 2023
    Date of Patent: November 21, 2023
    Assignee: Google LLC
    Inventors: Fadi Biadsy, Pedro J. Moreno Mengibar
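A toy version of the two-recognizer flow this abstract describes: mine bias terms from the first (typical-speech) transcription, then rescore the alternative recognizer's hypotheses for the atypical-speech utterance toward those terms. The stop-word filter, the name "Shlomit", and the additive boost are all illustrative assumptions:

```python
COMMON_WORDS = {"please", "call", "the", "my", "to", "a"}

def extract_bias_terms(transcription):
    """Keep terms a generic model might miss; the common-word filter is
    an assumed proxy for the real bias-term selection logic."""
    return [w for w in transcription.lower().split() if w not in COMMON_WORDS]

def biased_recognize(hypotheses, bias_terms, boost=2.0):
    """Rescore (hypothesis, base_score) pairs, boosting hypotheses that
    contain the bias terms, and return the best hypothesis."""
    def score(pair):
        hyp, base = pair
        bonus = sum(boost for w in hyp.lower().split() if w in bias_terms)
        return base + bonus
    return max(hypotheses, key=score)[0]

terms = extract_bias_terms("Please call Shlomit")
best = biased_recognize([("call slow mit", -3.0), ("call Shlomit", -4.0)], terms)
```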
  • Publication number: 20230360632
    Abstract: A method includes receiving a reference audio signal corresponding to reference speech spoken by a target speaker with atypical speech, and generating, by a speaker embedding network configured to receive the reference audio signal as input, a speaker embedding for the target speaker. The speaker embedding conveys speaker characteristics of the target speaker. The method also includes receiving a speech conversion request that includes input audio data corresponding to an utterance spoken by the target speaker associated with the atypical speech. The method also includes biasing, using the speaker embedding generated for the target speaker by the speaker embedding network, a speech conversion model to convert the input audio data corresponding to the utterance spoken by the target speaker associated with atypical speech into an output canonical representation of the utterance spoken by the target speaker.
    Type: Application
    Filed: May 3, 2022
    Publication date: November 9, 2023
    Applicant: Google LLC
    Inventors: Fadi Biadsy, Dirk Ryan Padfield, Victoria Zayats
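As a loose sketch of the pipeline in this abstract, the code below collapses a reference signal into a small "speaker embedding" and uses it to bias a toy conversion toward a canonical output level; the mean/energy features and the scaling rule are assumptions, not the patent's learned networks:

```python
def speaker_embedding(reference_audio):
    """Placeholder embedding network: summarizes reference audio into a
    fixed vector of speaker characteristics (mean level and energy here
    are illustrative, not the patent's features)."""
    n = len(reference_audio)
    mean = sum(reference_audio) / n
    energy = sum(s * s for s in reference_audio) / n
    return [mean, energy]

def convert(input_audio, embedding):
    """Toy 'speech conversion' biased by the speaker embedding: scales
    the input toward an assumed canonical energy level."""
    target_energy = 0.25
    current = embedding[1] if embedding[1] > 0 else target_energy
    scale = (target_energy / current) ** 0.5
    return [s * scale for s in input_audio]

emb = speaker_embedding([0.1, -0.1, 0.2, -0.2])
canonical = convert([0.1, -0.1], emb)
```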
  • Publication number: 20230335122
    Abstract: A method for contextual biasing for speech recognition includes obtaining a base automatic speech recognition (ASR) model trained on non-biased data and a sub-model trained on biased data representative of a particular domain. The method includes receiving a speech recognition request including audio data characterizing an utterance captured in streaming audio. The method further includes determining whether the speech recognition request includes a contextual indicator indicating the particular domain. When the speech recognition request does not include the contextual indicator, the method includes generating, using the base ASR model, a first speech recognition result of the utterance by processing the audio data.
    Type: Application
    Filed: April 19, 2022
    Publication date: October 19, 2023
    Applicant: Google LLC
    Inventors: Fadi Biadsy, Pedro J. Moreno Mengibar
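The routing decision in this abstract can be sketched as a simple dispatch: apply a domain sub-model only when the recognition request carries a contextual indicator naming that domain. The request shape, model functions, and "medical" domain are hypothetical:

```python
def base_asr(audio):
    """Placeholder base ASR model trained on non-biased data."""
    return audio.upper()

def medical_sub_model(hypothesis):
    """Placeholder domain sub-model applied on top of the base result."""
    return hypothesis + " [medical-biased]"

def recognize(request, base_model, sub_models):
    """Use the sub-model only when the request includes a contextual
    indicator for its domain; otherwise use the base model alone."""
    hypothesis = base_model(request["audio"])
    domain = request.get("context")
    if domain in sub_models:
        return sub_models[domain](hypothesis)
    return hypothesis

subs = {"medical": medical_sub_model}
plain = recognize({"audio": "refill my statins"}, base_asr, subs)
biased = recognize({"audio": "refill my statins", "context": "medical"},
                   base_asr, subs)
```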
  • Publication number: 20230298574
    Abstract: A method for speech conversion includes obtaining a speech conversion model configured to convert input utterances of human speech directly into corresponding output utterances of synthesized speech. The method further includes receiving a speech conversion request including input audio data corresponding to an utterance spoken by a target speaker associated with atypical speech and a speaker identifier uniquely identifying the target speaker. The method includes activating, using the speaker identifier, a particular sub-model for biasing the speech conversion model to recognize a type of the atypical speech associated with the target speaker identified by the speaker identifier.
    Type: Application
    Filed: March 15, 2023
    Publication date: September 21, 2023
    Applicant: Google LLC
    Inventors: Fadi Biadsy, Youzheng Chen, Xia Zhang, Oleg Rybakov, Andrew M. Rosenberg, Pedro J. Moreno Mengibar
  • Publication number: 20230298565
    Abstract: A method includes receiving a set of training utterances each including a non-synthetic speech representation of a corresponding utterance, and for each training utterance, generating a corresponding synthetic speech representation by using a voice conversion model. The non-synthetic speech representation and the synthetic speech representation form a corresponding training utterance pair. At each of a plurality of output steps for each training utterance pair, the method also includes generating, for output by a speech recognition model, a first probability distribution over possible non-synthetic speech recognition hypotheses for the non-synthetic speech representation and a second probability distribution over possible synthetic speech recognition hypotheses for the synthetic speech representation.
    Type: Application
    Filed: April 25, 2022
    Publication date: September 21, 2023
    Applicant: Google LLC
    Inventors: Andrew M. Rosenberg, Gary Wang, Bhuvana Ramabhadran, Fadi Biadsy
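The training-pair idea in this abstract (score the non-synthetic and voice-converted versions of the same utterance and keep the two output distributions consistent) can be illustrated with a toy symmetric-KL term; the distributions and the choice of symmetric KL are assumptions:

```python
import math

def kl(p, q):
    """KL divergence between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# hypothetical recognizer outputs over three candidate hypotheses for
# the non-synthetic and voice-converted representations of one utterance
p_real = [0.7, 0.2, 0.1]
p_synth = [0.6, 0.3, 0.1]

# an assumed symmetric-KL consistency term that pushes the recognizer
# to score both representations of the utterance alike
consistency_loss = kl(p_real, p_synth) + kl(p_synth, p_real)
```

The term vanishes when the two distributions agree and grows as they diverge.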
  • Publication number: 20230267949
    Abstract: A method includes receiving a current spectrogram frame and reconstructing a phase of the current spectrogram frame by, for each corresponding committed spectrogram frame in a sequence of M number of committed spectrogram frames preceding the current spectrogram frame, obtaining a value of a committed phase of the corresponding committed spectrogram frame and estimating the phase of the current spectrogram frame based on a magnitude of the current spectrogram frame and the value of the committed phase of each corresponding committed spectrogram frame in the sequence of M number of committed spectrogram frames preceding the current spectrogram frame. The method also includes synthesizing, for the current spectrogram frame, a new time-domain audio waveform frame based on the estimated phase of the current spectrogram frame.
    Type: Application
    Filed: February 2, 2023
    Publication date: August 24, 2023
    Applicant: Google LLC
    Inventors: Oleg Rybakov, Liyang Jiang, Fadi Biadsy
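The phase-reconstruction step in this abstract can be sketched as extrapolating the current frame's phase from the M committed phases preceding it; the linear phase-advance model is an assumption, and the current magnitude is used here only as a silence gate:

```python
import math

def estimate_phase(committed_phases, magnitude):
    """Estimate the current frame's phase from the sequence of committed
    phases preceding it (assumed linear phase advance)."""
    if magnitude == 0.0:
        return 0.0
    # average phase advance between consecutive committed frames
    diffs = [b - a for a, b in zip(committed_phases, committed_phases[1:])]
    advance = sum(diffs) / len(diffs)
    # wrap the extrapolated phase into [0, 2*pi)
    return (committed_phases[-1] + advance) % (2 * math.pi)

phase = estimate_phase([0.0, 0.5, 1.0], magnitude=1.0)
```

The estimated phase would then feed synthesis of the new time-domain waveform frame for the current spectrogram frame.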
  • Publication number: 20230230572
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for end to end speech conversion are disclosed. In one aspect, a method includes the actions of receiving first audio data of a first utterance of one or more first terms spoken by a user. The actions further include providing the first audio data as an input to a model that is configured to receive first given audio data in a first voice and output second given audio data in a synthesized voice without performing speech recognition on the first given audio data. The actions further include receiving second audio data of a second utterance of the one or more first terms spoken in the synthesized voice. The actions further include providing, for output, the second audio data of the second utterance of the one or more first terms spoken in the synthesized voice.
    Type: Application
    Filed: March 23, 2023
    Publication date: July 20, 2023
    Applicant: Google LLC
    Inventors: Fadi Biadsy, Ron J. Weiss, Aleksandar Kracun, Pedro J. Moreno Mengibar
  • Publication number: 20230169983
    Abstract: A method includes receiving acoustic features of a first utterance spoken by a first user that speaks with typical speech and processing the acoustic features of the first utterance using a general speech recognizer to generate a first transcription of the first utterance. The operations also include analyzing the first transcription of the first utterance to identify one or more bias terms in the first transcription and biasing the alternative speech recognizer on the one or more bias terms identified in the first transcription. The operations also include receiving acoustic features of a second utterance spoken by a second user that speaks with atypical speech and processing, using the alternative speech recognizer biased on the one or more terms identified in the first transcription, the acoustic features of the second utterance to generate a second transcription of the second utterance.
    Type: Application
    Filed: January 25, 2023
    Publication date: June 1, 2023
    Applicant: Google LLC
    Inventors: Fadi Biadsy, Pedro J. Moreno Mengibar
  • Publication number: 20230122941
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for language models using domain-specific model components. In some implementations, context data for an utterance is obtained. A domain-specific model component is selected from among multiple domain-specific model components of a language model based on the non-linguistic context of the utterance. A score for a candidate transcription for the utterance is generated using the selected domain-specific model component and a baseline model component of the language model that is domain-independent. A transcription for the utterance is determined using the score, and the transcription is provided as output of an automated speech recognition system.
    Type: Application
    Filed: December 20, 2022
    Publication date: April 20, 2023
    Applicant: Google LLC
    Inventors: Fadi Biadsy, Diamantino Antonio Caseiro
  • Patent number: 11580994
    Abstract: A method includes receiving acoustic features of a first utterance spoken by a first user that speaks with typical speech and processing the acoustic features of the first utterance using a general speech recognizer to generate a first transcription of the first utterance. The operations also include analyzing the first transcription of the first utterance to identify one or more bias terms in the first transcription and biasing the alternative speech recognizer on the one or more bias terms identified in the first transcription. The operations also include receiving acoustic features of a second utterance spoken by a second user that speaks with atypical speech and processing, using the alternative speech recognizer biased on the one or more terms identified in the first transcription, the acoustic features of the second utterance to generate a second transcription of the second utterance.
    Type: Grant
    Filed: January 20, 2021
    Date of Patent: February 14, 2023
    Assignee: Google LLC
    Inventors: Fadi Biadsy, Pedro Jose Moreno Mengibar
  • Publication number: 20230037085
    Abstract: Implementations disclosed herein are directed to techniques for selectively enabling and/or disabling non-transient storage of one or more instances of assistant interaction data for turn(s) of a dialog between a user and an automated assistant. Implementations are additionally or alternatively directed to techniques for retroactive wiping of non-transiently stored assistant interaction data from previous assistant interaction(s).
    Type: Application
    Filed: January 7, 2021
    Publication date: February 2, 2023
    Inventors: Fadi Biadsy, Johan Schalkwyk, Jason Pelecanos
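The two behaviors in this abstract (selectively disabling non-transient storage per dialog turn, and retroactively wiping stored interaction data) can be sketched with a toy store; all class and field names are assumptions:

```python
class AssistantLog:
    """Toy store illustrating per-turn opt-out of non-transient storage
    and retroactive wiping of previously stored interaction data."""

    def __init__(self):
        self.turns = []

    def record(self, user_id, text, persist=True):
        # a turn with persist=False is handled transiently and never stored
        if persist:
            self.turns.append({"user": user_id, "text": text})

    def wipe(self, user_id):
        """Retroactively delete all stored turns for one user."""
        self.turns = [t for t in self.turns if t["user"] != user_id]

log = AssistantLog()
log.record("u1", "set a timer")
log.record("u1", "what's the weather", persist=False)  # transient turn
log.record("u2", "play jazz")
log.wipe("u1")
```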
  • Patent number: 11557289
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for language models using domain-specific model components. In some implementations, context data for an utterance is obtained. A domain-specific model component is selected from among multiple domain-specific model components of a language model based on the non-linguistic context of the utterance. A score for a candidate transcription for the utterance is generated using the selected domain-specific model component and a baseline model component of the language model that is domain-independent. A transcription for the utterance is determined using the score, and the transcription is provided as output of an automated speech recognition system.
    Type: Grant
    Filed: October 1, 2020
    Date of Patent: January 17, 2023
    Assignee: Google LLC
    Inventors: Fadi Biadsy, Diamantino Antonio Caseiro
  • Publication number: 20230009613
    Abstract: A method (800) of training a text-to-speech (TTS) model (108) includes obtaining training data (150) including reference input text (104) that includes a sequence of characters, a sequence of reference audio features (402) representative of the sequence of characters, and a sequence of reference phone labels (502) representative of distinct speech sounds of the reference audio features. For each of a plurality of time steps, the method includes generating a corresponding predicted audio feature (120) based on a respective portion of the reference input text for the time step and generating, using a phone label mapping network (510), a corresponding predicted phone label (520) associated with the predicted audio feature. The method also includes aligning the predicted phone label with the reference phone label to determine a corresponding predicted phone label loss (622) and updating the TTS model based on the corresponding predicted phone label loss.
    Type: Application
    Filed: December 13, 2019
    Publication date: January 12, 2023
    Applicant: Google LLC
    Inventors: Andrew Rosenberg, Bhuvana Ramabhadran, Fadi Biadsy, Yu Zhang
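The phone-label loss in this abstract can be illustrated with a crude per-step mismatch rate between predicted and reference phone labels; the padding-based alignment and the mismatch-count loss are stand-ins for the patent's phone label mapping network and loss:

```python
def phone_label_loss(predicted, reference):
    """Per-step mismatch rate between predicted and reference phone
    labels after padding to a common length; a crude stand-in for the
    alignment and loss computation in the abstract."""
    n = max(len(predicted), len(reference))
    pred = predicted + ["<pad>"] * (n - len(predicted))
    ref = reference + ["<pad>"] * (n - len(reference))
    mismatches = sum(p != r for p, r in zip(pred, ref))
    return mismatches / n

# one substituted phone out of four time steps
loss = phone_label_loss(["h", "eh", "l", "ow"], ["h", "ah", "l", "ow"])
```

This loss would then be used alongside the audio-feature loss to update the TTS model.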
  • Publication number: 20220414542
    Abstract: The present disclosure relates generally to machine learning. More particularly, the present disclosure relates to on-the-fly feeding of personalized, domain-specific, context-specific, and/or task-specific submodels as input to an existing base model which has already been loaded into a memory (e.g., loaded into an existing session associated with execution of a machine learning library).
    Type: Application
    Filed: June 28, 2022
    Publication date: December 29, 2022
    Inventors: Fadi Biadsy, Katrin Ruth Sarah Tomanek
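The core idea in this abstract (keep one base model loaded and feed per-request sub-model parameters as ordinary inputs, rather than reloading a new model per user or domain) can be sketched as follows; the linear form and parameter names are assumptions:

```python
class BaseModel:
    """Loaded once and kept in memory; per-request sub-model parameters
    arrive as ordinary inputs instead of triggering a model reload."""

    def __init__(self, weight):
        self.weight = weight

    def run(self, x, sub_weight=0.0):
        # sub_weight plays the role of an on-the-fly personalized or
        # domain-specific sub-model fed alongside the input
        return self.weight * x + sub_weight * x

base = BaseModel(weight=2.0)
generic = base.run(3.0)                        # base model alone
personalized = base.run(3.0, sub_weight=0.5)   # base plus fed-in sub-model
```

The same in-memory session serves both requests; only the fed-in sub-model parameters differ.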
  • Publication number: 20220310056
    Abstract: A method for speech conversion includes receiving, as input to an encoder of a speech conversion model, an input spectrogram corresponding to an utterance, the encoder including a stack of self-attention blocks. The method further includes generating, as output from the encoder, an encoded spectrogram and receiving, as input to a spectrogram decoder of the speech conversion model, the encoded spectrogram generated as output from the encoder. The method further includes generating, as output from the spectrogram decoder, an output spectrogram corresponding to a synthesized speech representation of the utterance.
    Type: Application
    Filed: March 16, 2022
    Publication date: September 29, 2022
    Applicant: Google LLC
    Inventors: Bhuvana Ramabhadran, Zhehuai Chen, Fadi Biadsy, Pedro J. Moreno Mengibar