Patents by Inventor Fadi Biadsy

Fadi Biadsy has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240135117
    Abstract: The present disclosure relates to a streaming speech-to-speech conversion model, where an encoder runs in real time while a user is speaking; then, after the speaking stops, a decoder generates output audio in real time. A streaming-based approach produces an acceptable delay with minimal loss in conversion quality when compared to other non-streaming server-based models. A hybrid model approach combines look-ahead in the encoder with a non-causal stacker and non-causal self-attention.
    Type: Application
    Filed: October 23, 2023
    Publication date: April 25, 2024
    Applicant: Google LLC
    Inventors: Oleg Rybakov, Fadi Biadsy
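The streaming idea in this abstract (a causal encoder that gains context from a small, fixed look-ahead at the cost of a bounded delay) can be sketched as follows; the frame buffering and the window averaging used as the "encoding" are illustrative assumptions, not the patent's architecture:

```python
from collections import deque

def streaming_encoder(frames, look_ahead=2):
    """Toy causal encoder with a look-ahead buffer: each output is
    emitted only once `look_ahead` future frames have arrived, trading
    a fixed delay for extra context. The averaging is a placeholder."""
    buf = deque()
    for frame in frames:
        buf.append(frame)
        if len(buf) > look_ahead:
            # encode the oldest frame using itself plus buffered future frames
            window = list(buf)
            yield sum(window) / len(window)
            buf.popleft()
    # flush remaining frames once speaking stops
    while buf:
        window = list(buf)
        yield sum(window) / len(window)
        buf.popleft()

out = list(streaming_encoder([1.0, 2.0, 3.0, 4.0], look_ahead=2))
```

With a look-ahead of 2, the first encoded frame is delayed by two input frames, and the tail is flushed after the final frame arrives.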
  • Publication number: 20240127807
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for language models using domain-specific model components. In some implementations, context data for an utterance is obtained. A domain-specific model component is selected from among multiple domain-specific model components of a language model based on the non-linguistic context of the utterance. A score for a candidate transcription for the utterance is generated using the selected domain-specific model component and a baseline model component of the language model that is domain-independent. A transcription for the utterance is determined using the score, and the transcription is provided as output of an automated speech recognition system.
    Type: Application
    Filed: December 21, 2023
    Publication date: April 18, 2024
    Applicant: Google LLC
    Inventors: Fadi Biadsy, Diamantino Antonio Caseiro
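As a rough illustration of the selection-and-scoring flow this abstract describes, the sketch below combines a domain-independent baseline score with a domain component chosen from non-linguistic context (here, a hypothetical foreground-app hint); the score tables, names, and log-linear weighting are all assumptions:

```python
# Toy log-probability tables (assumed values, for illustration only)
BASELINE = {"play some music": -4.0, "play sum music": -7.5}

DOMAIN_COMPONENTS = {
    "media_app": {"play some music": -1.0, "play sum music": -9.0},
    "navigation": {"play some music": -8.0, "play sum music": -8.5},
}

def select_component(context):
    """Pick a domain-specific component from non-linguistic context
    (e.g. which app is in the foreground), or None if no match."""
    return DOMAIN_COMPONENTS.get(context.get("app"))

def score(candidate, context, domain_weight=0.5):
    """Combine the domain-independent baseline with the selected
    domain component (an assumed log-linear interpolation)."""
    base = BASELINE.get(candidate, -20.0)
    component = select_component(context)
    if component is None:
        return base
    return base + domain_weight * component.get(candidate, -20.0)

def transcribe(candidates, context):
    return max(candidates, key=lambda c: score(c, context))

best = transcribe(["play some music", "play sum music"], {"app": "media_app"})
```

With no context match, scoring falls back to the baseline component alone.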
  • Publication number: 20240021190
    Abstract: A method for training a sub-model for contextual biasing for speech recognition includes obtaining a base speech recognition model trained on non-biased data. The method includes obtaining a set of training utterances representative of a particular domain, each training utterance in the set of training utterances including audio data characterizing the training utterance and a ground truth transcription of the training utterance. The method further includes, for each corresponding training utterance in the set of training utterances, determining, using an embedding encoder, a corresponding document embedding from the ground truth transcription of the corresponding training utterance. The method includes training, using the corresponding document embeddings determined from the ground truth transcriptions of the set of training utterances, a sub-model to bias the base speech recognition model to recognize speech in the particular domain.
    Type: Application
    Filed: July 18, 2022
    Publication date: January 18, 2024
    Applicant: Google LLC
    Inventors: Fadi Biadsy, Pedro Jose Moreno Mengibar
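A minimal sketch of the data-preparation step in this abstract: derive a document embedding from each ground-truth transcription, then pool them into a domain vector a biasing sub-model could be trained against. The character-sum hashing embedder and the "pharmacy" examples are stand-ins for the learned embedding encoder:

```python
def embed(text, dim=8):
    """Deterministic bag-of-words 'document embedding'; a placeholder
    for the learned embedding encoder in the abstract."""
    vec = [0.0] * dim
    words = text.lower().split()
    for word in words:
        vec[sum(ord(c) for c in word) % dim] += 1.0
    n = max(len(words), 1)
    return [v / n for v in vec]

# (audio file, ground-truth transcription) pairs for a hypothetical domain
training_utterances = [
    ("audio_001.wav", "refill my prescription"),
    ("audio_002.wav", "schedule a flu shot"),
]

doc_embeddings = [embed(text) for _, text in training_utterances]

# pool the per-utterance embeddings into one domain vector that a
# biasing sub-model could be conditioned on
domain_vector = [sum(col) / len(doc_embeddings) for col in zip(*doc_embeddings)]
```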
  • Patent number: 11875789
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for language models using domain-specific model components. In some implementations, context data for an utterance is obtained. A domain-specific model component is selected from among multiple domain-specific model components of a language model based on the non-linguistic context of the utterance. A score for a candidate transcription for the utterance is generated using the selected domain-specific model component and a baseline model component of the language model that is domain-independent. A transcription for the utterance is determined using the score, and the transcription is provided as output of an automated speech recognition system.
    Type: Grant
    Filed: December 20, 2022
    Date of Patent: January 16, 2024
    Assignee: Google LLC
    Inventors: Fadi Biadsy, Diamantino Antonio Caseiro
  • Publication number: 20230395061
    Abstract: A method for turn detection in a speech-to-speech model includes receiving, as input to the speech-to-speech (S2S) model, a sequence of acoustic frames corresponding to an utterance. The method further includes, at each of a plurality of output steps, generating, by an audio encoder of the S2S model, a higher order feature representation for a corresponding acoustic frame in the sequence of acoustic frames, and determining, by a turn detector of the S2S model, based on the higher order feature representation generated by the audio encoder at the corresponding output step, whether the utterance is at a breakpoint at the corresponding output step. When the turn detector determines that the utterance is at the breakpoint, the method includes synthesizing a sequence of output audio frames output by a speech decoder of the S2S model into a time-domain audio waveform of synthesized speech representing the utterance spoken by the user.
    Type: Application
    Filed: May 17, 2023
    Publication date: December 7, 2023
    Applicant: Google LLC
    Inventors: Fadi Biadsy, Oleg Rybakov
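The per-frame loop in this abstract (encoder feature per acoustic frame, turn detector flags a breakpoint, synthesis triggered only then) can be sketched as below; the energy thresholding is an assumed stand-in for the learned turn detector, and the features are placeholders:

```python
def encode(frame):
    """Placeholder 'higher order feature representation' per frame."""
    return abs(frame)

def detect_turn(features, silence_threshold=0.1, min_silence=3):
    """Declare a breakpoint once the last `min_silence` features look
    like silence (an assumed proxy for the learned turn detector)."""
    tail = features[-min_silence:]
    return len(tail) == min_silence and all(f < silence_threshold for f in tail)

def process(frames):
    """Stream frames through the encoder; return the output step at
    which synthesis of the decoder's audio would be triggered."""
    feats = []
    for i, frame in enumerate(frames):
        feats.append(encode(frame))
        if detect_turn(feats):
            return i
    return None

breakpoint_idx = process([0.5, 0.6, 0.02, 0.01, 0.0, 0.4])
```

Here the breakpoint fires at the third consecutive low-energy frame, before the trailing speech frame ever arrives.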
  • Patent number: 11823685
    Abstract: A method includes receiving acoustic features of a first utterance spoken by a first user that speaks with typical speech and processing the acoustic features of the first utterance using a general speech recognizer to generate a first transcription of the first utterance. The operations also include analyzing the first transcription of the first utterance to identify one or more bias terms in the first transcription and biasing the alternative speech recognizer on the one or more bias terms identified in the first transcription. The operations also include receiving acoustic features of a second utterance spoken by a second user that speaks with atypical speech and processing, using the alternative speech recognizer biased on the one or more terms identified in the first transcription, the acoustic features of the second utterance to generate a second transcription of the second utterance.
    Type: Grant
    Filed: January 25, 2023
    Date of Patent: November 21, 2023
    Assignee: Google LLC
    Inventors: Fadi Biadsy, Pedro J. Moreno Mengibar
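A toy version of the two-recognizer flow this abstract describes: mine bias terms from the first (typical-speech) transcription, then rescore the alternative recognizer's hypotheses for the atypical-speech utterance toward those terms. The stop-word filter, the name "Shlomit", and the additive boost are all illustrative assumptions:

```python
COMMON_WORDS = {"please", "call", "the", "my", "to", "a"}

def extract_bias_terms(transcription):
    """Keep terms a generic model might miss; the common-word filter is
    an assumed proxy for the real bias-term selection logic."""
    return [w for w in transcription.lower().split() if w not in COMMON_WORDS]

def biased_recognize(hypotheses, bias_terms, boost=2.0):
    """Rescore (hypothesis, base_score) pairs, boosting hypotheses that
    contain the bias terms, and return the best hypothesis."""
    def score(pair):
        hyp, base = pair
        bonus = sum(boost for w in hyp.lower().split() if w in bias_terms)
        return base + bonus
    return max(hypotheses, key=score)[0]

terms = extract_bias_terms("Please call Shlomit")
best = biased_recognize([("call slow mit", -3.0), ("call Shlomit", -4.0)], terms)
```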
  • Publication number: 20230360632
    Abstract: A method includes receiving a reference audio signal corresponding to reference speech spoken by a target speaker with atypical speech, and generating, by a speaker embedding network configured to receive the reference audio signal as input, a speaker embedding for the target speaker. The speaker embedding conveys speaker characteristics of the target speaker. The method also includes receiving a speech conversion request that includes input audio data corresponding to an utterance spoken by the target speaker associated with the atypical speech. The method also includes biasing, using the speaker embedding generated for the target speaker by the speaker embedding network, a speech conversion model to convert the input audio data corresponding to the utterance spoken by the target speaker associated with atypical speech into an output canonical representation of the utterance spoken by the target speaker.
    Type: Application
    Filed: May 3, 2022
    Publication date: November 9, 2023
    Applicant: Google LLC
    Inventors: Fadi Biadsy, Dirk Ryan Padfield, Victoria Zayats
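As a loose sketch of the pipeline in this abstract, the code below collapses a reference signal into a small "speaker embedding" and uses it to bias a toy conversion toward a canonical output level; the mean/energy features and the scaling rule are assumptions, not the patent's learned networks:

```python
def speaker_embedding(reference_audio):
    """Placeholder embedding network: summarizes reference audio into a
    fixed vector of speaker characteristics (mean level and energy here
    are illustrative, not the patent's features)."""
    n = len(reference_audio)
    mean = sum(reference_audio) / n
    energy = sum(s * s for s in reference_audio) / n
    return [mean, energy]

def convert(input_audio, embedding):
    """Toy 'speech conversion' biased by the speaker embedding: scales
    the input toward an assumed canonical energy level."""
    target_energy = 0.25
    current = embedding[1] if embedding[1] > 0 else target_energy
    scale = (target_energy / current) ** 0.5
    return [s * scale for s in input_audio]

emb = speaker_embedding([0.1, -0.1, 0.2, -0.2])
canonical = convert([0.1, -0.1], emb)
```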
  • Publication number: 20230335122
    Abstract: A method for contextual biasing for speech recognition includes obtaining a base automatic speech recognition (ASR) model trained on non-biased data and a sub-model trained on biased data representative of a particular domain. The method includes receiving a speech recognition request including audio data characterizing an utterance captured in streaming audio. The method further includes determining whether the speech recognition request includes a contextual indicator indicating the particular domain. When the speech recognition request does not include the contextual indicator, the method includes generating, using the base ASR model, a first speech recognition result of the utterance by processing the audio data.
    Type: Application
    Filed: April 19, 2022
    Publication date: October 19, 2023
    Applicant: Google LLC
    Inventors: Fadi Biadsy, Pedro J. Moreno Mengibar
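The routing decision in this abstract can be sketched as a simple dispatch: apply a domain sub-model only when the recognition request carries a contextual indicator naming that domain. The request shape, model functions, and "medical" domain are hypothetical:

```python
def base_asr(audio):
    """Placeholder base ASR model trained on non-biased data."""
    return audio.upper()

def medical_sub_model(hypothesis):
    """Placeholder domain sub-model applied on top of the base result."""
    return hypothesis + " [medical-biased]"

def recognize(request, base_model, sub_models):
    """Use the sub-model only when the request includes a contextual
    indicator for its domain; otherwise use the base model alone."""
    hypothesis = base_model(request["audio"])
    domain = request.get("context")
    if domain in sub_models:
        return sub_models[domain](hypothesis)
    return hypothesis

subs = {"medical": medical_sub_model}
plain = recognize({"audio": "refill my statins"}, base_asr, subs)
biased = recognize({"audio": "refill my statins", "context": "medical"},
                   base_asr, subs)
```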
  • Publication number: 20230298574
    Abstract: A method for speech conversion includes obtaining a speech conversion model configured to convert input utterances of human speech directly into corresponding output utterances of synthesized speech. The method further includes receiving a speech conversion request including input audio data corresponding to an utterance spoken by a target speaker associated with atypical speech and a speaker identifier uniquely identifying the target speaker. The method includes activating, using the speaker identifier, a particular sub-model for biasing the speech conversion model to recognize a type of the atypical speech associated with the target speaker identified by the speaker identifier.
    Type: Application
    Filed: March 15, 2023
    Publication date: September 21, 2023
    Applicant: Google LLC
    Inventors: Fadi Biadsy, Youzheng Chen, Xia Zhang, Oleg Rybakov, Andrew M. Rosenberg, Pedro J. Moreno Mengibar
  • Publication number: 20230298565
    Abstract: A method includes receiving a set of training utterances each including a non-synthetic speech representation of a corresponding utterance, and for each training utterance, generating a corresponding synthetic speech representation by using a voice conversion model. The non-synthetic speech representation and the synthetic speech representation form a corresponding training utterance pair. At each of a plurality of output steps for each training utterance pair, the method also includes generating, for output by a speech recognition model, a first probability distribution over possible non-synthetic speech recognition hypotheses for the non-synthetic speech representation and a second probability distribution over possible synthetic speech recognition hypotheses for the synthetic speech representation.
    Type: Application
    Filed: April 25, 2022
    Publication date: September 21, 2023
    Applicant: Google LLC
    Inventors: Andrew M. Rosenberg, Gary Wang, Bhuvana Ramabhadran, Fadi Biadsy
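The training-pair idea in this abstract (score the non-synthetic and voice-converted versions of the same utterance and keep the two output distributions consistent) can be illustrated with a toy symmetric-KL term; the distributions and the choice of symmetric KL are assumptions:

```python
import math

def kl(p, q):
    """KL divergence between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# hypothetical recognizer outputs over three candidate hypotheses for
# the non-synthetic and voice-converted representations of one utterance
p_real = [0.7, 0.2, 0.1]
p_synth = [0.6, 0.3, 0.1]

# an assumed symmetric-KL consistency term that pushes the recognizer
# to score both representations of the utterance alike
consistency_loss = kl(p_real, p_synth) + kl(p_synth, p_real)
```

The term vanishes when the two distributions agree and grows as they diverge.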
  • Publication number: 20230267949
    Abstract: A method includes receiving a current spectrogram frame and reconstructing a phase of the current spectrogram frame by, for each corresponding committed spectrogram frame in a sequence of M number of committed spectrogram frames preceding the current spectrogram frame, obtaining a value of a committed phase of the corresponding committed spectrogram frame and estimating the phase of the current spectrogram frame based on a magnitude of the current spectrogram frame and the value of the committed phase of each corresponding committed spectrogram frame in the sequence of M number of committed spectrogram frames preceding the current spectrogram frame. The method also includes synthesizing, for the current spectrogram frame, a new time-domain audio waveform frame based on the estimated phase of the current spectrogram frame.
    Type: Application
    Filed: February 2, 2023
    Publication date: August 24, 2023
    Applicant: Google LLC
    Inventors: Oleg Rybakov, Liyang Jiang, Fadi Biadsy
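The phase-reconstruction step in this abstract can be sketched as extrapolating the current frame's phase from the M committed phases preceding it; the linear phase-advance model is an assumption, and the current magnitude is used here only as a silence gate:

```python
import math

def estimate_phase(committed_phases, magnitude):
    """Estimate the current frame's phase from the sequence of committed
    phases preceding it (assumed linear phase advance)."""
    if magnitude == 0.0:
        return 0.0
    # average phase advance between consecutive committed frames
    diffs = [b - a for a, b in zip(committed_phases, committed_phases[1:])]
    advance = sum(diffs) / len(diffs)
    # wrap the extrapolated phase into [0, 2*pi)
    return (committed_phases[-1] + advance) % (2 * math.pi)

phase = estimate_phase([0.0, 0.5, 1.0], magnitude=1.0)
```

The estimated phase would then feed synthesis of the new time-domain waveform frame for the current spectrogram frame.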
  • Publication number: 20230230572
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for end to end speech conversion are disclosed. In one aspect, a method includes the actions of receiving first audio data of a first utterance of one or more first terms spoken by a user. The actions further include providing the first audio data as an input to a model that is configured to receive first given audio data in a first voice and output second given audio data in a synthesized voice without performing speech recognition on the first given audio data. The actions further include receiving second audio data of a second utterance of the one or more first terms spoken in the synthesized voice. The actions further include providing, for output, the second audio data of the second utterance of the one or more first terms spoken in the synthesized voice.
    Type: Application
    Filed: March 23, 2023
    Publication date: July 20, 2023
    Applicant: Google LLC
    Inventors: Fadi Biadsy, Ron J. Weiss, Aleksandar Kracun, Pedro J. Moreno Mengibar
  • Publication number: 20230169983
    Abstract: A method includes receiving acoustic features of a first utterance spoken by a first user that speaks with typical speech and processing the acoustic features of the first utterance using a general speech recognizer to generate a first transcription of the first utterance. The operations also include analyzing the first transcription of the first utterance to identify one or more bias terms in the first transcription and biasing the alternative speech recognizer on the one or more bias terms identified in the first transcription. The operations also include receiving acoustic features of a second utterance spoken by a second user that speaks with atypical speech and processing, using the alternative speech recognizer biased on the one or more terms identified in the first transcription, the acoustic features of the second utterance to generate a second transcription of the second utterance.
    Type: Application
    Filed: January 25, 2023
    Publication date: June 1, 2023
    Applicant: Google LLC
    Inventors: Fadi Biadsy, Pedro J. Moreno Mengibar
  • Publication number: 20230122941
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for language models using domain-specific model components. In some implementations, context data for an utterance is obtained. A domain-specific model component is selected from among multiple domain-specific model components of a language model based on the non-linguistic context of the utterance. A score for a candidate transcription for the utterance is generated using the selected domain-specific model component and a baseline model component of the language model that is domain-independent. A transcription for the utterance is determined using the score, and the transcription is provided as output of an automated speech recognition system.
    Type: Application
    Filed: December 20, 2022
    Publication date: April 20, 2023
    Applicant: Google LLC
    Inventors: Fadi Biadsy, Diamantino Antonio Caseiro
  • Patent number: 11580994
    Abstract: A method includes receiving acoustic features of a first utterance spoken by a first user that speaks with typical speech and processing the acoustic features of the first utterance using a general speech recognizer to generate a first transcription of the first utterance. The operations also include analyzing the first transcription of the first utterance to identify one or more bias terms in the first transcription and biasing the alternative speech recognizer on the one or more bias terms identified in the first transcription. The operations also include receiving acoustic features of a second utterance spoken by a second user that speaks with atypical speech and processing, using the alternative speech recognizer biased on the one or more terms identified in the first transcription, the acoustic features of the second utterance to generate a second transcription of the second utterance.
    Type: Grant
    Filed: January 20, 2021
    Date of Patent: February 14, 2023
    Assignee: Google LLC
    Inventors: Fadi Biadsy, Pedro Jose Moreno Mengibar
  • Publication number: 20230037085
    Abstract: Implementations disclosed herein are directed to techniques for selectively enabling and/or disabling non-transient storage of one or more instances of assistant interaction data for turn(s) of a dialog between a user and an automated assistant. Implementations are additionally or alternatively directed to techniques for retroactive wiping of non-transiently stored assistant interaction data from previous assistant interaction(s).
    Type: Application
    Filed: January 7, 2021
    Publication date: February 2, 2023
    Inventors: Fadi Biadsy, Johan Schalkwyk, Jason Pelecanos
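The two behaviors in this abstract (selectively disabling non-transient storage per dialog turn, and retroactively wiping stored interaction data) can be sketched with a toy store; all class and field names are assumptions:

```python
class AssistantLog:
    """Toy store illustrating per-turn opt-out of non-transient storage
    and retroactive wiping of previously stored interaction data."""

    def __init__(self):
        self.turns = []

    def record(self, user_id, text, persist=True):
        # a turn with persist=False is handled transiently and never stored
        if persist:
            self.turns.append({"user": user_id, "text": text})

    def wipe(self, user_id):
        """Retroactively delete all stored turns for one user."""
        self.turns = [t for t in self.turns if t["user"] != user_id]

log = AssistantLog()
log.record("u1", "set a timer")
log.record("u1", "what's the weather", persist=False)  # transient turn
log.record("u2", "play jazz")
log.wipe("u1")
```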
  • Patent number: 11557289
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for language models using domain-specific model components. In some implementations, context data for an utterance is obtained. A domain-specific model component is selected from among multiple domain-specific model components of a language model based on the non-linguistic context of the utterance. A score for a candidate transcription for the utterance is generated using the selected domain-specific model component and a baseline model component of the language model that is domain-independent. A transcription for the utterance is determined using the score, and the transcription is provided as output of an automated speech recognition system.
    Type: Grant
    Filed: October 1, 2020
    Date of Patent: January 17, 2023
    Assignee: Google LLC
    Inventors: Fadi Biadsy, Diamantino Antonio Caseiro
  • Publication number: 20230009613
    Abstract: A method (800) of training a text-to-speech (TTS) model (108) includes obtaining training data (150) including reference input text (104) that includes a sequence of characters, a sequence of reference audio features (402) representative of the sequence of characters, and a sequence of reference phone labels (502) representative of distinct speech sounds of the reference audio features. For each of a plurality of time steps, the method includes generating a corresponding predicted audio feature (120) based on a respective portion of the reference input text for the time step and generating, using a phone label mapping network (510), a corresponding predicted phone label (520) associated with the predicted audio feature. The method also includes aligning the predicted phone label with the reference phone label to determine a corresponding predicted phone label loss (622) and updating the TTS model based on the corresponding predicted phone label loss.
    Type: Application
    Filed: December 13, 2019
    Publication date: January 12, 2023
    Applicant: Google LLC
    Inventors: Andrew Rosenberg, Bhuvana Ramabhadran, Fadi Biadsy, Yu Zhang
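The phone-label loss in this abstract can be illustrated with a crude per-step mismatch rate between predicted and reference phone labels; the padding-based alignment and the mismatch-count loss are stand-ins for the patent's phone label mapping network and loss:

```python
def phone_label_loss(predicted, reference):
    """Per-step mismatch rate between predicted and reference phone
    labels after padding to a common length; a crude stand-in for the
    alignment and loss computation in the abstract."""
    n = max(len(predicted), len(reference))
    pred = predicted + ["<pad>"] * (n - len(predicted))
    ref = reference + ["<pad>"] * (n - len(reference))
    mismatches = sum(p != r for p, r in zip(pred, ref))
    return mismatches / n

# one substituted phone out of four time steps
loss = phone_label_loss(["h", "eh", "l", "ow"], ["h", "ah", "l", "ow"])
```

This loss would then be used alongside the audio-feature loss to update the TTS model.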
  • Publication number: 20220414542
    Abstract: The present disclosure relates generally to machine learning. More particularly, the present disclosure relates to on-the-fly feeding of personalized, domain-specific, context-specific, and/or task-specific submodels as input to an existing base model which has already been loaded into a memory (e.g., loaded into an existing session associated with execution of a machine learning library).
    Type: Application
    Filed: June 28, 2022
    Publication date: December 29, 2022
    Inventors: Fadi Biadsy, Katrin Ruth Sarah Tomanek
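The core idea in this abstract (keep one base model loaded and feed per-request sub-model parameters as ordinary inputs, rather than reloading a new model per user or domain) can be sketched as follows; the linear form and parameter names are assumptions:

```python
class BaseModel:
    """Loaded once and kept in memory; per-request sub-model parameters
    arrive as ordinary inputs instead of triggering a model reload."""

    def __init__(self, weight):
        self.weight = weight

    def run(self, x, sub_weight=0.0):
        # sub_weight plays the role of an on-the-fly personalized or
        # domain-specific sub-model fed alongside the input
        return self.weight * x + sub_weight * x

base = BaseModel(weight=2.0)
generic = base.run(3.0)                        # base model alone
personalized = base.run(3.0, sub_weight=0.5)   # base plus fed-in sub-model
```

The same in-memory session serves both requests; only the fed-in sub-model parameters differ.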
  • Publication number: 20220310056
    Abstract: A method for speech conversion includes receiving, as input to an encoder of a speech conversion model, an input spectrogram corresponding to an utterance, the encoder including a stack of self-attention blocks. The method further includes generating, as output from the encoder, an encoded spectrogram and receiving, as input to a spectrogram decoder of the speech conversion model, the encoded spectrogram generated as output from the encoder. The method further includes generating, as output from the spectrogram decoder, an output spectrogram corresponding to a synthesized speech representation of the utterance.
    Type: Application
    Filed: March 16, 2022
    Publication date: September 29, 2022
    Applicant: Google LLC
    Inventors: Bhuvana Ramabhadran, Zhehuai Chen, Fadi Biadsy, Pedro J. Moreno Mengibar