Patents by Inventor Sean Matthew Shannon

Sean Matthew Shannon has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Learning the Joint Distribution of Two Sequences Using Little or No Paired Data

Publication number: 20250131273

Abstract: Provided is a noisy channel generative model of two sequences, for example text and speech, which enables uncovering the associations between the two modalities when limited paired data is available. To address the intractability of the exact model under a realistic data set-up, example aspects of the present disclosure include a variational inference approximation. To train this variational model with categorical data, a KL encoder loss approach is proposed which has connections to the wake-sleep algorithm.

Type: Application

Filed: September 27, 2023

Publication date: April 24, 2025

Inventors: Soroosh Mariooryad, Sean Matthew Shannon, Thomas Edward Bagby, Siyuan Ma, David Teh-Hwa Kao, Daisy Antonia Stanton, Eric Dean Battenberg, Russell John Wyatt Skerry-Ryan
Neural-Network-Based Text-to-Speech Model for Novel Speaker Generation

Publication number: 20240395239

Abstract: Systems and methods for text-to-speech with novel speakers can obtain text data and output audio data. The input text data may be input along with one or more speaker preferences. The speaker preferences can include speaker characteristics. The speaker preferences can be processed by a machine-learned model conditioned on a learned prior distribution to determine a speaker embedding. The speaker embedding can then be processed with the text data to generate an output that includes audio data descriptive of the text data spoken by a novel speaker.

Type: Application

Filed: August 6, 2024

Publication date: November 28, 2024

Inventors: Daisy Antonia Stanton, Sean Matthew Shannon, Soroosh Mariooryad, Russell John Wyatt Skerry-Ryan, Eric Dean Battenberg, Thomas Edward Bagby, David Teh-Hwa Kao
Variational Embedding Capacity in Expressive End-to-End Speech Synthesis

Publication number: 20240395238

Abstract: A method for estimating an embedding capacity includes receiving, at a deterministic reference encoder, a reference audio signal, and determining a reference embedding corresponding to the reference audio signal, the reference embedding having a corresponding embedding dimensionality. The method also includes measuring a first reconstruction loss as a function of the corresponding embedding dimensionality of the reference embedding and obtaining a variational embedding from a variational posterior. The variational embedding has a corresponding embedding dimensionality and a specified capacity. The method also includes measuring a second reconstruction loss as a function of the corresponding embedding dimensionality of the variational embedding and estimating a capacity of the reference embedding by comparing the first measured reconstruction loss for the reference embedding relative to the second measured reconstruction loss for the variational embedding having the specified capacity.

Type: Application

Filed: August 7, 2024

Publication date: November 28, 2024

Applicant: Google LLC

Inventors: Eric Dean Battenberg, Daisy Stanton, Russell John Wyatt Skerry-Ryan, Soroosh Mariooryad, David Teh-hwa Kao, Thomas Edward Bagby, Sean Matthew Shannon
Neural-network-based text-to-speech model for novel speaker generation

Patent number: 12087275

Abstract: Systems and methods for text-to-speech with novel speakers can obtain text data and output audio data. The input text data may be input along with one or more speaker preferences. The speaker preferences can include speaker characteristics. The speaker preferences can be processed by a machine-learned model conditioned on a learned prior distribution to determine a speaker embedding. The speaker embedding can then be processed with the text data to generate an output that includes audio data descriptive of the text data spoken by a novel speaker.

Type: Grant

Filed: February 16, 2022

Date of Patent: September 10, 2024

Assignee: GOOGLE LLC

Inventors: Daisy Antonia Stanton, Sean Matthew Shannon, Soroosh Mariooryad, Russell John-Wyatt Skerry-Ryan, Eric Dean Battenberg, Thomas Edward Bagby, David Teh-Hwa Kao
Variational embedding capacity in expressive end-to-end speech synthesis

Patent number: 12067969

Abstract: A method for estimating an embedding capacity includes receiving, at a deterministic reference encoder, a reference audio signal, and determining a reference embedding corresponding to the reference audio signal, the reference embedding having a corresponding embedding dimensionality. The method also includes measuring a first reconstruction loss as a function of the corresponding embedding dimensionality of the reference embedding and obtaining a variational embedding from a variational posterior. The variational embedding has a corresponding embedding dimensionality and a specified capacity. The method also includes measuring a second reconstruction loss as a function of the corresponding embedding dimensionality of the variational embedding and estimating a capacity of the reference embedding by comparing the first measured reconstruction loss for the reference embedding relative to the second measured reconstruction loss for the variational embedding having the specified capacity.

Type: Grant

Filed: April 18, 2023

Date of Patent: August 20, 2024

Assignee: Google LLC

Inventors: Eric Dean Battenberg, Daisy Stanton, Russell John Wyatt Skerry-Ryan, Soroosh Mariooryad, David Teh-Hwa Kao, Thomas Edward Bagby, Sean Matthew Shannon
ENCODER-DECODER MODELS FOR SEQUENCE TO SEQUENCE MAPPING

Publication number: 20230410796

Abstract: Methods, systems, and apparatus for performing speech recognition. In some implementations, acoustic data representing an utterance is obtained. The acoustic data corresponds to time steps in a series of time steps. One or more computers process scores indicative of the acoustic data using a recurrent neural network to generate a sequence of outputs. The sequence of outputs indicates a likely output label from among a predetermined set of output labels. The predetermined set of output labels includes output labels that respectively correspond to different linguistic units and to a placeholder label that does not represent a classification of acoustic data. The recurrent neural network is configured to use an output label indicated for a previous time step to determine an output label for the current time step. The generated sequence of outputs is processed to generate a transcription of the utterance, and the transcription of the utterance is provided.

Type: Application

Filed: September 1, 2023

Publication date: December 21, 2023

Applicant: GOOGLE LLC

Inventors: Hasim Sak, Sean Matthew Shannon
Encoder-decoder models for sequence to sequence mapping

Patent number: 11776531

Abstract: Methods, systems, and apparatus for performing speech recognition. In some implementations, acoustic data representing an utterance is obtained. The acoustic data corresponds to time steps in a series of time steps. One or more computers process scores indicative of the acoustic data using a recurrent neural network to generate a sequence of outputs. The sequence of outputs indicates a likely output label from among a predetermined set of output labels. The predetermined set of output labels includes output labels that respectively correspond to different linguistic units and to a placeholder label that does not represent a classification of acoustic data. The recurrent neural network is configured to use an output label indicated for a previous time step to determine an output label for the current time step. The generated sequence of outputs is processed to generate a transcription of the utterance, and the transcription of the utterance is provided.

Type: Grant

Filed: May 28, 2020

Date of Patent: October 3, 2023

Assignee: Google LLC

Inventors: Hasim Sak, Sean Matthew Shannon
Controlling Expressivity In End-to-End Speech Synthesis Systems

Publication number: 20230274728

Abstract: A system for generating an output audio signal includes a context encoder, a text-prediction network, and a text-to-speech (TTS) model. The context encoder is configured to receive one or more context features associated with current input text and process the one or more context features to generate a context embedding associated with the current input text. The text-prediction network is configured to process the current input text and the context embedding to predict, as output, a style embedding for the current input text. The style embedding specifies a specific prosody and/or style for synthesizing the current input text into expressive speech. The TTS model is configured to process the current input text and the style embedding to generate an output audio signal of expressive speech of the current input text. The output audio signal has the specific prosody and/or style specified by the style embedding.

Type: Application

Filed: May 9, 2023

Publication date: August 31, 2023

Applicant: Google LLC

Inventors: Daisy Stanton, Eric Dean Battenberg, Russell John Wyatt Skerry-Ryan, Soroosh Mariooryad, David Teh-hwa Kao, Thomas Edward Bagby, Sean Matthew Shannon
Variational Embedding Capacity in Expressive End-to-End Speech Synthesis

Publication number: 20230260504

Abstract: A method for estimating an embedding capacity includes receiving, at a deterministic reference encoder, a reference audio signal, and determining a reference embedding corresponding to the reference audio signal, the reference embedding having a corresponding embedding dimensionality. The method also includes measuring a first reconstruction loss as a function of the corresponding embedding dimensionality of the reference embedding and obtaining a variational embedding from a variational posterior. The variational embedding has a corresponding embedding dimensionality and a specified capacity. The method also includes measuring a second reconstruction loss as a function of the corresponding embedding dimensionality of the variational embedding and estimating a capacity of the reference embedding by comparing the first measured reconstruction loss for the reference embedding relative to the second measured reconstruction loss for the variational embedding having the specified capacity.

Type: Application

Filed: April 18, 2023

Publication date: August 17, 2023

Applicant: Google LLC

Inventors: Eric Dean Battenberg, Daisy Stanton, Russell John Wyatt Skerry-Ryan, Soroosh Mariooryad, David Teh-hwa Kao, Thomas Edward Bagby, Sean Matthew Shannon
Neural-Network-Based Text-to-Speech Model for Novel Speaker Generation

Publication number: 20230206898

Abstract: Systems and methods for text-to-speech with novel speakers can obtain text data and output audio data. The input text data may be input along with one or more speaker preferences. The speaker preferences can include speaker characteristics. The speaker preferences can be processed by a machine-learned model conditioned on a learned prior distribution to determine a speaker embedding. The speaker embedding can then be processed with the text data to generate an output that includes audio data descriptive of the text data spoken by a novel speaker.

Type: Application

Filed: February 16, 2022

Publication date: June 29, 2023

Inventors: Daisy Antonia Stanton, Sean Matthew Shannon, Soroosh Mariooryad, Russell John-Wyatt Skerry-Ryan, Eric Dean Battenberg, Thomas Edward Bagby, David Teh-Hwa Kao
Controlling expressivity in end-to-end speech synthesis systems

Patent number: 11676573

Abstract: A system for generating an output audio signal includes a context encoder, a text-prediction network, and a text-to-speech (TTS) model. The context encoder is configured to receive one or more context features associated with current input text and process the one or more context features to generate a context embedding associated with the current input text. The text-prediction network is configured to process the current input text and the context embedding to predict, as output, a style embedding for the current input text. The style embedding specifies a specific prosody and/or style for synthesizing the current input text into expressive speech. The TTS model is configured to process the current input text and the style embedding to generate an output audio signal of expressive speech of the current input text. The output audio signal has the specific prosody and/or style specified by the style embedding.

Type: Grant

Filed: July 16, 2020

Date of Patent: June 13, 2023

Assignee: Google LLC

Inventors: Daisy Stanton, Eric Dean Battenberg, Russell John Wyatt Skerry-Ryan, Soroosh Mariooryad, David Teh-Hwa Kao, Thomas Edward Bagby, Sean Matthew Shannon
Unified endpointer using multitask and multidomain learning

Patent number: 11676625

Abstract: A method for training an endpointer model includes short-form speech utterances and long-form speech utterances. The method also includes providing a short-form speech utterance as input to a shared neural network, the shared neural network configured to learn shared hidden representations suitable for both voice activity detection (VAD) and end-of-query (EOQ) detection. The method also includes generating, using a VAD classifier, a sequence of predicted VAD labels and determining a VAD loss by comparing the sequence of predicted VAD labels to a corresponding sequence of reference VAD labels. The method also includes, generating, using an EOQ classifier, a sequence of predicted EOQ labels and determining an EOQ loss by comparing the sequence of predicted EOQ labels to a corresponding sequence of reference EOQ labels. The method also includes training, using a cross-entropy criterion, the endpointer model based on the VAD loss and the EOQ loss.

Type: Grant

Filed: January 20, 2021

Date of Patent: June 13, 2023

Assignee: Google LLC

Inventors: Shuo-Yiin Chang, Bo Li, Gabor Simko, Maria Carolina Parada San Martin, Sean Matthew Shannon
Variational embedding capacity in expressive end-to-end speech synthesis

Patent number: 11646010

Abstract: A method for estimating an embedding capacity includes receiving, at a deterministic reference encoder, a reference audio signal, and determining a reference embedding corresponding to the reference audio signal, the reference embedding having a corresponding embedding dimensionality. The method also includes measuring a first reconstruction loss as a function of the corresponding embedding dimensionality of the reference embedding and obtaining a variational embedding from a variational posterior. The variational embedding has a corresponding embedding dimensionality and a specified capacity. The method also includes measuring a second reconstruction loss as a function of the corresponding embedding dimensionality of the variational embedding and estimating a capacity of the reference embedding by comparing the first measured reconstruction loss for the reference embedding relative to the second measured reconstruction loss for the variational embedding having the specified capacity.

Type: Grant

Filed: December 9, 2021

Date of Patent: May 9, 2023

Assignee: Google LLC

Inventors: Eric Dean Battenberg, Daisy Stanton, Russell John Wyatt Skerry-Ryan, Soroosh Mariooryad, David Teh-Hwa Kao, Thomas Edward Bagby, Sean Matthew Shannon
End of query detection

Patent number: 11551709

Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for detecting an end of a query are disclosed. In one aspect, a method includes the actions of receiving audio data that corresponds to an utterance spoken by a user. The actions further include applying, to the audio data, an end of query model. The actions further include determining the confidence score that reflects a likelihood that the utterance is a complete utterance. The actions further include comparing the confidence score that reflects the likelihood that the utterance is a complete utterance to a confidence score threshold. The actions further include determining whether the utterance is likely complete or likely incomplete. The actions further include providing, for output, an instruction to (i) maintain a microphone that is receiving the utterance in an active state or (ii) deactivate the microphone that is receiving the utterance.

Type: Grant

Filed: January 31, 2020

Date of Patent: January 10, 2023

Assignee: Google LLC

Inventors: Gabor Simko, Maria Carolina Parada San Martin, Sean Matthew Shannon
Variational Embedding Capacity in Expressive End-to-End Speech Synthesis

Publication number: 20220101826

Abstract: A method for estimating an embedding capacity includes receiving, at a deterministic reference encoder, a reference audio signal, and determining a reference embedding corresponding to the reference audio signal, the reference embedding having a corresponding embedding dimensionality. The method also includes measuring a first reconstruction loss as a function of the corresponding embedding dimensionality of the reference embedding and obtaining a variational embedding from a variational posterior. The variational embedding has a corresponding embedding dimensionality and a specified capacity. The method also includes measuring a second reconstruction loss as a function of the corresponding embedding dimensionality of the variational embedding and estimating a capacity of the reference embedding by comparing the first measured reconstruction loss for the reference embedding relative to the second measured reconstruction loss for the variational embedding having the specified capacity.

Type: Application

Filed: December 9, 2021

Publication date: March 31, 2022

Applicant: Google LLC

Inventors: Eric Dean Battenberg, Daisy Stanton, Russell John Wyatt Skerry-Ryan, Soroosh Mariooryad, David Teh-Hwa Kao, Thomas Edward Bagby, Sean Matthew Shannon
Variational embedding capacity in expressive end-to-end speech synthesis

Patent number: 11222621

Abstract: A method for estimating an embedding capacity includes receiving, at a deterministic reference encoder, a reference audio signal, and determining a reference embedding corresponding to the reference audio signal, the reference embedding having a corresponding embedding dimensionality. The method also includes measuring a first reconstruction loss as a function of the corresponding embedding dimensionality of the reference embedding and obtaining a variational embedding from a variational posterior. The variational embedding has a corresponding embedding dimensionality and a specified capacity. The method also includes measuring a second reconstruction loss as a function of the corresponding embedding dimensionality of the variational embedding and estimating a capacity of the reference embedding by comparing the first measured reconstruction loss for the reference embedding relative to the second measured reconstruction loss for the variational embedding having the specified capacity.

Type: Grant

Filed: May 20, 2020

Date of Patent: January 11, 2022

Assignee: Google LLC

Inventors: Eric Dean Battenberg, Daisy Stanton, Russell John Wyatt Skerry-Ryan, Soroosh Mariooryad, David Teh-hwa Kao, Thomas Edward Bagby, Sean Matthew Shannon
Unified Endpointer Using Multitask and Multidomain Learning

Publication number: 20210142174

Abstract: A method for training an endpointer model includes short-form speech utterances and long-form speech utterances. The method also includes providing a short-form speech utterance as input to a shared neural network, the shared neural network configured to learn shared hidden representations suitable for both voice activity detection (VAD) and end-of-query (EOQ) detection. The method also includes generating, using a VAD classifier, a sequence of predicted VAD labels and determining a VAD loss by comparing the sequence of predicted VAD labels to a corresponding sequence of reference VAD labels. The method also includes, generating, using an EOQ classifier, a sequence of predicted EOQ labels and determining an EOQ loss by comparing the sequence of predicted EOQ labels to a corresponding sequence of reference EOQ labels. The method also includes training, using a cross-entropy criterion, the endpointer model based on the VAD loss and the EOQ loss.

Type: Application

Filed: January 20, 2021

Publication date: May 13, 2021

Applicant: Google LLC

Inventors: Shuo-yiin Chang, Bo Li, Gabor Simko, Maria Corolina Parada San Martin, Sean Matthew Shannon
Unified endpointer using multitask and multidomain learning

Patent number: 10929754

Abstract: A method for training an endpointer model includes short-form speech utterances and long-form speech utterances. The method also includes providing a short-form speech utterance as input to a shared neural network, the shared neural network configured to learn shared hidden representations suitable for both voice activity detection (VAD) and end-of-query (EOQ) detection. The method also includes generating, using a VAD classifier, a sequence of predicted VAD labels and determining a VAD loss by comparing the sequence of predicted VAD labels to a corresponding sequence of reference VAD labels. The method also includes, generating, using an EOQ classifier, a sequence of predicted EOQ labels and determining an EOQ loss by comparing the sequence of predicted EOQ labels to a corresponding sequence of reference EOQ labels. The method also includes training, using a cross-entropy criterion, the endpointer model based on the VAD loss and the EOQ loss.

Type: Grant

Filed: December 11, 2019

Date of Patent: February 23, 2021

Assignee: Google LLC

Inventors: Shuo-yiin Chang, Bo Li, Gabor Simko, Maria Carolina Parada San Martin, Sean Matthew Shannon
Controlling Expressivity In End-to-End Speech Synthesis Systems

Publication number: 20210035551

Abstract: A system for generating an output audio signal includes a context encoder, a text-prediction network, and a text-to-speech (TTS) model. The context encoder is configured to receive one or more context features associated with current input text and process the one or more context features to generate a context embedding associated with the current input text. The text-prediction network is configured to process the current input text and the context embedding to predict, as output, a style embedding for the current input text. The style embedding specifies a specific prosody and/or style for synthesizing the current input text into expressive speech The TTS model is configured to process the current input text and the style embedding to generate an output audio signal of expressive speech of the current input text. The output audio signal has the specific prosody and/or style specified by the style embedding.

Type: Application

Filed: July 16, 2020

Publication date: February 4, 2021

Applicant: Google LLC

Inventors: Daisy Stanton, Eric Dean Battenberg, Russell John Wyatt Skerry-Ryan, Soroosh Mariooryad, David Teh-Hwa Kao, Thomas Edward Bagby, Sean Matthew Shannon
Variational Embedding Capacity in Expressive End-to-End Speech Synthesis

Publication number: 20200372897

Abstract: A method for estimating an embedding capacity includes receiving, at a deterministic reference encoder, a reference audio signal, and determining a reference embedding corresponding to the reference audio signal, the reference embedding having a corresponding embedding dimensionality. The method also includes measuring a first reconstruction loss as a function of the corresponding embedding dimensionality of the reference embedding and obtaining a variational embedding from a variational posterior. The variational embedding has a corresponding embedding dimensionality and a specified capacity. The method also includes measuring a second reconstruction loss as a function of the corresponding embedding dimensionality of the variational embedding and estimating a capacity of the reference embedding by comparing the first measured reconstruction loss for the reference embedding relative to the second measured reconstruction loss for the variational embedding having the specified capacity.

Type: Application

Filed: May 20, 2020

Publication date: November 26, 2020

Applicant: Google LLC

Inventors: Eric Dean Battenberg, Daisy Stanton, Russell John Wyatt Skerry-Ryan, Soroosh Mariooryad, David Teh-hwa Kao, Thomas Edward Bagby, Sean Matthew Shannon

1 2 next