Patents by Inventor John Wyatt
John Wyatt has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250131273
Abstract: Provided is a noisy channel generative model of two sequences, for example text and speech, which enables uncovering the associations between the two modalities when limited paired data is available. To address the intractability of the exact model under a realistic data set-up, example aspects of the present disclosure include a variational inference approximation. To train this variational model with categorical data, a KL encoder loss approach is proposed which has connections to the wake-sleep algorithm.
Type: Application
Filed: September 27, 2023
Publication date: April 24, 2025
Inventors: Soroosh Mariooryad, Sean Matthew Shannon, Thomas Edward Bagby, Siyuan Ma, David Teh-Hwa Kao, Daisy Antonia Stanton, Eric Dean Battenberg, Russell John Wyatt Skerry-Ryan
-
Publication number: 20250025349
Abstract: Embodiments of negative pressure wound therapy systems and methods for operating the systems are disclosed. In some embodiments, a system includes a pump assembly, canister, and a wound dressing configured to be positioned over a wound. The pump assembly, canister, and the wound dressing can be fluidically connected to facilitate delivery of negative pressure to a wound. The pump assembly can present graphical user interface screens for controlling and monitoring delivery of negative pressure. The system can be configured to efficiently deliver negative pressure and to detect and indicate presence of certain conditions, such as low pressure, high pressure, leak, canister full, and the like. Monitoring and detection of operating condition can be performed by measuring one or more operational parameters, such as pressure, flow rate, and the like.
Type: Application
Filed: October 7, 2024
Publication date: January 23, 2025
Inventors: Alex Fowler, William W. Gregory, William Joseph Jaecklein, Kathryn Ann Leigh, Paul N. Minor, Michael Mosholder, Felix C. Quintanar, John P. Racette, Christopher Rouseff, Matthew Smith, W. Len Smith, John Wyatt, Annaliese Yeaman
-
Publication number: 20240404506
Abstract: A method includes receiving an input text sequence to be synthesized into speech in a first language and obtaining a speaker embedding, the speaker embedding specifying specific voice characteristics of a target speaker for synthesizing the input text sequence into speech that clones a voice of the target speaker. The target speaker includes a native speaker of a second language different than the first language. The method also includes generating, using a text-to-speech (TTS) model, an output audio feature representation of the input text by processing the input text sequence and the speaker embedding. The output audio feature representation includes the voice characteristics of the target speaker specified by the speaker embedding.
Type: Application
Filed: August 8, 2024
Publication date: December 5, 2024
Applicant: Google LLC
Inventors: Yu Zhang, Ron J. Weiss, Byungha Chun, Yonghui Wu, Zhifeng Chen, Russell John Wyatt Skerry-Ryan, Ye Jia, Andrew M. Rosenberg, Bhuvana Ramabhadran
-
Publication number: 20240395238
Abstract: A method for estimating an embedding capacity includes receiving, at a deterministic reference encoder, a reference audio signal, and determining a reference embedding corresponding to the reference audio signal, the reference embedding having a corresponding embedding dimensionality. The method also includes measuring a first reconstruction loss as a function of the corresponding embedding dimensionality of the reference embedding and obtaining a variational embedding from a variational posterior. The variational embedding has a corresponding embedding dimensionality and a specified capacity. The method also includes measuring a second reconstruction loss as a function of the corresponding embedding dimensionality of the variational embedding and estimating a capacity of the reference embedding by comparing the first measured reconstruction loss for the reference embedding relative to the second measured reconstruction loss for the variational embedding having the specified capacity.
Type: Application
Filed: August 7, 2024
Publication date: November 28, 2024
Applicant: Google LLC
Inventors: Eric Dean Battenberg, Daisy Stanton, Russell John Wyatt Skerry-Ryan, Soroosh Mariooryad, David Teh-hwa Kao, Thomas Edward Bagby, Sean Matthew Shannon
-
Publication number: 20240395239
Abstract: Systems and methods for text-to-speech with novel speakers can obtain text data and output audio data. The input text data may be input along with one or more speaker preferences. The speaker preferences can include speaker characteristics. The speaker preferences can be processed by a machine-learned model conditioned on a learned prior distribution to determine a speaker embedding. The speaker embedding can then be processed with the text data to generate an output that includes audio data descriptive of the text data spoken by a novel speaker.
Type: Application
Filed: August 6, 2024
Publication date: November 28, 2024
Inventors: Daisy Antonia Stanton, Sean Matthew Shannon, Soroosh Mariooryad, Russell John Wyatt Skerry-Ryan, Eric Dean Battenberg, Thomas Edward Bagby, David Teh-Hwa Kao
-
Publication number: 20240386885
Abstract: A method includes receiving an input sequence of speech features characterizing a spoken prompt. The method also includes generating a corresponding sequence of audio encodings using an audio encoder of a spoken language model. Without applying any intermediary cross-attention to the sequence of audio encoding between the audio encoder and a language model decoder of the spoken language model, the method includes processing the sequence of audio encodings generated by the audio encoder using the language model decoder to generate an output sequence of speech features characterizing a continuation of the spoken prompt.
Type: Application
Filed: May 13, 2024
Publication date: November 21, 2024
Applicant: Google LLC
Inventors: Michelle Dana Tadmor, Eliya Nachmani, Alon Levkovitch, Julian Salazar, Chulayuth Asawaroengchai, Russell John Wyatt Skerry-Ryan, Soroosh Mariooryad
-
Patent number: 12148444
Abstract: Methods, systems, and computer program products for generating, from an input character sequence, an output sequence of audio data representing the input character sequence. The output sequence of audio data includes a respective audio output sample for each of a number of time steps. One example method includes, for each of the time steps: generating a mel-frequency spectrogram for the time step by processing a representation of a respective portion of the input character sequence using a decoder neural network; generating a probability distribution over a plurality of possible audio output samples for the time step by processing the mel-frequency spectrogram for the time step using a vocoder neural network; and selecting the audio output sample for the time step from the possible audio output samples in accordance with the probability distribution.
Type: Grant
Filed: April 5, 2021
Date of Patent: November 19, 2024
Assignee: Google LLC
Inventors: Yonghui Wu, Jonathan Shen, Ruoming Pang, Ron J. Weiss, Michael Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, Russell John Wyatt Skerry-Ryan, Ryan M. Rifkin, Ioannis Agiomyrgiannakis
-
Patent number: 12133789
Abstract: Embodiments of negative pressure wound therapy systems and methods for operating the systems are disclosed. In some embodiments, a system includes a pump assembly, canister, and a wound dressing configured to be positioned over a wound. The pump assembly, canister, and the wound dressing can be fluidically connected to facilitate delivery of negative pressure to a wound. The pump assembly can present graphical user interface screens for controlling and monitoring delivery of negative pressure. The system can be configured to efficiently deliver negative pressure and to detect and indicate presence of certain conditions, such as low pressure, high pressure, leak, canister full, and the like. Monitoring and detection of operating condition can be performed by measuring one or more operational parameters, such as pressure, flow rate, and the like.
Type: Grant
Filed: March 30, 2020
Date of Patent: November 5, 2024
Assignee: Smith & Nephew, Inc.
Inventors: William W. Gregory, William Joseph Jaecklein, Kathryn Ann Leigh, Paul N. Minor, Michael Mosholder, Felix C. Quintanar, John P. Racette, Christopher Rouseff, Matthew Smith, W. Len Smith, John Wyatt
-
Patent number: 12087273
Abstract: A method includes receiving an input text sequence to be synthesized into speech in a first language and obtaining a speaker embedding, the speaker embedding specifying specific voice characteristics of a target speaker for synthesizing the input text sequence into speech that clones a voice of the target speaker. The target speaker includes a native speaker of a second language different than the first language. The method also includes generating, using a text-to-speech (TTS) model, an output audio feature representation of the input text by processing the input text sequence and the speaker embedding. The output audio feature representation includes the voice characteristics of the target speaker specified by the speaker embedding.
Type: Grant
Filed: January 30, 2023
Date of Patent: September 10, 2024
Assignee: Google LLC
Inventors: Yu Zhang, Ron J. Weiss, Byungha Chun, Yonghui Wu, Zhifeng Chen, Russell John Wyatt Skerry-Ryan, Ye Jia, Andrew M. Rosenberg, Bhuvana Ramabhadran
-
Patent number: 12087275
Abstract: Systems and methods for text-to-speech with novel speakers can obtain text data and output audio data. The input text data may be input along with one or more speaker preferences. The speaker preferences can include speaker characteristics. The speaker preferences can be processed by a machine-learned model conditioned on a learned prior distribution to determine a speaker embedding. The speaker embedding can then be processed with the text data to generate an output that includes audio data descriptive of the text data spoken by a novel speaker.
Type: Grant
Filed: February 16, 2022
Date of Patent: September 10, 2024
Assignee: GOOGLE LLC
Inventors: Daisy Antonia Stanton, Sean Matthew Shannon, Soroosh Mariooryad, Russell John-Wyatt Skerry-Ryan, Eric Dean Battenberg, Thomas Edward Bagby, David Teh-Hwa Kao
-
Patent number: 12067969
Abstract: A method for estimating an embedding capacity includes receiving, at a deterministic reference encoder, a reference audio signal, and determining a reference embedding corresponding to the reference audio signal, the reference embedding having a corresponding embedding dimensionality. The method also includes measuring a first reconstruction loss as a function of the corresponding embedding dimensionality of the reference embedding and obtaining a variational embedding from a variational posterior. The variational embedding has a corresponding embedding dimensionality and a specified capacity. The method also includes measuring a second reconstruction loss as a function of the corresponding embedding dimensionality of the variational embedding and estimating a capacity of the reference embedding by comparing the first measured reconstruction loss for the reference embedding relative to the second measured reconstruction loss for the variational embedding having the specified capacity.
Type: Grant
Filed: April 18, 2023
Date of Patent: August 20, 2024
Assignee: Google LLC
Inventors: Eric Dean Battenberg, Daisy Stanton, Russell John Wyatt Skerry-Ryan, Soroosh Mariooryad, David Teh-Hwa Kao, Thomas Edward Bagby, Sean Matthew Shannon
-
Publication number: 20230274728
Abstract: A system for generating an output audio signal includes a context encoder, a text-prediction network, and a text-to-speech (TTS) model. The context encoder is configured to receive one or more context features associated with current input text and process the one or more context features to generate a context embedding associated with the current input text. The text-prediction network is configured to process the current input text and the context embedding to predict, as output, a style embedding for the current input text. The style embedding specifies a specific prosody and/or style for synthesizing the current input text into expressive speech. The TTS model is configured to process the current input text and the style embedding to generate an output audio signal of expressive speech of the current input text. The output audio signal has the specific prosody and/or style specified by the style embedding.
Type: Application
Filed: May 9, 2023
Publication date: August 31, 2023
Applicant: Google LLC
Inventors: Daisy Stanton, Eric Dean Battenberg, Russell John Wyatt Skerry-Ryan, Soroosh Mariooryad, David Teh-hwa Kao, Thomas Edward Bagby, Sean Matthew Shannon
-
Publication number: 20230260504
Abstract: A method for estimating an embedding capacity includes receiving, at a deterministic reference encoder, a reference audio signal, and determining a reference embedding corresponding to the reference audio signal, the reference embedding having a corresponding embedding dimensionality. The method also includes measuring a first reconstruction loss as a function of the corresponding embedding dimensionality of the reference embedding and obtaining a variational embedding from a variational posterior. The variational embedding has a corresponding embedding dimensionality and a specified capacity. The method also includes measuring a second reconstruction loss as a function of the corresponding embedding dimensionality of the variational embedding and estimating a capacity of the reference embedding by comparing the first measured reconstruction loss for the reference embedding relative to the second measured reconstruction loss for the variational embedding having the specified capacity.
Type: Application
Filed: April 18, 2023
Publication date: August 17, 2023
Applicant: Google LLC
Inventors: Eric Dean Battenberg, Daisy Stanton, Russell John Wyatt Skerry-Ryan, Soroosh Mariooryad, David Teh-hwa Kao, Thomas Edward Bagby, Sean Matthew Shannon
-
Publication number: 20230206898
Abstract: Systems and methods for text-to-speech with novel speakers can obtain text data and output audio data. The input text data may be input along with one or more speaker preferences. The speaker preferences can include speaker characteristics. The speaker preferences can be processed by a machine-learned model conditioned on a learned prior distribution to determine a speaker embedding. The speaker embedding can then be processed with the text data to generate an output that includes audio data descriptive of the text data spoken by a novel speaker.
Type: Application
Filed: February 16, 2022
Publication date: June 29, 2023
Inventors: Daisy Antonia Stanton, Sean Matthew Shannon, Soroosh Mariooryad, Russell John-Wyatt Skerry-Ryan, Eric Dean Battenberg, Thomas Edward Bagby, David Teh-Hwa Kao
-
Patent number: 11676573
Abstract: A system for generating an output audio signal includes a context encoder, a text-prediction network, and a text-to-speech (TTS) model. The context encoder is configured to receive one or more context features associated with current input text and process the one or more context features to generate a context embedding associated with the current input text. The text-prediction network is configured to process the current input text and the context embedding to predict, as output, a style embedding for the current input text. The style embedding specifies a specific prosody and/or style for synthesizing the current input text into expressive speech. The TTS model is configured to process the current input text and the style embedding to generate an output audio signal of expressive speech of the current input text. The output audio signal has the specific prosody and/or style specified by the style embedding.
Type: Grant
Filed: July 16, 2020
Date of Patent: June 13, 2023
Assignee: Google LLC
Inventors: Daisy Stanton, Eric Dean Battenberg, Russell John Wyatt Skerry-Ryan, Soroosh Mariooryad, David Teh-Hwa Kao, Thomas Edward Bagby, Sean Matthew Shannon
-
Publication number: 20230178068
Abstract: A method includes receiving an input text sequence to be synthesized into speech in a first language and obtaining a speaker embedding, the speaker embedding specifying specific voice characteristics of a target speaker for synthesizing the input text sequence into speech that clones a voice of the target speaker. The target speaker includes a native speaker of a second language different than the first language. The method also includes generating, using a text-to-speech (TTS) model, an output audio feature representation of the input text by processing the input text sequence and the speaker embedding. The output audio feature representation includes the voice characteristics of the target speaker specified by the speaker embedding.
Type: Application
Filed: January 30, 2023
Publication date: June 8, 2023
Applicant: Google LLC
Inventors: Yu Zhang, Ron J. Weiss, Byungha Chun, Yonghui Wu, Zhifeng Chen, Russell John Wyatt Skerry-Ryan, Ye Jia, Andrew M. Rosenberg, Bhuvana Ramabhadran
-
Patent number: 11646010
Abstract: A method for estimating an embedding capacity includes receiving, at a deterministic reference encoder, a reference audio signal, and determining a reference embedding corresponding to the reference audio signal, the reference embedding having a corresponding embedding dimensionality. The method also includes measuring a first reconstruction loss as a function of the corresponding embedding dimensionality of the reference embedding and obtaining a variational embedding from a variational posterior. The variational embedding has a corresponding embedding dimensionality and a specified capacity. The method also includes measuring a second reconstruction loss as a function of the corresponding embedding dimensionality of the variational embedding and estimating a capacity of the reference embedding by comparing the first measured reconstruction loss for the reference embedding relative to the second measured reconstruction loss for the variational embedding having the specified capacity.
Type: Grant
Filed: December 9, 2021
Date of Patent: May 9, 2023
Assignee: Google LLC
Inventors: Eric Dean Battenberg, Daisy Stanton, Russell John Wyatt Skerry-Ryan, Soroosh Mariooryad, David Teh-Hwa Kao, Thomas Edward Bagby, Sean Matthew Shannon
-
Patent number: 11580952
Abstract: A method includes receiving an input text sequence to be synthesized into speech in a first language and obtaining a speaker embedding, the speaker embedding specifying specific voice characteristics of a target speaker for synthesizing the input text sequence into speech that clones a voice of the target speaker. The target speaker includes a native speaker of a second language different than the first language. The method also includes generating, using a text-to-speech (TTS) model, an output audio feature representation of the input text by processing the input text sequence and the speaker embedding. The output audio feature representation includes the voice characteristics of the target speaker specified by the speaker embedding.
Type: Grant
Filed: April 22, 2020
Date of Patent: February 14, 2023
Assignee: Google LLC
Inventors: Yu Zhang, Ron J. Weiss, Byungha Chun, Yonghui Wu, Zhifeng Chen, Russell John Wyatt Skerry-Ryan, Ye Jia, Andrew M. Rosenberg, Bhuvana Ramabhadran
-
Patent number: 11335321
Abstract: A method of building a text-to-speech (TTS) system from a small amount of speech data includes receiving a first plurality of recorded speech samples from an assortment of speakers and a second plurality of recorded speech samples from a target speaker where the assortment of speakers does not include the target speaker. The method further includes training a TTS model using the first plurality of recorded speech samples from the assortment of speakers. Here, the trained TTS model is configured to output synthetic speech as an audible representation of a text input. The method also includes re-training the trained TTS model using the second plurality of recorded speech samples from the target speaker combined with the first plurality of recorded speech samples from the assortment of speakers. Here, the re-trained TTS model is configured to output synthetic speech resembling speaking characteristics of the target speaker.
Type: Grant
Filed: August 28, 2020
Date of Patent: May 17, 2022
Assignee: Google LLC
Inventors: Ye Jia, Byungha Chun, Yusuke Oda, Norman Casagrande, Tejas Iyer, Fan Luo, Russell John Wyatt Skerry-Ryan, Jonathan Shen, Yonghui Wu, Yu Zhang
-
Publication number: 20220101826
Abstract: A method for estimating an embedding capacity includes receiving, at a deterministic reference encoder, a reference audio signal, and determining a reference embedding corresponding to the reference audio signal, the reference embedding having a corresponding embedding dimensionality. The method also includes measuring a first reconstruction loss as a function of the corresponding embedding dimensionality of the reference embedding and obtaining a variational embedding from a variational posterior. The variational embedding has a corresponding embedding dimensionality and a specified capacity. The method also includes measuring a second reconstruction loss as a function of the corresponding embedding dimensionality of the variational embedding and estimating a capacity of the reference embedding by comparing the first measured reconstruction loss for the reference embedding relative to the second measured reconstruction loss for the variational embedding having the specified capacity.
Type: Application
Filed: December 9, 2021
Publication date: March 31, 2022
Applicant: Google LLC
Inventors: Eric Dean Battenberg, Daisy Stanton, Russell John Wyatt Skerry-Ryan, Soroosh Mariooryad, David Teh-Hwa Kao, Thomas Edward Bagby, Sean Matthew Shannon