Patents by Inventor Tom Marius Kenter

Tom Marius Kenter has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Key frame networks

Patent number: 12046227

Abstract: A method for generating frame values using a key frame network includes receiving a text utterance having at least one phoneme, and for each respective phoneme of the at least one phoneme, predicting, using a predictive model, a fixed quantity of key frames. Each respective key frame of the fixed quantity of key frames includes a representation of a component of the respective phoneme. The method also includes generating, using the fixed quantity of key frames, a plurality of frame values. Here, each respective frame value of the plurality of frame values is representative of a fixed duration of audio.

Type: Grant

Filed: April 19, 2022

Date of Patent: July 23, 2024

Assignee: Google LLC

Inventors: Tom Marius Kenter, Tobias Alexander Hawker, Robert Clark
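
The abstract does not say how a fixed number of key frames per phoneme becomes a variable number of fixed-duration frames. The sketch below is one way to picture it, assuming linear interpolation between consecutive key frames and a per-phoneme frame count supplied from elsewhere. The names predict_key_frames, interpolate_key_frames, generate_frame_values and K are hypothetical, and the "predictive model" is a deterministic stub rather than a trained network.

```python
from typing import List, Sequence

Vector = List[float]

K = 3  # hypothetical fixed quantity of key frames per phoneme


def predict_key_frames(phoneme: str, k: int = K) -> List[Vector]:
    """Stub for the predictive model: returns k toy 2-dim key frames.

    A real system would use a trained network; this deterministic stand-in
    only exists so the sketch runs.
    """
    base = float(len(phoneme))
    return [[base * i, base * i + 1.0] for i in range(k)]


def interpolate_key_frames(key_frames: Sequence[Vector], num_frames: int) -> List[Vector]:
    """Linearly interpolate between key frames to produce num_frames frame values."""
    if num_frames <= 0:
        return []
    if len(key_frames) == 1 or num_frames == 1:
        return [list(key_frames[0]) for _ in range(num_frames)]
    segments = len(key_frames) - 1
    frames = []
    for t in range(num_frames):
        # Position of frame t along the key-frame sequence, in [0, segments].
        pos = t * segments / (num_frames - 1)
        i = min(int(pos), segments - 1)
        frac = pos - i
        a, b = key_frames[i], key_frames[i + 1]
        frames.append([x + frac * (y - x) for x, y in zip(a, b)])
    return frames


def generate_frame_values(phonemes: Sequence[str], durations: Sequence[int]) -> List[Vector]:
    """durations[i] is an assumed number of fixed-duration frames for phonemes[i]."""
    out: List[Vector] = []
    for phoneme, n in zip(phonemes, durations):
        out.extend(interpolate_key_frames(predict_key_frames(phoneme), n))
    return out
```

Because every phoneme yields exactly K key frames, the predictive model's output size stays constant regardless of how long the phoneme lasts; only the interpolation step depends on duration.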

Speech synthesis prosody using a BERT model

Patent number: 11881210

Abstract: A method for generating a prosodic representation includes receiving a text utterance having one or more words. Each word has at least one syllable having at least one phoneme. The method also includes generating, using a Bidirectional Encoder Representations from Transformers (BERT) model, a sequence of wordpiece embeddings and selecting an utterance embedding for the text utterance, the utterance embedding representing an intended prosody. Each wordpiece embedding is associated with one of the one or more words of the text utterance. For each syllable, using the selected utterance embedding and a prosody model that incorporates the BERT model, the method also includes generating a corresponding prosodic syllable embedding for the syllable based on the wordpiece embedding associated with the word that includes the syllable and predicting a duration of the syllable by encoding linguistic features of each phoneme of the syllable with the corresponding prosodic syllable embedding for the syllable.

Type: Grant

Filed: May 5, 2020

Date of Patent: January 23, 2024

Assignee: Google LLC

Inventors: Tom Marius Kenter, Manish Kumar Sharma, Robert Andrew James Clark, Aliaksei Severyn
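
The per-syllable flow described in the abstract (wordpiece embedding, then its word, then each syllable of that word, then a syllable duration) can be traced with a toy sketch. It assumes each word is represented by its first wordpiece and replaces the trained prosody model with simple arithmetic (averaging and dot products). The names Syllable, syllable_embedding, predict_duration and syllable_durations are hypothetical; the sketch shows only the data flow, not the patented model.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

Vector = List[float]


@dataclass
class Syllable:
    word_index: int                 # index of the word containing this syllable
    phoneme_features: List[Vector]  # hypothetical linguistic feature vector per phoneme


def word_to_wordpiece(wordpiece_word_ids: Sequence[int]) -> Dict[int, int]:
    """Map each word index to the position of its first wordpiece (an assumption)."""
    first: Dict[int, int] = {}
    for pos, word_id in enumerate(wordpiece_word_ids):
        first.setdefault(word_id, pos)
    return first


def syllable_embedding(utterance_emb: Vector, wordpiece_emb: Vector) -> Vector:
    """Toy stand-in for the prosody model: averages utterance and wordpiece embeddings."""
    return [0.5 * (u + w) for u, w in zip(utterance_emb, wordpiece_emb)]


def predict_duration(syl_emb: Vector, phoneme_features: List[Vector]) -> float:
    """Toy duration: dot each phoneme's features with the syllable embedding,
    sum over the syllable's phonemes, and clamp at zero."""
    total = 0.0
    for feats in phoneme_features:
        total += sum(f * s for f, s in zip(feats, syl_emb))
    return max(total, 0.0)


def syllable_durations(
    wordpiece_embs: Sequence[Vector],
    wordpiece_word_ids: Sequence[int],
    utterance_emb: Vector,
    syllables: Sequence[Syllable],
) -> List[float]:
    first = word_to_wordpiece(wordpiece_word_ids)
    durations = []
    for syl in syllables:
        wp = wordpiece_embs[first[syl.word_index]]
        emb = syllable_embedding(utterance_emb, wp)
        durations.append(predict_duration(emb, syl.phoneme_features))
    return durations
```

Here the same utterance embedding (the "intended prosody") conditions every syllable, while the wordpiece embedding differs per word, which is the split of responsibilities the abstract describes.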

Key Frame Networks

Publication number: 20230335110

Abstract: A method for generating frame values using a key frame network includes receiving a text utterance having at least one phoneme, and for each respective phoneme of the at least one phoneme, predicting, using a predictive model, a fixed quantity of key frames. Each respective key frame of the fixed quantity of key frames includes a representation of a component of the respective phoneme. The method also includes generating, using the fixed quantity of key frames, a plurality of frame values. Here, each respective frame value of the plurality of frame values is representative of a fixed duration of audio.

Type: Application

Filed: April 19, 2022

Publication date: October 19, 2023

Applicant: Google LLC

Inventors: Tom Marius Kenter, Tobias Alexander Hawker, Robert Clark

Self-training WaveNet for text-to-speech

Patent number: 11295725

Abstract: A method of self-training WaveNet includes receiving a plurality of recorded speech samples and training a first autoregressive neural network using the plurality of recorded speech samples. The trained first autoregressive neural network is configured to output synthetic speech as an audible representation of a text input. The method further includes generating a plurality of synthetic speech samples using the trained first autoregressive neural network. The method additionally includes training a second autoregressive neural network using the plurality of synthetic speech samples from the trained first autoregressive neural network and distilling the trained second autoregressive neural network into a feedforward neural network.

Type: Grant

Filed: July 9, 2020

Date of Patent: April 5, 2022

Assignee: Google LLC

Inventors: Manish Sharma, Tom Marius Kenter, Robert Clark
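
The four stages in the abstract (train on recordings, synthesize, retrain on synthetic data only, distill) can be mirrored with deliberately tiny stand-in models. In this sketch the "autoregressive network" is a one-parameter predictor x[t] = a * x[t-1] fitted by least squares, and the "feedforward network" is a set of per-timestep gains fitted to the teacher's outputs, so every timestep is computed independently. None of this is WaveNet; fit_ar1, generate_ar1, distill_feedforward, run_feedforward and self_train are hypothetical names chosen to illustrate the pipeline shape only.

```python
from typing import List, Sequence


def fit_ar1(sequences: Sequence[List[float]]) -> float:
    """Least-squares fit of a toy first-order autoregressive model x[t] = a * x[t-1]."""
    num = sum(s[t] * s[t - 1] for s in sequences for t in range(1, len(s)))
    den = sum(s[t - 1] ** 2 for s in sequences for t in range(1, len(s)))
    return num / den


def generate_ar1(a: float, seed: float, length: int) -> List[float]:
    """Sample autoregressively: each value depends on the previously generated one."""
    out = [seed]
    for _ in range(length - 1):
        out.append(a * out[-1])
    return out


def distill_feedforward(a_teacher: float, seeds: Sequence[float], length: int) -> List[float]:
    """Fit per-timestep gains g[t] so that g[t] * seed matches the teacher's output.

    The resulting student has no recurrence: all timesteps can be computed in parallel.
    """
    teacher_outputs = [generate_ar1(a_teacher, s, length) for s in seeds]
    den = sum(s * s for s in seeds)
    return [
        sum(out[t] * s for out, s in zip(teacher_outputs, seeds)) / den
        for t in range(length)
    ]


def run_feedforward(gains: Sequence[float], seed: float) -> List[float]:
    return [g * seed for g in gains]


def self_train(recorded: Sequence[List[float]], synth_seeds: Sequence[float], length: int) -> List[float]:
    a1 = fit_ar1(recorded)                                            # 1. first AR model on recorded data
    synthetic = [generate_ar1(a1, s, length) for s in synth_seeds]    # 2. synthetic samples
    a2 = fit_ar1(synthetic)                                           # 3. second AR model on synthetic data only
    return distill_feedforward(a2, synth_seeds, length)               # 4. distill into feedforward student
```

The design point the toy preserves is that the second autoregressive model never sees the recordings directly; it learns only from the first model's output before being distilled.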

Self-Training WaveNet for Text-to-Speech

Publication number: 20220013105

Abstract: A method of self-training WaveNet includes receiving a plurality of recorded speech samples and training a first autoregressive neural network using the plurality of recorded speech samples. The trained first autoregressive neural network is configured to output synthetic speech as an audible representation of a text input. The method further includes generating a plurality of synthetic speech samples using the trained first autoregressive neural network. The method additionally includes training a second autoregressive neural network using the plurality of synthetic speech samples from the trained first autoregressive neural network and distilling the trained second autoregressive neural network into a feedforward neural network.

Type: Application

Filed: July 9, 2020

Publication date: January 13, 2022

Applicant: Google LLC

Inventors: Manish Sharma, Tom Marius Kenter, Robert Clark

Speech Synthesis Prosody Using A BERT Model

Publication number: 20210350795

Abstract: A method for generating a prosodic representation includes receiving a text utterance having one or more words. Each word has at least one syllable having at least one phoneme. The method also includes generating, using a Bidirectional Encoder Representations from Transformers (BERT) model, a sequence of wordpiece embeddings and selecting an utterance embedding for the text utterance, the utterance embedding representing an intended prosody. Each wordpiece embedding is associated with one of the one or more words of the text utterance. For each syllable, using the selected utterance embedding and a prosody model that incorporates the BERT model, the method also includes generating a corresponding prosodic syllable embedding for the syllable based on the wordpiece embedding associated with the word that includes the syllable and predicting a duration of the syllable by encoding linguistic features of each phoneme of the syllable with the corresponding prosodic syllable embedding for the syllable.

Type: Application

Filed: May 5, 2020

Publication date: November 11, 2021

Applicant: Google LLC

Inventors: Tom Marius Kenter, Manish Kumar Sharma, Robert Andrew James Clark, Aliaksei Severyn