Patents by Inventor Tom Marius Kenter

Tom Marius Kenter has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11881210
    Abstract: A method for generating a prosodic representation includes receiving a text utterance having one or more words. Each word has at least one syllable having at least one phoneme. The method also includes generating, using a Bidirectional Encoder Representations from Transformers (BERT) model, a sequence of wordpiece embeddings and selecting an utterance embedding for the text utterance, the utterance embedding representing an intended prosody. Each wordpiece embedding is associated with one of the one or more words of the text utterance. For each syllable, using the selected utterance embedding and a prosody model that incorporates the BERT model, the method also includes generating a corresponding prosodic syllable embedding for the syllable based on the wordpiece embedding associated with the word that includes the syllable and predicting a duration of the syllable by encoding linguistic features of each phoneme of the syllable with the corresponding prosodic syllable embedding for the syllable.
    Type: Grant
    Filed: May 5, 2020
    Date of Patent: January 23, 2024
    Assignee: Google LLC
    Inventors: Tom Marius Kenter, Manish Kumar Sharma, Robert Andrew James Clark, Aliaksei Severyn
  • Publication number: 20230335110
    Abstract: A method for generating frame values using a key frame network includes receiving a text utterance having at least one phoneme, and for each respective phoneme of the at least one phoneme, predicting, using a predictive model, a fixed quantity of key frames. Each respective key frame of the fixed quantity of key frames includes a representation of a component of the respective phoneme. The method also includes generating, using the fixed quantity of key frames, a plurality of frame values. Here, each respective frame value of the plurality of frame values is representative of a fixed duration of audio.
    Type: Application
    Filed: April 19, 2022
    Publication date: October 19, 2023
    Applicant: Google LLC
    Inventors: Tom Marius Kenter, Tobias Alexander Hawker, Robert Clark
  • Patent number: 11295725
    Abstract: A method of self-training WaveNet includes receiving a plurality of recorded speech samples and training a first autoregressive neural network using the plurality of recorded speech samples. The trained first autoregressive neural network is configured to output synthetic speech as an audible representation of a text input. The method further includes generating a plurality of synthetic speech samples using the trained first autoregressive neural network. The method additionally includes training a second autoregressive neural network using the plurality of synthetic speech samples from the trained first autoregressive neural network and distilling the trained second autoregressive neural network into a feedforward neural network.
    Type: Grant
    Filed: July 9, 2020
    Date of Patent: April 5, 2022
    Assignee: Google LLC
    Inventors: Manish Sharma, Tom Marius Kenter, Robert Clark
  • Publication number: 20220013105
    Abstract: A method of self-training WaveNet includes receiving a plurality of recorded speech samples and training a first autoregressive neural network using the plurality of recorded speech samples. The trained first autoregressive neural network is configured to output synthetic speech as an audible representation of a text input. The method further includes generating a plurality of synthetic speech samples using the trained first autoregressive neural network. The method additionally includes training a second autoregressive neural network using the plurality of synthetic speech samples from the trained first autoregressive neural network and distilling the trained second autoregressive neural network into a feedforward neural network.
    Type: Application
    Filed: July 9, 2020
    Publication date: January 13, 2022
    Applicant: Google LLC
    Inventors: Manish Sharma, Tom Marius Kenter, Robert Clark
  • Publication number: 20210350795
    Abstract: A method for generating a prosodic representation includes receiving a text utterance having one or more words. Each word has at least one syllable having at least one phoneme. The method also includes generating, using a Bidirectional Encoder Representations from Transformers (BERT) model, a sequence of wordpiece embeddings and selecting an utterance embedding for the text utterance, the utterance embedding representing an intended prosody. Each wordpiece embedding is associated with one of the one or more words of the text utterance. For each syllable, using the selected utterance embedding and a prosody model that incorporates the BERT model, the method also includes generating a corresponding prosodic syllable embedding for the syllable based on the wordpiece embedding associated with the word that includes the syllable and predicting a duration of the syllable by encoding linguistic features of each phoneme of the syllable with the corresponding prosodic syllable embedding for the syllable.
    Type: Application
    Filed: May 5, 2020
    Publication date: November 11, 2021
    Applicant: Google LLC
    Inventors: Tom Marius Kenter, Manish Kumar Sharma, Robert Andrew James Clark, Aliaksei Severyn