Patents by Inventor Jaime Lorenzo Trueba
Jaime Lorenzo Trueba has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250191573
Abstract: A target voice dataset may be augmented using speech prediction. Encoder and decoder models may be trained to encode audio data into encoded speech data and convert it back to audio. The encoded units may include semantic information (e.g., phonemes and/or words) as well as feature data indicating prosody, timbre, speaker identity, speech style, emotion, etc. of speech. An acoustic/semantic language model (ASLM) may be configured to predict encoded speech data in a manner analogous to a language model predicting words; for example, based on preceding encoded speech data. The models may be used to generate synthesized speech samples having voice characteristics (e.g., feature data) similar to those of the target voice dataset. The augmented dataset may be used to train a text-to-speech (TTS) model to reproduce the target voice characteristics, and may improve performance of the TTS model over training with only the original target voice dataset.
Type: Application
Filed: February 24, 2025
Publication date: June 12, 2025
Inventors: Mateusz Aleksander Lajszczak, Adam Marek Gabrys, Arent van Korlaar, Ruizhe Li, Elena Sergeevna Sokolova, Jaime Lorenzo Trueba, Arnaud Vincent Pierre Yves Joly, Marco Nicolis, Ekaterina Petrova
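The abstract's core analogy — predicting the next encoded speech unit the way a language model predicts the next word — can be illustrated with a deliberately tiny sketch. This is not the patented ASLM: the unit inventory, the bigram counting, and the `generate` helper are all hypothetical stand-ins for whatever encoder output and predictive model the invention actually uses.

```python
from collections import Counter, defaultdict

def train_unit_bigrams(sequences):
    """Count unit-to-unit transitions over encoded speech sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, prev_unit):
    """Return the most likely next encoded unit given the preceding one."""
    if prev_unit not in counts:
        return None
    return counts[prev_unit].most_common(1)[0][0]

def generate(counts, start, length):
    """Greedily extend a sequence of encoded units, ASLM-style."""
    seq = [start]
    for _ in range(length - 1):
        nxt = predict_next(counts, seq[-1])
        if nxt is None:
            break
        seq.append(nxt)
    return seq

# Integers stand in for encoded units carrying phoneme/prosody information.
corpus = [[1, 2, 3, 2, 3, 4], [1, 2, 3, 4]]
model = train_unit_bigrams(corpus)
print(generate(model, 1, 4))  # → [1, 2, 3, 4]
```

Generated unit sequences like this are what would then be decoded back to audio to augment the target voice dataset.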
-
Patent number: 12283266
Abstract: A speech-processing system receives first audio data corresponding to a first voice and second audio data corresponding to a second voice. The speech-processing system determines vocal characteristics of the second voice and determines output corresponding to the first audio data and the vocal characteristics.
Type: Grant
Filed: April 24, 2023
Date of Patent: April 22, 2025
Assignee: Amazon Technologies, Inc.
Inventors: Jaime Lorenzo Trueba, Alejandro Ricardo Mottini d'Oliveira, Thomas Renaud Drugman, Sri Vishnu Kumar Karlapati
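The abstract describes combining the content of one voice with the vocal characteristics of another. A minimal sketch of that idea, under illustrative assumptions (feature frames summarized by their mean, and a simple mean-shift as the re-styling rule — not the claimed system):

```python
import numpy as np

def vocal_characteristics(frames):
    """Summarize a voice as the mean of its feature frames."""
    return frames.mean(axis=0)

def convert(content_frames, target_characteristics, strength=1.0):
    """Shift content frames toward the target voice's characteristics."""
    source_characteristics = vocal_characteristics(content_frames)
    shift = target_characteristics - source_characteristics
    return content_frames + strength * shift

rng = np.random.default_rng(0)
first_voice = rng.normal(0.0, 1.0, size=(100, 8))   # content to keep
second_voice = rng.normal(3.0, 1.0, size=(100, 8))  # voice to imitate

out = convert(first_voice, vocal_characteristics(second_voice))
# The output keeps the first voice's frame-to-frame variation but now
# matches the second voice's average characteristics.
print(np.allclose(out.mean(axis=0), second_voice.mean(axis=0)))  # True
```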
-
Patent number: 12272350
Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
Type: Grant
Filed: May 15, 2024
Date of Patent: April 8, 2025
Assignee: Amazon Technologies, Inc.
Inventors: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote
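The shape of this pipeline — phoneme-, syllable-, and word-level features encoded into separate context vectors that a spectrogram estimator combines — can be sketched as below. The linear "encoders," dimensions, and tiled projection are placeholder assumptions for illustration, not the patented architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

def encode(features, weights):
    """Encode a sequence of features into a single context vector."""
    return np.tanh(features @ weights).mean(axis=0)

def estimate_spectrogram(contexts, projection, n_frames):
    """Map concatenated context vectors to an n_frames x n_bins spectrogram."""
    combined = np.concatenate(contexts)
    return np.tile(combined @ projection, (n_frames, 1))

phonemes = rng.normal(size=(12, 4))   # per-phoneme features
syllables = rng.normal(size=(5, 4))   # syllable-level features
words = rng.normal(size=(3, 4))       # word-level features

# Each feature level gets its own encoder, then its own context vector.
contexts = [encode(x, rng.normal(size=(4, 6)))
            for x in (phonemes, syllables, words)]
spec = estimate_spectrogram(contexts, rng.normal(size=(18, 80)), n_frames=50)
print(spec.shape)  # (50, 80)
```

In the abstract's terms, a spectrogram like `spec` would then condition the speech model that produces the output audio.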
-
Patent number: 12254864
Abstract: A target voice dataset may be augmented using speech prediction. Encoder and decoder models may be trained to encode audio data into encoded speech data, and convert it back to audio. The encoded units may include semantic information (e.g., phonemes and/or words) as well as feature data indicating prosody, timbre, speaker identity, speech style, emotion, etc. of speech. An acoustic/semantic language model (ASLM) may be configured to predict encoded speech data in a manner analogous to a language model predicting words; for example, based on preceding encoded speech data. The models may be used to generate synthesized speech samples having voice characteristics (e.g., feature data) similar to those of the target voice dataset. The augmented dataset may be used to train a text-to-speech (TTS) model to reproduce the target voice characteristics, and may improve performance of the TTS model over training with only the original target voice dataset.
Type: Grant
Filed: June 30, 2022
Date of Patent: March 18, 2025
Assignee: Amazon Technologies, Inc.
Inventors: Mateusz Aleksander Lajszczak, Adam Marek Gabrys, Arent van Korlaar, Ruizhe Li, Elena Sergeevna Sokolova, Jaime Lorenzo Trueba, Arnaud Vincent Pierre Yves Joly, Marco Nicolis, Ekaterina Petrova
-
Publication number: 20250087203
Abstract: A speech-processing system receives input data corresponding to one or more characteristics of speech. The system determines parameters representing the characteristics and, using the parameters, encoded values corresponding to the characteristics. A speech synthesis component of the speech-processing system processes the encoded values to determine audio data including a representation of the speech and corresponding to the characteristics.
Type: Application
Filed: August 28, 2024
Publication date: March 13, 2025
Inventors: Abdigani Mohamed Diriye, Jaime Lorenzo Trueba, Patryk Golebiowski, Piotr Jozwiak
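The abstract describes a three-stage flow: characteristics to parameters, parameters to encoded values, encoded values to audio. A toy sketch of that flow, where the characteristic names, the normalization, and the stand-in synthesis step are all illustrative assumptions rather than the claimed system:

```python
import numpy as np

def characteristics_to_parameters(characteristics):
    """Normalize raw speech characteristics into model parameters."""
    return {k: float(v) / 100.0 for k, v in characteristics.items()}

def parameters_to_encoding(parameters):
    """Encode parameters as a fixed-order vector."""
    return np.array([parameters[k] for k in sorted(parameters)])

def synthesize(encoding, n_frames=4):
    """Stand-in synthesis component: one output frame per time step,
    each conditioned on the encoded characteristic values."""
    return np.tile(encoding, (n_frames, 1))

# Hypothetical input characteristics of the desired speech.
chars = {"pitch_hz": 220, "rate_wpm": 150}
params = characteristics_to_parameters(chars)
audio = synthesize(parameters_to_encoding(params))
print(audio.shape)  # (4, 2)
```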
-
Publication number: 20240296827
Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
Type: Application
Filed: May 15, 2024
Publication date: September 5, 2024
Inventors: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote
-
Patent number: 12080269
Abstract: A speech-processing system receives input data corresponding to one or more characteristics of speech. The system determines parameters representing the characteristics and, using the parameters, encoded values corresponding to the characteristics. A speech synthesis component of the speech-processing system processes the encoded values to determine audio data including a representation of the speech and corresponding to the characteristics.
Type: Grant
Filed: May 10, 2022
Date of Patent: September 3, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Abdigani Mohamed Diriye, Jaime Lorenzo Trueba, Patryk Golebiowski, Piotr Jozwiak
-
Patent number: 11990118
Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
Type: Grant
Filed: June 6, 2023
Date of Patent: May 21, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote
-
Patent number: 11915683
Abstract: A text-to-speech (TTS) system may be configured to imitate characteristics of a target voice based on a limited dataset. The TTS system may include a machine learning model pre-trained using a synthetic parallel dataset and fine-tuned using examples of the target voice. A TTS component trained using a large single-speaker dataset may be used to generate the synthetic parallel dataset based on a multi-speaker dataset. The synthetic parallel dataset may include target audio data representing speech in the multi-speaker dataset and predicted audio data generated by the TTS component based on transcripts of the speech. The machine learning model may be pre-trained using the synthetic parallel dataset and fine-tuned using audio data representing target voice speech and predicted audio generated by the TTS component based on transcripts of the target voice speech. The trained model may be used to modify synthetic speech to approximate the characteristics of the target speech.
Type: Grant
Filed: February 14, 2022
Date of Patent: February 27, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Adam Marek Gabrys, Jaime Lorenzo Trueba, Goeric Sydney Huybrechts
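The pre-train-then-fine-tune recipe in this abstract can be sketched with toy stand-ins: pairs of (TTS-predicted, real) frames form a synthetic parallel dataset, a linear map is fit on a large multi-speaker set, and the same fitting step is then repeated on a small target-voice set. Everything here — the data, the linear model, and the least-squares "training" — is a hypothetical simplification, not the patented model.

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_voice_map(predicted, target):
    """Least-squares map from TTS-predicted frames to real speech frames."""
    w, *_ = np.linalg.lstsq(predicted, target, rcond=None)
    return w

# Pre-training: a large synthetic parallel dataset (multi-speaker pairs of
# predicted audio frames and the corresponding target audio frames).
pretrain_pred = rng.normal(size=(500, 8))
true_map = rng.normal(size=(8, 8))       # the relationship to be learned
pretrain_target = pretrain_pred @ true_map
w = fit_voice_map(pretrain_pred, pretrain_target)

# Fine-tuning: a limited target-voice dataset refines the same mapping.
finetune_pred = rng.normal(size=(20, 8))
finetune_target = finetune_pred @ true_map
w = fit_voice_map(finetune_pred, finetune_target)

print(np.allclose(w, true_map))  # True
```

The point of the sketch is the data flow: the small target-voice set alone would be too little to train from scratch in a realistic setting, which is why the synthetic parallel pre-training stage exists.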
-
Publication number: 20240013770
Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
Type: Application
Filed: June 6, 2023
Publication date: January 11, 2024
Inventors: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote
-
Patent number: 11830476
Abstract: Devices and techniques are generally described for learned condition text-to-speech synthesis. In some examples, first data representing a selection of a type of prosodic expressivity may be received. In some further examples, a selection of content comprising text data may be received. First audio data may be determined that includes an audio representation of the text data. The first audio data may be generated based at least in part on sampling from a first latent distribution generated using a conditional primary variational autoencoder (VAE). The sampling from the first latent distribution may be conditioned on a first learned distribution associated with the type of prosodic expressivity. In various examples, the first audio data may be sent to a first computing device.
Type: Grant
Filed: June 8, 2021
Date of Patent: November 28, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Panagiota Karanasou, Sri Vishnu Kumar Karlapati, Alexis Pierre Moinet, Arnaud Vincent Pierre Yves Joly, Syed Ammar Abbas, Thomas Renaud Drugman, Jaime Lorenzo Trueba
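The key move in this abstract — conditioning latent sampling on a learned distribution per type of prosodic expressivity — can be sketched with the standard reparameterized sample z = mean + scale * eps. The prosody types and the (mean, scale) values below are illustrative placeholders, not learned VAE distributions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical learned latent priors, one per type of prosodic expressivity.
PROSODY_PRIORS = {
    "neutral": (np.zeros(4), np.full(4, 0.1)),
    "excited": (np.full(4, 2.0), np.full(4, 0.5)),
}

def sample_latent(prosody_type, rng):
    """Reparameterized sample from the prosody-conditioned latent prior."""
    mean, scale = PROSODY_PRIORS[prosody_type]
    eps = rng.standard_normal(mean.shape)
    return mean + scale * eps

# Each call yields a different latent, but all are drawn from the
# distribution associated with the selected prosodic expressivity.
z = sample_latent("excited", rng)
print(z.shape)  # (4,)
```

In the abstract's terms, a latent like `z` would then drive the decoder that produces the audio representation of the text.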
-
Patent number: 11735156
Abstract: A speech-processing system receives first audio data corresponding to a first voice and second audio data corresponding to a second voice. The speech-processing system determines vocal characteristics of the second voice and determines output corresponding to the first audio data and the vocal characteristics.
Type: Grant
Filed: August 31, 2020
Date of Patent: August 22, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Jaime Lorenzo Trueba, Alejandro Ricardo Mottini d'Oliveira, Thomas Renaud Drugman, Sri Vishnu Kumar Karlapati
-
Patent number: 11735162
Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
Type: Grant
Filed: August 8, 2022
Date of Patent: August 22, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote
-
Publication number: 20230260502
Abstract: A text-to-speech (TTS) system may be configured to imitate characteristics of a target voice based on a limited dataset. The TTS system may include a machine learning model pre-trained using a synthetic parallel dataset and fine-tuned using examples of the target voice. A TTS component trained using a large single-speaker dataset may be used to generate the synthetic parallel dataset based on a multi-speaker dataset. The synthetic parallel dataset may include target audio data representing speech in the multi-speaker dataset and predicted audio data generated by the TTS component based on transcripts of the speech. The machine learning model may be pre-trained using the synthetic parallel dataset and fine-tuned using audio data representing target voice speech and predicted audio generated by the TTS component based on transcripts of the target voice speech. The trained model may be used to modify synthetic speech to approximate the characteristics of the target speech.
Type: Application
Filed: February 14, 2022
Publication date: August 17, 2023
Inventors: Adam Marek Gabrys, Jaime Lorenzo Trueba, Goeric Sydney Huybrechts
-
Publication number: 20230260501
Abstract: A speech-processing system receives first audio data corresponding to a first voice and second audio data corresponding to a second voice. The speech-processing system determines vocal characteristics of the second voice and determines output corresponding to the first audio data and the vocal characteristics.
Type: Application
Filed: April 24, 2023
Publication date: August 17, 2023
Inventors: Jaime Lorenzo Trueba, Alejandro Ricardo Mottini d'Oliveira, Thomas Renaud Drugman, Sri Vishnu Kumar Karlapati
-
Publication number: 20230058658
Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
Type: Application
Filed: August 8, 2022
Publication date: February 23, 2023
Inventors: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote
-
Publication number: 20230018972
Abstract: A speech-processing system receives input data corresponding to one or more characteristics of speech. The system determines parameters representing the characteristics and, using the parameters, encoded values corresponding to the characteristics. A speech synthesis component of the speech-processing system processes the encoded values to determine audio data including a representation of the speech and corresponding to the characteristics.
Type: Application
Filed: May 10, 2022
Publication date: January 19, 2023
Inventors: Abdigani Mohamed Diriye, Jaime Lorenzo Trueba, Patryk Golebiowski, Piotr Jozwiak
-
Patent number: 11410639
Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
Type: Grant
Filed: July 7, 2020
Date of Patent: August 9, 2022
Assignee: Amazon Technologies, Inc.
Inventors: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote
-
Patent number: 11341953
Abstract: A speech-processing system receives input data corresponding to one or more characteristics of speech. The system determines parameters representing the characteristics and, using the parameters, encoded values corresponding to the characteristics. A speech synthesis component of the speech-processing system processes the encoded values to determine audio data including a representation of the speech and corresponding to the characteristics.
Type: Grant
Filed: September 21, 2020
Date of Patent: May 24, 2022
Assignee: Amazon Technologies, Inc.
Inventors: Abdigani Mohamed Diriye, Jaime Lorenzo Trueba, Patryk Golebiowski, Piotr Jozwiak
-
Publication number: 20220093078
Abstract: A speech-processing system receives input data corresponding to one or more characteristics of speech. The system determines parameters representing the characteristics and, using the parameters, encoded values corresponding to the characteristics. A speech synthesis component of the speech-processing system processes the encoded values to determine audio data including a representation of the speech and corresponding to the characteristics.
Type: Application
Filed: September 21, 2020
Publication date: March 24, 2022
Inventors: Abdigani Mohamed Diriye, Jaime Lorenzo Trueba, Patryk Golebiowski, Piotr Jozwiak