Patents by Inventor Jaime Lorenzo Trueba
Jaime Lorenzo Trueba has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250191573
Abstract: A target voice dataset may be augmented using speech prediction. Encoder and decoder models may be trained to encode audio data into encoded speech data and convert it back to audio. The encoded units may include semantic information (e.g., phonemes and/or words) as well as feature data indicating prosody, timbre, speaker identity, speech style, emotion, etc. of speech. An acoustic/semantic language model (ASLM) may be configured to predict encoded speech data in a manner analogous to a language model predicting words; for example, based on preceding encoded speech data. The models may be used to generate synthesized speech samples having voice characteristics (e.g., feature data) similar to those of the target voice dataset. The augmented dataset may be used to train a text-to-speech (TTS) model to reproduce the target voice characteristics, and may improve performance of the TTS model over training with only the original target voice dataset.
Type: Application
Filed: February 24, 2025
Publication date: June 12, 2025
Inventors: Mateusz Aleksander Lajszczak, Adam Marek Gabrys, Arent van Korlaar, Ruizhe Li, Elena Sergeevna Sokolova, Jaime Lorenzo Trueba, Arnaud Vincent Pierre Yves Joly, Marco Nicolis, Ekaterina Petrova
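The abstract's core analogy — predicting the next encoded speech unit the way a language model predicts the next word — can be illustrated with a deliberately tiny sketch. This is not the patented ASLM: the unit inventory, the bigram counting, and the `generate` helper are all hypothetical stand-ins for whatever encoder output and predictive model the invention actually uses.

```python
from collections import Counter, defaultdict

def train_unit_bigrams(sequences):
    """Count unit-to-unit transitions over encoded speech sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, prev_unit):
    """Return the most likely next encoded unit given the preceding one."""
    if prev_unit not in counts:
        return None
    return counts[prev_unit].most_common(1)[0][0]

def generate(counts, start, length):
    """Greedily extend a sequence of encoded units, ASLM-style."""
    seq = [start]
    for _ in range(length - 1):
        nxt = predict_next(counts, seq[-1])
        if nxt is None:
            break
        seq.append(nxt)
    return seq

# Integers stand in for encoded units carrying phoneme/prosody information.
corpus = [[1, 2, 3, 2, 3, 4], [1, 2, 3, 4]]
model = train_unit_bigrams(corpus)
print(generate(model, 1, 4))  # → [1, 2, 3, 4]
```

Generated unit sequences like this are what would then be decoded back to audio to augment the target voice dataset.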
-
Patent number: 12283266
Abstract: A speech-processing system receives first audio data corresponding to a first voice and second audio data corresponding to a second voice. The speech-processing system determines vocal characteristics of the second voice and determines output corresponding to the first audio data and the vocal characteristics.
Type: Grant
Filed: April 24, 2023
Date of Patent: April 22, 2025
Assignee: Amazon Technologies, Inc.
Inventors: Jaime Lorenzo Trueba, Alejandro Ricardo Mottini d'Oliveira, Thomas Renaud Drugman, Sri Vishnu Kumar Karlapati
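The abstract describes combining the content of one voice with the vocal characteristics of another. A minimal sketch of that idea, under illustrative assumptions (feature frames summarized by their mean, and a simple mean-shift as the re-styling rule — not the claimed system):

```python
import numpy as np

def vocal_characteristics(frames):
    """Summarize a voice as the mean of its feature frames."""
    return frames.mean(axis=0)

def convert(content_frames, target_characteristics, strength=1.0):
    """Shift content frames toward the target voice's characteristics."""
    source_characteristics = vocal_characteristics(content_frames)
    shift = target_characteristics - source_characteristics
    return content_frames + strength * shift

rng = np.random.default_rng(0)
first_voice = rng.normal(0.0, 1.0, size=(100, 8))   # content to keep
second_voice = rng.normal(3.0, 1.0, size=(100, 8))  # voice to imitate

out = convert(first_voice, vocal_characteristics(second_voice))
# The output keeps the first voice's frame-to-frame variation but now
# matches the second voice's average characteristics.
print(np.allclose(out.mean(axis=0), second_voice.mean(axis=0)))  # True
```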
-
Patent number: 12272350
Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
Type: Grant
Filed: May 15, 2024
Date of Patent: April 8, 2025
Assignee: Amazon Technologies, Inc.
Inventors: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote
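The shape of this pipeline — phoneme-, syllable-, and word-level features encoded into separate context vectors that a spectrogram estimator combines — can be sketched as below. The linear "encoders," dimensions, and tiled projection are placeholder assumptions for illustration, not the patented architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

def encode(features, weights):
    """Encode a sequence of features into a single context vector."""
    return np.tanh(features @ weights).mean(axis=0)

def estimate_spectrogram(contexts, projection, n_frames):
    """Map concatenated context vectors to an n_frames x n_bins spectrogram."""
    combined = np.concatenate(contexts)
    return np.tile(combined @ projection, (n_frames, 1))

phonemes = rng.normal(size=(12, 4))   # per-phoneme features
syllables = rng.normal(size=(5, 4))   # syllable-level features
words = rng.normal(size=(3, 4))       # word-level features

# Each feature level gets its own encoder, then its own context vector.
contexts = [encode(x, rng.normal(size=(4, 6)))
            for x in (phonemes, syllables, words)]
spec = estimate_spectrogram(contexts, rng.normal(size=(18, 80)), n_frames=50)
print(spec.shape)  # (50, 80)
```

In the abstract's terms, a spectrogram like `spec` would then condition the speech model that produces the output audio.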
-
Patent number: 12254864
Abstract: A target voice dataset may be augmented using speech prediction. Encoder and decoder models may be trained to encode audio data into encoded speech data, and convert it back to audio. The encoded units may include semantic information (e.g., phonemes and/or words) as well as feature data indicating prosody, timbre, speaker identity, speech style, emotion, etc. of speech. An acoustic/semantic language model (ASLM) may be configured to predict encoded speech data in a manner analogous to a language model predicting words; for example, based on preceding encoded speech data. The models may be used to generate synthesized speech samples having voice characteristics (e.g., feature data) similar to those of the target voice dataset. The augmented dataset may be used to train a text-to-speech (TTS) model to reproduce the target voice characteristics, and may improve performance of the TTS model over training with only the original target voice dataset.
Type: Grant
Filed: June 30, 2022
Date of Patent: March 18, 2025
Assignee: Amazon Technologies, Inc.
Inventors: Mateusz Aleksander Lajszczak, Adam Marek Gabrys, Arent van Korlaar, Ruizhe Li, Elena Sergeevna Sokolova, Jaime Lorenzo Trueba, Arnaud Vincent Pierre Yves Joly, Marco Nicolis, Ekaterina Petrova
-
Publication number: 20250087203
Abstract: A speech-processing system receives input data corresponding to one or more characteristics of speech. The system determines parameters representing the characteristics and, using the parameters, encoded values corresponding to the characteristics. A speech synthesis component of the speech-processing system processes the encoded values to determine audio data including a representation of the speech and corresponding to the characteristics.
Type: Application
Filed: August 28, 2024
Publication date: March 13, 2025
Inventors: Abdigani Mohamed Diriye, Jaime Lorenzo Trueba, Patryk Golebiowski, Piotr Jozwiak
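The abstract describes a three-stage flow: characteristics to parameters, parameters to encoded values, encoded values to audio. A toy sketch of that flow, where the characteristic names, the normalization, and the stand-in synthesis step are all illustrative assumptions rather than the claimed system:

```python
import numpy as np

def characteristics_to_parameters(characteristics):
    """Normalize raw speech characteristics into model parameters."""
    return {k: float(v) / 100.0 for k, v in characteristics.items()}

def parameters_to_encoding(parameters):
    """Encode parameters as a fixed-order vector."""
    return np.array([parameters[k] for k in sorted(parameters)])

def synthesize(encoding, n_frames=4):
    """Stand-in synthesis component: one output frame per time step,
    each conditioned on the encoded characteristic values."""
    return np.tile(encoding, (n_frames, 1))

# Hypothetical input characteristics of the desired speech.
chars = {"pitch_hz": 220, "rate_wpm": 150}
params = characteristics_to_parameters(chars)
audio = synthesize(parameters_to_encoding(params))
print(audio.shape)  # (4, 2)
```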
-
Publication number: 20240296827
Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
Type: Application
Filed: May 15, 2024
Publication date: September 5, 2024
Inventors: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote
-
Patent number: 12080269
Abstract: A speech-processing system receives input data corresponding to one or more characteristics of speech. The system determines parameters representing the characteristics and, using the parameters, encoded values corresponding to the characteristics. A speech synthesis component of the speech-processing system processes the encoded values to determine audio data including a representation of the speech and corresponding to the characteristics.
Type: Grant
Filed: May 10, 2022
Date of Patent: September 3, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Abdigani Mohamed Diriye, Jaime Lorenzo Trueba, Patryk Golebiowski, Piotr Jozwiak
-
Patent number: 11990118
Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
Type: Grant
Filed: June 6, 2023
Date of Patent: May 21, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote
-
Patent number: 11915683
Abstract: A text-to-speech (TTS) system may be configured to imitate characteristics of a target voice based on a limited dataset. The TTS system may include a machine learning model pre-trained using a synthetic parallel dataset and fine-tuned using examples of the target voice. A TTS component trained using a large single-speaker dataset may be used to generate the synthetic parallel dataset based on a multi-speaker dataset. The synthetic parallel dataset may include target audio data representing speech in the multi-speaker dataset and predicted audio data generated by the TTS component based on transcripts of the speech. The machine learning model may be pre-trained using the synthetic parallel dataset and fine-tuned using audio data representing target voice speech and predicted audio generated by the TTS component based on transcripts of the target voice speech. The trained model may be used to modify synthetic speech to approximate the characteristics of the target speech.
Type: Grant
Filed: February 14, 2022
Date of Patent: February 27, 2024
Assignee: Amazon Technologies, Inc.
Inventors: Adam Marek Gabrys, Jaime Lorenzo Trueba, Goeric Sydney Huybrechts
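The pre-train-then-fine-tune recipe in this abstract can be sketched with toy stand-ins: pairs of (TTS-predicted, real) frames form a synthetic parallel dataset, a linear map is fit on a large multi-speaker set, and the same fitting step is then repeated on a small target-voice set. Everything here — the data, the linear model, and the least-squares "training" — is a hypothetical simplification, not the patented model.

```python
import numpy as np

rng = np.random.default_rng(2)

def fit_voice_map(predicted, target):
    """Least-squares map from TTS-predicted frames to real speech frames."""
    w, *_ = np.linalg.lstsq(predicted, target, rcond=None)
    return w

# Pre-training: a large synthetic parallel dataset (multi-speaker pairs of
# predicted audio frames and the corresponding target audio frames).
pretrain_pred = rng.normal(size=(500, 8))
true_map = rng.normal(size=(8, 8))       # the relationship to be learned
pretrain_target = pretrain_pred @ true_map
w = fit_voice_map(pretrain_pred, pretrain_target)

# Fine-tuning: a limited target-voice dataset refines the same mapping.
finetune_pred = rng.normal(size=(20, 8))
finetune_target = finetune_pred @ true_map
w = fit_voice_map(finetune_pred, finetune_target)

print(np.allclose(w, true_map))  # True
```

The point of the sketch is the data flow: the small target-voice set alone would be too little to train from scratch in a realistic setting, which is why the synthetic parallel pre-training stage exists.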
-
Publication number: 20240013770
Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
Type: Application
Filed: June 6, 2023
Publication date: January 11, 2024
Inventors: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote
-
Patent number: 11830476
Abstract: Devices and techniques are generally described for learned condition text-to-speech synthesis. In some examples, first data representing a selection of a type of prosodic expressivity may be received. In some further examples, a selection of content comprising text data may be received. First audio data may be determined that includes an audio representation of the text data. The first audio data may be generated based at least in part on sampling from a first latent distribution generated using a conditional primary variational autoencoder (VAE). The sampling from the first latent distribution may be conditioned on a first learned distribution associated with the type of prosodic expressivity. In various examples, the first audio data may be sent to a first computing device.
Type: Grant
Filed: June 8, 2021
Date of Patent: November 28, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Panagiota Karanasou, Sri Vishnu Kumar Karlapati, Alexis Pierre Moinet, Arnaud Vincent Pierre Yves Joly, Syed Ammar Abbas, Thomas Renaud Drugman, Jaime Lorenzo Trueba
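The key move in this abstract — conditioning latent sampling on a learned distribution per type of prosodic expressivity — can be sketched with the standard reparameterized sample z = mean + scale * eps. The prosody types and the (mean, scale) values below are illustrative placeholders, not learned VAE distributions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical learned latent priors, one per type of prosodic expressivity.
PROSODY_PRIORS = {
    "neutral": (np.zeros(4), np.full(4, 0.1)),
    "excited": (np.full(4, 2.0), np.full(4, 0.5)),
}

def sample_latent(prosody_type, rng):
    """Reparameterized sample from the prosody-conditioned latent prior."""
    mean, scale = PROSODY_PRIORS[prosody_type]
    eps = rng.standard_normal(mean.shape)
    return mean + scale * eps

# Each call yields a different latent, but all are drawn from the
# distribution associated with the selected prosodic expressivity.
z = sample_latent("excited", rng)
print(z.shape)  # (4,)
```

In the abstract's terms, a latent like `z` would then drive the decoder that produces the audio representation of the text.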
-
Patent number: 11735156
Abstract: A speech-processing system receives first audio data corresponding to a first voice and second audio data corresponding to a second voice. The speech-processing system determines vocal characteristics of the second voice and determines output corresponding to the first audio data and the vocal characteristics.
Type: Grant
Filed: August 31, 2020
Date of Patent: August 22, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Jaime Lorenzo Trueba, Alejandro Ricardo Mottini d'Oliveira, Thomas Renaud Drugman, Sri Vishnu Kumar Karlapati
-
Patent number: 11735162
Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
Type: Grant
Filed: August 8, 2022
Date of Patent: August 22, 2023
Assignee: Amazon Technologies, Inc.
Inventors: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote
-
Publication number: 20230260502
Abstract: A text-to-speech (TTS) system may be configured to imitate characteristics of a target voice based on a limited dataset. The TTS system may include a machine learning model pre-trained using a synthetic parallel dataset and fine-tuned using examples of the target voice. A TTS component trained using a large single-speaker dataset may be used to generate the synthetic parallel dataset based on a multi-speaker dataset. The synthetic parallel dataset may include target audio data representing speech in the multi-speaker dataset and predicted audio data generated by the TTS component based on transcripts of the speech. The machine learning model may be pre-trained using the synthetic parallel dataset and fine-tuned using audio data representing target voice speech and predicted audio generated by the TTS component based on transcripts of the target voice speech. The trained model may be used to modify synthetic speech to approximate the characteristics of the target speech.
Type: Application
Filed: February 14, 2022
Publication date: August 17, 2023
Inventors: Adam Marek Gabrys, Jaime Lorenzo Trueba, Goeric Sydney Huybrechts
-
Publication number: 20230260501
Abstract: A speech-processing system receives first audio data corresponding to a first voice and second audio data corresponding to a second voice. The speech-processing system determines vocal characteristics of the second voice and determines output corresponding to the first audio data and the vocal characteristics.
Type: Application
Filed: April 24, 2023
Publication date: August 17, 2023
Inventors: Jaime Lorenzo Trueba, Alejandro Ricardo Mottini d'Oliveira, Thomas Renaud Drugman, Sri Vishnu Kumar Karlapati
-
Publication number: 20230058658
Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
Type: Application
Filed: August 8, 2022
Publication date: February 23, 2023
Inventors: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote
-
Publication number: 20230018972
Abstract: A speech-processing system receives input data corresponding to one or more characteristics of speech. The system determines parameters representing the characteristics and, using the parameters, encoded values corresponding to the characteristics. A speech synthesis component of the speech-processing system processes the encoded values to determine audio data including a representation of the speech and corresponding to the characteristics.
Type: Application
Filed: May 10, 2022
Publication date: January 19, 2023
Inventors: Abdigani Mohamed Diriye, Jaime Lorenzo Trueba, Patryk Golebiowski, Piotr Jozwiak
-
Patent number: 11410639
Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
Type: Grant
Filed: July 7, 2020
Date of Patent: August 9, 2022
Assignee: Amazon Technologies, Inc.
Inventors: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote
-
Patent number: 11341953
Abstract: A speech-processing system receives input data corresponding to one or more characteristics of speech. The system determines parameters representing the characteristics and, using the parameters, encoded values corresponding to the characteristics. A speech synthesis component of the speech-processing system processes the encoded values to determine audio data including a representation of the speech and corresponding to the characteristics.
Type: Grant
Filed: September 21, 2020
Date of Patent: May 24, 2022
Assignee: Amazon Technologies, Inc.
Inventors: Abdigani Mohamed Diriye, Jaime Lorenzo Trueba, Patryk Golebiowski, Piotr Jozwiak
-
Publication number: 20220093078
Abstract: A speech-processing system receives input data corresponding to one or more characteristics of speech. The system determines parameters representing the characteristics and, using the parameters, encoded values corresponding to the characteristics. A speech synthesis component of the speech-processing system processes the encoded values to determine audio data including a representation of the speech and corresponding to the characteristics.
Type: Application
Filed: September 21, 2020
Publication date: March 24, 2022
Inventors: Abdigani Mohamed Diriye, Jaime Lorenzo Trueba, Patryk Golebiowski, Piotr Jozwiak