Patents by Inventor Jaime Lorenzo Trueba

Jaime Lorenzo Trueba has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250191573
    Abstract: A target voice dataset may be augmented using speech prediction. Encoder and decoder models may be trained to encode audio data into encoded speech data and convert it back to audio. The encoded units may include semantic information (e.g., phonemes and/or words) as well as feature data indicating prosody, timbre, speaker identity, speech style, emotion, etc. of speech. An acoustic/semantic language model (ASLM) may be configured to predict encoded speech data in a manner analogous to a language model predicting words; for example, based on preceding encoded speech data. The models may be used to generate synthesized speech samples having voice characteristics (e.g., feature data) similar to those of the target voice dataset. The augmented dataset may be used to train a text-to-speech (TTS) model to reproduce the target voice characteristics, and may improve performance of the TTS model over training with only the original target voice dataset.
    Type: Application
    Filed: February 24, 2025
    Publication date: June 12, 2025
    Inventors: Mateusz Aleksander Lajszczak, Adam Marek Gabrys, Arent van Korlaar, Ruizhe Li, Elena Sergeevna Sokolova, Jaime Lorenzo Trueba, Arnaud Vincent Pierre Yves Joly, Marco Nicolis, Ekaterina Petrova
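The encode-predict-decode pipeline in this abstract can be illustrated with a toy sketch. All names and values below are hypothetical, not taken from the patent: the "encoder" quantizes scalar frames to a codebook, a bigram model over encoded units plays the role of the acoustic/semantic language model (ASLM) predicting the next unit from the preceding one, and the "decoder" maps units back to values.

```python
# Toy sketch (hypothetical names/values) of the abstract's pipeline:
# encode audio into discrete units, predict continuations with a
# unit-level "language model", decode units back to audio-like data.
from collections import Counter, defaultdict

def encode(frames, codebook):
    # Quantize each scalar "frame" to the nearest codebook unit.
    return [min(range(len(codebook)), key=lambda i: abs(codebook[i] - f))
            for f in frames]

def decode(units, codebook):
    # Invert the encoder: map each unit id back to its codebook value.
    return [codebook[u] for u in units]

class UnitBigramLM:
    # Minimal stand-in for the ASLM: predicts the next encoded unit
    # from the preceding unit, analogous to next-word prediction.
    def fit(self, unit_seqs):
        self.counts = defaultdict(Counter)
        for seq in unit_seqs:
            for a, b in zip(seq, seq[1:]):
                self.counts[a][b] += 1
        return self

    def predict_next(self, unit):
        return self.counts[unit].most_common(1)[0][0]

    def continue_seq(self, prefix, n):
        seq = list(prefix)
        for _ in range(n):
            seq.append(self.predict_next(seq[-1]))
        return seq

codebook = [0.0, 0.5, 1.0]
target = [0.1, 0.4, 0.9, 0.6, 0.1]        # small "target voice" dataset
units = encode(target, codebook)           # -> [0, 1, 2, 1, 0]
lm = UnitBigramLM().fit([units])
augmented = lm.continue_seq(units[:2], 3)  # predicted continuation of units
synth = decode(augmented, codebook)        # synthesized sample for training
```

A real system would use learned neural encoders/decoders and a transformer-scale unit language model; the sketch only mirrors the data flow.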
  • Patent number: 12283266
    Abstract: A speech-processing system receives first audio data corresponding to a first voice and second audio data corresponding to a second voice. The speech-processing system determines vocal characteristics of the second voice and determines output corresponding to the first audio data and the vocal characteristics.
    Type: Grant
    Filed: April 24, 2023
    Date of Patent: April 22, 2025
    Assignee: Amazon Technologies, Inc.
    Inventors: Jaime Lorenzo Trueba, Alejandro Ricardo Mottini d'Oliveira, Thomas Renaud Drugman, Sri Vishnu Kumar Karlapati
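The claimed flow (extract vocal characteristics from a second voice, then produce output combining them with the first voice's audio) can be sketched with a deliberately simple stand-in: treat the first voice's pitch contour as the content and mean/spread of the second voice's pitch as its "vocal characteristics". Everything here is a hypothetical illustration, not the patented method.

```python
# Hypothetical sketch: combine the first voice's contour (content) with
# the second voice's characteristics (mean pitch and pitch spread).
def vocal_characteristics(pitch_track):
    mean = sum(pitch_track) / len(pitch_track)
    spread = (sum((p - mean) ** 2 for p in pitch_track)
              / len(pitch_track)) ** 0.5
    return mean, spread

def convert(first_pitch, second_characteristics):
    # Re-center and re-scale the first voice's pitch contour to match
    # the second voice's characteristics, preserving the contour shape.
    src_mean, src_spread = vocal_characteristics(first_pitch)
    tgt_mean, tgt_spread = second_characteristics
    scale = tgt_spread / src_spread if src_spread else 1.0
    return [tgt_mean + (p - src_mean) * scale for p in first_pitch]

first = [100.0, 120.0, 110.0, 90.0]    # first voice pitch contour (Hz)
second = [200.0, 240.0, 220.0, 180.0]  # second voice sample (Hz)
out = convert(first, vocal_characteristics(second))
```

Because `second` is an exact rescaling of `first`, the converted contour lands on the second voice's values while keeping the first contour's shape.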
  • Patent number: 12272350
    Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
    Type: Grant
    Filed: May 15, 2024
    Date of Patent: April 8, 2025
    Assignee: Amazon Technologies, Inc.
    Inventors: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote
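The key structural idea in this abstract, encoding phoneme-, syllable-, and word-level features into separate context vectors that the spectrogram estimator then combines, can be sketched as follows. The per-level "encoders" and weights are invented for illustration only.

```python
# Illustrative sketch (hypothetical encoders/weights): separate context
# vectors per linguistic level, combined by the spectrogram estimator
# into one conditioning vector for the speech model.
def encode_level(features, weight):
    # One tiny "encoder" per level: a weighted mean context vector.
    dim = len(features[0])
    return [weight * sum(f[i] for f in features) / len(features)
            for i in range(dim)]

def estimate_spectrogram(phoneme_feats, syllable_feats, word_feats):
    contexts = [
        encode_level(phoneme_feats, 1.0),   # fine-grained context
        encode_level(syllable_feats, 0.5),  # mid-level context
        encode_level(word_feats, 0.25),     # coarse context
    ]
    # Concatenate the separate context vectors into one vector standing
    # in for the estimated frequency-spectrogram frame.
    return [x for ctx in contexts for x in ctx]

frame = estimate_spectrogram(
    phoneme_feats=[[1.0, 0.0], [0.0, 1.0]],
    syllable_feats=[[2.0, 2.0]],
    word_feats=[[4.0, 0.0]],
)
```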
  • Patent number: 12254864
    Abstract: A target voice dataset may be augmented using speech prediction. Encoder and decoder models may be trained to encode audio data into encoded speech data and convert it back to audio. The encoded units may include semantic information (e.g., phonemes and/or words) as well as feature data indicating prosody, timbre, speaker identity, speech style, emotion, etc. of speech. An acoustic/semantic language model (ASLM) may be configured to predict encoded speech data in a manner analogous to a language model predicting words; for example, based on preceding encoded speech data. The models may be used to generate synthesized speech samples having voice characteristics (e.g., feature data) similar to those of the target voice dataset. The augmented dataset may be used to train a text-to-speech (TTS) model to reproduce the target voice characteristics, and may improve performance of the TTS model over training with only the original target voice dataset.
    Type: Grant
    Filed: June 30, 2022
    Date of Patent: March 18, 2025
    Assignee: Amazon Technologies, Inc.
    Inventors: Mateusz Aleksander Lajszczak, Adam Marek Gabrys, Arent van Korlaar, Ruizhe Li, Elena Sergeevna Sokolova, Jaime Lorenzo Trueba, Arnaud Vincent Pierre Yves Joly, Marco Nicolis, Ekaterina Petrova
  • Publication number: 20250087203
    Abstract: A speech-processing system receives input data corresponding to one or more characteristics of speech. The system determines parameters representing the characteristics and, using the parameters, encoded values corresponding to the characteristics. A speech synthesis component of the speech-processing system processes the encoded values to determine audio data including a representation of the speech and corresponding to the characteristics.
    Type: Application
    Filed: August 28, 2024
    Publication date: March 13, 2025
    Inventors: Abdigani Mohamed Diriye, Jaime Lorenzo Trueba, Patryk Golebiowski, Piotr Jozwiak
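The staged flow in this abstract (characteristics → parameters → encoded values → synthesized audio) can be sketched with toy stand-ins. The characteristic table, log-scale encoding, and sinusoid "synthesis" are all assumptions made for the example, not details from the filing.

```python
# Minimal sketch (hypothetical mappings): requested speech
# characteristics become numeric parameters, the parameters are encoded,
# and a synthesis step turns the encoded values into toy audio samples.
import math

PARAM_TABLE = {  # characteristic -> parameter mapping (assumed)
    ("pitch", "low"): 110.0, ("pitch", "high"): 220.0,
    ("rate", "slow"): 0.5, ("rate", "fast"): 2.0,
}

def encode(characteristics):
    params = [PARAM_TABLE[(name, value)] for name, value in characteristics]
    # "Encoded values": here, simply log-scaled parameters.
    return [math.log2(p) for p in params]

def synthesize(encoded, n_samples=4):
    # Toy synthesis: a sinusoid whose frequency and rate are recovered
    # from the encoded values (8 kHz sample rate assumed).
    freq, rate = (2.0 ** v for v in encoded)
    return [math.sin(2 * math.pi * freq * rate * t / 8000.0)
            for t in range(n_samples)]

audio = synthesize(encode([("pitch", "high"), ("rate", "fast")]))
```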
  • Publication number: 20240296827
    Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
    Type: Application
    Filed: May 15, 2024
    Publication date: September 5, 2024
    Inventors: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote
  • Patent number: 12080269
    Abstract: A speech-processing system receives input data corresponding to one or more characteristics of speech. The system determines parameters representing the characteristics and, using the parameters, encoded values corresponding to the characteristics. A speech synthesis component of the speech-processing system processes the encoded values to determine audio data including a representation of the speech and corresponding to the characteristics.
    Type: Grant
    Filed: May 10, 2022
    Date of Patent: September 3, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Abdigani Mohamed Diriye, Jaime Lorenzo Trueba, Patryk Golebiowski, Piotr Jozwiak
  • Patent number: 11990118
    Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
    Type: Grant
    Filed: June 6, 2023
    Date of Patent: May 21, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote
  • Patent number: 11915683
    Abstract: A text-to-speech (TTS) system may be configured to imitate characteristics of a target voice based on a limited dataset. The TTS system may include a machine learning model pre-trained using a synthetic parallel dataset and fine-tuned using examples of the target voice. A TTS component trained using a large single-speaker dataset may be used to generate the synthetic parallel dataset based on a multi-speaker dataset. The synthetic parallel dataset may include target audio data representing speech in the multi-speaker dataset and predicted audio data generated by the TTS component based on transcripts of the speech. The machine learning model may be pre-trained using the synthetic parallel dataset and fine-tuned using audio data representing target voice speech and predicted audio generated by the TTS component based on transcripts of the target voice speech. The trained model may be used to modify synthetic speech to approximate the characteristics of the target speech.
    Type: Grant
    Filed: February 14, 2022
    Date of Patent: February 27, 2024
    Assignee: Amazon Technologies, Inc.
    Inventors: Adam Marek Gabrys, Jaime Lorenzo Trueba, Goeric Sydney Huybrechts
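The training recipe in this abstract, build a synthetic parallel dataset from a base TTS component, pre-train on it, then fine-tune on target-voice examples, can be sketched with drastically simplified stand-ins: a deterministic "TTS" and a one-parameter "model" that learns an additive offset between predicted and real audio. All of this is hypothetical illustration, not the patented system.

```python
# Hedged sketch (hypothetical names): pair base-TTS predictions with
# real recordings to form a synthetic parallel dataset, "pre-train" a
# one-parameter offset model on it, then "fine-tune" on target voice data.
def base_tts(transcript):
    # Stand-in single-speaker TTS: deterministic value per transcript.
    return float(len(transcript))

def build_parallel(dataset):
    # dataset: list of (transcript, real_audio_value) pairs.
    return [(base_tts(t), real) for t, real in dataset]

def fit_offset(parallel):
    # "Model": the mean difference between real and predicted audio.
    return sum(real - pred for pred, real in parallel) / len(parallel)

multi_speaker = [("hello", 6.0), ("goodbye", 8.0)]  # large corpus proxy
target_voice = [("hi", 4.5), ("hey", 5.5)]          # limited target data

pretrained = fit_offset(build_parallel(multi_speaker))  # pre-training
fine_tuned = fit_offset(build_parallel(target_voice))   # fine-tuning

def adapted(transcript):
    # Modify base synthetic speech toward the target characteristics.
    return base_tts(transcript) + fine_tuned
```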
  • Publication number: 20240013770
    Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
    Type: Application
    Filed: June 6, 2023
    Publication date: January 11, 2024
    Inventors: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote
  • Patent number: 11830476
    Abstract: Devices and techniques are generally described for learned condition text-to-speech synthesis. In some examples, first data representing a selection of a type of prosodic expressivity may be received. In some further examples, a selection of content comprising text data may be received. First audio data may be determined that includes an audio representation of the text data. The first audio data may be generated based at least in part on sampling from a first latent distribution generated using a conditional primary variational autoencoder (VAE). The sampling from the first latent distribution may be conditioned on a first learned distribution associated with the type of prosodic expressivity. In various examples, the first audio data may be sent to a first computing device.
    Type: Grant
    Filed: June 8, 2021
    Date of Patent: November 28, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Panagiota Karanasou, Sri Vishnu Kumar Karlapati, Alexis Pierre Moinet, Arnaud Vincent Pierre Yves Joly, Syed Ammar Abbas, Thomas Renaud Drugman, Jaime Lorenzo Trueba
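The conditioning step this abstract describes, sampling a latent conditioned on a learned distribution for the selected type of prosodic expressivity, can be sketched with per-style Gaussians standing in for the conditional VAE's learned distributions. The style names, parameters, and "decoder" are invented for the example.

```python
# Toy sketch (hypothetical parameters): each prosodic-expressivity type
# has a learned latent distribution (mean, std); synthesis samples a
# latent conditioned on the selected type, then "decodes" it to audio.
import random

STYLE_DISTRIBUTIONS = {      # assumed per-style latent Gaussians
    "neutral": (0.0, 0.1),
    "excited": (2.0, 0.5),
}

def sample_latent(style, rng):
    # Sampling conditioned on the style's learned distribution.
    mean, std = STYLE_DISTRIBUTIONS[style]
    return rng.gauss(mean, std)

def decode(latent, text):
    # Stand-in decoder: the latent shifts a per-character baseline.
    return [ord(c) % 10 + latent for c in text]

rng = random.Random(0)       # seeded for reproducibility
audio = decode(sample_latent("excited", rng), "hi")
```

In the patented setting the conditional primary VAE learns these distributions from data rather than taking them from a table.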
  • Patent number: 11735156
    Abstract: A speech-processing system receives first audio data corresponding to a first voice and second audio data corresponding to a second voice. The speech-processing system determines vocal characteristics of the second voice and determines output corresponding to the first audio data and the vocal characteristics.
    Type: Grant
    Filed: August 31, 2020
    Date of Patent: August 22, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Jaime Lorenzo Trueba, Alejandro Ricardo Mottini d'Oliveira, Thomas Renaud Drugman, Sri Vishnu Kumar Karlapati
  • Patent number: 11735162
    Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
    Type: Grant
    Filed: August 8, 2022
    Date of Patent: August 22, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote
  • Publication number: 20230260502
    Abstract: A text-to-speech (TTS) system may be configured to imitate characteristics of a target voice based on a limited dataset. The TTS system may include a machine learning model pre-trained using a synthetic parallel dataset and fine-tuned using examples of the target voice. A TTS component trained using a large single-speaker dataset may be used to generate the synthetic parallel dataset based on a multi-speaker dataset. The synthetic parallel dataset may include target audio data representing speech in the multi-speaker dataset and predicted audio data generated by the TTS component based on transcripts of the speech. The machine learning model may be pre-trained using the synthetic parallel dataset and fine-tuned using audio data representing target voice speech and predicted audio generated by the TTS component based on transcripts of the target voice speech. The trained model may be used to modify synthetic speech to approximate the characteristics of the target speech.
    Type: Application
    Filed: February 14, 2022
    Publication date: August 17, 2023
    Inventors: Adam Marek Gabrys, Jaime Lorenzo Trueba, Goeric Sydney Huybrechts
  • Publication number: 20230260501
    Abstract: A speech-processing system receives first audio data corresponding to a first voice and second audio data corresponding to a second voice. The speech-processing system determines vocal characteristics of the second voice and determines output corresponding to the first audio data and the vocal characteristics.
    Type: Application
    Filed: April 24, 2023
    Publication date: August 17, 2023
    Inventors: Jaime Lorenzo Trueba, Alejandro Ricardo Mottini d'Oliveira, Thomas Renaud Drugman, Sri Vishnu Kumar Karlapati
  • Publication number: 20230058658
    Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
    Type: Application
    Filed: August 8, 2022
    Publication date: February 23, 2023
    Inventors: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote
  • Publication number: 20230018972
    Abstract: A speech-processing system receives input data corresponding to one or more characteristics of speech. The system determines parameters representing the characteristics and, using the parameters, encoded values corresponding to the characteristics. A speech synthesis component of the speech-processing system processes the encoded values to determine audio data including a representation of the speech and corresponding to the characteristics.
    Type: Application
    Filed: May 10, 2022
    Publication date: January 19, 2023
    Inventors: Abdigani Mohamed Diriye, Jaime Lorenzo Trueba, Patryk Golebiowski, Piotr Jozwiak
  • Patent number: 11410639
    Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.
    Type: Grant
    Filed: July 7, 2020
    Date of Patent: August 9, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote
  • Patent number: 11341953
    Abstract: A speech-processing system receives input data corresponding to one or more characteristics of speech. The system determines parameters representing the characteristics and, using the parameters, encoded values corresponding to the characteristics. A speech synthesis component of the speech-processing system processes the encoded values to determine audio data including a representation of the speech and corresponding to the characteristics.
    Type: Grant
    Filed: September 21, 2020
    Date of Patent: May 24, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Abdigani Mohamed Diriye, Jaime Lorenzo Trueba, Patryk Golebiowski, Piotr Jozwiak
  • Publication number: 20220093078
    Abstract: A speech-processing system receives input data corresponding to one or more characteristics of speech. The system determines parameters representing the characteristics and, using the parameters, encoded values corresponding to the characteristics. A speech synthesis component of the speech-processing system processes the encoded values to determine audio data including a representation of the speech and corresponding to the characteristics.
    Type: Application
    Filed: September 21, 2020
    Publication date: March 24, 2022
    Inventors: Abdigani Mohamed Diriye, Jaime Lorenzo Trueba, Patryk Golebiowski, Piotr Jozwiak