Patents by Inventor Andrew Paul Breen

Andrew Paul Breen has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Text-to-speech (TTS) processing

Patent number: 12272350

Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.

Type: Grant

Filed: May 15, 2024

Date of Patent: April 8, 2025

Assignee: Amazon Technologies, Inc.

Inventors: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote

TEXT-TO-SPEECH (TTS) PROCESSING

Publication number: 20240296827

Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.

Type: Application

Filed: May 15, 2024

Publication date: September 5, 2024

Inventors: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote

Text-to-speech (TTS) processing

Patent number: 11990118

Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.

Type: Grant

Filed: June 6, 2023

Date of Patent: May 21, 2024

Assignee: Amazon Technologies, Inc.

Inventors: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote

TEXT-TO-SPEECH (TTS) PROCESSING

Publication number: 20240013770

Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.

Type: Application

Filed: June 6, 2023

Publication date: January 11, 2024

Inventors: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote

Synthetic speech processing

Patent number: 11823655

Abstract: A speech-processing system receives both text data and natural-understanding data (e.g., a domain, intent, and/or entity) related to a command represented in the text data. The system uses the natural-understanding data to vary vocal characteristics in determining spectrogram data corresponding to the text data based on the natural-understanding data.

Type: Grant

Filed: June 9, 2022

Date of Patent: November 21, 2023

Assignee: Amazon Technologies, Inc.

Inventors: Antonio Bonafonte, Panagiotis Agis Oikonomou Filandras, Bartosz Perz, Arent van Korlaar, Ioannis Douratsos, Jonas Felix Ananda Rohnke, Elena Sokolova, Andrew Paul Breen, Nikhil Sharma

Text-to-speech (TTS) processing

Patent number: 11763797

Abstract: A speech model includes a sub-model corresponding to a vocal attribute. The speech model generates an output waveform using a sample model, which receives text data, and a conditioning model, which receives text metadata and produces a prosody output for use by the sample model. If, during training or runtime, a different vocal attribute is desired or needed, the sub-model is re-trained or switched to a different sub-model corresponding to the different vocal attribute.

Type: Grant

Filed: June 23, 2020

Date of Patent: September 19, 2023

Assignee: Amazon Technologies, Inc.

Inventors: Roberto Barra Chicote, Adam Franciszek Nadolski, Thomas Edward Merritt, Bartosz Putrycz, Andrew Paul Breen

Text-to-speech (TTS) processing

Patent number: 11735162

Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.

Type: Grant

Filed: August 8, 2022

Date of Patent: August 22, 2023

Assignee: Amazon Technologies, Inc.

Inventors: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote

SYNTHETIC SPEECH PROCESSING

Publication number: 20230113297

Abstract: A speech-processing system receives both text data and natural-understanding data (e.g., a domain, intent, and/or entity) related to a command represented in the text data. The system uses the natural-understanding data to vary vocal characteristics in determining spectrogram data corresponding to the text data based on the natural-understanding data.

Type: Application

Filed: June 9, 2022

Publication date: April 13, 2023

Inventors: Antonio Bonafonte, Panagiotis Agis Oikonomou Filandras, Bartosz Perz, Arent van Korlaar, Ioannis Douratsos, Jonas Felix Ananda Rohnke, Elena Sokolova, Andrew Paul Breen, Nikhil Sharma

TEXT-TO-SPEECH (TTS) PROCESSING

Publication number: 20230058658

Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.

Type: Application

Filed: August 8, 2022

Publication date: February 23, 2023

Inventors: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote

TEXT-TO-SPEECH PROCESSING USING INPUT VOICE CHARACTERISTIC DATA

Publication number: 20230043916

Abstract: During text-to-speech processing, a speech model creates synthesized speech that corresponds to input data. The speech model may include an encoder for encoding the input data into a context vector and a decoder for decoding the context vector into spectrogram data. The speech model may further include a voice decoder that receives vocal characteristic data representing a desired vocal characteristic of synthesized speech. The voice decoder may process the vocal characteristic data to determine configuration data, such as weights, for use by the speech decoder.

Type: Application

Filed: June 24, 2022

Publication date: February 9, 2023

Inventors: Roberto Barra Chicote, Vatsal Aggarwal, Andrew Paul Breen, Javier Gonzalez Hernandez, Nishant Prateek
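The abstract above describes a voice decoder that turns vocal characteristic data into configuration data, such as weights, for the speech decoder. A minimal sketch of that data flow follows. It is not the patented implementation: the function names, the linear projection, and all numeric values are hypothetical and chosen only to show one component producing the weights that another component then uses.

```python
def voice_decoder(vocal_characteristics, projection):
    """Turn vocal characteristic data into configuration data (weights).

    A plain linear map is an assumption made for illustration only.
    """
    return [
        sum(p * v for p, v in zip(row, vocal_characteristics))
        for row in projection
    ]


def speech_decoder(context_vector, weights):
    """Decode a context vector into toy spectrogram values using the supplied weights."""
    return [w * c for w, c in zip(weights, context_vector)]


# Hypothetical values: two vocal characteristics, three decoder weights.
projection = [[1.0, 0.5], [0.0, 2.0], [1.0, 1.0]]
vocal_characteristics = [1.0, 2.0]
context_vector = [1.0, 0.5, 2.0]  # stand-in for the encoder output

weights = voice_decoder(vocal_characteristics, projection)
spectrogram_frame = speech_decoder(context_vector, weights)
```

Changing `vocal_characteristics` changes `weights`, and therefore the decoder output, without touching the context vector.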
Multilingual speech translation with adaptive speech synthesis and adaptive physiognomy

Patent number: 11545134

Abstract: Techniques for the generation of dubbed audio for an audio/video are described.

Type: Grant

Filed: December 10, 2019

Date of Patent: January 3, 2023

Assignee: Amazon Technologies, Inc.

Inventors: Marcello Federico, Robert Enyedi, Yaser Al-Onaizan, Roberto Barra-Chicote, Andrew Paul Breen, Ritwik Giri, Mehmet Umut Isik, Arvindh Krishnaswamy, Hassan Sawaf

Text-to-speech (TTS) processing

Patent number: 11410639

Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.

Type: Grant

Filed: July 7, 2020

Date of Patent: August 9, 2022

Assignee: Amazon Technologies, Inc.

Inventors: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote

Text-to-speech processing using input voice characteristic data

Patent number: 11373633

Abstract: During text-to-speech processing, a speech model creates synthesized speech that corresponds to input data. The speech model may include an encoder for encoding the input data into a context vector and a decoder for decoding the context vector into spectrogram data. The speech model may further include a voice decoder that receives vocal characteristic data representing a desired vocal characteristic of synthesized speech. The voice decoder may process the vocal characteristic data to determine configuration data, such as weights, for use by the speech decoder.

Type: Grant

Filed: September 27, 2019

Date of Patent: June 28, 2022

Assignee: Amazon Technologies, Inc.

Inventors: Roberto Barra Chicote, Vatsal Aggarwal, Andrew Paul Breen, Javier Gonzalez Hernandez, Nishant Prateek

Synthetic speech processing

Patent number: 11367431

Abstract: A speech-processing system receives both text data and natural-understanding data (e.g., a domain, intent, and/or entity) related to a command represented in the text data. The system uses the natural-understanding data to vary vocal characteristics in determining spectrogram data corresponding to the text data based on the natural-understanding data.

Type: Grant

Filed: March 13, 2020

Date of Patent: June 21, 2022

Assignee: Amazon Technologies, Inc.

Inventors: Antonio Bonafonte, Panagiotis Agis Oikonomou Filandras, Bartosz Perz, Arent van Korlaar, Ioannis Douratsos, Jonas Felix Ananda Rohnke, Elena Sokolova, Andrew Paul Breen, Nikhil Sharma

SYNTHETIC SPEECH PROCESSING

Publication number: 20210287656

Abstract: A speech-processing system receives both text data and natural-understanding data (e.g., a domain, intent, and/or entity) related to a command represented in the text data. The system uses the natural-understanding data to vary vocal characteristics in determining spectrogram data corresponding to the text data based on the natural-understanding data.

Type: Application

Filed: March 13, 2020

Publication date: September 16, 2021

Inventors: Antonio Bonafonte, Panagiotis Agis Oikonomou Filandras, Bartosz Perz, Arent van Korlaar, Ioannis Douratsos, Jonas Felix Ananda Rohnke, Elena Sokolova, Andrew Paul Breen, Nikhil Sharma

Synthetic speech processing

Patent number: 11017763

Abstract: During text-to-speech processing, a sequence-to-sequence neural network model may process text data and determine corresponding spectrogram data. A normalizing flow component may then process this spectrogram data to predict corresponding phase data. An inverse Fourier transform may then be performed on the spectrogram and phase data to create an audio waveform that includes speech corresponding to the text.

Type: Grant

Filed: December 12, 2019

Date of Patent: May 25, 2021

Assignee: Amazon Technologies, Inc.

Inventors: Vatsal Aggarwal, Nishant Prateek, Roberto Barra Chicote, Andrew Paul Breen
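The last step in the abstract above, an inverse Fourier transform over spectrogram and phase data, can be illustrated on a single toy frame. This is a sketch under stated assumptions, not the patented method: the normalizing flow that predicts phase is not modelled, and the phase is instead taken from a forward transform of a known signal so the round trip can be checked. The helper names `dft` and `inverse_dft` are hypothetical.

```python
import cmath
import math


def dft(frame):
    """Forward discrete Fourier transform of one frame (naive O(n^2) version)."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        for k in range(n)
    ]


def inverse_dft(magnitudes, phases):
    """Rebuild one real-valued frame from per-bin magnitude and phase."""
    n = len(magnitudes)
    # Combine magnitude and phase into a complex spectrum.
    spectrum = [cmath.rect(m, p) for m, p in zip(magnitudes, phases)]
    return [
        (sum(c * cmath.exp(2j * math.pi * k * t / n)
             for k, c in enumerate(spectrum)) / n).real
        for t in range(n)
    ]


# Toy frame: one cycle of a sine wave over 8 samples.
frame = [math.sin(2 * math.pi * t / 8) for t in range(8)]
spectrum = dft(frame)
magnitudes = [abs(c) for c in spectrum]       # stand-in for spectrogram data
phases = [cmath.phase(c) for c in spectrum]   # stand-in for predicted phase data
rebuilt = inverse_dft(magnitudes, phases)
```

With exact phase, `rebuilt` matches `frame` up to floating-point error. A full system would apply this per frame and overlap-add the results, which is omitted here.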
TEXT-TO-SPEECH PROCESSING

Publication number: 20210097976

Abstract: During text-to-speech processing, a speech model creates synthesized speech that corresponds to input data. The speech model may include an encoder for encoding the input data into a context vector and a decoder for decoding the context vector into spectrogram data. The speech model may further include a voice decoder that receives vocal characteristic data representing a desired vocal characteristic of synthesized speech. The voice decoder may process the vocal characteristic data to determine configuration data, such as weights, for use by the speech decoder.

Type: Application

Filed: September 27, 2019

Publication date: April 1, 2021

Inventors: Roberto Barra Chicote, Vatsal Aggarwal, Andrew Paul Breen, Javier Gonzalez Hernandez, Nishant Prateek

TEXT-TO-SPEECH (TTS) PROCESSING

Publication number: 20200410981

Abstract: A speech model is trained using multi-task learning. A first task may correspond to how well predicted audio matches training audio; a second task may correspond to a metric of perceived audio quality. The speech model may include, during training, layers related to the second task that are discarded at runtime.

Type: Application

Filed: May 19, 2020

Publication date: December 31, 2020

Inventors: Thomas Edward Merritt, Adam Franciszek Nadolski, Nishant Prateek, Bartosz Putrycz, Roberto Barra Chicote, Vatsal Aggarwal, Andrew Paul Breen
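The multi-task setup in the abstract above can be sketched as a combined training loss plus a step that drops the training-only layers before runtime. Everything specific here is an assumption made for illustration: the weighted-sum form of the loss, the `quality_weight` value, the use of squared error for both tasks, and the `quality_` naming prefix for the discarded layers. The abstract itself does not state how the two tasks are combined.

```python
def mean_squared_error(predicted, target):
    """Average squared difference between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)


def multitask_loss(predicted_audio, training_audio,
                   predicted_quality, target_quality, quality_weight=0.1):
    """First task: match the training audio. Second task: match a quality metric."""
    audio_loss = mean_squared_error(predicted_audio, training_audio)
    quality_loss = (predicted_quality - target_quality) ** 2
    return audio_loss + quality_weight * quality_loss


def strip_training_only_layers(layers):
    """Keep only the layers needed at runtime (hypothetical naming convention)."""
    return {name: layer for name, layer in layers.items()
            if not name.startswith("quality_")}


# Placeholder layer objects; a real model would hold trained parameters here.
training_layers = {"encoder": "...", "decoder": "...", "quality_head": "..."}
runtime_layers = strip_training_only_layers(training_layers)

loss = multitask_loss([1.0, 2.0], [1.0, 4.0],
                      predicted_quality=0.5, target_quality=1.0)
```

The second task only shapes the shared layers during training, so removing `quality_head` afterwards leaves the audio path unchanged.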
TEXT-TO-SPEECH (TTS) PROCESSING

Publication number: 20200394997

Abstract: During text-to-speech processing, a speech model creates output audio data, including speech, that corresponds to input text data that includes a representation of the speech. A spectrogram estimator estimates a frequency spectrogram of the speech; the corresponding frequency-spectrogram data is used to condition the speech model. A plurality of acoustic features corresponding to different segments of the input text data, such as phonemes, syllable-level features, and/or word-level features, may be separately encoded into context vectors; the spectrogram estimator uses these separate context vectors to create the frequency spectrogram.

Type: Application

Filed: July 7, 2020

Publication date: December 17, 2020

Inventors: Jaime Lorenzo Trueba, Thomas Renaud Drugman, Viacheslav Klimkov, Srikanth Ronanki, Thomas Edward Merritt, Andrew Paul Breen, Roberto Barra-Chicote

TEXT-TO-SPEECH (TTS) PROCESSING

Publication number: 20200365137

Abstract: A speech model includes a sub-model corresponding to a vocal attribute. The speech model generates an output waveform using a sample model, which receives text data, and a conditioning model, which receives text metadata and produces a prosody output for use by the sample model. If, during training or runtime, a different vocal attribute is desired or needed, the sub-model is re-trained or switched to a different sub-model corresponding to the different vocal attribute.

Type: Application

Filed: June 23, 2020

Publication date: November 19, 2020

Inventors: Roberto Barra Chicote, Adam Franciszek Nadolski, Thomas Edward Merritt, Bartosz Putrycz, Andrew Paul Breen
