Patents by Inventor Adam Franciszek Nadolski

Adam Franciszek Nadolski has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Text-to-speech (TTS) processing

Patent number: 11763797

Abstract: A speech model includes a sub-model corresponding to a vocal attribute. The speech model generates an output waveform using a sample model, which receives text data, and a conditioning model, which receives text metadata and produces a prosody output for use by the sample model. If, during training or runtime, a different vocal attribute is desired or needed, the sub-model is re-trained or switched to a different sub-model corresponding to the different vocal attribute.

Type: Grant

Filed: June 23, 2020

Date of Patent: September 19, 2023

Assignee: Amazon Technologies, Inc.

Inventors: Roberto Barra Chicote, Adam Franciszek Nadolski, Thomas Edward Merritt, Bartosz Putrycz, Andrew Paul Breen

Contextual text-to-speech processing

Patent number: 11443733

Abstract: A text-to-speech (TTS) system is capable of considering characteristics of various portions of text data in order to create continuity between segments of synthesized speech. The system can analyze text portions of a work and create feature vectors including data corresponding to characteristics of the individual portions and/or the overall work. A TTS processing component can then consider feature vector(s) from other portions when performing TTS processing on text of a first portion, thus giving the TTS component some intelligence regarding other portions of the work, which can then result in more continuity between synthesized speech segments.

Type: Grant

Filed: October 28, 2019

Date of Patent: September 13, 2022

Assignee: Amazon Technologies, Inc.

Inventors: Roberto Barra Chicote, Javier Latorre, Adam Franciszek Nadolski, Viacheslav Klimkov, Thomas Edward Merritt

Text-to-speech processing with emphasized output audio

Patent number: 11062694

Abstract: Systems and methods for generating output audio with emphasized portions are described. Spoken audio is obtained and undergoes speech processing (e.g., ASR and optionally NLU) to create text. It may be determined that the resulting text includes a portion that should be emphasized (e.g., an interjection) using at least one of knowledge of an application run on a device that captured the spoken audio, prosodic analysis, and/or linguistic analysis. The portion of text to be emphasized may be tagged (e.g., using a Speech Synthesis Markup Language (SSML) tag). TTS processing is then performed on the tagged text to create output audio including an emphasized portion corresponding to the tagged portion of the text.

Type: Grant

Filed: June 7, 2019

Date of Patent: July 13, 2021

Assignee: Amazon Technologies, Inc.

Inventors: Marco Nicolis, Adam Franciszek Nadolski
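
The tagging step described in the abstract above can be illustrated with a minimal Python sketch. It is not the patented method: a hypothetical fixed word list (`INTERJECTIONS`) stands in for the application knowledge, prosodic analysis, and linguistic analysis the abstract mentions, and the function name `tag_emphasis` is likewise an assumption. The `<emphasis>` element itself is part of standard SSML.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical stand-in for the analysis step: a fixed set of interjections.
INTERJECTIONS = {"wow", "oops", "hooray"}


def tag_emphasis(text: str) -> str:
    """Wrap known interjections in SSML <emphasis> tags (illustrative only)."""

    def wrap(match: re.Match) -> str:
        word = match.group(0)
        if word.lower() in INTERJECTIONS:
            return f'<emphasis level="strong">{word}</emphasis>'
        return word

    # Escape XML special characters first, then tag whole words.
    body = re.sub(r"[A-Za-z']+", wrap, escape(text))
    return f"<speak>{body}</speak>"


if __name__ == "__main__":
    print(tag_emphasis("Wow, that is great"))
```

The tagged string would then be handed to a TTS engine that understands SSML; that engine is outside the scope of this sketch.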

TEXT-TO-SPEECH (TTS) PROCESSING

Publication number: 20200410981

Abstract: A speech model is trained using multi-task learning. A first task may correspond to how well predicted audio matches training audio; a second task may correspond to a metric of perceived audio quality. The speech model may include, during training, layers related to the second task that are discarded at runtime.

Type: Application

Filed: May 19, 2020

Publication date: December 31, 2020

Inventors: Thomas Edward Merritt, Adam Franciszek Nadolski, Nishant Prateek, Bartosz Putrycz, Roberto Barra Chicote, Vatsal Aggarwal, Andrew Paul Breen

TEXT-TO-SPEECH (TTS) PROCESSING

Publication number: 20200365137

Abstract: A speech model includes a sub-model corresponding to a vocal attribute. The speech model generates an output waveform using a sample model, which receives text data, and a conditioning model, which receives text metadata and produces a prosody output for use by the sample model. If, during training or runtime, a different vocal attribute is desired or needed, the sub-model is re-trained or switched to a different sub-model corresponding to the different vocal attribute.

Type: Application

Filed: June 23, 2020

Publication date: November 19, 2020

Inventors: Roberto Barra Chicote, Adam Franciszek Nadolski, Thomas Edward Merritt, Bartosz Putrycz, Andrew Paul Breen

Text-to-speech (TTS) processing

Patent number: 10706837

Abstract: A speech model includes a sub-model corresponding to a vocal attribute. The speech model generates an output waveform using a sample model, which receives text data, and a conditioning model, which receives text metadata and produces a prosody output for use by the sample model. If, during training or runtime, a different vocal attribute is desired or needed, the sub-model is re-trained or switched to a different sub-model corresponding to the different vocal attribute.

Type: Grant

Filed: June 13, 2018

Date of Patent: July 7, 2020

Assignee: Amazon Technologies, Inc.

Inventors: Roberto Barra Chicote, Adam Franciszek Nadolski, Thomas Edward Merritt, Bartosz Putrycz, Andrew Paul Breen

Text-to-speech (TTS) processing

Patent number: 10699695

Abstract: During text-to-speech processing, audio data corresponding to a word part, word, or group of words is generated using a trained model and used by a unit selection engine to create output audio. The audio data is generated at least when an input word is unrecognized or when a cost of a unit selection is too high.

Type: Grant

Filed: June 29, 2018

Date of Patent: June 30, 2020

Assignee: Amazon Technologies, Inc.

Inventors: Adam Franciszek Nadolski, Daniel Korzekwa, Thomas Edward Merritt, Marco Nicolis, Bartosz Putrycz, Roberto Barra Chicote, Rafal Kuklinski, Wiktor Dolecki
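
The fallback condition in the abstract above (generate audio when a word is unrecognized or when the unit-selection cost is too high) can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented implementation: the unit database is modeled as a plain dictionary mapping a word to an `(audio, cost)` pair, and `choose_audio`, `generate`, and `max_cost` are hypothetical names.

```python
from typing import Callable, Dict, Tuple


def choose_audio(
    word: str,
    unit_db: Dict[str, Tuple[str, float]],  # hypothetical: word -> (audio, cost)
    generate: Callable[[str], str],         # hypothetical trained-model synthesizer
    max_cost: float,
) -> str:
    """Return stored unit audio, or model-generated audio as a fallback."""
    candidate = unit_db.get(word)
    if candidate is None:
        # The input word is unrecognized: no unit exists for it.
        return generate(word)
    audio, cost = candidate
    if cost > max_cost:
        # A unit exists, but selecting it would cost too much.
        return generate(word)
    return audio


if __name__ == "__main__":
    units = {"hello": ("unit-audio-hello", 0.2), "zyzzyva": ("unit-audio-zyzzyva", 0.9)}
    model = lambda w: f"model-audio-{w}"
    for w in ("hello", "zyzzyva", "qwerty"):
        print(w, "->", choose_audio(w, units, model, max_cost=0.5))
```

Strings stand in for audio data here; a real system would pass waveforms or feature frames and would compute the cost from target and join costs rather than look it up.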

Text-to-speech (TTS) processing

Patent number: 10692484

Abstract: A speech model is trained using multi-task learning. A first task may correspond to how well predicted audio matches training audio; a second task may correspond to a metric of perceived audio quality. The speech model may include, during training, layers related to the second task that are discarded at runtime.

Type: Grant

Filed: June 13, 2018

Date of Patent: June 23, 2020

Assignee: Amazon Technologies, Inc.

Inventors: Thomas Edward Merritt, Adam Franciszek Nadolski, Nishant Prateek, Bartosz Putrycz, Roberto Barra Chicote, Vatsal Aggarwal, Andrew Paul Breen

CONTEXTUAL TEXT-TO-SPEECH PROCESSING

Publication number: 20200152169

Abstract: A text-to-speech (TTS) system is capable of considering characteristics of various portions of text data in order to create continuity between segments of synthesized speech. The system can analyze text portions of a work and create feature vectors including data corresponding to characteristics of the individual portions and/or the overall work. A TTS processing component can then consider feature vector(s) from other portions when performing TTS processing on text of a first portion, thus giving the TTS component some intelligence regarding other portions of the work, which can then result in more continuity between synthesized speech segments.

Type: Application

Filed: October 28, 2019

Publication date: May 14, 2020

Inventors: Roberto Barra Chicote, Javier Latorre, Adam Franciszek Nadolski, Viacheslav Klimkov, Thomas Edward Merritt

TEXT-TO-SPEECH PROCESSING WITH EMPHASIZED OUTPUT AUDIO

Publication number: 20190362704

Abstract: Systems and methods for generating output audio with emphasized portions are described. Spoken audio is obtained and undergoes speech processing (e.g., ASR and optionally NLU) to create text. It may be determined that the resulting text includes a portion that should be emphasized (e.g., an interjection) using at least one of knowledge of an application run on a device that captured the spoken audio, prosodic analysis, and/or linguistic analysis. The portion of text to be emphasized may be tagged (e.g., using a Speech Synthesis Markup Language (SSML) tag). TTS processing is then performed on the tagged text to create output audio including an emphasized portion corresponding to the tagged portion of the text.

Type: Application

Filed: June 7, 2019

Publication date: November 28, 2019

Inventors: Marco Nicolis, Adam Franciszek Nadolski

Contextual text-to-speech processing

Patent number: 10475438

Abstract: A text-to-speech (TTS) system is capable of considering characteristics of various portions of text data in order to create continuity between segments of synthesized speech. The system can analyze text portions of a work and create feature vectors including data corresponding to characteristics of the individual portions and/or the overall work. A TTS processing component can then consider feature vector(s) from other portions when performing TTS processing on text of a first portion, thus giving the TTS component some intelligence regarding other portions of the work, which can then result in more continuity between synthesized speech segments.

Type: Grant

Filed: March 2, 2017

Date of Patent: November 12, 2019

Assignee: Amazon Technologies, Inc.

Inventors: Roberto Barra Chicote, Javier Latorre, Adam Franciszek Nadolski, Viacheslav Klimkov, Thomas Edward Merritt

Text-to-speech processing with emphasized output audio

Patent number: 10319365

Abstract: Systems and methods for generating output audio with emphasized portions are described. Spoken audio is obtained and undergoes speech processing (e.g., ASR and optionally NLU) to create text. It may be determined that the resulting text includes a portion that should be emphasized (e.g., an interjection) using at least one of knowledge of an application run on a device that captured the spoken audio, prosodic analysis, and/or linguistic analysis. The portion of text to be emphasized may be tagged (e.g., using a Speech Synthesis Markup Language (SSML) tag). TTS processing is then performed on the tagged text to create output audio including an emphasized portion corresponding to the tagged portion of the text.

Type: Grant

Filed: June 27, 2016

Date of Patent: June 11, 2019

Assignee: Amazon Technologies, Inc.

Inventors: Marco Nicolis, Adam Franciszek Nadolski

Low latency audio interface

Patent number: 10079021

Abstract: Systems and methods for utilizing incremental processing of portions of output data to limit the time required to provide a response to a user request are provided herein. In some embodiments, portions of the user request for information can be analyzed using techniques such as automatic speech recognition (ASR), speech-to-text (STT), and natural language understanding (NLU) to determine the overall topic of the user request. Once the topic has been determined, portions of the anticipated audio output data can be synthesized independently instead of waiting for the complete response. The synthesized portions can then be provided to the electronic device in anticipation of being output through one or more speakers on the electronic device, which reduces the time needed to provide the response to the user.

Type: Grant

Filed: December 18, 2015

Date of Patent: September 18, 2018

Assignee: Amazon Technologies, Inc.

Inventors: Roberto Barra Chicote, Adam Franciszek Nadolski

Iterative text-to-speech with user feedback

Patent number: 9978359

Abstract: A text-to-speech (TTS) processing system may be configured for iterative processing. Speech units for unit selection may be tagged according to extra segmental features, such as emotional features, dramatic features, etc. Preliminary TTS results based on input text may be provided to a user through a user interface. The user may offer corrections to the preliminary results. Those corrections may correspond to the extra segmental features. The user corrections may then be input into the TTS system along with the input text to provide refined TTS results. This process may be repeated iteratively to obtain desired TTS results.

Type: Grant

Filed: December 6, 2013

Date of Patent: May 22, 2018

Assignee: Amazon Technologies, Inc.

Inventors: Michal Tadeusz Kaszczuk, Jeffrey Penrod Adams, Adam Franciszek Nadolski

Text-to-speech processing using pre-stored results

Patent number: 9240178

Abstract: A text-to-speech (TTS) system is configured with multiple voice corpuses used to synthesize speech. An incoming TTS request may be processed by a first, smaller, voice corpus to quickly return results to the user. The text of the request may be stored by the TTS system and then processed in the background using a second, larger, voice corpus. The second corpus takes longer to process but returns higher quality results. Future incoming TTS requests may be compared against the text of the first TTS request. If the text, or portions thereof match, the system may return stored results from the processing by the second corpus, thus returning high quality speech results in a shorter time.

Type: Grant

Filed: June 26, 2014

Date of Patent: January 19, 2016

Assignee: Amazon Technologies, Inc.

Inventors: Adam Franciszek Nadolski, Michal Krzysztof Kiedrowicz
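
The two-corpus flow in the abstract above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented system: the class name `CachedTTS`, the `fast` and `slow` callables (standing in for synthesis with the smaller and larger voice corpus), and the explicit `run_background` call (standing in for real background processing) are all hypothetical. The sketch also matches only whole request texts, whereas the abstract covers matching portions of text as well.

```python
from typing import Callable, Dict, List


class CachedTTS:
    """Answer quickly now, and serve the higher-quality result on repeat requests."""

    def __init__(self, fast: Callable[[str], bytes], slow: Callable[[str], bytes]):
        self.fast = fast                    # smaller corpus: quick, lower quality
        self.slow = slow                    # larger corpus: slower, higher quality
        self.cache: Dict[str, bytes] = {}   # stored high-quality results by text
        self.pending: List[str] = []        # texts awaiting background processing

    def synthesize(self, text: str) -> bytes:
        if text in self.cache:
            # A previous request with the same text was already processed.
            return self.cache[text]
        if text not in self.pending:
            self.pending.append(text)
        return self.fast(text)

    def run_background(self) -> None:
        """Process stored texts with the larger corpus (called explicitly here)."""
        while self.pending:
            text = self.pending.pop(0)
            self.cache[text] = self.slow(text)


if __name__ == "__main__":
    tts = CachedTTS(lambda t: b"fast:" + t.encode(), lambda t: b"slow:" + t.encode())
    print(tts.synthesize("hello"))  # first request: quick result
    tts.run_background()
    print(tts.synthesize("hello"))  # repeat request: stored high-quality result
```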