Patents by Inventor Adam Franciszek Nadolski

Adam Franciszek Nadolski has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11763797
    Abstract: A speech model includes a sub-model corresponding to a vocal attribute. The speech model generates an output waveform using a sample model, which receives text data, and a conditioning model, which receives text metadata and produces a prosody output for use by the sample model. If, during training or runtime, a different vocal attribute is desired or needed, the sub-model is re-trained or switched to a different sub-model corresponding to the different vocal attribute.
    Type: Grant
    Filed: June 23, 2020
    Date of Patent: September 19, 2023
    Assignee: Amazon Technologies, Inc.
    Inventors: Roberto Barra Chicote, Adam Franciszek Nadolski, Thomas Edward Merritt, Bartosz Putrycz, Andrew Paul Breen
  • Patent number: 11443733
    Abstract: A text-to-speech (TTS) system that is capable of considering characteristics of various portions of text data in order to create continuity between segments of synthesized speech. The system can analyze text portions of a work and create feature vectors including data corresponding to characteristics of the individual portions and/or the overall work. A TTS processing component can then consider feature vector(s) from other portions when performing TTS processing on text of a first portion, thus giving the TTS component some intelligence regarding other portions of the work, which can then result in more continuity between synthesized speech segments.
    Type: Grant
    Filed: October 28, 2019
    Date of Patent: September 13, 2022
    Assignee: Amazon Technologies, Inc.
    Inventors: Roberto Barra Chicote, Javier Latorre, Adam Franciszek Nadolski, Viacheslav Klimkov, Thomas Edward Merritt
  • Patent number: 11062694
    Abstract: Systems and methods for generating output audio with emphasized portions are described. Spoken audio is obtained and undergoes speech processing (e.g., ASR and optionally NLU) to create text. It may be determined that the resulting text includes a portion that should be emphasized (e.g., an interjection) using at least one of knowledge of an application run on a device that captured the spoken audio, prosodic analysis, and/or linguistic analysis. The portion of text to be emphasized may be tagged (e.g., using a Speech Synthesis Markup Language (SSML) tag). TTS processing is then performed on the tagged text to create output audio including an emphasized portion corresponding to the tagged portion of the text.
    Type: Grant
    Filed: June 7, 2019
    Date of Patent: July 13, 2021
    Assignee: Amazon Technologies, Inc.
    Inventors: Marco Nicolis, Adam Franciszek Nadolski
  • Publication number: 20200410981
    Abstract: A speech model is trained using multi-task learning. A first task may correspond to how well predicted audio matches training audio; a second task may correspond to a metric of perceived audio quality. The speech model may include, during training, layers related to the second task that are discarded at runtime.
    Type: Application
    Filed: May 19, 2020
    Publication date: December 31, 2020
    Inventors: Thomas Edward Merritt, Adam Franciszek Nadolski, Nishant Prateek, Bartosz Putrycz, Roberto Barra Chicote, Vatsal Aggarwal, Andrew Paul Breen
  • Publication number: 20200365137
    Abstract: A speech model includes a sub-model corresponding to a vocal attribute. The speech model generates an output waveform using a sample model, which receives text data, and a conditioning model, which receives text metadata and produces a prosody output for use by the sample model. If, during training or runtime, a different vocal attribute is desired or needed, the sub-model is re-trained or switched to a different sub-model corresponding to the different vocal attribute.
    Type: Application
    Filed: June 23, 2020
    Publication date: November 19, 2020
    Inventors: Roberto Barra Chicote, Adam Franciszek Nadolski, Thomas Edward Merritt, Bartosz Putrycz, Andrew Paul Breen
  • Patent number: 10706837
    Abstract: A speech model includes a sub-model corresponding to a vocal attribute. The speech model generates an output waveform using a sample model, which receives text data, and a conditioning model, which receives text metadata and produces a prosody output for use by the sample model. If, during training or runtime, a different vocal attribute is desired or needed, the sub-model is re-trained or switched to a different sub-model corresponding to the different vocal attribute.
    Type: Grant
    Filed: June 13, 2018
    Date of Patent: July 7, 2020
    Assignee: Amazon Technologies, Inc.
    Inventors: Roberto Barra Chicote, Adam Franciszek Nadolski, Thomas Edward Merritt, Bartosz Putrycz, Andrew Paul Breen
  • Patent number: 10699695
    Abstract: During text-to-speech processing, audio data corresponding to a word part, word, or group of words is generated using a trained model and used by a unit selection engine to create output audio. The audio data is generated at least when an input word is unrecognized or when a cost of a unit selection is too high.
    Type: Grant
    Filed: June 29, 2018
    Date of Patent: June 30, 2020
    Assignee: Amazon Washington, Inc.
    Inventors: Adam Franciszek Nadolski, Daniel Korzekwa, Thomas Edward Merritt, Marco Nicolis, Bartosz Putrycz, Roberto Barra Chicote, Rafal Kuklinski, Wiktor Dolecki
  • Patent number: 10692484
    Abstract: A speech model is trained using multi-task learning. A first task may correspond to how well predicted audio matches training audio; a second task may correspond to a metric of perceived audio quality. The speech model may include, during training, layers related to the second task that are discarded at runtime.
    Type: Grant
    Filed: June 13, 2018
    Date of Patent: June 23, 2020
    Assignee: Amazon Technologies, Inc.
    Inventors: Thomas Edward Merritt, Adam Franciszek Nadolski, Nishant Prateek, Bartosz Putrycz, Roberto Barra Chicote, Vatsal Aggarwal, Andrew Paul Breen
  • Publication number: 20200152169
    Abstract: A text-to-speech (TTS) system that is capable of considering characteristics of various portions of text data in order to create continuity between segments of synthesized speech. The system can analyze text portions of a work and create feature vectors including data corresponding to characteristics of the individual portions and/or the overall work. A TTS processing component can then consider feature vector(s) from other portions when performing TTS processing on text of a first portion, thus giving the TTS component some intelligence regarding other portions of the work, which can then result in more continuity between synthesized speech segments.
    Type: Application
    Filed: October 28, 2019
    Publication date: May 14, 2020
    Inventors: Roberto Barra Chicote, Javier Latorre, Adam Franciszek Nadolski, Viacheslav Klimkov, Thomas Edward Merritt
  • Publication number: 20190362704
    Abstract: Systems and methods for generating output audio with emphasized portions are described. Spoken audio is obtained and undergoes speech processing (e.g., ASR and optionally NLU) to create text. It may be determined that the resulting text includes a portion that should be emphasized (e.g., an interjection) using at least one of knowledge of an application run on a device that captured the spoken audio, prosodic analysis, and/or linguistic analysis. The portion of text to be emphasized may be tagged (e.g., using a Speech Synthesis Markup Language (SSML) tag). TTS processing is then performed on the tagged text to create output audio including an emphasized portion corresponding to the tagged portion of the text.
    Type: Application
    Filed: June 7, 2019
    Publication date: November 28, 2019
    Inventors: Marco Nicolis, Adam Franciszek Nadolski
  • Patent number: 10475438
    Abstract: A text-to-speech (TTS) system that is capable of considering characteristics of various portions of text data in order to create continuity between segments of synthesized speech. The system can analyze text portions of a work and create feature vectors including data corresponding to characteristics of the individual portions and/or the overall work. A TTS processing component can then consider feature vector(s) from other portions when performing TTS processing on text of a first portion, thus giving the TTS component some intelligence regarding other portions of the work, which can then result in more continuity between synthesized speech segments.
    Type: Grant
    Filed: March 2, 2017
    Date of Patent: November 12, 2019
    Assignee: Amazon Technologies, Inc.
    Inventors: Roberto Barra Chicote, Javier Latorre, Adam Franciszek Nadolski, Viacheslav Klimkov, Thomas Edward Merritt
  • Patent number: 10319365
    Abstract: Systems and methods for generating output audio with emphasized portions are described. Spoken audio is obtained and undergoes speech processing (e.g., ASR and optionally NLU) to create text. It may be determined that the resulting text includes a portion that should be emphasized (e.g., an interjection) using at least one of knowledge of an application run on a device that captured the spoken audio, prosodic analysis, and/or linguistic analysis. The portion of text to be emphasized may be tagged (e.g., using a Speech Synthesis Markup Language (SSML) tag). TTS processing is then performed on the tagged text to create output audio including an emphasized portion corresponding to the tagged portion of the text.
    Type: Grant
    Filed: June 27, 2016
    Date of Patent: June 11, 2019
    Assignee: Amazon Technologies, Inc.
    Inventors: Marco Nicolis, Adam Franciszek Nadolski
  • Patent number: 10079021
    Abstract: Systems and methods for utilizing incremental processing of portions of output data to limit the time required to provide a response to a user request are provided herein. In some embodiments, portions of the user request for information can be analyzed using techniques such as automatic speech recognition (ASR), speech-to-text (STT), and natural language understanding (NLU) to determine the overall topic of the user request. One the topic has been determined, portions of the anticipated audio output data can be synthesized independently instead of waiting for the complete response. The synthesized portions can then be provided to the electronic device in anticipation of being output through one or more speakers on the electronic device, which speeds up the time that the response can be provided to the user.
    Type: Grant
    Filed: December 18, 2015
    Date of Patent: September 18, 2018
    Assignee: Amazon Technologies, Inc.
    Inventors: Roberto Barra Chicote, Adam Franciszek Nadolski
  • Patent number: 9978359
    Abstract: A text-to-speech (TTS) processing system may be configured for iterative processing. Speech units for unit selection may be tagged according to extra segmental features, such as emotional features, dramatic features, etc. Preliminary TTS results based on input text may be provided to a user through a user interface. The user may offer corrections to the preliminary results. Those corrections may correspond to the extra segmental features. The user corrections may then be input into the TTS system along with the input text to provide refined TTS results. This process may be repeated iteratively to obtain desired TTS results.
    Type: Grant
    Filed: December 6, 2013
    Date of Patent: May 22, 2018
    Assignee: Amazon Technologies, Inc.
    Inventors: Michal Tadeusz Kaszczuk, Jeffrey Penrod Adams, Adam Franciszek Nadolski
  • Patent number: 9240178
    Abstract: A text-to-speech (TTS) system is configured with multiple voice corpuses used to synthesize speech. An incoming TTS request may be processed by a first, smaller, voice corpus to quickly return results to the user. The text of the request may be stored by the TTS system and then processed in the background using a second, larger, voice corpus. The second corpus takes longer to process but returns higher quality results. Future incoming TTS requests may be compared against the text of the first TTS request. If the text, or portions thereof match, the system may return stored results from the processing by the second corpus, thus returning high quality speech results in a shorter time.
    Type: Grant
    Filed: June 26, 2014
    Date of Patent: January 19, 2016
    Assignee: AMAZON TECHNOLOGIES, INC.
    Inventors: Adam Franciszek Nadolski, Michal Krzysztof Kiedrowicz