Patents by Inventor Michal Tadeusz Kaszczuk

Michal Tadeusz Kaszczuk has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Iterative text-to-speech with user feedback

Patent number: 9978359

Abstract: A text-to-speech (TTS) processing system may be configured for iterative processing. Speech units for unit selection may be tagged according to extra segmental features, such as emotional features, dramatic features, etc. Preliminary TTS results based on input text may be provided to a user through a user interface. The user may offer corrections to the preliminary results. Those corrections may correspond to the extra segmental features. The user corrections may then be input into the TTS system along with the input text to provide refined TTS results. This process may be repeated iteratively to obtain desired TTS results.

Type: Grant

Filed: December 6, 2013

Date of Patent: May 22, 2018

Assignee: Amazon Technologies, Inc.

Inventors: Michal Tadeusz Kaszczuk, Jeffrey Penrod Adams, Adam Franciszek Nadolski
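
The feedback loop described in this abstract can be pictured with a minimal sketch. It is an illustration only, not the patented implementation: the function name `refine_tts`, the `synthesize` and `get_feedback` callables, and the representation of corrections as a tag dictionary (for example `{"emotion": "happy"}`) are all assumptions made for this example.

```python
from typing import Callable, Dict, Tuple

Tags = Dict[str, str]


def refine_tts(
    text: str,
    synthesize: Callable[[str, Tags], str],
    get_feedback: Callable[[str], Tags],
    max_rounds: int = 5,
) -> Tuple[str, Tags]:
    """Re-synthesize `text` until the user has no more corrections.

    `synthesize` and `get_feedback` are hypothetical stand-ins for a TTS
    engine and a user interface; neither is named in the source.
    """
    tags: Tags = {}
    result = synthesize(text, dict(tags))  # preliminary result
    for _ in range(max_rounds):
        corrections = get_feedback(result)
        if not corrections:
            break  # the user accepted the current result
        tags.update(corrections)  # feed corrections back in with the text
        result = synthesize(text, dict(tags))
    return result, tags
```

The `max_rounds` cap is a design choice of this sketch so that the loop always terminates even if feedback never stops.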

Adjustable TTS devices

Patent number: 9704476

Abstract: In a distributed text-to-speech (TTS) system, a remote TTS device, such as a TTS server, may experience increased loads of TTS requests, which may result in delayed processing of TTS requests. To avoid such delays, upon indication or prediction of an increased load, a TTS server may adjust unit selection TTS processing by altering unit selection techniques to speed processing, at the expense of potential result quality. Such techniques may include use of a reduced size unit database, a narrow Viterbi beam search, and/or a reduced size candidate unit graph.

Type: Grant

Filed: June 27, 2013

Date of Patent: July 11, 2017

Assignee: Amazon Technologies, Inc.

Inventors: Krzysztof Franciszek Swietlinski, Michal Tadeusz Kaszczuk
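
One of the techniques named in this abstract, narrowing the Viterbi beam under load, can be sketched as follows. This is a toy illustration under stated assumptions, not the patented method: the load threshold, the beam widths, the integer unit identifiers, and the `target_cost` and `join_cost` callables are all hypothetical.

```python
from typing import Callable, List, Sequence


def beam_width_for_load(
    load: float, normal: int = 8, reduced: int = 2, threshold: float = 0.8
) -> int:
    """Return a narrower beam once server load (0.0 to 1.0) reaches the threshold."""
    return reduced if load >= threshold else normal


def beam_search(
    candidates: Sequence[Sequence[int]],
    target_cost: Callable[[int, int], float],
    join_cost: Callable[[int, int], float],
    beam_width: int,
) -> List[int]:
    """Choose one unit per position, keeping only the cheapest partial paths."""
    beam = [(0.0, [])]  # (accumulated cost, units chosen so far)
    for pos, options in enumerate(candidates):
        if not options:
            raise ValueError(f"no candidate units at position {pos}")
        expanded = []
        for cost, path in beam:
            for unit in options:
                step = target_cost(pos, unit)
                if path:
                    step += join_cost(path[-1], unit)
                expanded.append((cost + step, path + [unit]))
        expanded.sort(key=lambda item: item[0])
        beam = expanded[:beam_width]  # prune: a smaller width is faster but greedier
    return beam[0][1]
```

A caller would pass `beam_width_for_load(current_load)` as `beam_width`, trading potential result quality for speed when the server is busy.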

Reduced latency text-to-speech system

Patent number: 9646601

Abstract: In delivering text-to-speech (TTS) results to a user, the time between the user request and delivery of initial TTS results is reduced using one or more of various techniques. Caching of TTS results may be reconfigured to cache unit indices rather than full speech synthesis results. More powerful computing resources may be dedicated to early TTS processing. A user may be notified of TTS results prior to complete processing of a TTS request. Early TTS processing may be performed by a local device and then passed to a remote device.

Type: Grant

Filed: July 26, 2013

Date of Patent: May 9, 2017

Assignee: Amazon Technologies, Inc.

Inventors: Jacek Jerzy Jedrzejczak, Krzysztof Franciszek Swietlinski, Michal Tadeusz Kaszczuk, Lukasz Maciej Osowski
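
The idea of caching unit indices rather than full synthesis results can be illustrated with a small sketch. Everything here is an assumption made for the example: the class name `UnitIndexCache`, the `select_units` callable standing in for unit selection, and the `unit_audio` dictionary standing in for a unit database.

```python
from typing import Callable, Dict, List


class UnitIndexCache:
    """Cache the selected unit indices per input text instead of rendered audio."""

    def __init__(
        self,
        select_units: Callable[[str], List[int]],
        unit_audio: Dict[int, bytes],
    ) -> None:
        self._select_units = select_units  # hypothetical (expensive) unit selection
        self._unit_audio = unit_audio      # hypothetical unit database
        self._cache: Dict[str, List[int]] = {}
        self.misses = 0

    def synthesize(self, text: str) -> bytes:
        indices = self._cache.get(text)
        if indices is None:
            self.misses += 1
            indices = self._select_units(text)
            self._cache[text] = indices
        # Audio is rebuilt from the database on every call; only the small
        # list of indices is kept in the cache.
        return b"".join(self._unit_audio[i] for i in indices)
```

Storing a short list of integers per request keeps cache entries far smaller than stored waveforms, at the cost of a concatenation step on each hit.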

Inserting breath sounds into text-to-speech output

Patent number: 9508338

Abstract: A text-to-speech (TTS) system may be configured to incorporate breath sounds in the output speech. By incorporating breath sounds into speech output from text, a TTS system may be able to mimic more natural-sounding human speech, particularly for long-form narration of text longer than short phrases. The breath sounds may be stored as units for unit selection or may be generated during parametric synthesis. The acoustic features of the breath sounds and duration between breaths may depend upon the punctuation of text, the linguistic distance between breaths, the breaks between intonational phrases, the linguistic context of the breaths, and other factors.

Type: Grant

Filed: November 15, 2013

Date of Patent: November 29, 2016

Assignee: Amazon Technologies, Inc.

Inventors: Michal Tadeusz Kaszczuk, Maciej Tegi, Michal Czuczman, Remus Razvan Mois
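
Two of the factors listed in this abstract, punctuation and the distance since the previous breath, can be combined in a simple placement rule. The sketch below is a simplification for illustration, not the patented method: the `<breath>` marker, the word-count distance measure, and the `min_words_between` parameter are assumptions.

```python
import re
from typing import List

BREATH = "<breath>"        # hypothetical marker for a breath unit
PUNCTUATION = set(".,;:!?")


def insert_breaths(text: str, min_words_between: int = 8) -> List[str]:
    """Place a breath after punctuation once enough words have passed."""
    tokens = re.findall(r"[\w'-]+|[.,;:!?]", text)
    out: List[str] = []
    words_since_breath = 0
    for i, token in enumerate(tokens):
        out.append(token)
        if token in PUNCTUATION:
            is_last = i == len(tokens) - 1
            # No breath after the final token: nothing follows it.
            if not is_last and words_since_breath >= min_words_between:
                out.append(BREATH)
                words_since_breath = 0
        else:
            words_since_breath += 1
    return out
```

Raising `min_words_between` spaces the breaths further apart, which is one crude way to model the "linguistic distance between breaths" mentioned in the abstract.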

Hybrid unit selection / parametric TTS system

Patent number: 9484014

Abstract: In a text-to-speech (TTS) system, a database including sample speech units for unit selection may include both units represented by sample audio segments and parametric representations of units created by Hidden Markov Models (HMMs). Inclusion of parametric representations in the database may reduce the storage necessary to maintain the database. The parametric representations may be configured to match a voice of the audio segments. The parametric representations may correspond to phonetic units that are less frequently encountered in TTS processing, such as rare diphones or phonemes corresponding to foreign languages. Multiple foreign language HMM models may be used to enable polyglot synthesis with a reduction in storage capacity requirements. Parametrically stored speech units may be combined with speech segments generated during processing time by a parametric model.

Type: Grant

Filed: February 20, 2013

Date of Patent: November 1, 2016

Assignee: Amazon Technologies, Inc.

Inventors: Michal Tadeusz Kaszczuk, Lukasz Maciej Osowski
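
A database that mixes recorded units with compact parametric ones might be organized along the following lines. This is a hedged sketch only: the `AudioUnit` and `ParametricUnit` types, the `render` function, and the `vocoder` callable that turns parameters into samples are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Union


@dataclass
class AudioUnit:
    samples: List[float]   # recorded audio segment


@dataclass
class ParametricUnit:
    params: List[float]    # compact parameters, e.g. produced by an HMM


Unit = Union[AudioUnit, ParametricUnit]


def render(
    unit_names: Sequence[str],
    database: Dict[str, Unit],
    vocoder: Callable[[List[float]], List[float]],
) -> List[float]:
    """Concatenate units, generating audio for parametric ones on demand."""
    out: List[float] = []
    for name in unit_names:
        unit = database[name]
        if isinstance(unit, AudioUnit):
            out.extend(unit.samples)
        else:
            out.extend(vocoder(unit.params))  # synthesize rare units at run time
    return out
```

Keeping only parameters for rarely used units is what would save storage in such a layout; common units stay as recorded audio.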

Cost efficient distributed text-to-speech processing

Patent number: 9311912

Abstract: Text-to-speech (TTS) processing systems may be divided among remote TTS servers which are accessible through a network connection to local user devices. The costs for performing processing on these servers may vary according to time. To improve efficiency of TTS processing, certain requests may be scheduled during low cost server times. A user may indicate a preference for such low cost delivery. A user may also indicate a preference for quick turnaround time, permitting scheduling of TTS processing during higher cost server times. A TTS processing system may also consider quality of TTS results when scheduling server processing time for a particular TTS request and may allocate more server time when higher quality results are desired.

Type: Grant

Filed: July 22, 2013

Date of Patent: April 12, 2016

Assignee: Amazon Technologies, Inc.

Inventors: Krzysztof Franciszek Swietlinski, Michal Tadeusz Kaszczuk
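
The scheduling trade-off in this abstract, low cost versus quick turnaround, can be shown with a minimal sketch. The hourly cost table, the `pick_slot` function, and the representation of a user's preference as a deadline are assumptions for illustration and are not taken from the patent.

```python
from typing import Sequence


def pick_slot(hourly_cost: Sequence[float], now: int, deadline: int) -> int:
    """Return the cheapest hour in [now, deadline]; ties go to the earliest hour."""
    if not 0 <= now <= deadline < len(hourly_cost):
        raise ValueError("need 0 <= now <= deadline < len(hourly_cost)")
    return min(range(now, deadline + 1), key=lambda hour: (hourly_cost[hour], hour))
```

A user who wants quick turnaround corresponds to `deadline == now`, which forces the current (possibly expensive) slot; a user who prefers low cost allows a later deadline and so a wider search window.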
