Patents by Inventor Marco Nicolis

Marco Nicolis has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

AUGMENTING DATASETS FOR TRAINING AUDIO GENERATION MODELS

Publication number: 20250191573

Abstract: A target voice dataset may be augmented using speech prediction. Encoder and decoder models may be trained to encode audio data into encoded speech data and convert it back to audio. The encoded units may include semantic information (e.g., phonemes and/or words) as well as feature data indicating prosody, timbre, speaker identity, speech style, emotion, etc. of speech. An acoustic/semantic language model (ASLM) may be configured to predict encoded speech data in a manner analogous to a language model predicting words; for example, based on preceding encoded speech data. The models may be used to generate synthesized speech samples having voice characteristics (e.g., feature data) similar to those of the target voice dataset. The augmented dataset may be used to train a text-to-speech (TTS) model to reproduce the target voice characteristics, and may improve performance of the TTS model over training with only the original target voice dataset.

Type: Application

Filed: February 24, 2025

Publication date: June 12, 2025

Inventors: Mateusz Aleksander Lajszczak, Adam Marek Gabrys, Arent van Korlaar, Ruizhe Li, Elena Sergeevna Sokolova, Jaime Lorenzo Trueba, Arnaud Vincent Pierre Yves Joly, Marco Nicolis, Ekaterina Petrova

Augmenting datasets for training audio generation models

Patent number: 12254864

Abstract: A target voice dataset may be augmented using speech prediction. Encoder and decoder models may be trained to encode audio data into encoded speech data, and convert it back to audio. The encoded units may include semantic information (e.g., phonemes and/or words) as well as feature data indicating prosody, timbre, speaker identity, speech style, emotion, etc. of speech. An acoustic/semantic language model (ASLM) may be configured to predict encoded speech data in a manner analogous to a language model predicting words; for example, based on preceding encoded speech data. The models may be used to generate synthesized speech samples having voice characteristics (e.g., feature data) similar to those of the target voice dataset. The augmented dataset may be used to train a text-to-speech (TTS) model to reproduce the target voice characteristics, and may improve performance of the TTS model over training with only the original target voice dataset.

Type: Grant

Filed: June 30, 2022

Date of Patent: March 18, 2025

Assignee: Amazon Technologies, Inc.

Inventors: Mateusz Aleksander Lajszczak, Adam Marek Gabrys, Arent van Korlaar, Ruizhe Li, Elena Sergeevna Sokolova, Jaime Lorenzo Trueba, Arnaud Vincent Pierre Yves Joly, Marco Nicolis, Ekaterina Petrova
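
The abstract above compares the ASLM to a language model that predicts the next item from the preceding ones. A minimal sketch of that idea follows, assuming the encoded speech units are discrete integer IDs (the abstract does not say so) and using a simple bigram count model in place of whatever model the patent actually describes. The names `train_bigram`, `predict_next`, and `continue_sequence` are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(sequences):
    """Count how often each unit ID follows each preceding unit ID."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, prev):
    """Return the most frequent successor of `prev`, or None if unseen."""
    followers = counts.get(prev)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def continue_sequence(counts, prompt, max_new):
    """Extend a non-empty prompt of unit IDs by up to `max_new` predictions."""
    out = list(prompt)
    for _ in range(max_new):
        nxt = predict_next(counts, out[-1])
        if nxt is None:
            break  # no observed continuation for the last unit
        out.append(nxt)
    return out
```

In this toy setting, a few unit sequences from the target voice could seed `train_bigram`, and `continue_sequence` would then produce further unit sequences for a decoder to turn into audio. A real system would presumably condition on far more context than a single preceding unit.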
Emphasizing portions of synthesized speech

Patent number: 12243511

Abstract: A neural text-to-speech system may be configured to emphasize words. Applying emphasis where appropriate enables the TTS system to better reproduce prosodic characteristics of human speech. Emphasis may make the resulting synthesized speech more understandable and engaging than synthesized speech lacking emphasis. Emphasis may be manually annotated to, and/or predicted from, a source text (e.g., a book). In some implementations, the system may use a generative model such as a variational autoencoder to generate word acoustic embeddings indicating how emphasis is to be reflected in the synthesized speech. A phoneme encoder of the TTS system may process phonemes to generate phoneme embeddings. A decoder may process the word acoustic embeddings and the phoneme embeddings to generate spectrogram data representing the synthesized speech.

Type: Grant

Filed: March 31, 2022

Date of Patent: March 4, 2025

Assignee: Amazon Technologies, Inc.

Inventors: Arnaud Vincent Pierre Yves Joly, Marco Nicolis, Elena Sergeevna Sokolova, Jedrzej Sobanski, Mateusz Aleksander Lajszczak, Arent van Korlaar, Ruizhe Li

Text-to-speech processing with emphasized output audio

Patent number: 11062694

Abstract: Systems and methods for generating output audio with emphasized portions are described. Spoken audio is obtained and undergoes speech processing (e.g., ASR and optionally NLU) to create text. It may be determined that the resulting text includes a portion that should be emphasized (e.g., an interjection) using at least one of knowledge of an application run on a device that captured the spoken audio, prosodic analysis, and/or linguistic analysis. The portion of text to be emphasized may be tagged (e.g., using a Speech Synthesis Markup Language (SSML) tag). TTS processing is then performed on the tagged text to create output audio including an emphasized portion corresponding to the tagged portion of the text.

Type: Grant

Filed: June 7, 2019

Date of Patent: July 13, 2021

Assignee: Amazon Technologies, Inc.

Inventors: Marco Nicolis, Adam Franciszek Nadolski
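
The tagging step in the abstract above can be illustrated with a short sketch. It assumes the portions to emphasize are found by a plain word-list lookup, which stands in for the application knowledge, prosodic analysis, or linguistic analysis the abstract mentions. The `INTERJECTIONS` set and the `tag_emphasis` function are hypothetical. The `<emphasis>` element and its `level` attribute come from the SSML specification, and the bare `<speak>` wrapper is simplified.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical word list; a real system would decide what to emphasize
# from richer analysis than a lookup table.
INTERJECTIONS = {"wow", "oops", "hooray"}


def tag_emphasis(text, emphasized=INTERJECTIONS):
    """Wrap each listed word in an SSML <emphasis> tag, escaping the rest."""
    # The capturing group keeps words and the separators between them.
    parts = re.split(r"(\w+)", text)
    out = []
    for part in parts:
        if part.lower() in emphasized:
            out.append(f'<emphasis level="strong">{escape(part)}</emphasis>')
        else:
            out.append(escape(part))
    return "<speak>" + "".join(out) + "</speak>"
```

For example, `tag_emphasis("Wow, that is great")` returns `<speak><emphasis level="strong">Wow</emphasis>, that is great</speak>`, which a TTS engine that accepts SSML could then render with the first word stressed.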
Text-to-speech (TTS) processing

Patent number: 10699695

Abstract: During text-to-speech processing, audio data corresponding to a word part, word, or group of words is generated using a trained model and used by a unit selection engine to create output audio. The audio data is generated at least when an input word is unrecognized or when a cost of a unit selection is too high.

Type: Grant

Filed: June 29, 2018

Date of Patent: June 30, 2020

Assignee: Amazon Washington, Inc.

Inventors: Adam Franciszek Nadolski, Daniel Korzekwa, Thomas Edward Merritt, Marco Nicolis, Bartosz Putrycz, Roberto Barra Chicote, Rafal Kuklinski, Wiktor Dolecki
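
The fallback rule in the abstract above (use model-generated audio when a word is unrecognized or the unit selection cost is too high) can be sketched as follows. This is a simplified illustration, not the patented method: it assumes one precomputed cost per candidate unit, whereas unit selection systems typically combine target and join costs over a whole sequence. `Unit`, `choose_audio`, `unit_db`, `generate`, and `max_cost` are all hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    word: str
    cost: float   # assumed precomputed selection cost; lower is better
    audio: bytes


def choose_audio(word, unit_db, generate, max_cost=1.0):
    """Pick the cheapest recorded unit, or fall back to generated audio.

    Returns (audio, source), where source is "unit" or "model".
    """
    candidates = unit_db.get(word, [])
    if candidates:
        best = min(candidates, key=lambda u: u.cost)
        if best.cost <= max_cost:
            return best.audio, "unit"
    # Word not in the database, or every candidate is too costly.
    return generate(word), "model"
```

Here `generate` stands for any callable that maps a word to audio, such as a wrapper around a trained model.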
TEXT-TO-SPEECH PROCESSING WITH EMPHASIZED OUTPUT AUDIO

Publication number: 20190362704

Abstract: Systems and methods for generating output audio with emphasized portions are described. Spoken audio is obtained and undergoes speech processing (e.g., ASR and optionally NLU) to create text. It may be determined that the resulting text includes a portion that should be emphasized (e.g., an interjection) using at least one of knowledge of an application run on a device that captured the spoken audio, prosodic analysis, and/or linguistic analysis. The portion of text to be emphasized may be tagged (e.g., using a Speech Synthesis Markup Language (SSML) tag). TTS processing is then performed on the tagged text to create output audio including an emphasized portion corresponding to the tagged portion of the text.

Type: Application

Filed: June 7, 2019

Publication date: November 28, 2019

Inventors: Marco Nicolis, Adam Franciszek Nadolski

Systems and methods for providing natural responses to commands

Patent number: 10339166

Abstract: Methods and devices for generating unique and different responses to commands are described herein. Natural language generation techniques may be employed to formulate responses to commands that are tailored to particular users. These responses account for previously provided responses, previous commands that have been made, and/or geographic locations of the requesting individual, for example. In some embodiments, an audible command may be received by a backend system from a voice activated electronic device. Text data may be generated from the audible command, and a user intent of the command is determined. Based on the user intent, a response from a particular application may be obtained. The response may be compared with previously generated responses and, if a similar response was determined to have been provided previously, one or more different words, or a different arrangement of words, may be used to generate a new response.

Type: Grant

Filed: October 15, 2018

Date of Patent: July 2, 2019

Assignee: Amazon Technologies, Inc.

Inventors: Remus Razvan Mois, Marco Nicolis

Text-to-speech processing with emphasized output audio

Patent number: 10319365

Abstract: Systems and methods for generating output audio with emphasized portions are described. Spoken audio is obtained and undergoes speech processing (e.g., ASR and optionally NLU) to create text. It may be determined that the resulting text includes a portion that should be emphasized (e.g., an interjection) using at least one of knowledge of an application run on a device that captured the spoken audio, prosodic analysis, and/or linguistic analysis. The portion of text to be emphasized may be tagged (e.g., using a Speech Synthesis Markup Language (SSML) tag). TTS processing is then performed on the tagged text to create output audio including an emphasized portion corresponding to the tagged portion of the text.

Type: Grant

Filed: June 27, 2016

Date of Patent: June 11, 2019

Assignee: Amazon Technologies, Inc.

Inventors: Marco Nicolis, Adam Franciszek Nadolski

Systems and methods for providing natural responses to commands

Patent number: 10102844

Abstract: Methods and devices for generating unique and different responses to commands are described herein. Natural language generation techniques may be employed to formulate responses to commands that are tailored to particular users. These responses account for previously provided responses, previous commands that have been made, and/or geographic locations of the requesting individual, for example. In some embodiments, an audible command may be received by a backend system from a voice activated electronic device. Text data may be generated from the audible command, and a user intent of the command is determined. Based on the user intent, a response from a particular application may be obtained. The response may be compared with previously generated responses and, if a similar response was determined to have been provided previously, one or more different words, or a different arrangement of words, may be used to generate a new response.

Type: Grant

Filed: March 29, 2016

Date of Patent: October 16, 2018

Assignee: Amazon Technologies, Inc.

Inventors: Remus Razvan Mois, Marco Nicolis
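
The comparison step in the abstract above (check a response against earlier ones and reword it if a similar one was already given) can be sketched as follows. It assumes a set of pre-written alternative phrasings and a character-level similarity score from Python's `difflib`. The patent does not specify either, and `is_similar`, `pick_response`, and the `0.9` threshold are hypothetical choices.

```python
from difflib import SequenceMatcher


def is_similar(a, b, threshold=0.9):
    """Treat two responses as similar when their match ratio is high."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def pick_response(variants, history, threshold=0.9):
    """Return the first phrasing that is not similar to any past response.

    Falls back to the first variant when every phrasing has been used.
    """
    for candidate in variants:
        if not any(is_similar(candidate, past, threshold) for past in history):
            return candidate
    return variants[0]
```

A caller would append each returned response to `history`, so that repeated commands with the same intent cycle through different wordings before any phrasing repeats.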