Abstract: A style encoder can be trained to encode audio style and audio characteristics into selected regions of a style vector. The style vector can be used to condition a text-to-speech (TTS) model to generate speech with human-understandable and controllable styles. Various training strategies for the style encoder are described, including a first, second, and third training strategy that can be used to disentangle audio styles into selected regions of a style vector. The distinct regions of the style vector can be used to provide numerous customization options to a user of the described system, along with tools to generate speech with a speaker identity and with selected audio styles and characteristics.
Type: Grant
Filed: August 11, 2023
Date of Patent: March 10, 2026
Assignee: Naro Corp.
Inventors: Todd Silverstein, Max Florian Frenzel, Lyle Patrick Stein
Abstract: A style encoder can be trained to encode audio style and audio characteristics into selected regions of a style vector. The style vector can be used to condition a text-to-speech (TTS) model to generate speech with human-understandable and controllable styles. Various training strategies for the style encoder are described, including a first, second, and third training strategy that can be used to disentangle audio styles into selected regions of a style vector. The distinct regions of the style vector can be used to provide numerous customization options to a user of the described system, along with tools to generate speech with a speaker identity and with selected audio styles and characteristics.
Type: Grant
Filed: August 11, 2023
Date of Patent: March 3, 2026
Assignee: Naro Corp.
Inventors: Lyle Patrick Stein, Max Florian Frenzel, Todd Silverstein