Prosody Rules Derived From Text (epo) Patents (Class 704/E13.013)
  • Patent number: 11651763
    Abstract: Described herein are systems and methods for augmenting neural speech synthesis networks with low-dimensional trainable speaker embeddings in order to generate speech in different voices from a single model. As a starting point for multi-speaker experiments, improved single-speaker model embodiments, which may be referred to generally as Deep Voice 2 embodiments, were developed, as well as a post-processing neural vocoder for Tacotron (a neural character-to-spectrogram model). New techniques for multi-speaker speech synthesis were evaluated for both Deep Voice 2 and Tacotron embodiments on two multi-speaker TTS datasets, showing that neural text-to-speech systems can learn hundreds of unique voices from twenty-five minutes of audio per speaker.
    Type: Grant
    Filed: November 2, 2020
    Date of Patent: May 16, 2023
    Assignee: Baidu USA LLC
    Inventors: Sercan O. Arik, Gregory Diamos, Andrew Gibiansky, John Miller, Kainan Peng, Wei Ping, Jonathan Raiman, Yanqi Zhou
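The augmentation described above amounts to conditioning the synthesis network on a learned per-speaker vector. A minimal PyTorch-style sketch of that conditioning (module structure, dimensions, and names are illustrative, not taken from the patent):

```python
import torch
import torch.nn as nn

class MultiSpeakerTTS(nn.Module):
    """Toy text-to-spectrogram model conditioned on a trainable speaker embedding."""

    def __init__(self, vocab_size=64, num_speakers=108, spk_dim=16, hidden=128, n_mels=80):
        super().__init__()
        self.char_embed = nn.Embedding(vocab_size, hidden)
        # Low-dimensional speaker embeddings, learned jointly with the rest of the model.
        self.spk_embed = nn.Embedding(num_speakers, spk_dim)
        self.rnn = nn.GRU(hidden + spk_dim, hidden, batch_first=True)
        self.to_mel = nn.Linear(hidden, n_mels)

    def forward(self, chars, speaker_ids):
        x = self.char_embed(chars)                    # (B, T, hidden)
        s = self.spk_embed(speaker_ids)               # (B, spk_dim)
        s = s.unsqueeze(1).expand(-1, x.size(1), -1)  # broadcast over time steps
        h, _ = self.rnn(torch.cat([x, s], dim=-1))
        return self.to_mel(h)                         # predicted mel frames

model = MultiSpeakerTTS()
mels = model(torch.randint(0, 64, (2, 20)), torch.tensor([3, 42]))
print(mels.shape)  # torch.Size([2, 20, 80])
```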
  • Patent number: 8886538
    Abstract: Systems and methods for speech synthesis and, in particular, text-to-speech systems and methods for converting a text input to a synthetic waveform by processing the prosodic and phonetic content of a spoken example of the text input to accurately mimic the input speech style and pronunciation. Systems and methods provide an interface to a TTS system that allows a user to input a text string and a spoken utterance of the text string, extract prosodic parameters from the spoken input, and process those parameters to derive corresponding markup for the text input, enabling more natural-sounding synthesized speech.
    Type: Grant
    Filed: September 26, 2003
    Date of Patent: November 11, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Andy Aaron, Raimo Bakis, Ellen M. Eide, Wael M. Hamza
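The flow above is: record the user speaking the text, extract prosodic parameters, and turn them into markup the TTS engine can consume. A hedged Python sketch with hand-written measurements standing in for real pitch tracking and forced alignment (the markup format shown is SSML-like and purely illustrative):

```python
# Hypothetical per-word prosody measurements extracted from the spoken example.
# In a real system these would come from pitch tracking and forced alignment.
measured = [
    {"word": "hello", "pitch_hz": 180.0, "duration_ms": 320},
    {"word": "world", "pitch_hz": 140.0, "duration_ms": 450},
]

def to_markup(words, base_pitch_hz=150.0):
    """Render extracted prosody as SSML-like <prosody> tags (format illustrative)."""
    parts = []
    for w in words:
        # Express pitch as a percentage offset from the voice's base pitch.
        pitch_pct = 100.0 * (w["pitch_hz"] - base_pitch_hz) / base_pitch_hz
        parts.append(
            f'<prosody pitch="{pitch_pct:+.0f}%" duration="{w["duration_ms"]}ms">'
            f'{w["word"]}</prosody>'
        )
    return " ".join(parts)

print(to_markup(measured))
```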
  • Patent number: 7962341
    Abstract: A method for the prosodic labelling of speech, including performing a first analysis step using data from an audio file, wherein the audio file is analysed as a plurality of frames positioned at fixed time intervals in said audio file; and performing a second analysis step on said data from said audio file using results of said first analysis step, wherein analysis is performed using a plurality of analysis windows and wherein the positions of the analysis windows are determined by segmental information.
    Type: Grant
    Filed: December 8, 2006
    Date of Patent: June 14, 2011
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Norbert Braunschweiler
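The two analysis passes differ only in window placement: fixed-interval frames first, then windows positioned from the segmental information recovered in pass one. A toy NumPy sketch of that two-pass structure (frame size, the energy feature, and the boundary rule are all illustrative):

```python
import numpy as np

def pass_one(audio, sr=16000, hop_s=0.010):
    """Fixed-interval analysis: one energy value per 10 ms frame."""
    hop = int(sr * hop_s)
    frames = [audio[i:i + hop] for i in range(0, len(audio) - hop, hop)]
    return np.array([float(np.mean(f ** 2)) for f in frames])

def pass_two(audio, energies, sr=16000, hop_s=0.010, thresh=0.01):
    """Segment-aligned analysis: windows span runs of high-energy frames
    found in pass one, instead of sitting at fixed positions."""
    hop = int(sr * hop_s)
    segments, start = [], None
    for i, e in enumerate(energies):
        if e >= thresh and start is None:
            start = i
        elif e < thresh and start is not None:
            segments.append((start * hop, i * hop))
            start = None
    if start is not None:
        segments.append((start * hop, len(energies) * hop))
    # One analysis window per segment, e.g. mean energy over the whole segment.
    return [(s, e, float(np.mean(audio[s:e] ** 2))) for s, e in segments]

audio = np.concatenate([np.zeros(1600), 0.5 * np.random.randn(4800), np.zeros(1600)])
print(pass_two(audio, pass_one(audio)))
```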
  • Publication number: 20100312563
    Abstract: Techniques to create and share custom voice fonts are described. An apparatus may include a preprocessing component to receive voice audio data and a corresponding text script from a client and to process the voice audio data to produce prosody labels and a rich script. The apparatus may further include a verification component to automatically verify the voice audio data against the text script. The apparatus may further include a training component to train a custom voice font from the verified voice audio data and rich script and to generate custom voice font data usable by a text-to-speech (TTS) component. Other embodiments are described and claimed.
    Type: Application
    Filed: June 4, 2009
    Publication date: December 9, 2010
    Applicant: MICROSOFT CORPORATION
    Inventors: Sheng Zhao, Zhi Li, Shenghao Qin, Chiwei Che, Jingyang Xu, Binggong Ding
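The pipeline has three stages: preprocessing into prosody labels plus a rich script, automatic verification of the audio against the script, and voice-font training. A skeletal Python sketch of the stage interfaces, with every function body stubbed (all names are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class RichScript:
    text: str
    prosody_labels: list  # e.g. per-word stress/boundary labels

def preprocess(audio_files, script_lines):
    """Produce prosody labels and a rich script from raw recordings (stubbed)."""
    return [RichScript(text=line, prosody_labels=["H*"] * len(line.split()))
            for line in script_lines]

def verify(audio_files, rich_scripts):
    """Automatically check that each recording matches its script line.
    A real verifier would run speech recognition and compare transcripts."""
    return [(a, s) for a, s in zip(audio_files, rich_scripts) if s.text.strip()]

def train_voice_font(verified_pairs):
    """Train a custom voice font usable by a TTS component (stubbed)."""
    return {"num_utterances": len(verified_pairs)}

scripts = preprocess(["u1.wav", "u2.wav"], ["Hello there.", "Good morning."])
font = train_voice_font(verify(["u1.wav", "u2.wav"], scripts))
print(font)
```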
  • Publication number: 20090300041
    Abstract: A method and system are disclosed that train a text-to-speech synthesis system for use in speech synthesis. The method includes generating a speech database of audio files comprising domain-specific voices having various prosodies, and training a text-to-speech synthesis system using the speech database by selecting audio segments having a prosody based on at least one dialog state. The system includes a processor, a speech database of audio files, and modules for implementing the method.
    Type: Application
    Filed: August 13, 2009
    Publication date: December 3, 2009
    Applicant: AT&T Corp.
    Inventor: Horst Juergen Schroeter
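The training idea is to index recorded segments by their prosody and select among them per dialog state at synthesis time. A minimal sketch of such a selection table (dialog states, prosody tags, and file names are illustrative):

```python
# Speech database: audio segments tagged with the prosody they carry.
speech_db = [
    {"file": "greet_up.wav",   "text": "hello", "prosody": "rising"},
    {"file": "greet_flat.wav", "text": "hello", "prosody": "neutral"},
    {"file": "sorry_low.wav",  "text": "sorry", "prosody": "falling"},
]

# Hypothetical mapping from dialog state to the prosody it calls for.
STATE_TO_PROSODY = {"greeting": "rising", "apology": "falling"}

def select_segment(text, dialog_state):
    """Pick an audio segment whose prosody matches the current dialog state,
    falling back to any segment with the right text."""
    want = STATE_TO_PROSODY.get(dialog_state, "neutral")
    candidates = [s for s in speech_db if s["text"] == text]
    for s in candidates:
        if s["prosody"] == want:
            return s
    return candidates[0] if candidates else None

print(select_segment("hello", "greeting"))  # -> greet_up.wav
```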
  • Publication number: 20090254349
    Abstract: A speech synthesizer can execute speech content editing at high speed and generate speech content easily. The speech synthesizer includes a small speech element DB (101), a small speech element selection unit (102), a small speech element concatenation unit (103), a prosody modification unit (104), a large speech element DB (105), a correspondence DB (106) that associates the small speech element DB (101) with the large speech element DB (105), a speech element candidate obtainment unit (107), a large speech element selection unit (108), and a large speech element concatenation unit (109). By editing synthetic speech using the small speech element DB (101) and performing quality enhancement on an editing result using the large speech element DB (105), speech content can be generated easily on a mobile terminal.
    Type: Application
    Filed: May 11, 2007
    Publication date: October 8, 2009
    Inventors: Yoshifumi Hirose, Yumiko Kato, Takahiro Kamai
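Editing runs against the compact small-element DB, and the correspondence DB then maps each small element to a higher-quality large-DB element for the final render. A toy sketch of that two-stage substitution (all identifiers and data are illustrative):

```python
# Small DB: compact units used while interactively editing on the device.
small_db = {0: "ko_lo", 1: "ni_lo", 2: "chi_lo"}
# Large DB: high-quality units used only for the final rendering pass.
large_db = {100: "ko_hi", 101: "ni_hi", 102: "chi_hi"}
# Correspondence DB: which large element replaces which small element.
correspondence = {0: 100, 1: 101, 2: 102}

def edit_with_small_db(element_ids):
    """Fast concatenation of small elements during editing."""
    return [small_db[i] for i in element_ids]

def enhance_with_large_db(element_ids):
    """Re-render the edited sequence by swapping in corresponding large elements."""
    return [large_db[correspondence[i]] for i in element_ids]

draft = edit_with_small_db([0, 1, 2])
final = enhance_with_large_db([0, 1, 2])
print(draft, "->", final)
```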
  • Publication number: 20090055188
    Abstract: The prosody control unit pattern generation module generates pitch patterns for the respective prosody control units based on language attribute information, phoneme duration, and emphasis degree information. The modification method decision module decides, based at least on the emphasis degree information, a smoothing-based modification method for the pitch pattern at the connection portion between a prosody control unit and at least one of the previous and next prosody control units, and generates modification method information. The pattern connection module then modifies the pitch patterns generated for the respective prosody control units by smoothing according to the modification method information and connects them to generate a sentence pitch pattern for the text to be synthesized.
    Type: Application
    Filed: February 22, 2008
    Publication date: February 26, 2009
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Gou Hirabayashi, Takehiko Kagoshima
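At each junction between prosody control units, the pitch contour is smoothed by an amount derived from the emphasis degree before the units are concatenated. A numeric sketch of one junction (the smoothing rule and window size are illustrative, not the patent's):

```python
import numpy as np

def smooth_junction(left, right, emphasis, max_window=3):
    """Blend the end of `left` into the start of `right`. Higher emphasis
    means less smoothing, so emphasized units keep their pitch shape."""
    window = max(1, int(max_window * (1.0 - emphasis)))
    joined = np.concatenate([left, right])
    j = len(left)  # index of the connection portion
    lo, hi = max(0, j - window), min(len(joined), j + window)
    # Replace the contour around the junction with a straight interpolation.
    joined[lo:hi] = np.linspace(joined[lo], joined[hi - 1], hi - lo)
    return joined

unit_a = np.array([120.0, 125.0, 130.0, 140.0])  # pitch in Hz per frame
unit_b = np.array([180.0, 175.0, 170.0, 160.0])
sentence_pitch = smooth_junction(unit_a, unit_b, emphasis=0.2)
print(np.round(sentence_pitch, 1))  # junction ramp: 130 -> 145 -> 160 -> 175
```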
  • Publication number: 20090048821
    Abstract: Embodiments are directed towards a language learning environment accessible from within virtually any website that enables a user to practice a language using tools such as translators, and text to speech capabilities. In one embodiment, the user may access a webpage in one language, and employ the language widget to select portions of content on the webpage, perform translation of the content, or perform a text to audio (speech) conversion of the selected portions. The text to speech conversion may be performed independent of translation, thereby allowing the user to hear a pronunciation of text within the website in a language associated with the website. The user may download an audio file of the converted text for use in later replay for mobile learning.
    Type: Application
    Filed: June 2, 2008
    Publication date: February 19, 2009
    Applicant: Yahoo! Inc.
    Inventors: Shuk Yin Yam, Jeong Sik Jang
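One possible reading of the widget's server side is a single handler that dispatches a text selection to either translation or synthesis, keeping the two independent as the abstract notes. A hedged sketch (all function names and the response shape are hypothetical):

```python
# Hypothetical handler behind the language widget: the user selects text on a
# page and requests either translation or speech, independently of each other.
def handle_selection(text, action, page_lang="en", target_lang="es"):
    if action == "translate":
        return {"type": "text", "body": translate(text, page_lang, target_lang)}
    if action == "speak":
        # TTS runs in the page's own language; no translation step required.
        return {"type": "audio", "body": synthesize(text, page_lang)}
    raise ValueError(f"unknown action: {action}")

def translate(text, src, dst):  # stub standing in for a translation service
    return f"[{src}->{dst}] {text}"

def synthesize(text, lang):     # stub standing in for a downloadable audio blob
    return f"<{lang} audio for: {text!r}>"

print(handle_selection("good morning", "speak"))
```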
  • Publication number: 20080167875
    Abstract: An embodiment of the invention is a software tool used to convert text, speech synthesis markup language (SSML), and/or extended SSML to synthesized audio. Facilities are provided to create, view, play, and edit the synthesized speech, including editing pitch and duration targets, speaking type, paralinguistic events, and prosody. Prosody can be provided by way of a sample recording. Users interact with the software tool through a graphical user interface (GUI). The software tool can produce synthesized audio file output in many file formats.
    Type: Application
    Filed: January 9, 2007
    Publication date: July 10, 2008
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Raimo Bakis, Ellen M. Eide, Roberto Pieraccini, Maria E. Smith, Jie Zeng
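The tool's edits, pitch and duration targets in particular, map naturally onto the SSML <prosody> element. A small sketch that assembles an SSML document from such edits (speaking type and paralinguistic events would need vendor extensions, stubbed here as a comment):

```python
def build_ssml(text, pitch=None, rate=None, style_comment=None):
    """Wrap text in SSML <prosody> using the user's pitch/rate edits.
    Speaking style and paralinguistic events would use extended SSML."""
    attrs = []
    if pitch is not None:
        attrs.append(f'pitch="{pitch}"')
    if rate is not None:
        attrs.append(f'rate="{rate}"')
    body = f'<prosody {" ".join(attrs)}>{text}</prosody>' if attrs else text
    if style_comment:
        # Placeholder for an extended-SSML style/paralinguistic annotation.
        body = f"<!-- style: {style_comment} -->{body}"
    return f'<speak version="1.0">{body}</speak>'

print(build_ssml("See you tomorrow.", pitch="+15%", rate="slow",
                 style_comment="cheerful"))
```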
  • Publication number: 20080147405
    Abstract: The present invention provides a method and apparatus for forming Chinese prosodic words. The method comprises the steps of: inputting Chinese text; performing word segmentation and part-of-speech annotation on the input text to generate an initial prosodic word sequence; inserting grids representing prosodic word boundaries for all the words in the initial sequence to generate a grid prosodic word sequence; annotating, based on the prosodic word forming means, the grids that are candidates for deletion; judging which of those candidate grids actually need to be deleted; and deleting those grids from the grid prosodic word sequence, so that the words between every two adjacent remaining grids are formed into prosodic words.
    Type: Application
    Filed: December 10, 2007
    Publication date: June 19, 2008
    Applicant: FUJITSU LIMITED
    Inventors: Guo Qing, Nobuyuki Katae
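The method places a boundary grid between every pair of segmented words and then deletes the grids where neighbors should fuse into one prosodic word. A toy sketch in which a simple length heuristic stands in for the patent's prosodic word forming means:

```python
def form_prosodic_words(words, max_len=2):
    """Insert a boundary grid after every word, then delete grids between
    short neighbors so they merge into a single prosodic word."""
    # Start with a grid after each word: w0 | w1 | w2 | ...
    grids = [True] * (len(words) - 1)
    for i in range(len(words) - 1):
        # Heuristic stand-in for the patent's deletion criteria:
        # merge when the fused word would still be short.
        if len(words[i]) + len(words[i + 1]) <= max_len:
            grids[i] = False
    prosodic, current = [], words[0]
    for i, keep in enumerate(grids):
        if keep:
            prosodic.append(current)
            current = words[i + 1]
        else:
            current += words[i + 1]
    prosodic.append(current)
    return prosodic

# Segmented input (illustrative): single characters and two-character words.
print(form_prosodic_words(["我", "们", "今天", "去", "北京"]))
# -> ['我们', '今天', '去', '北京']
```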
  • Publication number: 20080147402
    Abstract: Disclosed are apparatus and methods that employ a modified version of a computational model of the human peripheral and central auditory system, and that provide for automatic pattern recognition using category-dependent feature selection. The validity of the model's output is examined by deriving feature vectors from the dimension-expanded cortical response of the central auditory system for use in a conventional phoneme recognition task. In addition, the cortical response may be a place-coded data set in which sounds are categorized according to the regions containing their most distinguishing features. This provides for novel category-dependent feature selection apparatus and methods in which this mechanism may be utilized to better simulate robust human pattern (speech) recognition.
    Type: Application
    Filed: November 29, 2007
    Publication date: June 19, 2008
    Inventors: Woojay Jeon, Biing-Hwang Juang
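Category-dependent feature selection keeps, per category, the response regions that best distinguish it, and scores each category only on its own regions. A small NumPy sketch of that mechanism (the cortical response model itself is not reproduced; random vectors stand in for it):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "cortical responses": 60-dim vectors for two phoneme categories,
# each category lighting up a different region of the response.
class_a = rng.normal(0.0, 1.0, (50, 60))
class_a[:, :10] += 2.0        # category A's distinguishing region: dims 0-9
class_b = rng.normal(0.0, 1.0, (50, 60))
class_b[:, 30:40] += 2.0      # category B's distinguishing region: dims 30-39

def top_regions(own, other, k=10):
    """Category-dependent selection: keep the k dimensions where this
    category's mean response most exceeds the other's."""
    return np.argsort(own.mean(0) - other.mean(0))[-k:]

regions = {"A": top_regions(class_a, class_b), "B": top_regions(class_b, class_a)}
means = {"A": class_a.mean(0), "B": class_b.mean(0)}

def classify(x):
    """Score each category only on its own selected regions."""
    score = {c: -np.linalg.norm(x[r] - means[c][r]) for c, r in regions.items()}
    return max(score, key=score.get)

test = rng.normal(0.0, 1.0, 60)
test[30:40] += 2.0
print(classify(test))  # expected: 'B'
```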
  • Publication number: 20080109225
    Abstract: A speech piece editing section (5) retrieves, from a speech piece database (7), speech piece data whose reading matches that of a speech piece in a fixed message, and converts the speech piece to match the speed specified by utterance speed data. The speech piece editing section (5) predicts the prosody of the fixed message and, according to the prosody prediction results, selects the item of retrieved speech piece data that best matches each speech piece of the fixed message. However, if the degree of match for the speech piece corresponding to the selected item of speech piece data does not reach a predetermined value, the selection is cancelled. For each speech piece for which no selection is made, waveform data representing the waveform of each unit speech is supplied to a sound processing section (41). The selected speech piece data and the supplied waveform data are interconnected to create data representing a synthesized speech.
    Type: Application
    Filed: March 10, 2006
    Publication date: May 8, 2008
    Applicant: KABUSHIKI KAISHA KENWOOD
    Inventor: Yasushi Sato
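Selection proceeds piece by piece under the predicted prosody; when the best candidate falls below the threshold, the piece falls back to unit-by-unit waveform synthesis. A compact sketch of that decision (scores, threshold, and the synthesis stub are illustrative):

```python
# Retrieved candidates per speech piece in the fixed message, with a match
# score against the predicted prosody (illustrative values).
candidates = {
    "good":    [("good_01.pcm", 0.91), ("good_02.pcm", 0.55)],
    "morning": [("morning_01.pcm", 0.42)],  # best match is below threshold
}

THRESHOLD = 0.6

def synthesize_units(piece):
    """Fallback: per-unit waveform synthesis for unmatched pieces (stubbed)."""
    return f"<synth:{piece}>"

def assemble(message_pieces):
    out = []
    for piece in message_pieces:
        best = max(candidates.get(piece, []), key=lambda c: c[1], default=None)
        if best and best[1] >= THRESHOLD:
            out.append(best[0])                   # use the recorded speech piece
        else:
            out.append(synthesize_units(piece))   # selection cancelled
    return out

print(assemble(["good", "morning"]))  # ['good_01.pcm', '<synth:morning>']
```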