Elementary Speech Units Used In Speech Synthesizers; Concatenation Rules (epo) Patents (Class 704/E13.009)
  • Publication number: 20140142946
    Abstract: The present invention is a method and system to convert a speech signal into a parametric representation in terms of timbre vectors, and to recover the speech signal therefrom. The speech signal is first segmented into non-overlapping frames using glottal closure instant information; each frame is converted into an amplitude spectrum using a Fourier analyzer, and Laguerre functions are then used to generate a set of coefficients which constitute a timbre vector. A sequence of timbre vectors can be subject to a variety of manipulations. The new timbre vectors are converted back into voice signals by first transforming into amplitude spectra using Laguerre functions, then generating phase spectra from the amplitude spectra using the Kramers-Kronig relations. A Fourier transformer converts the amplitude spectra and phase spectra into elementary waveforms, which are then superposed to become the output voice. The method and system can be used for voice transformation, speech synthesis, and automatic speech recognition.
    Type: Application
    Filed: September 24, 2012
    Publication date: May 22, 2014
    Inventor: Chengjun Julian Chen
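The phase-reconstruction step in this abstract (amplitude spectrum to phase spectrum via the Kramers-Kronig relations) can be illustrated with a minimum-phase sketch. The cepstrum folding below is a standard discrete form of that relation, not the patent's implementation, and the Hanning-window frame is a made-up input:

```python
import numpy as np

def minimum_phase(amplitude):
    """Derive a minimum-phase complex spectrum from an amplitude spectrum,
    using the cepstrum form of the Kramers-Kronig (Hilbert) relation
    between log-magnitude and phase."""
    n = len(amplitude)
    log_mag = np.log(np.maximum(amplitude, 1e-12))
    cep = np.fft.ifft(log_mag).real          # real cepstrum
    # Fold the anticausal part onto the causal part (minimum-phase window).
    w = np.zeros(n)
    w[0] = 1.0
    w[1:(n + 1) // 2] = 2.0
    if n % 2 == 0:
        w[n // 2] = 1.0
    phase = np.fft.fft(cep * w).imag
    return amplitude * np.exp(1j * phase)

# One frame: amplitude spectrum -> complex spectrum -> elementary waveform.
amp = np.abs(np.fft.fft(np.hanning(64)))
spec = minimum_phase(amp)
waveform = np.fft.ifft(spec).real
```

Superposing such frame waveforms (overlap-add) would then yield the output voice.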
  • Patent number: 8731931
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for speech synthesis. A system practicing the method receives a set of ordered lists of speech units, for each respective speech unit in each ordered list in the set of ordered lists, constructs a sublist of speech units from a next ordered list which are suitable for concatenation, performs a cost analysis of paths through the set of ordered lists of speech units based on the sublist of speech units for each respective speech unit, and synthesizes speech using a lowest cost path of speech units through the set of ordered lists based on the cost analysis. The ordered lists can be ordered based on the respective pitch of each speech unit. In one embodiment, speech units which do not have an assigned pitch can be assigned a pitch.
    Type: Grant
    Filed: June 18, 2010
    Date of Patent: May 20, 2014
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Alistair D. Conkie
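The cost-based path search described above is essentially a Viterbi-style dynamic program over the ordered candidate lists. A minimal sketch, with hypothetical `target_cost` and `concat_cost` functions standing in for the patent's cost analysis:

```python
def lowest_cost_path(lists, target_cost, concat_cost):
    """Find the lowest-cost path of one unit per ordered candidate list,
    summing per-unit target costs and pairwise concatenation costs."""
    best = [(target_cost(u), [u]) for u in lists[0]]
    for candidates in lists[1:]:
        new_best = []
        for u in candidates:
            cost, path = min((c + concat_cost(p[-1], u), p) for c, p in best)
            new_best.append((cost + target_cost(u), path + [u]))
        best = new_best
    return min(best)

# Toy example: units represented by pitch values, echoing the abstract's
# note that the lists can be ordered by pitch; prefer smooth pitch near 100.
lists = [[90, 110], [95, 140], [100, 80]]
cost, path = lowest_cost_path(
    lists,
    target_cost=lambda u: abs(u - 100),
    concat_cost=lambda a, b: abs(a - b),
)
```

Building, for each unit, a sublist of suitable units from the next list (as in the claim) simply prunes the inner `min` to fewer predecessors.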
  • Publication number: 20120310651
    Abstract: A voice signal is synthesized using a plurality of phonetic piece data each indicating a phonetic piece containing at least two phoneme sections corresponding to different phonemes. In the apparatus, a phonetic piece adjustor forms a target section from first and second phonetic pieces so as to connect the first and second phonetic pieces to each other such that the target section includes a rear phoneme section of the first piece and a front phoneme section of the second piece, and expands the target section by a target time length to form an adjustment section such that a central part is expanded at an expansion rate higher than that of front and rear parts of the target section, to thereby create synthesized phonetic piece data having the target time length. A voice synthesizer creates a voice signal from the synthesized phonetic piece data.
    Type: Application
    Filed: May 31, 2012
    Publication date: December 6, 2012
    Applicant: Yamaha Corporation
    Inventor: Keijiro SAINO
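The non-uniform expansion described (central part stretched at a higher rate than the front and rear parts) can be sketched with a monotone time warp; the sinusoidal warp shape and `depth` parameter are illustrative choices, not the patent's adjustment rule:

```python
import numpy as np

def expand_section(samples, target_len, depth=0.1):
    """Stretch `samples` to target_len so the central part is expanded at a
    higher rate than the edges (linear-interpolation sketch)."""
    n = len(samples)
    t = np.linspace(0.0, 1.0, target_len)
    # Monotone warp: derivative < 1 near the centre, so the input's central
    # region is spread over more output samples than the front/rear parts.
    warp = t + depth * np.sin(2 * np.pi * t)
    return np.interp(warp * (n - 1), np.arange(n), samples)

section = np.arange(100, dtype=float)   # stand-in for a target section
stretched = expand_section(section, 150)
```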
  • Publication number: 20110218810
    Abstract: A system for controlling digital effects in live performances with vocal improvisation is described. The system features a complex controller that in one embodiment utilizes several magnetically activated electronic switches attached to a glove that is worn by an artist during a live performance. The switches are activated by a permanent magnet that is also attached to the switch bearing glove and a second magnet attached to a glove worn on the opposite hand. Furthermore, the switches are wirelessly connected by a miniature, battery-operated wireless data communications unit to a digital vocal processor unit that provides a dual mode, multi-channel phrase looping capability wherein individual channels can be selected for re-recording and selected banks of channels can be deleted during the performance. This combination of features allows a complex sequence of digital effects to be controlled by the artist during a performance while maintaining the freedom of movement desired to enhance the performance.
    Type: Application
    Filed: February 28, 2011
    Publication date: September 8, 2011
    Inventor: Momilani Ramstrum
  • Publication number: 20110125493
    Abstract: The voice quality conversion apparatus includes: low-frequency harmonic level calculating units and a harmonic level mixing unit for calculating a low-frequency sound source spectrum by mixing a level of a harmonic of an input sound source waveform and a level of a harmonic of a target sound source waveform at a predetermined conversion ratio for each order of harmonics including fundamental, in a frequency range equal to or lower than a boundary frequency; a high-frequency spectral envelope mixing unit that calculates a high-frequency sound source spectrum by mixing the input sound source spectrum and the target sound source spectrum at the predetermined conversion ratio in a frequency range larger than the boundary frequency; and a spectrum combining unit that combines the low-frequency sound source spectrum with the high-frequency sound source spectrum at the boundary frequency to generate a sound source spectrum for an entire frequency range.
    Type: Application
    Filed: January 31, 2011
    Publication date: May 26, 2011
    Inventors: Yoshifumi Hirose, Takahiro Kamai
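The split mixing scheme (harmonic levels below a boundary frequency, spectral envelope above it, both at one conversion ratio) can be sketched per frequency bin. Real harmonic analysis is simplified away here, and the log-domain mixing for the low band is an assumption meant to mimic level mixing:

```python
import numpy as np

def mix_source_spectra(input_spec, target_spec, ratio, boundary_bin):
    """Blend two sound-source spectra at a conversion ratio (0=input,
    1=target): bins below the boundary are mixed as levels (log domain),
    bins above it as an envelope (linear domain), then combined."""
    inp = np.asarray(input_spec, dtype=float)
    tgt = np.asarray(target_spec, dtype=float)
    low = np.exp((1 - ratio) * np.log(inp[:boundary_bin])
                 + ratio * np.log(tgt[:boundary_bin]))
    high = (1 - ratio) * inp[boundary_bin:] + ratio * tgt[boundary_bin:]
    return np.concatenate([low, high])

a = np.linspace(1.0, 2.0, 8)   # stand-in input source spectrum
b = np.linspace(2.0, 1.0, 8)   # stand-in target source spectrum
mixed = mix_source_spectra(a, b, ratio=0.5, boundary_bin=4)
```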
  • Publication number: 20110087488
    Abstract: According to an embodiment, a speech synthesis apparatus includes a selecting unit configured to select speaker's parameters one by one for respective speakers and obtain a plurality of speakers' parameters, the speaker's parameters being prepared for respective pitch waveforms corresponding to speaker's speech sounds, the speaker's parameters including formant frequencies, formant phases, formant powers, and window functions concerning respective formants that are contained in the respective pitch waveforms. The apparatus includes a mapping unit configured to make formants correspond to each other between the plurality of speakers' parameters using a cost function based on the formant frequencies and the formant powers. The apparatus includes a generating unit configured to generate an interpolated speaker's parameter by interpolating, at desired interpolation ratios, the formant frequencies, formant phases, formant powers, and window functions of formants which are made to correspond to each other.
    Type: Application
    Filed: December 16, 2010
    Publication date: April 14, 2011
    Inventors: Ryo Morinaka, Takehiko Kagoshima
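The mapping step (making formants correspond between speakers via a cost on formant frequencies and powers) and the interpolation step can be sketched as follows; the greedy matcher and the log-frequency cost are illustrative assumptions, not the patented cost function:

```python
import math

def map_formants(f_a, p_a, f_b, p_b, w_freq=1.0, w_pow=1.0):
    """Greedily pair formants of two speakers by a cost on log-frequency
    and power differences (hypothetical cost form)."""
    pairs, used = [], set()
    for i, (fa, pa) in enumerate(zip(f_a, p_a)):
        j = min((j for j in range(len(f_b)) if j not in used),
                key=lambda j: w_freq * abs(math.log(fa / f_b[j]))
                              + w_pow * abs(pa - p_b[j]))
        used.add(j)
        pairs.append((i, j))
    return pairs

def interpolate_formants(pairs, f_a, f_b, ratio):
    """Interpolate paired formant frequencies at the desired ratio."""
    return [(1 - ratio) * f_a[i] + ratio * f_b[j] for i, j in pairs]

f_a, p_a = [700.0, 1200.0, 2600.0], [0.0, -6.0, -12.0]   # speaker A
f_b, p_b = [650.0, 1100.0, 2500.0], [0.0, -5.0, -14.0]   # speaker B
pairs = map_formants(f_a, p_a, f_b, p_b)
interp = interpolate_formants(pairs, f_a, f_b, ratio=0.5)
```

In the patent, phases, powers, and window functions are interpolated the same way as the frequencies shown here.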
  • Publication number: 20110054903
    Abstract: Embodiments of rich text modeling for speech synthesis are disclosed. In operation, a text-to-speech engine refines a plurality of rich context models based on decision tree-tied Hidden Markov Models (HMMs) to produce a plurality of refined rich context models. The text-to-speech engine then generates synthesized speech for an input text based at least on some of the plurality of refined rich context models.
    Type: Application
    Filed: December 2, 2009
    Publication date: March 3, 2011
    Applicant: MICROSOFT CORPORATION
    Inventors: Zhi-Jie Yan, Yao Qian, Frank Kao-Ping Soong
  • Publication number: 20110046958
    Abstract: The present invention discloses a method and an apparatus for extracting a prosodic feature of a speech signal, the method including: dividing the speech signal into speech frames; transforming the speech frames from the time domain to the frequency domain; and extracting respective prosodic features for different frequency ranges. According to the above technical solution of the present invention, it is possible to effectively extract prosodic features which can be combined with traditional acoustic features without difficulty.
    Type: Application
    Filed: August 16, 2010
    Publication date: February 24, 2011
    Applicant: Sony Corporation
    Inventors: Kun LIU, Weiguo Wu
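The frame-and-transform pipeline in this abstract can be sketched as band-wise log energies; the frame size, hop, and band edges are arbitrary illustrative values, not the patent's feature definition:

```python
import numpy as np

def band_features(signal, sr, frame_len=256, hop=128,
                  bands=((0, 500), (500, 2000), (2000, 8000))):
    """Divide a signal into frames, FFT each frame, and extract a per-band
    log-energy as a simple frequency-range prosodic feature."""
    feats = []
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sr)
    window = np.hanning(frame_len)
    for start in range(0, len(signal) - frame_len + 1, hop):
        mag = np.abs(np.fft.rfft(signal[start:start + frame_len] * window))
        feats.append([
            np.log(np.sum(mag[(freqs >= lo) & (freqs < hi)] ** 2) + 1e-12)
            for lo, hi in bands
        ])
    return np.array(feats)

sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 200 * t)   # 200 Hz tone: energy in the low band
feats = band_features(tone, sr)
```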
  • Publication number: 20110004476
    Abstract: Variation over time in fundamental frequency in singing voices is separated into a melody-dependent component and a phoneme-dependent component, modeled for each of the components and stored into a singing synthesizing database. In execution of singing synthesis, a pitch curve indicative of variation over time in fundamental frequency of the melody is synthesized in accordance with an arrangement of notes represented by a singing synthesizing score and the melody-dependent component, and the pitch curve is corrected, for each of the pitch curve sections corresponding to phonemes constituting lyrics, using a phoneme-dependent component model corresponding to the phoneme. Such arrangements can accurately model a singing expression, unique to a singing person and appearing in the person's melody singing style, while taking into account phoneme-dependent pitch variation, and thereby permit synthesis of singing voices that sound more natural.
    Type: Application
    Filed: July 1, 2010
    Publication date: January 6, 2011
    Applicant: Yamaha Corporation
    Inventors: Keijiro Saino, Jordi Bonada
  • Publication number: 20100223058
    Abstract: A speech synthesis device includes a pitch pattern generation unit (104) which generates a pitch pattern by combining, based on pitch pattern target data including phonemic information formed from at least syllables, phonemes, and words, a standard pattern which approximately expresses the rough shape of the pitch pattern and an original utterance pattern which expresses the pitch pattern of a recorded speech, a unit waveform selection unit (106) which selects unit waveform data based on the generated pitch pattern and upon selection, selects original utterance unit waveform data corresponding to the original utterance pattern in a section where the original utterance pattern is used, and a speech waveform generation unit (107) which generates a synthetic speech by editing the selected unit waveform data so as to reproduce prosody represented by the generated pitch pattern.
    Type: Application
    Filed: August 28, 2008
    Publication date: September 2, 2010
    Inventors: Yasuyuki Mitsui, Reishi Kondo
  • Publication number: 20100211393
    Abstract: A speech synthesis device is provided with: a central segment selection unit for selecting a central segment from among a plurality of speech segments; a prosody generation unit for generating prosody information based on the central segment; a non-central segment selection unit for selecting a non-central segment, which is a segment outside of a central segment section, based on the central segment and the prosody information; and a waveform generation unit for generating a synthesized speech waveform based on the prosody information, the central segment, and the non-central segment. The speech synthesis device first selects a central segment that forms a basis for prosody generation and generates prosody information based on the central segment so that it is possible to sufficiently reduce both concatenation distortion and sound quality degradation accompanying prosody control in the section of the central segment.
    Type: Application
    Filed: April 28, 2008
    Publication date: August 19, 2010
    Inventors: Masanori Kato, Yasuyuki Mitsui, Reishi Kondo
  • Publication number: 20100049523
    Abstract: Systems and methods for providing synthesized speech in a manner that takes into account the environment where the speech is presented. A method embodiment includes: based on a listening environment and at least one other parameter associated with the listening environment, selecting an approach from a plurality of approaches for presenting synthesized speech in that environment; presenting synthesized speech according to the selected approach; and, based on natural language input received from a user indicating an inability to understand the presented synthesized speech, selecting a second approach from the plurality of approaches and presenting subsequent synthesized speech using the second approach.
    Type: Application
    Filed: October 28, 2009
    Publication date: February 25, 2010
    Applicant: AT&T Corp.
    Inventors: Kenneth H. Rosen, Carroll W. Creswell, Jeffrey J. Farah, Pradeep K. Bansal, Ann K. Syrdal
  • Publication number: 20090326951
    Abstract: Ratios between the powers at the peaks of the respective formants of the spectrum of a pitch-cycle waveform and the powers at the boundaries between the formants are obtained. When the ratios are large, the bandwidths of the window functions are widened, and the formant waveforms are generated by multiplying sinusoidal waveforms, generated from the formant parameter sets on the basis of pitch-cycle waveform generating data, by the window functions of the widened bandwidths, whereby a pitch-cycle waveform is generated as the sum of these formant waveforms.
    Type: Application
    Filed: April 14, 2009
    Publication date: December 31, 2009
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Ryo Morinaka, Takehiko Kagoshima
  • Publication number: 20090313025
    Abstract: A method and system are disclosed that automatically segment speech to generate a speech inventory. The method includes initializing a Hidden Markov Model (HMM) using seed input data, performing a segmentation of the HMM into speech units to generate phone labels, correcting the segmentation of the speech units. Correcting the segmentation of the speech units includes re-estimating the HMM based on a current version of the phone labels, embedded re-estimating of the HMM, and updating the current version of the phone labels using spectral boundary correction. The system includes modules configured to control a processor to perform steps of the method.
    Type: Application
    Filed: August 20, 2009
    Publication date: December 17, 2009
    Applicant: AT&T Corp.
    Inventors: Alistair D. CONKIE, Yeon-Jun KIM
  • Publication number: 20090281807
    Abstract: A voice quality conversion device converts the voice quality of input speech using information contained in the speech.
    Type: Application
    Filed: May 8, 2008
    Publication date: November 12, 2009
    Inventors: Yoshifumi Hirose, Takahiro Kamai, Yumiko Kato
  • Publication number: 20090248417
    Abstract: A method to generate a pitch contour for speech synthesis is proposed. The method is based on finding the pitch contour that maximizes a total likelihood function created by the combination of all the statistical models of the pitch contour segments of an utterance, at one or multiple linguistic levels. These statistical models are trained from a database of spoken speech by means of a decision tree that, for each linguistic level, clusters the parametric representation of the pitch segments extracted from the spoken speech data together with features obtained from the text associated with that speech data. The parameterization of the pitch segments is performed in such a way that the likelihood function of any linguistic level can be expressed in terms of the parameters of one of the levels, thus allowing the maximization to be calculated with respect to the parameters of that level.
    Type: Application
    Filed: March 17, 2009
    Publication date: October 1, 2009
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Javier Latorre, Masami Akamine
  • Publication number: 20090204405
    Abstract: Apparatus and method for generating high quality synthesized speech having smooth waveform concatenation. The apparatus includes a pitch frequency calculation section, a pitch synchronization position calculation section, a unit waveform storage, a unit waveform selection section, a unit waveform generation section, and a waveform synthesis section. The unit waveform generation section includes a conversion ratio calculation section, a sampling rate conversion section, and a unit waveform re-selection section. The conversion ratio calculation section calculates a sampling rate conversion ratio from the pitch information and the position of pitch synchronization, and the sampling rate conversion section converts the sampling rate of the unit waveform, delivered as input, based on the sampling rate conversion ratio.
    Type: Application
    Filed: September 4, 2006
    Publication date: August 13, 2009
    Applicant: NEC CORPORATION
    Inventors: Masanori Kato, Satoshi Tsukada
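The conversion-ratio idea (a sampling rate conversion ratio derived from the pitch information) can be sketched with a linear-interpolation resampler; the patent's pitch-synchronization position logic is omitted:

```python
import numpy as np

def retune_unit(unit, unit_f0, target_f0):
    """Change a unit waveform's pitch by resampling: the conversion ratio
    is the target pitch over the unit's own pitch. Playing the shorter
    (or longer) result at the original rate shifts the pitch."""
    ratio = target_f0 / unit_f0
    n_out = int(round(len(unit) / ratio))
    x_out = np.arange(n_out) * ratio
    return np.interp(x_out, np.arange(len(unit)), unit)

unit = np.sin(2 * np.pi * 100 * np.arange(400) / 8000)  # 100 Hz at 8 kHz
retuned = retune_unit(unit, unit_f0=100.0, target_f0=125.0)
```

The retuned waveform repeats every 64 samples, i.e. 125 Hz at the 8 kHz rate.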
  • Publication number: 20080319754
    Abstract: According to an aspect of an embodiment, an apparatus for converting text data into a sound signal comprises: a phoneme determiner for determining phoneme data corresponding to a plurality of phonemes and pause data corresponding to a plurality of pauses to be inserted among a series of phonemes in the text data to be converted into the sound signal; a phoneme length adjuster for modifying the phoneme data and the pause data by determining lengths of the phonemes, respectively, in accordance with a speed of the sound signal and selectively adjusting the length of at least one of the phonemes which is a fricative in the text data so that the fricative phoneme is relatively extended timewise as compared to other phonemes; and an output unit for outputting a sound signal on the basis of the phoneme data and pause data adjusted by the phoneme length adjuster.
    Type: Application
    Filed: June 13, 2008
    Publication date: December 25, 2008
    Applicant: FUJITSU LIMITED
    Inventors: Rika Nishiike, Hitoshi Sasaki
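The selective fricative extension can be sketched as a duration rule; the fricative set and the 1.3 stretch factor are made-up illustration values, not the patent's adjustment:

```python
def adjust_lengths(phonemes, speed, fricative_stretch=1.3):
    """Scale phoneme lengths (ms) for a speech-rate factor, extending
    fricatives relatively more so they stay intelligible at high speed."""
    fricatives = {"s", "sh", "f", "th", "z", "h"}
    out = []
    for name, ms in phonemes:
        length = ms / speed                  # base rate scaling
        if name in fricatives:
            length *= fricative_stretch      # relative timewise extension
        out.append((name, length))
    return out

# Double-speed output: "s" shrinks less than "a" and "t" do.
adjusted = adjust_lengths([("s", 80.0), ("a", 100.0), ("t", 60.0)], speed=2.0)
```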
  • Publication number: 20080312931
    Abstract: A speech synthesis system stores a group of speech units in a memory, selects a plurality of speech units from the group based on prosodic information of target speech, the speech units selected corresponding to each of segments which are obtained by segmenting a phoneme string of the target speech and minimizing distortion of synthetic speech generated from the speech units selected to the target speech, generates a new speech unit corresponding to the each of the segments, by fusing the speech units selected, to obtain a plurality of new speech units corresponding to the segments respectively, and generates synthetic speech by concatenating the new speech units.
    Type: Application
    Filed: August 18, 2008
    Publication date: December 18, 2008
    Inventors: Tatsuya MIZUTANI, Takehiko Kagoshima
  • Publication number: 20080243511
    Abstract: The present invention is a speech synthesizer that generates speech data for text including a fixed part and a variable part, combining recorded speech and rule-based synthetic speech. The speech synthesizer is a high-quality one in which recorded speech and synthetic speech are concatenated without perceptible discontinuity of timbre and prosody.
    Type: Application
    Filed: October 22, 2007
    Publication date: October 2, 2008
    Inventors: Yusuke Fujita, Ryota Kamoshida, Kenji Nagamatsu
  • Publication number: 20080172234
    Abstract: Systems and methods for dynamically selecting among text-to-speech (TTS) systems. Exemplary embodiments of the systems and methods include identifying text for converting into a speech waveform, synthesizing said text by three TTS systems, generating a candidate waveform from each of the three systems, generating a score from each of the three systems, comparing the three scores, selecting a score based on a criterion, and selecting one of the three waveforms based on the selected score.
    Type: Application
    Filed: January 12, 2007
    Publication date: July 17, 2008
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Ellen M. Eide, Raul Fernandez, Wael M. Hamza, Michael A. Picheny
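The compare-and-select step can be sketched directly; the system names, scores, and the higher-is-better criterion are assumptions for illustration:

```python
def select_waveform(candidates, prefer_highest=True):
    """Pick one (name, score, waveform) candidate from several TTS
    systems by comparing their scores."""
    key = lambda c: c[1]
    return max(candidates, key=key) if prefer_highest else min(candidates, key=key)

# Three hypothetical TTS systems, each producing a waveform and a score.
systems = [("unit-selection", 0.82, b"wave-a"),
           ("hmm", 0.74, b"wave-b"),
           ("hybrid", 0.91, b"wave-c")]
name, score, waveform = select_waveform(systems)
```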
  • Publication number: 20080126093
    Abstract: An apparatus for providing a language based interactive multimedia system includes a selection element, a comparison element and a processing element. The selection element may be configured to select a phoneme graph based on a type of speech processing associated with an input sequence of phonemes. The comparison element may be configured to compare the input sequence of phonemes to the selected phoneme graph. The processing element may be in communication with the comparison element and configured to process the input sequence of phonemes based on the comparison.
    Type: Application
    Filed: November 28, 2006
    Publication date: May 29, 2008
    Inventor: Sunil Sivadas
  • Publication number: 20080027727
    Abstract: A speech unit corpus stores a group of speech units. A selection unit divides a phoneme sequence of target speech into a plurality of segments, and selects a combination of speech units for each segment from the speech unit corpus. An estimation unit estimates a distortion between the target speech and synthesized speech generated by fusing each speech unit of the combination for each segment. The selection unit recursively selects the combination of speech units for each segment based on the distortion. A fusion unit generates a new speech unit for each segment by fusing each speech unit of the combination selected for each segment. A concatenation unit generates synthesized speech by concatenating the new speech unit for each segment.
    Type: Application
    Filed: July 23, 2007
    Publication date: January 31, 2008
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Masahiro MORITA, Takehiko Kagoshima
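The fuse-then-concatenate structure shared by this entry (and publication 20080312931 above) can be sketched with simple averaging as the fusion operator; real systems align units before fusing, which is omitted here:

```python
import numpy as np

def fuse_units(units):
    """Fuse several candidate waveforms for one segment into a single new
    speech unit by truncating to a common length and averaging."""
    n = min(len(u) for u in units)
    return np.mean([u[:n] for u in units], axis=0)

def synthesize(segment_candidates):
    """Fuse each segment's selected candidates, then concatenate the
    fused units into synthesized speech."""
    return np.concatenate([fuse_units(c) for c in segment_candidates])

# Two segments, each with its combination of selected candidate units.
seg1 = [np.array([1.0, 2.0, 3.0]), np.array([3.0, 4.0, 5.0, 6.0])]
seg2 = [np.array([0.0, 0.0]), np.array([2.0, 4.0])]
speech = synthesize([seg1, seg2])
```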