Time Element Patents (Class 704/267)
  • Patent number: 7472066
    Abstract: An automatic speech segmentation and verification system and method is disclosed, which has a known text script and a recorded speech corpus corresponding to the known text script. A speech unit segmentor segments the recorded speech corpus into N test speech unit segments by referring to the phonetic information of the known text script. Then, a segmental verifier is applied to obtain a confidence measure of syllable segmentation for verifying the correctness of the cutting points of the test speech unit segments. A phonetic verifier obtains a confidence measure of syllable verification by using verification models to verify whether the recorded speech corpus was correctly recorded. Finally, a speech unit inspector integrates the confidence measure of syllable segmentation and the confidence measure of syllable verification to determine whether each test speech unit segment is accepted.
    Type: Grant
    Filed: February 23, 2004
    Date of Patent: December 30, 2008
    Assignee: Industrial Technology Research Institute
    Inventors: Chih-Chung Kuo, Chi-Shiang Kuo, Jau-Hung Chen
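The final integration step described above can be sketched as a weighted combination of the two confidence measures. The weights and threshold below are illustrative assumptions, not values from the patent:

```python
def accept_segment(seg_conf: float, ver_conf: float,
                   w_seg: float = 0.5, threshold: float = 0.6) -> bool:
    """Integrate segmentation and verification confidences.

    seg_conf: confidence that the cutting points of the segment are correct.
    ver_conf: confidence that the segment was recorded as scripted.
    The linear combination and threshold are illustrative choices only.
    """
    combined = w_seg * seg_conf + (1.0 - w_seg) * ver_conf
    return combined >= threshold
```

A segment with both confidences high is accepted; one failing either check tends to fall below the threshold.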
  • Publication number: 20080319754
    Abstract: According to an aspect of an embodiment, an apparatus for converting text data into a sound signal comprises: a phoneme determiner for determining phoneme data corresponding to a plurality of phonemes and pause data corresponding to a plurality of pauses to be inserted among a series of phonemes in the text data to be converted into a sound signal; a phoneme length adjuster for modifying the phoneme data and the pause data by determining lengths of the phonemes, respectively, in accordance with a speed of the sound signal, and selectively adjusting the length of at least one of the phonemes which is a fricative in the text data so that the at least one fricative phoneme is relatively extended timewise as compared to other phonemes; and an output unit for outputting a sound signal on the basis of the phoneme data and pause data adjusted by the phoneme length adjuster.
    Type: Application
    Filed: June 13, 2008
    Publication date: December 25, 2008
    Applicant: FUJITSU LIMITED
    Inventors: Rika Nishiike, Hitoshi Sasaki
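The fricative-lengthening idea above can be sketched as a simple duration scaler. The phoneme set, base duration, and boost factor are all assumptions for illustration:

```python
FRICATIVES = {"s", "z", "f", "v", "sh", "th"}  # illustrative set

def adjust_durations(phonemes, base_ms=80, speed=1.0, fricative_boost=1.3):
    """Scale phoneme durations by speaking speed, extending fricatives.

    A higher speed shortens every phoneme, but fricatives are stretched
    relative to the others so they remain intelligible at fast rates.
    All constants here are made-up illustrative values.
    """
    out = []
    for p in phonemes:
        d = base_ms / speed  # faster speech -> shorter phonemes
        if p in FRICATIVES:
            d *= fricative_boost  # extend fricatives relative to the rest
        out.append((p, round(d)))
    return out
```

At double speed, an "s" keeps proportionally more of its duration than a vowel.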
  • Publication number: 20080319755
    Abstract: According to an aspect of an embodiment, an apparatus for converting text data into a sound signal comprises: a phoneme determiner for determining phoneme data corresponding to a plurality of phonemes and pause data corresponding to a plurality of pauses to be inserted among a series of phonemes in the text data to be converted into a sound signal; a phoneme length adjuster for modifying the phoneme data and the pause data by determining lengths of the phonemes, respectively, in accordance with a speed of the sound signal, and selectively adjusting the length of at least one of the phonemes which is placed immediately after one of the pauses so that the at least one phoneme is relatively extended timewise as compared to other phonemes; and an output unit for outputting a sound signal on the basis of the phoneme data and pause data adjusted by the phoneme length adjuster.
    Type: Application
    Filed: June 24, 2008
    Publication date: December 25, 2008
    Applicant: FUJITSU LIMITED
    Inventors: Rika Nishiike, Hitoshi Sasaki, Nobuyuki Katae, Kentaro Murase, Takuya Noda
  • Patent number: 7451087
    Abstract: A method for converting text to concatenated voice by utilizing a digital voice library and a set of playback rules is provided. The method includes receiving and expanding text data to form a sequence of text and pseudo words. The sequence of text and pseudo words is converted into a sequence of speech items, and the sequence of speech items is converted into a sequence of voice recordings. The method includes generating voice data based on the sequence of voice recordings by concatenating adjacent recordings in the sequence.
    Type: Grant
    Filed: March 27, 2001
    Date of Patent: November 11, 2008
    Assignee: Qwest Communications International Inc.
    Inventors: Eliot M. Case, Richard P. Phillips
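The expand-then-concatenate pipeline above can be sketched in a few lines. The miniature voice library, the digit substitution, and the `tts_` fallback token are hypothetical stand-ins:

```python
# Hypothetical miniature voice library: word -> recording (here, token lists).
VOICE_LIBRARY = {"you": ["rec_you"], "have": ["rec_have"],
                 "two": ["rec_two"], "messages": ["rec_messages"]}

def expand(text: str):
    """Expand raw text into a sequence of pronounceable pseudo words.

    Real systems expand digits, abbreviations, currency, etc.;
    this sketch only maps "2" -> "two".
    """
    subs = {"2": "two"}
    return [subs.get(w, w) for w in text.lower().split()]

def synthesize(text: str):
    """Map pseudo words to recordings and concatenate adjacent ones."""
    voice = []
    for word in expand(text):
        # Fall back to a placeholder for words missing from the library.
        voice.extend(VOICE_LIBRARY.get(word, [f"tts_{word}"]))
    return voice
```

Running `synthesize("You have 2 messages")` yields the concatenated recording sequence for the expanded text.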
  • Publication number: 20080270140
    Abstract: A speech synthesis system receives symbolic input describing an utterance to be synthesized. In one embodiment, different portions of the utterance are constructed from different sources, one of which is a speech corpus recorded from a human speaker whose voice is to be modeled. The other sources may include other human speech corpora or speech produced using Rule-Based Speech Synthesis (RBSS). At least some portions of the utterance may be constructed by modifying prototype speech units to produce adapted speech units that are contextually appropriate for the utterance. The system concatenates the adapted speech units with the other speech units to produce a speech waveform. In another embodiment, a speech unit of a speech corpus recorded from a human speaker lacks transitions at one or both of its edges. A transition is synthesized using RBSS and concatenated with the speech unit in producing a speech waveform for the utterance.
    Type: Application
    Filed: April 24, 2007
    Publication date: October 30, 2008
    Inventors: Susan R. Hertz, Harold G. Mills
  • Publication number: 20080243511
    Abstract: The present invention is a speech synthesizer that generates speech data for text including a fixed part and a variable part, combining recorded speech and rule-based synthetic speech. The synthesizer achieves high quality by concatenating recorded speech and synthetic speech so that discontinuities in timbre and prosody are not perceived.
    Type: Application
    Filed: October 22, 2007
    Publication date: October 2, 2008
    Inventors: Yusuke Fujita, Ryota Kamoshida, Kenji Nagamatsu
  • Patent number: 7418389
    Abstract: A method for identifying common multiphone units to add to a unit inventory for a text-to-speech generator is disclosed. The common multiphone units are units that are larger than a phone, but smaller than a syllable. The method slices each syllable into a plurality of slices. These slices are then sorted and the frequency of each slice is determined. Those slices whose frequencies exceed a threshold are added to the unit inventory. The remaining slices are decomposed according to a predetermined set of rules to determine if they contain slices that should be added to the unit inventory.
    Type: Grant
    Filed: January 11, 2005
    Date of Patent: August 26, 2008
    Assignee: Microsoft Corporation
    Inventors: Min Chu, Yong Zhao
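The slicing-and-counting procedure above can be sketched directly: enumerate every contiguous slice of each syllable that is longer than one phone but shorter than the whole syllable, and keep the frequent ones. The frequency threshold is an illustrative stand-in for the patent's:

```python
from collections import Counter

def select_multiphone_units(syllables, threshold=2):
    """Slice syllables into multiphone candidates and keep frequent ones.

    Each syllable is a tuple of phones; every contiguous slice of at
    least two phones that is smaller than the whole syllable is a
    candidate unit. Slices whose frequency meets the threshold are
    added to the unit inventory.
    """
    counts = Counter()
    for syl in syllables:
        for i in range(len(syl)):
            for j in range(i + 2, len(syl) + 1):  # slices of >= 2 phones
                if j - i < len(syl):              # smaller than the syllable
                    counts[syl[i:j]] += 1
    return {unit for unit, c in counts.items() if c >= threshold}
```

A cluster like ("s", "t") that recurs across many syllables passes the threshold; a one-off slice does not.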
  • Patent number: 7412390
    Abstract: Emotion is added to synthesized speech while the prosodic features of the language are maintained. In a speech synthesis device 200, a language processor 201 generates a string of pronunciation marks from the text, and a prosodic data generating unit 202 creates prosodic data expressing parameters such as the time duration, pitch, and sound volume of phonemes, based on the string of pronunciation marks. A constraint information generating unit 203 is fed with the prosodic data and the string of pronunciation marks to generate constraint information that limits changes in the parameters, and adds the constraint information to the prosodic data. An emotion filter 204, fed with the prosodic data to which the constraint information has been added, changes the parameters of the prosodic data, within the constraint, in response to the emotional state information imparted to it.
    Type: Grant
    Filed: March 13, 2003
    Date of Patent: August 12, 2008
    Assignees: Sony France S.A., Sony Corporation
    Inventors: Erika Kobayashi, Toshiyuki Kumakura, Makoto Akabane, Kenichiro Kobayashi, Nobuhide Yamazaki, Tomoaki Nitta, Pierre Yves Oudeyer
  • Patent number: 7409347
    Abstract: Portions from segment boundary regions of a plurality of speech segments are extracted. Each segment boundary region is based on a corresponding initial unit boundary. Feature vectors that represent the portions in a vector space are created. For each of a plurality of potential unit boundaries within each segment boundary region, an average discontinuity based on distances between the feature vectors is determined. For each segment, the potential unit boundary associated with a minimum average discontinuity is selected as a new unit boundary.
    Type: Grant
    Filed: October 23, 2003
    Date of Patent: August 5, 2008
    Assignee: Apple Inc.
    Inventor: Jerome R. Bellegarda
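The boundary-refinement idea above can be sketched in simplified form: score each candidate boundary by the discontinuity between the feature vectors on either side of the cut, and pick the minimum. This is a simplified reading of the method, with hypothetical data shapes:

```python
import math

def euclid(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pick_boundary(region):
    """Choose the candidate boundary with minimum discontinuity.

    `region` maps each potential boundary position to the pair of
    feature vectors adjacent to that cut; the candidate whose pair is
    closest in the vector space becomes the new unit boundary.
    """
    return min(region, key=lambda pos: euclid(*region[pos]))
```

A cut placed where the signal's features barely change produces a small distance and wins.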
  • Publication number: 20080120106
    Abstract: A semiconductor integrated circuit device including: a storage section which temporarily stores a command and text data input from the outside; a speech synthesis section which synthesizes a speech signal corresponding to the text data based on the command and the text data stored in the storage section, and outputs the synthesized speech signal to the outside; and a control section which controls a timing at which the command and the text data stored in the storage section are transferred to the speech synthesis section based on a speech synthesis start control signal. The control section controls an output of a speech output start notification signal which notifies in advance a start of outputting the synthesized speech signal to the outside based on occurrence of a speech synthesis start event, and then controls a start of outputting the synthesized speech signal to the outside at a given timing.
    Type: Application
    Filed: November 7, 2007
    Publication date: May 22, 2008
    Applicant: SEIKO EPSON CORPORATION
    Inventors: Masamichi Izumida, Masayuki Murakami
  • Patent number: 7373299
    Abstract: A variable voice rate apparatus to control a reproduction rate of voice, includes a voice data generation unit configured to generate voice data from the voice, a text data generation unit configured to generate text data indicating a content of the voice data, a division information generation unit configured to generate division information used for dividing the text data into a plurality of linguistic units each of which is characterized by a linguistic form, a reproduction information generation unit configured to generate reproduction information set for each of the linguistic units, and a voice reproduction controller which controls reproduction of each of the linguistic units, based on the reproduction information and the division information.
    Type: Grant
    Filed: December 23, 2003
    Date of Patent: May 13, 2008
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Hisayoshi Nagae, Kohei Momosaki, Masahide Ariu
  • Publication number: 20080027727
    Abstract: A speech unit corpus stores a group of speech units. A selection unit divides a phoneme sequence of target speech into a plurality of segments, and selects a combination of speech units for each segment from the speech unit corpus. An estimation unit estimates a distortion between the target speech and synthesized speech generated by fusing each speech unit of the combination for each segment. The selection unit recursively selects the combination of speech units for each segment based on the distortion. A fusion unit generates a new speech unit for each segment by fusing each speech unit of the combination selected for each segment. A concatenation unit generates synthesized speech by concatenating the new speech unit for each segment.
    Type: Application
    Filed: July 23, 2007
    Publication date: January 31, 2008
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Masahiro MORITA, Takehiko Kagoshima
  • Patent number: 7299182
    Abstract: There is provided an Ebook. The Ebook includes a memory device, a text-to-speech (TTS) module, and at least one speaker. The memory device stores files. The files include text. The TTS module synthesizes speech corresponding to the text. The at least one speaker outputs the speech.
    Type: Grant
    Filed: May 9, 2002
    Date of Patent: November 20, 2007
    Assignee: Thomson Licensing
    Inventor: Jianlei Xie
  • Patent number: 7275035
    Abstract: In a method of assisting a subject to generate speech, at least one first neural impulse is sensed from a first preselected location in the subject's brain. A first preselected sound is associated with the first neural impulse. The first preselected sound is generated in an audible format. In an apparatus for assisting the subject to generate speech, at least one sensor senses a neural impulse in the subject's brain and generates a signal representative thereof. An electronic speech generator generates a phoneme in response to the generation of the signal. An audio system generates audible sounds corresponding to the phoneme based upon the signal received from the speech generator.
    Type: Grant
    Filed: December 8, 2004
    Date of Patent: September 25, 2007
    Assignee: Neural Signals, Inc.
    Inventor: Philip R. Kennedy
  • Patent number: 7249022
    Abstract: There are provided a singing voice-synthesizing method and apparatus capable of synthesizing natural singing voices close to human singing voices based on performance data input in real time. Performance data is input for each phonetic unit constituting a lyric, supplying phonetic unit information, singing-starting time point information, singing length information, etc. Each item of performance data is input at a timing earlier than the actual singing-starting time point, and a phonetic unit transition time length is generated. Using the phonetic unit transition time, the singing-starting time point information, and the singing length information, the singing-starting time points and singing duration times of the first and second phonemes are determined. In the singing voice synthesis, for each phoneme, a singing voice is generated at the determined singing-starting time point and continues to be generated for the determined singing duration time.
    Type: Grant
    Filed: December 1, 2005
    Date of Patent: July 24, 2007
    Assignee: Yamaha Corporation
    Inventors: Hiraku Kayama, Oscar Celma, Jaume Ortola
  • Patent number: 7240005
    Abstract: A method of high-speed reading in a text-to-speech conversion system including a text analysis module (101) for generating a phoneme and prosody character string from an input text; a prosody generation module (102) for generating a synthesis parameter of at least a voice segment, a phoneme duration, and a fundamental frequency for the phoneme and prosody character string; and a speech generation module (103) for generating a synthetic waveform by waveform superimposition by referring to a voice segment dictionary (105). The prosody generation module is provided with both a duration rule table containing empirically found phoneme durations and a duration prediction table containing phoneme durations predicted by statistical analysis; when the user-designated utterance speed exceeds a threshold, it uses the duration rule table, and when the threshold is not exceeded, it uses the duration prediction table to determine the phoneme duration.
    Type: Grant
    Filed: January 29, 2002
    Date of Patent: July 3, 2007
    Assignee: Oki Electric Industry Co., Ltd.
    Inventor: Keiichi Chihara
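The table-switching logic above is simple to sketch. The table contents and speed threshold below are made-up illustrative values, not the patent's:

```python
DURATION_RULE_TABLE = {"a": 90, "k": 60}         # empirically set (illustrative)
DURATION_PREDICTION_TABLE = {"a": 104, "k": 57}  # statistically predicted

def phoneme_duration(phoneme, utterance_speed, speed_threshold=1.5):
    """Pick the duration source based on the user-designated speed.

    Above the threshold (high-speed reading) the cheap rule table is
    used; at or below it, the statistically predicted durations are.
    """
    table = (DURATION_RULE_TABLE if utterance_speed > speed_threshold
             else DURATION_PREDICTION_TABLE)
    return table[phoneme]
```

The design choice is a speed/quality trade-off: rule lookups are cheap enough for fast reading, while statistical predictions sound more natural at normal rates.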
  • Patent number: 7219061
    Abstract: Predetermined macrosegments of the fundamental frequency are determined by a neural network, and these predefined macrosegments are reproduced by fundamental-frequency sequences stored in a database. The fundamental frequency is generated on the basis of a relatively large text section which is analyzed by the neural network. Microstructures from the database are received in the fundamental frequency. The fundamental frequency thus formed is thus optimized both with regard to its macrostructure and to its microstructure. As a result, an extremely natural sound is achieved.
    Type: Grant
    Filed: October 24, 2000
    Date of Patent: May 15, 2007
    Assignee: Siemens Aktiengesellschaft
    Inventors: Caglayan Erdem, Martin Holzapfel
  • Patent number: 7219065
    Abstract: A sound processor including a microphone (1), a pre-amplifier (2), a bank of N parallel filters (3), means for detecting short-duration transitions in the envelope signal of each filter channel, and means for applying gain to the outputs of these filter channels in which the gain is related to a function of the second-order derivative of the slow-varying envelope signal in each filter channel, to assist in perception of low-intensity short-duration speech features in said signal.
    Type: Grant
    Filed: October 25, 2000
    Date of Patent: May 15, 2007
    Inventors: Andrew E. Vandali, Graeme M. Clark
  • Patent number: 7177811
    Abstract: A method is provided for customizing a multi-media message created by a sender for a recipient, in which the multi-media message includes an animated entity audibly presenting speech converted from text by the sender. At least one image is received from the sender. Each of the at least one image is associated with a tag. The sender is presented with options to insert the tag associated with one of the at least one image into the sender text.
    Type: Grant
    Filed: March 6, 2006
    Date of Patent: February 13, 2007
    Assignee: AT&T Corp.
    Inventors: Joern Ostermann, Barbara Buda, Mehmet Reha Civanlar, Eric Cosatto, Hans Peter Graf, Thomas M. Isaacson, Yann Andre LeCun
  • Patent number: 7171362
    Abstract: The assignment of phonemes to graphemes producing them in a lexicon having words (grapheme sequences) and their associated phonetic transcription (phoneme sequences) for the preparation of patterns for training neural networks for the purpose of grapheme-phoneme conversion is carried out with the aid of a variant of dynamic programming which is known as dynamic time warping (DTW).
    Type: Grant
    Filed: August 31, 2001
    Date of Patent: January 30, 2007
    Assignee: Siemens Aktiengesellschaft
    Inventor: Horst-Udo Hain
  • Patent number: 7124084
    Abstract: There are provided a singing voice-synthesizing method and apparatus capable of performing synthesis of natural singing voices close to human singing voices based on performance data being input in real time. Performance data is inputted for each phonetic unit constituting a lyric, to supply phonetic unit information, singing-starting time point information, singing length information, etc. Each performance data is inputted in timing earlier than the actual singing-starting time point, and a phonetic unit transition time length is generated. By using the phonetic unit transition time, the singing-starting time point information, and the singing length information, the singing-starting time points and singing duration times of the first and second phonemes are determined. In the singing voice synthesis, for each phoneme, a singing voice is generated at the determined singing-starting time point and continues to be generated for the determined singing duration time.
    Type: Grant
    Filed: December 27, 2001
    Date of Patent: October 17, 2006
    Assignee: Yamaha Corporation
    Inventors: Hiraku Kayama, Oscar Celma, Jaume Ortola
  • Patent number: 7117156
    Abstract: The invention concerns a method and apparatus for performing packet loss or Frame Erasure Concealment (FEC) for a speech coder that does not have a built-in or standard FEC process. A receiver with a decoder receives encoded frames of compressed speech information transmitted from an encoder. A lost frame detector at the receiver determines if an encoded frame has been lost or corrupted in transmission, or erased. If the encoded frame is not erased, the encoded frame is decoded by a decoder and a temporary memory is updated with the decoder's output. A predetermined delay period is applied and the audio frame is then output. If the lost frame detector determines that the encoded frame is erased, a FEC module applies a frame concealment process to the signal. The FEC processing produces natural sounding synthetic speech for the erased frames.
    Type: Grant
    Filed: April 19, 2000
    Date of Patent: October 3, 2006
    Assignee: AT&T Corp.
    Inventor: David A. Kapilow
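The receiver-side flow above can be sketched schematically: decode good frames and remember the output, conceal erased frames from that memory. This is a schematic of the control flow only, not AT&T's actual concealment algorithm, and the `decode`/`conceal` callables are hypothetical:

```python
def receive_frames(frames, decode, conceal):
    """Decode a stream, concealing frames flagged as lost or erased.

    `frames` yields (erased, payload) pairs from the lost frame
    detector; `decode` turns a payload into audio; `conceal`
    synthesizes a replacement frame from the last good output held in
    temporary memory. (The patent's fixed output delay is omitted.)
    """
    memory = None  # temporary memory holding the last decoder output
    output = []
    for erased, payload in frames:
        if not erased:
            audio = decode(payload)
            memory = audio  # update memory with the decoder's output
        else:
            audio = conceal(memory)  # synthesize from prior good audio
        output.append(audio)
    return output
```

With toy callables, an erased middle frame is filled in from its predecessor rather than dropped.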
  • Patent number: 7113909
    Abstract: A stereotypical sentence is synthesized into a voice of an arbitrary speech style. A third party is able to prepare prosody data and a user of a terminal device having a voice synthesizing part can acquire the prosody data. The voice synthesizing method determines a voice-contents identifier to point to a type of voice contents of a stereotypical sentence, prepares a speech style dictionary including speech style and prosody data which correspond to the voice-contents identifier, selects prosody data of the synthesized voice to be generated from the speech style dictionary, and adds the selected prosody data to a voice synthesizer 13 as voice-synthesizer driving data to thereby perform voice synthesis with a specific speech style. Thus, a voice of a stereotypical sentence can be synthesized with an arbitrary speech style.
    Type: Grant
    Filed: July 31, 2001
    Date of Patent: September 26, 2006
    Assignee: Hitachi, Ltd.
    Inventors: Nobuo Nukaga, Kenji Nagamatsu, Yoshinori Kitahara
  • Patent number: 7089187
    Abstract: A voice synthesizing system that keeps both the required amount of computation and the required file size small. The system includes: a compressed pitch segment database storing compressed voice waveform segments; a pitch developing portion which, when a voice waveform segment needed for voice waveform synthesis is demanded, reads the segment out of the database and decompresses the compressed data to reproduce the original voice waveform segment; and a cache processing portion which temporarily stores voice waveform segments already used in voice waveform synthesis. When a demanded segment is already stored, the cache processing portion returns it directly to the demander; otherwise, it obtains the segment from the database via the pitch developing portion, holds the obtained segment, and returns it to the demander.
    Type: Grant
    Filed: September 26, 2002
    Date of Patent: August 8, 2006
    Assignee: NEC Corporation
    Inventors: Reishi Kondo, Hiroaki Hattori
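The cache-in-front-of-decompressor arrangement above is a classic memoization pattern. The class and names below are illustrative, not from the patent:

```python
class SegmentCache:
    """Serve decompressed waveform segments, decompressing only on a miss.

    `database` maps segment ids to compressed data; `decompress`
    restores the original waveform. Segments already used in synthesis
    are held so repeated demands skip the costly decompression.
    """
    def __init__(self, database, decompress):
        self.database = database
        self.decompress = decompress
        self.cache = {}
        self.misses = 0  # counts actual decompressions, for inspection

    def get(self, seg_id):
        if seg_id not in self.cache:  # not yet used in synthesis
            self.misses += 1
            self.cache[seg_id] = self.decompress(self.database[seg_id])
        return self.cache[seg_id]
```

Since speech reuses the same pitch segments heavily, most demands hit the cache and the decompression cost is paid once per segment.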
  • Patent number: 7054815
    Abstract: A speech synthesizing apparatus extracts small speech segments from a speech waveform as a prosody control target and adds inhibition information for inhibiting a predetermined prosody change process to a selected small speech segment in executing prosody control. Prosody control is performed by performing a predetermined prosody change process by using small speech segments of the extracted small speech segments other than small speech segments to which inhibition information is added. This makes it possible to prevent a deterioration in synthesized speech due to waveform editing operation.
    Type: Grant
    Filed: March 27, 2001
    Date of Patent: May 30, 2006
    Assignee: Canon Kabushiki Kaisha
    Inventors: Masayuki Yamada, Yasuhiro Komori
  • Patent number: 7035803
    Abstract: A system and method of providing sender customization of multi-media messages through the use of inserted images or video. The images or video may be sender-created or predefined and available to the sender via a web server. The method relates to customizing a multi-media message created by a sender for a recipient, the multi-media message having an animated entity audibly presenting speech converted from text created by the sender. The method comprises receiving at least one image from the sender, associating each at least one image with a tag, presenting the sender with options to insert the tag associated with one of the at least one image into the sender text, and, after the sender inserts the tag into the sender text, delivering the multi-media message with the at least one image presented as background to the animated entity according to the position of the associated tag in the sender text.
    Type: Grant
    Filed: November 2, 2001
    Date of Patent: April 25, 2006
    Assignee: AT&T Corp.
    Inventors: Joern Ostermann, Barbara Buda, Mehmet Reha Civanlar, Eric Cosatto, Hans Peter Graf, Thomas M. Isaacson, Yann Andre LeCun
  • Patent number: 7031919
    Abstract: A speech synthesizing apparatus for synthesizing a speech waveform stores speech data, which is obtained by adding attribute information onto phoneme data, in a database. In accordance with prescribed retrieval conditions, a phoneme retrieval unit retrieves phoneme data from the speech data that has been stored in the database and retains the retrieved results in a retrieved-result storage area. A processing unit for assigning a power penalty and a processing unit for assigning a phoneme-duration penalty assign the penalties, on the basis of power and phoneme duration constituting the attribute information, to a set of phoneme data stored in the retrieved-result storage area. A processing unit for determining typical phoneme data performs sorting on the basis of the assigned penalties and, based upon the stored results, selects phoneme data to be employed in the synthesis of a speech waveform.
    Type: Grant
    Filed: August 30, 1999
    Date of Patent: April 18, 2006
    Assignee: Canon Kabushiki Kaisha
    Inventors: Yasuo Okutani, Masayuki Yamada
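The penalty-based selection above can be sketched as a weighted deviation score over the attribute information. The candidate format, targets, and weights are illustrative assumptions:

```python
def select_phoneme(candidates, target_power, target_duration,
                   w_power=1.0, w_dur=1.0):
    """Pick the candidate phoneme data with the smallest combined penalty.

    Each candidate is (data, power, duration); the power and duration
    penalties are absolute deviations from the targets, combined with
    illustrative weights. Sorting by penalty and taking the best
    mirrors the typical-phoneme determination step.
    """
    def penalty(cand):
        _, power, duration = cand
        return (w_power * abs(power - target_power)
                + w_dur * abs(duration - target_duration))

    return min(candidates, key=penalty)[0]
```

The candidate closest to the target power and duration wins, even if it is not the best on either attribute alone.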
  • Patent number: 7010491
    Abstract: To provide a waveform compression and expansion apparatus that preserves the sound quality of waveform-expressed sounds, such as musical tones, after compression and expansion, a method and system are disclosed in which all of the band-divided waveforms making up the band-divided original waveform are apportioned to at least two kinds of compression and expansion formats, and a plurality of compressed and expanded waveforms are formed by compressing or expanding each by an identical amount only along the temporal axis.
    Type: Grant
    Filed: December 9, 1999
    Date of Patent: March 7, 2006
    Assignee: Roland Corporation
    Inventor: Tadao Kikumoto
  • Patent number: 6999922
    Abstract: The present invention (110) permits a user to speed up and slow down speech without changing the speaker's pitch (102, 110, 112, 128, 402–416). It is a user-adjustable feature that changes the spoken rate to the listener's preferred listening rate or comfort. It can be included on the phone as a customer convenience feature, changing no characteristic of the speaker's voice besides the speaking rate, with soft key button (202) combinations (in interconnect or normal). From the user's perspective, it would seem only that the talker changed his speaking rate, not that the speech was digitally altered in any way. The pitch and general prosody of the speaker are preserved. The time expansion/compression feature complements existing technologies and applications in progress, including messaging services, messaging applications and games, and a real-time feature to slow down the listening rate.
    Type: Grant
    Filed: June 27, 2003
    Date of Patent: February 14, 2006
    Assignee: Motorola, Inc.
    Inventors: Marc Andre Boillot, John Gregory Harris, Thomas Lawrence Reinke
  • Patent number: 6961704
    Abstract: An arrangement is provided for text to speech processing based on linguistic prosodic models. Linguistic prosodic models are established to characterize different linguistic prosodic characteristics. When an input text is received, a target unit sequence is generated with a linguistic target that annotates target units in the target unit sequence with a plurality of linguistic prosodic characteristics so that speech synthesized in accordance with the target unit sequence and the linguistic target has certain desired prosodic properties. A unit sequence is selected in accordance with the target unit sequence and the linguistic target based on joint cost information evaluated using established linguistic prosodic models. The selected unit sequence is used to produce synthesized speech corresponding to the input text.
    Type: Grant
    Filed: January 31, 2003
    Date of Patent: November 1, 2005
    Assignee: Speechworks International, Inc.
    Inventors: Michael S. Phillips, Daniel S. Faulkner, Marek A. Przezdzieci
  • Patent number: 6961697
    Abstract: The invention concerns a method and apparatus for performing packet loss or Frame Erasure Concealment (FEC) for a speech coder that does not have a built-in or standard FEC process. A receiver with a decoder receives encoded frames of compressed speech information transmitted from an encoder. A lost frame detector at the receiver determines if an encoded frame has been lost or corrupted in transmission, or erased. If the encoded frame is not erased, the encoded frame is decoded by a decoder and a temporary memory is updated with the decoder's output. A predetermined delay period is applied and the audio frame is then output. If the lost frame detector determines that the encoded frame is erased, a FEC module applies a frame concealment process to the signal. The FEC processing produces natural sounding synthetic speech for the erased frames.
    Type: Grant
    Filed: April 19, 2000
    Date of Patent: November 1, 2005
    Assignee: AT&T Corp.
    Inventor: David A. Kapilow
  • Patent number: 6950798
    Abstract: A text-to-speech synthesizer employs a database that includes units. For each unit there is a collection of unit selection parameters and a plurality of frames. Each frame has a set of model parameters derived from a base speech frame, and a speech frame synthesized from the frame's model parameters. A text to be synthesized is converted to a sequence of desired unit feature sets, and for each such set the database is perused to retrieve a best-matching unit. An assessment is made whether modifications to the frames are needed, because of discontinuities in the model parameters at unit boundaries, or because of differences between the desired and selected unit features. When modifications are necessary, the model parameters of frames that need to be altered are modified, and new frames are synthesized from the modified model parameters and concatenated to the output. Otherwise, the speech frames previously stored in the database are retrieved and concatenated to the output.
    Type: Grant
    Filed: March 2, 2002
    Date of Patent: September 27, 2005
    Assignee: AT&T Corp.
    Inventors: Mark Charles Beutnagel, David A. Kapilow, Ioannis G. Stylianou, Ann K. Syrdal
  • Patent number: 6910007
    Abstract: Natural-sounding synthesized speech is obtained from pieced elemental speech units that have their super-class identities known (e.g. phoneme type), and their line spectral frequencies (LSF) set in accordance with a correlation between the desired fundamental frequency and the LSF vectors that are known for different classes in the super-class. The correlation between a fundamental frequency in a class and the corresponding LSF is obtained by, for example, analyzing the database of recorded speech of a person and, more particularly, by analyzing frames of the speech signal.
    Type: Grant
    Filed: January 25, 2001
    Date of Patent: June 21, 2005
    Assignee: AT&T Corp
    Inventors: Ioannis G (Yannis) Stylianou, Alexander Kain
  • Patent number: 6879957
    Abstract: A text-to-speech system utilizes a method for producing a speech rendition of text based on dividing some or all words of a sentence into component diphones. A phonetic dictionary is aligned so that each letter within each word has a single corresponding phoneme. The aligned dictionary is analyzed to determine the most common phoneme representation of each letter in the context of a string of letters before and after it. The results for each letter are stored in a phoneme rule matrix. A diphone database is created using a wave editor to cut 2,000 distinct diphones out of specially selected words. A computer algorithm selects a phoneme for each letter. Then, two phonemes are used to create a diphone. Words are then read aloud by concatenating sounds from the diphone database. In one embodiment, diphones are used only when a word is not on a list of pre-recorded words.
    Type: Grant
    Filed: September 1, 2000
    Date of Patent: April 12, 2005
    Inventors: William H. Pechter, Joseph E. Pechter
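The letter-to-phoneme lookup in the abstract above can be sketched as a context-keyed table. The `rule_matrix` structure (keys of letter plus left/right context) is an assumed illustration of the patent's phoneme rule matrix, not its actual layout:

```python
def letter_to_phoneme(word, rule_matrix, context=1):
    """Pick a phoneme for each letter of a word by looking up the letter
    together with its surrounding letters in a context-keyed rule matrix.

    rule_matrix maps (letter, left_context, right_context) -> phoneme;
    '#' pads beyond the word boundaries; unmatched letters fall back to
    the letter itself (an illustrative default)."""
    padded = "#" * context + word + "#" * context
    phonemes = []
    for i, letter in enumerate(word):
        left = padded[i:i + context]
        right = padded[i + context + 1:i + 2 * context + 1]
        phonemes.append(rule_matrix.get((letter, left, right), letter))
    return phonemes

# 'c' before 'a' at the start of a word sounds like 'k' in this toy rule set.
result = letter_to_phoneme("cat", {("c", "#", "a"): "k"})
```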
  • Patent number: 6873955
    Abstract: Partial waveform data representative of a waveform shape variation are extracted from supplied waveform data, and the extracted partial waveform data are stored along with time position information indicative of their respective time positions. In reproduction, the partial waveform data and time position information are read out, then the partial waveform data are arranged on the time axis in accordance with the time position information, and a waveform is produced on the basis of the waveform data arranged on the time axis. In another implementation, sets of sample identification information and time position information are obtained in accordance with a performance tone waveform to be reproduced, and sample data are obtained from a database in accordance with the sample identification information. The thus-obtained sample data are arranged on the time axis in accordance with the time position information, and the desired waveform is produced on the basis of the sample data arranged on the time axis.
    Type: Grant
    Filed: September 22, 2000
    Date of Patent: March 29, 2005
    Assignee: Yamaha Corporation
    Inventors: Hideo Suzuki, Motoichi Tamura, Satoshi Usa
  • Patent number: 6847932
    Abstract: Given phonetic information is divided into speech units of extended CV, each a contiguous sequence of phonemes without clear boundaries that contains one or more vowels. The contour of the vocal tract transmission function for each phoneme of the extended-CV speech unit is obtained from a phoneme directory, which stores a contour of the vocal tract transmission function for each phoneme associated with phonetic information in units of extended CV. Speech waveform data is generated based on the contour of the vocal tract transmission function of each phoneme of the extended-CV speech unit, and the speech waveform data is converted into an analog voice signal.
    Type: Grant
    Filed: September 28, 2000
    Date of Patent: January 25, 2005
    Assignee: Arcadia, Inc.
    Inventors: Kazuyuki Ashimura, Seiichi Tenpaku
  • Patent number: 6832192
    Abstract: A speech synthesizing apparatus acquires a synthesis unit speech segment divided as a speech synthesis unit, and acquires partial speech segments by dividing the synthesis unit speech segment at phoneme boundaries. The power value required for each partial speech segment is estimated on the basis of a target power value in reproduction. An amplitude magnification is acquired from the ratio of the estimated power value to the reference power value for each of the partial speech segments. Synthesized speech is generated by changing the amplitude of each partial speech segment of the synthesis unit speech segment on the basis of the acquired amplitude magnification.
    Type: Grant
    Filed: March 29, 2001
    Date of Patent: December 14, 2004
    Assignee: Canon Kabushiki Kaisha
    Inventor: Masayuki Yamada
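The power-ratio scaling in the abstract above reduces to a simple gain computation. Using mean squared amplitude as the power measure and a square-root gain are illustrative assumptions, not the patented estimator:

```python
def amplitude_magnification(segment, target_power):
    """Scale a partial speech segment so its power reaches a target value.

    Power here is mean squared amplitude; the amplitude magnification is
    the square root of the target-to-reference power ratio."""
    reference_power = sum(s * s for s in segment) / len(segment)
    if reference_power == 0:
        return list(segment)  # silent segment: nothing to scale
    gain = (target_power / reference_power) ** 0.5
    return [s * gain for s in segment]

# A segment with power 1.0 scaled to power 4.0 doubles in amplitude.
scaled = amplitude_magnification([1.0, -1.0, 1.0, -1.0], 4.0)
```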
  • Patent number: 6823309
    Abstract: A speech synthesis system includes a prosodic data modifying rule apparatus that stores in advance a degree of modification of prosodic data, the degree of modification corresponding to an approximate cost and being stored as a modifying rule; a prosodic data retrieving section that retrieves prosodic data stored in correspondence with key data used for retrieval, the prosodic data being retrieved according to a degree of matching between the input data and the key data, the degree of matching being represented by the approximate cost; a modifying section that modifies the retrieved prosodic data based on the degree of matching and the modifying rule stored in the prosodic data modifying rule apparatus; and an output section that outputs synthesized speech based on the input data and the modified prosodic data.
    Type: Grant
    Filed: November 27, 2000
    Date of Patent: November 23, 2004
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Yumiko Kato, Kenji Matsui, Takahiro Kamai, Katsuyoshi Yamagami
  • Patent number: 6813604
    Abstract: A text to speech system modeling durational characteristics of a target speaker is addressed herein. A body of target speaker training text is selected having maximum possible information about speaker specific characteristics. The body of target speaker training text is read by a target speaker to produce a target speaker training corpus. A previously generated source model reflecting characteristics of a source speaker is retrieved, and the target speaker training corpus is processed to produce modification parameters reflecting differences between durational characteristics of the target speaker and those predicted by the source model. The modification parameters are applied to the source model to produce a target model. Text inputs are processed using the target model to produce speech outputs reflecting durational characteristics of the target speaker.
    Type: Grant
    Filed: November 13, 2000
    Date of Patent: November 2, 2004
    Assignee: Lucent Technologies Inc.
    Inventors: Chi-Lin Shih, Jan Pieter Hendrik van Santen
  • Patent number: 6785652
    Abstract: A method and an apparatus for improved duration modeling of phonemes in a speech synthesis system are provided. According to one aspect, text is received into a processor of a speech synthesis system. The received text is processed using a sum-of-products phoneme duration model that is used in either the formant method or the concatenative method of speech generation. The phoneme duration model, which is used along with a phoneme pitch model, is produced by developing a non-exponential functional transformation form for use with a generalized additive model. The non-exponential functional transformation form comprises a root sinusoidal transformation that is controlled in response to a minimum phoneme duration and a maximum phoneme duration. The minimum and maximum phoneme durations are observed in training data. The received text is processed by specifying at least one of a number of contextual factors for the generalized additive model.
    Type: Grant
    Filed: December 19, 2002
    Date of Patent: August 31, 2004
    Assignee: Apple Computer, Inc.
    Inventors: Jerome R. Bellegarda, Kim Silverman
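The abstract does not give the exact form of the root sinusoidal transformation; a sketch of one bounded sinusoidal squashing that respects the minimum and maximum durations observed in training data might look like:

```python
import math

def bounded_sinusoidal(x, d_min, d_max):
    """Squash an unbounded additive-model score x into [d_min, d_max]
    using a sinusoidal transform (illustrative only; the patented
    'root sinusoidal' form is not specified in the abstract)."""
    s = math.sin(math.atan(x))  # atan bounds x, then sin stays in (-1, 1)
    return d_min + (d_max - d_min) * (s + 1.0) / 2.0

# A zero score maps to the midpoint of the allowed duration range.
duration = bounded_sinusoidal(0.0, 30.0, 300.0)
```

The transform is monotonic in the score, so larger additive-model outputs always yield longer durations while never escaping the observed bounds.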
  • Patent number: 6778960
    Abstract: A speech information processing apparatus which sets the duration of a phonological series with accuracy, and sets a natural phoneme duration in accordance with the phonemic/linguistic environment. For this purpose, the duration of a predetermined unit of the phonological series is obtained based on a duration model for the entire segment. Then the duration of each of the phonemes constituting the phonological series is obtained based on the duration model for the entire segment. Finally, the duration of each phoneme is set based on the duration of the phonological series and the duration of each phoneme.
    Type: Grant
    Filed: March 28, 2001
    Date of Patent: August 17, 2004
    Assignee: Canon Kabushiki Kaisha
    Inventor: Toshiaki Fukada
  • Patent number: 6658382
    Abstract: An input signal is time-frequency transformed, then the frequency-domain coefficients are divided into coefficient segments of about 100 Hz width to generate a sequence of coefficient segments, and the sequence of coefficient segments is split into subbands each consisting of plural coefficient segments. A threshold value is determined based on the intensity of each coefficient segment in each subband. The intensity of each coefficient segment is compared with the threshold value, and the coefficient segments are classified into low- and high-intensity groups. The coefficient segments are quantized for each group, or they are flattened respectively and then quantized through recombination.
    Type: Grant
    Filed: March 23, 2000
    Date of Patent: December 2, 2003
    Assignee: Nippon Telegraph and Telephone Corporation
    Inventors: Naoki Iwakami, Takehiro Moriya, Akio Jin, Kazuaki Chikira, Takeshi Mori
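The per-subband thresholding step described above can be sketched as follows; the intensity measure (sum of squared coefficients per segment) and the threshold rule (`alpha` times the maximum intensity) are illustrative assumptions, not the patented rule:

```python
def classify_segments(subband_segments, alpha=0.5):
    """Split the coefficient segments of one subband into low- and
    high-intensity groups by comparing each segment's intensity with a
    threshold derived from the segment intensities in that subband."""
    intensities = [sum(c * c for c in seg) for seg in subband_segments]
    threshold = alpha * max(intensities)
    low, high = [], []
    for seg, intensity in zip(subband_segments, intensities):
        (high if intensity >= threshold else low).append(seg)
    return low, high

# One weak and one strong coefficient segment in a toy subband.
low, high = classify_segments([[0.1, 0.1], [2.0, 2.0]])
```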
  • Patent number: 6647280
    Abstract: A signal processing method, preferably for extracting a fundamental period from a noisy, low-frequency signal, is disclosed. The signal processing method generally comprises calculating a numerical transform for a number of selected periods by multiplying signal data by discrete points of a sine and a cosine wave of varying period and summing the results. The sine and cosine waves are preferably selected to have a period substantially equivalent to the period of interest when performing the transform.
    Type: Grant
    Filed: January 14, 2002
    Date of Patent: November 11, 2003
    Assignee: OB Scientific, Inc.
    Inventors: Dennis E. Bahr, James L. Reuss
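The numerical transform in the abstract is, in effect, a discrete sine/cosine correlation evaluated at candidate periods. A minimal sketch, assuming the period with the strongest combined response is taken as the fundamental:

```python
import math

def period_strength(signal, period):
    """Multiply the signal by discrete points of sine and cosine waves of
    the given period, sum the results, and combine the two magnitudes."""
    s = sum(x * math.sin(2 * math.pi * n / period) for n, x in enumerate(signal))
    c = sum(x * math.cos(2 * math.pi * n / period) for n, x in enumerate(signal))
    return math.hypot(s, c)

def fundamental_period(signal, candidate_periods):
    """Pick the candidate period with the strongest sine/cosine response."""
    return max(candidate_periods, key=lambda p: period_strength(signal, p))

# A clean sinusoid of period 25 samples is matched against three candidates.
sig = [math.sin(2 * math.pi * n / 25) for n in range(200)]
best = fundamental_period(sig, [10, 25, 40])
```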
  • Patent number: 6629067
    Abstract: A range control system includes an input section for inputting a singing voice, a fundamental frequency extracting section for extracting a fundamental frequency of the inputted voice, and a pitch control section for performing a pitch control of the inputted voice so as to match the extracted fundamental frequency with a given frequency. The system further includes a formant extracting section for extracting a formant of the inputted voice, and a formant filter section for performing a filter operation relative to the pitch-controlled voice so that the pitch-controlled voice has a characteristic of the extracted formant. The system further includes an input loudness detecting section for detecting a first loudness of the inputted voice, and a loudness control section for controlling a second loudness of the voice subjected to the filter operation to match with the first loudness.
    Type: Grant
    Filed: May 14, 1998
    Date of Patent: September 30, 2003
    Assignee: Kabushiki Kaisha Kawai Gakki Seisakusho
    Inventors: Tsutomu Saito, Hiroshi Kato, Youichi Kondo
  • Patent number: 6553344
    Abstract: A method and an apparatus for improved duration modeling of phonemes in a speech synthesis system are provided. According to one aspect, text is received into a processor of a speech synthesis system. The received text is processed using a sum-of-products phoneme duration model that is used in either the formant method or the concatenative method of speech generation. The phoneme duration model, which is used along with a phoneme pitch model, is produced by developing a non-exponential functional transformation form for use with a generalized additive model. The non-exponential functional transformation form comprises a root sinusoidal transformation that is controlled in response to a minimum phoneme duration and a maximum phoneme duration. The minimum and maximum phoneme durations are observed in training data. The received text is processed by specifying at least one of a number of contextual factors for the generalized additive model.
    Type: Grant
    Filed: February 22, 2002
    Date of Patent: April 22, 2003
    Assignee: Apple Computer, Inc.
    Inventors: Jerome R. Bellegarda, Kim Silverman
  • Patent number: 6546367
    Abstract: Statistical data including an average value, a standard deviation, and a minimum value of the phoneme duration of each phoneme is stored in a memory. When speech production time is determined for a phoneme string in a predetermined expiratory paragraph, the total phoneme duration of the phoneme string is set so as to become equal to the speech production time. Based on the set phoneme duration, phonemes are connected and a speech waveform is generated. To set a phoneme duration for each phoneme, a phoneme duration initial value is first set based on an average value, obtained by equally dividing the speech production time by the phonemes of the phoneme string, and a phoneme duration range set based on the statistical data of each phoneme. Then, the phoneme duration initial value is adjusted based on the statistical data and the speech production time.
    Type: Grant
    Filed: March 9, 1999
    Date of Patent: April 8, 2003
    Assignee: Canon Kabushiki Kaisha
    Inventor: Mitsuru Otsuka
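A minimal sketch of allocating a fixed speech production time across a phoneme string, assuming per-phoneme mean and minimum durations are available; the rescale-then-clip rule is an illustrative reading of the abstract, not the patented adjustment:

```python
def set_phoneme_durations(stats, total_time):
    """Allocate a total speech production time across a phoneme string.

    stats: list of (mean_duration, minimum_duration) per phoneme.
    Each phoneme starts from its statistical mean, the means are rescaled
    so their sum matches total_time, and the result is clipped at each
    phoneme's stored minimum duration."""
    total_mean = sum(mean for mean, _ in stats)
    scale = total_time / total_mean
    return [max(minimum, mean * scale) for mean, minimum in stats]

# Three phonemes share 400 ms; the means sum to 200 ms, so each doubles.
durations = set_phoneme_durations([(100, 40), (50, 40), (50, 40)], 400)
```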
  • Patent number: 6542867
    Abstract: The duration of speech varies according to the characteristics of the pronounced speech and the pronouncing habits of the speaker. In the speech duration processing method and apparatus of this invention, a large amount of natural speech was analyzed, and the following was found: the speech duration of a monosyllable varies according to factors such as the phonemes, tones, phrase construction, location in the phrase, location in the sentence, and front and rear connected phonemes of the syllable. Using these varying factors, a “speech duration parameter storage portion” for speech duration parameters is constructed. By retrieving the speech duration parameters and combining them with the basic speech duration of a syllable during syllable speech duration calculation, the speech duration of each monosyllable in any sentence can be accurately decided.
    Type: Grant
    Filed: March 28, 2000
    Date of Patent: April 1, 2003
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Shih Chang Sun, Chin Yun Hsieh
  • Publication number: 20030050781
    Abstract: A plurality of blocks of waveform data are stored in a memory, which also stores, for each of the blocks, synchronizing information representative of a plurality of cycle synchronizing points that indicate periodic specific phase positions where the block of waveform data should be synchronized in phase with another block of waveform data. Two blocks of waveform data (e.g., harmonic and nonharmonic components) are read out from the memory along with the synchronizing information, and the readout of the two blocks is controlled on the basis of that information. There is also stored, for each of the blocks, at least one piece of synchronizing position information indicative of a specific position where the block should be synchronized with another block, and the readout of the individual blocks of waveform data is controlled so that the blocks are synchronized with each other using the synchronizing position information.
    Type: Application
    Filed: September 11, 2002
    Publication date: March 13, 2003
    Applicant: Yamaha Corporation
    Inventors: Motoichi Tamura, Yasuyuki Umeyama
  • Patent number: 6499014
    Abstract: The speech synthesis apparatus of the present invention includes: a text analyzer operable to generate a phonetic and prosodic symbol string from character information of an input text; a word dictionary storing a reading and an accent of a word; a voice segment dictionary storing a phoneme that is a basic unit of speech; a parameter generator operable to generate synthesizing parameters including at least a phoneme, a duration of the phoneme and a fundamental frequency for the phonetic and prosodic symbol string, the parameter generator including a calculating means operable to obtain a sum of phrase components and a sum of accent components and to calculate an average pitch from the sum of the phrase components and the sum of the accent components, and a determining means operable to determine a base pitch from the average pitch; and a waveform generator operable to generate a synthesized waveform by waveform-overlapping with reference to the synthesizing parameters generated by the parameter generator.
    Type: Grant
    Filed: March 7, 2000
    Date of Patent: December 24, 2002
    Assignee: Oki Electric Industry Co., Ltd.
    Inventor: Keiichi Chihara
  • Patent number: RE39336
    Abstract: The concatenative speech synthesizer employs demi-syllable subword units to generate speech. The synthesizer is based on a source-filter model that uses source signals that correspond closely to the human glottal source and filter parameters that correspond closely to the human vocal tract. Concatenation of the demi-syllable units is facilitated by two separate cross fade techniques: one applied in the time domain to the demi-syllable source signal waveforms, and one applied in the frequency domain by interpolating the corresponding filter parameters of the concatenated demi-syllables. The dual cross fade technique results in natural sounding synthesis that avoids time-domain glitches without degrading or smearing characteristic resonances in the filter domain.
    Type: Grant
    Filed: November 5, 2002
    Date of Patent: October 10, 2006
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Steve Pearson, Nicholas Kibre, Nancy Niedzielski
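The dual cross fade can be sketched over a single overlap region. Linear fade weights and one filter parameter per sample are assumptions for illustration; the abstract does not specify the actual fade shapes or parameter layout:

```python
def cross_fade(tail, head):
    """Dual cross fade between two concatenated demi-syllable units.

    tail/head are (waveform, filter_params) pairs covering an overlap
    region of equal length (at least two samples).  The source waveforms
    are cross faded in the time domain, and the filter parameters are
    linearly interpolated in the filter domain."""
    n = len(tail[0])
    wave, params = [], []
    for i in range(n):
        w = i / (n - 1)  # fade weight ramps from 0 to 1 across the overlap
        wave.append((1 - w) * tail[0][i] + w * head[0][i])
        params.append((1 - w) * tail[1][i] + w * head[1][i])
    return wave, params

# Fade from a constant-amplitude tail into silence while the filter
# parameter glides from 10.0 to 20.0.
wave, params = cross_fade(([1.0, 1.0, 1.0], [10.0, 10.0, 10.0]),
                          ([0.0, 0.0, 0.0], [20.0, 20.0, 20.0]))
```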