Time Element Patents (Class 704/267)
  • Patent number: 7472066
    Abstract: An automatic speech segmentation and verification system and method is disclosed, which has a known text script and a recorded speech corpus corresponding to the known text script. A speech unit segmentor segments the recorded speech corpus into N test speech unit segments by referring to the phonetic information of the known text script. Then, a segmental verifier is applied to obtain a confidence measure of syllable segmentation for verifying the correctness of the cutting points of the test speech unit segments. A phonetic verifier obtains a confidence measure of syllable verification by using verification models to verify whether the recorded speech corpus was correctly recorded. Finally, a speech unit inspector integrates the confidence measure of syllable segmentation and the confidence measure of syllable verification to determine whether each test speech unit segment is accepted.
    Type: Grant
    Filed: February 23, 2004
    Date of Patent: December 30, 2008
    Assignee: Industrial Technology Research Institute
    Inventors: Chih-Chung Kuo, Chi-Shiang Kuo, Jau-Hung Chen
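The final integration step described above can be sketched as a weighted combination of the two confidence measures. The weights and threshold below are illustrative assumptions, not values from the patent:

```python
def accept_segment(seg_conf: float, ver_conf: float,
                   w_seg: float = 0.5, threshold: float = 0.6) -> bool:
    """Integrate segmentation and verification confidences.

    seg_conf: confidence that the cutting points of the segment are correct.
    ver_conf: confidence that the segment was recorded as scripted.
    The linear combination and threshold are illustrative choices only.
    """
    combined = w_seg * seg_conf + (1.0 - w_seg) * ver_conf
    return combined >= threshold
```

A segment with both confidences high is accepted; one failing either check tends to fall below the threshold.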
  • Publication number: 20080319754
    Abstract: According to an aspect of an embodiment, an apparatus for converting text data into a sound signal comprises: a phoneme determiner for determining phoneme data corresponding to a plurality of phonemes and pause data corresponding to a plurality of pauses to be inserted among a series of phonemes in the text data to be converted into a sound signal; a phoneme length adjuster for modifying the phoneme data and the pause data by determining lengths of the phonemes, respectively, in accordance with a speed of the sound signal, and selectively adjusting the length of at least one of the phonemes which is a fricative in the text data so that the at least one fricative phoneme is relatively extended timewise as compared to other phonemes; and an output unit for outputting a sound signal on the basis of the phoneme data and pause data adjusted by the phoneme length adjuster.
    Type: Application
    Filed: June 13, 2008
    Publication date: December 25, 2008
    Applicant: FUJITSU LIMITED
    Inventors: Rika Nishiike, Hitoshi Sasaki
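The fricative-lengthening idea above can be sketched as a simple duration scaler. The phoneme set, base duration, and boost factor are all assumptions for illustration:

```python
FRICATIVES = {"s", "z", "f", "v", "sh", "th"}  # illustrative set

def adjust_durations(phonemes, base_ms=80, speed=1.0, fricative_boost=1.3):
    """Scale phoneme durations by speaking speed, extending fricatives.

    A higher speed shortens every phoneme, but fricatives are stretched
    relative to the others so they remain intelligible at fast rates.
    All constants here are made-up illustrative values.
    """
    out = []
    for p in phonemes:
        d = base_ms / speed  # faster speech -> shorter phonemes
        if p in FRICATIVES:
            d *= fricative_boost  # extend fricatives relative to the rest
        out.append((p, round(d)))
    return out
```

At double speed, an "s" keeps proportionally more of its duration than a vowel.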
  • Publication number: 20080319755
    Abstract: According to an aspect of an embodiment, an apparatus for converting text data into a sound signal comprises: a phoneme determiner for determining phoneme data corresponding to a plurality of phonemes and pause data corresponding to a plurality of pauses to be inserted among a series of phonemes in the text data to be converted into a sound signal; a phoneme length adjuster for modifying the phoneme data and the pause data by determining lengths of the phonemes, respectively, in accordance with a speed of the sound signal, and selectively adjusting the length of at least one of the phonemes which is placed immediately after one of the pauses so that the at least one phoneme is relatively extended timewise as compared to other phonemes; and an output unit for outputting a sound signal on the basis of the phoneme data and pause data adjusted by the phoneme length adjuster.
    Type: Application
    Filed: June 24, 2008
    Publication date: December 25, 2008
    Applicant: FUJITSU LIMITED
    Inventors: Rika Nishiike, Hitoshi Sasaki, Nobuyuki Katae, Kentaro Murase, Takuya Noda
  • Patent number: 7451087
    Abstract: A method for converting text to concatenated voice by utilizing a digital voice library and a set of playback rules is provided. The method includes receiving and expanding text data to form a sequence of text and pseudo words. The sequence of text and pseudo words is converted into a sequence of speech items, and the sequence of speech items is converted into a sequence of voice recordings. The method includes generating voice data based on the sequence of voice recordings by concatenating adjacent recordings in the sequence.
    Type: Grant
    Filed: March 27, 2001
    Date of Patent: November 11, 2008
    Assignee: Qwest Communications International Inc.
    Inventors: Eliot M. Case, Richard P. Phillips
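The expand-then-concatenate pipeline above can be sketched in a few lines. The miniature voice library, the digit substitution, and the `tts_` fallback token are hypothetical stand-ins:

```python
# Hypothetical miniature voice library: word -> recording (here, token lists).
VOICE_LIBRARY = {"you": ["rec_you"], "have": ["rec_have"],
                 "two": ["rec_two"], "messages": ["rec_messages"]}

def expand(text: str):
    """Expand raw text into a sequence of pronounceable pseudo words.

    Real systems expand digits, abbreviations, currency, etc.;
    this sketch only maps "2" -> "two".
    """
    subs = {"2": "two"}
    return [subs.get(w, w) for w in text.lower().split()]

def synthesize(text: str):
    """Map pseudo words to recordings and concatenate adjacent ones."""
    voice = []
    for word in expand(text):
        # Fall back to a placeholder for words missing from the library.
        voice.extend(VOICE_LIBRARY.get(word, [f"tts_{word}"]))
    return voice
```

Running `synthesize("You have 2 messages")` yields the concatenated recording sequence for the expanded text.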
  • Publication number: 20080270140
    Abstract: A speech synthesis system receives symbolic input describing an utterance to be synthesized. In one embodiment, different portions of the utterance are constructed from different sources, one of which is a speech corpus recorded from a human speaker whose voice is to be modeled. The other sources may include other human speech corpora or speech produced using Rule-Based Speech Synthesis (RBSS). At least some portions of the utterance may be constructed by modifying prototype speech units to produce adapted speech units that are contextually appropriate for the utterance. The system concatenates the adapted speech units with the other speech units to produce a speech waveform. In another embodiment, a speech unit of a speech corpus recorded from a human speaker lacks transitions at one or both of its edges. A transition is synthesized using RBSS and concatenated with the speech unit in producing a speech waveform for the utterance.
    Type: Application
    Filed: April 24, 2007
    Publication date: October 30, 2008
    Inventors: Susan R. Hertz, Harold G. Mills
  • Publication number: 20080243511
    Abstract: The present invention is a speech synthesizer that generates speech data for text including a fixed part and a variable part, combining recorded speech and rule-based synthetic speech. The synthesizer achieves high quality by concatenating recorded speech and synthetic speech so that discontinuities in timbre and prosody are not perceived.
    Type: Application
    Filed: October 22, 2007
    Publication date: October 2, 2008
    Inventors: Yusuke Fujita, Ryota Kamoshida, Kenji Nagamatsu
  • Patent number: 7418389
    Abstract: A method for identifying common multiphone units to add to a unit inventory for a text-to-speech generator is disclosed. The common multiphone units are units that are larger than a phone, but smaller than a syllable. The method slices each syllable into a plurality of slices. These slices are then sorted and the frequency of each slice is determined. Those slices whose frequencies exceed a threshold are added to the unit inventory. The remaining slices are decomposed according to a predetermined set of rules to determine if they contain slices that should be added to the unit inventory.
    Type: Grant
    Filed: January 11, 2005
    Date of Patent: August 26, 2008
    Assignee: Microsoft Corporation
    Inventors: Min Chu, Yong Zhao
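The slicing-and-counting procedure above can be sketched directly: enumerate every contiguous slice of each syllable that is longer than one phone but shorter than the whole syllable, and keep the frequent ones. The frequency threshold is an illustrative stand-in for the patent's:

```python
from collections import Counter

def select_multiphone_units(syllables, threshold=2):
    """Slice syllables into multiphone candidates and keep frequent ones.

    Each syllable is a tuple of phones; every contiguous slice of at
    least two phones that is smaller than the whole syllable is a
    candidate unit. Slices whose frequency meets the threshold are
    added to the unit inventory.
    """
    counts = Counter()
    for syl in syllables:
        for i in range(len(syl)):
            for j in range(i + 2, len(syl) + 1):  # slices of >= 2 phones
                if j - i < len(syl):              # smaller than the syllable
                    counts[syl[i:j]] += 1
    return {unit for unit, c in counts.items() if c >= threshold}
```

A cluster like ("s", "t") that recurs across many syllables passes the threshold; a one-off slice does not.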
  • Patent number: 7412390
    Abstract: Emotion is added to synthesized speech while the prosodic features of the language are maintained. In a speech synthesis device 200, a language processor 201 generates a string of pronunciation marks from the text, and a prosodic data generating unit 202 creates prosodic data expressing parameters such as the time duration, pitch, and sound volume of phonemes, based on the string of pronunciation marks. A constraint information generating unit 203 is fed with the prosodic data and the string of pronunciation marks to generate constraint information that limits changes in the parameters, and adds the constraint information to the prosodic data. An emotion filter 204, fed with the prosodic data to which the constraint information has been added, changes the parameters of the prosodic data, within the constraint, in response to the emotional state information imparted to it.
    Type: Grant
    Filed: March 13, 2003
    Date of Patent: August 12, 2008
    Assignees: Sony France S.A., Sony Corporation
    Inventors: Erika Kobayashi, Toshiyuki Kumakura, Makoto Akabane, Kenichiro Kobayashi, Nobuhide Yamazaki, Tomoaki Nitta, Pierre Yves Oudeyer
  • Patent number: 7409347
    Abstract: Portions from segment boundary regions of a plurality of speech segments are extracted. Each segment boundary region is based on a corresponding initial unit boundary. Feature vectors that represent the portions in a vector space are created. For each of a plurality of potential unit boundaries within each segment boundary region, an average discontinuity based on distances between the feature vectors is determined. For each segment, the potential unit boundary associated with a minimum average discontinuity is selected as a new unit boundary.
    Type: Grant
    Filed: October 23, 2003
    Date of Patent: August 5, 2008
    Assignee: Apple Inc.
    Inventor: Jerome R. Bellegarda
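The boundary-refinement idea above can be sketched in simplified form: score each candidate boundary by the discontinuity between the feature vectors on either side of the cut, and pick the minimum. This is a simplified reading of the method, with hypothetical data shapes:

```python
import math

def euclid(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pick_boundary(region):
    """Choose the candidate boundary with minimum discontinuity.

    `region` maps each potential boundary position to the pair of
    feature vectors adjacent to that cut; the candidate whose pair is
    closest in the vector space becomes the new unit boundary.
    """
    return min(region, key=lambda pos: euclid(*region[pos]))
```

A cut placed where the signal's features barely change produces a small distance and wins.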
  • Publication number: 20080120106
    Abstract: A semiconductor integrated circuit device including: a storage section which temporarily stores a command and text data input from the outside; a speech synthesis section which synthesizes a speech signal corresponding to the text data based on the command and the text data stored in the storage section, and outputs the synthesized speech signal to the outside; and a control section which controls a timing at which the command and the text data stored in the storage section are transferred to the speech synthesis section based on a speech synthesis start control signal. The control section controls an output of a speech output start notification signal which notifies in advance a start of outputting the synthesized speech signal to the outside based on occurrence of a speech synthesis start event, and then controls a start of outputting the synthesized speech signal to the outside at a given timing.
    Type: Application
    Filed: November 7, 2007
    Publication date: May 22, 2008
    Applicant: SEIKO EPSON CORPORATION
    Inventors: Masamichi Izumida, Masayuki Murakami
  • Patent number: 7373299
    Abstract: A variable voice rate apparatus to control a reproduction rate of voice, includes a voice data generation unit configured to generate voice data from the voice, a text data generation unit configured to generate text data indicating a content of the voice data, a division information generation unit configured to generate division information used for dividing the text data into a plurality of linguistic units each of which is characterized by a linguistic form, a reproduction information generation unit configured to generate reproduction information set for each of the linguistic units, and a voice reproduction controller which controls reproduction of each of the linguistic units, based on the reproduction information and the division information.
    Type: Grant
    Filed: December 23, 2003
    Date of Patent: May 13, 2008
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Hisayoshi Nagae, Kohei Momosaki, Masahide Ariu
  • Publication number: 20080027727
    Abstract: A speech unit corpus stores a group of speech units. A selection unit divides a phoneme sequence of target speech into a plurality of segments, and selects a combination of speech units for each segment from the speech unit corpus. An estimation unit estimates a distortion between the target speech and synthesized speech generated by fusing each speech unit of the combination for each segment. The selection unit recursively selects the combination of speech units for each segment based on the distortion. A fusion unit generates a new speech unit for each segment by fusing each speech unit of the combination selected for each segment. A concatenation unit generates synthesized speech by concatenating the new speech unit for each segment.
    Type: Application
    Filed: July 23, 2007
    Publication date: January 31, 2008
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Masahiro MORITA, Takehiko Kagoshima
  • Patent number: 7299182
    Abstract: There is provided an Ebook. The Ebook includes a memory device, a text-to-speech (TTS) module, and at least one speaker. The memory device stores files. The files include text. The TTS module synthesizes speech corresponding to the text. The at least one speaker outputs the speech.
    Type: Grant
    Filed: May 9, 2002
    Date of Patent: November 20, 2007
    Assignee: Thomson Licensing
    Inventor: Jianlei Xie
  • Patent number: 7275035
    Abstract: In a method of assisting a subject to generate speech, at least one first neural impulse is sensed from a first preselected location in the subject's brain. A first preselected sound is associated with the first neural impulse. The first preselected sound is generated in an audible format. In an apparatus for assisting the subject to generate speech, at least one sensor senses a neural impulse in the subject's brain and generates a signal representative thereof. An electronic speech generator generates a phoneme in response to the generation of the signal. An audio system generates audible sounds corresponding to the phoneme based upon the signal received from the speech generator.
    Type: Grant
    Filed: December 8, 2004
    Date of Patent: September 25, 2007
    Assignee: Neural Signals, Inc.
    Inventor: Philip R. Kennedy
  • Patent number: 7249022
    Abstract: There are provided a singing voice-synthesizing method and apparatus capable of synthesizing natural singing voices close to human singing voices based on performance data input in real time. Performance data is input for each phonetic unit constituting a lyric, supplying phonetic unit information, singing-starting time point information, singing length information, etc. Each item of performance data is input at a timing earlier than the actual singing-starting time point, and a phonetic unit transition time length is generated. Using the phonetic unit transition time, the singing-starting time point information, and the singing length information, the singing-starting time points and singing duration times of the first and second phonemes are determined. In the singing voice synthesis, for each phoneme, a singing voice is generated at the determined singing-starting time point and continues to be generated for the determined singing duration time.
    Type: Grant
    Filed: December 1, 2005
    Date of Patent: July 24, 2007
    Assignee: Yamaha Corporation
    Inventors: Hiraku Kayama, Oscar Celma, Jaume Ortola
  • Patent number: 7240005
    Abstract: A method of high-speed reading in a text-to-speech conversion system including a text analysis module (101) for generating a phoneme and prosody character string from an input text; a prosody generation module (102) for generating a synthesis parameter of at least a voice segment, a phoneme duration, and a fundamental frequency for the phoneme and prosody character string; and a speech generation module (103) for generating a synthetic waveform by waveform superimposition by referring to a voice segment dictionary (105). The prosody generation module is provided with both a duration rule table containing empirically found phoneme durations and a duration prediction table containing phoneme durations predicted by statistical analysis; when the user-designated utterance speed exceeds a threshold, it uses the duration rule table, and when the threshold is not exceeded, it uses the duration prediction table to determine the phoneme duration.
    Type: Grant
    Filed: January 29, 2002
    Date of Patent: July 3, 2007
    Assignee: Oki Electric Industry Co., Ltd.
    Inventor: Keiichi Chihara
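The table-switching logic above is simple to sketch. The table contents and speed threshold below are made-up illustrative values, not the patent's:

```python
DURATION_RULE_TABLE = {"a": 90, "k": 60}         # empirically set (illustrative)
DURATION_PREDICTION_TABLE = {"a": 104, "k": 57}  # statistically predicted

def phoneme_duration(phoneme, utterance_speed, speed_threshold=1.5):
    """Pick the duration source based on the user-designated speed.

    Above the threshold (high-speed reading) the cheap rule table is
    used; at or below it, the statistically predicted durations are.
    """
    table = (DURATION_RULE_TABLE if utterance_speed > speed_threshold
             else DURATION_PREDICTION_TABLE)
    return table[phoneme]
```

The design choice is a speed/quality trade-off: rule lookups are cheap enough for fast reading, while statistical predictions sound more natural at normal rates.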
  • Patent number: 7219061
    Abstract: Predetermined macrosegments of the fundamental frequency are determined by a neural network, and these predefined macrosegments are reproduced by fundamental-frequency sequences stored in a database. The fundamental frequency is generated on the basis of a relatively large text section which is analyzed by the neural network. Microstructures from the database are received in the fundamental frequency. The fundamental frequency thus formed is thus optimized both with regard to its macrostructure and to its microstructure. As a result, an extremely natural sound is achieved.
    Type: Grant
    Filed: October 24, 2000
    Date of Patent: May 15, 2007
    Assignee: Siemens Aktiengesellschaft
    Inventors: Caglayan Erdem, Martin Holzapfel
  • Patent number: 7219065
    Abstract: A sound processor including a microphone (1), a pre-amplifier (2), a bank of N parallel filters (3), means for detecting short-duration transitions in the envelope signal of each filter channel, and means for applying gain to the outputs of these filter channels in which the gain is related to a function of the second-order derivative of the slow-varying envelope signal in each filter channel, to assist in perception of low-intensity short-duration speech features in said signal.
    Type: Grant
    Filed: October 25, 2000
    Date of Patent: May 15, 2007
    Inventors: Andrew E. Vandali, Graeme M. Clark
  • Patent number: 7177811
    Abstract: A method is provided for customizing a multi-media message created by a sender for a recipient, in which the multi-media message includes an animated entity audibly presenting speech converted from text by the sender. At least one image is received from the sender. Each of the at least one image is associated with a tag. The sender is presented with options to insert the tag associated with one of the at least one image into the sender text.
    Type: Grant
    Filed: March 6, 2006
    Date of Patent: February 13, 2007
    Assignee: AT&T Corp.
    Inventors: Joern Ostermann, Barbara Buda, Mehmet Reha Civanlar, Eric Cosatto, Hans Peter Graf, Thomas M. Isaacson, Yann Andre LeCun
  • Patent number: 7171362
    Abstract: The assignment of phonemes to graphemes producing them in a lexicon having words (grapheme sequences) and their associated phonetic transcription (phoneme sequences) for the preparation of patterns for training neural networks for the purpose of grapheme-phoneme conversion is carried out with the aid of a variant of dynamic programming which is known as dynamic time warping (DTW).
    Type: Grant
    Filed: August 31, 2001
    Date of Patent: January 30, 2007
    Assignee: Siemens Aktiengesellschaft
    Inventor: Horst-Udo Hain
  • Patent number: 7124084
    Abstract: There are provided a singing voice-synthesizing method and apparatus capable of performing synthesis of natural singing voices close to human singing voices based on performance data being input in real time. Performance data is inputted for each phonetic unit constituting a lyric, to supply phonetic unit information, singing-starting time point information, singing length information, etc. Each performance data is inputted in timing earlier than the actual singing-starting time point, and a phonetic unit transition time length is generated. By using the phonetic unit transition time, the singing-starting time point information, and the singing length information, the singing-starting time points and singing duration times of the first and second phonemes are determined. In the singing voice synthesis, for each phoneme, a singing voice is generated at the determined singing-starting time point and continues to be generated for the determined singing duration time.
    Type: Grant
    Filed: December 27, 2001
    Date of Patent: October 17, 2006
    Assignee: Yamaha Corporation
    Inventors: Hiraku Kayama, Oscar Celma, Jaume Ortola
  • Patent number: 7117156
    Abstract: The invention concerns a method and apparatus for performing packet loss or Frame Erasure Concealment (FEC) for a speech coder that does not have a built-in or standard FEC process. A receiver with a decoder receives encoded frames of compressed speech information transmitted from an encoder. A lost frame detector at the receiver determines if an encoded frame has been lost or corrupted in transmission, or erased. If the encoded frame is not erased, the encoded frame is decoded by a decoder and a temporary memory is updated with the decoder's output. A predetermined delay period is applied and the audio frame is then output. If the lost frame detector determines that the encoded frame is erased, a FEC module applies a frame concealment process to the signal. The FEC processing produces natural sounding synthetic speech for the erased frames.
    Type: Grant
    Filed: April 19, 2000
    Date of Patent: October 3, 2006
    Assignee: AT&T Corp.
    Inventor: David A. Kapilow
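The receiver-side flow above can be sketched schematically: decode good frames and remember the output, conceal erased frames from that memory. This is a schematic of the control flow only, not AT&T's actual concealment algorithm, and the `decode`/`conceal` callables are hypothetical:

```python
def receive_frames(frames, decode, conceal):
    """Decode a stream, concealing frames flagged as lost or erased.

    `frames` yields (erased, payload) pairs from the lost frame
    detector; `decode` turns a payload into audio; `conceal`
    synthesizes a replacement frame from the last good output held in
    temporary memory. (The patent's fixed output delay is omitted.)
    """
    memory = None  # temporary memory holding the last decoder output
    output = []
    for erased, payload in frames:
        if not erased:
            audio = decode(payload)
            memory = audio  # update memory with the decoder's output
        else:
            audio = conceal(memory)  # synthesize from prior good audio
        output.append(audio)
    return output
```

With toy callables, an erased middle frame is filled in from its predecessor rather than dropped.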
  • Patent number: 7113909
    Abstract: A stereotypical sentence is synthesized into a voice of an arbitrary speech style. A third party is able to prepare prosody data and a user of a terminal device having a voice synthesizing part can acquire the prosody data. The voice synthesizing method determines a voice-contents identifier to point to a type of voice contents of a stereotypical sentence, prepares a speech style dictionary including speech style and prosody data which correspond to the voice-contents identifier, selects prosody data of the synthesized voice to be generated from the speech style dictionary, and adds the selected prosody data to a voice synthesizer 13 as voice-synthesizer driving data to thereby perform voice synthesis with a specific speech style. Thus, a voice of a stereotypical sentence can be synthesized with an arbitrary speech style.
    Type: Grant
    Filed: July 31, 2001
    Date of Patent: September 26, 2006
    Assignee: Hitachi, Ltd.
    Inventors: Nobuo Nukaga, Kenji Nagamatsu, Yoshinori Kitahara
  • Patent number: 7089187
    Abstract: A voice synthesizing system that keeps both the required amount of computation and the required file size small. The system includes: a compressed pitch segment database storing compressed voice waveform segments; a pitch developing portion which, when a voice waveform segment needed for voice waveform synthesis is demanded, reads the segment out of the database and decompresses the compressed data to reproduce the original voice waveform segment; and a cache processing portion which temporarily stores voice waveform segments already used in voice waveform synthesis. When a demanded segment is already stored, the cache processing portion returns it directly to the demander; otherwise, it obtains the segment from the database via the pitch developing portion, holds the obtained segment, and returns it to the demander.
    Type: Grant
    Filed: September 26, 2002
    Date of Patent: August 8, 2006
    Assignee: NEC Corporation
    Inventors: Reishi Kondo, Hiroaki Hattori
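The cache-in-front-of-decompressor arrangement above is a classic memoization pattern. The class and names below are illustrative, not from the patent:

```python
class SegmentCache:
    """Serve decompressed waveform segments, decompressing only on a miss.

    `database` maps segment ids to compressed data; `decompress`
    restores the original waveform. Segments already used in synthesis
    are held so repeated demands skip the costly decompression.
    """
    def __init__(self, database, decompress):
        self.database = database
        self.decompress = decompress
        self.cache = {}
        self.misses = 0  # counts actual decompressions, for inspection

    def get(self, seg_id):
        if seg_id not in self.cache:  # not yet used in synthesis
            self.misses += 1
            self.cache[seg_id] = self.decompress(self.database[seg_id])
        return self.cache[seg_id]
```

Since speech reuses the same pitch segments heavily, most demands hit the cache and the decompression cost is paid once per segment.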
  • Patent number: 7054815
    Abstract: A speech synthesizing apparatus extracts small speech segments from a speech waveform as a prosody control target and adds inhibition information for inhibiting a predetermined prosody change process to a selected small speech segment in executing prosody control. Prosody control is performed by performing a predetermined prosody change process by using small speech segments of the extracted small speech segments other than small speech segments to which inhibition information is added. This makes it possible to prevent a deterioration in synthesized speech due to waveform editing operation.
    Type: Grant
    Filed: March 27, 2001
    Date of Patent: May 30, 2006
    Assignee: Canon Kabushiki Kaisha
    Inventors: Masayuki Yamada, Yasuhiro Komori
  • Patent number: 7035803
    Abstract: A system and method of providing sender customization of multi-media messages through the use of inserted images or video. The images or video may be sender-created or predefined and available to the sender via a web server. The method relates to customizing a multi-media message created by a sender for a recipient, the multi-media message having an animated entity audibly presenting speech converted from text created by the sender. The method comprises receiving at least one image from the sender, associating each at least one image with a tag, presenting the sender with options to insert the tag associated with one of the at least one image into the sender text, and, after the sender inserts the tag into the sender text, delivering the multi-media message with the at least one image presented as background to the animated entity according to the position of the associated tag in the sender text.
    Type: Grant
    Filed: November 2, 2001
    Date of Patent: April 25, 2006
    Assignee: AT&T Corp.
    Inventors: Joern Ostermann, Barbara Buda, Mehmet Reha Civanlar, Eric Cosatto, Hans Peter Graf, Thomas M. Isaacson, Yann Andre LeCun
  • Patent number: 7031919
    Abstract: A speech synthesizing apparatus for synthesizing a speech waveform stores speech data, which is obtained by adding attribute information onto phoneme data, in a database. In accordance with prescribed retrieval conditions, a phoneme retrieval unit retrieves phoneme data from the speech data that has been stored in the database and retains the retrieved results in a retrieved-result storage area. A processing unit for assigning a power penalty and a processing unit for assigning a phoneme-duration penalty assign the penalties, on the basis of power and phoneme duration constituting the attribute information, to a set of phoneme data stored in the retrieved-result storage area. A processing unit for determining typical phoneme data performs sorting on the basis of the assigned penalties and, based upon the stored results, selects phoneme data to be employed in the synthesis of a speech waveform.
    Type: Grant
    Filed: August 30, 1999
    Date of Patent: April 18, 2006
    Assignee: Canon Kabushiki Kaisha
    Inventors: Yasuo Okutani, Masayuki Yamada
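The penalty-based selection above can be sketched as a weighted deviation score over the attribute information. The candidate format, targets, and weights are illustrative assumptions:

```python
def select_phoneme(candidates, target_power, target_duration,
                   w_power=1.0, w_dur=1.0):
    """Pick the candidate phoneme data with the smallest combined penalty.

    Each candidate is (data, power, duration); the power and duration
    penalties are absolute deviations from the targets, combined with
    illustrative weights. Sorting by penalty and taking the best
    mirrors the typical-phoneme determination step.
    """
    def penalty(cand):
        _, power, duration = cand
        return (w_power * abs(power - target_power)
                + w_dur * abs(duration - target_duration))

    return min(candidates, key=penalty)[0]
```

The candidate closest to the target power and duration wins, even if it is not the best on either attribute alone.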
  • Patent number: 7010491
    Abstract: To provide a waveform compression and expansion apparatus that preserves the sound quality of waveform-expressed sounds, such as musical tones, after compression and expansion, a method and system are disclosed in which all of the band-divided waveforms making up the band-divided original waveform are apportioned to at least two kinds of compression and expansion formats, and a plurality of compressed and expanded waveforms are formed by compressing or expanding each by an identical amount only along the temporal axis.
    Type: Grant
    Filed: December 9, 1999
    Date of Patent: March 7, 2006
    Assignee: Roland Corporation
    Inventor: Tadao Kikumoto
  • Patent number: 6999922
    Abstract: The present invention (110) permits a user to speed up and slow down speech without changing the speaker's pitch (102, 110, 112, 128, 402–416). It is a user-adjustable feature that changes the spoken rate to the listener's preferred listening rate or comfort. It can be included on the phone as a customer convenience feature, changing no characteristic of the speaker's voice besides the speaking rate, with soft key button (202) combinations (in interconnect or normal). From the user's perspective, it would seem only that the talker changed his speaking rate, not that the speech was digitally altered in any way. The pitch and general prosody of the speaker are preserved. The time expansion/compression feature complements existing technologies and applications in progress, including messaging services, messaging applications and games, and a real-time feature to slow down the listening rate.
    Type: Grant
    Filed: June 27, 2003
    Date of Patent: February 14, 2006
    Assignee: Motorola, Inc.
    Inventors: Marc Andre Boillot, John Gregory Harris, Thomas Lawrence Reinke
  • Patent number: 6961704
    Abstract: An arrangement is provided for text to speech processing based on linguistic prosodic models. Linguistic prosodic models are established to characterize different linguistic prosodic characteristics. When an input text is received, a target unit sequence is generated with a linguistic target that annotates target units in the target unit sequence with a plurality of linguistic prosodic characteristics so that speech synthesized in accordance with the target unit sequence and the linguistic target has certain desired prosodic properties. A unit sequence is selected in accordance with the target unit sequence and the linguistic target based on joint cost information evaluated using established linguistic prosodic models. The selected unit sequence is used to produce synthesized speech corresponding to the input text.
    Type: Grant
    Filed: January 31, 2003
    Date of Patent: November 1, 2005
    Assignee: Speechworks International, Inc.
    Inventors: Michael S. Phillips, Daniel S. Faulkner, Marek A. Przezdzieci
  • Patent number: 6961697
    Abstract: The invention concerns a method and apparatus for performing packet loss or Frame Erasure Concealment (FEC) for a speech coder that does not have a built-in or standard FEC process. A receiver with a decoder receives encoded frames of compressed speech information transmitted from an encoder. A lost frame detector at the receiver determines if an encoded frame has been lost or corrupted in transmission, or erased. If the encoded frame is not erased, the encoded frame is decoded by a decoder and a temporary memory is updated with the decoder's output. A predetermined delay period is applied and the audio frame is then output. If the lost frame detector determines that the encoded frame is erased, a FEC module applies a frame concealment process to the signal. The FEC processing produces natural sounding synthetic speech for the erased frames.
    Type: Grant
    Filed: April 19, 2000
    Date of Patent: November 1, 2005
    Assignee: AT&T Corp.
    Inventor: David A. Kapilow
  • Patent number: 6950798
    Abstract: A text-to-speech synthesizer employs a database that includes units. For each unit there is a collection of unit selection parameters and a plurality of frames. Each frame has a set of model parameters derived from a base speech frame, and a speech frame synthesized from the frame's model parameters. A text to be synthesized is converted to a sequence of desired unit feature sets, and for each such set the database is perused to retrieve a best-matching unit. An assessment is made whether modifications to the frames are needed, because of discontinuities in the model parameters at unit boundaries, or because of differences between the desired and selected unit features. When modifications are necessary, the model parameters of frames that need to be altered are modified, and new frames are synthesized from the modified model parameters and concatenated to the output. Otherwise, the speech frames previously stored in the database are retrieved and concatenated to the output.
    Type: Grant
    Filed: March 2, 2002
    Date of Patent: September 27, 2005
    Assignee: AT&T Corp.
    Inventors: Mark Charles Beutnagel, David A. Kapilow, Ioannis G. Stylianou, Ann K. Syrdal
  • Patent number: 6910007
    Abstract: Natural-sounding synthesized speech is obtained from pieced elemental speech units that have their super-class identities known (e.g. phoneme type), and their line spectral frequencies (LSF) set in accordance with a correlation between the desired fundamental frequency and the LSF vectors that are known for different classes in the super-class. The correlation between a fundamental frequency in a class and the corresponding LSF is obtained by, for example, analyzing the database of recorded speech of a person and, more particularly, by analyzing frames of the speech signal.
    Type: Grant
    Filed: January 25, 2001
    Date of Patent: June 21, 2005
    Assignee: AT&T Corp
    Inventors: Ioannis G (Yannis) Stylianou, Alexander Kain
  • Patent number: 6879957
    Abstract: A text-to-speech system utilizes a method for producing a speech rendition of text based on dividing some or all words of a sentence into component diphones. A phonetic dictionary is aligned so that each letter within each word has a single corresponding phoneme. The aligned dictionary is analyzed to determine the most common phoneme representation of each letter in the context of a string of letters before and after it. The results for each letter are stored in a phoneme rule matrix. A diphone database is created using a wave editor to cut 2,000 distinct diphones out of specially selected words. A computer algorithm selects a phoneme for each letter. Then, two phonemes are used to create a diphone. Words are then read aloud by concatenating sounds from the diphone database. In one embodiment, diphones are used only when a word is not on a list of pre-recorded words.
    Type: Grant
    Filed: September 1, 2000
    Date of Patent: April 12, 2005
    Inventors: William H. Pechter, Joseph E. Pechter
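The letter-to-phoneme lookup in the abstract above can be sketched as a context-keyed table. The `rule_matrix` structure (keys of letter plus left/right context) is an assumed illustration of the patent's phoneme rule matrix, not its actual layout:

```python
def letter_to_phoneme(word, rule_matrix, context=1):
    """Pick a phoneme for each letter of a word by looking up the letter
    together with its surrounding letters in a context-keyed rule matrix.

    rule_matrix maps (letter, left_context, right_context) -> phoneme;
    '#' pads beyond the word boundaries; unmatched letters fall back to
    the letter itself (an illustrative default)."""
    padded = "#" * context + word + "#" * context
    phonemes = []
    for i, letter in enumerate(word):
        left = padded[i:i + context]
        right = padded[i + context + 1:i + 2 * context + 1]
        phonemes.append(rule_matrix.get((letter, left, right), letter))
    return phonemes

# 'c' before 'a' at the start of a word sounds like 'k' in this toy rule set.
result = letter_to_phoneme("cat", {("c", "#", "a"): "k"})
```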
  • Patent number: 6873955
    Abstract: Partial waveform data representative of a waveform shape variation are extracted from supplied waveform data, and the extracted partial waveform data are stored along with time position information indicative of their respective time positions. In reproduction, the partial waveform data and time position information are read out, then the partial waveform data are arranged on the time axis in accordance with the time position information, and a waveform is produced on the basis of the waveform data arranged on the time axis. In another implementation, sets of sample identification information and time position information are obtained in accordance with a performance tone waveform to be reproduced, and sample data are obtained from a database in accordance with the sample identification information. The thus-obtained sample data are arranged on the time axis in accordance with the time position information, and the desired waveform is produced on the basis of the sample data arranged on the time axis.
    Type: Grant
    Filed: September 22, 2000
    Date of Patent: March 29, 2005
    Assignee: Yamaha Corporation
    Inventors: Hideo Suzuki, Motoichi Tamura, Satoshi Usa
  • Patent number: 6847932
    Abstract: Given phonetic information is divided into speech units of extended CV, each a contiguous sequence of phonemes without clear boundaries that contains one or more vowels. The contour of the vocal tract transmission function for each phoneme of the extended-CV speech unit is obtained from a phoneme directory, which stores a contour of the vocal tract transmission function for each phoneme associated with phonetic information in units of extended CV. Speech waveform data is generated based on the contour of the vocal tract transmission function of each phoneme of the extended-CV speech unit, and the speech waveform data is converted into an analog voice signal.
    Type: Grant
    Filed: September 28, 2000
    Date of Patent: January 25, 2005
    Assignee: Arcadia, Inc.
    Inventors: Kazuyuki Ashimura, Seiichi Tenpaku
  • Patent number: 6832192
    Abstract: A speech synthesizing apparatus acquires a synthesis unit speech segment divided as a speech synthesis unit, and acquires partial speech segments by dividing the synthesis unit speech segment at phoneme boundaries. The power value required for each partial speech segment is estimated on the basis of a target power value in reproduction. An amplitude magnification is acquired from the ratio of the estimated power value to the reference power value for each of the partial speech segments. Synthesized speech is generated by changing the amplitude of each partial speech segment of the synthesis unit speech segment on the basis of the acquired amplitude magnification.
    Type: Grant
    Filed: March 29, 2001
    Date of Patent: December 14, 2004
    Assignee: Canon Kabushiki Kaisha
    Inventor: Masayuki Yamada
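The power-ratio scaling in the abstract above reduces to a simple gain computation. Using mean squared amplitude as the power measure and a square-root gain are illustrative assumptions, not the patented estimator:

```python
def amplitude_magnification(segment, target_power):
    """Scale a partial speech segment so its power reaches a target value.

    Power here is mean squared amplitude; the amplitude magnification is
    the square root of the target-to-reference power ratio."""
    reference_power = sum(s * s for s in segment) / len(segment)
    if reference_power == 0:
        return list(segment)  # silent segment: nothing to scale
    gain = (target_power / reference_power) ** 0.5
    return [s * gain for s in segment]

# A segment with power 1.0 scaled to power 4.0 doubles in amplitude.
scaled = amplitude_magnification([1.0, -1.0, 1.0, -1.0], 4.0)
```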
  • Patent number: 6823309
    Abstract: A speech synthesis system includes a prosodic data modifying rule apparatus that stores in advance a degree of modification of prosodic data, the degree of modification corresponding to an approximate cost and being stored as a modifying rule; a prosodic data retrieving section that retrieves prosodic data stored in correspondence with key data used for retrieval, the prosodic data being retrieved according to a degree of matching between the input data and the key data, the degree of matching being represented by the approximate cost; a modifying section that modifies the retrieved prosodic data based on the degree of matching and the modifying rule stored in the prosodic data modifying rule apparatus; and an output section that outputs synthesized speech based on the input data and the modified prosodic data.
    Type: Grant
    Filed: November 27, 2000
    Date of Patent: November 23, 2004
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Yumiko Kato, Kenji Matsui, Takahiro Kamai, Katsuyoshi Yamagami
  • Patent number: 6813604
    Abstract: A text to speech system modeling durational characteristics of a target speaker is addressed herein. A body of target speaker training text is selected having maximum possible information about speaker specific characteristics. The body of target speaker training text is read by a target speaker to produce a target speaker training corpus. A previously generated source model reflecting characteristics of a source speaker is retrieved, and the target speaker training corpus is processed to produce modification parameters reflecting differences between durational characteristics of the target speaker and those predicted by the source model. The modification parameters are applied to the source model to produce a target model. Text inputs are processed using the target model to produce speech outputs reflecting durational characteristics of the target speaker.
    Type: Grant
    Filed: November 13, 2000
    Date of Patent: November 2, 2004
    Assignee: Lucent Technologies Inc.
    Inventors: Chi-Lin Shih, Jan Pieter Hendrik van Santen
  • Patent number: 6785652
    Abstract: A method and an apparatus for improved duration modeling of phonemes in a speech synthesis system are provided. According to one aspect, text is received into a processor of a speech synthesis system. The received text is processed using a sum-of-products phoneme duration model that is used in either the formant method or the concatenative method of speech generation. The phoneme duration model, which is used along with a phoneme pitch model, is produced by developing a non-exponential functional transformation form for use with a generalized additive model. The non-exponential functional transformation form comprises a root sinusoidal transformation that is controlled in response to a minimum phoneme duration and a maximum phoneme duration. The minimum and maximum phoneme durations are observed in training data. The received text is processed by specifying at least one of a number of contextual factors for the generalized additive model.
    Type: Grant
    Filed: December 19, 2002
    Date of Patent: August 31, 2004
    Assignee: Apple Computer, Inc.
    Inventors: Jerome R. Bellegarda, Kim Silverman
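The abstract does not give the exact form of the root sinusoidal transformation; a sketch of one bounded sinusoidal squashing that respects the minimum and maximum durations observed in training data might look like:

```python
import math

def bounded_sinusoidal(x, d_min, d_max):
    """Squash an unbounded additive-model score x into [d_min, d_max]
    using a sinusoidal transform (illustrative only; the patented
    'root sinusoidal' form is not specified in the abstract)."""
    s = math.sin(math.atan(x))  # atan bounds x, then sin stays in (-1, 1)
    return d_min + (d_max - d_min) * (s + 1.0) / 2.0

# A zero score maps to the midpoint of the allowed duration range.
duration = bounded_sinusoidal(0.0, 30.0, 300.0)
```

The transform is monotonic in the score, so larger additive-model outputs always yield longer durations while never escaping the observed bounds.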
  • Patent number: 6778960
    Abstract: A speech information processing apparatus which sets the duration of a phonological series with accuracy, and sets a natural phoneme duration in accordance with the phonemic/linguistic environment. For this purpose, the duration of a predetermined unit of the phonological series is obtained based on a duration model for the entire segment. Then the duration of each of the phonemes constituting the phonological series is obtained based on the duration model for the entire segment. Finally, the duration of each phoneme is set based on the duration of the phonological series and the duration of each phoneme.
    Type: Grant
    Filed: March 28, 2001
    Date of Patent: August 17, 2004
    Assignee: Canon Kabushiki Kaisha
    Inventor: Toshiaki Fukada
  • Patent number: 6658382
    Abstract: An input signal is time-frequency transformed, then the frequency-domain coefficients are divided into coefficient segments of about 100 Hz width to generate a sequence of coefficient segments, and the sequence of coefficient segments is split into subbands each consisting of plural coefficient segments. A threshold value is determined based on the intensity of each coefficient segment in each subband. The intensity of each coefficient segment is compared with the threshold value, and the coefficient segments are classified into low- and high-intensity groups. The coefficient segments are quantized for each group, or they are flattened respectively and then quantized through recombination.
    Type: Grant
    Filed: March 23, 2000
    Date of Patent: December 2, 2003
    Assignee: Nippon Telegraph and Telephone Corporation
    Inventors: Naoki Iwakami, Takehiro Moriya, Akio Jin, Kazuaki Chikira, Takeshi Mori
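The per-subband thresholding step described above can be sketched as follows; the intensity measure (sum of squared coefficients per segment) and the threshold rule (`alpha` times the maximum intensity) are illustrative assumptions, not the patented rule:

```python
def classify_segments(subband_segments, alpha=0.5):
    """Split the coefficient segments of one subband into low- and
    high-intensity groups by comparing each segment's intensity with a
    threshold derived from the segment intensities in that subband."""
    intensities = [sum(c * c for c in seg) for seg in subband_segments]
    threshold = alpha * max(intensities)
    low, high = [], []
    for seg, intensity in zip(subband_segments, intensities):
        (high if intensity >= threshold else low).append(seg)
    return low, high

# One weak and one strong coefficient segment in a toy subband.
low, high = classify_segments([[0.1, 0.1], [2.0, 2.0]])
```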
  • Patent number: 6647280
    Abstract: A signal processing method, preferably for extracting a fundamental period from a noisy, low-frequency signal, is disclosed. The signal processing method generally comprises calculating a numerical transform for a number of selected periods by multiplying signal data by discrete points of a sine and a cosine wave of varying period and summing the results. The sine and cosine waves are preferably selected to have a period substantially equivalent to the period of interest when performing the transform.
    Type: Grant
    Filed: January 14, 2002
    Date of Patent: November 11, 2003
    Assignee: OB Scientific, Inc.
    Inventors: Dennis E. Bahr, James L. Reuss
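The numerical transform in the abstract is, in effect, a discrete sine/cosine correlation evaluated at candidate periods. A minimal sketch, assuming the period with the strongest combined response is taken as the fundamental:

```python
import math

def period_strength(signal, period):
    """Multiply the signal by discrete points of sine and cosine waves of
    the given period, sum the results, and combine the two magnitudes."""
    s = sum(x * math.sin(2 * math.pi * n / period) for n, x in enumerate(signal))
    c = sum(x * math.cos(2 * math.pi * n / period) for n, x in enumerate(signal))
    return math.hypot(s, c)

def fundamental_period(signal, candidate_periods):
    """Pick the candidate period with the strongest sine/cosine response."""
    return max(candidate_periods, key=lambda p: period_strength(signal, p))

# A clean sinusoid of period 25 samples is matched against three candidates.
sig = [math.sin(2 * math.pi * n / 25) for n in range(200)]
best = fundamental_period(sig, [10, 25, 40])
```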
  • Patent number: 6629067
    Abstract: A range control system includes an input section for inputting a singing voice, a fundamental frequency extracting section for extracting a fundamental frequency of the inputted voice, and a pitch control section for performing a pitch control of the inputted voice so as to match the extracted fundamental frequency with a given frequency. The system further includes a formant extracting section for extracting a formant of the inputted voice, and a formant filter section for performing a filter operation relative to the pitch-controlled voice so that the pitch-controlled voice has a characteristic of the extracted formant. The system further includes an input loudness detecting section for detecting a first loudness of the inputted voice, and a loudness control section for controlling a second loudness of the voice subjected to the filter operation to match with the first loudness.
    Type: Grant
    Filed: May 14, 1998
    Date of Patent: September 30, 2003
    Assignee: Kabushiki Kaisha Kawai Gakki Seisakusho
    Inventors: Tsutomu Saito, Hiroshi Kato, Youichi Kondo
  • Patent number: 6553344
    Abstract: A method and an apparatus for improved duration modeling of phonemes in a speech synthesis system are provided. According to one aspect, text is received into a processor of a speech synthesis system. The received text is processed using a sum-of-products phoneme duration model that is used in either the formant method or the concatenative method of speech generation. The phoneme duration model, which is used along with a phoneme pitch model, is produced by developing a non-exponential functional transformation form for use with a generalized additive model. The non-exponential functional transformation form comprises a root sinusoidal transformation that is controlled in response to a minimum phoneme duration and a maximum phoneme duration. The minimum and maximum phoneme durations are observed in training data. The received text is processed by specifying at least one of a number of contextual factors for the generalized additive model.
    Type: Grant
    Filed: February 22, 2002
    Date of Patent: April 22, 2003
    Assignee: Apple Computer, Inc.
    Inventors: Jerome R. Bellegarda, Kim Silverman
  • Patent number: 6546367
    Abstract: Statistical data including an average value, a standard deviation, and a minimum value of the phoneme duration of each phoneme is stored in a memory. When speech production time is determined for a phoneme string in a predetermined expiratory paragraph, the total phoneme duration of the phoneme string is set so as to become equal to the speech production time. Based on the set phoneme duration, phonemes are connected and a speech waveform is generated. To set a phoneme duration for each phoneme, a phoneme duration initial value is first set based on an average value, obtained by equally dividing the speech production time by the phonemes of the phoneme string, and a phoneme duration range set based on the statistical data of each phoneme. Then, the phoneme duration initial value is adjusted based on the statistical data and the speech production time.
    Type: Grant
    Filed: March 9, 1999
    Date of Patent: April 8, 2003
    Assignee: Canon Kabushiki Kaisha
    Inventor: Mitsuru Otsuka
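A minimal sketch of allocating a fixed speech production time across a phoneme string, assuming per-phoneme mean and minimum durations are available; the rescale-then-clip rule is an illustrative reading of the abstract, not the patented adjustment:

```python
def set_phoneme_durations(stats, total_time):
    """Allocate a total speech production time across a phoneme string.

    stats: list of (mean_duration, minimum_duration) per phoneme.
    Each phoneme starts from its statistical mean, the means are rescaled
    so their sum matches total_time, and the result is clipped at each
    phoneme's stored minimum duration."""
    total_mean = sum(mean for mean, _ in stats)
    scale = total_time / total_mean
    return [max(minimum, mean * scale) for mean, minimum in stats]

# Three phonemes share 400 ms; the means sum to 200 ms, so each doubles.
durations = set_phoneme_durations([(100, 40), (50, 40), (50, 40)], 400)
```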
  • Patent number: 6542867
    Abstract: The duration of speech varies according to the characteristics of the pronounced speech and the pronouncing habits of the speaker. In the speech duration processing method and apparatus of this invention, a large amount of natural speech was analyzed, and the following was found: the speech duration of a monosyllable varies according to factors such as the phonemes, tones, phrase construction, location in the phrase, location in the sentence, and front and rear connected phonemes of the syllable. Using these varying factors, a “speech duration parameter storage portion” for speech duration parameters is constructed. By retrieving the speech duration parameters and combining them with the basic speech duration of a syllable during syllable speech duration calculation, the speech duration of each monosyllable in any sentence can be accurately decided.
    Type: Grant
    Filed: March 28, 2000
    Date of Patent: April 1, 2003
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Shih Chang Sun, Chin Yun Hsieh
  • Publication number: 20030050781
    Abstract: A plurality of blocks of waveform data are stored in a memory, which also stores, for each of the blocks, synchronizing information representative of a plurality of cycle synchronizing points that indicate periodic specific phase positions where the block of waveform data should be synchronized in phase with another block of waveform data. Two blocks of waveform data (e.g., harmonic and nonharmonic components) are read out from the memory along with the synchronizing information, and the readout of the two blocks is controlled on the basis of that information. There is also stored, for each of the blocks, at least one piece of synchronizing position information indicative of a specific position where the block should be synchronized with another block, and the readout of the individual blocks of waveform data is controlled so that the blocks are synchronized with each other using the synchronizing position information.
    Type: Application
    Filed: September 11, 2002
    Publication date: March 13, 2003
    Applicant: Yamaha Corporation
    Inventors: Motoichi Tamura, Yasuyuki Umeyama
  • Patent number: 6499014
    Abstract: The speech synthesis apparatus of the present invention includes: a text analyzer operable to generate a phonetic and prosodic symbol string from character information of an input text; a word dictionary storing a reading and an accent of a word; a voice segment dictionary storing a phoneme that is a basic unit of speech; a parameter generator operable to generate synthesizing parameters including at least a phoneme, a duration of the phoneme and a fundamental frequency for the phonetic and prosodic symbol string, the parameter generator including a calculating means operable to obtain a sum of phrase components and a sum of accent components and to calculate an average pitch from the sum of the phrase components and the sum of the accent components, and a determining means operable to determine a base pitch from the average pitch; and a waveform generator operable to generate a synthesized waveform by waveform-overlapping with reference to the synthesizing parameters generated by the parameter generator.
    Type: Grant
    Filed: March 7, 2000
    Date of Patent: December 24, 2002
    Assignee: Oki Electric Industry Co., Ltd.
    Inventor: Keiichi Chihara
  • Patent number: RE39336
    Abstract: The concatenative speech synthesizer employs demi-syllable subword units to generate speech. The synthesizer is based on a source-filter model that uses source signals that correspond closely to the human glottal source and filter parameters that correspond closely to the human vocal tract. Concatenation of the demi-syllable units is facilitated by two separate cross fade techniques: one applied in the time domain to the demi-syllable source signal waveforms, and one applied in the frequency domain by interpolating the corresponding filter parameters of the concatenated demi-syllables. The dual cross fade technique results in natural sounding synthesis that avoids time-domain glitches without degrading or smearing characteristic resonances in the filter domain.
    Type: Grant
    Filed: November 5, 2002
    Date of Patent: October 10, 2006
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Steve Pearson, Nicholas Kibre, Nancy Niedzielski
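The dual cross fade can be sketched over a single overlap region. Linear fade weights and one filter parameter per sample are assumptions for illustration; the abstract does not specify the actual fade shapes or parameter layout:

```python
def cross_fade(tail, head):
    """Dual cross fade between two concatenated demi-syllable units.

    tail/head are (waveform, filter_params) pairs covering an overlap
    region of equal length (at least two samples).  The source waveforms
    are cross faded in the time domain, and the filter parameters are
    linearly interpolated in the filter domain."""
    n = len(tail[0])
    wave, params = [], []
    for i in range(n):
        w = i / (n - 1)  # fade weight ramps from 0 to 1 across the overlap
        wave.append((1 - w) * tail[0][i] + w * head[0][i])
        params.append((1 - w) * tail[1][i] + w * head[1][i])
    return wave, params

# Fade from a constant-amplitude tail into silence while the filter
# parameter glides from 10.0 to 20.0.
wave, params = cross_fade(([1.0, 1.0, 1.0], [10.0, 10.0, 10.0]),
                          ([0.0, 0.0, 0.0], [20.0, 20.0, 20.0]))
```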