Frequency Element Patents (Class 704/268)
  • Patent number: 7526430
    Abstract: A speech synthesis apparatus, which can embed unchangeable additional information into synthesized speech without degrading speech quality and without band restrictions, includes a language processing unit that generates the synthesized speech generation information needed to generate synthesized speech from a language string, a prosody generating unit that generates prosody information for the speech based on the synthesized speech generation information, and a waveform generating unit that synthesizes speech based on the prosody information. The prosody generating unit embeds code information as watermark information in the prosody information of a segment of predetermined duration within a phoneme length that includes a phoneme boundary.
    Type: Grant
    Filed: September 15, 2005
    Date of Patent: April 28, 2009
    Assignee: Panasonic Corporation
    Inventors: Yumiko Kato, Takahiro Kamai
  • Patent number: 7512536
    Abstract: Low-complexity synthesis filter bank for MPEG audio decoding uses a factoring of the 64×32 matrixing for the inverse-quantized subband coefficients. Factoring into non-standard 4-point discrete cosine and sine transforms, point-wise multiplications and combinations, and non-standard 8-point discrete cosine and sine transforms limits memory requirements and computational complexity.
    Type: Grant
    Filed: May 2, 2005
    Date of Patent: March 31, 2009
    Assignee: Texas Instruments Incorporated
    Inventor: Mohamed F. Mansour
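The 64×32 matrixing that this abstract factors can be written directly from the matrixing definition in the MPEG-1 audio standard (ISO/IEC 11172-3): V[i] = Σ N[i][k]·S[k], with N[i][k] = cos((16 + i)(2k + 1)π/64). The Python sketch below is that naive baseline, useful as a reference when checking a faster implementation; it is not the factored DCT/DST structure claimed in the patent, and the function name is illustrative.

```python
import math


def synthesis_matrixing(subbands):
    """Direct (unfactored) MPEG-1 audio synthesis matrixing.

    Maps 32 inverse-quantized subband samples S[k] to 64 values
    V[i] = sum_k N[i][k] * S[k], where N[i][k] = cos((16 + i)(2k + 1) * pi / 64).
    Costs 64 * 32 = 2048 multiply-adds per call.
    """
    if len(subbands) != 32:
        raise ValueError("expected 32 subband samples")
    return [
        sum(math.cos((16 + i) * (2 * k + 1) * math.pi / 64) * s
            for k, s in enumerate(subbands))
        for i in range(64)
    ]
```

The output is highly redundant: V[32 - i] = -V[i] for i = 0..32 (so V[16] = 0) and V[48] = -ΣS[k]. Fast implementations commonly exploit symmetries of this kind to avoid computing all 2048 products.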
  • Publication number: 20090076822
    Abstract: A sequence is received of time domain digital audio samples representing sound (e.g., a sound generated by a human voice or a musical instrument). The time domain digital audio samples are processed to derive a corresponding sequence of audio pulses in the time domain. Each of the audio pulses is associated with a characteristic frequency. Frequency domain information is derived about each of at least some of the audio pulses. The sound represented by the time domain digital audio samples is transformed by processing the audio pulses using the frequency domain information.
    Type: Application
    Filed: September 13, 2007
    Publication date: March 19, 2009
    Inventor: Jordi Bonada Sanjaume
  • Patent number: 7502739
    Abstract: In generating the intonation pattern for speech synthesis, a speech synthesis system can produce highly natural speech and reproduce a speaker's speech characteristics flexibly and accurately by making effective use of F0 patterns of actual speech accumulated in a database. An intonation generation method generates the intonation of synthesized speech for text by estimating an outline of the intonation based on language information of the text, and then, based on the estimated outline, selecting an optimum intonation pattern from a database that stores intonation patterns of actual speech. Speech characteristics recorded in advance are reflected both in the estimation of the intonation outline and in the selection of a speech waveform element.
    Type: Grant
    Filed: January 24, 2005
    Date of Patent: March 10, 2009
    Assignee: International Business Machines Corporation
    Inventors: Takashi Saito, Masaharu Sakamoto
  • Publication number: 20090025537
    Abstract: A plurality of blocks of waveform data are stored in a memory, which also stores, for each of the blocks, synchronizing information representing a plurality of cycle synchronizing points that indicate periodic specific phase positions where the block of waveform data should be synchronized in phase with another block of waveform data. Two blocks of waveform data (e.g., harmonic and nonharmonic components) are read out from the memory along with the synchronizing information, and the readout of the two blocks is controlled on the basis of that synchronizing information. For each of the blocks, at least one piece of synchronizing position information is stored that indicates a specific position where the block should be synchronized with another block, and the readout of the individual blocks of waveform data is controlled so that the blocks are synchronized with each other using the synchronizing position information.
    Type: Application
    Filed: September 24, 2007
    Publication date: January 29, 2009
    Applicant: Yamaha Corporation
    Inventors: Motoichi Tamura, Yasuyuki Umeyama
  • Publication number: 20090024393
    Abstract: A speech synthesizer conducts a dialogue among a plurality of synthesized speakers, including a self speaker and one or more partner speakers, by use of a voice profile table describing emotional characteristics of synthesized voices, a speaker database storing feature data for different types of speakers and/or different speaking tones, a speech synthesis engine that synthesizes speech from input text according to feature data fitting the voice profile assigned to each synthesized speaker, and a profile manager that updates the voice profiles according to the content of the spoken text. The voice profiles of partner speakers are initially derived from the voice profile of the self speaker. A synthesized dialogue can be set up simply by selecting the voice profile of the self speaker.
    Type: Application
    Filed: June 11, 2008
    Publication date: January 22, 2009
    Applicant: OKI ELECTRIC INDUSTRY CO., LTD.
    Inventor: Tsutomu Kaneyasu
  • Patent number: 7457752
    Abstract: Method and apparatus for controlling the operation of an emotion synthesizing device, notably of the type where the emotion is conveyed by a sound, having at least one input parameter whose value is used to set a type of emotion to be conveyed, by making at least one parameter a variable parameter over a determined control range, thereby to confer a variability in an amount of the type of emotion to be conveyed. The variable parameter can be made variable according to a variation model over the control range, the model relating a quantity of emotion control variable to the variable parameter, whereby said control variable is used to variably establish a value of said variable parameter. Preferably the variation obeys a linear model, the variable parameter being made to vary linearly with a variation in a quantity of emotion control variable.
    Type: Grant
    Filed: August 12, 2002
    Date of Patent: November 25, 2008
    Assignee: Sony France S.A.
    Inventor: Pierre Yves Oudeyer
  • Patent number: 7454347
    Abstract: A labeling part 3 analyzes the character string data to produce a phoneme label and a prosody label, partitions the voice data stored in a voice database 1 into phonemic data, and labels the phonemic data using the phoneme label and the like. A phoneme segmenting part 4 connects the voice data labeled as the same kind of phonemic data, and a formant extracting part 5 determines the formant frequencies of each piece of phonemic data. A processing part 6 determines an evaluation value for each piece of phonemic data based on the formant frequencies, and an error detection part 7 detects phonemic data whose evaluation value deviates within a set of phonemic data by at least a predetermined amount.
    Type: Grant
    Filed: August 18, 2004
    Date of Patent: November 18, 2008
    Assignee: Kabushiki Kaisha Kenwood
    Inventor: Rika Koyama
  • Patent number: 7444289
    Abstract: An audio decoding method and apparatus for reconstructing high frequency components with less computation are provided. The audio decoding apparatus includes a decoder, a channel similarity determination unit, a high frequency component generation unit, and an audio synthesizing unit. The audio decoding method generates high frequency components of frames while skipping every other frame for each channel signal; when right and left channel signals are similar to each other, generates high frequency components of the skipped frame for any one channel signal by using the generated high frequency components of the corresponding frame for the other channel signal; and when the right and left channel signals are not similar to each other, generates high frequency components of the skipped frames for each channel signal by using previous frames for the relevant channel signal.
    Type: Grant
    Filed: September 2, 2003
    Date of Patent: October 28, 2008
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Yoonhark Oh, Mathew Manu
  • Publication number: 20080243511
    Abstract: The present invention is a speech synthesizer that generates speech data for text including a fixed part and a variable part by combining recorded speech with rule-based synthetic speech. The speech synthesizer delivers high quality by concatenating recorded speech and synthetic speech without perceptible discontinuities in timbre or prosody.
    Type: Application
    Filed: October 22, 2007
    Publication date: October 2, 2008
    Inventors: Yusuke Fujita, Ryota Kamoshida, Kenji Nagamatsu
  • Publication number: 20080228487
    Abstract: A language processing unit identifies a word by performing language analysis on a text supplied from a text holding unit. A synthesis selection unit selects speech synthesis processing performed by a rule-based synthesis unit or speech synthesis processing performed by a pre-recorded-speech-based synthesis unit for a word of interest extracted from the language analysis result. The selected rule-based synthesis unit or pre-recorded-speech-based synthesis unit executes speech synthesis processing for the word of interest.
    Type: Application
    Filed: February 22, 2008
    Publication date: September 18, 2008
    Applicant: CANON KABUSHIKI KAISHA
    Inventors: Yasuo Okutani, Michio Aizawa, Toshiaki Fukada
  • Patent number: 7424430
    Abstract: A sound source apparatus has a plurality of tone forming parts for outputting either of desired tones or formants according to designation of a wave table sound source mode or a voice synthesizing mode, such that the tone forming parts generate the tones in the wave table sound source mode, and generate the formants for synthesis of a voice in the voice synthesizing mode. Each tone forming part has an envelope application section that operates in the wave table sound source mode for generating an envelope signal which rises in synchronization with an instruction to start the generating of the tone and decays in synchronization with another instruction to stop the generating of the tone, and applying the generated envelope signal to waveform data read from a wave table.
    Type: Grant
    Filed: January 26, 2004
    Date of Patent: September 9, 2008
    Assignee: Yamaha Corporation
    Inventors: Takehiko Kawahara, Nobukazu Nakamura
  • Patent number: 7389231
    Abstract: A voice synthesizing apparatus comprises: a storage device that stores a first database storing a first parameter obtained by analyzing a voice and a second database storing a second parameter obtained by analyzing a voice with vibrato; an input device that inputs information for a voice to be synthesized; a generating device that generates a third parameter based on the first parameter read from the first database and the second parameter read from the second database in accordance with the input information; and a synthesizing device that synthesizes the voice in accordance with the third parameter. A highly realistic vibrato effect can be added to a synthesized voice.
    Type: Grant
    Filed: August 30, 2002
    Date of Patent: June 17, 2008
    Assignee: Yamaha Corporation
    Inventors: Yasuo Yoshioka, Alex Loscos
  • Patent number: 7379873
    Abstract: Voice synthesis unit data stored in a phoneme database 10 is selected by a voice synthesis unit selector 12 in accordance with MIDI information stored in a performance data storage unit 11. Characteristic parameters are derived from the selected voice synthesis unit data. A characteristic parameter correction unit 21 corrects the characteristic parameters based on pitch information, etc. A spectrum envelope generating unit 23 generates a spectrum envelope in accordance with the corrected characteristic parameters. A timbre transformation unit 25 changes timbre by correcting the characteristic parameters along the time axis in accordance with timbre transformation parameters. Timbres at the same song position can each be transformed into different arbitrary timbres; therefore, the synthesized singing voice is rich in variety and realism.
    Type: Grant
    Filed: July 3, 2003
    Date of Patent: May 27, 2008
    Assignee: Yamaha Corporation
    Inventor: Hideki Kemmochi
  • Patent number: 7376563
    Abstract: A system for rehabilitation of a hearing disorder which comprises at least one acoustic sensor for picking up an acoustic signal and converting it into an electrical audio signal, an electronic signal processing unit for audio signal processing and amplification, an electrical power supply unit which supplies individual components of the system with current, and an actuator arrangement which is provided with one or more electroacoustic, electromechanical or purely electrical output-side actuators or any combination of these actuators for stimulation of damaged hearing, wherein the signal processing unit has a speech analysis and recognition module and a speech synthesis module.
    Type: Grant
    Filed: July 2, 2001
    Date of Patent: May 20, 2008
    Assignee: Cochlear Limited
    Inventors: Hans Leysieffer, Bernd Waldmann
  • Patent number: 7359854
    Abstract: A solution for improving the perceived sound quality of a decoded acoustic signal is accomplished by extending the spectrum of a received narrow-band acoustic signal (aNB). A wide-band acoustic signal (AWB) is produced by extracting at least one essential attribute (zNB) from the narrow-band acoustic signal (aNB). Parameters, e.g., representing signal energies, with respect to wide-band frequency components outside the spectrum (ANB) of the narrow-band acoustic signal (aNB), are estimated based on the at least one essential attribute (zNB). This estimation involves allocating a parameter value to a wide-band frequency component, based on a corresponding confidence level.
    Type: Grant
    Filed: April 10, 2002
    Date of Patent: April 15, 2008
    Assignee: Telefonaktiebolaget LM Ericsson (publ)
    Inventors: Mattias Nilsson, Bastiaan Kleijn
  • Patent number: 7328159
    Abstract: An improved system for an interactive voice recognition system (400) includes a voice prompt generator (401) for generating voice prompts in a first frequency band (501). A speech detector (406) detects the presence of speech energy in a second frequency band (502). The first and second frequency bands (501, 502) are essentially conjugate frequency bands. A voice data generator (412) generates voice data based on an output of the voice prompt generator (401) and audible speech of a voice response generator (402). A control signal (422) controls the voice prompt generator (401) based on whether the speech detector (406) detects the presence of speech energy in the second frequency band (502). A back end (405) of the interactive voice recognition system (400) is configured to operate on an extracted front end voice feature based on whether the speech detector (406) detects the presence of speech energy in the second frequency band (502).
    Type: Grant
    Filed: January 15, 2002
    Date of Patent: February 5, 2008
    Assignee: Qualcomm Inc.
    Inventors: Chienchung Chang, Narendranath Malayath
  • Patent number: 7313523
    Abstract: A method and apparatus are provided for generating more natural-sounding speech. In one embodiment, word prominence and latent semantic analysis are used to generate more natural-sounding speech. A method for generating more natural-sounding speech may comprise generating synthesized speech having certain word prominence characteristics and applying a semantically driven word prominence assignment model to specify word prominence consistent with the way humans assign it. Speech representing a current sentence is generated. A determination is made as to whether information in the current sentence is new or was previously given, according to the semantic relationship between the current sentence and a number of preceding sentences. A word prominence is assigned to a word in the current sentence according to this determination.
    Type: Grant
    Filed: May 14, 2003
    Date of Patent: December 25, 2007
    Assignee: Apple Inc.
    Inventors: Jerome R. Bellegarda, Kim E. A. Silverman
  • Patent number: 7286986
    Abstract: A method of smoothing fundamental frequency discontinuities at boundaries of concatenated speech segments includes determining, for each speech segment, a beginning fundamental frequency value and an ending fundamental frequency value. The method further includes adjusting the fundamental frequency contour of each of the speech segments according to a linear function calculated for each particular speech segment, and dependent on the beginning and ending fundamental frequency values of the corresponding speech segment. The method calculates the linear function for each speech segment according to a coupled spring model with three springs for each segment. A first spring constant, associated with the first spring and the second spring, is proportional to a duration of voicing in the associated speech segment. A second spring constant, associated with the third spring, models a non-linear restoring force that resists a change in slope of the segment fundamental frequency contour.
    Type: Grant
    Filed: August 1, 2003
    Date of Patent: October 23, 2007
    Assignee: Rhetorical Systems Limited
    Inventor: David Talkin
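The abstract above adjusts each segment's F0 contour with a linear function that depends on the segment's beginning and ending F0 values. A rough Python sketch of that idea follows. As a simplifying assumption it replaces the coupled-spring model with a plain average at each join, so adjacent segments are pulled to the midpoint of their boundary values; the function name is hypothetical.

```python
def smooth_f0_boundaries(segments):
    """Remove F0 jumps at the joins of concatenated speech segments.

    segments: list of non-empty F0 contours (lists of Hz values), one per unit.
    Each segment is corrected by adding a linear ramp so that adjacent
    segments meet at the mean of their original boundary values.
    Assumption: equal weighting of both sides stands in for the spring model.
    """
    n = len(segments)
    # Target F0 at each internal boundary: average of the two sides.
    joins = [(segments[j][-1] + segments[j + 1][0]) / 2 for j in range(n - 1)]
    out = []
    for j, seg in enumerate(segments):
        start_target = joins[j - 1] if j > 0 else seg[0]   # first segment keeps its start
        end_target = joins[j] if j < n - 1 else seg[-1]    # last segment keeps its end
        d0 = start_target - seg[0]    # offset needed at the segment start
        d1 = end_target - seg[-1]     # offset needed at the segment end
        m = len(seg)
        if m == 1:
            # A single frame cannot meet both targets; use the mean offset.
            out.append([seg[0] + (d0 + d1) / 2])
            continue
        # Linear correction interpolating from d0 to d1 across the segment.
        out.append([f + d0 + (d1 - d0) * t / (m - 1) for t, f in enumerate(seg)])
    return out
```

For example, segments [[100, 100, 100], [120, 120, 120]] become [100, 105, 110] and [110, 115, 120], so the 20 Hz jump at the join is spread linearly across both segments.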
  • Patent number: 7280969
    Abstract: A speech synthesis system is disclosed that uses a pitch contour resulting in more natural-sounding speech. The present invention modifies the predicted pitch, b(t), for synthesized speech using a low frequency energy booster. The low frequency energy booster interpolates the discrete pitch values, if necessary, and increases the amount of energy of the pitch contour associated with low frequency values, such as all frequency values below 10 Hertz. The amount of energy of the pitch contour associated with low frequency values can be increased, for example, by adding band-limited noise (a carrier signal) to the pitch contour, b(t), or by filtering the pitch values with an impulse response filter having a pole at the desired low frequency value. The present invention serves to add vibrato to the original pitch contour, b(t), and thereby improves the naturalness of the synthetic waveform.
    Type: Grant
    Filed: December 7, 2000
    Date of Patent: October 9, 2007
    Assignee: International Business Machines Corporation
    Inventors: Ellen Marie Eide, Raimo Bakis
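One of the two options in the abstract above, adding band-limited noise to the pitch contour b(t), can be sketched in a few lines of Python. Here a one-pole low-pass filter stands in for the band-limiting step, the contour is assumed to be already interpolated through unvoiced regions, and the `depth_hz` parameter and default values are illustrative assumptions rather than values from the patent.

```python
import math
import random


def add_low_frequency_jitter(pitch, frame_rate_hz, cutoff_hz=10.0, depth_hz=2.0, seed=0):
    """Add low-pass filtered noise to a pitch contour b(t).

    pitch: list of F0 values in Hz sampled at frame_rate_hz (assumed to be
        interpolated already, with no zero-valued unvoiced frames).
    cutoff_hz: cutoff of the one-pole low-pass applied to white noise.
    depth_hz: scale of the added noise (hypothetical parameter).
    """
    rng = random.Random(seed)  # seeded so results are reproducible
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / frame_rate_hz)
    state = 0.0
    out = []
    for f0 in pitch:
        # One-pole low-pass: keeps mostly the noise energy below cutoff_hz.
        state += alpha * (rng.gauss(0.0, 1.0) - state)
        out.append(f0 + depth_hz * state)
    return out
```

With 5 ms frames (`frame_rate_hz=200.0`), a flat 120 Hz contour comes back with a slow, vibrato-like wander around 120 Hz.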
  • Publication number: 20070213987
    Abstract: The conversion of speech can be used to transform an utterance by a source speaker to match the speech characteristics of a target speaker, for applications such as dubbing a motion picture. During a training phase, utterances corresponding to the same sentences by both the target speaker and source speaker are force aligned according to the phonemes within the sentences. A transformation or mapping is trained so that each frame of the source utterances is mapped to a corresponding frame of the target utterance. After the completion of the training phase, a source utterance is divided into frames, which are transformed into target frames. After all target frames are created from the sequence of frames from the source utterance, a target utterance is created having the speech of the source speaker, but with the vocal characteristics of the target speaker.
    Type: Application
    Filed: March 8, 2006
    Publication date: September 13, 2007
    Applicant: Voxonic, Inc.
    Inventors: Oytun Turk, Levent Mustafa Arslan, Fred Deutsch
  • Patent number: 7251601
    Abstract: A speech synthesis method comprises selecting predetermined formant parameters from formant parameters according to a pitch pattern, phoneme duration, and phoneme symbol string, generating a plurality of sine waves based on the formant frequency and formant phase of the selected formant parameters, multiplying the sine waves by the windowing functions of the selected formant parameters, respectively, to generate a plurality of formant waveforms, adding the formant waveforms to generate a plurality of pitch waveforms, and superposing the pitch waveforms according to a pitch period to generate a speech signal.
    Type: Grant
    Filed: March 21, 2002
    Date of Patent: July 31, 2007
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Takehiko Kagoshima, Masami Akamine
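The pipeline in the abstract above (windowed sines per formant, summed into pitch waveforms, then superposed at the pitch period) maps onto two small Python functions. This is a sketch under assumptions: a Hann window scaled by an amplitude value stands in for the windowing functions stored with each formant parameter set, and both function names are hypothetical.

```python
import math


def formant_pitch_waveform(formants, length, sample_rate):
    """Build one pitch waveform from formant parameters.

    formants: list of (freq_hz, phase_rad, amplitude) tuples.
    Each formant contributes a sine wave at its frequency and phase,
    multiplied by a window; the windowed formant waves are summed.
    Assumption: a Hann window of `length` samples (length >= 2), scaled by
    the formant amplitude, stands in for the stored windowing functions.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]
    wave = [0.0] * length
    for freq, phase, amp in formants:
        for n in range(length):
            wave[n] += amp * window[n] * math.sin(2 * math.pi * freq * n / sample_rate + phase)
    return wave


def overlap_add(pitch_waves, periods):
    """Superpose pitch waveforms: waveform i starts at sum(periods[:i]) samples."""
    starts, pos = [], 0
    for p in periods:
        starts.append(pos)
        pos += p
    total = max(s + len(w) for s, w in zip(starts, pitch_waves))
    out = [0.0] * total
    for s, w in zip(starts, pitch_waves):
        for n, x in enumerate(w):
            out[s + n] += x  # overlapping regions are summed
    return out
```

For a 100 Hz voice at 16 kHz, for instance, each pitch waveform would be placed 160 samples after the previous one by passing periods of 160.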
  • Patent number: 7249022
    Abstract: There are provided a singing voice-synthesizing method and apparatus capable of synthesizing natural singing voices close to human singing voices based on performance data input in real time. Performance data is input for each phonetic unit constituting a lyric, to supply phonetic unit information, singing-starting time point information, singing length information, etc. Each item of performance data is input ahead of the actual singing-starting time point, and a phonetic unit transition time length is generated. Using the phonetic unit transition time length, the singing-starting time point information, and the singing length information, the singing-starting time points and singing durations of the first and second phonemes are determined. In the singing voice synthesis, for each phoneme, a singing voice is generated starting at the determined singing-starting time point and continues for the determined singing duration.
    Type: Grant
    Filed: December 1, 2005
    Date of Patent: July 24, 2007
    Assignee: Yamaha Corporation
    Inventors: Hiraku Kayama, Oscar Celma, Jaume Ortola
  • Patent number: 7241947
    Abstract: A singing voice synthesizing method and a singing voice synthesizing apparatus in which the singing voice is synthesized using performance data such as MIDI data. The performance data entered is analyzed as the musical information of the sound pitch, sound duration and the lyric (S2, S3). From the analyzed music information, the lyric is accorded to a string of sounds to form singing voice data (S5). Before the singing voice data is delivered to a speech synthesizer, the sound range of the singing voice data is compared with the sound range of the speech synthesizer, and the key of the singing voice data and the performance data is changed so that the singing voice falls within the sound range of the speech synthesizer (S9 to S12 and S14). A program, a recording medium and a robot apparatus, in which the singing voice is synthesized from performance data, are also disclosed.
    Type: Grant
    Filed: March 17, 2004
    Date of Patent: July 10, 2007
    Assignee: Sony Corporation
    Inventor: Kenichiro Kobayashi
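The range check in the abstract above (changing the key of the singing voice data so that it fits the synthesizer's sound range) can be illustrated with MIDI note numbers. The sketch assumes the range is given as inclusive MIDI note bounds and picks the smallest semitone shift that fits; the same shift would then be applied to the accompanying performance data. The function name is hypothetical.

```python
def transpose_into_range(notes, low, high):
    """Return (shift, transposed_notes) fitting every note into [low, high].

    notes: MIDI note numbers of the singing voice data.
    low, high: inclusive MIDI note bounds of the synthesizer (assumption).
    The shift is the smallest-magnitude number of semitones that works.
    """
    lo_note, hi_note = min(notes), max(notes)
    if hi_note - lo_note > high - low:
        raise ValueError("melody range exceeds synthesizer range")
    if lo_note < low:
        shift = low - lo_note      # transpose up just enough
    elif hi_note > high:
        shift = high - hi_note     # transpose down just enough
    else:
        shift = 0                  # already inside the range
    return shift, [n + shift for n in notes]
```

For example, a melody spanning notes 40 to 52 with a synthesizer range of 48 to 72 is shifted up by 8 semitones.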
  • Patent number: 7228273
    Abstract: A voice control method is provided that allows the vocal characteristics of a character to be set in diverse ways in a computer game in which characters are capable of voice output. The voice control method comprises a conversion step of converting a voice, either input externally or provided in advance, based on attribute information of the character, and an output step of outputting the converted voice as the voice of the character. With this method, the voice produced by a character appearing in a computer game can be set according to the character's characteristics, and each player can create various voices for each character.
    Type: Grant
    Filed: November 12, 2002
    Date of Patent: June 5, 2007
    Assignee: Sega Corporation
    Inventor: Yutaka Okunoki
  • Patent number: 7219061
    Abstract: Predetermined macrosegments of the fundamental frequency are determined by a neural network, and these predefined macrosegments are reproduced by fundamental-frequency sequences stored in a database. The fundamental frequency is generated on the basis of a relatively large text section, which is analyzed by the neural network. Microstructures from the database are incorporated into the fundamental frequency. The fundamental frequency thus formed is optimized with regard to both its macrostructure and its microstructure. As a result, an extremely natural sound is achieved.
    Type: Grant
    Filed: October 24, 2000
    Date of Patent: May 15, 2007
    Assignee: Siemens Aktiengesellschaft
    Inventors: Caglayan Erdem, Martin Holzapfel
  • Patent number: 7203647
    Abstract: A speech output apparatus is disclosed that allows the user to easily catch synthetic speech when it is output superposed on music. The apparatus can output music together with synthetic speech that conveys the contents of information such as an e-mail and is superposed on the music. When the synthetic speech is output superposed on music that is playing, the apparatus gradually decreases the volume of the music.
    Type: Grant
    Filed: August 13, 2002
    Date of Patent: April 10, 2007
    Assignee: Canon Kabushiki Kaisha
    Inventors: Makoto Hirota, Hideo Kuboyama
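The abstract above describes gradually lowering the music volume while synthetic speech is superposed on it. A minimal Python sketch of that ducking behavior follows, assuming mono sample lists, a linear gain ramp and a fixed ducked level. Restoring the volume once the speech ends is not covered by the abstract; here the gain simply returns to 1.0. All names are illustrative.

```python
def duck_and_mix(music, speech, speech_start, ramp_len, duck_gain=0.3):
    """Mix speech over music, gradually lowering the music volume.

    music, speech: mono sample lists (floats); speech starts at sample
        index speech_start within the music.
    ramp_len: number of samples over which the music gain falls from 1.0
        to duck_gain (hypothetical parameter).
    """
    speech_end = speech_start + len(speech)
    out = []
    for n, m in enumerate(music):
        if speech_start <= n < speech_end:
            # Linear fade from full volume toward duck_gain over ramp_len samples.
            progress = min(1.0, (n - speech_start) / ramp_len) if ramp_len > 0 else 1.0
            gain = 1.0 - (1.0 - duck_gain) * progress
            s = speech[n - speech_start]
        else:
            gain, s = 1.0, 0.0  # no speech: music at full volume
        out.append(gain * m + s)
    return out
```

A linear ramp is the simplest choice; a gain curve in decibels would sound smoother but is beyond what the abstract specifies.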
  • Patent number: 7191132
    Abstract: A speech synthesiser is provided with a dialog-style selection arrangement responsive to a factor affecting intelligibility of speech output by the apparatus to select a dialog style intended to provide at least a minimum level of intelligibility of speech output by the synthesiser. The selected dialog style is used by a speech-application text provider when generating text-form utterances for a current speech application, these text-form utterances then being converted into speech form by a text-to-speech converter. The factor affecting intelligibility may be a measure of the intelligibility of the speech-form output or an environmental factor such as background noise in the user's environment.
    Type: Grant
    Filed: May 31, 2002
    Date of Patent: March 13, 2007
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Paul St John Brittan, Marianne Hickey, Roger Cecil Ferry Tucker
  • Patent number: 7189915
    Abstract: A singing voice synthesizing method for synthesizing the singing voice from performance data is disclosed. The input performance data are analyzed as the musical information of the pitch and the length of sounds and the lyric (S2 and S3). A track to serve as the subject of the lyric is selected from the analyzed musical information (S5). A note to which the singing voice is allocated is selected from the track (S6). The length of the note is changed to suit the song being sung (S7). A voice quality suited to the singing is selected based on, e.g., the track name/sequence name (S8), and singing voice data is prepared (S9). The singing voice is generated based on the singing voice data (S10).
    Type: Grant
    Filed: March 19, 2004
    Date of Patent: March 13, 2007
    Assignee: Sony Corporation
    Inventor: Kenichiro Kobayashi
  • Patent number: 7183482
    Abstract: A singing voice synthesizing method synthesizes the singing voice by exploiting performance data, such as MIDI data. The input performance data are analyzed as musical information including the pitch and length of the sounds and the lyric (S2 and S3). If the analyzed musical information lacks lyric information, an arbitrary lyric is assigned to an arbitrary string of notes (S9, S11, S12 and S15). The singing voice is generated based on the lyric so assigned (S17).
    Type: Grant
    Filed: March 19, 2004
    Date of Patent: February 27, 2007
    Assignee: Sony Corporation
    Inventor: Kenichiro Kobayashi
  • Patent number: 7184951
    Abstract: Methods and systems for digitally generating sound from phase and amplitude information of a narrow bandwidth signal, such as a narrow bandwidth locator signal. Phase-derivative information is determined from the phase information. The bandwidth of the phase-derivative information is spread out, or stretched, over a wider bandwidth, so that the frequency variations will be more perceptible to users. The result is combined with an audio band carrier frequency, the result of which controls an oscillator. The oscillator output is combined with the amplitude information to generate an analog audio signal that is modulated with the amplitude information and the phase-derivative information. The amplitude information and the wider-bandwidth phase-derivative information are used to modulate an audio carrier in both frequency and amplitude.
    Type: Grant
    Filed: February 15, 2002
    Date of Patent: February 27, 2007
    Assignee: Radiodetection Limited
    Inventors: John Mark Royle, James Ian King, Richard David Pearson
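The signal chain in the abstract above (phase derivative, bandwidth stretching, an audio carrier driving an oscillator, then amplitude modulation) can be sketched digitally as below. This is an illustrative assumption, not the patented circuit: the phase is assumed to be already unwrapped, "stretching" is modelled as a simple gain on the instantaneous-frequency offset, and the carrier and stretch values are hypothetical. The output is a sampled signal rather than the analog audio signal the abstract mentions.

```python
import math


def sonify_narrowband(phase, amplitude, sample_rate, carrier_hz=800.0, stretch=50.0):
    """Turn narrow-band phase/amplitude samples into an audible FM/AM tone.

    phase: non-empty list of unwrapped phase samples (radians).
    amplitude: matching list of amplitude samples.
    carrier_hz, stretch: hypothetical audio carrier and stretch factor.
    """
    out = []
    osc_phase = 0.0
    prev = phase[0]  # so the first sample has zero phase derivative
    for p, a in zip(phase, amplitude):
        # Phase derivative expressed as an instantaneous frequency offset in Hz.
        inst_hz = (p - prev) * sample_rate / (2 * math.pi)
        prev = p
        # Stretch the offset so small variations become audible, then add the carrier.
        freq = carrier_hz + stretch * inst_hz
        # Oscillator: integrate frequency into phase.
        osc_phase += 2 * math.pi * freq / sample_rate
        # Amplitude-modulate the oscillator output.
        out.append(a * math.sin(osc_phase))
    return out
```

With a constant input phase the output is a plain 800 Hz tone scaled by the amplitude samples; any drift in the narrow-band signal's frequency shows up as a proportionally larger pitch change around the carrier.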
  • Patent number: 7173178
    Abstract: A singing voice synthesizing method and a singing voice synthesizing apparatus in which the singing voice is synthesized using performance data such as MIDI data. The performance data entered is analyzed as the musical information of the sound pitch, sound duration and the lyric (S2, S3). From the analyzed music information, the lyric is accorded to a string of sounds to form singing voice data (S5, S6). The speech waveform of the singing voice is formulated from the singing voice data (S7, S8). The waveform of the music sound is formulated from the input performance data (S14). The portion of the performance data used for the singing voice is desirably not used in reproducing the music sound, or lowered in the reproducing sound volume. A program, a recording medium and a robot apparatus, in which the singing voice is synthesized from performance data, are also disclosed.
    Type: Grant
    Filed: March 15, 2004
    Date of Patent: February 6, 2007
    Assignee: Sony Corporation
    Inventor: Kenichiro Kobayashi
  • Patent number: 7162424
    Abstract: The invention relates to a method for defining a sequence of sound modules for synthesis of a speech signal in a tonal language corresponding to a sequence of speech modules. The method according to the invention differs from known methods in that the speech modules represent triphones, which each comprise one phoneme with the respective context, and with syllables in the tonal language being composed of one or more triphones. This results in a high level of flexibility for the synthesis of tonal languages.
    Type: Grant
    Filed: April 26, 2002
    Date of Patent: January 9, 2007
    Assignee: Siemens Aktiengesellschaft
    Inventors: Martin Holzapfel, Jianhua Tao
  • Patent number: 7117154
    Abstract: A voice converter synthesizes an output voice signal from an input voice signal and a reference voice signal. In the voice converter, an analyzer device analyzes a plurality of sinusoidal wave components contained in the input voice signal to derive a parameter set of an original frequency and an original amplitude representing each sinusoidal wave component. A source device provides reference information characteristic of the reference voice signal. A modulator device modulates the parameter set of each sinusoidal wave component according to the reference information. A regenerator device operates according to each of the modulated parameter sets to regenerate each of the sinusoidal wave components so that at least one of the frequency and the amplitude of each regenerated sinusoidal wave component differs from the original, and mixes the regenerated sinusoidal wave components together to synthesize the output voice signal.
    Type: Grant
    Filed: October 27, 1998
    Date of Patent: October 3, 2006
    Assignees: Yamaha Corporation, Pompeu Fabra University
    Inventors: Yasuo Yoshioka, Xavier Serra
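A rough way to picture the analyze, modulate and regenerate chain of patent 7117154 is additive resynthesis from (frequency, amplitude) pairs. The sketch below assumes the analyzer has already produced those pairs; the constant ratios in the hypothetical `modulate` function stand in for the reference-voice information that drives the modulator in the abstract.

```python
import math


def modulate(params, freq_ratio, amp_ratio):
    """Hypothetical modulator: scale every (frequency, amplitude) pair.

    Constant ratios are an illustrative assumption; the abstract derives the
    modulation from reference-voice information.
    """
    return [(f * freq_ratio, a * amp_ratio) for f, a in params]


def regenerate(params, sample_rate, num_samples):
    """Regenerate each sinusoidal component and mix them into one signal."""
    out = [0.0] * num_samples
    for freq, amp in params:
        for n in range(num_samples):
            out[n] += amp * math.sin(2 * math.pi * freq * n / sample_rate)
    return out


# Two analyzed components of an input voice: 220 Hz and 440 Hz.
original = [(220.0, 0.5), (440.0, 0.25)]
converted = modulate(original, freq_ratio=1.5, amp_ratio=0.8)
output = regenerate(converted, sample_rate=8000, num_samples=80)
```

Keeping modulation separate from regeneration mirrors the abstract's split between the modulator and regenerator devices, so a different modulation rule can be swapped in without touching the synthesis loop.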
  • Patent number: 7113909
    Abstract: A stereotypical sentence is synthesized into a voice of an arbitrary speech style. A third party is able to prepare prosody data and a user of a terminal device having a voice synthesizing part can acquire the prosody data. The voice synthesizing method determines a voice-contents identifier to point to a type of voice contents of a stereotypical sentence, prepares a speech style dictionary including speech style and prosody data which correspond to the voice-contents identifier, selects prosody data of the synthesized voice to be generated from the speech style dictionary, and adds the selected prosody data to a voice synthesizer 13 as voice-synthesizer driving data to thereby perform voice synthesis with a specific speech style. Thus, a voice of a stereotypical sentence can be synthesized with an arbitrary speech style.
    Type: Grant
    Filed: July 31, 2001
    Date of Patent: September 26, 2006
    Assignee: Hitachi, Ltd.
    Inventors: Nobuo Nukaga, Kenji Nagamatsu, Yoshinori Kitahara
  • Patent number: 7096183
    Abstract: A method is provided for customizing the speaking style of a speech synthesizer. The method includes: receiving input text; determining semantic information for the input text; determining a speaking style for rendering the input text based on the semantic information; and customizing the audible speech output of the speech synthesizer based on the identified speaking style.
    Type: Grant
    Filed: February 27, 2002
    Date of Patent: August 22, 2006
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventor: Jean-Claude Junqua
  • Patent number: 7069210
    Abstract: Method of and system for coding a sound signal (10) as multiple independent streams of frames (14, 15) by creating frames (1,2,3,4,5,6) using sinusoidal coding and then placing frame i into stream i modulo the number of streams, method of and system for reconstructing a sound signal (23) by decoding frames from multiple streams (21, 22) in an interleaved fashion and reconstructing missing frames by using information from surrounding frames, system for recording and playing back sound signals implementing the above two methods, where under normal circumstances both streams (31, 32) of a coded signal are stored, and when capacity on the storage medium (35) is low, only one of the two streams of a coded signal is stored while one of the two streams of existing coded signals is overwritten and allowing a decoder (37) to reconstruct a sound signal by using either both or the one available stream for that sound signal.
    Type: Grant
    Filed: November 29, 2000
    Date of Patent: June 27, 2006
    Assignee: Koninklijke Philips Electronics N.V.
    Inventor: Rakesh Taori
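The stream layout of patent 7069210 (frame i goes to stream i modulo the number of streams) and the decoder's fallback when a stream is missing can be sketched as follows. Each frame is reduced to a single float for brevity, and averaging the neighbouring frames is an assumed, simplistic stand-in for the patent's reconstruction from surrounding-frame information; the function names are hypothetical.

```python
def split_into_streams(frames, num_streams=2):
    """Place frame i into stream i modulo num_streams."""
    streams = [[] for _ in range(num_streams)]
    for i, frame in enumerate(frames):
        streams[i % num_streams].append(frame)
    return streams


def merge_streams(streams, total_frames):
    """Decode frames in interleaved order; a stream that was not stored is None.

    Missing frames are filled with the mean of their available neighbours,
    an assumed stand-in for the patent's reconstruction method.
    """
    num_streams = len(streams)
    frames = []
    for i in range(total_frames):
        stream = streams[i % num_streams]
        frames.append(stream[i // num_streams] if stream is not None else None)
    for i in range(total_frames):
        if frames[i] is None:
            prev = frames[i - 1] if i > 0 else None
            nxt = frames[i + 1] if i + 1 < total_frames else None
            neighbours = [f for f in (prev, nxt) if f is not None]
            frames[i] = sum(neighbours) / len(neighbours) if neighbours else 0.0
    return frames


frames = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
streams = split_into_streams(frames)                   # [[1.0, 3.0, 5.0], [2.0, 4.0, 6.0]]
both = merge_streams(streams, len(frames))             # identical to frames
one = merge_streams([streams[0], None], len(frames))   # [1.0, 2.0, 3.0, 4.0, 5.0, 5.0]
```

With both streams present the original sequence comes back unchanged; with only one stream stored, every other frame is rebuilt, which corresponds to the low-capacity recording mode described in the abstract.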
  • Patent number: 7069212
    Abstract: An audio decoding apparatus decodes high frequency component signals using a band expander that generates multiple high frequency subband signals from low frequency subband signals divided into multiple subbands and transmitted high frequency encoded information. The apparatus is provided with an aliasing detector and an aliasing remover. The aliasing detector detects the degree of occurrence of aliasing components in the multiple high frequency subband signals generated by the band expander. The aliasing remover suppresses aliasing components in the high frequency subband signals by adjusting the gain used to generate the high frequency subband signals. Thus occurrence of aliasing can be suppressed and the resulting degradation in sound quality can be reduced, even when real-valued subband signals are used in order to reduce the number of operations.
    Type: Grant
    Filed: September 11, 2003
    Date of Patent: June 27, 2006
    Assignees: Matsushita Electric Industrial Co., Ltd., NEC Corporation
    Inventors: Naoya Tanaka, Osamu Shimada, Mineo Tsushima, Takeshi Norimatsu, Kok Seng Chong, Kim Hann Kuah, Sua Hong Neo, Toshiyuki Nomura, Yuichiro Takamizawa, Masahiro Serizawa
  • Patent number: 7065489
    Abstract: A voice synthesizing apparatus comprises: a memory that stores phoneme pieces having a plurality of different pitches for each phoneme represented by a same phoneme symbol; a reading device that reads a phoneme piece by using a pitch as an index; and a voice synthesizer that synthesizes a voice in accordance with the read phoneme piece.
    Type: Grant
    Filed: March 8, 2002
    Date of Patent: June 20, 2006
    Assignee: Yamaha Corporation
    Inventors: Yuji Hisaminato, Jordi Bonada Sanjaume
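The pitch-indexed memory of patent 7065489 can be sketched as a table of pieces per phoneme symbol, sorted by the pitch at which each piece was recorded. Returning the piece with the nearest stored pitch is an assumption, since the abstract only says the pitch is used as an index; the class and method names are hypothetical, and pieces are plain labels here rather than waveforms.

```python
from bisect import bisect_left


class PhonemePieceMemory:
    """Hypothetical store holding pieces of one phoneme recorded at several pitches."""

    def __init__(self):
        self._pieces = {}  # phoneme symbol -> list of (pitch_hz, piece), sorted by pitch

    def store(self, phoneme, pitch_hz, piece):
        entries = self._pieces.setdefault(phoneme, [])
        entries.append((pitch_hz, piece))
        entries.sort(key=lambda entry: entry[0])

    def read(self, phoneme, pitch_hz):
        """Use the target pitch as the index; return the piece with the nearest pitch."""
        entries = self._pieces[phoneme]
        pitches = [p for p, _ in entries]
        i = bisect_left(pitches, pitch_hz)
        neighbours = entries[max(i - 1, 0):i + 1]  # at most the two closest entries
        return min(neighbours, key=lambda entry: abs(entry[0] - pitch_hz))[1]


memory = PhonemePieceMemory()
memory.store("a", 220.0, "a_mid")
memory.store("a", 120.0, "a_low")
memory.store("a", 330.0, "a_high")
print(memory.read("a", 200.0))  # a_mid
```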
  • Patent number: 7058571
    Abstract: A wideband, high quality audio signal is decoded with few calculations at a low bitrate. Unwanted spectrum components accompanying sinusoidal signal injection by a synthesis subband filter built with real-value operations are suppressed by inserting a suppression signal to subbands adjacent to the subband to which the sine wave is injected. This makes it possible to inject a desired sinusoid with few calculations.
    Type: Grant
    Filed: July 30, 2003
    Date of Patent: June 6, 2006
    Assignees: Matsushita Electric Industrial Co., Ltd., NEC Corporation
    Inventors: Mineo Tsushima, Naoya Tanaka, Takeshi Norimatsu, Kok Seng Chong, Kim Hann Kuah, Sua Hong Neo, Toshiyuki Nomura, Osamu Shimada, Yuichiro Takamizawa, Masahiro Serizawa
  • Patent number: 7013278
    Abstract: A method for generating concatenative speech uses a speech synthesis input to populate a triphone-indexed database that is later used for searching and retrieval to create a phoneme string acceptable for a text-to-speech operation. Prior to initiating the “real time” synthesis process, a database is created of all possible triphone contexts by inputting a continuous stream of speech. The speech data is then analyzed to identify all possible triphone sequences in the stream, and the various units chosen for each context. During a later text-to-speech operation, the triphone contexts in the text are identified and the triphone-indexed phonemes in the database are searched to retrieve the best-matched candidates.
    Type: Grant
    Filed: September 5, 2002
    Date of Patent: March 14, 2006
    Assignee: AT&T Corp.
    Inventor: Alistair D. Conkie
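The two phases in patent 7013278, building a triphone-indexed database offline and looking up triphone contexts at synthesis time, can be sketched with a dictionary keyed by (left, centre, right) phoneme tuples. The `sil` padding symbol and the function names are assumptions, and ranking the retrieved candidates to pick the best match is left out.

```python
from collections import defaultdict

SILENCE = "sil"  # assumed padding symbol at utterance boundaries


def triphones(phonemes):
    """Yield a (left, centre, right) context for every phoneme in the sequence."""
    padded = [SILENCE, *phonemes, SILENCE]
    for i in range(1, len(padded) - 1):
        yield padded[i - 1], padded[i], padded[i + 1]


def build_index(recorded_phonemes):
    """Offline step: map each triphone context to the unit positions where it occurs."""
    index = defaultdict(list)
    for position, context in enumerate(triphones(recorded_phonemes)):
        index[context].append(position)
    return index


def lookup(index, target_phonemes):
    """Synthesis step: candidate unit positions for each triphone of the target."""
    return [index.get(context, []) for context in triphones(target_phonemes)]


index = build_index(["h", "e", "l", "o", "h", "e", "l", "p"])
print(lookup(index, ["h", "e", "l"]))  # [[0], [1, 5], []]
```

The empty list for the final context shows why a real system needs a back-off when an exact triphone never occurred in the recorded stream.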
  • Patent number: 7003467
    Abstract: The present invention provides a method of decoding two-channel matrix encoded audio to reconstruct multichannel audio that more closely approximates a discrete surround-sound presentation. This is accomplished by subband filtering the two-channel matrix encoded audio, mapping each of the subband signals into an expanded sound field to produce multichannel subband signals, and synthesizing those subband signals to reconstruct multichannel audio. By steering the subbands separately about an expanded sound field, various sounds can be simultaneously positioned about the sound field at different points allowing for more accurate placement and more distinct definition of each sound element.
    Type: Grant
    Filed: October 6, 2000
    Date of Patent: February 21, 2006
    Assignee: Digital Theater Systems, Inc.
    Inventors: William P. Smith, Stephen M. Smyth, Ming Yan
  • Patent number: 6978241
    Abstract: An analyzer determines frequencies and amplitudes of an audio signal represented by sinusoids for transmission to a receiver decoder, which includes a synthesizer to reconstruct the audio signal. A pitch detector determines the pitch for transmission to the receiver along with the structure of the spectrum of the speech signal. The structure of the spectrum is often transmitted in the form of LPC parameters. To correct for frequency changes of the periodic component of an audio signal, a frequency change determiner determines a change of the frequency of the periodic component over the analysis period. This change of frequency is transmitted to the decoder for increasing the accuracy of the reconstruction of the audio signal. Further, the frequency change is only used to obtain a more accurate value of the pitch. The frequency change is determined by using a time warper which performs a time transformation such that a time transformed audio signal is obtained with a minimum frequency change.
    Type: Grant
    Filed: May 22, 2000
    Date of Patent: December 20, 2005
    Assignee: Koninklijke Philips Electronics, N.V.
    Inventors: Robert Johannes Sluijter, Augustus Josephus Elizabeth Maria Janssen
  • Patent number: 6975987
    Abstract: The present invention provides pitch conversion processing technology capable of minimizing distortion of the naturalness of speech sound. A speech waveform in a pitch unit is considered to be divided into two segments: 1) a first segment, which starts from the minus peak and in which the waveform depending on the shape of the vocal tract appears, and 2) a second segment, in which the waveform depending on the vocal tract shape attenuates and converges on the next minus peak. The first segment begins at the point where a minus peak appears along with the glottal closure. Based on these characteristics of speech waveforms, the present invention performs the waveform processing for pitch conversion in the second segment, just before the next minus peak, which is least affected by the minus peak associated with the glottal closure. As such, the waveform processing can be performed while keeping the complete contour of the waveform around the peak, thereby reducing the effects of pitch conversion.
    Type: Grant
    Filed: October 4, 2000
    Date of Patent: December 13, 2005
    Assignee: Arcadia, Inc.
    Inventors: Seiichi Tenpaku, Toshio Hirai
  • Patent number: 6970819
    Abstract: The principal object of this invention is to provide a suitable control method for closing length with respect to phonemes (such as unvoiced plosive consonants) having a closing interval, and as a result an improved rule-based speech synthesis device is provided. A phoneme type judgement part 201 judges whether the phoneme in question is a vowel or consonant and, in the case of a consonant, judges whether or not it is a consonant that anteriorly has a closing interval. As a result, it operates a vowel length estimation part 202 when it judges that the phoneme is a vowel and operates a consonant length estimation part 205 when it judges that the phoneme is a consonant, and when it has judged that this phoneme anteriorly has a closing interval, it operates a closing length estimation part 208, whereby the respective time lengths are estimated. After that, the estimated time lengths are set by vowel length setting part 203, consonant length setting part 206 and closing length setting part 209, respectively.
    Type: Grant
    Filed: October 27, 2000
    Date of Patent: November 29, 2005
    Assignee: Oki Electric Industry Co., Ltd.
    Inventor: Yukio Tabei
  • Patent number: 6961704
    Abstract: An arrangement is provided for text to speech processing based on linguistic prosodic models. Linguistic prosodic models are established to characterize different linguistic prosodic characteristics. When an input text is received, a target unit sequence is generated with a linguistic target that annotates target units in the target unit sequence with a plurality of linguistic prosodic characteristics so that speech synthesized in accordance with the target unit sequence and the linguistic target has certain desired prosodic properties. A unit sequence is selected in accordance with the target unit sequence and the linguistic target based on joint cost information evaluated using established linguistic prosodic models. The selected unit sequence is used to produce synthesized speech corresponding to the input text.
    Type: Grant
    Filed: January 31, 2003
    Date of Patent: November 1, 2005
    Assignee: Speechworks International, Inc.
    Inventors: Michael S. Phillips, Daniel S. Faulkner, Marek A. Przezdzieci
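Patent 6961704 selects a unit sequence using target and joint costs. A common way to carry out such a search is dynamic programming over the candidate lattice (a Viterbi search); the sketch below shows that generic technique, not the patent's specific method. The cost functions are caller-supplied placeholders for the linguistic prosodic models, and the toy usage treats units as pitch values.

```python
def select_units(candidates, target_cost, join_cost):
    """Viterbi search: one candidate per position, minimising target + join cost.

    candidates: one list of candidate units per target position.
    target_cost(i, unit) and join_cost(prev_unit, unit) are supplied by the
    caller; here they are placeholders for model-based costs.
    """
    # best[i][k] holds (accumulated cost, index of best predecessor) for candidates[i][k].
    best = [[(target_cost(0, u), None) for u in candidates[0]]]
    for i in range(1, len(candidates)):
        row = []
        for u in candidates[i]:
            cost, back = min(
                (best[i - 1][k][0] + join_cost(p, u), k)
                for k, p in enumerate(candidates[i - 1])
            )
            row.append((cost + target_cost(i, u), back))
        best.append(row)
    # Trace back from the cheapest candidate at the last position.
    k = min(range(len(best[-1])), key=lambda j: best[-1][j][0])
    path = []
    for i in range(len(candidates) - 1, -1, -1):
        path.append(candidates[i][k])
        k = best[i][k][1]
    return path[::-1]


targets = [100, 120, 110]                       # toy target pitches (Hz)
candidates = [[95, 130], [118, 150], [90, 112]]
path = select_units(
    candidates,
    target_cost=lambda i, u: abs(u - targets[i]),
    join_cost=lambda p, u: 0.5 * abs(p - u),
)
print(path)  # [95, 118, 112]
```

Because each position only looks back one step, the search cost grows linearly with sequence length and quadratically with the number of candidates per position.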
  • Patent number: 6950798
    Abstract: A text-to-speech synthesizer employs a database that includes units. For each unit there is a collection of unit selection parameters and a plurality of frames. Each frame has a set of model parameters derived from a base speech frame, and a speech frame synthesized from the frame's model parameters. A text to be synthesized is converted to a sequence of desired unit feature sets, and for each such set the database is searched to retrieve a best-matching unit. An assessment is made of whether modifications to the frames are needed, because of discontinuities in the model parameters at unit boundaries, or because of differences between the desired and selected unit features. When modifications are necessary, the model parameters of frames that need to be altered are modified, and new frames are synthesized from the modified model parameters and concatenated to the output. Otherwise, the speech frames previously stored in the database are retrieved and concatenated to the output.
    Type: Grant
    Filed: March 2, 2002
    Date of Patent: September 27, 2005
    Assignee: AT&T Corp.
    Inventors: Mark Charles Beutnagel, David A. Kapilow, Ioannis G. Stylianou, Ann K. Syrdal
  • Patent number: 6928410
    Abstract: A method and apparatus for modification of a speech signal indicative of a stream of speech data having a plurality of syllables. The method comprises the steps of mapping the stream of speech data from the speech signal into a stream of tone data according to a linguistic rule regarding the syllables for providing a tone signal indicative of the stream of tone data; forming a string of musical notes responsive to the tone signal for providing a carrier signal indicative of the string of musical notes; modulating the carrier signal with the speech signal for providing a modified signal; and providing an audible signal representative of the speech signal, musically modified according to the linguistic rule. The linguistic rule includes an assignment of a tone to a syllable of the speech data based on a vowel of the syllable, a consonant of the syllable, or the intonation of the syllable for a monosyllabic language.
    Type: Grant
    Filed: November 6, 2000
    Date of Patent: August 9, 2005
    Assignee: Nokia Mobile Phones Ltd.
    Inventors: Juha Marila, Sami Ronkainen, Mika Röykkee, Fumiko Ichikawa
  • Patent number: 6882976
    Abstract: An efficient finite length POW10 calculation for MPEG audio encoding. A method for encoding an audio input signal includes storing a plurality of predetermined tonal values corresponding to a plurality of predetermined power levels. The method also includes receiving a plurality of input values each representative of a power level of a spectral component of the audio input signal at a corresponding frequency sub-band and accessing at least one corresponding tonal value of the plurality of predetermined tonal values. The method further includes generating an encoded output signal representative of the audio input signal by using at least one corresponding tonal value for each of the plurality of input values. Further, the storing of the plurality of predetermined tonal values is performed prior to the receiving of the plurality of input values.
    Type: Grant
    Filed: February 28, 2001
    Date of Patent: April 19, 2005
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Wei-Lien Hsu, Travis Wheatley
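The idea in patent 6882976 of storing precomputed values before any input arrives, then replacing each power calculation with a table access, can be sketched as a lookup table for 10^(L/10). The decibel mapping, the 0.25 dB step and the -96 to 0 dB range are assumptions chosen for illustration, not values from the patent.

```python
STEP_DB = 0.25                  # assumed table resolution
MIN_DB, MAX_DB = -96.0, 0.0     # assumed range of power levels

# Built once, before any input values are received.
POW10_TABLE = [
    10 ** ((MIN_DB + k * STEP_DB) / 10)
    for k in range(round((MAX_DB - MIN_DB) / STEP_DB) + 1)
]


def pow10_lookup(level_db):
    """Approximate 10 ** (level_db / 10) by reading the nearest table entry."""
    clamped = min(max(level_db, MIN_DB), MAX_DB)
    k = round((clamped - MIN_DB) / STEP_DB)
    return POW10_TABLE[k]


gain = pow10_lookup(-3.1)  # nearest table entry is the one for -3.0 dB
```

The trade-off is memory for speed: a finer step shrinks the quantization error (at most half a step, here 0.125 dB) but enlarges the table.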
  • Patent number: RE39336
    Abstract: The concatenative speech synthesizer employs demi-syllable subword units to generate speech. The synthesizer is based on a source-filter model that uses source signals that correspond closely to the human glottal source and that uses filter parameters that correspond closely to the human vocal tract. Concatenation of the demi-syllable units is facilitated by two separate cross fade techniques, one applied in the time domain to the demi-syllable source signal waveforms, and one applied in the frequency domain by interpolating the corresponding filter parameters of the concatenated demi-syllables. The dual cross fade technique results in natural sounding synthesis that avoids time-domain glitches without degrading or smearing characteristic resonances in the filter domain.
    Type: Grant
    Filed: November 5, 2002
    Date of Patent: October 10, 2006
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Steve Pearson, Nicholas Kibre, Nancy Niedzielski
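The dual cross fade of patent RE39336 combines two operations at each demi-syllable join: a time-domain cross fade of the overlapping source samples and an interpolation of the filter parameters across the join. The sketch assumes linear ramps and direct interpolation of a plain parameter vector; in practice a representation that stays stable under interpolation (for example, line spectral frequencies rather than raw LPC coefficients) is usually chosen. The function names are hypothetical.

```python
def crossfade_sources(tail, head):
    """Time-domain cross fade of the overlapping source-signal samples.

    tail: last samples of the first demi-syllable's source waveform.
    head: first samples of the second one (same length as tail).
    """
    n = len(tail)
    if n != len(head):
        raise ValueError("overlap regions must have equal length")
    if n == 1:
        return [0.5 * (tail[0] + head[0])]
    return [(1 - k / (n - 1)) * t + (k / (n - 1)) * h
            for k, (t, h) in enumerate(zip(tail, head))]


def interpolate_filters(params_a, params_b, num_frames):
    """Frequency-domain side: linearly interpolate filter parameter vectors."""
    frames = []
    for j in range(num_frames):
        w = j / (num_frames - 1) if num_frames > 1 else 0.5
        frames.append([(1 - w) * a + w * b for a, b in zip(params_a, params_b)])
    return frames


joined_source = crossfade_sources([0.8, 0.6, 0.4, 0.2], [0.1, 0.3, 0.5, 0.7])
filter_track = interpolate_filters([0.9, -0.3], [0.7, -0.1], num_frames=5)
```

Handling the source and the filter separately is what lets the time-domain fade remove waveform glitches while the filter interpolation keeps formant resonances from being smeared.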