Frequency Element Patents (Class 704/268)
  • Patent number: 6873954
    Abstract: Audio artifacts caused by overrun or underrun of a playout buffer, which arise when the sending and receiving sides do not sample at exactly the same rate, are reduced. An LPC residual is modified on a sample-by-sample basis: an LPC-residual block of N samples is converted to a block of N+1 or N-1 samples. A sample rate controller decides whether samples should be added to or removed from the LPC residual. The exact position at which to add or remove samples is either chosen arbitrarily or found by searching for low-energy segments in the LPC residual. A speech synthesiser module then reproduces the speech. With this sample rate conversion method the playout buffer can be controlled continuously. Furthermore, since the method works sample by sample, the buffer can be kept to a minimum, so no extra delay is introduced.
    Type: Grant
    Filed: September 5, 2000
    Date of Patent: March 29, 2005
    Assignee: Telefonaktiebolaget LM Ericsson (publ)
    Inventors: Jim Sundqvist, Tomas Frankkila, Anders Nohlgren
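The add-or-remove step above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the patented implementation: the controller rule (`choose_action`), its buffer-fill thresholds, the window length, and the value given to an inserted sample are all hypothetical. Only the idea of lengthening or shortening a residual block by one sample at a low-energy position comes from the abstract.

```python
def choose_action(buffer_fill, target_fill, tolerance):
    """Hypothetical sample rate controller: -1 drops a sample, +1 adds one, 0 does nothing."""
    if buffer_fill > target_fill + tolerance:
        return -1  # buffer filling up (risk of overrun): shorten the block
    if buffer_fill < target_fill - tolerance:
        return +1  # buffer draining (risk of underrun): lengthen the block
    return 0


def adjust_residual_block(residual, action, window=4):
    """Return an N+1, N-1 or N sample copy of an LPC-residual block.

    Blocks shorter than `window` are returned unchanged in this sketch.
    """
    out = list(residual)
    n = len(out)
    if action == 0 or n < window:
        return out
    # Energy of every run of `window` consecutive samples.
    energies = [sum(x * x for x in out[i:i + window]) for i in range(n - window + 1)]
    start = min(range(len(energies)), key=energies.__getitem__)
    pos = start + window // 2  # middle of the quietest window
    if action < 0:
        del out[pos]
    else:
        # Hypothetical choice: the new sample is the mean of its neighbours.
        left = out[pos - 1] if pos > 0 else 0.0
        out.insert(pos, 0.5 * (left + out[pos]))
    return out
```

Placing the change in the quietest stretch of the residual is the abstract's second option; the arbitrary-position option would simply replace the energy search with a fixed or random `pos`.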
  • Patent number: 6862568
    Abstract: A method for converting text to concatenated voice by utilizing a digital voice library and a set of playback rules is provided. The method comprises generating voice data based on a sequence of voice recordings by concatenating adjacent recordings in the sequence of voice recordings. Concatenating a first recording and a second recording adjacent to the first recording includes manipulating the ending sonic feature of the first recording to determine a first recording switch point, manipulating the starting sonic feature of the second recording to determine a second recording switch point, and synchronizing the first recording switch point and the second recording switch point.
    Type: Grant
    Filed: March 27, 2001
    Date of Patent: March 1, 2005
    Assignee: Qwest Communications International, Inc.
    Inventor: Eliot M. Case
  • Patent number: 6845359
    Abstract: A Fast Fourier Transform (FFT) based voice synthesis method, program product and vocoder. Sounds, e.g., speech and audio, are synthesized from multiple sine waves. Each sine wave component is represented by a small number of FFT coefficients. Amplitude and phase information of the components may be incorporated into these coefficients. The FFT coefficients corresponding to each of the components are summed, and an inverse FFT is then applied to the sum to generate a time-domain signal. An appropriate section is extracted from the inverse-transformed time-domain signal as an approximation to the desired output. FFT-based synthesis may be combined with simple sine wave summation, using FFT-based synthesis for complex sounds, e.g., male voices and unvoiced speech, and sine wave summation for simpler sounds, e.g., female voices.
    Type: Grant
    Filed: March 22, 2001
    Date of Patent: January 18, 2005
    Assignee: Motorola, Inc.
    Inventor: Tenkasi Ramabadran
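A small sketch of the FFT-based idea: put each sinusoid's amplitude and phase into a spectral coefficient, sum the coefficients, invert, and keep a section of the result. It assumes each component falls exactly on one DFT bin, so a single coefficient (plus its mirror image) represents it, and it uses a naive inverse DFT in place of an inverse FFT to stay dependency-free. The function name and parameters are hypothetical.

```python
import cmath
import math


def fft_sine_synthesis(components, n_fft, out_len):
    """Synthesize a sum of sinusoids by building a spectrum and inverting it.

    components: (bin_index, amplitude, phase) tuples with 0 < bin_index < n_fft // 2.
    One bin per component is a simplification of the "small number of FFT
    coefficients" per component described in the abstract.
    """
    spectrum = [0j] * n_fft
    for k, amp, phase in components:
        coeff = 0.5 * n_fft * amp * cmath.exp(1j * phase)  # amplitude and phase in one coefficient
        spectrum[k] += coeff
        spectrum[n_fft - k] += coeff.conjugate()  # Hermitian symmetry gives a real signal
    # Naive O(N^2) inverse DFT, standing in for an inverse FFT.
    signal = []
    for n in range(n_fft):
        acc = sum(spectrum[k] * cmath.exp(2j * math.pi * k * n / n_fft) for k in range(n_fft))
        signal.append((acc / n_fft).real)
    # Extract the section used as the output frame.
    return signal[:out_len]
```

Because every component here sits on a bin, the output matches direct cosine summation; off-bin components would need several coefficients each, as the abstract describes.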
  • Publication number: 20040267532
    Abstract: An audio encoder, for encoding an input signal, comprising:
    Type: Application
    Filed: June 29, 2004
    Publication date: December 30, 2004
    Applicant: Nokia Corporation
    Inventor: Alastair Black
  • Patent number: 6836761
    Abstract: A voice converting apparatus is constructed for converting an input voice into an output voice according to a target voice. In the apparatus, a storage section provisionally stores source data, which is associated with and extracted from the target voice. An analyzing section analyzes the input voice to extract therefrom a series of input data frames representing the input voice. A producing section produces a series of target data frames representing the target voice based on the source data, while aligning the target data frames with the input data frames to secure synchronization between the target data frames and the input data frames. A synthesizing section synthesizes the output voice according to the target data frames and the input data frames.
    Type: Grant
    Filed: October 20, 2000
    Date of Patent: December 28, 2004
    Assignees: Yamaha Corporation, Pompeu Fabra University
    Inventors: Takahiro Kawashima, Yasuo Yoshioka, Pedro Cano, Alex Loscos, Xavier Serra, Mark Schiementz, Jordi Bonada
  • Patent number: 6829939
    Abstract: Sounds input via a sound input device are corrected in accordance with age-based hearing characteristic data read from a memory, or with correction values, relative to reference hearing characteristics, for individual hearing characteristics measured by an individual hearing characteristics measurement device. The sounds corresponding to the hearing characteristics are measured and shown on the display.
    Type: Grant
    Filed: April 29, 2002
    Date of Patent: December 14, 2004
    Assignees: National Institute of Advanced Industrial Science and Technology, Kenji Kurakata
    Inventors: Kenji Kurakata, Yasuo Kuchinomachi
  • Patent number: 6832192
    Abstract: A speech synthesizing apparatus acquires a synthesis unit speech segment divided as a speech synthesis unit, and acquires partial speech segments by dividing the synthesis unit speech segment with a phoneme boundary. The power value required for each partial speech segment is estimated on the basis of a target power value in reproduction. An amplitude magnification is acquired from the ratio of the estimated power value to the reference power value for each of the partial speech segments. Synthesized speech is generated by changing the amplitude of each partial speech segment of the synthesis unit speech segment on the basis of the acquired amplitude magnification.
    Type: Grant
    Filed: March 29, 2001
    Date of Patent: December 14, 2004
    Assignee: Canon Kabushiki Kaisha
    Inventor: Masayuki Yamada
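The gain step can be illustrated as follows. This sketch assumes the reference power of each partial segment is its own measured mean power, and that the amplitude magnification is the square root of the power ratio (power grows with the square of amplitude); the abstract spells out neither detail, and the function names are hypothetical.

```python
import math


def amplitude_magnification(estimated_power, reference_power):
    """Amplitude factor that moves a segment from reference_power to estimated_power.

    Assumption: square root of the power ratio, since power scales with amplitude squared.
    """
    return math.sqrt(estimated_power / reference_power)


def mean_power(segment):
    """Mean-square power of a segment (used here as its reference power)."""
    return sum(x * x for x in segment) / len(segment)


def scale_partial_segments(segments, estimated_powers):
    """Scale each partial speech segment toward its estimated target power."""
    scaled = []
    for seg, p_est in zip(segments, estimated_powers):
        gain = amplitude_magnification(p_est, mean_power(seg))
        scaled.append([gain * x for x in seg])
    return scaled
```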
  • Patent number: 6826531
    Abstract: A speech information processing apparatus synthesizes speech with natural intonation by modeling the change over time in the fundamental frequency of a predetermined phoneme unit. When a predetermined unit of a phonological series is input, fundamental frequencies of the respective phonemes making up the phonological series are generated based on a segment pitch pattern model. Phonemes are synthesized based on the generated fundamental frequencies of the respective phonemes.
    Type: Grant
    Filed: March 28, 2001
    Date of Patent: November 30, 2004
    Assignee: Canon Kabushiki Kaisha
    Inventor: Toshiaki Fukada
  • Patent number: 6816833
    Abstract: An audio processing apparatus is constructed for generating an auxiliary audio signal based on an original audio signal and mixing the auxiliary audio signal with the original audio signal. In the apparatus, a control section designates a pitch of the auxiliary audio signal. A processing section processes the original audio signal under control of the control section to generate the auxiliary audio signal having the designated pitch, and applies a first effect to the generated auxiliary audio signal. An effector section applies a second effect, different from the first effect, to the original audio signal. An output section outputs the original audio signal with the second effect applied concurrently with the auxiliary audio signal with the first effect applied. The control section may control the processing section to alter the first effect depending on the difference between the pitch of the original audio signal and the designated pitch of the auxiliary audio signal.
    Type: Grant
    Filed: October 30, 1998
    Date of Patent: November 9, 2004
    Assignee: Yamaha Corporation
    Inventors: Kazuhide Iwamoto, Shinichi Ito
  • Patent number: 6778962
    Abstract: A speech synthesizing method includes: determining the accent type of the input character string; selecting prosodic model data from a prosody dictionary, which stores typical prosodic models representing the prosodic information for the character strings in a word dictionary, based on the input character string and the accent type; transforming the prosodic information of the prosodic model when the character string of the selected prosodic model does not match the input character string; selecting the waveform data corresponding to each character of the input character string from a waveform dictionary, based on the transformed prosodic model data; and connecting the selected waveform data to each other. Differences between an input character string and the character strings stored in a dictionary are thereby absorbed, making it possible to synthesize a natural-sounding voice.
    Type: Grant
    Filed: July 21, 2000
    Date of Patent: August 17, 2004
    Assignees: Konami Corporation, Konami Computer Entertainment Tokyo, Inc.
    Inventors: Osamu Kasai, Toshiyuki Mizoguchi
  • Publication number: 20040148172
    Abstract: A method and apparatus for synthesizing audible phrases (words) that includes capturing a spoken utterance, which may be a word, and extracting prosodic information (parameters) therefrom, then applying the prosodic parameters to a synthesized (nominal) word to produce a prosodic mimic word corresponding to the spoken utterance and the nominal word.
    Type: Application
    Filed: September 8, 2003
    Publication date: July 29, 2004
    Applicant: Voice Signal Technologies, Inc.
    Inventors: Jordan Cohen, Daniel L. Roth, Igor Zlokarnik
  • Patent number: 6754630
    Abstract: In a method of synthesizing voiced speech from pitch prototype waveforms by time-synchronous waveform interpolation (TSWI), one or more pitch prototypes are extracted from a speech signal or a residue signal. The extraction is performed so that the prototype has minimum energy at the boundary. Each prototype is circularly shifted so as to be time-synchronous with the original signal. A linear phase shift is applied to each extracted prototype relative to the previously extracted prototype so as to maximize the cross-correlation between successive extracted prototypes. A two-dimensional prototype-evolving surface is constructed by upsampling the prototypes to every sample point. The two-dimensional prototype-evolving surface is re-sampled to generate a one-dimensional, synthesized signal frame with sample points defined by piecewise continuous cubic phase contour functions computed from the pitch lags and the phase shifts added to the extracted prototypes.
    Type: Grant
    Filed: November 13, 1998
    Date of Patent: June 22, 2004
    Assignee: Qualcomm, Inc.
    Inventors: Amitava Das, Eddie L. T. Choy
  • Patent number: 6741666
    Abstract: A method and a device by which original digital signals are analysis-filtered, where the original digital signals include original samples representing physical quantities, and where the original samples are transformed by successive calculation steps into high and low frequency output samples. Any sample calculated at a given step is calculated by a predetermined function of the original samples and/or previously calculated samples, where the samples are ordered by increasing rank. The signal is processed by successive input blocks of samples, where the calculations made on an input block under consideration take into account only the original or calculated samples belonging to the input block under consideration, and where the input block under consideration and the following input block overlap over a predetermined number of original samples. Output blocks are formed, where each output block corresponds respectively to an input block.
    Type: Grant
    Filed: January 11, 2000
    Date of Patent: May 25, 2004
    Assignee: Canon Kabushiki Kaisha
    Inventors: Félix Henry, Bertrand Berthelot, Eric Majani
  • Patent number: 6741962
    Abstract: A speech recognition system for recognizing an input voice of a narrow frequency band. The speech recognition system includes: a frequency band converting unit for converting the input voice of the narrow frequency band into a pseudo voice of a wide frequency band which covers an entirety of the narrow frequency band and which is wider than the narrow frequency band.
    Type: Grant
    Filed: March 7, 2002
    Date of Patent: May 25, 2004
    Assignee: NEC Corporation
    Inventor: Kenichi Iso
  • Publication number: 20040034530
    Abstract: A highly integrated data structure for synthesizing a waveform is provided for facilitating integrated handling of the data. The data structure for waveform synthesis data for use in generation of a target waveform comprises, at a macro level, a macro waveform value data field for storing a waveform value data section including source waveform value data for use in the generation of the target waveform, and a macro (first) header including control data for forming a macro waveform in the target waveform using the source waveform value data included in the data field. At a micro level, the data structure according to the present invention comprises a micro waveform value data field, and a micro (second) header for generating a micro waveform in the target waveform using the waveform value data included in the micro waveform value data field.
    Type: Application
    Filed: May 29, 2003
    Publication date: February 19, 2004
    Inventor: Tomomi Hara
  • Patent number: 6691083
    Abstract: Wideband speech is synthesized from a bandlimited speech signal, for example from speech which has been transmitted via the public switched telephone network. Due to the nature of the vocal tract, there is a correlation between a bandlimited signal and those parts of an original wideband speech signal which are missing from that signal. Narrowband speech is characterized in terms of estimated formant frequencies provided by a peak picker. For voiced sounds, the formant frequencies give a good indication of the shape of the vocal tract. The set of frequencies provided by the peak picker is used to access a codebook which provides synthesis parameters for use by a synthesizer.
    Type: Grant
    Filed: August 31, 2000
    Date of Patent: February 10, 2004
    Assignee: British Telecommunications public limited company
    Inventor: Andrew Paul Breen
  • Patent number: 6691081
    Abstract: A digital signal processor for processing data, including voice messaging data that may have both voiced and unvoiced speech components, uses computer routines stored in a memory used by the digital signal processor. The computer routines provide for: controlling at least a portion of a selective call receiver; receiving and decoding data received at the selective call receiver; comparing the addresses received at the selective call receiver with addresses stored in a memory location coupled to the digital signal processor; controlling voicing, including both voiced and unvoiced speech components; and generating a pitch wave using an inverse discrete Fourier transform and resampling the pitch wave to provide a time-domain voiced speech component.
    Type: Grant
    Filed: April 28, 2000
    Date of Patent: February 10, 2004
    Assignee: Motorola, Inc.
    Inventors: Jian-Cheng Huang, Kenneth D. Finlon, Floyd D. Simpson
  • Publication number: 20040024600
    Abstract: When the pitch of a speech segment is being modified from a current pitch to a requested pitch and the difference between them is relatively large, a pitch modification algorithm is used to modify the pitch of the speech segment. When the difference between the current and requested pitches is relatively small, the pitch of the speech segment is not modified. In either case, the resulting speech segment is then overlapped and added to the previously modified speech segments. A modification ratio, the ratio between the requested and current pitches, quantifies the difference between the current and requested pitches for a speech segment. Low and high ratio thresholds are used to determine when pitch is being modified to a predetermined high degree, and whether the pitch of the speech segment will or will not be modified.
    Type: Application
    Filed: July 30, 2002
    Publication date: February 5, 2004
    Applicant: International Business Machines Corporation
    Inventors: Wael Mohamed Hamza, Michael Alan Picheny
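A sketch of the ratio test and the overlap-add step described above. The threshold values 0.95 and 1.05 and the helper names are hypothetical, and the pitch modification algorithm itself is not shown.

```python
def modification_ratio(current_pitch_hz, requested_pitch_hz):
    """Ratio between the requested and current pitches."""
    return requested_pitch_hz / current_pitch_hz


def should_modify_pitch(current_pitch_hz, requested_pitch_hz, low=0.95, high=1.05):
    """Modify only when the ratio falls outside [low, high] (hypothetical thresholds)."""
    ratio = modification_ratio(current_pitch_hz, requested_pitch_hz)
    return ratio < low or ratio > high


def overlap_add(segments, hop):
    """Overlap and add segments whose start positions are `hop` samples apart."""
    length = max(i * hop + len(seg) for i, seg in enumerate(segments))
    out = [0.0] * length
    for i, seg in enumerate(segments):
        for j, x in enumerate(seg):
            out[i * hop + j] += x
    return out
```

Skipping modification for small ratios avoids processing artifacts where the audible pitch change would be negligible anyway.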
  • Patent number: 6687674
    Abstract: Waveform data of a plurality of loop waveforms are prestored along with initial phase information corresponding to the loop waveforms. Two of the loop waveforms are selected and repeatedly read out to thereby form loop-reproduced waveforms corresponding to the selected two loop waveforms. These loop-reproduced waveforms are connected together, for example, through cross-fade synthesis, during which time the phases of the loop-reproduced waveforms are adjusted to match with each other using the corresponding initial phase information. Generation of loop read addresses is carried out in such a way that first and second address signals for reading out first and second loop waveforms, respectively, are caused to loop while maintaining a difference between the first and second addresses corresponding to a difference between the initial phases of the selected loop waveforms.
    Type: Grant
    Filed: July 28, 1999
    Date of Patent: February 3, 2004
    Assignee: Yamaha Corporation
    Inventors: Hideo Suzuki, Masao Sakama, Umeyama Yasuyuki
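The phase-matched loop reading can be sketched as two read addresses that advance together with a fixed difference derived from the initial phases, combined through a linear cross-fade. Equal loop lengths, a linear fade, and the function name are assumptions made for illustration.

```python
def crossfade_loops(loop_a, loop_b, phase_a, phase_b, n_out):
    """Cross-fade from loop_a to loop_b while keeping their phases aligned.

    loop_a, loop_b: one stored period each, of equal length (an assumption).
    phase_a, phase_b: initial phases as fractions of a period, in [0, 1).
    """
    size = len(loop_a)
    if len(loop_b) != size:
        raise ValueError("this sketch assumes equal loop lengths")
    # Constant address difference derived from the initial-phase difference.
    offset = round((phase_a - phase_b) * size) % size
    out = []
    for n in range(n_out):
        addr_a = n % size                  # first loop read address
        addr_b = (addr_a + offset) % size  # second address keeps a fixed distance
        w = n / (n_out - 1) if n_out > 1 else 1.0  # linear cross-fade weight
        out.append((1.0 - w) * loop_a[addr_a] + w * loop_b[addr_b])
    return out
```

With two copies of the same sine wave stored at different initial phases, the aligned addresses make the cross-fade seamless: the output stays equal to the original sine throughout.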
  • Publication number: 20040015359
    Abstract: A signal connecting method and apparatus is provided which can reduce noise and create natural synthesized voices. The signal connecting method (or apparatus) for connecting a plurality of waveform signals and creating a synthesized waveform signal has: a step (or unit) for determining an upper limit frequency of the frequency spectrum of each of the plurality of waveform signals; and a step (or unit) for filtering at least a connection portion of each waveform signal by using predetermined filter characteristics having the determined upper limit frequency. The cut-off frequency of the filtering is the higher of the upper limit frequencies of the spectra of the two adjacent waveform signals before and after the connection portion. Higher harmonics caused by discontinuity at the connection portion of the waveform signals can thus be effectively removed, and noise in the synthesized waveform signals can be reduced considerably.
    Type: Application
    Filed: February 27, 2003
    Publication date: January 22, 2004
    Inventors: Yasushi Sato, Davin Patrick
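The cut-off selection at a joint can be sketched as below. The abstract does not say how the upper limit frequency of each spectrum is found; here it is assumed to be the highest bin whose magnitude exceeds 1% of the spectral peak. Names are hypothetical, and the filtering itself is omitted.

```python
def upper_limit_frequency(magnitudes, bin_hz, floor_ratio=0.01):
    """Highest frequency whose magnitude exceeds floor_ratio times the spectral peak.

    The floor_ratio criterion is a hypothetical stand-in for the actual method.
    """
    peak = max(magnitudes)
    limit_bin = 0
    for k, m in enumerate(magnitudes):
        if peak > 0 and m > floor_ratio * peak:
            limit_bin = k
    return limit_bin * bin_hz


def junction_cutoff_hz(left_magnitudes, right_magnitudes, bin_hz):
    """Cut-off for the connection: the higher of the two upper limit frequencies."""
    return max(upper_limit_frequency(left_magnitudes, bin_hz),
               upper_limit_frequency(right_magnitudes, bin_hz))
```

Taking the higher of the two limits keeps the genuine bandwidth of both neighbouring waveforms while still suppressing harmonics above either one.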
  • Patent number: 6665637
    Abstract: The present invention relates to the concealment of errors in decoded acoustic signals caused by encoded data representing the acoustic signals being partially lost or damaged during transmission over a transmission medium. When data is lost or received damaged, a secondary reconstructed signal is produced on the basis of a primary reconstructed signal. This signal has a spectrally adjusted spectrum (Z4E), such that it deviates less, with respect to spectral shape, from a spectrum (Z3) of a previously reconstructed signal produced from previously received data than a spectrum (Z′4) of the primary reconstructed signal does.
    Type: Grant
    Filed: October 19, 2001
    Date of Patent: December 16, 2003
    Assignee: Telefonaktiebolaget LM Ericsson (publ)
    Inventor: Stefan Bruhn
  • Patent number: 6658382
    Abstract: An input signal is time-frequency transformed, then the frequency-domain coefficients are divided into coefficient segments of about 100 Hz width to generate a sequence of coefficient segments, and the sequence of coefficient segments is split into subbands each consisting of plural coefficient segments. A threshold value is determined based on the intensity of each coefficient segment in each subband. The intensity of each coefficient segment is compared with the threshold value, and the coefficient segments are classified into low- and high-intensity groups. The coefficient segments are quantized for each group, or they are flattened respectively and then quantized through recombination.
    Type: Grant
    Filed: March 23, 2000
    Date of Patent: December 2, 2003
    Assignee: Nippon Telegraph and Telephone Corporation
    Inventors: Naoki Iwakami, Takehiro Moriya, Akio Jin, Kazuaki Chikira, Takeshi Mori
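A sketch of the segment classification step, assuming a spectrum with uniform bin spacing `bin_hz`, four segments per subband, and the mean segment intensity as each subband's threshold. These parameters and the function name are hypothetical; quantization is not shown.

```python
def classify_coefficient_segments(coeffs, bin_hz, seg_hz=100.0, segs_per_subband=4):
    """Label each ~seg_hz-wide coefficient segment as 'high' or 'low' intensity.

    The per-subband threshold is the mean segment intensity, a hypothetical
    choice; the abstract only says it is based on the segment intensities.
    """
    bins_per_seg = max(1, round(seg_hz / bin_hz))
    segments = [coeffs[i:i + bins_per_seg] for i in range(0, len(coeffs), bins_per_seg)]
    intensities = [sum(c * c for c in s) / len(s) for s in segments]
    labels = []
    for start in range(0, len(intensities), segs_per_subband):
        subband = intensities[start:start + segs_per_subband]
        threshold = sum(subband) / len(subband)
        labels.extend("high" if e > threshold else "low" for e in subband)
    return labels
```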
  • Patent number: 6629067
    Abstract: A range control system includes an input section for inputting a singing voice, a fundamental frequency extracting section for extracting a fundamental frequency of the inputted voice, and a pitch control section for performing a pitch control of the inputted voice so as to match the extracted fundamental frequency with a given frequency. The system further includes a formant extracting section for extracting a formant of the inputted voice, and a formant filter section for performing a filter operation relative to the pitch-controlled voice so that the pitch-controlled voice has a characteristic of the extracted formant. The system further includes an input loudness detecting section for detecting a first loudness of the inputted voice, and a loudness control section for controlling a second loudness of the voice subjected to the filter operation to match with the first loudness.
    Type: Grant
    Filed: May 14, 1998
    Date of Patent: September 30, 2003
    Assignee: Kabushiki Kaisha Kawai Gakki Seisakusho
    Inventors: Tsutomu Saito, Hiroshi Kato, Youichi Kondo
  • Publication number: 20030163318
    Abstract: A compression/decompression device for speech synthesis allows an increased compression ratio of source signals and improved quality of synthesized speech. The position and amplitude of each pulse for exciting a filter for speech synthesis are calculated based on autocorrelation and cross-correlation. As the number of pulses (k) is increased one by one, an S/N (signal-to-noise ratio) at each pulse number k is successively calculated based on the autocorrelation and the cross-correlation. When the S/N exceeds a preset threshold, the number of pulses is determined and is used for the compression of a speech unit.
    Type: Application
    Filed: February 28, 2003
    Publication date: August 28, 2003
    Applicant: NEC Corporation
    Inventor: Masahiro Serizawa
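The pulse-count search can be illustrated with a classic greedy multipulse analysis, used here as a stand-in for the device: each new pulse is placed where the cross-correlation between the remaining target and the shifted filter response, normalized by that response's autocorrelation, is largest, and pulses stop being added once the S/N exceeds the threshold. The function name and the direct residual-energy S/N computation are assumptions.

```python
import math


def multipulse_until_snr(target, h, snr_threshold_db, max_pulses=None):
    """Add excitation pulses one at a time until the S/N exceeds a threshold.

    target: speech frame to approximate; h: impulse response of the synthesis filter.
    Returns a list of (position, amplitude) pairs.
    """
    n = len(target)
    max_pulses = n if max_pulses is None else max_pulses
    signal_energy = sum(x * x for x in target)
    if signal_energy == 0:
        return []
    residual = list(target)
    pulses = []

    def shifted_response(m):
        # Filter impulse response delayed by m samples, truncated to the frame.
        return [h[i - m] if 0 <= i - m < len(h) else 0.0 for i in range(n)]

    while len(pulses) < max_pulses:
        best = None
        for m in range(n):
            hm = shifted_response(m)
            auto = sum(v * v for v in hm)                     # autocorrelation term
            cross = sum(r * v for r, v in zip(residual, hm))  # cross-correlation term
            if auto > 0 and (best is None or cross * cross / auto > best[0]):
                best = (cross * cross / auto, m, cross / auto, hm)
        if best is None:
            break
        _, m, gain, hm = best
        residual = [r - gain * v for r, v in zip(residual, hm)]
        pulses.append((m, gain))
        error_energy = sum(r * r for r in residual)
        snr_db = math.inf if error_energy == 0 else 10 * math.log10(signal_energy / error_energy)
        if snr_db > snr_threshold_db:
            break  # enough pulses for this speech unit
    return pulses
```

A lower threshold yields fewer pulses (higher compression), a higher one more pulses (better quality), which is the trade-off the abstract describes.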
  • Publication number: 20030154082
    Abstract: The speakers in encoded speech data recorded in a semiconductor storage device in an IC recorder can be retrieved easily. An information receiving unit 10 in a speaker retrieval apparatus 1 reads out the encoded speech data recorded in a semiconductor storage device 107 in an IC recorder 100. A speech decoding unit 12 decodes the encoded speech data. A speaker frequency detection unit 13 discriminates the speaker based on a feature of the decoded speech waveform to find the frequency of conversation (frequency of occurrence) of the speaker in a preset time interval. A speaker frequency graph displaying unit 14 displays the speaker frequency on a screen as a two-dimensional graph with time and frequency as its two axes.
    Type: Application
    Filed: January 15, 2003
    Publication date: August 14, 2003
    Inventors: Yasuhiro Toguri, Masayuki Nishiguchi
  • Patent number: 6606600
    Abstract: Using a group of spectral components from an audio signal, a coder produces a digital data stream (Φ) including the quantification indices (QE0m) of at least some of the spectral components (E0m). The coder, or a transcoder situated downstream, selects at least one pair of components exhibiting a maximum correlation out of the group of spectral components, and includes in its digital output data stream identification information for each pair selected. At least some of the quantification indices are included in the output data stream for only one of the components of each pair selected. The decoder will use these to obtain the suppressed indices for the other component of the pair.
    Type: Grant
    Filed: March 17, 2000
    Date of Patent: August 12, 2003
    Assignee: Matra Nortel Communications
    Inventors: Carlo Murgia, Gaël Richard, Philipe Lockwood
  • Patent number: 6594631
    Abstract: A method for forming phoneme data for a voice synthesizing apparatus, and the voice synthesizing apparatus, are provided. In this method and apparatus, an LPC coefficient is obtained for every phoneme and set as temporary phoneme data, and a first LPC cepstrum based on the LPC coefficient is obtained. A second LPC cepstrum is obtained from each voice waveform signal synthesized by the voice synthesizing apparatus while the pitch frequency is changed step by step, with the filter characteristic of the apparatus set according to the temporary phoneme data. The error between the first and second LPC cepstra is then obtained as an LPC cepstrum distortion. The phonemes sharing the same phoneme name are classified into a plurality of groups by frame length, and the optimum phoneme is selected from each group based on the LPC cepstrum distortion.
    Type: Grant
    Filed: September 7, 2000
    Date of Patent: July 15, 2003
    Assignee: Pioneer Corporation
    Inventors: Shisei Cho, Katsumi Amano, Hiroyuki Ishihara
  • Patent number: 6591234
    Abstract: A processor (300) is arranged to divide a communication signal into a plurality of frequency band signals including speech and noise components due to speech and noise. The processor generates first and second power signals for the frequency band signals. Each first power signal is based on estimating over a first time period the power of one of the frequency band signals. Each second power signal is based on estimating over a second time period less than the first time period the power of one of the frequency band signals. The processor generates condition signals representing conditions of the frequency band signals, and adjusts the gain of the frequency band signals in response to the condition signals to generate adjusted frequency band signals. The processor then combines the adjusted frequency band signals to generate an adjusted communication signal.
    Type: Grant
    Filed: January 7, 2000
    Date of Patent: July 8, 2003
    Assignee: Tellabs Operations, Inc.
    Inventors: Ravi Chandran, Daniel J. Marchok, Bruce E. Dunne
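A minimal sketch of the two-time-period idea, using first-order recursive power estimates with a small smoothing constant for the longer period and a larger one for the shorter period. The condition rule (speech when short-term power exceeds twice the long-term power), the attenuation value, and all names are hypothetical; splitting the signal into bands is not shown.

```python
def smoothed_power(band, alpha):
    """First-order recursive power estimate; a small alpha means a long time period."""
    p = 0.0
    out = []
    for x in band:
        p = (1.0 - alpha) * p + alpha * x * x
        out.append(p)
    return out


def band_gains(band, alpha_long=0.01, alpha_short=0.2, ratio=2.0, floor_gain=0.25):
    """Per-sample gains for one band from long- and short-term power estimates.

    Hypothetical condition rule: treat the band as speech when the short-term power
    exceeds `ratio` times the long-term power, otherwise attenuate it as noise.
    """
    long_p = smoothed_power(band, alpha_long)
    short_p = smoothed_power(band, alpha_short)
    return [1.0 if s > ratio * l else floor_gain for s, l in zip(short_p, long_p)]


def combine_bands(bands):
    """Apply each band's gains and sum the adjusted bands sample by sample."""
    adjusted = [[g * x for g, x in zip(band_gains(b), b)] for b in bands]
    return [sum(samples) for samples in zip(*adjusted)]
```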
  • Patent number: 6539355
    Abstract: A bandwidth expanding method and apparatus in which the frequency characteristics of the high-frequency components of broadband signals can be adjusted to the user's liking, overflow due to addition is prevented without the user perceiving power variations, the number of broadband formants is reduced, and emphasis is placed on the rough structure of the spectrum, so that the quality of the produced broadband speech signals is improved. To this end, in a speech bandwidth expansion device, the frequency characteristics of the frequency components at or above 3400 Hz are adjusted by preset, alterable parameter values and added to the original narrowband speech components. If overflow has occurred in a sample, the high-range gain of that sample is lowered to a level below the overflow level before the addition.
    Type: Grant
    Filed: October 14, 1999
    Date of Patent: March 25, 2003
    Assignee: Sony Corporation
    Inventors: Shiro Omori, Masayuki Nishiguchi
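The overflow guard can be sketched for 16-bit samples: when adding the high-band component would leave the int16 range, the high-band gain for that sample alone is reduced to the largest value that still fits. The adjustable frequency characteristic is reduced to a single scalar `gain` here, which is a simplification, and the function name is hypothetical.

```python
INT16_MAX = 32767
INT16_MIN = -32768


def add_high_band(narrow, high, gain):
    """Add gain-adjusted high-band samples to 16-bit narrowband samples without overflow.

    Assumes gain >= 0 and that every narrowband sample is already in int16 range,
    so an overflow implies a non-zero high-band sample.
    """
    out = []
    for nb, hb in zip(narrow, high):
        y = nb + gain * hb
        if y > INT16_MAX or y < INT16_MIN:
            limit = INT16_MAX if y > INT16_MAX else INT16_MIN
            reduced_gain = (limit - nb) / hb  # largest per-sample gain that still fits
            y = nb + reduced_gain * hb
        out.append(int(round(y)))
    return out
```

Lowering only the offending sample's high-band gain, rather than hard-clipping the whole sum or reducing the gain globally, keeps the narrowband content intact.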
  • Patent number: 6535852
    Abstract: Building a data-driven text-to-speech system involves collecting a database of natural speech from which to train models or select segments for concatenation. Typically the speech in that database is produced by a single speaker. In this invention we include in our database speech from a multiplicity of speakers.
    Type: Grant
    Filed: March 29, 2001
    Date of Patent: March 18, 2003
    Assignee: International Business Machines Corporation
    Inventor: Ellen M. Eide
  • Publication number: 20030046079
    Abstract: A voice synthesizing apparatus comprises: a storage device that stores a first database storing a first parameter obtained by analyzing a voice and a second database storing a second parameter obtained by analyzing a voice with vibrato; an input device that inputs information for a voice to be synthesized; a generating device that generates a third parameter based on the first parameter read from the first database and the second parameter read from the second database in accordance with the input information; and a synthesizing device that synthesizes the voice in accordance with the third parameter. A realistic vibrato effect can thus be added to a synthesized voice.
    Type: Application
    Filed: August 30, 2002
    Publication date: March 6, 2003
    Inventors: Yasuo Yoshioka, Alex Loscos
  • Patent number: 6513008
    Abstract: A speech synthesizer customization system provides a mechanism for generating a hierarchical customized user database. The customization system has a template management tool for generating the templates based on customization data from a user and associated replicated dynamic synthesis data from a text-to-speech (TTS) synthesizer. The replicated dynamic synthesis data is arranged in a dynamic data structure having hierarchical levels. The customization system further includes a user database that supplements a standard database of the synthesizer. The tool populates the user database with the templates such that the templates enable the user database to uniformly override subsequently generated speech synthesis data at all hierarchical levels of the dynamic data structure.
    Type: Grant
    Filed: March 15, 2001
    Date of Patent: January 28, 2003
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Steve Pearson, Peter Veprek, Jean-Claude Junqua
  • Patent number: 6505158
    Abstract: A method and system for providing concatenative speech uses a speech synthesis input to populate a triphone-indexed database that is later used for searching and retrieval to create a phoneme string acceptable for a text-to-speech operation. Prior to initiating the “real time” synthesis, a database is created of all possible triphone contexts by inputting a continuous stream of speech. The speech data is then analyzed to identify all possible triphone sequences in the stream, and the various units chosen for each context. During a later text-to-speech operation, the triphone contexts in the text are identified and the triphone-indexed phonemes in the database are searched to retrieve the best-matched candidates.
    Type: Grant
    Filed: July 5, 2000
    Date of Patent: January 7, 2003
    Assignee: AT&T Corp.
    Inventor: Alistair D. Conkie
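A sketch of a triphone-indexed unit database. It assumes one recorded unit per phoneme, uses a hypothetical `sil` symbol for utterance edges, and returns all candidates for each context; choosing the best-matched candidate among them is left out.

```python
from collections import defaultdict

BOUNDARY = "sil"  # hypothetical context symbol used at utterance edges


def triphones(phones):
    """(left, centre, right) context for every phoneme in the sequence."""
    padded = [BOUNDARY] + list(phones) + [BOUNDARY]
    return [tuple(padded[i:i + 3]) for i in range(len(phones))]


def build_triphone_index(corpus):
    """corpus: iterable of (phones, unit_ids), one recorded unit per phoneme."""
    index = defaultdict(list)
    for phones, unit_ids in corpus:
        for context, unit in zip(triphones(phones), unit_ids):
            index[context].append(unit)
    return index


def candidates_for(index, phones):
    """Candidate units for each phoneme of the text, looked up by triphone context."""
    return [index.get(context, []) for context in triphones(phones)]
```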
  • Patent number: 6499014
    Abstract: The speech synthesis apparatus of the present invention includes: a text analyzer operable to generate a phonetic and prosodic symbol string from character information of an input text; a word dictionary storing a reading and an accent of a word; a voice segment dictionary storing a phoneme that is a basic unit of speech; a parameter generator operable to generate synthesizing parameters including at least a phoneme, a duration of the phoneme and a fundamental frequency for the phonetic and prosodic symbol string, the parameter generator including a calculating means operable to obtain a sum of phrase components and a sum of accent components and to calculate an average pitch from the sum of the phrase components and the sum of the accent components, and a determining means operable to determine a base pitch from the average pitch; and a waveform generator operable to generate a synthesized waveform by waveform overlapping with reference to the synthesizing parameters generated by the parameter generator a
    Type: Grant
    Filed: March 7, 2000
    Date of Patent: December 24, 2002
    Assignee: Oki Electric Industry Co., Ltd.
    Inventor: Keiichi Chihara
  • Publication number: 20020184032
    Abstract: A voice synthesizing apparatus comprises: a memory that stores phoneme pieces having a plurality of different pitches for each phoneme represented by a same phoneme symbol; a reading device that reads a phoneme piece by using a pitch as an index; and a voice synthesizer that synthesizes a voice in accordance with the read phoneme piece.
    Type: Application
    Filed: March 8, 2002
    Publication date: December 5, 2002
    Inventors: Yuji Hisaminato, Jordi Bonada Sanjaume
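A minimal sketch of a pitch-indexed store of phoneme pieces. It assumes each piece is tagged with a single pitch in Hz and that "reading a phoneme piece by using a pitch as an index" means returning the stored piece whose pitch is nearest the requested one. The class and method names are hypothetical.

```python
import bisect


class PitchIndexedPieces:
    """Several recorded pieces per phoneme symbol, each tagged with its pitch in Hz."""

    def __init__(self):
        # phoneme symbol -> (sorted list of pitches, pieces in the same order)
        self._pieces = {}

    def add(self, phoneme, pitch_hz, piece):
        pitches, pieces = self._pieces.setdefault(phoneme, ([], []))
        i = bisect.bisect(pitches, pitch_hz)  # keep both lists sorted by pitch
        pitches.insert(i, pitch_hz)
        pieces.insert(i, piece)

    def read(self, phoneme, pitch_hz):
        """Return the stored piece whose pitch is closest to the requested pitch."""
        pitches, pieces = self._pieces[phoneme]
        i = bisect.bisect_left(pitches, pitch_hz)
        # Only the neighbours on either side of the insertion point can be nearest.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(pitches)]
        best = min(candidates, key=lambda j: abs(pitches[j] - pitch_hz))
        return pieces[best]
```

For example, with pieces for `a` stored at 110, 220 and 440 Hz, a request at 200 Hz returns the 220 Hz piece.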
  • Patent number: 6477495
    Abstract: A prosodic parameter for an input text is computed by storing a sentence of vocalized speech in a speech corpus memory, searching for a stored text having a similar prosody to an input text as a key to the speech corpus and modifying the prosodic parameter based upon the search results. Because a plurality of prosodic parameters are handled as linked data, a synthesized sound similar to natural speech, with natural intonation and prosody, is produced.
    Type: Grant
    Filed: March 1, 1999
    Date of Patent: November 5, 2002
    Assignee: Hitachi, Ltd.
    Inventors: Nobuo Nukaga, Yoshinori Kitahara, Keiko Fujita, Haru Ando, Shunichi Yajima
  • Publication number: 20020156631
    Abstract: A method and a system of producing a synthesized voice is provided. A voice sound waveform is provided at a voice sampling frequency based on pronunciation information. A voice-less sound waveform is produced at a voice-less sampling frequency based on the pronunciation information. The voice sampling frequency is converted into an output sampling frequency to produce a frequency-converted voice sound waveform with the output sampling frequency, wherein each of the voice sampling frequency and the voice-less sampling frequency is independent of the output sampling frequency. The voice-less sampling frequency is converted into the output sampling frequency to produce a frequency-converted voice-less sound waveform with the output sampling frequency.
    Type: Application
    Filed: April 18, 2002
    Publication date: October 24, 2002
    Applicant: NEC CORPORATION
    Inventor: Reishi Kondo
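The sampling-frequency conversion step can be sketched with a plain linear-interpolation resampler. This is an assumption for illustration only: it applies no anti-aliasing filter, which a real downsampler needs. Combining the two converted waveforms by simple addition is also assumed, since the abstract does not say how they are merged. Function names are hypothetical.

```python
def resample_linear(samples, in_rate, out_rate):
    """Convert samples from in_rate to out_rate (integer Hz) by linear interpolation."""
    n_out = len(samples) * out_rate // in_rate
    out = []
    for k in range(n_out):
        # Exact position of output sample k on the input time axis is i + rem / out_rate.
        i, rem = divmod(k * in_rate, out_rate)
        frac = rem / out_rate
        nxt = samples[i + 1] if i + 1 < len(samples) else samples[i]  # hold the last sample
        out.append(samples[i] * (1 - frac) + nxt * frac)
    return out


def mix_at_output_rate(voiced, voiced_rate, unvoiced, unvoiced_rate, out_rate):
    """Convert both waveforms to out_rate, zero-pad to equal length, and sum them (assumed)."""
    v = resample_linear(voiced, voiced_rate, out_rate)
    u = resample_linear(unvoiced, unvoiced_rate, out_rate)
    n = max(len(v), len(u))
    v += [0.0] * (n - len(v))
    u += [0.0] * (n - len(u))
    return [a + b for a, b in zip(v, u)]
```

Because each waveform keeps its own input rate until the final step, the voiced and unvoiced sources can be stored at different rates, matching the independence stated in the abstract.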
  • Patent number: 6438522
    Abstract: A method and apparatus for speech synthesis utilize a plurality of stored prosodic templates. Each template is generated from a series of enunciations of a single syllable executed in accordance with the rhythm, pitch and speech power variations of an enunciated sample speech item, so the templates express the rhythm, speech power and pitch characteristics of respectively different sample speech items. Data representing an object speech item are converted to a sequence of acoustic waveform segments that respectively express the syllables of the speech item. The number of morae (syllable intervals) and the accent type of the speech item are judged, and a prosodic template having the same number of morae and accent type is selected. Waveform shaping is then applied to the waveform segments so as to match the rhythm, speech power and pitch characteristics of the object speech item to those expressed by the selected prosodic template.
    Type: Grant
    Filed: September 22, 1999
    Date of Patent: August 20, 2002
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Toshimitsu Minowa, Hirofumi Nishimura, Ryo Mochizuki
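A sketch of the template lookup step, assuming a hypothetical template layout with one duration, pitch and power target per mora. It selects a template with a matching mora count and accent type and pairs each syllable segment with its targets; the waveform shaping itself is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProsodicTemplate:
    """Hypothetical template layout: one rhythm/pitch/power target per mora."""
    morae: int
    accent_type: int
    durations_ms: List[float]
    pitch_hz: List[float]
    power_db: List[float]


def select_template(templates, morae, accent_type) -> Optional[ProsodicTemplate]:
    """Return the first template with the same number of morae and accent type, if any."""
    for t in templates:
        if t.morae == morae and t.accent_type == accent_type:
            return t
    return None


def prosody_targets(segments, template):
    """Pair each syllable segment with the template's duration, pitch and power for its mora."""
    if len(segments) != template.morae:
        raise ValueError("segment count must equal the template's mora count")
    return list(zip(segments, template.durations_ms, template.pitch_hz, template.power_db))
```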
  • Patent number: 6424944
    Abstract: A text analyzing section converts given text data into syllable data. A melody producing section receives the converted syllable data together with the text data and a standard MIDI file. The syllable data are assigned to a melody of the standard MIDI file and sent to a sequencer section. A software synthesizer converts the syllable data into vocal sounds, with the pitch interval varied in accordance with the melody.
    Type: Grant
    Filed: August 16, 1999
    Date of Patent: July 23, 2002
    Assignee: Victor Company of Japan Ltd.
    Inventor: Kazuo Hikawa
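One way to picture the syllable-to-melody assignment is the sketch below. It assumes the melody has already been read from the Standard MIDI File into (note number, duration in beats) pairs, that syllables map to notes one-to-one in order, and that notes use equal temperament with A4 (note 69) at 440 Hz. File parsing and the sequencer are not shown, and the function names are hypothetical.

```python
def midi_to_hz(note):
    """Equal-tempered frequency of a MIDI note number (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2 ** ((note - 69) / 12)


def assign_syllables(syllables, notes):
    """Pair syllables with melody notes in order; notes are (midi_note, duration_beats)."""
    if len(syllables) > len(notes):
        raise ValueError("more syllables than notes")
    return [(syl, midi_to_hz(n), dur) for syl, (n, dur) in zip(syllables, notes)]
```

A syllable sung on note 81 is rendered one octave above one sung on note 69, which is the pitch-interval variation the abstract describes.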
  • Patent number: 6421642
    Abstract: A device and method for reproducing musical sounds is disclosed. Musical sounds and voices are stored and reproduced with user-definable timing and pitch, with the timing and pitch being independently controllable in real time. Musical sounds are stored in waveform memory, and pitch and timing information may be received in real time. The stored musical sounds and voices are then reproduced in accordance with the received pitch and timing information. The reproduction of stored musical sounds can also be stopped and resumed at user-definable marks.
    Type: Grant
    Filed: May 2, 2000
    Date of Patent: July 16, 2002
    Assignee: Roland Corporation
    Inventor: Takashi Saruhashi
  • Publication number: 20020072909
    Abstract: A speech synthesis system is disclosed that utilizes a pitch contour resulting in more natural-sounding speech. The present invention modifies the predicted pitch, b(t), for synthesized speech using a low frequency energy booster. The low frequency energy booster interpolates the discrete pitch values, if necessary, and increases the amount of energy of the pitch contour associated with low frequency values, such as all frequency values below 10 Hertz. The amount of energy of the pitch contour associated with low frequency values can be increased, for example, by adding band-limited noise (a carrier signal) to the pitch contour, b(t), or by filtering the pitch values with an impulse response filter having a pole at the desired low frequency value. The present invention serves to add vibrato to the original pitch contour, b(t), and thereby improves the naturalness of the synthetic waveform.
    Type: Application
    Filed: December 7, 2000
    Publication date: June 13, 2002
    Inventors: Ellen Marie Eide, Raimo Bakis
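A sketch of the band-limited-noise approach: white noise is band-limited by a two-pole resonator whose poles sit at a low frequency, scaled to a chosen depth, and added to the pitch contour. The 100 frames-per-second contour rate, the 5 Hz pole frequency, the pole radius and the 3 Hz depth are all assumed values, and the contour is assumed to be already interpolated across unvoiced frames.

```python
import math
import random


def boost_low_frequency(pitch_hz, frame_rate=100.0, center_hz=5.0, r=0.98, depth_hz=3.0, seed=0):
    """Add low-frequency fluctuation to a pitch contour given as one value per frame.

    White noise goes through a resonator y[n] = x[n] + a1*y[n-1] + a2*y[n-2] with poles at
    r*exp(+/-j*w0), w0 = 2*pi*center_hz/frame_rate, then is scaled to an RMS of depth_hz.
    """
    rng = random.Random(seed)
    w0 = 2 * math.pi * center_hz / frame_rate
    a1, a2 = 2 * r * math.cos(w0), -r * r
    y1 = y2 = 0.0
    shaped = []
    for _ in pitch_hz:
        y = rng.gauss(0.0, 1.0) + a1 * y1 + a2 * y2
        shaped.append(y)
        y1, y2 = y, y1
    rms = math.sqrt(sum(v * v for v in shaped) / len(shaped)) if shaped else 0.0
    scale = depth_hz / rms if rms > 0 else 0.0
    return [p + scale * v for p, v in zip(pitch_hz, shaped)]
```

Keeping `r` below 1 keeps the resonator stable; moving it closer to 1 narrows the band around `center_hz`.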
  • Patent number: 6377917
    Abstract: A prosody modification system and methodology calculates synchronization marks in an original, quasi-periodic signal to a finer precision than the sampling rate of the original signal. Synthetic synchronization marks are generated according to the desired prosody modification also to a finer precision than the sampling rate of the original signal. Waveforms are extracted from the original signal and are fine-shifted to the exact location on the synthetic time axis by a resampling technique. The fine-shifted waveforms are windowed by an asymmetric filtering window, overlapped, and summed together to produce a synthetic signal.
    Type: Grant
    Filed: November 4, 1999
    Date of Patent: April 23, 2002
    Assignee: Microsoft Corporation
    Inventors: Francisco M. Gimenez de los Galanes, David Thieme Talkin
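The fine-shifting and overlap-add steps can be sketched as below. Two substitutions are assumed: linear interpolation stands in for the unspecified resampling technique, and the asymmetric window is omitted, so segments are expected to arrive already windowed. The function names are hypothetical.

```python
import math


def fine_shift(segment, frac):
    """Delay segment by frac samples (0 <= frac < 1): y[n] = (1 - frac) * x[n] + frac * x[n - 1]."""
    out, prev = [], 0.0
    for x in segment:
        out.append((1 - frac) * x + frac * prev)
        prev = x
    out.append(frac * prev)  # tail sample spilling past the original end
    return out


def overlap_add(segments, marks, length):
    """Sum already-windowed segments into a buffer, each starting at a real-valued mark."""
    out = [0.0] * length
    for seg, mark in zip(segments, marks):
        start = math.floor(mark)
        # Integer part places the segment; the fractional part is handled by fine_shift.
        for k, v in enumerate(fine_shift(seg, mark - start)):
            if 0 <= start + k < length:
                out[start + k] += v
    return out
```

A single-sample pulse placed at mark 2.25 is split between output samples 2 and 3 in the ratio 0.75 to 0.25, so its position is represented more finely than the sample grid.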
  • Publication number: 20020007268
    Abstract: Encoding (2) a signal (A) is provided, wherein frequency and amplitude information of at least one sinusoidal component in the signal (A) is determined (20), and sinusoidal parameters (f,a) representing the frequency and amplitude information are transmitted (22), and wherein further a phase jitter parameter (p) is transmitted, which represents an amount of phase jitter to be added when restoring the sinusoidal component from the transmitted sinusoidal parameters (f,a).
    Type: Application
    Filed: June 20, 2001
    Publication date: January 17, 2002
    Inventors: Arnoldus Werner Johannes Oomen, Albertus Cornelis Den Brinker
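A sketch of the decoder side, assuming the phase jitter parameter p is a maximum phase deviation in radians and that an independent uniform offset in [-p, p] is drawn for each output sample. The abstract does not say how p is quantized or how the jitter is distributed over time, so both choices, and the function name, are hypothetical.

```python
import math
import random


def synthesize_with_jitter(freq_hz, amp, jitter_rad, n_samples, sample_rate=8000, seed=0):
    """Rebuild one sinusoid from (f, a), perturbing each sample's phase by up to +/- jitter_rad."""
    rng = random.Random(seed)
    omega = 2 * math.pi * freq_hz / sample_rate  # phase increment per sample, in radians
    return [
        amp * math.cos(omega * k + rng.uniform(-jitter_rad, jitter_rad))
        for k in range(n_samples)
    ]
```

Setting `jitter_rad` to 0 reproduces the plain sinusoid; larger values make the restored component sound rougher while its amplitude envelope stays bounded by `amp`.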
  • Patent number: 6336092
    Abstract: The invention is a method for transforming a source individual's voice so as to adopt the characteristics of a target individual's voice. The excitation signal component of the target individual's voice is extracted and the spectral envelope of the source individual's voice is extracted. The transformed voice is synthesized by applying the spectral envelope of the source individual to the excitation signal component of the voice of the target individual. A higher quality transformation is achieved using an enhanced excitation signal created by replacing unvoiced regions of the signal with interpolated data from adjacent voiced regions. Various methods of transforming the spectral characteristics of the source individual's voice are also disclosed.
    Type: Grant
    Filed: April 28, 1997
    Date of Patent: January 1, 2002
    Assignee: Ivl Technologies Ltd
    Inventors: Brian Charles Gibson, Peter Ronald Lupini, Dale John Shpak
  • Publication number: 20010051873
    Abstract: In a method of synthesizing voiced speech from pitch prototype waveforms by time-synchronous waveform interpolation (TSWI), one or more pitch prototypes are extracted from a speech signal or a residue signal. The extraction process is performed in such a way that the prototype has minimum energy at the boundary. Each prototype is circularly shifted so as to be time-synchronous with the original signal. A linear phase shift is applied to each extracted prototype relative to the previously extracted prototype so as to maximize the cross-correlation between successive extracted prototypes. A two-dimensional prototype-evolving surface is constructed by upsampling the prototypes to every sample point. The two-dimensional prototype-evolving surface is re-sampled to generate a one-dimensional, synthesized signal frame with sample points defined by piecewise continuous cubic phase contour functions computed from the pitch lags and the phase shifts added to the extracted prototypes.
    Type: Application
    Filed: November 13, 1998
    Publication date: December 13, 2001
    Inventors: Amitava Das, Eddie L. T. Choy
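The prototype-alignment step can be sketched with integer circular shifts. This is a simplification: it assumes both prototypes have already been resampled to the same length, and it searches whole-sample shifts rather than the continuous linear phase shift the abstract describes. Function names are hypothetical.

```python
def circular_xcorr(a, b, shift):
    """Cross-correlation of a with b circularly advanced by `shift` samples."""
    n = len(a)
    return sum(a[i] * b[(i + shift) % n] for i in range(n))


def align_prototype(prev, cur):
    """Return (cur circularly shifted to best match prev, the shift used)."""
    if len(prev) != len(cur):
        raise ValueError("prototypes must have equal length")
    n = len(cur)
    best = max(range(n), key=lambda s: circular_xcorr(prev, cur, s))
    aligned = [cur[(i + best) % n] for i in range(n)]
    return aligned, best
```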
  • Patent number: 6317713
    Abstract: Sound generation parameters, including a fundamental frequency, are output in response to a command regarding prosody and supplied to a sound source generator. The sound generation device further uses an accent command and a descent command to calculate the fundamental frequency and incorporates a rhythm command, which can be represented by a sine wave. The device also uses character string analysis to analyze a character string and generate commands concerning phonemes and prosody, a calculating element that outputs the fundamental frequency as a sound generation parameter depending on the prosody, a sound source generator, and an articulator that depends on a phoneme command.
    Type: Grant
    Filed: January 6, 1999
    Date of Patent: November 13, 2001
    Assignee: Arcadia, Inc.
    Inventor: Seiichi Tenpaku
  • Patent number: 6311158
    Abstract: Techniques for synthesizing a time-domain signal. The time-domain signal is partitioned into a number of time-domain frames and a waveform is generated for each time-domain frame. Each waveform includes one or more sinusoids. The waveform is generated by selecting a sinusoid for synthesis and computing a set of parameter values (e.g. the start and end amplitude, frequency, and phase values) for the selected sinusoid. A template is determined for the selected sinusoid based on the computed parameter values and a selected window function. The frequency-domain template is such that the amplitude of the selected sinusoid in the time domain matches, at a time-domain frame boundary, the amplitude of a corresponding sinusoid in an adjacent time-domain frame. The template is added to a frequency-domain frame. The process is repeated for each sinusoid in the waveform. After all sinusoids have been processed, the frequency-domain frame is transformed to a time-domain frame.
    Type: Grant
    Filed: March 16, 1999
    Date of Patent: October 30, 2001
    Assignee: Creative Technology Ltd.
    Inventor: Jean Laroche
  • Patent number: 6308156
    Abstract: A digital speech synthesis process in which utterances in a language are recorded, and the recorded utterances are divided into speech segments which are stored so as to allow their allocation to specific phonemes. A text which is to be output as speech is converted to a phoneme chain and the stored segments are output in a sequence defined by the phoneme chain. An analysis of the text to be output as speech is carried out and thus provides information which completes the phoneme chain and modifies the timing sequence signal for the speech segments which are to be strung together for output as speech.
    Type: Grant
    Filed: September 14, 1998
    Date of Patent: October 23, 2001
    Assignee: G Data Software GmbH
    Inventors: William Barry, Ralf Benzmüller, Andreas Luning
  • Patent number: 6301556
    Abstract: An apparatus and method for reducing sparseness in a coded speech signal. Sparse codebook values are generated from a codebook. An anti-sparseness operation is performed on the sparse codebook values to produce output codebook values having a greater density of non-zero values than the sparse codebook values. The output codebook values are processed by a speech processor to generate an encoded speech signal during an encoding operation or a decoded speech signal during a decoding operation.
    Type: Grant
    Filed: December 22, 1999
    Date of Patent: October 9, 2001
    Assignee: Telefonaktiebolaget L M. Ericsson (publ)
    Inventors: Roar Hagen, Björn Stig Erik Johansson, Erik Ekudden, Willem Baastian Kleijn
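A sketch of one possible anti-sparseness operation: circular convolution of the sparse codevector with a short, fixed spreading filter, which smears each pulse over several samples. The filter taps below are made up for illustration and are not taken from the patent or from any codec standard.

```python
# Hypothetical spreading filter; the taps are illustrative only.
SPREAD = [0.7, 0.4, -0.3, 0.2, -0.1]


def anti_sparse(codevector, h=SPREAD):
    """Circularly convolve codevector with h so each non-zero pulse spreads over len(h) samples."""
    n = len(codevector)
    out = [0.0] * n
    for i, x in enumerate(codevector):
        if x == 0:
            continue  # zeros contribute nothing; skip them
        for k, c in enumerate(h):
            out[(i + k) % n] += x * c  # wrap around the end of the vector
    return out
```

A codevector with a single pulse comes out with five non-zero samples, which is the "greater density of non-zero values" the abstract refers to.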
  • Patent number: 6292775
    Abstract: A speech processing system (10) incorporates an analogue to digital converter (16) to digitize input speech signals for Fourier transformation to produce short-term spectral cross-sections. These cross-sections are compared with one hundred and fifty reference patterns in a store (34), the patterns having respective stored sets of formant frequencies assigned thereto by a human expert. Six stored patterns most closely matching each input cross-section are selected for further processing by dynamic programming, which indicates the pattern which is a best match to the input cross-section by using frequency-scale warping to achieve alignment. The stored formant frequencies of the best matching pattern are modified by the frequency warping, and the results are used as formant frequency estimates for the input cross-section. The frequencies are further refined on the basis of the shape of the input cross-section near to the chosen formants.
    Type: Grant
    Filed: February 18, 1999
    Date of Patent: September 18, 2001
    Assignee: The Secretary of State for Defence in Her Britannic Majesty's Government of the United Kingdom of Great Britain and Northern Ireland
    Inventor: John N Holmes
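The matching-and-warping idea can be sketched as follows. Assumptions: spectra are equal-length lists of magnitudes on a common frequency grid, each reference carries hand-labelled formant bin indices, the shortlist uses plain squared distance, and the warping is ordinary dynamic time warping applied along the frequency axis. The final refinement from the local spectral shape is omitted, and all function names are hypothetical.

```python
def dtw(a, b):
    """Align a (input bins) with b (reference bins); return (cost, path of (i, j) pairs)."""
    n, m = len(a), len(b)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    # Backtrack from the end, preferring the diagonal step on ties.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, step = min((d[i - 1][j - 1], 0), (d[i - 1][j], 1), (d[i][j - 1], 2))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    path.reverse()
    return d[n][m], path


def estimate_formants(spectrum, references, shortlist_size=6):
    """references: list of (reference_spectrum, formant_bins); returns warped formant bins."""
    def distance(ref):
        return sum((x - y) ** 2 for x, y in zip(spectrum, ref[0]))

    shortlist = sorted(references, key=distance)[:shortlist_size]
    best = None
    for ref_spectrum, formant_bins in shortlist:
        cost, path = dtw(spectrum, ref_spectrum)
        if best is None or cost < best[0]:
            best = (cost, path, formant_bins)
    _, path, formant_bins = best
    estimates = []
    for f in formant_bins:
        # Input bins that the warping path aligned with this reference formant bin.
        aligned = [i for i, j in path if j == f]
        estimates.append(sum(aligned) / len(aligned))
    return estimates
```

If a reference has its labelled formant at bin 2 and the input shows the same peak at bin 4, the warping path carries the label across and the estimate becomes bin 4.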