Transformation Patents (Class 704/269)
  • Patent number: 8055693
    Abstract: A set of words is converted to a corresponding set of particles, wherein the words and the particles are unique within each set. For each word, all possible partitionings of the word into particles are determined, and a cost is determined for each possible partitioning. The particles of the possible partitioning associated with a minimal cost are added to the set of particles.
    Type: Grant
    Filed: June 30, 2009
    Date of Patent: November 8, 2011
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Tony Ezzat, Evandro Gouvea
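The minimal-cost partitioning step lends itself to dynamic programming over all split points. A minimal sketch in Python, assuming a simple per-particle cost table (the patent's actual cost function is not reproduced here):

```python
def best_partition(word, particle_cost):
    """Cheapest split of `word` into substrings drawn from
    `particle_cost` (a hypothetical cost-per-particle table).
    Returns (total_cost, list_of_particles)."""
    n = len(word)
    best = [(float("inf"), [])] * (n + 1)
    best[0] = (0.0, [])
    for i in range(1, n + 1):
        for j in range(i):
            piece = word[j:i]
            if piece in particle_cost:
                cost = best[j][0] + particle_cost[piece]
                if cost < best[i][0]:
                    best[i] = (cost, best[j][1] + [piece])
    return best[n]

# "unlock" splits most cheaply into "un" + "lock"
costs = {"u": 2.0, "n": 2.0, "un": 1.0, "lock": 1.0, "unl": 3.0, "ock": 2.0}
```

Running `best_partition("unlock", costs)` yields the minimal-cost partitioning `(2.0, ["un", "lock"])`, whose particles would then be added to the particle set.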
  • Patent number: 8024193
    Abstract: The present invention provides, among other things, automatic identification of near-redundant units in a large TTS voice table, identifying which units are distinctive enough to keep and which units are sufficiently redundant to discard. According to an aspect of the invention, pruning is treated as a clustering problem in a suitable feature space. All instances of a given unit (e.g., words or character sequences expressed as Unicode strings) are mapped onto the feature space and clustered in that space using a suitable similarity measure. Since all units in a given cluster are, by construction, closely related from the point of view of the measure used, they are suitably redundant and can be replaced by a single instance. The disclosed method can detect near-redundancy in TTS units in a completely unsupervised manner, based on an original feature extraction and clustering strategy.
    Type: Grant
    Filed: October 10, 2006
    Date of Patent: September 20, 2011
    Assignee: Apple Inc.
    Inventor: Jerome R. Bellegarda
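The pruning-as-clustering idea can be sketched with a greedy threshold clustering in feature space; the Euclidean distance and the threshold here are illustrative stand-ins for the patent's similarity measure:

```python
import math

def prune_near_redundant(instances, threshold):
    """Keep one representative per cluster: an instance within
    `threshold` (Euclidean) of an already-kept instance is treated
    as near-redundant and discarded."""
    kept = []
    for vec in instances:
        if all(math.dist(vec, k) > threshold for k in kept):
            kept.append(vec)
    return kept
```

Two tight pairs of unit instances collapse to two representatives: `prune_near_redundant([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)], 1.0)` keeps only `(0.0, 0.0)` and `(5.0, 5.0)`.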
  • Patent number: 7966186
    Abstract: A system and method for generating a synthetic text-to-speech (TTS) voice are disclosed. A user is presented with at least one TTS voice and at least one voice characteristic. A new synthetic TTS voice is generated by blending a plurality of existing TTS voices according to the selected voice characteristics. The blending of voices involves interpolating segmented parameters of each TTS voice. Segmented parameters may be, for example, prosodic characteristics of the speech such as pitch, volume, phone durations, accents, stress, mispronunciations and emotion.
    Type: Grant
    Filed: November 4, 2008
    Date of Patent: June 21, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: David A. Kapilow, Kenneth H. Rosen, Juergen Schroeter
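Blending by interpolation of segmented parameters can be sketched as a weighted average over each voice's parameter set; the parameter names and the plain weighted-average scheme are assumptions, not the patent's exact method:

```python
def blend_voices(voices, weights):
    """Weighted interpolation of per-voice prosodic parameters
    (e.g. pitch in Hz, volume as a 0-1 gain)."""
    total = sum(weights)
    return {
        key: sum(w * v[key] for v, w in zip(voices, weights)) / total
        for key in voices[0]
    }
```

An equal-weight blend of a 100 Hz and a 200 Hz voice lands at 150 Hz; skewing the weights pulls the result toward one parent voice.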
  • Patent number: 7958446
    Abstract: A browsing application for accessing resources over a network includes code for receiving a command from a user to translate textual material appearing on an arbitrary page displayed in the content display area, and code for causing the textual material to be passed to a translation resource on the network, whereby the display area of the browsing application is caused to display a page which includes a translation of the textual material. The application may display a menu accessible from an arbitrary page whereby the user may select among translation options such as a translate to language and a translate from language. The application may be configured to cause text selected by a user to be translated, and/or may cause an entire page to be translated. Translated text may be displayed along with graphics in a layout similar to that of the original page.
    Type: Grant
    Filed: October 31, 2005
    Date of Patent: June 7, 2011
    Assignee: Yahoo! Inc.
    Inventors: Edward Seitz, Brockton Davis, Derrick Whittle, James Bollas
  • Patent number: 7953600
    Abstract: A speech synthesis system receives symbolic input describing an utterance to be synthesized. In one embodiment, different portions of the utterance are constructed from different sources, one of which is a speech corpus recorded from a human speaker whose voice is to be modeled. The other sources may include other human speech corpora or speech produced using Rule-Based Speech Synthesis (RBSS). At least some portions of the utterance may be constructed by modifying prototype speech units to produce adapted speech units that are contextually appropriate for the utterance. The system concatenates the adapted speech units with the other speech units to produce a speech waveform. In another embodiment, a speech unit of a speech corpus recorded from a human speaker lacks transitions at one or both of its edges. A transition is synthesized using RBSS and concatenated with the speech unit in producing a speech waveform for the utterance.
    Type: Grant
    Filed: April 24, 2007
    Date of Patent: May 31, 2011
    Assignee: NovaSpeech LLC
    Inventors: Susan R. Hertz, Harold G. Mills
  • Patent number: 7945446
    Abstract: Spectrum envelope of an input sound is detected. In the meantime, a converting spectrum is acquired which is a frequency spectrum of a converting sound comprising a plurality of sounds, such as unison sounds. Output spectrum is generated by imparting the detected spectrum envelope of the input sound to the acquired converting spectrum. Sound signal is synthesized on the basis of the generated output spectrum. Further, a pitch of the input sound may be detected, and frequencies of peaks in the acquired converting spectrum may be varied in accordance with the detected pitch of the input sound. In this manner, the output spectrum can have the pitch and spectrum envelope of the input sound and spectrum frequency components of the converting sound comprising a plurality of sounds, and thus, unison sounds can be readily generated with simple arrangements.
    Type: Grant
    Filed: March 9, 2006
    Date of Patent: May 17, 2011
    Assignee: Yamaha Corporation
    Inventors: Hideki Kemmochi, Yasuo Yoshioka, Jordi Bonada
  • Patent number: 7835909
    Abstract: A method and apparatus for normalizing a histogram utilizing a backward cumulative histogram which can cumulate a probability distribution function in an order from a greatest to smallest value so as to estimate a noise robust histogram. A method of normalizing a speech feature vector includes: extracting the speech feature vector from a speech signal; calculating a probability distribution function using the extracted speech feature vector; calculating a backward cumulative distribution function by cumulating the probability distribution function in an order from a largest to smallest value; and normalizing a histogram using the backward cumulative distribution function.
    Type: Grant
    Filed: December 12, 2006
    Date of Patent: November 16, 2010
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: So-Young Jeong, Gil Jin Jang, Kwang Cheol Oh
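The backward cumulative distribution can be sketched as a histogram whose bin probabilities are accumulated from the largest value downward; the bin count and uniform binning here are illustrative:

```python
def backward_cdf(values, num_bins=10):
    """Histogram-based backward cumulative distribution: the PDF is
    accumulated from the largest bin down to the smallest, so the
    lowest bin carries the full probability mass."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins or 1.0
    counts = [0] * num_bins
    for v in values:
        b = min(int((v - lo) / width), num_bins - 1)
        counts[b] += 1
    pdf = [c / len(values) for c in counts]
    cdf = [0.0] * num_bins
    running = 0.0
    for i in range(num_bins - 1, -1, -1):  # greatest value first
        running += pdf[i]
        cdf[i] = running
    return cdf
```

The resulting function is non-increasing and starts at 1.0, the mirror image of an ordinary CDF; the patent uses it to normalize feature histograms more robustly under noise.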
  • Patent number: 7814284
    Abstract: A data redundancy elimination system.
    Type: Grant
    Filed: January 18, 2007
    Date of Patent: October 12, 2010
    Assignee: Cisco Technology, Inc.
    Inventors: Gideon Glass, Maxim Martynov, Qiwen Zhang, Etai Lev Ran, Dan Li
  • Patent number: 7716052
    Abstract: A method, apparatus and a computer program product to generate an audible speech word that corresponds to text. The method includes providing a text word and, in response to the text word, processing pre-recorded speech segments that are derived from a plurality of speakers to selectively concatenate together speech segments based on at least one cost function to form audio data for generating an audible speech word that corresponds to the text word. A data structure is also provided for use in a concatenative text-to-speech system that includes a plurality of speech segments derived from a plurality of speakers, where each speech segment includes an associated attribute vector comprising at least one element that identifies the speaker from which the speech segment was derived.
    Type: Grant
    Filed: April 7, 2005
    Date of Patent: May 11, 2010
    Assignee: Nuance Communications, Inc.
    Inventors: Andrew S. Aaron, Ellen M. Eide, Wael M. Hamza, Michael A. Picheny, Charles T. Rutherfoord, Zhi Wei Shuang, Maria E. Smith
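Cost-based concatenation of this kind is typically a Viterbi-style search over candidate units; a minimal sketch with hypothetical target-cost and join-cost callables (the patent's actual cost functions are not reproduced):

```python
def select_units(candidates, target_cost, join_cost):
    """Pick one unit per position minimising the summed target cost
    plus pairwise concatenation (join) cost along the chain.
    `candidates` is a list (one entry per position) of unit lists."""
    best = {u: (target_cost(u), [u]) for u in candidates[0]}
    for position in candidates[1:]:
        step = {}
        for u in position:
            cost, path = min(
                (c + join_cost(p[-1], u), p) for c, p in best.values()
            )
            step[u] = (cost + target_cost(u), path + [u])
        best = step
    return min(best.values())[1]
```

With a zero target cost and a join cost of `abs(a - b)`, the search prefers chains of acoustically adjacent units even when an individually attractive unit would force an expensive join.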
  • Patent number: 7698139
    Abstract: In a method and apparatus for a differentiated voice output, systems existing in a vehicle, such as the on-board computer, the navigation system, and others, can be connected with a voice output device. The voice outputs of different systems can be differentiated by way of voice characteristics.
    Type: Grant
    Filed: June 20, 2003
    Date of Patent: April 13, 2010
    Assignee: Bayerische Motoren Werke Aktiengesellschaft
    Inventors: Georg Obert, Klaus-Josef Bengler
  • Publication number: 20100076769
    Abstract: Speech enhancement based on a psycho-acoustic model is disclosed that is capable of preserving the fidelity of speech while sufficiently suppressing noise including the processing artifact known as “musical noise”.
    Type: Application
    Filed: March 14, 2008
    Publication date: March 25, 2010
    Applicant: DOLBY LABORATORIES LICENSING CORPORATION
    Inventor: Rongshan Yu
  • Patent number: 7664645
    Abstract: The voice of a synthesized voice output is individualized and matched to a user voice, the voice of a communication partner or the voice of a famous personality. In this way, mobile terminals in particular can be distinctively individualized and text messages can be read out using a specific voice.
    Type: Grant
    Filed: March 11, 2005
    Date of Patent: February 16, 2010
    Assignee: SVOX AG
    Inventors: Horst-Udo Hain, Klaus Lukas
  • Patent number: 7580843
    Abstract: A synthesis subband filter apparatus is provided. The apparatus is used for processing 18 sets of signals which each includes 32 subband sampling signals in accordance with a specification providing 512 window coefficients. The apparatus includes a processor for processing the 18 sets of signals in sequence. The processor further includes a converting module and a generating module. The converting module is used for converting the 32 subband sampling signals of the set of signals being processed into 32 converted vectors by use of a 32-point discrete cosine transform (DCT), and writing the 32 converted vectors into 512 default vectors with a first-in, first-out queue. The generating module is used for generating 32 pulse code modulation (PCM) signals, relative to the set of signals being processed, according to a set of synthesis formulae proposed in this invention.
    Type: Grant
    Filed: May 8, 2006
    Date of Patent: August 25, 2009
    Assignee: Quanta Computer, Inc.
    Inventors: Chih-Hsien Chang, Chih-Wei Hung, Hsien-Ming Tsai
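The converting module's first step is a 32-point DCT; a plain DCT-II over 32 samples can serve as a sketch (the patent's exact matrixing and the 512-vector FIFO are not reproduced here):

```python
import math

def dct_32(samples):
    """Plain 32-point DCT-II: X[m] = sum_k x[k] cos(pi*(2k+1)*m/(2N)).
    A stand-in for the converting module's 32-point DCT."""
    n = len(samples)  # expected to be 32
    return [
        sum(s * math.cos(math.pi * (2 * k + 1) * m / (2 * n))
            for k, s in enumerate(samples))
        for m in range(n)
    ]
```

A constant input concentrates all energy in coefficient 0, a quick sanity check that the transform behaves as a DCT should.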
  • Publication number: 20090187409
    Abstract: Techniques for efficiently encoding an input signal are described. In one design, a generalized encoder encodes the input signal (e.g., an audio signal) based on at least one detector and multiple encoders. The at least one detector may include a signal activity detector, a noise-like signal detector, a sparseness detector, some other detector, or a combination thereof. The multiple encoders may include a silence encoder, a noise-like signal encoder, a time-domain encoder, a transform-domain encoder, some other encoder, or a combination thereof. The characteristics of the input signal may be determined based on the at least one detector. An encoder may be selected from among the multiple encoders based on the characteristics of the input signal. The input signal may be encoded based on the selected encoder. The input signal may include a sequence of frames, and detection and encoding may be performed for each frame.
    Type: Application
    Filed: October 8, 2007
    Publication date: July 23, 2009
    Applicant: Qualcomm Incorporated
    Inventors: Venkatesh Krishnan, Vivek Rajendran, Ananthapadmanabhan A. Kandhadai
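Detector-driven encoder selection amounts to a dispatch on per-frame signal characteristics; a toy sketch with purely illustrative thresholds and encoder names:

```python
def choose_encoder(frame_energy, is_noise_like, sparseness):
    """Route a frame to one of several encoders based on detector
    outputs (threshold values here are illustrative only)."""
    if frame_energy < 1e-4:
        return "silence"
    if is_noise_like:
        return "noise-like"
    # sparse spectra favour a transform-domain encoder
    return "transform-domain" if sparseness > 0.5 else "time-domain"
```

In the patent's framing this decision runs per frame, so a single input stream can alternate among silence, noise-like, time-domain and transform-domain coding.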
  • Patent number: 7562018
    Abstract: A language processing portion (31) analyzes a text from a dialogue processing section (20) and transforms the text to information on pronunciation and accent. A prosody generation portion (32) generates an intonation pattern according to a control signal from the dialogue processing section (20). A waveform DB (34) stores prerecorded waveform data together with pitch mark data imparted thereto. A waveform cutting portion (33) cuts desired pitch waveforms from the waveform DB (34). A phase operation portion (35) removes phase fluctuation by standardizing phase spectra of the pitch waveforms cut by the waveform cutting portion (33), and afterwards imparts phase fluctuation by diffusing only high phase components randomly according to the control signal from the dialogue processing section (20). The thus-produced pitch waveforms are placed at desired intervals and superimposed.
    Type: Grant
    Filed: November 25, 2003
    Date of Patent: July 14, 2009
    Assignee: Panasonic Corporation
    Inventors: Takahiro Kamai, Yumiko Kato
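The phase operation can be sketched as: DFT a pitch waveform, zero its phases (standardization), then randomly jitter only the bins above a cutoff before inverse-transforming. The cutoff and jitter range are assumptions; the patent's control signal would drive them:

```python
import cmath, math, random

def standardise_and_diffuse(waveform, cutoff_bin, jitter=0.3, seed=0):
    """Zero the phase of bins below `cutoff_bin`, randomly diffuse
    the phase of the higher bins, and inverse-DFT.  Uses an O(n^2)
    DFT for clarity; a real system would use an FFT."""
    n = len(waveform)
    spec = [sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(waveform)) for k in range(n)]
    rng = random.Random(seed)
    shaped = [cmath.rect(abs(c), 0.0 if k < cutoff_bin
                         else rng.uniform(-jitter, jitter))
              for k, c in enumerate(spec)]
    return [sum(c * cmath.exp(2j * math.pi * k * t / n)
                for k, c in enumerate(shaped)).real / n for t in range(n)]
```

With the cutoff covering every bin, phase standardization alone turns a sine input into a cosine of the same magnitude spectrum, which is the deterministic half of the operation.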
  • Patent number: 7558732
    Abstract: Method and system for computer-aided speech synthesis for synthesizing electronic text by performing a predefined series of rules-based analyses in a predefined order, each of the analyses operating in a graduated manner to convert respective electronic text into electronic lexicons, and announcing analog speech based on the results of the performing step.
    Type: Grant
    Filed: March 22, 2005
    Date of Patent: July 7, 2009
    Assignee: Infineon Technologies AG
    Inventors: Michael Kustner, Markus Schnell
  • Patent number: 7536305
    Abstract: A mixed lossless audio compression has application to a unified lossy and lossless audio compression scheme that combines lossy and lossless audio compression within a same audio signal. The mixed lossless compression codes a transition frame between lossy and lossless coding frames to produce seamless transitions. The mixed lossless coding performs a lapped transform and inverse lapped transform to produce an appropriately windowed and folded pseudo-time domain frame, which can then be losslessly coded. The mixed lossless coding also can be applied for frames that exhibit poor lossy compression performance.
    Type: Grant
    Filed: July 14, 2003
    Date of Patent: May 19, 2009
    Assignee: Microsoft Corporation
    Inventors: Wei-Ge Chen, Chao He
  • Patent number: 7454348
    Abstract: A system and method for generating a synthetic text-to-speech (TTS) voice are disclosed. A user is presented with at least one TTS voice and at least one voice characteristic. A new synthetic TTS voice is generated by blending a plurality of existing TTS voices according to the selected voice characteristics. The blending of voices involves interpolating segmented parameters of each TTS voice. Segmented parameters may be, for example, prosodic characteristics of the speech such as pitch, volume, phone durations, accents, stress, mispronunciations and emotion.
    Type: Grant
    Filed: January 8, 2004
    Date of Patent: November 18, 2008
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: David A. Kapilow, Kenneth H. Rosen, Juergen Schroeter
  • Patent number: 7401021
    Abstract: An apparatus for voice modulation in a mobile terminal comprises: a voice input unit that receives a subscriber's voice and generates an analog voice signal; a voice modulation unit for modulating the generated analog voice signal; an audio processor for converting the modulated analog voice signal into a digital signal; and a mobile station modem (MSM) for processing the digital signal for wireless transmission. The apparatus is thus able to protect the subscriber's privacy by modulating the subscriber's voice during a call, and to help prevent telephone harassment. The voice can also be modulated in various ways, such as a cave echo, a child's voice, a devil's voice, a man's voice, a woman's voice, or a user-defined sound effect, satisfying a range of user preferences.
    Type: Grant
    Filed: July 10, 2002
    Date of Patent: July 15, 2008
    Assignee: LG Electronics Inc.
    Inventor: I-Won Choi
  • Patent number: 7379873
    Abstract: Voice synthesis unit data stored in a phoneme database 10 is selected by a voice synthesis unit selector 12 in accordance with MIDI information stored in a performance data storage unit 11. Characteristic parameters are derived from the selected voice synthesis unit data. A characteristic parameter correction unit 21 corrects the characteristic parameters based on pitch information, etc. A spectrum envelope generating unit 23 generates a spectrum envelope in accordance with the corrected characteristic parameters. A timbre transformation unit 25 changes timbre by correcting the characteristic parameters in accordance with timbre transformation parameters along the time axis. Timbres at the same song position can be transformed into different arbitrary timbres; the synthesized singing voice is therefore rich in variety and realism.
    Type: Grant
    Filed: July 3, 2003
    Date of Patent: May 27, 2008
    Assignee: Yamaha Corporation
    Inventor: Hideki Kemmochi
  • Patent number: 7336565
    Abstract: An educational alarm clock radio is provided that speaks a new word each day when the alarm goes off, the words each being stored in a memory cartridge as individual increments of information in a sequential set of increments. When the alarm goes off, the word of the day, the definition of that word of the day and its use in a sentence are spoken via the audio portion of the device as the next information increment in the sequence. The word will also be displayed on a screen so the user can see the correct spelling of the word. The word may be replayed at any time during the day by activating a device control. Prior words may be displayed by energizing a reverse control. The entire sequence of previously played words, moreover, can be played in serial fashion through further activation of a control or combination of controls. The device also serves as an alarm clock radio with alarm types such as wake by buzzer or radio as well as the wake-by-words function.
    Type: Grant
    Filed: June 12, 2006
    Date of Patent: February 26, 2008
    Inventors: Neil Rohrbacker, Gregory Rohrbacker
  • Patent number: 7337117
    Abstract: An apparatus for phonetically screening predetermined character strings. The apparatus includes a text-to-speech module, and a phonetic screening module in communication with the text-to-speech module. The phonetic screening module is for replacing a first character string with a second character string based on a phonetic enunciation by the text-to-speech module of the first character string.
    Type: Grant
    Filed: September 21, 2004
    Date of Patent: February 26, 2008
    Assignee: AT&T Delaware Intellectual Property, Inc.
    Inventor: Anita Hogans Simpson
  • Patent number: 7162417
    Abstract: An amplitude altering magnification (r) applied to sub-phoneme units of a voiced portion and an amplitude altering magnification s to be applied to sub-phoneme units of an unvoiced portion are determined based upon a target phoneme average power (p0) of synthesized speech and power (p) of a selected phoneme unit. Sub-phoneme units are extracted from a phoneme to be synthesized. From among the extracted sub-phoneme units, a sub-phoneme unit of the voiced portion is multiplied by the amplitude altering magnification (r), and a sub-phoneme unit of the unvoiced portion is multiplied by the amplitude altering magnification (s). Synthesized speech is obtained using the sub-phoneme units thus obtained. This makes it possible to realize power control in which any decline in the quality of synthesized speech is reduced.
    Type: Grant
    Filed: July 13, 2005
    Date of Patent: January 9, 2007
    Assignee: Canon Kabushiki Kaisha
    Inventors: Masayuki Yamada, Yasuhiro Komori, Mitsuru Otsuka
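The magnifications reduce to a power-matching scale factor per portion; a sketch assuming power is the mean squared amplitude (the patent applies separate magnifications r and s to voiced and unvoiced sub-phoneme units, but the derivation of each is the same shape):

```python
import math

def power_scale(samples, target_power):
    """Magnification bringing a sub-phoneme unit's power (mean
    squared amplitude) to `target_power`."""
    p = sum(x * x for x in samples) / len(samples)
    magnification = math.sqrt(target_power / p) if p > 0 else 1.0
    return [magnification * x for x in samples]
```

Scaling a unit of power 1.0 to a target power of 4.0 doubles every sample, exactly the r (or s) multiplication described in the abstract.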
  • Patent number: 7069217
    Abstract: A synthesizer is disclosed in which a speech waveform is synthesized by selecting a synthetic starting waveform segment and then generating a sequence of further segments. The further waveform segments are generated based jointly upon the value of the immediately preceding segment and upon a model of the dynamics of an actual sound similar to that being generated. In particular, a method is disclosed of synthesizing a voiced speech sound by calculating each new output value from the previous output value using data modeling the evolution, over a short time interval, of the voiced speech sound to be synthesized. This sequential generation of waveform segments enables a synthesized sequence of speech waveforms to be generated of any duration. In addition, a low-dimensional state-space representation of speech signals is used in which successive pitch pulse cycles are superimposed to estimate the progression of the cyclic speech signal within each cycle.
    Type: Grant
    Filed: January 9, 1997
    Date of Patent: June 27, 2006
    Assignee: British Telecommunications PLC
    Inventors: Stephen McLaughlin, Michael Banbrook
  • Patent number: 7050966
    Abstract: A system and method of improving signal intelligibility over an interference signal is provided. The system includes a psychoacoustic processor having a psychoacoustic model wherein the level of a signal-of-interest is improved so as to be audible above noise and so as not to exceed a predetermined maximum output level. The system can be combined with active noise cancellation.
    Type: Grant
    Filed: August 7, 2002
    Date of Patent: May 23, 2006
    Assignee: AMI Semiconductor, Inc.
    Inventors: Todd Schneider, David Coode, Robert L. Brennan, Peter Olijnyk
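The core level decision, audible above the noise but never past the output ceiling, can be sketched as a two-sided clamp; the margin and ceiling values are illustrative, and the patent's psychoacoustic model would refine them per frequency band:

```python
def output_level_db(signal_db, noise_db, margin_db=6.0, ceiling_db=90.0):
    """Raise the signal-of-interest to sit `margin_db` above the
    interference, but never above the permitted maximum output."""
    wanted = max(signal_db, noise_db + margin_db)
    return min(wanted, ceiling_db)
```

A 60 dB signal against 70 dB noise is boosted to 76 dB; against 89 dB noise the boost is capped at the 90 dB ceiling; an already-loud signal passes through unchanged.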
  • Patent number: 7010488
    Abstract: A system and method is used to compress concatenative acoustic inventories for speech. Instead of using general purpose signal compression methods such as vector quantization, the method of the invention uses multiple properties of acoustic inventories to reduce the size of the acoustic inventories, such as the close acoustic match property and acoustic units that are labeled with sufficiently fine distinctions such that between any two phones no events occur that are substantially distinct from these two phones. The close acoustic match property is where acoustic units that share the same phone are acoustically similar at the points where these units may be concatenated. By utilizing multiple properties of acoustic units, the number of parameters per unit that are stored as LPC parameters are minimized. As a result, smaller storage devices may be used due to the reduction of the size of the storage requirements.
    Type: Grant
    Filed: May 9, 2002
    Date of Patent: March 7, 2006
    Assignee: Oregon Health & Science University
    Inventors: Jan P. H. van Santen, Alexander Kain
  • Patent number: 6999520
    Abstract: A method and apparatus for extending the dynamic range of an integer or fixed-point Fast Fourier Transform (“FFT”) system that may be used in communications devices such as ADSL modems. The disclosed FFT system utilizes a shift control module to increase the effective dynamic range of the FFT implementation by selectively choosing at least one stage of an FFT butterfly implementation in which the outputs of the butterfly stage are not divided to otherwise avoid overflow problems.
    Type: Grant
    Filed: January 24, 2002
    Date of Patent: February 14, 2006
    Assignee: Tioga Technologies
    Inventor: Guy Reina
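Shift control in a fixed-point FFT boils down to dividing a butterfly stage's outputs only when its peak magnitude could actually overflow; a sketch over per-stage peak values, assuming a 16-bit word length:

```python
def stage_shifts(stage_peaks, word_max=2**15 - 1):
    """Return, per butterfly stage, whether its outputs must be
    divided by 2: a radix-2 butterfly can at most double the peak
    magnitude, so the divide is skipped when the doubled peak still
    fits in the word, preserving one bit of dynamic range."""
    return [1 if 2 * peak > word_max else 0 for peak in stage_peaks]
```

Unconditionally dividing at every stage of an N-point FFT costs log2(N) bits of precision; conditional shifting recovers a bit at every stage whose data happens to be small.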
  • Patent number: 6963842
    Abstract: A memory-efficient system converting a signal from a first transform domain to a second transform domain. The system includes a first mechanism that obtains an input signal expressed via a first transform-domain signal representation. A second mechanism expresses the input signal via a second transform-domain signal representation without intermediate time-domain conversion. In the specific embodiment, the input signal is a Modified Discrete Cosine Transform (MDCT) signal. The second transform-domain signal representation is a Discrete Fourier Transform (DFT) signal. The second mechanism further includes a third mechanism that combines effects of an inverse MDCT, a synthesis window function, and an analysis window function, and provides a first signal in response thereto. A fourth mechanism converts the MDCT signal to the DFT signal based on the first signal.
    Type: Grant
    Filed: September 5, 2001
    Date of Patent: November 8, 2005
    Assignee: Creative Technology Ltd.
    Inventor: Michael M. Goodwin
  • Patent number: 6950799
    Abstract: A speech processing system modifies various aspects of input speech according to a user-selected one of various preprogrammed voice fonts. Initially, the speech converter receives a formants signal representing an input speech signal and a pitch signal representing the input signal's fundamental frequency. One or both of the following may also be received: a voicing signal comprising an indication of whether the input speech signal is voiced, unvoiced, or mixed, and/or a gain signal representing the input speech signal's energy. The speech converter also receives user selection of one of multiple preprogrammed voice fonts, each specifying a manner of modifying one or more of the received signals (i.e., formants, voicing, pitch, gain). The speech converter modifies at least one of the formants, voicing, pitch, and/or gain signals as specified by the selected voice font.
    Type: Grant
    Filed: February 19, 2002
    Date of Patent: September 27, 2005
    Assignee: Qualcomm Inc.
    Inventors: Ning Bi, Andrew P. DeJaco
  • Patent number: 6947731
    Abstract: A method for conversion of a voice output of appliance statuses, wherein three spoken phrases are stored for each appliance to be controlled, with the first spoken phrase being allocated to a first appliance status, the second spoken phrase being allocated to a second appliance status, and the third spoken phrase being allocated to at least one third status. When an appliance status is checked, the relevant appliance sends a data word. If the value (which identifies the current appliance status) of the data word corresponds to a first value, the first spoken phrase is output; if it corresponds to a second value, the second spoken phrase is output; and for at least one third value, the third spoken phrase and the third value are output.
    Type: Grant
    Filed: September 21, 2000
    Date of Patent: September 20, 2005
    Assignee: Siemens Aktiengesellschaft
    Inventor: Erich Kamperschroer
  • Patent number: 6845359
    Abstract: A Fast Fourier Transform (FFT) based voice synthesis method, program product and vocoder. Sounds, e.g., speech and audio, are synthesized from multiple sine waves. Each sine wave component is represented by a small number of FFT coefficients. Amplitude and phase information of the components may be incorporated into these coefficients. The FFT coefficients corresponding to each of the components are summed and an inverse FFT is then applied to the sum to generate a time domain signal. An appropriate section is extracted from the inverse-transformed time domain signal as an approximation to the desired output. FFT-based synthesis may be combined with simple sine wave summation, using FFT-based synthesis for complex sounds, e.g., male voices and unvoiced speech, and sine wave summation for simpler sounds, e.g., female voices.
    Type: Grant
    Filed: March 22, 2001
    Date of Patent: January 18, 2005
    Assignee: Motorola, Inc.
    Inventor: Tenkasi Ramabadran
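The summing-then-inverse-FFT idea can be sketched by placing each sine component into a shared spectrum and inverse-transforming once; one bin per component is a simplification of the patent's few-coefficients-per-sine representation:

```python
import cmath, math

def synth_from_bins(components, n=64):
    """Place each sine component (bin index, amplitude, phase) into a
    length-n spectrum and inverse-DFT the sum.  Bin indices must
    satisfy 0 < bin < n/2; an O(n^2) inverse DFT stands in for the
    inverse FFT."""
    spec = [0j] * n
    for b, amp, phase in components:
        spec[b] += (amp * n / 2) * cmath.exp(1j * phase)
        spec[n - b] += (amp * n / 2) * cmath.exp(-1j * phase)  # mirror keeps output real
    return [sum(c * cmath.exp(2j * math.pi * k * t / n)
                for k, c in enumerate(spec)).real / n for t in range(n)]
```

A single unit-amplitude, zero-phase component in bin 1 comes out as one cycle of a cosine across the frame, from which the desired output section would be extracted.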
  • Publication number: 20040260552
    Abstract: A method, computer program product, and data processing system for compensating for fundamental frequency changes in a frame-based speech processing system is disclosed. In a preferred embodiment of the present invention, a frame of a voiced speech signal is processed by an inverse linear-predictive filter to obtain a residual signal that is indicative of the fundamental tone emitted by the speaker's vocal cords. A transformation function is applied to the frame to limit the frame to an integer number of pitch cycles. This transformed frame is used in conjunction with vocal tract parameters obtained from the original speech signal frame to construct a pitch-adjusted speech signal that can more easily be understood by speech- or speaker recognition software.
    Type: Application
    Filed: June 23, 2003
    Publication date: December 23, 2004
    Applicant: International Business Machines Corporation
    Inventors: Jiri Navratil, Ganesh N. Ramaswamy, Ran D. Zilca
  • Patent number: 6804649
    Abstract: Voice synthesis with improved expressivity is obtained in a voice synthesiser of source-filter type by making use of a library of source sound categories in the source module. Each source sound category corresponds to a particular morphological category and is derived from analysis of real vocal sounds, by inverse filtering so as to subtract the effect of the vocal tract. The library may be parametrical, that is, the stored data corresponds not to the inverse-filtered sounds themselves but to synthesis coefficients for resynthesising the inverse-filtered sounds using any suitable re-synthesis technique, such as the phase vocoder technique. The coefficients are derived by Short Time Fourier Transform (STFT) analysis.
    Type: Grant
    Filed: June 1, 2001
    Date of Patent: October 12, 2004
    Assignee: Sony France S.A.
    Inventor: Eduardo Reck Miranda
  • Patent number: 6795807
    Abstract: A device and a method to be used by laryngeally impaired people to improve the naturalness of their speech. An artificial sound creating mechanism which forms a simulated glottal pulse in the vocal tract is utilized. An artificial glottal pulse is compared with the natural spectrum and an inverse filter is generated to provide an output signal which would better reproduce natural sound. A digital signal processor introduces a variation of pitch based on an algorithm developed for this purpose; i.e. creating prosody. The algorithm uses primarily the relative amplitude of the speech signal and the rise and fall rates of the amplitude as a basis for setting the frequency of the speech. The invention also clarifies speech of laryngectomees by sensing the presence of consonants in the speech and appropriately amplifying them with respect to the vowel sounds.
    Type: Grant
    Filed: August 17, 2000
    Date of Patent: September 21, 2004
    Inventor: David R. Baraff
  • Patent number: 6785652
    Abstract: A method and an apparatus for improved duration modeling of phonemes in a speech synthesis system are provided. According to one aspect, text is received into a processor of a speech synthesis system. The received text is processed using a sum-of-products phoneme duration model that is used in either the formant method or the concatenative method of speech generation. The phoneme duration model, which is used along with a phoneme pitch model, is produced by developing a non-exponential functional transformation form for use with a generalized additive model. The non-exponential functional transformation form comprises a root sinusoidal transformation that is controlled in response to a minimum phoneme duration and a maximum phoneme duration. The minimum and maximum phoneme durations are observed in training data. The received text is processed by specifying at least one of a number of contextual factors for the generalized additive model.
    Type: Grant
    Filed: December 19, 2002
    Date of Patent: August 31, 2004
    Assignee: Apple Computer, Inc.
    Inventors: Jerome R. Bellegarda, Kim Silverman
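A root-sinusoidal squashing of an unbounded additive-model output into the observed [minimum, maximum] duration range might look like the following; this exact functional form is a guess for illustration, not the patent's formula:

```python
import math

def bounded_duration(x, d_min, d_max):
    """Map an unbounded model output x monotonically into
    (d_min, d_max) via a sinusoid of a bounded angle, using
    sin(atan(x)) = x / sqrt(1 + x*x)."""
    squashed = (math.sin(math.atan(x)) + 1.0) / 2.0  # in (0, 1)
    return d_min + (d_max - d_min) * squashed
```

Unlike an exponential transformation, any such bounded form guarantees that predicted phoneme durations never escape the range seen in training data, which is the property the abstract emphasizes.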
  • Patent number: 6778962
    Abstract: A speech synthesizing method includes determining the accent type of an input character string; selecting prosodic model data, based on the input character string and the accent type, from a prosody dictionary that stores typical prosodic models representing the prosodic information for the character strings in a word dictionary; transforming the prosodic information of the prosodic model when the character string of the selected prosodic model does not coincide with the input character string; selecting the waveform data corresponding to each character of the input character string from a waveform dictionary, based on the transformed prosodic model data; and connecting the selected waveform data with each other. A difference between an input character string and a character string stored in the dictionary is thereby absorbed, making it possible to synthesize a natural-sounding voice.
    Type: Grant
    Filed: July 21, 2000
    Date of Patent: August 17, 2004
    Assignees: Konami Corporation, Konami Computer Entertainment Tokyo, Inc.
    Inventors: Osamu Kasai, Toshiyuki Mizoguchi
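A minimal sketch of the select-then-transform lookup described above. The dictionary contents, the `(duration, pitch)` representation, and the nearest-neighbor `stretch` transform are all illustrative assumptions, not the patent's actual data structures:

```python
# Hypothetical prosody dictionary keyed by (character string, accent type);
# each entry is a list of (duration seconds, pitch Hz) pairs.
PROSODY = {
    ("hello", 1): [(0.10, 220.0), (0.12, 180.0)],
}

def stretch(pattern, n):
    """Resample a prosodic pattern to n entries (a stand-in for the
    patent's transformation of a non-coincident model)."""
    return [pattern[min(int(i * len(pattern) / n), len(pattern) - 1)]
            for i in range(n)]

def select_prosody(text, accent_type):
    """Exact dictionary hit if possible; otherwise transform a model
    with the same accent type; otherwise fall back to a flat default."""
    model = PROSODY.get((text, accent_type))
    if model is not None:
        return model
    for (word, acc), pattern in PROSODY.items():
        if acc == accent_type:
            return stretch(pattern, len(text))
    return [(0.1, 200.0)] * len(text)
```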
  • Patent number: 6757653
    Abstract: A method of composing messages for speech output that improves the quality of reproduced speech. A series of original sentences for messages is segmented and stored as audio files with search criteria. The length, position, and transition values for the respective segments can be recorded and stored. A sentence to be reproduced is transmitted in a format corresponding to the format of the search criteria. It is determined whether the sentence to be reproduced can be fully reproduced by one segment or a succession of stored segments. The segments found are then examined, using their stored entries, for how well they match in speech rhythm. The audio files of the segments that best preserve the natural speech rhythm are combined and output for reproduction.
    Type: Grant
    Filed: June 28, 2001
    Date of Patent: June 29, 2004
    Assignee: Nokia Mobile Phones, Ltd.
    Inventors: Peter Buth, Simona Grothues, Amir Iman, Wolfgang Theimer
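The "one segment or a succession of stored segments" check above is essentially a covering problem. The greedy longest-match strategy below is a simplified illustration (the patent additionally scores rhythm compatibility, which is omitted here):

```python
def cover_sentence(words, segments):
    """Greedily cover `words` with the longest stored segments
    (tuples of words). Returns the covering segments, or None if the
    sentence cannot be reproduced from the store."""
    out, i = [], 0
    while i < len(words):
        best = None
        for seg in segments:
            if words[i:i + len(seg)] == list(seg):
                if best is None or len(seg) > len(best):
                    best = seg  # prefer the longest match at position i
        if best is None:
            return None
        out.append(best)
        i += len(best)
    return out
```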
  • Patent number: 6754628
    Abstract: Methods and apparatus for facilitating speaker recognition, wherein, from target data that is provided relating to a target speaker and background data that is provided relating to at least one background speaker, a set of cohort data is selected from the background data that has at least one proximate characteristic with respect to the target data. The target data and the cohort data are then combined in a manner to produce at least one new cohort model for use in subsequent speaker verification. Similar methods and apparatus are contemplated for non-voice-based applications, such as verification through fingerprints.
    Type: Grant
    Filed: June 13, 2000
    Date of Patent: June 22, 2004
    Assignee: International Business Machines Corporation
    Inventors: Upendra V. Chaudhari, Stephane H. Maes, Jiri Navratil
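The cohort-selection step ("at least one proximate characteristic with respect to the target data") can be sketched as a nearest-neighbor search in a feature space. The Euclidean distance and the `(name, vector)` representation are illustrative assumptions:

```python
def select_cohort(target_vec, background, k=5):
    """Pick the k background speakers whose feature vectors lie
    closest to the target speaker's vector."""
    def dist(v):
        return sum((a - b) ** 2 for a, b in zip(target_vec, v)) ** 0.5
    # background is a list of (speaker_name, feature_vector) pairs
    return sorted(background, key=lambda item: dist(item[1]))[:k]
```

The selected cohort would then be combined with the target data to train the new cohort model used in verification.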
  • Patent number: 6741666
    Abstract: A method and a device by which original digital signals are analysis-filtered, where the original digital signals include original samples representing physical quantities, and where the original samples are transformed by successive calculation steps into high and low frequency output samples. Any sample calculated at a given step is calculated by a predetermined function of the original samples and/or previously calculated samples, where the samples are ordered by increasing rank. The signal is processed by successive input blocks of samples, where the calculations made on an input block under consideration take into account only the original or calculated samples belonging to the input block under consideration, and where the input block under consideration and the following input block overlap over a predetermined number of original samples. Output blocks are formed, where each output block corresponds respectively to an input block.
    Type: Grant
    Filed: January 11, 2000
    Date of Patent: May 25, 2004
    Assignee: Canon Kabushiki Kaisha
    Inventors: Félix Henry, Bertrand Berthelot, Eric Majani
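The block-overlap idea above can be illustrated with a toy analysis filter. The Haar-style averaging/differencing below is a stand-in for the patent's predetermined functions; the point is only that each block is filtered using samples from that block alone, with consecutive blocks sharing a fixed overlap:

```python
def analyze_blocks(samples, block=8, overlap=2):
    """Toy two-band analysis applied block by block; consecutive
    input blocks share `overlap` samples so each block can be
    processed independently of its neighbors."""
    step = block - overlap
    lows, highs = [], []
    for start in range(0, max(len(samples) - overlap, 1), step):
        chunk = samples[start:start + block]
        for i in range(0, len(chunk) - 1, 2):
            lows.append((chunk[i] + chunk[i + 1]) / 2)   # low-frequency output
            highs.append(chunk[i] - chunk[i + 1])        # high-frequency output
    return lows, highs
```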
  • Publication number: 20040093214
    Abstract: A speech-to-touch translator assembly and method converts spoken words directed to an operator into tactile sensations produced by combinations of pressure points exerted on the operator's body. Each combination of pressure points signifies a phoneme of a spoken word, with sound characteristics superimposed, permitting persons who are deaf and blind to comprehend the spoken words and identify the speaker.
    Type: Application
    Filed: November 12, 2002
    Publication date: May 13, 2004
    Inventors: Robert V. Belenger, Gennaro R. Lopriore
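The phoneme-to-pressure-point mapping is naturally a lookup table. The table entries below are invented for illustration; the patent does not publish its actual phoneme-to-actuator assignments:

```python
# Hypothetical mapping from phonemes to combinations of pressure
# points (numbered actuator positions on the operator's body).
PHONEME_TO_POINTS = {
    "HH": {1},
    "EH": {2, 3},
    "L":  {1, 4},
    "OW": {2, 5},
}

def to_tactile(phonemes):
    """Translate a phoneme sequence into actuator firing patterns,
    skipping phonemes with no assigned combination."""
    return [PHONEME_TO_POINTS[p] for p in phonemes if p in PHONEME_TO_POINTS]
```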
  • Publication number: 20040073429
    Abstract: The invention relates to an information transmission system capable of transmitting target information via voice, as well as to an information encoding apparatus and an information decoding apparatus for use with the system. The information encoding apparatus (31) converts input text information to an intermediate code in accordance with a predetermined encoding method, and outputs a voice derived from voice information based on the intermediate code and supplemented with music arrangement information. The voice is transmitted either directly or via a broadcasting or communicating medium to a receiving side. The information decoding apparatus (34) on the receiving side receives the generated voice, recognizes a voice waveform from the received voice, and reproduces the original target information by decoding the intermediate code based on the recognized voice waveform.
    Type: Application
    Filed: August 15, 2003
    Publication date: April 15, 2004
    Inventor: Tetsuya Naruse
  • Publication number: 20040006472
    Abstract: Voice synthesis unit data stored in a phoneme database 10 is selected by a voice synthesis unit selector 12 in accordance with MIDI information stored in a performance data storage unit 11. Characteristic parameters are derived from the selected voice synthesis unit data. A characteristic parameter correction unit 21 corrects the characteristic parameters based on pitch information and the like. A spectrum envelope generating unit 23 generates a spectrum envelope in accordance with the corrected characteristic parameters. A timbre transformation unit 25 changes timbre by correcting the characteristic parameters, along the time axis, in accordance with timbre transformation parameters. Timbres at the same position in a song can each be transformed into different arbitrary timbres, so the synthesized singing voice is rich in variety and realism.
    Type: Application
    Filed: July 3, 2003
    Publication date: January 8, 2004
    Applicant: Yamaha Corporation
    Inventor: Hideki Kemmochi
  • Patent number: 6658382
    Abstract: An input signal is time-frequency transformed, then the frequency-domain coefficients are divided into coefficient segments of about 100 Hz width to generate a sequence of coefficient segments, and the sequence is split into subbands each consisting of plural coefficient segments. A threshold value is determined based on the intensity of each coefficient segment in each subband. The intensity of each coefficient segment is compared with the threshold value, and the coefficient segments are classified into low- and high-intensity groups. The coefficient segments are quantized for each group, or they are flattened respectively and then quantized through recombination.
    Type: Grant
    Filed: March 23, 2000
    Date of Patent: December 2, 2003
    Assignee: Nippon Telegraph and Telephone Corporation
    Inventors: Naoki Iwakami, Takehiro Moriya, Akio Jin, Kazuaki Chikira, Takeshi Mori
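The segment-and-threshold classification above can be sketched directly. The mean-based threshold and the energy measure are illustrative assumptions; the patent derives its threshold per subband:

```python
def classify_segments(coeffs, seg_len, factor=1.0):
    """Group frequency-domain coefficients into fixed-width segments,
    then split the segments into low- and high-intensity groups
    relative to `factor` times the mean segment intensity."""
    segs = [coeffs[i:i + seg_len] for i in range(0, len(coeffs), seg_len)]
    intensity = [sum(c * c for c in s) for s in segs]          # energy per segment
    threshold = factor * sum(intensity) / len(intensity)
    low = [s for s, e in zip(segs, intensity) if e < threshold]
    high = [s for s, e in zip(segs, intensity) if e >= threshold]
    return low, high
```

Each group would then be quantized with parameters suited to its intensity range.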
  • Publication number: 20030187651
    Abstract: A voice synthesis system analyzes an input character string and determines which parts should use recorded voice and which should use synthesized voice. It extracts voice data for the recorded-voice parts from a database and extracts their features. The system then synthesizes voice data for the synthesized-voice parts to fit the extracted features, and combines and outputs these pieces of voice data.
    Type: Application
    Filed: December 3, 2002
    Publication date: October 2, 2003
    Applicant: Fujitsu Limited
    Inventor: Wataru Imatake
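The recorded-versus-synthesized split can be sketched as a simple planning pass over the input. The database contents and file names below are hypothetical:

```python
# Hypothetical database of pre-recorded words and their audio files.
RECORDED = {"weather": "weather.wav", "today": "today.wav"}

def plan_output(words):
    """Decide, word by word, whether to play a stored recording or
    fall back to synthesis (the split described in the abstract).
    Feature matching between the two parts is omitted here."""
    return [("recorded", RECORDED[w]) if w in RECORDED else ("synth", w)
            for w in words]
```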
  • Patent number: 6553344
    Abstract: A method and an apparatus for improved duration modeling of phonemes in a speech synthesis system are provided. According to one aspect, text is received into a processor of a speech synthesis system. The received text is processed using a sum-of-products phoneme duration model that is used in either the formant method or the concatenative method of speech generation. The phoneme duration model, which is used along with a phoneme pitch model, is produced by developing a non-exponential functional transformation form for use with a generalized additive model. The non-exponential functional transformation form comprises a root sinusoidal transformation that is controlled in response to a minimum phoneme duration and a maximum phoneme duration. The minimum and maximum phoneme durations are observed in training data. The received text is processed by specifying at least one of a number of contextual factors for the generalized additive model.
    Type: Grant
    Filed: February 22, 2002
    Date of Patent: April 22, 2003
    Assignee: Apple Computer, Inc.
    Inventors: Jerome R. Bellegarda, Kim Silverman
  • Patent number: 6549884
    Abstract: A system for pitch-shifting an audio signal in which resampling is done in the frequency domain. The signal is converted to a frequency-domain representation, and a specific region, located at a first frequency, is identified. The region is then shifted to a second frequency location to form an adjusted frequency-domain representation. Finally, the adjusted representation is transformed back to a time-domain signal representing the input signal with shifted pitch. This eliminates the expensive time-domain resampling stage and makes the computational cost independent of the pitch-modification factor.
    Type: Grant
    Filed: September 21, 1999
    Date of Patent: April 15, 2003
    Assignee: Creative Technology Ltd.
    Inventors: Jean Laroche, Mark Dolson
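The core move, relocating a spectral region to a new frequency location, can be shown on a list of magnitude bins. This is a heavily simplified sketch (real pitch shifting must also handle phase continuity, which is ignored here):

```python
def shift_region(spectrum, start, width, offset):
    """Move the magnitude bins in [start, start + width) up by
    `offset` bins, zeroing the vacated positions. Bins shifted past
    either end of the spectrum are discarded."""
    out = list(spectrum)
    region = spectrum[start:start + width]
    for i in range(start, start + width):
        out[i] = 0.0                      # vacate the original location
    for i, v in enumerate(region):
        j = start + offset + i
        if 0 <= j < len(out):
            out[j] = v                    # place at the new location
    return out
```

An inverse transform of the adjusted spectrum would then yield the pitch-shifted time-domain signal.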
  • Patent number: 6529874
    Abstract: A representative pattern memory stores a plurality of initial representative patterns as a noise pattern, with a different attribute affixed to each initial representative pattern. A pitch pattern memory stores a large number of natural pitch patterns as an accent phrase. A clustering unit classifies each natural pitch pattern to an initial representative pattern based on the attribute of the accent phrase. A transformation parameter generation unit calculates an error between a transformed representative pattern and each natural pitch pattern classified to the initial representative pattern. A representative pattern generation unit evaluates the sum of these errors and updates each initial representative pattern accordingly.
    Type: Grant
    Filed: September 8, 1998
    Date of Patent: March 4, 2003
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Takehiko Kagoshima, Takaaki Nii, Shigenobu Seto, Masahiro Morita, Masami Akamine, Yoshinori Shiga
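The classify-then-update loop above resembles one step of a k-means-style procedure. The pointwise-mean update below is a simplified stand-in for the patent's error-minimizing update of each representative pattern:

```python
def update_representatives(reps, patterns, assign):
    """One update step: each representative pitch pattern becomes the
    pointwise mean of the natural patterns assigned to it; empty
    clusters keep their current representative."""
    new_reps = []
    for r in range(len(reps)):
        members = [p for p, a in zip(patterns, assign) if a == r]
        if not members:
            new_reps.append(reps[r])
            continue
        new_reps.append([sum(vals) / len(members) for vals in zip(*members)])
    return new_reps
```

Alternating this update with re-classification would drive down the summed error the abstract describes.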
  • Patent number: 6463412
    Abstract: A high-performance voice transformation apparatus and method are provided in which voice input is transformed into a symbolic representation of the phonemes it contains. The symbolic representation is used to retrieve output voice segments of a selected target speaker, so that the voice input can be output in a different voice. In addition, voice input characteristics are extracted from the voice input and applied to the output voice segments to provide a more realistic, human-sounding voice output.
    Type: Grant
    Filed: December 16, 1999
    Date of Patent: October 8, 2002
    Assignee: International Business Machines Corporation
    Inventors: Jason Raymond Baumgartner, Steven Leonard Roberts, Nadeem Malik, Flemming Andersen
  • Patent number: 6336092
    Abstract: The invention is a method for transforming a source individual's voice so as to adopt the characteristics of a target individual's voice. The excitation signal component of the target individual's voice is extracted and the spectral envelope of the source individual's voice is extracted. The transformed voice is synthesized by applying the spectral envelope of the source individual to the excitation signal component of the voice of the target individual. A higher quality transformation is achieved using an enhanced excitation signal created by replacing unvoiced regions of the signal with interpolated data from adjacent voiced regions. Various methods of transforming the spectral characteristics of the source individual's voice are also disclosed.
    Type: Grant
    Filed: April 28, 1997
    Date of Patent: January 1, 2002
    Assignee: IVL Technologies Ltd.
    Inventors: Brian Charles Gibson, Peter Ronald Lupini, Dale John Shpak
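The "enhanced excitation signal" step, replacing unvoiced regions with data interpolated from adjacent voiced regions, can be sketched on a per-sample basis. Linear interpolation is an illustrative choice; the patent does not specify the interpolation method in this abstract:

```python
def enhance_excitation(samples, voiced):
    """Replace each unvoiced run with values linearly interpolated
    between the nearest voiced neighbors on either side."""
    out = list(samples)
    i = 0
    while i < len(out):
        if voiced[i]:
            i += 1
            continue
        j = i
        while j < len(out) and not voiced[j]:
            j += 1                        # unvoiced run is [i, j)
        left = out[i - 1] if i > 0 else (out[j] if j < len(out) else 0.0)
        right = out[j] if j < len(out) else left
        for k in range(i, j):
            t = (k - i + 1) / (j - i + 1)
            out[k] = left + t * (right - left)
        i = j
    return out
```

The enhanced excitation would then be filtered through the source speaker's spectral envelope to produce the transformed voice.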
  • Patent number: 6332121
    Abstract: In a synthesis unit generator, a plurality of synthesis speech segments are generated by synthesizing training speech segments labeled with phonetic contexts and input speech segments while altering the pitch/duration of the input speech segments in accordance with the pitch/duration of the training speech segments. Typical speech segments are selected from the input speech segments on the basis of a distance between the synthesis speech segments and the training speech segments, and are stored in a storage. In addition, a plurality of phonetic context clusters corresponding to the synthesis units are generated on the basis of the distance, and are stored in a storage. A synthesis speech signal is generated by reading out, from the storage, those of the synthesis units, which correspond to the phonetic context clusters including phonetic contexts of input phonemes, and connecting the selected synthesis units in a speech synthesizer.
    Type: Grant
    Filed: November 27, 2000
    Date of Patent: December 18, 2001
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Takehiko Kagoshima, Masami Akamine