Transformation Patents (Class 704/269)
  • Patent number: 8055693
    Abstract: A set of words is converted to a corresponding set of particles, wherein the words and the particles are unique within each set. For each word, all possible partitionings of the word into particles are determined, and a cost is determined for each possible partitioning. The particles of the possible partitioning associated with a minimal cost are added to the set of particles.
    Type: Grant
    Filed: June 30, 2009
    Date of Patent: November 8, 2011
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Tony Ezzat, Evandro Gouvea
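The minimal-cost partitioning step lends itself to dynamic programming over all split points. A minimal sketch in Python, assuming a simple per-particle cost table (the patent's actual cost function is not reproduced here):

```python
def best_partition(word, particle_cost):
    """Cheapest split of `word` into substrings drawn from
    `particle_cost` (a hypothetical cost-per-particle table).
    Returns (total_cost, list_of_particles)."""
    n = len(word)
    best = [(float("inf"), [])] * (n + 1)
    best[0] = (0.0, [])
    for i in range(1, n + 1):
        for j in range(i):
            piece = word[j:i]
            if piece in particle_cost:
                cost = best[j][0] + particle_cost[piece]
                if cost < best[i][0]:
                    best[i] = (cost, best[j][1] + [piece])
    return best[n]

# "unlock" splits most cheaply into "un" + "lock"
costs = {"u": 2.0, "n": 2.0, "un": 1.0, "lock": 1.0, "unl": 3.0, "ock": 2.0}
```

Running `best_partition("unlock", costs)` yields the minimal-cost partitioning `(2.0, ["un", "lock"])`, whose particles would then be added to the particle set.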
  • Patent number: 8024193
    Abstract: The present invention provides, among other things, automatic identification of near-redundant units in a large TTS voice table, identifying which units are distinctive enough to keep and which units are sufficiently redundant to discard. According to an aspect of the invention, pruning is treated as a clustering problem in a suitable feature space. All instances of a given unit (e.g., words or character sequences expressed as Unicode strings) are mapped onto the feature space and clustered in that space using a suitable similarity measure. Since all units in a given cluster are, by construction, closely related from the point of view of the measure used, they are suitably redundant and can be replaced by a single instance. The disclosed method can detect near-redundancy in TTS units in a completely unsupervised manner, based on an original feature extraction and clustering strategy.
    Type: Grant
    Filed: October 10, 2006
    Date of Patent: September 20, 2011
    Assignee: Apple Inc.
    Inventor: Jerome R. Bellegarda
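The pruning-as-clustering idea can be sketched with a greedy threshold clustering in feature space; the Euclidean distance and the threshold here are illustrative stand-ins for the patent's similarity measure:

```python
import math

def prune_near_redundant(instances, threshold):
    """Keep one representative per cluster: an instance within
    `threshold` (Euclidean) of an already-kept instance is treated
    as near-redundant and discarded."""
    kept = []
    for vec in instances:
        if all(math.dist(vec, k) > threshold for k in kept):
            kept.append(vec)
    return kept
```

Two tight pairs of unit instances collapse to two representatives: `prune_near_redundant([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)], 1.0)` keeps only `(0.0, 0.0)` and `(5.0, 5.0)`.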
  • Patent number: 7966186
    Abstract: A system and method for generating a synthetic text-to-speech (TTS) voice are disclosed. A user is presented with at least one TTS voice and at least one voice characteristic. A new synthetic TTS voice is generated by blending a plurality of existing TTS voices according to the selected voice characteristics. The blending of voices involves interpolating segmented parameters of each TTS voice. Segmented parameters may be, for example, prosodic characteristics of the speech such as pitch, volume, phone durations, accents, stress, mispronunciations and emotion.
    Type: Grant
    Filed: November 4, 2008
    Date of Patent: June 21, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: David A. Kapilow, Kenneth H. Rosen, Juergen Schroeter
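Blending by interpolation of segmented parameters can be sketched as a weighted average over each voice's parameter set; the parameter names and the plain weighted-average scheme are assumptions, not the patent's exact method:

```python
def blend_voices(voices, weights):
    """Weighted interpolation of per-voice prosodic parameters
    (e.g. pitch in Hz, volume as a 0-1 gain)."""
    total = sum(weights)
    return {
        key: sum(w * v[key] for v, w in zip(voices, weights)) / total
        for key in voices[0]
    }
```

An equal-weight blend of a 100 Hz and a 200 Hz voice lands at 150 Hz; skewing the weights pulls the result toward one parent voice.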
  • Patent number: 7958446
    Abstract: A browsing application for accessing resources over a network includes code for receiving a command from a user to translate textual material appearing on an arbitrary page displayed in the content display area, and code for causing the textual material to be passed to a translation resource on the network, whereby the display area of the browsing application is caused to display a page which includes a translation of the textual material. The application may display a menu accessible from an arbitrary page whereby the user may select among translation options such as a translate to language and a translate from language. The application may be configured to cause text selected by a user to be translated, and/or may cause an entire page to be translated. Translated text may be displayed along with graphics in a layout similar to that of the original page.
    Type: Grant
    Filed: October 31, 2005
    Date of Patent: June 7, 2011
    Assignee: Yahoo! Inc.
    Inventors: Edward Seitz, Brockton Davis, Derrick Whittle, James Bollas
  • Patent number: 7953600
    Abstract: A speech synthesis system receives symbolic input describing an utterance to be synthesized. In one embodiment, different portions of the utterance are constructed from different sources, one of which is a speech corpus recorded from a human speaker whose voice is to be modeled. The other sources may include other human speech corpora or speech produced using Rule-Based Speech Synthesis (RBSS). At least some portions of the utterance may be constructed by modifying prototype speech units to produce adapted speech units that are contextually appropriate for the utterance. The system concatenates the adapted speech units with the other speech units to produce a speech waveform. In another embodiment, a speech unit of a speech corpus recorded from a human speaker lacks transitions at one or both of its edges. A transition is synthesized using RBSS and concatenated with the speech unit in producing a speech waveform for the utterance.
    Type: Grant
    Filed: April 24, 2007
    Date of Patent: May 31, 2011
    Assignee: NovaSpeech LLC
    Inventors: Susan R. Hertz, Harold G. Mills
  • Patent number: 7945446
    Abstract: Spectrum envelope of an input sound is detected. In the meantime, a converting spectrum is acquired which is a frequency spectrum of a converting sound comprising a plurality of sounds, such as unison sounds. Output spectrum is generated by imparting the detected spectrum envelope of the input sound to the acquired converting spectrum. Sound signal is synthesized on the basis of the generated output spectrum. Further, a pitch of the input sound may be detected, and frequencies of peaks in the acquired converting spectrum may be varied in accordance with the detected pitch of the input sound. In this manner, the output spectrum can have the pitch and spectrum envelope of the input sound and spectrum frequency components of the converting sound comprising a plurality of sounds, and thus, unison sounds can be readily generated with simple arrangements.
    Type: Grant
    Filed: March 9, 2006
    Date of Patent: May 17, 2011
    Assignee: Yamaha Corporation
    Inventors: Hideki Kemmochi, Yasuo Yoshioka, Jordi Bonada
  • Patent number: 7835909
    Abstract: A method and apparatus for normalizing a histogram utilizing a backward cumulative histogram which can cumulate a probability distribution function in an order from a greatest to smallest value so as to estimate a noise robust histogram. A method of normalizing a speech feature vector includes: extracting the speech feature vector from a speech signal; calculating a probability distribution function using the extracted speech feature vector; calculating a backward cumulative distribution function by cumulating the probability distribution function in an order from a largest to smallest value; and normalizing a histogram using the backward cumulative distribution function.
    Type: Grant
    Filed: December 12, 2006
    Date of Patent: November 16, 2010
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: So-Young Jeong, Gil Jin Jang, Kwang Cheol Oh
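The backward cumulative distribution can be sketched as a histogram whose bin probabilities are accumulated from the largest value downward; the bin count and uniform binning here are illustrative:

```python
def backward_cdf(values, num_bins=10):
    """Histogram-based backward cumulative distribution: the PDF is
    accumulated from the largest bin down to the smallest, so the
    lowest bin carries the full probability mass."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins or 1.0
    counts = [0] * num_bins
    for v in values:
        b = min(int((v - lo) / width), num_bins - 1)
        counts[b] += 1
    pdf = [c / len(values) for c in counts]
    cdf = [0.0] * num_bins
    running = 0.0
    for i in range(num_bins - 1, -1, -1):  # greatest value first
        running += pdf[i]
        cdf[i] = running
    return cdf
```

The resulting function is non-increasing and starts at 1.0, the mirror image of an ordinary CDF; the patent uses it to normalize feature histograms more robustly under noise.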
  • Patent number: 7814284
    Abstract: A data redundancy elimination system.
    Type: Grant
    Filed: January 18, 2007
    Date of Patent: October 12, 2010
    Assignee: Cisco Technology, Inc.
    Inventors: Gideon Glass, Maxim Martynov, Qiwen Zhang, Etai Lev Ran, Dan Li
  • Patent number: 7716052
    Abstract: A method, apparatus and a computer program product to generate an audible speech word that corresponds to text. The method includes providing a text word and, in response to the text word, processing pre-recorded speech segments that are derived from a plurality of speakers to selectively concatenate together speech segments based on at least one cost function to form audio data for generating an audible speech word that corresponds to the text word. A data structure is also provided for use in a concatenative text-to-speech system that includes a plurality of speech segments derived from a plurality of speakers, where each speech segment includes an associated attribute vector comprising at least one element that identifies the speaker from which the speech segment was derived.
    Type: Grant
    Filed: April 7, 2005
    Date of Patent: May 11, 2010
    Assignee: Nuance Communications, Inc.
    Inventors: Andrew S. Aaron, Ellen M. Eide, Wael M. Hamza, Michael A. Picheny, Charles T. Rutherfoord, Zhi Wei Shuang, Maria E. Smith
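Cost-based concatenation of this kind is typically a Viterbi-style search over candidate units; a minimal sketch with hypothetical target-cost and join-cost callables (the patent's actual cost functions are not reproduced):

```python
def select_units(candidates, target_cost, join_cost):
    """Pick one unit per position minimising the summed target cost
    plus pairwise concatenation (join) cost along the chain.
    `candidates` is a list (one entry per position) of unit lists."""
    best = {u: (target_cost(u), [u]) for u in candidates[0]}
    for position in candidates[1:]:
        step = {}
        for u in position:
            cost, path = min(
                (c + join_cost(p[-1], u), p) for c, p in best.values()
            )
            step[u] = (cost + target_cost(u), path + [u])
        best = step
    return min(best.values())[1]
```

With a zero target cost and a join cost of `abs(a - b)`, the search prefers chains of acoustically adjacent units even when an individually attractive unit would force an expensive join.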
  • Patent number: 7698139
    Abstract: In a method and apparatus for a differentiated voice output, systems existing in a vehicle, such as the on-board computer, the navigation system, and others, can be connected with a voice output device. The voice outputs of different systems can be differentiated by way of voice characteristics.
    Type: Grant
    Filed: June 20, 2003
    Date of Patent: April 13, 2010
    Assignee: Bayerische Motoren Werke Aktiengesellschaft
    Inventors: Georg Obert, Klaus-Josef Bengler
  • Publication number: 20100076769
    Abstract: Speech enhancement based on a psycho-acoustic model is disclosed that is capable of preserving the fidelity of speech while sufficiently suppressing noise including the processing artifact known as “musical noise”.
    Type: Application
    Filed: March 14, 2008
    Publication date: March 25, 2010
    Applicant: DOLBY LABORATORIES LICENSING CORPORATION
    Inventor: Rongshan Yu
  • Patent number: 7664645
    Abstract: The voice of a synthesized voice output is individualized and matched to a user voice, the voice of a communication partner or the voice of a famous personality. In this way, mobile terminals in particular can be distinctively individualized and text messages can be read out using a specific voice.
    Type: Grant
    Filed: March 11, 2005
    Date of Patent: February 16, 2010
    Assignee: SVOX AG
    Inventors: Horst-Udo Hain, Klaus Lukas
  • Patent number: 7580843
    Abstract: A synthesis subband filter apparatus is provided. The apparatus is used for processing 18 sets of signals which each includes 32 subband sampling signals in accordance with a specification providing 512 window coefficients. The apparatus includes a processor for processing the 18 sets of signals in sequence. The processor further includes a converting module and a generating module. The converting module is used for converting the 32 subband sampling signals of the set of signals being processed into 32 converted vectors by use of a 32-point discrete cosine transform (DCT), and writing the 32 converted vectors into 512 default vectors with a first-in, first-out queue. The generating module is used for generating 32 pulse code modulation (PCM) signals, relative to the set of signals being processed, according to a set of synthesis formulae proposed in this invention.
    Type: Grant
    Filed: May 8, 2006
    Date of Patent: August 25, 2009
    Assignee: Quanta Computer, Inc.
    Inventors: Chih-Hsien Chang, Chih-Wei Hung, Hsien-Ming Tsai
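The converting module's first step is a 32-point DCT; a plain DCT-II over 32 samples can serve as a sketch (the patent's exact matrixing and the 512-vector FIFO are not reproduced here):

```python
import math

def dct_32(samples):
    """Plain 32-point DCT-II: X[m] = sum_k x[k] cos(pi*(2k+1)*m/(2N)).
    A stand-in for the converting module's 32-point DCT."""
    n = len(samples)  # expected to be 32
    return [
        sum(s * math.cos(math.pi * (2 * k + 1) * m / (2 * n))
            for k, s in enumerate(samples))
        for m in range(n)
    ]
```

A constant input concentrates all energy in coefficient 0, a quick sanity check that the transform behaves as a DCT should.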
  • Publication number: 20090187409
    Abstract: Techniques for efficiently encoding an input signal are described. In one design, a generalized encoder encodes the input signal (e.g., an audio signal) based on at least one detector and multiple encoders. The at least one detector may include a signal activity detector, a noise-like signal detector, a sparseness detector, some other detector, or a combination thereof. The multiple encoders may include a silence encoder, a noise-like signal encoder, a time-domain encoder, a transform-domain encoder, some other encoder, or a combination thereof. The characteristics of the input signal may be determined based on the at least one detector. An encoder may be selected from among the multiple encoders based on the characteristics of the input signal. The input signal may be encoded based on the selected encoder. The input signal may include a sequence of frames, and detection and encoding may be performed for each frame.
    Type: Application
    Filed: October 8, 2007
    Publication date: July 23, 2009
    Applicant: Qualcomm Incorporated
    Inventors: Venkatesh Krishnan, Vivek Rajendran, Ananthapadmanabhan A. Kandhadai
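Detector-driven encoder selection amounts to a dispatch on per-frame signal characteristics; a toy sketch with purely illustrative thresholds and encoder names:

```python
def choose_encoder(frame_energy, is_noise_like, sparseness):
    """Route a frame to one of several encoders based on detector
    outputs (threshold values here are illustrative only)."""
    if frame_energy < 1e-4:
        return "silence"
    if is_noise_like:
        return "noise-like"
    # sparse spectra favour a transform-domain encoder
    return "transform-domain" if sparseness > 0.5 else "time-domain"
```

In the patent's framing this decision runs per frame, so a single input stream can alternate among silence, noise-like, time-domain and transform-domain coding.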
  • Patent number: 7562018
    Abstract: A language processing portion (31) analyzes a text from a dialogue processing section (20) and transforms the text to information on pronunciation and accent. A prosody generation portion (32) generates an intonation pattern according to a control signal from the dialogue processing section (20). A waveform DB (34) stores prerecorded waveform data together with pitch mark data imparted thereto. A waveform cutting portion (33) cuts desired pitch waveforms from the waveform DB (34). A phase operation portion (35) removes phase fluctuation by standardizing phase spectra of the pitch waveforms cut by the waveform cutting portion (33), and afterwards imparts phase fluctuation by diffusing only high phase components randomly according to the control signal from the dialogue processing section (20). The thus-produced pitch waveforms are placed at desired intervals and superimposed.
    Type: Grant
    Filed: November 25, 2003
    Date of Patent: July 14, 2009
    Assignee: Panasonic Corporation
    Inventors: Takahiro Kamai, Yumiko Kato
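The phase operation can be sketched as: DFT a pitch waveform, zero its phases (standardization), then randomly jitter only the bins above a cutoff before inverse-transforming. The cutoff and jitter range are assumptions; the patent's control signal would drive them:

```python
import cmath, math, random

def standardise_and_diffuse(waveform, cutoff_bin, jitter=0.3, seed=0):
    """Zero the phase of bins below `cutoff_bin`, randomly diffuse
    the phase of the higher bins, and inverse-DFT.  Uses an O(n^2)
    DFT for clarity; a real system would use an FFT."""
    n = len(waveform)
    spec = [sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(waveform)) for k in range(n)]
    rng = random.Random(seed)
    shaped = [cmath.rect(abs(c), 0.0 if k < cutoff_bin
                         else rng.uniform(-jitter, jitter))
              for k, c in enumerate(spec)]
    return [sum(c * cmath.exp(2j * math.pi * k * t / n)
                for k, c in enumerate(shaped)).real / n for t in range(n)]
```

With the cutoff covering every bin, phase standardization alone turns a sine input into a cosine of the same magnitude spectrum, which is the deterministic half of the operation.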
  • Patent number: 7558732
    Abstract: Method and system for computer-aided speech synthesis for synthesizing electronic text by performing a predefined series of rules-based analyses in a predefined order, each of the analyses operating in a graduated manner to convert respective electronic text into electronic lexicons, and announcing analog speech based on the results of the performing step.
    Type: Grant
    Filed: March 22, 2005
    Date of Patent: July 7, 2009
    Assignee: Infineon Technologies AG
    Inventors: Michael Kustner, Markus Schnell
  • Patent number: 7536305
    Abstract: A mixed lossless audio compression has application to a unified lossy and lossless audio compression scheme that combines lossy and lossless audio compression within a same audio signal. The mixed lossless compression codes a transition frame between lossy and lossless coding frames to produce seamless transitions. The mixed lossless coding performs a lapped transform and inverse lapped transform to produce an appropriately windowed and folded pseudo-time domain frame, which can then be losslessly coded. The mixed lossless coding also can be applied for frames that exhibit poor lossy compression performance.
    Type: Grant
    Filed: July 14, 2003
    Date of Patent: May 19, 2009
    Assignee: Microsoft Corporation
    Inventors: Wei-Ge Chen, Chao He
  • Patent number: 7454348
    Abstract: A system and method for generating a synthetic text-to-speech (TTS) voice are disclosed. A user is presented with at least one TTS voice and at least one voice characteristic. A new synthetic TTS voice is generated by blending a plurality of existing TTS voices according to the selected voice characteristics. The blending of voices involves interpolating segmented parameters of each TTS voice. Segmented parameters may be, for example, prosodic characteristics of the speech such as pitch, volume, phone durations, accents, stress, mispronunciations and emotion.
    Type: Grant
    Filed: January 8, 2004
    Date of Patent: November 18, 2008
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: David A. Kapilow, Kenneth H. Rosen, Juergen Schroeter
  • Patent number: 7401021
    Abstract: An apparatus for voice modulation in a mobile terminal comprises: a voice input unit that receives a subscriber's voice and generates an analog voice signal; a voice modulation unit for modulating the generated analog voice signal; an audio processor for converting the modulated analog voice signal into a digital signal; and a mobile station modem (MSM) for processing the digital signal for wireless transmission. The apparatus is thus able to protect the subscriber's privacy by modulating the subscriber's voice during a call, and to help prevent telephone harassment. The voice can also be modulated in various ways, such as a cave echo, a child's voice, a devil's voice, a man's voice, a woman's voice, or a user-defined sound effect, satisfying a range of user preferences.
    Type: Grant
    Filed: July 10, 2002
    Date of Patent: July 15, 2008
    Assignee: LG Electronics Inc.
    Inventor: I-Won Choi
  • Patent number: 7379873
    Abstract: Voice synthesis unit data stored in a phoneme database 10 is selected by a voice synthesis unit selector 12 in accordance with MIDI information stored in a performance data storage unit 11. Characteristic parameters are derived from the selected voice synthesis unit data. A characteristic parameter correction unit 21 corrects the characteristic parameters based on pitch information, etc. A spectrum envelope generating unit 23 generates a spectrum envelope in accordance with the corrected characteristic parameters. A timbre transformation unit 25 changes timbre by correcting the characteristic parameters in accordance with timbre transformation parameters along the time axis. Timbres at the same song position can be transformed into different arbitrary timbres; the synthesized singing voice is therefore rich in variety and realism.
    Type: Grant
    Filed: July 3, 2003
    Date of Patent: May 27, 2008
    Assignee: Yamaha Corporation
    Inventor: Hideki Kemmochi
  • Patent number: 7336565
    Abstract: An educational alarm clock radio is provided that speaks a new word each day when the alarm goes off, the words each being stored in a memory cartridge as individual increments of information in a sequential set of increments. When the alarm goes off, the word of the day, the definition of that word of the day and its use in a sentence are spoken via the audio portion of the device as the next information increment in the sequence. The word will also be displayed on a screen so the user can see the correct spelling of the word. The word may be replayed at any time during the day by activating a device control. Prior words may be displayed by energizing a reverse control. The entire sequence of previously played words, moreover, can be played in serial fashion through further activation of a control or combination of controls. The device also serves as an alarm clock radio with alarm types such as wake by buzzer or radio as well as the wake-by-words function.
    Type: Grant
    Filed: June 12, 2006
    Date of Patent: February 26, 2008
    Inventors: Neil Rohrbacker, Gregory Rohrbacker
  • Patent number: 7337117
    Abstract: An apparatus for phonetically screening predetermined character strings. The apparatus includes a text-to-speech module, and a phonetic screening module in communication with the text-to-speech module. The phonetic screening module is for replacing a first character string with a second character string based on a phonetic enunciation by the text-to-speech module of the first character string.
    Type: Grant
    Filed: September 21, 2004
    Date of Patent: February 26, 2008
    Assignee: AT&T Delaware Intellectual Property, Inc.
    Inventor: Anita Hogans Simpson
  • Patent number: 7162417
    Abstract: An amplitude altering magnification (r) applied to sub-phoneme units of a voiced portion and an amplitude altering magnification s to be applied to sub-phoneme units of an unvoiced portion are determined based upon a target phoneme average power (p0) of synthesized speech and power (p) of a selected phoneme unit. Sub-phoneme units are extracted from a phoneme to be synthesized. From among the extracted sub-phoneme units, a sub-phoneme unit of the voiced portion is multiplied by the amplitude altering magnification (r), and a sub-phoneme unit of the unvoiced portion is multiplied by the amplitude altering magnification (s). Synthesized speech is obtained using the sub-phoneme units thus obtained. This makes it possible to realize power control in which any decline in the quality of synthesized speech is reduced.
    Type: Grant
    Filed: July 13, 2005
    Date of Patent: January 9, 2007
    Assignee: Canon Kabushiki Kaisha
    Inventors: Masayuki Yamada, Yasuhiro Komori, Mitsuru Otsuka
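The magnifications reduce to a power-matching scale factor per portion; a sketch assuming power is the mean squared amplitude (the patent applies separate magnifications r and s to voiced and unvoiced sub-phoneme units, but the derivation of each is the same shape):

```python
import math

def power_scale(samples, target_power):
    """Magnification bringing a sub-phoneme unit's power (mean
    squared amplitude) to `target_power`."""
    p = sum(x * x for x in samples) / len(samples)
    magnification = math.sqrt(target_power / p) if p > 0 else 1.0
    return [magnification * x for x in samples]
```

Scaling a unit of power 1.0 to a target power of 4.0 doubles every sample, exactly the r (or s) multiplication described in the abstract.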
  • Patent number: 7069217
    Abstract: A synthesizer is disclosed in which a speech waveform is synthesized by selecting a synthetic starting waveform segment and then generating a sequence of further segments. The further waveform segments are generated based jointly upon the value of the immediately preceding segment and upon a model of the dynamics of an actual sound similar to that being generated. In particular, a method is disclosed of synthesizing a voiced speech sound by calculating each new output value from the previous output value using data modeling the evolution, over a short time interval, of the voiced speech sound to be synthesized. This sequential generation of waveform segments enables a synthesized sequence of speech waveforms to be generated of any duration. In addition, a low-dimensional state-space representation of speech signals is used in which successive pitch pulse cycles are superimposed to estimate the progression of the cyclic speech signal within each cycle.
    Type: Grant
    Filed: January 9, 1997
    Date of Patent: June 27, 2006
    Assignee: British Telecommunications PLC
    Inventors: Stephen McLaughlin, Michael Banbrook
  • Patent number: 7050966
    Abstract: A system and method of improving signal intelligibility over an interference signal is provided. The system includes a psychoacoustic processor having a psychoacoustic model wherein the level of a signal-of-interest is improved so as to be audible above noise and so as not to exceed a predetermined maximum output level. The system can be combined with active noise cancellation.
    Type: Grant
    Filed: August 7, 2002
    Date of Patent: May 23, 2006
    Assignee: AMI Semiconductor, Inc.
    Inventors: Todd Schneider, David Coode, Robert L. Brennan, Peter Olijnyk
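The core level decision, audible above the noise but never past the output ceiling, can be sketched as a two-sided clamp; the margin and ceiling values are illustrative, and the patent's psychoacoustic model would refine them per frequency band:

```python
def output_level_db(signal_db, noise_db, margin_db=6.0, ceiling_db=90.0):
    """Raise the signal-of-interest to sit `margin_db` above the
    interference, but never above the permitted maximum output."""
    wanted = max(signal_db, noise_db + margin_db)
    return min(wanted, ceiling_db)
```

A 60 dB signal against 70 dB noise is boosted to 76 dB; against 89 dB noise the boost is capped at the 90 dB ceiling; an already-loud signal passes through unchanged.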
  • Patent number: 7010488
    Abstract: A system and method is used to compress concatenative acoustic inventories for speech. Instead of using general purpose signal compression methods such as vector quantization, the method of the invention uses multiple properties of acoustic inventories to reduce the size of the acoustic inventories, such as the close acoustic match property and acoustic units that are labeled with sufficiently fine distinctions such that between any two phones no events occur that are substantially distinct from these two phones. The close acoustic match property is where acoustic units that share the same phone are acoustically similar at the points where these units may be concatenated. By utilizing multiple properties of acoustic units, the number of parameters per unit that are stored as LPC parameters are minimized. As a result, smaller storage devices may be used due to the reduction of the size of the storage requirements.
    Type: Grant
    Filed: May 9, 2002
    Date of Patent: March 7, 2006
    Assignee: Oregon Health & Science University
    Inventors: Jan P. H. van Santen, Alexander Kain
  • Patent number: 6999520
    Abstract: A method and apparatus for extending the dynamic range of an integer or fixed-point Fast Fourier Transform (“FFT”) system that may be used in communications devices such as ADSL modems. The disclosed FFT system utilizes a shift control module to increase the effective dynamic range of the FFT implementation by selectively choosing at least one stage of an FFT butterfly implementation in which the outputs of the butterfly stage are not divided to otherwise avoid overflow problems.
    Type: Grant
    Filed: January 24, 2002
    Date of Patent: February 14, 2006
    Assignee: Tioga Technologies
    Inventor: Guy Reina
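Shift control in a fixed-point FFT boils down to dividing a butterfly stage's outputs only when its peak magnitude could actually overflow; a sketch over per-stage peak values, assuming a 16-bit word length:

```python
def stage_shifts(stage_peaks, word_max=2**15 - 1):
    """Return, per butterfly stage, whether its outputs must be
    divided by 2: a radix-2 butterfly can at most double the peak
    magnitude, so the divide is skipped when the doubled peak still
    fits in the word, preserving one bit of dynamic range."""
    return [1 if 2 * peak > word_max else 0 for peak in stage_peaks]
```

Unconditionally dividing at every stage of an N-point FFT costs log2(N) bits of precision; conditional shifting recovers a bit at every stage whose data happens to be small.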
  • Patent number: 6963842
    Abstract: A memory-efficient system converting a signal from a first transform domain to a second transform domain. The system includes a first mechanism that obtains an input signal expressed via a first transform-domain signal representation. A second mechanism expresses the input signal via a second transform-domain signal representation without intermediate time-domain conversion. In the specific embodiment, the input signal is a Modified Discrete Cosine Transform (MDCT) signal. The second transform-domain signal representation is a Discrete Fourier Transform (DFT) signal. The second mechanism further includes a third mechanism that combines effects of an inverse MDCT, a synthesis window function, and an analysis window function, and provides a first signal in response thereto. A fourth mechanism converts the MDCT signal to the DFT signal based on the first signal.
    Type: Grant
    Filed: September 5, 2001
    Date of Patent: November 8, 2005
    Assignee: Creative Technology Ltd.
    Inventor: Michael M. Goodwin
  • Patent number: 6950799
    Abstract: A speech processing system modifies various aspects of input speech according to a user-selected one of various preprogrammed voice fonts. Initially, the speech converter receives a formants signal representing an input speech signal and a pitch signal representing the input signal's fundamental frequency. One or both of the following may also be received: a voicing signal comprising an indication of whether the input speech signal is voiced, unvoiced, or mixed, and/or a gain signal representing the input speech signal's energy. The speech converter also receives user selection of one of multiple preprogrammed voice fonts, each specifying a manner of modifying one or more of the received signals (i.e., formants, voicing, pitch, gain). The speech converter modifies at least one of the formants, voicing, pitch, and/or gain signals as specified by the selected voice font.
    Type: Grant
    Filed: February 19, 2002
    Date of Patent: September 27, 2005
    Assignee: Qualcomm Inc.
    Inventors: Ning Bi, Andrew P. DeJaco
  • Patent number: 6947731
    Abstract: A method for conversion of a voice output of appliance statuses, wherein three spoken phrases are stored for each appliance to be controlled, with the first spoken phrase being allocated to a first appliance status, the second spoken phrase being allocated to a second appliance status, and the third spoken phrase being allocated to at least one third status. When an appliance status is checked, the relevant appliance sends a data word. If the value (which identifies the current appliance status) of the data word corresponds to a first value, the first spoken phrase is output; if it corresponds to a second value, the second spoken phrase is output; and for at least one third value, the third spoken phrase and the third value are output.
    Type: Grant
    Filed: September 21, 2000
    Date of Patent: September 20, 2005
    Assignee: Siemens Aktiengesellschaft
    Inventor: Erich Kamperschroer
  • Patent number: 6845359
    Abstract: A Fast Fourier Transform (FFT) based voice synthesis method, program product and vocoder. Sounds, e.g., speech and audio, are synthesized from multiple sine waves. Each sine wave component is represented by a small number of FFT coefficients. Amplitude and phase information of the components may be incorporated into these coefficients. The FFT coefficients corresponding to each of the components are summed and an inverse FFT is then applied to the sum to generate a time domain signal. An appropriate section is extracted from the inverse-transformed time domain signal as an approximation to the desired output. FFT-based synthesis may be combined with simple sine wave summation, using FFT-based synthesis for complex sounds, e.g., male voices and unvoiced speech, and sine wave summation for simpler sounds, e.g., female voices.
    Type: Grant
    Filed: March 22, 2001
    Date of Patent: January 18, 2005
    Assignee: Motorola, Inc.
    Inventor: Tenkasi Ramabadran
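The summing-then-inverse-FFT idea can be sketched by placing each sine component into a shared spectrum and inverse-transforming once; one bin per component is a simplification of the patent's few-coefficients-per-sine representation:

```python
import cmath, math

def synth_from_bins(components, n=64):
    """Place each sine component (bin index, amplitude, phase) into a
    length-n spectrum and inverse-DFT the sum.  Bin indices must
    satisfy 0 < bin < n/2; an O(n^2) inverse DFT stands in for the
    inverse FFT."""
    spec = [0j] * n
    for b, amp, phase in components:
        spec[b] += (amp * n / 2) * cmath.exp(1j * phase)
        spec[n - b] += (amp * n / 2) * cmath.exp(-1j * phase)  # mirror keeps output real
    return [sum(c * cmath.exp(2j * math.pi * k * t / n)
                for k, c in enumerate(spec)).real / n for t in range(n)]
```

A single unit-amplitude, zero-phase component in bin 1 comes out as one cycle of a cosine across the frame, from which the desired output section would be extracted.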
  • Publication number: 20040260552
    Abstract: A method, computer program product, and data processing system for compensating for fundamental frequency changes in a frame-based speech processing system is disclosed. In a preferred embodiment of the present invention, a frame of a voiced speech signal is processed by an inverse linear-predictive filter to obtain a residual signal that is indicative of the fundamental tone emitted by the speaker's vocal cords. A transformation function is applied to the frame to limit the frame to an integer number of pitch cycles. This transformed frame is used in conjunction with vocal tract parameters obtained from the original speech signal frame to construct a pitch-adjusted speech signal that can more easily be understood by speech- or speaker recognition software.
    Type: Application
    Filed: June 23, 2003
    Publication date: December 23, 2004
    Applicant: International Business Machines Corporation
    Inventors: Jiri Navratil, Ganesh N. Ramaswamy, Ran D. Zilca
  • Patent number: 6804649
    Abstract: Voice synthesis with improved expressivity is obtained in a voice synthesiser of source-filter type by making use of a library of source sound categories in the source module. Each source sound category corresponds to a particular morphological category and is derived from analysis of real vocal sounds, by inverse filtering so as to subtract the effect of the vocal tract. The library may be parametrical, that is, the stored data corresponds not to the inverse-filtered sounds themselves but to synthesis coefficients for resynthesising the inverse-filtered sounds using any suitable re-synthesis technique, such as the phase vocoder technique. The coefficients are derived by Short Time Fourier Transform (STFT) analysis.
    Type: Grant
    Filed: June 1, 2001
    Date of Patent: October 12, 2004
    Assignee: Sony France S.A.
    Inventor: Eduardo Reck Miranda
  • Patent number: 6795807
    Abstract: A device and a method to be used by laryngeally impaired people to improve the naturalness of their speech. An artificial sound creating mechanism which forms a simulated glottal pulse in the vocal tract is utilized. An artificial glottal pulse is compared with the natural spectrum and an inverse filter is generated to provide an output signal which would better reproduce natural sound. A digital signal processor introduces a variation of pitch based on an algorithm developed for this purpose; i.e. creating prosody. The algorithm uses primarily the relative amplitude of the speech signal and the rise and fall rates of the amplitude as a basis for setting the frequency of the speech. The invention also clarifies speech of laryngectomees by sensing the presence of consonants in the speech and appropriately amplifying them with respect to the vowel sounds.
    Type: Grant
    Filed: August 17, 2000
    Date of Patent: September 21, 2004
    Inventor: David R. Baraff
  • Patent number: 6785652
    Abstract: A method and an apparatus for improved duration modeling of phonemes in a speech synthesis system are provided. According to one aspect, text is received into a processor of a speech synthesis system. The received text is processed using a sum-of-products phoneme duration model that is used in either the formant method or the concatenative method of speech generation. The phoneme duration model, which is used along with a phoneme pitch model, is produced by developing a non-exponential functional transformation form for use with a generalized additive model. The non-exponential functional transformation form comprises a root sinusoidal transformation that is controlled in response to a minimum phoneme duration and a maximum phoneme duration. The minimum and maximum phoneme durations are observed in training data. The received text is processed by specifying at least one of a number of contextual factors for the generalized additive model.
    Type: Grant
    Filed: December 19, 2002
    Date of Patent: August 31, 2004
    Assignee: Apple Computer, Inc.
    Inventors: Jerome R. Bellegarda, Kim Silverman
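A root-sinusoidal squashing of an unbounded additive-model output into the observed [minimum, maximum] duration range might look like the following; this exact functional form is a guess for illustration, not the patent's formula:

```python
import math

def bounded_duration(x, d_min, d_max):
    """Map an unbounded model output x monotonically into
    (d_min, d_max) via a sinusoid of a bounded angle, using
    sin(atan(x)) = x / sqrt(1 + x*x)."""
    squashed = (math.sin(math.atan(x)) + 1.0) / 2.0  # in (0, 1)
    return d_min + (d_max - d_min) * squashed
```

Unlike an exponential transformation, any such bounded form guarantees that predicted phoneme durations never escape the range seen in training data, which is the property the abstract emphasizes.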
  • Patent number: 6778962
    Abstract: A speech synthesizing method includes determining the accent type of an input character string; selecting prosodic model data, based on the input character string and the accent type, from a prosody dictionary that stores typical prosodic models representing the prosodic information for the character strings in a word dictionary; transforming the prosodic information of the prosodic model when the character string of the selected prosodic model does not coincide with the input character string; selecting the waveform data corresponding to each character of the input character string from a waveform dictionary, based on the transformed prosodic model data; and connecting the selected waveform data with each other. A difference between an input character string and a character string stored in the dictionary is thereby absorbed, making it possible to synthesize a natural-sounding voice.
    Type: Grant
    Filed: July 21, 2000
    Date of Patent: August 17, 2004
    Assignees: Konami Corporation, Konami Computer Entertainment Tokyo, Inc.
    Inventors: Osamu Kasai, Toshiyuki Mizoguchi
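A minimal sketch of the select-then-transform lookup described above. The dictionary contents, the `(duration, pitch)` representation, and the nearest-neighbor `stretch` transform are all illustrative assumptions, not the patent's actual data structures:

```python
# Hypothetical prosody dictionary keyed by (character string, accent type);
# each entry is a list of (duration seconds, pitch Hz) pairs.
PROSODY = {
    ("hello", 1): [(0.10, 220.0), (0.12, 180.0)],
}

def stretch(pattern, n):
    """Resample a prosodic pattern to n entries (a stand-in for the
    patent's transformation of a non-coincident model)."""
    return [pattern[min(int(i * len(pattern) / n), len(pattern) - 1)]
            for i in range(n)]

def select_prosody(text, accent_type):
    """Exact dictionary hit if possible; otherwise transform a model
    with the same accent type; otherwise fall back to a flat default."""
    model = PROSODY.get((text, accent_type))
    if model is not None:
        return model
    for (word, acc), pattern in PROSODY.items():
        if acc == accent_type:
            return stretch(pattern, len(text))
    return [(0.1, 200.0)] * len(text)
```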
  • Patent number: 6757653
    Abstract: A method of composing messages for speech output that improves the quality of reproduced speech. A series of original sentences for messages is segmented and stored as audio files with search criteria. The length, position, and transition values for the respective segments can be recorded and stored. A sentence to be reproduced is transmitted in a format corresponding to the format of the search criteria. It is determined whether the sentence to be reproduced can be fully reproduced by one segment or a succession of stored segments. The segments found are then examined, using their stored entries, for how well they match in speech rhythm. The audio files of the segments that best preserve the natural speech rhythm are combined and output for reproduction.
    Type: Grant
    Filed: June 28, 2001
    Date of Patent: June 29, 2004
    Assignee: Nokia Mobile Phones, Ltd.
    Inventors: Peter Buth, Simona Grothues, Amir Iman, Wolfgang Theimer
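The "one segment or a succession of stored segments" check above is essentially a covering problem. The greedy longest-match strategy below is a simplified illustration (the patent additionally scores rhythm compatibility, which is omitted here):

```python
def cover_sentence(words, segments):
    """Greedily cover `words` with the longest stored segments
    (tuples of words). Returns the covering segments, or None if the
    sentence cannot be reproduced from the store."""
    out, i = [], 0
    while i < len(words):
        best = None
        for seg in segments:
            if words[i:i + len(seg)] == list(seg):
                if best is None or len(seg) > len(best):
                    best = seg  # prefer the longest match at position i
        if best is None:
            return None
        out.append(best)
        i += len(best)
    return out
```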
  • Patent number: 6754628
    Abstract: Methods and apparatus for facilitating speaker recognition, wherein, from target data that is provided relating to a target speaker and background data that is provided relating to at least one background speaker, a set of cohort data is selected from the background data that has at least one proximate characteristic with respect to the target data. The target data and the cohort data are then combined in a manner to produce at least one new cohort model for use in subsequent speaker verification. Similar methods and apparatus are contemplated for non-voice-based applications, such as verification through fingerprints.
    Type: Grant
    Filed: June 13, 2000
    Date of Patent: June 22, 2004
    Assignee: International Business Machines Corporation
    Inventors: Upendra V. Chaudhari, Stephane H. Maes, Jiri Navratil
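The cohort-selection step ("at least one proximate characteristic with respect to the target data") can be sketched as a nearest-neighbor search in a feature space. The Euclidean distance and the `(name, vector)` representation are illustrative assumptions:

```python
def select_cohort(target_vec, background, k=5):
    """Pick the k background speakers whose feature vectors lie
    closest to the target speaker's vector."""
    def dist(v):
        return sum((a - b) ** 2 for a, b in zip(target_vec, v)) ** 0.5
    # background is a list of (speaker_name, feature_vector) pairs
    return sorted(background, key=lambda item: dist(item[1]))[:k]
```

The selected cohort would then be combined with the target data to train the new cohort model used in verification.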
  • Patent number: 6741666
    Abstract: A method and a device by which original digital signals are analysis-filtered, where the original digital signals include original samples representing physical quantities, and where the original samples are transformed by successive calculation steps into high and low frequency output samples. Any sample calculated at a given step is calculated by a predetermined function of the original samples and/or previously calculated samples, where the samples are ordered by increasing rank. The signal is processed by successive input blocks of samples, where the calculations made on an input block under consideration take into account only the original or calculated samples belonging to the input block under consideration, and where the input block under consideration and the following input block overlap over a predetermined number of original samples. Output blocks are formed, where each output block corresponds respectively to an input block.
    Type: Grant
    Filed: January 11, 2000
    Date of Patent: May 25, 2004
    Assignee: Canon Kabushiki Kaisha
    Inventors: Félix Henry, Bertrand Berthelot, Eric Majani
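The block-overlap idea above can be illustrated with a toy analysis filter. The Haar-style averaging/differencing below is a stand-in for the patent's predetermined functions; the point is only that each block is filtered using samples from that block alone, with consecutive blocks sharing a fixed overlap:

```python
def analyze_blocks(samples, block=8, overlap=2):
    """Toy two-band analysis applied block by block; consecutive
    input blocks share `overlap` samples so each block can be
    processed independently of its neighbors."""
    step = block - overlap
    lows, highs = [], []
    for start in range(0, max(len(samples) - overlap, 1), step):
        chunk = samples[start:start + block]
        for i in range(0, len(chunk) - 1, 2):
            lows.append((chunk[i] + chunk[i + 1]) / 2)   # low-frequency output
            highs.append(chunk[i] - chunk[i + 1])        # high-frequency output
    return lows, highs
```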
  • Publication number: 20040093214
    Abstract: A speech-to-touch translator assembly and method converts spoken words directed to an operator into tactile sensations produced by combinations of pressure points exerted on the operator's body. Each combination of pressure points signifies a phoneme of a spoken word, with sound characteristics superimposed, permitting persons who are deaf and blind to comprehend the spoken words and identify the speaker.
    Type: Application
    Filed: November 12, 2002
    Publication date: May 13, 2004
    Inventors: Robert V. Belenger, Gennaro R. Lopriore
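The phoneme-to-pressure-point mapping is naturally a lookup table. The table entries below are invented for illustration; the patent does not publish its actual phoneme-to-actuator assignments:

```python
# Hypothetical mapping from phonemes to combinations of pressure
# points (numbered actuator positions on the operator's body).
PHONEME_TO_POINTS = {
    "HH": {1},
    "EH": {2, 3},
    "L":  {1, 4},
    "OW": {2, 5},
}

def to_tactile(phonemes):
    """Translate a phoneme sequence into actuator firing patterns,
    skipping phonemes with no assigned combination."""
    return [PHONEME_TO_POINTS[p] for p in phonemes if p in PHONEME_TO_POINTS]
```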
  • Publication number: 20040073429
    Abstract: The invention relates to an information transmission system capable of transmitting target information via voice, as well as to an information encoding apparatus and an information decoding apparatus for use with the system. The information encoding apparatus (31) converts input text information to an intermediate code in accordance with a predetermined encoding method, and outputs a voice derived from voice information based on the intermediate code and supplemented with music arrangement information. The voice is transmitted either directly or via a broadcasting or communicating medium to a receiving side. The information decoding apparatus (34) on the receiving side receives the generated voice, recognizes a voice waveform from the received voice, and reproduces the original target information by decoding the intermediate code based on the recognized voice waveform.
    Type: Application
    Filed: August 15, 2003
    Publication date: April 15, 2004
    Inventor: Tetsuya Naruse
  • Publication number: 20040006472
    Abstract: Voice synthesis unit data stored in a phoneme database 10 is selected by a voice synthesis unit selector 12 in accordance with MIDI information stored in a performance data storage unit 11. Characteristic parameters are derived from the selected voice synthesis unit data. A characteristic parameter correction unit 21 corrects the characteristic parameters based on pitch information and the like. A spectrum envelope generating unit 23 generates a spectrum envelope in accordance with the corrected characteristic parameters. A timbre transformation unit 25 changes timbre by correcting the characteristic parameters, along the time axis, in accordance with timbre transformation parameters. Timbres at the same position in a song can each be transformed into different arbitrary timbres, so the synthesized singing voice is rich in variety and realism.
    Type: Application
    Filed: July 3, 2003
    Publication date: January 8, 2004
    Applicant: Yamaha Corporation
    Inventor: Hideki Kemmochi
  • Patent number: 6658382
    Abstract: An input signal is time-frequency transformed, then the frequency-domain coefficients are divided into coefficient segments of about 100 Hz width to generate a sequence of coefficient segments, and the sequence is split into subbands each consisting of plural coefficient segments. A threshold value is determined based on the intensity of each coefficient segment in each subband. The intensity of each coefficient segment is compared with the threshold value, and the coefficient segments are classified into low- and high-intensity groups. The coefficient segments are quantized for each group, or they are flattened respectively and then quantized through recombination.
    Type: Grant
    Filed: March 23, 2000
    Date of Patent: December 2, 2003
    Assignee: Nippon Telegraph and Telephone Corporation
    Inventors: Naoki Iwakami, Takehiro Moriya, Akio Jin, Kazuaki Chikira, Takeshi Mori
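The segment-and-threshold classification above can be sketched directly. The mean-based threshold and the energy measure are illustrative assumptions; the patent derives its threshold per subband:

```python
def classify_segments(coeffs, seg_len, factor=1.0):
    """Group frequency-domain coefficients into fixed-width segments,
    then split the segments into low- and high-intensity groups
    relative to `factor` times the mean segment intensity."""
    segs = [coeffs[i:i + seg_len] for i in range(0, len(coeffs), seg_len)]
    intensity = [sum(c * c for c in s) for s in segs]          # energy per segment
    threshold = factor * sum(intensity) / len(intensity)
    low = [s for s, e in zip(segs, intensity) if e < threshold]
    high = [s for s, e in zip(segs, intensity) if e >= threshold]
    return low, high
```

Each group would then be quantized with parameters suited to its intensity range.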
  • Publication number: 20030187651
    Abstract: A voice synthesis system analyzes an input character string and determines which parts should use recorded voice and which should use synthesized voice. It extracts voice data for the recorded-voice parts from a database and extracts their features. The system then synthesizes voice data for the synthesized-voice parts to fit the extracted features, and combines and outputs these pieces of voice data.
    Type: Application
    Filed: December 3, 2002
    Publication date: October 2, 2003
    Applicant: Fujitsu Limited
    Inventor: Wataru Imatake
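The recorded-versus-synthesized split can be sketched as a simple planning pass over the input. The database contents and file names below are hypothetical:

```python
# Hypothetical database of pre-recorded words and their audio files.
RECORDED = {"weather": "weather.wav", "today": "today.wav"}

def plan_output(words):
    """Decide, word by word, whether to play a stored recording or
    fall back to synthesis (the split described in the abstract).
    Feature matching between the two parts is omitted here."""
    return [("recorded", RECORDED[w]) if w in RECORDED else ("synth", w)
            for w in words]
```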
  • Patent number: 6553344
    Abstract: A method and an apparatus for improved duration modeling of phonemes in a speech synthesis system are provided. According to one aspect, text is received into a processor of a speech synthesis system. The received text is processed using a sum-of-products phoneme duration model that is used in either the formant method or the concatenative method of speech generation. The phoneme duration model, which is used along with a phoneme pitch model, is produced by developing a non-exponential functional transformation form for use with a generalized additive model. The non-exponential functional transformation form comprises a root sinusoidal transformation that is controlled in response to a minimum phoneme duration and a maximum phoneme duration. The minimum and maximum phoneme durations are observed in training data. The received text is processed by specifying at least one of a number of contextual factors for the generalized additive model.
    Type: Grant
    Filed: February 22, 2002
    Date of Patent: April 22, 2003
    Assignee: Apple Computer, Inc.
    Inventors: Jerome R. Bellegarda, Kim Silverman
  • Patent number: 6549884
    Abstract: A system for pitch-shifting an audio signal in which resampling is done in the frequency domain. The signal is converted to a frequency-domain representation, and a specific region, located at a first frequency, is identified. The region is then shifted to a second frequency location to form an adjusted frequency-domain representation. Finally, the adjusted representation is transformed back to a time-domain signal representing the input signal with shifted pitch. This eliminates the expensive time-domain resampling stage and makes the computational cost independent of the pitch-modification factor.
    Type: Grant
    Filed: September 21, 1999
    Date of Patent: April 15, 2003
    Assignee: Creative Technology Ltd.
    Inventors: Jean Laroche, Mark Dolson
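The core move, relocating a spectral region to a new frequency location, can be shown on a list of magnitude bins. This is a heavily simplified sketch (real pitch shifting must also handle phase continuity, which is ignored here):

```python
def shift_region(spectrum, start, width, offset):
    """Move the magnitude bins in [start, start + width) up by
    `offset` bins, zeroing the vacated positions. Bins shifted past
    either end of the spectrum are discarded."""
    out = list(spectrum)
    region = spectrum[start:start + width]
    for i in range(start, start + width):
        out[i] = 0.0                      # vacate the original location
    for i, v in enumerate(region):
        j = start + offset + i
        if 0 <= j < len(out):
            out[j] = v                    # place at the new location
    return out
```

An inverse transform of the adjusted spectrum would then yield the pitch-shifted time-domain signal.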
  • Patent number: 6529874
    Abstract: A representative pattern memory stores a plurality of initial representative patterns as a noise pattern, with a different attribute affixed to each initial representative pattern. A pitch pattern memory stores a large number of natural pitch patterns as an accent phrase. A clustering unit classifies each natural pitch pattern to an initial representative pattern based on the attribute of the accent phrase. A transformation parameter generation unit calculates an error between a transformed representative pattern and each natural pitch pattern classified to the initial representative pattern. A representative pattern generation unit evaluates the sum of these errors and updates each initial representative pattern accordingly.
    Type: Grant
    Filed: September 8, 1998
    Date of Patent: March 4, 2003
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Takehiko Kagoshima, Takaaki Nii, Shigenobu Seto, Masahiro Morita, Masami Akamine, Yoshinori Shiga
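The classify-then-update loop above resembles one step of a k-means-style procedure. The pointwise-mean update below is a simplified stand-in for the patent's error-minimizing update of each representative pattern:

```python
def update_representatives(reps, patterns, assign):
    """One update step: each representative pitch pattern becomes the
    pointwise mean of the natural patterns assigned to it; empty
    clusters keep their current representative."""
    new_reps = []
    for r in range(len(reps)):
        members = [p for p, a in zip(patterns, assign) if a == r]
        if not members:
            new_reps.append(reps[r])
            continue
        new_reps.append([sum(vals) / len(members) for vals in zip(*members)])
    return new_reps
```

Alternating this update with re-classification would drive down the summed error the abstract describes.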
  • Patent number: 6463412
    Abstract: A high-performance voice transformation apparatus and method are provided in which voice input is transformed into a symbolic representation of the phonemes it contains. The symbolic representation is used to retrieve output voice segments of a selected target speaker, so that the voice input can be output in a different voice. In addition, voice input characteristics are extracted from the voice input and applied to the output voice segments to provide a more realistic, human-sounding voice output.
    Type: Grant
    Filed: December 16, 1999
    Date of Patent: October 8, 2002
    Assignee: International Business Machines Corporation
    Inventors: Jason Raymond Baumgartner, Steven Leonard Roberts, Nadeem Malik, Flemming Andersen
  • Patent number: 6336092
    Abstract: The invention is a method for transforming a source individual's voice so as to adopt the characteristics of a target individual's voice. The excitation signal component of the target individual's voice is extracted and the spectral envelope of the source individual's voice is extracted. The transformed voice is synthesized by applying the spectral envelope of the source individual to the excitation signal component of the voice of the target individual. A higher quality transformation is achieved using an enhanced excitation signal created by replacing unvoiced regions of the signal with interpolated data from adjacent voiced regions. Various methods of transforming the spectral characteristics of the source individual's voice are also disclosed.
    Type: Grant
    Filed: April 28, 1997
    Date of Patent: January 1, 2002
    Assignee: IVL Technologies Ltd.
    Inventors: Brian Charles Gibson, Peter Ronald Lupini, Dale John Shpak
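The "enhanced excitation signal" step, replacing unvoiced regions with data interpolated from adjacent voiced regions, can be sketched on a per-sample basis. Linear interpolation is an illustrative choice; the patent does not specify the interpolation method in this abstract:

```python
def enhance_excitation(samples, voiced):
    """Replace each unvoiced run with values linearly interpolated
    between the nearest voiced neighbors on either side."""
    out = list(samples)
    i = 0
    while i < len(out):
        if voiced[i]:
            i += 1
            continue
        j = i
        while j < len(out) and not voiced[j]:
            j += 1                        # unvoiced run is [i, j)
        left = out[i - 1] if i > 0 else (out[j] if j < len(out) else 0.0)
        right = out[j] if j < len(out) else left
        for k in range(i, j):
            t = (k - i + 1) / (j - i + 1)
            out[k] = left + t * (right - left)
        i = j
    return out
```

The enhanced excitation would then be filtered through the source speaker's spectral envelope to produce the transformed voice.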
  • Patent number: 6332121
    Abstract: In a synthesis unit generator, a plurality of synthesis speech segments are generated by synthesizing training speech segments labeled with phonetic contexts and input speech segments while altering the pitch/duration of the input speech segments in accordance with the pitch/duration of the training speech segments. Typical speech segments are selected from the input speech segments on the basis of a distance between the synthesis speech segments and the training speech segments, and are stored in a storage. In addition, a plurality of phonetic context clusters corresponding to the synthesis units are generated on the basis of the distance, and are stored in a storage. A synthesis speech signal is generated by reading out, from the storage, those of the synthesis units, which correspond to the phonetic context clusters including phonetic contexts of input phonemes, and connecting the selected synthesis units in a speech synthesizer.
    Type: Grant
    Filed: November 27, 2000
    Date of Patent: December 18, 2001
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Takehiko Kagoshima, Masami Akamine