Transformation Patents (Class 704/269)
-
Patent number: 8055693
Abstract: A set of words is converted to a corresponding set of particles, wherein the words and the particles are unique within each set. For each word, all possible partitionings of the word into particles are determined, and a cost is determined for each possible partitioning. The particles of the possible partitioning associated with a minimal cost are added to the set of particles.
Type: Grant
Filed: June 30, 2009
Date of Patent: November 8, 2011
Assignee: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Tony Ezzat, Evandro Gouvea
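The partition-and-cost step in the abstract above can be illustrated with a minimal Python sketch. The abstract does not define the cost, so `toy_cost` here is a hypothetical stand-in (one unit per particle not yet in the set, plus a small penalty per piece), and the function names are illustrative only.

```python
def partitionings(word):
    """Yield every split of `word` into contiguous, non-empty pieces."""
    if not word:
        yield []
        return
    for i in range(1, len(word) + 1):
        head = word[:i]
        for rest in partitionings(word[i:]):
            yield [head] + rest


def toy_cost(parts, particles):
    """Hypothetical cost (not from the patent): new particles are expensive,
    and each additional piece adds a small penalty."""
    new_particles = sum(1.0 for p in parts if p not in particles)
    return new_particles + 0.1 * len(parts)


def words_to_particles(words):
    """For each word, keep the particles of its minimum-cost partitioning."""
    particles = set()
    for word in words:
        best = min(partitionings(word), key=lambda parts: toy_cost(parts, particles))
        particles.update(best)
    return particles
```

A word of n characters has 2^(n-1) partitionings, so exhaustive enumeration like this is only practical for short words.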
-
Patent number: 8024193
Abstract: The present invention provides, among other things, automatic identification of near-redundant units in a large TTS voice table, identifying which units are distinctive enough to keep and which units are sufficiently redundant to discard. According to an aspect of the invention, pruning is treated as a clustering problem in a suitable feature space. All instances of a given unit (e.g. word or characters expressed as Unicode strings) are mapped onto the feature space and clustered in that space using a suitable similarity measure. Since all units in a given cluster are, by construction, closely related from the point of view of the measure used, they are suitably redundant and can be replaced by a single instance. The disclosed method can detect near-redundancy in TTS units in a completely unsupervised manner, based on an original feature extraction and clustering strategy.
Type: Grant
Filed: October 10, 2006
Date of Patent: September 20, 2011
Assignee: Apple Inc.
Inventor: Jerome R. Bellegarda
-
Patent number: 7966186
Abstract: A system and method for generating a synthetic text-to-speech (TTS) voice are disclosed. A user is presented with at least one TTS voice and at least one voice characteristic. A new synthetic TTS voice is generated by blending a plurality of existing TTS voices according to the selected voice characteristics. The blending of voices involves interpolating segmented parameters of each TTS voice. Segmented parameters may be, for example, prosodic characteristics of the speech such as pitch, volume, phone durations, accents, stress, mis-pronunciations and emotion.
Type: Grant
Filed: November 4, 2008
Date of Patent: June 21, 2011
Assignee: AT&T Intellectual Property II, L.P.
Inventors: David A. Kapilow, Kenneth H. Rosen, Juergen Schroeter
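As a rough illustration of blending voices by interpolating parameters, the Python sketch below takes a weighted average of numeric per-voice parameters. The parameter names (`pitch_hz`, `volume`) and the linear weighting are assumptions made for this example; the abstract does not state which interpolation is used.

```python
def blend_voices(voices, weights):
    """Weighted average of numeric parameters across voices.

    `voices` is a list of dicts sharing the same keys; `weights` gives the
    relative contribution of each voice (hypothetical representation).
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {
        key: sum(w * voice[key] for voice, w in zip(voices, weights)) / total
        for key in voices[0]
    }
```

With two voices at 100 Hz and 200 Hz and equal weights, the blended pitch is 150 Hz.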
-
Patent number: 7958446
Abstract: A browsing application for accessing resources over a network includes code for receiving a command from a user to translate textual material appearing on an arbitrary page displayed in the content display area, and code for causing the textual material to be passed to a translation resource on the network, whereby the display area of the browsing application is caused to display a page which includes a translation of the textual material. The application may display a menu accessible from an arbitrary page whereby the user may select among translation options such as a translate-to language and a translate-from language. The application may be configured to cause text selected by a user to be translated, and/or may cause an entire page to be translated. Translated text may be displayed along with graphics in a layout similar to that of the original page.
Type: Grant
Filed: October 31, 2005
Date of Patent: June 7, 2011
Assignee: Yahoo! Inc.
Inventors: Edward Seitz, Brockton Davis, Derrick Whittle, James Bollas
-
Patent number: 7953600
Abstract: A speech synthesis system receives symbolic input describing an utterance to be synthesized. In one embodiment, different portions of the utterance are constructed from different sources, one of which is a speech corpus recorded from a human speaker whose voice is to be modeled. The other sources may include other human speech corpora or speech produced using Rule-Based Speech Synthesis (RBSS). At least some portions of the utterance may be constructed by modifying prototype speech units to produce adapted speech units that are contextually appropriate for the utterance. The system concatenates the adapted speech units with the other speech units to produce a speech waveform. In another embodiment, a speech unit of a speech corpus recorded from a human speaker lacks transitions at one or both of its edges. A transition is synthesized using RBSS and concatenated with the speech unit in producing a speech waveform for the utterance.
Type: Grant
Filed: April 24, 2007
Date of Patent: May 31, 2011
Assignee: NovaSpeech LLC
Inventors: Susan R. Hertz, Harold G. Mills
-
Patent number: 7945446
Abstract: A spectrum envelope of an input sound is detected. In the meantime, a converting spectrum is acquired which is a frequency spectrum of a converting sound comprising a plurality of sounds, such as unison sounds. An output spectrum is generated by imparting the detected spectrum envelope of the input sound to the acquired converting spectrum. A sound signal is synthesized on the basis of the generated output spectrum. Further, a pitch of the input sound may be detected, and frequencies of peaks in the acquired converting spectrum may be varied in accordance with the detected pitch of the input sound. In this manner, the output spectrum can have the pitch and spectrum envelope of the input sound and spectrum frequency components of the converting sound comprising a plurality of sounds, and thus, unison sounds can be readily generated with simple arrangements.
Type: Grant
Filed: March 9, 2006
Date of Patent: May 17, 2011
Assignee: Yamaha Corporation
Inventors: Hideki Kemmochi, Yasuo Yoshioka, Jordi Bonada
-
Patent number: 7835909
Abstract: A method and apparatus for normalizing a histogram utilizing a backward cumulative histogram which can cumulate a probability distribution function in an order from a greatest to smallest value so as to estimate a noise robust histogram. A method of normalizing a speech feature vector includes: extracting the speech feature vector from a speech signal; calculating a probability distribution function using the extracted speech feature vector; calculating a backward cumulative distribution function by cumulating the probability distribution function in an order from a largest to smallest value; and normalizing a histogram using the backward cumulative distribution function.
Type: Grant
Filed: December 12, 2006
Date of Patent: November 16, 2010
Assignee: Samsung Electronics Co., Ltd.
Inventors: So-Young Jeong, Gil Jin Jang, Kwang Cheol Oh
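The backward accumulation described above can be sketched in a few lines of Python. This is only an illustration under assumed choices (equal-width bins over a caller-supplied range `lo` to `hi`); it stops at the backward cumulative distribution, because the abstract does not detail the final normalization step.

```python
def backward_cdf(values, num_bins, lo, hi):
    """Return (pdf, backward_cdf) for `values` binned into equal-width bins.

    The backward CDF accumulates bin probabilities from the largest-valued
    bin down to the smallest, so entry 0 is always 1.0 for non-empty input.
    """
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for v in values:
        idx = int((v - lo) / width)
        idx = max(0, min(idx, num_bins - 1))  # clamp out-of-range values
        counts[idx] += 1
    pdf = [c / len(values) for c in counts]

    bcdf = [0.0] * num_bins
    running = 0.0
    for i in range(num_bins - 1, -1, -1):  # largest bin first
        running += pdf[i]
        bcdf[i] = running
    return pdf, bcdf
```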
-
Patent number: 7814284
Abstract: A data redundancy elimination system.
Type: Grant
Filed: January 18, 2007
Date of Patent: October 12, 2010
Assignee: Cisco Technology, Inc.
Inventors: Gideon Glass, Maxim Martynov, Qiwen Zhang, Etai Lev Ran, Dan Li
-
Patent number: 7716052
Abstract: A method, apparatus and a computer program product to generate an audible speech word that corresponds to text. The method includes providing a text word and, in response to the text word, processing pre-recorded speech segments that are derived from a plurality of speakers to selectively concatenate together speech segments based on at least one cost function to form audio data for generating an audible speech word that corresponds to the text word. A data structure is also provided for use in a concatenative text-to-speech system that includes a plurality of speech segments derived from a plurality of speakers, where each speech segment includes an associated attribute vector each of which is comprised of at least one attribute vector element that identifies the speaker from which the speech segment was derived.
Type: Grant
Filed: April 7, 2005
Date of Patent: May 11, 2010
Assignee: Nuance Communications, Inc.
Inventors: Andrew S. Aaron, Ellen M. Eide, Wael M. Hamza, Michael A. Picheny, Charles T. Rutherfoord, Zhi Wei Shuang, Maria E. Smith
-
Patent number: 7698139
Abstract: In a method and apparatus for a differentiated voice output, systems existing in a vehicle, such as the on-board computer, the navigation system, and others, can be connected with a voice output device. The voice outputs of different systems can be differentiated by way of voice characteristics.
Type: Grant
Filed: June 20, 2003
Date of Patent: April 13, 2010
Assignee: Bayerische Motoren Werke Aktiengesellschaft
Inventors: Georg Obert, Klaus-Josef Bengler
-
Publication number: 20100076769
Abstract: Speech enhancement based on a psycho-acoustic model is disclosed that is capable of preserving the fidelity of speech while sufficiently suppressing noise including the processing artifact known as “musical noise”.
Type: Application
Filed: March 14, 2008
Publication date: March 25, 2010
Applicant: DOLBY LABORATORIES LICENSING CORPORATION
Inventor: Rongshan Yu
-
Patent number: 7664645
Abstract: The voice of a synthesized voice output is individualized and matched to a user voice, the voice of a communication partner or the voice of a famous personality. In this way mobile terminals in particular can be originally individualized and text messages can be read out using a specific voice.
Type: Grant
Filed: March 11, 2005
Date of Patent: February 16, 2010
Assignee: SVOX AG
Inventors: Horst-Udo Hain, Klaus Lukas
-
Patent number: 7580843
Abstract: A synthesis subband filter apparatus is provided. The apparatus is used for processing 18 sets of signals, each of which includes 32 subband sampling signals, in accordance with a specification providing 512 window coefficients. The apparatus includes a processor for processing the 18 sets of signals in sequence. The processor further includes a converting module and a generating module. The converting module is used for converting the 32 subband sampling signals of the set of signals being processed into 32 converted vectors by use of a 32-point discrete cosine transform (DCT), and writing the 32 converted vectors into 512 default vectors with a first-in, first-out queue. The generating module is used for generating 32 pulse code modulation (PCM) signals relative to the set of signals being processed, according to a set of synthesis formulae proposed in this invention.
Type: Grant
Filed: May 8, 2006
Date of Patent: August 25, 2009
Assignee: Quanta Computer, Inc.
Inventors: Chih-Hsien Chang, Chih-Wei Hung, Hsien-Ming Tsai
-
Publication number: 20090187409
Abstract: Techniques for efficiently encoding an input signal are described. In one design, a generalized encoder encodes the input signal (e.g., an audio signal) based on at least one detector and multiple encoders. The at least one detector may include a signal activity detector, a noise-like signal detector, a sparseness detector, some other detector, or a combination thereof. The multiple encoders may include a silence encoder, a noise-like signal encoder, a time-domain encoder, a transform-domain encoder, some other encoder, or a combination thereof. The characteristics of the input signal may be determined based on the at least one detector. An encoder may be selected from among the multiple encoders based on the characteristics of the input signal. The input signal may be encoded based on the selected encoder. The input signal may include a sequence of frames, and detection and encoding may be performed for each frame.
Type: Application
Filed: October 8, 2007
Publication date: July 23, 2009
Applicant: Qualcomm Incorporated
Inventors: Venkatesh Krishnan, Vivek Rajendran, Ananthapadmanabhan A. Kandhadai
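A per-frame selection of this kind might look like the Python sketch below. Every detector and threshold here is a hypothetical stand-in chosen for illustration (mean energy for signal activity, zero-crossing rate for noise-likeness, energy concentration for sparseness); the abstract does not specify how the detectors work.

```python
def select_encoder(frame):
    """Pick an encoder label for one frame (list of at least two samples)."""
    # Signal activity detector (assumed): mean energy below a small threshold.
    energy = sum(x * x for x in frame) / len(frame)
    if energy < 1e-4:
        return "silence"

    # Noise-like detector (assumed): high rate of sign changes.
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    if crossings / (len(frame) - 1) > 0.4:
        return "noise_like"

    # Sparseness detector (assumed): most energy sits in a quarter of samples.
    energies = sorted((x * x for x in frame), reverse=True)
    top = sum(energies[: max(1, len(frame) // 4)])
    if top / sum(energies) > 0.9:
        return "time_domain"
    return "transform_domain"


def select_encoders(frames):
    """Run detection independently for each frame in a sequence."""
    return [select_encoder(frame) for frame in frames]
```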
-
Patent number: 7562018
Abstract: A language processing portion (31) analyzes a text from a dialogue processing section (20) and transforms the text to information on pronunciation and accent. A prosody generation portion (32) generates an intonation pattern according to a control signal from the dialogue processing section (20). A waveform DB (34) stores prerecorded waveform data together with pitch mark data imparted thereto. A waveform cutting portion (33) cuts desired pitch waveforms from the waveform DB (34). A phase operation portion (35) removes phase fluctuation by standardizing phase spectra of the pitch waveforms cut by the waveform cutting portion (33), and afterwards imparts phase fluctuation by diffusing only high phase components randomly according to the control signal from the dialogue processing section (20). The thus-produced pitch waveforms are placed at desired intervals and superimposed.
Type: Grant
Filed: November 25, 2003
Date of Patent: July 14, 2009
Assignee: Panasonic Corporation
Inventors: Takahiro Kamai, Yumiko Kato
-
Patent number: 7558732
Abstract: Method and system for computer-aided speech synthesis for synthesizing electronic text by performing a predefined series of rules-based analyses in a predefined order, each of the analyses operating in a graduated manner to convert respective electronic text into electronic lexicons, and announcing analog speech based on the results of the performing step.
Type: Grant
Filed: March 22, 2005
Date of Patent: July 7, 2009
Assignee: Infineon Technologies AG
Inventors: Michael Kustner, Markus Schnell
-
Patent number: 7536305
Abstract: A mixed lossless audio compression has application to a unified lossy and lossless audio compression scheme that combines lossy and lossless audio compression within a same audio signal. The mixed lossless compression codes a transition frame between lossy and lossless coding frames to produce seamless transitions. The mixed lossless coding performs a lapped transform and inverse lapped transform to produce an appropriately windowed and folded pseudo-time domain frame, which can then be losslessly coded. The mixed lossless coding also can be applied for frames that exhibit poor lossy compression performance.
Type: Grant
Filed: July 14, 2003
Date of Patent: May 19, 2009
Assignee: Microsoft Corporation
Inventors: Wei-Ge Chen, Chao He
-
Patent number: 7454348
Abstract: A system and method for generating a synthetic text-to-speech (TTS) voice are disclosed. A user is presented with at least one TTS voice and at least one voice characteristic. A new synthetic TTS voice is generated by blending a plurality of existing TTS voices according to the selected voice characteristics. The blending of voices involves interpolating segmented parameters of each TTS voice. Segmented parameters may be, for example, prosodic characteristics of the speech such as pitch, volume, phone durations, accents, stress, mis-pronunciations and emotion.
Type: Grant
Filed: January 8, 2004
Date of Patent: November 18, 2008
Assignee: AT&T Intellectual Property II, L.P.
Inventors: David A. Kapilow, Kenneth H. Rosen, Juergen Schroeter
-
Patent number: 7401021
Abstract: An apparatus for voice modulation in a mobile terminal comprises: a voice input unit that receives a subscriber's voice and generates an analog voice signal; a voice modulation unit for modulating the generated analog voice signal; an audio processor for converting the modulated analog voice signal into a digital signal; and a mobile station modem (MSM) for processing the digital signal to be suitable for wireless transmission. The apparatus can therefore protect the subscriber's privacy by modulating the subscriber's voice during a call, and can prevent telephone harassment. The subscriber's voice can also be modulated in various ways, such as a voice in a cave, a child's voice, a devil's voice, a man's voice, a woman's voice, or a user-defined sound effect, satisfying the various desires of mobile terminal users.
Type: Grant
Filed: July 10, 2002
Date of Patent: July 15, 2008
Assignee: LG Electronics Inc.
Inventor: I-Won Choi
-
Patent number: 7379873
Abstract: Voice synthesis unit data stored in a phoneme database 10 is selected by a voice synthesis unit selector 12 in accordance with MIDI information stored in a performance data storage unit 11. Characteristic parameters are derived from the selected voice synthesis unit data. A characteristic parameter correction unit 21 corrects the characteristic parameters based on pitch information, etc. A spectrum envelope generating unit 23 generates a spectrum envelope in accordance with the corrected characteristic parameter. A timbre transformation unit 25 changes timbre by correcting the characteristic parameters in accordance with timbre transformation parameters in a time axis. Timbres in the same song position can be transformed into different arbitrary timbres respectively; therefore, the synthesized singing voice will be rich in variety and reality.
Type: Grant
Filed: July 3, 2003
Date of Patent: May 27, 2008
Assignee: Yamaha Corporation
Inventor: Hideki Kemmochi
-
Patent number: 7336565
Abstract: An educational alarm clock radio is provided that speaks a new word each day when the alarm goes off, the words each being stored in a memory cartridge as individual increments of information in a sequential set of increments. When the alarm goes off, the word of the day, the definition of that word of the day and its use in a sentence are spoken via the audio portion of the device as the next information increment in the sequence. The word will also be displayed on a screen so the user can see the correct spelling of the word. The word may be replayed at any time during the day by activating a device control. Prior words may be displayed by energizing a reverse control. The entire sequence of previously played words, moreover, can be played in serial fashion through further activation of a control or combination of controls. The device also serves as an alarm clock radio with alarm types such as wake by buzzer or radio as well as the wake by words function.
Type: Grant
Filed: June 12, 2006
Date of Patent: February 26, 2008
Inventors: Neil Rohrbacker, Gregory Rohrbacker
-
Patent number: 7337117
Abstract: An apparatus for phonetically screening predetermined character strings. The apparatus includes a text-to-speech module, and a phonetic screening module in communication with the text-to-speech module. The phonetic screening module is for replacing a first character string with a second character string based on a phonetic enunciation by the text-to-speech module of the first character string.
Type: Grant
Filed: September 21, 2004
Date of Patent: February 26, 2008
Assignee: AT&T Delaware Intellectual Property, Inc.
Inventor: Anita Hogans Simpson
-
Patent number: 7162417
Abstract: An amplitude altering magnification (r) to be applied to sub-phoneme units of a voiced portion and an amplitude altering magnification (s) to be applied to sub-phoneme units of an unvoiced portion are determined based upon a target phoneme average power (p0) of synthesized speech and power (p) of a selected phoneme unit. Sub-phoneme units are extracted from a phoneme to be synthesized. From among the extracted sub-phoneme units, a sub-phoneme unit of the voiced portion is multiplied by the amplitude altering magnification (r), and a sub-phoneme unit of the unvoiced portion is multiplied by the amplitude altering magnification (s). Synthesized speech is obtained using the sub-phoneme units thus obtained. This makes it possible to realize power control in which any decline in the quality of synthesized speech is reduced.
Type: Grant
Filed: July 13, 2005
Date of Patent: January 9, 2007
Assignee: Canon Kabushiki Kaisha
Inventors: Masayuki Yamada, Yasuhiro Komori, Mitsuru Otsuka
-
Patent number: 7069217
Abstract: A synthesizer is disclosed in which a speech waveform is synthesized by selecting a synthetic starting waveform segment and then generating a sequence of further segments. The further waveform segments are generated based jointly upon the value of the immediately-preceding segment and upon a model of the dynamics of an actual sound similar to that being generated. In particular, a method of synthesizing a voiced speech sound is disclosed, comprising calculating each new output value from the previous output value using data modeling the evolution, over a short time interval, of the voiced speech sound to be synthesized. This sequential generation of waveform segments enables a synthesized sequence of speech waveforms to be generated of any duration. In addition, a low-dimensional state space representation of speech signals is used in which successive pitch pulse cycles are superimposed to estimate the progression of the cyclic speech signal within each cycle.
Type: Grant
Filed: January 9, 1997
Date of Patent: June 27, 2006
Assignee: British Telecommunications PLC
Inventors: Stephen McLaughlin, Michael Banbrook
-
Patent number: 7050966
Abstract: A system and method of improving signal intelligibility over an interference signal is provided. The system includes a psychoacoustic processor having a psychoacoustic model wherein the level of a signal-of-interest is improved so as to be audible above noise and so as not to exceed a predetermined maximum output level. The system can be combined with active noise cancellation.
Type: Grant
Filed: August 7, 2002
Date of Patent: May 23, 2006
Assignee: AMI Semiconductor, Inc.
Inventors: Todd Schneider, David Coode, Robert L. Brennan, Peter Olijnyk
-
Patent number: 7010488
Abstract: A system and method are used to compress concatenative acoustic inventories for speech. Instead of using general purpose signal compression methods such as vector quantization, the method of the invention uses multiple properties of acoustic inventories to reduce the size of the acoustic inventories, such as the close acoustic match property and acoustic units that are labeled with sufficiently fine distinctions such that between any two phones no events occur that are substantially distinct from these two phones. The close acoustic match property is where acoustic units that share the same phone are acoustically similar at the points where these units may be concatenated. By utilizing multiple properties of acoustic units, the number of parameters per unit that are stored as LPC parameters is minimized. As a result, smaller storage devices may be used due to the reduced storage requirements.
Type: Grant
Filed: May 9, 2002
Date of Patent: March 7, 2006
Assignee: Oregon Health & Science University
Inventors: Jan P. H. van Santen, Alexander Kain
-
Patent number: 6999520
Abstract: A method and apparatus for extending the dynamic range of an integer or fixed-point Fast Fourier Transform (“FFT”) system that may be used in communications devices such as ADSL modems. The disclosed FFT system utilizes a shift control module to increase the effective dynamic range of the FFT implementation by selectively choosing at least one stage of an FFT butterfly implementation in which the outputs of the butterfly stage are not divided to otherwise avoid overflow problems.
Type: Grant
Filed: January 24, 2002
Date of Patent: February 14, 2006
Assignee: Tioga Technologies
Inventor: Guy Reina
-
Efficient system and method for converting between different transform-domain signal representations
Patent number: 6963842
Abstract: A memory-efficient system converting a signal from a first transform domain to a second transform domain. The system includes a first mechanism that obtains an input signal expressed via a first transform-domain signal representation. A second mechanism expresses the input signal via a second transform-domain signal representation without intermediate time-domain conversion. In the specific embodiment, the input signal is a Modified Discrete Cosine Transform (MDCT) signal. The second transform-domain signal representation is a Discrete Fourier Transform (DFT) signal. The second mechanism further includes a third mechanism that combines effects of an inverse MDCT, a synthesis window function, and an analysis window function, and provides a first signal in response thereto. A fourth mechanism converts the MDCT signal to the DFT signal based on the first signal.
Type: Grant
Filed: September 5, 2001
Date of Patent: November 8, 2005
Assignee: Creative Technology Ltd.
Inventor: Michael M. Goodwin
-
Patent number: 6950799
Abstract: A speech processing system modifies various aspects of input speech according to a user-selected one of various preprogrammed voice fonts. Initially, the speech converter receives a formants signal representing an input speech signal and a pitch signal representing the input signal's fundamental frequency. One or both of the following may also be received: a voicing signal comprising an indication of whether the input speech signal is voiced, unvoiced, or mixed, and/or a gain signal representing the input speech signal's energy. The speech converter also receives user selection of one of multiple preprogrammed voice fonts, each specifying a manner of modifying one or more of the received signals (i.e., formants, voicing, pitch, gain). The speech converter modifies at least one of the formants, voicing, pitch, and/or gain signals as specified by the selected voice font.
Type: Grant
Filed: February 19, 2002
Date of Patent: September 27, 2005
Assignee: Qualcomm Inc.
Inventors: Ning Bi, Andrew P. DeJaco
-
Patent number: 6947731
Abstract: A method for conversion of a voice output of appliance statuses, wherein three spoken phrases are stored for each appliance to be controlled, with the first spoken phrase being allocated to a first appliance status, the second spoken phrase being allocated to a second appliance status, and the third spoken phrase being allocated for at least one third status. When an appliance status is checked, the relevant appliance sends a data word. If the value of the data word (which identifies the current appliance status) corresponds to a first value, the first spoken phrase is output; if it corresponds to a second value, the second spoken phrase is output; and the third spoken phrase and the third value are output for at least one third value.
Type: Grant
Filed: September 21, 2000
Date of Patent: September 20, 2005
Assignee: Siemens Aktiengesellschaft
Inventor: Erich Kamperschroer
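The three-phrase lookup can be sketched in Python as follows. The concrete data-word values, the appliance name and the phrase texts are all hypothetical; the abstract only fixes the structure (two dedicated phrases, plus a third phrase output together with the value).

```python
# Hypothetical data-word values; the abstract gives no concrete numbers.
FIRST_VALUE = 0
SECOND_VALUE = 1

# Three stored phrases per appliance (illustrative texts).
PHRASES = {
    "oven": ("The oven is off.", "The oven is on.", "The oven reports status"),
}


def status_announcement(appliance, data_word):
    """Map an appliance's data word to the text that would be spoken."""
    first, second, third = PHRASES[appliance]
    if data_word == FIRST_VALUE:
        return first
    if data_word == SECOND_VALUE:
        return second
    # Any other value: third phrase followed by the value itself.
    return f"{third} {data_word}."
```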
-
Patent number: 6845359
Abstract: A Fast Fourier Transform (FFT) based voice synthesis method 110, program product and vocoder. Sounds, e.g., speech and audio, are synthesized from multiple sine waves. Each sine wave component is represented by a small number of FFT coefficients 116. Amplitude 120 and phase 124 information of the components may be incorporated into these coefficients. The FFT coefficients corresponding to each of the components are summed 126 and, then, an inverse FFT is applied 128 to the sum to generate a time domain signal. An appropriate section is extracted 130 from the inverse transformed time domain signal as an approximation to the desired output. FFT based synthesis 110 may be combined with simple sine wave summation 100, using FFT based synthesis 110 for complex sounds, e.g., male voices and unvoiced speech, and sine wave summation 100 for simpler sounds, e.g., female voices.
Type: Grant
Filed: March 22, 2001
Date of Patent: January 18, 2005
Assignee: Motorola, Inc.
Inventor: Tenkasi Ramabadran
-
Publication number: 20040260552
Abstract: A method, computer program product, and data processing system for compensating for fundamental frequency changes in a frame-based speech processing system is disclosed. In a preferred embodiment of the present invention, a frame of a voiced speech signal is processed by an inverse linear-predictive filter to obtain a residual signal that is indicative of the fundamental tone emitted by the speaker's vocal cords. A transformation function is applied to the frame to limit the frame to an integer number of pitch cycles. This transformed frame is used in conjunction with vocal tract parameters obtained from the original speech signal frame to construct a pitch-adjusted speech signal that can more easily be understood by speech or speaker recognition software.
Type: Application
Filed: June 23, 2003
Publication date: December 23, 2004
Applicant: International Business Machines Corporation
Inventors: Jiri Navratil, Ganesh N. Ramaswamy, Ran D. Zilca
-
Patent number: 6804649
Abstract: Voice synthesis with improved expressivity is obtained in a voice synthesiser of source-filter type by making use of a library of source sound categories in the source module. Each source sound category corresponds to a particular morphological category and is derived from analysis of real vocal sounds, by inverse filtering so as to subtract the effect of the vocal tract. The library may be parametrical, that is, the stored data corresponds not to the inverse-filtered sounds themselves but to synthesis coefficients for resynthesising the inverse-filtered sounds using any suitable re-synthesis technique, such as the phase vocoder technique. The coefficients are derived by Short Time Fourier Transform (STFT) analysis.
Type: Grant
Filed: June 1, 2001
Date of Patent: October 12, 2004
Assignee: Sony France S.A.
Inventor: Eduardo Reck Miranda
-
Patent number: 6795807
Abstract: A device and a method to be used by laryngeally impaired people to improve the naturalness of their speech. An artificial sound creating mechanism which forms a simulated glottal pulse in the vocal tract is utilized. An artificial glottal pulse is compared with the natural spectrum and an inverse filter is generated to provide an output signal which would better reproduce natural sound. A digital signal processor introduces a variation of pitch based on an algorithm developed for this purpose; i.e. creating prosody. The algorithm uses primarily the relative amplitude of the speech signal and the rise and fall rates of the amplitude as a basis for setting the frequency of the speech. The invention also clarifies speech of laryngectomees by sensing the presence of consonants in the speech and appropriately amplifying them with respect to the vowel sounds.
Type: Grant
Filed: August 17, 2000
Date of Patent: September 21, 2004
Inventor: David R. Baraff
-
Patent number: 6785652
Abstract: A method and an apparatus for improved duration modeling of phonemes in a speech synthesis system are provided. According to one aspect, text is received into a processor of a speech synthesis system. The received text is processed using a sum-of-products phoneme duration model that is used in either the formant method or the concatenative method of speech generation. The phoneme duration model, which is used along with a phoneme pitch model, is produced by developing a non-exponential functional transformation form for use with a generalized additive model. The non-exponential functional transformation form comprises a root sinusoidal transformation that is controlled in response to a minimum phoneme duration and a maximum phoneme duration. The minimum and maximum phoneme durations are observed in training data. The received text is processed by specifying at least one of a number of contextual factors for the generalized additive model.
Type: Grant
Filed: December 19, 2002
Date of Patent: August 31, 2004
Assignee: Apple Computer, Inc.
Inventors: Jerome R. Bellegarda, Kim Silverman
-
Patent number: 6778962
Abstract: A speech synthesizing method includes determining the accent type of the input character string, selecting the prosodic model data from a prosody dictionary for storing typical ones of the prosodic models representing the prosodic information for the character strings in a word dictionary, based on the input character string and the accent type, transforming the prosodic information of the prosodic model when the character string of the selected prosodic model is not coincident with the input character string, selecting the waveform data corresponding to each character of the input character string from a waveform dictionary, based on the prosodic model data after transformation, and connecting the selected waveform data with each other. A difference between an input character string and a character string stored in a dictionary is therefore absorbed, making it possible to synthesize a natural voice.
Type: Grant
Filed: July 21, 2000
Date of Patent: August 17, 2004
Assignees: Konami Corporation, Konami Computer Entertainment Tokyo, Inc.
Inventors: Osamu Kasai, Toshiyuki Mizoguchi
-
Patent number: 6757653
Abstract: A method of composing messages for speech output and the improvement of the quality of reproduction of speech outputs. A series of original sentences for messages is segmented and stored as audio files with search criteria. The length, position, and transition values for the respective segments can be recorded and stored. A sentence to be reproduced is transmitted in a format corresponding to the format of the search criteria. It is determined whether the sentence to be reproduced can be fully reproduced by one segment or a succession of stored segments. The segments found in each case are examined using their entries as to how far the individual segments match as regards speech rhythm. The audio files of the segments in which the examination resulted in the pre-requisites for optimal maintaining of the natural speech rhythm are combined and output for reproduction.
Type: Grant
Filed: June 28, 2001
Date of Patent: June 29, 2004
Assignee: Nokia Mobile Phones, Ltd.
Inventors: Peter Buth, Simona Grothues, Amir Iman, Wolfgang Theimer
-
Patent number: 6754628Abstract: Methods and apparatus for facilitating speaker recognition, wherein, from target data that is provided relating to a target speaker and background data that is provided relating to at least one background speaker, a set of cohort data is selected from the background data that has at least one proximate characteristic with respect to the target data. The target data and the cohort data are then combined in a manner to produce at least one new cohort model for use in subsequent speaker verification. Similar methods and apparatus are contemplated for non-voice-based applications, such as verification through fingerprints.Type: GrantFiled: June 13, 2000Date of Patent: June 22, 2004Assignee: International Business Machines CorporationInventors: Upendra V. Chaudhari, Stephane H. Maes, Jiri Navratil
-
Patent number: 6741666Abstract: A method and a device by which original digital signals are analysis-filtered, where the original digital signals include original samples representing physical quantities, and where the original samples are transformed by successive calculation steps into high and low frequency output samples. Any sample calculated at a given step is calculated by a predetermined function of the original samples and/or previously calculated samples, where the samples are ordered by increasing rank. The signal is processed by successive input blocks of samples, where the calculations made on an input block under consideration take into account only the original or calculated samples belonging to the input block under consideration, and where the input block under consideration and the following input block overlap over a predetermined number of original samples. Output blocks are formed, where each output block corresponds respectively to an input block.Type: GrantFiled: January 11, 2000Date of Patent: May 25, 2004Assignee: Canon Kabushiki KaishaInventors: Félix Henry, Bertrand Berthelot, Eric Majani
-
Publication number: 20040093214Abstract: A speech to touch translator assembly and method for converting spoken words directed to an operator into tactile sensations caused by combinations of pressure point exertions on the body of the operator, each combination of pressure points exerted signifying a phoneme of one of the spoken words, and sound characteristics superimposed on the spoken words, permitting comprehension of spoken words, and the speaker thereof, by persons that are deaf and blind.Type: ApplicationFiled: November 12, 2002Publication date: May 13, 2004Inventors: Robert V. Belenger, Gennaro R. Lopriore
-
Publication number: 20040073429Abstract: The invention relates to an information transmission system capable of transmitting target information via voice, as well as to an information encoding apparatus and an information decoding apparatus for use with the system. The information encoding apparatus (31) converts input text information to an intermediate code in accordance with a predetermined encoding method, and outputs a voice derived from voice information based on the intermediate code and supplemented with music arrangement information. The voice is transmitted either directly or via a broadcasting or communicating medium to a receiving side. The information decoding apparatus (34) on the receiving side receives the generated voice, recognizes a voice waveform from the received voice, and reproduces the original target information by decoding the intermediate code based on the recognized voice waveform.Type: ApplicationFiled: August 15, 2003Publication date: April 15, 2004Inventor: Tetsuya Naruse
-
Publication number: 20040006472Abstract: Voice synthesis unit data stored in a phoneme database 10 is selected by a voice synthesis unit selector 12 in accordance with MIDI information stored in a performance data storage unit 11. Characteristic parameters are derived from the selected voice synthesis unit data. A characteristic parameter correction unit 21 corrects the characteristic parameters based on pitch information, etc. A spectrum envelope generating unit 23 generates a spectrum envelope in accordance with the corrected characteristic parameter. A timbre transformation unit 25 changes timbre by correcting the characteristic parameters in accordance with timbre transformation parameters in a time axis. Timbres in the same song position can be transformed into different arbitrary timbres respectively; therefore, the synthesized singing voice will be rich in variety and reality.Type: ApplicationFiled: July 3, 2003Publication date: January 8, 2004Applicant: Yamaha CorporationInventor: Hideki Kemmochi
-
Patent number: 6658382Abstract: An input signal is time-frequency transformed, then the frequency-domain coefficients are divided into coefficient segments of about 100 Hz width to generate a sequence of coefficient segments, and the sequence of coefficient segments is split into subbands each consisting of plural coefficient segments. A threshold value is determined based on the intensity of each coefficient segment in each subband. The intensity of each coefficient segment is compared with the threshold value, and the coefficient segments are classified into low- and high-intensity groups. The coefficient segments are quantized for each group, or they are flattened respectively and then quantized through recombination.Type: GrantFiled: March 23, 2000Date of Patent: December 2, 2003Assignee: Nippon Telegraph and Telephone CorporationInventors: Naoki Iwakami, Takehiro Moriya, Akio Jin, Kazuaki Chikira, Takeshi Mori
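The grouping step in the abstract above can be sketched as follows. The abstract says only that the threshold is determined from the segment intensities in each subband; using the subband mean as the threshold is an assumption made here for illustration, and the function name `classify_segments` and the fixed number of segments per subband are hypothetical.

```python
def classify_segments(intensities, segments_per_subband):
    """Split coefficient-segment intensities into subbands and classify
    each segment as low- or high-intensity relative to a per-subband
    threshold.

    Assumption (not from the patent): the threshold is the mean intensity
    of the segments in the subband.
    """
    groups = []
    for start in range(0, len(intensities), segments_per_subband):
        subband = intensities[start:start + segments_per_subband]
        threshold = sum(subband) / len(subband)
        # Record segment indices relative to the whole sequence.
        low = [start + i for i, v in enumerate(subband) if v < threshold]
        high = [start + i for i, v in enumerate(subband) if v >= threshold]
        groups.append({"threshold": threshold, "low": low, "high": high})
    return groups


# Eight made-up segment intensities, four segments per subband.
example = [1.0, 9.0, 2.0, 8.0, 5.0, 5.0, 1.0, 1.0]
for group in classify_segments(example, 4):
    print(group)
```

Each group could then be quantized separately, as the abstract describes; the quantization itself is outside the scope of this sketch.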
-
Publication number: 20030187651Abstract: A voice synthesis system analyzes an input character string, determining a part for which to use recorded voice and a part for which to use synthesized voice, extracts voice data for the part for which to use recorded voice from a database and extracts its feature amount. Then, the system synthesizes voice data to fit the extracted feature amount for the part for which to use synthesized voice, and combines/outputs these pieces of voice data.Type: ApplicationFiled: December 3, 2002Publication date: October 2, 2003Applicant: Fujitsu LimitedInventor: Wataru Imatake
-
Patent number: 6553344Abstract: A method and an apparatus for improved duration modeling of phonemes in a speech synthesis system are provided. According to one aspect, text is received into a processor of a speech synthesis system. The received text is processed using a sum-of-products phoneme duration model that is used in either the formant method or the concatenative method of speech generation. The phoneme duration model, which is used along with a phoneme pitch model, is produced by developing a non-exponential functional transformation form for use with a generalized additive model. The non-exponential functional transformation form comprises a root sinusoidal transformation that is controlled in response to a minimum phoneme duration and a maximum phoneme duration. The minimum and maximum phoneme durations are observed in training data. The received text is processed by specifying at least one of a number of contextual factors for the generalized additive model.Type: GrantFiled: February 22, 2002Date of Patent: April 22, 2003Assignee: Apple Computer, Inc.Inventors: Jerome R. Bellegarda, Kim Silverman
-
Patent number: 6549884Abstract: A system for pitch-shifting an audio signal wherein resampling is done in the frequency domain. The system includes a method for pitch-shifting a signal by converting the signal to a frequency domain representation and then identifying a specific region in the frequency domain representation, the region being located at a first frequency location. Next, the region is shifted to a second frequency location to form an adjusted frequency domain representation. Finally, the adjusted frequency domain representation is transformed to a time domain signal representing the input signal with shifted pitch. This eliminates the expensive time domain resampling stage and allows the computational costs to become independent of the pitch modification factor.Type: GrantFiled: September 21, 1999Date of Patent: April 15, 2003Assignee: Creative Technology Ltd.Inventors: Jean Laroche, Mark Dolson
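The three steps in the abstract above (transform, move a region of the spectrum, transform back) can be illustrated on a single frame. This is a minimal sketch, not the patented method: it uses a naive DFT from the standard library, works on one frame without windowing or overlap, and omits the phase handling a practical pitch shifter would need. The names `dft`, `idft`, `shift_region`, and `peak_bin` are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def shift_region(x, lo, hi, offset):
    """Move spectral bins lo..hi (inclusive) by `offset` bins.

    The mirrored negative-frequency bins are moved as well, so the output
    stays real. Whatever was at the destination bins is overwritten.
    """
    n = len(x)
    assert 0 < lo <= hi and 0 < lo + offset and hi + offset < n // 2
    original = dft(x)
    shifted = list(original)
    for k in range(lo, hi + 1):      # clear the source region first
        shifted[k] = 0
        shifted[n - k] = 0
    for k in range(lo, hi + 1):      # then write it at the new location
        shifted[k + offset] = original[k]
        shifted[n - (k + offset)] = original[k].conjugate()
    return idft(shifted)


def peak_bin(x):
    """Index of the strongest bin in the lower half of the spectrum."""
    magnitudes = [abs(c) for c in dft(x)[: len(x) // 2]]
    return magnitudes.index(max(magnitudes))


# A 64-sample cosine centred on bin 4, with its region moved up by 6 bins.
frame = [math.cos(2 * math.pi * 4 * t / 64) for t in range(64)]
moved = shift_region(frame, 3, 5, 6)
print(peak_bin(frame), peak_bin(moved))
```

Because the work is done per bin, its cost does not depend on how far the region is moved, which matches the abstract's remark about the pitch modification factor.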
-
Patent number: 6529874Abstract: A representative pattern memory stores a plurality of initial representative patterns as a noise pattern. A different attribute is affixed to each initial representative pattern. A pitch pattern memory stores a large number of natural pitch patterns as an accent phrase. A clustering unit classifies each natural pitch pattern to the initial representative pattern based on the attribute of the accent phrase. A transformation parameter generation unit calculates an error between a transformed representative pattern and each natural pitch pattern classified to the initial representative pattern. A representative pattern generation unit calculates an evaluation function of the sum of the error between the transformed representative pattern and each natural pitch pattern classified to the initial representative pattern, and updates each initial representative pattern.Type: GrantFiled: September 8, 1998Date of Patent: March 4, 2003Assignee: Kabushiki Kaisha ToshibaInventors: Takehiko Kagoshima, Takaaki Nii, Shigenobu Seto, Masahiro Morita, Masami Akamine, Yoshinori Shiga
-
Patent number: 6463412Abstract: A high performance voice transformation apparatus and method is provided in which voice input is transformed into a symbolic representation of phonemes in the voice input. The symbolic representation is used to retrieve output voice segments of a selected target speaker for use in outputting the voice input in a different voice. In addition, voice input characteristics are extracted from the voice input and are then applied to the output voice segments to thereby provide a more realistic human sounding voice output.Type: GrantFiled: December 16, 1999Date of Patent: October 8, 2002Assignee: International Business Machines CorporationInventors: Jason Raymond Baumgartner, Steven Leonard Roberts, Nadeem Malik, Flemming Andersen
-
Patent number: 6336092Abstract: The invention is a method for transforming a source individual's voice so as to adopt the characteristics of a target individual's voice. The excitation signal component of the target individual's voice is extracted and the spectral envelope of the source individual's voice is extracted. The transformed voice is synthesized by applying the spectral envelope of the source individual to the excitation signal component of the voice of the target individual. A higher quality transformation is achieved using an enhanced excitation signal created by replacing unvoiced regions of the signal with interpolated data from adjacent voiced regions. Various methods of transforming the spectral characteristics of the source individual's voice are also disclosed.Type: GrantFiled: April 28, 1997Date of Patent: January 1, 2002Assignee: Ivl Technologies LtdInventors: Brian Charles Gibson, Peter Ronald Lupini, Dale John Shpak
-
Patent number: 6332121Abstract: In a synthesis unit generator, a plurality of synthesis speech segments are generated by synthesizing training speech segments labeled with phonetic contexts and input speech segments while altering the pitch/duration of the input speech segments in accordance with the pitch/duration of the training speech segments. Typical speech segments are selected from the input speech segments on the basis of a distance between the synthesis speech segments and the training speech segments, and are stored in a storage. In addition, a plurality of phonetic context clusters corresponding to the synthesis units are generated on the basis of the distance, and are stored in a storage. A synthesis speech signal is generated by reading out, from the storage, those of the synthesis units, which correspond to the phonetic context clusters including phonetic contexts of input phonemes, and connecting the selected synthesis units in a speech synthesizer.Type: GrantFiled: November 27, 2000Date of Patent: December 18, 2001Assignee: Kabushiki Kaisha ToshibaInventors: Takehiko Kagoshima, Masami Akamine