Frequency Element Patents (Class 704/268)

Information transmission device

Patent number: 8185395

Abstract: An information transmission device which analyzes a diction of a speaker and provides an utterance in accordance with the diction of the speaker, and which has a microphone detecting a sound signal of the speaker, a feature extraction unit extracting at least one feature value of the diction of the speaker based on the sound signal detected by the microphone, a voice synthesis unit synthesizing a voice signal to be uttered so that the voice signal has the same feature value as the diction of the speaker, based on the feature value extracted by the feature extraction unit, and a voice output unit performing an utterance based on the voice signal synthesized by the voice synthesis unit.

Type: Grant

Filed: September 13, 2005

Date of Patent: May 22, 2012

Assignee: Honda Motor Co., Ltd.

Inventors: Tokitomo Ariyoshi, Kazuhiro Nakadai, Hiroshi Tsujino
Periodic signal enhancement system

Patent number: 8170879

Abstract: A signal enhancement system improves the understandability of speech or other audio signals. The system reinforces selected parts of the signal, may attenuate selected parts of the signal, and may increase SNR. The system includes delay logic, a partitioned adaptive filter, and signal reinforcement logic. The partitioned adaptive filter may track and enhance the fundamental frequency and harmonics in the input signal. The partitioned filter output signals may approximately reproduce the input signal, delayed by an integer multiple of the period of the fundamental frequency of the input signal. The reinforcement logic combines the input signal and the filtered signals to produce an enhanced output signal.

Type: Grant

Filed: April 8, 2005

Date of Patent: May 1, 2012

Assignee: QNX Software Systems Limited

Inventors: Rajeev Nongpiur, David Giesbrecht, Phillip Hetherington
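
The abstract of patent 8170879 describes combining the input signal with copies of itself delayed by integer multiples of the fundamental period. A minimal sketch of that idea in Python, assuming the period is already known and fixed (the patent tracks it with a partitioned adaptive filter, which is not modeled here). The function name `reinforce_periodic` and its parameters are hypothetical.

```python
def reinforce_periodic(signal, period, num_delays=2):
    """Average each sample with copies delayed by 1..num_delays periods.

    Assumption: `period` (in samples) is known in advance. Components that
    repeat every `period` samples add coherently, while components that do
    not repeat are averaged down.
    """
    if period <= 0 or num_delays < 0:
        raise ValueError("period must be positive and num_delays non-negative")
    out = []
    for n, x in enumerate(signal):
        total = x
        count = 1
        for k in range(1, num_delays + 1):
            idx = n - k * period
            if idx >= 0:  # only use delayed samples that actually exist
                total += signal[idx]
                count += 1
        out.append(total / count)
    return out
```

With this averaging, a signal that is exactly periodic passes through unchanged, while an isolated spike is reduced by the number of copies averaged.
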
Method, apparatus and program for speech synthesis

Patent number: 8165882

Abstract: Apparatus and method for generating high quality synthesized speech having smooth waveform concatenation. The apparatus includes a pitch frequency calculation section, a pitch synchronization position calculation section, a unit waveform storage, a unit waveform selection section, a unit waveform generation section, and a waveform synthesis section. The unit waveform generation section includes a conversion ratio calculation section, a sampling rate conversion section, and a unit waveform re-selection section. The conversion ratio calculation section calculates a sampling rate conversion ratio from the pitch information and the position of pitch synchronization, and the sampling rate conversion section converts the sampling rate of the unit waveform, delivered as input, based on the sampling rate conversion ratio.

Type: Grant

Filed: September 4, 2006

Date of Patent: April 24, 2012

Assignee: NEC Corporation

Inventors: Masanori Kato, Satoshi Tsukada
Techniques for enhancing the performance of concatenative speech synthesis

Patent number: 8145491

Abstract: When pitch of a speech segment is being modified from a current pitch to a requested pitch, and the difference between these is relatively large, a pitch modification algorithm is used to modify the pitch of the speech segment. When the difference between current and requested pitches is relatively small, the pitch of the speech segment is not modified. After one or the other speech modification techniques are used, then the resultant modified speech segment is overlapped and added to previously modified speech segments. A modification ratio is determined in order to quantify the difference between the current and requested pitches for a speech segment. The modification ratio is a ratio between the requested and current pitches. Low and high ratio thresholds are used to determine when pitch is being modified to a predetermined high degree, and whether pitch of the speech segment will or will not be modified.

Type: Grant

Filed: July 30, 2002

Date of Patent: March 27, 2012

Assignee: Nuance Communications, Inc.

Inventors: Wael Mohamed Hamza, Michael Alan Picheny
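
The decision rule in the abstract of patent 8145491 (modify pitch only when the requested pitch differs enough from the current pitch) can be sketched as follows. The function name and the threshold values 0.9 and 1.1 are illustrative assumptions, not figures from the patent.

```python
def should_modify_pitch(current_hz, requested_hz, low_ratio=0.9, high_ratio=1.1):
    """Return (ratio, modify) for one speech segment.

    ratio is requested pitch divided by current pitch, as in the abstract.
    Assumption: the default thresholds are placeholders for illustration.
    """
    if current_hz <= 0 or requested_hz <= 0:
        raise ValueError("pitch values must be positive")
    ratio = requested_hz / current_hz
    # Modify only when the ratio falls outside the [low, high] band.
    modify = ratio < low_ratio or ratio > high_ratio
    return ratio, modify
```

Segments whose ratio stays inside the band would be passed to the overlap-and-add step unmodified.
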
System and method for predicting prosodic parameters

Patent number: 8126717

Abstract: A method for generating a prosody model that predicts prosodic parameters is disclosed. Upon receiving text annotated with acoustic features, the method comprises generating first classification and regression trees (CARTs) that predict durations and F0 from text by generating initial boundary labels by considering pauses, generating initial accent labels by applying a simple rule on text-derived features only, adding the predicted accent and boundary labels to feature vectors, and using the feature vectors to generate the first CARTs. The first CARTs are used to predict accent and boundary labels. Next, the first CARTs are used to generate second CARTs that predict durations and F0 from text and acoustic features by using lengthened accented syllables and phrase-final syllables, refining accent and boundary models simultaneously, comparing actual and predicted duration of a whole prosodic phrase to normalize speaking rate, and generating the second CARTs that predict the normalized speaking rate.

Type: Grant

Filed: October 13, 2006

Date of Patent: February 28, 2012

Assignee: AT&T Intellectual Property II, L.P.

Inventor: Volker Franz Strom
Automatic level control of speech signals

Patent number: 8121835

Abstract: Automatic level control of speech portions of an audio signal is provided. An audio signal is received in the form of a sequence of samples and may contain speech portions and non-speech portions. The sequence of samples is divided into a sequence of sub-frames. Multiple sub-frames adjacent to a present sub-frame are examined to determine a peak value of samples in the sub-frames. A gain factor is computed for the present sub-frame based on the peak value and a desired maximum value for said speech portion, and each sample in the present sub-frame is amplified by the gain factor. In an embodiment, variations in filtered energy values of multiple sub-frames enable determination of whether a sub-frame corresponds to a speech or non-speech/noise portion.

Type: Grant

Filed: March 6, 2008

Date of Patent: February 21, 2012

Assignee: Texas Instruments Incorporated

Inventor: Fitzgerald John Archibald
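
The sub-frame gain computation in the abstract of patent 8121835 can be sketched as below. This is a simplified illustration: the speech/non-speech decision is omitted, and the names `level_control`, `lookaround`, and `max_gain` (a cap added here to avoid amplifying near-silence without bound) are assumptions rather than terms from the patent.

```python
def level_control(samples, subframe_len=4, lookaround=1,
                  target_peak=1.0, max_gain=8.0):
    """Scale each sub-frame so its neighbourhood peak reaches target_peak."""
    # Divide the sample sequence into consecutive sub-frames.
    subframes = [samples[i:i + subframe_len]
                 for i in range(0, len(samples), subframe_len)]
    out = []
    for i, frame in enumerate(subframes):
        # Examine the present sub-frame and its adjacent sub-frames.
        lo = max(0, i - lookaround)
        hi = min(len(subframes), i + lookaround + 1)
        peak = max(abs(s) for sf in subframes[lo:hi] for s in sf)
        # Gain from the peak and the desired maximum (hypothetical cap applied).
        gain = min(target_peak / peak, max_gain) if peak > 0 else 1.0
        out.extend(s * gain for s in frame)
    return out
```

Taking the peak over neighbouring sub-frames rather than the present one alone keeps the gain from jumping abruptly between adjacent sub-frames.
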
Speech synthesis system and speech synthesis method

Patent number: 8108216

Abstract: In a speech synthesis, a selecting unit selects one string from first speech unit strings corresponding to a first segment sequence obtained by dividing a phoneme string corresponding to target speech into segments. The selecting unit repeatedly generates, based on a maximum of W second speech unit strings corresponding to a second segment sequence as a partial sequence of the first sequence, third speech unit strings corresponding to a third segment sequence obtained by adding a segment to the second sequence, and selects a maximum of W strings from the third strings based on an evaluation value of each of the third strings. The value is obtained by correcting a total cost of each of the third string candidates with a penalty coefficient for each of the third strings. The coefficient is based on a restriction concerning quickness of speech unit data acquisition, and depends on the extent to which the restriction is approached.

Type: Grant

Filed: March 19, 2008

Date of Patent: January 31, 2012

Assignee: Kabushiki Kaisha Toshiba

Inventors: Masahiro Morita, Takehiko Kagoshima
Noise adaptive mobile communication device, and call sound synthesizing method using the same

Patent number: 8108217

Abstract: A noise adaptive mobile communication device including a noise collecting microphone which collects noise from a peripheral environment; a noise sensing unit which senses the collected noise; a frequency-component detecting unit which detects a frequency component of the sensed noise; a sound generating unit which generates a noise-adaptive sound from the detected frequency component; a call-sound synthesizing unit which synthesizes received call sound with the noise-adaptive sound; and an operation control unit which controls the call-sound synthesizing unit to operate each predetermined time.

Type: Grant

Filed: February 11, 2005

Date of Patent: January 31, 2012

Assignee: Samsung Electronics Co., Ltd.

Inventors: Myung-Hyun Yoo, Jaywoo Kim, Joonah Park, Seung-Nyung Chung
Method and apparatus for speech synthesis using paralinguistic variation

Patent number: 8103505

Abstract: A method and apparatus for speech synthesis in a computer-user interface using random paralinguistic variation is described herein. According to one aspect of the present invention, a method for synthesizing speech comprises generating synthesized speech having certain prosodic features. The synthesized speech is further processed by applying a random paralinguistic variation to the acoustic sequence representing the synthesized speech without altering the linguistic prosodic features. According to one aspect of the present invention, the application of the paralinguistic variation is correlated with a previously applied paralinguistic variation to reflect a gradual change in the computer voice, while still maintaining a random quality.

Type: Grant

Filed: November 19, 2003

Date of Patent: January 24, 2012

Assignee: Apple Inc.

Inventors: Kim Silverman, Donald Lindsay
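
The abstract of patent 8103505 describes a random variation that is correlated with the previously applied variation, so the voice drifts gradually. One common way to get that behaviour is a first-order autoregressive (AR(1)) process; the sketch below uses it as a stand-in and does not claim to be the patent's method. All names and default values are assumptions.

```python
import random


def correlated_variation(num_utterances, correlation=0.8, scale=0.05, seed=None):
    """Generate one variation offset per utterance as an AR(1) sequence.

    Each offset is `correlation` times the previous offset plus fresh
    Gaussian noise, so consecutive offsets change gradually but stay random.
    """
    if not 0.0 <= correlation < 1.0:
        raise ValueError("correlation must be in [0, 1)")
    rng = random.Random(seed)
    offsets = []
    previous = 0.0
    for _ in range(num_utterances):
        noise = rng.gauss(0.0, scale)
        # The noise is scaled so the long-run spread stays near `scale`.
        current = correlation * previous + (1 - correlation ** 2) ** 0.5 * noise
        offsets.append(current)
        previous = current
    return offsets
```

Each offset could then scale a paralinguistic property such as overall pitch range, leaving the linguistic prosody untouched.
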
CODING, MODIFICATION AND SYNTHESIS OF SPEECH SEGMENTS

Publication number: 20110320207

Abstract: The invention relates to a method for speech signal analysis, modification and synthesis comprising a phase for the location of analysis windows by means of an iterative process for the determination of the phase of the first sinusoidal component and comparison between the phase value of said component and a predetermined value, a phase for the selection of analysis frames corresponding to an allophone and readjustment of the duration and the fundamental frequency according to certain thresholds and a phase for the generation of synthetic speech from synthesis frames taking the information of the closest analysis frame as spectral information of the synthesis frame and taking as many synthesis frames as periods that the synthetic signal has. The method allows a coherent location of the analysis windows within the periods of the signal and the exact generation of the synthesis instants in a manner synchronous with the fundamental period.

Type: Application

Filed: December 21, 2010

Publication date: December 29, 2011

Applicant: TELEFONICA, S.A.

Inventors: Miguel Angel Rodriguez Crespo, Jose Gregorio Escalada Sardina, Ana Armenta Lopez Vicuna
Voice synthesis device

Patent number: 8073696

Abstract: A voice synthesis device is provided to include: an emotion input unit obtaining an utterance mode of a voice waveform, a prosody generation unit generating a prosody, a characteristic tone selection unit selecting a characteristic tone based on the utterance mode; and a characteristic tone temporal position estimation unit (i) judging whether or not each of phonemes included in a phonologic sequence of text is to be uttered with the characteristic tone, based on the phonologic sequence, the characteristic tone, and the prosody, and (ii) deciding a phoneme, which is an utterance position where the text is uttered with the characteristic tone. The voice synthesis device also includes an element selection unit and an element connection unit generating the voice waveform based on the phonologic sequence, the prosody, and the utterance position, so that the text is uttered in the utterance mode with the characteristic tone at the determined utterance position.

Type: Grant

Filed: May 2, 2006

Date of Patent: December 6, 2011

Assignee: Panasonic Corporation

Inventors: Yumiko Kato, Takahiro Kamai
Method and Apparatus for Switching Speech or Audio Signals

Publication number: 20110270614

Abstract: A method and an apparatus for switching speech or audio signals, wherein the method includes, when a speech or audio signal is switched, weighting a first high frequency band signal of a current frame of the speech or audio signal and a second high frequency band signal of the previous M frames of speech or audio signals to obtain a processed first high frequency band signal, where M is greater than or equal to 1, and synthesizing the processed first high frequency band signal and a first low frequency band signal of the current frame of the speech or audio signal into a wide frequency band signal. In this way, speech or audio signals with different bandwidths can be smoothly switched, thus improving the quality of audio signals received by a user.

Type: Application

Filed: June 16, 2011

Publication date: November 3, 2011

Applicant: HUAWEI TECHNOLOGIES CO., LTD.

Inventors: Zexin Liu, Lei Miao, Chen Hu, Wenhai Wu, Yue Lang, Qing Zhang
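
The weighting step in the abstract of publication 20110270614 can be illustrated with the sketch below. It assumes each band is a list of spectral coefficients, that the previous M frames are averaged before mixing, and that the wide band is formed by concatenating low and high coefficients; none of these details come from the abstract, and the function names are hypothetical.

```python
def smooth_high_band(current_high, previous_highs, current_weight=0.5):
    """Mix the current frame's high band with the mean of the previous M frames."""
    if not previous_highs:
        raise ValueError("need at least one previous frame (M >= 1)")
    m = len(previous_highs)
    # Per-coefficient mean over the previous M frames (an assumed choice).
    prev_mean = [sum(frame[i] for frame in previous_highs) / m
                 for i in range(len(current_high))]
    return [current_weight * c + (1 - current_weight) * p
            for c, p in zip(current_high, prev_mean)]


def synthesize_wideband(low_band, processed_high):
    """Join low-band and processed high-band coefficients into one wide band."""
    return list(low_band) + list(processed_high)
```

When the incoming frame has no high band (a switch from wideband to narrowband), the weighted mix lets the high band fade rather than vanish in a single frame.
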
Prosody-pattern generating apparatus, speech synthesizing apparatus, and computer program product and method thereof

Patent number: 8046225

Abstract: Normalization parameters are generated at a normalization-parameter generating unit by calculating the mean values and the standard deviations of an initial prosody pattern and a prosody pattern of a training sentence of a speech corpus. Then, the variance range or variance width of the initial prosody pattern is normalized at the prosody-pattern normalizing unit in accordance with the normalization parameters. As a result, a prosody pattern similar to speech of human beings and improved in naturalness can be generated with a small amount of calculation.

Type: Grant

Filed: February 8, 2008

Date of Patent: October 25, 2011

Assignee: Kabushiki Kaisha Toshiba

Inventors: Takashi Masuko, Masami Akamine
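
The normalization described in the abstract of patent 8046225 can be sketched as follows, assuming the prosody pattern is a list of F0 values and that both the mean and the spread of the initial pattern are matched to those of the training pattern. The function name and this exact formula are assumptions made for illustration.

```python
from statistics import mean, pstdev


def normalize_prosody(initial, training):
    """Rescale `initial` so its mean and standard deviation match `training`."""
    mu_i, sd_i = mean(initial), pstdev(initial)
    mu_t, sd_t = mean(training), pstdev(training)
    if sd_i == 0:
        # A flat initial pattern has no spread to rescale.
        return [mu_t for _ in initial]
    return [(x - mu_i) / sd_i * sd_t + mu_t for x in initial]
```

Only four statistics are needed per pattern, which is consistent with the abstract's point about a small amount of calculation.
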
Multi-unit approach to text-to-speech synthesis

Patent number: 8036894

Abstract: Methods, apparatus, systems, and computer program products are provided for synthesizing speech. One method includes matching a first level of units of a received input string to audio segments from a plurality of audio segments including using properties of or between first level units to locate matching audio segments from a plurality of selections, parsing unmatched first level units into second level units, matching the second level units to audio segments using properties of or between the units to locate matching audio segments from a plurality of selections and synthesizing the input string, including combining the audio segments associated with the first and second units.

Type: Grant

Filed: February 16, 2006

Date of Patent: October 11, 2011

Assignee: Apple Inc.

Inventors: Matthias Neeracher, Devang K. Naik, Kevin B. Aitken, Jerome R. Bellegarda, Kim E.A. Silverman
Using non-speech sounds during text-to-speech synthesis

Patent number: 8027837

Abstract: Systems, apparatus, methods and computer program products are described for producing text-to-speech synthesis with non-speech sounds. In general, some of the pauses or silences that would otherwise be generated in synthesized speech are instead synthesized as non-speech sounds such as breaths. Non-speech sounds can be identified from pre-recorded speech that can include meta-data such as the grammatical and phrasal structure of words and sounds that precede and succeed non-speech sounds. A non-speech sound can be selected for use in synthesized speech based on the words, punctuation, grammatical and phrasal structure of text from which the speech is being synthesized, or other characteristics.

Type: Grant

Filed: September 15, 2006

Date of Patent: September 27, 2011

Assignee: Apple Inc.

Inventors: Kim E. A. Silverman, Matthias Neeracher
ROBOT, METHOD AND PROGRAM OF CONTROLLING ROBOT

Publication number: 20110224977

Abstract: A robot may include a driving control unit configured to control a driving of a movable unit that is connected movably to a body unit, a voice generating unit configured to generate a voice, and a voice output unit configured to output the voice, which has been generated by the voice generating unit. The voice generating unit may correct the voice, which is generated, based on a bearing of the movable unit, which is controlled by the driving control unit, to the body unit.

Type: Application

Filed: September 14, 2010

Publication date: September 15, 2011

Applicant: HONDA MOTOR CO., LTD.

Inventors: Kazuhiro NAKADAI, Takuma OTSUKA, Hiroshi OKUNO
System for Controlling Digital Effects in Live Performances with Vocal Improvisation

Publication number: 20110218810

Abstract: A system for controlling digital effects in live performances with vocal improvisation is described. The system features a complex controller that in one embodiment utilizes several magnetically activated electronic switches attached to a glove that is worn by an artist during a live performance. The switches are activated by a permanent magnet that is also attached to the switch bearing glove and a second magnet attached to a glove worn on the opposite hand. Furthermore, the switches are wirelessly connected by a miniature, battery-operated wireless data communications unit to a digital vocal processor unit that provides a dual mode, multi-channel phrase looping capability wherein individual channels can be selected for re-recording and selected banks of channels can be deleted during the performance. This combination of features allows a complex sequence of digital effects to be controlled by the artist during a performance while maintaining the freedom of movement desired to enhance the performance.

Type: Application

Filed: February 28, 2011

Publication date: September 8, 2011

Inventor: Momilani Ramstrum
Method and apparatus for switching speech or audio signals

Patent number: 8000968

Abstract: A method and an apparatus for switching speech or audio signals, wherein the method includes, when a speech or audio signal is switched, weighting a first high frequency band signal of a current frame of the speech or audio signal and a second high frequency band signal of the previous M frames of speech or audio signals to obtain a processed first high frequency band signal, where M is greater than or equal to 1, and synthesizing the processed first high frequency band signal and a first low frequency band signal of the current frame of the speech or audio signal into a wide frequency band signal. In this way, speech or audio signals with different bandwidths can be smoothly switched, thus improving the quality of audio signals received by a user.

Type: Grant

Filed: April 26, 2011

Date of Patent: August 16, 2011

Assignee: Huawei Technologies Co., Ltd.

Inventors: Zexin Liu, Lei Miao, Chan Hu, Wenhai Wu, Yue Lang, Qing Zhang
SPEECH SYNTHESIS SYSTEM

Publication number: 20110196680

Abstract: When a system (100) is used for synthesizing speech having prosody serving as a reference, the system stores speech element information representing a speech element capable of synthesizing speech having a degree of naturalness indicating a degree of similarity to speech uttered by a human higher than a predetermined reference value (speech element information storage (115)). The system accepts requested prosody information representing prosody requested by the user (requested prosody information accepting part (113)). The system generates intermediate prosody information representing intermediate prosody between the reference prosody and the requested prosody (intermediate prosody information generator (114)). The system executes a speech synthesis process to synthesize speech based on the generated intermediate prosody information and the stored speech element information (speech synthesizer (116)).

Type: Application

Filed: August 21, 2009

Publication date: August 11, 2011

Applicant: NEC CORPORATION

Inventor: Masanori Kato
Sound-source separation system

Patent number: 7987090

Abstract: A system capable of reducing the influence of sound reverberation or reflection to improve sound-source separation accuracy. An original signal X(ω,f) is separated from an observed signal Y(ω,f) according to a first model and a second model to extract an unknown signal E(ω,f). According to the first model, the original signal X(ω,f) of the current frame f is represented as a combined signal of known signals S(ω,f−m+1) (m=1 to M) that span a certain number M of current and previous frames. This enables extraction of the unknown signal E(ω,f) without changing the window length while reducing the influence of reverberation or reflection of the known signal S(ω,f) on the observed signal Y(ω,f).

Type: Grant

Filed: August 7, 2008

Date of Patent: July 26, 2011

Assignee: Honda Motor Co., Ltd.

Inventors: Ryu Takeda, Kazuhiro Nakadai, Hiroshi Tsujino, Hiroshi Okuno
Text to speech synthesis

Patent number: 7979280

Abstract: An input linguistic description is converted into a speech waveform by deriving at least one target unit sequence corresponding to the linguistic description, selecting from a waveform unit database for the target unit sequences a plurality of alternative unit sequences approximating the target unit sequences, concatenating the alternative unit sequences to alternative speech waveforms and presenting the alternative speech waveforms to an operating person and enabling the choice of one of the presented alternative speech waveforms. There are no iterative cycles of manual modification and automatic selection, which enables a fast way of working. The operator does not need knowledge of units, targets, and costs, but chooses from a set of given alternatives. The fine-tuning of TTS prompts therefore becomes accessible to non-experts.

Type: Grant

Filed: February 22, 2007

Date of Patent: July 12, 2011

Assignee: Svox AG

Inventors: Johan Wouters, Christof Traber, Marcel Riedi, Martin Reber, Jürgen Keller
System and method for blending synthetic voices

Patent number: 7966186

Abstract: A system and method for generating a synthetic text-to-speech TTS voice are disclosed. A user is presented with at least one TTS voice and at least one voice characteristic. A new synthetic TTS voice is generated by blending a plurality of existing TTS voices according to the selected voice characteristics. The blending of voices involves interpolating segmented parameters of each TTS voice. Segmented parameters may be, for example, prosodic characteristics of the speech such as pitch, volume, phone durations, accents, stress, mis-pronunciations and emotion.

Type: Grant

Filed: November 4, 2008

Date of Patent: June 21, 2011

Assignee: AT&T Intellectual Property II, L.P.

Inventors: David A. Kapilow, Kenneth H. Rosen, Juergen Schroeter
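
The blending step in the abstract of patent 7966186 (interpolating parameters of existing TTS voices) can be illustrated with a weighted average over numeric parameters. The parameter names and the dictionary representation are assumptions for this sketch, and non-numeric characteristics such as accent are not handled.

```python
def blend_voices(voices, weights):
    """Interpolate numeric voice parameters with the given blend weights.

    `voices` is a list of dicts sharing the same keys (hypothetical layout).
    """
    total = sum(weights)
    if len(voices) != len(weights) or not voices or total <= 0:
        raise ValueError("need one positive-sum weight per voice")
    return {key: sum(w * v[key] for v, w in zip(voices, weights)) / total
            for key in voices[0]}
```

For example, blending two voices with equal weights places every parameter halfway between them.
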
Audio signal interpolation method and device

Patent number: 7957973

Abstract: An audio signal interpolation device comprises a spectral movement calculation unit which determines a spectral movement which is indicative of a difference in each of spectral components between a frequency spectrum of a current frame of an input audio signal and a frequency spectrum of a previous frame of the input audio signal stored in a spectrum storing unit. An interpolation band determination unit determines a frequency band to be interpolated by using the frequency spectrum of the current frame and the spectral movement. A spectrum interpolation unit performs interpolation of spectral components in the frequency band for the current frame by using either the frequency spectrum of the current frame or the frequency spectrum of the previous frame.

Type: Grant

Filed: July 25, 2007

Date of Patent: June 7, 2011

Assignee: Fujitsu Limited

Inventors: Masakiyo Tanaka, Masanao Suzuki, Miyuki Shirakawa, Takashi Makiuchi
VOICE QUALITY CONVERSION APPARATUS, PITCH CONVERSION APPARATUS, AND VOICE QUALITY CONVERSION METHOD

Publication number: 20110125493

Abstract: The voice quality conversion apparatus includes: low-frequency harmonic level calculating units and a harmonic level mixing unit for calculating a low-frequency sound source spectrum by mixing a level of a harmonic of an input sound source waveform and a level of a harmonic of a target sound source waveform at a predetermined conversion ratio for each order of harmonics including fundamental, in a frequency range equal to or lower than a boundary frequency; a high-frequency spectral envelope mixing unit that calculates a high-frequency sound source spectrum by mixing the input sound source spectrum and the target sound source spectrum at the predetermined conversion ratio in a frequency range larger than the boundary frequency; and a spectrum combining unit that combines the low-frequency sound source spectrum with the high-frequency sound source spectrum at the boundary frequency to generate a sound source spectrum for an entire frequency range.

Type: Application

Filed: January 31, 2011

Publication date: May 26, 2011

Inventors: Yoshifumi Hirose, Takahiro Kamai
Sound processing apparatus and method, and program therefor

Patent number: 7945446

Abstract: Spectrum envelope of an input sound is detected. In the meantime, a converting spectrum is acquired which is a frequency spectrum of a converting sound comprising a plurality of sounds, such as unison sounds. Output spectrum is generated by imparting the detected spectrum envelope of the input sound to the acquired converting spectrum. Sound signal is synthesized on the basis of the generated output spectrum. Further, a pitch of the input sound may be detected, and frequencies of peaks in the acquired converting spectrum may be varied in accordance with the detected pitch of the input sound. In this manner, the output spectrum can have the pitch and spectrum envelope of the input sound and spectrum frequency components of the converting sound comprising a plurality of sounds, and thus, unison sounds can be readily generated with simple arrangements.

Type: Grant

Filed: March 9, 2006

Date of Patent: May 17, 2011

Assignee: Yamaha Corporation

Inventors: Hideki Kemmochi, Yasuo Yoshioka, Jordi Bonada
SPEECH SYNTHESIS APPARATUS AND METHOD

Publication number: 20110087488

Abstract: According to an embodiment, a speech synthesis apparatus includes a selecting unit configured to select speaker's parameters one by one for respective speakers and obtain a plurality of speakers' parameters, the speaker's parameters being prepared for respective pitch waveforms corresponding to speaker's speech sounds, the speaker's parameters including formant frequencies, formant phases, formant powers, and window functions concerning respective formants that are contained in the respective pitch waveforms. The apparatus includes a mapping unit configured to make formants correspond to each other between the plurality of speakers' parameters using a cost function based on the formant frequencies and the formant powers. The apparatus includes a generating unit configured to generate an interpolated speaker's parameter by interpolating, at desired interpolation ratios, the formant frequencies, formant phases, formant powers, and window functions of formants which are made to correspond to each other.

Type: Application

Filed: December 16, 2010

Publication date: April 14, 2011

Inventors: Ryo Morinaka, Takehiko Kagoshima
RICH CONTEXT MODELING FOR TEXT-TO-SPEECH ENGINES

Publication number: 20110054903

Abstract: Embodiments of rich context modeling for speech synthesis are disclosed. In operation, a text-to-speech engine refines a plurality of rich context models based on decision tree-tied Hidden Markov Models (HMMs) to produce a plurality of refined rich context models. The text-to-speech engine then generates synthesized speech for an input text based at least on some of the plurality of refined rich context models.

Type: Application

Filed: December 2, 2009

Publication date: March 3, 2011

Applicant: MICROSOFT CORPORATION

Inventors: Zhi-Jie Yan, Yao Qian, Frank Kao-Ping Soong
METHOD AND APPARATUS FOR EXTRACTING PROSODIC FEATURE OF SPEECH SIGNAL

Publication number: 20110046958

Abstract: The present invention discloses a method and an apparatus for extracting a prosodic feature of a speech signal, the method including: dividing the speech signal into speech frames; transforming the speech frames from time domain to frequency domain; and extracting respective prosodic features for different frequency ranges. According to the above technical solution of the present invention, it is possible to effectively extract the prosodic feature which can combine with a traditional acoustics feature without any obstacle.

Type: Application

Filed: August 16, 2010

Publication date: February 24, 2011

Applicant: Sony Corporation

Inventors: Kun LIU, Weiguo Wu
SYSTEM AND METHOD FOR SPEECH SYNTHESIS USING FREQUENCY SPLICING

Publication number: 20110046957

Abstract: Techniques are disclosed for frequency splicing in which speech segments used in the creation of a final speech waveform are constructed, at least in part, by combining (e.g., summing) a small number (e.g., two) of component speech segments that overlap substantially, or entirely, in time but have spectral energy that occupies disjoint, or substantially disjoint, frequency ranges. The component speech segments may be derived from speech segments produced by different speakers or from different speech segments produced by the same speaker. Depending on the embodiment, frequency splicing may supplement rule-based, concatenative, hybrid, or limited-vocabulary speech synthesis systems to provide various advantages.

Type: Application

Filed: August 24, 2010

Publication date: February 24, 2011

Applicant: NovaSpeech, LLC

Inventors: Susan R. Hertz, Harold G. Mills
Personalized voice playback for screen reader

Patent number: 7865365

Abstract: A method, system, and computer program product are disclosed for customizing a synthesized voice based upon audible input voice data. The input voice data is typically in the form of one or more predetermined paragraphs being read into a voice recorder. The input voice data is then analyzed for adjustable voice characteristics to determine basic voice qualities (e.g., pitch, breathiness, tone, speed; variability of any of these qualities, etc.) and to identify any “specialized speech patterns”. Based upon this analysis, the characteristics of the voice utilized to read text appearing on the screen are modified to resemble the input voice data. This allows a user of the system to easily and automatically create a voice that is familiar to the user.

Type: Grant

Filed: August 5, 2004

Date of Patent: January 4, 2011

Assignee: Nuance Communications, Inc.

Inventors: Debbie Ann Anglin, Howard Neil Anglin, Nyralin Novella Kline
ATTENUATION OF OVERVOICING, IN PARTICULAR FOR THE GENERATION OF AN EXCITATION AT A DECODER WHEN DATA IS MISSING

Publication number: 20100324907

Abstract: The invention proposes the synthesis of a signal consisting of consecutive blocks. It proposes more particularly, on receipt of such a signal, to replace, by synthesis, lost or erroneous blocks of this signal. To this end, it proposes an attenuation of the overvoicing during the generation of a signal synthesis. More particularly, a voiced excitation is generated on the basis of the pitch period (T) estimated or transmitted at the previous block, by optionally applying a correction of plus or minus a sample of the duration of this period (counted in terms of number of samples), by constituting groups (A′,B′,C′,D′) of at least two samples and inverting positions of samples in the groups, randomly (B′,C′) or in a forced manner. An over-harmonicity in the excitation generated is thus broken and the effect of overvoicing in the synthesis of the generated signal is thereby attenuated.

Type: Application

Filed: October 17, 2007

Publication date: December 23, 2010

Applicant: France Telecom

Inventors: David Virette, Balazs Kovesi
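
The sample-inversion idea in the abstract of publication 20100324907 can be sketched as below: the excitation is cut into small groups and the samples in a group are swapped either randomly or always. The function name, the use of a probability to select between the random and forced modes, and reversing the whole group as the form of inversion are assumptions.

```python
import random


def break_harmonicity(excitation, group_size=2, swap_probability=0.5, seed=None):
    """Reverse sample order inside groups to weaken exact periodicity.

    swap_probability=1.0 corresponds to forced inversion of every group;
    values below 1.0 invert groups at random.
    """
    if group_size < 2:
        raise ValueError("groups need at least two samples")
    rng = random.Random(seed)
    out = list(excitation)
    # Trailing samples that do not fill a whole group are left as they are.
    for start in range(0, len(out) - group_size + 1, group_size):
        if rng.random() < swap_probability:
            out[start:start + group_size] = reversed(out[start:start + group_size])
    return out
```

Because only neighbouring samples move, the overall energy and pitch period of the excitation are kept while its strict repetition is disturbed.
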
Voice guidance device and navigation device with the same

Patent number: 7805306

Abstract: For a voice guidance phrase, multiple voice data items having individually different voice ranges or frequencies are previously stored in a memory. A voice mixing unit chooses to mix three voice data items among the stored voice data items and thereby produces a mixed voice data item. A voice outputting unit converts the mixed voice data item into a voice and then vocalizes a voice guidance phrase via a speaker. A voice measuring unit measures a characteristic of a frequency, a volume, or a pronunciation speed with respect to a response voice responding to the outputted voice guidance phrase. A voice mixing unit produces a mixed voice data item having a characteristic similar to the measured characteristic and outputs it.

Type: Grant

Filed: July 18, 2005

Date of Patent: September 28, 2010

Assignee: Denso Corporation

Inventor: Takao Mitsui
Method for speech quality degradation estimation and method for degradation measures calculation and apparatuses thereof

Patent number: 7801725

Abstract: A method for speech quality degradation estimation, a method for degradation measures calculation, and the apparatuses thereof are provided. The first method estimates the speech quality of a speech signal that is modified by a pitch-synchronous prosody modification method, and comprises the following steps. First, extract at least one source pitchmark from the speech signal, and then map the source pitchmark(s) to at least one target pitchmark. Finally, calculate at least one degradation measure based on the mapping between the source and the target pitchmarks. The degradation measures include several weighted pitch-related functions and duration-related functions, where the weighting functions can be calculated based on the speech signal or the pitchmark mapping mentioned above.

Type: Grant

Filed: June 29, 2006

Date of Patent: September 21, 2010

Assignee: Industrial Technology Research Institute

Inventors: Shi-Han Chen, Chih-Chung Kuo, Shun-Ju Chen
Method of generating a prosodic model for adjusting speech style and apparatus and method of synthesizing conversational speech using the same

Patent number: 7792673

Abstract: An apparatus and method for adjusting the friendliness of a synthesized speech and thus generating synthesized speech of various styles in a speech synthesis system are provided. The method includes the steps of defining at least two friendliness levels; storing recorded speech data of sentences, the sentences being made up according to each of the friendliness levels; extracting at least one of prosodic characteristics for each of the friendliness levels from the recorded speech data, said prosodic characteristics including at least one of a sentence-final intonation type, boundary intonation types of intonation phrases in the sentence, and an average value of F0 of the sentence, with respect to the recorded speech data; and generating a prosodic model for each of the friendliness levels by statistically modeling the at least one of the prosodic characteristics.

Type: Grant

Filed: November 7, 2006

Date of Patent: September 7, 2010

Assignee: Electronics and Telecommunications Research Institute

Inventors: Seung Shin Oh, Sang Hun Kim, Young Jik Lee
SPEECH SYNTHESIS DEVICE, SPEECH SYNTHESIS METHOD, AND SPEECH SYNTHESIS PROGRAM

Publication number: 20100223058

Abstract: A speech synthesis device includes a pitch pattern generation unit (104) which generates a pitch pattern by combining, based on pitch pattern target data including phonemic information formed from at least syllables, phonemes, and words, a standard pattern which approximately expresses the rough shape of the pitch pattern and an original utterance pattern which expresses the pitch pattern of a recorded speech, a unit waveform selection unit (106) which selects unit waveform data based on the generated pitch pattern and upon selection, selects original utterance unit waveform data corresponding to the original utterance pattern in a section where the original utterance pattern is used, and a speech waveform generation unit (107) which generates a synthetic speech by editing the selected unit waveform data so as to reproduce prosody represented by the generated pitch pattern.

Type: Application

Filed: August 28, 2008

Publication date: September 2, 2010

Inventors: Yasuyuki Mitsui, Reishi Kondo
Voice synthesizer, voice synthesizing method, and computer program

Patent number: 7739113

Abstract: A voice synthesizer includes a recorded voice storage portion (124) that stores recorded voices that are pre-recorded; a voice input portion (110) that is input with a reading voice reading out a text that is to be generated by the synthesized voice; an attribute information input portion (112) that is input with a label string, which is a string of labels assigned to each phoneme included in the reading voice, and label information, which indicates the border position of each phoneme corresponding to each label; a parameter extraction portion (116) that extracts characteristic parameters of the reading voice based on the label string, the label information, and the reading voice; and a voice synthesis portion (122) that selects the recorded voices from the recorded voice storage portion in accordance with the characteristic parameters, synthesizes the recorded voices, and generates the synthesized voice that reads out the text.

Type: Grant

Filed: November 9, 2006

Date of Patent: June 15, 2010

Assignee: Oki Electric Industry Co., Ltd.

Inventor: Tsutomu Kaneyasu
Speech recognition apparatus, speech recognition apparatus and program thereof

Patent number: 7720679

Abstract: Provided is a method for canceling background noise of a sound source other than a target direction sound source in order to realize highly accurate speech recognition, and a system using the same. In terms of directional characteristics of a microphone array, due to a capability of approximating a power distribution of each angle of each of possible various sound source directions by use of a sum of coefficient multiples of a base form angle power distribution of a target sound source measured beforehand by base form angle by using a base form sound, and power distribution of a non-directional background sound by base form, only a component of the target sound source direction is extracted at a noise suppression part. In addition, when the target sound source direction is unknown, at a sound source localization part, a distribution for minimizing the approximate residual is selected from base form angle power distributions of various sound source directions to assume a target sound source direction.

Type: Grant

Filed: September 24, 2008

Date of Patent: May 18, 2010

Assignee: Nuance Communications, Inc.

Inventors: Osamu Ichikawa, Tetsuya Takiguchi, Masafumi Nishimura
Method, apparatus and computer program providing a multi-speaker database for concatenative text-to-speech synthesis

Patent number: 7716052

Abstract: A method, apparatus and a computer program product to generate an audible speech word that corresponds to text. The method includes providing a text word and, in response to the text word, processing pre-recorded speech segments that are derived from a plurality of speakers to selectively concatenate together speech segments based on at least one cost function to form audio data for generating an audible speech word that corresponds to the text word. A data structure is also provided for use in a concatenative text-to-speech system that includes a plurality of speech segments derived from a plurality of speakers, where each speech segment includes an associated attribute vector each of which is comprised of at least one attribute vector element that identifies the speaker from which the speech segment was derived.

Type: Grant

Filed: April 7, 2005

Date of Patent: May 11, 2010

Assignee: Nuance Communications, Inc.

Inventors: Andrew S. Aaron, Ellen M. Eide, Wael M. Hamza, Michael A. Picheny, Charles T. Rutherfoord, Zhi Wei Shuang, Maria E. Smith
Periodic signal enhancement system

Patent number: 7680652

Abstract: A signal enhancement system improves the understandability of speech or other audio signals. The system reinforces selected parts of the signal, may attenuate selected parts of the signal, and may increase SNR. The system includes delay logic, an adaptive filter, and signal reinforcement logic. The adaptive filter may track one or more fundamental frequencies in the input signal and outputs a filtered signal. The filtered signal may approximately reproduce the input signal approximately delayed by an integer multiple of the signal's fundamental frequencies. The reinforcement logic combines the input signal and the filtered signal output to produce an enhanced signal output.

Type: Grant

Filed: October 26, 2004

Date of Patent: March 16, 2010

Assignee: QNX Software Systems (Wavemakers), Inc.

Inventors: David Giesbrecht, Phillip Hetherington
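
The three parts named in the abstract of patent 7680652 above (delay logic, adaptive filter, reinforcement logic) can be sketched in Python as follows. The abstract does not say how the filter is adapted; the normalized LMS (NLMS) update used here, and the parameters `taps`, `mu`, `gain`, and `eps`, are assumptions chosen for illustration.

```python
def enhance_periodic(signal, delay, taps=8, mu=0.5, gain=1.0, eps=1e-8):
    """Reinforce the periodic part of `signal`.

    An adaptive FIR filter predicts the current sample from samples that
    are at least `delay` samples old. Only components that repeat over
    that delay can be predicted, so the filter output approximates the
    periodic part of the input. NLMS adaptation is an assumption here.
    """
    weights = [0.0] * taps
    filtered, enhanced = [], []
    for n, x in enumerate(signal):
        # Delay logic: tap vector built from the delayed input.
        u = [signal[n - delay - k] if n - delay - k >= 0 else 0.0
             for k in range(taps)]
        # Adaptive filter output and prediction error.
        y = sum(w * v for w, v in zip(weights, u))
        err = x - y
        norm = sum(v * v for v in u) + eps
        weights = [w + mu * err * v / norm for w, v in zip(weights, u)]
        # Reinforcement logic: combine the input with the filtered signal.
        filtered.append(y)
        enhanced.append(x + gain * y)
    return enhanced, filtered
```

For a purely periodic input the filter output converges to the input itself, so the enhanced output approaches `(1 + gain)` times the input, while uncorrelated noise is not reinforced in the same way.
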
Voice analysis/synthesis apparatus and program

Patent number: 7672835

Abstract: An FFT unit performs an FFT process on high-frequency-eliminated, pitch-shifted voice data for one frame. A time scaling unit calculates a frequency amplitude, a phase, a phase difference between the present and immediately preceding frames, and an unwrapped version of the phase difference for each channel from which the frequency component was obtained by the FFT, detects a reference channel based on a peak one of the frequency amplitudes, and calculates the phase of each channel in a synthesized voice based on the reference channel, using results of the calculation. An IFFT unit processes each frequency component in accordance with the calculated phase, performs an IFFT process on the resulting frequency component, and produces synthesized voice data for one frame.

Type: Grant

Filed: December 19, 2005

Date of Patent: March 2, 2010

Assignee: Casio Computer Co., Ltd.

Inventor: Masaru Setoguchi
Pitch detection of speech signals

Patent number: 7660718

Abstract: Pitch detection of speech signals finds numerous applications in karaoke, voice recognition and scoring applications. While most of the existing techniques rely on time domain methods, the invention utilizes frequency domain methods. There is provided a method and system for determining the pitch of speech from a speech signal. The method includes the steps of: producing or obtaining the speech signal; distinguishing the speech signal into voiced, unvoiced or silence sections using speech signal energy levels; applying a Fourier Transform to the speech signal and obtaining speech signal parameters; determining peaks of the Fourier transformed speech signal; tracking the speech signal parameters of the determined peaks to select partials; and determining the pitch from the selected partials using a two-way mismatch error calculation.

Type: Grant

Filed: September 23, 2004

Date of Patent: February 9, 2010

Assignee: STMicroelectronics Asia Pacific Pte. Ltd.

Inventors: Kabi Prakash Padhi, Sapna George
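
The last step in the abstract of patent 7660718 above, choosing the pitch from selected partials with a two-way mismatch error, can be sketched in simplified form. This version ignores peak amplitudes and uses plain relative frequency differences; the function names, the candidate list, and the harmonic cutoff are assumptions made for illustration, not the patented calculation.

```python
def twm_error(peaks_hz, f0, n_harmonics=10):
    """Simplified two-way mismatch between measured peaks and harmonics of f0."""
    max_peak = max(peaks_hz)
    harmonics = [f0 * k for k in range(1, n_harmonics + 1)]
    # Keep only predicted harmonics inside the measured frequency range.
    harmonics = [h for h in harmonics if h <= 1.05 * max_peak] or [f0]
    # Predicted-to-measured: each harmonic should lie near some peak.
    p2m = sum(min(abs(h - p) for p in peaks_hz) / h
              for h in harmonics) / len(harmonics)
    # Measured-to-predicted: each peak should lie near some harmonic.
    m2p = sum(min(abs(p - h) for h in harmonics) / p
              for p in peaks_hz) / len(peaks_hz)
    return p2m + m2p


def estimate_pitch(peaks_hz, candidates_hz):
    """Return the candidate fundamental with the smallest mismatch error."""
    return min(candidates_hz, key=lambda f0: twm_error(peaks_hz, f0))
```

Using both directions penalizes a candidate one octave too low (many predicted harmonics with no matching peak) as well as one an octave too high (measured peaks with no matching harmonic).
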
SPEECH SYNTHESIZING APPARATUS AND METHOD THEREOF

Publication number: 20090326951

Abstract: Ratios of powers at the peaks of respective formants of the spectrum of a pitch-cycle waveform and powers at boundaries between the formants are obtained and, when the ratios are large, bandwidth of window functions are widened and the formant waveforms are generated by multiplying generated sinusoidal waveforms from the formant parameter sets on the basis of pitch-cycle waveform generating data by the window functions of the widened bandwidth, whereby a pitch-cycle waveform is generated by the sum of these formant waveforms.

Type: Application

Filed: April 14, 2009

Publication date: December 31, 2009

Applicant: KABUSHIKI KAISHA TOSHIBA

Inventors: Ryo Morinaka, Takehiko Kagoshima
Speech synthesis system and method

Patent number: 7630896

Abstract: A speech synthesis system in a preferred embodiment includes a speech unit storage section, a phonetic environment storage section, a phonetic sequence/prosodic information input section, a plural-speech-unit selection section, a fused-speech-unit sequence generation section, and a fused-speech-unit modification/concatenation section. By fusing a plurality of selected speech units in the fused speech unit sequence generation section, a fused speech unit is generated. In the fused speech unit sequence generation section, the average power information is calculated for a plurality of selected M speech units, N speech units are fused together, and the power information of the fused speech unit is so corrected as to be equalized with the average power information of the M speech units.

Type: Grant

Filed: September 23, 2005

Date of Patent: December 8, 2009

Assignee: Kabushiki Kaisha Toshiba

Inventors: Masatsune Tamura, Gou Hirabayashi, Takehiko Kagoshima
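
The power correction in the abstract of patent 7630896 above can be sketched as follows: compute the average power over the M selected units, fuse N of them, then scale the fused unit to that average. Fusing by sample-wise averaging of the first N units, and the function names, are assumptions made for illustration; the abstract does not specify the fusion method.

```python
import math


def mean_power(unit):
    """Mean squared amplitude of one speech unit (a list of samples)."""
    return sum(x * x for x in unit) / len(unit)


def fuse_with_power_correction(units, n_fuse):
    """Fuse the first `n_fuse` of the M units and equalize the power.

    The target power is the average of the per-unit powers over all M
    units. Fusion by plain averaging is an assumption for this sketch.
    """
    target_power = sum(mean_power(u) for u in units) / len(units)
    to_fuse = units[:n_fuse]
    length = min(len(u) for u in to_fuse)
    fused = [sum(u[i] for u in to_fuse) / len(to_fuse) for i in range(length)]
    power = mean_power(fused)
    gain = math.sqrt(target_power / power) if power > 0.0 else 0.0
    return [gain * x for x in fused]
```

Averaging waveforms tends to lower the power when the units are not perfectly aligned, which is one reason a correction toward the average power of the selected units is useful.
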
METHOD, APPARATUS AND COMPUTER PROGRAM PRODUCT FOR PROVIDING IMPROVED SPEECH SYNTHESIS

Publication number: 20090299747

Abstract: An apparatus for providing improved speech synthesis may include a processor and a memory storing executable instructions. In response to execution of the instructions by the processor, the apparatus may perform at least selecting a real glottal pulse from among one or more stored real glottal pulses based at least in part on a property associated with the real glottal pulse, utilizing the real glottal pulse selected as a basis for generation of an excitation signal, and modifying the excitation signal based on spectral parameters generated by a model to provide synthetic speech.

Type: Application

Filed: May 29, 2009

Publication date: December 3, 2009

Inventors: Tuomo Johannes Raitio, Antti Santeri Suni, Martti Tapani Vainio, Paavo Ilmari Alku, Jani Kristian Nurminen
Voice synthesizer of multi sounds

Patent number: 7613612

Abstract: In a voice synthesizer, an envelope acquisition portion obtains a spectral envelope of a reference frequency spectrum of a given voice. A spectrum acquisition portion obtains a collective frequency spectrum of a plurality of voices which are generated in parallel to one another. An envelope adjustment portion adjusts a spectral envelope of the collective frequency spectrum obtained by the spectrum acquisition portion so as to approximately match with the spectral envelope of the reference frequency spectrum obtained by the envelope acquisition portion. A voice generation portion generates an output voice signal from the collective frequency spectrum having the spectral envelope adjusted by the envelope adjustment portion.

Type: Grant

Filed: January 31, 2006

Date of Patent: November 3, 2009

Assignee: Yamaha Corporation

Inventors: Hideki Kemmochi, Jordi Bonada
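
The envelope adjustment in the abstract of patent 7613612 above can be sketched on magnitude spectra. The envelope estimator used here (a sliding maximum over neighboring bins) and the names `half_width` and `floor` are assumptions for illustration; the abstract does not state how the envelope is obtained.

```python
def spectral_envelope(magnitudes, half_width=2):
    """Crude envelope: sliding maximum over neighboring frequency bins."""
    n = len(magnitudes)
    return [max(magnitudes[max(0, k - half_width):min(n, k + half_width + 1)])
            for k in range(n)]


def match_envelope(collective_mag, reference_mag, half_width=2, floor=1e-12):
    """Rescale the collective spectrum toward the reference envelope.

    Each bin is multiplied by the ratio of the reference envelope to the
    collective spectrum's own envelope at that bin, so the fine structure
    of the collective spectrum is kept while its overall shape changes.
    """
    ref_env = spectral_envelope(reference_mag, half_width)
    col_env = spectral_envelope(collective_mag, half_width)
    return [m * r / max(c, floor)
            for m, r, c in zip(collective_mag, ref_env, col_env)]
```
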
TONE DETECTION FOR SIGNALS SENT THROUGH A VOCODER

Publication number: 20090265173

Abstract: A tone detector and associated method for use with EVRC-B and GSM vocoders to enable reliable detection of system connect tones over a wireless communication system. The tone detection method examines a number of sequential data frames of the signal received from the vocoder and determines that the tone is present if the spectral energy at frequencies around the tone is much higher than that at neighboring frequencies and if the calculated center frequency of the data frames is at or near the frequency of the tone.

Type: Application

Filed: April 18, 2008

Publication date: October 22, 2009

Applicant: GENERAL MOTORS CORPORATION

Inventors: Sethu K. Madhavan, Jijun Yin, Qin Jiang, Darrel James Van Buer
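
The energy comparison in the abstract of publication 20090265173 above can be sketched with the Goertzel algorithm, which measures signal power at a single frequency. Only the "energy around the tone versus neighboring frequencies" test is shown; the center-frequency check and the multi-frame logic are omitted. The use of Goertzel, the neighbor offset, and the threshold `ratio` are assumptions for illustration.

```python
import math


def goertzel_power(samples, sample_rate, freq):
    """Squared magnitude of the signal's spectrum at `freq` (Goertzel)."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def tone_present(samples, sample_rate, tone_hz,
                 neighbor_offset_hz=200.0, ratio=10.0, floor=1e-9):
    """True if power at the tone far exceeds power at both neighbors."""
    tone = goertzel_power(samples, sample_rate, tone_hz)
    neighbors = max(
        goertzel_power(samples, sample_rate, tone_hz - neighbor_offset_hz),
        goertzel_power(samples, sample_rate, tone_hz + neighbor_offset_hz),
    )
    # The floor keeps numerical noise on a near-silent frame from passing.
    return tone > ratio * max(neighbors, floor)
```

A frame of 160 samples at 8000 Hz (20 ms) gives a frequency resolution of 50 Hz, so a 200 Hz neighbor offset places the comparison frequencies well clear of the tone.
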
SPEECH PROCESSING APPARATUS, METHOD, AND COMPUTER PROGRAM PRODUCT

Publication number: 20090248417

Abstract: A method to generate a pitch contour for speech synthesis is proposed. The method is based on finding the pitch contour that maximizes a total likelihood function created by the combination of all the statistical models of the pitch contour segments of an utterance, at one or multiple linguistic levels. These statistical models are trained from a database of spoken speech, by means of a decision tree that for each linguistic level clusters the parametric representation of the pitch segments extracted from the spoken speech data with some features obtained from the text associated with that speech data. The parameterization of the pitch segments is performed in such a way that the likelihood function of any linguistic level can be expressed in terms of the parameters of one of the levels, thus allowing the maximization to be calculated with respect to the parameters of that level.

Type: Application

Filed: March 17, 2009

Publication date: October 1, 2009

Applicant: KABUSHIKI KAISHA TOSHIBA

Inventors: Javier Latorre, Masami Akamine
METHOD, APPARATUS AND PROGRAM FOR SPEECH SYNTHESIS

Publication number: 20090204405

Abstract: Apparatus and method for generating high quality synthesized speech having smooth waveform concatenation. The apparatus includes a pitch frequency calculation section, a pitch synchronization position calculation section, a unit waveform storage, a unit waveform selection section, a unit waveform generation section, and a waveform synthesis section. The unit waveform generation section includes a conversion ratio calculation section, a sampling rate conversion section, and a unit waveform re-selection section. The conversion ratio calculation section calculates a sampling rate conversion ratio from the pitch information and the position of pitch synchronization, and the sampling rate conversion section converts the sampling rate of the unit waveform, delivered as input, based on the sampling rate conversion ratio.

Type: Application

Filed: September 4, 2006

Publication date: August 13, 2009

Applicant: NEC CORPORATION

Inventors: Masanori Kato, Satoshi Tsukada
Voice synthesis device

Patent number: 7571099

Abstract: A voice synthesis device for generating synthetic voice having great freedom in voice quality and good sound quality from text data is provided.

Type: Grant

Filed: January 17, 2005

Date of Patent: August 4, 2009

Assignee: Panasonic Corporation

Inventors: Natsuki Saito, Takahiro Kamai, Yumiko Kato
Speech synthesis method and speech synthesizer

Patent number: 7562018

Abstract: A language processing portion (31) analyzes a text from a dialogue processing section (20) and transforms the text to information on pronunciation and accent. A prosody generation portion (32) generates an intonation pattern according to a control signal from the dialogue processing section (20). A waveform DB (34) stores prerecorded waveform data together with pitch mark data imparted thereto. A waveform cutting portion (33) cuts desired pitch waveforms from the waveform DB (34). A phase operation portion (35) removes phase fluctuation by standardizing phase spectra of the pitch waveforms cut by the waveform cutting portion (33), and afterwards imparts phase fluctuation by diffusing only high phase components randomly according to the control signal from the dialogue processing section (20). The thus-produced pitch waveforms are placed at desired intervals and superimposed.

Type: Grant

Filed: November 25, 2003

Date of Patent: July 14, 2009

Assignee: Panasonic Corporation

Inventors: Takahiro Kamai, Yumiko Kato
