Frequency Element Patents (Class 704/268)
-
Patent number: 8185395
Abstract: An information transmission device which analyzes a diction of a speaker and provides an utterance in accordance with the diction of the speaker, and which has a microphone detecting a sound signal of the speaker, a feature extraction unit extracting at least one feature value of the diction of the speaker based on the sound signal detected by the microphone, a voice synthesis unit synthesizing a voice signal to be uttered so that the voice signal has the same feature value as the diction of the speaker, based on the feature value extracted by the feature extraction unit, and a voice output unit performing an utterance based on the voice signal synthesized by the voice synthesis unit.
Type: Grant
Filed: September 13, 2005
Date of Patent: May 22, 2012
Assignee: Honda Motor Co., Ltd.
Inventors: Tokitomo Ariyoshi, Kazuhiro Nakadai, Hiroshi Tsujino
-
Patent number: 8170879
Abstract: A signal enhancement system improves the understandability of speech or other audio signals. The system reinforces selected parts of the signal, may attenuate selected parts of the signal, and may increase SNR. The system includes delay logic, a partitioned adaptive filter, and signal reinforcement logic. The partitioned adaptive filter may track and enhance the fundamental frequency and harmonics in the input signal. The partitioned filter output signals may approximately reproduce the input signal, delayed by an integer multiple of the period of the fundamental frequency of the input signal. The reinforcement logic combines the input signal and the filtered signals to produce an enhanced output signal.
Type: Grant
Filed: April 8, 2005
Date of Patent: May 1, 2012
Assignee: QNX Software Systems Limited
Inventors: Rajeev Nongpiur, David Giesbrecht, Phillip Hetherington
-
Patent number: 8165882
Abstract: Apparatus and method for generating high quality synthesized speech having smooth waveform concatenation. The apparatus includes a pitch frequency calculation section, a pitch synchronization position calculation section, a unit waveform storage, a unit waveform selection section, a unit waveform generation section, and a waveform synthesis section. The unit waveform generation section includes a conversion ratio calculation section, a sampling rate conversion section, and a unit waveform re-selection section. The conversion ratio calculation section calculates a sampling rate conversion ratio from the pitch information and the position of pitch synchronization, and the sampling rate conversion section converts the sampling rate of the unit waveform, delivered as input, based on the sampling rate conversion ratio.
Type: Grant
Filed: September 4, 2006
Date of Patent: April 24, 2012
Assignee: NEC Corporation
Inventors: Masanori Kato, Satoshi Tsukada
-
Patent number: 8145491
Abstract: When pitch of a speech segment is being modified from a current pitch to a requested pitch, and the difference between these is relatively large, a pitch modification algorithm is used to modify the pitch of the speech segment. When the difference between current and requested pitches is relatively small, the pitch of the speech segment is not modified. After one or the other of these techniques is applied, the resultant speech segment is overlapped and added to previously modified speech segments. A modification ratio is determined in order to quantify the difference between the current and requested pitches for a speech segment. The modification ratio is a ratio between the requested and current pitches. Low and high ratio thresholds are used to determine when pitch is being modified to a predetermined high degree, and whether pitch of the speech segment will or will not be modified.
Type: Grant
Filed: July 30, 2002
Date of Patent: March 27, 2012
Assignee: Nuance Communications, Inc.
Inventors: Wael Mohamed Hamza, Michael Alan Picheny
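Illustrative sketch (not taken from the patent): the threshold decision described in this abstract can be pictured roughly as follows. The function names and the threshold values 0.9 and 1.1 are assumptions chosen for illustration; the abstract states only that a low and a high ratio threshold exist.

```python
def modification_ratio(current_pitch_hz, requested_pitch_hz):
    """Ratio between the requested and current pitches, as in the abstract."""
    if current_pitch_hz <= 0:
        raise ValueError("current pitch must be positive")
    return requested_pitch_hz / current_pitch_hz


def should_modify_pitch(current_pitch_hz, requested_pitch_hz,
                        low_threshold=0.9, high_threshold=1.1):
    """Return True when the requested change is large enough to modify pitch.

    The threshold defaults are hypothetical; a ratio between the two
    thresholds is treated as a small change and the segment is left alone.
    """
    ratio = modification_ratio(current_pitch_hz, requested_pitch_hz)
    return ratio < low_threshold or ratio > high_threshold
```

Under these assumed thresholds, a request to move a 100 Hz segment to 105 Hz would leave the segment untouched, while a request for 150 Hz would trigger the pitch modification algorithm.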
-
Patent number: 8126717
Abstract: A method for generating a prosody model that predicts prosodic parameters is disclosed. Upon receiving text annotated with acoustic features, the method comprises generating first classification and regression trees (CARTs) that predict durations and F0 from text by generating initial boundary labels by considering pauses, generating initial accent labels by applying a simple rule on text-derived features only, adding the predicted accent and boundary labels to feature vectors, and using the feature vectors to generate the first CARTs. The first CARTs are used to predict accent and boundary labels. Next, the first CARTs are used to generate second CARTs that predict durations and F0 from text and acoustic features by using lengthened accented syllables and phrase-final syllables, refining accent and boundary models simultaneously, comparing actual and predicted duration of a whole prosodic phrase to normalize speaking rate, and generating the second CARTs that predict the normalized speaking rate.
Type: Grant
Filed: October 13, 2006
Date of Patent: February 28, 2012
Assignee: AT&T Intellectual Property II, L.P.
Inventor: Volker Franz Strom
-
Patent number: 8121835
Abstract: Automatic level control of speech portions of an audio signal is provided. An audio signal is received in the form of a sequence of samples and may contain speech portions and non-speech portions. The sequence of samples is divided into a sequence of sub-frames. Multiple sub-frames adjacent to a present sub-frame are examined to determine a peak value of samples in the sub-frames. A gain factor is computed for the present sub-frame based on the peak value and a desired maximum value for the speech portion, and each sample in the present sub-frame is amplified by the gain factor. In an embodiment, variations in filtered energy values of multiple sub-frames enable determination of whether a sub-frame corresponds to a speech or non-speech/noise portion.
Type: Grant
Filed: March 6, 2008
Date of Patent: February 21, 2012
Assignee: Texas Instruments Incorporated
Inventor: Fitzgerald John Archibald
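Illustrative sketch (not taken from the patent): the per-sub-frame gain computation in this abstract might look roughly like the following. The window definition (the present sub-frame plus `neighbors` sub-frames on each side), the gain formula `desired_max / peak`, the `max_gain` cap, and all default values are assumptions; the speech/non-speech decision mentioned in the abstract is omitted.

```python
def level_control(samples, subframe_len=160, neighbors=2,
                  desired_max=0.9, max_gain=8.0):
    """Scale each sub-frame so the local peak approaches desired_max.

    All parameter names and defaults are hypothetical illustrations.
    """
    if subframe_len <= 0:
        raise ValueError("subframe_len must be positive")
    # Divide the sample sequence into consecutive sub-frames.
    frames = [samples[i:i + subframe_len]
              for i in range(0, len(samples), subframe_len)]
    output = []
    for index, frame in enumerate(frames):
        # Examine the present sub-frame and its neighbours for the peak.
        first = max(0, index - neighbors)
        last = min(len(frames), index + neighbors + 1)
        peak = max(abs(s) for f in frames[first:last] for s in f)
        # Assumed gain rule; the cap avoids boosting near-silence too far.
        gain = min(desired_max / peak, max_gain) if peak > 0 else 1.0
        output.extend(s * gain for s in frame)
    return output
```

Taking the peak over neighbouring sub-frames rather than the present one alone keeps the gain from jumping abruptly between adjacent sub-frames.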
-
Patent number: 8108216
Abstract: In a speech synthesis, a selecting unit selects one string from first speech unit strings corresponding to a first segment sequence obtained by dividing a phoneme string corresponding to target speech into segments. The selecting unit repeatedly performs generating, based on a maximum of W second speech unit strings corresponding to a second segment sequence as a partial sequence of the first sequence, third speech unit strings corresponding to a third segment sequence obtained by adding a segment to the second sequence, and selecting a maximum of W strings from the third strings based on an evaluation value of each of the third strings. The value is obtained by correcting a total cost of each of the third string candidates with a penalty coefficient for each of the third strings. The coefficient is based on a restriction concerning quickness of speech unit data acquisition, and depends on the extent to which the restriction is approached.
Type: Grant
Filed: March 19, 2008
Date of Patent: January 31, 2012
Assignee: Kabushiki Kaisha Toshiba
Inventors: Masahiro Morita, Takehiko Kagoshima
-
Patent number: 8108217
Abstract: A noise adaptive mobile communication device including a noise collecting microphone which collects noise from a peripheral environment; a noise sensing unit which senses the collected noise; a frequency-component detecting unit which detects a frequency component of the sensed noise; a sound generating unit which generates a noise-adaptive sound from the detected frequency component; a call-sound synthesizing unit which synthesizes received call sound with the noise-adaptive sound; and an operation control unit which controls the call-sound synthesizing unit to operate at each predetermined time.
Type: Grant
Filed: February 11, 2005
Date of Patent: January 31, 2012
Assignee: Samsung Electronics Co., Ltd.
Inventors: Myung-Hyun Yoo, Jaywoo Kim, Joonah Park, Seung-Nyung Chung
-
Patent number: 8103505
Abstract: A method and apparatus for speech synthesis in a computer-user interface using random paralinguistic variation is described herein. According to one aspect of the present invention, a method for synthesizing speech comprises generating synthesized speech having certain prosodic features. The synthesized speech is further processed by applying a random paralinguistic variation to the acoustic sequence representing the synthesized speech without altering the linguistic prosodic features. According to one aspect of the present invention, the application of the paralinguistic variation is correlated with a previously applied paralinguistic variation to reflect a gradual change in the computer voice, while still maintaining a random quality.
Type: Grant
Filed: November 19, 2003
Date of Patent: January 24, 2012
Assignee: Apple Inc.
Inventors: Kim Silverman, Donald Lindsay
-
Publication number: 20110320207
Abstract: The invention relates to a method for speech signal analysis, modification and synthesis comprising a phase for the location of analysis windows by means of an iterative process for the determination of the phase of the first sinusoidal component and comparison between the phase value of said component and a predetermined value, a phase for the selection of analysis frames corresponding to an allophone and readjustment of the duration and the fundamental frequency according to certain thresholds and a phase for the generation of synthetic speech from synthesis frames taking the information of the closest analysis frame as spectral information of the synthesis frame and taking as many synthesis frames as periods that the synthetic signal has. The method allows a coherent location of the analysis windows within the periods of the signal and the exact generation of the synthesis instants in a manner synchronous with the fundamental period.
Type: Application
Filed: December 21, 2010
Publication date: December 29, 2011
Applicant: TELEFONICA, S.A.
Inventors: Miguel Angel Rodriguez Crespo, Jose Gregorio Escalada Sardina, Ana Armenta Lopez Vicuna
-
Patent number: 8073696
Abstract: A voice synthesis device is provided to include: an emotion input unit obtaining an utterance mode of a voice waveform, a prosody generation unit generating a prosody, a characteristic tone selection unit selecting a characteristic tone based on the utterance mode; and a characteristic tone temporal position estimation unit (i) judging whether or not each of phonemes included in a phonologic sequence of text is to be uttered with the characteristic tone, based on the phonologic sequence, the characteristic tone, and the prosody, and (ii) deciding a phoneme, which is an utterance position where the text is uttered with the characteristic tone. The voice synthesis device also includes an element selection unit and an element connection unit generating the voice waveform based on the phonologic sequence, the prosody, and the utterance position, so that the text is uttered in the utterance mode with the characteristic tone at the determined utterance position.
Type: Grant
Filed: May 2, 2006
Date of Patent: December 6, 2011
Assignee: Panasonic Corporation
Inventors: Yumiko Kato, Takahiro Kamai
-
Publication number: 20110270614
Abstract: A method and an apparatus for switching speech or audio signals, wherein the method includes, when a speech or audio signal is switched, weighting a first high frequency band signal of a current frame of speech or audio signal and a second high frequency band signal of the previous M frames of speech or audio signals to obtain a processed first high frequency band signal, where M is greater than or equal to 1, and synthesizing the processed first high frequency band signal and a first low frequency band signal of the current frame of speech or audio signal into a wide frequency band signal. In this way, speech or audio signals with different bandwidths can be smoothly switched, thus improving the quality of audio signals received by a user.
Type: Application
Filed: June 16, 2011
Publication date: November 3, 2011
Applicant: HUAWEI TECHNOLOGIES CO., LTD.
Inventors: Zexin Liu, Lei Miao, Chen Hu, Wenhai Wu, Yue Lang, Qing Zhang
-
Patent number: 8046225
Abstract: Normalization parameters are generated at a normalization-parameter generating unit by calculating the mean values and the standard deviations of an initial prosody pattern and a prosody pattern of a training sentence of a speech corpus. Then, the variance range or variance width of the initial prosody pattern is normalized at the prosody-pattern normalizing unit in accordance with the normalization parameters. As a result, a prosody pattern similar to speech of human beings and improved in naturalness can be generated with a small amount of calculation.
Type: Grant
Filed: February 8, 2008
Date of Patent: October 25, 2011
Assignee: Kabushiki Kaisha Toshiba
Inventors: Takashi Masuko, Masami Akamine
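Illustrative sketch (not taken from the patent): one plausible reading of this abstract is a mean and standard-deviation mapping that rescales the initial pattern to the statistics of a training-sentence pattern. The mapping formula, the function names, and the dictionary keys below are all assumptions; the abstract says only that means and standard deviations of both patterns serve as normalization parameters.

```python
from statistics import fmean, pstdev


def normalization_parameters(initial_pattern, training_pattern):
    """Collect mean and standard deviation of both prosody patterns."""
    return {
        "init_mean": fmean(initial_pattern),
        "init_std": pstdev(initial_pattern),
        "train_mean": fmean(training_pattern),
        "train_std": pstdev(training_pattern),
    }


def normalize_pattern(initial_pattern, params):
    """Rescale the initial pattern to the training pattern's statistics.

    Assumed mapping: y = train_mean + (x - init_mean) * train_std / init_std.
    """
    if params["init_std"] == 0:
        # A flat initial pattern has no spread to rescale; shift it only.
        return [params["train_mean"] for _ in initial_pattern]
    scale = params["train_std"] / params["init_std"]
    return [params["train_mean"] + (x - params["init_mean"]) * scale
            for x in initial_pattern]
```

For example, under this assumed mapping an initial F0 contour of 100, 110, 120 Hz normalized against a training contour of 180, 200, 220 Hz would take on the training contour's mean and spread. Only two passes over each pattern are needed, which is consistent with the abstract's remark about a small amount of calculation.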
-
Patent number: 8036894
Abstract: Methods, apparatus, systems, and computer program products are provided for synthesizing speech. One method includes matching a first level of units of a received input string to audio segments from a plurality of audio segments including using properties of or between first level units to locate matching audio segments from a plurality of selections, parsing unmatched first level units into second level units, matching the second level units to audio segments using properties of or between the units to locate matching audio segments from a plurality of selections and synthesizing the input string, including combining the audio segments associated with the first and second units.
Type: Grant
Filed: February 16, 2006
Date of Patent: October 11, 2011
Assignee: Apple Inc.
Inventors: Matthias Neeracher, Devang K. Naik, Kevin B. Aitken, Jerome R. Bellegarda, Kim E.A. Silverman
-
Patent number: 8027837
Abstract: Systems, apparatus, methods and computer program products are described for producing text-to-speech synthesis with non-speech sounds. In general, some of the pauses or silences that would otherwise be generated in synthesized speech are instead synthesized as non-speech sounds such as breaths. Non-speech sounds can be identified from pre-recorded speech that can include meta-data such as the grammatical and phrasal structure of words and sounds that precede and succeed non-speech sounds. A non-speech sound can be selected for use in synthesized speech based on the words, punctuation, grammatical and phrasal structure of text from which the speech is being synthesized, or other characteristics.
Type: Grant
Filed: September 15, 2006
Date of Patent: September 27, 2011
Assignee: Apple Inc.
Inventors: Kim E. A. Silverman, Matthias Neeracher
-
Publication number: 20110224977
Abstract: A robot may include a driving control unit configured to control a driving of a movable unit that is connected movably to a body unit, a voice generating unit configured to generate a voice, and a voice output unit configured to output the voice, which has been generated by the voice generating unit. The voice generating unit may correct the voice, which is generated, based on a bearing of the movable unit, which is controlled by the driving control unit, to the body unit.
Type: Application
Filed: September 14, 2010
Publication date: September 15, 2011
Applicant: HONDA MOTOR CO., LTD.
Inventors: Kazuhiro NAKADAI, Takuma OTSUKA, Hiroshi OKUNO
-
Publication number: 20110218810
Abstract: A system for controlling digital effects in live performances with vocal improvisation is described. The system features a complex controller that in one embodiment utilizes several magnetically activated electronic switches attached to a glove that is worn by an artist during a live performance. The switches are activated by a permanent magnet that is also attached to the switch bearing glove and a second magnet attached to a glove worn on the opposite hand. Furthermore, the switches are wirelessly connected by a miniature, battery-operated wireless data communications unit to a digital vocal processor unit that provides a dual mode, multi-channel phrase looping capability wherein individual channels can be selected for re-recording and selected banks of channels can be deleted during the performance. This combination of features allows a complex sequence of digital effects to be controlled by the artist during a performance while maintaining the freedom of movement desired to enhance the performance.
Type: Application
Filed: February 28, 2011
Publication date: September 8, 2011
Inventor: Momilani Ramstrum
-
Patent number: 8000968
Abstract: A method and an apparatus for switching speech or audio signals, wherein the method includes, when a speech or audio signal is switched, weighting a first high frequency band signal of a current frame of speech or audio signal and a second high frequency band signal of the previous M frames of speech or audio signals to obtain a processed first high frequency band signal, where M is greater than or equal to 1, and synthesizing the processed first high frequency band signal and a first low frequency band signal of the current frame of speech or audio signal into a wide frequency band signal. In this way, speech or audio signals with different bandwidths can be smoothly switched, thus improving the quality of audio signals received by a user.
Type: Grant
Filed: April 26, 2011
Date of Patent: August 16, 2011
Assignee: Huawei Technologies Co., Ltd.
Inventors: Zexin Liu, Lei Miao, Chan Hu, Wenhai Wu, Yue Lang, Qing Zhang
-
Publication number: 20110196680
Abstract: When a system (100) is used for synthesizing speech having prosody serving as a reference, the system stores speech element information representing a speech element capable of synthesizing speech having a degree of naturalness indicating a degree of similarity to speech uttered by a human higher than a predetermined reference value (speech element information storage (115)). The system accepts requested prosody information representing prosody requested by the user (requested prosody information accepting part (113)). The system generates intermediate prosody information representing intermediate prosody between the reference prosody and the requested prosody (intermediate prosody information generator (114)). The system executes a speech synthesis process to synthesize speech based on the generated intermediate prosody information and the stored speech element information (speech synthesizer (116)).
Type: Application
Filed: August 21, 2009
Publication date: August 11, 2011
Applicant: NEC CORPORATION
Inventor: Masanori Kato
-
Patent number: 7987090
Abstract: A system capable of reducing the influence of sound reverberation or reflection to improve sound-source separation accuracy. An original signal X(ω,f) is separated from an observed signal Y(ω,f) according to a first model and a second model to extract an unknown signal E(ω,f). According to the first model, the original signal X(ω,f) of the current frame f is represented as a combined signal of known signals S(ω,f-m+1) (m=1 to M) that span a certain number M of current and previous frames. This enables extraction of the unknown signal E(ω,f) without changing the window length while reducing the influence of reverberation or reflection of the known signal S(ω,f) on the observed signal Y(ω,f).
Type: Grant
Filed: August 7, 2008
Date of Patent: July 26, 2011
Assignee: Honda Motor Co., Ltd.
Inventors: Ryu Takeda, Kazuhiro Nakadai, Hiroshi Tsujino, Hiroshi Okuno
-
Patent number: 7979280
Abstract: An input linguistic description is converted into a speech waveform by deriving at least one target unit sequence corresponding to the linguistic description, selecting from a waveform unit database for the target unit sequences a plurality of alternative unit sequences approximating the target unit sequences, concatenating the alternative unit sequences to alternative speech waveforms and presenting the alternative speech waveforms to an operating person and enabling the choice of one of the presented alternative speech waveforms. There are no iterative cycles of manual modification and automatic selection, which enables a fast way of working. The operator does not need knowledge of units, targets, and costs, but chooses from a set of given alternatives. The fine-tuning of TTS prompts therefore becomes accessible to non-experts.
Type: Grant
Filed: February 22, 2007
Date of Patent: July 12, 2011
Assignee: Svox AG
Inventors: Johan Wouters, Christof Traber, Marcel Riedi, Martin Reber, Jürgen Keller
-
Patent number: 7966186
Abstract: A system and method for generating a synthetic text-to-speech (TTS) voice are disclosed. A user is presented with at least one TTS voice and at least one voice characteristic. A new synthetic TTS voice is generated by blending a plurality of existing TTS voices according to the selected voice characteristics. The blending of voices involves interpolating segmented parameters of each TTS voice. Segmented parameters may be, for example, prosodic characteristics of the speech such as pitch, volume, phone durations, accents, stress, mis-pronunciations and emotion.
Type: Grant
Filed: November 4, 2008
Date of Patent: June 21, 2011
Assignee: AT&T Intellectual Property II, L.P.
Inventors: David A. Kapilow, Kenneth H. Rosen, Juergen Schroeter
-
Patent number: 7957973
Abstract: An audio signal interpolation device comprises a spectral movement calculation unit which determines a spectral movement which is indicative of a difference in each of spectral components between a frequency spectrum of a current frame of an input audio signal and a frequency spectrum of a previous frame of the input audio signal stored in a spectrum storing unit. An interpolation band determination unit determines a frequency band to be interpolated by using the frequency spectrum of the current frame and the spectral movement. A spectrum interpolation unit performs interpolation of spectral components in the frequency band for the current frame by using either the frequency spectrum of the current frame or the frequency spectrum of the previous frame.
Type: Grant
Filed: July 25, 2007
Date of Patent: June 7, 2011
Assignee: Fujitsu Limited
Inventors: Masakiyo Tanaka, Masanao Suzuki, Miyuki Shirakawa, Takashi Makiuchi
-
Publication number: 20110125493
Abstract: The voice quality conversion apparatus includes: low-frequency harmonic level calculating units and a harmonic level mixing unit for calculating a low-frequency sound source spectrum by mixing a level of a harmonic of an input sound source waveform and a level of a harmonic of a target sound source waveform at a predetermined conversion ratio for each order of harmonics including fundamental, in a frequency range equal to or lower than a boundary frequency; a high-frequency spectral envelope mixing unit that calculates a high-frequency sound source spectrum by mixing the input sound source spectrum and the target sound source spectrum at the predetermined conversion ratio in a frequency range larger than the boundary frequency; and a spectrum combining unit that combines the low-frequency sound source spectrum with the high-frequency sound source spectrum at the boundary frequency to generate a sound source spectrum for an entire frequency range.
Type: Application
Filed: January 31, 2011
Publication date: May 26, 2011
Inventors: Yoshifumi Hirose, Takahiro Kamai
-
Patent number: 7945446
Abstract: A spectrum envelope of an input sound is detected. In the meantime, a converting spectrum is acquired, which is a frequency spectrum of a converting sound comprising a plurality of sounds, such as unison sounds. An output spectrum is generated by imparting the detected spectrum envelope of the input sound to the acquired converting spectrum. A sound signal is synthesized on the basis of the generated output spectrum. Further, a pitch of the input sound may be detected, and frequencies of peaks in the acquired converting spectrum may be varied in accordance with the detected pitch of the input sound. In this manner, the output spectrum can have the pitch and spectrum envelope of the input sound and spectrum frequency components of the converting sound comprising a plurality of sounds, and thus, unison sounds can be readily generated with simple arrangements.
Type: Grant
Filed: March 9, 2006
Date of Patent: May 17, 2011
Assignee: Yamaha Corporation
Inventors: Hideki Kemmochi, Yasuo Yoshioka, Jordi Bonada
-
Publication number: 20110087488
Abstract: According to an embodiment, a speech synthesis apparatus includes a selecting unit configured to select speaker's parameters one by one for respective speakers and obtain a plurality of speakers' parameters, the speaker's parameters being prepared for respective pitch waveforms corresponding to speaker's speech sounds, the speaker's parameters including formant frequencies, formant phases, formant powers, and window functions concerning respective formants that are contained in the respective pitch waveforms. The apparatus includes a mapping unit configured to make formants correspond to each other between the plurality of speakers' parameters using a cost function based on the formant frequencies and the formant powers. The apparatus includes a generating unit configured to generate an interpolated speaker's parameter by interpolating, at desired interpolation ratios, the formant frequencies, formant phases, formant powers, and window functions of formants which are made to correspond to each other.
Type: Application
Filed: December 16, 2010
Publication date: April 14, 2011
Inventors: Ryo Morinaka, Takehiko Kagoshima
-
Publication number: 20110054903
Abstract: Embodiments of rich text modeling for speech synthesis are disclosed. In operation, a text-to-speech engine refines a plurality of rich context models based on decision tree-tied Hidden Markov Models (HMMs) to produce a plurality of refined rich context models. The text-to-speech engine then generates synthesized speech for an input text based at least on some of the plurality of refined rich context models.
Type: Application
Filed: December 2, 2009
Publication date: March 3, 2011
Applicant: MICROSOFT CORPORATION
Inventors: Zhi-Jie Yan, Yao Qian, Frank Kao-Ping Soong
-
Publication number: 20110046958
Abstract: The present invention discloses a method and an apparatus for extracting a prosodic feature of a speech signal, the method including: dividing the speech signal into speech frames; transforming the speech frames from time domain to frequency domain; and extracting respective prosodic features for different frequency ranges. According to the above technical solution of the present invention, it is possible to effectively extract the prosodic feature which can combine with a traditional acoustics feature without any obstacle.
Type: Application
Filed: August 16, 2010
Publication date: February 24, 2011
Applicant: Sony Corporation
Inventors: Kun LIU, Weiguo Wu
-
Publication number: 20110046957
Abstract: Techniques are disclosed for frequency splicing in which speech segments used in the creation of a final speech waveform are constructed, at least in part, by combining (e.g., summing) a small number (e.g., two) of component speech segments that overlap substantially, or entirely, in time but have spectral energy that occupies disjoint, or substantially disjoint, frequency ranges. The component speech segments may be derived from speech segments produced by different speakers or from different speech segments produced by the same speaker. Depending on the embodiment, frequency splicing may supplement rule-based, concatenative, hybrid, or limited-vocabulary speech synthesis systems to provide various advantages.
Type: Application
Filed: August 24, 2010
Publication date: February 24, 2011
Applicant: NovaSpeech, LLC
Inventors: Susan R. Hertz, Harold G. Mills
-
Patent number: 7865365
Abstract: A method, system, and computer program product are disclosed for customizing a synthesized voice based upon audible input voice data. The input voice data is typically in the form of one or more predetermined paragraphs being read into a voice recorder. The input voice data is then analyzed for adjustable voice characteristics to determine basic voice qualities (e.g., pitch, breathiness, tone, speed; variability of any of these qualities, etc.) and to identify any “specialized speech patterns”. Based upon this analysis, the characteristics of the voice utilized to read text appearing on the screen are modified to resemble the input voice data. This allows a user of the system to easily and automatically create a voice that is familiar to the user.
Type: Grant
Filed: August 5, 2004
Date of Patent: January 4, 2011
Assignee: Nuance Communications, Inc.
Inventors: Debbie Ann Anglin, Howard Neil Anglin, Nyralin Novella Kline
-
Publication number: 20100324907
Abstract: The invention proposes the synthesis of a signal consisting of consecutive blocks. It proposes more particularly, on receipt of such a signal, to replace, by synthesis, lost or erroneous blocks of this signal. To this end, it proposes an attenuation of the overvoicing during the generation of a signal synthesis. More particularly, a voiced excitation is generated on the basis of the pitch period (T) estimated or transmitted at the previous block, by optionally applying a correction of plus or minus a sample of the duration of this period (counted in terms of number of samples), by constituting groups (A′,B′,C′,D′) of at least two samples and inverting positions of samples in the groups, randomly (B′,C′) or in a forced manner. An over-harmonicity in the excitation generated is thus broken and the effect of overvoicing in the synthesis of the generated signal is thereby attenuated.
Type: Application
Filed: October 17, 2007
Publication date: December 23, 2010
Applicant: France Telecom
Inventors: David Virette, Balazs Kovesi
-
Patent number: 7805306
Abstract: For a voice guidance phrase, multiple voice data items having individually different voice ranges or frequencies are previously stored in a memory. A voice mixing unit chooses to mix three voice data items among the stored voice data items and thereby produces a mixed voice data item. A voice outputting unit converts the mixed voice data item into a voice and then vocalizes a voice guidance phrase via a speaker. A voice measuring unit measures a characteristic of a frequency, a volume, or a pronunciation speed with respect to a response voice responding to the outputted voice guidance phrase. A voice mixing unit produces a mixed voice data item having a characteristic similar to the measured characteristic and outputs it.
Type: Grant
Filed: July 18, 2005
Date of Patent: September 28, 2010
Assignee: Denso Corporation
Inventor: Takao Mitsui
-
Patent number: 7801725
Abstract: A method for speech quality degradation estimation, a method for degradation measures calculation, and the apparatuses thereof are provided. The first method above estimates the speech quality of a speech signal that is modified by a pitch-synchronous prosody modification method, and comprises the following steps. First, extract at least one source pitchmark from the speech signal, and then map the source pitchmark(s) to at least one target pitchmark. Finally, calculate at least one degradation measure based on the mapping between the source and the target pitchmarks. The degradation measures include several weighted pitch-related functions and duration-related functions, where the weighting functions can be calculated based on the speech signal or the pitchmark mapping mentioned above.
Type: Grant
Filed: June 29, 2006
Date of Patent: September 21, 2010
Assignee: Industrial Technology Research Institute
Inventors: Shi-Han Chen, Chih-Chung Kuo, Shun-Ju Chen
-
Patent number: 7792673
Abstract: An apparatus and method for adjusting the friendliness of a synthesized speech and thus generating synthesized speech of various styles in a speech synthesis system are provided. The method includes the steps of defining at least two friendliness levels; storing recorded speech data of sentences, the sentences being made up according to each of the friendliness levels; extracting at least one of prosodic characteristics for each of the friendliness levels from the recorded speech data, said prosodic characteristics including at least one of a sentence-final intonation type, boundary intonation types of intonation phrases in the sentence, and an average value of F0 of the sentence, with respect to the recorded speech data; and generating a prosodic model for each of the friendliness levels by statistically modeling the at least one of the prosodic characteristics.
Type: Grant
Filed: November 7, 2006
Date of Patent: September 7, 2010
Assignee: Electronics and Telecommunications Research Institute
Inventors: Seung Shin Oh, Sang Hun Kim, Young Jik Lee
-
Publication number: 20100223058Abstract: A speech synthesis device includes a pitch pattern generation unit (104) which generates a pitch pattern by combining, based on pitch pattern target data including phonemic information formed from at least syllables, phonemes, and words, a standard pattern which approximately expresses the rough shape of the pitch pattern and an original utterance pattern which expresses the pitch pattern of a recorded speech, a unit waveform selection unit (106) which selects unit waveform data based on the generated pitch pattern and upon selection, selects original utterance unit waveform data corresponding to the original utterance pattern in a section where the original utterance pattern is used, and a speech waveform generation unit (107) which generates a synthetic speech by editing the selected unit waveform data so as to reproduce prosody represented by the generated pitch pattern.Type: ApplicationFiled: August 28, 2008Publication date: September 2, 2010Inventors: Yasuyuki Mitsui, Reishi Kondo
-
Patent number: 7739113Abstract: A voice synthesizer includes a recorded voice storage portion (124) that stores recorded voices that are pre-recorded; a voice input portion (110) that is input with a reading voice reading out a text that is to be generated by the synthesized voice; an attribute information input portion (112) that is input with a label string, which is a string of labels assigned to each phoneme included in the reading voice, and label information, which indicates the border position of each phoneme corresponding to each label; a parameter extraction portion (116) that extracts characteristic parameters of the reading voice based on the label string, the label information, and the reading voice; and a voice synthesis portion (122) that selects the recorded voices from the recorded voice storage portion in accordance with the characteristic parameters, synthesizes the recorded voices, and generates the synthesized voice that reads out the text.Type: GrantFiled: November 9, 2006Date of Patent: June 15, 2010Assignee: Oki Electric Industry Co., Ltd.Inventor: Tsutomu Kaneyasu
-
Patent number: 7720679Abstract: Provided is a method for canceling background noise of a sound source other than a target direction sound source in order to realize highly accurate speech recognition, and a system using the same. In terms of directional characteristics of a microphone array, due to a capability of approximating a power distribution of each angle of each of possible various sound source directions by use of a sum of coefficient multiples of a base form angle power distribution of a target sound source measured beforehand by base form angle by using a base form sound, and power distribution of a non-directional background sound by base form, only a component of the target sound source direction is extracted at a noise suppression part. In addition, when the target sound source direction is unknown, at a sound source localization part, a distribution for minimizing the approximate residual is selected from base form angle power distributions of various sound source directions to assume a target sound source direction.Type: GrantFiled: September 24, 2008Date of Patent: May 18, 2010Assignee: Nuance Communications, Inc.Inventors: Osamu Ichikawa, Tetsuya Takiguchi, Masafumi Nishimura
-
Patent number: 7716052Abstract: A method, apparatus and a computer program product to generate an audible speech word that corresponds to text. The method includes providing a text word and, in response to the text word, processing pre-recorded speech segments that are derived from a plurality of speakers to selectively concatenate together speech segments based on at least one cost function to form audio data for generating an audible speech word that corresponds to the text word. A data structure is also provided for use in a concatenative text-to-speech system that includes a plurality of speech segments derived from a plurality of speakers, where each speech segment includes an associated attribute vector each of which is comprised of at least one attribute vector element that identifies the speaker from which the speech segment was derived.Type: GrantFiled: April 7, 2005Date of Patent: May 11, 2010Assignee: Nuance Communications, Inc.Inventors: Andrew S. Aaron, Ellen M. Eide, Wael M. Hamza, Michael A. Picheny, Charles T. Rutherfoord, Zhi Wei Shuang, Maria E. Smith
-
Patent number: 7680652Abstract: A signal enhancement system improves the understandability of speech or other audio signals. The system reinforces selected parts of the signal, may attenuate selected parts of the signal, and may increase SNR. The system includes delay logic, an adaptive filter, and signal reinforcement logic. The adaptive filter may track one or more fundamental frequencies in the input signal and outputs a filtered signal. The filtered signal may approximately reproduce the input signal approximately delayed by an integer multiple of the signal's fundamental frequencies. The reinforcement logic combines the input signal and the filtered signal output to produce an enhanced signal output.Type: GrantFiled: October 26, 2004Date of Patent: March 16, 2010Assignee: QNX Software Systems (Wavemakers), Inc.Inventors: David Giesbrecht, Phillip Hetherington
-
Patent number: 7672835Abstract: An FFT unit performs an FFT process on high-frequency-eliminated, pitch-shifted voice data for one frame. A time scaling unit calculates a frequency amplitude, a phase, a phase difference between the present and immediately preceding frames, and an unwrapped version of the phase difference for each channel from which the frequency component was obtained by the FFT, detects a reference channel based on a peak one of the frequency amplitudes, and calculates the phase of each channel in a synthesized voice based on the reference channel, using results of the calculation. An IFFT unit processes each frequency component in accordance with the calculated phase, performs an IFFT process on the resulting frequency component, and produces synthesized voice data for one frame.Type: GrantFiled: December 19, 2005Date of Patent: March 2, 2010Assignee: Casio Computer Co., Ltd.Inventor: Masaru Setoguchi
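The per-channel phase difference and its unwrapped version mentioned in this abstract can be illustrated with the textbook phase-vocoder calculation. The following is a minimal sketch of that standard step, not the patented method: the function names `princarg` and `bin_frequency` and all parameters are hypothetical, and the reference-channel selection described in the abstract is not modelled.

```python
import math


def princarg(phase):
    """Wrap a phase in radians into the interval [-pi, pi)."""
    return (phase + math.pi) % (2.0 * math.pi) - math.pi


def bin_frequency(prev_phase, cur_phase, k, fft_size, hop, sample_rate):
    """Estimate the frequency (Hz) in FFT channel k from two frame phases.

    The unwrapped phase difference is the advance expected for the bin
    centre plus the wrapped deviation from that expectation.
    """
    expected = 2.0 * math.pi * k * hop / fft_size  # advance of bin centre per hop
    deviation = princarg(cur_phase - prev_phase - expected)
    unwrapped_diff = expected + deviation
    return unwrapped_diff * sample_rate / (2.0 * math.pi * hop)
```

The estimate is only unambiguous while the true per-hop deviation from the bin centre stays inside (-pi, pi), which is why short hops relative to the FFT size are usually assumed.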
-
Patent number: 7660718Abstract: Pitch detection of speech signals finds numerous applications in karaoke, voice recognition and scoring applications. While most of the existing techniques rely on time domain methods, the invention utilizes frequency domain methods. There is provided a method and system for determining the pitch of speech from a speech signal. The method includes the steps of: producing or obtaining the speech signal; distinguishing the speech signal into voiced, unvoiced or silence sections using speech signal energy levels; applying a Fourier Transform to the speech signal and obtaining speech signal parameters; determining peaks of the Fourier transformed speech signal; tracking the speech signal parameters of the determined peaks to select partials; and determining the pitch from the selected partials using a two-way mismatch error calculation.Type: GrantFiled: September 23, 2004Date of Patent: February 9, 2010Assignee: STMicroelectronics Asia Pacific Pte. Ltd.Inventors: Kabi Prakash Padhi, Sapna George
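The last step of this abstract, choosing a pitch from selected partials by a two-way mismatch error, can be sketched as follows. This is a simplified illustration that uses frequency mismatch only: it assumes the partial frequencies have already been extracted from the Fourier transform, ignores partial amplitudes, and uses an illustrative weight `rho`. The names `twm_error` and `estimate_pitch` are hypothetical and are not taken from the patent.

```python
def twm_error(f0, peaks, rho=0.33):
    """Simplified two-way mismatch error for one candidate f0 (Hz).

    peaks: measured partial frequencies in Hz (amplitudes are ignored).
    rho:   illustrative weight of the measured-to-predicted term.
    """
    n_harm = max(1, round(max(peaks) / f0))
    predicted = [f0 * n for n in range(1, n_harm + 1)]
    # Predicted-to-measured: every harmonic should have a nearby partial.
    err_pm = sum(min(abs(p - m) for m in peaks) / p for p in predicted)
    # Measured-to-predicted: every partial should sit near a harmonic.
    err_mp = sum(min(abs(m - p) for p in predicted) / m for m in peaks)
    return err_pm / len(predicted) + rho * err_mp / len(peaks)


def estimate_pitch(peaks, candidates):
    """Return the candidate f0 with the smallest mismatch error."""
    return min(candidates, key=lambda f0: twm_error(f0, peaks))
```

Scoring in both directions is what separates the candidates: a too-low candidate predicts harmonics that have no measured partial, while a too-high candidate leaves measured partials without a matching harmonic.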
-
Publication number: 20090326951Abstract: Ratios of powers at the peaks of respective formants of the spectrum of a pitch-cycle waveform and powers at boundaries between the formants are obtained and, when the ratios are large, bandwidth of window functions are widened and the formant waveforms are generated by multiplying generated sinusoidal waveforms from the formant parameter sets on the basis of pitch-cycle waveform generating data by the window functions of the widened bandwidth, whereby a pitch-cycle waveform is generated by the sum of these formant waveforms.Type: ApplicationFiled: April 14, 2009Publication date: December 31, 2009Applicant: KABUSHIKI KAISHA TOSHIBAInventors: Ryo Morinaka, Takehiko Kagoshima
-
Patent number: 7630896Abstract: A speech synthesis system in a preferred embodiment includes a speech unit storage section, a phonetic environment storage section, a phonetic sequence/prosodic information input section, a plural-speech-unit selection section, a fused-speech-unit sequence generation section, and a fused-speech-unit modification/concatenation section. By fusing a plurality of selected speech units in the fused speech unit sequence generation section, a fused speech unit is generated. In the fused speech unit sequence generation section, the average power information is calculated for a plurality of selected M speech units, N speech units are fused together, and the power information of the fused speech unit is so corrected as to be equalized with the average power information of the M speech units.Type: GrantFiled: September 23, 2005Date of Patent: December 8, 2009Assignee: Kabushiki Kaisha ToshibaInventors: Masatsune Tamura, Gou Hirabayashi, Takehiko Kagoshima
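The power correction described in this abstract can be sketched in a few lines. The example below makes several assumptions that the abstract does not state: units are equal-length lists of samples, "power" is the mean squared sample value, fusion is a plain sample-wise average of the first N units, and the function names are hypothetical.

```python
import math


def mean_power(unit):
    """Mean squared sample value of one waveform."""
    return sum(s * s for s in unit) / len(unit)


def fuse_and_equalize(selected, n_fuse):
    """Fuse the first n_fuse of the M selected units and correct the power.

    selected: list of M equal-length waveforms (assumption).
    Returns the fused waveform scaled so that its mean power equals the
    average mean power of all M selected units.
    """
    target = sum(mean_power(u) for u in selected) / len(selected)
    chosen = selected[:n_fuse]
    # Sample-wise average stands in for the fusion step (assumption).
    fused = [sum(samples) / len(chosen) for samples in zip(*chosen)]
    power = mean_power(fused)
    gain = math.sqrt(target / power) if power > 0 else 1.0
    return [gain * s for s in fused]
```

Because power scales with the square of amplitude, the gain is the square root of the ratio between the target power and the power of the fused unit.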
-
Publication number: 20090299747Abstract: An apparatus for providing improved speech synthesis may include a processor and a memory storing executable instructions. In response to execution of the instructions by the processor, the apparatus may perform at least selecting a real glottal pulse from among one or more stored real glottal pulses based at least in part on a property associated with the real glottal pulse, utilizing the real glottal pulse selected as a basis for generation of an excitation signal, and modifying the excitation signal based on spectral parameters generated by a model to provide synthetic speech.Type: ApplicationFiled: May 29, 2009Publication date: December 3, 2009Inventors: Tuomo Johannes Raitio, Antti Santeri Suni, Martti Tapani Vainio, Paavo Ilmari Alku, Jani Kristian Nurminen
-
Patent number: 7613612Abstract: In a voice synthesizer, an envelope acquisition portion obtains a spectral envelope of a reference frequency spectrum of a given voice. A spectrum acquisition portion obtains a collective frequency spectrum of a plurality of voices which are generated in parallel to one another. An envelope adjustment portion adjusts a spectral envelope of the collective frequency spectrum obtained by the spectrum acquisition portion so as to approximately match with the spectral envelope of the reference frequency spectrum obtained by the envelope acquisition portion. A voice generation portion generates an output voice signal from the collective frequency spectrum having the spectral envelope adjusted by the envelope adjustment portion.Type: GrantFiled: January 31, 2006Date of Patent: November 3, 2009Assignee: Yamaha CorporationInventors: Hideki Kemmochi, Jordi Bonada
-
Publication number: 20090265173Abstract: A tone detector and associated method for use with EVRC-B and GSM vocoders to enable reliable detection of system connect tones over a wireless communication system. The tone detection method examines a number of sequential data frames of the signal received from the vocoder and determines that the tone is present if the spectral energy at frequencies around the tone is much higher than that at neighboring frequencies and if the calculated center frequency of the data frames is at or near the frequency of the tone.Type: ApplicationFiled: April 18, 2008Publication date: October 22, 2009Applicant: GENERAL MOTORS CORPORATIONInventors: Sethu K. Madhavan, Jijun Yin, Qin Jiang, Darrel James Van Buer
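The first test in this abstract, comparing spectral energy at the tone with energy at neighboring frequencies over several sequential frames, can be sketched with the Goertzel recurrence. This is a swapped-in, generic technique rather than the detector described in the application: the offset, the ratio threshold, and the function names are illustrative assumptions, and the center-frequency check is omitted.

```python
import math


def goertzel_power(frame, freq, sample_rate):
    """Squared spectral magnitude of `frame` at `freq` (Goertzel recurrence)."""
    w = 2.0 * math.pi * freq / sample_rate
    coeff = 2.0 * math.cos(w)
    s1 = s2 = 0.0
    for x in frame:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def tone_present(frames, tone_hz, sample_rate, offset_hz=300.0, ratio=10.0):
    """True if every frame has far more energy at the tone than beside it.

    offset_hz and ratio are illustrative values, not taken from the source.
    """
    for frame in frames:
        at_tone = goertzel_power(frame, tone_hz, sample_rate)
        nearby = max(
            goertzel_power(frame, tone_hz - offset_hz, sample_rate),
            goertzel_power(frame, tone_hz + offset_hz, sample_rate),
        )
        if at_tone < ratio * max(nearby, 1e-12):
            return False
    return True
```

Requiring the condition to hold on every frame in the sequence is one simple way to reject short bursts; a practical detector would tune the frame count and thresholds against vocoder-distorted audio.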
-
Publication number: 20090248417Abstract: A method to generate a pitch contour for speech synthesis is proposed. The method is based on finding the pitch contour that maximizes a total likelihood function created by the combination of all the statistical models of the pitch contour segments of an utterance, at one or multiple linguistic levels. These statistical models are trained from a database of spoken speech, by means of a decision tree that for each linguistic level clusters the parametric representation of the pitch segments extracted from the spoken speech data with some features obtained from the text associated with that speech data. The parameterization of the pitch segments is performed in such a way that the likelihood function of any linguistic level can be expressed in terms of the parameters of one of the levels, thus allowing the maximization to be calculated with respect to the parameters of that level.Type: ApplicationFiled: March 17, 2009Publication date: October 1, 2009Applicant: KABUSHIKI KAISHA TOSHIBAInventors: Javier Latorre, Masami Akamine
-
Publication number: 20090204405Abstract: Apparatus and method for generating high quality synthesized speech having smooth waveform concatenation. The apparatus includes a pitch frequency calculation section, a pitch synchronization position calculation section, a unit waveform storage, a unit waveform selection section, a unit waveform generation section, and a waveform synthesis section. The unit waveform generation section includes a conversion ratio calculation section, a sampling rate conversion section, and a unit waveform re-selection section. The conversion ratio calculation section calculates a sampling rate conversion ratio from the pitch information and the position of pitch synchronization, and the sampling rate conversion section converts the sampling rate of the unit waveform, delivered as input, based on the sampling rate conversion ratio.Type: ApplicationFiled: September 4, 2006Publication date: August 13, 2009Applicant: NEC CORPORATIONInventors: Masanori Kato, Satoshi Tsukada
-
Patent number: 7571099Abstract: A voice synthesis device for generating synthetic voice having great freedom in voice quality and good sound quality from text data is provided.Type: GrantFiled: January 17, 2005Date of Patent: August 4, 2009Assignee: Panasonic CorporationInventors: Natsuki Saito, Takahiro Kamai, Yumiko Kato
-
Patent number: 7562018Abstract: A language processing portion (31) analyzes a text from a dialogue processing section (20) and transforms the text to information on pronunciation and accent. A prosody generation portion (32) generates an intonation pattern according to a control signal from the dialogue processing section (20). A waveform DB (34) stores prerecorded waveform data together with pitch mark data imparted thereto. A waveform cutting portion (33) cuts desired pitch waveforms from the waveform DB (34). A phase operation portion (35) removes phase fluctuation by standardizing phase spectra of the pitch waveforms cut by the waveform cutting portion (33), and afterwards imparts phase fluctuation by diffusing only high phase components randomly according to the control signal from the dialogue processing section (20). The thus-produced pitch waveforms are placed at desired intervals and superimposed.Type: GrantFiled: November 25, 2003Date of Patent: July 14, 2009Assignee: Panasonic CorporationInventors: Takahiro Kamai, Yumiko Kato