Frequency Element Patents (Class 704/268)
  • Patent number: 8121835
    Abstract: Automatic level control of speech portions of an audio signal is provided. An audio signal is received in the form of a sequence of samples and may contain speech portion and non-speech portions. The sequence of samples is divided into a sequence of sub-frames. Multiple sub-frames adjacent to a present sub-frame are examined to determine a peak value of samples in the sub-frames. A gain factor is computed for the present sub-frame based on the peak value and a desired maximum value for said speech portion, and each sample in the present sub-frame is amplified by the gain factor. In an embodiment, variations in filtered energy values of multiple sub-frames enable determination of whether a sub-frame corresponds to a speech or non-speech/noise portion.
    Type: Grant
    Filed: March 6, 2008
    Date of Patent: February 21, 2012
    Assignee: Texas Instruments Incorporated
    Inventor: Fitzgerald John Archibald
  • Patent number: 8108216
    Abstract: In a speech synthesis, a selecting unit selects one string from first speech unit strings corresponding to a first segment sequence obtained by dividing a phoneme string corresponding to target speech into segments. The selecting unit performs repeatedly generating, based on maximum W second speech unit strings corresponding to a second segment sequence as a partial sequence of the first sequence, third speech unit strings corresponding to a third segment sequence obtained by adding a segment to the second sequence, and selecting maximum W strings from the third strings based on a evaluation value of each of the third strings. The value is obtained by correcting a total cost of each of the third string candidate with a penalty coefficient for each of the third strings. The coefficient is based on a restriction concerning quickness of speech unit data acquisition, and depends on extent in which the restriction is approached.
    Type: Grant
    Filed: March 19, 2008
    Date of Patent: January 31, 2012
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Masahiro Morita, Takehiko Kagoshima
  • Patent number: 8108217
    Abstract: A noise adaptive mobile communication device including a noise collecting microphone which collects noise from a peripheral environment; a noise sensing unit which senses the collected noise; a frequency-component detecting unit which detects a frequency component of the sensed noise; a sound generating unit which generates a noise-adaptive sound from the detected frequency component; a call-sound synthesizing unit which synthesizes received call sound with the noise-adaptive sound; and an operation control unit which controls the call-sound synthesizing unit to operate each predetermined time.
    Type: Grant
    Filed: February 11, 2005
    Date of Patent: January 31, 2012
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Myung-Hyun Yoo, Jaywoo Kim, Joonah Park, Seung-Nyung Chung
  • Patent number: 8103505
    Abstract: A method and apparatus for speech synthesis in a computer-user interface using random paralinguistic variation is described herein. According to one aspect of the present invention, a method for synthesizing speech comprises generating synthesized speech having certain prosodic features. The synthesized speech is further processed by applying a random paralinguistic variation to the acoustic sequence representing the synthesized speech without altering the linguistic prosodic features. According to one aspect of the present invention, the application of the paralinguistic variation is correlated with a previously applied paralinguistic variation to reflect a gradual change in the computer voice, while still maintaining a random quality.
    Type: Grant
    Filed: November 19, 2003
    Date of Patent: January 24, 2012
    Assignee: Apple Inc.
    Inventors: Kim Silverman, Donald Lindsay
  • Publication number: 20110320207
    Abstract: The invention relates to a method for speech signal analysis, modification and synthesis comprising a phase for the location of analysis windows by means of an iterative process for the determination of the phase of the first sinusoidal component and comparison between the phase value of said component and a predetermined value, a phase for the selection of analysis frames corresponding to an allophone and readjustment of the duration and the fundamental frequency according to certain thresholds and a phase for the generation of synthetic speech from synthesis frames taking the information of the closest analysis frame as spectral information of the synthesis frame and taking as many synthesis frames as periods that the synthetic signal has. The method allows a coherent location of the analysis windows within the periods of the signal and the exact generation of the synthesis instants in a manner synchronous with the fundamental period.
    Type: Application
    Filed: December 21, 2010
    Publication date: December 29, 2011
    Applicant: TELEFONICA, S.A.
    Inventors: Miguel Angel Rodriguez Crespo, Jose Gregorio Escalada Sardina, Ana Armenta Lopez Vicuna
  • Patent number: 8073696
    Abstract: A voice synthesis device is provided to include: an emotion input unit obtaining an utterance mode of a voice waveform, a prosody generation unit generating a prosody, a characteristic tone selection unit selecting a characteristic tone based on the utterance mode; and a characteristic tone temporal position estimation unit (i) judging whether or not each of phonemes included in a phonologic sequence of text is to be uttered with the characteristic tone, based on the phonologic sequence, the characteristic tone, and the prosody, and (ii) deciding a phoneme, which is an utterance position where the text is uttered with the characteristic tone. The voice synthesis device also includes an element selection unit and an element connection unit generating the voice waveform based on the phonologic sequence, the prosody, and the utterance position, so that the text is uttered in the utterance mode with the characteristic tone at the determined utterance position.
    Type: Grant
    Filed: May 2, 2006
    Date of Patent: December 6, 2011
    Assignee: Panasonic Corporation
    Inventors: Yumiko Kato, Takahiro Kamai
  • Publication number: 20110270614
    Abstract: A method and an apparatus for switching speech or audio signals, wherein the method for switching speech or audio signals includes when switching of a speech or audio, weighting a first high frequency band signal of a current frame of speech or audio signal and a second high frequency band signal of the previous M frame of speech or audio signals to obtain a processed first high frequency band signal, where M is greater than or equal to 1, and synthesizing the processed first high frequency band signal and a first low frequency band signal of the current frame of speech or audio signal into a wide frequency band signal. In this way, speech or audio signals with different bandwidths can be smoothly switched, thus improving the quality of audio signals received by a user.
    Type: Application
    Filed: June 16, 2011
    Publication date: November 3, 2011
    Inventors: Zexin Liu, Lei Miao, Chen Hu, Wenhai Wu, Yue Lang, Qing Zhang
  • Patent number: 8046225
    Abstract: Normalization parameters are generated at a normalization-parameter generating unit by calculating the mean values and the standard deviations of an initial prosody pattern and a prosody pattern of a training sentence of a speech corpus. Then, the variance range or variance width of the initial prosody pattern is normalized at the prosody-pattern normalizing unit in accordance with the normalization parameters. As a result, a prosody pattern similar to speech of human beings and improved in naturalness can be generated with a small amount of calculation.
    Type: Grant
    Filed: February 8, 2008
    Date of Patent: October 25, 2011
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Takashi Masuko, Masami Akamine
  • Patent number: 8036894
    Abstract: Methods, apparatus, systems, and computer program products are provided for synthesizing speech. One method includes matching a first level of units of a received input string to audio segments from a plurality of audio segments including using properties of or between first level units to locate matching audio segments from a plurality of selections, parsing unmatched first level units into second level units, matching the second level units to audio segments using properties of or between the units to locate matching audio segments from a plurality of selections and synthesizing the input string, including combining the audio segments associated with the first and second units.
    Type: Grant
    Filed: February 16, 2006
    Date of Patent: October 11, 2011
    Assignee: Apple Inc.
    Inventors: Matthias Neeracher, Devang K. Naik, Kevin B. Aitken, Jerome R. Bellegarda, Kim E.A. Silverman
  • Patent number: 8027837
    Abstract: Systems, apparatus, methods and computer program products are described for producing text-to-speech synthesis with non-speech sounds. In general, some of the pauses or silences that would otherwise be generated in synthesized speech are instead synthesized as non-speech sounds such as breaths. Non-speech sounds can be identified from pre-recorded speech that can include meta-data such as the grammatical and phrasal structure of words and sounds that precede and succeed non-speech sounds. A non-speech sound can be selected for use in synthesized speech based on the words, punctuation, grammatical and phrasal structure of text from which the speech is being synthesized, or other characteristics.
    Type: Grant
    Filed: September 15, 2006
    Date of Patent: September 27, 2011
    Assignee: Apple Inc.
    Inventors: Kim E. A. Silverman, Matthias Neeracher
  • Publication number: 20110224977
    Abstract: A robot may include a driving control unit configured to control a driving of a movable unit that is connected movably to a body unit, a voice generating unit configured to generate a voice, and a voice output unit configured to output the voice, which has been generated by the voice generating unit. The voice generating unit may correct the voice, which is generated, based on a bearing of the movable unit, which is controlled by the driving control unit, to the body unit.
    Type: Application
    Filed: September 14, 2010
    Publication date: September 15, 2011
    Applicant: HONDA MOTOR CO., LTD.
    Inventors: Kazuhiro NAKADAI, Takuma OTSUKA, Hiroshi OKUNO
  • Publication number: 20110218810
    Abstract: A system for controlling digital effects in live performances with vocal improvisation is described. The system features a complex controller that in one embodiment utilizes several magnetically activated electronic switches attached to a glove that is worn by an artist during a live performance. The switches are activated by a permanent magnet that is also attached to the switch bearing glove and a second magnet attached to a glove worn on the opposite hand. Furthermore, the switches are wirelessly connected by a miniature, battery-operated wireless data communications unit to a digital vocal processor unit that provides a dual mode, multi-channel phrase looping capability wherein individual channels can be selected for re-recording and selected banks of channels can be deleted during the performance. This combination of features allows a complex sequence of digital effects to be controlled by the artist during a performance while maintaining the freedom of movement desired to enhance the performance.
    Type: Application
    Filed: February 28, 2011
    Publication date: September 8, 2011
    Inventor: Momilani Ramstrum
  • Patent number: 8000968
    Abstract: A method and an apparatus for switching speech or audio signals, wherein the method for switching speech or audio signals includes when switching of a speech or audio, weighting a first high frequency band signal of a current frame of speech or audio signal and a second high frequency band signal of the previous M frame of speech or audio signals to obtain a processed first high frequency band signal, where M is greater than or equal to 1, and synthesizing the processed first high frequency band signal and a first low frequency band signal of the current frame of speech or audio signal into a wide frequency band signal. In this way, speech or audio signals with different bandwidths can be smoothly switched, thus improving the quality of audio signals received by a user.
    Type: Grant
    Filed: April 26, 2011
    Date of Patent: August 16, 2011
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Zexin Liu, Lei Miao, Chan Hu, Wenhai Wu, Yue Lang, Qing Zhang
  • Publication number: 20110196680
    Abstract: When a system (100) is used for synthesizing speech having prosody serving as a reference, the system stores speech element information representing a speech element capable of synthesizing speech having a degree of naturalness indicating a degree of similarity to speech uttered by a human higher than a predetermined reference value (speech element information storage (115)). The system accepts requested prosody information representing prosody requested by the user (requested prosody information accepting part (113)). The system generates intermediate prosody information representing intermediate prosody between the reference prosody and the requested prosody (intermediate prosody information generator (114)). The system executes a speech synthesis process to synthesize speech based on the generated intermediate prosody information and the stored speech element information (speech synthesizer (116)).
    Type: Application
    Filed: August 21, 2009
    Publication date: August 11, 2011
    Applicant: NEC CORPORATION
    Inventor: Masanori Kato
  • Patent number: 7987090
    Abstract: A system capable of reducing the influence of sound reverberation or reflection to improve sound-source separation accuracy. An original signal X(?,f) is separated from an observed signal Y(?,f) according to a first model and a second model to extract an unknown signal E(?,f). According to the first model, the original signal X(?,f) of the current frame f is represented as a combined signal of known signals S(?,f?m+1) (m=1 to M) that span a certain number M of current and previous frames. This enables extraction of the unknown signal E(?,f) without changing the window length while reducing the influence of reverberation or reflection of the known signal S(?,f) on the observed signal Y(?,f).
    Type: Grant
    Filed: August 7, 2008
    Date of Patent: July 26, 2011
    Assignee: Honda Motor Co., Ltd.
    Inventors: Ryu Takeda, Kazuhiro Nakadai, Hiroshi Tsujino, Hiroshi Okuno
  • Patent number: 7979280
    Abstract: An input linguistic description is converted into a speech waveform by deriving at least one target unit sequence corresponding to the linguistic description, selecting from a waveform unit database for the target unit sequences a plurality of alternative unit sequences approximating the target unit sequences, concatenating the alternative unit sequences to alternative speech waveforms and presenting the alternative speech waveforms to an operating person and enabling the choice of one of the presented alternative speech waveforms. There are no iterative cycles of manual modification and automatic selection, which enables a fast way of working. The operator does not need knowledge of units, targets, and costs, but chooses from a set of given alternatives. The fine-tuning of TTS prompts therefore becomes accessible to non-experts.
    Type: Grant
    Filed: February 22, 2007
    Date of Patent: July 12, 2011
    Assignee: Svox AG
    Inventors: Johan Wouters, Christof Traber, Marcel Riedi, Martin Reber, Jürgen Keller
  • Patent number: 7966186
    Abstract: A system and method for generating a synthetic text-to-speech TTS voice are disclosed. A user is presented with at least one TTS voice and at least one voice characteristic. A new synthetic TTS voice is generated by blending a plurality of existing TTS voices according to the selected voice characteristics. The blending of voices involves interpolating segmented parameters of each TTS voice. Segmented parameters may be, for example, prosodic characteristics of the speech such as pitch, volume, phone durations, accents, stress, mis-pronunciations and emotion.
    Type: Grant
    Filed: November 4, 2008
    Date of Patent: June 21, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: David A. Kapilow, Kenneth H. Rosen, Juergen Schroeter
  • Patent number: 7957973
    Abstract: An audio signal interpolation device comprises a spectral movement calculation unit which determines a spectral movement which is indicative of a difference in each of spectral components between a frequency spectrum of a current frame of an input audio signal and a frequency spectrum of a previous frame of the input audio signal stored in a spectrum storing unit. An interpolation band determination unit determines a frequency band to be interpolated by using the frequency spectrum of the current frame and the spectral movement. A spectrum interpolation unit performs interpolation of spectral components in the frequency band for the current frame by using either the frequency spectrum of the current frame or the frequency spectrum of the previous frame.
    Type: Grant
    Filed: July 25, 2007
    Date of Patent: June 7, 2011
    Assignee: Fujitsu Limited
    Inventors: Masakiyo Tanaka, Masanao Suzuki, Miyuki Shirakawa, Takashi Makiuchi
  • Publication number: 20110125493
    Abstract: The voice quality conversion apparatus includes: low-frequency harmonic level calculating units and a harmonic level mixing unit for calculating a low-frequency sound source spectrum by mixing a level of a harmonic of an input sound source waveform and a level of a harmonic of a target sound source waveform at a predetermined conversion ratio for each order of harmonics including fundamental, in a frequency range equal to or lower than a boundary frequency; a high-frequency spectral envelope mixing unit that calculates a high-frequency sound source spectrum by mixing the input sound source spectrum and the target sound source spectrum at the predetermined conversion ratio in a frequency range larger than the boundary frequency; and a spectrum combining unit that combines the low-frequency sound source spectrum with the high-frequency sound source spectrum at the boundary frequency to generate a sound source spectrum for an entire frequency range.
    Type: Application
    Filed: January 31, 2011
    Publication date: May 26, 2011
    Inventors: Yoshifumi Hirose, Takahiro Kamai
  • Patent number: 7945446
    Abstract: Spectrum envelope of an input sound is detected. In the meantime, a converting spectrum is acquired which is a frequency spectrum of a converting sound comprising a plurality of sounds, such as unison sounds. Output spectrum is generated by imparting the detected spectrum envelope of the input sound to the acquired converting spectrum. Sound signal is synthesized on the basis of the generated output spectrum. Further, a pitch of the input sound may be detected, and frequencies of peaks in the acquired converting spectrum may be varied in accordance with the detected pitch of the input sound. In this manner, the output spectrum can have the pitch and spectrum envelope of the input sound and spectrum frequency components of the converting sound comprising a plurality of sounds, and thus, unison sounds can be readily generated with simple arrangements.
    Type: Grant
    Filed: March 9, 2006
    Date of Patent: May 17, 2011
    Assignee: Yamaha Corporation
    Inventors: Hideki Kemmochi, Yasuo Yoshioka, Jordi Bonada
  • Publication number: 20110087488
    Abstract: According to an embodiment, a speech synthesis apparatus includes a selecting unit configured to select speaker's parameters one by one for respective speakers and obtain a plurality of speakers' parameters, the speaker's parameters being prepared for respective pitch waveforms corresponding to speaker's speech sounds, the speaker's parameters including formant frequencies, formant phases, formant powers, and window functions concerning respective formants that are contained in the respective pitch waveforms. The apparatus includes a mapping unit configured to make formants correspond to each other between the plurality of speakers' parameters using a cost function based on the formant frequencies and the formant powers. The apparatus includes a generating unit configured to generate an interpolated speaker's parameter by interpolating, at desired interpolation ratios, the formant frequencies, formant phases, formant powers, and window functions of formants which are made to correspond to each other.
    Type: Application
    Filed: December 16, 2010
    Publication date: April 14, 2011
    Inventors: Ryo Morinaka, Takehiko Kagoshima
  • Publication number: 20110054903
    Abstract: Embodiments of rich text modeling for speech synthesis are disclosed. In operation, a text-to-speech engine refines a plurality of rich context models based on decision tree-tied Hidden Markov Models (HMMs) to produce a plurality of refined rich context models. The text-to-speech engine then generates synthesized speech for an input text based at least on some of the plurality of refined rich context models.
    Type: Application
    Filed: December 2, 2009
    Publication date: March 3, 2011
    Inventors: Zhi-Jie Yan, Yao Qian, Frank Kao-Ping Soong
  • Publication number: 20110046958
    Abstract: The present invention discloses a method and an apparatus for extracting a prosodic feature of a speech signal, the method including: dividing the speech signal into speech frames; transforming the speech frames from time domain to frequency domain; and extracting respective prosodic features for different frequency ranges. According to the above technical solution of the present invention, it is possible to effectively extract the prosodic feature which can combine with a traditional acoustics feature without any obstacle.
    Type: Application
    Filed: August 16, 2010
    Publication date: February 24, 2011
    Applicant: Sony Corporation
    Inventors: Kun LIU, Weiguo Wu
  • Publication number: 20110046957
    Abstract: Techniques are disclosed for frequency splicing in which speech segments used in the creation of a final speech waveform are constructed, at least in part, by combining (e.g., summing) a small number (e.g., two) of component speech segments that overlap substantially, or entirely, in time but have spectral energy that occupies disjoint, or substantially disjoint, frequency ranges. The component speech segments may be derived from speech segments produced by different speakers or from different speech segments produced by the same speaker. Depending on the embodiment, frequency splicing may supplement rule-based, concatenative, hybrid, or limited-vocabulary speech synthesis systems to provide various advantages.
    Type: Application
    Filed: August 24, 2010
    Publication date: February 24, 2011
    Applicant: NovaSpeech, LLC
    Inventors: Susan R. Hertz, Harold G. Mills
  • Patent number: 7865365
    Abstract: A method, system, and computer program product is disclosed for customizing a synthesized voice based upon audible input voice data. The input voice data is typically in the form of one or more predetermined paragraphs being read into a voice recorder. The input voice data is then analyzed for adjustable voice characteristics to determine basic voice qualities (e.g., pitch, breathiness, tone, speed; variability of any of these qualities, etc.) and to identify any “specialized speech patterns”. Based upon this analysis, the characteristics of the voice utilized to read text appearing on the screen are modified to resemble the input voice data. This allows a user of the system to easily and automatically create a voice that is familiar to the user.
    Type: Grant
    Filed: August 5, 2004
    Date of Patent: January 4, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: Debbie Ann Anglin, Howard Neil Anglin, Nyralin Novella Kline
  • Publication number: 20100324907
    Abstract: The invention proposes the synthesis of a signal consisting of consecutive blocks. It proposes more particularly, on receipt of such a signal, to replace, by synthesis, lost or erroneous blocks of this signal. To this end, it proposes an attenuation of the overvoicing during the generation of a signal synthesis. More particularly, a voiced excitation is generated on the basis of the pitch period (T) estimated or transmitted at the previous block, by optionally applying a correction of plus or minus a sample of the duration of this period (counted in terms of number of samples), by constituting groups (A?,B?,C?,D?) of at least two samples and inverting positions of samples in the groups, randomly (B?,C?) or in a forced manner. An over-harmonicity in the excitation generated is thus broken and the effect of overvoicing in the synthesis of the generated signal is thereby attenuated.
    Type: Application
    Filed: October 17, 2007
    Publication date: December 23, 2010
    Applicant: France Telecom
    Inventors: David Virette, Balazs Kovesi
  • Patent number: 7805306
    Abstract: For a voice guidance phrase, multiple voice data items having individually different voice ranges or frequencies are previously stored in a memory. A voice mixing unit chooses to mix three voice data items among the stored voice data items and thereby produces a mixed voice data item. A voice outputting unit converts the mixed voice data item into a voice and then vocalizes a voice guidance phrase via a speaker. A voice measuring unit measures a characteristic of a frequency, a volume, or a pronunciation speed with respect to a response voice responding to the outputted voice guidance phrase. A voice mixing unit produces a mixed voice data item having a characteristic similar to the measured characteristic and outputs it.
    Type: Grant
    Filed: July 18, 2005
    Date of Patent: September 28, 2010
    Assignee: Denso Corporation
    Inventor: Takao Mitsui
  • Patent number: 7801725
    Abstract: A method for speech quality degradation estimation, a method for degradation measures calculation, and the apparatuses thereof are provided. The first method above estimates the speech quality of a speech signal that is modified by a pitch-synchronous prosody modification method, which comprises the following steps. First, extract at least one source pitchmark from the speech signal, and then maps the source pitchmark(s) to at least one target pitchmark(s). Finally, calculate at least one degradation measure based on the mapping between the source and the target pitchmarks. The degradation measures include several weighted pitch-related functions and duration-related functions, where the weighting functions can be calculated based on the speech signal or the pitchmark(s) mapping mentioned above.
    Type: Grant
    Filed: June 29, 2006
    Date of Patent: September 21, 2010
    Assignee: Industrial Technology Research Institute
    Inventors: Shi-Han Chen, Chih-Chung Kuo, Shun-Ju Chen
  • Patent number: 7792673
    Abstract: An apparatus and method for adjusting the friendliness of a synthesized speech and thus generating synthesized speech of various styles in a speech synthesis system are provided. The method includes the steps of defining at least two friendliness levels; storing recorded speech data of sentences, the sentences being made up according to each of the friendliness levels; extracting at least one of prosodic characteristics for each of the friendliness levels from the recorded speech data, said prosodic characteristics including at least one of a sentence-final intonation type, boundary intonation types of intonation phrases in the sentence, and an average value of F0 of the sentence, with respect to the recorded speech data; and generating a prosodic model for each of the friendliness levels by statistically modeling the at least one of the prosodic characteristics.
    Type: Grant
    Filed: November 7, 2006
    Date of Patent: September 7, 2010
    Assignee: Electronics and Telecommunications Research Institute
    Inventors: Seung Shin Oh, Sang Hun Kim, Young Jik Lee
  • Publication number: 20100223058
    Abstract: A speech synthesis device includes a pitch pattern generation unit (104) which generates a pitch pattern by combining, based on pitch pattern target data including phonemic information formed from at least syllables, phonemes, and words, a standard pattern which approximately expresses the rough shape of the pitch pattern and an original utterance pattern which expresses the pitch pattern of a recorded speech, a unit waveform selection unit (106) which selects unit waveform data based on the generated pitch pattern and upon selection, selects original utterance unit waveform data corresponding to the original utterance pattern in a section where the original utterance pattern is used, and a speech waveform generation unit (107) which generates a synthetic speech by editing the selected unit waveform data so as to reproduce prosody represented by the generated pitch pattern.
    Type: Application
    Filed: August 28, 2008
    Publication date: September 2, 2010
    Inventors: Yasuyuki Mitsui, Reishi Kondo
  • Patent number: 7739113
    Abstract: A voice synthesizer includes a recorded voice storage portion (124) that stores recorded voices that are pre-recorded; a voice input portion (110) that is input with a reading voice reading out a text that is to be generated by the synthesized voice; an attribute information input portion (112) that is input with a label string, which is a string of labels assigned to each phoneme included in the reading voice, and label information, which indicates the border position of each phoneme corresponding to each label; a parameter extraction portion (116) that extracts characteristic parameters of the reading voice based on the label string, the label information, and the reading voice; and a voice synthesis portion (122) that selects the recorded voices from the recorded voice storage portion in accordance with the characteristic parameters, synthesizes the recorded voices, and generates the synthesized voice that reads out the text.
    Type: Grant
    Filed: November 9, 2006
    Date of Patent: June 15, 2010
    Assignee: Oki Electric Industry Co., Ltd.
    Inventor: Tsutomu Kaneyasu
  • Patent number: 7720679
    Abstract: Provided is a method for canceling background noise of a sound source other than a target direction sound source in order to realize highly accurate speech recognition, and a system using the same. In terms of directional characteristics of a microphone array, due to a capability of approximating a power distribution of each angle of each of possible various sound source directions by use of a sum of coefficient multiples of a base form angle power distribution of a target sound source measured beforehand by base form angle by using a base form sound, and power distribution of a non-directional background sound by base form, only a component of the target sound source direction is extracted at a noise suppression part. In addition, when the target sound source direction is unknown, at a sound source localization part, a distribution for minimizing the approximate residual is selected from base form angle power distributions of various sound source directions to assume a target sound source direction.
    Type: Grant
    Filed: September 24, 2008
    Date of Patent: May 18, 2010
    Assignee: Nuance Communications, Inc.
    Inventors: Osamu Ichikawa, Tetsuya Takiguchi, Masafumi Nishimura
  • Patent number: 7716052
    Abstract: A method, apparatus and a computer program product to generate an audible speech word that corresponds to text. The method includes providing a text word and, in response to the text word, processing pre-recorded speech segments that are derived from a plurality of speakers to selectively concatenate together speech segments based on at least one cost function to form audio data for generating an audible speech word that corresponds to the text word. A data structure is also provided for use in a concatenative text-to-speech system that includes a plurality of speech segments derived from a plurality of speakers, where each speech segment includes an associated attribute vector each of which is comprised of at least one attribute vector element that identifies the speaker from which the speech segment was derived.
    Type: Grant
    Filed: April 7, 2005
    Date of Patent: May 11, 2010
    Assignee: Nuance Communications, Inc.
    Inventors: Andrew S. Aaron, Ellen M. Eide, Wael M. Hamza, Michael A. Picheny, Charles T. Rutherfoord, Zhi Wei Shuang, Maria E. Smith
  • Patent number: 7680652
    Abstract: A signal enhancement system improves the understandability of speech or other audio signals. The system reinforces selected parts of the signal, may attenuate selected parts of the signal, and may increase SNR. The system includes delay logic, an adaptive filter, and signal reinforcement logic. The adaptive filter may track one or more fundamental frequencies in the input signal and outputs a filtered signal. The filtered signal may approximately reproduce the input signal approximately delayed by an integer multiple of the signal's fundamental frequencies. The reinforcement logic combines the input signal and the filtered signal output to produce an enhanced signal output.
    Type: Grant
    Filed: October 26, 2004
    Date of Patent: March 16, 2010
    Assignee: QNX Software Systems (Wavemakers), Inc.
    Inventors: David Giesbrecht, Phillip Hetherington
  • Patent number: 7672835
    Abstract: An FFT unit performs an FFT process on high-frequency-eliminated, pitch-shifted voice data for one frame. A time scaling unit calculates a frequency amplitude, a phase, a phase difference between the present and immediately preceding frames, and an unwrapped version of the phase difference for each channel from which the frequency component was obtained by the FFT, detects a reference channel based on a peak one of the frequency amplitudes, and calculates the phase of each channel in a synthesized voice based on the reference channel, using results of the calculation. An IFFT unit processes each frequency component in accordance with the calculated phase, performs an IFFT process on the resulting frequency component, and produces synthesized voice data for one frame.
    Type: Grant
    Filed: December 19, 2005
    Date of Patent: March 2, 2010
    Assignee: Casio Computer Co., Ltd.
    Inventor: Masaru Setoguchi
  • Patent number: 7660718
    Abstract: Pitch detection of speech signals finds numerous applications in karaoke, voice recognition and scoring applications. While most of the existing techniques rely on time domain methods, the invention utilizes frequency domain methods. There is provided a method and system for determining the pitch of speech from a speech signal. The method includes the steps of: producing or obtaining the speech signal; distinguishing the speech signal into voiced, unvoiced or silence sections using speech signal energy levels; applying a Fourier Transform to the speech signal and obtaining speech signal parameters; determining peaks of the Fourier transformed speech signal; tracking the speech signal parameters of the determined peaks to select partials; and determining the pitch from the selected partials using a two-way mismatch error calculation.
    Type: Grant
    Filed: September 23, 2004
    Date of Patent: February 9, 2010
    Assignee: STMicroelectronics Asia Pacific Pte. Ltd.
    Inventors: Kabi Prakash Padhi, Sapna George
  • Publication number: 20090326951
    Abstract: Ratios of powers at the peaks of respective formants of the spectrum of a pitch-cycle waveform and powers at boundaries between the formants are obtained and, when the ratios are large, bandwidth of window functions are widened and the formant waveforms are generated by multiplying generated sinusoidal waveforms from the formant parameter sets on the basis of pitch-cycle waveform generating data by the window functions of the widened bandwidth, whereby a pitch-cycle waveform is generated by the sum of these formant waveforms.
    Type: Application
    Filed: April 14, 2009
    Publication date: December 31, 2009
    Inventors: Ryo Morinaka, Takehiko Kagoshima
  • Patent number: 7630896
    Abstract: A speech synthesis system in a preferred embodiment includes a speech unit storage section, a phonetic environment storage section, a phonetic sequence/prosodic information input section, a plural-speech-unit selection section, a fused-speech-unit sequence generation section, and a fused-speech-unit modification/concatenation section. By fusing a plurality of selected speech units in the fused speech unit sequence generation section, a fused speech unit is generated. In the fused speech unit sequence generation section, the average power information is calculated for a plurality of selected M speech units, N speech units are fused together, and the power information of the fused speech unit is so corrected as to be equalized with the average power information of the M speech units.
    Type: Grant
    Filed: September 23, 2005
    Date of Patent: December 8, 2009
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Masatsune Tamura, Gou Hirabayashi, Takehiko Kagoshima
  • Publication number: 20090299747
    Abstract: An apparatus for providing improved speech synthesis may include a processor and a memory storing executable instructions. In response to execution of the instructions by the processor, the apparatus may perform at least selecting a real glottal pulse from among one or more stored real glottal pulses based at least in part on a property associated with the real glottal pulse, utilizing the real glottal pulse selected as a basis for generation of an excitation signal, and modifying the excitation signal based on spectral parameters generated by a model to provide synthetic speech.
    Type: Application
    Filed: May 29, 2009
    Publication date: December 3, 2009
    Inventors: Tuomo Johannes Raitio, Antti Santeri Suni, Martti Tapani Vainio, Paavo Ilmari Alku, Jani Kristian Nurminen
  • Patent number: 7613612
    Abstract: In a voice synthesizer, an envelope acquisition portion obtains a spectral envelope of a reference frequency spectrum of a given voice. A spectrum acquisition portion obtains a collective frequency spectrum of a plurality of voices which are generated in parallel to one another. An envelope adjustment portion adjusts a spectral envelope of the collective frequency spectrum obtained by the spectrum acquisition portion so as to approximately match with the spectral envelope of the reference frequency spectrum obtained by the envelope acquisition portion. A voice generation portion generates an output voice signal from the collective frequency spectrum having the spectral envelope adjusted by the envelope adjustment portion.
    Type: Grant
    Filed: January 31, 2006
    Date of Patent: November 3, 2009
    Assignee: Yamaha Corporation
    Inventors: Hideki Kemmochi, Jordi Bonada
  • Publication number: 20090265173
    Abstract: A tone detector and associated method for use with EVRC-B and GSM vocoders to enable reliable detection of system connect tones over a wireless communication system. The tone detection method examines a number of sequential data frames of the signal received from the vocoder and determines that the tone is present if the spectral energy at frequencies around the tone is much higher than that at neighboring frequencies and if the calculated center frequency of the data frames is at or near the frequency of the tone.
    Type: Application
    Filed: April 18, 2008
    Publication date: October 22, 2009
    Inventors: Sethu K. Madhavan, Jijun Yin, Qin Jiang, Darrel James Van Buer
  • Publication number: 20090248417
    Abstract: A method to generate a pitch contour for speech synthesis is proposed. The method is based on finding the pitch contour that maximizes a total likelihood function created by the combination of all the statistical models of the pitch contour segments of an utterance, at one or multiple linguistic levels. These statistical models are trained from a database of spoken speech, by means of a decision tree that for each linguistic level clusters the parametric representation of the pitch segments extracted from the spoken speech data with some features obtained from the text associated with that speech data. The parameterization of the pitch segments is performed in such a way, the likelihood function of any linguistic level can be expressed in terms of the parameters of one of the levels, thus allowing the maximization to be calculated with respect to the parameters of that level.
    Type: Application
    Filed: March 17, 2009
    Publication date: October 1, 2009
    Inventors: Javier Latorre, Masami Akamine
  • Publication number: 20090204405
    Abstract: Apparatus and method for generating high quality synthesized speech having smooth waveform concatenation. The apparatus includes a pitch frequency calculation section, a pitch synchronization position calculation section, a unit waveform storage, a unit waveform selection section, a unit waveform generation section, and a waveform synthesis section. The unit waveform generation section includes a conversion ratio calculation section, a sampling rate conversion section, and a unit waveform re-selection section. The conversion ratio calculation section calculates a sampling rate conversion ratio from the pitch information and the position of pitch synchronization, and the sampling rate conversion section converts the sampling rate of the unit waveform, delivered as input, based on the sampling rate conversion ratio.
    Type: Application
    Filed: September 4, 2006
    Publication date: August 13, 2009
    Applicant: NEC CORPORATION
    Inventors: Masanori Kato, Satoshi Tsukada
  • Patent number: 7571099
    Abstract: A voice synthesis device for generating synthetic voice having great freedom in voice quality and good sound quality from text data is provided.
    Type: Grant
    Filed: January 17, 2005
    Date of Patent: August 4, 2009
    Assignee: Panasonic Corporation
    Inventors: Natsuki Saito, Takahiro Kamai, Yumiko Kato
  • Patent number: 7562018
    Abstract: A language processing portion (31) analyzes a text from a dialogue processing section (20) and transforms the text to information on pronunciation and accent. A prosody generation portion (32) generates an intonation pattern according to a control signal from the dialogue processing section (20). A waveform DB (34) stores prerecorded waveform data together with pitch mark data imparted thereto. A waveform cutting portion (33) cuts desired pitch waveforms from the waveform DB (34). A phase operation portion (35) removes phase fluctuation by standardizing phase spectra of the pitch waveforms cut by the waveform cutting portion (33), and afterwards imparts phase fluctuation by diffusing only high phase components randomly according to the control signal from the dialogue processing section (20). The thus-produced pitch waveforms are placed at desired intervals and superimposed.
    Type: Grant
    Filed: November 25, 2003
    Date of Patent: July 14, 2009
    Assignee: Panasonic Corporation
    Inventors: Takahiro Kamai, Yumiko Kato
  • Publication number: 20090177475
    Abstract: Even when a pitch cycle has a large fluctuation and the pitch cycle string changes abruptly, it possible to suppress the affect of the pitch cycle fluctuation and generate high-quality synthesized speech. A speech synthesis device generates a synthesized speech corresponding to an input text sentence according to an original speech waveform stored in original speech waveform information storage unit (25). The speech synthesis device includes pitch cycle correction unit (40) which extracts a fluctuation component of the pitch cycle of the original speech waveform which is obtained from original speech waveform information storage unit (25) in order to generate the synthesized speech and which corrects, based on the extracted fluctuation component, the pitch cycle of the synthesized speech obtained by analyzing the input text sentence. Pitch cycle correction unit (40) connects the pitch cycle waveform of the original speech waveform at the pitch cycle of the corrected synthesized speech.
    Type: Application
    Filed: July 4, 2007
    Publication date: July 9, 2009
    Applicant: NEC CORPORATION
    Inventor: Masanori Kato
  • Patent number: 7558727
    Abstract: The present invention relates to a method of synthesizing a first sound signal based on a second sound signal, the first sound signal having a required first fundamental frequency and the second sound signal having a second fundamental frequency, the method comprising the steps of, a) determining of required pitch bell locations in the time domain of the first sound signal, the pitch bell locations being distanced by one period of the first fundamental frequency, b) providing of pitch bells by windowing the second sound signal on pitch bell locations in the time domain of the second sound signal, the pitch bell locations being distanced by one period of the second fundamental frequency, c) randomly selecting of a pitch bell from the provided pitch bells for each of the required pitch bell locations, d) performing an overlap and add operation on the selected pitch bells for synthesizing the first signal.
    Type: Grant
    Filed: August 5, 2003
    Date of Patent: July 7, 2009
    Assignee: Koninklijke Philips Electronics N.V.
    Inventor: Ercan Ferit Gigi
  • Patent number: 7552052
    Abstract: A plurality of voice segments, each including one or more phonemes are acquired in a time-serial manner, in correspondence with desired singing or speaking words. As necessary, a boundary is designated between start and end points of a vowel phoneme included in any one of the acquired voice segments. Voice is synthesized for a region of the vowel phoneme that precedes the designated boundary vowel phoneme, or a region of the vowel phoneme that succeeds the designated boundary in the vowel phoneme. By synthesizing a voice for the region preceding the designated boundary, it is possible to synthesize a voice imitative of a vowel sound that is uttered by a person and then stopped to sound with his or her mouth kept opened. Further, by synthesizing a voice for the region succeeding the designated boundary, it is possible to synthesize a voice imitative of a vowel sound that is started to sound with the mouth opened.
    Type: Grant
    Filed: July 13, 2005
    Date of Patent: June 23, 2009
    Assignee: Yamaha Corporation
    Inventor: Hideki Kemmochi
  • Patent number: 7546241
    Abstract: In a speech synthesis process, micro-segments are cut from acquired waveform data and a window function. The obtained micro-segments are re-arranged to implement a desired prosody, and superposed data is generated by superposing the re-arranged micro-segments, so as to obtain synthetic speech waveform data. A spectrum correction filter is formed based on the acquired waveform data. At least one of the waveform data, micro-segments, and superposed data is corrected using the spectrum correction filter. In this way, “blur” of a speech spectrum due to the window function applied to obtain micro-segments is reduced, and speech synthesis with high sound quality is realized.
    Type: Grant
    Filed: June 2, 2003
    Date of Patent: June 9, 2009
    Assignee: Canon Kabushiki Kaisha
    Inventors: Masayuki Yamada, Yasuhiro Komori, Toshiaki Fukada
  • Patent number: 7529672
    Abstract: A method of synthesizing a speech signal by providing a first speech unit signal having an end interval and a second speech unit signal having a front interval, wherein at least some of the periods of the end interval are appended in inverted order at the end of the first speech unit signal in order to provide a fade-out interval, and at least some of the periods of the front interval are appended in inverted order at the beginning of the second speech unit signal to provide a fade-in interval. An overlap and add operation is performed on the end and fade-in intervals and the fade-out and front intervals.
    Type: Grant
    Filed: August 8, 2003
    Date of Patent: May 5, 2009
    Assignee: Koninklijke Philips Electronics N.V.
    Inventor: Ercan Ferit Gigi