Specialized Model Patents (Class 704/266)
  • Patent number: 7555433
    Abstract: A main controller feeds a spelling translator with a text item representing a place name stored in a map database. The spelling translator translates the spelling of the text item according to rules described in a translation rule table. The spelling translator translates, e.g., a French character or string included in the text item and not included in the English alphabet into an English alphabet character or string having a pronunciation equivalent or similar to the pronunciation of the French character or string. The translated text item is fed into a TTS engine for English. The TTS engine converts the text item into voice, which is output from a speaker.
    Type: Grant
    Filed: July 7, 2003
    Date of Patent: June 30, 2009
    Assignee: Alpine Electronics, Inc.
    Inventor: Michiaki Otani
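As an illustration of the rule-table approach this abstract describes, here is a minimal Python sketch; the `TRANSLATION_RULES` mapping is hypothetical, since the patent's actual rule table is not reproduced in the abstract:

```python
# Hypothetical rule table: non-English characters or strings mapped to
# English spellings with an equivalent or similar pronunciation.
TRANSLATION_RULES = {
    "eau": "oh",   # multi-character rules must win over single characters
    "é": "ay",
    "è": "eh",
    "ç": "s",
    "ô": "oh",
}

def translate_spelling(text: str) -> str:
    """Rewrite a place name so an English-only TTS engine can say it."""
    # Apply longer rules first so "eau" is matched before any single letter.
    for src, dst in sorted(TRANSLATION_RULES.items(), key=lambda kv: -len(kv[0])):
        text = text.replace(src, dst)
    return text

print(translate_spelling("Orléans"))   # -> "Orlayans"
```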
  • Patent number: 7546241
    Abstract: In a speech synthesis process, micro-segments are cut from acquired waveform data using a window function. The obtained micro-segments are re-arranged to implement a desired prosody, and superposed data is generated by superposing the re-arranged micro-segments, so as to obtain synthetic speech waveform data. A spectrum correction filter is formed based on the acquired waveform data. At least one of the waveform data, micro-segments, and superposed data is corrected using the spectrum correction filter. In this way, “blur” of a speech spectrum due to the window function applied to obtain micro-segments is reduced, and speech synthesis with high sound quality is realized.
    Type: Grant
    Filed: June 2, 2003
    Date of Patent: June 9, 2009
    Assignee: Canon Kabushiki Kaisha
    Inventors: Masayuki Yamada, Yasuhiro Komori, Toshiaki Fukada
  • Patent number: 7519535
    Abstract: A voice decoder configured to receive a sequence of frames, each of the frames having voice parameters. The voice decoder includes a speech generator that generates speech from the voice parameters. A frame erasure concealment module is configured to reconstruct the voice parameters for a frame erasure in the sequence of frames from the voice parameters in one of the previous frames and the voice parameters in one of the subsequent frames.
    Type: Grant
    Filed: January 31, 2005
    Date of Patent: April 14, 2009
    Assignee: QUALCOMM Incorporated
    Inventor: Serafin Diaz Spindola
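The interpolation idea in this abstract can be sketched in a few lines; the scalar `pitch`/`gain` parameters and the 50/50 weighting below are illustrative assumptions, not the codec's actual parameter set:

```python
from dataclasses import dataclass

@dataclass
class VoiceParams:
    pitch: float   # e.g. pitch lag or fundamental-frequency parameter
    gain: float    # e.g. frame energy parameter

def conceal_erasure(prev: VoiceParams, nxt: VoiceParams,
                    alpha: float = 0.5) -> VoiceParams:
    """Reconstruct an erased frame's parameters by interpolating between
    the previous good frame and the subsequent good frame."""
    lerp = lambda a, b: (1.0 - alpha) * a + alpha * b
    return VoiceParams(pitch=lerp(prev.pitch, nxt.pitch),
                       gain=lerp(prev.gain, nxt.gain))

# An erasure between a 100 Hz frame and a 110 Hz frame becomes a 105 Hz
# frame instead of a repeat of the last frame or silence.
print(conceal_erasure(VoiceParams(100.0, 0.8), VoiceParams(110.0, 0.6)))
```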
  • Patent number: 7502739
    Abstract: In generating an intonation pattern for speech synthesis, a speech synthesis system is capable of providing highly natural speech and of reproducing the speech characteristics of a speaker flexibly and accurately by effectively utilizing F0 patterns of actual speech accumulated in a database. An intonation generation method generates an intonation of synthesized speech for text by estimating an outline of the intonation based on language information of the text, and then, based on the estimated outline, selecting an optimum intonation pattern from a database which stores intonation patterns of actual speech. Speech characteristics recorded in advance are reflected in the estimation of the outline of the intonation pattern and the selection of a waveform element of the speech.
    Type: Grant
    Filed: January 24, 2005
    Date of Patent: March 10, 2009
    Assignee: International Business Machines Corporation
    Inventors: Takashi Saito, Masaharu Sakamoto
  • Publication number: 20090063153
    Abstract: A system and method for generating a synthetic text-to-speech TTS voice are disclosed. A user is presented with at least one TTS voice and at least one voice characteristic. A new synthetic TTS voice is generated by blending a plurality of existing TTS voices according to the selected voice characteristics. The blending of voices involves interpolating segmented parameters of each TTS voice. Segmented parameters may be, for example, prosodic characteristics of the speech such as pitch, volume, phone durations, accents, stress, mis-pronunciations and emotion.
    Type: Application
    Filed: November 4, 2008
    Publication date: March 5, 2009
    Applicant: AT&T Corp.
    Inventors: David A. Kapilow, Kenneth H. Rosen, Juergen Schroeter
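A minimal sketch of blending by interpolating segmented parameters, assuming each voice is reduced to one flat vector of prosodic values (a simplification of the per-segment parameters the abstract describes):

```python
import numpy as np

def blend_voices(voices: list, weights: list) -> np.ndarray:
    """Blend TTS voices by interpolating their segmented parameters
    (here flattened to one vector of prosodic values per voice)."""
    w = np.asarray(weights, dtype=float)
    w /= w.sum()                       # normalize the user's mix
    return w @ np.stack(voices)        # weighted interpolation

# Toy parameter vectors: [pitch_Hz, volume, phone_duration_ms]
calm   = np.array([110.0, 0.6, 95.0])
bright = np.array([180.0, 0.9, 70.0])
print(blend_voices([calm, bright], [0.7, 0.3]))   # a mostly-calm new voice
```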
  • Patent number: 7487093
    Abstract: In a voice synthesis apparatus, by bounding a desired range of input text with, e.g., a start tag “<morphing type=“emotion” start=“happy” end=“angry”>” and an end tag “</morphing>”, a feature of the synthetic voice is continuously changed upon output, e.g., gradually morphing from a happy voice to an angry voice.
    Type: Grant
    Filed: August 10, 2004
    Date of Patent: February 3, 2009
    Assignee: Canon Kabushiki Kaisha
    Inventors: Masahiro Mutsuno, Toshiaki Fukada
  • Patent number: 7483832
    Abstract: A method and system of customizing voice translation of a text to speech includes digitally recording speech samples of a known speaker, correlating each of the speech samples with a standardized audio representation, and organizing the recorded speech samples and correlated audio representations into a collection. The collection of speech samples correlated with audio representations is saved as a single voice file and stored in a device capable of translating the text to speech. The voice file is applied to a translation of text to speech so that the translated speech is customized according to the applied voice file.
    Type: Grant
    Filed: December 10, 2001
    Date of Patent: January 27, 2009
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Steve Tischer
  • Patent number: 7472061
    Abstract: Systems and methods are provided for automatically building a native phonetic lexicon for a speech-based application trained to process a native (base) language, wherein the native phonetic lexicon includes native phonetic transcriptions (base forms) for non-native (foreign) words which are automatically derived from non-native phonetic transcriptions of the non-native words.
    Type: Grant
    Filed: March 31, 2008
    Date of Patent: December 30, 2008
    Assignee: International Business Machines Corporation
    Inventors: Neal Alewine, Eric Janke, Paul Sharp, Roberto Sicconi
  • Patent number: 7472065
    Abstract: Converting marked-up text into a synthesized stream includes providing marked-up text to a processor-based system, converting the marked-up text into a text stream including vocabulary items, retrieving audio segments corresponding to the vocabulary items, concatenating the audio segments to form a synthesized stream, and audibly outputting the synthesized stream, wherein the marked-up text includes a normal text and a paralinguistic text; wherein the normal text is differentiated from the paralinguistic text by using a grammar constraint; wherein the paralinguistic text is associated with more than one audio segment; and wherein the retrieving of the plurality of audio segments includes selecting one audio segment associated with the paralinguistic text.
    Type: Grant
    Filed: June 4, 2004
    Date of Patent: December 30, 2008
    Assignee: International Business Machines Corporation
    Inventors: Andrew S. Aaron, Raimo Bakis, Ellen M. Eide, Wael Hamza
  • Patent number: 7472066
    Abstract: An automatic speech segmentation and verification system and method is disclosed, which has a known text script and a recorded speech corpus corresponding to the known text script. A speech unit segmentor segments the recorded speech corpus into N test speech unit segments referring to the phonetic information of the known text script. Then, a segmental verifier is applied to obtain a confidence measure of syllable segmentation for verifying the correctness of the cutting points of test speech unit segments. A phonetic verifier obtains a confidence measure of syllable verification by using verification models for verifying whether the recorded speech corpus is correctly recorded. Finally, a speech unit inspector integrates the confidence measure of syllable segmentation and the confidence measure of syllable verification to determine whether the test speech unit segment is accepted or not.
    Type: Grant
    Filed: February 23, 2004
    Date of Patent: December 30, 2008
    Assignee: Industrial Technology Research Institute
    Inventors: Chih-Chung Kuo, Chi-Shiang Kuo, Jau-Hung Chen
  • Patent number: 7464034
    Abstract: A voice converting apparatus is constructed for converting an input voice into an output voice according to a target voice. The apparatus includes a storage section, an analyzing section including a characteristic analyzer, a producing section, a synthesizing section, a memory, an alignment processor, and a target decoder.
    Type: Grant
    Filed: September 27, 2004
    Date of Patent: December 9, 2008
    Assignees: Yamaha Corporation, Pompeu Fabra University
    Inventors: Takahiro Kawashima, Yasuo Yoshioka, Pedro Cano, Alex Loscos, Xavier Serra, Mark Schiementz, Jordi Bonada
  • Patent number: 7460997
    Abstract: A system and method for improving the response time of text-to-speech synthesis utilizes “triphone contexts” (i.e., triplets comprising a central phoneme and its immediate context) as the basic unit, instead of performing phoneme-by-phoneme synthesis. The method generates a triphone preselection cost database for use in speech synthesis by 1) selecting a triphone sequence u1-u2-u3, 2) calculating a preselection cost for each 5-phoneme sequence ua-u1-u2-u3-ub, where u2 is allowed to match any identically labeled phoneme in a database and the units ua and ub vary over the entire phoneme universe, and 3) storing a group of the selected triphone sequences exhibiting the lowest costs in the triphone preselection cost database.
    Type: Grant
    Filed: August 22, 2006
    Date of Patent: December 2, 2008
    Assignee: AT&T Intellectual Property II, L.P.
    Inventor: Alistair D. Conkie
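A toy sketch of building the preselection costs this abstract describes; the `Unit` record, the phoneme universe, and `context_cost` are all illustrative stand-ins for the real acoustic database and cost function:

```python
import itertools
from collections import namedtuple

Unit = namedtuple("Unit", "id label pitch")   # toy acoustic unit
PHONEMES = ["a", "t", "k", "s"]               # toy phoneme universe
TOP_K = 2                                     # candidates kept per triphone

def context_cost(ua, u1, unit, u3, ub):
    """Toy preselection cost for the 5-phoneme context ua-u1-u2-u3-ub;
    a real system compares spectral and prosodic features."""
    context_term = 0.0 if (ua, ub) == (u1, u3) else 0.1
    return context_term + abs(unit.pitch - 120.0) / 100.0

def preselect(triphone, units):
    """Score every unit labeled like the triphone's center phoneme over
    all flanking contexts ua...ub and keep the cheapest TOP_K."""
    u1, u2, u3 = triphone
    scored = []
    for unit in (u for u in units if u.label == u2):
        total = sum(context_cost(ua, u1, unit, u3, ub)
                    for ua, ub in itertools.product(PHONEMES, repeat=2))
        scored.append((total / len(PHONEMES) ** 2, unit))
    return sorted(scored)[:TOP_K]

inventory = [Unit(0, "a", 118.0), Unit(1, "a", 180.0), Unit(2, "t", 130.0)]
print(preselect(("t", "a", "k"), inventory))   # unit 0 wins on pitch
```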
  • Patent number: 7454348
    Abstract: A system and method for generating a synthetic text-to-speech TTS voice are disclosed. A user is presented with at least one TTS voice and at least one voice characteristic. A new synthetic TTS voice is generated by blending a plurality of existing TTS voices according to the selected voice characteristics. The blending of voices involves interpolating segmented parameters of each TTS voice. Segmented parameters may be, for example, prosodic characteristics of the speech such as pitch, volume, phone durations, accents, stress, mis-pronunciations and emotion.
    Type: Grant
    Filed: January 8, 2004
    Date of Patent: November 18, 2008
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: David A. Kapilow, Kenneth H. Rosen, Juergen Schroeter
  • Patent number: 7454341
    Abstract: According to one aspect of the invention, a method is provided in which a mean vector set and a variance vector set of a set of N Gaussians are divided into multiple mean sub-vector sets and variance sub-vector sets, respectively. Each mean sub-vector set contains a subset of the dimensions of the corresponding mean vector set and each variance sub-vector set contains a subset of the dimensions of the corresponding variance vector set. Each resultant sub-vector set is clustered to build a codebook for the respective sub-vector set using a modified K-means clustering process which dynamically merges and splits clusters based upon the size and average distortion of each cluster during each iteration in the modified K-means clustering process.
    Type: Grant
    Filed: September 30, 2000
    Date of Patent: November 18, 2008
    Assignee: Intel Corporation
    Inventors: Jielin Pan, Baosheng Yuan
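The sub-vector idea can be sketched as follows; for brevity this uses plain Lloyd k-means rather than the patent's modified variant that merges and splits clusters by size and average distortion:

```python
import numpy as np

def kmeans(data, k, iters=20, seed=0):
    """Plain Lloyd k-means; the patent's modified variant additionally
    merges and splits clusters by size and average distortion."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((data[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = data[assign == j].mean(axis=0)
    return centers

def subvector_codebooks(means, splits, k):
    """Split each mean vector into sub-vectors over dimension groups and
    build one codebook per sub-vector set."""
    return [kmeans(means[:, dims], k) for dims in splits]

means = np.random.default_rng(1).normal(size=(500, 6))  # 500 Gaussians, 6 dims
books = subvector_codebooks(means, splits=[[0, 1, 2], [3, 4, 5]], k=8)
print([b.shape for b in books])   # [(8, 3), (8, 3)]
```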
  • Patent number: 7454345
    Abstract: A voice synthesizer, which obtains a voice by emphasizing a specific part of a sentence, includes an emphasis degree deciding unit that extracts a word or collocation to be emphasized from among the words and collocations included in a sentence on the basis of an extracting reference and decides an emphasis degree for the extracted word or collocation, and an acoustic processing unit that synthesizes a voice in which the decided emphasis degree is applied to the word or collocation to be emphasized. The emphasized part can thus be obtained automatically on the basis of the extracting reference, such as a frequency of appearance or a level of importance of the word or collocation.
    Type: Grant
    Filed: February 23, 2005
    Date of Patent: November 18, 2008
    Assignee: Fujitsu Limited
    Inventors: Hitoshi Sasaki, Yasushi Yamazaki, Yasuji Ota, Kaori Endo, Nobuyuki Katae, Kazuhiro Watanabe
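A rough sketch of the extracting reference: score words by frequency of appearance times an importance level, then assign emphasis degrees to the top scorers. The `IMPORTANCE` table, stopword set, and degree formula are invented for illustration:

```python
from collections import Counter

IMPORTANCE = {"deadline": 2.0, "cancelled": 2.0}   # invented importance levels
STOPWORDS = {"the", "is", "a", "and"}

def emphasis_degrees(words, top_n=2):
    """Score words by frequency of appearance times importance level
    (the two extracting references named above), then assign each
    chosen word an emphasis degree for the acoustic processing unit."""
    freq = Counter(w for w in words if w not in STOPWORDS)
    score = {w: freq[w] * IMPORTANCE.get(w, 1.0) for w in freq}
    chosen = sorted(score, key=score.get, reverse=True)[:top_n]
    # Invented mapping from rank to degree, highest score first.
    return {w: 1.0 + 0.5 * (top_n - i) for i, w in enumerate(chosen)}

text = "the deadline moved and the deadline is final the offer is cancelled"
print(emphasis_degrees(text.split()))   # {'deadline': 2.0, 'cancelled': 1.5}
```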
  • Patent number: 7451087
    Abstract: A method for converting text to concatenated voice by utilizing a digital voice library and a set of playback rules is provided. The method includes receiving and expanding text data to form a sequence of text and pseudo words. The sequence of text and pseudo words is converted into a sequence of speech items, and the sequence of speech items is converted into a sequence of voice recordings. The method includes generating voice data based on the sequence of voice recordings by concatenating adjacent recordings in the sequence.
    Type: Grant
    Filed: March 27, 2001
    Date of Patent: November 11, 2008
    Assignee: Qwest Communications International Inc.
    Inventors: Eliot M. Case, Richard P. Phillips
  • Publication number: 20080201150
    Abstract: A conversion rule and a rule selection parameter are stored. The conversion rule converts a spectral parameter of a source speaker to a spectral parameter of a target speaker. The rule selection parameter represents the spectral parameter of the source speaker. A first conversion rule of start timing and a second conversion rule of end timing in a speech unit of the source speaker are selected by the spectral parameter of the start timing and the end timing. An interpolation coefficient corresponding to the spectral parameter of each timing in the speech unit is calculated by the first conversion rule and the second conversion rule. A third conversion rule corresponding to the spectral parameter of each timing in the speech unit is calculated by interpolating the first conversion rule and the second conversion rule with the interpolation coefficient. The spectral parameter of each timing is converted to a spectral parameter of the target speaker by the third conversion rule.
    Type: Application
    Filed: January 22, 2008
    Publication date: August 21, 2008
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Masatsune Tamura, Takehiro Kagoshima
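A compact sketch of interpolating between the start-timing and end-timing conversion rules; rules are modeled here as affine maps, and the interpolation coefficient is a simple time ramp, whereas the patent derives it from the spectral parameter at each timing:

```python
import numpy as np

def interpolated_conversion(frames, rule_start, rule_end):
    """Convert each frame's spectral parameter using a rule interpolated
    between the rules selected at the unit's start and end timings.
    Rules are modeled as affine maps (matrix A, bias b)."""
    n = len(frames)
    out = []
    for i, x in enumerate(frames):
        w = i / max(n - 1, 1)     # time-ramp interpolation coefficient
        A = (1 - w) * rule_start[0] + w * rule_end[0]
        b = (1 - w) * rule_start[1] + w * rule_end[1]
        out.append(A @ x + b)     # the "third conversion rule" applied
    return np.array(out)

frames = np.random.default_rng(0).normal(size=(5, 3))  # 5 frames, 3-dim spectra
start = (np.eye(3) * 1.1, np.zeros(3))                 # rule at start timing
end   = (np.eye(3) * 0.9, np.full(3, 0.2))             # rule at end timing
print(interpolated_conversion(frames, start, end).shape)   # (5, 3)
```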
  • Patent number: 7415118
    Abstract: In accordance with an embodiment, the invention provides a spectral enhancement system that includes a plurality of distributed filters, a plurality of energy-detection units, and a weighted-averaging unit. At least one of the distributed filters receives a multi-frequency input signal. Each of the plurality of energy-detection units is coupled to an output of at least one filter and provides an energy-detection output signal. The weighted-averaging unit is coupled to each of the energy-detection units and provides a weighted-averaging signal to each of the filters responsive to the energy-detection output signals from each of the energy-detection units to implement distributed gain control. In an embodiment, the energy detection units are coupled to the outputs of the filters via a plurality of differentiator units.
    Type: Grant
    Filed: July 23, 2003
    Date of Patent: August 19, 2008
    Assignee: Massachusetts Institute of Technology
    Inventors: Rahul Sarpeshkar, Lorenzo Turicchia
  • Patent number: 7406417
    Abstract: A neural network can be trained for synthesizing or recognizing speech with the aid of a database produced by automatically matching graphemes and phonemes. First, graphemes and phonemes are matched for words which have the same number of graphemes and phonemes. Next, graphemes and phonemes are matched for words that have more graphemes than phonemes in a series of steps that combine graphemes with preceding phonemes. Then, graphemes and phonemes are matched for words that have fewer graphemes than phonemes. After each step, infrequent and unsuccessful matches made in the preceding step are erased. After this process is completed, the database can be used to train the neural network, and graphemes, or letters of a text, can be converted into the corresponding phonemes with the aid of the trained neural network.
    Type: Grant
    Filed: August 29, 2000
    Date of Patent: July 29, 2008
    Assignee: Siemens Aktiengesellschaft
    Inventor: Horst-Udo Hain
  • Patent number: 7400651
    Abstract: A frequency interpolation apparatus is provided which reproduces a signal similar to an original signal by approximately recovering suppressed frequency components, from an input signal having the suppressed frequency components in a specific frequency band of the original signal. The input signal is divided into a plurality of signal component sets each having frequency components in a frequency band among a plurality of frequency bands, and a signal component set in the band with the suppressed signal components is synthesized from the plurality of divided signal component sets and added to the input signal. Each of the plurality of divided signal component sets is frequency-converted to a signal component set in the same frequency band, and the signal component set in the band with the suppressed signal components is synthesized through linear combination of the frequency-converted signal component sets.
    Type: Grant
    Filed: June 29, 2001
    Date of Patent: July 15, 2008
    Assignee: Kabushiki Kaisha Kenwood
    Inventor: Yasushi Sato
  • Patent number: 7365260
    Abstract: Music piece sequence data are composed of a plurality of event data which include performance event data and user event data designed for linking a voice to progression of a music piece. A plurality of voice data files are stored in a memory separately from the music piece sequence data. In music piece reproduction, the individual event data of the music piece sequence data are sequentially read out, and a tone signal is generated in response to each readout of the performance event data. In the meantime, a voice reproduction instruction is output in response to each readout of the user event data. In accordance with the voice reproduction instruction, a voice data file is selected from among the voice data files stored in the memory, and a voice signal is generated on the basis of each read-out voice data.
    Type: Grant
    Filed: December 16, 2003
    Date of Patent: April 29, 2008
    Assignee: Yamaha Corporation
    Inventor: Takahiro Kawashima
  • Patent number: 7346507
    Abstract: A method and apparatus for building a training set for an automated speech recognition-based system, which determines the statistically optimal number of frequently requested responses to automate in order to achieve a desired automation rate. The invention may be used to select the appropriate tokens and responses to train the system and to achieve a desired “phrase coverage” for all of the many different ways human beings may phrase a request that calls for one of a plurality of frequently-requested responses. The invention also determines the statistically optimal number of tokens (spoken requests) required to train a speech recognition-based system to achieve the desired phrase coverage and optimal allocation of tokens over the set of responses that are to be automated.
    Type: Grant
    Filed: June 4, 2003
    Date of Patent: March 18, 2008
    Assignee: BBN Technologies Corp.
    Inventors: Premkumar Natarajan, Rohit Prasad
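One simple reading of the optimization is a greedy sketch like the following, which picks the fewest responses whose combined call share reaches a target automation rate (the patent's statistical formulation of phrase coverage and token allocation is more involved):

```python
def responses_to_automate(freqs, target_rate):
    """Choose the fewest frequently-requested responses whose combined
    call share reaches the desired automation rate."""
    total = sum(freqs.values())
    chosen, covered = [], 0.0
    for resp, f in sorted(freqs.items(), key=lambda kv: -kv[1]):
        if covered / total >= target_rate:
            break
        chosen.append(resp)
        covered += f
    return chosen

calls = {"billing": 500, "hours": 300, "outage": 150, "other": 50}
print(responses_to_automate(calls, 0.80))   # ['billing', 'hours']
```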
  • Patent number: 7328157
    Abstract: Embodiments of the present invention pertain to adaptation of a corpus-driven general-purpose TTS system to at least one specific domain. The domain adaptation is realized by adding a limited amount of domain-specific speech that provides a maximum impact on improved perceived naturalness of speech. An approach for generating an optimized script for adaptation is proposed, the core of which is a dynamic programming based algorithm that segments the domain-specific corpus into a minimum number of segments that appear in the unit inventory. Increases in perceived naturalness of speech after adaptation are estimated from the generated script without recording speech from it.
    Type: Grant
    Filed: January 24, 2003
    Date of Patent: February 5, 2008
    Assignee: Microsoft Corporation
    Inventors: Min Chu, Hu Peng
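The core dynamic program, segmenting a phone sequence into the minimum number of pieces already present in the unit inventory, can be sketched directly:

```python
def min_segmentation(seq, inventory):
    """Dynamic program: split `seq` into the minimum number of pieces,
    each of which already exists in the TTS unit inventory."""
    n = len(seq)
    best = [None] * (n + 1)       # best[i] = (count, cut) covering seq[:i]
    best[0] = (0, 0)
    for i in range(1, n + 1):
        for j in range(i):
            piece = tuple(seq[j:i])
            if best[j] is not None and piece in inventory:
                cand = (best[j][0] + 1, j)
                if best[i] is None or cand[0] < best[i][0]:
                    best[i] = cand
    if best[n] is None:
        return None                # not coverable by the inventory
    cuts, i = [], n
    while i > 0:
        j = best[i][1]
        cuts.append(seq[j:i])
        i = j
    return cuts[::-1]

inventory = {("h", "e"), ("l",), ("l", "o"), ("h",), ("e", "l", "l", "o")}
print(min_segmentation(list("hello"), inventory))   # [['h'], ['e','l','l','o']]
```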
  • Patent number: 7328159
    Abstract: An improved system for an interactive voice recognition system (400) includes a voice prompt generator (401) for generating voice prompt in a first frequency band (501). A speech detector (406) detects presence of speech energy in a second frequency band (502). The first and second frequency bands (501, 502) are essentially conjugate frequency bands. A voice data generator (412) generates voice data based on an output of the voice prompt generator (401) and audible speech of a voice response generator (402). A control signal (422) controls the voice prompt generator (401) based on whether the speech detector (406) detects presence of speech energy in the second frequency band (502). A back end (405) of the interactive voice recognition system (400) is configured to operate on an extracted front end voice feature based on whether the speech detector (406) detects presence of speech energy in the second frequency band (502).
    Type: Grant
    Filed: January 15, 2002
    Date of Patent: February 5, 2008
    Assignee: Qualcomm Inc.
    Inventors: Chienchung Chang, Narendranath Malayath
  • Patent number: 7308407
    Abstract: A method for generating synthetic speech can include identifying a recording of conversational speech and creating a transcription of the conversational speech. Using the transcription, rather than a predefined script, the recording can be analyzed and acoustic units extracted. Each acoustic unit can include a phoneme and/or a sub-phoneme. The acoustic units can be stored so that a concatenative text-to-speech engine can later splice the acoustic units together to produce synthetic speech.
    Type: Grant
    Filed: March 3, 2003
    Date of Patent: December 11, 2007
    Assignee: International Business Machines Corporation
    Inventor: David E. Reich
  • Patent number: 7308408
    Abstract: A method and system for providing efficient menu services for an information processing system that uses a telephone or other form of audio user interface. In one embodiment, the menu services provide effective support for novice users by providing a full listing of available keywords and rotating house advertisements which inform novice users of potential features and information. For experienced users, cues are rendered so that at any time the user can say a desired keyword to invoke the corresponding application. The menu is flat to facilitate its usage. Full keyword listings are rendered after the user is given a brief cue to say a keyword. Service messages rotate words and word prosody. When listening to receive information from the user, after the user has been cued, soft background music or other audible signals are rendered to inform the user that a response may now be spoken to the service.
    Type: Grant
    Filed: September 29, 2004
    Date of Patent: December 11, 2007
    Assignee: Microsoft Corporation
    Inventors: Lisa Joy Stifelman, Hadi Partovi, Haleh Partovi, David Bryan Alpert, Matthew Talin Marx, Scott James Bailey, Kyle D. Sims, Darby McDonough Bailey, Roderick Steven Brathwaite, Eugene Koh, Angus Macdonald Davis
  • Patent number: 7280968
    Abstract: A method for digitally generating speech with improved prosodic characteristics can include receiving a speech input, determining at least one prosodic characteristic contained within the speech input, and generating a speech output that includes the prosodic characteristic.
    Type: Grant
    Filed: March 25, 2003
    Date of Patent: October 9, 2007
    Assignee: International Business Machines Corporation
    Inventor: Oscar J. Blass
  • Patent number: 7277856
    Abstract: A speech synthesis system for controlling a discontinuous distortion that occurs at the transition portion between concatenated phonemes, which are the speech units of a synthesized speech, using a smoothing technique, comprising: a discontinuous distortion processing means adapted to predict a discontinuity at the transition portion between concatenated samples of phonemes used for speech synthesis through a predetermined learning process, and to control a discontinuity at the transition portion between the concatenated phonemes of the synthesized speech such that it is smoothed adaptively in correspondence with the degree of the predicted discontinuity. A smoothing filter smoothes the synthesized speech so that the discontinuity degree of the synthesized speech follows the predicted discontinuity degree according to a filter coefficient (a) changed adaptively to correspond to the ratio of the predicted discontinuity degree to the real discontinuity degree.
    Type: Grant
    Filed: October 31, 2002
    Date of Patent: October 2, 2007
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Ki-seung Lee, Jeong-su Kim, Jae-won Lee
  • Patent number: 7266497
    Abstract: Systems and methods for automatically segmenting speech inventories. A set of Hidden Markov Models (HMMs) are initialized using bootstrap data. The HMMs are next re-estimated and aligned to produce phone labels. The phone boundaries of the phone labels are then corrected using spectral boundary correction. Optionally, this process of using the spectral-boundary-corrected phone labels as input instead of the bootstrap data is performed iteratively in order to further reduce mismatches between manual labels and phone labels assigned by the HMM approach.
    Type: Grant
    Filed: January 14, 2003
    Date of Patent: September 4, 2007
    Assignee: AT&T Corp.
    Inventors: Alistair D. Conkie, Yeon-Jun Kim
  • Patent number: 7233901
    Abstract: A system and computer-readable medium synthesize speech from text using a triphone unit selection database. The instructions on the computer-readable medium control a computing device to perform the steps of: receiving input text, selecting a plurality of N phoneme units from the triphone unit selection database as candidate phonemes for synthesized speech based on the input text, applying a cost process to select a set of phonemes from the candidate phonemes, and synthesizing speech using the selected set of phonemes.
    Type: Grant
    Filed: December 30, 2005
    Date of Patent: June 19, 2007
    Assignee: AT&T Corp.
    Inventor: Alistair D. Conkie
  • Patent number: 7171362
    Abstract: The assignment of phonemes to graphemes producing them in a lexicon having words (grapheme sequences) and their associated phonetic transcription (phoneme sequences) for the preparation of patterns for training neural networks for the purpose of grapheme-phoneme conversion is carried out with the aid of a variant of dynamic programming which is known as dynamic time warping (DTW).
    Type: Grant
    Filed: August 31, 2001
    Date of Patent: January 30, 2007
    Assignee: Siemens Aktiengesellschaft
    Inventor: Horst-Udo Hain
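A self-contained sketch of DTW-style grapheme-to-phoneme alignment; the `PLAUSIBLE` cost table is a toy stand-in for the learned local costs a real system would use:

```python
def dtw_align(graphemes, phonemes, cost):
    """Dynamic-time-warping alignment of a word's graphemes to its
    phonemes; returns the matched (grapheme, phoneme) pairs."""
    n, m = len(graphemes), len(phonemes)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = cost(graphemes[i - 1], phonemes[j - 1])
            D[i][j] = c + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack along the cheapest predecessors.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((graphemes[i - 1], phonemes[j - 1]))
        i, j = min([(i - 1, j), (i, j - 1), (i - 1, j - 1)],
                   key=lambda p: D[p[0]][p[1]])
    return path[::-1]

# Toy cost: 0 if the grapheme plausibly produces the phoneme, else 1.
PLAUSIBLE = {("c", "k"), ("a", "@"), ("t", "t")}
print(dtw_align(list("cat"), ["k", "@", "t"],
                lambda g, p: 0.0 if (g, p) in PLAUSIBLE else 1.0))
```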
  • Patent number: 7139712
    Abstract: A second phoneme is generated in consideration of a phonemic context with respect to a first phoneme as a search target. Phonemic piece data corresponding to the second phoneme is searched out from a database. A third phoneme is generated by changing the phonemic context on the basis of the search result, and phonemic piece data corresponding to the third phoneme is re-searched out from the database. The search or re-search result is registered in a table in correspondence with the second or third phoneme.
    Type: Grant
    Filed: March 5, 1999
    Date of Patent: November 21, 2006
    Assignee: Canon Kabushiki Kaisha
    Inventor: Masayuki Yamada
  • Patent number: 7124083
    Abstract: A system and method for improving the response time of text-to-speech synthesis utilizes “triphone contexts” (i.e., triplets comprising a central phoneme and its immediate context) as the basic unit, instead of performing phoneme-by-phoneme synthesis. The method generates a triphone preselection cost database for use in speech synthesis by 1) selecting a triphone sequence u1-u2-u3, 2) calculating a preselection cost for each 5-phoneme sequence ua-u1-u2-u3-ub, where u2 is allowed to match any identically labeled phoneme in a database and the units ua and ub vary over the entire phoneme universe, and 3) storing a group of the selected triphone sequences exhibiting the lowest costs in the triphone preselection cost database.
    Type: Grant
    Filed: November 5, 2003
    Date of Patent: October 17, 2006
    Assignee: AT&T Corp.
    Inventor: Alistair D. Conkie
  • Patent number: 7120584
    Abstract: A method and system for synthesizing audio speech is provided. A synthesis engine receives compressed and normalized speech units and prosodic information from a host. The synthesis engine decompresses the data and synthesizes audio signals. The synthesis engine can be implemented on a digital signal processing system that meets low-resource requirements (i.e., low power consumption and low memory usage), such as a DSP system including an input/output module, a WOLA filterbank, and a DSP core that operate in parallel.
    Type: Grant
    Filed: October 22, 2002
    Date of Patent: October 10, 2006
    Assignee: AMI Semiconductor, Inc.
    Inventors: Hamid Sheikhzadeh-Nadjar, Etienne Cornu, Robert L. Brennan
  • Patent number: 7082396
    Abstract: A speech synthesis system can select recorded speech fragments, or acoustic units, from a very large database of acoustic units to produce artificial speech. The selected acoustic units are chosen to minimize a combination of target and concatenation costs for a given sentence. However, as concatenation costs, which are measures of the mismatch between sequential pairs of acoustic units, are expensive to compute, processing can be greatly reduced by pre-computing and caching the concatenation costs. Unfortunately, the number of possible sequential pairs of acoustic units makes such caching prohibitive. However, statistical experiments reveal that while about 85% of the acoustic units are typically used in common speech, less than 1% of the possible sequential pairs of acoustic units occur in practice.
    Type: Grant
    Filed: December 19, 2003
    Date of Patent: July 25, 2006
    Assignee: AT&T Corp
    Inventors: Mark C. Beutnagel, Mehryar Mohri, Michael D. Riley
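The caching strategy follows naturally once costs are computed lazily for only the pairs that occur; a minimal sketch, with `join_cost` stubbed as a pitch difference:

```python
def join_cost(unit_a, unit_b):
    """Stand-in for an expensive spectral mismatch measure between two
    acoustic units (stubbed as a pitch difference)."""
    return abs(unit_a["pitch"] - unit_b["pitch"])

class ConcatCostCache:
    """Cache join costs only for unit pairs that actually occur,
    exploiting the observation that under 1% of possible pairs arise."""
    def __init__(self):
        self._costs = {}
    def cost(self, a, b):
        key = (a["id"], b["id"])
        if key not in self._costs:
            self._costs[key] = join_cost(a, b)   # computed once, reused
        return self._costs[key]

u1, u2 = {"id": 7, "pitch": 118.0}, {"id": 9, "pitch": 131.0}
cache = ConcatCostCache()
cache.cost(u1, u2)
print(cache.cost(u1, u2), len(cache._costs))   # 13.0 1
```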
  • Patent number: 7076426
    Abstract: An enhanced system is achieved by allowing bookmarks which can specify that the stream of bits that follow corresponds to phonemes and a plurality of prosody information, including duration information, that is specified for times within the duration of the phonemes. Illustratively, such a stream comprises a flag to enable a duration flag, a flag to enable a pitch contour flag, a flag to enable an energy contour flag, a specification of the number of phonemes that follow, and, for each phoneme, one or more sets of specific prosody information that relates to the phoneme, such as a set of pitch values and their durations.
    Type: Grant
    Filed: January 27, 1999
    Date of Patent: July 11, 2006
    Assignee: AT&T Corp.
    Inventors: Mark Charles Beutnagel, Joern Ostermann, Schuyler Reynier Quackenbush
  • Patent number: 7069217
    Abstract: A synthesizer is disclosed in which a speech waveform is synthesized by selecting a synthetic starting waveform segment and then generating a sequence of further segments. The further waveform segments are generated based jointly upon the value of the immediately-preceding segment and upon a model of the dynamics of an actual sound similar to that being generated. In particular, a method is disclosed of synthesizing a voiced speech sound, comprising calculating each new output value from the previous output value using data modeling the evolution, over a short time interval, of the voiced speech sound to be synthesized. This sequential generation of waveform segments enables a synthesized sequence of speech waveforms to be generated of any duration. In addition, a low-dimensional state space representation of speech signals is used in which successive pitch pulse cycles are superimposed to estimate the progression of the cyclic speech signal within each cycle.
    Type: Grant
    Filed: January 9, 1997
    Date of Patent: June 27, 2006
    Assignee: British Telecommunications PLC
    Inventors: Stephen McLaughlin, Michael Banbrook
  • Patent number: 7062440
    Abstract: A speech system has a speech input channel including a speech recognizer, and a speech output channel including a text-to-speech converter. Associated with the input channel is a barge-in control for setting barge-in behavior parameters determining how the apparatus handles barge-in by a user during speech output by the apparatus. In order to make the barge-in control more responsive to the actual speech output from the output channel, a barge-in prediction arrangement is provided that is responsive to feature values produced during the operation of the text-to-speech converter to produce indications as to the most likely barge-in points. The barge-in control is responsive to these indications to adjust at least one of the barge-in behavior parameters for periods corresponding to the most likely barge-in points.
    Type: Grant
    Filed: May 31, 2002
    Date of Patent: June 13, 2006
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Paul St John Brittan, Roger Cecil Ferry Tucker
  • Patent number: 7043422
    Abstract: A method and apparatus are provided for adapting a language model to a task-specific domain. Under the method and apparatus, the relative frequency of n-grams in a small training set (i.e. task-specific training data set) and the relative frequency of n-grams in a large training set (i.e. out-of-domain training data set) are used to weight a distribution count of n-grams in the large training set. The weighted distributions are then used to form a modified language model by identifying probabilities for n-grams from the weighted distributions.
    Type: Grant
    Filed: September 4, 2001
    Date of Patent: May 9, 2006
    Assignee: Microsoft Corporation
    Inventors: Jianfeng Gao, Mingjing Li
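A sketch of the count weighting, using the ratio of in-domain to out-of-domain relative frequency as the weight; the patent's exact weighting scheme is not reproduced in the abstract:

```python
from collections import Counter

def adapted_counts(big_corpus, small_corpus, n=2):
    """Weight each n-gram count from the large out-of-domain corpus by
    the ratio of its relative frequency in the task-specific corpus to
    its relative frequency in the large corpus."""
    grams = lambda toks: Counter(zip(*(toks[i:] for i in range(n))))
    big, small = grams(big_corpus), grams(small_corpus)
    B, S = sum(big.values()), sum(small.values())
    out = {}
    for g, c in big.items():
        rel_big = c / B
        rel_small = small.get(g, 0) / S if S else 0.0
        out[g] = c * (rel_small / rel_big if rel_big else 0.0)
    return out

big = "book a flight to boston book a room in boston".split()
small = "book a flight book a flight to denver".split()
print(adapted_counts(big, small)[("book", "a")])   # count boosted in-domain
```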
  • Patent number: 7031919
    Abstract: A speech synthesizing apparatus for synthesizing a speech waveform stores speech data, which is obtained by adding attribute information onto phoneme data, in a database. In accordance with prescribed retrieval conditions, a phoneme retrieval unit retrieves phoneme data from the speech data that has been stored in the database and retains the retrieved results in a retrieved-result storage area. A processing unit for assigning a power penalty and a processing unit for assigning a phoneme-duration penalty assign the penalties, on the basis of power and phoneme duration constituting the attribute information, to a set of phoneme data stored in the retrieved-result storage area. A processing unit for determining typical phoneme data performs sorting on the basis of the assigned penalties and, based upon the stored results, selects phoneme data to be employed in the synthesis of a speech waveform.
    Type: Grant
    Filed: August 30, 1999
    Date of Patent: April 18, 2006
    Assignee: Canon Kabushiki Kaisha
    Inventors: Yasuo Okutani, Masayuki Yamada
  • Patent number: 7016841
    Abstract: A singing voice synthesizing apparatus is provided, which enables achievement of a natural-sounding synthesized singing voice with a good level of comprehensibility. A phoneme database stores a plurality of voice fragment data formed of voice fragments each being a single phoneme or a phoneme chain of at least two concatenated phonemes, each of the plurality of voice fragment data comprising data of a deterministic component and data of a stochastic component. A readout device reads out from the phoneme database the voice fragment data corresponding to inputted lyrics. A duration time adjusting device adjusts the time duration of the read-out voice fragment data so as to match a desired tempo and manner of singing. An adjusting device adjusts the deterministic component and the stochastic component of the read-out voice fragment so as to match a desired pitch.
    Type: Grant
    Filed: December 27, 2001
    Date of Patent: March 21, 2006
    Assignee: Yamaha Corporation
    Inventors: Hideki Kenmochi, Xavier Serra, Jordi Bonada
  • Patent number: 7013278
    Abstract: A method for generating concatenative speech uses a speech synthesis input to populate a triphone-indexed database that is later used for searching and retrieval to create a phoneme string acceptable for a text-to-speech operation. Prior to initiating the “real time” synthesis process, a database is created of all possible triphone contexts by inputting a continuous stream of speech. The speech data is then analyzed to identify all possible triphone sequences in the stream, and the various units chosen for each context. During a later text-to-speech operation, the triphone contexts in the text are identified and the triphone-indexed phonemes in the database are searched to retrieve the best-matched candidates.
    Type: Grant
    Filed: September 5, 2002
    Date of Patent: March 14, 2006
    Assignee: AT&T Corp.
    Inventor: Alistair D. Conkie
  • Patent number: 7003461
    Abstract: An adaptive codebook search (ACS) algorithm is based on a set of matrix operations suitable for data processing engines supporting a single instruction multiple data (SIMD) architecture. The result is a reduction in memory access and increased parallelism to produce an overall improvement in the computational efficiency of ACS processing.
    Type: Grant
    Filed: July 9, 2002
    Date of Patent: February 21, 2006
    Assignee: Renesas Technology Corporation
    Inventor: Clifford Tavares
  • Patent number: 6970820
    Abstract: The speech synthesizer is personalized to sound like or mimic the speech characteristics of an individual speaker. The individual speaker provides a quantity of enrollment data, which can be extracted from a short quantity of speech, and the system modifies the base synthesis parameters to more closely resemble those of the new speaker. More specifically, the synthesis parameters may be decomposed into speaker dependent parameters, such as context-independent parameters, and speaker independent parameters, such as context dependent parameters. The speaker dependent parameters are adapted using enrollment data from the new speaker. After adaptation, the speaker dependent parameters are combined with the speaker independent parameters to provide a set of personalized synthesis parameters.
    Type: Grant
    Filed: February 26, 2001
    Date of Patent: November 29, 2005
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Jean-Claude Junqua, Florent Perronnin, Roland Kuhn, Patrick Nguyen
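Conceptually, the adaptation updates only the speaker-dependent parameters and recombines them with the shared ones; a toy sketch with invented parameter vectors and a simple interpolation rate:

```python
import numpy as np

def personalize(base_dependent, base_independent, enrolled, rate=0.3):
    """Shift only the speaker-dependent synthesis parameters toward
    statistics estimated from the new speaker's enrollment data, then
    recombine with the shared speaker-independent parameters."""
    adapted = base_dependent + rate * (enrolled - base_dependent)
    return np.concatenate([adapted, base_independent])

base_dep   = np.array([120.0, 0.80])   # e.g. mean pitch, speaking rate
base_indep = np.array([0.40, 0.10])    # context-dependent, shared across speakers
enrolled   = np.array([95.0, 0.70])    # estimated from a short enrollment sample
print(personalize(base_dep, base_indep, enrolled))
```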
  • Patent number: 6959277
    Abstract: In a conventional device for extracting voice features accurately without being influenced by noise, such as a voice recognition device, an input voice signal is usually processed first by a noise reduction system having tap length N, the result is FFT-processed with L points, and the power spectrum vector is then calculated; accordingly, a single operation requires N multiplications and (N−1) summations. The voice feature extraction device according to the invention receives a voice signal including noise from a microphone, which is processed by a window function operation unit and thereafter FFT-processed by an FFT operation unit with L points. A power calculation unit calculates a power spectrum vector of the input voice signal. The noise reduction system instead determines its filter coefficient in advance and processes the coefficient to calculate a noise reduction coefficient, by which the power spectrum vector is processed.
    Type: Grant
    Filed: June 26, 2001
    Date of Patent: October 25, 2005
    Assignee: Alpine Electronics, Inc.
    Inventors: Shingo Kiuchi, Toshiaki Asano, Nozomu Saito
  • Patent number: 6876968
    Abstract: A method and system provide for run-time modification of synthesized speech. The method includes the step of generating synthesized speech based on textual input and a plurality of run-time control parameter values. Real-time data is generated based on an input signal, where the input signal characterizes an intelligibility of the speech with regard to a listener. The method further provides for modifying one or more of the run-time control parameter values based on the real-time data such that the intelligibility of the speech increases. Modifying the parameter values at run-time as opposed to during the design stages provides a level of adaptation unachievable through conventional approaches.
    Type: Grant
    Filed: March 8, 2001
    Date of Patent: April 5, 2005
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventor: Peter Veprek
  • Patent number: 6847932
    Abstract: Given phonetic information is divided into speech units of extended CV, each of which is a contiguous sequence of phonemes, without clear boundaries, containing one or more vowels. The contour of the vocal tract transmission function for each phoneme of the extended-CV speech unit is obtained from a phoneme directory, which contains a contour of the vocal tract transmission function of each phoneme associated with phonetic information in units of extended CV. Speech waveform data is generated based on these contours, and the speech waveform data is converted into an analog voice signal.
    Type: Grant
    Filed: September 28, 2000
    Date of Patent: January 25, 2005
    Assignee: Arcadia, Inc.
    Inventors: Kazuyuki Ashimura, Seiichi Tenpaku
  • Patent number: 6847931
    Abstract: A preferred embodiment of a method for converting text to speech using a computing device having a memory is disclosed. Text, made up of a plurality of words, is received into the memory of the computing device. A plurality of phonemes are derived from the text. Each of the phonemes is associated with a prosody record based on a database of prosody records associated with a plurality of words. A first set of artificial intelligence rules is applied to determine context information associated with the text, and the context-influenced prosody changes for each of the phonemes are determined. Then a second set of rules, based on Lessac theory, is applied to determine Lessac-derived prosody changes for each of the phonemes. The prosody record for each of the phonemes is amended in response to the context-influenced prosody changes and the Lessac-derived prosody changes. Finally, sound information associated with the phonemes is read from the memory.
    Type: Grant
    Filed: January 29, 2002
    Date of Patent: January 25, 2005
    Assignee: Lessac Technology, Inc.
    Inventors: Edwin R. Addison, H. Donald Wilson, Gary Marple, Anthony H. Handal, Nancy Krebs
  • Patent number: 6845358
    Abstract: A prosody matching template in the form of a tree structure stores indices which point to lookup table and template information prescribing pitch and duration values that are used to add inflection to the output of a text-to-speech synthesizer. The lookup module employs a search algorithm that explores each branch of the tree, assigning penalty scores based on whether the syllable represented by a node of the tree does or does not match the corresponding syllable of the target word. The path with the lowest penalty score is selected as the index into the prosody template table. The system will add nodes by cloning existing nodes in cases where it is not possible to find a one-to-one match between the number of syllables in the target word and the number of nodes in the tree.
    Type: Grant
    Filed: January 5, 2001
    Date of Patent: January 18, 2005
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Nicholas Kibre, Ted H. Applebaum
  • Patent number: 6845359
    Abstract: A Fast Fourier Transform (FFT) based voice synthesis method 110, program product and vocoder. Sounds, e.g., speech and audio, are synthesized from multiple sine waves. Each sine wave component is represented by a small number of FFT coefficients 116. Amplitude 120 and phase 124 information of the components may be incorporated into these coefficients. The FFT coefficients corresponding to each of the components are summed 126 and, then, an inverse FFT is applied 128 to the sum to generate a time domain signal. An appropriate section is extracted 130 from the inverse transformed time domain signal as an approximation to the desired output. FFT based synthesis 110 may be combined with simple sine wave summation 100, using FFT based synthesis 110 for complex sounds, e.g., male voices and unvoiced speech, and sine wave summation 100 for simpler sounds, e.g., female voices.
    Type: Grant
    Filed: March 22, 2001
    Date of Patent: January 18, 2005
    Assignee: Motorola, Inc.
    Inventor: Tenkasi Ramabadran
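The bin-placement trick can be sketched with NumPy: each sine component contributes a conjugate-symmetric pair of FFT coefficients, the coefficients are summed, and one inverse FFT produces all samples at once (frequencies are rounded to the nearest bin here; a real vocoder would spread each component over a few bins and overlap-add successive frames):

```python
import numpy as np

def fft_synthesize(components, L=1024, fs=8000):
    """Synthesize a sum of sine waves by placing each component's
    amplitude and phase into FFT bins, summing, and inverse-transforming,
    instead of evaluating every sinusoid sample by sample."""
    spec = np.zeros(L, dtype=complex)
    for freq, amp, phase in components:
        k = round(freq * L / fs)              # nearest bin (an approximation)
        spec[k]  += 0.5 * amp * L * np.exp(1j * phase)
        spec[-k] += 0.5 * amp * L * np.exp(-1j * phase)   # conjugate partner
    return np.fft.ifft(spec).real             # time-domain approximation

# 200 Hz and 350 Hz components with different amplitudes and phases.
y = fft_synthesize([(200.0, 1.0, 0.0), (350.0, 0.5, np.pi / 4)])
print(y[:4])
```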