Specialized Model Patents (Class 704/266)
  • Patent number: 7555433
    Abstract: A main controller feeds a spelling translator with a text item representing a place name stored in a map database. The spelling translator translates the spelling of the text item according to rules described in a translation rule table. The spelling translator translates, e.g., a French character or string included in the text item and not included in the English alphabet into an English alphabet character or string having a pronunciation equivalent or similar to the pronunciation of the French character or string. The translated text item is fed into a TTS engine for English. The TTS engine converts the text item into voice, which is output from a speaker.
    Type: Grant
    Filed: July 7, 2003
    Date of Patent: June 30, 2009
    Assignee: Alpine Electronics, Inc.
    Inventor: Michiaki Otani
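As an illustration of the rule-table approach this abstract describes, here is a minimal Python sketch; the `TRANSLATION_RULES` mapping is hypothetical, since the patent's actual rule table is not reproduced in the abstract:

```python
# Hypothetical rule table: non-English characters or strings mapped to
# English spellings with an equivalent or similar pronunciation.
TRANSLATION_RULES = {
    "eau": "oh",   # multi-character rules must win over single characters
    "é": "ay",
    "è": "eh",
    "ç": "s",
    "ô": "oh",
}

def translate_spelling(text: str) -> str:
    """Rewrite a place name so an English-only TTS engine can say it."""
    # Apply longer rules first so "eau" is matched before any single letter.
    for src, dst in sorted(TRANSLATION_RULES.items(), key=lambda kv: -len(kv[0])):
        text = text.replace(src, dst)
    return text

print(translate_spelling("Orléans"))   # -> "Orlayans"
```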
  • Patent number: 7546241
    Abstract: In a speech synthesis process, micro-segments are cut from acquired waveform data using a window function. The obtained micro-segments are re-arranged to implement a desired prosody, and superposed data is generated by superposing the re-arranged micro-segments, so as to obtain synthetic speech waveform data. A spectrum correction filter is formed based on the acquired waveform data. At least one of the waveform data, micro-segments, and superposed data is corrected using the spectrum correction filter. In this way, “blur” of a speech spectrum due to the window function applied to obtain micro-segments is reduced, and speech synthesis with high sound quality is realized.
    Type: Grant
    Filed: June 2, 2003
    Date of Patent: June 9, 2009
    Assignee: Canon Kabushiki Kaisha
    Inventors: Masayuki Yamada, Yasuhiro Komori, Toshiaki Fukada
  • Patent number: 7519535
    Abstract: A voice decoder configured to receive a sequence of frames, each of the frames having voice parameters. The voice decoder includes a speech generator that generates speech from the voice parameters. A frame erasure concealment module is configured to reconstruct the voice parameters for a frame erasure in the sequence of frames from the voice parameters in one of the previous frames and the voice parameters in one of the subsequent frames.
    Type: Grant
    Filed: January 31, 2005
    Date of Patent: April 14, 2009
    Assignee: QUALCOMM Incorporated
    Inventor: Serafin Diaz Spindola
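The interpolation idea in this abstract can be sketched in a few lines; the scalar `pitch`/`gain` parameters and the 50/50 weighting below are illustrative assumptions, not the codec's actual parameter set:

```python
from dataclasses import dataclass

@dataclass
class VoiceParams:
    pitch: float   # e.g. pitch lag or fundamental-frequency parameter
    gain: float    # e.g. frame energy parameter

def conceal_erasure(prev: VoiceParams, nxt: VoiceParams,
                    alpha: float = 0.5) -> VoiceParams:
    """Reconstruct an erased frame's parameters by interpolating between
    the previous good frame and the subsequent good frame."""
    lerp = lambda a, b: (1.0 - alpha) * a + alpha * b
    return VoiceParams(pitch=lerp(prev.pitch, nxt.pitch),
                       gain=lerp(prev.gain, nxt.gain))

# An erasure between a 100 Hz frame and a 110 Hz frame becomes a 105 Hz
# frame instead of a repeat of the last frame or silence.
print(conceal_erasure(VoiceParams(100.0, 0.8), VoiceParams(110.0, 0.6)))
```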
  • Patent number: 7502739
    Abstract: In generating an intonation pattern for speech synthesis, a speech synthesis system is capable of providing highly natural speech and of reproducing the speech characteristics of a speaker flexibly and accurately by effectively utilizing F0 patterns of actual speech accumulated in a database. An intonation generation method generates an intonation of synthesized speech for text by estimating an outline of the intonation based on language information of the text, and then, based on the estimated outline, selecting an optimum intonation pattern from a database which stores intonation patterns of actual speech. Speech characteristics recorded in advance are reflected in the estimation of the outline of the intonation pattern and the selection of a waveform element of the speech.
    Type: Grant
    Filed: January 24, 2005
    Date of Patent: March 10, 2009
    Assignee: International Business Machines Corporation
    Inventors: Takashi Saito, Masaharu Sakamoto
  • Publication number: 20090063153
    Abstract: A system and method for generating a synthetic text-to-speech TTS voice are disclosed. A user is presented with at least one TTS voice and at least one voice characteristic. A new synthetic TTS voice is generated by blending a plurality of existing TTS voices according to the selected voice characteristics. The blending of voices involves interpolating segmented parameters of each TTS voice. Segmented parameters may be, for example, prosodic characteristics of the speech such as pitch, volume, phone durations, accents, stress, mis-pronunciations and emotion.
    Type: Application
    Filed: November 4, 2008
    Publication date: March 5, 2009
    Applicant: AT&T Corp.
    Inventors: David A. Kapilow, Kenneth H. Rosen, Juergen Schroeter
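A minimal sketch of blending by interpolating segmented parameters, assuming each voice is reduced to one flat vector of prosodic values (a simplification of the per-segment parameters the abstract describes):

```python
import numpy as np

def blend_voices(voices: list, weights: list) -> np.ndarray:
    """Blend TTS voices by interpolating their segmented parameters
    (here flattened to one vector of prosodic values per voice)."""
    w = np.asarray(weights, dtype=float)
    w /= w.sum()                       # normalize the user's mix
    return w @ np.stack(voices)        # weighted interpolation

# Toy parameter vectors: [pitch_Hz, volume, phone_duration_ms]
calm   = np.array([110.0, 0.6, 95.0])
bright = np.array([180.0, 0.9, 70.0])
print(blend_voices([calm, bright], [0.7, 0.3]))   # a mostly-calm new voice
```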
  • Patent number: 7487093
    Abstract: In a voice synthesis apparatus, by bounding a desired range of input text with, e.g., a start tag “<morphing type=“emotion” start=“happy” end=“angry”>” and an end tag “</morphing>”, a feature of the synthetic voice is continuously changed upon output, e.g., gradually morphing from a happy voice to an angry voice.
    Type: Grant
    Filed: August 10, 2004
    Date of Patent: February 3, 2009
    Assignee: Canon Kabushiki Kaisha
    Inventors: Masahiro Mutsuno, Toshiaki Fukada
  • Patent number: 7483832
    Abstract: A method and system of customizing voice translation of a text to speech includes digitally recording speech samples of a known speaker, correlating each of the speech samples with a standardized audio representation, and organizing the recorded speech samples and correlated audio representations into a collection. The collection of speech samples correlated with audio representations is saved as a single voice file and stored in a device capable of translating the text to speech. The voice file is applied to a translation of text to speech so that the translated speech is customized according to the applied voice file.
    Type: Grant
    Filed: December 10, 2001
    Date of Patent: January 27, 2009
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Steve Tischer
  • Patent number: 7472061
    Abstract: Systems and methods are provided for automatically building a native phonetic lexicon for a speech-based application trained to process a native (base) language, wherein the native phonetic lexicon includes native phonetic transcriptions (base forms) for non-native (foreign) words which are automatically derived from non-native phonetic transcriptions of the non-native words.
    Type: Grant
    Filed: March 31, 2008
    Date of Patent: December 30, 2008
    Assignee: International Business Machines Corporation
    Inventors: Neal Alewine, Eric Janke, Paul Sharp, Roberto Sicconi
  • Patent number: 7472065
    Abstract: Converting marked-up text into a synthesized stream includes providing marked-up text to a processor-based system, converting the marked-up text into a text stream including vocabulary items, retrieving audio segments corresponding to the vocabulary items, concatenating the audio segments to form a synthesized stream, and audibly outputting the synthesized stream, wherein the marked-up text includes a normal text and a paralinguistic text; wherein the normal text is differentiated from the paralinguistic text by using a grammar constraint; wherein the paralinguistic text is associated with more than one audio segment; and wherein the retrieving of the plurality of audio segments includes selecting one audio segment associated with the paralinguistic text.
    Type: Grant
    Filed: June 4, 2004
    Date of Patent: December 30, 2008
    Assignee: International Business Machines Corporation
    Inventors: Andrew S. Aaron, Raimo Bakis, Ellen M. Eide, Wael Hamza
  • Patent number: 7472066
    Abstract: An automatic speech segmentation and verification system and method is disclosed, which has a known text script and a recorded speech corpus corresponding to the known text script. A speech unit segmentor segments the recorded speech corpus into N test speech unit segments referring to the phonetic information of the known text script. Then, a segmental verifier is applied to obtain a confidence measure of syllable segmentation for verifying the correctness of the cutting points of test speech unit segments. A phonetic verifier obtains a confidence measure of syllable verification by using verification models for verifying whether the recorded speech corpus is correctly recorded. Finally, a speech unit inspector integrates the confidence measure of syllable segmentation and the confidence measure of syllable verification to determine whether the test speech unit segment is accepted or not.
    Type: Grant
    Filed: February 23, 2004
    Date of Patent: December 30, 2008
    Assignee: Industrial Technology Research Institute
    Inventors: Chih-Chung Kuo, Chi-Shiang Kuo, Jau-Hung Chen
  • Patent number: 7464034
    Abstract: A voice converting apparatus is constructed for converting an input voice into an output voice according to a target voice. The apparatus includes a storage section, an analyzing section including a characteristic analyzer, a producing section, a synthesizing section, a memory, an alignment processor, and a target decoder.
    Type: Grant
    Filed: September 27, 2004
    Date of Patent: December 9, 2008
    Assignees: Yamaha Corporation, Pompeu Fabra University
    Inventors: Takahiro Kawashima, Yasuo Yoshioka, Pedro Cano, Alex Loscos, Xavier Serra, Mark Schiementz, Jordi Bonada
  • Patent number: 7460997
    Abstract: A system and method for improving the response time of text-to-speech synthesis utilizes “triphone contexts” (i.e., triplets comprising a central phoneme and its immediate context) as the basic unit, instead of performing phoneme-by-phoneme synthesis. The method generates a triphone preselection cost database for use in speech synthesis by 1) selecting a triphone sequence u1-u2-u3, 2) calculating a preselection cost for each 5-phoneme sequence ua-u1-u2-u3-ub, where u2 is allowed to match any identically labeled phoneme in a database and the units ua and ub vary over the entire phoneme universe, and 3) storing a group of the selected triphone sequences exhibiting the lowest costs in the triphone preselection cost database.
    Type: Grant
    Filed: August 22, 2006
    Date of Patent: December 2, 2008
    Assignee: AT&T Intellectual Property II, L.P.
    Inventor: Alistair D. Conkie
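A toy sketch of building the preselection costs this abstract describes; the `Unit` record, the phoneme universe, and `context_cost` are all illustrative stand-ins for the real acoustic database and cost function:

```python
import itertools
from collections import namedtuple

Unit = namedtuple("Unit", "id label pitch")   # toy acoustic unit
PHONEMES = ["a", "t", "k", "s"]               # toy phoneme universe
TOP_K = 2                                     # candidates kept per triphone

def context_cost(ua, u1, unit, u3, ub):
    """Toy preselection cost for the 5-phoneme context ua-u1-u2-u3-ub;
    a real system compares spectral and prosodic features."""
    context_term = 0.0 if (ua, ub) == (u1, u3) else 0.1
    return context_term + abs(unit.pitch - 120.0) / 100.0

def preselect(triphone, units):
    """Score every unit labeled like the triphone's center phoneme over
    all flanking contexts ua...ub and keep the cheapest TOP_K."""
    u1, u2, u3 = triphone
    scored = []
    for unit in (u for u in units if u.label == u2):
        total = sum(context_cost(ua, u1, unit, u3, ub)
                    for ua, ub in itertools.product(PHONEMES, repeat=2))
        scored.append((total / len(PHONEMES) ** 2, unit))
    return sorted(scored)[:TOP_K]

inventory = [Unit(0, "a", 118.0), Unit(1, "a", 180.0), Unit(2, "t", 130.0)]
print(preselect(("t", "a", "k"), inventory))   # unit 0 wins on pitch
```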
  • Patent number: 7454348
    Abstract: A system and method for generating a synthetic text-to-speech TTS voice are disclosed. A user is presented with at least one TTS voice and at least one voice characteristic. A new synthetic TTS voice is generated by blending a plurality of existing TTS voices according to the selected voice characteristics. The blending of voices involves interpolating segmented parameters of each TTS voice. Segmented parameters may be, for example, prosodic characteristics of the speech such as pitch, volume, phone durations, accents, stress, mis-pronunciations and emotion.
    Type: Grant
    Filed: January 8, 2004
    Date of Patent: November 18, 2008
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: David A. Kapilow, Kenneth H. Rosen, Juergen Schroeter
  • Patent number: 7454341
    Abstract: According to one aspect of the invention, a method is provided in which a mean vector set and a variance vector set of a set of N Gaussians are divided into multiple mean sub-vector sets and variance sub-vector sets, respectively. Each mean sub-vector set contains a subset of the dimensions of the corresponding mean vector set and each variance sub-vector set contains a subset of the dimensions of the corresponding variance vector set. Each resultant sub-vector set is clustered to build a codebook for the respective sub-vector set using a modified K-means clustering process which dynamically merges and splits clusters based upon the size and average distortion of each cluster during each iteration in the modified K-means clustering process.
    Type: Grant
    Filed: September 30, 2000
    Date of Patent: November 18, 2008
    Assignee: Intel Corporation
    Inventors: Jielin Pan, Baosheng Yuan
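The sub-vector idea can be sketched as follows; for brevity this uses plain Lloyd k-means rather than the patent's modified variant that merges and splits clusters by size and average distortion:

```python
import numpy as np

def kmeans(data, k, iters=20, seed=0):
    """Plain Lloyd k-means; the patent's modified variant additionally
    merges and splits clusters by size and average distortion."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((data[:, None] - centers) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = data[assign == j].mean(axis=0)
    return centers

def subvector_codebooks(means, splits, k):
    """Split each mean vector into sub-vectors over dimension groups and
    build one codebook per sub-vector set."""
    return [kmeans(means[:, dims], k) for dims in splits]

means = np.random.default_rng(1).normal(size=(500, 6))  # 500 Gaussians, 6 dims
books = subvector_codebooks(means, splits=[[0, 1, 2], [3, 4, 5]], k=8)
print([b.shape for b in books])   # [(8, 3), (8, 3)]
```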
  • Patent number: 7454345
    Abstract: A voice synthesizer, which obtains a voice by emphasizing a specific part of a sentence, includes an emphasis degree deciding unit that extracts a word or collocation to be emphasized from among the words and collocations included in a sentence on the basis of an extracting reference and decides an emphasis degree for the extracted word or collocation, and an acoustic processing unit that synthesizes a voice in which the decided emphasis degree is applied to the word or collocation to be emphasized. The emphasized part can thus be obtained automatically on the basis of the extracting reference, such as a frequency of appearance or a level of importance of the word or collocation.
    Type: Grant
    Filed: February 23, 2005
    Date of Patent: November 18, 2008
    Assignee: Fujitsu Limited
    Inventors: Hitoshi Sasaki, Yasushi Yamazaki, Yasuji Ota, Kaori Endo, Nobuyuki Katae, Kazuhiro Watanabe
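A rough sketch of the extracting reference: score words by frequency of appearance times an importance level, then assign emphasis degrees to the top scorers. The `IMPORTANCE` table, stopword set, and degree formula are invented for illustration:

```python
from collections import Counter

IMPORTANCE = {"deadline": 2.0, "cancelled": 2.0}   # invented importance levels
STOPWORDS = {"the", "is", "a", "and"}

def emphasis_degrees(words, top_n=2):
    """Score words by frequency of appearance times importance level
    (the two extracting references named above), then assign each
    chosen word an emphasis degree for the acoustic processing unit."""
    freq = Counter(w for w in words if w not in STOPWORDS)
    score = {w: freq[w] * IMPORTANCE.get(w, 1.0) for w in freq}
    chosen = sorted(score, key=score.get, reverse=True)[:top_n]
    # Invented mapping from rank to degree, highest score first.
    return {w: 1.0 + 0.5 * (top_n - i) for i, w in enumerate(chosen)}

text = "the deadline moved and the deadline is final the offer is cancelled"
print(emphasis_degrees(text.split()))   # {'deadline': 2.0, 'cancelled': 1.5}
```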
  • Patent number: 7451087
    Abstract: A method for converting text to concatenated voice by utilizing a digital voice library and a set of playback rules is provided. The method includes receiving and expanding text data to form a sequence of text and pseudo words. The sequence of text and pseudo words is converted into a sequence of speech items, and the sequence of speech items is converted into a sequence of voice recordings. The method includes generating voice data based on the sequence of voice recordings by concatenating adjacent recordings in the sequence.
    Type: Grant
    Filed: March 27, 2001
    Date of Patent: November 11, 2008
    Assignee: Qwest Communications International Inc.
    Inventors: Eliot M. Case, Richard P. Phillips
  • Publication number: 20080201150
    Abstract: A conversion rule and a rule selection parameter are stored. The conversion rule converts a spectral parameter of a source speaker to a spectral parameter of a target speaker. The rule selection parameter represents the spectral parameter of the source speaker. A first conversion rule of start timing and a second conversion rule of end timing in a speech unit of the source speaker are selected by the spectral parameter of the start timing and the end timing. An interpolation coefficient corresponding to the spectral parameter of each timing in the speech unit is calculated by the first conversion rule and the second conversion rule. A third conversion rule corresponding to the spectral parameter of each timing in the speech unit is calculated by interpolating the first conversion rule and the second conversion rule with the interpolation coefficient. The spectral parameter of each timing is converted to a spectral parameter of the target speaker by the third conversion rule.
    Type: Application
    Filed: January 22, 2008
    Publication date: August 21, 2008
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Masatsune Tamura, Takehiro Kagoshima
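A compact sketch of interpolating between the start-timing and end-timing conversion rules; rules are modeled here as affine maps, and the interpolation coefficient is a simple time ramp, whereas the patent derives it from the spectral parameter at each timing:

```python
import numpy as np

def interpolated_conversion(frames, rule_start, rule_end):
    """Convert each frame's spectral parameter using a rule interpolated
    between the rules selected at the unit's start and end timings.
    Rules are modeled as affine maps (matrix A, bias b)."""
    n = len(frames)
    out = []
    for i, x in enumerate(frames):
        w = i / max(n - 1, 1)     # time-ramp interpolation coefficient
        A = (1 - w) * rule_start[0] + w * rule_end[0]
        b = (1 - w) * rule_start[1] + w * rule_end[1]
        out.append(A @ x + b)     # the "third conversion rule" applied
    return np.array(out)

frames = np.random.default_rng(0).normal(size=(5, 3))  # 5 frames, 3-dim spectra
start = (np.eye(3) * 1.1, np.zeros(3))                 # rule at start timing
end   = (np.eye(3) * 0.9, np.full(3, 0.2))             # rule at end timing
print(interpolated_conversion(frames, start, end).shape)   # (5, 3)
```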
  • Patent number: 7415118
    Abstract: In accordance with an embodiment, the invention provides a spectral enhancement system that includes a plurality of distributed filters, a plurality of energy-detection units, and a weighted-averaging unit. At least one of the distributed filters receives a multi-frequency input signal. Each of the plurality of energy-detection units is coupled to an output of at least one filter and provides an energy-detection output signal. The weighted-averaging unit is coupled to each of the energy-detection units and provides a weighted-averaging signal to each of the filters responsive to the energy-detection output signals from each of the energy-detection units to implement distributed gain control. In an embodiment, the energy detection units are coupled to the outputs of the filters via a plurality of differentiator units.
    Type: Grant
    Filed: July 23, 2003
    Date of Patent: August 19, 2008
    Assignee: Massachusetts Institute of Technology
    Inventors: Rahul Sarpeshkar, Lorenzo Turicchia
  • Patent number: 7406417
    Abstract: A neural network can be trained for synthesizing or recognizing speech with the aid of a database produced by automatically matching graphemes and phonemes. First, graphemes and phonemes are matched for words which have the same number of graphemes and phonemes. Next, graphemes and phonemes are matched for words that have more graphemes than phonemes in a series of steps that combine graphemes with preceding phonemes. Then, graphemes and phonemes are matched for words that have fewer graphemes than phonemes. After each step, infrequent and unsuccessful matches made in the preceding step are erased. After this process is completed, the database can be used to train the neural network, and graphemes, or letters of a text, can be converted into the corresponding phonemes with the aid of the trained neural network.
    Type: Grant
    Filed: August 29, 2000
    Date of Patent: July 29, 2008
    Assignee: Siemens Aktiengesellschaft
    Inventor: Horst-Udo Hain
  • Patent number: 7400651
    Abstract: A frequency interpolation apparatus is provided which reproduces a signal similar to an original signal by approximately recovering suppressed frequency components, from an input signal having the suppressed frequency components in a specific frequency band of the original signal. The input signal is divided into a plurality of signal component sets each having frequency components in a frequency band among a plurality of frequency bands, and a signal component set in the band with the suppressed signal components is synthesized from the plurality of divided signal component sets and added to the input signal. Each of the plurality of divided signal component sets is frequency-converted to a signal component set in the same frequency band, and the signal component set in the band with the suppressed signal components is synthesized through linear combination of the frequency-converted signal component sets.
    Type: Grant
    Filed: June 29, 2001
    Date of Patent: July 15, 2008
    Assignee: Kabushiki Kaisha Kenwood
    Inventor: Yasushi Sato
  • Patent number: 7365260
    Abstract: Music piece sequence data are composed of a plurality of event data which include performance event data and user event data designed for linking a voice to progression of a music piece. A plurality of voice data files are stored in a memory separately from the music piece sequence data. In music piece reproduction, the individual event data of the music piece sequence data are sequentially read out, and a tone signal is generated in response to each readout of the performance event data. In the meantime, a voice reproduction instruction is output in response to each readout of the user event data. In accordance with the voice reproduction instruction, a voice data file is selected from among the voice data files stored in the memory, and a voice signal is generated on the basis of each read-out voice data.
    Type: Grant
    Filed: December 16, 2003
    Date of Patent: April 29, 2008
    Assignee: Yamaha Corporation
    Inventor: Takahiro Kawashima
  • Patent number: 7346507
    Abstract: A method and apparatus for building a training set for an automated speech recognition-based system, which determines the statistically optimal number of frequently requested responses to automate in order to achieve a desired automation rate. The invention may be used to select the appropriate tokens and responses to train the system and to achieve a desired “phrase coverage” for all of the many different ways human beings may phrase a request that calls for one of a plurality of frequently-requested responses. The invention also determines the statistically optimal number of tokens (spoken requests) required to train a speech recognition-based system to achieve the desired phrase coverage and optimal allocation of tokens over the set of responses that are to be automated.
    Type: Grant
    Filed: June 4, 2003
    Date of Patent: March 18, 2008
    Assignee: BBN Technologies Corp.
    Inventors: Premkumar Natarajan, Rohit Prasad
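One simple reading of the optimization is a greedy sketch like the following, which picks the fewest responses whose combined call share reaches a target automation rate (the patent's statistical formulation of phrase coverage and token allocation is more involved):

```python
def responses_to_automate(freqs, target_rate):
    """Choose the fewest frequently-requested responses whose combined
    call share reaches the desired automation rate."""
    total = sum(freqs.values())
    chosen, covered = [], 0.0
    for resp, f in sorted(freqs.items(), key=lambda kv: -kv[1]):
        if covered / total >= target_rate:
            break
        chosen.append(resp)
        covered += f
    return chosen

calls = {"billing": 500, "hours": 300, "outage": 150, "other": 50}
print(responses_to_automate(calls, 0.80))   # ['billing', 'hours']
```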
  • Patent number: 7328157
    Abstract: Embodiments of the present invention pertain to adaptation of a corpus-driven general-purpose TTS system to at least one specific domain. The domain adaptation is realized by adding a limited amount of domain-specific speech that provides a maximum impact on improved perceived naturalness of speech. An approach for generating an optimized script for adaptation is proposed, the core of which is a dynamic programming based algorithm that segments the domain-specific corpus into a minimum number of segments that appear in the unit inventory. Increases in perceived naturalness of speech after adaptation are estimated from the generated script without recording speech from it.
    Type: Grant
    Filed: January 24, 2003
    Date of Patent: February 5, 2008
    Assignee: Microsoft Corporation
    Inventors: Min Chu, Hu Peng
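The core dynamic program, segmenting a phone sequence into the minimum number of pieces already present in the unit inventory, can be sketched directly:

```python
def min_segmentation(seq, inventory):
    """Dynamic program: split `seq` into the minimum number of pieces,
    each of which already exists in the TTS unit inventory."""
    n = len(seq)
    best = [None] * (n + 1)       # best[i] = (count, cut) covering seq[:i]
    best[0] = (0, 0)
    for i in range(1, n + 1):
        for j in range(i):
            piece = tuple(seq[j:i])
            if best[j] is not None and piece in inventory:
                cand = (best[j][0] + 1, j)
                if best[i] is None or cand[0] < best[i][0]:
                    best[i] = cand
    if best[n] is None:
        return None                # not coverable by the inventory
    cuts, i = [], n
    while i > 0:
        j = best[i][1]
        cuts.append(seq[j:i])
        i = j
    return cuts[::-1]

inventory = {("h", "e"), ("l",), ("l", "o"), ("h",), ("e", "l", "l", "o")}
print(min_segmentation(list("hello"), inventory))   # [['h'], ['e','l','l','o']]
```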
  • Patent number: 7328159
    Abstract: An improved system for an interactive voice recognition system (400) includes a voice prompt generator (401) for generating voice prompt in a first frequency band (501). A speech detector (406) detects presence of speech energy in a second frequency band (502). The first and second frequency bands (501, 502) are essentially conjugate frequency bands. A voice data generator (412) generates voice data based on an output of the voice prompt generator (401) and audible speech of a voice response generator (402). A control signal (422) controls the voice prompt generator (401) based on whether the speech detector (406) detects presence of speech energy in the second frequency band (502). A back end (405) of the interactive voice recognition system (400) is configured to operate on an extracted front end voice feature based on whether the speech detector (406) detects presence of speech energy in the second frequency band (502).
    Type: Grant
    Filed: January 15, 2002
    Date of Patent: February 5, 2008
    Assignee: Qualcomm Inc.
    Inventors: Chienchung Chang, Narendranath Malayath
  • Patent number: 7308407
    Abstract: A method for generating synthetic speech can include identifying a recording of conversational speech and creating a transcription of the conversational speech. Using the transcription, rather than a predefined script, the recording can be analyzed and acoustic units extracted. Each acoustic unit can include a phoneme and/or a sub-phoneme. The acoustic units can be stored so that a concatenative text-to-speech engine can later splice the acoustic units together to produce synthetic speech.
    Type: Grant
    Filed: March 3, 2003
    Date of Patent: December 11, 2007
    Assignee: International Business Machines Corporation
    Inventor: David E. Reich
  • Patent number: 7308408
    Abstract: A method and system for providing efficient menu services for an information processing system that uses a telephone or other form of audio user interface. In one embodiment, the menu services provide effective support for novice users by providing a full listing of available keywords and rotating house advertisements which inform novice users of potential features and information. For experienced users, cues are rendered so that at any time the user can say a desired keyword to invoke the corresponding application. The menu is flat to facilitate its usage. Full keyword listings are rendered after the user is given a brief cue to say a keyword. Service messages rotate words and word prosody. When listening to receive information from the user, after the user has been cued, soft background music or other audible signals are rendered to inform the user that a response may now be spoken to the service.
    Type: Grant
    Filed: September 29, 2004
    Date of Patent: December 11, 2007
    Assignee: Microsoft Corporation
    Inventors: Lisa Joy Stifelman, Hadi Partovi, Haleh Partovi, David Bryan Alpert, Matthew Talin Marx, Scott James Bailey, Kyle D. Sims, Darby McDonough Bailey, Roderick Steven Brathwaite, Eugene Koh, Angus Macdonald Davis
  • Patent number: 7280968
    Abstract: A method for digitally generating speech with improved prosodic characteristics can include receiving a speech input, determining at least one prosodic characteristic contained within the speech input, and generating a speech output that includes the prosodic characteristic.
    Type: Grant
    Filed: March 25, 2003
    Date of Patent: October 9, 2007
    Assignee: International Business Machines Corporation
    Inventor: Oscar J. Blass
  • Patent number: 7277856
    Abstract: A speech synthesis system for controlling a discontinuous distortion that occurs at the transition portion between concatenated phonemes, which are the speech units of a synthesized speech, using a smoothing technique, comprising: a discontinuous distortion processing means adapted to predict a discontinuity at the transition portion between concatenated samples of phonemes used for speech synthesis through a predetermined learning process, and to control a discontinuity at the transition portion between the concatenated phonemes of the synthesized speech such that it is smoothed adaptively in correspondence with the degree of the predicted discontinuity. A smoothing filter smoothes the synthesized speech so that the discontinuity degree of the synthesized speech follows the predicted discontinuity degree according to a filter coefficient (a) changed adaptively to correspond to the ratio of the predicted discontinuity degree to the real discontinuity degree.
    Type: Grant
    Filed: October 31, 2002
    Date of Patent: October 2, 2007
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Ki-seung Lee, Jeong-su Kim, Jae-won Lee
  • Patent number: 7266497
    Abstract: Systems and methods for automatically segmenting speech inventories. A set of Hidden Markov Models (HMMs) are initialized using bootstrap data. The HMMs are next re-estimated and aligned to produce phone labels. The phone boundaries of the phone labels are then corrected using spectral boundary correction. Optionally, this process of using the spectral-boundary-corrected phone labels as input instead of the bootstrap data is performed iteratively in order to further reduce mismatches between manual labels and phone labels assigned by the HMM approach.
    Type: Grant
    Filed: January 14, 2003
    Date of Patent: September 4, 2007
    Assignee: AT&T Corp.
    Inventors: Alistair D. Conkie, Yeon-Jun Kim
  • Patent number: 7233901
    Abstract: A system and computer-readable medium synthesize speech from text using a triphone unit selection database. The instructions on the computer-readable medium control a computing device to perform the steps of: receiving input text, selecting a plurality of N phoneme units from the triphone unit selection database as candidate phonemes for synthesized speech based on the input text, applying a cost process to select a set of phonemes from the candidate phonemes, and synthesizing speech using the selected set of phonemes.
    Type: Grant
    Filed: December 30, 2005
    Date of Patent: June 19, 2007
    Assignee: AT&T Corp.
    Inventor: Alistair D. Conkie
  • Patent number: 7171362
    Abstract: The assignment of phonemes to graphemes producing them in a lexicon having words (grapheme sequences) and their associated phonetic transcription (phoneme sequences) for the preparation of patterns for training neural networks for the purpose of grapheme-phoneme conversion is carried out with the aid of a variant of dynamic programming which is known as dynamic time warping (DTW).
    Type: Grant
    Filed: August 31, 2001
    Date of Patent: January 30, 2007
    Assignee: Siemens Aktiengesellschaft
    Inventor: Horst-Udo Hain
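A self-contained sketch of DTW-style grapheme-to-phoneme alignment; the `PLAUSIBLE` cost table is a toy stand-in for the learned local costs a real system would use:

```python
def dtw_align(graphemes, phonemes, cost):
    """Dynamic-time-warping alignment of a word's graphemes to its
    phonemes; returns the matched (grapheme, phoneme) pairs."""
    n, m = len(graphemes), len(phonemes)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = cost(graphemes[i - 1], phonemes[j - 1])
            D[i][j] = c + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    # Backtrack along the cheapest predecessors.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((graphemes[i - 1], phonemes[j - 1]))
        i, j = min([(i - 1, j), (i, j - 1), (i - 1, j - 1)],
                   key=lambda p: D[p[0]][p[1]])
    return path[::-1]

# Toy cost: 0 if the grapheme plausibly produces the phoneme, else 1.
PLAUSIBLE = {("c", "k"), ("a", "@"), ("t", "t")}
print(dtw_align(list("cat"), ["k", "@", "t"],
                lambda g, p: 0.0 if (g, p) in PLAUSIBLE else 1.0))
```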
  • Patent number: 7139712
    Abstract: A second phoneme is generated in consideration of a phonemic context with respect to a first phoneme as a search target. Phonemic piece data corresponding to the second phoneme is searched out from a database. A third phoneme is generated by changing the phonemic context on the basis of the search result, and phonemic piece data corresponding to the third phoneme is re-searched out from the database. The search or re-search result is registered in a table in correspondence with the second or third phoneme.
    Type: Grant
    Filed: March 5, 1999
    Date of Patent: November 21, 2006
    Assignee: Canon Kabushiki Kaisha
    Inventor: Masayuki Yamada
  • Patent number: 7124083
    Abstract: A system and method for improving the response time of text-to-speech synthesis utilizes “triphone contexts” (i.e., triplets comprising a central phoneme and its immediate context) as the basic unit, instead of performing phoneme-by-phoneme synthesis. The method generates a triphone preselection cost database for use in speech synthesis by 1) selecting a triphone sequence u1-u2-u3, 2) calculating a preselection cost for each 5-phoneme sequence ua-u1-u2-u3-ub, where u2 is allowed to match any identically labeled phoneme in a database and the units ua and ub vary over the entire phoneme universe, and 3) storing a group of the selected triphone sequences exhibiting the lowest costs in the triphone preselection cost database.
    Type: Grant
    Filed: November 5, 2003
    Date of Patent: October 17, 2006
    Assignee: AT&T Corp.
    Inventor: Alistair D. Conkie
  • Patent number: 7120584
    Abstract: A method and system for synthesizing audio speech is provided. A synthesis engine receives compressed and normalized speech units and prosodic information from a host. The synthesis engine decompresses the data and synthesizes audio signals. The synthesis engine can be implemented on a digital signal processing system that meets low-resource requirements (i.e., low power consumption and low memory usage), such as a DSP system including an input/output module, a WOLA filterbank, and a DSP core that operate in parallel.
    Type: Grant
    Filed: October 22, 2002
    Date of Patent: October 10, 2006
    Assignee: AMI Semiconductor, Inc.
    Inventors: Hamid Sheikhzadeh-Nadjar, Etienne Cornu, Robert L. Brennan
  • Patent number: 7082396
    Abstract: A speech synthesis system can select recorded speech fragments, or acoustic units, from a very large database of acoustic units to produce artificial speech. The selected acoustic units are chosen to minimize a combination of target and concatenation costs for a given sentence. However, as concatenation costs, which are measures of the mismatch between sequential pairs of acoustic units, are expensive to compute, processing can be greatly reduced by pre-computing and caching the concatenation costs. Unfortunately, the number of possible sequential pairs of acoustic units makes such caching prohibitive. However, statistical experiments reveal that while about 85% of the acoustic units are typically used in common speech, less than 1% of the possible sequential pairs of acoustic units occur in practice.
    Type: Grant
    Filed: December 19, 2003
    Date of Patent: July 25, 2006
    Assignee: AT&T Corp
    Inventors: Mark C. Beutnagel, Mehryar Mohri, Michael D. Riley
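The caching strategy follows naturally once costs are computed lazily for only the pairs that occur; a minimal sketch, with `join_cost` stubbed as a pitch difference:

```python
def join_cost(unit_a, unit_b):
    """Stand-in for an expensive spectral mismatch measure between two
    acoustic units (stubbed as a pitch difference)."""
    return abs(unit_a["pitch"] - unit_b["pitch"])

class ConcatCostCache:
    """Cache join costs only for unit pairs that actually occur,
    exploiting the observation that under 1% of possible pairs arise."""
    def __init__(self):
        self._costs = {}
    def cost(self, a, b):
        key = (a["id"], b["id"])
        if key not in self._costs:
            self._costs[key] = join_cost(a, b)   # computed once, reused
        return self._costs[key]

u1, u2 = {"id": 7, "pitch": 118.0}, {"id": 9, "pitch": 131.0}
cache = ConcatCostCache()
cache.cost(u1, u2)
print(cache.cost(u1, u2), len(cache._costs))   # 13.0 1
```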
  • Patent number: 7076426
    Abstract: An enhanced system is achieved by allowing bookmarks which can specify that the stream of bits that follow corresponds to phonemes and a plurality of prosody information, including duration information, that is specified for times within the duration of the phonemes. Illustratively, such a stream comprises a flag to enable a duration flag, a flag to enable a pitch contour flag, a flag to enable an energy contour flag, a specification of the number of phonemes that follow, and, for each phoneme, one or more sets of specific prosody information that relates to the phoneme, such as a set of pitch values and their durations.
    Type: Grant
    Filed: January 27, 1999
    Date of Patent: July 11, 2006
    Assignee: AT&T Corp.
    Inventors: Mark Charles Beutnagel, Joern Ostermann, Schuyler Reynier Quackenbush
  • Patent number: 7069217
    Abstract: A synthesizer is disclosed in which a speech waveform is synthesized by selecting a synthetic starting waveform segment and then generating a sequence of further segments. The further waveform segments are generated based jointly upon the value of the immediately-preceding segment and upon a model of the dynamics of an actual sound similar to that being generated. In particular, a method is disclosed of synthesizing a voiced speech sound, comprising calculating each new output value from the previous output value using data modeling the evolution, over a short time interval, of the voiced speech sound to be synthesized. This sequential generation of waveform segments enables a synthesized sequence of speech waveforms to be generated of any duration. In addition, a low-dimensional state space representation of speech signals is used in which successive pitch pulse cycles are superimposed to estimate the progression of the cyclic speech signal within each cycle.
    Type: Grant
    Filed: January 9, 1997
    Date of Patent: June 27, 2006
    Assignee: British Telecommunications PLC
    Inventors: Stephen McLaughlin, Michael Banbrook
  • Patent number: 7062440
    Abstract: A speech system has a speech input channel including a speech recognizer, and a speech output channel including a text-to-speech converter. Associated with the input channel is a barge-in control for setting barge-in behavior parameters determining how the apparatus handles barge-in by a user during speech output by the apparatus. In order to make the barge-in control more responsive to the actual speech output from the output channel, a barge-in prediction arrangement is provided that is responsive to feature values produced during the operation of the text-to-speech converter to produce indications as to the most likely barge-in points. The barge-in control is responsive to these indications to adjust at least one of the barge-in behavior parameters for periods corresponding to the most likely barge-in points.
    Type: Grant
    Filed: May 31, 2002
    Date of Patent: June 13, 2006
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Paul St John Brittan, Roger Cecil Ferry Tucker
  • Patent number: 7043422
    Abstract: A method and apparatus are provided for adapting a language model to a task-specific domain. Under the method and apparatus, the relative frequency of n-grams in a small training set (i.e. task-specific training data set) and the relative frequency of n-grams in a large training set (i.e. out-of-domain training data set) are used to weight a distribution count of n-grams in the large training set. The weighted distributions are then used to form a modified language model by identifying probabilities for n-grams from the weighted distributions.
    Type: Grant
    Filed: September 4, 2001
    Date of Patent: May 9, 2006
    Assignee: Microsoft Corporation
    Inventors: Jianfeng Gao, Mingjing Li
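A sketch of the count weighting, using the ratio of in-domain to out-of-domain relative frequency as the weight; the patent's exact weighting scheme is not reproduced in the abstract:

```python
from collections import Counter

def adapted_counts(big_corpus, small_corpus, n=2):
    """Weight each n-gram count from the large out-of-domain corpus by
    the ratio of its relative frequency in the task-specific corpus to
    its relative frequency in the large corpus."""
    grams = lambda toks: Counter(zip(*(toks[i:] for i in range(n))))
    big, small = grams(big_corpus), grams(small_corpus)
    B, S = sum(big.values()), sum(small.values())
    out = {}
    for g, c in big.items():
        rel_big = c / B
        rel_small = small.get(g, 0) / S if S else 0.0
        out[g] = c * (rel_small / rel_big if rel_big else 0.0)
    return out

big = "book a flight to boston book a room in boston".split()
small = "book a flight book a flight to denver".split()
print(adapted_counts(big, small)[("book", "a")])   # count boosted in-domain
```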
  • Patent number: 7031919
    Abstract: A speech synthesizing apparatus for synthesizing a speech waveform stores speech data, which is obtained by adding attribute information onto phoneme data, in a database. In accordance with prescribed retrieval conditions, a phoneme retrieval unit retrieves phoneme data from the speech data that has been stored in the database and retains the retrieved results in a retrieved-result storage area. A processing unit for assigning a power penalty and a processing unit for assigning a phoneme-duration penalty assign the penalties, on the basis of power and phoneme duration constituting the attribute information, to a set of phoneme data stored in the retrieved-result storage area. A processing unit for determining typical phoneme data performs sorting on the basis of the assigned penalties and, based upon the stored results, selects phoneme data to be employed in the synthesis of a speech waveform.
    Type: Grant
    Filed: August 30, 1999
    Date of Patent: April 18, 2006
    Assignee: Canon Kabushiki Kaisha
    Inventors: Yasuo Okutani, Masayuki Yamada
  • Patent number: 7016841
    Abstract: A singing voice synthesizing apparatus is provided, which enables achievement of a natural-sounding synthesized singing voice with a good level of comprehensibility. A phoneme database stores a plurality of voice fragment data formed of voice fragments each being a single phoneme or a phoneme chain of at least two concatenated phonemes, each of the plurality of voice fragment data comprising data of a deterministic component and data of a stochastic component. A readout device reads out from the phoneme database the voice fragment data corresponding to inputted lyrics. A duration time adjusting device adjusts the time duration of the read-out voice fragment data so as to match a desired tempo and manner of singing. An adjusting device adjusts the deterministic component and the stochastic component of the read-out voice fragment so as to match a desired pitch.
    Type: Grant
    Filed: December 27, 2001
    Date of Patent: March 21, 2006
    Assignee: Yamaha Corporation
    Inventors: Hideki Kenmochi, Xavier Serra, Jordi Bonada
  • Patent number: 7013278
    Abstract: A method for generating concatenative speech uses a speech synthesis input to populate a triphone-indexed database that is later used for searching and retrieval to create a phoneme string acceptable for a text-to-speech operation. Prior to initiating the “real time” synthesis process, a database is created of all possible triphone contexts by inputting a continuous stream of speech. The speech data is then analyzed to identify all possible triphone sequences in the stream, and the various units chosen for each context. During a later text-to-speech operation, the triphone contexts in the text are identified and the triphone-indexed phonemes in the database are searched to retrieve the best-matched candidates.
    Type: Grant
    Filed: September 5, 2002
    Date of Patent: March 14, 2006
    Assignee: AT&T Corp.
    Inventor: Alistair D. Conkie
  • Patent number: 7003461
    Abstract: An adaptive codebook search (ACS) algorithm is based on a set of matrix operations suitable for data processing engines supporting a single instruction multiple data (SIMD) architecture. The result is a reduction in memory access and increased parallelism to produce an overall improvement in the computational efficiency of ACS processing.
    Type: Grant
    Filed: July 9, 2002
    Date of Patent: February 21, 2006
    Assignee: Renesas Technology Corporation
    Inventor: Clifford Tavares
  • Patent number: 6970820
    Abstract: The speech synthesizer is personalized to sound like or mimic the speech characteristics of an individual speaker. The individual speaker provides a quantity of enrollment data, which can be extracted from a short quantity of speech, and the system modifies the base synthesis parameters to more closely resemble those of the new speaker. More specifically, the synthesis parameters may be decomposed into speaker dependent parameters, such as context-independent parameters, and speaker independent parameters, such as context dependent parameters. The speaker dependent parameters are adapted using enrollment data from the new speaker. After adaptation, the speaker dependent parameters are combined with the speaker independent parameters to provide a set of personalized synthesis parameters.
    Type: Grant
    Filed: February 26, 2001
    Date of Patent: November 29, 2005
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Jean-Claude Junqua, Florent Perronnin, Roland Kuhn, Patrick Nguyen
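Conceptually, the adaptation updates only the speaker-dependent parameters and recombines them with the shared ones; a toy sketch with invented parameter vectors and a simple interpolation rate:

```python
import numpy as np

def personalize(base_dependent, base_independent, enrolled, rate=0.3):
    """Shift only the speaker-dependent synthesis parameters toward
    statistics estimated from the new speaker's enrollment data, then
    recombine with the shared speaker-independent parameters."""
    adapted = base_dependent + rate * (enrolled - base_dependent)
    return np.concatenate([adapted, base_independent])

base_dep   = np.array([120.0, 0.80])   # e.g. mean pitch, speaking rate
base_indep = np.array([0.40, 0.10])    # context-dependent, shared across speakers
enrolled   = np.array([95.0, 0.70])    # estimated from a short enrollment sample
print(personalize(base_dep, base_indep, enrolled))
```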
  • Patent number: 6959277
    Abstract: In a conventional device for extracting voice features accurately without being influenced by noise, such as a voice recognition device, an input voice signal is usually processed first by a noise reduction system having tap length N, the result is FFT-processed with L points, and the power spectrum vector is then calculated; accordingly, a single operation requires N multiplications and (N−1) summations. The voice feature extraction device according to the invention receives a voice signal including noise from a microphone, which is processed by a window function operation unit and thereafter FFT-processed by an FFT operation unit with L points. A power calculation unit calculates a power spectrum vector of the input voice signal. The noise reduction system instead determines its filter coefficient in advance and processes the coefficient to calculate a noise reduction coefficient, by which the power spectrum vector is processed.
    Type: Grant
    Filed: June 26, 2001
    Date of Patent: October 25, 2005
    Assignee: Alpine Electronics, Inc.
    Inventors: Shingo Kiuchi, Toshiaki Asano, Nozomu Saito
  • Patent number: 6876968
    Abstract: A method and system provide for run-time modification of synthesized speech. The method includes the step of generating synthesized speech based on textual input and a plurality of run-time control parameter values. Real-time data is generated based on an input signal, where the input signal characterizes an intelligibility of the speech with regard to a listener. The method further provides for modifying one or more of the run-time control parameter values based on the real-time data such that the intelligibility of the speech increases. Modifying the parameter values at run-time as opposed to during the design stages provides a level of adaptation unachievable through conventional approaches.
    Type: Grant
    Filed: March 8, 2001
    Date of Patent: April 5, 2005
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventor: Peter Veprek
  • Patent number: 6847932
    Abstract: Given phonetic information is divided into speech units of extended CV, each of which is a contiguous sequence of phonemes, without clear boundaries, containing one or more vowels. The contour of the vocal tract transmission function for each phoneme of the extended-CV speech unit is obtained from a phoneme directory, which contains a contour of the vocal tract transmission function of each phoneme associated with phonetic information in units of extended CV. Speech waveform data is generated based on these contours, and the speech waveform data is converted into an analog voice signal.
    Type: Grant
    Filed: September 28, 2000
    Date of Patent: January 25, 2005
    Assignee: Arcadia, Inc.
    Inventors: Kazuyuki Ashimura, Seiichi Tenpaku
  • Patent number: 6847931
    Abstract: A preferred embodiment of a method for converting text to speech using a computing device having a memory is disclosed. Text, made up of a plurality of words, is received into the memory of the computing device. A plurality of phonemes are derived from the text. Each of the phonemes is associated with a prosody record based on a database of prosody records associated with a plurality of words. A first set of artificial intelligence rules is applied to determine context information associated with the text, and the context-influenced prosody changes for each of the phonemes are determined. Then a second set of rules, based on Lessac theory, is applied to determine Lessac-derived prosody changes for each of the phonemes. The prosody record for each of the phonemes is amended in response to the context-influenced prosody changes and the Lessac-derived prosody changes. Finally, sound information associated with the phonemes is read from the memory.
    Type: Grant
    Filed: January 29, 2002
    Date of Patent: January 25, 2005
    Assignee: Lessac Technology, Inc.
    Inventors: Edwin R. Addison, H. Donald Wilson, Gary Marple, Anthony H. Handal, Nancy Krebs
  • Patent number: 6845358
    Abstract: A prosody matching template in the form of a tree structure stores indices which point to lookup table and template information prescribing pitch and duration values that are used to add inflection to the output of a text-to-speech synthesizer. The lookup module employs a search algorithm that explores each branch of the tree, assigning penalty scores based on whether the syllable represented by a node of the tree does or does not match the corresponding syllable of the target word. The path with the lowest penalty score is selected as the index into the prosody template table. The system will add nodes by cloning existing nodes in cases where it is not possible to find a one-to-one match between the number of syllables in the target word and the number of nodes in the tree.
    Type: Grant
    Filed: January 5, 2001
    Date of Patent: January 18, 2005
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Nicholas Kibre, Ted H. Applebaum
  • Patent number: 6845359
    Abstract: A Fast Fourier Transform (FFT) based voice synthesis method 110, program product and vocoder. Sounds, e.g., speech and audio, are synthesized from multiple sine waves. Each sine wave component is represented by a small number of FFT coefficients 116. Amplitude 120 and phase 124 information of the components may be incorporated into these coefficients. The FFT coefficients corresponding to each of the components are summed 126 and, then, an inverse FFT is applied 128 to the sum to generate a time domain signal. An appropriate section is extracted 130 from the inverse transformed time domain signal as an approximation to the desired output. FFT based synthesis 110 may be combined with simple sine wave summation 100, using FFT based synthesis 110 for complex sounds, e.g., male voices and unvoiced speech, and sine wave summation 100 for simpler sounds, e.g., female voices.
    Type: Grant
    Filed: March 22, 2001
    Date of Patent: January 18, 2005
    Assignee: Motorola, Inc.
    Inventor: Tenkasi Ramabadran
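The bin-placement trick can be sketched with NumPy: each sine component contributes a conjugate-symmetric pair of FFT coefficients, the coefficients are summed, and one inverse FFT produces all samples at once (frequencies are rounded to the nearest bin here; a real vocoder would spread each component over a few bins and overlap-add successive frames):

```python
import numpy as np

def fft_synthesize(components, L=1024, fs=8000):
    """Synthesize a sum of sine waves by placing each component's
    amplitude and phase into FFT bins, summing, and inverse-transforming,
    instead of evaluating every sinusoid sample by sample."""
    spec = np.zeros(L, dtype=complex)
    for freq, amp, phase in components:
        k = round(freq * L / fs)              # nearest bin (an approximation)
        spec[k]  += 0.5 * amp * L * np.exp(1j * phase)
        spec[-k] += 0.5 * amp * L * np.exp(-1j * phase)   # conjugate partner
    return np.fft.ifft(spec).real             # time-domain approximation

# 200 Hz and 350 Hz components with different amplitudes and phases.
y = fft_synthesize([(200.0, 1.0, 0.0), (350.0, 0.5, np.pi / 4)])
print(y[:4])
```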