Specialized Model Patents (Class 704/266)
-
Patent number: 7555433
Abstract: A main controller feeds a spelling translator with a text item representing a place name stored in a map database. The spelling translator translates the spelling of the text item according to rules described in a translation rule table. The spelling translator translates, e.g., a French character or string included in the text item and not included in the English alphabet into an English alphabet character or string having a pronunciation equivalent or similar to the pronunciation of the French character or string. The translated text item is fed into a TTS engine for English. The TTS engine converts the text item into voice, which is output from a speaker.
Type: Grant
Filed: July 7, 2003
Date of Patent: June 30, 2009
Assignee: Alpine Electronics, Inc.
Inventor: Michiaki Otani
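The rule-table idea above can be sketched in a few lines. This is an illustrative sketch only; the rule table and function names are hypothetical, and a real system would map multi-character strings and context-dependent spellings, not just single characters.

```python
# Hypothetical rule table: non-English characters mapped to English
# spellings with a similar pronunciation, so an English-only TTS engine
# can pronounce the translated text.
TRANSLATION_RULES = {
    "é": "ay", "è": "e", "ê": "e", "ç": "s",
    "à": "a", "ô": "o", "û": "oo", "ü": "u",
}

def translate_spelling(text: str, rules: dict = TRANSLATION_RULES) -> str:
    """Replace characters found in the rule table; pass all others through."""
    return "".join(rules.get(ch, ch) for ch in text)

print(translate_spelling("Café Français"))  # Cafay Fransais
```

The translated string can then be handed directly to an English TTS engine, as the abstract describes.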
-
Patent number: 7546241
Abstract: In a speech synthesis process, micro-segments are cut from acquired waveform data using a window function. The obtained micro-segments are re-arranged to implement a desired prosody, and superposed data is generated by superposing the re-arranged micro-segments, so as to obtain synthetic speech waveform data. A spectrum correction filter is formed based on the acquired waveform data. At least one of the waveform data, micro-segments, and superposed data is corrected using the spectrum correction filter. In this way, “blur” of a speech spectrum due to the window function applied to obtain micro-segments is reduced, and speech synthesis with high sound quality is realized.
Type: Grant
Filed: June 2, 2003
Date of Patent: June 9, 2009
Assignee: Canon Kabushiki Kaisha
Inventors: Masayuki Yamada, Yasuhiro Komori, Toshiaki Fukada
-
Patent number: 7519535
Abstract: A voice decoder configured to receive a sequence of frames, each of the frames having voice parameters. The voice decoder includes a speech generator that generates speech from the voice parameters. A frame erasure concealment module is configured to reconstruct the voice parameters for a frame erasure in the sequence of frames from the voice parameters in one of the previous frames and the voice parameters in one of the subsequent frames.
Type: Grant
Filed: January 31, 2005
Date of Patent: April 14, 2009
Assignee: QUALCOMM Incorporated
Inventor: Serafin Diaz Spindola
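Reconstructing an erased frame from its neighbours can be sketched as a simple interpolation. The parameter names below are illustrative assumptions, not taken from the patent, and a real decoder interpolates codec-specific parameters rather than a flat dictionary.

```python
# Sketch of frame-erasure concealment: each voice parameter of the missing
# frame is linearly interpolated between the previous and a subsequent frame.
def conceal_erasure(prev_params: dict, next_params: dict, alpha: float = 0.5) -> dict:
    """alpha=0 reproduces the previous frame, alpha=1 the subsequent frame."""
    return {k: (1 - alpha) * prev_params[k] + alpha * next_params[k]
            for k in prev_params}

frame_before = {"pitch": 100.0, "gain": 0.8}   # hypothetical parameters
frame_after = {"pitch": 120.0, "gain": 0.6}
print(conceal_erasure(frame_before, frame_after))
```

Using both a previous and a subsequent frame, as the abstract describes, avoids the plateau artifacts of simple repeat-last-frame concealment.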
-
Patent number: 7502739
Abstract: In generation of an intonation pattern for speech synthesis, a speech synthesis system is capable of providing highly natural speech and of reproducing the speech characteristics of a speaker flexibly and accurately by effectively utilizing F0 patterns of actual speech accumulated in a database. An intonation generation method generates an intonation of synthesized speech for text by estimating an outline of the intonation based on language information of the text, and then selecting an optimum intonation pattern, based on the estimated outline, from a database which stores intonation patterns of actual speech. Speech characteristics recorded in advance are reflected in the estimation of the outline of the intonation pattern and in the selection of a waveform element of the speech.
Type: Grant
Filed: January 24, 2005
Date of Patent: March 10, 2009
Assignee: International Business Machines Corporation
Inventors: Takashi Saito, Masaharu Sakamoto
-
Publication number: 20090063153
Abstract: A system and method for generating a synthetic text-to-speech (TTS) voice are disclosed. A user is presented with at least one TTS voice and at least one voice characteristic. A new synthetic TTS voice is generated by blending a plurality of existing TTS voices according to the selected voice characteristics. The blending of voices involves interpolating segmented parameters of each TTS voice. Segmented parameters may be, for example, prosodic characteristics of the speech such as pitch, volume, phone durations, accents, stress, mis-pronunciations and emotion.
Type: Application
Filed: November 4, 2008
Publication date: March 5, 2009
Applicant: AT&T Corp.
Inventors: David A. Kapilow, Kenneth H. Rosen, Juergen Schroeter
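The interpolation of segmented parameters can be sketched as a weighted blend of two voices' prosodic parameters. The parameter names and the simple linear blend are assumptions for illustration; the publication covers blending an arbitrary number of voices.

```python
# Sketch of TTS voice blending: prosodic parameters of two existing voices
# are linearly interpolated to produce a new synthetic voice.
def blend_voices(voice_a: dict, voice_b: dict, weight: float) -> dict:
    """weight=0 yields voice_a's parameters, weight=1 yields voice_b's."""
    return {p: (1 - weight) * voice_a[p] + weight * voice_b[p] for p in voice_a}

# hypothetical per-segment prosodic parameters
calm = {"pitch_hz": 110.0, "phone_dur_ms": 90.0, "volume": 0.6}
bright = {"pitch_hz": 190.0, "phone_dur_ms": 70.0, "volume": 0.9}
print(blend_voices(calm, bright, 0.25))
```

A user-selected voice characteristic would map to the blend weight (or a vector of per-parameter weights) in such a scheme.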
-
Patent number: 7487093
Abstract: In a voice synthesis apparatus, by bounding a desired range of input text to be output by, e.g., a start tag <morphing type="emotion" start="happy" end="angry"> and an end tag </morphing>, a feature of synthetic voice is continuously changed, gradually morphing the voice from a happy voice to an angry voice upon outputting synthetic voice.
Type: Grant
Filed: August 10, 2004
Date of Patent: February 3, 2009
Assignee: Canon Kabushiki Kaisha
Inventors: Masahiro Mutsuno, Toshiaki Fukada
-
Patent number: 7483832
Abstract: A method and system of customizing voice translation of a text to speech includes digitally recording speech samples of a known speaker, correlating each of the speech samples with a standardized audio representation, and organizing the recorded speech samples and correlated audio representations into a collection. The collection of speech samples correlated with audio representations is saved as a single voice file and stored in a device capable of translating the text to speech. The voice file is applied to a translation of text to speech so that the translated speech is customized according to the applied voice file.
Type: Grant
Filed: December 10, 2001
Date of Patent: January 27, 2009
Assignee: AT&T Intellectual Property I, L.P.
Inventor: Steve Tischer
-
Patent number: 7472061
Abstract: Systems and methods are provided for automatically building a native phonetic lexicon for a speech-based application trained to process a native (base) language, wherein the native phonetic lexicon includes native phonetic transcriptions (base forms) for non-native (foreign) words which are automatically derived from non-native phonetic transcriptions of the non-native words.
Type: Grant
Filed: March 31, 2008
Date of Patent: December 30, 2008
Assignee: International Business Machines Corporation
Inventors: Neal Alewine, Eric Janke, Paul Sharp, Roberto Sicconi
-
Patent number: 7472065
Abstract: Converting marked-up text into a synthesized stream includes providing marked-up text to a processor-based system, converting the marked-up text into a text stream including vocabulary items, retrieving audio segments corresponding to the vocabulary items, concatenating the audio segments to form a synthesized stream, and audibly outputting the synthesized stream. The marked-up text includes a normal text and a paralinguistic text, which are differentiated by using a grammar constraint. The paralinguistic text is associated with more than one audio segment, and retrieving the plurality of audio segments includes selecting one audio segment associated with the paralinguistic text.
Type: Grant
Filed: June 4, 2004
Date of Patent: December 30, 2008
Assignee: International Business Machines Corporation
Inventors: Andrew S. Aaron, Raimo Bakis, Ellen M. Eide, Wael Hamza
-
Patent number: 7472066
Abstract: An automatic speech segmentation and verification system and method is disclosed, which has a known text script and a recorded speech corpus corresponding to the known text script. A speech unit segmentor segments the recorded speech corpus into N test speech unit segments referring to the phonetic information of the known text script. Then, a segmental verifier is applied to obtain a confidence measure of syllable segmentation for verifying the correctness of the cutting points of test speech unit segments. A phonetic verifier obtains a confidence measure of syllable verification by using verification models for verifying whether the recorded speech corpus is correctly recorded. Finally, a speech unit inspector integrates the confidence measure of syllable segmentation and the confidence measure of syllable verification to determine whether the test speech unit segment is accepted or not.
Type: Grant
Filed: February 23, 2004
Date of Patent: December 30, 2008
Assignee: Industrial Technology Research Institute
Inventors: Chih-Chung Kuo, Chi-Shiang Kuo, Jau-Hung Chen
-
Patent number: 7464034
Abstract: A voice converting apparatus is constructed for converting an input voice into an output voice according to a target voice. The apparatus includes a storage section, an analyzing section including a characteristic analyzer, a producing section, a synthesizing section, a memory, an alignment processor, and a target decoder.
Type: Grant
Filed: September 27, 2004
Date of Patent: December 9, 2008
Assignees: Yamaha Corporation, Pompeu Fabra University
Inventors: Takahiro Kawashima, Yasuo Yoshioka, Pedro Cano, Alex Loscos, Xavier Serra, Mark Schiementz, Jordi Bonada
-
Patent number: 7460997
Abstract: A system and method for improving the response time of text-to-speech synthesis utilizes “triphone contexts” (i.e., triplets comprising a central phoneme and its immediate context) as the basic unit, instead of performing phoneme-by-phoneme synthesis. The method generates a triphone preselection cost database for use in speech synthesis by 1) selecting a triphone sequence u1-u2-u3, 2) calculating a preselection cost for each 5-phoneme sequence ua-u1-u2-u3-ub, where u2 is allowed to match any identically labeled phoneme in a database and the units ua and ub vary over the entire phoneme universe, and 3) storing a group of the selected triphone sequences exhibiting the lowest costs in a triphone preselection cost database.
Type: Grant
Filed: August 22, 2006
Date of Patent: December 2, 2008
Assignee: AT&T Intellectual Property II, L.P.
Inventor: Alistair D. Conkie
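The three numbered steps can be sketched as follows. The toy phoneme set and the placeholder cost function are assumptions; the patented method uses real target/concatenation costs over a full phoneme inventory.

```python
import itertools

PHONES = ["a", "b", "k", "s", "t"]  # toy phoneme universe (illustrative)

def preselection_cost(ua, u1, u2, u3, ub):
    """Placeholder cost: how many context phones differ from the centre phone."""
    return sum(p != u2 for p in (ua, u1, u3, ub))

def build_triphone_db(triphones, keep=3):
    """For each triphone u1-u2-u3, score every 5-phoneme context
    ua-u1-u2-u3-ub and keep only the lowest-cost group."""
    db = {}
    for (u1, u2, u3) in triphones:
        costs = [((ua, u1, u2, u3, ub), preselection_cost(ua, u1, u2, u3, ub))
                 for ua, ub in itertools.product(PHONES, repeat=2)]
        costs.sort(key=lambda entry: entry[1])
        db[(u1, u2, u3)] = costs[:keep]
    return db

db = build_triphone_db([("k", "a", "t")])
print(db[("k", "a", "t")][0])
```

Precomputing such a table lets the synthesizer skip most candidate units at runtime, which is the response-time gain the abstract claims.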
-
Patent number: 7454348
Abstract: A system and method for generating a synthetic text-to-speech (TTS) voice are disclosed. A user is presented with at least one TTS voice and at least one voice characteristic. A new synthetic TTS voice is generated by blending a plurality of existing TTS voices according to the selected voice characteristics. The blending of voices involves interpolating segmented parameters of each TTS voice. Segmented parameters may be, for example, prosodic characteristics of the speech such as pitch, volume, phone durations, accents, stress, mis-pronunciations and emotion.
Type: Grant
Filed: January 8, 2004
Date of Patent: November 18, 2008
Assignee: AT&T Intellectual Property II, L.P.
Inventors: David A. Kapilow, Kenneth H. Rosen, Juergen Schroeter
-
Patent number: 7454341
Abstract: According to one aspect of the invention, a method is provided in which a mean vector set and a variance vector set of a set of N Gaussians are divided into multiple mean sub-vector sets and variance sub-vector sets, respectively. Each mean sub-vector set contains a subset of the dimensions of the corresponding mean vector set and each variance sub-vector set contains a subset of the dimensions of the corresponding variance vector set. Each resultant sub-vector set is clustered to build a codebook for the respective sub-vector set using a modified K-means clustering process which dynamically merges and splits clusters based upon the size and average distortion of each cluster during each iteration in the modified K-means clustering process.
Type: Grant
Filed: September 30, 2000
Date of Patent: November 18, 2008
Assignee: Intel Corporation
Inventors: Jielin Pan, Baosheng Yuan
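The sub-vector split and per-sub-vector codebook construction can be sketched as below. This sketch uses plain k-means; the patent's dynamic merging and splitting of clusters based on cluster size and average distortion is deliberately omitted, and all data here is synthetic.

```python
import numpy as np

def split_subvectors(means: np.ndarray, dims_per_sub: int):
    """Split an (N, D) set of mean vectors into (N, dims_per_sub) sub-vector sets."""
    return [means[:, i:i + dims_per_sub]
            for i in range(0, means.shape[1], dims_per_sub)]

def kmeans(x: np.ndarray, k: int, iters: int = 20, seed: int = 0):
    """Plain k-means; stands in for the patent's modified merge/split variant."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((x[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(assign == j):
                centroids[j] = x[assign == j].mean(axis=0)
    return centroids

# synthetic Gaussian means: two well-separated groups in 6 dimensions
means = np.vstack([np.zeros((10, 6)), np.ones((10, 6)) * 5])
codebooks = [kmeans(sub, k=2) for sub in split_subvectors(means, dims_per_sub=2)]
print(len(codebooks), codebooks[0].shape)
```

Quantizing each sub-vector independently yields much smaller codebooks than quantizing full vectors, which is the usual motivation for this decomposition.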
-
Patent number: 7454345
Abstract: A voice synthesizer obtains a voice by emphasizing a specific part of a sentence. It includes an emphasis degree deciding unit that extracts a word or a collocation to be emphasized from among the words or collocations included in a sentence, on the basis of an extracting reference for each word or collocation, and decides an emphasis degree for the extracted word or collocation; and an acoustic processing unit that synthesizes a voice in which the emphasis degree decided by the emphasis degree deciding unit is applied to the word or collocation to be emphasized. The emphasized part of the word or collocation can thereby be obtained automatically on the basis of the extracting reference, such as a frequency of appearance or a level of importance of the word or collocation.
Type: Grant
Filed: February 23, 2005
Date of Patent: November 18, 2008
Assignee: Fujitsu Limited
Inventors: Hitoshi Sasaki, Yasushi Yamazaki, Yasuji Ota, Kaori Endo, Nobuyuki Katae, Kazuhiro Watanabe
-
Patent number: 7451087
Abstract: A method for converting text to concatenated voice by utilizing a digital voice library and a set of playback rules is provided. The method includes receiving and expanding text data to form a sequence of text and pseudo words. The sequence of text and pseudo words is converted into a sequence of speech items, and the sequence of speech items is converted into a sequence of voice recordings. The method includes generating voice data on the sequence of voice recordings by concatenating adjacent recordings in the sequence of voice recordings.
Type: Grant
Filed: March 27, 2001
Date of Patent: November 11, 2008
Assignee: Qwest Communications International Inc.
Inventors: Eliot M. Case, Richard P. Phillips
-
Publication number: 20080201150
Abstract: A conversion rule and a rule selection parameter are stored. The conversion rule converts a spectral parameter of a source speaker to a spectral parameter of a target speaker. The rule selection parameter represents the spectral parameter of the source speaker. A first conversion rule of start timing and a second conversion rule of end timing in a speech unit of the source speaker are selected by the spectral parameter of the start timing and the end timing. An interpolation coefficient corresponding to the spectral parameter of each timing in the speech unit is calculated by the first conversion rule and the second conversion rule. A third conversion rule corresponding to the spectral parameter of each timing in the speech unit is calculated by interpolating the first conversion rule and the second conversion rule with the interpolation coefficient. The spectral parameter of each timing is converted to a spectral parameter of the target speaker by the third conversion rule.
Type: Application
Filed: January 22, 2008
Publication date: August 21, 2008
Applicant: KABUSHIKI KAISHA TOSHIBA
Inventors: Masatsune Tamura, Takehiro Kagoshima
-
Patent number: 7415118
Abstract: In accordance with an embodiment, the invention provides a spectral enhancement system that includes a plurality of distributed filters, a plurality of energy-detection units, and a weighted-averaging unit. At least one of the distributed filters receives a multi-frequency input signal. Each of the plurality of energy-detection units is coupled to an output of at least one filter and provides an energy-detection output signal. The weighted-averaging unit is coupled to each of the energy-detection units and provides a weighted-averaging signal to each of the filters responsive to the energy-detection output signals from each of the energy-detection units to implement distributed gain control. In an embodiment, the energy-detection units are coupled to the outputs of the filters via a plurality of differentiator units.
Type: Grant
Filed: July 23, 2003
Date of Patent: August 19, 2008
Assignee: Massachusetts Institute of Technology
Inventors: Rahul Sarpeshkar, Lorenzo Turicchia
-
Patent number: 7406417
Abstract: A neural network can be trained for synthesizing or recognizing speech with the aid of a database produced by automatically matching graphemes and phonemes. First, graphemes and phonemes are matched for words which have the same number of graphemes and phonemes. Next, graphemes and phonemes are matched for words that have more graphemes than phonemes, in a series of steps that combine graphemes with preceding phonemes. Then, graphemes and phonemes are matched for words that have fewer graphemes than phonemes. After each step, infrequent and unsuccessful matches made in the preceding step are erased. After this process is completed, the database can be used to train the neural network, and the graphemes, or letters, of a text can be converted into the corresponding phonemes with the aid of the trained neural network.
Type: Grant
Filed: August 29, 2000
Date of Patent: July 29, 2008
Assignee: Siemens Aktiengesellschaft
Inventor: Horst-Udo Hain
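The first matching stage, for words whose grapheme count equals their phoneme count, can be sketched as a one-to-one pairing whose counts seed the database. The lexicon entries and phoneme symbols below are illustrative.

```python
from collections import Counter

def match_equal_length(lexicon):
    """Align graphemes to phonemes one-to-one for words where the counts
    match, accumulating grapheme->phoneme pair counts for the database."""
    counts = Counter()
    for graphemes, phonemes in lexicon:
        if len(graphemes) == len(phonemes):
            counts.update(zip(graphemes, phonemes))
    return counts

lexicon = [("cat", ["k", "ae", "t"]), ("dog", ["d", "ao", "g"])]
counts = match_equal_length(lexicon)
print(counts[("c", "k")])  # 1
```

Later stages, which handle length mismatches and prune infrequent pairs, would refine these counts before the neural network is trained on them.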
-
Patent number: 7400651
Abstract: A frequency interpolation apparatus is provided which reproduces a signal similar to an original signal by approximately recovering suppressed frequency components, from an input signal having the suppressed frequency components in a specific frequency band of the original signal. The input signal is divided into a plurality of signal component sets each having frequency components in a frequency band among a plurality of frequency bands, and a signal component set in the band with the suppressed signal components is synthesized from the plurality of divided signal component sets and added to the input signal. Each of the plurality of divided signal component sets is frequency-converted to a signal component set in the same frequency band, and the signal component set in the band with the suppressed signal components is synthesized through linear combination of the frequency-converted signal component sets.
Type: Grant
Filed: June 29, 2001
Date of Patent: July 15, 2008
Assignee: Kabushiki Kaisha Kenwood
Inventor: Yasushi Sato
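The band-synthesis idea can be sketched in the frequency domain: a suppressed band is rebuilt from a lower band and added back. This is a drastically simplified one-term "linear combination" with a fixed scale factor; the patented apparatus combines several frequency-converted bands with proper weights.

```python
import numpy as np

def interpolate_high_band(signal, sr, cutoff_hz, scale=0.3):
    """Rebuild the suppressed band [cutoff, 1.5*cutoff) as a scaled copy of
    the band [cutoff/2, cutoff) and add it to the input spectrum."""
    spec = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1 / sr)
    low = (freqs >= cutoff_hz / 2) & (freqs < cutoff_hz)
    high = (freqs >= cutoff_hz) & (freqs < cutoff_hz * 1.5)
    n = min(low.sum(), high.sum())
    spec[np.where(high)[0][:n]] += scale * spec[np.where(low)[0][:n]]
    return np.fft.irfft(spec, n=len(signal))

# usage: a 1.5 kHz tone gains a synthesized component above the 2 kHz cutoff
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 1500 * t)
restored = interpolate_high_band(tone, sr, cutoff_hz=2000)
```

Real bandwidth-extension systems shape the copied band's envelope rather than using a flat scale, but the band-copy structure is the same.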
-
Patent number: 7365260
Abstract: Music piece sequence data are composed of a plurality of event data which include performance event data and user event data designed for linking a voice to progression of a music piece. A plurality of voice data files are stored in a memory separately from the music piece sequence data. In music piece reproduction, the individual event data of the music piece sequence data are sequentially read out, and a tone signal is generated in response to each readout of the performance event data. In the meantime, a voice reproduction instruction is output in response to each readout of the user event data. In accordance with the voice reproduction instruction, a voice data file is selected from among the voice data files stored in the memory, and a voice signal is generated on the basis of each read-out voice data.
Type: Grant
Filed: December 16, 2003
Date of Patent: April 29, 2008
Assignee: Yamaha Corporation
Inventor: Takahiro Kawashima
-
Patent number: 7346507
Abstract: A method and apparatus for building a training set for an automated speech recognition-based system, which determines the statistically optimal number of frequently requested responses to automate in order to achieve a desired automation rate. The invention may be used to select the appropriate tokens and responses to train the system and to achieve a desired “phrase coverage” for all of the many different ways human beings may phrase a request that calls for one of a plurality of frequently-requested responses. The invention also determines the statistically optimal number of tokens (spoken requests) required to train a speech recognition-based system to achieve the desired phrase coverage and optimal allocation of tokens over the set of responses that are to be automated.
Type: Grant
Filed: June 4, 2003
Date of Patent: March 18, 2008
Assignee: BBN Technologies Corp.
Inventors: Premkumar Natarajan, Rohit Prasad
-
Patent number: 7328157
Abstract: Embodiments of the present invention pertain to adaptation of a corpus-driven general-purpose TTS system to at least one specific domain. The domain adaptation is realized by adding a limited amount of domain-specific speech that provides a maximum impact on improved perceived naturalness of speech. An approach for generating optimized script for adaptation is proposed, the core of which is a dynamic programming based algorithm that segments domain-specific corpus into a minimum number of segments that appear in the unit inventory. Increases in perceived naturalness of speech after adaptation are estimated from the generated script without recording speech from it.
Type: Grant
Filed: January 24, 2003
Date of Patent: February 5, 2008
Assignee: Microsoft Corporation
Inventors: Min Chu, Hu Peng
-
Patent number: 7328159
Abstract: An improved system for an interactive voice recognition system (400) includes a voice prompt generator (401) for generating a voice prompt in a first frequency band (501). A speech detector (406) detects presence of speech energy in a second frequency band (502). The first and second frequency bands (501, 502) are essentially conjugate frequency bands. A voice data generator (412) generates voice data based on an output of the voice prompt generator (401) and audible speech of a voice response generator (402). A control signal (422) controls the voice prompt generator (401) based on whether the speech detector (406) detects presence of speech energy in the second frequency band (502). A back end (405) of the interactive voice recognition system (400) is configured to operate on an extracted front end voice feature based on whether the speech detector (406) detects presence of speech energy in the second frequency band (502).
Type: Grant
Filed: January 15, 2002
Date of Patent: February 5, 2008
Assignee: Qualcomm Inc.
Inventors: Chienchung Chang, Narendranath Malayath
-
Patent number: 7308407
Abstract: A method for generating synthetic speech can include identifying a recording of conversational speech and creating a transcription of the conversational speech. Using the transcription, rather than a predefined script, the recording can be analyzed and acoustic units extracted. Each acoustic unit can include a phoneme and/or a sub-phoneme. The acoustic units can be stored so that a concatenative text-to-speech engine can later splice the acoustic units together to produce synthetic speech.
Type: Grant
Filed: March 3, 2003
Date of Patent: December 11, 2007
Assignee: International Business Machines Corporation
Inventor: David E. Reich
-
Patent number: 7308408
Abstract: A method and system for providing efficient menu services for an information processing system that uses a telephone or other form of audio user interface. In one embodiment, the menu services provide effective support for novice users by providing a full listing of available keywords and rotating house advertisements which inform novice users of potential features and information. For experienced users, cues are rendered so that at any time the user can say a desired keyword to invoke the corresponding application. The menu is flat to facilitate its usage. Full keyword listings are rendered after the user is given a brief cue to say a keyword. Service messages rotate words and word prosody. When listening to receive information from the user, after the user has been cued, soft background music or other audible signals are rendered to inform the user that a response may now be spoken to the service.
Type: Grant
Filed: September 29, 2004
Date of Patent: December 11, 2007
Assignee: Microsoft Corporation
Inventors: Lisa Joy Stifelman, Hadi Partovi, Haleh Partovi, David Bryan Alpert, Matthew Talin Marx, Scott James Bailey, Kyle D. Sims, Darby McDonough Bailey, Roderick Steven Brathwaite, Eugene Koh, Angus Macdonald Davis
-
Patent number: 7280968
Abstract: A method for digitally generating speech with improved prosodic characteristics can include receiving a speech input, determining at least one prosodic characteristic contained within the speech input, and generating a speech output including the prosodic characteristic within the speech output.
Type: Grant
Filed: March 25, 2003
Date of Patent: October 9, 2007
Assignee: International Business Machines Corporation
Inventor: Oscar J. Blass
-
Patent number: 7277856
Abstract: A speech synthesis system for controlling a discontinuous distortion that occurs at the transition portion between concatenated phonemes which are speech units of a synthesized speech, using a smoothing technique, comprising: a discontinuous distortion processing means adapted to predict a discontinuity at the transition portion between concatenated samples of phonemes used for a speech synthesis through a predetermined learning process, and control a discontinuity at the transition portion between the concatenated phonemes of the synthesized speech in such a fashion that it is smoothed adaptively to correspond to a degree of the predicted discontinuity. The smoothing filter smoothes the synthesized speech so that the discontinuity degree of synthesized speech follows the predicted discontinuity degree according to the filter coefficient (a), changed adaptively to correspond to a ratio of the predicted discontinuity degree to the real discontinuity degree.
Type: Grant
Filed: October 31, 2002
Date of Patent: October 2, 2007
Assignee: Samsung Electronics Co., Ltd.
Inventors: Ki-seung Lee, Jeong-su Kim, Jae-won Lee
-
Patent number: 7266497
Abstract: Systems and methods for automatically segmenting speech inventories. A set of Hidden Markov Models (HMMs) are initialized using bootstrap data. The HMMs are next re-estimated and aligned to produce phone labels. The phone boundaries of the phone labels are then corrected using spectral boundary correction. Optionally, this process of using the spectral-boundary-corrected phone labels as input instead of the bootstrap data is performed iteratively in order to further reduce mismatches between manual labels and phone labels assigned by the HMM approach.
Type: Grant
Filed: January 14, 2003
Date of Patent: September 4, 2007
Assignee: AT&T Corp.
Inventors: Alistair D. Conkie, Yeon-Jun Kim
-
Patent number: 7233901
Abstract: A system and computer-readable medium synthesize speech from text using a triphone unit selection database. The instructions on the computer-readable medium control a computing device to perform the steps of receiving input text, selecting a plurality of N phoneme units from the triphone unit selection database as candidate phonemes for synthesized speech based on the input text, applying a cost process to select a set of phonemes from the candidate phonemes, and synthesizing speech using the selected set of phonemes.
Type: Grant
Filed: December 30, 2005
Date of Patent: June 19, 2007
Assignee: AT&T Corp.
Inventor: Alistair D. Conkie
-
Patent number: 7171362
Abstract: The assignment of phonemes to the graphemes producing them, in a lexicon having words (grapheme sequences) and their associated phonetic transcriptions (phoneme sequences), is carried out with the aid of a variant of dynamic programming known as dynamic time warping (DTW). The resulting alignments are used to prepare patterns for training neural networks for grapheme-phoneme conversion.
Type: Grant
Filed: August 31, 2001
Date of Patent: January 30, 2007
Assignee: Siemens Aktiengesellschaft
Inventor: Horst-Udo Hain
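A minimal DTW-style alignment between a grapheme sequence and a phoneme sequence can be sketched as below. The toy 0/1 local cost table is an assumption for illustration; the patented method derives its costs from the lexicon itself.

```python
def dtw_align(graphemes, phonemes, cost):
    """Classic dynamic-programming alignment: fill a (n+1) x (m+1) table of
    cumulative costs and return the cost of the best full alignment."""
    n, m = len(graphemes), len(phonemes)
    INF = float("inf")
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = cost(graphemes[i - 1], phonemes[j - 1])
            # match/substitute, skip a grapheme, or skip a phoneme
            d[i][j] = c + min(d[i - 1][j - 1], d[i - 1][j], d[i][j - 1])
    return d[n][m]

# toy cost: 0 if the grapheme plausibly produces the phoneme, else 1
toy = {("p", "p"): 0, ("h", "f"): 0, ("o", "ow"): 0, ("n", "n"): 0}
cost = lambda g, p: toy.get((g, p), 1)
print(dtw_align("phone", ["f", "ow", "n"], cost))  # 2.0
```

Backtracking through the same table (not shown) recovers which grapheme produced which phoneme, which is the alignment the training patterns need.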
-
Patent number: 7139712
Abstract: A second phoneme is generated in consideration of a phonemic context with respect to a first phoneme as a search target. Phonemic piece data corresponding to the second phoneme is searched out from a database. A third phoneme is generated by changing the phonemic context on the basis of the search result, and phonemic piece data corresponding to the third phoneme is re-searched out from the database. The search or re-search result is registered in a table in correspondence with the second or third phoneme.
Type: Grant
Filed: March 5, 1999
Date of Patent: November 21, 2006
Assignee: Canon Kabushiki Kaisha
Inventor: Masayuki Yamada
-
Patent number: 7124083
Abstract: A system and method for improving the response time of text-to-speech synthesis utilizes “triphone contexts” (i.e., triplets comprising a central phoneme and its immediate context) as the basic unit, instead of performing phoneme-by-phoneme synthesis. The method generates a triphone preselection cost database for use in speech synthesis by 1) selecting a triphone sequence u1-u2-u3, 2) calculating a preselection cost for each 5-phoneme sequence ua-u1-u2-u3-ub, where u2 is allowed to match any identically labeled phoneme in a database and the units ua and ub vary over the entire phoneme universe, and 3) storing a group of the selected triphone sequences exhibiting the lowest costs in a triphone preselection cost database.
Type: Grant
Filed: November 5, 2003
Date of Patent: October 17, 2006
Assignee: AT&T Corp.
Inventor: Alistair D. Conkie
-
Patent number: 7120584
Abstract: A method and system for synthesizing audio speech is provided. A synthesis engine receives from a host, compressed and normalized speech units and prosodic information. The synthesis engine decompresses data and synthesizes audio signals. The synthesis engine can be implemented on a digital signal processing system which can meet requirements of low resources (i.e. low power consumption, lower memory usage), such as a DSP system including an input/output module, a WOLA filterbank and a DSP core that operate in parallel.
Type: Grant
Filed: October 22, 2002
Date of Patent: October 10, 2006
Assignee: AMI Semiconductor, Inc.
Inventors: Hamid Sheikhzadeh-Nadjar, Etienne Cornu, Robert L. Brennan
-
Patent number: 7082396
Abstract: A speech synthesis system can select recorded speech fragments, or acoustic units, from a very large database of acoustic units to produce artificial speech. The selected acoustic units are chosen to minimize a combination of target and concatenation costs for a given sentence. However, as concatenation costs, which are measures of the mismatch between sequential pairs of acoustic units, are expensive to compute, processing can be greatly reduced by pre-computing and caching the concatenation costs. Unfortunately, the number of possible sequential pairs of acoustic units makes such caching prohibitive. However, statistical experiments reveal that while about 85% of the acoustic units are typically used in common speech, less than 1% of the possible sequential pairs of acoustic units occur in practice.
Type: Grant
Filed: December 19, 2003
Date of Patent: July 25, 2006
Assignee: AT&T Corp
Inventors: Mark C. Beutnagel, Mehryar Mohri, Michael D. Riley
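The caching strategy follows directly from that statistic: precompute costs only for the small fraction of unit pairs that actually occur, and compute the rest on demand. The cost function below is a placeholder, and all unit names are hypothetical.

```python
def concat_cost(u, v):
    """Placeholder for an expensive spectral-mismatch cost between units."""
    return abs(hash(u) - hash(v)) % 100 / 100.0

class ConcatCostCache:
    """Precompute costs for frequently occurring unit pairs only; fall back
    to on-demand computation (and memoization) for rare pairs."""
    def __init__(self, frequent_pairs):
        self._cache = {(u, v): concat_cost(u, v) for u, v in frequent_pairs}

    def cost(self, u, v):
        pair = (u, v)
        if pair not in self._cache:   # rare pair: compute once, then reuse
            self._cache[pair] = concat_cost(u, v)
        return self._cache[pair]

cache = ConcatCostCache([("unit_12", "unit_47")])
print(cache.cost("unit_12", "unit_47") == concat_cost("unit_12", "unit_47"))  # True
```

With under 1% of pairs occurring in practice, the precomputed table stays small while serving nearly all runtime lookups.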
-
Patent number: 7076426
Abstract: An enhanced system is achieved by allowing bookmarks which can specify that the stream of bits that follow corresponds to phonemes and a plurality of prosody information, including duration information, that is specified for times within the duration of the phonemes. Illustratively, such a stream comprises a flag to enable a duration flag, a flag to enable a pitch contour flag, a flag to enable an energy contour flag, a specification of the number of phonemes that follow, and, for each phoneme, one or more sets of specific prosody information that relates to the phoneme, such as a set of pitch values and their durations.
Type: Grant
Filed: January 27, 1999
Date of Patent: July 11, 2006
Assignee: AT&T Corp.
Inventors: Mark Charles Beutnagel, Joern Ostermann, Schuyler Reynier Quackenbush
-
Patent number: 7069217
Abstract: A synthesizer is disclosed in which a speech waveform is synthesized by selecting a synthetic starting waveform segment and then generating a sequence of further segments. The further waveform segments are generated based jointly upon the value of the immediately-preceding segment and upon a model of the dynamics of an actual sound similar to that being generated. In particular, a method is disclosed of synthesizing a voiced speech sound, comprising calculating each new output value from the previous output value using data modeling the evolution, over a short time interval, of the voiced speech sound to be synthesized. This sequential generation of waveform segments enables a synthesized sequence of speech waveforms to be generated of any duration. In addition, a low-dimensional state-space representation of speech signals is used, in which successive pitch pulse cycles are superimposed to estimate the progression of the cyclic speech signal within each cycle.
Type: Grant
Filed: January 9, 1997
Date of Patent: June 27, 2006
Assignee: British Telecommunications PLC
Inventors: Stephen McLaughlin, Michael Banbrook
-
Patent number: 7062440
Abstract: A speech system has a speech input channel including a speech recognizer, and a speech output channel including a text-to-speech converter. Associated with the input channel is a barge-in control for setting barge-in behavior parameters determining how the apparatus handles barge-in by a user during speech output by the apparatus. In order to make the barge-in control more responsive to the actual speech output from the output channel, a barge-in prediction arrangement is provided that is responsive to feature values produced during the operation of the text-to-speech converter to produce indications as to the most likely barge-in points. The barge-in control is responsive to these indications to adjust at least one of the barge-in behavior parameters for periods corresponding to the most likely barge-in points.
Type: Grant
Filed: May 31, 2002
Date of Patent: June 13, 2006
Assignee: Hewlett-Packard Development Company, L.P.
Inventors: Paul St John Brittan, Roger Cecil Ferry Tucker
-
Patent number: 7043422
Abstract: A method and apparatus are provided for adapting a language model to a task-specific domain. Under the method and apparatus, the relative frequency of n-grams in a small training set (i.e. task-specific training data set) and the relative frequency of n-grams in a large training set (i.e. out-of-domain training data set) are used to weight a distribution count of n-grams in the large training set. The weighted distributions are then used to form a modified language model by identifying probabilities for n-grams from the weighted distributions.
Type: Grant
Filed: September 4, 2001
Date of Patent: May 9, 2006
Assignee: Microsoft Corporation
Inventors: Jianfeng Gao, Mingjing Li
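The weighting described here can be sketched by scaling each out-of-domain n-gram count by the ratio of its in-domain relative frequency to its out-of-domain relative frequency. The ratio form below is one plausible reading of the abstract, not the patent's exact formula; n-grams absent from the task corpus simply receive zero weight in this sketch:

```python
from collections import Counter

def adapt_ngram_counts(task_tokens, general_tokens, n=2):
    """Weight each n-gram count from a large general corpus by the ratio of
    its relative frequency in a small task-specific corpus to its relative
    frequency in the general corpus."""
    def ngrams(tokens):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    small, large = ngrams(task_tokens), ngrams(general_tokens)
    n_small, n_large = sum(small.values()), sum(large.values())
    weighted = {}
    for ng, count in large.items():
        p_small = small.get(ng, 0) / n_small   # in-domain relative frequency
        p_large = count / n_large              # out-of-domain relative frequency
        weighted[ng] = count * (p_small / p_large)
    return weighted
```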
-
Patent number: 7031919
Abstract: A speech synthesizing apparatus for synthesizing a speech waveform stores speech data, which is obtained by adding attribute information onto phoneme data, in a database. In accordance with prescribed retrieval conditions, a phoneme retrieval unit retrieves phoneme data from the speech data that has been stored in the database and retains the retrieved results in a retrieved-result storage area. A processing unit for assigning a power penalty and a processing unit for assigning a phoneme-duration penalty assign the penalties, on the basis of power and phoneme duration constituting the attribute information, to a set of phoneme data stored in the retrieved-result storage area. A processing unit for determining typical phoneme data performs sorting on the basis of the assigned penalties and, based upon the stored results, selects phoneme data to be employed in the synthesis of a speech waveform.
Type: Grant
Filed: August 30, 1999
Date of Patent: April 18, 2006
Assignee: Canon Kabushiki Kaisha
Inventors: Yasuo Okutani, Masayuki Yamada
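The penalty-and-sort selection described above can be sketched in a few lines. The attribute names, penalty form, and weights are illustrative assumptions, not taken from the patent:

```python
def select_phoneme(candidates, target_power, target_duration,
                   w_power=1.0, w_duration=1.0):
    """Assign a power penalty and a phoneme-duration penalty to each
    retrieved candidate, sort by total penalty, and return the
    lowest-penalty candidate as the 'typical' phoneme data."""
    def total_penalty(c):
        return (w_power * abs(c["power"] - target_power)
                + w_duration * abs(c["duration"] - target_duration))
    return sorted(candidates, key=total_penalty)[0]
```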
-
Patent number: 7016841
Abstract: A singing voice synthesizing apparatus is provided, which enables achievement of a natural sounding synthesized singing voice with a good level of comprehensibility. A phoneme database stores a plurality of voice fragment data formed of voice fragments each being a single phoneme or a phoneme chain of at least two concatenated phonemes, each of the plurality of voice fragment data comprising data of a deterministic component and data of a stochastic component. A readout device reads out from the phoneme database the voice fragment data corresponding to inputted lyrics. A duration time adjusting device adjusts time duration of the read-out voice fragment data so as to match a desired tempo and manner of singing. An adjusting device adjusts the deterministic component and the stochastic component of the read-out voice fragment so as to match a desired pitch.
Type: Grant
Filed: December 27, 2001
Date of Patent: March 21, 2006
Assignee: Yamaha Corporation
Inventors: Hideki Kenmochi, Xavier Serra, Jordi Bonada
-
Patent number: 7013278
Abstract: A method for generating concatenative speech uses a speech synthesis input to populate a triphone-indexed database that is later used for searching and retrieval to create a phoneme string acceptable for a text-to-speech operation. Prior to initiating the “real time” synthesis process, a database is created of all possible triphone contexts by inputting a continuous stream of speech. The speech data is then analyzed to identify all possible triphone sequences in the stream, and the various units chosen for each context. During a later text-to-speech operation, the triphone contexts in the text are identified and the triphone-indexed phonemes in the database are searched to retrieve the best-matched candidates.
Type: Grant
Filed: September 5, 2002
Date of Patent: March 14, 2006
Assignee: AT&T Corp.
Inventor: Alistair D. Conkie
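A triphone index of the kind described can be sketched as a dictionary mapping each (left neighbour, phoneme, right neighbour) context to the positions where it occurs in the analyzed stream. The boundary symbol and storage layout are illustrative choices:

```python
from collections import defaultdict

def index_triphones(phonemes):
    """Index every phoneme occurrence in a speech stream by its triphone
    context, padding the stream edges with a boundary symbol '#'. Lookup
    by context then returns candidate unit positions for synthesis."""
    db = defaultdict(list)
    padded = ["#"] + list(phonemes) + ["#"]
    for i in range(1, len(padded) - 1):
        context = (padded[i - 1], padded[i], padded[i + 1])
        db[context].append(i - 1)  # position in the original stream
    return db
```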
-
Patent number: 7003461
Abstract: An adaptive codebook search (ACS) algorithm is based on a set of matrix operations suitable for data processing engines supporting a single instruction multiple data (SIMD) architecture. The result is a reduction in memory access and increased parallelism to produce an overall improvement in the computational efficiency of ACS processing.
Type: Grant
Filed: July 9, 2002
Date of Patent: February 21, 2006
Assignee: Renesas Technology Corporation
Inventor: Clifford Tavares
-
Patent number: 6970820
Abstract: The speech synthesizer is personalized to sound like or mimic the speech characteristics of an individual speaker. The individual speaker provides a quantity of enrollment data, which can be extracted from a short quantity of speech, and the system modifies the base synthesis parameters to more closely resemble those of the new speaker. More specifically, the synthesis parameters may be decomposed into speaker dependent parameters, such as context-independent parameters, and speaker independent parameters, such as context dependent parameters. The speaker dependent parameters are adapted using enrollment data from the new speaker. After adaptation, the speaker dependent parameters are combined with the speaker independent parameters to provide a set of personalized synthesis parameters.
Type: Grant
Filed: February 26, 2001
Date of Patent: November 29, 2005
Assignee: Matsushita Electric Industrial Co., Ltd.
Inventors: Jean-Claude Junqua, Florent Perronnin, Roland Kuhn, Patrick Nguyen
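The decompose-adapt-recombine flow can be sketched as follows. The parameter names, the linear interpolation, and `alpha` are illustrative assumptions; the patent does not specify this particular update rule:

```python
def personalize(speaker_dep, speaker_indep, enrollment, alpha=0.8):
    """Move speaker-dependent synthesis parameters toward values estimated
    from a new speaker's enrollment data, then recombine them with the
    unchanged speaker-independent parameters."""
    adapted = {k: (1 - alpha) * v + alpha * enrollment.get(k, v)
               for k, v in speaker_dep.items()}
    # Speaker-independent parameters pass through untouched.
    return {**speaker_indep, **adapted}
```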
-
Patent number: 6959277
Abstract: In a conventional device for extracting voice features accurately without being influenced by noise, such as a voice recognition device, an input voice signal is usually processed first by a noise reduction system having tap length N, the result is FFT-processed with L points, and then the power spectrum vector is calculated; accordingly, a single operation requires N multiplications and (N-1) summations. The voice feature extraction device according to the invention receives a voice signal including noises from a microphone, which is processed by a window function operation unit, and thereafter FFT-processed by an FFT operation unit with L points. A power calculation unit calculates a power spectrum vector of the input voice signal. The noise reduction system, however, determines its filter coefficient in advance and processes that coefficient to calculate a noise reduction coefficient, and the power spectrum vector is processed with this noise reduction coefficient.
Type: Grant
Filed: June 26, 2001
Date of Patent: October 25, 2005
Assignee: Alpine Electronics, Inc.
Inventors: Shingo Kiuchi, Toshiaki Asano, Nozomu Saito
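The savings described come from applying the noise-reduction filter in the spectral domain: its squared magnitude response is precomputed once, then each frame's power spectrum only needs one multiplication per bin. The sketch below shows that idea under the simplifying assumption that the filter's effect can be modeled as a per-bin gain:

```python
import numpy as np

def noise_reduction_gain(filter_coeffs, n_fft):
    """Pre-compute the squared magnitude response of a time-domain
    noise-reduction filter, for direct application to power spectra."""
    return np.abs(np.fft.rfft(filter_coeffs, n_fft)) ** 2

def framed_power_spectrum(frame, window, gain):
    """Window the frame, FFT it, and apply the precomputed gain to the
    power spectrum, avoiding per-sample time-domain filtering."""
    return (np.abs(np.fft.rfft(frame * window)) ** 2) * gain
```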
-
Patent number: 6876968
Abstract: A method and system provide for run-time modification of synthesized speech. The method includes the step of generating synthesized speech based on textual input and a plurality of run-time control parameter values. Real-time data is generated based on an input signal, where the input signal characterizes an intelligibility of the speech with regard to a listener. The method further provides for modifying one or more of the run-time control parameter values based on the real-time data such that the intelligibility of the speech increases. Modifying the parameter values at run-time as opposed to during the design stages provides a level of adaptation unachievable through conventional approaches.
Type: Grant
Filed: March 8, 2001
Date of Patent: April 5, 2005
Assignee: Matsushita Electric Industrial Co., Ltd.
Inventor: Peter Veprek
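The feedback loop described can be sketched as a simple update rule: when a real-time intelligibility estimate drops below a target, nudge the run-time control parameters. The two parameters (speaking rate, volume), the target, and the step size are all illustrative assumptions:

```python
def adjust_parameters(params, intelligibility, target=0.8, step=0.05):
    """If measured intelligibility falls below the target, slow the
    speaking rate and raise the volume within safe bounds; otherwise
    leave the run-time control parameters unchanged."""
    if intelligibility < target:
        return dict(params,
                    rate=max(0.5, params["rate"] - step),
                    volume=min(1.0, params["volume"] + step))
    return params
```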
-
Patent number: 6847932
Abstract: Given phonetic information is divided into speech units of extended CV which is a contiguous sequence of phonemes without clear distinction containing a vowel or some vowels. Contour of vocal tract transmission function of phoneme of the speech unit of extended CV is obtained from the phoneme directory which contains a contour of vocal tract transmission function of each phoneme associated with phonetic information in a unit of extended CV. Speech waveform data is generated based on the contour of vocal tract transmission function of phoneme of the speech unit of extended CV. Speech waveform data is converted into analog voice signal.
Type: Grant
Filed: September 28, 2000
Date of Patent: January 25, 2005
Assignee: Arcadia, Inc.
Inventors: Kazuyuki Ashimura, Seiichi Tenpaku
-
Patent number: 6847931
Abstract: A preferred embodiment of the method for converting text to speech using a computing device having a memory is disclosed. Text, being made up of a plurality of words, is received into the memory of the computing device. A plurality of phonemes are derived from the text. Each of the phonemes is associated with a prosody record based on a database of prosody records associated with a plurality of words. A first set of the artificial intelligence rules is applied to determine context information associated with the text. The context-influenced prosody changes for each of the phonemes are determined. Then a second set of rules, based on Lessac theory, is applied to determine Lessac-derived prosody changes for each of the phonemes. The prosody record for each of the phonemes is amended in response to the context-influenced prosody changes and the Lessac-derived prosody changes. Then sound information associated with the phonemes is read from the memory.
Type: Grant
Filed: January 29, 2002
Date of Patent: January 25, 2005
Assignee: Lessac Technology, Inc.
Inventors: Edwin R. Addison, H. Donald Wilson, Gary Marple, Anthony H. Handal, Nancy Krebs
-
Patent number: 6845358
Abstract: A prosody matching template in the form of a tree structure stores indices which point to lookup table and template information prescribing pitch and duration values that are used to add inflection to the output of a text-to-speech synthesizer. The lookup module employs a search algorithm that explores each branch of the tree, assigning penalty scores based on whether the syllable represented by a node of the tree does or does not match the corresponding syllable of the target word. The path with the lowest penalty score is selected as the index into the prosody template table. The system will add nodes by cloning existing nodes in cases where it is not possible to find a one-to-one match between the number of syllables in the target word and the number of nodes in the tree.
Type: Grant
Filed: January 5, 2001
Date of Patent: January 18, 2005
Assignee: Matsushita Electric Industrial Co., Ltd.
Inventors: Nicholas Kibre, Ted H. Applebaum
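The penalty-scored matching with node cloning can be sketched with a flat scan over stored syllable patterns standing in for the patent's tree walk. Syllable labels and the uniform penalty are illustrative assumptions:

```python
def match_template(target_syllables, templates, mismatch_penalty=1.0):
    """Score each stored syllable pattern against the target word's
    syllables, cloning a pattern's last node when the target has more
    syllables than the pattern, and return the index of the
    lowest-penalty template."""
    def penalty(pattern):
        if len(pattern) < len(target_syllables):  # clone the last node
            pattern = pattern + [pattern[-1]] * (len(target_syllables) - len(pattern))
        return sum(mismatch_penalty
                   for a, b in zip(target_syllables, pattern) if a != b)
    return min(range(len(templates)), key=lambda i: penalty(templates[i]))
```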
-
Patent number: 6845359
Abstract: A Fast Fourier Transform (FFT) based voice synthesis method 110, program product and vocoder. Sounds, e.g., speech and audio, are synthesized from multiple sine waves. Each sine wave component is represented by a small number of FFT coefficients 116. Amplitude 120 and phase 124 information of the components may be incorporated into these coefficients. The FFT coefficients corresponding to each of the components are summed 126 and, then, an inverse FFT is applied 128 to the sum to generate a time domain signal. An appropriate section is extracted 130 from the inverse transformed time domain signal as an approximation to the desired output. FFT based synthesis 110 may be combined with simple sine wave summation 100, using FFT based synthesis 110 for complex sounds, e.g., male voices and unvoiced speech, and sine wave summation 100 for simpler sounds, e.g., female voices.Type: Grant
Filed: March 22, 2001
Date of Patent: January 18, 2005
Assignee: Motorola, Inc.
Inventor: Tenkasi Ramabadran
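The sum-coefficients-then-inverse-FFT idea can be demonstrated in its simplest form: one complex coefficient per component, summed in the frequency domain, then a single inverse FFT. This sketch assumes component frequencies sit exactly on FFT bin centres, whereas the patent spreads each component over a few coefficients to handle arbitrary frequencies:

```python
import numpy as np

def fft_sine_synthesis(components, n):
    """Synthesize a sum of cosines by accumulating one FFT coefficient per
    (bin_index, amplitude, phase) component and applying a single inverse
    real FFT, instead of evaluating each sinusoid sample by sample."""
    spectrum = np.zeros(n // 2 + 1, dtype=complex)
    for bin_index, amplitude, phase in components:
        # The n/2 factor undoes np.fft.irfft scaling for bins 0 < k < n/2.
        spectrum[bin_index] += (n / 2) * amplitude * np.exp(1j * phase)
    return np.fft.irfft(spectrum, n)
```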