Specialized Model Patents (Class 704/266)
  • Patent number: 8155966
    Abstract: [Problem] To convert a non-audible murmur signal obtained through an in-vivo conduction microphone into a speech signal that a receiving person can recognize with maximum accuracy (and is unlikely to misrecognize).
    Type: Grant
    Filed: February 7, 2007
    Date of Patent: April 10, 2012
    Assignee: National University Corporation Nara Institute of Science and Technology
    Inventors: Tomoki Toda, Mikihiro Nakagiri, Hideki Kashioka, Kiyohiro Shikano
  • Patent number: 8145492
    Abstract: A behavior control system of a robot for learning a phoneme sequence includes a sound inputting device inputting a phoneme sequence, a sound signal learning unit operable to convert the phoneme sequence into a sound synthesis parameter and to learn or evaluate a relationship between a sound synthesis parameter of a phoneme sequence that is generated by the robot and a sound synthesis parameter used for sound imitation, and a sound synthesizer operable to generate a phoneme sequence based on the sound synthesis parameter obtained by the sound signal learning unit.
    Type: Grant
    Filed: April 6, 2005
    Date of Patent: March 27, 2012
    Assignee: Sony Corporation
    Inventor: Masahiro Fujita
  • Patent number: 8135591
    Abstract: A method and system are disclosed that train a text-to-speech synthesis system for use in speech synthesis. The method includes generating a speech database of audio files comprising domain-specific voices having various prosodies, and training a text-to-speech synthesis system using the speech database by selecting audio segments having a prosody based on at least one dialog state. The system includes a processor, a speech database of audio files, and modules for implementing the method.
    Type: Grant
    Filed: August 13, 2009
    Date of Patent: March 13, 2012
    Assignee: AT&T Intellectual Property II, L.P.
    Inventor: Horst Juergen Schroeter
  • Patent number: 8131547
    Abstract: A method and system are disclosed that automatically segment speech to generate a speech inventory. The method includes initializing a Hidden Markov Model (HMM) using seed input data, performing a segmentation of the HMM into speech units to generate phone labels, correcting the segmentation of the speech units. Correcting the segmentation of the speech units includes re-estimating the HMM based on a current version of the phone labels, embedded re-estimating of the HMM, and updating the current version of the phone labels using spectral boundary correction. The system includes modules configured to control a processor to perform steps of the method.
    Type: Grant
    Filed: August 20, 2009
    Date of Patent: March 6, 2012
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Alistair D. Conkie, Yeon-Jun Kim
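For illustration, here is a minimal Python sketch of the spectral boundary correction step this patent describes: each phone boundary from a forced alignment is nudged to the frame of greatest spectral change within a small search window. The synthetic MFCC features, distance measure, and window size are invented placeholders, not the patented procedure.

```python
# Sketch of spectral boundary correction (patent 8131547), under assumptions:
# boundaries come from an HMM forced alignment, and "spectral change" is the
# Euclidean distance between consecutive MFCC frames.
import numpy as np

def correct_boundary(mfcc: np.ndarray, boundary: int, window: int = 3) -> int:
    """Move `boundary` to the frame with the largest spectral jump
    within +/- `window` frames of the HMM-assigned position."""
    lo = max(1, boundary - window)
    hi = min(len(mfcc) - 1, boundary + window + 1)
    jumps = [np.linalg.norm(mfcc[t] - mfcc[t - 1]) for t in range(lo, hi)]
    return lo + int(np.argmax(jumps))

rng = np.random.default_rng(0)
mfcc = rng.normal(size=(100, 13))           # 100 frames of 13-dim features
mfcc[40:] += 5.0                            # abrupt spectral change at frame 40
print(correct_boundary(mfcc, boundary=38))  # -> 40
```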
  • Patent number: 8126718
    Abstract: To facilitate text-to-speech conversion of a username, a first or last name of a user associated with the username may be retrieved, and a pronunciation of the username may be determined based at least in part on whether the name forms at least part of the username. To facilitate text-to-speech conversion of a domain name having a top level domain and at least one other level domain, a pronunciation for the top level domain may be determined based at least in part upon whether the top level domain is one of a predetermined set of top level domains. Each other level domain may be searched for one or more recognized words therewithin, and a pronunciation of the other level domain may be determined based at least in part on an outcome of the search. The username and domain name may form part of a network address such as an email address, URL or URI.
    Type: Grant
    Filed: July 11, 2008
    Date of Patent: February 28, 2012
    Assignee: Research In Motion Limited
    Inventors: Matthew Bells, Jennifer Elizabeth Lhotak, Michael Angelo Nanni
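The pronunciation logic can be pictured with a small, hypothetical Python sketch: say the username as a word only when it embeds the user's first or last name, otherwise spell it out, and read unknown top-level domains letter by letter. The rules and TLD table are assumptions for illustration, not the claimed method.

```python
# Hypothetical reduction of the address-pronunciation logic in patent 8126718.
SPOKEN_TLDS = {"com": "dot com", "org": "dot org", "net": "dot net"}

def speak_address(address: str, first: str, last: str) -> str:
    username, domain = address.split("@")
    # Username: say it as a word only if it embeds a known name of the user.
    if first.lower() in username.lower() or last.lower() in username.lower():
        spoken_user = username
    else:
        spoken_user = " ".join(username)        # spell out letter by letter
    labels = domain.split(".")
    tld = labels[-1]
    # Known TLDs get a fixed reading; unknown ones are spelled out.
    spoken_tld = SPOKEN_TLDS.get(tld, "dot " + " ".join(tld))
    spoken_domain = " dot ".join(labels[:-1])
    return f"{spoken_user} at {spoken_domain} {spoken_tld}"

print(speak_address("jsmith42@example.com", "John", "Smith"))
# -> "jsmith42 at example dot com"
```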
  • Patent number: 8121847
    Abstract: The disclosure relates to a communication terminal having a bandwidth expansion device for expanding the bandwidth of a narrowband voice signal, on a low-frequency and/or high-frequency side, by synthesizing at least one frequency band on the basis of the narrowband voice signal. A qualitatively satisfactory bandwidth expansion is thus performed using a plurality of net bit rates. The bandwidth expansion device is further connected to a memory containing a lookup table comprising at least one parameter value for the bandwidth expansion, for at least two net bit rates of the narrowband voice signal. A method for expanding a bandwidth of a narrowband voice signal having at least two net bit rates in a communication terminal is also disclosed herein.
    Type: Grant
    Filed: October 30, 2003
    Date of Patent: February 21, 2012
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Stefano Ambrosius Klinke, Frank Lorenz
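As a rough illustration, the lookup-table idea might reduce to something like the following Python sketch, keying expansion parameters on the narrowband signal's net bit rate; the rates and parameter values are invented placeholders.

```python
# Invented illustration of the per-bit-rate lookup table in patent 8121847.
EXPANSION_TABLE = {        # net bit rate (kbit/s) -> expansion parameters
    12.2: {"high_band_gain": 0.35, "cutoff_hz": 3400},
    5.9:  {"high_band_gain": 0.20, "cutoff_hz": 3000},
}

def expansion_params(net_bit_rate: float) -> dict:
    # Fall back to the nearest tabulated rate when there is no exact match.
    nearest = min(EXPANSION_TABLE, key=lambda r: abs(r - net_bit_rate))
    return EXPANSION_TABLE[nearest]

print(expansion_params(6.7))   # -> parameters tabulated for 5.9 kbit/s
```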
  • Patent number: 8099281
    Abstract: A device and related methods for word-sense disambiguation during a text-to-speech conversion are provided. The device, for use with a computer-based system capable of converting text data to synthesized speech, includes an identification module for identifying a homograph contained in the text data. The device also includes an assignment module for assigning a pronunciation to the homograph using a statistical test constructed from a recursive partitioning of training samples, each training sample being a word string containing the homograph. The recursive partitioning is based on determining for each training sample an order and a distance of each word indicator relative to the homograph in the training sample. An absence of one of the word indicators in a training sample is treated as equivalent to the absent word indicator being more than a predefined distance from the homograph.
    Type: Grant
    Filed: June 6, 2005
    Date of Patent: January 17, 2012
    Assignee: Nuance Communications, Inc.
    Inventor: Philip Gleason
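A toy Python sketch of the word-indicator features: each indicator's signed distance from the homograph is computed, with absent indicators capped at a predefined distance. The patent feeds such features to a statistical test built by recursive partitioning of training samples; the feature extraction below is only a stand-in.

```python
# Toy indicator features for homograph disambiguation (patent 8099281).
MAX_DIST = 10   # absent indicators are treated as "farther than this"

def indicator_features(tokens, homograph, indicators):
    """Signed distance of each indicator word relative to the homograph."""
    pos = tokens.index(homograph)
    feats = {}
    for ind in indicators:
        feats[ind] = tokens.index(ind) - pos if ind in tokens else MAX_DIST
    return feats

tokens = "he played a bass line on his guitar".split()
print(indicator_features(tokens, "bass", ["fish", "guitar"]))
# {'fish': 10, 'guitar': 4} -> "guitar" is close, favoring /beɪs/
```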
  • Patent number: 8086456
    Abstract: A speech synthesis system can select recorded speech fragments, or acoustic units, from a very large database of acoustic units to produce artificial speech. The selected acoustic units are chosen to minimize a combination of target and concatenation costs for a given sentence. However, as concatenation costs, which are measures of the mismatch between sequential pairs of acoustic units, are expensive to compute, processing can be greatly reduced by pre-computing and caching the concatenation costs. Unfortunately, the number of possible sequential pairs of acoustic units makes such caching prohibitive. A method for constructing an efficient concatenation cost database is provided by synthesizing a large body of speech, identifying the acoustic unit sequential pairs generated and their respective concatenation costs. By constructing a concatenation cost database in this fashion, the processing power required at run-time is greatly reduced with negligible effect on speech quality.
    Type: Grant
    Filed: July 20, 2010
    Date of Patent: December 27, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Mark Charles Beutnagel, Mehryar Mohri, Michael Dennis Riley
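The caching idea can be sketched in a few lines of Python: synthesize a corpus once, record which unit pairs actually occur in sequence, and cache only those costs. The cost function below is a placeholder for the expensive spectral mismatch measure.

```python
# Sketch of the concatenation-cost cache of patent 8086456, with a dummy cost.
from itertools import pairwise   # Python 3.10+

def concat_cost(a: str, b: str) -> float:
    return abs(hash(a) - hash(b)) % 100 / 100.0   # placeholder cost

def build_cost_cache(unit_sequences):
    cache = {}
    for seq in unit_sequences:            # one unit sequence per synthesized sentence
        for a, b in pairwise(seq):
            if (a, b) not in cache:
                cache[(a, b)] = concat_cost(a, b)
    return cache

cache = build_cost_cache([["dh", "ax", "k", "ae", "t"], ["ax", "k", "ao", "l"]])
print(len(cache), "cached pairs instead of all |units|^2 combinations")
```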
  • Patent number: 8073696
    Abstract: A voice synthesis device is provided to include: an emotion input unit obtaining an utterance mode of a voice waveform, a prosody generation unit generating a prosody, a characteristic tone selection unit selecting a characteristic tone based on the utterance mode; and a characteristic tone temporal position estimation unit (i) judging whether or not each of phonemes included in a phonologic sequence of text is to be uttered with the characteristic tone, based on the phonologic sequence, the characteristic tone, and the prosody, and (ii) deciding a phoneme, which is an utterance position where the text is uttered with the characteristic tone. The voice synthesis device also includes an element selection unit and an element connection unit generating the voice waveform based on the phonologic sequence, the prosody, and the utterance position, so that the text is uttered in the utterance mode with the characteristic tone at the determined utterance position.
    Type: Grant
    Filed: May 2, 2006
    Date of Patent: December 6, 2011
    Assignee: Panasonic Corporation
    Inventors: Yumiko Kato, Takahiro Kamai
  • Publication number: 20110282668
    Abstract: A method of and system for speech synthesis. First and second text inputs are received in a text-to-speech system, and processed into respective first and second speech outputs corresponding to stored speech respectively from first and second speakers using a processor of the system. The second speech output of the second speaker is adapted to sound like the first speech output of the first speaker.
    Type: Application
    Filed: May 14, 2010
    Publication date: November 17, 2011
    Applicant: GENERAL MOTORS LLC
    Inventors: Jeffrey M. Stefan, Gaurav Talwar, Rathinavelu Chengalvarayan
  • Patent number: 8035643
    Abstract: Systems and methods are described that create a mapping from a space of a source object (e.g., source facial expressions) to a space of a target object (e.g., target facial expressions). In certain implementations, the mapping is learned based on a training set composed of corresponding shapes (e.g., facial expressions) in each space. The user can create the training set by selecting expressions from, for example, captured source performance data, and by sculpting corresponding target expressions. Additional target shapes (e.g., target facial expressions) can be interpolated and extrapolated from the shapes in the training set to generate corresponding shapes for potential source shapes (e.g., facial expressions).
    Type: Grant
    Filed: March 19, 2007
    Date of Patent: October 11, 2011
    Assignee: Lucasfilm Entertainment Company Ltd.
    Inventors: Frederic P. Pighin, Cary Phillips, Steve Sullivan
  • Patent number: 8036884
    Abstract: The present invention provides a method, a computer software product and an apparatus for enabling a determination of speech-related audio data within a record of digital audio data. The method comprises steps for extracting audio features from the record of digital audio data, for classifying one or more subsections of the record of digital audio data, and for marking at least a part of the record of digital audio data classified as speech. The classification of the digital audio data record is performed on the basis of the extracted audio features and with respect to at least one predetermined audio class.
    Type: Grant
    Filed: February 24, 2005
    Date of Patent: October 11, 2011
    Assignee: Sony Deutschland GmbH
    Inventors: Yin Hay Lam, Josep Maria Sola I Caros
  • Patent number: 8032377
    Abstract: Grapheme-to-phoneme alignment quality is improved by introducing a first preliminary alignment step, followed by an enlargement step of the grapheme set and phoneme set, and a second alignment step based on the previously enlarged grapheme/phoneme sets. During the enlargement step, grapheme clusters and phoneme clusters are generated that become members of a new grapheme and phoneme set. The new elements are chosen using statistical information calculated from the results of the first alignment step. The enlarged sets are the new grapheme and phoneme alphabet used for the second alignment step. The lexicon is rewritten using this new alphabet before starting the second alignment step, which produces the final result.
    Type: Grant
    Filed: April 30, 2003
    Date of Patent: October 4, 2011
    Assignee: Loquendo S.p.A.
    Inventor: Paolo Massimino
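A rough Python sketch of the enlargement step, under the assumption that a digraph shows up in a first alignment as a grapheme aligned to the empty phoneme (e.g., "ph" aligned as p->/f/, h->nothing): frequent such pairs are promoted to new single symbols before the second alignment pass. The counts, threshold, and phoneme notation are illustrative, not the patent's statistics.

```python
# Illustrative grapheme-cluster discovery for patent 8032377's enlargement step.
from collections import Counter

def find_grapheme_clusters(alignments, min_count=2):
    """Promote adjacent grapheme pairs whose second grapheme aligned to the
    empty phoneme - a common signature of digraphs like 'ph' -> /f/."""
    counts = Counter()
    for word in alignments:              # word = [(grapheme, phoneme), ...]
        for (g1, p1), (g2, p2) in zip(word, word[1:]):
            if p2 == "":
                counts[g1 + g2] += 1
    return {g for g, c in counts.items() if c >= min_count}

aligned = [[("p", "f"), ("h", ""), ("o", "@U"), ("n", "n"), ("e", "")],
           [("p", "f"), ("h", ""), ("o", "Q"), ("t", "t"), ("o", "@U")]]
print(find_grapheme_clusters(aligned))   # {'ph'} joins the grapheme alphabet
```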
  • Patent number: 8010362
    Abstract: A voice conversion rule and a rule selection parameter are stored. The voice conversion rule converts a spectral parameter vector of a source speaker to a spectral parameter vector of a target speaker. The rule selection parameter represents the spectral parameter vector of the source speaker. A first voice conversion rule of start time and a second voice conversion rule of end time in a speech unit of the source speaker are selected by the spectral parameter vector of the start time and the end time. An interpolation coefficient corresponding to the spectral parameter vector of each time in the speech unit is calculated by the first voice conversion rule and the second voice conversion rule. A third voice conversion rule corresponding to the spectral parameter vector of each time in the speech unit is calculated by interpolating the first voice conversion rule and the second voice conversion rule with the interpolation coefficient.
    Type: Grant
    Filed: January 22, 2008
    Date of Patent: August 30, 2011
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Masatsune Tamura, Takehiro Kagoshima
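A bare-bones Python sketch of the interpolation: the rules selected at the unit's start and end times are blended into a third, per-frame rule. Here each rule is a plain linear transform and the coefficient is linear in time, whereas the patent derives the coefficient from the spectral parameter vectors themselves; both simplifications are assumptions.

```python
# Simplified per-frame rule interpolation in the spirit of patent 8010362.
import numpy as np

def convert_unit(frames, rule_start, rule_end):
    """frames: (T, D) spectral parameter vectors of one speech unit."""
    T = len(frames)
    out = np.empty_like(frames)
    for t, x in enumerate(frames):
        w = t / (T - 1) if T > 1 else 0.0            # interpolation coefficient
        rule = (1 - w) * rule_start + w * rule_end   # third, per-frame rule
        out[t] = rule @ x
    return out

frames = np.random.default_rng(1).normal(size=(5, 4))
A, B = np.eye(4), 2 * np.eye(4)                      # start/end rules
print(convert_unit(frames, A, B)[0])                 # first frame uses rule A alone
```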
  • Patent number: 7996226
    Abstract: Disclosed herein are various aspects of a toolkit used for generating a TTS voice for use in a spoken dialog system. The embodiments in each case may be in the form of a system, a computer-readable medium or a method for generating the TTS voice. An embodiment of the invention relates to a method of tracking progress in developing a text-to-speech (TTS) voice. The method comprises ensuring that a corpus of recorded speech contains no reading errors and matches an associated written text, creating a tuple for each utterance in the corpus and tracking progress for each utterance utilizing the tuple. Various parameters may be tracked using the tuple, but the tuple provides a means for enabling multiple workers to efficiently process a database of utterances in preparation of a TTS voice.
    Type: Grant
    Filed: December 15, 2009
    Date of Patent: August 9, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Steven Lawrence Davis, Shane Fetters, David Eugene Schulz, Beverly Gustafson, Louise Loney
  • Publication number: 20110112840
    Abstract: A synthetic sound generation method for generating a synthetic sound for making a listener recall an image of an actual sound signal which is a sound signal other than a speech signal and of which the listener knows what sound the actual sound signal is, by hearing the speech signal, comprising the steps of: extracting a signal of a predetermined frequency band of an inputted speech signal; extracting an amplitude envelope curve component of the extracted signal; extracting a signal of a predetermined frequency band of the actual sound signal which is a sound signal other than the speech signal and of which the listener knows what sound the actual sound signal is; and multiplying the amplitude envelope curve component of the inputted speech signal and the extracted predetermined frequency band signal of the actual sound signal.
    Type: Application
    Filed: February 13, 2009
    Publication date: May 12, 2011
    Applicant: OTODESIGNERS CO., LTD.
    Inventor: Shinichi Sakamoto
  • Patent number: 7921016
    Abstract: A method for providing a 3D audio work includes providing a one-ear HRTF filter and a related function synthesizer storing a related function therein, and inputting sound signals into the one-ear HRTF filter. The sound signals are converted into one-ear output sound signals which are received by one ear and synthesized to output sound signals for the other ear. A method for providing the related function includes inputting sound signals into HRTF filters of opposite ears and obtaining output sound signals which respectively act as raw signals and target signals. The raw signals are synthesized by a synthesizer to output sound signals which compare with the target signals. A related function registered in the synthesizer is accordingly regulated so as to obtain the related function which satisfies a minimum difference between the output sound signals from the synthesizer and the target signals.
    Type: Grant
    Filed: November 8, 2007
    Date of Patent: April 5, 2011
    Assignee: Foxconn Technology Co., Ltd.
    Inventor: Kuen-Ying Ou
  • Publication number: 20110054903
    Abstract: Embodiments of rich context modeling for speech synthesis are disclosed. In operation, a text-to-speech engine refines a plurality of rich context models based on decision tree-tied Hidden Markov Models (HMMs) to produce a plurality of refined rich context models. The text-to-speech engine then generates synthesized speech for an input text based at least on some of the plurality of refined rich context models.
    Type: Application
    Filed: December 2, 2009
    Publication date: March 3, 2011
    Applicant: MICROSOFT CORPORATION
    Inventors: Zhi-Jie Yan, Yao Qian, Frank Kao-Ping Soong
  • Publication number: 20110046957
    Abstract: Techniques are disclosed for frequency splicing in which speech segments used in the creation of a final speech waveform are constructed, at least in part, by combining (e.g., summing) a small number (e.g., two) of component speech segments that overlap substantially, or entirely, in time but have spectral energy that occupies disjoint, or substantially disjoint, frequency ranges. The component speech segments may be derived from speech segments produced by different speakers or from different speech segments produced by the same speaker. Depending on the embodiment, frequency splicing may supplement rule-based, concatenative, hybrid, or limited-vocabulary speech synthesis systems to provide various advantages.
    Type: Application
    Filed: August 24, 2010
    Publication date: February 24, 2011
    Applicant: NovaSpeech, LLC
    Inventors: Susan R. Hertz, Harold G. Mills
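Frequency splicing is easy to picture with a short Python sketch: two time-aligned signals are band-limited to disjoint frequency ranges and summed. The sinusoid stand-ins, Butterworth filters, and 1 kHz crossover are arbitrary assumptions, not the publication's design.

```python
# Toy frequency splicing (publication 20110046957): band-limit two aligned
# segments to disjoint ranges and sum them into one waveform.
import numpy as np
from scipy.signal import butter, lfilter

fs = 16000
t = np.arange(0, 0.5, 1 / fs)
seg_a = np.sin(2 * np.pi * 200 * t)      # stand-in for one component segment
seg_b = np.sin(2 * np.pi * 3000 * t)     # stand-in for the other segment

b_lo, a_lo = butter(4, 1000 / (fs / 2), btype="low")
b_hi, a_hi = butter(4, 1000 / (fs / 2), btype="high")
spliced = lfilter(b_lo, a_lo, seg_a) + lfilter(b_hi, a_hi, seg_b)
print(spliced.shape)   # one waveform built from two band-disjoint parts
```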
  • Patent number: 7890330
    Abstract: A method records verbal expressions of a person for use in a vehicle navigation system. The vehicle navigation system has a database including a map and text describing street names and points of interest of the map. The method includes the steps of obtaining from the database text of a word having at least one syllable, analyzing the syllable with a greedy algorithm to construct at least one text phrase comprising each syllable, such that the number of phrases is substantially minimized, converting the text phrase to at least one corresponding phonetic symbol phrase, displaying to the person the phonetic symbol phrase, the person verbally expressing each phrase of the phonetic symbol phrase, and recording the verbal expression of each phrase of the phonetic symbol phrase.
    Type: Grant
    Filed: December 30, 2006
    Date of Patent: February 15, 2011
    Assignee: Alpine Electronics Inc.
    Inventors: Inci Ozkaragoz, Benjamin Ao, William Arthur
  • Patent number: 7873517
    Abstract: A motor vehicle has a speech interface for an acoustic input of commands for operating the motor vehicle or a module of the motor vehicle. The speech interface includes a speech recognition database in which a substantial portion of commands or command components, which can be input, are stored in a version according to a pronunciation in a first language and in a version according to a pronunciation in at least a second language, and a speech recognition engine for automatically comparing an acoustic command to commands and/or command components, which are stored in the speech recognition database, in a version according to the pronunciation in the first language and to commands and/or command components, which are stored in the speech recognition database, in a version according to the pronunciation in the second language.
    Type: Grant
    Filed: November 9, 2006
    Date of Patent: January 18, 2011
    Assignee: Volkswagen of America, Inc.
    Inventors: Ramon Prieto, M. Kashif Imam, Carsten Bergmann, Wai Yin Cheung, Carly Williams
  • Patent number: 7865365
    Abstract: A method, system, and computer program product is disclosed for customizing a synthesized voice based upon audible input voice data. The input voice data is typically in the form of one or more predetermined paragraphs being read into a voice recorder. The input voice data is then analyzed for adjustable voice characteristics to determine basic voice qualities (e.g., pitch, breathiness, tone, speed; variability of any of these qualities, etc.) and to identify any “specialized speech patterns”. Based upon this analysis, the characteristics of the voice utilized to read text appearing on the screen are modified to resemble the input voice data. This allows a user of the system to easily and automatically create a voice that is familiar to the user.
    Type: Grant
    Filed: August 5, 2004
    Date of Patent: January 4, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: Debbie Ann Anglin, Howard Neil Anglin, Nyralin Novella Kline
  • Publication number: 20100312562
    Abstract: A rope-jumping algorithm is employed in a Hidden Markov Model based text-to-speech system to determine start and end models and to modify the start and end models by setting small co-variances. Disordered acoustic parameters due to violation of parameter constraints are avoided through this modification, resulting in stable line spectral frequencies for the generated speech.
    Type: Application
    Filed: June 4, 2009
    Publication date: December 9, 2010
    Applicant: Microsoft Corporation
    Inventors: Wenlin Wang, Guoliang Zhang, Jingyang Xu
  • Publication number: 20100305949
    Abstract: It is possible to provide a speech synthesis device, speech synthesis method, and speech synthesis program which can improve speech quality and reduce the amount of calculation in a well-balanced manner. The speech synthesis device includes: a sub-score calculation unit (60/65) which calculates a segment selection sub-score for selecting an optimal segment; and a candidate narrowing unit (70/73) for narrowing the candidates according to the number of candidate segments and the segment selection sub-score. The speech synthesis device narrows down candidates using the sub-score calculation unit (60/65) and the candidate narrowing unit (70/73) during candidate selection when generating synthesized speech from an input text.
    Type: Application
    Filed: November 25, 2008
    Publication date: December 2, 2010
    Inventors: Masanori Kato, Yasuyuki Mitsui, Reishi Kondo
  • Patent number: 7840408
    Abstract: The present invention provides a method and apparatus for training a duration prediction model, a method and apparatus for duration prediction, and a method and apparatus for speech synthesis. Said method for training a duration prediction model comprises: generating an initial duration prediction model with a plurality of attributes related to duration prediction and at least part of the possible attribute combinations of said plurality of attributes, in which each of said plurality of attributes and said attribute combinations is included as an item; calculating the importance of each said item in said duration prediction model; deleting the item having the lowest calculated importance; re-generating a duration prediction model with the remaining items; determining whether said re-generated duration prediction model is an optimal model; and repeating said step of calculating importance and the following steps if said duration prediction model is determined not to be an optimal model.
    Type: Grant
    Filed: October 19, 2006
    Date of Patent: November 23, 2010
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Lifu Yi, Jie Hao
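The training loop amounts to backward elimination, sketched below in Python with placeholder `fit`, `importance`, and `is_optimal` callables (hypothetical names standing in for the patent's model estimation and optimality criteria).

```python
# Backward-elimination sketch of the training loop in patent 7840408.
def train_duration_model(items, fit, importance, is_optimal):
    model = fit(items)
    while not is_optimal(model) and len(items) > 1:
        scores = {item: importance(model, item) for item in items}
        items.remove(min(scores, key=scores.get))   # drop least important item
        model = fit(items)
    return model

# Toy instantiation: the "model" is just the item list, importance is name
# length, and the model is "optimal" once two items remain.
model = train_duration_model(
    ["phone", "stress", "pos", "phone*stress"],
    fit=lambda items: list(items),
    importance=lambda m, item: len(item),
    is_optimal=lambda m: len(m) <= 2,
)
print(model)   # the two surviving items
```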
  • Patent number: 7792673
    Abstract: An apparatus and method for adjusting the friendliness of a synthesized speech and thus generating synthesized speech of various styles in a speech synthesis system are provided. The method includes the steps of defining at least two friendliness levels; storing recorded speech data of sentences, the sentences being made up according to each of the friendliness levels; extracting at least one of prosodic characteristics for each of the friendliness levels from the recorded speech data, said prosodic characteristics including at least one of a sentence-final intonation type, boundary intonation types of intonation phrases in the sentence, and an average value of F0 of the sentence, with respect to the recorded speech data; and generating a prosodic model for each of the friendliness levels by statistically modeling the at least one of the prosodic characteristics.
    Type: Grant
    Filed: November 7, 2006
    Date of Patent: September 7, 2010
    Assignee: Electronics and Telecommunications Research Institute
    Inventors: Seung Shin Oh, Sang Hun Kim, Young Jik Lee
  • Publication number: 20100204985
    Abstract: A warping factor estimation system comprises a label information generation unit that outputs voice/non-voice label information, a warp model storage unit in which a probability model representing voice and non-voice occurrence probabilities is stored, and a warp estimation unit that calculates a warping factor in the frequency-axis direction using the probability model representing voice and non-voice occurrence probabilities, the voice and non-voice labels, and a cepstrum.
    Type: Application
    Filed: September 22, 2008
    Publication date: August 12, 2010
    Inventor: Tadashi Emori
  • Patent number: 7765100
    Abstract: A method and an apparatus for recovering a line spectrum pair (LSP) parameter of a spectrum region when frame loss occurs during speech decoding and a speech decoding apparatus adopting the same are provided. The method of recovering an LSP parameter in speech decoding includes: if it is determined that a received speech packet has an erased frame, converting an LSP parameter of a previous good frame (PGF) of the erased frame or LSP parameters of the PGF and a next good frame (NGF) of the erased frame into a spectrum region and obtaining a spectrum envelope of the PGF or spectrum envelopes of the PGF and NGF; recovering a spectrum envelope of the erased frame using the spectrum envelope of the PGF or the spectrum envelopes of the PGF and NGF; and converting the recovered spectrum envelope of the erased frame into an LSP parameter of the erased frame.
    Type: Grant
    Filed: February 6, 2006
    Date of Patent: July 27, 2010
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Hosang Sung, Seungho Choi, Kihyun Choo
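A condensed Python sketch of the recovery path: convert the neighboring good frames' LSP parameters to spectral envelopes, interpolate an envelope for the erased frame, and convert back. The `lsp_to_envelope`/`envelope_to_lsp` pair below is a trivial invertible placeholder, not a real LSP-to-spectrum transform.

```python
# Erased-frame LSP recovery in the spirit of patent 7765100, with placeholder
# transforms between the LSP and spectral-envelope domains.
import numpy as np

def lsp_to_envelope(lsp: np.ndarray) -> np.ndarray:
    return np.cumsum(lsp)               # placeholder spectral representation

def envelope_to_lsp(env: np.ndarray) -> np.ndarray:
    return np.diff(env, prepend=0.0)    # exact inverse of the placeholder

def recover_erased_frame(lsp_prev, lsp_next, weight=0.5):
    """Interpolate envelopes of the previous/next good frames, convert back."""
    env = (1 - weight) * lsp_to_envelope(lsp_prev) + weight * lsp_to_envelope(lsp_next)
    return envelope_to_lsp(env)

prev_good = np.array([0.10, 0.25, 0.40])
next_good = np.array([0.12, 0.27, 0.38])
print(recover_erased_frame(prev_good, next_good))
```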
  • Patent number: 7761299
    Abstract: A speech synthesis system can select recorded speech fragments, or acoustic units, from a large database of acoustic units to produce artificial speech. The selected acoustic units are chosen to minimize a combination of target and concatenation costs for a given sentence. Concatenation costs are expensive to compute. Processing is reduced by pre-computing and caching the concatenation costs. The number of possible sequential pairs of acoustic units makes such caching prohibitive. A method for constructing an efficient concatenation cost database is provided by synthesizing a large body of speech, identifying the acoustic unit sequential pairs generated and their respective concatenation costs, and storing those concatenation costs likely to occur.
    Type: Grant
    Filed: March 27, 2008
    Date of Patent: July 20, 2010
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Mark Charles Beutnagel, Mehryar Mohri, Michael Dennis Riley
  • Publication number: 20100145706
    Abstract: An object of the present invention is to provide a device and a method for generating a synthesized speech that has an utterance form matching the music. A musical genre estimation unit of the speech synthesizing device estimates the musical genre to which a received music signal belongs, and an utterance form selection unit references an utterance form information storage unit to determine an utterance form from the musical genre. A prosody generation unit references a prosody generation rule storage unit, selected from prosody generation rule storage units 151 to 15N according to the utterance form, and generates prosody information from a phonetic symbol sequence. A unit waveform selection unit references a unit waveform data storage unit, selected from unit waveform data storage units 161 to 16N according to the utterance form, and selects a unit waveform from the phonetic symbol sequence and the prosody information.
    Type: Application
    Filed: February 1, 2007
    Publication date: June 10, 2010
    Applicant: NEC CORPORATION
    Inventor: Masanori Kato
  • Patent number: 7725315
    Abstract: A voice enhancement system is provided for improving the perceptual quality of a processed voice signal. The system improves the perceptual quality of a received voice signal by removing unwanted noise from a voice signal recorded by a microphone or from some other source. Specifically, the system removes sounds that occur within the environment of the signal source but which are unrelated to speech. The system is especially well adapted for removing transient road noises from speech signals recorded in moving vehicles. Transient road noises include common temporal and spectral characteristics that can be modeled. A transient road noise detector employs such models to detect the presence of transient road noises in a voice signal. If transient road noises are found to be present, a transient road noise attenuator is provided to remove them from the signal.
    Type: Grant
    Filed: October 17, 2005
    Date of Patent: May 25, 2010
    Assignee: QNX Software Systems (Wavemakers), Inc.
    Inventors: Phillip A. Hetherington, Shreyas Paranjpe
  • Publication number: 20100125459
    Abstract: Exemplary embodiments provide for determining a sequence of words in a TTS system. An input text is analyzed using two models, a word n-gram model and an accent-class n-gram model. For each model, a list of all possible words for each word in the input is generated. Each word in each list is given a score based on the probability that the word is the correct word in the sequence, according to the particular model. The two lists are combined and the two scores are combined for each word. A set of sequences of words is generated, each sequence comprising a unique combination of an attribute and associated word for each word in the input. The combined scores of the words in each sequence are accumulated, and the sequence of words having the highest score is selected and presented to a user.
    Type: Application
    Filed: July 1, 2009
    Publication date: May 20, 2010
    Applicant: Nuance Communications, Inc.
    Inventors: Nobuyasu Itoh, Tohru Nagano, Masafumi Nishimura, Ryuki Tachibana
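The score combination can be illustrated with a toy Python example; the log-linear mixing weight and the probabilities are invented, and the real system scores whole word sequences rather than isolated candidates.

```python
# Toy combination of word n-gram and accent-class n-gram scores, in the
# spirit of publication 20100125459.
import math

def combine(word_p: float, accent_p: float, lam: float = 0.6) -> float:
    """Log-linear mix of the two models' probabilities (lam is assumed)."""
    return lam * math.log(word_p) + (1 - lam) * math.log(accent_p)

candidates = {                 # candidate: (word-model prob, accent-model prob)
    "hashi(bridge)": (0.40, 0.70),
    "hashi(chopsticks)": (0.45, 0.20),
}
best = max(candidates, key=lambda w: combine(*candidates[w]))
print(best)                    # the accent model tips the choice to "bridge"
```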
  • Publication number: 20100088089
    Abstract: Synthesizing a set of digital speech samples corresponding to a selected voicing state includes dividing speech model parameters into frames, with a frame of speech model parameters including pitch information, voicing information determining the voicing state in one or more frequency regions, and spectral information. First and second digital filters are computed using, respectively, first and second frames of speech model parameters, with the frequency responses of the digital filters corresponding to the spectral information in frequency regions for which the voicing state equals the selected voicing state. A set of pulse locations are determined, and sets of first and second signal samples are produced using the pulse locations and, respectively, the first and second digital filters. Finally, the sets of first and second signal samples are combined to produce a set of digital speech samples corresponding to the selected voicing state.
    Type: Application
    Filed: August 21, 2009
    Publication date: April 8, 2010
    Applicant: DIGITAL VOICE SYSTEMS, INC.
    Inventor: John C. Hardwick
  • Patent number: 7689421
    Abstract: Described is a voice persona service by which users convert text into speech waveforms, based on user-provided parameters and voice data from a service data store. The service may be remotely accessed, such as via the Internet. The user may provide text tagged with parameters, with the text sent to a text-to-speech engine along with base or custom voice data, and the resulting waveform morphed based on the tags. The user may also provide speech. Once created, a voice persona corresponding to the speech waveform may be persisted, exchanged, made public, shared and so forth. In one example, the voice persona service receives user input and parameters, and retrieves a base or custom voice that may be edited by the user via a morphing algorithm. The service outputs a waveform, such as a .wav file for embedding in a software program, and persists the voice persona corresponding to that waveform.
    Type: Grant
    Filed: June 27, 2007
    Date of Patent: March 30, 2010
    Assignee: Microsoft Corporation
    Inventors: Yusheng Li, Min Chu, Xin Zou, Frank Kao-ping Soong
  • Publication number: 20100076768
    Abstract: Disclosed is a speech synthesizing apparatus including a segment selection unit that selects a segment suited to a target segment environment from candidate segments, a prosody change amount calculation unit that calculates the prosody change amount of each candidate segment based on prosody information of the candidate segments and the target segment environment, a selection criterion calculation unit that calculates a selection criterion based on the prosody change amount, a candidate selection unit that narrows down selection candidates based on the prosody change amount and the selection criterion, and an optimum segment search unit that searches for an optimum segment from among the narrowed-down candidate segments.
    Type: Application
    Filed: February 15, 2008
    Publication date: March 25, 2010
    Applicant: NEC CORPORATION
    Inventors: Masanori Kato, Reishi Kondo, Yasuyuki Mitsui
  • Patent number: 7684977
    Abstract: In an interface unit, an input section obtains an input signal such as the user's speech, and an input processing section processes the input signal and detects information relating to the user. On the basis of the detection result, a response contents determination section determines the contents of the response to the user. Meanwhile, a response manner adjusting section adjusts the manner of the response to the user, such as speech speed, on the basis of the processing state of the input signal, the information relating to the user detected from the input signal, and the like.
    Type: Grant
    Filed: June 8, 2006
    Date of Patent: March 23, 2010
    Assignee: Panasonic Corporation
    Inventor: Koji Morikawa
  • Publication number: 20100049523
    Abstract: Systems and methods for providing synthesized speech in a manner that takes into account the environment where the speech is presented. A method embodiment includes, based on a listening environment and at least one other parameter associated with the listening environment, selecting an approach from a plurality of approaches for presenting synthesized speech in the listening environment; presenting synthesized speech according to the selected approach; and, based on natural language input received from a user indicating an inability to understand the presented synthesized speech, selecting a second approach from the plurality of approaches and presenting subsequent synthesized speech using the second approach.
    Type: Application
    Filed: October 28, 2009
    Publication date: February 25, 2010
    Applicant: AT&T Corp.
    Inventors: Kenneth H. Rosen, Carroll W. Creswell, Jeffrey J. Farah, Pradeep K. Bansal, Ann K. Syrdal
  • Publication number: 20090319275
    Abstract: A speech synthesizing device, the device includes: a text accepting unit for accepting text data; an extracting unit for extracting a special character including a pictographic character, a face mark or a symbol from text data accepted by the text accepting unit; a dictionary database in which a plurality of special characters and a plurality of phonetic expressions for each special character are registered; a selecting unit for selecting a phonetic expression of an extracted special character from the dictionary database when the extracting unit extracts the special character; a converting unit for converting the text data accepted by the accepting unit to a phonogram in accordance with a phonetic expression selected by the selecting unit in association with the extracted special character; and a speech synthesizing unit for synthesizing a voice from a phonogram obtained by the converting unit.
    Type: Application
    Filed: August 31, 2009
    Publication date: December 24, 2009
    Applicant: FUJITSU LIMITED
    Inventor: Takuya Noda
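A much-reduced Python illustration: a dictionary maps each special character to several phonetic expressions, and one is selected from context before synthesis. The dictionary entries and the crude mood cue are invented examples, not the publication's selection method.

```python
# Invented illustration of special-character phoneticization for
# publication 20090319275.
DICTIONARY = {
    "(^_^)": {"happy": "smiling face", "neutral": "face mark"},
    "(;_;)": {"happy": "tearful face", "neutral": "crying face mark"},
}

def phoneticize(text: str) -> str:
    # Crude context cue standing in for a real phonetic-expression selector.
    mood = "happy" if "great" in text or "yay" in text else "neutral"
    for special, readings in DICTIONARY.items():
        text = text.replace(special, " " + readings[mood] + " ")
    return " ".join(text.split())

print(phoneticize("it was great (^_^)"))   # -> "it was great smiling face"
```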
  • Publication number: 20090313025
    Abstract: A method and system are disclosed that automatically segment speech to generate a speech inventory. The method includes initializing a Hidden Markov Model (HMM) using seed input data, performing a segmentation of the HMM into speech units to generate phone labels, correcting the segmentation of the speech units. Correcting the segmentation of the speech units includes re-estimating the HMM based on a current version of the phone labels, embedded re-estimating of the HMM, and updating the current version of the phone labels using spectral boundary correction. The system includes modules configured to control a processor to perform steps of the method.
    Type: Application
    Filed: August 20, 2009
    Publication date: December 17, 2009
    Applicant: AT&T Corp.
    Inventors: Alistair D. CONKIE, Yeon-Jun KIM
  • Patent number: 7630898
    Abstract: Disclosed are various elements of a toolkit used for generating a TTS voice for use in a spoken dialog system. The embodiments in each case may be in the form of the system, a computer-readable medium or a method for generating the TTS voice. One embodiment of the invention relates to a method of generating a database for a TTS voice. The method comprises matching every spoken word associated with a TTS voice database with a smallest set of possible pronunciations for each word. The smallest set is generated by automatically determining a dialect and linguistic context using linguistic rules, empirically determining idiosyncratic speaker characteristics and determining a subject domain. The method further comprises dynamically generating a pronunciation dictionary on a word-by-word basis using the smallest set.
    Type: Grant
    Filed: September 27, 2005
    Date of Patent: December 8, 2009
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Steven Lawrence Davis, Shane Fetters, David Eugene Schulz, Beverly Gustafson, Louise Loney
  • Patent number: 7613612
    Abstract: In a voice synthesizer, an envelope acquisition portion obtains a spectral envelope of a reference frequency spectrum of a given voice. A spectrum acquisition portion obtains a collective frequency spectrum of a plurality of voices which are generated in parallel to one another. An envelope adjustment portion adjusts a spectral envelope of the collective frequency spectrum obtained by the spectrum acquisition portion so as to approximately match with the spectral envelope of the reference frequency spectrum obtained by the envelope acquisition portion. A voice generation portion generates an output voice signal from the collective frequency spectrum having the spectral envelope adjusted by the envelope adjustment portion.
    Type: Grant
    Filed: January 31, 2006
    Date of Patent: November 3, 2009
    Assignee: Yamaha Corporation
    Inventors: Hideki Kemmochi, Jordi Bonada
  • Patent number: 7606710
    Abstract: A method for text-to-pronunciation conversion includes a process for searching grapheme-phoneme segments and a three-stage process of text-to-pronunciation conversion. This method looks for sequences of grapheme-phoneme pairs, each referred to as a chunk, via a trained pronouncing dictionary; performs grapheme segmentation, chunk marking and a decision process on an input text; and determines a pronouncing sequence for the text. With the chunk marking, the method greatly reduces the search space on the associated phoneme graph, and thereby substantially increases the search speed for candidate chunk sequences. The method maintains high word accuracy while saving computing time.
    Type: Grant
    Filed: December 21, 2005
    Date of Patent: October 20, 2009
    Assignee: Industrial Technology Research Institute
    Inventors: Nien-Chih Wang, Ching-Hsieh Lee
  • Publication number: 20090254349
    Abstract: A speech synthesizer can execute speech content editing at high speed and generate speech content easily. The speech synthesizer includes a small speech element DB (101), a small speech element selection unit (102), a small speech element concatenation unit (103), a prosody modification unit (104), a large speech element DB (105), a correspondence DB (106) that associates the small speech element DB (101) with the large speech element DB (105), a speech element candidate obtainment unit (107), a large speech element selection unit (108), and a large speech element concatenation unit (109). By editing synthetic speech using the small speech element DB (101) and performing quality enhancement on an editing result using the large speech element DB (105), speech content can be generated easily on a mobile terminal.
    Type: Application
    Filed: May 11, 2007
    Publication date: October 8, 2009
    Inventors: Yoshifumi Hirose, Yumiko Kato, Takahiro Kamai
  • Patent number: 7599838
    Abstract: Methods and systems, including computer program products, for speech animation. The system includes a speech animation server and one or more speech animation clients. The speech animation server generates speech animation content that drives the expressions and behaviors of talking agents displayed by the speech animation clients. The data used by the server includes one or more references to behavioral contexts. A behavioral context corresponds to a particular application scenario and includes a set of expressions that are appropriate to the particular application scenario. A behavioral context can also be defined as a combination of two or more other behavioral contexts. The server automatically incorporates the expressions of a particular behavioral context into any data that references the particular behavioral context.
    Type: Grant
    Filed: September 1, 2004
    Date of Patent: October 6, 2009
    Assignee: SAP Aktiengesellschaft
    Inventors: Li Gong, Townsend Duong, Andrew Yinger
  • Publication number: 20090248417
    Abstract: A method to generate a pitch contour for speech synthesis is proposed. The method is based on finding the pitch contour that maximizes a total likelihood function created by the combination of all the statistical models of the pitch contour segments of an utterance, at one or multiple linguistic levels. These statistical models are trained from a database of spoken speech, by means of a decision tree that, for each linguistic level, clusters the parametric representation of the pitch segments extracted from the spoken speech data with features obtained from the text associated with that speech data. The parameterization of the pitch segments is performed in such a way that the likelihood function of any linguistic level can be expressed in terms of the parameters of one of the levels, thus allowing the maximization to be calculated with respect to the parameters of that level.
    Type: Application
    Filed: March 17, 2009
    Publication date: October 1, 2009
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Javier Latorre, Masami Akamine
  • Patent number: 7590540
    Abstract: A method for distance definition in a text-to-speech conversion system by applying Gaussian Mixture Model (GMM) to a distance definition. According to an embodiment, the text that is to be subjected to text-to-speech conversion is analyzed to obtain a text with descriptive prosody annotation; clustering is performed for samples in the obtained text; and a GMM model is generated for each cluster, to determine the distance between the sample and the corresponding GMM model.
    Type: Grant
    Filed: September 29, 2005
    Date of Patent: September 15, 2009
    Assignee: Nuance Communications, Inc.
    Inventors: Wei Z W Zhang, Xi Jun Ma, Ling Jin, Hai Xin Chai
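A small Python sketch of the distance definition, using scikit-learn's GaussianMixture as an assumed stand-in for the patent's GMM training: a sample's distance to a cluster is taken as its negative log-likelihood under that cluster's model. The synthetic cluster data is a placeholder for prosody-annotated samples.

```python
# GMM-based sample-to-cluster distance in the spirit of patent 7590540.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
cluster_samples = rng.normal(loc=0.0, size=(200, 3))   # one prosody cluster

gmm = GaussianMixture(n_components=2, random_state=0).fit(cluster_samples)

def gmm_distance(sample: np.ndarray) -> float:
    """Distance of a sample to the cluster: negative GMM log-likelihood."""
    return -gmm.score_samples(sample[None, :])[0]

print(gmm_distance(np.zeros(3)), gmm_distance(np.full(3, 5.0)))  # near < far
```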
  • Patent number: 7587320
    Abstract: Systems and methods for automatically segmenting speech inventories. A set of Hidden Markov Models (HMMs) are initialized using bootstrap data. The HMMs are next re-estimated and aligned to produce phone labels. The phone boundaries of the phone labels are then corrected using spectral boundary correction. Optionally, this process of using the spectral-boundary-corrected phone labels as input instead of the bootstrap data is performed iteratively in order to further reduce mismatches between manual labels and phone labels assigned by the HMM approach.
    Type: Grant
    Filed: August 1, 2007
    Date of Patent: September 8, 2009
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Alistair D. Conkie, Yeon-Jun Kim
  • Patent number: 7584104
    Abstract: A system, method and computer readable medium that trains a text-to-speech synthesis system for use in speech synthesis is disclosed. The method may include recording audio files of one or more live voices speaking language used in a specific domain, the audio files being recorded using various prosodies, storing the recorded audio files in a speech database, and training a text-to-speech synthesis system using the speech database, wherein the text-to-speech synthesis system selects audio segments having a prosody based on at least one dialog state and one speech act.
    Type: Grant
    Filed: September 8, 2006
    Date of Patent: September 1, 2009
    Assignee: AT&T Intellectual Property II, L.P.
    Inventor: Horst Juergen Schroeter
  • Patent number: 7574360
    Abstract: A unit selection module for Chinese Text-to-Speech (TTS) synthesis includes a probabilistic context free grammar (PCFG) parser, a latent semantic indexing (LSI) module, and a modified variable-length unit selection scheme; any Chinese sentence is first input and then parsed into a context-free grammar (CFG) by the PCFG parser; there are several possible CFGs for every Chinese sentence, and the CFG (or syntactic structure) with the highest probability is taken as the best CFG (or syntactic structure) of the Chinese sentence; the LSI module is then used to calculate the structural distance between all the candidate synthesis units and the target unit in a corpus; through the modified variable-length unit selection scheme, together with the dynamic programming algorithm, the units are searched to find the best synthesis unit concatenation sequence.
    Type: Grant
    Filed: July 22, 2005
    Date of Patent: August 11, 2009
    Assignee: National Cheng Kung University
    Inventors: Chung Hsien Wu, Jiun Fu Chen, Chi Chun Hsia, Jhing Fa Wang
  • Patent number: 7565293
    Abstract: A Voice User Interface (VUI) is provided for interactively responding in a synthesized voice to a call from a human caller, along with a Text-to-Speech system by which text entered by an agent and interactive data are converted to synthesized speech, a morphing transformation library containing pre-computed voice transformation parameters unique to each agent affiliated with the VUI, and a switching system for transferring handling of the call between the VUI and the agent. The human agent's verbal interaction with the caller is performed in the agent's natural voice. Text transmitted by an agent to a caller, as well as interactive data, is rendered in a synthesized voice created using the pre-computed transformation parameters corresponding to the agent's ID, selected from the morphing transformation library. All speech presented to a caller is rendered in approximately the same unique voice as initially presented when the call is established, thereby permitting an aurally seamless phone call, as perceived by the caller.
    Type: Grant
    Filed: May 7, 2008
    Date of Patent: July 21, 2009
    Assignee: International Business Machines Corporation
    Inventors: Oded Fuhrmann, Ron Hoory, Dan Pelleg