Specialized Model Patents (Class 704/266)
  • Patent number: 8155966
    Abstract: [Problem] To convert a non-audible murmur signal obtained through an in-vivo conduction microphone into a speech signal that a receiving person can recognize with maximum accuracy (and is unlikely to misrecognize).
    Type: Grant
    Filed: February 7, 2007
    Date of Patent: April 10, 2012
    Assignee: National University Corporation Nara Institute of Science and Technology
    Inventors: Tomoki Toda, Mikihiro Nakagiri, Hideki Kashioka, Kiyohiro Shikano
  • Patent number: 8145492
    Abstract: A behavior control system of a robot for learning a phoneme sequence includes a sound inputting device inputting a phoneme sequence, a sound signal learning unit operable to convert the phoneme sequence into a sound synthesis parameter and to learn or evaluate a relationship between a sound synthesis parameter of a phoneme sequence that is generated by the robot and a sound synthesis parameter used for sound imitation, and a sound synthesizer operable to generate a phoneme sequence based on the sound synthesis parameter obtained by the sound signal learning unit.
    Type: Grant
    Filed: April 6, 2005
    Date of Patent: March 27, 2012
    Assignee: Sony Corporation
    Inventor: Masahiro Fujita
  • Patent number: 8135591
    Abstract: A method and system are disclosed that train a text-to-speech synthesis system for use in speech synthesis. The method includes generating a speech database of audio files comprising domain-specific voices having various prosodies, and training a text-to-speech synthesis system using the speech database by selecting audio segments having a prosody based on at least one dialog state. The system includes a processor, a speech database of audio files, and modules for implementing the method.
    Type: Grant
    Filed: August 13, 2009
    Date of Patent: March 13, 2012
    Assignee: AT&T Intellectual Property II, L.P.
    Inventor: Horst Juergen Schroeter
  • Patent number: 8131547
    Abstract: A method and system are disclosed that automatically segment speech to generate a speech inventory. The method includes initializing a Hidden Markov Model (HMM) using seed input data, performing a segmentation of the HMM into speech units to generate phone labels, correcting the segmentation of the speech units. Correcting the segmentation of the speech units includes re-estimating the HMM based on a current version of the phone labels, embedded re-estimating of the HMM, and updating the current version of the phone labels using spectral boundary correction. The system includes modules configured to control a processor to perform steps of the method.
    Type: Grant
    Filed: August 20, 2009
    Date of Patent: March 6, 2012
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Alistair D. Conkie, Yeon-Jun Kim
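For illustration, here is a minimal Python sketch of the spectral boundary correction step this patent describes: each phone boundary from a forced alignment is nudged to the frame of greatest spectral change within a small search window. The synthetic MFCC features, distance measure, and window size are invented placeholders, not the patented procedure.

```python
# Sketch of spectral boundary correction (patent 8131547), under assumptions:
# boundaries come from an HMM forced alignment, and "spectral change" is the
# Euclidean distance between consecutive MFCC frames.
import numpy as np

def correct_boundary(mfcc: np.ndarray, boundary: int, window: int = 3) -> int:
    """Move `boundary` to the frame with the largest spectral jump
    within +/- `window` frames of the HMM-assigned position."""
    lo = max(1, boundary - window)
    hi = min(len(mfcc) - 1, boundary + window + 1)
    jumps = [np.linalg.norm(mfcc[t] - mfcc[t - 1]) for t in range(lo, hi)]
    return lo + int(np.argmax(jumps))

rng = np.random.default_rng(0)
mfcc = rng.normal(size=(100, 13))           # 100 frames of 13-dim features
mfcc[40:] += 5.0                            # abrupt spectral change at frame 40
print(correct_boundary(mfcc, boundary=38))  # -> 40
```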
  • Patent number: 8126718
    Abstract: To facilitate text-to-speech conversion of a username, a first or last name of a user associated with the username may be retrieved, and a pronunciation of the username may be determined based at least in part on whether the name forms at least part of the username. To facilitate text-to-speech conversion of a domain name having a top level domain and at least one other level domain, a pronunciation for the top level domain may be determined based at least in part upon whether the top level domain is one of a predetermined set of top level domains. Each other level domain may be searched for one or more recognized words therewithin, and a pronunciation of the other level domain may be determined based at least in part on an outcome of the search. The username and domain name may form part of a network address such as an email address, URL or URI.
    Type: Grant
    Filed: July 11, 2008
    Date of Patent: February 28, 2012
    Assignee: Research In Motion Limited
    Inventors: Matthew Bells, Jennifer Elizabeth Lhotak, Michael Angelo Nanni
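The pronunciation logic can be pictured with a small, hypothetical Python sketch: say the username as a word only when it embeds the user's first or last name, otherwise spell it out, and read unknown top-level domains letter by letter. The rules and TLD table are assumptions for illustration, not the claimed method.

```python
# Hypothetical reduction of the address-pronunciation logic in patent 8126718.
SPOKEN_TLDS = {"com": "dot com", "org": "dot org", "net": "dot net"}

def speak_address(address: str, first: str, last: str) -> str:
    username, domain = address.split("@")
    # Username: say it as a word only if it embeds a known name of the user.
    if first.lower() in username.lower() or last.lower() in username.lower():
        spoken_user = username
    else:
        spoken_user = " ".join(username)        # spell out letter by letter
    labels = domain.split(".")
    tld = labels[-1]
    # Known TLDs get a fixed reading; unknown ones are spelled out.
    spoken_tld = SPOKEN_TLDS.get(tld, "dot " + " ".join(tld))
    spoken_domain = " dot ".join(labels[:-1])
    return f"{spoken_user} at {spoken_domain} {spoken_tld}"

print(speak_address("jsmith42@example.com", "John", "Smith"))
# -> "jsmith42 at example dot com"
```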
  • Patent number: 8121847
    Abstract: The disclosure relates to a communication terminal having a bandwidth expansion device for expanding the bandwidth of a narrowband voice signal, on a low-frequency and/or high-frequency side, by synthesizing at least one frequency band on the basis of the narrowband voice signal. A qualitatively satisfactory bandwidth expansion is thus performed using a plurality of net bit rates. The bandwidth expansion device is further connected to a memory containing a lookup table comprising at least one parameter value for the bandwidth expansion, for at least two net bit rates of the narrowband voice signal. A method for expanding a bandwidth of a narrowband voice signal having at least two net bit rates in a communication terminal is also disclosed herein.
    Type: Grant
    Filed: October 30, 2003
    Date of Patent: February 21, 2012
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Stefano Ambrosius Klinke, Frank Lorenz
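As a rough illustration, the lookup-table idea might reduce to something like the following Python sketch, keying expansion parameters on the narrowband signal's net bit rate; the rates and parameter values are invented placeholders.

```python
# Invented illustration of the per-bit-rate lookup table in patent 8121847.
EXPANSION_TABLE = {        # net bit rate (kbit/s) -> expansion parameters
    12.2: {"high_band_gain": 0.35, "cutoff_hz": 3400},
    5.9:  {"high_band_gain": 0.20, "cutoff_hz": 3000},
}

def expansion_params(net_bit_rate: float) -> dict:
    # Fall back to the nearest tabulated rate when there is no exact match.
    nearest = min(EXPANSION_TABLE, key=lambda r: abs(r - net_bit_rate))
    return EXPANSION_TABLE[nearest]

print(expansion_params(6.7))   # -> parameters tabulated for 5.9 kbit/s
```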
  • Patent number: 8099281
    Abstract: A device and related methods for word-sense disambiguation during a text-to-speech conversion are provided. The device, for use with a computer-based system capable of converting text data to synthesized speech, includes an identification module for identifying a homograph contained in the text data. The device also includes an assignment module for assigning a pronunciation to the homograph using a statistical test constructed from a recursive partitioning of training samples, each training sample being a word string containing the homograph. The recursive partitioning is based on determining for each training sample an order and a distance of each word indicator relative to the homograph in the training sample. An absence of one of the word indicators in a training sample is treated as equivalent to the absent word indicator being more than a predefined distance from the homograph.
    Type: Grant
    Filed: June 6, 2005
    Date of Patent: January 17, 2012
    Assignee: Nuance Communications, Inc.
    Inventor: Philip Gleason
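A toy Python sketch of the word-indicator features: each indicator's signed distance from the homograph is computed, with absent indicators capped at a predefined distance. The patent feeds such features to a statistical test built by recursive partitioning of training samples; the feature extraction below is only a stand-in.

```python
# Toy indicator features for homograph disambiguation (patent 8099281).
MAX_DIST = 10   # absent indicators are treated as "farther than this"

def indicator_features(tokens, homograph, indicators):
    """Signed distance of each indicator word relative to the homograph."""
    pos = tokens.index(homograph)
    feats = {}
    for ind in indicators:
        feats[ind] = tokens.index(ind) - pos if ind in tokens else MAX_DIST
    return feats

tokens = "he played a bass line on his guitar".split()
print(indicator_features(tokens, "bass", ["fish", "guitar"]))
# {'fish': 10, 'guitar': 4} -> "guitar" is close, favoring /beɪs/
```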
  • Patent number: 8086456
    Abstract: A speech synthesis system can select recorded speech fragments, or acoustic units, from a very large database of acoustic units to produce artificial speech. The selected acoustic units are chosen to minimize a combination of target and concatenation costs for a given sentence. However, as concatenation costs, which are measures of the mismatch between sequential pairs of acoustic units, are expensive to compute, processing can be greatly reduced by pre-computing and caching the concatenation costs. Unfortunately, the number of possible sequential pairs of acoustic units makes such caching prohibitive. A method for constructing an efficient concatenation cost database is provided by synthesizing a large body of speech, identifying the acoustic unit sequential pairs generated and their respective concatenation costs. By constructing a concatenation cost database in this fashion, the processing power required at run-time is greatly reduced with negligible effect on speech quality.
    Type: Grant
    Filed: July 20, 2010
    Date of Patent: December 27, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Mark Charles Beutnagel, Mehryar Mohri, Michael Dennis Riley
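The caching idea can be sketched in a few lines of Python: synthesize a corpus once, record which unit pairs actually occur in sequence, and cache only those costs. The cost function below is a placeholder for the expensive spectral mismatch measure.

```python
# Sketch of the concatenation-cost cache of patent 8086456, with a dummy cost.
from itertools import pairwise   # Python 3.10+

def concat_cost(a: str, b: str) -> float:
    return abs(hash(a) - hash(b)) % 100 / 100.0   # placeholder cost

def build_cost_cache(unit_sequences):
    cache = {}
    for seq in unit_sequences:            # one unit sequence per synthesized sentence
        for a, b in pairwise(seq):
            if (a, b) not in cache:
                cache[(a, b)] = concat_cost(a, b)
    return cache

cache = build_cost_cache([["dh", "ax", "k", "ae", "t"], ["ax", "k", "ao", "l"]])
print(len(cache), "cached pairs instead of all |units|^2 combinations")
```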
  • Patent number: 8073696
    Abstract: A voice synthesis device is provided to include: an emotion input unit obtaining an utterance mode of a voice waveform, a prosody generation unit generating a prosody, a characteristic tone selection unit selecting a characteristic tone based on the utterance mode; and a characteristic tone temporal position estimation unit (i) judging whether or not each of phonemes included in a phonologic sequence of text is to be uttered with the characteristic tone, based on the phonologic sequence, the characteristic tone, and the prosody, and (ii) deciding a phoneme, which is an utterance position where the text is uttered with the characteristic tone. The voice synthesis device also includes an element selection unit and an element connection unit generating the voice waveform based on the phonologic sequence, the prosody, and the utterance position, so that the text is uttered in the utterance mode with the characteristic tone at the determined utterance position.
    Type: Grant
    Filed: May 2, 2006
    Date of Patent: December 6, 2011
    Assignee: Panasonic Corporation
    Inventors: Yumiko Kato, Takahiro Kamai
  • Publication number: 20110282668
    Abstract: A method of and system for speech synthesis. First and second text inputs are received in a text-to-speech system, and processed into respective first and second speech outputs corresponding to stored speech respectively from first and second speakers using a processor of the system. The second speech output of the second speaker is adapted to sound like the first speech output of the first speaker.
    Type: Application
    Filed: May 14, 2010
    Publication date: November 17, 2011
    Applicant: GENERAL MOTORS LLC
    Inventors: Jeffrey M. Stefan, Gaurav Talwar, Rathinavelu Chengalvarayan
  • Patent number: 8035643
    Abstract: Systems and methods are described that create a mapping from a space of a source object (e.g., source facial expressions) to a space of a target object (e.g., target facial expressions). In certain implementations, the mapping is learned based on a training set composed of corresponding shapes (e.g., facial expressions) in each space. The user can create the training set by selecting expressions from, for example, captured source performance data, and by sculpting corresponding target expressions. Additional target shapes (e.g., target facial expressions) can be interpolated and extrapolated from the shapes in the training set to generate corresponding shapes for potential source shapes (e.g., facial expressions).
    Type: Grant
    Filed: March 19, 2007
    Date of Patent: October 11, 2011
    Assignee: Lucasfilm Entertainment Company Ltd.
    Inventors: Frederic P. Pighin, Cary Phillips, Steve Sullivan
  • Patent number: 8036884
    Abstract: The present invention provides a method, a computer software product and an apparatus for enabling a determination of speech-related audio data within a record of digital audio data. The method comprises steps for extracting audio features from the record of digital audio data, for classifying one or more subsections of the record of digital audio data, and for marking at least a part of the record of digital audio data classified as speech. The classification of the digital audio data record is performed on the basis of the extracted audio features and with respect to at least one predetermined audio class.
    Type: Grant
    Filed: February 24, 2005
    Date of Patent: October 11, 2011
    Assignee: Sony Deutschland GmbH
    Inventors: Yin Hay Lam, Josep Maria Sola I Caros
  • Patent number: 8032377
    Abstract: Grapheme-to-phoneme alignment quality is improved by introducing a first preliminary alignment step, followed by an enlargement step of the grapheme set and phoneme set, and a second alignment step based on the previously enlarged grapheme/phoneme sets. During the enlargement step, grapheme clusters and phoneme clusters are generated that become members of a new grapheme and phoneme set. The new elements are chosen using statistical information calculated from the results of the first alignment step. The enlarged sets are the new grapheme and phoneme alphabet used for the second alignment step. The lexicon is rewritten using this new alphabet before starting the second alignment step, which produces the final result.
    Type: Grant
    Filed: April 30, 2003
    Date of Patent: October 4, 2011
    Assignee: Loquendo S.p.A.
    Inventor: Paolo Massimino
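A rough Python sketch of the enlargement step, under the assumption that a digraph shows up in a first alignment as a grapheme aligned to the empty phoneme (e.g., "ph" aligned as p->/f/, h->nothing): frequent such pairs are promoted to new single symbols before the second alignment pass. The counts, threshold, and phoneme notation are illustrative, not the patent's statistics.

```python
# Illustrative grapheme-cluster discovery for patent 8032377's enlargement step.
from collections import Counter

def find_grapheme_clusters(alignments, min_count=2):
    """Promote adjacent grapheme pairs whose second grapheme aligned to the
    empty phoneme - a common signature of digraphs like 'ph' -> /f/."""
    counts = Counter()
    for word in alignments:              # word = [(grapheme, phoneme), ...]
        for (g1, p1), (g2, p2) in zip(word, word[1:]):
            if p2 == "":
                counts[g1 + g2] += 1
    return {g for g, c in counts.items() if c >= min_count}

aligned = [[("p", "f"), ("h", ""), ("o", "@U"), ("n", "n"), ("e", "")],
           [("p", "f"), ("h", ""), ("o", "Q"), ("t", "t"), ("o", "@U")]]
print(find_grapheme_clusters(aligned))   # {'ph'} joins the grapheme alphabet
```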
  • Patent number: 8010362
    Abstract: A voice conversion rule and a rule selection parameter are stored. The voice conversion rule converts a spectral parameter vector of a source speaker to a spectral parameter vector of a target speaker. The rule selection parameter represents the spectral parameter vector of the source speaker. A first voice conversion rule of start time and a second voice conversion rule of end time in a speech unit of the source speaker are selected by the spectral parameter vector of the start time and the end time. An interpolation coefficient corresponding to the spectral parameter vector of each time in the speech unit is calculated by the first voice conversion rule and the second voice conversion rule. A third voice conversion rule corresponding to the spectral parameter vector of each time in the speech unit is calculated by interpolating the first voice conversion rule and the second voice conversion rule with the interpolation coefficient.
    Type: Grant
    Filed: January 22, 2008
    Date of Patent: August 30, 2011
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Masatsune Tamura, Takehiro Kagoshima
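A bare-bones Python sketch of the interpolation: the rules selected at the unit's start and end times are blended into a third, per-frame rule. Here each rule is a plain linear transform and the coefficient is linear in time, whereas the patent derives the coefficient from the spectral parameter vectors themselves; both simplifications are assumptions.

```python
# Simplified per-frame rule interpolation in the spirit of patent 8010362.
import numpy as np

def convert_unit(frames, rule_start, rule_end):
    """frames: (T, D) spectral parameter vectors of one speech unit."""
    T = len(frames)
    out = np.empty_like(frames)
    for t, x in enumerate(frames):
        w = t / (T - 1) if T > 1 else 0.0            # interpolation coefficient
        rule = (1 - w) * rule_start + w * rule_end   # third, per-frame rule
        out[t] = rule @ x
    return out

frames = np.random.default_rng(1).normal(size=(5, 4))
A, B = np.eye(4), 2 * np.eye(4)                      # start/end rules
print(convert_unit(frames, A, B)[0])                 # first frame uses rule A alone
```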
  • Patent number: 7996226
    Abstract: Disclosed herein are various aspects of a toolkit used for generating a TTS voice for use in a spoken dialog system. The embodiments in each case may be in the form of a system, a computer-readable medium or a method for generating the TTS voice. An embodiment of the invention relates to a method of tracking progress in developing a text-to-speech (TTS) voice. The method comprises ensuring that a corpus of recorded speech contains no reading errors and matches an associated written text, creating a tuple for each utterance in the corpus and tracking progress for each utterance utilizing the tuple. Various parameters may be tracked using the tuple, but the tuple provides a means for enabling multiple workers to efficiently process a database of utterances in preparation of a TTS voice.
    Type: Grant
    Filed: December 15, 2009
    Date of Patent: August 9, 2011
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Steven Lawrence Davis, Shane Fetters, David Eugene Schulz, Beverly Gustafson, Louise Loney
  • Publication number: 20110112840
    Abstract: A synthetic sound generation method for generating a synthetic sound for making a listener recall an image of an actual sound signal which is a sound signal other than a speech signal and of which the listener knows what sound the actual sound signal is, by hearing the speech signal, comprising the steps of: extracting a signal of a predetermined frequency band of an inputted speech signal; extracting an amplitude envelope curve component of the extracted signal; extracting a signal of a predetermined frequency band of the actual sound signal which is a sound signal other than the speech signal and of which the listener knows what sound the actual sound signal is; and multiplying the amplitude envelope curve component of the inputted speech signal and the extracted predetermined frequency band signal of the actual sound signal.
    Type: Application
    Filed: February 13, 2009
    Publication date: May 12, 2011
    Applicant: OTODESIGNERS CO., LTD.
    Inventor: Shinichi Sakamoto
  • Patent number: 7921016
    Abstract: A method for providing a 3D audio work includes providing a one-ear HRTF filter and a related function synthesizer storing a related function therein, and inputting sound signals into the one-ear HRTF filter. The sound signals are converted into one-ear output sound signals which are received by one ear and synthesized to output sound signals for the other ear. A method for providing the related function includes inputting sound signals into HRTF filters of opposite ears and obtaining output sound signals which respectively act as raw signals and target signals. The raw signals are synthesized by a synthesizer to output sound signals which compare with the target signals. A related function registered in the synthesizer is accordingly regulated so as to obtain the related function which satisfies a minimum difference between the output sound signals from the synthesizer and the target signals.
    Type: Grant
    Filed: November 8, 2007
    Date of Patent: April 5, 2011
    Assignee: Foxconn Technology Co., Ltd.
    Inventor: Kuen-Ying Ou
  • Publication number: 20110054903
    Abstract: Embodiments of rich context modeling for speech synthesis are disclosed. In operation, a text-to-speech engine refines a plurality of rich context models based on decision tree-tied Hidden Markov Models (HMMs) to produce a plurality of refined rich context models. The text-to-speech engine then generates synthesized speech for an input text based at least on some of the plurality of refined rich context models.
    Type: Application
    Filed: December 2, 2009
    Publication date: March 3, 2011
    Applicant: MICROSOFT CORPORATION
    Inventors: Zhi-Jie Yan, Yao Qian, Frank Kao-Ping Soong
  • Publication number: 20110046957
    Abstract: Techniques are disclosed for frequency splicing in which speech segments used in the creation of a final speech waveform are constructed, at least in part, by combining (e.g., summing) a small number (e.g., two) of component speech segments that overlap substantially, or entirely, in time but have spectral energy that occupies disjoint, or substantially disjoint, frequency ranges. The component speech segments may be derived from speech segments produced by different speakers or from different speech segments produced by the same speaker. Depending on the embodiment, frequency splicing may supplement rule-based, concatenative, hybrid, or limited-vocabulary speech synthesis systems to provide various advantages.
    Type: Application
    Filed: August 24, 2010
    Publication date: February 24, 2011
    Applicant: NovaSpeech, LLC
    Inventors: Susan R. Hertz, Harold G. Mills
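Frequency splicing is easy to picture with a short Python sketch: two time-aligned signals are band-limited to disjoint frequency ranges and summed. The sinusoid stand-ins, Butterworth filters, and 1 kHz crossover are arbitrary assumptions, not the publication's design.

```python
# Toy frequency splicing (publication 20110046957): band-limit two aligned
# segments to disjoint ranges and sum them into one waveform.
import numpy as np
from scipy.signal import butter, lfilter

fs = 16000
t = np.arange(0, 0.5, 1 / fs)
seg_a = np.sin(2 * np.pi * 200 * t)      # stand-in for one component segment
seg_b = np.sin(2 * np.pi * 3000 * t)     # stand-in for the other segment

b_lo, a_lo = butter(4, 1000 / (fs / 2), btype="low")
b_hi, a_hi = butter(4, 1000 / (fs / 2), btype="high")
spliced = lfilter(b_lo, a_lo, seg_a) + lfilter(b_hi, a_hi, seg_b)
print(spliced.shape)   # one waveform built from two band-disjoint parts
```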
  • Patent number: 7890330
    Abstract: A method records verbal expressions of a person for use in a vehicle navigation system. The vehicle navigation system has a database including a map and text describing street names and points of interest of the map. The method includes the steps of obtaining from the database text of a word having at least one syllable, analyzing the syllable with a greedy algorithm to construct at least one text phrase comprising each syllable, such that the number of phrases is substantially minimized, converting the text phrase to at least one corresponding phonetic symbol phrase, displaying to the person the phonetic symbol phrase, the person verbally expressing each phrase of the phonetic symbol phrase, and recording the verbal expression of each phrase of the phonetic symbol phrase.
    Type: Grant
    Filed: December 30, 2006
    Date of Patent: February 15, 2011
    Assignee: Alpine Electronics Inc.
    Inventors: Inci Ozkaragoz, Benjamin Ao, William Arthur
  • Patent number: 7873517
    Abstract: A motor vehicle has a speech interface for an acoustic input of commands for operating the motor vehicle or a module of the motor vehicle. The speech interface includes a speech recognition database in which a substantial portion of commands or command components, which can be input, are stored in a version according to a pronunciation in a first language and in a version according to a pronunciation in at least a second language, and a speech recognition engine for automatically comparing an acoustic command to commands and/or command components, which are stored in the speech recognition database, in a version according to the pronunciation in the first language and to commands and/or command components, which are stored in the speech recognition database, in a version according to the pronunciation in the second language.
    Type: Grant
    Filed: November 9, 2006
    Date of Patent: January 18, 2011
    Assignee: Volkswagen of America, Inc.
    Inventors: Ramon Prieto, M. Kashif Imam, Carsten Bergmann, Wai Yin Cheung, Carly Williams
  • Patent number: 7865365
    Abstract: A method, system, and computer program product is disclosed for customizing a synthesized voice based upon audible input voice data. The input voice data is typically in the form of one or more predetermined paragraphs being read into a voice recorder. The input voice data is then analyzed for adjustable voice characteristics to determine basic voice qualities (e.g., pitch, breathiness, tone, speed; variability of any of these qualities, etc.) and to identify any “specialized speech patterns”. Based upon this analysis, the characteristics of the voice utilized to read text appearing on the screen are modified to resemble the input voice data. This allows a user of the system to easily and automatically create a voice that is familiar to the user.
    Type: Grant
    Filed: August 5, 2004
    Date of Patent: January 4, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: Debbie Ann Anglin, Howard Neil Anglin, Nyralin Novella Kline
  • Publication number: 20100312562
    Abstract: A rope-jumping algorithm is employed in a Hidden Markov Model based text-to-speech system to determine start and end models and to modify the start and end models by setting small co-variances. Disordered acoustic parameters due to violation of parameter constraints are avoided through this modification, resulting in stable line spectral frequencies for the generated speech.
    Type: Application
    Filed: June 4, 2009
    Publication date: December 9, 2010
    Applicant: Microsoft Corporation
    Inventors: Wenlin Wang, Guoliang Zhang, Jingyang Xu
  • Publication number: 20100305949
    Abstract: It is possible to provide a speech synthesis device, speech synthesis method, and speech synthesis program which can improve speech quality and reduce the amount of calculation in a well-balanced manner. The speech synthesis device includes: a sub-score calculation unit (60/65) which calculates a segment selection sub-score for selecting an optimal segment; and a candidate narrowing unit (70/73) for narrowing the candidates according to the number of candidate segments and the segment selection sub-score. The speech synthesis device narrows down candidates using the sub-score calculation unit (60/65) and the candidate narrowing unit (70/73) during candidate selection when generating synthesized speech from an input text.
    Type: Application
    Filed: November 25, 2008
    Publication date: December 2, 2010
    Inventors: Masanori Kato, Yasuyuki Mitsui, Reishi Kondo
  • Patent number: 7840408
    Abstract: The present invention provides a method and apparatus for training a duration prediction model, a method and apparatus for duration prediction, and a method and apparatus for speech synthesis. Said method for training a duration prediction model comprises: generating an initial duration prediction model with a plurality of attributes related to duration prediction and at least part of the possible attribute combinations of said plurality of attributes, in which each of said plurality of attributes and said attribute combinations is included as an item; calculating the importance of each said item in said duration prediction model; deleting the item having the lowest calculated importance; re-generating a duration prediction model with the remaining items; determining whether said re-generated duration prediction model is an optimal model; and repeating said step of calculating importance and the following steps if said duration prediction model is determined not to be an optimal model.
    Type: Grant
    Filed: October 19, 2006
    Date of Patent: November 23, 2010
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Lifu Yi, Jie Hao
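The training loop amounts to backward elimination, sketched below in Python with placeholder `fit`, `importance`, and `is_optimal` callables (hypothetical names standing in for the patent's model estimation and optimality criteria).

```python
# Backward-elimination sketch of the training loop in patent 7840408.
def train_duration_model(items, fit, importance, is_optimal):
    model = fit(items)
    while not is_optimal(model) and len(items) > 1:
        scores = {item: importance(model, item) for item in items}
        items.remove(min(scores, key=scores.get))   # drop least important item
        model = fit(items)
    return model

# Toy instantiation: the "model" is just the item list, importance is name
# length, and the model is "optimal" once two items remain.
model = train_duration_model(
    ["phone", "stress", "pos", "phone*stress"],
    fit=lambda items: list(items),
    importance=lambda m, item: len(item),
    is_optimal=lambda m: len(m) <= 2,
)
print(model)   # the two surviving items
```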
  • Patent number: 7792673
    Abstract: An apparatus and method for adjusting the friendliness of a synthesized speech and thus generating synthesized speech of various styles in a speech synthesis system are provided. The method includes the steps of defining at least two friendliness levels; storing recorded speech data of sentences, the sentences being made up according to each of the friendliness levels; extracting at least one of prosodic characteristics for each of the friendliness levels from the recorded speech data, said prosodic characteristics including at least one of a sentence-final intonation type, boundary intonation types of intonation phrases in the sentence, and an average value of F0 of the sentence, with respect to the recorded speech data; and generating a prosodic model for each of the friendliness levels by statistically modeling the at least one of the prosodic characteristics.
    Type: Grant
    Filed: November 7, 2006
    Date of Patent: September 7, 2010
    Assignee: Electronics and Telecommunications Research Institute
    Inventors: Seung Shin Oh, Sang Hun Kim, Young Jik Lee
  • Publication number: 20100204985
    Abstract: A warping factor estimation system comprises a label information generation unit that outputs voice/non-voice label information, a warp model storage unit in which a probability model representing voice and non-voice occurrence probabilities is stored, and a warp estimation unit that calculates a warping factor in the frequency-axis direction using the probability model representing voice and non-voice occurrence probabilities, the voice and non-voice labels, and a cepstrum.
    Type: Application
    Filed: September 22, 2008
    Publication date: August 12, 2010
    Inventor: Tadashi Emori
  • Patent number: 7765100
    Abstract: A method and an apparatus for recovering a line spectrum pair (LSP) parameter of a spectrum region when frame loss occurs during speech decoding and a speech decoding apparatus adopting the same are provided. The method of recovering an LSP parameter in speech decoding includes: if it is determined that a received speech packet has an erased frame, converting an LSP parameter of a previous good frame (PGF) of the erased frame or LSP parameters of the PGF and a next good frame (NGF) of the erased frame into a spectrum region and obtaining a spectrum envelope of the PGF or spectrum envelopes of the PGF and NGF; recovering a spectrum envelope of the erased frame using the spectrum envelope of the PGF or the spectrum envelopes of the PGF and NGF; and converting the recovered spectrum envelope of the erased frame into an LSP parameter of the erased frame.
    Type: Grant
    Filed: February 6, 2006
    Date of Patent: July 27, 2010
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Hosang Sung, Seungho Choi, Kihyun Choo
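A condensed Python sketch of the recovery path: convert the neighboring good frames' LSP parameters to spectral envelopes, interpolate an envelope for the erased frame, and convert back. The `lsp_to_envelope`/`envelope_to_lsp` pair below is a trivial invertible placeholder, not a real LSP-to-spectrum transform.

```python
# Erased-frame LSP recovery in the spirit of patent 7765100, with placeholder
# transforms between the LSP and spectral-envelope domains.
import numpy as np

def lsp_to_envelope(lsp: np.ndarray) -> np.ndarray:
    return np.cumsum(lsp)               # placeholder spectral representation

def envelope_to_lsp(env: np.ndarray) -> np.ndarray:
    return np.diff(env, prepend=0.0)    # exact inverse of the placeholder

def recover_erased_frame(lsp_prev, lsp_next, weight=0.5):
    """Interpolate envelopes of the previous/next good frames, convert back."""
    env = (1 - weight) * lsp_to_envelope(lsp_prev) + weight * lsp_to_envelope(lsp_next)
    return envelope_to_lsp(env)

prev_good = np.array([0.10, 0.25, 0.40])
next_good = np.array([0.12, 0.27, 0.38])
print(recover_erased_frame(prev_good, next_good))
```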
  • Patent number: 7761299
    Abstract: A speech synthesis system can select recorded speech fragments, or acoustic units, from a large database of acoustic units to produce artificial speech. The selected acoustic units are chosen to minimize a combination of target and concatenation costs for a given sentence. Concatenation costs are expensive to compute. Processing is reduced by pre-computing and caching the concatenation costs. The number of possible sequential pairs of acoustic units makes such caching prohibitive. A method for constructing an efficient concatenation cost database is provided by synthesizing a large body of speech, identifying the acoustic unit sequential pairs generated and their respective concatenation costs, and storing those concatenation costs likely to occur.
    Type: Grant
    Filed: March 27, 2008
    Date of Patent: July 20, 2010
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Mark Charles Beutnagel, Mehryar Mohri, Michael Dennis Riley
  • Publication number: 20100145706
    Abstract: An object of the present invention is to provide a device and a method for generating a synthesized speech that has an utterance form matching the music. A musical genre estimation unit of the speech synthesizing device estimates the musical genre to which a received music signal belongs, and an utterance form selection unit references an utterance form information storage unit to determine an utterance form from the musical genre. A prosody generation unit references a prosody generation rule storage unit, selected from prosody generation rule storage units 151 to 15N according to the utterance form, and generates prosody information from a phonetic symbol sequence. A unit waveform selection unit references a unit waveform data storage unit, selected from unit waveform data storage units 161 to 16N according to the utterance form, and selects a unit waveform from the phonetic symbol sequence and the prosody information.
    Type: Application
    Filed: February 1, 2007
    Publication date: June 10, 2010
    Applicant: NEC CORPORATION
    Inventor: Masanori Kato
  • Patent number: 7725315
    Abstract: A voice enhancement system is provided for improving the perceptual quality of a processed voice signal. The system improves the perceptual quality of a received voice signal by removing unwanted noise from a voice signal recorded by a microphone or from some other source. Specifically, the system removes sounds that occur within the environment of the signal source but which are unrelated to speech. The system is especially well adapted for removing transient road noises from speech signals recorded in moving vehicles. Transient road noises include common temporal and spectral characteristics that can be modeled. A transient road noise detector employs such models to detect the presence of transient road noises in a voice signal. If transient road noises are found to be present, a transient road noise attenuator is provided to remove them from the signal.
    Type: Grant
    Filed: October 17, 2005
    Date of Patent: May 25, 2010
    Assignee: QNX Software Systems (Wavemakers), Inc.
    Inventors: Phillip A. Hetherington, Shreyas Paranjpe
  • Publication number: 20100125459
    Abstract: Exemplary embodiments provide for determining a sequence of words in a TTS system. An input text is analyzed using two models, a word n-gram model and an accent-class n-gram model. For each model, a list of all possible words for each word in the input is generated. Each word in each list is given a score based on the probability that the word is the correct word in the sequence, according to the particular model. The two lists are combined and the two scores are combined for each word. A set of sequences of words is generated, each sequence comprising a unique combination of an attribute and associated word for each word in the input. The combined scores of the words in each sequence are accumulated, and the sequence of words having the highest score is selected and presented to a user.
    Type: Application
    Filed: July 1, 2009
    Publication date: May 20, 2010
    Applicant: Nuance Communications, Inc.
    Inventors: Nobuyasu Itoh, Tohru Nagano, Masafumi Nishimura, Ryuki Tachibana
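The score combination can be illustrated with a toy Python example; the log-linear mixing weight and the probabilities are invented, and the real system scores whole word sequences rather than isolated candidates.

```python
# Toy combination of word n-gram and accent-class n-gram scores, in the
# spirit of publication 20100125459.
import math

def combine(word_p: float, accent_p: float, lam: float = 0.6) -> float:
    """Log-linear mix of the two models' probabilities (lam is assumed)."""
    return lam * math.log(word_p) + (1 - lam) * math.log(accent_p)

candidates = {                 # candidate: (word-model prob, accent-model prob)
    "hashi(bridge)": (0.40, 0.70),
    "hashi(chopsticks)": (0.45, 0.20),
}
best = max(candidates, key=lambda w: combine(*candidates[w]))
print(best)                    # the accent model tips the choice to "bridge"
```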
  • Publication number: 20100088089
    Abstract: Synthesizing a set of digital speech samples corresponding to a selected voicing state includes dividing speech model parameters into frames, with a frame of speech model parameters including pitch information, voicing information determining the voicing state in one or more frequency regions, and spectral information. First and second digital filters are computed using, respectively, first and second frames of speech model parameters, with the frequency responses of the digital filters corresponding to the spectral information in frequency regions for which the voicing state equals the selected voicing state. A set of pulse locations are determined, and sets of first and second signal samples are produced using the pulse locations and, respectively, the first and second digital filters. Finally, the sets of first and second signal samples are combined to produce a set of digital speech samples corresponding to the selected voicing state.
    Type: Application
    Filed: August 21, 2009
    Publication date: April 8, 2010
    Applicant: DIGITAL VOICE SYSTEMS, INC.
    Inventor: John C. Hardwick
  • Patent number: 7689421
    Abstract: Described is a voice persona service by which users convert text into speech waveforms, based on user-provided parameters and voice data from a service data store. The service may be remotely accessed, such as via the Internet. The user may provide text tagged with parameters, with the text sent to a text-to-speech engine along with base or custom voice data, and the resulting waveform morphed based on the tags. The user may also provide speech. Once created, a voice persona corresponding to the speech waveform may be persisted, exchanged, made public, shared and so forth. In one example, the voice persona service receives user input and parameters, and retrieves a base or custom voice that may be edited by the user via a morphing algorithm. The service outputs a waveform, such as a .wav file for embedding in a software program, and persists the voice persona corresponding to that waveform.
    Type: Grant
    Filed: June 27, 2007
    Date of Patent: March 30, 2010
    Assignee: Microsoft Corporation
    Inventors: Yusheng Li, Min Chu, Xin Zou, Frank Kao-ping Soong
  • Publication number: 20100076768
    Abstract: Disclosed is a speech synthesizing apparatus including a segment selection unit that selects a segment suited to a target segment environment from candidate segments, a prosody change amount calculation unit that calculates the prosody change amount of each candidate segment based on prosody information of the candidate segments and the target segment environment, a selection criterion calculation unit that calculates a selection criterion based on the prosody change amount, a candidate selection unit that narrows down selection candidates based on the prosody change amount and the selection criterion, and an optimum segment search unit that searches for an optimum segment from among the narrowed-down candidate segments.
    Type: Application
    Filed: February 15, 2008
    Publication date: March 25, 2010
    Applicant: NEC CORPORATION
    Inventors: Masanori Kato, Reishi Kondo, Yasuyuki Mitsui
  • Patent number: 7684977
    Abstract: In an interface unit, an input section obtains an input signal such as the user's speech, and an input processing section processes the input signal and detects information relating to the user. On the basis of the detection result, a response contents determination section determines the contents of the response to the user. Meanwhile, a response manner adjusting section adjusts the manner of the response to the user, such as speech speed, on the basis of the processing state of the input signal, the information relating to the user detected from the input signal, and the like.
    Type: Grant
    Filed: June 8, 2006
    Date of Patent: March 23, 2010
    Assignee: Panasonic Corporation
    Inventor: Koji Morikawa
  • Publication number: 20100049523
    Abstract: Systems and methods for providing synthesized speech in a manner that takes into account the environment where the speech is presented. A method embodiment includes, based on a listening environment and at least one other parameter associated with the listening environment, selecting an approach from a plurality of approaches for presenting synthesized speech in the listening environment; presenting synthesized speech according to the selected approach; and, based on natural language input received from a user indicating an inability to understand the presented synthesized speech, selecting a second approach from the plurality of approaches and presenting subsequent synthesized speech using the second approach.
    Type: Application
    Filed: October 28, 2009
    Publication date: February 25, 2010
    Applicant: AT&T Corp.
    Inventors: Kenneth H. Rosen, Carroll W. Creswell, Jeffrey J. Farah, Pradeep K. Bansal, Ann K. Syrdal
  • Publication number: 20090319275
    Abstract: A speech synthesizing device, the device includes: a text accepting unit for accepting text data; an extracting unit for extracting a special character including a pictographic character, a face mark or a symbol from text data accepted by the text accepting unit; a dictionary database in which a plurality of special characters and a plurality of phonetic expressions for each special character are registered; a selecting unit for selecting a phonetic expression of an extracted special character from the dictionary database when the extracting unit extracts the special character; a converting unit for converting the text data accepted by the accepting unit to a phonogram in accordance with a phonetic expression selected by the selecting unit in association with the extracted special character; and a speech synthesizing unit for synthesizing a voice from a phonogram obtained by the converting unit.
    Type: Application
    Filed: August 31, 2009
    Publication date: December 24, 2009
    Applicant: FUJITSU LIMITED
    Inventor: Takuya Noda
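A much-reduced Python illustration: a dictionary maps each special character to several phonetic expressions, and one is selected from context before synthesis. The dictionary entries and the crude mood cue are invented examples, not the publication's selection method.

```python
# Invented illustration of special-character phoneticization for
# publication 20090319275.
DICTIONARY = {
    "(^_^)": {"happy": "smiling face", "neutral": "face mark"},
    "(;_;)": {"happy": "tearful face", "neutral": "crying face mark"},
}

def phoneticize(text: str) -> str:
    # Crude context cue standing in for a real phonetic-expression selector.
    mood = "happy" if "great" in text or "yay" in text else "neutral"
    for special, readings in DICTIONARY.items():
        text = text.replace(special, " " + readings[mood] + " ")
    return " ".join(text.split())

print(phoneticize("it was great (^_^)"))   # -> "it was great smiling face"
```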
  • Publication number: 20090313025
    Abstract: A method and system are disclosed that automatically segment speech to generate a speech inventory. The method includes initializing a Hidden Markov Model (HMM) using seed input data, performing a segmentation of the HMM into speech units to generate phone labels, correcting the segmentation of the speech units. Correcting the segmentation of the speech units includes re-estimating the HMM based on a current version of the phone labels, embedded re-estimating of the HMM, and updating the current version of the phone labels using spectral boundary correction. The system includes modules configured to control a processor to perform steps of the method.
    Type: Application
    Filed: August 20, 2009
    Publication date: December 17, 2009
    Applicant: AT&T Corp.
    Inventors: Alistair D. CONKIE, Yeon-Jun KIM
  • Patent number: 7630898
    Abstract: Disclosed are various elements of a toolkit used for generating a TTS voice for use in a spoken dialog system. The embodiments in each case may be in the form of the system, a computer-readable medium or a method for generating the TTS voice. One embodiment of the invention relates to a method of generating a database for a TTS voice. The method comprises matching every spoken word associated with a TTS voice database with a smallest set of possible pronunciations for each word. The smallest set is generated by automatically determining a dialect and linguistic context using linguistic rules, empirically determining idiosyncratic speaker characteristics and determining a subject domain. The method further comprises dynamically generating a pronunciation dictionary on a word-by-word basis using the smallest set.
    Type: Grant
    Filed: September 27, 2005
    Date of Patent: December 8, 2009
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Steven Lawrence Davis, Shane Fetters, David Eugene Schulz, Beverly Gustafson, Louise Loney
  • Patent number: 7613612
    Abstract: In a voice synthesizer, an envelope acquisition portion obtains a spectral envelope of a reference frequency spectrum of a given voice. A spectrum acquisition portion obtains a collective frequency spectrum of a plurality of voices which are generated in parallel to one another. An envelope adjustment portion adjusts a spectral envelope of the collective frequency spectrum obtained by the spectrum acquisition portion so as to approximately match with the spectral envelope of the reference frequency spectrum obtained by the envelope acquisition portion. A voice generation portion generates an output voice signal from the collective frequency spectrum having the spectral envelope adjusted by the envelope adjustment portion.
    Type: Grant
    Filed: January 31, 2006
    Date of Patent: November 3, 2009
    Assignee: Yamaha Corporation
    Inventors: Hideki Kemmochi, Jordi Bonada
  • Patent number: 7606710
    Abstract: A method for text-to-pronunciation conversion includes a process for searching grapheme-phoneme segments and a three-stage process of text-to-pronunciation conversion. This method looks for sequences of grapheme-phoneme pairs, each referred to as a chunk, via a trained pronouncing dictionary; performs grapheme segmentation, chunk marking and a decision process on an input text; and determines a pronouncing sequence for the text. With the chunk marking, the method greatly reduces the search space on the associated phoneme graph, and thereby substantially increases the search speed for candidate chunk sequences. The method maintains high word accuracy while saving computing time.
    Type: Grant
    Filed: December 21, 2005
    Date of Patent: October 20, 2009
    Assignee: Industrial Technology Research Institute
    Inventors: Nien-Chih Wang, Ching-Hsieh Lee
  • Publication number: 20090254349
    Abstract: A speech synthesizer can execute speech content editing at high speed and generate speech content easily. The speech synthesizer includes a small speech element DB (101), a small speech element selection unit (102), a small speech element concatenation unit (103), a prosody modification unit (104), a large speech element DB (105), a correspondence DB (106) that associates the small speech element DB (101) with the large speech element DB (105), a speech element candidate obtainment unit (107), a large speech element selection unit (108), and a large speech element concatenation unit (109). By editing synthetic speech using the small speech element DB (101) and performing quality enhancement on an editing result using the large speech element DB (105), speech content can be generated easily on a mobile terminal.
    Type: Application
    Filed: May 11, 2007
    Publication date: October 8, 2009
    Inventors: Yoshifumi Hirose, Yumiko Kato, Takahiro Kamai
  • Patent number: 7599838
    Abstract: Methods and systems, including computer program products, for speech animation. The system includes a speech animation server and one or more speech animation clients. The speech animation server generates speech animation content that drives the expressions and behaviors of talking agents displayed by the speech animation clients. The data used by the server includes one or more references to behavioral contexts. A behavioral context corresponds to a particular application scenario and includes a set of expressions that are appropriate to the particular application scenario. A behavioral context can also be defined as a combination of two or more other behavioral contexts. The server automatically incorporates the expressions of a particular behavioral context into any data that references the particular behavioral context.
    Type: Grant
    Filed: September 1, 2004
    Date of Patent: October 6, 2009
    Assignee: SAP Aktiengesellschaft
    Inventors: Li Gong, Townsend Duong, Andrew Yinger
  • Publication number: 20090248417
    Abstract: A method to generate a pitch contour for speech synthesis is proposed. The method is based on finding the pitch contour that maximizes a total likelihood function created by the combination of all the statistical models of the pitch contour segments of an utterance, at one or multiple linguistic levels. These statistical models are trained from a database of spoken speech, by means of a decision tree that, for each linguistic level, clusters the parametric representation of the pitch segments extracted from the spoken speech data with features obtained from the text associated with that speech data. The parameterization of the pitch segments is performed in such a way that the likelihood function of any linguistic level can be expressed in terms of the parameters of one of the levels, thus allowing the maximization to be calculated with respect to the parameters of that level.
    Type: Application
    Filed: March 17, 2009
    Publication date: October 1, 2009
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Javier Latorre, Masami Akamine
  • Patent number: 7590540
    Abstract: A method for distance definition in a text-to-speech conversion system by applying Gaussian Mixture Model (GMM) to a distance definition. According to an embodiment, the text that is to be subjected to text-to-speech conversion is analyzed to obtain a text with descriptive prosody annotation; clustering is performed for samples in the obtained text; and a GMM model is generated for each cluster, to determine the distance between the sample and the corresponding GMM model.
    Type: Grant
    Filed: September 29, 2005
    Date of Patent: September 15, 2009
    Assignee: Nuance Communications, Inc.
    Inventors: Wei Z W Zhang, Xi Jun Ma, Ling Jin, Hai Xin Chai
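A small Python sketch of the distance definition, using scikit-learn's GaussianMixture as an assumed stand-in for the patent's GMM training: a sample's distance to a cluster is taken as its negative log-likelihood under that cluster's model. The synthetic cluster data is a placeholder for prosody-annotated samples.

```python
# GMM-based sample-to-cluster distance in the spirit of patent 7590540.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
cluster_samples = rng.normal(loc=0.0, size=(200, 3))   # one prosody cluster

gmm = GaussianMixture(n_components=2, random_state=0).fit(cluster_samples)

def gmm_distance(sample: np.ndarray) -> float:
    """Distance of a sample to the cluster: negative GMM log-likelihood."""
    return -gmm.score_samples(sample[None, :])[0]

print(gmm_distance(np.zeros(3)), gmm_distance(np.full(3, 5.0)))  # near < far
```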
  • Patent number: 7587320
    Abstract: Systems and methods for automatically segmenting speech inventories. A set of Hidden Markov Models (HMMs) are initialized using bootstrap data. The HMMs are next re-estimated and aligned to produce phone labels. The phone boundaries of the phone labels are then corrected using spectral boundary correction. Optionally, this process of using the spectral-boundary-corrected phone labels as input instead of the bootstrap data is performed iteratively in order to further reduce mismatches between manual labels and phone labels assigned by the HMM approach.
    Type: Grant
    Filed: August 1, 2007
    Date of Patent: September 8, 2009
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Alistair D. Conkie, Yeon-Jun Kim
  • Patent number: 7584104
    Abstract: A system, method and computer readable medium that trains a text-to-speech synthesis system for use in speech synthesis is disclosed. The method may include recording audio files of one or more live voices speaking language used in a specific domain, the audio files being recorded using various prosodies, storing the recorded audio files in a speech database, and training a text-to-speech synthesis system using the speech database, wherein the text-to-speech synthesis system selects audio segments having a prosody based on at least one dialog state and one speech act.
    Type: Grant
    Filed: September 8, 2006
    Date of Patent: September 1, 2009
    Assignee: AT&T Intellectual Property II, L.P.
    Inventor: Horst Juergen Schroeter
  • Patent number: 7574360
    Abstract: A unit selection module for Chinese Text-to-Speech (TTS) synthesis includes a probabilistic context free grammar (PCFG) parser, a latent semantic indexing (LSI) module, and a modified variable-length unit selection scheme; any Chinese sentence is first input and then parsed into a context-free grammar (CFG) by the PCFG parser; there are several possible CFGs for every Chinese sentence, and the CFG (or syntactic structure) with the highest probability is taken as the best CFG (or syntactic structure) of the Chinese sentence; the LSI module is then used to calculate the structural distance between all the candidate synthesis units and the target unit in a corpus; through the modified variable-length unit selection scheme, together with the dynamic programming algorithm, the units are searched to find the best synthesis unit concatenation sequence.
    Type: Grant
    Filed: July 22, 2005
    Date of Patent: August 11, 2009
    Assignee: National Cheng Kung University
    Inventors: Chung Hsien Wu, Jiun Fu Chen, Chi Chun Hsia, Jhing Fa Wang
  • Patent number: 7565293
    Abstract: A Voice User Interface (VUI) is provided for interactively responding in a synthesized voice to a call from a human caller, along with a Text-to-Speech system by which text entered by an agent and interactive data are converted to synthesized speech, a morphing transformation library containing pre-computed voice transformation parameters unique to each agent affiliated with the VUI, and a switching system for transferring handling of the call between the VUI and the agent. The human agent's verbal interaction with the caller is performed in the agent's natural voice. Text transmitted by an agent to a caller, as well as interactive data, is rendered in a synthesized voice created using the pre-computed transformation parameters corresponding to the agent's ID, selected from the morphing transformation library. All speech presented to a caller is rendered in approximately the same unique voice as initially presented when the call is established, thereby permitting an aurally seamless phone call, as perceived by the caller.
    Type: Grant
    Filed: May 7, 2008
    Date of Patent: July 21, 2009
    Assignee: International Business Machines Corporation
    Inventors: Oded Fuhrmann, Ron Hoory, Dan Pelleg