Abstract: A method is provided for optimizing an objective measure used to estimate mean opinion score or naturalness of synthesized speech from a speech synthesizer. The method includes using an objective measure that has components derived directly from textual information used to form synthesized utterances. The objective measure has a high correlation with mean opinion score such that a relationship can be formed between the objective measure and corresponding mean opinion score. The objective measure is altered to provide a different function of textual information derived from the utterances so as to improve the relationship between the scores of the objective measure and subjective ratings of the synthesized utterances.
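Correlation is the usual way to judge whether an objective measure tracks MOS, and whether an altered measure tracks it better. The sketch below is a minimal example. It assumes per-utterance objective scores and MOS ratings are already available as equal-length lists. It computes the Pearson correlation coefficient and a least-squares line that maps objective scores to predicted MOS. The function names are illustrative and do not come from the patent.

```python
import math
from typing import Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between objective scores x and MOS ratings y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def fit_mos_mapping(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line y ~ slope * x + intercept mapping objective score to MOS."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx
```

To compare two candidate objective measures, compute `pearson` for each against the same MOS ratings. The one with the higher value has the stronger linear relationship with MOS. The fitted line can then convert new objective scores into MOS estimates.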
Abstract: The present invention provides an application-independent and engine-independent middleware layer (204) between applications (202) and engines (206, 208). The middleware provides speech-related services to both applications (202) and engines (206, 208), thereby making it far easier for application vendors and engine vendors to bring their technology to consumers.
Type:
Grant
Filed:
December 5, 2006
Date of Patent:
May 27, 2008
Assignee:
Microsoft Corporation
Inventors:
Philipp Heinz Schmid, Ralph Lipe, Robert Chambers, Edward Connell
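A toy broker that sits between callers and implementations illustrates the middleware idea. The sketch assumes a single string-in, string-out call per service. The class, its methods, and the service names are hypothetical and do not correspond to any real speech API.

```python
from typing import Callable, Dict


class SpeechMiddleware:
    """Hypothetical middleware layer between applications and speech engines.

    Applications request services by name and engines register handlers for
    them, so neither side needs to know about the other.
    """

    def __init__(self) -> None:
        self._engines: Dict[str, Callable[[str], str]] = {}

    def register_engine(self, service: str, handler: Callable[[str], str]) -> None:
        # An engine vendor plugs in an implementation for a service such as "tts".
        self._engines[service] = handler

    def request(self, service: str, payload: str) -> str:
        # An application asks for a service without naming a specific engine.
        if service not in self._engines:
            raise LookupError(f"no engine registered for {service!r}")
        return self._engines[service](payload)
```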
Abstract: A differential compression technique is disclosed for compressing individual speaker models, such as Gaussian mixture models, by computing a delta model from the difference between an individual speaker model and a baseline model. Further compression may be applied to the delta model to reduce the large storage requirements generally attributed to speaker models.
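The sketch below illustrates the differential idea and is deliberately minimal. It assumes the speaker model and the baseline share the same structure, meaning the same number of mixture components and the same feature dimension, and it shows only the mean vectors. The patent leaves its "further compression" unspecified. Here, uniform scalar quantization with step size `step` stands in for it. This is an assumption, not the disclosed method.

```python
from typing import List

Vector = List[float]


def delta_encode(speaker: List[Vector], baseline: List[Vector], step: float) -> List[List[int]]:
    """Quantize the per-parameter difference between a speaker model and the baseline."""
    return [
        [round((s - b) / step) for s, b in zip(sv, bv)]
        for sv, bv in zip(speaker, baseline)
    ]


def delta_decode(delta: List[List[int]], baseline: List[Vector], step: float) -> List[Vector]:
    """Rebuild an approximate speaker model by adding the dequantized delta to the baseline."""
    return [
        [b + q * step for q, b in zip(qv, bv)]
        for qv, bv in zip(delta, baseline)
    ]
```

When a speaker model stays close to the baseline, most quantized deltas are small integers. A general-purpose entropy coder can then shrink them further.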
Abstract: A data simplifying and merging method for a voice decoding memory system is disclosed. The method includes the steps of: reading voice data from a non-volatile memory in a memory system; performing a logic operation on the voice data to obtain an index; fetching the corresponding decoded voice data from a table in the memory system in accordance with the index; and adding the decoded voice data to the voice data to obtain the original voice data.
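The four steps (read, index, look up, add) can be sketched as follows. The source does not specify the logic operation or the table contents. Masking the low four bits (`INDEX_MASK`) and the integer table are therefore hypothetical choices made only for illustration.

```python
from typing import List, Sequence

INDEX_MASK = 0x0F  # hypothetical: the low 4 bits of each stored word select a table entry


def restore(stored: Sequence[int], table: Sequence[int]) -> List[int]:
    """Rebuild original voice data from stored words and a decoding table."""
    out = []
    for word in stored:
        index = word & INDEX_MASK        # logic operation on the voice data -> index
        out.append(word + table[index])  # add the decoded value to the voice data
    return out
```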
Abstract: The intonation of speech is modified by an appropriate combination of resampling and time-domain harmonic scaling. Resampling increases (upsampling) or decreases (downsampling) the number of data points in a signal. Harmonic scaling adds or removes pitch cycles to or from a signal. The pitch of a speech signal can be increased by combining downsampling with harmonic scaling that adds an appropriate number of pitch cycles. Alternatively, pitch can be decreased by combining upsampling with harmonic scaling that removes an appropriate number of pitch cycles. The present invention can be implemented in an automated speech-therapy tool that is able to modify the intonation of prerecorded reference speech signals for playback to a user to emphasize the correct pronunciation by increasing the pitch of selected portions of words or phrases that the user had previously mispronounced.
Type:
Grant
Filed:
May 15, 2003
Date of Patent:
May 13, 2008
Assignee:
Lucent Technologies Inc.
Inventors:
Juergen Cezanne, Sunil K. Gupta, Chetan Vinchhi
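The pitch-raising path, downsampling followed by harmonic scaling that adds pitch cycles, can be sketched as below. The sketch assumes a constant, known pitch period in samples. Real speech would need pitch tracking and pitch-synchronous cycle boundaries. Linear interpolation is only one possible resampler.

```python
import math
from typing import List, Sequence


def resample(x: Sequence[float], new_len: int) -> List[float]:
    """Resample x to new_len samples by linear interpolation."""
    if new_len < 2 or len(x) < 2:
        raise ValueError("need at least 2 input and 2 output samples")
    scale = (len(x) - 1) / (new_len - 1)
    out = []
    for i in range(new_len):
        pos = i * scale
        j = min(int(pos), len(x) - 2)
        frac = pos - j
        out.append(x[j] * (1 - frac) + x[j + 1] * frac)
    return out


def add_pitch_cycles(x: Sequence[float], period: int, target_len: int) -> List[float]:
    """Harmonic scaling: duplicate whole pitch cycles until target_len samples are reached."""
    full = len(x) // period
    if full == 0:
        raise ValueError("signal is shorter than one pitch period")
    cycles = [list(x[k * period:(k + 1) * period]) for k in range(full)]
    tail = list(x[full * period:])
    extra = math.ceil(max(0, target_len - len(x)) / period)
    out: List[float] = []
    for k, cycle in enumerate(cycles):
        # Spread the `extra` duplicated cycles as evenly as possible over the signal.
        repeats = 1 + ((k + 1) * extra) // full - (k * extra) // full
        for _ in range(repeats):
            out.extend(cycle)
    out.extend(tail)
    return out[:target_len]


def raise_pitch(x: Sequence[float], period: int, factor: float) -> List[float]:
    """Raise pitch by `factor` (> 1) while keeping the original number of samples."""
    shorter = resample(x, round(len(x) / factor))  # downsampling shortens the pitch period
    new_period = max(1, round(period / factor))
    return add_pitch_cycles(shorter, new_period, len(x))
```

Downsampling by `factor` shortens both the signal and its pitch period. When the result is played at the original sample rate, the pitch rises. Duplicating whole cycles then restores the original duration. Lowering pitch would mirror this with upsampling and cycle removal, which is not shown.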
Abstract: A method of classifying a spectro-temporal interval of an input audio signal (x(t)) is disclosed. A spectro-temporal interval of the input audio signal is first modelled (62 . . . 71) according to a perceptual model to provide a first representation (Rep 1). The spectro-temporal interval is then modelled (62 . . . 71) using a modified noise substituted input signal according to the same perceptual model to provide a second representation (Rep 2). The spectro-temporal interval is then classified as being noise or not based on a comparison of the first and second representations.
Type:
Grant
Filed:
May 27, 2003
Date of Patent:
May 13, 2008
Assignee:
Koninklijke Philips Electronics N. V.
Inventors:
Steven Leonardus Josephus Dimphina Elisabeth Van De Par, Jan Janto Skowronek
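The classification step reduces to comparing two representations of the same interval. The abstract does not define the perceptual model or the noise-substitution routine, so this sketch passes both in as callables. The Euclidean distance and the fixed `threshold` are assumed decision rules.

```python
import math
from typing import Callable, List, Sequence

Signal = Sequence[float]
Representation = List[float]


def is_noise(
    interval: Signal,
    perceptual_model: Callable[[Signal], Representation],
    noise_substitute: Callable[[Signal], Signal],
    threshold: float,
) -> bool:
    """Classify an interval as noise if substituting noise barely changes its representation."""
    rep1 = perceptual_model(interval)                    # Rep 1: original interval
    rep2 = perceptual_model(noise_substitute(interval))  # Rep 2: noise-substituted interval
    return math.dist(rep1, rep2) < threshold
```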
Abstract: A system and method for generating a video sequence having mouth movements synchronized with speech sounds are disclosed. The system utilizes a database of n-phones as the smallest selectable unit, where n is greater than 1 and preferably 3. The system calculates a target cost for each candidate n-phone for a target frame using a phonetic distance, a coarticulation parameter, and speech rate. For each n-phone in a target sequence, the system searches for candidate n-phones that are visually similar according to the target cost. The system samples each candidate n-phone to obtain the same number of frames as in the target sequence and builds a video frame lattice of candidate video frames. The system assigns a joint cost to each pair of adjacent frames and searches the video frame lattice to construct the video sequence, finding the path through the lattice that minimizes the sum of the target cost and the joint cost over the sequence.
Type:
Grant
Filed:
February 16, 2007
Date of Patent:
May 6, 2008
Assignee:
AT&T Corp.
Inventors:
Eric Cosatto, Hans Peter Graf, Fu Jie Huang
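Once the target and joint costs are known, finding the optimal path through the video frame lattice is a standard Viterbi (dynamic-programming) search. The sketch assumes the costs are precomputed. `target_costs[t][i]` holds the cost of candidate frame `i` at position `t`. A hypothetical `joint_cost(t, i, j)` gives the cost of following candidate `i` at `t - 1` with candidate `j` at `t`.

```python
from typing import Callable, List, Sequence, Tuple


def best_path(
    target_costs: Sequence[Sequence[float]],
    joint_cost: Callable[[int, int, int], float],
) -> Tuple[float, List[int]]:
    """Return the minimum total cost and the chosen candidate index for each frame."""
    cost = list(target_costs[0])  # best cost of any path ending in each candidate
    back: List[List[int]] = []    # back-pointers, one list per frame after the first
    for t in range(1, len(target_costs)):
        new_cost: List[float] = []
        pointers: List[int] = []
        for j, tc in enumerate(target_costs[t]):
            options = [cost[i] + joint_cost(t, i, j) for i in range(len(cost))]
            i_best = min(range(len(options)), key=options.__getitem__)
            new_cost.append(options[i_best] + tc)
            pointers.append(i_best)
        cost = new_cost
        back.append(pointers)
    # Trace the cheapest final candidate back to the first frame.
    last = min(range(len(cost)), key=cost.__getitem__)
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return cost[last], path
```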
Abstract: A method of identifying patterns in a digitized acoustic signal is disclosed. The method comprises: (i) converting the digitized acoustic signal into a spatial representation defined by a plurality of regions on a vibrating membrane, each of the regions having a different vibration resonance, and each vibration resonance corresponding to a different frequency of the acoustic signal; (ii) iteratively calculating a weight function whose spatial dependence is representative of the acoustic patterns of each of the plurality of regions; and (iii) using the weight function to convert the spatial representation into a reconstructed acoustic signal, thereby identifying the patterns in the acoustic signal.
Abstract: A system and method for speech file processing which provides users with differentially selectable speech file transcripts which can be sent to one or more other users. The speech files may be voicemail messages from which respective voicemail transcripts are created. The voicemail transcripts are provided in a user selectable format from which users may select non-contiguous portions of the transcript.
Abstract: A computer system is provided with a set of instructions that allows it to serve as a translation device in a structured interview between an interviewer using a first language and an interviewee using a second language, in order to minimize or eliminate the need for a human translator during the acquisition of routine information. Problems addressed include: determination of the appropriate language to use as the second language; use of a single display screen, with a toggle function to switch from the first language to the second language; delivery of context-sensitive audio files while controlling the number of screens to be created and presented to the interviewer; and creation of a set of discharge instructions. A preferred set of hardware and a mobile cart are discussed.
Abstract: A speech synthesis apparatus appropriately transforms a voice characteristic of speech. The speech synthesis apparatus includes an element storing unit in which speech elements are stored, a function storing unit in which transformation functions are stored, and an adaptability judging unit which derives a degree of similarity by comparing a speech element stored in the element storing unit with an acoustic characteristic of the speech element used for generating a transformation function stored in the function storing unit. The speech synthesis apparatus also includes a selecting unit and a voice characteristic transforming unit which, for each speech element stored in the element storing unit, transform a voice characteristic of the speech element by applying one of the transformation functions stored in the function storing unit, based on the degree of similarity derived by the adaptability judging unit.
Type:
Grant
Filed:
February 13, 2006
Date of Patent:
March 25, 2008
Assignee:
Matsushita Electric Industrial Co., Ltd.
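Per-element selection works as follows: for each speech element, pick the stored transformation function whose source acoustic characteristic is most similar, then apply it. The sketch below shows this under stated assumptions. Feature vectors, the `TransformFunction` container, and negative Euclidean distance as the similarity measure are all illustrative choices.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence

Features = Sequence[float]


@dataclass
class TransformFunction:
    source_features: Features                 # acoustic features of the element the function was built from
    apply: Callable[[Features], List[float]]  # transformation of a voice characteristic


def similarity(a: Features, b: Features) -> float:
    # Assumed similarity measure: negative Euclidean distance (larger means more similar).
    return -math.dist(a, b)


def transform_elements(
    elements: Sequence[Features], functions: Sequence[TransformFunction]
) -> List[List[float]]:
    """Apply to each element the function whose source features are most similar to it."""
    out = []
    for element in elements:
        best = max(functions, key=lambda f: similarity(element, f.source_features))
        out.append(best.apply(element))
    return out
```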
Abstract: A system and method are described for automatically transcribing a recorded message from a voicemail message system or an answering machine. The designated message may be automatically transcribed and forwarded to various archival devices, such as a printer, a facsimile machine, computer memory storage, or email. Thus, the system and method of the present invention automate the transcription of a recorded message, eliminating the need for a person to transcribe it.
Type:
Grant
Filed:
September 28, 2001
Date of Patent:
March 18, 2008
Assignee:
AT&T Delaware Intellectual Property, Inc.
Abstract: Various embodiments directed to performing speech recognition over a data channel are described. One or more embodiments may comprise encoding voice information to be communicated over a voice channel using a voice compression algorithm, switching from said voice channel to a data channel, encoding voice information to be communicated over said data channel using said voice compression algorithm used for said voice channel, and communicating encoded voice information over said data channel. Other embodiments are described and claimed.
Abstract: A translated text creator translates a text, leaving an unknown word untranslated in its original-language representation while translating the known words. The translated text created by the translated text creator is displayed. A link setter sets a link for searching for the unknown word in a search field of a selected Internet search engine that corresponds to the subject-matter field of the original text.
Type:
Grant
Filed:
February 28, 2002
Date of Patent:
January 8, 2008
Assignee:
International Business Machines Corporation
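This toy sketch shows the two behaviors. Unknown words pass through untranslated, and each one receives a link to a search engine chosen by subject field. Word-by-word lexicon lookup stands in for a real translation engine. The URLs in `SEARCH_ENGINES` are hypothetical placeholders.

```python
from typing import Dict, List, Tuple
from urllib.parse import quote_plus

# Hypothetical mapping from subject field to a search URL template (not real endpoints).
SEARCH_ENGINES: Dict[str, str] = {
    "medicine": "https://medical-search.example.com/search?q={}",
    "general": "https://search.example.com/?q={}",
}


def translate_with_links(
    words: List[str], lexicon: Dict[str, str], field: str
) -> Tuple[str, Dict[str, str]]:
    """Translate known words; leave unknown words as-is and attach a search link to each."""
    template = SEARCH_ENGINES.get(field, SEARCH_ENGINES["general"])
    out: List[str] = []
    links: Dict[str, str] = {}
    for w in words:
        if w in lexicon:
            out.append(lexicon[w])
        else:
            out.append(w)  # unknown word stays in the original language
            links[w] = template.format(quote_plus(w))
    return " ".join(out), links
```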
Abstract: A sound encoder accepts a sound signal and produces a plurality of codes that represent the sound signal on a frame-by-frame basis. For each frame, the sound encoder determines, based on one of the plurality of codes, the order in which the codes are to be multiplexed, multiplexes the codes one by one into a multiplexed code in the determined order, and acquires an error correction code for the multiplexed code. The sound encoder then outputs, as a sound code, the multiplexed code with the acquired error correction code appended to its end.
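The frame-level flow has three steps: choose an order from one code, concatenate the codes, and append a check code. The sketch below follows that flow. The code names, the `ORDERS` table, and reading the order from the first byte of the `mode` code are hypothetical. Python's `zlib.crc32` serves as a stand-in check code. A CRC only detects errors, whereas the abstract describes an error correction code.

```python
import zlib
from typing import Dict, List

# Hypothetical ordering rule: the value of the "mode" code selects the multiplexing order.
ORDERS: Dict[int, List[str]] = {
    0: ["mode", "lsp", "pitch", "gain", "excitation"],
    1: ["mode", "pitch", "gain", "lsp", "excitation"],
}


def multiplex_frame(codes: Dict[str, bytes]) -> bytes:
    """Multiplex one frame's codes in a mode-dependent order and append a CRC-32."""
    mode = codes["mode"][0]
    payload = b"".join(codes[name] for name in ORDERS[mode])
    check = zlib.crc32(payload).to_bytes(4, "big")
    return payload + check
```

Keeping `mode` first in every order lets a decoder read it before it needs to know how the remaining codes are laid out.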
Abstract: Various methods are disclosed for improving the readability of authored documents. The methods generally involve scanning a sentence to check for specific signs of potential writing problems, and applying associated sign-dependent decision logic to assess whether particular writing problems exist. The methods may be implemented in a computer program that makes editing suggestions to a user and/or makes edits automatically. The program may, in some cases, dim unnecessary language to reveal how the sentence will read with such language removed.
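Regular expressions give a simple sketch of the scan-then-decide structure. The three signs and their messages are hypothetical examples. The decision logic here is deliberately trivial: any match triggers the suggestion. The abstract instead describes sign-dependent logic that assesses whether a problem actually exists.

```python
import re
from typing import List, Tuple

# Hypothetical signs of potential writing problems, each paired with a suggestion.
SIGNS: List[Tuple[str, str]] = [
    (r"\bvery\b", "intensifier 'very' may be unnecessary"),
    (r"\bin order to\b", "'in order to' can often be shortened to 'to'"),
    (r"\b(?:is|are|was|were|been|being)\s+\w+ed\b", "possible passive voice"),
]


def check_sentence(sentence: str) -> List[str]:
    """Scan a sentence for each sign and return the associated suggestions."""
    return [msg for pattern, msg in SIGNS if re.search(pattern, sentence, re.IGNORECASE)]
```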
Abstract: A name entity extraction technique using language models is provided. A general language model is provided for the natural language understanding domain. A language model is also provided for each name entity. The name entity language models are added to the general language model. Each language model is considered a state. Probabilities are applied for each transition within a state and between states. For each word in an utterance, the name extraction process determines a best current state and a best previous state. When the end of the utterance is reached, the process traces back to find the best path. Each series of words in a state other than the general language model state is identified as a name entity. A technique is provided to iteratively extract names and retrain the general language model until the probabilities do not change.
Type:
Grant
Filed:
December 10, 2002
Date of Patent:
November 20, 2007
Assignee:
International Business Machines Corporation
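The state-based decoding can be sketched as a Viterbi search in which each state is a language model. For brevity, each model is a unigram table mapping words to probabilities. Unseen words and transitions get a small floor probability `unk`. Both simplifications are assumptions. Decoding is assumed to start in the general state. The iterative retraining loop is not shown.

```python
import math
from typing import Dict, List, Tuple

Model = Dict[str, float]  # word -> probability; a unigram stand-in for each language model


def extract_names(
    words: List[str],
    models: Dict[str, Model],
    trans: Dict[Tuple[str, str], float],
    general: str = "general",
    unk: float = 1e-6,
) -> List[Tuple[str, str]]:
    """Viterbi-decode the best state sequence and return (state, words) for each name entity."""
    states = list(models)

    def emit(state: str, word: str) -> float:
        return math.log(models[state].get(word, unk))

    def tr(a: str, b: str) -> float:
        return math.log(trans.get((a, b), unk))

    # Assume decoding starts in the general state.
    score = {s: tr(general, s) + emit(s, words[0]) for s in states}
    back: List[Dict[str, str]] = []
    for w in words[1:]:
        # Best previous state for each current state.
        prev = {s: max(states, key=lambda p: score[p] + tr(p, s)) for s in states}
        score = {s: score[prev[s]] + tr(prev[s], s) + emit(s, w) for s in states}
        back.append(prev)

    # Trace back from the best final state.
    state = max(states, key=score.__getitem__)
    path = [state]
    for prev in reversed(back):
        state = prev[state]
        path.append(state)
    path.reverse()

    # Each maximal run of words in one non-general state is a name entity.
    entities: List[Tuple[str, str]] = []
    run: List[str] = []
    label = general
    for w, s in zip(words, path):
        if s == label and s != general:
            run.append(w)
            continue
        if run:
            entities.append((label, " ".join(run)))
        label = s
        run = [w] if s != general else []
    if run:
        entities.append((label, " ".join(run)))
    return entities
```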
Abstract: A computer-implemented method is disclosed for creating a grammar to be processed by a speech recognition engine in the context of a voice-activated command system. The method includes receiving a database containing a plurality of terms and identifying a set of terms that are pronounced the same but spelled differently. The method also includes placing a single term within the grammar to represent the set of terms.
Type:
Grant
Filed:
June 30, 2004
Date of Patent:
November 20, 2007
Assignee:
Microsoft Corporation
Inventors:
Yun-Cheng Ju, David Ollason, Siddharth Bhatia
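The sketch below groups terms by pronunciation and keeps one representative per group. The `pronounce` mapping is a hypothetical stand-in for a pronunciation lexicon or letter-to-sound component. Choosing the first spelling as the representative is an arbitrary choice.

```python
from collections import defaultdict
from typing import Dict, List


def build_grammar(terms: List[str], pronounce: Dict[str, str]) -> Dict[str, List[str]]:
    """Collapse terms with identical pronunciations into one grammar entry.

    Returns a mapping from the single term placed in the grammar to every
    spelling it stands for.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for term in terms:
        groups[pronounce[term]].append(term)
    return {spellings[0]: spellings for spellings in groups.values()}
```

In this sketch, each grammar entry keeps the full list of spellings. That lets an application map a recognized term back to every database entry it may refer to.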
Abstract: Methods and systems for syntactically indexing and searching data sets to achieve more accurate search results are provided. Example embodiments provide a Syntactic Query Engine (“SQE”) that parses, indexes, and stores a data set, as well as processes natural language queries subsequently submitted against the data set. The SQE comprises a Query Preprocessor, a Data Set Preprocessor, a Query Builder, a Data Set Indexer, an Enhanced Natural Language Parser (“ENLP”), a data set repository, and, in some embodiments, a user interface. After preprocessing the data set, the SQE parses the data set and determines the syntactic and grammatical roles of each term to generate enhanced data representations for each object in the data set. The SQE indexes and stores these enhanced data representations in the data set repository. Upon subsequently receiving a query, the SQE parses the query similarly and searches the indexed stored data set to locate data that contains similar terms used in similar grammatical roles.
Type:
Grant
Filed:
November 8, 2001
Date of Patent:
October 16, 2007
Assignee:
Insightful Corporation
Inventors:
Giovanni B. Marchisio, Krzysztof Koperski, Jisheng Liang, Alejandro Murua, Thien Nguyen
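The core idea is to index terms together with their grammatical roles, so that a query matches terms used in the same roles. The sketch below shows this with pre-parsed subject-verb-object triples. The parser that produces the triples is assumed and not shown. The three roles and the ranking by number of matching (term, role) pairs are illustrative choices, not the SQE's actual design.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, verb, object) from an assumed parser
ROLES = ("subject", "verb", "object")
Index = Dict[Tuple[str, str], Set[str]]


def build_index(parsed_docs: Dict[str, List[Triple]]) -> Index:
    """Map each (term, grammatical role) pair to the documents that contain it."""
    index: Index = defaultdict(set)
    for doc_id, triples in parsed_docs.items():
        for triple in triples:
            for role, term in zip(ROLES, triple):
                index[(term.lower(), role)].add(doc_id)
    return index


def search(index: Index, query: Triple) -> List[str]:
    """Rank documents by how many query terms they contain in the same grammatical role."""
    hits: Dict[str, int] = defaultdict(int)
    for role, term in zip(ROLES, query):
        for doc_id in index.get((term.lower(), role), ()):
            hits[doc_id] += 1
    return sorted(hits, key=lambda d: (-hits[d], d))
```

With role-aware matching, consider a query for "dog bites man". A document saying the dog bites the man ranks above one saying the man bites the dog, even though both contain the same words.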