Specialized Model Patents (Class 704/266)

Method and apparatus for diphone aliasing

Patent number: 6122616

Abstract: The present invention improves upon electronic speech synthesis using pre-recorded segments of speech to fill in for other missing segments of speech. The formalized aliasing approach of the present invention overcomes the ad hoc aliasing approach of the prior art which oftentimes generated less than satisfactory speech synthesis sound output. By formalizing the relationship between missing speech sound samples and available speech sound samples, the present invention provides a structured approach to aliasing which results in improved synthetic speech sound quality. Further, the formalized aliasing approach of the present invention can be used to lessen storage requirements for speech sound samples by only storing as many sound samples as memory capacity can support.

Type: Grant

Filed: July 3, 1996

Date of Patent: September 19, 2000

Assignee: Apple Computer, Inc.

Inventor: Caroline G. Henton
Homograph filter for speech synthesis system

Patent number: 6098042

Abstract: A homograph filter and method which increase the probability that homographs are pronounced correctly in a speech synthesis system utilizes a filter engine operating in conjunction with a set of rules. The filter engine parses a textual sentence to extract any present homographs and applies a correct set of rules to the homograph, based on an optimal search algorithm. The engine then carries out any appropriate substitution of phonetic data. Rules are primarily based on syntactic analysis, based on a priori knowledge of how each homograph is used. The rule set is classified into different categories in order to optimize the search algorithm and to allow the rules to be modified and updated incrementally without affecting the engine construction and/or performance. The search algorithm utilizes syntactic analysis to achieve optimum results.

Type: Grant

Filed: January 30, 1998

Date of Patent: August 1, 2000

Assignee: International Business Machines Corporation

Inventor: Duy Quoc Huynh
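
As a rough illustration of the rule-driven substitution described in the abstract of patent 6098042, the following Python sketch chooses a pronunciation for a homograph from the word that precedes it. The rule table, the phonetic strings, and the name `pronounce_homographs` are hypothetical; the abstract does not give the patent's actual rule categories or search algorithm, so this is only a minimal stand-in.

```python
# Hypothetical rule set (not from the patent): for each homograph, a list of
# (preceding-word cues, pronunciation) pairs tried in order, plus a default.
RULES = {
    "read": {
        "cues": [({"have", "has", "had", "was", "were"}, "R EH D")],
        "default": "R IY D",
    },
    "lead": {
        "cues": [({"the", "a", "of"}, "L EH D")],
        "default": "L IY D",
    },
}


def pronounce_homographs(sentence):
    """Return (homograph, chosen pronunciation) pairs found in the sentence.

    Assumes plain space-separated words without punctuation.
    """
    words = sentence.lower().split()
    result = []
    for i, word in enumerate(words):
        rule = RULES.get(word)
        if rule is None:
            continue  # not a known homograph
        previous = words[i - 1] if i > 0 else ""
        chosen = rule["default"]
        for cue_words, phones in rule["cues"]:
            if previous in cue_words:
                chosen = phones
                break
        result.append((word, chosen))
    return result


print(pronounce_homographs("I have read the book"))  # [('read', 'R EH D')]
print(pronounce_homographs("They read daily"))       # [('read', 'R IY D')]
```

Keeping the rules in a table separate from the lookup loop mirrors the abstract's point that rules can be updated without changing the engine.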
Grapheme to phoneme module for synthesizing speech alternately using pairs of four related data bases

Patent number: 6094633

Abstract: Synthetic speech is generated from conventional texts and in particular by converting text in graphemes into a text in phonemes. The grapheme text is analyzed into rimes and onsets, and each word is analyzed from the end so that earlier-occurring segments are at least partially defined by the identification of later-occurring segments. It is a particular feature that an internal string of consonants, i.e., a string of consonants preceded and followed by a vowel, is split into two portions, namely, a second portion which is contained in a database of onsets, and an earlier portion which, together with the preceding vowel or vowels, is contained in a database of rimes.

Type: Grant

Filed: December 2, 1996

Date of Patent: July 25, 2000

Assignee: British Telecommunications public limited company

Inventors: Margaret Gaved, James Hawkey
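
The end-first onset/rime analysis described in the abstract of patent 6094633 can be sketched in Python as follows. The tiny `ONSETS` and `RIMES` sets, the vowel inventory, and the function name `segment` are assumptions made for illustration; the patent's actual databases and the subsequent phoneme lookup are not modelled.

```python
import re

VOWELS = "aeiou"
# Hypothetical miniature databases; real ones would be far larger.
ONSETS = {"", "m", "t", "st"}
RIMES = {"i", "er", "en", "on"}


def segment(word):
    """Split a lowercase word (with at least one vowel) into onset/rime pieces.

    The word is processed from its end, so the onset chosen for a later
    syllable determines which consonants are left for the earlier rime.
    """
    runs = re.findall(r"[aeiou]+|[^aeiou]+", word)
    # Pad so that runs alternate consonants, vowels, ..., consonants.
    if runs[0][0] in VOWELS:
        runs.insert(0, "")
    if runs[-1][0] in VOWELS:
        runs.append("")
    pieces = []
    coda = runs[-1]          # consonants after the last vowel group
    i = len(runs) - 2        # index of the last vowel group
    while i >= 1:
        rime = runs[i] + coda
        if rime not in RIMES:
            raise ValueError(f"unknown rime: {rime!r}")
        cluster = runs[i - 1]
        if i - 1 == 0:
            # A word-initial cluster is taken whole as an onset.
            onset, coda = cluster, ""
            if onset not in ONSETS:
                raise ValueError(f"unknown onset: {onset!r}")
        else:
            # Internal cluster: try the longest onset first; the remainder
            # must form a known rime with the preceding vowels.
            for k in range(len(cluster) + 1):
                onset, coda = cluster[k:], cluster[:k]
                if onset in ONSETS and runs[i - 2] + coda in RIMES:
                    break
            else:
                raise ValueError(f"cannot split cluster: {cluster!r}")
        pieces[:0] = [onset, rime]
        i -= 2
    return pieces


print(segment("monster"))  # ['m', 'on', 'st', 'er']
```

In "monster" the internal cluster "nst" is split so that "st" is found among the onsets and "n" joins the preceding vowel to form the rime "on".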
Synthesizing a voice by developing meter patterns in the direction of a time axis according to velocity and pitch of a voice

Patent number: 6088674

Abstract: Voice-generating information, comprising discrete voice data for velocity or pitch of a voice is made by dispensing the discrete data so that the voice data is not dependent on a time lag between phonemes and at the same time is present at a relative level against a reference thereof. The said information includes data on plural types of voice tone, and is stored in a voice-generating information storing section. Voice tone data indicating sound parameters for each voice element, such as phoneme for each voice tone type, is stored in a voice tone storing section. Voice data, corresponding to the type of voice tone in the voice-generating information stored in the voice-generating storing section, is selected from a plurality of voice type data stored in the voice tone storing section under control by a control section. Meter patterns, which occur successively in the direction of a time axis, are developed according to the voice-generating information.

Type: Grant

Filed: March 20, 1997

Date of Patent: July 11, 2000

Assignee: Justsystem Corp.

Inventor: Nobuhide Yamazaki
System and method of using pre-enrolled speech sub-units for efficient speech synthesis

Patent number: 6041300

Abstract: A speech recognition system is disclosed useful in, for example, hands-free voice telephone dialing applications. The system will match a spoken word (token) to one previously enrolled in the system. The system will thereafter synthesize or replay the recognized word so that the speaker can confirm that the recognized word is indeed the correct word before further action is taken. In the case of voice activated dialing, this avoids wrong numbers. The token itself is not explicitly recorded; rather, only the lefemes may be recorded from which the token can be reconstructed for playback. This greatly reduces the amount of disk space that is needed for the database as well as provides the ability to reconstruct data in real time for synthesis use by a local name recognition machine.

Type: Grant

Filed: March 21, 1997

Date of Patent: March 21, 2000

Assignee: International Business Machines Corporation

Inventors: Abraham Poovakunnel Ittycheriah, Stephane Herman Maes
Method and apparatus using decision trees to generate and score multiple pronunciations for a spelled word

Patent number: 6016471

Abstract: The mixed decision tree includes a network of yes-no questions about adjacent letters in a spelled word sequence and also about adjacent phonemes in the phoneme sequence corresponding to the spelled word sequence. Leaf nodes of the mixed decision tree provide information about which phonetic transcriptions are most probable. Using the mixed trees, scores are developed for each of a plurality of possible pronunciations, and these scores can be used to select the best pronunciation as well as to rank pronunciations in order of probability. The pronunciations generated by the system can be used in speech synthesis and speech recognition applications as well as lexicography applications.

Type: Grant

Filed: April 29, 1998

Date of Patent: January 18, 2000

Assignee: Matsushita Electric Industrial Co., Ltd.

Inventors: Roland Kuhn, Jean-Claude Junqua, Matteo Contolini
Apparatus and method for changing reproduction speed of speech sound and recording medium

Patent number: 5991724

Abstract: A reproduction speed of speech sound changing apparatus which reproduces speech data at a speed in which essential part thereof can be caught so that the outline of the speech sound can be grasped even when changing the reproduction speed, besides remarkably reduces the whole reproducing time wherein a reproducing speed in each predetermined period is calculated according to a parameter value in every predetermined period of speech data in accordance with such a manner that a part having a high parameter value such as high power, high pitch or the like of speech data is judged to be the part, where important contents are involved, and such part of important contents is reproduced at such a speed that the contents can be caught, while the parts other than that described above are reproduced either at such a speed that the whole reproduction of speech data can be completed within a required time, or reproduced by skipping over the parts if at thus determined reproduction speed, reproduced speech sound cannot be

Type: Grant

Filed: March 5, 1998

Date of Patent: November 23, 1999

Assignee: Fujitsu Limited

Inventors: Hideki Kojima, Shinta Kimura
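
The per-period speed calculation outlined in the abstract of patent 5991724 can be illustrated with a small Python sketch. It assumes that "power" is the parameter, that a fixed threshold marks important periods, and that all remaining periods share one speed chosen to meet the target time; the names `plan_speeds`, `power_threshold`, and `important_speed` are hypothetical. The skipping behaviour is not modelled because the abstract text is cut off at that point.

```python
def plan_speeds(powers, period_s, power_threshold, important_speed, target_total_s):
    """Return one playback speed factor per period.

    Periods whose power is at or above the threshold are treated as important
    and played at `important_speed`; the other periods share whatever speed
    makes the total playback time equal `target_total_s`.
    """
    important = [p >= power_threshold for p in powers]
    n_important = sum(important)
    n_other = len(powers) - n_important

    time_for_important = n_important * period_s / important_speed
    remaining = target_total_s - time_for_important

    if n_other == 0:
        other_speed = important_speed  # unused, kept for a uniform return
    elif remaining <= 0:
        raise ValueError("target time is too short for the important periods")
    else:
        other_speed = n_other * period_s / remaining

    return [important_speed if flag else other_speed for flag in important]


# Four 1-second periods, two of them loud, squeezed into 3 seconds.
speeds = plan_speeds([0.9, 0.1, 0.2, 0.8], 1.0, 0.5, 1.0, 3.0)
print(speeds)  # [1.0, 2.0, 2.0, 1.0]
```

Here the two loud periods keep normal speed and the two quiet ones are doubled, so the total playback time is 1 + 0.5 + 0.5 + 1 = 3 seconds.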
Method for transmitting multiresolution audio signals in a radio frequency communication system as determined upon request by the code-rate selector

Patent number: 5974376

Abstract: The present invention relates to a method for transmitting multiresolution audio signals via wireless devices in a radio frequency communication system wherein audio signals are decomposed into levels of resolution. The audio signal is decomposed into levels including a base signal at a base transmission rate and one or more signal details and input into a code rate selector, controlled by either party to the communication. The base signal represents the coarsest resolution or quality of the signal. Each signal detail, when added to the base signal, improves the resolution of the signal by increasing the detail and the transmission rate. An audio receiving unit transmits a request for audio transmission to the audio transmitting unit. In response to the initial request, the base signal is transmitted to the audio receiving unit. If the base signal is insufficient, the sound quality can be increased incrementally by sending further requests to transmit additional signal detail from the code rate selector.

Type: Grant

Filed: October 10, 1996

Date of Patent: October 26, 1999

Assignee: Ericsson, Inc.

Inventors: Amer Hassan, David G. Matthews
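
To make the base-plus-details idea in the abstract of patent 5974376 concrete, here is a minimal Python sketch. The decomposition by successively finer quantization steps is an assumption chosen for simplicity; the abstract does not say how the signal is decomposed, and the names `decompose` and `reconstruct` are hypothetical.

```python
def decompose(samples, steps=(16, 4, 1)):
    """Split integer samples into a coarse base layer plus detail layers.

    Each layer quantizes what is left of the signal with a finer step, so
    adding the layers together in order restores more and more detail.
    """
    layers = []
    residual = list(samples)
    for step in steps:
        layer = [(r // step) * step for r in residual]
        residual = [r - q for r, q in zip(residual, layer)]
        layers.append(layer)
    return layers  # layers[0] is the base signal, the rest are details


def reconstruct(layers, n_details):
    """Rebuild the signal from the base plus the first `n_details` details."""
    used = layers[: 1 + n_details]
    return [sum(values) for values in zip(*used)]


layers = decompose([37, -5, 100, 18])
print(reconstruct(layers, 0))  # [32, -16, 96, 16]   base only
print(reconstruct(layers, 1))  # [36, -8, 100, 16]   base + one detail
print(reconstruct(layers, 2))  # [37, -5, 100, 18]   full resolution
```

A receiver that finds the base signal insufficient would, in this picture, request the next detail layer and add it to what it already holds.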
Method and apparatus for reproducing speech signals and method for transmitting same

Patent number: 5926788

Abstract: An encoding unit 2 divides speech signals provided to an input terminal 10 into frames and encodes the divided signals on the frame basis to output encoding parameters such as line spectral pair (LSP) parameters, pitch, voiced (V)/unvoiced (UV) or spectral amplitude A_m. The modified encoding parameter calculating unit 3 interpolates the encoding parameters for calculating modified encoding parameters associated with desired time points. A decoding unit 6 synthesizes sine waves and the noise based upon the modified encoding parameters and outputs the synthesized speech signals at an output terminal 37. Speed control can be achieved easily at an arbitrary rate over a wide range with high sound quality with the phoneme and the pitch remaining unchanged.

Type: Grant

Filed: June 17, 1996

Date of Patent: July 20, 1999

Assignee: Sony Corporation

Inventor: Masayuki Nishiguchi
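
The interpolation step in the abstract of patent 5926788 can be sketched in Python as below. Linear interpolation between neighbouring frames is an assumption (the abstract only says the parameters are interpolated), the special handling that voiced/unvoiced flags would need is left out, and the name `time_scale_parameters` is hypothetical.

```python
def time_scale_parameters(frames, speed):
    """Resample per-frame parameter vectors for playback at `speed`.

    `frames` is a list of at least two equal-length lists of floats, one per
    original frame. Output frame m is taken at original time m * speed, so
    speed < 1 produces more frames (slower speech) and speed > 1 fewer.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    n_out = int((len(frames) - 1) / speed) + 1
    out = []
    for m in range(n_out):
        t = m * speed
        i = min(int(t), len(frames) - 2)   # left neighbour frame
        frac = t - i                       # position between the neighbours
        out.append([(1 - frac) * a + frac * b
                    for a, b in zip(frames[i], frames[i + 1])])
    return out


pitch_track = [[100.0], [200.0], [300.0]]
print(time_scale_parameters(pitch_track, 0.5))
# [[100.0], [150.0], [200.0], [250.0], [300.0]]
```

Because only the time points move while the parameter values themselves are interpolated, the pitch values stay in their original range, which is consistent with the abstract's claim that pitch remains unchanged under speed control.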
Reading and pronunciation tutor

Patent number: 5920838

Abstract: A computer implemented reading tutor comprises a player for outputting a response. An input block implementing a plurality of functions such as silence detection, speech recognition, etc. captures the read material. A tutoring function compares the output of the speech recognizer to the text which was supposed to have been read and generates a response, as needed, based on information in a knowledge base and an optional student model. The response is output to the user through the player. A quality control function evaluates the captured read material and stores the captured material in the knowledge base under certain conditions. An auto-enhancement function uses information available to the tutor to create additional resources such as identifying rhyming words, words with common roots, etc., which can be used as responses.

Type: Grant

Filed: June 2, 1997

Date of Patent: July 6, 1999

Assignee: Carnegie Mellon University

Inventors: Jack Mostow, Gregory S. Aist
High quality concatenative reading system

Patent number: 5878393

Abstract: Computer-stored text, such as numerical information, is processed by a word list generator to develop a word list corresponding to those words that are to be spoken by the system. The word list generator assigns a prosodic environment state or token to each entry in the list. The prosodic environment identifies how the word functions in its current prosodic context. Different intonations are applied based on the prosodic environment. Next, the preceding and adjacent words are examined to determine how each word may need to be pronounced differently, based on the ending phoneme of the preceding word and the beginning phoneme of the following word. Using this phonological information along with the prosodic information, a sample list is generated by accessing a dictionary of stored samples. The sample list is then serially played through suitable digital-to-analog conversion circuitry to generate the text-to-speech output.

Type: Grant

Filed: September 9, 1996

Date of Patent: March 2, 1999

Assignee: Matsushita Electric Industrial Co., Ltd.

Inventors: Kazue Hata, Nicholas Kibre
Method and apparatus for automatic generation of vocal emotion in a synthetic text-to-speech system

Patent number: 5860064

Abstract: A method and apparatus for the automatic application of vocal emotion parameters to text in a text-to-speech system. Predefining vocal parameters for various vocal emotions allows simple selection and application of vocal emotions to text to be output from a text-to-speech system. Further, the present invention is capable of generating vocal emotion with the limited prosodic controls available in a concatenative synthesizer.

Type: Grant

Filed: February 24, 1997

Date of Patent: January 12, 1999

Assignee: Apple Computer, Inc.

Inventor: Caroline G. Henton
Control of speaker recognition characteristics of a multiple speaker speech synthesizer

Patent number: 5857170

Abstract: A speech synthesizing apparatus for varying a speech characteristic condition is adapted to accept a speech request that does not have a speech characteristic condition and to synthesize a speech responsive thereto. A controlling portion accepts a plurality of speech requests; a speech synthesizing portion switches a plurality of speech characteristics for speech synthesis; a speaker outputs a speech corresponding to an output signal of the speech synthesizing portion; and a synthesizer characteristic table stores speech characteristic conditions synthesized by the speech synthesizing portion. The controlling portion can accept a speech request that does not have a speech characteristic condition. Then, the controlling portion selects an available speech characteristic condition from a synthesizer characteristic table and sends the selected speech characteristic condition to the speech synthesizer.

Type: Grant

Filed: August 14, 1995

Date of Patent: January 5, 1999

Assignee: NEC Corporation

Inventor: Reishi Kondo
Reproducing apparatus

Patent number: 5845247

Abstract: The reproducing apparatus of the invention reproduces a plurality of band signals which have been subjected to a band division and includes a time-scale modifier which receives the plurality of band signals and time-axis compresses the respective band signals at the same ratio, thereby outputting a plurality of time-axis compressed band signals and a synthesis filter bank for synthesizing the plurality of time-axis compressed band signals.

Type: Grant

Filed: September 11, 1996

Date of Patent: December 1, 1998

Assignee: Matsushita Electric Industrial Co., Ltd.

Inventor: Shuji Miyasaka
Methods for controlling the generation of speech from text representing one or more names

Patent number: 5832435

Abstract: Improved automated synthesis of human audible speech from text is disclosed. Performance enhancement of the underlying text comprehensibility is obtained through prosodic treatment of the synthesized material, improved speaking rate treatment, and improved methods of spelling words or terms for the system user. Prosodic shaping of text sequences appropriate for the discourse in large groupings of text segments, with prosodic boundaries developed to indicate conceptual units within the text groupings, is implemented in a preferred embodiment.

Type: Grant

Filed: January 29, 1997

Date of Patent: November 3, 1998

Assignee: Nynex Science & Technology Inc.

Inventor: Kim Ernest Alexander Silverman
Method and apparatus for automatic assignment of duration values for synthetic speech

Patent number: 5832434

Abstract: The present invention automatically determines sound duration values, based on context, for phonetic symbols which are produced during text-to-speech conversion. The context-dependent and static attributes of the phonetic symbols are checked and specified. Then, the phonetic symbols are processed by a set of sequential duration-specification rules which set the duration value for each phonetic symbol.

Type: Grant

Filed: January 17, 1997

Date of Patent: November 3, 1998

Assignee: Apple Computer, Inc.

Inventor: Scott E. Meredith
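
The sequential duration-specification rules mentioned in the abstract of patent 5832434 can be pictured with the following Python sketch. The base durations, the two context rules, their scaling factors, and all names are hypothetical; they are not taken from the patent.

```python
# Hypothetical static attributes: an inherent duration in milliseconds.
BASE_MS = {"k": 60, "ae": 120, "t": 50, "s": 90}
VOWELS = {"ae"}
VOICELESS = {"k", "t", "s"}


def rule_base(symbols, i, duration):
    """Start from the symbol's inherent (static) duration."""
    return BASE_MS[symbols[i]]


def rule_shorten_before_voiceless(symbols, i, duration):
    """Context rule: shorten a vowel followed by a voiceless consonant."""
    if symbols[i] in VOWELS and i + 1 < len(symbols) and symbols[i + 1] in VOICELESS:
        return duration * 0.75
    return duration


def rule_lengthen_final(symbols, i, duration):
    """Context rule: lengthen the last symbol of the utterance."""
    return duration * 1.5 if i == len(symbols) - 1 else duration


# Rules are applied in this fixed order to every phonetic symbol.
RULES = [rule_base, rule_shorten_before_voiceless, rule_lengthen_final]


def assign_durations(symbols):
    durations = []
    for i in range(len(symbols)):
        duration = 0.0
        for rule in RULES:
            duration = rule(symbols, i, duration)
        durations.append(duration)
    return durations


print(assign_durations(["k", "ae", "t", "s"]))  # [60, 90.0, 50, 135.0]
```

Since each rule receives the duration left by the previous one, the order of `RULES` matters, which is the sense in which the rules are sequential.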
Split matrix quantization

Patent number: 5819224

Abstract: A speech synthesis system in which coefficients of a speech synthesis filter are quantized. An LSP or other filter coefficient representation which evolves slowly with time is generated for each of a series of N input speech frames to produce p coefficients in respect of each frame. The coefficients related to the N frames define a p × N matrix, with each row of the matrix containing N coefficients and each coefficient of one row being related to a respective one of the N frames. The matrix is split into a series of submatrices each made up from one or more of the rows, and each submatrix is vector quantized independently of the other submatrices using a composite time/spectral weighting function which for example emphasises distortion associated with high energy regions of the spectrum of each of the N input speech frames and is also proportional to the energy and degree of voicing of each of the N input speech frames.

Type: Grant

Filed: April 1, 1996

Date of Patent: October 6, 1998

Assignee: The Victoria University of Manchester

Inventor: Costas Xydeas
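
The row-wise splitting and independent quantization described in the abstract of patent 5819224 can be sketched in Python as follows. The codebooks and the per-frame weights are made-up values, and the weighting is reduced to one weight per frame, which simplifies the composite time/spectral weighting named in the abstract. The function name `quantize_split_matrix` is hypothetical.

```python
def quantize_split_matrix(matrix, row_groups, codebooks, frame_weights):
    """Vector-quantize each row group of a p x N matrix independently.

    matrix        : p rows, each holding N coefficients (one per frame)
    row_groups    : list of row-index lists defining the submatrices
    codebooks     : one codebook per row group; each entry is a candidate
                    submatrix with the same shape as the group
    frame_weights : N weights, one per frame (simplified weighting)
    Returns the index of the best codebook entry for each submatrix.
    """
    indices = []
    for rows, codebook in zip(row_groups, codebooks):
        sub = [matrix[r] for r in rows]

        def weighted_error(candidate):
            return sum(
                w * (x - c) ** 2
                for sub_row, cand_row in zip(sub, candidate)
                for x, c, w in zip(sub_row, cand_row, frame_weights)
            )

        best = min(range(len(codebook)), key=lambda k: weighted_error(codebook[k]))
        indices.append(best)
    return indices


# p = 3 coefficients per frame, N = 2 frames, split into rows {0} and {1, 2}.
matrix = [[0.1, 0.2], [0.5, 0.6], [0.9, 1.0]]
row_groups = [[0], [1, 2]]
codebooks = [
    [[[0.0, 0.0]], [[0.1, 0.25]]],
    [[[0.5, 0.5], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]],
]
print(quantize_split_matrix(matrix, row_groups, codebooks, [1.0, 2.0]))  # [1, 0]
```

Only one index per submatrix needs to be stored or sent, and each search runs over a much smaller codebook than a single quantizer for the whole matrix would need.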
Grapheme-to-phoneme conversion of digit strings using weighted finite state transducers to apply grammar to powers of a number basis

Patent number: 5781884

Abstract: The present invention provides a method of expanding a string of one or more digits to form a verbal equivalent using weighted finite state transducers. The method provides a grammatical description that expands the string into a numeric concept represented by a sum of powers of a base number system, compiles the grammatical description into a first weighted finite state transducer, provides a language specific grammatical description for verbally expressing the numeric concept, compiles the language specific grammatical description into a second weighted finite state transducer, composes the first and second finite state transducers to form a third weighted finite state transducer from which the verbal equivalent of the string can be synthesized, and synthesizes the verbal equivalent from the third weighted finite state transducer.

Type: Grant

Filed: November 22, 1996

Date of Patent: July 14, 1998

Assignee: Lucent Technologies, Inc.

Inventors: Fernando Carlos Neves Pereira, Michael Dennis Riley, Richard William Sproat
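
The two-stage expansion in the abstract of patent 5781884 (digits to a sum of powers of the base, then a language-specific verbalization) can be mimicked with ordinary Python functions. This sketch does not use weighted finite state transducers; composing two plain functions stands in for composing the two transducers, and the English word lists cover only numbers up to three digits. All names are hypothetical.

```python
UNITS = ["zero", "one", "two", "three", "four",
         "five", "six", "seven", "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen",
         "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty",
        "fifty", "sixty", "seventy", "eighty", "ninety"]


def to_powers(digits):
    """Stage 1: '342' -> [(3, 2), (4, 1), (2, 0)], i.e. 3*10^2 + 4*10^1 + 2*10^0."""
    n = len(digits)
    return [(int(d), n - 1 - i) for i, d in enumerate(digits)]


def verbalize_english(terms):
    """Stage 2: turn (coefficient, power) terms into English words."""
    coeff = {power: c for c, power in terms}
    if any(power > 2 for power in coeff):
        raise ValueError("this sketch handles at most three digits")
    hundreds, tens, units = coeff.get(2, 0), coeff.get(1, 0), coeff.get(0, 0)
    words = []
    if hundreds:
        words += [UNITS[hundreds], "hundred"]
    if tens == 1:
        words.append(TEENS[units])
    else:
        if tens:
            words.append(TENS[tens])
        if units or not words:
            words.append(UNITS[units])
    return " ".join(words)


def expand(digits):
    """Composition of the two stages."""
    return verbalize_english(to_powers(digits))


print(expand("342"))  # three hundred forty two
print(expand("17"))   # seventeen
```

Keeping the language-neutral first stage separate means that only `verbalize_english` would need replacing to support another language, which parallels the abstract's separate language-specific grammatical description.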
Very low bit rate voice messaging system using asymmetric voice compression processing

Patent number: 5781882

Abstract: An apparatus and method for processing a voice message to provide low bit rate speech transmission processes the voice message to generate speech parameters which are arranged into a two dimensional parameter matrix (502) including a sequence of parameter frames. The two dimensional parameter matrix (502) is transformed using a predetermined two dimensional matrix transformation function (414) to obtain a two dimensional transform matrix (506). Distance values representing distances between templates of a set of predetermined templates and the two dimensional transform matrix (506) are then derived. The distance values derived are identified by indexes identifying the templates of the set of predetermined templates. The distance values derived are compared, and an index corresponding to a template of the set of predetermined templates having a shortest distance is selected and then transmitted.

Type: Grant

Filed: September 14, 1995

Date of Patent: July 14, 1998

Assignee: Motorola, Inc.

Inventors: Walter Lee Davis, Jian-Cheng Huang, Leon Jasinski
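
The template-selection step in the abstract of patent 5781882 can be sketched in Python as follows. The abstract does not name the two dimensional transformation or the distance measure, so a 2 × 2 Hadamard transform and squared Euclidean distance are used here purely as assumptions; the template values and function names are hypothetical.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(row) for row in zip(*m)]


def transform_2d(x, a, b):
    """Separable two dimensional transform: A * X * B^T."""
    return matmul(matmul(a, x), transpose(b))


def nearest_template_index(transformed, templates):
    """Return the index of the template closest to the transformed matrix."""
    def distance(template):
        return sum(
            (u - v) ** 2
            for t_row, p_row in zip(transformed, template)
            for u, v in zip(t_row, p_row)
        )
    return min(range(len(templates)), key=lambda k: distance(templates[k]))


HADAMARD = [[1, 1], [1, -1]]         # stand-in transform matrix
parameters = [[1, 2], [3, 4]]        # two parameter frames of two values each
templates = [
    [[0, 0], [0, 0]],
    [[10, -2], [-4, 1]],
    [[9, 0], [0, 0]],
]

transformed = transform_2d(parameters, HADAMARD, HADAMARD)
print(transformed)                                     # [[10, -2], [-4, 0]]
print(nearest_template_index(transformed, templates))  # 1
```

Only the selected index is transmitted, which is what keeps the bit rate low; the receiver is assumed to hold the same template set.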
System for building a language model network for speech recognition

Patent number: 5765133

Abstract: A system for recognizing continuous speech, for example for automatic dictation applications, uses a bigramme language model organized as a network with finite probability states. The system also uses methods of estimating the probabilities associated with the bigrammes and of representing the model of the language in a tree-like probability network.

Type: Grant

Filed: March 15, 1996

Date of Patent: June 9, 1998

Assignee: Istituto Trentino Di Cultura

Inventors: Giuliano Antoniol, Fabio Brugnara, Mauro Cettolo, Marcello Federico
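
A bigram model of the kind mentioned in the abstract of patent 5765133 can be illustrated with a short Python sketch. Relative-frequency estimation is an assumption (the abstract does not describe the patent's estimation method), the nested dictionary is only a loose stand-in for the tree-like probability network, and the `<s>` and `</s>` boundary markers and the function name are hypothetical.

```python
from collections import Counter, defaultdict


def build_bigram_network(sentences):
    """Estimate P(next word | previous word) from whitespace-split sentences.

    Returns a dict mapping each word to a dict of successor probabilities,
    which can be read as a network whose arcs carry bigram probabilities.
    """
    pair_counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for previous, following in zip(tokens, tokens[1:]):
            pair_counts[previous][following] += 1

    network = {}
    for previous, counts in pair_counts.items():
        total = sum(counts.values())
        network[previous] = {word: count / total for word, count in counts.items()}
    return network


network = build_bigram_network(["call home", "call mom", "call home now"])
print(network["<s>"])   # {'call': 1.0}
print(network["home"])  # {'</s>': 0.5, 'now': 0.5}
```

A practical dictation system would also need smoothing for word pairs never seen in training, which this sketch leaves out.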
Adaptive methods for controlling the annunciation rate of synthesized speech

Patent number: 5749071

Abstract: Improved automated synthesis of human audible speech from text is disclosed. Performance enhancement of the underlying text comprehensibility is obtained through prosodic treatment of the synthesized material, improved speaking rate treatment, and improved methods of spelling words or terms for the system user. Prosodic shaping of text sequences appropriate for the discourse in large groupings of text segments, with prosodic boundaries developed to indicate conceptual units within the text groupings, is implemented in a preferred embodiment.

Type: Grant

Filed: January 29, 1997

Date of Patent: May 5, 1998

Assignee: Nynex Science and Technology, Inc.

Inventor: Kim Ernest Alexander Silverman
Methods for controlling the generation of speech from text representing names and addresses

Patent number: 5732395

Abstract: Improved automated synthesis of human audible speech from text is disclosed. Performance enhancement of the underlying text comprehensibility is obtained through prosodic treatment of the synthesized material, improved speaking rate treatment, and improved methods of spelling words or terms for the system user. Prosodic shaping of text sequences appropriate for the discourse in large groupings of text segments, with prosodic boundaries developed to indicate conceptual units within the text groupings, is implemented in a preferred embodiment.

Type: Grant

Filed: January 29, 1997

Date of Patent: March 24, 1998

Assignee: NYNEX Science & Technology

Inventor: Kim Ernest Alexander Silverman