Specialized Model Patents (Class 704/266)
-
Patent number: 6122616
Abstract: The present invention improves upon electronic speech synthesis using pre-recorded segments of speech to fill in for other missing segments of speech. The formalized aliasing approach of the present invention overcomes the ad hoc aliasing approach of the prior art which oftentimes generated less than satisfactory speech synthesis sound output. By formalizing the relationship between missing speech sound samples and available speech sound samples, the present invention provides a structured approach to aliasing which results in improved synthetic speech sound quality. Further, the formalized aliasing approach of the present invention can be used to lessen storage requirements for speech sound samples by only storing as many sound samples as memory capacity can support.
Type: Grant
Filed: July 3, 1996
Date of Patent: September 19, 2000
Assignee: Apple Computer, Inc.
Inventor: Caroline G. Henton
-
Patent number: 6098042
Abstract: A homograph filter and method which increase the probability that homographs are pronounced correctly in a speech synthesis system utilizes a filter engine operating in conjunction with a set of rules. The filter engine parses a textual sentence to extract any present homographs and applies a correct set of rules to the homograph, based on an optimal search algorithm. The engine then carries out any appropriate substitution of phonetic data. Rules are primarily based on syntactic analysis, based on a priori knowledge of how each homograph is used. The rule set is classified into different categories in order to optimize the search algorithm and to allow the rules to be modified and updated incrementally without affecting the engine construction and/or performance. The search algorithm utilizes syntactic analysis to achieve optimum results.
Type: Grant
Filed: January 30, 1998
Date of Patent: August 1, 2000
Assignee: International Business Machines Corporation
Inventor: Duy Quoc Huynh
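Illustrative sketch (not taken from the patent): a rule-driven homograph filter of the kind the abstract above describes could look roughly like the following. The rule table, the ARPAbet-style phone strings, the "look only at the preceding word" test, and the names HOMOGRAPH_RULES, DEFAULT_PRONUNCIATION and resolve_homographs are all assumptions made for illustration; the patented rule categories and search algorithm are not reproduced here.

```python
import string

# Hypothetical rule set, grouped per homograph. Each rule pairs a test on the
# preceding word with the phonetic data to substitute.
HOMOGRAPH_RULES = {
    "read": [
        (lambda prev: prev in {"have", "has", "had", "was", "been"}, "R EH D"),
        (lambda prev: prev in {"to", "will", "can", "must"}, "R IY D"),
    ],
    "lead": [
        (lambda prev: prev in {"of", "some"}, "L EH D"),
        (lambda prev: prev in {"to", "will", "can", "must"}, "L IY D"),
    ],
}

# Hypothetical fallback used when no rule fires.
DEFAULT_PRONUNCIATION = {"read": "R IY D", "lead": "L IY D"}


def resolve_homographs(sentence):
    """Return (homograph, phones) pairs for every homograph in the sentence."""
    words = [w.strip(string.punctuation).lower() for w in sentence.split()]
    resolved = []
    for i, word in enumerate(words):
        if word not in HOMOGRAPH_RULES:
            continue
        prev = words[i - 1] if i > 0 else ""
        phones = DEFAULT_PRONUNCIATION[word]
        # Rules are tried in order; the first one that applies wins.
        for applies, candidate in HOMOGRAPH_RULES[word]:
            if applies(prev):
                phones = candidate
                break
        resolved.append((word, phones))
    return resolved
```

Because the rules live in a table separate from the loop, entries can be added or changed without touching resolve_homographs, which mirrors the incremental-update property the abstract mentions.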
-
Patent number: 6094633
Abstract: Synthetic speech is generated from conventional texts and in particular by converting text in graphemes into a text in phonemes. The grapheme text is analyzed into rimes and onsets, and each word is analyzed from the end so that earlier-occurring segments are at least partially defined by the identification of later-occurring segments. It is a particular feature that an internal string of consonants, i.e., a string of consonants preceded and followed by a vowel, is split into two portions, namely, a second portion which is contained in a database of onsets, and an earlier portion which, together with the preceding vowel or vowels, is contained in a database of rimes.
Type: Grant
Filed: December 2, 1996
Date of Patent: July 25, 2000
Assignee: British Telecommunications public limited company
Inventors: Margaret Gaved, James Hawkey
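Illustrative sketch (not taken from the patent): the splitting of an internal consonant string described above can be pictured as follows. The tiny ONSETS set, the function name split_internal_cluster, and the choice of the longest matching onset are assumptions; the abstract only says the second portion must be found in a database of onsets.

```python
# Hypothetical, deliberately tiny onset database. The empty string lets a
# cluster fall back to "all coda" when no suffix is a known onset.
ONSETS = {"", "b", "d", "k", "l", "m", "n", "p", "r", "s", "t",
          "pl", "pr", "st", "tr", "str"}


def split_internal_cluster(cluster, onsets=ONSETS):
    """Split a vowel-flanked consonant string into (earlier, onset).

    The onset is the longest suffix of the cluster found in `onsets`
    (longest-match is an assumption). The earlier portion would then be
    joined with the preceding vowel(s) and looked up as a rime.
    """
    for start in range(len(cluster) + 1):
        tail = cluster[start:]
        if tail in onsets:
            return cluster[:start], tail
    # Reached only if "" is not in the database: treat everything as coda.
    return cluster, ""
```

For example, the "nst" in "monster" splits into "n" (which joins the preceding vowel to form the rime "on") and the onset "st".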
-
Patent number: 6088674
Abstract: Voice-generating information, comprising discrete voice data for velocity or pitch of a voice, is made by dispensing the discrete data so that the voice data is not dependent on a time lag between phonemes and at the same time is present at a relative level against a reference thereof. The said information includes data on plural types of voice tone, and is stored in a voice-generating information storing section. Voice tone data indicating sound parameters for each voice element, such as phoneme for each voice tone type, is stored in a voice tone storing section. Voice data, corresponding to the type of voice tone in the voice-generating information stored in the voice-generating storing section, is selected from a plurality of voice type data stored in the voice tone storing section under control by a control section. Meter patterns, which occur successively in the direction of a time axis, are developed according to the voice-generating information.
Type: Grant
Filed: March 20, 1997
Date of Patent: July 11, 2000
Assignee: Justsystem Corp.
Inventor: Nobuhide Yamazaki
-
Patent number: 6041300
Abstract: A speech recognition system is disclosed useful in, for example, hands-free voice telephone dialing applications. The system will match a spoken word (token) to one previously enrolled in the system. The system will thereafter synthesize or replay the recognized word so that the speaker can confirm that the recognized word is indeed the correct word before further action is taken. In the case of voice activated dialing, this avoids wrong numbers. The token itself is not explicitly recorded; rather, only the lefemes may be recorded from which the token can be reconstructed for playback. This greatly reduces the amount of disk space that is needed for the database as well as provides the ability to reconstruct data in real time for synthesis use by a local name recognition machine.
Type: Grant
Filed: March 21, 1997
Date of Patent: March 21, 2000
Assignee: International Business Machines Corporation
Inventors: Abraham Poovakunnel Ittycheriah, Stephane Herman Maes
-
Patent number: 6016471
Abstract: The mixed decision tree includes a network of yes-no questions about adjacent letters in a spelled word sequence and also about adjacent phonemes in the phoneme sequence corresponding to the spelled word sequence. Leaf nodes of the mixed decision tree provide information about which phonetic transcriptions are most probable. Using the mixed trees, scores are developed for each of a plurality of possible pronunciations, and these scores can be used to select the best pronunciation as well as to rank pronunciations in order of probability. The pronunciations generated by the system can be used in speech synthesis and speech recognition applications as well as lexicography applications.
Type: Grant
Filed: April 29, 1998
Date of Patent: January 18, 2000
Assignee: Matsushita Electric Industrial Co., Ltd.
Inventors: Roland Kuhn, Jean-Claude Junqua, Matteo Contolini
-
Patent number: 5991724
Abstract: A reproduction speed of speech sound changing apparatus which reproduces speech data at a speed in which essential part thereof can be caught so that the outline of the speech sound can be grasped even when changing the reproduction speed, besides remarkably reduces the whole reproducing time wherein a reproducing speed in each predetermined period is calculated according to a parameter value in every predetermined period of speech data in accordance with such a manner that a part having a high parameter value such as high power, high pitch or the like of speech data is judged to be the part, where important contents are involved, and such part of important contents is reproduced at such a speed that the contents can be caught, while the parts other than that described above are reproduced either at such a speed that the whole reproduction of speech data can be completed within a required time, or reproduced by skipping over the parts if at thus determined reproduction speed, reproduced speech sound cannot be
Type: Grant
Filed: March 5, 1998
Date of Patent: November 23, 1999
Assignee: Fujitsu Limited
Inventors: Hideki Kojima, Shinta Kimura
-
Patent number: 5974376
Abstract: The present invention relates to a method for transmitting multiresolution audio signals via wireless devices in a radio frequency communication system wherein audio signals are decomposed into levels of resolution. The audio signal is decomposed into levels including a base signal at a base transmission rate and one or more signal details and input into a code rate selector, controlled by either party to the communication. The base signal represents the coarsest resolution or quality of the signal. Each signal detail, when added to the base signal, improves the resolution of the signal by increasing the detail and the transmission rate. An audio receiving unit transmits a request for audio transmission to the audio transmitting unit. In response to the initial request, the base signal is transmitted to the audio receiving unit. If the base signal is insufficient, the sound quality can be increased incrementally by sending further requests to transmit additional signal detail from the code rate selector.
Type: Grant
Filed: October 10, 1996
Date of Patent: October 26, 1999
Assignee: Ericsson, Inc.
Inventors: Amer Hassan, David G. Matthews
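Illustrative sketch (not taken from the patent): the idea of a base signal plus incrementally added details can be shown with a pairwise average/difference (Haar-style) decomposition. The abstract does not name a particular transform, so the Haar scheme and the function names decompose and reconstruct are assumptions chosen only because they are easy to follow.

```python
def decompose(signal, levels):
    """Split a signal into a coarse base and per-level details.

    Assumes len(signal) is divisible by 2**levels. Details are returned
    coarsest first, i.e. in the order a receiver would request them.
    """
    current = [float(x) for x in signal]
    details = []
    for _ in range(levels):
        if len(current) % 2:
            raise ValueError("signal length must be divisible by 2**levels")
        pairs = list(zip(current[0::2], current[1::2]))
        details.insert(0, [(a - b) / 2 for a, b in pairs])
        current = [(a + b) / 2 for a, b in pairs]
    return current, details


def reconstruct(base, details):
    """Rebuild the signal from the base and however many details arrived."""
    current = list(base)
    for detail in details:
        refined = []
        for avg, diff in zip(current, detail):
            refined.extend([avg + diff, avg - diff])
        current = refined
    return current
```

Passing only the first k detail levels to reconstruct yields a signal of intermediate resolution, which corresponds to a receiver that has stopped requesting further detail.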
-
Patent number: 5926788
Abstract: An encoding unit 2 divides speech signals provided to an input terminal 10 into frames and encodes the divided signals on the frame basis to output encoding parameters such as line spectral pair (LSP) parameters, pitch, voiced (V)/unvoiced (UV) or spectral amplitude A_m. The modified encoding parameter calculating unit 3 interpolates the encoding parameters for calculating modified encoding parameters associated with desired time points. A decoding unit 6 synthesizes sine waves and the noise based upon the modified encoding parameters and outputs the synthesized speech signals at an output terminal 37. Speed control can be achieved easily at an arbitrary rate over a wide range with high sound quality with the phoneme and the pitch remaining unchanged.
Type: Grant
Filed: June 17, 1996
Date of Patent: July 20, 1999
Assignee: Sony Corporation
Inventor: Masayuki Nishiguchi
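Illustrative sketch (not taken from the patent): interpolating per-frame encoding parameters at new time points, as the abstract above describes, might be pictured like this. Linear interpolation, the function name time_scale_parameters, and the representation of a frame as a plain list of continuous-valued parameters are assumptions; a discrete voiced/unvoiced flag would need separate handling that this sketch omits.

```python
def time_scale_parameters(frames, speed):
    """Resample per-frame parameter vectors for playback at `speed`.

    `frames[i]` holds the parameters of original frame i. Output frame m
    is taken at original time m * speed, so speed > 1 yields fewer frames
    (faster speech) and speed < 1 yields more frames (slower speech).
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    if not frames:
        return []
    last = len(frames) - 1
    count = int(last / speed) + 1
    output = []
    for m in range(count):
        t = m * speed
        i = int(t)
        if i >= last:
            output.append(list(frames[last]))
            continue
        frac = t - i
        # Linear blend between the two neighbouring original frames.
        output.append([(1 - frac) * a + frac * b
                       for a, b in zip(frames[i], frames[i + 1])])
    return output
```

Only the frame timeline is resampled; the parameter values themselves, including pitch, are carried over, which is how the pitch can stay unchanged while the speed varies.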
-
Patent number: 5920838
Abstract: A computer implemented reading tutor comprises a player for outputting a response. An input block implementing a plurality of functions such as silence detection, speech recognition, etc. captures the read material. A tutoring function compares the output of the speech recognizer to the text which was supposed to have been read and generates a response, as needed, based on information in a knowledge base and an optional student model. The response is output to the user through the player. A quality control function evaluates the captured read material and stores the captured material in the knowledge base under certain conditions. An auto-enhancement function uses information available to the tutor to create additional resources such as identifying rhyming words, words with common roots, etc., which can be used as responses.
Type: Grant
Filed: June 2, 1997
Date of Patent: July 6, 1999
Assignee: Carnegie Mellon University
Inventors: Jack Mostow, Gregory S. Aist
-
Patent number: 5878393
Abstract: Computer-stored text, such as numerical information, is processed by a word list generator to develop a word list corresponding to those words that are to be spoken by the system. The word list generator assigns a prosodic environment state or token to each entry in the list. The prosodic environment identifies how the word functions in its current prosodic context. Different intonations are applied based on the prosodic environment. Next, the preceding and adjacent words are examined to determine how each word may need to be pronounced differently, based on the ending phoneme of the preceding word and the beginning phoneme of the following word. Using this phonological information along with the prosodic information, a sample list is generated by accessing a dictionary of stored samples. The sample list is then serially played through suitable digital-to-analog conversion circuitry to generate the text-to-speech output.
Type: Grant
Filed: September 9, 1996
Date of Patent: March 2, 1999
Assignee: Matsushita Electric Industrial Co., Ltd.
Inventors: Kazue Hata, Nicholas Kibre
-
Patent number: 5860064
Abstract: A method and apparatus for the automatic application of vocal emotion parameters to text in a text-to-speech system. Predefining vocal parameters for various vocal emotions allows simple selection and application of vocal emotions to text to be output from a text-to-speech system. Further, the present invention is capable of generating vocal emotion with the limited prosodic controls available in a concatenative synthesizer.
Type: Grant
Filed: February 24, 1997
Date of Patent: January 12, 1999
Assignee: Apple Computer, Inc.
Inventor: Caroline G. Henton
-
Patent number: 5857170
Abstract: A speech synthesizing apparatus for varying a speech characteristic condition is adapted to accept a speech request that does not have a speech characteristic condition and to synthesize a speech responsive thereto. A controlling portion accepts a plurality of speech requests; a speech synthesizing portion switches a plurality of speech characteristics for speech synthesis; a speaker outputs a speech corresponding to an output signal of the speech synthesizing portion; and a synthesizer characteristic table stores speech characteristic conditions synthesized by the speech synthesizing portion. The controlling portion can accept a speech request that does not have a speech characteristic condition. Then, the controlling portion selects an available speech characteristic condition from a synthesizer characteristic table and sends the selected speech characteristic condition to the speech synthesizer.
Type: Grant
Filed: August 14, 1995
Date of Patent: January 5, 1999
Assignee: NEC Corporation
Inventor: Reishi Kondo
-
Patent number: 5845247
Abstract: The reproducing apparatus of the invention reproduces a plurality of band signals which have been subjected to a band division and includes a time-scale modifier which receives the plurality of band signals and time-axis compresses the respective band signals at the same ratio, thereby outputting a plurality of time-axis compressed band signals and a synthesis filter bank for synthesizing the plurality of time-axis compressed band signals.
Type: Grant
Filed: September 11, 1996
Date of Patent: December 1, 1998
Assignee: Matsushita Electric Industrial Co., Ltd.
Inventor: Shuji Miyasaka
-
Patent number: 5832435
Abstract: Improved automated synthesis of human audible speech from text is disclosed. Performance enhancement of the underlying text comprehensibility is obtained through prosodic treatment of the synthesized material, improved speaking rate treatment, and improved methods of spelling words or terms for the system user. Prosodic shaping of text sequences appropriate for the discourse in large groupings of text segments, with prosodic boundaries developed to indicate conceptual units within the text groupings, is implemented in a preferred embodiment.
Type: Grant
Filed: January 29, 1997
Date of Patent: November 3, 1998
Assignee: Nynex Science & Technology Inc.
Inventor: Kim Ernest Alexander Silverman
-
Patent number: 5832434
Abstract: The present invention automatically determines sound duration values, based on context, for phonetic symbols which are produced during text-to-speech conversion. The context-dependent and static attributes of the phonetic symbols are checked and specified. Then, the phonetic symbols are processed by a set of sequential duration-specification rules which set the duration value for each phonetic symbol.
Type: Grant
Filed: January 17, 1997
Date of Patent: November 3, 1998
Assignee: Apple Computer, Inc.
Inventor: Scott E. Meredith
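Illustrative sketch (not taken from the patent): a pipeline of sequential duration-specification rules, in the spirit of the abstract above, might be arranged as follows. The Phone attributes, the base durations in BASE_MS, the scaling factors, and every rule shown are invented for illustration and are not the patented rule set.

```python
from dataclasses import dataclass


@dataclass
class Phone:
    symbol: str
    is_vowel: bool            # static attribute
    stressed: bool = False    # context-dependent attribute
    phrase_final: bool = False
    duration_ms: float = 0.0


# Hypothetical inherent durations in milliseconds.
BASE_MS = {"AA": 100.0, "IY": 90.0, "T": 60.0, "S": 80.0}


def rule_base(phone):
    phone.duration_ms = BASE_MS.get(phone.symbol, 70.0)


def rule_stress(phone):
    if phone.is_vowel and phone.stressed:
        phone.duration_ms *= 1.25


def rule_phrase_final(phone):
    if phone.phrase_final:
        phone.duration_ms *= 1.5


# Rules run in this fixed order; later rules refine earlier results.
RULES = [rule_base, rule_stress, rule_phrase_final]


def assign_durations(phones):
    for phone in phones:
        for rule in RULES:
            rule(phone)
    return phones
```

The order of RULES matters: each later rule scales whatever duration the earlier rules have already set, so reordering or inserting rules changes the result.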
-
Patent number: 5819224
Abstract: A speech synthesis system in which coefficients of a speech synthesis filter are quantized. An LSP or other filter coefficient representation which evolves slowly with time is generated for each of a series of N input speech frames to produce p coefficients in respect of each frame. The coefficients related to the N frames define a p×N matrix, with each row of the matrix containing N coefficients and each coefficient of one row being related to a respective one of the N frames. The matrix is split into a series of submatrices each made up from one or more of the rows, and each submatrix is vector quantized independently of the other submatrices using a composite time/spectral weighting function which for example emphasises distortion associated with high energy regions of the spectrum of each of the N input speech frames and is also proportional to the energy and degree of voicing of each of the N input speech frames.
Type: Grant
Filed: April 1, 1996
Date of Patent: October 6, 1998
Assignee: The Victoria University of Manchester
Inventor: Costas Xydeas
-
Patent number: 5781884
Abstract: The present invention provides a method of expanding a string of one or more digits to form a verbal equivalent using weighted finite state transducers. The method provides a grammatical description that expands the string into a numeric concept represented by a sum of powers of a base number system, compiles the grammatical description into a first weighted finite state transducer, provides a language specific grammatical description for verbally expressing the numeric concept, compiles the language specific grammatical description into a second weighted finite state transducer, composes the first and second finite state transducers to form a third weighted finite state transducer from which the verbal equivalent of the string can be synthesized, and synthesizes the verbal equivalent from the third weighted finite state transducer.
Type: Grant
Filed: November 22, 1996
Date of Patent: July 14, 1998
Assignee: Lucent Technologies, Inc.
Inventors: Fernando Carlos Neves Pereira, Michael Dennis Riley, Richard William Sproat
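Illustrative sketch (not taken from the patent): the two stages described above, expanding a digit string into a sum of powers of the base and then expressing that numeric concept in a specific language, can be mimicked with ordinary functions. This sketch does not build or compose weighted finite state transducers; factorize, verbalize and the English word tables are stand-ins, and verbalize covers only strings of up to three digits.

```python
ONES = ["", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine"]
TEENS = ["ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def factorize(digits):
    """Stage 1: '342' -> [(3, 2), (4, 1), (2, 0)], i.e. 3*10^2 + 4*10^1 + 2*10^0."""
    n = len(digits)
    return [(int(d), n - 1 - i) for i, d in enumerate(digits)]


def verbalize(digits):
    """Stage 2 (English only): turn the factorized form into words."""
    if not digits.isdigit() or len(digits) > 3:
        raise ValueError("expected a string of one to three digits")
    coeff = {power: c for c, power in factorize(digits)}
    hundreds, tens, ones = coeff.get(2, 0), coeff.get(1, 0), coeff.get(0, 0)
    words = []
    if hundreds:
        words += [ONES[hundreds], "hundred"]
    if tens == 1:
        words.append(TEENS[ones])   # language-specific: 1*10 + 5 -> "fifteen"
    else:
        if tens:
            words.append(TENS[tens])
        if ones:
            words.append(ONES[ones])
    return " ".join(words) or "zero"
```

Keeping factorize language-independent and confining the English-specific choices to verbalize loosely parallels the separation between the first and second transducers in the abstract.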
-
Patent number: 5781882
Abstract: An apparatus and method for processing a voice message to provide low bit rate speech transmission processes the voice message to generate speech parameters which are arranged into a two dimensional parameter matrix (502) including a sequence of parameter frames. The two dimensional parameter matrix (502) is transformed using a predetermined two dimensional matrix transformation function (414) to obtain a two dimensional transform matrix (506). Distance values representing distances between templates of a set of predetermined templates and the two dimensional transform matrix (506) are then derived. The distance values derived are identified by indexes identifying the templates of the set of predetermined templates. The distance values derived are compared, and an index corresponding to a template of the set of predetermined templates having a shortest distance is selected and then transmitted.
Type: Grant
Filed: September 14, 1995
Date of Patent: July 14, 1998
Assignee: Motorola, Inc.
Inventors: Walter Lee Davis, Jian-Cheng Huang, Leon Jasinski
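Illustrative sketch (not taken from the patent): the template selection step described above, computing a distance to every predetermined template and transmitting only the index of the closest one, can be outlined as follows. Squared Euclidean distance and the function names are assumptions; the abstract does not specify the distance measure, and the two dimensional transformation that precedes this step is not shown.

```python
def squared_distance(matrix_a, matrix_b):
    """Sum of squared element-wise differences between two equal-shaped matrices."""
    return sum((x - y) ** 2
               for row_a, row_b in zip(matrix_a, matrix_b)
               for x, y in zip(row_a, row_b))


def nearest_template_index(transform_matrix, templates):
    """Return the index of the template closest to the transform matrix.

    Only this index would be transmitted; a receiver holding the same
    template set can look the matrix up again.
    """
    if not templates:
        raise ValueError("template set is empty")
    distances = [squared_distance(transform_matrix, t) for t in templates]
    return min(range(len(distances)), key=distances.__getitem__)
```

Sending one small index in place of a full parameter matrix is what makes the low bit rate possible, at the cost of whatever error remains between the matrix and its nearest template.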
-
Patent number: 5765133
Abstract: A system for recognizing continuous speech, for example for automatic dictation applications, uses a bigramme language model organized as a network with finite probability states. The system also uses methods of estimating the probabilities associated with the bigrammes and of representing the model of the language in a tree-like probability network.
Type: Grant
Filed: March 15, 1996
Date of Patent: June 9, 1998
Assignee: Istituto Trentino Di Cultura
Inventors: Giuliano Antoniol, Fabio Brugnara, Mauro Cettolo, Marcello Federico
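Illustrative sketch (not taken from the patent): one common way to estimate bigram probabilities is to count word pairs in a corpus and interpolate the relative frequency with the unigram frequency. The abstract does not say which estimation method the patent uses, so the interpolation scheme, the weight lam, and the function names below are assumptions; the tree-like network representation is not shown.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Count unigrams, bigrams and bigram contexts from tokenized sentences."""
    unigrams, bigrams, contexts = Counter(), Counter(), Counter()
    for words in sentences:
        unigrams.update(words)
        for w1, w2 in zip(words, words[1:]):
            bigrams[(w1, w2)] += 1
            contexts[w1] += 1
    return unigrams, bigrams, contexts


def bigram_probability(model, w1, w2, lam=0.5):
    """P(w2 | w1) = lam * relative bigram frequency + (1 - lam) * unigram frequency."""
    unigrams, bigrams, contexts = model
    total = sum(unigrams.values())
    p_unigram = unigrams[w2] / total if total else 0.0
    p_bigram = bigrams[(w1, w2)] / contexts[w1] if contexts[w1] else 0.0
    return lam * p_bigram + (1 - lam) * p_unigram
```

The unigram term keeps word pairs that never occurred in training from receiving zero probability, as long as the second word itself was seen.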
-
Patent number: 5749071
Abstract: Improved automated synthesis of human audible speech from text is disclosed. Performance enhancement of the underlying text comprehensibility is obtained through prosodic treatment of the synthesized material, improved speaking rate treatment, and improved methods of spelling words or terms for the system user. Prosodic shaping of text sequences appropriate for the discourse in large groupings of text segments, with prosodic boundaries developed to indicate conceptual units within the text groupings, is implemented in a preferred embodiment.
Type: Grant
Filed: January 29, 1997
Date of Patent: May 5, 1998
Assignee: Nynex Science and Technology, Inc.
Inventor: Kim Ernest Alexander Silverman
-
Patent number: 5732395
Abstract: Improved automated synthesis of human audible speech from text is disclosed. Performance enhancement of the underlying text comprehensibility is obtained through prosodic treatment of the synthesized material, improved speaking rate treatment, and improved methods of spelling words or terms for the system user. Prosodic shaping of text sequences appropriate for the discourse in large groupings of text segments, with prosodic boundaries developed to indicate conceptual units within the text groupings, is implemented in a preferred embodiment.
Type: Grant
Filed: January 29, 1997
Date of Patent: March 24, 1998
Assignee: NYNEX Science & Technology
Inventor: Kim Ernest Alexander Silverman