Specialized Model Patents (Class 704/266)
  • Patent number: 6122616
    Abstract: The present invention improves upon electronic speech synthesis using pre-recorded segments of speech to fill in for other missing segments of speech. The formalized aliasing approach of the present invention overcomes the ad hoc aliasing approach of the prior art which oftentimes generated less than satisfactory speech synthesis sound output. By formalizing the relationship between missing speech sound samples and available speech sound samples, the present invention provides a structured approach to aliasing which results in improved synthetic speech sound quality. Further, the formalized aliasing approach of the present invention can be used to lessen storage requirements for speech sound samples by only storing as many sound samples as memory capacity can support.
    Type: Grant
    Filed: July 3, 1996
    Date of Patent: September 19, 2000
    Assignee: Apple Computer, Inc.
    Inventor: Caroline G. Henton
  • Patent number: 6098042
    Abstract: A homograph filter and method which increase the probability that homographs are pronounced correctly in a speech synthesis system utilizes a filter engine operating in conjunction with a set of rules. The filter engine parses a textual sentence to extract any present homographs and applies a correct set of rules to the homograph, based on an optimal search algorithm. The engine then carries out any appropriate substitution of phonetic data. Rules are primarily based on syntactic analysis, based on a priori knowledge of how each homograph is used. The rule set is classified into different categories in order to optimize the search algorithm and to allow the rules to be modified and updated incrementally without affecting the engine construction and/or performance. The search algorithm utilizes syntactic analysis to achieve optimum results.
    Type: Grant
    Filed: January 30, 1998
    Date of Patent: August 1, 2000
    Assignee: International Business Machines Corporation
    Inventor: Duy Quoc Huynh
  • Patent number: 6094633
    Abstract: Synthetic speech is generated from conventional texts and in particular by converting text in graphemes into a text in phonemes. The grapheme text is analyzed into rimes and onsets, and each word is analyzed from the end so that earlier-occurring segments are at least partially defined by the identification of later-occurring segments. It is a particular feature that an internal string of consonants, i.e., a string of consonants preceded and followed by a vowel, is split into two portions, namely, a second portion which is contained in a database of onsets, and an earlier portion which, together with the preceding vowel or vowels, is contained in a database of rimes.
    Type: Grant
    Filed: December 2, 1996
    Date of Patent: July 25, 2000
    Assignee: British Telecommunications public limited company
    Inventors: Margaret Gaved, James Hawkey
  • Patent number: 6088674
    Abstract: Voice-generating information, comprising discrete voice data for velocity or pitch of a voice is made by dispensing the discrete data so that the voice data is not dependent on a time lag between phonemes and at the same time is present at a relative level against a reference thereof. The said information includes data on plural types of voice tone, and is stored in a voice-generating information storing section. Voice tone data indicating sound parameters for each voice element, such as phoneme for each voice tone type, is stored in a voice tone storing section. Voice data, corresponding to the type of voice tone in the voice-generating information stored in the voice-generating storing section, is selected from a plurality of voice type data stored in the voice tone storing section under control by a control section. Meter patterns, which occur successively in the direction of a time axis, are developed according to the voice-generating information.
    Type: Grant
    Filed: March 20, 1997
    Date of Patent: July 11, 2000
    Assignee: Justsystem Corp.
    Inventor: Nobuhide Yamazaki
  • Patent number: 6041300
    Abstract: A speech recognition system is disclosed useful in, for example, hands-free voice telephone dialing applications. The system will match a spoken word (token) to one previously enrolled in the system. The system will thereafter synthesize or replay the recognized word so that the speaker can confirm that the recognized word is indeed the correct word before further action is taken. In the case of voice activated dialing, this avoids wrong numbers. The token itself is not explicitly recorded; rather, only the lefemes may be recorded from which the token can be reconstructed for playback. This greatly reduces the amount of disk space that is needed for the database as well as provides the ability to reconstruct data in real time for synthesis use by a local name recognition machine.
    Type: Grant
    Filed: March 21, 1997
    Date of Patent: March 21, 2000
    Assignee: International Business Machines Corporation
    Inventors: Abraham Poovakunnel Ittycheriah, Stephane Herman Maes
  • Patent number: 6016471
    Abstract: The mixed decision tree includes a network of yes-no questions about adjacent letters in a spelled word sequence and also about adjacent phonemes in the phoneme sequence corresponding to the spelled word sequence. Leaf nodes of the mixed decision tree provide information about which phonetic transcriptions are most probable. Using the mixed trees, scores are developed for each of a plurality of possible pronunciations, and these scores can be used to select the best pronunciation as well as to rank pronunciations in order of probability. The pronunciations generated by the system can be used in speech synthesis and speech recognition applications as well as lexicography applications.
    Type: Grant
    Filed: April 29, 1998
    Date of Patent: January 18, 2000
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Roland Kuhn, Jean-Claude Junqua, Matteo Contolini
  • Patent number: 5991724
    Abstract: A reproduction speed of speech sound changing apparatus which reproduces speech data at a speed in which essential part thereof can be caught so that the outline of the speech sound can be grasped even when changing the reproduction speed, besides remarkably reduces the whole reproducing time wherein a reproducing speed in each predetermined period is calculated according to a parameter value in every predetermined period of speech data in accordance with such a manner that a part having a high parameter value such as high power, high pitch or the like of speech data is judged to be the part, where important contents are involved, and such part of important contents is reproduced at such a speed that the contents can be caught, while the parts other than that described above are reproduced either at such a speed that the whole reproduction of speech data can be completed within a required time, or reproduced by skipping over the parts if at thus determined reproduction speed, reproduced speech sound cannot be
    Type: Grant
    Filed: March 5, 1998
    Date of Patent: November 23, 1999
    Assignee: Fujitsu Limited
    Inventors: Hideki Kojima, Shinta Kimura
  • Patent number: 5974376
    Abstract: The present invention relates to a method for transmitting multiresolution audio signals via wireless devices in a radio frequency communication system wherein audio signals are decomposed into levels of resolution. The audio signal is decomposed into levels including a base signal at a base transmission rate and one or more signal details and input into a code rate selector, controlled by either party to the communication. The base signal represents the coarsest resolution or quality of the signal. Each signal detail, when added to the base signal, improves the resolution of the signal by increasing the detail and the transmission rate. An audio receiving unit transmits a request for audio transmission to the audio transmitting unit. In response to the initial request, the base signal is transmitted to the audio receiving unit. If the base signal is insufficient, the sound quality can be increased incrementally by sending further requests to transmit additional signal detail from the code rate selector.
    Type: Grant
    Filed: October 10, 1996
    Date of Patent: October 26, 1999
    Assignee: Ericsson, Inc.
    Inventors: Amer Hassan, David G. Matthews
  • Patent number: 5926788
    Abstract: An encoding unit 2 divides speech signals provided to an input terminal 10 into frames and encodes the divided signals on the frame basis to output encoding parameters such as line spectral pair (LSP) parameters, pitch, voiced(V)/unvoiced (UV) or spectral amplitude A.sub.m. The modified encoding parameter calculating unit 3 interpolates the encoding parameters for calculating modified encoding parameters associated with desired time points. A decoding unit 6 synthesizes sine waves and the noise based upon the modified encoding parameters and outputs the synthesized speech signals at an output terminal 37. Speed control can be achieved easily at an arbitrary rate over a wide range with high sound quality with the phoneme and the pitch remaining unchanged.
    Type: Grant
    Filed: June 17, 1996
    Date of Patent: July 20, 1999
    Assignee: Sony Corporation
    Inventor: Masayuki Nishiguchi
  • Patent number: 5920838
    Abstract: A computer implemented reading tutor comprises a player for outputting a response. An input block implementing a plurality of functions such as silence detection, speech recognition, etc. captures the read material. A tutoring function compares the output of the speech recognizer to the text which was supposed to have been read and generates a response, as needed, based on information in a knowledge base and an optional student model. The response is output to the user through the player. A quality control function evaluates the captured read material and stores the captured material in the knowledge base under certain conditions. An auto-enhancement function uses information available to the tutor to create additional resources such as identifying rhyming words, words with common roots, etc., which can be used as responses.
    Type: Grant
    Filed: June 2, 1997
    Date of Patent: July 6, 1999
    Assignee: Carnegie Mellon University
    Inventors: Jack Mostow, Gregory S. Aist
  • Patent number: 5878393
    Abstract: Computer-stored text, such as numerical information, is processed by a word list generator to develop a word list corresponding to those words that are to be spoken by the system. The word list generator assigns a prosodic environment state or token to each entry in the list. The prosodic environment identifies how the word functions in its current prosodic context. Different intonations are applied based on the prosodic environment. Next, the preceding and adjacent words are examined to determine how each word may need to be pronounced differently, based on the ending phoneme of the preceding word and the beginning phoneme of the following word. Using this phonological information along with the prosodic information, a sample list is generated by accessing a dictionary of stored samples. The sample list is then serially played through suitable digital-to-analog conversion circuitry to generate the text-to-speech output.
    Type: Grant
    Filed: September 9, 1996
    Date of Patent: March 2, 1999
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Kazue Hata, Nicholas Kibre
  • Patent number: 5860064
    Abstract: A method and apparatus for the automatic application of vocal emotion parameters to text in a text-to-speech system. Predefining vocal parameters for various vocal emotions allows simple selection and application of vocal emotions to text to be output from a text-to-speech system. Further, the present invention is capable of generating vocal emotion with the limited prosodic controls available in a concatenative synthesizer.
    Type: Grant
    Filed: February 24, 1997
    Date of Patent: January 12, 1999
    Assignee: Apple Computer, Inc.
    Inventor: Caroline G. Henton
  • Patent number: 5857170
    Abstract: A speech synthesizing apparatus for varying a speech characteristic condition is adapted to accept a speech request that does not have a speech characteristic condition and to synthesize a speech responsive thereto. A controlling portion accepts a plurality of speech requests; a speech synthesizing portion switches a plurality of speech characteristics for speech synthesis; a speaker outputs a speech corresponding to an output signal of the speech synthesizing portion; and a synthesizer characteristic table stores speech characteristic conditions synthesized by the speech synthesizing portion. The controlling portion can accept a speech request that does not have a speech characteristic condition. Then, the controlling portion selects an available speech characteristic condition from a synthesizer characteristic table and sends the selected speech characteristic condition to the speech synthesizer.
    Type: Grant
    Filed: August 14, 1995
    Date of Patent: January 5, 1999
    Assignee: NEC Corporation
    Inventor: Reishi Kondo
  • Patent number: 5845247
    Abstract: The reproducing apparatus of the invention reproduces a plurality of band signals which have been subjected to a band division and includes a time-scale modifier which receives the plurality of band signals and time-axis compresses the respective band signals at the same ratio, thereby outputting a plurality of time-axis compressed band signals and a synthesis filter bank for synthesizing the plurality of time-axis compressed band signals.
    Type: Grant
    Filed: September 11, 1996
    Date of Patent: December 1, 1998
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventor: Shuji Miyasaka
  • Patent number: 5832435
    Abstract: Improved automated synthesis of human audible speech from text is disclosed. Performance enhancement of the underlying text comprehensibility is obtained through prosodic treatment of the synthesized material, improved speaking rate treatment, and improved methods of spelling words or terms for the system user. Prosodic shaping of text sequences appropriate for the discourse in large groupings of text segments, with prosodic boundaries developed to indicate conceptual units within the text groupings, is implemented in a preferred embodiment.
    Type: Grant
    Filed: January 29, 1997
    Date of Patent: November 3, 1998
    Assignee: Nynex Science & Technology Inc.
    Inventor: Kim Ernest Alexander Silverman
  • Patent number: 5832434
    Abstract: The present invention automatically determines sound duration values, based on context, for phonetic symbols which are produced during text-to-speech conversion. The context-dependent and static attributes of the phonetic symbols are checked and specified. Then, the phonetic symbols are processed by a set of sequential duration-specification rules which set the duration value for each phonetic symbol.
    Type: Grant
    Filed: January 17, 1997
    Date of Patent: November 3, 1998
    Assignee: Apple Computer, Inc.
    Inventor: Scott E. Meredith
  • Patent number: 5819224
    Abstract: A speech synthesis system in which coefficients of a speech synthesis filter are quantized. An LSP or other filter coefficient representation which evolves slowly with time is generated for each of a series of N input speech frames to produce p coefficients in respect of each frame. The coefficients related to the N frames define a p.times.N matrix, with each row of the matrix containing N coefficients and each coefficient of one row being related to a respective one of the N frames. The matrix is split into a series of submatrices each made up from one or more of the rows, and each submatrix is vector quantized independently of the other submatrices using a composite time/spectral weighting function which for example emphasises distortion associated with high energy regions of the spectrum of each of the N input speech frames and is also proportional to the energy and degree of voicing of each of the N input speech frames.
    Type: Grant
    Filed: April 1, 1996
    Date of Patent: October 6, 1998
    Assignee: The Victoria University of Manchester
    Inventor: Costas Xydeas
  • Patent number: 5781884
    Abstract: The present invention provides a method of expanding a string of one or more digits to form a verbal equivalent using weighted finite state transducers. The method provides a grammatical description that expands the string into a numeric concept represented by a sum of powers of a base number system, compiles the grammatical description into a first weighted finite state transducer, provides a language specific grammatical description for verbally expressing the numeric concept, compiles the language specific grammatical description into a second weighted finite state transducer, composes the first and second finite state transducers to form a third weighted finite state transducer from which the verbal equivalent of the string can be synthesized, and synthesizes the verbal equivalent from the third weighted finite state transducer.
    Type: Grant
    Filed: November 22, 1996
    Date of Patent: July 14, 1998
    Assignee: Lucent Technologies, Inc.
    Inventors: Fernando Carlos Neves Pereira, Michael Dennis Riley, Richard William Sproat
  • Patent number: 5781882
    Abstract: An apparatus and method for processing a voice message to provide low bit rate speech transmission processes the voice message to generate speech parameters which are arranged into a two dimensional parameter matrix (502) including a sequence of parameter frames. The two dimensional parameter matrix (502) is transformed using a predetermined two dimensional matrix transformation function (414) to obtain a two dimensional transform matrix (506). Distance values representing distances between templates of a set of predetermined templates and the two dimensional transform matrix (506) are then derived. The distance values derived are identified by indexes identifying the templates of the set of predetermined templates. The distance values derived are compared, and an index corresponding to a template of the set of predetermined templates having a shortest distance is selected and then transmitted.
    Type: Grant
    Filed: September 14, 1995
    Date of Patent: July 14, 1998
    Assignee: Motorola, Inc.
    Inventors: Walter Lee Davis, Jian-Cheng Huang, Leon Jasinski
  • Patent number: 5765133
    Abstract: A system for recognizing continuous speech, for example for automatic dictation applications, uses a bigramme language model organized as a network with finite probability states. The system also uses methods of estimating the probabilities associated with the bigrammes and of representing the model of the language in a tree-like probability network.
    Type: Grant
    Filed: March 15, 1996
    Date of Patent: June 9, 1998
    Assignee: Istituto Trentino Di Cultura
    Inventors: Giuliano Antoniol, Fabio Brugnara, Mauro Cettolo, Marcello Federico
  • Patent number: 5749071
    Abstract: Improved automated synthesis of human audible speech from text is disclosed. Performance enhancement of the underlying text comprehensibility is obtained through prosodic treatment of the synthesized material, improved speaking rate treatment, and improved methods of spelling words or terms for the system user. Prosodic shaping of text sequences appropriate for the discourse in large groupings of text segments, with prosodic boundaries developed to indicate conceptual units within the text groupings, is implemented in a preferred embodiment.
    Type: Grant
    Filed: January 29, 1997
    Date of Patent: May 5, 1998
    Assignee: Nynex Science and Technology, Inc.
    Inventor: Kim Ernest Alexander Silverman
  • Patent number: 5732395
    Abstract: Improved automated synthesis of human audible speech from text is disclosed. Performance enhancement of the underlying text comprehensibility is obtained through prosodic treatment of the synthesized material, improved speaking rate treatment, and improved methods of spelling words or terms for the system user. Prosodic shaping of text sequences appropriate for the discourse in large groupings of text segments, with prosodic boundaries developed to indicate conceptual units within the text groupings, is implemented in a preferred embodiment.
    Type: Grant
    Filed: January 29, 1997
    Date of Patent: March 24, 1998
    Assignee: NYNEX Science & Technology
    Inventor: Kim Ernest Alexander Silverman
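
Illustration: the abstract of patent 5781884 above describes expanding a digit string in two stages, first into a numeric concept expressed as a sum of powers of a base, then into language-specific words. The Python sketch below is only a rough illustration of that two-stage idea and is not the patented method: ordinary functions stand in for the weighted finite state transducers, and plain function composition stands in for transducer composition. The names factorize, verbalize_en, and expand, the English word tables, and the 0 to 999 range are all assumptions introduced here.

```python
from typing import List, Tuple

# Hypothetical English word tables (an assumption; not taken from the patent).
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]

Term = Tuple[int, int]  # (coefficient, exponent), meaning coefficient * base**exponent


def factorize(digits: str) -> List[Term]:
    """Stage 1 (language independent): digit string -> sum-of-powers terms."""
    if not digits.isdigit():
        raise ValueError("expected a non-empty string of decimal digits")
    n = len(digits)
    return [(int(d), n - 1 - i) for i, d in enumerate(digits)]


def verbalize_en(terms: List[Term]) -> str:
    """Stage 2 (language specific): sum-of-powers terms -> English words.

    This sketch only covers exponents 0..2, i.e. values 0 to 999.
    """
    coeff = {exponent: c for c, exponent in terms}
    if any(exponent > 2 for exponent in coeff):
        raise ValueError("this sketch only covers 0 to 999")
    hundreds, tens, ones = coeff.get(2, 0), coeff.get(1, 0), coeff.get(0, 0)

    words: List[str] = []
    if hundreds:
        words += [ONES[hundreds], "hundred"]
    if tens == 1:
        # English fuses 1 * 10**1 + k * 10**0 into a single "teen" word.
        words.append(ONES[10 + ones])
    elif tens:
        words.append(TENS[tens] + (f"-{ONES[ones]}" if ones else ""))
    elif ones or not hundreds:
        # A bare units digit, or "zero" when every coefficient is 0.
        words.append(ONES[ones])
    return " ".join(words)


def expand(digits: str) -> str:
    """Compose the two stages, loosely mirroring the composed transducer."""
    return verbalize_en(factorize(digits))


if __name__ == "__main__":
    for s in ["0", "7", "15", "40", "342", "100"]:
        print(s, "->", expand(s))
```

Keeping the first stage free of any language knowledge means that, in this sketch, supporting another language would only require a different second-stage function, which is the separation the abstract attributes to the two grammatical descriptions.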