Vocal Tract Model Patents (Class 704/261)
  • Patent number: 7613611
    Abstract: Provided is a method and an apparatus for vocal-cord signal recognition. A signal processing unit receives and digitizes a vocal cord signal, and a noise removing unit removes channel noise included in the vocal cord signal. A feature extracting unit extracts a feature vector from the vocal cord signal, which has the channel noise removed therefrom, and a recognizing unit calculates a similarity between the vocal cord signal and a learned model parameter. Consequently, the apparatus is robust in a noisy environment.
    Type: Grant
    Filed: May 26, 2005
    Date of Patent: November 3, 2009
    Assignee: Electronics and Telecommunications Research Institute
    Inventors: Kwan Hyun Cho, Mun Sung Han, Young Giu Jung, Hee Sook Shin, Jun Seok Park, Dong Won Han
  • Publication number: 20090222269
    Abstract: An apparatus for voice synthesis includes: a word database for storing words and voices; a syllable database for storing syllables and voices; a processor for executing a process including: extracting a word from a document; generating a voice signal based on the voice associated with the extracted word when the extracted word is included in the word database; and synthesizing a voice signal based on the voices associated with the one or more syllables corresponding to the extracted word when the extracted word is not found in the word database; a speaker for producing a voice based on either the generated or the synthesized voice signal; and a display for selectively displaying the extracted word when the voice based on the synthesized voice signal is produced by the speaker.
    Type: Application
    Filed: May 11, 2009
    Publication date: September 3, 2009
    Inventor: Shinichiro MORI
  • Publication number: 20090222268
    Abstract: A speech synthesis system synthesizes a speech signal corresponding to an input speech signal based on a spectral envelope of the input speech signal. A glottal pulse generator generates a time series of glottal pulses, which are processed into a glottal pulse magnitude spectrum. A shaping circuit shapes the glottal pulse magnitude spectrum based on the spectral envelope and generates a shaped glottal pulse magnitude spectrum. A harmonic null adjustment circuit reduces harmonic nulls in the shaped glottal pulse magnitude spectrum and generates a null-adjusted synthesized speech spectrum. An inverse transform circuit generates a null-adjusted time-series speech signal. An overlap and add circuit synthesizes the speech signal based on the null-adjusted time-series speech signal.
    Type: Application
    Filed: March 3, 2008
    Publication date: September 3, 2009
    Applicant: QNX SOFTWARE SYSTEMS (WAVEMAKERS), INC.
    Inventors: Xueman Li, Phillip A. Hetherington, Shahla Parveen, Tommy TSZ Chun Chiu
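    A minimal sketch (not the patented implementation) of the frame pipeline this abstract describes: a glottal pulse train's magnitude spectrum is shaped by the input's spectral envelope, harmonic nulls are floored, and a windowed frame is returned for overlap-and-add. All function names and constants below are illustrative assumptions.

```python
import numpy as np

# Shape a glottal pulse train's magnitude spectrum with a spectral envelope,
# floor the harmonic nulls, and return one windowed frame for overlap-and-add.

def synthesize_frame(envelope, f0, fs=16000, n_fft=512, null_floor_db=-40.0):
    """envelope: magnitude spectral envelope sampled at n_fft // 2 + 1 bins."""
    period = max(1, int(fs / f0))
    pulses = np.zeros(n_fft)
    pulses[::period] = 1.0                       # crude glottal pulse train
    spectrum = np.fft.rfft(pulses)
    mag, phase = np.abs(spectrum), np.angle(spectrum)
    shaped = mag * envelope                      # impose the spectral envelope
    floor = shaped.max() * 10.0 ** (null_floor_db / 20.0)
    shaped = np.maximum(shaped, floor)           # reduce harmonic nulls
    frame = np.fft.irfft(shaped * np.exp(1j * phase), n_fft)
    return frame * np.hanning(n_fft)             # ready for overlap-and-add
```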
  • Patent number: 7567903
    Abstract: A method and apparatus for performing speech recognition are provided. A Vocal Tract Length Normalized acoustic model for a speaker is generated from training data. Speech recognition is performed on a first recognition input to determine a first best hypothesis. A first Vocal Tract Length Normalization factor is estimated based on the first best hypothesis. Speech recognition is performed on a second recognition input using the Vocal Tract Length Normalized acoustic model to determine an other best hypothesis. An other Vocal Tract Length Normalization factor is estimated based on the other best hypothesis and at least one previous best hypothesis.
    Type: Grant
    Filed: January 12, 2005
    Date of Patent: July 28, 2009
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Vincent Goffin, Andrej Ljolje, Murat Saraclar
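    Vocal Tract Length Normalization is typically realized by searching a grid of frequency-warp factors and keeping the one that best explains the current best hypothesis. The sketch below assumes a piecewise-linear warp and a caller-supplied `score_hypothesis` likelihood function; it illustrates the general technique, not AT&T's implementation.

```python
import numpy as np

# Grid-search estimation of a VTLN warp factor (hypothetical names): warp the
# frequency axis, score against the best hypothesis, keep the best warp.

WARP_GRID = np.arange(0.88, 1.13, 0.02)   # a typical VTLN search range

def warp_frequencies(freqs, alpha, f_max=8000.0):
    """Piecewise-linear frequency warping commonly used for VTLN."""
    cutoff = 0.875 * f_max
    return np.where(freqs <= cutoff,
                    alpha * freqs,
                    alpha * cutoff + (f_max - alpha * cutoff)
                    * (freqs - cutoff) / (f_max - cutoff))

def estimate_warp(features, hypothesis, score_hypothesis):
    """Pick the warp whose warped features best explain the hypothesis."""
    best_alpha, best_score = 1.0, float("-inf")
    for alpha in WARP_GRID:
        score = score_hypothesis(features, hypothesis, alpha)
        if score > best_score:
            best_alpha, best_score = alpha, score
    return best_alpha
```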
  • Patent number: 7529672
    Abstract: A method of synthesizing a speech signal by providing a first speech unit signal having an end interval and a second speech unit signal having a front interval, wherein at least some of the periods of the end interval are appended in inverted order at the end of the first speech unit signal in order to provide a fade-out interval, and at least some of the periods of the front interval are appended in inverted order at the beginning of the second speech unit signal to provide a fade-in interval. An overlap and add operation is performed on the end and fade-in intervals and the fade-out and front intervals.
    Type: Grant
    Filed: August 8, 2003
    Date of Patent: May 5, 2009
    Assignee: Koninklijke Philips Electronics N.V.
    Inventor: Ercan Ferit Gigi
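    The join described above can be pictured as follows: each unit is extended past its boundary by some of its own pitch periods in reversed order, and the overlapping regions are cross-faded. A sketch, assuming a constant pitch period and hypothetical names:

```python
import numpy as np

# Overlap-and-add joining of two speech units with reversed-period fade
# intervals (a sketch; real units have time-varying pitch periods).

def extend_with_reversed_periods(unit, period, n_periods, at_end=True):
    if at_end:                                  # build a fade-out interval
        tail = unit[-n_periods * period:]
        return np.concatenate([unit, tail[::-1]])
    head = unit[:n_periods * period]            # build a fade-in interval
    return np.concatenate([head[::-1], unit])

def overlap_add_join(unit1, unit2, period, n_periods):
    u1 = extend_with_reversed_periods(unit1, period, n_periods, at_end=True)
    u2 = extend_with_reversed_periods(unit2, period, n_periods, at_end=False)
    overlap = 2 * n_periods * period   # end + fade-out over fade-in + front
    fade = np.linspace(1.0, 0.0, overlap)
    return np.concatenate([u1[:-overlap],
                           u1[-overlap:] * fade + u2[:overlap] * (1.0 - fade),
                           u2[overlap:]])
```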
  • Publication number: 20090112596
    Abstract: A system and method are disclosed for synthesizing speech based on a selected speech act. A method includes modifying synthesized speech of a spoken dialogue system by (1) receiving a user utterance, (2) analyzing the user utterance to determine an appropriate speech act, and (3) generating a response of a type associated with the appropriate speech act, wherein linguistic variables in the response are selected based on the appropriate speech act.
    Type: Application
    Filed: October 30, 2007
    Publication date: April 30, 2009
    Applicant: AT&T Lab, Inc.
    Inventors: Ann K. Syrdal, Mark Beutnagel, Alistair D. Conkie, Yeon-Jun Kim
  • Publication number: 20090094031
    Abstract: An apparatus for providing text independent voice conversion may include a first voice conversion model and a second voice conversion model. The first voice conversion model may be trained with respect to conversion of training source speech to synthetic speech corresponding to the training source speech. The second voice conversion model may be trained with respect to conversion to training target speech from synthetic speech corresponding to the training target speech. An output of the first voice conversion model may be communicated to the second voice conversion model to process source speech input into the first voice conversion model into target speech corresponding to the source speech as the output of the second voice conversion model.
    Type: Application
    Filed: October 4, 2007
    Publication date: April 9, 2009
    Inventors: Jilei Tian, Victor Popa, Jani K. Nurminen
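    Architecturally, this is a two-stage chain with the synthetic voice as a pivot domain: the first model is trained source-to-synthetic, the second synthetic-to-target, and at conversion time the first model's output feeds the second. A minimal sketch with hypothetical callables:

```python
# Chained text-independent voice conversion through a synthetic pivot voice.
# `source_to_synthetic` and `synthetic_to_target` stand in for trained models.

class ChainedVoiceConversion:
    def __init__(self, source_to_synthetic, synthetic_to_target):
        self.first = source_to_synthetic     # trained on (source, synthetic)
        self.second = synthetic_to_target    # trained on (synthetic, target)

    def convert(self, source_features):
        # Output of the first model is communicated to the second model.
        return self.second(self.first(source_features))
```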
  • Publication number: 20090063156
    Abstract: A voice synthesis method comprising: a step of choosing a synthetic voice from among a set of voices having predetermined spectral signatures; a step of recording the natural voice of a first person; a step of transforming the recorded natural voice so as to conform with the spectral signature of the chosen synthetic voice, the natural voice thereby transformed being recorded; a step of determining at least one situation parameter for a first character from among a set of predefined parameters, each predefined parameter being associated with a spectral alteration of the emitted voice, the determined situation parameter characterizing in particular the environment or the physical or psychological state of the character; and a step of spectrally altering the transformed natural voice so as to conform with the spectral alteration associated with the character's situation parameter.
    Type: Application
    Filed: August 26, 2008
    Publication date: March 5, 2009
    Applicant: Alcatel Lucent
    Inventors: Sylvain SQUEDIN, Serge Papillon
  • Publication number: 20090063155
    Abstract: The present invention provides a robot apparatus with a vocal interactive function. The robot apparatus receives a vocal input, and recognizes the vocal input. The robot apparatus stores a plurality of output data, an output count of each of the output data, and a weighted value of each of the output data. The robot apparatus outputs output data according to the weighted values of all the output data corresponding to the vocal input, and adds one to the output count of the output data. The robot apparatus calculates the weighted values of all the output data corresponding to the vocal input according to the output count. Consequently, the robot apparatus may output different and variable output data when receiving the same vocal input. The present invention also provides a vocal interactive method adapted for the robot apparatus.
    Type: Application
    Filed: August 13, 2008
    Publication date: March 5, 2009
    Applicant: HON HAI PRECISION INDUSTRY CO., LTD.
    Inventors: Tsu-Li Chiang, Chuan-Hong Wang, Kuo-Pao Hung, Kuan-Hong Hsieh
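    The selection logic above amounts to count-weighted random choice: each time a response is produced its count increments, and its weight is recalculated so it becomes less likely. The inverse-count weight below is an assumption; the abstract only requires weights derived from output counts.

```python
import random

# Count-weighted response selection: repeated answers grow less likely,
# so the same vocal input yields varied output data over time.

class ResponseSelector:
    def __init__(self, responses):
        self.counts = {r: 0 for r in responses}

    def select(self):
        responses = list(self.counts)
        weights = [1.0 / (1 + self.counts[r]) for r in responses]
        choice = random.choices(responses, weights=weights, k=1)[0]
        self.counts[choice] += 1        # "adds one to the output count"
        return choice

greeter = ResponseSelector(["Hello!", "Hi there.", "Nice to see you."])
print(greeter.select())
```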
  • Patent number: 7457752
    Abstract: Method and apparatus for controlling the operation of an emotion synthesizing device, notably of the type where the emotion is conveyed by a sound, having at least one input parameter whose value is used to set a type of emotion to be conveyed. At least one parameter is made variable over a determined control range, thereby conferring variability in the amount of the type of emotion to be conveyed. The variable parameter can be made variable according to a variation model over the control range, the model relating a quantity-of-emotion control variable to the variable parameter, whereby said control variable is used to variably establish a value of said variable parameter. Preferably the variation obeys a linear model, the variable parameter being made to vary linearly with a variation in the quantity-of-emotion control variable.
    Type: Grant
    Filed: August 12, 2002
    Date of Patent: November 25, 2008
    Assignee: Sony France S.A.
    Inventor: Pierre Yves Oudeyer
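    The preferred linear variation model is simple enough to state directly: a quantity-of-emotion control variable interpolates a synthesis parameter across its control range. A sketch with illustrative parameter values:

```python
# Linear variation model: a control variable in [0, 1] sets how much of the
# chosen emotion type is conveyed by a sound-synthesis parameter.

def emotion_parameter(control, p_neutral, p_full):
    """Linearly interpolate between the neutral value and the value at the
    full amount of the chosen emotion."""
    control = min(max(control, 0.0), 1.0)   # clamp to the control range
    return p_neutral + control * (p_full - p_neutral)

# e.g. raise mean pitch from 120 Hz (neutral) toward 180 Hz (full emotion)
pitch_hz = emotion_parameter(0.4, p_neutral=120.0, p_full=180.0)  # 144.0
```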
  • Publication number: 20080288258
    Abstract: The present invention provides a speech analysis method comprising steps of obtaining a speech signal and a corresponding DEGG/EGG signal; regarding the speech signal as the output of a vocal tract filter in a source-filter model taking the DEGG/EGG signal as the input; and estimating the features of the vocal tract filter from the speech signal as the output and the DEGG/EGG signal as the input, wherein the features of the vocal tract filter are expressed by the state vectors of the vocal tract filter at selected time points, and the step of estimating is performed using Kalman filtering.
    Type: Application
    Filed: April 3, 2008
    Publication date: November 20, 2008
    Applicant: International Business Machines Corporation
    Inventors: Dan Ning Jiang, Fan Ping Meng, Yong Qin, Zhi Wei Shuang
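    One way to read this: the vocal tract filter is a time-varying FIR whose taps form the state vector, recent DEGG samples form the observation row, and each speech sample provides a scalar measurement for a standard Kalman update. The random-walk state model and noise variances below are assumptions, not values from the publication.

```python
import numpy as np

# Kalman tracking of vocal tract filter taps from speech (output) and a
# DEGG excitation (input); a sketch of the source-filter estimation idea.

def track_vocal_tract(speech, degg, order=24, q=1e-6, r=1e-3):
    x = np.zeros(order)                  # filter taps (state vector)
    P = np.eye(order)                    # state covariance
    states = []
    for n in range(order, len(speech)):
        c = degg[n - order:n][::-1]      # most recent DEGG samples
        P = P + q * np.eye(order)        # random-walk prediction step
        s = c @ P @ c + r                # innovation variance
        k = P @ c / s                    # Kalman gain
        x = x + k * (speech[n] - c @ x)  # correct with the speech sample
        P = P - np.outer(k, c @ P)
        states.append(x.copy())          # filter state at this time point
    return states
```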
  • Patent number: 7398213
    Abstract: The present invention relates to a method and system for diagnosing pathological phenomena using a voice signal. In one embodiment, the existence of at least one pathological phenomenon is determined based at least in part upon a calculated average intensity function associated with speech from the patient. In another embodiment, the existence of at least one pathological phenomenon is determined based at least in part upon a calculated maximum intensity function associated with speech from the patient.
    Type: Grant
    Filed: May 16, 2006
    Date of Patent: July 8, 2008
    Assignee: Exaudios Technologies
    Inventors: Yoram Levanon, Lan Lossos-Shifrin
  • Publication number: 20080154601
    Abstract: A method and system for providing efficient menu services for an information processing system that uses a telephone or other form of audio user interface. In one embodiment, the menu services provide effective support for novice users by providing a full listing of available keywords and rotating house advertisements which inform novice users of potential features and information. For experienced users, cues are rendered so that at any time the user can say a desired keyword to invoke the corresponding application. The menu is flat to facilitate its usage. Full keyword listings are rendered after the user is given a brief cue to say a keyword. Service messages rotate words and word prosody. When listening to receive information from the user, after the user has been cued, soft background music or other audible signals are rendered to inform the user that a response may now be spoken to the service.
    Type: Application
    Filed: November 20, 2007
    Publication date: June 26, 2008
    Applicant: Microsoft Corporation
    Inventors: Lisa Joy Stifelman, Hadi Partovi, Haleh Partovi, David Bryan Alpert, Matthew Talin Marx, Scott James Bailey, Kyle D. Sims, Darby McDonough Bailey, Roderick Steven Brathwaite, Eugene Koh, Angus Macdonald Davis
  • Patent number: 7365749
    Abstract: An animation wireframe is modified with three-dimensional (3D) range and color data having a corresponding shape surface. The animation wireframe is vertically scaled based on distances between consecutive features within the 3D range and color data and corresponding distances within the generic animation wireframe. For each animation wireframe point, the location of the animation wireframe point is adjusted to coincide with a point on the shape surface. The shape surface point lies along a scaling line connecting the animation wireframe point, the shape surface point, and an origin point. The scaling line lies within a horizontal plane.
    Type: Grant
    Filed: August 15, 2006
    Date of Patent: April 29, 2008
    Assignee: AT&T Corp.
    Inventor: Joern Ostermann
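    The per-point adjustment is purely geometric: each wireframe point slides along the line through itself and the origin until it meets the scanned shape surface. A sketch, with `surface_radius` as a hypothetical stand-in for querying the 3D range data:

```python
import numpy as np

# Move wireframe points along scaling lines through the origin onto the
# scanned shape surface (geometric sketch only).

def adjust_point(point, origin, surface_radius):
    direction = point - origin
    direction = direction / np.linalg.norm(direction)
    return origin + surface_radius(direction) * direction  # on the surface

def adjust_wireframe(points, origin, surface_radius):
    return np.array([adjust_point(p, origin, surface_radius)
                     for p in points])
```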
  • Publication number: 20080077407
    Abstract: A system, method and computer-readable media are disclosed for improving speech synthesis. A text-to-speech (TTS) voice database for use in a TTS system is generated by a method comprising labeling a voice database phonemically and applying a pre-/post-vocalic distinction to the phonemic labels to generate a TTS voice database. When a system synthesizes speech using speech units from the TTS voice database, the database provides phonemes for selection using the pre-/post-vocalic distinctions which improve unit selection to render the synthetic speech more natural.
    Type: Application
    Filed: September 26, 2006
    Publication date: March 27, 2008
    Applicant: AT&T Corp.
    Inventors: Mark Beutnagel, Alistair Conkie, Yeon-Jun Kim, Ann K. Syrdal
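    The pre-/post-vocalic distinction can be pictured as relabeling each syllable's consonants by whether they precede or follow the vowel. The phoneme inventory and `_pre`/`_post` suffix convention below are illustrative assumptions:

```python
# Split consonant labels into pre- and post-vocalic variants within a
# syllable, enlarging the label set used for unit selection.

VOWELS = {"aa", "ae", "ah", "ao", "eh", "ih", "iy", "uh", "uw"}

def add_vocalic_distinction(syllable):
    """syllable: list of phoneme labels containing exactly one vowel."""
    v = next(i for i, p in enumerate(syllable) if p in VOWELS)
    return [p if i == v else p + ("_pre" if i < v else "_post")
            for i, p in enumerate(syllable)]

print(add_vocalic_distinction(["s", "t", "aa", "p"]))
# ['s_pre', 't_pre', 'aa', 'p_post']
```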
  • Patent number: 7330813
    Abstract: A speech processing apparatus able to enhance formants more naturally, wherein a speech analyzing unit analyzes an input speech signal to find LPCs and converts the LPCs to LSPs, a speech decoding unit calculates a distance between adjacent orders of the LSPs by an LSP analytical processing unit and calculates LSP adjusting amounts of larger values for LSPs of adjacent orders closer in distance by an LSP adjusting amount calculating unit, an LSP adjusting unit adjusts the LSPs based on the LSP adjusting amounts such that the LSPs of adjacent orders closer in distance become closer, an LSP-LPC converting unit converts the adjusted LSPs to LPCs, and an LPC combining unit uses the LPCs and sound source parameters to obtain formant-enhanced speech.
    Type: Grant
    Filed: August 5, 2003
    Date of Patent: February 12, 2008
    Assignee: Fujitsu Limited
    Inventor: Mutsumi Saito
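    The key observation is that closely spaced adjacent LSPs mark formant peaks, so pulling close pairs closer sharpens formants. The sketch below adjusts each adjacent pair with a weight that grows as the gap shrinks; the quadratic weighting is an assumption (the abstract only requires larger adjustments for closer pairs).

```python
import numpy as np

# Formant enhancement in the LSP domain: adjacent orders that are close in
# distance receive larger adjustments that pull them closer still.

def enhance_formants_lsp(lsps, strength=0.3):
    lsps = np.asarray(lsps, dtype=float)
    out = lsps.copy()
    gaps = np.diff(lsps)                 # distances between adjacent orders
    for i, gap in enumerate(gaps):
        shrink = strength * (gaps.min() / gap) ** 2  # bigger for close pairs
        out[i] += shrink * gap / 2.0     # move the pair toward each other
        out[i + 1] -= shrink * gap / 2.0
    return out

# Illustrative LSP frequencies in radians; the close pairs get sharpened.
print(enhance_formants_lsp([0.25, 0.55, 0.62, 1.10, 1.16, 1.80, 2.40, 2.90]))
```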
  • Patent number: 7233900
    Abstract: The present invention relates to a word sequence output device in which emotional synthetic speech can be output. The device outputs emotional synthetic speech. A text generating unit 31 generates spoken text for synthetic speech by using text as a word sequence included in action command information in accordance with the action command information. An emotion checking unit 39 checks an emotion model value and determines whether or not the emotion of a robot is aroused based on the emotion model value. Further, when the emotion of the robot is aroused, the emotion checking unit 39 instructs the text generating unit 31 to change the word order. The text generating unit 31 changes the word order of the spoken text in accordance with the instructions from the emotion checking unit 39. Accordingly, when the spoken text is “Kimi wa kirei da.” (You are beautiful.), the word order is changed to make a sentence “Kirei da, kimi wa.” (You are beautiful, you are.)
    Type: Grant
    Filed: April 5, 2002
    Date of Patent: June 19, 2007
    Assignee: Sony Corporation
    Inventor: Shinichi Kariya
  • Patent number: 7225129
    Abstract: A method of modeling speech distinctions within computer-animated talking heads that utilizes the manipulation of speech production articulators for selected speech segments. Graphical representations of voice characteristics and speech production characteristics are generated in response to the selected speech segments. By way of example, breath images such as particle-cloud images and particle-stream images are generated to represent voice characteristics such as the presence of stops and fricatives, respectively. The coloring of exterior portions of the talking head is displayed in response to selected voice characteristics such as nasality. The external physiology of the talking head is modulated, such as by changing the width and movement of the nose, the position of the eyebrows, and the movement of the throat, in response to voiced speech characteristics such as pitch, nasality, and voicebox vibration, respectively.
    Type: Grant
    Filed: September 20, 2001
    Date of Patent: May 29, 2007
    Assignee: The Regents of the University of California
    Inventors: Dominic W. Massaro, Michael M. Cohen, Jonas Beskow
  • Patent number: 7219064
    Abstract: To provide a robot which autonomously forms and performs an action plan in response to external factors without direct command input from an operator. When reading a story printed in a book or other print media or recorded in recording media or when reading a story downloaded through a network, the robot does not simply read every single word as it is written. Instead, the robot uses external factors, such as a change of time, a change of season, or a change in a user's mood, and dynamically alters the story as long as the changed contents are substantially the same as the original contents. As a result, the robot can read aloud the story whose contents would differ every time the story is read.
    Type: Grant
    Filed: October 23, 2001
    Date of Patent: May 15, 2007
    Assignee: Sony Corporation
    Inventors: Hideki Nakakita, Tomoaki Kasuga
  • Patent number: 7184958
    Abstract: A speech synthesis method subjects a reference speech signal to windowing, using a window function of a window length double a pitch period of the reference speech signal, to extract a speech pitch wave from the reference speech signal. A linear prediction coefficient is generated by subjecting the reference speech signal to a linear prediction analysis. The speech pitch wave is subjected to inverse-filtering based on the linear prediction coefficient to produce a residual pitch wave, which is then stored as information of a speech synthesis unit in a voiced period in a storage. Speech is then synthesized using the information of the speech synthesis unit.
    Type: Grant
    Filed: March 5, 2004
    Date of Patent: February 27, 2007
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Takehiko Kagoshima, Masami Akamine
  • Patent number: 7162417
    Abstract: An amplitude altering magnification (r) to be applied to sub-phoneme units of a voiced portion and an amplitude altering magnification (s) to be applied to sub-phoneme units of an unvoiced portion are determined based upon a target phoneme average power (p0) of synthesized speech and the power (p) of a selected phoneme unit. Sub-phoneme units are extracted from a phoneme to be synthesized. From among the extracted sub-phoneme units, a sub-phoneme unit of the voiced portion is multiplied by the amplitude altering magnification (r), and a sub-phoneme unit of the unvoiced portion is multiplied by the amplitude altering magnification (s). Synthesized speech is obtained using the sub-phoneme units thus obtained. This makes it possible to realize power control in which any decline in the quality of synthesized speech is reduced.
    Type: Grant
    Filed: July 13, 2005
    Date of Patent: January 9, 2007
    Assignee: Canon Kabushiki Kaisha
    Inventors: Masayuki Yamada, Yasuhiro Komori, Mitsuru Otsuka
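    The underlying arithmetic: since power scales with the square of amplitude, matching a target average power p0 from a unit of power p suggests a gain of sqrt(p0/p). Treating the unvoiced gain more gently is an assumption about why a separate magnification s exists; the abstract does not give its formula.

```python
import numpy as np

# Per-portion power control: voiced sub-phoneme units get magnification r,
# unvoiced ones a damped magnification s (the damping is an assumption).

def amplitude_magnifications(p0, p, unvoiced_damping=0.5):
    r = np.sqrt(p0 / p)                  # voiced-portion magnification
    s = r ** unvoiced_damping            # gentler unvoiced magnification
    return r, s

def apply_power_control(subunits, voiced_flags, p0, p):
    r, s = amplitude_magnifications(p0, p)
    return [u * (r if voiced else s)
            for u, voiced in zip(subunits, voiced_flags)]
```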
  • Patent number: 7113909
    Abstract: A stereotypical sentence is synthesized into a voice of an arbitrary speech style. A third party is able to prepare prosody data, and a user of a terminal device having a voice synthesizing part can acquire the prosody data. The voice synthesizing method determines a voice-contents identifier to point to a type of voice contents of a stereotypical sentence, prepares a speech style dictionary including speech style and prosody data which correspond to the voice-contents identifier, selects prosody data of the synthesized voice to be generated from the speech style dictionary, and adds the selected prosody data to a voice synthesizer as voice-synthesizer driving data to thereby perform voice synthesis with a specific speech style. Thus, a voice of a stereotypical sentence can be synthesized with an arbitrary speech style.
    Type: Grant
    Filed: July 31, 2001
    Date of Patent: September 26, 2006
    Assignee: Hitachi, Ltd.
    Inventors: Nobuo Nukaga, Kenji Nagamatsu, Yoshinori Kitahara
  • Patent number: 7085718
    Abstract: It is suggested to include application speech (AS) in the set of identification speech data (ISD) for training a speaker-identification process, so as to make possible a reduction of the set of initial identification speech data (IISD) to be collected within an initial enrolment phase, and therefore to add more convenience for the user to be registered or enrolled.
    Type: Grant
    Filed: May 6, 2002
    Date of Patent: August 1, 2006
    Assignee: Sony Deutschland GmbH
    Inventor: Thomas Kemp
  • Patent number: 7082395
    Abstract: A means and method are provided for enhancing or replacing the natural excitation of the human vocal tract by artificial excitation means, wherein the artificially created acoustics present additional spectral, temporal, or phase data useful for (1) enhancing the machine recognition robustness of audible speech or (2) enabling more robust machine recognition of relatively inaudible mouthed or whispered speech. The artificial excitation (a) may be arranged to be audible or inaudible, (b) may be designed to be non-interfering with another user's similar means, (c) may be used in one or both of a vocal content-enhancement mode or a complementary vocal tract-probing mode, and/or (d) may be used for the recognition of audible or inaudible continuous speech or isolated spoken commands.
    Type: Grant
    Filed: October 3, 2002
    Date of Patent: July 25, 2006
    Inventors: Carol A. Tosaya, John W. Sliwa, Jr.
  • Patent number: 6993484
    Abstract: An amplitude altering magnification (r) to be applied to sub-phoneme units of a voiced portion and an amplitude altering magnification (s) to be applied to sub-phoneme units of an unvoiced portion are determined based upon a target phoneme average power (p0) of synthesized speech and the power (p) of a selected phoneme unit. Sub-phoneme units are extracted from a phoneme to be synthesized. From among the extracted sub-phoneme units, a sub-phoneme unit of the voiced portion is multiplied by the amplitude altering magnification (r), and a sub-phoneme unit of the unvoiced portion is multiplied by the amplitude altering magnification (s). Synthesized speech is obtained using the sub-phoneme units thus obtained. This makes it possible to realize power control in which any decline in the quality of synthesized speech is reduced.
    Type: Grant
    Filed: August 30, 1999
    Date of Patent: January 31, 2006
    Assignee: Canon Kabushiki Kaisha
    Inventors: Masayuki Yamada, Yasuhiro Komori, Mitsuru Otsuka
  • Patent number: 6990451
    Abstract: A method of making a digital voice library utilized for converting text to concatenated voice in accordance with a set of playback rules includes generating a complex tone that reflects a particular inflection required for a particular voice recording of a particular speech item. The complex tone is composed of portions of a recording of a voice talent uttering a vocal sequence. The voice talent is recorded reciting the particular speech item to make the particular voice recording. The voice talent uses the complex tone as a guide to allow the voice talent to recite the particular speech item in accordance with the particular inflection.
    Type: Grant
    Filed: June 1, 2001
    Date of Patent: January 24, 2006
    Assignee: Qwest Communications International Inc.
    Inventors: Eliot M. Case, Richard P. Phillips
  • Patent number: 6990450
    Abstract: A method for converting text to concatenated voice by utilizing a digital voice library and a set of playback rules is provided. Multiple voice recordings correspond to a single speech item and represent various inflections of that single speech item. The method includes determining syllable count and impact value for each speech item in a sequence of speech items. A desired inflection for each speech item is determined based on the syllable count and the impact value and further based on a set of playback rules. A sequence of voice recordings is determined by determining a voice recording for each speech item based on the desired inflection and based on the available voice recordings that correspond to the particular speech item. Voice data are generated based on a sequence of voice recordings by concatenating adjacent recordings in the sequence of voice recordings.
    Type: Grant
    Filed: March 27, 2001
    Date of Patent: January 24, 2006
    Assignee: Qwest Communications International Inc.
    Inventors: Eliot M. Case, Judith L. Weirauch, Richard P. Phillips
  • Patent number: 6970820
    Abstract: The speech synthesizer is personalized to sound like or mimic the speech characteristics of an individual speaker. The individual speaker provides a quantity of enrollment data, which can be extracted from a short quantity of speech, and the system modifies the base synthesis parameters to more closely resemble those of the new speaker. More specifically, the synthesis parameters may be decomposed into speaker-dependent parameters, such as context-independent parameters, and speaker-independent parameters, such as context-dependent parameters. The speaker-dependent parameters are adapted using enrollment data from the new speaker. After adaptation, the speaker-dependent parameters are combined with the speaker-independent parameters to provide a set of personalized synthesis parameters.
    Type: Grant
    Filed: February 26, 2001
    Date of Patent: November 29, 2005
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventors: Jean-Claude Junqua, Florent Perronnin, Roland Kuhn, Patrick Nguyen
  • Patent number: 6950799
    Abstract: A speech processing system modifies various aspects of input speech according to a user-selected one of various preprogrammed voice fonts. Initially, the speech converter receives a formants signal representing an input speech signal and a pitch signal representing the input signal's fundamental frequency. One or both of the following may also be received: a voicing signal comprising an indication of whether the input speech signal is voiced, unvoiced, or mixed, and/or a gain signal representing the input speech signal's energy. The speech converter also receives user selection of one of multiple preprogrammed voice fonts, each specifying a manner of modifying one or more of the received signals (i.e., formants, voicing, pitch, gain). The speech converter modifies at least one of the formants, voicing, pitch, and/or gain signals as specified by the selected voice font.
    Type: Grant
    Filed: February 19, 2002
    Date of Patent: September 27, 2005
    Assignee: Qualcomm Inc.
    Inventors: Ning Bi, Andrew P. DeJaco
  • Patent number: 6810379
    Abstract: A client/server text-to-speech synthesis system and method divides the method optimally between client and server. The server stores large databases for pronunciation analysis, prosody generation, and acoustic unit selection corresponding to a normalized text, while the client performs computationally intensive decompression and concatenation of selected acoustic units to generate speech. The units are transmitted from the server to the client in a highly compressed format, with a compression method selected based on the predetermined set of potential acoustic units. This compression method allows for very high-quality and natural-sounding speech to be output at the client machine.
    Type: Grant
    Filed: April 24, 2001
    Date of Patent: October 26, 2004
    Assignee: Sensory, Inc.
    Inventors: Pieter Vermeulen, Todd F. Mozer
  • Patent number: 6801894
    Abstract: A speech synthesizer includes a data memory having a plurality of address areas, which stores a plurality of phases in the address areas, and an address designating circuit designating one of the address areas based on a phase signal. Further, the speech synthesizer includes a speech synthesizing circuit generating a speech synthesizing signal corresponding to the phase that is stored in the designated area, a digital/analog converter transforming the speech synthesizing signal to an analog signal having amplitude, and a counter setting a period of silence. Furthermore, the speech synthesizer includes a silence-input circuit connected between the speech synthesizing circuit and the digital/analog converter, which supplies a predetermined voltage to the digital/analog converter for the period that is set by the counter.
    Type: Grant
    Filed: March 22, 2001
    Date of Patent: October 5, 2004
    Assignee: Oki Electric Industry Co., Ltd.
    Inventors: Yoshihisa Nakamura, Hiroaki Matsubara
  • Publication number: 20030061050
    Abstract: A means and method are provided for enhancing or replacing the natural excitation of the human vocal tract by artificial excitation means, wherein the artificially created acoustics present additional spectral, temporal, or phase data useful for (1) enhancing the machine recognition robustness of audible speech or (2) enabling more robust machine recognition of relatively inaudible mouthed or whispered speech. The artificial excitation (a) may be arranged to be audible or inaudible, (b) may be designed to be non-interfering with another user's similar means, (c) may be used in one or both of a vocal content-enhancement mode or a complementary vocal tract-probing mode, and/or (d) may be used for the recognition of audible or inaudible continuous speech or isolated spoken commands.
    Type: Application
    Filed: November 27, 2002
    Publication date: March 27, 2003
    Inventors: Carol A. Tosaya, John W. Sliwa
  • Patent number: 6477496
    Abstract: A method, system and product are provided for synthesizing sound using encoded audio signals having a plurality of frequency subbands, each subband having a scale factor and sample data associated therewith. The method includes selecting a spectral envelope, and selecting a plurality of frequency subbands, each subband having sample data associated therewith. The method also includes generating a synthetic encoded audio signal having a plurality of frequency subbands, the subbands having the selected spectral envelope and the selected sample data. The system includes control logic for performing the method. The product includes a storage medium having computer readable programmed instructions for performing the method.
    Type: Grant
    Filed: December 20, 1996
    Date of Patent: November 5, 2002
    Inventor: Eliot M. Case
  • Patent number: 6463412
    Abstract: A high performance voice transformation apparatus and method is provided in which voice input is transformed into a symbolic representation of phonemes in the voice input. The symbolic representation is used to retrieve output voice segments of a selected target speaker for use in outputting the voice input in a different voice. In addition, voice input characteristics are extracted from the voice input and are then applied to the output voice segments to thereby provide a more realistic human sounding voice output.
    Type: Grant
    Filed: December 16, 1999
    Date of Patent: October 8, 2002
    Assignee: International Business Machines Corporation
    Inventors: Jason Raymond Baumgartner, Steven Leonard Roberts, Nadeem Malik, Flemming Andersen
  • Patent number: 6453287
    Abstract: A system and method for enhancing the speech quality of the mixed excitation linear predictive (MELP) coder and other low bit-rate speech coders. The system and method employ a plosive analysis/synthesis method, which detects the frame containing a plosive signal, applies a simple model to synthesize the plosive signal, and adds the synthesized plosive to the coded speech. The system and method remains compatible with the existing MELP coder bit stream.
    Type: Grant
    Filed: September 29, 1999
    Date of Patent: September 17, 2002
    Assignee: Georgia-Tech Research Corporation
    Inventors: Takahiro Unno, Thomas P. Barnwell, III, Kwan K. Truong
  • Patent number: 6347298
    Abstract: A computerized method and apparatus for reducing the size of a dictionary used in a text-to-speech synthesis system are provided. In an initial phase, the method and apparatus determine if entries in the dictionary, each containing a grapheme string and a corresponding phoneme string, can be fully matched by using at least one rule set used to synthesize words to phonemic data. If the entry can be fully matched using rule processing alone, the entry is indicated to be deleted from the dictionary. In a second phase, the method and apparatus determine if the entry, considered as a root word entry, is required in the dictionary in order to support phoneme synthesis of other entries containing the root word entry, and if so, the root word entry is indicated to be saved in the dictionary.
    Type: Grant
    Filed: February 26, 2001
    Date of Patent: February 12, 2002
    Assignee: Compaq Computer Corporation
    Inventors: Anthony J. Vitale, Ginger Chun-Che Lin, Thomas Kopec
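    Both phases are straightforward set operations over the dictionary. A sketch assuming a hypothetical letter-to-sound function `rules_to_phonemes` and a root-word decomposition `root_of`:

```python
# Two-phase dictionary pruning: drop entries that the letter-to-sound rules
# already reproduce, but keep root words other entries still depend on.

def prune_dictionary(dictionary, rules_to_phonemes, root_of):
    # Phase 1: entries fully matched by rule processing alone are deletable.
    deletable = {g for g, phonemes in dictionary.items()
                 if rules_to_phonemes(g) == phonemes}

    # Phase 2: rescue any root word required to synthesize another entry.
    for grapheme in dictionary:
        root = root_of(grapheme)
        if root != grapheme and root in deletable:
            deletable.discard(root)

    return {g: p for g, p in dictionary.items() if g not in deletable}
```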
  • Patent number: 6317713
    Abstract: A sound generation device uses sound generating parameters for outputting a fundamental frequency, a command regarding prosody, and a sound source generator. The device further uses an accent command and a descent command for calculating the fundamental frequency, and incorporates a rhythm command, which is representable by a sine wave. The device also uses character string analysis for analyzing a character string and generating a command concerning phoneme and prosody; a calculating element for outputting the fundamental frequency, which depends on prosody, as a sound generation parameter; a sound source generator; and an articulator that depends on a phoneme command.
    Type: Grant
    Filed: January 6, 1999
    Date of Patent: November 13, 2001
    Assignee: Arcadia, Inc.
    Inventor: Seiichi Tenpaku
  • Patent number: 6208968
    Abstract: A computerized method and apparatus for reducing the size of a dictionary used in a text-to-speech synthesis system are provided. In an initial phase, the method and apparatus determine if entries in the dictionary, each containing a grapheme string and a corresponding phoneme string, can be fully matched by using at least one rule set used to synthesize words to phonemic data. If the entry can be fully matched using rule processing alone, the entry is indicated to be deleted from the dictionary. In a second phase, the method and apparatus determine if the entry, considered as a root word entry, is required in the dictionary in order to support phoneme synthesis of other entries containing the root word entry, and if so, the root word entry is indicated to be saved in the dictionary.
    Type: Grant
    Filed: December 16, 1998
    Date of Patent: March 27, 2001
    Assignee: Compaq Computer Corporation
    Inventors: Anthony J. Vitale, Ginger Chun-Che Lin, Thomas Kopec
  • Patent number: 6195632
    Abstract: An iterative formant analysis, based on minimizing the arc-length of various curves and carried out under various filter constraints, estimates formant frequencies with properties desirable for text-to-speech applications. A class of arc-length cost functions may be employed. Some of these have analytic solutions and thus lend themselves well to applications requiring speed and reliability. The arc-length inverse filtering techniques are inherently pitch synchronous and are useful in realizing high-quality pitch tracking and pitch epoch marking.
    Type: Grant
    Filed: November 25, 1998
    Date of Patent: February 27, 2001
    Assignee: Matsushita Electric Industrial Co., Ltd.
    Inventor: Steve Pearson
  • Patent number: 6122616
    Abstract: The present invention improves upon electronic speech synthesis using pre-recorded segments of speech to fill in for other missing segments of speech. The formalized aliasing approach of the present invention overcomes the ad hoc aliasing approach of the prior art which oftentimes generated less than satisfactory speech synthesis sound output. By formalizing the relationship between missing speech sound samples and available speech sound samples, the present invention provides a structured approach to aliasing which results in improved synthetic speech sound quality. Further, the formalized aliasing approach of the present invention can be used to lessen storage requirements for speech sound samples by only storing as many sound samples as memory capacity can support.
    Type: Grant
    Filed: July 3, 1996
    Date of Patent: September 19, 2000
    Assignee: Apple Computer, Inc.
    Inventor: Caroline G. Henton
  • Patent number: 6101469
    Abstract: For use in a synthesizer having a wave source that produces a periodic wave, frequency shifting circuitry for frequency-shifting the periodic wave and waveshaping circuitry for transforming the periodic wave into a waveform containing a formant, the frequency-shifting causing displacement of the formant, a circuit for, and method of, compensating for the displacement and a synthesizer employing the circuit or the method. In one embodiment, the circuit includes bias circuitry, coupled to the wave source and the frequency shifting circuitry, that introduces a bias into the periodic wave based on a degree to which the frequency shifting circuitry frequency shifts the periodic wave, the bias reducing a degree to which the formant is correspondingly frequency-shifted.
    Type: Grant
    Filed: March 2, 1998
    Date of Patent: August 8, 2000
    Assignee: Lucent Technologies Inc.
    Inventor: Steven D. Curtin
  • Patent number: 6044345
    Abstract: Human speech is coded by singling out, from a transfer function of the speech, all poles that are unrelated to any particular resonance of a human vocal tract model. All other poles are maintained. A glottal pulse related sequence is defined representing the singled-out poles through an explicitation of the derivative of the glottal air flow. Speech is outputted by a filter based on combining the glottal pulse related sequence and a representation of a formant filter with a complex transfer function expressing all other poles. The glottal pulse sequence is modelled through further explicitly expressible generation parameters. In particular, a non-zero decaying return phase is supplemented to the glottal-pulse response, which is explicitized in all its parameters, while amending the overall response in accordance with volumetric continuity.
    Type: Grant
    Filed: April 17, 1998
    Date of Patent: March 28, 2000
    Assignee: U.S. Philips Corporation
    Inventor: Raymond N. J. Veldhuis
  • Patent number: 6012028
    Abstract: The text-to-speech conversion system distinguishes geographical names based upon the present position and includes a text input unit for inputting text data, a position coordinate input unit for inputting present location information of the text-to-speech conversion system, and a text normalizer connected to the text input unit and the position coordinate input unit and capable of generating a plurality of pronunciation signals indicative of a plurality of pronunciations for a common portion of the text data, the text normalizer selecting one of the pronunciation signals based upon the present location information.
    Type: Grant
    Filed: January 28, 1998
    Date of Patent: January 4, 2000
    Assignee: Ricoh Company, Ltd.
    Inventors: Syuji Kubota, Yuichi Kojima
  • Patent number: 6006187
    Abstract: The present invention discloses a computer prosody user interface operable to visually tailor the prosody of a text to be uttered by a text-to-speech system. The prosody user interface permits users to alter a synthesized voice along one or more dimensions on a word-by-word basis. In one embodiment of the present invention, the prosody user interface is operable to alter the speaking rate, relative word duration, and word prominence of a synthesized voice. Specifically, one or more words are selected using presentation means, and speech parameters corresponding to the speaking rate, relative word duration, and word prominence are manipulated using speech parameter manipulation means. Modifications to the speech parameters are accompanied by visual changes to the presentation means, thereby providing a visual feel to the computer prosody user interface.
    Type: Grant
    Filed: October 1, 1996
    Date of Patent: December 21, 1999
    Assignee: Lucent Technologies Inc.
    Inventor: Michael Abraham Tanenblatt
  • Patent number: 5995932
    Abstract: A training system used while a person is speaking uses a feedback modification technique to reduce accents. As the speaker is speaking, the system feeds back to the speaker the speaker's speech in "real-time" so that the speaker, in effect, hears what he or she is saying while saying it. The system includes a detector configured to monitor a speaker's speech to detect a preselected target vowel sound that the speaker wishes to produce accurately. In response to the detector detecting a "target" vowel sound, a cue generator generates a sensory cue (e.g., an amplification of the "target" vowel sound) that is perceived by the speaker. As the speaker is speaking, the generator feeds back to the speaker the sensory cue along with the speech so that the cue is coincident with the "target" vowel sound.
    Type: Grant
    Filed: December 31, 1997
    Date of Patent: November 30, 1999
    Assignee: Scientific Learning Corporation
    Inventor: John F. Houde
  • Patent number: 5983178
    Abstract: A speaker clustering apparatus generates HMMs for clusters based on feature quantities of vocal-tract configurations of speech waveform data, and a speech recognition apparatus is provided with the speaker clustering apparatus. In response to the speech waveform data of N speakers, an estimator estimates feature quantities of vocal-tract configurations with reference to correspondence between vocal-tract configuration parameters and formant frequencies predetermined based on a predetermined vocal tract model of a standard speaker. Further, a clustering processor calculates speaker-to-speaker distances between the N speakers based on the estimated feature quantities of the vocal-tract configurations of the N speakers, and clusters the vocal-tract configurations of the N speakers using a clustering algorithm based on the calculated speaker-to-speaker distances, thereby generating K clusters.
    Type: Grant
    Filed: December 10, 1998
    Date of Patent: November 9, 1999
    Assignee: ATR Interpreting Telecommunications Research Laboratories
    Inventors: Masaki Naito, Li Deng, Yoshinori Sagisaka
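    With the vocal-tract feature quantities in hand, the clustering step is conventional. The sketch below substitutes off-the-shelf agglomerative clustering with Euclidean distances and Ward linkage, which are assumptions; the patent defines its own speaker-to-speaker distance measure.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

# Cluster N speakers into K clusters from vocal-tract feature vectors.

def cluster_speakers(vocal_tract_features, k):
    """vocal_tract_features: (N speakers, D vocal-tract parameters) array."""
    distances = pdist(vocal_tract_features)   # speaker-to-speaker distances
    tree = linkage(distances, method="ward")
    return fcluster(tree, t=k, criterion="maxclust")  # K cluster labels

labels = cluster_speakers(np.random.rand(50, 8), k=4)
```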
  • Patent number: 5905970
    Abstract: In a speech coding device for coding an input speech with an AbS (Analysis by Synthesis) system and one of a forward type and a backward type configuration, a vocal tract prediction coefficient generating circuit produces a vocal tract prediction coefficient from one of an input speech signal and a locally reproduced synthetic speech signal. A speech synthesizing circuit produces a synthetic speech signal by using codes stored in an excitation codebook in one-to-one correspondence with indexes, and the vocal tract prediction coefficient. A comparing circuit compares the synthetic speech signal and input speech signal to thereby output an error signal. A perceptual weighting circuit weights the error signal to thereby output a perceptually weighted signal. A codebook index selecting circuit selects an optimal index for the excitation codebook out of at least the weighted signal, and feeds the optimal index to the excitation codebook.
    Type: Grant
    Filed: December 11, 1996
    Date of Patent: May 18, 1999
    Assignee: Oki Electric Industry Co., Ltd.
    Inventor: Hiromi Aoyagi
  • Patent number: 5890118
    Abstract: A speech synthesis apparatus includes: a memory for storing a plurality of typical waveforms corresponding to a plurality of frames, the typical waveforms each previously obtained by extracting in units of at least one frame from a prediction error signal formed in predetermined units; a voiced speech source generator including an interpolation circuit for performing interpolation between the typical waveforms read out from the memory to obtain a plurality of interpolation signals each having at least one of an interpolation pitch period and a signal level which changes smoothly between the corresponding frames; a superposition circuit for superposing the interpolation signals obtained by the interpolation circuit to form a voiced speech source signal; an unvoiced speech source generator for generating an unvoiced speech source signal; and a vocal tract filter selectively driven by the voiced speech source signal outputted from the voiced speech source generator and the unvoiced speech source signal from the unvoiced speech source generator.
    Type: Grant
    Filed: March 8, 1996
    Date of Patent: March 30, 1999
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Takehiko Kagoshima, Masami Akamine
  • Patent number: 5876213
    Abstract: A karaoke apparatus is constructed to perform a karaoke accompaniment part and a karaoke harmony part for accompanying a live vocal part. A pickup device collects a singing voice of the live vocal part. A detector device analyzes the collected singing voice to detect a musical register thereof at which the live vocal part is actually performed. A harmony generator device generates a harmony voice of the karaoke harmony part according to the detected musical register so that the karaoke harmony part is made consonant with the live vocal part. A tone generator device generates an instrumental tone of the karaoke accompaniment part in parallel to the karaoke harmony part.
    Type: Grant
    Filed: July 30, 1996
    Date of Patent: March 2, 1999
    Assignee: Yamaha Corporation
    Inventor: Shuichi Matsumoto
  • Patent number: 5826221
    Abstract: In vocal tract prediction coefficient coding and decoding circuitry, a vocal tract prediction coefficient converter/quantizer transforms vocal tract prediction coefficients of consecutive subframes constituting a single frame to corresponding LSP (Line Spectrum Pair) coefficients, quantizes the LSP coefficients, and thereby outputs quantized LSP coefficient values together with indexes assigned thereto. A coding mode decision assumes, e.g., three different coding modes based on the above quantized LSP coefficient values, the quantized LSP coefficient value of the fourth subframe of the previous frame, and the above indexes. The decision determines which coding mode should be used to code the current frame, and outputs mode code information and quantization code information. The circuitry is capable of reproducing high-quality, faithful speech without resorting to a high mean coding rate even when the vocal tract prediction coefficient varies noticeably within the frame.
    Type: Grant
    Filed: October 29, 1996
    Date of Patent: October 20, 1998
    Assignee: Oki Electric Industry Co., Ltd.
    Inventor: Hiromi Aoyagi