Vocal Tract Model Patents (Class 704/261)
-
Patent number: 7613611. Abstract: Provided is a method and an apparatus for vocal-cord signal recognition. A signal processing unit receives and digitizes a vocal cord signal, and a noise removing unit removes channel noise included in the vocal cord signal. A feature extracting unit extracts a feature vector from the vocal cord signal, which has the channel noise removed therefrom, and a recognizing unit calculates a similarity between the vocal cord signal and the learned model parameter. Consequently, the apparatus is robust in a noisy environment. Type: Grant. Filed: May 26, 2005. Date of Patent: November 3, 2009. Assignee: Electronics and Telecommunications Research Institute. Inventors: Kwan Hyun Cho, Mun Sung Han, Young Giu Jung, Hee Sook Shin, Jun Seok Park, Dong Won Han.
-
Publication number: 20090222269. Abstract: An apparatus for voice synthesis includes: a word database for storing words and voices; a syllable database for storing syllables and voices; a processor for executing a process including: extracting a word from a document, generating a voice signal based on the stored voice when the extracted word is included in the word database, and synthesizing a voice signal based on the stored voices associated with the one or more syllables corresponding to the extracted word when the extracted word is not found in the word database; a speaker for producing a voice based on either the generated or the synthesized voice signal; and a display for selectively displaying the extracted word when the voice based on the synthesized voice signal is produced by the speaker. Type: Application. Filed: May 11, 2009. Publication date: September 3, 2009. Inventor: Shinichiro Mori.
-
Publication number: 20090222268. Abstract: A speech synthesis system synthesizes a speech signal corresponding to an input speech signal based on a spectral envelope of the input speech signal. A glottal pulse generator generates a time series of glottal pulses, which are processed into a glottal pulse magnitude spectrum. A shaping circuit shapes the glottal pulse magnitude spectrum based on the spectral envelope and generates a shaped glottal pulse magnitude spectrum. A harmonic null adjustment circuit reduces harmonic nulls in the shaped glottal pulse magnitude spectrum and generates a null-adjusted synthesized speech spectrum. An inverse transform circuit generates a null-adjusted time-series speech signal. An overlap and add circuit synthesizes the speech signal based on the null-adjusted time-series speech signal. Type: Application. Filed: March 3, 2008. Publication date: September 3, 2009. Applicant: QNX Software Systems (Wavemakers), Inc. Inventors: Xueman Li, Phillip A. Hetherington, Shahla Parveen, Tommy Tsz Chun Chiu.
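The shaping and null-adjustment stages in the abstract above can be sketched in a few lines. This is a hypothetical illustration, not the patent's implementation: the function names, the element-wise multiply, and the 20 dB floor relative to the envelope are all assumptions.

```python
# Hypothetical sketch: a glottal-pulse magnitude spectrum is shaped by a
# spectral envelope, then bins that fall into deep harmonic nulls are
# lifted toward a floor derived from the envelope. The -20 dB floor is an
# illustrative assumption.

def shape_glottal_spectrum(pulse_mag, envelope, null_floor_db=-20.0):
    """Multiply a glottal-pulse magnitude spectrum by a spectral envelope,
    then raise harmonic nulls to a floor relative to the envelope."""
    floor_gain = 10.0 ** (null_floor_db / 20.0)
    shaped = [p * e for p, e in zip(pulse_mag, envelope)]
    # Null adjustment: no shaped bin may fall below floor_gain * envelope.
    return [max(s, floor_gain * e) for s, e in zip(shaped, envelope)]

pulse = [1.0, 0.0, 0.8, 0.001, 0.6]   # deep nulls at bins 1 and 3
env = [1.0, 1.0, 0.5, 0.5, 0.25]      # spectral envelope of the input
shaped = shape_glottal_spectrum(pulse, env)
```

In a full system the null-adjusted magnitude spectrum would then go through the inverse transform and overlap-add stages the abstract lists.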
-
Patent number: 7567903. Abstract: A method and apparatus for performing speech recognition are provided. A Vocal Tract Length Normalized acoustic model for a speaker is generated from training data. Speech recognition is performed on a first recognition input to determine a first best hypothesis. A first Vocal Tract Length Normalization factor is estimated based on the first best hypothesis. Speech recognition is performed on a second recognition input using the Vocal Tract Length Normalized acoustic model to determine another best hypothesis. Another Vocal Tract Length Normalization factor is estimated based on that best hypothesis and at least one previous best hypothesis. Type: Grant. Filed: January 12, 2005. Date of Patent: July 28, 2009. Assignee: AT&T Intellectual Property II, L.P. Inventors: Vincent Goffin, Andrej Ljolje, Murat Saraclar.
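Vocal tract length normalization is commonly realized as a frequency warp f → αf, with the warp factor picked from a small grid to maximize the likelihood of the current best hypothesis. The sketch below illustrates that general idea only; the grid values, the clipping at Nyquist, and the toy scoring function are assumptions, not the patent's estimation procedure.

```python
# Illustrative VTLN sketch: a linear frequency warp and a grid search
# for the warp factor that maximizes a scoring function (standing in
# for the acoustic likelihood of the best hypothesis).

def warp_frequency(f_hz, alpha, nyquist_hz=8000.0):
    """Warp one frequency by factor alpha, clipping at the Nyquist rate."""
    return min(alpha * f_hz, nyquist_hz)

def estimate_warp_factor(score_fn, grid=(0.88, 0.94, 1.0, 1.06, 1.12)):
    """Pick the warp factor on a small grid that maximizes score_fn."""
    return max(grid, key=score_fn)

# Toy score preferring factors near 1.06 (a hypothetical speaker).
best = estimate_warp_factor(lambda a: -abs(a - 1.06))
```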
-
Patent number: 7529672. Abstract: A method of synthesizing a speech signal by providing a first speech unit signal having an end interval and a second speech unit signal having a front interval, wherein at least some of the periods of the end interval are appended in inverted order at the end of the first speech unit signal in order to provide a fade-out interval, and at least some of the periods of the front interval are appended in inverted order at the beginning of the second speech unit signal to provide a fade-in interval. An overlap and add operation is performed on the end and fade-in intervals and the fade-out and front intervals. Type: Grant. Filed: August 8, 2003. Date of Patent: May 5, 2009. Assignee: Koninklijke Philips Electronics N.V. Inventor: Ercan Ferit Gigi.
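The mirrored-period crossfade above can be sketched concretely, assuming fixed-length pitch periods: the end of unit A is appended to A in reverse as a fade-out tail, the front of unit B is prepended to B in reverse as a fade-in head, and the two extended regions are joined by a linear overlap-and-add. The function name, the single-sample "period", and the linear ramp are illustrative assumptions.

```python
# Minimal sketch of the mirrored-period crossfade between two speech
# units, joined with a linear overlap-and-add. Treats each list element
# as one pitch period's worth of signal for simplicity.

def crossfade_units(unit_a, unit_b, overlap):
    tail = list(reversed(unit_a[-overlap:]))   # fade-out: mirrored end of A
    head = list(reversed(unit_b[:overlap]))    # fade-in: mirrored front of B
    a_ext = unit_a + tail
    b_ext = head + unit_b
    out = []
    n = len(a_ext)
    join = 2 * overlap          # overlap region spans both mirrored parts
    out.extend(a_ext[:n - join])
    for i in range(join):
        w = (i + 1) / (join + 1)               # linear ramp 0 -> 1
        out.append((1.0 - w) * a_ext[n - join + i] + w * b_ext[i])
    out.extend(b_ext[join:])
    return out

y = crossfade_units([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0], overlap=1)
```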
-
Publication number: 20090112596. Abstract: A system and method are disclosed for synthesizing speech based on a selected speech act. A method includes modifying synthesized speech of a spoken dialogue system by (1) receiving a user utterance, (2) analyzing the user utterance to determine an appropriate speech act, and (3) generating a response of a type associated with the appropriate speech act, wherein linguistic variables in the response are selected based on the appropriate speech act. Type: Application. Filed: October 30, 2007. Publication date: April 30, 2009. Applicant: AT&T Labs, Inc. Inventors: Ann K. Syrdal, Mark Beutnagel, Alistair D. Conkie, Yeon-Jun Kim.
-
Publication number: 20090094031. Abstract: An apparatus for providing text independent voice conversion may include a first voice conversion model and a second voice conversion model. The first voice conversion model may be trained with respect to conversion of training source speech to synthetic speech corresponding to the training source speech. The second voice conversion model may be trained with respect to conversion to training target speech from synthetic speech corresponding to the training target speech. An output of the first voice conversion model may be communicated to the second voice conversion model to process source speech input into the first voice conversion model into target speech corresponding to the source speech as the output of the second voice conversion model. Type: Application. Filed: October 4, 2007. Publication date: April 9, 2009. Inventors: Jilei Tian, Victor Popa, Jani K. Nurminen.
-
Publication number: 20090063156. Abstract: A voice synthesis method, said method comprising a step of choosing a synthetic voice from among a set of voices having predetermined spectral signatures and a step of recording the natural voice of a first person, the method comprising a step of transforming the natural recorded voice so as to conform with the spectral signature of the chosen synthetic voice, the natural voice thereby transformed being recorded, said method comprising a step of determining at least one situation parameter for a first character from among a set of predefined parameters, each predefined parameter being associated with a spectral alteration of the emitted voice, the determined situation parameter particularly characterizing the environment or the physical or psychological state of the character, the method comprising a step of spectrally altering the transformed natural voice so as to conform with the spectral alteration associated with the character's situation parameter. Type: Application. Filed: August 26, 2008. Publication date: March 5, 2009. Applicant: Alcatel Lucent. Inventors: Sylvain Squedin, Serge Papillon.
-
Publication number: 20090063155. Abstract: The present invention provides a robot apparatus with a vocal interactive function. The robot apparatus receives a vocal input, and recognizes the vocal input. The robot apparatus stores a plurality of output data, an output count of each of the output data, and a weighted value of each of the output data. The robot apparatus outputs output data according to the weighted values of all the output data corresponding to the vocal input, and adds one to the output count of the output data. The robot apparatus calculates the weighted values of all the output data corresponding to the vocal input according to the output count. Consequently, the robot apparatus may output different and variable output data when receiving the same vocal input. The present invention also provides a vocal interactive method adapted for the robot apparatus. Type: Application. Filed: August 13, 2008. Publication date: March 5, 2009. Applicant: Hon Hai Precision Industry Co., Ltd. Inventors: Tsu-Li Chiang, Chuan-Hong Wang, Kuo-Pao Hung, Kuan-Hong Hsieh.
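The count-driven weighting above is easy to sketch: each candidate response keeps an output count, responses are drawn with weights recomputed from those counts, and a drawn response's count is incremented so repeats become less likely. The class name and the specific weight formula 1/(1 + count) are assumptions for illustration; the abstract only says the weights are recalculated from the counts.

```python
# Hypothetical sketch of variable response selection: weights are
# inversely related to how often each response has already been output.

import random

class ResponseSelector:
    def __init__(self, responses):
        self.counts = {r: 0 for r in responses}

    def weight(self, response):
        # Assumed formula: weight decays as the output count grows.
        return 1.0 / (1 + self.counts[response])

    def select(self, rng=random):
        responses = list(self.counts)
        weights = [self.weight(r) for r in responses]
        choice = rng.choices(responses, weights=weights, k=1)[0]
        self.counts[choice] += 1               # add one to the output count
        return choice

sel = ResponseSelector(["Hello!", "Hi there.", "Good day."])
first = sel.select(random.Random(0))
```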
-
Patent number: 7457752. Abstract: Method and apparatus for controlling the operation of an emotion synthesizing device, notably of the type where the emotion is conveyed by a sound, having at least one input parameter whose value is used to set a type of emotion to be conveyed, by making at least one parameter variable over a determined control range, thereby conferring variability in the amount of the type of emotion to be conveyed. The variable parameter can be made variable according to a variation model over the control range, the model relating a quantity-of-emotion control variable to the variable parameter, whereby said control variable is used to variably establish a value of said variable parameter. Preferably the variation obeys a linear model, the variable parameter being made to vary linearly with a variation in the quantity-of-emotion control variable. Type: Grant. Filed: August 12, 2002. Date of Patent: November 25, 2008. Assignee: Sony France S.A. Inventor: Pierre-Yves Oudeyer.
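The linear variation model described above amounts to mapping a single "quantity of emotion" control variable onto each synthesis parameter's control range. The sketch below is an illustration under stated assumptions: the parameter names, ranges, and the [0, 1] control interval are invented, not taken from the patent.

```python
# Minimal sketch of a linear emotion-control model: one control
# variable in [0, 1] is mapped linearly into each parameter's range.

def apply_emotion(control, ranges):
    """Map control in [0, 1] linearly into each (lo, hi) parameter range."""
    c = min(max(control, 0.0), 1.0)            # clamp to the control range
    return {name: lo + c * (hi - lo) for name, (lo, hi) in ranges.items()}

# Hypothetical ranges for a "happy" voice: pitch in Hz, rate as a factor.
happy = {"pitch_hz": (180.0, 260.0), "speech_rate": (1.0, 1.3)}
params = apply_emotion(0.5, happy)
```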
-
Publication number: 20080288258. Abstract: The present invention provides a speech analysis method comprising steps of obtaining a speech signal and a corresponding DEGG/EGG signal; regarding the speech signal as the output of a vocal tract filter in a source-filter model taking the DEGG/EGG signal as the input; and estimating the features of the vocal tract filter from the speech signal as the output and the DEGG/EGG signal as the input, wherein the features of the vocal tract filter are expressed by the state vectors of the vocal tract filter at selected time points, and the step of estimating is performed using Kalman filtering. Type: Application. Filed: April 3, 2008. Publication date: November 20, 2008. Applicant: International Business Machines Corporation. Inventors: Dan Ning Jiang, Fan Ping Meng, Yong Qin, Zhi Wei Shuang.
-
Patent number: 7398213. Abstract: The present invention relates to a method and system for diagnosing pathological phenomena using a voice signal. In one embodiment, the existence of at least one pathological phenomenon is determined based at least in part upon a calculated average intensity function associated with speech from the patient. In another embodiment, the existence of at least one pathological phenomenon is determined based at least in part upon a calculated maximum intensity function associated with speech from the patient. Type: Grant. Filed: May 16, 2006. Date of Patent: July 8, 2008. Assignee: Exaudios Technologies. Inventors: Yoram Levanon, Lan Lossos-Shifrin.
-
Publication number: 20080154601. Abstract: A method and system for providing efficient menu services for an information processing system that uses a telephone or other form of audio user interface. In one embodiment, the menu services provide effective support for novice users by providing a full listing of available keywords and rotating house advertisements which inform novice users of potential features and information. For experienced users, cues are rendered so that at any time the user can say a desired keyword to invoke the corresponding application. The menu is flat to facilitate its usage. Full keyword listings are rendered after the user is given a brief cue to say a keyword. Service messages rotate words and word prosody. When listening to receive information from the user, after the user has been cued, soft background music or other audible signals are rendered to inform the user that a response may now be spoken to the service. Type: Application. Filed: November 20, 2007. Publication date: June 26, 2008. Applicant: Microsoft Corporation. Inventors: Lisa Joy Stifelman, Hadi Partovi, Haleh Partovi, David Bryan Alpert, Matthew Talin Marx, Scott James Bailey, Kyle D. Sims, Darby McDonough Bailey, Roderick Steven Brathwaite, Eugene Koh, Angus Macdonald Davis.
-
Patent number: 7365749. Abstract: An animation wireframe is modified with three-dimensional (3D) range and color data having a corresponding shape surface. The animation wireframe is vertically scaled based on distances between consecutive features within the 3D range and color data and corresponding distances within the generic animation wireframe. For each animation wireframe point, the location of the animation wireframe point is adjusted to coincide with a point on the shape surface. The shape surface point lies along a scaling line connecting the animation wireframe point, the shape surface point and an origin point. The scaling line lies within a horizontal plane. Type: Grant. Filed: August 15, 2006. Date of Patent: April 29, 2008. Assignee: AT&T Corp. Inventor: Joern Ostermann.
-
Publication number: 20080077407. Abstract: A system, method and computer-readable media are disclosed for improving speech synthesis. A text-to-speech (TTS) voice database for use in a TTS system is generated by a method comprising labeling a voice database phonemically and applying a pre-/post-vocalic distinction to the phonemic labels to generate a TTS voice database. When a system synthesizes speech using speech units from the TTS voice database, the database provides phonemes for selection using the pre-/post-vocalic distinctions, which improve unit selection to render the synthetic speech more natural. Type: Application. Filed: September 26, 2006. Publication date: March 27, 2008. Applicant: AT&T Corp. Inventors: Mark Beutnagel, Alistair Conkie, Yeon-Jun Kim, Ann K. Syrdal.
-
Patent number: 7330813. Abstract: A speech processing apparatus able to enhance formants more naturally, wherein a speech analyzing unit analyzes an input speech signal to find LPCs and converts the LPCs to LSPs; a speech decoding unit calculates a distance between adjacent orders of the LSPs by an LSP analytical processing unit and calculates larger LSP adjusting amounts for LSPs of adjacent orders closer in distance by an LSP adjusting amount calculating unit; an LSP adjusting unit adjusts the LSPs based on the LSP adjusting amounts such that the LSPs of adjacent orders closer in distance become closer; an LSP-LPC converting unit converts the adjusted LSPs to LPCs; and an LPC combining unit uses the LPCs and sound source parameters to obtain formant-enhanced speech. Type: Grant. Filed: August 5, 2003. Date of Patent: February 12, 2008. Assignee: Fujitsu Limited. Inventor: Mutsumi Saito.
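The core idea above exploits the fact that closely spaced line spectral pairs (LSPs) correspond to strong formants: pairs that are already close are pulled slightly closer, sharpening the formant. The sketch below illustrates only that adjustment step on a given LSP array; the gain constant, the inverse-distance adjusting amount, and the cap that preserves ordering are all assumptions, not the patent's formulas.

```python
# Illustrative LSP adjustment: compute the distance between adjacent
# LSP frequencies (radians, ascending), assign larger adjusting amounts
# to closer pairs, and move each pair toward each other. The amount is
# capped at half the gap so the ascending order is preserved.

def adjust_lsps(lsps, gain=0.001):
    adjusted = list(lsps)
    for i in range(len(lsps) - 1):
        dist = lsps[i + 1] - lsps[i]
        amount = min(gain / dist, 0.5 * dist)  # larger amount when closer
        adjusted[i] += amount / 2              # pull the pair together
        adjusted[i + 1] -= amount / 2
    return adjusted

lsps = [0.3, 0.35, 1.0, 1.9, 2.0]              # close pairs at both ends
out = adjust_lsps(lsps)
```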
-
Patent number: 7233900. Abstract: The present invention relates to a word sequence output device that can output emotional synthetic speech. A text generating unit 31 generates spoken text for synthetic speech by using text as a word sequence included in action command information in accordance with the action command information. An emotion checking unit 39 checks an emotion model value and determines whether or not the emotion of a robot is aroused based on the emotion model value. Further, when the emotion of the robot is aroused, the emotion checking unit 39 instructs the text generating unit 31 to change the word order. The text generating unit 31 changes the word order of the spoken text in accordance with the instructions from the emotion checking unit 39. Accordingly, when the spoken text is "Kimi wa kirei da." (You are beautiful.), the word order is changed to make the sentence "Kirei da, kimi wa." (You are beautiful, you are.). Type: Grant. Filed: April 5, 2002. Date of Patent: June 19, 2007. Assignee: Sony Corporation. Inventor: Shinichi Kariya.
-
Patent number: 7225129. Abstract: A method of modeling speech distinctions within computer-animated talking heads that utilize the manipulation of speech production articulators for selected speech segments. Graphical representations of voice characteristics and speech production characteristics are generated in response to said speech segment. By way of example, breath images are generated such as particle-cloud images, and particle-stream images to represent the voiced characteristics such as the presence of stops and fricatives, respectively. The coloring on exterior portions of the talking head is displayed in response to selected voice characteristics such as nasality. The external physiology of the talking head is modulated, such as by changing the width and movement of the nose, the position of the eyebrows, and movement of the throat in response to the voiced speech characteristics such as pitch, nasality, and voicebox vibration, respectively. Type: Grant. Filed: September 20, 2001. Date of Patent: May 29, 2007. Assignee: The Regents of the University of California. Inventors: Dominic W. Massaro, Michael M. Cohen, Jonas Beskow.
-
Patent number: 7219064. Abstract: To provide a robot which autonomously forms and performs an action plan in response to external factors without direct command input from an operator. When reading a story printed in a book or other print media or recorded in recording media, or when reading a story downloaded through a network, the robot does not simply read every single word as it is written. Instead, the robot uses external factors, such as a change of time, a change of season, or a change in a user's mood, and dynamically alters the story as long as the changed contents are substantially the same as the original contents. As a result, the robot can read aloud the story whose contents would differ every time the story is read. Type: Grant. Filed: October 23, 2001. Date of Patent: May 15, 2007. Assignee: Sony Corporation. Inventors: Hideki Nakakita, Tomoaki Kasuga.
-
Patent number: 7184958. Abstract: A speech synthesis method subjects a reference speech signal to windowing, using a window function whose length is double a pitch period of the reference speech signal, to extract a speech pitch wave from the reference speech signal. A linear prediction coefficient is generated by subjecting the reference speech signal to a linear prediction analysis. The speech pitch wave is subjected to inverse filtering based on the linear prediction coefficient to produce a residual pitch wave, which is then stored in a storage as information of a speech synthesis unit in a voiced period. Speech is then synthesized using the information of the speech synthesis unit. Type: Grant. Filed: March 5, 2004. Date of Patent: February 27, 2007. Assignee: Kabushiki Kaisha Toshiba. Inventors: Takehiko Kagoshima, Masami Akamine.
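The analysis side of that pipeline can be sketched end to end with standard signal-processing building blocks. Everything concrete here is an assumption for illustration: the synthetic sine-wave "reference signal", the 40-sample pitch period, the Hanning window, and the order-2 autocorrelation/Levinson-Durbin LPC fit stand in for whatever the patent's implementation uses.

```python
# Sketch of the pipeline: cut a pitch wave from a reference signal with
# a window twice the pitch period long, fit LPC coefficients to the
# reference, then inverse-filter the pitch wave to get a residual.

import math

def hanning(n):
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def lpc(signal, order):
    """LPC coefficients a[1..order] via autocorrelation + Levinson-Durbin."""
    n = len(signal)
    r = [sum(signal[i] * signal[i + k] for i in range(n - k))
         for k in range(order + 1)]
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a, err = new_a, (1 - k * k) * err
    return a   # prediction: s[n] ~ sum_j a[j] * s[n-j]

def inverse_filter(x, a):
    """Residual e[n] = x[n] - sum_j a[j] * x[n-j]."""
    return [x[n] - sum(a[j] * x[n - j] for j in range(1, len(a)) if n - j >= 0)
            for n in range(len(x))]

period = 40                                    # assumed pitch period (samples)
ref = [math.sin(2 * math.pi * i / period) for i in range(400)]
win = hanning(2 * period)                      # window = double pitch period
pitch_wave = [w * s for w, s in zip(win, ref[100:100 + 2 * period])]
residual = inverse_filter(pitch_wave, lpc(ref, order=2))
```

A sinusoid is almost perfectly predicted by an order-2 LPC model, so the residual carries far less energy than the pitch wave, which is the point of storing the residual as the synthesis unit.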
-
Patent number: 7162417. Abstract: An amplitude altering magnification (r) to be applied to sub-phoneme units of a voiced portion and an amplitude altering magnification (s) to be applied to sub-phoneme units of an unvoiced portion are determined based upon a target phoneme average power (p0) of synthesized speech and the power (p) of a selected phoneme unit. Sub-phoneme units are extracted from a phoneme to be synthesized. From among the extracted sub-phoneme units, a sub-phoneme unit of the voiced portion is multiplied by the amplitude altering magnification (r), and a sub-phoneme unit of the unvoiced portion is multiplied by the amplitude altering magnification (s). Synthesized speech is obtained using the sub-phoneme units thus obtained. This makes it possible to realize power control in which any decline in the quality of synthesized speech is reduced. Type: Grant. Filed: July 13, 2005. Date of Patent: January 9, 2007. Assignee: Canon Kabushiki Kaisha. Inventors: Masayuki Yamada, Yasuhiro Komori, Mitsuru Otsuka.
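Since power scales with the square of amplitude, hitting a target average power p0 from a unit power p suggests an amplitude magnification of sqrt(p0/p). The sketch below uses that common relationship, with a separate, damped magnification for unvoiced units; the sqrt formula and the 0.5 damping factor are assumptions for illustration, not the patent's actual determination of (r) and (s).

```python
# Hypothetical power-control sketch: full amplitude correction for
# voiced sub-phoneme units, partial correction for unvoiced ones to
# limit audible quality loss.

import math

def amplitude_magnifications(target_power, unit_power, unvoiced_damping=0.5):
    """Return (r, s): voiced and unvoiced amplitude altering magnifications."""
    r = math.sqrt(target_power / unit_power)   # full correction for voiced
    s = 1.0 + unvoiced_damping * (r - 1.0)     # damped correction, unvoiced
    return r, s

r, s = amplitude_magnifications(target_power=4.0, unit_power=1.0)
voiced_out = [r * x for x in [0.1, -0.2]]      # voiced sub-phoneme samples
unvoiced_out = [s * x for x in [0.05, 0.03]]   # unvoiced sub-phoneme samples
```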
-
Patent number: 7113909. Abstract: A stereotypical sentence is synthesized into a voice of an arbitrary speech style. A third party is able to prepare prosody data, and a user of a terminal device having a voice synthesizing part can acquire the prosody data. The voice synthesizing method determines a voice-contents identifier to point to a type of voice contents of a stereotypical sentence, prepares a speech style dictionary including speech style and prosody data which correspond to the voice-contents identifier, selects prosody data of the synthesized voice to be generated from the speech style dictionary, and adds the selected prosody data to a voice synthesizer as voice-synthesizer driving data to thereby perform voice synthesis with a specific speech style. Thus, a voice of a stereotypical sentence can be synthesized with an arbitrary speech style. Type: Grant. Filed: July 31, 2001. Date of Patent: September 26, 2006. Assignee: Hitachi, Ltd. Inventors: Nobuo Nukaga, Kenji Nagamatsu, Yoshinori Kitahara.
-
Patent number: 7085718. Abstract: It is suggested to include application speech (AS) in the set of identification speech data (ISD) used for training a speaker-identification process, so as to make possible a reduction of the set of initial identification speech data (IISD) to be collected within an initial enrolment phase, and therefore to add more convenience for the user to be registered or enrolled. Type: Grant. Filed: May 6, 2002. Date of Patent: August 1, 2006. Assignee: Sony Deutschland GmbH. Inventor: Thomas Kemp.
-
Patent number: 7082395. Abstract: A means and method are provided for enhancing or replacing the natural excitation of the human vocal tract by artificial excitation means, wherein the artificially created acoustics present additional spectral, temporal, or phase data useful for (1) enhancing the machine recognition robustness of audible speech or (2) enabling more robust machine recognition of relatively inaudible mouthed or whispered speech. The artificial excitation (a) may be arranged to be audible or inaudible, (b) may be designed to be non-interfering with another user's similar means, (c) may be used in one or both of a vocal content-enhancement mode or a complementary vocal tract-probing mode, and/or (d) may be used for the recognition of audible or inaudible continuous speech or isolated spoken commands. Type: Grant. Filed: October 3, 2002. Date of Patent: July 25, 2006. Inventors: Carol A. Tosaya, John W. Sliwa, Jr.
-
Patent number: 6993484. Abstract: An amplitude altering magnification (r) to be applied to sub-phoneme units of a voiced portion and an amplitude altering magnification (s) to be applied to sub-phoneme units of an unvoiced portion are determined based upon a target phoneme average power (p0) of synthesized speech and the power (p) of a selected phoneme unit. Sub-phoneme units are extracted from a phoneme to be synthesized. From among the extracted sub-phoneme units, a sub-phoneme unit of the voiced portion is multiplied by the amplitude altering magnification (r), and a sub-phoneme unit of the unvoiced portion is multiplied by the amplitude altering magnification (s). Synthesized speech is obtained using the sub-phoneme units thus obtained. This makes it possible to realize power control in which any decline in the quality of synthesized speech is reduced. Type: Grant. Filed: August 30, 1999. Date of Patent: January 31, 2006. Assignee: Canon Kabushiki Kaisha. Inventors: Masayuki Yamada, Yasuhiro Komori, Mitsuru Otsuka.
-
Patent number: 6990451. Abstract: A method of making a digital voice library utilized for converting text to concatenated voice in accordance with a set of playback rules includes generating a complex tone that reflects a particular inflection required for a particular voice recording of a particular speech item. The complex tone is composed of portions of a recording of a voice talent uttering a vocal sequence. The voice talent is recorded reciting the particular speech item to make the particular voice recording. The voice talent uses the complex tone as a guide to allow the voice talent to recite the particular speech item in accordance with the particular inflection. Type: Grant. Filed: June 1, 2001. Date of Patent: January 24, 2006. Assignee: Qwest Communications International Inc. Inventors: Eliot M. Case, Richard P. Phillips.
-
Patent number: 6990450. Abstract: A method for converting text to concatenated voice by utilizing a digital voice library and a set of playback rules is provided. Multiple voice recordings correspond to a single speech item and represent various inflections of that single speech item. The method includes determining syllable count and impact value for each speech item in a sequence of speech items. A desired inflection for each speech item is determined based on the syllable count and the impact value, and further based on a set of playback rules. A sequence of voice recordings is determined by determining a voice recording for each speech item based on the desired inflection and based on the available voice recordings that correspond to the particular speech item. Voice data are generated based on a sequence of voice recordings by concatenating adjacent recordings in the sequence of voice recordings. Type: Grant. Filed: March 27, 2001. Date of Patent: January 24, 2006. Assignee: Qwest Communications International Inc. Inventors: Eliot M. Case, Judith L. Weirauch, Richard P. Phillips.
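The rule-driven inflection selection above can be sketched as a small lookup: estimate a syllable count, then let a playback rule map syllable count, impact value, and sentence position to a desired inflection. Everything concrete here is invented for illustration: the vowel-group syllable heuristic, the rule thresholds, and the inflection names are assumptions, not the patent's playback rules.

```python
# Toy sketch of playback-rule inflection selection for a speech-item
# sequence. All thresholds and labels are hypothetical.

def count_syllables(word):
    """Crude vowel-group syllable estimate (an assumed heuristic)."""
    vowels, count, prev = "aeiouy", 0, False
    for ch in word.lower():
        cur = ch in vowels
        if cur and not prev:
            count += 1
        prev = cur
    return max(count, 1)

def desired_inflection(word, impact, position, length):
    syllables = count_syllables(word)
    if position == length - 1:
        return "falling"               # assumed rule: final items fall
    if impact > 5 and syllables <= 2:
        return "emphatic"              # assumed rule: short, high-impact
    return "neutral"

items = [("your", 1), ("bill", 7), ("is", 1), ("overdue", 8)]
seq = [desired_inflection(w, imp, i, len(items))
       for i, (w, imp) in enumerate(items)]
```

In a full system each (speech item, inflection) pair would then index into the digital voice library and the chosen recordings would be concatenated.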
-
Patent number: 6970820. Abstract: The speech synthesizer is personalized to sound like or mimic the speech characteristics of an individual speaker. The individual speaker provides a quantity of enrollment data, which can be extracted from a short quantity of speech, and the system modifies the base synthesis parameters to more closely resemble those of the new speaker. More specifically, the synthesis parameters may be decomposed into speaker-dependent parameters, such as context-independent parameters, and speaker-independent parameters, such as context-dependent parameters. The speaker-dependent parameters are adapted using enrollment data from the new speaker. After adaptation, the speaker-dependent parameters are combined with the speaker-independent parameters to provide a set of personalized synthesis parameters. Type: Grant. Filed: February 26, 2001. Date of Patent: November 29, 2005. Assignee: Matsushita Electric Industrial Co., Ltd. Inventors: Jean-Claude Junqua, Florent Perronnin, Roland Kuhn, Patrick Nguyen.
-
Patent number: 6950799. Abstract: A speech processing system modifies various aspects of input speech according to a user-selected one of various preprogrammed voice fonts. Initially, the speech converter receives a formants signal representing an input speech signal and a pitch signal representing the input signal's fundamental frequency. One or both of the following may also be received: a voicing signal comprising an indication of whether the input speech signal is voiced, unvoiced, or mixed, and/or a gain signal representing the input speech signal's energy. The speech converter also receives user selection of one of multiple preprogrammed voice fonts, each specifying a manner of modifying one or more of the received signals (i.e., formants, voicing, pitch, gain). The speech converter modifies at least one of the formants, voicing, pitch, and/or gain signals as specified by the selected voice font. Type: Grant. Filed: February 19, 2002. Date of Patent: September 27, 2005. Assignee: Qualcomm Inc. Inventors: Ning Bi, Andrew P. DeJaco.
-
Patent number: 6810379. Abstract: A client/server text-to-speech synthesis system and method divides the method optimally between client and server. The server stores large databases for pronunciation analysis, prosody generation, and acoustic unit selection corresponding to a normalized text, while the client performs computationally intensive decompression and concatenation of selected acoustic units to generate speech. The units are transmitted from the server to the client in a highly compressed format, with a compression method selected based on the predetermined set of potential acoustic units. This compression method allows very high-quality and natural-sounding speech to be output at the client machine. Type: Grant. Filed: April 24, 2001. Date of Patent: October 26, 2004. Assignee: Sensory, Inc. Inventors: Pieter Vermeulen, Todd F. Mozer.
-
Patent number: 6801894. Abstract: A speech synthesizer includes a data memory having a plurality of address areas, which stores a plurality of phases in the address areas, and an address designating circuit designating one of the address areas based on a phase signal. The speech synthesizer further includes a speech synthesizing circuit generating a speech synthesizing signal corresponding to the phase stored in the designated area, a digital/analog converter transforming the speech synthesizing signal to an analog signal having amplitude, and a counter setting a period of silence. The speech synthesizer also includes a silence-input circuit connected between the speech synthesizing circuit and the digital/analog converter, which supplies a predetermined voltage to the digital/analog converter for the period that is set by the counter. Type: Grant. Filed: March 22, 2001. Date of Patent: October 5, 2004. Assignee: Oki Electric Industry Co., Ltd. Inventors: Yoshihisa Nakamura, Hiroaki Matsubara.
-
Publication number: 20030061050. Abstract: A means and method are provided for enhancing or replacing the natural excitation of the human vocal tract by artificial excitation means, wherein the artificially created acoustics present additional spectral, temporal, or phase data useful for (1) enhancing the machine recognition robustness of audible speech or (2) enabling more robust machine recognition of relatively inaudible mouthed or whispered speech. The artificial excitation (a) may be arranged to be audible or inaudible, (b) may be designed to be non-interfering with another user's similar means, (c) may be used in one or both of a vocal content-enhancement mode or a complementary vocal tract-probing mode, and/or (d) may be used for the recognition of audible or inaudible continuous speech or isolated spoken commands. Type: Application. Filed: November 27, 2002. Publication date: March 27, 2003. Inventors: Carol A. Tosaya, John W. Sliwa.
-
Patent number: 6477496. Abstract: A method, system and product are provided for synthesizing sound using encoded audio signals having a plurality of frequency subbands, each subband having a scale factor and sample data associated therewith. The method includes selecting a spectral envelope, and selecting a plurality of frequency subbands, each subband having sample data associated therewith. The method also includes generating a synthetic encoded audio signal having a plurality of frequency subbands, the subbands having the selected spectral envelope and the selected sample data. The system includes control logic for performing the method. The product includes a storage medium having computer-readable programmed instructions for performing the method. Type: Grant. Filed: December 20, 1996. Date of Patent: November 5, 2002. Inventor: Eliot M. Case.
-
Patent number: 6463412. Abstract: A high performance voice transformation apparatus and method is provided in which voice input is transformed into a symbolic representation of phonemes in the voice input. The symbolic representation is used to retrieve output voice segments of a selected target speaker for use in outputting the voice input in a different voice. In addition, voice input characteristics are extracted from the voice input and are then applied to the output voice segments to thereby provide a more realistic human-sounding voice output. Type: Grant. Filed: December 16, 1999. Date of Patent: October 8, 2002. Assignee: International Business Machines Corporation. Inventors: Jason Raymond Baumgartner, Steven Leonard Roberts, Nadeem Malik, Flemming Andersen.
-
Patent number: 6453287. Abstract: A system and method for enhancing the speech quality of the mixed excitation linear predictive (MELP) coder and other low bit-rate speech coders. The system and method employ a plosive analysis/synthesis method, which detects the frame containing a plosive signal, applies a simple model to synthesize the plosive signal, and adds the synthesized plosive to the coded speech. The system and method remain compatible with the existing MELP coder bit stream. Type: Grant. Filed: September 29, 1999. Date of Patent: September 17, 2002. Assignee: Georgia Tech Research Corporation. Inventors: Takahiro Unno, Thomas P. Barnwell, III, Kwan K. Truong.
-
Patent number: 6347298
Abstract: A computerized method and apparatus for reducing the size of a dictionary used in a text-to-speech synthesis system are provided. In an initial phase, the method and apparatus determine if entries in the dictionary, each containing a grapheme string and a corresponding phoneme string, can be fully matched by using at least one rule set used to synthesize words to phonemic data. If the entry can be fully matched using rule processing alone, the entry is indicated to be deleted from the dictionary. In a second phase, the method and apparatus determine if the entry, considered as a root word entry, is required in the dictionary in order to support phoneme synthesis of other entries containing the root word entry, and if so, the root word entry is indicated to be saved in the dictionary.
Type: Grant
Filed: February 26, 2001
Date of Patent: February 12, 2002
Assignee: Compaq Computer Corporation
Inventors: Anthony J. Vitale, Ginger Chun-Che Lin, Thomas Kopec
-
Patent number: 6317713
Abstract: A sound generation device includes character string analysis for analyzing a character string and generating commands concerning phoneme and prosody; a calculating element for calculating fundamental frequency, which depends on prosody, and outputting it as a sound generation parameter, using an accent command and a descent command and incorporating a rhythm command representable by a sine wave; a sound source generator; and an articulator that depends on the phoneme command.
Type: Grant
Filed: January 6, 1999
Date of Patent: November 13, 2001
Assignee: Arcadia, Inc.
Inventor: Seiichi Tenpaku
-
Patent number: 6208968
Abstract: A computerized method and apparatus for reducing the size of a dictionary used in a text-to-speech synthesis system are provided. In an initial phase, the method and apparatus determine if entries in the dictionary, each containing a grapheme string and a corresponding phoneme string, can be fully matched by using at least one rule set used to synthesize words to phonemic data. If the entry can be fully matched using rule processing alone, the entry is indicated to be deleted from the dictionary. In a second phase, the method and apparatus determine if the entry, considered as a root word entry, is required in the dictionary in order to support phoneme synthesis of other entries containing the root word entry, and if so, the root word entry is indicated to be saved in the dictionary.
Type: Grant
Filed: December 16, 1998
Date of Patent: March 27, 2001
Assignee: Compaq Computer Corporation
Inventors: Anthony J. Vitale, Ginger Chun-Che Lin, Thomas Kopec
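The two-phase pruning described in the abstract can be sketched in a few lines. This is a toy illustration, not the patented implementation: the rule set, the lexicon, and the prefix test for "contains the root word" are all invented stand-ins.

```python
# Toy sketch of the two-phase dictionary-reduction idea (names hypothetical):
# phase 1 deletes entries whose pronunciation the rules already predict;
# phase 2 keeps root words that other surviving entries still depend on.

def rules_predict(grapheme):
    """Stand-in letter-to-sound rule set: naive one-letter-per-phoneme map."""
    table = {"c": "K", "a": "AE", "t": "T", "s": "S", "n": "N", "e": "IY",
             "o": "OW"}
    return " ".join(table.get(ch, "?") for ch in grapheme)

def reduce_dictionary(lexicon):
    """lexicon: {grapheme: phoneme string}. Returns the pruned lexicon."""
    # Phase 1: mark entries fully matched by rule processing alone.
    deletable = {g for g, p in lexicon.items() if rules_predict(g) == p}
    # Phase 2: keep a deletable root if some surviving entry is derived from
    # it (here crudely: the root is a prefix of a longer surviving entry).
    kept = set(lexicon) - deletable
    for root in list(deletable):
        if any(other != root and other.startswith(root) for other in kept):
            deletable.discard(root)
    return {g: p for g, p in lexicon.items() if g not in deletable}

lex = {"cat": "K AE T",        # rule-predictable, but "cats" needs the root
       "cats": "K AE T Z",     # irregular under the toy rules, so it stays
       "neon": "N IY AA N"}    # also not rule-predictable
print(reduce_dictionary(lex))  # all three entries survive
```

With a regular plural ("K AE T S") and no dependent entries, "cat" would be dropped, which is the storage saving the patent is after.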
-
Patent number: 6195632
Abstract: An iterative formant analysis, based on minimizing the arc-length of various curves under various filter constraints, estimates formant frequencies with desirable properties for text-to-speech applications. A class of arc-length cost functions may be employed; some of these have analytic solutions and thus lend themselves well to applications requiring speed and reliability. The arc-length inverse filtering techniques are inherently pitch synchronous and are useful in realizing high quality pitch tracking and pitch epoch marking.
Type: Grant
Filed: November 25, 1998
Date of Patent: February 27, 2001
Assignee: Matsushita Electric Industrial Co., Ltd.
Inventor: Steve Pearson
-
Patent number: 6122616
Abstract: The present invention improves upon electronic speech synthesis using pre-recorded segments of speech to fill in for other missing segments of speech. The formalized aliasing approach of the present invention overcomes the ad hoc aliasing approach of the prior art which oftentimes generated less than satisfactory speech synthesis sound output. By formalizing the relationship between missing speech sound samples and available speech sound samples, the present invention provides a structured approach to aliasing which results in improved synthetic speech sound quality. Further, the formalized aliasing approach of the present invention can be used to lessen storage requirements for speech sound samples by only storing as many sound samples as memory capacity can support.
Type: Grant
Filed: July 3, 1996
Date of Patent: September 19, 2000
Assignee: Apple Computer, Inc.
Inventor: Caroline G. Henton
-
Patent number: 6101469
Abstract: For use in a synthesizer having a wave source that produces a periodic wave, frequency shifting circuitry for frequency-shifting the periodic wave and waveshaping circuitry for transforming the periodic wave into a waveform containing a formant, the frequency-shifting causing displacement of the formant, a circuit for, and method of, compensating for the displacement and a synthesizer employing the circuit or the method. In one embodiment, the circuit includes bias circuitry, coupled to the wave source and the frequency shifting circuitry, that introduces a bias into the periodic wave based on a degree to which the frequency shifting circuitry frequency shifts the periodic wave, the bias reducing a degree to which the formant is correspondingly frequency-shifted.
Type: Grant
Filed: March 2, 1998
Date of Patent: August 8, 2000
Assignee: Lucent Technologies Inc.
Inventor: Steven D. Curtin
-
Patent number: 6044345
Abstract: Human speech is coded by singling out, from a transfer function of the speech, all poles that are unrelated to any particular resonance of a human vocal tract model; all other poles are maintained. A glottal pulse related sequence representing the singled-out poles is defined through an explicitation of the derivative of the glottal air flow. Speech is output by a filter based on combining the glottal pulse related sequence and a representation of a formant filter with a complex transfer function expressing all other poles. The glottal pulse sequence is modelled through further explicitly expressible generation parameters. In particular, a non-zero decaying return phase is supplemented to the glottal-pulse response, which is explicitized in all its parameters, while the overall response is amended in accordance with volumetric continuity.
Type: Grant
Filed: April 17, 1998
Date of Patent: March 28, 2000
Assignee: U.S. Philips Corporation
Inventor: Raymond N. J. Veldhuis
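The "decaying return phase plus volumetric continuity" idea can be illustrated with a toy glottal-flow-derivative pulse. This is a rough LF-style sketch, not the patented parameterization: the growth rate, the sine period, and the return time constant are invented, and continuity is enforced by simply rescaling the return phase so the derivative integrates to zero over the period (i.e., the flow ends the cycle at the baseline it started from).

```python
import math

def lf_like_pulse(n=200, te_frac=0.6, growth=3.0, return_tc=10.0):
    """One period of a toy glottal flow derivative: an exponentially
    growing sinusoid up to the closure instant te, then a decaying
    exponential return phase rescaled for volumetric continuity."""
    te = int(n * te_frac)
    # Open phase: exp-growing sine that ends negative at closure.
    open_phase = [math.exp(growth * i / te) * math.sin(1.2 * math.pi * i / te)
                  for i in range(te)]
    ee = -open_phase[-1]  # magnitude of the negative excursion at closure
    # Return phase: non-zero decaying exponential back toward zero.
    ret = [-ee * math.exp(-(i + 1) / return_tc) for i in range(n - te)]
    # Volumetric continuity: scale the return phase so the whole cycle
    # integrates to ~0 (net glottal flow change over one period is zero).
    s_open, s_ret = sum(open_phase), sum(ret)
    if s_ret:
        ret = [x * (-s_open / s_ret) for x in ret]
    return open_phase + ret

pulse = lf_like_pulse()
print(round(sum(pulse), 9))  # ~0: flow returns to baseline each period
```

A train of such pulses, filtered by the formant filter the abstract describes, would form the voiced excitation path of the coder.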
-
Patent number: 6012028
Abstract: The text to speech conversion system distinguishes geographical names based upon the present position and includes a text input unit for inputting text data, a position coordinator input unit for inputting present location information of the text to speech conversion system, and a text normalizer, connected to the text input unit and the position coordinator input unit, capable of generating a plurality of pronunciation signals indicative of a plurality of pronunciations for a common portion of the text data, the text normalizer selecting one of the pronunciation signals based upon the present location information.
Type: Grant
Filed: January 28, 1998
Date of Patent: January 4, 2000
Assignee: Ricoh Company, Ltd.
Inventors: Syuji Kubota, Yuichi Kojima
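The selection step can be sketched as a lookup keyed by distance to the current position. Everything here is invented for illustration (the data format, the example readings, and the crude squared-degree distance); the patent's actual normalizer is not reproduced.

```python
# Hypothetical sketch of position-aware pronunciation selection: a place
# name with several readings resolves to whichever candidate region lies
# closest to the device's current coordinates.

PRONUNCIATIONS = {  # invented example data (lat, lon in degrees)
    "Nihonbashi": [
        {"region": (35.68, 139.77), "reading": "Nihombashi (Tokyo)"},
        {"region": (34.69, 135.50), "reading": "Nipponbashi (Osaka)"},
    ],
}

def choose_reading(word, here):
    """Pick the pronunciation whose region is nearest to `here`.
    Squared-degree distance is a crude stand-in for real geodesics."""
    candidates = PRONUNCIATIONS.get(word)
    if not candidates:
        return None  # unambiguous word: fall through to normal rules

    def dist2(cand):
        (lat, lon), (hlat, hlon) = cand["region"], here
        return (lat - hlat) ** 2 + (lon - hlon) ** 2

    return min(candidates, key=dist2)["reading"]

print(choose_reading("Nihonbashi", (35.7, 139.8)))  # Nihombashi (Tokyo)
```

A car navigation system near Osaka would get the other reading from the same written text, which is exactly the ambiguity the abstract targets.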
-
Patent number: 6006187
Abstract: The present invention discloses a computer prosody user interface operable to visually tailor the prosody of a text to be uttered by a text-to-speech system. The prosody user interface permits users to alter a synthesized voice along one or more dimensions on a word-by-word basis. In one embodiment of the present invention, the prosody user interface is operable to alter the speaking rate, relative word duration, and word prominence of a synthesized voice. Specifically, one or more words are selected using presentation means, and speech parameters corresponding to the speaking rate, relative word duration, and word prominence are manipulated using speech parameter manipulation means. Modifications to the speech parameters are accompanied by visual changes to the presentation means, thereby providing a visual feel to the computer prosody user interface.
Type: Grant
Filed: October 1, 1996
Date of Patent: December 21, 1999
Assignee: Lucent Technologies Inc.
Inventor: Michael Abraham Tanenblatt
-
Patent number: 5995932
Abstract: A training system used while a person is speaking uses a feedback modification technique to reduce accents. As the speaker is speaking, the system feeds back to the speaker the speaker's speech in "real-time" so that the speaker, in effect, hears what he or she is saying while saying it. The system includes a detector configured to monitor a speaker's speech to detect a preselected target vowel sound that the speaker wishes to produce accurately. In response to the detector detecting a "target" vowel sound, a cue generator generates a sensory cue (e.g., an amplification of the "target" vowel sound) that is perceived by the speaker. As the speaker is speaking, the generator feeds back to the speaker the sensory cue along with the speech so that the cue is coincident with the "target" vowel sound.
Type: Grant
Filed: December 31, 1997
Date of Patent: November 30, 1999
Assignee: Scientific Learning Corporation
Inventor: John F. Houde
-
Patent number: 5983178
Abstract: A speaker clustering apparatus generates HMMs for clusters based on feature quantities of a vocal-tract configuration of speech waveform data, and a speech recognition apparatus provided with the speaker clustering apparatus. In response to the speech waveform data of N speakers, an estimator estimates feature quantities of vocal-tract configurations, with reference to correspondence between vocal-tract configuration parameters and formant frequencies predetermined based on a predetermined vocal tract model of a standard speaker. Further, a clustering processor calculates speaker-to-speaker distances between the N speakers based on the feature quantities of the vocal-tract configurations of the N speakers as estimated, and clusters the vocal-tract configurations of the N speakers using a clustering algorithm based on calculated speaker-to-speaker distances, thereby generating K clusters.
Type: Grant
Filed: December 10, 1998
Date of Patent: November 9, 1999
Assignee: ATR Interpreting Telecommunications Research Laboratories
Inventors: Masaki Naito, Li Deng, Yoshinori Sagisaka
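The clustering step (distances between speakers' vocal-tract feature vectors, merged into K clusters) can be sketched with a minimal agglomerative pass. The feature estimation from formants is not reproduced; the feature vectors, the centroid linkage, and the Euclidean distance are all illustrative choices, not necessarily those of the patent.

```python
# Minimal agglomerative sketch: speakers are merged by smallest
# centroid-to-centroid distance until K clusters remain.

def cluster_speakers(features, k):
    """features: {speaker: vocal-tract feature vector}; returns K sets."""
    clusters = [{s} for s in features]

    def centroid(cluster):
        vecs = [features[s] for s in cluster]
        return [sum(col) / len(vecs) for col in zip(*vecs)]

    def dist(a, b):
        ca, cb = centroid(a), centroid(b)
        return sum((x - y) ** 2 for x, y in zip(ca, cb)) ** 0.5

    while len(clusters) > k:
        # Find and merge the closest pair of clusters.
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] |= clusters.pop(j)
    return clusters

speakers = {"A": [1.0, 0.1], "B": [1.1, 0.0],   # similar vocal tracts
            "C": [5.0, 4.9], "D": [5.2, 5.0]}   # a second group
print(cluster_speakers(speakers, 2))
```

In the patent, each resulting cluster then gets its own HMM set, so recognition can pick the model whose speakers' vocal tracts best match the current user.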
-
Patent number: 5905970
Abstract: In a speech coding device for coding an input speech with an AbS (Analysis by Synthesis) system and one of a forward type and a backward type configuration, a vocal tract prediction coefficient generating circuit produces a vocal tract prediction coefficient from one of an input speech signal and a locally reproduced synthetic speech signal. A speech synthesizing circuit produces a synthetic speech signal by using codes stored in an excitation codebook in one-to-one correspondence with indexes, and the vocal tract prediction coefficient. A comparing circuit compares the synthetic speech signal and input speech signal to thereby output an error signal. A perceptual weighting circuit weights the error signal to thereby output a perceptually weighted signal. A codebook index selecting circuit selects an optimal index for the excitation codebook out of at least the weighted signal, and feeds the optimal index to the excitation codebook.
Type: Grant
Filed: December 11, 1996
Date of Patent: May 18, 1999
Assignee: Oki Electric Industry Co., Ltd.
Inventor: Hiromi Aoyagi
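The analysis-by-synthesis loop in the abstract reduces to: synthesize each codebook entry through the vocal-tract filter, compare against the input, and pick the index with the least error. The sketch below is a bare-bones illustration with a one-tap filter and unweighted error; the patented coder uses a full LPC filter plus the perceptual weighting circuit.

```python
# Toy analysis-by-synthesis codebook search (one-tap filter, no
# perceptual weighting): pick the excitation index whose synthetic
# speech is closest to the input.

def synthesize(excitation, a):
    """One-tap LPC synthesis filter: s[n] = e[n] + a * s[n-1]."""
    out, prev = [], 0.0
    for e in excitation:
        prev = e + a * prev
        out.append(prev)
    return out

def best_index(codebook, target, a):
    """Return the codebook index minimizing squared synthesis error."""
    def err(entry):
        return sum((s - t) ** 2 for s, t in zip(synthesize(entry, a), target))
    return min(range(len(codebook)), key=lambda i: err(codebook[i]))

codebook = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, -0.5, 0.0]]
target = [1.0, 0.5, 0.25]  # impulse response of the a=0.5 filter
print(best_index(codebook, target, a=0.5))  # prints 0: exact match
```

Only the winning index (plus the quantized prediction coefficients) is transmitted, which is where the bit-rate saving of CELP-style coders comes from.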
-
Patent number: 5890118
Abstract: A speech synthesis apparatus includes: a memory for storing a plurality of typical waveforms corresponding to a plurality of frames, the typical waveforms each previously obtained by extracting in units of at least one frame from a prediction error signal formed in predetermined units; a voiced speech source generator including an interpolation circuit for performing interpolation between the typical waveforms read out from the memory means to obtain a plurality of interpolation signals each having at least one of an interpolation pitch period and a signal level which changes smoothly between the corresponding frames; a superposition circuit for superposing the interpolation signals obtained by the interpolation circuit to form a voiced speech source signal; an unvoiced speech source generator for generating an unvoiced speech source signal; and a vocal tract filter selectively driven by the voiced speech source signal outputted from the voiced speech source generator and the unvoiced speech source signal from the unvoiced speech source generator.
Type: Grant
Filed: March 8, 1996
Date of Patent: March 30, 1999
Assignee: Kabushiki Kaisha Toshiba
Inventors: Takehiko Kagoshima, Masami Akamine
-
Patent number: 5876213
Abstract: A karaoke apparatus is constructed to perform a karaoke accompaniment part and a karaoke harmony part for accompanying a live vocal part. A pickup device collects a singing voice of the live vocal part. A detector device analyzes the collected singing voice to detect a musical register thereof at which the live vocal part is actually performed. A harmony generator device generates a harmony voice of the karaoke harmony part according to the detected musical register so that the karaoke harmony part is made consonant with the live vocal part. A tone generator device generates an instrumental tone of the karaoke accompaniment part in parallel to the karaoke harmony part.
Type: Grant
Filed: July 30, 1996
Date of Patent: March 2, 1999
Assignee: Yamaha Corporation
Inventor: Shuichi Matsumoto
-
Patent number: 5826221
Abstract: In vocal tract prediction coefficient coding and decoding circuitry, a vocal tract prediction coefficient converter/quantizer transforms vocal tract prediction coefficients of consecutive subframes constituting a single frame to corresponding LSP (Line Spectrum Pair) coefficients, quantizes the LSP coefficients, and thereby outputs quantized LSP coefficient values together with indexes assigned thereto. A coding mode decision assumes, e.g., three different coding modes based on the above quantized LSP coefficient values, the quantized LSP coefficient value of the fourth subframe of the previous frame, and the above indexes. The decision determines which coding mode should be used to code the current frame, and outputs mode code information and quantization code information. The circuitry is capable of reproducing high quality, faithful speech without resorting to a high mean coding rate even when the vocal tract prediction coefficient varies noticeably within the frame.
Type: Grant
Filed: October 29, 1996
Date of Patent: October 20, 1998
Assignee: Oki Electric Industry Co., Ltd.
Inventor: Hiromi Aoyagi
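The three-way mode decision can be sketched as a threshold test on how far the current frame's quantized LSPs have drifted from the previous frame's final subframe. The mode names, thresholds, and drift measure below are invented for illustration; the patent's actual decision logic also uses the quantizer indexes.

```python
# Hypothetical sketch of an LSP-based coding mode decision: small drift
# allows cheap differential coding, moderate drift allows interpolation,
# and a large spectral jump forces direct (full-rate) coding.

def decide_mode(prev_lsp, cur_lsp, low=0.02, high=0.10):
    """prev_lsp: quantized LSPs of the previous frame's final subframe;
    cur_lsp: quantized LSPs of the current frame. Thresholds invented."""
    drift = max(abs(c - p) for c, p in zip(cur_lsp, prev_lsp))
    if drift < low:
        return "differential"   # code only small deltas from last frame
    if drift < high:
        return "interpolated"   # interpolate between frame endpoints
    return "direct"             # spectrum moved too much: send full codes

prev = [0.10, 0.25, 0.40, 0.55]
print(decide_mode(prev, [0.11, 0.25, 0.41, 0.55]))  # differential
print(decide_mode(prev, [0.30, 0.45, 0.60, 0.75]))  # direct
```

Spending bits on the cheap modes whenever the spectrum is stable is what keeps the mean coding rate low while still tracking rapid within-frame changes, which is the trade-off the abstract claims.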