Vocal Tract Model Patents (Class 704/261)
-
Patent number: 7613611. Abstract: Provided is a method and an apparatus for vocal-cord signal recognition. A signal processing unit receives and digitizes a vocal cord signal, and a noise removing unit removes channel noise included in the vocal cord signal. A feature extracting unit extracts a feature vector from the vocal cord signal, which has the channel noise removed therefrom, and a recognizing unit calculates a similarity between the vocal cord signal and the learned model parameter. Consequently, the apparatus is robust in a noisy environment. Type: Grant. Filed: May 26, 2005. Date of Patent: November 3, 2009. Assignee: Electronics and Telecommunications Research Institute. Inventors: Kwan Hyun Cho, Mun Sung Han, Young Giu Jung, Hee Sook Shin, Jun Seok Park, Dong Won Han.
-
Publication number: 20090222269. Abstract: An apparatus for voice synthesis includes: a word database for storing words and voices; a syllable database for storing syllables and voices; a processor for executing a process including: extracting a word from a document, generating a voice signal based on the stored voice when the extracted word is included in the word database, and synthesizing a voice signal based on the stored voices associated with the one or more syllables corresponding to the extracted word when the extracted word is not found in the word database; a speaker for producing a voice based on either the generated or the synthesized voice signal; and a display for selectively displaying the extracted word when the voice based on the synthesized voice signal is produced by the speaker. Type: Application. Filed: May 11, 2009. Publication date: September 3, 2009. Inventor: Shinichiro Mori.
-
Publication number: 20090222268. Abstract: A speech synthesis system synthesizes a speech signal corresponding to an input speech signal based on a spectral envelope of the input speech signal. A glottal pulse generator generates a time series of glottal pulses, which are processed into a glottal pulse magnitude spectrum. A shaping circuit shapes the glottal pulse magnitude spectrum based on the spectral envelope and generates a shaped glottal pulse magnitude spectrum. A harmonic null adjustment circuit reduces harmonic nulls in the shaped glottal pulse magnitude spectrum and generates a null-adjusted synthesized speech spectrum. An inverse transform circuit generates a null-adjusted time-series speech signal. An overlap and add circuit synthesizes the speech signal based on the null-adjusted time-series speech signal. Type: Application. Filed: March 3, 2008. Publication date: September 3, 2009. Applicant: QNX Software Systems (Wavemakers), Inc. Inventors: Xueman Li, Phillip A. Hetherington, Shahla Parveen, Tommy Tsz Chun Chiu.
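The shaping and null-adjustment stages in the abstract above can be sketched in a few lines. This is a hypothetical illustration, not the patent's implementation: the function names, the element-wise multiply, and the 20 dB floor relative to the envelope are all assumptions.

```python
# Hypothetical sketch: a glottal-pulse magnitude spectrum is shaped by a
# spectral envelope, then bins that fall into deep harmonic nulls are
# lifted toward a floor derived from the envelope. The -20 dB floor is an
# illustrative assumption.

def shape_glottal_spectrum(pulse_mag, envelope, null_floor_db=-20.0):
    """Multiply a glottal-pulse magnitude spectrum by a spectral envelope,
    then raise harmonic nulls to a floor relative to the envelope."""
    floor_gain = 10.0 ** (null_floor_db / 20.0)
    shaped = [p * e for p, e in zip(pulse_mag, envelope)]
    # Null adjustment: no shaped bin may fall below floor_gain * envelope.
    return [max(s, floor_gain * e) for s, e in zip(shaped, envelope)]

pulse = [1.0, 0.0, 0.8, 0.001, 0.6]   # deep nulls at bins 1 and 3
env = [1.0, 1.0, 0.5, 0.5, 0.25]      # spectral envelope of the input
shaped = shape_glottal_spectrum(pulse, env)
```

In a full system the null-adjusted magnitude spectrum would then go through the inverse transform and overlap-add stages the abstract lists.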
-
Patent number: 7567903. Abstract: A method and apparatus for performing speech recognition are provided. A Vocal Tract Length Normalized acoustic model for a speaker is generated from training data. Speech recognition is performed on a first recognition input to determine a first best hypothesis. A first Vocal Tract Length Normalization factor is estimated based on the first best hypothesis. Speech recognition is performed on a second recognition input using the Vocal Tract Length Normalized acoustic model to determine another best hypothesis. Another Vocal Tract Length Normalization factor is estimated based on that best hypothesis and at least one previous best hypothesis. Type: Grant. Filed: January 12, 2005. Date of Patent: July 28, 2009. Assignee: AT&T Intellectual Property II, L.P. Inventors: Vincent Goffin, Andrej Ljolje, Murat Saraclar.
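Vocal tract length normalization is commonly realized as a frequency warp f → αf, with the warp factor picked from a small grid to maximize the likelihood of the current best hypothesis. The sketch below illustrates that general idea only; the grid values, the clipping at Nyquist, and the toy scoring function are assumptions, not the patent's estimation procedure.

```python
# Illustrative VTLN sketch: a linear frequency warp and a grid search
# for the warp factor that maximizes a scoring function (standing in
# for the acoustic likelihood of the best hypothesis).

def warp_frequency(f_hz, alpha, nyquist_hz=8000.0):
    """Warp one frequency by factor alpha, clipping at the Nyquist rate."""
    return min(alpha * f_hz, nyquist_hz)

def estimate_warp_factor(score_fn, grid=(0.88, 0.94, 1.0, 1.06, 1.12)):
    """Pick the warp factor on a small grid that maximizes score_fn."""
    return max(grid, key=score_fn)

# Toy score preferring factors near 1.06 (a hypothetical speaker).
best = estimate_warp_factor(lambda a: -abs(a - 1.06))
```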
-
Patent number: 7529672. Abstract: A method of synthesizing a speech signal by providing a first speech unit signal having an end interval and a second speech unit signal having a front interval, wherein at least some of the periods of the end interval are appended in inverted order at the end of the first speech unit signal in order to provide a fade-out interval, and at least some of the periods of the front interval are appended in inverted order at the beginning of the second speech unit signal to provide a fade-in interval. An overlap and add operation is performed on the end and fade-in intervals and the fade-out and front intervals. Type: Grant. Filed: August 8, 2003. Date of Patent: May 5, 2009. Assignee: Koninklijke Philips Electronics N.V. Inventor: Ercan Ferit Gigi.
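The mirrored-period crossfade above can be sketched concretely, assuming fixed-length pitch periods: the end of unit A is appended to A in reverse as a fade-out tail, the front of unit B is prepended to B in reverse as a fade-in head, and the two extended regions are joined by a linear overlap-and-add. The function name, the single-sample "period", and the linear ramp are illustrative assumptions.

```python
# Minimal sketch of the mirrored-period crossfade between two speech
# units, joined with a linear overlap-and-add. Treats each list element
# as one pitch period's worth of signal for simplicity.

def crossfade_units(unit_a, unit_b, overlap):
    tail = list(reversed(unit_a[-overlap:]))   # fade-out: mirrored end of A
    head = list(reversed(unit_b[:overlap]))    # fade-in: mirrored front of B
    a_ext = unit_a + tail
    b_ext = head + unit_b
    out = []
    n = len(a_ext)
    join = 2 * overlap          # overlap region spans both mirrored parts
    out.extend(a_ext[:n - join])
    for i in range(join):
        w = (i + 1) / (join + 1)               # linear ramp 0 -> 1
        out.append((1.0 - w) * a_ext[n - join + i] + w * b_ext[i])
    out.extend(b_ext[join:])
    return out

y = crossfade_units([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0], overlap=1)
```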
-
Publication number: 20090112596. Abstract: A system and method are disclosed for synthesizing speech based on a selected speech act. A method includes modifying synthesized speech of a spoken dialogue system by (1) receiving a user utterance, (2) analyzing the user utterance to determine an appropriate speech act, and (3) generating a response of a type associated with the appropriate speech act, wherein linguistic variables in the response are selected based on the appropriate speech act. Type: Application. Filed: October 30, 2007. Publication date: April 30, 2009. Applicant: AT&T Labs, Inc. Inventors: Ann K. Syrdal, Mark Beutnagel, Alistair D. Conkie, Yeon-Jun Kim.
-
Publication number: 20090094031. Abstract: An apparatus for providing text independent voice conversion may include a first voice conversion model and a second voice conversion model. The first voice conversion model may be trained with respect to conversion of training source speech to synthetic speech corresponding to the training source speech. The second voice conversion model may be trained with respect to conversion to training target speech from synthetic speech corresponding to the training target speech. An output of the first voice conversion model may be communicated to the second voice conversion model to process source speech input into the first voice conversion model into target speech corresponding to the source speech as the output of the second voice conversion model. Type: Application. Filed: October 4, 2007. Publication date: April 9, 2009. Inventors: Jilei Tian, Victor Popa, Jani K. Nurminen.
-
Publication number: 20090063156. Abstract: A voice synthesis method, said method comprising a step of choosing a synthetic voice from among a set of voices having predetermined spectral signatures and a step of recording the natural voice of a first person, the method comprising a step of transforming the natural recorded voice so as to conform with the spectral signature of the chosen synthetic voice, the natural voice thereby transformed being recorded, said method comprising a step of determining at least one situation parameter for a first character from among a set of predefined parameters, each predefined parameter being associated with a spectral alteration of the emitted voice, the determined situation parameter particularly characterizing the environment or the physical or psychological state of the character, the method comprising a step of spectrally altering the transformed natural voice so as to conform with the spectral alteration associated with the character's situation parameter. Type: Application. Filed: August 26, 2008. Publication date: March 5, 2009. Applicant: Alcatel Lucent. Inventors: Sylvain Squedin, Serge Papillon.
-
Publication number: 20090063155. Abstract: The present invention provides a robot apparatus with a vocal interactive function. The robot apparatus receives a vocal input, and recognizes the vocal input. The robot apparatus stores a plurality of output data, an output count of each of the output data, and a weighted value of each of the output data. The robot apparatus outputs output data according to the weighted values of all the output data corresponding to the vocal input, and adds one to the output count of the output data. The robot apparatus calculates the weighted values of all the output data corresponding to the vocal input according to the output count. Consequently, the robot apparatus may output different and variable output data when receiving the same vocal input. The present invention also provides a vocal interactive method adapted for the robot apparatus. Type: Application. Filed: August 13, 2008. Publication date: March 5, 2009. Applicant: Hon Hai Precision Industry Co., Ltd. Inventors: Tsu-Li Chiang, Chuan-Hong Wang, Kuo-Pao Hung, Kuan-Hong Hsieh.
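The count-driven weighting above is easy to sketch: each candidate response keeps an output count, responses are drawn with weights recomputed from those counts, and a drawn response's count is incremented so repeats become less likely. The class name and the specific weight formula 1/(1 + count) are assumptions for illustration; the abstract only says the weights are recalculated from the counts.

```python
# Hypothetical sketch of variable response selection: weights are
# inversely related to how often each response has already been output.

import random

class ResponseSelector:
    def __init__(self, responses):
        self.counts = {r: 0 for r in responses}

    def weight(self, response):
        # Assumed formula: weight decays as the output count grows.
        return 1.0 / (1 + self.counts[response])

    def select(self, rng=random):
        responses = list(self.counts)
        weights = [self.weight(r) for r in responses]
        choice = rng.choices(responses, weights=weights, k=1)[0]
        self.counts[choice] += 1               # add one to the output count
        return choice

sel = ResponseSelector(["Hello!", "Hi there.", "Good day."])
first = sel.select(random.Random(0))
```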
-
Patent number: 7457752. Abstract: Method and apparatus for controlling the operation of an emotion synthesizing device, notably of the type where the emotion is conveyed by a sound, having at least one input parameter whose value is used to set a type of emotion to be conveyed, by making at least one parameter variable over a determined control range, thereby conferring variability in the amount of the type of emotion to be conveyed. The variable parameter can be made variable according to a variation model over the control range, the model relating a quantity-of-emotion control variable to the variable parameter, whereby said control variable is used to variably establish a value of said variable parameter. Preferably the variation obeys a linear model, the variable parameter being made to vary linearly with a variation in the quantity-of-emotion control variable. Type: Grant. Filed: August 12, 2002. Date of Patent: November 25, 2008. Assignee: Sony France S.A. Inventor: Pierre-Yves Oudeyer.
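The linear variation model described above amounts to mapping a single "quantity of emotion" control variable onto each synthesis parameter's control range. The sketch below is an illustration under stated assumptions: the parameter names, ranges, and the [0, 1] control interval are invented, not taken from the patent.

```python
# Minimal sketch of a linear emotion-control model: one control
# variable in [0, 1] is mapped linearly into each parameter's range.

def apply_emotion(control, ranges):
    """Map control in [0, 1] linearly into each (lo, hi) parameter range."""
    c = min(max(control, 0.0), 1.0)            # clamp to the control range
    return {name: lo + c * (hi - lo) for name, (lo, hi) in ranges.items()}

# Hypothetical ranges for a "happy" voice: pitch in Hz, rate as a factor.
happy = {"pitch_hz": (180.0, 260.0), "speech_rate": (1.0, 1.3)}
params = apply_emotion(0.5, happy)
```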
-
Publication number: 20080288258. Abstract: The present invention provides a speech analysis method comprising steps of obtaining a speech signal and a corresponding DEGG/EGG signal; regarding the speech signal as the output of a vocal tract filter in a source-filter model taking the DEGG/EGG signal as the input; and estimating the features of the vocal tract filter from the speech signal as the output and the DEGG/EGG signal as the input, wherein the features of the vocal tract filter are expressed by the state vectors of the vocal tract filter at selected time points, and the step of estimating is performed using Kalman filtering. Type: Application. Filed: April 3, 2008. Publication date: November 20, 2008. Applicant: International Business Machines Corporation. Inventors: Dan Ning Jiang, Fan Ping Meng, Yong Qin, Zhi Wei Shuang.
-
Patent number: 7398213. Abstract: The present invention relates to a method and system for diagnosing pathological phenomena using a voice signal. In one embodiment, the existence of at least one pathological phenomenon is determined based at least in part upon a calculated average intensity function associated with speech from the patient. In another embodiment, the existence of at least one pathological phenomenon is determined based at least in part upon a calculated maximum intensity function associated with speech from the patient. Type: Grant. Filed: May 16, 2006. Date of Patent: July 8, 2008. Assignee: Exaudios Technologies. Inventors: Yoram Levanon, Lan Lossos-Shifrin.
-
Publication number: 20080154601. Abstract: A method and system for providing efficient menu services for an information processing system that uses a telephone or other form of audio user interface. In one embodiment, the menu services provide effective support for novice users by providing a full listing of available keywords and rotating house advertisements which inform novice users of potential features and information. For experienced users, cues are rendered so that at any time the user can say a desired keyword to invoke the corresponding application. The menu is flat to facilitate its usage. Full keyword listings are rendered after the user is given a brief cue to say a keyword. Service messages rotate words and word prosody. When listening to receive information from the user, after the user has been cued, soft background music or other audible signals are rendered to inform the user that a response may now be spoken to the service. Type: Application. Filed: November 20, 2007. Publication date: June 26, 2008. Applicant: Microsoft Corporation. Inventors: Lisa Joy Stifelman, Hadi Partovi, Haleh Partovi, David Bryan Alpert, Matthew Talin Marx, Scott James Bailey, Kyle D. Sims, Darby McDonough Bailey, Roderick Steven Brathwaite, Eugene Koh, Angus Macdonald Davis.
-
Patent number: 7365749. Abstract: An animation wireframe is modified with three-dimensional (3D) range and color data having a corresponding shape surface. The animation wireframe is vertically scaled based on distances between consecutive features within the 3D range and color data and corresponding distances within the generic animation wireframe. For each animation wireframe point, the location of the animation wireframe point is adjusted to coincide with a point on the shape surface. The shape surface point lies along a scaling line connecting the animation wireframe point, the shape surface point and an origin point. The scaling line lies within a horizontal plane. Type: Grant. Filed: August 15, 2006. Date of Patent: April 29, 2008. Assignee: AT&T Corp. Inventor: Joern Ostermann.
-
Publication number: 20080077407. Abstract: A system, method and computer-readable media are disclosed for improving speech synthesis. A text-to-speech (TTS) voice database for use in a TTS system is generated by a method comprising labeling a voice database phonemically and applying a pre-/post-vocalic distinction to the phonemic labels to generate a TTS voice database. When a system synthesizes speech using speech units from the TTS voice database, the database provides phonemes for selection using the pre-/post-vocalic distinctions, which improve unit selection to render the synthetic speech more natural. Type: Application. Filed: September 26, 2006. Publication date: March 27, 2008. Applicant: AT&T Corp. Inventors: Mark Beutnagel, Alistair Conkie, Yeon-Jun Kim, Ann K. Syrdal.
-
Patent number: 7330813. Abstract: A speech processing apparatus able to enhance formants more naturally, wherein a speech analyzing unit analyzes an input speech signal to find LPCs and converts the LPCs to LSPs; a speech decoding unit calculates a distance between adjacent orders of the LSPs by an LSP analytical processing unit and calculates larger LSP adjusting amounts for LSPs of adjacent orders closer in distance by an LSP adjusting amount calculating unit; an LSP adjusting unit adjusts the LSPs based on the LSP adjusting amounts such that the LSPs of adjacent orders closer in distance become closer; an LSP-LPC converting unit converts the adjusted LSPs to LPCs; and an LPC combining unit uses the LPCs and sound source parameters to obtain formant-enhanced speech. Type: Grant. Filed: August 5, 2003. Date of Patent: February 12, 2008. Assignee: Fujitsu Limited. Inventor: Mutsumi Saito.
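The core idea above exploits the fact that closely spaced line spectral pairs (LSPs) correspond to strong formants: pairs that are already close are pulled slightly closer, sharpening the formant. The sketch below illustrates only that adjustment step on a given LSP array; the gain constant, the inverse-distance adjusting amount, and the cap that preserves ordering are all assumptions, not the patent's formulas.

```python
# Illustrative LSP adjustment: compute the distance between adjacent
# LSP frequencies (radians, ascending), assign larger adjusting amounts
# to closer pairs, and move each pair toward each other. The amount is
# capped at half the gap so the ascending order is preserved.

def adjust_lsps(lsps, gain=0.001):
    adjusted = list(lsps)
    for i in range(len(lsps) - 1):
        dist = lsps[i + 1] - lsps[i]
        amount = min(gain / dist, 0.5 * dist)  # larger amount when closer
        adjusted[i] += amount / 2              # pull the pair together
        adjusted[i + 1] -= amount / 2
    return adjusted

lsps = [0.3, 0.35, 1.0, 1.9, 2.0]              # close pairs at both ends
out = adjust_lsps(lsps)
```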
-
Patent number: 7233900. Abstract: The present invention relates to a word sequence output device that can output emotional synthetic speech. A text generating unit 31 generates spoken text for synthetic speech by using text as a word sequence included in action command information in accordance with the action command information. An emotion checking unit 39 checks an emotion model value and determines whether or not the emotion of a robot is aroused based on the emotion model value. Further, when the emotion of the robot is aroused, the emotion checking unit 39 instructs the text generating unit 31 to change the word order. The text generating unit 31 changes the word order of the spoken text in accordance with the instructions from the emotion checking unit 39. Accordingly, when the spoken text is "Kimi wa kirei da." (You are beautiful.), the word order is changed to make the sentence "Kirei da, kimi wa." (You are beautiful, you are.). Type: Grant. Filed: April 5, 2002. Date of Patent: June 19, 2007. Assignee: Sony Corporation. Inventor: Shinichi Kariya.
-
Patent number: 7225129. Abstract: A method of modeling speech distinctions within computer-animated talking heads that utilize the manipulation of speech production articulators for selected speech segments. Graphical representations of voice characteristics and speech production characteristics are generated in response to said speech segment. By way of example, breath images are generated such as particle-cloud images, and particle-stream images to represent the voiced characteristics such as the presence of stops and fricatives, respectively. The coloring on exterior portions of the talking head is displayed in response to selected voice characteristics such as nasality. The external physiology of the talking head is modulated, such as by changing the width and movement of the nose, the position of the eyebrows, and movement of the throat in response to the voiced speech characteristics such as pitch, nasality, and voicebox vibration, respectively. Type: Grant. Filed: September 20, 2001. Date of Patent: May 29, 2007. Assignee: The Regents of the University of California. Inventors: Dominic W. Massaro, Michael M. Cohen, Jonas Beskow.
-
Patent number: 7219064. Abstract: To provide a robot which autonomously forms and performs an action plan in response to external factors without direct command input from an operator. When reading a story printed in a book or other print media or recorded in recording media, or when reading a story downloaded through a network, the robot does not simply read every single word as it is written. Instead, the robot uses external factors, such as a change of time, a change of season, or a change in a user's mood, and dynamically alters the story as long as the changed contents are substantially the same as the original contents. As a result, the robot can read aloud the story whose contents would differ every time the story is read. Type: Grant. Filed: October 23, 2001. Date of Patent: May 15, 2007. Assignee: Sony Corporation. Inventors: Hideki Nakakita, Tomoaki Kasuga.
-
Patent number: 7184958. Abstract: A speech synthesis method subjects a reference speech signal to windowing, using a window function whose length is double a pitch period of the reference speech signal, to extract a speech pitch wave from the reference speech signal. A linear prediction coefficient is generated by subjecting the reference speech signal to a linear prediction analysis. The speech pitch wave is subjected to inverse filtering based on the linear prediction coefficient to produce a residual pitch wave, which is then stored in a storage as information of a speech synthesis unit in a voiced period. Speech is then synthesized using the information of the speech synthesis unit. Type: Grant. Filed: March 5, 2004. Date of Patent: February 27, 2007. Assignee: Kabushiki Kaisha Toshiba. Inventors: Takehiko Kagoshima, Masami Akamine.
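The analysis side of that pipeline can be sketched end to end with standard signal-processing building blocks. Everything concrete here is an assumption for illustration: the synthetic sine-wave "reference signal", the 40-sample pitch period, the Hanning window, and the order-2 autocorrelation/Levinson-Durbin LPC fit stand in for whatever the patent's implementation uses.

```python
# Sketch of the pipeline: cut a pitch wave from a reference signal with
# a window twice the pitch period long, fit LPC coefficients to the
# reference, then inverse-filter the pitch wave to get a residual.

import math

def hanning(n):
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def lpc(signal, order):
    """LPC coefficients a[1..order] via autocorrelation + Levinson-Durbin."""
    n = len(signal)
    r = [sum(signal[i] * signal[i + k] for i in range(n - k))
         for k in range(order + 1)]
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a, err = new_a, (1 - k * k) * err
    return a   # prediction: s[n] ~ sum_j a[j] * s[n-j]

def inverse_filter(x, a):
    """Residual e[n] = x[n] - sum_j a[j] * x[n-j]."""
    return [x[n] - sum(a[j] * x[n - j] for j in range(1, len(a)) if n - j >= 0)
            for n in range(len(x))]

period = 40                                    # assumed pitch period (samples)
ref = [math.sin(2 * math.pi * i / period) for i in range(400)]
win = hanning(2 * period)                      # window = double pitch period
pitch_wave = [w * s for w, s in zip(win, ref[100:100 + 2 * period])]
residual = inverse_filter(pitch_wave, lpc(ref, order=2))
```

A sinusoid is almost perfectly predicted by an order-2 LPC model, so the residual carries far less energy than the pitch wave, which is the point of storing the residual as the synthesis unit.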
-
Patent number: 7162417. Abstract: An amplitude altering magnification (r) to be applied to sub-phoneme units of a voiced portion and an amplitude altering magnification (s) to be applied to sub-phoneme units of an unvoiced portion are determined based upon a target phoneme average power (p0) of synthesized speech and the power (p) of a selected phoneme unit. Sub-phoneme units are extracted from a phoneme to be synthesized. From among the extracted sub-phoneme units, a sub-phoneme unit of the voiced portion is multiplied by the amplitude altering magnification (r), and a sub-phoneme unit of the unvoiced portion is multiplied by the amplitude altering magnification (s). Synthesized speech is obtained using the sub-phoneme units thus obtained. This makes it possible to realize power control in which any decline in the quality of synthesized speech is reduced. Type: Grant. Filed: July 13, 2005. Date of Patent: January 9, 2007. Assignee: Canon Kabushiki Kaisha. Inventors: Masayuki Yamada, Yasuhiro Komori, Mitsuru Otsuka.
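Since power scales with the square of amplitude, hitting a target average power p0 from a unit power p suggests an amplitude magnification of sqrt(p0/p). The sketch below uses that common relationship, with a separate, damped magnification for unvoiced units; the sqrt formula and the 0.5 damping factor are assumptions for illustration, not the patent's actual determination of (r) and (s).

```python
# Hypothetical power-control sketch: full amplitude correction for
# voiced sub-phoneme units, partial correction for unvoiced ones to
# limit audible quality loss.

import math

def amplitude_magnifications(target_power, unit_power, unvoiced_damping=0.5):
    """Return (r, s): voiced and unvoiced amplitude altering magnifications."""
    r = math.sqrt(target_power / unit_power)   # full correction for voiced
    s = 1.0 + unvoiced_damping * (r - 1.0)     # damped correction, unvoiced
    return r, s

r, s = amplitude_magnifications(target_power=4.0, unit_power=1.0)
voiced_out = [r * x for x in [0.1, -0.2]]      # voiced sub-phoneme samples
unvoiced_out = [s * x for x in [0.05, 0.03]]   # unvoiced sub-phoneme samples
```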
-
Patent number: 7113909. Abstract: A stereotypical sentence is synthesized into a voice of an arbitrary speech style. A third party is able to prepare prosody data, and a user of a terminal device having a voice synthesizing part can acquire the prosody data. The voice synthesizing method determines a voice-contents identifier to point to a type of voice contents of a stereotypical sentence, prepares a speech style dictionary including speech style and prosody data which correspond to the voice-contents identifier, selects prosody data of the synthesized voice to be generated from the speech style dictionary, and adds the selected prosody data to a voice synthesizer as voice-synthesizer driving data to thereby perform voice synthesis with a specific speech style. Thus, a voice of a stereotypical sentence can be synthesized with an arbitrary speech style. Type: Grant. Filed: July 31, 2001. Date of Patent: September 26, 2006. Assignee: Hitachi, Ltd. Inventors: Nobuo Nukaga, Kenji Nagamatsu, Yoshinori Kitahara.
-
Patent number: 7085718. Abstract: It is suggested to include application speech (AS) in the set of identification speech data (ISD) used for training a speaker-identification process, so as to make possible a reduction of the set of initial identification speech data (IISD) to be collected within an initial enrolment phase, and therefore to add more convenience for the user to be registered or enrolled. Type: Grant. Filed: May 6, 2002. Date of Patent: August 1, 2006. Assignee: Sony Deutschland GmbH. Inventor: Thomas Kemp.
-
Patent number: 7082395. Abstract: A means and method are provided for enhancing or replacing the natural excitation of the human vocal tract by artificial excitation means, wherein the artificially created acoustics present additional spectral, temporal, or phase data useful for (1) enhancing the machine recognition robustness of audible speech or (2) enabling more robust machine recognition of relatively inaudible mouthed or whispered speech. The artificial excitation (a) may be arranged to be audible or inaudible, (b) may be designed to be non-interfering with another user's similar means, (c) may be used in one or both of a vocal content-enhancement mode or a complementary vocal tract-probing mode, and/or (d) may be used for the recognition of audible or inaudible continuous speech or isolated spoken commands. Type: Grant. Filed: October 3, 2002. Date of Patent: July 25, 2006. Inventors: Carol A. Tosaya, John W. Sliwa, Jr.
-
Patent number: 6993484. Abstract: An amplitude altering magnification (r) to be applied to sub-phoneme units of a voiced portion and an amplitude altering magnification (s) to be applied to sub-phoneme units of an unvoiced portion are determined based upon a target phoneme average power (p0) of synthesized speech and the power (p) of a selected phoneme unit. Sub-phoneme units are extracted from a phoneme to be synthesized. From among the extracted sub-phoneme units, a sub-phoneme unit of the voiced portion is multiplied by the amplitude altering magnification (r), and a sub-phoneme unit of the unvoiced portion is multiplied by the amplitude altering magnification (s). Synthesized speech is obtained using the sub-phoneme units thus obtained. This makes it possible to realize power control in which any decline in the quality of synthesized speech is reduced. Type: Grant. Filed: August 30, 1999. Date of Patent: January 31, 2006. Assignee: Canon Kabushiki Kaisha. Inventors: Masayuki Yamada, Yasuhiro Komori, Mitsuru Otsuka.
-
Patent number: 6990451. Abstract: A method of making a digital voice library utilized for converting text to concatenated voice in accordance with a set of playback rules includes generating a complex tone that reflects a particular inflection required for a particular voice recording of a particular speech item. The complex tone is composed of portions of a recording of a voice talent uttering a vocal sequence. The voice talent is recorded reciting the particular speech item to make the particular voice recording. The voice talent uses the complex tone as a guide to allow the voice talent to recite the particular speech item in accordance with the particular inflection. Type: Grant. Filed: June 1, 2001. Date of Patent: January 24, 2006. Assignee: Qwest Communications International Inc. Inventors: Eliot M. Case, Richard P. Phillips.
-
Patent number: 6990450. Abstract: A method for converting text to concatenated voice by utilizing a digital voice library and a set of playback rules is provided. Multiple voice recordings correspond to a single speech item and represent various inflections of that single speech item. The method includes determining syllable count and impact value for each speech item in a sequence of speech items. A desired inflection for each speech item is determined based on the syllable count and the impact value, and further based on a set of playback rules. A sequence of voice recordings is determined by determining a voice recording for each speech item based on the desired inflection and based on the available voice recordings that correspond to the particular speech item. Voice data are generated based on a sequence of voice recordings by concatenating adjacent recordings in the sequence of voice recordings. Type: Grant. Filed: March 27, 2001. Date of Patent: January 24, 2006. Assignee: Qwest Communications International Inc. Inventors: Eliot M. Case, Judith L. Weirauch, Richard P. Phillips.
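The rule-driven inflection selection above can be sketched as a small lookup: estimate a syllable count, then let a playback rule map syllable count, impact value, and sentence position to a desired inflection. Everything concrete here is invented for illustration: the vowel-group syllable heuristic, the rule thresholds, and the inflection names are assumptions, not the patent's playback rules.

```python
# Toy sketch of playback-rule inflection selection for a speech-item
# sequence. All thresholds and labels are hypothetical.

def count_syllables(word):
    """Crude vowel-group syllable estimate (an assumed heuristic)."""
    vowels, count, prev = "aeiouy", 0, False
    for ch in word.lower():
        cur = ch in vowels
        if cur and not prev:
            count += 1
        prev = cur
    return max(count, 1)

def desired_inflection(word, impact, position, length):
    syllables = count_syllables(word)
    if position == length - 1:
        return "falling"               # assumed rule: final items fall
    if impact > 5 and syllables <= 2:
        return "emphatic"              # assumed rule: short, high-impact
    return "neutral"

items = [("your", 1), ("bill", 7), ("is", 1), ("overdue", 8)]
seq = [desired_inflection(w, imp, i, len(items))
       for i, (w, imp) in enumerate(items)]
```

In a full system each (speech item, inflection) pair would then index into the digital voice library and the chosen recordings would be concatenated.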
-
Patent number: 6970820. Abstract: The speech synthesizer is personalized to sound like or mimic the speech characteristics of an individual speaker. The individual speaker provides a quantity of enrollment data, which can be extracted from a short quantity of speech, and the system modifies the base synthesis parameters to more closely resemble those of the new speaker. More specifically, the synthesis parameters may be decomposed into speaker-dependent parameters, such as context-independent parameters, and speaker-independent parameters, such as context-dependent parameters. The speaker-dependent parameters are adapted using enrollment data from the new speaker. After adaptation, the speaker-dependent parameters are combined with the speaker-independent parameters to provide a set of personalized synthesis parameters. Type: Grant. Filed: February 26, 2001. Date of Patent: November 29, 2005. Assignee: Matsushita Electric Industrial Co., Ltd. Inventors: Jean-Claude Junqua, Florent Perronnin, Roland Kuhn, Patrick Nguyen.
-
Patent number: 6950799. Abstract: A speech processing system modifies various aspects of input speech according to a user-selected one of various preprogrammed voice fonts. Initially, the speech converter receives a formants signal representing an input speech signal and a pitch signal representing the input signal's fundamental frequency. One or both of the following may also be received: a voicing signal comprising an indication of whether the input speech signal is voiced, unvoiced, or mixed, and/or a gain signal representing the input speech signal's energy. The speech converter also receives user selection of one of multiple preprogrammed voice fonts, each specifying a manner of modifying one or more of the received signals (i.e., formants, voicing, pitch, gain). The speech converter modifies at least one of the formants, voicing, pitch, and/or gain signals as specified by the selected voice font. Type: Grant. Filed: February 19, 2002. Date of Patent: September 27, 2005. Assignee: Qualcomm Inc. Inventors: Ning Bi, Andrew P. DeJaco.
-
Patent number: 6810379. Abstract: A client/server text-to-speech synthesis system and method divides the method optimally between client and server. The server stores large databases for pronunciation analysis, prosody generation, and acoustic unit selection corresponding to a normalized text, while the client performs computationally intensive decompression and concatenation of selected acoustic units to generate speech. The units are transmitted from the server to the client in a highly compressed format, with a compression method selected based on the predetermined set of potential acoustic units. This compression method allows very high-quality and natural-sounding speech to be output at the client machine. Type: Grant. Filed: April 24, 2001. Date of Patent: October 26, 2004. Assignee: Sensory, Inc. Inventors: Pieter Vermeulen, Todd F. Mozer.
-
Patent number: 6801894. Abstract: A speech synthesizer includes a data memory having a plurality of address areas, which stores a plurality of phases in the address areas, and an address designating circuit designating one of the address areas based on a phase signal. The speech synthesizer further includes a speech synthesizing circuit generating a speech synthesizing signal corresponding to the phase stored in the designated area, a digital/analog converter transforming the speech synthesizing signal to an analog signal having amplitude, and a counter setting a period of silence. The speech synthesizer also includes a silence-input circuit connected between the speech synthesizing circuit and the digital/analog converter, which supplies a predetermined voltage to the digital/analog converter for the period that is set by the counter. Type: Grant. Filed: March 22, 2001. Date of Patent: October 5, 2004. Assignee: Oki Electric Industry Co., Ltd. Inventors: Yoshihisa Nakamura, Hiroaki Matsubara.
-
Publication number: 20030061050. Abstract: A means and method are provided for enhancing or replacing the natural excitation of the human vocal tract by artificial excitation means, wherein the artificially created acoustics present additional spectral, temporal, or phase data useful for (1) enhancing the machine recognition robustness of audible speech or (2) enabling more robust machine recognition of relatively inaudible mouthed or whispered speech. The artificial excitation (a) may be arranged to be audible or inaudible, (b) may be designed to be non-interfering with another user's similar means, (c) may be used in one or both of a vocal content-enhancement mode or a complementary vocal tract-probing mode, and/or (d) may be used for the recognition of audible or inaudible continuous speech or isolated spoken commands. Type: Application. Filed: November 27, 2002. Publication date: March 27, 2003. Inventors: Carol A. Tosaya, John W. Sliwa.
-
Patent number: 6477496. Abstract: A method, system and product are provided for synthesizing sound using encoded audio signals having a plurality of frequency subbands, each subband having a scale factor and sample data associated therewith. The method includes selecting a spectral envelope, and selecting a plurality of frequency subbands, each subband having sample data associated therewith. The method also includes generating a synthetic encoded audio signal having a plurality of frequency subbands, the subbands having the selected spectral envelope and the selected sample data. The system includes control logic for performing the method. The product includes a storage medium having computer-readable programmed instructions for performing the method. Type: Grant. Filed: December 20, 1996. Date of Patent: November 5, 2002. Inventor: Eliot M. Case.
-
Patent number: 6463412. Abstract: A high performance voice transformation apparatus and method is provided in which voice input is transformed into a symbolic representation of phonemes in the voice input. The symbolic representation is used to retrieve output voice segments of a selected target speaker for use in outputting the voice input in a different voice. In addition, voice input characteristics are extracted from the voice input and are then applied to the output voice segments to thereby provide a more realistic human-sounding voice output. Type: Grant. Filed: December 16, 1999. Date of Patent: October 8, 2002. Assignee: International Business Machines Corporation. Inventors: Jason Raymond Baumgartner, Steven Leonard Roberts, Nadeem Malik, Flemming Andersen.
-
Patent number: 6453287. Abstract: A system and method for enhancing the speech quality of the mixed excitation linear predictive (MELP) coder and other low bit-rate speech coders. The system and method employ a plosive analysis/synthesis method, which detects the frame containing a plosive signal, applies a simple model to synthesize the plosive signal, and adds the synthesized plosive to the coded speech. The system and method remain compatible with the existing MELP coder bit stream. Type: Grant. Filed: September 29, 1999. Date of Patent: September 17, 2002. Assignee: Georgia Tech Research Corporation. Inventors: Takahiro Unno, Thomas P. Barnwell, III, Kwan K. Truong.
-
Patent number: 6347298
Abstract: A computerized method and apparatus for reducing the size of a dictionary used in a text-to-speech synthesis system are provided. In an initial phase, the method and apparatus determine if entries in the dictionary, each containing a grapheme string and a corresponding phoneme string, can be fully matched by using at least one rule set used to synthesize words to phonemic data. If the entry can be fully matched using rule processing alone, the entry is indicated to be deleted from the dictionary. In a second phase, the method and apparatus determine if the entry, considered as a root word entry, is required in the dictionary in order to support phoneme synthesis of other entries containing the root word entry, and if so, the root word entry is indicated to be saved in the dictionary.
Type: Grant
Filed: February 26, 2001
Date of Patent: February 12, 2002
Assignee: Compaq Computer Corporation
Inventors: Anthony J. Vitale, Ginger Chun-Che Lin, Thomas Kopec
-
Patent number: 6317713
Abstract: A sound generation device includes character string analysis for analyzing a character string and generating commands concerning phoneme and prosody; a calculating element for calculating fundamental frequency, which depends on prosody, and outputting it as a sound generation parameter, using an accent command and a descent command and incorporating a rhythm command representable by a sine wave; a sound source generator; and an articulator that depends on the phoneme command.
Type: Grant
Filed: January 6, 1999
Date of Patent: November 13, 2001
Assignee: Arcadia, Inc.
Inventor: Seiichi Tenpaku
-
Patent number: 6208968
Abstract: A computerized method and apparatus for reducing the size of a dictionary used in a text-to-speech synthesis system are provided. In an initial phase, the method and apparatus determine if entries in the dictionary, each containing a grapheme string and a corresponding phoneme string, can be fully matched by using at least one rule set used to synthesize words to phonemic data. If the entry can be fully matched using rule processing alone, the entry is indicated to be deleted from the dictionary. In a second phase, the method and apparatus determine if the entry, considered as a root word entry, is required in the dictionary in order to support phoneme synthesis of other entries containing the root word entry, and if so, the root word entry is indicated to be saved in the dictionary.
Type: Grant
Filed: December 16, 1998
Date of Patent: March 27, 2001
Assignee: Compaq Computer Corporation
Inventors: Anthony J. Vitale, Ginger Chun-Che Lin, Thomas Kopec
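The two-phase pruning described in the abstract can be sketched in a few lines. This is a toy illustration, not the patented implementation: the rule set, the lexicon, and the prefix test for "contains the root word" are all invented stand-ins.

```python
# Toy sketch of the two-phase dictionary-reduction idea (names hypothetical):
# phase 1 deletes entries whose pronunciation the rules already predict;
# phase 2 keeps root words that other surviving entries still depend on.

def rules_predict(grapheme):
    """Stand-in letter-to-sound rule set: naive one-letter-per-phoneme map."""
    table = {"c": "K", "a": "AE", "t": "T", "s": "S", "n": "N", "e": "IY",
             "o": "OW"}
    return " ".join(table.get(ch, "?") for ch in grapheme)

def reduce_dictionary(lexicon):
    """lexicon: {grapheme: phoneme string}. Returns the pruned lexicon."""
    # Phase 1: mark entries fully matched by rule processing alone.
    deletable = {g for g, p in lexicon.items() if rules_predict(g) == p}
    # Phase 2: keep a deletable root if some surviving entry is derived from
    # it (here crudely: the root is a prefix of a longer surviving entry).
    kept = set(lexicon) - deletable
    for root in list(deletable):
        if any(other != root and other.startswith(root) for other in kept):
            deletable.discard(root)
    return {g: p for g, p in lexicon.items() if g not in deletable}

lex = {"cat": "K AE T",        # rule-predictable, but "cats" needs the root
       "cats": "K AE T Z",     # irregular under the toy rules, so it stays
       "neon": "N IY AA N"}    # also not rule-predictable
print(reduce_dictionary(lex))  # all three entries survive
```

With a regular plural ("K AE T S") and no dependent entries, "cat" would be dropped, which is the storage saving the patent is after.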
-
Patent number: 6195632
Abstract: An iterative formant analysis, based on minimizing the arc-length of various curves under various filter constraints, estimates formant frequencies with desirable properties for text-to-speech applications. A class of arc-length cost functions may be employed; some of these have analytic solutions and thus lend themselves well to applications requiring speed and reliability. The arc-length inverse filtering techniques are inherently pitch synchronous and are useful in realizing high quality pitch tracking and pitch epoch marking.
Type: Grant
Filed: November 25, 1998
Date of Patent: February 27, 2001
Assignee: Matsushita Electric Industrial Co., Ltd.
Inventor: Steve Pearson
-
Patent number: 6122616
Abstract: The present invention improves upon electronic speech synthesis using pre-recorded segments of speech to fill in for other missing segments of speech. The formalized aliasing approach of the present invention overcomes the ad hoc aliasing approach of the prior art which oftentimes generated less than satisfactory speech synthesis sound output. By formalizing the relationship between missing speech sound samples and available speech sound samples, the present invention provides a structured approach to aliasing which results in improved synthetic speech sound quality. Further, the formalized aliasing approach of the present invention can be used to lessen storage requirements for speech sound samples by only storing as many sound samples as memory capacity can support.
Type: Grant
Filed: July 3, 1996
Date of Patent: September 19, 2000
Assignee: Apple Computer, Inc.
Inventor: Caroline G. Henton
-
Patent number: 6101469
Abstract: For use in a synthesizer having a wave source that produces a periodic wave, frequency shifting circuitry for frequency-shifting the periodic wave and waveshaping circuitry for transforming the periodic wave into a waveform containing a formant, the frequency-shifting causing displacement of the formant, a circuit for, and method of, compensating for the displacement and a synthesizer employing the circuit or the method. In one embodiment, the circuit includes bias circuitry, coupled to the wave source and the frequency shifting circuitry, that introduces a bias into the periodic wave based on a degree to which the frequency shifting circuitry frequency shifts the periodic wave, the bias reducing a degree to which the formant is correspondingly frequency-shifted.
Type: Grant
Filed: March 2, 1998
Date of Patent: August 8, 2000
Assignee: Lucent Technologies Inc.
Inventor: Steven D. Curtin
-
Patent number: 6044345
Abstract: Human speech is coded by singling out, from a transfer function of the speech, all poles that are unrelated to any particular resonance of a human vocal tract model; all other poles are maintained. A glottal pulse related sequence representing the singled-out poles is defined through an explicitation of the derivative of the glottal air flow. Speech is output by a filter based on combining the glottal pulse related sequence and a representation of a formant filter with a complex transfer function expressing all other poles. The glottal pulse sequence is modelled through further explicitly expressible generation parameters. In particular, a non-zero decaying return phase is supplemented to the glottal-pulse response, which is explicitized in all its parameters, while the overall response is amended in accordance with volumetric continuity.
Type: Grant
Filed: April 17, 1998
Date of Patent: March 28, 2000
Assignee: U.S. Philips Corporation
Inventor: Raymond N. J. Veldhuis
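The "decaying return phase plus volumetric continuity" idea can be illustrated with a toy glottal-flow-derivative pulse. This is a rough LF-style sketch, not the patented parameterization: the growth rate, the sine period, and the return time constant are invented, and continuity is enforced by simply rescaling the return phase so the derivative integrates to zero over the period (i.e., the flow ends the cycle at the baseline it started from).

```python
import math

def lf_like_pulse(n=200, te_frac=0.6, growth=3.0, return_tc=10.0):
    """One period of a toy glottal flow derivative: an exponentially
    growing sinusoid up to the closure instant te, then a decaying
    exponential return phase rescaled for volumetric continuity."""
    te = int(n * te_frac)
    # Open phase: exp-growing sine that ends negative at closure.
    open_phase = [math.exp(growth * i / te) * math.sin(1.2 * math.pi * i / te)
                  for i in range(te)]
    ee = -open_phase[-1]  # magnitude of the negative excursion at closure
    # Return phase: non-zero decaying exponential back toward zero.
    ret = [-ee * math.exp(-(i + 1) / return_tc) for i in range(n - te)]
    # Volumetric continuity: scale the return phase so the whole cycle
    # integrates to ~0 (net glottal flow change over one period is zero).
    s_open, s_ret = sum(open_phase), sum(ret)
    if s_ret:
        ret = [x * (-s_open / s_ret) for x in ret]
    return open_phase + ret

pulse = lf_like_pulse()
print(round(sum(pulse), 9))  # ~0: flow returns to baseline each period
```

A train of such pulses, filtered by the formant filter the abstract describes, would form the voiced excitation path of the coder.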
-
Patent number: 6012028
Abstract: The text to speech conversion system distinguishes geographical names based upon the present position and includes a text input unit for inputting text data, a position coordinator input unit for inputting present location information of the text to speech conversion system, and a text normalizer, connected to the text input unit and the position coordinator input unit, capable of generating a plurality of pronunciation signals indicative of a plurality of pronunciations for a common portion of the text data, the text normalizer selecting one of the pronunciation signals based upon the present location information.
Type: Grant
Filed: January 28, 1998
Date of Patent: January 4, 2000
Assignee: Ricoh Company, Ltd.
Inventors: Syuji Kubota, Yuichi Kojima
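The selection step can be sketched as a lookup keyed by distance to the current position. Everything here is invented for illustration (the data format, the example readings, and the crude squared-degree distance); the patent's actual normalizer is not reproduced.

```python
# Hypothetical sketch of position-aware pronunciation selection: a place
# name with several readings resolves to whichever candidate region lies
# closest to the device's current coordinates.

PRONUNCIATIONS = {  # invented example data (lat, lon in degrees)
    "Nihonbashi": [
        {"region": (35.68, 139.77), "reading": "Nihombashi (Tokyo)"},
        {"region": (34.69, 135.50), "reading": "Nipponbashi (Osaka)"},
    ],
}

def choose_reading(word, here):
    """Pick the pronunciation whose region is nearest to `here`.
    Squared-degree distance is a crude stand-in for real geodesics."""
    candidates = PRONUNCIATIONS.get(word)
    if not candidates:
        return None  # unambiguous word: fall through to normal rules

    def dist2(cand):
        (lat, lon), (hlat, hlon) = cand["region"], here
        return (lat - hlat) ** 2 + (lon - hlon) ** 2

    return min(candidates, key=dist2)["reading"]

print(choose_reading("Nihonbashi", (35.7, 139.8)))  # Nihombashi (Tokyo)
```

A car navigation system near Osaka would get the other reading from the same written text, which is exactly the ambiguity the abstract targets.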
-
Patent number: 6006187
Abstract: The present invention discloses a computer prosody user interface operable to visually tailor the prosody of a text to be uttered by a text-to-speech system. The prosody user interface permits users to alter a synthesized voice along one or more dimensions on a word-by-word basis. In one embodiment of the present invention, the prosody user interface is operable to alter the speaking rate, relative word duration, and word prominence of a synthesized voice. Specifically, one or more words are selected using presentation means, and speech parameters corresponding to the speaking rate, relative word duration, and word prominence are manipulated using speech parameter manipulation means. Modifications to the speech parameters are accompanied by visual changes to the presentation means, thereby providing a visual feel to the computer prosody user interface.
Type: Grant
Filed: October 1, 1996
Date of Patent: December 21, 1999
Assignee: Lucent Technologies Inc.
Inventor: Michael Abraham Tanenblatt
-
Patent number: 5995932
Abstract: A training system used while a person is speaking uses a feedback modification technique to reduce accents. As the speaker is speaking, the system feeds back to the speaker the speaker's speech in "real-time" so that the speaker, in effect, hears what he or she is saying while saying it. The system includes a detector configured to monitor a speaker's speech to detect a preselected target vowel sound that the speaker wishes to produce accurately. In response to the detector detecting a "target" vowel sound, a cue generator generates a sensory cue (e.g., an amplification of the "target" vowel sound) that is perceived by the speaker. As the speaker is speaking, the generator feeds back to the speaker the sensory cue along with the speech so that the cue is coincident with the "target" vowel sound.
Type: Grant
Filed: December 31, 1997
Date of Patent: November 30, 1999
Assignee: Scientific Learning Corporation
Inventor: John F. Houde
-
Patent number: 5983178
Abstract: A speaker clustering apparatus generates HMMs for clusters based on feature quantities of a vocal-tract configuration of speech waveform data, and a speech recognition apparatus provided with the speaker clustering apparatus. In response to the speech waveform data of N speakers, an estimator estimates feature quantities of vocal-tract configurations, with reference to correspondence between vocal-tract configuration parameters and formant frequencies predetermined based on a predetermined vocal tract model of a standard speaker. Further, a clustering processor calculates speaker-to-speaker distances between the N speakers based on the feature quantities of the vocal-tract configurations of the N speakers as estimated, and clusters the vocal-tract configurations of the N speakers using a clustering algorithm based on calculated speaker-to-speaker distances, thereby generating K clusters.
Type: Grant
Filed: December 10, 1998
Date of Patent: November 9, 1999
Assignee: ATR Interpreting Telecommunications Research Laboratories
Inventors: Masaki Naito, Li Deng, Yoshinori Sagisaka
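The clustering step (distances between speakers' vocal-tract feature vectors, merged into K clusters) can be sketched with a minimal agglomerative pass. The feature estimation from formants is not reproduced; the feature vectors, the centroid linkage, and the Euclidean distance are all illustrative choices, not necessarily those of the patent.

```python
# Minimal agglomerative sketch: speakers are merged by smallest
# centroid-to-centroid distance until K clusters remain.

def cluster_speakers(features, k):
    """features: {speaker: vocal-tract feature vector}; returns K sets."""
    clusters = [{s} for s in features]

    def centroid(cluster):
        vecs = [features[s] for s in cluster]
        return [sum(col) / len(vecs) for col in zip(*vecs)]

    def dist(a, b):
        ca, cb = centroid(a), centroid(b)
        return sum((x - y) ** 2 for x, y in zip(ca, cb)) ** 0.5

    while len(clusters) > k:
        # Find and merge the closest pair of clusters.
        i, j = min(((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] |= clusters.pop(j)
    return clusters

speakers = {"A": [1.0, 0.1], "B": [1.1, 0.0],   # similar vocal tracts
            "C": [5.0, 4.9], "D": [5.2, 5.0]}   # a second group
print(cluster_speakers(speakers, 2))
```

In the patent, each resulting cluster then gets its own HMM set, so recognition can pick the model whose speakers' vocal tracts best match the current user.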
-
Patent number: 5905970
Abstract: In a speech coding device for coding an input speech with an AbS (Analysis by Synthesis) system and one of a forward type and a backward type configuration, a vocal tract prediction coefficient generating circuit produces a vocal tract prediction coefficient from one of an input speech signal and a locally reproduced synthetic speech signal. A speech synthesizing circuit produces a synthetic speech signal by using codes stored in an excitation codebook in one-to-one correspondence with indexes, and the vocal tract prediction coefficient. A comparing circuit compares the synthetic speech signal and input speech signal to thereby output an error signal. A perceptual weighting circuit weights the error signal to thereby output a perceptually weighted signal. A codebook index selecting circuit selects an optimal index for the excitation codebook out of at least the weighted signal, and feeds the optimal index to the excitation codebook.
Type: Grant
Filed: December 11, 1996
Date of Patent: May 18, 1999
Assignee: Oki Electric Industry Co., Ltd.
Inventor: Hiromi Aoyagi
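The analysis-by-synthesis loop in the abstract reduces to: synthesize each codebook entry through the vocal-tract filter, compare against the input, and pick the index with the least error. The sketch below is a bare-bones illustration with a one-tap filter and unweighted error; the patented coder uses a full LPC filter plus the perceptual weighting circuit.

```python
# Toy analysis-by-synthesis codebook search (one-tap filter, no
# perceptual weighting): pick the excitation index whose synthetic
# speech is closest to the input.

def synthesize(excitation, a):
    """One-tap LPC synthesis filter: s[n] = e[n] + a * s[n-1]."""
    out, prev = [], 0.0
    for e in excitation:
        prev = e + a * prev
        out.append(prev)
    return out

def best_index(codebook, target, a):
    """Return the codebook index minimizing squared synthesis error."""
    def err(entry):
        return sum((s - t) ** 2 for s, t in zip(synthesize(entry, a), target))
    return min(range(len(codebook)), key=lambda i: err(codebook[i]))

codebook = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, -0.5, 0.0]]
target = [1.0, 0.5, 0.25]  # impulse response of the a=0.5 filter
print(best_index(codebook, target, a=0.5))  # prints 0: exact match
```

Only the winning index (plus the quantized prediction coefficients) is transmitted, which is where the bit-rate saving of CELP-style coders comes from.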
-
Patent number: 5890118
Abstract: A speech synthesis apparatus includes: a memory for storing a plurality of typical waveforms corresponding to a plurality of frames, the typical waveforms each previously obtained by extracting in units of at least one frame from a prediction error signal formed in predetermined units; a voiced speech source generator including an interpolation circuit for performing interpolation between the typical waveforms read out from the memory means to obtain a plurality of interpolation signals each having at least one of an interpolation pitch period and a signal level which changes smoothly between the corresponding frames; a superposition circuit for superposing the interpolation signals obtained by the interpolation circuit to form a voiced speech source signal; an unvoiced speech source generator for generating an unvoiced speech source signal; and a vocal tract filter selectively driven by the voiced speech source signal outputted from the voiced speech source generator and the unvoiced speech source signal from the unvoiced speech source generator.
Type: Grant
Filed: March 8, 1996
Date of Patent: March 30, 1999
Assignee: Kabushiki Kaisha Toshiba
Inventors: Takehiko Kagoshima, Masami Akamine
-
Patent number: 5876213
Abstract: A karaoke apparatus is constructed to perform a karaoke accompaniment part and a karaoke harmony part for accompanying a live vocal part. A pickup device collects a singing voice of the live vocal part. A detector device analyzes the collected singing voice to detect a musical register thereof at which the live vocal part is actually performed. A harmony generator device generates a harmony voice of the karaoke harmony part according to the detected musical register so that the karaoke harmony part is made consonant with the live vocal part. A tone generator device generates an instrumental tone of the karaoke accompaniment part in parallel to the karaoke harmony part.
Type: Grant
Filed: July 30, 1996
Date of Patent: March 2, 1999
Assignee: Yamaha Corporation
Inventor: Shuichi Matsumoto
-
Patent number: 5826221
Abstract: In vocal tract prediction coefficient coding and decoding circuitry, a vocal tract prediction coefficient converter/quantizer transforms vocal tract prediction coefficients of consecutive subframes constituting a single frame to corresponding LSP (Line Spectrum Pair) coefficients, quantizes the LSP coefficients, and thereby outputs quantized LSP coefficient values together with indexes assigned thereto. A coding mode decision assumes, e.g., three different coding modes based on the above quantized LSP coefficient values, the quantized LSP coefficient value of the fourth subframe of the previous frame, and the above indexes. The decision determines which coding mode should be used to code the current frame, and outputs mode code information and quantization code information. The circuitry is capable of reproducing high quality, faithful speech without resorting to a high mean coding rate even when the vocal tract prediction coefficient varies noticeably within the frame.
Type: Grant
Filed: October 29, 1996
Date of Patent: October 20, 1998
Assignee: Oki Electric Industry Co., Ltd.
Inventor: Hiromi Aoyagi
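The three-way mode decision can be sketched as a threshold test on how far the current frame's quantized LSPs have drifted from the previous frame's final subframe. The mode names, thresholds, and drift measure below are invented for illustration; the patent's actual decision logic also uses the quantizer indexes.

```python
# Hypothetical sketch of an LSP-based coding mode decision: small drift
# allows cheap differential coding, moderate drift allows interpolation,
# and a large spectral jump forces direct (full-rate) coding.

def decide_mode(prev_lsp, cur_lsp, low=0.02, high=0.10):
    """prev_lsp: quantized LSPs of the previous frame's final subframe;
    cur_lsp: quantized LSPs of the current frame. Thresholds invented."""
    drift = max(abs(c - p) for c, p in zip(cur_lsp, prev_lsp))
    if drift < low:
        return "differential"   # code only small deltas from last frame
    if drift < high:
        return "interpolated"   # interpolate between frame endpoints
    return "direct"             # spectrum moved too much: send full codes

prev = [0.10, 0.25, 0.40, 0.55]
print(decide_mode(prev, [0.11, 0.25, 0.41, 0.55]))  # differential
print(decide_mode(prev, [0.30, 0.45, 0.60, 0.75]))  # direct
```

Spending bits on the cheap modes whenever the spectrum is stable is what keeps the mean coding rate low while still tracking rapid within-frame changes, which is the trade-off the abstract claims.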