Patents by Inventor Ellen M. Eide
Ellen M. Eide has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 8886538
Abstract: Systems and methods for speech synthesis and, in particular, text-to-speech systems and methods for converting a text input to a synthetic waveform by processing prosodic and phonetic content of a spoken example of the text input to accurately mimic the input speech style and pronunciation. Systems and methods provide an interface to a TTS system to allow a user to input a text string and a spoken utterance of the text string, extract prosodic parameters from the spoken input, and process the prosodic parameters to derive corresponding markup for the text input to enable more natural-sounding synthesized speech.
Type: Grant
Filed: September 26, 2003
Date of Patent: November 11, 2014
Assignee: Nuance Communications, Inc.
Inventors: Andy Aaron, Raimo Bakis, Ellen M. Eide, Wael M. Hamza
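A minimal sketch of the idea this abstract describes: prosodic parameters (here, per-word pitch and duration) extracted from a spoken example are turned into markup attached to the text. The function name, the SSML-like tag attributes, and the parameter shapes are illustrative assumptions, not taken from the patent.

```python
def prosody_markup(words, pitches_hz, durations_s):
    """Derive per-word prosody markup (SSML-like; attributes are illustrative)
    from parameters extracted from a spoken example of the same text."""
    parts = []
    for word, f0, dur in zip(words, pitches_hz, durations_s):
        # each word carries the pitch and duration measured in the recording
        parts.append(f'<prosody pitch="{f0:.0f}Hz" duration="{dur:.2f}s">{word}</prosody>')
    return " ".join(parts)
```

Feeding this markup to a TTS engine (rather than the bare text) is what lets the synthesizer mimic the speaker's style.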
-
Patent number: 8725513
Abstract: Methods, apparatus, and products are disclosed for providing expressive user interaction with a multimodal application, the multimodal application operating in a multimodal browser on a multimodal device supporting multiple modes of user interaction including a voice mode and one or more non-voice modes, the multimodal application operatively coupled to a speech engine through a VoiceXML interpreter, including: receiving, by the multimodal browser, user input from a user through a particular mode of user interaction; determining, by the multimodal browser, user output for the user in dependence upon the user input; determining, by the multimodal browser, a style for the user output in dependence upon the user input, the style specifying expressive output characteristics for at least one other mode of user interaction; and rendering, by the multimodal browser, the user output in dependence upon the style.
Type: Grant
Filed: April 12, 2007
Date of Patent: May 13, 2014
Assignee: Nuance Communications, Inc.
Inventors: Charles W. Cross, Jr., Ellen M. Eide, Igor R. Jablokov
-
Patent number: 8438032
Abstract: An embodiment of the invention is a software tool used to convert text, speech synthesis markup language (SSML), and/or extended SSML to synthesized audio. Provisions are provided to create, view, play, and edit the synthesized speech, including editing pitch and duration targets, speaking type, paralinguistic events, and prosody. Prosody can be provided by way of a sample recording. Users can interact with the software tool by way of a graphical user interface (GUI). The software tool can produce synthesized audio file output in many file formats.
Type: Grant
Filed: January 9, 2007
Date of Patent: May 7, 2013
Assignee: Nuance Communications, Inc.
Inventors: Raimo Bakis, Ellen M. Eide, Roberto Pieraccini, Maria E. Smith, Jie Zeng
-
Publication number: 20120095676
Abstract: A driving directions system loads into memory a limited subset of prerecorded, spoken utterances of geographic names from a mass media storage. The subset of spoken utterances may be limited, for example, to the geographic names within a predetermined radius (e.g., a few miles) of the driver's present location. The present location of the driver may be manually entered into the driving directions system by the driver, or automatically determined using a global positioning system ("GPS") receiver. As the vehicle moves from its present location, the driving directions system loads into memory new names from the mass media storage and overwrites, if necessary, those which are now geographically out of range. Based on the current location of the driver, the driving directions system can audibly output geographic names from the run-time memory.
Type: Application
Filed: October 24, 2011
Publication date: April 19, 2012
Applicant: Nuance Communications, Inc.
Inventors: Raimo Bakis, Ellen M. Eide, Wael Hamza
-
Patent number: 8065150
Abstract: A text-to-speech system that includes an arrangement for accepting text input, an arrangement for providing synthetic speech output, and an arrangement for imparting emotion-based features to synthetic speech output. The arrangement for imparting emotion-based features includes an arrangement for accepting instruction for imparting at least one emotion-based paradigm to synthetic speech output, as well as an arrangement for applying at least one emotion-based paradigm to synthetic speech output.
Type: Grant
Filed: July 14, 2008
Date of Patent: November 22, 2011
Assignee: Nuance Communications, Inc.
Inventor: Ellen M. Eide
-
Patent number: 8046213
Abstract: A driving directions system loads into memory a limited subset of prerecorded, spoken utterances of geographic names from a mass media storage. The subset of spoken utterances may be limited, for example, to the geographic names within a predetermined radius (e.g., a few miles) of the driver's present location. The present location of the driver may be manually entered into the driving directions system by the driver, or automatically determined using a global positioning system ("GPS") receiver. As the vehicle moves from its present location, the driving directions system loads into memory new names from the mass media storage and overwrites, if necessary, those which are now geographically out of range. Based on the current location of the driver, the driving directions system can audibly output geographic names from the run-time memory.
Type: Grant
Filed: August 6, 2004
Date of Patent: October 25, 2011
Assignee: Nuance Communications, Inc.
Inventors: Raimo Bakis, Ellen M. Eide, Wael Hamza
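The caching scheme in this abstract (and in the related publication 20120095676 above) can be sketched as follows: only names within a radius of the current position are held in run-time memory, and an update on each position fix loads newly in-range names and evicts out-of-range ones. The class and its data layout are illustrative assumptions; real audio loading is stubbed out.

```python
import math

def haversine_km(a, b):
    # great-circle distance between two (lat, lon) points, in kilometers
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

class UtteranceCache:
    """Keep only the prerecorded name utterances within `radius_km` of the
    vehicle in run-time memory (a sketch; `store` stands in for mass storage)."""

    def __init__(self, store, radius_km=5.0):
        self.store = store          # name -> (lat, lon) index on mass storage
        self.radius_km = radius_km
        self.loaded = {}            # in-memory subset

    def update(self, position):
        # load names that came into range; overwrite/evict those now out of range
        for name, loc in self.store.items():
            in_range = haversine_km(position, loc) <= self.radius_km
            if in_range and name not in self.loaded:
                self.loaded[name] = loc     # stand-in for loading the audio clip
            elif not in_range and name in self.loaded:
                del self.loaded[name]
```

Calling `update` with each new GPS fix keeps the audible-output set bounded regardless of how many names the full map contains.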
-
Patent number: 7966185
Abstract: A text-to-speech system that includes an arrangement for accepting text input, an arrangement for providing synthetic speech output, and an arrangement for imparting emotion-based features to synthetic speech output. The arrangement for imparting emotion-based features includes an arrangement for accepting instruction for imparting at least one emotion-based paradigm to synthetic speech output, as well as an arrangement for applying at least one emotion-based paradigm to synthetic speech output.
Type: Grant
Filed: July 14, 2008
Date of Patent: June 21, 2011
Assignee: Nuance Communications, Inc.
Inventor: Ellen M. Eide
-
Patent number: 7761296
Abstract: A system and method for rescoring the N-best hypotheses from an automatic speech recognition system by comparing an original speech waveform to synthetic speech waveforms that are generated for each text sequence of the N-best hypotheses. A distance is calculated from the original speech waveform to each of the synthesized waveforms, and the text associated with the synthesized waveform that is determined to be closest to the original waveform is selected as the final hypothesis. The original waveform and each synthesized waveform are aligned to a corresponding text sequence on a phoneme level. The mean of the feature vectors which align to each phoneme is computed for the original waveform as well as for each of the synthesized hypotheses.
Type: Grant
Filed: April 2, 1999
Date of Patent: July 20, 2010
Assignee: International Business Machines Corporation
Inventors: Raimo Bakis, Ellen M. Eide
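The rescoring loop this abstract describes can be sketched directly: compute per-phoneme mean feature vectors for the original utterance and for each synthesized hypothesis, then pick the hypothesis text with the smallest summed distance. The data shapes (a frame-level feature matrix plus a per-frame phoneme alignment) and Euclidean distance are illustrative assumptions.

```python
import numpy as np

def phoneme_means(frames, alignment):
    """Mean feature vector per phoneme, given per-frame features and a
    phoneme label for each frame (the phoneme-level alignment)."""
    means = {}
    for ph in set(alignment):
        idx = [i for i, p in enumerate(alignment) if p == ph]
        means[ph] = np.mean([frames[i] for i in idx], axis=0)
    return means

def rescore(original, hypotheses):
    """original: (frames, alignment); hypotheses: list of
    (text, synthesized_frames, alignment). Returns the text whose synthesized
    waveform lies closest to the original, by summed distance between
    per-phoneme mean vectors."""
    orig_means = phoneme_means(*original)
    best_text, best_dist = None, float("inf")
    for text, frames, alignment in hypotheses:
        hyp_means = phoneme_means(frames, alignment)
        shared = orig_means.keys() & hyp_means.keys()
        d = sum(np.linalg.norm(orig_means[p] - hyp_means[p]) for p in shared)
        if d < best_dist:
            best_text, best_dist = text, d
    return best_text
```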
-
Patent number: 7716052
Abstract: A method, apparatus, and computer program product to generate an audible speech word that corresponds to text. The method includes providing a text word and, in response to the text word, processing pre-recorded speech segments that are derived from a plurality of speakers to selectively concatenate together speech segments based on at least one cost function to form audio data for generating an audible speech word that corresponds to the text word. A data structure is also provided for use in a concatenative text-to-speech system that includes a plurality of speech segments derived from a plurality of speakers, where each speech segment has an associated attribute vector comprising at least one attribute vector element that identifies the speaker from which the speech segment was derived.
Type: Grant
Filed: April 7, 2005
Date of Patent: May 11, 2010
Assignee: Nuance Communications, Inc.
Inventors: Andrew S. Aaron, Ellen M. Eide, Wael M. Hamza, Michael A. Picheny, Charles T. Rutherfoord, Zhi Wei Shuang, Maria E. Smith
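A toy sketch of cost-based selection over a multi-speaker segment inventory, as the abstract describes: each candidate segment carries a speaker identity (the attribute-vector element), and the cost function here combines a per-segment target cost with a penalty for switching speakers between adjacent segments. The greedy search and the specific cost terms are illustrative assumptions; a real system would use dynamic programming over richer costs.

```python
def select_segments(phonemes, inventory, speaker_change_cost=1.0):
    """Greedily pick one segment per phoneme. `inventory` maps a phoneme to a
    list of (speaker, target_cost) candidates; a join-cost term penalizes
    changing speakers between consecutive segments."""
    chosen, prev_speaker = [], None
    for ph in phonemes:
        best = min(
            inventory[ph],
            key=lambda seg: seg[1]
            + (speaker_change_cost if prev_speaker not in (None, seg[0]) else 0.0),
        )
        chosen.append((ph, best[0]))    # (phoneme, selected speaker)
        prev_speaker = best[0]
    return chosen
```

With a nonzero `speaker_change_cost`, the selection tends to stay with one speaker even when another speaker's segment has a slightly lower target cost.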
-
Patent number: 7702510
Abstract: Systems and methods for dynamically selecting among text-to-speech (TTS) systems. Exemplary embodiments of the systems and methods include identifying text for converting into a speech waveform, synthesizing said text by three TTS systems, generating a candidate waveform from each of the three systems, generating a score from each of the three systems, comparing the three scores, selecting a score based on a criterion, and selecting one of the three waveforms based on the selected score.
Type: Grant
Filed: January 12, 2007
Date of Patent: April 20, 2010
Assignee: Nuance Communications, Inc.
Inventors: Ellen M. Eide, Raul Fernandez, Wael M. Hamza, Michael A. Picheny
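The selection step reduces to a small function: run the text through each candidate TTS system, collect (waveform, score) pairs, and keep the waveform whose score wins under the chosen criterion. Treating each system as a callable and "higher score is better" are illustrative assumptions.

```python
def select_waveform(text, tts_systems):
    """Synthesize `text` with each system (a callable returning
    (waveform, score)) and return the waveform with the best score.
    The criterion here, assumed for illustration, is highest-score-wins."""
    candidates = [system(text) for system in tts_systems]
    waveform, _ = max(candidates, key=lambda c: c[1])
    return waveform
```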
-
Patent number: 7472065
Abstract: Converting marked-up text into a synthesized stream includes providing marked-up text to a processor-based system, converting the marked-up text into a text stream including vocabulary items, retrieving audio segments corresponding to the vocabulary items, concatenating the audio segments to form a synthesized stream, and audibly outputting the synthesized stream, wherein the marked-up text includes a normal text and a paralinguistic text; and wherein the normal text is differentiated from the paralinguistic text by using a grammar constraint, and wherein the paralinguistic text is associated with more than one audio segment, wherein the retrieving of the plurality of audio segments includes selecting one audio segment associated with the paralinguistic text.
Type: Grant
Filed: June 4, 2004
Date of Patent: December 30, 2008
Assignee: International Business Machines Corporation
Inventors: Andrew S. Aaron, Raimo Bakis, Ellen M. Eide, Wael Hamza
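A sketch of the concatenation step: normal tokens map to a single audio segment, while paralinguistic tokens (e.g. a sigh or laugh marker) map to several recorded variants, one of which is selected at retrieval time. The token/dictionary representation and the `<sigh>` marker are illustrative assumptions; differentiating the two token types here by dictionary membership stands in for the grammar constraint.

```python
import random

def synthesize_stream(tokens, normal_audio, paralinguistic_audio, rng=random):
    """Concatenate audio segment names for a marked-up token stream.
    `normal_audio` maps a vocabulary item to one segment;
    `paralinguistic_audio` maps a marker to a list of variant segments."""
    stream = []
    for token in tokens:
        if token in paralinguistic_audio:
            # paralinguistic text: pick one of its several audio segments
            stream.append(rng.choice(paralinguistic_audio[token]))
        else:
            stream.append(normal_audio[token])
    return stream
```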
-
Publication number: 20080294443
Abstract: A text-to-speech system that includes an arrangement for accepting text input, an arrangement for providing synthetic speech output, and an arrangement for imparting emotion-based features to synthetic speech output. The arrangement for imparting emotion-based features includes an arrangement for accepting instruction for imparting at least one emotion-based paradigm to synthetic speech output, as well as an arrangement for applying at least one emotion-based paradigm to synthetic speech output.
Type: Application
Filed: July 14, 2008
Publication date: November 27, 2008
Applicant: International Business Machines Corporation
Inventor: Ellen M. Eide
-
Publication number: 20080288257
Abstract: A text-to-speech system that includes an arrangement for accepting text input, an arrangement for providing synthetic speech output, and an arrangement for imparting emotion-based features to synthetic speech output. The arrangement for imparting emotion-based features includes an arrangement for accepting instruction for imparting at least one emotion-based paradigm to synthetic speech output, as well as an arrangement for applying at least one emotion-based paradigm to synthetic speech output.
Type: Application
Filed: July 14, 2008
Publication date: November 20, 2008
Inventor: Ellen M. Eide
-
Publication number: 20080255850
Abstract: Methods, apparatus, and products are disclosed for providing expressive user interaction with a multimodal application, the multimodal application operating in a multimodal browser on a multimodal device supporting multiple modes of user interaction including a voice mode and one or more non-voice modes, the multimodal application operatively coupled to a speech engine through a VoiceXML interpreter, including: receiving, by the multimodal browser, user input from a user through a particular mode of user interaction; determining, by the multimodal browser, user output for the user in dependence upon the user input; determining, by the multimodal browser, a style for the user output in dependence upon the user input, the style specifying expressive output characteristics for at least one other mode of user interaction; and rendering, by the multimodal browser, the user output in dependence upon the style.
Type: Application
Filed: April 12, 2007
Publication date: October 16, 2008
Inventors: Charles W. Cross, Ellen M. Eide, Igor R. Jablokov
-
Publication number: 20080172234
Abstract: Systems and methods for dynamically selecting among text-to-speech (TTS) systems. Exemplary embodiments of the systems and methods include identifying text for converting into a speech waveform, synthesizing said text by three TTS systems, generating a candidate waveform from each of the three systems, generating a score from each of the three systems, comparing the three scores, selecting a score based on a criterion, and selecting one of the three waveforms based on the selected score.
Type: Application
Filed: January 12, 2007
Publication date: July 17, 2008
Applicant: International Business Machines Corporation
Inventors: Ellen M. Eide, Raul Fernandez, Wael M. Hamza, Michael A. Picheny
-
Patent number: 7401020
Abstract: A text-to-speech system that includes an arrangement for accepting text input, an arrangement for providing synthetic speech output, and an arrangement for imparting emotion-based features to synthetic speech output. The arrangement for imparting emotion-based features includes an arrangement for accepting instruction for imparting at least one emotion-based paradigm to synthetic speech output, as well as an arrangement for applying at least one emotion-based paradigm to synthetic speech output.
Type: Grant
Filed: November 29, 2002
Date of Patent: July 15, 2008
Assignee: International Business Machines Corporation
Inventor: Ellen M. Eide
-
Publication number: 20080167875
Abstract: An embodiment of the invention is a software tool used to convert text, speech synthesis markup language (SSML), and/or extended SSML to synthesized audio. Provisions are provided to create, view, play, and edit the synthesized speech, including editing pitch and duration targets, speaking type, paralinguistic events, and prosody. Prosody can be provided by way of a sample recording. Users can interact with the software tool by way of a graphical user interface (GUI). The software tool can produce synthesized audio file output in many file formats.
Type: Application
Filed: January 9, 2007
Publication date: July 10, 2008
Applicant: International Business Machines Corporation
Inventors: Raimo Bakis, Ellen M. Eide, Roberto Pieraccini, Maria E. Smith, Jie Zeng
-
Publication number: 20080167876
Abstract: A method and computer program product for providing paraphrasing in a text-to-speech (TTS) system is provided. The method includes receiving an input text, parsing the input text, and determining a paraphrase of the input text. The method also includes synthesizing the paraphrase into synthesized speech. The method further includes selecting synthesized speech to output, which includes: assigning a score to each synthesized speech associated with each paraphrase, comparing the score of each synthesized speech associated with each paraphrase, and selecting the top-scoring synthesized speech to output. Furthermore, the method includes outputting the selected synthesized speech.
Type: Application
Filed: January 4, 2007
Publication date: July 10, 2008
Applicant: International Business Machines Corporation
Inventors: Raimo Bakis, Ellen M. Eide, Wael Hamza, Michael A. Picheny
-
Patent number: 7337114
Abstract: Methods and arrangements for representing the speech waveform in terms of a set of abstract, linguistic distinctions in order to derive a set of discriminative features for use in a speech recognizer. By combining the distinctive feature representation with an original waveform representation, it is possible to achieve a reduction in word error rate of 33% on an automatic speech recognition task.
Type: Grant
Filed: March 29, 2001
Date of Patent: February 26, 2008
Assignee: International Business Machines Corporation
Inventor: Ellen M. Eide
-
Patent number: 6823305
Abstract: Speaker normalization is carried out based on biometric information available about a speaker, such as his height, or a dimension of a bodily member or article of clothing. The chosen biometric parameter correlates with the vocal tract length. Speech can be normalized based on the biometric parameter, which thus indirectly normalizes the speech based on the vocal tract length of the speaker. The inventive normalization can be used in model formation, or in actual speech recognition usage, or both. Substantial improvements in accuracy have been noted at little cost. The preferred biometric parameter is height, and the preferred form of scaling is linear scaling with the scale factor proportional to the height of the speaker.
Type: Grant
Filed: December 21, 2000
Date of Patent: November 23, 2004
Assignee: International Business Machines Corporation
Inventor: Ellen M. Eide
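The preferred embodiment named in this abstract (linear scaling with the factor proportional to height) can be sketched in a few lines. The reference height, the direction of the warp, and the idea of applying the factor to a frequency axis are all illustrative assumptions layered on top of the one concrete claim, that the scale factor is proportional to speaker height.

```python
def warp_factor(height_cm, reference_height_cm=170.0):
    """Linear scale factor proportional to speaker height relative to a
    reference speaker (vocal tract length correlates with height);
    the 170 cm reference is an assumed value."""
    return height_cm / reference_height_cm

def normalize_frequency_axis(freqs_hz, height_cm, reference_height_cm=170.0):
    """Warp each frequency bin by the height-derived factor so speech from
    speakers of different heights maps onto a common axis. Whether to
    multiply or divide by the factor is a modeling choice; multiplication
    is assumed here."""
    alpha = warp_factor(height_cm, reference_height_cm)
    return [f * alpha for f in freqs_hz]
```

The appeal of the scheme is that the factor needs no adaptation speech from the new speaker, only a single biometric measurement.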