Methods For Producing Synthetic Speech; Speech Synthesizers (epo) Patents (Class 704/E13.002)
  • Publication number: 20130246063
    Abstract: A system and methods are disclosed which provide simple and rapid animated content creation, particularly for more life-like synthesis of voice segments associated with an animated element. A voice input tool enables quick creation of spoken language segments for animated characters. Speech is converted to text. That text may be reconverted to speech with prosodic elements added. The text, prosodic elements, and voice may be edited.
    Type: Application
    Filed: April 7, 2011
    Publication date: September 19, 2013
    Applicant: GOOGLE INC.
    Inventor: Eric Teller
  • Publication number: 20130226576
    Abstract: Speech recognition processing captures phonemes of words in a spoken speech string and retrieves text of words corresponding to particular combinations of phonemes from a phoneme dictionary. A text-to-speech synthesizer can then produce and substitute a synthesized pronunciation of a word in the speech string. If the speech recognition processing fails to recognize a particular combination of phonemes of a spoken word, as may occur when a word is spoken with an accent or when the speaker has a speech impediment, the speaker is prompted to clarify the word by entering it as text from a keyboard or the like. The entry is stored in the phoneme dictionary so that a synthesized pronunciation of the word can be played out when the initially unrecognized spoken word is encountered again in a speech string, improving intelligibility, particularly for conference calls.
    Type: Application
    Filed: February 23, 2012
    Publication date: August 29, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Peeyush Jaiswal, Burt Leo Vialpando, Fang Wang
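The dictionary-lookup-with-fallback flow described in the abstract above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the dictionary keyed by phoneme tuples and the `prompt_user` callback are assumptions.

```python
# Sketch of a phoneme-dictionary lookup with a prompt-and-store fallback.
# Names and data shapes are illustrative, not taken from the patent.

def lookup_word(phonemes, phoneme_dict):
    """Return the text of a word matching a phoneme combination, or None."""
    return phoneme_dict.get(tuple(phonemes))

def recognize_or_prompt(phonemes, phoneme_dict, prompt_user):
    """If the phoneme combination is unknown (e.g. accent or speech
    impediment), prompt the speaker to type the word and store it so a
    synthesized pronunciation can be substituted next time."""
    word = lookup_word(phonemes, phoneme_dict)
    if word is None:
        word = prompt_user(phonemes)          # e.g. keyboard entry
        phoneme_dict[tuple(phonemes)] = word  # remember for next time
    return word
```

On the next occurrence of the same phoneme combination, the stored entry is found directly and the synthesized pronunciation can be played out without prompting.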
  • Patent number: 8311830
    Abstract: Provided is a system and method for building and managing a customized voice of an end-user, comprising the steps of designing a set of prompts for collection from the user, wherein the prompts are selected both by an analysis tool and by the user's own choosing to capture voice characteristics unique to the user. The prompts are delivered to the user over a network to allow the user to save a user recording on a server of a service provider. This recording is then retrieved, stored on the server, and used to build a voice database with text-to-speech synthesis tools. A graphical interface allows the user to continuously refine the data file to improve the voice and to customize parameter and configuration settings, thereby forming a customized voice database which can be deployed or accessed.
    Type: Grant
    Filed: December 6, 2011
    Date of Patent: November 13, 2012
    Assignee: Cepstral, LLC
    Inventors: Craig F. Campbell, Kevin A. Lenzo, Alexandre D. Cox
  • Publication number: 20120265534
    Abstract: The method provides a spectral speech description to be used for synthesis of a speech utterance, where at least one spectral envelope input representation is received. In one solution the improvement is made by manipulating an extremum, i.e. a peak or a valley, in the rapidly varying component of the spectral envelope representation. The rapidly varying component is manipulated to sharpen and/or accentuate extrema, after which it is merged back with the slowly varying component of the spectral envelope input representation to create an enhanced spectral envelope final representation. In other solutions a complex spectral envelope final representation is created with phase information derived either from the group delay representation of a real spectral envelope input representation corresponding to a short-time speech signal, or from a transformed phase component of the discrete complex frequency domain input representation corresponding to the speech utterance.
    Type: Application
    Filed: September 4, 2009
    Publication date: October 18, 2012
    Applicant: SVOX AG
    Inventors: Geert Coorman, Johan Wouters
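The sharpen-and-merge idea in the abstract above can be sketched with a simple decomposition: split the envelope into slowly and rapidly varying parts with a moving average, amplify the rapid part (where the peaks and valleys live), and merge back. The moving-average split, smoothing width, and gain are assumptions; the patent does not specify them.

```python
# Minimal sketch of envelope extremum sharpening, assuming a moving-average
# decomposition into slowly and rapidly varying components.

def moving_average(x, width=5):
    """Slowly varying component: windowed mean with edge truncation."""
    half = width // 2
    out = []
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out

def sharpen_envelope(envelope, gain=1.5, width=5):
    """Accentuate extrema of the rapidly varying component, then merge it
    back with the slowly varying component."""
    slow = moving_average(envelope, width)
    rapid = [e - s for e, s in zip(envelope, slow)]  # peaks/valleys live here
    return [s + gain * r for s, r in zip(slow, rapid)]
```

A flat envelope passes through unchanged; a peak is pushed higher relative to its smoothed neighborhood.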
  • Publication number: 20120150542
    Abstract: A method includes obtaining audio data representing audio content from at least one speaker. The method also includes spatially processing the audio data to create at least one sound field, where each sound field has a spatial characteristic that is unique to a specific speaker. The method further includes generating the at least one sound field using the processed audio data. The audio data could represent audio content from multiple speakers, and generating the at least one sound field could include generating multiple sound fields around a listener. The spatially processing could include performing beam forming to create multiple directional beams, and generating the multiple sound fields around the listener could include generating the directional beams with different apparent origins around the listener. The method could further include separating the audio data based on speaker, where each sound field is associated with the audio data from one of the speakers.
    Type: Application
    Filed: December 9, 2010
    Publication date: June 14, 2012
    Applicant: NATIONAL SEMICONDUCTOR CORPORATION
    Inventor: Wei Ma
  • Patent number: 8086457
    Abstract: Provided is a system and method for building and managing a customized voice of an end-user, comprising the steps of designing a set of prompts for collection from the user, wherein the prompts are selected both by an analysis tool and by the user's own choosing to capture voice characteristics unique to the user. The prompts are delivered to the user over a network to allow the user to save a user recording on a server of a service provider. This recording is then retrieved, stored on the server, and used to build a voice database with text-to-speech synthesis tools. A graphical interface allows the user to continuously refine the data file to improve the voice and to customize parameter and configuration settings, thereby forming a customized voice database which can be deployed or accessed.
    Type: Grant
    Filed: May 29, 2008
    Date of Patent: December 27, 2011
    Assignee: Cepstral, LLC
    Inventors: Craig F. Campbell, Kevin A. Lenzo, Alexandre D. Cox
  • Publication number: 20110313771
    Abstract: A method for audibly instructing a user to interact with a function. A function is associated with a user-written selectable item. The user-written selectable item is recognized on a surface. In response to recognizing the user-written selectable item, a first instructional message related to the operation of the function is audibly rendered without requiring further interaction from the user.
    Type: Application
    Filed: December 13, 2010
    Publication date: December 22, 2011
    Applicant: LEAPFROG ENTERPRISES, INC.
    Inventor: James Marggraff
  • Publication number: 20110246200
    Abstract: Pre-saved concatenation cost data is compressed through speech segment grouping. Speech segments are assigned to a predefined number of groups based on their concatenation cost values with other speech segments. A representative segment is selected for each group. The concatenation cost between two segments in different groups may then be approximated by that between the representative segments of their respective groups, thereby reducing an amount of concatenation cost data to be pre-saved.
    Type: Application
    Filed: April 5, 2010
    Publication date: October 6, 2011
    Applicant: Microsoft Corporation
    Inventors: Huicheng Song, Guoliang Zhang, Zhiwei Weng
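The compression scheme in the abstract above can be sketched as follows: only representative-to-representative costs are pre-saved, so N segments need a table of G² entries (G groups) rather than N². The toy cost function and the exact data shapes are illustrative assumptions, not details from the patent.

```python
# Sketch of concatenation-cost compression via segment grouping.

def build_cost_table(representatives, concat_cost):
    """Pre-save only the costs between the groups' representative
    segments, instead of all pairwise segment costs."""
    return {(ga, gb): concat_cost(ra, rb)
            for ga, ra in representatives.items()
            for gb, rb in representatives.items()}

def approx_cost(seg_a, seg_b, group_of, table):
    """Approximate the concatenation cost between two segments by the
    cost between their groups' representatives."""
    return table[(group_of[seg_a], group_of[seg_b])]
```

For example, with a toy cost `abs(a - b)` over scalar "segments", two groups with representatives 1.0 and 10.0 approximate any cross-group cost as 9.0.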
  • Publication number: 20110224977
    Abstract: A robot may include a driving control unit configured to control driving of a movable unit that is movably connected to a body unit, a voice generating unit configured to generate a voice, and a voice output unit configured to output the voice generated by the voice generating unit. The voice generating unit may correct the generated voice based on the bearing, relative to the body unit, of the movable unit controlled by the driving control unit.
    Type: Application
    Filed: September 14, 2010
    Publication date: September 15, 2011
    Applicant: HONDA MOTOR CO., LTD.
    Inventors: Kazuhiro NAKADAI, Takuma OTSUKA, Hiroshi OKUNO
  • Publication number: 20110210822
    Abstract: A refrigerator is provided. The refrigerator includes a voice recognition unit for recognizing the spoken name of a food item, a memory for storing location information of food received in a storage chamber, a controller for interpreting the voice recognized by the voice recognition unit and searching for the storage location of the food item in accordance with the recognized voice, and a voice output unit for outputting a voice message on the storage location information found by the controller.
    Type: Application
    Filed: September 11, 2008
    Publication date: September 1, 2011
    Applicant: LG Electronics Inc.
    Inventors: Sung-Ae Lee, Min-Kyeong Kim
  • Publication number: 20110179006
    Abstract: A system and method for providing a natural language interface to a database or the Internet. The method provides a response from a database to a natural language query. The method comprises receiving a user query, extracting key data from the user query, submitting the extracted key data to a database search engine to retrieve the top n pages from the database, processing the top n pages through a natural language dialog engine, and providing a response based on processing the top n pages.
    Type: Application
    Filed: March 29, 2011
    Publication date: July 21, 2011
    Applicant: AT&T Corp.
    Inventors: Richard Vandervoort Cox, Hossein Eslambolchi, Behzad Nadji, Mazin G. Rahim
  • Publication number: 20110093272
    Abstract: A media process server apparatus has a speech synthesis data storage device for storing, after categorizing into emotions, data for speech synthesis in association with a user identifier, a text analyzer for determining, from a text message received from a message server apparatus, emotion of text, and a speech data synthesizer for generating speech data with emotional expression by synthesizing speech corresponding to the text, using data for speech synthesis that corresponds to the determined emotion and that is in association with a user identifier of a user who is a transmitter of the text message.
    Type: Application
    Filed: April 2, 2009
    Publication date: April 21, 2011
    Applicant: NTT DOCOMO, INC.
    Inventors: Shin-ichi Isobe, Masami Yabusaki
  • Publication number: 20110071836
    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for unit selection synthesis. The method causes a computing device to add a supplemental phoneset to a speech synthesizer front end having an existing phoneset, modify a unit preselection process based on the supplemental phoneset, preselect units from the supplemental phoneset and the existing phoneset based on the modified unit preselection process, and generate speech based on the preselected units. The supplemental phoneset can be a variation of the existing phoneset, can include a word boundary feature, can include a cluster feature where initial consonant clusters and some word boundaries are marked with diacritics, can include a function word feature which marks units as originating from a function word or a content word, and/or can include a pre-vocalic or post-vocalic feature. The speech synthesizer front end can incorporate the supplemental phoneset as an extra feature.
    Type: Application
    Filed: September 21, 2009
    Publication date: March 24, 2011
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Alistair D. CONKIE, Mark BEUTNAGEL, Yeon-Jun KIM, Ann K. SYRDAL
  • Publication number: 20110046955
    Abstract: There is provided a speech processing apparatus including: a data obtaining unit which obtains music progression data defining a property of one or more time points or time periods along the progression of music; a determining unit which determines, by utilizing the music progression data obtained by the data obtaining unit, an output time point at which speech is to be output while the music is reproduced; and an audio output unit which outputs the speech at the output time point determined by the determining unit during reproduction of the music.
    Type: Application
    Filed: August 12, 2010
    Publication date: February 24, 2011
    Inventors: Tetsuo IKEDA, Ken MIYASHITA, Tatsushi NASHIDA
  • Publication number: 20110022390
    Abstract: In order to speak numerals in a manner readily comprehensible to a user, a speech device includes a voice synthesis portion 55 which, when a given character string includes a numeral made up of a plurality of digits, speaks the numeral in either a first speech method, in which the numeral is read aloud as individual digits, or a second speech method, in which the numeral is read aloud as a full number. A user definition table 81, an association table 83, a region table 84, and a digit number table 87 associate a type of character string with either the first or the second speech method. A process executing portion 53 executes a process to output data, and a speech control portion 51 generates a character string on the basis of the output data and causes the voice synthesis portion 55 to speak the generated character string in whichever of the two speech methods is associated with the type of the output data.
    Type: Application
    Filed: February 4, 2009
    Publication date: January 27, 2011
    Applicant: SANYO ELECTRIC CO., LTD.
    Inventors: Kinya Otani, Naoki Hirose
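The two speech methods in the abstract above can be sketched as follows. The type-to-method table is an illustrative stand-in for the patent's user definition and association tables, and the partial number-to-words reader is an assumption.

```python
# Sketch of digit-by-digit vs. full-number reading, selected by string type.

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]

def read_as_digits(numeral):
    """First speech method: read the numeral as individual digits."""
    return " ".join(DIGIT_WORDS[int(d)] for d in numeral)

def read_as_number(numeral):
    """Second speech method: read the numeral as a full number
    (only numbers below 100 are spelled out in this sketch)."""
    n = int(numeral)
    tens = ["", "", "twenty", "thirty", "forty", "fifty",
            "sixty", "seventy", "eighty", "ninety"]
    teens = ["ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
             "sixteen", "seventeen", "eighteen", "nineteen"]
    if n < 10:
        return DIGIT_WORDS[n]
    if n < 20:
        return teens[n - 10]
    if n < 100:
        rest = "-" + DIGIT_WORDS[n % 10] if n % 10 else ""
        return tens[n // 10] + rest
    return str(n)  # larger numbers left to a fuller number reader

# Stand-in for the tables associating a string type with a speech method.
METHOD_BY_TYPE = {"phone_number": read_as_digits, "price": read_as_number}

def speak_numeral(numeral, string_type):
    return METHOD_BY_TYPE[string_type](numeral)
```

So "42" is rendered "four two" when typed as a phone number but "forty-two" when typed as a price.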
  • Patent number: 7869999
    Abstract: A system and method for generating synthetic speech, which operates in a computer implemented Text-To-Speech system. The system comprises at least a speaker database that has been previously created from user recordings, a Front-End system to receive an input text, and a Text-To-Speech engine. The Front-End system generates multiple phonetic transcriptions for each word of the input text, and the TTS engine uses a cost function to select which phonetic transcription is the most appropriate for searching the speech segments within the speaker database to be concatenated and synthesized.
    Type: Grant
    Filed: August 10, 2005
    Date of Patent: January 11, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: Christel Amato, Hubert Crepy, Stephane Revelin, Claire Waast-Richard
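The transcription-selection step in the abstract above can be sketched as a minimum-cost choice. The cost shown here, counting phonemes missing from the speaker database, is an illustrative stand-in; the patent's actual cost function is not specified in the abstract.

```python
# Sketch of selecting among multiple front-end phonetic transcriptions
# with a cost function over speaker-database coverage (assumed cost).

def transcription_cost(transcription, available_phonemes):
    """Toy cost: number of phonemes absent from the speaker database."""
    return sum(1 for ph in transcription if ph not in available_phonemes)

def best_transcription(transcriptions, available_phonemes):
    """Pick the transcription the database can cover best."""
    return min(transcriptions,
               key=lambda t: transcription_cost(t, available_phonemes))
```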
  • Publication number: 20100324903
    Abstract: Disclosed are techniques and systems to provide a narration of a text in multiple different voices. Further disclosed are techniques and systems for providing a plurality of characters at least some of the characters having multiple associated moods for use in document narration.
    Type: Application
    Filed: January 14, 2010
    Publication date: December 23, 2010
    Inventors: Raymond C. Kurzweil, Paul Albrecht, Peter Chapman
  • Publication number: 20100324904
    Abstract: Disclosed are techniques and systems to provide a narration of a text in multiple different languages where the portions of the text narrated using the different voices associated with different languages are selected by a user.
    Type: Application
    Filed: January 14, 2010
    Publication date: December 23, 2010
    Inventors: Raymond C. Kurzweil, Paul Albrecht, Peter Chapman
  • Publication number: 20100312562
    Abstract: A rope-jumping algorithm is employed in a Hidden Markov Model based text-to-speech system to determine start and end models and to modify the start and end models by setting small covariances. The modification avoids disordered acoustic parameters caused by violation of parameter constraints, resulting in a stable line frequency spectrum for the generated speech.
    Type: Application
    Filed: June 4, 2009
    Publication date: December 9, 2010
    Applicant: Microsoft Corporation
    Inventors: Wenlin Wang, Guoliang Zhang, Jingyang Xu
  • Publication number: 20100305949
    Abstract: Provided are a speech synthesis device, speech synthesis method, and speech synthesis program which can improve speech quality and reduce the calculation amount with a good balance between the two. The speech synthesis device includes: a sub-score calculation unit (60/65) which calculates a segment selection sub-score for selecting an optimal segment; and a candidate narrowing unit (70/73) which narrows the candidates according to the number of candidate segments and the segment selection sub-score. The speech synthesis device performs candidate narrowing with the sub-score calculation unit (60/65) and the candidate narrowing unit (70/73) during the candidate selection process when generating synthesized speech from an input text.
    Type: Application
    Filed: November 25, 2008
    Publication date: December 2, 2010
    Inventors: Masanori Kato, Yasuyuki Mitsui, Reishi Kondo
  • Publication number: 20100274838
    Abstract: A system configured to pre-render an audio representation of textual content for subsequent playback includes a network, a source server, and a requesting device. The source server is configured to provide a plurality of textual content across the network. The requesting device includes a download unit, a signature generating unit, a signature comparing unit, and a text to speech conversion unit. The download unit is configured to download the plurality of textual content from the source server across the network. The signature generating unit is configured to generate a unique signature for each of the textual content. The signature comparing unit is configured to compare each unique signature with a prior corresponding signature to determine whether the corresponding textual content has changed. The text to speech conversion unit is configured to convert the textual content to speech when the textual content has been determined to have changed.
    Type: Application
    Filed: April 24, 2009
    Publication date: October 28, 2010
    Inventor: Richard A. Zemer
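The compare-then-convert flow in the abstract above can be sketched as follows. SHA-256 is assumed as the unique signature; the patent does not name a specific hash.

```python
import hashlib

# Sketch of re-synthesizing audio only when downloaded text has changed.

def signature(text):
    """Generate a unique signature for a piece of textual content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def render_if_changed(text, prior_signature, text_to_speech):
    """Compare the new signature with the prior one; run text-to-speech
    conversion only when the content has been determined to have changed.
    Returns (new_audio_or_None, current_signature)."""
    sig = signature(text)
    if sig != prior_signature:
        return text_to_speech(text), sig  # changed: re-synthesize
    return None, sig                      # unchanged: reuse prior audio
```

On first download the prior signature is absent, so conversion always runs; subsequent unchanged downloads skip the costly text-to-speech step.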
  • Publication number: 20100250254
    Abstract: An acquiring unit acquires pattern sentences, which are similar to one another and include fixed segments and non-fixed segments, and substitution words that are substituted for the non-fixed segments. A sentence generating unit generates target sentences by replacing the non-fixed segments with the substitution words for each of the pattern sentences. A first synthetic-sound generating unit generates a first synthetic sound, a synthetic sound of the fixed segment, and a second synthetic-sound generating unit generates a second synthetic sound, a synthetic sound of the substitution word, for each of the target sentences. A calculating unit calculates a discontinuity value of a boundary between the first synthetic sound and the second synthetic sound for each of the target sentences and a selecting unit selects the target sentence having the smallest discontinuity value. A connecting unit connects the first synthetic sound and the second synthetic sound of the target sentence selected.
    Type: Application
    Filed: September 15, 2009
    Publication date: September 30, 2010
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventor: Nobuaki Mizutani
  • Publication number: 20100217657
    Abstract: An adaptive information presentation apparatus and associated methods. In one embodiment, the apparatus comprises a computer readable medium having at least one computer program disposed thereon, the at least one program being configured to adaptively present (e.g., display or play out via an audio system) information that is related or responsive to inputs provided via an input device such as, for example, a touch-screen display device. In one variant, the at least one program analyzes user input to determine a context of the input, and selects advertising related to the context for presentation to the user.
    Type: Application
    Filed: February 24, 2010
    Publication date: August 26, 2010
    Inventor: Robert F. Gazdzinski
  • Publication number: 20100198577
    Abstract: Creation of sub-phonemic Hidden Markov Model (HMM) states and the mapping of those states result in improved cross-language speaker adaptation. The smaller sub-phonemic mapping provides improvements in usability and intelligibility, particularly between languages with few common phonemes. HMM states of different languages may be mapped to one another using a distance between the HMM states in acoustic space. This distance may be calculated using Kullback-Leibler divergence and multi-space probability distribution. By combining distance mapping and context mapping for different speakers of the same language, improved cross-language speaker adaptation is possible.
    Type: Application
    Filed: February 3, 2009
    Publication date: August 5, 2010
    Applicant: Microsoft Corporation
    Inventors: Yi-Ning Chen, Yao Qian, Frank Kao-Ping Soong
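The distance-based state mapping in the abstract above can be sketched with the closed-form Kullback-Leibler divergence between diagonal-covariance Gaussians. The abstract names KL divergence; the single-Gaussian-per-state simplification and the data shapes are assumptions of this sketch.

```python
import math

# Sketch of mapping HMM states across languages by acoustic distance.
# Each state is (mean_vector, variance_vector) of a diagonal Gaussian.

def kl_gaussian(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for diagonal-covariance Gaussians, in closed form."""
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q))

def map_states(src_states, tgt_states):
    """For each source-language state, pick the target-language state
    nearest in acoustic space under the KL distance."""
    mapping = {}
    for name, (mu, var) in src_states.items():
        mapping[name] = min(
            tgt_states,
            key=lambda t: kl_gaussian(mu, var, *tgt_states[t]))
    return mapping
```

A source state with mean 0 maps to a target state with mean 0.1 rather than one with mean 5, since the former is far closer in acoustic space.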
  • Publication number: 20100145706
    Abstract: An object of the present invention is to provide a device and a method for generating a synthesized speech that has an utterance form that matches music. A musical genre estimation unit of the speech synthesizing device estimates the musical genre to which a received music signal belongs, an utterance form selection unit references an utterance form information storage unit to determine an utterance form from the musical genre. A prosody generation unit references a prosody generation rule storage unit, selected from prosody generation rule storage units 151 to 15N according to the utterance form, and generates prosody information from a phonetic symbol sequence. A unit waveform selection unit references a unit waveform data storage unit, selected from unit waveform data storage units 161 to 16N according to the utterance form, and selects a unit waveform from the phonetic symbol sequence and the prosody information.
    Type: Application
    Filed: February 1, 2007
    Publication date: June 10, 2010
    Applicant: NEC CORPORATION
    Inventor: Masanori Kato
  • Publication number: 20100145701
    Abstract: The sensation of presence in voice chat in a virtual space is enhanced. A user speech synthesizer is used in a virtual space sharing system where information processing devices share the virtual space. The user speech synthesizer comprises a speech data acquiring section (60) for acquiring speech data representing a speech uttered by the user of one of the information processing devices, an environment sound storage section (66) for storing an environment sound associated with one or more regions defined in the virtual space, a region specifying section (64) for specifying a region corresponding to the user in the virtual space, and an environment sound synthesizing section (68) for acquiring the environment sound associated with the specified region from the environment sound storage section (66) and combining the acquired environment sound with the speech data to produce synthesized speech data.
    Type: Application
    Filed: June 7, 2006
    Publication date: June 10, 2010
    Applicant: KONAMI DIGITAL ENTERTAINMENT CO., LTD.
    Inventors: Hiromasa Kaneko, Masaki Takeuchi
  • Publication number: 20100094632
    Abstract: Disclosed herein are various aspects of a toolkit used for generating a TTS voice for use in a spoken dialog system. The embodiments in each case may be in the form of a system, a computer-readable medium, or a method for generating the TTS voice. An embodiment of the invention relates to a method of tracking progress in developing a text-to-speech (TTS) voice. The method comprises ensuring that a corpus of recorded speech contains no reading errors and matches an associated written text, creating a tuple for each utterance in the corpus, and tracking progress for each utterance utilizing the tuple. Various parameters may be tracked using the tuple, but the tuple's principal purpose is to enable multiple workers to efficiently process a database of utterances in preparation of a TTS voice.
    Type: Application
    Filed: December 15, 2009
    Publication date: April 15, 2010
    Applicant: AT&T Corp.
    Inventors: Steven Lawrence Davis, Shane Fetters, David Eugene Schultz, Beverly Gustafson, Louise Loney
  • Publication number: 20100088099
    Abstract: A portable reading device includes a computing device and a computer readable medium storing a computer program product to receive an image and select a section of the image to process. The product processes the section of the image with a first process and, when the first process is finished, processes the result of the first process with a second process. While the second process is running, the product repeats the first process on another section of the image.
    Type: Application
    Filed: December 8, 2009
    Publication date: April 8, 2010
    Inventors: Raymond C. Kurzweil, Paul Albrecht, Lucy Gibson
  • Publication number: 20100082345
    Abstract: An “Animation Synthesizer” uses trainable probabilistic models, such as Hidden Markov Models (HMM), Artificial Neural Networks (ANN), etc., to provide speech and text driven body animation synthesis. Probabilistic models are trained using synchronized motion and speech inputs (e.g., live or recorded audio/video feeds) at various speech levels, such as sentences, phrases, words, phonemes, sub-phonemes, etc., depending upon the available data, and the motion type or body part being modeled. The Animation Synthesizer then uses the trainable probabilistic model for selecting animation trajectories for one or more different body parts (e.g., face, head, hands, arms, etc.) based on an arbitrary text and/or speech input. These animation trajectories are then used to synthesize a sequence of animations for digital avatars, cartoon characters, computer generated anthropomorphic persons or creatures, actual motions for physical robots, etc.
    Type: Application
    Filed: September 26, 2008
    Publication date: April 1, 2010
    Applicant: MICROSOFT CORPORATION
    Inventors: Lijuan Wang, Lei Ma, Frank Kao-Ping Soong
  • Publication number: 20100082346
    Abstract: Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back.
    Type: Application
    Filed: September 29, 2008
    Publication date: April 1, 2010
    Applicant: Apple Inc.
    Inventors: Matthew Rogers, Kim Silverman, DeVang Naik, Kevin Lenzo, Benjamin Rottler
  • Publication number: 20100076762
    Abstract: A method for generating animated sequences of talking heads in text-to-speech applications wherein a processor samples a plurality of frames comprising image samples. The processor reads first data comprising one or more parameters associated with noise-producing orifice images of sequences of at least three concatenated phonemes which correspond to an input stimulus. The processor reads, based on the first data, second data comprising images of a noise-producing entity. The processor generates an animated sequence of the noise-producing entity.
    Type: Application
    Filed: November 30, 2009
    Publication date: March 25, 2010
    Applicant: AT&T Corp.
    Inventors: Eric Cosatto, Hans Peter Graf, Juergen Schroeter
  • Publication number: 20100049523
    Abstract: Systems and methods for providing synthesized speech in a manner that takes into account the environment where the speech is presented. A method embodiment includes selecting, based on a listening environment and at least one other parameter, an approach from a plurality of approaches for presenting synthesized speech in the listening environment; presenting synthesized speech according to the selected approach; and, based on natural language input received from a user indicating an inability to understand the presented synthesized speech, selecting a second approach from the plurality of approaches and presenting subsequent synthesized speech using the second approach.
    Type: Application
    Filed: October 28, 2009
    Publication date: February 25, 2010
    Applicant: AT&T Corp.
    Inventors: Kenneth H. Rosen, Carroll W. Creswell, Jeffrey J. Farah, Pradeep K. Bansal, Ann K. Syrdal
  • Publication number: 20100042410
    Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
    Type: Application
    Filed: August 11, 2009
    Publication date: February 18, 2010
    Inventor: James H. Stephens, JR.
  • Publication number: 20100030775
    Abstract: A method for non-text-based identification of a selected item of stored music. The first broad portion of the method focuses on building a music identification database. That process requires capturing a tag of the selected musical item and processing the tag to develop a reference key to it. Then the tag is stored, together with the reference key and an association to the stored music. The database is built by collecting a multiplicity of tags. The second broad portion of the method is retrieving a desired item of stored music from the database. That process calls for capturing a query tag from a user and processing the query tag to develop a query key. The query key is compared to the reference keys stored in the database to identify the desired item of stored music.
    Type: Application
    Filed: October 13, 2009
    Publication date: February 4, 2010
    Applicant: Melodis Corporation
    Inventors: Keyvan Mohajer, Majid Emami, Michal Grabowski, James M. Hom
  • Publication number: 20090326950
    Abstract: A voice waveform interpolating apparatus for interpolating part of stored voice data with another part of the voice data so as to generate voice data. To achieve this, it comprises a voice storage unit, an interpolated waveform generation unit generating interpolated voice data, and a waveform combining unit outputting voice data in which part of the voice data is replaced with another part. It further comprises an interpolated waveform setting function unit judging whether the other part of the voice data is appropriate as the interpolated voice data to be generated by the interpolated waveform generation unit.
    Type: Application
    Filed: August 31, 2009
    Publication date: December 31, 2009
    Applicant: FUJITSU LIMITED
    Inventor: Chikako Matsumoto
  • Publication number: 20090326948
    Abstract: A method, system and computer-usable medium are disclosed for the transcoding of annotated text to speech and audio. Source text is parsed into spoken text passages and sound description passages. A speaker identity is determined for each spoken text passage and a sound element for each sound description passage. The speaker identities and sound elements are automatically referenced to a voice and sound effects schema. A voice effect is associated with each speaker identity and a sound effect with each sound element. Each spoken text passage is then annotated with the voice effect associated with its speaker identity and each sound description passage is annotated with the sound effect associated with its sound element. The resulting annotated spoken text and sound description passages are processed to generate output text operable to be transcoded to speech and audio.
    Type: Application
    Filed: June 26, 2008
    Publication date: December 31, 2009
    Inventors: Piyush Agarwal, Priya B. Benjamin, Kam K. Yee, Neeraj Joshi
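    The annotation flow above can be illustrated with a small sketch. The passage representation, the speaker-to-voice and sound-description schemas, and the SSML-like output tags are all assumptions for the example; the patent does not fix a concrete schema.

    ```python
    # Illustrative voice and sound effects schemas (hypothetical values).
    VOICE_SCHEMA = {'narrator': 'en-US-voice-1', 'alice': 'en-US-voice-2'}
    SOUND_SCHEMA = {'door slam': 'fx/door_slam.wav'}

    def transcode(passages):
        """passages: list of ('speech', speaker, text) or
        ('sound', description) tuples, in source order."""
        out = []
        for p in passages:
            if p[0] == 'speech':
                _, speaker, text = p
                voice = VOICE_SCHEMA[speaker]      # voice effect for this speaker identity
                out.append(f'<voice name="{voice}">{text}</voice>')
            else:
                _, description = p
                sound = SOUND_SCHEMA[description]  # sound effect for this sound element
                out.append(f'<audio src="{sound}"/>')
        return '\n'.join(out)
    ```

    The resulting annotated text would then be handed to a speech/audio transcoder, as the abstract describes.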
  • Publication number: 20090319273
    Abstract: An audio content generation system generates audio contents and includes a voice synthesis unit 102 that generates synthesized voice from text. It is provided with an audio content generation unit 103 connected to a multimedia database 101, in which contents composed mainly of audio article data V1 to V3 or text article data T1 and T2 are registered. Using the voice synthesis unit 102, the audio content generation unit 103 generates synthesized voice SYT1 and SYT2 for the text article data T1 and T2 registered in the multimedia database 101, and generates audio contents in which the synthesized voice SYT1 and SYT2 and the audio article data V1 to V3 are organized in a predetermined order.
    Type: Application
    Filed: June 27, 2007
    Publication date: December 24, 2009
    Applicant: NEC CORPORATION
    Inventors: Yasuyuki Mitsui, Shinichi Doi, Reishi Kondo, Masanori Kato
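    The core interleaving step can be sketched in a few lines: text articles are run through the synthesizer while audio articles pass through unchanged, preserving the predetermined order. The tuple representation and the synthesizer callable are assumptions for the example.

    ```python
    def generate_audio_content(items, synthesize):
        """items: ordered list of ('audio', payload) or ('text', payload).
        Text articles are synthesized; audio articles pass through unchanged."""
        return [synthesize(payload) if kind == 'text' else payload
                for kind, payload in items]
    ```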
  • Publication number: 20090310939
    Abstract: A simulation method and system. A computing system receives a first audio and/or video data stream that includes data associated with a first person. The computing system monitors the stream and identifies the emotional attributes it comprises. The computing system then generates and stores a second audio and/or video data stream, associated with the first, that includes the data without the emotional attributes.
    Type: Application
    Filed: June 12, 2008
    Publication date: December 17, 2009
    Inventors: Sara H. Basson, Dimitri Kanevsky, Edward Emile Kelley, Bhuvana Ramabhadran
  • Publication number: 20090299974
    Abstract: A computer-readable recording medium stores therein a sequence-map generating program that causes a computer to execute extracting from files that include character strings written therein, a word having q (q≥2) characters; extracting from the word extracted at the extracting the word, consecutive characters from a character position s-th (1≤s≤q−r+1) from a head of the word to a character position determined by a number of characters r (r≤q); and generating, for each character position s-th from the head, a consecutive-character sequence map including a flag row that indicates, for each file, whether a file includes the consecutive characters extracted at the extracting the consecutive characters.
    Type: Application
    Filed: January 29, 2009
    Publication date: December 3, 2009
    Applicant: FUJITSU LIMITED
    Inventors: Masahiro Kataoka, Tomoki Nagase, Takashi Tsubokura
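    The sequence map above is essentially a position-indexed n-gram-to-file bitmap. A minimal sketch, assuming a dict-of-dicts representation in place of the patent's flag rows:

    ```python
    def build_sequence_maps(files, r=2):
        """files: dict of filename -> list of words (each of length q >= 2).
        Returns {position s (1-based): {r-gram: {filename: flag}}}: for each
        start position and run of r consecutive characters, a flag row
        recording which files contain that run at that position."""
        maps = {}
        for name, words in files.items():
            for word in words:
                for s in range(len(word) - r + 1):   # 0-based start positions
                    gram = word[s:s + r]
                    maps.setdefault(s + 1, {}).setdefault(gram, {})[name] = True
        # Complete each flag row so every file carries an explicit flag.
        for pos_map in maps.values():
            for row in pos_map.values():
                for name in files:
                    row.setdefault(name, False)
        return maps
    ```

    Such maps let a search narrow candidate files by intersecting flag rows before any file is actually opened.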
  • Publication number: 20090248820
    Abstract: An interface between mobile devices and computing devices, such as a PC or an in-vehicle system, permits a user to use the better user interface of the computing device to access and control the operation of the mobile device.
    Type: Application
    Filed: March 25, 2009
    Publication date: October 1, 2009
    Inventors: Otman A. Basir, William Ben Miners
  • Publication number: 20090248417
    Abstract: A method to generate a pitch contour for speech synthesis is proposed. The method finds the pitch contour that maximizes a total likelihood function created by combining the statistical models of the pitch contour segments of an utterance, at one or multiple linguistic levels. These statistical models are trained from a database of spoken speech by means of a decision tree that, for each linguistic level, clusters the parametric representation of the pitch segments extracted from the spoken speech data using features obtained from the text associated with that speech data. The pitch segments are parameterized in such a way that the likelihood function of any linguistic level can be expressed in terms of the parameters of one of the levels, allowing the maximization to be calculated with respect to the parameters of that level.
    Type: Application
    Filed: March 17, 2009
    Publication date: October 1, 2009
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Javier Latorre, Masami Akamine
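    In rough notation (the symbols below are our own shorthand, not taken from the patent), the objective combines the per-level, per-segment log-likelihoods and maximizes over the parameters of one chosen level:

    ```latex
    \hat{\mathbf{p}} \;=\; \arg\max_{\mathbf{p}}
      \sum_{\ell \in \text{levels}} \; \sum_{i \in \text{segments}(\ell)}
      \log P\!\left( f_{\ell,i}(\mathbf{p}) \,\middle|\, \lambda_{\ell,i} \right)
    ```

    where f_{l,i}(p) maps the chosen level's parameters p to the parametric representation of segment i at level l, and lambda_{l,i} denotes the statistical model selected by the decision tree from that segment's text features.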
  • Publication number: 20090222268
    Abstract: A speech synthesis system synthesizes a speech signal corresponding to an input speech signal based on a spectral envelope of the input speech signal. A glottal pulse generator generates a time series of glottal pulses that are processed into a glottal pulse magnitude spectrum. A shaping circuit shapes the glottal pulse magnitude spectrum based on the spectral envelope and generates a shaped glottal pulse magnitude spectrum. A harmonic null adjustment circuit reduces harmonic nulls in the shaped glottal pulse magnitude spectrum and generates a null-adjusted synthesized speech spectrum. An inverse transform circuit generates a null-adjusted time-series speech signal. An overlap and add circuit synthesizes the speech signal based on the null-adjusted time-series speech signal.
    Type: Application
    Filed: March 3, 2008
    Publication date: September 3, 2009
    Applicant: QNX SOFTWARE SYSTEMS (WAVEMAKERS), INC.
    Inventors: Xueman Li, Phillip A. Hetherington, Shahla Parveen, Tommy TSZ Chun Chiu
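    The shaping and null-adjustment steps in the chain above can be illustrated on plain magnitude lists. The 5%-of-peak null floor is an illustrative choice, not a value from the patent.

    ```python
    def shape_and_adjust(pulse_mag, envelope, null_floor=0.05):
        # Shape the glottal pulse magnitude spectrum by the spectral envelope.
        shaped = [p * e for p, e in zip(pulse_mag, envelope)]
        # Reduce harmonic nulls: lift near-zero bins toward a floor so the
        # inverse transform does not produce audible spectral holes.
        peak = max(shaped)
        return [max(m, null_floor * peak) for m in shaped]
    ```

    In the full system this per-frame spectrum would be inverse-transformed and combined by overlap-add, as the abstract describes.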
  • Publication number: 20090216537
    Abstract: A speech synthesis apparatus includes a text obtaining device that obtains text data for speech synthesis from the outside; a language processor that carries out morphological analysis and parsing on the text data; a prosodic processor that outputs to a speech synthesizer a synthesis unit string based on the prosodic and language-related attributes of the text data, such as accents and word classes; the speech synthesizer, which generates synthesized speech from the synthesis unit string; and a speech waveform output device that reproduces a prescribed amount of the output synthesized speech, either after it is accumulated or sequentially as it is output.
    Type: Application
    Filed: October 19, 2006
    Publication date: August 27, 2009
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Osamu Nishiyama, Masahiro Morita, Takehiko Kagoshima
  • Publication number: 20090198497
    Abstract: Provided is a method and apparatus for speech synthesis of a text message. The method includes receiving input of voice parameters for a text message, storing each of the text message and the input voice parameters in a data packet, and transmitting the data packet to a receiving terminal.
    Type: Application
    Filed: December 24, 2008
    Publication date: August 6, 2009
    Applicant: Samsung Electronics Co., Ltd.
    Inventor: Nyeong-kyu Kwon
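    The packet described above pairs the text message with the voice parameters used to synthesize it at the receiving terminal. A minimal sketch, assuming JSON as the wire format (the patent does not specify one):

    ```python
    import json

    def pack_message(text, voice_params):
        """Store the text message and the input voice parameters in one packet."""
        return json.dumps({'text': text, 'voice': voice_params}).encode('utf-8')

    def unpack_message(packet):
        """Receiving terminal recovers the text and the parameters that
        drive its local speech synthesizer."""
        data = json.loads(packet.decode('utf-8'))
        return data['text'], data['voice']
    ```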
  • Publication number: 20090164219
    Abstract: Accelerometer-based orientation and/or movement detection for controlling wearable devices, such as wrist-worn audio recorders and wristwatches. A wrist-worn audio recorder can use an accelerometer to detect the orientation and/or movement of a user's wrist and subsequently activate a corresponding audio-recorder function, for instance recording or playback. A wearable device with a vibration mechanism can use this method to remind a user of an undesirable movement such as restless leg movement. Likewise, a talking wristwatch can use this method to activate audio reporting of time when a user moves or orients his or her wrist in close proximity to his or her ear. In such applications, and many others, accelerometer-based control of the wearable device offers significant advantages over conventional means of control, particularly in terms of ease of use and durability.
    Type: Application
    Filed: December 18, 2008
    Publication date: June 25, 2009
    Applicant: ENBIOMEDIC
    Inventors: King-Wah Walter Yeung, Wei-Wei Vivian Yeung
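    A toy decision rule for this kind of accelerometer-based control: a sustained reading of roughly 1 g on the vertical axis means the wrist is level, while gravity along another axis suggests a raised, rotated wrist near the ear. The axes, thresholds, and gesture names are illustrative assumptions only.

    ```python
    def classify_gesture(ax, ay, az, g=9.81, tol=2.0):
        """Classify a static accelerometer reading (m/s^2 per axis)."""
        if abs(az - g) < tol:
            return 'flat'        # wrist level: no action
        if abs(ax + g) < tol:
            return 'near-ear'    # wrist raised and rotated: e.g. speak the time
        return 'unknown'
    ```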
  • Publication number: 20090157397
    Abstract: A voice rule-synthesizer synthesizes a voice waveform based on the voice data stored in a database, which stores a large number of compressed voice data sections in a data stream. Each voice data section is stored as a plurality of frames compressed in a fixed-length frame format. The storage capacity of the database is reduced because the compressed voice data sections are stored as the data stream.
    Type: Application
    Filed: February 19, 2009
    Publication date: June 18, 2009
    Inventor: Reishi Kondo
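    The fixed-length frame format is what makes the data-stream storage compact: a frame inside a compressed section can be located by arithmetic alone, with no per-frame index. The frame size here is an illustrative value.

    ```python
    FRAME_BYTES = 32  # hypothetical fixed compressed-frame size

    def frame_offset(section_start, frame_index):
        """Byte offset of a frame within the concatenated data stream,
        given the start offset of its voice data section."""
        return section_start + frame_index * FRAME_BYTES
    ```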
  • Publication number: 20090043583
    Abstract: The present invention discloses a solution for customizing synthetic voice characteristics in a user specific fashion. The solution can establish a communication between a user and a voice response system. A data store can be searched for a speech profile associated with the user. When a speech profile is found, a set of speech output characteristics established for the user from the profile can be determined. Parameters and settings of a text-to-speech engine can be adjusted in accordance with the determined set of speech output characteristics. During the established communication, synthetic speech can be generated using the adjusted text-to-speech engine. Thus, each detected user can hear a synthetic speech generated by a different voice specifically selected for that user. When no user profile is detected, a default voice or a voice based upon a user's speech or communication details can be used.
    Type: Application
    Filed: August 8, 2007
    Publication date: February 12, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Ciprian AGAPI, Oscar J. BLASS, Oswaldo GAGO, Roberto VILA
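    The profile-selection logic above reduces to a lookup with a default fallback: find the caller's speech profile, apply its output characteristics to the TTS engine's settings, and use a default voice when no profile exists. The field names are illustrative assumptions.

    ```python
    DEFAULT_PROFILE = {'voice': 'default', 'rate': 1.0, 'pitch': 0.0}

    def select_tts_settings(user_id, profiles):
        """profiles: dict of user_id -> partial speech profile."""
        profile = profiles.get(user_id)
        if profile is None:
            return dict(DEFAULT_PROFILE)   # no profile found: default voice
        settings = dict(DEFAULT_PROFILE)
        settings.update(profile)           # user-specific output characteristics
        return settings
    ```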
  • Publication number: 20090024393
    Abstract: A speech synthesizer conducts a dialogue among a plurality of synthesized speakers, including a self speaker and one or more partner speakers, by use of a voice profile table describing emotional characteristics of synthesized voices, a speaker database storing feature data for different types of speakers and/or different speaking tones, a speech synthesis engine that synthesizes speech from input text according to feature data fitting the voice profile assigned to each synthesized speaker, and a profile manager that updates the voice profiles according to the content of the spoken text. The voice profiles of partner speakers are initially derived from the voice profile of the self speaker. A synthesized dialogue can be set up simply by selecting the voice profile of the self speaker.
    Type: Application
    Filed: June 11, 2008
    Publication date: January 22, 2009
    Applicant: OKI ELECTRIC INDUSTRY CO., LTD.
    Inventor: Tsutomu Kaneyasu
  • Publication number: 20090012793
    Abstract: The present invention provides a text-to-speech assist for portable communication devices. A method for communicating text data using a portable communication device in accordance with the present invention includes: displaying text data on a display of the portable communication device while communicating with a party; selecting at least a portion of the displayed text data; converting the selected text data into synthesized speech; and providing the synthesized speech to the party using the portable communication device.
    Type: Application
    Filed: July 3, 2007
    Publication date: January 8, 2009
    Inventors: Quyen C. Dao, Gerard R. Raimondi, William D. Reeves, Paul L. Snyder
  • Publication number: 20080250095
    Abstract: A technology is disclosed that creates a program line-up for contents distributed to a user side, depending on various conditions, preferences, and communication environments of the user side that views and listens to the content. According to the technology, an on-board device (content receiving and reproducing device) 1 receives, from a service server 5, potential content list information of contents that can be distributed from a content server 7. The on-board device 1 sorts appropriate content from among the contents in the potential content list information and decides on a reproducing order of the contents, based on user-side conditions such as user preferences and vehicle conditions, conditions related to the user-side environment such as the communication environment, and conditions related to the contents such as genre. The on-board device 1 creates program table information (a timetable) and transmits it to the content server.
    Type: Application
    Filed: March 3, 2005
    Publication date: October 9, 2008
    Applicant: DENSO IT LABORATORY, INC.
    Inventor: Nobuhiro Mizuno