Methods For Producing Synthetic Speech; Speech Synthesizers (epo) Patents (Class 704/E13.002)
  • Publication number: 20130246063
    Abstract: A system and methods are disclosed which provide simple and rapid animated content creation, particularly for more life-like synthesis of voice segments associated with an animated element. A voice input tool enables quick creation of spoken language segments for animated characters. Speech is converted to text. That text may be reconverted to speech with prosodic elements added. The text, prosodic elements, and voice may be edited.
    Type: Application
    Filed: April 7, 2011
    Publication date: September 19, 2013
    Applicant: GOOGLE INC.
    Inventor: Eric Teller
  • Publication number: 20130226576
    Abstract: Speech recognition processing captures phonemes of words in a spoken speech string and retrieves text of words corresponding to particular combinations of phonemes from a phoneme dictionary. A text-to-speech synthesizer can then produce and substitute a synthesized pronunciation of a word in the speech string. If the speech recognition processing fails to recognize a particular combination of phonemes of a spoken word, as may occur when a word is spoken with an accent or when the speaker has a speech impediment, the speaker is prompted to clarify the word by entering it as text from a keyboard or the like. The entry is stored in the phoneme dictionary so that a synthesized pronunciation of the word can be played out when the initially unrecognized spoken word is encountered again in a speech string, improving intelligibility, particularly for conference calls.
    Type: Application
    Filed: February 23, 2012
    Publication date: August 29, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Peeyush Jaiswal, Burt Leo Vialpando, Fang Wang
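The dictionary-lookup-with-fallback flow described in the abstract above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the dictionary keyed by phoneme tuples and the `prompt_user` callback are assumptions.

```python
# Sketch of a phoneme-dictionary lookup with a prompt-and-store fallback.
# Names and data shapes are illustrative, not taken from the patent.

def lookup_word(phonemes, phoneme_dict):
    """Return the text of a word matching a phoneme combination, or None."""
    return phoneme_dict.get(tuple(phonemes))

def recognize_or_prompt(phonemes, phoneme_dict, prompt_user):
    """If the phoneme combination is unknown (e.g. accent or speech
    impediment), prompt the speaker to type the word and store it so a
    synthesized pronunciation can be substituted next time."""
    word = lookup_word(phonemes, phoneme_dict)
    if word is None:
        word = prompt_user(phonemes)          # e.g. keyboard entry
        phoneme_dict[tuple(phonemes)] = word  # remember for next time
    return word
```

On the next occurrence of the same phoneme combination, the stored entry is found directly and the synthesized pronunciation can be played out without prompting.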
  • Patent number: 8311830
    Abstract: Provided is a system and method for building and managing a customized voice of an end-user, comprising the steps of designing a set of prompts for collection from the user, wherein the prompts are selected both by an analysis tool and by the user's own choosing to capture voice characteristics unique to the user. The prompts are delivered to the user over a network to allow the user to save a user recording on a server of a service provider. This recording is then retrieved, stored on the server, and used to build a voice database with text-to-speech synthesis tools. A graphical interface allows the user to continuously refine the data file to improve the voice and to customize parameter and configuration settings, thereby forming a customized voice database which can be deployed or accessed.
    Type: Grant
    Filed: December 6, 2011
    Date of Patent: November 13, 2012
    Assignee: Cepstral, LLC
    Inventors: Craig F. Campbell, Kevin A. Lenzo, Alexandre D. Cox
  • Publication number: 20120265534
    Abstract: The method provides a spectral speech description to be used for synthesis of a speech utterance, where at least one spectral envelope input representation is received. In one solution the improvement is made by manipulating an extremum, i.e. a peak or a valley, in the rapidly varying component of the spectral envelope representation. The rapidly varying component is manipulated to sharpen and/or accentuate extrema, after which it is merged back with the slowly varying component of the spectral envelope input representation to create an enhanced spectral envelope final representation. In other solutions a complex spectral envelope final representation is created with phase information derived either from the group delay representation of a real spectral envelope input representation corresponding to a short-time speech signal, or from a transformed phase component of the discrete complex frequency domain input representation corresponding to the speech utterance.
    Type: Application
    Filed: September 4, 2009
    Publication date: October 18, 2012
    Applicant: SVOX AG
    Inventors: Geert Coorman, Johan Wouters
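The sharpen-and-merge idea in the abstract above can be sketched with a simple decomposition: split the envelope into slowly and rapidly varying parts with a moving average, amplify the rapid part (where the peaks and valleys live), and merge back. The moving-average split, smoothing width, and gain are assumptions; the patent does not specify them.

```python
# Minimal sketch of envelope extremum sharpening, assuming a moving-average
# decomposition into slowly and rapidly varying components.

def moving_average(x, width=5):
    """Slowly varying component: windowed mean with edge truncation."""
    half = width // 2
    out = []
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out

def sharpen_envelope(envelope, gain=1.5, width=5):
    """Accentuate extrema of the rapidly varying component, then merge it
    back with the slowly varying component."""
    slow = moving_average(envelope, width)
    rapid = [e - s for e, s in zip(envelope, slow)]  # peaks/valleys live here
    return [s + gain * r for s, r in zip(slow, rapid)]
```

A flat envelope passes through unchanged; a peak is pushed higher relative to its smoothed neighborhood.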
  • Publication number: 20120150542
    Abstract: A method includes obtaining audio data representing audio content from at least one speaker. The method also includes spatially processing the audio data to create at least one sound field, where each sound field has a spatial characteristic that is unique to a specific speaker. The method further includes generating the at least one sound field using the processed audio data. The audio data could represent audio content from multiple speakers, and generating the at least one sound field could include generating multiple sound fields around a listener. The spatially processing could include performing beam forming to create multiple directional beams, and generating the multiple sound fields around the listener could include generating the directional beams with different apparent origins around the listener. The method could further include separating the audio data based on speaker, where each sound field is associated with the audio data from one of the speakers.
    Type: Application
    Filed: December 9, 2010
    Publication date: June 14, 2012
    Applicant: NATIONAL SEMICONDUCTOR CORPORATION
    Inventor: Wei Ma
  • Patent number: 8086457
    Abstract: Provided is a system and method for building and managing a customized voice of an end-user, comprising the steps of designing a set of prompts for collection from the user, wherein the prompts are selected both by an analysis tool and by the user's own choosing to capture voice characteristics unique to the user. The prompts are delivered to the user over a network to allow the user to save a user recording on a server of a service provider. This recording is then retrieved, stored on the server, and used to build a voice database with text-to-speech synthesis tools. A graphical interface allows the user to continuously refine the data file to improve the voice and to customize parameter and configuration settings, thereby forming a customized voice database which can be deployed or accessed.
    Type: Grant
    Filed: May 29, 2008
    Date of Patent: December 27, 2011
    Assignee: Cepstral, LLC
    Inventors: Craig F. Campbell, Kevin A. Lenzo, Alexandre D. Cox
  • Publication number: 20110313771
    Abstract: A method for audibly instructing a user to interact with a function. A function is associated with a user-written selectable item. The user-written selectable item is recognized on a surface. In response to recognizing the user-written selectable item, a first instructional message related to the operation of the function is audibly rendered without requiring further interaction from the user.
    Type: Application
    Filed: December 13, 2010
    Publication date: December 22, 2011
    Applicant: LEAPFROG ENTERPRISES, INC.
    Inventor: James Marggraff
  • Publication number: 20110246200
    Abstract: Pre-saved concatenation cost data is compressed through speech segment grouping. Speech segments are assigned to a predefined number of groups based on their concatenation cost values with other speech segments. A representative segment is selected for each group. The concatenation cost between two segments in different groups may then be approximated by that between the representative segments of their respective groups, thereby reducing an amount of concatenation cost data to be pre-saved.
    Type: Application
    Filed: April 5, 2010
    Publication date: October 6, 2011
    Applicant: Microsoft Corporation
    Inventors: Huicheng Song, Guoliang Zhang, Zhiwei Weng
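The compression scheme in the abstract above can be sketched as follows: only representative-to-representative costs are pre-saved, so N segments need a table of G² entries (G groups) rather than N². The toy cost function and the exact data shapes are illustrative assumptions, not details from the patent.

```python
# Sketch of concatenation-cost compression via segment grouping.

def build_cost_table(representatives, concat_cost):
    """Pre-save only the costs between the groups' representative
    segments, instead of all pairwise segment costs."""
    return {(ga, gb): concat_cost(ra, rb)
            for ga, ra in representatives.items()
            for gb, rb in representatives.items()}

def approx_cost(seg_a, seg_b, group_of, table):
    """Approximate the concatenation cost between two segments by the
    cost between their groups' representatives."""
    return table[(group_of[seg_a], group_of[seg_b])]
```

For example, with a toy cost `abs(a - b)` over scalar "segments", two groups with representatives 1.0 and 10.0 approximate any cross-group cost as 9.0.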
  • Publication number: 20110224977
    Abstract: A robot may include a driving control unit configured to control driving of a movable unit that is movably connected to a body unit, a voice generating unit configured to generate a voice, and a voice output unit configured to output the voice generated by the voice generating unit. The voice generating unit may correct the generated voice based on the bearing, relative to the body unit, of the movable unit controlled by the driving control unit.
    Type: Application
    Filed: September 14, 2010
    Publication date: September 15, 2011
    Applicant: HONDA MOTOR CO., LTD.
    Inventors: Kazuhiro NAKADAI, Takuma OTSUKA, Hiroshi OKUNO
  • Publication number: 20110210822
    Abstract: A refrigerator is provided. The refrigerator includes a voice recognition unit for recognizing the spoken name of a food item, a memory for storing location information of food received in a storage chamber, a controller for interpreting the voice recognized by the voice recognition unit and searching for the storage location of the food item in accordance with the recognized voice, and a voice output unit for outputting a voice message on the storage location information found by the controller.
    Type: Application
    Filed: September 11, 2008
    Publication date: September 1, 2011
    Applicant: LG Electronics Inc.
    Inventors: Sung-Ae Lee, Min-Kyeong Kim
  • Publication number: 20110179006
    Abstract: A system and method for providing a natural language interface to a database or the Internet. The method provides a response from a database to a natural language query. The method comprises receiving a user query, extracting key data from the user query, submitting the extracted key data to a database search engine to retrieve the top n pages from the database, processing the top n pages through a natural language dialog engine, and providing a response based on processing the top n pages.
    Type: Application
    Filed: March 29, 2011
    Publication date: July 21, 2011
    Applicant: AT&T Corp.
    Inventors: Richard Vandervoort Cox, Hossein Eslambolchi, Behzad Nadji, Mazin G. Rahim
  • Publication number: 20110093272
    Abstract: A media process server apparatus has a speech synthesis data storage device for storing, after categorizing into emotions, data for speech synthesis in association with a user identifier, a text analyzer for determining, from a text message received from a message server apparatus, emotion of text, and a speech data synthesizer for generating speech data with emotional expression by synthesizing speech corresponding to the text, using data for speech synthesis that corresponds to the determined emotion and that is in association with a user identifier of a user who is a transmitter of the text message.
    Type: Application
    Filed: April 2, 2009
    Publication date: April 21, 2011
    Applicant: NTT DOCOMO, INC.
    Inventors: Shin-ichi Isobe, Masami Yabusaki
  • Publication number: 20110071836
    Abstract: Disclosed herein are systems, computer-implemented methods, and computer-readable storage media for unit selection synthesis. The method causes a computing device to add a supplemental phoneset to a speech synthesizer front end having an existing phoneset, modify a unit preselection process based on the supplemental phoneset, preselect units from the supplemental phoneset and the existing phoneset based on the modified unit preselection process, and generate speech based on the preselected units. The supplemental phoneset can be a variation of the existing phoneset, can include a word boundary feature, can include a cluster feature where initial consonant clusters and some word boundaries are marked with diacritics, can include a function word feature which marks units as originating from a function word or a content word, and/or can include a pre-vocalic or post-vocalic feature. The speech synthesizer front end can incorporate the supplemental phoneset as an extra feature.
    Type: Application
    Filed: September 21, 2009
    Publication date: March 24, 2011
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Alistair D. CONKIE, Mark BEUTNAGEL, Yeon-Jun KIM, Ann K. SYRDAL
  • Publication number: 20110046955
    Abstract: There is provided a speech processing apparatus including: a data obtaining unit which obtains music progression data defining a property of one or more time points or time periods along the progression of music; a determining unit which determines, by utilizing the music progression data obtained by the data obtaining unit, an output time point at which speech is to be output while the music is reproduced; and an audio output unit which outputs the speech at the output time point determined by the determining unit during reproduction of the music.
    Type: Application
    Filed: August 12, 2010
    Publication date: February 24, 2011
    Inventors: Tetsuo IKEDA, Ken MIYASHITA, Tatsushi NASHIDA
  • Publication number: 20110022390
    Abstract: In order to speak numerals in a manner readily comprehensible to a user, a speech device includes a voice synthesis portion 55 which, when a given character string includes a numeral made up of a plurality of digits, speaks the numeral in either a first speech method, in which the numeral is read aloud as individual digits, or a second speech method, in which the numeral is read aloud as a full number. A user definition table 81, an association table 83, a region table 84, and a digit number table 87 associate a type of character string with either the first or the second speech method. A process executing portion 53 executes a process to output data, and a speech control portion 51 generates a character string on the basis of the output data and causes the voice synthesis portion 55 to speak the generated character string in whichever of the two speech methods is associated with the type of the output data.
    Type: Application
    Filed: February 4, 2009
    Publication date: January 27, 2011
    Applicant: SANYO ELECTRIC CO., LTD.
    Inventors: Kinya Otani, Naoki Hirose
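The two speech methods in the abstract above can be sketched as follows. The type-to-method table is an illustrative stand-in for the patent's user definition and association tables, and the partial number-to-words reader is an assumption.

```python
# Sketch of digit-by-digit vs. full-number reading, selected by string type.

DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]

def read_as_digits(numeral):
    """First speech method: read the numeral as individual digits."""
    return " ".join(DIGIT_WORDS[int(d)] for d in numeral)

def read_as_number(numeral):
    """Second speech method: read the numeral as a full number
    (only numbers below 100 are spelled out in this sketch)."""
    n = int(numeral)
    tens = ["", "", "twenty", "thirty", "forty", "fifty",
            "sixty", "seventy", "eighty", "ninety"]
    teens = ["ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
             "sixteen", "seventeen", "eighteen", "nineteen"]
    if n < 10:
        return DIGIT_WORDS[n]
    if n < 20:
        return teens[n - 10]
    if n < 100:
        rest = "-" + DIGIT_WORDS[n % 10] if n % 10 else ""
        return tens[n // 10] + rest
    return str(n)  # larger numbers left to a fuller number reader

# Stand-in for the tables associating a string type with a speech method.
METHOD_BY_TYPE = {"phone_number": read_as_digits, "price": read_as_number}

def speak_numeral(numeral, string_type):
    return METHOD_BY_TYPE[string_type](numeral)
```

So "42" is rendered "four two" when typed as a phone number but "forty-two" when typed as a price.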
  • Patent number: 7869999
    Abstract: A system and method for generating synthetic speech, which operates in a computer implemented Text-To-Speech system. The system comprises at least a speaker database that has been previously created from user recordings, a Front-End system to receive an input text, and a Text-To-Speech engine. The Front-End system generates multiple phonetic transcriptions for each word of the input text, and the TTS engine uses a cost function to select which phonetic transcription is the most appropriate for searching the speech segments within the speaker database to be concatenated and synthesized.
    Type: Grant
    Filed: August 10, 2005
    Date of Patent: January 11, 2011
    Assignee: Nuance Communications, Inc.
    Inventors: Christel Amato, Hubert Crepy, Stephane Revelin, Claire Waast-Richard
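The transcription-selection step in the abstract above can be sketched as a minimum-cost choice. The cost shown here, counting phonemes missing from the speaker database, is an illustrative stand-in; the patent's actual cost function is not specified in the abstract.

```python
# Sketch of selecting among multiple front-end phonetic transcriptions
# with a cost function over speaker-database coverage (assumed cost).

def transcription_cost(transcription, available_phonemes):
    """Toy cost: number of phonemes absent from the speaker database."""
    return sum(1 for ph in transcription if ph not in available_phonemes)

def best_transcription(transcriptions, available_phonemes):
    """Pick the transcription the database can cover best."""
    return min(transcriptions,
               key=lambda t: transcription_cost(t, available_phonemes))
```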
  • Publication number: 20100324903
    Abstract: Disclosed are techniques and systems to provide a narration of a text in multiple different voices. Further disclosed are techniques and systems for providing a plurality of characters at least some of the characters having multiple associated moods for use in document narration.
    Type: Application
    Filed: January 14, 2010
    Publication date: December 23, 2010
    Inventors: Raymond C. Kurzweil, Paul Albrecht, Peter Chapman
  • Publication number: 20100324904
    Abstract: Disclosed are techniques and systems to provide a narration of a text in multiple different languages where the portions of the text narrated using the different voices associated with different languages are selected by a user.
    Type: Application
    Filed: January 14, 2010
    Publication date: December 23, 2010
    Inventors: Raymond C. Kurzweil, Paul Albrecht, Peter Chapman
  • Publication number: 20100312562
    Abstract: A rope-jumping algorithm is employed in a Hidden Markov Model based text-to-speech system to determine start and end models and to modify the start and end models by setting small covariances. The modification avoids disordered acoustic parameters caused by violation of parameter constraints, resulting in a stable line frequency spectrum for the generated speech.
    Type: Application
    Filed: June 4, 2009
    Publication date: December 9, 2010
    Applicant: Microsoft Corporation
    Inventors: Wenlin Wang, Guoliang Zhang, Jingyang Xu
  • Publication number: 20100305949
    Abstract: Provided are a speech synthesis device, speech synthesis method, and speech synthesis program which can improve speech quality and reduce the calculation amount with a good balance between the two. The speech synthesis device includes: a sub-score calculation unit (60/65) which calculates a segment selection sub-score for selecting an optimal segment; and a candidate narrowing unit (70/73) which narrows the candidates according to the number of candidate segments and the segment selection sub-score. The speech synthesis device performs candidate narrowing with the sub-score calculation unit (60/65) and the candidate narrowing unit (70/73) during the candidate selection process when generating synthesized speech from an input text.
    Type: Application
    Filed: November 25, 2008
    Publication date: December 2, 2010
    Inventors: Masanori Kato, Yasuyuki Mitsui, Reishi Kondo
  • Publication number: 20100274838
    Abstract: A system configured to pre-render an audio representation of textual content for subsequent playback includes a network, a source server, and a requesting device. The source server is configured to provide a plurality of textual content across the network. The requesting device includes a download unit, a signature generating unit, a signature comparing unit, and a text to speech conversion unit. The download unit is configured to download the plurality of textual content from the source server across the network. The signature generating unit is configured to generate a unique signature for each of the textual content. The signature comparing unit is configured to compare each unique signature with a prior corresponding signature to determine whether the corresponding textual content has changed. The text to speech conversion unit is configured to convert the textual content to speech when the textual content has been determined to have changed.
    Type: Application
    Filed: April 24, 2009
    Publication date: October 28, 2010
    Inventor: Richard A. Zemer
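The compare-then-convert flow in the abstract above can be sketched as follows. SHA-256 is assumed as the unique signature; the patent does not name a specific hash.

```python
import hashlib

# Sketch of re-synthesizing audio only when downloaded text has changed.

def signature(text):
    """Generate a unique signature for a piece of textual content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def render_if_changed(text, prior_signature, text_to_speech):
    """Compare the new signature with the prior one; run text-to-speech
    conversion only when the content has been determined to have changed.
    Returns (new_audio_or_None, current_signature)."""
    sig = signature(text)
    if sig != prior_signature:
        return text_to_speech(text), sig  # changed: re-synthesize
    return None, sig                      # unchanged: reuse prior audio
```

On first download the prior signature is absent, so conversion always runs; subsequent unchanged downloads skip the costly text-to-speech step.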
  • Publication number: 20100250254
    Abstract: An acquiring unit acquires pattern sentences, which are similar to one another and include fixed segments and non-fixed segments, and substitution words that are substituted for the non-fixed segments. A sentence generating unit generates target sentences by replacing the non-fixed segments with the substitution words for each of the pattern sentences. A first synthetic-sound generating unit generates a first synthetic sound, a synthetic sound of the fixed segment, and a second synthetic-sound generating unit generates a second synthetic sound, a synthetic sound of the substitution word, for each of the target sentences. A calculating unit calculates a discontinuity value of a boundary between the first synthetic sound and the second synthetic sound for each of the target sentences and a selecting unit selects the target sentence having the smallest discontinuity value. A connecting unit connects the first synthetic sound and the second synthetic sound of the target sentence selected.
    Type: Application
    Filed: September 15, 2009
    Publication date: September 30, 2010
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventor: Nobuaki Mizutani
  • Publication number: 20100217657
    Abstract: An adaptive information presentation apparatus and associated methods. In one embodiment, the apparatus comprises a computer readable medium having at least one computer program disposed thereon, the at least one program being configured to adaptively present (e.g., display or play out via an audio system) information that is related or responsive to inputs provided via an input device such as, for example, a touch-screen display device. In one variant, the at least one program analyzes user input to determine a context of the input, and selects advertising related to the context for presentation to the user.
    Type: Application
    Filed: February 24, 2010
    Publication date: August 26, 2010
    Inventor: Robert F. Gazdzinski
  • Publication number: 20100198577
    Abstract: Creation of sub-phonemic Hidden Markov Model (HMM) states and the mapping of those states result in improved cross-language speaker adaptation. The smaller sub-phonemic mapping provides improvements in usability and intelligibility, particularly between languages with few common phonemes. HMM states of different languages may be mapped to one another using a distance between the HMM states in acoustic space. This distance may be calculated using Kullback-Leibler divergence and multi-space probability distribution. By combining distance mapping and context mapping for different speakers of the same language, improved cross-language speaker adaptation is possible.
    Type: Application
    Filed: February 3, 2009
    Publication date: August 5, 2010
    Applicant: Microsoft Corporation
    Inventors: Yi-Ning Chen, Yao Qian, Frank Kao-Ping Soong
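The distance-based state mapping in the abstract above can be sketched with the closed-form Kullback-Leibler divergence between diagonal-covariance Gaussians. The abstract names KL divergence; the single-Gaussian-per-state simplification and the data shapes are assumptions of this sketch.

```python
import math

# Sketch of mapping HMM states across languages by acoustic distance.
# Each state is (mean_vector, variance_vector) of a diagonal Gaussian.

def kl_gaussian(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for diagonal-covariance Gaussians, in closed form."""
    return 0.5 * sum(
        math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
        for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q))

def map_states(src_states, tgt_states):
    """For each source-language state, pick the target-language state
    nearest in acoustic space under the KL distance."""
    mapping = {}
    for name, (mu, var) in src_states.items():
        mapping[name] = min(
            tgt_states,
            key=lambda t: kl_gaussian(mu, var, *tgt_states[t]))
    return mapping
```

A source state with mean 0 maps to a target state with mean 0.1 rather than one with mean 5, since the former is far closer in acoustic space.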
  • Publication number: 20100145706
    Abstract: An object of the present invention is to provide a device and a method for generating a synthesized speech that has an utterance form that matches music. A musical genre estimation unit of the speech synthesizing device estimates the musical genre to which a received music signal belongs, an utterance form selection unit references an utterance form information storage unit to determine an utterance form from the musical genre. A prosody generation unit references a prosody generation rule storage unit, selected from prosody generation rule storage units 151 to 15N according to the utterance form, and generates prosody information from a phonetic symbol sequence. A unit waveform selection unit references a unit waveform data storage unit, selected from unit waveform data storage units 161 to 16N according to the utterance form, and selects a unit waveform from the phonetic symbol sequence and the prosody information.
    Type: Application
    Filed: February 1, 2007
    Publication date: June 10, 2010
    Applicant: NEC CORPORATION
    Inventor: Masanori Kato
  • Publication number: 20100145701
    Abstract: The sensation of presence in voice chat in a virtual space is enhanced. A user speech synthesizer is used in a virtual space sharing system where information processing devices share the virtual space. The user speech synthesizer comprises a speech data acquiring section (60) for acquiring speech data representing a speech uttered by the user of one of the information processing devices, an environment sound storage section (66) for storing an environment sound associated with one or more regions defined in the virtual space, a region specifying section (64) for specifying a region corresponding to the user in the virtual space, and an environment sound synthesizing section (68) for acquiring the environment sound associated with the specified region from the environment sound storage section (66) and combining the acquired environment sound with the speech data to produce synthesized speech data.
    Type: Application
    Filed: June 7, 2006
    Publication date: June 10, 2010
    Applicant: KONAMI DIGITAL ENTERTAINMENT CO., LTD.
    Inventors: Hiromasa Kaneko, Masaki Takeuchi
  • Publication number: 20100094632
    Abstract: Disclosed herein are various aspects of a toolkit used for generating a TTS voice for use in a spoken dialog system. The embodiments in each case may be in the form of a system, a computer-readable medium, or a method for generating the TTS voice. An embodiment of the invention relates to a method of tracking progress in developing a text-to-speech (TTS) voice. The method comprises ensuring that a corpus of recorded speech contains no reading errors and matches an associated written text, creating a tuple for each utterance in the corpus, and tracking progress for each utterance utilizing the tuple. Various parameters may be tracked using the tuple, but the tuple's principal purpose is to enable multiple workers to efficiently process a database of utterances in preparation of a TTS voice.
    Type: Application
    Filed: December 15, 2009
    Publication date: April 15, 2010
    Applicant: AT&T Corp.
    Inventors: Steven Lawrence Davis, Shane Fetters, David Eugene Schultz, Beverly Gustafson, Louise Loney
  • Publication number: 20100088099
    Abstract: A portable reading device includes a computing device and a computer readable medium storing a computer program product to receive an image and select a section of the image to process. The product processes the section of the image with a first process and, when the first process is finished, processes the result of the first process with a second process. While the second process is running, the product repeats the first process on another section of the image.
    Type: Application
    Filed: December 8, 2009
    Publication date: April 8, 2010
    Inventors: Raymond C. Kurzweil, Paul Albrecht, Lucy Gibson
  • Publication number: 20100082345
    Abstract: An “Animation Synthesizer” uses trainable probabilistic models, such as Hidden Markov Models (HMM), Artificial Neural Networks (ANN), etc., to provide speech and text driven body animation synthesis. Probabilistic models are trained using synchronized motion and speech inputs (e.g., live or recorded audio/video feeds) at various speech levels, such as sentences, phrases, words, phonemes, sub-phonemes, etc., depending upon the available data, and the motion type or body part being modeled. The Animation Synthesizer then uses the trainable probabilistic model for selecting animation trajectories for one or more different body parts (e.g., face, head, hands, arms, etc.) based on an arbitrary text and/or speech input. These animation trajectories are then used to synthesize a sequence of animations for digital avatars, cartoon characters, computer generated anthropomorphic persons or creatures, actual motions for physical robots, etc.
    Type: Application
    Filed: September 26, 2008
    Publication date: April 1, 2010
    Applicant: MICROSOFT CORPORATION
    Inventors: Lijuan Wang, Lei Ma, Frank Kao-Ping Soong
  • Publication number: 20100082346
    Abstract: Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back.
    Type: Application
    Filed: September 29, 2008
    Publication date: April 1, 2010
    Applicant: Apple Inc.
    Inventors: Matthew Rogers, Kim Silverman, DeVang Naik, Kevin Lenzo, Benjamin Rottler
  • Publication number: 20100076762
    Abstract: A method for generating animated sequences of talking heads in text-to-speech applications wherein a processor samples a plurality of frames comprising image samples. The processor reads first data comprising one or more parameters associated with noise-producing orifice images of sequences of at least three concatenated phonemes which correspond to an input stimulus. The processor reads, based on the first data, second data comprising images of a noise-producing entity. The processor generates an animated sequence of the noise-producing entity.
    Type: Application
    Filed: November 30, 2009
    Publication date: March 25, 2010
    Applicant: AT&T Corp.
    Inventors: Eric Cosatto, Hans Peter Graf, Juergen Schroeter
  • Publication number: 20100049523
    Abstract: Systems and methods for providing synthesized speech in a manner that takes into account the environment where the speech is presented. A method embodiment includes selecting, based on a listening environment and at least one other parameter, an approach from a plurality of approaches for presenting synthesized speech in the listening environment; presenting synthesized speech according to the selected approach; and, based on natural language input received from a user indicating an inability to understand the presented synthesized speech, selecting a second approach from the plurality of approaches and presenting subsequent synthesized speech using the second approach.
    Type: Application
    Filed: October 28, 2009
    Publication date: February 25, 2010
    Applicant: AT&T Corp.
    Inventors: Kenneth H. Rosen, Carroll W. Creswell, Jeffrey J. Farah, Pradeep K. Bansal, Ann K. Syrdal
  • Publication number: 20100042410
    Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
    Type: Application
    Filed: August 11, 2009
    Publication date: February 18, 2010
    Inventor: James H. Stephens, JR.
  • Publication number: 20100030775
    Abstract: A method for non-text-based identification of a selected item of stored music. The first broad portion of the method focuses on building a music identification database. That process requires capturing a tag of the selected musical item and processing the tag to develop a reference key to it. Then the tag is stored, together with the reference key and an association to the stored music. The database is built by collecting a multiplicity of tags. The second broad portion of the method is retrieving a desired item of stored music from the database. That process calls for capturing a query tag from a user and processing the query tag to develop a query key. The query key is compared to the reference keys stored in the database to identify the desired item of stored music.
    Type: Application
    Filed: October 13, 2009
    Publication date: February 4, 2010
    Applicant: Melodis Corporation
    Inventors: Keyvan Mohajer, Majid Emami, Michal Grabowski, James M. Hom
  • Publication number: 20090326950
    Abstract: A voice waveform interpolating apparatus for interpolating part of stored voice data with another part of the voice data so as to generate voice data. To achieve this, it comprises a voice storage unit, an interpolated waveform generation unit generating interpolated voice data, and a waveform combining unit outputting voice data in which part of the voice data is replaced with another part. It further comprises an interpolated waveform setting function unit judging whether the other part of the voice data is appropriate as the interpolated voice data to be generated by the interpolated waveform generation unit.
    Type: Application
    Filed: August 31, 2009
    Publication date: December 31, 2009
    Applicant: FUJITSU LIMITED
    Inventor: Chikako Matsumoto
  • Publication number: 20090326948
    Abstract: A method, system and computer-usable medium are disclosed for the transcoding of annotated text to speech and audio. Source text is parsed into spoken text passages and sound description passages. A speaker identity is determined for each spoken text passage and a sound element for each sound description passage. The speaker identities and sound elements are automatically referenced to a voice and sound effects schema. A voice effect is associated with each speaker identity and a sound effect with each sound element. Each spoken text passage is then annotated with the voice effect associated with its speaker identity and each sound description passage is annotated with the sound effect associated with its sound element. The resulting annotated spoken text and sound description passages are processed to generate output text operable to be transcoded to speech and audio.
    Type: Application
    Filed: June 26, 2008
    Publication date: December 31, 2009
    Inventors: Piyush Agarwal, Priya B. Benjamin, Kam K. Yee, Neeraj Joshi
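    The annotation flow above can be illustrated with a small sketch. The passage representation, the speaker-to-voice and sound-description schemas, and the SSML-like output tags are all assumptions for the example; the patent does not fix a concrete schema.

    ```python
    # Illustrative voice and sound effects schemas (hypothetical values).
    VOICE_SCHEMA = {'narrator': 'en-US-voice-1', 'alice': 'en-US-voice-2'}
    SOUND_SCHEMA = {'door slam': 'fx/door_slam.wav'}

    def transcode(passages):
        """passages: list of ('speech', speaker, text) or
        ('sound', description) tuples, in source order."""
        out = []
        for p in passages:
            if p[0] == 'speech':
                _, speaker, text = p
                voice = VOICE_SCHEMA[speaker]      # voice effect for this speaker identity
                out.append(f'<voice name="{voice}">{text}</voice>')
            else:
                _, description = p
                sound = SOUND_SCHEMA[description]  # sound effect for this sound element
                out.append(f'<audio src="{sound}"/>')
        return '\n'.join(out)
    ```

    The resulting annotated text would then be handed to a speech/audio transcoder, as the abstract describes.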
  • Publication number: 20090319273
    Abstract: An audio content generation system generates audio contents and includes a voice synthesis unit 102 that generates synthesized voice from text. It is provided with an audio content generation unit 103 connected to a multimedia database 101, in which contents composed mainly of audio article data V1 to V3 or text article data T1 and T2 are registered. Using the voice synthesis unit 102, the audio content generation unit 103 generates synthesized voice SYT1 and SYT2 for the text article data T1 and T2 registered in the multimedia database 101, and generates audio contents in which the synthesized voice SYT1 and SYT2 and the audio article data V1 to V3 are organized in a predetermined order.
    Type: Application
    Filed: June 27, 2007
    Publication date: December 24, 2009
    Applicant: NEC CORPORATION
    Inventors: Yasuyuki Mitsui, Shinichi Doi, Reishi Kondo, Masanori Kato
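    The core interleaving step can be sketched in a few lines: text articles are run through the synthesizer while audio articles pass through unchanged, preserving the predetermined order. The tuple representation and the synthesizer callable are assumptions for the example.

    ```python
    def generate_audio_content(items, synthesize):
        """items: ordered list of ('audio', payload) or ('text', payload).
        Text articles are synthesized; audio articles pass through unchanged."""
        return [synthesize(payload) if kind == 'text' else payload
                for kind, payload in items]
    ```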
  • Publication number: 20090310939
    Abstract: A simulation method and system. A computing system receives a first audio and/or video data stream that includes data associated with a first person. The computing system monitors the stream and identifies the emotional attributes it comprises. The computing system then generates and stores a second audio and/or video data stream, associated with the first, that includes the data without the emotional attributes.
    Type: Application
    Filed: June 12, 2008
    Publication date: December 17, 2009
    Inventors: Sara H. Basson, Dimitri Kanevsky, Edward Emile Kelley, Bhuvana Ramabhadran
  • Publication number: 20090299974
    Abstract: A computer-readable recording medium stores therein a sequence-map generating program that causes a computer to execute extracting from files that include character strings written therein, a word having q (q≥2) characters; extracting from the word extracted at the extracting the word, consecutive characters from a character position s-th (1≤s≤q−r+1) from a head of the word to a character position determined by a number of characters r (r≤q); and generating, for each character position s-th from the head, a consecutive-character sequence map including a flag row that indicates, for each file, whether a file includes the consecutive characters extracted at the extracting the consecutive characters.
    Type: Application
    Filed: January 29, 2009
    Publication date: December 3, 2009
    Applicant: FUJITSU LIMITED
    Inventors: Masahiro Kataoka, Tomoki Nagase, Takashi Tsubokura
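    The sequence map above is essentially a position-indexed n-gram-to-file bitmap. A minimal sketch, assuming a dict-of-dicts representation in place of the patent's flag rows:

    ```python
    def build_sequence_maps(files, r=2):
        """files: dict of filename -> list of words (each of length q >= 2).
        Returns {position s (1-based): {r-gram: {filename: flag}}}: for each
        start position and run of r consecutive characters, a flag row
        recording which files contain that run at that position."""
        maps = {}
        for name, words in files.items():
            for word in words:
                for s in range(len(word) - r + 1):   # 0-based start positions
                    gram = word[s:s + r]
                    maps.setdefault(s + 1, {}).setdefault(gram, {})[name] = True
        # Complete each flag row so every file carries an explicit flag.
        for pos_map in maps.values():
            for row in pos_map.values():
                for name in files:
                    row.setdefault(name, False)
        return maps
    ```

    Such maps let a search narrow candidate files by intersecting flag rows before any file is actually opened.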
  • Publication number: 20090248820
    Abstract: An interface between mobile devices and computing devices, such as a PC or an in-vehicle system, permits a user to use the better user interface of the computing device to access and control the operation of the mobile device.
    Type: Application
    Filed: March 25, 2009
    Publication date: October 1, 2009
    Inventors: Otman A. Basir, William Ben Miners
  • Publication number: 20090248417
    Abstract: A method to generate a pitch contour for speech synthesis is proposed. The method finds the pitch contour that maximizes a total likelihood function created by combining the statistical models of the pitch contour segments of an utterance, at one or multiple linguistic levels. These statistical models are trained from a database of spoken speech by means of a decision tree that, for each linguistic level, clusters the parametric representation of the pitch segments extracted from the spoken speech data using features obtained from the text associated with that speech data. The pitch segments are parameterized in such a way that the likelihood function of any linguistic level can be expressed in terms of the parameters of one of the levels, allowing the maximization to be calculated with respect to the parameters of that level.
    Type: Application
    Filed: March 17, 2009
    Publication date: October 1, 2009
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Javier Latorre, Masami Akamine
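    In rough notation (the symbols below are our own shorthand, not taken from the patent), the objective combines the per-level, per-segment log-likelihoods and maximizes over the parameters of one chosen level:

    ```latex
    \hat{\mathbf{p}} \;=\; \arg\max_{\mathbf{p}}
      \sum_{\ell \in \text{levels}} \; \sum_{i \in \text{segments}(\ell)}
      \log P\!\left( f_{\ell,i}(\mathbf{p}) \,\middle|\, \lambda_{\ell,i} \right)
    ```

    where f_{l,i}(p) maps the chosen level's parameters p to the parametric representation of segment i at level l, and lambda_{l,i} denotes the statistical model selected by the decision tree from that segment's text features.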
  • Publication number: 20090222268
    Abstract: A speech synthesis system synthesizes a speech signal corresponding to an input speech signal based on a spectral envelope of the input speech signal. A glottal pulse generator generates a time series of glottal pulses that are processed into a glottal pulse magnitude spectrum. A shaping circuit shapes the glottal pulse magnitude spectrum based on the spectral envelope and generates a shaped glottal pulse magnitude spectrum. A harmonic null adjustment circuit reduces harmonic nulls in the shaped glottal pulse magnitude spectrum and generates a null-adjusted synthesized speech spectrum. An inverse transform circuit generates a null-adjusted time-series speech signal. An overlap and add circuit synthesizes the speech signal based on the null-adjusted time-series speech signal.
    Type: Application
    Filed: March 3, 2008
    Publication date: September 3, 2009
    Applicant: QNX SOFTWARE SYSTEMS (WAVEMAKERS), INC.
    Inventors: Xueman Li, Phillip A. Hetherington, Shahla Parveen, Tommy TSZ Chun Chiu
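    The shaping and null-adjustment steps in the chain above can be illustrated on plain magnitude lists. The 5%-of-peak null floor is an illustrative choice, not a value from the patent.

    ```python
    def shape_and_adjust(pulse_mag, envelope, null_floor=0.05):
        # Shape the glottal pulse magnitude spectrum by the spectral envelope.
        shaped = [p * e for p, e in zip(pulse_mag, envelope)]
        # Reduce harmonic nulls: lift near-zero bins toward a floor so the
        # inverse transform does not produce audible spectral holes.
        peak = max(shaped)
        return [max(m, null_floor * peak) for m in shaped]
    ```

    In the full system this per-frame spectrum would be inverse-transformed and combined by overlap-add, as the abstract describes.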
  • Publication number: 20090216537
    Abstract: A speech synthesis apparatus includes a text obtaining device that obtains text data for speech synthesis from the outside; a language processor that carries out morphological analysis and parsing on the text data; a prosodic processor that outputs to a speech synthesizer a synthesis unit string based on the prosodic and language-related attributes of the text data, such as accents and word classes; the speech synthesizer, which generates synthesized speech from the synthesis unit string; and a speech waveform output device that reproduces a prescribed amount of the output synthesized speech, either after it is accumulated or sequentially as it is output.
    Type: Application
    Filed: October 19, 2006
    Publication date: August 27, 2009
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Osamu Nishiyama, Masahiro Morita, Takehiko Kagoshima
  • Publication number: 20090198497
    Abstract: Provided is a method and apparatus for speech synthesis of a text message. The method includes receiving input of voice parameters for a text message, storing each of the text message and the input voice parameters in a data packet, and transmitting the data packet to a receiving terminal.
    Type: Application
    Filed: December 24, 2008
    Publication date: August 6, 2009
    Applicant: Samsung Electronics Co., Ltd.
    Inventor: Nyeong-kyu Kwon
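    The packet described above pairs the text message with the voice parameters used to synthesize it at the receiving terminal. A minimal sketch, assuming JSON as the wire format (the patent does not specify one):

    ```python
    import json

    def pack_message(text, voice_params):
        """Store the text message and the input voice parameters in one packet."""
        return json.dumps({'text': text, 'voice': voice_params}).encode('utf-8')

    def unpack_message(packet):
        """Receiving terminal recovers the text and the parameters that
        drive its local speech synthesizer."""
        data = json.loads(packet.decode('utf-8'))
        return data['text'], data['voice']
    ```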
  • Publication number: 20090164219
    Abstract: Accelerometer-based orientation and/or movement detection for controlling wearable devices, such as wrist-worn audio recorders and wristwatches. A wrist-worn audio recorder can use an accelerometer to detect the orientation and/or movement of a user's wrist and subsequently activate a corresponding audio-recorder function, for instance recording or playback. A wearable device with a vibration mechanism can use this method to remind a user of an undesirable movement such as restless leg movement. Likewise, a talking wristwatch can use this method to activate audio reporting of time when a user moves or orients his or her wrist in close proximity to his or her ear. In such applications, and many others, accelerometer-based control of the wearable device offers significant advantages over conventional means of control, particularly in terms of ease of use and durability.
    Type: Application
    Filed: December 18, 2008
    Publication date: June 25, 2009
    Applicant: ENBIOMEDIC
    Inventors: King-Wah Walter Yeung, Wei-Wei Vivian Yeung
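    A toy decision rule for this kind of accelerometer-based control: a sustained reading of roughly 1 g on the vertical axis means the wrist is level, while gravity along another axis suggests a raised, rotated wrist near the ear. The axes, thresholds, and gesture names are illustrative assumptions only.

    ```python
    def classify_gesture(ax, ay, az, g=9.81, tol=2.0):
        """Classify a static accelerometer reading (m/s^2 per axis)."""
        if abs(az - g) < tol:
            return 'flat'        # wrist level: no action
        if abs(ax + g) < tol:
            return 'near-ear'    # wrist raised and rotated: e.g. speak the time
        return 'unknown'
    ```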
  • Publication number: 20090157397
    Abstract: A voice rule-synthesizer synthesizes a voice waveform based on the voice data stored in a database, which stores a large number of compressed voice data sections in a data stream. Each voice data section is stored as a plurality of frames compressed in a fixed-length frame format. The storage capacity of the database is reduced because the compressed voice data sections are stored as the data stream.
    Type: Application
    Filed: February 19, 2009
    Publication date: June 18, 2009
    Inventor: Reishi Kondo
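    The fixed-length frame format is what makes the data-stream storage compact: a frame inside a compressed section can be located by arithmetic alone, with no per-frame index. The frame size here is an illustrative value.

    ```python
    FRAME_BYTES = 32  # hypothetical fixed compressed-frame size

    def frame_offset(section_start, frame_index):
        """Byte offset of a frame within the concatenated data stream,
        given the start offset of its voice data section."""
        return section_start + frame_index * FRAME_BYTES
    ```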
  • Publication number: 20090043583
    Abstract: The present invention discloses a solution for customizing synthetic voice characteristics in a user specific fashion. The solution can establish a communication between a user and a voice response system. A data store can be searched for a speech profile associated with the user. When a speech profile is found, a set of speech output characteristics established for the user from the profile can be determined. Parameters and settings of a text-to-speech engine can be adjusted in accordance with the determined set of speech output characteristics. During the established communication, synthetic speech can be generated using the adjusted text-to-speech engine. Thus, each detected user can hear a synthetic speech generated by a different voice specifically selected for that user. When no user profile is detected, a default voice or a voice based upon a user's speech or communication details can be used.
    Type: Application
    Filed: August 8, 2007
    Publication date: February 12, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Ciprian AGAPI, Oscar J. BLASS, Oswaldo GAGO, Roberto VILA
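    The profile-selection logic above reduces to a lookup with a default fallback: find the caller's speech profile, apply its output characteristics to the TTS engine's settings, and use a default voice when no profile exists. The field names are illustrative assumptions.

    ```python
    DEFAULT_PROFILE = {'voice': 'default', 'rate': 1.0, 'pitch': 0.0}

    def select_tts_settings(user_id, profiles):
        """profiles: dict of user_id -> partial speech profile."""
        profile = profiles.get(user_id)
        if profile is None:
            return dict(DEFAULT_PROFILE)   # no profile found: default voice
        settings = dict(DEFAULT_PROFILE)
        settings.update(profile)           # user-specific output characteristics
        return settings
    ```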
  • Publication number: 20090024393
    Abstract: A speech synthesizer conducts a dialogue among a plurality of synthesized speakers, including a self speaker and one or more partner speakers, by use of a voice profile table describing emotional characteristics of synthesized voices, a speaker database storing feature data for different types of speakers and/or different speaking tones, a speech synthesis engine that synthesizes speech from input text according to feature data fitting the voice profile assigned to each synthesized speaker, and a profile manager that updates the voice profiles according to the content of the spoken text. The voice profiles of partner speakers are initially derived from the voice profile of the self speaker. A synthesized dialogue can be set up simply by selecting the voice profile of the self speaker.
    Type: Application
    Filed: June 11, 2008
    Publication date: January 22, 2009
    Applicant: OKI ELECTRIC INDUSTRY CO., LTD.
    Inventor: Tsutomu Kaneyasu
  • Publication number: 20090012793
    Abstract: The present invention provides a text-to-speech assist for portable communication devices. A method for communicating text data using a portable communication device in accordance with the present invention includes: displaying text data on a display of the portable communication device while communicating with a party; selecting at least a portion of the displayed text data; converting the selected text data into synthesized speech; and providing the synthesized speech to the party using the portable communication device.
    Type: Application
    Filed: July 3, 2007
    Publication date: January 8, 2009
    Inventors: Quyen C. Dao, Gerard R. Raimondi, William D. Reeves, Paul L. Snyder
  • Publication number: 20080250095
    Abstract: A technology is disclosed that creates a program line-up for contents distributed to a user side, depending on various conditions, preferences, and communication environments of the user side that views and listens to the content. According to the technology, an on-board device (content receiving and reproducing device) 1 receives, from a service server 5, potential content list information of contents that can be distributed from a content server 7. The on-board device 1 sorts appropriate content from among the contents in the potential content list information and decides on a reproducing order of the contents, based on user-side conditions such as user preferences and vehicle conditions, conditions related to the user-side environment such as the communication environment, and conditions related to the contents such as genre. The on-board device 1 creates program table information (a timetable) and transmits it to the content server.
    Type: Application
    Filed: March 3, 2005
    Publication date: October 9, 2008
    Applicant: DENSO IT LABORATORY, INC.
    Inventor: Nobuhiro Mizuno