Details of Speech Synthesis Systems, e.g., Synthesizer Architecture, Memory Management, etc. (EPO) Patents (Class 704/E13.005)
  • Publication number: 20120197645
    Abstract: An electronic apparatus includes a communication module, a storage module, a manipulation module, a voice output control module, and a control module. The communication module receives book data delivered externally. The storage module stores the received book data. The manipulation module converts a manipulation of a user into an electrical signal. The voice output control module reproduces, as a voice, the book data based on the manipulation while controlling the reproduction speed of the voice. The control module determines a part that is important to the user, stores, in the storage module, a position of voice reproduction of the book data, and synchronizes the position of the voice reproduction with a reproduction position in the book data.
    Type: Application
    Filed: September 22, 2011
    Publication date: August 2, 2012
    Inventor: Midori Nakamae
  • Publication number: 20120166175
    Abstract: A method and system for construction and rendering of annotations associated with an electronic image is disclosed. The system comprises a first data repository for storing the electronic image, which has a plurality of pixels, with one or more pixels annotated at a plurality of levels, which contain descriptive characteristics of the pixel, in ascending magnitude, such that the descriptive characteristics at a subsequent level are with reference to descriptive characteristics of one or more pixels surrounding the pixel. The system comprises a second data repository for storing the annotations. An image display module is configured to display the electronic image. A pixel and level identification module is configured to receive pixel and level selection details from a user-interface. An annotation retrieval module is configured to retrieve annotations corresponding to the pixel and level selection from the second repository and render the retrieved annotations for the electronic image.
    Type: Application
    Filed: December 21, 2011
    Publication date: June 28, 2012
    Applicant: Tata Consultancy Services Ltd.
    Inventor: Sunil Kumar Kopparapu
  • Publication number: 20120059654
    Abstract: An objective is to provide a technique for accurately reproducing features of a fundamental frequency of a target-speaker's voice on the basis of only a small amount of learning data. A learning apparatus learns shift amounts from a reference source F0 pattern to a target F0 pattern of a target-speaker's voice. The learning apparatus associates a source F0 pattern of a learning text to a target F0 pattern of the same learning text by associating their peaks and troughs. For each of points on the target F0 pattern, the learning apparatus obtains shift amounts in a time-axis direction and in a frequency-axis direction from a corresponding point on the source F0 pattern in reference to a result of the association, and learns a decision tree using, as an input feature vector, linguistic information obtained by parsing the learning text, and using, as an output feature vector, the calculated shift amounts.
    Type: Application
    Filed: March 16, 2010
    Publication date: March 8, 2012
    Inventors: Masafumi Nishimura, Ryuki Tachibana
  • Publication number: 20120046788
    Abstract: A speech system (2) used for a robot (1) and a robot (1) with the speech system (2) are provided. The speech system (2) includes an audio file storage unit (200) and a speech control unit (300). The audio file storage unit (200) stores audio files obtained from an audio file preparation unit (100), which is located outside the robot (1). The data stored in the audio file storage unit (200) can be prepared, modified or replaced according to the requirement of a user. According to the received robot state information, the speech control unit (300) converts the audio data of the audio file, which corresponds to the state information and is stored in the audio file storage unit (200), to a corresponding analog signal, and then plays the analog signal.
    Type: Application
    Filed: January 22, 2010
    Publication date: February 23, 2012
    Inventor: Dongqi Qian
  • Publication number: 20110144989
    Abstract: Disclosed herein are systems, methods, and computer-readable storage media for sending a spoken message as a text message. The method includes initiating a connection with a first subscriber, receiving from the first subscriber a spoken message and spoken information associated with at least one recipient address. The method further includes converting the spoken message to text via an audible text center subsystem (ATCS), and delivering the text to the recipient address. The method can also include verifying a subscription status of the first subscriber, or delivering the text to the recipient address based on retrieved preferences of the first subscriber. The preferences can be retrieved from a consolidated network repository or embedded within the spoken message. Text and the spoken message can be delivered to the same or different recipient addresses. The method can include updating recipient addresses based on a received oral command from the first subscriber.
    Type: Application
    Filed: December 15, 2009
    Publication date: June 16, 2011
    Applicant: AT&T Intellectual Property I, L.P.
    Inventor: Sangar DOWLATKHAH
  • Publication number: 20110047221
    Abstract: Methods and systems that allow multiple channels of communication between multiple users via a platform that automatically integrates and synchronizes the resources of each user during the communication are described. The systems comprise a platform capable of handling multiple types of communications with multiple users and systems. The platform contains a browser, one or more servers for handling communications between the platform and user devices that are external to the platform, a speech engine for converting text to speech and vice versa, a chat server, an email server, a text server, a data warehouse, a scheduler, a workflow/rules engine, a reports server, and integration APIs that can be integrated with 3rd party systems and allow those systems to be integrated with the platform. The platform is linked to multiple users (and their devices or systems) through a communications network.
    Type: Application
    Filed: August 24, 2009
    Publication date: February 24, 2011
    Inventors: Timothy Watanabe, Kenneth Poray, Craig So, Ryan Menda
  • Publication number: 20100318364
    Abstract: Disclosed are techniques and systems to provide a narration of a text in multiple different voices. Further disclosed are techniques and systems for generating an audible output in which different portions of a text are narrated using voice models associated with different characters.
    Type: Application
    Filed: January 14, 2010
    Publication date: December 16, 2010
    Inventors: Raymond C. Kurzweil, Paul Albrecht, Peter Chapman
  • Publication number: 20100312565
    Abstract: An interactive prompt generation and TTS optimization tool with a user-friendly graphical user interface is provided. The tool accepts HTS abstraction or speech recognition processed input from a user to generate an enhanced initial waveform for synthesis. Acoustic features of the waveform are presented to the user with graphical visualizations enabling the user to modify various parameters of the speech synthesis process and listen to modified versions until an acceptable end product is reached.
    Type: Application
    Filed: June 9, 2009
    Publication date: December 9, 2010
    Applicant: Microsoft Corporation
    Inventors: Jian-Chao Wang, Lu-Jun Yuan, Sheng Zhao, Fileno A. Alleva, Jingyang Xu, Chiwei Che
  • Publication number: 20100204985
    Abstract: A warping factor estimation system comprises a label information generation unit that outputs voice/non-voice label information, a warp model storage unit in which a probability model representing voice and non-voice occurrence probabilities is stored, and a warp estimation unit that calculates a warping factor in the frequency axis direction using the probability model representing voice and non-voice occurrence probabilities, voice and non-voice labels, and a cepstrum.
    Type: Application
    Filed: September 22, 2008
    Publication date: August 12, 2010
    Inventor: Tadashi Emori
  • Publication number: 20100134319
    Abstract: An underwater communications system is provided that transmits electromagnetic and/or magnetic signals to a remote receiver. The transmitter includes a data input. A digital data compressor compresses data to be transmitted. A modulator modulates compressed data onto a carrier signal. An electrically insulated, magnetic coupled antenna transmits the compressed, modulated signals. The receiver has an electrically insulated, magnetic coupled antenna for receiving a compressed, modulated signal. A demodulator is provided for demodulating the signal to reveal compressed data. A de-compressor de-compresses the data. An appropriate human interface is provided to present transmitted data in text/audio/visible form. Similarly, the transmit system comprises appropriate audio/visual/text entry mechanisms.
    Type: Application
    Filed: February 3, 2010
    Publication date: June 3, 2010
    Inventors: Mark Rhodes, Derek Wolfe, Brendan Hyland
  • Publication number: 20100131267
    Abstract: A method of recording speech for use in a speech samples library. In an exemplary embodiment, the method comprises recording a speaker pronouncing a phoneme with musical parameters characterizing pronunciation of another phoneme by the same or another speaker. For example, in one embodiment the method comprises providing a recording of a first speaker pronouncing a first phoneme in a phonemic context. The pronunciation is characterized by some musical parameters. A second reader, who may be the same as the first reader, is then recorded pronouncing a second phoneme (different from the first phoneme) with the musical parameters that characterize pronunciation of the first phoneme by the first speaker. The recordings made by the second reader are used for compiling a speech samples library.
    Type: Application
    Filed: March 19, 2008
    Publication date: May 27, 2010
    Applicant: Vivo Text Ltd.
    Inventors: Gershon Silbert, Andres Hakim
  • Publication number: 20100094631
    Abstract: An apparatus for synthesizing a rendered output signal having a first audio channel and a second audio channel includes a decorrelator stage for generating a decorrelator signal based on a downmix signal, and a combiner for performing a weighted combination of the downmix signal and a decorrelated signal based on parametric audio object information, downmix information and target rendering information. The combiner solves the problem of optimally combining matrixing with decorrelation for a high quality stereo scene reproduction of a number of individual audio objects using a multichannel downmix.
    Type: Application
    Filed: April 23, 2008
    Publication date: April 15, 2010
    Inventors: Jonas Engdegard, Heiko Purnhagen, Barbara Resch, Lars Villemoes, Cornelia Falch, Juergen Herre, Johannes Hilpert, Andreas Hoelzer, Leonid Terentiev
  • Publication number: 20100057465
    Abstract: A text-to-speech (TTS) system implemented in an automotive vehicle is dynamically tuned to improve intelligibility over a wide variety of vehicle operating states and environmental conditions. In one embodiment of the present invention, a TTS system is interfaced to one or more vehicle sensors to measure parameters including vehicle speed, interior noise, visibility conditions, and road roughness, among others. In response to measurements of these operating parameters, TTS voice volume, pitch, and speed, among other parameters, may be tuned in order to improve intelligibility of the TTS voice system and increase its effectiveness for the operator of the vehicle.
    Type: Application
    Filed: September 3, 2008
    Publication date: March 4, 2010
    Inventors: David Michael Kirsch, Ritchie Winson Huang
  • Publication number: 20100049525
    Abstract: A method is provided of providing cues from an electronic communication device to a user while capturing an utterance. A plurality of cues associated with the user utterance are provided by the device to the user in at least near real-time. For each of a plurality of portions of the utterance, data representative of the respective portion of the user utterance is communicated from the electronic communication device to a remote electronic device. In response to this communication, data representative of at least one parameter associated with the respective portion of the user utterance is received at the electronic communication device. The electronic communication device provides one or more cues to the user based on the at least one parameter. At least one of the cues is provided by the electronic communication device to the user prior to completion of the step of capturing the user utterance.
    Type: Application
    Filed: August 24, 2009
    Publication date: February 25, 2010
    Applicant: YAP, INC.
    Inventor: Scott Edward Paden
  • Publication number: 20090319275
    Abstract: A speech synthesizing device, the device includes: a text accepting unit for accepting text data; an extracting unit for extracting a special character including a pictographic character, a face mark or a symbol from text data accepted by the text accepting unit; a dictionary database in which a plurality of special characters and a plurality of phonetic expressions for each special character are registered; a selecting unit for selecting a phonetic expression of an extracted special character from the dictionary database when the extracting unit extracts the special character; a converting unit for converting the text data accepted by the accepting unit to a phonogram in accordance with a phonetic expression selected by the selecting unit in association with the extracted special character; and a speech synthesizing unit for synthesizing a voice from a phonogram obtained by the converting
    Type: Application
    Filed: August 31, 2009
    Publication date: December 24, 2009
    Applicant: FUJITSU LIMITED
    Inventor: Takuya Noda
  • Publication number: 20090313022
    Abstract: A method and system for audibly outputting text messages includes: setting a vocalizing function for audibly outputting text messages, searching a character speech library for each character of a received text message, and acquiring pronunciation data of each character of the received text message. The method and the system further include vocalizing the pronunciation data of each character of the received text message, generating a voice message, and audibly outputting the generated voice message.
    Type: Application
    Filed: December 23, 2008
    Publication date: December 17, 2009
    Inventor: CHI-MING HSIAO
  • Publication number: 20090281808
    Abstract: A voice data creation system includes a dictionary data memory section that stores dictionary data for generating synthesized voice data corresponding to text data; an edition processing section that displays an edition screen for editing a voice guidance message as a sentence including a plurality of phrases to receive edition input information so as to perform an edition processing based on the edition input information; a list information generation processing section that generates list information relating to each sentence and phrases included in each sentence based on a result of the edition processing; a phrase voice data generating section that determines a target phrase for voice data creation based on the list information to generate and maintain voice data corresponding to the target phrase determined for voice data creation based on the dictionary data; and a memory write information generating section that determines a target phrase to be stored in a voice data memory based on the list informat
    Type: Application
    Filed: April 28, 2009
    Publication date: November 12, 2009
    Inventors: Jun NAKAMURA, Fumihito BAISHO
  • Publication number: 20090254347
    Abstract: Embodiments of the present invention provide a method and computer program product for the proactive completion of input fields for automated voice enablement of a Web page. In an embodiment of the invention, a method for proactively completing empty input fields for voice enabling a Web page can be provided. The method can include receiving speech input for an input field in a Web page and inserting a textual equivalent to the speech input into the input field in a Web page. The method further can include locating an empty input field remaining in the Web page and generating a speech grammar for the input field based upon permitted terms in a core attribute of the empty input field and prompting for speech input for the input field. Finally, the method can include posting the received speech input and the grammar to an automatic speech recognition (ASR) engine and inserting a textual equivalent to the speech input provided by the ASR engine into the empty input field.
    Type: Application
    Filed: April 7, 2008
    Publication date: October 8, 2009
    Inventors: Victor S. Moore, Wendi L. Nusbickel
  • Publication number: 20090239201
    Abstract: A phonetic pronunciation training device, phonetic pronunciation training method, and phonetic pronunciation training program are provided wherein pronunciation and sounds in language acquisition can be self-learned and listening skills, spelling skills and vocabulary can be enhanced. The present invention comprises at least a data base for storing phonetic pronunciation data associated with phonetic data and phonetic symbol data indicating this phonetic data, a selection function block for receiving an instruction signal from an input means and randomly selecting phonetic pronunciation data, a phonetic pronunciation data reproducing function for reproducing selected phonetic pronunciation data, and a phonetic symbol data correct/error determination function block for comparing phonetic symbol data input by the input means and phonetic symbol data corresponding to the selected phonetic pronunciation data and recording the correct/error result to a memory means.
    Type: Application
    Filed: July 15, 2005
    Publication date: September 24, 2009
    Inventor: Richard A Moe
  • Publication number: 20090234565
    Abstract: A navigation device is disclosed including a processor unit, a memory device, and a speaker. The memory device includes a plurality of sound samples. In at least one embodiment, the navigation device is arranged to play a selection of the sound samples over the speaker to provide navigation instructions. In at least one embodiment, the navigation device further includes an input device for receiving sound samples and is arranged for storing the received sound samples in the memory device for subsequent playback over the speaker for providing navigation instructions.
    Type: Application
    Filed: February 19, 2007
    Publication date: September 17, 2009
    Inventor: Pieter Andreas Geelen
  • Publication number: 20090204402
    Abstract: Method and apparatus for creating customized podcasts with multiple voices, where text content is converted into audio content, and where the voices are selected based at least in part on words in the text content suggestive of the type of voice. Types of voice include at least male and female, accent, language, and speed.
    Type: Application
    Filed: January 9, 2009
    Publication date: August 13, 2009
    Inventors: Harpreet MARWAHA, Brett ROBINSON
  • Publication number: 20090204405
    Abstract: Apparatus and method for generating high quality synthesized speech having smooth waveform concatenation. The apparatus includes a pitch frequency calculation section, a pitch synchronization position calculation section, a unit waveform storage, a unit waveform selection section, a unit waveform generation section, and a waveform synthesis section. The unit waveform generation section includes a conversion ratio calculation section, a sampling rate conversion section, and a unit waveform re-selection section. The conversion ratio calculation section calculates a sampling rate conversion ratio from the pitch information and the position of pitch synchronization, and the sampling rate conversion section converts the sampling rate of the unit waveform, delivered as input, based on the sampling rate conversion ratio.
    Type: Application
    Filed: September 4, 2006
    Publication date: August 13, 2009
    Applicant: NEC CORPORATION
    Inventors: Masanori Kato, Satoshi Tsukada
  • Publication number: 20090177474
    Abstract: A speech synthesizer includes a periodic component fusing unit and an aperiodic component fusing unit, which respectively fuse the periodic components and the aperiodic components of a plurality of speech units selected for each segment by a unit selector. The speech synthesizer is further provided with an adder, so that the adder adds, edits, and concatenates the periodic components and the aperiodic components of the fused speech units to generate a speech waveform.
    Type: Application
    Filed: September 18, 2008
    Publication date: July 9, 2009
    Inventors: Masahiro Morita, Takehiko Kagoshima
  • Publication number: 20090157407
    Abstract: An apparatus for semantic media conversion from source data to audio/video data may include a processor. The processor may be configured to parse source data having text and one or more tags and create a semantic structure model representative of the source data, and generate audio data comprising at least one of speech converted from parsed text of the source data contained in the semantic structure model and applied audio effects. Corresponding methods and computer program products are also provided.
    Type: Application
    Filed: December 12, 2007
    Publication date: June 18, 2009
    Inventors: Tetsuo Yamabe, Kiyotaka Takahashi
  • Publication number: 20090132253
    Abstract: Methods and apparatuses to perform context-aware unit selection for natural language processing are described. Streams of information associated with input units are received. The streams of information are analyzed in a context associated with first candidate units to determine a first set of weights of the streams of information. A first candidate unit is selected from the first candidate units based on the first set of weights of the streams of information. The streams of information are analyzed in the context associated with second candidate units to determine a second set of weights of the streams of information. A second candidate unit is selected from second candidate units to concatenate with the first candidate unit based on the second set of weights of the streams of information.
    Type: Application
    Filed: November 20, 2007
    Publication date: May 21, 2009
    Inventor: Jerome Bellegarda
  • Publication number: 20090100150
    Abstract: The present invention provides an assistive technology screen reader in a distributed network computer system. The screen reader, on a server computer system, receives display information output from one or more applications. The screen reader converts the text and symbolic content of the display information into a performant format for transmission across a network. The screen reader, on a client computer system, receives the performant format. The received performant format is converted to a device type file, by the screen reader. The screen reader then presents the device type file to a device driver, for output to a speaker, braille reader, or the like.
    Type: Application
    Filed: June 14, 2002
    Publication date: April 16, 2009
    Inventor: David Yee
  • Publication number: 20090048842
    Abstract: Techniques for operating a reading machine are disclosed. The techniques include forming an N-dimensional features vector based on features of an image, the features corresponding to characteristics of at least one object depicted in the image; representing the features vector as a point in n-dimensional space, where n corresponds to N, the number of features in the features vector; and comparing the point in n-dimensional space to a centroid that represents a cluster of points in the n-dimensional space corresponding to a class of objects, to determine whether the point belongs in the class of objects corresponding to the centroid.
    Type: Application
    Filed: April 28, 2008
    Publication date: February 19, 2009
    Inventors: Paul Albrecht, Rafael Maya Zetune, Lucy Gibson, Raymond C. Kurzweil
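
The centroid comparison described in the last abstract above (publication 20090048842) can be illustrated with a minimal Python sketch. This is not the patented method: the Euclidean distance metric, the fixed `threshold`, the function names, and the sample feature values are all assumptions made for illustration, since the abstract does not specify how the comparison is carried out.

```python
import math


def centroid(points):
    """Return the mean of a cluster of equal-length feature vectors."""
    count = len(points)
    dims = len(points[0])
    return [sum(p[i] for p in points) / count for i in range(dims)]


def classify(point, centroids, threshold):
    """Return the label of the nearest centroid within `threshold`, else None.

    `centroids` maps a class label to its centroid. Euclidean distance and a
    single shared threshold are assumptions, not details from the abstract.
    """
    best_label, best_dist = None, threshold
    for label, center in centroids.items():
        dist = math.dist(point, center)
        if dist <= best_dist:
            best_label, best_dist = label, dist
    return best_label


# Hypothetical 2-feature clusters for two object classes.
centroids = {
    "door": centroid([[1.0, 1.0], [3.0, 1.0]]),
    "sign": centroid([[10.0, 10.0], [12.0, 14.0]]),
}

near = classify([2.0, 1.5], centroids, threshold=1.0)  # close to "door"
far = classify([6.0, 6.0], centroids, threshold=1.0)   # close to neither
```

With these sample values, `near` is `"door"` and `far` is `None`, because the second point lies farther than the threshold from both centroids.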