Details Of Speech Synthesis Systems, E.g., Synthesizer Architecture, Memory Management, Etc. (epo) Patents (Class 704/E13.005)
-
Publication number: 20120197645
Abstract: An electronic apparatus includes a communication module, a storage module, a manipulation module, a voice output control module, and a control module. The communication module receives book data delivered externally. The storage module stores the received book data. The manipulation module converts a manipulation of a user into an electrical signal. The voice output control module reproduces, as a voice, the book data based on the manipulation while controlling the reproduction speed of the voice. The control module determines a part that is important to the user, stores, in the storage module, a position of voice reproduction of the book data, and synchronizes the position of the voice reproduction with a reproduction position in the book data.
Type: Application
Filed: September 22, 2011
Publication date: August 2, 2012
Inventor: Midori Nakamae
-
Publication number: 20120166175
Abstract: A method and system for construction and rendering of annotations associated with an electronic image is disclosed. The system comprises a first data repository for storing the electronic image, which has a plurality of pixels, with one or more pixels annotated at a plurality of levels, which contain descriptive characteristics of the pixel, in ascending magnitude, such that the descriptive characteristics at a subsequent level are with reference to descriptive characteristics of one or more pixels surrounding the pixel. The system comprises a second data repository for storing the annotations. An image display module is configured to display the electronic image. A pixel and level identification module is configured to receive pixel and level selection details from a user-interface. An annotation retrieval module is configured to retrieve annotations corresponding to the pixel and level selection from the second repository and renders the retrieved annotations for the electronic image.
Type: Application
Filed: December 21, 2011
Publication date: June 28, 2012
Applicant: Tata Consultancy Services Ltd.
Inventor: Sunil Kumar Kopparapu
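The multi-level annotation lookup described above can be sketched as a small store keyed by pixel coordinates and level; the class, method names, and sample annotations below are invented for illustration and are not taken from the patent.

```python
# Minimal sketch of multi-level pixel annotations: each annotated pixel maps
# level -> descriptive text, with higher levels describing the pixel with
# reference to its surrounding region.

class AnnotationStore:
    def __init__(self):
        self._store = {}  # (x, y) -> {level: annotation text}

    def annotate(self, x, y, level, text):
        self._store.setdefault((x, y), {})[level] = text

    def retrieve(self, x, y, level):
        """Return the annotation for a pixel at the requested level, or None."""
        return self._store.get((x, y), {}).get(level)

store = AnnotationStore()
store.annotate(10, 20, 1, "dark pixel")
store.annotate(10, 20, 2, "part of an edge in the surrounding region")
print(store.retrieve(10, 20, 2))
```

A real system would also hold the image itself in a first repository and drive `retrieve` from user-interface selections.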
-
Publication number: 20120059654
Abstract: An objective is to provide a technique for accurately reproducing features of a fundamental frequency of a target-speaker's voice on the basis of only a small amount of learning data. A learning apparatus learns shift amounts from a reference source F0 pattern to a target F0 pattern of a target-speaker's voice. The learning apparatus associates a source F0 pattern of a learning text to a target F0 pattern of the same learning text by associating their peaks and troughs. For each of points on the target F0 pattern, the learning apparatus obtains shift amounts in a time-axis direction and in a frequency-axis direction from a corresponding point on the source F0 pattern in reference to a result of the association, and learns a decision tree using, as an input feature vector, linguistic information obtained by parsing the learning text, and using, as an output feature vector, the calculated shift amounts.
Type: Application
Filed: March 16, 2010
Publication date: March 8, 2012
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Masafumi Nishimura, Ryuki Tachibana
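The core of the abstract, pairing peaks and troughs of a source F0 contour with those of a target contour and recording time- and frequency-axis offsets, can be sketched as below. This is an illustrative simplification (the function names are invented, and a real aligner would match peaks to peaks and troughs to troughs rather than pairing in order), not the patent's algorithm.

```python
# Hedged sketch of computing F0 shift amounts between aligned turning points
# of a source and a target pitch contour.

def turning_points(f0):
    """Indices of local peaks and troughs in an F0 contour (list of Hz values)."""
    pts = []
    for i in range(1, len(f0) - 1):
        if (f0[i] > f0[i - 1] and f0[i] > f0[i + 1]) or \
           (f0[i] < f0[i - 1] and f0[i] < f0[i + 1]):
            pts.append(i)
    return pts

def shift_amounts(source_f0, target_f0):
    """Pair turning points in order and compute (time, frequency) shifts."""
    src, tgt = turning_points(source_f0), turning_points(target_f0)
    pairs = zip(src, tgt)  # simplification: real alignment respects peak/trough type
    return [(t - s, target_f0[t] - source_f0[s]) for s, t in pairs]

src = [100, 120, 110, 130, 105]
tgt = [100, 140, 100, 150, 100]
print(shift_amounts(src, tgt))  # → [(0, 20), (0, -10), (0, 20)]
```

In the patent's scheme, such shift pairs become the output feature vectors for a decision tree whose inputs are linguistic features of the parsed text.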
-
Publication number: 20120046788
Abstract: A speech system (2) used for a robot (1) and a robot (1) with the speech system (2) are provided. The speech system (2) includes an audio file storage unit (200) and a speech control unit (300). The audio file storage unit (200) stores audio files obtained from an audio file preparation unit (100), which is located outside the robot (1). The data stored in the audio file storage unit (200) can be prepared, modified or replaced according to the requirement of a user. According to the received robot state information, the speech control unit (300) converts the audio data of the audio file, which corresponds to the state information and is stored in the audio file storage unit (200), to a corresponding analog signal, and then plays the analog signal.
Type: Application
Filed: January 22, 2010
Publication date: February 23, 2012
Applicant: TEK ELECTRICAL (SUZHOU) CO., LTD.
Inventor: Dongqi Qian
-
Publication number: 20110144989
Abstract: Disclosed herein are systems, methods, and computer-readable storage media for sending a spoken message as a text message. The method includes initiating a connection with a first subscriber, receiving from the first subscriber a spoken message and spoken information associated with at least one recipient address. The method further includes converting the spoken message to text via an audible text center subsystem (ATCS), and delivering the text to the recipient address. The method can also include verifying a subscription status of the first subscriber, or delivering the text to the recipient address based on retrieved preferences of the first subscriber. The preferences can be retrieved from a consolidated network repository or embedded within the spoken message. Text and the spoken message can be delivered to the same or different recipient addresses. The method can include updating recipient addresses based on a received oral command from the first subscriber.
Type: Application
Filed: December 15, 2009
Publication date: June 16, 2011
Applicant: AT&T Intellectual Property I, L.P.
Inventor: Sangar DOWLATKHAH
-
Publication number: 20110047221
Abstract: Methods and systems that allow multiple channels of communication between multiple users via a platform that automatically integrates and synchronizes the resources of each user during the communication are described. The systems comprise a platform capable of handling multiple types of communications with multiple users and systems. The platform contains a browser, one or more servers for handling communications between the platform and user devices that are external to the platform, a speech engine for converting text to speech and vice versa, a chat server, an email server, a text server, a data warehouse, a scheduler, a workflow/rules engine, a reports server, and integration APIs that can be integrated with 3rd party systems and allow those systems to be integrated with the platform. The platform is linked to multiple users (and their devices or systems) through a communications network.
Type: Application
Filed: August 24, 2009
Publication date: February 24, 2011
Inventors: Timothy Watanabe, Kenneth Poray, Craig So, Ryan Menda
-
Publication number: 20100318364
Abstract: Disclosed are techniques and systems to provide a narration of a text in multiple different voices. Further disclosed are techniques and systems for generating an audible output in which different portions of a text are narrated using voice models associated with different characters.
Type: Application
Filed: January 14, 2010
Publication date: December 16, 2010
Inventors: Raymond C. Kurzweil, Paul Albrecht, Peter Chapman
-
Publication number: 20100312565
Abstract: An interactive prompt generation and TTS optimization tool with a user-friendly graphical user interface is provided. The tool accepts HTS abstraction or speech recognition processed input from a user to generate an enhanced initial waveform for synthesis. Acoustic features of the waveform are presented to the user with graphical visualizations enabling the user to modify various parameters of the speech synthesis process and listen to modified versions until an acceptable end product is reached.
Type: Application
Filed: June 9, 2009
Publication date: December 9, 2010
Applicant: Microsoft Corporation
Inventors: Jian-Chao Wang, Lu-Jun Yuan, Sheng Zhao, Fileno A. Alleva, Jingyang Xu, Chiwei Che
-
Publication number: 20100204985
Abstract: A warping factor estimation system comprises a label information generation unit that outputs voice/non-voice label information, a warp model storage unit in which a probability model representing voice and non-voice occurrence probabilities is stored, and a warp estimation unit that calculates a warping factor in the frequency-axis direction using the probability model, the voice and non-voice labels, and a cepstrum.
Type: Application
Filed: September 22, 2008
Publication date: August 12, 2010
Inventor: Tadashi Emori
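Frequency-axis warping with a single factor is the mechanism behind vocal tract length normalization. As an illustration only (the piecewise-linear form and cut frequency below are one common choice, not necessarily the patent's), a warping function might look like this:

```python
# Illustrative piecewise-linear frequency warp with factor alpha: scale
# frequencies below a cut frequency by alpha, then map the remainder linearly
# so the maximum frequency stays fixed.

def warp_frequency(f, alpha, f_max=8000.0, f_cut=0.875):
    """Warp frequency f (Hz) by factor alpha, preserving the endpoint f_max."""
    cut = f_cut * f_max
    if f <= cut:
        return alpha * f
    # linear segment from (cut, alpha * cut) to (f_max, f_max)
    slope = (f_max - alpha * cut) / (f_max - cut)
    return alpha * cut + slope * (f - cut)

print(warp_frequency(1000.0, 1.1))  # low band: simply scaled by alpha
print(warp_frequency(8000.0, 1.1))  # endpoint is preserved
```

The estimation step in the abstract then amounts to choosing the alpha that maximizes the likelihood of the observed cepstra under the voice/non-voice probability model.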
-
Publication number: 20100134319
Abstract: An underwater communications system is provided that transmits electromagnetic and/or magnetic signals to a remote receiver. The transmitter includes a data input. A digital data compressor compresses data to be transmitted. A modulator modulates compressed data onto a carrier signal. An electrically insulated, magnetic coupled antenna transmits the compressed, modulated signals. The receiver has an electrically insulated, magnetic coupled antenna for receiving a compressed, modulated signal. A demodulator is provided for demodulating the signal to reveal compressed data. A de-compressor de-compresses the data. An appropriate human interface is provided to present transmitted data in text/audio/visual form. Similarly, the transmit system comprises appropriate audio/visual/text entry mechanisms.
Type: Application
Filed: February 3, 2010
Publication date: June 3, 2010
Inventors: Mark Rhodes, Derek Wolfe, Brendan Hyland
-
Publication number: 20100131267
Abstract: A method of recording speech for use in a speech samples library. In an exemplary embodiment, the method comprises recording a speaker pronouncing a phoneme with musical parameters characterizing pronunciation of another phoneme by the same or another speaker. For example, in one embodiment the method comprises: providing a recording of a first speaker pronouncing a first phoneme in a phonemic context. The pronunciation is characterized by some musical parameters. A second reader, who may be the same as the first reader, is then recorded pronouncing a second phoneme (different from the first phoneme) with the musical parameters that characterize pronunciation of the first phoneme by the first speaker. The recordings made by the second reader are used for compiling a speech samples library.
Type: Application
Filed: March 19, 2008
Publication date: May 27, 2010
Applicant: Vivo Text Ltd.
Inventors: Gershon Silbert, Andres Hakim
-
Publication number: 20100094631
Abstract: An apparatus for synthesizing a rendered output signal having a first audio channel and a second audio channel includes a decorrelator stage for generating a decorrelator signal based on a downmix signal, and a combiner for performing a weighted combination of the downmix signal and a decorrelated signal based on parametric audio object information, downmix information and target rendering information. The combiner solves the problem of optimally combining matrixing with decorrelation for a high quality stereo scene reproduction of a number of individual audio objects using a multichannel downmix.
Type: Application
Filed: April 23, 2008
Publication date: April 15, 2010
Inventors: Jonas Engdegard, Heiko Purnhagen, Barbara Resch, Lars Villemoes, Cornelia Falch, Juergen Herre, Johannes Hilpert, Andreas Hoelzer, Leonid Terentiev
-
Publication number: 20100057465
Abstract: A text-to-speech (TTS) system implemented in an automotive vehicle is dynamically tuned to improve intelligibility over a wide variety of vehicle operating states and environmental conditions. In one embodiment of the present invention, a TTS system is interfaced to one or more vehicle sensors to measure parameters including vehicle speed, interior noise, visibility conditions, and road roughness, among others. In response to measurements of these operating parameters, TTS voice volume, pitch, and speed, among other parameters, may be tuned in order to improve intelligibility of the TTS voice system and increase its effectiveness for the operator of the vehicle.
Type: Application
Filed: September 3, 2008
Publication date: March 4, 2010
Inventors: DAVID MICHAEL KIRSCH, Ritchie Winson Huang
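The sensor-to-voice-parameter mapping described above can be sketched as a simple tuning function; all parameter names, coefficients, and thresholds below are invented for illustration and are not taken from the patent.

```python
# Hedged sketch of tuning TTS output from vehicle sensor readings: louder and
# slightly slower speech as vehicle speed and cabin noise rise.

def tune_tts(speed_kmh, cabin_noise_db):
    """Map sensor readings to a TTS volume (0-100) and a speech-rate multiplier."""
    volume = min(100, 60 + 0.2 * speed_kmh + 0.5 * max(0, cabin_noise_db - 50))
    rate = 1.0 if cabin_noise_db < 70 else 0.9  # slow down in a loud cabin
    return {"volume": round(volume), "rate": rate}

print(tune_tts(speed_kmh=120, cabin_noise_db=75))
```

A production system would smooth the sensor inputs over time so the voice does not fluctuate with momentary noise spikes.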
-
Publication number: 20100049525
Abstract: A method is provided for providing cues from an electronic communication device to a user while capturing an utterance. A plurality of cues associated with the user utterance are provided by the device to the user in at least near real-time. For each of a plurality of portions of the utterance, data representative of the respective portion of the user utterance is communicated from the electronic communication device to a remote electronic device. In response to this communication, data, representative of at least one parameter associated with the respective portion of the user utterance, is received at the electronic communication device. The electronic communication device provides one or more cues to the user based on the at least one parameter. At least one of the cues is provided by the electronic communication device to the user prior to completion of the step of capturing the user utterance.
Type: Application
Filed: August 24, 2009
Publication date: February 25, 2010
Applicant: YAP, INC.
Inventor: Scott Edward Paden
-
Publication number: 20090319275
Abstract: A speech synthesizing device, the device includes: a text accepting unit for accepting text data; an extracting unit for extracting a special character including a pictographic character, a face mark or a symbol from text data accepted by the text accepting unit; a dictionary database in which a plurality of special characters and a plurality of phonetic expressions for each special character are registered; a selecting unit for selecting a phonetic expression of an extracted special character from the dictionary database when the extracting unit extracts the special character; a converting unit for converting the text data accepted by the accepting unit to a phonogram in accordance with a phonetic expression selected by the selecting unit in association with the extracted special character; and a speech synthesizing unit for synthesizing a voice from a phonogram obtained by the converting unit.
Type: Application
Filed: August 31, 2009
Publication date: December 24, 2009
Applicant: FUJITSU LIMITED
Inventor: Takuya Noda
-
Publication number: 20090313022
Abstract: A method and system for audibly outputting text messages includes: setting a vocalizing function for audibly outputting text messages, searching a character speech library for each character of a received text message, and acquiring pronunciation data of each character of the received text message. The method and the system further include vocalizing the pronunciation data of each character of the received text message, generating a voice message, and audibly outputting the generated voice message.
Type: Application
Filed: December 23, 2008
Publication date: December 17, 2009
Applicant: CHI MEI COMMUNICATION SYSTEMS, INC.
Inventor: CHI-MING HSIAO
-
Publication number: 20090281808
Abstract: A voice data creation system includes a dictionary data memory section that stores dictionary data for generating synthesized voice data corresponding to text data; an edition processing section that displays an edition screen for editing a voice guidance message as a sentence including a plurality of phrases to receive edition input information so as to perform an edition processing based on the edition input information; a list information generation processing section that generates list information relating to each sentence and phrases included in the each sentence based on a result of the edition processing; a phrase voice data generating section that determines a target phrase for voice data creation based on the list information to generate and maintain voice data corresponding to the target phrase determined for voice data creation based on the dictionary data; and a memory write information generating section that determines a target phrase to be stored in a voice data memory based on the list information.
Type: Application
Filed: April 28, 2009
Publication date: November 12, 2009
Applicant: SEIKO EPSON CORPORATION
Inventors: Jun NAKAMURA, Fumihito BAISHO
-
Publication number: 20090254347
Abstract: Embodiments of the present invention provide a method and computer program product for the proactive completion of input fields for automated voice enablement of a Web page. In an embodiment of the invention, a method for proactively completing empty input fields for voice enabling a Web page can be provided. The method can include receiving speech input for an input field in a Web page and inserting a textual equivalent to the speech input into the input field in a Web page. The method further can include locating an empty input field remaining in the Web page and generating a speech grammar for the input field based upon permitted terms in a core attribute of the empty input field and prompting for speech input for the input field. Finally, the method can include posting the received speech input and the grammar to an automatic speech recognition (ASR) engine and inserting a textual equivalent to the speech input provided by the ASR engine into the empty input field.
Type: Application
Filed: April 7, 2008
Publication date: October 8, 2009
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Victor S. Moore, Wendi L. Nusbickel
-
Publication number: 20090239201
Abstract: A phonetic pronunciation training device, phonetic pronunciation training method, and phonetic pronunciation training program are provided wherein pronunciation and sounds in language acquisition can be self-learned and listening skills, spelling skills and vocabulary can be enhanced. The present invention comprises at least a database for storing phonetic pronunciation data associated with phonetic data and phonetic symbol data indicating this phonetic data, a selection function block for receiving an instruction signal from an input means and randomly selecting phonetic pronunciation data, a phonetic pronunciation data reproducing function for reproducing selected phonetic pronunciation data, and a phonetic symbol data correct/error determination function block for comparing phonetic symbol data input by the input means and phonetic symbol data corresponding to the selected phonetic pronunciation data and recording the correct/error result to a memory means.
Type: Application
Filed: July 15, 2005
Publication date: September 24, 2009
Inventor: Richard A Moe
-
Publication number: 20090234565
Abstract: A navigation device is disclosed including a processor unit, a memory device, and a speaker. The memory device includes a plurality of sound samples. In at least one embodiment, the navigation device is arranged to play a selection of the sound samples over the speaker to provide navigation instructions. In at least one embodiment, the navigation device further includes an input device for receiving sound samples and is arranged for storing the received sound samples in the memory device for subsequent playback over the speaker for providing navigation instructions.
Type: Application
Filed: February 19, 2007
Publication date: September 17, 2009
Inventor: Pieter Andreas Geelen
-
Publication number: 20090204405
Abstract: Apparatus and method for generating high quality synthesized speech having smooth waveform concatenation. The apparatus includes a pitch frequency calculation section, a pitch synchronization position calculation section, a unit waveform storage, a unit waveform selection section, a unit waveform generation section, and a waveform synthesis section. The unit waveform generation section includes a conversion ratio calculation section, a sampling rate conversion section, and a unit waveform re-selection section. The conversion ratio calculation section calculates a sampling rate conversion ratio from the pitch information and the position of pitch synchronization, and the sampling rate conversion section converts the sampling rate of the unit waveform, delivered as input, based on the sampling rate conversion ratio.
Type: Application
Filed: September 4, 2006
Publication date: August 13, 2009
Applicant: NEC CORPORATION
Inventors: Masanori Kato, Satoshi Tsukada
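The conversion-ratio step can be illustrated with a toy example: to make a stored unit waveform fit a target pitch period, resample it by the ratio of target period to stored period, then play it back at the original rate. This sketch is not the patent's exact formula; the naive linear-interpolation resampler stands in for a proper sampling-rate converter.

```python
# Illustrative sampling-rate conversion for pitch modification of a unit waveform.

def conversion_ratio(unit_period_samples, target_period_samples):
    """Ratio that stretches or shrinks the unit waveform to the target period."""
    return target_period_samples / unit_period_samples

def resample_linear(wave, ratio):
    """Naive linear-interpolation resampler (stand-in for a real converter)."""
    n_out = int(len(wave) * ratio)
    out = []
    for i in range(n_out):
        pos = i / ratio
        lo = int(pos)
        hi = min(lo + 1, len(wave) - 1)
        frac = pos - lo
        out.append((1 - frac) * wave[lo] + frac * wave[hi])
    return out

unit = [0.0, 1.0, 0.0, -1.0]           # one stored pitch period (4 samples)
r = conversion_ratio(len(unit), 8)     # target period of 8 samples
print(len(resample_linear(unit, r)))   # → 8
```

Played at the original sampling rate, the stretched waveform has twice the period, i.e. half the pitch frequency.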
-
Publication number: 20090204402
Abstract: Method and apparatus for creating customized podcasts with multiple voices, where text content is converted into audio content, and where the voices are selected at least in part based on words in the text content suggestive of the type of voice. Voice types include at least male and female, as well as accent, language, and speed.
Type: Application
Filed: January 9, 2009
Publication date: August 13, 2009
Inventors: Harpreet MARWAHA, Brett ROBINSON
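Selecting a voice from words suggestive of a voice type could be sketched as a keyword-to-profile mapping; the keywords, profiles, and fallback below are invented for illustration and do not come from the patent.

```python
# Hedged sketch: scan text for words suggestive of a voice type and pick a
# matching voice profile, falling back to a default.

def choose_voice(text):
    rules = [
        ("grandmother", {"gender": "female", "speed": "slow"}),
        ("sergeant",    {"gender": "male", "speed": "fast"}),
    ]
    lowered = text.lower()
    for keyword, voice in rules:
        if keyword in lowered:
            return voice
    return {"gender": "neutral", "speed": "normal"}  # default voice

print(choose_voice("My grandmother always said..."))
```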
-
Publication number: 20090177474
Abstract: A speech synthesizer includes a periodic component fusing unit and an aperiodic component fusing unit, and fuses periodic components and aperiodic components of a plurality of speech units for each segment, which are selected by a unit selector, by the periodic component fusing unit and the aperiodic component fusing unit, respectively. The speech synthesizer is further provided with an adder, so that the adder adds, edits, and concatenates the periodic components and the aperiodic components of the fused speech units to generate a speech waveform.
Type: Application
Filed: September 18, 2008
Publication date: July 9, 2009
Applicant: KABUSHIKI KAISHA TOSHIBA
Inventors: Masahiro Morita, Takehiko Kagoshima
-
Publication number: 20090157407
Abstract: An apparatus for semantic media conversion from source data to audio/video data may include a processor. The processor may be configured to parse source data having text and one or more tags and create a semantic structure model representative of the source data, and generate audio data comprising at least one of speech converted from parsed text of the source data contained in the semantic structure model and applied audio effects. Corresponding methods and computer program products are also provided.
Type: Application
Filed: December 12, 2007
Publication date: June 18, 2009
Inventors: Tetsuo Yamabe, Kiyotaka Takahashi
-
Publication number: 20090132253
Abstract: Methods and apparatuses to perform context-aware unit selection for natural language processing are described. Streams of information associated with input units are received. The streams of information are analyzed in a context associated with first candidate units to determine a first set of weights of the streams of information. A first candidate unit is selected from the first candidate units based on the first set of weights of the streams of information. The streams of information are analyzed in the context associated with second candidate units to determine a second set of weights of the streams of information. A second candidate unit is selected from the second candidate units to concatenate with the first candidate unit based on the second set of weights of the streams of information.
Type: Application
Filed: November 20, 2007
Publication date: May 21, 2009
Inventor: Jerome Bellegarda
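The weighted-stream selection described above can be sketched as scoring each candidate unit per information stream and picking the candidate with the best context-weighted sum. The stream names, scores, and weights below are invented for illustration.

```python
# Hedged sketch of context-aware unit selection: each candidate carries a score
# per information stream; a context-dependent weight set combines the streams,
# and the highest-scoring candidate is selected.

def select_unit(candidates, weights):
    """Pick the candidate with the highest weighted sum of stream scores."""
    def score(cand):
        return sum(weights[stream] * value
                   for stream, value in cand["scores"].items())
    return max(candidates, key=score)

candidates = [
    {"id": "u1", "scores": {"acoustic": 0.9, "prosodic": 0.2}},
    {"id": "u2", "scores": {"acoustic": 0.5, "prosodic": 0.8}},
]
# In a prosody-dominated context, the weight set favors the prosodic stream.
context_weights = {"acoustic": 0.3, "prosodic": 0.7}
print(select_unit(candidates, context_weights)["id"])  # → u2
```

Re-deriving the weights for the next candidate set, as the abstract describes, lets the selection criterion shift with context from one concatenated unit to the next.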
-
Publication number: 20090100150
Abstract: The present invention provides an assistive technology screen reader in a distributed network computer system. The screen reader, on a server computer system, receives display information output from one or more applications. The screen reader converts the text and symbolic content of the display information into a performant format for transmission across a network. The screen reader, on a client computer system, receives the performant format. The received performant format is converted to a device type file by the screen reader. The screen reader then presents the device type file to a device driver, for output to a speaker, braille reader, or the like.
Type: Application
Filed: June 14, 2002
Publication date: April 16, 2009
Inventor: David Yee
-
Publication number: 20090048842
Abstract: Techniques for operating a reading machine are disclosed. The techniques include forming an N-dimensional features vector based on features of an image, the features corresponding to characteristics of at least one object depicted in the image, representing the features vector as a point in n-dimensional space, where n corresponds to N, the number of features in the features vector, and comparing the point in n-dimensional space to a centroid that represents a cluster of points in the n-dimensional space corresponding to a class of objects to determine whether the point belongs in the class of objects corresponding to the centroid.
Type: Application
Filed: April 28, 2008
Publication date: February 19, 2009
Inventors: Paul Albrecht, Rafael Maya Zetune, Lucy Gibson, Raymond C. Kurzweil
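The centroid comparison described above amounts to nearest-centroid classification: represent the image as an n-dimensional feature vector and assign it to the class whose cluster centroid is closest. The class labels and feature values below are invented for illustration.

```python
# Sketch of nearest-centroid classification of an image feature vector.

import math

def nearest_class(features, centroids):
    """Return the class label whose centroid is closest (Euclidean distance)."""
    def distance(centroid):
        return math.sqrt(sum((f - v) ** 2 for f, v in zip(features, centroid)))
    return min(centroids, key=lambda label: distance(centroids[label]))

centroids = {
    "text_block": [0.9, 0.1, 0.2],
    "photo":      [0.2, 0.8, 0.7],
}
print(nearest_class([0.85, 0.15, 0.25], centroids))  # → text_block
```

A distance threshold could be added so that points far from every centroid are rejected rather than forced into the nearest class.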