Text Analysis, Generation Of Parameters For Speech Synthesis Out Of Text, E.g., Grapheme To Phoneme Translation, Prosody Generation, Stress, Or Intonation Determination, Etc. (epo) Patents (Class 704/E13.011)
-
Publication number: 20110010178
Abstract: Provided is a system and method for transforming vernacular pronunciation with respect to Hanja using a statistical method. In a system for transforming vernacular pronunciation, a vernacular pronunciation extracting unit extracts a vernacular pronunciation with respect to a Hanja character string, a statistical data determining unit determines statistical data with respect to the Hanja character string by using statistical data of features related to a Hanja-vernacular pronunciation transformation, and a vernacular pronunciation transforming unit transforms the Hanja character string into a vernacular pronunciation using the extracted vernacular pronunciation and the determined statistical data.
Type: Application
Filed: July 7, 2010
Publication date: January 13, 2011
Applicant: NHN Corporation
Inventors: Hyunjung Lee, Taeil Kim, Hee-Cheol Seo, Ji Hye Lee
-
Publication number: 20100329505
Abstract: An image processing apparatus includes: a storage module configured to store a plurality of pieces of comment data; an analyzing module configured to analyze an expression of a person contained in image data; a generating module configured to select target comment data from among the comment data stored in the storage module based on the expression of the person analyzed by the analyzing module, and to generate voice data using the target comment data; and an output module configured to output reproduction data to be used for displaying the image data together with the voice data generated by the generating module.
Type: Application
Filed: June 1, 2010
Publication date: December 30, 2010
Applicant: KABUSHIKI KAISHA TOSHIBA
Inventors: Kousuke Imoji, Yuki Kaneko, Junichi Takahashi
-
Publication number: 20100332224
Abstract: In accordance with an example embodiment of the present invention, an apparatus comprises a controller configured to process punctuated text data, and to identify punctuation in said punctuated text data; and an output unit configured to generate audio output corresponding to said punctuated text data, and to generate tactile output corresponding to said identified punctuation.
Type: Application
Filed: June 30, 2009
Publication date: December 30, 2010
Applicant: NOKIA CORPORATION
Inventors: Jakke Sakari Mäkelä, Jukka Pekka Naula, Niko Santeri Porjo
-
Publication number: 20100324905
Abstract: Disclosed are techniques and systems to provide a narration of a text in multiple different voices. Further disclosed are techniques and systems for modifying a voice model associated with a selected character based on data received from a user.
Type: Application
Filed: January 14, 2010
Publication date: December 23, 2010
Inventors: Raymond C. Kurzweil, Paul Albrecht, Peter Chapman
-
Publication number: 20100318360
Abstract: The present invention is a method and system for extracting messages from a person using the body features presented by a user. The present invention captures a set of images and extracts a first set of body features, along with a set of contexts, and a set of meanings. From the first set of body features, the set of contexts, and the set of meanings, the present invention generates a set of words corresponding to the message that the person is attempting to convey. The present invention can also use the body features of the person in addition to the voice of the person to further improve the accuracy of extracting the person's message.
Type: Application
Filed: June 10, 2009
Publication date: December 16, 2010
Applicant: Toyota Motor Engineering & Manufacturing North America, Inc.
Inventor: Yasuo Uehara
-
Publication number: 20100318361
Abstract: Assistive, context-relevant images may be provided. First, text may be received. Then a spell check indication may be received and a spelling check may be performed on the received text in response to the received spell check indication. Next, in response to the performed spelling check, a misspelling indication may be provided configured to indicate that at least one word in the received text is misspelled. A selection of the misspelling indication may then be received. Then, on a display device in response to the received selection of the misspelling indication, a plurality of suggested spellings for the at least one word and an image corresponding to a first one of the plurality of suggested spellings for the at least one word may be displayed.
Type: Application
Filed: June 11, 2009
Publication date: December 16, 2010
Applicant: Microsoft Corporation
Inventors: Roderick C. Paulino, Jimmy Y. Sun
-
Publication number: 20100312564
Abstract: A local text to speech feedback loop is utilized to modify algorithms used in speech synthesis to provide a user with an improved experience. A remote text to speech feedback loop is utilized to aggregate local feedback loop data and incorporate the best solutions into a new, improved text to speech engine for deployment.
Type: Application
Filed: June 5, 2009
Publication date: December 9, 2010
Applicant: Microsoft Corporation
Inventor: Michael D. Plumpe
-
Publication number: 20100312562
Abstract: A rope-jumping algorithm is employed in a Hidden Markov Model based text to speech system to determine start and end models and to modify the start and end models by setting small co-variances. Disordered acoustic parameters due to violation of parameter constraints are avoided through the modification, resulting in a stable line frequency spectrum for the generated speech.
Type: Application
Filed: June 4, 2009
Publication date: December 9, 2010
Applicant: Microsoft Corporation
Inventors: Wenlin Wang, Guoliang Zhang, Jingyang Xu
-
Publication number: 20100299147
Abstract: Systems and methods for facilitating communication including recognizing speech in a first language represented in a first audio signal; forming a first text representation of the speech; processing the first text representation to form data representing a second audio signal; and causing presentation of the second audio signal to a second user while responsive to an interrupt signal from a first user. In some embodiments, processing the first text representation includes translating the first text representation to a second text representation in a second language and processing the second text representation to form the data representing the second audio signal. Some embodiments include accepting an interrupt signal from the first user and interrupting the presentation of the second audio signal.
Type: Application
Filed: May 20, 2009
Publication date: November 25, 2010
Applicant: BBN Technologies Corp.
Inventor: David G. Stallard
-
Publication number: 20100250254
Abstract: An acquiring unit acquires pattern sentences, which are similar to one another and include fixed segments and non-fixed segments, and substitution words that are substituted for the non-fixed segments. A sentence generating unit generates target sentences by replacing the non-fixed segments with the substitution words for each of the pattern sentences. A first synthetic-sound generating unit generates a first synthetic sound, a synthetic sound of the fixed segment, and a second synthetic-sound generating unit generates a second synthetic sound, a synthetic sound of the substitution word, for each of the target sentences. A calculating unit calculates a discontinuity value of a boundary between the first synthetic sound and the second synthetic sound for each of the target sentences, and a selecting unit selects the target sentence having the smallest discontinuity value. A connecting unit connects the first synthetic sound and the second synthetic sound of the selected target sentence.
Type: Application
Filed: September 15, 2009
Publication date: September 30, 2010
Applicant: KABUSHIKI KAISHA TOSHIBA
Inventor: Nobuaki Mizutani
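The selection step described above (pick the candidate whose synthetic sound joins the fixed segment with the smallest boundary discontinuity) can be sketched as follows. This is an illustrative simplification, not the patent's implementation: the discontinuity measure here is a hypothetical stand-in that compares one scalar frame value on each side of the boundary.

```python
# Illustrative sketch: choose, among candidate substitution words, the one
# whose synthetic sound joins the fixed segment with the smallest boundary
# discontinuity.  fixed_tail / sub_head are hypothetical scalar frame
# values at the concatenation boundary.

def discontinuity(fixed_tail: float, sub_head: float) -> float:
    """Hypothetical boundary-discontinuity measure between two synthetic sounds."""
    return abs(fixed_tail - sub_head)

def select_best_substitution(fixed_tail, candidates):
    """candidates: list of (word, head_frame_value) pairs.
    Return the word with the minimum boundary discontinuity."""
    return min(candidates, key=lambda c: discontinuity(fixed_tail, c[1]))[0]

best = select_best_substitution(0.50, [("Tokyo", 0.9), ("Osaka", 0.55), ("Nara", 0.1)])
print(best)  # -> Osaka (boundary gap 0.05 is the smallest)
```

A real system would compare vector-valued acoustic features (e.g. spectra) at the boundary rather than single scalars, but the argmin structure is the same.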
-
Publication number: 20100250253
Abstract: A speech-directed user interface system includes at least one speaker for delivering an audio signal to a user and at least one microphone for capturing speech utterances of a user. An interface device interfaces with the speaker and microphone and provides a plurality of audio signals to the speaker to be heard by the user. A control circuit is operably coupled with the interface device and is configured for selecting at least one of the plurality of audio signals as a foreground audio signal for delivery to the user through the speaker. The control circuit is operable for recognizing speech utterances of a user and using the recognized speech utterances to control the selection of the foreground audio signal.
Type: Application
Filed: March 27, 2009
Publication date: September 30, 2010
Inventor: Yangmin Shen
-
Publication number: 20100211392
Abstract: The speech synthesizing device acquires numerical data at regular time intervals, each piece of the numerical data representing a value having a plurality of digits, detects a change between two values represented by the numerical data that is acquired at two consecutive times, determines which digit of the value represented by the numerical data is used to generate speech data depending on the detected change, generates numerical information that indicates the determined digit of the value represented by the numerical data, and generates speech data from the digit indicated by the numerical information.
Type: Application
Filed: September 21, 2009
Publication date: August 19, 2010
Applicant: KABUSHIKI KAISHA TOSHIBA
Inventors: Ryutaro Tokuda, Takehiko Kagoshima
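The digit-selection idea above (speak only the digits that changed between two consecutive readings) can be sketched like this. The zero-padded width and the speak-from-highest-changed-digit policy are illustrative assumptions, not details taken from the patent.

```python
def changed_digit_index(prev: int, curr: int, width: int = 6) -> int:
    """Return the position (0 = most significant) of the highest digit that
    differs between two zero-padded readings, or -1 if nothing changed."""
    a, b = str(prev).zfill(width), str(curr).zfill(width)
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return -1

def digits_to_speak(prev: int, curr: int, width: int = 6) -> str:
    """Hypothetical policy: speak from the highest changed digit onward,
    so an unchanged prefix of the value is not re-read every interval."""
    i = changed_digit_index(prev, curr, width)
    return "" if i < 0 else str(curr).zfill(width)[i:]

print(digits_to_speak(120450, 120470))  # -> "70": only the tail that changed
```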
-
Publication number: 20100211393
Abstract: A speech synthesis device is provided with: a central segment selection unit for selecting a central segment from among a plurality of speech segments; a prosody generation unit for generating prosody information based on the central segment; a non-central segment selection unit for selecting a non-central segment, which is a segment outside of a central segment section, based on the central segment and the prosody information; and a waveform generation unit for generating a synthesized speech waveform based on the prosody information, the central segment, and the non-central segment. The speech synthesis device first selects a central segment that forms a basis for prosody generation and generates prosody information based on the central segment, so that it is possible to sufficiently reduce both concatenation distortion and sound quality degradation accompanying prosody control in the section of the central segment.
Type: Application
Filed: April 28, 2008
Publication date: August 19, 2010
Inventors: Masanori Kato, Yasuyuki Mitsui, Reishi Kondo
-
Publication number: 20100201793
Abstract: A reading device includes a computing device and an image input device coupled to the computing device for capturing low resolution images and high resolution images. The reading machine also includes a computer program product residing on a computer readable medium. The medium is in communication with the computing device and includes instructions to operate in a plurality of modes to optimize performance for specific uses of the reading device and process low and high resolution images during operation of at least one of the plurality of modes.
Type: Application
Filed: February 9, 2010
Publication date: August 12, 2010
Inventors: Raymond C. Kurzweil, Paul Albrecht, James Gashel, Lucy Gibson
-
Publication number: 20100198595
Abstract: In a system comprising a voice recognition module, a session manager, and a voice generator module, a method for providing a service to a user comprises receiving an utterance via the voice recognition module; converting the utterance into one or more structures using a lexicon tied to an ontology; identifying concepts in the utterance using the structures; provided the utterance includes sufficient information, selecting a service based on the concepts; generating a text message based on the selected service; and converting the text message to a voice message using the voice generator.
Type: Application
Filed: February 3, 2009
Publication date: August 5, 2010
Applicant: SoftHUS Sp.z.o.o
Inventor: Eugeniusz Wlasiuk
-
Publication number: 20100198594
Abstract: Mobile phone signals may be corrupted by noise, fading, interference with other signals, and low strength field coverage of a transmitting and/or a receiving mobile phone as they pass through the communication network (e.g., free space). Because of the corruption of the mobile phone signal, a voice conversation between a caller and a receiver may be interrupted, and there may be gaps in a received oral communication from one or more participants in the voice conversation, forcing either or both the caller and the receiver to repeat the conversation. Transmitting a transcript of the oral communication along with a voice signal comprising the oral communication can help ensure that the voice conversation is not interrupted due to a corrupted voice signal. The transcript of the oral communication can be used to retrieve parts of the oral communication lost in transmission (e.g., by fading, etc.) to make the conversation more fluid.
Type: Application
Filed: February 3, 2009
Publication date: August 5, 2010
Applicant: International Business Machines Corporation
Inventors: Rosario Gangemi, Giuseppe Longobardi
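The gap-recovery idea above (use a transmitted transcript to restore words lost from the voice signal) can be sketched as follows, under the simplifying assumption that the received word stream and the transcript are already word-aligned; a real system would need alignment and would re-synthesize the recovered words rather than return text.

```python
def fill_gaps(received_words, transcript_words):
    """Replace dropped words (represented as None) in the received stream
    with the corresponding words from the transmitted transcript.
    Assumes the two streams are word-aligned (an illustrative simplification)."""
    return [t if r is None else r
            for r, t in zip(received_words, transcript_words)]

heard = ["meet", None, "at", None, "pm"]   # words lost to fading marked None
sent  = ["meet", "me", "at", "three", "pm"]
print(" ".join(fill_gaps(heard, sent)))  # -> meet me at three pm
```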
-
Publication number: 20100161327
Abstract: A computer-implemented method for automatically analyzing, predicting, and/or modifying acoustic units of prosodic human speech utterances for use in speech synthesis or speech recognition. Possible steps include: initiating analysis of acoustic wave data representing the human speech utterances, via the phase state of the acoustic wave data; using one or more phase state defined acoustic wave metrics as common elements for analyzing, and optionally modifying, pitch, amplitude, duration, and other measurable acoustic parameters of the acoustic wave data, at predetermined time intervals; analyzing acoustic wave data representing a selected acoustic unit to determine the phase state of the acoustic unit; and analyzing the acoustic wave data representing the selected acoustic unit to determine at least one acoustic parameter of the acoustic unit with reference to the determined phase state of the selected acoustic unit. Also included are systems for implementing the described and related methods.
Type: Application
Filed: December 16, 2009
Publication date: June 24, 2010
Inventors: Nishant Chandra, Reiner Wilhelms-Tricarico, Rattima Nitisaroj, Brian Mottershead, Gary A. Marple, John B. Reichenbach
-
Publication number: 20100131267
Abstract: A method of recording speech for use in a speech samples library. In an exemplary embodiment, the method comprises recording a speaker pronouncing a phoneme with musical parameters characterizing pronunciation of another phoneme by the same or another speaker. For example, in one embodiment the method comprises: providing a recording of a first speaker pronouncing a first phoneme in a phonemic context. The pronunciation is characterized by some musical parameters. A second reader, who may be the same as the first speaker, is then recorded pronouncing a second phoneme (different from the first phoneme) with the musical parameters that characterize pronunciation of the first phoneme by the first speaker. The recordings made by the second reader are used for compiling a speech samples library.
Type: Application
Filed: March 19, 2008
Publication date: May 27, 2010
Applicant: Vivo Text Ltd.
Inventors: Gershon Silbert, Andres Hakim
-
Publication number: 20100125459
Abstract: Exemplary embodiments provide for determining a sequence of words in a TTS system. An input text is analyzed using two models, a word n-gram model and an accent class n-gram model. A list of all possible words for each word in the input is generated for each model. Each word in each list for each model is given a score based on the probability that the word is the correct word in the sequence, based on the particular model. The two lists are combined and the two scores are combined for each word. A set of sequences of words is generated, where each sequence of words comprises a unique combination of an attribute and associated word for each word in the input. The combined scores of the words in each sequence are summed, and the sequence of words having the highest score is selected and presented to a user.
Type: Application
Filed: July 1, 2009
Publication date: May 20, 2010
Applicant: Nuance Communications, Inc.
Inventors: Nobuyasu Itoh, Tohru Nagano, Masafumi Nishimura, Ryuki Tachibana
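The per-word score combination described above (merge each model's candidate list, then combine the two models' scores for each word) might look like this in outline. The log-probability values, the equal weighting, and the floor penalty for words one model never proposed are all illustrative assumptions, not details from the patent.

```python
def combine_scores(word_ngram: dict, accent_ngram: dict, w: float = 0.5) -> dict:
    """Merge the candidate lists of two models; a word's combined score is
    a weighted sum of its log-probability under each model.  Words missing
    from one model get a hypothetical floor penalty."""
    floor = -10.0  # assumed penalty for an unseen candidate
    words = set(word_ngram) | set(accent_ngram)
    return {wd: w * word_ngram.get(wd, floor) + (1 - w) * accent_ngram.get(wd, floor)
            for wd in words}

def best_word(word_ngram: dict, accent_ngram: dict) -> str:
    """Pick the candidate with the highest combined score."""
    scores = combine_scores(word_ngram, accent_ngram)
    return max(scores, key=scores.get)

# The word model slightly prefers "record", but the accent-class model
# strongly prefers "records"; the combination picks "records".
print(best_word({"record": -1.0, "records": -2.0},
                {"record": -2.0, "records": -0.5}))  # -> records
```

The patent scores whole sequences, not single positions; extending this sketch to sequences would sum the combined per-word scores along each candidate sequence and take the argmax.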
-
Publication number: 20100121629
Abstract: A translation platform allows a client using a first language to communicate via translated voice and/or text to at least a second client using a second language. A control server uses various speech recognition engines, text translation engines and text to speech engines to accomplish real-time or near-real time translations.
Type: Application
Filed: May 28, 2009
Publication date: May 13, 2010
Inventor: Sanford H. Cohen
-
Publication number: 20100100317
Abstract: A method and apparatus for determining the manner in which a processor-enabled device should produce sounds from data is described. In at least one embodiment, the device includes a first device for synthesizing sounds digitally and re-producing pre-recorded sounds, a second device for audible delivery thereof, memory in which is stored a database of a plurality of data at least some of which is in the form of text-based indicators, and one or more pre-recorded sounds, a data transfer device by which the data is transferred between the processor of the device and the memory, and operating system software which controls the processing and flow of data between a processor and the memory, and whether the sounds are audibly reproduced. In accordance with at least one embodiment of the invention, the device is further capable of repeatedly determining one or more physical conditions, e.g.
Type: Application
Filed: March 21, 2007
Publication date: April 22, 2010
Inventors: Rory Jones, Sven Jurgens
-
Publication number: 20100094632
Abstract: Disclosed herein are various aspects of a toolkit used for generating a TTS voice for use in a spoken dialog system. The embodiments in each case may be in the form of the system, a computer-readable medium or a method for generating the TTS voice. An embodiment of the invention relates to a method of tracking progress in developing a text-to-speech (TTS) voice. The method comprises ensuring that a corpus of recorded speech contains reading errors and matches an associated written text, creating a tuple for each utterance in the corpus and tracking progress for each utterance utilizing the tuple. Various parameters may be tracked using the tuple, but the tuple provides a means for enabling multiple workers to efficiently process a database of utterances in preparation of a TTS voice.
Type: Application
Filed: December 15, 2009
Publication date: April 15, 2010
Applicant: AT&T Corp.
Inventors: Steven Lawrence Davis, Shane Fetters, David Eugene Schultz, Beverly Gustafson, Louise Loney
-
Publication number: 20100076766
Abstract: The present invention discloses a method for producing graphical indicators and interactive systems for utilizing the graphical indicators. On the surface of an object, visually negligible graphical indicators are provided. The graphical indicators and main information, i.e. text or pictures, co-exist on the surface of the object. The graphical indicators do not interfere with the main information where the perception of human eyes is concerned. With the graphical indicators, further information other than the main information on the surface of the object is carried. In addition to the main information on the surface of the object, one is able to obtain additional information through an auxiliary electronic device or trigger an interactive operation.
Type: Application
Filed: November 19, 2009
Publication date: March 25, 2010
Applicant: Sonix Technology Co., Ltd.
Inventor: Yao-Hung Tsai
-
Publication number: 20100063821
Abstract: Technologies are described herein for providing a hands-free and non-visually occluding interaction with object information. In one method, a visual capture of a portion of an object is received through a hands-free and non-visually occluding visual capture device. An audio capture is also received from a user through a hands-free and non-visually occluding audio capture device. The audio capture may include a request for information about a portion of the object in the visual capture. The information is retrieved and is transmitted to the user for playback through a hands-free and non-visually occluding audio output device.
Type: Application
Filed: September 9, 2008
Publication date: March 11, 2010
Inventors: Joseph C. Marsh, Eric M. Smith
-
Publication number: 20100030557
Abstract: The disclosure relates to systems, methods and apparatus to convert speech to text and vice versa. One apparatus comprises a vocoder, a speech to text conversion engine, a text to speech conversion engine, and a user interface. The vocoder is operable to convert speech signals into packets and convert packets into speech signals. The speech to text conversion engine is operable to convert speech to text. The text to speech conversion engine is operable to convert text to speech. The user interface is operable to receive a user selection of a mode from among a plurality of modes, wherein a first mode enables the speech to text conversion engine, a second mode enables the text to speech conversion engine, and a third mode enables the speech to text conversion engine and the text to speech conversion engine.
Type: Application
Filed: July 31, 2006
Publication date: February 4, 2010
Inventors: Stephen Molloy, Khaled Helmi El-Maleh
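The three-mode interface above can be sketched as a simple dispatcher. The placeholder engines below (lowercasing for speech-to-text, uppercasing for text-to-speech) only mark where real conversion engines would plug in; the mode names are invented for illustration.

```python
STT, TTS, BOTH = "stt", "tts", "both"  # hypothetical mode identifiers

def route(mode: str, speech=None, text=None) -> dict:
    """Run the input through whichever conversion engines the selected
    mode enables.  The engines here are trivial placeholders."""
    speech_to_text = lambda s: s.lower()   # placeholder STT engine
    text_to_speech = lambda t: t.upper()   # placeholder TTS engine
    out = {}
    if mode in (STT, BOTH) and speech is not None:
        out["text"] = speech_to_text(speech)
    if mode in (TTS, BOTH) and text is not None:
        out["speech"] = text_to_speech(text)
    return out

print(route(BOTH, speech="Hello", text="bye"))  # -> {'text': 'hello', 'speech': 'BYE'}
```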
-
Publication number: 20090326951
Abstract: Ratios of powers at the peaks of respective formants of the spectrum of a pitch-cycle waveform and powers at boundaries between the formants are obtained and, when the ratios are large, the bandwidths of window functions are widened and the formant waveforms are generated by multiplying generated sinusoidal waveforms from the formant parameter sets on the basis of pitch-cycle waveform generating data by the window functions of the widened bandwidth, whereby a pitch-cycle waveform is generated by the sum of these formant waveforms.
Type: Application
Filed: April 14, 2009
Publication date: December 31, 2009
Applicant: KABUSHIKI KAISHA TOSHIBA
Inventors: Ryo Morinaka, Takehiko Kagoshima
-
Publication number: 20090306987
Abstract: There is provided a singing synthesis parameter data estimation system that automatically estimates singing synthesis parameter data for automatically synthesizing a human-like singing voice from an audio signal of input singing voice. A pitch parameter estimating section 9 estimates a pitch parameter, by which the pitch feature of an audio signal of synthesized singing voice is brought closer to the pitch feature of the audio signal of input singing voice, based on at least both the pitch feature and lyric data with specified syllable boundaries of the audio signal of input singing voice.
Type: Application
Filed: May 21, 2009
Publication date: December 10, 2009
Applicant: NATIONAL INSTITUTE OF ADVANCED INDUSTRIAL SCIENCE AND TECHNOLOGY
Inventors: Tomoyasu Nakano, Masataka Goto
-
Publication number: 20090295735
Abstract: An electronic device and a method for automatically converting text to be displayed on a display screen of an electronic device into a speech signal when ambient light conditions affect viewing of the text. The method is performed by the electronic device and includes receiving a command to display text on the display screen and determining if an ambient light signal provided by an ambient light sensor is above a pre-determined viewing threshold. This ambient light signal corresponds to ambient light conditions adjacent the display screen. The method also includes automatically converting the text to a speech signal when the ambient light signal is above the pre-determined viewing threshold. Suitably, there is performed a step of emitting the speech signal in an audible form from a speaker.
Type: Application
Filed: May 27, 2008
Publication date: December 3, 2009
Applicant: Motorola, Inc.
Inventors: Wang Wang, Wei Guo, Kan Ni, Danilo Tan
-
Publication number: 20090278766
Abstract: An adequate display operation control in accordance with the external world situation is realized. For example, where a user wears the wearing unit of a spectacle-shaped or head-worn unit, the user is able to view any type of image on the display section immediately in front of the eyes, and is provided with taken images, reproduced images, and received images. At that point, control of various display operations, such as on/off of the display operation, display operation mode, and source change, is carried out based on external world information.
Type: Application
Filed: August 17, 2007
Publication date: November 12, 2009
Applicant: SONY CORPORATION
Inventors: Yoichiro Sako, Masaaki Tsuruta, Taiji Ito, Masamichi Asukai
-
Publication number: 20090271202
Abstract: A speech synthesis apparatus includes a content selection unit that selects a text content item to be converted into speech; a related information selection unit that selects related information which can be at least converted into text and which is related to the text content item selected by the content selection unit; a data addition unit that converts the related information selected by the related information selection unit into text and adds text data of the text to text data of the text content item selected by the content selection unit; a text-to-speech conversion unit that converts the text data supplied from the data addition unit into a speech signal; and a speech output unit that outputs the speech signal supplied from the text-to-speech conversion unit.
Type: Application
Filed: March 25, 2009
Publication date: October 29, 2009
Applicant: SONY ERICSSON MOBILE COMMUNICATIONS JAPAN, INC.
Inventor: Susumu Takatsuka
-
Publication number: 20090271176
Abstract: Methods, systems, and computer program products are provided for multilingual administration of enterprise data. Embodiments include retrieving enterprise data; extracting text from the enterprise data for rendering from a digital media file, the extracted text being in a source language; identifying that the source language is not a predetermined default target language for rendering the enterprise data; translating the extracted text in the source language to translated text in the default target language; converting the translated text to synthesized speech in the default target language; and storing the synthesized speech in the default target language in a digital media file.
Type: Application
Filed: April 24, 2008
Publication date: October 29, 2009
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: William K. Bodin, David Jaramillo, Ann Marie Maynard
-
Publication number: 20090259473
Abstract: Methods and apparatus to present a video program to a visually impaired person are disclosed. An example method comprises receiving a video stream and an associated audio stream of a video program, detecting a portion of the video program that is not readily consumable by a visually impaired person, obtaining text associated with the portion of the video program, converting the text to a second audio stream, and combining the second audio stream with the associated audio stream.
Type: Application
Filed: April 14, 2008
Publication date: October 15, 2009
Inventors: Hisao M. Chang, Horst Schroeter
-
Publication number: 20090259471
Abstract: A universal pattern processing system receives input data and produces output patterns that are best associated with said data. The system uses input means receiving and processing input data, a universal pattern decoder means transforming models using the input data and associating output patterns with original models that are changed least during transforming, and output means outputting best associated patterns chosen by a pattern decoder means.
Type: Application
Filed: April 11, 2008
Publication date: October 15, 2009
Applicant: International Business Machines Corporation
Inventors: Dimitri Kanevsky, David Nahamoo, Tara N. Sainath
-
Publication number: 20090240501
Abstract: Described is a technology by which artificial words are generated based on seed words, and then used with a letter-to-sound conversion model. To generate an artificial word, a stressed syllable of a seed word is replaced with a different syllable, such as a candidate (artificial) syllable, when the phonemic structure and/or graphonemic structure of the stressed syllable and the candidate syllable match one another. In one aspect, the artificial words are provided for use with a letter-to-sound conversion model, which may be used to generate artificial phonemes from a source of words, such as in conjunction with other models. If the phonemes provided by the various models for a selected source word are in agreement relative to one another, the selected source word and an associated artificial phoneme may be added to a training set which may then be used to retrain the letter-to-sound conversion model.
Type: Application
Filed: March 19, 2008
Publication date: September 24, 2009
Applicant: MICROSOFT CORPORATION
Inventors: Yi Ning Chen, Jia Li You, Frank Kao-ping Soong
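The syllable-replacement step above (swap the stressed syllable for a candidate only when their structures match) can be sketched as follows. The consonant/vowel structure function is a deliberately crude stand-in for the phonemic/graphonemic matching the abstract describes; everything here is illustrative.

```python
def cv_structure(syllable: str) -> str:
    """Crude structural signature: map each letter to C (consonant) or V (vowel).
    A stand-in for the patent's phonemic/graphonemic structure comparison."""
    return "".join("V" if ch in "aeiou" else "C" for ch in syllable.lower())

def make_artificial_word(seed_syllables, stressed_idx, candidate):
    """Replace the stressed syllable of a seed word with a candidate syllable,
    but only when their structures match; otherwise reject (return None)."""
    if cv_structure(seed_syllables[stressed_idx]) != cv_structure(candidate):
        return None
    out = list(seed_syllables)
    out[stressed_idx] = candidate
    return "".join(out)

# "fer" and "ver" share the structure CVC, so the swap is accepted:
print(make_artificial_word(["con", "fer"], 1, "ver"))  # -> conver
# "ia" (VV) does not match "fer" (CVC), so the swap is rejected:
print(make_artificial_word(["con", "fer"], 1, "ia"))   # -> None
```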
-
Publication number: 20090187407
Abstract: The present invention relates to a system and methods for preparing reports, such as medical reports. The system and methods advantageously can verbalize information, using speech synthesis (text-to-speech), to support a dialogue between a user and the reporting system during the course of the preparation of the report in order that the user can avoid inefficient visual distractions.
Type: Application
Filed: January 18, 2008
Publication date: July 23, 2009
Inventors: Jeffrey Soble, James Roberge
-
Publication number: 20090187408
Abstract: A temporary child set is generated. An elastic ratio of an elastic section of a model pattern is calculated. A temporary typical pattern of the set is generated by combining the pattern belonging to the set with the model pattern having the elastic section expanded or contracted. A distortion between the temporary typical pattern of the set and the pattern belonging to the set is calculated, and a child set is determined as the set when the distortion is below a threshold. A typical pattern as the temporary typical pattern of the child set is stored with a classification rule as the classification item of the context of the pattern belonging to the child set.
Type: Application
Filed: January 23, 2009
Publication date: July 23, 2009
Applicant: KABUSHIKI KAISHA TOSHIBA
Inventor: Nobuaki Mizutani
-
Publication number: 20090177475
Abstract: Even when a pitch cycle has a large fluctuation and the pitch cycle string changes abruptly, it is possible to suppress the effect of the pitch cycle fluctuation and generate high-quality synthesized speech. A speech synthesis device generates a synthesized speech corresponding to an input text sentence according to an original speech waveform stored in an original speech waveform information storage unit (25). The speech synthesis device includes a pitch cycle correction unit (40) which extracts a fluctuation component of the pitch cycle of the original speech waveform obtained from the original speech waveform information storage unit (25) in order to generate the synthesized speech, and which corrects, based on the extracted fluctuation component, the pitch cycle of the synthesized speech obtained by analyzing the input text sentence. The pitch cycle correction unit (40) connects the pitch cycle waveform of the original speech waveform at the pitch cycle of the corrected synthesized speech.
Type: Application
Filed: July 4, 2007
Publication date: July 9, 2009
Applicant: NEC CORPORATION
Inventor: Masanori Kato
-
Publication number: 20090157408. Abstract: The present invention relates to a speech synthesizing method and apparatus based on a hidden Markov model (HMM). Among code words obtained by quantizing speech parameter instances for each state of an HMM model, the code word closest to a speech parameter generated from an input text using a known method is searched for. When the distance between the found code word and the speech parameter generated by the known method is smaller than or equal to a threshold value, the found code word is output as the final speech parameter. When the distance exceeds the threshold value, the speech parameter generated by the known method is output as the final speech parameter. The final speech parameter is processed to generate the final synthesized speech for the input text. Type: Application. Filed: June 27, 2008. Publication date: June 18, 2009. Applicant: Electronics and Telecommunications Research Institute. Inventor: Sanghun KIM
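The selection rule in this abstract is a nearest-neighbour test with a fallback. A minimal sketch, assuming Euclidean distance (the abstract does not specify the metric) and the hypothetical name `choose_parameter`:

```python
def choose_parameter(generated, codebook, threshold):
    """Pick the codebook vector nearest to the generated parameter;
    fall back to the generated parameter if the nearest codeword is
    farther away than the threshold."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    nearest = min(codebook, key=lambda c: dist(c, generated))
    return nearest if dist(nearest, generated) <= threshold else generated
```

A parameter close to a codeword snaps to it; an outlier is passed through unchanged, which is the fallback behaviour the abstract describes.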
-
Publication number: 20090150157. Abstract: A word dictionary including sets of a character string constituting a word, a phoneme sequence constituting the pronunciation of the word, and a part of speech of the word is referenced. An entered text is analyzed and divided into one or more subtexts, and a phoneme sequence and a part of speech sequence are generated for each subtext. The part of speech sequence of the subtext is collated with a list of part of speech sequences to determine whether the phonetic sounds of the subtext are to be converted, and the phonetic sounds of the phoneme sequence in each subtext so determined are converted. Type: Application. Filed: September 15, 2008. Publication date: June 11, 2009. Applicant: KABUSHIKI KAISHA TOSHIBA. Inventors: Takehiko KAGOSHIMA, Noriko YAMANAKA, Makoto YAJIMA
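The collation step can be sketched as a dictionary lookup followed by a part-of-speech-sequence membership test that gates a phoneme conversion. The toy dictionary, the POS-sequence list, and the name `process_subtext` are all illustrative assumptions, not the patent's data:

```python
# Hypothetical word dictionary: word -> (phoneme sequence, part of speech)
DICT = {
    "hello": (["HH", "AH", "L", "OW"], "interjection"),
    "world": (["W", "ER", "L", "D"], "noun"),
}

# POS sequences whose subtexts should have their phonetic sounds converted
CONVERT_POS_SEQUENCES = {("interjection", "noun")}

def process_subtext(words, convert):
    """Build the phoneme and POS sequences for one subtext, then apply
    the conversion only if the POS sequence is in the collation list."""
    phonemes, pos_seq = [], []
    for w in words:
        p, pos = DICT[w]
        phonemes.extend(p)
        pos_seq.append(pos)
    if tuple(pos_seq) in CONVERT_POS_SEQUENCES:
        phonemes = [convert(p) for p in phonemes]
    return phonemes, tuple(pos_seq)
```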
-
Publication number: 20090125309. Abstract: Methods, systems, and products are disclosed for synthesizing speech. Text is received for translation to speech. The text is correlated to phrases, and each phrase is converted into a corresponding string of phonemes. A phoneme identifier is retrieved that uniquely represents each phoneme in the string of phonemes. Each phoneme identifier is concatenated to produce a sequence of phoneme identifiers, with each phoneme identifier separated by a comma. Each sequence of phoneme identifiers is concatenated and separated by a semi-colon. Type: Application. Filed: January 22, 2009. Publication date: May 14, 2009. Inventor: Steve Tischer
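The encoding this abstract describes (comma-separated identifiers within a phrase, semicolon-separated phrases) is easy to sketch; the identifier table and the name `encode_phrases` are illustrative assumptions:

```python
def encode_phrases(phrases, phoneme_ids):
    """Encode each phrase as comma-separated phoneme identifiers, then
    join the per-phrase sequences with semicolons."""
    sequences = []
    for phonemes in phrases:
        sequences.append(",".join(str(phoneme_ids[p]) for p in phonemes))
    return ";".join(sequences)
```

For example, two phrases with identifiers 1,2 and 3,4 encode as `"1,2;3,4"`.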
-
Publication number: 20090119091. Abstract: A system and method for automated language translation comprising a database of pre-translated patterns produced by human translators, providing a transparent and seamless translation service. Whenever a user issues a translation request, the system offers suitable translated sentences from the aforementioned database. It does so by separating the submitted text into elements and using a pattern recognition mechanism to identify a matching translation for each element. If there is no matching translated pattern in the database, or if the user does not approve the translated sentence, the system transparently engages a suitable registered human translator. The new translation is stored in the database, thus enriching the database, and the translation request is delivered. Type: Application. Filed: October 14, 2008. Publication date: May 7, 2009. Inventor: Eitan Chaim Sarig
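The lookup-with-fallback flow reads like a translation memory: a minimal sketch, with a dictionary standing in for the pattern database and a callback standing in for the registered human translator (all names are assumptions, and elements are naively taken to be whitespace-separated tokens):

```python
def translate(sentence, memory, human_translate):
    """Look each element up in the pre-translated pattern database;
    fall back to a human translator and cache the new translation."""
    out = []
    for element in sentence.split():
        if element not in memory:
            memory[element] = human_translate(element)  # fallback, then store
        out.append(memory[element])
    return " ".join(out)
```

The side effect on `memory` models the abstract's point that each human translation enriches the database for later requests.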
-
Publication number: 20090083036. Abstract: Described is a technology by which synthesized speech generated from text is evaluated against a prosody model (trained offline) to determine whether the speech will sound unnatural. If so, the speech is regenerated with modified data. The evaluation and regeneration may be iterative until deemed natural sounding. For example, text is built into a lattice that is then (e.g., Viterbi) searched to find a best path. The sections (e.g., units) of data on the path are evaluated via a prosody model. If the evaluation deems a section to correspond to unnatural prosody, that section is replaced, e.g., by modifying/pruning the lattice and re-performing the search. Replacement may be iterative until all sections pass the evaluation. Unnatural prosody detection may be biased such that during evaluation, unnatural prosody is falsely detected at a higher rate relative to a rate at which unnatural prosody is missed. Type: Application. Filed: September 20, 2007. Publication date: March 26, 2009. Applicant: Microsoft Corporation. Inventors: Yong Zhao, Frank Kao-ping Soong, Min Chu, Lijuan Wang
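The prune-and-research loop can be sketched with a greedy stand-in for the Viterbi search: pick the best-scoring unit per slot, flag unnatural units with the prosody check, prune them from the lattice, and search again. The names and the per-slot greedy search are simplifying assumptions, not the patent's actual lattice algorithm:

```python
def synthesize_with_check(candidates_per_slot, is_natural, max_iters=10):
    """Iteratively select units, prune any flagged as unnatural by the
    prosody check, and re-search until all selected units pass."""
    candidates = [list(slot) for slot in candidates_per_slot]
    path = []
    for _ in range(max_iters):
        path = [max(slot, key=lambda u: u["score"]) for slot in candidates]
        bad = [i for i, u in enumerate(path) if not is_natural(u)]
        if not bad:
            return path
        for i in bad:
            if len(candidates[i]) > 1:   # keep at least one candidate per slot
                candidates[i].remove(path[i])
    return path
```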
-
Publication number: 20090063128. Abstract: Provided are a device and method for interactive machine translation. The device includes a machine translation engine and a user interface module. The engine has a morphological/syntactic analyzer for analyzing the morphemes and sentences of an original text and generating original text analysis information, and a translation generator for generating a translation and translation generation information on the basis of the original text analysis information. The user interface module displays the sentence structures of the original text and the translation, and the relationship between them, to a user on the basis of the original text analysis information and the translation generation information, and receives corrections to the original text or the translation from the user. The device and method provide a user interface whereby the user can effectively recognize and correct a mistranslated part and the cause of the mistranslation, and rapidly provide a re-translated result according to the correction. Type: Application. Filed: September 5, 2008. Publication date: March 5, 2009. Applicant: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE. Inventors: Young Ae SEO, Chang Hyun Kim, Seong Il Yang, Young Sook Hwang, Chang Hao Yin, Eun Jin Park, Sung Kwon Choi, Ki Young Lee, Oh Woog Kwon, Yoon Hyung Roh, Young Kil Kim
-
Publication number: 20090063154. Abstract: Information about a device may be emotively conveyed to a user of the device. Input indicative of an operating state of the device may be received. The input may be transformed into data representing a simulated emotional state. Data representing an avatar that expresses the simulated emotional state may be generated and displayed. A query from the user regarding the simulated emotional state expressed by the avatar may be received. The query may be responded to. Type: Application. Filed: November 5, 2008. Publication date: March 5, 2009. Applicant: Ford Global Technologies, LLC. Inventors: Oleg Yurievitch Gusikhin, Perry Robinson MacNeille, Erica Klampfl, Kacie Alane Theisen, Dimitar Petrov Filev, Yifan Chen, Basavaraj Tonshal
-
Publication number: 20090055158. Abstract: A speech translation apparatus includes a speech recognition unit configured to recognize input speech of a first language to generate a first text of the first language, an extraction unit configured to compare original prosody information of the input speech with first synthesized prosody information based on the first text to extract paralinguistic information about each of the first words of the first text, a machine translation unit configured to translate the first text into a second text of a second language, a mapping unit configured to allocate the paralinguistic information about each of the first words to each of the second words of the second text in accordance with synonymity, a generating unit configured to generate second synthesized prosody information based on the paralinguistic information allocated to each of the second words, and a speech synthesis unit configured to synthesize output speech based on the second synthesized prosody information. Type: Application. Filed: August 21, 2008. Publication date: February 26, 2009. Applicant: KABUSHIKI KAISHA TOSHIBA. Inventors: Dawei Xu, Takehiko Kagoshima
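The mapping unit's job, carrying per-word paralinguistic tags (e.g. emphasis) across a translation, can be sketched with a precomputed word alignment standing in for the synonymity matching; the alignment dictionary and the name `map_paralinguistic` are assumptions:

```python
def map_paralinguistic(source_tags, alignment):
    """Carry per-word paralinguistic tags from source words to their
    aligned (synonymous) target words."""
    target_tags = {}
    for src_word, tgt_word in alignment.items():
        if src_word in source_tags:
            target_tags[tgt_word] = source_tags[src_word]
    return target_tags
```

The target-side synthesizer would then build its prosody from the mapped tags, per the abstract's generating unit.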
-
Publication number: 20090043585. Abstract: Disclosed are systems, methods, and computer readable media for performing speech synthesis. The method embodiment comprises applying a first part of a speech synthesizer to a text corpus to obtain a plurality of phoneme sequences, the first part of the speech synthesizer only identifying possible phoneme sequences; for each of the obtained plurality of phoneme sequences, identifying the joins that would be calculated to synthesize each of the respective phoneme sequences; and adding the identified joins to a cache for use in speech synthesis. Type: Application. Filed: August 9, 2007. Publication date: February 12, 2009. Applicant: AT&T Corp. Inventor: Alistair D. CONKIE
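The caching step amounts to enumerating the adjacent-unit joins each phoneme sequence would need and computing each join cost once. A minimal sketch under assumed names (`precompute_join_cache`, a caller-supplied `join_cost`):

```python
def precompute_join_cache(phoneme_sequences, join_cost):
    """Enumerate the joins each phoneme sequence would require and
    cache their costs for reuse at synthesis time."""
    cache = {}
    for seq in phoneme_sequences:
        for a, b in zip(seq, seq[1:]):
            if (a, b) not in cache:
                cache[(a, b)] = join_cost(a, b)
    return cache
```

Joins shared between sequences (here, the `("e", "l")` join) are computed only once, which is the point of building the cache offline.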
-
Publication number: 20090030670. Abstract: Embodiments of the present invention provide a method, system, and computer program product for real-time multi-lingual adaptation of manufacturing instructions in a manufacturing management system. In one embodiment of the invention, a manufacturing language adaptation method can be provided. The method can include identifying an operator receiving manufacturing instruction, determining a primary language preference for the operator, and determining whether or not the manufacturing instructions have been translated into the primary language preference. If it is determined that the manufacturing instructions have been translated into the primary language preference, the manufacturing instructions can be presented to the operator in the primary language preference. Otherwise, the manufacturing instructions can be submitted to a translation engine for translation into the primary language preference. Type: Application. Filed: July 25, 2007. Publication date: January 29, 2009. Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION. Inventors: Ivory W. Knipfer, John W. Marreel, Kay M. Momsen, Ryan T. Paske
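The check-then-translate flow is a simple cache-aside pattern over per-language translations. A sketch with hypothetical names (`instructions_for`, a `translations` dict keyed by language code with an assumed `"en"` source entry, and a caller-supplied `translate` engine):

```python
def instructions_for(operator, preferences, translations, translate):
    """Return manufacturing instructions in the operator's preferred
    language, submitting to the translation engine (and caching the
    result) only when no translation exists yet."""
    lang = preferences[operator]
    if lang not in translations:
        translations[lang] = translate(translations["en"], lang)
    return translations[lang]
```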
-
Publication number: 20090024385. Abstract: A method and an apparatus for semantic parsing of electronic text documents. The electronic text documents can comprise a plurality of sentences with several language components. The method comprises analyzing at least one sentence of the electronic text document and dynamically generating a graph from the analyzed sentence of the text document. The graph represents a semantic representation of the analyzed one or more sentences. The analysis continues until an ambiguous sentence is detected, which is then analyzed by evaluating at least a portion of the generated graph. Type: Application. Filed: July 16, 2007. Publication date: January 22, 2009. Applicant: SEMGINE, GMBH. Inventor: Martin Christian Hirsch
-
Publication number: 20090006096. Abstract: Described is a voice persona service by which users convert text into speech waveforms, based on user-provided parameters and voice data from a service data store. The service may be remotely accessed, such as via the Internet. The user may provide text tagged with parameters, with the text sent to a text-to-speech engine along with base or custom voice data, and the resulting waveform morphed based on the tags. The user may also provide speech. Once created, a voice persona corresponding to the speech waveform may be persisted, exchanged, made public, shared, and so forth. In one example, the voice persona service receives user input and parameters, and retrieves a base or custom voice that may be edited by the user via a morphing algorithm. The service outputs a waveform, such as a .wav file for embedding in a software program, and persists the voice persona corresponding to that waveform. Type: Application. Filed: June 27, 2007. Publication date: January 1, 2009. Applicant: Microsoft Corporation. Inventors: Yusheng Li, Min Chu, Xin Zou, Frank Kao-ping Soong
-
Publication number: 20080319755. Abstract: According to an aspect of an embodiment, an apparatus for converting text data into a sound signal comprises: a phoneme determiner for determining phoneme data corresponding to a plurality of phonemes and pause data corresponding to a plurality of pauses to be inserted among a series of phonemes in the text data to be converted into the sound signal; a phoneme length adjuster for modifying the phoneme data and the pause data by determining the lengths of the phonemes in accordance with a speed of the sound signal and selectively adjusting the length of at least one of the phonemes which is placed immediately after one of the pauses, so that this phoneme is relatively extended timewise as compared to the other phonemes; and an output unit for outputting the sound signal on the basis of the phoneme data and pause data adjusted by the phoneme length adjuster. Type: Application. Filed: June 24, 2008. Publication date: December 25, 2008. Applicant: FUJITSU LIMITED. Inventors: Rika Nishiike, Hitoshi Sasaki, Nobuyuki Katae, Kentaro Murase, Takuya Noda
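The adjuster's rule, scale all lengths by the speaking rate but stretch the phoneme that immediately follows a pause, can be sketched as a single pass over a phoneme/pause stream. The stream representation, the factor value, and the name `assign_phoneme_lengths` are illustrative assumptions:

```python
def assign_phoneme_lengths(items, base_length, rate, post_pause_factor=1.3):
    """Assign lengths (scaled by speaking rate) to a stream of
    ("phoneme", x) / ("pause", x) items, extending the phoneme that
    immediately follows a pause."""
    lengths = []
    after_pause = False
    for kind, _ in items:
        if kind == "pause":
            lengths.append(base_length / rate)
            after_pause = True
        else:
            factor = post_pause_factor if after_pause else 1.0
            lengths.append(base_length * factor / rate)
            after_pause = False
    return lengths
```

Only the first phoneme after each pause gets the extension; doubling the rate halves every length, matching the speed-dependent adjustment in the abstract.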