Text Analysis, Generation Of Parameters For Speech Synthesis Out Of Text, E.g., Grapheme To Phoneme Translation, Prosody Generation, Stress, Or Intonation Determination, Etc. (epo) Patents (Class 704/E13.011)
  • Publication number: 20110010178
    Abstract: Provided is a system and method for transforming Hanja into vernacular pronunciation using a statistical method. In the system, a vernacular pronunciation extracting unit extracts a vernacular pronunciation for a Hanja character string, a statistical data determining unit determines statistical data for the Hanja character string using statistical data of features related to Hanja-to-vernacular pronunciation transformation, and a vernacular pronunciation transforming unit transforms the Hanja character string into a vernacular pronunciation using the extracted vernacular pronunciation and the determined statistical data.
    Type: Application
    Filed: July 7, 2010
    Publication date: January 13, 2011
    Applicant: NHN Corporation
    Inventors: Hyunjung LEE, Taeil Kim, Hee-Cheol Seo, Ji Hye Lee
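The statistical transformation described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the `READING_COUNTS` table, its example readings, and the use of a plain unigram (per-character count) model stand in for the patent's statistical data and features.

```python
# Illustrative only: each Hanja character maps to candidate vernacular
# (Korean) readings with corpus counts; the most frequent reading wins.
READING_COUNTS = {
    "金": {"김": 90, "금": 10},   # surname reading vs. "gold" reading
    "樂": {"악": 50, "락": 30, "요": 20},
}

def transform(hanja: str) -> str:
    """Replace each Hanja character with its most frequent reading."""
    out = []
    for ch in hanja:
        candidates = READING_COUNTS.get(ch)
        if candidates:
            out.append(max(candidates, key=candidates.get))
        else:
            out.append(ch)  # pass through characters we have no data for
    return "".join(out)
```

A real system would condition on context features rather than bare unigram counts; this sketch only shows the shape of the count-based lookup.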
  • Publication number: 20100329505
    Abstract: An image processing apparatus includes: a storage module configured to store a plurality of pieces of comment data; an analyzing module configured to analyze an expression of a person contained in image data; a generating module configured to select a target comment data from among the comment data stored in the storage module based on the expression of the person analyzed by the analyzing module, and to generate voice data using the target comment data; and an output module configured to output reproduction data to be used for displaying the image data together with the voice data generated by the generating module.
    Type: Application
    Filed: June 1, 2010
    Publication date: December 30, 2010
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Kousuke Imoji, Yuki Kaneko, Junichi Takahashi
  • Publication number: 20100332224
    Abstract: In accordance with an example embodiment of the present invention, an apparatus comprises a controller configured to process punctuated text data, and to identify punctuation in said punctuated text data; and an output unit configured to generate audio output corresponding to said punctuated text data, and to generate tactile output corresponding to said identified punctuation.
    Type: Application
    Filed: June 30, 2009
    Publication date: December 30, 2010
    Applicant: NOKIA CORPORATION
    Inventors: Jakke Sakari Mäkelä, Jukka Pekka Naula, Niko Santeri Porjo
  • Publication number: 20100324905
    Abstract: Disclosed are techniques and systems to provide a narration of a text in multiple different voices. Further disclosed are techniques and systems for modifying a voice model associated with a selected character based on data received from a user.
    Type: Application
    Filed: January 14, 2010
    Publication date: December 23, 2010
    Inventors: Raymond C. Kurzweil, Paul Albrecht, Peter Chapman
  • Publication number: 20100318360
    Abstract: The present invention is a method and system for extracting messages from a person using the body features the person presents. The invention captures a set of images and extracts a first set of body features, along with a set of contexts and a set of meanings. From the first set of body features, the set of contexts, and the set of meanings, it generates a set of words corresponding to the message that the person is attempting to convey. The body features of the person can also be used in addition to the person's voice to further improve the accuracy of extracting the person's message.
    Type: Application
    Filed: June 10, 2009
    Publication date: December 16, 2010
    Applicant: Toyota Motor Engineering & Manufacturing North America, Inc.
    Inventor: Yasuo Uehara
  • Publication number: 20100318361
    Abstract: Assistive, context-relevant images may be provided. First, text may be received. Then a spell check indication may be received and a spelling check may be performed on the received text in response to the received spell check indication. Next, in response to the performed spelling check, a misspelling indication may be provided configured to indicate that at least one word in the received text is misspelled. A selection of the misspelling indication may then be received. Then, on a display device in response to the received selection of the misspelling indication, a plurality of suggested spellings for the at least one word and an image corresponding to a first one of the plurality of suggested spellings for the at least one word may be displayed.
    Type: Application
    Filed: June 11, 2009
    Publication date: December 16, 2010
    Applicant: Microsoft Corporation
    Inventors: Roderick C. Paulino, Jimmy Y. Sun
  • Publication number: 20100312564
    Abstract: A local text to speech feedback loop is utilized to modify algorithms used in speech synthesis to provide a user with an improved experience. A remote text to speech feedback loop is utilized to aggregate local feedback loop data and incorporate the best solutions into a new, improved text to speech engine for deployment.
    Type: Application
    Filed: June 5, 2009
    Publication date: December 9, 2010
    Applicant: Microsoft Corporation
    Inventor: Michael D. Plumpe
  • Publication number: 20100312562
    Abstract: A rope-jumping algorithm is employed in a Hidden Markov Model-based text to speech system to determine start and end models and to modify the start and end models by setting small co-variances. Disordered acoustic parameters due to violation of parameter constraints are avoided through the modification, resulting in a stable line frequency spectrum for the generated speech.
    Type: Application
    Filed: June 4, 2009
    Publication date: December 9, 2010
    Applicant: Microsoft Corporation
    Inventors: Wenlin Wang, Guoliang Zhang, Jingyang Xu
  • Publication number: 20100299147
    Abstract: Systems and methods for facilitating communication include recognizing speech in a first language represented in a first audio signal; forming a first text representation of the speech; processing the first text representation to form data representing a second audio signal; and causing presentation of the second audio signal to a second user while remaining responsive to an interrupt signal from a first user. In some embodiments, processing the first text representation includes translating it to a second text representation in a second language and processing the second text representation to form the data representing the second audio signal. Some embodiments include accepting an interrupt signal from the first user and interrupting the presentation of the second audio signal.
    Type: Application
    Filed: May 20, 2009
    Publication date: November 25, 2010
    Applicant: BBN Technologies Corp.
    Inventor: David G. Stallard
  • Publication number: 20100250254
    Abstract: An acquiring unit acquires pattern sentences, which are similar to one another and include fixed segments and non-fixed segments, and substitution words that are substituted for the non-fixed segments. A sentence generating unit generates target sentences by replacing the non-fixed segments with the substitution words for each of the pattern sentences. A first synthetic-sound generating unit generates a first synthetic sound, a synthetic sound of the fixed segment, and a second synthetic-sound generating unit generates a second synthetic sound, a synthetic sound of the substitution word, for each of the target sentences. A calculating unit calculates a discontinuity value of a boundary between the first synthetic sound and the second synthetic sound for each of the target sentences and a selecting unit selects the target sentence having the smallest discontinuity value. A connecting unit connects the first synthetic sound and the second synthetic sound of the target sentence selected.
    Type: Application
    Filed: September 15, 2009
    Publication date: September 30, 2010
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventor: Nobuaki Mizutani
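The discontinuity-based selection in the abstract above can be sketched as follows. The Euclidean distance between boundary spectral frames, and all function and variable names, are illustrative assumptions, not the patent's actual measure.

```python
# For each candidate target sentence, measure the discontinuity at the
# boundary between the fixed-segment synthetic sound and the
# substitution-word synthetic sound, then keep the candidate with the
# smallest value.
def discontinuity(end_frame, start_frame):
    """Euclidean distance between the last frame of the first synthetic
    sound and the first frame of the second."""
    return sum((a - b) ** 2 for a, b in zip(end_frame, start_frame)) ** 0.5

def select_best(candidates):
    """candidates: (sentence, end_frame_of_first, start_frame_of_second)."""
    return min(candidates, key=lambda c: discontinuity(c[1], c[2]))[0]
```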
  • Publication number: 20100250253
    Abstract: A speech-directed user interface system includes at least one speaker for delivering an audio signal to a user and at least one microphone for capturing speech utterances of a user. An interface device interfaces with the speaker and microphone and provides a plurality of audio signals to the speaker to be heard by the user. A control circuit is operably coupled with the interface device and is configured for selecting at least one of the plurality of audio signals as a foreground audio signal for delivery to the user through the speaker. The control circuit is operable for recognizing speech utterances of a user and using the recognized speech utterances to control the selection of the foreground audio signal.
    Type: Application
    Filed: March 27, 2009
    Publication date: September 30, 2010
    Inventor: Yangmin Shen
  • Publication number: 20100211392
    Abstract: The speech synthesizing device acquires numerical data at regular time intervals, each piece of the numerical data representing a value having a plurality of digits, detects a change between two values represented by the numerical data that is acquired at two consecutive times, determines which digit of the value represented by the numerical data is used to generate speech data depending on the detected change, generates numerical information that indicates the determined digit of the value represented by the numerical data, and generates speech data from the digit indicated by the numerical information.
    Type: Application
    Filed: September 21, 2009
    Publication date: August 19, 2010
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Ryutaro Tokuda, Takehiko Kagoshima
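The digit-selection idea in the abstract above can be sketched as follows; the fixed field width and the function name are assumptions for illustration.

```python
# Given two consecutive numeric readings, find the most significant digit
# that changed and return the digits to speak from that position onward,
# instead of reading the whole value at every update.
def changed_suffix(prev: int, curr: int, width: int = 5) -> str:
    p, c = str(prev).zfill(width), str(curr).zfill(width)
    for i, (a, b) in enumerate(zip(p, c)):
        if a != b:
            return c[i:]
    return ""  # no change: nothing needs to be spoken
```

For example, going from 12345 to 12395 would speak only "95".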
  • Publication number: 20100211393
    Abstract: A speech synthesis device is provided with: a central segment selection unit for selecting a central segment from among a plurality of speech segments; a prosody generation unit for generating prosody information based on the central segment; a non-central segment selection unit for selecting a non-central segment, which is a segment outside of a central segment section, based on the central segment and the prosody information; and a waveform generation unit for generating a synthesized speech waveform based on the prosody information, the central segment, and the non-central segment. The speech synthesis device first selects a central segment that forms a basis for prosody generation and generates prosody information based on the central segment so that it is possible to sufficiently reduce both concatenation distortion and sound quality degradation accompanying prosody control in the section of the central segment.
    Type: Application
    Filed: April 28, 2008
    Publication date: August 19, 2010
    Inventors: Masanori Kato, Yasuyuki Mitsui, Reishi Kondo
  • Publication number: 20100201793
    Abstract: A reading device includes a computing device and an image input device coupled to the computing device for capturing low-resolution and high-resolution images. The reading device also includes a computer program product residing on a computer-readable medium. The medium is in communication with the computing device and includes instructions to operate in a plurality of modes to optimize performance for specific uses of the reading device and to process low- and high-resolution images during operation of at least one of the plurality of modes.
    Type: Application
    Filed: February 9, 2010
    Publication date: August 12, 2010
    Inventors: Raymond C. Kurzweil, Paul Albrecht, James Gashel, Lucy Gibson
  • Publication number: 20100198595
    Abstract: In a system comprising a voice recognition module, a session manager, and a voice generator module, a method for providing a service to a user comprises receiving an utterance via the voice recognition module; converting the utterance into one or more structures using a lexicon tied to an ontology; identifying concepts in the utterance using the structures; provided the utterance includes sufficient information, selecting a service based on the concepts; generating a text message based on the selected service; and converting the text message to a voice message using the voice generator module.
    Type: Application
    Filed: February 3, 2009
    Publication date: August 5, 2010
    Applicant: SoftHUS Sp.z.o.o
    Inventor: Eugeniusz Wlasiuk
  • Publication number: 20100198594
    Abstract: Mobile phone signals may be corrupted by noise, fading, interference with other signals, and low-strength field coverage of a transmitting and/or receiving mobile phone as they pass through the communication network (e.g., free space). Because of this corruption, a voice conversation between a caller and a receiver may be interrupted, and there may be gaps in the received oral communication from one or more participants, forcing the caller, the receiver, or both to repeat the conversation. Transmitting a transcript of the oral communication along with a voice signal comprising the oral communication can help ensure that the voice conversation is not interrupted due to a corrupted voice signal. The transcript can be used to retrieve parts of the oral communication lost in transmission (e.g., by fading) to make the conversation more fluid.
    Type: Application
    Filed: February 3, 2009
    Publication date: August 5, 2010
    Applicant: International Business Machines Corporation
    Inventors: Rosario Gangemi, Giuseppe Longobardi
  • Publication number: 20100161327
    Abstract: A computer-implemented method for automatically analyzing, predicting, and/or modifying acoustic units of prosodic human speech utterances for use in speech synthesis or speech recognition. Possible steps include: initiating analysis of acoustic wave data representing the human speech utterances, via the phase state of the acoustic wave data; using one or more phase state defined acoustic wave metrics as common elements for analyzing, and optionally modifying, pitch, amplitude, duration, and other measurable acoustic parameters of the acoustic wave data, at predetermined time intervals; analyzing acoustic wave data representing a selected acoustic unit to determine the phase state of the acoustic unit; and analyzing the acoustic wave data representing the selected acoustic unit to determine at least one acoustic parameter of the acoustic unit with reference to the determined phase state of the selected acoustic unit. Also included are systems for implementing the described and related methods.
    Type: Application
    Filed: December 16, 2009
    Publication date: June 24, 2010
    Inventors: Nishant CHANDRA, Reiner Wilhelms-Tricarico, Rattima Nitisaroj, Brian Mottershead, Gary A. Marple, John B. Reichenbach
  • Publication number: 20100131267
    Abstract: A method of recording speech for use in a speech samples library. In an exemplary embodiment, the method comprises recording a speaker pronouncing a phoneme with musical parameters characterizing the pronunciation of another phoneme by the same or another speaker. For example, in one embodiment the method comprises providing a recording of a first speaker pronouncing a first phoneme in a phonemic context, the pronunciation being characterized by certain musical parameters. A second speaker, who may be the same as the first, is then recorded pronouncing a second phoneme (different from the first) with the musical parameters that characterize the first speaker's pronunciation of the first phoneme. The recordings made by the second speaker are used for compiling a speech samples library.
    Type: Application
    Filed: March 19, 2008
    Publication date: May 27, 2010
    Applicant: Vivo Text Ltd.
    Inventors: Gershon Silbert, Andres Hakim
  • Publication number: 20100125459
    Abstract: Exemplary embodiments provide for determining a sequence of words in a TTS system. An input text is analyzed using two models, a word n-gram model and an accent-class n-gram model. A list of all possible words for each word in the input is generated for each model. Each word in each list is given a score based on the probability, under the particular model, that it is the correct word in the sequence. The two lists are combined and the two scores are combined for each word. A set of word sequences is then generated, each comprising a unique combination of an attribute and associated word for each word in the input. The combined scores of the words in each sequence are summed, and the sequence with the highest score is selected and presented to a user.
    Type: Application
    Filed: July 1, 2009
    Publication date: May 20, 2010
    Applicant: Nuance Communications, Inc.
    Inventors: Nobuyasu Itoh, Tohru Nagano, Masafumi Nishimura, Ryuki Tachibana
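The two-model score combination described above can be sketched as log-linear interpolation over the merged candidate lists. The weight `w`, the floor probability for candidates one model missed, and the example probabilities are illustrative assumptions.

```python
import math

# Merge candidate lists from a word n-gram model and an accent-class
# n-gram model, sum their weighted log-scores per candidate, and pick
# the best-scoring word for this position in the sequence.
def combine(word_ngram: dict, accent_ngram: dict, w: float = 0.5) -> str:
    candidates = set(word_ngram) | set(accent_ngram)
    def score(c):
        pw = word_ngram.get(c, 1e-9)   # floor for unseen candidates
        pa = accent_ngram.get(c, 1e-9)
        return w * math.log(pw) + (1 - w) * math.log(pa)
    return max(candidates, key=score)
```

The accent-class model can overrule the word model here: a candidate slightly less likely as a word may still win when its accent class fits the context much better.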
  • Publication number: 20100121629
    Abstract: A translation platform allows a client using a first language to communicate via translated voice and/or text to at least a second client using a second language. A control server uses various speech recognition engines, text translation engines and text to speech engines to accomplish real-time or near-real time translations.
    Type: Application
    Filed: May 28, 2009
    Publication date: May 13, 2010
    Inventor: Sanford H. Cohen
  • Publication number: 20100100317
    Abstract: A method and apparatus for determining the manner in which a processor-enabled device should produce sounds from data is described. In at least one embodiment, the device includes a first device for synthesizing sounds digitally and reproducing pre-recorded sounds, a second device for audible delivery thereof, memory in which is stored a database of a plurality of data, at least some of which is in the form of text-based indicators, and one or more pre-recorded sounds, a data transfer device by which the data is transferred between the processor of the device and the memory, and operating system software which controls the processing and flow of data between a processor and the memory, and whether the sounds are audibly reproduced. In accordance with at least one embodiment of the invention, the device is further capable of repeatedly determining one or more physical conditions, e.g.
    Type: Application
    Filed: March 21, 2007
    Publication date: April 22, 2010
    Inventors: Rory Jones, Sven Jurgens
  • Publication number: 20100094632
    Abstract: Disclosed herein are various aspects of a toolkit used for generating a TTS voice for use in a spoken dialog system. The embodiments in each case may be in the form of a system, a computer-readable medium, or a method for generating the TTS voice. An embodiment of the invention relates to a method of tracking progress in developing a text-to-speech (TTS) voice. The method comprises ensuring that a corpus of recorded speech contains no reading errors and matches an associated written text, creating a tuple for each utterance in the corpus, and tracking progress for each utterance utilizing the tuple. Various parameters may be tracked using the tuple, which provides a means for enabling multiple workers to efficiently process a database of utterances in preparation of a TTS voice.
    Type: Application
    Filed: December 15, 2009
    Publication date: April 15, 2010
    Applicant: AT&T Corp.
    Inventors: Steven Lawrence Davis, Shane Fetters, David Eugene Schultz, Beverly Gustafson, Louise Loney
  • Publication number: 20100076766
    Abstract: The present invention discloses a method for producing graphical indicators and interactive systems utilizing the graphical indicators. Visually negligible graphical indicators are provided on the surface of an object, where they co-exist with the main information, i.e., text or pictures. The graphical indicators do not interfere with the main information as far as human visual perception is concerned. The graphical indicators carry further information beyond the main information on the surface of the object, so that one is able to obtain additional information through an auxiliary electronic device or trigger an interactive operation.
    Type: Application
    Filed: November 19, 2009
    Publication date: March 25, 2010
    Applicant: Sonix Technology Co., Ltd.
    Inventor: Yao-Hung Tsai
  • Publication number: 20100063821
    Abstract: Technologies are described herein for providing a hands-free and non-visually occluding interaction with object information. In one method, a visual capture of a portion of an object is received through a hands-free and non-visually occluding visual capture device. An audio capture is also received from a user through a hands-free and non-visually occluding audio capture device. The audio capture may include a request for information about a portion of the object in the visual capture. The information is retrieved and is transmitted to the user for playback through a hands-free and non-visually occluding audio output device.
    Type: Application
    Filed: September 9, 2008
    Publication date: March 11, 2010
    Inventors: Joseph C. Marsh, Eric M. Smith
  • Publication number: 20100030557
    Abstract: The disclosure relates to systems, methods and apparatus to convert speech to text and vice versa. One apparatus comprises a vocoder, a speech to text conversion engine, a text to speech conversion engine, and a user interface. The vocoder is operable to convert speech signals into packets and convert packets into speech signals. The speech to text conversion engine is operable to convert speech to text. The text to speech conversion engine is operable to convert text to speech. The user interface is operable to receive a user selection of a mode from among a plurality of modes, wherein a first mode enables the speech to text conversion engine, a second mode enables the text to speech conversion engine, and a third mode enables the speech to text conversion engine and the text to speech conversion engine.
    Type: Application
    Filed: July 31, 2006
    Publication date: February 4, 2010
    Inventors: Stephen Molloy, Khaled Helmi El-Maleh
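The three-mode switching in the abstract above can be sketched directly; the mode constants and function name are assumptions, and the engine internals are stubbed out to show only the selection logic.

```python
# Mode 1 enables the speech-to-text engine, mode 2 enables the
# text-to-speech engine, and mode 3 enables both, mirroring the
# user-interface selection described in the abstract.
STT_MODE, TTS_MODE, BOTH_MODE = 1, 2, 3

def enabled_engines(mode: int) -> set:
    engines = set()
    if mode in (STT_MODE, BOTH_MODE):
        engines.add("speech_to_text")
    if mode in (TTS_MODE, BOTH_MODE):
        engines.add("text_to_speech")
    return engines
```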
  • Publication number: 20090326951
    Abstract: Ratios between the powers at the peaks of the respective formants of the spectrum of a pitch-cycle waveform and the powers at the boundaries between the formants are obtained. When the ratios are large, the bandwidths of the window functions are widened, and formant waveforms are generated by multiplying sinusoidal waveforms, generated from the formant parameter sets on the basis of pitch-cycle waveform generating data, by the window functions of the widened bandwidth; a pitch-cycle waveform is then generated as the sum of these formant waveforms.
    Type: Application
    Filed: April 14, 2009
    Publication date: December 31, 2009
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Ryo Morinaka, Takehiko Kagoshima
  • Publication number: 20090306987
    Abstract: There is provided a singing synthesis parameter data estimation system that automatically estimates singing synthesis parameter data for automatically synthesizing a human-like singing voice from an audio signal of input singing voice. A pitch parameter estimating section 9 estimates a pitch parameter by which the pitch feature of the audio signal of synthesized singing voice is brought closer to the pitch feature of the audio signal of input singing voice, based on at least both the pitch feature and lyric data with specified syllable boundaries of the audio signal of input singing voice.
    Type: Application
    Filed: May 21, 2009
    Publication date: December 10, 2009
    Applicant: NATIONAL INSTITUTE OF ADVANCED INDUSTRIAL SCIENCE AND TECHNOLOGY
    Inventors: Tomoyasu Nakano, Masataka Goto
  • Publication number: 20090295735
    Abstract: An electronic device and a method for automatically converting text to be displayed on a display screen of an electronic device into a speech signal when ambient light conditions affect viewing of the text. The method is performed by the electronic device and the method includes receiving a command to display text on the display screen and determining if an ambient light signal provided by an ambient light sensor is above a pre-determined viewing threshold. This ambient light signal corresponds to ambient light conditions adjacent the display screen. The method also includes automatically converting the text to a speech signal when the ambient light signal is above the pre-determined viewing threshold. Suitably, there is performed a step of emitting the speech signal in an audible form from a speaker.
    Type: Application
    Filed: May 27, 2008
    Publication date: December 3, 2009
    Applicant: Motorola, Inc.
    Inventors: Wang Wang, Wei Guo, Kan Ni, Danilo Tan
  • Publication number: 20090278766
    Abstract: Adequate display operation control in accordance with the external world situation is realized. For example, when a user wears a spectacle-shaped or head-worn wearing unit, the user can view any type of image on a display section immediately in front of the eyes and is provided with captured images, reproduced images, and received images. Control over various display operations, such as turning the display operation on or off, the display operation mode, and source changes, is carried out based on external world information.
    Type: Application
    Filed: August 17, 2007
    Publication date: November 12, 2009
    Applicant: SONY CORPORATION
    Inventors: Yoichiro Sako, Masaaki Tsuruta, Taiji Ito, Masamichi Asukai
  • Publication number: 20090271202
    Abstract: A speech synthesis apparatus includes a content selection unit that selects a text content item to be converted into speech; a related information selection unit that selects related information which can be at least converted into text and which is related to the text content item selected by the content selection unit; a data addition unit that converts the related information selected by the related information selection unit into text and adds text data of the text to text data of the text content item selected by the content selection unit; a text-to-speech conversion unit that converts the text data supplied from the data addition unit into a speech signal; and a speech output unit that outputs the speech signal supplied from the text-to-speech conversion unit.
    Type: Application
    Filed: March 25, 2009
    Publication date: October 29, 2009
    Applicant: SONY ERICSSON MOBILE COMMUNICATIONS JAPAN, INC.
    Inventor: Susumu TAKATSUKA
  • Publication number: 20090271176
    Abstract: Methods, systems, and computer program products are provided for multilingual administration of enterprise data. Embodiments include retrieving enterprise data; extracting, from the enterprise data, text to be rendered from a digital media file, the extracted text being in a source language; identifying that the source language is not the predetermined default target language for rendering the enterprise data; translating the extracted text in the source language into translated text in the default target language; converting the translated text to synthesized speech in the default target language; and storing the synthesized speech in the default target language in a digital media file.
    Type: Application
    Filed: April 24, 2008
    Publication date: October 29, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: William K. Bodin, David Jaramillo, Ann Marie Maynard
  • Publication number: 20090259473
    Abstract: Methods and apparatus to present a video program to a visually impaired person are disclosed. An example method comprises receiving a video stream and an associated audio stream of a video program, detecting a portion of the video program that is not readily consumable by a visually impaired person, obtaining text associated with the portion of the video program, converting the text to a second audio stream, and combining the second audio stream with the associated audio stream.
    Type: Application
    Filed: April 14, 2008
    Publication date: October 15, 2009
    Inventors: Hisao M. Chang, Horst Schroeter
  • Publication number: 20090259471
    Abstract: A universal pattern processing system receives input data and produces the output patterns that are best associated with said data. The system uses input means for receiving and processing input data, universal pattern decoder means for transforming models using the input data and associating output patterns with the original models that changed least during transformation, and output means for outputting the best associated patterns chosen by the pattern decoder means.
    Type: Application
    Filed: April 11, 2008
    Publication date: October 15, 2009
    Applicant: International Business Machines Corporation
    Inventors: Dimitri KANEVSKY, David Nahamoo, Tara N. Sainath
  • Publication number: 20090240501
    Abstract: Described is a technology by which artificial words are generated based on seed words, and then used with a letter-to-sound conversion model. To generate an artificial word, a stressed syllable of a seed word is replaced with a different syllable, such as a candidate (artificial) syllable, when the phonemic structure and/or graphonemic structure of the stressed syllable and the candidate syllable match one another. In one aspect, the artificial words are provided for use with a letter-to-sound conversion model, which may be used to generate artificial phonemes from a source of words, such as in conjunction with other models. If the phonemes provided by the various models for a selected source word are in agreement relative to one another, the selected source word and an associated artificial phoneme may be added to a training set which may then be used to retrain the letter-to-sound conversion model.
    Type: Application
    Filed: March 19, 2008
    Publication date: September 24, 2009
    Applicant: MICROSOFT CORPORATION
    Inventors: Yi Ning Chen, Jia Li You, Frank Kao-ping Soong
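The syllable-replacement step in the abstract above can be sketched with a crude consonant/vowel skeleton standing in for the patent's phonemic and graphonemic structure matching; the skeleton test and all names here are illustrative assumptions.

```python
VOWELS = set("aeiou")

def skeleton(syllable: str) -> str:
    """Consonant/vowel skeleton of a written syllable, e.g. 'pu' -> 'CV'."""
    return "".join("V" if ch in VOWELS else "C" for ch in syllable)

def make_artificial(syllables, stress_idx, candidate):
    """Replace the stressed syllable with the candidate only when their
    skeletons match; return None when the structures differ."""
    if skeleton(syllables[stress_idx]) != skeleton(candidate):
        return None
    out = list(syllables)
    out[stress_idx] = candidate
    return "".join(out)
```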
  • Publication number: 20090187407
    Abstract: The present invention relates to a system and methods for preparing reports, such as medical reports. The system and methods advantageously can verbalize information, using speech synthesis (text-to-speech), to support a dialogue between a user and the reporting system during the course of the preparation of the report in order that the user can avoid inefficient visual distractions.
    Type: Application
    Filed: January 18, 2008
    Publication date: July 23, 2009
    Inventors: Jeffrey Soble, James Roberge
  • Publication number: 20090187408
    Abstract: A temporary child set is generated. An elastic ratio of an elastic section of a model pattern is calculated. A temporary typical pattern of the set is generated by combining the pattern belonging to the set with the model pattern whose elastic section has been expanded or contracted. A distortion between the temporary typical pattern of the set and the pattern belonging to the set is calculated, and the set is determined to be a child set when the distortion is below a threshold. A typical pattern, namely the temporary typical pattern of the child set, is stored together with a classification rule, namely the classification item of the context of the pattern belonging to the child set.
    Type: Application
    Filed: January 23, 2009
    Publication date: July 23, 2009
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventor: Nobuaki MIZUTANI
  • Publication number: 20090177475
    Abstract: Even when the pitch cycle has a large fluctuation and the pitch cycle string changes abruptly, it is possible to suppress the effect of the pitch cycle fluctuation and generate high-quality synthesized speech. A speech synthesis device generates synthesized speech corresponding to an input text sentence according to an original speech waveform stored in original speech waveform information storage unit (25). The speech synthesis device includes pitch cycle correction unit (40), which extracts a fluctuation component of the pitch cycle of the original speech waveform obtained from original speech waveform information storage unit (25) in order to generate the synthesized speech, and which corrects, based on the extracted fluctuation component, the pitch cycle of the synthesized speech obtained by analyzing the input text sentence. Pitch cycle correction unit (40) connects the pitch cycle waveform of the original speech waveform at the pitch cycle of the corrected synthesized speech.
    Type: Application
    Filed: July 4, 2007
    Publication date: July 9, 2009
    Applicant: NEC CORPORATION
    Inventor: Masanori Kato
  • Publication number: 20090157408
    Abstract: The present invention relates to a speech synthesizing method and apparatus based on a hidden Markov model (HMM). Among code words obtained by quantizing speech parameter instances for each state of an HMM model, the code word closest to a speech parameter generated from an input text using a known method is searched for. When the distance between the searched code word and the speech parameter generated by the known method is smaller than or equal to a threshold value, the searched code word is output as the final speech parameter. When the distance exceeds the threshold value, the speech parameter generated by the known method is output as the final speech parameter. The final speech parameter is processed to generate final synthesized speech for the input text.
    Type: Application
    Filed: June 27, 2008
    Publication date: June 18, 2009
    Applicant: Electronics and Telecommunications Research Institute
    Inventor: Sanghun KIM
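The selection rule in this abstract is a nearest-code-word search with a threshold fallback, which can be sketched directly. Euclidean distance and the code-book contents here are illustrative assumptions:

```python
# Sketch: among the quantized code words for an HMM state, pick the one
# closest to the parameter produced by the baseline method; use it only
# if its distance is within the threshold, otherwise fall back to the
# baseline parameter itself.
import math

def select_parameter(generated, code_words, threshold):
    closest = min(code_words, key=lambda cw: math.dist(cw, generated))
    if math.dist(closest, generated) <= threshold:
        return closest   # quantized code word is close enough
    return generated     # fall back to the baseline parameter
```

Real speech parameters would be high-dimensional spectral vectors per state, but the thresholded nearest-neighbour decision is the core of the method.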
  • Publication number: 20090150157
    Abstract: A word dictionary including sets of a character string which constitutes a word, a phoneme sequence which constitutes the pronunciation of the word, and a part of speech of the word is referenced. An entered text is analyzed and divided into one or more subtexts, and a phoneme sequence and a part-of-speech sequence are generated for each subtext. The part-of-speech sequence of each subtext is collated with a list of part-of-speech sequences to determine whether the phonetic sounds of the subtext are to be converted, and the phonetic sounds of the phoneme sequence in each subtext determined to be converted are converted.
    Type: Application
    Filed: September 15, 2008
    Publication date: June 11, 2009
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Takehiko KAGOSHIMA, Noriko YAMANAKA, Makoto YAJIMA
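The collation step described above can be sketched as a simple membership test over part-of-speech sequences. The POS tags and the conversion itself (here, just marking phonemes) are illustrative assumptions:

```python
# Sketch: each subtext's part-of-speech sequence is checked against a
# list of part-of-speech sequences; only matching subtexts have the
# phonetic sounds of their phoneme sequences converted.
CONVERT_POS_SEQUENCES = {("NOUN", "PARTICLE"), ("VERB",)}

def convert_phonemes(subtexts):
    """subtexts: list of (phoneme_sequence, pos_sequence) pairs."""
    result = []
    for phonemes, pos_seq in subtexts:
        if tuple(pos_seq) in CONVERT_POS_SEQUENCES:
            phonemes = [p + "'" for p in phonemes]  # hypothetical conversion
        result.append(phonemes)
    return result
```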
  • Publication number: 20090125309
    Abstract: Methods, Systems, and Products are disclosed for synthesizing speech. Text is received for translation to speech. The text is correlated to phrases, and each phrase is converted into a corresponding string of phonemes. A phoneme identifier is retrieved that uniquely represents each phoneme in the string of phonemes. Each phoneme identifier is concatenated to produce a sequence of phoneme identifiers with each phoneme identifier separated by a comma. Each sequence of phoneme identifiers is concatenated and separated by a semi-colon.
    Type: Application
    Filed: January 22, 2009
    Publication date: May 14, 2009
    Inventor: Steve Tischer
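The comma/semicolon encoding in this abstract can be sketched directly: identifiers within a phrase are joined by commas, and the per-phrase sequences are joined by semicolons. The phoneme-to-identifier table is an illustrative assumption:

```python
# Sketch: concatenate phoneme identifiers with commas within each
# phrase, then concatenate the phrase sequences with semicolons.
PHONEME_IDS = {"HH": "101", "EH": "102", "L": "103", "OW": "104"}

def encode_phrases(phrases):
    """phrases: list of phoneme lists, one list per phrase."""
    sequences = []
    for phonemes in phrases:
        ids = [PHONEME_IDS[p] for p in phonemes]
        sequences.append(",".join(ids))     # identifiers separated by commas
    return ";".join(sequences)              # sequences separated by semicolons

encoded = encode_phrases([["HH", "EH"], ["L", "OW"]])  # "101,102;103,104"
```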
  • Publication number: 20090119091
    Abstract: A system and method for automated languages translation comprising a database containing pre-translated patterns that were translated by human translators, generating a transparent and seamless translation service. Whenever a user issues a translation request, the system offers suitable translated sentences from the aforementioned database. The system does so by separating the submitted text into elements and using a pattern recognition mechanism to identify a matching translation to each element. If there is no matching translated pattern in the database or if the user does not approve the translated sentence, the system transparently uses a suitable registered human translator to translate. The new translation is stored in the database, thus perfecting the database, and the translation request is delivered.
    Type: Application
    Filed: October 14, 2008
    Publication date: May 7, 2009
    Inventor: Eitan Chaim Sarig
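The lookup-then-fallback flow in this abstract can be sketched as below. Splitting the text on sentence boundaries and the sample database contents are illustrative assumptions; the real system uses a pattern recognition mechanism rather than exact string matching:

```python
# Sketch: split the submitted text into elements, serve matches from the
# pre-translated pattern database, and queue unmatched elements for a
# human translator (whose result would later be stored in the database).
def translate(text, database, human_queue):
    translated, pending = [], []
    for element in text.split(". "):
        element = element.rstrip(".")
        if element in database:
            translated.append(database[element])   # pattern match found
        else:
            pending.append(element)                # needs a human translator
            human_queue.append(element)
    return translated, pending
```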
  • Publication number: 20090083036
    Abstract: Described is a technology by which synthesized speech generated from text is evaluated against a prosody model (trained offline) to determine whether the speech will sound unnatural. If so, the speech is regenerated with modified data. The evaluation and regeneration may be iterative until deemed natural sounding. For example, text is built into a lattice that is then (e.g., Viterbi) searched to find a best path. The sections (e.g., units) of data on the path are evaluated via a prosody model. If the evaluation deems a section to correspond to unnatural prosody, that section is replaced, e.g., by modifying/pruning the lattice and re-performing the search. Replacement may be iterative until all sections pass the evaluation. Unnatural prosody detection may be biased such that during evaluation, unnatural prosody is falsely detected at a higher rate relative to a rate at which unnatural prosody is missed.
    Type: Application
    Filed: September 20, 2007
    Publication date: March 26, 2009
    Applicant: Microsoft Corporation
    Inventors: Yong Zhao, Frank Kao-ping Soong, Min Chu, Lijuan Wang
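The evaluate-and-regenerate loop described above can be sketched as follows. The lattice search, prosody score, and threshold are stand-ins for the trained components in the abstract:

```python
# Sketch: search the lattice for a best path, score each section with a
# prosody model, prune sections that fail, and re-search until every
# section passes (or an iteration cap is reached).
def synthesize(lattice_search, prosody_score, threshold, max_iterations=10):
    pruned = set()
    for _ in range(max_iterations):
        path = lattice_search(pruned)          # best path avoiding pruned units
        failing = [u for u in path if prosody_score(u) < threshold]
        if not failing:
            return path                        # all sections sound natural
        pruned.update(failing)                 # prune and regenerate
    return path
```

Biasing the detector toward false positives, as the abstract suggests, would correspond to raising the threshold here so marginal sections are regenerated rather than kept.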
  • Publication number: 20090063128
    Abstract: Provided are a device and method for interactive machine translation. The device includes a machine translation engine having a morphological/syntactic analyzer for analyzing morphemes and sentences of an original text and generating original text analysis information, and a translation generator for generating a translation and translation generation information on the basis of the original text analysis information, and a user interface module for displaying sentence structures of the original text and the translation, and a relationship between the original text and the translation to a user on the basis of the original text analysis information and the translation generation information, and for receiving corrections to the original text or the translation from the user. The device and method provide a user interface whereby the user can effectively recognize and correct a mistranslated part and a cause of the mistranslation, and rapidly provides a re-translated result according to the correction.
    Type: Application
    Filed: September 5, 2008
    Publication date: March 5, 2009
    Applicant: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE
    Inventors: Young Ae SEO, Chang Hyun Kim, Seong Il Yang, Young Sook Hwang, Chang Hao Yin, Eun Jin Park, Sung Kwon Choi, Ki Young Lee, Oh Woog Kwon, Yoon Hyung Roh, Young Kil Kim
  • Publication number: 20090063154
    Abstract: Information about a device may be emotively conveyed to a user of the device. Input indicative of an operating state of the device may be received. The input may be transformed into data representing a simulated emotional state. Data representing an avatar that expresses the simulated emotional state may be generated and displayed. A query from the user regarding the simulated emotional state expressed by the avatar may be received. The query may be responded to.
    Type: Application
    Filed: November 5, 2008
    Publication date: March 5, 2009
    Applicant: Ford Global Technologies, LLC
    Inventors: Oleg Yurievitch Gusikhin, Perry Robinson MacNeille, Erica Klampfl, Kacie Alane Theisen, Dimitar Petrov Filev, Yifan Chen, Basavaraj Tonshal
  • Publication number: 20090055158
    Abstract: A speech translation apparatus includes a speech recognition unit configured to recognize input speech of a first language to generate a first text of the first language, an extraction unit configured to compare original prosody information of the input speech with first synthesized prosody information based on the first text to extract paralinguistic information about each of first words of the first text, a machine translation unit configured to translate the first text to a second text of a second language, a mapping unit configured to allocate the paralinguistic information about each of the first words to each of second words of the second text in accordance with synonymity, a generating unit configured to generate second synthesized prosody information based on the paralinguistic information allocated to each of the second words, and a speech synthesis unit configured to synthesize output speech based on the second synthesized prosody information.
    Type: Application
    Filed: August 21, 2008
    Publication date: February 26, 2009
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Dawei Xu, Takehiko Kagoshima
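The mapping step in this abstract, allocating per-word paralinguistic information across languages "in accordance with synonymity", can be sketched with a toy synonym table. The table and the tag values are illustrative assumptions:

```python
# Sketch: carry the paralinguistic tag extracted for each source-language
# word over to the target-language word that is its synonym.
SYNONYMS = {"hello": "bonjour", "world": "monde"}

def map_paralinguistic(source_info, target_words):
    """source_info: dict mapping source word -> paralinguistic tag."""
    mapped = {}
    for src, tag in source_info.items():
        tgt = SYNONYMS.get(src)
        if tgt in target_words:
            mapped[tgt] = tag   # allocate the tag by synonymity
    return mapped
```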
  • Publication number: 20090043585
    Abstract: Disclosed are systems, methods, and computer readable media for performing speech synthesis. The method embodiment comprises applying a first part of a speech synthesizer to a text corpus to obtain a plurality of phoneme sequences, the first part of the speech synthesizer only identifying possible phoneme sequences, for each of the obtained plurality of phoneme sequences, identifying joins that would be calculated to synthesize each of the plurality of respective phoneme sequences, and adding the identified joins to a cache for use in speech synthesis.
    Type: Application
    Filed: August 9, 2007
    Publication date: February 12, 2009
    Applicant: AT&T Corp.
    Inventor: Alistair D. CONKIE
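The caching idea in this abstract can be sketched as: enumerate candidate phoneme sequences first, record which adjacent-unit joins each would require, and compute each join cost once. The join-cost function here is an illustrative stand-in:

```python
# Sketch: walk each candidate phoneme sequence, identify the joins its
# synthesis would require, and memoize each join so it is calculated
# only once and reused across sequences.
def build_join_cache(phoneme_sequences, join_cost):
    cache = {}
    for seq in phoneme_sequences:
        for left, right in zip(seq, seq[1:]):        # adjacent units to join
            key = (left, right)
            if key not in cache:
                cache[key] = join_cost(left, right)  # compute once, reuse later
    return cache
```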
  • Publication number: 20090030670
    Abstract: Embodiments of the present invention provide a method, system and computer program product for real-time multi-lingual adaptation of manufacturing instructions in a manufacturing management system. In one embodiment of the invention, a manufacturing language adaptation method can be provided. The method can include identifying an operator receiving manufacturing instruction, determining a primary language preference for the operator and determining whether or not the manufacturing instructions have been translated into the primary language preference. If it is determined that the manufacturing instructions have been translated into the primary language preference, the manufacturing instructions can be presented to the operator in the primary language preference. Otherwise the manufacturing instructions can be submitted to a translation engine for translation into the primary language preference.
    Type: Application
    Filed: July 25, 2007
    Publication date: January 29, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Ivory W. Knipfer, John W. Marreel, Kay M. Momsen, Ryan T. Paske
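The decision flow in this abstract is a straightforward cache-or-translate pattern. The data structures and the engine stub below are illustrative assumptions:

```python
# Sketch: look up the operator's primary language preference, serve an
# existing translation if one has been prepared, and otherwise submit
# the instructions to a translation engine (keeping the result).
def present_instructions(operator, preferences, translations, engine):
    language = preferences[operator]       # primary language preference
    if language in translations:
        return translations[language]      # already translated
    translated = engine(language)          # fall back to the engine
    translations[language] = translated    # keep for next time
    return translated
```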
  • Publication number: 20090024385
    Abstract: A method and an apparatus for semantic parsing of electronic text documents. The electronic text documents can comprise a plurality of sentences with several language components. The method comprises analyzing at least one sentence of the electronic text document and dynamically generating a graph from the analyzed sentence; the graph represents a semantic representation of the analyzed sentences. The analysis continues until an ambiguous sentence is found, which is then analyzed by evaluating at least a portion of the generated graph.
    Type: Application
    Filed: July 16, 2007
    Publication date: January 22, 2009
    Applicant: SEMGINE, GMBH
    Inventor: Martin Christian Hirsch
  • Publication number: 20090006096
    Abstract: Described is a voice persona service by which users convert text into speech waveforms, based on user-provided parameters and voice data from a service data store. The service may be remotely accessed, such as via the Internet. The user may provide text tagged with parameters, with the text sent to a text-to-speech engine along with base or custom voice data, and the resulting waveform morphed based on the tags. The user may also provide speech. Once created, a voice persona corresponding to the speech waveform may be persisted, exchanged, made public, shared and so forth. In one example, the voice persona service receives user input and parameters, and retrieves a base or custom voice that may be edited by the user via a morphing algorithm. The service outputs a waveform, such as a .wav file for embedding in a software program, and persists the voice persona corresponding to that waveform.
    Type: Application
    Filed: June 27, 2007
    Publication date: January 1, 2009
    Applicant: Microsoft Corporation
    Inventors: Yusheng Li, Min Chu, Xin Zou, Frank Kao-ping Soong
  • Publication number: 20080319755
    Abstract: According to an aspect of an embodiment, an apparatus for converting text data into a sound signal comprises: a phoneme determiner for determining phoneme data corresponding to a plurality of phonemes and pause data corresponding to a plurality of pauses to be inserted among a series of phonemes in the text data to be converted into the sound signal; a phoneme length adjuster for modifying the phoneme data and the pause data by determining the lengths of the phonemes in accordance with a speed of the sound signal and selectively adjusting the length of at least one of the phonemes which is placed immediately after one of the pauses, so that the at least one of the phonemes is relatively extended timewise as compared to other phonemes; and an output unit for outputting the sound signal on the basis of the phoneme data and pause data adjusted by the phoneme length adjuster.
    Type: Application
    Filed: June 24, 2008
    Publication date: December 25, 2008
    Applicant: FUJITSU LIMITED
    Inventors: Rika Nishiike, Hitoshi Sasaki, Nobuyuki Katae, Kentaro Murase, Takuya Noda
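The length-adjustment rule in this abstract can be sketched as below. The base durations, the simple rate model (duration divided by speed), and the post-pause stretch factor are all illustrative assumptions:

```python
# Sketch: scale phoneme durations by speaking rate, then stretch any
# phoneme that immediately follows a pause relative to the others.
def adjust_lengths(units, speed, post_pause_stretch=1.2):
    """units: list of ('pause', ms) or ('phoneme', ms) tuples."""
    adjusted, after_pause = [], False
    for kind, length in units:
        if kind == "pause":
            adjusted.append((kind, length))
            after_pause = True
            continue
        length = length / speed              # faster speech, shorter phonemes
        if after_pause:
            length *= post_pause_stretch     # extend the phoneme after a pause
            after_pause = False
        adjusted.append((kind, length))
    return adjusted
```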