Text Analysis, Generation of Parameters for Speech Synthesis out of Text, e.g., Grapheme-to-Phoneme Translation, Prosody Generation, Stress, or Intonation Determination, etc. (EPO) Patents (Class 704/E13.011)
  • Patent number: 8886538
    Abstract: Systems and methods for speech synthesis and, in particular, text-to-speech systems and methods for converting a text input to a synthetic waveform by processing prosodic and phonetic content of a spoken example of the text input to accurately mimic the input speech style and pronunciation. Systems and methods provide an interface to a TTS system to allow a user to input a text string and a spoken utterance of the text string, extract prosodic parameters from the spoken input, and process the prosodic parameters to derive corresponding markup for the text input to enable a more natural sounding synthesized speech.
    Type: Grant
    Filed: September 26, 2003
    Date of Patent: November 11, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Andy Aaron, Raimo Bakis, Ellen M. Eide, Wael M. Hamza
  • Publication number: 20140074478
    Abstract: A speech replication system including a speech generation unit having a program running in a memory of the speech generation unit, the program executing the steps of receiving an audio stream, identifying words within the audio stream, analyzing each word to determine the audio characteristics of the speaker's voice, storing the audio characteristics of the speaker's voice in the memory, receiving text information, converting the text information into an output audio stream using the audio characteristics of the speaker stored in the memory, and playing the output audio stream.
    Type: Application
    Filed: September 7, 2012
    Publication date: March 13, 2014
    Applicant: ISPEECH CORP.
    Inventors: Heath Ahrens, Florencio Isaac Martin, Tyler A.R. Auten
  • Publication number: 20140067395
    Abstract: A system and method are described for engaging an audience in a conversational advertisement. A conversational advertising system converses with an audience using spoken words. The conversational advertising system uses a speech recognition application to convert an audience's spoken input into text and a text-to-speech application to transform the text of a response into speech that is played to the audience. The conversational advertising system follows an advertisement script to guide the audience through a conversation.
    Type: Application
    Filed: August 28, 2012
    Publication date: March 6, 2014
    Applicant: Nuance Communications, Inc.
    Inventors: Sundar Balasubramanian, Michael McSherry, Aaron Sheedy
  • Publication number: 20140067397
    Abstract: Techniques disclosed herein include systems and methods that improve audible emotional characteristics used when synthesizing speech from a text source. Systems and methods herein use emoticons identified from a source text to provide contextual text-to-speech expressivity. In general, techniques herein analyze text and identify emoticons included within the text. The source text is then tagged with corresponding mood indicators. For example, if the system identifies an emoticon at the end of a sentence, then the system can infer that this sentence has a specific tone or mood associated with it. Depending on whether the emoticon is a smiley face, angry face, sad face, laughing face, etc., the system can infer use or mood from the various emoticons and then change or modify the expressivity of the TTS output such as by changing intonation, prosody, speed, pauses, and other expressivity characteristics.
    Type: Application
    Filed: August 29, 2012
    Publication date: March 6, 2014
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventor: Carey Radebaugh
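The emoticon-to-mood inference described in publication 20140067397 can be illustrated with a minimal sketch. The mood names, emoticon table, and return format below are invented for the example; the publication itself does not specify them.

```python
# Hypothetical sketch of emoticon-based mood tagging for TTS expressivity:
# scan a sentence for a trailing emoticon and attach an inferred mood tag
# that a downstream synthesizer could use to adjust intonation or prosody.

EMOTICON_MOODS = {
    ":)": "happy",
    ":(": "sad",
    ">:(": "angry",
    ":D": "laughing",
}

def tag_sentence_mood(sentence: str) -> tuple[str, str]:
    """Return (text without the emoticon, inferred mood) for one sentence."""
    stripped = sentence.rstrip()
    # Check longer emoticons first so ">:(" matches before ":(".
    for emoticon in sorted(EMOTICON_MOODS, key=len, reverse=True):
        if stripped.endswith(emoticon):
            text = stripped[: -len(emoticon)].rstrip()
            return text, EMOTICON_MOODS[emoticon]
    return stripped, "neutral"

print(tag_sentence_mood("We won the game! :D"))
# -> ('We won the game!', 'laughing')
```

A real system would then map each mood tag to concrete expressivity changes (intonation, speed, pauses) as the abstract describes.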
  • Publication number: 20140025366
    Abstract: TXTVOICETRANS can pronounce the written word in the same language or in another language. TXTVOICETRANS is a machine translation computer system that can translate the source text into another language and, at the same time, pronounce the translated text word by word, fully preserving the accent and stress of each spoken word and the intonation of a sequence of words. The pronunciation is based on whole words. The computer system can pronounce the most-used synonym of the word, or the concept the translated word belongs to, instead of the translated word displayed with the translation.
    Type: Application
    Filed: July 20, 2012
    Publication date: January 23, 2014
    Inventors: Hristo Tzanev Georgiev, Maria Theresia Georgiev(-Good)
  • Publication number: 20140025381
    Abstract: Instead of relying on humans to subjectively evaluate speech intelligibility of a subject, a system objectively evaluates the speech intelligibility. The system receives speech input and calculates confidence scores at multiple different levels using a Template Constrained Generalized Posterior Probability algorithm. One or multiple intelligibility classifiers are utilized to classify the desired entities on an intelligibility scale. A specific intelligibility classifier utilizes features such as the various confidence scores. The scale of the intelligibility classification can be adjusted to suit the application scenario. Based on the confidence score distributions and the intelligibility classification results at multiple levels an overall objective intelligibility score is calculated. The objective intelligibility scores can be used to rank different subjects or systems being assessed according to their intelligibility levels. The speech that is below a predetermined intelligibility (e.g.
    Type: Application
    Filed: July 20, 2012
    Publication date: January 23, 2014
    Applicant: MICROSOFT CORPORATION
    Inventors: Linfang Wang, Yan Teng, Lijuan Wang, Frank Kao-Ping Soong, Zhe Geng, William Brad Waller, Mark Tillman Hanson
  • Publication number: 20140019135
    Abstract: A method of speech synthesis including receiving a text input sent by a sender, processing the text input responsive to at least one distinguishing characteristic of the sender to produce synthesized speech that is representative of a voice of the sender, and communicating the synthesized speech to a recipient user of the system.
    Type: Application
    Filed: July 16, 2012
    Publication date: January 16, 2014
    Applicant: GENERAL MOTORS LLC
    Inventors: Gaurav Talwar, Xufang Zhao, Ron M. Hecht
  • Publication number: 20130238338
    Abstract: A method and apparatus providing improved approaches for uttering the spelling of words and phrases over a communication session are described. The method includes determining a character to produce a first audio signal representing a phonetic utterance of the character, determining a code word that starts with a character identical to that character, and generating a second audio signal representing an utterance of the code word, wherein the first audio signal and the second audio signal are provided over a communication session for detection of the character.
    Type: Application
    Filed: March 6, 2012
    Publication date: September 12, 2013
    Applicant: Verizon Patent and Licensing, Inc.
    Inventors: Manish G. Kharod, Bhaskar R. Gudlavenkatasiva, Nityanand Sharma, Sutap Chatterjee, Ganesh Bhathivi
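The character-plus-code-word scheme in publication 20130238338 is the familiar "A as in Alpha" pattern. A minimal sketch follows; a real system would synthesize two audio signals per character, while this example only builds the prompt text. The NATO alphabet is one common code-word choice, not one mandated by the publication.

```python
# Illustrative spelling helper: for each character, pair the character
# with a code word that starts with it, as in "C as in Charlie".

NATO = {
    "a": "Alpha", "b": "Bravo", "c": "Charlie", "d": "Delta", "e": "Echo",
    "f": "Foxtrot", "g": "Golf", "h": "Hotel", "i": "India", "j": "Juliett",
    "k": "Kilo", "l": "Lima", "m": "Mike", "n": "November", "o": "Oscar",
    "p": "Papa", "q": "Quebec", "r": "Romeo", "s": "Sierra", "t": "Tango",
    "u": "Uniform", "v": "Victor", "w": "Whiskey", "x": "X-ray",
    "y": "Yankee", "z": "Zulu",
}

def spell_with_code_words(word: str) -> list[str]:
    """Produce one 'C as in Charlie' prompt per letter of `word`."""
    return [f"{ch.upper()} as in {NATO[ch.lower()]}"
            for ch in word if ch.lower() in NATO]

print(spell_with_code_words("cab"))
# -> ['C as in Charlie', 'A as in Alpha', 'B as in Bravo']
```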
  • Publication number: 20130166915
    Abstract: A method for secure text-to-speech conversion of text using speech or voice synthesis that prevents the originator's voice from being used or distributed inappropriately or in an unauthorized manner is described. Security controls authenticate the sender of the message, and optionally the recipient, and ensure that the message is read in the originator's voice, not the voice of another person. Such controls permit an originator's voiceprint file to be publicly accessible, but limit its use for voice synthesis to text-based content created by the sender, or sent to a trusted recipient. In this way a person can be assured that their voice cannot be used for content they did not write.
    Type: Application
    Filed: December 22, 2011
    Publication date: June 27, 2013
    Applicant: RESEARCH IN MOTION LIMITED
    Inventors: Simon Peter DESAI, Neil Patrick ADAMS
  • Publication number: 20130144624
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for reducing latency in web-browsing TTS systems without the use of a plug-in or Flash® module. A system configured according to the disclosed methods allows the browser to send prosodically meaningful sections of text to a web server. A TTS server then converts intonational phrases of the text into audio and responds to the browser with the audio file. The system saves the audio file in a cache, with the file indexed by a unique identifier. As the system continues converting text into speech, when identical text appears the system uses the cached audio corresponding to the identical text without the need for re-synthesis via the TTS server.
    Type: Application
    Filed: December 1, 2011
    Publication date: June 6, 2013
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Alistair D. CONKIE, Mark Charles Beutnagel, Taniya Mishra
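The caching scheme in publication 20130144624 (audio indexed by a unique identifier, re-synthesis skipped for identical text) can be sketched as follows. The hash choice and the `synthesize` callable are stand-ins for this example, not details from the publication.

```python
# Minimal sketch of a TTS audio cache: synthesized audio is stored under
# a unique identifier derived from the text, so identical phrases are
# served from the cache instead of being re-synthesized.

import hashlib

class TTSCache:
    def __init__(self, synthesize):
        self._synthesize = synthesize   # callable: text -> audio bytes
        self._cache = {}                # identifier -> audio bytes
        self.misses = 0                 # count of actual synthesis calls

    def get_audio(self, phrase: str) -> bytes:
        key = hashlib.sha256(phrase.encode("utf-8")).hexdigest()
        if key not in self._cache:
            self.misses += 1
            self._cache[key] = self._synthesize(phrase)
        return self._cache[key]

# Fake synthesizer standing in for a TTS server call.
tts = TTSCache(lambda text: text.upper().encode())
tts.get_audio("hello world")
tts.get_audio("hello world")   # second call is served from the cache
print(tts.misses)  # -> 1
```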
  • Publication number: 20130132087
    Abstract: Methods, systems, and apparatus are generally described for providing an audio interface.
    Type: Application
    Filed: November 21, 2011
    Publication date: May 23, 2013
    Applicant: EMPIRE TECHNOLOGY DEVELOPMENT LLC
    Inventors: Noriaki Kuwahara, Tsutomu Miyasato, Yasuyuki Sumi
  • Publication number: 20130102295
    Abstract: A mobile voice platform for providing a user speech interface to computer-based services includes a mobile device having a processor, communication circuitry that provides access to the computer-based services, an operating system, and one or more applications that are run using the operating system and that utilize one or more of the computer-based services via the communication circuitry.
    Type: Application
    Filed: September 27, 2012
    Publication date: April 25, 2013
    Applicant: GM GLOBAL TECHNOLOGY OPERATIONS LLC
    Inventor: GM Global Technology Operations LLC
  • Publication number: 20130085758
    Abstract: A telecare and/or telehealth communication method is described. The method comprises providing predetermined voice messages configured to ask questions of, or give instructions to, an assisted individual, providing an algorithm configured to communicate with the assisted individual, and communicating at least one of the predetermined voice messages to the assisted individual. The method further comprises analyzing the responsiveness and/or compliance characteristics of the assisted individual, and providing the assisted individual with voice messages in the form most acceptable and effective for that individual on the basis of the analyzed responsiveness and/or compliance characteristics.
    Type: Application
    Filed: September 28, 2012
    Publication date: April 4, 2013
    Applicant: GENERAL ELECTRIC COMPANY
    Inventors: Csenge CSOMA, Akos ERDOS, Alan DAVIES
  • Publication number: 20130080175
    Abstract: According to one embodiment, a markup assistance apparatus includes an acquisition unit, a first calculation unit, a detection unit, and a presentation unit. The acquisition unit acquires a feature amount for each tag, each tag being used to control text-to-speech processing of a markup text. The first calculation unit calculates, for each character string, the variance of the feature amounts of the tags assigned to that character string in the markup text. The detection unit detects a first character string, assigned a first tag whose variance is not less than a first threshold value, as a first candidate containing a tag to be corrected. The presentation unit presents the first candidate.
    Type: Application
    Filed: September 24, 2012
    Publication date: March 28, 2013
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Kouichirou Mori, Masahiro Morita
  • Publication number: 20130073287
    Abstract: A method, computer program product, and system for voice pronunciation for text communication is described. A selected portion of a text communication is determined. A prompt to record a pronunciation relating to the selected portion of the text communication is provided at a first computing device. The recorded pronunciation is associated with the selected portion of the text communication. A visual indicator, relating to the selected portion of the text communication and the recorded pronunciation, is displayed.
    Type: Application
    Filed: September 20, 2011
    Publication date: March 21, 2013
    Applicant: International Business Machines Corporation
    Inventors: Kristina Beckley, Vincent Burckhardt, Alexis Yao Pang Song, Smriti Talwar
  • Publication number: 20130073288
    Abstract: An email system for mobile devices, such as cellular phones and PDAs, is disclosed which allows email messages to be played back on the mobile device as voice messages on demand by way of a media player, thus eliminating the need for a unified messaging system. Email messages are received by the mobile device in a known manner. In accordance with an important aspect of the invention, the email messages are identified by the mobile device as they are received. After the message is identified, the mobile device sends the email message in text format to a server for conversion to speech or voice format. After the message is converted to speech format, the server sends the messages back to the user's mobile device and notifies the user of the email message and then plays the message back to the user through a media player upon demand.
    Type: Application
    Filed: November 15, 2012
    Publication date: March 21, 2013
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventor: Nuance Communications, Inc.
  • Publication number: 20130066632
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for modifying the prosody of synthesized speech based on an associated speech act. A system configured according to the method embodiment (1) receives text, (2) performs an analysis of the text to determine and assign a speech act label to the text, and (3) converts the text to speech, where the speech prosody is based on the speech act label. The analysis performed compares the text to a corpus of previously tagged utterances to find a close match, determines a confidence score from a correlation of the text and the close match, and, if the confidence score is above a threshold value, retrieving the speech act label of the close match and assigning it to the text.
    Type: Application
    Filed: September 14, 2011
    Publication date: March 14, 2013
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Alistair D. Conkie, Srinivas Bangalore, Vivek Kumar Rangarajan Sridhar, Ann K. Syrdal
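The matching step in publication 20130066632 (find the closest tagged utterance, assign its speech-act label only if a confidence score clears a threshold) can be sketched as below. `difflib` string similarity stands in for whatever correlation measure the actual system uses, and the corpus, labels, and threshold are invented for the example.

```python
# Hedged sketch of speech-act assignment by nearest match with a
# confidence threshold: below the threshold, fall back to "neutral".

import difflib

TAGGED_CORPUS = [
    ("what time is it", "question"),
    ("please close the door", "request"),
    ("that is wonderful news", "exclamation"),
]

def assign_speech_act(text: str, threshold: float = 0.6) -> str:
    best_label, best_score = "neutral", 0.0
    for utterance, label in TAGGED_CORPUS:
        score = difflib.SequenceMatcher(None, text.lower(), utterance).ratio()
        if score > best_score:
            best_label, best_score = label, score
    # Only trust the close match if its confidence clears the threshold.
    return best_label if best_score >= threshold else "neutral"

print(assign_speech_act("What time is it"))  # -> question
print(assign_speech_act("zzzz"))             # -> neutral
```

The assigned label would then drive prosody selection during synthesis, as the abstract describes.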
  • Patent number: 8391544
    Abstract: An image processing apparatus includes: a storage module configured to store a plurality of pieces of comment data; an analyzing module configured to analyze an expression of a person contained in image data; a generating module configured to select a target comment data from among the comment data stored in the storage module based on the expression of the person analyzed by the analyzing module, and to generate voice data using the target comment data; and an output module configured to output reproduction data to be used for displaying the image data together with the voice data generated by the generating module.
    Type: Grant
    Filed: June 1, 2010
    Date of Patent: March 5, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Kousuke Imoji, Yuki Kaneko, Junichi Takahashi
  • Publication number: 20130046541
    Abstract: An apparatus for assisting visually impaired persons includes a headset. A camera is mounted on the headset. A microprocessor communicates with the camera for receiving an optically read code captured by the camera and converting the optically read code to an audio signal as a function of a trigger contained within the optical code. A speaker communicating with the processor outputs the audio signal.
    Type: Application
    Filed: August 3, 2012
    Publication date: February 21, 2013
    Inventors: Ronald L. Klein, James A. Kutsch, JR.
  • Publication number: 20130041669
    Abstract: A method, system, and computer program product are provided for speech output with confidence indication. The method includes receiving a confidence score for segments of speech or text to be synthesized to speech. The method includes modifying a speech segment by altering one or more parameters of the speech proportionally to the confidence score.
    Type: Application
    Filed: October 17, 2012
    Publication date: February 14, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: International Business Machines Corporation
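Publication 20130041669's core idea, altering synthesis parameters in proportion to a confidence score, can be shown with a toy mapping. The specific parameters and ranges below are invented for illustration; the publication only says parameters are altered proportionally to the score.

```python
# Illustrative confidence-to-parameter mapping: low-confidence segments
# are rendered quieter, slower, and lower-pitched so the uncertainty is
# audible to the listener.

def confidence_to_params(confidence: float) -> dict:
    """Map a 0..1 confidence score to hypothetical TTS parameters."""
    c = max(0.0, min(1.0, confidence))          # clamp to [0, 1]
    return {
        "volume": 0.5 + 0.5 * c,       # quieter when unsure
        "rate": 0.8 + 0.2 * c,         # slightly slower when unsure
        "pitch_shift": -2.0 * (1 - c)  # drop pitch for doubtful segments
    }

print(confidence_to_params(0.0)["volume"])  # -> 0.5
```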
  • Publication number: 20130041668
    Abstract: A voice learning apparatus includes a learning-material voice storage unit that stores learning material voice data including example sentence voice data; a learning text storage unit that stores a learning material text including an example sentence text; a learning-material text display controller that displays the learning material text; a learning-material voice output controller that performs voice output based on the learning material voice data; an example sentence specifying unit that specifies the example sentence text during the voice output; an example-sentence voice output controller that performs voice output based on the example sentence voice data associated with the specified example sentence text; and a learning-material voice output restart unit that restarts the voice output from a position where the voice output is stopped last time, after the voice output is performed based on the example sentence voice data.
    Type: Application
    Filed: August 7, 2012
    Publication date: February 14, 2013
    Applicant: Casio Computer Co., Ltd
    Inventor: Daisuke NAKAJIMA
  • Publication number: 20130030810
    Abstract: The present invention provides a frugal method for extracting speech data and associated transcriptions from a plurality of web resources (the internet) for speech corpus creation, characterized by automation of the corpus creation and by cost reduction. Existing speech corpora are integrated with speech data and transcriptions extracted from the web resources to build an aggregated, rich speech corpus that is effective and easy to adapt for generating acoustic and language models for automatic speech recognition (ASR) systems.
    Type: Application
    Filed: June 26, 2012
    Publication date: January 31, 2013
    Applicant: Tata Consultancy Services Limited
    Inventors: Sunil Kumar Kopparapu, Imran Ahmed Sheikh
  • Publication number: 20130018658
    Abstract: A prompt generation engine operates to dynamically extend prompts of a multimodal application. The prompt generation engine receives a media file having a metadata container. The prompt generation engine operates on a multimodal device that supports a voice mode and a non-voice mode for interacting with the multimodal device. The prompt generation engine retrieves from the metadata container a speech prompt related to content stored in the media file for inclusion in the multimodal application. The prompt generation engine modifies the multimodal application to include the speech prompt.
    Type: Application
    Filed: September 12, 2012
    Publication date: January 17, 2013
    Applicant: International Business Machines Corporation
    Inventors: Ciprian Agapi, William K. Bodin, Charles W. Cross, JR.
  • Publication number: 20130006620
    Abstract: A system and method for providing automatic and coordinated sharing of conversational resources, e.g., functions and arguments, between network-connected servers and devices and their corresponding applications. In one aspect, a system for providing automatic and coordinated sharing of conversational resources includes a network having a first and second network device, the first and second network device each comprising a set of conversational resources, a dialog manager for managing a conversation and executing calls requesting a conversational service, and a communication stack for communicating messages over the network using conversational protocols, wherein the conversational protocols establish coordinated network communication between the dialog managers of the first and second network device to automatically share the set of conversational resources of the first and second network device, when necessary, to perform their respective requested conversational service.
    Type: Application
    Filed: September 11, 2012
    Publication date: January 3, 2013
    Applicant: Nuance Communications, Inc.
    Inventors: Stephane H. Maes, Ponani Gopalakrishnan
  • Publication number: 20120330667
    Abstract: Included in a speech synthesizer, a natural language processing unit divides text data, input from a text input unit, into a plurality of components (particularly, words). An importance prediction unit estimates an importance level for each component according to how much that component contributes to understanding when a listener hears the synthesized speech. The speech synthesizer then determines a processing load based on the device state at synthesis time and the importance level. Included in the speech synthesizer, a synthesizing control unit and a wave generation unit reduce the processing time for a phoneme with a low importance level by curtailing its processing load (relatively degrading its sound quality), allocate part of the processing time made available by this reduction to phonemes with high importance levels, and generate synthesized speech in which important words are easily audible.
    Type: Application
    Filed: June 20, 2012
    Publication date: December 27, 2012
    Inventors: Qinghua Sun, Kenji Nagamatsu, Yusuke Fujita
  • Publication number: 20120330665
    Abstract: A system is configured to read a prescription label and output audio information corresponding to prescription information present on or linked to the prescription label. The system may have knowledge about prescription labels and prescription information, and use this knowledge to present the audio information in a structured form to the user.
    Type: Application
    Filed: June 4, 2012
    Publication date: December 27, 2012
    Applicant: Labels That Talk, LTD
    Inventor: Kenneth Berkun
  • Publication number: 20120330666
    Abstract: A method and system for vocalizing user-selected sporting event scores. A customized spoken score application module can be configured in association with a device. A real-time score can be preselected by a user from an existing sporting event website for automatically vocalizing the score in a multitude of languages utilizing a speech synthesizer and a translation engine. An existing text-to-speech engine can be integrated with the spoken score application module and controlled by the application module to automatically vocalize the preselected scores listed on the sporting event site. The synthetically-voiced, real-time score can be transmitted to the device at a predetermined time interval. Such an approach automatically and instantly pushes the real time vocal alerts thereby permitting the user to continue multitasking without activating the pre-selected vocal alerts.
    Type: Application
    Filed: June 6, 2012
    Publication date: December 27, 2012
    Inventors: Anthony Verna, Luis M. Ortiz
  • Publication number: 20120330668
    Abstract: A customized live tile application module can be configured in association with the mobile communication device in order to automatically vocalize the real-time information preselected by a user in a multitude of languages. A text-to-speech application module can be integrated with the customized live tile application module to automatically vocalize the preselected real-time information. The real-time information can be obtained from a tile and/or a website integrated with a remote server and announced after a text to speech conversion process without opening the tile, if the tiles are selected for announcement of information by the device. Such an approach automatically and instantly pushes a vocal alert with respect to the user-selected real-time information on the mobile communication device thereby permitting the user to continue multitasking. Information from tiles can also be rendered on second screens from a mobile device.
    Type: Application
    Filed: August 15, 2012
    Publication date: December 27, 2012
    Inventors: Anthony Verna, Luis M. Ortiz
  • Publication number: 20120323578
    Abstract: A sound control section (114) selects and outputs a text-to-speech item from items included in program information multiplexed with a broadcast signal; and starts or stops outputting the text-to-speech item, based on request from a remote controller control section (113). A sound generation section (115) converts the text-to-speech item to a sound signal. A speaker (109) reproduces the sound signal. The sound control section (114) compares each item of information about a program currently selected by user's operation of the remote controller, with each item of information about the previous program selected just before the user's operation. If an item of the currently selected program information is the same as the corresponding item of the operation-prior program information, and text-to-speech processing has been already completed for the item after the last change in the item, the sound control section (114) stops outputting the item to the sound generation section (115).
    Type: Application
    Filed: February 23, 2011
    Publication date: December 20, 2012
    Applicant: PANASONIC CORPORATION
    Inventor: Koumei Kubota
  • Publication number: 20120316881
    Abstract: A normalized spectrum storage unit 204 prestores normalized spectra calculated based on a random number series. A voiced sound generating unit 201 generates voiced sound waveforms based on a plurality of segments of voiced sounds corresponding to an inputted text and the normalized spectra stored in the normalized spectrum storage unit 204. An unvoiced sound generating unit 202 generates unvoiced sound waveforms based on a plurality of segments of unvoiced sounds corresponding to the inputted text. A synthesized speech generating unit 203 generates a synthesized speech based on the voiced sound waveforms generated by the voiced sound generating unit 201 and the unvoiced sound waveforms generated by the unvoiced sound generating unit 202.
    Type: Application
    Filed: March 23, 2011
    Publication date: December 13, 2012
    Applicant: NEC CORPORATION
    Inventor: Masanori Kato
  • Publication number: 20120310642
    Abstract: Techniques are provided for creating a mapping that maps locations in audio data (e.g., an audio book) to corresponding locations in text data (e.g., an e-book). Techniques are provided for using a mapping between audio data and text data, whether or not the mapping is created automatically or manually. A mapping may be used for bookmark switching where a bookmark established in one version of a digital work is used to identify a corresponding location with another version of the digital work. Alternatively, the mapping may be used to play audio that corresponds to text selected by a user. Alternatively, the mapping may be used to automatically highlight text in response to audio that corresponds to the text being played. Alternatively, the mapping may be used to determine where an annotation created in one media context (e.g., audio) will be consumed in another media context (e.g., text).
    Type: Application
    Filed: October 6, 2011
    Publication date: December 6, 2012
    Applicant: APPLE INC.
    Inventors: Xiang Cao, Alan C. Cannistraro, Gregory S. Robbin, Casey M. Dougherty
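The bookmark-switching use of the audio-to-text mapping in publication 20120310642 amounts to a lookup over anchor points. The anchor table and lookup policy below (nearest preceding anchor) are invented for the example; the publication covers richer uses such as highlighting and annotations.

```python
# Sketch of bookmark switching: given anchors pairing an audio timestamp
# with a text offset, find the e-book position corresponding to a
# bookmark set in the audio-book version.

import bisect

# (audio position in seconds, character offset in the e-book text)
ANCHORS = [(0.0, 0), (12.5, 220), (31.0, 575), (58.4, 1040)]

def audio_to_text_offset(seconds: float) -> int:
    """Return the text offset of the nearest preceding anchor."""
    times = [t for t, _ in ANCHORS]
    i = bisect.bisect_right(times, seconds) - 1
    return ANCHORS[max(i, 0)][1]

print(audio_to_text_offset(40.0))  # -> 575
```

Denser anchor tables (e.g. one per word) would make the switch correspondingly more precise.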
  • Publication number: 20120310643
    Abstract: Techniques for presenting data input as a plurality of data chunks including a first data chunk and a second data chunk. The techniques include converting the plurality of data chunks to a textual representation comprising a plurality of text chunks including a first text chunk corresponding to the first data chunk and a second text chunk corresponding to the second data chunk, respectively, and providing a presentation of at least part of the textual representation such that the first text chunk is presented differently than the second text chunk to, when presented, assist a user in proofing the textual representation.
    Type: Application
    Filed: May 23, 2012
    Publication date: December 6, 2012
    Applicant: Nuance Communications, Inc.
    Inventors: Martin Labsky, Jan Kleindienst, Tomas Macek, David Nahamoo, Jan Curin, Lars Koenig, Holger Quast
  • Publication number: 20120303371
    Abstract: Techniques for disambiguating at least one text segment from at least one acoustically similar word and/or phrase. The techniques include identifying at least one text segment, in a textual representation having a plurality of text segments, having at least one acoustically similar word and/or phrase, annotating the textual representation with disambiguating information to help disambiguate the at least one text segment from the at least one acoustically similar word and/or phrase, and synthesizing a speech signal, at least in part, by performing text-to-speech synthesis on at least a portion of the textual representation that includes the at least one text segment, wherein the speech signal includes speech corresponding to the disambiguating information located proximate the portion of the speech signal corresponding to the at least one text segment.
    Type: Application
    Filed: May 23, 2012
    Publication date: November 29, 2012
    Applicant: Nuance Communications, Inc.
    Inventors: Martin Labsky, Jan Kleindienst, Tomas Macek, David Nahamoo, Jan Curin, William F. Ganong, III
  • Publication number: 20120290304
    Abstract: A book support and optical scanner assembly for converting printed text to an audio output includes a support for holding an open book and a pair of optical scanners adapted to scan opposite pages. The assembly also includes means for moving the scanners from the top to the bottom of a page. Further, both scanners can be rotated off of the book for turning a page. In addition, the assembly includes a text-to-audio converter for converting the scanned text into spoken words and, in one embodiment, a translator to translate the scanned text into a pre-selected language.
    Type: Application
    Filed: May 9, 2011
    Publication date: November 15, 2012
    Inventor: Khaled Jafar Al-Hasan
  • Publication number: 20120284028
    Abstract: Methods and apparatus to present a video program to a visually impaired person are disclosed. An example method comprises detecting a text portion of a media stream including a video stream, the text portion not being consumable by a blind person, retrieving text associated with the text portion of the media stream, and converting the text to a first audio stream based on a first type of a first program in the media stream, and converting the text to a second audio stream based on a second type of a second program in the media stream.
    Type: Application
    Filed: July 19, 2012
    Publication date: November 8, 2012
    Inventors: Hisao M. Chang, Horst Schroeter
  • Publication number: 20120278081
    Abstract: A text-to-speech method for use in a plurality of languages, including: inputting text in a selected language; dividing the inputted text into a sequence of acoustic units; converting the sequence of acoustic units to a sequence of speech vectors using an acoustic model, wherein the model has a plurality of model parameters describing probability distributions which relate an acoustic unit to a speech vector; and outputting the sequence of speech vectors as audio in the selected language. A parameter of a predetermined type of each probability distribution in the selected language is expressed as a weighted sum of language independent parameters of the same type. The weighting used is language dependent, such that converting the sequence of acoustic units to a sequence of speech vectors includes retrieving the language dependent weights for the selected language.
    Type: Application
    Filed: June 10, 2009
    Publication date: November 1, 2012
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Byung Ha Chun, Sacha Krstulovic
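    The weighted-sum formulation in the abstract above lends itself to a small illustration. The following is a minimal sketch, not the patent's implementation: the names, the cluster count, and all numbers are invented. A parameter of a given type for the selected language is formed by retrieving that language's weights and combining shared, language-independent parameters.

    ```python
    # Hedged sketch of a language-dependent parameter expressed as a weighted
    # sum of language-independent parameters. All values are illustrative.

    SHARED_MEANS = [0.2, -1.1, 0.7]      # language-independent parameters (one per shared component)

    LANG_WEIGHTS = {                     # language-dependent weights, retrieved per selected language
        "en": [0.6, 0.3, 0.1],
        "fr": [0.2, 0.5, 0.3],
    }

    def language_mean(lang: str) -> float:
        """Retrieve the weights for the selected language and form the weighted sum."""
        weights = LANG_WEIGHTS[lang]
        return sum(w * m for w, m in zip(weights, SHARED_MEANS))
    ```

    Because only the small weight vector is language-specific, adding a language under this scheme means estimating new weights rather than a full new parameter set.
    
    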
  • Publication number: 20120265532
    Abstract: Embodiments of the invention include a system for providing a natural language objective assessment of relative color quality between a reference and a source image. The system may include a color converter that receives a difference measurement between the reference image and source image and determines a color attribute change based on the difference measurement. The color attributes may include hue shift, saturation changes, and color variation, for instance. Additionally, a magnitude index facility determines a magnitude of the determined color attribute change. Further, a natural language selector maps the color attribute change and the magnitude of the change to natural language and generates a report of the color attribute change and the magnitude of the color attribute change. The output can then be communicated to a user in either text or audio form, or in both text and audio forms.
    Type: Application
    Filed: April 15, 2011
    Publication date: October 18, 2012
    Applicant: TEKTRONIX, INC.
    Inventor: KEVIN M. FERGUSON
  • Publication number: 20120253816
    Abstract: A system and method to allow an author of an instant message to enable and control the production of audible speech to the recipient of the message. The voice of the author of the message is characterized into parameters compatible with a formant or articulatory text-to-speech engine such that upon receipt, the receiving client device can generate audible speech signals from the message text according to the characterization of the author's voice. Alternatively, the author can store samples of his or her actual voice in a server so that, upon transmission of a message by the author to a recipient, the server extracts the samples needed only to synthesize the words in the text message, and delivers those to the receiving client device so that they are used by a client-side concatenative text-to-speech engine to generate audible speech signals having a close likeness to the actual voice of the author.
    Type: Application
    Filed: June 12, 2012
    Publication date: October 4, 2012
    Applicant: Nuance Communications, Inc.
    Inventors: Terry Wade Niemeyer, Liliana Orozco
  • Publication number: 20120239405
    Abstract: A system and method for generating audio content. Content is automatically retrieved from a website. The content is converted to audio files. The audio files are associated with a hierarchy. The hierarchy is determined from the website. One or more audio files are communicated to an electronic device utilized by a user in response to a request from the user.
    Type: Application
    Filed: May 30, 2012
    Publication date: September 20, 2012
    Inventors: William C. O'Conor, Nathan T. Bradley
  • Publication number: 20120203553
    Abstract: A recognition dictionary creating device includes a user dictionary, in which a phoneme label string of an inputted voice is registered, and an interlanguage acoustic data mapping table, in which a correspondence between phoneme labels in different languages is defined. The device refers to the interlanguage acoustic data mapping table to convert the phoneme label string registered in the user dictionary, expressed in the language set at the time of creating the user dictionary, into a phoneme label string expressed in another language to which the device has switched.
    Type: Application
    Filed: January 22, 2010
    Publication date: August 9, 2012
    Inventor: Yuzo Maruta
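    The core of the mapping-table conversion above can be sketched in a few lines. This is a toy illustration, not the patent's method: the table entries are invented, and a real system would map acoustically similar phonemes between the two languages.

    ```python
    # Hypothetical interlanguage phoneme-label mapping (entries invented).
    EN_TO_JA = {"æ": "a", "ɪ": "i", "t": "t", "k": "k"}

    def convert_labels(labels, table, fallback="?"):
        """Convert a registered phoneme label string into the switched-to
        language, substituting a placeholder when no mapping exists."""
        return [table.get(p, fallback) for p in labels]
    ```
    
    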
  • Publication number: 20120191457
    Abstract: Techniques for predicting prosody in speech synthesis may make use of a data set of example text fragments with corresponding aligned spoken audio. To predict prosody for synthesizing an input text, the input text may be compared with the data set of example text fragments to select a best matching sequence of one or more example text fragments, each example text fragment in the sequence being paired with a portion of the input text. The selected example text fragment sequence may be aligned with the input text, e.g., at the word level, such that prosody may be extracted from the audio aligned with the example text fragments, and the extracted prosody may be applied to the synthesis of the input text using the alignment between the input text and the example text fragments.
    Type: Application
    Filed: January 24, 2011
    Publication date: July 26, 2012
    Applicant: Nuance Communications, Inc.
    Inventors: Stephen Minnis, Andrew P. Breen
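    The fragment-matching idea in the abstract above can be illustrated with a greedy toy version. This sketch is an assumption-laden simplification: the example data, the greedy longest-match strategy, and the word-level prosody values are all invented for demonstration, not taken from the patent.

    ```python
    # Invented example data: text fragments paired with per-word pitch (Hz)
    # extracted from audio aligned with each fragment.
    EXAMPLES = {
        "good morning": [220.0, 180.0],
        "everyone": [200.0],
    }

    def predict_prosody(text: str):
        """Greedily cover the input with the longest matching example fragments,
        then concatenate the prosody aligned with each matched fragment."""
        words = text.lower().split()
        prosody, i = [], 0
        while i < len(words):
            for j in range(len(words), i, -1):   # try longest fragment first
                frag = " ".join(words[i:j])
                if frag in EXAMPLES:
                    prosody.extend(EXAMPLES[frag])
                    i = j
                    break
            else:
                prosody.append(None)             # no example covers this word
                i += 1
        return prosody
    ```

    A production system would score candidate fragment sequences rather than match greedily, but the data flow is the same: select fragments, align them to the input at the word level, and carry the aligned prosody over to synthesis.
    
    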
  • Publication number: 20120184201
    Abstract: A compact display unit including a mobile application server. The mobile application server is mounted in the vehicle for receipt and transmission of communications. The mobile application server is operatively connected to the compact display unit. The compact display unit presents to a vehicle operator a plurality of pre-selected permissive communications for intercommunication between the vehicle and a remote base station located outside of the vehicle. A touch screen connects to the monitor and presents the programmed permissible communication options to the vehicle operator. The permissible communications options are capable of being masked by the remote base station, and cannot be unmasked by the vehicle operator.
    Type: Application
    Filed: March 20, 2012
    Publication date: July 19, 2012
    Applicant: QUALCOMM INCORPORATED
    Inventors: Michael Joseph Contour, Daniel Alexander Van Oosten Slingeland, Paul Michael Banasik, Marquis D. Doyle, III
  • Publication number: 20120185253
    Abstract: Embodiments are disclosed that relate to converting markup content to an audio output. For example, one disclosed embodiment provides, in a computing device a method including partitioning a markup document into a plurality of content panels, and forming a subset of content panels by filtering the plurality of content panels based upon geometric and/or location-based criteria of each panel relative to an overall organization of the markup document. The method further includes determining a document object model (DOM) analysis value for each content panel of the subset of content panels, identifying a set of content panels determined to contain text body content by filtering the subset of content panels based upon the DOM analysis value of each of the content panels of the subset of content panels, and converting text in a selected content panel determined to contain text body content to an audio output.
    Type: Application
    Filed: January 18, 2011
    Publication date: July 19, 2012
    Applicant: MICROSOFT CORPORATION
    Inventors: Chundong Wang, Philomena Lobo, Rui Zhou
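    The two-stage filtering described above (geometric/location criteria, then a DOM-analysis value) can be sketched as a pair of filters. The field names and thresholds below are invented placeholders, not the patent's actual criteria.

    ```python
    # Hedged sketch of panel filtering for text-body detection.
    # "area_frac" and "dom_score" are hypothetical per-panel measurements.

    def body_text_panels(panels, min_area=0.1, min_score=0.5):
        """Keep panels that pass a geometric filter relative to the whole
        document, then those whose DOM-analysis value suggests body text."""
        subset = [p for p in panels if p["area_frac"] >= min_area]
        return [p for p in subset if p["dom_score"] >= min_score]
    ```
    
    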
  • Publication number: 20120143605
    Abstract: In one implementation, a collaboration server is a conference bridge or other network device configured to host an audio and/or video conference among a plurality of conference participants. The collaboration server sends conference data and a media stream including speech to a speech recognition engine. The conference data may include the conference roster or text extracted from documents or other files shared in the conference. The speech recognition engine updates a default language model according to the conference data and transcribes the speech in the media stream based on the updated language model. In one example, the performance of the default language model, the updated language model, or both may be tested using a confidence interval or submitted for approval of the conference participant.
    Type: Application
    Filed: December 1, 2010
    Publication date: June 7, 2012
    Applicant: Cisco Technology, Inc.
    Inventors: Tyrone Terry Thorsen, Alan Darryl Gatzke
  • Publication number: 20120130718
    Abstract: A prompt collecting tool (190) for an interactive voice response system (100) includes a voice enabled application server (150), a voice simulator coupled to the voice enabled application server, and a processor coupled to the voice simulator. The processor can be programmed to execute (202) a voice application having a plurality of audio prompts, play (206) audio if a pre-stored audio is available for a particular prompt, capture (208) text when no pre-stored audio is available, and forward (210) the captured text to the prompt collecting tool. The voice simulator can include a VoiceXML browser (160), a text to speech service (170), and a text based recognition service (180), for example.
    Type: Application
    Filed: January 27, 2012
    Publication date: May 24, 2012
    Applicant: Nuance Communications, Inc.
    Inventors: Girish Dhanakshirur, James R. Lewis
  • Publication number: 20120123765
    Abstract: Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for presenting alternative translations. In one aspect, a method includes receiving source language text; receiving translated text corresponding to the source language text from a machine translation system; receiving segmentation data for the translated text, wherein the segmentation data includes a first segmentation of the translated text, the first segmentation dividing the translated text into two or more segments; receiving one or more alternative translations for each of the two or more segments; presenting the source text and the translated text to a user in a user interface; and in response to a user selection of a first portion of the translated text, displaying, in the user interface, one or more alternative translations for a first segment to which the first portion of translated text corresponds according to the first segmentation.
    Type: Application
    Filed: March 11, 2011
    Publication date: May 17, 2012
    Applicant: GOOGLE INC.
    Inventors: Joshua Estelle, Shankar Kumar, Wolfgang Macherey, Franz Josef Och, Peng Xu, Awaneesh Verma
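    The selection step in the abstract above (mapping a user's selection in the translated text back to a segment, then surfacing that segment's alternatives) can be illustrated with a minimal sketch. The segmentation data and translations below are invented for demonstration.

    ```python
    # Hypothetical segmentation of a translated text: (start, end) character
    # offsets paired with alternative translations for each segment.
    SEGMENTS = [
        (0, 7, ["Bonjour", "Salut"]),
        (8, 13, ["monde", "terre"]),
    ]

    def alternatives_at(offset: int):
        """Find the segment containing the selected offset per the
        segmentation, and return that segment's alternative translations."""
        for start, end, alts in SEGMENTS:
            if start <= offset < end:
                return alts
        return []
    ```
    
    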
  • Publication number: 20120114130
    Abstract: A cognitive load reduction system comprises a sound source position decision engine configured to receive one or more audio signals from a corresponding one or more signal generators, wherein the sound source position decision engine is further configured to identify two or more discrete sound sources within at least one of the one or more audio signals. The cognitive load reduction system further comprises an environmental assessment engine configured to assess environmental sounds within an environment. The cognitive load reduction system further comprises a sound location engine configured to output one or more audio signals configured to cause a plurality of speakers to change a perceived location of at least one of the discrete sound sources within the environment responsive to locations of other sounds within the environment.
    Type: Application
    Filed: November 9, 2010
    Publication date: May 10, 2012
    Applicant: Microsoft Corporation
    Inventor: Andrew Lovitt
  • Publication number: 20120109648
    Abstract: A communication system is described. The communication system includes an automatic speech recognizer configured to receive a speech signal and to convert the speech signal into a text sequence. The communication system also includes a speech analyzer configured to receive the speech signal and to extract paralinguistic characteristics from it. In addition, the communication system includes a speech output device coupled with the automatic speech recognizer and the speech analyzer. The speech output device is configured to convert the text sequence into an output speech signal based on the extracted paralinguistic characteristics.
    Type: Application
    Filed: October 30, 2011
    Publication date: May 3, 2012
    Inventor: Fathy Yassa
  • Publication number: 20120089402
    Abstract: According to one embodiment, a speech synthesizer includes an analyzer, a first estimator, a selector, a generator, a second estimator, and a synthesizer. The analyzer analyzes text and extracts a linguistic feature. The first estimator selects a first prosody model adapted to the linguistic feature and estimates prosody information that maximizes a first likelihood representing probability of the selected first prosody model. The selector selects speech units that minimize a cost function determined in accordance with the prosody information. The generator generates a second prosody model that is a model of the prosody information of the speech units. The second estimator estimates prosody information that maximizes a third likelihood calculated on the basis of the first likelihood and a second likelihood representing probability of the second prosody model. The synthesizer generates synthetic speech by concatenating the speech units on the basis of the prosody information estimated by the second estimator.
    Type: Application
    Filed: October 12, 2011
    Publication date: April 12, 2012
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Javier Latorre, Masami Akamine
  • Publication number: 20120065979
    Abstract: A system and method for text-to-speech conversion. The method of performing text-to-speech conversion on a portable device includes identifying a portion of text for conversion to speech format, wherein the identifying includes performing a prediction based on information associated with a user. While the portable device is connected to a power source, a text-to-speech conversion is performed on the portion of text to produce converted speech. The converted speech is stored in a memory device of the portable device. A reader application is executed, through which a user request for narration of the portion of text is received.
    Type: Application
    Filed: September 14, 2010
    Publication date: March 15, 2012
    Applicant: SONY CORPORATION
    Inventors: Ling Jun Wong, True Xiong