Speech Synthesis; Text To Speech Systems (epo) Patents (Class 704/E13.001)
  • Patent number: 11541531
    Abstract: A method for controlling a robot (1) for handling a part to be handled (14), the handling robot (1) being linked to a control interface comprising a glove (40) comprising a first finger (41) provided with a first contact sensor (42) and a second finger (43) provided with a second contact sensor (44), the method comprising the following steps: a) associating, in a signal library (25), a velocity vector with each of a first and a second recorded combination of signals (26, 21); b) acquiring a combination of signals originating from the sensors (26, 27) of the glove (40); c) comparing the acquired combination of signals with the recorded combinations (27, 28, 29) in the library (25); d) controlling the handling robot (1) in such a way as to perform a movement according to the velocity vector associated with the acquired combination of signals. A handling glove (40) and handling device implementing the method.
    Type: Grant
    Filed: November 13, 2018
    Date of Patent: January 3, 2023
    Assignee: COMMISSARIAT A L'ENERGIE ATOMIQUE ET AUX ENERGIES ALTERNATIVES
    Inventor: Franck Geffard
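The control loop in steps b) through d) reduces to a lookup from an acquired sensor-signal combination to the velocity vector recorded for it in the library. A minimal sketch of that lookup, with hypothetical library contents and vector values (not taken from the patent):

```python
# Illustrative sketch: map a combination of glove contact-sensor readings
# to a robot velocity vector via a recorded signal library.

# Hypothetical library: (sensor1, sensor2) combinations -> velocity vectors
SIGNAL_LIBRARY = {
    (1, 0): (0.1, 0.0, 0.0),   # first finger pressed  -> move along +x
    (0, 1): (-0.1, 0.0, 0.0),  # second finger pressed -> move along -x
    (1, 1): (0.0, 0.0, 0.1),   # both pressed          -> move along +z
}

def command_for(acquired):
    """Compare the acquired signal combination with the recorded ones
    and return the associated velocity vector (steps b-d)."""
    match = SIGNAL_LIBRARY.get(tuple(acquired))
    if match is None:
        return (0.0, 0.0, 0.0)  # unknown combination: command no movement
    return match
```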
  • Patent number: 8675854
    Abstract: A system and method for merging multi-modal communications are disclosed. The multi-modal communications can be synchronous, asynchronous and semi-synchronous. By way of a non-limiting example, at least two devices operating with varied modalities can be connected to a conferencing appliance. The conferencing appliance can integrate the differing modalities from the at least two devices by executing at least one of turn taking, conference identification, participant identification, ordering of interjections, modulation of meaning, expectation of shared awareness, floor domination and combination thereof.
    Type: Grant
    Filed: May 1, 2012
    Date of Patent: March 18, 2014
    Assignee: Mitel Networks Corporation
    Inventors: Alain Michaud, Trung (Tim) Trinh, Tom Gray
  • Publication number: 20140039892
    Abstract: In one embodiment, a human interactive proof portal 140 may use a biometric input to determine whether a user is a standard user or a malicious actor. The human interactive proof portal 140 may receive an access request 302 for an online data service 122 from a user device 110. The human interactive proof portal 140 may send a proof challenge 304 to the user device 110 for presentation to a user. The human interactive proof portal 140 may receive from the user device 110 a proof response 306 having a biometric metadata description 430 based on a biometric input from the user.
    Type: Application
    Filed: August 2, 2012
    Publication date: February 6, 2014
    Applicant: Microsoft Corporation
    Inventors: Chad Mills, Robert Sim, Scott Laufer, Sung Chung
  • Publication number: 20140019134
    Abstract: A text-to-speech (TTS) engine combines recorded speech with synthesized speech from a TTS synthesizer based on text input. The TTS engine receives the text input and identifies the domain for the speech (e.g. navigation, dialing, . . . ). The identified domain is used in selecting domain specific speech recordings (e.g. pre-recorded static phrases such as “turn left”, “turn right” . . . ) from the input text. The speech recordings are obtained based on the static phrases for the domain that are identified from the input text. The TTS engine blends the static phrases with the TTS output to smooth the acoustic trajectory of the input text. The prosody of the static phrases is used to create similar prosody in the TTS output.
    Type: Application
    Filed: July 12, 2012
    Publication date: January 16, 2014
    Applicant: Microsoft Corporation
    Inventors: Sheng Zhao, Peng Wang, Difei Gao, Yijian Wu, Binggong Ding, Shenghua Ye, Max Leung
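The blending described above first has to segment the input text into spans covered by domain-specific recordings and spans left for the synthesizer. A minimal sketch of that segmentation, with a hypothetical phrase list (case is normalized for matching, which a real system would preserve):

```python
# Illustrative sketch: split input text into pre-recorded static phrases
# for the identified domain and remaining segments for the TTS engine.

DOMAIN_PHRASES = {
    "navigation": ["turn left", "turn right"],  # hypothetical recordings
}

def plan_segments(text, domain):
    """Return (kind, segment) pairs: 'recorded' for static phrases,
    'tts' for spans the synthesizer must generate."""
    segments = [("tts", text.lower())]
    for phrase in DOMAIN_PHRASES.get(domain, []):
        next_segments = []
        for kind, seg in segments:
            if kind != "tts" or phrase not in seg:
                next_segments.append((kind, seg))
                continue
            before, _, after = seg.partition(phrase)
            if before.strip():
                next_segments.append(("tts", before.strip()))
            next_segments.append(("recorded", phrase))
            if after.strip():
                next_segments.append(("tts", after.strip()))
        segments = next_segments
    return segments
```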
  • Publication number: 20140003596
    Abstract: For generating privacy, a detection module detects an optical lingual cue from user speech that generates an audible signal. A generation module transmits an inverse audible signal generated from the optical lingual cue.
    Type: Application
    Filed: June 28, 2012
    Publication date: January 2, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Robert T. Arenburg, Franck Barillaud, Shiv Dutta, Alfredo V. Mendoza
  • Publication number: 20130332167
    Abstract: According to some aspects, a method of providing an interactive audio presentation, at least in part, by traversing a plurality of audio animations, each audio animation comprising a plurality of frames, each of the plurality of frames comprising a duration, at least one audio element, and at least one gate indicating criteria for transitioning to and identification of a subsequent frame and/or a subsequent animation is provided. The method comprises rendering a first audio animation, receiving input from the user associated with the presentation, selecting a second audio animation based, at least in part, on the input, and rendering the second audio animation. Some aspects include a system for performing the above method and some aspects include a computer readable medium storing instructions that perform the above method when executed by at least one processor.
    Type: Application
    Filed: June 12, 2012
    Publication date: December 12, 2013
    Applicant: Nuance Communications, Inc.
    Inventor: Robert M. Kilgore
  • Publication number: 20130253935
    Abstract: Methods, apparatuses, and computer program products for indicating a page number of an active document page within a document are provided. Embodiments include detecting, by a presentation controller, activation of a document page on a presentation device; in response to detecting the activation of the document page on the presentation device, tracking, by the presentation controller, an amount of time that the document page is consecutively active on the presentation device; determining, by the presentation controller, that the amount of time that the document page is consecutively active on the presentation device exceeds a predetermined threshold; and in response to determining that the predetermined threshold has been exceeded, providing to a target source, by the presentation controller, an output indicating a page number of the document page while the document page is active on the presentation device.
    Type: Application
    Filed: March 23, 2012
    Publication date: September 26, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Raghuswamyreddy Gundam, Newton P. Liu, Douglas W. Oliver, Terence Rodrigues, Wingcheung Tam
  • Publication number: 20130246067
    Abstract: A system for producing automated medical reports. The interface includes a menu area and a medical report area which is distinct from the menu area. The menu area includes a list of names representing medical conditions. The doctor may make different selections of names from the menu area as the medical service is being rendered to build a report in the medical report area. If a medical condition is not listed in the menu area, the doctor may add a new field for it and select/enter a name and a descriptor for the new field. The field is thereby automatically added in the menu area, and the name is automatically displayed in the new field without exiting the report/interface. Upon receiving a user selection of the new name, the descriptor associated therewith is retrieved from the memory and added in the medical report area without exiting the report/interface.
    Type: Application
    Filed: March 15, 2012
    Publication date: September 19, 2013
    Applicant: Sylvain Mailhot, Pathologiste SPRCP inc
    Inventor: Sylvain Mailhot
  • Publication number: 20130238339
    Abstract: Techniques that enable a user to select, from among multiple languages, a language to be used for performing text-to-speech conversion. In some embodiments, upon determining that multiple languages may be used to perform text-to-speech conversion for a portion of text, the multiple languages may be displayed to the user. The user may then select a particular language to be used from the multiple languages. The portion of text may then be converted to speech in the user-selected language.
    Type: Application
    Filed: March 6, 2012
    Publication date: September 12, 2013
    Applicant: Apple Inc.
    Inventors: Christopher Brian Fleizach, Darren C. Minifie
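The selection flow above is simple to state in code: when more than one language could voice a text span, surface the options and defer to the user's choice. A minimal sketch, with a hypothetical keyword-based detector standing in for real language identification:

```python
# Illustrative sketch: let the user pick the TTS language when several apply.

# Hypothetical detector: keywords -> plausible TTS languages
CANDIDATES = {"bonjour": ["French", "English"], "hola": ["Spanish", "English"]}

def languages_for(text):
    for word, langs in CANDIDATES.items():
        if word in text.lower():
            return langs
    return ["English"]

def speak(text, choose=lambda options: options[0]):
    """Return a (language, text) pair for the TTS backend; `choose` models
    the user's selection when multiple languages are displayed."""
    options = languages_for(text)
    language = choose(options) if len(options) > 1 else options[0]
    return (language, text)
```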
  • Publication number: 20130218566
    Abstract: The text-to-speech audio HIP technique described herein in some embodiments uses different correlated or uncorrelated words or sentences generated via a text-to-speech engine as audio HIP challenges. The technique can apply different effects in the text-to-speech synthesizer speaking a sentence to be used as a HIP challenge string. The different effects can include, for example, spectral frequency warping; vowel duration warping; background addition; echo addition; and varying the time duration between words, among others. In some embodiments the technique varies the set of parameters to prevent Automated Speech Recognition tools from using previously used audio HIP challenges to learn a model which can then be used to recognize future audio HIP challenges generated by the technique. Additionally, in some embodiments the technique introduces the requirement of semantic understanding in HIP challenges.
    Type: Application
    Filed: February 17, 2012
    Publication date: August 22, 2013
    Applicant: MICROSOFT CORPORATION
    Inventors: Yao Qian, Frank Kao-Ping Soong, Bin Benjamin Zhu
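The key defensive move above is drawing a fresh set of distortion parameters per challenge, so an attacker cannot fit an ASR model to a fixed effect chain. A minimal sketch of such a parameter draw; the specific ranges are hypothetical, not taken from the publication:

```python
# Illustrative sketch: randomize the effect parameters applied to each
# synthesized audio HIP challenge.
import random

def challenge_params(rng=random):
    """Draw one set of distortion parameters for a new challenge."""
    return {
        "frequency_warp": rng.uniform(0.9, 1.1),   # spectral frequency warping
        "vowel_stretch": rng.uniform(0.8, 1.3),    # vowel duration warping
        "echo_delay_ms": rng.uniform(0.0, 120.0),  # echo addition
        "word_gap_ms": rng.uniform(50.0, 400.0),   # time between words
        "background": rng.choice(["babble", "music", "none"]),
    }
```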
  • Publication number: 20130211833
    Abstract: Techniques for overlaying a custom interface onto an existing kiosk interface are provided. An event is detected that triggers a kiosk to process an agent that overlays, and without modifying, the kiosk's existing interface. The agent alters screen features and visual presentation of the existing interface and provides additional alternative operations for navigating and executing features defined in the existing interface. In an embodiment, the agent provides a custom interface overlaid onto the existing interface to provide a customer-facing interface for individuals that are sight impaired.
    Type: Application
    Filed: February 9, 2012
    Publication date: August 15, 2013
    Applicant: NCR Corporation
    Inventors: Thomas V. Edwards, Daniel Francis Matteo
  • Patent number: 8509408
    Abstract: A text/voice system comprises a device configured to receive an incoming voice call intended for a called party, and detect, in response to receiving the voice call, the current status of the called party on a text messaging system, where the current status may include active or inactive. The device is also configured to establish a communication session between the calling party and the called party via the text messaging system, where speech from the calling party is translated to text and delivered to the called party during the communication session, and responsive text from the called party is translated to speech and delivered as speech to the calling party during the communication session.
    Type: Grant
    Filed: December 15, 2008
    Date of Patent: August 13, 2013
    Assignee: Verizon Patent and Licensing Inc.
    Inventors: Lee N Goodman, Sujin C Chang
  • Publication number: 20130187927
    Abstract: The present invention relates to a computer-implemented method for the automated production of an audiovisual animation, in particular a tutorial video, wherein the method comprises the following steps: a. obtaining a slide show created using a presentation program, wherein the slide show comprises one or more graphic images and one or more portions of text; b. automatically inserting one or more entry animations for the one or more graphic images into the slide show; c. automatically generating one or more speech sequences based on the one or more portions of text and inserting the one or more speech sequences into the slide show; and d. exporting the slide show to produce the audiovisual animation.
    Type: Application
    Filed: January 27, 2012
    Publication date: July 25, 2013
    Inventor: Rüdiger Weinmann
  • Publication number: 20130179170
    Abstract: Technologies are described herein for providing validated text-to-speech correction hints from aggregated pronunciation corrections received from text-to-speech applications. A number of pronunciation corrections are received by a Web service. The pronunciation corrections may be provided by users of text-to-speech applications executing on a variety of user computer systems. Each of the plurality of pronunciation corrections includes a specification of a word or phrase and a suggested pronunciation provided by the user. The pronunciation corrections are analyzed to generate validated correction hints, and the validated correction hints are provided back to the text-to-speech applications to be used to correct pronunciation of words and phrases in the text-to-speech applications.
    Type: Application
    Filed: January 9, 2012
    Publication date: July 11, 2013
    Applicant: Microsoft Corporation
    Inventors: Jeremy Edward Cath, Timothy Edwin Harris, James Oliver Tisdale, III
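The validation step above amounts to aggregating user submissions per word and promoting only suggestions with broad agreement. A minimal sketch of one plausible policy; the quorum and agreement ratio are hypothetical stand-ins for whatever analysis the service actually applies:

```python
# Illustrative sketch: turn crowd-sourced pronunciation corrections into
# validated correction hints by majority agreement.
from collections import Counter, defaultdict

def validate(corrections, quorum=3, agreement=0.6):
    """corrections: iterable of (word, suggested_pronunciation) pairs.
    Returns a dict of validated hints: word -> winning pronunciation."""
    by_word = defaultdict(Counter)
    for word, pron in corrections:
        by_word[word.lower()][pron] += 1
    hints = {}
    for word, counts in by_word.items():
        pron, votes = counts.most_common(1)[0]
        total = sum(counts.values())
        if total >= quorum and votes / total >= agreement:
            hints[word] = pron   # enough reports, and most of them agree
    return hints
```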
  • Publication number: 20130166303
    Abstract: A computer-implemented method includes receiving, in a computer system, a user query comprising at least a first term, parsing the user query to at least determine whether the user query assigns a field to the first term, the parsing resulting in a parsed query that conforms to a predefined format, performing a search in a metadata repository using the parsed query, the metadata repository embodied in a computer readable medium and including triplets generated based on multiple modes of metadata for video content, the search identifying a set of candidate scenes from the video content, ranking the set of candidate scenes according to a scoring metric into a ranked scene list, and generating an output from the computer system that includes at least part of the ranked scene list, the output generated in response to the user query.
    Type: Application
    Filed: November 13, 2009
    Publication date: June 27, 2013
    Applicant: ADOBE SYSTEMS INCORPORATED
    Inventors: Walter Chang, Michael J. Welch
  • Patent number: 8442429
    Abstract: While performing a function, a mobile device identifies that it is idle while it is downloading content or performing another task. During that idle time, it gathers one or more parameters (e.g., location, time, gender of user, age of user, etc.) and sends a request for an audio message (e.g., audio advertisement). One or more servers at a remote facility receive the request with the one or more parameters, and use the parameters to identify a targeted message. In some cases, the targeted message will include one or more dynamic variables (e.g., distance to store, time to event, etc.) that will be replaced based on the parameters received from the mobile device, so that the audio message is dynamically updated and customized for the mobile device. In one embodiment, the targeted message is transmitted to the mobile device as text. After being received at the mobile device, the text is optionally displayed and converted to an audio format and played for the user.
    Type: Grant
    Filed: April 6, 2010
    Date of Patent: May 14, 2013
    Inventor: Andre F. Hawit
  • Publication number: 20130117021
    Abstract: A method and system uses an integration application to extract an information feature from a message and to provide the information feature to a vehicle interface device which acts on the information feature to provide a service. The extracted information feature may be automatically acted upon, or may be outputted for review, editing, and/or selection before being acted on. The vehicle interface device may include a navigation system, infotainment system, telephone, and/or a head unit. The message may be received by the vehicle interface device or from a portable or remote device in linked communication with the vehicle interface device. The message may be a voice-based or text-based message. The service may include placing a call, sending a message, or providing navigation instructions using the information feature. An off-board or back-end service provider in communication with the integration application may extract and/or transcribe the information feature and/or provide a service.
    Type: Application
    Filed: October 31, 2012
    Publication date: May 9, 2013
    Applicant: GM Global Technology Operations LLC
    Inventor: GM Global Technology Operations LLC
  • Publication number: 20130117019
    Abstract: A remote laboratory gateway enables a plurality of students to access and control a laboratory experiment remotely. Access is provided by an experimentation gateway, which is configured to provide secure access to the experiment via a network-centric, web-enabled graphical user interface. Experimental hardware is directly controlled by an experiment controller, which is communicatively coupled to the experimentation gateway and which may be a software application, a standalone computing device, or a virtual machine hosted on the experimentation gateway. The remote laboratory of the present specification may be configured for a software-as-a-service business model.
    Type: Application
    Filed: November 7, 2011
    Publication date: May 9, 2013
    Inventors: David Akopian, Arsen Melkonyan, Murillo Pontual, Grant Huang, Andreas Robert Gampe
  • Publication number: 20130117025
    Abstract: An apparatus for displaying an image in a portable terminal includes a camera to photograph the image, a touch screen to display the image and to allow selecting an object area of the displayed image, a memory to store the image, a controller to detect at least one object area within the image when displaying the image of the camera or the memory and to recognize object information of the detected object area to be converted into a voice, and an audio processing unit to output the voice.
    Type: Application
    Filed: October 24, 2012
    Publication date: May 9, 2013
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventor: Samsung Electronics Co., Ltd.
  • Publication number: 20130080174
    Abstract: In an embodiment, a retrieving device includes: a text input unit, a first extracting unit, a retrieving unit, a second extracting unit, an acquiring unit, and a selecting unit. The text input unit inputs a text including unknown word information representing a phrase that a user was unable to transcribe. The first extracting unit extracts related words representing a phrase related to the unknown word information among phrases other than the unknown word information included in the text. The retrieving unit retrieves a related document representing a document including the related words. The second extracting unit extracts candidate words representing candidates for the unknown word information from a plurality of phrases included in the related document. The acquiring unit acquires reading information representing estimated pronunciation of the unknown word information. The selecting unit selects at least one candidate word whose pronunciation is similar to the reading information.
    Type: Application
    Filed: June 20, 2012
    Publication date: March 28, 2013
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Osamu Nishiyama, Nobuhiro Shimogori, Tomoo Ikeda, Kouji Ueno, Hirokazu Suzuki, Manabu Nagao
  • Publication number: 20130080173
    Abstract: A method and system of speech synthesis. A text input is received in a text-to-speech system and, using a processor of the system, the text input is processed into synthesized speech which is established as unintelligible. The text input is reprocessed into subsequent synthesized speech and output to a user via a loudspeaker to correct the unintelligible synthesized speech. In one embodiment, the synthesized speech can be established as unintelligible by predicting intelligibility of the synthesized speech, and determining that the predicted intelligibility is lower than a minimum threshold. In another embodiment, the synthesized speech can be established as unintelligible by outputting the synthesized speech to the user via the loudspeaker, and receiving an indication from the user that the synthesized speech is not intelligible.
    Type: Application
    Filed: September 27, 2011
    Publication date: March 28, 2013
    Applicant: GENERAL MOTORS LLC
    Inventors: Gaurav Talwar, Rathinavelu Chengalvarayan
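The first embodiment above is a predict-then-reprocess loop: score the synthesized output, and re-synthesize when the score falls below a minimum threshold. A minimal sketch; the predictor and the "reprocess" step (spelling the text out) are hypothetical stand-ins for whatever the system actually does:

```python
# Illustrative sketch: re-synthesize speech that is predicted unintelligible.

def synthesize(text, spell_out=False):
    """Stand-in synthesizer: optionally spell each word letter by letter."""
    if spell_out:
        return " ".join("-".join(word) for word in text.split())
    return text

def speak(text, predict, threshold=0.5):
    """predict: speech -> intelligibility score in [0, 1]."""
    speech = synthesize(text)
    if predict(speech) < threshold:            # established as unintelligible
        speech = synthesize(text, spell_out=True)  # reprocess and re-output
    return speech
```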
  • Publication number: 20130080172
    Abstract: A method of evaluating attributes of synthesized speech. The method includes processing a text input into a synthesized speech utterance using a processor of a text-to-speech system, applying a human speech utterance to a speech model to obtain a reference wherein the human speech utterance corresponds to the text input, applying the synthesized speech utterance to at least one of the speech model or an other speech model to obtain a test, and calculating a difference between the test and the reference. The method also can be used in a speech synthesis method.
    Type: Application
    Filed: September 22, 2011
    Publication date: March 28, 2013
    Applicant: GENERAL MOTORS LLC
    Inventors: Gaurav Talwar, Xufang Zhao
  • Publication number: 20130060573
    Abstract: In some embodiments, disclosed is a reading device that comprises a camera, at least one processor, and a user interface. The camera scans at least a portion of a document having text to generate a raster file. The processor processes the raster file to identify text blocks. The user interface allows a user to hierarchically navigate the text blocks as they are read to the user.
    Type: Application
    Filed: July 30, 2012
    Publication date: March 7, 2013
    Applicant: INTEL-GE CARE INNOVATIONS LLC
    Inventors: Gretchen Anderson, Jeff Witt, Ben Foss, JM Van Thong
  • Publication number: 20130035940
    Abstract: The invention provides an electrolaryngeal speech reconstruction method and a system thereof. Firstly, model parameters are extracted from the collected speech as a parameter library, then facial images of a speaker are acquired and then transmitted to an image analyzing and processing module to obtain the voice onset and offset times and the vowel classes, then a waveform of a voice source is synthesized by a voice source synthesis module, finally, the waveform of the above voice source is output by an electrolarynx vibration output module, wherein the voice source synthesis module firstly sets the model parameters of a glottal voice source so as to synthesize the waveform of the glottal voice source, and then a waveguide model is used to simulate sound transmission in a vocal tract and select shape parameters of the vocal tract according to the vowel classes.
    Type: Application
    Filed: September 4, 2012
    Publication date: February 7, 2013
    Applicant: XI'AN JIAOTONG UNIVERSITY
    Inventors: MINGXI WAN, LIANG WU, SUPIN WANG, ZHIFENG NIU, CONGYING WAN
  • Publication number: 20130024189
    Abstract: A mobile terminal and a control method thereof are provided. The mobile terminal includes: an audio output module; a memory storing text; and a controller configured to convert at least a portion of the text into a speech and output the speech through the audio output module, wherein the controller stores at least a portion of speech data obtained by converting the at least a portion of the text into the speech in the memory, and outputs the speech based on the stored speech data to the audio output module when a speech output signal with respect to that portion of the text is obtained. When a speech output signal is obtained for a portion that has already been output as speech, the speech is output from the stored speech data, thereby shortening the time required to output the speech.
    Type: Application
    Filed: September 22, 2011
    Publication date: January 24, 2013
    Applicant: LG ELECTRONICS INC.
    Inventors: Jaemin KIM, Seungho HAN, Yongchul PARK
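The mechanism above is, in essence, memoization of text-to-speech conversion: keep the speech data for a text portion in memory and replay it when the same output is requested again. A minimal sketch, with a hypothetical synthesizer callable standing in for the real engine:

```python
# Illustrative sketch: cache converted speech data so a repeated speech
# output request replays stored data instead of re-running conversion.

class CachingTTS:
    def __init__(self, synthesize):
        self._synthesize = synthesize   # text -> speech data
        self._cache = {}                # memory storing speech data

    def output(self, text):
        """Return speech data, converting only on the first request."""
        if text not in self._cache:
            self._cache[text] = self._synthesize(text)
        return self._cache[text]
```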
  • Publication number: 20130013312
    Abstract: A system and method for improving the response time of text-to-speech synthesis using triphone contexts. The method includes receiving input text and selecting a plurality of N phoneme units from a triphone unit selection database as candidate phonemes for synthesized speech based on the input text, wherein the triphone unit selection database comprises triphone units each comprising three phones. If candidate phonemes are available in the triphone unit selection database, the method includes applying a cost process to select a set of phonemes from the candidate phonemes. If no candidate phonemes are available in the triphone unit selection database, the method includes applying a single phoneme approach to select single phonemes for synthesis, the single phonemes used in synthesis independent of a triphone structure.
    Type: Application
    Filed: July 16, 2012
    Publication date: January 10, 2013
    Applicant: AT&T Intellectual Property II, L.P.
    Inventor: Alistair D. Conkie
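The fallback structure above can be sketched as a two-tier lookup: prefer units keyed by (left, center, right) phone context, and drop to context-independent single phonemes when no triphone candidate exists. The tiny databases below are hypothetical:

```python
# Illustrative sketch: triphone unit selection with a single-phoneme fallback.

TRIPHONES = {("sil", "h", "e"): "h_unit_A", ("h", "e", "l"): "e_unit_B"}
SINGLE_PHONES = {"h": "h_unit_0", "e": "e_unit_0", "l": "l_unit_0"}

def select_unit(left, center, right):
    unit = TRIPHONES.get((left, center, right))
    if unit is not None:
        return unit                # triphone candidate available
    return SINGLE_PHONES[center]   # single-phoneme approach, context-free

def synthesize(phones):
    """Pick one unit per phone, padding the sequence with silence."""
    padded = ["sil"] + list(phones) + ["sil"]
    return [select_unit(*padded[i:i + 3]) for i in range(len(phones))]
```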
  • Publication number: 20130013314
    Abstract: A mobile computing apparatus comprises a processing resource arranged to support, when in use, an operational environment, the operational environment supporting receipt of textual content, a workload estimator arranged to estimate a cognitive workload for a user, and a text-to-speech engine. The text-to-speech engine is arranged to translate at least part of the received textual content to a signal reproducible as audible speech in accordance with a predetermined relationship between the amount of the textual content to be translated and a cognitive workload level in a range of cognitive workload levels, the range of cognitive workload levels comprising at least one cognitive workload level between end values.
    Type: Application
    Filed: July 6, 2012
    Publication date: January 10, 2013
    Applicant: TOMTOM INTERNATIONAL B.V.
    Inventor: Breght Roderick Boschker
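The "predetermined relationship" above maps a workload level to how much of the received text gets voiced. A minimal sketch of one such relationship, with a hypothetical three-level mapping between the end values:

```python
# Illustrative sketch: scale the amount of text translated to speech by the
# user's estimated cognitive workload level.

SHARE_BY_WORKLOAD = {0: 1.0, 1: 0.5, 2: 0.0}  # low, medium, high workload

def portion_to_speak(text, workload_level):
    """Return the leading share of the text to hand to the TTS engine."""
    share = SHARE_BY_WORKLOAD[workload_level]
    words = text.split()
    return " ".join(words[:round(len(words) * share)])
```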
  • Publication number: 20130013313
    Abstract: A method, system and computer program product are provided for enhancement of speech synthesized by a statistical text-to-speech (TTS) system employing a parametric representation of speech in a space of acoustic feature vectors. The method includes: defining a parametric family of corrective transformations operating in the space of the acoustic feature vectors and dependent on a set of enhancing parameters; and defining a distortion indicator of a feature vector or a plurality of feature vectors.
    Type: Application
    Filed: July 7, 2011
    Publication date: January 10, 2013
    Applicant: International Business Machines Corporation
    Inventors: Slava Shechtman, Alexander Sorin
  • Publication number: 20120316873
    Abstract: A method of providing information of a mobile communication terminal, and a mobile communication terminal for performing the method, are provided. The method includes determining whether a search command event has been generated during a call with a counterpart terminal, converting a voice signal received from a microphone into text when the search command event is determined to have been generated, identifying information matching the text in a memory, and sending the information to the counterpart terminal.
    Type: Application
    Filed: December 20, 2011
    Publication date: December 13, 2012
    Applicant: SAMSUNG ELECTRONICS CO. LTD.
    Inventor: Yong Ho YOU
  • Publication number: 20120310650
    Abstract: In a voice synthesis apparatus, a phoneme piece interpolator acquires first phoneme piece data corresponding to a first value of sound characteristic, and second phoneme piece data corresponding to a second value of the sound characteristic. The first and second phoneme piece data indicate a spectrum of each frame of a phoneme piece. The phoneme piece interpolator interpolates between each frame of the first phoneme piece data and each frame of the second phoneme piece data so as to create phoneme piece data of the phoneme piece corresponding to a target value of the sound characteristic which is different from either of the first and second values of the sound characteristic. A voice synthesizer generates a voice signal having the target value of the sound characteristic based on the created phoneme piece data.
    Type: Application
    Filed: May 24, 2012
    Publication date: December 6, 2012
    Applicant: YAMAHA CORPORATION
    Inventors: Jordi BONADA, Merlijn BLAAUW, Makoto Tachibana
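The interpolation above operates frame by frame: each output frame is a weighted mix of the corresponding frames of the two recorded phoneme pieces, with the weight set by where the target value falls between the two known characteristic values. A minimal sketch, representing spectra as plain lists of numbers:

```python
# Illustrative sketch: linear frame-wise interpolation between two phoneme
# pieces recorded at known values of a sound characteristic.

def interpolate_piece(piece_a, value_a, piece_b, value_b, target):
    """piece_a/piece_b: lists of frames, each frame a list of spectral bins.
    Returns frames for the target characteristic value."""
    w = (target - value_a) / (value_b - value_a)   # interpolation weight
    return [
        [(1 - w) * x + w * y for x, y in zip(fa, fb)]
        for fa, fb in zip(piece_a, piece_b)
    ]
```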
  • Publication number: 20120310649
    Abstract: Techniques are provided for creating a mapping that maps locations in audio data (e.g., an audio book) to corresponding locations in text data (e.g., an e-book). Techniques are provided for using a mapping between audio data and text data, whether the mapping is created automatically or manually. A mapping may be used for bookmark switching where a bookmark established in one version of a digital work (e.g., e-book) is used to identify a corresponding location with another version of the digital work (e.g., an audio book). Alternatively, the mapping may be used to play audio that corresponds to text selected by a user. Alternatively, the mapping may be used to automatically highlight text in response to audio that corresponds to the text being played. Alternatively, the mapping may be used to determine where an annotation created in one media context (e.g., audio) will be consumed in another media context.
    Type: Application
    Filed: October 6, 2011
    Publication date: December 6, 2012
    Applicant: APPLE INC.
    Inventors: Alan C. Cannistraro, Gregory S. Robbin, Casey M. Dougherty, Raymond Walsh, Melissa Breglio Hajj
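The bookmark-switching idea above reduces to a mapping stored as sorted (audio time, text offset) anchor pairs, searched in either direction with a binary search. A minimal sketch; the anchor values are hypothetical:

```python
# Illustrative sketch: look up corresponding positions between an audio book
# and an e-book using a sorted anchor mapping.
import bisect

MAPPING = [(0.0, 0), (12.5, 180), (27.0, 410), (43.2, 655)]  # sorted by time

def text_offset_at(seconds):
    """Audio bookmark -> e-book offset (last anchor at or before the time)."""
    times = [t for t, _ in MAPPING]
    i = bisect.bisect_right(times, seconds) - 1
    return MAPPING[max(i, 0)][1]

def audio_time_at(offset):
    """E-book bookmark -> audio position."""
    offsets = [o for _, o in MAPPING]
    i = bisect.bisect_right(offsets, offset) - 1
    return MAPPING[max(i, 0)][0]
```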
  • Publication number: 20120296654
    Abstract: Method and apparatus that dynamically adjusts operational parameters of a text-to-speech engine in a speech-based system. A voice engine or other application of a device provides a mechanism to alter the adjustable operational parameters of the text-to-speech engine. In response to one or more environmental conditions, the adjustable operational parameters of the text-to-speech engine are modified to increase the intelligibility of synthesized speech.
    Type: Application
    Filed: May 18, 2012
    Publication date: November 22, 2012
    Inventors: James Hendrickson, Debra Drylie Scott, Duane Littleton, John Pecorari, Arkadiusz Slusarczyk
  • Publication number: 20120278082
    Abstract: The invention is directed to combining web browser and audio player functionality for the organization and consumption of web documents. Specifically, the invention identifies a set of web documents via a web browser, extracts content from the web documents, and adds the set of web documents to a playlist. In this way, users can build a playlist of web documents and utilize the functionality and convenience of an audio player and listen to the content of the playlist.
    Type: Application
    Filed: April 27, 2012
    Publication date: November 1, 2012
    Applicant: CHARMTECH LABS LLC
    Inventors: Yevgen Borodin, Alexander Dimitriyadi, Yury Puzis, Faisal Ahmed, Valentyn Melnyk
  • Publication number: 20120265533
    Abstract: Text can be obtained at a device from various forms of communication such as e-mails or text messages. Metadata can be obtained directly from the communication or from a secondary source identified by the directly obtained metadata. The metadata can be used to create a speaker profile. The speaker profile can be used to select voice data. The selected voice data can be used by a text-to-speech (TTS) engine to produce speech output having voice characteristics that best match the speaker profile.
    Type: Application
    Filed: April 18, 2011
    Publication date: October 18, 2012
    Applicant: APPLE INC.
    Inventor: Jonathan David Honeycutt
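The abstract above matches a speaker profile, built from message metadata, against available voice data. A minimal sketch of such matching, assuming profiles and voices are simple attribute dictionaries (the attribute names are invented for illustration):

```python
# Hypothetical sketch: pick the voice whose attributes best match a
# speaker profile derived from communication metadata.
def select_voice(profile, voices):
    """Return the voice dict sharing the most attribute values with profile."""
    def score(voice):
        return sum(1 for k, v in profile.items() if voice.get(k) == v)
    return max(voices, key=score)
```

A production TTS engine would likely weight attributes (e.g. gender more heavily than region) rather than count equal matches.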
  • Publication number: 20120253815
    Abstract: A range of unified software authoring tools for creating a talking paper application for integration in an end user platform are described herein. The authoring tools are easy to use and are interoperable to provide an easy and cost-effective method of creating a talking paper application. The authoring tools provide a framework for creating audio content and image content and interactively linking the audio content and the image content. The authoring tools also provide for verifying the interactively linked audio and image content, and for reviewing the audio content, the image content and the interactive linking on a display device. Finally, the authoring tools provide for saving the audio content, the image content and the interactive linking for publication to a manufacturer for integration in an end user platform or talking paper platform.
    Type: Application
    Filed: June 11, 2012
    Publication date: October 4, 2012
    Applicant: MICROSOFT CORPORATION
    Inventors: Kentaro Toyama, Gerald Chu, Ravin Balakrishnan
  • Publication number: 20120253814
    Abstract: A system and method for aggregating text-based content and presenting the text-based content as spoken audio is described herein, where a server module retrieves and aggregates web content from web content providers that may include text-based web content that is then extracted, filtered, and categorized for a client module to retrieve and play as spoken audio.
    Type: Application
    Filed: December 2, 2011
    Publication date: October 4, 2012
    Applicant: Harman International (Shanghai) Management Co., Ltd.
    Inventors: Charles Chuanming Wang, Yong Ling
  • Publication number: 20120239404
    Abstract: An acquisition unit analyzes a text, and acquires phonemic and prosodic information. An editing unit edits a part of the phonemic and prosodic information. A speech synthesis unit converts the phonemic and prosodic information before editing the part to a first speech waveform, and converts the phonemic and prosodic information after editing the part to a second speech waveform. A period calculation unit calculates a contrast period corresponding to the part in the first speech waveform and the second speech waveform. A speech generation unit generates an output waveform by connecting a first partial waveform and a second partial waveform. The first partial waveform contains the contrast period of the first speech waveform. The second partial waveform contains the contrast period of the second speech waveform.
    Type: Application
    Filed: September 19, 2011
    Publication date: September 20, 2012
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventor: Osamu Nishiyama
  • Publication number: 20120239406
    Abstract: The present invention relates to a method for synthesizing a speech signal, comprising obtaining a speech sequence input signal comprising semantic content corresponding to a speaker's utterance; analyzing the input speech sequence signal to obtain a first sequence of feature vectors for the input speech sequence signal; synthesizing a second sequence of feature vectors different from and based on the first sequence of feature vectors; and generating an excitation signal and filtering the excitation signal based on the second sequence of feature vectors to obtain a synthesized speech signal wherein the semantic content is obfuscated.
    Type: Application
    Filed: December 2, 2009
    Publication date: September 20, 2012
    Inventors: Johan Nikolaas Langehoveen Brummer, Avery Maxwell Glasser, Luis Buera Rodriquez
  • Publication number: 20120232907
    Abstract: A system and method for delivering a Human Interactive Proof, or reverse Turing test to the visually impaired; said test comprising a method for restricting access to a computer system, resource, or network to live persons, and for preventing the execution of automated scripts via an interface intended for human interaction. When queried for access to a protected resource, the system will respond with a challenge requiring unknown petitioners to solve an auditory puzzle before proceeding, said puzzle consisting of an audio waveform representative of the names or descriptions of a collection of apparently random objects. The subject of the test must either recognize a semantic or symbolic association between two or more objects, or isolate an object that does not belong with the others, indicating their selection by typing the name of the object with their keyboard.
    Type: Application
    Filed: March 9, 2011
    Publication date: September 13, 2012
    Inventor: Christopher Liam Ivey

  • Publication number: 20120226501
    Abstract: A document navigation tool that automatically navigates a document based on previous input from the user. The document navigation tool is utilized each time a page loads. The method recognizes user behavior on pages using patterns, which are based on four criteria: location, frequency, consistency, and scope. If the user has visited the page previously and has established a pattern, the method automatically focuses on the portion of the page indicated by the pattern, e.g. the location on a web page of the link clicked by the user in the user's last three visits to the page. If the user has not visited the page previously, the method logs the events that occur during this visit to the page.
    Type: Application
    Filed: May 14, 2012
    Publication date: September 6, 2012
    Applicant: FREEDOM SCIENTIFIC, INC.
    Inventors: Robert Gallo, Glen Gordon
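The abstract above auto-focuses a page element once the user has chosen it consistently on recent visits. A minimal sketch of that consistency check, with the class name, threshold, and element-id representation assumed here for illustration:

```python
from collections import defaultdict

# Hypothetical sketch: remember where a user lands on each page and
# suggest a focus target once the same element is chosen on three
# consecutive visits (the threshold is an illustrative assumption).
class NavigationMemory:
    def __init__(self, consistency=3):
        self.consistency = consistency
        self.history = defaultdict(list)   # page -> list of element ids

    def log_visit(self, page, element_id):
        self.history[page].append(element_id)

    def focus_target(self, page):
        """Return an element id to auto-focus, or None if no pattern yet."""
        visits = self.history[page][-self.consistency:]
        if len(visits) == self.consistency and len(set(visits)) == 1:
            return visits[0]
        return None
```

The patent's fuller criteria (location, frequency, consistency, scope) would extend this to weigh where and how often elements are used, not just the last few visits.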
  • Publication number: 20120226499
    Abstract: Methods of adding data identifiers and speech/voice recognition functionality are disclosed. A telnet client runs one or more scripts that add data identifiers to data fields in a telnet session. The input data is inserted in the corresponding fields based on data identifiers. Scripts run only on the telnet client without modifications to the server applications. Further disclosed are methods for providing speech recognition and voice functionality to telnet clients. Portions of input data are converted to voice and played to the user. A user also may provide input to certain fields of the telnet session by using his voice. Scripts running on the telnet client convert the user's voice into text, which is then inserted into the corresponding fields.
    Type: Application
    Filed: May 9, 2012
    Publication date: September 6, 2012
    Applicant: WAVELINK CORPORATION
    Inventors: LAMAR JOHN VAN WAGENEN, BRANT DAVID THOMSEN, SCOTT ALLEN CADDES
  • Publication number: 20120226500
    Abstract: A system and method for capturing voice information and using the voice information to modulate a content output signal. The method for capturing voice information includes receiving a request to create speech modulation and presenting a piece of textual content operable for use in creating the speech modulation based on the textual input. The method further includes receiving a first voice sample and determining a voice fingerprint based on said first voice sample. The voice fingerprint is operable for modulating speech during content rendering (e.g., audio output) such that a synthetic narration is performed based on the textual input. The voice fingerprint may then be stored and used for modulating the output.
    Type: Application
    Filed: March 2, 2011
    Publication date: September 6, 2012
    Applicant: SONY CORPORATION
    Inventors: Guru Balasubramanian, Kalyana Srinivas Kota, Utkarsh Pandya
  • Publication number: 20120221321
    Abstract: Appropriate processing results or appropriate apparatuses can be selected with a control device that selects the most probable speech recognition result by using speech recognition scores received with speech recognition results from two or more speech recognition apparatuses; sends the selected speech recognition result to two or more translation apparatuses respectively; selects the most probable translation result by using translation scores received with translation results from the two or more translation apparatuses; sends the selected translation result to two or more speech synthesis apparatuses respectively; receives a speech synthesis processing result including a speech synthesis result and a speech synthesis score from each of the two or more speech synthesis apparatuses; selects the most probable speech synthesis result by using the scores; and sends the selected speech synthesis result to a second terminal apparatus.
    Type: Application
    Filed: March 3, 2010
    Publication date: August 30, 2012
    Inventors: Satoshi Nakamura, Eiichiro Sumita, Yutaka Ashikari, Noriyuki Kimura, Chiori Hori
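The abstract above selects, at each stage of a recognition-translation-synthesis pipeline, the most probable result by comparing scores returned from multiple apparatuses. A minimal sketch of that selection step, assuming each apparatus returns a (result, score) pair:

```python
# Hypothetical sketch: given candidate results from several apparatuses,
# each paired with a confidence score, keep the highest-scoring one.
def select_best(results):
    """results: iterable of (payload, score); return the best payload."""
    payload, _ = max(results, key=lambda r: r[1])
    return payload
```

In the patented arrangement this selection would be applied three times in sequence: once over speech recognition results, once over translation results, and once over speech synthesis results, forwarding the winner to the next stage each time.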
  • Publication number: 20120221338
    Abstract: A custom-content audible representation of selected data content is automatically created for a user. The content is based on content preferences of the user (e.g., one or more web browsing histories). The content is aggregated, converted using text-to-speech technology, and adapted to fit in a desired length selected for the personalized audible representation. The length of the audible representation may be custom for the user, and may be determined based on the amount of time the user is typically traveling.
    Type: Application
    Filed: February 25, 2011
    Publication date: August 30, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Eli M. Dow, Marie R. Laser, Sarah J. Sheppard, Jessie Yu
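The abstract above adapts aggregated content to fit a desired listening length, such as the user's typical travel time. A minimal sketch of one way to do that, greedily selecting articles against a word budget derived from a words-per-minute estimate (the function name and default rate are assumptions):

```python
# Hypothetical sketch: pick aggregated articles whose combined word count
# fits a time budget, given an assumed text-to-speech speaking rate.
def fit_to_duration(articles, minutes, wpm=150):
    """Return the prefix-greedy subset of articles fitting the budget."""
    budget = minutes * wpm          # total words the time budget allows
    chosen, used = [], 0
    for text in articles:
        words = len(text.split())
        if used + words <= budget:  # take the article only if it still fits
            chosen.append(text)
            used += words
    return chosen
```

A fuller implementation would summarize or truncate individual articles rather than simply skipping those that overflow the budget.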
  • Publication number: 20120221340
    Abstract: Methods of adding data identifiers and speech/voice recognition functionality are disclosed. A telnet client runs one or more scripts that add data identifiers to data fields in a telnet session. The input data is inserted in the corresponding fields based on data identifiers. Scripts run only on the telnet client without modifications to the server applications. Further disclosed are methods for providing speech recognition and voice functionality to telnet clients. Portions of input data are converted to voice and played to the user. A user also may provide input to certain fields of the telnet session by using his voice. Scripts running on the telnet client convert the user's voice into text, which is then inserted into the corresponding fields.
    Type: Application
    Filed: May 9, 2012
    Publication date: August 30, 2012
    Applicant: WAVELINK CORPORATION
    Inventors: Lamar John VAN WAGENEN, Brant David THOMSEN, Scott Allen CADDES
  • Publication number: 20120215540
    Abstract: The present invention relates to a method for selecting and downloading content from a content provider which is accessible via an IP/DNS/URL address to a mobile device, the content being any text information data, for converting the text information data to at least one audio message and for storing the at least one audio message as at least one audio file on the mobile device, wherein the at least one audio file is playable and discernable as a music file. Said method implemented on a mobile phone enables controlling and playing the audio messages as music files, for instance also in a car environment with a car kit enabling a control and a selection of one or more of said at least one audio files for playing from the mobile phone.
    Type: Application
    Filed: February 14, 2012
    Publication date: August 23, 2012
    Applicant: Beyo GmbH
    Inventor: Cüneyt Göktekin
  • Publication number: 20120209611
    Abstract: A synthesis filter 106 synthesizes a plurality of wide-band speech signals by combining wide-band phoneme signals and sound source signals from a speech signal code book 105, and a distortion evaluation unit 107 selects one of the wide-band speech signals with a minimum waveform distortion with respect to an up-sampled narrow-band speech signal output from a sampling conversion unit 101. A first bandpass filter 103 extracts a frequency component outside a narrow-band of the wide-band speech signal and a band synthesis unit 104 combines it with the up-sampled narrow-band speech signal.
    Type: Application
    Filed: October 22, 2010
    Publication date: August 16, 2012
    Applicant: Mitsubishi Electric Corporation
    Inventors: Satoru Furuta, Hirohisa Tasaki
  • Publication number: 20120203554
    Abstract: In one general aspect, emergency information for a person is received from a user. A unique identifier for the person is generated. The unique identifier is associated with the emergency information. The emergency information is stored on an emergency information device. The unique identifier is associated with the emergency information device. The emergency information device is sent to the user.
    Type: Application
    Filed: April 19, 2012
    Publication date: August 9, 2012
    Inventor: Linda Dougherty-Clark
  • Publication number: 20120197646
    Abstract: A system and method for processing voice requests from a user for accessing information on a computerized network and delivering information from a script server and an audio server in the network in audio format. A voice user interface subsystem includes: a dialog engine that is operable to interpret requests from users from the user input, communicate the requests to the script server and the audio server, and receive information from the script server and the audio server; a media telephony services (MTS) server, wherein the MTS server is operable to receive user input via a telephony system, and to transfer the user input to the dialog engine; and a broker coupled between the dialog engine and the MTS server. The broker establishes a session between the MTS server and the dialog engine and controls telephony functions with the telephony system.
    Type: Application
    Filed: April 16, 2012
    Publication date: August 2, 2012
    Applicant: Ben Franklin Patent Holding, LLC
    Inventors: Marianna TESSEL, Danny Lange, Eugene Ponomarenko, Mitsuru Oshima, Daniel Burkes, Tjoen Min Tjong
  • Publication number: 20120190407
    Abstract: Provided is portable electronic equipment capable of mutually converting character information and simplified character information. The portable electronic equipment is equipped with a display unit (21); a character information acquisition unit (41) that acquires character information; a trigger signal detection unit (42) that detects a prescribed trigger signal; a character information conversion unit (43) that simplifies character information by extracting sentence elements from the character information and rearranging the sentence elements into a prescribed order or simplifies the character information by replacing prescribed words in the character information with symbols pertaining to said words, when the trigger signal is detected by the trigger signal detection unit (42); and a display control unit (44) that displays on the display unit (21) the character information simplified by the character information conversion unit (43).
    Type: Application
    Filed: July 26, 2010
    Publication date: July 26, 2012
    Applicant: KYOCERA CORPORATION
    Inventors: Atsushi Miura, Yasumasa Sekigami, Shuuji Ishikawa