Speech Synthesis; Text To Speech Systems (epo) Patents (Class 704/E13.001)
  • Patent number: 11541531
    Abstract: A method for controlling a robot (1) for handling a part to be handled (14), the handling robot (1) being linked to a control interface comprising a glove (40) comprising a first finger (41) provided with a first contact sensor (42) and a second finger (43) provided with a second contact sensor (44), the method comprising the following steps: a) associating, in a signal library (25), a velocity vector with each of a first and a second recorded combination of signals (26, 21); b) acquiring a combination of signals originating from the sensors (26, 27) of the glove (40); c) comparing the acquired combination of signals with the recorded combinations (27, 28, 29) in the library (25); d) controlling the handling robot (1) in such a way as to perform a movement according to the velocity vector associated with the acquired combination of signals. A handling glove (40) and handling device implementing the method.
    Type: Grant
    Filed: November 13, 2018
    Date of Patent: January 3, 2023
    Assignee: COMMISSARIAT A L'ENERGIE ATOMIQUE ET AUX ENERGIES ALTERNATIVES
    Inventor: Franck Geffard
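The control loop in steps b) through d) reduces to a lookup from an acquired sensor-signal combination to the velocity vector recorded for it in the library. A minimal sketch of that lookup, with hypothetical library contents and vector values (not taken from the patent):

```python
# Illustrative sketch: map a combination of glove contact-sensor readings
# to a robot velocity vector via a recorded signal library.

# Hypothetical library: (sensor1, sensor2) combinations -> velocity vectors
SIGNAL_LIBRARY = {
    (1, 0): (0.1, 0.0, 0.0),   # first finger pressed  -> move along +x
    (0, 1): (-0.1, 0.0, 0.0),  # second finger pressed -> move along -x
    (1, 1): (0.0, 0.0, 0.1),   # both pressed          -> move along +z
}

def command_for(acquired):
    """Compare the acquired signal combination with the recorded ones
    and return the associated velocity vector (steps b-d)."""
    match = SIGNAL_LIBRARY.get(tuple(acquired))
    if match is None:
        return (0.0, 0.0, 0.0)  # unknown combination: command no movement
    return match
```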
  • Patent number: 8675854
    Abstract: A system and method for merging multi-modal communications are disclosed. The multi-modal communications can be synchronous, asynchronous and semi-synchronous. By way of a non-limiting example, at least two devices operating with varied modalities can be connected to a conferencing appliance. The conferencing appliance can integrate the differing modalities from the at least two devices by executing at least one of turn taking, conference identification, participant identification, ordering of interjections, modulation of meaning, expectation of shared awareness, floor domination and combination thereof.
    Type: Grant
    Filed: May 1, 2012
    Date of Patent: March 18, 2014
    Assignee: Mitel Networks Corporation
    Inventors: Alain Michaud, Trung (Tim) Trinh, Tom Gray
  • Publication number: 20140039892
    Abstract: In one embodiment, a human interactive proof portal 140 may use a biometric input to determine whether a user is a standard user or a malicious actor. The human interactive proof portal 140 may receive an access request 302 for an online data service 122 from a user device 110. The human interactive proof portal 140 may send a proof challenge 304 to the user device 110 for presentation to a user. The human interactive proof portal 140 may receive from the user device 110 a proof response 306 having a biometric metadata description 430 based on a biometric input from the user.
    Type: Application
    Filed: August 2, 2012
    Publication date: February 6, 2014
    Applicant: Microsoft Corporation
    Inventors: Chad Mills, Robert Sim, Scott Laufer, Sung Chung
  • Publication number: 20140019134
    Abstract: A text-to-speech (TTS) engine combines recorded speech with synthesized speech from a TTS synthesizer based on text input. The TTS engine receives the text input and identifies the domain for the speech (e.g. navigation, dialing, . . . ). The identified domain is used in selecting domain specific speech recordings (e.g. pre-recorded static phrases such as “turn left”, “turn right” . . . ) from the input text. The speech recordings are obtained based on the static phrases for the domain that are identified from the input text. The TTS engine blends the static phrases with the TTS output to smooth the acoustic trajectory of the input text. The prosody of the static phrases is used to create similar prosody in the TTS output.
    Type: Application
    Filed: July 12, 2012
    Publication date: January 16, 2014
    Applicant: Microsoft Corporation
    Inventors: Sheng Zhao, Peng Wang, Difei Gao, Yijian Wu, Binggong Ding, Shenghua Ye, Max Leung
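The blending described above first has to segment the input text into spans covered by domain-specific recordings and spans left for the synthesizer. A minimal sketch of that segmentation, with a hypothetical phrase list (case is normalized for matching, which a real system would preserve):

```python
# Illustrative sketch: split input text into pre-recorded static phrases
# for the identified domain and remaining segments for the TTS engine.

DOMAIN_PHRASES = {
    "navigation": ["turn left", "turn right"],  # hypothetical recordings
}

def plan_segments(text, domain):
    """Return (kind, segment) pairs: 'recorded' for static phrases,
    'tts' for spans the synthesizer must generate."""
    segments = [("tts", text.lower())]
    for phrase in DOMAIN_PHRASES.get(domain, []):
        next_segments = []
        for kind, seg in segments:
            if kind != "tts" or phrase not in seg:
                next_segments.append((kind, seg))
                continue
            before, _, after = seg.partition(phrase)
            if before.strip():
                next_segments.append(("tts", before.strip()))
            next_segments.append(("recorded", phrase))
            if after.strip():
                next_segments.append(("tts", after.strip()))
        segments = next_segments
    return segments
```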
  • Publication number: 20140003596
    Abstract: For generating privacy, a detection module detects an optical lingual cue from user speech that generates an audible signal. A generation module transmits an inverse audible signal generated from the optical lingual cue.
    Type: Application
    Filed: June 28, 2012
    Publication date: January 2, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Robert T. Arenburg, Franck Barillaud, Shiv Dutta, Alfredo V. Mendoza
  • Publication number: 20130332167
    Abstract: According to some aspects, a method of providing an interactive audio presentation, at least in part, by traversing a plurality of audio animations, each audio animation comprising a plurality of frames, each of the plurality of frames comprising a duration, at least one audio element, and at least one gate indicating criteria for transitioning to and identification of a subsequent frame and/or a subsequent animation is provided. The method comprises rendering a first audio animation, receiving input from the user associated with the presentation, selecting a second audio animation based, at least in part, on the input, and rendering the second audio animation. Some aspects include a system for performing the above method and some aspects include a computer readable medium storing instructions that perform the above method when executed by at least one processor.
    Type: Application
    Filed: June 12, 2012
    Publication date: December 12, 2013
    Applicant: Nuance Communications, Inc.
    Inventor: Robert M. Kilgore
  • Publication number: 20130253935
    Abstract: Methods, apparatuses, and computer program products for indicating a page number of an active document page within a document are provided. Embodiments include detecting, by a presentation controller, activation of a document page on a presentation device; in response to detecting the activation of the document page on the presentation device, tracking, by the presentation controller, an amount of time that the document page is consecutively active on the presentation device; determining, by the presentation controller, that the amount of time that the document page is consecutively active on the presentation device exceeds a predetermined threshold; and in response to determining that the predetermined threshold has been exceeded, providing to a target source, by the presentation controller, an output indicating a page number of the document page while the document page is active on the presentation device.
    Type: Application
    Filed: March 23, 2012
    Publication date: September 26, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Raghuswamyreddy Gundam, Newton P. Liu, Douglas W. Oliver, Terence Rodrigues, Wingcheung Tam
  • Publication number: 20130246067
    Abstract: A system for producing automated medical reports. The interface includes a menu area and a medical report area which is distinct from the menu area. The menu area includes a list of names representing medical conditions. The doctor may make different selections of names from the menu area as the medical service is being rendered to build a report in the medical report area. If a medical condition is not listed in the menu area, the doctor may add a new field for it and select/enter a name and a descriptor for the new field. The field is thereby automatically added in the menu area, and the name is automatically displayed in the new field without exiting the report/interface. Upon receiving a user selection of the new name, the descriptor associated therewith is retrieved from the memory and added in the medical report area without exiting the report/interface.
    Type: Application
    Filed: March 15, 2012
    Publication date: September 19, 2013
    Applicant: Sylvain Mailhot, Pathologiste SPRCP inc
    Inventor: Sylvain Mailhot
  • Publication number: 20130238339
    Abstract: Techniques that enable a user to select, from among multiple languages, a language to be used for performing text-to-speech conversion. In some embodiments, upon determining that multiple languages may be used to perform text-to-speech conversion for a portion of text, the multiple languages may be displayed to the user. The user may then select a particular language to be used from the multiple languages. The portion of text may then be converted to speech in the user-selected language.
    Type: Application
    Filed: March 6, 2012
    Publication date: September 12, 2013
    Applicant: Apple Inc.
    Inventors: Christopher Brian Fleizach, Darren C. Minifie
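The selection flow above is simple to state in code: when more than one language could voice a text span, surface the options and defer to the user's choice. A minimal sketch, with a hypothetical keyword-based detector standing in for real language identification:

```python
# Illustrative sketch: let the user pick the TTS language when several apply.

# Hypothetical detector: keywords -> plausible TTS languages
CANDIDATES = {"bonjour": ["French", "English"], "hola": ["Spanish", "English"]}

def languages_for(text):
    for word, langs in CANDIDATES.items():
        if word in text.lower():
            return langs
    return ["English"]

def speak(text, choose=lambda options: options[0]):
    """Return a (language, text) pair for the TTS backend; `choose` models
    the user's selection when multiple languages are displayed."""
    options = languages_for(text)
    language = choose(options) if len(options) > 1 else options[0]
    return (language, text)
```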
  • Publication number: 20130218566
    Abstract: The text-to-speech audio HIP technique described herein in some embodiments uses different correlated or uncorrelated words or sentences generated via a text-to-speech engine as audio HIP challenges. The technique can apply different effects in the text-to-speech synthesizer speaking a sentence to be used as a HIP challenge string. The different effects can include, for example, spectral frequency warping; vowel duration warping; background addition; echo addition; and varying the time duration between words, among others. In some embodiments the technique varies the set of parameters to prevent Automated Speech Recognition tools from using previously used audio HIP challenges to learn a model which can then be used to recognize future audio HIP challenges generated by the technique. Additionally, in some embodiments the technique introduces the requirement of semantic understanding in HIP challenges.
    Type: Application
    Filed: February 17, 2012
    Publication date: August 22, 2013
    Applicant: MICROSOFT CORPORATION
    Inventors: Yao Qian, Frank Kao-Ping Soong, Bin Benjamin Zhu
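The key defensive move above is drawing a fresh set of distortion parameters per challenge, so an attacker cannot fit an ASR model to a fixed effect chain. A minimal sketch of such a parameter draw; the specific ranges are hypothetical, not taken from the publication:

```python
# Illustrative sketch: randomize the effect parameters applied to each
# synthesized audio HIP challenge.
import random

def challenge_params(rng=random):
    """Draw one set of distortion parameters for a new challenge."""
    return {
        "frequency_warp": rng.uniform(0.9, 1.1),   # spectral frequency warping
        "vowel_stretch": rng.uniform(0.8, 1.3),    # vowel duration warping
        "echo_delay_ms": rng.uniform(0.0, 120.0),  # echo addition
        "word_gap_ms": rng.uniform(50.0, 400.0),   # time between words
        "background": rng.choice(["babble", "music", "none"]),
    }
```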
  • Publication number: 20130211833
    Abstract: Techniques for overlaying a custom interface onto an existing kiosk interface are provided. An event is detected that triggers a kiosk to process an agent that overlays, and without modifying, the kiosk's existing interface. The agent alters screen features and visual presentation of the existing interface and provides additional alternative operations for navigating and executing features defined in the existing interface. In an embodiment, the agent provides a custom interface overlaid onto the existing interface to provide a customer-facing interface for individuals that are sight impaired.
    Type: Application
    Filed: February 9, 2012
    Publication date: August 15, 2013
    Applicant: NCR Corporation
    Inventors: Thomas V. Edwards, Daniel Francis Matteo
  • Patent number: 8509408
    Abstract: A text/voice system comprises a device configured to receive an incoming voice call intended for a called party, and detect, in response to receiving the voice call, the current status of the called party on a text messaging system, where the current status may include active or inactive. The device is also configured to establish a communication session between the calling party and the called party via the text messaging system, where speech from the calling party is translated to text and delivered to the called party during the communication session, and responsive text from the called party is translated to speech and delivered as speech to the calling party during the communication session.
    Type: Grant
    Filed: December 15, 2008
    Date of Patent: August 13, 2013
    Assignee: Verizon Patent and Licensing Inc.
    Inventors: Lee N Goodman, Sujin C Chang
  • Publication number: 20130187927
    Abstract: The present invention relates to a computer-implemented method for the automated production of an audiovisual animation, in particular a tutorial video, wherein the method comprises the following steps: a. obtaining a slide show created using a presentation program, wherein the slide show comprises one or more graphic images and one or more portions of text; b. automatically inserting one or more entry animations for the one or more graphic images into the slide show; c. automatically generating one or more speech sequences based on the one or more portions of text and inserting the one or more speech sequences into the slide show; and d. exporting the slide show to produce the audiovisual animation.
    Type: Application
    Filed: January 27, 2012
    Publication date: July 25, 2013
    Inventor: Rüdiger Weinmann
  • Publication number: 20130179170
    Abstract: Technologies are described herein for providing validated text-to-speech correction hints from aggregated pronunciation corrections received from text-to-speech applications. A number of pronunciation corrections are received by a Web service. The pronunciation corrections may be provided by users of text-to-speech applications executing on a variety of user computer systems. Each of the plurality of pronunciation corrections includes a specification of a word or phrase and a suggested pronunciation provided by the user. The pronunciation corrections are analyzed to generate validated correction hints, and the validated correction hints are provided back to the text-to-speech applications to be used to correct pronunciation of words and phrases in the text-to-speech applications.
    Type: Application
    Filed: January 9, 2012
    Publication date: July 11, 2013
    Applicant: Microsoft Corporation
    Inventors: Jeremy Edward Cath, Timothy Edwin Harris, James Oliver Tisdale, III
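The validation step above amounts to aggregating user submissions per word and promoting only suggestions with broad agreement. A minimal sketch of one plausible policy; the quorum and agreement ratio are hypothetical stand-ins for whatever analysis the service actually applies:

```python
# Illustrative sketch: turn crowd-sourced pronunciation corrections into
# validated correction hints by majority agreement.
from collections import Counter, defaultdict

def validate(corrections, quorum=3, agreement=0.6):
    """corrections: iterable of (word, suggested_pronunciation) pairs.
    Returns a dict of validated hints: word -> winning pronunciation."""
    by_word = defaultdict(Counter)
    for word, pron in corrections:
        by_word[word.lower()][pron] += 1
    hints = {}
    for word, counts in by_word.items():
        pron, votes = counts.most_common(1)[0]
        total = sum(counts.values())
        if total >= quorum and votes / total >= agreement:
            hints[word] = pron   # enough reports, and most of them agree
    return hints
```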
  • Publication number: 20130166303
    Abstract: A computer-implemented method includes receiving, in a computer system, a user query comprising at least a first term, parsing the user query to at least determine whether the user query assigns a field to the first term, the parsing resulting in a parsed query that conforms to a predefined format, performing a search in a metadata repository using the parsed query, the metadata repository embodied in a computer readable medium and including triplets generated based on multiple modes of metadata for video content, the search identifying a set of candidate scenes from the video content, ranking the set of candidate scenes according to a scoring metric into a ranked scene list, and generating an output from the computer system that includes at least part of the ranked scene list, the output generated in response to the user query.
    Type: Application
    Filed: November 13, 2009
    Publication date: June 27, 2013
    Applicant: ADOBE SYSTEMS INCORPORATED
    Inventors: Walter Chang, Michael J. Welch
  • Patent number: 8442429
    Abstract: While performing a function, a mobile device identifies that it is idle while it is downloading content or performing another task. During that idle time, it gathers one or more parameters (e.g., location, time, gender of user, age of user, etc.) and sends a request for an audio message (e.g., audio advertisement). One or more servers at a remote facility receive the request with the one or more parameters, and use the parameters to identify a targeted message. In some cases, the targeted message will include one or more dynamic variables (e.g., distance to store, time to event, etc.) that will be replaced based on the parameters received from the mobile device, so that the audio message is dynamically updated and customized for the mobile device. In one embodiment, the targeted message is transmitted to the mobile device as text. After being received at the mobile device, the text is optionally displayed and converted to an audio format and played for the user.
    Type: Grant
    Filed: April 6, 2010
    Date of Patent: May 14, 2013
    Inventor: Andre F. Hawit
  • Publication number: 20130117021
    Abstract: A method and system uses an integration application to extract an information feature from a message and to provide the information feature to a vehicle interface device which acts on the information feature to provide a service. The extracted information feature may be automatically acted upon, or may be outputted for review, editing, and/or selection before being acted on. The vehicle interface device may include a navigation system, infotainment system, telephone, and/or a head unit. The message may be received by the vehicle interface device or from a portable or remote device in linked communication with the vehicle interface device. The message may be a voice-based or text-based message. The service may include placing a call, sending a message, or providing navigation instructions using the information feature. An off-board or back-end service provider in communication with the integration application may extract and/or transcribe the information feature and/or provide a service.
    Type: Application
    Filed: October 31, 2012
    Publication date: May 9, 2013
    Applicant: GM Global Technology Operations LLC
    Inventor: GM Global Technology Operations LLC
  • Publication number: 20130117019
    Abstract: A remote laboratory gateway enables a plurality of students to access and control a laboratory experiment remotely. Access is provided by an experimentation gateway, which is configured to provide secure access to the experiment via a network-centric, web-enabled graphical user interface. Experimental hardware is directly controlled by an experiment controller, which is communicatively coupled to the experimentation gateway and which may be a software application, a standalone computing device, or a virtual machine hosted on the experimentation gateway. The remote laboratory of the present specification may be configured for a software-as-a-service business model.
    Type: Application
    Filed: November 7, 2011
    Publication date: May 9, 2013
    Inventors: David Akopian, Arsen Melkonyan, Murillo Pontual, Grant Huang, Andreas Robert Gampe
  • Publication number: 20130117025
    Abstract: An apparatus for displaying an image in a portable terminal includes a camera to photograph the image, a touch screen to display the image and to allow selecting an object area of the displayed image, a memory to store the image, a controller to detect at least one object area within the image when displaying the image of the camera or the memory and to recognize object information of the detected object area to be converted into a voice, and an audio processing unit to output the voice.
    Type: Application
    Filed: October 24, 2012
    Publication date: May 9, 2013
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventor: Samsung Electronics Co., Ltd.
  • Publication number: 20130080174
    Abstract: In an embodiment, a retrieving device includes: a text input unit, a first extracting unit, a retrieving unit, a second extracting unit, an acquiring unit, and a selecting unit. The text input unit inputs a text including unknown word information representing a phrase that a user was unable to transcribe. The first extracting unit extracts related words representing a phrase related to the unknown word information among phrases other than the unknown word information included in the text. The retrieving unit retrieves a related document representing a document including the related words. The second extracting unit extracts candidate words representing candidates for the unknown word information from a plurality of phrases included in the related document. The acquiring unit acquires reading information representing estimated pronunciation of the unknown word information. The selecting unit selects at least one candidate word whose pronunciation is similar to the reading information.
    Type: Application
    Filed: June 20, 2012
    Publication date: March 28, 2013
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Osamu Nishiyama, Nobuhiro Shimogori, Tomoo Ikeda, Kouji Ueno, Hirokazu Suzuki, Manabu Nagao
  • Publication number: 20130080173
    Abstract: A method and system of speech synthesis. A text input is received in a text-to-speech system and, using a processor of the system, the text input is processed into synthesized speech which is established as unintelligible. The text input is reprocessed into subsequent synthesized speech and output to a user via a loudspeaker to correct the unintelligible synthesized speech. In one embodiment, the synthesized speech can be established as unintelligible by predicting intelligibility of the synthesized speech, and determining that the predicted intelligibility is lower than a minimum threshold. In another embodiment, the synthesized speech can be established as unintelligible by outputting the synthesized speech to the user via the loudspeaker, and receiving an indication from the user that the synthesized speech is not intelligible.
    Type: Application
    Filed: September 27, 2011
    Publication date: March 28, 2013
    Applicant: GENERAL MOTORS LLC
    Inventors: Gaurav Talwar, Rathinavelu Chengalvarayan
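The first embodiment above is a predict-then-reprocess loop: score the synthesized output, and re-synthesize when the score falls below a minimum threshold. A minimal sketch; the predictor and the "reprocess" step (spelling the text out) are hypothetical stand-ins for whatever the system actually does:

```python
# Illustrative sketch: re-synthesize speech that is predicted unintelligible.

def synthesize(text, spell_out=False):
    """Stand-in synthesizer: optionally spell each word letter by letter."""
    if spell_out:
        return " ".join("-".join(word) for word in text.split())
    return text

def speak(text, predict, threshold=0.5):
    """predict: speech -> intelligibility score in [0, 1]."""
    speech = synthesize(text)
    if predict(speech) < threshold:            # established as unintelligible
        speech = synthesize(text, spell_out=True)  # reprocess and re-output
    return speech
```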
  • Publication number: 20130080172
    Abstract: A method of evaluating attributes of synthesized speech. The method includes processing a text input into a synthesized speech utterance using a processor of a text-to-speech system, applying a human speech utterance to a speech model to obtain a reference wherein the human speech utterance corresponds to the text input, applying the synthesized speech utterance to at least one of the speech model or an other speech model to obtain a test, and calculating a difference between the test and the reference. The method also can be used in a speech synthesis method.
    Type: Application
    Filed: September 22, 2011
    Publication date: March 28, 2013
    Applicant: GENERAL MOTORS LLC
    Inventors: Gaurav Talwar, Xufang Zhao
  • Publication number: 20130060573
    Abstract: In some embodiments, disclosed is a reading device that comprises a camera, at least one processor, and a user interface. The camera scans at least a portion of a document having text to generate a raster file. The processor processes the raster file to identify text blocks. The user interface allows a user to hierarchically navigate the text blocks as they are read to the user.
    Type: Application
    Filed: July 30, 2012
    Publication date: March 7, 2013
    Applicant: INTEL-GE CARE INNOVATIONS LLC
    Inventors: Gretchen Anderson, Jeff Witt, Ben Foss, JM Van Thong
  • Publication number: 20130035940
    Abstract: The invention provides an electrolaryngeal speech reconstruction method and a system thereof. Firstly, model parameters are extracted from the collected speech as a parameter library, then facial images of a speaker are acquired and then transmitted to an image analyzing and processing module to obtain the voice onset and offset times and the vowel classes, then a waveform of a voice source is synthesized by a voice source synthesis module, finally, the waveform of the above voice source is output by an electrolarynx vibration output module, wherein the voice source synthesis module firstly sets the model parameters of a glottal voice source so as to synthesize the waveform of the glottal voice source, and then a waveguide model is used to simulate sound transmission in a vocal tract and select shape parameters of the vocal tract according to the vowel classes.
    Type: Application
    Filed: September 4, 2012
    Publication date: February 7, 2013
    Applicant: XI'AN JIAOTONG UNIVERSITY
    Inventors: MINGXI WAN, LIANG WU, SUPIN WANG, ZHIFENG NIU, CONGYING WAN
  • Publication number: 20130024189
    Abstract: A mobile terminal and a control method thereof are provided. The mobile terminal includes: an audio output module; a memory storing text; and a controller configured to convert at least a portion of the text into a speech and output the speech through the audio output module, wherein the controller stores at least a portion of speech data obtained by converting the at least a portion of the text into the speech in the memory, and outputs the speech based on the stored speech data to the audio output module when a speech output signal with respect to that portion of the text is obtained. When a speech output signal is obtained for a portion that has already been output as speech, the speech is output from the stored speech data, thereby shortening the time required to output the speech.
    Type: Application
    Filed: September 22, 2011
    Publication date: January 24, 2013
    Applicant: LG ELECTRONICS INC.
    Inventors: Jaemin KIM, Seungho HAN, Yongchul PARK
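The mechanism above is, in essence, memoization of text-to-speech conversion: keep the speech data for a text portion in memory and replay it when the same output is requested again. A minimal sketch, with a hypothetical synthesizer callable standing in for the real engine:

```python
# Illustrative sketch: cache converted speech data so a repeated speech
# output request replays stored data instead of re-running conversion.

class CachingTTS:
    def __init__(self, synthesize):
        self._synthesize = synthesize   # text -> speech data
        self._cache = {}                # memory storing speech data

    def output(self, text):
        """Return speech data, converting only on the first request."""
        if text not in self._cache:
            self._cache[text] = self._synthesize(text)
        return self._cache[text]
```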
  • Publication number: 20130013312
    Abstract: A system and method for improving the response time of text-to-speech synthesis using triphone contexts. The method includes receiving input text and selecting a plurality of N phoneme units from a triphone unit selection database as candidate phonemes for synthesized speech based on the input text, wherein the triphone unit selection database comprises triphone units each comprising three phones. If candidate phonemes are available in the triphone unit selection database, the method includes applying a cost process to select a set of phonemes from the candidate phonemes. If no candidate phonemes are available in the triphone unit selection database, the method includes applying a single phoneme approach to select single phonemes for synthesis, the single phonemes used in synthesis independent of a triphone structure.
    Type: Application
    Filed: July 16, 2012
    Publication date: January 10, 2013
    Applicant: AT&T Intellectual Property II, L.P.
    Inventor: Alistair D. Conkie
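The fallback structure above can be sketched as a two-tier lookup: prefer units keyed by (left, center, right) phone context, and drop to context-independent single phonemes when no triphone candidate exists. The tiny databases below are hypothetical:

```python
# Illustrative sketch: triphone unit selection with a single-phoneme fallback.

TRIPHONES = {("sil", "h", "e"): "h_unit_A", ("h", "e", "l"): "e_unit_B"}
SINGLE_PHONES = {"h": "h_unit_0", "e": "e_unit_0", "l": "l_unit_0"}

def select_unit(left, center, right):
    unit = TRIPHONES.get((left, center, right))
    if unit is not None:
        return unit                # triphone candidate available
    return SINGLE_PHONES[center]   # single-phoneme approach, context-free

def synthesize(phones):
    """Pick one unit per phone, padding the sequence with silence."""
    padded = ["sil"] + list(phones) + ["sil"]
    return [select_unit(*padded[i:i + 3]) for i in range(len(phones))]
```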
  • Publication number: 20130013314
    Abstract: A mobile computing apparatus comprises a processing resource arranged to support, when in use, an operational environment, the operational environment supporting receipt of textual content, a workload estimator arranged to estimate a cognitive workload for a user, and a text-to-speech engine. The text-to-speech engine is arranged to translate at least part of the received textual content to a signal reproducible as audible speech in accordance with a predetermined relationship between the amount of the textual content to be translated and a cognitive workload level in a range of cognitive workload levels, the range of cognitive workload levels comprising at least one cognitive workload level between end values.
    Type: Application
    Filed: July 6, 2012
    Publication date: January 10, 2013
    Applicant: TOMTOM INTERNATIONAL B.V.
    Inventor: Breght Roderick Boschker
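The "predetermined relationship" above maps a workload level to how much of the received text gets voiced. A minimal sketch of one such relationship, with a hypothetical three-level mapping between the end values:

```python
# Illustrative sketch: scale the amount of text translated to speech by the
# user's estimated cognitive workload level.

SHARE_BY_WORKLOAD = {0: 1.0, 1: 0.5, 2: 0.0}  # low, medium, high workload

def portion_to_speak(text, workload_level):
    """Return the leading share of the text to hand to the TTS engine."""
    share = SHARE_BY_WORKLOAD[workload_level]
    words = text.split()
    return " ".join(words[:round(len(words) * share)])
```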
  • Publication number: 20130013313
    Abstract: A method, system and computer program product are provided for enhancement of speech synthesized by a statistical text-to-speech (TTS) system employing a parametric representation of speech in a space of acoustic feature vectors. The method includes: defining a parametric family of corrective transformations operating in the space of the acoustic feature vectors and dependent on a set of enhancing parameters; and defining a distortion indicator of a feature vector or a plurality of feature vectors.
    Type: Application
    Filed: July 7, 2011
    Publication date: January 10, 2013
    Applicant: International Business Machines Corporation
    Inventors: Slava Shechtman, Alexander Sorin
  • Publication number: 20120316873
    Abstract: A method of providing information of a mobile communication terminal, and a mobile communication terminal for performing the method, are provided. The method includes determining whether a search command event has been generated during a call with a counterpart terminal, converting a voice signal received from a microphone into text when the search command event is determined to have been generated, identifying information matching the text in a memory, and sending the information to the counterpart terminal.
    Type: Application
    Filed: December 20, 2011
    Publication date: December 13, 2012
    Applicant: SAMSUNG ELECTRONICS CO. LTD.
    Inventor: Yong Ho YOU
  • Publication number: 20120310650
    Abstract: In a voice synthesis apparatus, a phoneme piece interpolator acquires first phoneme piece data corresponding to a first value of sound characteristic, and second phoneme piece data corresponding to a second value of the sound characteristic. The first and second phoneme piece data indicate a spectrum of each frame of a phoneme piece. The phoneme piece interpolator interpolates between each frame of the first phoneme piece data and each frame of the second phoneme piece data so as to create phoneme piece data of the phoneme piece corresponding to a target value of the sound characteristic which is different from either of the first and second values of the sound characteristic. A voice synthesizer generates a voice signal having the target value of the sound characteristic based on the created phoneme piece data.
    Type: Application
    Filed: May 24, 2012
    Publication date: December 6, 2012
    Applicant: YAMAHA CORPORATION
    Inventors: Jordi BONADA, Merlijn BLAAUW, Makoto Tachibana
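The interpolation above operates frame by frame: each output frame is a weighted mix of the corresponding frames of the two recorded phoneme pieces, with the weight set by where the target value falls between the two known characteristic values. A minimal sketch, representing spectra as plain lists of numbers:

```python
# Illustrative sketch: linear frame-wise interpolation between two phoneme
# pieces recorded at known values of a sound characteristic.

def interpolate_piece(piece_a, value_a, piece_b, value_b, target):
    """piece_a/piece_b: lists of frames, each frame a list of spectral bins.
    Returns frames for the target characteristic value."""
    w = (target - value_a) / (value_b - value_a)   # interpolation weight
    return [
        [(1 - w) * x + w * y for x, y in zip(fa, fb)]
        for fa, fb in zip(piece_a, piece_b)
    ]
```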
  • Publication number: 20120310649
    Abstract: Techniques are provided for creating a mapping that maps locations in audio data (e.g., an audio book) to corresponding locations in text data (e.g., an e-book). Techniques are provided for using a mapping between audio data and text data, whether the mapping is created automatically or manually. A mapping may be used for bookmark switching where a bookmark established in one version of a digital work (e.g., e-book) is used to identify a corresponding location with another version of the digital work (e.g., an audio book). Alternatively, the mapping may be used to play audio that corresponds to text selected by a user. Alternatively, the mapping may be used to automatically highlight text in response to audio that corresponds to the text being played. Alternatively, the mapping may be used to determine where an annotation created in one media context (e.g., audio) will be consumed in another media context.
    Type: Application
    Filed: October 6, 2011
    Publication date: December 6, 2012
    Applicant: APPLE INC.
    Inventors: Alan C. Cannistraro, Gregory S. Robbin, Casey M. Dougherty, Raymond Walsh, Melissa Breglio Hajj
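The bookmark-switching idea above reduces to a mapping stored as sorted (audio time, text offset) anchor pairs, searched in either direction with a binary search. A minimal sketch; the anchor values are hypothetical:

```python
# Illustrative sketch: look up corresponding positions between an audio book
# and an e-book using a sorted anchor mapping.
import bisect

MAPPING = [(0.0, 0), (12.5, 180), (27.0, 410), (43.2, 655)]  # sorted by time

def text_offset_at(seconds):
    """Audio bookmark -> e-book offset (last anchor at or before the time)."""
    times = [t for t, _ in MAPPING]
    i = bisect.bisect_right(times, seconds) - 1
    return MAPPING[max(i, 0)][1]

def audio_time_at(offset):
    """E-book bookmark -> audio position."""
    offsets = [o for _, o in MAPPING]
    i = bisect.bisect_right(offsets, offset) - 1
    return MAPPING[max(i, 0)][0]
```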
  • Publication number: 20120296654
    Abstract: Method and apparatus that dynamically adjusts operational parameters of a text-to-speech engine in a speech-based system. A voice engine or other application of a device provides a mechanism to alter the adjustable operational parameters of the text-to-speech engine. In response to one or more environmental conditions, the adjustable operational parameters of the text-to-speech engine are modified to increase the intelligibility of synthesized speech.
    Type: Application
    Filed: May 18, 2012
    Publication date: November 22, 2012
    Inventors: James Hendrickson, Debra Drylie Scott, Duane Littleton, John Pecorari, Arkadiusz Slusarczyk
  • Publication number: 20120278082
    Abstract: The invention is directed to combining web browser and audio player functionality for the organization and consumption of web documents. Specifically, the invention identifies a set of web documents via a web browser, extracts content from the web documents, and adds the set of web documents to a playlist. In this way, users can build a playlist of web documents and utilize the functionality and convenience of an audio player and listen to the content of the playlist.
    Type: Application
    Filed: April 27, 2012
    Publication date: November 1, 2012
    Applicant: CHARMTECH LABS LLC
    Inventors: Yevgen Borodin, Alexander Dimitriyadi, Yury Puzis, Faisal Ahmed, Valentyn Melnyk
  • Publication number: 20120265533
    Abstract: Text can be obtained at a device from various forms of communication such as e-mails or text messages. Metadata can be obtained directly from the communication or from a secondary source identified by the directly obtained metadata. The metadata can be used to create a speaker profile. The speaker profile can be used to select voice data. The selected voice data can be used by a text-to-speech (TTS) engine to produce speech output having voice characteristics that best match the speaker profile.
    Type: Application
    Filed: April 18, 2011
    Publication date: October 18, 2012
    Applicant: APPLE INC.
    Inventor: Jonathan David Honeycutt
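The abstract above matches a speaker profile, built from message metadata, against available voice data. A minimal sketch of such matching, assuming profiles and voices are simple attribute dictionaries (the attribute names are invented for illustration):

```python
# Hypothetical sketch: pick the voice whose attributes best match a
# speaker profile derived from communication metadata.
def select_voice(profile, voices):
    """Return the voice dict sharing the most attribute values with profile."""
    def score(voice):
        return sum(1 for k, v in profile.items() if voice.get(k) == v)
    return max(voices, key=score)
```

A production TTS engine would likely weight attributes (e.g. gender more heavily than region) rather than count equal matches.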
  • Publication number: 20120253815
    Abstract: A range of unified software authoring tools for creating a talking paper application for integration in an end user platform are described herein. The authoring tools are easy to use and are interoperable to provide an easy and cost-effective method of creating a talking paper application. The authoring tools provide a framework for creating audio content and image content and interactively linking the audio content and the image content. The authoring tools also provide for verifying the interactively linked audio and image content, and for reviewing the audio content, the image content and the interactive linking on a display device. Finally, the authoring tools provide for saving the audio content, the image content and the interactive linking for publication to a manufacturer for integration in an end user platform or talking paper platform.
    Type: Application
    Filed: June 11, 2012
    Publication date: October 4, 2012
    Applicant: MICROSOFT CORPORATION
    Inventors: Kentaro Toyama, Gerald Chu, Ravin Balakrishnan
  • Publication number: 20120253814
    Abstract: A system and method for aggregating text-based content and presenting the text-based content as spoken audio is described herein, where a server module retrieves and aggregates web content from web content providers that may include text-based web content that is then extracted, filtered, and categorized for a client module to retrieve and play as spoken audio.
    Type: Application
    Filed: December 2, 2011
    Publication date: October 4, 2012
    Applicant: Harman International (Shanghai) Management Co., Ltd.
    Inventors: Charles Chuanming Wang, Yong Ling
  • Publication number: 20120239404
    Abstract: An acquisition unit analyzes a text, and acquires phonemic and prosodic information. An editing unit edits a part of the phonemic and prosodic information. A speech synthesis unit converts the phonemic and prosodic information before editing the part to a first speech waveform, and converts the phonemic and prosodic information after editing the part to a second speech waveform. A period calculation unit calculates a contrast period corresponding to the part in the first speech waveform and the second speech waveform. A speech generation unit generates an output waveform by connecting a first partial waveform and a second partial waveform. The first partial waveform contains the contrast period of the first speech waveform. The second partial waveform contains the contrast period of the second speech waveform.
    Type: Application
    Filed: September 19, 2011
    Publication date: September 20, 2012
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventor: Osamu Nishiyama
  • Publication number: 20120239406
    Abstract: The present invention relates to a method for synthesizing a speech signal, comprising obtaining a speech sequence input signal comprising semantic content corresponding to a speaker's utterance; analyzing the input speech sequence signal to obtain a first sequence of feature vectors for the input speech sequence signal; synthesizing a second sequence of feature vectors different from and based on the first sequence of feature vectors; and generating an excitation signal and filtering the excitation signal based on the second sequence of feature vectors to obtain a synthesized speech signal wherein the semantic content is obfuscated.
    Type: Application
    Filed: December 2, 2009
    Publication date: September 20, 2012
    Inventors: Johan Nikolaas Langehoveen Brummer, Avery Maxwell Glasser, Luis Buera Rodriquez
  • Publication number: 20120232907
    Abstract: A system and method for delivering a Human Interactive Proof, or reverse Turing test to the visually impaired; said test comprising a method for restricting access to a computer system, resource, or network to live persons, and for preventing the execution of automated scripts via an interface intended for human interaction. When queried for access to a protected resource, the system will respond with a challenge requiring unknown petitioners to solve an auditory puzzle before proceeding, said puzzle consisting of an audio waveform representative of the names or descriptions of a collection of apparently random objects. The subject of the test must either recognize a semantic or symbolic association between two or more objects, or isolate an object that does not belong with the others, indicating their selection by typing the name of the object with their keyboard.
    Type: Application
    Filed: March 9, 2011
    Publication date: September 13, 2012
    Inventor: Christopher Liam Ivey

  • Publication number: 20120226501
    Abstract: A document navigation tool that automatically navigates a document based on previous input from the user. The document navigation tool is utilized each time a page loads. The method recognizes user behavior on pages using patterns, which are based on four criteria: location, frequency, consistency, and scope. If the user has visited the page previously and has established a pattern, the method automatically focuses on the portion of the page indicated by the pattern, e.g. the location on a web page of the link clicked by the user in the user's last three visits to the page. If the user has not visited the page previously, the method logs the events that occur during this visit to the page.
    Type: Application
    Filed: May 14, 2012
    Publication date: September 6, 2012
    Applicant: FREEDOM SCIENTIFIC, INC.
    Inventors: Robert Gallo, Glen Gordon
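The abstract above auto-focuses a page element once the user has chosen it consistently on recent visits. A minimal sketch of that consistency check, with the class name, threshold, and element-id representation assumed here for illustration:

```python
from collections import defaultdict

# Hypothetical sketch: remember where a user lands on each page and
# suggest a focus target once the same element is chosen on three
# consecutive visits (the threshold is an illustrative assumption).
class NavigationMemory:
    def __init__(self, consistency=3):
        self.consistency = consistency
        self.history = defaultdict(list)   # page -> list of element ids

    def log_visit(self, page, element_id):
        self.history[page].append(element_id)

    def focus_target(self, page):
        """Return an element id to auto-focus, or None if no pattern yet."""
        visits = self.history[page][-self.consistency:]
        if len(visits) == self.consistency and len(set(visits)) == 1:
            return visits[0]
        return None
```

The patent's fuller criteria (location, frequency, consistency, scope) would extend this to weigh where and how often elements are used, not just the last few visits.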
  • Publication number: 20120226499
    Abstract: Methods of adding data identifiers and speech/voice recognition functionality are disclosed. A telnet client runs one or more scripts that add data identifiers to data fields in a telnet session. The input data is inserted in the corresponding fields based on data identifiers. Scripts run only on the telnet client without modifications to the server applications. Further disclosed are methods for providing speech recognition and voice functionality to telnet clients. Portions of input data are converted to voice and played to the user. A user also may provide input to certain fields of the telnet session by using his voice. Scripts running on the telnet client convert the user's voice into text, which is then inserted into the corresponding fields.
    Type: Application
    Filed: May 9, 2012
    Publication date: September 6, 2012
    Applicant: WAVELINK CORPORATION
    Inventors: LAMAR JOHN VAN WAGENEN, BRANT DAVID THOMSEN, SCOTT ALLEN CADDES
  • Publication number: 20120226500
    Abstract: A system and method for capturing voice information and using the voice information to modulate a content output signal. The method for capturing voice information includes receiving a request to create speech modulation and presenting a piece of textual content operable for use in creating the speech modulation based on the textual input. The method further includes receiving a first voice sample and determining a voice fingerprint based on said first voice sample. The voice fingerprint is operable for modulating speech during content rendering (e.g., audio output) such that a synthetic narration is performed based on the textual input. The voice fingerprint may then be stored and used for modulating the output.
    Type: Application
    Filed: March 2, 2011
    Publication date: September 6, 2012
    Applicant: SONY CORPORATION
    Inventors: Guru Balasubramanian, Kalyana Srinivas Kota, Utkarsh Pandya
  • Publication number: 20120221321
    Abstract: Appropriate processing results or appropriate apparatuses can be selected with a control device that selects the most probable speech recognition result by using speech recognition scores received with speech recognition results from two or more speech recognition apparatuses; sends the selected speech recognition result to two or more translation apparatuses respectively; selects the most probable translation result by using translation scores received with translation results from the two or more translation apparatuses; sends the selected translation result to two or more speech synthesis apparatuses respectively; receives a speech synthesis processing result including a speech synthesis result and a speech synthesis score from each of the two or more speech synthesis apparatuses; selects the most probable speech synthesis result by using the scores; and sends the selected speech synthesis result to a second terminal apparatus.
    Type: Application
    Filed: March 3, 2010
    Publication date: August 30, 2012
    Inventors: Satoshi Nakamura, Eiichiro Sumita, Yutaka Ashikari, Noriyuki Kimura, Chiori Hori
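The abstract above selects, at each stage of a recognition-translation-synthesis pipeline, the most probable result by comparing scores returned from multiple apparatuses. A minimal sketch of that selection step, assuming each apparatus returns a (result, score) pair:

```python
# Hypothetical sketch: given candidate results from several apparatuses,
# each paired with a confidence score, keep the highest-scoring one.
def select_best(results):
    """results: iterable of (payload, score); return the best payload."""
    payload, _ = max(results, key=lambda r: r[1])
    return payload
```

In the patented arrangement this selection would be applied three times in sequence: once over speech recognition results, once over translation results, and once over speech synthesis results, forwarding the winner to the next stage each time.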
  • Publication number: 20120221338
    Abstract: A custom-content audible representation of selected data content is automatically created for a user. The content is based on content preferences of the user (e.g., one or more web browsing histories). The content is aggregated, converted using text-to-speech technology, and adapted to fit in a desired length selected for the personalized audible representation. The length of the audible representation may be custom for the user, and may be determined based on the amount of time the user is typically traveling.
    Type: Application
    Filed: February 25, 2011
    Publication date: August 30, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Eli M. Dow, Marie R. Laser, Sarah J. Sheppard, Jessie Yu
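The abstract above adapts aggregated content to fit a desired listening length, such as the user's typical travel time. A minimal sketch of one way to do that, greedily selecting articles against a word budget derived from a words-per-minute estimate (the function name and default rate are assumptions):

```python
# Hypothetical sketch: pick aggregated articles whose combined word count
# fits a time budget, given an assumed text-to-speech speaking rate.
def fit_to_duration(articles, minutes, wpm=150):
    """Return the prefix-greedy subset of articles fitting the budget."""
    budget = minutes * wpm          # total words the time budget allows
    chosen, used = [], 0
    for text in articles:
        words = len(text.split())
        if used + words <= budget:  # take the article only if it still fits
            chosen.append(text)
            used += words
    return chosen
```

A fuller implementation would summarize or truncate individual articles rather than simply skipping those that overflow the budget.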
  • Publication number: 20120221340
    Abstract: Methods of adding data identifiers and speech/voice recognition functionality are disclosed. A telnet client runs one or more scripts that add data identifiers to data fields in a telnet session. The input data is inserted in the corresponding fields based on data identifiers. Scripts run only on the telnet client without modifications to the server applications. Further disclosed are methods for providing speech recognition and voice functionality to telnet clients. Portions of input data are converted to voice and played to the user. A user also may provide input to certain fields of the telnet session by using his voice. Scripts running on the telnet client convert the user's voice into text, which is then inserted into the corresponding fields.
    Type: Application
    Filed: May 9, 2012
    Publication date: August 30, 2012
    Applicant: WAVELINK CORPORATION
    Inventors: Lamar John VAN WAGENEN, Brant David THOMSEN, Scott Allen CADDES
  • Publication number: 20120215540
    Abstract: The present invention relates to a method for selecting and downloading content from a content provider which is accessible via an IP/DNS/URL address to a mobile device, the content being any text information data, for converting the text information data to at least one audio message and for storing the at least one audio message as at least one audio file on the mobile device, wherein the at least one audio file is playable and discernable as a music file. Said method implemented on a mobile phone enables controlling and playing the audio messages as music files, for instance also in a car environment with a car kit enabling a control and a selection of one or more of said at least one audio files for playing from the mobile phone.
    Type: Application
    Filed: February 14, 2012
    Publication date: August 23, 2012
    Applicant: Beyo GmbH
    Inventor: Cüneyt Göktekin
  • Publication number: 20120209611
    Abstract: A synthesis filter 106 synthesizes a plurality of wide-band speech signals by combining wide-band phoneme signals and sound source signals from a speech signal code book 105, and a distortion evaluation unit 107 selects one of the wide-band speech signals with a minimum waveform distortion with respect to an up-sampled narrow-band speech signal output from a sampling conversion unit 101. A first bandpass filter 103 extracts a frequency component outside a narrow-band of the wide-band speech signal and a band synthesis unit 104 combines it with the up-sampled narrow-band speech signal.
    Type: Application
    Filed: October 22, 2010
    Publication date: August 16, 2012
    Applicant: Mitsubishi Electric Corporation
    Inventors: Satoru Furuta, Hirohisa Tasaki
  • Publication number: 20120203554
    Abstract: In one general aspect, emergency information for a person is received from a user. A unique identifier for the person is generated. The unique identifier is associated with the emergency information. The emergency information is stored on an emergency information device. The unique identifier is associated with the emergency information device. The emergency information device is sent to the user.
    Type: Application
    Filed: April 19, 2012
    Publication date: August 9, 2012
    Inventor: Linda Dougherty-Clark
  • Publication number: 20120197646
    Abstract: A system and method for processing voice requests from a user for accessing information on a computerized network and delivering information from a script server and an audio server in the network in audio format. A voice user interface subsystem includes: a dialog engine that is operable to interpret requests from users from the user input, communicate the requests to the script server and the audio server, and receive information from the script server and the audio server; a media telephony services (MTS) server, wherein the MTS server is operable to receive user input via a telephony system, and to transfer the user input to the dialog engine; and a broker coupled between the dialog engine and the MTS server. The broker establishes a session between the MTS server and the dialog engine and controls telephony functions with the telephony system.
    Type: Application
    Filed: April 16, 2012
    Publication date: August 2, 2012
    Applicant: Ben Franklin Patent Holding, LLC
    Inventors: Marianna TESSEL, Danny Lange, Eugene Ponomarenko, Mitsuru Oshima, Daniel Burkes, Tjoen Min Tjong
  • Publication number: 20120190407
    Abstract: Provided is portable electronic equipment capable of mutually converting character information and simplified character information. The portable electronic equipment is equipped with a display unit (21); a character information acquisition unit (41) that acquires character information; a trigger signal detection unit (42) that detects a prescribed trigger signal; a character information conversion unit (43) that simplifies character information by extracting sentence elements from the character information and rearranging the sentence elements into a prescribed order or simplifies the character information by replacing prescribed words in the character information with symbols pertaining to said words, when the trigger signal is detected by the trigger signal detection unit (42); and a display control unit (44) that displays on the display unit (21) the character information simplified by the character information conversion unit (43).
    Type: Application
    Filed: July 26, 2010
    Publication date: July 26, 2012
    Applicant: KYOCERA CORPORATION
    Inventors: Atsushi Miura, Yasumasa Sekigami, Shuuji Ishikawa