Speech Synthesis; Text To Speech Systems (epo) Patents (Class 704/E13.001)
  • Publication number: 20120188892
    Abstract: A network abstraction gateway includes at least one abstracted network interface for connectivity with an abstracted network wherein a user has an abstracted endpoint having a first identity in the abstracted network; a communication system interface for connectivity with at least one user's communication system and exposing abstracted endpoint behavior via a second identity in the user's communication system; means adapted to register a one-to-one relationship between the first identity and the second identity; means for extracting behavior of the abstracted endpoint; and endpoint abstraction means adapted to abstract the abstracted endpoint in the user's communication system via an endpoint abstraction using the second identity. The endpoint abstraction is responsive to behavior of the abstracted endpoint and is adapted to implement at least one feature and/or state of the user's communication system and to bi-directionally map the behavior of the abstracted endpoint.
    Type: Application
    Filed: January 25, 2012
    Publication date: July 26, 2012
    Applicant: ESCAUX NV
    Inventors: Amaury Jean Robert DEMILIE, Jordi Pierre Victor Serge NELISSEN
  • Publication number: 20120185254
    Abstract: In a system, an interactive figurine delivers messages to a user in one of a number of forms. A server operation system includes processing capability which may individually couple content or may customize messages to a particular user of the interactive figurines. The interactive figurine contains an embedded circuit consisting of a receiver comprising a detector circuit tuned to at least one preselected frequency, a decoder to provide information indicative of intelligence and signals sent to the receiver, and a decoder circuit to provide actionable output signals indicative of information transmitted to the receiver. The server operation system may include a subscriber database and administration routines for customizing of messages and for directing messages. A user station intermediate the interactive figurine and the server module may be used to provide parental control or other control.
    Type: Application
    Filed: January 18, 2012
    Publication date: July 19, 2012
    Inventors: William A. Biehler, Gary W. Smith
  • Publication number: 20120179468
    Abstract: Briefly, in accordance with one or more embodiments, an image processing system is capable of receiving an image containing text, applying optical character recognition to the image, and then audibly reproducing the text via text-to-speech synthesis. Prior to optical character recognition, an orientation corrector is capable of detecting an amount of angular rotation of the text in the image with respect to horizontal, and then rotating the image by an appropriate amount to sufficiently align the text with respect to horizontal for optimal optical character recognition. The detection may be performed using steerable filters to provide an energy versus orientation curve of the image data. A maximum of the energy curve may indicate the amount of angular rotation that may be corrected by the orientation corrector.
    Type: Application
    Filed: March 20, 2012
    Publication date: July 12, 2012
    Inventor: Oscar Nestares
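The abstract above detects text rotation from an energy-versus-orientation curve built with steerable filters. As a rough, hedged sketch of the same idea (using a simple gradient-orientation energy histogram rather than the patented steerable-filter method, and a synthetic stripe image standing in for text), the dominant angle can be read off the histogram peak:

```python
import numpy as np

def estimate_text_rotation(image, n_bins=180):
    """Estimate the dominant text orientation (degrees from horizontal)
    from a gradient-orientation energy histogram -- a simplified stand-in
    for the steerable-filter energy-vs-orientation curve."""
    gy, gx = np.gradient(image.astype(float))
    magnitude = np.hypot(gx, gy)
    # Gradients of horizontal text strokes point mostly vertically,
    # so subtract 90 degrees to recover the baseline angle.
    angle = (np.degrees(np.arctan2(gy, gx)) - 90.0) % 180.0
    hist, edges = np.histogram(angle, bins=n_bins, range=(0, 180),
                               weights=magnitude)
    peak = edges[np.argmax(hist)]
    # Report in (-90, 90]: the rotation needed to realign the text.
    return peak if peak <= 90 else peak - 180

# Synthetic "text": horizontal stripes, i.e. zero rotation.
img = np.zeros((64, 64))
img[::8, :] = 1.0
print(estimate_text_rotation(img))
```

The correction step would then rotate the image by the negative of the returned angle before running OCR.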
  • Publication number: 20120177345
    Abstract: Various technologies and techniques are disclosed for automatically creating videos from text. The text to be converted is received. Images are located that correspond to the subject matter in the text. Music for the video is selected. A spoken version of the text is created using text-to-speech conversion, or through an uploaded recording. A video is created using the images, music, and voice. The video is made available for playback, downloading, and/or submission to other systems. A blog/web site plug-in is disclosed that automatically creates a video from an article on the site. An automated video creation system is disclosed that includes a video creation module that receives text, images, music, video, and/or narration from a user. The video creation module creates the video from the user input along with programmatically selected inputs that include text, images, music, video, and/or narration.
    Type: Application
    Filed: January 9, 2012
    Publication date: July 12, 2012
    Inventors: Matthew Joe Trainer, Troy Michael George Gardner
  • Publication number: 20120173241
    Abstract: A multi-lingual text-to-speech system and method processes a text to be synthesized via an acoustic-prosodic model selection module and an acoustic-prosodic model mergence module, and obtains a phonetic unit transformation table. In an online phase, the acoustic-prosodic model selection module, according to the text and a phonetic unit transcription corresponding to the text, uses at least one controllable accent weighting parameter to select a transformation combination and find a first and a second acoustic-prosodic model. The acoustic-prosodic model mergence module merges the two acoustic-prosodic models into a merged acoustic-prosodic model according to the at least one controllable accent weighting parameter, processes all transformations in the transformation combination, and generates a merged acoustic-prosodic model sequence. A speech synthesizer and the merged acoustic-prosodic model sequence are further applied to synthesize the text into an L1-accent L2 speech.
    Type: Application
    Filed: August 25, 2011
    Publication date: July 5, 2012
    Applicant: INDUSTRIAL TECHNOLOGY RESEARCH INSTITUTE
    Inventors: Jen-Yu LI, Jia-Jang Tu, Chih-Chung Kuo
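The merging step above combines two acoustic-prosodic models under a controllable accent weight. The abstract does not give the interpolation formula; one plausible form, sketched here with toy Gaussian models (mean/variance vectors of assumed features such as F0 and duration), is a linear interpolation controlled by the weight:

```python
import numpy as np

def merge_models(model_l1, model_l2, accent_weight):
    """Merge two Gaussian acoustic-prosodic models by linearly
    interpolating their means and variances. accent_weight in [0, 1]:
    0 -> pure L2 model, 1 -> full L1 accent (illustrative convention)."""
    w = float(accent_weight)
    return {
        "mean": w * model_l1["mean"] + (1 - w) * model_l2["mean"],
        "var":  w * model_l1["var"]  + (1 - w) * model_l2["var"],
    }

# Toy models: [mean F0 in Hz, mean phone duration in s].
l1 = {"mean": np.array([200.0, 0.12]), "var": np.array([400.0, 0.001])}
l2 = {"mean": np.array([120.0, 0.10]), "var": np.array([250.0, 0.002])}
merged = merge_models(l1, l2, accent_weight=0.25)
print(merged["mean"])
```

Sweeping the weight from 0 to 1 would move the synthesized L2 speech from native-sounding toward a full L1 accent.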
  • Publication number: 20120172012
    Abstract: A method for controlling a mobile communications device while located in a mobile vehicle involves pairing the mobile communications device with a telematics unit via short range wireless communication. The method further involves receiving an incoming text message at the mobile device while the mobile device is paired with the telematics unit. Upon receiving the text message, a text messaging management strategy is implemented via the telematics unit and/or the mobile device, where the text messaging management strategy is executable via an application that is resident on the mobile device.
    Type: Application
    Filed: January 4, 2011
    Publication date: July 5, 2012
    Applicant: GENERAL MOTORS LLC
    Inventors: Anthony J. Sumcad, Shawn F. Granda, Lawrence D. Cepuran, Steven Swanson
  • Publication number: 20120166198
    Abstract: In one embodiment of a controllable prosody re-estimation system, a TTS/STS engine consists of a prosody prediction/estimation module, a prosody re-estimation module and a speech synthesis module. The prosody prediction/estimation module generates predicted or estimated prosody information. And then the prosody re-estimation module re-estimates the predicted or estimated prosody information and produces new prosody information, according to a set of controllable parameters provided by a controllable prosody parameter interface. The new prosody information is provided to the speech synthesis module to produce a synthesized speech.
    Type: Application
    Filed: July 11, 2011
    Publication date: June 28, 2012
    Applicant: INDUSTRIAL TECHNOLOGY RESEARCH INSTITUTE
    Inventors: Cheng-Yuan Lin, Chien-Hung Huang, Chih-Chung Kuo
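The re-estimation module above maps predicted prosody to new prosody under controllable parameters. The abstract does not state the mapping; a minimal sketch, assuming a linear transform of the F0 contour (scale about the contour mean, then shift), shows the shape of such an interface:

```python
def reestimate_prosody(f0, pitch_shift=0.0, range_scale=1.0):
    """Re-estimate a predicted F0 contour with controllable parameters
    (an illustrative linear transform, not the patented formula):
    scale the contour about its mean, then apply a constant shift."""
    mean = sum(f0) / len(f0)
    return [mean + range_scale * (v - mean) + pitch_shift for v in f0]

contour = [100.0, 120.0, 140.0]            # predicted F0, Hz
print(reestimate_prosody(contour, pitch_shift=10.0, range_scale=0.5))
```

The re-estimated contour would then feed the speech synthesis module in place of the raw prediction.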
  • Publication number: 20120166199
    Abstract: Methods, systems, and software for converting the audio input of a user of a hand-held client device or mobile phone into a textual representation by means of a backend server accessed by the device through a communications network. The text is then inserted into or used by an application of the client device to send a text message, instant message, email, or to insert a request into a web-based application or service. In one embodiment, the method includes the steps of initializing or launching the application on the device; recording and transmitting the recorded audio message from the client device to the backend server through a client-server communication protocol; converting the transmitted audio message into the textual representation in the backend server; and sending the converted text message back to the client device or forwarding it on to an alternate destination directly from the server.
    Type: Application
    Filed: February 13, 2012
    Publication date: June 28, 2012
    Inventors: Victor R. Jablokov, Igor R. Jablokov, Marc White
  • Publication number: 20120158406
    Abstract: To facilitate text-to-speech conversion of a username, a first or last name of a user associated with the username may be retrieved, and a pronunciation of the username may be determined based at least in part on whether the name forms at least part of the username. To facilitate text-to-speech conversion of a domain name having a top level domain and at least one other level domain, a pronunciation for the top level domain may be determined based at least in part upon whether the top level domain is one of a predetermined set of top level domains. Each other level domain may be searched for one or more recognized words therewithin, and a pronunciation of the other level domain may be determined based at least in part on an outcome of the search. The username and domain name may form part of a network address such as an email address, URL or URI.
    Type: Application
    Filed: February 23, 2012
    Publication date: June 21, 2012
    Inventors: Matthew BELLS, Jennifer Elizabeth LHOTAK, Michael Angelo NANNI
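The abstract above decides whether to speak or spell a username and a domain. A hedged sketch of that decision logic (the spoken-TLD set and the recognized-word lexicon here are assumptions for illustration, not the patented sets):

```python
SPOKEN_TLDS = {"com", "org", "net", "edu", "gov"}   # assumed spoken set
WORDS = {"example", "mail"}                         # stand-in word lexicon

def pronounce_email(address, first_name, last_name):
    """Say the username as a word when it embeds the user's first or
    last name, spell it otherwise; say a TLD as a word only when it is
    in the predetermined spoken set; speak other-level domains as words
    when they match the lexicon, else spell them."""
    username, domain = address.split("@")
    labels = domain.split(".")
    parts = []
    if (first_name.lower() in username.lower()
            or last_name.lower() in username.lower()):
        parts.append(username)            # read as a word
    else:
        parts.append(" ".join(username))  # spell letter by letter
    parts.append("at")
    for label in labels[:-1]:
        parts.append(label if label.lower() in WORDS else " ".join(label))
        parts.append("dot")
    tld = labels[-1]
    parts.append(tld if tld in SPOKEN_TLDS else " ".join(tld.upper()))
    return " ".join(parts)

print(pronounce_email("jsmith@example.com", "John", "Smith"))
```

Here "jsmith" contains the last name, so it is read as a word rather than spelled out.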
  • Publication number: 20120150544
    Abstract: A system for reconstructing speech from an input signal comprising whispers is disclosed. The system comprises an analysis unit configured to analyse the input signal to form a representation of the input signal; an enhancement unit configured to modify the representation of the input signal to adjust a spectrum of the input signal, wherein the adjusting of the spectrum of the input signal comprises modifying a bandwidth of at least one formant in the spectrum to achieve a predetermined spectral energy distribution and amplitude for the at least one formant; and a synthesis unit configured to reconstruct speech from the modified representation of the input signal.
    Type: Application
    Filed: August 25, 2010
    Publication date: June 14, 2012
    Inventors: Ian Vince McLoughlin, Hamid Reza Sharifzadeh, Farzaneh Ahmadi
  • Publication number: 20120150543
    Abstract: A personality-based theme may be provided. An application program may query a personality resource file for a prompt corresponding to a personality. Then the prompt may be received at a speech synthesis engine. Next, the speech synthesis engine may query a personality voice font database for a voice font corresponding to the personality. Then the speech synthesis engine may apply the voice font to the prompt. The voice font applied prompt may then be produced at an output device.
    Type: Application
    Filed: February 24, 2012
    Publication date: June 14, 2012
    Applicant: Microsoft Corporation
    Inventors: Hugh A. Teegan, Eric N. Badger, Drew E. Linerud
  • Publication number: 20120143611
    Abstract: Hidden Markov Model (HMM) trajectory tiling (HTT)-based approaches may be used to synthesize speech from text. In operation, a set of HMMs and a set of waveform units may be obtained from a speech corpus. The set of HMMs are further refined via minimum generation error (MGE) training to generate a refined set of HMMs. Subsequently, a speech parameter trajectory may be generated by applying the refined set of HMMs to an input text. A unit lattice of candidate waveform units may be selected from the set of waveform units based at least on the speech parameter trajectory. A normalized cross-correlation (NCC)-based search on the unit lattice may be performed to obtain a minimal concatenation cost sequence of candidate waveform units, which are concatenated into a concatenated waveform sequence that is synthesized into speech.
    Type: Application
    Filed: December 7, 2010
    Publication date: June 7, 2012
    Applicant: Microsoft Corporation
    Inventors: Yao Qian, Zhi-Jie Yan, Yi-Jian Wu, Frank Kao-Ping Soong
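The search step above finds a minimal-cost unit sequence through a lattice. A minimal sketch of that step, assuming abstract target and concatenation cost functions (the patent's actual costs are NCC-based; here toy integer "units" with a mismatch cost are used for illustration), is a Viterbi-style dynamic program:

```python
def min_cost_sequence(lattice, target_cost, concat_cost):
    """Viterbi-style search for the candidate-unit sequence with minimal
    total target + concatenation cost over a unit lattice (one candidate
    list per target position)."""
    # best[u] = (cost of best path ending in unit u, that path)
    best = {u: (target_cost(0, u), [u]) for u in lattice[0]}
    for t in range(1, len(lattice)):
        new_best = {}
        for u in lattice[t]:
            prev, (c, path) = min(
                ((p, best[p]) for p in best),
                key=lambda kv: kv[1][0] + concat_cost(kv[0], u))
            new_best[u] = (c + concat_cost(prev, u) + target_cost(t, u),
                           path + [u])
        best = new_best
    return min(best.values(), key=lambda v: v[0])

# Toy lattice; concatenation cost is the mismatch between neighbours.
lattice = [[1, 5], [2, 9], [3, 4]]
cost, path = min_cost_sequence(
    lattice,
    target_cost=lambda t, u: 0.0,
    concat_cost=lambda a, b: abs(a - b))
print(cost, path)
```

With real waveform units, `concat_cost` would score the spectral join mismatch between consecutive units.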
  • Publication number: 20120143600
    Abstract: In a speech synthesis information editing apparatus, a phoneme storage unit stores phoneme information that designates a duration of each phoneme of speech to be synthesized. A feature storage unit stores feature information that designates a time variation in a feature of the speech. An edition processing unit changes a duration of each phoneme designated by the phoneme information with an expansion/compression degree depending on a feature designated by the feature information in correspondence to the phoneme.
    Type: Application
    Filed: December 1, 2011
    Publication date: June 7, 2012
    Applicant: Yamaha Corporation
    Inventor: Tatsuya IRIYAMA
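The editing step above stretches or compresses each phoneme's duration by a degree that depends on its feature value. The abstract leaves the mapping open; one illustrative choice (an assumption, not the patented rule) scales phonemes with larger feature values more strongly:

```python
def stretch_durations(durations, features, degree):
    """Scale each phoneme's duration by an expansion/compression factor
    derived from its feature value (illustrative mapping: the phoneme
    with the largest feature gets the full degree, the smallest none)."""
    lo, hi = min(features), max(features)
    span = (hi - lo) or 1.0
    out = []
    for d, f in zip(durations, features):
        weight = (f - lo) / span          # 0..1 within this utterance
        out.append(d * (1.0 + (degree - 1.0) * weight))
    return out

# Durations in ms, a normalized feature per phoneme, 2x max stretch.
print(stretch_durations([100, 80, 120], [0.0, 0.5, 1.0], degree=2.0))
```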
  • Publication number: 20120136663
    Abstract: A speech synthesis system can select recorded speech fragments, or acoustic units, from a very large database of acoustic units to produce artificial speech. The selected acoustic units are chosen to minimize a combination of target and concatenation costs for a given sentence. However, as concatenation costs, which are measures of the mismatch between sequential pairs of acoustic units, are expensive to compute, processing can be greatly reduced by pre-computing and caching the concatenation costs. The number of possible sequential pairs of acoustic units makes such caching prohibitive. Statistical experiments reveal that while about 85% of the acoustic units are typically used in common speech, less than 1% of the possible sequential pairs of acoustic units occur in practice. The system synthesizes a large body of speech, identifies the acoustic unit sequential pairs generated and their respective concatenation costs, and stores those concatenation costs likely to occur.
    Type: Application
    Filed: November 29, 2011
    Publication date: May 31, 2012
    Applicant: AT&T Intellectual Property II, L.P.
    Inventors: Mark Charles Beutnagel, Mehryar Mohri, Michael Dennis Riley
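The caching strategy above can be sketched directly: synthesize (here, simulate with toy unit sequences), record which sequential pairs occur, and cache only their costs, falling back to on-demand computation for unseen pairs. A hedged sketch under those assumptions:

```python
def build_concat_cache(training_sequences, concat_cost):
    """Pre-compute concatenation costs only for the unit pairs that
    actually occur in a large body of synthesized speech, exploiting the
    observation that under 1% of possible pairs arise in practice."""
    cache = {}
    for seq in training_sequences:
        for a, b in zip(seq, seq[1:]):
            if (a, b) not in cache:
                cache[(a, b)] = concat_cost(a, b)
    return cache

def cached_cost(cache, a, b, concat_cost):
    """Serve a cached cost; compute on demand for a pair never seen."""
    return cache[(a, b)] if (a, b) in cache else concat_cost(a, b)

# Toy integer "units"; cost is the mismatch between neighbours.
cache = build_concat_cache([[1, 2, 3], [2, 3, 5]],
                           concat_cost=lambda a, b: abs(a - b))
print(sorted(cache))
```

In a real system the cache would be persisted so unit selection pays the expensive cost computation only for the rare uncached pair.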
  • Publication number: 20120136664
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating speech. One variation of the method is from a server side, and another variation of the method is from a client side. The server side method, as implemented by a network-based automatic speech processing system, includes first receiving, from a network client independent of knowledge of internal operations of the system, a request to generate a text-to-speech voice. The request can include speech samples, transcriptions of the speech samples, and metadata describing the speech samples. The system extracts sound units from the speech samples based on the transcriptions and generates an interactive demonstration of the text-to-speech voice based on the sound units, the transcriptions, and the metadata, wherein the interactive demonstration hides a back end processing implementation from the network client. The system provides access to the interactive demonstration to the network client.
    Type: Application
    Filed: November 30, 2010
    Publication date: May 31, 2012
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Mark Charles Beutnagel, Alistair D. Conkie, Yeon-Jun Kim, Horst Juergen Schroeter
  • Publication number: 20120136665
    Abstract: Disclosed are an electronic device and a control method thereof. The electronic device includes a text-to-speech unit which converts a text into an audio signal; an audio output unit which outputs an audio corresponding to the converted audio signal; and a controller which controls the audio output unit to re-output at least one audio whose output was not completed, if there is at least one audio which was not completely output among a plurality of audios output by the audio output unit.
    Type: Application
    Filed: May 16, 2011
    Publication date: May 31, 2012
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventor: Mariusz PYTEL
  • Publication number: 20120123781
    Abstract: A touch screen device allowing blind people to operate objects displayed thereon and an object operating method in the touch screen device are provided. The touch screen device includes a touch sensing unit which, when sensing touches of a virtual keyboard for controlling application software being executed while the virtual keyboard is activated, generates key values corresponding to the touched positions, the number of touches, and the touch time, and transmits the key values to the application software; an object determination unit which reads text information of a focused object using a hooking mechanism when the application software is executed based on the key values received from the touch sensing unit and an object among the objects included in the application software is focused; and a speech synthesis unit which converts the text information read by the object determination unit into speech data using a text-to-speech engine and outputs the speech data.
    Type: Application
    Filed: February 11, 2011
    Publication date: May 17, 2012
    Inventors: Kun Park, Yong Suk Pak
  • Publication number: 20120116769
    Abstract: A method applies a parametric approach to bandwidth extension but does not require training. The method computes narrowband linear predictive coefficients from a received narrowband speech signal, computes narrowband partial correlation coefficients using recursion, computes Mnb area coefficients from the partial correlation coefficients, and extracts Mwb area coefficients using interpolation. Wideband parcors are computed from the Mwb area coefficients and wideband LPCs are computed from the wideband parcors. The method further comprises synthesizing a wideband signal using the wideband LPCs and a wideband excitation signal, highpass filtering the synthesized wideband signal to produce a highband signal, and combining the highband signal with the original narrowband signal to generate a wideband signal.
    Type: Application
    Filed: November 7, 2011
    Publication date: May 10, 2012
    Applicant: AT&T Intellectual Property II, L.P.
    Inventors: David Malah, Richard Vandervoort Cox
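Two of the steps above, parcors to area coefficients and narrowband-to-wideband interpolation, can be sketched concretely. The area recursion follows the standard lossless-tube relation A_i = A_{i+1}(1 - k_i)/(1 + k_i); the interpolation here is simple linear resampling, an assumption where the patent does not pin down the scheme:

```python
import numpy as np

def parcor_to_areas(parcors):
    """Convert reflection (PARCOR) coefficients to acoustic-tube area
    coefficients via A_i = A_{i+1} * (1 - k_i) / (1 + k_i), fixing the
    final (lip-end) area to 1."""
    areas = [1.0]
    for k in reversed(parcors):
        areas.append(areas[-1] * (1 - k) / (1 + k))
    return np.array(areas[::-1])

def interpolate_areas(areas_nb, m_wb):
    """Resample Mnb tube areas to Mwb sections by linear interpolation,
    the step that raises the model order from narrowband to wideband."""
    x_nb = np.linspace(0.0, 1.0, len(areas_nb))
    x_wb = np.linspace(0.0, 1.0, m_wb)
    return np.interp(x_wb, x_nb, areas_nb)

areas = parcor_to_areas([0.5, -0.2, 0.1])   # toy narrowband parcors
wide = interpolate_areas(areas, 7)
print(len(areas), len(wide))
```

The wideband areas would then be converted back to wideband parcors and LPCs to drive the synthesis filter.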
  • Publication number: 20120109653
    Abstract: Improved oscillator-based source modeling methods for estimating model parameters, for evaluating model quality for restoring the input from the model parameters, and for improving performance over methods known in the art are disclosed. An application of these innovations to speech coding is described. The improved oscillator model is derived from the information contained in the current input signal as well as from some form of data history, often the restored versions of the earlier processed data. Operations can be performed in real time, and compression can be achieved at a user-specified level of performance and, in some cases, without information loss. The new model can be combined with methods in the existing art in order to complement the properties of these methods, to improve overall performance. The present invention is effective for very low bit-rate coding/compression and decoding/decompression of digital signals, including digitized speech and audio signals.
    Type: Application
    Filed: October 29, 2010
    Publication date: May 3, 2012
    Inventors: Anton Yen, Irina Gorodnitsky
  • Publication number: 20120109656
    Abstract: Architecture for playing a document converted into an audio format to a user of an audio-output capable device. The user can interact with the device to control play of the audio document such as pause, rewind, forward, etc. In a more robust implementation, the audio-output capable device is a mobile device (e.g., cell phone) having a microphone for processing voice input. Voice commands can then be input to control play ("reading") of the document audio file to pause, rewind, read paragraph, read next chapter, fast forward, etc. A communications server (e.g., email, attachments to email, etc.) transcodes text-based document content into an audio format by leveraging a text-to-speech (TTS) engine. The transcoded audio files are then transferred to mobile devices through viable transmission channels. Users can then play the audio-formatted document while freeing hand and eye usage for other tasks.
    Type: Application
    Filed: January 9, 2012
    Publication date: May 3, 2012
    Applicant: MICROSOFT CORPORATION
    Inventors: Sheng-Yao Shih, Yun-Chiang Kung, Chiwei Che, Chih-Chung Wang
  • Publication number: 20120109654
    Abstract: Methods and apparatuses are provided for facilitating speech synthesis. A method may include generating a plurality of input models representing an input by using a statistical model synthesizer to statistically model the input. The method may further include determining a speech unit sequence representing at least a portion of the input by using the input models to influence selection of one or more pre-recorded speech units having parameter representations. The method may additionally include identifying one or more bad units in the unit sequence. The method may also include replacing the identified one or more bad units with one or more parameters generated by the statistical model synthesizer. Corresponding apparatuses are also provided.
    Type: Application
    Filed: May 2, 2011
    Publication date: May 3, 2012
    Applicant: NOKIA CORPORATION
    Inventors: Jani Kristian Nurminen, Hanna Margareeta Silen, Elina Helander
  • Publication number: 20120109759
    Abstract: A device including a processor and a memory, the processor executing software that performs a method of providing content to a device, the method including the steps of receiving an input from a sending device, gathering information pertaining to the sending device or a receiving device, searching a storage unit for content related to the information and the input, generating a message based on the content returned from the storage unit, incorporating the message into the input, and transmitting the input including the message to the receiving device.
    Type: Application
    Filed: October 27, 2011
    Publication date: May 3, 2012
    Inventors: Yaron Oren, Heath Ahrens
  • Publication number: 20120108221
    Abstract: Embodiments include applications as participants in a communication session such as a voice call. The applications provide functionality to the communication session by performing commands issued by the participants during the communication session to generate output data. Example functionality includes recording audio, playing music, obtaining search results, obtaining calendar data to schedule future meetings, etc. The output data is made available to the participants during the communication session.
    Type: Application
    Filed: October 28, 2010
    Publication date: May 3, 2012
    Applicant: MICROSOFT CORPORATION
    Inventors: Shawn M. Thomas, Taqi Jaffri, Omar Aftab
  • Publication number: 20120109655
    Abstract: An email system for mobile devices, such as cellular phones and PDAs, is disclosed which allows email messages to be played back on the mobile device as voice messages on demand by way of a media player, thus eliminating the need for a unified messaging system. Email messages are received by the mobile device in a known manner. In accordance with an important aspect of the invention, the email messages are identified by the mobile device as they are received. After the message is identified, the mobile device sends the email message in text format to a server for conversion to speech or voice format. After the message is converted to speech format, the server sends the messages back to the user's mobile device and notifies the user of the email message and then plays the message back to the user through a media player upon demand.
    Type: Application
    Filed: December 21, 2011
    Publication date: May 3, 2012
    Inventors: Stephen S. Burns, Mickey W. Kowitz
  • Publication number: 20120109630
    Abstract: A text-to-speech system adapted to operate on text in a first language including sections in a second language, includes a grapheme/phoneme transcriptor for converting the sections in the second language into phonemes of the second language; a mapping module configured for mapping at least part of the phonemes of the second language onto sets of phonemes of the first language; and a speech-synthesis module adapted to be fed with a resulting stream of phonemes including the sets of phonemes of the first language resulting from mapping and the stream of phonemes of the first language representative of the text, and to generate a speech signal from the resulting stream of phonemes.
    Type: Application
    Filed: January 10, 2012
    Publication date: May 3, 2012
    Applicant: Loquendo S.p.A.
    Inventors: Leonardo Badino, Claudia Barolo, Silvia Quazza
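The mapping module above replaces second-language phonemes with sets of first-language phonemes so a monolingual synthesizer can voice mixed-language text. A minimal sketch, with a toy mapping table that is purely illustrative (the real table would be built per language pair):

```python
# Assumed toy table: each English (L2) phoneme maps to a set of
# Italian (L1) phonemes; unmapped phonemes pass through unchanged.
L2_TO_L1 = {"TH": ["t"], "AE": ["a"], "NG": ["n", "g"]}

def map_phonemes(l2_phonemes, table=L2_TO_L1):
    """Replace each second-language phoneme with its first-language
    phoneme set, producing a stream the L1 synthesizer can voice."""
    out = []
    for ph in l2_phonemes:
        out.extend(table.get(ph, [ph]))
    return out

print(map_phonemes(["TH", "AE", "NG", "k"]))
```

The resulting stream is then concatenated with the native L1 phoneme stream and fed to the speech-synthesis module.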
  • Publication number: 20120095767
    Abstract: A device includes: an input speech separation unit which separates an input speech into vocal tract information and voicing source information; a mouth opening degree calculation unit which calculates a mouth opening degree from the vocal tract information; a target vowel database storage unit which stores pieces of vowel information on a target speaker; an agreement degree calculation unit which calculates a degree of agreement between the calculated mouth opening degree and a mouth opening degree included in the vowel information; a target vowel selection unit which selects the vowel information from among the pieces of vowel information, based on the calculated agreement degree; a vowel transformation unit which transforms the vocal tract information on the input speech, using vocal tract information included in the selected vowel information; and a synthesis unit which generates a synthetic speech using the transformed vocal tract information and the voicing source information.
    Type: Application
    Filed: December 22, 2011
    Publication date: April 19, 2012
    Inventors: Yoshifumi HIROSE, Takahiro Kamai
  • Publication number: 20120088543
    Abstract: A system and a method are provided for displaying text in low-light environments. An original image of text is captured in a low-light environment using a camera on a mobile device, wherein the imaged text comprises images of characters. A brightness setting and a contrast setting of the original image are adjusted to increase the contrast of the imaged text relative to a background of the original image. Optical character recognition is applied to the adjusted image to generate computer readable text or characters corresponding to the imaged text. The original image of text is displayed on the mobile device. The computer readable text is also displayed, overlaid on the original image, wherein the computer readable text is aligned with the corresponding imaged text.
    Type: Application
    Filed: October 8, 2010
    Publication date: April 12, 2012
    Applicant: Research In Motion Limited
    Inventors: Jeffery Lindner, James Hymel
  • Publication number: 20120089401
    Abstract: Methods and apparatus to audibly provide messages in a mobile device are described. An example method includes receiving a message at a mobile device, wherein the message includes an identification of a sender, an identification of a recipient, and message contents; determining that the message contents include a predetermined phrase; and, in response to determining that the message contents include the predetermined phrase, audibly presenting the message contents.
    Type: Application
    Filed: October 8, 2010
    Publication date: April 12, 2012
    Inventors: JAMES ALLEN HYMEL, JEFFERY ERHARD LINDNER
  • Publication number: 20120089400
    Abstract: The present invention relates to information systems. More specifically, the present invention relates to infrastructure and techniques for improving Text-to-Speech-enabled applications.
    Type: Application
    Filed: October 6, 2010
    Publication date: April 12, 2012
    Inventor: Caroline Gilles Henton
  • Publication number: 20120089399
    Abstract: A method of operating a mobile communication device is described. A text message is received over a wireless messaging channel, wherein the text message contains a non-text representation of an utterance. The non-text representation is extracted from the text message, and an audio representation of the spoken utterance is synthesized from the non-text representation.
    Type: Application
    Filed: December 19, 2011
    Publication date: April 12, 2012
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventor: Daniel L. Roth
  • Publication number: 20120078633
    Abstract: According to one embodiment, a reading aloud support apparatus includes a reception unit, a first extraction unit, a second extraction unit, an acquisition unit, a generation unit, and a presentation unit. The reception unit is configured to receive an instruction. The first extraction unit is configured to extract, as a partial document, a part of a document which corresponds to a range of words. The second extraction unit is configured to perform morphological analysis and to extract words as candidate words. The acquisition unit is configured to acquire attribute information items related to the candidate words. The generation unit is configured to perform weighting relating to a value corresponding to a distance and to determine each of the candidate words to be preferentially presented, to generate a presentation order. The presentation unit is configured to present the candidate words and the attribute information items in accordance with the presentation order.
    Type: Application
    Filed: March 22, 2011
    Publication date: March 29, 2012
    Inventors: Kosei Fume, Masaru Suzuki, Yuji Shimizu, Tatsuya Izuha
  • Publication number: 20120078632
    Abstract: An optical device includes a fast Fourier transform (FFT) unit, a signal noise ratio (SNR) calculation processing unit, a band selecting unit, an extension-signal creating unit, an addition unit, and an inverse fast Fourier transform (IFFT) unit. The FFT unit performs the Fourier transform on an input signal that is input from the outside. The SNR calculation processing unit calculates an SNR with respect to each of the bands in the input signal. The band selecting unit selects the band whose SNR exceeds a threshold and is the maximum SNR, based on the respective SNRs of the bands. The extension-signal creating unit creates an extension signal based on a signal acquired by the band selecting unit. The addition unit adds the extension signal to the input signal, and creates a band-extended signal. The IFFT unit performs the inverse fast Fourier transform on the band-extended signal, and creates an output signal.
    Type: Application
    Filed: June 13, 2011
    Publication date: March 29, 2012
    Applicant: FUJITSU LIMITED
    Inventors: Taro Togawa, Shusaku Ito, Takeshi Otani, Masanao Suzuki, Yasuji Ota
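The band-selection step above can be sketched in a few lines: transform the input, split the spectrum into bands, score each against a noise estimate, and keep the band whose SNR is maximal and above threshold. A hedged sketch, assuming the per-band noise floor is already known (how the patent estimates it is not stated here):

```python
import numpy as np

def select_extension_band(signal, noise_floor, n_bands=8, threshold=3.0):
    """FFT the input, compute per-band SNR in dB against an assumed
    known noise floor, and return the index of the band whose SNR is
    maximal and above the threshold (None when no band qualifies)."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    bands = np.array_split(spectrum, n_bands)
    snrs = [10 * np.log10(b.mean() / noise_floor + 1e-12) for b in bands]
    best = int(np.argmax(snrs))
    return best if snrs[best] > threshold else None

# A 440 Hz tone at 8 kHz sampling plus weak noise: the tone sits in
# the lowest of the 8 bands, so band 0 should be selected.
t = np.arange(1024) / 8000.0
tone = np.sin(2 * np.pi * 440.0 * t)
noisy = tone + 0.01 * np.random.default_rng(0).standard_normal(1024)
print(select_extension_band(noisy, noise_floor=1e-4))
```

The selected band's content would then seed the extension signal added back before the inverse FFT.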
  • Publication number: 20120072223
    Abstract: Systems and methods for providing synthesized speech in a manner that takes into account the environment where the speech is presented. A method embodiment includes, based on a listening environment and at least one other parameter associated with the listening environment, selecting an approach from a plurality of approaches for presenting synthesized speech in the listening environment; presenting synthesized speech according to the selected approach; and, based on natural language input received from a user indicating an inability to understand the presented synthesized speech, selecting a second approach from the plurality of approaches and presenting subsequent synthesized speech using the second approach.
    Type: Application
    Filed: November 23, 2011
    Publication date: March 22, 2012
    Applicant: AT&T Intellectual Property II, L.P.
    Inventors: Kenneth H. Rosen, Carroll W. Creswell, Jeffrey J. Farah, Pradeep K. Bansal, Ann K. Syrdal
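The environment-driven selection with fallback on listener feedback can be sketched as below; this is a toy stand-in under assumed data structures (approach names and suitability sets are illustrative), not the claimed method.

```python
def choose_approach(approaches, environment, rejected=()):
    """Pick the first approach rated suitable for the listening environment
    that the listener has not already rejected; None if nothing fits.

    approaches: list of (name, set_of_suitable_environments) pairs.
    rejected: approaches the user reported being unable to understand."""
    for name, suited_envs in approaches:
        if environment in suited_envs and name not in rejected:
            return name
    return None
```

On a "could not understand" response, the caller would add the current approach to `rejected` and call again to obtain the second approach.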
  • Publication number: 20120072204
    Abstract: A method and system for processing input media for provision to a text-to-speech engine, comprising: a rules engine configured to maintain and update rules for processing the input media; a pre-parsing filter module configured to determine one or more metadata attributes using pre-parsing rules; a parsing filter module configured to identify a content component from the input media using the parsing rules; a context and language detector configured to determine a default context and a default language; a learning agent configured to divide the content component into units of interest; a tagging module configured to iteratively assign tags to the units of interest using the tagging rules, wherein each tag is associated with a post-parsing rule; and a post-parsing filter module configured to modify the content component by executing the post-parsing rules identified by the tags assigned to the units of interest.
    Type: Application
    Filed: September 22, 2010
    Publication date: March 22, 2012
    Applicant: Voice On The Go Inc.
    Inventors: Babak Nasri, Selva Thayaparam
  • Publication number: 20120065978
    Abstract: In voice processing, a first distribution generation unit approximates a distribution of feature information representative of voice of a first speaker per a unit interval thereof as a mixed probability distribution which is a mixture of a plurality of first probability distributions corresponding to a plurality of different phones. A second distribution generation unit also approximates a distribution of feature information representative of voice of a second speaker as a mixed probability distribution which is a mixture of a plurality of second probability distributions. A function generation unit generates, for each phone, a conversion function for converting the feature information of voice of the first speaker to that of the second speaker based on respective statistics of the first and second probability distributions that correspond to the phone.
    Type: Application
    Filed: September 14, 2011
    Publication date: March 15, 2012
    Applicant: YAMAHA CORPORATION
    Inventor: Fernando VILLAVICENCIO
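The per-phone conversion function built from the statistics of the two speakers' distributions can be illustrated with a simple mean/variance mapping. This is a common textbook form, not necessarily the patented formulation; the per-dimension means and standard deviations are assumed inputs.

```python
def make_conversion_function(mu_src, sigma_src, mu_tgt, sigma_tgt):
    """Build a per-phone feature conversion from the mean and standard
    deviation of the source and target speakers' distributions for that
    phone: each dimension is normalized under the source statistics and
    re-scaled under the target statistics."""
    def convert(features):
        return [mt + st * (x - ms) / ss
                for x, ms, ss, mt, st
                in zip(features, mu_src, sigma_src, mu_tgt, sigma_tgt)]
    return convert
```

In the described system, one such function would be generated per phone, using the first and second probability distributions that correspond to that phone.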
  • Publication number: 20120065980
    Abstract: An electronic device for coding a transient frame is described. The electronic device includes a processor and executable instructions stored in memory that is in electronic communication with the processor. The electronic device obtains a current transient frame. The electronic device also obtains a residual signal based on the current transient frame. Additionally, the electronic device determines a set of peak locations based on the residual signal. The electronic device further determines whether to use a first coding mode or a second coding mode for coding the current transient frame based on at least the set of peak locations. The electronic device also synthesizes an excitation based on the first coding mode if the first coding mode is determined. The electronic device also synthesizes an excitation based on the second coding mode if the second coding mode is determined.
    Type: Application
    Filed: September 8, 2011
    Publication date: March 15, 2012
    Applicant: QUALCOMM Incorporated
    Inventors: Venkatesh Krishnan, Ananthapadmanabhan Arasanipalai Kandhadai
  • Publication number: 20120053935
    Abstract: In one implementation, speech or audio is converted to a searchable format by a speech recognition system. The speech recognition system uses a language model including probabilities of certain words occurring, which may depend on the occurrence of other words or sequences of words. The language model is partially built from personal vocabularies. Personal vocabularies are determined by known text from network traffic, including emails and Internet postings. The speech recognition system may incorporate the personal vocabulary of one user into the language model of another user based on a connection between the two users. The connection may be triggered by an email, a phone call, or an interaction in a social networking service. The speech recognition system may remove or add personal vocabularies to the language model based on a calculated confidence score from the resulting language model.
    Type: Application
    Filed: August 27, 2010
    Publication date: March 1, 2012
    Applicant: Cisco Technology, Inc.
    Inventors: Ashutosh A. Malegaonkar, Gannu Satish Kumar, Guido K.M. Jouret
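Incorporating one user's personal vocabulary into another user's language model, as described above, can be sketched at the unigram level. The connection `weight` and the relative-frequency estimate are illustrative assumptions, not details from the patent.

```python
from collections import Counter

def merge_vocabulary(base_counts, personal_counts, weight=0.5):
    """Fold one user's personal-vocabulary word counts into another user's
    unigram model, down-weighted by a hypothetical connection weight."""
    merged = Counter(base_counts)
    for word, count in personal_counts.items():
        merged[word] += weight * count
    return merged

def unigram_prob(counts, word):
    """Relative-frequency estimate of a word under the merged model."""
    total = sum(counts.values())
    return counts[word] / total if total else 0.0
```

A confidence score computed on the merged model could then decide whether the added vocabulary stays or is removed, as the abstract describes.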
  • Publication number: 20120054030
    Abstract: An Internet telematics service providing system and method that may provide various Web contents and information is disclosed. The Internet telematics service providing system may include a request information receiver to receive request information about the Internet telematics service from a telematics system of a vehicle, a content selector to select Internet content associated with the request information via an Internet content providing server, and a content providing unit to provide the selected content to the telematics system. The telematics system may output the content to a user in an auditory form.
    Type: Application
    Filed: August 24, 2011
    Publication date: March 1, 2012
    Applicant: NHN CORPORATION
    Inventors: Kyung Bo NAM, Choong Hee LEE
  • Publication number: 20120046949
    Abstract: A first person narrates a selected written text to generate a reference audio file. One or more parameters are selected from the sounds of the reference audio file, including the duration of a sound, the duration of a pause, the rise and fall of frequency relative to a reference frequency, and/or the volume differential between select sounds. A voice profile library contains a phonetic library of sounds spoken by a subject speaker. An integration module generates a preliminary audio file of the selected text in the voice of the subject speaker and then modifies individual sounds by the parameters from the reference file, forming a hybrid audio file. The hybrid audio file retains the tonality of the subject voice but incorporates the rhythm, cadence, and inflections of the reference voice. The reference audio file and/or the hybrid audio file are licensed or sold as part of a commercial transaction.
    Type: Application
    Filed: August 17, 2011
    Publication date: February 23, 2012
    Inventors: Patrick John Leddy, Ronald R. Shea
  • Publication number: 20120041765
    Abstract: An electronic book reader includes a text obtaining module, a text analysis module, a speech synthesis module, a control module, and an audio output device. The text obtaining module is used for obtaining a selected segment of a text. The text analysis module is used for analyzing a time phrase of the selected segment to obtain a waiting time period according to the meaning of the time phrase in the selected segment. The speech synthesis module is used for converting the selected segment into speech. The control module is used for sending the content of the selected segment to the speech synthesis module, and waits for the waiting time period after sending the time phrase to the speech synthesis module. The audio output device is used for playing the speech.
    Type: Application
    Filed: May 10, 2011
    Publication date: February 16, 2012
    Applicant: HON HAI PRECISION INDUSTRY CO., LTD.
    Inventors: CHIA-HUNG CHIEN, TUN-TAO TSAI, CHUN-WEN WANG, LIANG-MAO HUNG
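Deriving a waiting period from the meaning of a time phrase could look like the lookup below. The phrase table and pause lengths are hypothetical placeholders; the patent derives the wait from the phrase's meaning in the text.

```python
# Hypothetical mapping from time phrases to pause lengths in seconds.
TIME_PHRASES = {
    "a moment later": 1.0,
    "after a while": 2.0,
    "hours later": 3.0,
}

def waiting_period(segment):
    """Return the pause (seconds) implied by the first recognized time
    phrase in the segment, or 0.0 if no time phrase is found."""
    lowered = segment.lower()
    for phrase, seconds in TIME_PHRASES.items():
        if phrase in lowered:
            return seconds
    return 0.0
```

The control module would insert this pause into playback after the synthesized time phrase, mimicking narrative time passing.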
  • Publication number: 20120035933
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating a synthetic voice. A system configured to practice the method combines a first database of a first text-to-speech voice and a second database of a second text-to-speech voice to generate a combined database, selects from the combined database, based on a policy, voice units of a phonetic category for the synthetic voice to yield selected voice units, and synthesizes speech based on the selected voice units. The system can synthesize speech without parameterizing the first text-to-speech voice and the second text-to-speech voice. A policy can define, for a particular phonetic category, from which text-to-speech voice to select voice units. The combined database can include multiple text-to-speech voices from different speakers. The combined database can include voices of a single speaker speaking in different styles. The combined database can include voices of different languages.
    Type: Application
    Filed: August 6, 2010
    Publication date: February 9, 2012
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Alistair D. CONKIE, Ann K. Syrdal
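The policy that defines, per phonetic category, which text-to-speech voice to draw units from can be sketched as a simple filter over the combined database. The tuple layout and names are assumptions for illustration only.

```python
def select_units(combined_db, policy, category):
    """Return candidate units of one phonetic category, restricted to the
    voice the policy names for that category (all voices if unlisted).

    combined_db: list of (voice, phonetic_category, unit) tuples.
    policy: dict mapping phonetic category -> preferred voice."""
    voice = policy.get(category)
    return [unit for v, cat, unit in combined_db
            if cat == category and (voice is None or v == voice)]
```

Selected units would then feed ordinary unit-selection synthesis, with no parameterization of either source voice.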
  • Publication number: 20120035934
    Abstract: In several embodiments, a speech generation device is disclosed. The speech generation device may generally include a projector configured to project images in the form of a projected display onto a projection surface, an optical input device configured to detect an input directed towards the projected display and a speaker configured to generate an audio output. In addition, the speech generation device may include a processing unit communicatively coupled to the projector, the optical input device and the speaker. The processing unit may include a processor and related computer readable medium configured to store instructions executable by the processor, wherein the instructions stored on the computer readable medium configure the speech generation device to generate text-to-speech output.
    Type: Application
    Filed: August 3, 2011
    Publication date: February 9, 2012
    Applicant: DYNAVOX SYSTEMS LLC
    Inventor: BOB CUNNINGHAM
  • Publication number: 20120029920
    Abstract: A handheld device includes an image input device capable of acquiring images, circuitry to send a representation of the image to a remote computing system that performs at least one processing function related to processing the image and circuitry to receive from the remote computing system data based on processing the image by the remote system.
    Type: Application
    Filed: October 11, 2011
    Publication date: February 2, 2012
    Inventors: Raymond C. Kurzweil, Paul Albrecht, Lucy Gibson
  • Publication number: 20120016674
    Abstract: Techniques are disclosed for modifying speech quality in a conversation over a voice channel. For example, a method for modifying a speech quality associated with a spoken utterance transmittable over a voice channel comprises the following steps. The spoken utterance is obtained prior to an intended recipient of the spoken utterance receiving the spoken utterance. An existing speech quality of the spoken utterance is determined. The existing speech quality of the spoken utterance is compared to at least one desired speech quality associated with at least one previously obtained spoken utterance to determine whether the existing speech quality substantially matches the desired speech quality. At least one characteristic of the spoken utterance is modified to change the existing speech quality of the spoken utterance to the desired speech quality when the existing speech quality does not substantially match the desired speech quality.
    Type: Application
    Filed: July 16, 2010
    Publication date: January 19, 2012
    Applicant: International Business Machines Corporation
    Inventors: Sarah H. Basson, Dimitri Kanevsky, David Nahamoo, Tara N. Sainath
  • Publication number: 20120016675
    Abstract: A broadcast signal receiver comprises a text data receiver for receiving broadcast text data for display to a user in relation to a user interface; a text-to-speech (TTS) converter for converting received text data into an audio speech signal, the TTS converter being operable to detect whether a word for conversion is included in a stored list of words for conversion and, if so, to convert that word according to a conversion defined by the stored list; and if not, to convert that word according to a set of predetermined conversion rules; a conversion memory storing the list of words for conversion by the TTS converter; and an update receiver for receiving additional words and associated conversions for storage in the conversion memory.
    Type: Application
    Filed: June 1, 2011
    Publication date: January 19, 2012
    Applicant: Sony Europe Limited
    Inventors: Huw HOPKINS, Timothy EDMUNDS
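The receiver's conversion logic (exception list first, letter-to-sound rules otherwise) can be sketched as below. The rule table and pronunciations are invented for illustration; a real system would use a full grapheme-to-phoneme rule set.

```python
def pronounce(word, exceptions, letter_rules):
    """Convert a word using the stored exception list if present; otherwise
    fall back to rule-based letter-to-sound conversion."""
    key = word.lower()
    if key in exceptions:
        return exceptions[key]
    return " ".join(letter_rules.get(ch, ch) for ch in key)
```

The update receiver in the abstract would simply add new `(word, pronunciation)` pairs to the `exceptions` dictionary over the broadcast channel.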
  • Publication number: 20120010888
    Abstract: Methods and systems for providing a network-accessible text-to-speech synthesis service are provided. The service accepts content as input. After extracting textual content from the input content, the service transforms the content into a format suitable for high-quality speech synthesis. Additionally, the service produces audible advertisements, which are combined with the synthesized speech. The audible advertisements themselves can be generated from textual advertisement content.
    Type: Application
    Filed: August 29, 2011
    Publication date: January 12, 2012
    Inventor: James H. Stephens, JR.
  • Publication number: 20110320206
    Abstract: An electronic book reader includes a text obtaining module, a text highlighting module, a speech synthesis module, a player module, and a synchronization control module. The text obtaining module obtains a selected segment of a text. The text highlighting module highlights the selected segment. The speech synthesis module converts the selected segment into speech. The player module plays the speech. The synchronization control module sends the selected segment to the text highlighting module and the speech synthesis module synchronously.
    Type: Application
    Filed: April 22, 2011
    Publication date: December 29, 2011
    Applicant: HON HAI PRECISION INDUSTRY CO., LTD.
    Inventors: CHIH-HUNG CHEN, CHIA-HUNG CHIEN, CHIEN-CHOU CHEN
  • Publication number: 20110320207
    Abstract: The invention relates to a method for speech signal analysis, modification and synthesis comprising: a phase for locating analysis windows by means of an iterative process that determines the phase of the first sinusoidal component and compares that phase value with a predetermined value; a phase for selecting the analysis frames corresponding to an allophone and readjusting the duration and the fundamental frequency according to certain thresholds; and a phase for generating synthetic speech from synthesis frames, taking the information of the closest analysis frame as spectral information of the synthesis frame and taking as many synthesis frames as the synthetic signal has periods. The method allows a coherent location of the analysis windows within the periods of the signal and the exact generation of the synthesis instants synchronously with the fundamental period.
    Type: Application
    Filed: December 21, 2010
    Publication date: December 29, 2011
    Applicant: TELEFONICA, S.A.
    Inventors: Miguel Angel Rodriguez Crespo, Jose Gregorio Escalada Sardina, Ana Armenta Lopez Vicuna
  • Publication number: 20110313772
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for speech synthesis. A system practicing the method receives a set of ordered lists of speech units, for each respective speech unit in each ordered list in the set of ordered lists, constructs a sublist of speech units from a next ordered list which are suitable for concatenation, performs a cost analysis of paths through the set of ordered lists of speech units based on the sublist of speech units for each respective speech unit, and synthesizes speech using a lowest cost path of speech units through the set of ordered lists based on the cost analysis. The ordered lists can be ordered based on the respective pitch of each speech unit. In one embodiment, speech units which do not have an assigned pitch can be assigned a pitch.
    Type: Application
    Filed: June 18, 2010
    Publication date: December 22, 2011
    Applicant: AT&T Intellectual Property I, L.P.
    Inventor: Alistair D. CONKIE
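The cost analysis of paths through the ordered lists can be illustrated with a standard dynamic-programming (Viterbi-style) search. The cost functions here are stand-ins for the concatenation and target costs; this sketch omits the sublist-construction optimization the abstract describes.

```python
def lowest_cost_path(lists, join_cost, unit_cost):
    """Find the cheapest sequence picking one unit from each ordered list.
    Returns (total_cost, path). join_cost(a, b) scores concatenating unit
    b after unit a; unit_cost(u) scores the unit itself."""
    # best[u] = (cost of cheapest path ending at u, that path)
    best = {u: (unit_cost(u), [u]) for u in lists[0]}
    for next_list in lists[1:]:
        new_best = {}
        for v in next_list:
            cost, path = min(
                (c + join_cost(p[-1], v) + unit_cost(v), p)
                for c, p in best.values()
            )
            new_best[v] = (cost, path + [v])
        best = new_best
    return min(best.values())
```

With pitch-ordered lists, restricting each unit's candidate successors to a short sublist (as claimed) would shrink the inner minimization without changing the result.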
  • Publication number: 20110295606
    Abstract: A contextual conversion platform, and method for converting text-to-speech, are described that can convert content of a target to spoken content. Embodiments of the contextual conversion platform can identify certain contextual characteristics of the content, from which a spoken content input can be generated. This spoken content input can include tokens, e.g., words and abbreviations, to be converted to the spoken content, as well as substitution tokens that are selected from contextual repositories based on the context identified by the contextual conversion platform.
    Type: Application
    Filed: May 28, 2010
    Publication date: December 1, 2011
    Inventor: Daniel Ben-Ezri