Speech Synthesis; Text To Speech Systems (epo) Patents (Class 704/E13.001)
  • Publication number: 20120188892
    Abstract: A network abstraction gateway includes at least one abstracted network interface for connectivity with an abstracted network wherein a user has an abstracted endpoint having a first identity in the abstracted network; a communication system interface for connectivity with at least one user's communication system and exposing abstracted endpoint behavior via a second identity in the user's communication system; means adapted to register a one-to-one relationship between the first identity and the second identity; means for extracting behavior of the abstracted endpoint; and endpoint abstraction means adapted to abstract the abstracted endpoint in the user's communication system via an endpoint abstraction using the second identity. The endpoint abstraction is responsive to behavior of the abstracted endpoint and is adapted to implement at least one feature and/or state of the user's communication system and to bi-directionally map the behavior of the abstracted endpoint.
    Type: Application
    Filed: January 25, 2012
    Publication date: July 26, 2012
    Applicant: ESCAUX NV
    Inventors: Amaury Jean Robert DEMILIE, Jordi Pierre Victor Serge NELISSEN
  • Publication number: 20120185254
    Abstract: In a system, an interactive figurine delivers messages to a user in one of a number of forms. A server operation system includes processing capability which may individually couple content or may customize messages to a particular user of the interactive figurines. The interactive figurine contains an embedded circuit consisting of a receiver comprising a detector circuit tuned to at least one preselected frequency, a decoder to provide information indicative of intelligence and signals sent to the receiver, and a decoder circuit to provide actionable output signals indicative of information transmitted to the receiver. The server operation system may include a subscriber database and administration routines for customizing of messages and for directing messages. A user station intermediate the interactive figurine and the server module may be used to provide parental control or other control.
    Type: Application
    Filed: January 18, 2012
    Publication date: July 19, 2012
    Inventors: William A. Biehler, Gary W. Smith
  • Publication number: 20120179468
    Abstract: Briefly, in accordance with one or more embodiments, an image processing system is capable of receiving an image containing text, applying optical character recognition to the image, and then audibly reproducing the text via text-to-speech synthesis. Prior to optical character recognition, an orientation corrector is capable of detecting an amount of angular rotation of the text in the image with respect to horizontal, and then rotating the image by an appropriate amount to sufficiently align the text with respect to horizontal for optimal optical character recognition. The detection may be performed using steerable filters to provide an energy versus orientation curve of the image data. A maximum of the energy curve may indicate the amount of angular rotation that may be corrected by the orientation corrector.
    Type: Application
    Filed: March 20, 2012
    Publication date: July 12, 2012
    Inventor: Oscar Nestares
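The abstract above detects text rotation from an energy-versus-orientation curve built with steerable filters. As a rough, hedged sketch of the same idea (using a simple gradient-orientation energy histogram rather than the patented steerable-filter method, and a synthetic stripe image standing in for text), the dominant angle can be read off the histogram peak:

```python
import numpy as np

def estimate_text_rotation(image, n_bins=180):
    """Estimate the dominant text orientation (degrees from horizontal)
    from a gradient-orientation energy histogram -- a simplified stand-in
    for the steerable-filter energy-vs-orientation curve."""
    gy, gx = np.gradient(image.astype(float))
    magnitude = np.hypot(gx, gy)
    # Gradients of horizontal text strokes point mostly vertically,
    # so subtract 90 degrees to recover the baseline angle.
    angle = (np.degrees(np.arctan2(gy, gx)) - 90.0) % 180.0
    hist, edges = np.histogram(angle, bins=n_bins, range=(0, 180),
                               weights=magnitude)
    peak = edges[np.argmax(hist)]
    # Report in (-90, 90]: the rotation needed to realign the text.
    return peak if peak <= 90 else peak - 180

# Synthetic "text": horizontal stripes, i.e. zero rotation.
img = np.zeros((64, 64))
img[::8, :] = 1.0
print(estimate_text_rotation(img))
```

The correction step would then rotate the image by the negative of the returned angle before running OCR.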
  • Publication number: 20120177345
    Abstract: Various technologies and techniques are disclosed for automatically creating videos from text. The text to be converted is received. Images are located that correspond to the subject matter in the text. Music for the video is selected. A spoken version of the text is created using text-to-speech conversion, or through an uploaded recording. A video is created using the images, music, and voice. The video is made available for playback, downloading, and/or submission to other systems. A blog/web site plug-in is disclosed that automatically creates a video from an article on the site. An automated video creation system is disclosed that includes a video creation module that receives text, images, music, video, and/or narration from a user. The video creation module creates the video from the user input along with programmatically selected inputs that include text, images, music, video, and/or narration.
    Type: Application
    Filed: January 9, 2012
    Publication date: July 12, 2012
    Inventors: Matthew Joe Trainer, Troy Michael George Gardner
  • Publication number: 20120173241
    Abstract: A multi-lingual text-to-speech system and method processes a text to be synthesized via an acoustic-prosodic model selection module and an acoustic-prosodic model mergence module, and obtains a phonetic unit transformation table. In an online phase, the acoustic-prosodic model selection module, according to the text and a phonetic unit transcription corresponding to the text, uses at least one controllable accent weighting parameter to select a transformation combination and find a first and a second acoustic-prosodic model. The acoustic-prosodic model mergence module merges the two acoustic-prosodic models into a merged acoustic-prosodic model according to the at least one controllable accent weighting parameter, processes all transformations in the transformation combination, and generates a merged acoustic-prosodic model sequence. A speech synthesizer and the merged acoustic-prosodic model sequence are further applied to synthesize the text into an L1-accent L2 speech.
    Type: Application
    Filed: August 25, 2011
    Publication date: July 5, 2012
    Applicant: INDUSTRIAL TECHNOLOGY RESEARCH INSTITUTE
    Inventors: Jen-Yu LI, Jia-Jang Tu, Chih-Chung Kuo
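The merging step above combines two acoustic-prosodic models under a controllable accent weight. The abstract does not give the interpolation formula; one plausible form, sketched here with toy Gaussian models (mean/variance vectors of assumed features such as F0 and duration), is a linear interpolation controlled by the weight:

```python
import numpy as np

def merge_models(model_l1, model_l2, accent_weight):
    """Merge two Gaussian acoustic-prosodic models by linearly
    interpolating their means and variances. accent_weight in [0, 1]:
    0 -> pure L2 model, 1 -> full L1 accent (illustrative convention)."""
    w = float(accent_weight)
    return {
        "mean": w * model_l1["mean"] + (1 - w) * model_l2["mean"],
        "var":  w * model_l1["var"]  + (1 - w) * model_l2["var"],
    }

# Toy models: [mean F0 in Hz, mean phone duration in s].
l1 = {"mean": np.array([200.0, 0.12]), "var": np.array([400.0, 0.001])}
l2 = {"mean": np.array([120.0, 0.10]), "var": np.array([250.0, 0.002])}
merged = merge_models(l1, l2, accent_weight=0.25)
print(merged["mean"])
```

Sweeping the weight from 0 to 1 would move the synthesized L2 speech from native-sounding toward a full L1 accent.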
  • Publication number: 20120172012
    Abstract: A method for controlling a mobile communications device while located in a mobile vehicle involves pairing the mobile communications device with a telematics unit via short range wireless communication. The method further involves receiving an incoming text message at the mobile device while the mobile device is paired with the telematics unit. Upon receiving the text message, a text messaging management strategy is implemented via the telematics unit and/or the mobile device, where the text messaging management strategy is executable via an application that is resident on the mobile device.
    Type: Application
    Filed: January 4, 2011
    Publication date: July 5, 2012
    Applicant: GENERAL MOTORS LLC
    Inventors: Anthony J. Sumcad, Shawn F. Granda, Lawrence D. Cepuran, Steven Swanson
  • Publication number: 20120166198
    Abstract: In one embodiment of a controllable prosody re-estimation system, a TTS/STS engine consists of a prosody prediction/estimation module, a prosody re-estimation module and a speech synthesis module. The prosody prediction/estimation module generates predicted or estimated prosody information. And then the prosody re-estimation module re-estimates the predicted or estimated prosody information and produces new prosody information, according to a set of controllable parameters provided by a controllable prosody parameter interface. The new prosody information is provided to the speech synthesis module to produce a synthesized speech.
    Type: Application
    Filed: July 11, 2011
    Publication date: June 28, 2012
    Applicant: INDUSTRIAL TECHNOLOGY RESEARCH INSTITUTE
    Inventors: Cheng-Yuan Lin, Chien-Hung Huang, Chih-Chung Kuo
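The re-estimation module above maps predicted prosody to new prosody under controllable parameters. The abstract does not state the mapping; a minimal sketch, assuming a linear transform of the F0 contour (scale about the contour mean, then shift), shows the shape of such an interface:

```python
def reestimate_prosody(f0, pitch_shift=0.0, range_scale=1.0):
    """Re-estimate a predicted F0 contour with controllable parameters
    (an illustrative linear transform, not the patented formula):
    scale the contour about its mean, then apply a constant shift."""
    mean = sum(f0) / len(f0)
    return [mean + range_scale * (v - mean) + pitch_shift for v in f0]

contour = [100.0, 120.0, 140.0]            # predicted F0, Hz
print(reestimate_prosody(contour, pitch_shift=10.0, range_scale=0.5))
```

The re-estimated contour would then feed the speech synthesis module in place of the raw prediction.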
  • Publication number: 20120166199
    Abstract: Methods, systems, and software for converting the audio input of a user of a hand-held client device or mobile phone into a textual representation by means of a backend server accessed by the device through a communications network. The text is then inserted into or used by an application of the client device to send a text message, instant message, email, or to insert a request into a web-based application or service. In one embodiment, the method includes the steps of initializing or launching the application on the device; recording and transmitting the recorded audio message from the client device to the backend server through a client-server communication protocol; converting the transmitted audio message into the textual representation in the backend server; and sending the converted text message back to the client device or forwarding it on to an alternate destination directly from the server.
    Type: Application
    Filed: February 13, 2012
    Publication date: June 28, 2012
    Inventors: Victor R. Jablokov, Igor R. Jablokov, Marc White
  • Publication number: 20120158406
    Abstract: To facilitate text-to-speech conversion of a username, a first or last name of a user associated with the username may be retrieved, and a pronunciation of the username may be determined based at least in part on whether the name forms at least part of the username. To facilitate text-to-speech conversion of a domain name having a top level domain and at least one other level domain, a pronunciation for the top level domain may be determined based at least in part upon whether the top level domain is one of a predetermined set of top level domains. Each other level domain may be searched for one or more recognized words therewithin, and a pronunciation of the other level domain may be determined based at least in part on an outcome of the search. The username and domain name may form part of a network address such as an email address, URL or URI.
    Type: Application
    Filed: February 23, 2012
    Publication date: June 21, 2012
    Inventors: Matthew BELLS, Jennifer Elizabeth LHOTAK, Michael Angelo NANNI
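The abstract above decides whether to speak or spell a username and a domain. A hedged sketch of that decision logic (the spoken-TLD set and the recognized-word lexicon here are assumptions for illustration, not the patented sets):

```python
SPOKEN_TLDS = {"com", "org", "net", "edu", "gov"}   # assumed spoken set
WORDS = {"example", "mail"}                         # stand-in word lexicon

def pronounce_email(address, first_name, last_name):
    """Say the username as a word when it embeds the user's first or
    last name, spell it otherwise; say a TLD as a word only when it is
    in the predetermined spoken set; speak other-level domains as words
    when they match the lexicon, else spell them."""
    username, domain = address.split("@")
    labels = domain.split(".")
    parts = []
    if (first_name.lower() in username.lower()
            or last_name.lower() in username.lower()):
        parts.append(username)            # read as a word
    else:
        parts.append(" ".join(username))  # spell letter by letter
    parts.append("at")
    for label in labels[:-1]:
        parts.append(label if label.lower() in WORDS else " ".join(label))
        parts.append("dot")
    tld = labels[-1]
    parts.append(tld if tld in SPOKEN_TLDS else " ".join(tld.upper()))
    return " ".join(parts)

print(pronounce_email("jsmith@example.com", "John", "Smith"))
```

Here "jsmith" contains the last name, so it is read as a word rather than spelled out.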
  • Publication number: 20120150544
    Abstract: A system for reconstructing speech from an input signal comprising whispers is disclosed. The system comprises an analysis unit configured to analyse the input signal to form a representation of the input signal; an enhancement unit configured to modify the representation of the input signal to adjust a spectrum of the input signal, wherein the adjusting of the spectrum of the input signal comprises modifying a bandwidth of at least one formant in the spectrum to achieve a predetermined spectral energy distribution and amplitude for the at least one formant; and a synthesis unit configured to reconstruct speech from the modified representation of the input signal.
    Type: Application
    Filed: August 25, 2010
    Publication date: June 14, 2012
    Inventors: Ian Vince McLoughlin, Hamid Reza Sharifzadeh, Farzaneh Ahmadi
  • Publication number: 20120150543
    Abstract: A personality-based theme may be provided. An application program may query a personality resource file for a prompt corresponding to a personality. Then the prompt may be received at a speech synthesis engine. Next, the speech synthesis engine may query a personality voice font database for a voice font corresponding to the personality. Then the speech synthesis engine may apply the voice font to the prompt. The voice font applied prompt may then be produced at an output device.
    Type: Application
    Filed: February 24, 2012
    Publication date: June 14, 2012
    Applicant: Microsoft Corporation
    Inventors: Hugh A. Teegan, Eric N. Badger, Drew E. Linerud
  • Publication number: 20120143611
    Abstract: Hidden Markov Model (HMM) trajectory tiling (HTT)-based approaches may be used to synthesize speech from text. In operation, a set of HMMs and a set of waveform units may be obtained from a speech corpus. The set of HMMs are further refined via minimum generation error (MGE) training to generate a refined set of HMMs. Subsequently, a speech parameter trajectory may be generated by applying the refined set of HMMs to an input text. A unit lattice of candidate waveform units may be selected from the set of waveform units based at least on the speech parameter trajectory. A normalized cross-correlation (NCC)-based search on the unit lattice may be performed to obtain a minimal concatenation cost sequence of candidate waveform units, which are concatenated into a concatenated waveform sequence that is synthesized into speech.
    Type: Application
    Filed: December 7, 2010
    Publication date: June 7, 2012
    Applicant: Microsoft Corporation
    Inventors: Yao Qian, Zhi-Jie Yan, Yi-Jian Wu, Frank Kao-Ping Soong
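The search step above finds a minimal-cost unit sequence through a lattice. A minimal sketch of that step, assuming abstract target and concatenation cost functions (the patent's actual costs are NCC-based; here toy integer "units" with a mismatch cost are used for illustration), is a Viterbi-style dynamic program:

```python
def min_cost_sequence(lattice, target_cost, concat_cost):
    """Viterbi-style search for the candidate-unit sequence with minimal
    total target + concatenation cost over a unit lattice (one candidate
    list per target position)."""
    # best[u] = (cost of best path ending in unit u, that path)
    best = {u: (target_cost(0, u), [u]) for u in lattice[0]}
    for t in range(1, len(lattice)):
        new_best = {}
        for u in lattice[t]:
            prev, (c, path) = min(
                ((p, best[p]) for p in best),
                key=lambda kv: kv[1][0] + concat_cost(kv[0], u))
            new_best[u] = (c + concat_cost(prev, u) + target_cost(t, u),
                           path + [u])
        best = new_best
    return min(best.values(), key=lambda v: v[0])

# Toy lattice; concatenation cost is the mismatch between neighbours.
lattice = [[1, 5], [2, 9], [3, 4]]
cost, path = min_cost_sequence(
    lattice,
    target_cost=lambda t, u: 0.0,
    concat_cost=lambda a, b: abs(a - b))
print(cost, path)
```

With real waveform units, `concat_cost` would score the spectral join mismatch between consecutive units.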
  • Publication number: 20120143600
    Abstract: In a speech synthesis information editing apparatus, a phoneme storage unit stores phoneme information that designates a duration of each phoneme of speech to be synthesized. A feature storage unit stores feature information that designates a time variation in a feature of the speech. An edition processing unit changes a duration of each phoneme designated by the phoneme information with an expansion/compression degree depending on a feature designated by the feature information in correspondence to the phoneme.
    Type: Application
    Filed: December 1, 2011
    Publication date: June 7, 2012
    Applicant: Yamaha Corporation
    Inventor: Tatsuya IRIYAMA
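The editing step above stretches or compresses each phoneme's duration by a degree that depends on its feature value. The abstract leaves the mapping open; one illustrative choice (an assumption, not the patented rule) scales phonemes with larger feature values more strongly:

```python
def stretch_durations(durations, features, degree):
    """Scale each phoneme's duration by an expansion/compression factor
    derived from its feature value (illustrative mapping: the phoneme
    with the largest feature gets the full degree, the smallest none)."""
    lo, hi = min(features), max(features)
    span = (hi - lo) or 1.0
    out = []
    for d, f in zip(durations, features):
        weight = (f - lo) / span          # 0..1 within this utterance
        out.append(d * (1.0 + (degree - 1.0) * weight))
    return out

# Durations in ms, a normalized feature per phoneme, 2x max stretch.
print(stretch_durations([100, 80, 120], [0.0, 0.5, 1.0], degree=2.0))
```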
  • Publication number: 20120136663
    Abstract: A speech synthesis system can select recorded speech fragments, or acoustic units, from a very large database of acoustic units to produce artificial speech. The selected acoustic units are chosen to minimize a combination of target and concatenation costs for a given sentence. However, as concatenation costs, which are measures of the mismatch between sequential pairs of acoustic units, are expensive to compute, processing can be greatly reduced by pre-computing and caching the concatenation costs. The number of possible sequential pairs of acoustic units makes such caching prohibitive. Statistical experiments reveal that while about 85% of the acoustic units are typically used in common speech, less than 1% of the possible sequential pairs of acoustic units occur in practice. The system synthesizes a large body of speech, identifies the acoustic unit sequential pairs generated and their respective concatenation costs, and stores those concatenation costs likely to occur.
    Type: Application
    Filed: November 29, 2011
    Publication date: May 31, 2012
    Applicant: AT&T Intellectual Property II, L.P.
    Inventors: Mark Charles Beutnagel, Mehryar Mohri, Michael Dennis Riley
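The caching strategy above can be sketched directly: synthesize (here, simulate with toy unit sequences), record which sequential pairs occur, and cache only their costs, falling back to on-demand computation for unseen pairs. A hedged sketch under those assumptions:

```python
def build_concat_cache(training_sequences, concat_cost):
    """Pre-compute concatenation costs only for the unit pairs that
    actually occur in a large body of synthesized speech, exploiting the
    observation that under 1% of possible pairs arise in practice."""
    cache = {}
    for seq in training_sequences:
        for a, b in zip(seq, seq[1:]):
            if (a, b) not in cache:
                cache[(a, b)] = concat_cost(a, b)
    return cache

def cached_cost(cache, a, b, concat_cost):
    """Serve a cached cost; compute on demand for a pair never seen."""
    return cache[(a, b)] if (a, b) in cache else concat_cost(a, b)

# Toy integer "units"; cost is the mismatch between neighbours.
cache = build_concat_cache([[1, 2, 3], [2, 3, 5]],
                           concat_cost=lambda a, b: abs(a - b))
print(sorted(cache))
```

In a real system the cache would be persisted so unit selection pays the expensive cost computation only for the rare uncached pair.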
  • Publication number: 20120136664
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating speech. One variation of the method is from a server side, and another variation of the method is from a client side. The server side method, as implemented by a network-based automatic speech processing system, includes first receiving, from a network client independent of knowledge of internal operations of the system, a request to generate a text-to-speech voice. The request can include speech samples, transcriptions of the speech samples, and metadata describing the speech samples. The system extracts sound units from the speech samples based on the transcriptions and generates an interactive demonstration of the text-to-speech voice based on the sound units, the transcriptions, and the metadata, wherein the interactive demonstration hides a back end processing implementation from the network client. The system provides access to the interactive demonstration to the network client.
    Type: Application
    Filed: November 30, 2010
    Publication date: May 31, 2012
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Mark Charles Beutnagel, Alistair D. Conkie, Yeon-Jun Kim, Horst Juergen Schroeter
  • Publication number: 20120136665
    Abstract: Disclosed are an electronic device and a control method thereof. The electronic device includes a text-to-speech unit which converts a text into an audio signal; an audio output unit which outputs an audio corresponding to the converted audio signal; and a controller which controls the audio output unit to re-output at least one audio whose output was not completed, if there is at least one audio which was not completely output among a plurality of audios output by the audio output unit.
    Type: Application
    Filed: May 16, 2011
    Publication date: May 31, 2012
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventor: Mariusz PYTEL
  • Publication number: 20120123781
    Abstract: A touch screen device allowing blind people to operate objects displayed thereon and an object operating method in the touch screen device are provided. The touch screen device includes a touch sensing unit which, when sensing touches of a virtual keyboard for controlling application software being executed while the virtual keyboard is activated, generates key values corresponding to the touched positions, the number of touches, and the touch time, and transmits the key values to the application software; an object determination unit which reads text information of a focused object using a hooking mechanism when the application software is executed based on the key values received from the touch sensing unit and an object among the objects included in the application software is focused; and a speech synthesis unit which converts the text information read by the object determination unit into speech data using a text-to-speech engine and outputs the speech data.
    Type: Application
    Filed: February 11, 2011
    Publication date: May 17, 2012
    Inventors: Kun Park, Yong Suk Pak
  • Publication number: 20120116769
    Abstract: A method applies a parametric approach to bandwidth extension but does not require training. The method computes narrowband linear predictive coefficients from a received narrowband speech signal, computes narrowband partial correlation coefficients using recursion, computes Mnb area coefficients from the partial correlation coefficients, and extracts Mwb area coefficients using interpolation. Wideband parcors are computed from the Mwb area coefficients and wideband LPCs are computed from the wideband parcors. The method further comprises synthesizing a wideband signal using the wideband LPCs and a wideband excitation signal, highpass filtering the synthesized wideband signal to produce a highband signal, and combining the highband signal with the original narrowband signal to generate a wideband signal.
    Type: Application
    Filed: November 7, 2011
    Publication date: May 10, 2012
    Applicant: AT&T Intellectual Property II, L.P.
    Inventors: David Malah, Richard Vandervoort Cox
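Two of the steps above, parcors to area coefficients and narrowband-to-wideband interpolation, can be sketched concretely. The area recursion follows the standard lossless-tube relation A_i = A_{i+1}(1 - k_i)/(1 + k_i); the interpolation here is simple linear resampling, an assumption where the patent does not pin down the scheme:

```python
import numpy as np

def parcor_to_areas(parcors):
    """Convert reflection (PARCOR) coefficients to acoustic-tube area
    coefficients via A_i = A_{i+1} * (1 - k_i) / (1 + k_i), fixing the
    final (lip-end) area to 1."""
    areas = [1.0]
    for k in reversed(parcors):
        areas.append(areas[-1] * (1 - k) / (1 + k))
    return np.array(areas[::-1])

def interpolate_areas(areas_nb, m_wb):
    """Resample Mnb tube areas to Mwb sections by linear interpolation,
    the step that raises the model order from narrowband to wideband."""
    x_nb = np.linspace(0.0, 1.0, len(areas_nb))
    x_wb = np.linspace(0.0, 1.0, m_wb)
    return np.interp(x_wb, x_nb, areas_nb)

areas = parcor_to_areas([0.5, -0.2, 0.1])   # toy narrowband parcors
wide = interpolate_areas(areas, 7)
print(len(areas), len(wide))
```

The wideband areas would then be converted back to wideband parcors and LPCs to drive the synthesis filter.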
  • Publication number: 20120109653
    Abstract: Improved oscillator-based source modeling methods for estimating model parameters, for evaluating model quality for restoring the input from the model parameters, and for improving performance over methods known in the art are disclosed. An application of these innovations to speech coding is described. The improved oscillator model is derived from the information contained in the current input signal as well as from some form of data history, often the restored versions of the earlier processed data. Operations can be performed in real time, and compression can be achieved at a user-specified level of performance and, in some cases, without information loss. The new model can be combined with methods in the existing art in order to complement the properties of these methods, to improve overall performance. The present invention is effective for very low bit-rate coding/compression and decoding/decompression of digital signals, including digitized speech and audio signals.
    Type: Application
    Filed: October 29, 2010
    Publication date: May 3, 2012
    Inventors: Anton Yen, Irina Gorodnitsky
  • Publication number: 20120109656
    Abstract: Architecture for playing a document converted into an audio format to a user of an audio-output capable device. The user can interact with the device to control play of the audio document such as pause, rewind, forward, etc. In a more robust implementation, the audio-output capable device is a mobile device (e.g., cell phone) having a microphone for processing voice input. Voice commands can then be input to control play ("reading") of the document audio file to pause, rewind, read paragraph, read next chapter, fast forward, etc. A communications server (e.g., email, attachments to email, etc.) transcodes text-based document content into an audio format by leveraging a text-to-speech (TTS) engine. The transcoded audio files are then transferred to mobile devices through viable transmission channels. Users can then play the audio-formatted document while freeing hand and eye usage for other tasks.
    Type: Application
    Filed: January 9, 2012
    Publication date: May 3, 2012
    Applicant: MICROSOFT CORPORATION
    Inventors: Sheng-Yao Shih, Yun-Chiang Kung, Chiwei Che, Chih-Chung Wang
  • Publication number: 20120109654
    Abstract: Methods and apparatuses are provided for facilitating speech synthesis. A method may include generating a plurality of input models representing an input by using a statistical model synthesizer to statistically model the input. The method may further include determining a speech unit sequence representing at least a portion of the input by using the input models to influence selection of one or more pre-recorded speech units having parameter representations. The method may additionally include identifying one or more bad units in the unit sequence. The method may also include replacing the identified one or more bad units with one or more parameters generated by the statistical model synthesizer. Corresponding apparatuses are also provided.
    Type: Application
    Filed: May 2, 2011
    Publication date: May 3, 2012
    Applicant: NOKIA CORPORATION
    Inventors: Jani Kristian Nurminen, Hanna Margareeta Silen, Elina Helander
  • Publication number: 20120109759
    Abstract: A device including a processor and a memory, the processor executing software that performs a method of providing content to a device, the method including the steps of receiving an input from a sending device, gathering information pertaining to the sending device or a receiving device, searching a storage unit for content related to the information and the input, generating a message based on the content returned from the storage unit, incorporating the message into the input, and transmitting the input including the message to the receiving device.
    Type: Application
    Filed: October 27, 2011
    Publication date: May 3, 2012
    Inventors: Yaron Oren, Heath Ahrens
  • Publication number: 20120108221
    Abstract: Embodiments include applications as participants in a communication session such as a voice call. The applications provide functionality to the communication session by performing commands issued by the participants during the communication session to generate output data. Example functionality includes recording audio, playing music, obtaining search results, obtaining calendar data to schedule future meetings, etc. The output data is made available to the participants during the communication session.
    Type: Application
    Filed: October 28, 2010
    Publication date: May 3, 2012
    Applicant: MICROSOFT CORPORATION
    Inventors: Shawn M. Thomas, Taqi Jaffri, Omar Aftab
  • Publication number: 20120109655
    Abstract: An email system for mobile devices, such as cellular phones and PDAs, is disclosed which allows email messages to be played back on the mobile device as voice messages on demand by way of a media player, thus eliminating the need for a unified messaging system. Email messages are received by the mobile device in a known manner. In accordance with an important aspect of the invention, the email messages are identified by the mobile device as they are received. After the message is identified, the mobile device sends the email message in text format to a server for conversion to speech or voice format. After the message is converted to speech format, the server sends the messages back to the user's mobile device and notifies the user of the email message and then plays the message back to the user through a media player upon demand.
    Type: Application
    Filed: December 21, 2011
    Publication date: May 3, 2012
    Inventors: Stephen S. Burns, Mickey W. Kowitz
  • Publication number: 20120109630
    Abstract: A text-to-speech system adapted to operate on text in a first language including sections in a second language, includes a grapheme/phoneme transcriptor for converting the sections in the second language into phonemes of the second language; a mapping module configured for mapping at least part of the phonemes of the second language onto sets of phonemes of the first language; and a speech-synthesis module adapted to be fed with a resulting stream of phonemes including the sets of phonemes of the first language resulting from mapping and the stream of phonemes of the first language representative of the text, and to generate a speech signal from the resulting stream of phonemes.
    Type: Application
    Filed: January 10, 2012
    Publication date: May 3, 2012
    Applicant: Loquendo S.p.A.
    Inventors: Leonardo Badino, Claudia Barolo, Silvia Quazza
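The mapping module above replaces second-language phonemes with sets of first-language phonemes so a monolingual synthesizer can voice mixed-language text. A minimal sketch, with a toy mapping table that is purely illustrative (the real table would be built per language pair):

```python
# Assumed toy table: each English (L2) phoneme maps to a set of
# Italian (L1) phonemes; unmapped phonemes pass through unchanged.
L2_TO_L1 = {"TH": ["t"], "AE": ["a"], "NG": ["n", "g"]}

def map_phonemes(l2_phonemes, table=L2_TO_L1):
    """Replace each second-language phoneme with its first-language
    phoneme set, producing a stream the L1 synthesizer can voice."""
    out = []
    for ph in l2_phonemes:
        out.extend(table.get(ph, [ph]))
    return out

print(map_phonemes(["TH", "AE", "NG", "k"]))
```

The resulting stream is then concatenated with the native L1 phoneme stream and fed to the speech-synthesis module.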
  • Publication number: 20120095767
    Abstract: A device includes: an input speech separation unit which separates an input speech into vocal tract information and voicing source information; a mouth opening degree calculation unit which calculates a mouth opening degree from the vocal tract information; a target vowel database storage unit which stores pieces of vowel information on a target speaker; an agreement degree calculation unit which calculates a degree of agreement between the calculated mouth opening degree and a mouth opening degree included in the vowel information; a target vowel selection unit which selects the vowel information from among the pieces of vowel information, based on the calculated agreement degree; a vowel transformation unit which transforms the vocal tract information on the input speech, using vocal tract information included in the selected vowel information; and a synthesis unit which generates a synthetic speech using the transformed vocal tract information and the voicing source information.
    Type: Application
    Filed: December 22, 2011
    Publication date: April 19, 2012
    Inventors: Yoshifumi HIROSE, Takahiro Kamai
  • Publication number: 20120088543
    Abstract: A system and a method are provided for displaying text in low-light environments. An original image of text is captured in a low-light environment using a camera on a mobile device, wherein the imaged text comprises images of characters. A brightness setting and a contrast setting of the original image are adjusted to increase the contrast of the imaged text relative to a background of the original image. Optical character recognition is applied to the adjusted image to generate computer readable text or characters corresponding to the imaged text. The original image of text is displayed on the mobile device. The computer readable text is also displayed, overlaid on the original image, wherein the computer readable text is aligned with the corresponding imaged text.
    Type: Application
    Filed: October 8, 2010
    Publication date: April 12, 2012
    Applicant: Research In Motion Limited
    Inventors: Jeffery Lindner, James Hymel
  • Publication number: 20120089401
    Abstract: Methods and apparatus to audibly provide messages in a mobile device are described. An example method includes receiving a message at a mobile device, wherein the message includes an identification of a sender, an identification of a recipient, and message contents; determining that the message contents include a predetermined phrase; and, in response to determining that the message contents include the predetermined phrase, audibly presenting the message contents.
    Type: Application
    Filed: October 8, 2010
    Publication date: April 12, 2012
    Inventors: JAMES ALLEN HYMEL, JEFFERY ERHARD LINDNER
  • Publication number: 20120089400
    Abstract: The present invention relates to information systems. More specifically, the present invention relates to infrastructure and techniques for improving Text-to-Speech-enabled applications.
    Type: Application
    Filed: October 6, 2010
    Publication date: April 12, 2012
    Inventor: Caroline Gilles Henton
  • Publication number: 20120089399
    Abstract: A method of operating a mobile communication device is described. A text message is received over a wireless messaging channel, wherein the text message contains a non-text representation of an utterance. The non-text representation is extracted from the text message, and an audio representation of the spoken utterance is synthesized from the non-text representation.
    Type: Application
    Filed: December 19, 2011
    Publication date: April 12, 2012
    Applicant: NUANCE COMMUNICATIONS, INC.
    Inventor: Daniel L. Roth
  • Publication number: 20120078633
    Abstract: According to one embodiment, a reading aloud support apparatus includes a reception unit, a first extraction unit, a second extraction unit, an acquisition unit, a generation unit, and a presentation unit. The reception unit is configured to receive an instruction. The first extraction unit is configured to extract, as a partial document, a part of a document which corresponds to a range of words. The second extraction unit is configured to perform morphological analysis and to extract words as candidate words. The acquisition unit is configured to acquire attribute information items related to the candidate words. The generation unit is configured to perform weighting relating to a value corresponding to a distance and to determine each of the candidate words to be preferentially presented, to generate a presentation order. The presentation unit is configured to present the candidate words and the attribute information items in accordance with the presentation order.
    Type: Application
    Filed: March 22, 2011
    Publication date: March 29, 2012
    Inventors: Kosei Fume, Masaru Suzuki, Yuji Shimizu, Tatsuya Izuha
  • Publication number: 20120078632
    Abstract: An optical device includes a fast Fourier transform (FFT) unit, a signal noise ratio (SNR) calculation processing unit, a band selecting unit, an extension-signal creating unit, an addition unit, and an inverse fast Fourier transform (IFFT) unit. The FFT unit performs the Fourier transform on an input signal that is input from the outside. The SNR calculation processing unit calculates an SNR with respect to each of the bands in the input signal. The band selecting unit selects the band whose SNR exceeds a threshold and is the maximum SNR, based on the respective SNRs of the bands. The extension-signal creating unit creates an extension signal based on a signal acquired by the band selecting unit. The addition unit adds the extension signal to the input signal, and creates a band-extended signal. The IFFT unit performs the inverse fast Fourier transform on the band-extended signal, and creates an output signal.
    Type: Application
    Filed: June 13, 2011
    Publication date: March 29, 2012
    Applicant: FUJITSU LIMITED
    Inventors: Taro Togawa, Shusaku Ito, Takeshi Otani, Masanao Suzuki, Yasuji Ota
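The band-selection step above can be sketched in a few lines: transform the input, split the spectrum into bands, score each against a noise estimate, and keep the band whose SNR is maximal and above threshold. A hedged sketch, assuming the per-band noise floor is already known (how the patent estimates it is not stated here):

```python
import numpy as np

def select_extension_band(signal, noise_floor, n_bands=8, threshold=3.0):
    """FFT the input, compute per-band SNR in dB against an assumed
    known noise floor, and return the index of the band whose SNR is
    maximal and above the threshold (None when no band qualifies)."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    bands = np.array_split(spectrum, n_bands)
    snrs = [10 * np.log10(b.mean() / noise_floor + 1e-12) for b in bands]
    best = int(np.argmax(snrs))
    return best if snrs[best] > threshold else None

# A 440 Hz tone at 8 kHz sampling plus weak noise: the tone sits in
# the lowest of the 8 bands, so band 0 should be selected.
t = np.arange(1024) / 8000.0
tone = np.sin(2 * np.pi * 440.0 * t)
noisy = tone + 0.01 * np.random.default_rng(0).standard_normal(1024)
print(select_extension_band(noisy, noise_floor=1e-4))
```

The selected band's content would then seed the extension signal added back before the inverse FFT.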
  • Publication number: 20120072223
    Abstract: Systems and methods for providing synthesized speech in a manner that takes into account the environment where the speech is presented. A method embodiment includes, based on a listening environment and at least one other parameter associated with the listening environment, selecting an approach from a plurality of approaches for presenting synthesized speech in the listening environment; presenting synthesized speech according to the selected approach; and, based on natural language input received from a user indicating an inability to understand the presented synthesized speech, selecting a second approach from the plurality of approaches and presenting subsequent synthesized speech using the second approach.
    Type: Application
    Filed: November 23, 2011
    Publication date: March 22, 2012
    Applicant: AT&T Intellectual Property II, L.P.
    Inventors: Kenneth H. Rosen, Carroll W. Creswell, Jeffrey J. Farah, Pradeep K. Bansal, Ann K. Syrdal
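The environment-driven selection with fallback on listener feedback can be sketched as below; this is a toy stand-in under assumed data structures (approach names and suitability sets are illustrative), not the claimed method.

```python
def choose_approach(approaches, environment, rejected=()):
    """Pick the first approach rated suitable for the listening environment
    that the listener has not already rejected; None if nothing fits.

    approaches: list of (name, set_of_suitable_environments) pairs.
    rejected: approaches the user reported being unable to understand."""
    for name, suited_envs in approaches:
        if environment in suited_envs and name not in rejected:
            return name
    return None
```

On a "could not understand" response, the caller would add the current approach to `rejected` and call again to obtain the second approach.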
  • Publication number: 20120072204
    Abstract: A method and system for processing input media for provision to a text-to-speech engine, comprising: a rules engine configured to maintain and update rules for processing the input media; a pre-parsing filter module configured to determine one or more metadata attributes using pre-parsing rules; a parsing filter module configured to identify a content component from the input media using the parsing rules; a context and language detector configured to determine a default context and a default language; a learning agent configured to divide the content component into units of interest; a tagging module configured to iteratively assign tags to the units of interest using the tagging rules, wherein each tag is associated with a post-parsing rule; and a post-parsing filter module configured to modify the content component by executing the post-parsing rules identified by the tags assigned to the units of interest.
    Type: Application
    Filed: September 22, 2010
    Publication date: March 22, 2012
    Applicant: Voice On The Go Inc.
    Inventors: Babak Nasri, Selva Thayaparam
  • Publication number: 20120065978
    Abstract: In voice processing, a first distribution generation unit approximates a distribution of feature information representative of voice of a first speaker per a unit interval thereof as a mixed probability distribution which is a mixture of a plurality of first probability distributions corresponding to a plurality of different phones. A second distribution generation unit also approximates a distribution of feature information representative of voice of a second speaker as a mixed probability distribution which is a mixture of a plurality of second probability distributions. A function generation unit generates, for each phone, a conversion function for converting the feature information of voice of the first speaker to that of the second speaker based on respective statistics of the first and second probability distributions that correspond to the phone.
    Type: Application
    Filed: September 14, 2011
    Publication date: March 15, 2012
    Applicant: YAMAHA CORPORATION
    Inventor: Fernando VILLAVICENCIO
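The per-phone conversion function built from the statistics of the two speakers' distributions can be illustrated with a simple mean/variance mapping. This is a common textbook form, not necessarily the patented formulation; the per-dimension means and standard deviations are assumed inputs.

```python
def make_conversion_function(mu_src, sigma_src, mu_tgt, sigma_tgt):
    """Build a per-phone feature conversion from the mean and standard
    deviation of the source and target speakers' distributions for that
    phone: each dimension is normalized under the source statistics and
    re-scaled under the target statistics."""
    def convert(features):
        return [mt + st * (x - ms) / ss
                for x, ms, ss, mt, st
                in zip(features, mu_src, sigma_src, mu_tgt, sigma_tgt)]
    return convert
```

In the described system, one such function would be generated per phone, using the first and second probability distributions that correspond to that phone.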
  • Publication number: 20120065980
    Abstract: An electronic device for coding a transient frame is described. The electronic device includes a processor and executable instructions stored in memory that is in electronic communication with the processor. The electronic device obtains a current transient frame. The electronic device also obtains a residual signal based on the current transient frame. Additionally, the electronic device determines a set of peak locations based on the residual signal. The electronic device further determines whether to use a first coding mode or a second coding mode for coding the current transient frame based on at least the set of peak locations. The electronic device also synthesizes an excitation based on the first coding mode if the first coding mode is determined. The electronic device also synthesizes an excitation based on the second coding mode if the second coding mode is determined.
    Type: Application
    Filed: September 8, 2011
    Publication date: March 15, 2012
    Applicant: QUALCOMM Incorporated
    Inventors: Venkatesh Krishnan, Ananthapadmanabhan Arasanipalai Kandhadai
  • Publication number: 20120053935
    Abstract: In one implementation, speech or audio is converted to a searchable format by a speech recognition system. The speech recognition system uses a language model including probabilities of certain words occurring, which may depend on the occurrence of other words or sequences of words. The language model is partially built from personal vocabularies. Personal vocabularies are determined by known text from network traffic, including emails and Internet postings. The speech recognition system may incorporate the personal vocabulary of one user into the language model of another user based on a connection between the two users. The connection may be triggered by an email, a phone call, or an interaction in a social networking service. The speech recognition system may remove or add personal vocabularies to the language model based on a calculated confidence score from the resulting language model.
    Type: Application
    Filed: August 27, 2010
    Publication date: March 1, 2012
    Applicant: Cisco Technology, Inc.
    Inventors: Ashutosh A. Malegaonkar, Gannu Satish Kumar, Guido K.M. Jouret
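Incorporating one user's personal vocabulary into another user's language model, as described above, can be sketched at the unigram level. The connection `weight` and the relative-frequency estimate are illustrative assumptions, not details from the patent.

```python
from collections import Counter

def merge_vocabulary(base_counts, personal_counts, weight=0.5):
    """Fold one user's personal-vocabulary word counts into another user's
    unigram model, down-weighted by a hypothetical connection weight."""
    merged = Counter(base_counts)
    for word, count in personal_counts.items():
        merged[word] += weight * count
    return merged

def unigram_prob(counts, word):
    """Relative-frequency estimate of a word under the merged model."""
    total = sum(counts.values())
    return counts[word] / total if total else 0.0
```

A confidence score computed on the merged model could then decide whether the added vocabulary stays or is removed, as the abstract describes.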
  • Publication number: 20120054030
    Abstract: An Internet telematics service providing system and method that may provide various Web contents and information is disclosed. The Internet telematics service providing system may include a request information receiver to receive request information about the Internet telematics service from a telematics system of a vehicle, a content selector to select Internet content associated with the request information via an Internet content providing server, and a content providing unit to provide the selected content to the telematics system. The telematics system may output the content to a user in an auditory form.
    Type: Application
    Filed: August 24, 2011
    Publication date: March 1, 2012
    Applicant: NHN CORPORATION
    Inventors: Kyung Bo NAM, Choong Hee LEE
  • Publication number: 20120046949
    Abstract: A first person narrates a selected written text to generate a reference audio file. One or more parameters are selected from the sounds of the reference audio file, including the duration of a sound, the duration of a pause, the rise and fall of frequency relative to a reference frequency, and/or the volume differential between select sounds. A voice profile library contains a phonetic library of sounds spoken by a subject speaker. An integration module generates a preliminary audio file of the selected text in the voice of the subject speaker and then modifies individual sounds by the parameters from the reference file, forming a hybrid audio file. The hybrid audio file retains the tonality of the subject voice but incorporates the rhythm, cadence, and inflections of the reference voice. The reference audio file and/or the hybrid audio file are licensed or sold as part of a commercial transaction.
    Type: Application
    Filed: August 17, 2011
    Publication date: February 23, 2012
    Inventors: Patrick John Leddy, Ronald R. Shea
  • Publication number: 20120041765
    Abstract: An electronic book reader includes a text obtaining module, a text analysis module, a speech synthesis module, a control module, and an audio output device. The text obtaining module is used for obtaining a selected segment of a text. The text analysis module is used for analyzing a time phrase of the selected segment to obtain a waiting time period according to the meaning of the time phrase in the selected segment. The speech synthesis module is used for converting the selected segment into speech. The control module is used for sending the content of the selected segment to the speech synthesis module, and waits for the waiting time period after sending the time phrase to the speech synthesis module. The audio output device is used for playing the speech.
    Type: Application
    Filed: May 10, 2011
    Publication date: February 16, 2012
    Applicant: HON HAI PRECISION INDUSTRY CO., LTD.
    Inventors: CHIA-HUNG CHIEN, TUN-TAO TSAI, CHUN-WEN WANG, LIANG-MAO HUNG
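Deriving a waiting period from the meaning of a time phrase could look like the lookup below. The phrase table and pause lengths are hypothetical placeholders; the patent derives the wait from the phrase's meaning in the text.

```python
# Hypothetical mapping from time phrases to pause lengths in seconds.
TIME_PHRASES = {
    "a moment later": 1.0,
    "after a while": 2.0,
    "hours later": 3.0,
}

def waiting_period(segment):
    """Return the pause (seconds) implied by the first recognized time
    phrase in the segment, or 0.0 if no time phrase is found."""
    lowered = segment.lower()
    for phrase, seconds in TIME_PHRASES.items():
        if phrase in lowered:
            return seconds
    return 0.0
```

The control module would insert this pause into playback after the synthesized time phrase, mimicking narrative time passing.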
  • Publication number: 20120035933
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating a synthetic voice. A system configured to practice the method combines a first database of a first text-to-speech voice and a second database of a second text-to-speech voice to generate a combined database, selects from the combined database, based on a policy, voice units of a phonetic category for the synthetic voice to yield selected voice units, and synthesizes speech based on the selected voice units. The system can synthesize speech without parameterizing the first text-to-speech voice and the second text-to-speech voice. A policy can define, for a particular phonetic category, from which text-to-speech voice to select voice units. The combined database can include multiple text-to-speech voices from different speakers. The combined database can include voices of a single speaker speaking in different styles. The combined database can include voices of different languages.
    Type: Application
    Filed: August 6, 2010
    Publication date: February 9, 2012
    Applicant: AT&T Intellectual Property I, L.P.
    Inventors: Alistair D. CONKIE, Ann K. Syrdal
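The policy that defines, per phonetic category, which text-to-speech voice to draw units from can be sketched as a simple filter over the combined database. The tuple layout and names are assumptions for illustration only.

```python
def select_units(combined_db, policy, category):
    """Return candidate units of one phonetic category, restricted to the
    voice the policy names for that category (all voices if unlisted).

    combined_db: list of (voice, phonetic_category, unit) tuples.
    policy: dict mapping phonetic category -> preferred voice."""
    voice = policy.get(category)
    return [unit for v, cat, unit in combined_db
            if cat == category and (voice is None or v == voice)]
```

Selected units would then feed ordinary unit-selection synthesis, with no parameterization of either source voice.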
  • Publication number: 20120035934
    Abstract: In several embodiments, a speech generation device is disclosed. The speech generation device may generally include a projector configured to project images in the form of a projected display onto a projection surface, an optical input device configured to detect an input directed towards the projected display and a speaker configured to generate an audio output. In addition, the speech generation device may include a processing unit communicatively coupled to the projector, the optical input device and the speaker. The processing unit may include a processor and related computer readable medium configured to store instructions executable by the processor, wherein the instructions stored on the computer readable medium configure the speech generation device to generate text-to-speech output.
    Type: Application
    Filed: August 3, 2011
    Publication date: February 9, 2012
    Applicant: DYNAVOX SYSTEMS LLC
    Inventor: BOB CUNNINGHAM
  • Publication number: 20120029920
    Abstract: A handheld device includes an image input device capable of acquiring images, circuitry to send a representation of the image to a remote computing system that performs at least one processing function related to processing the image and circuitry to receive from the remote computing system data based on processing the image by the remote system.
    Type: Application
    Filed: October 11, 2011
    Publication date: February 2, 2012
    Inventors: Raymond C. Kurzweil, Paul Albrecht, Lucy Gibson
  • Publication number: 20120016674
    Abstract: Techniques are disclosed for modifying speech quality in a conversation over a voice channel. For example, a method for modifying a speech quality associated with a spoken utterance transmittable over a voice channel comprises the following steps. The spoken utterance is obtained prior to an intended recipient of the spoken utterance receiving the spoken utterance. An existing speech quality of the spoken utterance is determined. The existing speech quality of the spoken utterance is compared to at least one desired speech quality associated with at least one previously obtained spoken utterance to determine whether the existing speech quality substantially matches the desired speech quality. At least one characteristic of the spoken utterance is modified to change the existing speech quality of the spoken utterance to the desired speech quality when the existing speech quality does not substantially match the desired speech quality.
    Type: Application
    Filed: July 16, 2010
    Publication date: January 19, 2012
    Applicant: International Business Machines Corporation
    Inventors: Sarah H. Basson, Dimitri Kanevsky, David Nahamoo, Tara N. Sainath
  • Publication number: 20120016675
    Abstract: A broadcast signal receiver comprises a text data receiver for receiving broadcast text data for display to a user in relation to a user interface; a text-to-speech (TTS) converter for converting received text data into an audio speech signal, the TTS converter being operable to detect whether a word for conversion is included in a stored list of words for conversion and, if so, to convert that word according to a conversion defined by the stored list; and if not, to convert that word according to a set of predetermined conversion rules; a conversion memory storing the list of words for conversion by the TTS converter; and an update receiver for receiving additional words and associated conversions for storage in the conversion memory.
    Type: Application
    Filed: June 1, 2011
    Publication date: January 19, 2012
    Applicant: Sony Europe Limited
    Inventors: Huw HOPKINS, Timothy EDMUNDS
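The receiver's conversion logic (exception list first, letter-to-sound rules otherwise) can be sketched as below. The rule table and pronunciations are invented for illustration; a real system would use a full grapheme-to-phoneme rule set.

```python
def pronounce(word, exceptions, letter_rules):
    """Convert a word using the stored exception list if present; otherwise
    fall back to rule-based letter-to-sound conversion."""
    key = word.lower()
    if key in exceptions:
        return exceptions[key]
    return " ".join(letter_rules.get(ch, ch) for ch in key)
```

The update receiver in the abstract would simply add new `(word, pronunciation)` pairs to the `exceptions` dictionary over the broadcast channel.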
  • Publication number: 20120010888
    Abstract: Methods and systems for providing a network-accessible text-to-speech synthesis service are provided. The service accepts content as input. After extracting textual content from the input content, the service transforms the content into a format suitable for high-quality speech synthesis. Additionally, the service produces audible advertisements, which are combined with the synthesized speech. The audible advertisements themselves can be generated from textual advertisement content.
    Type: Application
    Filed: August 29, 2011
    Publication date: January 12, 2012
    Inventor: James H. Stephens, JR.
  • Publication number: 20110320206
    Abstract: An electronic book reader includes a text obtaining module, a text highlighting module, a speech synthesis module, a player module, and a synchronization control module. The text obtaining module obtains a selected segment of a text. The text highlighting module highlights the selected segment. The speech synthesis module converts the selected segment into speech. The player module plays the speech. The synchronization control module sends the selected segment to the text highlighting module and the speech synthesis module synchronously.
    Type: Application
    Filed: April 22, 2011
    Publication date: December 29, 2011
    Applicant: HON HAI PRECISION INDUSTRY CO., LTD.
    Inventors: CHIH-HUNG CHEN, CHIA-HUNG CHIEN, CHIEN-CHOU CHEN
  • Publication number: 20110320207
    Abstract: The invention relates to a method for speech signal analysis, modification and synthesis comprising: a phase for locating analysis windows by means of an iterative process that determines the phase of the first sinusoidal component and compares that phase value with a predetermined value; a phase for selecting the analysis frames corresponding to an allophone and readjusting the duration and the fundamental frequency according to certain thresholds; and a phase for generating synthetic speech from synthesis frames, taking the information of the closest analysis frame as spectral information of the synthesis frame and taking as many synthesis frames as the synthetic signal has periods. The method allows a coherent location of the analysis windows within the periods of the signal and the exact generation of the synthesis instants synchronously with the fundamental period.
    Type: Application
    Filed: December 21, 2010
    Publication date: December 29, 2011
    Applicant: TELEFONICA, S.A.
    Inventors: Miguel Angel Rodriguez Crespo, Jose Gregorio Escalada Sardina, Ana Armenta Lopez Vicuna
  • Publication number: 20110313772
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for speech synthesis. A system practicing the method receives a set of ordered lists of speech units, for each respective speech unit in each ordered list in the set of ordered lists, constructs a sublist of speech units from a next ordered list which are suitable for concatenation, performs a cost analysis of paths through the set of ordered lists of speech units based on the sublist of speech units for each respective speech unit, and synthesizes speech using a lowest cost path of speech units through the set of ordered lists based on the cost analysis. The ordered lists can be ordered based on the respective pitch of each speech unit. In one embodiment, speech units which do not have an assigned pitch can be assigned a pitch.
    Type: Application
    Filed: June 18, 2010
    Publication date: December 22, 2011
    Applicant: AT&T Intellectual Property I, L.P.
    Inventor: Alistair D. CONKIE
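The cost analysis of paths through the ordered lists can be illustrated with a standard dynamic-programming (Viterbi-style) search. The cost functions here are stand-ins for the concatenation and target costs; this sketch omits the sublist-construction optimization the abstract describes.

```python
def lowest_cost_path(lists, join_cost, unit_cost):
    """Find the cheapest sequence picking one unit from each ordered list.
    Returns (total_cost, path). join_cost(a, b) scores concatenating unit
    b after unit a; unit_cost(u) scores the unit itself."""
    # best[u] = (cost of cheapest path ending at u, that path)
    best = {u: (unit_cost(u), [u]) for u in lists[0]}
    for next_list in lists[1:]:
        new_best = {}
        for v in next_list:
            cost, path = min(
                (c + join_cost(p[-1], v) + unit_cost(v), p)
                for c, p in best.values()
            )
            new_best[v] = (cost, path + [v])
        best = new_best
    return min(best.values())
```

With pitch-ordered lists, restricting each unit's candidate successors to a short sublist (as claimed) would shrink the inner minimization without changing the result.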
  • Publication number: 20110295606
    Abstract: A contextual conversion platform, and method for converting text-to-speech, are described that can convert content of a target to spoken content. Embodiments of the contextual conversion platform can identify certain contextual characteristics of the content, from which a spoken content input can be generated. This spoken content input can include tokens, e.g., words and abbreviations, to be converted to the spoken content, as well as substitution tokens that are selected from contextual repositories based on the context identified by the contextual conversion platform.
    Type: Application
    Filed: May 28, 2010
    Publication date: December 1, 2011
    Inventor: Daniel Ben-Ezri