Synthesis Patents (Class 704/258)
-
Patent number: 8751237
Abstract: A sound control section (114) selects and outputs a text-to-speech item from items included in program information multiplexed with a broadcast signal, and starts or stops outputting the text-to-speech item based on a request from a remote controller control section (113). A sound generation section (115) converts the text-to-speech item to a sound signal. A speaker (109) reproduces the sound signal. The sound control section (114) compares each item of information about the program currently selected by the user's operation of the remote controller with each item of information about the previous program selected just before the user's operation. If an item of the currently selected program information is the same as the corresponding item of the previously selected program information, and text-to-speech processing has already been completed for the item since its last change, the sound control section (114) stops outputting the item to the sound generation section (115).
Type: Grant
Filed: February 23, 2011
Date of Patent: June 10, 2014
Assignee: Panasonic Corporation
Inventor: Koumei Kubota
-
Patent number: 8751236
Abstract: A device may receive a plurality of speech sounds that are indicative of pronunciations of a first linguistic term. The device may determine concatenation features of the plurality of speech sounds. The concatenation features may be indicative of an acoustic transition between a first speech sound and a second speech sound when the two are concatenated. The first speech sound may be included in the plurality of speech sounds, and the second speech sound may be indicative of a pronunciation of a second linguistic term. The device may cluster the plurality of speech sounds into one or more clusters based on the concatenation features, and may provide a representative speech sound of a given cluster as the first speech sound when the first speech sound and the second speech sound are concatenated.
Type: Grant
Filed: October 23, 2013
Date of Patent: June 10, 2014
Assignee: Google Inc.
Inventors: Javier Gonzalvo Fructuoso, Alexander Gutkin, Ioannis Agiomyrgiannakis
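The clustering step this abstract describes can be sketched as a simple k-means over per-unit join features, with a medoid as the cluster's representative speech sound. The 2-D feature vectors, cluster count, and initialization below are illustrative assumptions, not the patent's actual method.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=20):
    # toy initialization: first k points serve as the initial centroids
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # assignment: each point joins its nearest centroid
        for i, p in enumerate(points):
            assign[i] = min(range(k), key=lambda c: dist(p, centroids[c]))
        # update: each centroid moves to the mean of its members
        for c in range(k):
            members = [p for i, p in enumerate(points) if assign[i] == c]
            if members:
                centroids[c] = [sum(v) / len(members) for v in zip(*members)]
    return assign, centroids

def representative(points, assign, centroids, cluster):
    # medoid: the actual recorded speech sound closest to the centroid
    idx = [i for i in range(len(points)) if assign[i] == cluster]
    return min(idx, key=lambda i: dist(points[i], centroids[cluster]))

# hypothetical 2-D join features (e.g. boundary spectrum values) for six
# recorded variants of one linguistic term
features = [[0.10, 0.20], [0.12, 0.19], [0.11, 0.21],
            [0.90, 0.80], [0.88, 0.83], [0.91, 0.79]]
assign, cents = kmeans(features, 2)
```

Concatenation would then use `representative(...)` of the cluster whose join features best match the following unit, rather than an arbitrary variant.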
-
Patent number: 8744852
Abstract: A spoken interface is described for assisting a visually impaired user to obtain audible information and interact with elements displayed on a display screen. The spoken interface also enables access and control of other elements that are hidden by other windows. The interface receives user input data representing user inputs received by an input device and uses a movable selector to select an element of an application. The element selected by the selector may be either an editing type element or non-editing type element. The interface provides audio information regarding the selected editing or non-editing element and enables interaction with the selected element.
Type: Grant
Filed: December 20, 2006
Date of Patent: June 3, 2014
Assignee: Apple Inc.
Inventors: Eric T. Seymour, Richard W. Fabrick, II, Patti P. Yeh, John O. Louch
-
Patent number: 8744848
Abstract: A method and apparatus useful for training speech recognition engines is provided. Many of today's speech recognition engines require training for particular individuals to accurately convert speech to text. For certain applications, this training requires significant resources. To reduce these resources, a trainer is provided with the text transcription and the audio file. The trainer updates the text based on the audio file. The changes are provided to the speech recognition engine to train it and update the user profile. In certain aspects, the training is reversible, as it is possible to overtrain the system such that the trained system is actually less proficient.
Type: Grant
Filed: April 21, 2011
Date of Patent: June 3, 2014
Assignee: NVQQ Incorporated
Inventors: Jeffrey Hoepfinger, David Mondragon
-
Patent number: 8744853
Abstract: An objective is to provide a technique for accurately reproducing features of a fundamental frequency of a target-speaker's voice on the basis of only a small amount of learning data. A learning apparatus learns shift amounts from a reference source F0 pattern to a target F0 pattern of a target-speaker's voice. The learning apparatus associates a source F0 pattern of a learning text with a target F0 pattern of the same learning text by associating their peaks and troughs. For each of the points on the target F0 pattern, the learning apparatus obtains shift amounts in the time-axis direction and in the frequency-axis direction from the corresponding point on the source F0 pattern with reference to the result of the association, and learns a decision tree using, as an input feature vector, linguistic information obtained by parsing the learning text, and using, as an output feature vector, the calculated shift amounts.
Type: Grant
Filed: March 16, 2010
Date of Patent: June 3, 2014
Assignee: International Business Machines Corporation
Inventors: Masafumi Nishimura, Ryuki Tachibana
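The peak-association step can be illustrated with a toy alignment: pair the i-th peak of the source F0 contour with the i-th peak of the target contour and record the time-axis and frequency-axis offsets. Real systems align peaks and troughs jointly (e.g. by dynamic programming); this in-order pairing is a simplifying assumption for illustration only.

```python
def find_peaks(contour):
    # indices that are strict local maxima of the F0 contour
    return [i for i in range(1, len(contour) - 1)
            if contour[i - 1] < contour[i] > contour[i + 1]]

def shift_amounts(source_f0, target_f0):
    # pair peaks in order; each pair yields (time shift, frequency shift)
    return [(t - s, target_f0[t] - source_f0[s])
            for s, t in zip(find_peaks(source_f0), find_peaks(target_f0))]

# source has peaks at frames 1 and 3; the "target speaker" realizes the
# same accents one frame later and a few Hz higher (made-up contours)
source = [100, 120, 110, 130, 105]
target = [90, 100, 125, 110, 140, 108]
```

The resulting (time, frequency) shift pairs are what the decision tree would be trained to predict from linguistic features of the text.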
-
Patent number: 8744851
Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, identifying replacement segments in a secondary speech database, enhancing the primary speech database by substituting the identified secondary speech database segments for the corresponding identified segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.
Type: Grant
Filed: August 13, 2013
Date of Patent: June 3, 2014
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Alistair Conkie, Ann K Syrdal
-
Patent number: 8738280
Abstract: Methods for pedestrian unit (PU) communication activity reduction in pedestrian-to-vehicle communication networks include obtaining safety risk information for a pedestrian at risk for involvement in an accident and using the risk information to adjust a PU communication activity. In some embodiments, the activity reduction is achieved without implementing understanding of surroundings. In other embodiments, the activity reduction is based on risk assessment provided by vehicles. In some embodiments, the activity reduction includes PU transmission reduction. In some embodiments the transmission activity reduction may be followed by reception activity reduction for overall power consumption reduction.
Type: Grant
Filed: May 10, 2012
Date of Patent: May 27, 2014
Assignee: Autotalks Ltd.
Inventor: Onn Haran
-
Patent number: 8731933
Abstract: A speech synthesizing apparatus includes a selector configured to select a plurality of speech units for synthesizing a speech of a phoneme sequence by referring to speech unit information stored in an information memory. Speech unit waveforms corresponding to the speech units are acquired from a plurality of speech unit waveforms stored in a waveform memory, and the speech is synthesized by utilizing the speech unit waveforms acquired. When acquiring the speech unit waveforms, at least two speech unit waveforms from a continuous region of the waveform memory are copied onto a buffer by one access, wherein a data quantity of the at least two speech unit waveforms is less than or equal to a size of the buffer.
Type: Grant
Filed: April 10, 2013
Date of Patent: May 20, 2014
Assignee: Kabushiki Kaisha Toshiba
Inventor: Takehiko Kagoshima
-
Patent number: 8731913
Abstract: A method for overlap-adding signals useful for performing frame loss concealment (FLC) in an audio decoder as well as in other applications. The method uses a dynamic mix of windows to overlap two signals whose normalized cross-correlation may vary from zero to one. If the overlapping signals are decomposed into a correlated component and an uncorrelated component, they are overlap-added separately using the appropriate window, and then added together. If the overlapping signals are not decomposed, a weighted mix of windows is used. The mix is determined by a measure estimating the amount of cross-correlation between overlapping signals, or the relative amount of correlated to uncorrelated signals.
Type: Grant
Filed: April 13, 2007
Date of Patent: May 20, 2014
Assignee: Broadcom Corporation
Inventors: Robert W. Zopf, Juin-Hwey Chen
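The weighted window mix can be sketched as follows: a linear (amplitude-complementary) crossfade suits fully correlated signals, a square-root (power-complementary) crossfade suits uncorrelated ones, and the two window families are blended by an estimate of the normalized cross-correlation. The specific window shapes below are illustrative choices, not the patent's exact windows.

```python
import math

def norm_xcorr(x, y):
    # normalized cross-correlation of two equal-length frames
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den if den else 0.0

def overlap_add(x, y):
    # x fades out while y fades in over an overlap of len(x) samples
    n = len(x)
    r = max(0.0, norm_xcorr(x, y))  # correlation estimate, clipped to [0, 1]
    out = []
    for i in range(n):
        fade = i / (n - 1)
        # blend amplitude-complementary (linear) and
        # power-complementary (sqrt) windows by the correlation estimate
        w_out_sig = r * (1 - fade) + (1 - r) * math.sqrt(1 - fade)
        w_in_sig = r * fade + (1 - r) * math.sqrt(fade)
        out.append(w_out_sig * x[i] + w_in_sig * y[i])
    return out
```

For perfectly correlated overlapping signals (r = 1) the linear windows sum to one and the overlap-add is transparent; for uncorrelated signals the sqrt windows preserve total power instead.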
-
Patent number: 8731932
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating a synthetic voice. A system configured to practice the method combines a first database of a first text-to-speech voice and a second database of a second text-to-speech voice to generate a combined database, selects from the combined database, based on a policy, voice units of a phonetic category for the synthetic voice to yield selected voice units, and synthesizes speech based on the selected voice units. The system can synthesize speech without parameterizing the first text-to-speech voice and the second text-to-speech voice. A policy can define, for a particular phonetic category, from which text-to-speech voice to select voice units. The combined database can include multiple text-to-speech voices from different speakers. The combined database can include voices of a single speaker speaking in different styles. The combined database can include voices of different languages.
Type: Grant
Filed: August 6, 2010
Date of Patent: May 20, 2014
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Alistair D. Conkie, Ann K. Syrdal
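A policy of the kind described, mapping each phonetic category to the voice that should supply its units, can be sketched as a lookup over a combined unit database. The categories, voice names, and unit ids below are made up for illustration.

```python
# hypothetical combined database: (voice, phonetic category) -> unit ids
combined_db = {
    ("voice_a", "vowel"): ["a_vow_1", "a_vow_2"],
    ("voice_b", "vowel"): ["b_vow_1"],
    ("voice_a", "fricative"): ["a_fric_1"],
    ("voice_b", "fricative"): ["b_fric_1", "b_fric_2"],
}

# policy: which source voice supplies units for each phonetic category
policy = {"vowel": "voice_a", "fricative": "voice_b"}

def select_units(category):
    # route the unit-selection query to the voice the policy prescribes
    return combined_db[(policy[category], category)]
```

Because selection happens at the unit level, no parametric blending of the two voices is needed, which matches the abstract's claim that the voices need not be parameterized.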
-
Patent number: 8731931
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for speech synthesis. A system practicing the method receives a set of ordered lists of speech units, for each respective speech unit in each ordered list in the set of ordered lists, constructs a sublist of speech units from a next ordered list which are suitable for concatenation, performs a cost analysis of paths through the set of ordered lists of speech units based on the sublist of speech units for each respective speech unit, and synthesizes speech using a lowest cost path of speech units through the set of ordered lists based on the cost analysis. The ordered lists can be ordered based on the respective pitch of each speech unit. In one embodiment, speech units which do not have an assigned pitch can be assigned a pitch.
Type: Grant
Filed: June 18, 2010
Date of Patent: May 20, 2014
Assignee: AT&T Intellectual Property I, L.P.
Inventor: Alistair D. Conkie
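The lowest-cost-path search over ordered candidate lists is, in essence, a Viterbi-style dynamic program. A minimal sketch, with pitch distance standing in for the patent's full concatenation cost, might look like this:

```python
def best_path(lists, concat_cost):
    # cost[i][j]: cheapest cost of reaching candidate j in list i;
    # back[i][j]: which candidate in list i-1 achieved it
    n = len(lists)
    cost = [[0.0] * len(lists[0])] + [[float("inf")] * len(l) for l in lists[1:]]
    back = [[0] * len(l) for l in lists]
    for i in range(1, n):
        for j, unit in enumerate(lists[i]):
            for k, prev in enumerate(lists[i - 1]):
                c = cost[i - 1][k] + concat_cost(prev, unit)
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, k
    # backtrack from the cheapest final candidate
    j = min(range(len(lists[-1])), key=lambda j: cost[-1][j])
    path = [j]
    for i in range(n - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    return list(reversed(path))

# toy candidates: each unit represented only by its pitch in Hz
candidate_lists = [[100, 200], [105, 190], [110, 180]]
```

The "sublist of units suitable for concatenation" in the abstract corresponds to pruning the inner loop to promising predecessors, which cuts the quadratic per-step cost of this naive version.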
-
Patent number: 8731943
Abstract: Systems, methods and computer program products are provided for translating a natural language into music. Through systematic parsing, music compositions can be created. These compositions can be created by one or more persons who do not speak the same natural language.
Type: Grant
Filed: February 5, 2010
Date of Patent: May 20, 2014
Assignee: Little Wing World LLC
Inventors: Nicolle Ruetz, David Warhol
-
Publication number: 20140136207
Abstract: A voice synthesizing apparatus includes a first receiver configured to receive first utterance control information generated by detecting a start of a manipulation on a manipulating member by a user, a first synthesizer configured to synthesize, in response to a reception of the first utterance control information, a first voice corresponding to a first phoneme in a phoneme sequence of a voice to be synthesized to output the first voice, a second receiver configured to receive second utterance control information generated by detecting a completion of the manipulation on the manipulating member or a manipulation on a different manipulating member, and a second synthesizer configured to synthesize, in response to a reception of the second utterance control information, a second voice including at least the first phoneme and a succeeding phoneme being subsequent to the first phoneme of the voice to be synthesized to output the second voice.
Type: Application
Filed: November 14, 2013
Publication date: May 15, 2014
Applicant: Yamaha Corporation
Inventors: Hiraku KAYAMA, Yoshiki NISHITANI
-
Patent number: 8725513
Abstract: Methods, apparatus, and products are disclosed for providing expressive user interaction with a multimodal application, the multimodal application operating in a multimodal browser on a multimodal device supporting multiple modes of user interaction including a voice mode and one or more non-voice modes, the multimodal application operatively coupled to a speech engine through a VoiceXML interpreter, including: receiving, by the multimodal browser, user input from a user through a particular mode of user interaction; determining, by the multimodal browser, user output for the user in dependence upon the user input; determining, by the multimodal browser, a style for the user output in dependence upon the user input, the style specifying expressive output characteristics for at least one other mode of user interaction; and rendering, by the multimodal browser, the user output in dependence upon the style.
Type: Grant
Filed: April 12, 2007
Date of Patent: May 13, 2014
Assignee: Nuance Communications, Inc.
Inventors: Charles W. Cross, Jr., Ellen M. Eide, Igor R. Jablokov
-
Patent number: 8725505
Abstract: A computer implemented method and system for speech recognition are provided. The method and system generally maintain a set of verbs for speech recognition commands. Upon recognizing utterance of a verb of the set in combination with an invalid object or objects for the verb, the method and system generate an indication relative to the verb and invalid object. The indication can include informing the user that the system is unsure how to execute the command associated with the verb with the invalid object. The method and system can then receive a user input to specify how the verb and invalid object should be treated.
Type: Grant
Filed: October 22, 2004
Date of Patent: May 13, 2014
Assignee: Microsoft Corporation
Inventors: David Mowatt, Robert L. Chambers
-
Patent number: 8719029
Abstract: A viewer device for a digital comic comprising: an information acquisition unit that acquires a digital comic in a file format for viewing on a viewer device, the file format including speech balloon information with information on a speech balloon region that indicates the region of a speech balloon, first text information indicating the dialogue within each speech balloon, the first text information being correlated with each speech balloon, and first display control information including positional information and a transition order of an anchor point so as to enable the image of the entire page to be viewed on a monitor of the viewer device in a scroll view; and a voice reproduction section that synthesizes a voice for reading the text corresponding to the text information, based on an attribute of the character, an attribute of the speech balloon, or the dialogue, and outputs the voice.
Type: Grant
Filed: June 20, 2013
Date of Patent: May 6, 2014
Assignee: Fujifilm Corporation
Inventor: Shunichiro Nonaka
-
Patent number: 8719027
Abstract: An automated method of providing a pronunciation of a word to a remote device is disclosed. The method includes receiving an input indicative of the word to be pronounced. The method further includes searching a database having a plurality of records. Each of the records has an indication of a textual representation and an associated indication of an audible representation. At least one output is provided to the remote device of an audible representation of the word to be pronounced.
Type: Grant
Filed: February 28, 2007
Date of Patent: May 6, 2014
Assignee: Microsoft Corporation
Inventors: Yining Chen, Yusheng Li, Min Chu, Frank Kao-Ping Soong
-
Patent number: 8712776
Abstract: Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets. A text string may be normalized and its native language determined in order to obtain a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back.
Type: Grant
Filed: September 29, 2008
Date of Patent: April 29, 2014
Assignee: Apple Inc.
Inventors: Jerome Bellegarda, Devang Naik, Kim Silverman
-
Patent number: 8706488
Abstract: In one aspect, a method of processing a voice signal to extract information to facilitate training a speech synthesis model is provided. The method comprises acts of detecting a plurality of candidate features in the voice signal, performing at least one comparison between one or more combinations of the plurality of candidate features and the voice signal, and selecting a set of features from the plurality of candidate features based, at least in part, on the at least one comparison. In another aspect, the method is performed by executing a program encoded on a computer readable medium. In another aspect, a speech synthesis model is provided by, at least in part, performing the method.
Type: Grant
Filed: February 27, 2013
Date of Patent: April 22, 2014
Assignee: Nuance Communications, Inc.
Inventors: Michael D. Edgington, Laurence Gillick, Jordan R. Cohen
-
Patent number: 8706492
Abstract: A voice recognition terminal executes a local voice recognition process and utilizes an external center voice recognition process. The terminal includes: a voice message synthesizing element for synthesizing at least one of a voice message to be output from a speaker according to the external center voice recognition process and a voice message to be output from the speaker according to the local voice recognition process, so as to distinguish between the characteristics of the two voice messages; and a voice output element for outputting a synthesized voice message from the speaker.
Type: Grant
Filed: June 28, 2011
Date of Patent: April 22, 2014
Assignee: DENSO CORPORATION
Inventors: Kunio Yokoi, Kazuhisa Suzuki, Masayuki Takami, Naoyori Tanzawa
-
Patent number: 8706497
Abstract: A synthesis filter 106 synthesizes a plurality of wide-band speech signals by combining wide-band phoneme signals and sound source signals from a speech signal code book 105, and a distortion evaluation unit 107 selects one of the wide-band speech signals with a minimum waveform distortion with respect to an up-sampled narrow-band speech signal output from a sampling conversion unit 101. A first bandpass filter 103 extracts a frequency component outside a narrow-band of the wide-band speech signal and a band synthesis unit 104 combines it with the up-sampled narrow-band speech signal.
Type: Grant
Filed: October 22, 2010
Date of Patent: April 22, 2014
Assignee: Mitsubishi Electric Corporation
Inventors: Satoru Furuta, Hirohisa Tasaki
-
Patent number: 8706489
Abstract: A system and method for selecting audio contents by using the speech recognition to obtain a textual phrase from a series of audio contents are provided. The system includes an output module outputting the audio contents, an input module receiving a speech input from a user, a buffer temporarily storing the audio contents within a desired period and the speech input, and a recognizing module performing a speech recognition between the audio contents within the desired period and the speech input to generate an audio phrase and the corresponding textual phrase matching with the speech input.
Type: Grant
Filed: August 8, 2006
Date of Patent: April 22, 2014
Assignee: Delta Electronics Inc.
Inventors: Jia-lin Shen, Chien-Chou Hung
-
Patent number: 8706493
Abstract: In one embodiment of a controllable prosody re-estimation system, a TTS/STS engine consists of a prosody prediction/estimation module, a prosody re-estimation module and a speech synthesis module. The prosody prediction/estimation module generates predicted or estimated prosody information. The prosody re-estimation module then re-estimates the predicted or estimated prosody information and produces new prosody information, according to a set of controllable parameters provided by a controllable prosody parameter interface. The new prosody information is provided to the speech synthesis module to produce a synthesized speech.
Type: Grant
Filed: July 11, 2011
Date of Patent: April 22, 2014
Assignee: Industrial Technology Research Institute
Inventors: Cheng-Yuan Lin, Chien-Hung Huang, Chih-Chung Kuo
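Re-estimating predicted prosody under user-controllable parameters can be sketched, for F0 alone, as a shift-and-scale around the predicted contour's mean. The two parameters here (a scale for pitch dynamics and a shift for pitch register) are illustrative stand-ins for the patent's controllable parameter set.

```python
def re_estimate_f0(f0, scale=1.0, shift=0.0):
    # scale expands or compresses pitch excursions around the mean;
    # shift moves the whole contour up or down in frequency
    mean = sum(f0) / len(f0)
    return [mean + scale * (v - mean) + shift for v in f0]
```

With `scale > 1` the synthesized speech sounds more animated, with `scale < 1` more monotone, and `shift` changes the apparent voice register, all without re-running the prosody predictor.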
-
Patent number: 8694319
Abstract: Methods, systems, and products are disclosed for dynamic prosody adjustment for voice-rendering synthesized data that include retrieving synthesized data to be voice-rendered; identifying, for the synthesized data to be voice-rendered, a particular prosody setting; determining, in dependence upon the synthesized data to be voice-rendered and the context information for the context in which the synthesized data is to be voice-rendered, a section of the synthesized data to be rendered; and rendering the section of the synthesized data in dependence upon the identified particular prosody setting.
Type: Grant
Filed: November 3, 2005
Date of Patent: April 8, 2014
Assignee: International Business Machines Corporation
Inventors: William K. Bodin, David Jaramillo, Jerry W. Redman, Derral C. Thorson
-
Patent number: 8694320
Abstract: A method of generating audio for a text-only application comprises the steps of adding a tag to an input text, said tag being usable for adding a sound effect to the generated audio; processing the tag to form instructions for generating the audio; and generating audio with said effect based on the instructions while the text is being presented. The present invention adds entertainment value to text applications, provides a very compact format compared to conventional multimedia, and uses entertainment sound to make text-only applications such as SMS and email more fun and entertaining.
Type: Grant
Filed: April 24, 2008
Date of Patent: April 8, 2014
Assignee: Nokia Corporation
Inventor: Ole Kirkeby
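Tag processing of this kind can be sketched with a made-up inline tag syntax (`<sound:NAME>` is an assumption, not the patent's format), turning tagged text into an instruction list for the audio generator:

```python
import re

def parse_tags(text):
    # split hypothetical "<sound:NAME>" tags from the surrounding text,
    # producing (action, payload) instructions for the audio generator
    instructions = []
    for part in re.split(r"(<sound:[^>]+>)", text):
        if not part:
            continue
        m = re.match(r"<sound:([^>]+)>", part)
        if m:
            instructions.append(("play_effect", m.group(1)))
        else:
            instructions.append(("speak", part))
    return instructions
```

A renderer would then walk the instruction list, presenting the `speak` segments as text (or TTS) while triggering each `play_effect` at its position, which is what keeps the format compact compared to embedding actual audio.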
-
Patent number: 8682671
Abstract: Techniques for generating synthetic speech with contrastive stress. In one aspect, a speech-enabled application generates a text input including a text transcription of a desired speech output, and inputs the text input to a speech synthesis system. The synthesis system generates an audio speech output corresponding to at least a portion of the text input, with at least one portion carrying contrastive stress, and provides the audio speech output for the speech-enabled application. In another aspect, a speech-enabled application inputs a plurality of text strings, each corresponding to a portion of a desired speech output, to a software module for rendering contrastive stress. The software module identifies a plurality of audio recordings that render at least one portion of at least one of the text strings as speech carrying contrastive stress. The speech-enabled application generates an audio speech output corresponding to the desired speech output using the audio recordings.
Type: Grant
Filed: April 17, 2013
Date of Patent: March 25, 2014
Assignee: Nuance Communications, Inc.
Inventors: Darren C. Meyer, Stephen R. Springer
-
Publication number: 20140081642
Abstract: Systems and methods for providing synthesized speech in a manner that takes into account the environment where the speech is presented. A method embodiment includes: based on a listening environment and at least one other parameter associated with that environment, selecting an approach from a plurality of approaches for presenting synthesized speech in the listening environment; presenting synthesized speech according to the selected approach; and, based on natural language input received from a user indicating an inability to understand the presented synthesized speech, selecting a second approach from the plurality of approaches and presenting subsequent synthesized speech using the second approach.
Type: Application
Filed: November 26, 2013
Publication date: March 20, 2014
Inventors: Kenneth H. Rosen, Carroll W. Creswell, Jeffrey J. Farah, Pradeep K. Bansal, Ann K. Syrdal
-
Patent number: 8676584
Abstract: The invention relates to a digital signal processing technique that changes the length of an audio signal and, thus, effectively its play-out speed. This is used for frame rate conversion or sound effects in music production. Time scaling may further be used for fast-forward or slow-motion audio play-out. According to said method, the waveform similarity overlap-add approach is modified such that a maximized similarity is determined among similarity measures of sub-sequence pairs, each comprising a sub-sequence to-be-matched from an input window and a matching sub-sequence from a search window, wherein said sub-sequence pairs comprise at least two pairs of which a first pair comprises a first sub-sequence to-be-matched and a second pair comprises a different second sub-sequence to-be-matched. The input window allows for finding sub-sequence pairs with higher similarity than with a WSOLA approach based on a single sub-sequence to-be-matched. This results in less perceivable artefacts.
Type: Grant
Filed: June 22, 2009
Date of Patent: March 18, 2014
Assignee: Thomson Licensing
Inventor: Markus Schlosser
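The core of the WSOLA-style search, finding the offset in a search window that maximizes waveform similarity to a template sub-sequence, can be sketched with normalized cross-correlation. The patent's modification would repeat this for several candidate templates from the input window and keep the overall best pair; the single-template version below is the baseline it improves on.

```python
import math

def norm_xcorr(x, y):
    # normalized cross-correlation of two equal-length segments
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den if den else 0.0

def best_offset(template, search, lo, hi):
    # slide the template across the search window [lo, hi] and keep
    # the most similar alignment (maximum waveform similarity)
    best, best_r = lo, -2.0
    for off in range(lo, hi + 1):
        seg = search[off:off + len(template)]
        if len(seg) < len(template):
            break
        r = norm_xcorr(template, seg)
        if r > best_r:
            best, best_r = off, r
    return best
```

Once the best offset is known, the matched segment is overlap-added to the output, which is what keeps waveform periodicity intact while the overall play-out length changes.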
-
Patent number: 8670984
Abstract: A custom-content audible representation of selected data content is automatically created for a user. The content is based on content preferences of the user (e.g., one or more web browsing histories). The content is aggregated, converted using text-to-speech technology, and adapted to fit in a desired length selected for the personalized audible representation. The length of the audible representation may be custom for the user, and may be determined based on the amount of time the user is typically traveling.
Type: Grant
Filed: February 25, 2011
Date of Patent: March 11, 2014
Assignee: Nuance Communications, Inc.
Inventors: Eli M. Dow, Marie R. Laser, Sarah J. Sheppard, Jessie Yu
-
Publication number: 20140067396
Abstract: A segment information generation device includes a waveform cutout unit that cuts out a speech waveform from natural speech at a time period not depending on the pitch frequency of the natural speech. A feature parameter extraction unit extracts a feature parameter of the speech waveform cut out by the waveform cutout unit. A time domain waveform generation unit generates a time domain waveform based on the feature parameter.
Type: Application
Filed: May 10, 2012
Publication date: March 6, 2014
Inventor: Masanori Kato
-
Publication number: 20140067398
Abstract: A method and system for vocalizing user-selected sporting event scores. A customized spoken score application module can be configured in association with a device. A real-time score can be preselected by a user from an existing sporting event website for automatically vocalizing the score in a multitude of languages utilizing a speech synthesizer and a translation engine. An existing text-to-speech engine can be integrated with the spoken score application module and controlled by the application module to automatically vocalize the preselected scores listed on the sporting event site. The synthetically-voiced, real-time score can be transmitted to the device at a predetermined time interval. Such an approach automatically and instantly pushes the real-time vocal alerts, thereby permitting the user to continue multitasking without activating the pre-selected vocal alerts.
Type: Application
Filed: August 30, 2012
Publication date: March 6, 2014
Inventor: Tony Verna
-
Question-answering system and method based on semantic labeling of text documents and user questions
Patent number: 8666730
Abstract: A question-answering system for searching exact answers in text documents provided in electronic or digital form to questions formulated by the user in natural language is based on automatic semantic labeling of text documents and user questions. The system performs semantic labeling with the help of markers in terms of basic knowledge types, their components and attributes, in terms of question types from the predefined classifier for target words, and in terms of components of possible answers. A matching procedure makes use of the mentioned types of semantic labels to determine exact answers to questions and present them to the user in the form of fragments of sentences or a newly synthesized phrase in natural language. Users can independently add new types of questions to the system classifier and develop required linguistic patterns for the system linguistic knowledge base.
Type: Grant
Filed: March 12, 2010
Date of Patent: March 4, 2014
Assignee: Invention Machine Corporation
Inventors: James Todhunter, Igor Sovpel, Dzianis Pastanohau
-
Patent number: 8666746
Abstract: A system and method are disclosed for generating customized text-to-speech voices for a particular application. The method comprises generating a custom text-to-speech voice by selecting a voice for generating a custom text-to-speech voice associated with a domain, collecting text data associated with the domain from a pre-existing text data source, and using the collected text data, generating an in-domain inventory of synthesis speech units by selecting speech units appropriate to the domain via a search of a pre-existing inventory of synthesis speech units, or by recording the minimal inventory for a selected level of synthesis quality. The text-to-speech custom voice for the domain is generated utilizing the in-domain inventory of synthesis speech units. Active learning techniques may also be employed to identify problem phrases, wherein only a few minutes of recorded data is necessary to deliver a high quality TTS custom voice.
Type: Grant
Filed: May 13, 2004
Date of Patent: March 4, 2014
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Srinivas Bangalore, Junlan Feng, Mazin G. Rahim, Juergen Schroeter, David Eugene Schulz, Ann K. Syrdal
-
Patent number: 8660843
Abstract: Systems and methods are described for systems that utilize an interaction manager to manage interactions (also known as requests or dialogues) from one or more applications. The interactions are managed properly even if multiple applications use different grammars. The interaction manager maintains a priority for each of the interactions, such as via an interaction list, where the priority of the interactions corresponds to an order in which the interactions are to be processed. Interactions are normally processed in the order in which they are received. However, the systems and methods described herein may provide a grace period after processing a first interaction and before processing a second interaction. If a third interaction that is chained to the first interaction is received during this grace period, then the third interaction may be processed before the second interaction.
Type: Grant
Filed: January 23, 2013
Date of Patent: February 25, 2014
Assignee: Microsoft Corporation
Inventors: Stephen Russell Falcon, Clement Chun Pong Yip, Dan Banay, David Michael Miller
-
Patent number: 8655659
Abstract: A personalized text-to-speech synthesizing device includes: a personalized speech feature library creator, configured to recognize personalized speech features of a specific speaker by comparing a random speech fragment of the specific speaker with preset keywords, thereby to create a personalized speech feature library associated with the specific speaker, and store the personalized speech feature library in association with the specific speaker; and a text-to-speech synthesizer, configured to perform a speech synthesis of a text message from the specific speaker, based on the personalized speech feature library associated with the specific speaker and created by the personalized speech feature library creator, thereby to generate and output a speech fragment having pronunciation characteristics of the specific speaker.
Type: Grant
Filed: August 12, 2010
Date of Patent: February 18, 2014
Assignees: Sony Corporation, Sony Mobile Communications AB
Inventors: Qingfang Wang, Shouchun He
-
Patent number: 8655662
Abstract: Disclosed herein are systems, methods, and computer-readable media for answering a communication notification. The method for answering a communication notification comprises receiving a notification of communication from a user, converting information related to the notification to speech, outputting the information as speech to the user, and receiving from the user an instruction to accept or ignore the incoming communication associated with the notification. In one embodiment, information related to the notification comprises one or more of a telephone number, an area code, a geographic origin of the request, caller ID, a voice message, address book information, a text message, an email, a subject line, an importance level, a photograph, a video clip, metadata, an IP address, or a domain name. Another embodiment involves assigning the notification an importance level and repeating notification attempts if it is of high importance.
Type: Grant
Filed: November 29, 2012
Date of Patent: February 18, 2014
Assignee: AT&T Intellectual Property I, L.P.
Inventor: Horst Schroeter
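The notification-to-speech flow with importance-based repetition might be sketched as follows. This is a toy illustration, not the patented method: the field names, the `tts` callable, and the retry policy are all assumptions, and a real device would block between attempts waiting for an accept/ignore instruction.

```python
def announce_notification(notification, tts, max_attempts=3):
    """Hypothetical sketch: render notification details as speech and,
    for a high-importance notification, repeat the announcement."""
    details = ", ".join(
        f"{field}: {notification[field]}"
        for field in ("caller_id", "subject", "importance")
        if field in notification
    )
    # High-importance notifications get repeat attempts; others get one.
    attempts = max_attempts if notification.get("importance") == "high" else 1
    spoken = []
    for _ in range(attempts):
        spoken.append(tts(f"Incoming communication. {details}"))
        # a real implementation would pause here for accept/ignore input
    return spoken
```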
-
Publication number: 20140046667
Abstract: A system for creating musical content using a client terminal is provided. Diverse musical information, such as a desired lyric, musical scale, duration, and singing technique, is input from an online or cloud computer, an embedded terminal, or other such client terminal. Using computer speech synthesis technology to generate musical vocal content, speech in which cadence is expressed in accordance with the musical scale is synthesized, produced for the applicable duration, and transmitted to the client terminal.
Type: Application
Filed: April 17, 2012
Publication date: February 13, 2014
Applicant: TGENS CO., LTD
Inventors: Jong Hak Yeom, Won Mo Kang
-
Patent number: 8650027
Abstract: The invention provides an electrolaryngeal speech reconstruction method and a system thereof. Firstly, model parameters are extracted from the collected speech as a parameter library. Then facial images of a speaker are acquired and transmitted to an image analyzing and processing module to obtain the voice onset and offset times and the vowel classes. Next, a waveform of a voice source is synthesized by a voice source synthesis module. Finally, the waveform of the above voice source is output by an electrolarynx vibration output module. The voice source synthesis module firstly sets the model parameters of a glottal voice source so as to synthesize the waveform of the glottal voice source, and then a waveguide model is used to simulate sound transmission in a vocal tract and to select shape parameters of the vocal tract according to the vowel classes.
Type: Grant
Filed: September 4, 2012
Date of Patent: February 11, 2014
Assignee: Xi'an Jiaotong University
Inventors: Mingxi Wan, Liang Wu, Supin Wang, Zhifeng Niu, Congying Wan
-
Patent number: 8650035
Abstract: A speech conversion system facilitates voice communications. A database comprises a plurality of conversion heuristics, at least some of the conversion heuristics being associated with identification information for at least one first party. At least one speech converter is configured to convert a first speech signal received from the at least one first party into a converted first speech signal different than the first speech signal.
Type: Grant
Filed: November 18, 2005
Date of Patent: February 11, 2014
Assignee: Verizon Laboratories Inc.
Inventor: Adrian E. Conway
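The core idea, conversion heuristics looked up by the calling party's identity, can be sketched minimally. The class and the per-sample "heuristic" below are placeholders of my own invention; the patent does not specify what form the heuristics take.

```python
class SpeechConverter:
    """Sketch: a database maps party identification to a conversion
    heuristic; incoming speech from that party is converted with it.
    Signals are modeled as plain sample lists for illustration."""

    def __init__(self):
        self.heuristics = {}   # party id -> conversion function

    def register(self, party_id, conversion):
        self.heuristics[party_id] = conversion

    def convert(self, party_id, signal):
        # Parties without a registered heuristic pass through unchanged.
        conversion = self.heuristics.get(party_id, lambda sample: sample)
        return [conversion(sample) for sample in signal]
```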
-
Patent number: 8650034
Abstract: According to one embodiment, a speech processing device includes an utterance error occurrence determination information storage unit that stores utterance error occurrence determination information; a related word information storage unit that stores related word information including words; an utterance error occurrence determining unit that compares each of the divided words with the conditions, gives the associated error pattern to each word corresponding to a condition, and determines that a word which does not correspond to any condition does not cause an utterance error; and a phoneme string generating unit that generates a phoneme string of the utterance error. When the error pattern associated with one of the conditions is a speech error, the utterance error occurrence determining unit further gives an incorrectly spoken word from the related word information, and the phoneme string generating unit generates a phoneme string of the incorrectly spoken word.
Type: Grant
Filed: August 12, 2011
Date of Patent: February 11, 2014
Assignee: Kabushiki Kaisha Toshiba
Inventor: Noriko Yamanaka
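The condition-matching step described above can be sketched as a small rule loop. The rule format (predicate plus error-pattern label) and the `"speech_error"` tag are hypothetical stand-ins; the patent's actual determination information and phoneme generation are not reproduced here.

```python
def generate_utterance_errors(words, determination_rules, related_words):
    """Illustrative sketch: compare each word with the conditions, attach
    the associated error pattern, and for a speech error substitute an
    incorrectly spoken word drawn from the related word information."""
    results = []
    for word in words:
        pattern = None
        for condition, error_pattern in determination_rules:
            if condition(word):
                pattern = error_pattern
                break
        if pattern is None:
            # word matches no condition: no utterance error occurs
            results.append((word, None))
        elif pattern == "speech_error":
            # substitute the related (incorrectly spoken) word
            results.append((related_words.get(word, word), pattern))
        else:
            results.append((word, pattern))
    return results
```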
-
Publication number: 20140039892
Abstract: In one embodiment, a human interactive proof portal 140 may use a biometric input to determine whether a user is a standard user or a malicious actor. The human interactive proof portal 140 may receive an access request 302 for an online data service 122 from a user device 110. The human interactive proof portal 140 may send a proof challenge 304 to the user device 110 for presentation to a user. The human interactive proof portal 140 may receive from the user device 110 a proof response 306 having a biometric metadata description 430 based on a biometric input from the user.
Type: Application
Filed: August 2, 2012
Publication date: February 6, 2014
Applicant: Microsoft Corporation
Inventors: Chad Mills, Robert Sim, Scott Laufer, Sung Chung
-
Patent number: 8645140
Abstract: A method of associating a voice font with a contact for text-to-speech conversion at an electronic device includes obtaining, at the electronic device, the voice font for the contact, and storing the voice font in association with a contact data record stored in a contacts database at the electronic device. The contact data record includes contact data for the contact.
Type: Grant
Filed: February 25, 2009
Date of Patent: February 4, 2014
Assignee: BlackBerry Limited
Inventor: Yuriy Lobzakov
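The data-structure side of this claim, a voice font stored in association with a contact record, is simple to sketch. The record fields and the string-identifier representation of a voice font are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ContactRecord:
    """Sketch of a contact data record; a voice font is stored
    alongside the contact data (here, as an identifier string)."""
    name: str
    phone: str
    voice_font: Optional[str] = None

class ContactsDatabase:
    def __init__(self):
        self.records = {}

    def add_contact(self, name, phone):
        self.records[name] = ContactRecord(name, phone)

    def associate_voice_font(self, name, voice_font):
        # store the voice font in association with the contact record
        self.records[name].voice_font = voice_font

    def font_for(self, name, default="default-font"):
        # a TTS engine would use this font to render the contact's messages
        record = self.records.get(name)
        return record.voice_font if record and record.voice_font else default
```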
-
Patent number: 8645141
Abstract: A system and method for text to speech conversion. The method of performing text to speech conversion on a portable device includes: identifying a portion of text for conversion to speech format, wherein the identifying includes performing a prediction based on information associated with a user. While the portable device is connected to a power source, a text to speech conversion is performed on the portion of text to produce converted speech. The converted speech is stored into a memory device of the portable device. A reader application is executed, wherein a user request is received for narration of the portion of text. During the executing, the converted speech is accessed from the memory device and rendered to the user, responsive to the user request.
Type: Grant
Filed: September 14, 2010
Date of Patent: February 4, 2014
Assignee: Sony Corporation
Inventors: Ling Jun Wong, True Xiong
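The predict-then-precompute flow can be sketched in a few lines. The "prediction" here (the chapter after the last one read) is a deliberately naive stand-in for whatever user-associated information the patent contemplates, and every name below is hypothetical.

```python
def precache_narration(reading_history, library, is_charging, tts, cache):
    """Sketch: predict the next portion of text the user will want
    narrated and convert it to speech only while on external power."""
    if not is_charging:
        return cache                  # defer conversion when on battery
    predicted = reading_history[-1] + 1   # naive prediction: next chapter
    if predicted in library and predicted not in cache:
        cache[predicted] = tts(library[predicted])
    return cache

def narrate(chapter, library, cache, tts):
    """Serve precomputed speech when available; fall back to on-demand."""
    return cache.get(chapter) or tts(library[chapter])
```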
-
Patent number: 8639511
Abstract: A robot may include a driving control unit configured to control a driving of a movable unit that is connected movably to a body unit, a voice generating unit configured to generate a voice, and a voice output unit configured to output the voice, which has been generated by the voice generating unit. The voice generating unit may correct the voice, which is generated, based on a bearing of the movable unit, which is controlled by the driving control unit, to the body unit.
Type: Grant
Filed: September 14, 2010
Date of Patent: January 28, 2014
Assignee: Honda Motor Co., Ltd.
Inventors: Kazuhiro Nakadai, Takuma Otsuka, Hiroshi Okuno
-
Patent number: 8635069
Abstract: Methods of adding data identifiers and speech/voice recognition functionality are disclosed. A telnet client runs one or more scripts that add data identifiers to data fields in a telnet session. The input data is inserted in the corresponding fields based on data identifiers. Scripts run only on the telnet client without modifications to the server applications. Further disclosed are methods for providing speech recognition and voice functionality to telnet clients. Portions of input data are converted to voice and played to the user. A user also may provide input to certain fields of the telnet session by using his voice. Scripts running on the telnet client convert the user's voice into text, which is inserted into the corresponding fields.
Type: Grant
Filed: August 16, 2007
Date of Patent: January 21, 2014
Assignee: Crimson Corporation
Inventors: Lamar John Van Wagenen, Brant David Thomsen, Scott Allen Caddes
-
Patent number: 8635058
Abstract: The present invention relates to increasing the relevance of media content communicated to consumers who are consuming the media content. In this regard, at least one personal device can be synced with a media device, each personal device being associated with at least one consumer who is proximate the media device. At least one preferred human language associated with at least one of the personal devices can be determined. The media device or media content can be configured and/or caused to be communicated in at least one of the preferred human languages to increase the relevance of the media content communicated to the consumer. Other embodiments can include communicating at least a portion of the media content on the personal device and selecting relevant media content based in part on language, culture, ethnicity, time, day, occasion, or geography.
Type: Grant
Filed: March 2, 2010
Date of Patent: January 21, 2014
Inventor: Nilang Patel
-
Patent number: 8635070
Abstract: According to one embodiment, a speech translation apparatus includes a receiving unit, a first recognition unit, a second recognition unit, a first generation unit, a translation unit, a second generation unit, and a synthesis unit. The receiving unit is configured to receive a speech in a first language and convert it to a speech signal. The first recognition unit is configured to perform speech recognition and generate a transcription. The second recognition unit is configured to recognize which emotion type is included in the speech and generate emotion identification information including the recognized emotion type(s). The first generation unit is configured to generate a filtered sentence. The translation unit is configured to generate a translation of the filtered sentence from the first language into a second language. The second generation unit is configured to generate an insertion sentence. The synthesis unit is configured to convert the filtered and insertion sentences into a speech signal.
Type: Grant
Filed: March 25, 2011
Date of Patent: January 21, 2014
Assignee: Kabushiki Kaisha Toshiba
Inventor: Kazuo Sumita
-
Patent number: 8630857
Abstract: Disclosed is a speech synthesizing apparatus including a segment selection unit that selects a segment suited to a target segment environment from candidate segments. The apparatus includes a prosody change amount calculation unit that calculates the prosody change amount of each candidate segment based on the prosody information of the candidate segments and the target segment environment, a selection criterion calculation unit that calculates a selection criterion based on the prosody change amount, a candidate selection unit that narrows down selection candidates based on the prosody change amount and the selection criterion, and an optimum segment search unit that searches for an optimum segment from among the narrowed-down candidate segments.
Type: Grant
Filed: February 15, 2008
Date of Patent: January 14, 2014
Assignee: NEC Corporation
Inventors: Masanori Kato, Reishi Kondo, Yasuyuki Mitsui
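The two-stage selection (compute change amounts, derive a criterion, narrow, then search) can be sketched as follows. The scalar "prosody" values, the distance measure, and the percentile-style criterion are all simplifications I have chosen for illustration; real candidates would carry multi-dimensional acoustic features.

```python
def select_segment(candidates, target, keep_ratio=0.5):
    """Sketch of prosody-based candidate narrowing followed by an
    optimum-segment search, with candidates and the target segment
    environment reduced to single illustrative prosody values."""
    # prosody change amount: distance of each candidate from the target
    changes = sorted((abs(c - target), c) for c in candidates)
    # selection criterion: keep only the lowest-change fraction
    cutoff = max(1, int(len(changes) * keep_ratio))
    narrowed = changes[:cutoff]
    # optimum segment search among the narrowed-down candidates
    return min(narrowed)[1]
```

Narrowing before the final search is the point of the design: the expensive optimum-segment search runs over a smaller candidate set without discarding the low-change candidates that are likely to win.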
-
Patent number: 8626510
Abstract: An acquiring unit acquires pattern sentences, which are similar to one another and include fixed segments and non-fixed segments, and substitution words that are substituted for the non-fixed segments. A sentence generating unit generates target sentences by replacing the non-fixed segments with the substitution words for each of the pattern sentences. A first synthetic-sound generating unit generates a first synthetic sound, a synthetic sound of the fixed segment, and a second synthetic-sound generating unit generates a second synthetic sound, a synthetic sound of the substitution word, for each of the target sentences. A calculating unit calculates a discontinuity value of a boundary between the first synthetic sound and the second synthetic sound for each of the target sentences, and a selecting unit selects the target sentence having the smallest discontinuity value. A connecting unit connects the first synthetic sound and the second synthetic sound of the target sentence selected.
Type: Grant
Filed: September 15, 2009
Date of Patent: January 7, 2014
Assignee: Kabushiki Kaisha Toshiba
Inventor: Nobuaki Mizutani
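The selection step (synthesize both parts for each candidate sentence, score the boundary, keep the smallest discontinuity) can be sketched minimally. The `synth` and `discontinuity` callables stand in for the synthetic-sound generating units and the boundary calculation, which the patent does not specify in code; the `{}` placeholder convention for the non-fixed segment is my own.

```python
def pick_smoothest_sentence(pattern, substitutions, synth, discontinuity):
    """Sketch: fill the non-fixed segment of a pattern sentence with each
    substitution word and select the target sentence whose boundary
    between the two synthetic sounds has the smallest discontinuity."""
    fixed, _, rest = pattern.partition("{}")
    best, best_cost = None, None
    for word in substitutions:
        first = synth(fixed)      # synthetic sound of the fixed segment
        second = synth(word)      # synthetic sound of the substitution word
        cost = discontinuity(first, second)
        if best_cost is None or cost < best_cost:
            best, best_cost = fixed + word + rest, cost
    return best, best_cost
```

With toy stand-ins (text as its own "sound", a trivial boundary score), the function selects the substitution whose boundary scores lowest.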
-
Patent number: 8620661
Abstract: A system for controlling digital effects in live performances with vocal improvisation is described. The system features a controller that utilizes several switches attached to clothing that is worn by an artist during a live performance. The switches activate a digital vocal processor unit that provides a dual mode, multi-channel phrase looping capability wherein individual channels can be selected for recording and replay during the performance. This combination of features allows a sequence of digital audio and video effects to be controlled by the artist during a performance while maintaining the freedom of movement desired to enhance the performance.
Type: Grant
Filed: February 28, 2011
Date of Patent: December 31, 2013
Inventor: Momilani Ramstrum