Synthesis Patents (Class 704/258)
  • Patent number: 8751237
    Abstract: A sound control section (114) selects and outputs a text-to-speech item from items included in program information multiplexed with a broadcast signal, and starts or stops outputting the text-to-speech item based on a request from a remote controller control section (113). A sound generation section (115) converts the text-to-speech item to a sound signal. A speaker (109) reproduces the sound signal. The sound control section (114) compares each item of information about the program currently selected by the user's operation of the remote controller with the corresponding item of information about the program selected just before that operation. If an item of the currently selected program information is the same as the corresponding item of the previously selected program information, and text-to-speech processing has already been completed for the item since its last change, the sound control section (114) stops outputting the item to the sound generation section (115).
    Type: Grant
    Filed: February 23, 2011
    Date of Patent: June 10, 2014
    Assignee: Panasonic Corporation
    Inventor: Koumei Kubota
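The item-skipping logic described in the abstract above can be sketched as a small function. The dictionary-of-items model and all names here are illustrative assumptions, not the patent's actual implementation:

```python
def items_to_speak(current, previous, already_spoken):
    """Return only the program-info items worth reading aloud: an item
    is skipped when it is unchanged from the previous selection and
    text-to-speech output for it has already completed."""
    return [key for key, value in current.items()
            if previous.get(key) != value or key not in already_spoken]
```

For example, if only the channel number changed between two program selections and the title was already read aloud, only the channel item would be passed on for synthesis.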
  • Patent number: 8751236
    Abstract: A device may receive a plurality of speech sounds that are indicative of pronunciations of a first linguistic term. The device may determine concatenation features of the plurality of speech sounds. The concatenation features may be indicative of an acoustic transition between a first speech sound and a second speech sound when the first speech sound and the second speech sound are concatenated. The first speech sound may be included in the plurality of speech sounds and the second speech sound may be indicative of a pronunciation of a second linguistic term. The device may cluster the plurality of speech sounds into one or more clusters based on the concatenation features. The device may provide a representative speech sound of a given cluster as the first speech sound when the first speech sound and the second speech sound are concatenated.
    Type: Grant
    Filed: October 23, 2013
    Date of Patent: June 10, 2014
    Assignee: Google Inc.
    Inventors: Javier Gonzalvo Fructuoso, Alexander Gutkin, Ioannis Agiomyrgiannakis
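A toy version of the clustering idea above, assuming 1-D waveforms and a simple boundary-discontinuity feature; the patent's actual features and clustering method are not specified here:

```python
def concat_feature(unit, next_unit):
    """Acoustic jump at the join point: difference between the last
    sample of `unit` and the first sample of `next_unit` (illustrative)."""
    return abs(unit[-1] - next_unit[0])

def cluster_units(units, next_unit, n_clusters=2):
    """Rank units by their concatenation feature, split the ranking into
    equal-sized clusters, and return the medoid of each cluster as its
    representative speech sound."""
    ranked = sorted(units, key=lambda u: concat_feature(u, next_unit))
    size = max(1, len(ranked) // n_clusters)
    clusters = [ranked[i:i + size] for i in range(0, len(ranked), size)]
    return [c[len(c) // 2] for c in clusters]
```

The representative of the low-discontinuity cluster would then be preferred when concatenating against `next_unit`.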
  • Patent number: 8744852
    Abstract: A spoken interface is described for assisting a visually impaired user to obtain audible information and interact with elements displayed on a display screen. The spoken interface also enables access and control of other elements that are hidden by other windows. The interface receives user input data representing user inputs received by an input device and uses a movable selector to select an element of an application. The element selected by the selector may be either an editing type element or non-editing type element. The interface provides audio information regarding the selected editing or non-editing element and enables interaction with the selected element.
    Type: Grant
    Filed: December 20, 2006
    Date of Patent: June 3, 2014
    Assignee: Apple Inc.
    Inventors: Eric T. Seymour, Richard W. Fabrick, II, Patti P. Yeh, John O. Louch
  • Patent number: 8744848
    Abstract: A method and apparatus useful for training speech recognition engines is provided. Many of today's speech recognition engines require training to particular individuals to accurately convert speech to text. The training requires significant resources for certain applications. To reduce these resources, a trainer is provided with the text transcription and the audio file. The trainer updates the text based on the audio file. The changes are provided to the speech recognition engine to train it and update the user profile. In certain aspects, the training is reversible, as it is possible to over-train the system such that the trained system is actually less proficient.
    Type: Grant
    Filed: April 21, 2011
    Date of Patent: June 3, 2014
    Assignee: NVQQ Incorporated
    Inventors: Jeffrey Hoepfinger, David Mondragon
  • Patent number: 8744853
    Abstract: An objective is to provide a technique for accurately reproducing features of a fundamental frequency of a target-speaker's voice on the basis of only a small amount of learning data. A learning apparatus learns shift amounts from a reference source F0 pattern to a target F0 pattern of a target-speaker's voice. The learning apparatus associates a source F0 pattern of a learning text to a target F0 pattern of the same learning text by associating their peaks and troughs. For each of points on the target F0 pattern, the learning apparatus obtains shift amounts in a time-axis direction and in a frequency-axis direction from a corresponding point on the source F0 pattern in reference to a result of the association, and learns a decision tree using, as an input feature vector, linguistic information obtained by parsing the learning text, and using, as an output feature vector, the calculated shift amounts.
    Type: Grant
    Filed: March 16, 2010
    Date of Patent: June 3, 2014
    Assignee: International Business Machines Corporation
    Inventors: Masafumi Nishimura, Ryuki Tachibana
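The peak-pairing step in the abstract above can be illustrated with F0 contours as plain lists; both the peak detection and the (time, frequency) shift computation below are simplified assumptions:

```python
def find_peaks(f0):
    """Indices of local maxima in an F0 contour (illustrative)."""
    return [i for i in range(1, len(f0) - 1)
            if f0[i - 1] < f0[i] >= f0[i + 1]]

def shift_amounts(source, target):
    """Pair source and target F0 peaks in order and return, for each
    pair, the shift in the time-axis direction (index difference) and
    in the frequency-axis direction (F0 difference)."""
    return [(t - s, target[t] - source[s])
            for s, t in zip(find_peaks(source), find_peaks(target))]
```

The resulting (time shift, frequency shift) pairs are the output feature vectors the patent's decision tree would be trained to predict from linguistic features.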
  • Patent number: 8744851
    Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, identifying replacement segments in a secondary speech database, enhancing the primary speech database by substituting the identified secondary speech database segments for the corresponding identified segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.
    Type: Grant
    Filed: August 13, 2013
    Date of Patent: June 3, 2014
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Alistair Conkie, Ann K Syrdal
  • Patent number: 8738280
    Abstract: Methods for pedestrian unit (PU) communication activity reduction in pedestrian-to-vehicle communication networks include obtaining safety risk information for a pedestrian at risk for involvement in an accident and using the risk information to adjust a PU communication activity. In some embodiments, the activity reduction is achieved without implementing understanding of surroundings. In other embodiments, the activity reduction is based on risk assessment provided by vehicles. In some embodiments, the activity reduction includes PU transmission reduction. In some embodiments the transmission activity reduction may be followed by reception activity reduction for overall power consumption reduction.
    Type: Grant
    Filed: May 10, 2012
    Date of Patent: May 27, 2014
    Assignee: Autotalks Ltd.
    Inventor: Onn Haran
  • Patent number: 8731933
    Abstract: A speech synthesizing apparatus includes a selector configured to select a plurality of speech units for synthesizing a speech of a phoneme sequence by referring to speech unit information stored in an information memory. Speech unit waveforms corresponding to the speech units are acquired from a plurality of speech unit waveforms stored in a waveform memory, and the speech is synthesized by utilizing the speech unit waveforms acquired. When acquiring the speech unit waveforms, at least two speech unit waveforms from a continuous region of the waveform memory are copied onto a buffer by one access, wherein a data quantity of the at least two speech unit waveforms is less than or equal to a size of the buffer.
    Type: Grant
    Filed: April 10, 2013
    Date of Patent: May 20, 2014
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Takehiko Kagoshima
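A sketch of the batched-read idea above, modeling waveform memory as a list and each speech unit waveform as a (start, length) span; merging contiguous spans that fit the buffer into one slice stands in for the single memory access:

```python
def batch_copy(memory, spans, buffer_size):
    """Copy waveforms given as (start, length) spans, merging adjacent
    spans that are contiguous in memory and together fit the buffer so
    that several waveforms are fetched with a single slice ("access")."""
    reads, i = [], 0
    while i < len(spans):
        start, length = spans[i]
        end = start + length
        j = i + 1
        # Extend the read while the next span is contiguous and still fits.
        while (j < len(spans) and spans[j][0] == end
               and spans[j][0] + spans[j][1] - start <= buffer_size):
            end = spans[j][0] + spans[j][1]
            j += 1
        reads.append(memory[start:end])
        i = j
    return reads
```

Two contiguous waveforms thus cost one access instead of two, which is the patent's stated saving.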
  • Patent number: 8731913
    Abstract: A method for overlap-adding signals useful for performing frame loss concealment (FLC) in an audio decoder as well as in other applications. The method uses a dynamic mix of windows to overlap two signals whose normalized cross-correlation may vary from zero to one. If the overlapping signals are decomposed into a correlated component and an uncorrelated component, they are overlap-added separately using the appropriate window, and then added together. If the overlapping signals are not decomposed, a weighted mix of windows is used. The mix is determined by a measure estimating the amount of cross-correlation between overlapping signals, or the relative amount of correlated to uncorrelated signals.
    Type: Grant
    Filed: April 13, 2007
    Date of Patent: May 20, 2014
    Assignee: Broadcom Corporation
    Inventors: Robert W. Zopf, Juin-Hwey Chen
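A minimal sketch of the correlation-weighted window mix described above, assuming equal-length overlap regions and linear vs. square-root fades as the two window families; the patent's exact windows and correlation estimator may differ:

```python
import math

def xcorr(a, b):
    """Normalized cross-correlation of two equal-length overlap regions."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den else 0.0

def overlap_add(a, b):
    """Crossfade `a` (fading out) into `b` (fading in) with a window mix
    weighted by their correlation r: linear, amplitude-complementary
    fades when correlated (r = 1); square-root, power-complementary
    fades when uncorrelated (r = 0)."""
    n = len(a)
    r = max(0.0, xcorr(a, b))
    out = []
    for i in range(n):
        up = (i + 1) / n           # fade-in weight
        down = 1.0 - up            # fade-out weight
        wa = r * down + (1 - r) * math.sqrt(down)
        wb = r * up + (1 - r) * math.sqrt(up)
        out.append(wa * a[i] + wb * b[i])
    return out
```

With perfectly correlated signals the amplitude-complementary fade preserves the signal level exactly, which is the motivation for the dynamic mix.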
  • Patent number: 8731932
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating a synthetic voice. A system configured to practice the method combines a first database of a first text-to-speech voice and a second database of a second text-to-speech voice to generate a combined database, selects from the combined database, based on a policy, voice units of a phonetic category for the synthetic voice to yield selected voice units, and synthesizes speech based on the selected voice units. The system can synthesize speech without parameterizing the first text-to-speech voice and the second text-to-speech voice. A policy can define, for a particular phonetic category, from which text-to-speech voice to select voice units. The combined database can include multiple text-to-speech voices from different speakers. The combined database can include voices of a single speaker speaking in different styles. The combined database can include voices of different languages.
    Type: Grant
    Filed: August 6, 2010
    Date of Patent: May 20, 2014
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Alistair D. Conkie, Ann K. Syrdal
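The policy mechanism above might be sketched as a category-to-voice lookup over a combined database; the crude two-category phonetic split and all names are illustrative assumptions:

```python
def select_units(combined_db, policy, phonemes):
    """Select one unit per phoneme from a combined database of several
    text-to-speech voices.  `combined_db` maps voice name -> {phoneme:
    unit}; `policy` maps a phonetic category to the voice that should
    supply units of that category."""
    def category(p):                 # crude two-way split, illustrative
        return "vowel" if p in "aeiou" else "consonant"
    selected = []
    for p in phonemes:
        voice = policy.get(category(p))
        if voice and p in combined_db.get(voice, {}):
            selected.append(combined_db[voice][p])
        else:                        # fall back to any voice with the unit
            selected.append(next(db[p] for db in combined_db.values()
                                 if p in db))
    return selected
```

The same structure accommodates the patent's other cases (one speaker in several styles, or several languages) by treating each style or language as a "voice" in the combined database.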
  • Patent number: 8731931
    Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for speech synthesis. A system practicing the method receives a set of ordered lists of speech units, for each respective speech unit in each ordered list in the set of ordered lists, constructs a sublist of speech units from a next ordered list which are suitable for concatenation, performs a cost analysis of paths through the set of ordered lists of speech units based on the sublist of speech units for each respective speech unit, and synthesizes speech using a lowest cost path of speech units through the set of ordered lists based on the cost analysis. The ordered lists can be ordered based on the respective pitch of each speech unit. In one embodiment, speech units which do not have an assigned pitch can be assigned a pitch.
    Type: Grant
    Filed: June 18, 2010
    Date of Patent: May 20, 2014
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Alistair D. Conkie
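The cost analysis over paths described above is essentially a dynamic-programming (Viterbi-style) search; a compact sketch with a caller-supplied join cost, not the patent's actual cost function:

```python
def lowest_cost_path(lists, join_cost):
    """Cheapest sequence taking one unit from each ordered list, where
    join_cost(u, v) prices the concatenation of consecutive units.
    Dynamic programming keeps, for every unit in the current list, the
    best (cost, path) ending at that unit."""
    best = [(0.0, [u]) for u in lists[0]]
    for nxt in lists[1:]:
        best = [min((cost + join_cost(path[-1], v), path + [v])
                    for cost, path in best)
                for v in nxt]
    return min(best)[1]
```

Treating each unit as its pitch value (as the abstract suggests the lists may be pitch-ordered), a pitch-continuity join cost picks the smoothest path.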
  • Patent number: 8731943
    Abstract: Systems, methods and computer program products are provided for translating a natural language into music. Through systematic parsing, music compositions can be created. These compositions can be created by one or more persons who do not speak the same natural language.
    Type: Grant
    Filed: February 5, 2010
    Date of Patent: May 20, 2014
    Assignee: Little Wing World LLC
    Inventors: Nicolle Ruetz, David Warhol
  • Publication number: 20140136207
    Abstract: A voice synthesizing apparatus includes a first receiver configured to receive first utterance control information generated by detecting a start of a manipulation on a manipulating member by a user, a first synthesizer configured to synthesize, in response to a reception of the first utterance control information, a first voice corresponding to a first phoneme in a phoneme sequence of a voice to be synthesized to output the first voice, a second receiver configured to receive second utterance control information generated by detecting a completion of the manipulation on the manipulating member or a manipulation on a different manipulating member, and a second synthesizer configured to synthesize, in response to a reception of the second utterance control information, a second voice including at least the first phoneme and a succeeding phoneme being subsequent to the first phoneme of the voice to be synthesized to output the second voice.
    Type: Application
    Filed: November 14, 2013
    Publication date: May 15, 2014
    Applicant: Yamaha Corporation
    Inventors: Hiraku KAYAMA, Yoshiki NISHITANI
  • Patent number: 8725513
    Abstract: Methods, apparatus, and products are disclosed for providing expressive user interaction with a multimodal application, the multimodal application operating in a multimodal browser on a multimodal device supporting multiple modes of user interaction including a voice mode and one or more non-voice modes, the multimodal application operatively coupled to a speech engine through a VoiceXML interpreter, including: receiving, by the multimodal browser, user input from a user through a particular mode of user interaction; determining, by the multimodal browser, user output for the user in dependence upon the user input; determining, by the multimodal browser, a style for the user output in dependence upon the user input, the style specifying expressive output characteristics for at least one other mode of user interaction; and rendering, by the multimodal browser, the user output in dependence upon the style.
    Type: Grant
    Filed: April 12, 2007
    Date of Patent: May 13, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Charles W. Cross, Jr., Ellen M. Eide, Igor R. Jablokov
  • Patent number: 8725505
    Abstract: A computer implemented method and system for speech recognition are provided. The method and system generally maintain a set of verbs for speech recognition commands. Upon recognizing utterance of a verb of the set in combination with an invalid object or objects for the verb, the method and system generate an indication relative to the verb and invalid object. The indication can include informing the user that the system is unsure how to execute the command associated with the verb with the invalid object. The method and system can then receive a user input to specify how the verb and invalid object should be treated.
    Type: Grant
    Filed: October 22, 2004
    Date of Patent: May 13, 2014
    Assignee: Microsoft Corporation
    Inventors: David Mowatt, Robert L. Chambers
  • Patent number: 8719029
    Abstract: A viewer device for a digital comic comprising: an information acquisition unit that acquires a digital comic in a file format for viewing on the viewer device, the file format including speech balloon information including information of a speech balloon region that indicates the region of a speech balloon, first text information indicating the dialogue within each speech balloon, the first text information being correlated with each speech balloon, and first display control information including positional information and a transition order of an anchor point so as to enable the image of the entire page to be viewed on a monitor of the viewer device in a scroll view; and a voice reproduction section that synthesizes a voice for reading the letters corresponding to the text information based on an attribute of the character, an attribute of the speech balloon or the dialogue, and outputs the voice.
    Type: Grant
    Filed: June 20, 2013
    Date of Patent: May 6, 2014
    Assignee: Fujifilm Corporation
    Inventor: Shunichiro Nonaka
  • Patent number: 8719027
    Abstract: An automated method of providing a pronunciation of a word to a remote device is disclosed. The method includes receiving an input indicative of the word to be pronounced. The method further includes searching a database having a plurality of records. Each of the records has an indication of a textual representation and an associated indication of an audible representation. At least one output is provided to the remote device of an audible representation of the word to be pronounced.
    Type: Grant
    Filed: February 28, 2007
    Date of Patent: May 6, 2014
    Assignee: Microsoft Corporation
    Inventors: Yining Chen, Yusheng Li, Min Chu, Frank Kao-Ping Soong
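The record lookup above reduces to matching the textual representation and returning the associated audible representations; a minimal sketch with (text, audio) pairs standing in for database records:

```python
def pronounce(db, word):
    """Look up a word's audible representations in a record list of
    (textual representation, audible representation) pairs; the
    case-insensitive match and phoneme-string "audio" are illustrative."""
    return [audio for text, audio in db if text.lower() == word.lower()]
```

A word with multiple pronunciations ("read") naturally yields multiple outputs, matching the "at least one output" language of the claim.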
  • Patent number: 8712776
    Abstract: Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back.
    Type: Grant
    Filed: September 29, 2008
    Date of Patent: April 29, 2014
    Assignee: Apple Inc.
    Inventors: Jerome Bellegarda, Devang Naik, Kim Silverman
  • Patent number: 8706488
    Abstract: In one aspect, a method of processing a voice signal to extract information to facilitate training a speech synthesis model is provided. The method comprises acts of detecting a plurality of candidate features in the voice signal, performing at least one comparison between one or more combinations of the plurality of candidate features and the voice signal, and selecting a set of features from the plurality of candidate features based, at least in part, on the at least one comparison. In another aspect, the method is performed by executing a program encoded on a computer readable medium. In another aspect, a speech synthesis model is provided by, at least in part, performing the method.
    Type: Grant
    Filed: February 27, 2013
    Date of Patent: April 22, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Michael D. Edgington, Laurence Gillick, Jordan R. Cohen
  • Patent number: 8706492
    Abstract: A voice recognition terminal executes a local voice recognition process and utilizes an external center voice recognition process. The terminal includes: a voice message synthesizing element for synthesizing at least one of a voice message to be output from a speaker according to the external center voice recognition process and a voice message to be output from the speaker according to the local voice recognition process so as to distinguish between characteristics of the voice message to be output from the speaker according to the external center voice recognition process and characteristics of the voice message to be output from the speaker according to the local voice recognition process; and a voice output element for outputting a synthesized voice message from the speaker.
    Type: Grant
    Filed: June 28, 2011
    Date of Patent: April 22, 2014
    Assignee: DENSO CORPORATION
    Inventors: Kunio Yokoi, Kazuhisa Suzuki, Masayuki Takami, Naoyori Tanzawa
  • Patent number: 8706497
    Abstract: A synthesis filter 106 synthesizes a plurality of wide-band speech signals by combining wide-band phoneme signals and sound source signals from a speech signal code book 105, and a distortion evaluation unit 107 selects one of the wide-band speech signals with a minimum waveform distortion with respect to an up-sampled narrow-band speech signal output from a sampling conversion unit 101. A first bandpass filter 103 extracts a frequency component outside a narrow-band of the wide-band speech signal and a band synthesis unit 104 combines it with the up-sampled narrow-band speech signal.
    Type: Grant
    Filed: October 22, 2010
    Date of Patent: April 22, 2014
    Assignee: Mitsubishi Electric Corporation
    Inventors: Satoru Furuta, Hirohisa Tasaki
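The candidate-selection and band-combination steps above can be sketched with 1-D lists for signals and a caller-supplied high-pass filter; everything here is illustrative, not the actual implementation:

```python
def extend_bandwidth(narrow_up, candidates, highpass):
    """Select the wide-band candidate with minimum waveform distortion
    against the up-sampled narrow-band signal, then add back only the
    candidate's out-of-band (high-pass) component so the narrow band
    itself stays untouched."""
    def distortion(c):
        return sum((x - y) ** 2 for x, y in zip(c, narrow_up))
    best = min(candidates, key=distortion)
    return [x + h for x, h in zip(narrow_up, highpass(best))]
```

The `highpass` argument stands in for the first bandpass filter (103) that extracts the frequency component outside the narrow band.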
  • Patent number: 8706489
    Abstract: A system and method for selecting audio contents by using the speech recognition to obtain a textual phrase from a series of audio contents are provided. The system includes an output module outputting the audio contents, an input module receiving a speech input from a user, a buffer temporarily storing the audio contents within a desired period and the speech input, and a recognizing module performing a speech recognition between the audio contents within the desired period and the speech input to generate an audio phrase and the corresponding textual phrase matching with the speech input.
    Type: Grant
    Filed: August 8, 2006
    Date of Patent: April 22, 2014
    Assignee: Delta Electronics Inc.
    Inventors: Jia-lin Shen, Chien-Chou Hung
  • Patent number: 8706493
    Abstract: In one embodiment of a controllable prosody re-estimation system, a TTS/STS engine consists of a prosody prediction/estimation module, a prosody re-estimation module, and a speech synthesis module. The prosody prediction/estimation module generates predicted or estimated prosody information. The prosody re-estimation module then re-estimates this information and produces new prosody information, according to a set of controllable parameters provided by a controllable prosody parameter interface. The new prosody information is provided to the speech synthesis module to produce a synthesized speech.
    Type: Grant
    Filed: July 11, 2011
    Date of Patent: April 22, 2014
    Assignee: Industrial Technology Research Institute
    Inventors: Cheng-Yuan Lin, Chien-Hung Huang, Chih-Chung Kuo
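One simple form of controllable re-estimation is a linear remapping of the predicted F0 contour to a user-supplied mean and range; the patent's actual mapping may be more elaborate:

```python
def reestimate_prosody(f0, target_mean, target_range):
    """Re-estimate a predicted F0 contour by linearly remapping it so
    its mean and range match the controllable parameters (illustrative)."""
    mean = sum(f0) / len(f0)
    rng = max(f0) - min(f0)
    scale = target_range / rng if rng else 1.0
    return [target_mean + (x - mean) * scale for x in f0]
```

Here `target_mean` and `target_range` play the role of the parameters exposed by the controllable prosody parameter interface.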
  • Patent number: 8694319
    Abstract: Methods, systems, and products are disclosed for dynamic prosody adjustment for voice-rendering synthesized data that include retrieving synthesized data to be voice-rendered; identifying, for the synthesized data to be voice-rendered, a particular prosody setting; determining, in dependence upon the synthesized data to be voice-rendered and the context information for the context in which the synthesized data is to be voice-rendered, a section of the synthesized data to be rendered; and rendering the section of the synthesized data in dependence upon the identified particular prosody setting.
    Type: Grant
    Filed: November 3, 2005
    Date of Patent: April 8, 2014
    Assignee: International Business Machines Corporation
    Inventors: William K. Bodin, David Jaramillo, Jerry W. Redman, Derral C. Thorson
  • Patent number: 8694320
    Abstract: A method of generating audio for a text-only application comprises the steps of adding a tag to an input text, said tag being usable for adding a sound effect to the generated audio; processing the tag to form instructions for generating the audio; and generating audio with said effect based on the instructions while the text is being presented. The present invention adds entertainment value to text applications, provides a very compact format compared to conventional multimedia, and uses entertainment sound to make text-only applications such as SMS and email more fun and entertaining.
    Type: Grant
    Filed: April 24, 2008
    Date of Patent: April 8, 2014
    Assignee: Nokia Corporation
    Inventor: Ole Kirkeby
  • Patent number: 8682671
    Abstract: Techniques for generating synthetic speech with contrastive stress. In one aspect, a speech-enabled application generates a text input including a text transcription of a desired speech output, and inputs the text input to a speech synthesis system. The synthesis system generates an audio speech output corresponding to at least a portion of the text input, with at least one portion carrying contrastive stress, and provides the audio speech output for the speech-enabled application. In another aspect, a speech-enabled application inputs a plurality of text strings, each corresponding to a portion of a desired speech output, to a software module for rendering contrastive stress. The software module identifies a plurality of audio recordings that render at least one portion of at least one of the text strings as speech carrying contrastive stress. The speech-enabled application generates an audio speech output corresponding to the desired speech output using the audio recordings.
    Type: Grant
    Filed: April 17, 2013
    Date of Patent: March 25, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Darren C. Meyer, Stephen R. Springer
  • Publication number: 20140081642
    Abstract: Systems and methods for providing synthesized speech in a manner that takes into account the environment where the speech is presented. A method embodiment includes: based on a listening environment and at least one other parameter, selecting an approach from a plurality of approaches for presenting synthesized speech in the listening environment; presenting synthesized speech according to the selected approach; and, based on natural language input received from a user indicating an inability to understand the presented synthesized speech, selecting a second approach from the plurality of approaches and presenting subsequent synthesized speech using the second approach.
    Type: Application
    Filed: November 26, 2013
    Publication date: March 20, 2014
    Inventors: Kenneth H. Rosen, Carroll W. Creswell, Jeffrey J. Farah, Pradeep K. Bansal, Ann K. Syrdal
  • Patent number: 8676584
    Abstract: The invention relates to a digital signal processing technique that changes the length of an audio signal and, thus, effectively its play-out speed. This is used for frame rate conversion or sound effects in music production. Time scaling may further be used for fast-forward or slow-motion audio play-out. According to said method, the waveform similarity overlap-add (WSOLA) approach is modified such that a maximized similarity is determined among similarity measures of sub-sequence pairs, each comprising a sub-sequence to be matched from an input window and a matching sub-sequence from a search window, wherein said sub-sequence pairs comprise at least two pairs of which a first pair comprises a first sub-sequence to be matched and a second pair comprises a different second sub-sequence to be matched. The input window allows for finding sub-sequence pairs with higher similarity than a WSOLA approach based on a single sub-sequence to be matched. This results in fewer perceivable artefacts.
    Type: Grant
    Filed: June 22, 2009
    Date of Patent: March 18, 2014
    Assignee: Thomson Licensing
    Inventor: Markus Schlosser
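The generalization to multiple sub-sequences to be matched can be sketched as a search over (template, offset) pairs that keeps the single best similarity; the dot-product similarity here is an illustrative stand-in for the patent's waveform-similarity measure:

```python
def best_match(template_a, template_b, search, win):
    """WSOLA-style search generalized to two sub-sequences to be
    matched: score every offset of the search window against both
    templates and return the (template name, offset) with the single
    highest similarity."""
    def sim(template, off):
        return sum(x * y for x, y in zip(template, search[off:off + win]))
    candidates = [(sim(t, off), name, off)
                  for name, t in (("a", template_a), ("b", template_b))
                  for off in range(len(search) - win + 1)]
    _, name, off = max(candidates)
    return name, off
```

Scoring two (or more) candidate templates per search window is exactly what lets this variant find higher-similarity joins than single-template WSOLA.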
  • Patent number: 8670984
    Abstract: A custom-content audible representation of selected data content is automatically created for a user. The content is based on content preferences of the user (e.g., one or more web browsing histories). The content is aggregated, converted using text-to-speech technology, and adapted to fit in a desired length selected for the personalized audible representation. The length of the audible representation may be custom for the user, and may be determined based on the amount of time the user is typically traveling.
    Type: Grant
    Filed: February 25, 2011
    Date of Patent: March 11, 2014
    Assignee: Nuance Communications, Inc.
    Inventors: Eli M. Dow, Marie R. Laser, Sarah J. Sheppard, Jessie Yu
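Fitting aggregated content to a target listening duration, as described above, can be approximated with a word budget derived from speaking rate and travel time; a greedy, purely illustrative sketch:

```python
def fit_to_length(articles, words_per_minute, minutes):
    """Greedily pack aggregated articles into a digest whose spoken
    duration fits the user's typical travel time (word budget =
    speaking rate x minutes)."""
    budget = words_per_minute * minutes
    digest = []
    for text in articles:
        n = len(text.split())
        if n <= budget:        # keep only articles that still fit
            digest.append(text)
            budget -= n
    return digest
```

The `minutes` parameter corresponds to the travel-time-derived custom length the abstract mentions.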
  • Publication number: 20140067396
    Abstract: A segment information generation device includes a waveform cutout unit that cuts out a speech waveform from natural speech at a time period that does not depend on the pitch frequency of the natural speech. A feature parameter extraction unit extracts a feature parameter from the speech waveform cut out by the waveform cutout unit. A time domain waveform generation unit generates a time domain waveform based on the feature parameter.
    Type: Application
    Filed: May 10, 2012
    Publication date: March 6, 2014
    Inventor: Masanori Kato
  • Publication number: 20140067398
    Abstract: A method and system for vocalizing user-selected sporting event scores. A customized spoken score application module can be configured in association with a device. A real-time score can be preselected by a user from an existing sporting event website for automatically vocalizing the score in a multitude of languages utilizing a speech synthesizer and a translation engine. An existing text-to-speech engine can be integrated with the spoken score application module and controlled by the application module to automatically vocalize the preselected scores listed on the sporting event site. The synthetically voiced, real-time score can be transmitted to the device at a predetermined time interval. Such an approach automatically and instantly pushes the real-time vocal alerts, thereby permitting the user to continue multitasking without activating the preselected vocal alerts.
    Type: Application
    Filed: August 30, 2012
    Publication date: March 6, 2014
    Inventor: Tony Verna
  • Patent number: 8666730
    Abstract: A question-answering system for finding exact answers, in text documents provided in electronic or digital form, to questions formulated by the user in natural language is based on automatic semantic labeling of text documents and user questions. The system performs semantic labeling with the help of markers in terms of basic knowledge types, their components and attributes, in terms of question types from the predefined classifier for target words, and in terms of components of possible answers. A matching procedure makes use of the mentioned types of semantic labels to determine exact answers to questions and present them to the user in the form of sentence fragments or a newly synthesized phrase in natural language. Users can independently add new types of questions to the system classifier and develop the required linguistic patterns for the system's linguistic knowledge base.
    Type: Grant
    Filed: March 12, 2010
    Date of Patent: March 4, 2014
    Assignee: Invention Machine Corporation
    Inventors: James Todhunter, Igor Sovpel, Dzianis Pastanohau
  • Patent number: 8666746
    Abstract: A system and method are disclosed for generating customized text-to-speech voices for a particular application. The method comprises generating a custom text-to-speech voice by selecting a voice for generating a custom text-to-speech voice associated with a domain, collecting text data associated with the domain from a pre-existing text data source and using the collected text data, generating an in-domain inventory of synthesis speech units by selecting speech units appropriate to the domain via a search of a pre-existing inventory of synthesis speech units, or by recording the minimal inventory for a selected level of synthesis quality. The text-to-speech custom voice for the domain is generated utilizing the in-domain inventory of synthesis speech units. Active learning techniques may also be employed to identify problem phrases wherein only a few minutes of recorded data is necessary to deliver a high quality TTS custom voice.
    Type: Grant
    Filed: May 13, 2004
    Date of Patent: March 4, 2014
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Srinivas Bangalore, Junlan Feng, Mazin G. Rahim, Juergen Schroeter, David Eugene Schulz, Ann K. Syrdal
  • Patent number: 8660843
    Abstract: Systems and methods are described that utilize an interaction manager to manage interactions (also known as requests or dialogues) from one or more applications. The interactions are managed properly even if multiple applications use different grammars. The interaction manager maintains a priority for each of the interactions, such as via an interaction list, where the priority of the interactions corresponds to the order in which the interactions are to be processed. Interactions are normally processed in the order in which they are received. However, the systems and methods described herein may provide a grace period after processing a first interaction and before processing a second interaction. If a third interaction that is chained to the first interaction is received during this grace period, then the third interaction may be processed before the second interaction.
    Type: Grant
    Filed: January 23, 2013
    Date of Patent: February 25, 2014
    Assignee: Microsoft Corporation
    Inventors: Stephen Russell Falcon, Clement Chun Pong Yip, Dan Banay, David Michael Miller
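The grace-period scheduling described in the abstract can be sketched as below. This is an illustrative interpretation, not the patented implementation: interactions are modeled as `(id, chained_to)` pairs, and the grace period is simulated by checking for a chained follow-up before dequeuing the next item.

```python
from collections import deque

class InteractionManager:
    """Processes interactions in arrival order, except that an
    interaction chained to the one just processed jumps the queue."""

    def __init__(self):
        self.queue = deque()  # pending interactions, in arrival order

    def submit(self, interaction_id, chained_to=None):
        self.queue.append((interaction_id, chained_to))

    def next_to_process(self, last_processed):
        # A chained follow-up arriving within the grace period is
        # processed before older queued interactions.
        for i, (iid, chained) in enumerate(self.queue):
            if chained is not None and chained == last_processed:
                del self.queue[i]
                return iid
        return self.queue.popleft()[0] if self.queue else None

mgr = InteractionManager()
mgr.submit("first")
mgr.submit("second")
mgr.submit("third", chained_to="first")  # arrives during the grace period
order, last = [], None
while True:
    nxt = mgr.next_to_process(last)
    if nxt is None:
        break
    order.append(nxt)
    last = nxt
print(order)  # the chained 'third' runs before 'second'
```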
  • Patent number: 8655659
    Abstract: A personalized text-to-speech synthesizing device includes: a personalized speech feature library creator, configured to recognize personalized speech features of a specific speaker by comparing a random speech fragment of the specific speaker with preset keywords, thereby to create a personalized speech feature library associated with the specific speaker, and store the personalized speech feature library in association with the specific speaker; and a text-to-speech synthesizer, configured to perform a speech synthesis of a text message from the specific speaker, based on the personalized speech feature library associated with the specific speaker and created by the personalized speech feature library creator, thereby to generate and output a speech fragment having pronunciation characteristics of the specific speaker.
    Type: Grant
    Filed: August 12, 2010
    Date of Patent: February 18, 2014
    Assignees: Sony Corporation, Sony Mobile Communications AB
    Inventors: Qingfang Wang, Shouchun He
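The two-stage flow in the abstract (build a per-speaker feature library by comparing speech fragments against preset keywords, then synthesize using that library) might look like the sketch below. Feature extraction is faked with plain strings, and all names and data shapes are assumptions.

```python
class PersonalizedFeatureLibrary:
    """Stores per-speaker speech features keyed by the preset keywords
    they matched, and applies them at synthesis time."""

    def __init__(self, keywords):
        self.keywords = keywords
        self.libraries = {}  # speaker -> {keyword: feature}

    def learn(self, speaker, fragment_features):
        # Keep only features whose keyword is in the preset list.
        lib = {k: v for k, v in fragment_features.items()
               if k in self.keywords}
        self.libraries.setdefault(speaker, {}).update(lib)

    def synthesize(self, speaker, text):
        lib = self.libraries.get(speaker, {})
        # Words with stored features get the speaker's pronunciation
        # characteristics; others fall back to a neutral default voice.
        return [(word, lib.get(word, "default")) for word in text.split()]

tts = PersonalizedFeatureLibrary(keywords={"hello", "thanks"})
tts.learn("bob", {"hello": "bob_hello_pitch", "noise": "x"})
print(tts.synthesize("bob", "hello world"))
```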
  • Patent number: 8655662
    Abstract: Disclosed herein are systems, methods, and computer-readable media for answering a communication notification. The method for answering a communication notification comprises receiving a notification of communication from a user, converting information related to the notification to speech, outputting the information as speech to the user, and receiving from the user an instruction to accept or ignore the incoming communication associated with the notification. In one embodiment, information related to the notification comprises one or more of a telephone number, an area code, a geographic origin of the request, caller id, a voice message, address book information, a text message, an email, a subject line, an importance level, a photograph, a video clip, metadata, an IP address, or a domain name. Another embodiment involves a notification assigned an importance level, with repeated notification attempts if it is of high importance.
    Type: Grant
    Filed: November 29, 2012
    Date of Patent: February 18, 2014
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Horst Schroeter
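The answer-by-speech flow above can be sketched as follows, with the high-importance repeat included. `respond` stands in for the speech-recognition step, and all field names are illustrative assumptions.

```python
def announce_notification(notification, respond):
    """Convert notification details to a spoken prompt, then act on the
    user's spoken instruction to accept or ignore the communication.
    High importance triggers a repeat attempt."""
    prompt = (f"Call from {notification['caller_id']} "
              f"({notification['number']}). Say accept or ignore.")
    attempts = 2 if notification.get("importance") == "high" else 1
    for _ in range(attempts):  # repeat only for high importance
        if respond(prompt) == "accept":
            return "accepted"
    return "ignored"

note = {"caller_id": "Alice", "number": "555-0100", "importance": "high"}
answers = iter(["ignore", "accept"])  # user accepts on the retry
print(announce_notification(note, respond=lambda p: next(answers)))
```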
  • Publication number: 20140046667
    Abstract: A system for creating musical content using a client terminal is provided. Diverse musical information, such as desired lyrics, musical scale, duration, and singing technique, is input from a client terminal such as an online or cloud computer or an embedded terminal. Using computer speech synthesis technology for generating musical vocal content, speech whose cadence follows the musical scale is synthesized, produced for the applicable duration, and transmitted to the client terminal.
    Type: Application
    Filed: April 17, 2012
    Publication date: February 13, 2014
    Applicant: TGENS CO., LTD
    Inventors: Jong Hak Yeom, Won Mo Kang
  • Patent number: 8650027
    Abstract: The invention provides an electrolaryngeal speech reconstruction method and system. First, model parameters are extracted from collected speech to form a parameter library. Facial images of a speaker are then acquired and transmitted to an image analyzing and processing module to obtain the voice onset and offset times and the vowel classes. A voice source synthesis module then synthesizes a voice-source waveform, which is finally output by an electrolarynx vibration output module. The voice source synthesis module first sets the model parameters of a glottal voice source to synthesize the glottal voice-source waveform, then uses a waveguide model to simulate sound transmission in a vocal tract, selecting vocal-tract shape parameters according to the vowel classes.
    Type: Grant
    Filed: September 4, 2012
    Date of Patent: February 11, 2014
    Assignee: Xi'an Jiaotong University
    Inventors: Mingxi Wan, Liang Wu, Supin Wang, Zhifeng Niu, Congying Wan
  • Patent number: 8650035
    Abstract: A speech conversion system facilitates voice communications. A database comprises a plurality of conversion heuristics, at least some of the conversion heuristics being associated with identification information for at least one first party. At least one speech converter is configured to convert a first speech signal received from the at least one first party into a converted first speech signal different than the first speech signal.
    Type: Grant
    Filed: November 18, 2005
    Date of Patent: February 11, 2014
    Assignee: Verizon Laboratories Inc.
    Inventor: Adrian E. Conway
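The lookup-and-convert structure in the abstract (a database of conversion heuristics keyed by a first party's identification, applied by a speech converter) can be sketched minimally as below. The heuristics here operate on lists of sample values and are purely illustrative.

```python
class SpeechConversionSystem:
    """Maps a party's identification to conversion heuristics and
    applies them to that party's incoming speech signal."""

    def __init__(self):
        self.heuristics = {}  # party id -> conversion function

    def register(self, party_id, convert):
        self.heuristics[party_id] = convert

    def convert(self, party_id, signal):
        # Parties with no registered heuristic pass through unchanged.
        fn = self.heuristics.get(party_id, lambda s: s)
        return fn(signal)

system = SpeechConversionSystem()
system.register("alice", lambda s: [x * 2 for x in s])  # e.g. amplify
print(system.convert("alice", [1, 2, 3]))
print(system.convert("bob", [1, 2, 3]))  # no heuristic: unchanged
```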
  • Patent number: 8650034
    Abstract: According to one embodiment, a speech processing device includes an utterance error occurrence determination information storage unit that stores utterance error occurrence determination information; a related word information storage unit that stores related word information including words; an utterance error occurrence determining unit that compares each of the divided words with the conditions, assigns the associated error pattern to each word matching a condition, and determines that words matching no condition do not cause an utterance error; and a phoneme string generating unit that generates a phoneme string of the utterance error. When the error pattern associated with one of the conditions is a speech error, the utterance error occurrence determining unit further obtains an incorrectly spoken word from the related word information, and the phoneme string generating unit generates a phoneme string of the incorrectly spoken word.
    Type: Grant
    Filed: August 12, 2011
    Date of Patent: February 11, 2014
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Noriko Yamanaka
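One way to read the determination step above is sketched below: each word is looked up against stored conditions, matching words receive the associated error pattern, and a speech-error pattern additionally pulls an incorrectly spoken word from the related-word information. The data shapes (dicts keyed by word) are assumptions, not the patent's representation.

```python
def mark_utterance_errors(words, conditions, related):
    """Assign error patterns to words matching a condition; for speech
    errors, also fetch the incorrectly spoken word."""
    results = []
    for w in words:
        pattern = conditions.get(w)           # condition lookup
        if pattern is None:
            results.append((w, None, None))   # no utterance error
        elif pattern == "speech_error":
            # Speech errors substitute a related, incorrectly spoken word.
            results.append((w, pattern, related.get(w)))
        else:
            results.append((w, pattern, None))
    return results

conds = {"library": "speech_error", "sixth": "stammer"}
rel = {"library": "libary"}  # the incorrectly spoken form
print(mark_utterance_errors(["the", "library", "sixth"], conds, rel))
```

A phoneme string generator would then synthesize from the third field when present, reproducing the utterance error.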
  • Publication number: 20140039892
    Abstract: In one embodiment, a human interactive proof portal 140 may use a biometric input to determine whether a user is a standard user or a malicious actor. The human interactive proof portal 140 may receive an access request 302 for an online data service 122 from a user device 110. The human interactive proof portal 140 may send a proof challenge 304 to the user device 110 for presentation to a user. The human interactive proof portal 140 may receive from the user device 110 a proof response 306 having a biometric metadata description 430 based on a biometric input from the user.
    Type: Application
    Filed: August 2, 2012
    Publication date: February 6, 2014
    Applicant: Microsoft Corporation
    Inventors: Chad Mills, Robert Sim, Scott Laufer, Sung Chung
  • Patent number: 8645140
    Abstract: A method of associating a voice font with a contact for text-to-speech conversion at an electronic device includes obtaining, at the electronic device, the voice font for the contact, and storing the voice font in association with a contact data record stored in a contacts database at the electronic device. The contact data record includes contact data for the contact.
    Type: Grant
    Filed: February 25, 2009
    Date of Patent: February 4, 2014
    Assignee: BlackBerry Limited
    Inventor: Yuriy Lobzakov
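The association described above (a voice font stored with a contact data record in a contacts database) is simple enough to sketch directly. Field names and the string-valued "font" are illustrative assumptions.

```python
class ContactsDatabase:
    """Stores contact data records and a voice font in association
    with each record, for use in text-to-speech conversion."""

    def __init__(self):
        self._records = {}

    def add_contact(self, contact_id, name, phone):
        self._records[contact_id] = {"name": name, "phone": phone,
                                     "voice_font": None}

    def associate_voice_font(self, contact_id, voice_font):
        # Store the font with the existing contact data record.
        self._records[contact_id]["voice_font"] = voice_font

    def voice_font_for(self, contact_id):
        # A TTS engine would look the font up here before speaking
        # a message from this contact.
        return self._records[contact_id]["voice_font"]

db = ContactsDatabase()
db.add_contact("c1", "Alice", "555-0100")
db.associate_voice_font("c1", "alice_voice_v2.font")
print(db.voice_font_for("c1"))
```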
  • Patent number: 8645141
    Abstract: A system and method for text-to-speech conversion. The method of performing text-to-speech conversion on a portable device includes identifying a portion of text for conversion to speech format, where the identifying includes performing a prediction based on information associated with a user. While the portable device is connected to a power source, text-to-speech conversion is performed on the portion of text to produce converted speech. The converted speech is stored in a memory device of the portable device. A reader application is executed, and a user request is received for narration of the portion of text. During execution, the converted speech is accessed from the memory device and rendered to the user in response to the request.
    Type: Grant
    Filed: September 14, 2010
    Date of Patent: February 4, 2014
    Assignee: Sony Corporation
    Inventors: Ling Jun Wong, True Xiong
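The predict-convert-cache flow above might look like the sketch below. The prediction heuristic (the next pages after the reader's position) and the fake `speech(...)` conversion are illustrative assumptions.

```python
class PredictiveTTSCache:
    """Pre-converts predicted text to speech while on power, then
    serves narration requests from the cache."""

    def __init__(self, pages):
        self.pages = pages
        self.cache = {}  # page index -> converted speech

    def predict_next(self, current_page, lookahead=2):
        # Predict which text the user will want narrated next.
        return [p for p in range(current_page + 1,
                                 current_page + 1 + lookahead)
                if p < len(self.pages)]

    def convert_while_charging(self, current_page, on_power=True):
        if not on_power:  # only convert while connected to power
            return
        for p in self.predict_next(current_page):
            if p not in self.cache:
                self.cache[p] = f"speech({self.pages[p]})"

    def narrate(self, page):
        # Serve from cache when available; fall back to on-demand TTS.
        return self.cache.get(page, f"speech({self.pages[page]})")

reader = PredictiveTTSCache(["page0", "page1", "page2", "page3"])
reader.convert_while_charging(current_page=0)
print(sorted(reader.cache))  # pages 1 and 2 were pre-converted
```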
  • Patent number: 8639511
    Abstract: A robot may include a driving control unit configured to control a driving of a movable unit that is connected movably to a body unit, a voice generating unit configured to generate a voice, and a voice output unit configured to output the voice, which has been generated by the voice generating unit. The voice generating unit may correct the voice, which is generated, based on a bearing of the movable unit, which is controlled by the driving control unit, to the body unit.
    Type: Grant
    Filed: September 14, 2010
    Date of Patent: January 28, 2014
    Assignee: Honda Motor Co., Ltd.
    Inventors: Kazuhiro Nakadai, Takuma Otsuka, Hiroshi Okuno
  • Patent number: 8635069
    Abstract: Methods of adding data identifiers and speech/voice recognition functionality are disclosed. A telnet client runs one or more scripts that add data identifiers to data fields in a telnet session, and input data is inserted into the corresponding fields based on those identifiers. The scripts run only on the telnet client, without modifications to the server applications. Further disclosed are methods for providing speech recognition and voice functionality to telnet clients: portions of input data are converted to voice and played to the user, and the user may provide input to certain fields of the telnet session by voice. Scripts running on the telnet client convert the user's voice into text, which is then inserted into the corresponding fields.
    Type: Grant
    Filed: August 16, 2007
    Date of Patent: January 21, 2014
    Assignee: Crimson Corporation
    Inventors: Lamar John Van Wagenen, Brant David Thomsen, Scott Allen Caddes
  • Patent number: 8635058
    Abstract: The present invention relates to increasing the relevance of media content communicated to the consumers who are consuming it. In this regard, one or more personal devices can be synced with a media device, each personal device being associated with a consumer who is proximate the media device. A preferred human language associated with each personal device can be determined, and the media device or media content can be configured and/or caused to be communicated in the preferred human language, increasing the relevance of the media content communicated to the consumer. Other embodiments include communicating at least a portion of the media content on the personal device and selecting relevant media content based in part on language, culture, ethnicity, time, day, occasion, or geography.
    Type: Grant
    Filed: March 2, 2010
    Date of Patent: January 21, 2014
    Inventor: Nilang Patel
  • Patent number: 8635070
    Abstract: According to one embodiment, a speech translation apparatus includes a receiving unit, a first recognition unit, a second recognition unit, a first generation unit, a translation unit, a second generation unit, and a synthesis unit. The receiving unit is configured to receive speech in a first language and convert it to a speech signal. The first recognition unit is configured to perform speech recognition and generate a transcription. The second recognition unit is configured to recognize which emotion type is included in the speech and generate emotion identification information including the recognized emotion type(s). The first generation unit is configured to generate a filtered sentence. The translation unit is configured to translate the filtered sentence from the first language into a second language. The second generation unit is configured to generate an insertion sentence. The synthesis unit is configured to convert the filtered and insertion sentences into a speech signal.
    Type: Grant
    Filed: March 25, 2011
    Date of Patent: January 21, 2014
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Kazuo Sumita
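One plausible reading of the pipeline above is sketched below: the recognized text is stripped of its emotional wording into a filtered sentence, the filtered sentence is translated, and an insertion sentence conveying the recognized emotion is appended. The filtering rule and both helper functions are illustrative assumptions, not the patent's method.

```python
def translate_speech(text, emotion, translate, insertion_for):
    """Filter, translate, and append an emotion-conveying insertion
    sentence, per the pipeline described in the abstract."""
    # 1. Filtered sentence: drop exclamations carrying the emotion.
    filtered = text.replace("!", ".").strip()
    # 2. Translate the filtered sentence into the second language.
    translated = translate(filtered)
    # 3. Insertion sentence from the emotion identification information.
    insertion = insertion_for(emotion)
    return translated + " " + insertion

fake_translate = lambda s: f"[JA] {s}"          # stand-in translator
fake_insertion = lambda e: f"(speaker sounds {e})"
print(translate_speech("That is great!", "happy",
                       fake_translate, fake_insertion))
```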
  • Patent number: 8630857
    Abstract: Disclosed is a speech synthesizing apparatus including a segment selection unit that selects a segment suited to a target segment environment from candidate segments. The segment selection unit includes a prosody change amount calculation unit that calculates the prosody change amount of each candidate segment based on prosody information of the candidate segments and the target segment environment, a selection criterion calculation unit that calculates a selection criterion based on the prosody change amounts, a candidate selection unit that narrows down the candidates based on the prosody change amounts and the selection criterion, and an optimum segment search unit that searches for an optimum segment from among the narrowed-down candidate segments.
    Type: Grant
    Filed: February 15, 2008
    Date of Patent: January 14, 2014
    Assignee: NEC Corporation
    Inventors: Masanori Kato, Reishi Kondo, Yasuyuki Mitsui
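The narrowing-then-search flow above can be sketched as follows. Candidates are `(pitch, duration, join_cost)` tuples, and the distance measure, the mean-based criterion, and the tie-break rule are all illustrative assumptions.

```python
def select_segment(candidates, target_pitch, target_duration):
    """Narrow candidates by prosody change amount against a computed
    criterion, then search the survivors for the optimum segment."""
    # 1. Prosody change amount: distance from the target environment.
    changes = [abs(p - target_pitch) + abs(d - target_duration)
               for (p, d, _) in candidates]
    # 2. Selection criterion derived from the change amounts
    #    (here: the mean change, an assumed choice).
    criterion = sum(changes) / len(changes)
    # 3. Narrow down: keep candidates at or below the criterion.
    kept = [(c, ch) for c, ch in zip(candidates, changes)
            if ch <= criterion]
    # 4. Search the narrowed set for the optimum segment
    #    (lowest change, ties broken by join cost).
    best, _ = min(kept, key=lambda x: (x[1], x[0][2]))
    return best

cands = [(100, 50, 0.2), (140, 80, 0.1), (105, 55, 0.3)]
print(select_segment(cands, target_pitch=100, target_duration=50))
```

Narrowing before the final search keeps the expensive optimum-segment step to a small candidate set, which is the point of the two-stage design the abstract describes.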
  • Patent number: 8626510
    Abstract: An acquiring unit acquires pattern sentences, which are similar to one another and include fixed segments and non-fixed segments, and substitution words that are substituted for the non-fixed segments. A sentence generating unit generates target sentences by replacing the non-fixed segments with the substitution words for each of the pattern sentences. A first synthetic-sound generating unit generates a first synthetic sound, a synthetic sound of the fixed segment, and a second synthetic-sound generating unit generates a second synthetic sound, a synthetic sound of the substitution word, for each of the target sentences. A calculating unit calculates a discontinuity value of a boundary between the first synthetic sound and the second synthetic sound for each of the target sentences and a selecting unit selects the target sentence having the smallest discontinuity value. A connecting unit connects the first synthetic sound and the second synthetic sound of the target sentence selected.
    Type: Grant
    Filed: September 15, 2009
    Date of Patent: January 7, 2014
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Nobuaki Mizutani
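The selection step above (generate each target sentence, score the boundary between the fixed-segment sound and the substitution-word sound, keep the smallest discontinuity) might be sketched as follows. The discontinuity measure is passed in because the abstract does not specify one; the toy measure and all names are assumptions.

```python
def choose_smoothest(pattern_sentences, substitution, discontinuity):
    """Among similar pattern sentences, pick the target sentence whose
    fixed/substituted boundary is least discontinuous."""
    best, best_score = None, float("inf")
    for fixed_before, fixed_after in pattern_sentences:
        # First synthetic sound = fixed segment; second = substitution.
        score = discontinuity(fixed_before, substitution)
        if score < best_score:
            best_score = score
            # Connect the synthetic sounds of the selected sentence.
            best = fixed_before + substitution + fixed_after
    return best

# Toy discontinuity: mismatch between the boundary characters.
def toy_disc(left, right):
    return abs(ord(left[-1]) - ord(right[0]))

patterns = [("The train to ", " departs now."),
            ("Now departing: the train bound for ", ".")]
print(choose_smoothest(patterns, "Osaka", toy_disc))
```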
  • Patent number: 8620661
    Abstract: A system for controlling digital effects in live performances with vocal improvisation is described. The system features a controller that utilizes several switches attached to clothing that is worn by an artist during a live performance. The switches activate a digital vocal processor unit that provides a dual mode, multi-channel phrase looping capability wherein individual channels can be selected for recording and replay during the performance. This combination of features allows a sequence of digital audio and video effects to be controlled by the artist during a performance while maintaining the freedom of movement desired to enhance the performance.
    Type: Grant
    Filed: February 28, 2011
    Date of Patent: December 31, 2013
    Inventor: Momilani Ramstrum