Synthesis Patents (Class 704/258)
-
Patent number: 8751237
Abstract: A sound control section (114) selects and outputs a text-to-speech item from items included in program information multiplexed with a broadcast signal, and starts or stops outputting the text-to-speech item based on a request from a remote controller control section (113). A sound generation section (115) converts the text-to-speech item to a sound signal. A speaker (109) reproduces the sound signal. The sound control section (114) compares each item of information about the program currently selected by the user's operation of the remote controller with each item of information about the previous program selected just before the user's operation. If an item of the currently selected program information is the same as the corresponding item of the previously selected program information, and text-to-speech processing has already been completed for the item since its last change, the sound control section (114) stops outputting the item to the sound generation section (115).
Type: Grant
Filed: February 23, 2011
Date of Patent: June 10, 2014
Assignee: Panasonic Corporation
Inventor: Koumei Kubota
-
Patent number: 8751236
Abstract: A device may receive a plurality of speech sounds that are indicative of pronunciations of a first linguistic term. The device may determine concatenation features of the plurality of speech sounds. The concatenation features may be indicative of an acoustic transition between a first speech sound and a second speech sound when the two are concatenated. The first speech sound may be included in the plurality of speech sounds, and the second speech sound may be indicative of a pronunciation of a second linguistic term. The device may cluster the plurality of speech sounds into one or more clusters based on the concatenation features, and may provide a representative speech sound of a given cluster as the first speech sound when the first speech sound and the second speech sound are concatenated.
Type: Grant
Filed: October 23, 2013
Date of Patent: June 10, 2014
Assignee: Google Inc.
Inventors: Javier Gonzalvo Fructuoso, Alexander Gutkin, Ioannis Agiomyrgiannakis
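The clustering step this abstract describes can be sketched as a simple k-means over per-unit join features, with a medoid as the cluster's representative speech sound. The 2-D feature vectors, cluster count, and initialization below are illustrative assumptions, not the patent's actual method.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iters=20):
    # toy initialization: first k points serve as the initial centroids
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # assignment: each point joins its nearest centroid
        for i, p in enumerate(points):
            assign[i] = min(range(k), key=lambda c: dist(p, centroids[c]))
        # update: each centroid moves to the mean of its members
        for c in range(k):
            members = [p for i, p in enumerate(points) if assign[i] == c]
            if members:
                centroids[c] = [sum(v) / len(members) for v in zip(*members)]
    return assign, centroids

def representative(points, assign, centroids, cluster):
    # medoid: the actual recorded speech sound closest to the centroid
    idx = [i for i in range(len(points)) if assign[i] == cluster]
    return min(idx, key=lambda i: dist(points[i], centroids[cluster]))

# hypothetical 2-D join features (e.g. boundary spectrum values) for six
# recorded variants of one linguistic term
features = [[0.10, 0.20], [0.12, 0.19], [0.11, 0.21],
            [0.90, 0.80], [0.88, 0.83], [0.91, 0.79]]
assign, cents = kmeans(features, 2)
```

Concatenation would then use `representative(...)` of the cluster whose join features best match the following unit, rather than an arbitrary variant.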
-
Patent number: 8744852
Abstract: A spoken interface is described for assisting a visually impaired user to obtain audible information and interact with elements displayed on a display screen. The spoken interface also enables access and control of other elements that are hidden by other windows. The interface receives user input data representing user inputs received by an input device and uses a movable selector to select an element of an application. The element selected by the selector may be either an editing type element or non-editing type element. The interface provides audio information regarding the selected editing or non-editing element and enables interaction with the selected element.
Type: Grant
Filed: December 20, 2006
Date of Patent: June 3, 2014
Assignee: Apple Inc.
Inventors: Eric T. Seymour, Richard W. Fabrick, II, Patti P. Yeh, John O. Louch
-
Patent number: 8744848
Abstract: A method and apparatus useful for training speech recognition engines is provided. Many of today's speech recognition engines require training for particular individuals to accurately convert speech to text. For certain applications, this training requires significant resources. To reduce these resources, a trainer is provided with the text transcription and the audio file. The trainer updates the text based on the audio file. The changes are provided to the speech recognition engine to train it and update the user profile. In certain aspects, the training is reversible, as it is possible to overtrain the system such that the trained system is actually less proficient.
Type: Grant
Filed: April 21, 2011
Date of Patent: June 3, 2014
Assignee: NVQQ Incorporated
Inventors: Jeffrey Hoepfinger, David Mondragon
-
Patent number: 8744853
Abstract: An objective is to provide a technique for accurately reproducing features of a fundamental frequency of a target-speaker's voice on the basis of only a small amount of learning data. A learning apparatus learns shift amounts from a reference source F0 pattern to a target F0 pattern of a target-speaker's voice. The learning apparatus associates a source F0 pattern of a learning text with a target F0 pattern of the same learning text by associating their peaks and troughs. For each of the points on the target F0 pattern, the learning apparatus obtains shift amounts in the time-axis direction and in the frequency-axis direction from the corresponding point on the source F0 pattern with reference to the result of the association, and learns a decision tree using, as an input feature vector, linguistic information obtained by parsing the learning text, and using, as an output feature vector, the calculated shift amounts.
Type: Grant
Filed: March 16, 2010
Date of Patent: June 3, 2014
Assignee: International Business Machines Corporation
Inventors: Masafumi Nishimura, Ryuki Tachibana
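The peak-association step can be illustrated with a toy alignment: pair the i-th peak of the source F0 contour with the i-th peak of the target contour and record the time-axis and frequency-axis offsets. Real systems align peaks and troughs jointly (e.g. by dynamic programming); this in-order pairing is a simplifying assumption for illustration only.

```python
def find_peaks(contour):
    # indices that are strict local maxima of the F0 contour
    return [i for i in range(1, len(contour) - 1)
            if contour[i - 1] < contour[i] > contour[i + 1]]

def shift_amounts(source_f0, target_f0):
    # pair peaks in order; each pair yields (time shift, frequency shift)
    return [(t - s, target_f0[t] - source_f0[s])
            for s, t in zip(find_peaks(source_f0), find_peaks(target_f0))]

# source has peaks at frames 1 and 3; the "target speaker" realizes the
# same accents one frame later and a few Hz higher (made-up contours)
source = [100, 120, 110, 130, 105]
target = [90, 100, 125, 110, 140, 108]
```

The resulting (time, frequency) shift pairs are what the decision tree would be trained to predict from linguistic features of the text.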
-
Patent number: 8744851
Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, identifying replacement segments in a secondary speech database, enhancing the primary speech database by substituting the identified secondary speech database segments for the corresponding identified segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.
Type: Grant
Filed: August 13, 2013
Date of Patent: June 3, 2014
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Alistair Conkie, Ann K Syrdal
-
Patent number: 8738280
Abstract: Methods for pedestrian unit (PU) communication activity reduction in pedestrian-to-vehicle communication networks include obtaining safety risk information for a pedestrian at risk for involvement in an accident and using the risk information to adjust a PU communication activity. In some embodiments, the activity reduction is achieved without implementing understanding of surroundings. In other embodiments, the activity reduction is based on risk assessment provided by vehicles. In some embodiments, the activity reduction includes PU transmission reduction. In some embodiments the transmission activity reduction may be followed by reception activity reduction for overall power consumption reduction.
Type: Grant
Filed: May 10, 2012
Date of Patent: May 27, 2014
Assignee: Autotalks Ltd.
Inventor: Onn Haran
-
Patent number: 8731933
Abstract: A speech synthesizing apparatus includes a selector configured to select a plurality of speech units for synthesizing a speech of a phoneme sequence by referring to speech unit information stored in an information memory. Speech unit waveforms corresponding to the speech units are acquired from a plurality of speech unit waveforms stored in a waveform memory, and the speech is synthesized by utilizing the speech unit waveforms acquired. When acquiring the speech unit waveforms, at least two speech unit waveforms from a continuous region of the waveform memory are copied onto a buffer by one access, wherein a data quantity of the at least two speech unit waveforms is less than or equal to a size of the buffer.
Type: Grant
Filed: April 10, 2013
Date of Patent: May 20, 2014
Assignee: Kabushiki Kaisha Toshiba
Inventor: Takehiko Kagoshima
-
Patent number: 8731913
Abstract: A method for overlap-adding signals useful for performing frame loss concealment (FLC) in an audio decoder as well as in other applications. The method uses a dynamic mix of windows to overlap two signals whose normalized cross-correlation may vary from zero to one. If the overlapping signals are decomposed into a correlated component and an uncorrelated component, they are overlap-added separately using the appropriate window, and then added together. If the overlapping signals are not decomposed, a weighted mix of windows is used. The mix is determined by a measure estimating the amount of cross-correlation between overlapping signals, or the relative amount of correlated to uncorrelated signals.
Type: Grant
Filed: April 13, 2007
Date of Patent: May 20, 2014
Assignee: Broadcom Corporation
Inventors: Robert W. Zopf, Juin-Hwey Chen
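The weighted window mix can be sketched as follows: a linear (amplitude-complementary) crossfade suits fully correlated signals, a square-root (power-complementary) crossfade suits uncorrelated ones, and the two window families are blended by an estimate of the normalized cross-correlation. The specific window shapes below are illustrative choices, not the patent's exact windows.

```python
import math

def norm_xcorr(x, y):
    # normalized cross-correlation of two equal-length frames
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den if den else 0.0

def overlap_add(x, y):
    # x fades out while y fades in over an overlap of len(x) samples
    n = len(x)
    r = max(0.0, norm_xcorr(x, y))  # correlation estimate, clipped to [0, 1]
    out = []
    for i in range(n):
        fade = i / (n - 1)
        # blend amplitude-complementary (linear) and
        # power-complementary (sqrt) windows by the correlation estimate
        w_out_sig = r * (1 - fade) + (1 - r) * math.sqrt(1 - fade)
        w_in_sig = r * fade + (1 - r) * math.sqrt(fade)
        out.append(w_out_sig * x[i] + w_in_sig * y[i])
    return out
```

For perfectly correlated overlapping signals (r = 1) the linear windows sum to one and the overlap-add is transparent; for uncorrelated signals the sqrt windows preserve total power instead.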
-
Patent number: 8731932
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for generating a synthetic voice. A system configured to practice the method combines a first database of a first text-to-speech voice and a second database of a second text-to-speech voice to generate a combined database, selects from the combined database, based on a policy, voice units of a phonetic category for the synthetic voice to yield selected voice units, and synthesizes speech based on the selected voice units. The system can synthesize speech without parameterizing the first text-to-speech voice and the second text-to-speech voice. A policy can define, for a particular phonetic category, from which text-to-speech voice to select voice units. The combined database can include multiple text-to-speech voices from different speakers. The combined database can include voices of a single speaker speaking in different styles. The combined database can include voices of different languages.
Type: Grant
Filed: August 6, 2010
Date of Patent: May 20, 2014
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Alistair D. Conkie, Ann K. Syrdal
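A policy of the kind described, mapping each phonetic category to the voice that should supply its units, can be sketched as a lookup over a combined unit database. The categories, voice names, and unit ids below are made up for illustration.

```python
# hypothetical combined database: (voice, phonetic category) -> unit ids
combined_db = {
    ("voice_a", "vowel"): ["a_vow_1", "a_vow_2"],
    ("voice_b", "vowel"): ["b_vow_1"],
    ("voice_a", "fricative"): ["a_fric_1"],
    ("voice_b", "fricative"): ["b_fric_1", "b_fric_2"],
}

# policy: which source voice supplies units for each phonetic category
policy = {"vowel": "voice_a", "fricative": "voice_b"}

def select_units(category):
    # route the unit-selection query to the voice the policy prescribes
    return combined_db[(policy[category], category)]
```

Because selection happens at the unit level, no parametric blending of the two voices is needed, which matches the abstract's claim that the voices need not be parameterized.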
-
Patent number: 8731931
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for speech synthesis. A system practicing the method receives a set of ordered lists of speech units, for each respective speech unit in each ordered list in the set of ordered lists, constructs a sublist of speech units from a next ordered list which are suitable for concatenation, performs a cost analysis of paths through the set of ordered lists of speech units based on the sublist of speech units for each respective speech unit, and synthesizes speech using a lowest cost path of speech units through the set of ordered lists based on the cost analysis. The ordered lists can be ordered based on the respective pitch of each speech unit. In one embodiment, speech units which do not have an assigned pitch can be assigned a pitch.
Type: Grant
Filed: June 18, 2010
Date of Patent: May 20, 2014
Assignee: AT&T Intellectual Property I, L.P.
Inventor: Alistair D. Conkie
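The lowest-cost-path search over ordered candidate lists is, in essence, a Viterbi-style dynamic program. A minimal sketch, with pitch distance standing in for the patent's full concatenation cost, might look like this:

```python
def best_path(lists, concat_cost):
    # cost[i][j]: cheapest cost of reaching candidate j in list i;
    # back[i][j]: which candidate in list i-1 achieved it
    n = len(lists)
    cost = [[0.0] * len(lists[0])] + [[float("inf")] * len(l) for l in lists[1:]]
    back = [[0] * len(l) for l in lists]
    for i in range(1, n):
        for j, unit in enumerate(lists[i]):
            for k, prev in enumerate(lists[i - 1]):
                c = cost[i - 1][k] + concat_cost(prev, unit)
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, k
    # backtrack from the cheapest final candidate
    j = min(range(len(lists[-1])), key=lambda j: cost[-1][j])
    path = [j]
    for i in range(n - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    return list(reversed(path))

# toy candidates: each unit represented only by its pitch in Hz
candidate_lists = [[100, 200], [105, 190], [110, 180]]
```

The "sublist of units suitable for concatenation" in the abstract corresponds to pruning the inner loop to promising predecessors, which cuts the quadratic per-step cost of this naive version.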
-
Patent number: 8731943
Abstract: Systems, methods and computer program products are provided for translating a natural language into music. Through systematic parsing, music compositions can be created. These compositions can be created by one or more persons who do not speak the same natural language.
Type: Grant
Filed: February 5, 2010
Date of Patent: May 20, 2014
Assignee: Little Wing World LLC
Inventors: Nicolle Ruetz, David Warhol
-
Publication number: 20140136207
Abstract: A voice synthesizing apparatus includes a first receiver configured to receive first utterance control information generated by detecting a start of a manipulation on a manipulating member by a user, a first synthesizer configured to synthesize, in response to a reception of the first utterance control information, a first voice corresponding to a first phoneme in a phoneme sequence of a voice to be synthesized to output the first voice, a second receiver configured to receive second utterance control information generated by detecting a completion of the manipulation on the manipulating member or a manipulation on a different manipulating member, and a second synthesizer configured to synthesize, in response to a reception of the second utterance control information, a second voice including at least the first phoneme and a succeeding phoneme being subsequent to the first phoneme of the voice to be synthesized to output the second voice.
Type: Application
Filed: November 14, 2013
Publication date: May 15, 2014
Applicant: Yamaha Corporation
Inventors: Hiraku KAYAMA, Yoshiki NISHITANI
-
Patent number: 8725513
Abstract: Methods, apparatus, and products are disclosed for providing expressive user interaction with a multimodal application, the multimodal application operating in a multimodal browser on a multimodal device supporting multiple modes of user interaction including a voice mode and one or more non-voice modes, the multimodal application operatively coupled to a speech engine through a VoiceXML interpreter, including: receiving, by the multimodal browser, user input from a user through a particular mode of user interaction; determining, by the multimodal browser, user output for the user in dependence upon the user input; determining, by the multimodal browser, a style for the user output in dependence upon the user input, the style specifying expressive output characteristics for at least one other mode of user interaction; and rendering, by the multimodal browser, the user output in dependence upon the style.
Type: Grant
Filed: April 12, 2007
Date of Patent: May 13, 2014
Assignee: Nuance Communications, Inc.
Inventors: Charles W. Cross, Jr., Ellen M. Eide, Igor R. Jablokov
-
Patent number: 8725505
Abstract: A computer implemented method and system for speech recognition are provided. The method and system generally maintain a set of verbs for speech recognition commands. Upon recognizing utterance of a verb of the set in combination with an invalid object or objects for the verb, the method and system generate an indication relative to the verb and invalid object. The indication can include informing the user that the system is unsure how to execute the command associated with the verb with the invalid object. The method and system can then receive a user input to specify how the verb and invalid object should be treated.
Type: Grant
Filed: October 22, 2004
Date of Patent: May 13, 2014
Assignee: Microsoft Corporation
Inventors: David Mowatt, Robert L. Chambers
-
Patent number: 8719029
Abstract: A viewer device for a digital comic comprising: an information acquisition unit that acquires a digital comic in a file format for viewing on a viewer device, the file format including speech balloon information with information on a speech balloon region that indicates the region of a speech balloon, first text information indicating the dialogue within each speech balloon, the first text information being correlated with each speech balloon, and first display control information including positional information and a transition order of an anchor point so as to enable the image of the entire page to be viewed on a monitor of the viewer device in a scroll view; and a voice reproduction section that synthesizes a voice for reading the text corresponding to the text information, based on an attribute of the character, an attribute of the speech balloon, or the dialogue, and outputs the voice.
Type: Grant
Filed: June 20, 2013
Date of Patent: May 6, 2014
Assignee: Fujifilm Corporation
Inventor: Shunichiro Nonaka
-
Patent number: 8719027
Abstract: An automated method of providing a pronunciation of a word to a remote device is disclosed. The method includes receiving an input indicative of the word to be pronounced. The method further includes searching a database having a plurality of records. Each of the records has an indication of a textual representation and an associated indication of an audible representation. At least one output is provided to the remote device of an audible representation of the word to be pronounced.
Type: Grant
Filed: February 28, 2007
Date of Patent: May 6, 2014
Assignee: Microsoft Corporation
Inventors: Yining Chen, Yusheng Li, Min Chu, Frank Kao-Ping Soong
-
Patent number: 8712776
Abstract: Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets. A text string may be normalized and its native language determined in order to obtain a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back.
Type: Grant
Filed: September 29, 2008
Date of Patent: April 29, 2014
Assignee: Apple Inc.
Inventors: Jerome Bellegarda, Devang Naik, Kim Silverman
-
Patent number: 8706488
Abstract: In one aspect, a method of processing a voice signal to extract information to facilitate training a speech synthesis model is provided. The method comprises acts of detecting a plurality of candidate features in the voice signal, performing at least one comparison between one or more combinations of the plurality of candidate features and the voice signal, and selecting a set of features from the plurality of candidate features based, at least in part, on the at least one comparison. In another aspect, the method is performed by executing a program encoded on a computer readable medium. In another aspect, a speech synthesis model is provided by, at least in part, performing the method.
Type: Grant
Filed: February 27, 2013
Date of Patent: April 22, 2014
Assignee: Nuance Communications, Inc.
Inventors: Michael D. Edgington, Laurence Gillick, Jordan R. Cohen
-
Patent number: 8706492
Abstract: A voice recognition terminal executes a local voice recognition process and utilizes an external center voice recognition process. The terminal includes: a voice message synthesizing element for synthesizing at least one of a voice message to be output from a speaker according to the external center voice recognition process and a voice message to be output from the speaker according to the local voice recognition process, so as to distinguish between the characteristics of the two voice messages; and a voice output element for outputting a synthesized voice message from the speaker.
Type: Grant
Filed: June 28, 2011
Date of Patent: April 22, 2014
Assignee: DENSO CORPORATION
Inventors: Kunio Yokoi, Kazuhisa Suzuki, Masayuki Takami, Naoyori Tanzawa
-
Patent number: 8706497
Abstract: A synthesis filter 106 synthesizes a plurality of wide-band speech signals by combining wide-band phoneme signals and sound source signals from a speech signal code book 105, and a distortion evaluation unit 107 selects one of the wide-band speech signals with a minimum waveform distortion with respect to an up-sampled narrow-band speech signal output from a sampling conversion unit 101. A first bandpass filter 103 extracts a frequency component outside a narrow-band of the wide-band speech signal and a band synthesis unit 104 combines it with the up-sampled narrow-band speech signal.
Type: Grant
Filed: October 22, 2010
Date of Patent: April 22, 2014
Assignee: Mitsubishi Electric Corporation
Inventors: Satoru Furuta, Hirohisa Tasaki
-
Patent number: 8706489
Abstract: A system and method for selecting audio contents by using the speech recognition to obtain a textual phrase from a series of audio contents are provided. The system includes an output module outputting the audio contents, an input module receiving a speech input from a user, a buffer temporarily storing the audio contents within a desired period and the speech input, and a recognizing module performing a speech recognition between the audio contents within the desired period and the speech input to generate an audio phrase and the corresponding textual phrase matching with the speech input.
Type: Grant
Filed: August 8, 2006
Date of Patent: April 22, 2014
Assignee: Delta Electronics Inc.
Inventors: Jia-lin Shen, Chien-Chou Hung
-
Patent number: 8706493
Abstract: In one embodiment of a controllable prosody re-estimation system, a TTS/STS engine consists of a prosody prediction/estimation module, a prosody re-estimation module and a speech synthesis module. The prosody prediction/estimation module generates predicted or estimated prosody information. The prosody re-estimation module then re-estimates the predicted or estimated prosody information and produces new prosody information, according to a set of controllable parameters provided by a controllable prosody parameter interface. The new prosody information is provided to the speech synthesis module to produce a synthesized speech.
Type: Grant
Filed: July 11, 2011
Date of Patent: April 22, 2014
Assignee: Industrial Technology Research Institute
Inventors: Cheng-Yuan Lin, Chien-Hung Huang, Chih-Chung Kuo
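Re-estimating predicted prosody under user-controllable parameters can be sketched, for F0 alone, as a shift-and-scale around the predicted contour's mean. The two parameters here (a scale for pitch dynamics and a shift for pitch register) are illustrative stand-ins for the patent's controllable parameter set.

```python
def re_estimate_f0(f0, scale=1.0, shift=0.0):
    # scale expands or compresses pitch excursions around the mean;
    # shift moves the whole contour up or down in frequency
    mean = sum(f0) / len(f0)
    return [mean + scale * (v - mean) + shift for v in f0]
```

With `scale > 1` the synthesized speech sounds more animated, with `scale < 1` more monotone, and `shift` changes the apparent voice register, all without re-running the prosody predictor.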
-
Patent number: 8694319
Abstract: Methods, systems, and products are disclosed for dynamic prosody adjustment for voice-rendering synthesized data that include retrieving synthesized data to be voice-rendered; identifying, for the synthesized data to be voice-rendered, a particular prosody setting; determining, in dependence upon the synthesized data to be voice-rendered and the context information for the context in which the synthesized data is to be voice-rendered, a section of the synthesized data to be rendered; and rendering the section of the synthesized data in dependence upon the identified particular prosody setting.
Type: Grant
Filed: November 3, 2005
Date of Patent: April 8, 2014
Assignee: International Business Machines Corporation
Inventors: William K. Bodin, David Jaramillo, Jerry W. Redman, Derral C. Thorson
-
Patent number: 8694320
Abstract: A method of generating audio for a text-only application comprises the steps of adding a tag to an input text, said tag being usable for adding a sound effect to the generated audio; processing the tag to form instructions for generating the audio; and generating audio with said effect based on the instructions while the text is being presented. The present invention adds entertainment value to text applications, provides a very compact format compared to conventional multimedia, and uses entertainment sound to make text-only applications such as SMS and email more fun and entertaining.
Type: Grant
Filed: April 24, 2008
Date of Patent: April 8, 2014
Assignee: Nokia Corporation
Inventor: Ole Kirkeby
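Tag processing of this kind can be sketched with a made-up inline tag syntax (`<sound:NAME>` is an assumption, not the patent's format), turning tagged text into an instruction list for the audio generator:

```python
import re

def parse_tags(text):
    # split hypothetical "<sound:NAME>" tags from the surrounding text,
    # producing (action, payload) instructions for the audio generator
    instructions = []
    for part in re.split(r"(<sound:[^>]+>)", text):
        if not part:
            continue
        m = re.match(r"<sound:([^>]+)>", part)
        if m:
            instructions.append(("play_effect", m.group(1)))
        else:
            instructions.append(("speak", part))
    return instructions
```

A renderer would then walk the instruction list, presenting the `speak` segments as text (or TTS) while triggering each `play_effect` at its position, which is what keeps the format compact compared to embedding actual audio.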
-
Patent number: 8682671
Abstract: Techniques for generating synthetic speech with contrastive stress. In one aspect, a speech-enabled application generates a text input including a text transcription of a desired speech output, and inputs the text input to a speech synthesis system. The synthesis system generates an audio speech output corresponding to at least a portion of the text input, with at least one portion carrying contrastive stress, and provides the audio speech output for the speech-enabled application. In another aspect, a speech-enabled application inputs a plurality of text strings, each corresponding to a portion of a desired speech output, to a software module for rendering contrastive stress. The software module identifies a plurality of audio recordings that render at least one portion of at least one of the text strings as speech carrying contrastive stress. The speech-enabled application generates an audio speech output corresponding to the desired speech output using the audio recordings.
Type: Grant
Filed: April 17, 2013
Date of Patent: March 25, 2014
Assignee: Nuance Communications, Inc.
Inventors: Darren C. Meyer, Stephen R. Springer
-
Publication number: 20140081642
Abstract: Systems and methods for providing synthesized speech in a manner that takes into account the environment where the speech is presented. A method embodiment includes: based on a listening environment and at least one other parameter associated with that environment, selecting an approach from a plurality of approaches for presenting synthesized speech in the listening environment; presenting synthesized speech according to the selected approach; and, based on natural language input received from a user indicating an inability to understand the presented synthesized speech, selecting a second approach from the plurality of approaches and presenting subsequent synthesized speech using the second approach.
Type: Application
Filed: November 26, 2013
Publication date: March 20, 2014
Inventors: Kenneth H. Rosen, Carroll W. Creswell, Jeffrey J. Farah, Pradeep K. Bansal, Ann K. Syrdal
-
Patent number: 8676584
Abstract: The invention relates to a digital signal processing technique that changes the length of an audio signal and, thus, effectively its play-out speed. This is used for frame rate conversion or sound effects in music production. Time scaling may further be used for fast-forward or slow-motion audio play-out. According to said method, the waveform similarity overlap-add approach is modified such that a maximized similarity is determined among similarity measures of sub-sequence pairs, each comprising a sub-sequence to-be-matched from an input window and a matching sub-sequence from a search window, wherein said sub-sequence pairs comprise at least two pairs of which a first pair comprises a first sub-sequence to-be-matched and a second pair comprises a different second sub-sequence to-be-matched. The input window allows for finding sub-sequence pairs with higher similarity than with a WSOLA approach based on a single sub-sequence to-be-matched. This results in less perceivable artefacts.
Type: Grant
Filed: June 22, 2009
Date of Patent: March 18, 2014
Assignee: Thomson Licensing
Inventor: Markus Schlosser
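The core of the WSOLA-style search, finding the offset in a search window that maximizes waveform similarity to a template sub-sequence, can be sketched with normalized cross-correlation. The patent's modification would repeat this for several candidate templates from the input window and keep the overall best pair; the single-template version below is the baseline it improves on.

```python
import math

def norm_xcorr(x, y):
    # normalized cross-correlation of two equal-length segments
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den if den else 0.0

def best_offset(template, search, lo, hi):
    # slide the template across the search window [lo, hi] and keep
    # the most similar alignment (maximum waveform similarity)
    best, best_r = lo, -2.0
    for off in range(lo, hi + 1):
        seg = search[off:off + len(template)]
        if len(seg) < len(template):
            break
        r = norm_xcorr(template, seg)
        if r > best_r:
            best, best_r = off, r
    return best
```

Once the best offset is known, the matched segment is overlap-added to the output, which is what keeps waveform periodicity intact while the overall play-out length changes.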
-
Patent number: 8670984
Abstract: A custom-content audible representation of selected data content is automatically created for a user. The content is based on content preferences of the user (e.g., one or more web browsing histories). The content is aggregated, converted using text-to-speech technology, and adapted to fit in a desired length selected for the personalized audible representation. The length of the audible representation may be custom for the user, and may be determined based on the amount of time the user is typically traveling.
Type: Grant
Filed: February 25, 2011
Date of Patent: March 11, 2014
Assignee: Nuance Communications, Inc.
Inventors: Eli M. Dow, Marie R. Laser, Sarah J. Sheppard, Jessie Yu
-
Publication number: 20140067396
Abstract: A segment information generation device includes a waveform cutout unit that cuts out a speech waveform from natural speech at a time period not depending on the pitch frequency of the natural speech. A feature parameter extraction unit extracts a feature parameter of the speech waveform cut out by the waveform cutout unit. A time domain waveform generation unit generates a time domain waveform based on the feature parameter.
Type: Application
Filed: May 10, 2012
Publication date: March 6, 2014
Inventor: Masanori Kato
-
Publication number: 20140067398
Abstract: A method and system for vocalizing user-selected sporting event scores. A customized spoken score application module can be configured in association with a device. A real-time score can be preselected by a user from an existing sporting event website for automatically vocalizing the score in a multitude of languages utilizing a speech synthesizer and a translation engine. An existing text-to-speech engine can be integrated with the spoken score application module and controlled by the application module to automatically vocalize the preselected scores listed on the sporting event site. The synthetically-voiced, real-time score can be transmitted to the device at a predetermined time interval. Such an approach automatically and instantly pushes the real-time vocal alerts, thereby permitting the user to continue multitasking without activating the pre-selected vocal alerts.
Type: Application
Filed: August 30, 2012
Publication date: March 6, 2014
Inventor: Tony Verna
-
Question-answering system and method based on semantic labeling of text documents and user questions
Patent number: 8666730
Abstract: A question-answering system for searching exact answers in text documents provided in electronic or digital form to questions formulated by the user in natural language is based on automatic semantic labeling of text documents and user questions. The system performs semantic labeling with the help of markers in terms of basic knowledge types, their components and attributes, in terms of question types from the predefined classifier for target words, and in terms of components of possible answers. A matching procedure makes use of the mentioned types of semantic labels to determine exact answers to questions and present them to the user in the form of fragments of sentences or a newly synthesized phrase in natural language. Users can independently add new types of questions to the system classifier and develop required linguistic patterns for the system linguistic knowledge base.
Type: Grant
Filed: March 12, 2010
Date of Patent: March 4, 2014
Assignee: Invention Machine Corporation
Inventors: James Todhunter, Igor Sovpel, Dzianis Pastanohau
-
Patent number: 8666746
Abstract: A system and method are disclosed for generating customized text-to-speech voices for a particular application. The method comprises generating a custom text-to-speech voice by selecting a voice for generating a custom text-to-speech voice associated with a domain, collecting text data associated with the domain from a pre-existing text data source, and using the collected text data, generating an in-domain inventory of synthesis speech units by selecting speech units appropriate to the domain via a search of a pre-existing inventory of synthesis speech units, or by recording the minimal inventory for a selected level of synthesis quality. The text-to-speech custom voice for the domain is generated utilizing the in-domain inventory of synthesis speech units. Active learning techniques may also be employed to identify problem phrases, wherein only a few minutes of recorded data is necessary to deliver a high quality TTS custom voice.
Type: Grant
Filed: May 13, 2004
Date of Patent: March 4, 2014
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Srinivas Bangalore, Junlan Feng, Mazin G. Rahim, Juergen Schroeter, David Eugene Schulz, Ann K. Syrdal
-
Patent number: 8660843
Abstract: Systems and methods are described for systems that utilize an interaction manager to manage interactions (also known as requests or dialogues) from one or more applications. The interactions are managed properly even if multiple applications use different grammars. The interaction manager maintains a priority for each of the interactions, such as via an interaction list, where the priority of the interactions corresponds to an order in which the interactions are to be processed. Interactions are normally processed in the order in which they are received. However, the systems and methods described herein may provide a grace period after processing a first interaction and before processing a second interaction. If a third interaction that is chained to the first interaction is received during this grace period, then the third interaction may be processed before the second interaction.
Type: Grant
Filed: January 23, 2013
Date of Patent: February 25, 2014
Assignee: Microsoft Corporation
Inventors: Stephen Russell Falcon, Clement Chun Pong Yip, Dan Banay, David Michael Miller
-
Patent number: 8655659
Abstract: A personalized text-to-speech synthesizing device includes: a personalized speech feature library creator, configured to recognize personalized speech features of a specific speaker by comparing a random speech fragment of the specific speaker with preset keywords, thereby to create a personalized speech feature library associated with the specific speaker, and store the personalized speech feature library in association with the specific speaker; and a text-to-speech synthesizer, configured to perform a speech synthesis of a text message from the specific speaker, based on the personalized speech feature library associated with the specific speaker and created by the personalized speech feature library creator, thereby to generate and output a speech fragment having pronunciation characteristics of the specific speaker.
Type: Grant
Filed: August 12, 2010
Date of Patent: February 18, 2014
Assignees: Sony Corporation, Sony Mobile Communications AB
Inventors: Qingfang Wang, Shouchun He
-
Patent number: 8655662
Abstract: Disclosed herein are systems, methods, and computer-readable media for answering a communication notification. The method for answering a communication notification comprises receiving a notification of communication from a user, converting information related to the notification to speech, outputting the information as speech to the user, and receiving from the user an instruction to accept or ignore the incoming communication associated with the notification. In one embodiment, information related to the notification comprises one or more of a telephone number, an area code, a geographic origin of the request, caller ID, a voice message, address book information, a text message, an email, a subject line, an importance level, a photograph, a video clip, metadata, an IP address, or a domain name. Another embodiment involves assigning the notification an importance level and repeating notification attempts if it is of high importance.
Type: Grant
Filed: November 29, 2012
Date of Patent: February 18, 2014
Assignee: AT&T Intellectual Property I, L.P.
Inventor: Horst Schroeter
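The notification-to-speech flow with importance-based repetition might be sketched as follows. This is a toy illustration, not the patented method: the field names, the `tts` callable, and the retry policy are all assumptions, and a real device would block between attempts waiting for an accept/ignore instruction.

```python
def announce_notification(notification, tts, max_attempts=3):
    """Hypothetical sketch: render notification details as speech and,
    for a high-importance notification, repeat the announcement."""
    details = ", ".join(
        f"{field}: {notification[field]}"
        for field in ("caller_id", "subject", "importance")
        if field in notification
    )
    # High-importance notifications get repeat attempts; others get one.
    attempts = max_attempts if notification.get("importance") == "high" else 1
    spoken = []
    for _ in range(attempts):
        spoken.append(tts(f"Incoming communication. {details}"))
        # a real implementation would pause here for accept/ignore input
    return spoken
```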
-
Publication number: 20140046667
Abstract: A system for creating musical content using a client terminal is provided. Diverse musical information, such as a desired lyric, musical scale, duration, and singing technique, is input from an online or cloud computer, an embedded terminal, or other such client terminal. Using computer speech synthesis technology to generate musical vocal content, speech in which cadence is expressed in accordance with the musical scale is synthesized, produced for the applicable duration, and transmitted to the client terminal.
Type: Application
Filed: April 17, 2012
Publication date: February 13, 2014
Applicant: TGENS CO., LTD
Inventors: Jong Hak Yeom, Won Mo Kang
-
Patent number: 8650027
Abstract: The invention provides an electrolaryngeal speech reconstruction method and a system thereof. Firstly, model parameters are extracted from the collected speech as a parameter library. Then facial images of a speaker are acquired and transmitted to an image analyzing and processing module to obtain the voice onset and offset times and the vowel classes. Next, a waveform of a voice source is synthesized by a voice source synthesis module. Finally, the waveform of the above voice source is output by an electrolarynx vibration output module. The voice source synthesis module firstly sets the model parameters of a glottal voice source so as to synthesize the waveform of the glottal voice source, and then a waveguide model is used to simulate sound transmission in a vocal tract and to select shape parameters of the vocal tract according to the vowel classes.
Type: Grant
Filed: September 4, 2012
Date of Patent: February 11, 2014
Assignee: Xi'an Jiaotong University
Inventors: Mingxi Wan, Liang Wu, Supin Wang, Zhifeng Niu, Congying Wan
-
Patent number: 8650035
Abstract: A speech conversion system facilitates voice communications. A database comprises a plurality of conversion heuristics, at least some of the conversion heuristics being associated with identification information for at least one first party. At least one speech converter is configured to convert a first speech signal received from the at least one first party into a converted first speech signal different than the first speech signal.
Type: Grant
Filed: November 18, 2005
Date of Patent: February 11, 2014
Assignee: Verizon Laboratories Inc.
Inventor: Adrian E. Conway
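The core idea, conversion heuristics looked up by the calling party's identity, can be sketched minimally. The class and the per-sample "heuristic" below are placeholders of my own invention; the patent does not specify what form the heuristics take.

```python
class SpeechConverter:
    """Sketch: a database maps party identification to a conversion
    heuristic; incoming speech from that party is converted with it.
    Signals are modeled as plain sample lists for illustration."""

    def __init__(self):
        self.heuristics = {}   # party id -> conversion function

    def register(self, party_id, conversion):
        self.heuristics[party_id] = conversion

    def convert(self, party_id, signal):
        # Parties without a registered heuristic pass through unchanged.
        conversion = self.heuristics.get(party_id, lambda sample: sample)
        return [conversion(sample) for sample in signal]
```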
-
Patent number: 8650034
Abstract: According to one embodiment, a speech processing device includes an utterance error occurrence determination information storage unit that stores utterance error occurrence determination information; a related word information storage unit that stores related word information including words; an utterance error occurrence determining unit that compares each of the divided words with the conditions, gives the associated error pattern to each word corresponding to a condition, and determines that a word which does not correspond to any condition does not cause an utterance error; and a phoneme string generating unit that generates a phoneme string of the utterance error. When the error pattern associated with one of the conditions is a speech error, the utterance error occurrence determining unit further gives an incorrectly spoken word from the related word information, and the phoneme string generating unit generates a phoneme string of the incorrectly spoken word.
Type: Grant
Filed: August 12, 2011
Date of Patent: February 11, 2014
Assignee: Kabushiki Kaisha Toshiba
Inventor: Noriko Yamanaka
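The condition-matching step described above can be sketched as a small rule loop. The rule format (predicate plus error-pattern label) and the `"speech_error"` tag are hypothetical stand-ins; the patent's actual determination information and phoneme generation are not reproduced here.

```python
def generate_utterance_errors(words, determination_rules, related_words):
    """Illustrative sketch: compare each word with the conditions, attach
    the associated error pattern, and for a speech error substitute an
    incorrectly spoken word drawn from the related word information."""
    results = []
    for word in words:
        pattern = None
        for condition, error_pattern in determination_rules:
            if condition(word):
                pattern = error_pattern
                break
        if pattern is None:
            # word matches no condition: no utterance error occurs
            results.append((word, None))
        elif pattern == "speech_error":
            # substitute the related (incorrectly spoken) word
            results.append((related_words.get(word, word), pattern))
        else:
            results.append((word, pattern))
    return results
```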
-
Publication number: 20140039892
Abstract: In one embodiment, a human interactive proof portal 140 may use a biometric input to determine whether a user is a standard user or a malicious actor. The human interactive proof portal 140 may receive an access request 302 for an online data service 122 from a user device 110. The human interactive proof portal 140 may send a proof challenge 304 to the user device 110 for presentation to a user. The human interactive proof portal 140 may receive from the user device 110 a proof response 306 having a biometric metadata description 430 based on a biometric input from the user.
Type: Application
Filed: August 2, 2012
Publication date: February 6, 2014
Applicant: Microsoft Corporation
Inventors: Chad Mills, Robert Sim, Scott Laufer, Sung Chung
-
Patent number: 8645140
Abstract: A method of associating a voice font with a contact for text-to-speech conversion at an electronic device includes obtaining, at the electronic device, the voice font for the contact, and storing the voice font in association with a contact data record stored in a contacts database at the electronic device. The contact data record includes contact data for the contact.
Type: Grant
Filed: February 25, 2009
Date of Patent: February 4, 2014
Assignee: BlackBerry Limited
Inventor: Yuriy Lobzakov
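The data-structure side of this claim, a voice font stored in association with a contact record, is simple to sketch. The record fields and the string-identifier representation of a voice font are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ContactRecord:
    """Sketch of a contact data record; a voice font is stored
    alongside the contact data (here, as an identifier string)."""
    name: str
    phone: str
    voice_font: Optional[str] = None

class ContactsDatabase:
    def __init__(self):
        self.records = {}

    def add_contact(self, name, phone):
        self.records[name] = ContactRecord(name, phone)

    def associate_voice_font(self, name, voice_font):
        # store the voice font in association with the contact record
        self.records[name].voice_font = voice_font

    def font_for(self, name, default="default-font"):
        # a TTS engine would use this font to render the contact's messages
        record = self.records.get(name)
        return record.voice_font if record and record.voice_font else default
```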
-
Patent number: 8645141
Abstract: A system and method for text to speech conversion. The method of performing text to speech conversion on a portable device includes: identifying a portion of text for conversion to speech format, wherein the identifying includes performing a prediction based on information associated with a user. While the portable device is connected to a power source, a text to speech conversion is performed on the portion of text to produce converted speech. The converted speech is stored into a memory device of the portable device. A reader application is executed, wherein a user request is received for narration of the portion of text. During the executing, the converted speech is accessed from the memory device and rendered to the user, responsive to the user request.
Type: Grant
Filed: September 14, 2010
Date of Patent: February 4, 2014
Assignee: Sony Corporation
Inventors: Ling Jun Wong, True Xiong
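The predict-then-precompute flow can be sketched in a few lines. The "prediction" here (the chapter after the last one read) is a deliberately naive stand-in for whatever user-associated information the patent contemplates, and every name below is hypothetical.

```python
def precache_narration(reading_history, library, is_charging, tts, cache):
    """Sketch: predict the next portion of text the user will want
    narrated and convert it to speech only while on external power."""
    if not is_charging:
        return cache                  # defer conversion when on battery
    predicted = reading_history[-1] + 1   # naive prediction: next chapter
    if predicted in library and predicted not in cache:
        cache[predicted] = tts(library[predicted])
    return cache

def narrate(chapter, library, cache, tts):
    """Serve precomputed speech when available; fall back to on-demand."""
    return cache.get(chapter) or tts(library[chapter])
```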
-
Patent number: 8639511
Abstract: A robot may include a driving control unit configured to control a driving of a movable unit that is connected movably to a body unit, a voice generating unit configured to generate a voice, and a voice output unit configured to output the voice, which has been generated by the voice generating unit. The voice generating unit may correct the voice, which is generated, based on a bearing of the movable unit, which is controlled by the driving control unit, to the body unit.
Type: Grant
Filed: September 14, 2010
Date of Patent: January 28, 2014
Assignee: Honda Motor Co., Ltd.
Inventors: Kazuhiro Nakadai, Takuma Otsuka, Hiroshi Okuno
-
Patent number: 8635069
Abstract: Methods of adding data identifiers and speech/voice recognition functionality are disclosed. A telnet client runs one or more scripts that add data identifiers to data fields in a telnet session. The input data is inserted in the corresponding fields based on data identifiers. Scripts run only on the telnet client without modifications to the server applications. Further disclosed are methods for providing speech recognition and voice functionality to telnet clients. Portions of input data are converted to voice and played to the user. A user also may provide input to certain fields of the telnet session by using his voice. Scripts running on the telnet client convert the user's voice into text, which is inserted into the corresponding fields.
Type: Grant
Filed: August 16, 2007
Date of Patent: January 21, 2014
Assignee: Crimson Corporation
Inventors: Lamar John Van Wagenen, Brant David Thomsen, Scott Allen Caddes
-
Patent number: 8635058
Abstract: The present invention relates to increasing the relevance of media content communicated to consumers who are consuming the media content. In this regard, at least one personal device can be synced with a media device, each personal device being associated with at least one consumer who is proximate the media device. At least one preferred human language associated with at least one of the personal devices can be determined. The media device or media content can be configured and/or caused to be communicated in at least one of the preferred human languages to increase the relevance of the media content communicated to the consumer. Other embodiments can include communicating at least a portion of the media content on the personal device and selecting relevant media content based in part on language, culture, ethnicity, time, day, occasion, or geography.
Type: Grant
Filed: March 2, 2010
Date of Patent: January 21, 2014
Inventor: Nilang Patel
-
Patent number: 8635070
Abstract: According to one embodiment, a speech translation apparatus includes a receiving unit, a first recognition unit, a second recognition unit, a first generation unit, a translation unit, a second generation unit, and a synthesis unit. The receiving unit is configured to receive a speech in a first language and convert it to a speech signal. The first recognition unit is configured to perform speech recognition and generate a transcription. The second recognition unit is configured to recognize which emotion type is included in the speech and generate emotion identification information including the recognized emotion type(s). The first generation unit is configured to generate a filtered sentence. The translation unit is configured to generate a translation of the filtered sentence from the first language into a second language. The second generation unit is configured to generate an insertion sentence. The synthesis unit is configured to convert the filtered and insertion sentences into a speech signal.
Type: Grant
Filed: March 25, 2011
Date of Patent: January 21, 2014
Assignee: Kabushiki Kaisha Toshiba
Inventor: Kazuo Sumita
-
Patent number: 8630857
Abstract: Disclosed is a speech synthesizing apparatus including a segment selection unit that selects a segment suited to a target segment environment from candidate segments. The apparatus includes a prosody change amount calculation unit that calculates the prosody change amount of each candidate segment based on the prosody information of the candidate segments and the target segment environment, a selection criterion calculation unit that calculates a selection criterion based on the prosody change amount, a candidate selection unit that narrows down selection candidates based on the prosody change amount and the selection criterion, and an optimum segment search unit that searches for an optimum segment from among the narrowed-down candidate segments.
Type: Grant
Filed: February 15, 2008
Date of Patent: January 14, 2014
Assignee: NEC Corporation
Inventors: Masanori Kato, Reishi Kondo, Yasuyuki Mitsui
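The two-stage selection (compute change amounts, derive a criterion, narrow, then search) can be sketched as follows. The scalar "prosody" values, the distance measure, and the percentile-style criterion are all simplifications I have chosen for illustration; real candidates would carry multi-dimensional acoustic features.

```python
def select_segment(candidates, target, keep_ratio=0.5):
    """Sketch of prosody-based candidate narrowing followed by an
    optimum-segment search, with candidates and the target segment
    environment reduced to single illustrative prosody values."""
    # prosody change amount: distance of each candidate from the target
    changes = sorted((abs(c - target), c) for c in candidates)
    # selection criterion: keep only the lowest-change fraction
    cutoff = max(1, int(len(changes) * keep_ratio))
    narrowed = changes[:cutoff]
    # optimum segment search among the narrowed-down candidates
    return min(narrowed)[1]
```

Narrowing before the final search is the point of the design: the expensive optimum-segment search runs over a smaller candidate set without discarding the low-change candidates that are likely to win.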
-
Patent number: 8626510
Abstract: An acquiring unit acquires pattern sentences, which are similar to one another and include fixed segments and non-fixed segments, and substitution words that are substituted for the non-fixed segments. A sentence generating unit generates target sentences by replacing the non-fixed segments with the substitution words for each of the pattern sentences. A first synthetic-sound generating unit generates a first synthetic sound, a synthetic sound of the fixed segment, and a second synthetic-sound generating unit generates a second synthetic sound, a synthetic sound of the substitution word, for each of the target sentences. A calculating unit calculates a discontinuity value of a boundary between the first synthetic sound and the second synthetic sound for each of the target sentences, and a selecting unit selects the target sentence having the smallest discontinuity value. A connecting unit connects the first synthetic sound and the second synthetic sound of the target sentence selected.
Type: Grant
Filed: September 15, 2009
Date of Patent: January 7, 2014
Assignee: Kabushiki Kaisha Toshiba
Inventor: Nobuaki Mizutani
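The selection step (synthesize both parts for each candidate sentence, score the boundary, keep the smallest discontinuity) can be sketched minimally. The `synth` and `discontinuity` callables stand in for the synthetic-sound generating units and the boundary calculation, which the patent does not specify in code; the `{}` placeholder convention for the non-fixed segment is my own.

```python
def pick_smoothest_sentence(pattern, substitutions, synth, discontinuity):
    """Sketch: fill the non-fixed segment of a pattern sentence with each
    substitution word and select the target sentence whose boundary
    between the two synthetic sounds has the smallest discontinuity."""
    fixed, _, rest = pattern.partition("{}")
    best, best_cost = None, None
    for word in substitutions:
        first = synth(fixed)      # synthetic sound of the fixed segment
        second = synth(word)      # synthetic sound of the substitution word
        cost = discontinuity(first, second)
        if best_cost is None or cost < best_cost:
            best, best_cost = fixed + word + rest, cost
    return best, best_cost
```

With toy stand-ins (text as its own "sound", a trivial boundary score), the function selects the substitution whose boundary scores lowest.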
-
Patent number: 8620661
Abstract: A system for controlling digital effects in live performances with vocal improvisation is described. The system features a controller that utilizes several switches attached to clothing that is worn by an artist during a live performance. The switches activate a digital vocal processor unit that provides a dual mode, multi-channel phrase looping capability wherein individual channels can be selected for recording and replay during the performance. This combination of features allows a sequence of digital audio and video effects to be controlled by the artist during a performance while maintaining the freedom of movement desired to enhance the performance.
Type: Grant
Filed: February 28, 2011
Date of Patent: December 31, 2013
Inventor: Momilani Ramstrum