Synthesis Patents (Class 704/258)
-
Patent number: 8478597
Abstract: The present disclosure presents a useful metric for assessing the relative difficulty which non-native speakers face in pronouncing a given utterance and a method and systems for using such a metric in the evaluation and assessment of the utterances of non-native speakers. In an embodiment, the metric may be based on both known sources of difficulty for language learners and a corpus-based measure of cross-language sound differences. The method may be applied to speakers who primarily speak a first language speaking utterances in any non-native second language.
Type: Grant
Filed: January 10, 2006
Date of Patent: July 2, 2013
Assignee: Educational Testing Service
Inventors: Derrick Higgins, Klaus Zechner, Yoko Futagi, Rene Lawless
-
Patent number: 8478582
Abstract: A server is disclosed for computing a score of an opinion that a message in a text file is expected to convey regarding a subject to be evaluated, wherein the message is written using literal strings and pictorial symbols. In this server, by the use of a pictorial-symbol dictionary memory storing a correspondence between designated pictorial-symbols to be rated and scores of opinions expressed by the respective pictorial-symbols, at least one of the used pictorial-symbols in the message which is coincident with at least one of the designated pictorial-symbols stored in the pictorial-symbol dictionary memory, is extracted from the message, at least one of the opinion scores which corresponds to the at least one extracted pictorial-symbol is retrieved within the pictorial-symbol dictionary memory, and an aggregate net opinion score for the message is calculated, based on an aggregate opinion score for the at least one extracted pictorial-symbol.
Type: Grant
Filed: February 2, 2010
Date of Patent: July 2, 2013
Assignee: KDDI Corporation
Inventors: Yukiko Habu, Ryoichi Kawada, Nobuhide Kotsuka, Sung Jiae, Koki Uchiyama, Santi Saeyor, Hirosuke Asano, Toshiaki Shimamura
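The scoring pipeline this abstract describes — extract the designated pictorial symbols present in a message, look each up in a dictionary of opinion scores, and aggregate — can be sketched as below. The symbol set and the score values are invented for illustration, not taken from the patent.

```python
# Hypothetical pictorial-symbol dictionary: symbol -> opinion score.
EMOJI_SCORES = {"😊": 1.0, "😍": 2.0, "😢": -1.0, "😠": -2.0}

def opinion_score(message: str) -> float:
    """Extract designated pictorial symbols from the message, retrieve
    each one's opinion score, and return the aggregate net score."""
    found = [ch for ch in message if ch in EMOJI_SCORES]
    return sum(EMOJI_SCORES[ch] for ch in found)

score = opinion_score("great movie 😊😍")  # aggregates 1.0 + 2.0
```

A real system would also weight the literal-string portion of the message; this sketch covers only the pictorial-symbol path.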
-
Publication number: 20130166303
Abstract: A computer-implemented method includes receiving, in a computer system, a user query comprising at least a first term, parsing the user query to at least determine whether the user query assigns a field to the first term, the parsing resulting in a parsed query that conforms to a predefined format, performing a search in a metadata repository using the parsed query, the metadata repository embodied in a computer readable medium and including triplets generated based on multiple modes of metadata for video content, the search identifying a set of candidate scenes from the video content, ranking the set of candidate scenes according to a scoring metric into a ranked scene list, and generating an output from the computer system that includes at least part of the ranked scene list, the output generated in response to the user query.
Type: Application
Filed: November 13, 2009
Publication date: June 27, 2013
Applicant: ADOBE SYSTEMS INCORPORATED
Inventors: Walter Chang, Michael J. Welch
-
Patent number: 8468017
Abstract: The invention discloses a multi-stage quantization method, which includes the following steps: obtaining a reference codebook according to a previous stage codebook; obtaining a current stage codebook according to the reference codebook and a scaling factor; and quantizing an input vector by using the current stage codebook. The invention also discloses a multi-stage quantization device. With the invention, the current stage codebook may be obtained according to the previous stage codebook, by using the correlation between the current stage codebook and the previous stage codebook. As a result, it does not require an independent codebook space for the current stage codebook, which saves the storage space and improves the resource usage efficiency.
Type: Grant
Filed: May 1, 2010
Date of Patent: June 18, 2013
Assignee: Huawei Technologies Co., Ltd.
Inventors: Eyal Shlomot, Jiliang Dai, Fuliang Yin, Xin Ma, Jun Zhang
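The space-saving idea above — derive the current-stage codebook from the previous stage's codebook and a scaling factor rather than storing it separately — can be sketched as follows. The vectors and the single uniform scaling factor are illustrative assumptions; the patent's actual derivation of the reference codebook may be more involved.

```python
def scale_codebook(prev_codebook, factor):
    """Current-stage codebook reuses the previous stage's vectors,
    scaled down, so no independent codebook storage is needed."""
    return [[factor * x for x in vec] for vec in prev_codebook]

def quantize(vec, codebook):
    """Return the index of the nearest codebook vector (squared error)."""
    def err(cand):
        return sum((a - b) ** 2 for a, b in zip(vec, cand))
    return min(range(len(codebook)), key=lambda i: err(codebook[i]))

prev = [[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0]]
stage2 = scale_codebook(prev, 0.25)  # later-stage residuals are smaller
idx = quantize([0.2, 0.3], stage2)   # index of the nearest scaled vector
```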
-
Patent number: 8468020
Abstract: An apparatus for synthesizing speech, including a waveform memory that stores a plurality of speech unit waveforms, an information memory that correspondingly stores speech unit information and an address of each of the speech unit waveforms, a selector that selects a speech unit sequence corresponding to the input phoneme sequence by referring to the speech unit information, a speech unit waveform acquisition unit that acquires a speech unit waveform corresponding to each speech unit of the speech unit sequence from the waveform memory by referring to the address, and a speech unit concatenation unit that generates the speech by concatenating the acquired speech unit waveforms.
Type: Grant
Filed: May 8, 2007
Date of Patent: June 18, 2013
Assignee: Kabushiki Kaisha Toshiba
Inventor: Takehiko Kagoshima
-
Publication number: 20130151243
Abstract: A voice modulation apparatus is provided. The voice modulation apparatus includes an audio signal input unit which receives an audio signal from an external source; an extraction unit which extracts property information relating to a voice from the audio signal; a storage unit which stores the extracted property information; a control unit which modulates a target voice based on the extracted property information; and an output unit which outputs the modulated target voice.
Type: Application
Filed: December 7, 2012
Publication date: June 13, 2013
Applicant: SAMSUNG ELECTRONICS CO., LTD.
Inventor: Samsung Electronics Co., Ltd.
-
Patent number: 8457967
Abstract: A procedure to automatically evaluate the spoken fluency of a speaker by prompting the speaker to talk on a given topic, recording the speaker's speech to get a recorded sample of speech, and then analyzing the patterns of disfluencies in the speech to compute a numerical score to quantify the spoken fluency skills of the speaker. The numerical fluency score accounts for various prosodic and lexical features, including formant-based filled-pause detection, closely-occurring exact and inexact repeat N-grams, and normalized average distance between consecutive occurrences of N-grams. The lexical features and prosodic features are combined to classify the speaker with a C-class classification and develop a rating for the speaker.
Type: Grant
Filed: August 15, 2009
Date of Patent: June 4, 2013
Assignee: Nuance Communications, Inc.
Inventors: Kartik Audhkhasi, Om D. Deshmukh, Kundan Kandhway, Ashish Verma
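One of the lexical features named above — closely-occurring exact repeat N-grams, which catch stuttered restarts like "I went to the, I went to the store" — can be sketched as a simple counter. The proximity window of 5 tokens is an illustrative assumption, not a value from the patent.

```python
def close_repeat_ngrams(words, n=2, window=5):
    """Count n-grams that recur within `window` positions of a previous
    occurrence of the same n-gram, a typical disfluency pattern."""
    last_seen = {}
    count = 0
    for i in range(len(words) - n + 1):
        gram = tuple(words[i:i + n])
        if gram in last_seen and i - last_seen[gram] <= window:
            count += 1
        last_seen[gram] = i  # always remember the latest occurrence
    return count

utterance = "i went to the i went to the store".split()
repeats = close_repeat_ngrams(utterance, n=2)
```

A full fluency scorer would combine this with the prosodic features (e.g. filled-pause detection) before classification.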
-
Patent number: 8456420
Abstract: Many embodiments may comprise logic such as hardware and/or code to implement a user interface for traversal of long sorted lists, via audible mapping of the lists, using sensor-based gesture recognition, audio and tactile feedback, and button selection while on the go. In several embodiments, such user interface modalities are physically small in size, enabling a user to be truly mobile by reducing the cognitive load required to operate the device. For some embodiments, the user interface may be divided across multiple worn devices, such as a mobile device, watch, earpiece, and ring. Rotation of the watch may be translated into navigation instructions, allowing the user to traverse the list while the user receives audio feedback via the earpiece to describe items in the list as well as audio feedback regarding the navigation state. Many embodiments offer the user a simple user interface to traverse the list without visual feedback.
Type: Grant
Filed: December 31, 2008
Date of Patent: June 4, 2013
Assignee: Intel Corporation
Inventors: Lama Nachman, David L. Graumann, Giuseppe Raffa, Jennifer Healey
-
Patent number: 8452600
Abstract: An electronic reading device for reading ebooks and other digital media items combines a touch surface electronic reading device with accessibility technology to provide a visually impaired user more control over his or her reading experience. In some implementations, the reading device can be configured to operate in at least two modes: a continuous reading mode and an enhanced reading mode.
Type: Grant
Filed: August 18, 2010
Date of Patent: May 28, 2013
Assignee: Apple Inc.
Inventor: Christopher B. Fleizach
-
Patent number: 8447609
Abstract: Embodiments may be a standalone module or part of mobile devices, desktop computers, servers, stereo systems, or any other systems that might benefit from condensed audio presentations of item structures such as lists or tables. Embodiments may comprise logic such as hardware and/or code to adjust the temporal characteristics of items comprising words. The items may be included in a structure such as a text listing or table, an audio listing or table, or a combination thereof, or may be individual words or phrases. For instance, embodiments may comprise a keyword extractor to extract keywords from the items and an abbreviations generator to generate abbreviations based upon the keywords. Further embodiments may comprise a text-to-speech generator to generate audible items based upon the abbreviations to render to a user while traversing the item structure.
Type: Grant
Filed: December 31, 2008
Date of Patent: May 21, 2013
Assignee: Intel Corporation
Inventors: Giuseppe Raffa, Lama Nachman, David L. Graumann, Michael E. Deisher
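The keyword-extractor / abbreviations-generator pair described above can be sketched as two small functions: keep the content words of an item, then shorten each for faster audible rendering. The stopword list and the vowel-dropping abbreviation rule are assumptions for illustration; the patent does not specify these particular rules.

```python
# Assumed stopword list for keyword extraction.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in"}

def extract_keywords(item: str):
    """Keep only content words from a list item."""
    return [w for w in item.lower().split() if w not in STOPWORDS]

def abbreviate(word: str, max_len: int = 4):
    """Shorten a keyword: keep the first letter, drop later vowels,
    cap the length. The result feeds a text-to-speech generator."""
    head, tail = word[0], word[1:]
    squeezed = head + "".join(c for c in tail if c not in "aeiou")
    return squeezed[:max_len]

item = "the best of the rolling stones"
condensed = [abbreviate(w) for w in extract_keywords(item)]
```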
-
Patent number: 8447613
Abstract: A method for optimizing message transmission and decoding comprises: reading data from a memory of an originating device, the data comprising information regarding the originating device; encoding the data by converting the data to a subset of words having a ranked recognition accuracy higher than the remainder of words; transmitting the encoded data from the originating device to a receiving system audibly as words via a telephone connection; utilizing a voice recognition software to recognize the words; decoding the words back to the data; and taking a predetermined action based on the data.
Type: Grant
Filed: April 28, 2009
Date of Patent: May 21, 2013
Assignee: iRobot Corporation
Inventors: Patrick Alan Hussey, Maryellen Abreu
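The encode/decode steps above amount to mapping device data onto words that speech recognizers handle reliably, speaking them over the phone, and mapping the recognized words back to data. A minimal sketch, assuming a tiny four-word alphabet (2 bits per word) standing in for the patent's ranked high-accuracy subset:

```python
# Assumed high-recognition-accuracy word subset (NATO-style, 2 bits each).
WORDS = ["alpha", "bravo", "charlie", "delta"]
INDEX = {w: i for i, w in enumerate(WORDS)}

def encode(data: bytes):
    """Split each byte into four 2-bit symbols, high bits first,
    and emit one word per symbol for audible transmission."""
    out = []
    for b in data:
        for shift in (6, 4, 2, 0):
            out.append(WORDS[(b >> shift) & 0b11])
    return out

def decode(words):
    """Reassemble the original bytes from the recognized word stream."""
    vals = [INDEX[w] for w in words]
    return bytes(
        (vals[i] << 6) | (vals[i + 1] << 4) | (vals[i + 2] << 2) | vals[i + 3]
        for i in range(0, len(vals), 4)
    )

spoken = encode(b"OK")  # eight words, ready for text-to-speech
```

The round trip `decode(encode(data)) == data` holds by construction; a real deployment would also add redundancy against misrecognition.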
-
Patent number: 8447604
Abstract: Provided in some embodiments is a method including receiving ordered script words that are indicative of dialogue words to be spoken, receiving audio data corresponding to at least a portion of the dialogue words to be spoken and including timecodes associated with dialogue words, generating a matrix of the ordered script words versus the dialogue words, aligning the matrix to determine hard alignment points that include matching consecutive sequences of ordered script words with corresponding sequences of dialogue words, partitioning the matrix of ordered script words into sub-matrices bounded by adjacent hard-alignment points and including corresponding sub-sets of the script and dialogue words between the hard-alignment points, and aligning each of the sub-matrices.
Type: Grant
Filed: May 28, 2010
Date of Patent: May 21, 2013
Assignee: Adobe Systems Incorporated
Inventor: Walter W. Chang
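The hard-alignment idea above — long exact word runs shared by the script and the dialogue transcript become anchors, and the stretches between anchors are aligned separately as sub-problems — can be sketched with the standard library's sequence matcher. The minimum run length of 3 is an illustrative assumption.

```python
from difflib import SequenceMatcher

def hard_alignment_points(script, dialogue, min_run=3):
    """Return (script_index, dialogue_index, length) for each matching
    consecutive run of at least `min_run` words: the hard anchors."""
    sm = SequenceMatcher(a=script, b=dialogue, autojunk=False)
    return [(m.a, m.b, m.size) for m in sm.get_matching_blocks()
            if m.size >= min_run]

def partitions(script, dialogue, anchors):
    """Yield the (script_span, dialogue_span) gaps between anchors;
    each gap is a sub-matrix to be aligned on its own."""
    si = di = 0
    for a, b, size in anchors:
        yield script[si:a], dialogue[di:b]
        si, di = a + size, b + size
    yield script[si:], dialogue[di:]

script = "to be or not to be that is the question".split()
dialog = "to be or not to be um that is question".split()
anchors = hard_alignment_points(script, dialog)
```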
-
Patent number: 8447610
Abstract: Techniques for generating synthetic speech with contrastive stress. In one aspect, a speech-enabled application generates a text input including a text transcription of a desired speech output, and inputs the text input to a speech synthesis system. The synthesis system generates an audio speech output corresponding to at least a portion of the text input, with at least one portion carrying contrastive stress, and provides the audio speech output for the speech-enabled application. In another aspect, a speech-enabled application inputs a plurality of text strings, each corresponding to a portion of a desired speech output, to a software module for rendering contrastive stress. The software module identifies a plurality of audio recordings that render at least one portion of at least one of the text strings as speech carrying contrastive stress. The speech-enabled application generates an audio speech output corresponding to the desired speech output using the audio recordings.
Type: Grant
Filed: August 9, 2010
Date of Patent: May 21, 2013
Assignee: Nuance Communications, Inc.
Inventors: Darren C. Meyer, Stephen R. Springer
-
Patent number: 8447592
Abstract: In one aspect, a method of processing a voice signal to extract information to facilitate training a speech synthesis model is provided. The method comprises acts of detecting a plurality of candidate features in the voice signal, performing at least one comparison between one or more combinations of the plurality of candidate features and the voice signal, and selecting a set of features from the plurality of candidate features based, at least in part, on the at least one comparison. In another aspect, the method is performed by executing a program encoded on a computer readable medium. In another aspect, a speech synthesis model is provided by, at least in part, performing the method.
Type: Grant
Filed: September 13, 2005
Date of Patent: May 21, 2013
Assignee: Nuance Communications, Inc.
Inventors: Michael D. Edgington, Laurence Gillick, Jordan R. Cohen
-
Patent number: 8442423
Abstract: A digital media item, such as an electronic book (eBook), may include testing content. The testing content may include questions about the content of the digital media item. When a user is viewing the digital media item on an electronic device, such as an eBook reader, the user may be allowed to select whether the testing content is displayed. The user may also be allowed to select a particular mode of testing, such as automatic testing, selective testing, etc. If the user chooses to display the testing content, the user may also be allowed to provide answers to the testing questions.
Type: Grant
Filed: January 26, 2009
Date of Patent: May 14, 2013
Assignee: Amazon Technologies, Inc.
Inventors: Thomas A. Ryan, Edward J. Gayles, Laurent An Minh Nguyen, Steven K. Weiss, Martin Görner
-
Patent number: 8433369
Abstract: A mobile terminal has a sound obtaining unit configured to obtain a sound signal; a voice recognition unit configured to recognize the sound signal and convert the sound signal into text data; a display unit configured to display the text data divided into a plurality of units; a selection unit configured to receive a selection of one of the units from the text data divided into the plurality of the units displayed on the display unit; and a control unit configured to perform a predetermined process corresponding to each of the units selected by the selection unit.
Type: Grant
Filed: September 15, 2009
Date of Patent: April 30, 2013
Assignee: Fujitsu Mobile Communications Limited
Inventor: Yasuhito Ambiru
-
Patent number: 8433575
Abstract: A system and method is described in which a multimedia story is rendered to a consumer in dependence on features extracted from an audio signal representing, for example, a musical selection of the consumer. Features such as key changes and tempo of the music selection are related to dramatic parameters defined by and associated with story arcs, narrative story rules and film or story structure. In one example a selection of a few music tracks provides input audio signals (602) from which musical features are extracted (604), following which a dramatic parameter list and timeline are generated (606). Media fragments are then obtained (608), the fragments having story content associated with the dramatic parameters, and the fragments output (610) with the music selection.
Type: Grant
Filed: December 10, 2003
Date of Patent: April 30, 2013
Assignee: AMBX UK Limited
Inventors: David A. Eves, Richard S. Cole, Christopher Thorne
-
Patent number: 8433573
Abstract: A prosody modification device includes: a real voice prosody input part that receives real voice prosody information extracted from an utterance of a human; a regular prosody generating part that generates regular prosody information having a regular phoneme boundary that determines a boundary between phonemes and a regular phoneme length of a phoneme by using data representing a regular or statistical phoneme length in an utterance of a human with respect to a section including at least a phoneme or a phoneme string to be modified in the real voice prosody information; and a real voice prosody modification part that resets a real voice phoneme boundary by using the generated regular prosody information so that the real voice phoneme boundary and a real voice phoneme length of the phoneme or the phoneme string to be modified in the real voice prosody information are approximate to an actual phoneme boundary and an actual phoneme length of the utterance of the human, thereby modifying the real voice prosody information.
Type: Grant
Filed: February 11, 2008
Date of Patent: April 30, 2013
Assignee: Fujitsu Limited
Inventors: Kentaro Murase, Nobuyuki Katae
-
Patent number: 8433574
Abstract: Methods, systems, and software for converting the audio input of a user of a hand-held client device or mobile phone into a textual representation by means of a backend server accessed by the device through a communications network. The text is then inserted into or used by an application of the client device to send a text message, instant message, email, or to insert a request into a web-based application or service. In one embodiment, the method includes the steps of initializing or launching the application on the device; recording and transmitting the recorded audio message from the client device to the backend server through a client-server communication protocol; converting the transmitted audio message into the textual representation in the backend server; and sending the converted text message back to the client device or forwarding it on to an alternate destination directly from the server.
Type: Grant
Filed: February 13, 2012
Date of Patent: April 30, 2013
Assignee: Canyon IP Holdings, LLC
Inventors: Victor R. Jablokov, Igor R. Jablokov, Marc White
-
Patent number: 8428952
Abstract: A system and method to allow an author of an instant message to enable and control the production of audible speech to the recipient of the message. The voice of the author of the message is characterized into parameters compatible with a formative or articulative text-to-speech engine such that upon receipt, the receiving client device can generate audible speech signals from the message text according to the characterization of the author's voice. Alternatively, the author can store samples of his or her actual voice in a server so that, upon transmission of a message by the author to a recipient, the server extracts the samples needed only to synthesize the words in the text message, and delivers those to the receiving client device so that they are used by a client-side concatenative text-to-speech engine to generate audible speech signals having a close likeness to the actual voice of the author.
Type: Grant
Filed: June 12, 2012
Date of Patent: April 23, 2013
Assignee: Nuance Communications, Inc.
Inventors: Terry Wade Niemeyer, Liliana Orozco
-
Patent number: 8423366
Abstract: A method includes receiving, by a system, a voice recording associated with a user, transcribing the voice recording into text that includes a group of words, and storing an association between a portion of each respective word and a corresponding portion of the voice recording. The corresponding portion of the voice recording is the portion of the voice recording from which the portion of the respective word was transcribed. The method may also include determining a modification to a speech synthesis voice associated with the user based at least in part on the association.
Type: Grant
Filed: July 18, 2012
Date of Patent: April 16, 2013
Assignee: Google Inc.
Inventors: Marcus Alexander Foster, Richard Zarek Cohen
-
Patent number: 8422641
Abstract: Devices, systems, and methods for recording call sessions over a VoIP network using a distributed record server architecture are disclosed. An example recording device for recording segments of a call session includes a record server configured to receive an agent voice data stream and an external caller voice data stream from an agent telephone station, and a file repository configured to store voice data and call data associated with each recorded segment of the call session. The recording device is configured to tag recorded segments of each call session, which can be later used by a third-party application or database to check the status and/or integrity of the recorded call session.
Type: Grant
Filed: June 15, 2009
Date of Patent: April 16, 2013
Assignee: Calabrio, Inc.
Inventor: James Paul Martin, II
-
Patent number: 8423365
Abstract: A contextual conversion platform, and method for converting text-to-speech, are described that can convert content of a target to spoken content. Embodiments of the contextual conversion platform can identify certain contextual characteristics of the content, from which can be generated a spoken content input. This spoken content input can include tokens, e.g., words and abbreviations, to be converted to the spoken content, as well as substitution tokens that are selected from contextual repositories based on the context identified by the contextual conversion platform.
Type: Grant
Filed: May 28, 2010
Date of Patent: April 16, 2013
Inventor: Daniel Ben-Ezri
-
Patent number: 8412528
Abstract: The present invention relates to computer-generated text-to-speech conversion. It relates in particular to a method and system for updating a Concatenative Text-To-Speech (CTTS) system with a speech database from a base version to a new version. The present invention performs an application-specific re-organization of a synthesizer's speech database by means of certain decision tree modifications. By that reorganization, certain synthesis units are made available for the new application, which are not available in prior art without a new speech session. This allows the creation of application-specific synthesizers with improved output speech quality for arbitrary domains and applications at very low cost.
Type: Grant
Filed: May 2, 2006
Date of Patent: April 2, 2013
Assignee: Nuance Communications, Inc.
Inventors: Volker Fischer, Siegfried Kunzmann
-
Patent number: 8412529
Abstract: An approach is provided for enhancing verbal communication sessions. A verbal component of a communication session is converted into textual information. The converted textual information is scanned for a text string to trigger an application. The application is invoked to provide supplemental information about the textual information or to perform an action in response to the textual information for or on behalf of a party of the communication session. The supplemental information or a confirmation of the action is transmitted to the party.
Type: Grant
Filed: October 29, 2008
Date of Patent: April 2, 2013
Assignee: Verizon Patent and Licensing Inc.
Inventors: Martin W. McKee, Paul T. Schultz, Robert A. Sartini
-
Patent number: 8401856
Abstract: A very common problem is that when people speak a language other than the language to which they are accustomed, syllables can be spoken for longer or shorter than the listener would regard as appropriate. An example of this can be observed when people who have a heavy Japanese accent speak English. Since Japanese words end with vowels, there is a tendency for native Japanese to add a vowel sound to the end of English words that should end with a consonant. Illustratively, native Japanese speakers often pronounce “orange” as “orenji.” An aspect provides an automatic speech-correcting process that would not necessarily need to know that fruit is being discussed; the system would only need to know that the speaker is accustomed to Japanese, that the listener is accustomed to English, that “orenji” is not a word in English, and that “orenji” is a typical Japanese mispronunciation of the English word “orange.”
Type: Grant
Filed: May 17, 2010
Date of Patent: March 19, 2013
Assignee: Avaya Inc.
Inventors: Terry Jennings, Paul Roller Michaelis
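The "orenji" → "orange" example above reduces to a lookup: if a recognized token is not a listener-language word but matches a known L1-typical mispronunciation, substitute the intended word. A minimal sketch, with tiny assumed sample tables standing in for a real lexicon and a real Japanese-to-English mispronunciation table:

```python
# Assumed sample of the listener-language lexicon.
ENGLISH_WORDS = {"orange", "salad", "hot"}

# Assumed sample of typical Japanese-accent mispronunciations of English.
JA_EN_FIXES = {"orenji": "orange", "sarada": "salad", "hotto": "hot"}

def correct(word: str) -> str:
    """Pass through valid English words; map known Japanese-typical
    mispronunciations to the intended word; leave everything else alone."""
    if word in ENGLISH_WORDS:
        return word
    return JA_EN_FIXES.get(word, word)

fixed = correct("orenji")
```

Note the sketch needs no topic knowledge (no idea that fruit is being discussed), mirroring the claim in the abstract.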
-
Publication number: 20130066631
Abstract: The present invention provides a parametric speech synthesis method and a parametric speech synthesis system.
Type: Application
Filed: October 27, 2011
Publication date: March 14, 2013
Applicant: GOERTEK INC.
Inventors: Fengliang Wu, Zhenhua Wu
-
Patent number: 8396708
Abstract: An avatar facial expression representation technology is provided. The avatar facial expression representation technology estimates changes in emotion and emphasis in a user's voice from vocal information, and changes in mouth shape of the user from pronunciation information of the voice. The avatar facial expression technology tracks a user's facial movements and changes in facial expression from image information and may represent avatar facial expressions based on the results of these operations. Accordingly, avatar facial expressions can be obtained which are similar to the actual facial expressions of the user.
Type: Grant
Filed: January 28, 2010
Date of Patent: March 12, 2013
Assignee: Samsung Electronics Co., Ltd.
Inventors: Chi-youn Park, Young-kyoo Hwang, Jung-bae Kim
-
Patent number: 8392191
Abstract: The present invention provides a method and apparatus of forming Chinese prosodic words, which method comprises the steps of inputting Chinese text; performing word segmentation and part-of-speech annotation for the input Chinese text to generate an initial prosodic word sequence; inserting grids representing prosodic word boundaries for all the words in the initial prosodic word sequence to generate a grid prosodic word sequence; annotating the grids ready to be deleted in the grid prosodic word sequence based on the prosodic word forming means; judging the grids which actually need to be deleted among the grids ready to be deleted based on the prosodic word forming means; and deleting the grids which actually need to be deleted in the grid prosodic word sequence, and forming the words between every two remaining grids into prosodic words.
Type: Grant
Filed: December 10, 2007
Date of Patent: March 5, 2013
Assignee: Fujitsu Limited
Inventors: Guo Qing, Nobuyuki Katae
-
Patent number: 8392194
Abstract: A method for effecting a machine-based determination of speech intelligibility in an aircraft during flight operations includes: (a) in no particular order: (1) providing a representation of a machine-based speech evaluating signal; and (2) providing a representation of in-flight noise; (b) combining the representation of a machine-based speech evaluation signal and the representation of in-flight noise to obtain a combined noise signal; and (c) employing the combined noise signal to present the machine-based determination of speech intelligibility in an aircraft during flight operations.
Type: Grant
Filed: October 15, 2008
Date of Patent: March 5, 2013
Assignee: The Boeing Company
Inventor: Naval Kishore Agarwal
-
Patent number: 8380484
Abstract: A method (50) of dynamically changing a sentence structure of a message can include the steps of receiving (51) a user request for information, retrieving (52) data based on the information requested, and altering (53) an intonation and/or the language conveying the information based on the context of the information to be presented. The intonation can optionally be altered by altering (54) a volume, a speed, and/or a pitch based on the information to be presented. The language can be altered by selecting (55) among a finite set of synonyms based on the information to be presented to the user or by selecting (56) among key verbs, adjectives or adverbs that vary along a continuum.
Type: Grant
Filed: August 10, 2004
Date of Patent: February 19, 2013
Assignee: International Business Machines Corporation
Inventors: Brent L. Davis, Stephen W. Hanley, Vanessa V. Michelini, Melanie D. Polkosky
-
Patent number: 8374873
Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
Type: Grant
Filed: August 11, 2009
Date of Patent: February 12, 2013
Assignee: Morphism, LLC
Inventor: James H. Stephens, Jr.
-
Patent number: 8374859
Abstract: An automatic answering device and an automatic answering method for automatically answering a user utterance are configured: to prepare a conversation scenario that is a set of input sentences and reply sentences, the input sentences each corresponding to a user utterance assumed to be uttered by a user, the reply sentences each being an automatic reply to the inputted sentence; to accept a user utterance; to determine the reply sentence to the accepted user utterance on the basis of the conversation scenario; and to present the determined reply sentence to the user. Data of the conversation scenario have a data structure that enables the inputted sentences and the reply sentences to be expressed in a state transition diagram in which each of the inputted sentences is defined as a morphism and the reply sentence corresponding to the inputted sentence is defined as an object.
Type: Grant
Filed: August 17, 2009
Date of Patent: February 12, 2013
Assignee: Universal Entertainment Corporation
Inventors: Shengyang Huang, Hiroshi Katukura
-
Patent number: 8374872
Abstract: A device provides a question to a user, and receives, from the user, an unrecognized voice response to the question. The device also provides the unrecognized voice response to an utterance agent for determination of the unrecognized voice response without user involvement, and provides an additional question to the user prior to receiving the determination of the unrecognized voice response from the utterance agent.
Type: Grant
Filed: November 4, 2008
Date of Patent: February 12, 2013
Assignee: Verizon Patent and Licensing Inc.
Inventor: Manohar R. Kesireddy
-
Patent number: 8374876
Abstract: A system and a method for speech generation which assist the speech of those with a disability or a medical condition such as cerebral palsy, motor neurone disease or dysarthria following a stroke. The system has a user interface having a multiplicity of states, each of which corresponds to a sound, and a selector for making a selection of a state or a combination of states. The system also has a processor for processing the selected state or combination of states and an audio output for outputting the sound or combination of sounds. The sounds associated with the states can be phonemes or phonics, and the user interface is typically a manually operable device such as a mouse, trackball, joystick or other device that allows a user to distinguish between states by manipulating the interface to a number of positions.
Type: Grant
Filed: February 1, 2007
Date of Patent: February 12, 2013
Assignee: The University of Dundee
Inventors: Rolf Black, Annalu Waller, Eric Abel, Iain Murray, Graham Pullin
-
Publication number: 20130035940
Abstract: The invention provides an electrolaryngeal speech reconstruction method and a system thereof. Firstly, model parameters are extracted from the collected speech as a parameter library, then facial images of a speaker are acquired and then transmitted to an image analyzing and processing module to obtain the voice onset and offset times and the vowel classes, then a waveform of a voice source is synthesized by a voice source synthesis module, finally, the waveform of the above voice source is output by an electrolarynx vibration output module, wherein the voice source synthesis module firstly sets the model parameters of a glottal voice source so as to synthesize the waveform of the glottal voice source, and then a waveguide model is used to simulate sound transmission in a vocal tract and select shape parameters of the vocal tract according to the vowel classes.
Type: Application
Filed: September 4, 2012
Publication date: February 7, 2013
Applicant: XI'AN JIAOTONG UNIVERSITY
Inventors: Mingxi Wan, Liang Wu, Supin Wang, Zhifeng Niu, Congying Wan
-
Patent number: 8370150Abstract: The text information presentation device calculates an optimum readout speed on the basis of the content of the input text information, its arrival time, and the previous arrival time; speech-synthesizes the input text information at the calculated readout speed; and outputs it as an audio signal, or alternatively controls the speed at which a video signal is output according to the output state of the speech synthesizing unit.Type: GrantFiled: July 15, 2008Date of Patent: February 5, 2013Assignee: Panasonic CorporationInventors: Keiichi Toiyama, Mitsuteru Kataoka, Kohsuke Yamamoto
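The core idea, picking a readout rate so that speech for one text chunk finishes before the next chunk arrives, can be sketched as follows. The base rate and clamping bounds are assumptions for illustration, not values from the patent:

```python
# Illustrative sketch: choose a speech-rate multiplier so that synthesized
# speech for a caption-like text chunk fits in the time before the next
# chunk arrives. BASE_RATE and the clamp bounds are assumed values.

BASE_RATE = 15.0            # characters per second at normal speed
MIN_MULT, MAX_MULT = 0.75, 2.0

def readout_speed(text, arrival_time, prev_arrival_time):
    """Return a playback-speed multiplier for text-to-speech readout."""
    window = max(arrival_time - prev_arrival_time, 1e-6)
    needed_rate = len(text) / window      # chars/sec required to fit
    mult = needed_rate / BASE_RATE
    return min(max(mult, MIN_MULT), MAX_MULT)

# A 30-character caption with only 1 second before the next one:
print(readout_speed("x" * 30, 11.0, 10.0))  # 2.0 (clamped to the maximum)
```

A real device would feed this multiplier to its speech synthesizer, or slow the video output instead when the speech cannot keep up.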
-
Patent number: 8370148Abstract: Disclosed herein are systems, methods, and computer readable-media for answering a communication notification. The method for answering a communication notification comprises receiving a notification of communication from a user, converting information related to the notification to speech, outputting the information as speech to the user, and receiving from the user an instruction to accept or ignore the incoming communication associated with the notification. In one embodiment, information related to the notification comprises one or more of a telephone number, an area code, a geographic origin of the request, caller id, a voice message, address book information, a text message, an email, a subject line, an importance level, a photograph, a video clip, metadata, an IP address, or a domain name. Another embodiment involves notification assigned an importance level and repeat attempts at notification if it is of high importance.Type: GrantFiled: April 14, 2008Date of Patent: February 5, 2013Assignee: AT&T Intellectual Property I, L.P.Inventor: Horst Schroeter
-
Patent number: 8370151Abstract: Disclosed are techniques and systems to provide a narration of a text in multiple different voices where the portions of the text narrated using the different voices are selected by a user.Type: GrantFiled: January 14, 2010Date of Patent: February 5, 2013Assignee: K-NFB Reading Technology, Inc.Inventors: Raymond C. Kurzweil, Paul Albrecht, Peter Chapman
-
Patent number: 8370149Abstract: Waveform concatenation speech synthesis with high sound quality. Prosody with both high accuracy and high sound quality is achieved by performing a two-path search including a speech segment search and a prosody modification value search. An accurate accent is secured by evaluating the consistency of the prosody using a statistical model of prosody variations (the slope of the fundamental frequency) for both paths: the speech segment selection and the modification value search. In the prosody modification value search, a prosody modification value sequence that minimizes a modified prosody cost is searched for. This allows a search for a modification value sequence that raises the likelihood of the absolute values or variations of the prosody under the statistical model as high as possible while keeping the modification values minimal.Type: GrantFiled: August 15, 2008Date of Patent: February 5, 2013Assignee: Nuance Communications, Inc.Inventors: Ryuki Tachibana, Masafumi Nishimura
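The modification-value search balances two terms: how well the modified F0 contour's slopes match a statistical slope model, and how large the modifications themselves are. A toy brute-force sketch of that trade-off (the candidate offsets, squared-error costs, and weight are stand-ins, not the patent's actual cost process):

```python
# Toy sketch of the prosody-modification-value search: for each unit, pick a
# pitch offset from a small candidate set so that the modified contour's
# slopes stay close to a target slope model while the offsets stay small.
# Candidate values, the squared-error costs, and the weight w are assumed.

import itertools

def search_modifications(f0, target_slopes, candidates=(-10, 0, 10), w=0.1):
    """Brute-force the offset sequence minimizing the modified prosody cost."""
    best, best_cost = None, float("inf")
    for offsets in itertools.product(candidates, repeat=len(f0)):
        modified = [f + o for f, o in zip(f0, offsets)]
        slope_cost = sum((modified[i + 1] - modified[i] - s) ** 2
                         for i, s in enumerate(target_slopes))
        mod_cost = sum(o * o for o in offsets)
        cost = slope_cost + w * mod_cost
        if cost < best_cost:
            best, best_cost = offsets, cost
    return best

# A flat contour forced toward a rise-then-fall slope model:
print(search_modifications([100, 100, 100], [10, -10]))  # (0, 10, 0)
```

The real search would use dynamic programming rather than exhaustive enumeration, but the cost structure (slope-model likelihood versus modification magnitude) is the point of the sketch.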
-
Patent number: 8364488Abstract: Disclosed are techniques and systems to provide a narration of a text in multiple different voices. Further disclosed are techniques and systems for modifying a voice model associated with a selected character based on data received from a user.Type: GrantFiled: January 14, 2010Date of Patent: January 29, 2013Assignee: K-NFB Reading Technology, Inc.Inventors: Raymond C. Kurzweil, Paul Albrecht, Peter Chapman
-
Patent number: 8364487Abstract: A language processing system may determine a display form of a spoken word by analyzing the spoken form using a language model that includes dictionary entries for display forms of homonyms. The homonyms may include trade names as well as given names and other phrases. The language processing system may receive spoken language and produce a display form of the language while displaying the proper form of the homonym. Such a system may be used in search systems where audio input is converted to a graphical display of a portion of the spoken input.Type: GrantFiled: October 21, 2008Date of Patent: January 29, 2013Assignee: Microsoft CorporationInventors: Yun-Cheng Ju, Julian J. Odell
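Choosing the display form of a homonym from dictionary entries can be sketched by scoring each candidate form against the surrounding words. The dictionary, context sets, and scoring rule below are invented for illustration:

```python
# Illustrative sketch: pick the display form of a spoken homonym by counting
# overlaps between each dictionary entry's context words and the words
# surrounding the spoken word. Entries and context sets are made up.

HOMONYMS = {
    "nike": [("Nike", {"shoes", "brand"}), ("nike", set())],
    "apple": [("Apple", {"iphone", "mac"}), ("apple", {"pie", "fruit"})],
}

def display_form(spoken, context):
    """Return the best display form for a spoken word given context words."""
    entries = HOMONYMS.get(spoken)
    if not entries:
        return spoken
    context = set(context)
    return max(entries, key=lambda e: len(e[1] & context))[0]

print(display_form("apple", ["fresh", "pie"]))   # the fruit sense
print(display_form("apple", ["new", "iphone"]))  # the trade name
```

A production language model would score forms statistically over much larger contexts; the overlap count stands in for that score.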
-
Patent number: 8364466Abstract: The teachings described herein generally relate to a multilingual electronic translation of a source phrase to a destination language selected from multiple languages, and this can be accomplished through the use of a network environment. The electronic translation can occur as a spoken translation, can be in real-time, and can mimic the voice of the user of the system.Type: GrantFiled: June 16, 2012Date of Patent: January 29, 2013Assignee: NewTalk, Inc.Inventors: Bruce W. Nash, Craig A. Robinson, Martha P. Robinson, Robert H. Clemons
-
Patent number: 8364472Abstract: Provided is an audio encoding device which can detect an optimal pitch pulse when using pitch pulse information as redundant information.Type: GrantFiled: February 29, 2008Date of Patent: January 29, 2013Assignee: Panasonic CorporationInventor: Hiroyuki Ehara
-
Patent number: 8359202Abstract: Disclosed are techniques and systems to provide a narration of a text in multiple different voices where the portions of the text narrated using the different voices are selected by a user. Also disclosed are techniques and systems for associating characters with portions of a sequence of words selected by a user. Different characters having different voice models can be associated with different portions of a sequence of words.Type: GrantFiled: January 14, 2010Date of Patent: January 22, 2013Assignee: K-NFB Reading Technology, Inc.Inventors: Raymond C. Kurzweil, Paul Albrecht, Peter Chapman
-
Publication number: 20130013312Abstract: A system and method for improving the response time of text-to-speech synthesis using triphone contexts. The method includes receiving input text and selecting a plurality of N phoneme units from a triphone unit selection database as candidate phonemes for synthesized speech based on the input text, wherein the triphone unit selection database comprises triphone units each comprising three phones. If the candidate phonemes are available in the triphone unit selection database, the method includes applying a cost process to select a set of phonemes from the candidate phonemes. If no candidate phonemes are available in the triphone unit selection database, the method includes applying a single phoneme approach to select single phonemes for synthesis, the single phonemes used in synthesis being independent of a triphone structure.Type: ApplicationFiled: July 16, 2012Publication date: January 10, 2013Applicant: AT&T Intellectual Property II, L.P.Inventor: Alistair D. Conkie
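The triphone-first selection with a single-phoneme fallback can be sketched as a lookup with two inventories. The databases and unit names are toy stand-ins, and taking the first candidate stands in for the abstract's cost process:

```python
# Sketch of triphone-first unit selection: look up each target triphone
# context in a (toy) triphone database; when no candidates exist, fall back
# to a single-phoneme inventory keyed by the center phone.

TRIPHONE_DB = {
    ("sil", "h", "eh"): ["h_unit_1", "h_unit_2"],
    ("h", "eh", "l"): ["eh_unit_1"],
}
SINGLE_PHONE_DB = {"h": "h_single", "eh": "eh_single", "l": "l_single"}

def select_units(triphones):
    """Pick one unit per (left, center, right) triphone, with fallback."""
    chosen = []
    for left, center, right in triphones:
        candidates = TRIPHONE_DB.get((left, center, right))
        if candidates:
            chosen.append(candidates[0])   # stand-in for the cost process
        else:
            chosen.append(SINGLE_PHONE_DB[center])
    return chosen

print(select_units([("sil", "h", "eh"), ("h", "eh", "l"), ("eh", "l", "ow")]))
```

The last target context is missing from the triphone database, so its center phone is synthesized from the single-phoneme inventory, independent of triphone structure.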
-
Patent number: 8352269Abstract: Disclosed are techniques and systems to provide a narration of a text in multiple different voices. Further disclosed are techniques and systems for processing indicia in a document to determine a portion of words and associating a particular voice model with the portion of words based on the indicia. During a readback process, an audible output corresponding to the words in the portion of words is generated using the voice model associated with the portion of words.Type: GrantFiled: January 14, 2010Date of Patent: January 8, 2013Assignee: K-NFB Reading Technology, Inc.Inventors: Raymond C. Kurzweil, Paul Albrecht, Peter Chapman
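Associating voice models with portions of text based on indicia can be sketched with one concrete kind of indicium, a speaker tag. The tag format (`NAME:`) and the voice names are assumptions for the example, not the patent's actual indicia:

```python
# Illustrative sketch: scan text for speaker indicia like "ALICE:" and
# associate each tagged portion with a voice model, so a readback pass can
# synthesize each portion in its assigned voice. Untagged text gets a
# default narrator voice. Tag format and voice names are assumed.

import re

VOICES = {"ALICE": "female_voice_1", "BOB": "male_voice_1"}
DEFAULT_VOICE = "narrator_voice"

def assign_voices(text):
    """Split text on speaker indicia; return (voice, words) portions."""
    portions = []
    for chunk in re.split(r"(\b[A-Z]+:)", text):
        chunk = chunk.strip()
        if not chunk:
            continue
        if chunk.endswith(":") and chunk[:-1] in VOICES:
            portions.append([VOICES[chunk[:-1]], ""])   # open a tagged portion
        elif portions and portions[-1][1] == "":
            portions[-1][1] = chunk                      # fill the open portion
        else:
            portions.append([DEFAULT_VOICE, chunk])
    return [tuple(p) for p in portions]

print(assign_voices("Once upon a time. ALICE: Hello! BOB: Hi."))
```

Each returned pair names the voice model a synthesizer would use for that portion of words during readback.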
-
Patent number: 8352271Abstract: To facilitate text-to-speech conversion of a username, a first or last name of a user associated with the username may be retrieved, and a pronunciation of the username may be determined based at least in part on whether the name forms at least part of the username. To facilitate text-to-speech conversion of a domain name having a top level domain and at least one other level domain, a pronunciation for the top level domain may be determined based at least in part upon whether the top level domain is one of a predetermined set of top level domains. Each other level domain may be searched for one or more recognized words therewithin, and a pronunciation of the other level domain may be determined based at least in part on an outcome of the search. The username and domain name may form part of a network address such as an email address, URL or URI.Type: GrantFiled: February 23, 2012Date of Patent: January 8, 2013Assignee: Research In Motion LimitedInventors: Matthew Bells, Jennifer Elizabeth Lhotak, Michael Angelo Nanni
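The domain-name side of this can be sketched as a per-level decision: read a level as a word when it matches a recognized-word lexicon, otherwise spell it out, and give known top-level domains their conventional reading. The lexicon and TLD table are assumptions for illustration:

```python
# Sketch of a domain-name reading heuristic: known TLDs get a conventional
# spoken form; each other level is read as a word if it is in a small
# lexicon, else spelled letter by letter. Lexicon and TLD table are assumed.

KNOWN_TLDS = {"com": "dot com", "org": "dot org", "net": "dot net"}
LEXICON = {"research", "motion", "example", "mail"}

def read_domain(domain):
    """Return a spoken rendering of a dotted domain name."""
    *others, tld = domain.lower().split(".")
    parts = []
    for level in others:
        if level in LEXICON:
            parts.append(level)            # pronounce as a word
        else:
            parts.append(" ".join(level))  # spell letter by letter
        parts.append("dot")
    parts[-1] = KNOWN_TLDS.get(tld, "dot " + " ".join(tld))
    return " ".join(parts)

print(read_domain("example.com"))  # "example dot com"
print(read_domain("rim.net"))      # "r i m dot net"
```

The username side would work analogously, searching the local part for the user's first or last name before deciding between word-like and spelled-out pronunciation.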
-
Patent number: 8352267Abstract: A plurality of input devices each includes a speaker, an operation data transmitter, a voice data receiver, and a voice controller. An information processing apparatus includes a voice storing area, object displaying programmed logic circuitry, operation data acquiring programmed logic circuitry, pointing position determining programmed logic circuitry, object specifying programmed logic circuitry, voice reading programmed logic circuitry, and voice data transmitting programmed logic circuitry. The pointing position determining programmed logic circuitry specifies, for each of the input devices, a pointing position on a screen based on operation data transmitted from the operation data transmitter. The voice reading programmed logic circuitry reads voice data corresponding to the pointing position for each of the input devices. The voice data transmitting programmed logic circuitry transmits the voice data to each of the input devices.Type: GrantFiled: June 27, 2007Date of Patent: January 8, 2013Assignee: Nintendo Co., Ltd.Inventor: Toshiaki Suzuki
-
Systems and methods for selective rate of speech and speech preferences for text to speech synthesis
Patent number: 8352268Abstract: Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back.Type: GrantFiled: September 29, 2008Date of Patent: January 8, 2013Assignee: Apple Inc.Inventors: DeVang Naik, Kim Silverman, Jerome Bellegarda
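The normalization and native-language-detection step for a media asset's text string can be sketched simply. The cleanup rules and the script-range language guess below are illustrative assumptions, not the patented implementation:

```python
# Illustrative sketch of the front-end step: clean up a track title and make
# a rough script-based guess at its native language, so a later stage can
# pick a matching voice and target phonemes. All rules here are assumed.

import re
import unicodedata

def normalize(title):
    """Strip parenthetical decorations and collapse whitespace in a title."""
    title = re.sub(r"\s*[\(\[].*?[\)\]]", "", title)  # drop "(Remastered)" etc.
    title = unicodedata.normalize("NFKC", title)
    return re.sub(r"\s+", " ", title).strip()

def guess_language(text):
    """Very rough script-range language guess for voice selection."""
    if any("\u3040" <= ch <= "\u30ff" for ch in text):   # kana
        return "ja"
    if any("\u0400" <= ch <= "\u04ff" for ch in text):   # Cyrillic
        return "ru"
    return "en"

t = normalize("Yesterday  (Remastered 2009)")
print(t, guess_language(t))  # "Yesterday en"
```

In the described system, the resulting language tag would steer phoneme and voice selection in the render engines so titles are spoken in a dialect familiar to the user.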