Synthesis Patents (Class 704/258)
  • Patent number: 8478597
    Abstract: The present disclosure presents a useful metric for assessing the relative difficulty that non-native speakers face in pronouncing a given utterance, and a method and systems for using such a metric in the evaluation and assessment of the utterances of non-native speakers. In an embodiment, the metric may be based on both known sources of difficulty for language learners and a corpus-based measure of cross-language sound differences. The method may be applied to speakers of any first language producing utterances in any non-native second language.
    Type: Grant
    Filed: January 10, 2006
    Date of Patent: July 2, 2013
    Assignee: Educational Testing Service
    Inventors: Derrick Higgins, Klaus Zechner, Yoko Futagi, Rene Lawless
  • Patent number: 8478582
    Abstract: A server is disclosed for computing a score of an opinion that a message in a text file is expected to convey regarding a subject to be evaluated, wherein the message is written using literal strings and pictorial symbols. In this server, a pictorial-symbol dictionary memory stores a correspondence between designated pictorial symbols to be rated and scores of the opinions expressed by the respective pictorial symbols. At least one of the pictorial symbols used in the message that coincides with one of the designated pictorial symbols stored in the pictorial-symbol dictionary memory is extracted from the message; the opinion score corresponding to each extracted pictorial symbol is retrieved from the pictorial-symbol dictionary memory; and an aggregate net opinion score for the message is calculated, based on the aggregate opinion score for the extracted pictorial symbols.
    Type: Grant
    Filed: February 2, 2010
    Date of Patent: July 2, 2013
    Assignee: KDDI Corporation
    Inventors: Yukiko Habu, Ryoichi Kawada, Nobuhide Kotsuka, Sung Jiae, Koki Uchiyama, Santi Saeyor, Hirosuke Asano, Toshiaki Shimamura
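The dictionary lookup and aggregation this abstract describes can be sketched in a few lines; the symbol-to-score table and the scoring scale below are hypothetical stand-ins for the patent's pictorial-symbol dictionary, not its actual contents:

```python
# Illustrative sketch: extract known pictorial symbols (emoji) from a message
# and sum their opinion scores into an aggregate net opinion score.
# The dictionary contents and scale are hypothetical.

PICTORIAL_SCORES = {
    "😊": 1.0,   # positive opinion
    "😢": -1.0,  # negative opinion
    "😡": -2.0,  # strongly negative opinion
}

def net_opinion_score(message: str) -> float:
    """Sum the scores of every rated pictorial symbol found in the message."""
    return sum(PICTORIAL_SCORES[ch] for ch in message if ch in PICTORIAL_SCORES)
```

A message with no rated symbols simply scores zero, matching the idea that only designated symbols contribute to the aggregate.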
  • Publication number: 20130166303
    Abstract: A computer-implemented method includes receiving, in a computer system, a user query comprising at least a first term, parsing the user query to at least determine whether the user query assigns a field to the first term, the parsing resulting in a parsed query that conforms to a predefined format, performing a search in a metadata repository using the parsed query, the metadata repository embodied in a computer readable medium and including triplets generated based on multiple modes of metadata for video content, the search identifying a set of candidate scenes from the video content, ranking the set of candidate scenes according to a scoring metric into a ranked scene list, and generating an output from the computer system that includes at least part of the ranked scene list, the output generated in response to the user query.
    Type: Application
    Filed: November 13, 2009
    Publication date: June 27, 2013
    Applicant: ADOBE SYSTEMS INCORPORATED
    Inventors: Walter Chang, Michael J. Welch
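A minimal sketch of the parse-then-rank flow the abstract outlines. A flat list of (scene, field, value) tuples stands in for the triplet metadata repository, and hit counting stands in for the scoring metric; both are simplifying assumptions, not the patented design:

```python
# Illustrative sketch: parse a "field:term" user query, match it against
# per-scene metadata triplets, and rank candidate scenes by match count.

from collections import Counter

def parse_query(query: str):
    """Split the query into (field, term) pairs; bare terms get field None."""
    parsed = []
    for token in query.split():
        if ":" in token:
            field, term = token.split(":", 1)
            parsed.append((field, term))
        else:
            parsed.append((None, token))
    return parsed

def rank_scenes(parsed, triplets):
    """triplets: iterable of (scene_id, field, value). Rank scenes by hits."""
    scores = Counter()
    for field, term in parsed:
        for scene_id, f, value in triplets:
            if term == value and (field is None or field == f):
                scores[scene_id] += 1
    return [scene for scene, _ in scores.most_common()]
```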
  • Patent number: 8468017
    Abstract: The invention discloses a multi-stage quantization method, which includes the following steps: obtaining a reference codebook according to a previous stage codebook; obtaining a current stage codebook according to the reference codebook and a scaling factor; and quantizing an input vector by using the current stage codebook. The invention also discloses a multi-stage quantization device. With the invention, the current stage codebook may be obtained according to the previous stage codebook, by using the correlation between the current stage codebook and the previous stage codebook. As a result, it does not require an independent codebook space for the current stage codebook, which saves the storage space and improves the resource usage efficiency.
    Type: Grant
    Filed: May 1, 2010
    Date of Patent: June 18, 2013
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Eyal Shlomot, Jiliang Dai, Fuliang Yin, Xin Ma, Jun Zhang
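The storage saving the abstract claims comes from deriving the current-stage codebook from the previous stage rather than storing it. A toy sketch of that idea, with a hypothetical codebook and scaling factor (real codecs train both):

```python
# Illustrative sketch of multi-stage quantization with a derived codebook:
# stage 2 reuses the stage-1 codebook scaled by a factor, so no independent
# stage-2 codebook needs to be stored.

def quantize(vector, codebook):
    """Return the codebook entry nearest to the vector (squared error)."""
    return min(codebook, key=lambda c: sum((v - x) ** 2 for v, x in zip(vector, c)))

def two_stage_quantize(vector, stage1_codebook, scale=0.25):
    """Quantize, then quantize the residual with the scaled stage-1 codebook."""
    c1 = quantize(vector, stage1_codebook)
    residual = [v - x for v, x in zip(vector, c1)]
    stage2_codebook = [[scale * x for x in c] for c in stage1_codebook]
    c2 = quantize(residual, stage2_codebook)
    return [a + b for a, b in zip(c1, c2)]  # reconstructed vector
```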
  • Patent number: 8468020
    Abstract: An apparatus for synthesizing speech includes a waveform memory that stores a plurality of speech unit waveforms; an information memory that correspondingly stores speech unit information and an address of each of the speech unit waveforms; a selector that selects a speech unit sequence corresponding to an input phoneme sequence by referring to the speech unit information; a speech unit waveform acquisition unit that acquires a speech unit waveform corresponding to each speech unit of the speech unit sequence from the waveform memory by referring to the address; and a speech unit concatenation unit that generates the speech by concatenating the acquired speech unit waveforms.
    Type: Grant
    Filed: May 8, 2007
    Date of Patent: June 18, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventor: Takehiko Kagoshima
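The address indirection the abstract describes, reduced to a toy: unit metadata stores an address and length into a shared waveform memory, and concatenation fetches samples by address. The unit inventory and sample values below are hypothetical:

```python
# Illustrative sketch: speech unit info maps each unit to an (address, length)
# into a waveform memory; synthesis concatenates the addressed waveforms.

WAVEFORM_MEMORY = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]   # stand-in sample store
UNIT_INFO = {"a": (0, 2), "b": (2, 2), "c": (4, 2)}  # unit -> (address, length)

def synthesize(unit_sequence):
    """Concatenate the waveform of each selected speech unit."""
    samples = []
    for unit in unit_sequence:
        addr, length = UNIT_INFO[unit]
        samples.extend(WAVEFORM_MEMORY[addr:addr + length])
    return samples
```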
  • Publication number: 20130151243
    Abstract: A voice modulation apparatus is provided. The voice modulation apparatus includes an audio signal input unit which receives an audio signal from an external source; an extraction unit which extracts property information relating to a voice from the audio signal; a storage unit which stores the extracted property information; a control unit which modulates a target voice based on the extracted property information; and an output unit which outputs the modulated target voice.
    Type: Application
    Filed: December 7, 2012
    Publication date: June 13, 2013
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventor: Samsung Electronics Co., Ltd.
  • Patent number: 8457967
    Abstract: A procedure to automatically evaluate the spoken fluency of a speaker by prompting the speaker to talk on a given topic, recording the speaker's speech to get a recorded sample of speech, and then analyzing the patterns of disfluencies in the speech to compute a numerical score to quantify the spoken fluency skills of the speaker. The numerical fluency score accounts for various prosodic and lexical features, including formant-based filled-pause detection, closely-occurring exact and inexact repeat N-grams, and the normalized average distance between consecutive occurrences of N-grams. The lexical features and prosodic features are combined to classify the speaker with a C-class classification and develop a rating for the speaker.
    Type: Grant
    Filed: August 15, 2009
    Date of Patent: June 4, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Kartik Audhkhasi, Om D. Deshmukh, Kundan Kandhway, Ashish Verma
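One of the lexical features named above, closely-occurring exact repeat N-grams, is easy to sketch. The choice of N and the proximity window below are hypothetical parameters, not the patent's values:

```python
# Illustrative sketch: count n-grams that repeat within a few words of a
# prior occurrence, a typical disfluency signal ("i want i want to go").

def close_repeat_ngrams(words, n=2, window=5):
    """Count n-grams repeated within `window` positions of a prior occurrence."""
    last_seen = {}
    repeats = 0
    for i in range(len(words) - n + 1):
        gram = tuple(words[i:i + n])
        if gram in last_seen and i - last_seen[gram] <= window:
            repeats += 1
        last_seen[gram] = i
    return repeats
```

A full fluency score would combine counts like this with prosodic features (e.g. filled-pause detections) before classification.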
  • Patent number: 8456420
    Abstract: Many embodiments may comprise logic such as hardware and/or code to implement a user interface for traversal of long sorted lists, via audible mapping of the lists, using sensor-based gesture recognition, audio and tactile feedback, and button selection while on the go. In several embodiments, such user interface modalities are physically small in size, enabling a user to be truly mobile by reducing the cognitive load required to operate the device. For some embodiments, the user interface may be divided across multiple worn devices, such as a mobile device, watch, earpiece, and ring. Rotation of the watch may be translated into navigation instructions, allowing the user to traverse the list while the user receives audio feedback via the earpiece to describe items in the list as well as audio feedback regarding the navigation state. Many embodiments offer the user a simple user interface to traverse the list without visual feedback.
    Type: Grant
    Filed: December 31, 2008
    Date of Patent: June 4, 2013
    Assignee: Intel Corporation
    Inventors: Lama Nachman, David L. Graumann, Giuseppe Raffa, Jennifer Healey
  • Patent number: 8452600
    Abstract: An electronic reading device for reading ebooks and other digital media items combines a touch surface electronic reading device with accessibility technology to provide a visually impaired user more control over his or her reading experience. In some implementations, the reading device can be configured to operate in at least two modes: a continuous reading mode and an enhanced reading mode.
    Type: Grant
    Filed: August 18, 2010
    Date of Patent: May 28, 2013
    Assignee: Apple Inc.
    Inventor: Christopher B. Fleizach
  • Patent number: 8447609
    Abstract: Embodiments may be a standalone module or part of mobile devices, desktop computers, servers, stereo systems, or any other systems that might benefit from condensed audio presentations of item structures such as lists or tables. Embodiments may comprise logic such as hardware and/or code to adjust the temporal characteristics of items comprising words. The items may be included in a structure such as a text listing or table, an audio listing or table, or a combination thereof, or may be individual words or phrases. For instance, embodiments may comprise a keyword extractor to extract keywords from the items and an abbreviations generator to generate abbreviations based upon the keywords. Further embodiments may comprise a text-to-speech generator to generate audible items based upon the abbreviations to render to a user while traversing the item structure.
    Type: Grant
    Filed: December 31, 2008
    Date of Patent: May 21, 2013
    Assignee: Intel Corporation
    Inventors: Giuseppe Raffa, Lama Nachman, David L. Graumann, Michael E. Deisher
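The keyword-extractor plus abbreviations-generator pipeline could look roughly like this; the stopword list and the abbreviation rule (first keyword, inner vowels dropped) are hypothetical choices, not the patented method:

```python
# Illustrative sketch: extract keywords from a list item and condense the
# leading keyword into a short abbreviation for audible rendering.

STOPWORDS = {"the", "a", "of", "and", "in"}

def abbreviate(item: str, max_len: int = 4) -> str:
    """Keyword extraction (drop stopwords) + abbreviation (drop inner vowels)."""
    keywords = [w for w in item.lower().split() if w not in STOPWORDS]
    if not keywords:
        return item[:max_len]
    head = keywords[0]
    condensed = head[0] + "".join(ch for ch in head[1:] if ch not in "aeiou")
    return condensed[:max_len]
```

The condensed form would then feed a text-to-speech generator to produce the shortened audible item.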
  • Patent number: 8447613
    Abstract: A method for optimizing message transmission and decoding comprises: reading data from a memory of an originating device, the data comprising information regarding the originating device; encoding the data by converting the data to a subset of words having a ranked recognition accuracy higher than the remainder of words; transmitting the encoded data from the originating device to a receiving system audibly as words via a telephone connection; utilizing a voice recognition software to recognize the words; decoding the words back to the data; and taking a predetermined action based on the data.
    Type: Grant
    Filed: April 28, 2009
    Date of Patent: May 21, 2013
    Assignee: iRobot Corporation
    Inventors: Patrick Alan Hussey, Maryellen Abreu
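The encoding step, converting device data into words chosen for high recognition accuracy, can be sketched with a nibble-to-word mapping. The NATO-style word list below is a hypothetical stand-in for a recognition-ranked vocabulary:

```python
# Illustrative sketch: encode bytes as words from a small, highly
# recognizable vocabulary so the data survives an audible telephone channel,
# then decode the recognized words back to bytes.

WORDS = ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot",
         "golf", "hotel", "india", "juliet", "kilo", "lima",
         "mike", "november", "oscar", "papa"]  # 16 words -> 4 bits each

def encode(data: bytes) -> list[str]:
    out = []
    for b in data:
        out.append(WORDS[b >> 4])    # high nibble
        out.append(WORDS[b & 0xF])   # low nibble
    return out

def decode(words: list[str]) -> bytes:
    idx = {w: i for i, w in enumerate(WORDS)}
    nibbles = [idx[w] for w in words]
    return bytes((nibbles[i] << 4) | nibbles[i + 1]
                 for i in range(0, len(nibbles), 2))
```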
  • Patent number: 8447604
    Abstract: Provided in some embodiments is a method including receiving ordered script words indicative of dialogue words to be spoken, receiving audio data corresponding to at least a portion of the dialogue words to be spoken and including timecodes associated with dialogue words, generating a matrix of the ordered script words versus the dialogue words, aligning the matrix to determine hard alignment points that include matching consecutive sequences of ordered script words with corresponding sequences of dialogue words, partitioning the matrix of ordered script words into sub-matrices bounded by adjacent hard-alignment points and including corresponding sub-sets of the script and dialogue words between the hard-alignment points, and aligning each of the sub-matrices.
    Type: Grant
    Filed: May 28, 2010
    Date of Patent: May 21, 2013
    Assignee: Adobe Systems Incorporated
    Inventor: Walter W. Chang
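The anchoring step, finding consecutive word runs shared by script and dialogue to use as hard alignment points, can be sketched greedily. The minimum run length and the naive scan strategy are hypothetical simplifications of the matrix alignment the abstract describes:

```python
# Illustrative sketch: find (script_index, dialogue_index, length) runs where
# the script and transcribed dialogue match word-for-word. Regions between
# anchors would become independent sub-alignment problems.

def hard_anchors(script, dialogue, min_run=3):
    """Return (i, j, length) where script[i:i+length] == dialogue[j:j+length]."""
    anchors = []
    i = j = 0
    while i < len(script) and j < len(dialogue):
        if script[i] == dialogue[j]:
            length = 0
            while (i + length < len(script) and j + length < len(dialogue)
                   and script[i + length] == dialogue[j + length]):
                length += 1
            if length >= min_run:
                anchors.append((i, j, length))
            i += length
            j += length
        else:
            j += 1  # naive: skip inserted dialogue words until words realign
    return anchors
```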
  • Patent number: 8447610
    Abstract: Techniques for generating synthetic speech with contrastive stress. In one aspect, a speech-enabled application generates a text input including a text transcription of a desired speech output, and inputs the text input to a speech synthesis system. The synthesis system generates an audio speech output corresponding to at least a portion of the text input, with at least one portion carrying contrastive stress, and provides the audio speech output for the speech-enabled application. In another aspect, a speech-enabled application inputs a plurality of text strings, each corresponding to a portion of a desired speech output, to a software module for rendering contrastive stress. The software module identifies a plurality of audio recordings that render at least one portion of at least one of the text strings as speech carrying contrastive stress. The speech-enabled application generates an audio speech output corresponding to the desired speech output using the audio recordings.
    Type: Grant
    Filed: August 9, 2010
    Date of Patent: May 21, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Darren C. Meyer, Stephen R. Springer
  • Patent number: 8447592
    Abstract: In one aspect, a method of processing a voice signal to extract information to facilitate training a speech synthesis model is provided. The method comprises acts of detecting a plurality of candidate features in the voice signal, performing at least one comparison between one or more combinations of the plurality of candidate features and the voice signal, and selecting a set of features from the plurality of candidate features based, at least in part, on the at least one comparison. In another aspect, the method is performed by executing a program encoded on a computer readable medium. In another aspect, a speech synthesis model is provided by, at least in part, performing the method.
    Type: Grant
    Filed: September 13, 2005
    Date of Patent: May 21, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Michael D. Edgington, Laurence Gillick, Jordan R. Cohen
  • Patent number: 8442423
    Abstract: A digital media item, such as an electronic book (eBook), may include testing content. The testing content may include questions about the content of the digital media item. When a user is viewing the digital media item on an electronic device, such as an eBook reader, the user may be allowed to select whether the testing content is displayed. The user may also be allowed to select a particular mode of testing, such as automatic testing, selective testing, etc. If the user chooses to display the testing content, the user may also be allowed to provide answers to the testing questions.
    Type: Grant
    Filed: January 26, 2009
    Date of Patent: May 14, 2013
    Assignee: Amazon Technologies, Inc.
    Inventors: Thomas A. Ryan, Edward J. Gayles, Laurent An Minh Nguyen, Steven K. Weiss, Martin Görner
  • Patent number: 8433369
    Abstract: A mobile terminal has a sound obtaining unit configured to obtain a sound signal; a voice recognition unit configured to recognize the sound signal and convert the sound signal into a text data; a display unit configured to display the text data divided in a plurality of units; a selection unit configured to receive a selection of one of the units from the text data divided in the plurality of the units displayed on the display unit; and a control unit configured to perform a predetermined process corresponding to each of the units selected by the selection unit.
    Type: Grant
    Filed: September 15, 2009
    Date of Patent: April 30, 2013
    Assignee: Fujitsu Mobile Communications Limited
    Inventor: Yasuhito Ambiru
  • Patent number: 8433575
    Abstract: A system and method is described in which a multimedia story is rendered to a consumer in dependence on features extracted from an audio signal representing for example a musical selection of the consumer. Features such as key changes and tempo of the music selection are related to dramatic parameters defined by and associated with story arcs, narrative story rules and film or story structure. In one example a selection of a few music tracks provides input audio signals (602) from which musical features are extracted (604), following which a dramatic parameter list and timeline are generated (606). Media fragments are then obtained (608), the fragments having story content associated with the dramatic parameters, and the fragments output (610) with the music selection.
    Type: Grant
    Filed: December 10, 2003
    Date of Patent: April 30, 2013
    Assignee: AMBX UK Limited
    Inventors: David A. Eves, Richard S. Cole, Christopher Thorne
  • Patent number: 8433573
    Abstract: A prosody modification device includes: a real voice prosody input part that receives real voice prosody information extracted from an utterance of a human; a regular prosody generating part that generates regular prosody information having a regular phoneme boundary that determines a boundary between phonemes and a regular phoneme length of a phoneme by using data representing a regular or statistical phoneme length in an utterance of a human with respect to a section including at least a phoneme or a phoneme string to be modified in the real voice prosody information; and a real voice prosody modification part that resets a real voice phoneme boundary by using the generated regular prosody information so that the real voice phoneme boundary and a real voice phoneme length of the phoneme or the phoneme string to be modified in the real voice prosody information are approximate to an actual phoneme boundary and an actual phoneme length of the utterance of the human, thereby modifying the real voice prosody information.
    Type: Grant
    Filed: February 11, 2008
    Date of Patent: April 30, 2013
    Assignee: Fujitsu Limited
    Inventors: Kentaro Murase, Nobuyuki Katae
  • Patent number: 8433574
    Abstract: Methods, systems, and software for converting the audio input of a user of a hand-held client device or mobile phone into a textual representation by means of a backend server accessed by the device through a communications network. The text is then inserted into or used by an application of the client device to send a text message, instant message, email, or to insert a request into a web-based application or service. In one embodiment, the method includes the steps of initializing or launching the application on the device; recording and transmitting the recorded audio message from the client device to the backend server through a client-server communication protocol; converting the transmitted audio message into the textual representation in the backend server; and sending the converted text message back to the client device or forwarding it on to an alternate destination directly from the server.
    Type: Grant
    Filed: February 13, 2012
    Date of Patent: April 30, 2013
    Assignee: Canyon IP Holdings, LLC
    Inventors: Victor R. Jablokov, Igor R. Jablokov, Marc White
  • Patent number: 8428952
    Abstract: A system and method to allow an author of an instant message to enable and control the production of audible speech to the recipient of the message. The voice of the author of the message is characterized into parameters compatible with a formative or articulative text-to-speech engine such that upon receipt, the receiving client device can generate audible speech signals from the message text according to the characterization of the author's voice. Alternatively, the author can store samples of his or her actual voice in a server so that, upon transmission of a message by the author to a recipient, the server extracts the samples needed only to synthesize the words in the text message, and delivers those to the receiving client device so that they are used by a client-side concatenative text-to-speech engine to generate audible speech signals having a close likeness to the actual voice of the author.
    Type: Grant
    Filed: June 12, 2012
    Date of Patent: April 23, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Terry Wade Niemeyer, Liliana Orozco
  • Patent number: 8423366
    Abstract: A method includes receiving, by a system, a voice recording associated with a user, transcribing, the voice recording into text that includes a group of words, and storing an association between a portion of each respective word and a corresponding portion of the voice recording. The corresponding portion of the voice recording is the portion of the voice recording from which the portion of the respective word was transcribed. The method may also include determining a modification to a speech synthesis voice associated with the user based at least in part on the association.
    Type: Grant
    Filed: July 18, 2012
    Date of Patent: April 16, 2013
    Assignee: Google Inc.
    Inventors: Marcus Alexander Foster, Richard Zarek Cohen
  • Patent number: 8422641
    Abstract: Devices, systems, and methods for recording call sessions over a VoIP network using a distributed record server architecture are disclosed. An example recording device for recording segments of a call session includes a record server configured to receive an agent voice data stream and an external caller voice data stream from an agent telephone station, and a file repository configured to store voice data and call data associated with each recorded segment of the call session. The recording device is configured to tag recorded segments of each call session, which can be later used by a third-party application or database to check the status and/or integrity of the recorded call session.
    Type: Grant
    Filed: June 15, 2009
    Date of Patent: April 16, 2013
    Assignee: Calabrio, Inc.
    Inventor: James Paul Martin, II
  • Patent number: 8423365
    Abstract: A contextual conversion platform, and method for converting text-to-speech, are described that can convert content of a target to spoken content. Embodiments of the contextual conversion platform can identify certain contextual characteristics of the content, from which can be generated a spoken content input. This spoken content input can include tokens, e.g., words and abbreviations, to be converted to the spoken content, as well as substitution tokens that are selected from contextual repositories based on the context identified by the contextual conversion platform.
    Type: Grant
    Filed: May 28, 2010
    Date of Patent: April 16, 2013
    Inventor: Daniel Ben-Ezri
  • Patent number: 8412528
    Abstract: The present invention relates to computer-generated text-to-speech conversion. It relates in particular to a method and system for updating a Concatenative Text-To-Speech (CTTS) system with a speech database from a base version to a new version. The present invention performs an application-specific re-organization of a synthesizer's speech database by means of certain decision tree modifications. By that reorganization, certain synthesis units are made available for the new application, which are not available in prior art without a new speech session. This allows the creation of application-specific synthesizers with improved output speech quality for arbitrary domains and applications at very low cost.
    Type: Grant
    Filed: May 2, 2006
    Date of Patent: April 2, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Volker Fischer, Siegfried Kunzmann
  • Patent number: 8412529
    Abstract: An approach is provided for enhancing verbal communication sessions. A verbal component of a communication session is converted into textual information. The converted textual information is scanned for a text string to trigger an application. The application is invoked to provide supplemental information about the textual information or to perform an action in response to the textual information for or on behalf of a party of the communication session. The supplemental information or a confirmation of the action is transmitted to the party.
    Type: Grant
    Filed: October 29, 2008
    Date of Patent: April 2, 2013
    Assignee: Verizon Patent and Licensing Inc.
    Inventors: Martin W. McKee, Paul T. Schultz, Robert A. Sartini
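The scan-and-invoke step the abstract describes can be sketched as a trigger-string table over the converted transcript. The trigger phrases and handler below are hypothetical examples, not the patented application set:

```python
# Illustrative sketch: scan text converted from a verbal communication session
# for trigger strings and invoke the matching application to produce
# supplemental information.

def weather_lookup(transcript):
    # Hypothetical handler; a real one would query a weather service.
    return "Forecast: sunny."

TRIGGERS = {"what's the weather": weather_lookup}

def scan(transcript: str):
    """Return supplemental information if any trigger string matches."""
    for trigger, handler in TRIGGERS.items():
        if trigger in transcript.lower():
            return handler(transcript)
    return None
```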
  • Patent number: 8401856
    Abstract: A very common problem is that when people speak a language other than the one to which they are accustomed, syllables can be spoken for longer or shorter than the listener would regard as appropriate. An example of this can be observed when people who have a heavy Japanese accent speak English. Since Japanese words end with vowels, there is a tendency for native Japanese speakers to add a vowel sound to the end of English words that should end with a consonant. Illustratively, native Japanese speakers often pronounce “orange” as “orenji.” An aspect provides an automatic speech-correcting process that would not necessarily need to know that fruit is being discussed; the system would only need to know that the speaker is accustomed to Japanese, that the listener is accustomed to English, that “orenji” is not a word in English, and that “orenji” is a typical Japanese mispronunciation of the English word “orange.”
    Type: Grant
    Filed: May 17, 2010
    Date of Patent: March 19, 2013
    Assignee: Avaya Inc.
    Inventors: Terry Jennings, Paul Roller Michaelis
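The abstract's own "orenji" example reduces to a language-pair lookup: if the word is valid in the listener's language, leave it; otherwise check a table of typical mispronunciations. The table and vocabulary below are hypothetical miniatures of what such a system would need:

```python
# Illustrative sketch: correct a known accent-driven mispronunciation given
# the speaker's accustomed language and the listener's language.

MISPRONUNCIATIONS = {
    ("japanese", "english"): {"orenji": "orange", "birudingu": "building"},
}

ENGLISH_VOCAB = {"orange", "building", "juice"}

def correct(word, speaker_lang="japanese", listener_lang="english"):
    if word in ENGLISH_VOCAB:
        return word  # already a valid word in the listener's language
    table = MISPRONUNCIATIONS.get((speaker_lang, listener_lang), {})
    return table.get(word, word)  # fall back to the word unchanged
```

Note that the correction never needs topic knowledge (that fruit is being discussed), only the language pair and the mispronunciation table, matching the abstract's claim.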
  • Publication number: 20130066631
    Abstract: The present invention provides a parametric speech synthesis method and a parametric speech synthesis system.
    Type: Application
    Filed: October 27, 2011
    Publication date: March 14, 2013
    Applicant: GOERTEK INC.
    Inventors: Fengliang Wu, Zhenhua Wu
  • Patent number: 8396708
    Abstract: An avatar facial expression representation technology is provided. The avatar facial expression representation technology estimates changes in emotion and emphasis in a user's voice from vocal information, and changes in mouth shape of the user from pronunciation information of the voice. The avatar facial expression technology tracks a user's facial movements and changes in facial expression from image information, and may represent avatar facial expressions based on the results of these operations. Accordingly, avatar facial expressions can be obtained which are similar to the actual facial expressions of the user.
    Type: Grant
    Filed: January 28, 2010
    Date of Patent: March 12, 2013
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Chi-youn Park, Young-kyoo Hwang, Jung-bae Kim
  • Patent number: 8392191
    Abstract: The present invention provides a method and apparatus of forming Chinese prosodic words, which method comprises the steps of inputting Chinese text; performing process of word segmentation and part of speech annotation for the input Chinese text to generate an initial prosodic word sequence; inserting grids representing prosodic word boundaries for all the words in the initial prosodic word sequence to generate a grid prosodic word sequence; annotating the grids ready to be deleted in the grid prosodic word sequence based on the prosodic word forming means; judging the grids which actually need to be deleted in the grids ready to be deleted based on the prosodic word forming means; deleting the grids which actually need to be deleted in the grid prosodic word sequence, and word forming the words between every two grids in the remaining grids to generate prosodic words.
    Type: Grant
    Filed: December 10, 2007
    Date of Patent: March 5, 2013
    Assignee: Fujitsu Limited
    Inventors: Guo Qing, Nobuyuki Katae
  • Patent number: 8392194
    Abstract: A method for effecting a machine-based determination of speech intelligibility in an aircraft during flight operations includes: (a) in no particular order: (1) providing a representation of a machine-based speech evaluating signal; and (2) providing a representation of in-flight noise; (b) combining the representation of a machine-based speech evaluation signal and the representation of in-flight noise to obtain a combined noise signal; and (c) employing the combined noise signal to present the machine-based determination of speech intelligibility in an aircraft during flight operations.
    Type: Grant
    Filed: October 15, 2008
    Date of Patent: March 5, 2013
    Assignee: The Boeing Company
    Inventor: Naval Kishore Agarwal
  • Patent number: 8380484
    Abstract: A method (50) of dynamically changing a sentence structure of a message can include the step of receiving (51) a user request for information, retrieving (52) data based on the information requested, and altering (53) among an intonation and/or the language conveying the information based on the context of the information to be presented. The intonation can optionally be altered by altering (54) a volume, a speed, and/or a pitch based on the information to be presented. The language can be altered by selecting (55) among a finite set of synonyms based on the information to be presented to the user or by selecting (56) among key verbs, adjectives or adverbs that vary along a continuum.
    Type: Grant
    Filed: August 10, 2004
    Date of Patent: February 19, 2013
    Assignee: International Business Machines Corporation
    Inventors: Brent L. Davis, Stephen W. Hanley, Vanessa V. Michelini, Melanie D. Polkosky
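The alter-language-and-intonation step (55)/(56) could be sketched as selecting among synonyms and scaling prosody by a context value. The synonym sets, the single "urgency" context value, and the prosody formulas are all hypothetical illustrations:

```python
# Illustrative sketch: vary word choice (finite synonym set) and intonation
# (volume/speed/pitch) based on a context value for the information presented.

SYNONYMS = {"good": ["good", "solid", "strong"],
            "bad": ["bad", "weak", "poor"]}

def render(word: str, urgency: float):
    """Return (word choice, prosody settings) for an urgency value in [0, 1]."""
    options = SYNONYMS.get(word, [word])
    choice = options[min(int(urgency * len(options)), len(options) - 1)]
    prosody = {"volume": 0.5 + 0.5 * urgency,
               "speed": 1.0 + 0.3 * urgency,
               "pitch": 1.0 + 0.2 * urgency}
    return choice, prosody
```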
  • Patent number: 8374873
    Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
    Type: Grant
    Filed: August 11, 2009
    Date of Patent: February 12, 2013
    Assignee: Morphism, LLC
    Inventor: James H. Stephens, Jr.
  • Patent number: 8374859
    Abstract: An automatic answering device and an automatic answering method for automatically answering a user utterance are configured: to prepare a conversation scenario that is a set of input sentences and reply sentences, the input sentences each corresponding to a user utterance assumed to be uttered by a user, the reply sentences each being an automatic reply to the corresponding input sentence; to accept a user utterance; to determine the reply sentence to the accepted user utterance on the basis of the conversation scenario; and to present the determined reply sentence to the user. Data of the conversation scenario have a data structure that enables the input sentences and the reply sentences to be expressed in a state transition diagram in which each of the input sentences is defined as a morphism and the reply sentence corresponding to the input sentence is defined as an object.
    Type: Grant
    Filed: August 17, 2009
    Date of Patent: February 12, 2013
    Assignee: Universal Entertainment Corporation
    Inventors: Shengyang Huang, Hiroshi Katukura
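The state-transition view of a conversation scenario maps naturally onto a table keyed by (state, input sentence): each input sentence is an edge out of the current state, and the reply sentence is attached to it. The scenario contents and fallback reply below are hypothetical:

```python
# Illustrative sketch: a conversation scenario as a state-transition table.
# Each (state, input sentence) edge yields (next state, reply sentence).

SCENARIO = {
    ("start", "hello"): ("greeted", "Hi! How can I help?"),
    ("greeted", "what time is it"): ("greeted", "It is noon."),
    ("greeted", "bye"): ("start", "Goodbye!"),
}

def reply(state: str, utterance: str):
    """Return (next_state, reply); fall back in place when no edge matches."""
    return SCENARIO.get((state, utterance), (state, "Could you rephrase that?"))
```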
  • Patent number: 8374872
    Abstract: A device provides a question to a user, and receives, from the user, an unrecognized voice response to the question. The device also provides the unrecognized voice response to an utterance agent for determination of the unrecognized voice response without user involvement, and provides an additional question to the user prior to receiving the determination of the unrecognized voice response from the utterance agent.
    Type: Grant
    Filed: November 4, 2008
    Date of Patent: February 12, 2013
    Assignee: Verizon Patent and Licensing Inc.
    Inventor: Manohar R. Kesireddy
  • Patent number: 8374876
    Abstract: A system and a method for speech generation which assist the speech of those with a disability or a medical condition such as cerebral palsy, motor neurone disease or dysarthria following a stroke. The system has a user interface having a multiplicity of states, each of which corresponds to a sound, and a selector for making a selection of a state or a combination of states. The system also has a processor for processing the selected state or combination of states and an audio output for outputting the sound or combination of sounds. The sounds associated with the states can be phonemes or phonics, and the user interface is typically a manually operable device such as a mouse, trackball, joystick or other device that allows a user to distinguish between states by manipulating the interface to a number of positions.
    Type: Grant
    Filed: February 1, 2007
    Date of Patent: February 12, 2013
    Assignee: The University of Dundee
    Inventors: Rolf Black, Annula Waller, Eric Abel, Iain Murray, Graham Pullin
  • Publication number: 20130035940
    Abstract: The invention provides an electrolaryngeal speech reconstruction method and a system thereof. Firstly, model parameters are extracted from the collected speech as a parameter library, then facial images of a speaker are acquired and then transmitted to an image analyzing and processing module to obtain the voice onset and offset times and the vowel classes, then a waveform of a voice source is synthesized by a voice source synthesis module, finally, the waveform of the above voice source is output by an electrolarynx vibration output module, wherein the voice source synthesis module firstly sets the model parameters of a glottal voice source so as to synthesize the waveform of the glottal voice source, and then a waveguide model is used to simulate sound transmission in a vocal tract and select shape parameters of the vocal tract according to the vowel classes.
    Type: Application
    Filed: September 4, 2012
    Publication date: February 7, 2013
    Applicant: XI'AN JIAOTONG UNIVERSITY
    Inventors: MINGXI WAN, LIANG WU, SUPIN WANG, ZHIFENG NIU, CONGYING WAN
  • Patent number: 8370150
    Abstract: The text information presentation device calculates an optimum readout speed on the basis of the content of the text information being input, its arrival time, and the previous arrival time; speech-synthesizes the text information being input at the calculated readout speed; and outputs it as an audio signal, or alternatively controls the speed at which a video signal is output according to the output state of the speech synthesizing unit.
    Type: Grant
    Filed: July 15, 2008
    Date of Patent: February 5, 2013
    Assignee: Panasonic Corporation
    Inventors: Keiichi Toiyama, Mitsuteru Kataoka, Kohsuke Yamamoto
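One plausible reading of the speed calculation above is: choose a readout speed so that speech for one text chunk finishes before the next chunk is expected, based on the interval between arrival times. This sketch is a guess at that idea, not the patented method; the speed bounds are invented.

```python
# Illustrative readout-speed calculation: the speed needed to keep up with
# the input stream, clamped to a natural-sounding range.

def readout_speed(text, arrival_time, prev_arrival_time,
                  min_cps=5.0, max_cps=20.0):
    """Return a speed in characters per second, clamped to [min_cps, max_cps]."""
    interval = max(arrival_time - prev_arrival_time, 1e-6)
    required = len(text) / interval   # speed needed to keep up with input
    return min(max(required, min_cps), max_cps)

# A 40-character subtitle arriving 4 s after the previous one
# needs at least 10 chars/s.
print(readout_speed("x" * 40, 14.0, 10.0))  # 10.0
```

Clamping keeps the synthesized speech intelligible even when text arrives in bursts.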
  • Patent number: 8370148
    Abstract: Disclosed herein are systems, methods, and computer-readable media for answering a communication notification. The method for answering a communication notification comprises receiving a notification of communication from a user, converting information related to the notification to speech, outputting the information as speech to the user, and receiving from the user an instruction to accept or ignore the incoming communication associated with the notification. In one embodiment, information related to the notification comprises one or more of a telephone number, an area code, a geographic origin of the request, caller id, a voice message, address book information, a text message, an email, a subject line, an importance level, a photograph, a video clip, metadata, an IP address, or a domain name. In another embodiment, the notification is assigned an importance level, and notification attempts are repeated if the importance level is high.
    Type: Grant
    Filed: April 14, 2008
    Date of Patent: February 5, 2013
    Assignee: AT&T Intellectual Property I, L.P.
    Inventor: Horst Schroeter
  • Patent number: 8370151
    Abstract: Disclosed are techniques and systems to provide a narration of a text in multiple different voices where the portions of the text narrated using the different voices are selected by a user.
    Type: Grant
    Filed: January 14, 2010
    Date of Patent: February 5, 2013
    Assignee: K-NFB Reading Technology, Inc.
    Inventors: Raymond C. Kurzweil, Paul Albrecht, Peter Chapman
  • Patent number: 8370149
    Abstract: Waveform concatenation speech synthesis with high sound quality. Prosody with both high accuracy and high sound quality is achieved by performing a two-path search including a speech segment search and a prosody modification value search. An accurate accent is secured by evaluating the consistency of the prosody by using a statistical model of prosody variations (the slope of fundamental frequency) for both of two paths of the speech segment selection and the modification value search. In the prosody modification value search, a prosody modification value sequence that minimizes a modified prosody cost is searched for. This allows a search for a modification value sequence that can increase the likelihood of absolute values or variations of the prosody to the statistical model as high as possible with minimum modification values.
    Type: Grant
    Filed: August 15, 2008
    Date of Patent: February 5, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Ryuki Tachibana, Masafumi Nishimura
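The modification-value search above can be illustrated with a toy version: for each segment, pick a small pitch correction that trades off deviation from a statistical model's predicted F0 slope against the size of the correction itself. This is a hedged sketch of the cost idea only; the greedy left-to-right search, candidate set, and weights are all invented and much simpler than the patented two-path search.

```python
# Toy modification-value search: minimize (model deviation) + (modification
# magnitude) per segment, left to right.

def search_modifications(f0, target_slopes, candidates=(-10, -5, 0, 5, 10),
                         w_model=1.0, w_mod=0.1):
    """Return per-segment F0 modification values for f0[1:]."""
    mods = []
    prev = f0[0]
    for i in range(1, len(f0)):
        best, best_cost = 0, float("inf")
        for m in candidates:
            slope = (f0[i] + m) - prev   # slope of the modified contour
            cost = w_model * abs(slope - target_slopes[i - 1]) + w_mod * abs(m)
            if cost < best_cost:
                best, best_cost = m, cost
        mods.append(best)
        prev = f0[i] + best
    return mods
```

With a flat contour and a model that predicts a rising then flat slope, the search raises the pitch once and then holds it, illustrating how small modifications can raise the likelihood under the model.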
  • Patent number: 8364488
    Abstract: Disclosed are techniques and systems to provide a narration of a text in multiple different voices. Further disclosed are techniques and systems for modifying a voice model associated with a selected character based on data received from a user.
    Type: Grant
    Filed: January 14, 2010
    Date of Patent: January 29, 2013
    Assignee: K-NFB Reading Technology, Inc.
    Inventors: Raymond C. Kurzweil, Paul Albrecht, Peter Chapman
  • Patent number: 8364487
    Abstract: A language processing system may determine a display form of a spoken word by analyzing the spoken form using a language model that includes dictionary entries for display forms of homonyms. The homonyms may include trade names as well as given names and other phrases. The language processing system may receive spoken language and produce a display form of the language while displaying the proper form of the homonym. Such a system may be used in search systems where audio input is converted to a graphical display of a portion of the spoken input.
    Type: Grant
    Filed: October 21, 2008
    Date of Patent: January 29, 2013
    Assignee: Microsoft Corporation
    Inventors: Yun-Cheng Ju, Julian J. Odell
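The homonym disambiguation described above can be sketched as a dictionary of candidate display forms scored by context words, standing in for the language-model score. The entries, cue words, and scoring rule here are invented for illustration and are far simpler than a real language model.

```python
# Toy display-form selection: each candidate display form carries cue words;
# the form whose cues best match the context wins.

HOMONYMS = {
    "nike": {"Nike": ["shoes", "brand"], "Nyke": []},
    "john": {"John": ["name"], "john": []},
}

def display_form(spoken, context):
    candidates = HOMONYMS.get(spoken.lower())
    if not candidates:
        return spoken  # unknown word: pass the spoken form through

    # Score each display form by how many of its cue words appear in context.
    def score(form):
        return sum(1 for cue in candidates[form] if cue in context)

    return max(candidates, key=score)

print(display_form("nike", ["running", "shoes"]))  # Nike
```

In a search system, this step runs after speech recognition so that the graphical display shows the proper form (e.g. a trade name) of the spoken input.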
  • Patent number: 8364466
    Abstract: The teachings described herein generally relate to a multilingual electronic translation of a source phrase to a destination language selected from multiple languages, and this can be accomplished through the use of a network environment. The electronic translation can occur as a spoken translation, can be in real-time, and can mimic the voice of the user of the system.
    Type: Grant
    Filed: June 16, 2012
    Date of Patent: January 29, 2013
    Assignee: NewTalk, Inc.
    Inventors: Bruce W. Nash, Craig A. Robinson, Martha P. Robinson, Robert H. Clemons
  • Patent number: 8364472
    Abstract: Provided is an audio encoding device which can detect an optimal pitch pulse when using pitch pulse information as redundant information.
    Type: Grant
    Filed: February 29, 2008
    Date of Patent: January 29, 2013
    Assignee: Panasonic Corporation
    Inventor: Hiroyuki Ehara
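The abstract above does not say how the optimal pitch pulse is detected, so the following is only a sketch of one common baseline (not necessarily the patented method): take the position of maximum absolute amplitude in the LPC residual within the last pitch period of a frame.

```python
# Baseline pitch-pulse detection: strongest residual sample in the
# final pitch period of the frame.

def find_pitch_pulse(residual, pitch_period):
    """Return (position, amplitude) of the strongest pulse in the last period."""
    start = max(len(residual) - pitch_period, 0)
    pos = max(range(start, len(residual)), key=lambda i: abs(residual[i]))
    return pos, residual[pos]

res = [0.1, -0.2, 0.05, 0.9, -0.3, 0.2]
print(find_pitch_pulse(res, 4))  # (3, 0.9)
```

The detected position and amplitude are the kind of compact pitch pulse information that can be sent as redundant data for frame-erasure concealment.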
  • Patent number: 8359202
    Abstract: Disclosed are techniques and systems to provide a narration of a text in multiple different voices where the portions of the text narrated using the different voices are selected by a user. Also disclosed are techniques and systems for associating characters with portions of a sequence of words selected by a user. Different characters having different voice models can be associated with different portions of a sequence of words.
    Type: Grant
    Filed: January 14, 2010
    Date of Patent: January 22, 2013
    Assignee: K-NFB Reading Technology, Inc.
    Inventors: Raymond C. Kurzweil, Paul Albrecht, Peter Chapman
  • Publication number: 20130013312
    Abstract: A system and method for improving the response time of text-to-speech synthesis using triphone contexts. The method includes receiving input text and selecting a plurality of N phoneme units from a triphone unit selection database as candidate phonemes for synthesized speech based on the input text, wherein the triphone unit selection database comprises triphone units each comprising three phones. If candidate phonemes are available in the triphone unit selection database, the method includes applying a cost process to select a set of phonemes from the candidate phonemes. If no candidate phonemes are available in the triphone unit selection database, the method includes applying a single phoneme approach to select single phonemes for synthesis, the single phonemes used in synthesis independent of a triphone structure.
    Type: Application
    Filed: July 16, 2012
    Publication date: January 10, 2013
    Applicant: AT&T Intellectual Property II, L.P.
    Inventor: Alistair D. Conkie
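The fallback logic above can be sketched directly: look up unit candidates in a triphone (three-phone context) database first, and fall back to context-independent single-phone units only when no triphone candidates exist. The databases here are toy dictionaries invented for the example; a real system would then run its cost process over the returned candidates.

```python
# Triphone-first unit lookup with a single-phoneme fallback.

TRIPHONE_DB = {("h", "e", "l"): ["hel_unit_1", "hel_unit_2"]}
MONOPHONE_DB = {"e": ["e_unit_1"], "h": ["h_unit_1"], "l": ["l_unit_1"]}

def candidate_units(left, phone, right):
    triphone_candidates = TRIPHONE_DB.get((left, phone, right))
    if triphone_candidates:
        return triphone_candidates       # preferred: context-dependent units
    return MONOPHONE_DB.get(phone, [])  # fallback: single-phoneme approach

print(candidate_units("h", "e", "l"))  # ['hel_unit_1', 'hel_unit_2']
print(candidate_units("x", "e", "y"))  # ['e_unit_1']
```

Restricting the search to units that already match the triphone context is what shrinks the candidate set and improves response time.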
  • Patent number: 8352269
    Abstract: Disclosed are techniques and systems to provide a narration of a text in multiple different voices. Further disclosed are techniques and systems for processing indicia in a document to determine a portion of words and associating a particular voice model with the portion of words based on the indicia. During a readback process, an audible output corresponding to the words in the portion of words is generated using the voice model associated with the portion of words.
    Type: Grant
    Filed: January 14, 2010
    Date of Patent: January 8, 2013
    Assignee: K-NFB Reading Technology, Inc.
    Inventors: Raymond C. Kurzweil, Paul Albrecht, Peter Chapman
  • Patent number: 8352271
    Abstract: To facilitate text-to-speech conversion of a username, a first or last name of a user associated with the username may be retrieved, and a pronunciation of the username may be determined based at least in part on whether the name forms at least part of the username. To facilitate text-to-speech conversion of a domain name having a top level domain and at least one other level domain, a pronunciation for the top level domain may be determined based at least in part upon whether the top level domain is one of a predetermined set of top level domains. Each other level domain may be searched for one or more recognized words therewithin, and a pronunciation of the other level domain may be determined based at least in part on an outcome of the search. The username and domain name may form part of a network address such as an email address, URL or URI.
    Type: Grant
    Filed: February 23, 2012
    Date of Patent: January 8, 2013
    Assignee: Research In Motion Limited
    Inventors: Matthew Bells, Jennifer Elizabeth Lhotak, Michael Angelo Nanni
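The two decisions described above can be sketched as follows: read a username as a word when a known first or last name forms part of it (otherwise spell it out), and speak a top-level domain as a word only if it belongs to a known set (otherwise spell it letter by letter). The set of spoken TLDs and the name-matching rule are illustrative assumptions, not the patented heuristics.

```python
# Toy pronunciation rules for usernames and top-level domains.

SPOKEN_TLDS = {"com", "net", "org", "edu", "gov"}

def pronounce_username(username, first_name, last_name):
    for name in (first_name, last_name):
        if name and name.lower() in username.lower():
            return username.lower()   # read as a word containing the name
    return " ".join(username)         # spell it out letter by letter

def pronounce_tld(tld):
    if tld.lower() in SPOKEN_TLDS:
        return "dot " + tld.lower()           # e.g. "dot com"
    return "dot " + " ".join(tld.upper())     # e.g. "dot C A"

print(pronounce_username("jsmith42", "John", "Smith"))  # jsmith42
print(pronounce_tld("ca"))  # dot C A
```

The same kind of rule-based lookup extends to the other-level domains, where a word search decides whether to read a label as words or as letters.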
  • Patent number: 8352267
    Abstract: A plurality of input devices each includes a speaker, an operation data transmitter, a voice data receiver, and a voice controller. An information processing apparatus includes a voice storing area, object displaying programmed logic circuitry, operation data acquiring programmed logic circuitry, pointing position determining programmed logic circuitry, object specifying programmed logic circuitry, voice reading programmed logic circuitry, and voice data transmitting programmed logic circuitry. The pointing position determining programmed logic circuitry specifies, for each of the input devices, a pointing position on a screen based on operation data transmitted from the operation data transmitter. The voice reading programmed logic circuitry reads voice data corresponding to the pointing position for each of the input devices. The voice data transmitting programmed logic circuitry transmits the voice data to each of the input devices.
    Type: Grant
    Filed: June 27, 2007
    Date of Patent: January 8, 2013
    Assignee: Nintendo Co., Ltd.
    Inventor: Toshiaki Suzuki
  • Patent number: 8352268
    Abstract: Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back.
    Type: Grant
    Filed: September 29, 2008
    Date of Patent: January 8, 2013
    Assignee: Apple Inc.
    Inventors: DeVang Naik, Kim Silverman, Jerome Bellegarda
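The normalization step mentioned above, applied to media metadata before synthesis, can be sketched as abbreviation expansion. The expansion table and rules here are invented for illustration; real normalization also handles numbers, punctuation, and language detection.

```python
# Toy text normalization for media-asset titles before TTS.
import re

EXPANSIONS = {"feat.": "featuring", "vol.": "volume", "no.": "number"}

def normalize_title(text):
    # Expand each known abbreviation, case-insensitively.
    for abbrev, full in EXPANSIONS.items():
        text = re.sub(re.escape(abbrev), full, text, flags=re.IGNORECASE)
    return text

print(normalize_title("Song feat. Artist, Vol. 2"))
# Song featuring Artist, volume 2
```

Normalizing before phoneme lookup keeps the synthesizer from spelling out abbreviations it would otherwise not recognize.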