Synthesis Patents (Class 704/258)
  • Patent number: 8620660
    Abstract: Improved oscillator-based source modeling methods are disclosed for estimating model parameters, for evaluating model quality by restoring the input from the model parameters, and for improving performance over methods known in the art. An application of these innovations to speech coding is described. The improved oscillator model is derived from the information contained in the current input signal as well as from some form of data history, often the restored versions of earlier processed data. Operations can be performed in real time, and compression can be achieved at a user-specified level of performance and, in some cases, without information loss. The new model can be combined with methods in the existing art to complement their properties and improve overall performance. The present invention is effective for very low bit-rate coding/compression and decoding/decompression of digital signals, including digitized speech and audio signals.
    Type: Grant
    Filed: October 29, 2010
    Date of Patent: December 31, 2013
    Assignee: The United States of America, as Represented by the Secretary of the Navy
    Inventors: Anton Yen, Irina Gorodnitsky
  • Patent number: 8612228
    Abstract: A section corresponding to a given duration is sampled from sound data that indicates the voice of a player collected by a microphone, and a vocal tract cross-sectional area function of the sampled section is calculated. The vertical dimension of the mouth is calculated from a throat-side average cross-sectional area of the vocal tract cross-sectional area function, and the area of the mouth is calculated from a mouth-side average cross-sectional area. The transverse dimension of the mouth is calculated from the area of the mouth and the vertical dimension of the mouth.
    Type: Grant
    Filed: March 26, 2010
    Date of Patent: December 17, 2013
    Assignee: Namco Bandai Games Inc.
    Inventor: Hiroyuki Hiraishi
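The geometry recovery described in this abstract reduces to simple arithmetic on the vocal tract area function. A minimal sketch, where the section counts and the scaling constant are illustrative assumptions rather than values from the patent:

```python
def mouth_dimensions(area_fn, n_throat=4, n_mouth=4, k_vertical=0.5):
    """Estimate the mouth opening from a vocal tract cross-sectional
    area function (areas ordered from the glottis/throat to the lips)."""
    throat_avg = sum(area_fn[:n_throat]) / n_throat
    mouth_avg = sum(area_fn[-n_mouth:]) / n_mouth
    # Hypothetical mapping: the vertical opening scales with the
    # throat-side average area (k_vertical is an assumed constant).
    vertical = k_vertical * throat_avg
    mouth_area = mouth_avg
    # The transverse dimension follows from area = vertical * transverse.
    transverse = mouth_area / vertical if vertical > 0 else 0.0
    return vertical, transverse
```

Here the mouth opening is treated as a rectangle (area = vertical × transverse), which is one plausible reading of the abstract; the patent's actual mapping may differ.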
  • Publication number: 20130325476
    Abstract: An apparatus and method for generating a wave field synthesis (WFS) signal in consideration of a height of a speaker are disclosed. The WFS signal generation apparatus may include a waveform propagation distance determination unit to determine a propagation distance of a waveform propagated from a sound source based on a height of a speaker, and a WFS signal generation unit to generate a WFS signal corresponding to the speaker using the propagation distance of the waveform.
    Type: Application
    Filed: March 14, 2013
    Publication date: December 5, 2013
    Applicant: Electronics and Telecommunications Research Institute
    Inventor: Electronics and Telecommunications Research Institute
  • Publication number: 20130325477
    Abstract: A speech synthesis system includes: a training database storing training data, which is a set of features extracted from speech waveform data; a feature space division unit which divides the feature space associated with the training data into partial spaces; a sparse/dense state detection unit which detects the sparse or dense state of each divided partial space, generates sparse/dense information indicating that state, and outputs it; and a pronunciation information correcting unit which corrects the pronunciation information used for speech synthesis based on the output sparse/dense information.
    Type: Application
    Filed: February 17, 2012
    Publication date: December 5, 2013
    Applicant: NEC Corporation
    Inventors: Yasuyuki Mitsui, Reishi Kondo, Masanori Kato
  • Patent number: 8594993
    Abstract: Frame mapping-based cross-lingual voice transformation may transform a target speech corpus in a particular language into a transformed target speech corpus that remains recognizable, and has the voice characteristics of a target speaker that provided the target speech corpus. A formant-based frequency warping is performed on the fundamental frequencies and the linear predictive coding (LPC) spectrums of source speech waveforms in a first language to produce transformed fundamental frequencies and transformed LPC spectrums. The transformed fundamental frequencies and the transformed LPC spectrums are then used to generate warped parameter trajectories. The warped parameter trajectories are further used to transform the target speech waveforms in the second language to produce transformed target speech waveforms that carry voice characteristics of the first language while retaining at least some voice characteristics of the target speaker.
    Type: Grant
    Filed: April 4, 2011
    Date of Patent: November 26, 2013
    Assignee: Microsoft Corporation
    Inventors: Yao Qian, Frank Kao-Ping Soong
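The formant-based frequency warping at the heart of this method can be illustrated with a piecewise-linear warp anchored at corresponding source/target formant frequencies. This is a common construction in voice transformation; the function name, anchor choice, and frequency range below are illustrative, not the patent's exact procedure:

```python
def piecewise_warp(freq, src_formants, tgt_formants, f_max=8000.0):
    """Map a frequency through a piecewise-linear warping function whose
    breakpoints are corresponding source and target formant frequencies."""
    xs = [0.0] + list(src_formants) + [f_max]
    ys = [0.0] + list(tgt_formants) + [f_max]
    # Find the segment containing freq and interpolate linearly within it.
    for (x0, y0), (x1, y1) in zip(zip(xs, ys), zip(xs[1:], ys[1:])):
        if x0 <= freq <= x1:
            return y0 + (freq - x0) * (y1 - y0) / (x1 - x0)
    return freq  # outside the covered range: leave unchanged
```

Applying such a warp to every bin of an LPC spectrum shifts the formant structure of the source toward that of the target while preserving spectral continuity.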
  • Patent number: 8583438
    Abstract: Described is a technology by which synthesized speech generated from text is evaluated against a prosody model (trained offline) to determine whether the speech will sound unnatural. If so, the speech is regenerated with modified data. The evaluation and regeneration may be iterative until deemed natural sounding. For example, text is built into a lattice that is then (e.g., Viterbi) searched to find a best path. The sections (e.g., units) of data on the path are evaluated via a prosody model. If the evaluation deems a section to correspond to unnatural prosody, that section is replaced, e.g., by modifying/pruning the lattice and re-performing the search. Replacement may be iterative until all sections pass the evaluation. Unnatural prosody detection may be biased such that during evaluation, unnatural prosody is falsely detected at a higher rate relative to a rate at which unnatural prosody is missed.
    Type: Grant
    Filed: September 20, 2007
    Date of Patent: November 12, 2013
    Assignee: Microsoft Corporation
    Inventors: Yong Zhao, Frank Kao-ping Soong, Min Chu, Lijuan Wang
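The evaluate-and-regenerate loop described in this abstract can be sketched as follows. A greedy per-slot search stands in for the Viterbi lattice search, and all names, the cost model, and the naturalness check are illustrative assumptions:

```python
def best_path(lattice, cost):
    # Simplified search: pick the min-cost unit per slot. (Without
    # transition costs, per-slot minimisation is exact; a real system
    # would run Viterbi over the full lattice.)
    return [min(cands, key=cost) for cands in lattice]

def synthesize(lattice, cost, sounds_natural, max_iters=10):
    """Iteratively re-search after pruning units whose prosody is judged
    unnatural, mirroring the evaluate/regenerate loop of the abstract."""
    lattice = [list(c) for c in lattice]
    for _ in range(max_iters):
        path = best_path(lattice, cost)
        bad = [i for i, u in enumerate(path) if not sounds_natural(u)]
        if not bad:
            return path
        for i in bad:  # prune the offending units and search again
            lattice[i] = [u for u in lattice[i] if u != path[i]] or [path[i]]
    return path
```

The fallback `or [path[i]]` keeps a slot non-empty when every candidate has been rejected, at which point a real system would instead modify the signal.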
  • Patent number: 8583439
    Abstract: Improved methods of presenting speech prompts to a user as part of an automated system that employs speech recognition or other voice input are described. The invention improves the user interface by providing, in combination with at least one user prompt seeking a voice response, an enhanced user keyword prompt intended to help the user select a keyword to speak in response to the user prompt. The enhanced keyword prompts may be the same words as those a user can speak as a reply to the user prompt but presented using a different audio presentation method, e.g., speech rate, audio level, or speaker voice, than used for the user prompt. In some cases, the user keyword prompts are different words from the expected user response keywords, or portions of words, e.g., truncated versions of keywords.
    Type: Grant
    Filed: January 12, 2004
    Date of Patent: November 12, 2013
    Assignee: Verizon Services Corp.
    Inventor: James Mark Kondziela
  • Patent number: 8583437
    Abstract: Service architecture for providing textual information and the corresponding speech synthesis to a user terminal of a communications network, the user terminal being provided with a speech synthesis engine and a basic database of speech waveforms, includes: a content server for downloading textual information requested by means of a browser application on the user terminal; a context manager for extracting context information from the textual information requested by the user terminal; a context selector for selecting an incremental database of speech waveforms associated with the extracted context information and for downloading the incremental database into the user terminal; and a database manager on the user terminal for managing the composition of an enlarged database of speech waveforms for the speech synthesis engine, including the basic and the incremental databases of speech waveforms.
    Type: Grant
    Filed: May 31, 2005
    Date of Patent: November 12, 2013
    Assignee: Telecom Italia S.p.A.
    Inventors: Alessio Cervone, Ivano Salvatore Collotta, Paolo Coppo, Donato Ettorre, Maurizio Fodrini, Maura Turolla
  • Patent number: 8583443
    Abstract: Disclosed is a recording and reproducing apparatus comprising: an apparatus main body; and a remote controller to perform remote control of the apparatus main body, wherein the remote controller comprises: a key operating section to receive a key operation by a user; a sound information inputting section to input sound information; and a transmitting section to transmit sound data based on the sound information to the apparatus main body, and the apparatus main body comprises: a recording section to record input content data on a recording medium; a reproducing section to reproduce the content data; a receiving section to receive the sound data; a sound information recording section to record the sound data so as to be associated with a piece of the content data; and a sound information outputting section to reproduce the sound data to output the reproduced sound data.
    Type: Grant
    Filed: April 10, 2008
    Date of Patent: November 12, 2013
    Assignee: Funai Electric Co., Ltd.
    Inventor: Masayuki Misawa
  • Patent number: 8577682
    Abstract: An auditory user interactive interface to an application program being installed on a computer controlled system. A routine in an object within the application program provides an auditory user interface to the program, in combination with auditory means that offers the user of the computer controlled system the auditory user interface during installation of said application program; responsive to the selection of the auditory interface, the routine provides the auditory user interface during said installation of the application program.
    Type: Grant
    Filed: October 27, 2005
    Date of Patent: November 5, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Peter T. Brunet, Anh Quy Lu, Mark Edward Nosewicz, Lawrence Frank Weiss
  • Patent number: 8571870
    Abstract: Techniques for generating synthetic speech with contrastive stress. In one aspect, a speech-enabled application generates a text input including a text transcription of a desired speech output, and inputs the text input to a speech synthesis system. The synthesis system generates an audio speech output corresponding to at least a portion of the text input, with at least one portion carrying contrastive stress, and provides the audio speech output for the speech-enabled application. In another aspect, a speech-enabled application inputs a plurality of text strings, each corresponding to a portion of a desired speech output, to a software module for rendering contrastive stress. The software module identifies a plurality of audio recordings that render at least one portion of at least one of the text strings as speech carrying contrastive stress. The speech-enabled application generates an audio speech output corresponding to the desired speech output using the audio recordings.
    Type: Grant
    Filed: August 9, 2010
    Date of Patent: October 29, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Darren C. Meyer, Stephen R. Springer
  • Patent number: 8571849
    Abstract: Disclosed herein are systems, methods, and computer readable-media for enriching spoken language translation with prosodic information in a statistical speech translation framework. The method includes receiving speech for translation to a target language, generating pitch accent labels representing segments of the received speech which are prosodically prominent, and injecting pitch accent labels with word tokens within the translation engine to create enriched target language output text. A further step may be added of synthesizing speech in the target language based on the prosody enriched target language output text. An automatic prosody labeler can generate pitch accent labels. An automatic prosody labeler can exploit lexical, syntactic, and prosodic information of the speech. A maximum entropy model may be used to determine which segments of the speech are prosodically prominent.
    Type: Grant
    Filed: September 30, 2008
    Date of Patent: October 29, 2013
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Srinivas Bangalore, Vivek Kumar Rangarajan Sridhar
  • Patent number: 8571039
    Abstract: A method and apparatus for transmitting an audio signal over a communication channel comprising encoding the audio signal with an encoder 204 using a first sampling rate, filtering the audio signal using a first cut off frequency, the first cut off frequency being chosen in dependence upon the first sampling rate, and transmitting the encoded and filtered audio signal over the communication channel. The presence of a condition in which the sampling rate of the encoder 204 is to be switched to a second sampling rate at a switching time is determined and if the condition has been determined to be present, the cut off frequency used in the filtering step is gradually changed from the first cut off frequency to a second cut off frequency, the second cut off frequency being chosen in dependence upon the second sampling rate, such that the audio bandwidth of the transmitted signal changes gradually when the sampling rate is switched to the second sampling rate.
    Type: Grant
    Filed: June 23, 2010
    Date of Patent: October 29, 2013
    Assignee: Skype
    Inventors: Stefan Strommer, Karsten Vandborg Sorensen, Soren Skak Jensen, Koen Vos, Jon Bergenheim
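The gradual cutoff change can be sketched as a ramp applied over a number of frames around the switching time. The frame count and the linear shape are assumptions for illustration; the patent only requires that the bandwidth change be gradual:

```python
def cutoff_schedule(fc_old, fc_new, n_frames):
    """Linearly ramp the low-pass cutoff frequency from the old to the
    new value over n_frames, so the audio bandwidth of the transmitted
    signal changes gradually when the sampling rate is switched."""
    step = (fc_new - fc_old) / n_frames
    return [fc_old + step * (i + 1) for i in range(n_frames)]
```

Each per-frame value would be used to retune the filter before encoding that frame; the cutoffs themselves are chosen in dependence on the respective sampling rates, as the abstract describes.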
  • Patent number: 8566106
    Abstract: A method and device for searching an algebraic codebook during encoding of a sound signal, wherein the algebraic codebook comprises a set of codevectors formed of a number of pulse positions and a number of pulses distributed over the pulse positions. In the algebraic codebook searching method and device, a reference signal for use in searching the algebraic codebook is calculated. In a first stage, a position of a first pulse is determined in relation with the reference signal and among the number of pulse positions. In each of a number of stages subsequent to the first stage, (a) an algebraic codebook gain is recomputed, (b) the reference signal is updated using the recomputed algebraic codebook gain and (c) a position of another pulse is determined in relation with the updated reference signal and among the number of pulse positions.
    Type: Grant
    Filed: September 11, 2008
    Date of Patent: October 22, 2013
    Assignee: Voiceage Corporation
    Inventors: Redwan Salami, Vaclav Eksler, Milan Jelinek
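The staged search can be sketched as follows, with maximum-magnitude pulse placement and a least-squares gain standing in for the patent's exact criteria (illustrative only; a real ACELP-style search works with filtered codevectors and correlation terms):

```python
def search_pulses(ref, target, n_pulses):
    """Place pulses one per stage: each stage picks the position best
    matching the current reference signal, recomputes the codebook gain
    against the target, and updates the reference before the next pulse."""
    code = [0.0] * len(ref)
    positions = []
    reference = list(ref)
    gain = 0.0
    for _ in range(n_pulses):
        # Choose the position where the reference is largest in magnitude.
        pos = max(range(len(reference)), key=lambda i: abs(reference[i]))
        code[pos] += 1.0 if reference[pos] >= 0 else -1.0
        positions.append(pos)
        # Recompute the gain as a least-squares fit of the code to the target.
        num = sum(c * t for c, t in zip(code, target))
        den = sum(c * c for c in code) or 1.0
        gain = num / den
        # Update the reference: what the scaled code still fails to explain.
        reference = [t - gain * c for t, c in zip(target, code)]
    return positions, gain
```

The key property mirrored here is that the gain and the reference signal are refreshed at every stage, so later pulses compensate for earlier ones.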
  • Patent number: 8566099
    Abstract: A system and method for improving the response time of text-to-speech synthesis using triphone contexts. The method includes identifying a set of triphone sequences and tabulating the set of triphone sequences using a plurality of contexts, where each context-specific triphone sequence has top-N triphone units, namely the triphone units with the lowest target costs when each triphone unit is individually combined into a 5-phoneme combination. Input texts having one of the contexts are received, and one of the context-specific triphone sequences is selected based on the context. Input text is then synthesized using the context-specific triphone sequence.
    Type: Grant
    Filed: July 16, 2012
    Date of Patent: October 22, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventor: Alistair D. Conkie
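The tabulation step can be sketched as a precomputed lookup table keyed by context and triphone, so synthesis replaces a full unit search with a table lookup. The names and the cost-function signature are assumptions:

```python
def build_triphone_table(candidates, target_cost, contexts, n=3):
    """Precompute, for each context, the top-N lowest-target-cost units
    for every triphone, so synthesis can look them up instead of
    searching the whole unit inventory at run time."""
    table = {}
    for ctx in contexts:
        table[ctx] = {}
        for triphone, units in candidates.items():
            ranked = sorted(units, key=lambda u: target_cost(u, ctx))
            table[ctx][triphone] = ranked[:n]
    return table
```

At run time, the context of the input text selects the sub-table, and each triphone in the text indexes directly into its pre-ranked shortlist.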
  • Patent number: 8566098
    Abstract: A system and method are disclosed for synthesizing speech based on a selected speech act. A method includes modifying synthesized speech of a spoken dialogue system by (1) receiving a user utterance, (2) analyzing the user utterance to determine an appropriate speech act, and (3) generating a response of a type associated with the appropriate speech act, wherein linguistic variables in the response are selected based on the appropriate speech act.
    Type: Grant
    Filed: October 30, 2007
    Date of Patent: October 22, 2013
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Ann K Syrdal, Mark Beutnagel, Alistair D Conkie, Yeon-Jun Kim
  • Patent number: 8566078
    Abstract: A method of generating a statistical machine translation database through a game in which a monolingual structure is provided to a plurality of players. A first translation attempt is received from each of the plurality of players, and the first translation attempts are compared. Feedback is provided to each of the plurality of players, and subsequent attempts are received and compared to provide further feedback, iteratively converging the players' translations into a final translated structure.
    Type: Grant
    Filed: January 29, 2010
    Date of Patent: October 22, 2013
    Assignee: International Business Machines Corporation
    Inventors: Ruhi Sarikaya, Jiri Navratil, Osamuyimen Stewart, David Lubensky
  • Patent number: 8560301
    Abstract: A language expression apparatus and method based on context and intent awareness are provided. The apparatus and method may recognize a context and an intent of a user and may generate a language expression based on the recognized context and the recognized intent, thereby providing an interpretation/translation service and/or an education service for learning a language.
    Type: Grant
    Filed: March 2, 2010
    Date of Patent: October 15, 2013
    Assignee: Samsung Electronics Co., Ltd.
    Inventor: Yeo Jin Kim
  • Patent number: 8560315
    Abstract: A conference support device includes an image receiving portion that receives captured images from conference terminals, a voice receiving portion that receives, from one of the conference terminals, a voice that is generated by a first participant, a first storage portion that stores the captured images and the voice, a voice recognition portion that recognizes the voice, a text data creation portion that creates text data that express the words that are included in the voice, an addressee specification portion that specifies a second participant, whom the voice is addressing, an image creation portion that creates a display image that is configured from the captured images and in which the text data are associated with the first participant and a specified image is associated with at least one of the first participant and the second participant, and a transmission portion that transmits the display image to the conference terminals.
    Type: Grant
    Filed: March 12, 2010
    Date of Patent: October 15, 2013
    Assignee: Brother Kogyo Kabushiki Kaisha
    Inventor: Mizuho Yasoshima
  • Patent number: 8560317
    Abstract: A vocabulary dictionary storing unit for storing a plurality of words in advance, a vocabulary dictionary managing unit for extracting recognition target words, a matching unit for calculating a degree of matching with the recognition target words based on an accepted voice, a result output unit for outputting, as a recognition result, a word having a best score from a result of calculating the degree of matching, and an extraction criterion information managing unit for changing extraction criterion information according to a result of monitoring by a monitor control unit are provided. The vocabulary dictionary storing unit further includes a scale information storing unit for storing scale information serving as a scale at the time of extracting the recognition target words, and an extraction criterion information storing unit for storing extraction criterion information indicating a criterion of the recognition target words at the time of extracting the recognition target words.
    Type: Grant
    Filed: September 18, 2006
    Date of Patent: October 15, 2013
    Assignee: Fujitsu Limited
    Inventor: Kenji Abe
  • Patent number: 8554541
    Abstract: A virtual pet system includes: a virtual pet client, adapted to receive a sentence in natural language and send the sentence to a Q&A server; and the Q&A server, adapted to receive the sentence, process the sentence through natural language comprehension, generate an answer in natural language based on a result of natural language comprehension and reasoning knowledge, and send the answer in natural language to the virtual pet client. A method for virtual pet chatting includes: receiving a sentence in natural language, performing natural language comprehension on the sentence, and generating an answer in natural language based on a result of natural language comprehension and reasoning knowledge.
    Type: Grant
    Filed: September 18, 2008
    Date of Patent: October 8, 2013
    Assignee: Tencent Technology (Shenzhen) Company Ltd.
    Inventors: Haisong Yang, Zhiyuan Liu, Yunfeng Liu, Rongling Yu
  • Patent number: 8554566
    Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
    Type: Grant
    Filed: November 29, 2012
    Date of Patent: October 8, 2013
    Assignee: Morphism LLC
    Inventor: James H. Stephens, Jr.
  • Patent number: 8554565
    Abstract: According to one embodiment, a speech synthesizer generates a speech segment sequence and synthesizes speech by connecting speech segments of the generated speech segment sequence. If a speech segment of a synthesized first speech segment sequence is different from the speech segment of a synthesized second speech segment sequence having the same synthesis unit as the first speech segment sequence, the speech synthesizer disables the speech segment of the first speech segment sequence that is different from the speech segment of the second speech segment sequence.
    Type: Grant
    Filed: September 14, 2010
    Date of Patent: October 8, 2013
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Osamu Nishiyama, Takehiko Kagoshima
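The disabling rule described above can be sketched in a few lines (hypothetical function name; the real system compares speech-segment objects per synthesis unit, not strings):

```python
def disable_mismatches(first_seq, second_seq):
    """Keep only the segments of the first sequence that agree with the
    second sequence for the same synthesis unit; mismatching segments
    are disabled (None) so a replacement can be selected for them."""
    return [a if a == b else None for a, b in zip(first_seq, second_seq)]
```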
  • Patent number: 8548809
    Abstract: A voice guidance system for providing a guidance by voice concerning operations of an information processing apparatus, comprises a detector that detects that a predetermined function of the information processing apparatus is disabled, and a voice guidance unit that outputs a voice message reporting a reason why the predetermined function of the information processing apparatus is disabled, in response to the detection output of the detector.
    Type: Grant
    Filed: June 16, 2005
    Date of Patent: October 1, 2013
    Assignee: Fuji Xerox Co., Ltd.
    Inventors: Kanji Itaki, Michihiro Kawamura, Nozomi Noguchi
  • Publication number: 20130253934
    Abstract: A method of providing user participation in a social broadcast environment is disclosed. A network communication is received from a user of a broadcast that includes a preference data indicating a preference of the user that a promoted content be included in the broadcast. Via a responsive network communication, a feedback data is provided to the user that includes a predicted future time at which the promoted content may be included in the broadcast.
    Type: Application
    Filed: January 31, 2013
    Publication date: September 26, 2013
    Applicant: JELLI, INC.
    Inventors: Jateen P. Parekh, Michael S. Dougherty, Sarah Caplener, Mitchell A. Yawitz, Scott Strain, Adam J. Dobrer
  • Patent number: 8542839
    Abstract: An audio processing apparatus and method for a mobile device are provided. The audio processing apparatus and method may appropriately determine sound source localizations corresponding to a voice signal and an audio signal, and thereby may simultaneously provide a voice call service and a multimedia service. Also, the audio processing apparatus and method may guarantee quality of the voice call service even when simultaneously providing the voice call service and the multimedia service.
    Type: Grant
    Filed: March 18, 2009
    Date of Patent: September 24, 2013
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Chang Yong Son, Do Hyung Kim, Sang Oak Woo, Kang Eun Lee
  • Patent number: 8538743
    Abstract: A software language including language constructs for disambiguating text that is to be converted to speech using configurable lexeme based rules. The language can include at least one conditional statement and a significance indicator. The conditional statement can define a sense of usage for a lexeme. The significance indicator can define a criteria for selecting an associated sense of usage. The language can also include an action expression that is associated with a conditional statement that defines a set of programmatic actions to be executed upon a selection of the associated usage sense. The conditional statement can include a context range specification that defines a scope of an input string for examination when evaluating the conditional statement. Further, the conditional statement can include a directive that represents a defined condition of the lexeme or the text surrounding the lexeme.
    Type: Grant
    Filed: March 21, 2007
    Date of Patent: September 17, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Oswaldo Gago, Steven M. Hancock, Maria E. Smith
  • Publication number: 20130238337
    Abstract: A voice quality conversion system includes: an analysis unit which analyzes sounds of plural vowels of different types to generate first vocal tract shape information for each type of the vowels; a combination unit which combines, for each type of the vowels, the first vocal tract shape information on that type of vowel and the first vocal tract shape information on a different type of vowel to generate second vocal tract shape information on that type of vowel; and a synthesis unit which (i) combines vocal tract shape information on a vowel included in input speech and the second vocal tract shape information on the same type of vowel to convert vocal tract shape information on the input speech, and (ii) generates a synthetic sound using the converted vocal tract shape information and voicing source information on the input speech to convert the voice quality of the input speech.
    Type: Application
    Filed: April 29, 2013
    Publication date: September 12, 2013
    Applicant: Panasonic Corporation
    Inventors: Takahiro KAMAI, Yoshifumi HIROSE
  • Patent number: 8527273
    Abstract: Systems and methods for identifying the N-best strings of a weighted automaton. A potential for each state of an input automaton to a set of destination states of the input automaton is first determined. Then, the N-best paths are found in the result of an on-the-fly determinization of the input automaton. Only the portion of the input automaton needed to identify the N-best paths is determinized. As the input automaton is determinized, a potential for each new state of the partially determinized automaton is determined and is used in identifying the N-best paths of the determinized automaton, which correspond exactly to the N-best strings of the input automaton.
    Type: Grant
    Filed: July 30, 2012
    Date of Patent: September 3, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Mehryar Mohri, Michael Dennis Riley
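The core N-best enumeration can be sketched as a best-first search over a weighted graph. This simplification omits the two ingredients the patent adds, on-the-fly determinization and the precomputed per-state potential that makes the search A*-like, and all names are illustrative:

```python
import heapq

def n_best_paths(graph, start, finals, n):
    """Enumerate the n lowest-weight paths from start to a final state,
    popping partial paths from a priority queue in order of weight."""
    heap = [(0, start, (start,))]
    results = []
    while heap and len(results) < n:
        weight, state, path = heapq.heappop(heap)
        if state in finals:
            results.append((weight, path))
            if len(results) == n:
                break
        # Extend the partial path along every outgoing arc.
        for nxt, w in graph.get(state, []):
            heapq.heappush(heap, (weight + w, nxt, path + (nxt,)))
    return results
```

On a nondeterministic automaton, distinct paths can carry the same string, which is why the patent determinizes (lazily) before extracting paths: the N best paths of the determinized machine correspond exactly to the N best strings of the input.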
  • Patent number: 8527276
    Abstract: A method and system are disclosed for speech synthesis using deep neural networks. A neural network may be trained to map input phonetic transcriptions of training-time text strings into sequences of acoustic feature vectors, which yield predefined speech waveforms when processed by a signal generation module. The training-time text strings may correspond to written transcriptions of speech carried in the predefined speech waveforms. Subsequent to training, a run-time text string may be translated to a run-time phonetic transcription, which may include a run-time sequence of phonetic-context descriptors, each of which contains a phonetic speech unit, data indicating phonetic context, and data indicating time duration of the respective phonetic speech unit. The trained neural network may then map the run-time sequence of the phonetic-context descriptors to run-time predicted feature vectors, which may in turn be translated into synthesized speech by the signal generation module.
    Type: Grant
    Filed: October 25, 2012
    Date of Patent: September 3, 2013
    Assignee: Google Inc.
    Inventors: Andrew William Senior, Byungha Chun, Michael Schuster
  • Patent number: 8527258
    Abstract: A simultaneous interpretation system includes headsets for inputting and outputting voice, and a portable terminal for receiving an original language voice speech signal to be interpreted that is output from the first headset. The portable terminal outputs an interpreted voice speech signal, based on the original language voice speech signal that has been interpreted into a different language, to the second headset. The portable terminal either performs the interpretation or accesses an interpretation server to provide the second headset with the interpreted voice speech signal. Hence, the simultaneous interpretation is carried out using short-range communication between the users by means of a single portable terminal, and thus more efficient and unrestricted conversation is realized.
    Type: Grant
    Filed: February 1, 2010
    Date of Patent: September 3, 2013
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Kyoung-Yup Kim, Jun-Tai Kim
  • Patent number: 8527275
    Abstract: A contextual input device includes a plurality of tactually discernable keys disposed in a predetermined configuration which replicates a particular relationship among a plurality of items associated with a known physical object. The tactually discernable keys are typically labeled with Braille type. The known physical object is typically a collection of related items grouped together by some common relationship. A computer-implemented process determines whether an input signal represents a selection of an item from among a plurality of items or an attribute pertaining to an item among the plurality of items. Once the selected item or attribute pertaining to an item is determined, the computer-implemented process transforms a user's selection from the input signal into an analog audio signal which is then audibly output as human speech with an electro-acoustic transducer.
    Type: Grant
    Filed: July 17, 2009
    Date of Patent: September 3, 2013
    Assignee: Cal Poly Corporation
    Inventors: Fantin Dennis, C. Arthur MacCarley
  • Patent number: 8527283
    Abstract: A method (100) includes receiving (101) an input digital audio signal comprising a narrow-band signal. The input digital audio signal is processed (102) to generate a processed digital audio signal. An estimate of the high-band energy level corresponding to the input digital audio signal is determined (103). Modification of the estimated high-band energy level is done based on an estimation accuracy and/or narrow-band signal characteristics (104). A high-band digital audio signal is generated based on the modified estimate of the high-band energy level and an estimated high-band spectrum corresponding to the modified estimate of the high-band energy level (105).
    Type: Grant
    Filed: January 19, 2011
    Date of Patent: September 3, 2013
    Assignee: Motorola Mobility LLC
    Inventors: Mark A. Jasiuk, Tenkasi V. Ramabadran
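The modification step (104) can be sketched as an attenuation of the estimated high-band energy driven by estimation confidence and a narrow-band voicing cue. The dB back-off values and the linear confidence mapping are illustrative assumptions, not the patent's actual rule:

```python
def adapt_highband_energy(e_est, confidence, voiced, backoff_db=6.0):
    """Attenuate the estimated high-band energy when the estimator is
    unsure or the narrow-band frame looks unvoiced, trading a duller
    sound for fewer audible over-estimation artifacts."""
    att_db = (1.0 - confidence) * backoff_db
    if not voiced:  # unvoiced frames tolerate less energy over-estimation
        att_db += backoff_db
    return e_est * 10 ** (-att_db / 10.0)
```

The modified energy then scales the estimated high-band spectrum before the high-band signal is generated (step 105 of the abstract).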
  • Patent number: 8527281
    Abstract: Methods and systems for sculpting synthesized speech using a graphic user interface are disclosed. An operator enters a stream of text that is used to produce a stream of target phonetic-units. The stream of target phonetic-units is then submitted to a unit-selection process to produce a stream of selected phonetic-units, each selected phonetic-unit derived from a database of sample phonetic-units. After the stream of sample phonetic-units is selected, an operator can remove various selected phonetic-units from the stream of selected phonetic-units, prune the sample phonetic-database and edit various cost functions using the graphic user interface. The edited speech information can then be submitted to the unit-selection process to produce a second stream of selected phonetic-units.
    Type: Grant
    Filed: June 29, 2012
    Date of Patent: September 3, 2013
    Assignee: Nuance Communications, Inc.
    Inventors: Peter Rutten, Paul A. Taylor
  • Patent number: 8521535
    Abstract: A biochemical analyzer having a microprocessing apparatus with expandable voice capacity is characterized in that a driving module is installed in a data processor and the voice carrier is replaceable. Voice files can thus be added or removed simply by replacing the current voice carrier with an alternative voice carrier that stores the desired voice files, without replacing the driving module along with the voice carrier, thereby saving costs and reducing processing steps.
    Type: Grant
    Filed: November 10, 2010
    Date of Patent: August 27, 2013
    Inventor: Chun-Yu Chen
  • Patent number: 8521513
    Abstract: A language-neutral speech grammar extensible markup language (GRXML) document and a localized response document are used to build a localized GRXML document. The language-neutral GRXML document specifies an initial grammar rule element. The initial grammar rule element specifies a given response type identifier and a given action. The localized response document contains a given response entry that specifies the given response type identifier and a given response in a given language. The localized GRXML document specifies a new grammar rule element. The new grammar rule element specifies the given response in the given language and the given action. The localized GRXML document is installed in an interactive voice response (IVR) system. The localized GRXML document configures the IVR system to perform the given action when a user of the IVR system speaks the given response to the IVR system.
    Type: Grant
    Filed: March 12, 2010
    Date of Patent: August 27, 2013
    Assignee: Microsoft Corporation
    Inventors: Thomas W. Millett, David Notario
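The merge the abstract describes, joining language-neutral grammar rules to localized responses by response-type identifier, can be sketched with plain dictionaries. The real artifacts are GRXML documents; the flat data shapes here are assumptions for illustration only.

```python
def build_localized_rules(neutral_rules, localized_responses):
    # neutral_rules: list of {"response_id": ..., "action": ...}
    # localized_responses: response-type id -> phrase in the target language.
    # Pair each localized phrase with its language-neutral action.
    localized = []
    for rule in neutral_rules:
        phrase = localized_responses[rule["response_id"]]
        localized.append({"phrase": phrase, "action": rule["action"]})
    return localized
```

An IVR system configured with the resulting rules would perform the given action when the user speaks the localized response.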
  • Patent number: 8515759
    Abstract: An apparatus for synthesizing a rendered output signal having a first audio channel and a second audio channel includes a decorrelator stage for generating a decorrelated signal based on a downmix signal, and a combiner for performing a weighted combination of the downmix signal and the decorrelated signal based on parametric audio object information, downmix information and target rendering information. The combiner solves the problem of optimally combining matrixing with decorrelation for a high-quality stereo scene reproduction of a number of individual audio objects using a multichannel downmix.
    Type: Grant
    Filed: April 23, 2008
    Date of Patent: August 20, 2013
    Assignee: Dolby International AB
    Inventors: Jonas Engdegard, Heiko Purnhagen, Barbara Resch, Lars Villemoes, Cornelia Falch, Juergen Herre, Johannes Hilpert, Andreas Hoelzer, Leonid Terentiev
  • Patent number: 8515749
    Abstract: Systems and methods for facilitating communication include recognizing speech in a first language represented in a first audio signal; forming a first text representation of the speech; processing the first text representation to form data representing a second audio signal; and causing presentation of the second audio signal to a second user while remaining responsive to an interrupt signal from a first user. In some embodiments, processing the first text representation includes translating the first text representation to a second text representation in a second language and processing the second text representation to form the data representing the second audio signal. Some embodiments include accepting an interrupt signal from the first user and interrupting the presentation of the second audio signal.
    Type: Grant
    Filed: May 20, 2009
    Date of Patent: August 20, 2013
    Assignee: Raytheon BBN Technologies Corp.
    Inventor: David G. Stallard
  • Patent number: 8510113
    Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, identifying replacement segments in a secondary speech database, enhancing the primary speech database by substituting the identified secondary speech database segments for the corresponding identified segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.
    Type: Grant
    Filed: August 31, 2006
    Date of Patent: August 13, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Alistair Conkie, Ann K. Syrdal
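The substitution step, swapping flagged primary-database segments for their secondary-database counterparts, can be sketched as a dictionary merge. The flat label-to-segment representation is an assumption for illustration; a real unit database stores labeled audio with much richer indexing.

```python
def enhance_database(primary, secondary, flagged_labels):
    # primary / secondary: segment label -> audio segment (any representation).
    # flagged_labels: segments whose pronunciation varies with language
    # differences. Substitute the secondary segment where one exists.
    enhanced = dict(primary)
    for label in flagged_labels:
        if label in secondary:
            enhanced[label] = secondary[label]
    return enhanced
```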
  • Patent number: 8510112
    Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, modifying the identified segments in the primary speech database using selected mappings, enhancing the primary speech database by substituting the modified segments for the corresponding identified database segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.
    Type: Grant
    Filed: August 31, 2006
    Date of Patent: August 13, 2013
    Assignee: AT&T Intellectual Property II, L.P.
    Inventors: Alistair Conkie, Ann Syrdal
  • Patent number: 8504368
    Abstract: A synthetic speech text-input device is provided that allows a user to intuitively know an amount of an input text that can be fit in a desired duration. A synthetic speech text-input device 1 includes: an input unit that receives a set duration in which a speech to be synthesized is to be fit, and a text for a synthetic speech; a text amount calculation unit that calculates an acceptable text amount based on the set duration received by the input unit, the acceptable text amount being an amount of a text acceptable as a synthetic speech of the set duration; and a text amount output unit that outputs the acceptable text amount calculated by the text amount calculation unit, when the input unit receives the text.
    Type: Grant
    Filed: September 10, 2010
    Date of Patent: August 6, 2013
    Assignee: Fujitsu Limited
    Inventors: Nobuyuki Katae, Kentaro Murase
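The core calculation, an acceptable text amount derived from a set duration, reduces to the duration times an assumed speaking rate. The characters-per-second figure below is an illustrative assumption, not a value from the patent.

```python
def acceptable_text_amount(set_duration_s, chars_per_second=8.0):
    # Amount of text (in characters) that a synthetic voice can
    # speak within the requested duration at the assumed rate.
    return int(set_duration_s * chars_per_second)
```

The device would display this amount as the user types, so the user intuitively knows how much text fits in the desired duration.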
  • Patent number: 8498860
    Abstract: A modulation device including: a modulation unit for modulating an audible-range carrier by an encoded transmission signal to generate a modulated signal; a masker sound generation unit for generating a masker signal that, output as a masker sound alongside the modulated signal, makes the modulated signal harder to hear; and an acoustic signal generation unit for inserting the masker signal into the modulated signal to generate an acoustic signal.
    Type: Grant
    Filed: October 2, 2006
    Date of Patent: July 30, 2013
    Assignee: NTT DoCoMo, Inc.
    Inventor: Hosei Matsuoka
  • Patent number: 8498867
    Abstract: Disclosed are techniques and systems to provide a narration of a text in multiple different voices. Further disclosed are techniques and systems for generating an audible output in which different portions of a text are narrated using voice models associated with different characters.
    Type: Grant
    Filed: January 14, 2010
    Date of Patent: July 30, 2013
    Assignee: K-NFB Reading Technology, Inc.
    Inventors: Raymond C. Kurzweil, Paul Albrecht, Peter Chapman
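The mapping the abstract describes, narrating different portions of a text with voice models tied to different characters, can be sketched as a lookup from character to voice with a fallback narrator voice. The names and data shapes are hypothetical.

```python
def assign_voices(segments, voice_map, default_voice="narrator"):
    # segments: list of (character, text) portions of the book.
    # voice_map: character name -> voice model identifier.
    # Unattributed portions fall back to the default narrator voice.
    return [(voice_map.get(character, default_voice), text)
            for character, text in segments]
```

A TTS engine would then render each (voice, text) pair with the corresponding voice model to produce the audible narration.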
  • Patent number: 8498866
    Abstract: Disclosed are techniques and systems to provide a narration of a text in multiple different languages where the portions of the text narrated using the different voices associated with different languages are selected by a user.
    Type: Grant
    Filed: January 14, 2010
    Date of Patent: July 30, 2013
    Assignee: K-NFB Reading Technology, Inc.
    Inventors: Raymond C. Kurzweil, Paul Albrecht, Peter Chapman
  • Patent number: 8494849
    Abstract: A method of transmitting speech data to a remote device in a distributed speech recognition system, includes the steps of: dividing an input speech signal into frames; calculating, for each frame, a voice activity value representative of the presence of speech activity in the frame; grouping the frames into multiframes, each multiframe including a predetermined number of frames; calculating, for each multiframe, a voice activity marker representative of the number of frames in the multiframe representing speech activity; and selectively transmitting, on the basis of the voice activity marker associated with each multiframe, the multiframes to the remote device.
    Type: Grant
    Filed: June 20, 2005
    Date of Patent: July 23, 2013
    Assignee: Telecom Italia S.p.A.
    Inventors: Ivano Salvatore Collotta, Donato Ettorre, Maurizio Fodrini, Pierluigi Gallo, Roberto Spagnolo
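The grouping-and-marker scheme above can be sketched directly: each frame carries a voice-activity flag, multiframes are fixed-size groups of frames, and a multiframe is transmitted only if its activity marker (the count of speech-active frames) reaches a threshold. The group size and threshold values below are illustrative assumptions.

```python
def group_multiframes(frame_flags, frames_per_multiframe=4):
    # Group per-frame voice-activity flags into fixed-size multiframes.
    return [frame_flags[i:i + frames_per_multiframe]
            for i in range(0, len(frame_flags), frames_per_multiframe)]

def should_transmit(multiframe, min_active_frames=2):
    # The voice activity marker counts speech-active frames; transmit
    # the multiframe to the remote recognizer only above the threshold.
    marker = sum(1 for active in multiframe if active)
    return marker >= min_active_frames
```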
  • Patent number: 8489400
    Abstract: Disclosed herein are methods for presenting speech from a selected text on a computing device. The method includes presenting text on a touch-sensitive display at a size within a threshold level, so that the computing device can accurately determine the user's intent when the user touches the touch screen. Once the touch has been received, the computing device identifies and interprets the portion of text to be selected, and then presents that text audibly to the user.
    Type: Grant
    Filed: August 6, 2012
    Date of Patent: July 16, 2013
    Assignee: AT&T Intellectual Property I, L.P.
    Inventors: Alistair D. Conkie, Horst Schroeter
  • Patent number: 8484027
    Abstract: A method for narrating a digital book includes retrievably storing first data relating to narration of the digital book by a first end-user. The first data is then provided to a user device having stored thereon the digital book. Subsequently, the digital book is presented in narrated form to a second end-user via the user device. In particular, the digital book is displayed via a display portion of the user device while simultaneously providing in audible form the first data via an audio output portion of the user device.
    Type: Grant
    Filed: June 10, 2010
    Date of Patent: July 9, 2013
    Assignee: Skyreader Media Inc.
    Inventor: William A. Murphy
  • Patent number: 8484026
    Abstract: A portable audio control system that controls an audio signal transmitted from an electronic device, including an earphone device and an audio control device. The audio control device includes an audio source receiver, a signal synthesis module, and an audio output unit. The audio source receiver, which is connected to the electronic device, receives the audio signal. The signal synthesis module receives both the audio signal and a voice signal coming from an external audio source, and synthesizes those signals. The audio output unit outputs the synthesized sound to the earphone device. When users connect the portable audio control system to the electronic device, both the sound from the electronic device and the external voice or song can be heard at the same time.
    Type: Grant
    Filed: August 24, 2009
    Date of Patent: July 9, 2013
    Inventor: Pi-Fen Lin
  • Patent number: 8484028
    Abstract: A system for visually navigating a document in conjunction with a text-to-speech ("TTS") engine presents a visual display of a region of interest that is related to the text of the document being audibly presented as speech to a user. When the TTS engine converts the text to speech and presents the speech to the user, the system presents the corresponding section of text on a display. During the presentation, if the system encounters a linked section of text, the visual display changes to show a linked region of interest that corresponds to the linked section of text.
    Type: Grant
    Filed: October 24, 2008
    Date of Patent: July 9, 2013
    Assignee: Fuji Xerox Co., Ltd.
    Inventors: Scott Carter, Laurent Denoue
  • Patent number: 8484035
    Abstract: A method of altering a social signaling characteristic of a speech signal. A statistically large number of speech samples created by different speakers in different tones of voice are evaluated to determine one or more relationships between a selected social signaling characteristic and one or more measurable parameters of the speech samples. An input audio voice signal is then processed in accordance with these relationships to modify one or more controllable parameters of the input audio voice signal, producing a modified output audio voice signal in which the selected social signaling characteristic is modified. In a specific illustrative embodiment, a two-level hidden Markov model is used to identify voiced and unvoiced speech segments, and selected controllable characteristics of these segments are modified to alter the desired social signaling characteristic.
    Type: Grant
    Filed: September 6, 2007
    Date of Patent: July 9, 2013
    Assignee: Massachusetts Institute of Technology
    Inventor: Alex Paul Pentland