Synthesis Patents (Class 704/258)
-
Patent number: 8620660
Abstract: Improved oscillator-based source modeling methods for estimating model parameters, for evaluating model quality for restoring the input from the model parameters, and for improving performance over methods known in the art are disclosed. An application of these innovations to speech coding is described. The improved oscillator model is derived from the information contained in the current input signal as well as from some form of data history, often the restored versions of the earlier processed data. Operations can be performed in real time, and compression can be achieved at a user-specified level of performance and, in some cases, without information loss. The new model can be combined with methods in the existing art in order to complement the properties of these methods and to improve overall performance. The present invention is effective for very low bit-rate coding/compression and decoding/decompression of digital signals, including digitized speech and audio signals.
Type: Grant
Filed: October 29, 2010
Date of Patent: December 31, 2013
Assignee: The United States of America, as Represented by the Secretary of the Navy
Inventors: Anton Yen, Irina Gorodnitsky
-
Patent number: 8612228
Abstract: A section corresponding to a given duration is sampled from sound data that indicates the voice of a player collected by a microphone, and a vocal tract cross-sectional area function of the sampled section is calculated. The vertical dimension of the mouth is calculated from a throat-side average cross-sectional area of the vocal tract cross-sectional area function, and the area of the mouth is calculated from a mouth-side average cross-sectional area. The transverse dimension of the mouth is calculated from the area of the mouth and the vertical dimension of the mouth.
Type: Grant
Filed: March 26, 2010
Date of Patent: December 17, 2013
Assignee: Namco Bandai Games Inc.
Inventor: Hiroyuki Hiraishi
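The computation described above can be sketched as follows. This is a minimal illustration rather than the patented method: the elliptical mouth model, the half-way split of the area function, and the `k_height` scaling constant are all assumptions.

```python
import math

def estimate_mouth_shape(area_function, k_height=2.0):
    """Estimate mouth dimensions from a vocal tract cross-sectional area
    function (areas ordered from glottis to lips)."""
    n = len(area_function)
    throat_avg = sum(area_function[: n // 2]) / (n // 2)     # throat-side average area
    mouth_avg = sum(area_function[n // 2:]) / (n - n // 2)   # mouth-side average area
    height = k_height * math.sqrt(throat_avg)  # vertical dimension (assumed scaling law)
    area = mouth_avg                           # mouth opening area
    width = 4.0 * area / (math.pi * height)    # ellipse model: area = pi/4 * w * h
    return height, area, width
```

Given the width and height, the ellipse relation can be inverted to recover the mouth area, which is the consistency the sketch preserves.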
-
Publication number: 20130325476
Abstract: An apparatus and method for generating a wave field synthesis (WFS) signal in consideration of a height of a speaker are disclosed. The WFS signal generation apparatus may include a waveform propagation distance determination unit to determine a propagation distance of a waveform propagated from a sound source based on a height of a speaker, and a WFS signal generation unit to generate a WFS signal corresponding to the speaker using the propagation distance of the waveform.
Type: Application
Filed: March 14, 2013
Publication date: December 5, 2013
Applicant: Electronics and Telecommunications Research Institute
Inventor: Electronics and Telecommunications Research Institute
-
Publication number: 20130325477
Abstract: A speech synthesis system includes: a training database storing training data, which is a set of features extracted from speech waveform data; a feature space division unit which divides a feature space associated with the training data into partial spaces; a sparse or dense state detection unit which detects a sparse or dense state for each partial space of the divided feature space, generates sparse or dense information indicating the sparse or dense state, and outputs the sparse or dense information; and a pronunciation information correcting unit which corrects pronunciation information used for speech synthesis based on the outputted sparse or dense information.
Type: Application
Filed: February 17, 2012
Publication date: December 5, 2013
Applicant: NEC Corporation
Inventors: Yasuyuki Mitsui, Reishi Kondo, Masanori Kato
-
Patent number: 8594993
Abstract: Frame mapping-based cross-lingual voice transformation may transform a target speech corpus in a particular language into a transformed target speech corpus that remains recognizable, and has the voice characteristics of a target speaker that provided the target speech corpus. A formant-based frequency warping is performed on the fundamental frequencies and the linear predictive coding (LPC) spectrums of source speech waveforms in a first language to produce transformed fundamental frequencies and transformed LPC spectrums. The transformed fundamental frequencies and the transformed LPC spectrums are then used to generate warped parameter trajectories. The warped parameter trajectories are further used to transform the target speech waveforms in the second language to produce transformed target speech waveforms with voice characteristics of the first language that nevertheless retain at least some voice characteristics of the target speaker.
Type: Grant
Filed: April 4, 2011
Date of Patent: November 26, 2013
Assignee: Microsoft Corporation
Inventors: Yao Qian, Frank Kao-Ping Soong
-
Patent number: 8583438
Abstract: Described is a technology by which synthesized speech generated from text is evaluated against a prosody model (trained offline) to determine whether the speech will sound unnatural. If so, the speech is regenerated with modified data. The evaluation and regeneration may be iterative until deemed natural sounding. For example, text is built into a lattice that is then (e.g., Viterbi) searched to find a best path. The sections (e.g., units) of data on the path are evaluated via a prosody model. If the evaluation deems a section to correspond to unnatural prosody, that section is replaced, e.g., by modifying/pruning the lattice and re-performing the search. Replacement may be iterative until all sections pass the evaluation. Unnatural prosody detection may be biased such that during evaluation, unnatural prosody is falsely detected at a higher rate relative to a rate at which unnatural prosody is missed.
Type: Grant
Filed: September 20, 2007
Date of Patent: November 12, 2013
Assignee: Microsoft Corporation
Inventors: Yong Zhao, Frank Kao-ping Soong, Min Chu, Lijuan Wang
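The evaluate-and-regenerate loop can be sketched as below. A greedy per-slot minimum stands in for the Viterbi search, and the prosody model is reduced to a caller-supplied predicate, so everything here is illustrative rather than the patented implementation.

```python
def best_path(lattice):
    # Greedy stand-in for a Viterbi search: cheapest surviving unit per slot.
    return [min(units, key=lambda u: u["cost"]) for units in lattice]

def synthesize_with_check(lattice, prosody_ok, max_iters=10):
    """Search, evaluate each unit with the prosody predicate, prune units
    that fail, and re-search until the whole path passes (or we give up)."""
    path = best_path(lattice)
    for _ in range(max_iters):
        bad = [i for i, unit in enumerate(path) if not prosody_ok(unit)]
        if not bad:
            return path                  # all sections sound natural
        for i in bad:                    # prune offending units from the lattice
            lattice[i] = [u for u in lattice[i] if u is not path[i]]
        path = best_path(lattice)        # re-perform the search
    return path
```

Note the pruning uses object identity, so only the exact rejected unit is removed from its slot before the lattice is re-searched.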
-
Patent number: 8583439
Abstract: Improved methods of presenting speech prompts to a user as part of an automated system that employs speech recognition or other voice input are described. The invention improves the user interface by providing, in combination with at least one user prompt seeking a voice response, an enhanced user keyword prompt intended to facilitate the user selecting a keyword to speak in response to the user prompt. The enhanced keyword prompts may be the same words as those a user can speak as a reply to the user prompt but presented using a different audio presentation method, e.g., speech rate, audio level, or speaker voice, than used for the user prompt. In some cases, the user keyword prompts are different words from the expected user response keywords, or portions of words, e.g., truncated versions of keywords.
Type: Grant
Filed: January 12, 2004
Date of Patent: November 12, 2013
Assignee: Verizon Services Corp.
Inventor: James Mark Kondziela
-
Patent number: 8583437
Abstract: A service architecture for providing textual information and related speech synthesis to a user terminal of a communications network, the user terminal being provided with a speech synthesis engine and a basic database of speech waveforms, includes: a content server for downloading textual information requested by means of a browser application on the user terminal; a context manager for extracting context information from the textual information requested by the user terminal; a context selector for selecting an incremental database of speech waveforms associated with extracted context information and for downloading the incremental database into the user terminal; and a database manager on the user terminal for managing the composition of an enlarged database of speech waveforms for the speech synthesis engine, including the basic and the incremental databases of speech waveforms.
Type: Grant
Filed: May 31, 2005
Date of Patent: November 12, 2013
Assignee: Telecom Italia S.p.A.
Inventors: Alessio Cervone, Ivano Salvatore Collotta, Paolo Coppo, Donato Ettorre, Maurizio Fodrini, Maura Turolla
-
Patent number: 8583443
Abstract: Disclosed is a recording and reproducing apparatus comprising: an apparatus main body; and a remote controller to perform remote control of the apparatus main body, wherein the remote controller comprises: a key operating section to receive a key operation by a user; a sound information inputting section to input sound information; and a transmitting section to transmit sound data based on the sound information to the apparatus main body, and the apparatus main body comprises: a recording section to record input content data on a recording medium; a reproducing section to reproduce the content data; a receiving section to receive the sound data; a sound information recording section to record the sound data so as to be associated with a piece of the content data; and a sound information outputting section to reproduce the sound data to output the reproduced sound data.
Type: Grant
Filed: April 10, 2008
Date of Patent: November 12, 2013
Assignee: Funai Electric Co., Ltd.
Inventor: Masayuki Misawa
-
Patent number: 8577682
Abstract: An auditory user interactive interface to an application program being installed in a computer controlled system. A routine in an object of the application program provides an auditory user interface to the program, in combination with auditory means for offering the user of the computer controlled system the auditory user interface during installation of the application program; responsive to the selection of the auditory interface, the auditory user interface is provided during installation of the application program.
Type: Grant
Filed: October 27, 2005
Date of Patent: November 5, 2013
Assignee: Nuance Communications, Inc.
Inventors: Peter T. Brunet, Anh Quy Lu, Mark Edward Nosewicz, Lawrence Frank Weiss
-
Patent number: 8571870
Abstract: Techniques for generating synthetic speech with contrastive stress. In one aspect, a speech-enabled application generates a text input including a text transcription of a desired speech output, and inputs the text input to a speech synthesis system. The synthesis system generates an audio speech output corresponding to at least a portion of the text input, with at least one portion carrying contrastive stress, and provides the audio speech output for the speech-enabled application. In another aspect, a speech-enabled application inputs a plurality of text strings, each corresponding to a portion of a desired speech output, to a software module for rendering contrastive stress. The software module identifies a plurality of audio recordings that render at least one portion of at least one of the text strings as speech carrying contrastive stress. The speech-enabled application generates an audio speech output corresponding to the desired speech output using the audio recordings.
Type: Grant
Filed: August 9, 2010
Date of Patent: October 29, 2013
Assignee: Nuance Communications, Inc.
Inventors: Darren C. Meyer, Stephen R. Springer
-
Patent number: 8571849
Abstract: Disclosed herein are systems, methods, and computer-readable media for enriching spoken language translation with prosodic information in a statistical speech translation framework. The method includes receiving speech for translation to a target language, generating pitch accent labels representing segments of the received speech which are prosodically prominent, and injecting pitch accent labels with word tokens within the translation engine to create enriched target language output text. A further step may be added of synthesizing speech in the target language based on the prosody-enriched target language output text. An automatic prosody labeler can generate pitch accent labels. An automatic prosody labeler can exploit lexical, syntactic, and prosodic information of the speech. A maximum entropy model may be used to determine which segments of the speech are prosodically prominent.
Type: Grant
Filed: September 30, 2008
Date of Patent: October 29, 2013
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Srinivas Bangalore, Vivek Kumar Rangarajan Sridhar
-
Patent number: 8571039
Abstract: A method and apparatus for transmitting an audio signal over a communication channel, comprising encoding the audio signal with an encoder 204 using a first sampling rate, filtering the audio signal using a first cut off frequency, the first cut off frequency being chosen in dependence upon the first sampling rate, and transmitting the encoded and filtered audio signal over the communication channel. The presence of a condition in which the sampling rate of the encoder 204 is to be switched to a second sampling rate at a switching time is determined, and if the condition has been determined to be present, the cut off frequency used in the filtering step is gradually changed from the first cut off frequency to a second cut off frequency, the second cut off frequency being chosen in dependence upon the second sampling rate, such that the audio bandwidth of the transmitted signal changes gradually when the sampling rate is switched to the second sampling rate.
Type: Grant
Filed: June 23, 2010
Date of Patent: October 29, 2013
Assignee: Skype
Inventors: Stefan Strommer, Karsten Vandborg Sorensen, Soren Skak Jensen, Koen Vos, Jon Bergenheim
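The gradual cut-off change can be illustrated with a simple per-frame ramp. The linear schedule and the example frequencies are assumptions; the patent does not specify the interpolation.

```python
def cutoff_schedule(f_start_hz, f_end_hz, n_frames):
    """Per-frame cut-off frequencies moving gradually from f_start_hz to
    f_end_hz, instead of jumping when the sampling rate switches."""
    if n_frames <= 1:
        return [float(f_end_hz)]
    step = (f_end_hz - f_start_hz) / (n_frames - 1)
    return [f_start_hz + i * step for i in range(n_frames)]
```

Applying this schedule to the encoder's low-pass filter makes the audio bandwidth change smoothly across the sampling-rate switch, which is the behavior the abstract describes.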
-
Patent number: 8566106
Abstract: A method and device for searching an algebraic codebook during encoding of a sound signal, wherein the algebraic codebook comprises a set of codevectors formed of a number of pulse positions and a number of pulses distributed over the pulse positions. In the algebraic codebook searching method and device, a reference signal for use in searching the algebraic codebook is calculated. In a first stage, a position of a first pulse is determined in relation with the reference signal and among the number of pulse positions. In each of a number of stages subsequent to the first stage, (a) an algebraic codebook gain is recomputed, (b) the reference signal is updated using the recomputed algebraic codebook gain, and (c) a position of another pulse is determined in relation with the updated reference signal and among the number of pulse positions.
Type: Grant
Filed: September 11, 2008
Date of Patent: October 22, 2013
Assignee: Voiceage Corporation
Inventors: Redwan Salami, Vaclav Eksler, Milan Jelinek
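A toy version of the staged search might look like this. The magnitude-based position criterion and the averaged gain recomputation are simplified stand-ins for the actual codebook math, chosen only to show the stage structure (place a pulse, recompute the gain, update the reference, repeat).

```python
def search_pulses(reference, n_pulses):
    """Place pulses one per stage; after each placement, recompute a gain
    and rebuild the reference with the pulses' contribution removed."""
    ref = list(reference)
    positions, signs = [], []
    for _ in range(n_pulses):
        pos = max(range(len(ref)), key=lambda i: abs(ref[i]))  # best position this stage
        sign = 1.0 if ref[pos] >= 0 else -1.0
        positions.append(pos)
        signs.append(sign)
        # Recompute a simple gain over all pulses found so far ...
        gain = sum(reference[p] * s for p, s in zip(positions, signs)) / len(positions)
        # ... then update the reference by removing their contribution.
        ref = list(reference)
        for p, s in zip(positions, signs):
            ref[p] -= gain * s
    return positions, signs
```

Because the reference is updated between stages, later pulses are steered toward the residual rather than re-selecting positions already covered.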
-
Patent number: 8566099
Abstract: A system and method for improving the response time of text-to-speech synthesis using triphone contexts. The method includes identifying a set of triphone sequences and tabulating the set of triphone sequences using a plurality of contexts, where each context-specific triphone sequence has top N triphone units, made of the triphone units having the lowest target costs when each triphone unit is individually combined into a 5-phoneme combination. Input texts having one of the contexts are received, and one of the context-specific triphone sequences is selected based on the context. Input text is then synthesized using the context-specific triphone sequence.
Type: Grant
Filed: July 16, 2012
Date of Patent: October 22, 2013
Assignee: AT&T Intellectual Property II, L.P.
Inventor: Alistair D. Conkie
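The pre-tabulation idea reduces to building a per-context top-N table once, offline, and doing cheap lookups at synthesis time. This sketch uses made-up unit names and costs for illustration:

```python
def build_table(candidates_by_context, n_best=3):
    """candidates_by_context: {context: [(unit_id, target_cost), ...]}.
    Keep only the n_best units per context, ranked by target cost."""
    return {
        ctx: sorted(units, key=lambda u: u[1])[:n_best]
        for ctx, units in candidates_by_context.items()
    }

def select_units(table, context):
    # Synthesis-time selection is a table lookup, not a full search.
    return [unit_id for unit_id, _ in table[context]]
```

The response-time gain comes from moving the cost ranking out of the synthesis path entirely.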
-
Patent number: 8566098
Abstract: A system and method are disclosed for synthesizing speech based on a selected speech act. A method includes modifying synthesized speech of a spoken dialogue system by (1) receiving a user utterance, (2) analyzing the user utterance to determine an appropriate speech act, and (3) generating a response of a type associated with the appropriate speech act, wherein linguistic variables in the response are selected based on the appropriate speech act.
Type: Grant
Filed: October 30, 2007
Date of Patent: October 22, 2013
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Ann K Syrdal, Mark Beutnagel, Alistair D Conkie, Yeon-Jun Kim
-
Patent number: 8566078
Abstract: A method of generating a statistical machine translation database through a game in which a monolingual structure is provided to a plurality of players. A first translation attempt is received from each of the plurality of players. The first translation attempt from each of the plurality of players is compared. Feedback is provided to each of the plurality of players, and the attempts are received and compared to provide feedback to iteratively converge subsequent translations from each of the plurality of players into a final translated structure.
Type: Grant
Filed: January 29, 2010
Date of Patent: October 22, 2013
Assignee: International Business Machines Corporation
Inventors: Ruhi Sarikaya, Jiri Navratil, Osamuyimen Stewart, David Lubensky
-
Patent number: 8560301
Abstract: A language expression apparatus and method based on context and intent awareness are provided. The apparatus and method may recognize a context and an intent of a user and may generate a language expression based on the recognized context and the recognized intent, thereby providing an interpretation/translation service and/or an education service for learning a language.
Type: Grant
Filed: March 2, 2010
Date of Patent: October 15, 2013
Assignee: Samsung Electronics Co., Ltd.
Inventor: Yeo Jin Kim
-
Patent number: 8560315
Abstract: A conference support device includes an image receiving portion that receives captured images from conference terminals, a voice receiving portion that receives, from one of the conference terminals, a voice that is generated by a first participant, a first storage portion that stores the captured images and the voice, a voice recognition portion that recognizes the voice, a text data creation portion that creates text data that express the words that are included in the voice, an addressee specification portion that specifies a second participant, whom the voice is addressing, an image creation portion that creates a display image that is configured from the captured images and in which the text data are associated with the first participant and a specified image is associated with at least one of the first participant and the second participant, and a transmission portion that transmits the display image to the conference terminals.
Type: Grant
Filed: March 12, 2010
Date of Patent: October 15, 2013
Assignee: Brother Kogyo Kabushiki Kaisha
Inventor: Mizuho Yasoshima
-
Patent number: 8560317
Abstract: A vocabulary dictionary storing unit for storing a plurality of words in advance, a vocabulary dictionary managing unit for extracting recognition target words, a matching unit for calculating a degree of matching with the recognition target words based on an accepted voice, a result output unit for outputting, as a recognition result, a word having a best score from a result of calculating the degree of matching, and an extraction criterion information managing unit for changing extraction criterion information according to a result of monitoring by a monitor control unit are provided. The vocabulary dictionary storing unit further includes a scale information storing unit for storing scale information serving as a scale at the time of extracting the recognition target words, and an extraction criterion information storing unit for storing extraction criterion information indicating a criterion of the recognition target words at the time of extracting the recognition target words.
Type: Grant
Filed: September 18, 2006
Date of Patent: October 15, 2013
Assignee: Fujitsu Limited
Inventor: Kenji Abe
-
Patent number: 8554541
Abstract: A virtual pet system includes: a virtual pet client, adapted to receive a sentence in natural language and send the sentence to a Q&A server; and the Q&A server, adapted to receive the sentence, process the sentence through natural language comprehension, generate an answer in natural language based on a result of natural language comprehension and reasoning knowledge, and send the answer in natural language to the virtual pet client. A method for virtual pet chatting includes: receiving a sentence in natural language, performing natural language comprehension on the sentence, and generating an answer in natural language based on a result of natural language comprehension and reasoning knowledge.
Type: Grant
Filed: September 18, 2008
Date of Patent: October 8, 2013
Assignee: Tencent Technology (Shenzhen) Company Ltd.
Inventors: Haisong Yang, Zhiyuan Liu, Yunfeng Liu, Rongling Yu
-
Patent number: 8554566
Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
Type: Grant
Filed: November 29, 2012
Date of Patent: October 8, 2013
Assignee: Morphism LLC
Inventor: James H. Stephens, Jr.
-
Patent number: 8554565
Abstract: According to one embodiment, a speech synthesizer generates a speech segment sequence and synthesizes speech by connecting speech segments of the generated speech segment sequence. If a speech segment of a synthesized first speech segment sequence is different from the speech segment of a synthesized second speech segment sequence having the same synthesis unit as the first speech segment sequence, the speech synthesizer disables the speech segment of the first speech segment sequence that is different from the speech segment of the second speech segment sequence.
Type: Grant
Filed: September 14, 2010
Date of Patent: October 8, 2013
Assignee: Kabushiki Kaisha Toshiba
Inventors: Osamu Nishiyama, Takehiko Kagoshima
-
Patent number: 8548809
Abstract: A voice guidance system for providing guidance by voice concerning operations of an information processing apparatus comprises a detector that detects that a predetermined function of the information processing apparatus is disabled, and a voice guidance unit that outputs a voice message reporting a reason why the predetermined function of the information processing apparatus is disabled, in response to the detection output of the detector.
Type: Grant
Filed: June 16, 2005
Date of Patent: October 1, 2013
Assignee: Fuji Xerox Co., Ltd.
Inventors: Kanji Itaki, Michihiro Kawamura, Nozomi Noguchi
-
Publication number: 20130253934
Abstract: A method of providing user participation in a social broadcast environment is disclosed. A network communication is received from a user of a broadcast that includes preference data indicating a preference of the user that a promoted content be included in the broadcast. Via a responsive network communication, feedback data is provided to the user that includes a predicted future time at which the promoted content may be included in the broadcast.
Type: Application
Filed: January 31, 2013
Publication date: September 26, 2013
Applicant: JELLI, INC.
Inventors: Jateen P. Parekh, Michael S. Dougherty, Sarah Caplener, Mitchell A. Yawitz, Scott Strain, Adam J. Dobrer
-
Patent number: 8542839
Abstract: An audio processing apparatus and method for a mobile device are provided. The audio processing apparatus and method may appropriately determine sound source localizations corresponding to a voice signal and an audio signal, and thereby may simultaneously provide a voice call service and a multimedia service. Also, the audio processing apparatus and method may guarantee quality of the voice call service even when simultaneously providing the voice call service and the multimedia service.
Type: Grant
Filed: March 18, 2009
Date of Patent: September 24, 2013
Assignee: Samsung Electronics Co., Ltd.
Inventors: Chang Yong Son, Do Hyung Kim, Sang Oak Woo, Kang Eun Lee
-
Patent number: 8538743
Abstract: A software language including language constructs for disambiguating text that is to be converted to speech using configurable lexeme based rules. The language can include at least one conditional statement and a significance indicator. The conditional statement can define a sense of usage for a lexeme. The significance indicator can define a criteria for selecting an associated sense of usage. The language can also include an action expression that is associated with a conditional statement that defines a set of programmatic actions to be executed upon a selection of the associated usage sense. The conditional statement can include a context range specification that defines a scope of an input string for examination when evaluating the conditional statement. Further, the conditional statement can include a directive that represents a defined condition of the lexeme or the text surrounding the lexeme.
Type: Grant
Filed: March 21, 2007
Date of Patent: September 17, 2013
Assignee: Nuance Communications, Inc.
Inventors: Oswaldo Gago, Steven M. Hancock, Maria E. Smith
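The rule structure described (sense of usage, significance indicator, context range, condition) can be sketched as a small selection function. The rule encoding and the example senses are invented for illustration; the patented language is a richer construct with action expressions and directives.

```python
def pick_sense(tokens, index, rules, default):
    """rules: [(sense, significance, context_range, predicate)].
    Each predicate sees only the tokens inside the context window
    around tokens[index]; the highest-significance match wins."""
    matches = []
    for sense, significance, ctx_range, predicate in rules:
        lo = max(0, index - ctx_range)               # context range limits the
        hi = min(len(tokens), index + ctx_range + 1) # scope of examination
        if predicate(tokens[lo:hi]):
            matches.append((significance, sense))
    return max(matches)[1] if matches else default
```

For a lexeme like "bass", a rule conditioned on nearby musical vocabulary can outrank a lower-significance fishing sense.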
-
Publication number: 20130238337
Abstract: A voice quality conversion system includes: an analysis unit which analyzes sounds of plural vowels of different types to generate first vocal tract shape information for each type of the vowels; a combination unit which combines, for each type of the vowels, the first vocal tract shape information on that type of vowel and the first vocal tract shape information on a different type of vowel to generate second vocal tract shape information on that type of vowel; and a synthesis unit which (i) combines vocal tract shape information on a vowel included in input speech and the second vocal tract shape information on the same type of vowel to convert vocal tract shape information on the input speech, and (ii) generates a synthetic sound using the converted vocal tract shape information and voicing source information on the input speech to convert the voice quality of the input speech.
Type: Application
Filed: April 29, 2013
Publication date: September 12, 2013
Applicant: Panasonic Corporation
Inventors: Takahiro KAMAI, Yoshifumi HIROSE
-
Patent number: 8527273
Abstract: Systems and methods for identifying the N-best strings of a weighted automaton. A potential for each state of an input automaton to a set of destination states of the input automaton is first determined. Then, the N-best paths are found in the result of an on-the-fly determinization of the input automaton. Only the portion of the input automaton needed to identify the N-best paths is determinized. As the input automaton is determinized, a potential for each new state of the partially determinized automaton is determined and is used in identifying the N-best paths of the determinized automaton, which correspond exactly to the N-best strings of the input automaton.
Type: Grant
Filed: July 30, 2012
Date of Patent: September 3, 2013
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Mehryar Mohri, Michael Dennis Riley
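Setting the on-the-fly determinization and state potentials aside, the core N-best enumeration over a small weighted acyclic automaton can be sketched with a priority queue. The graph encoding below is an assumption made for illustration; the patented method additionally uses potentials to guide the search and determinizes lazily so N-best paths correspond to N-best strings.

```python
import heapq

def n_best_paths(graph, start, final, n):
    """graph: {state: [(next_state, label, weight), ...]}, assumed acyclic.
    Returns up to n (total_weight, label_string) pairs, cheapest first."""
    results = []
    heap = [(0.0, start, "")]
    while heap and len(results) < n:
        weight, state, labels = heapq.heappop(heap)
        if state == final:
            results.append((weight, labels))  # paths pop in increasing weight
            continue
        for nxt, label, w in graph.get(state, []):
            heapq.heappush(heap, (weight + w, nxt, labels + label))
    return results
```

Because the heap always pops the cheapest partial path, final states are reached in nondecreasing total weight, so the first n completed paths are the n best.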
-
Patent number: 8527276
Abstract: A method and system are disclosed for speech synthesis using deep neural networks. A neural network may be trained to map input phonetic transcriptions of training-time text strings into sequences of acoustic feature vectors, which yield predefined speech waveforms when processed by a signal generation module. The training-time text strings may correspond to written transcriptions of speech carried in the predefined speech waveforms. Subsequent to training, a run-time text string may be translated to a run-time phonetic transcription, which may include a run-time sequence of phonetic-context descriptors, each of which contains a phonetic speech unit, data indicating phonetic context, and data indicating time duration of the respective phonetic speech unit. The trained neural network may then map the run-time sequence of the phonetic-context descriptors to run-time predicted feature vectors, which may in turn be translated into synthesized speech by the signal generation module.
Type: Grant
Filed: October 25, 2012
Date of Patent: September 3, 2013
Assignee: Google Inc.
Inventors: Andrew William Senior, Byungha Chun, Michael Schuster
-
Patent number: 8527258
Abstract: A simultaneous interpretation system includes first and second headsets for inputting and outputting voice, and a portable terminal for receiving an original language voice speech signal to be interpreted that is output from the first headset. The portable terminal outputs an interpreted voice speech signal, based on the original language voice speech signal that has been interpreted into a different language, to the second headset. The portable terminal either performs the interpretation or accesses an interpretation server to provide the second headset with the interpreted voice speech signal. Hence, the simultaneous interpretation is carried out using short-range communication between the users by medium of the single portable terminal, and thus more efficient and unrestricted conversation is realized.
Type: Grant
Filed: February 1, 2010
Date of Patent: September 3, 2013
Assignee: Samsung Electronics Co., Ltd.
Inventors: Kyoung-Yup Kim, Jun-Tai Kim
-
Patent number: 8527275
Abstract: A contextual input device includes a plurality of tactually discernable keys disposed in a predetermined configuration which replicates a particular relationship among a plurality of items associated with a known physical object. The tactually discernable keys are typically labeled with Braille type. The known physical object is typically a collection of related items grouped together by some common relationship. A computer-implemented process determines whether an input signal represents a selection of an item from among a plurality of items or an attribute pertaining to an item among the plurality of items. Once the selected item or attribute pertaining to an item is determined, the computer-implemented process transforms a user's selection from the input signal into an analog audio signal which is then audibly output as human speech with an electro-acoustic transducer.
Type: Grant
Filed: July 17, 2009
Date of Patent: September 3, 2013
Assignee: Cal Poly Corporation
Inventors: Fantin Dennis, C. Arthur MacCarley
-
Patent number: 8527283
Abstract: A method (100) includes receiving (101) an input digital audio signal comprising a narrow-band signal. The input digital audio signal is processed (102) to generate a processed digital audio signal. An estimate of the high-band energy level corresponding to the input digital audio signal is determined (103). The estimated high-band energy level is modified based on an estimation accuracy and/or narrow-band signal characteristics (104). A high-band digital audio signal is generated based on the modified estimate of the high-band energy level and an estimated high-band spectrum corresponding to the modified estimate of the high-band energy level (105).
Type: Grant
Filed: January 19, 2011
Date of Patent: September 3, 2013
Assignee: Motorola Mobility LLC
Inventors: Mark A. Jasiuk, Tenkasi V. Ramabadran
-
Patent number: 8527281
Abstract: Methods and systems for sculpting synthesized speech using a graphic user interface are disclosed. An operator enters a stream of text that is used to produce a stream of target phonetic-units. The stream of target phonetic-units is then submitted to a unit-selection process to produce a stream of selected phonetic-units, each selected phonetic-unit derived from a database of sample phonetic-units. After the stream of sample phonetic-units is selected, an operator can remove various selected phonetic-units from the stream of selected phonetic-units, prune the sample phonetic database, and edit various cost functions using the graphic user interface. The edited speech information can then be submitted to the unit-selection process to produce a second stream of selected phonetic-units.
Type: Grant
Filed: June 29, 2012
Date of Patent: September 3, 2013
Assignee: Nuance Communications, Inc.
Inventors: Peter Rutten, Paul A. Taylor
-
Patent number: 8521535
Abstract: A biochemical analyzer having a microprocessing apparatus with expandable voice capacity is characterized in that a driving module is installed in a data processor and a voice carrier is replaceable. Thereby, increase or decrease of voice files can be easily done by replacing the current voice carrier with an alternative voice carrier storing desired voice files, without the need of replacing the driving module together with the voice carrier, thereby saving costs and reducing processing procedures.
Type: Grant
Filed: November 10, 2010
Date of Patent: August 27, 2013
Inventor: Chun-Yu Chen
-
Patent number: 8521513Abstract: A language-neutral speech grammar extensible markup language (GRXML) document and a localized response document are used to build a localized GRXML document. The language-neutral GRXML document specifies an initial grammar rule element. The initial grammar rule element specifies a given response type identifier and a given action. The localized response document contains a given response entry that specifies the given response type identifier and a given response in a given language. The localized GRXML document specifies a new grammar rule element. The new grammar rule element specifies the given response in the given language and the given action. The localized GRXML document is installed in an interactive voice response (IVR) system. The localized GRXML document configures the IVR system to perform the given action when a user of the IVR system speaks the given response to the IVR system.Type: GrantFiled: March 12, 2010Date of Patent: August 27, 2013Assignee: Microsoft CorporationInventors: Thomas W. Millett, David Notario
-
Patent number: 8515759Abstract: An apparatus for synthesizing a rendered output signal having a first audio channel and a second audio channel includes a decorrelator stage for generating a decorrelated signal based on a downmix signal, and a combiner for performing a weighted combination of the downmix signal and the decorrelated signal based on parametric audio object information, downmix information and target rendering information. The combiner solves the problem of optimally combining matrixing with decorrelation for high-quality stereo scene reproduction of a number of individual audio objects using a multichannel downmix.Type: GrantFiled: April 23, 2008Date of Patent: August 20, 2013Assignee: Dolby International ABInventors: Jonas Engdegard, Heiko Purnhagen, Barbara Resch, Lars Villemoes, Cornelia Falch, Juergen Herre, Johannes Hilpert, Andreas Hoelzer, Leonid Terentiev
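The weighted combination of a downmix with a decorrelated copy can be sketched as below. The short-delay "decorrelator" and the per-channel gain pairs are placeholder choices; a real system derives the gains from the object, downmix, and rendering information and uses all-pass or reverberant decorrelators:

```python
def decorrelate(downmix, delay=3):
    """Toy decorrelator: a short delay. (Real systems use all-pass
    filters or reverberators to decorrelate without coloring.)"""
    return [0.0] * delay + downmix[:-delay]

def render_stereo(downmix, direct_gain, decorr_gain):
    """Weighted combination of the downmix and its decorrelated
    version, one (left, right) gain pair per signal path."""
    wet = decorrelate(downmix)
    left = [direct_gain[0] * d + decorr_gain[0] * w for d, w in zip(downmix, wet)]
    right = [direct_gain[1] * d + decorr_gain[1] * w for d, w in zip(downmix, wet)]
    return left, right
```

Opposite-signed decorrelator gains on the two channels, as in the test values below, widen the perceived stereo image.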
-
Patent number: 8515749Abstract: Systems and methods for facilitating communication include recognizing speech in a first language represented in a first audio signal; forming a first text representation of the speech; processing the first text representation to form data representing a second audio signal; and causing presentation of the second audio signal to a second user while remaining responsive to an interrupt signal from a first user. In some embodiments, processing the first text representation includes translating the first text representation to a second text representation in a second language and processing the second text representation to form the data representing the second audio signal. Some embodiments include accepting an interrupt signal from the first user and interrupting the presentation of the second audio signal.Type: GrantFiled: May 20, 2009Date of Patent: August 20, 2013Assignee: Raytheon BBN Technologies Corp.Inventor: David G. Stallard
-
Patent number: 8510113Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, identifying replacement segments in a secondary speech database, enhancing the primary speech database by substituting the identified secondary speech database segments for the corresponding identified segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.Type: GrantFiled: August 31, 2006Date of Patent: August 13, 2013Assignee: AT&T Intellectual Property II, L.P.Inventors: Alistair Conkie, Ann K. Syrdal
-
Patent number: 8510112Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, modifying the identified segments in the primary speech database using selected mappings, enhancing the primary speech database by substituting the modified segments for the corresponding identified database segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.Type: GrantFiled: August 31, 2006Date of Patent: August 13, 2013Assignee: AT&T Intellectual Property II, L.P.Inventors: Alistair Conkie, Ann Syrdal
-
Patent number: 8504368Abstract: A synthetic speech text-input device is provided that allows a user to intuitively know an amount of an input text that can be fit in a desired duration. A synthetic speech text-input device 1 includes: an input unit that receives a set duration in which a speech to be synthesized is to be fit, and a text for a synthetic speech; a text amount calculation unit that calculates an acceptable text amount based on the set duration received by the input unit, the acceptable text amount being an amount of a text acceptable as a synthetic speech of the set duration; and a text amount output unit that outputs the acceptable text amount calculated by the text amount calculation unit, when the input unit receives the text.Type: GrantFiled: September 10, 2010Date of Patent: August 6, 2013Assignee: Fujitsu LimitedInventors: Nobuyuki Katae, Kentaro Murase
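The core calculation in the preceding abstract (how much text fits in a requested duration) reduces to a rate conversion. The 7.5 characters-per-second rate below is an assumed constant; a real implementation would derive it from the synthesizer's actual speaking rate:

```python
def acceptable_text_amount(set_duration_s, chars_per_second=7.5):
    """Return how many characters of input text fit in the requested
    synthetic-speech duration, given an assumed speaking rate."""
    return int(set_duration_s * chars_per_second)
```

The device would display this figure as the user types, so the operator knows when the entered text exceeds the set duration.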
-
Patent number: 8498860Abstract: A modulation device including: a modulation unit for modulating a carrier in an audible sound range by an encoded transmission signal to generate a modulated signal; a masker sound generation unit for generating a masker signal, outputted as a masker sound, that makes the modulated signal harder to hear when transmitted with it; and an acoustic signal generation unit for inserting the masker signal in the modulated signal to generate an acoustic signal.Type: GrantFiled: October 2, 2006Date of Patent: July 30, 2013Assignee: NTT DoCoMo, Inc.Inventor: Hosei Matsuoka
-
Patent number: 8498867Abstract: Disclosed are techniques and systems to provide a narration of a text in multiple different voices. Further disclosed are techniques and systems for generating an audible output in which different portions of a text are narrated using voice models associated with different characters.Type: GrantFiled: January 14, 2010Date of Patent: July 30, 2013Assignee: K-NFB Reading Technology, Inc.Inventors: Raymond C. Kurzweil, Paul Albrecht, Peter Chapman
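A minimal sketch of the per-character voice assignment described above, assuming the text has already been segmented and tagged by speaker (the tagging itself is the hard part and is not shown); the `narrator` fallback and the voice-table shape are hypothetical:

```python
def narrate(segments, voices, default_voice="narrator"):
    """Assign a voice model to each tagged text segment.

    segments: list of (character, text) pairs; characters missing
    from the voices table fall back to the default voice.
    Returns a narration plan of (voice_model, text) pairs.
    """
    plan = []
    for character, text in segments:
        plan.append((voices.get(character, default_voice), text))
    return plan
```

Each (voice_model, text) pair would then be handed to the TTS engine in sequence to produce the audible output.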
-
Patent number: 8498866Abstract: Disclosed are techniques and systems to provide a narration of a text in multiple different languages where the portions of the text narrated using the different voices associated with different languages are selected by a user.Type: GrantFiled: January 14, 2010Date of Patent: July 30, 2013Assignee: K-NFB Reading Technology, Inc.Inventors: Raymond C. Kurzweil, Paul Albrecht, Peter Chapman
-
Patent number: 8494849Abstract: A method of transmitting speech data to a remote device in a distributed speech recognition system, includes the steps of: dividing an input speech signal into frames; calculating, for each frame, a voice activity value representative of the presence of speech activity in the frame; grouping the frames into multiframes, each multiframe including a predetermined number of frames; calculating, for each multiframe, a voice activity marker representative of the number of frames in the multiframe representing speech activity; and selectively transmitting, on the basis of the voice activity marker associated with each multiframe, the multiframes to the remote device.Type: GrantFiled: June 20, 2005Date of Patent: July 23, 2013Assignee: Telecom Italia S.p.A.Inventors: Ivano Salvatore Collotta, Donato Ettorre, Maurizio Fodrini, Pierluigi Gallo, Roberto Spagnolo
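The grouping-and-marking scheme above can be sketched directly from the claim language. The frame count per multiframe, the `is_voiced` predicate, and the transmission threshold are all assumptions here; in the patent the marker is computed per the system's own voice activity values:

```python
def select_multiframes(frames, is_voiced, frames_per_multiframe=4, min_voiced=1):
    """Group frames into multiframes, attach a voice-activity marker
    (the count of voiced frames in the group), and keep only the
    multiframes whose marker reaches the transmission threshold."""
    kept = []
    for i in range(0, len(frames), frames_per_multiframe):
        group = frames[i:i + frames_per_multiframe]
        marker = sum(1 for frame in group if is_voiced(frame))
        if marker >= min_voiced:
            kept.append((marker, group))
    return kept
```

Multiframes falling below the threshold are simply never sent, which is how the scheme saves bandwidth during silence.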
-
Patent number: 8489400Abstract: Disclosed herein are methods for presenting speech from a selected text that is on a computing device. The method includes presenting text on a touch-sensitive display at a size within a threshold level, so that the computing device can accurately determine the user's intent when the user touches the touch screen. Once the touch has been received, the computing device identifies and interprets the portion of text to be selected, and subsequently presents that text audibly to the user.Type: GrantFiled: August 6, 2012Date of Patent: July 16, 2013Assignee: AT&T Intellectual Property I, L.P.Inventors: Alistair D. Conkie, Horst Schroeter
-
Patent number: 8484027Abstract: A method for narrating a digital book includes retrievably storing first data relating to narration of the digital book by a first end-user. The first data is then provided to a user device having stored thereon the digital book. Subsequently, the digital book is presented in narrated form to a second end-user via the user device. In particular, the digital book is displayed via a display portion of the user device while simultaneously providing in audible form the first data via an audio output portion of the user device.Type: GrantFiled: June 10, 2010Date of Patent: July 9, 2013Assignee: Skyreader Media Inc.Inventor: William A. Murphy
-
Patent number: 8484026Abstract: A portable audio control system that controls an audio signal transmitted from an electronic device, including an earphone device and an audio control device. The audio control device includes an audio source receiver, a signal synthesis module, and an audio output unit. The audio source receiver, which is connected with the electronic device, is used for receiving the audio signal. The signal synthesis module receives both the audio signal and a voice signal coming from an external audio source, and then synthesizes those signals. The audio output unit outputs the synthesized sound to the earphone device. When users use the portable audio control system to connect with the electronic device, both the sound from the electronic device and the external voice or song can be heard at the same time.Type: GrantFiled: August 24, 2009Date of Patent: July 9, 2013Inventor: Pi-Fen Lin
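Synthesizing the device audio with an external voice signal is, at its simplest, a sample-wise weighted sum; the gain parameters and zero-padding policy below are illustrative, not the patent's circuit:

```python
def mix(device_audio, external_audio, device_gain=1.0, external_gain=1.0):
    """Sample-wise weighted sum of the device audio and an external
    voice signal; the shorter stream is zero-padded to match."""
    n = max(len(device_audio), len(external_audio))
    a = device_audio + [0.0] * (n - len(device_audio))
    b = external_audio + [0.0] * (n - len(external_audio))
    return [device_gain * x + external_gain * y for x, y in zip(a, b)]
```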
-
Patent number: 8484028Abstract: A system for visually navigating a document in conjunction with a text-to-speech ("TTS") engine presents a visual display of a region of interest that is related to the text of the document that is being audibly presented as speech to a user. When the TTS engine converts the text to speech and presents the speech to the user, the system presents the corresponding section of text on a display. During the presentation, if the system encounters a linked section of text, the visual display changes to display a linked region of interest that corresponds to the linked section of text.Type: GrantFiled: October 24, 2008Date of Patent: July 9, 2013Assignee: Fuji Xerox Co., Ltd.Inventors: Scott Carter, Laurent Denoue
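Keeping the display in sync with the spoken text amounts to mapping the TTS engine's current character position to a region of interest. The `(start, end, region_id)` layout below is a hypothetical data structure chosen for illustration, not the patent's:

```python
def region_for_position(regions, char_pos):
    """Given (start, end, region_id) character ranges for a document,
    return the region of interest covering the character the TTS
    engine is currently speaking, or None if no region applies."""
    for start, end, region_id in regions:
        if start <= char_pos < end:
            return region_id
    return None
```

Linked sections would simply map to a different `region_id`, so the display switches regions as the narration crosses the link boundary.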
-
Patent number: 8484035Abstract: A method of altering a social signaling characteristic of a speech signal. A statistically large number of speech samples created by different speakers in different tones of voice are evaluated to determine one or more relationships that exist between a selected social signaling characteristic and one or more measurable parameters of the speech samples. An input audio voice signal is then processed in accordance with these relationships to modify one or more controllable parameters of the input audio voice signal to produce a modified output audio voice signal in which the selected social signaling characteristic is modified. In a specific illustrative embodiment, a two-level hidden Markov model is used to identify voiced and unvoiced speech segments, and selected controllable characteristics of these speech segments are modified to alter the desired social signaling characteristic.Type: GrantFiled: September 6, 2007Date of Patent: July 9, 2013Assignee: Massachusetts Institute of TechnologyInventor: Alex Paul Pentland
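Identifying voiced and unvoiced segments is the first step the embodiment above describes. The energy-threshold classifier below is a deliberately crude stand-in for the patent's two-level hidden Markov model, shown only to make the segmentation output concrete; the threshold value is an assumption:

```python
def segment_voiced(frames, energy_threshold=0.1):
    """Crude voiced/unvoiced labeling by mean-square frame energy
    (a stand-in for a two-level HMM, which would also model state
    transitions). Returns runs as (label, start, end) triples."""
    labels = ["V" if sum(s * s for s in f) / len(f) > energy_threshold else "U"
              for f in frames]
    runs, start = [], 0
    for i in range(1, len(labels) + 1):
        # Close a run whenever the label changes or the input ends.
        if i == len(labels) or labels[i] != labels[start]:
            runs.append((labels[start], start, i))
            start = i
    return runs
```

Per-segment parameters (pitch, energy contour, timing) would then be modified on the voiced runs to shift the social signaling characteristic.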