Synthesis Patents (Class 704/258)
-
Patent number: 8620660
Abstract: Improved oscillator-based source modeling methods for estimating model parameters, for evaluating model quality for restoring the input from the model parameters, and for improving performance over methods known in the art are disclosed. An application of these innovations to speech coding is described. The improved oscillator model is derived from the information contained in the current input signal as well as from some form of data history, often the restored versions of the earlier processed data. Operations can be performed in real time, and compression can be achieved at a user-specified level of performance and, in some cases, without information loss. The new model can be combined with methods in the existing art in order to complement the properties of these methods and to improve overall performance. The present invention is effective for very low bit-rate coding/compression and decoding/decompression of digital signals, including digitized speech and audio signals.
Type: Grant
Filed: October 29, 2010
Date of Patent: December 31, 2013
Assignee: The United States of America, as Represented by the Secretary of the Navy
Inventors: Anton Yen, Irina Gorodnitsky
-
Patent number: 8612228
Abstract: A section corresponding to a given duration is sampled from sound data that indicates the voice of a player collected by a microphone, and a vocal tract cross-sectional area function of the sampled section is calculated. The vertical dimension of the mouth is calculated from a throat-side average cross-sectional area of the vocal tract cross-sectional area function, and the area of the mouth is calculated from a mouth-side average cross-sectional area. The transverse dimension of the mouth is calculated from the area of the mouth and the vertical dimension of the mouth.
Type: Grant
Filed: March 26, 2010
Date of Patent: December 17, 2013
Assignee: Namco Bandai Games Inc.
Inventor: Hiroyuki Hiraishi
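The computation described above can be sketched as follows. This is a minimal illustration rather than the patented method: the elliptical mouth model, the half-way split of the area function, and the `k_height` scaling constant are all assumptions.

```python
import math

def estimate_mouth_shape(area_function, k_height=2.0):
    """Estimate mouth dimensions from a vocal tract cross-sectional area
    function (areas ordered from glottis to lips)."""
    n = len(area_function)
    throat_avg = sum(area_function[: n // 2]) / (n // 2)     # throat-side average area
    mouth_avg = sum(area_function[n // 2:]) / (n - n // 2)   # mouth-side average area
    height = k_height * math.sqrt(throat_avg)  # vertical dimension (assumed scaling law)
    area = mouth_avg                           # mouth opening area
    width = 4.0 * area / (math.pi * height)    # ellipse model: area = pi/4 * w * h
    return height, area, width
```

Given the width and height, the ellipse relation can be inverted to recover the mouth area, which is the consistency the sketch preserves.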
-
Publication number: 20130325476
Abstract: An apparatus and method for generating a wave field synthesis (WFS) signal in consideration of a height of a speaker are disclosed. The WFS signal generation apparatus may include a waveform propagation distance determination unit to determine a propagation distance of a waveform propagated from a sound source based on a height of a speaker, and a WFS signal generation unit to generate a WFS signal corresponding to the speaker using the propagation distance of the waveform.
Type: Application
Filed: March 14, 2013
Publication date: December 5, 2013
Applicant: Electronics and Telecommunications Research Institute
Inventor: Electronics and Telecommunications Research Institute
-
Publication number: 20130325477
Abstract: A speech synthesis system includes: a training database storing training data, which is a set of features extracted from speech waveform data; a feature space division unit which divides a feature space associated with the training data into partial spaces; a sparse or dense state detection unit which detects a sparse or dense state for each partial space of the divided feature space, generates sparse or dense information indicating the sparse or dense state, and outputs the sparse or dense information; and a pronunciation information correcting unit which corrects pronunciation information used for speech synthesis based on the outputted sparse or dense information.
Type: Application
Filed: February 17, 2012
Publication date: December 5, 2013
Applicant: NEC Corporation
Inventors: Yasuyuki Mitsui, Reishi Kondo, Masanori Kato
-
Patent number: 8594993
Abstract: Frame mapping-based cross-lingual voice transformation may transform a target speech corpus in a particular language into a transformed target speech corpus that remains recognizable, and has the voice characteristics of a target speaker that provided the target speech corpus. A formant-based frequency warping is performed on the fundamental frequencies and the linear predictive coding (LPC) spectrums of source speech waveforms in a first language to produce transformed fundamental frequencies and transformed LPC spectrums. The transformed fundamental frequencies and the transformed LPC spectrums are then used to generate warped parameter trajectories. The warped parameter trajectories are further used to transform the target speech waveforms in the second language to produce transformed target speech waveforms with voice characteristics of the first language that nevertheless retain at least some voice characteristics of the target speaker.
Type: Grant
Filed: April 4, 2011
Date of Patent: November 26, 2013
Assignee: Microsoft Corporation
Inventors: Yao Qian, Frank Kao-Ping Soong
-
Patent number: 8583438
Abstract: Described is a technology by which synthesized speech generated from text is evaluated against a prosody model (trained offline) to determine whether the speech will sound unnatural. If so, the speech is regenerated with modified data. The evaluation and regeneration may be iterative until deemed natural sounding. For example, text is built into a lattice that is then (e.g., Viterbi) searched to find a best path. The sections (e.g., units) of data on the path are evaluated via a prosody model. If the evaluation deems a section to correspond to unnatural prosody, that section is replaced, e.g., by modifying/pruning the lattice and re-performing the search. Replacement may be iterative until all sections pass the evaluation. Unnatural prosody detection may be biased such that during evaluation, unnatural prosody is falsely detected at a higher rate relative to a rate at which unnatural prosody is missed.
Type: Grant
Filed: September 20, 2007
Date of Patent: November 12, 2013
Assignee: Microsoft Corporation
Inventors: Yong Zhao, Frank Kao-ping Soong, Min Chu, Lijuan Wang
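The evaluate-and-regenerate loop can be sketched as below. A greedy per-slot minimum stands in for the Viterbi search, and the prosody model is reduced to a caller-supplied predicate, so everything here is illustrative rather than the patented implementation.

```python
def best_path(lattice):
    # Greedy stand-in for a Viterbi search: cheapest surviving unit per slot.
    return [min(units, key=lambda u: u["cost"]) for units in lattice]

def synthesize_with_check(lattice, prosody_ok, max_iters=10):
    """Search, evaluate each unit with the prosody predicate, prune units
    that fail, and re-search until the whole path passes (or we give up)."""
    path = best_path(lattice)
    for _ in range(max_iters):
        bad = [i for i, unit in enumerate(path) if not prosody_ok(unit)]
        if not bad:
            return path                  # all sections sound natural
        for i in bad:                    # prune offending units from the lattice
            lattice[i] = [u for u in lattice[i] if u is not path[i]]
        path = best_path(lattice)        # re-perform the search
    return path
```

Note the pruning uses object identity, so only the exact rejected unit is removed from its slot before the lattice is re-searched.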
-
Patent number: 8583439
Abstract: Improved methods of presenting speech prompts to a user as part of an automated system that employs speech recognition or other voice input are described. The invention improves the user interface by providing, in combination with at least one user prompt seeking a voice response, an enhanced user keyword prompt intended to facilitate the user selecting a keyword to speak in response to the user prompt. The enhanced keyword prompts may be the same words as those a user can speak as a reply to the user prompt but presented using a different audio presentation method, e.g., speech rate, audio level, or speaker voice, than used for the user prompt. In some cases, the user keyword prompts are different words from the expected user response keywords, or portions of words, e.g., truncated versions of keywords.
Type: Grant
Filed: January 12, 2004
Date of Patent: November 12, 2013
Assignee: Verizon Services Corp.
Inventor: James Mark Kondziela
-
Patent number: 8583437
Abstract: A service architecture for providing textual information and related speech synthesis to a user terminal of a communications network, the user terminal being provided with a speech synthesis engine and a basic database of speech waveforms, includes: a content server for downloading textual information requested by means of a browser application on the user terminal; a context manager for extracting context information from the textual information requested by the user terminal; a context selector for selecting an incremental database of speech waveforms associated with extracted context information and for downloading the incremental database into the user terminal; and a database manager on the user terminal for managing the composition of an enlarged database of speech waveforms for the speech synthesis engine, including the basic and the incremental databases of speech waveforms.
Type: Grant
Filed: May 31, 2005
Date of Patent: November 12, 2013
Assignee: Telecom Italia S.p.A.
Inventors: Alessio Cervone, Ivano Salvatore Collotta, Paolo Coppo, Donato Ettorre, Maurizio Fodrini, Maura Turolla
-
Patent number: 8583443
Abstract: Disclosed is a recording and reproducing apparatus comprising: an apparatus main body; and a remote controller to perform remote control of the apparatus main body, wherein the remote controller comprises: a key operating section to receive a key operation by a user; a sound information inputting section to input sound information; and a transmitting section to transmit sound data based on the sound information to the apparatus main body, and the apparatus main body comprises: a recording section to record input content data on a recording medium; a reproducing section to reproduce the content data; a receiving section to receive the sound data; a sound information recording section to record the sound data so as to be associated with a piece of the content data; and a sound information outputting section to reproduce the sound data to output the reproduced sound data.
Type: Grant
Filed: April 10, 2008
Date of Patent: November 12, 2013
Assignee: Funai Electric Co., Ltd.
Inventor: Masayuki Misawa
-
Patent number: 8577682
Abstract: An auditory user interactive interface to an application program being installed in a computer controlled system. A routine in an object of the application program provides an auditory user interface to the program, in combination with auditory means for offering the user of the computer controlled system the auditory user interface during installation of the application program; responsive to the selection of the auditory interface, the auditory user interface is provided during installation of the application program.
Type: Grant
Filed: October 27, 2005
Date of Patent: November 5, 2013
Assignee: Nuance Communications, Inc.
Inventors: Peter T. Brunet, Anh Quy Lu, Mark Edward Nosewicz, Lawrence Frank Weiss
-
Patent number: 8571870
Abstract: Techniques for generating synthetic speech with contrastive stress. In one aspect, a speech-enabled application generates a text input including a text transcription of a desired speech output, and inputs the text input to a speech synthesis system. The synthesis system generates an audio speech output corresponding to at least a portion of the text input, with at least one portion carrying contrastive stress, and provides the audio speech output for the speech-enabled application. In another aspect, a speech-enabled application inputs a plurality of text strings, each corresponding to a portion of a desired speech output, to a software module for rendering contrastive stress. The software module identifies a plurality of audio recordings that render at least one portion of at least one of the text strings as speech carrying contrastive stress. The speech-enabled application generates an audio speech output corresponding to the desired speech output using the audio recordings.
Type: Grant
Filed: August 9, 2010
Date of Patent: October 29, 2013
Assignee: Nuance Communications, Inc.
Inventors: Darren C. Meyer, Stephen R. Springer
-
Patent number: 8571849
Abstract: Disclosed herein are systems, methods, and computer-readable media for enriching spoken language translation with prosodic information in a statistical speech translation framework. The method includes receiving speech for translation to a target language, generating pitch accent labels representing segments of the received speech which are prosodically prominent, and injecting pitch accent labels with word tokens within the translation engine to create enriched target language output text. A further step may be added of synthesizing speech in the target language based on the prosody-enriched target language output text. An automatic prosody labeler can generate pitch accent labels. An automatic prosody labeler can exploit lexical, syntactic, and prosodic information of the speech. A maximum entropy model may be used to determine which segments of the speech are prosodically prominent.
Type: Grant
Filed: September 30, 2008
Date of Patent: October 29, 2013
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Srinivas Bangalore, Vivek Kumar Rangarajan Sridhar
-
Patent number: 8571039
Abstract: A method and apparatus for transmitting an audio signal over a communication channel, comprising encoding the audio signal with an encoder 204 using a first sampling rate, filtering the audio signal using a first cut off frequency, the first cut off frequency being chosen in dependence upon the first sampling rate, and transmitting the encoded and filtered audio signal over the communication channel. The presence of a condition in which the sampling rate of the encoder 204 is to be switched to a second sampling rate at a switching time is determined, and if the condition has been determined to be present, the cut off frequency used in the filtering step is gradually changed from the first cut off frequency to a second cut off frequency, the second cut off frequency being chosen in dependence upon the second sampling rate, such that the audio bandwidth of the transmitted signal changes gradually when the sampling rate is switched to the second sampling rate.
Type: Grant
Filed: June 23, 2010
Date of Patent: October 29, 2013
Assignee: Skype
Inventors: Stefan Strommer, Karsten Vandborg Sorensen, Soren Skak Jensen, Koen Vos, Jon Bergenheim
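The gradual cut-off change can be illustrated with a simple per-frame ramp. The linear schedule and the example frequencies are assumptions; the patent does not specify the interpolation.

```python
def cutoff_schedule(f_start_hz, f_end_hz, n_frames):
    """Per-frame cut-off frequencies moving gradually from f_start_hz to
    f_end_hz, instead of jumping when the sampling rate switches."""
    if n_frames <= 1:
        return [float(f_end_hz)]
    step = (f_end_hz - f_start_hz) / (n_frames - 1)
    return [f_start_hz + i * step for i in range(n_frames)]
```

Applying this schedule to the encoder's low-pass filter makes the audio bandwidth change smoothly across the sampling-rate switch, which is the behavior the abstract describes.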
-
Patent number: 8566106
Abstract: A method and device for searching an algebraic codebook during encoding of a sound signal, wherein the algebraic codebook comprises a set of codevectors formed of a number of pulse positions and a number of pulses distributed over the pulse positions. In the algebraic codebook searching method and device, a reference signal for use in searching the algebraic codebook is calculated. In a first stage, a position of a first pulse is determined in relation with the reference signal and among the number of pulse positions. In each of a number of stages subsequent to the first stage, (a) an algebraic codebook gain is recomputed, (b) the reference signal is updated using the recomputed algebraic codebook gain, and (c) a position of another pulse is determined in relation with the updated reference signal and among the number of pulse positions.
Type: Grant
Filed: September 11, 2008
Date of Patent: October 22, 2013
Assignee: Voiceage Corporation
Inventors: Redwan Salami, Vaclav Eksler, Milan Jelinek
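A toy version of the staged search might look like this. The magnitude-based position criterion and the averaged gain recomputation are simplified stand-ins for the actual codebook math, chosen only to show the stage structure (place a pulse, recompute the gain, update the reference, repeat).

```python
def search_pulses(reference, n_pulses):
    """Place pulses one per stage; after each placement, recompute a gain
    and rebuild the reference with the pulses' contribution removed."""
    ref = list(reference)
    positions, signs = [], []
    for _ in range(n_pulses):
        pos = max(range(len(ref)), key=lambda i: abs(ref[i]))  # best position this stage
        sign = 1.0 if ref[pos] >= 0 else -1.0
        positions.append(pos)
        signs.append(sign)
        # Recompute a simple gain over all pulses found so far ...
        gain = sum(reference[p] * s for p, s in zip(positions, signs)) / len(positions)
        # ... then update the reference by removing their contribution.
        ref = list(reference)
        for p, s in zip(positions, signs):
            ref[p] -= gain * s
    return positions, signs
```

Because the reference is updated between stages, later pulses are steered toward the residual rather than re-selecting positions already covered.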
-
Patent number: 8566099
Abstract: A system and method for improving the response time of text-to-speech synthesis using triphone contexts. The method includes identifying a set of triphone sequences and tabulating the set of triphone sequences using a plurality of contexts, where each context-specific triphone sequence has top N triphone units, made of the triphone units having the lowest target costs when each triphone unit is individually combined into a 5-phoneme combination. Input texts having one of the contexts are received, and one of the context-specific triphone sequences is selected based on the context. Input text is then synthesized using the context-specific triphone sequence.
Type: Grant
Filed: July 16, 2012
Date of Patent: October 22, 2013
Assignee: AT&T Intellectual Property II, L.P.
Inventor: Alistair D. Conkie
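The pre-tabulation idea reduces to building a per-context top-N table once, offline, and doing cheap lookups at synthesis time. This sketch uses made-up unit names and costs for illustration:

```python
def build_table(candidates_by_context, n_best=3):
    """candidates_by_context: {context: [(unit_id, target_cost), ...]}.
    Keep only the n_best units per context, ranked by target cost."""
    return {
        ctx: sorted(units, key=lambda u: u[1])[:n_best]
        for ctx, units in candidates_by_context.items()
    }

def select_units(table, context):
    # Synthesis-time selection is a table lookup, not a full search.
    return [unit_id for unit_id, _ in table[context]]
```

The response-time gain comes from moving the cost ranking out of the synthesis path entirely.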
-
Patent number: 8566098
Abstract: A system and method are disclosed for synthesizing speech based on a selected speech act. A method includes modifying synthesized speech of a spoken dialogue system by (1) receiving a user utterance, (2) analyzing the user utterance to determine an appropriate speech act, and (3) generating a response of a type associated with the appropriate speech act, wherein linguistic variables in the response are selected based on the appropriate speech act.
Type: Grant
Filed: October 30, 2007
Date of Patent: October 22, 2013
Assignee: AT&T Intellectual Property I, L.P.
Inventors: Ann K Syrdal, Mark Beutnagel, Alistair D Conkie, Yeon-Jun Kim
-
Patent number: 8566078
Abstract: A method of generating a statistical machine translation database through a game in which a monolingual structure is provided to a plurality of players. A first translation attempt is received from each of the plurality of players. The first translation attempt from each of the plurality of players is compared. Feedback is provided to each of the plurality of players, and the attempts are received and compared to provide feedback to iteratively converge subsequent translations from each of the plurality of players into a final translated structure.
Type: Grant
Filed: January 29, 2010
Date of Patent: October 22, 2013
Assignee: International Business Machines Corporation
Inventors: Ruhi Sarikaya, Jiri Navratil, Osamuyimen Stewart, David Lubensky
-
Patent number: 8560301
Abstract: A language expression apparatus and method based on context and intent awareness are provided. The apparatus and method may recognize a context and an intent of a user and may generate a language expression based on the recognized context and the recognized intent, thereby providing an interpretation/translation service and/or an education service for learning a language.
Type: Grant
Filed: March 2, 2010
Date of Patent: October 15, 2013
Assignee: Samsung Electronics Co., Ltd.
Inventor: Yeo Jin Kim
-
Patent number: 8560315
Abstract: A conference support device includes an image receiving portion that receives captured images from conference terminals, a voice receiving portion that receives, from one of the conference terminals, a voice that is generated by a first participant, a first storage portion that stores the captured images and the voice, a voice recognition portion that recognizes the voice, a text data creation portion that creates text data that express the words that are included in the voice, an addressee specification portion that specifies a second participant, whom the voice is addressing, an image creation portion that creates a display image that is configured from the captured images and in which the text data are associated with the first participant and a specified image is associated with at least one of the first participant and the second participant, and a transmission portion that transmits the display image to the conference terminals.
Type: Grant
Filed: March 12, 2010
Date of Patent: October 15, 2013
Assignee: Brother Kogyo Kabushiki Kaisha
Inventor: Mizuho Yasoshima
-
Patent number: 8560317
Abstract: A vocabulary dictionary storing unit for storing a plurality of words in advance, a vocabulary dictionary managing unit for extracting recognition target words, a matching unit for calculating a degree of matching with the recognition target words based on an accepted voice, a result output unit for outputting, as a recognition result, a word having a best score from a result of calculating the degree of matching, and an extraction criterion information managing unit for changing extraction criterion information according to a result of monitoring by a monitor control unit are provided. The vocabulary dictionary storing unit further includes a scale information storing unit for storing scale information serving as a scale at the time of extracting the recognition target words, and an extraction criterion information storing unit for storing extraction criterion information indicating a criterion of the recognition target words at the time of extracting the recognition target words.
Type: Grant
Filed: September 18, 2006
Date of Patent: October 15, 2013
Assignee: Fujitsu Limited
Inventor: Kenji Abe
-
Patent number: 8554541
Abstract: A virtual pet system includes: a virtual pet client, adapted to receive a sentence in natural language and send the sentence to a Q&A server; and the Q&A server, adapted to receive the sentence, process the sentence through natural language comprehension, generate an answer in natural language based on a result of natural language comprehension and reasoning knowledge, and send the answer in natural language to the virtual pet client. A method for virtual pet chatting includes: receiving a sentence in natural language, performing natural language comprehension on the sentence, and generating an answer in natural language based on a result of natural language comprehension and reasoning knowledge.
Type: Grant
Filed: September 18, 2008
Date of Patent: October 8, 2013
Assignee: Tencent Technology (Shenzhen) Company Ltd.
Inventors: Haisong Yang, Zhiyuan Liu, Yunfeng Liu, Rongling Yu
-
Patent number: 8554566
Abstract: Techniques for training and applying prosody models for speech synthesis are provided. A speech recognition engine processes audible speech to produce text annotated with prosody information. A prosody model is trained with this annotated text. After initial training, the model is applied during speech synthesis to generate speech with non-standard prosody from input text. Multiple prosody models can be used to represent different prosody styles.
Type: Grant
Filed: November 29, 2012
Date of Patent: October 8, 2013
Assignee: Morphism LLC
Inventor: James H. Stephens, Jr.
-
Patent number: 8554565
Abstract: According to one embodiment, a speech synthesizer generates a speech segment sequence and synthesizes speech by connecting speech segments of the generated speech segment sequence. If a speech segment of a synthesized first speech segment sequence is different from the speech segment of a synthesized second speech segment sequence having the same synthesis unit as the first speech segment sequence, the speech synthesizer disables the speech segment of the first speech segment sequence that is different from the speech segment of the second speech segment sequence.
Type: Grant
Filed: September 14, 2010
Date of Patent: October 8, 2013
Assignee: Kabushiki Kaisha Toshiba
Inventors: Osamu Nishiyama, Takehiko Kagoshima
-
Patent number: 8548809
Abstract: A voice guidance system for providing guidance by voice concerning operations of an information processing apparatus comprises a detector that detects that a predetermined function of the information processing apparatus is disabled, and a voice guidance unit that outputs a voice message reporting a reason why the predetermined function of the information processing apparatus is disabled, in response to the detection output of the detector.
Type: Grant
Filed: June 16, 2005
Date of Patent: October 1, 2013
Assignee: Fuji Xerox Co., Ltd.
Inventors: Kanji Itaki, Michihiro Kawamura, Nozomi Noguchi
-
Publication number: 20130253934
Abstract: A method of providing user participation in a social broadcast environment is disclosed. A network communication is received from a user of a broadcast that includes preference data indicating a preference of the user that a promoted content be included in the broadcast. Via a responsive network communication, feedback data is provided to the user that includes a predicted future time at which the promoted content may be included in the broadcast.
Type: Application
Filed: January 31, 2013
Publication date: September 26, 2013
Applicant: JELLI, INC.
Inventors: Jateen P. Parekh, Michael S. Dougherty, Sarah Caplener, Mitchell A. Yawitz, Scott Strain, Adam J. Dobrer
-
Patent number: 8542839
Abstract: An audio processing apparatus and method for a mobile device are provided. The audio processing apparatus and method may appropriately determine sound source localizations corresponding to a voice signal and an audio signal, and thereby may simultaneously provide a voice call service and a multimedia service. Also, the audio processing apparatus and method may guarantee quality of the voice call service even when simultaneously providing the voice call service and the multimedia service.
Type: Grant
Filed: March 18, 2009
Date of Patent: September 24, 2013
Assignee: Samsung Electronics Co., Ltd.
Inventors: Chang Yong Son, Do Hyung Kim, Sang Oak Woo, Kang Eun Lee
-
Patent number: 8538743
Abstract: A software language including language constructs for disambiguating text that is to be converted to speech using configurable lexeme based rules. The language can include at least one conditional statement and a significance indicator. The conditional statement can define a sense of usage for a lexeme. The significance indicator can define a criteria for selecting an associated sense of usage. The language can also include an action expression that is associated with a conditional statement that defines a set of programmatic actions to be executed upon a selection of the associated usage sense. The conditional statement can include a context range specification that defines a scope of an input string for examination when evaluating the conditional statement. Further, the conditional statement can include a directive that represents a defined condition of the lexeme or the text surrounding the lexeme.
Type: Grant
Filed: March 21, 2007
Date of Patent: September 17, 2013
Assignee: Nuance Communications, Inc.
Inventors: Oswaldo Gago, Steven M. Hancock, Maria E. Smith
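The rule structure described (sense of usage, significance indicator, context range, condition) can be sketched as a small selection function. The rule encoding and the example senses are invented for illustration; the patented language is a richer construct with action expressions and directives.

```python
def pick_sense(tokens, index, rules, default):
    """rules: [(sense, significance, context_range, predicate)].
    Each predicate sees only the tokens inside the context window
    around tokens[index]; the highest-significance match wins."""
    matches = []
    for sense, significance, ctx_range, predicate in rules:
        lo = max(0, index - ctx_range)               # context range limits the
        hi = min(len(tokens), index + ctx_range + 1) # scope of examination
        if predicate(tokens[lo:hi]):
            matches.append((significance, sense))
    return max(matches)[1] if matches else default
```

For a lexeme like "bass", a rule conditioned on nearby musical vocabulary can outrank a lower-significance fishing sense.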
-
Publication number: 20130238337
Abstract: A voice quality conversion system includes: an analysis unit which analyzes sounds of plural vowels of different types to generate first vocal tract shape information for each type of the vowels; a combination unit which combines, for each type of the vowels, the first vocal tract shape information on that type of vowel and the first vocal tract shape information on a different type of vowel to generate second vocal tract shape information on that type of vowel; and a synthesis unit which (i) combines vocal tract shape information on a vowel included in input speech and the second vocal tract shape information on the same type of vowel to convert vocal tract shape information on the input speech, and (ii) generates a synthetic sound using the converted vocal tract shape information and voicing source information on the input speech to convert the voice quality of the input speech.
Type: Application
Filed: April 29, 2013
Publication date: September 12, 2013
Applicant: Panasonic Corporation
Inventors: Takahiro KAMAI, Yoshifumi HIROSE
-
Patent number: 8527273
Abstract: Systems and methods for identifying the N-best strings of a weighted automaton. A potential for each state of an input automaton to a set of destination states of the input automaton is first determined. Then, the N-best paths are found in the result of an on-the-fly determinization of the input automaton. Only the portion of the input automaton needed to identify the N-best paths is determinized. As the input automaton is determinized, a potential for each new state of the partially determinized automaton is determined and is used in identifying the N-best paths of the determinized automaton, which correspond exactly to the N-best strings of the input automaton.
Type: Grant
Filed: July 30, 2012
Date of Patent: September 3, 2013
Assignee: AT&T Intellectual Property II, L.P.
Inventors: Mehryar Mohri, Michael Dennis Riley
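Setting the on-the-fly determinization and state potentials aside, the core N-best enumeration over a small weighted acyclic automaton can be sketched with a priority queue. The graph encoding below is an assumption made for illustration; the patented method additionally uses potentials to guide the search and determinizes lazily so N-best paths correspond to N-best strings.

```python
import heapq

def n_best_paths(graph, start, final, n):
    """graph: {state: [(next_state, label, weight), ...]}, assumed acyclic.
    Returns up to n (total_weight, label_string) pairs, cheapest first."""
    results = []
    heap = [(0.0, start, "")]
    while heap and len(results) < n:
        weight, state, labels = heapq.heappop(heap)
        if state == final:
            results.append((weight, labels))  # paths pop in increasing weight
            continue
        for nxt, label, w in graph.get(state, []):
            heapq.heappush(heap, (weight + w, nxt, labels + label))
    return results
```

Because the heap always pops the cheapest partial path, final states are reached in nondecreasing total weight, so the first n completed paths are the n best.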
-
Patent number: 8527276
Abstract: A method and system are disclosed for speech synthesis using deep neural networks. A neural network may be trained to map input phonetic transcriptions of training-time text strings into sequences of acoustic feature vectors, which yield predefined speech waveforms when processed by a signal generation module. The training-time text strings may correspond to written transcriptions of speech carried in the predefined speech waveforms. Subsequent to training, a run-time text string may be translated to a run-time phonetic transcription, which may include a run-time sequence of phonetic-context descriptors, each of which contains a phonetic speech unit, data indicating phonetic context, and data indicating time duration of the respective phonetic speech unit. The trained neural network may then map the run-time sequence of the phonetic-context descriptors to run-time predicted feature vectors, which may in turn be translated into synthesized speech by the signal generation module.
Type: Grant
Filed: October 25, 2012
Date of Patent: September 3, 2013
Assignee: Google Inc.
Inventors: Andrew William Senior, Byungha Chun, Michael Schuster
-
Patent number: 8527258
Abstract: A simultaneous interpretation system includes first and second headsets for inputting and outputting voice, and a portable terminal for receiving an original language voice speech signal to be interpreted that is output from the first headset. The portable terminal outputs an interpreted voice speech signal, based on the original language voice speech signal that has been interpreted into a different language, to the second headset. The portable terminal either performs the interpretation or accesses an interpretation server to provide the second headset with the interpreted voice speech signal. Hence, the simultaneous interpretation is carried out using short-range communication between the users by medium of the single portable terminal, and thus more efficient and unrestricted conversation is realized.
Type: Grant
Filed: February 1, 2010
Date of Patent: September 3, 2013
Assignee: Samsung Electronics Co., Ltd.
Inventors: Kyoung-Yup Kim, Jun-Tai Kim
-
Patent number: 8527275
Abstract: A contextual input device includes a plurality of tactually discernable keys disposed in a predetermined configuration which replicates a particular relationship among a plurality of items associated with a known physical object. The tactually discernable keys are typically labeled with Braille type. The known physical object is typically a collection of related items grouped together by some common relationship. A computer-implemented process determines whether an input signal represents a selection of an item from among a plurality of items or an attribute pertaining to an item among the plurality of items. Once the selected item or attribute pertaining to an item is determined, the computer-implemented process transforms a user's selection from the input signal into an analog audio signal which is then audibly output as human speech with an electro-acoustic transducer.
Type: Grant
Filed: July 17, 2009
Date of Patent: September 3, 2013
Assignee: Cal Poly Corporation
Inventors: Fantin Dennis, C. Arthur MacCarley
-
Patent number: 8527283
Abstract: A method (100) includes receiving (101) an input digital audio signal comprising a narrow-band signal. The input digital audio signal is processed (102) to generate a processed digital audio signal. An estimate of the high-band energy level corresponding to the input digital audio signal is determined (103). The estimated high-band energy level is modified based on an estimation accuracy and/or narrow-band signal characteristics (104). A high-band digital audio signal is generated based on the modified estimate of the high-band energy level and an estimated high-band spectrum corresponding to the modified estimate of the high-band energy level (105).
Type: Grant
Filed: January 19, 2011
Date of Patent: September 3, 2013
Assignee: Motorola Mobility LLC
Inventors: Mark A. Jasiuk, Tenkasi V. Ramabadran
-
Patent number: 8527281
Abstract: Methods and systems for sculpting synthesized speech using a graphic user interface are disclosed. An operator enters a stream of text that is used to produce a stream of target phonetic-units. The stream of target phonetic-units is then submitted to a unit-selection process to produce a stream of selected phonetic-units, each selected phonetic-unit derived from a database of sample phonetic-units. After the stream of sample phonetic-units is selected, an operator can remove various selected phonetic-units from the stream of selected phonetic-units, prune the sample phonetic database, and edit various cost functions using the graphic user interface. The edited speech information can then be submitted to the unit-selection process to produce a second stream of selected phonetic-units.
Type: Grant
Filed: June 29, 2012
Date of Patent: September 3, 2013
Assignee: Nuance Communications, Inc.
Inventors: Peter Rutten, Paul A. Taylor
-
Patent number: 8521535
Abstract: A biochemical analyzer having a microprocessing apparatus with expandable voice capacity is characterized in that a driving module is installed in a data processor and a voice carrier is replaceable. Thereby, increase or decrease of voice files can be easily done by replacing the current voice carrier with an alternative voice carrier storing desired voice files, without the need of replacing the driving module together with the voice carrier, thereby saving costs and reducing processing procedures.
Type: Grant
Filed: November 10, 2010
Date of Patent: August 27, 2013
Inventor: Chun-Yu Chen
-
Patent number: 8521513Abstract: A language-neutral speech grammar extensible markup language (GRXML) document and a localized response document are used to build a localized GRXML document. The language-neutral GRXML document specifies an initial grammar rule element. The initial grammar rule element specifies a given response type identifier and a given action. The localized response document contains a given response entry that specifies the given response type identifier and a given response in a given language. The localized GRXML document specifies a new grammar rule element. The new grammar rule element specifies the given response in the given language and the given action. The localized GRXML document is installed in an interactive voice response (IVR) system. The localized GRXML document configures the IVR system to perform the given action when a user of the IVR system speaks the given response to the IVR system.Type: GrantFiled: March 12, 2010Date of Patent: August 27, 2013Assignee: Microsoft CorporationInventors: Thomas W. Millett, David Notario
-
Patent number: 8515759Abstract: An apparatus for synthesizing a rendered output signal having a first audio channel and a second audio channel includes a decorrelator stage for generating a decorrelated signal based on a downmix signal, and a combiner for performing a weighted combination of the downmix signal and the decorrelated signal based on parametric audio object information, downmix information and target rendering information. The combiner solves the problem of optimally combining matrixing with decorrelation for high-quality stereo scene reproduction of a number of individual audio objects using a multichannel downmix.Type: GrantFiled: April 23, 2008Date of Patent: August 20, 2013Assignee: Dolby International ABInventors: Jonas Engdegard, Heiko Purnhagen, Barbara Resch, Lars Villemoes, Cornelia Falch, Juergen Herre, Johannes Hilpert, Andreas Hoelzer, Leonid Terentiev
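The weighted combination of a downmix with a decorrelated copy can be sketched as below. The short-delay "decorrelator" and the per-channel gain pairs are placeholder choices; a real system derives the gains from the object, downmix, and rendering information and uses all-pass or reverberant decorrelators:

```python
def decorrelate(downmix, delay=3):
    """Toy decorrelator: a short delay. (Real systems use all-pass
    filters or reverberators to decorrelate without coloring.)"""
    return [0.0] * delay + downmix[:-delay]

def render_stereo(downmix, direct_gain, decorr_gain):
    """Weighted combination of the downmix and its decorrelated
    version, one (left, right) gain pair per signal path."""
    wet = decorrelate(downmix)
    left = [direct_gain[0] * d + decorr_gain[0] * w for d, w in zip(downmix, wet)]
    right = [direct_gain[1] * d + decorr_gain[1] * w for d, w in zip(downmix, wet)]
    return left, right
```

Opposite-signed decorrelator gains on the two channels, as in the test values below, widen the perceived stereo image.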
-
Patent number: 8515749Abstract: Systems and methods for facilitating communication include recognizing speech in a first language represented in a first audio signal; forming a first text representation of the speech; processing the first text representation to form data representing a second audio signal; and causing presentation of the second audio signal to a second user while remaining responsive to an interrupt signal from a first user. In some embodiments, processing the first text representation includes translating the first text representation to a second text representation in a second language and processing the second text representation to form the data representing the second audio signal. Some embodiments include accepting an interrupt signal from the first user and interrupting the presentation of the second audio signal.Type: GrantFiled: May 20, 2009Date of Patent: August 20, 2013Assignee: Raytheon BBN Technologies Corp.Inventor: David G. Stallard
-
Patent number: 8510113Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, identifying replacement segments in a secondary speech database, enhancing the primary speech database by substituting the identified secondary speech database segments for the corresponding identified segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.Type: GrantFiled: August 31, 2006Date of Patent: August 13, 2013Assignee: AT&T Intellectual Property II, L.P.Inventors: Alistair Conkie, Ann K. Syrdal
-
Patent number: 8510112Abstract: A system, method and computer readable medium that enhances a speech database for speech synthesis is disclosed. The method may include labeling audio files in a primary speech database, identifying segments in the labeled audio files that have varying pronunciations based on language differences, modifying the identified segments in the primary speech database using selected mappings, enhancing the primary speech database by substituting the modified segments for the corresponding identified database segments in the primary speech database, and storing the enhanced primary speech database for use in speech synthesis.Type: GrantFiled: August 31, 2006Date of Patent: August 13, 2013Assignee: AT&T Intellectual Property II, L.P.Inventors: Alistair Conkie, Ann Syrdal
-
Patent number: 8504368Abstract: A synthetic speech text-input device is provided that allows a user to intuitively know an amount of an input text that can be fit in a desired duration. A synthetic speech text-input device 1 includes: an input unit that receives a set duration in which a speech to be synthesized is to be fit, and a text for a synthetic speech; a text amount calculation unit that calculates an acceptable text amount based on the set duration received by the input unit, the acceptable text amount being an amount of a text acceptable as a synthetic speech of the set duration; and a text amount output unit that outputs the acceptable text amount calculated by the text amount calculation unit, when the input unit receives the text.Type: GrantFiled: September 10, 2010Date of Patent: August 6, 2013Assignee: Fujitsu LimitedInventors: Nobuyuki Katae, Kentaro Murase
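The core calculation in the preceding abstract (how much text fits in a requested duration) reduces to a rate conversion. The 7.5 characters-per-second rate below is an assumed constant; a real implementation would derive it from the synthesizer's actual speaking rate:

```python
def acceptable_text_amount(set_duration_s, chars_per_second=7.5):
    """Return how many characters of input text fit in the requested
    synthetic-speech duration, given an assumed speaking rate."""
    return int(set_duration_s * chars_per_second)
```

The device would display this figure as the user types, so the operator knows when the entered text exceeds the set duration.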
-
Patent number: 8498860Abstract: A modulation device including: a modulation unit for modulating a carrier in an audible sound range by an encoded transmission signal to generate a modulated signal; a masker sound generation unit for generating a masker signal, outputted as a masker sound, that makes the modulated signal harder to hear when transmitted with it; and an acoustic signal generation unit for inserting the masker signal in the modulated signal to generate an acoustic signal.Type: GrantFiled: October 2, 2006Date of Patent: July 30, 2013Assignee: NTT DoCoMo, Inc.Inventor: Hosei Matsuoka
-
Patent number: 8498867Abstract: Disclosed are techniques and systems to provide a narration of a text in multiple different voices. Further disclosed are techniques and systems for generating an audible output in which different portions of a text are narrated using voice models associated with different characters.Type: GrantFiled: January 14, 2010Date of Patent: July 30, 2013Assignee: K-NFB Reading Technology, Inc.Inventors: Raymond C. Kurzweil, Paul Albrecht, Peter Chapman
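A minimal sketch of the per-character voice assignment described above, assuming the text has already been segmented and tagged by speaker (the tagging itself is the hard part and is not shown); the `narrator` fallback and the voice-table shape are hypothetical:

```python
def narrate(segments, voices, default_voice="narrator"):
    """Assign a voice model to each tagged text segment.

    segments: list of (character, text) pairs; characters missing
    from the voices table fall back to the default voice.
    Returns a narration plan of (voice_model, text) pairs.
    """
    plan = []
    for character, text in segments:
        plan.append((voices.get(character, default_voice), text))
    return plan
```

Each (voice_model, text) pair would then be handed to the TTS engine in sequence to produce the audible output.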
-
Patent number: 8498866Abstract: Disclosed are techniques and systems to provide a narration of a text in multiple different languages where the portions of the text narrated using the different voices associated with different languages are selected by a user.Type: GrantFiled: January 14, 2010Date of Patent: July 30, 2013Assignee: K-NFB Reading Technology, Inc.Inventors: Raymond C. Kurzweil, Paul Albrecht, Peter Chapman
-
Patent number: 8494849Abstract: A method of transmitting speech data to a remote device in a distributed speech recognition system, includes the steps of: dividing an input speech signal into frames; calculating, for each frame, a voice activity value representative of the presence of speech activity in the frame; grouping the frames into multiframes, each multiframe including a predetermined number of frames; calculating, for each multiframe, a voice activity marker representative of the number of frames in the multiframe representing speech activity; and selectively transmitting, on the basis of the voice activity marker associated with each multiframe, the multiframes to the remote device.Type: GrantFiled: June 20, 2005Date of Patent: July 23, 2013Assignee: Telecom Italia S.p.A.Inventors: Ivano Salvatore Collotta, Donato Ettorre, Maurizio Fodrini, Pierluigi Gallo, Roberto Spagnolo
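The grouping-and-marking scheme above can be sketched directly from the claim language. The frame count per multiframe, the `is_voiced` predicate, and the transmission threshold are all assumptions here; in the patent the marker is computed per the system's own voice activity values:

```python
def select_multiframes(frames, is_voiced, frames_per_multiframe=4, min_voiced=1):
    """Group frames into multiframes, attach a voice-activity marker
    (the count of voiced frames in the group), and keep only the
    multiframes whose marker reaches the transmission threshold."""
    kept = []
    for i in range(0, len(frames), frames_per_multiframe):
        group = frames[i:i + frames_per_multiframe]
        marker = sum(1 for frame in group if is_voiced(frame))
        if marker >= min_voiced:
            kept.append((marker, group))
    return kept
```

Multiframes falling below the threshold are simply never sent, which is how the scheme saves bandwidth during silence.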
-
Patent number: 8489400Abstract: Disclosed herein are methods for presenting speech from a selected text that is on a computing device. The method includes presenting text on a touch-sensitive display at a size within a threshold level, so that the computing device can accurately determine the user's intent when the user touches the touch screen. Once the touch has been received, the computing device identifies and interprets the portion of text to be selected, and subsequently presents that text audibly to the user.Type: GrantFiled: August 6, 2012Date of Patent: July 16, 2013Assignee: AT&T Intellectual Property I, L.P.Inventors: Alistair D. Conkie, Horst Schroeter
-
Patent number: 8484027Abstract: A method for narrating a digital book includes retrievably storing first data relating to narration of the digital book by a first end-user. The first data is then provided to a user device having stored thereon the digital book. Subsequently, the digital book is presented in narrated form to a second end-user via the user device. In particular, the digital book is displayed via a display portion of the user device while simultaneously providing in audible form the first data via an audio output portion of the user device.Type: GrantFiled: June 10, 2010Date of Patent: July 9, 2013Assignee: Skyreader Media Inc.Inventor: William A. Murphy
-
Patent number: 8484026Abstract: A portable audio control system that controls an audio signal transmitted from an electronic device, including an earphone device and an audio control device. The audio control device includes an audio source receiver, a signal synthesis module, and an audio output unit. The audio source receiver, which is connected with the electronic device, is used for receiving the audio signal. The signal synthesis module receives both the audio signal and a voice signal coming from an external audio source, and then synthesizes those signals. The audio output unit outputs the synthesized sound to the earphone device. When users use the portable audio control system to connect with the electronic device, both the sound from the electronic device and the external voice or song can be heard at the same time.Type: GrantFiled: August 24, 2009Date of Patent: July 9, 2013Inventor: Pi-Fen Lin
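Synthesizing the device audio with an external voice signal is, at its simplest, a sample-wise weighted sum; the gain parameters and zero-padding policy below are illustrative, not the patent's circuit:

```python
def mix(device_audio, external_audio, device_gain=1.0, external_gain=1.0):
    """Sample-wise weighted sum of the device audio and an external
    voice signal; the shorter stream is zero-padded to match."""
    n = max(len(device_audio), len(external_audio))
    a = device_audio + [0.0] * (n - len(device_audio))
    b = external_audio + [0.0] * (n - len(external_audio))
    return [device_gain * x + external_gain * y for x, y in zip(a, b)]
```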
-
Patent number: 8484028Abstract: A system for visually navigating a document in conjunction with a text-to-speech ("TTS") engine presents a visual display of a region of interest that is related to the text of the document that is being audibly presented as speech to a user. When the TTS engine converts the text to speech and presents the speech to the user, the system presents the corresponding section of text on a display. During the presentation, if the system encounters a linked section of text, the visual display changes to display a linked region of interest that corresponds to the linked section of text.Type: GrantFiled: October 24, 2008Date of Patent: July 9, 2013Assignee: Fuji Xerox Co., Ltd.Inventors: Scott Carter, Laurent Denoue
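Keeping the display in sync with the spoken text amounts to mapping the TTS engine's current character position to a region of interest. The `(start, end, region_id)` layout below is a hypothetical data structure chosen for illustration, not the patent's:

```python
def region_for_position(regions, char_pos):
    """Given (start, end, region_id) character ranges for a document,
    return the region of interest covering the character the TTS
    engine is currently speaking, or None if no region applies."""
    for start, end, region_id in regions:
        if start <= char_pos < end:
            return region_id
    return None
```

Linked sections would simply map to a different `region_id`, so the display switches regions as the narration crosses the link boundary.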
-
Patent number: 8484035Abstract: A method of altering a social signaling characteristic of a speech signal. A statistically large number of speech samples created by different speakers in different tones of voice are evaluated to determine one or more relationships that exist between a selected social signaling characteristic and one or more measurable parameters of the speech samples. An input audio voice signal is then processed in accordance with these relationships to modify one or more controllable parameters of the input audio voice signal to produce a modified output audio voice signal in which the selected social signaling characteristic is modified. In a specific illustrative embodiment, a two-level hidden Markov model is used to identify voiced and unvoiced speech segments, and selected controllable characteristics of these speech segments are modified to alter the desired social signaling characteristic.Type: GrantFiled: September 6, 2007Date of Patent: July 9, 2013Assignee: Massachusetts Institute of TechnologyInventor: Alex Paul Pentland
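Identifying voiced and unvoiced segments is the first step the embodiment above describes. The energy-threshold classifier below is a deliberately crude stand-in for the patent's two-level hidden Markov model, shown only to make the segmentation output concrete; the threshold value is an assumption:

```python
def segment_voiced(frames, energy_threshold=0.1):
    """Crude voiced/unvoiced labeling by mean-square frame energy
    (a stand-in for a two-level HMM, which would also model state
    transitions). Returns runs as (label, start, end) triples."""
    labels = ["V" if sum(s * s for s in f) / len(f) > energy_threshold else "U"
              for f in frames]
    runs, start = [], 0
    for i in range(1, len(labels) + 1):
        # Close a run whenever the label changes or the input ends.
        if i == len(labels) or labels[i] != labels[start]:
            runs.append((labels[start], start, i))
            start = i
    return runs
```

Per-segment parameters (pitch, energy contour, timing) would then be modified on the voiced runs to shift the social signaling characteristic.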