Systems Using Speech Synthesizers (EPO) Patents (Class 704/E13.008)
-
Patent number: 11862169
Abstract: Providing speech-to-text (STT) transcription by a user endpoint device includes initiating an audio communication between an enterprise server and the user endpoint device, the audio communication comprising a voice interaction between a user associated with the user endpoint device and an agent associated with an agent device to which the enterprise server routes the audio communication; performing a first STT of at least a portion of the voice interaction to produce a first transcribed speech in a first language; concurrent with performing the first STT, performing, by the user endpoint device, a second STT of the at least the portion of the voice interaction to produce a second transcribed speech in a second language different from the first language; and transmitting the at least the portion of the voice interaction and at least the first transcribed speech from the user endpoint device to the enterprise server.
Type: Grant
Filed: September 11, 2020
Date of Patent: January 2, 2024
Assignee: Avaya Management L.P.
Inventors: Valentine C. Matula, Pushkar Yashavant Deole, Sandesh Chopdekar, Navin Daga
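The distinguishing step here is the endpoint running a second transcription pass, in a different language, concurrently with the first. A minimal concurrency sketch, assuming a placeholder `transcribe` function standing in for a real STT engine; the function names and language tags are illustrative, not Avaya's implementation.

```python
import threading

def transcribe(audio: bytes, language: str) -> str:
    """Stand-in for a real STT engine call (assumed, for illustration)."""
    return f"<transcript of {len(audio)} bytes in {language}>"

def dual_stt(audio: bytes, lang_a: str, lang_b: str) -> dict:
    """Run two STT passes over the same audio concurrently."""
    results = {}

    def worker(lang: str) -> None:
        results[lang] = transcribe(audio, lang)

    threads = [threading.Thread(target=worker, args=(lang,))
               for lang in (lang_a, lang_b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

print(dual_stt(b"\x00" * 16000, "en-US", "hi-IN"))
```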
-
Publication number: 20140058733
Abstract: The amount of speech output to a blind or low-vision user using a screen reader application is automatically adjusted based on how the user navigates to a control in a graphical user interface. Navigation by mouse presumes the user has greater knowledge of the identity of the control than navigation by tab keystroke, which is more indicative of a user searching for a control. In addition, accelerator keystrokes indicate a higher level of specificity to set focus on a control, and thus less verbosity is required to sufficiently inform the screen reader user.
Type: Application
Filed: August 23, 2012
Publication date: February 27, 2014
Applicant: Freedom Scientific, Inc.
Inventors: Garald Lee Voorhees, Glen Gordon, Eric Damery
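The abstract reduces to a mapping from navigation method to verbosity level. A toy sketch of that policy; the `Nav` names, field lists, and `announcement` helper are assumptions for illustration, not Freedom Scientific's API.

```python
from enum import Enum

class Nav(Enum):
    MOUSE = "mouse"              # user pointed directly at the control
    TAB = "tab"                  # user is searching, control by control
    ACCELERATOR = "accelerator"  # user jumped straight to the control

# Illustrative policy: less certain navigation -> more spoken detail.
VERBOSITY = {
    Nav.TAB: ("name", "role", "state", "shortcut", "help_text"),
    Nav.MOUSE: ("name", "role", "state"),
    Nav.ACCELERATOR: ("name", "state"),
}

def announcement(control: dict, nav: Nav) -> str:
    """Build the screen-reader utterance for a newly focused control."""
    parts = [str(control[f]) for f in VERBOSITY[nav] if control.get(f)]
    return ", ".join(parts)

print(announcement(
    {"name": "Save", "role": "button", "state": "enabled", "shortcut": "Alt+S"},
    Nav.TAB))
```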
-
Publication number: 20130238340
Abstract: Methods and apparatuses for wearing-state device operation are disclosed. In one example, a headset includes a sensor for detecting a headset donned state or a headset doffed state. The headset operation is modified based on whether the headset is donned or doffed.
Type: Application
Filed: March 9, 2012
Publication date: September 12, 2013
Applicant: Plantronics, Inc.
Inventor: Scott Walsh
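The donned/doffed behavior is a two-state machine driven by a sensor. A toy sketch; the audio-routing policy shown is an invented example of "operation modified based on wearing state", not Plantronics' actual behavior.

```python
class Headset:
    """Toy model of wearing-state-dependent headset operation."""

    def __init__(self) -> None:
        self.donned = False

    def on_sensor(self, donned: bool) -> None:
        """React only to transitions reported by the wearing-state sensor."""
        if donned != self.donned:
            self.donned = donned
            # Example policy: route audio to the headset only while worn.
            print("route audio to headset" if donned
                  else "route audio to phone speaker")

h = Headset()
h.on_sensor(True)   # user puts the headset on
h.on_sensor(False)  # user takes it off
```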
-
Publication number: 20130030811
Abstract: Sensors within the vehicle monitor driver movement, such as face and head movement to ascertain the direction a driver is looking, and gestural movement to ascertain what the driver may be pointing at. This information is combined with video camera data taken of the external vehicle surroundings. The apparatus uses these data to assist the speech dialogue processor in disambiguating phrases uttered by the driver. The apparatus can issue informative responses or control vehicular functions based on queries automatically generated from the disambiguated phrases.
Type: Application
Filed: July 29, 2011
Publication date: January 31, 2013
Applicant: Panasonic Corporation
Inventors: Jules Olleon, Rohit Talati, David Kryze, Akihiko Sugiura
-
Publication number: 20120221339
Abstract: According to one embodiment, a method and apparatus for synthesizing speech, and a method for training an acoustic model used in speech synthesis, are provided. The method for synthesizing speech may include determining data generated by text analysis to be fuzzy heteronym data, performing fuzzy heteronym prediction on the fuzzy heteronym data to output a plurality of candidate pronunciations of the fuzzy heteronym data and their probabilities, generating fuzzy context feature labels based on the plurality of candidate pronunciations and their probabilities, determining model parameters for the fuzzy context feature labels based on an acoustic model with a fuzzy decision tree, generating speech parameters from the model parameters, and synthesizing the speech parameters via a synthesizer as speech.
Type: Application
Filed: February 22, 2012
Publication date: August 30, 2012
Inventors: Xi Wang, Xiaoyan Lou, Jian Li
-
Publication number: 20120166176
Abstract: A conventional speech recognition dictionary, translation dictionary, and speech synthesis dictionary used in speech translation have inconsistencies.
Type: Application
Filed: March 3, 2010
Publication date: June 28, 2012
Inventors: Satoshi Nakamura, Eiichiro Sumita, Yutaka Ashikari, Noriyuki Kimura, Chiori Hori
-
Publication number: 20110313762
Abstract: A method, system, and computer program product are provided for speech output with confidence indication. The method includes receiving a confidence score for segments of speech or text to be synthesized to speech. The method includes modifying a speech segment by altering one or more parameters of the speech proportionally to the confidence score.
Type: Application
Filed: June 20, 2010
Publication date: December 22, 2011
Applicant: International Business Machines Corporation
Inventors: Shay Ben-David, Ron Hoory
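One way to read "altering parameters proportionally to the confidence score" is a linear scaling of prosody. A sketch under that assumption; the parameter names and scale factors are illustrative guesses, not IBM's method.

```python
def apply_confidence(segment: dict, confidence: float) -> dict:
    """Scale prosody parameters in proportion to a [0, 1] confidence score.

    Low-confidence segments are spoken more quietly and slightly slower,
    so the listener can hear which parts of the output are uncertain.
    """
    assert 0.0 <= confidence <= 1.0
    return {
        **segment,
        "volume": segment["volume"] * (0.6 + 0.4 * confidence),  # invented factors
        "rate":   segment["rate"]   * (0.85 + 0.15 * confidence),
    }

print(apply_confidence({"text": "gnocchi", "volume": 1.0, "rate": 1.0}, 0.3))
```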
-
Publication number: 20110282668
Abstract: A method of and system for speech synthesis. First and second text inputs are received in a text-to-speech system and processed into respective first and second speech outputs corresponding to stored speech respectively from first and second speakers, using a processor of the system. The second speech output of the second speaker is adapted to sound like the first speech output of the first speaker.
Type: Application
Filed: May 14, 2010
Publication date: November 17, 2011
Applicant: General Motors LLC
Inventors: Jeffrey M. Stefan, Gaurav Talwar, Rathinavelu Chengalvarayan
-
Publication number: 20110274311
Abstract: A sign language recognition method includes a camera capturing an image of a gesture from a signer, comparing the image of the gesture with a number of known gestures to determine the meaning of the gesture, and displaying or vocalizing the meaning.
Type: Application
Filed: August 8, 2010
Publication date: November 10, 2011
Applicant: Hon Hai Precision Industry Co., Ltd.
Inventors: Hou-Hsien Lee, Chang-Jung Lee, Chih-Ping Lo
-
VOICE SYNTHESIS DEVICE, NAVIGATION DEVICE HAVING THE SAME, AND METHOD FOR SYNTHESIZING VOICE MESSAGE
Publication number: 20110218809
Abstract: A voice synthesis device includes: a memory for storing a plurality of recorded voice data; a dividing unit for dividing a text into a plurality of words or phrases, wherein the text is to be converted into a voice message; a verifying unit for verifying whether recorded voice data corresponding to each word or phrase is disposed in the memory; and a voice synthesizing unit for preparing the whole of the text with the recorded voice data when the recorded voice data corresponding to all of the plurality of words or phrases are disposed in the memory, and for preparing the whole of the text with rule-based synthesized voice data when the recorded voice data corresponding to at least one of the plurality of words or phrases is not disposed in the memory.
Type: Application
Filed: February 8, 2011
Publication date: September 8, 2011
Applicant: Denso Corporation
Inventors: Ryuichi Suzuki, Takashi Ooi
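The selection logic here is all-or-nothing: recordings are used only when every word or phrase of the text has one; otherwise the whole message falls back to rule-based synthesis. A sketch of that branch; `split` and `tts_rule_based` are assumed helpers, not Denso's components.

```python
def synthesize(text: str, split, recorded: dict, tts_rule_based) -> list:
    """Use recordings only if every unit of the text has one;
    otherwise synthesize the entire text by rule."""
    units = split(text)
    if all(u in recorded for u in units):
        return [recorded[u] for u in units]   # concatenate recorded clips
    return [tts_rule_based(text)]             # whole-text rule-based fallback

clips = synthesize(
    "turn right",
    split=str.split,
    recorded={"turn": b"<turn.pcm>", "right": b"<right.pcm>"},
    tts_rule_based=lambda t: f"<rule-based waveform for {t!r}>".encode(),
)
print(clips)
```
-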
Publication number: 20110116610
Abstract: Messages in a message system are converted from one format to another in accordance with preferred message formats and/or conditions. Message formats can include text messages, multimedia messages, visual voicemail messages, and/or other audio/visual messages. Based on conditions such as recipient device location or velocity, and on a preferred message format, a message can be converted into an appropriate transmission format and transmitted and/or communicated to the recipient in that format (e.g., text, multimedia, audio, etc.).
Type: Application
Filed: November 19, 2009
Publication date: May 19, 2011
Applicant: AT&T Mobility II LLC
Inventors: Venson Shaw, Robert Z. Evora
-
Publication number: 20110054903
Abstract: Embodiments of rich context modeling for speech synthesis are disclosed. In operation, a text-to-speech engine refines a plurality of rich context models based on decision-tree-tied Hidden Markov Models (HMMs) to produce a plurality of refined rich context models. The text-to-speech engine then generates synthesized speech for an input text based at least on some of the plurality of refined rich context models.
Type: Application
Filed: December 2, 2009
Publication date: March 3, 2011
Applicant: Microsoft Corporation
Inventors: Zhi-Jie Yan, Yao Qian, Frank Kao-Ping Soong
-
Publication number: 20110029325
Abstract: Methods and apparatus to enhance healthcare information analyses are disclosed herein.
Type: Application
Filed: July 28, 2009
Publication date: February 3, 2011
Applicant: General Electric Company, a New York Corporation
Inventors: Emil Markov Georgiev, Erik Paul Kemper
-
Publication number: 20100223058
Abstract: A speech synthesis device includes a pitch pattern generation unit (104) which generates a pitch pattern by combining, based on pitch pattern target data including phonemic information formed from at least syllables, phonemes, and words, a standard pattern which approximately expresses the rough shape of the pitch pattern and an original utterance pattern which expresses the pitch pattern of a recorded speech; a unit waveform selection unit (106) which selects unit waveform data based on the generated pitch pattern and, upon selection, selects original utterance unit waveform data corresponding to the original utterance pattern in a section where the original utterance pattern is used; and a speech waveform generation unit (107) which generates a synthetic speech by editing the selected unit waveform data so as to reproduce the prosody represented by the generated pitch pattern.
Type: Application
Filed: August 28, 2008
Publication date: September 2, 2010
Inventors: Yasuyuki Mitsui, Reishi Kondo
-
Publication number: 20100088089
Abstract: Synthesizing a set of digital speech samples corresponding to a selected voicing state includes dividing speech model parameters into frames, with a frame of speech model parameters including pitch information, voicing information determining the voicing state in one or more frequency regions, and spectral information. First and second digital filters are computed using, respectively, first and second frames of speech model parameters, with the frequency responses of the digital filters corresponding to the spectral information in frequency regions for which the voicing state equals the selected voicing state. A set of pulse locations is determined, and sets of first and second signal samples are produced using the pulse locations and, respectively, the first and second digital filters. Finally, the sets of first and second signal samples are combined to produce a set of digital speech samples corresponding to the selected voicing state.
Type: Application
Filed: August 21, 2009
Publication date: April 8, 2010
Applicant: Digital Voice Systems, Inc.
Inventor: John C. Hardwick
-
Publication number: 20100082350
Abstract: An approach is provided for the efficient use of speech synthesis in rendering text content as audio in a communications network. The communications network can include a telephony network and a data network in support of, for example, Voice over Internet Protocol (VoIP) services. A speech synthesis system receives a text string from either the telephony network or the data network. The speech synthesis system determines whether a rendered audio file of the text string is stored in a database, and renders the text string to output a rendered audio file if the rendered audio is determined not to exist. The rendered audio file is stored in the database for re-use, keyed by a hash value generated by the speech synthesis system based on the text string.
Type: Application
Filed: December 8, 2009
Publication date: April 1, 2010
Applicant: Verizon Business Global LLC
Inventors: Paul T. Schultz, Robert A. Sartini
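The cache key is a hash of the input text: render on a miss, re-use on a hit. A sketch of that scheme, with SHA-256 standing in for whatever hash the system actually uses and a dict standing in for the database; the `RenderCache` class is illustrative, not Verizon's implementation.

```python
import hashlib

class RenderCache:
    """Cache synthesized audio keyed by a hash of the input text."""

    def __init__(self, synthesize):
        self.synthesize = synthesize   # assumed TTS callable: str -> bytes
        self.db = {}                   # stand-in for the database

    def render(self, text: str) -> bytes:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self.db:                  # cache miss: render once...
            self.db[key] = self.synthesize(text)
        return self.db[key]                     # ...then re-use thereafter

cache = RenderCache(lambda t: f"<audio for {t!r}>".encode())
cache.render("Your call is important to us.")         # rendered and stored
print(cache.render("Your call is important to us."))  # served from cache
```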
-
Publication number: 20100082345
Abstract: An "Animation Synthesizer" uses trainable probabilistic models, such as Hidden Markov Models (HMMs), Artificial Neural Networks (ANNs), etc., to provide speech- and text-driven body animation synthesis. Probabilistic models are trained using synchronized motion and speech inputs (e.g., live or recorded audio/video feeds) at various speech levels, such as sentences, phrases, words, phonemes, sub-phonemes, etc., depending upon the available data and the motion type or body part being modeled. The Animation Synthesizer then uses the trainable probabilistic model to select animation trajectories for one or more different body parts (e.g., face, head, hands, arms, etc.) based on an arbitrary text and/or speech input. These animation trajectories are then used to synthesize a sequence of animations for digital avatars, cartoon characters, computer-generated anthropomorphic persons or creatures, actual motions for physical robots, etc.
Type: Application
Filed: September 26, 2008
Publication date: April 1, 2010
Applicant: Microsoft Corporation
Inventors: Lijuan Wang, Lei Ma, Frank Kao-Ping Soong
-
Publication number: 20100070281
Abstract: Disclosed herein are methods for presenting speech from a selected text that is on a computing device. The method includes presenting text on a touch-sensitive display at a size within a threshold level, so that the computing device can accurately determine the user's intent when the user touches the touch screen. Once the user's touch has been received, the computing device identifies and interprets the portion of text to be selected, and subsequently presents the text audibly to the user.
Type: Application
Filed: October 24, 2008
Publication date: March 18, 2010
Applicant: AT&T Intellectual Property I, L.P.
Inventors: Alistair D. Conkie, Horst Schroeter
-
Publication number: 20100057465
Abstract: A text-to-speech (TTS) system implemented in an automotive vehicle is dynamically tuned to improve intelligibility over a wide variety of vehicle operating states and environmental conditions. In one embodiment of the present invention, a TTS system is interfaced to one or more vehicle sensors to measure parameters including vehicle speed, interior noise, visibility conditions, and road roughness, among others. In response to measurements of these operating parameters, TTS voice volume, pitch, and speed, among other parameters, may be tuned in order to improve intelligibility of the TTS voice system and increase its effectiveness for the operator of the vehicle.
Type: Application
Filed: September 3, 2008
Publication date: March 4, 2010
Inventors: David Michael Kirsch, Ritchie Winson Huang
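The tuning loop maps sensor readings to TTS settings. A sketch with invented linear formulas (louder with cabin noise, slower with speed); the abstract does not specify the real mapping.

```python
def tune_tts(speed_kmh: float, cabin_noise_db: float) -> dict:
    """Map vehicle state to TTS settings. The formulas and constants
    are illustrative assumptions, not the patented tuning rules."""
    volume = min(1.0, 0.5 + cabin_noise_db / 120.0)  # louder in noisy cabins
    rate = max(0.8, 1.0 - speed_kmh / 400.0)         # slower at high speed
    return {"volume": volume, "rate": rate}

print(tune_tts(speed_kmh=120, cabin_noise_db=72))
```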
-
Publication number: 20100042411
Abstract: A method of building an audio description of a particular product of a class of products includes providing a plurality of human voice recordings, wherein each of the human voice recordings includes audio corresponding to an attribute value common to many of the products. The method also includes automatically obtaining attribute values of the particular product, wherein the attribute values reside electronically. The method also includes automatically applying a plurality of rules for selecting a subset of the human voice recordings that correspond to the obtained attribute values, and automatically stitching the selected subset of human voice recordings together to provide a voiceover product description of the particular product. A similar method is used to build an audio description of a particular process.
Type: Application
Filed: August 15, 2008
Publication date: February 18, 2010
Inventors: Jamie M. Addessi, Mark Paul Bonfigli, Richard F. Gibbs, Jr., Christopher Nathaniel Scott
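Stitching amounts to: look up a recording per attribute value, keep the ones that exist, and concatenate. A sketch; the `rules` callable and the recording keys are assumptions for illustration.

```python
def voiceover(product: dict, rules, recordings: dict) -> list:
    """Select and stitch recordings for a product's attribute values.
    `rules` filters and orders the attributes worth speaking (assumed)."""
    clips = []
    for attr, value in rules(product):
        clip = recordings.get((attr, value))
        if clip is not None:            # skip values with no recording
            clips.append(clip)
    return clips                        # concatenated/cross-faded downstream

recs = {("color", "red"): b"<red.pcm>", ("doors", 4): b"<four-door.pcm>"}
order = lambda p: [("color", p["color"]), ("doors", p["doors"])]
print(voiceover({"color": "red", "doors": 4}, order, recs))
```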
-
Publication number: 20100030561
Abstract: A system that outputs phonemes and accents of texts. The system has a storage section storing a first corpus in which the spellings, phonemes, and accents of previously input text are recorded separately for individual segmentations of the words contained in the text. A text for which phonemes and accents are to be output is acquired, and the first corpus is searched to retrieve at least one set of spellings that matches the spellings in the text from among sets of contiguous spellings. Then, the combination of a phoneme and an accent that has a higher probability of occurrence in the first corpus than a predetermined reference probability is selected as the phonemes and accent of the text.
Type: Application
Filed: August 3, 2009
Publication date: February 4, 2010
Applicant: Nuance Communications, Inc.
Inventors: Shinsuke Mori, Toru Nagano, Masafumi Nishimura
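The selection criterion is a relative frequency in the corpus exceeding a reference probability. A sketch of that lookup; the corpus record layout and the example pronunciations are invented for illustration, not Nuance's data format.

```python
from collections import Counter

def phonemes_and_accent(spelling: str, corpus: list, threshold: float = 0.5):
    """Pick the (phoneme, accent) pair most frequent in the corpus for a
    spelling, if its relative frequency beats the reference probability.
    `corpus` is assumed to be (spelling, phoneme, accent) records."""
    counts = Counter((p, a) for s, p, a in corpus if s == spelling)
    total = sum(counts.values())
    if not total:
        return None
    (best, n), = counts.most_common(1)
    return best if n / total > threshold else None

corpus = [("read", "riyd", "H*"), ("read", "riyd", "H*"), ("read", "rehd", "L*")]
print(phonemes_and_accent("read", corpus))  # ('riyd', 'H*'): 2/3 > 0.5
```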
-
Publication number: 20090313023
Abstract: The invention converts raw data in a base language (e.g., English) into conversationally formatted messages in multiple languages. The process converts input data rows into sequences of references to a set of prerecorded audio phrase files. The sequences reference both recorded phrases of input data components and user-created text phrases inserted before and after the input data. When the audio sequences are played in order, a coherent conversational message in the language of the caller results. An IVR server responding to a caller's menu selection uses the invention's output data to generate the coherent response. Two embodiments are presented: a simple embodiment that responds to messages, and a more complex embodiment that converts enterprise demographic and member-event data collected over a period into audio sentences played in response to a menu item selection by a caller, in the caller's language.
Type: Application
Filed: June 15, 2009
Publication date: December 17, 2009
Inventor: Ralph Jones
-
Publication number: 20090306986
Abstract: A service architecture for providing, to a user terminal of a communications network, textual information and the related speech synthesis, the user terminal being provided with a speech synthesis engine and a basic database of speech waveforms, includes: a content server for downloading textual information requested by means of a browser application on the user terminal; a context manager for extracting context information from the textual information requested by the user terminal; a context selector for selecting an incremental database of speech waveforms associated with the extracted context information and for downloading the incremental database into the user terminal; and a database manager on the user terminal for managing the composition of an enlarged database of speech waveforms for the speech synthesis engine, including the basic and the incremental databases of speech waveforms.
Type: Application
Filed: May 31, 2005
Publication date: December 10, 2009
Inventors: Alessio Cervone, Ivano Salvatore Collotta, Paolo Coppo, Donato Ettorre, Maurizio Fodrini, Maura Turolla
-
Publication number: 20090299746
Abstract: A method for performing speech synthesis on a textual content at a client. The method includes the steps of: performing speech synthesis on the textual content based on a current acoustical unit set S_current in a corpus at the client; analyzing the textual content and generating a list of target units with corresponding context features, selecting multiple acoustical unit candidates for each target unit according to the context features based on an acoustical unit set S_total that is more plentiful than the current acoustical unit set S_current in the corpus at the client, and determining acoustical units suitable for speech synthesis of the textual content according to the multiple unit candidates; and updating the current acoustical unit set S_current in the corpus at the client based on the determined acoustical units.
Type: Application
Filed: May 27, 2009
Publication date: December 3, 2009
Inventors: Fan Ping Meng, Yong Qin, Qin Shi, Zhiwei Shuang
-
Publication number: 20090281786
Abstract: A natural-language processing system (10) includes a registration-candidate storage section (32) that stores registration-candidate dictionary data; a judgment means (22) that compares input data against the registration-candidate dictionary data to judge whether or not the input data includes a word corresponding to the registration-candidate dictionary data; an inquiry means (23) that, if it is judged that a corresponding word exists, asks the user whether or not the corresponding dictionary data is to be registered in a dictionary storage section (31) and accepts the user's instruction; a dictionary registration means (24) that registers the corresponding dictionary data in the dictionary storage section based on the input instruction; and a natural-language processing means (25) that executes natural-language processing on the input data by using the dictionary data registered in the dictionary storage section.
Type: Application
Filed: September 6, 2007
Publication date: November 12, 2009
Inventors: Shinichi Ando, Kunihiko Sadamasa, Shinichi Doi
-
Publication number: 20090234652
Abstract: The voice synthesis device includes: an emotion input unit (202) which obtains an utterance mode of a voice waveform for which voice synthesis is to be performed; a prosody generation unit (205) which generates a prosody used when a language-processed text is uttered in the obtained utterance mode; a characteristic tone selection unit (203) which selects a characteristic tone based on the utterance mode, the characteristic tone being observed when the text is uttered in the obtained utterance mode; a characteristic tone temporal position estimation unit (604) which (i) judges whether or not each of the phonemes included in a phonologic sequence of the text is to be uttered with the characteristic tone, based on the phonologic sequence, the characteristic tone, and the prosody, and (ii) decides a phoneme which is an utterance position where the text is uttered with the characteristic tone; and an element selection unit (606) and an element connection unit (209) which generate the voice waveform based on the p
Type: Application
Filed: May 2, 2006
Publication date: September 17, 2009
Inventors: Yumiko Kato, Takahiro Kamai
-
Publication number: 20090222269
Abstract: An apparatus for voice synthesis includes: a word database for storing words and voices; a syllable database for storing syllables and voices; a processor for executing a process including extracting a word from a document, generating a voice signal based on the stored voice when the extracted word is included in the word database, and synthesizing a voice signal based on the stored voices associated with the one or more syllables corresponding to the extracted word when the extracted word is not found in the word database; a speaker for producing a voice based on either the generated or the synthesized voice signal; and a display for selectively displaying the extracted word when the voice based on the synthesized voice signal is produced by the speaker.
Type: Application
Filed: May 11, 2009
Publication date: September 3, 2009
Inventor: Shinichiro Mori
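The flow is: prefer a whole-word recording; otherwise concatenate syllable voices and flag the word for display. A sketch of that fallback; `syllabify` and the byte-string "voices" are stand-ins for the real components.

```python
def speak(word: str, word_db: dict, syllable_db: dict, syllabify):
    """Prefer a whole-word recording; otherwise concatenate syllable voices
    and flag the word for on-screen display. `syllabify` is an assumed helper."""
    if word in word_db:
        return word_db[word], False          # recorded voice, no display needed
    voice = b"".join(syllable_db[s] for s in syllabify(word))
    return voice, True                       # synthesized: also display the word

word_db = {"hello": b"<hello.pcm>"}
syll_db = {"fu": b"<fu>", "ji": b"<ji>"}
print(speak("fuji", word_db, syll_db, lambda w: ["fu", "ji"]))
```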
-
Publication number: 20090192781
Abstract: A machine translation method, a system for using the method, and computer-readable media are disclosed. The method includes the steps of receiving a source language sentence and selecting a set of target language n-grams using a lexical classifier, based on the source language sentence. When selecting the set of target language n-grams, in at least one n-gram, n is greater than 1. The method continues by combining the selected set of target language n-grams as a finite state acceptor (FSA), weighting the FSA with data from the lexical classifier, and generating an n-best list of target sentences from the FSA. As an alternative to using the FSA, N strings may be generated from the n-grams and ranked using a language model. The N strings may be represented by an FSA for efficiency, but this is not necessary.
Type: Application
Filed: January 30, 2008
Publication date: July 30, 2009
Applicant: AT&T Labs
Inventors: Srinivas Bangalore, Emil Ettelaie
-
Publication number: 20090177474
Abstract: A speech synthesizer includes a periodic component fusing unit and an aperiodic component fusing unit, and fuses the periodic components and aperiodic components of a plurality of speech units selected for each segment by a unit selector, using the periodic component fusing unit and the aperiodic component fusing unit, respectively. The speech synthesizer is further provided with an adder, so that the adder adds, edits, and concatenates the periodic components and the aperiodic components of the fused speech units to generate a speech waveform.
Type: Application
Filed: September 18, 2008
Publication date: July 9, 2009
Applicant: Kabushiki Kaisha Toshiba
Inventors: Masahiro Morita, Takehiko Kagoshima
-
Publication number: 20090171665
Abstract: Techniques are described for enabling flexible and dynamic creation and/or modification of voice data for a position-determining device. In some embodiments, a voice package is provided that includes a language database and a plurality of audio files. The language database specifies appropriate syntax and vocabulary for information that is intended for audio output by a position-determining device. The audio files include words and/or phrases that may be accessed by the position-determining device to communicate the information via audible output. Some embodiments utilize a voice package toolkit to construct and/or customize one or more parts of a voice package.
Type: Application
Filed: December 18, 2008
Publication date: July 2, 2009
Applicant: Garmin Ltd.
Inventors: Scott D. Hammerschmidt, Jacob W. Caire, Michael P. Russell, David W. Wiskur, Scott J. Brunk
-
Publication number: 20090171668
Abstract: A management system for guiding an agent in a media-specific dialogue has a conversion engine for instantiating ongoing dialogue as machine-readable text if the dialogue is in voice media, a context analysis engine for determining facts from the text, a rules engine for asserting rules based on fact input, and a presentation engine for presenting information to the agent to guide the agent in the dialogue. The context analysis engine passes determined facts to the rules engine, which selects and asserts to the presentation engine rules based on the facts, and the presentation engine provides periodically updated guidance to the agent based on the rules asserted.
Type: Application
Filed: December 28, 2007
Publication date: July 2, 2009
Inventors: Dave Sneyders, Brian Galvin, S. Michael Perlmutter
-
Publication number: 20090150152
Abstract: A method and apparatus for indexing one or more audio signals using a speech-to-text engine and a phoneme detection engine, and generating a combined lattice comprising a text part and a phoneme part. A word to be searched is first looked up in the text part; if it is not found, or is found with low certainty, it is divided into phonemes and searched for in the phoneme part of the lattice.
Type: Application
Filed: November 18, 2007
Publication date: June 11, 2009
Applicant: Nice Systems
Inventors: Moshe Wasserblat, Barak Eilam, Yuval Lubowich, Maor Nissan
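Search order matters here: the text part of the lattice is consulted first, and the phoneme part only when the word is absent or matched below a certainty threshold. A sketch of that fallback; the index layouts, the threshold value, and `to_phonemes` are assumptions for illustration.

```python
def find_term(term: str, text_index: dict, phoneme_index: dict,
              to_phonemes, min_certainty: float = 0.8):
    """Search the text part of a combined lattice first; fall back to the
    phoneme part when the word is missing or matched with low certainty."""
    hit = text_index.get(term)                  # assumed: (offset, certainty)
    if hit and hit[1] >= min_certainty:
        return {"part": "text", "offset": hit[0], "certainty": hit[1]}
    key = tuple(to_phonemes(term))              # phoneme-level fallback
    if key in phoneme_index:
        return {"part": "phoneme", "offset": phoneme_index[key]}
    return None

text_idx = {"refund": (12.4, 0.55)}             # low-certainty text match
phone_idx = {("r", "ih", "f", "ah", "n", "d"): 12.4}
print(find_term("refund", text_idx, phone_idx,
                lambda w: ["r", "ih", "f", "ah", "n", "d"]))
```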
-
Publication number: 20090132255
Abstract: Embodiments of the present invention improve methods of performing speech recognition with barge-in. In one embodiment, the present invention includes a speech recognition method comprising starting a synthesis of recorded speech, receiving a user speech input signal providing information regarding a user choice, detecting an initial portion of the user speech input signal, selectively altering the synthesis of recorded speech, and recognizing the user choice.
Type: Application
Filed: November 19, 2007
Publication date: May 21, 2009
Applicant: Sensory, Incorporated
Inventor: Younan Lu
-
Publication number: 20090106027
Abstract: An object of the invention is to conveniently increase the standard patterns registered in a voice recognition device, to efficiently extend the number of words that can be voice-recognized. New standard patterns are generated by modifying a part of an existing standard pattern. A pattern matching unit 16 of a modifying-part specifying unit 14 performs a pattern-matching process to specify a part to be modified in the existing standard pattern of a usage source. A standard pattern generating unit 18 generates the new standard patterns by cutting or deleting the voice data of the modifying part of the usage-source standard pattern, substituting another voice data for the voice data of the modifying part, or combining the voice data of the modifying part with another voice data. A standard pattern database update unit 20 adds the new standard patterns to a standard pattern database 24.
Type: Application
Filed: May 25, 2006
Publication date: April 23, 2009
Applicant: Matsushita Electric Industrial Co., Ltd.
Inventors: Toshiyuki Teranishi, Kouji Hatano
-
Publication number: 20090063153
Abstract: A system and method for generating a synthetic text-to-speech (TTS) voice are disclosed. A user is presented with at least one TTS voice and at least one voice characteristic. A new synthetic TTS voice is generated by blending a plurality of existing TTS voices according to the selected voice characteristics. The blending of voices involves interpolating segmented parameters of each TTS voice. Segmented parameters may be, for example, prosodic characteristics of the speech such as pitch, volume, phone durations, accents, stress, mispronunciations, and emotion.
Type: Application
Filed: November 4, 2008
Publication date: March 5, 2009
Applicant: AT&T Corp.
Inventors: David A. Kapilow, Kenneth H. Rosen, Juergen Schroeter
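Blending is described as interpolating segmented parameters across voices. A sketch of weighted linear interpolation over a few prosodic parameters; the parameter set and weights are illustrative, not AT&T's actual blending scheme.

```python
def blend_voices(voices: list, weights: list) -> dict:
    """Interpolate segment-level prosodic parameters of several TTS voices.
    Each voice is a dict of parameter name -> value; weights sum to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9 and len(voices) == len(weights)
    params = voices[0].keys()
    return {p: sum(w * v[p] for v, w in zip(voices, weights)) for p in params}

# Invented example voices: 75% "bright", 25% "calm".
calm = {"pitch_hz": 110.0, "volume": 0.7, "phone_ms": 95.0}
bright = {"pitch_hz": 170.0, "volume": 0.9, "phone_ms": 80.0}
print(blend_voices([calm, bright], [0.25, 0.75]))
```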
-
Publication number: 20090055192
Abstract: A device for use by a deafblind person is disclosed. The device comprises a first key for manually inputting a series of words in the form of a code, a second key for manually inputting an action to be performed by the device, a third key for manually inputting a user preference, and a fourth key for manually inputting communication instructions. The device further has an internal processor programmed to carry out communication functions and search-and-guide functions. The device has various safety and security functions for pedestrians or persons in transit. In a preferred embodiment, the device comprises an electronic cane known as an eCane. Also disclosed is a system for allowing a deafblind person to enjoy television programs.
Type: Application
Filed: November 3, 2008
Publication date: February 26, 2009
Inventor: Raanan Liebermann
-
Publication number: 20080312920
Abstract: An expressive speech-to-speech generation system which can generate expressive speech output by using expressive parameters extracted from the original speech signal to drive a standard TTS system. The system comprises: speech recognition means; machine translation means; text-to-speech generation means; expressive parameter detection means for extracting expressive parameters from the speech of language A; and expressive parameter mapping means for mapping the expressive parameters extracted by the expressive parameter detection means from language A to language B, and for driving the text-to-speech generation means with the mapping results to synthesize expressive speech.
Type: Application
Filed: August 23, 2008
Publication date: December 18, 2008
Applicant: International Business Machines Corporation
Inventors: Shen Liqin, Shi Qin, Donald T. Tang, Zhang Wei
-
Patent number: 7467026
Abstract: An autonomous robot is controlled by a local robot information controller, which is connected to a robot application network to which the transceiver that communicates with the autonomous robot is attached. The robot application network, a user LAN adaptive controller, an information distribution manager, and the third-party information provider subsystem are linked with a public network. The information distribution manager acquires information from the third-party information provider subsystem on a schedule that is set by the user LAN adaptive controller. The local robot information controller receives the information from the information distribution manager and converts it into data that generates robot gestures. The robot performs actions in accordance with the gesture data received from the local robot information controller.
Type: Grant
Filed: August 13, 2004
Date of Patent: December 16, 2008
Assignee: Honda Motor Co., Ltd.
Inventors: Yoshiaki Sakagami, Shinichi Matsunaga, Naoaki Sumida
-
Publication number: 20080221904
Abstract: A method for generating animated sequences of talking heads in text-to-speech applications, wherein a processor samples a plurality of frames comprising image samples. The processor reads first data comprising one or more parameters associated with noise-producing orifice images of sequences of at least three concatenated phonemes which correspond to an input stimulus. The processor reads, based on the first data, second data comprising images of a noise-producing entity. The processor generates an animated sequence of the noise-producing entity.
Type: Application
Filed: May 19, 2008
Publication date: September 11, 2008
Applicant: AT&T Corp.
Inventors: Eric Cosatto, Hans Peter Graf, Juergen Schroeter
-
Publication number: 20080126099
Abstract: A method of representing information to a person comprising displaying an image viewable by a person, the image comprising visual markers representative of the portions of a human body minimally necessary to communicate with the person, the visual markers, when viewed by the person, causing the person to extrapolate the human body, a remainder of the image being visually silent with respect to the person. The method is particularly applicable to representing information so as to be perceivable by a hearing-impaired person (e.g., a deaf person), wherein a plurality of images, when displayed one after another on a display device, represent information perceivable by the hearing-impaired person via sign language.
Type: Application
Filed: October 25, 2007
Publication date: May 29, 2008
Applicant: Université de Sherbrooke
Inventors: Denis Belisle, Johanne Deschenes
-
Publication number: 20080004861
Abstract: A system and method for characterizing, synthesizing, and/or canceling out acoustic signals from inanimate sound sources is disclosed. Propagating-wave electromagnetic sensors monitor excitation sources in sound-producing systems, such as machines, musical instruments, and various other structures. Acoustical output from these sound-producing systems is also monitored. From such information, a transfer function characterizing the sound-producing system is generated. From the transfer function, acoustical output from the sound-producing system may be synthesized or canceled. The methods disclosed enable accurate calculation of matched transfer functions relating specific excitations to specific acoustical outputs. Knowledge of such signals and functions can be used to effect various sound replication, sound source identification, and sound cancellation applications.
Type: Application
Filed: September 6, 2007
Publication date: January 3, 2008
Inventors: John Holzrichter, Greg Burnett, Lawrence Ng
-
Patent number: RE45262
Abstract: A navigation system and method involving wireless communications technology and speech processing technology is presented. In accordance with an embodiment of the invention, the navigation system includes a subscriber unit communicating with a service provider. The subscriber unit includes a global positioning system mechanism to determine subscriber position information and a speech processing mechanism to receive destination information spoken by a subscriber. The subscriber unit transmits the subscriber position and destination information to the service provider, which gathers navigation information, including a map and a route from the subscriber position to the specified destination. The service provider transmits the navigation information to the subscriber unit. The subscriber unit conveys the received navigation information to the subscriber via an output mechanism, such as a speech synthesis unit or a graphical display.
Type: Grant
Filed: December 2, 2004
Date of Patent: November 25, 2014
Assignee: Intel Corporation
Inventor: Christopher R. Wiener