Speech Synthesis; Text To Speech Systems (epo) Patents (Class 704/E13.001)
-
Publication number: 20110283190
Abstract: An interface device and method of use, comprising audio and image inputs; a processor for determining topics of interest, and receiving information of interest to the user from a remote resource; an audiovisual output for presenting an anthropomorphic object conveying the received information, having a selectively defined and adaptively alterable mood; an external communication device adapted to remotely communicate at least a voice conversation with a human user of the personal interface device. Also provided is a system and method adapted to receive logic for, synthesize, and engage in conversation dependent on received conversational logic and a personality.
Type: Application
Filed: May 12, 2011
Publication date: November 17, 2011
Inventor: Alexander POLTORAK
-
Publication number: 20110276332
Abstract: A speech synthesis method comprising: receiving a text input and outputting speech corresponding to said text input using a stochastic model, said stochastic model comprising an acoustic model and an excitation model, said acoustic model having a plurality of model parameters describing probability distributions which relate a word or part thereof to a feature, said excitation model comprising excitation model parameters which are used to model the vocal cords and lungs to output the speech using said features; wherein said acoustic parameters and excitation parameters have been jointly estimated; and outputting said speech.
Type: Application
Filed: May 6, 2011
Publication date: November 10, 2011
Applicant: Kabushiki Kaisha Toshiba
Inventors: Ranniery MAIA, Byung Ha Chun
-
Publication number: 20110270614
Abstract: A method and an apparatus for switching speech or audio signals. The method includes, when switching a speech or audio signal, weighting a first high frequency band signal of the current frame of the speech or audio signal and a second high frequency band signal of the previous M frames of speech or audio signals to obtain a processed first high frequency band signal, where M is greater than or equal to 1, and synthesizing the processed first high frequency band signal and a first low frequency band signal of the current frame of the speech or audio signal into a wide frequency band signal. In this way, speech or audio signals with different bandwidths can be smoothly switched, thus improving the quality of audio signals received by a user.
Type: Application
Filed: June 16, 2011
Publication date: November 3, 2011
Applicant: HUAWEI TECHNOLOGIES CO., LTD.
Inventors: Zexin Liu, Lei Miao, Chen Hu, Wenhai Wu, Yue Lang, Qing Zhang
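The blending step in the abstract above can be sketched as a per-sample weighted average of the current frame's high band against the previous M frames' high bands. This is a minimal illustration, not the patented algorithm: the function names, the averaging rule, and the 0.5 weight are assumptions, and real sub-band synthesis would use a filterbank rather than concatenation.

```python
def smooth_switch(current_high, previous_highs, weight_current=0.5):
    """Blend the current frame's high-band samples with the mean of the
    previous M frames' high-band samples (illustrative weighting rule)."""
    m = len(previous_highs)          # M >= 1 previous frames
    n = len(current_high)
    # average the previous frames sample by sample
    avg_prev = [sum(frame[i] for frame in previous_highs) / m for i in range(n)]
    w = weight_current
    return [w * c + (1 - w) * p for c, p in zip(current_high, avg_prev)]

def synthesize_wideband(low_band, processed_high):
    # stand-in for a real synthesis filterbank: just join the sub-bands
    return low_band + processed_high
```

With a weight of 0.5, a switch from silence to a full-scale high band is halved on the first frame, which is the smoothing effect the abstract describes.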
-
Publication number: 20110270613
Abstract: The disclosed solution includes a method for dynamically switching modalities based upon inferred conditions in a dialogue session involving a speech application. The method establishes a dialogue session between a user and the speech application. During the dialogue session, the user interacts using an original modality and a second modality. The speech application interacts using a speech modality only. A set of conditions indicative of interaction problems using the original modality can be inferred. Responsive to the inferring step, the original modality can be changed to the second modality. A modality transition to the second modality can be transparent to the speech application and can occur without interrupting the dialogue session. The original modality and the second modality can be different modalities; one including a text exchange modality and another including a speech modality.
Type: Application
Filed: July 8, 2011
Publication date: November 3, 2011
Applicant: Nuance Communications, Inc.
Inventors: William V. Da Palma, Baiju D. Mandalia, Victor S. Moore, Wendi L. Nusbickel
-
Publication number: 20110264452
Abstract: Example embodiments disclosed herein relate to audio output of speech data using speech control commands. In particular, example embodiments include a mechanism for accessing text data. Example embodiments may also include a mechanism for outputting the text data as audio by converting the text data to speech audio data and transmitting the speech audio data over an audio output. Example embodiments may also include a mechanism for receiving speech control commands that allow for voice control of the output of the audio data.
Type: Application
Filed: April 27, 2010
Publication date: October 27, 2011
Inventors: Ramya Venkataramu, Molly Joy
-
Publication number: 20110260832
Abstract: In one embodiment, a method includes enrolling a potential enrollee for an identity-monitoring service. The enrolling includes acquiring personally-identifying information (PII) and capturing a voiceprint. Following successful completion of the enrolling, the potential enrollee is an enrollee. The method further includes, responsive to an identified suspicious event related to the PII, creating an identity alert, establishing voice communication with an individual purporting to be the enrollee, and performing voice-biometric verification of the individual. The voice-biometric verification includes comparing one or more spoken utterances with the voiceprint. Following successful completion of the voice-biometric verification, the individual is a verified enrollee. In addition, the method includes authorizing delivery of the identity alert to the verified enrollee.
Type: Application
Filed: April 25, 2011
Publication date: October 27, 2011
Inventors: Joe Ross, Isaac Chapa, Adrian Cruz, Harold E. Gottschalk, JR.
-
Publication number: 20110243447
Abstract: Method and apparatus for synthesizing speech from a plurality of portions of text data, each portion having at least one associated attribute. The invention is achieved by determining (25, 35, 45) a value of the attribute for each of the portions of text data, selecting (27, 37, 47) a voice from a plurality of candidate voices on the basis of each of said determined attribute values, and converting (29, 39, 49) each portion of text data into synthesized speech using said respective selected voice.
Type: Application
Filed: December 7, 2009
Publication date: October 6, 2011
Applicant: KONINKLIJKE PHILIPS ELECTRONICS N.V.
Inventor: Franciscus Johannes Henricus Maria Meulenbroeks
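The determine/select/convert loop described in the abstract above can be sketched in a few lines. Everything here is illustrative: the attribute names, the voice table, and the placeholder `synthesize()` stand in for a real TTS backend and are not drawn from the patent.

```python
# hypothetical mapping from attribute value to candidate voice
VOICE_TABLE = {"narration": "voice_a", "dialogue": "voice_b"}

def synthesize(text, voice):
    # placeholder for an actual text-to-speech engine call
    return f"[{voice}] {text}"

def synthesize_portions(portions, default_voice="voice_a"):
    """portions: list of (text, attribute) pairs; one voice is selected
    per portion and the portion is converted with that voice."""
    out = []
    for text, attribute in portions:
        voice = VOICE_TABLE.get(attribute, default_voice)  # select a voice
        out.append(synthesize(text, voice))                # convert the portion
    return out
```

A call such as `synthesize_portions([("Once upon a time", "narration"), ("Hello!", "dialogue")])` yields each portion rendered with its attribute-selected voice.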
-
Publication number: 20110246201
Abstract: While performing a function, a mobile device identifies that it is idle while it is downloading content or performing another task. During that idle time, it gathers one or more parameters (e.g., location, time, gender of user, age of user, etc.) and sends a request for an audio message (e.g., audio advertisement). One or more servers at a remote facility receive the request with the one or more parameters, and use the parameters to identify a targeted message. In some cases, the targeted message will include one or more dynamic variables (e.g., distance to store, time to event, etc.) that will be replaced based on the parameters received from the mobile device, so that the audio message is dynamically updated and customized for the mobile device. In one embodiment, the targeted message is transmitted to the mobile device as text. After being received at the mobile device, the text is optionally displayed and converted to an audio format and played for the user.
Type: Application
Filed: April 6, 2010
Publication date: October 6, 2011
Inventor: Andre F. Hawit
-
Publication number: 20110246199
Abstract: According to one embodiment, a speech synthesizer generates a speech segment sequence and synthesizes speech by connecting speech segments of the generated speech segment sequence. If a speech segment of a synthesized first speech segment sequence is different from the speech segment of a synthesized second speech segment sequence having the same synthesis unit as the first speech segment sequence, the speech synthesizer disables the speech segment of the first speech segment sequence that is different from the speech segment of the second speech segment sequence.
Type: Application
Filed: September 14, 2010
Publication date: October 6, 2011
Applicant: KABUSHIKI KAISHA TOSHIBA
Inventors: Osamu NISHIYAMA, Takehiko Kagoshima
-
Publication number: 20110243310
Abstract: A system may include a database configured to selectively store and retrieve data. The system may further include a call record parser configured to receive a plurality of call records, each call record being associated with a respective call, parse the plurality of call records to identify periods of resource usage and types of resource usage for the associated calls, create parsed data based on the identified periods of resource usage and the types of resource usage, and store the parsed data in the database indexed according to the type of the identified resource and including the start and end times for the identified periods of usage.
Type: Application
Filed: March 30, 2010
Publication date: October 6, 2011
Applicant: Verizon Patent and Licensing Inc.
Inventors: Belinda Franklin-Barr, John Rivera
-
Publication number: 20110238420
Abstract: According to one embodiment, a method for editing speech is disclosed. The method can generate speech information from a text. The speech information includes phonologic information and prosody information. The method can divide the speech information into a plurality of speech units, based on at least one of the phonologic information and the prosody information. The method can search at least two speech units from the plurality of speech units. At least one of the phonologic information and the prosody information in the at least two speech units is identical or similar. In addition, the method can store a speech unit waveform corresponding to one of the at least two speech units as a representative speech unit into a memory.
Type: Application
Filed: September 13, 2010
Publication date: September 29, 2011
Applicant: KABUSHIKI KAISHA TOSHIBA
Inventors: Gou Hirabayashi, Takehiko Kagoshima
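The storage step described in the abstract above amounts to keeping one representative waveform per group of matching units. The sketch below illustrates the idea under a simplifying assumption not taken from the patent: units are grouped by an exact phonologic/prosody key, and the first unit seen for a key becomes the representative.

```python
def store_representatives(units):
    """units: list of (key, waveform) pairs, where `key` summarizes the
    phonologic and prosody information of a speech unit.
    Returns a memory mapping each key to one representative waveform."""
    memory = {}
    for key, waveform in units:
        # keep the first occurrence as the representative; later
        # identical/similar units reuse it instead of being stored
        memory.setdefault(key, waveform)
    return memory
```

A real system would use a similarity measure rather than exact key equality, but the space saving comes from the same deduplication step.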
-
Publication number: 20110231192
Abstract: A system and method for generating audio content. Content is automatically retrieved from an original website according to a predetermined schedule to generate retrieved content. The retrieved content is converted to one or more audio files. A hierarchy is assigned to the one or more audio files to provide an audible website that mimics a hierarchy of the retrieved content as represented at the original website. The audible website is stored in a database for retrieval by one or more users. A first user input is received indicating an attempt to access the original website. The audible website is indicated as being associated with the original website in response to the user selection. Portions of the audible website are played in response to a second user input.
Type: Application
Filed: May 2, 2011
Publication date: September 22, 2011
Inventors: William C. O'Conor, Nathan T. Bradley
-
Publication number: 20110231193
Abstract: Various technologies for generating a synthesized singing voice waveform. In one implementation, the computer program may receive a request from a user to create a synthesized singing voice using the lyrics of a song and a digital file containing its melody as inputs. The computer program may then dissect the lyrics' text and its melody file into their corresponding sub-phonemic units and musical score respectively. The musical score may be further dissected into a sequence of musical notes and duration times for each musical note. The computer program may then determine a fundamental frequency (F0), or pitch, of each musical note.
Type: Application
Filed: June 2, 2011
Publication date: September 22, 2011
Applicant: Microsoft Corporation
Inventors: Yao Qian, Frank Soong
-
Publication number: 20110205849
Abstract: A digital calendar device involving an electronic device having a graphic user interface. The electronic device is capable of receiving at least one note from a user, and retrieving and displaying the at least one note, as well as running on at least one operating system. The digital calendar device further includes features, such as retrieval and displaying features, a touch screen interface, a handwriting support feature, a speech recognition feature, a reminder feature, a frame feature, a mechanical support feature, and a fortune-teller software program.
Type: Application
Filed: February 23, 2010
Publication date: August 25, 2011
Applicant: SONY CORPORATION, A JAPANESE CORPORATION
Inventor: Feng Kang
-
Publication number: 20110200214
Abstract: A hearing aid includes a microphone to convert audible sounds into sound-related electrical signals and a memory configured to store a plurality of hearing aid profiles. Each hearing aid profile has an associated audio label. The hearing aid further includes a processor coupled to the microphone and to the memory and configured to select one of the plurality of hearing aid profiles. The processor applies the one of the plurality of hearing aid profiles to the sound-related electrical signals to produce a shaped output signal to compensate for a hearing impairment of a user. The processor is configured to insert the associated audio label into the shaped output signal. The hearing aid also includes a speaker coupled to the processor and configured to convert the shaped output signal into an audible sound.
Type: Application
Filed: February 8, 2011
Publication date: August 18, 2011
Applicant: AUDIOTONIQ, INC.
Inventors: John Michael Page Knox, David Matthew Landry, Samir Ibrahim, Andrew Lawrence Eisenberg
-
Publication number: 20110202344
Abstract: Techniques for providing speech output for speech-enabled applications. A synthesis system receives from a speech-enabled application a text input including a text transcription of a desired speech output. The synthesis system selects one or more audio recordings corresponding to one or more portions of the text input. In one aspect, the synthesis system selects from audio recordings provided by a developer of the speech-enabled application. In another aspect, the synthesis system selects an audio recording of a speaker speaking a plurality of words. The synthesis system forms a speech output including the one or more selected audio recordings and provides the speech output for the speech-enabled application.
Type: Application
Filed: February 12, 2010
Publication date: August 18, 2011
Inventors: Darren C. Meyer, Corinne Bos-Plachez, Martine Marguerite Staessen
-
Publication number: 20110202347
Abstract: A communication converter is described for converting among speech signals and textual information, permitting communication between telephone users and textual instant communications users.
Type: Application
Filed: April 28, 2011
Publication date: August 18, 2011
Applicant: VERIZON BUSINESS GLOBAL LLC
Inventors: Richard G. Moore, Gregory L. Mumford, Duraisamy Gunasekar
-
Publication number: 20110196680
Abstract: When a system (100) is used for synthesizing speech having prosody serving as a reference, the system stores speech element information representing a speech element capable of synthesizing speech having a degree of naturalness indicating a degree of similarity to speech uttered by a human higher than a predetermined reference value (speech element information storage (115)). The system accepts requested prosody information representing prosody requested by the user (requested prosody information accepting part (113)). The system generates intermediate prosody information representing intermediate prosody between the reference prosody and the requested prosody (intermediate prosody information generator (114)). The system executes a speech synthesis process to synthesize speech based on the generated intermediate prosody information and the stored speech element information (speech synthesizer (116)).
Type: Application
Filed: August 21, 2009
Publication date: August 11, 2011
Applicant: NEC CORPORATION
Inventor: Masanori Kato
-
Publication number: 20110184738
Abstract: TTS is a well-known technology that has been used for decades in applications ranging from artificial call-center attendants to PC software that allows people with visual impairments or reading disabilities to listen to written works on a home computer. However, to date TTS has not been widely adopted by PC and mobile users for daily reading tasks such as reading emails, PDF and Word documents, website content, and books. The present invention offers a new user experience for operating TTS in day-to-day usage. More specifically, this invention describes a synchronization technique for following text being read by TTS engines, and specific interfaces for touch pads and touch and multi-touch screens. This invention also describes the use of other input methods such as the touchpad, mouse, and keyboard.
Type: Application
Filed: January 25, 2011
Publication date: July 28, 2011
Inventors: Dror KALISKY, Sharon CARMEL
-
Publication number: 20110179452
Abstract: A device for providing a television sequence has a database interface, a search request receiver, a television sequence rendition module and an output interface. The database interface accesses at least one database, using a search request. The search request receiver is formed to control the database interface so as to acquire at least audio content and at least image content separate therefrom via the database interface for the search request. The television sequence rendition module combines the separate audio content and the image content to generate the television sequence based on the audio content and the image content. The output interface outputs the television sequence to a television sequence distributor.
Type: Application
Filed: January 21, 2011
Publication date: July 21, 2011
Inventors: Peter Dunker, Uwe Kuehhirt, Andreas Haupt, Christian Dittmar, Holger Grossman
-
Publication number: 20110153754
Abstract: In various embodiments, a method for receiving alerts through a network includes providing a device having a pop-up management module and a display; providing a communications interface between the device and one or more database systems located outside the network; providing a user interface configured to allow the user to selectively choose to display, on the display, one or more message types generated by the one or more database systems, wherein said one or more message types are received by said pop-up management module via the network and displayed on the display as a pop-up message. A related system includes a device registered in the network having a processor, a memory device, a transceiver, a user interface, and a display, wherein the processor is configured to control a pop-up management module for displaying one or more message types as a pop-up message. The device may be a WiMAX-enabled device and the network may be a WiMAX network.
Type: Application
Filed: December 22, 2009
Publication date: June 23, 2011
Applicant: CLEAR WIRELESS, LLC
Inventor: Don GUNASEKARA
-
Publication number: 20110153314
Abstract: A method for dynamically adjusting the spectral content of an audio signal, which increases the harmonic content of said audio signal, said method comprising translating an encoded digital signal into data bands, creating a psychoacoustic model to identify sections of said data bands that are deficient in harmonic quality, analyzing the fundamental frequency and amplitude of said harmonically deficient data bands, creating additional higher order harmonics for said harmonically deficient data bands, adding said higher order harmonics back to said encoded digital signal to form a newly enhanced signal, inverse filtering said newly enhanced signal, and converting said inverse filtered signal to an analog waveform for consumption by the listener.
Type: Application
Filed: February 28, 2011
Publication date: June 23, 2011
Inventors: J. Craig Oxford, Patrick Taylor, D. Michael Shields
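The harmonic-creation step in the abstract above can be sketched as summing sinusoids at integer multiples of the estimated fundamental and adding them back to the signal. This is only an illustration: the 1/k amplitude rolloff and the choice of harmonic orders are assumptions for the example, not details from the patent.

```python
import math

def add_harmonics(signal, f0, amplitude, sample_rate, orders=(2, 3)):
    """Add higher-order harmonics of fundamental `f0` (Hz) to a
    harmonically deficient band, sample by sample."""
    enhanced = list(signal)
    for n, s in enumerate(signal):
        t = n / sample_rate
        # synthesize the chosen higher-order partials with 1/k rolloff
        boost = sum((amplitude / k) * math.sin(2 * math.pi * k * f0 * t)
                    for k in orders)
        enhanced[n] = s + boost
    return enhanced
```

At t = 0 all sinusoids are zero, so the first sample is unchanged; later samples carry the added partials.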
-
Publication number: 20110144997
Abstract: A voice synthesis model generation device, a voice synthesis model generation system, a communication terminal device, and a method for generating a voice synthesis model, all of which are capable of preferably acquiring a user's voice. A voice synthesis model generation system is configured to include a mobile communication terminal device and a voice synthesis model generation device. The mobile communication terminal device includes a characteristic amount extraction portion that extracts a characteristic amount of input voice, and a text data acquisition portion that acquires text data from the voice.
Type: Application
Filed: July 7, 2009
Publication date: June 16, 2011
Applicant: NTT DOCOMO, INC.
Inventor: Noriko Mizuguchi
-
Publication number: 20110137655
Abstract: A speech synthesis system includes a server device and a client device. The server device stores speech element information and speech element identification information in association with each other so that, in a case that speech element information representing respective speech elements included in speech uttered by a speech registering user are arranged in the order of arrangement of the speech elements in the speech, at least one of the speech element identification information identifying the respective speech element information has different information from information arranged in accordance with a predetermined rule. The client device transmits speech element identification information to the server device based on accepted text information. The client device executes a speech synthesis process based on the speech element information received from the server device.
Type: Application
Filed: June 22, 2009
Publication date: June 9, 2011
Inventors: Reishi Kondo, Masanori Kato, Yasuyuki Mitsui
-
Publication number: 20110131516
Abstract: Provided are a content display device and a content display method each capable of reliably providing, even if a plurality of content items displayed on a single screen are to be read aloud consecutively, a user with voice reading each article, a program therefor, and a storage medium storing the program. A television (100) is a content display device capable of displaying a plurality of content items in a single screen and sequentially reading aloud by voice text strings relating to the respective content items. The television (100) includes a setting section (114) for setting the screen to have a display condition displaying a content item, among the content items, which has a text string relating to the content item and being currently read aloud in order to notify a user of the content item in such a manner that the content item is distinguishable from the other content item(s).
Type: Application
Filed: July 16, 2009
Publication date: June 2, 2011
Applicant: SHARP KABUSHIKI KAISHA
Inventors: Hirofumi Furukawa, Kiyotaka Kashito
-
Publication number: 20110111805
Abstract: A communication device establishes an audio connection with a far-end user via a communication network. The communication device receives text input from a near-end user, and converts the text input into speech signals. The speech signals are transmitted to the far-end user using the established audio connection while muting audio input to its microphone. Other embodiments are also described and claimed.
Type: Application
Filed: November 6, 2009
Publication date: May 12, 2011
Applicant: Apple Inc.
Inventors: Baptiste P. Paquier, Aram M. Lindahl, Phillip G. Tamchina
-
Publication number: 20110112836
Abstract: Electronic device and method for obtaining a digital speech signal and a control command relating to the digital speech signal while obtaining the digital speech signal, and for temporally associating the control command with a substantially corresponding time instant in the digital speech signal to which the control command was directed, wherein the control command determines one or more punctuation marks or other, optionally symbolic, elements to be at least logically positioned at a text location corresponding to the communication instant relative to the digital speech signal so as to cultivate the speech to text conversion procedure.
Type: Application
Filed: July 3, 2008
Publication date: May 12, 2011
Applicant: MOBITER DICTA OY
Inventors: Risto Kurki-Suonio, Andrew Cotton
-
Publication number: 20110106538
Abstract: This speech synthesis system includes a server device and a client device. The client device accepts text information representing text, and transmits a speech element request to the server device. The server device stores speech element information. The server device receives the speech element request transmitted by the client device and, in response to the received speech element request, transmits speech element information to the client device so that the speech element information is received by the client device in a different order from an order of arrangement of speech elements in speech corresponding to the text. The client device executes a speech synthesis process by rearranging the speech element information so that speech elements represented by the received speech element information are arranged in the same order as the order of arrangement of the speech elements in the speech corresponding to the text.
Type: Application
Filed: June 22, 2009
Publication date: May 5, 2011
Inventors: Reishi Kondo, Masanori Kato, Yasuyuki Mitsui
-
Publication number: 20110106537
Abstract: Embodiments of the invention address the deficiencies of the prior art by providing a method, apparatus, and program product for converting components of a web page to voice prompts for a user. In some embodiments, the method comprises selectively determining at least one HTML component from a plurality of HTML components of a web page to transform into a voice prompt for a mobile system based upon a voice attribute file associated with the web page. The method further comprises transforming the at least one HTML component into parameterized data suitable for use by the mobile system based upon at least a portion of the voice attribute file associated with the at least one HTML component and transmitting the parameterized data to the mobile system.
Type: Application
Filed: October 30, 2009
Publication date: May 5, 2011
Inventors: Paul M. Funyak, Norman J. Connors, Paul E. Kolonay, Matthew Aaron Nichols
-
Publication number: 20110099014
Abstract: Systems and methods are described for performing packet loss concealment (PLC) to mitigate the effect of one or more lost frames within a series of frames that represent a speech signal. In accordance with the exemplary systems and methods, PLC is performed by searching a codebook of speech-related parameter profiles to identify content that is being spoken and by selecting a profile associated with the identified content for use in predicting or estimating speech-related parameter information associated with one or more lost frames of a speech signal. The predicted/estimated speech-related parameter information is then used to synthesize one or more frames to replace the lost frame(s) of the speech signal.
Type: Application
Filed: September 21, 2010
Publication date: April 28, 2011
Applicant: BROADCOM CORPORATION
Inventor: Robert W. Zopf
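The codebook search in the abstract above can be sketched as a nearest-neighbor lookup over stored parameter profiles, followed by reading the profile's next value as the estimate for the lost frame. The squared-error distance, the profile layout, and the function names are assumptions for illustration; the patent's actual parameters and matching criterion may differ.

```python
def nearest_profile(history, codebook):
    """Return the codebook profile whose leading parameters best match
    the recent parameter history (squared-error distance, illustrative)."""
    def dist(profile):
        return sum((h - p) ** 2 for h, p in zip(history, profile))
    return min(codebook, key=dist)

def conceal_lost_frame(history, codebook):
    """Estimate the lost frame's parameter from the best-matching profile."""
    profile = nearest_profile(history, codebook)
    # predict the lost frame as the profile's continuation of the history
    return profile[len(history)]
```

Given `codebook = [[1.0, 2.0, 3.0], [5.0, 6.0, 7.0]]` and a received history `[1.0, 2.0]`, the first profile matches and its continuation, 3.0, is used for the missing frame.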
-
Publication number: 20110083075
Abstract: An emotive advisory system for use by one or more occupants of an automotive vehicle includes a directional speaker array, and a computer. The computer is configured to determine an audio direction, and output data representing an avatar for visual display. The computer is further configured to output data representing a spoken statement for the avatar for audio play from the speaker array such that the audio from the speaker array is directed in the determined audio direction. A visual appearance of the avatar and the spoken statement for the avatar convey a simulated emotional state.
Type: Application
Filed: October 2, 2009
Publication date: April 7, 2011
Applicant: FORD GLOBAL TECHNOLOGIES, LLC
Inventors: Perry Robinson MacNeille, Oleg Yurievitch Gusikhin, Kacie Alane Theisen
-
Patent number: 7921014
Abstract: A system for generating high-quality synthesized text-to-speech includes a learning data generating unit, a frequency data generating unit, and a setting unit. The learning data generating unit recognizes inputted speech, and then generates first learning data in which wordings of phrases are associated with readings thereof. The frequency data generating unit generates, based on the first learning data, frequency data indicating appearance frequencies of both wordings and readings of phrases. The setting unit sets the thus generated frequency data for a language processing unit in order to approximate outputted speech of text-to-speech to the inputted speech. Furthermore, the language processing unit generates, from a wording of text, a reading corresponding to the wording, on the basis of the appearance frequencies.
Type: Grant
Filed: July 9, 2007
Date of Patent: April 5, 2011
Assignee: Nuance Communications, Inc.
Inventors: Gakuto Kurata, Toru Nagano, Masafumi Nishimura, Ryuki Tachibana
-
Publication number: 20110077945
Abstract: This invention relates to a method, a computer program product, apparatuses and a system for extracting a coded parameter set from an encoded audio/speech stream, said audio/speech stream being distributed over a sequence of packets, and generating a time scaled encoded audio/speech stream in the parameter coded domain using said extracted coded parameter set.
Type: Application
Filed: June 6, 2007
Publication date: March 31, 2011
Applicant: NOKIA CORPORATION
Inventors: Pasi Sakari Ojala, Ari Kalevi Lakaniemi
-
Publication number: 20110077048
Abstract: The invention relates to a system for data correlation, having: a receiving device (1) having an image acquisition element (10) and a data set generator (12) for generating at least one object data set from at least one acquired first image, which represents a physical object, and an identification label, which uniquely determines an object-related acquisition procedure, and at least one information data set from at least one acquired second image, which represents coded information related to the physical object, and the identification label; a correlation device (2) for the extraction (20) of the coded information from the information data set, for the semantic analysis (22) of the extracted information, and for the generation of at least one combination data set from the results of the semantic analysis, the extracted information, and the at least one object data set with the same identification label as the extracted information data set; and a user device (3) for the storage and further use of the combination data.
Type: Application
Filed: March 3, 2009
Publication date: March 31, 2011
Applicant: Linguatec Sprachtechnologien GmbH
Inventor: Reinhard Busch
-
Publication number: 20110060590
Abstract: A synthetic speech text-input device is provided that allows a user to intuitively know an amount of an input text that can be fit in a desired duration. A synthetic speech text-input device 1 includes: an input unit that receives a set duration in which a speech to be synthesized is to be fit, and a text for a synthetic speech; a text amount calculation unit that calculates an acceptable text amount based on the set duration received by the input unit, the acceptable text amount being an amount of a text acceptable as a synthetic speech of the set duration; and a text amount output unit that outputs the acceptable text amount calculated by the text amount calculation unit, when the input unit receives the text.
Type: Application
Filed: September 10, 2010
Publication date: March 10, 2011
Applicant: FUJITSU LIMITED
Inventors: Nobuyuki Katae, Kentaro Murase
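The acceptable-text-amount calculation described above reduces, in its simplest form, to multiplying the set duration by a speaking rate. The sketch below assumes a fixed characters-per-second constant purely for illustration; a real device would derive the rate from the synthesizer's actual speaking speed and the language involved.

```python
def acceptable_text_amount(duration_seconds, chars_per_second=15.0):
    """Estimate how many characters of text fit in the set duration
    (chars_per_second is an assumed, illustrative rate)."""
    return int(duration_seconds * chars_per_second)

def fits(text, duration_seconds, chars_per_second=15.0):
    """Report whether the input text is within the acceptable amount."""
    return len(text) <= acceptable_text_amount(duration_seconds, chars_per_second)
```

For a 10-second slot at 15 characters per second, the device would report an acceptable amount of 150 characters and flag longer inputs as over the limit.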
-
Publication number: 20110050594
Abstract: A user interface for a touch-screen display of a dedicated handheld electronic book reader device is described. The user interface detects human gestures manifest as pressure being applied by a finger or stylus to regions on the touch-screen display. In one implementation, the touch-screen user interface enables a user to turn one or more pages in response to applying a force or pressure to the touch-screen display. In another implementation, the touch-screen user interface is configured to bookmark a page temporarily by applying a pressure to the display, then allowing a user to turn pages to a new page, but reverting back to a previously-displayed page when the pressure is removed. In another implementation, the touch-screen user interface identifies and filters electronic books based on book size and/or a time available to read a book. In another implementation, the touch-screen user interface converts text to speech in response to a user touching the touch-screen display.
Type: Application
Filed: September 2, 2009
Publication date: March 3, 2011
Inventors: John T. Kim, Christopher Green, Joseph J. Hebenstreit, Kevin E. Keller
-
Publication number: 20110054880Abstract: Techniques and systems for content transformation between devices are disclosed. In one aspect, a system includes a host device that sends content to client devices, and client devices that receive content from the host device in one format and transform the content into a different format. The client devices present the transformed content to users. In another aspect, the host device presents content in a native format, determines that a client device requires the content to be in a different format, converts the content to a reference format, and sends the converted content to the client device.Type: ApplicationFiled: September 2, 2009Publication date: March 3, 2011Inventor: Christopher B. Fleizach
-
Publication number: 20110046957Abstract: Techniques are disclosed for frequency splicing in which speech segments used in the creation of a final speech waveform are constructed, at least in part, by combining (e.g., summing) a small number (e.g., two) of component speech segments that overlap substantially, or entirely, in time but have spectral energy that occupies disjoint, or substantially disjoint, frequency ranges. The component speech segments may be derived from speech segments produced by different speakers or from different speech segments produced by the same speaker. Depending on the embodiment, frequency splicing may supplement rule-based, concatenative, hybrid, or limited-vocabulary speech synthesis systems to provide various advantages.Type: ApplicationFiled: August 24, 2010Publication date: February 24, 2011Applicant: NovaSpeech, LLCInventors: Susan R. Hertz, Harold G. Mills
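A toy illustration of the frequency-splicing idea in the abstract above: two component segments that overlap in time but occupy disjoint frequency ranges are combined by summing. The sine-tone "segments", sample rate, and band choices are assumptions for illustration only:

```python
import math

# Hedged toy sketch of frequency splicing: sum two time-aligned component
# segments whose spectral energy occupies disjoint frequency ranges.
# Real systems would use recorded speech segments, not pure tones.

SAMPLE_RATE = 8000

def tone(freq_hz: float, n_samples: int):
    """A band-limited stand-in for a component speech segment."""
    return [math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
            for n in range(n_samples)]

def frequency_splice(low_band_segment, high_band_segment):
    """Combine two overlapping segments by sample-wise summation."""
    assert len(low_band_segment) == len(high_band_segment)
    return [a + b for a, b in zip(low_band_segment, high_band_segment)]

low = tone(200.0, 160)    # e.g. low-frequency energy from one source
high = tone(3000.0, 160)  # e.g. high-frequency energy from another source
spliced = frequency_splice(low, high)
```

Because the two components are (substantially) disjoint in frequency, each can come from a different speaker or a different utterance without the sum sounding like two overlaid voices.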
-
Publication number: 20110046943Abstract: A data processing method and apparatus that may set emotion based on development of a story are provided. The method and apparatus may set emotion without inputting emotion for each sentence of text data. Emotion setting information is generated based on development of the story and the like, and may be applied to the text data.Type: ApplicationFiled: April 5, 2010Publication date: February 24, 2011Applicant: SAMSUNG ELECTRONICS CO., LTD.Inventors: Dong Yeol Lee, Seung Seop Park, Jae Hyun Ahn
-
Publication number: 20110040554Abstract: A procedure to automatically evaluate the spoken fluency of a speaker by prompting the speaker to talk on a given topic, recording the speaker's speech to get a recorded sample of speech, and then analyzing the patterns of disfluencies in the speech to compute a numerical score to quantify the spoken fluency skills of the speaker. The numerical fluency score accounts for various prosodic and lexical features, including formant-based filled-pause detection, closely-occurring exact and inexact repeat N-grams, and the normalized average distance between consecutive occurrences of N-grams. The lexical features and prosodic features are combined to classify the speaker with a C-class classification and develop a rating for the speaker.Type: ApplicationFiled: August 15, 2009Publication date: February 17, 2011Applicant: International Business Machines CorporationInventors: Kartik Audhkhasi, Om D. Deshmukh, Kundan Kandhway, Ashish Verma
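A hedged sketch of combining disfluency features into a single score and a C-class rating, as the abstract describes. The feature names, weights, and score range are illustrative assumptions, not the patent's values:

```python
# Hedged sketch: combine prosodic and lexical disfluency features into a
# fluency score, then bucket the score into one of C classes.
# All weights below are made-up placeholders, not the patent's parameters.

def fluency_score(filled_pause_rate: float,
                  repeat_ngram_rate: float,
                  avg_ngram_distance: float) -> float:
    """Higher disfluency rates lower the score; a larger average distance
    between repeated N-grams raises it. Clamped to [0, 100]."""
    score = (100.0
             - 400.0 * filled_pause_rate   # formant-based filled pauses
             - 300.0 * repeat_ngram_rate   # closely-occurring repeat N-grams
             + 2.0 * avg_ngram_distance)   # spread-out repeats penalized less
    return max(0.0, min(100.0, score))

def c_class_rating(score: float, c: int = 4) -> int:
    """Map a 0-100 score onto one of `c` fluency classes (0 = least fluent)."""
    return min(c - 1, int(score / (100.0 / c)))
```

The patent's classifier would be trained rather than hand-weighted; this only shows the shape of a feature-combination-plus-C-class pipeline.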
-
Publication number: 20110035222Abstract: Systems and methods for selecting one of several audio clips associated with a text item for playback are provided. The electronic device can determine which audio clip to play back at any point in time using different approaches, including for example receiving a user selection or randomly selecting audio clips. In some embodiments, the electronic device can intelligently select audio clips based on attributes of the media item, the electronic device operations, or the environment of the electronic device. The attributes can include, for example, metadata values of the media item, the type of ongoing operations of the electronic device, and environmental characteristics that can be measured or detected using sensors of or coupled to the electronic device. Different audio clips can be associated with particular attribute values, such that an audio clip corresponding to the detected or received attribute values is played back.Type: ApplicationFiled: August 4, 2009Publication date: February 10, 2011Applicant: Apple Inc.Inventor: Jon Schiller
-
Publication number: 20110035223Abstract: Systems and methods for retrieving and playing back audio clips for streamed or remotely received media items are provided. An electronic device can provide audio clips identifying media items at any suitable time, including for example to identify media items that are currently played back or available for playback. When the media items played back are not locally stored, the electronic device may not have a corresponding audio clip locally stored. In such cases, the electronic device can identify a streamed media item, and retrieve an audio clip corresponding to text items associated with the media item. For example, the electronic device can retrieve audio clips corresponding to the artist, title and album of the received media item. The electronic device can retrieve audio clips from any suitable source, such as a dedicated audio clip server or other remote source, a remote text-to-speech engine, or a locally stored text-to-speech engine.Type: ApplicationFiled: August 4, 2009Publication date: February 10, 2011Applicant: Apple Inc.Inventor: Jon Schiller
-
Publication number: 20110018889Abstract: A media processing comparison system (“MPCS”) and techniques facilitate concurrent, subjective quality comparisons between media presentations produced by different instances of media processing components performing the same functions (for example, instances of media processing components in the form of hardware, software, and/or firmware, such as parsers, codecs, decryptors, and/or demultiplexers, supplied by the same or different entities) in a particular media content player. The MPCS receives an ordered stream of encoded media samples from a media source, and decodes a particular encoded media sample using two or more different instances of media processing components. A single renderer renders and/or coordinates the synchronous presentation of decoded media samples from each instance of media processing component(s) as separate media presentations. The media presentations may be subjectively compared and/or selected for storage by a user in a sample-by-sample manner.Type: ApplicationFiled: July 23, 2009Publication date: January 27, 2011Applicant: MICROSOFT CORPORATIONInventors: Firoz Dalal, Shyam Sadhwani
-
Publication number: 20110015930Abstract: A unified communication system is disclosed that allows a variety of end point types to participate in a communication event using a common, unified communication system. In some implementations, a calling party interacts with a client application residing on an endpoint to make a communication request to another endpoint. A communication event manager residing in the unified communication system selects a script from a repository of scripts based on the communication event and the capabilities of the endpoints. A communication event execution engine receives a user profile associated with at least one of the endpoints. The user profile can be configured by the user to describe the user's preferences for how the communication should be processed by the unified communication system.Type: ApplicationFiled: September 7, 2010Publication date: January 20, 2011Applicant: INTELEPEER, INC.Inventors: John Ward, Haydar Haba, Charles Studt, Peter Antypas, Jonathan Green
-
Publication number: 20110015929Abstract: A contextual input device includes a plurality of tactually discernable keys disposed in a predetermined configuration which replicates a particular relationship among a plurality of items associated with a known physical object. The tactually discernable keys are typically labeled with Braille type. The known physical object is typically a collection of related items grouped together by some common relationship. A computer-implemented process determines whether an input signal represents a selection of an item from among a plurality of items or an attribute pertaining to an item among the plurality of items. Once the selected item or attribute pertaining to an item is determined, the computer-implemented process transforms a user's selection from the input signal into an analog audio signal which is then audibly output as human speech with an electro-acoustic transducer.Type: ApplicationFiled: July 17, 2009Publication date: January 20, 2011Applicant: Calpoly CorporationInventors: Dennis Fantin, C. Arthur MacCarley
-
Publication number: 20110010179Abstract: A method and an apparatus for voice synthesis and processing have been presented. In one exemplary method, a first audio recording of a human speech in a natural language is received. A speech analysis-synthesis algorithm is then applied to the first audio recording to synthesize a second audio recording from the first audio recording such that the second audio recording sounds humanistic and consistent, but unintelligible.Type: ApplicationFiled: July 13, 2009Publication date: January 13, 2011Inventor: Devang K. Naik
-
Publication number: 20100332232Abstract: A method and device for updating statuses of synthesis filters are provided. The method includes: exciting a synthesis filter corresponding to a first encoding rate by using an excitation signal of the first encoding rate, outputting reconstructed signal information, and updating status information of the synthesis filter and a synthesis filter corresponding to a second encoding rate. In the present disclosure, the status of the synthesis filter corresponding to the current rate and the statuses of the synthesis filters at other rates are updated. Thus, synchronization between the statuses of the synthesis filters corresponding to different rates at the encoding terminal may be realized, thereby facilitating the consistency of the reconstructed signals of the encoding and decoding terminals when the encoding rate is switched, and improving the quality of the reconstructed signal of the decoding terminal.Type: ApplicationFiled: September 16, 2010Publication date: December 30, 2010Inventor: Jinliang DAI
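A minimal sketch of the filter-status synchronization described in the abstract above: the filter for the active encoding rate is excited, and its resulting state is copied into the filter for the other rate so a later rate switch starts from a consistent state. The one-pole filter and all names are illustrative assumptions, not the patent's filter structure:

```python
# Hedged sketch: keep two synthesis filters' states in sync across rates.
# A real codec uses LPC synthesis filters; a one-pole filter stands in here.

class OnePoleSynthesisFilter:
    def __init__(self, coeff: float):
        self.coeff = coeff
        self.state = 0.0  # filter status (memory)

    def process(self, excitation):
        """Filter the excitation, updating the internal state as we go."""
        out = []
        for x in excitation:
            self.state = x + self.coeff * self.state
            out.append(self.state)
        return out

def excite_and_sync(active, inactive, excitation):
    """Excite the filter for the current rate, output the reconstructed
    signal, and update the other rate's filter status to match."""
    reconstructed = active.process(excitation)
    inactive.state = active.state  # synchronize the idle filter's status
    return reconstructed
```

Keeping the idle filter's memory synchronized is what lets the decoder produce a consistent reconstructed signal at the moment the encoding rate switches.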
-
Publication number: 20100330975Abstract: The invention provides an internet radio interface for use in vehicles. The interface allows a device unit, with wireless capability and voice interface technology, to communicate with a vehicle, mobile phone, and portal in order to manage and upload various user preferences to the device unit as set out by the user prior to getting into the vehicle. The device unit interacts with the user to permit various functions and access preferred channels as well as managing secondary functions of the user, including cell phone communications.Type: ApplicationFiled: June 28, 2010Publication date: December 30, 2010Inventor: Otman A. Basir
-
Publication number: 20100324907Abstract: The invention proposes the synthesis of a signal consisting of consecutive blocks. It proposes more particularly, on receipt of such a signal, to replace, by synthesis, lost or erroneous blocks of this signal. To this end, it proposes an attenuation of the overvoicing during the generation of a signal synthesis. More particularly, a voiced excitation is generated on the basis of the pitch period (T) estimated or transmitted at the previous block, by optionally applying a correction of plus or minus a sample of the duration of this period (counted in terms of number of samples), by constituting groups (A′,B′,C′,D′) of at least two samples and inverting positions of samples in the groups, randomly (B′,C′) or in a forced manner. An over-harmonicity in the excitation generated is thus broken and the effect of overvoicing in the synthesis of the generated signal is thereby attenuated.Type: ApplicationFiled: October 17, 2007Publication date: December 23, 2010Applicant: France TelecomInventors: David Virette, Balazs Kovesi
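A toy sketch of the over-harmonicity attenuation described above: a voiced excitation is built by repeating the previous block's pitch period, then sample positions are inverted within small groups, randomly, to break the exact periodicity. The group size, repetition scheme, and seeding are assumptions for illustration:

```python
import random

# Hedged sketch: generate a voiced excitation from the previous block's pitch
# period, then break its over-harmonicity by randomly reversing the order of
# samples inside consecutive small groups.

def generate_excitation(prev_period, n_samples):
    """Repeat the previous block's pitch-period samples to fill n_samples."""
    return [prev_period[i % len(prev_period)] for i in range(n_samples)]

def break_harmonicity(excitation, group_size=2, rng=None):
    """Randomly invert sample positions inside consecutive groups, leaving
    the set of sample values (hence the energy) unchanged."""
    rng = rng or random.Random(0)  # seeded here only for reproducibility
    out = list(excitation)
    for start in range(0, len(out) - group_size + 1, group_size):
        if rng.random() < 0.5:
            out[start:start + group_size] = reversed(
                out[start:start + group_size])
    return out
```

Because samples are only reordered locally, the excitation keeps its overall envelope while the exact period-to-period repetition (the source of the buzzy overvoiced quality) is disturbed.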
-
Publication number: 20100324902Abstract: Disclosed are techniques and systems to provide a narration of a text in multiple different voices. In some aspects, systems and methods described herein can include receiving a user-based selection of a first portion of words in a document where the document has a pre-associated first voice model and overwriting the association of the first voice model, by the one or more computers, with a second voice model for the first portion of words.Type: ApplicationFiled: January 14, 2010Publication date: December 23, 2010Inventors: Raymond Kurzweil, Paul Albrecht, Peter Chapman