Speech Synthesis; Text To Speech Systems (epo) Patents (Class 704/E13.001)

E Subclasses

Methods for producing synthetic speech; speech synthesizers (epo) (Class 704/E13.002)

Concept-to-speech synthesizers; generation of natural phrases not from text but from machine-based concepts (EPO) (Class 704/E13.003)
Sound editing, manipulating voice of the synthesizer (EPO) (Class 704/E13.004)

Details of speech synthesis systems, e.g., synthesizer architecture, memory management, etc. (epo) (Class 704/E13.005)

Elementary speech units used in speech synthesizers; concatenation rules (epo) (Class 704/E13.009)

Concatenation (EPO) (Class 704/E13.01)

Text analysis, generation of parameters for speech synthesis out of text, e.g., grapheme to phoneme translation, prosody generation, stress, or intonation determination, etc. (epo) (Class 704/E13.011)

METHOD AND APPARATUS FOR CREATING CUSTOMIZED TEXT-TO-SPEECH PODCASTS AND VIDEOS INCORPORATING ASSOCIATED MEDIA

Publication number: 20090204243

Abstract: Method and apparatus for creating a customized text-to-speech podcast by receiving a text file, parsing and tagging the text file, creating multiple audio files by text-to-speech technology, and creating a podcast by combining the audio files. The podcast can be an audio podcast or a video podcast. Video podcasts associate related video content with the audio content.

Type: Application

Filed: January 9, 2009

Publication date: August 13, 2009

Inventors: Harpreet MARWAHA, Brett ROBINSON
INFORMATION PROCESSING DEVICE AND INFORMATION PROCESSING METHOD

Publication number: 20090195351

Abstract: An information processing device includes: a busy-level acquiring section for acquiring information on user's busy-level; a controller for determining a presentation form of information currently presented according to the user's busy-level acquired by the busy-level acquiring section; an information processor for performing a predetermined processing to the information under the control of the controller; and an output processor for outputting the information having been subjected to the processing by the information processor to an output section.

Type: Application

Filed: January 21, 2009

Publication date: August 6, 2009

Applicant: Sony Corporation

Inventors: Naoki Takeda, Tetsujiro Kondo
SCRIPT CONTROL FOR LIP ANIMATION IN A SCENE GENERATED BY A COMPUTER RENDERING ENGINE

Publication number: 20090184967

Abstract: A system for controlling a rendering engine by using specialized commands. The commands are used to generate a production, such as a television show, at an end-user's computer that executes the rendering engine. In one embodiment, the commands are sent over a network, such as the Internet, to achieve broadcasts of video programs at very high compression and efficiency. Commands for setting and moving camera viewpoints, animating characters, and defining or controlling scenes and sounds are described. At a fine level of control, math models and coordinate systems can be used to make specifications. At a coarse level of control, the command language approaches the text format traditionally used in television or movie scripts. Simple names for objects within a scene are used to identify items, directions and paths. Commands are further simplified by having the rendering engine use defaults when specifications are left out.

Type: Application

Filed: October 27, 2008

Publication date: July 23, 2009

Inventor: Charles J. Kulas
HANDHELD DICTIONARY AND TRANSLATION APPARATUS

Publication number: 20090182548

Abstract: Provided herein is a dictionary and/or translation device that is handheld and, thus, portable. The device may be incorporated in or with a cellular telephone, a personal digital assistant (PDA), a pager, a handheld computer, or the like. A page of text (or a portion thereof) may be photographed as a digital image, and the digitally photographed page may be converted to electronic text using an optical character recognition system. The converted text may be viewed on a display that includes a system for highlighting a word or words from the output of the optical character recognition system for definition or translation. The definition or translation of the highlighted word or words may be provided and output to either one of the display or an audio output (such as a speaker) or both.

Type: Application

Filed: November 24, 2008

Publication date: July 16, 2009

Inventor: Jan Scott Zwolinski
Voice Intelligibility Enhancement System and Voice Intelligibility Enhancement Method

Publication number: 20090175459

Abstract: In a voice intelligibility enhancement system that controls a gain of a voice signal based on noise power and voice power of the voice signal generated by a voice signal generation unit, it is detected whether the voice power is equal to or greater than a predetermined level, noise power output when the voice power is less than the predetermined level is measured and stored, noise power to be output when the voice power exceeds the predetermined level is estimated to be the stored noise power, and gain of a voice signal is controlled on the basis of the voice power and the estimated noise power.

Type: Application

Filed: December 19, 2008

Publication date: July 9, 2009

Inventors: Toru Marumoto, Nozomu Saito
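
The gain-control logic described in the abstract above (publication 20090175459) can be sketched roughly as follows. This is a minimal illustration, not the applicants' implementation: the class name `GainController`, the parameters `target_snr` and `max_gain`, and the specific gain rule are all assumptions introduced here. Only the idea of storing the noise power while the voice power is below a threshold, then reusing that stored value while voice is present, comes from the abstract.

```python
from dataclasses import dataclass


@dataclass
class GainController:
    """Hypothetical sketch of noise-aware gain control for a generated voice signal."""

    voice_threshold: float           # the "predetermined level" for voice power
    target_snr: float = 4.0          # assumed desired voice-to-noise power ratio
    max_gain: float = 8.0            # assumed upper bound on the power gain
    stored_noise_power: float = 0.0  # last noise power measured while voice was quiet

    def update(self, voice_power: float, measured_noise_power: float) -> float:
        """Return the power gain to apply for the current frame."""
        if voice_power < self.voice_threshold:
            # Voice is below the threshold: the measurement is treated as
            # noise only, so it is stored for later use.
            self.stored_noise_power = measured_noise_power
            return 1.0
        # Voice is present: the live measurement is ignored and the noise
        # power is estimated to be the stored value.
        noise = self.stored_noise_power
        if noise <= 0.0:
            return 1.0
        # Assumed rule: raise the voice just enough to reach the target ratio,
        # never attenuating and never exceeding max_gain.
        required = self.target_snr * noise / voice_power
        return min(max(required, 1.0), self.max_gain)
```

Clamping the gain to the range from 1 to `max_gain` is a design choice made for this sketch so that quiet environments leave the voice untouched and loud ones cannot drive the output arbitrarily high.
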
SYSTEMS AND METHODS FOR RESPONDING TO NATURAL LANGUAGE SPEECH UTTERANCE

Publication number: 20090171664

Abstract: Systems and methods for receiving natural language queries and/or commands and executing the queries and/or commands. The systems and methods overcome the deficiencies of prior art speech query and response systems through the application of a complete speech-based information query, retrieval, presentation and command environment. This environment makes significant use of context, prior information, domain knowledge, and user specific profile data to achieve a natural environment for one or more users making queries or commands in multiple domains. Through this integrated approach, a complete speech-based natural language query and response environment can be created. The systems and methods create, store and use extensive personal profile information for each user, thereby improving the reliability of determining the context and presenting the expected results for a particular question or command.

Type: Application

Filed: February 4, 2009

Publication date: July 2, 2009

Inventors: Robert A. Kennewick, David Locke, Michael R. Kennewick, SR., Michael R. Kennewick, JR., Richard Kennewick, Tom Freeman
Interpolation Device, Audio Reproduction Device, Interpolation Method, and Interpolation Program

Publication number: 20090171666

Abstract: An interpolation device (4) includes a band extraction high-pass filter (11) for extracting a frequency component of a predetermined lower limit frequency or above from reproduction data obtained by digitizing an audio waveform signal; a multiplier (13) for frequency-shifting the frequency component extracted by the band extraction high-pass filter (11); a lower side wave band suppression high-pass filter (14) for suppressing the frequency component of the lower side wave band in the frequency component subjected to frequency shift by the multiplier (13); and an adder (17) for adding the frequency component after suppression by the lower side wave band suppression high-pass filter (14). It is possible to reduce the processing load.

Type: Application

Filed: November 29, 2006

Publication date: July 2, 2009

Applicant: Kabushiki Kaisha Kenwood

Inventor: Hideki Ohtsu
Providing notifications using text-to-speech conversion

Publication number: 20090156240

Abstract: Methods and systems for providing electronic notifications are described. A server is configured to serve an interface, such as a Web page, to a terminal that requests from a user a first set of user contacts to be used to provide notifications to the user by a telephonic notification system in response to a notification process initiated by an organization associated with the user. The interface further requests a first set of priorities corresponding to the first set of user contacts, wherein the notification system will attempt to provide notifications to the first set of user contacts in an order based at least in part on the first set of priorities. A database is configured to store the first set of user contacts and the first set of priorities. A voice interface circuit is configured to transmit a voice notification to at least one of the first set of user contacts.

Type: Application

Filed: November 24, 2008

Publication date: June 18, 2009

Applicant: 3N GLOBAL, INC.

Inventors: Steve Kirchmeier, Cinta Putra
METHOD AND APPARATUS FOR TRAINING DIFFERENCE PROSODY ADAPTATION MODEL, METHOD AND APPARATUS FOR GENERATING DIFFERENCE PROSODY ADAPTATION MODEL, METHOD AND APPARATUS FOR PROSODY PREDICTION, METHOD AND APPARATUS FOR SPEECH SYNTHESIS

Publication number: 20090157409

Abstract: A method includes generating, for each parameter of the prosody vector, an initial parameter prediction model with a plurality of attributes related to difference prosody prediction and at least part of attribute combinations of the plurality of attributes, in which each of the plurality of attributes and the attribute combinations is included as an item, calculating importance of each item in the parameter prediction model, deleting the item having the lowest importance calculated, re-generating a parameter prediction model with the remaining items, determining whether the re-generated parameter prediction model is an optimal model, and repeating the step of calculating importance and the steps following the step of calculating importance with the re-generated parameter prediction model, if the re-generated parameter prediction model is determined as not an optimal model, wherein the difference prosody vector and all parameter prediction models of the difference prosody vector constitute the difference pros

Type: Application

Filed: December 4, 2008

Publication date: June 18, 2009

Inventors: Yi Lifu, Li Jian, Lou Xiaoyan, Hao Jie
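
The iterative model-pruning loop in the abstract above (publication 20090157409) resembles backward elimination, and can be sketched as below. The abstract does not say how importance is computed or how optimality is judged, so both are passed in as caller-supplied functions; the names `backward_eliminate`, `fit`, `importance`, and `is_optimal` are hypothetical and chosen for this illustration only.

```python
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

Model = TypeVar("Model")


def backward_eliminate(
    items: Sequence[str],
    fit: Callable[[List[str]], Model],
    importance: Callable[[Model, List[str]], Dict[str, float]],
    is_optimal: Callable[[Model, List[str]], bool],
) -> Tuple[Model, List[str]]:
    """Repeatedly drop the least important item and refit until the model is judged optimal."""
    remaining = list(items)
    # Initial model built from every attribute and attribute combination.
    model = fit(remaining)
    while len(remaining) > 1:
        # Score each item in the current model.
        scores = importance(model, remaining)
        # Delete the item with the lowest importance.
        weakest = min(remaining, key=lambda item: scores[item])
        remaining.remove(weakest)
        # Re-generate the model with the remaining items.
        model = fit(remaining)
        # Stop as soon as the caller's criterion accepts the model.
        if is_optimal(model, remaining):
            break
    return model, remaining
```

Keeping the fitting, scoring, and stopping rules as parameters avoids committing to any particular statistical criterion, since the abstract leaves that choice open.
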
Systems and methods for generating verbal feedback messages in head-worn electronic devices

Publication number: 20090144061

Abstract: Systems and methods for generating and providing verbal feedback messages to wearers of man-machine interface (MMI)-enabled head-worn electronic devices. An exemplary head-worn electronic device includes an MMI and an acoustic signal generator configured to provide verbal acoustic messages to a wearer of the head-worn electronic device in response to the wearer's interaction with the MMI. The head-worn electronic device may be further configured to monitor device states and generate and provide verbal acoustic messages indicative of changes to the device states to the wearer. The verbal messages are digitally stored and accessed by a microprocessor configured to execute a verbal feedback generation program. Further, the verbal messages may be stored according to multiple different natural languages, thereby allowing a user to select a preferred natural language by which the verbal acoustic messages are fed back to the user.

Type: Application

Filed: November 29, 2007

Publication date: June 4, 2009

Inventors: Jacob T. Meyberg, Eric R. Bradford, Stephen V. Cahill
Method For Operating A Navigation Device

Publication number: 20090143982

Abstract: A method for operating a navigation device that includes an input device for inputting operator commands and/or locations, particularly starting points and/or destinations, a road network database, a route calculation unit for calculating a planned route with consideration of the locations and the road network database, wherein the route leads from the starting point to the destination, a signal receiving unit for receiving position signals, particularly GPS signals, a position determining unit that determines the current position based on the position signals, and a voice output module that is able to generate and acoustically output a voice message, particularly maneuvering instructions, in dependence on predetermined boundary conditions by combining at least two voice message elements, wherein the voice message elements to be combined are analyzed prior to the acoustic output of the voice message, and wherein the voice message is changed in accordance with predetermined prioritization rules depending on th

Type: Application

Filed: November 25, 2008

Publication date: June 4, 2009

Inventors: Jochen Katzer, Thorsten W. Schmidt, Matthias Kahlow
SPEECH PROCESSING APPARATUS AND SPEECH SYNTHESIS APPARATUS

Publication number: 20090144053

Abstract: An information extraction unit extracts spectral envelope information of L-dimension from each frame of speech data. The spectral envelope information does not have a spectral fine structure. A basis storage unit stores N bases (L>N>1). Each basis is differently a frequency band having a maximum as a peak frequency in a spectral domain having L-dimension. A value corresponding to a frequency outside the frequency band along a frequency axis of the spectral domain is zero. Two frequency bands of which two peak frequencies are adjacent along the frequency axis partially overlap. A parameter calculation unit minimizes a distortion between the spectral envelope information and a linear combination of each basis with a coefficient by changing the coefficient, and sets the coefficient of each basis from which the distortion is minimized to a spectral envelope parameter of the spectral envelope information.

Type: Application

Filed: December 3, 2008

Publication date: June 4, 2009

Applicant: KABUSHIKI KAISHA TOSHIBA

Inventors: Masatsune TAMURA, Katsumi TSUCHIYA, Takehiko KAGOSHIMA
System and Method for Generating a Web Podcast Service

Publication number: 20090144060

Abstract: Disclosed is a system and method for generating a web podcast interview that allows a single user to create his own multi-voices interview from his computer. The method allows the user to enter a set of questions from a text file using a text editor. (Answers may also be entered from a text file although this is not the more preferred embodiment.) For each question, the user may select one particular interviewer voice among a plurality of predefined interviewer voices, and by using a text-to-speech module in a text-to-speech server, each question is converted into an audio question having the selected interviewer voice. Then, the user preferably records answers to each audio question using a telephone. And a questions/answers sequence in a podcast compliant format is generated.

Type: Application

Filed: December 1, 2008

Publication date: June 4, 2009

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Steve Groeger, Brian Heasman, Christopher von Koschembahr, Yuk-Lun Wong
Method, Apparatus, and Program for Certifying a Voice Profile When Transmitting Text Messages for Synthesized Speech

Publication number: 20090144057

Abstract: A mechanism is provided for authenticating and using a personal voice profile. The voice profile may be issued by a trusted third party, such as a certification authority. The personal voice profile may include information for generating a digest or digital signature for text messages. A speech synthesis system may speak the text message using the voice characteristics, such as prosodic characteristics, only if the voice profile is authenticated and the text message is valid and free of tampering.

Type: Application

Filed: April 8, 2008

Publication date: June 4, 2009

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Rafael Graniello Cabezas, Jason Eric Moore, Elizabeth Silvia
DATA PROCESSING DEVICE AND COMPUTER-READABLE STORAGE MEDIUM STORING SET OF PROGRAM INSTRUCTIONS EXCUTABLE ON DATA PROCESSING DEVICE

Publication number: 20090138268

Abstract: A data processing device includes a displaying unit, a receiving unit, a determining unit, and a controlling unit. The displaying unit displays one of a first operation screen and a second operation screen. Input data is inputted into the receiving unit by a user. The determining unit determines, based on at least one of the input data and settings of an OS, which of the first operation screen and the second operation screen should be displayed on the displaying unit. The controlling unit controls the displaying unit to display the first operation screen if the determining unit determines that the first operation screen should be displayed on the displaying unit, and controls the displaying unit to display the second operation screen if the determining unit determines that the second operation screen should be displayed on the displaying unit.

Type: Application

Filed: November 26, 2008

Publication date: May 28, 2009

Applicant: BROTHER KOGYO KABUSHIKI KAISHA

Inventors: Hirotoshi MAEHIRA, Masahiro FUJISHITA
Unsupervised Topic Segmentation of Acoustic Speech Signal

Publication number: 20090132252

Abstract: Disclosed methods and apparatus segment a signal, such as an acoustic speech signal, into coherent segments, such as coherent topics. In the case of an acoustic speech signal, the segmentation relies on only raw acoustic information and may be performed without requiring access to, or generation of, a transcript of the acoustic speech signal. Recurring acoustic patterns are found by matching pairs of sounds, based on acoustic similarity. Information about distributional similarity from multiple local comparisons is aggregated and is further processed to fill gaps in the data by growing regions that represent recurring acoustic patterns. Selection criteria are used to identify coherent topics represented by the grown regions and topic boundaries therebetween. Another signal, such as a video signal, may be partitioned according to topic boundaries identified in an acoustic speech signal that is related to the video signal.

Type: Application

Filed: November 20, 2007

Publication date: May 21, 2009

Applicant: MASSACHUSETTS INSTITUTE OF TECHNOLOGY

Inventors: Igor Malioutov, Alex Park
System and Method for Receiving User-Specific Information Over Digital Radio

Publication number: 20090116584

Abstract: A system and method for permitting users to receive desired/user-specific data or information, e.g., electronic mail or other user-subscribed services, e.g., for textual information, over the airwaves via a receiver, e.g., a digital receiver. In one embodiment, an authorization process is provided wherein the receiver includes a hard-coded user ID stored thereon for comparison with an input user ID encoded with user-specific data signals. A splitter permits simultaneous processing of, e.g., radio frequency signals including user-specific information via an authorization path, as well as radio frequency signals having standard audio and/or audio/video information.

Type: Application

Filed: July 14, 2005

Publication date: May 7, 2009

Applicant: THOMSON LICENSING

Inventor: Danny Hardin
PREDICTING A RESULTANT ATTRIBUTE OF A TEXT FILE BEFORE IT HAS BEEN CONVERTED INTO AN AUDIO FILE

Publication number: 20090112597

Abstract: An apparatus for predicting a resultant attribute of a text file before it has been converted to an audio file by a text-to-speech converter application. In accordance with an embodiment, the apparatus includes: a receiver component for receiving a text file and a request to determine a resultant attribute of the text file before it is converted to an audio file, by a text-to-speech converter component; a calculation component for determining a file type associated with the received text file and the size of the received text file; a calculation component for identifying an attribute associated with the determined file type; and a calculation component for determining from the identified attribute and the size of the received text file a resultant attribute of the text file before it is converted to an audio file by the text-to-speech converter component.

Type: Application

Filed: October 22, 2008

Publication date: April 30, 2009

Inventors: Declan Tarrant, Edward G. Mackle, Eamon Phelan, Keith Pilson
MESSAGE DELIVERY USING A VOICE MAIL SYSTEM

Publication number: 20090110159

Abstract: Voicemail systems and methods can provide a user with means for receiving categorized messages from parties. The categories can be independent of the intended recipients of the messages, such that multiple users can receive the same message. A user can subscribe to receive categorized messages within selected categories or from selected parties. A registered party, including a merchant, an organization, a government agency and/or another party, can input messages to selected categories and can input distribution parameters for the messages. Expiration dates can be associated with the messages such that messages can be deleted once expired.

Type: Application

Filed: December 31, 2008

Publication date: April 30, 2009

Inventor: Rohit Satish Kalbag
METHOD AND APPARATUS FOR PREPARING A DOCUMENT TO BE READ BY TEXT-TO-SPEECH READER

Publication number: 20090099846

Abstract: There is disclosed a method and system for preparing a document to be read by a text-to-speech reader. The method can include identifying two or more voice types available to the text-to-speech reader, identifying the text elements within the document, grouping related text elements together, and classifying the text elements according to voice types available to the text-to-speech reader. The method of grouping the related text elements together can include syntactic and intelligent clustering. The classification of text elements can include performing latent semantic analysis on the text elements and characteristics of the available voice types.

Type: Application

Filed: December 19, 2008

Publication date: April 16, 2009

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor: John B. Pickering
SYSTEMS AND METHODS FOR MAINTENANCE KNOWLEDGE MANAGEMENT

Publication number: 20090094028

Abstract: Knowledge-based information can be captured and processed to create a library of such knowledge. A maintenance worker performing a task for an asset can record audio and/or video information during the performance, and can upload the recording to a maintenance system. The system processes the recording to produce a text file corresponding to any speech during the recording, and generates a search index allowing the text file to be searched by a user. If the task is performed in the context of a work order, for example, information from the work order can be associated with the text file so that a user can search by text search, keyword, task, or other such information. A user then can locate and access the text file and/or the corresponding recording for playback.

Type: Application

Filed: October 4, 2007

Publication date: April 9, 2009

Applicant: Oracle International Corporation

Inventors: Brian Schmidt, George Thomas
METHOD AND SYSTEM FOR PRESELECTION OF SUITABLE UNITS FOR CONCATENATIVE SPEECH

Publication number: 20090094035

Abstract: A system and method for improving the response time of text-to-speech synthesis utilizes “triphone contexts” (i.e., triplets comprising a central phoneme and its immediate context) as the basic unit, instead of performing phoneme-by-phoneme synthesis. The method comprises a method of generating a triphone preselection cost database for use in speech synthesis, the method comprising 1) selecting a triphone sequence u1-u2-u3, 2) calculating a preselection cost for each 5-phoneme sequence ua-u1-u2-u3-ub, where u2 is allowed to match any identically labeled phoneme in a database and the units ua and ub vary over the entire phoneme universe and 3) storing a group of the selected triphone sequences exhibiting the lowest costs in a triphone preselection cost database.

Type: Application

Filed: December 1, 2008

Publication date: April 9, 2009

Applicant: AT&T Corp.

Inventor: Alistair D. Conkie
Audio Reader Device

Publication number: 20090089061

Abstract: An audio reader device for reading printed infrared media includes a linear sensor device sensitive to infra-red. A processor is operatively connected to the sensor device and is configured to read and decode infra-red audio data on the media. A memory is operatively connected to the processor for storing the audio data. A sound processing integrated circuit and speaker arrangement is operatively connected to the memory for playback of the audio data. A roller arrangement feeds the media past the linear sensor device.

Type: Application

Filed: November 26, 2008

Publication date: April 2, 2009

Inventors: Kia Silverbrook, Paul Lapstun, Simon Robert Walmsley
FLASH PAIRING BETWEEN BLUETOOTH DEVICES

Publication number: 20090088076

Abstract: In an example embodiment, a technique that allows a device unable to display a confirmation value and/or unable to receive a keyed data entry to confirm a generated confirmation value with a confirmation value produced by a second device. The confirmation value is output one character at a time. For example, for performing a six digit numerical comparison (NC), each digit is presented one at a time enabling a user to compare the output digit with the corresponding digit output by the second device.

Type: Application

Filed: October 1, 2007

Publication date: April 2, 2009

Inventors: Gregory Scott MERCURIO, Cullen Jennings
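
The one-character-at-a-time comparison described in the abstract above (publication 20090088076) can be illustrated with a short sketch. The function names `confirmation_digits` and `digits_match` are hypothetical, and the second function only simulates the check that, per the abstract, a user performs by comparing each digit output by the two devices.

```python
from typing import Iterator


def confirmation_digits(value: int, width: int = 6) -> Iterator[str]:
    """Yield a numeric confirmation value one digit at a time, zero-padded to `width`."""
    if not 0 <= value < 10 ** width:
        raise ValueError(f"value must have at most {width} digits")
    for digit in f"{value:0{width}d}":
        # A device without a display could speak or blink each digit here.
        yield digit


def digits_match(local_value: int, remote_value: int, width: int = 6) -> bool:
    """Simulate the user's digit-by-digit comparison of the two devices' outputs."""
    pairs = zip(confirmation_digits(local_value, width),
                confirmation_digits(remote_value, width))
    return all(local == remote for local, remote in pairs)
```

Zero-padding keeps both sequences the same length, so a value such as 4217 is presented as six digits beginning with two zeros.
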
ADJUSTING OR SETTING VEHICLE ELEMENTS THROUGH SPEECH CONTROL

Publication number: 20090089065

Abstract: A speech processing device includes an automotive device that filters data that is sent and received across an in-vehicle bus. The device selectively acquires vehicle data related to user settings or adjustments of an in-vehicle system. An interface acquires the selected vehicle data from one or more in-vehicle sensors in response to a user's articulation of a first code phrase. A memory stores the selected vehicle data with unique identifying data associated with a user. The unique identifying data establishes a connection between the selected vehicle data and the user when a second code phrase is articulated by the user. A data interface provides access to the selected vehicle data and relationship data retained in the memory and enables the processing of the data to customize the in-vehicle system. The data interface is responsive to a user's articulation of a third code phrase to process the selected vehicle data that enables the setting or adjustment of the in-vehicle system.

Type: Application

Filed: September 30, 2008

Publication date: April 2, 2009

Inventors: Markus Buck, Tim Haulick, Gerhard Uwe Schmidt
Method and apparatus for enhanced telecommunication interface

Publication number: 20090088140

Abstract: A telecommunications system including a telephone including a calling party identification receiver and a peripheral device transceiver; and a headset configured to communicate with the telephone via the peripheral device transceiver and configured to deliver calling party identification information to a user as audio information.

Type: Application

Filed: September 27, 2007

Publication date: April 2, 2009

Inventors: Rami Caspi, William J. Beyda
INTERACTIVE DEBUGGING AND TUNING OF METHODS FOR CTTS VOICE BUILDING

Publication number: 20090083037

Abstract: A method, a system, and an apparatus for identifying and correcting sources of problems in synthesized speech which is generated using a concatenative text-to-speech (CTTS) technique. The method can include the step of displaying a waveform corresponding to synthesized speech generated from concatenated phonetic units. The synthesized speech can be generated from text input received from a user. The method further can include the step of displaying parameters corresponding to at least one of the phonetic units. The method can include the step of displaying the original recordings containing selected phonetic units. An editing input can be received from the user and the parameters can be adjusted in accordance with the editing input.

Type: Application

Filed: December 3, 2008

Publication date: March 26, 2009

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Philip Gleason, Maria E. Smith, Mahesh Viswanathan, Jie Zeng
ENHANCED DELIVERY OF AUDIO DATA FOR PORTABLE PLAYBACK

Publication number: 20090077204

Abstract: A system for selection by a user and delivery to the user over an internetwork transmission channel of selected audio data files at a delivery rate of at least twice the delivery rate for normal, audibly perceptible playback of an audio data file. The user registers the user's selection of audio material with a central library of audio and/or text data files, and a digitized and optionally compressed omnibus file containing the user's selections is prepared and transmitted to the user at a high data transfer rate. The user downloads the selected data files to a personal computer or to a portable storage and playback unit (SPU) that may store and play back digitized text or audio data, using a docking station. The user carries this SPU until the user has an opportunity to audio process and play back the text or audio data files in audibly perceptible form.

Type: Application

Filed: November 17, 2008

Publication date: March 19, 2009

Applicant: Sony Corporation

Inventors: James M. JANKY, Nathan Schulhof
METHOD AND APPARATUS TO CONTROL OPERATION OF A PLAYBACK DEVICE

Publication number: 20090076821

Abstract: Media metadata is accessible for a plurality of media items (See FIG. 12). The media metadata includes a number of strings to identify information regarding the media items (See FIG. 12). Phonetic metadata is associated with the number of strings of the media metadata (See FIG. 12). Each portion of the phonetic metadata is stored in an original language of the string (See FIG. 12).

Type: Application

Filed: August 21, 2006

Publication date: March 19, 2009

Applicant: GRACENOTE, INC.

Inventors: Vadim Brenner, Peter C. DiMaria, Dale T. Roberts, Michael W. Mantle, Michael W Orme
FUNDAMENTAL FREQUENCY PATTERN GENERATION APPARATUS AND FUNDAMENTAL FREQUENCY PATTERN GENERATION METHOD

Publication number: 20090070116

Abstract: A fundamental frequency pattern generation apparatus includes a first storage including representative vectors each corresponding to a prosodic control unit and having a section for changing the number of phonemes, a second storage unit including a rule to select a vector corresponding to an input context, a selection unit configured to select a vector from the representative vectors by applying the rule to the context and output the selected vector, a calculation unit configured to calculate an expansion/contraction ratio of the section of the selected vector in a time-axis direction based on a designated value for a specific feature amount related to a length of a fundamental frequency pattern to be generated, the designated value of the feature amount being required of the fundamental frequency pattern to be generated, and an expansion/contraction unit configured to expand/contract the selected vector based on the expansion/contraction ratio to generate the fundamental frequency pattern.

Type: Application

Filed: September 5, 2008

Publication date: March 12, 2009

Inventor: Nobuaki Mizutani
Interpolation method

Publication number: 20090070117

Abstract: According to an aspect of an embodiment, a method for interpolating a partial loss of an audio signal including a sound signal component and a background noise component in transmission thereof, the method comprising the steps of: calculating frequency characteristic of the background noise in the audio signal; extracting the sound signal component from the audio signal; generating pseudo noise by applying the frequency characteristic of the background noise included in the audio signal to white noise; and generating an interpolation signal by combining the pseudo noise with the extracted sound signal component included in the audio signal to supersede the partial loss of the audio signal.

Type: Application

Filed: September 5, 2008

Publication date: March 12, 2009

Applicant: FUJITSU LIMITED

Inventor: Kaori Endo
AUDIBLE METADATA

Publication number: 20090070114

Abstract: This disclosure describes systems and methods for audibly presenting metadata. Audibly presentable metadata is referred to as audible metadata. Audible metadata may be associated with one or more media objects. In one embodiment, audible metadata is pre-recorded requiring little or no processing before it can be rendered. In another embodiment, audible metadata is text, and a text-to-speech conversion device may be used to convert the text into renderable audible metadata. Audible metadata may be rendered at any point before or after rendering of a media object, or may be rendered during rendering of a media object via a dynamic user request.

Type: Application

Filed: September 10, 2007

Publication date: March 12, 2009

Applicant: Yahoo! Inc.

Inventor: Chris Staszak
ROBOT APPARATUS WITH VOCAL INTERACTIVE FUNCTION AND METHOD THEREFOR

Publication number: 20090063155

Abstract: The present invention provides a robot apparatus with a vocal interactive function. The robot apparatus receives a vocal input, and recognizes the vocal input. The robot apparatus stores a plurality of output data, an output count of each of the output data, and a weighted value of each of the output data. The robot apparatus outputs output data according to the weighted values of all the output data corresponding to the vocal input, and adds one to the output count of the output data. The robot apparatus calculates the weighted values of all the output data corresponding to the vocal input according to the output count. Consequently, the robot apparatus may output different and variable output data when receiving the same vocal input. The present invention also provides a vocal interactive method adapted for the robot apparatus.

Type: Application

Filed: August 13, 2008

Publication date: March 5, 2009

Applicant: HON HAI PRECISION INDUSTRY CO., LTD.

Inventors: Tsu-Li Chiang, Chuan-Hong Wang, Kuo-Pao Hung, Kuan-Hong Hsieh
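The count-driven weighting in the abstract above (20090063155) can be sketched in Python as follows. The abstract does not state how weighted values are derived from output counts, so the rule used here (weight = 1 / (1 + count)) is an assumption, and the names `weights_from_counts` and `ResponseSelector` are hypothetical.

```python
import random


def weights_from_counts(counts):
    """Assumed rule: the more often an output was used, the lower its weight."""
    return [1.0 / (1 + c) for c in counts]


class ResponseSelector:
    """Picks one stored output for a recognized vocal input and tracks output counts."""

    def __init__(self, responses, rng=None):
        self.responses = list(responses)
        self.counts = [0] * len(self.responses)
        self.rng = rng or random.Random()

    def respond(self):
        # Recompute the weights from the current counts, then draw one output.
        weights = weights_from_counts(self.counts)
        index = self.rng.choices(range(len(self.responses)), weights=weights, k=1)[0]
        self.counts[index] += 1  # add one to the output count of the chosen output
        return self.responses[index]
```

Under this assumed rule, an output that has just been played becomes less likely on the next identical input, which gives the varied responses the abstract describes.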
VOICE SYNTHESIS METHOD AND INTERPERSONAL COMMUNICATION METHOD, PARTICULARLY FOR MULTIPLAYER ONLINE GAMES

Publication number: 20090063156

Abstract: A voice synthesis method, said method comprising a step of choosing a synthetic voice from among a set of voices having predetermined spectral signatures and a step of recording the natural voice of a first person, the method comprising a step of transforming the natural recorded voice so as to conform with the spectral signature of the chosen synthetic voice, the natural voice thereby transformed being recorded, said method comprising a step of determining at least one situation parameter for a first character from among a set of predefined parameters, each predefined parameter being associated with a spectral alteration of the emitted voice, the determined situation parameter particularly characterizing the environment or the physical or psychological state of the character, the method comprising a step of spectrally altering the transformed natural voice so as to conform with the spectral alteration associated with the character's situation parameter.

Type: Application

Filed: August 26, 2008

Publication date: March 5, 2009

Applicant: Alcatel Lucent

Inventors: Sylvain SQUEDIN, Serge Papillon
AUDIO REPRODUCING METHOD, CHARACTER CODE USING DEVICE, DISTRIBUTION SERVICE SYSTEM, AND CHARACTER CODE MANAGEMENT METHOD

Publication number: 20090063152

Abstract: A character code is associated with sound as well as character or sign so as to enhance expressiveness on the Internet or in electronic mail. Sound data is recorded in the character code using device in association with the character code. The user can reproduce an intended sound in the same way as he or she displays a character on the character code using device, whereby the user can enhance his or her expressiveness on the Internet or in electronic mail, for example.

Type: Application

Filed: April 10, 2006

Publication date: March 5, 2009

Inventor: Tadahiko Munakata
SYSTEM AND METHOD FOR BLENDING SYNTHETIC VOICES

Publication number: 20090063153

Abstract: A system and method for generating a synthetic text-to-speech (TTS) voice are disclosed. A user is presented with at least one TTS voice and at least one voice characteristic. A new synthetic TTS voice is generated by blending a plurality of existing TTS voices according to the selected voice characteristics. The blending of voices involves interpolating segmented parameters of each TTS voice. Segmented parameters may be, for example, prosodic characteristics of the speech such as pitch, volume, phone durations, accents, stress, mis-pronunciations and emotion.

Type: Application

Filed: November 4, 2008

Publication date: March 5, 2009

Applicant: AT&T Corp.

Inventors: David A. Kapilow, Kenneth H. Rosen, Juergen Schroeter
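The parameter interpolation mentioned in the abstract above (20090063153) might look like the following Python sketch, which blends numeric voice parameters as a weighted average. This is an illustration under assumptions: the function name `blend_voices`, the parameter keys, and the use of a plain weighted mean are all hypothetical, and non-numeric characteristics such as accent or emotion are not modeled.

```python
def blend_voices(voices, weights):
    """Weighted average of numeric parameters shared by all input voices."""
    if not voices or len(voices) != len(weights):
        raise ValueError("need one weight per voice")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    keys = set(voices[0])
    if any(set(v) != keys for v in voices):
        raise ValueError("all voices must define the same parameters")
    # Interpolate each parameter independently across the voices.
    return {
        key: sum(w * v[key] for v, w in zip(voices, weights)) / total
        for key in keys
    }
```

For instance, blending `{"pitch_hz": 100.0, "volume": 0.5}` and `{"pitch_hz": 200.0, "volume": 1.0}` with equal weights yields a voice midway between the two on both parameters.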
Dynamic Mixed-Initiative Dialog Generation in Speech Recognition

Publication number: 20090055163

Abstract: Disclosed are a method (500), apparatus (100) and computer program product for generating a mixed-initiative dialog to obtain information for dialog slots. A composite grammar dependent upon a set of unfilled slots is constructed (501). A prompt, dependent upon the set of unfilled slots, is presented (309) to a user. An utterance is received (301) from the user in response to said prompt. Relevant information is determined based upon the further utterance. One or more said unfilled slots are filled (302) with said relevant information.

Type: Application

Filed: August 20, 2007

Publication date: February 26, 2009

Inventors: Sandeep Jindal, Pankaj Kankar
Conversion of text email or SMS message to speech spoken by animated avatar for hands-free reception of email and SMS messages while driving a vehicle

Publication number: 20090055187

Abstract: Subscribers can access and listen to their email while they drive, access to the email messages being hands-free so a person can listen to email while they drive. In further accord with the present invention, a selectable avatar speaks the email message. And, the invention provides unified messaging such that SMS and email are unified and presented and spoken by the avatar, so the subscriber need not access two devices (an instant message device, and an email device). Additionally, the invention can convert natural language to an acronym to be spoken by the avatar, and can convert acronyms in a message to natural language spoken by the avatar; the subscriber selects the desired one of these two.

Type: Application

Filed: August 21, 2007

Publication date: February 26, 2009

Inventors: Howard Leventhal, Anan Yaagoub
HMM-BASED BILINGUAL (MANDARIN-ENGLISH) TTS TECHNIQUES

Publication number: 20090055162

Abstract: An exemplary method for generating speech based on text in one or more languages includes providing a phone set for two or more languages, training multilingual HMMs where the HMMs include state level sharing across languages, receiving text in one or more of the languages of the multilingual HMMs and generating speech, for the received text, based at least in part on the multilingual HMMs. Other exemplary techniques include mapping between a decision tree for a first language and a decision tree for a second language, and optionally vice versa, and Kullback-Leibler divergence analysis for a multilingual text-to-speech system.

Type: Application

Filed: August 20, 2007

Publication date: February 26, 2009

Applicant: Microsoft Corporation

Inventors: Yao Qian, Frank Kao-Ping Soong
PITCH PATTERN GENERATION METHOD AND APPARATUS THEREOF

Publication number: 20090055188

Abstract: The prosody control unit pattern generation module generates pitch patterns in respective prosody control units based on language attribute information, the phoneme duration and emphasis degree information, the modification method decision module decides a modification method by smoothing processing with respect to the pitch pattern in a connection portion between the prosody control unit and at least one of previous and next prosody control units based on at least emphasis degree information to generate modification method information, and the pattern connection module modifies pitch patterns generated in respective prosody control units by smoothing processing according to the modification method information and connects them to generate a sentence pitch pattern corresponding to a text to be a target for speech synthesis.

Type: Application

Filed: February 22, 2008

Publication date: February 26, 2009

Applicant: KABUSHIKI KAISHA TOSHIBA

Inventors: Gou Hirabayashi, Takehiko Kagoshima
APPARATUS, SYSTEM, AND METHOD FOR VOICE CHAT TRANSCRIPTION

Publication number: 20090048845

Abstract: An apparatus, system, and method to transcribe a voice chat session initiated from a text chat session. The system includes a chat server, a voice server, and a transcription engine. The chat server is configured to facilitate a text chat session between multiple instant messaging clients. The voice server is coupled to the chat server and configured to facilitate a transition from the text chat session to a voice chat session between the multiple instant messaging clients. The transcription engine is coupled to the voice server and configured to generate a voice transcription of the voice chat session. The voice transcription may be aggregated into a text chat history.

Type: Application

Filed: August 17, 2007

Publication date: February 19, 2009

Inventors: Erik J. Burckart, Steve R. Campbell, Andrew Ivory, Aaron K. Shook
Synthesis by Generation and Concatenation of Multi-Form Segments

Publication number: 20090048841

Abstract: A speech synthesis system and method is described. A speech segment database references speech segments having various different speech representational structures. A speech segment selector selects from the speech segment database a sequence of speech segment candidates corresponding to a target text. A speech segment sequencer generates from the speech segment candidates sequenced speech segments corresponding to the target text. A speech segment synthesizer combines the selected sequenced speech segments to produce a synthesized speech signal output corresponding to the target text.

Type: Application

Filed: August 14, 2007

Publication date: February 19, 2009

Applicant: Nuance Communications, Inc.

Inventors: Vincent Pollet, Andrew Breen
Scripting support for data identifiers, voice recognition and speech in a telnet session

Publication number: 20090048831

Abstract: Methods of adding data identifiers and speech/voice recognition functionality are disclosed. A telnet client runs one or more scripts that add data identifiers to data fields in a telnet session. The input data is inserted in the corresponding fields based on data identifiers. Scripts run only on the telnet client without modifications to the server applications. Further disclosed are methods for providing speech recognition and voice functionality to telnet clients. Portions of input data are converted to voice and played to the user. A user also may provide input to certain fields of the telnet session by using his voice. Scripts running on the telnet client convert the user's voice into text, which is inserted into the corresponding fields.

Type: Application

Filed: August 16, 2007

Publication date: February 19, 2009

Inventors: Lamar John Van Wagenen, Brant David Thomsen, Scott Allen Caddes
SYSTEM-EFFECTED TEXT ANNOTATION FOR EXPRESSIVE PROSODY IN SPEECH SYNTHESIS AND RECOGNITION

Publication number: 20090048843

Abstract: The inventive system can automatically annotate the relationship of text and acoustic units for the purposes of: (a) predicting how the text is to be pronounced as expressively synthesized speech, and (b) improving the proportion of expressively uttered speech as correctly identified text representing the speaker's message. The system can automatically annotate text corpora for relationships of uttered speech for a particular speaking style and for acoustic units in terms of context and content of the text to the utterances. The inventive system can use kinesthetically defined expressive speech production phonetics that are recognizable and controllable according to kinesensic feedback principles. In speech synthesis embodiments of the invention, the text annotations can specify how the text is to be expressively pronounced as synthesized speech.

Type: Application

Filed: August 8, 2008

Publication date: February 19, 2009

Inventors: Rattima NITISAROJ, Gary Marple, Nishant Chandra
System and method for IVR development

Publication number: 20090041215

Abstract: In one embodiment, a system and method is illustrated as including creating a visual script containing a component that includes at least one of a function component, a decisional component, a speak component, and a capture component, and converting the visual script to a computer script. Further, this system and method may include retrieving a computer script from a pre-populated database, the computer script containing at least one component and being formatted using a language including at least one of an IVR-XML and a character delimited flat file, and generating training data using the computer script, the training data formatted as a linear computer script.

Type: Application

Filed: August 9, 2007

Publication date: February 12, 2009

Inventors: Michael Schmitt, Nicole Holte, Jeffrey Clement, Matt Weyland, Eric Pilhofer, Roman Loy
System and method for phonetic representation

Publication number: 20090043584

Abstract: A method for generating an Approximate Phonetic Representation (APR) of a given word, the word having a sequence of characters, the method comprising: receiving the word; generating the APR by applying at least one metaphone3 translation rule to encode one or more of the characters of the given word into a resulting APR; and returning the generated APR and/or one or more words matching the APR from a dictionary of words.

Type: Application

Filed: August 6, 2007

Publication date: February 12, 2009

Inventor: Lawrence Brooke Frank Philips
ANSWER AN INCOMING VOICE CALL WITHOUT REQUIRING A USER TO SPEAK

Publication number: 20090037178

Abstract: A system comprises a wireless transceiver and logic coupled to the wireless transceiver. The logic is adapted to answer a phone call from a calling party with an automated voice message and then, in the same phone call, to enable a user to have a two-way conversation with the calling party without requiring the user to speak.

Type: Application

Filed: July 30, 2007

Publication date: February 5, 2009

Inventor: Yogesh K. MITTAL
Method and Apparatus for Automatically Converting Voice

Publication number: 20090037179

Abstract: The invention proposes a method and apparatus for significantly improving the quality of voice morphing and guaranteeing the similarity of converted voice. The invention sets several standard speakers in a TTS database, and selects the voices of different standard speakers for speech synthesis according to different roles, wherein the voice of the selected standard speaker is similar to the original role to a certain extent. Then the invention further performs voice morphing on the standard voice similar to the original voice to a certain extent, in order to accurately mimic the voice of the original speaker, so as to make the converted voice closer to the original voice features while guaranteeing the similarity.

Type: Application

Filed: July 29, 2008

Publication date: February 5, 2009

Applicant: International Business Machines Corporation

Inventors: Yi Liu, Yong Qin, Qin Shi, Zhi Wei Shuang
APPARATUS AND METHOD FOR SYNTHESIZING A PLURALITY OF WAVEFORMS IN SYNCHRONIZED MANNER

Publication number: 20090025537

Abstract: A plurality of blocks of waveform data are stored in a memory, which also stores, for each of the blocks, synchronizing information representative of a plurality of cycle synchronizing points that are indicative of periodic specific phase positions where the block of waveform data should be synchronized in phase with another block of waveform data. Two blocks of waveform data (e.g., harmonic and nonharmonic components) are read out from the memory, along with the synchronizing information. On the basis of the synchronizing information, the readout of two blocks of waveform data is controlled using the synchronizing information. There is stored, for each of the blocks, at least one piece of synchronizing position information indicative of a specific position where the block should be synchronized with another block, and the readout of the individual blocks of waveform data is controlled so that the blocks are synchronized with each other using the synchronizing position information.

Type: Application

Filed: September 24, 2007

Publication date: January 29, 2009

Applicant: Yamaha Corporation

Inventors: Motoichi Tamura, Yasuyuki Umeyama
Speech synthesizer and speech synthesis system

Publication number: 20090024393

Abstract: A speech synthesizer conducts a dialogue among a plurality of synthesized speakers, including a self speaker and one or more partner speakers, by use of a voice profile table describing emotional characteristics of synthesized voices, a speaker database storing feature data for different types of speakers and/or different speaking tones, a speech synthesis engine that synthesizes speech from input text according to feature data fitting the voice profile assigned to each synthesized speaker, and a profile manager that updates the voice profiles according to the content of the spoken text. The voice profiles of partner speakers are initially derived from the voice profile of the self speaker. A synthesized dialogue can be set up simply by selecting the voice profile of the self speaker.

Type: Application

Filed: June 11, 2008

Publication date: January 22, 2009

Applicant: OKI ELECTRIC INDUSTRY CO., LTD.

Inventor: Tsutomu Kaneyasu
