Speech Synthesis; Text To Speech Systems (epo) Patents (Class 704/E13.001)
  • Publication number: 20090204243
    Abstract: Method and apparatus for creating a customized text-to-speech podcast by receiving a text file, parsing and tagging the text file, creating multiple audio files by text-to-speech technology, and creating a podcast by combining the audio files. The podcast can be an audio podcast or a video podcast. Video podcasts associate related video content with the audio content.
    Type: Application
    Filed: January 9, 2009
    Publication date: August 13, 2009
    Inventors: Harpreet MARWAHA, Brett ROBINSON
  • Publication number: 20090195351
    Abstract: An information processing device includes: a busy-level acquiring section for acquiring information on a user's busy-level; a controller for determining a presentation form of information currently presented according to the user's busy-level acquired by the busy-level acquiring section; an information processor for performing predetermined processing on the information under the control of the controller; and an output processor for outputting the information having been subjected to the processing by the information processor to an output section.
    Type: Application
    Filed: January 21, 2009
    Publication date: August 6, 2009
    Applicant: Sony Corporation
    Inventors: Naoki Takeda, Tetsujiro Kondo
  • Publication number: 20090184967
    Abstract: A system for controlling a rendering engine by using specialized commands. The commands are used to generate a production, such as a television show, at an end-user's computer that executes the rendering engine. In one embodiment, the commands are sent over a network, such as the Internet, to achieve broadcasts of video programs at very high compression and efficiency. Commands for setting and moving camera viewpoints, animating characters, and defining or controlling scenes and sounds are described. At a fine level of control, math models and coordinate systems can be used to make specifications. At a coarse level of control, the command language approaches the text format traditionally used in television or movie scripts. Simple names for objects within a scene are used to identify items, directions and paths. Commands are further simplified by having the rendering engine use defaults when specifications are left out.
    Type: Application
    Filed: October 27, 2008
    Publication date: July 23, 2009
    Inventor: Charles J. Kulas
  • Publication number: 20090182548
    Abstract: Provided herein is a dictionary and/or translation device that is handheld and, thus, portable. The device may be incorporated in or with a cellular telephone, a personal digital assistant (PDA), a pager, a handheld computer, or the like. A page of text (or a portion thereof) may be photographed as a digital image, and the digitally photographed page may be converted to electronic text using an optical character recognition system. The converted text may be viewed on a display that includes a system for highlighting a word or words from the output of the optical character recognition system for definition or translation. The definition or translation of the highlighted word or words may be provided and output to either one of the display or an audio output (such as a speaker) or both.
    Type: Application
    Filed: November 24, 2008
    Publication date: July 16, 2009
    Inventor: Jan Scott Zwolinski
  • Publication number: 20090175459
    Abstract: In a voice intelligibility enhancement system that controls a gain of a voice signal based on noise power and voice power of the voice signal generated by a voice signal generation unit, it is detected whether the voice power is equal to or greater than a predetermined level, noise power output when the voice power is less than the predetermined level is measured and stored, noise power to be output when the voice power exceeds the predetermined level is estimated to be the stored noise power, and gain of a voice signal is controlled on the basis of the voice power and the estimated noise power.
    Type: Application
    Filed: December 19, 2008
    Publication date: July 9, 2009
    Inventors: Toru Marumoto, Nozomu Saito
  • Publication number: 20090171664
    Abstract: Systems and methods for receiving natural language queries and/or commands and executing the queries and/or commands. The systems and methods overcome the deficiencies of prior art speech query and response systems through the application of a complete speech-based information query, retrieval, presentation and command environment. This environment makes significant use of context, prior information, domain knowledge, and user specific profile data to achieve a natural environment for one or more users making queries or commands in multiple domains. Through this integrated approach, a complete speech-based natural language query and response environment can be created. The systems and methods create, store and use extensive personal profile information for each user, thereby improving the reliability of determining the context and presenting the expected results for a particular question or command.
    Type: Application
    Filed: February 4, 2009
    Publication date: July 2, 2009
    Inventors: Robert A. Kennewick, David Locke, Michael R. Kennewick, SR., Michael R. Kennewick, JR., Richard Kennewick, Tom Freeman
  • Publication number: 20090171666
    Abstract: An interpolation device (4) includes a band extraction high-pass filter (11) for extracting a frequency component of a predetermined lower limit frequency or above from reproduction data obtained by digitizing an audio waveform signal; a multiplier (13) for frequency-shifting the frequency component extracted by the band extraction high-pass filter (11); a lower side wave band suppression high-pass filter (14) for suppressing the frequency component of the lower side wave band in the frequency component subjected to frequency shift by the multiplier (13); and an adder (17) for adding the frequency component after suppression by the lower side wave band suppression high-pass filter (14). It is possible to reduce the processing load.
    Type: Application
    Filed: November 29, 2006
    Publication date: July 2, 2009
    Applicant: Kabushiki Kaisha Kenwood
    Inventor: Hideki Ohtsu
  • Publication number: 20090156240
    Abstract: Methods and systems for providing electronic notifications are described. A server is configured to serve an interface, such as a Web page, to a terminal that requests from a user a first set of user contacts to be used to provide notifications to the user by a telephonic notification system in response to a notification process initiated by an organization associated with the user. The interface further requests a first set of priorities corresponding to the first set of user contacts, wherein the notification system will attempt to provide notifications to the first set of user contacts in an order based at least in part on the first set of priorities. A database is configured to store the first set of user contacts and the first set of priorities. A voice interface circuit is configured to transmit a voice notification to at least one of the first set of user contacts.
    Type: Application
    Filed: November 24, 2008
    Publication date: June 18, 2009
    Applicant: 3N GLOBAL, INC.
    Inventors: Steve Kirchmeier, Cinta Putra
  • Publication number: 20090157409
    Abstract: A method includes generating, for each parameter of the prosody vector, an initial parameter prediction model with a plurality of attributes related to difference prosody prediction and at least part of attribute combinations of the plurality of attributes, in which each of the plurality of attributes and the attribute combinations is included as an item, calculating importance of each item in the parameter prediction model, deleting the item having the lowest importance calculated, re-generating a parameter prediction model with the remaining items, determining whether the re-generated parameter prediction model is an optimal model, and repeating the step of calculating importance and the steps following the step of calculating importance with the re-generated parameter prediction model, if the re-generated parameter prediction model is determined as not an optimal model, wherein the difference prosody vector and all parameter prediction models of the difference prosody vector constitute the difference pros
    Type: Application
    Filed: December 4, 2008
    Publication date: June 18, 2009
    Inventors: Yi Lifu, Li Jian, Lou Xiaoyan, Hao Jie
  • Publication number: 20090144061
    Abstract: Systems and methods for generating and providing verbal feedback messages to wearers of man-machine interface (MMI)-enabled head-worn electronic devices. An exemplary head-worn electronic device includes an MMI and an acoustic signal generator configured to provide verbal acoustic messages to a wearer of the head-worn electronic device in response to the wearer's interaction with the MMI. The head-worn electronic device may be further configured to monitor device states and generate and provide verbal acoustic messages indicative of changes to the device states to the wearer. The verbal messages are digitally stored and accessed by a microprocessor configured to execute a verbal feedback generation program. Further, the verbal messages may be stored according to multiple different natural languages, thereby allowing a user to select a preferred natural language by which the verbal acoustic messages are fed back to the user.
    Type: Application
    Filed: November 29, 2007
    Publication date: June 4, 2009
    Inventors: Jacob T. Meyberg, Eric R. Bradford, Stephen V. Cahill
  • Publication number: 20090143982
    Abstract: A method for operating a navigation device that includes an input device for inputting operator commands and/or locations, particularly starting points and/or destinations, a road network database, a route calculation unit for calculating a planned route with consideration of the locations and the road network database, wherein the route leads from the starting point to the destination, a signal receiving unit for receiving position signals, particularly GPS signals, a position determining unit that determines the current position based on the position signals, and a voice output module that is able to generate and acoustically output a voice message, particularly maneuvering instructions, in dependence on predetermined boundary conditions by combining at least two voice message elements, wherein the voice message elements to be combined are analyzed prior to the acoustic output of the voice message, and wherein the voice message is changed in accordance with predetermined prioritization rules depending on th
    Type: Application
    Filed: November 25, 2008
    Publication date: June 4, 2009
    Inventors: Jochen Katzer, Thorsten W. Schmidt, Matthias Kahlow
  • Publication number: 20090144053
    Abstract: An information extraction unit extracts L-dimensional spectral envelope information from each frame of speech data. The spectral envelope information does not have a spectral fine structure. A basis storage unit stores N bases (L>N>1). Each basis covers a different frequency band and has its maximum at a peak frequency in the L-dimensional spectral domain. A value corresponding to a frequency outside the frequency band along a frequency axis of the spectral domain is zero. Two frequency bands whose peak frequencies are adjacent along the frequency axis partially overlap. A parameter calculation unit minimizes a distortion between the spectral envelope information and a linear combination of each basis with a coefficient by changing the coefficient, and sets the coefficient of each basis for which the distortion is minimized as a spectral envelope parameter of the spectral envelope information.
    Type: Application
    Filed: December 3, 2008
    Publication date: June 4, 2009
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Masatsune TAMURA, Katsumi TSUCHIYA, Takehiko KAGOSHIMA
  • Publication number: 20090144060
    Abstract: Disclosed is a system and method for generating a web podcast interview that allows a single user to create his own multi-voice interview from his computer. The method allows the user to enter a set of questions from a text file using a text editor. (Answers may also be entered from a text file, although this is not the more preferred embodiment.) For each question, the user may select one particular interviewer voice among a plurality of predefined interviewer voices, and by using a text-to-speech module in a text-to-speech server, each question is converted into an audio question having the selected interviewer voice. Then, the user preferably records answers to each audio question using a telephone. A question/answer sequence in a podcast-compliant format is then generated.
    Type: Application
    Filed: December 1, 2008
    Publication date: June 4, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Steve Groeger, Brian Heasman, Christopher von Koschembahr, Yuk-Lun Wong
  • Publication number: 20090144057
    Abstract: A mechanism is provided for authenticating and using a personal voice profile. The voice profile may be issued by a trusted third party, such as a certification authority. The personal voice profile may include information for generating a digest or digital signature for text messages. A speech synthesis system may speak the text message using the voice characteristics, such as prosodic characteristics, only if the voice profile is authenticated and the text message is valid and free of tampering.
    Type: Application
    Filed: April 8, 2008
    Publication date: June 4, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Rafael Graniello Cabezas, Jason Eric Moore, Elizabeth Silvia
  • Publication number: 20090138268
    Abstract: A data processing device includes a displaying unit, a receiving unit, a determining unit, and a controlling unit. The displaying unit displays one of a first operation screen and a second operation screen. Input data is inputted into the receiving unit by a user. The determining unit determines, based on at least one of the input data and settings of an OS, which of the first operation screen and the second operation screen should be displayed on the displaying unit. The controlling unit controls the displaying unit to display the first operation screen if the determining unit determines that the first operation screen should be displayed on the displaying unit, and controls the displaying unit to display the second operation screen if the determining unit determines that the second operation screen should be displayed on the displaying unit.
    Type: Application
    Filed: November 26, 2008
    Publication date: May 28, 2009
    Applicant: BROTHER KOGYO KABUSHIKI KAISHA
    Inventors: Hirotoshi MAEHIRA, Masahiro FUJISHITA
  • Publication number: 20090132252
    Abstract: Disclosed methods and apparatus segment a signal, such as an acoustic speech signal, into coherent segments, such as coherent topics. In the case of an acoustic speech signal, the segmentation relies on only raw acoustic information and may be performed without requiring access to, or generation of, a transcript of the acoustic speech signal. Recurring acoustic patterns are found by matching pairs of sounds, based on acoustic similarity. Information about distributional similarity from multiple local comparisons is aggregated and is further processed to fill gaps in the data by growing regions that represent recurring acoustic patterns. Selection criteria are used to identify coherent topics represented by the grown regions and topic boundaries therebetween. Another signal, such as a video signal, may be partitioned according to topic boundaries identified in an acoustic speech signal that is related to the video signal.
    Type: Application
    Filed: November 20, 2007
    Publication date: May 21, 2009
    Applicant: MASSACHUSETTS INSTITUTE OF TECHNOLOGY
    Inventors: Igor Malioutov, Alex Park
  • Publication number: 20090116584
    Abstract: A system and method for permitting users to receive desired/user-specific data or information, e.g., electronic mail or other user-subscribed services, e.g., for textual information, over the airwaves via a receiver, e.g., a digital receiver. In one embodiment, an authorization process is provided wherein the receiver includes a hard-coded user ID stored thereon for comparison with an input user ID encoded with user-specific data signals. A splitter permits simultaneous processing of, e.g., radio frequency signals including user-specific information via an authorization path, as well as radio frequency signals having standard audio and/or audio/video information.
    Type: Application
    Filed: July 14, 2005
    Publication date: May 7, 2009
    Applicant: THOMSON LICENSING
    Inventor: Danny Hardin
  • Publication number: 20090112597
    Abstract: An apparatus for predicting a resultant attribute of a text file before it has been converted to an audio file by a text-to-speech converter application. In accordance with an embodiment, the apparatus includes: a receiver component for receiving a text file and a request to determine a resultant attribute of the text file before it is converted to an audio file, by a text-to-speech converter component; a calculation component for determining a file type associated with the received text file and the size of the received text file; a calculation component for identifying an attribute associated with the determined file type; and a calculation component for determining from the identified attribute and the size of the received text file a resultant attribute of the text file before it is converted to an audio file by the text-to-speech converter component.
    Type: Application
    Filed: October 22, 2008
    Publication date: April 30, 2009
    Inventors: Declan Tarrant, Edward G. Mackle, Eamon Phelan, Keith Pilson
  • Publication number: 20090110159
    Abstract: Voicemail systems and methods can provide a user with means for receiving categorized messages from parties. The categories can be independent of the intended recipients of the messages, such that multiple users can receive the same message. A user can subscribe to receive categorized messages within selected categories or from selected parties. A registered party, including a merchant, an organization, a government agency and/or another party, can input messages to selected categories and can input distribution parameters for the messages. Expiration dates can be associated with the messages such that messages can be deleted once expired.
    Type: Application
    Filed: December 31, 2008
    Publication date: April 30, 2009
    Inventor: Rohit Satish Kalbag
  • Publication number: 20090099846
    Abstract: There is disclosed a method and system for preparing a document to be read by a text-to-speech reader. The method can include identifying two or more voice types available to the text-to-speech reader, identifying the text elements within the document, grouping related text elements together, and classifying the text elements according to voice types available to the text-to-speech reader. The method of grouping the related text elements together can include syntactic and intelligent clustering. The classification of text elements can include performing latent semantic analysis on the text elements and characteristics of the available voice types.
    Type: Application
    Filed: December 19, 2008
    Publication date: April 16, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: John B. Pickering
  • Publication number: 20090094028
    Abstract: Knowledge-based information can be captured and processed to create a library of such knowledge. A maintenance worker performing a task for an asset can record audio and/or video information during the performance, and can upload the recording to a maintenance system. The system processes the recording to produce a text file corresponding to any speech during the recording, and generates a search index allowing the text file to be searched by a user. If the task is performed in the context of a work order, for example, information from the work order can be associated with the text file so that a user can search by text search, keyword, task, or other such information. A user then can locate and access the text file and/or the corresponding recording for playback.
    Type: Application
    Filed: October 4, 2007
    Publication date: April 9, 2009
    Applicant: Oracle International Corporation
    Inventors: Brian Schmidt, George Thomas
  • Publication number: 20090094035
    Abstract: A system and method for improving the response time of text-to-speech synthesis utilizes “triphone contexts” (i.e., triplets comprising a central phoneme and its immediate context) as the basic unit, instead of performing phoneme-by-phoneme synthesis. The method comprises a method of generating a triphone preselection cost database for use in speech synthesis, the method comprising 1) selecting a triphone sequence u1-u2-u3, 2) calculating a preselection cost for each 5-phoneme sequence ua-u1-u2-u3-ub, where u2 is allowed to match any identically labeled phoneme in a database and the units ua and ub vary over the entire phoneme universe and 3) storing a group of the selected triphone sequences exhibiting the lowest costs in a triphone preselection cost database.
    Type: Application
    Filed: December 1, 2008
    Publication date: April 9, 2009
    Applicant: AT&T Corp.
    Inventor: Alistair D. Conkie
  • Publication number: 20090089061
    Abstract: An audio reader device for reading printed infrared media includes a linear sensor device sensitive to infrared. A processor is operatively connected to the sensor device and is configured to read and decode infrared audio data on the media. A memory is operatively connected to the processor for storing the audio data. A sound processing integrated circuit and speaker arrangement is operatively connected to the memory for playback of the audio data. A roller arrangement feeds the media past the linear sensor device.
    Type: Application
    Filed: November 26, 2008
    Publication date: April 2, 2009
    Inventors: Kia Silverbrook, Paul Lapstun, Simon Robert Walmsley
  • Publication number: 20090088076
    Abstract: In an example embodiment, a technique allows a device that is unable to display a confirmation value and/or unable to receive a keyed data entry to confirm a generated confirmation value with a confirmation value produced by a second device. The confirmation value is output one character at a time. For example, for performing a six-digit numerical comparison (NC), each digit is presented one at a time, enabling a user to compare the output digit with the corresponding digit output by the second device.
    Type: Application
    Filed: October 1, 2007
    Publication date: April 2, 2009
    Inventors: Gregory Scott MERCURIO, Cullen Jennings
  • Publication number: 20090089065
    Abstract: A speech processing device includes an automotive device that filters data that is sent and received across an in-vehicle bus. The device selectively acquires vehicle data related to a user's settings or adjustments of an in-vehicle system. An interface acquires the selected vehicle data from one or more in-vehicle sensors in response to a user's articulation of a first code phrase. A memory stores the selected vehicle data with unique identifying data associated with a user. The unique identifying data establishes a connection between the selected vehicle data and the user when a second code phrase is articulated by the user. A data interface provides access to the selected vehicle data and relationship data retained in the memory and enables the processing of the data to customize the in-vehicle system. The data interface is responsive to a user's articulation of a third code phrase to process the selected vehicle data that enables the setting or adjustment of the in-vehicle system.
    Type: Application
    Filed: September 30, 2008
    Publication date: April 2, 2009
    Inventors: Markus Buck, Tim Haulick, Gerhard Uwe Schmidt
  • Publication number: 20090088140
    Abstract: A telecommunications system including a telephone including a calling party identification receiver and a peripheral device transceiver; and a headset configured to communicate with the telephone via the peripheral device transceiver and configured to deliver calling party identification information to a user as audio information.
    Type: Application
    Filed: September 27, 2007
    Publication date: April 2, 2009
    Inventors: Rami Caspi, William J. Beyda
  • Publication number: 20090083037
    Abstract: A method, a system, and an apparatus for identifying and correcting sources of problems in synthesized speech which is generated using a concatenative text-to-speech (CTTS) technique. The method can include the step of displaying a waveform corresponding to synthesized speech generated from concatenated phonetic units. The synthesized speech can be generated from text input received from a user. The method further can include the step of displaying parameters corresponding to at least one of the phonetic units. The method can include the step of displaying the original recordings containing selected phonetic units. An editing input can be received from the user and the parameters can be adjusted in accordance with the editing input.
    Type: Application
    Filed: December 3, 2008
    Publication date: March 26, 2009
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Philip Gleason, Maria E. Smith, Mahesh Viswanathan, Jie Zeng
  • Publication number: 20090077204
    Abstract: A system for selection by a user and delivery to the user over an internetwork transmission channel of selected audio data files at a delivery rate of at least twice the delivery rate for normal, audibly perceptible playback of an audio data file. The user registers the user's selection of audio material with a central library of audio and/or text data files, and a digitized and optionally compressed omnibus file containing the user's selections is prepared and transmitted to the user at a high data transfer rate. The user downloads the selected data files to a personal computer or to a portable storage and playback unit (SPU) that may store and play back digitized text or audio data, using a docking station. The user carries this SPU until the user has an opportunity to audio process and play back the text or audio data files in audibly perceptible form.
    Type: Application
    Filed: November 17, 2008
    Publication date: March 19, 2009
    Applicant: Sony Corporation
    Inventors: James M. JANKY, Nathan Schulhof
  • Publication number: 20090076821
    Abstract: Media metadata is accessible for a plurality of media items (See FIG. 12). The media metadata includes a number of strings to identify information regarding the media items (See FIG. 12). Phonetic metadata is associated with the number of strings of the media metadata (See FIG. 12). Each portion of the phonetic metadata is stored in an original language of the string (See FIG. 12).
    Type: Application
    Filed: August 21, 2006
    Publication date: March 19, 2009
    Applicant: GRACENOTE, INC.
    Inventors: Vadim Brenner, Peter C. DiMaria, Dale T. Roberts, Michael W. Mantle, Michael W Orme
  • Publication number: 20090070116
    Abstract: A fundamental frequency pattern generation apparatus includes a first storage unit including representative vectors each corresponding to a prosodic control unit and having a section for changing the number of phonemes, a second storage unit including a rule to select a vector corresponding to an input context, a selection unit configured to select a vector from the representative vectors by applying the rule to the context and output the selected vector, a calculation unit configured to calculate an expansion/contraction ratio of the section of the selected vector in a time-axis direction based on a designated value for a specific feature amount related to a length of a fundamental frequency pattern to be generated, the designated value of the feature amount being required of the fundamental frequency pattern to be generated, and an expansion/contraction unit configured to expand/contract the selected vector based on the expansion/contraction ratio to generate the fundamental frequency pattern.
    Type: Application
    Filed: September 5, 2008
    Publication date: March 12, 2009
    Inventor: Nobuaki Mizutani
  • Publication number: 20090070117
    Abstract: According to an aspect of an embodiment, a method for interpolating a partial loss of an audio signal including a sound signal component and a background noise component in transmission thereof comprises the steps of: calculating a frequency characteristic of the background noise in the audio signal; extracting the sound signal component from the audio signal; generating pseudo noise by applying the frequency characteristic of the background noise included in the audio signal to white noise; and generating an interpolation signal by combining the pseudo noise with the extracted sound signal component included in the audio signal to supersede the partial loss of the audio signal.
    Type: Application
    Filed: September 5, 2008
    Publication date: March 12, 2009
    Applicant: FUJITSU LIMITED
    Inventor: Kaori Endo
  • Publication number: 20090070114
    Abstract: This disclosure describes systems and methods for audibly presenting metadata. Audibly presentable metadata is referred to as audible metadata. Audible metadata may be associated with one or more media objects. In one embodiment, audible metadata is pre-recorded requiring little or no processing before it can be rendered. In another embodiment, audible metadata is text, and a text-to-speech conversion device may be used to convert the text into renderable audible metadata. Audible metadata may be rendered at any point before or after rendering of a media object, or may be rendered during rendering of a media object via a dynamic user request.
    Type: Application
    Filed: September 10, 2007
    Publication date: March 12, 2009
    Applicant: Yahoo! Inc.
    Inventor: Chris Staszak
  • Publication number: 20090063155
    Abstract: The present invention provides a robot apparatus with a vocal interactive function. The robot apparatus receives a vocal input, and recognizes the vocal input. The robot apparatus stores a plurality of output data, an output count of each of the output data, and a weighted value of each of the output data. The robot apparatus outputs output data according to the weighted values of all the output data corresponding to the vocal input, and adds one to the output count of the output data. The robot apparatus calculates the weighted values of all the output data corresponding to the vocal input according to the output count. Consequently, the robot apparatus may output different and variable output data when receiving the same vocal input. The present invention also provides a vocal interactive method adapted for the robot apparatus.
    Type: Application
    Filed: August 13, 2008
    Publication date: March 5, 2009
    Applicant: HON HAI PRECISION INDUSTRY CO., LTD.
    Inventors: Tsu-Li Chiang, Chuan-Hong Wang, Kuo-Pao Hung, Kuan-Hong Hsieh
  • Publication number: 20090063156
    Abstract: A voice synthesis method, said method comprising a step of choosing a synthetic voice from among a set of voices having predetermined spectral signatures and a step of recording the natural voice of a first person, the method comprising a step of transforming the natural recorded voice so as to conform with the spectral signature of the chosen synthetic voice, the natural voice thereby transformed being recorded, said method comprising a step of determining at least one situation parameter for a first character from among a set of predefined parameters, each predefined parameter being associated with a spectral alteration of the emitted voice, the determined situation parameter particularly characterizing the environment or the physical or psychological state of the character, the method comprising a step of spectrally altering the transformed natural voice so as to conform with the spectral alteration associated with the character's situation parameter.
    Type: Application
    Filed: August 26, 2008
    Publication date: March 5, 2009
    Applicant: Alcatel Lucent
    Inventors: Sylvain Squedin, Serge Papillon
  • Publication number: 20090063152
    Abstract: A character code is associated with sound as well as character or sign so as to enhance expressiveness on the Internet or in electronic mail. Sound data is recorded in the character code using device in association with the character code. The user can reproduce an intended sound in the same way as he or she displays a character on the character code using device, whereby the user can enhance his or her expressiveness on the Internet or in electronic mail, for example.
    Type: Application
    Filed: April 10, 2006
    Publication date: March 5, 2009
    Inventor: Tadahiko Munakata
  • Publication number: 20090063153
    Abstract: A system and method for generating a synthetic text-to-speech (TTS) voice are disclosed. A user is presented with at least one TTS voice and at least one voice characteristic. A new synthetic TTS voice is generated by blending a plurality of existing TTS voices according to the selected voice characteristics. The blending of voices involves interpolating segmented parameters of each TTS voice. Segmented parameters may be, for example, prosodic characteristics of the speech such as pitch, volume, phone durations, accents, stress, mispronunciations and emotion.
    Type: Application
    Filed: November 4, 2008
    Publication date: March 5, 2009
    Applicant: AT&T Corp.
    Inventors: David A. Kapilow, Kenneth H. Rosen, Juergen Schroeter
  • Publication number: 20090055163
    Abstract: Disclosed are a method (500), apparatus (100) and computer program product for generating a mixed-initiative dialog to obtain information for dialog slots. A composite grammar dependent upon a set of unfilled slots is constructed (501). A prompt, dependent upon the set of unfilled slots, is presented (309) to a user. An utterance is received (301) from the user in response to said prompt. Relevant information is determined based upon the further utterance. One or more said unfilled slots are filled (302) with said relevant information.
    Type: Application
    Filed: August 20, 2007
    Publication date: February 26, 2009
    Inventors: Sandeep Jindal, Pankaj Kankar
  • Publication number: 20090055187
    Abstract: Subscribers can access and listen to their email hands-free while they drive. In further accord with the present invention, a selectable avatar speaks the email message. The invention also provides unified messaging such that SMS and email are unified, presented, and spoken by the avatar, so the subscriber need not access two devices (an instant message device and an email device). Additionally, the invention can convert natural language to an acronym to be spoken by the avatar, and can convert acronyms in a message to natural language spoken by the avatar; the subscriber selects the desired one of these two.
    Type: Application
    Filed: August 21, 2007
    Publication date: February 26, 2009
    Inventors: Howard Leventhal, Anan Yaagoub
  • Publication number: 20090055162
    Abstract: An exemplary method for generating speech based on text in one or more languages includes providing a phone set for two or more languages, training multilingual HMMs where the HMMs include state level sharing across languages, receiving text in one or more of the languages of the multilingual HMMs and generating speech, for the received text, based at least in part on the multilingual HMMs. Other exemplary techniques include mapping between a decision tree for a first language and a decision tree for a second language, and optionally vice versa, and Kullback-Leibler divergence analysis for a multilingual text-to-speech system.
    Type: Application
    Filed: August 20, 2007
    Publication date: February 26, 2009
    Applicant: Microsoft Corporation
    Inventors: Yao Qian, Frank Kao-Ping Soong
  • Publication number: 20090055188
    Abstract: The prosody control unit pattern generation module generates pitch patterns in respective prosody control units based on language attribute information, the phoneme duration and emphasis degree information, the modification method decision module decides a modification method by smoothing processing with respect to the pitch pattern in a connection portion between the prosody control unit and at least one of previous and next prosody control units based on at least emphasis degree information to generate modification method information, and the pattern connection module modifies pitch patterns generated in respective prosody control units by smoothing processing according to the modification method information and connects them to generate a sentence pitch pattern corresponding to a text to be a target for speech synthesis.
    Type: Application
    Filed: February 22, 2008
    Publication date: February 26, 2009
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Gou Hirabayashi, Takehiko Kagoshima
  • Publication number: 20090048845
    Abstract: An apparatus, system, and method to transcribe a voice chat session initiated from a text chat session. The system includes a chat server, a voice server, and a transcription engine. The chat server is configured to facilitate a text chat session between multiple instant messaging clients. The voice server is coupled to the chat server and configured to facilitate a transition from the text chat session to a voice chat session between the multiple instant messaging clients. The transcription engine is coupled to the voice server and configured to generate a voice transcription of the voice chat session. The voice transcription may be aggregated into a text chat history.
    Type: Application
    Filed: August 17, 2007
    Publication date: February 19, 2009
    Inventors: Erik J. Burckart, Steve R. Campbell, Andrew Ivory, Aaron K. Shook
  • Publication number: 20090048841
    Abstract: A speech synthesis system and method are described. A speech segment database references speech segments having various different speech representational structures. A speech segment selector selects from the speech segment database a sequence of speech segment candidates corresponding to a target text. A speech segment sequencer generates from the speech segment candidates sequenced speech segments corresponding to the target text. A speech segment synthesizer combines the selected sequenced speech segments to produce a synthesized speech signal output corresponding to the target text.
    Type: Application
    Filed: August 14, 2007
    Publication date: February 19, 2009
    Applicant: Nuance Communications, Inc.
    Inventors: Vincent Pollet, Andrew Breen
  • Publication number: 20090048831
    Abstract: Methods of adding data identifiers and speech/voice recognition functionality are disclosed. A telnet client runs one or more scripts that add data identifiers to data fields in a telnet session. The input data is inserted in the corresponding fields based on data identifiers. Scripts run only on the telnet client without modifications to the server applications. Further disclosed are methods for providing speech recognition and voice functionality to telnet clients. Portions of input data are converted to voice and played to the user. A user also may provide input to certain fields of the telnet session by using his voice. Scripts running on the telnet client convert the user's voice into text, which is inserted into the corresponding fields.
    Type: Application
    Filed: August 16, 2007
    Publication date: February 19, 2009
    Inventors: Lamar John Van Wagenen, Brant David Thomsen, Scott Allen Caddes
  • Publication number: 20090048843
    Abstract: The inventive system can automatically annotate the relationship of text and acoustic units for the purposes of: (a) predicting how the text is to be pronounced as expressively synthesized speech, and (b) improving the proportion of expressively uttered speech as correctly identified text representing the speaker's message. The system can automatically annotate text corpora for relationships of uttered speech for a particular speaking style and for acoustic units in terms of context and content of the text to the utterances. The inventive system can use kinesthetically defined expressive speech production phonetics that are recognizable and controllable according to kinesensic feedback principles. In speech synthesis embodiments of the invention, the text annotations can specify how the text is to be expressively pronounced as synthesized speech.
    Type: Application
    Filed: August 8, 2008
    Publication date: February 19, 2009
    Inventors: Rattima Nitisaroj, Gary Marple, Nishant Chandra
  • Publication number: 20090041215
    Abstract: In one embodiment, a system and method is illustrated as including creating a visual script containing a component that includes at least one of a function component, a decisional component, a speak component, and a capture component, and converting the visual script to a computer script. Further, this system and method may include retrieving a computer script from a pre-populated database, the computer script containing at least one component and being formatted using a language including at least one of an IVR-XML and a character delimited flat file, and generating training data using the computer script, the training data formatted as a linear computer script.
    Type: Application
    Filed: August 9, 2007
    Publication date: February 12, 2009
    Inventors: Michael Schmitt, Nicole Holte, Jeffrey Clement, Matt Weyland, Eric Pilhofer, Roman Loy
  • Publication number: 20090043584
    Abstract: A method for generating an Approximate Phonetic Representation (APR) of a given word, the word having a sequence of characters, the method comprising: receiving the word; generating the APR by applying at least one metaphone3 translation rule to encode one or more of the characters of the given word into a resulting APR; and returning the generated APR and/or one or more words matching the APR from a dictionary of words.
    Type: Application
    Filed: August 6, 2007
    Publication date: February 12, 2009
    Inventor: Lawrence Brooke Frank Philips
  • Publication number: 20090037178
    Abstract: A system comprises a wireless transceiver and logic coupled to the wireless transceiver. The logic is adapted to answer a phone call from a calling party with an automated voice message and then, in the same phone call, to enable a user to have a two-way conversation with the calling party without requiring the user to speak.
    Type: Application
    Filed: July 30, 2007
    Publication date: February 5, 2009
    Inventor: Yogesh K. Mittal
  • Publication number: 20090037179
    Abstract: The invention proposes a method and apparatus for significantly improving the quality of voice morphing and guaranteeing the similarity of converted voice. The invention sets several standard speakers in a TTS database, and selects the voices of different standard speakers for speech synthesis according to different roles, wherein the voice of the selected standard speaker is similar to the original role to a certain extent. Then the invention further performs voice morphing on the standard voice similar to the original voice to a certain extent, in order to accurately mimic the voice of the original speaker, so as to make the converted voice closer to the original voice features while guaranteeing the similarity.
    Type: Application
    Filed: July 29, 2008
    Publication date: February 5, 2009
    Applicant: International Business Machines Corporation
    Inventors: Yi Liu, Yong Qin, Qin Shi, Zhi Wei Shuang
  • Publication number: 20090025537
    Abstract: A plurality of blocks of waveform data are stored in a memory, which also stores, for each of the blocks, synchronizing information representative of a plurality of cycle synchronizing points that are indicative of periodic specific phase positions where the block of waveform data should be synchronized in phase with another block of waveform data. Two blocks of waveform data (e.g., harmonic and nonharmonic components) are read out from the memory, along with the synchronizing information. On the basis of the synchronizing information, the readout of two blocks of waveform data is controlled using the synchronizing information. There is stored, for each of the blocks, at least one piece of synchronizing position information indicative of a specific position where the block should be synchronized with another block, and the readout of the individual blocks of waveform data is controlled so that the blocks are synchronized with each other using the synchronizing position information.
    Type: Application
    Filed: September 24, 2007
    Publication date: January 29, 2009
    Applicant: Yamaha Corporation
    Inventors: Motoichi Tamura, Yasuyuki Umeyama
  • Publication number: 20090024393
    Abstract: A speech synthesizer conducts a dialogue among a plurality of synthesized speakers, including a self speaker and one or more partner speakers, by use of a voice profile table describing emotional characteristics of synthesized voices, a speaker database storing feature data for different types of speakers and/or different speaking tones, a speech synthesis engine that synthesizes speech from input text according to feature data fitting the voice profile assigned to each synthesized speaker, and a profile manager that updates the voice profiles according to the content of the spoken text. The voice profiles of partner speakers are initially derived from the voice profile of the self speaker. A synthesized dialogue can be set up simply by selecting the voice profile of the self speaker.
    Type: Application
    Filed: June 11, 2008
    Publication date: January 22, 2009
    Applicant: OKI ELECTRIC INDUSTRY CO., LTD.
    Inventor: Tsutomu Kaneyasu