Abstract: Systems, methods, and devices are described for generating multi-tone waveforms. A count signal having a count value is generated. A plurality of step values and a plurality of phase values are received. For each increment of the count value, an index value corresponding to each step value of the plurality of step values is calculated based on the step value, the count value, and a respective phase value of the plurality of phase values. A tone point value corresponding to each calculated index value is determined to generate a plurality of tone point values for each increment of the count value. The determined tone point values are summed to generate a corresponding waveform point for each increment of the count value. A waveform is generated as a sequence of generated waveform points.
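The per-count loop described in this abstract can be sketched in Python. This is only an illustration under stated assumptions: the abstract does not say how the index is computed or where tone point values come from, so the formula `(step * count + phase) % TABLE_SIZE`, the sine lookup table, and the names `TABLE_SIZE`, `SINE_TABLE`, and `multi_tone_waveform` are all hypothetical.

```python
import math

# Assumption: tone point values come from a single-cycle sine lookup table.
TABLE_SIZE = 1024
SINE_TABLE = [math.sin(2 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]


def multi_tone_waveform(step_values, phase_values, num_points):
    """Return num_points waveform points, each the sum of one tone point per step value."""
    if len(step_values) != len(phase_values):
        raise ValueError("each step value needs a respective phase value")
    waveform = []
    for count in range(num_points):  # the count signal, one increment per point
        point = 0.0
        for step, phase in zip(step_values, phase_values):
            # Assumed index formula: step scales the count, phase offsets it,
            # and the result wraps around the table.
            index = (step * count + phase) % TABLE_SIZE
            point += SINE_TABLE[index]  # tone point value for this index
        waveform.append(point)  # summed tone points form one waveform point
    return waveform
```

In this sketch, a larger step value walks the table faster and so yields a higher-frequency tone, while the phase value only shifts the starting position in the table.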
Abstract: Disclosed herein are systems, methods, and non-transitory computer-readable storage media for speech synthesis. A system practicing the method receives a set of ordered lists of speech units; for each respective speech unit in each ordered list, constructs a sublist of speech units from the next ordered list that are suitable for concatenation; performs a cost analysis of paths through the set of ordered lists based on the sublist for each respective speech unit; and synthesizes speech using the lowest-cost path of speech units through the set of ordered lists based on the cost analysis. The ordered lists can be ordered based on the respective pitch of each speech unit. In one embodiment, speech units which do not have an assigned pitch can be assigned a pitch.
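The search over ordered lists can be illustrated with a small dynamic-programming sketch. It rests on assumptions the abstract does not state: each speech unit is reduced to a single pitch value, a unit in the next list counts as suitable for concatenation when its pitch lies within a hypothetical `tolerance`, and the join cost is the absolute pitch difference. The function names are illustrative only.

```python
from bisect import bisect_left, bisect_right


def suitable_sublist(unit_pitch, next_list, tolerance):
    """Indices of units in next_list (sorted by pitch) within tolerance of unit_pitch."""
    lo = bisect_left(next_list, unit_pitch - tolerance)
    hi = bisect_right(next_list, unit_pitch + tolerance)
    return range(lo, hi)


def lowest_cost_path(ordered_lists, tolerance=20.0):
    """Return (cost, path) for the cheapest path, or None if no path exists.

    path holds one index per ordered list. The cost model is an assumption.
    """
    if not ordered_lists:
        return None
    # best[i] = (cost, path) of the cheapest path ending at unit i of the current list
    best = {i: (0.0, [i]) for i in range(len(ordered_lists[0]))}
    for cur, nxt in zip(ordered_lists, ordered_lists[1:]):
        new_best = {}
        for i, (cost, path) in best.items():
            for j in suitable_sublist(cur[i], nxt, tolerance):
                candidate = cost + abs(nxt[j] - cur[i])  # assumed join cost
                if j not in new_best or candidate < new_best[j][0]:
                    new_best[j] = (candidate, path + [j])
        best = new_best
        if not best:
            return None  # no unit in the next list was suitable
    return min(best.values(), key=lambda entry: entry[0])
```

Because each list is sorted by pitch, the sublist can be found with two binary searches rather than a full scan, which is one plausible reason to order the lists by pitch.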
Abstract: A computer-implemented method, system, and computer-usable program code for synthesizing speech. A computer-implemented method for synthesizing speech includes providing a database of speech of a source speaker, and providing a prosody model of speech of a target speaker different from the source speaker. Text input to be synthesized is received, and the prosody model of speech of the target speaker is applied to the text input to select segments of the speech of the source speaker in the database to form synthesized speech of the text input. The synthesized speech of the text input is then output.
Type: Application
Filed: January 7, 2008
Publication date: July 9, 2009
Inventors: Andrew S. Aaron, Ellen Marie Eide, Raul Fernandez
Abstract: A method and system for providing efficient menu services for an information processing system that uses a telephone or other form of audio user interface. In one embodiment, the menu services provide effective support for novice users by providing a full listing of available keywords and rotating house advertisements which inform novice users of potential features and information. For experienced users, cues are rendered so that at any time the user can say a desired keyword to invoke the corresponding application. The menu is flat to facilitate its usage. Full keyword listings are rendered after the user is given a brief cue to say a keyword. Service messages rotate words and word prosody. When listening to receive information from the user, after the user has been cued, soft background music or other audible signals are rendered to inform the user that a response may now be spoken to the service.
Type: Application
Filed: November 20, 2007
Publication date: June 26, 2008
Applicant: Microsoft Corporation
Inventors: Lisa Joy Stifelman, Hadi Partovi, Haleh Partovi, David Bryan Alpert, Matthew Talin Marx, Scott James Bailey, Kyle D. Sims, Darby McDonough Bailey, Roderick Steven Brathwaite, Eugene Koh, Angus Macdonald Davis
Abstract: A method for concatenating a first frame of samples and a subsequent second frame of samples, the method comprising applying a phase filter adapted to minimize a discontinuity at a boundary between the first and second frames of samples.