Abstract: The conversion of speech can be used to transform an utterance by a source speaker to match the speech characteristics of a target speaker, for applications such as dubbing a motion picture. During a training phase, utterances corresponding to the same sentences by both the target speaker and the source speaker are force-aligned according to the phonemes within the sentences. A transformation or mapping is trained so that each frame of the source utterances is mapped to a corresponding frame of the target utterances. After the completion of the training phase, a source utterance is divided into frames, which are transformed into target frames. After all target frames are created from the sequence of frames from the source utterance, a target utterance is created having the speech of the source speaker, but with the vocal characteristics of the target speaker.
Type:
Application
Filed:
March 8, 2006
Publication date:
September 13, 2007
Applicant:
Voxonic, Inc.
Inventors:
Oytun Turk, Levent Mustafa Arslan, Fred Deutsch
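The frame-by-frame conversion described in the abstract above can be illustrated with a minimal Python sketch. The abstract does not say what form the trained mapping takes, so a nearest-neighbor lookup over aligned frame pairs is used here purely as a stand-in; the function names, frame length, and feature vectors are all hypothetical.

```python
def split_into_frames(samples, frame_len, hop):
    """Cut a sample sequence into fixed-length frames, stepping by `hop`."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length frames."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_mapping(aligned_pairs):
    """Stand-in for training: store the aligned (source, target) frame pairs.

    In the abstract the pairs come from forced alignment on phonemes;
    here they are simply passed in.
    """
    return list(aligned_pairs)


def convert(source_frames, mapping):
    """Replace each source frame with the target frame whose paired
    training source frame is closest to it."""
    converted = []
    for frame in source_frames:
        _, target = min(mapping,
                        key=lambda pair: squared_distance(frame, pair[0]))
        converted.append(target)
    return converted


# Toy data: two aligned training pairs and a short "utterance".
mapping = train_mapping([([0, 0], [10, 10]), ([5, 5], [20, 20])])
frames = split_into_frames([0, 1, 4, 5], frame_len=2, hop=2)
target_frames = convert(frames, mapping)
```

A real system would operate on spectral features rather than raw values and would resynthesize audio from the target frames; both steps are outside this sketch.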
Abstract: An automatic donor selection algorithm estimates the subjective voice conversion output quality from a set of objective distance measures between the source and target speakers' acoustical features. The algorithm learns the relationship between the subjective scores and the objective distance measures through nonlinear regression with a multilayer perceptron (MLP). Once the MLP is trained, the algorithm can be used in the selection or ranking of a set of source speakers in terms of the expected output quality for transformations to a specific target voice.
Type:
Application
Filed:
March 14, 2006
Publication date:
February 1, 2007
Applicant:
Voxonic, Inc.
Inventors:
Oytun Turk, Levent Arslan, Fred Deutsch
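The ranking step of the donor selection abstract above can be sketched as follows. This is only an illustration under stated assumptions: the weights are made-up values standing in for a trained MLP (training is omitted), the two-element distance vectors are hypothetical, and the network size is not taken from the source.

```python
import math

# Hypothetical weights standing in for an already-trained MLP.
# They are chosen so that larger distances give lower predicted quality.
W1 = [[-1.0, 0.0], [0.0, -1.0]]  # hidden-layer weights (2 inputs, 2 units)
B1 = [0.0, 0.0]                  # hidden-layer biases
W2 = [1.0, 1.0]                  # output-layer weights
B2 = 0.0                         # output-layer bias


def predict_quality(distances):
    """Forward pass: objective distance measures -> predicted subjective score."""
    hidden = [math.tanh(sum(w * d for w, d in zip(row, distances)) + b)
              for row, b in zip(W1, B1)]
    return sum(w * h for w, h in zip(W2, hidden)) + B2


def rank_donors(candidates):
    """Order candidate source speakers from best to worst predicted quality."""
    return sorted(candidates,
                  key=lambda name: predict_quality(candidates[name]),
                  reverse=True)


# Hypothetical distance measures between each candidate and one target voice.
candidates = {"A": [0.1, 0.2], "B": [1.0, 1.5], "C": [0.5, 0.1]}
ranking = rank_donors(candidates)
```

With these placeholder weights, the candidate with the smallest distances to the target ranks first; a trained network could weigh the individual distance measures quite differently.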
Abstract: The conversion of speech can be used to transform an utterance by a source speaker to match the speech characteristics of a target speaker. During a training phase, utterances corresponding to the same sentences by both the target speaker and the source speaker can be force-aligned according to the phonemes within the sentences. A target codebook and a source codebook, as well as a transformation between the two, can be trained. After the completion of the training phase, a source utterance can be divided into entries in the source codebook and transformed into entries in the target codebook. During the transformation, a single source codebook entry can correspond to several target codebook entries. The number of entries can be reduced with the application of confidence measures.
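The reduction of one-to-many codebook mappings can be illustrated with a short Python sketch. The abstract does not define the confidence measures, so this assumes each candidate target entry already carries a score between 0 and 1; the entry names, the threshold, and the cap on retained entries are all hypothetical.

```python
def prune_targets(candidates, min_confidence=0.5, max_entries=2):
    """Keep the most confident target entries for one source entry.

    `candidates` is a list of (target_entry, confidence) pairs.
    Entries below `min_confidence` are dropped, then at most
    `max_entries` of the remainder are kept, highest confidence first.
    """
    kept = [(entry, conf) for entry, conf in candidates
            if conf >= min_confidence]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:max_entries]


# One source codebook entry that maps to four target entries.
one_to_many = {
    "src_07": [("tgt_12", 0.91), ("tgt_03", 0.40),
               ("tgt_44", 0.75), ("tgt_19", 0.62)],
}

pruned = {src: prune_targets(targets)
          for src, targets in one_to_many.items()}
```

Combining a threshold with a cap is one possible design choice; either could be used alone, depending on how the confidence measures are defined.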