Abstract: A method for speech synthesis using prosody capture and transfer includes receiving a first speech in a target prosody and receiving a second speech in a target voice; extracting prosodic features from a first speech segment in the target prosody; and generating a synthetic speech segment in the target voice with the target prosody by transferring the prosodic features from the first speech segment, phoneme by phoneme, to a second speech segment.
Type:
Application
Filed:
January 30, 2023
Publication date:
August 3, 2023
Applicant:
SPEECH MORPHING SYSTEMS, INC.
Inventors:
Fathy F. YASSA, Mark Seligman, Darko Pekar
Abstract: A method for synthesizing cross-lingual speech includes receiving a request for synthesizing speech, the request for synthesizing speech including a target text document and a target language. Phonetic transcriptions are generated for the target text document. Prosodic annotations for the target text document are generated based on the target text document and the target language. Phone durations and acoustic features are generated based on the phonetic transcriptions and the prosodic annotations using a neural network. A speech corresponding to the target text document in the target language is synthesized based on the generated phone durations and acoustic features.
Abstract: A communication system is described. The communication system includes an automatic speech recognizer configured to receive a speech signal and to convert the speech signal into a text sequence. The communication system also includes a speech analyzer configured to receive the speech signal and to extract paralinguistic characteristics from it. The communication system further includes a translator coupled with the automatic speech recognizer; the translator is configured to convert the text sequence from a first language to a second language. In addition, the communication system includes a speech output device coupled with the automatic speech recognizer and the speech analyzer. The speech output device is configured to convert the text sequence into an output speech signal based on the extracted paralinguistic characteristics.
Abstract: A networked communication system is described. The communication system includes an automatic speech recognizer configured to receive a speech signal from a client over a network and to convert the speech signal into a text sequence. The communication system also includes a speech analyzer configured to receive the speech signal and to extract paralinguistic characteristics from it. In addition, the communication system includes a speech output device coupled with the automatic speech recognizer and the speech analyzer. The speech output device is configured to convert the text sequence into an output speech signal based on the extracted paralinguistic characteristics.
Abstract: A method of morphing speech from an original speaker into the speech of a second, target speaker without decomposing either speech into source and filter, and without the need to determine the formant positions by warping spectral envelopes.
Abstract: A method for determining the prosody of a tag question in human speech and preserving said prosody as the human speech is translated into a different language.
Type:
Grant
Filed:
August 1, 2016
Date of Patent:
May 1, 2018
Assignee:
SPEECH MORPHING SYSTEMS, INC.
Inventors:
Fathy Yassa, Caroline Henton, Meir Friedlander
Abstract: An exemplary computer system configured to use multiple automatic speech recognizers (ASRs) with a plurality of language and acoustic models to increase the accuracy of speech recognition.
Abstract: Method and apparatus for segmenting speech by detecting the pauses between words and/or phrases, and for determining whether a particular time interval contains speech or non-speech, such as a pause.
Abstract: Method and apparatus for segmenting speech by detecting the pauses between words and/or phrases, and for determining whether a particular time interval contains speech or non-speech, such as a pause.
Abstract: A method for determining the prosody of a tag question in human speech and preserving said prosody as the human speech is translated into a different language.
Abstract: Method and apparatus for segmenting speech by detecting the pauses between words and/or phrases, and for determining whether a particular time interval contains speech or non-speech, such as a pause.
Abstract: A method of transferring the prosody of tag questions across languages includes extracting prosodic parameters of speech in a first language having a tag question and mapping the prosodic parameters to speech segments in a second language corresponding to the tag question. Accordingly, semantic and pragmatic intent of the tag question in the first language may be correctly conveyed in the second language.
Abstract: Method and apparatus for segmenting speech by detecting the pauses between words and/or phrases, and for determining whether a particular time interval contains speech or non-speech, such as a pause.
Abstract: Method and apparatus for segmenting speech by detecting the pauses between words and/or phrases, and for determining whether a particular time interval contains speech or non-speech, such as a pause.
Abstract: Method and apparatus for segmenting speech by detecting the pauses between words and/or phrases, and for determining whether a particular time interval contains speech or non-speech, such as a pause.
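The pause-segmentation abstracts above do not say how an interval is classified as speech or non-speech. As a rough illustration only, the sketch below assumes a simple short-time energy threshold, which is a common baseline and not necessarily the approach claimed in these patents. The function names (`frame_energies`, `find_pauses`) and all parameter values (20 ms frames, an RMS threshold of 0.02, a 200 ms minimum pause) are hypothetical choices made for the example.

```python
import math

def frame_energies(samples, frame_len):
    """Return the RMS energy of each non-overlapping frame of frame_len samples."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return energies

def find_pauses(samples, sample_rate, frame_ms=20, threshold=0.02, min_pause_ms=200):
    """Return (start_sec, end_sec) intervals whose energy stays below threshold.

    Assumption: a frame is "non-speech" when its RMS energy is below a fixed
    threshold, and a run of such frames counts as a pause only if it lasts at
    least min_pause_ms. Real systems usually adapt the threshold to the noise floor.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    min_frames = max(1, min_pause_ms // frame_ms)
    energies = frame_energies(samples, frame_len)
    pauses = []
    run_start = None
    # The infinite sentinel closes a low-energy run that reaches the end of the signal.
    for i, energy in enumerate(energies + [float("inf")]):
        if energy < threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_frames:
                pauses.append((run_start * frame_len / sample_rate,
                               i * frame_len / sample_rate))
            run_start = None
    return pauses

def tone(n_samples, sample_rate, freq=100.0, amplitude=0.5):
    """Generate a sine tone standing in for a voiced stretch of speech."""
    return [amplitude * math.sin(2 * math.pi * freq * n / sample_rate)
            for n in range(n_samples)]

if __name__ == "__main__":
    sr = 1000  # toy sample rate so the example stays small
    signal = tone(500, sr) + [0.0] * 300 + tone(500, sr)
    print(find_pauses(signal, sr))
```

The intervals between the returned pauses can then be treated as the speech segments. A fixed threshold is the simplest possible choice; it is used here only to make the speech versus non-speech decision concrete.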