Patents by Inventor Ryo Morinaka
Ryo Morinaka has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 9830904
Abstract: According to an embodiment, a text-to-speech device includes a context acquirer, an acoustic model parameter acquirer, a conversion parameter acquirer, a converter, and a waveform generator. The context acquirer is configured to acquire a context sequence affecting fluctuations in voice. The acoustic model parameter acquirer is configured to acquire an acoustic model parameter sequence that corresponds to the context sequence and represents an acoustic model in a standard speaking style of a target speaker. The conversion parameter acquirer is configured to acquire a conversion parameter sequence corresponding to the context sequence to convert an acoustic model parameter in the standard speaking style into one in a different speaking style. The converter is configured to convert the acoustic model parameter sequence using the conversion parameter sequence. The waveform generator is configured to generate a voice signal based on the acoustic model parameter sequence acquired after conversion.
Type: Grant
Filed: June 17, 2016
Date of Patent: November 28, 2017
Assignee: KABUSHIKI KAISHA TOSHIBA
Inventors: Yu Nasu, Masatsune Tamura, Ryo Morinaka, Masahiro Morita
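The conversion step this abstract describes can be illustrated with a minimal sketch. The patent does not specify the form of the conversion parameters, so a simple per-frame linear (scale, offset) transform and the function name `convert_style` are assumptions for illustration only.

```python
# Hypothetical sketch: an acoustic model parameter sequence in a standard
# speaking style is converted to a different style using a per-frame
# conversion parameter sequence. The (scale, offset) linear form is an
# assumption; the patent leaves the parameter form unspecified.

def convert_style(standard_params, conversion_params):
    """Apply (scale, offset) conversion parameters frame by frame."""
    converted = []
    for p, (scale, offset) in zip(standard_params, conversion_params):
        converted.append(scale * p + offset)
    return converted

# Example: three frames of a one-dimensional acoustic parameter.
standard = [1.0, 2.0, 3.0]
conversion = [(1.1, 0.0), (0.9, 0.5), (1.0, -0.2)]
print(convert_style(standard, conversion))
```

In the actual device the conversion parameter sequence is selected by context, so different phonetic contexts would receive different transforms.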
-
Publication number: 20160300564
Abstract: According to an embodiment, a text-to-speech device includes a context acquirer, an acoustic model parameter acquirer, a conversion parameter acquirer, a converter, and a waveform generator. The context acquirer is configured to acquire a context sequence affecting fluctuations in voice. The acoustic model parameter acquirer is configured to acquire an acoustic model parameter sequence that corresponds to the context sequence and represents an acoustic model in a standard speaking style of a target speaker. The conversion parameter acquirer is configured to acquire a conversion parameter sequence corresponding to the context sequence to convert an acoustic model parameter in the standard speaking style into one in a different speaking style. The converter is configured to convert the acoustic model parameter sequence using the conversion parameter sequence. The waveform generator is configured to generate a voice signal based on the acoustic model parameter sequence acquired after conversion.
Type: Application
Filed: June 17, 2016
Publication date: October 13, 2016
Applicant: KABUSHIKI KAISHA TOSHIBA
Inventors: Yu Nasu, Masatsune Tamura, Ryo Morinaka, Masahiro Morita
-
Patent number: 9002711
Abstract: According to an embodiment, a speech synthesis apparatus includes a selecting unit configured to select speaker's parameters one by one for respective speakers and obtain a plurality of speakers' parameters, the speaker's parameters being prepared for respective pitch waveforms corresponding to speaker's speech sounds, the speaker's parameters including formant frequencies, formant phases, formant powers, and window functions concerning respective formants that are contained in the respective pitch waveforms. The apparatus includes a mapping unit configured to make formants correspond to each other between the plurality of speakers' parameters using a cost function based on the formant frequencies and the formant powers. The apparatus includes a generating unit configured to generate an interpolated speaker's parameter by interpolating, at desired interpolation ratios, the formant frequencies, formant phases, formant powers, and window functions of formants which are made to correspond to each other.
Type: Grant
Filed: December 16, 2010
Date of Patent: April 7, 2015
Assignee: Kabushiki Kaisha Toshiba
Inventors: Ryo Morinaka, Takehiko Kagoshima
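The mapping and generating units can be sketched minimally. The patent only says the cost function is "based on the formant frequencies and the formant powers"; the weighted absolute-difference cost, the greedy nearest-match pairing, and the names `map_formants` / `interpolate` below are all assumptions for illustration.

```python
# Hypothetical sketch: formants of two speakers are paired via a cost
# function over frequency and power, then linearly interpolated at a
# desired ratio. Each formant is a (frequency_hz, power_db) tuple.
# The cost weights and greedy matching are assumptions.

def map_formants(formants_a, formants_b, power_weight=0.001):
    """Pair each formant of speaker A with the lowest-cost formant of B."""
    pairs = []
    for fa in formants_a:
        best = min(
            formants_b,
            key=lambda fb: abs(fa[0] - fb[0]) + power_weight * abs(fa[1] - fb[1]),
        )
        pairs.append((fa, best))
    return pairs

def interpolate(pairs, ratio):
    """Interpolate mapped formants at the given ratio (0 = A, 1 = B)."""
    return [((1 - ratio) * fa[0] + ratio * fb[0],
             (1 - ratio) * fa[1] + ratio * fb[1]) for fa, fb in pairs]

speaker_a = [(500.0, 60.0), (1500.0, 50.0)]
speaker_b = [(550.0, 58.0), (1400.0, 52.0)]
pairs = map_formants(speaker_a, speaker_b)
print(interpolate(pairs, 0.5))
```

The real apparatus also interpolates formant phases and window functions, which this sketch omits.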
-
Publication number: 20140257816
Abstract: According to an embodiment, a speech synthesis dictionary modification device includes an extracting unit, a display unit, an acquiring unit, a modification unit, and an updating unit. The extracting unit extracts synthesis information containing a feature sequence of a synthetic speech from the synthetic speech generated by using a speech synthesis dictionary containing probability distributions of speech features. The display unit displays an image prompting modification of a probability distribution contained in the speech synthesis dictionary on the basis of the synthesis information extracted by the extracting unit. The acquiring unit acquires an instruction to modify the probability distribution contained in the speech synthesis dictionary. The modification unit modifies the probability distribution contained in the speech synthesis dictionary according to the instruction.
Type: Application
Filed: January 31, 2014
Publication date: September 11, 2014
Applicant: KABUSHIKI KAISHA TOSHIBA
Inventors: Ryo Morinaka, Masatsune Tamura, Masahiro Morita
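The modification unit's behavior can be sketched under strong assumptions: the abstract does not say how the probability distributions are represented or what an instruction looks like, so a Gaussian stored as a `(mean, variance)` pair, the instruction dictionary keys, and the name `apply_modification` are all hypothetical.

```python
# Hypothetical sketch: a probability distribution in the synthesis
# dictionary (here a Gaussian over one feature, stored as (mean, variance))
# is modified according to a user instruction. The representation and the
# instruction format are assumptions, not the patent's specification.

def apply_modification(dictionary, feature, instruction):
    """Modify the stored (mean, variance) for one feature in place."""
    mean, var = dictionary[feature]
    mean += instruction.get("mean_shift", 0.0)
    var *= instruction.get("variance_scale", 1.0)
    dictionary[feature] = (mean, var)
    return dictionary

d = {"f0": (120.0, 25.0)}
apply_modification(d, "f0", {"mean_shift": 10.0, "variance_scale": 0.5})
print(d["f0"])  # (130.0, 12.5)
```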
-
Patent number: 8175881
Abstract: A phoneme sequence corresponding to a target speech is divided into a plurality of segments. A plurality of speech units for each segment is selected from a speech unit memory that stores speech units having at least one frame. The plurality of speech units has a prosodic feature accordant with or similar to that of the target speech. A formant parameter having at least one formant frequency is generated for each frame of the plurality of speech units. A fused formant parameter of each frame is generated from the formant parameters of each frame of the plurality of speech units. A fused speech unit of each segment is generated from the fused formant parameter of each frame. A synthesized speech is generated by concatenating the fused speech unit of each segment.
Type: Grant
Filed: August 14, 2008
Date of Patent: May 8, 2012
Assignee: Kabushiki Kaisha Toshiba
Inventors: Ryo Morinaka, Masatsune Tamura, Takehiko Kagoshima
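The fusion step can be sketched minimally. The patent does not state how the per-frame formant parameters are combined, so fusing by averaging aligned formant frequencies, and the name `fuse_frames`, are assumptions for illustration.

```python
# Hypothetical sketch: for each frame position, the formant parameters of
# several selected speech units are fused into one. Averaging the aligned
# formant frequencies is an assumed fusion rule; each unit is a list of
# frames, each frame a list of formant frequencies (Hz).

def fuse_frames(units):
    """Fuse per-frame formant frequency lists across units by averaging."""
    fused = []
    for frames in zip(*units):               # one frame position at a time
        fused.append([sum(f) / len(f) for f in zip(*frames)])
    return fused

# Two selected units, each with two frames of two formant frequencies.
unit1 = [[500.0, 1500.0], [520.0, 1480.0]]
unit2 = [[540.0, 1460.0], [500.0, 1520.0]]
print(fuse_frames([unit1, unit2]))  # [[520.0, 1480.0], [510.0, 1500.0]]
```

A fused speech unit would then be rendered from each fused parameter sequence and the units concatenated, as the abstract describes.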
-
Publication number: 20110087488
Abstract: According to an embodiment, a speech synthesis apparatus includes a selecting unit configured to select speaker's parameters one by one for respective speakers and obtain a plurality of speakers' parameters, the speaker's parameters being prepared for respective pitch waveforms corresponding to speaker's speech sounds, the speaker's parameters including formant frequencies, formant phases, formant powers, and window functions concerning respective formants that are contained in the respective pitch waveforms. The apparatus includes a mapping unit configured to make formants correspond to each other between the plurality of speakers' parameters using a cost function based on the formant frequencies and the formant powers. The apparatus includes a generating unit configured to generate an interpolated speaker's parameter by interpolating, at desired interpolation ratios, the formant frequencies, formant phases, formant powers, and window functions of formants which are made to correspond to each other.
Type: Application
Filed: December 16, 2010
Publication date: April 14, 2011
Inventors: Ryo Morinaka, Takehiko Kagoshima
-
Publication number: 20090326951
Abstract: Ratios between the powers at the peaks of the respective formants of the spectrum of a pitch-cycle waveform and the powers at the boundaries between the formants are obtained. When the ratios are large, the bandwidths of the window functions are widened. The formant waveforms are then generated by multiplying sinusoidal waveforms, generated from the formant parameter sets on the basis of pitch-cycle waveform generating data, by the window functions of the widened bandwidth, and the pitch-cycle waveform is generated as the sum of these formant waveforms.
Type: Application
Filed: April 14, 2009
Publication date: December 31, 2009
Applicant: KABUSHIKI KAISHA TOSHIBA
Inventors: Ryo Morinaka, Takehiko Kagoshima
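The bandwidth-widening rule can be sketched minimally. The abstract only says the bandwidth is widened when the peak-to-boundary power ratio is large; the logarithmic widening formula, the clamping of ratios below 1, and the name `window_bandwidth` are assumptions for illustration.

```python
import math

# Hypothetical sketch: the ratio between a formant's spectral peak power
# and the power at the boundary to its neighboring formant controls the
# window function's bandwidth. The logarithmic widening rule is an
# assumption; the patent does not give the exact formula.

def window_bandwidth(peak_power, boundary_power, base_bw=100.0):
    """Widen the window bandwidth (Hz) as the peak/boundary ratio grows."""
    ratio = peak_power / boundary_power
    return base_bw * (1.0 + math.log10(max(ratio, 1.0)))

print(window_bandwidth(100.0, 10.0))  # ratio 10 -> widened to 200.0
print(window_bandwidth(100.0, 90.0))  # small ratio -> near the base bandwidth
```

Each formant waveform would then be a sinusoid at the formant frequency multiplied by a window of this bandwidth, and the pitch-cycle waveform their sum.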
-
Publication number: 20090048844
Abstract: A phoneme sequence corresponding to a target speech is divided into a plurality of segments. A plurality of speech units for each segment is selected from a speech unit memory that stores speech units having at least one frame. The plurality of speech units has a prosodic feature accordant with or similar to that of the target speech. A formant parameter having at least one formant frequency is generated for each frame of the plurality of speech units. A fused formant parameter of each frame is generated from the formant parameters of each frame of the plurality of speech units. A fused speech unit of each segment is generated from the fused formant parameter of each frame. A synthesized speech is generated by concatenating the fused speech unit of each segment.
Type: Application
Filed: August 14, 2008
Publication date: February 19, 2009
Applicant: KABUSHIKI KAISHA TOSHIBA
Inventors: Ryo Morinaka, Masatsune Tamura, Takehiko Kagoshima