Patents by Inventor Masatsune Tamura

Masatsune Tamura has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11423874
    Abstract: A speech synthesis model training device includes one or more hardware processors configured to perform the following. Storing, in a speech corpus storing unit, speech data, and pitch mark information and context information of the speech data. From the speech data, analyzing acoustic feature parameters at each pitch mark timing in pitch mark information. From the acoustic feature parameters analyzed, training a statistical model which has a plurality of states and which includes an output distribution of acoustic feature parameters including pitch feature parameters and a duration distribution based on timing parameters.
    Type: Grant
    Filed: July 29, 2020
    Date of Patent: August 23, 2022
    Assignee: KABUSHIKI KAISHA TOSHIBA
    Inventors: Masatsune Tamura, Masahiro Morita
  • Patent number: 11348569
    Abstract: A speech processing device includes a hardware processor configured to receive input speech and extract speech frames from the input speech. The hardware processor is configured to calculate a spectrum parameter for each of the speech frames, calculate a first phase spectrum for each of the speech frames, calculate a group delay spectrum from the first phase spectrum based on a frequency component of the first phase spectrum, calculate a band group delay parameter in a predetermined frequency band from the group delay spectrum, and calculate a band group delay compensation parameter to compensate a difference between a second phase spectrum reconstructed from the band group delay parameter and the first phase spectrum. The hardware processor is configured to generate a speech waveform based on the spectrum parameter, the band group delay parameter, and the band group delay compensation parameter.
    Type: Grant
    Filed: April 7, 2020
    Date of Patent: May 31, 2022
    Assignee: KABUSHIKI KAISHA TOSHIBA
    Inventors: Masatsune Tamura, Masahiro Morita
  • Patent number: 11170756
    Abstract: A speech processing device of an embodiment includes a spectrum parameter calculation unit, a phase spectrum calculation unit, a group delay spectrum calculation unit, a band group delay parameter calculation unit, and a band group delay compensation parameter calculation unit. The spectrum parameter calculation unit calculates a spectrum parameter. The phase spectrum calculation unit calculates a first phase spectrum. The group delay spectrum calculation unit calculates a group delay spectrum from the first phase spectrum based on a frequency component of the first phase spectrum. The band group delay parameter calculation unit calculates a band group delay parameter in a predetermined frequency band from a group delay spectrum. The band group delay compensation parameter calculation unit calculates a band group delay compensation parameter to compensate a difference between a second phase spectrum reconstructed from the band group delay parameter and the first phase spectrum.
    Type: Grant
    Filed: April 7, 2020
    Date of Patent: November 9, 2021
    Assignee: KABUSHIKI KAISHA TOSHIBA
    Inventors: Masatsune Tamura, Masahiro Morita
  • Patent number: 10878801
    Abstract: A speech synthesis device of an embodiment includes a memory unit, a creating unit, a deciding unit, a generating unit and a waveform generating unit. The memory unit stores, as statistical model information of a statistical model, an output distribution of acoustic feature parameters including pitch feature parameters and a duration distribution. The creating unit creates a statistical model sequence from context information and the statistical model information. The deciding unit decides a pitch-cycle waveform count of each state using a duration based on the duration distribution of each state of each statistical model in the statistical model sequence, and pitch information based on the output distribution of the pitch feature parameters. The generating unit generates an output distribution sequence based on the pitch-cycle waveform count, and acoustic feature parameters based on the output distribution sequence.
    Type: Grant
    Filed: February 14, 2018
    Date of Patent: December 29, 2020
    Assignee: KABUSHIKI KAISHA TOSHIBA
    Inventors: Masatsune Tamura, Masahiro Morita
  • Publication number: 20200357381
    Abstract: A speech synthesis device of an embodiment includes a memory unit, a creating unit, a deciding unit, a generating unit and a waveform generating unit. The memory unit stores, as statistical model information of a statistical model, an output distribution of acoustic feature parameters including pitch feature parameters and a duration distribution. The creating unit creates a statistical model sequence from context information and the statistical model information. The deciding unit decides a pitch-cycle waveform count of each state using a duration based on the duration distribution of each state of each statistical model in the statistical model sequence, and pitch information based on the output distribution of the pitch feature parameters. The generating unit generates an output distribution sequence based on the pitch-cycle waveform count, and acoustic feature parameters based on the output distribution sequence.
    Type: Application
    Filed: July 29, 2020
    Publication date: November 12, 2020
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Masatsune TAMURA, Masahiro MORITA
  • Publication number: 20200234692
    Abstract: A speech processing device of an embodiment includes a spectrum parameter calculation unit, a phase spectrum calculation unit, a group delay spectrum calculation unit, a band group delay parameter calculation unit, and a band group delay compensation parameter calculation unit. The spectrum parameter calculation unit calculates a spectrum parameter. The phase spectrum calculation unit calculates a first phase spectrum. The group delay spectrum calculation unit calculates a group delay spectrum from the first phase spectrum based on a frequency component of the first phase spectrum. The band group delay parameter calculation unit calculates a band group delay parameter in a predetermined frequency band from a group delay spectrum. The band group delay compensation parameter calculation unit calculates a band group delay compensation parameter to compensate a difference between a second phase spectrum reconstructed from the band group delay parameter and the first phase spectrum.
    Type: Application
    Filed: April 7, 2020
    Publication date: July 23, 2020
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Masatsune TAMURA, Masahiro MORITA
  • Publication number: 20200234691
    Abstract: A speech processing device of an embodiment includes a spectrum parameter calculation unit, a phase spectrum calculation unit, a group delay spectrum calculation unit, a band group delay parameter calculation unit, and a band group delay compensation parameter calculation unit. The spectrum parameter calculation unit calculates a spectrum parameter. The phase spectrum calculation unit calculates a first phase spectrum. The group delay spectrum calculation unit calculates a group delay spectrum from the first phase spectrum based on a frequency component of the first phase spectrum. The band group delay parameter calculation unit calculates a band group delay parameter in a predetermined frequency band from a group delay spectrum. The band group delay compensation parameter calculation unit calculates a band group delay compensation parameter to compensate a difference between a second phase spectrum reconstructed from the band group delay parameter and the first phase spectrum.
    Type: Application
    Filed: April 7, 2020
    Publication date: July 23, 2020
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Masatsune TAMURA, Masahiro MORITA
  • Patent number: 10650800
    Abstract: A speech processing device of an embodiment includes a spectrum parameter calculation unit, a phase spectrum calculation unit, a group delay spectrum calculation unit, a band group delay parameter calculation unit, and a band group delay compensation parameter calculation unit. The spectrum parameter calculation unit calculates a spectrum parameter. The phase spectrum calculation unit calculates a first phase spectrum. The group delay spectrum calculation unit calculates a group delay spectrum from the first phase spectrum based on a frequency component of the first phase spectrum. The band group delay parameter calculation unit calculates a band group delay parameter in a predetermined frequency band from a group delay spectrum. The band group delay compensation parameter calculation unit calculates a band group delay compensation parameter to compensate a difference between a second phase spectrum reconstructed from the band group delay parameter and the first phase spectrum.
    Type: Grant
    Filed: February 16, 2018
    Date of Patent: May 12, 2020
    Assignee: KABUSHIKI KAISHA TOSHIBA
    Inventors: Masatsune Tamura, Masahiro Morita
  • Patent number: 10529314
    Abstract: A speech synthesizer includes a statistical-model sequence generator, a multiple-acoustic feature parameter sequence generator, and a waveform generator. The statistical-model sequence generator generates, based on context information corresponding to an input text, a statistical model sequence that comprises a first sequence of a statistical model comprising a plurality of states. The multiple-acoustic feature parameter sequence generator, for each speech section corresponding to each state of the statistical model sequence, selects a first plurality of acoustic feature parameters from a first set of acoustic feature parameters extracted from a first speech waveform stored in a speech database and generates a multiple-acoustic feature parameter sequence that comprises a sequence of the first plurality of acoustic feature parameters.
    Type: Grant
    Filed: February 16, 2017
    Date of Patent: January 7, 2020
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Masatsune Tamura, Masahiro Morita
  • Patent number: 10347237
    Abstract: According to an embodiment, a device includes a table creator, an estimator, and a dictionary creator. The table creator is configured to create a table based on similarity between distributions of nodes of speech synthesis dictionaries of a specific speaker in respective first and second languages. The estimator is configured to estimate a matrix to transform the speech synthesis dictionary of the specific speaker in the first language to a speech synthesis dictionary of a target speaker in the first language, based on speech and a recorded text of the target speaker in the first language and the speech synthesis dictionary of the specific speaker in the first language. The dictionary creator is configured to create a speech synthesis dictionary of the target speaker in the second language, based on the table, the matrix, and the speech synthesis dictionary of the specific speaker in the second language.
    Type: Grant
    Filed: July 9, 2015
    Date of Patent: July 9, 2019
    Assignee: KABUSHIKI KAISHA TOSHIBA
    Inventors: Kentaro Tachibana, Masatsune Tamura, Yamato Ohtani
  • Patent number: 10157608
    Abstract: According to an embodiment, a voice processing device includes an interface system, a determining processor, and a predicting processor. The interface system configured to receive neutral voice data representing audio in a neutral voice of a user. The determining processor configured to determine a predictive parameter based at least in part on the neutral voice data. The predicting processor configured to predict a voice conversion model for converting the neutral voice of the speaker to a target voice using at least the predictive parameter.
    Type: Grant
    Filed: February 15, 2017
    Date of Patent: December 18, 2018
    Assignee: KABUSHIKI KAISHA TOSHIBA
    Inventors: Yamato Ohtani, Yu Nasu, Masatsune Tamura, Masahiro Morita
  • Patent number: 10109286
    Abstract: According to an embodiment, a speech synthesizer includes a source generator, a phase modulator, and a vocal tract filter unit. The source generator generates a source signal by using a fundamental frequency sequence and a pulse signal. The phase modulator modulates, with respect to the source signal generated by the source generator, a phase of the pulse signal at each pitch mark based on audio watermarking information. The vocal tract filter unit generates a speech signal by using a spectrum parameter sequence with respect to the source signal in which the phase of the pulse signal is modulated by the phase modulator.
    Type: Grant
    Filed: September 14, 2017
    Date of Patent: October 23, 2018
    Assignee: KABUSHIKI KAISHA TOSHIBA
    Inventors: Kentaro Tachibana, Takehiko Kagoshima, Masatsune Tamura, Masahiro Morita
  • Publication number: 20180174570
    Abstract: A speech synthesis device of an embodiment includes a memory unit, a creating unit, a deciding unit, a generating unit and a waveform generating unit. The memory unit stores, as statistical model information of a statistical model, an output distribution of acoustic feature parameters including pitch feature parameters and a duration distribution. The creating unit creates a statistical model sequence from context information and the statistical model information. The deciding unit decides a pitch-cycle waveform count of each state using a duration based on the duration distribution of each state of each statistical model in the statistical model sequence, and pitch information based on the output distribution of the pitch feature parameters. The generating unit generates an output distribution sequence based on the pitch-cycle waveform count, and acoustic feature parameters based on the output distribution sequence.
    Type: Application
    Filed: February 14, 2018
    Publication date: June 21, 2018
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Masatsune TAMURA, Masahiro MORITA
  • Publication number: 20180174571
    Abstract: A speech processing device of an embodiment includes a spectrum parameter calculation unit, a phase spectrum calculation unit, a group delay spectrum calculation unit, a band group delay parameter calculation unit, and a band group delay compensation parameter calculation unit. The spectrum parameter calculation unit calculates a spectrum parameter. The phase spectrum calculation unit calculates a first phase spectrum. The group delay spectrum calculation unit calculates a group delay spectrum from the first phase spectrum based on a frequency component of the first phase spectrum. The band group delay parameter calculation unit calculates a band group delay parameter in a predetermined frequency band from a group delay spectrum. The band group delay compensation parameter calculation unit calculates a band group delay compensation parameter to compensate a difference between a second phase spectrum reconstructed from the band group delay parameter and the first phase spectrum.
    Type: Application
    Filed: February 16, 2018
    Publication date: June 21, 2018
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Masatsune TAMURA, Masahiro MORITA
  • Patent number: 9870779
    Abstract: According to an embodiment, a speech synthesizer includes a source generator, a phase modulator, and a vocal tract filter unit. The source generator generates a source signal by using a fundamental frequency sequence and a pulse signal. The phase modulator modulates, with respect to the source signal generated by the source generator, a phase of the pulse signal at each pitch mark based on audio watermarking information. The vocal tract filter unit generates a speech signal by using a spectrum parameter sequence with respect to the source signal in which the phase of the pulse signal is modulated by the phase modulator.
    Type: Grant
    Filed: July 16, 2015
    Date of Patent: January 16, 2018
    Assignee: KABUSHIKI KAISHA TOSHIBA
    Inventors: Kentaro Tachibana, Takehiko Kagoshima, Masatsune Tamura, Masahiro Morita
  • Publication number: 20180005637
    Abstract: According to an embodiment, a speech synthesizer includes a source generator, a phase modulator, and a vocal tract filter unit. The source generator generates a source signal by using a fundamental frequency sequence and a pulse signal. The phase modulator modulates, with respect to the source signal generated by the source generator, a phase of the pulse signal at each pitch mark based on audio watermarking information. The vocal tract filter unit generates a speech signal by using a spectrum parameter sequence with respect to the source signal in which the phase of the pulse signal is modulated by the phase modulator.
    Type: Application
    Filed: September 14, 2017
    Publication date: January 4, 2018
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Kentaro TACHIBANA, Takehiko KAGOSHIMA, Masatsune TAMURA, Masahiro MORITA
  • Patent number: 9830904
    Abstract: According to an embodiment, a text-to-speech device includes a context acquirer, an acoustic model parameter acquirer, a conversion parameter acquirer, a converter, and a waveform generator. The context acquirer is configured to acquire a context sequence affecting fluctuations in voice. The acoustic model parameter acquirer is configured to acquire an acoustic model parameter sequence that corresponds to the context sequence and represents an acoustic model in a standard speaking style of a target speaker. The conversion parameter acquirer is configured to acquire a conversion parameter sequence corresponding to the context sequence to convert an acoustic model parameter in the standard speaking style into one in a different speaking style. The converter is configured to convert the acoustic model parameter sequence using the conversion parameter sequence. The waveform generator is configured to generate a voice signal based on the acoustic model parameter sequence acquired after conversion.
    Type: Grant
    Filed: June 17, 2016
    Date of Patent: November 28, 2017
    Assignee: KABUSHIKI KAISHA TOSHIBA
    Inventors: Yu Nasu, Masatsune Tamura, Ryo Morinaka, Masahiro Morita
  • Publication number: 20170162186
    Abstract: A speech synthesizer includes a statistical-model sequence generator, a multiple-acoustic feature parameter sequence generator, and a waveform generator. The statistical-model sequence generator generates, based on context information corresponding to an input text, a statistical model sequence that comprises a first sequence of a statistical model comprising a plurality of states. The multiple-acoustic feature parameter sequence generator, for each speech section corresponding to each state of the statistical model sequence, selects a first plurality of acoustic feature parameters from a first set of acoustic feature parameters extracted from a first speech waveform stored in a speech database and generates a multiple-acoustic feature parameter sequence that comprises a sequence of the first plurality of acoustic feature parameters.
    Type: Application
    Filed: February 16, 2017
    Publication date: June 8, 2017
    Inventors: Masatsune TAMURA, Masahiro MORITA
  • Publication number: 20170162187
    Abstract: According to an embodiment, a voice processing device includes an interface system, a determining processor, and a predicting processor. The interface system configured to receive neutral voice data representing audio in a neutral voice of a user. The determining processor configured to determine a predictive parameter based at least in part on the neutral voice data. The predicting processor configured to predict a voice conversion model for converting the neutral voice of the speaker to a target voice using at least the predictive parameter.
    Type: Application
    Filed: February 15, 2017
    Publication date: June 8, 2017
    Inventors: Yamato OHTANI, Yu NASU, Masatsune TAMURA, Masahiro MORITA
  • Publication number: 20160300564
    Abstract: According to an embodiment, a text-to-speech device includes a context acquirer, an acoustic model parameter acquirer, a conversion parameter acquirer, a converter, and a waveform generator. The context acquirer is configured to acquire a context sequence affecting fluctuations in voice. The acoustic model parameter acquirer is configured to acquire an acoustic model parameter sequence that corresponds to the context sequence and represents an acoustic model in a standard speaking style of a target speaker. The conversion parameter acquirer is configured to acquire a conversion parameter sequence corresponding to the context sequence to convert an acoustic model parameter in the standard speaking style into one in a different speaking style. The converter is configured to convert the acoustic model parameter sequence using the conversion parameter sequence. The waveform generator is configured to generate a voice signal based on the acoustic model parameter sequence acquired after conversion.
    Type: Application
    Filed: June 17, 2016
    Publication date: October 13, 2016
    Applicant: KABUSHIKI KAISHA TOSHIBA
    Inventors: Yu NASU, Masatsune Tamura, Ryo Morinaka, Masahiro Morita