Patents by Inventor Hirokazu Kameoka

Hirokazu Kameoka has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11935553
    Abstract: Enables stable, fast learning of a model that outputs embedding vectors used to find the set of time-frequency points at which the same sound source is dominant. Given the spectrogram of a signal formed by a plurality of sound sources, the parameters of a convolutional neural network (CNN) are trained so that the embedding vectors it outputs for time-frequency points dominated by the same source are similar to one another.
    Type: Grant
    Filed: February 22, 2019
    Date of Patent: March 19, 2024
    Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Hirokazu Kameoka, Li Li
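The learning criterion in this abstract — making embedding vectors similar for time-frequency points at which the same source is dominant — resembles an affinity-based clustering objective. The sketch below is an illustrative reading, not the patented method; the array shapes, names, and the specific loss form are assumptions.

```python
import numpy as np

def deep_clustering_loss(V, Y):
    """Affinity-based loss ||V V^T - Y Y^T||_F^2, expanded so the large
    (N x N) affinity matrices are never formed explicitly.
    V: (N, D) unit-norm embedding vectors for N time-frequency points.
    Y: (N, C) one-hot indicators of the dominant source at each point."""
    a = np.sum(np.dot(V.T, V) ** 2)   # ||V^T V||_F^2
    b = np.sum(np.dot(V.T, Y) ** 2)   # ||V^T Y||_F^2
    c = np.sum(np.dot(Y.T, Y) ** 2)   # ||Y^T Y||_F^2
    return a - 2.0 * b + c
```

The loss is zero exactly when the embedding affinities match the ideal same-source affinities, so minimizing it pulls embeddings of same-source points together.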
  • Patent number: 11900957
    Abstract: Enables conversion to a voice having a desired attribute.
    Type: Grant
    Filed: June 13, 2019
    Date of Patent: February 13, 2024
    Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventor: Hirokazu Kameoka
  • Patent number: 11869486
    Abstract: Enables conversion to a voice having a desired attribute. A learning unit trains a converter to minimize the value of the converter's learning criterion, a voice identifier to minimize the value of the voice identifier's learning criterion, and an attribute identifier to minimize the value of the attribute identifier's learning criterion.
    Type: Grant
    Filed: August 13, 2019
    Date of Patent: January 9, 2024
    Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Hirokazu Kameoka, Takuhiro Kaneko
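The three learning criteria named in the abstract (for the converter, the voice identifier, and the attribute identifier) are not spelled out in this listing. The sketch below shows one plausible reading — a binary cross-entropy adversarial criterion plus an attribute classification criterion — with all function names, signatures, and weightings assumed for illustration.

```python
import numpy as np

def voice_identifier_loss(d_real, d_fake):
    """Criterion of the voice identifier: score real speech high and
    converted speech low (binary cross-entropy on the two batches)."""
    return (-np.mean(np.log(d_real + 1e-12))
            - np.mean(np.log(1.0 - d_fake + 1e-12)))

def attribute_identifier_loss(probs, target):
    """Criterion of the attribute identifier: cross-entropy of the
    predicted attribute distribution against the target attribute label."""
    return -np.mean(np.log(probs[np.arange(len(target)), target] + 1e-12))

def converter_loss(d_fake, probs_fake, target, lam=1.0):
    """Criterion of the converter: fool the voice identifier while making
    the converted voice carry the desired attribute (lam is assumed)."""
    return (-np.mean(np.log(d_fake + 1e-12))
            + lam * attribute_identifier_loss(probs_fake, target))
```

Each network is then updated to minimize its own criterion, as the abstract describes.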
  • Publication number: 20230419977
    Abstract: A voice signal conversion model learning device including: a data-for-learning acquisition unit that acquires learning input data, which is an input voice signal; a conversion learning model execution unit that executes a conversion learning model converting the learning input data into learning-stage conversion destination data; and an update unit that updates the conversion learning model by learning. A probability density function on a vector space representing series of voice feature amounts — the function representing the distribution of the feature-amount series of a target voice signal, i.e. a voice signal having a predetermined attribute — is defined as the target feature-amount distribution function; the point in that vector space representing the feature-amount series of the learning input data is defined as the initial value point; and a function having a point x in the vector space …
    Type: Application
    Filed: November 10, 2020
    Publication date: December 28, 2023
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventor: Hirokazu KAMEOKA
  • Publication number: 20230386489
    Abstract: A voice signal conversion model learning device comprising: a learning data acquisition unit that acquires learning input data, which is an input voice signal; and a learning-stage conversion unit that executes a conversion learning model, a machine learning model whose learning-stage conversion processing converts the learning input data into learning-stage conversion destination data, the voice signal of the conversion destination. The learning-stage conversion processing includes local feature quantity acquisition processing, which acquires a feature quantity for each learning input-side subset (a subset of the processing-target input data) from the processing-target data; the conversion learning model further includes adjustment parameter value acquisition processing, which acquires an adjustment parameter value, a value of a parameter for adjusting a statistical value of a …
    Type: Application
    Filed: October 23, 2020
    Publication date: November 30, 2023
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Takuhiro KANEKO, Hirokazu KAMEOKA, Ko TANAKA, Nobukatsu HOJO
  • Patent number: 11798579
    Abstract: A parameter contained in the fundamental frequency (F0) pattern of a voice can be estimated from that pattern with high accuracy, and the F0 pattern can be reconstructed from the parameter. Using parallel data of F0 patterns in voice signals and the parameters they contain, a learning unit 30 trains a deep generative model consisting of an encoder, which treats the parameter as a latent variable and estimates it from the F0 pattern, and a decoder, which reconstructs the F0 pattern from the latent variable.
    Type: Grant
    Filed: February 19, 2019
    Date of Patent: October 24, 2023
    Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Ko Tanaka, Hirokazu Kameoka
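An encoder/decoder pair that treats the F0-pattern parameter as a latent variable can be illustrated with a minimal VAE-style forward pass. This is a toy linear sketch under assumed shapes and weight matrices, not the patented deep generative model.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(f0, W_mu, W_lv):
    # Encoder: estimate the latent variable (the F0-pattern parameter)
    # as a Gaussian with mean and log-variance predicted from the contour.
    return f0 @ W_mu, f0 @ W_lv

def decode(z, W_dec):
    # Decoder: reconstruct the F0 contour from the latent parameter.
    return z @ W_dec

def elbo(f0, W_mu, W_lv, W_dec):
    mu, lv = encode(f0, W_mu, W_lv)
    z = mu + np.exp(0.5 * lv) * rng.normal(size=mu.shape)  # reparameterization trick
    recon = decode(z, W_dec)
    rec = -np.mean((f0 - recon) ** 2)                      # reconstruction term
    kl = -0.5 * np.mean(1.0 + lv - mu ** 2 - np.exp(lv))   # KL to a unit Gaussian prior
    return rec - kl
```

Training would maximize this evidence lower bound over the parallel data of F0 patterns and their parameters.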
  • Publication number: 20230274751
    Abstract: A voice signal conversion model learning device includes: a generation unit that generates a conversion destination voice signal on the basis of an input voice signal, conversion source attribute information indicating an attribute of the input voice represented by that signal, and conversion destination attribute information indicating an attribute of the voice represented by the conversion destination voice signal; and an identification unit that estimates, on the basis of the conversion source and conversion destination attribute information, whether a voice signal being processed represents a vocal sound actually uttered by a person. The conversion destination voice signal is input to the …
    Type: Application
    Filed: July 27, 2020
    Publication date: August 31, 2023
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Takuhiro KANEKO, Hirokazu KAMEOKA, Ko TANAKA, Nobukatsu HOJO
  • Publication number: 20230260539
    Abstract: A voice signal conversion model learning device including: a generation unit that generates a conversion destination voice signal on the basis of an input voice signal and conversion destination attribute information indicating an attribute of the voice represented by the conversion destination voice signal; and an identification unit that estimates, on the basis of the conversion destination voice signal, whether a voice signal represents a voice actually uttered by a person. The generation unit executes characteristic processing, neural-network-based processing of information indicating characteristics of the input voice signal, and processing that converts the result of the characteristic processing with a conversion mapping updated in accordance with an estimation result of the identification …
    Type: Application
    Filed: July 27, 2020
    Publication date: August 17, 2023
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Takuhiro KANEKO, Hirokazu KAMEOKA, Ko TANAKA, Nobukatsu HOJO
  • Publication number: 20230138232
    Abstract: A conversion learning device includes: a source encoding unit that uses a first machine learning model to convert a feature amount sequence of the source domain, a characteristic of the conversion-source content data, into a first internal representation vector sequence, the matrix formed by arranging the internal representation vectors at the individual locations of the source feature sequence; a target encoding unit that uses a second machine learning model to convert a feature amount sequence of the target domain, a characteristic of the conversion-target content data, into a second internal representation vector sequence, formed analogously; and an attention matrix calculation unit that uses the first and second internal representation vector sequences to calculate an attention matrix, a matrix mapping …
    Type: Application
    Filed: January 30, 2020
    Publication date: May 4, 2023
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Hirokazu KAMEOKA, Ko TANAKA, Takuhiro KANEKO, Nobukatsu HOJO
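An attention matrix computed from two internal representation vector sequences, as this abstract describes, is commonly built from scaled dot-product similarities. The sketch below assumes that construction for illustration; the patent's exact mapping may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_matrix(src_repr, tgt_repr):
    """src_repr: (N, d) internal representation vectors of the source sequence;
    tgt_repr: (M, d) internal representation vectors of the target sequence.
    Returns an (M, N) matrix whose row m is a distribution over source
    locations, i.e. a soft alignment from source to target positions."""
    d = src_repr.shape[1]
    scores = tgt_repr @ src_repr.T / np.sqrt(d)   # scaled dot-product similarity
    return softmax(scores, axis=1)
```

Multiplying this matrix by the source sequence then gathers, for each target location, a weighted mixture of the source representations.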
  • Publication number: 20220335944
    Abstract: A voice conversion device is provided with a linguistic information extraction unit that extracts linguistic information corresponding to the utterance content of a conversion-source voice signal, an appearance feature extraction unit that extracts appearance features, which express features of a person's facial appearance, from a captured image of the person, and a converted voice generation unit that generates a converted voice on the basis of the linguistic information and the appearance features.
    Type: Application
    Filed: September 4, 2020
    Publication date: October 20, 2022
    Inventors: Hirokazu Kameoka, Ko Tanaka, Yasunori Oishi, Takuhiro Kaneko, Aaron Puche Valero
  • Patent number: 11450332
    Abstract: Enables conversion to a voice having a desired attribute. On the basis of parallel data of a sound feature vector series and a latent vector series in a conversion-source voice signal, together with an attribute label indicating the attribute of that signal, an encoder is trained to estimate the latent vector series from a sound feature vector series and an attribute label, and a decoder is trained to reconstruct the sound feature vector series from the latent vector series and the attribute label.
    Type: Grant
    Filed: February 20, 2019
    Date of Patent: September 20, 2022
    Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Hirokazu Kameoka, Takuhiro Kaneko, Ko Tanaka, Nobukatsu Hojo
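Conditioning an encoder and decoder on an attribute label, as this abstract describes, is often implemented by concatenating a one-hot label to the input of each network. The linear sketch below illustrates that mechanism only; all shapes, weight matrices, and function names are hypothetical.

```python
import numpy as np

def append_label(x, label, n_attr):
    """Concatenate a one-hot attribute label to every frame of a feature series."""
    onehot = np.zeros((x.shape[0], n_attr))
    onehot[:, label] = 1.0
    return np.concatenate([x, onehot], axis=1)

def encode(x, label, n_attr, W_enc):
    # Encoder: latent vector series from sound features plus attribute label.
    return append_label(x, label, n_attr) @ W_enc

def decode(z, label, n_attr, W_dec):
    # Decoder: reconstruct sound features from latents plus attribute label.
    return append_label(z, label, n_attr) @ W_dec

def convert(x, src_label, tgt_label, n_attr, W_enc, W_dec):
    # Conversion: encode with the source label, decode with the target label.
    return decode(encode(x, src_label, n_attr, W_enc), tgt_label, n_attr, W_dec)
```

Swapping the label at decode time is what steers the output toward the desired attribute.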
  • Patent number: 11393452
    Abstract: The present invention relates to methods of converting speech into other speech that sounds more natural. A target conversion function and a target identifier are learned under an optimality condition in which the two compete with each other: the target conversion function converts source speech into target speech, and the target identifier judges whether the converted target speech follows the same distribution as actual target speech. Likewise, a source conversion function and a source identifier are learned under an optimality condition in which they compete: the source conversion function converts target speech into source speech, and the source identifier judges whether the converted source speech follows the same distribution as actual source speech.
    Type: Grant
    Filed: February 20, 2019
    Date of Patent: July 19, 2022
    Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Ko Tanaka, Takuhiro Kaneko, Hirokazu Kameoka, Nobukatsu Hojo
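The competing pairs described above (source-to-target and target-to-source conversion functions, each with its own identifier) are structurally similar to cycle-consistent adversarial learning. The cycle-consistency term that encourages the two conversion functions to be approximate inverses can be sketched as below; this is an illustrative reading, not the claimed method.

```python
import numpy as np

def cycle_consistency_loss(x, y, G, F):
    """G converts source features to target features; F converts target to
    source. The L1 round-trip penalty is zero only when each conversion
    undoes the other, which lets the pair be trained without parallel
    utterances of the two speakers."""
    return np.mean(np.abs(F(G(x)) - x)) + np.mean(np.abs(G(F(y)) - y))
```

In full training this term would be added to the two adversarial criteria described in the abstract.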
  • Publication number: 20220215851
    Abstract: The present invention realizes conversion into data having a desired attribute. A training unit 32 trains a converter so as to minimize the value of a learning criterion for the converter, and trains an integrated discriminator so as to minimize the value of a learning criterion for the integrated discriminator.
    Type: Application
    Filed: January 31, 2020
    Publication date: July 7, 2022
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Hirokazu KAMEOKA, Takuhiro KANEKO, Ko TANAKA
  • Publication number: 20220156552
    Abstract: Enables accurate conversion to data of a conversion-target domain. A training unit 32 trains a forward generator, an inverse generator, a conversion-target discriminator, and a conversion-source discriminator to optimize an objective function.
    Type: Application
    Filed: February 26, 2020
    Publication date: May 19, 2022
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Takuhiro KANEKO, Hirokazu KAMEOKA, Ko TANAKA, Nobukatsu HOJO
  • Publication number: 20220122591
    Abstract: Enables conversion to a voice having a desired attribute. A learning unit trains a converter to minimize the value of the converter's learning criterion, a voice identifier to minimize the value of the voice identifier's learning criterion, and an attribute identifier to minimize the value of the attribute identifier's learning criterion.
    Type: Application
    Filed: August 13, 2019
    Publication date: April 21, 2022
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Hirokazu KAMEOKA, Takuhiro KANEKO
  • Publication number: 20210118460
    Abstract: Enables conversion to a voice having a desired attribute.
    Type: Application
    Filed: June 13, 2019
    Publication date: April 22, 2021
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventor: Hirokazu KAMEOKA
  • Publication number: 20200395036
    Abstract: Enables stable, fast learning of a model that outputs embedding vectors used to find the set of time-frequency points at which the same sound source is dominant. Given the spectrogram of a signal formed by a plurality of sound sources, the parameters of a convolutional neural network (CNN) are trained so that the embedding vectors it outputs for time-frequency points dominated by the same source are similar to one another.
    Type: Application
    Filed: February 22, 2019
    Publication date: December 17, 2020
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Hirokazu KAMEOKA, Li LI
  • Publication number: 20200394996
    Abstract: Enables conversion to a voice with more natural audio quality.
    Type: Application
    Filed: February 20, 2019
    Publication date: December 17, 2020
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Ko TANAKA, Takuhiro KANEKO, Hirokazu KAMEOKA, Nobukatsu HOJO
  • Publication number: 20200395028
    Abstract: Enables conversion to a voice having a desired attribute. On the basis of parallel data of a sound feature vector series and a latent vector series in a conversion-source voice signal, together with an attribute label indicating the attribute of that signal, an encoder is trained to estimate the latent vector series from a sound feature vector series and an attribute label, and a decoder is trained to reconstruct the sound feature vector series from the latent vector series and the attribute label.
    Type: Application
    Filed: February 20, 2019
    Publication date: December 17, 2020
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Hirokazu KAMEOKA, Takuhiro KANEKO, Ko TANAKA, Nobukatsu HOJO
  • Publication number: 20200395041
    Abstract: A parameter contained in the fundamental frequency (F0) pattern of a voice can be estimated from that pattern with high accuracy, and the F0 pattern can be reconstructed from the parameter. Using parallel data of F0 patterns in voice signals and the parameters they contain, a learning unit 30 trains a deep generative model consisting of an encoder, which treats the parameter as a latent variable and estimates it from the F0 pattern, and a decoder, which reconstructs the F0 pattern from the latent variable.
    Type: Application
    Filed: February 19, 2019
    Publication date: December 17, 2020
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Ko TANAKA, Hirokazu KAMEOKA