Patents by Inventor Yusuke IJIMA

Yusuke IJIMA has filed for patents to protect the following inventions. This listing includes both pending patent applications and patents already granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240347039
    Abstract: A speech synthesis apparatus according to the present disclosure includes a memory and a processor coupled to the memory. The processor is configured to: obtain utterance information on subjects to be uttered, wherein the subjects to be uttered are texts contained in data on a book; obtain image information on images that are contained in the data on the book; obtain speech data corresponding to the subjects to be uttered; and generate, based on the obtained utterance information, the obtained image information, and the obtained speech data, a speech synthesis model for reading out a text associated with an image.
    Type: Application
    Filed: August 18, 2022
    Publication date: October 17, 2024
    Applicants: NIPPON TELEGRAPH AND TELEPHONE CORPORATION, The University of Tokyo
    Inventors: Yusuke IJIMA, Tomoki KORIYAMA, Shinnosuke TAKAMICHI
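As a rough illustration of the architecture sketched in the abstract above, the snippet below shows one way a speech synthesis (acoustic) model could be conditioned on both a text encoding and an image embedding. The module, tensor shapes, and fusion-by-addition scheme are assumptions for illustration, not details taken from the application.

```python
# Minimal sketch of a text+image conditioned acoustic model, assuming the
# claimed "speech synthesis model" can be approximated by a network that maps
# a text encoding plus an image embedding to per-frame acoustic features.
import torch
import torch.nn as nn

class TextImageAcousticModel(nn.Module):
    def __init__(self, text_dim=256, image_dim=512, hidden_dim=256, n_mels=80):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, hidden_dim)    # utterance information
        self.image_proj = nn.Linear(image_dim, hidden_dim)  # image information
        self.decoder = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, n_mels)            # per-frame mel spectrogram

    def forward(self, text_enc, image_emb):
        # text_enc: (batch, frames, text_dim), image_emb: (batch, image_dim)
        cond = self.text_proj(text_enc) + self.image_proj(image_emb).unsqueeze(1)
        h, _ = self.decoder(cond)
        return self.out(h)

model = TextImageAcousticModel()
mel = model(torch.randn(2, 100, 256), torch.randn(2, 512))  # -> (2, 100, 80)
```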
  • Publication number: 20240312465
    Abstract: A speaker embedding apparatus includes processing circuitry configured to accept input of voice data, generate utterance unit segmentation information indicating a duration length for each utterance of a speaker in the input voice data, and use a duration length for each utterance indicated in the generated utterance unit segmentation information as training data and train a speaker identification model for outputting an identification result of a speaker when a duration length for each utterance of the speaker is input.
    Type: Application
    Filed: February 2, 2021
    Publication date: September 19, 2024
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Yusuke IJIMA, Kenichi FUJITA, Atsushi ANDO
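The abstract above trains a speaker identification model on per-utterance duration lengths. The toy sketch below assumes the segmentation output can be summarized as a fixed-length vector of durations per training sample; the classifier size and training loop are illustrative only.

```python
# Minimal sketch, assuming "a duration length for each utterance" can be
# packed into a fixed-length vector and the speaker identification model is
# a small classifier over it. All dimensions are placeholders.
import torch
import torch.nn as nn

N_UTTERANCES, N_SPEAKERS = 10, 4
model = nn.Sequential(nn.Linear(N_UTTERANCES, 64), nn.ReLU(), nn.Linear(64, N_SPEAKERS))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

durations = torch.rand(32, N_UTTERANCES)           # stand-in for segmentation output
speaker_ids = torch.randint(0, N_SPEAKERS, (32,))  # speaker labels for training

for _ in range(5):                                 # toy training loop
    optimizer.zero_grad()
    loss = loss_fn(model(durations), speaker_ids)
    loss.backward()
    optimizer.step()
```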
  • Patent number: 11915688
    Abstract: An estimation device (100), which is an estimation device that estimates a duration of a speech section, includes: a representation conversion unit (11) that performs representation conversion of a plurality of words included in learning utterance information to a plurality of pieces of numeric representation data; an estimation data generation unit (12) that generates estimation data by using a plurality of pieces of the learning utterance information and the plurality of pieces of numeric representation data; an estimation model learning unit (13) that learns an estimation model by using the estimation data and the durations of the plurality of words; and an estimation unit (20) that estimates the duration of a predetermined speech section based on utterance information of a user by using the estimation model.
    Type: Grant
    Filed: January 30, 2020
    Date of Patent: February 27, 2024
    Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventor: Yusuke Ijima
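A minimal sketch of the estimation idea described above, assuming the "representation conversion" can be approximated by a learned word embedding and the estimation model by a small regressor over it; the vocabulary size, dimensions, and data below are hypothetical.

```python
# Sketch: words -> numeric representations (embedding) -> per-word duration
# regression, then a speech-section duration as the sum of word durations.
import torch
import torch.nn as nn

VOCAB = 1000
embed = nn.Embedding(VOCAB, 64)                 # "representation conversion"
regressor = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1))
params = list(embed.parameters()) + list(regressor.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)

word_ids = torch.randint(0, VOCAB, (256,))      # words from learning utterances
durations = torch.rand(256, 1)                  # observed word durations (seconds)

for _ in range(5):                              # "estimation model learning"
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(regressor(embed(word_ids)), durations)
    loss.backward()
    optimizer.step()

# Estimation: sum predicted word durations over a speech section.
section_duration = regressor(embed(word_ids[:10])).sum()
```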
  • Publication number: 20240013239
    Abstract: An acquisition unit acquires a voice feature quantity vector representing a feature of input voice data, an emotion expression vector representing a customer's emotion corresponding to the voice data, and a purchase intention vector representing a purchase intention of the customer corresponding to the voice data. A learning unit generates, by learning, a purchase intention estimation model for estimating a purchase intention of a customer corresponding to the voice data by using the voice feature quantity vector, the emotion expression vector, and the purchase intention vector.
    Type: Application
    Filed: November 26, 2020
    Publication date: January 11, 2024
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Mizuki NAGANO, Yusuke IJIMA
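One plausible reading of the learning step above is a regressor from the concatenated voice feature quantity and emotion expression vectors to the purchase intention vector. The sketch below follows that reading; all dimensions and the network shape are assumptions.

```python
# Minimal sketch of a purchase intention estimation model trained on
# (voice feature, emotion expression) inputs and purchase intention targets.
import torch
import torch.nn as nn

VOICE_DIM, EMOTION_DIM, INTENT_DIM = 128, 8, 3
model = nn.Sequential(
    nn.Linear(VOICE_DIM + EMOTION_DIM, 64), nn.ReLU(), nn.Linear(64, INTENT_DIM)
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

voice = torch.randn(16, VOICE_DIM)        # voice feature quantity vectors
emotion = torch.randn(16, EMOTION_DIM)    # emotion expression vectors
intent = torch.randn(16, INTENT_DIM)      # purchase intention vectors (targets)

for _ in range(5):
    optimizer.zero_grad()
    pred = model(torch.cat([voice, emotion], dim=1))
    loss = nn.functional.mse_loss(pred, intent)
    loss.backward()
    optimizer.step()
```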
  • Publication number: 20230252983
    Abstract: An input unit receives a morpheme array and parts-of-speech of morphemes of the morpheme array. An ambiguous word candidate acquisition unit (26) acquires, for each morpheme of the morpheme array, based on a notation and a part-of-speech of the morpheme, reading candidates of the morpheme from reading candidates of the morpheme defined in advance for each combination of a notation and a part-of-speech of the morpheme. A disambiguation unit (30) determines a reading of the morpheme from the acquired reading candidates of the morpheme by using a disambiguation rule in which a reading of the morpheme is defined in advance correspondingly to appearance positions of other morphemes and notations, parts-of-speech, or character types of the other morphemes.
    Type: Application
    Filed: May 8, 2019
    Publication date: August 10, 2023
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Nozomi KOBAYASHI, Yusuke IJIMA, Junji TOMITA
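The dictionary-plus-rule flow described above can be illustrated with plain data structures. In the sketch below, the candidate table and the single context rule are invented examples; the patent's actual candidates and disambiguation rules are defined per (notation, part-of-speech) combination and over the surrounding morphemes.

```python
# Minimal sketch: look up reading candidates by (notation, POS), then apply a
# context rule over neighboring morphemes to pick one reading.
READING_CANDIDATES = {("方", "noun"): ["ホウ", "カタ"]}   # (notation, POS) -> readings

def rule_based_reading(candidates, index, morphemes):
    # Example rule: if the previous morpheme is a verb, prefer "カタ".
    if index > 0 and morphemes[index - 1][1] == "verb" and "カタ" in candidates:
        return "カタ"
    return candidates[0]                                   # fall back to first candidate

def disambiguate(morphemes):
    readings = []
    for i, (notation, pos) in enumerate(morphemes):
        candidates = READING_CANDIDATES.get((notation, pos), [notation])
        readings.append(rule_based_reading(candidates, i, morphemes))
    return readings

print(disambiguate([("読む", "verb"), ("方", "noun")]))     # ['読む', 'カタ']
```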
  • Patent number: 11568761
    Abstract: The present invention provides a pronunciation error detection apparatus capable of following a text without the need for a correct sentence even when erroneous recognition such as a reading error occurs.
    Type: Grant
    Filed: September 13, 2018
    Date of Patent: January 31, 2023
    Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Satoshi Kobashikawa, Ryo Masumura, Hosana Kamiyama, Yusuke Ijima, Yushi Aono
  • Publication number: 20230005468
    Abstract: A pause estimation model learning apparatus includes: a morphological analysis unit configured to perform morphological analysis on training text data to provide M types of information, M being an integer that is equal to or larger than 2; a feature selection unit configured to combine N pieces of information, among the M pieces of information, to be an input feature when a predetermined certain condition is satisfied, and select predetermined one of the N pieces of information to be the input feature when the certain condition is not satisfied, N being an integer that is equal to or larger than 2 and equal to or smaller than M; and a learning unit configured to learn a pause estimation model by using the input feature selected by the feature selection unit and a pause correct label.
    Type: Application
    Filed: November 26, 2019
    Publication date: January 5, 2023
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Mizuki NAGANO, Yusuke IJIMA, Nozomi KOBAYASHI
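The feature selection step above (combine N of the M morphological features when a condition holds, otherwise use one predetermined feature) might look like the following. The condition flag, feature names, and the logistic-regression pause model are placeholders, not the claimed implementation.

```python
# Minimal sketch of conditional feature selection followed by training a
# pause estimation model on the selected input features.
import numpy as np
from sklearn.linear_model import LogisticRegression

def select_input_feature(features, combine, keys=("pos", "accent")):
    if combine:                                   # "certain condition is satisfied"
        return np.concatenate([features[k] for k in keys])
    return features["pos"]                        # predetermined single feature

X = np.stack([
    select_input_feature({"pos": np.random.rand(8), "accent": np.random.rand(4)}, True)
    for _ in range(100)
])
y = np.arange(100) % 2                            # pause correct labels (pause / no pause)

pause_model = LogisticRegression().fit(X, y)      # "pause estimation model"
```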
  • Patent number: 11545135
    Abstract: An acoustic model learning device is provided for obtaining an acoustic model used to synthesize voice signals with intonation. The device includes a first learning unit that learns the acoustic model to estimate synthetic acoustic feature values using voice and speaker determination models based on acoustic feature values of speakers, language feature values corresponding to the acoustic feature values and speaker data items, a second learning unit that learns the voice determination model to determine whether the synthetic acoustic feature value is a predetermined acoustic feature value or not based on the acoustic feature values and the synthetic acoustic feature values, and a third learning unit that learns the speaker determination model to determine whether the speaker of the synthetic acoustic feature value is a predetermined speaker or not based on the acoustic feature values and the synthetic acoustic feature values.
    Type: Grant
    Filed: September 25, 2019
    Date of Patent: January 3, 2023
    Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Hiroki Kanagawa, Yusuke Ijima
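The three learning units above read like an adversarial training setup: an acoustic model trained against a voice (natural vs. synthetic) discriminator and a speaker discriminator. The compact sketch below assumes that reading; network sizes, the losses, and their weighting are illustrative.

```python
# Minimal adversarial-training sketch: one acoustic model (generator) and two
# determination models (discriminators), one per-sample real/synthetic, one per speaker.
import torch
import torch.nn as nn

LING_DIM, SPK_DIM, AC_DIM, N_SPK = 64, 8, 80, 4
acoustic_model = nn.Linear(LING_DIM + SPK_DIM, AC_DIM)  # first learning unit's target
voice_disc = nn.Sequential(nn.Linear(AC_DIM, 32), nn.ReLU(), nn.Linear(32, 1))
speaker_disc = nn.Sequential(nn.Linear(AC_DIM, 32), nn.ReLU(), nn.Linear(32, N_SPK))

opt_g = torch.optim.Adam(acoustic_model.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(list(voice_disc.parameters()) + list(speaker_disc.parameters()), lr=1e-4)
bce, ce, mse = nn.BCEWithLogitsLoss(), nn.CrossEntropyLoss(), nn.MSELoss()

ling = torch.randn(16, LING_DIM)                   # language feature values
spk = torch.randn(16, SPK_DIM)                     # speaker data items
real = torch.randn(16, AC_DIM)                     # natural acoustic feature values
spk_id = torch.randint(0, N_SPK, (16,))

# Discriminator step (second and third learning units).
fake = acoustic_model(torch.cat([ling, spk], 1)).detach()
d_loss = (bce(voice_disc(real), torch.ones(16, 1))
          + bce(voice_disc(fake), torch.zeros(16, 1))
          + ce(speaker_disc(real), spk_id))
opt_d.zero_grad()
d_loss.backward()
opt_d.step()

# Generator step (first learning unit): match targets and fool both discriminators.
fake = acoustic_model(torch.cat([ling, spk], 1))
g_loss = (mse(fake, real)
          + bce(voice_disc(fake), torch.ones(16, 1))
          + ce(speaker_disc(fake), spk_id))
opt_g.zero_grad()
g_loss.backward()
opt_g.step()
```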
  • Publication number: 20220406289
    Abstract: A detection device includes a labeling acoustic feature calculation unit configured to calculate a labeling acoustic feature from voice data, a time information acquisition unit configured to acquire a label with time information corresponding to the voice data from a label with no time information corresponding to the voice data and the labeling acoustic feature through a use of a labeling acoustic model configured to receive, as inputs, a label with no time information and a labeling acoustic feature and output a label with time information, an acoustic feature prediction unit configured to predict an acoustic feature corresponding to the label with time information and acquire a predicted value through a use of an acoustic model configured to receive, as an input, a label with time information and output an acoustic feature, an acoustic feature calculation unit configured to calculate an acoustic feature from the voice data, a difference calculation unit configured to determine an acoustic difference between …
    Type: Application
    Filed: November 25, 2019
    Publication date: December 22, 2022
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Hiroki KANAGAWA, Yusuke IJIMA
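Since the abstract above is cut off, the sketch below only illustrates the data flow among the named units, assuming the difference is taken between the predicted and the directly calculated acoustic features; the alignment and prediction models are stubbed out with placeholder functions.

```python
# Pipeline-only sketch: each function stands in for one unit named in the
# abstract; the real models behind them are not reproduced here.
import numpy as np

def labeling_acoustic_feature(voice):            # labeling acoustic feature calculation unit
    return voice.reshape(-1, 10).mean(axis=1)

def align(labels, labeling_feature):             # time information acquisition unit (stub)
    n = len(labeling_feature)
    return [(lab, i * n // len(labels), (i + 1) * n // len(labels))
            for i, lab in enumerate(labels)]

def predict_acoustic(labels_with_time, n_frames):  # acoustic feature prediction unit (stub)
    return np.zeros((n_frames, 1))

def acoustic_feature(voice):                     # acoustic feature calculation unit
    return voice.reshape(-1, 10).max(axis=1, keepdims=True)

voice = np.random.randn(1000)
aligned = align(["a", "i"], labeling_acoustic_feature(voice))
predicted = predict_acoustic(aligned, n_frames=100)
observed = acoustic_feature(voice)
difference = np.abs(predicted - observed)        # difference calculation unit
```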
  • Publication number: 20220139381
    Abstract: An estimation device (100), which is an estimation device that estimates a duration of a speech section, includes: a representation conversion unit (11) that performs representation conversion of a plurality of words included in learning utterance information to a plurality of pieces of numeric representation data; an estimation data generation unit (12) that generates estimation data by using a plurality of pieces of the learning utterance information and the plurality of pieces of numeric representation data; an estimation model learning unit (13) that learns an estimation model by using the estimation data and the durations of the plurality of words; and an estimation unit (20) that estimates the duration of a predetermined speech section based on utterance information of a user by using the estimation model.
    Type: Application
    Filed: January 30, 2020
    Publication date: May 5, 2022
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventor: Yusuke IJIMA
  • Publication number: 20220051655
    Abstract: An acoustic model learning device is provided for obtaining an acoustic model used to synthesize voice signals with intonation. The device includes a first learning unit that learns the acoustic model to estimate synthetic acoustic feature values using voice and speaker determination models based on acoustic feature values of speakers, language feature values corresponding to the acoustic feature values and speaker data items, a second learning unit that learns the voice determination model to determine whether the synthetic acoustic feature value is a predetermined acoustic feature value or not based on the acoustic feature values and the synthetic acoustic feature values, and a third learning unit that learns the speaker determination model to determine whether the speaker of the synthetic acoustic feature value is a predetermined speaker or not based on the acoustic feature values and the synthetic acoustic feature values.
    Type: Application
    Filed: September 25, 2019
    Publication date: February 17, 2022
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Hiroki KANAGAWA, Yusuke IJIMA
  • Publication number: 20200219413
    Abstract: The present invention provides a pronunciation error detection apparatus capable of following a text without the need for a correct sentence even when erroneous recognition such as a reading error occurs.
    Type: Application
    Filed: September 13, 2018
    Publication date: July 9, 2020
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Satoshi KOBASHIKAWA, Ryo MASUMURA, Hosana KAMIYAMA, Yusuke IJIMA, Yushi AONO
  • Publication number: 20190362703
    Abstract: Provided is a word vectorization device that converts a word to a word vector considering the acoustic feature of the word. A word vectorization model learning device comprises a learning part for learning a word vectorization model by using a vector w_{L,s}(t) indicating a word y_{L,s}(t) included in learning text data, and an acoustic feature amount af_{L,s}(t) that is an acoustic feature amount of speech data corresponding to the learning text data and that corresponds to the word y_{L,s}(t). The word vectorization model includes a neural network that receives a vector indicating a word as an input and outputs the acoustic feature amount of speech data corresponding to the word, and the word vectorization model is a model that uses an output value from any intermediate layer as a word vector.
    Type: Application
    Filed: February 14, 2018
    Publication date: November 28, 2019
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Yusuke IJIMA, Nobukatsu HOJO, Taichi ASAMI
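A minimal sketch of the word vectorization idea described above: a network is trained to map a word to the acoustic feature amount of its speech, and an intermediate layer's output is then reused as the word vector. Layer sizes, the acoustic feature dimension, and the training data are assumptions.

```python
# Sketch: train word -> acoustic-feature prediction, then take the
# intermediate (embedding) layer's output as the word vector.
import torch
import torch.nn as nn

VOCAB, HIDDEN, ACOUSTIC_DIM = 1000, 64, 80
embedding = nn.Embedding(VOCAB, HIDDEN)           # intermediate layer
to_acoustic = nn.Linear(HIDDEN, ACOUSTIC_DIM)     # predicts the acoustic feature amount
params = list(embedding.parameters()) + list(to_acoustic.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)

word_ids = torch.randint(0, VOCAB, (128,))        # words y_{L,s}(t) from learning text
acoustic = torch.randn(128, ACOUSTIC_DIM)         # corresponding af_{L,s}(t)

for _ in range(5):                                # learn to predict acoustic features
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(to_acoustic(embedding(word_ids)), acoustic)
    loss.backward()
    optimizer.step()

word_vector = embedding(word_ids[:1])             # intermediate-layer output as the word vector
```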