Patents by Inventor Yusuke IJIMA
Yusuke IJIMA has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240347039
Abstract: A speech synthesis apparatus according to the present disclosure includes a memory and a processor coupled to the memory. The processor is configured to: obtain utterance information on subjects to be uttered, wherein the subjects to be uttered are texts contained in data on a book; obtain image information on images that are contained in the data on the book; obtain speech data corresponding to the subjects to be uttered; and generate, based on the obtained utterance information, the obtained image information, and the obtained speech data, a speech synthesis model for reading out a text associated with an image.
Type: Application
Filed: August 18, 2022
Publication date: October 17, 2024
Applicants: NIPPON TELEGRAPH AND TELEPHONE CORPORATION, The University of Tokyo
Inventors: Yusuke IJIMA, Tomoki KORIYAMA, Shinnosuke TAKAMICHI
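The data-assembly step the abstract describes (pairing book text with the images and speech it appears alongside) could be sketched as below. All names, the page-level data layout, and the pairing scheme are illustrative assumptions, not the patented implementation:

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class TrainingExample:
    """One (text, image, speech) triple assembled from book data."""
    text: str         # utterance information: a text passage from the book
    image_id: str     # image information: an image appearing with that text
    speech_path: str  # speech data corresponding to the passage

def build_training_set(pages: List[Dict]) -> List[TrainingExample]:
    """Pair each passage on a page with that page's images and speech."""
    examples = []
    for page in pages:
        for text, speech in zip(page["texts"], page["speech"]):
            for image_id in page["images"]:
                examples.append(TrainingExample(text, image_id, speech))
    return examples
```

A speech synthesis model conditioned on both text and image features would then be trained on these triples.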
-
Publication number: 20240312465
Abstract: A speaker embedding apparatus includes processing circuitry configured to accept input of voice data, generate utterance unit segmentation information indicating a duration length for each utterance of a speaker in the input voice data, and use the duration lengths indicated in the generated utterance unit segmentation information as training data to train a speaker identification model that outputs an identification result for a speaker when the duration length for each of the speaker's utterances is input.
Type: Application
Filed: February 2, 2021
Publication date: September 19, 2024
Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventors: Yusuke IJIMA, Kenichi FUJITA, Atsushi ANDO
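The idea of identifying a speaker from per-utterance duration lengths alone could be sketched with a trivial nearest-profile classifier. The mean-duration feature and all function names are assumptions for illustration; the patent's model would be learned, not hand-crafted:

```python
def mean_duration(durations):
    """Average duration length (seconds) over a list of utterances."""
    return sum(durations) / len(durations)

def train_speaker_model(speaker_durations):
    """speaker_durations: speaker id -> per-utterance duration lengths."""
    return {spk: mean_duration(d) for spk, d in speaker_durations.items()}

def identify_speaker(model, durations):
    """Return the speaker whose duration profile is nearest the input."""
    observed = mean_duration(durations)
    return min(model, key=lambda spk: abs(model[spk] - observed))
```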
-
Patent number: 11915688
Abstract: An estimation device (100) that estimates the duration of a speech section includes: a representation conversion unit (11) that converts a plurality of words included in learning utterance information into a plurality of pieces of numeric representation data; an estimation data generation unit (12) that generates estimation data by using the learning utterance information and the numeric representation data; an estimation model learning unit (13) that learns an estimation model by using the estimation data and the durations of the plurality of words; and an estimation unit (20) that estimates the duration of a predetermined speech section based on utterance information of a user by using the estimation model.
Type: Grant
Filed: January 30, 2020
Date of Patent: February 27, 2024
Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventor: Yusuke Ijima
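A minimal stand-in for the learn-then-estimate flow is a per-word average-duration model with a global-mean back-off for unseen words. This is only a sketch of the pipeline's shape; the patent learns a model over numeric word representations rather than a lookup table:

```python
from collections import defaultdict

def learn_duration_model(utterances):
    """utterances: list of (words, durations) pairs from learning data."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for words, durations in utterances:
        for word, duration in zip(words, durations):
            totals[word] += duration
            counts[word] += 1
    global_mean = sum(totals.values()) / sum(counts.values())
    per_word = {w: totals[w] / counts[w] for w in totals}
    return per_word, global_mean

def estimate_duration(model, words):
    """Estimated duration of a speech section: sum of per-word estimates."""
    per_word, global_mean = model
    return sum(per_word.get(w, global_mean) for w in words)
```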
-
Publication number: 20240013239
Abstract: An acquisition unit acquires a voice feature quantity vector representing a feature of input voice data, an emotion expression vector representing a customer's emotion corresponding to the voice data, and a purchase intention vector representing a purchase intention of the customer corresponding to the voice data. A learning unit generates, by learning, a purchase intention estimation model for estimating a purchase intention of a customer corresponding to the voice data by using the voice feature quantity vector, the emotion expression vector, and the purchase intention vector.
Type: Application
Filed: November 26, 2020
Publication date: January 11, 2024
Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventors: Mizuki NAGANO, Yusuke IJIMA
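The supervision setup described here can be sketched as assembling (input, target) pairs, with the voice and emotion vectors joined into one input feature and the purchase intention vector as the target. The concatenation scheme and all names are assumptions, not the patented method:

```python
def concat_features(voice_vec, emotion_vec):
    """Joint input: voice feature quantity vector + emotion expression vector."""
    return list(voice_vec) + list(emotion_vec)

def make_training_pairs(samples):
    """Pair each joint feature with its purchase intention vector (the target)."""
    return [(concat_features(s["voice"], s["emotion"]), s["intention"])
            for s in samples]
```

Any regression model could then be fit on the returned pairs.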
-
Publication number: 20230252983
Abstract: An input unit receives a morpheme array and the parts-of-speech of the morphemes in the array. An ambiguous word candidate acquisition unit (26) acquires, for each morpheme of the array, reading candidates based on the morpheme's notation and part-of-speech, drawn from reading candidates defined in advance for each combination of notation and part-of-speech. A disambiguation unit (30) determines the reading of the morpheme from the acquired candidates by using a disambiguation rule in which a reading is defined in advance according to the appearance positions of other morphemes and the notations, parts-of-speech, or character types of those morphemes.
Type: Application
Filed: May 8, 2019
Publication date: August 10, 2023
Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventors: Nozomi KOBAYASHI, Yusuke IJIMA, Junji TOMITA
-
Patent number: 11568761
Abstract: The present invention provides a pronunciation error detection apparatus capable of following a text without the need for a correct sentence even when erroneous recognition such as a reading error occurs.
Type: Grant
Filed: September 13, 2018
Date of Patent: January 31, 2023
Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventors: Satoshi Kobashikawa, Ryo Masumura, Hosana Kamiyama, Yusuke Ijima, Yushi Aono
-
Publication number: 20230005468
Abstract: A pause estimation model learning apparatus includes: a morphological analysis unit configured to perform morphological analysis on training text data to provide M pieces of information, M being an integer equal to or larger than 2; a feature selection unit configured to combine N pieces of information, among the M pieces, into an input feature when a predetermined certain condition is satisfied, and to select a predetermined one of the N pieces as the input feature when the condition is not satisfied, N being an integer equal to or larger than 2 and equal to or smaller than M; and a learning unit configured to learn a pause estimation model by using the input feature selected by the feature selection unit and a pause correct label.
Type: Application
Filed: November 26, 2019
Publication date: January 5, 2023
Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventors: Mizuki NAGANO, Yusuke IJIMA, Nozomi KOBAYASHI
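The conditional feature-selection step reads cleanly as a small function. Taking the first N pieces and defaulting to index 0 are illustrative assumptions; the abstract only says the N pieces and the fallback piece are predetermined:

```python
def select_input_feature(features, condition_met, n, default_index=0):
    """features: the M pieces of information from morphological analysis.
    Combine N pieces into the input feature when the certain condition
    holds; otherwise use the single predetermined piece."""
    if condition_met:
        return features[:n]          # combine N pieces as the input feature
    return [features[default_index]] # fall back to one predetermined piece
```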
-
Patent number: 11545135
Abstract: An acoustic model learning device is provided for obtaining an acoustic model used to synthesize voice signals with intonation. The device includes: a first learning unit that learns the acoustic model to estimate synthetic acoustic feature values, using the voice and speaker determination models, based on acoustic feature values of speakers, language feature values corresponding to the acoustic feature values, and speaker data items; a second learning unit that learns the voice determination model to determine whether a synthetic acoustic feature value is a predetermined acoustic feature value, based on the acoustic feature values and the synthetic acoustic feature values; and a third learning unit that learns the speaker determination model to determine whether the speaker of a synthetic acoustic feature value is a predetermined speaker, based on the acoustic feature values and the synthetic acoustic feature values.
Type: Grant
Filed: September 25, 2019
Date of Patent: January 3, 2023
Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventors: Hiroki Kanagawa, Yusuke Ijima
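The three learning units form an adversarial-style training schedule: synthesize features, then update the acoustic model and the two determination (discriminator) models in turn. The step function below is a structural sketch only, with the actual update logic left to caller-supplied functions; none of the names come from the patent:

```python
def adversarial_training_step(acoustic_feats, language_feats, speaker_ids,
                              synthesize, update_acoustic_model,
                              update_voice_discriminator,
                              update_speaker_discriminator):
    """One training step mirroring the three learning units in the abstract."""
    synthetic = [synthesize(lang, spk)
                 for lang, spk in zip(language_feats, speaker_ids)]
    update_acoustic_model(acoustic_feats, synthetic)        # first learning unit
    update_voice_discriminator(acoustic_feats, synthetic)   # second learning unit
    update_speaker_discriminator(acoustic_feats, synthetic) # third learning unit
    return synthetic
```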
-
Publication number: 20220406289
Abstract: A detection device includes: a labeling acoustic feature calculation unit configured to calculate a labeling acoustic feature from voice data; a time information acquisition unit configured to acquire a label with time information corresponding to the voice data, from a label with no time information corresponding to the voice data and the labeling acoustic feature, through use of a labeling acoustic model configured to receive, as inputs, a label with no time information and a labeling acoustic feature and to output a label with time information; an acoustic feature prediction unit configured to predict an acoustic feature corresponding to the label with time information and acquire a predicted value, through use of an acoustic model configured to receive, as an input, a label with time information and to output an acoustic feature; an acoustic feature calculation unit configured to calculate an acoustic feature from the voice data; and a difference calculation unit configured to determine an acoustic difference between the predicted value and the calculated acoustic feature.
Type: Application
Filed: November 25, 2019
Publication date: December 22, 2022
Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventors: Hiroki KANAGAWA, Yusuke IJIMA
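The final difference-then-detect step could be sketched as below. The mean-absolute-difference measure and the fixed threshold are assumptions for illustration; the abstract does not specify the difference metric or the detection criterion:

```python
def acoustic_difference(predicted, calculated):
    """Mean absolute difference between predicted and calculated features."""
    assert len(predicted) == len(calculated)
    return sum(abs(p - c) for p, c in zip(predicted, calculated)) / len(predicted)

def detect_mismatch(predicted, calculated, threshold):
    """Flag a label/audio mismatch when the difference exceeds the threshold."""
    return acoustic_difference(predicted, calculated) > threshold
```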
-
Publication number: 20220139381
Abstract: An estimation device (100) that estimates the duration of a speech section includes: a representation conversion unit (11) that converts a plurality of words included in learning utterance information into a plurality of pieces of numeric representation data; an estimation data generation unit (12) that generates estimation data by using the learning utterance information and the numeric representation data; an estimation model learning unit (13) that learns an estimation model by using the estimation data and the durations of the plurality of words; and an estimation unit (20) that estimates the duration of a predetermined speech section based on utterance information of a user by using the estimation model.
Type: Application
Filed: January 30, 2020
Publication date: May 5, 2022
Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventor: Yusuke IJIMA
-
Publication number: 20220051655
Abstract: An acoustic model learning device is provided for obtaining an acoustic model used to synthesize voice signals with intonation. The device includes: a first learning unit that learns the acoustic model to estimate synthetic acoustic feature values, using the voice and speaker determination models, based on acoustic feature values of speakers, language feature values corresponding to the acoustic feature values, and speaker data items; a second learning unit that learns the voice determination model to determine whether a synthetic acoustic feature value is a predetermined acoustic feature value, based on the acoustic feature values and the synthetic acoustic feature values; and a third learning unit that learns the speaker determination model to determine whether the speaker of a synthetic acoustic feature value is a predetermined speaker, based on the acoustic feature values and the synthetic acoustic feature values.
Type: Application
Filed: September 25, 2019
Publication date: February 17, 2022
Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventors: Hiroki KANAGAWA, Yusuke IJIMA
-
Publication number: 20200219413
Abstract: The present invention provides a pronunciation error detection apparatus capable of following a text without the need for a correct sentence even when erroneous recognition such as a reading error occurs.
Type: Application
Filed: September 13, 2018
Publication date: July 9, 2020
Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventors: Satoshi KOBASHIKAWA, Ryo MASUMURA, Hosana KAMIYAMA, Yusuke IJIMA, Yushi AONO
-
Publication number: 20190362703
Abstract: Provided is a word vectorization device that converts a word to a word vector considering the acoustic feature of the word. A word vectorization model learning device comprises a learning part for learning a word vectorization model by using a vector wL,s(t) indicating a word yL,s(t) included in learning text data, and an acoustic feature amount afL,s(t) that is an acoustic feature amount of speech data corresponding to the learning text data and that corresponds to the word yL,s(t). The word vectorization model includes a neural network that receives a vector indicating a word as an input and outputs the acoustic feature amount of speech data corresponding to the word, and the word vectorization model is a model that uses an output value from any intermediate layer as a word vector.
Type: Application
Filed: February 14, 2018
Publication date: November 28, 2019
Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventors: Yusuke IJIMA, Nobukatsu HOJO, Taichi ASAMI
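The "intermediate layer as word vector" idea can be sketched with a tiny feed-forward network: a one-hot word vector goes in, predicted acoustic features come out, and the hidden activations are read off as the word vector. The network size, weights, and tanh activation are arbitrary assumptions for illustration:

```python
import math

def forward(one_hot, hidden_weights, output_weights):
    """Word one-hot in, acoustic features out; hidden layer in between."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, one_hot)))
              for row in hidden_weights]
    output = [sum(w * h for w, h in zip(row, hidden))
              for row in output_weights]
    return hidden, output

def word_vector(word_index, vocab_size, hidden_weights, output_weights):
    """The word vector is the intermediate-layer output for the word."""
    one_hot = [1.0 if i == word_index else 0.0 for i in range(vocab_size)]
    hidden, _ = forward(one_hot, hidden_weights, output_weights)
    return hidden
```

After training the network to predict acoustic feature amounts, words with similar acoustic realizations would map to nearby hidden activations.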