Patents by Inventor Takaaki FUKUTOMI

Takaaki FUKUTOMI has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240127796
    Abstract: The present invention estimates intention of an utterance more accurately than the related arts. A learning device learns an estimation model on the basis of learning data including an acoustic signal for learning and a label indicating whether or not the acoustic signal has been uttered to a predetermined target. The learning device includes: a feature synchronization unit configured to obtain a post-synchronization feature by synchronizing an acoustic feature obtained from the acoustic signal for learning with a text feature corresponding to the acoustic signal; an utterance intention estimation unit configured to estimate whether or not the acoustic signal has been uttered to the predetermined target by using the post-synchronization feature; and a parameter update unit configured to update a parameter of the estimation model on the basis of the label included in the learning data and an estimation result by the utterance intention estimation unit.
    Type: Application
    Filed: February 18, 2021
    Publication date: April 18, 2024
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Hiroshi SATO, Takaaki FUKUTOMI, Yusuke SHINOHARA
  • Patent number: 11942074
    Abstract: A learning data acquisition device or the like, capable of acquiring learning data by superimposing noise data on clean voice data at an appropriate SN ratio, is provided. The learning data acquisition device includes a voice recognition influence degree calculation unit and a learning data acquisition unit. The voice recognition influence degree calculation unit calculates an influence degree on voice recognition accuracy caused by a change of a signal-to-noise ratio, based on a result of voice recognition on the kth noise superimposed voice data and a result of voice recognition on the k?1th noise superimposed voice data, where K is an integer of 2 or larger, k=2, 3, . . .
    Type: Grant
    Filed: January 29, 2020
    Date of Patent: March 26, 2024
    Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Takaaki Fukutomi, Takashi Nakamura, Kiyoaki Matsui
  • Patent number: 11741989
    Abstract: Detection precision of a non-verbal sound is improved. An acoustic model storage unit 10A stores an acoustic model that is configured by a deep neural network with a bottleneck structure, and estimates a phoneme state from a sound feature value. A non-verbal sound model storage unit 10B stores a non-verbal sound model that estimates a posterior probability of a non-verbal sound likeliness from the sound feature value and a bottleneck feature value. A sound feature value extraction unit 11 extracts a sound feature value from an input sound signal. A bottleneck feature value estimation unit 12 inputs the sound feature value to the acoustic model and obtains an output of a bottleneck layer of the acoustic model as a bottleneck feature value. A non-verbal sound detection unit 13 inputs the sound feature value and the bottleneck feature value to the non-verbal sound model and obtains the posterior probability of the non-verbal sound likeliness output by the non-verbal sound model.
    Type: Grant
    Filed: October 31, 2019
    Date of Patent: August 29, 2023
    Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Takashi Nakamura, Takaaki Fukutomi, Kiyoaki Matsui
  • Patent number: 11621015
    Abstract: A training speech data generating apparatus includes: a voice conversion unit that converts, using fourth noise data, which is noise data based on third noise data, and speech data, the speech data so as to make the speech data clearly audible under a noise environment corresponding to the fourth noise data; and a noise superimposition unit that obtains training speech data by superimposing the third noise data and the converted speech data.
    Type: Grant
    Filed: March 11, 2019
    Date of Patent: April 4, 2023
    Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Takaaki Fukutomi, Manabu Okamoto, Takashi Nakamura, Kiyoaki Matsui
  • Patent number: 11587553
    Abstract: Provided is technology for assessing whether uttered speech detected from input speech is speech suited to a prescribed purpose. A method comprises detecting, from input speech including speech uttered by a speaker and noise, the uttered speech corresponding to the speech uttered by the speaker, extracting an acoustic feature of the uttered speech, generating, from the uttered speech, a speech recognition result set with a recognition score, generating, from the speech recognition result set with the recognition score, a speech recognition result word vector expression set and a speech recognition result part-of-speech vector expression set, generating a target utterance estimation model, providing, using the target utterance estimation model, a probability of the uttered speech being suited to the prescribed purpose, and outputting the uttered speech and the speech recognition result set with the recognition score, the the uttered speech suitable to the prescribed purpose.
    Type: Grant
    Filed: February 7, 2019
    Date of Patent: February 21, 2023
    Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Takashi Nakamura, Takaaki Fukutomi
  • Publication number: 20220328047
    Abstract: Recognition results are acquired with high responsiveness without being affected by a network communication state. A speech recognition control device (1) acquires recognition results from a speech recognition device (2) with which it communicates through a network (3) and a speech recognition unit (13). A communication state measuring unit (11) measures a communication state of the network (3). A speech recognition requesting unit (12) transmits a request for a speech recognition process to each of the speech recognition device (2) and the speech recognition unit (13) with a timeout time set in accordance with an immediately prior communication state of the network (3). A recognition result output unit (14) outputs a recognition result based on a recognition result received from one or recognition results received from both of the speech recognition device (2) and the speech recognition unit (13).
    Type: Application
    Filed: June 4, 2019
    Publication date: October 13, 2022
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Takaaki FUKUTOMI, Yoshikazu YAMAGUCHI, Yusuke SHINOHARA, Kiyoaki MATSUI, Takafumi MORIYA
  • Publication number: 20220246138
    Abstract: A learning device includes: a speech recognition portion configured to perform speech recognition processing on an acoustic feature value sequence O of an utterance unit using a recognition parameter ?ini, and obtain a recognition hypothesis Hm and an overall score xm; a hypothesis evaluation portion configured to evaluate the recognition hypothesis Hm and obtain an evaluation value Em using a correct answer text that is a correct speech recognition result for the acoustic feature value sequence O; a reranking portion configured to obtain an overall score xm,k for the recognition hypothesis Hm and give a rank rankm,k thereto using a recognition parameter ?k; an optimal parameter calculation portion configured to obtain, as a calculation result, an optimal value of a recognition parameter or a value expressing inappropriateness of the recognition parameter ?k based on the evaluation value Em and the rank rankm,k; and a model learning portion configured to learn a regression model for estimating an optimal reco
    Type: Application
    Filed: June 7, 2019
    Publication date: August 4, 2022
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Hiroshi SATO, Takaaki FUKUTOMI
  • Publication number: 20220122626
    Abstract: Provided is a technology of learning an acoustic model with a certain degree of accuracy of sound recognition within a short calculation period.
    Type: Application
    Filed: January 23, 2020
    Publication date: April 21, 2022
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Kiyoaki MATSUI, Takafumi MORIYA, Takaaki FUKUTOMI, Yusuke SHINOHARA, Yoshikazu YAMAGUCHI, Manabu OKAMOTO
  • Publication number: 20220101828
    Abstract: A learning data acquisition device or the like, capable of acquiring learning data by superimposing noise data on clean voice data at an appropriate SN ratio, is provided. The learning data acquisition device includes a voice recognition influence degree calculation unit and a learning data acquisition unit. The voice recognition influence degree calculation unit calculates an influence degree on voice recognition accuracy caused by a change of a signal-to-noise ratio, based on a result of voice recognition on the kth noise superimposed voice data and a result of voice recognition on the k?1th noise superimposed voice data, where K is an integer of 2 or larger, k=2, 3, . . .
    Type: Application
    Filed: January 29, 2020
    Publication date: March 31, 2022
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Takaaki FUKUTOMI, Takashi NAKAMURA, Kiyoaki MATSUI
  • Patent number: 11227580
    Abstract: The present invention provides a device for estimating the deterioration factor of speech recognition accuracy by estimating an acoustic factor that leads to a speech recognition error. The device extracts an acoustic feature amount for each frame from an input speech, calculates a posterior probability for each acoustic event for the acoustic feature amount for each frame, corrects the posterior probability by filtering the posterior probability for each acoustic event using a time-series filter with weighting coefficients developed in the time axis, outputs a set of speech recognition results with a recognition score, outputs a feature amount for the speech recognition results for each frame, calculates and outputs a principal deterioration factor class for the speech recognition accuracy for each frame on the basis of the corrected posterior probability, the feature amount for speech recognition results for each frame, and the acoustic feature amount for each frame.
    Type: Grant
    Filed: February 6, 2019
    Date of Patent: January 18, 2022
    Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Takashi Nakamura, Takaaki Fukutomi
  • Publication number: 20210272587
    Abstract: Detection precision of a non-verbal sound is improved. An acoustic model storage unit 10A stores an acoustic model that is configured by a deep neural network with a bottleneck structure, and estimates a phoneme state from a sound feature value. A non-verbal sound model storage unit 10B stores a non-verbal sound model that estimates a posterior probability of a non-verbal sound likeliness from the sound feature value and a bottleneck feature value. A sound feature value extraction unit 11 extracts a sound feature value from an input sound signal. A bottleneck feature value estimation unit 12 inputs the sound feature value to the acoustic model and obtains an output of a bottleneck layer of the acoustic model as a bottleneck feature value. A non-verbal sound detection unit 13 inputs the sound feature value and the bottleneck feature value to the non-verbal sound model and obtains the posterior probability of the non-verbal sound likeliness output by the non-verbal sound model.
    Type: Application
    Filed: October 31, 2019
    Publication date: September 2, 2021
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Takashi NAKAMURA, Takaaki FUKUTOMI, Kiyoaki MATSUI
  • Publication number: 20210035558
    Abstract: Provided is technology for assessing whether uttered speech detected from input speech is speech suited to a prescribed purpose. A method comprises detecting, from input speech including speech uttered by a speaker and noise, the uttered speech corresponding to the speech uttered by the speaker, extracting an acoustic feature of the uttered speech, generating, from the uttered speech, a speech recognition result set with a recognition score, generating, from the speech recognition result set with the recognition score, a speech recognition result word vector expression set and a speech recognition result part-of-speech vector expression set, generating a target utterance estimation model, providing, using the target utterance estimation model, a probability of the uttered speech being suited to the prescribed purpose, and outputting the uttered speech and the speech recognition result set with the recognition score, the the uttered speech suitable to the prescribed purpose.
    Type: Application
    Filed: February 7, 2019
    Publication date: February 4, 2021
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Takashi NAKAMURA, Takaaki FUKUTOMI
  • Publication number: 20210035553
    Abstract: The present invention provides a device for estimating the deterioration factor of speech recognition accuracy by estimating an acoustic factor that leads to a speech recognition error. The device extracts an acoustic feature amount for each frame from an input speech, calculates a posterior probability for each acoustic event for the acoustic feature amount for each frame, corrects the posterior probability by filtering the posterior probability for each acoustic event using a time-series filter with weighting coefficients developed in the time axis, outputs a set of speech recognition results with a recognition score, outputs a feature amount for the speech recognition results for each frame, calculates and outputs a principal deterioration factor class for the speech recognition accuracy for each frame on the basis of the corrected posterior probability, the feature amount for speech recognition results for each frame, and the acoustic feature amount for each frame.
    Type: Application
    Filed: February 6, 2019
    Publication date: February 4, 2021
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Takashi NAKAMURA, Takaaki FUKUTOMI
  • Publication number: 20210005215
    Abstract: A training speech data generating apparatus includes: a voice conversion unit that converts, using fourth noise data, which is noise data based on third noise data, and speech data, the speech data so as to make the speech data clearly audible under a noise environment corresponding to the fourth noise data; and a noise superimposition unit that obtains training speech data by superimposing the third noise data and the converted speech data.
    Type: Application
    Filed: March 11, 2019
    Publication date: January 7, 2021
    Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Takaaki FUKUTOMI, Manabu OKAMOTO, Takashi NAKAMURA, Kiyoaki MATSUI