Patents by Inventor Takaaki FUKUTOMI
Takaaki FUKUTOMI has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240127796
Abstract: The present invention estimates intention of an utterance more accurately than the related arts. A learning device learns an estimation model on the basis of learning data including an acoustic signal for learning and a label indicating whether or not the acoustic signal has been uttered to a predetermined target. The learning device includes: a feature synchronization unit configured to obtain a post-synchronization feature by synchronizing an acoustic feature obtained from the acoustic signal for learning with a text feature corresponding to the acoustic signal; an utterance intention estimation unit configured to estimate whether or not the acoustic signal has been uttered to the predetermined target by using the post-synchronization feature; and a parameter update unit configured to update a parameter of the estimation model on the basis of the label included in the learning data and an estimation result by the utterance intention estimation unit.
Type: Application
Filed: February 18, 2021
Publication date: April 18, 2024
Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventors: Hiroshi SATO, Takaaki FUKUTOMI, Yusuke SHINOHARA
-
Patent number: 11942074
Abstract: A learning data acquisition device or the like, capable of acquiring learning data by superimposing noise data on clean voice data at an appropriate SN ratio, is provided. The learning data acquisition device includes a voice recognition influence degree calculation unit and a learning data acquisition unit. The voice recognition influence degree calculation unit calculates an influence degree on voice recognition accuracy caused by a change of a signal-to-noise ratio, based on a result of voice recognition on the kth noise superimposed voice data and a result of voice recognition on the (k−1)th noise superimposed voice data, where K is an integer of 2 or larger, k=2, 3, . . .
Type: Grant
Filed: January 29, 2020
Date of Patent: March 26, 2024
Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventors: Takaaki Fukutomi, Takashi Nakamura, Kiyoaki Matsui
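The abstract above refers to superimposing noise data on clean voice data at a chosen SN ratio. As a rough illustration of that one step only (not the patented influence-degree calculation), the following minimal sketch scales a noise signal so that the mixture reaches a requested signal-to-noise ratio in decibels. The function names `power`, `superimpose_noise`, and `measured_snr_db` are hypothetical and do not come from the patent.

```python
import math

def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(s * s for s in signal) / len(signal)

def superimpose_noise(clean, noise, snr_db):
    """Scale `noise` so that clean power / noise power matches `snr_db`, then add it.

    Assumption: both inputs are equal-length lists of float samples.
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    noise_power = power(noise)
    if noise_power == 0:
        raise ValueError("noise must not be silent")
    # SNR(dB) = 10 * log10(P_clean / P_noise), solved for the target noise power.
    target_noise_power = power(clean) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    return [c + gain * n for c, n in zip(clean, noise)]

def measured_snr_db(clean, noisy):
    """SNR of `noisy` relative to `clean`, treating the difference as noise."""
    residual = [y - c for c, y in zip(clean, noisy)]
    return 10 * math.log10(power(clean) / power(residual))

# Synthetic stand-ins for real audio: a sine wave and a deterministic pseudo-noise.
clean = [math.sin(0.1 * i) for i in range(1000)]
noise = [((i * 37) % 11 - 5) / 5 for i in range(1000)]
noisy = superimpose_noise(clean, noise, snr_db=10)
```

Calling `superimpose_noise` repeatedly with a descending list of SN ratios would yield a series of noise-superimposed signals, which is the kind of series the abstract indexes by k.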
-
Patent number: 11741989
Abstract: Detection precision of a non-verbal sound is improved. An acoustic model storage unit 10A stores an acoustic model that is configured by a deep neural network with a bottleneck structure, and estimates a phoneme state from a sound feature value. A non-verbal sound model storage unit 10B stores a non-verbal sound model that estimates a posterior probability of a non-verbal sound likeliness from the sound feature value and a bottleneck feature value. A sound feature value extraction unit 11 extracts a sound feature value from an input sound signal. A bottleneck feature value estimation unit 12 inputs the sound feature value to the acoustic model and obtains an output of a bottleneck layer of the acoustic model as a bottleneck feature value. A non-verbal sound detection unit 13 inputs the sound feature value and the bottleneck feature value to the non-verbal sound model and obtains the posterior probability of the non-verbal sound likeliness output by the non-verbal sound model.
Type: Grant
Filed: October 31, 2019
Date of Patent: August 29, 2023
Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventors: Takashi Nakamura, Takaaki Fukutomi, Kiyoaki Matsui
-
Patent number: 11621015
Abstract: A training speech data generating apparatus includes: a voice conversion unit that converts, using fourth noise data, which is noise data based on third noise data, and speech data, the speech data so as to make the speech data clearly audible under a noise environment corresponding to the fourth noise data; and a noise superimposition unit that obtains training speech data by superimposing the third noise data and the converted speech data.
Type: Grant
Filed: March 11, 2019
Date of Patent: April 4, 2023
Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventors: Takaaki Fukutomi, Manabu Okamoto, Takashi Nakamura, Kiyoaki Matsui
-
Patent number: 11587553
Abstract: Provided is technology for assessing whether uttered speech detected from input speech is speech suited to a prescribed purpose. A method comprises detecting, from input speech including speech uttered by a speaker and noise, the uttered speech corresponding to the speech uttered by the speaker, extracting an acoustic feature of the uttered speech, generating, from the uttered speech, a speech recognition result set with a recognition score, generating, from the speech recognition result set with the recognition score, a speech recognition result word vector expression set and a speech recognition result part-of-speech vector expression set, generating a target utterance estimation model, providing, using the target utterance estimation model, a probability of the uttered speech being suited to the prescribed purpose, and outputting the uttered speech and the speech recognition result set with the recognition score, the uttered speech suitable to the prescribed purpose.
Type: Grant
Filed: February 7, 2019
Date of Patent: February 21, 2023
Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventors: Takashi Nakamura, Takaaki Fukutomi
-
Publication number: 20220328047
Abstract: Recognition results are acquired with high responsiveness without being affected by a network communication state. A speech recognition control device (1) acquires recognition results from a speech recognition device (2) with which it communicates through a network (3) and a speech recognition unit (13). A communication state measuring unit (11) measures a communication state of the network (3). A speech recognition requesting unit (12) transmits a request for a speech recognition process to each of the speech recognition device (2) and the speech recognition unit (13) with a timeout time set in accordance with an immediately prior communication state of the network (3). A recognition result output unit (14) outputs a recognition result based on a recognition result received from one or recognition results received from both of the speech recognition device (2) and the speech recognition unit (13).
Type: Application
Filed: June 4, 2019
Publication date: October 13, 2022
Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventors: Takaaki FUKUTOMI, Yoshikazu YAMAGUCHI, Yusuke SHINOHARA, Kiyoaki MATSUI, Takafumi MORIYA
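The general pattern in the abstract above (send the same request to a networked recognizer and a local one, with a timeout derived from the most recent network measurement) can be sketched roughly as follows. This is an illustration under assumptions, not the claimed device: the clamped-multiple rule in `timeout_from_latency`, the preference for the remote result, and the stub recognizers `fast_remote`, `slow_remote`, and `local_engine` are all hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def timeout_from_latency(recent_latency_s, factor=2.0, floor_s=0.05, ceiling_s=2.0):
    """Hypothetical rule: allow a multiple of the last measured latency, clamped."""
    return max(floor_s, min(ceiling_s, factor * recent_latency_s))

def recognize_with_fallback(audio, remote, local, recent_latency_s):
    """Ask both recognizers at once; prefer the remote result if it arrives in time."""
    timeout_s = timeout_from_latency(recent_latency_s)
    pool = ThreadPoolExecutor(max_workers=2)
    try:
        remote_future = pool.submit(remote, audio)
        local_future = pool.submit(local, audio)
        try:
            return "remote", remote_future.result(timeout=timeout_s)
        except Exception:
            # Timeout or remote failure: fall back to the local recognizer.
            return "local", local_future.result()
    finally:
        # Do not block on a slow remote call that is still running.
        pool.shutdown(wait=False)

# Hypothetical stand-ins for real recognizers.
def fast_remote(audio):
    return "remote:" + audio

def slow_remote(audio):
    time.sleep(0.5)  # simulates a congested network
    return "remote:" + audio

def local_engine(audio):
    return "local:" + audio
```

With a recent latency of 0.05 s the timeout works out to 0.1 s, so `recognize_with_fallback("hello", slow_remote, local_engine, 0.05)` falls back to the local engine, while the same call with `fast_remote` returns the remote result.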
-
Publication number: 20220246138
Abstract: A learning device includes: a speech recognition portion configured to perform speech recognition processing on an acoustic feature value sequence O of an utterance unit using a recognition parameter ?ini, and obtain a recognition hypothesis Hm and an overall score xm; a hypothesis evaluation portion configured to evaluate the recognition hypothesis Hm and obtain an evaluation value Em using a correct answer text that is a correct speech recognition result for the acoustic feature value sequence O; a reranking portion configured to obtain an overall score xm,k for the recognition hypothesis Hm and give a rank rankm,k thereto using a recognition parameter ?k; an optimal parameter calculation portion configured to obtain, as a calculation result, an optimal value of a recognition parameter or a value expressing inappropriateness of the recognition parameter ?k based on the evaluation value Em and the rank rankm,k; and a model learning portion configured to learn a regression model for estimating an optimal reco
Type: Application
Filed: June 7, 2019
Publication date: August 4, 2022
Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventors: Hiroshi SATO, Takaaki FUKUTOMI
-
Publication number: 20220122626
Abstract: Provided is a technology of learning an acoustic model with a certain degree of accuracy of sound recognition within a short calculation period.
Type: Application
Filed: January 23, 2020
Publication date: April 21, 2022
Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventors: Kiyoaki MATSUI, Takafumi MORIYA, Takaaki FUKUTOMI, Yusuke SHINOHARA, Yoshikazu YAMAGUCHI, Manabu OKAMOTO
-
Publication number: 20220101828
Abstract: A learning data acquisition device or the like, capable of acquiring learning data by superimposing noise data on clean voice data at an appropriate SN ratio, is provided. The learning data acquisition device includes a voice recognition influence degree calculation unit and a learning data acquisition unit. The voice recognition influence degree calculation unit calculates an influence degree on voice recognition accuracy caused by a change of a signal-to-noise ratio, based on a result of voice recognition on the kth noise superimposed voice data and a result of voice recognition on the (k−1)th noise superimposed voice data, where K is an integer of 2 or larger, k=2, 3, . . .
Type: Application
Filed: January 29, 2020
Publication date: March 31, 2022
Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventors: Takaaki FUKUTOMI, Takashi NAKAMURA, Kiyoaki MATSUI
-
Patent number: 11227580
Abstract: The present invention provides a device for estimating the deterioration factor of speech recognition accuracy by estimating an acoustic factor that leads to a speech recognition error. The device extracts an acoustic feature amount for each frame from an input speech, calculates a posterior probability for each acoustic event for the acoustic feature amount for each frame, corrects the posterior probability by filtering the posterior probability for each acoustic event using a time-series filter with weighting coefficients developed in the time axis, outputs a set of speech recognition results with a recognition score, outputs a feature amount for the speech recognition results for each frame, calculates and outputs a principal deterioration factor class for the speech recognition accuracy for each frame on the basis of the corrected posterior probability, the feature amount for speech recognition results for each frame, and the acoustic feature amount for each frame.
Type: Grant
Filed: February 6, 2019
Date of Patent: January 18, 2022
Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventors: Takashi Nakamura, Takaaki Fukutomi
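One common way to filter per-frame posteriors along the time axis is a weighted moving average over neighboring frames, applied separately to each acoustic event. The sketch below illustrates that generic idea only; the abstract does not specify the filter, so the symmetric taps, the edge handling (repeating the first and last frame), and the function name `smooth_posteriors` are assumptions.

```python
def smooth_posteriors(posteriors, weights):
    """Filter per-frame posteriors along the time axis.

    posteriors: list of frames, each a list of per-event probabilities.
    weights: odd-length filter taps, centered on the current frame.
    """
    if len(weights) % 2 == 0:
        raise ValueError("weights must have odd length")
    total = sum(weights)
    taps = [w / total for w in weights]  # normalize so each frame still sums to 1
    half = len(taps) // 2
    n_frames = len(posteriors)
    n_events = len(posteriors[0])
    smoothed = []
    for t in range(n_frames):
        frame = [0.0] * n_events
        for offset, tap in enumerate(taps, start=-half):
            # Assumption: clamp at the edges by repeating the first/last frame.
            src = min(max(t + offset, 0), n_frames - 1)
            for e in range(n_events):
                frame[e] += tap * posteriors[src][e]
        smoothed.append(frame)
    return smoothed

# Two hypothetical event classes; frame 2 is an isolated spike for class 1.
frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
corrected = smooth_posteriors(frames, weights=[1, 2, 1])
```

With taps `[1, 2, 1]`, the isolated one-frame spike in the example is damped from probability 1.0 to 0.5, which shows how such a filter suppresses single-frame flicker in the event posteriors.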
-
Publication number: 20210272587
Abstract: Detection precision of a non-verbal sound is improved. An acoustic model storage unit 10A stores an acoustic model that is configured by a deep neural network with a bottleneck structure, and estimates a phoneme state from a sound feature value. A non-verbal sound model storage unit 10B stores a non-verbal sound model that estimates a posterior probability of a non-verbal sound likeliness from the sound feature value and a bottleneck feature value. A sound feature value extraction unit 11 extracts a sound feature value from an input sound signal. A bottleneck feature value estimation unit 12 inputs the sound feature value to the acoustic model and obtains an output of a bottleneck layer of the acoustic model as a bottleneck feature value. A non-verbal sound detection unit 13 inputs the sound feature value and the bottleneck feature value to the non-verbal sound model and obtains the posterior probability of the non-verbal sound likeliness output by the non-verbal sound model.
Type: Application
Filed: October 31, 2019
Publication date: September 2, 2021
Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventors: Takashi NAKAMURA, Takaaki FUKUTOMI, Kiyoaki MATSUI
-
Publication number: 20210035558
Abstract: Provided is technology for assessing whether uttered speech detected from input speech is speech suited to a prescribed purpose. A method comprises detecting, from input speech including speech uttered by a speaker and noise, the uttered speech corresponding to the speech uttered by the speaker, extracting an acoustic feature of the uttered speech, generating, from the uttered speech, a speech recognition result set with a recognition score, generating, from the speech recognition result set with the recognition score, a speech recognition result word vector expression set and a speech recognition result part-of-speech vector expression set, generating a target utterance estimation model, providing, using the target utterance estimation model, a probability of the uttered speech being suited to the prescribed purpose, and outputting the uttered speech and the speech recognition result set with the recognition score, the uttered speech suitable to the prescribed purpose.
Type: Application
Filed: February 7, 2019
Publication date: February 4, 2021
Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventors: Takashi NAKAMURA, Takaaki FUKUTOMI
-
Publication number: 20210035553
Abstract: The present invention provides a device for estimating the deterioration factor of speech recognition accuracy by estimating an acoustic factor that leads to a speech recognition error. The device extracts an acoustic feature amount for each frame from an input speech, calculates a posterior probability for each acoustic event for the acoustic feature amount for each frame, corrects the posterior probability by filtering the posterior probability for each acoustic event using a time-series filter with weighting coefficients developed in the time axis, outputs a set of speech recognition results with a recognition score, outputs a feature amount for the speech recognition results for each frame, calculates and outputs a principal deterioration factor class for the speech recognition accuracy for each frame on the basis of the corrected posterior probability, the feature amount for speech recognition results for each frame, and the acoustic feature amount for each frame.
Type: Application
Filed: February 6, 2019
Publication date: February 4, 2021
Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventors: Takashi NAKAMURA, Takaaki FUKUTOMI
-
Publication number: 20210005215
Abstract: A training speech data generating apparatus includes: a voice conversion unit that converts, using fourth noise data, which is noise data based on third noise data, and speech data, the speech data so as to make the speech data clearly audible under a noise environment corresponding to the fourth noise data; and a noise superimposition unit that obtains training speech data by superimposing the third noise data and the converted speech data.
Type: Application
Filed: March 11, 2019
Publication date: January 7, 2021
Applicant: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
Inventors: Takaaki FUKUTOMI, Manabu OKAMOTO, Takashi NAKAMURA, Kiyoaki MATSUI