Patents by Inventor Shota HORIGUCHI

Shota HORIGUCHI has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

MULTI-SPEAKER DIARIZATION OF AUDIO INPUT USING A NEURAL NETWORK

Publication number: 20220254352

Abstract: An audio analysis platform may receive a portion of an audio input, wherein the audio input corresponds to audio associated with a plurality of speakers. The audio analysis platform may process, using a neural network, the portion of the audio input to determine voice activity of the plurality of speakers during the portion of the audio input, wherein the neural network is trained using reference audio data and reference diarization data corresponding to the reference audio data. The audio analysis platform may determine, based on the neural network being used to process the portion of the audio input, a diarization output associated with the portion of the audio input, wherein the diarization output indicates individual voice activity of the plurality of speakers. The audio analysis platform may provide the diarization output to indicate the individual voice activity of the plurality of speakers during the portion of the audio input.

Type: Application

Filed: August 31, 2020

Publication date: August 11, 2022

Applicants: The Johns Hopkins University, Hitachi, Ltd.

Inventors: Yusuke FUJITA, Shinji WATANABE, Naoyuki KANDA, Shota HORIGUCHI
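
The abstract above describes a diarization output that indicates the individual voice activity of each speaker. The following is a minimal sketch of how such an output could be derived, assuming the neural network emits one activity probability per speaker for each audio frame. The function name `diarize`, the 0.5 threshold, and the frame-index segment format are illustrative assumptions, not details taken from the application.

```python
def diarize(frame_probs, threshold=0.5):
    """Convert per-frame, per-speaker activity probabilities into segments.

    frame_probs: list of frames, each a list with one probability per speaker
                 (hypothetical network output; assumed format).
    Returns {speaker_index: [(start_frame, end_frame_exclusive), ...]}.
    """
    n_speakers = len(frame_probs[0]) if frame_probs else 0
    segments = {s: [] for s in range(n_speakers)}
    for s in range(n_speakers):
        start = None  # frame where the current active run began, if any
        for t, row in enumerate(frame_probs):
            active = row[s] >= threshold
            if active and start is None:
                start = t
            elif not active and start is not None:
                segments[s].append((start, t))
                start = None
        if start is not None:  # close a run that lasts to the final frame
            segments[s].append((start, len(frame_probs)))
    return segments


# Two speakers over four frames; both are active in frame 1 (overlap).
probs = [[0.9, 0.1], [0.8, 0.7], [0.2, 0.9], [0.1, 0.2]]
result = diarize(probs)
```

Because each speaker is thresholded independently, overlapping speech is represented naturally: a frame can belong to more than one speaker's segment.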

Speaker estimation method and speaker estimation device

Patent number: 11107476

Abstract: A speaker estimation method that estimates the speaker from audio and image includes: inputting audio; extracting a feature quantity representing a voice characteristic from the input audio; inputting an image; detecting person regions of respective persons from the input image; estimating feature quantities representing voice characteristics from the respective detected person regions; performing a change such that an image taken from another position and with another angle is input when any person is not detected; calculating a similarity between the feature quantity representing the voice characteristic extracted from the audio and the feature quantity representing the voice characteristic estimated from the person region in the image; and estimating a speaker from the calculated similarity.

Type: Grant

Filed: February 26, 2019

Date of Patent: August 31, 2021

Assignee: HITACHI, LTD.

Inventors: Shota Horiguchi, Naoyuki Kanda
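
The final steps of the abstract above (calculating a similarity between the audio feature and each person-region feature, then estimating the speaker) can be sketched as follows. This is an illustration under stated assumptions: the abstract does not name a similarity measure, so cosine similarity is used here as a stand-in, and the names `cosine_similarity` and `estimate_speaker` as well as the feature vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def estimate_speaker(audio_feature, region_features):
    """Pick the person region whose estimated voice feature best matches the audio.

    Returns (region_index, similarity), or None when no person region was
    detected; per the abstract, the caller would then obtain an image taken
    from another position and angle.
    """
    if not region_features:
        return None
    scores = [cosine_similarity(audio_feature, f) for f in region_features]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


# Hypothetical 2-dimensional features: one from audio, two from person regions.
audio = [1.0, 0.0]
regions = [[0.0, 1.0], [2.0, 0.1]]
choice = estimate_speaker(audio, regions)
```

In this example the second region (index 1) points in nearly the same direction as the audio feature, so it is selected as the estimated speaker.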

SPEAKER ESTIMATION METHOD AND SPEAKER ESTIMATION DEVICE

Publication number: 20190272828

Abstract: A speaker estimation method that estimates the speaker from audio and image includes: inputting audio; extracting a feature quantity representing a voice characteristic from the input audio; inputting an image; detecting person regions of respective persons from the input image; estimating feature quantities representing voice characteristics from the respective detected person regions; performing a change such that an image taken from another position and with another angle is input when any person is not detected; calculating a similarity between the feature quantity representing the voice characteristic extracted from the audio and the feature quantity representing the voice characteristic estimated from the person region in the image; and estimating a speaker from the calculated similarity.

Type: Application

Filed: February 26, 2019

Publication date: September 5, 2019

Applicant: HITACHI, LTD.

Inventors: Shota HORIGUCHI, Naoyuki KANDA
