Patents by Inventor Naoyuki Kanda
Naoyuki Kanda has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11068072
Abstract: In a mixed reality display system, a server and a plurality of mixed reality display terminals are connected, and virtual objects are displayed. The virtual objects include a shared virtual object for which a plurality of terminals have an operation authority and a private virtual object for which only a specific terminal has an operation authority. The server has virtual object attribute information for displaying the virtual objects in each terminal, and each terminal has a motion detecting unit that detects a motion of a user for switching between the shared virtual object and the private virtual object. When a detection result by the motion detecting unit is received from the terminal, the server updates the virtual object attribute information depending on whether the virtual object is the shared virtual object or the private virtual object, and transmits data of the virtual object after the update to each terminal.
Type: Grant
Filed: August 11, 2020
Date of Patent: July 20, 2021
Assignee: MAXELL, LTD.
Inventor: Naoyuki Kanda
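The shared/private switching this abstract describes amounts to server-side bookkeeping: each virtual object record carries an ownership mode, a reported user motion toggles it, and the updated record is then pushed to every terminal. A minimal sketch of that update step, with all names (`VirtualObject`, `on_motion_detected`, etc.) invented for illustration and not taken from the patent:

```python
# Illustrative sketch of the server-side attribute update; names are
# hypothetical, not from the patent.
from dataclasses import dataclass
from typing import Optional

@dataclass
class VirtualObject:
    object_id: str
    shared: bool            # True: every terminal may operate it
    owner: Optional[str]    # terminal holding authority when private

class MixedRealityServer:
    def __init__(self):
        self.objects: dict = {}

    def on_motion_detected(self, terminal_id: str, object_id: str) -> VirtualObject:
        """A terminal reported the switching motion: toggle shared/private."""
        obj = self.objects[object_id]
        if obj.shared:
            obj.shared, obj.owner = False, terminal_id  # private to requester
        else:
            obj.shared, obj.owner = True, None          # back to shared
        # In the real system, the updated object data would now be
        # transmitted to every connected terminal.
        return obj
```

The key design point the abstract emphasizes is that the server, not the terminal, decides how attribute information changes based on the object's current shared/private state.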
-
Patent number: 10909976
Abstract: A speech recognition device includes: an acoustic model based on an End-to-End neural network responsive to an observed sequence formed of prescribed acoustic features obtained from a speech signal by a feature extracting unit, for calculating the probability of the observed sequence being a certain symbol sequence; and a decoder responsive to a symbol sequence candidate, for decoding a speech signal by a WFST based on a posterior probability of each of the word sequences corresponding to the symbol sequence candidate, probabilities calculated by the acoustic model for symbol sequences selected based on an observed sequence, and a posterior probability of each of the plurality of symbol sequences.
Type: Grant
Filed: June 2, 2017
Date of Patent: February 2, 2021
Assignee: National Institute of Information and Communications Technology
Inventor: Naoyuki Kanda
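At its core, the decoding described here combines two probabilities per hypothesis: the acoustic model's probability for a symbol sequence and the WFST's posterior for the corresponding word sequence. A toy log-domain sketch (the candidate list and its probabilities are made-up stand-ins, not a real WFST):

```python
import math

# Toy decoder sketch: combine acoustic-model and WFST scores per
# hypothesis in the log domain and keep the best-scoring word sequence.
def decode(candidates):
    """candidates: list of (word_sequence, am_prob, wfst_posterior)."""
    best, best_score = None, -math.inf
    for words, am_prob, wfst_post in candidates:
        score = math.log(am_prob) + math.log(wfst_post)
        if score > best_score:
            best, best_score = words, score
    return best

# Hypothetical hypotheses with illustrative probabilities.
hyps = [
    (("hello", "world"), 0.20, 0.60),
    (("hello", "word"),  0.25, 0.30),
]
```

Here the first hypothesis wins despite its lower acoustic probability, because the word-sequence posterior outweighs the difference; that interplay between the two score sources is what the decoder arbitrates.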
-
Publication number: 20200371602
Abstract: In a mixed reality display system, a server and a plurality of mixed reality display terminals are connected, and virtual objects are displayed. The virtual objects include a shared virtual object for which a plurality of terminals have an operation authority and a private virtual object for which only a specific terminal has an operation authority. The server has virtual object attribute information for displaying the virtual objects in each terminal, and each terminal has a motion detecting unit that detects a motion of a user for switching between the shared virtual object and the private virtual object. When a detection result by the motion detecting unit is received from the terminal, the server updates the virtual object attribute information depending on whether the virtual object is the shared virtual object or the private virtual object, and transmits data of the virtual object after the update to each terminal.
Type: Application
Filed: August 11, 2020
Publication date: November 26, 2020
Inventor: Naoyuki KANDA
-
Patent number: 10775897
Abstract: In a mixed reality display system, a server and a plurality of mixed reality display terminals are connected, and virtual objects are displayed. The virtual objects include a shared virtual object for which a plurality of terminals have an operation authority and a private virtual object for which only a specific terminal has an operation authority. The server has virtual object attribute information for displaying the virtual objects in each terminal, and each terminal has a motion detecting unit that detects a motion of a user for switching between the shared virtual object and the private virtual object. When a detection result by the motion detecting unit is received from the terminal, the server updates the virtual object attribute information depending on whether the virtual object is the shared virtual object or the private virtual object, and transmits data of the virtual object after the update to each terminal.
Type: Grant
Filed: June 6, 2017
Date of Patent: September 15, 2020
Assignee: Maxell, Ltd.
Inventor: Naoyuki Kanda
-
Publication number: 20200241647
Abstract: In a mixed reality display system, a server and a plurality of mixed reality display terminals are connected, and virtual objects are displayed. The virtual objects include a shared virtual object for which a plurality of terminals have an operation authority and a private virtual object for which only a specific terminal has an operation authority. The server has virtual object attribute information for displaying the virtual objects in each terminal, and each terminal has a motion detecting unit that detects a motion of a user for switching between the shared virtual object and the private virtual object. When a detection result by the motion detecting unit is received from the terminal, the server updates the virtual object attribute information depending on whether the virtual object is the shared virtual object or the private virtual object, and transmits data of the virtual object after the update to each terminal.
Type: Application
Filed: June 6, 2017
Publication date: July 30, 2020
Inventor: Naoyuki KANDA
-
Publication number: 20200167003
Abstract: A head-mounted display (HMD) 1, which is operated by a gesture operation performed by a user 3, is provided with a distance image acquisition unit 106 that detects a gesture operation, a position information acquisition unit 103 that acquires position information of the HMD 1, and a communication unit 2 that performs communication with another HMD 1′. A control unit 205 sets and displays an operating space 600 where a gesture operation performed by the user 3 is valid, exchanges position information and operating space information of the host HMD 1 and the other HMD 1′ therebetween by the communication unit 2, and adjusts the operating space of the host HMD so that the operating space 600 and an operating space 600′ of the other HMD 1′ do not overlap each other.
Type: Application
Filed: August 24, 2017
Publication date: May 28, 2020
Inventors: Yo NONOMURA, Naoyuki KANDA
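The adjustment described here — shrink the host HMD's operating space until it no longer overlaps the other HMD's — can be sketched with simple geometry. This sketch models each operating space as a circle (center, radius) on the floor plane; the circle model and all names are illustrative assumptions, not the patent's actual representation:

```python
import math

# Illustrative non-overlap adjustment: if two circular operating spaces
# overlap, shrink the host's radius just enough to separate them.
def adjust_operating_space(host, other, margin=0.0):
    """host/other: (x, y, radius). Returns the host's adjusted radius."""
    (hx, hy, hr), (ox, oy, orad) = host, other
    dist = math.hypot(ox - hx, oy - hy)
    if dist >= hr + orad + margin:
        return hr                           # no overlap: keep current space
    return max(0.0, dist - orad - margin)   # shrink host radius to fit
```

A real system would also have to handle users standing closer together than the other space's radius (the `max(0.0, ...)` degenerate case), e.g. by relocating rather than shrinking the space.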
-
Patent number: 10607602
Abstract: An object is to provide a speech recognition device with improved recognition accuracy using characteristics of a neural network. A speech recognition device includes: an acoustic model 308 implemented by an RNN (recurrent neural network) for calculating, for each state sequence, the posterior probability of a state sequence in response to an observed sequence consisting of prescribed speech features obtained from a speech; a WFST 320 based on S⁻¹HCLG calculating, for each word sequence, the posterior probability of a word sequence in response to a state sequence; and a hypothesis selecting unit 322, performing speech recognition of the speech signal based on a score calculated for each hypothesis of a word sequence corresponding to the speech signal, using the posterior probabilities calculated by the acoustic model 308 and the WFST 320 for the input observed sequence.
Type: Grant
Filed: May 10, 2016
Date of Patent: March 31, 2020
Assignee: NATIONAL INSTITUTE OF INFORMATION AND COMMUNICATIONS TECHNOLOGY
Inventor: Naoyuki Kanda
-
Patent number: 10489451
Abstract: Provided is a voice search technology that can efficiently find and check a problematic call. To this end, a voice search system of the present invention includes a call search database that stores, for each of a reception channel and a transmission channel of each of a plurality of pieces of recorded call voice data, voice section sequences in association with predetermined keywords and time information. The call search database is searched based on an input search keyword, so that a voice section sequence that contains the search keyword is obtained.
Type: Grant
Filed: September 11, 2013
Date of Patent: November 26, 2019
Assignee: Hitachi, Ltd.
Inventors: Yusuke Fujita, Ryu Takeda, Naoyuki Kanda
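The database described above is essentially an inverted index from keywords to voice sections, keyed per call and per channel with time offsets. A minimal sketch under that assumption (the schema and class name are illustrative, not from the patent):

```python
from collections import defaultdict

# Illustrative keyword-to-voice-section index: each detected keyword on
# a call's reception or transmission channel is stored with its time
# offsets, and a query returns the matching sections.
class CallSearchDB:
    def __init__(self):
        # keyword -> list of (call_id, channel, start_sec, end_sec)
        self.index = defaultdict(list)

    def add_section(self, keyword, call_id, channel, start, end):
        self.index[keyword].append((call_id, channel, start, end))

    def search(self, keyword):
        """Return all voice sections containing the search keyword."""
        return self.index.get(keyword, [])
```

Separating the reception and transmission channels, as the abstract specifies, lets an operator check whether the problematic utterance came from the agent or the caller.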
-
Patent number: 10467525
Abstract: [Object] An object is to provide a training method of improving training of a recurrent neural network (RNN) using time-sequential data. [Solution] The training method includes a step 220 of initializing the RNN, and a training step 226 of training the RNN by designating a certain vector as a start position and optimizing various parameters to minimize an error function. The training step 226 includes: an updating step 250 of updating RNN parameters through Truncated BPTT using consecutive N (N≥3) vectors having a designated vector as a start point and using a reference value of a tail vector as a correct label; and a first repetition step 240 of repeating the process of executing the training step by newly designating a vector at a position satisfying a prescribed relation with the tail of the N vectors used at the updating step until an end condition is satisfied. The vector at a position satisfying the prescribed relation is positioned at least two vectors behind the designated vector.
Type: Grant
Filed: May 10, 2016
Date of Patent: November 5, 2019
Assignee: National Institute of Information and Communications Technology
Inventor: Naoyuki Kanda
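The window schedule in this training method — N consecutive vectors per update, the tail vector as label, and the next start position at least two vectors behind the previous one — is pure index arithmetic, sketched here with illustrative parameter names (the actual RNN update is out of scope):

```python
# Illustrative window schedule for the Truncated-BPTT training loop:
# each yielded window would drive one parameter update.
def bptt_windows(num_vectors, n, stride):
    """Yield (window_indices, label_index) pairs.
    n: window length (N >= 3); stride: start-position advance (>= 2)."""
    assert n >= 3 and stride >= 2
    start = 0
    while start + n <= num_vectors:
        window = list(range(start, start + n))
        yield window, window[-1]   # tail vector supplies the correct label
        start += stride            # at least two vectors behind the last start
```

With a stride smaller than the window length, consecutive windows overlap, so each vector can contribute to several updates while start positions still advance through the sequence.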
-
Publication number: 20190272828
Abstract: A speaker estimation method that estimates the speaker from audio and an image includes: inputting audio; extracting a feature quantity representing a voice characteristic from the input audio; inputting an image; detecting person regions of respective persons from the input image; estimating feature quantities representing voice characteristics from the respective detected person regions; performing a change such that an image taken from another position and with another angle is input when no person is detected; calculating a similarity between the feature quantity representing the voice characteristic extracted from the audio and the feature quantity representing the voice characteristic estimated from the person region in the image; and estimating a speaker from the calculated similarity.
Type: Application
Filed: February 26, 2019
Publication date: September 5, 2019
Applicant: HITACHI, LTD.
Inventors: Shota HORIGUCHI, Naoyuki KANDA
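The final two steps — scoring the audio-derived voice feature against each person region's predicted voice feature and picking the best match — can be sketched with cosine similarity. Cosine is an assumption here (the abstract says only "a similarity"), and the vectors are toy values:

```python
import math

# Illustrative similarity step: compare the audio voice-feature vector
# against voice-feature vectors predicted from each detected person
# region, and choose the most similar person.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def estimate_speaker(audio_vec, person_vecs):
    """person_vecs: {person_id: voice-feature vector predicted from image}."""
    return max(person_vecs, key=lambda p: cosine(audio_vec, person_vecs[p]))
```

The distinctive idea in the abstract is that the image branch predicts a *voice* characteristic from appearance, so both branches land in the same feature space and a single similarity suffices.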
-
Publication number: 20190139540
Abstract: A speech recognition device includes: an acoustic model based on an End-to-End neural network responsive to an observed sequence formed of prescribed acoustic features obtained from a speech signal by a feature extracting unit, for calculating the probability of the observed sequence being a certain symbol sequence; and a decoder responsive to a symbol sequence candidate, for decoding a speech signal by a WFST based on a posterior probability of each of the word sequences corresponding to the symbol sequence candidate, probabilities calculated by the acoustic model for symbol sequences selected based on an observed sequence, and a posterior probability of each of the plurality of symbol sequences.
Type: Application
Filed: June 2, 2017
Publication date: May 9, 2019
Applicant: National Institute of Information and Communications Technology
Inventor: Naoyuki KANDA
-
Publication number: 20180204566
Abstract: [Object] An object is to provide a speech recognition device with improved recognition accuracy using characteristics of a neural network. [Solution] A speech recognition device includes: an acoustic model 308 implemented by an RNN (recurrent neural network) for calculating, for each state sequence, the posterior probability of a state sequence in response to an observed sequence consisting of prescribed speech features obtained from a speech; a WFST 320 based on S⁻¹HCLG calculating, for each word sequence, the posterior probability of a word sequence in response to a state sequence; and a hypothesis selecting unit 322, performing speech recognition of the speech signal based on a score calculated for each hypothesis of a word sequence corresponding to the speech signal, using the posterior probabilities calculated by the acoustic model 308 and the WFST 320 for the input observed sequence.
Type: Application
Filed: May 10, 2016
Publication date: July 19, 2018
Inventor: Naoyuki KANDA
-
Patent number: 9989626
Abstract: The present invention pertains to a method for estimating a sound source position in a space with high accuracy using a microphone installed on a robot moving in the space. A mobile robot includes a self-position estimation unit configured to estimate the self-position of the mobile robot, a sound source information obtaining unit configured to obtain direction information of an observed sound source, and a sound source position estimation unit configured to estimate the position of the sound source based on the estimated self-position and the direction information of the sound source.
Type: Grant
Filed: April 12, 2013
Date of Patent: June 5, 2018
Assignee: Hitachi, Ltd.
Inventors: Takashi Sumiyoshi, Yasunari Obuchi, Naoyuki Kanda, Ryu Takeda
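The geometric core of estimating a source position from self-position plus bearing is triangulation: a moving robot that observes the same source from two positions can intersect the two bearing rays. This two-observation sketch is an illustration of that core only; a real system would fuse many noisy observations, and the names are hypothetical:

```python
import math

# Illustrative triangulation: intersect two bearing rays observed from
# two robot self-positions to locate the sound source on the plane.
def triangulate(p1, theta1, p2, theta2):
    """p = (x, y) robot position; theta = bearing in radians.
    Returns the (x, y) intersection of the two bearing rays."""
    d1 = (math.cos(theta1), math.sin(theta1))
    d2 = (math.cos(theta2), math.sin(theta2))
    # Solve p1 + t1*d1 == p2 + t2*d2 for t1 (2x2 linear system).
    denom = d1[0] * (-d2[1]) - d1[1] * (-d2[0])
    t1 = ((p2[0] - p1[0]) * (-d2[1]) - (p2[1] - p1[1]) * (-d2[0])) / denom
    return (p1[0] + t1 * d1[0], p1[1] + t1 * d1[1])
```

When the two rays are nearly parallel (`denom` close to zero) the estimate is ill-conditioned, which is exactly why the robot's movement through the space matters for accuracy.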
-
Publication number: 20180121800
Abstract: [Object] An object is to provide a training method of improving training of a recurrent neural network (RNN) using time-sequential data. [Solution] The training method includes a step 220 of initializing the RNN, and a training step 226 of training the RNN by designating a certain vector as a start position and optimizing various parameters to minimize an error function. The training step 226 includes: an updating step 250 of updating RNN parameters through Truncated BPTT using consecutive N (N≥3) vectors having a designated vector as a start point and using a reference value of a tail vector as a correct label; and a first repetition step 240 of repeating the process of executing the training step by newly designating a vector at a position satisfying a prescribed relation with the tail of the N vectors used at the updating step until an end condition is satisfied. The vector at a position satisfying the prescribed relation is positioned at least two vectors behind the designated vector.
Type: Application
Filed: May 10, 2016
Publication date: May 3, 2018
Inventor: Naoyuki KANDA
-
Publication number: 20160171100
Abstract: Provided is a voice search technology that can efficiently find and check a problematic call. To this end, a voice search system of the present invention includes a call search database that stores, for each of a reception channel and a transmission channel of each of a plurality of pieces of recorded call voice data, voice section sequences in association with predetermined keywords and time information. The call search database is searched based on an input search keyword, so that a voice section sequence that contains the search keyword is obtained.
Type: Application
Filed: September 11, 2013
Publication date: June 16, 2016
Applicant: Hitachi, Ltd.
Inventors: Yusuke FUJITA, Ryu TAKEDA, Naoyuki KANDA
-
Publication number: 20160103202
Abstract: The present invention pertains to a method for estimating a sound source position in a space with high accuracy using a microphone installed on a robot moving in the space. A mobile robot includes a self-position estimation unit configured to estimate the self-position of the mobile robot, a sound source information obtaining unit configured to obtain direction information of an observed sound source, and a sound source position estimation unit configured to estimate the position of the sound source based on the estimated self-position and the direction information of the sound source.
Type: Application
Filed: April 12, 2013
Publication date: April 14, 2016
Inventors: Takashi SUMIYOSHI, Yasunari OBUCHI, Naoyuki KANDA, Ryu TAKEDA
-
Publication number: 20090234854
Abstract: An acoustic feature representing speech data provided with meta data is extracted. Next, a group of acoustic features which are extracted only from the speech data containing a specific word in the meta data and not from the other speech data is extracted from obtained sub-groups of acoustic features. The word and the extracted group of acoustic features are associated with each other to be stored. When there is a search key matching the word in the input search keys, the group of acoustic features corresponding to the word is output. Accordingly, the efforts of a user for inputting a key when the user searches for speech data are reduced.
Type: Application
Filed: November 13, 2008
Publication date: September 17, 2009
Inventors: Naoyuki Kanda, Takashi Sumiyoshi, Yasunari Obuchi
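The selection step in this abstract — keep the acoustic features that occur only in speech whose metadata contains the word, and in no other speech — is a set difference over per-utterance feature groups. A minimal sketch, with feature identifiers and the `utterances` structure invented for illustration:

```python
# Illustrative word-to-feature association: features seen in utterances
# tagged with the word, minus features seen anywhere else.
def features_for_word(word, utterances):
    """utterances: list of (metadata_words, feature_set) pairs."""
    with_word, without_word = set(), set()
    for words, feats in utterances:
        (with_word if word in words else without_word).update(feats)
    return with_word - without_word
```

Storing the resulting group per word is what lets a later text search key retrieve acoustic features directly, sparing the user from specifying them.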