Patents by Inventor Takaaki Hori
Takaaki Hori has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20210082398
Abstract: A computer-implemented method for training a dialogue response generation system and the dialogue response generation system are provided. The method includes arranging a first multimodal encoder-decoder for the dialogue response generation or video description having a first input and a first output, wherein the first multimodal encoder-decoder has been pretrained by training audio-video datasets with training video description sentences, arranging a second multimodal encoder-decoder for dialog response generation having a second input and a second output, providing first audio-visual datasets with first corresponding video description sentences to the first input of the first multimodal encoder-decoder, wherein the first encoder-decoder generates first output values based on the first audio-visual datasets with the first corresponding description sentences, providing the first audio-visual datasets excluding the first corresponding video description sentences to the second multimodal encoder-decoder.
Type: Application
Filed: September 13, 2019
Publication date: March 18, 2021
Inventors: Chiori Hori, Anoop Cherian, Tim Marks, Takaaki Hori
-
Patent number: 10811000
Abstract: Systems and methods for a speech recognition system for recognizing speech, including overlapping speech by multiple speakers. The system includes a hardware processor and a computer storage memory that stores data and computer-executable instructions which, when executed by the processor, implement a stored speech recognition network. An input interface receives an acoustic signal that includes a mixture of speech signals by multiple speakers, wherein the multiple speakers include target speakers. An encoder network and a decoder network of the stored speech recognition network are trained to transform the received acoustic signal into a text for each target speaker, such that the encoder network outputs a set of recognition encodings and the decoder network uses the set of recognition encodings to output the text for each target speaker. An output interface transmits the text for each target speaker.
Type: Grant
Filed: April 13, 2018
Date of Patent: October 20, 2020
Assignee: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Jonathan Le Roux, Takaaki Hori, Shane Settle, Hiroshi Seki, Shinji Watanabe, John Hershey
-
Publication number: 20200312306
Abstract: A speech recognition system includes an encoder to convert an input acoustic signal into a sequence of encoder states, an alignment decoder to identify locations of encoder states in the sequence of encoder states that encode transcription outputs, a partition module to partition the sequence of encoder states into a set of partitions based on the locations of the identified encoder states, and an attention-based decoder to determine the transcription outputs for each partition of encoder states submitted to the attention-based decoder as an input. Upon receiving the acoustic signal, the system uses the encoder to produce the sequence of encoder states, partitions the sequence of encoder states into the set of partitions based on the locations of the encoder states identified by the alignment decoder, and submits the set of partitions sequentially into the attention-based decoder to produce a transcription output for each of the submitted partitions.
Type: Application
Filed: March 25, 2019
Publication date: October 1, 2020
Applicant: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Niko Moritz, Takaaki Hori, Jonathan Le Roux
-
Patent number: 10672388
Abstract: A speech recognition system includes an input device to receive voice sounds, one or more processors, and one or more storage devices storing parameters and program modules including instructions which cause the one or more processors to perform operations.
Type: Grant
Filed: December 15, 2017
Date of Patent: June 2, 2020
Assignee: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Takaaki Hori, Shinji Watanabe, John Hershey
-
Patent number: 10593321
Abstract: A method for training a multi-language speech recognition network includes providing utterance datasets corresponding to predetermined languages, inserting language identification (ID) labels into the utterance datasets, wherein each of the utterance datasets is labelled by each of the language ID labels, concatenating the labeled utterance datasets, generating initial network parameters from the utterance datasets, selecting the initial network parameters according to a predetermined sequence, and training, iteratively, an end-to-end network with a series of the selected initial network parameters and the concatenated labeled utterance datasets until a training result reaches a threshold.
Type: Grant
Filed: December 15, 2017
Date of Patent: March 17, 2020
Assignee: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Shinji Watanabe, Takaaki Hori, Hiroshi Seki, Jonathan Le Roux, John Hershey
-
Publication number: 20190318725
Abstract: Systems and methods for a speech recognition system for recognizing speech, including overlapping speech by multiple speakers. The system includes a hardware processor and a computer storage memory that stores data and computer-executable instructions which, when executed by the processor, implement a stored speech recognition network. An input interface receives an acoustic signal that includes a mixture of speech signals by multiple speakers, wherein the multiple speakers include target speakers. An encoder network and a decoder network of the stored speech recognition network are trained to transform the received acoustic signal into a text for each target speaker, such that the encoder network outputs a set of recognition encodings and the decoder network uses the set of recognition encodings to output the text for each target speaker. An output interface transmits the text for each target speaker.
Type: Application
Filed: April 13, 2018
Publication date: October 17, 2019
Inventors: Jonathan Le Roux, Takaaki Hori, Shane Settle, Hiroshi Seki, Shinji Watanabe, John Hershey
-
Patent number: 10417498
Abstract: A system for generating a word sequence includes one or more processors in connection with a memory and one or more storage devices storing instructions causing operations that include receiving first and second input vectors, extracting first and second feature vectors, estimating a first set of weights and a second set of weights, calculating a first content vector from the first set of weights and the first feature vectors, and calculating a second content vector, transforming the first content vector into a first modal content vector having a predetermined dimension and transforming the second content vector into a second modal content vector having the predetermined dimension, estimating a set of modal attention weights, generating a weighted content vector having the predetermined dimension from the set of modal attention weights and the first and second modal content vectors, and generating a predicted word using the sequence generator.
Type: Grant
Filed: March 29, 2017
Date of Patent: September 17, 2019
Assignee: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Chiori Hori, Takaaki Hori, John Hershey, Tim Marks
-
Publication number: 20190189111
Abstract: A method for training a multi-language speech recognition network includes providing utterance datasets corresponding to predetermined languages, inserting language identification (ID) labels into the utterance datasets, wherein each of the utterance datasets is labelled by each of the language ID labels, concatenating the labeled utterance datasets, generating initial network parameters from the utterance datasets, selecting the initial network parameters according to a predetermined sequence, and training, iteratively, an end-to-end network with a series of the selected initial network parameters and the concatenated labeled utterance datasets until a training result reaches a threshold.
Type: Application
Filed: December 15, 2017
Publication date: June 20, 2019
Inventors: Shinji Watanabe, Takaaki Hori, Hiroshi Seki, Jonathan Le Roux, John Hershey
-
Publication number: 20190189115
Abstract: A speech recognition system includes an input device to receive voice sounds, one or more processors, and one or more storage devices storing parameters and program modules including instructions which cause the one or more processors to perform operations.
Type: Application
Filed: December 15, 2017
Publication date: June 20, 2019
Inventors: Takaaki Hori, Shinji Watanabe, John Hershey
-
Patent number: 10176799
Abstract: A method for training a language model to reduce recognition errors, wherein the language model is a recurrent neural network language model (RNNLM). Training samples are first acquired, and an automatic speech recognition (ASR) system is applied to the training samples to produce recognized words and probabilities of the recognized words. An N-best list is selected from the recognized words based on the probabilities, and word errors are determined using reference data for the hypotheses in the N-best list. The hypotheses are rescored using the RNNLM, gradients are determined for the hypotheses using the word errors and for the words in the hypotheses, and the parameters of the RNNLM are updated using a sum of the gradients.
Type: Grant
Filed: February 2, 2016
Date of Patent: January 8, 2019
Assignee: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Takaaki Hori, Chiori Hori, Shinji Watanabe, John Hershey
-
Publication number: 20180330718
Abstract: A speech recognition system includes an input device to receive voice sounds, one or more processors, and one or more storage devices storing parameters and program modules including instructions executable by the one or more processors.
Type: Application
Filed: May 11, 2017
Publication date: November 15, 2018
Applicant: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Takaaki Hori, Shinji Watanabe, John Hershey
-
Publication number: 20180261225
Abstract: A speech recognition system includes a plurality of microphones to receive acoustic signals including speech signals, an input interface to generate multichannel inputs from the acoustic signals, one or more storages to store a multichannel speech recognition network, wherein the multichannel speech recognition network comprises mask estimation networks to generate time-frequency masks from the multichannel inputs, a beamformer network trained to select a reference channel input from the multichannel inputs using the time-frequency masks and generate an enhanced speech dataset based on the reference channel input, and an encoder-decoder network trained to transform the enhanced speech dataset into a text. The system further includes one or more processors, using the multichannel speech recognition network in association with the one or more storages, to generate the text from the multichannel inputs, and an output interface to render the text.
Type: Application
Filed: October 3, 2017
Publication date: September 13, 2018
Inventors: Shinji Watanabe, Tsubasa Ochiai, Takaaki Hori, John R. Hershey
-
Publication number: 20180189572
Abstract: A system for generating a word sequence includes one or more processors in connection with a memory and one or more storage devices storing instructions causing operations that include receiving first and second input vectors, extracting first and second feature vectors, estimating a first set of weights and a second set of weights, calculating a first content vector from the first set of weights and the first feature vectors, and calculating a second content vector, transforming the first content vector into a first modal content vector having a predetermined dimension and transforming the second content vector into a second modal content vector having the predetermined dimension, estimating a set of modal attention weights, generating a weighted content vector having the predetermined dimension from the set of modal attention weights and the first and second modal content vectors, and generating a predicted word using the sequence generator.
Type: Application
Filed: March 29, 2017
Publication date: July 5, 2018
Applicant: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Chiori Hori, Takaaki Hori, John Hershey, Tim Marks
-
Publication number: 20180157743
Abstract: A method for performing multi-label classification includes extracting a feature vector from an input vector including input data by a feature extractor, determining, by a label predictor, a relevant vector including relevant labels having relevant scores based on the feature vector, updating a binary masking vector by masking pre-selected labels having been selected in previous label selections, applying the updated binary masking vector to the relevant vector such that the relevant label vector is updated to exclude the pre-selected labels from the relevant labels, and selecting a relevant label from the updated relevant label vector based on the relevant scores of the updated relevant label vector.
Type: Application
Filed: December 7, 2016
Publication date: June 7, 2018
Applicant: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Takaaki Hori, Chiori Hori, Shinji Watanabe, John Hershey, Bret Harsham, Jonathan Le Roux
-
Patent number: 9842106
Abstract: A method and system process utterances that are acquired either from an automatic speech recognition (ASR) system or text. The utterances have associated identities of each party, such as role A utterances and role B utterances. The information corresponding to utterances, such as word sequence and identity, is converted to features. Each feature is received in an input layer of a neural network (NN). A dimensionality of each feature is reduced, in a projection layer of the NN, to produce a reduced dimensional feature. The reduced dimensional feature is processed to provide probabilities of labels for the utterances.
Type: Grant
Filed: December 4, 2015
Date of Patent: December 12, 2017
Assignee: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Chiori Hori, Takaaki Hori, Shinji Watanabe, John Hershey
-
Publication number: 20170221474
Abstract: A method for training a language model to reduce recognition errors, wherein the language model is a recurrent neural network language model (RNNLM). Training samples are first acquired, and an automatic speech recognition (ASR) system is applied to the training samples to produce recognized words and probabilities of the recognized words. An N-best list is selected from the recognized words based on the probabilities, and word errors are determined using reference data for the hypotheses in the N-best list. The hypotheses are rescored using the RNNLM, gradients are determined for the hypotheses using the word errors and for the words in the hypotheses, and the parameters of the RNNLM are updated using a sum of the gradients.
Type: Application
Filed: February 2, 2016
Publication date: August 3, 2017
Inventors: Takaaki Hori, Chiori Hori, Shinji Watanabe, John Hershey
-
Publication number: 20170161256
Abstract: A method and system process utterances that are acquired either from an automatic speech recognition (ASR) system or text. The utterances have associated identities of each party, such as role A utterances and role B utterances. The information corresponding to utterances, such as word sequence and identity, is converted to features. Each feature is received in an input layer of a neural network (NN). A dimensionality of each feature is reduced, in a projection layer of the NN, to produce a reduced dimensional feature. The reduced dimensional feature is processed to provide probabilities of labels for the utterances.
Type: Application
Filed: December 4, 2015
Publication date: June 8, 2017
Inventors: Chiori Hori, Takaaki Hori, Shinji Watanabe, John Hershey
-
Patent number: 9517192
Abstract: A body cosmetic for application to wetted skin after bathing, and the like, which provides high moisturizing effects, spreads well over the entire body, and is easy to apply. An oil-in-water body cosmetic for application to wetted skin, which contains a water-soluble polymer and the following components (A) to (C): (A) from 20 to 50% by mass of oily ingredients containing (A1) oil that is pasty at 25° C. and (A2) polar oil that is liquid at 25° C.
Type: Grant
Filed: December 1, 2014
Date of Patent: December 13, 2016
Assignee: KAO CORPORATION
Inventors: Tomomi Ishii, Takaaki Hori, Kazuhiro Yamaki
-
Publication number: 20150086502
Abstract: A body cosmetic for application to wetted skin after bathing, and the like, which provides high moisturizing effects, spreads well over the entire body, and is easy to apply. An oil-in-water body cosmetic for application to wetted skin, which contains a water-soluble polymer and the following components (A) to (C): (A) from 20 to 50% by mass of oily ingredients containing (A1) oil that is pasty at 25° C. and (A2) polar oil that is liquid at 25° C.
Type: Application
Filed: December 1, 2014
Publication date: March 26, 2015
Applicant: KAO CORPORATION
Inventors: Tomomi Ishii, Takaaki Hori, Kazuhiro Yamaki
-
Patent number: 8933125
Abstract: A body cosmetic for application to wetted skin after bathing, and the like, which provides high moisturizing effects, spreads well over the entire body, and is easy to apply. An oil-in-water body cosmetic for application to wetted skin, which contains a water-soluble polymer and the following components (A) to (C): (A) from 20 to 50% by mass of oily ingredients containing (A1) oil that is pasty at 25° C. and (A2) polar oil that is liquid at 25° C.; (B) from 11 to 50% by mass of glycerin; and (C) from 20 to 60% by mass of water. In this body cosmetic for application to wetted skin, the content of the pasty oil (A1) is from 1 to 20% by mass, and the content of the liquid polar oil (A2) is from 1 to 20% by mass.
Type: Grant
Filed: July 30, 2008
Date of Patent: January 13, 2015
Assignee: Kao Corporation
Inventors: Tomomi Ishii, Takaaki Hori, Kazuhiro Yamaki