Patents by Inventor Chiori Hori

Chiori Hori has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Long-context end-to-end speech recognition system

Patent number: 11978435

Abstract: This invention relates generally to speech processing and more particularly to end-to-end automatic speech recognition (ASR) that utilizes long contextual information. Some embodiments of the invention provide a system and a method for end-to-end ASR suitable for recognizing long audio recordings such as lectures and conversational speech. This disclosure includes a Transformer-based ASR system that utilizes contextual information, wherein the Transformer accepts multiple utterances at the same time and predicts the transcript for the last utterance. This is repeated in a sliding-window fashion with one-utterance shifts to recognize the entire recording. In addition, some embodiments of the present invention may use acoustic and/or text features obtained from only the previous utterances spoken by the same speaker as the last utterance when the long audio recording includes multiple speakers.
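The sliding-window scheme in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented system: it only builds the index windows that would be fed to the Transformer, with the last index in each window being the utterance to transcribe. The function name `context_windows`, the `window` size, and the `same_speaker` flag are hypothetical, and the Transformer itself is not shown.

```python
from typing import List, Tuple

Utterance = Tuple[str, str]  # (speaker_id, utterance payload such as features or an ID)


def context_windows(utterances: List[Utterance], window: int,
                    same_speaker: bool = False) -> List[List[int]]:
    """For each utterance t, return the indices jointly fed to the model.

    The last index is the utterance to transcribe; the preceding ones (at most
    window - 1) are context. Advancing t by one gives the one-utterance shift.
    With same_speaker=True, only earlier utterances by the target's speaker count.
    """
    windows = []
    for t, (target_speaker, _) in enumerate(utterances):
        context: List[int] = []
        for j in range(t - 1, -1, -1):  # walk backwards from the target
            if len(context) == window - 1:
                break
            if same_speaker and utterances[j][0] != target_speaker:
                continue  # skip context from other speakers
            context.append(j)
        windows.append(sorted(context) + [t])
    return windows


utts = [("A", "u0"), ("B", "u1"), ("A", "u2"), ("B", "u3")]
print(context_windows(utts, 3))                     # [[0], [0, 1], [0, 1, 2], [1, 2, 3]]
print(context_windows(utts, 3, same_speaker=True))  # [[0], [1], [0, 2], [1, 3]]
```

With `same_speaker=True`, utterances by other speakers are skipped when collecting context, mirroring the multi-speaker variant mentioned in the abstract.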

Type: Grant

Filed: October 13, 2020

Date of Patent: May 7, 2024

Assignee: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Takaaki Hori, Niko Moritz, Chiori Hori, Jonathan Le Roux

Low-latency Captioning System

Publication number: 20240046085

Abstract: An artificial intelligence (AI) low-latency processing system is provided. The low-latency processing system includes a processor; and a memory having instructions stored thereon. The low-latency processing system is configured to collect a sequence of frames jointly including information dispersed among at least some frames in the sequence of frames, execute a timing neural network trained to identify an early subsequence of frames in the sequence of frames including at least a portion of the information indicative of the information, and execute a decoding neural network trained to decode the information from the portion of the information in the subsequence of frames, wherein the timing neural network is jointly trained with the decoding neural network to iteratively identify the smallest number of subframes from the beginning of a training sequence of frames containing a portion of training information sufficient to decode the training information.
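The training target described in this abstract, the smallest number of frames from the beginning of a sequence that still suffices to decode its information, can be illustrated by brute force. This sketch assumes a deterministic stand-in decoder (`toy_decode`, hypothetical) and compares each prefix's output with the full-sequence output; the application instead trains a timing network jointly with a decoding network, which this code does not attempt.

```python
from typing import Callable, Sequence


def earliest_sufficient_prefix(frames: Sequence[str],
                               decode: Callable[[Sequence[str]], str]) -> int:
    """Smallest k such that decoding frames[:k] gives the same result as decoding all frames."""
    target = decode(frames)
    for k in range(1, len(frames) + 1):
        if decode(frames[:k]) == target:
            return k
    return len(frames)  # only reached for an empty sequence


def toy_decode(frames: Sequence[str]) -> str:
    """Hypothetical decoder: the caption is the sorted set of non-blank event labels."""
    return " ".join(sorted({f for f in frames if f != "-"}))


frames = ["-", "dog", "-", "dog", "ball", "-"]
print(earliest_sufficient_prefix(frames, toy_decode))  # 5
```

Here the fifth frame is the first point at which every event in the full sequence has been seen, so a caption produced there would match one produced at the end.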

Type: Application

Filed: August 4, 2022

Publication date: February 8, 2024

Applicant: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Chiori Hori, Jonathan Le Roux, Anoop Cherian, Tim Marks

System and method for using human relationship structures for email classification

Patent number: 11651222

Abstract: Systems and methods are provided for a computer system that detects anomalies in incoming communication from a sender to a receiver. The system accepts a relationship structure defining a trained association model between the sender and the receiver, and the incoming communication. It accesses neural networks trained to detect anomalies in the incoming communication and classify the anomalies by type, subject to correspondence between the content of the incoming communication and the trained association model between the sender and the receiver. It computes an updated association model based on the sender's and the receiver's organizational indications, using the content of the incoming communication. It then executes the neural networks by submitting the incoming communication and the updated association model to produce an anomaly detection result and an anomaly classification type.
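The processing steps in this abstract can be illustrated with a toy, rule-based sketch. Everything here is an assumption for illustration: the org chart `ORG_CHART`, the relation labels, the keyword cues, and the anomaly types are hypothetical, and simple rules stand in for the trained neural networks the patent describes.

```python
from typing import Dict, Optional, Tuple

# Hypothetical relationship structure: employee -> manager (None at the top).
ORG_CHART: Dict[str, Optional[str]] = {"ceo": None, "alice": "ceo", "bob": "alice"}


def association(sender: str, receiver: str, chart: Dict[str, Optional[str]]) -> str:
    """Relationship between sender and receiver as recorded in the org chart."""
    if sender not in chart:
        return "external"
    if chart.get(receiver) == sender:
        return "manager_to_report"
    if chart.get(sender) == receiver:
        return "report_to_manager"
    return "colleague"


def update_association(sender: str, receiver: str, body: str,
                       chart: Dict[str, Optional[str]]) -> Dict[str, object]:
    """Combine the stored association with organizational cues found in the message."""
    text = body.lower()
    return {
        "relation": association(sender, receiver, chart),
        "claims_executive": "ceo" in text,  # e.g. a claimed title in a signature
        "payment_request": "wire transfer" in text or "gift card" in text,
    }


def classify(sender: str, receiver: str, body: str,
             chart: Dict[str, Optional[str]] = ORG_CHART) -> Tuple[bool, str]:
    """Rule-based stand-in for the trained anomaly detector and classifier."""
    model = update_association(sender, receiver, body, chart)
    if model["relation"] == "external" and model["claims_executive"]:
        return True, "impersonation"
    if model["relation"] == "external" and model["payment_request"]:
        return True, "fraud_request"
    if model["relation"] == "report_to_manager" and model["payment_request"]:
        return True, "unusual_request"
    return False, "normal"


print(classify("mallory", "alice", "As CEO I need a wire transfer today"))  # (True, 'impersonation')
print(classify("alice", "bob", "Please review the draft"))                  # (False, 'normal')
```

The point of the structure is that the same message text can be normal or anomalous depending on the sender-receiver relationship, which is why the association model is an input alongside the content.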

Type: Grant

Filed: April 3, 2020

Date of Patent: May 16, 2023

Assignee: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Bret Harsham, Chiori Hori

Method and system for scene-aware interaction

Patent number: 11635299

Abstract: A navigation system for providing driving instructions to a driver of a vehicle traveling on a route is provided. The driving instructions are generated by executing a multimodal fusion method that comprises extracting features from sensor measurements, annotating the features with directions for the vehicle to follow the route with respect to objects sensed by the sensors, and encoding the annotated features with a multimodal attention neural network to produce encodings. The encodings are transformed into a common latent space, and the transformed encodings are fused using an attention mechanism producing an encoded representation of the scene. The method further comprises decoding the encoded representation with a sentence generation neural network to generate a driving instruction and submitting the driving instruction to an output device.

Type: Grant

Filed: February 6, 2020

Date of Patent: April 25, 2023

Inventors: Chiori Hori, Anoop Cherian, Siheng Chen, Tim Marks, Jonathan Le Roux, Takaaki Hori, Bret Harsham, Anthony Vetro, Alan Sullivan

Scene-aware video encoder system and method

Patent number: 11582485

Abstract: Embodiments of the present disclosure disclose a scene-aware video encoder system. The scene-aware encoder system transforms a sequence of video frames of a video of a scene into a spatio-temporal scene graph. The spatio-temporal scene graph includes nodes representing one or multiple static and dynamic objects in the scene. Each node of the spatio-temporal scene graph describes an appearance, a location, and/or a motion of each of the objects (static and dynamic objects) at different time instances. The nodes of the spatio-temporal scene graph are embedded into a latent space using a spatio-temporal transformer encoding different combinations of different nodes of the spatio-temporal scene graph corresponding to different spatio-temporal volumes of the scene. Each node of the different nodes encoded in each of the combinations is weighted with an attention score determined as a function of similarities of spatio-temporal locations of the different nodes in the combination.
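The attention weighting in the last sentence can be sketched as below. The abstract says only that attention scores are a function of the similarity of spatio-temporal locations; the Gaussian kernel on squared distance, the `sigma` parameter, and the function names are assumptions, and the learned transformer projections are omitted.

```python
import math
from typing import List, Sequence, Tuple

Location = Tuple[float, float, float]   # (x, y, t)
Node = Tuple[Location, List[float]]     # (spatio-temporal location, feature vector)


def location_attention(nodes: Sequence[Node], query: int, sigma: float = 1.0) -> List[float]:
    """Softmax over negative squared distances between each node's location
    and the query node's location (a Gaussian similarity kernel)."""
    q_loc = nodes[query][0]
    logits = [-sum((a - b) ** 2 for a, b in zip(loc, q_loc)) / (2 * sigma ** 2)
              for loc, _ in nodes]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attend(nodes: Sequence[Node], query: int, sigma: float = 1.0) -> List[float]:
    """Attention-weighted sum of node features for one query node."""
    weights = location_attention(nodes, query, sigma)
    dim = len(nodes[0][1])
    return [sum(w * feat[d] for w, (_, feat) in zip(weights, nodes)) for d in range(dim)]


nodes = [((0.0, 0.0, 0.0), [1.0, 0.0]),   # an object
         ((0.0, 0.0, 1.0), [0.0, 1.0]),   # same place, next time step
         ((5.0, 5.0, 5.0), [1.0, 1.0])]   # far away in space and time
weights = location_attention(nodes, query=0)
embedding = attend(nodes, query=0)
```

Nodes that are close in both space and time dominate the embedding, while distant nodes receive near-zero weight.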

Type: Grant

Filed: February 7, 2022

Date of Patent: February 14, 2023

Assignee: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Anoop Cherian, Chiori Hori, Jonathan Le Roux, Tim Marks, Alan Sullivan

Low-latency captioning system

Patent number: 11445267

Abstract: A scene captioning system is provided. The scene captioning system includes an interface configured to acquire a stream of scene data signals including frames and sound data, a memory to store a computer-executable scene captioning model including a scene encoder, a timing decoder, a timing detector, and a caption decoder, wherein the audio-visual encoder is shared by the timing decoder, the timing detector, and the caption decoder, and a processor, in connection with the memory. The processor is configured to perform steps of extracting scene features from the scene data signals by use of the audio-visual encoder, determining a timing of generating a caption by use of the timing detector, wherein the timing is arranged at an early stage of the stream of scene data signals, and generating the caption based on the scene features by using the caption decoder according to the timing.
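The inference loop described here, in which a shared encoder feeds both a timing detector and a caption decoder and a caption is emitted as soon as the detector fires, can be sketched as follows. The callables, the `threshold` value, and the toy lambdas in the usage are hypothetical placeholders for the trained networks.

```python
from typing import Callable, Iterable, List, Optional, Sequence, Tuple

Features = List[float]


def stream_caption(frames: Iterable[float],
                   encode: Callable[[Sequence[float]], Features],
                   timing_score: Callable[[Features], float],
                   decode_caption: Callable[[Features], str],
                   threshold: float = 0.5) -> Optional[Tuple[int, str]]:
    """Consume frames one by one and caption as soon as the timing detector fires.

    Returns (frames consumed, caption), or None if the detector never fires.
    """
    seen: List[float] = []
    for frame in frames:
        seen.append(frame)
        features = encode(seen)  # shared encoder output, reused by both heads
        if timing_score(features) >= threshold:  # timing detector
            return len(seen), decode_caption(features)  # caption decoder
    return None


result = stream_caption(
    [0.1, 0.2, 0.3, 0.4],
    encode=lambda fs: [sum(fs)],              # toy "encoder": accumulated evidence
    timing_score=lambda f: min(1.0, f[0]),    # toy detector score in [0, 1]
    decode_caption=lambda f: "a person opens a door",
)
print(result)  # (3, 'a person opens a door')
```

Because the encoder output is computed once per step and shared, the decoder runs only once, at the detected timing, rather than after the whole stream has arrived.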

Type: Grant

Filed: July 23, 2021

Date of Patent: September 13, 2022

Assignee: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Chiori Hori, Takaaki Hori, Anoop Cherian, Tim Marks, Jonathan Le Roux

Long-context End-to-end Speech Recognition System

Publication number: 20220115006

Abstract: This invention relates generally to speech processing and more particularly to end-to-end automatic speech recognition (ASR) that utilizes long contextual information. Some embodiments of the invention provide a system and a method for end-to-end ASR suitable for recognizing long audio recordings such as lectures and conversational speech. This disclosure includes a Transformer-based ASR system that utilizes contextual information, wherein the Transformer accepts multiple utterances at the same time and predicts the transcript for the last utterance. This is repeated in a sliding-window fashion with one-utterance shifts to recognize the entire recording. In addition, some embodiments of the present invention may use acoustic and/or text features obtained from only the previous utterances spoken by the same speaker as the last utterance when the long audio recording includes multiple speakers.

Type: Application

Filed: October 13, 2020

Publication date: April 14, 2022

Applicant: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Takaaki Hori, Niko Moritz, Chiori Hori, Jonathan Le Roux

Multi-Dimensional Deep Neural Network

Publication number: 20220076100

Abstract: An artificial intelligence (AI) system is disclosed. The AI system comprises an input interface to accept input data; a memory storing a multi-dimensional neural network having a sequence of deep neural networks (DNNs) with an inner DNN and an outer DNN; a processor configured to submit the input data to the multi-dimensional neural network to produce an output of the outer DNN and an output interface to render at least a function of the output. Each DNN processes the input data sequentially by a sequence of layers along a first dimension of data propagation. The DNNs are arranged along a second dimension of data propagation from the inner DNN to the outer DNN. Further, the DNNs are connected such that an output of at least one layer of a DNN is combined with an input to at least one layer of subsequent DNN in the sequence of DNNs.
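A minimal sketch of the two propagation dimensions, assuming all DNNs have the same depth and layer width and that "combined" means element-wise addition (the abstract does not specify the combination). Layers are random, untrained `tanh` layers built by the hypothetical helper `dense`; only the wiring is meant to be illustrative.

```python
import math
import random
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def dense(in_dim: int, out_dim: int, rng: random.Random) -> Layer:
    """Randomly initialised fully connected layer with tanh activation (untrained)."""
    w = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]
    return lambda x: [math.tanh(sum(wi * xi for wi, xi in zip(row, x))) for row in w]


def run_multidimensional(x: Vector, dnns: List[List[Layer]]) -> Vector:
    """Propagate x layer by layer through each DNN (first dimension) and from
    the inner DNN to the outer DNN (second dimension)."""
    prev_outputs = None  # per-layer outputs of the preceding DNN
    h = x
    for dnn in dnns:  # second dimension: inner -> outer
        h = x
        outputs = []
        for depth, layer in enumerate(dnn):  # first dimension: layer by layer
            if prev_outputs is not None:
                # Combine this layer's input with the same-depth layer output
                # of the preceding DNN (element-wise addition is an assumption).
                h = [a + b for a, b in zip(h, prev_outputs[depth])]
            h = layer(h)
            outputs.append(h)
        prev_outputs = outputs
    return h  # output of the outer DNN


rng = random.Random(0)
dim = 4
dnns = [[dense(dim, dim, rng) for _ in range(3)] for _ in range(2)]  # inner, outer
y = run_multidimensional([1.0, 0.5, -0.5, 0.0], dnns)
```

With a single DNN in the list, the function reduces to an ordinary feed-forward stack; each added DNN receives lateral inputs from the one before it.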

Type: Application

Filed: September 10, 2020

Publication date: March 10, 2022

Applicant: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Chiori Hori, Peng Gao, Shijie Geng, Takaaki Hori, Jonathan Le Roux

System and method for a dialogue response generation system

Patent number: 11264009

Abstract: A computer-implemented method for training a dialogue response generation system and the dialogue response generation system are provided. The method includes arranging a first multimodal encoder-decoder for the dialogue response generation or video description having a first input and a first output, wherein the first multimodal encoder-decoder has been pretrained by training audio-video datasets with training video description sentences, arranging a second multimodal encoder-decoder for dialog response generation having a second input and a second output, providing first audio-visual datasets with first corresponding video description sentences to the first input of the first multimodal encoder-decoder, wherein the first encoder-decoder generates first output values based on the first audio-visual datasets with the first corresponding description sentences, providing the first audio-visual datasets excluding the first corresponding video description sentences to the second multimodal encoder-decoder.
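The abstract is truncated before it states how the two encoder-decoders are trained together. One plausible reading, assumed here and not confirmed by the excerpt, is a teacher-student setup in which the second (student) model, which never sees the description, is trained to match the first (teacher) model's outputs. The sketch shows only the input arrangement and a KL-divergence matching loss; `make_inputs`, `distillation_loss`, and the sample keys are hypothetical.

```python
import math
from typing import Dict, List


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) for two distributions over the same output vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def make_inputs(sample: Dict[str, object]) -> Dict[str, Dict[str, object]]:
    """Teacher input keeps the video description; student input excludes it."""
    return {
        "teacher": {"audio_visual": sample["audio_visual"], "description": sample["description"]},
        "student": {"audio_visual": sample["audio_visual"]},
    }


def distillation_loss(teacher_logits: List[float], student_logits: List[float]) -> float:
    """Penalise the student for diverging from the teacher's output distribution."""
    return kl_divergence(softmax(teacher_logits), softmax(student_logits))


inputs = make_inputs({"audio_visual": [0.2, 0.7], "description": "a man is cooking"})
loss = distillation_loss([2.0, 0.5, -1.0], [1.0, 1.0, 0.0])
```

The loss is zero only when the student reproduces the teacher's distribution, which would let the student benefit from description-informed outputs at training time while needing only audio-visual input at inference time.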

Type: Grant

Filed: September 13, 2019

Date of Patent: March 1, 2022

Assignee: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Chiori Hori, Anoop Cherian, Tim Marks, Takaaki Hori

Scene-aware video dialog

Patent number: 11210523

Abstract: A scene-aware dialog system includes an input interface to receive a sequence of video frames, contextual information, and a query, and a memory configured to store neural networks trained to generate a response to the input query by analyzing one or a combination of the input sequence of video frames and the input contextual information. The system further includes a processor configured to detect and classify objects in each video frame of the sequence of video frames; determine relationships among the classified objects in each video frame; extract features representing the classified objects and the determined relationships for each video frame to produce a sequence of feature vectors; and submit the sequence of feature vectors, the input query, and the input contextual information to the neural networks to generate a response to the input query.
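The per-frame feature extraction can be sketched as below, assuming object detections (labels and bounding boxes) are already available. The class list `VOCAB`, the geometric predicates in `PREDICATES`, and the count-based feature vector are illustrative assumptions; the patented system uses trained neural networks to classify objects and relationships.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), y pointing down
Detection = Tuple[str, Box]              # (object class, bounding box)

VOCAB = ["person", "dog", "cup"]         # hypothetical object classes
PREDICATES = ["left_of", "above"]        # hypothetical relationship types


def centre(box: Box) -> Tuple[float, float]:
    return (box[0] + box[2]) / 2, (box[1] + box[3]) / 2


def relations(dets: List[Detection]) -> List[Tuple[str, str, str]]:
    """(subject, predicate, object) triples from bounding-box centres."""
    out = []
    for i, (li, bi) in enumerate(dets):
        xi, yi = centre(bi)
        for j, (lj, bj) in enumerate(dets):
            if i == j:
                continue
            xj, yj = centre(bj)
            if xi < xj:
                out.append((li, "left_of", lj))
            if yi < yj:
                out.append((li, "above", lj))
    return out


def frame_feature(dets: List[Detection]) -> List[float]:
    """Object counts followed by relationship counts for one frame."""
    objs = [0.0] * len(VOCAB)
    for label, _ in dets:
        objs[VOCAB.index(label)] += 1
    rels = [0.0] * len(PREDICATES)
    for _, predicate, _ in relations(dets):
        rels[PREDICATES.index(predicate)] += 1
    return objs + rels


video = [
    [("person", (0, 0, 2, 4)), ("dog", (3, 2, 5, 4))],
    [("person", (1, 0, 3, 4))],
]
sequence = [frame_feature(frame) for frame in video]
print(sequence)  # [[1.0, 1.0, 0.0, 1.0, 1.0], [1.0, 0.0, 0.0, 0.0, 0.0]]
```

The resulting `sequence` is the kind of per-frame feature-vector sequence that, together with the query and contextual information, would be passed to the response-generating networks.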

Type: Grant

Filed: February 6, 2020

Date of Patent: December 28, 2021

Assignee: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Shijie Geng, Peng Gao, Anoop Cherian, Chiori Hori, Jonathan Le Roux

System and Method for Using Human Relationship Structures for Email Classification

Publication number: 20210312395

Abstract: Systems and methods are provided for a computer system that detects anomalies in incoming communication from a sender to a receiver. The system accepts a relationship structure defining a trained association model between the sender and the receiver, and the incoming communication. It accesses neural networks trained to detect anomalies in the incoming communication and classify the anomalies by type, subject to correspondence between the content of the incoming communication and the trained association model between the sender and the receiver. It computes an updated association model based on the sender's and the receiver's organizational indications, using the content of the incoming communication. It then executes the neural networks by submitting the incoming communication and the updated association model to produce an anomaly detection result and an anomaly classification type.

Type: Application

Filed: April 3, 2020

Publication date: October 7, 2021

Applicant: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Bret Harsham, Chiori Hori

Scene-Aware Video Dialog

Publication number: 20210248375

Abstract: A scene-aware dialog system includes an input interface to receive a sequence of video frames, contextual information, and a query, and a memory configured to store neural networks trained to generate a response to the input query by analyzing one or a combination of the input sequence of video frames and the input contextual information. The system further includes a processor configured to detect and classify objects in each video frame of the sequence of video frames; determine relationships among the classified objects in each video frame; extract features representing the classified objects and the determined relationships for each video frame to produce a sequence of feature vectors; and submit the sequence of feature vectors, the input query, and the input contextual information to the neural networks to generate a response to the input query.

Type: Application

Filed: February 6, 2020

Publication date: August 12, 2021

Applicant: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Shijie Geng, Peng Gao, Anoop Cherian, Chiori Hori, Jonathan Le Roux

Method and System for Scene-Aware Interaction

Publication number: 20210247201

Abstract: A navigation system configured to provide driving instructions to a driver of a moving vehicle based on real-time description of objects in a scene pertinent to driving the vehicle is provided.

Type: Application

Filed: February 6, 2020

Publication date: August 12, 2021

Applicant: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Chiori Hori, Anoop Cherian, Siheng Chen, Tim Marks, Jonathan Le Roux, Takaaki Hori, Bret Harsham, Anthony Vetro, Alan Sullivan

Method and system for multi-label classification

Patent number: 11086918

Abstract: A method for performing multi-label classification includes extracting, by a feature extractor, a feature vector from an input vector including input data; determining, by a label predictor, a relevant label vector including relevant labels having relevant scores based on the feature vector; updating a binary masking vector by masking pre-selected labels that were selected in previous label selections; applying the updated binary masking vector to the relevant label vector such that the relevant label vector is updated to exclude the pre-selected labels from the relevant labels; and selecting a relevant label from the updated relevant label vector based on the relevant scores of the updated relevant label vector.
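The masking loop in this abstract can be sketched as follows. The label predictor is modeled as a hypothetical callable that may re-score labels given the labels selected so far, and a fixed `num_labels` is assumed as the stopping rule; the abstract does not say how selection terminates.

```python
from typing import Callable, List

Predictor = Callable[[List[int]], List[float]]


def select_labels(predict: Predictor, num_classes: int, num_labels: int) -> List[int]:
    """Select labels one at a time, masking every label chosen in earlier steps."""
    mask = [1] * num_classes  # binary masking vector: 1 = still selectable
    selected: List[int] = []
    for _ in range(min(num_labels, num_classes)):
        scores = predict(selected)  # relevance scores, possibly conditioned on history
        # Apply the mask so pre-selected labels are excluded from this step.
        masked = [s if m else float("-inf") for s, m in zip(scores, mask)]
        best = max(range(num_classes), key=masked.__getitem__)
        selected.append(best)
        mask[best] = 0  # update the mask for the next selection
    return selected


def static_scores(history: List[int]) -> List[float]:
    """Toy predictor that ignores history and always returns the same scores."""
    return [0.1, 0.9, 0.4, 0.8]


print(select_labels(static_scores, num_classes=4, num_labels=3))  # [1, 3, 2]
```

Without the mask, a predictor whose top score does not change would return label 1 at every step; the mask is what guarantees distinct labels.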

Type: Grant

Filed: December 7, 2016

Date of Patent: August 10, 2021

Assignee: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Takaaki Hori, Chiori Hori, Shinji Watanabe, John Hershey, Bret Harsham, Jonathan Le Roux

Position estimation under multipath transmission

Patent number: 11079495

Abstract: A positioning system for tracking a position of a vehicle includes a receiver configured to receive phase measurements of satellite signals received at multiple instances of time from multiple satellites, and a memory configured to store a recurrent neural network trained to determine a position of the vehicle from a set of phase measurements in a presence of noise caused by a multipath transmission of at least some of the satellite signals at some instances of time. A processor of the positioning system is configured to track the position of the vehicle over different instances of time by processing the set of phase measurements received at each instance of time with the recurrent neural network to produce the position of the vehicle at each instance of time.
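The recurrent tracking loop can be sketched as a plain Elman RNN that consumes one vector of phase measurements per time instance and emits an (x, y, z) estimate. The weights are random placeholders rather than a trained model, and the class and parameter names are hypothetical; the point is only the data flow, in which the hidden state carries information across time instances.

```python
import math
import random
from typing import List

Vector = List[float]


def matvec(m: List[Vector], v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class PositionRNN:
    """Elman RNN: phase measurements at one time instance -> (x, y, z) estimate.

    Weights are random placeholders; a real system would train them on data
    that includes multipath-corrupted measurements.
    """

    def __init__(self, num_satellites: int, hidden: int = 8, seed: int = 0):
        rng = random.Random(seed)

        def init(rows: int, cols: int) -> List[Vector]:
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.w_x = init(hidden, num_satellites)  # input-to-hidden
        self.w_h = init(hidden, hidden)          # hidden-to-hidden (recurrence)
        self.w_o = init(3, hidden)               # hidden-to-position
        self.h = [0.0] * hidden                  # state carried across time instances

    def step(self, phases: Vector) -> Vector:
        pre = [a + b for a, b in zip(matvec(self.w_x, phases), matvec(self.w_h, self.h))]
        self.h = [math.tanh(v) for v in pre]
        return matvec(self.w_o, self.h)


def track(model: PositionRNN, epochs: List[Vector]) -> List[Vector]:
    """Run the RNN over successive time instances, one position per instance."""
    return [model.step(phases) for phases in epochs]


model = PositionRNN(num_satellites=4)
positions = track(model, [[0.1, 0.2, 0.3, 0.4]] * 5)
```

A recurrent model fits this setting because multipath noise at one time instance can be partly compensated by what the state remembers from earlier, cleaner measurements.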

Type: Grant

Filed: October 31, 2018

Date of Patent: August 3, 2021

Assignee: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Kyeong Jin Kim, Philip Orlik, Chiori Hori

System and Method for a Dialogue Response Generation System

Publication number: 20210082398

Abstract: A computer-implemented method for training a dialogue response generation system and the dialogue response generation system are provided. The method includes arranging a first multimodal encoder-decoder for the dialogue response generation or video description having a first input and a first output, wherein the first multimodal encoder-decoder has been pretrained by training audio-video datasets with training video description sentences, arranging a second multimodal encoder-decoder for dialog response generation having a second input and a second output, providing first audio-visual datasets with first corresponding video description sentences to the first input of the first multimodal encoder-decoder, wherein the first encoder-decoder generates first output values based on the first audio-visual datasets with the first corresponding description sentences, providing the first audio-visual datasets excluding the first corresponding video description sentences to the second multimodal encoder-decoder.

Type: Application

Filed: September 13, 2019

Publication date: March 18, 2021

Inventors: Chiori Hori, Anoop Cherian, Tim Marks, Takaaki Hori

Position Estimation Under Multipath Transmission

Publication number: 20200132861

Abstract: A positioning system for tracking a position of a vehicle includes a receiver configured to receive phase measurements of satellite signals received at multiple instances of time from multiple satellites, and a memory configured to store a recurrent neural network trained to determine a position of the vehicle from a set of phase measurements in a presence of noise caused by a multipath transmission of at least some of the satellite signals at some instances of time. A processor of the positioning system is configured to track the position of the vehicle over different instances of time by processing the set of phase measurements received at each instance of time with the recurrent neural network to produce the position of the vehicle at each instance of time.

Type: Application

Filed: October 31, 2018

Publication date: April 30, 2020

Inventors: Kyeong Jin Kim, Philip Orlik, Chiori Hori

Method and system for multi-modal fusion model

Patent number: 10417498

Abstract: A system for generating a word sequence includes one or more processors in connection with a memory and one or more storage devices storing instructions causing operations that include receiving first and second input vectors, extracting first and second feature vectors, estimating a first set of weights and a second set of weights, calculating a first content vector from the first set of weights and the first feature vectors, and calculating a second content vector from the second set of weights and the second feature vectors, transforming the first content vector into a first modal content vector having a predetermined dimension and transforming the second content vector into a second modal content vector having the predetermined dimension, estimating a set of modal attention weights, generating a weighted content vector having the predetermined dimension from the set of modal attention weights and the first and second modal content vectors, and generating a predicted word using the sequence generator.
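The two attention stages named in this abstract can be sketched as follows: temporal attention within each modality produces a content vector, each content vector is projected to a shared dimension, and modal attention weights combine them. Dot-product scoring against fixed `queries` and `state` vectors is an assumption standing in for the learned weight estimators, and the sequence generator that would consume the fused vector is not shown.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[Vector]


def softmax(xs: Sequence[float]) -> Vector:
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(m: Matrix, v: Sequence[float]) -> Vector:
    return [dot(row, v) for row in m]


def content_vector(features: Sequence[Vector], query: Vector) -> Vector:
    """Temporal attention within one modality: weighted sum of feature vectors."""
    weights = softmax([dot(f, query) for f in features])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(len(features[0]))]


def fuse(modal_features: Sequence[Sequence[Vector]], queries: Sequence[Vector],
         projections: Sequence[Matrix], state: Vector) -> Vector:
    """Project each modality's content vector to a shared dimension, then
    combine the modalities with modal attention weights."""
    modal_vectors = [matvec(p, content_vector(f, q))
                     for f, q, p in zip(modal_features, queries, projections)]
    beta = softmax([dot(v, state) for v in modal_vectors])  # modal attention
    return [sum(b * v[d] for b, v in zip(beta, modal_vectors)) for d in range(len(state))]


image = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # two frames, 3-dim image features
audio = [[0.5, 0.5]]                         # one frame, 2-dim audio features
fused = fuse([image, audio],
             queries=[[0.0, 0.0, 0.0], [0.0, 0.0]],
             projections=[[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]]],
             state=[1.0, -1.0])
```

Projecting first is what makes the second stage possible: modalities with different native feature sizes (3 and 2 here) can only be weighted and summed once they share the predetermined dimension.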

Type: Grant

Filed: March 29, 2017

Date of Patent: September 17, 2019

Assignee: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Chiori Hori, Takaaki Hori, John Hershey, Tim Marks

Method and system for training language models to reduce recognition errors

Patent number: 10176799

Abstract: A method for training a language model to reduce recognition errors, wherein the language model is a recurrent neural network language model (RNNLM), begins by acquiring training samples. An automatic speech recognition (ASR) system is applied to the training samples to produce recognized words and probabilities of the recognized words, and an N-best list is selected from the recognized words based on the probabilities. Word errors are determined using reference data for the hypotheses in the N-best list. The hypotheses are rescored using the RNNLM. Then, gradients are determined for the hypotheses, using the word errors, and for the words in the hypotheses. Lastly, parameters of the RNNLM are updated using a sum of the gradients.
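The abstract does not name the training objective. One common way to turn N-best word errors into hypothesis-level gradients, assumed here, is to minimize the expected word error E = sum_i p_i * e_i under a softmax p over hypothesis scores; its gradient with respect to score s_i is p_i * (e_i - E). The sketch computes word errors by edit distance and those gradients; back-propagating them through the RNNLM's word log-probabilities and summing them into a parameter update is left out.

```python
import math
from typing import List, Sequence


def word_errors(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Levenshtein distance between word sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        cur = [i] + [0] * len(ref)
        for j, r in enumerate(ref, 1):
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (h != r))
        prev = cur
    return prev[-1]


def expected_error_gradients(scores: List[float], errors: List[int]) -> List[float]:
    """Posteriors p_i = softmax(scores); expected error E = sum p_i * e_i.
    Returns dE/d(score_i) = p_i * (e_i - E)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    post = [e / z for e in exps]
    expected = sum(p * e for p, e in zip(post, errors))
    return [p * (e - expected) for p, e in zip(post, errors)]


ref = "the cat sat".split()
nbest = ["the cat sat".split(), "the cap sat".split(), "a cat".split()]
errors = [word_errors(h, ref) for h in nbest]  # [0, 1, 2]
grads = expected_error_gradients([2.0, 1.5, 0.5], errors)
```

Hypotheses with fewer errors than the current expectation get negative gradients, so gradient descent raises their scores relative to worse hypotheses.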

Type: Grant

Filed: February 2, 2016

Date of Patent: January 8, 2019

Assignee: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Takaaki Hori, Chiori Hori, Shinji Watanabe, John Hershey

Method and System for Multi-Modal Fusion Model

Publication number: 20180189572

Abstract: A system for generating a word sequence includes one or more processors in connection with a memory and one or more storage devices storing instructions causing operations that include receiving first and second input vectors, extracting first and second feature vectors, estimating a first set of weights and a second set of weights, calculating a first content vector from the first set of weights and the first feature vectors, and calculating a second content vector from the second set of weights and the second feature vectors, transforming the first content vector into a first modal content vector having a predetermined dimension and transforming the second content vector into a second modal content vector having the predetermined dimension, estimating a set of modal attention weights, generating a weighted content vector having the predetermined dimension from the set of modal attention weights and the first and second modal content vectors, and generating a predicted word using the sequence generator.

Type: Application

Filed: March 29, 2017

Publication date: July 5, 2018

Applicant: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Chiori Hori, Takaaki Hori, John Hershey, Tim Marks
