Patents by Inventor Chiori Hori

Chiori Hori has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11978435
    Abstract: This invention relates generally to speech processing and more particularly to end-to-end automatic speech recognition (ASR) that utilizes long contextual information. Some embodiments of the invention provide a system and a method for end-to-end ASR suitable for recognizing long audio recordings such as lecture and conversational speeches. This disclosure includes a Transformer-based ASR system that utilizes contextual information, wherein the Transformer accepts multiple utterances at the same time and predicts transcript for the last utterance. This is repeated in a sliding-window fashion with one-utterance shifts to recognize the entire recording. In addition, some embodiments of the present invention may use acoustic and/or text features obtained from only the previous utterances spoken by the same speaker as the last utterance when the long audio recording includes multiple speakers.
    Type: Grant
    Filed: October 13, 2020
    Date of Patent: May 7, 2024
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Takaaki Hori, Niko Moritz, Chiori Hori, Jonathan Le Roux
  • Publication number: 20240046085
    Abstract: An artificial intelligence (AI) low-latency processing system is provided. The low-latency processing system includes a processor; and a memory having instructions stored thereon. The low-latency processing system is configured to collect a sequence of frames jointly including information dispersed among at least some frames in the sequence of frames, execute a timing neural network trained to identify an early subsequence of frames in the sequence of frames including at least a portion of the information indicative of the information, and execute a decoding neural network trained to decode the information from the portion of the information in the subsequence of frames, wherein the timing neural network is jointly trained with the decoding neural network to iteratively identify the smallest number of subframes from the beginning of a training sequence of frames containing a portion of training information sufficient to decode the training information.
    Type: Application
    Filed: August 4, 2022
    Publication date: February 8, 2024
    Applicant: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Chiori Hori, Jonathan Le Roux, Anoop Cherian, 02139 Marks
  • Patent number: 11651222
    Abstract: Systems and methods for a computer system for detecting anomalies in incoming communication from a sender to a receiver. Accepting a relationship structure defining a trained association model between the sender and the receiver, and the incoming communication. Accessing neural networks trained to detect anomalies in the incoming communication and classify the anomalies by type, subject to correspondence between content of the incoming communication and the trained association model between the sender and the receiver. Compute an updated association model, based on sender and the receivers organizational indications using the content of the incoming communication. Execute the neural networks by submitting the incoming communication and the updated association model to produce a result of anomaly detection and anomaly classification type.
    Type: Grant
    Filed: April 3, 2020
    Date of Patent: May 16, 2023
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Bret Harsham, Chiori Hori
  • Patent number: 11635299
    Abstract: A navigation system for providing driving instructions to a driver of a vehicle traveling on a route is provided. The driving instructions are generated by executing a multimodal fusion method that comprises extracting features from sensor measurements, annotating the features with directions for the vehicle to follow the route with respect to objects sensed by the sensors, and encoding the annotated features with a multimodal attention neural network to produce encodings. The encodings are transformed into a common latent space, and the transformed encodings are fused using an attention mechanism producing an encoded representation of the scene. The method further comprises decoding the encoded representation with a sentence generation neural network to generate a driving instruction and submitting the driving instruction to an output device.
    Type: Grant
    Filed: February 6, 2020
    Date of Patent: April 25, 2023
    Inventors: Chiori Hori, Anoop Cherian, Siheng Chen, Tim Marks, Jonathan Le Roux, Takaaki Hori, Bret Harsham, Anthony Vetro, Alan Sullivan
  • Patent number: 11582485
    Abstract: Embodiments of the present disclosure discloses a scene-aware video encoder system. The scene-aware encoder system transforms a sequence of video frames of a video of a scene into a spatio-temporal scene graph. The spatio-temporal scene graph includes nodes representing one or multiple static and dynamic objects in the scene. Each node of the spatio-temporal scene graph describes an appearance, a location, and/or a motion of each of the objects (static and dynamic objects) at different time instances. The nodes of the spatio-temporal scene graph are embedded into a latent space using a spatio-temporal transformer encoding different combinations of different nodes of the spatio-temporal scene graph corresponding to different spatio-temporal volumes of the scene. Each node of the different nodes encoded in each of the combinations is weighted with an attention score determined as a function of similarities of spatio-temporal locations of the different nodes in the combination.
    Type: Grant
    Filed: February 7, 2022
    Date of Patent: February 14, 2023
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Anoop Cherian, Chiori Hori, Jonathan Le Roux, Tim Marks, Alan Sullivan
  • Patent number: 11445267
    Abstract: A scene captioning system is provided. The scene captioning system includes an interface configured to acquire a stream of scene data signals including frames and sound data, a memory to store a computer-executable scene captioning model including a scene encoder, a timing decoder, a timing detector, and a caption decoder, wherein the audio-visual encoder is shared by the timing decoder and the timing detector and the caption decoder, and a processor, in connection with the memory. The processor is configured to perform steps of extracting scene features from the scene data signals by use of the audio-visual encoder, determining a timing of generating a caption by use of the timing detector, wherein the timing is arranged an early stage of the stream of scene data signals, and generating the caption based on the scene features by using the caption decoder according to the timing.
    Type: Grant
    Filed: July 23, 2021
    Date of Patent: September 13, 2022
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Chiori Hori, Takaaki Hori, Anoop Cherian, Tim Marks, Jonathan Le Roux
  • Publication number: 20220115006
    Abstract: This invention relates generally to speech processing and more particularly to end-to-end automatic speech recognition (ASR) that utilizes long contextual information. Some embodiments of the invention provide a system and a method for end-to-end ASR suitable for recognizing long audio recordings such as lecture and conversational speeches. This disclosure includes a Transformer-based ASR system that utilizes contextual information, wherein the Transformer accepts multiple utterances at the same time and predicts transcript for the last utterance. This is repeated in a sliding-window fashion with one-utterance shifts to recognize the entire recording. In addition, some embodiments of the present invention may use acoustic and/or text features obtained from only the previous utterances spoken by the same speaker as the last utterance when the long audio recording includes multiple speakers.
    Type: Application
    Filed: October 13, 2020
    Publication date: April 14, 2022
    Applicant: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Takaaki Hori, Niko Moritz, Chiori Hori, Jonathan Le Roux
  • Publication number: 20220076100
    Abstract: An artificial intelligence (AI) system is disclosed. The AI system comprises an input interface to accept input data; a memory storing a multi-dimensional neural network having a sequence of deep neural networks (DNNs) with an inner DNN and an outer DNN; a processor configured to submit the input data to the multi-dimensional neural network to produce an output of the outer DNN and an output interface to render at least a function of the output. Each DNN processes the input data sequentially by a sequence of layers along a first dimension of data propagation. The DNNs are arranged along a second dimension of data propagation from the inner DNN to the outer DNN. Further, the DNNs are connected such that an output of at least one layer of a DNN is combined with an input to at least one layer of subsequent DNN in the sequence of DNNs.
    Type: Application
    Filed: September 10, 2020
    Publication date: March 10, 2022
    Applicant: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Chiori Hori, Peng Gao, Shijie Geng, Takaaki Hori, Jonathan Le Roux
  • Patent number: 11264009
    Abstract: A computer-implemented method for training a dialogue response generation system and the dialogue response generation system are provided. The method includes arranging a first multimodal encoder-decoder for the dialogue response generation or video description having a first input and a first output, wherein the first multimodal encoder-decoder has been pretrained by training audio-video datasets with training video description sentences, arranging a second multimodal encoder-decoder for dialog response generation having a second input and a second output, providing first audio-visual datasets with first corresponding video description sentences to the first input of the first multimodal encoder-decoder, wherein the first encoder-decoder generates first output values based on the first audio-visual datasets with the first corresponding description sentences, providing the first audio-visual datasets excluding the first corresponding video description sentences to the second multimodal encoder-decoder.
    Type: Grant
    Filed: September 13, 2019
    Date of Patent: March 1, 2022
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Chiori Hori, Anoop Cherian, Tim Marks, Takaaki Hori
  • Patent number: 11210523
    Abstract: A scene aware dialog system includes an input interface to receive a sequence of video frames, contextual information, and a query and a memory configured to store neural networks trained to generate a response to the input query by analyzing one or combination of input sequence of video frames and the input contextual information. The system further includes a processor configured to detect and classify objects in each video frame of the sequence of video frames; determine relationships among the classified objects in each of the video frame; extract features representing the classified objects and the determined relationships for each of the video frame to produce a sequence of feature vectors; and submit the sequence of feature vectors, the input query and the input contextual information to the neural network to generate a response to the input query.
    Type: Grant
    Filed: February 6, 2020
    Date of Patent: December 28, 2021
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Shijie Geng, Peng Gao, Anoop Cherian, Chiori Hori, Jonathan Le Roux
  • Publication number: 20210312395
    Abstract: Systems and methods for a computer system for detecting anomalies in incoming communication from a sender to a receiver. Accepting a relationship structure defining a trained association model between the sender and the receiver, and the incoming communication. Accessing neural networks trained to detect anomalies in the incoming communication and classify the anomalies by type, subject to correspondence between content of the incoming communication and the trained association model between the sender and the receiver. Compute an updated association model, based on sender and the receivers organizational indications using the content of the incoming communication. Execute the neural networks by submitting the incoming communication and the updated association model to produce a result of anomaly detection and anomaly classification type.
    Type: Application
    Filed: April 3, 2020
    Publication date: October 7, 2021
    Applicant: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Bret Harsham, Chiori Hori
  • Publication number: 20210248375
    Abstract: A scene aware dialog system includes an input interface to receive a sequence of video frames, contextual information, and a query and a memory configured to store neural networks trained to generate a response to the input query by analyzing one or combination of input sequence of video frames and the input contextual information. The system further includes a processor configured to detect and classify objects in each video frame of the sequence of video frames; determine relationships among the classified objects in each of the video frame; extract features representing the classified objects and the determined relationships for each of the video frame to produce a sequence of feature vectors; and submit the sequence of feature vectors, the input query and the input contextual information to the neural network to generate a response to the input query.
    Type: Application
    Filed: February 6, 2020
    Publication date: August 12, 2021
    Applicant: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Shijie Geng, Peng Gao, Anoop Cherian, Chiori Hori, Jonathan Le Roux
  • Publication number: 20210247201
    Abstract: A navigation system configured to provide driving instructions to a driver of a moving vehicle based on real-time description of objects in a scene pertinent to driving the vehicle is provided.
    Type: Application
    Filed: February 6, 2020
    Publication date: August 12, 2021
    Applicant: Mitsubishi ELectric Research Laboratories, Inc.
    Inventors: Chiori Hori, Anoop Cherian, Siheng Chen, Tim Marks, Jonathan Le Roux, Takaaki Hori, Bret Harsham, Anthony Vetro, Alan Sullivan
  • Patent number: 11086918
    Abstract: A method for performing multi-label classification includes extracting a feature vector from an input vector including input data by a feature extractor, determining, by a label predictor, a relevant vector including relevant labels having relevant scores based on the feature vector, updating a binary masking vector by masking pre-selected labels having been selected in previous label selections, applying the updated binary masking vector to the relevant vector such that the relevant label vector is updated to exclude the pre-selected labels from the relevant labels, and selecting a relevant label from the updated relevant label vector based on the relevant scores of the updated relevant label vector.
    Type: Grant
    Filed: December 7, 2016
    Date of Patent: August 10, 2021
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Takaaki Hori, Chiori Hori, Shinji Watanabe, John Hershey, Bret Harsham, Jonathan Le Roux
  • Patent number: 11079495
    Abstract: A positioning system for tracking a position of a vehicle includes a receiver configured to receive phase measurements of satellite signals received at multiple instances of time from multiple satellites, and a memory configured to store a recurrent neural network trained to determine a position of the vehicle from a set of phase measurements in a presence of noise caused by a multipath transmission of at least some of the satellite signals at some instances of time. A processor of the positioning system is configured to track the position of the vehicle over different instances of time by processing the set of phase measurements received at each instance of time with the recurrent neural network to produce the position of the vehicle at each instance of time.
    Type: Grant
    Filed: October 31, 2018
    Date of Patent: August 3, 2021
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Kyeong Jin Kim, Philip Orlik, Chiori Hori
  • Publication number: 20210082398
    Abstract: A computer-implemented method for training a dialogue response generation system and the dialogue response generation system are provided. The method includes arranging a first multimodal encoder-decoder for the dialogue response generation or video description having a first input and a first output, wherein the first multimodal encoder-decoder has been pretrained by training audio-video datasets with training video description sentences, arranging a second multimodal encoder-decoder for dialog response generation having a second input and a second output, providing first audio-visual datasets with first corresponding video description sentences to the first input of the first multimodal encoder-decoder, wherein the first encoder-decoder generates first output values based on the first audio-visual datasets with the first corresponding description sentences, providing the first audio-visual datasets excluding the first corresponding video description sentences to the second multimodal encoder-decoder.
    Type: Application
    Filed: September 13, 2019
    Publication date: March 18, 2021
    Inventors: Chiori Hori, Anoop Cherian, Tim Marks, Takaaki Hori
  • Publication number: 20200132861
    Abstract: A positioning system for tracking a position of a vehicle includes a receiver configured to receive phase measurements of satellite signals received at multiple instances of time from multiple satellites, and a memory configured to store a recurrent neural network trained to determine a position of the vehicle from a set of phase measurements in a presence of noise caused by a multipath transmission of at least some of the satellite signals at some instances of time. A processor of the positioning system is configured to track the position of the vehicle over different instances of time by processing the set of phase measurements received at each instance of time with the recurrent neural network to produce the position of the vehicle at each instance of time.
    Type: Application
    Filed: October 31, 2018
    Publication date: April 30, 2020
    Inventors: Kyeong Jin Kim, Philip Orlik, Chiori Hori
  • Patent number: 10417498
    Abstract: A system for generating a word sequence includes one or more processors in connection with a memory and one or more storage devices storing instructions causing operations that include receiving first and second input vectors, extracting first and second feature vectors, estimating a first set of weights and a second set of weights, calculating a first content vector from the first set of weights and the first feature vectors, and calculating a second content vector, transforming the first content vector into a first modal content vector having a predetermined dimension and transforming the second content vector into a second modal content vector having the predetermined dimension, estimating a set of modal attention weights, generating a weighted content vector having the predetermined dimension from the set of modal attention weights and the first and second modal content vectors, and generating a predicted word using the sequence generator.
    Type: Grant
    Filed: March 29, 2017
    Date of Patent: September 17, 2019
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Chiori Hori, Takaaki Hori, John Hershey, Tim Marks
  • Patent number: 10176799
    Abstract: A method and for training a language model to reduce recognition errors, wherein the language model is a recurrent neural network language model (RNNLM) by first acquiring training samples. An automatic speech recognition system (ASR) is applied to the training samples to produce recognized words and probabilites of the recognized words, and an N-best list is selected from the recognized words based on the probabilities. determining word errors using reference data for hypotheses in the N-best list. The hypotheses are rescored using the RNNLM. Then, we determine gradients for the hypotheses using the word errors and gradients for words in the hypotheses. Lastly, parameters of the RNNLM are updated using a sum of the gradients.
    Type: Grant
    Filed: February 2, 2016
    Date of Patent: January 8, 2019
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Takaaki Hori, Chiori Hori, Shinji Watanabe, John Hershey
  • Publication number: 20180189572
    Abstract: A system for generating a word sequence includes one or more processors in connection with a memory and one or more storage devices storing instructions causing operations that include receiving first and second input vectors, extracting first and second feature vectors, estimating a first set of weights and a second set of weights, calculating a first content vector from the first set of weights and the first feature vectors, and calculating a second content vector, transforming the first content vector into a first modal content vector having a predetermined dimension and transforming the second content vector into a second modal content vector having the predetermined dimension, estimating a set of modal attention weights, generating a weighted content vector having the predetermined dimension from the set of modal attention weights and the first and second modal content vectors, and generating a predicted word using the sequence generator.
    Type: Application
    Filed: March 29, 2017
    Publication date: July 5, 2018
    Applicant: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Chiori Hori, Takaaki Hori, John Hershey, Tim Marks