Patents by Inventor Chiori Hori
Chiori Hori has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11978435
Abstract: This invention relates generally to speech processing and more particularly to end-to-end automatic speech recognition (ASR) that utilizes long contextual information. Some embodiments of the invention provide a system and a method for end-to-end ASR suitable for recognizing long audio recordings such as lectures and conversational speech. This disclosure includes a Transformer-based ASR system that utilizes contextual information, wherein the Transformer accepts multiple utterances at the same time and predicts the transcript for the last utterance. This is repeated in a sliding-window fashion with one-utterance shifts to recognize the entire recording. In addition, some embodiments of the present invention may use acoustic and/or text features obtained from only the previous utterances spoken by the same speaker as the last utterance when the long audio recording includes multiple speakers.
Type: Grant
Filed: October 13, 2020
Date of Patent: May 7, 2024
Assignee: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Takaaki Hori, Niko Moritz, Chiori Hori, Jonathan Le Roux
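The sliding-window idea can be illustrated by how the input windows might be assembled. This is a minimal sketch under stated assumptions, not the patented system: `Utterance`, `build_windows`, `max_context`, and `same_speaker` are hypothetical names, utterances are represented only by IDs, and the Transformer that would consume each window is omitted. Each window ends with the target utterance to be transcribed and shifts by one utterance; the `same_speaker` flag restricts context to earlier utterances from the target's speaker.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Utterance:
    uid: str      # hypothetical utterance identifier
    speaker: str  # hypothetical speaker label


def build_windows(utts: List[Utterance], max_context: int = 2,
                  same_speaker: bool = False) -> List[Tuple[List[str], str]]:
    """Return (context_ids, target_id) pairs, one per utterance.

    The window slides one utterance at a time; only the last utterance
    in each window (the target) would be transcribed.
    """
    windows = []
    for i, target in enumerate(utts):
        previous = utts[:i]
        if same_speaker:
            # Keep only earlier utterances from the target's speaker.
            previous = [u for u in previous if u.speaker == target.speaker]
        # Guard: previous[-0:] would return the whole list.
        context = [u.uid for u in previous[-max_context:]] if max_context > 0 else []
        windows.append((context, target.uid))
    return windows
```

For four utterances from speakers A, B, A, A with `max_context=2`, the last window is `(['u2', 'u3'], 'u4')`; with `same_speaker=True` it becomes `(['u1', 'u3'], 'u4')`, because u2 came from speaker B.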
-
Publication number: 20240046085
Abstract: An artificial intelligence (AI) low-latency processing system is provided. The low-latency processing system includes a processor; and a memory having instructions stored thereon. The low-latency processing system is configured to collect a sequence of frames jointly including information dispersed among at least some frames in the sequence of frames, execute a timing neural network trained to identify an early subsequence of frames in the sequence of frames including at least a portion of the information indicative of the information, and execute a decoding neural network trained to decode the information from the portion of the information in the subsequence of frames, wherein the timing neural network is jointly trained with the decoding neural network to iteratively identify the smallest number of subframes from the beginning of a training sequence of frames containing a portion of training information sufficient to decode the training information.
Type: Application
Filed: August 4, 2022
Publication date: February 8, 2024
Applicant: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Chiori Hori, Jonathan Le Roux, Anoop Cherian, Tim Marks
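The quantity the timing network is trained to find, the smallest number of frames from the start of a training sequence that still suffices to decode its information, can be made concrete with a brute-force scan. This is a hypothetical sketch: `smallest_sufficient_prefix` and `toy_decode` are illustrative names, `decode` stands in for the decoding neural network, and the exhaustive search is not how the joint training in the abstract works; it only shows the target such training aims at.

```python
from typing import Callable, Optional, Sequence


def smallest_sufficient_prefix(frames: Sequence, label,
                               decode: Callable[[Sequence], object]) -> Optional[int]:
    """Return the smallest k with decode(frames[:k]) == label, or None.

    `decode` is a hypothetical stand-in for a trained decoding network.
    """
    for k in range(1, len(frames) + 1):
        if decode(frames[:k]) == label:
            return k  # earliest prefix that already carries the information
    return None       # even the full sequence is insufficient


# Toy decoder (assumption): frames carry characters, "_" frames carry nothing.
def toy_decode(frames: Sequence[str]) -> str:
    return "".join(f for f in frames if f != "_")
```

For example, `smallest_sufficient_prefix(["c", "_", "a", "t", "_", "_"], "cat", toy_decode)` returns 4, so the last two frames would not be needed before decoding.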
-
Patent number: 11651222
Abstract: Systems and methods for a computer system for detecting anomalies in incoming communication from a sender to a receiver. Accepting a relationship structure defining a trained association model between the sender and the receiver, and the incoming communication. Accessing neural networks trained to detect anomalies in the incoming communication and classify the anomalies by type, subject to correspondence between content of the incoming communication and the trained association model between the sender and the receiver. Compute an updated association model, based on the sender's and the receiver's organizational indications using the content of the incoming communication. Execute the neural networks by submitting the incoming communication and the updated association model to produce a result of anomaly detection and anomaly classification type.
Type: Grant
Filed: April 3, 2020
Date of Patent: May 16, 2023
Assignee: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Bret Harsham, Chiori Hori
-
Patent number: 11635299
Abstract: A navigation system for providing driving instructions to a driver of a vehicle traveling on a route is provided. The driving instructions are generated by executing a multimodal fusion method that comprises extracting features from sensor measurements, annotating the features with directions for the vehicle to follow the route with respect to objects sensed by the sensors, and encoding the annotated features with a multimodal attention neural network to produce encodings. The encodings are transformed into a common latent space, and the transformed encodings are fused using an attention mechanism producing an encoded representation of the scene. The method further comprises decoding the encoded representation with a sentence generation neural network to generate a driving instruction and submitting the driving instruction to an output device.
Type: Grant
Filed: February 6, 2020
Date of Patent: April 25, 2023
Inventors: Chiori Hori, Anoop Cherian, Siheng Chen, Tim Marks, Jonathan Le Roux, Takaaki Hori, Bret Harsham, Anthony Vetro, Alan Sullivan
-
Patent number: 11582485
Abstract: Embodiments of the present disclosure disclose a scene-aware video encoder system. The scene-aware encoder system transforms a sequence of video frames of a video of a scene into a spatio-temporal scene graph. The spatio-temporal scene graph includes nodes representing one or multiple static and dynamic objects in the scene. Each node of the spatio-temporal scene graph describes an appearance, a location, and/or a motion of each of the objects (static and dynamic objects) at different time instances. The nodes of the spatio-temporal scene graph are embedded into a latent space using a spatio-temporal transformer encoding different combinations of different nodes of the spatio-temporal scene graph corresponding to different spatio-temporal volumes of the scene. Each node of the different nodes encoded in each of the combinations is weighted with an attention score determined as a function of similarities of spatio-temporal locations of the different nodes in the combination.
Type: Grant
Filed: February 7, 2022
Date of Patent: February 14, 2023
Assignee: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Anoop Cherian, Chiori Hori, Jonathan Le Roux, Tim Marks, Alan Sullivan
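One way to read the abstract's final sentence is as attention weights computed from distances between node locations in space and time. The sketch below assumes a Gaussian kernel on (x, y, t) coordinates followed by normalization; the kernel, `sigma`, the node layout, and the name `location_attention` are illustrative assumptions, since the abstract does not specify the similarity function, and the full spatio-temporal transformer is not shown.

```python
import math
from typing import List, Sequence, Tuple

# A node is ((x, y, t), feature_vector); this layout is an assumption.
Node = Tuple[Tuple[float, float, float], List[float]]


def location_attention(query_loc: Sequence[float], nodes: Sequence[Node],
                       sigma: float = 1.0) -> Tuple[List[float], List[float]]:
    """Weight nodes by spatio-temporal proximity to query_loc and pool features."""
    # Log of a Gaussian similarity: closer nodes get higher scores.
    scores = [-sum((a - b) ** 2 for a, b in zip(query_loc, loc)) / (2 * sigma ** 2)
              for loc, _ in nodes]
    # Numerically stable softmax turns scores into weights that sum to 1.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Attention-weighted sum of node features.
    dim = len(nodes[0][1])
    pooled = [sum(w * feat[i] for w, (_, feat) in zip(weights, nodes)) for i in range(dim)]
    return weights, pooled
```

Two nodes at the same location receive equal weight; a node farther away in space or time receives less, and `sigma` controls how quickly the weight falls off.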
-
Patent number: 11445267
Abstract: A scene captioning system is provided. The scene captioning system includes an interface configured to acquire a stream of scene data signals including frames and sound data, a memory to store a computer-executable scene captioning model including a scene encoder, a timing decoder, a timing detector, and a caption decoder, wherein the audio-visual encoder is shared by the timing decoder and the timing detector and the caption decoder, and a processor, in connection with the memory. The processor is configured to perform steps of extracting scene features from the scene data signals by use of the audio-visual encoder, determining a timing of generating a caption by use of the timing detector, wherein the timing is arranged at an early stage of the stream of scene data signals, and generating the caption based on the scene features by using the caption decoder according to the timing.
Type: Grant
Filed: July 23, 2021
Date of Patent: September 13, 2022
Assignee: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Chiori Hori, Takaaki Hori, Anoop Cherian, Tim Marks, Jonathan Le Roux
-
Publication number: 20220115006
Abstract: This invention relates generally to speech processing and more particularly to end-to-end automatic speech recognition (ASR) that utilizes long contextual information. Some embodiments of the invention provide a system and a method for end-to-end ASR suitable for recognizing long audio recordings such as lectures and conversational speech. This disclosure includes a Transformer-based ASR system that utilizes contextual information, wherein the Transformer accepts multiple utterances at the same time and predicts the transcript for the last utterance. This is repeated in a sliding-window fashion with one-utterance shifts to recognize the entire recording. In addition, some embodiments of the present invention may use acoustic and/or text features obtained from only the previous utterances spoken by the same speaker as the last utterance when the long audio recording includes multiple speakers.
Type: Application
Filed: October 13, 2020
Publication date: April 14, 2022
Applicant: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Takaaki Hori, Niko Moritz, Chiori Hori, Jonathan Le Roux
-
Publication number: 20220076100
Abstract: An artificial intelligence (AI) system is disclosed. The AI system comprises an input interface to accept input data; a memory storing a multi-dimensional neural network having a sequence of deep neural networks (DNNs) with an inner DNN and an outer DNN; a processor configured to submit the input data to the multi-dimensional neural network to produce an output of the outer DNN and an output interface to render at least a function of the output. Each DNN processes the input data sequentially by a sequence of layers along a first dimension of data propagation. The DNNs are arranged along a second dimension of data propagation from the inner DNN to the outer DNN. Further, the DNNs are connected such that an output of at least one layer of a DNN is combined with an input to at least one layer of subsequent DNN in the sequence of DNNs.
Type: Application
Filed: September 10, 2020
Publication date: March 10, 2022
Applicant: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Chiori Hori, Peng Gao, Shijie Geng, Takaaki Hori, Jonathan Le Roux
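The two directions of data propagation can be sketched with plain lists. This is a hypothetical illustration, not the disclosed architecture: each DNN is assumed to be a list of square weight matrices with ReLU activations, and the combination step is assumed to be element-wise addition of the previous DNN's layer output to the input of the next layer in the following DNN. The names `run_stacked_dnns`, `matvec`, and `relu` are illustrative.

```python
from typing import List, Optional, Sequence

Matrix = List[List[float]]


def matvec(W: Matrix, v: Sequence[float]) -> List[float]:
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, a) for a in v]


def run_stacked_dnns(x: Sequence[float], dnns: Sequence[Sequence[Matrix]]) -> List[float]:
    """Propagate x through each DNN layer by layer (first dimension) and from
    the inner DNN to the outer DNN (second dimension); return the outer output."""
    prev_outputs: Optional[List[List[float]]] = None  # layer outputs of the previous DNN
    for layers in dnns:                                # inner -> outer
        h = list(x)
        outputs: List[List[float]] = []
        for l, W in enumerate(layers):                 # layer by layer
            if prev_outputs is not None and l > 0:
                # Assumed combination: add the previous DNN's layer l-1 output
                # to the input of this DNN's layer l.
                h = [a + b for a, b in zip(h, prev_outputs[l - 1])]
            h = relu(matvec(W, h))
            outputs.append(h)
        prev_outputs = outputs
    return prev_outputs[-1]
```

With two DNNs of two 2x2 identity layers and `x = [1.0, 2.0]`, the inner DNN alone would return `[1.0, 2.0]`, while the outer DNN returns `[2.0, 4.0]` because the inner DNN's first-layer output is added before the outer DNN's second layer.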
-
Patent number: 11264009
Abstract: A computer-implemented method for training a dialogue response generation system and the dialogue response generation system are provided. The method includes arranging a first multimodal encoder-decoder for the dialogue response generation or video description having a first input and a first output, wherein the first multimodal encoder-decoder has been pretrained by training audio-video datasets with training video description sentences, arranging a second multimodal encoder-decoder for dialog response generation having a second input and a second output, providing first audio-visual datasets with first corresponding video description sentences to the first input of the first multimodal encoder-decoder, wherein the first encoder-decoder generates first output values based on the first audio-visual datasets with the first corresponding description sentences, providing the first audio-visual datasets excluding the first corresponding video description sentences to the second multimodal encoder-decoder.
Type: Grant
Filed: September 13, 2019
Date of Patent: March 1, 2022
Assignee: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Chiori Hori, Anoop Cherian, Tim Marks, Takaaki Hori
-
Patent number: 11210523
Abstract: A scene aware dialog system includes an input interface to receive a sequence of video frames, contextual information, and a query and a memory configured to store neural networks trained to generate a response to the input query by analyzing one or a combination of the input sequence of video frames and the input contextual information. The system further includes a processor configured to detect and classify objects in each video frame of the sequence of video frames; determine relationships among the classified objects in each video frame; extract features representing the classified objects and the determined relationships for each video frame to produce a sequence of feature vectors; and submit the sequence of feature vectors, the input query and the input contextual information to the neural network to generate a response to the input query.
Type: Grant
Filed: February 6, 2020
Date of Patent: December 28, 2021
Assignee: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Shijie Geng, Peng Gao, Anoop Cherian, Chiori Hori, Jonathan Le Roux
-
Publication number: 20210312395
Abstract: Systems and methods for a computer system for detecting anomalies in incoming communication from a sender to a receiver. Accepting a relationship structure defining a trained association model between the sender and the receiver, and the incoming communication. Accessing neural networks trained to detect anomalies in the incoming communication and classify the anomalies by type, subject to correspondence between content of the incoming communication and the trained association model between the sender and the receiver. Compute an updated association model, based on the sender's and the receiver's organizational indications using the content of the incoming communication. Execute the neural networks by submitting the incoming communication and the updated association model to produce a result of anomaly detection and anomaly classification type.
Type: Application
Filed: April 3, 2020
Publication date: October 7, 2021
Applicant: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Bret Harsham, Chiori Hori
-
Publication number: 20210248375
Abstract: A scene aware dialog system includes an input interface to receive a sequence of video frames, contextual information, and a query and a memory configured to store neural networks trained to generate a response to the input query by analyzing one or a combination of the input sequence of video frames and the input contextual information. The system further includes a processor configured to detect and classify objects in each video frame of the sequence of video frames; determine relationships among the classified objects in each video frame; extract features representing the classified objects and the determined relationships for each video frame to produce a sequence of feature vectors; and submit the sequence of feature vectors, the input query and the input contextual information to the neural network to generate a response to the input query.
Type: Application
Filed: February 6, 2020
Publication date: August 12, 2021
Applicant: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Shijie Geng, Peng Gao, Anoop Cherian, Chiori Hori, Jonathan Le Roux
-
Publication number: 20210247201
Abstract: A navigation system configured to provide driving instructions to a driver of a moving vehicle based on real-time description of objects in a scene pertinent to driving the vehicle is provided.
Type: Application
Filed: February 6, 2020
Publication date: August 12, 2021
Applicant: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Chiori Hori, Anoop Cherian, Siheng Chen, Tim Marks, Jonathan Le Roux, Takaaki Hori, Bret Harsham, Anthony Vetro, Alan Sullivan
-
Patent number: 11086918
Abstract: A method for performing multi-label classification includes extracting a feature vector from an input vector including input data by a feature extractor, determining, by a label predictor, a relevant vector including relevant labels having relevant scores based on the feature vector, updating a binary masking vector by masking pre-selected labels having been selected in previous label selections, applying the updated binary masking vector to the relevant vector such that the relevant label vector is updated to exclude the pre-selected labels from the relevant labels, and selecting a relevant label from the updated relevant label vector based on the relevant scores of the updated relevant label vector.
Type: Grant
Filed: December 7, 2016
Date of Patent: August 10, 2021
Assignee: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Takaaki Hori, Chiori Hori, Shinji Watanabe, John Hershey, Bret Harsham, Jonathan Le Roux
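The masking loop in this abstract can be sketched as follows. This is a hypothetical illustration: `select_labels` and `score_fn` are illustrative names, and `score_fn` stands in for the feature extractor and label predictor, receiving the labels chosen so far so that scores may change between steps. With scores that never change, the loop reduces to ordinary top-k selection; the masking vector matters when the predictor rescores after each selection.

```python
from typing import Callable, List, Sequence


def select_labels(score_fn: Callable[[List[int]], Sequence[float]],
                  num_labels: int) -> List[int]:
    """Select labels one at a time, masking out labels already chosen."""
    selected: List[int] = []
    mask: List[int] = []  # binary masking vector: 1 = selectable, 0 = already selected
    for _ in range(num_labels):
        scores = score_fn(selected)  # relevance scores for every label
        if not mask:
            mask = [1] * len(scores)
        # Apply the mask: pre-selected labels can never win again.
        masked = [s if m else float("-inf") for s, m in zip(scores, mask)]
        best = max(range(len(masked)), key=masked.__getitem__)
        if masked[best] == float("-inf"):
            break  # every label has already been selected
        selected.append(best)
        mask[best] = 0  # update the masking vector for later steps
    return selected
```

For example, if `score_fn` returns `[0.9, 0.8, 0.1]` initially and `[0.9, 0.2, 0.7]` after the first pick, `select_labels(score_fn, 2)` returns `[0, 2]`: label 0 still has the top raw score on the second step, but the mask excludes it.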
-
Patent number: 11079495
Abstract: A positioning system for tracking a position of a vehicle includes a receiver configured to receive phase measurements of satellite signals received at multiple instances of time from multiple satellites, and a memory configured to store a recurrent neural network trained to determine a position of the vehicle from a set of phase measurements in a presence of noise caused by a multipath transmission of at least some of the satellite signals at some instances of time. A processor of the positioning system is configured to track the position of the vehicle over different instances of time by processing the set of phase measurements received at each instance of time with the recurrent neural network to produce the position of the vehicle at each instance of time.
Type: Grant
Filed: October 31, 2018
Date of Patent: August 3, 2021
Assignee: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Kyeong Jin Kim, Philip Orlik, Chiori Hori
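The per-time-step recurrence described here can be sketched with an Elman-style recurrent cell. This is a hypothetical, untrained illustration: the weight matrices `W_x`, `W_h`, and `W_o`, the flat layout of each phase-measurement vector, and the names `track_positions` and `matvec` are assumptions, and any robustness to multipath noise would come only from training, which is not shown.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(W: Matrix, v: Sequence[float]) -> List[float]:
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def track_positions(phase_seq: Sequence[Sequence[float]],
                    W_x: Matrix, W_h: Matrix, W_o: Matrix) -> List[List[float]]:
    """Return one position estimate per time instance from phase measurements."""
    hidden = [0.0] * len(W_h)  # recurrent state carried across time instances
    positions = []
    for z in phase_seq:        # z: phase measurements from all satellites at one instant
        pre = [a + b for a, b in zip(matvec(W_x, z), matvec(W_h, hidden))]
        hidden = [math.tanh(a) for a in pre]
        positions.append(matvec(W_o, hidden))  # e.g. (x, y, z) coordinates
    return positions
```

Feeding a measurement only at the first instant and zeros afterwards still yields a nonzero second estimate when `W_h` is nonzero, which shows the state carrying information forward between time instances.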
-
Publication number: 20210082398
Abstract: A computer-implemented method for training a dialogue response generation system and the dialogue response generation system are provided. The method includes arranging a first multimodal encoder-decoder for the dialogue response generation or video description having a first input and a first output, wherein the first multimodal encoder-decoder has been pretrained by training audio-video datasets with training video description sentences, arranging a second multimodal encoder-decoder for dialog response generation having a second input and a second output, providing first audio-visual datasets with first corresponding video description sentences to the first input of the first multimodal encoder-decoder, wherein the first encoder-decoder generates first output values based on the first audio-visual datasets with the first corresponding description sentences, providing the first audio-visual datasets excluding the first corresponding video description sentences to the second multimodal encoder-decoder.
Type: Application
Filed: September 13, 2019
Publication date: March 18, 2021
Inventors: Chiori Hori, Anoop Cherian, Tim Marks, Takaaki Hori
-
Publication number: 20200132861
Abstract: A positioning system for tracking a position of a vehicle includes a receiver configured to receive phase measurements of satellite signals received at multiple instances of time from multiple satellites, and a memory configured to store a recurrent neural network trained to determine a position of the vehicle from a set of phase measurements in a presence of noise caused by a multipath transmission of at least some of the satellite signals at some instances of time. A processor of the positioning system is configured to track the position of the vehicle over different instances of time by processing the set of phase measurements received at each instance of time with the recurrent neural network to produce the position of the vehicle at each instance of time.
Type: Application
Filed: October 31, 2018
Publication date: April 30, 2020
Inventors: Kyeong Jin Kim, Philip Orlik, Chiori Hori
-
Patent number: 10417498
Abstract: A system for generating a word sequence includes one or more processors in connection with a memory and one or more storage devices storing instructions causing operations that include receiving first and second input vectors, extracting first and second feature vectors, estimating a first set of weights and a second set of weights, calculating a first content vector from the first set of weights and the first feature vectors, and calculating a second content vector, transforming the first content vector into a first modal content vector having a predetermined dimension and transforming the second content vector into a second modal content vector having the predetermined dimension, estimating a set of modal attention weights, generating a weighted content vector having the predetermined dimension from the set of modal attention weights and the first and second modal content vectors, and generating a predicted word using the sequence generator.
Type: Grant
Filed: March 29, 2017
Date of Patent: September 17, 2019
Assignee: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Chiori Hori, Takaaki Hori, John Hershey, Tim Marks
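The fusion step at the end of this abstract, transforming each modality's content vector to a shared predetermined dimension and mixing the results with modal attention weights, can be sketched as below. The dot-product scoring against a decoder state, and the names `modal_attention_fusion`, `projections`, `state`, `matvec`, and `softmax`, are assumptions for illustration; the abstract does not say how the modal attention weights are estimated.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matvec(W: Matrix, v: Sequence[float]) -> List[float]:
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def softmax(xs: Sequence[float]) -> List[float]:
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def modal_attention_fusion(content_vectors: Sequence[Sequence[float]],
                           projections: Sequence[Matrix],
                           state: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Fuse per-modality content vectors into one vector of dimension len(state)."""
    # Transform each content vector to the shared (predetermined) dimension.
    modal = [matvec(W, c) for W, c in zip(projections, content_vectors)]
    # Assumed scoring: dot product with the decoder state, then softmax.
    weights = softmax([sum(s * d for s, d in zip(state, vec)) for vec in modal])
    # Weighted content vector in the shared dimension.
    fused = [sum(w * vec[i] for w, vec in zip(weights, modal)) for i in range(len(state))]
    return weights, fused
```

If `state` is the zero vector, both modalities receive weight 0.5; a state aligned with one modality's transformed content vector shifts the weight toward that modality.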
-
Patent number: 10176799
Abstract: A method for training a language model to reduce recognition errors, wherein the language model is a recurrent neural network language model (RNNLM), first acquires training samples. An automatic speech recognition system (ASR) is applied to the training samples to produce recognized words and probabilities of the recognized words, and an N-best list is selected from the recognized words based on the probabilities. Word errors are determined using reference data for the hypotheses in the N-best list. The hypotheses are rescored using the RNNLM. Then, gradients for the hypotheses are determined using the word errors and gradients for words in the hypotheses. Lastly, parameters of the RNNLM are updated using a sum of the gradients.
Type: Grant
Filed: February 2, 2016
Date of Patent: January 8, 2019
Assignee: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Takaaki Hori, Chiori Hori, Shinji Watanabe, John Hershey
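Two ingredients of this procedure, counting word errors against reference data and turning those errors into per-hypothesis gradients, can be sketched as follows. The sketch assumes an expected-word-error objective over softmax posteriors of the N-best scores, a common formulation for this kind of training; that objective, the `scale` parameter, and the names `word_errors` and `hypothesis_gradients` are assumptions, and the backpropagation through the RNNLM to individual words is not shown.

```python
import math
from typing import List, Sequence


def word_errors(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Word-level Levenshtein distance (substitutions + insertions + deletions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        cur = [i]
        for j, r in enumerate(ref, 1):
            cur.append(min(prev[j] + 1,              # extra hypothesis word
                           cur[j - 1] + 1,           # missing reference word
                           prev[j - 1] + (h != r)))  # substitution or match
        prev = cur
    return prev[-1]


def hypothesis_gradients(scores: Sequence[float], errors: Sequence[int],
                         scale: float = 1.0) -> List[float]:
    """Gradient of the expected word error with respect to each hypothesis score.

    With P = softmax(scale * scores) and E = sum_i P_i * errors_i,
    dE/d score_i = scale * P_i * (errors_i - E).
    """
    top = max(scores)
    exps = [math.exp(scale * (s - top)) for s in scores]
    total = sum(exps)
    post = [e / total for e in exps]
    expected = sum(p * e for p, e in zip(post, errors))
    return [scale * p * (e - expected) for p, e in zip(post, errors)]
```

Under gradient descent, a hypothesis with fewer errors than the current expectation gets a negative gradient, so its score is pushed up relative to worse hypotheses; the gradients over an N-best list always sum to zero.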
-
Publication number: 20180189572
Abstract: A system for generating a word sequence includes one or more processors in connection with a memory and one or more storage devices storing instructions causing operations that include receiving first and second input vectors, extracting first and second feature vectors, estimating a first set of weights and a second set of weights, calculating a first content vector from the first set of weights and the first feature vectors, and calculating a second content vector, transforming the first content vector into a first modal content vector having a predetermined dimension and transforming the second content vector into a second modal content vector having the predetermined dimension, estimating a set of modal attention weights, generating a weighted content vector having the predetermined dimension from the set of modal attention weights and the first and second modal content vectors, and generating a predicted word using the sequence generator.
Type: Application
Filed: March 29, 2017
Publication date: July 5, 2018
Applicant: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Chiori Hori, Takaaki Hori, John Hershey, Tim Marks