Patents by Inventor Jonathan Le Roux
Jonathan Le Roux has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11579598
Abstract: A system for controlling an operation of a machine including a plurality of actuators assisting one or multiple tools to perform one or multiple tasks is configured to, in response to receiving an acoustic mixture of signals generated by the tool performing a task and by the plurality of actuators actuating the tool, submit the acoustic mixture of signals to a neural network trained to separate the signal generated by the tool performing the task from the signals generated by the actuators actuating the tool, extract the signal generated by the tool performing the task from the acoustic mixture of signals, analyze the extracted signal to produce a state of performance of the task, and execute a control action selected according to the state of performance of the task.
Type: Grant
Filed: October 17, 2019
Date of Patent: February 14, 2023
Assignee: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Gordon Wichern, Jonathan Le Roux, Fatemeh Pishdadian
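As a rough illustration of the separation-then-control pipeline, the sketch below stands in for the trained neural network with an oracle Wiener-style mask on synthetic magnitude spectrograms; the energy-based "state of performance" and the threshold are illustrative assumptions, not the claimed method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic magnitude spectrograms (freq bins x frames) standing in for
# the tool signal and the actuator noise; real inputs would come from an STFT.
tool = rng.uniform(0.0, 1.0, size=(64, 100))
actuators = rng.uniform(0.0, 1.0, size=(64, 100))
mixture = tool + actuators

# Oracle Wiener-style mask; the patent uses a trained neural network instead.
mask = tool / (tool + actuators + 1e-8)
extracted = mask * mixture

# "State of performance": here simply the average energy of the extracted signal.
state = float(np.mean(extracted ** 2))

# Control action selected from the state (the threshold is arbitrary for the demo).
action = "continue" if state > 0.1 else "stop"
```

With the oracle mask, the extracted spectrogram recovers the tool signal almost exactly; a trained separator would only approximate it.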
-
Patent number: 11582485
Abstract: Embodiments of the present disclosure disclose a scene-aware video encoder system. The scene-aware encoder system transforms a sequence of video frames of a video of a scene into a spatio-temporal scene graph. The spatio-temporal scene graph includes nodes representing one or multiple static and dynamic objects in the scene. Each node of the spatio-temporal scene graph describes an appearance, a location, and/or a motion of each of the objects (static and dynamic) at different time instances. The nodes of the spatio-temporal scene graph are embedded into a latent space using a spatio-temporal transformer encoding different combinations of different nodes of the spatio-temporal scene graph corresponding to different spatio-temporal volumes of the scene. Each node of the different nodes encoded in each of the combinations is weighted with an attention score determined as a function of similarities of spatio-temporal locations of the different nodes in the combination.
Type: Grant
Filed: February 7, 2022
Date of Patent: February 14, 2023
Assignee: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Anoop Cherian, Chiori Hori, Jonathan Le Roux, Tim Marks, Alan Sullivan
-
Publication number: 20230042468
Abstract: A system and method for reverberation reduction is disclosed. A first deep neural network (DNN) produces a first estimate of a target direct-path signal from a mixture of acoustic signals that includes the target direct-path signal and a reverberation of the target direct-path signal. A filter modeling a room impulse response (RIR) for the first estimate is estimated. The filter, when applied to the first estimate of the target direct-path signal, generates a result closest, according to a distance function, to the residual between the mixture of the acoustic signals and the first estimate of the target direct-path signal. A mixture with reduced reverberation of the target direct-path signal is obtained by removing the result of applying the filter to the first estimate of the target direct-path signal from the received mixture. A second DNN produces a second estimate of the target direct-path signal from the mixture with reduced reverberation.
Type: Application
Filed: March 10, 2022
Publication date: February 9, 2023
Applicant: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Zhong-Qiu Wang, Gordon Wichern, Jonathan Le Roux
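The filter-estimation step can be sketched as a least-squares fit: find the FIR filter that, convolved with the first estimate, best matches the residual, then subtract the filtered estimate from the mixture. The filter length, signals, and Euclidean distance function below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

L = 8                                        # filter taps modeling the reverberant tail of the RIR
direct = rng.standard_normal(200)            # stands in for the first DNN's estimate
g_true = rng.standard_normal(L) * 0.3        # unknown reverb filter (for the demo)
reverb = np.convolve(direct, g_true)[: len(direct)]
mixture = direct + reverb

# Fit the filter g so that direct * g is closest (least squares) to the residual.
residual = mixture - direct
X = np.column_stack([np.concatenate([np.zeros(k), direct[: len(direct) - k]])
                     for k in range(L)])     # convolution (Toeplitz) matrix
g_hat, *_ = np.linalg.lstsq(X, residual, rcond=None)

# Remove the filtered estimate from the mixture to reduce reverberation.
dereverbed = mixture - X @ g_hat
```

Because the demo's first estimate is exact, the recovered filter and dereverbed signal match the ground truth; with a real DNN estimate the fit is only approximate, which is why a second DNN refines the result.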
-
Publication number: 20230020834
Abstract: Embodiments disclose a method and system for a scene-aware audio-video representation of a scene. The scene-aware audio-video representation corresponds to a graph of nodes connected by edges. A node in the graph is indicative of the video features of an object in the scene. An edge in the graph connecting two nodes indicates an interaction of the corresponding two objects in the scene. In the graph, at least one or more edges are associated with audio features of a sound generated by the interaction of the corresponding two objects. The graph of the audio-video representation of the scene may be used to perform a variety of different tasks. Examples of the tasks include one or a combination of action recognition, anomaly detection, sound localization and enhancement, noisy-background sound removal, and system control.
Type: Application
Filed: July 19, 2021
Publication date: January 19, 2023
Applicant: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Moitreya Chatterjee, Anoop Cherian, Jonathan Le Roux
-
Publication number: 20230017503
Abstract: The present disclosure provides an artificial intelligence (AI) system for sequence-to-sequence modeling with attention adapted for streaming applications. The AI system comprises at least one processor; and memory having instructions stored thereon that, when executed by the processor, cause the AI system to process each input frame in a sequence of input frames through layers of a deep neural network (DNN) to produce a sequence of outputs. At least some of the layers of the DNN include a dual self-attention module having a dual non-causal and causal architecture attending to non-causal frames and causal frames. Further, the AI system renders the sequence of outputs.
Type: Application
Filed: July 2, 2021
Publication date: January 19, 2023
Applicant: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Niko Moritz, Takaaki Hori, Jonathan Le Roux
-
Patent number: 11557283
Abstract: An artificial intelligence (AI) system is disclosed. The AI system includes a processor that processes a sequence of input frames with a neural network including a dilated self-attention module trained to compute a sequence of outputs. The module transforms each input frame into a corresponding query frame, a corresponding key frame, and a corresponding value frame, leading to a sequence of key frames, a sequence of value frames, and a sequence of query frames of the same ordering. Attention calculations are performed for each query frame with respect to a combination of (a) a portion of the sequences of key and value frames restricted based on a location of the query frame and (b) a dilation sequence of the key frames and a dilation sequence of the value frames extracted by processing different frames of the sequences of key and value frames with a predetermined extraction function. Further, the processor renders the sequence of outputs.
Type: Grant
Filed: March 26, 2021
Date of Patent: January 17, 2023
Assignee: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Niko Moritz, Takaaki Hori, Jonathan Le Roux
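A minimal sketch of restricted-plus-dilated attention: each query attends to a local window of recent frames plus a subsampled set of earlier frames. Plain subsampling is used here as one simple choice of "extraction function"; the window size, dilation factor, and subsampling are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def restricted_dilated_attention(Q, K, V, window=4, dilation=3):
    """For each query frame, attend to the last `window` frames plus a dilated
    (every `dilation`-th) subsample of earlier frames, instead of all frames."""
    T, d = Q.shape
    out = np.zeros_like(V)
    for t in range(T):
        local = list(range(max(0, t - window + 1), t + 1))
        dilated = list(range(0, max(0, t - window + 1), dilation))
        idx = dilated + local
        scores = Q[t] @ K[idx].T / np.sqrt(d)
        out[t] = softmax(scores) @ V[idx]
    return out

rng = np.random.default_rng(2)
T, d = 16, 8
Q, K, V = (rng.standard_normal((T, d)) for _ in range(3))
Y = restricted_dilated_attention(Q, K, V)
```

The per-query cost depends on the window and dilation rather than the full sequence length, which is the point of restricting and dilating the key/value sequences.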
-
Patent number: 11475908
Abstract: The audio processing system includes a memory to store a neural network trained to process an audio mixture to output an estimation of at least a subset of a set of audio sources present in the audio mixture. The audio sources are subject to hierarchical constraints enforcing a parent-children hierarchy on the set of audio sources, such that a parent audio source includes a mixture of its one or multiple children audio sources. The subset includes a parent audio source and at least one of its children audio sources. The system further comprises a processor to process a received input audio mixture using the neural network to estimate the subset of audio sources and their mutual relationships according to the parent-children hierarchy. The system further includes an output interface configured to render the extracted audio sources and their mutual relationships.
Type: Grant
Filed: October 7, 2020
Date of Patent: October 18, 2022
Assignee: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Gordon Wichern, Jonathan Le Roux, Ethan Manilow
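The parent-children constraint can be made concrete with a simple consistency penalty: a parent source estimate should equal the sum of its children estimates. The source names and mean-squared penalty below are illustrative assumptions, not the patent's training objective.

```python
import numpy as np

def hierarchy_consistency_loss(parent_est, children_ests):
    """Penalize violation of the parent-children constraint: a parent source
    should equal the mixture (sum) of its children sources."""
    return float(np.mean((parent_est - np.sum(children_ests, axis=0)) ** 2))

rng = np.random.default_rng(3)
drums = rng.standard_normal(1000) * 0.1      # child source, e.g. "drums"
bass = rng.standard_normal(1000) * 0.1       # child source, e.g. "bass"
accompaniment = drums + bass                 # parent: mixture of its children

consistent = hierarchy_consistency_loss(accompaniment, np.stack([drums, bass]))
inconsistent = hierarchy_consistency_loss(accompaniment + 0.5, np.stack([drums, bass]))
```

Estimates that respect the hierarchy incur zero penalty; a parent that drifts away from the sum of its children is penalized.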
-
Patent number: 11462211
Abstract: A linguistic system for transcribing an input, where the linguistic system comprises a processor configured to execute a neural network multiple times while varying weights of at least some nodes of the neural network to produce multiple transcriptions of the input. The processor is further configured to determine a distribution of pairwise distances of the multiple transcriptions, determine a legitimacy of the input based on the distribution, and, when the input is determined to be legitimate, transcribe the input using the stored weights of the nodes of the neural network to produce a final transcription of the input.
Type: Grant
Filed: April 9, 2020
Date of Patent: October 4, 2022
Assignee: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Jonathan Le Roux, Tejas Jayashankar, Pierre Moulin
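The legitimacy check can be sketched as follows: if transcriptions produced under perturbed weights disagree strongly with each other, the input is flagged as suspect. The token-level Levenshtein distance, mean statistic, and threshold are illustrative assumptions about how the distance distribution might be summarized.

```python
import numpy as np

def edit_distance(a, b):
    """Levenshtein distance between two token sequences."""
    m, n = len(a), len(b)
    d = np.zeros((m + 1, n + 1), dtype=int)
    d[:, 0] = np.arange(m + 1)
    d[0, :] = np.arange(n + 1)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i, j] = min(d[i - 1, j] + 1, d[i, j - 1] + 1,
                          d[i - 1, j - 1] + (a[i - 1] != b[j - 1]))
    return int(d[m, n])

def is_legitimate(transcriptions, threshold=2.0):
    """Flag the input as suspect when transcriptions obtained under perturbed
    network weights disagree too much with each other on average."""
    dists = [edit_distance(x, y)
             for i, x in enumerate(transcriptions)
             for y in transcriptions[i + 1:]]
    return float(np.mean(dists)) <= threshold

stable = [["hello", "world"]] * 5                      # perturbation barely changes the output
unstable = [["hello"], ["yellow", "w"], ["hollow"],    # output flips wildly
            ["hel", "low", "word"], ["held"]]
```

The intuition is that adversarial inputs sit near decision boundaries, so small weight perturbations flip their transcriptions far more than for legitimate speech.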
-
Publication number: 20220310070
Abstract: An artificial intelligence (AI) system is disclosed. The AI system includes a processor that processes a sequence of input frames with a neural network including a dilated self-attention module trained to compute a sequence of outputs. The module transforms each input frame into a corresponding query frame, a corresponding key frame, and a corresponding value frame, leading to a sequence of key frames, a sequence of value frames, and a sequence of query frames of the same ordering. Attention calculations are performed for each query frame with respect to a combination of (a) a portion of the sequences of key and value frames restricted based on a location of the query frame and (b) a dilation sequence of the key frames and a dilation sequence of the value frames extracted by processing different frames of the sequences of key and value frames with a predetermined extraction function. Further, the processor renders the sequence of outputs.
Type: Application
Filed: March 26, 2021
Publication date: September 29, 2022
Applicant: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Niko Moritz, Takaaki Hori, Jonathan Le Roux
-
Patent number: 11445267
Abstract: A scene captioning system is provided. The scene captioning system includes an interface configured to acquire a stream of scene data signals including frames and sound data; a memory to store a computer-executable scene captioning model including an audio-visual encoder, a timing decoder, a timing detector, and a caption decoder, wherein the audio-visual encoder is shared by the timing decoder, the timing detector, and the caption decoder; and a processor in connection with the memory. The processor is configured to perform the steps of extracting scene features from the scene data signals by use of the audio-visual encoder, determining a timing for generating a caption by use of the timing detector, wherein the timing is arranged at an early stage of the stream of scene data signals, and generating the caption based on the scene features by using the caption decoder according to the timing.
Type: Grant
Filed: July 23, 2021
Date of Patent: September 13, 2022
Assignee: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Chiori Hori, Takaaki Hori, Anoop Cherian, Tim Marks, Jonathan Le Roux
-
Patent number: 11373639
Abstract: A speech recognition system successively processes each encoder state of encoded acoustic features with frame-synchronous decoder (FSD) and label-synchronous decoder (LSD) modules. Upon identifying an encoder state carrying information about a new transcription output, the system expands a current list of FSD prefixes with the FSD module, evaluates the FSD prefixes with the LSD module, and prunes the FSD prefixes according to joint FSD and LSD scores. The FSD and LSD modules are synchronized by having the LSD module process the portion of the encoder states including the new transcription output identified by the FSD module and produce LSD scores for the FSD prefixes determined by the FSD module.
Type: Grant
Filed: December 12, 2019
Date of Patent: June 28, 2022
Assignee: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Niko Moritz, Takaaki Hori, Jonathan Le Roux
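The joint pruning step can be sketched as a weighted combination of the two decoders' scores followed by beam pruning. The interpolation weight, beam size, and toy scores are illustrative assumptions.

```python
import numpy as np

def prune_prefixes(prefixes, fsd_scores, lsd_scores, weight=0.5, beam=3):
    """Rank prefix hypotheses by a weighted joint score of the frame-synchronous
    (e.g. CTC) and label-synchronous (e.g. attention) decoders; keep the best."""
    joint = weight * np.asarray(fsd_scores) + (1 - weight) * np.asarray(lsd_scores)
    order = np.argsort(-joint)[:beam]
    return [prefixes[i] for i in order]

prefixes = ["a", "at", "ate", "art", "act"]
fsd = [-1.0, -2.5, -4.0, -3.0, -6.0]   # log-probabilities from the FSD module
lsd = [-1.5, -2.0, -3.5, -5.0, -4.0]   # log-probabilities from the LSD module
kept = prune_prefixes(prefixes, fsd, lsd)
```

Prefixes that only one decoder likes (e.g. "art" under FSD) can be displaced by prefixes both decoders agree on, which is the benefit of joint scoring.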
-
Publication number: 20220129749
Abstract: A method for training a neural network with a graph-based temporal classification (GTC) objective function, using a directed graph of nodes connected by edges representing labels and transitions among the labels, is provided. The directed graph specifies one or a combination of non-monotonic alignment between a sequence of labels and a sequence of probability distributions and constraints on the label repetitions. The method comprises executing a neural network to transform a sequence of observations into the sequence of probability distributions, and updating parameters of the neural network based on the GTC objective function configured to maximize a sum of conditional probabilities of all possible sequences of labels that are generated by unfolding the directed graph to the length of the sequence of observations and mapping each unfolded sequence of nodes and edges to a possible sequence of labels.
Type: Application
Filed: April 20, 2021
Publication date: April 28, 2022
Applicant: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Niko Moritz, Takaaki Hori, Jonathan Le Roux
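The objective can be sketched by brute force on a tiny graph: unfold the directed graph to the length of the observation sequence and sum the probabilities of the label sequences read off all admissible paths. The two-label graphs below are illustrative; a practical implementation would use dynamic programming over the unfolded graph rather than path enumeration.

```python
import numpy as np

def gtc_log_likelihood(edges, start, ends, log_probs):
    """Brute-force a GTC-style objective on a tiny label graph: unfold the
    directed graph to the length of the probability sequence and sum the
    probabilities of the label sequences along all start-to-end paths."""
    T = len(log_probs)
    total = 0.0

    def walk(node, t, logp):
        nonlocal total
        if t == T:
            if node in ends:
                total += np.exp(logp)
            return
        for nxt, label in edges.get(node, []):
            walk(nxt, t + 1, logp + log_probs[t][label])

    walk(start, 0, 0.0)
    return float(np.log(total))

# Two labels (0 and 1); each row of `probs` is a normalized distribution.
probs = np.array([[0.6, 0.4], [0.3, 0.7], [0.9, 0.1]])
log_probs = np.log(probs)

# Unconstrained graph: any label at any step -> total probability mass is 1.
free = {0: [(0, 0), (0, 1)]}
ll_free = gtc_log_likelihood(free, start=0, ends={0}, log_probs=log_probs)

# Constrained graph: label 1 exactly once, at the first step, then label 0.
constrained = {0: [(1, 1)], 1: [(1, 0)]}
ll_con = gtc_log_likelihood(constrained, start=0, ends={1}, log_probs=log_probs)
```

Changing the graph changes which label sequences contribute to the sum, which is how the same machinery covers CTC-like monotonic alignments as well as non-monotonic or repetition-constrained ones.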
-
Publication number: 20220115006
Abstract: This invention relates generally to speech processing, and more particularly to end-to-end automatic speech recognition (ASR) that utilizes long contextual information. Some embodiments of the invention provide a system and a method for end-to-end ASR suitable for recognizing long audio recordings such as lectures and conversational speech. This disclosure includes a Transformer-based ASR system that utilizes contextual information, wherein the Transformer accepts multiple utterances at the same time and predicts the transcript for the last utterance. This is repeated in a sliding-window fashion with one-utterance shifts to recognize the entire recording. In addition, some embodiments of the present invention may use acoustic and/or text features obtained from only the previous utterances spoken by the same speaker as the last utterance when the long audio recording includes multiple speakers.
Type: Application
Filed: October 13, 2020
Publication date: April 14, 2022
Applicant: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Takaaki Hori, Niko Moritz, Chiori Hori, Jonathan Le Roux
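The sliding-window decoding loop itself is simple to sketch: each target utterance is paired with a fixed number of preceding context utterances, shifting one utterance at a time. The context size of two is an illustrative assumption.

```python
def context_windows(utterances, context=2):
    """Yield (context, target) pairs in a sliding-window fashion with
    one-utterance shifts: the model sees `context` previous utterances
    but predicts the transcript of the target utterance only."""
    for i, target in enumerate(utterances):
        yield utterances[max(0, i - context):i], target

utts = ["u1", "u2", "u3", "u4"]
pairs = list(context_windows(utts))
```

In the multi-speaker variant described above, the context slice would be filtered to utterances from the same speaker as the target before being fed to the model.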
-
Publication number: 20220108698
Abstract: An audio processing system is provided. The audio processing system comprises an input interface configured to accept an audio signal. Further, the audio processing system comprises a memory configured to store a neural network trained to determine different types of attributes of multiple concurrent audio events of different origins, wherein the types of attributes include time-dependent and time-agnostic attributes of speech and non-speech audio events. Further, the audio processing system comprises a processor configured to process the audio signal with the neural network to produce metadata of the audio signal, the metadata including one or multiple attributes of one or multiple audio events in the audio signal.
Type: Application
Filed: October 7, 2020
Publication date: April 7, 2022
Applicant: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Niko Moritz, Gordon Wichern, Takaaki Hori, Jonathan Le Roux
-
Publication number: 20220101869
Abstract: The audio processing system includes a memory to store a neural network trained to process an audio mixture to output an estimation of at least a subset of a set of audio sources present in the audio mixture. The audio sources are subject to hierarchical constraints enforcing a parent-children hierarchy on the set of audio sources, such that a parent audio source includes a mixture of its one or multiple children audio sources. The subset includes a parent audio source and at least one of its children audio sources. The system further comprises a processor to process a received input audio mixture using the neural network to estimate the subset of audio sources and their mutual relationships according to the parent-children hierarchy. The system further includes an output interface configured to render the extracted audio sources and their mutual relationships.
Type: Application
Filed: October 7, 2020
Publication date: March 31, 2022
Applicant: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Gordon Wichern, Jonathan Le Roux, Ethan Manilow
-
Publication number: 20220076100
Abstract: An artificial intelligence (AI) system is disclosed. The AI system comprises an input interface to accept input data; a memory storing a multi-dimensional neural network having a sequence of deep neural networks (DNNs) with an inner DNN and an outer DNN; a processor configured to submit the input data to the multi-dimensional neural network to produce an output of the outer DNN; and an output interface to render at least a function of the output. Each DNN processes the input data sequentially by a sequence of layers along a first dimension of data propagation. The DNNs are arranged along a second dimension of data propagation from the inner DNN to the outer DNN. Further, the DNNs are connected such that an output of at least one layer of a DNN is combined with an input to at least one layer of the subsequent DNN in the sequence of DNNs.
Type: Application
Filed: September 10, 2020
Publication date: March 10, 2022
Applicant: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Chiori Hori, Peng Gao, Shijie Geng, Takaaki Hori, Jonathan Le Roux
-
Patent number: 11210523
Abstract: A scene-aware dialog system includes an input interface to receive a sequence of video frames, contextual information, and a query, and a memory configured to store neural networks trained to generate a response to the input query by analyzing one or a combination of the input sequence of video frames and the input contextual information. The system further includes a processor configured to detect and classify objects in each video frame of the sequence of video frames; determine relationships among the classified objects in each of the video frames; extract features representing the classified objects and the determined relationships for each of the video frames to produce a sequence of feature vectors; and submit the sequence of feature vectors, the input query, and the input contextual information to the neural network to generate a response to the input query.
Type: Grant
Filed: February 6, 2020
Date of Patent: December 28, 2021
Assignee: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Shijie Geng, Peng Gao, Anoop Cherian, Chiori Hori, Jonathan Le Roux
-
Publication number: 20210319784
Abstract: A linguistic system for transcribing an input, where the linguistic system comprises a processor configured to execute a neural network multiple times while varying weights of at least some nodes of the neural network to produce multiple transcriptions of the input. The processor is further configured to determine a distribution of pairwise distances of the multiple transcriptions, determine a legitimacy of the input based on the distribution, and, when the input is determined to be legitimate, transcribe the input using the stored weights of the nodes of the neural network to produce a final transcription of the input.
Type: Application
Filed: April 9, 2020
Publication date: October 14, 2021
Applicant: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Jonathan Le Roux, Tejas Jayashankar, Pierre Moulin
-
Patent number: 11100920
Abstract: A speech recognition system includes an encoder to convert an input acoustic signal into a sequence of encoder states, an alignment decoder to identify the locations of encoder states in the sequence that encode transcription outputs, a partition module to partition the sequence of encoder states into a set of partitions based on the locations of the identified encoder states, and an attention-based decoder to determine the transcription outputs for each partition of encoder states submitted to it as an input. Upon receiving the acoustic signal, the system uses the encoder to produce the sequence of encoder states, partitions the sequence into the set of partitions based on the locations of the encoder states identified by the alignment decoder, and submits the partitions sequentially to the attention-based decoder to produce a transcription output for each submitted partition.
Type: Grant
Filed: March 25, 2019
Date of Patent: August 24, 2021
Assignee: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Niko Moritz, Takaaki Hori, Jonathan Le Roux
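The partitioning step can be sketched as slicing the encoder-state sequence at the trigger frames found by the alignment decoder, so the attention-based decoder only ever sees a bounded prefix per output. The look-ahead of one frame and the toy trigger locations are illustrative assumptions.

```python
def partition_encoder_states(states, trigger_locations, lookahead=1):
    """Split the encoder-state sequence at the frames the alignment decoder
    identified as emitting transcription outputs; each partition runs from the
    start up to the trigger frame plus a small look-ahead."""
    return [states[: t + 1 + lookahead] for t in trigger_locations]

states = list(range(10))            # stand-ins for encoder state vectors
triggers = [2, 5, 8]                # frames identified by the alignment decoder
parts = partition_encoder_states(states, triggers)
```

Because each partition ends shortly after its trigger frame, attention never waits for the full utterance, which is what makes the attention-based decoder usable for streaming recognition.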
-
Publication number: 20210248375
Abstract: A scene-aware dialog system includes an input interface to receive a sequence of video frames, contextual information, and a query, and a memory configured to store neural networks trained to generate a response to the input query by analyzing one or a combination of the input sequence of video frames and the input contextual information. The system further includes a processor configured to detect and classify objects in each video frame of the sequence of video frames; determine relationships among the classified objects in each of the video frames; extract features representing the classified objects and the determined relationships for each of the video frames to produce a sequence of feature vectors; and submit the sequence of feature vectors, the input query, and the input contextual information to the neural network to generate a response to the input query.
Type: Application
Filed: February 6, 2020
Publication date: August 12, 2021
Applicant: Mitsubishi Electric Research Laboratories, Inc.
Inventors: Shijie Geng, Peng Gao, Anoop Cherian, Chiori Hori, Jonathan Le Roux