Patents by Inventor Jonathan Le Roux

Jonathan Le Roux has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11579598
    Abstract: A system for controlling an operation of a machine including a plurality of actuators assisting one or multiple tools to perform one or multiple tasks. In response to receiving an acoustic mixture of signals generated by the tool performing a task and by the plurality of actuators actuating the tool, the system submits the acoustic mixture of signals to a neural network trained to separate the signal generated by the tool performing the task from the signals generated by the actuators actuating the tool, thereby extracting the signal generated by the tool from the acoustic mixture of signals; analyzes the extracted signal to produce a state of performance of the task; and executes a control action selected according to the state of performance of the task.
    Type: Grant
    Filed: October 17, 2019
    Date of Patent: February 14, 2023
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Gordon Wichern, Jonathan Le Roux, Fatemeh Pishdadian
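
The claimed pipeline lends itself to a compact sketch. Below, an oracle ratio mask stands in for the trained separation network (which would predict the mask from the mixture alone); the function names, the additive-magnitude assumption, and the energy threshold are illustrative, not taken from the patent.

```python
import numpy as np

# Toy stand-in for the separation step: an oracle ratio mask computed from
# known components. In the actual system a trained neural network would
# predict the mask from the mixture alone.
def ratio_mask(tool_mag, actuator_mag, eps=1e-8):
    return tool_mag / (tool_mag + actuator_mag + eps)

def extract_tool_signal(mix_mag, mask):
    return mask * mix_mag

def control_action(extracted_mag, threshold=0.5):
    # "Analysis" here is just the mean magnitude of the extracted signal;
    # the threshold is an arbitrary illustrative value.
    return "continue" if extracted_mag.mean() > threshold else "stop"

rng = np.random.RandomState(0)
tool = np.abs(rng.randn(64, 32))        # tool spectrogram magnitude
actuators = np.abs(rng.randn(64, 32))   # actuator noise magnitude
mixture = tool + actuators              # magnitudes assumed additive
mask = ratio_mask(tool, actuators)
extracted = extract_tool_signal(mixture, mask)
print(control_action(extracted))
```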
  • Patent number: 11582485
    Abstract: Embodiments of the present disclosure disclose a scene-aware video encoder system. The scene-aware encoder system transforms a sequence of video frames of a video of a scene into a spatio-temporal scene graph. The spatio-temporal scene graph includes nodes representing one or multiple static and dynamic objects in the scene. Each node of the spatio-temporal scene graph describes an appearance, a location, and/or a motion of each of the objects (static and dynamic objects) at different time instances. The nodes of the spatio-temporal scene graph are embedded into a latent space using a spatio-temporal transformer encoding different combinations of different nodes of the spatio-temporal scene graph corresponding to different spatio-temporal volumes of the scene. Each node of the different nodes encoded in each of the combinations is weighted with an attention score determined as a function of similarities of spatio-temporal locations of the different nodes in the combination.
    Type: Grant
    Filed: February 7, 2022
    Date of Patent: February 14, 2023
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Anoop Cherian, Chiori Hori, Jonathan Le Roux, Tim Marks, Alan Sullivan
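
The attention weighting described above can be illustrated with a toy similarity function. The choice of negative squared distance in (x, y, t) as the "similarity of spatio-temporal locations" is a hypothetical stand-in for whatever learned function the encoder uses:

```python
import numpy as np

# Hypothetical similarity: attention between scene-graph nodes decays with
# squared distance in (x, y, t) space; a softmax normalizes the scores.
def attention_scores(query_loc, node_locs, temp=1.0):
    d2 = ((node_locs - query_loc) ** 2).sum(axis=-1)
    e = np.exp(-d2 / temp)
    return e / e.sum()

# Three nodes at (x, y, t) locations; score them against the first node.
nodes = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [5.0, 5.0, 2.0]])
scores = attention_scores(nodes[0], nodes)
```

Nearby nodes (in both space and time) receive higher weight, which is the intuition behind weighting nodes within a spatio-temporal volume.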
  • Publication number: 20230042468
    Abstract: A system and method for reverberation reduction is disclosed. A first Deep Neural Network (DNN) produces a first estimate of a target direct-path signal from a mixture of acoustic signals that include the target direct-path signal and a reverberation of the target direct-path signal. A filter modeling a room impulse response (RIR) for the first estimate is estimated. The filter, when applied to the first estimate of the target direct-path signal, generates a result closest to a residual between the mixture of the acoustic signals and the first estimate of the target direct-path signal according to a distance function. A mixture with reduced reverberation of the target direct-path signal is obtained by removing the result of applying the filter to the first estimate of the target direct-path signal from the received mixture. A second DNN produces a second estimate of the target direct-path signal from the mixture with reduced reverberation.
    Type: Application
    Filed: March 10, 2022
    Publication date: February 9, 2023
    Applicant: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Zhong-Qiu Wang, Gordon Wichern, Jonathan Le Roux
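
The filter-estimation step has a natural least-squares reading: find FIR taps that map the first direct-path estimate onto the residual (mixture minus estimate), then subtract the filtered component from the mixture. A single-channel sketch, assuming the first DNN's estimate is perfect and the reverberation is a short FIR tail:

```python
import numpy as np

# Fit FIR taps g so that (estimate * g) best matches the residual
# mix - estimate in the least-squares sense, then subtract.
def fit_rir_filter(est, residual, taps=8):
    # Matrix of delayed copies of the estimate (Toeplitz-style).
    N = len(est)
    X = np.zeros((N, taps))
    for k in range(taps):
        X[k:, k] = est[:N - k]
    g, *_ = np.linalg.lstsq(X, residual, rcond=None)
    return g, X

rng = np.random.RandomState(0)
direct = rng.randn(256)                   # "direct-path" source
true_g = np.array([0.0, 0.5, 0.0, 0.25])  # simulated reverberant tail
reverb = np.convolve(direct, true_g)[:256]
mix = direct + reverb
# Suppose the first DNN produced a perfect estimate of the direct path.
g, X = fit_rir_filter(direct, mix - direct, taps=8)
dereverbed = mix - X @ g
```

Under these idealized assumptions the recovered taps match the simulated tail and the dereverberated mixture matches the direct path; the real method iterates with imperfect DNN estimates.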
  • Publication number: 20230020834
    Abstract: Embodiments disclose a method and system for a scene-aware audio-video representation of a scene. The scene-aware audio-video representation corresponds to a graph of nodes connected by edges. A node in the graph is indicative of the video features of an object in the scene. An edge in the graph connecting two nodes indicates an interaction of the corresponding two objects in the scene. In the graph, at least one or more edges are associated with audio features of a sound generated by the interaction of the corresponding two objects. The graph of the audio-video representation of the scene may be used to perform a variety of different tasks. Examples of the tasks include one or a combination of an action recognition, an anomaly detection, a sound localization and enhancement, a noisy-background sound removal, and a system control.
    Type: Application
    Filed: July 19, 2021
    Publication date: January 19, 2023
    Applicant: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Moitreya Chatterjee, Anoop Cherian, Jonathan Le Roux
  • Publication number: 20230017503
    Abstract: The present disclosure provides an artificial intelligence (AI) system for sequence-to-sequence modeling with attention adapted for streaming applications. The AI system comprises at least one processor; and memory having instructions stored thereon that, when executed by the processor, cause the AI system to process each input frame in a sequence of input frames through layers of a deep neural network (DNN) to produce a sequence of outputs. At least some of the layers of the DNN include a dual self-attention module having a dual non-causal and causal architecture attending to non-causal frames and causal frames. Further, the AI system renders the sequence of outputs.
    Type: Application
    Filed: July 2, 2021
    Publication date: January 19, 2023
    Applicant: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Niko Moritz, Takaaki Hori, Jonathan Le Roux
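
The dual non-causal and causal architecture can be pictured as two attention masks over the same frame sequence. A minimal sketch, where the look-ahead length of the restricted non-causal branch is an arbitrary illustrative choice:

```python
import numpy as np

# Entry (t, s) is True when query frame t may attend key frame s.
# The causal branch sees only past frames; the non-causal branch may
# additionally look a bounded number of frames ahead (streaming-friendly).
def attention_mask(T, causal=True, lookahead=0):
    idx = np.arange(T)
    if causal:
        return idx[None, :] <= idx[:, None]
    return idx[None, :] <= idx[:, None] + lookahead

causal = attention_mask(6)
noncausal = attention_mask(6, causal=False, lookahead=2)
```

A dual module would run both branches and combine their outputs; anything visible to the causal branch is also visible to the non-causal one.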
  • Patent number: 11557283
    Abstract: An artificial intelligence (AI) system is disclosed. The AI system includes a processor that processes a sequence of input frames with a neural network including a dilated self-attention module trained to compute a sequence of outputs by transforming each input frame into a corresponding query frame, a corresponding key frame, and a corresponding value frame leading to a sequence of key frames, a sequence of value frames, and a sequence of query frames of same ordering and by performing attention calculations for each query frame with respect to a combination of a portion of the sequences of key and value frames restricted based on a location of the query frame and a dilation sequence of the key frames and a dilation sequence of value frames extracted by processing different frames of the sequences of key and value frames with a predetermined extraction function. Further, the processor renders the sequence of outputs.
    Type: Grant
    Filed: March 26, 2021
    Date of Patent: January 17, 2023
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Niko Moritz, Takaaki Hori, Jonathan Le Roux
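
The restricted-plus-dilated key selection can be sketched as an index set per query frame. Here plain subsampling stands in for the patent's "predetermined extraction function", and the window and dilation values are arbitrary:

```python
# Keys a query at frame t may attend: a local causal window of recent
# frames, plus a dilated (subsampled) selection of older frames. This
# keeps the cost per query bounded while retaining long-range context.
def dilated_attention_indices(t, window=4, dilation=4):
    start = max(0, t - window + 1)
    local = list(range(start, t + 1))       # restricted local window
    dilated = list(range(0, start, dilation))  # every dilation-th old frame
    return dilated + local

print(dilated_attention_indices(10))
```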
  • Patent number: 11475908
    Abstract: The audio processing system includes a memory to store a neural network trained to process an audio mixture to output estimation of at least a subset of a set of audio sources present in the audio mixture. The audio sources are subject to hierarchical constraints enforcing a parent-children hierarchy on the set of audio sources, such that a parent audio source includes a mixture of its one or multiple children audio sources. The subset includes a parent audio source and at least one of its children audio sources. The system further comprises a processor to process a received input audio mixture using the neural network to estimate the subset of audio sources and their mutual relationships according to the parent-children hierarchy. The system further includes an output interface configured to render the extracted audio sources and their mutual relationships.
    Type: Grant
    Filed: October 7, 2020
    Date of Patent: October 18, 2022
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Gordon Wichern, Jonathan Le Roux, Ethan Manilow
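
The parent-children constraint has a simple operational reading: a parent source's estimate should equal the sum (mixture) of its children's estimates. A hedged sketch of a penalty term a training loss could add to enforce this; the source names are illustrative:

```python
import numpy as np

# Mean-squared mismatch between a parent estimate and the sum of its
# children's estimates; zero when the hierarchy is perfectly consistent.
def hierarchy_penalty(parent_est, children_ests):
    return float(np.mean((parent_est - sum(children_ests)) ** 2))

guitar = np.array([0.2, -0.1, 0.4])
bass = np.array([0.1, 0.3, -0.2])
strings = guitar + bass  # parent consistent with its children
```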
  • Patent number: 11462211
    Abstract: A linguistic system for transcribing an input, where the linguistic system comprises a processor configured to execute a neural network multiple times while varying weights of at least some nodes of the neural network to produce multiple transcriptions of the input. The processor is further configured to determine a distribution of pairwise distances of the multiple transcriptions, determine a legitimacy of the input based on the distribution, and, when the input is determined to be legitimate, transcribe the input using stored weights of the nodes of the neural network to produce a final transcription of the input.
    Type: Grant
    Filed: April 9, 2020
    Date of Patent: October 4, 2022
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Jonathan Le Roux, Tejas Jayashankar, Pierre Moulin
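
The legitimacy test can be sketched end to end: produce several transcriptions under perturbed weights, measure their pairwise (here, edit) distances, and reject inputs whose transcriptions scatter widely, a behavior typical of adversarial inputs. The distance choice and threshold below are arbitrary illustrative values:

```python
# Classic single-row Levenshtein distance between two strings.
def edit_distance(a, b):
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[-1]

def pairwise_spread(transcripts):
    ds = [edit_distance(a, b) for i, a in enumerate(transcripts)
          for b in transcripts[i + 1:]]
    return sum(ds) / len(ds)

def is_legitimate(transcripts, threshold=2.0):
    # Legitimate inputs yield stable transcriptions under weight noise.
    return pairwise_spread(transcripts) < threshold
```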
  • Publication number: 20220310070
    Abstract: An artificial intelligence (AI) system is disclosed. The AI system includes a processor that processes a sequence of input frames with a neural network including a dilated self-attention module trained to compute a sequence of outputs by transforming each input frame into a corresponding query frame, a corresponding key frame, and a corresponding value frame leading to a sequence of key frames, a sequence of value frames, and a sequence of query frames of same ordering and by performing attention calculations for each query frame with respect to a combination of a portion of the sequences of key and value frames restricted based on a location of the query frame and a dilation sequence of the key frames and a dilation sequence of value frames extracted by processing different frames of the sequences of key and value frames with a predetermined extraction function. Further, the processor renders the sequence of outputs.
    Type: Application
    Filed: March 26, 2021
    Publication date: September 29, 2022
    Applicant: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Niko Moritz, Takaaki Hori, Jonathan Le Roux
  • Patent number: 11445267
    Abstract: A scene captioning system is provided. The scene captioning system includes an interface configured to acquire a stream of scene data signals including frames and sound data; a memory to store a computer-executable scene captioning model including an audio-visual encoder, a timing decoder, a timing detector, and a caption decoder, wherein the audio-visual encoder is shared by the timing decoder, the timing detector, and the caption decoder; and a processor in connection with the memory. The processor is configured to perform steps of extracting scene features from the scene data signals by use of the audio-visual encoder, determining a timing of generating a caption by use of the timing detector, wherein the timing is arranged at an early stage of the stream of scene data signals, and generating the caption based on the scene features by using the caption decoder according to the timing.
    Type: Grant
    Filed: July 23, 2021
    Date of Patent: September 13, 2022
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Chiori Hori, Takaaki Hori, Anoop Cherian, Tim Marks, Jonathan Le Roux
  • Patent number: 11373639
    Abstract: A speech recognition system successively processes each encoder state of encoded acoustic features with frame-synchronous decoder (FSD) and label-synchronous decoder (LSD) modules. Upon identifying an encoder state carrying information about a new transcription output, the system expands a current list of FSD prefixes with the FSD module, evaluates the FSD prefixes with the LSD module, and prunes the FSD prefixes according to joint FSD and LSD scores. The FSD and LSD modules are synchronized by having the LSD module process the portion of the encoder states that includes the new transcription output identified by the FSD module and produce LSD scores for the FSD prefixes determined by the FSD module.
    Type: Grant
    Filed: December 12, 2019
    Date of Patent: June 28, 2022
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Niko Moritz, Takaaki Hori, Jonathan Le Roux
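
The joint pruning step reduces to ranking beam prefixes by a weighted sum of their FSD and LSD log-scores. A minimal sketch; the 0.5 weight, tuple layout, and hypothesis strings are illustrative:

```python
# Each beam prefix carries an FSD (frame-synchronous, e.g. CTC-style)
# log-score and an LSD (label-synchronous attention) log-score; pruning
# keeps the prefixes with the best weighted combination.
def prune(prefixes, beam=2, weight=0.5):
    # prefixes: list of (label_sequence, fsd_logscore, lsd_logscore)
    joint = lambda p: weight * p[1] + (1 - weight) * p[2]
    return sorted(prefixes, key=joint, reverse=True)[:beam]

hyps = [("a", -1.0, -2.0), ("ab", -1.5, -1.0), ("b", -4.0, -4.0)]
kept = prune(hyps)
```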
  • Publication number: 20220129749
    Abstract: A method for training a neural network with a graph-based temporal classification (GTC) objective function, using a directed graph of nodes connected by edges representing labels and transitions among the labels, is provided. The directed graph specifies one or a combination of non-monotonic alignment between a sequence of labels and a sequence of probability distributions and constraints on the label repetitions. The method comprises executing a neural network to transform a sequence of observations into the sequence of probability distributions, and updating parameters of the neural network based on the GTC objective function configured to maximize a sum of conditional probabilities of all possible sequences of labels that are generated by unfolding the directed graph to the length of the sequence of observations and mapping each unfolded sequence of nodes and edges to a possible sequence of labels.
    Type: Application
    Filed: April 20, 2021
    Publication date: April 28, 2022
    Applicant: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Niko Moritz, Takaaki Hori, Jonathan Le Roux
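
The objective can be illustrated by brute force on a tiny label graph: unfold the graph to the sequence length and sum the probability of every complete path. Real GTC implementations use dynamic programming rather than enumeration; the graph and posteriors below are invented for illustration:

```python
import math

# Sum, over all length-T paths through the graph that end in a final
# node, of the product of per-frame label posteriors along the path.
def gtc_log_likelihood(posteriors, start, edges, finals):
    T, total = len(posteriors), 0.0
    def walk(node, t, p):
        nonlocal total
        if t == T:
            total += p if node in finals else 0.0
            return
        for nxt, label in edges.get(node, []):
            walk(nxt, t + 1, p * posteriors[t][label])
    walk(start, 0, 1.0)
    return math.log(total)

# Toy graph accepting "a...a b...b": stay on 'a', switch once to 'b'.
edges = {0: [(0, "a"), (1, "b")], 1: [(1, "b")]}
posteriors = [{"a": 0.6, "b": 0.4}, {"a": 0.3, "b": 0.7}]
ll = gtc_log_likelihood(posteriors, start=0, edges=edges, finals={1})
```

Here the two accepted paths are "ab" (0.6 × 0.7) and "bb" (0.4 × 0.7), so the summed probability is 0.7.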
  • Publication number: 20220115006
    Abstract: This invention relates generally to speech processing and more particularly to end-to-end automatic speech recognition (ASR) that utilizes long contextual information. Some embodiments of the invention provide a system and a method for end-to-end ASR suitable for recognizing long audio recordings such as lectures and conversational speech. This disclosure includes a Transformer-based ASR system that utilizes contextual information, wherein the Transformer accepts multiple utterances at the same time and predicts a transcript for the last utterance. This is repeated in a sliding-window fashion with one-utterance shifts to recognize the entire recording. In addition, some embodiments of the present invention may use acoustic and/or text features obtained only from the previous utterances spoken by the same speaker as the last utterance when the long audio recording includes multiple speakers.
    Type: Application
    Filed: October 13, 2020
    Publication date: April 14, 2022
    Applicant: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Takaaki Hori, Niko Moritz, Chiori Hori, Jonathan Le Roux
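
The sliding-window decoding loop is straightforward to sketch. Here `decode` is a hypothetical stand-in for the Transformer, which sees a few previous utterances as context but emits a transcript only for the last one:

```python
# Slide over the recording with one-utterance shifts; each call receives
# up to `context` previous utterances plus the current one.
def recognize_long_recording(utterances, decode, context=2):
    transcripts = []
    for i, utt in enumerate(utterances):
        history = utterances[max(0, i - context):i]
        transcripts.append(decode(history, utt))
    return transcripts

# Toy decoder that just reports how much context it received.
result = recognize_long_recording(
    ["u1", "u2", "u3", "u4"],
    decode=lambda history, utt: (len(history), utt),
)
```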
  • Publication number: 20220108698
    Abstract: An audio processing system is provided. The audio processing system comprises an input interface configured to accept an audio signal. Further, the audio processing system comprises a memory configured to store a neural network trained to determine different types of attributes of multiple concurrent audio events of different origins, wherein the types of attributes include time-dependent and time-agnostic attributes of speech and non-speech audio events. Further, the audio processing system comprises a processor configured to process the audio signal with the neural network to produce metadata of the audio signal, the metadata including one or multiple attributes of one or multiple audio events in the audio signal.
    Type: Application
    Filed: October 7, 2020
    Publication date: April 7, 2022
    Applicant: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Niko Moritz, Gordon Wichern, Takaaki Hori, Jonathan Le Roux
  • Publication number: 20220101869
    Abstract: The audio processing system includes a memory to store a neural network trained to process an audio mixture to output estimation of at least a subset of a set of audio sources present in the audio mixture. The audio sources are subject to hierarchical constraints enforcing a parent-children hierarchy on the set of audio sources, such that a parent audio source includes a mixture of its one or multiple children audio sources. The subset includes a parent audio source and at least one of its children audio sources. The system further comprises a processor to process a received input audio mixture using the neural network to estimate the subset of audio sources and their mutual relationships according to the parent-children hierarchy. The system further includes an output interface configured to render the extracted audio sources and their mutual relationships.
    Type: Application
    Filed: October 7, 2020
    Publication date: March 31, 2022
    Applicant: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Gordon Wichern, Jonathan Le Roux, Ethan Manilow
  • Publication number: 20220076100
    Abstract: An artificial intelligence (AI) system is disclosed. The AI system comprises an input interface to accept input data; a memory storing a multi-dimensional neural network having a sequence of deep neural networks (DNNs) with an inner DNN and an outer DNN; a processor configured to submit the input data to the multi-dimensional neural network to produce an output of the outer DNN; and an output interface to render at least a function of the output. Each DNN processes the input data sequentially by a sequence of layers along a first dimension of data propagation. The DNNs are arranged along a second dimension of data propagation from the inner DNN to the outer DNN. Further, the DNNs are connected such that an output of at least one layer of a DNN is combined with an input to at least one layer of the subsequent DNN in the sequence of DNNs.
    Type: Application
    Filed: September 10, 2020
    Publication date: March 10, 2022
    Applicant: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Chiori Hori, Peng Gao, Shijie Geng, Takaaki Hori, Jonathan Le Roux
  • Patent number: 11210523
    Abstract: A scene aware dialog system includes an input interface to receive a sequence of video frames, contextual information, and a query, and a memory configured to store neural networks trained to generate a response to the input query by analyzing one or a combination of the input sequence of video frames and the input contextual information. The system further includes a processor configured to detect and classify objects in each video frame of the sequence of video frames; determine relationships among the classified objects in each video frame; extract features representing the classified objects and the determined relationships for each video frame to produce a sequence of feature vectors; and submit the sequence of feature vectors, the input query, and the input contextual information to the neural network to generate a response to the input query.
    Type: Grant
    Filed: February 6, 2020
    Date of Patent: December 28, 2021
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Shijie Geng, Peng Gao, Anoop Cherian, Chiori Hori, Jonathan Le Roux
  • Publication number: 20210319784
    Abstract: A linguistic system for transcribing an input, where the linguistic system comprises a processor configured to execute a neural network multiple times while varying weights of at least some nodes of the neural network to produce multiple transcriptions of the input. The processor is further configured to determine a distribution of pairwise distances of the multiple transcriptions, determine a legitimacy of the input based on the distribution, and, when the input is determined to be legitimate, transcribe the input using stored weights of the nodes of the neural network to produce a final transcription of the input.
    Type: Application
    Filed: April 9, 2020
    Publication date: October 14, 2021
    Applicant: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Jonathan Le Roux, Tejas Jayashankar, Pierre Moulin
  • Patent number: 11100920
    Abstract: A speech recognition system includes an encoder to convert an input acoustic signal into a sequence of encoder states, an alignment decoder to identify locations of encoder states in the sequence of encoder states that encode transcription outputs, a partition module to partition the sequence of encoder states into a set of partitions based on the locations of the identified encoder states, and an attention-based decoder to determine the transcription outputs for each partition of encoder states submitted to the attention-based decoder as an input. Upon receiving the acoustic signal, the system uses the encoder to produce the sequence of encoder states, partitions the sequence of encoder states into the set of partitions based on the locations of the encoder states identified by the alignment decoder, and submits the set of partitions sequentially into the attention-based decoder to produce a transcription output for each of the submitted partitions.
    Type: Grant
    Filed: March 25, 2019
    Date of Patent: August 24, 2021
    Assignee: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Niko Moritz, Takaaki Hori, Jonathan Le Roux
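
The partitioning step can be sketched directly: each frame at which the alignment decoder identifies a new transcription output closes a partition, and the attention-based decoder then consumes the partitions in order. Here every partition looks back to the first frame, one simple choice for the look-back:

```python
# states: the encoder-state sequence; trigger_frames: locations where the
# alignment decoder detected a transcription output. Each trigger closes
# a partition spanning frames 0..t inclusive.
def partition_encoder_states(states, trigger_frames):
    return [states[: t + 1] for t in trigger_frames]

parts = partition_encoder_states(list(range(6)), [1, 3, 5])
```

The attention decoder would then produce one output token per partition, giving frame-synchronous triggering with attention-based decoding.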
  • Publication number: 20210248375
    Abstract: A scene aware dialog system includes an input interface to receive a sequence of video frames, contextual information, and a query, and a memory configured to store neural networks trained to generate a response to the input query by analyzing one or a combination of the input sequence of video frames and the input contextual information. The system further includes a processor configured to detect and classify objects in each video frame of the sequence of video frames; determine relationships among the classified objects in each video frame; extract features representing the classified objects and the determined relationships for each video frame to produce a sequence of feature vectors; and submit the sequence of feature vectors, the input query, and the input contextual information to the neural network to generate a response to the input query.
    Type: Application
    Filed: February 6, 2020
    Publication date: August 12, 2021
    Applicant: Mitsubishi Electric Research Laboratories, Inc.
    Inventors: Shijie Geng, Peng Gao, Anoop Cherian, Chiori Hori, Jonathan Le Roux