Patents by Inventor Takaaki Hori

Takaaki Hori has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

End-to-End Speech Recognition Adapted for Multi-Speaker Applications

Publication number: 20240153508

Abstract: A system for performing end-to-end automatic speech recognition (ASR). The system is configured to collect a sequence of acoustic frames associated with a mixture of speeches performed by multiple speakers. Each frame from the sequence of acoustic frames is encoded using a multi-head encoder which encodes each frame into a likelihood of a transcription output and a likelihood of an identity of a speaker. The multi-head encoder thus produces a sequence of likelihoods of transcription outputs and a sequence of likelihoods of identities of the speakers corresponding to the sequence of acoustic frames that are decoded using a decoder performing an alignment operation for producing a sequence of transcription outputs annotated with identities of the speakers, for performing speaker separation.

Type: Application

Filed: October 26, 2022

Publication date: May 9, 2024

Inventors: Niko Moritz, Jonathan Le Roux, Takaaki Hori

Long-context end-to-end speech recognition system

Patent number: 11978435

Abstract: This invention relates generally to speech processing and more particularly to end-to-end automatic speech recognition (ASR) that utilizes long contextual information. Some embodiments of the invention provide a system and a method for end-to-end ASR suitable for recognizing long audio recordings such as lecture and conversational speeches. This disclosure includes a Transformer-based ASR system that utilizes contextual information, wherein the Transformer accepts multiple utterances at the same time and predicts transcript for the last utterance. This is repeated in a sliding-window fashion with one-utterance shifts to recognize the entire recording. In addition, some embodiments of the present invention may use acoustic and/or text features obtained from only the previous utterances spoken by the same speaker as the last utterance when the long audio recording includes multiple speakers.

Type: Grant

Filed: October 13, 2020

Date of Patent: May 7, 2024

Assignee: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Takaaki Hori, Niko Moritz, Chiori Hori, Jonathan Le Roux
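The sliding-window scheme in the abstract above (several utterances in, a transcript for the last one out, shifted by one utterance at a time) can be illustrated with a small sketch. This is only an illustration under stated assumptions, not the patented implementation: the function name `context_windows`, the `window` parameter, and the choice to apply the window size after filtering by speaker are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def context_windows(
    utterances: Sequence[str],
    window: int = 3,
    speakers: Optional[Sequence[str]] = None,
) -> List[Tuple[List[str], str]]:
    """Return one (context, target) pair per utterance.

    The target is the last utterance of its window. The context holds up to
    `window - 1` earlier utterances. If `speakers` is given, only earlier
    utterances spoken by the target's speaker are used as context
    (an assumed reading of the same-speaker option in the abstract).
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    pairs = []
    for i, target in enumerate(utterances):
        if speakers is None:
            previous = list(range(i))
        else:
            previous = [j for j in range(i) if speakers[j] == speakers[i]]
        # Guard window == 1: a slice of [-0:] would keep the whole list.
        keep = previous[-(window - 1):] if window > 1 else []
        pairs.append(([utterances[j] for j in keep], target))
    return pairs


if __name__ == "__main__":
    for context, target in context_windows(["u0", "u1", "u2", "u3"], window=3):
        print(context, "->", target)
```

Each pair would be handed to a recognizer that reads the whole window but emits text only for the target; that recognizer is outside the scope of this sketch.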
Artificial intelligence system for sequence-to-sequence processing with attention adapted for streaming applications

Patent number: 11810552

Abstract: The present disclosure provides an artificial intelligence (AI) system for sequence-to-sequence modeling with attention adapted for streaming applications. The AI system comprises at least one processor; and memory having instructions stored thereon that, when executed by the processor, cause the AI system to process each input frame in a sequence of input frames through layers of a deep neural network (DNN) to produce a sequence of outputs. At least some of the layers of the DNN include a dual self-attention module having a dual non-causal and causal architecture attending to non-causal frames and causal frames. Further, the AI system renders the sequence of outputs.

Type: Grant

Filed: July 2, 2021

Date of Patent: November 7, 2023

Assignee: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Niko Moritz, Takaaki Hori, Jonathan Le Roux

System and method for producing metadata of an audio signal

Patent number: 11756551

Abstract: An audio processing system is provided. The audio processing system comprises an input interface configured to accept an audio signal. Further, the audio processing system comprises a memory configured to store a neural network trained to determine different types of attributes of multiple concurrent audio events of different origins, wherein the types of attributes include time-dependent and time-agnostic attributes of speech and non-speech audio events. Further, the audio processing system comprises a processor configured to process the audio signal with the neural network to produce metadata of the audio signal, the metadata including one or multiple attributes of one or multiple audio events in the audio signal.

Type: Grant

Filed: October 7, 2020

Date of Patent: September 12, 2023

Assignee: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Niko Moritz, Gordon Wichern, Takaaki Hori, Jonathan Le Roux

Method and system for scene-aware interaction

Patent number: 11635299

Abstract: A navigation system for providing driving instructions to a driver of a vehicle traveling on a route is provided. The driving instructions are generated by executing a multimodal fusion method that comprises extracting features from sensor measurements, annotating the features with directions for the vehicle to follow the route with respect to objects sensed by the sensors, and encoding the annotated features with a multimodal attention neural network to produce encodings. The encodings are transformed into a common latent space, and the transformed encodings are fused using an attention mechanism producing an encoded representation of the scene. The method further comprises decoding the encoded representation with a sentence generation neural network to generate a driving instruction and submitting the driving instruction to an output device.

Type: Grant

Filed: February 6, 2020

Date of Patent: April 25, 2023

Inventors: Chiori Hori, Anoop Cherian, Siheng Chen, Tim Marks, Jonathan Le Roux, Takaaki Hori, Bret Harsham, Anthony Vetro, Alan Sullivan

Artificial Intelligence System for Sequence-to-Sequence Processing With Attention Adapted for Streaming Applications

Publication number: 20230017503

Abstract: The present disclosure provides an artificial intelligence (AI) system for sequence-to-sequence modeling with attention adapted for streaming applications. The AI system comprises at least one processor; and memory having instructions stored thereon that, when executed by the processor, cause the AI system to process each input frame in a sequence of input frames through layers of a deep neural network (DNN) to produce a sequence of outputs. At least some of the layers of the DNN include a dual self-attention module having a dual non-causal and causal architecture attending to non-causal frames and causal frames. Further, the AI system renders the sequence of outputs.

Type: Application

Filed: July 2, 2021

Publication date: January 19, 2023

Applicant: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Niko Moritz, Takaaki Hori, Jonathan Le Roux

Artificial intelligence system for capturing context by dilated self-attention

Patent number: 11557283

Abstract: An artificial intelligence (AI) system is disclosed. The AI system includes a processor that processes a sequence of input frames with a neural network including a dilated self-attention module trained to compute a sequence of outputs by transforming each input frame into a corresponding query frame, a corresponding key frame, and a corresponding value frame leading to a sequence of key frames, a sequence of value frames, and a sequence of query frames of same ordering and by performing attention calculations for each query frame with respect to a combination of a portion of the sequences of key and value frames restricted based on a location of the query frame and a dilation sequence of the key frames and a dilation sequence of value frames extracted by processing different frames of the sequences of key and value frames with a predetermined extraction function. Further, the processor renders the sequence of outputs.

Type: Grant

Filed: March 26, 2021

Date of Patent: January 17, 2023

Assignee: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Niko Moritz, Takaaki Hori, Jonathan Le Roux
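As a rough illustration of the attention pattern described in the abstract above, the sketch below lets each query frame attend to a local window of key and value frames plus a shorter "dilation" sequence summarizing the whole input. Several details are assumptions that the abstract does not fix: mean pooling stands in for the "predetermined extraction function", the `radius` and `chunk` parameters are hypothetical, and the query, key, and value frames are taken as given rather than produced by learned projections.

```python
import math
from typing import List

Vector = List[float]


def mean_pool(frames: List[Vector], chunk: int) -> List[Vector]:
    """Summarize consecutive chunks of frames by their mean (assumed extraction function)."""
    pooled = []
    for start in range(0, len(frames), chunk):
        block = frames[start:start + chunk]
        pooled.append([sum(column) / len(block) for column in zip(*block)])
    return pooled


def attend(query: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Scaled dot-product attention for a single query frame."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    peak = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(score - peak) for score in scores]
    total = sum(weights)
    return [
        sum(w * value[d] for w, value in zip(weights, values)) / total
        for d in range(len(values[0]))
    ]


def dilated_self_attention(
    queries: List[Vector],
    keys: List[Vector],
    values: List[Vector],
    radius: int = 1,
    chunk: int = 2,
) -> List[Vector]:
    """Attend to a local window around each query plus the pooled dilation sequences."""
    dilated_keys = mean_pool(keys, chunk)
    dilated_values = mean_pool(values, chunk)
    outputs = []
    for t, query in enumerate(queries):
        lo, hi = max(0, t - radius), min(len(keys), t + radius + 1)
        outputs.append(
            attend(query, keys[lo:hi] + dilated_keys, values[lo:hi] + dilated_values)
        )
    return outputs
```

With `radius` fixed, each query compares against roughly `2 * radius + 1 + len(keys) / chunk` frames instead of all `len(keys)` frames, which is the kind of saving a dilated design aims for.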
Artificial Intelligence System for Capturing Context by Dilated Self-Attention

Publication number: 20220310070

Abstract: An artificial intelligence (AI) system is disclosed. The AI system includes a processor that processes a sequence of input frames with a neural network including a dilated self-attention module trained to compute a sequence of outputs by transforming each input frame into a corresponding query frame, a corresponding key frame, and a corresponding value frame leading to a sequence of key frames, a sequence of value frames, and a sequence of query frames of same ordering and by performing attention calculations for each query frame with respect to a combination of a portion of the sequences of key and value frames restricted based on a location of the query frame and a dilation sequence of the key frames and a dilation sequence of value frames extracted by processing different frames of the sequences of key and value frames with a predetermined extraction function. Further, the processor renders the sequence of outputs.

Type: Application

Filed: March 26, 2021

Publication date: September 29, 2022

Applicant: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Niko Moritz, Takaaki Hori, Jonathan Le Roux

Low-latency captioning system

Patent number: 11445267

Abstract: A scene captioning system is provided. The scene captioning system includes an interface configured to acquire a stream of scene data signals including frames and sound data, a memory to store a computer-executable scene captioning model including a scene encoder, a timing decoder, a timing detector, and a caption decoder, wherein the audio-visual encoder is shared by the timing decoder and the timing detector and the caption decoder, and a processor, in connection with the memory. The processor is configured to perform steps of extracting scene features from the scene data signals by use of the audio-visual encoder, determining a timing of generating a caption by use of the timing detector, wherein the timing is arranged at an early stage of the stream of scene data signals, and generating the caption based on the scene features by using the caption decoder according to the timing.

Type: Grant

Filed: July 23, 2021

Date of Patent: September 13, 2022

Assignee: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Chiori Hori, Takaaki Hori, Anoop Cherian, Tim Marks, Jonathan Le Roux

System and method for streaming end-to-end speech recognition with asynchronous decoders pruning prefixes using a joint label and frame information in transcribing technique

Patent number: 11373639

Abstract: A speech recognition system successively processes each encoder state of encoded acoustic features with frame-synchronous decoder (FSD) and label-synchronous decoder (LSD) modules. Upon identifying an encoder state carrying information about a new transcription output, the system expands a current list of FSD prefixes with the FSD module, evaluates the FSD prefixes with the LSD module, and prunes the FSD prefixes according to joint FSD and LSD scores. The FSD and LSD modules are synchronized by having the LSD module process the portion of the encoder states including the new transcription output identified by the FSD module and produce LSD scores for the FSD prefixes determined by the FSD module.

Type: Grant

Filed: December 12, 2019

Date of Patent: June 28, 2022

Assignee: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Niko Moritz, Takaaki Hori, Jonathan Le Roux
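The pruning step in the abstract above, where FSD prefixes are rescored by the LSD module and then cut according to joint scores, might look roughly like the sketch below. The abstract does not say how the two scores are combined; linear interpolation of log-probabilities with a hypothetical `fsd_weight`, and a fixed `beam` size, are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Prefix = Tuple[str, ...]


def prune_prefixes(
    fsd_scores: Dict[Prefix, float],
    lsd_score: Callable[[Prefix], float],
    beam: int = 2,
    fsd_weight: float = 0.5,
) -> List[Tuple[Prefix, float]]:
    """Keep the `beam` best prefixes under a joint FSD/LSD score.

    `fsd_scores` maps each prefix proposed by the frame-synchronous decoder
    to its log-probability; `lsd_score` returns the label-synchronous
    decoder's log-probability for the same prefix.
    """
    joint = {
        prefix: fsd_weight * score + (1.0 - fsd_weight) * lsd_score(prefix)
        for prefix, score in fsd_scores.items()
    }
    ranked = sorted(joint.items(), key=lambda item: item[1], reverse=True)
    return ranked[:beam]


if __name__ == "__main__":
    fsd = {("a",): -1.0, ("b",): -2.0, ("c",): -0.5}
    lsd = {("a",): -0.5, ("b",): -0.2, ("c",): -4.0}
    print(prune_prefixes(fsd, lsd.__getitem__, beam=2))
```

In this toy run the prefix `("c",)` has the best FSD score but is dropped once the LSD score is taken into account, which is the effect joint scoring is meant to have.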
Training a Neural Network using Graph-Based Temporal Classification

Publication number: 20220129749

Abstract: A method for training a neural network with a graph-based temporal classification (GTC) objective function, using a directed graph of nodes connected by edges representing labels and transitions among the labels, is provided. The directed graph specifies one or a combination of non-monotonic alignment between a sequence of labels and a sequence of probability distributions and constraints on the label repetitions. The method comprises executing a neural network to transform a sequence of observations into the sequence of probability distributions, and updating parameters of the neural network based on the GTC objective function configured to maximize a sum of conditional probabilities of all possible sequences of labels that are generated by unfolding the directed graph to the length of the sequence of observations and mapping each unfolded sequence of nodes and edges to a possible sequence of labels.

Type: Application

Filed: April 20, 2021

Publication date: April 28, 2022

Applicant: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Niko Moritz, Takaaki Hori, Jonathan Le Roux

Long-context End-to-end Speech Recognition System

Publication number: 20220115006

Abstract: This invention relates generally to speech processing and more particularly to end-to-end automatic speech recognition (ASR) that utilizes long contextual information. Some embodiments of the invention provide a system and a method for end-to-end ASR suitable for recognizing long audio recordings such as lecture and conversational speeches. This disclosure includes a Transformer-based ASR system that utilizes contextual information, wherein the Transformer accepts multiple utterances at the same time and predicts transcript for the last utterance. This is repeated in a sliding-window fashion with one-utterance shifts to recognize the entire recording. In addition, some embodiments of the present invention may use acoustic and/or text features obtained from only the previous utterances spoken by the same speaker as the last utterance when the long audio recording includes multiple speakers.

Type: Application

Filed: October 13, 2020

Publication date: April 14, 2022

Applicant: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Takaaki Hori, Niko Moritz, Chiori Hori, Jonathan Le Roux

System and Method for Producing Metadata of an Audio Signal

Publication number: 20220108698

Abstract: An audio processing system is provided. The audio processing system comprises an input interface configured to accept an audio signal. Further, the audio processing system comprises a memory configured to store a neural network trained to determine different types of attributes of multiple concurrent audio events of different origins, wherein the types of attributes include time-dependent and time-agnostic attributes of speech and non-speech audio events. Further, the audio processing system comprises a processor configured to process the audio signal with the neural network to produce metadata of the audio signal, the metadata including one or multiple attributes of one or multiple audio events in the audio signal.

Type: Application

Filed: October 7, 2020

Publication date: April 7, 2022

Applicant: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Niko Moritz, Gordon Wichern, Takaaki Hori, Jonathan Le Roux

Multi-Dimensional Deep Neural Network

Publication number: 20220076100

Abstract: An artificial intelligence (AI) system is disclosed. The AI system comprises an input interface to accept input data; a memory storing a multi-dimensional neural network having a sequence of deep neural networks (DNNs) with an inner DNN and an outer DNN; a processor configured to submit the input data to the multi-dimensional neural network to produce an output of the outer DNN and an output interface to render at least a function of the output. Each DNN processes the input data sequentially by a sequence of layers along a first dimension of data propagation. The DNNs are arranged along a second dimension of data propagation from the inner DNN to the outer DNN. Further, the DNNs are connected such that an output of at least one layer of a DNN is combined with an input to at least one layer of subsequent DNN in the sequence of DNNs.

Type: Application

Filed: September 10, 2020

Publication date: March 10, 2022

Applicant: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Chiori Hori, Peng Gao, Shijie Geng, Takaaki Hori, Jonathan Le Roux

System and method for a dialogue response generation system

Patent number: 11264009

Abstract: A computer-implemented method for training a dialogue response generation system and the dialogue response generation system are provided. The method includes arranging a first multimodal encoder-decoder for the dialogue response generation or video description having a first input and a first output, wherein the first multimodal encoder-decoder has been pretrained by training audio-video datasets with training video description sentences, arranging a second multimodal encoder-decoder for dialog response generation having a second input and a second output, providing first audio-visual datasets with first corresponding video description sentences to the first input of the first multimodal encoder-decoder, wherein the first encoder-decoder generates first output values based on the first audio-visual datasets with the first corresponding description sentences, providing the first audio-visual datasets excluding the first corresponding video description sentences to the second multimodal encoder-decoder.

Type: Grant

Filed: September 13, 2019

Date of Patent: March 1, 2022

Assignee: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Chiori Hori, Anoop Cherian, Tim Marks, Takaaki Hori

System and method for multichannel end-to-end speech recognition

Patent number: 11133011

Abstract: A speech recognition system includes a plurality of microphones to receive acoustic signals including speech signals, an input interface to generate multichannel inputs from the acoustic signals, one or more storages to store a multichannel speech recognition network, wherein the multichannel speech recognition network comprises mask estimation networks to generate time-frequency masks from the multichannel inputs, a beamformer network trained to select a reference channel input from the multichannel inputs using the time-frequency masks and generate an enhanced speech dataset based on the reference channel input and an encoder-decoder network trained to transform the enhanced speech dataset into a text. The system further includes one or more processors, using the multichannel speech recognition network in association with the one or more storages, to generate the text from the multichannel inputs, and an output interface to render the text.

Type: Grant

Filed: October 3, 2017

Date of Patent: September 28, 2021

Assignee: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Shinji Watanabe, Tsubasa Ochiai, Takaaki Hori, John R Hershey

System and method for end-to-end speech recognition with triggered attention

Patent number: 11100920

Abstract: A speech recognition system includes an encoder to convert an input acoustic signal into a sequence of encoder states, an alignment decoder to identify locations of encoder states in the sequence of encoder states that encode transcription outputs, a partition module to partition the sequence of encoder states into a set of partitions based on the locations of the identified encoder states, and an attention-based decoder to determine the transcription outputs for each partition of encoder states submitted to the attention-based decoder as an input. Upon receiving the acoustic signal, the system uses the encoder to produce the sequence of encoder states, partitions the sequence of encoder states into the set of partitions based on the locations of the encoder states identified by the alignment decoder, and submits the set of partitions sequentially into the attention-based decoder to produce a transcription output for each of the submitted partitions.

Type: Grant

Filed: March 25, 2019

Date of Patent: August 24, 2021

Assignee: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Niko Moritz, Takaaki Hori, Jonathan Le Roux
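The partitioning step in the abstract above can be sketched as follows. The abstract says only that the encoder states are partitioned "based on the locations" found by the alignment decoder; the sketch assumes one possible scheme, in which each partition runs from the start of the sequence to a trigger location plus a few look-ahead states. The function name and the `lookahead` parameter are hypothetical.

```python
from typing import List, Sequence, TypeVar

State = TypeVar("State")


def partition_by_triggers(
    states: Sequence[State],
    triggers: Sequence[int],
    lookahead: int = 0,
) -> List[List[State]]:
    """Build one partition of encoder states per trigger location.

    Each partition spans from the first encoder state up to and including
    the trigger location plus `lookahead` further states (assumed scheme).
    """
    partitions = []
    for location in sorted(triggers):
        if not 0 <= location < len(states):
            raise IndexError(f"trigger {location} is outside the encoder states")
        end = min(len(states), location + lookahead + 1)
        partitions.append(list(states[:end]))
    return partitions


if __name__ == "__main__":
    # Six encoder states, with triggers detected at positions 1 and 4.
    print(partition_by_triggers(list(range(6)), triggers=[1, 4], lookahead=1))
```

The partitions would then be passed one after another to an attention-based decoder, which produces one transcription output per partition; that decoder is not modeled here.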
Method and System for Scene-Aware Interaction

Publication number: 20210247201

Abstract: A navigation system configured to provide driving instructions to a driver of a moving vehicle based on real-time description of objects in a scene pertinent to driving the vehicle is provided.

Type: Application

Filed: February 6, 2020

Publication date: August 12, 2021

Applicant: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Chiori Hori, Anoop Cherian, Siheng Chen, Tim Marks, Jonathan Le Roux, Takaaki Hori, Bret Harsham, Anthony Vetro, Alan Sullivan

Method and system for multi-label classification

Patent number: 11086918

Abstract: A method for performing multi-label classification includes extracting a feature vector from an input vector including input data by a feature extractor, determining, by a label predictor, a relevant vector including relevant labels having relevant scores based on the feature vector, updating a binary masking vector by masking pre-selected labels having been selected in previous label selections, applying the updated binary masking vector to the relevant vector such that the relevant label vector is updated to exclude the pre-selected labels from the relevant labels, and selecting a relevant label from the updated relevant label vector based on the relevant scores of the updated relevant label vector.

Type: Grant

Filed: December 7, 2016

Date of Patent: August 10, 2021

Assignee: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Takaaki Hori, Chiori Hori, Shinji Watanabe, John Hershey, Bret Harsham, Jonathan Le Roux
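The select-and-mask loop in the abstract above can be illustrated with a short sketch. It is a simplification under stated assumptions: the relevance scores are held fixed across iterations, whereas the label predictor in the patent may recompute them at each step, and the function name and `num_labels` stopping rule are hypothetical.

```python
from typing import List, Sequence


def select_labels(relevance: Sequence[float], num_labels: int) -> List[int]:
    """Pick labels one at a time, masking each selected label before the next pick.

    `relevance[i]` is the relevance score of label `i`. A binary mask
    (1 = still available, 0 = already selected) is updated after each
    selection and applied to the scores so that no label is chosen twice.
    """
    mask = [1] * len(relevance)
    chosen = []
    for _ in range(min(num_labels, len(relevance))):
        # Apply the mask: previously selected labels can never win.
        masked = [
            score if keep else float("-inf")
            for score, keep in zip(relevance, mask)
        ]
        best = max(range(len(masked)), key=masked.__getitem__)
        chosen.append(best)
        mask[best] = 0  # update the binary masking vector
    return chosen


if __name__ == "__main__":
    print(select_labels([0.1, 0.9, 0.5, 0.7], num_labels=3))
```

Masking rather than deleting entries keeps label indices stable across iterations, so the output can be mapped straight back to label names.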
System and Method for Streaming end-to-end Speech Recognition with Asynchronous Decoders

Publication number: 20210183373

Abstract: A speech recognition system successively processes each encoder state of encoded acoustic features with frame-synchronous decoder (FSD) and label-synchronous decoder (LSD) modules. Upon identifying an encoder state carrying information about a new transcription output, the system expands a current list of FSD prefixes with the FSD module, evaluates the FSD prefixes with the LSD module, and prunes the FSD prefixes according to joint FSD and LSD scores. The FSD and LSD modules are synchronized by having the LSD module process the portion of the encoder states including the new transcription output identified by the FSD module and produce LSD scores for the FSD prefixes determined by the FSD module.

Type: Application

Filed: December 12, 2019

Publication date: June 17, 2021

Applicant: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Niko Moritz, Takaaki Hori, Jonathan Le Roux
