Patents by Inventor Shijie Geng

Shijie Geng has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Enhanced document visual question answering system via hierarchical attention

Patent number: 12346655

Abstract: Systems and methods for performing Document Visual Question Answering tasks are described. A document and query are received. The document encodes document tokens and the query encodes query tokens. The document is segmented into nested document sections, lines, and tokens. A nested structure of tokens is generated based on the segmented document. A feature vector for each token is generated. A graph structure is generated based on the nested structure of tokens. Each graph node corresponds to the query, a document section, a line, or a token. The node connections correspond to the nested structure. Each node is associated with the feature vector for the corresponding object. A graph attention network is employed to generate another embedding for each node. These embeddings are employed to identify a portion of the document that includes a response to the query. An indication of the identified portion of the document is provided.

Type: Grant

Filed: November 17, 2021

Date of Patent: July 1, 2025

Assignee: Adobe Inc.

Inventors: Shijie Geng, Christopher Tensmeyer, Curtis Michael Wigington, Jiuxiang Gu
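
The abstract describes building a graph from the nested section, line, and token structure and running a graph attention network over it. The sketch below illustrates that idea in plain Python under several assumptions that are not taken from the patent: the query node is linked to every section, node features are supplied by the caller, attention is single-head GAT-style scoring, and the hypothetical `best_token` helper picks the answer as the token embedding with the highest dot product against the query embedding. It is an illustration, not the claimed implementation.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Node = Tuple[str, str]  # (kind, label)


def build_hierarchy_graph(
    sections: List[List[List[str]]],
) -> Tuple[List[Node], Dict[int, List[int]]]:
    """Turn nested sections -> lines -> tokens into nodes plus an adjacency list.

    Node 0 is the query. Linking the query to every section is an
    assumption made for this sketch.
    """
    nodes: List[Node] = [("query", "<query>")]
    adj: Dict[int, List[int]] = {0: []}

    def add_node(kind: str, label: str, parent: int) -> int:
        idx = len(nodes)
        nodes.append((kind, label))
        adj[idx] = [parent]      # edge to the enclosing unit
        adj[parent].append(idx)  # and back, so the graph is undirected
        return idx

    for s_i, section in enumerate(sections):
        s = add_node("section", f"section{s_i}", 0)
        for l_i, line in enumerate(section):
            ln = add_node("line", f"line{s_i}.{l_i}", s)
            for token in line:
                add_node("token", token, ln)
    return nodes, adj


def matvec(W: List[Vector], h: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def gat_layer(
    feats: List[Vector], adj: Dict[int, List[int]],
    W: List[Vector], a: Vector, slope: float = 0.2,
) -> List[Vector]:
    """One single-head GAT-style layer (no nonlinearity applied to the output)."""
    z = [matvec(W, h) for h in feats]  # shared linear projection
    out: List[Vector] = []
    for i in range(len(feats)):
        neigh = [i] + adj[i]  # attend to self and direct neighbours
        scores = []
        for j in neigh:
            e = sum(ak * v for ak, v in zip(a, z[i] + z[j]))  # a^T [z_i || z_j]
            scores.append(e if e > 0 else slope * e)       # LeakyReLU
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]            # numerically stable softmax
        total = sum(exps)
        alphas = [x / total for x in exps]
        out.append([
            sum(alpha * z[j][k] for alpha, j in zip(alphas, neigh))
            for k in range(len(z[i]))
        ])
    return out


def best_token(nodes: List[Node], emb: List[Vector]) -> int:
    """Hypothetical answer selection: token node most similar to the query node."""
    q = emb[0]
    token_ids = [i for i, (kind, _) in enumerate(nodes) if kind == "token"]
    return max(token_ids, key=lambda i: sum(x * y for x, y in zip(q, emb[i])))
```

For a one-section document `[[["Total", "$42"], ["Date", "May"]]]`, `build_hierarchy_graph` yields eight nodes (query, section, two lines, four tokens). In a real system the features would come from a learned text and layout encoder and the attention weights would be trained; here both are left to the caller.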

ENHANCED DOCUMENT VISUAL QUESTION ANSWERING SYSTEM VIA HIERARCHICAL ATTENTION

Publication number: 20230153531

Abstract: Systems and methods for performing Document Visual Question Answering tasks are described. A document and query are received. The document encodes document tokens and the query encodes query tokens. The document is segmented into nested document sections, lines, and tokens. A nested structure of tokens is generated based on the segmented document. A feature vector for each token is generated. A graph structure is generated based on the nested structure of tokens. Each graph node corresponds to the query, a document section, a line, or a token. The node connections correspond to the nested structure. Each node is associated with the feature vector for the corresponding object. A graph attention network is employed to generate another embedding for each node. These embeddings are employed to identify a portion of the document that includes a response to the query. An indication of the identified portion of the document is provided.

Type: Application

Filed: November 17, 2021

Publication date: May 18, 2023

Inventors: Shijie Geng, Christopher Tensmeyer, Curtis Michael Wigington, Jiuxiang Gu

Multi-Dimensional Deep Neural Network

Publication number: 20220076100

Abstract: An artificial intelligence (AI) system is disclosed. The AI system comprises an input interface to accept input data; a memory storing a multi-dimensional neural network having a sequence of deep neural networks (DNNs) with an inner DNN and an outer DNN; a processor configured to submit the input data to the multi-dimensional neural network to produce an output of the outer DNN; and an output interface to render at least a function of the output. Each DNN processes the input data sequentially by a sequence of layers along a first dimension of data propagation. The DNNs are arranged along a second dimension of data propagation from the inner DNN to the outer DNN. Further, the DNNs are connected such that an output of at least one layer of a DNN is combined with an input to at least one layer of a subsequent DNN in the sequence of DNNs.

Type: Application

Filed: September 10, 2020

Publication date: March 10, 2022

Applicant: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Chiori Hori, Peng Gao, Shijie Geng, Takaaki Hori, Jonathan Le Roux
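
The two dimensions of data propagation in the abstract (layers within a DNN, and DNNs from inner to outer) can be sketched as a nested loop. This is a minimal illustration under stated assumptions, not the application's architecture: every DNN receives the same input data, "combined" is taken to mean element-wise addition, and the output of layer k in one DNN is added to the input of layer k + 1 in the next DNN. The `double` layer is a hypothetical stand-in for a trained layer.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def multi_dim_forward(x: Vector, dnns: List[List[Layer]]) -> Vector:
    """Run a sequence of DNNs from inner to outer and return the outer output.

    Assumption for this sketch: the output of layer k in one DNN is added
    element-wise to the input of layer k + 1 in the next DNN.
    """
    prev_outputs: List[Vector] = []  # per-layer outputs of the previous (inner) DNN
    out = x
    for layers in dnns:                       # second dimension: inner -> outer
        h = list(x)                           # every DNN sees the input data
        current: List[Vector] = []
        for k, layer in enumerate(layers):    # first dimension: layer by layer
            if 0 < k <= len(prev_outputs):
                # combine the inner DNN's layer k-1 output with this layer's input
                h = [a + b for a, b in zip(h, prev_outputs[k - 1])]
            h = layer(h)
            current.append(h)
        prev_outputs = current
        out = h
    return out


def double(v: Vector) -> Vector:
    """Hypothetical stand-in for a trained layer."""
    return [2.0 * a for a in v]
```

With two DNNs of two `double` layers each, the inner DNN maps `[1.0]` to `[4.0]`, while the outer DNN also receives the inner layer-0 output `[2.0]` before its second layer and returns `[8.0]`. Concatenation or a learned projection would be equally plausible readings of "combined".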

Scene-aware video dialog

Patent number: 11210523

Abstract: A scene aware dialog system includes an input interface to receive a sequence of video frames, contextual information, and a query; and a memory configured to store neural networks trained to generate a response to the input query by analyzing one or a combination of the input sequence of video frames and the input contextual information. The system further includes a processor configured to detect and classify objects in each video frame of the sequence of video frames; determine relationships among the classified objects in each video frame; extract features representing the classified objects and the determined relationships for each video frame to produce a sequence of feature vectors; and submit the sequence of feature vectors, the input query, and the input contextual information to the neural network to generate a response to the input query.

Type: Grant

Filed: February 6, 2020

Date of Patent: December 28, 2021

Assignee: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Shijie Geng, Peng Gao, Anoop Cherian, Chiori Hori, Jonathan Le Roux
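
The per-frame pipeline in the abstract (classified objects, relationships among them, one feature vector per frame) can be illustrated with a toy encoding. The sketch assumes detections are already available as `(label, box)` pairs, uses a hypothetical horizontal `left_of`/`right_of` relation in place of a learned relationship classifier, and encodes each frame as counts over fixed object and relation vocabularies. The patent uses trained neural networks for these steps; the hand-crafted features here are only a stand-in.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Detection = Tuple[str, Box]              # (class label, bounding box)


def relation(a: Box, b: Box) -> str:
    """Hypothetical spatial relation based on horizontal box centres."""
    ax = (a[0] + a[2]) / 2
    bx = (b[0] + b[2]) / 2
    return "left_of" if ax < bx else "right_of"


def frame_features(
    dets: List[Detection], obj_vocab: List[str], rel_vocab: List[str]
) -> List[float]:
    """Count object classes, then count relations over ordered object pairs."""
    vec = [0.0] * (len(obj_vocab) + len(rel_vocab))
    for label, _ in dets:
        if label in obj_vocab:  # labels outside the vocabulary are not counted
            vec[obj_vocab.index(label)] += 1
    for i, (_, box_a) in enumerate(dets):
        for j, (_, box_b) in enumerate(dets):
            if i != j:
                r = relation(box_a, box_b)
                vec[len(obj_vocab) + rel_vocab.index(r)] += 1
    return vec


def video_features(
    frames: List[List[Detection]], obj_vocab: List[str], rel_vocab: List[str]
) -> List[List[float]]:
    """Sequence of per-frame feature vectors, one entry per video frame."""
    return [frame_features(d, obj_vocab, rel_vocab) for d in frames]
```

For a frame with a person at `(0, 0, 10, 10)` and a dog at `(20, 0, 30, 10)`, using the vocabularies `["person", "dog", "car"]` and `["left_of", "right_of"]`, the frame vector is `[1.0, 1.0, 0.0, 1.0, 1.0]`. In the described system, this sequence would then be passed with the query and contextual information to the trained response-generation network, which this sketch does not model.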

Scene-Aware Video Dialog

Publication number: 20210248375

Abstract: A scene aware dialog system includes an input interface to receive a sequence of video frames, contextual information, and a query; and a memory configured to store neural networks trained to generate a response to the input query by analyzing one or a combination of the input sequence of video frames and the input contextual information. The system further includes a processor configured to detect and classify objects in each video frame of the sequence of video frames; determine relationships among the classified objects in each video frame; extract features representing the classified objects and the determined relationships for each video frame to produce a sequence of feature vectors; and submit the sequence of feature vectors, the input query, and the input contextual information to the neural network to generate a response to the input query.

Type: Application

Filed: February 6, 2020

Publication date: August 12, 2021

Applicant: Mitsubishi Electric Research Laboratories, Inc.

Inventors: Shijie Geng, Peng Gao, Anoop Cherian, Chiori Hori, Jonathan Le Roux