Patents by Inventor Peixi Xiong

Peixi Xiong has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Multi-granularity alignment for visual question answering

Patent number: 12210835

Abstract: In one embodiment, a method includes accessing an image and a natural-language question regarding the image and extracting, from the image, a first set of image features at a first level of granularity and a second set of image features at a second level of granularity. The method further includes extracting, from the question, a first set of text features at the first level of granularity and a second set of text features at the second level of granularity; generating a first output representing an alignment between the first set of image features and the first set of text features; generating a second output representing an alignment between the second set of image features and the second set of text features; and determining an answer to the question based on the first output and the second output.
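As a rough illustration of the flow described in this abstract, the sketch below aligns image and text features at two granularities and combines the two outputs to pick an answer. The abstract does not say how alignment or answer selection is computed, so everything here is an assumption made for illustration: the softmax attention in `align`, the additive fusion, the dot-product scoring of candidate answers, and all function names and toy vectors.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def align(image_feats, text_feats):
    """Hypothetical alignment: each text feature attends over the image
    features; the attended image vectors are averaged over text features."""
    dim = len(image_feats[0])
    pooled = [0.0] * dim
    for t in text_feats:
        weights = softmax([dot(t, v) for v in image_feats])
        for w, v in zip(weights, image_feats):
            for i in range(dim):
                pooled[i] += w * v[i] / len(text_feats)
    return pooled

def answer(coarse_img, fine_img, coarse_txt, fine_txt, candidates):
    """Combine the two alignment outputs (assumed: elementwise sum) and
    return the candidate whose embedding scores highest against the result."""
    first_output = align(coarse_img, coarse_txt)    # first granularity
    second_output = align(fine_img, fine_txt)       # second granularity
    fused = [a + b for a, b in zip(first_output, second_output)]
    return max(candidates, key=lambda name: dot(candidates[name], fused))

# Toy data: two image features and one text feature per granularity.
coarse_img = [[1.0, 0.0], [0.0, 1.0]]
fine_img = [[1.0, 0.0], [0.0, 1.0]]
coarse_txt = [[5.0, 0.0]]
fine_txt = [[4.0, 0.0]]
candidates = {"red": [1.0, 0.0], "blue": [0.0, 1.0]}
result = answer(coarse_img, fine_img, coarse_txt, fine_txt, candidates)
```

In this toy setup both text features point toward the first image feature, so the fused vector favors the "red" candidate.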

Type: Grant

Filed: September 16, 2022

Date of Patent: January 28, 2025

Assignee: Samsung Electronics Co., Ltd.

Inventors: Peixi Xiong, Yilin Shen, Hongxia Jin

NETWORK FOR STRUCTURE-BASED TEXT-TO-IMAGE GENERATION

Publication number: 20240185493

Abstract: Technology as described herein provides for generating an image via a generator network, including extracting structural relationship information from a text prompt, wherein the structural relationship information includes sentence features and token features, generating encoded text features based on the sentence features and on relation-related tokens, wherein the relation-related tokens are identified based on parsing text dependency information in the token features, and generating an output image based on combining, via self-attention and cross-attention layers, the encoded text features and encoded image features from an input image canvas. Embodiments further include applying a gating function to modify image features based on text features. The self-attention and cross-attention layers can be applied via a cross-modality network, the gating function can be applied via a residual gating network, and the relation-related tokens can be further identified via an attention matrix.
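The gating step mentioned in this abstract can be pictured with a small sketch. The abstract does not give the gating formula, so the form used here is an assumption: a per-channel sigmoid gate computed from the product of image and text features, applied to the text feature and added back to the image feature as a residual. The name `residual_gate` and the toy vectors are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def residual_gate(image_feat, text_feat):
    """Assumed form: out = image + g * text, with g = sigmoid(image * text)
    computed independently for each channel."""
    out = []
    for v, t in zip(image_feat, text_feat):
        g = sigmoid(v * t)       # gate value in (0, 1)
        out.append(v + g * t)    # residual update of the image feature
    return out

# A zero text feature leaves the image feature unchanged.
unchanged = residual_gate([1.0, 2.0], [0.0, 0.0])
# A zero image feature gives g = 0.5, so half of the text feature is added.
half_added = residual_gate([0.0], [2.0])
```

The residual form means the image features pass through unmodified wherever the text contributes nothing, which is one common reason to choose a residual gate over a plain multiplicative one.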

Type: Application

Filed: December 29, 2023

Publication date: June 6, 2024

Inventors: Peixi Xiong, Nilesh Jain

SEMANTIC-GUIDED TRANSFORMER FOR OBJECT RECOGNITION AND RADIANCE FIELD-BASED NOVEL VIEW

Publication number: 20240029455

Abstract: Systems, apparatuses and methods may provide for technology that encodes multi-view visual data into latent features via an aggregator encoder, decodes the latent features into one or more novel target views different from views of the multi-view visual data via a rendering decoder, and decodes the latent features into an object label via a label decoder. The operations to decode the latent features via the rendering decoder and to decode the latent features via the label decoder occur at least partially at the same time. The operation to encode, via the aggregator encoder, the multi-view visual data into the latent features further includes operations to: perform, via the aggregator encoder, semantic object recognition operations based on radiance field view synthesis operations, and perform, via the aggregator encoder, radiance field view synthesis operations based on semantic object recognition operations.

Type: Application

Filed: September 27, 2023

Publication date: January 25, 2024

Inventors: Peixi Xiong, Nilesh Jain, Ravishankar Iyer, Mrutunjayya Mrutunjayya

Multi-Granularity Alignment for Visual Question Answering

Publication number: 20230106716

Abstract: In one embodiment, a method includes accessing an image and a natural-language question regarding the image and extracting, from the image, a first set of image features at a first level of granularity and a second set of image features at a second level of granularity. The method further includes extracting, from the question, a first set of text features at the first level of granularity and a second set of text features at the second level of granularity; generating a first output representing an alignment between the first set of image features and the first set of text features; generating a second output representing an alignment between the second set of image features and the second set of text features; and determining an answer to the question based on the first output and the second output.

Type: Application

Filed: September 16, 2022

Publication date: April 6, 2023

Inventors: Peixi Xiong, Yilin Shen, Hongxia Jin
