Patents by Inventor Mingfei Gao

Mingfei Gao has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240108047
    Abstract: The invention provides a probiotic microcapsule and a preparation method thereof, relating to the technical field of probiotic products. The method includes the following steps: (a) preparing a capsule core containing probiotics: mixing the capsule core materials, including probiotic powder, microcrystalline cellulose and starch, then adding a hydroxypropyl methylcellulose solution while mixing evenly, and forming the resulting mixture into spherical particulate capsule cores by extrusion-spheronization; (b) coating by atomization: coating the microcapsule cores with a coating material solution in a single layer or multiple layers by atomization to obtain core-shell microcapsules. The probiotic microcapsules prepared by the present invention have high encapsulation efficiency, uniform particles, controllable particle size, good storage stability, targeted release in the intestinal tract, resistance to gastric acid, and high temperature stability.
    Type: Application
    Filed: May 31, 2021
    Publication date: April 4, 2024
    Inventors: Mingfei YAO, Shengyi HAN, Xin JIN, Weixin HUANG, Jiaojiao XIE, Yanmeng LU, Bona WANG, Ling GAO, Chihui YU, Lanjuan LI
  • Publication number: 20230325676
    Abstract: A method includes obtaining a set of unlabeled training samples. For each training sample in the set of unlabeled training samples, the method includes generating, using a machine learning model and the training sample, a corresponding first prediction; generating, using the machine learning model and a modified unlabeled training sample based on the training sample, a second prediction; and determining a difference between the first prediction and the second prediction. The method includes selecting, based on the differences, a subset of the set of unlabeled training samples. For each training sample in the subset, the method includes obtaining a ground truth label for the training sample and generating a corresponding labeled training sample based on the training sample paired with the ground truth label. The method includes training the machine learning model using the corresponding labeled training samples.
    Type: Application
    Filed: June 13, 2023
    Publication date: October 12, 2023
    Applicant: Google LLC
    Inventors: Zizhao Zhang, Tomas Jon Pfister, Sercan Omer Arik, Mingfei Gao
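The selection step in the abstract above (rank unlabeled samples by how much the model's prediction changes under modification, then label the most inconsistent ones) can be illustrated with a minimal sketch. This is not the patented implementation; the function names, the use of an L1 difference, and the top-k selection rule are illustrative assumptions.

```python
import numpy as np

def select_for_labeling(model_predict, samples, augment, k):
    """Rank unlabeled samples by prediction inconsistency.

    model_predict: maps a sample to a class-probability vector.
    augment: produces a modified version of a sample.
    Returns indices of the k most inconsistent samples, which would
    then be sent out for ground-truth labeling.
    """
    diffs = []
    for x in samples:
        p1 = model_predict(x)           # prediction on the original sample
        p2 = model_predict(augment(x))  # prediction on the modified sample
        diffs.append(np.abs(p1 - p2).sum())  # difference between predictions
    # Largest differences first: the model is least stable on these samples.
    return sorted(range(len(samples)), key=lambda i: -diffs[i])[:k]
```

The intuition is that samples on which the model disagrees with itself under a small modification are the ones where a ground-truth label is most informative.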
  • Patent number: 11699297
    Abstract: An online system extracts information from non-fixed form documents. The online system receives an image of a form document and obtains a set of phrases and locations of the set of phrases on the form image. For at least one field, the online system determines key scores for the set of phrases. The online system identifies a set of candidate values for the field from the set of identified phrases and identifies a set of neighbors for each candidate value from the set of identified phrases. The online system determines neighbor scores, where a neighbor score for a candidate value and a respective neighbor is determined based on the key score for the neighbor and a spatial relationship of the neighbor to the candidate value. The online system selects a candidate value and a respective neighbor based on the neighbor score as the value and key for the field.
    Type: Grant
    Filed: January 4, 2021
    Date of Patent: July 11, 2023
    Assignee: Salesforce, Inc.
    Inventors: Mingfei Gao, Zeyuan Chen, Le Xue, Ran Xu, Caiming Xiong
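The neighbor-scoring idea in the abstract above (a candidate value is scored by how key-like and how spatially close its neighboring phrases are) can be sketched as follows. This is a toy stand-in, not the claimed system: the inverse-distance weighting and the example phrases are illustrative assumptions.

```python
import math

def extract_field(phrases, key_scores, candidates):
    """Pick the (value, key) pair for a field on a form image.

    phrases: {text: (x, y)} locations of identified phrases.
    key_scores: {text: score} likelihood each phrase is the field's key.
    candidates: phrase texts that could be the field's value.
    A neighbor score combines the neighbor's key score with its
    spatial proximity to the candidate value.
    """
    best, best_score = None, -1.0
    for value in candidates:
        vx, vy = phrases[value]
        for neighbor, key_score in key_scores.items():
            if neighbor == value:
                continue
            nx, ny = phrases[neighbor]
            dist = math.hypot(vx - nx, vy - ny)
            score = key_score / (1.0 + dist)  # closer, key-like neighbors win
            if score > best_score:
                best, best_score = (value, neighbor), score
    return best  # (selected value, its key phrase)
```

Because the score depends only on phrase content and relative layout, the approach works on non-fixed forms where field positions vary between documents.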
  • Patent number: 11687588
    Abstract: Systems and methods are provided for weakly supervised natural language localization (WSNLL), for example, as implemented in a neural network or model. The WSNLL network is trained with long, untrimmed videos, i.e., videos that have not been temporally segmented or annotated. The WSNLL network or model defines or generates a video-sentence pair, which corresponds to a pairing of an untrimmed video with an input text sentence. According to some embodiments, the WSNLL network or model is implemented with a two-branch architecture, where one branch performs segment sentence alignment and the other one conducts segment selection. These methods and systems are specifically used to predict how a video proposal matches a text query using respective visual and text features.
    Type: Grant
    Filed: August 5, 2019
    Date of Patent: June 27, 2023
    Assignee: Salesforce.com, Inc.
    Inventors: Mingfei Gao, Richard Socher, Caiming Xiong
  • Publication number: 20230153307
    Abstract: Embodiments described herein provide an online domain adaptation framework based on cross-domain bootstrapping, in which the target domain streaming data is deleted immediately after adaptation. At each online query, data diversity is increased across domains by bootstrapping the source domain to form diverse combinations with the current target query. To take full advantage of the valuable discrepancies among the diverse combinations, a set of independent learners is trained to preserve the differences. The knowledge of the learners is then integrated by exchanging their predicted pseudo-labels on the current target query to co-supervise learning on the target domain, but without sharing weights, so as to maintain the learners' divergence.
    Type: Application
    Filed: January 28, 2022
    Publication date: May 18, 2023
    Inventors: Luyu Yang, Mingfei Gao, Zeyuan Chen, Ran Xu, Chetan Ramaiah
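The co-supervision step described above (learners exchange pseudo-labels on the target query without sharing weights) can be sketched in a few lines. This is a simplified stand-in: the majority-vote aggregation over peers is an illustrative assumption, not the claimed training rule.

```python
from collections import Counter

def co_supervise(learner_preds):
    """Exchange pseudo-labels among independent learners.

    learner_preds: {learner_name: predicted label on the current
    target query}. Each learner receives, as its supervision target,
    the majority vote of its peers' predictions; weights are never
    shared, preserving the learners' divergence.
    Returns {learner_name: pseudo-label it is supervised with}.
    """
    targets = {}
    for name in learner_preds:
        peers = [p for n, p in learner_preds.items() if n != name]
        targets[name] = Counter(peers).most_common(1)[0][0]
    return targets
```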
  • Publication number: 20230154213
    Abstract: Embodiments described herein provide methods and systems for open-vocabulary object detection in images. Given a pre-trained vision-language model and an image-caption pair, an activation map may be computed in the image that corresponds to an object of interest mentioned in the caption. The activation map is then converted into a pseudo bounding-box label for the corresponding object category. The open-vocabulary detector is then directly supervised by these pseudo box labels, which enables training object detectors with no human-provided bounding-box annotations.
    Type: Application
    Filed: January 28, 2022
    Publication date: May 18, 2023
    Inventors: Mingfei Gao, Chen Xing
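The conversion from activation map to pseudo bounding-box label described above can be sketched minimally: threshold the map and take the bounding box of the activated region. The fixed threshold and the tight-box rule are illustrative assumptions, not the patented procedure.

```python
import numpy as np

def activation_to_pseudo_box(act_map, thresh=0.5):
    """Convert an activation map (from a vision-language model, for an
    object mentioned in the caption) into a pseudo bounding-box label
    (x0, y0, x1, y1). Returns None if no location exceeds the threshold.
    """
    ys, xs = np.nonzero(act_map >= thresh)
    if len(xs) == 0:
        return None
    # Tightest axis-aligned box around all sufficiently activated pixels.
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```

Boxes produced this way can then supervise a detector for the corresponding category without any human-drawn annotations.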
  • Publication number: 20230133690
    Abstract: An application server may receive an input document including a set of input text fields and an input key phrase querying a value for a key-value pair that corresponds to one or more of the set of input text fields. The application server may extract, using an optical character recognition model, a set of character strings and a set of two-dimensional locations of the set of character strings on a layout of the input document. After extraction, the application server may input the extracted set of character strings and the set of two-dimensional locations into a machine learned model that is trained to compute a probability that a character string corresponds to the value for the key-value pair. The application server may then identify the value for the key-value pair corresponding to the input key phrase and may output the identified value.
    Type: Application
    Filed: November 1, 2021
    Publication date: May 4, 2023
    Inventors: Mingfei Gao, Ran Xu
  • Publication number: 20220374631
    Abstract: Embodiments described herein provide a field extraction system that does not require field-level annotations for training. Specifically, the training process is bootstrapped by mining pseudo-labels from unlabeled forms using simple rules. Then, a transformer-based structure is used to model interactions between text tokens in the input form and to predict a field tag for each token accordingly. The pseudo-labels are used to supervise the transformer training. Because the pseudo-labels are noisy, a refinement module containing a sequence of branches is used to refine them. Each refinement branch conducts field tagging and generates refined labels. At each stage, a branch is optimized with the labels ensembled from all previous branches to reduce label noise.
    Type: Application
    Filed: September 24, 2021
    Publication date: November 24, 2022
    Inventors: Mingfei Gao, Zeyuan Chen, Ran Xu
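The staged refinement described above (each branch is supervised by the ensemble of labels from all previous stages) can be sketched with callables standing in for trained tagging branches. The majority-vote ensembling is an illustrative assumption; the patented branches are learned modules, not fixed functions.

```python
from collections import Counter

def refine_labels(initial_labels, branches):
    """Sequentially refine noisy pseudo-labels.

    initial_labels: one field tag per token, mined by simple rules.
    branches: callables; each maps its supervision target to its own
    refined tag sequence (standing in for a trained tagging branch).
    Each branch is supervised by the majority vote over the labels
    produced so far, reducing label noise stage by stage.
    """
    history = [list(initial_labels)]
    for branch in branches:
        # Ensemble the labels from all previous stages by majority vote.
        target = [Counter(tags).most_common(1)[0][0] for tags in zip(*history)]
        history.append(branch(target))
    # Final labels: the ensemble over every stage, including the last.
    return [Counter(tags).most_common(1)[0][0] for tags in zip(*history)]
```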
  • Publication number: 20220366317
    Abstract: Embodiments described herein provide a field extraction system that does not require field-level annotations for training. Specifically, the training process is bootstrapped by mining pseudo-labels from unlabeled forms using simple rules. Then, a transformer-based structure is used to model interactions between text tokens in the input form and to predict a field tag for each token accordingly. The pseudo-labels are used to supervise the transformer training. Because the pseudo-labels are noisy, a refinement module containing a sequence of branches is used to refine them. Each refinement branch conducts field tagging and generates refined labels. At each stage, a branch is optimized with the labels ensembled from all previous branches to reduce label noise.
    Type: Application
    Filed: September 24, 2021
    Publication date: November 17, 2022
    Inventors: Mingfei Gao, Zeyuan Chen, Ran Xu
  • Patent number: 11420623
    Abstract: Determining object importance in vehicle control systems can include obtaining, for a vehicle in operation, an image of a dynamic scene, identifying an object type associated with one or more objects in the image, determining, based on the object type and a goal associated with the vehicle, an importance metric associated with the one or more objects, and controlling the vehicle based at least in part on the importance metric associated with the one or more objects.
    Type: Grant
    Filed: March 20, 2019
    Date of Patent: August 23, 2022
    Assignee: HONDA MOTOR CO., LTD.
    Inventors: Ashish Tawari, Sujitha Catherine Martin, Mingfei Gao
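The importance metric described above (a function of object type and the vehicle's goal, used to prioritize objects for control) can be illustrated with a toy scoring function. The type weights, goal-relevance table, and distance discount here are all invented for illustration; the patent does not specify these values.

```python
def importance(obj_type, distance, goal):
    """Toy importance metric: weight an object's type by its relevance
    to the vehicle's current goal, discounted by distance."""
    type_weight = {"pedestrian": 1.0, "vehicle": 0.8, "sign": 0.5}
    goal_relevance = {("sign", "turn_left"): 1.5}  # signs matter more on turns
    w = type_weight.get(obj_type, 0.3)
    w *= goal_relevance.get((obj_type, goal), 1.0)
    return w / (1.0 + distance)

def rank_objects(objects, goal):
    """Order detected objects so that control logic attends to the
    most important ones first. objects: [{"type": ..., "dist": ...}]."""
    return sorted(objects, key=lambda o: -importance(o["type"], o["dist"], goal))
```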
  • Publication number: 20220215195
    Abstract: An online system extracts information from non-fixed form documents. The online system receives an image of a form document and obtains a set of phrases and locations of the set of phrases on the form image. For at least one field, the online system determines key scores for the set of phrases. The online system identifies a set of candidate values for the field from the set of identified phrases and identifies a set of neighbors for each candidate value from the set of identified phrases. The online system determines neighbor scores, where a neighbor score for a candidate value and a respective neighbor is determined based on the key score for the neighbor and a spatial relationship of the neighbor to the candidate value. The online system selects a candidate value and a respective neighbor based on the neighbor score as the value and key for the field.
    Type: Application
    Filed: January 4, 2021
    Publication date: July 7, 2022
    Inventors: Mingfei Gao, Zeyuan Chen, Le Xue, Ran Xu, Caiming Xiong
  • Patent number: 11260872
    Abstract: A system and method for utilizing a temporal recurrent network for online action detection that include receiving image data that is based on at least one image captured by a vehicle camera system. The system and method also include analyzing the image data to determine a plurality of image frames and outputting at least one goal-oriented action as determined during a current image frame. The system and method further include controlling a vehicle to be autonomously driven based on a naturalistic driving behavior data set that includes the at least one goal-oriented action.
    Type: Grant
    Filed: October 12, 2018
    Date of Patent: March 1, 2022
    Assignee: HONDA MOTOR CO., LTD.
    Inventors: Yi-Ting Chen, Mingze Xu, Mingfei Gao
  • Patent number: 11232308
    Abstract: Embodiments described herein provide a two-stage online detection of action start system including a classification module and a localization module. The classification module generates a set of action scores corresponding to a first video frame from the video, based on the first video frame and the video frames before the first video frame in the video. Each action score indicates a respective probability that the first video frame contains a respective action class. The localization module is coupled to the classification module for receiving the set of action scores from the classification module and generating an action-agnostic start probability that the first video frame contains an action start.
    Type: Grant
    Filed: April 25, 2019
    Date of Patent: January 25, 2022
    Assignee: salesforce.com, inc.
    Inventors: Mingfei Gao, Richard Socher, Caiming Xiong
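The two-stage design above (per-frame class scores from the classification module, an action-agnostic start probability from the localization module) can be sketched by combining the two outputs per frame. The thresholding rule is an illustrative assumption; the patent describes the two modules, not this particular fusion.

```python
import numpy as np

def detect_action_starts(action_scores, start_probs,
                         score_thresh=0.5, start_thresh=0.5):
    """Two-stage online action-start detection sketch.

    action_scores: (T, C) per-frame class probabilities from the
    classification module (computed from the current and past frames).
    start_probs: (T,) action-agnostic start probability per frame from
    the localization module.
    Returns a list of (frame_index, class_index) detected starts.
    """
    starts = []
    for t in range(len(start_probs)):
        c = int(np.argmax(action_scores[t]))  # most likely action class
        if action_scores[t, c] >= score_thresh and start_probs[t] >= start_thresh:
            starts.append((t, c))
    return starts
```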
  • Publication number: 20210357687
    Abstract: Embodiments described herein provide systems and methods for a partially supervised training model for online action detection. Specifically, the online action detection framework may include two modules that are trained jointly—a Temporal Proposal Generator (TPG) and an Online Action Recognizer (OAR). In the training phase, OAR performs both online per-frame action recognition and start point detection. At the same time, TPG generates class-wise temporal action proposals serving as noisy supervisions for OAR. TPG is then optimized with the video-level annotations. In this way, the online action detection framework can be trained with video-category labels only without pre-annotated segment-level boundary labels.
    Type: Application
    Filed: July 16, 2020
    Publication date: November 18, 2021
    Inventors: Mingfei Gao, Yingbo Zhou, Ran Xu, Caiming Xiong
  • Publication number: 20210056417
    Abstract: A method for active learning includes obtaining a set of unlabeled training samples and for each unlabeled training sample, perturbing the unlabeled training sample to generate an augmented training sample. The method includes generating, using a machine learning model, a predicted label for both samples and determining an inconsistency value for the unlabeled training sample that represents variance between the predicted labels for the unlabeled and augmented training samples. The method includes sorting the unlabeled training samples based on the inconsistency values and obtaining, for a threshold number of samples selected from the sorted unlabeled training samples, a ground truth label. The method includes selecting a current set of labeled training samples including each selected unlabeled training samples paired with the corresponding ground truth label. The method includes training, using the current set and a proper subset of unlabeled training samples, the machine learning model.
    Type: Application
    Filed: August 21, 2020
    Publication date: February 25, 2021
    Applicant: Google LLC
    Inventors: Zizhao Zhang, Tomas Jon Pfister, Sercan Omer Arik, Mingfei Gao
  • Patent number: 10902289
    Abstract: Embodiments described herein provide a two-stage online detection of action start system including a classification module and a localization module. The classification module generates a set of action scores corresponding to a first video frame from the video, based on the first video frame and the video frames before the first video frame in the video. Each action score indicates a respective probability that the first video frame contains a respective action class. The localization module is coupled to the classification module for receiving the set of action scores from the classification module and generating an action-agnostic start probability that the first video frame contains an action start.
    Type: Grant
    Filed: April 25, 2019
    Date of Patent: January 26, 2021
    Assignee: salesforce.com, inc.
    Inventors: Mingfei Gao, Richard Socher, Caiming Xiong
  • Publication number: 20200372116
    Abstract: Systems and methods are provided for weakly supervised natural language localization (WSNLL), for example, as implemented in a neural network or model. The WSNLL network is trained with long, untrimmed videos, i.e., videos that have not been temporally segmented or annotated. The WSNLL network or model defines or generates a video-sentence pair, which corresponds to a pairing of an untrimmed video with an input text sentence. According to some embodiments, the WSNLL network or model is implemented with a two-branch architecture, where one branch performs segment sentence alignment and the other one conducts segment selection.
    Type: Application
    Filed: August 5, 2019
    Publication date: November 26, 2020
    Inventors: Mingfei Gao, Richard Socher, Caiming Xiong
  • Publication number: 20200302178
    Abstract: Embodiments described herein provide a two-stage online detection of action start system including a classification module and a localization module. The classification module generates a set of action scores corresponding to a first video frame from the video, based on the first video frame and the video frames before the first video frame in the video. Each action score indicates a respective probability that the first video frame contains a respective action class. The localization module is coupled to the classification module for receiving the set of action scores from the classification module and generating an action-agnostic start probability that the first video frame contains an action start.
    Type: Application
    Filed: April 25, 2019
    Publication date: September 24, 2020
    Inventors: Mingfei Gao, Richard Socher, Caiming Xiong
  • Publication number: 20200302236
    Abstract: Embodiments described herein provide a two-stage online detection of action start system including a classification module and a localization module. The classification module generates a set of action scores corresponding to a first video frame from the video, based on the first video frame and the video frames before the first video frame in the video. Each action score indicates a respective probability that the first video frame contains a respective action class. The localization module is coupled to the classification module for receiving the set of action scores from the classification module and generating an action-agnostic start probability that the first video frame contains an action start.
    Type: Application
    Filed: April 25, 2019
    Publication date: September 24, 2020
    Inventors: Mingfei Gao, Richard Socher, Caiming Xiong
  • Publication number: 20200298847
    Abstract: Determining object importance in vehicle control systems can include obtaining, for a vehicle in operation, an image of a dynamic scene, identifying an object type associated with one or more objects in the image, determining, based on the object type and a goal associated with the vehicle, an importance metric associated with the one or more objects, and controlling the vehicle based at least in part on the importance metric associated with the one or more objects.
    Type: Application
    Filed: March 20, 2019
    Publication date: September 24, 2020
    Inventors: Ashish Tawari, Sujitha Catherine Martin, Mingfei Gao