Patents by Inventor Farley Lai
Farley Lai has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12131489
Abstract: A surveillance system is provided. The surveillance system is configured for (i) detecting and tracking persons locally for each camera input video stream using the common area anchor boxes and assigning each detected person a local track id, (ii) associating the same person in overlapping camera views with a global track id, and collecting associated track boxes as that person moves across different camera views over time using a priority queue and the local and global track ids, (iii) performing track data collection to derive a spatial transformation through matched track box spatial features of the same person over time for scene coverage, and (iv) learning a multi-camera tracker given visual features from matched track boxes of distinct people across cameras based on the derived spatial transformation.
Type: Grant
Filed: May 11, 2022
Date of Patent: October 29, 2024
Assignee: NEC Corporation
Inventors: Farley Lai, Asim Kadav, Likitha Lakshminarayanan
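The cross-camera association in step (ii) can be illustrated with a minimal sketch. The class name `GlobalRegistry`, the IoU-based matching rule, and the assumption that boxes are already warped into a shared ground plane are all illustrative simplifications, not the patented method.

```python
import heapq
from itertools import count

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

class GlobalRegistry:
    """Assign global track ids to (camera, local id) detections.

    Detections are consumed in timestamp order via a priority queue;
    overlapping boxes (assumed already in a shared coordinate frame)
    are merged under one global track id.
    """
    def __init__(self, iou_thresh=0.3):
        self.queue = []        # (timestamp, tie-breaker, camera, local_id, box)
        self.tie = count()
        self.global_ids = {}   # (camera, local_id) -> global track id
        self.last_box = {}     # global track id -> most recent box
        self.next_gid = count()
        self.iou_thresh = iou_thresh

    def push(self, t, camera, local_id, box):
        heapq.heappush(self.queue, (t, next(self.tie), camera, local_id, box))

    def flush(self):
        """Process queued detections in time order; return the association map."""
        while self.queue:
            _, _, cam, lid, box = heapq.heappop(self.queue)
            key = (cam, lid)
            if key not in self.global_ids:
                # Reuse a global id whose latest box overlaps enough.
                match = next((g for g, b in self.last_box.items()
                              if iou(box, b) >= self.iou_thresh), None)
                self.global_ids[key] = match if match is not None else next(self.next_gid)
            self.last_box[self.global_ids[key]] = box
        return self.global_ids
```

Two local tracks whose warped boxes overlap are thus collapsed into one global identity as the detections stream in by timestamp.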
-
Publication number: 20240273902
Abstract: Methods and systems of training a machine learning model include identifying an object or person related to an action in a first video. The object or person is copied from the first video to a second video to generate a third video. A machine learning model is trained using the first video and the third video.
Type: Application
Filed: February 12, 2024
Publication date: August 15, 2024
Inventors: Deep Patel, Giovanni Milione, Kai Li, Farley Lai, Erik Kruus
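The copy-from-first-to-second-video step can be sketched as below. Representing frames as nested pixel lists and using a fixed rectangular box for the actor region are assumptions for illustration; an actual system would likely use a segmentation mask and real image tensors.

```python
def copy_paste(src_video, dst_video, box):
    """Copy the pixels inside `box` (x1, y1, x2, y2) from each frame of
    src_video into the corresponding frame of dst_video, producing a
    third video. Frames are nested lists: frame[y][x] = pixel value."""
    x1, y1, x2, y2 = box
    out = []
    for src, dst in zip(src_video, dst_video):
        frame = [row[:] for row in dst]      # copy the background frame
        for y in range(y1, y2):
            frame[y][x1:x2] = src[y][x1:x2]  # paste the actor region
        out.append(frame)
    return out
```

The model is then trained on both the original first video and the composited third video, which share the action but differ in background.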
-
Patent number: 11741712
Abstract: A method for using a multi-hop reasoning framework to perform multi-step compositional long-term reasoning is presented. The method includes extracting feature maps and frame-level representations from a video stream by using a convolutional neural network (CNN), performing object representation learning and detection, linking objects through time via tracking to generate object tracks and image feature tracks, feeding the object tracks and the image feature tracks to a multi-hop transformer that hops over frames in the video stream while concurrently attending to one or more of the objects in the video stream until the multi-hop transformer arrives at a correct answer, and employing video representation learning and recognition from the objects and image context to locate a target object within the video stream.
Type: Grant
Filed: September 1, 2021
Date of Patent: August 29, 2023
Inventors: Asim Kadav, Farley Lai, Hans Peter Graf, Alexandru Niculescu-Mizil, Renqiang Min, Honglu Zhou
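The hopping idea can be caricatured in a few lines: each hop attends to the most relevant object track and folds it into the query before the next hop. The averaging update and the `score` callback are hypothetical stand-ins for the learned transformer attention.

```python
def multi_hop_attend(object_tracks, query, score, hops=3):
    """Toy multi-hop attention over tracked objects: each hop picks the
    track most relevant to the current query, then blends that track's
    feature vector into the query before the next hop.
    `object_tracks` maps track id -> feature vector;
    `score(query, feat)` returns a relevance value."""
    visited = []
    for _ in range(hops):
        best = max((t for t in object_tracks if t not in visited),
                   key=lambda t: score(query, object_tracks[t]), default=None)
        if best is None:
            break                      # fewer tracks than hops
        visited.append(best)
        feat = object_tracks[best]
        query = [(q + f) / 2 for q, f in zip(query, feat)]
    return visited, query
```

The sequence of visited tracks plays the role of the reasoning chain that ends at the target object.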
-
Publication number: 20230148017
Abstract: A method for compositional reasoning of group activity in videos with keypoint-only modality is presented. The method includes obtaining video frames from a video stream received from a plurality of video image capturing devices, extracting keypoints of all persons detected in the video frames to define keypoint data, tokenizing the keypoint data with time and segment information, clustering groups of keypoint persons in the video frames and passing the clustered groups through multi-scale prediction, and performing a prediction to provide a group activity prediction of a scene in the video frames.
Type: Application
Filed: October 5, 2022
Publication date: May 11, 2023
Inventors: Asim Kadav, Farley Lai, Hans Peter Graf, Honglu Zhou
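The tokenization step can be sketched as flattening per-frame, per-person keypoints into a token sequence that carries time and segment information. The dictionary token layout and field names are illustrative assumptions; a real model would map these to learned embeddings.

```python
def tokenize_keypoints(frames):
    """Flatten detected keypoints into a token sequence.

    `frames` maps frame index -> {person_id: [(x, y, kp_type), ...]}.
    Each token carries the coordinate plus time (frame index) and
    segment (person id) information, mirroring the time/segment
    tokenization the abstract describes.
    """
    tokens = []
    for t in sorted(frames):
        for person_id in sorted(frames[t]):
            for x, y, kp_type in frames[t][person_id]:
                tokens.append({
                    "x": x, "y": y,
                    "type": kp_type,       # e.g. "nose", "l_wrist"
                    "time": t,             # temporal position
                    "segment": person_id,  # which person the token belongs to
                })
    return tokens
```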
-
Patent number: 11620814
Abstract: Aspects of the present disclosure describe systems, methods and structures providing contextual grounding, a higher-order interaction technique to capture corresponding context between text entities and visual objects.
Type: Grant
Filed: September 8, 2020
Date of Patent: April 4, 2023
Inventors: Farley Lai, Asim Kadav, Ning Xie
-
Publication number: 20230086023
Abstract: A method for model training and deployment includes training, by a processor, a model to learn video representations with a self-supervised contrastive loss by performing progressive training in phases with an incremental number of positive instances from one or more video sequences, resetting the learning rate schedule in each of the phases, and inheriting model weights from a checkpoint from a previous training phase. The method further includes updating the trained model with the self-supervised contrastive loss given multiple positive instances obtained from Cascade K-Nearest Neighbor mining of the one or more video sequences by extracting features in different modalities to compute similarities between the one or more video sequences and selecting the top-k similar instances with features in different modalities. The method also includes fine-tuning the trained model for a downstream task.
Type: Application
Filed: September 8, 2022
Publication date: March 23, 2023
Inventors: Farley Lai, Asim Kadav, Cheng-En Wu
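The top-k positive mining step can be sketched minimally, collapsed here to a single modality for brevity. The function name `mine_positives` and the cosine ranking are assumptions; the abstract's cascade combines similarities from several modalities before selecting the top-k.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def mine_positives(anchor, candidates, k):
    """Rank candidate clips by feature similarity to the anchor and
    keep the top-k as additional positive instances for the
    contrastive loss. `candidates` maps clip id -> feature vector."""
    ranked = sorted(candidates,
                    key=lambda c: cosine(anchor, candidates[c]),
                    reverse=True)
    return ranked[:k]
```

The mined clips then join the anchor's own augmentations as positives when the contrastive loss is recomputed in the next phase.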
-
Patent number: 11600067
Abstract: Aspects of the present disclosure describe systems, methods, and structures that provide action recognition with high-order interaction with spatio-temporal object tracking. Image and object features are organized into tracks, which advantageously facilitates many possible learnable embeddings and intra/inter-track interaction(s). Operationally, our systems, methods, and structures according to the present disclosure employ an efficient high-order interaction model to learn embeddings and intra/inter object track interaction across space and time for AR. Each frame is detected by an object detector to locate visual objects. Those objects are linked through time to form object tracks. The object tracks are then organized and combined with the embeddings as the input to our model. The model is trained to generate representative embeddings and discriminative video features through high-order interaction, which is formulated as an efficient matrix operation without iterative processing delay.
Type: Grant
Filed: September 9, 2020
Date of Patent: March 7, 2023
Inventors: Farley Lai, Asim Kadav, Jie Chen
-
Publication number: 20230049770
Abstract: Methods and systems of training a neural network include training a feature extractor and a classifier using a first set of training data that includes one or more base cases. The classifier is trained with few-shot adaptation using a second set of training data, smaller than the first set of training data, while keeping parameters of the feature extractor constant.
Type: Application
Filed: July 12, 2022
Publication date: February 16, 2023
Inventors: Biplob Debnath, Srimat Chakradhar, Oliver Po, Asim Kadav, Farley Lai, Farhan Asif Chowdhury
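The two-stage recipe (frozen extractor, adapted head) can be sketched as follows. Using a nearest-centroid head and a plain callable as the "frozen" extractor are simplifications for illustration; the actual classifier architecture is not specified here.

```python
class FewShotClassifier:
    """Freeze a base-trained feature extractor; adapt only the
    classification head on a small second dataset. Here the head is a
    nearest-centroid classifier fitted to the few-shot examples."""

    def __init__(self, extractor):
        self.extract = extractor  # frozen: never updated after base training
        self.centroids = {}

    def adapt(self, few_shot_data):
        """few_shot_data: list of (raw_sample, label) pairs."""
        sums = {}
        for x, label in few_shot_data:
            f = self.extract(x)
            s, n = sums.get(label, ([0.0] * len(f), 0))
            sums[label] = ([a + b for a, b in zip(s, f)], n + 1)
        self.centroids = {lbl: [a / n for a in s]
                          for lbl, (s, n) in sums.items()}

    def predict(self, x):
        f = self.extract(x)
        dist = lambda c: sum((a - b) ** 2 for a, b in zip(f, c))
        return min(self.centroids, key=lambda lbl: dist(self.centroids[lbl]))
```

Because only the centroids change during adaptation, the extractor's parameters stay exactly as learned on the base cases.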
-
Publication number: 20220383522
Abstract: A surveillance system is provided. The surveillance system is configured for (i) detecting and tracking persons locally for each camera input video stream using the common area anchor boxes and assigning each detected person a local track id, (ii) associating the same person in overlapping camera views with a global track id, and collecting associated track boxes as that person moves across different camera views over time using a priority queue and the local and global track ids, (iii) performing track data collection to derive a spatial transformation through matched track box spatial features of the same person over time for scene coverage, and (iv) learning a multi-camera tracker given visual features from matched track boxes of distinct people across cameras based on the derived spatial transformation.
Type: Application
Filed: May 11, 2022
Publication date: December 1, 2022
Inventors: Farley Lai, Asim Kadav, Likitha Lakshminarayanan
-
Patent number: 11475590
Abstract: Aspects of the present disclosure describe systems, methods and structures for an efficient multi-person pose-tracking method that advantageously achieves state-of-the-art performance on PoseTrack datasets by only using keypoint information in a tracking step, without optical flow or convolution routines. As a consequence, our method has fewer parameters and FLOPs and achieves faster FPS. Our method benefits from our parameter-free tracking method that outperforms the commonly used bounding box propagation in top-down methods. Finally, we disclose tokenizing and embedding multi-person pose keypoint information in the transformer architecture that can be re-used for other pose tasks such as pose-based action recognition.
Type: Grant
Filed: September 9, 2020
Date of Patent: October 18, 2022
Inventors: Asim Kadav, Farley Lai, Hans Peter Graf, Michael Snower
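A keypoint-only tracking step can be sketched as greedy matching between consecutive frames by keypoint distance alone, with no flow or box propagation. The mean-Euclidean distance, the greedy order, and the `max_dist` gate are illustrative assumptions rather than the patented matching rule.

```python
def pose_distance(a, b):
    """Mean Euclidean distance between corresponding keypoints."""
    return sum(((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5
               for (ax, ay), (bx, by) in zip(a, b)) / len(a)

def track_step(prev, cur, next_id, max_dist=50.0):
    """Greedily match current-frame poses to previous tracks using
    keypoint distance only; unmatched poses start new tracks.
    `prev` maps track id -> keypoints; `cur` is a list of keypoint
    lists. Returns (assignments: pose index -> track id, next_id)."""
    assigned, used = {}, set()
    for i, kps in enumerate(cur):
        best = min((t for t in prev if t not in used),
                   key=lambda t: pose_distance(prev[t], kps), default=None)
        if best is not None and pose_distance(prev[best], kps) <= max_dist:
            assigned[i] = best
            used.add(best)
        else:
            assigned[i] = next_id
            next_id += 1
    return assigned, next_id
```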
-
Publication number: 20220319157
Abstract: A method for augmenting video sequences in a video reasoning system is presented. The method includes randomly subsampling a sequence of video frames captured from one or more video cameras, randomly reversing the subsampled sequence of video frames to define a plurality of sub-sequences of randomly reversed video frames, training, in a training mode, a video reasoning model with temporally augmented input, including the plurality of sub-sequences of randomly reversed video frames, to make predictions over temporally augmented target classes, updating parameters of the video reasoning model by a machine learning algorithm, and deploying, in an inference mode, the video reasoning model in the video reasoning system to make a final prediction related to a human action in the sequence of video frames.
Type: Application
Filed: April 4, 2022
Publication date: October 6, 2022
Inventors: Farley Lai, Asim Kadav
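The subsample-then-reverse augmentation can be sketched in a few lines. The strided subsampling scheme and the `p_reverse` parameter are assumptions for illustration; the abstract only specifies that both operations are random.

```python
import random

def temporally_augment(frames, sample_rate, p_reverse=0.5, rng=random):
    """Randomly subsample a frame sequence with a random phase, then
    reverse it with probability p_reverse. A reversed clip would be
    assigned a distinct (temporally augmented) target class so the
    model must attend to the arrow of time."""
    start = rng.randrange(sample_rate)       # random subsampling phase
    clip = frames[start::sample_rate]
    reversed_flag = rng.random() < p_reverse
    if reversed_flag:
        clip = clip[::-1]
    return clip, reversed_flag
```

At inference time the model runs on the unaugmented sequence, with the augmented target classes used only during training.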
-
Publication number: 20220237884
Abstract: A computer-implemented method is provided for action localization. The method includes converting one or more video frames into person keypoints and object keypoints. The method further includes embedding position, timestamp, instance, and type information with the person keypoints and object keypoints to obtain keypoint embeddings. The method also includes predicting, by a hierarchical transformer encoder using the keypoint embeddings, human actions and bounding box information of when and where the human actions occur in the one or more video frames.
Type: Application
Filed: January 27, 2022
Publication date: July 28, 2022
Inventors: Asim Kadav, Farley Lai, Hans Peter Graf, Yi Huang
-
Publication number: 20220101007
Abstract: A method for using a multi-hop reasoning framework to perform multi-step compositional long-term reasoning is presented. The method includes extracting feature maps and frame-level representations from a video stream by using a convolutional neural network (CNN), performing object representation learning and detection, linking objects through time via tracking to generate object tracks and image feature tracks, feeding the object tracks and the image feature tracks to a multi-hop transformer that hops over frames in the video stream while concurrently attending to one or more of the objects in the video stream until the multi-hop transformer arrives at a correct answer, and employing video representation learning and recognition from the objects and image context to locate a target object within the video stream.
Type: Application
Filed: September 1, 2021
Publication date: March 31, 2022
Inventors: Asim Kadav, Farley Lai, Hans Peter Graf, Alexandru Niculescu-Mizil, Renqiang Min, Honglu Zhou
-
Publication number: 20220083781
Abstract: A computer-implemented method is provided for compositional reasoning. The method includes producing a set of primitive predictions from an input sequence. Each of the primitive predictions is of a single action of a tracked subject to be composed in a complex action comprising multiple single actions. The method further includes performing contextual rule filtering of the primitive predictions to pass through filtered primitive predictions that interact with one or more entities of interest in the input sequence with respect to predefined contextual interaction criteria. The method includes performing, by a processor device, temporal rule matching by matching the filtered primitive predictions according to pre-defined temporal rules to identify complex event patterns in the sequence of primitive predictions.
Type: Application
Filed: September 1, 2021
Publication date: March 17, 2022
Inventors: Farley Lai, Asim Kadav, Anupriya Prasad
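A minimal form of the temporal rule matching step: treat a complex event as an ordered list of primitive action names that must appear as a (not necessarily contiguous) subsequence of the time-sorted prediction stream. The subsequence semantics is an assumed simplification of the pre-defined temporal rules.

```python
def match_complex_event(primitives, rule):
    """Return True if the filtered primitive predictions contain the
    complex pattern in `rule` as an in-order subsequence.
    `primitives` is a time-sorted list of (timestamp, action) pairs;
    `rule` is an ordered list of primitive action names."""
    it = iter(action for _, action in primitives)
    # `step in it` advances the iterator, enforcing temporal order.
    return all(step in it for step in rule)
```

Contextual rule filtering would run first, so only primitives involving the entities of interest reach this matcher.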
-
Patent number: 11250299
Abstract: A method is provided for determining entailment between an input premise and an input hypothesis of different modalities. The method includes extracting features from the input hypothesis and from an entirety of, and regions of interest in, the input premise. The method further includes deriving intra-modal relevant information while suppressing intra-modal irrelevant information, based on intra-modal interactions between elementary ones of the features of the input hypothesis and between elementary ones of the features of the input premise. The method also includes attaching cross-modal relevant information from the features of the input premise to the features of the input hypothesis to form a cross-modal representation, based on cross-modal interactions between pairs of different elementary features from different modalities.
Type: Grant
Filed: October 30, 2019
Date of Patent: February 15, 2022
Inventors: Farley Lai, Asim Kadav, Ning Xie
-
Patent number: 11087452
Abstract: A false alarm reduction system and method are provided for reducing false alarms in an automatic defect detection system. The false alarm reduction system includes a defect detection system generating a list of image boxes marking detected potential defects in an input image. The false alarm reduction system further includes a feature extractor transforming each of the image boxes in the list into a respective set of numerical features. The false alarm reduction system also includes a classifier computing, as a classification outcome for each of the image boxes, whether the detected potential defect is a true defect or a false alarm, responsive to the respective set of numerical features for each of the image boxes.
Type: Grant
Filed: January 16, 2019
Date of Patent: August 10, 2021
Inventors: Alexandru Niculescu-Mizil, Renqiang Min, Eric Cosatto, Farley Lai, Hans Peter Graf, Xavier Fontaine
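The detector -> feature extractor -> classifier pipeline can be sketched as below. The geometric features and the pluggable `classifier` callable are illustrative assumptions; a real extractor would also compute appearance features from the image crop inside each box.

```python
def box_features(box):
    """Transform an (x1, y1, x2, y2) detection box into a set of
    numerical features: area and aspect ratio as a toy example."""
    w, h = box[2] - box[0], box[3] - box[1]
    return [w * h, w / h if h else 0.0]

def reduce_false_alarms(boxes, classifier):
    """Keep only the detected boxes the classifier scores as true
    defects; the rest are suppressed as false alarms."""
    return [b for b in boxes if classifier(box_features(b))]
```

Any binary classifier (threshold rule, SVM, small neural network) can serve as the `classifier` callable.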
-
Publication number: 20210082144
Abstract: Aspects of the present disclosure describe systems, methods and structures for an efficient multi-person pose-tracking method that advantageously achieves state-of-the-art performance on PoseTrack datasets by only using keypoint information in a tracking step, without optical flow or convolution routines. As a consequence, our method has fewer parameters and FLOPs and achieves faster FPS. Our method benefits from our parameter-free tracking method that outperforms the commonly used bounding box propagation in top-down methods. Finally, we disclose tokenizing and embedding multi-person pose keypoint information in the transformer architecture that can be re-used for other pose tasks such as pose-based action recognition.
Type: Application
Filed: September 9, 2020
Publication date: March 18, 2021
Applicant: NEC LABORATORIES AMERICA, INC
Inventors: Asim KADAV, Farley LAI, Hans Peter GRAF, Michael SNOWER
-
Publication number: 20210081728
Abstract: Aspects of the present disclosure describe systems, methods and structures providing contextual grounding, a higher-order interaction technique to capture corresponding context between text entities and visual objects.
Type: Application
Filed: September 8, 2020
Publication date: March 18, 2021
Applicant: NEC LABORATORIES AMERICA, INC
Inventors: Farley LAI, Asim KADAV, Ning XIE
-
Publication number: 20210081672
Abstract: Aspects of the present disclosure describe systems, methods and structures including a network that recognizes action(s) from learned relationship(s) between various objects in video(s). Interaction(s) of objects over space and time is learned from a series of frames of the video. Object-like representations are learned directly from various 2D CNN layers by capturing the 2D CNN channels, resizing them to an appropriate dimension and then providing them to a transformer network that learns higher-order relationship(s) between them. To effectively learn object-like representations, we 1) combine channels from a first and last convolutional layer in the 2D CNN, and 2) optionally cluster the channel (feature map) representations so that channels representing the same object type are grouped together.
Type: Application
Filed: September 9, 2020
Publication date: March 18, 2021
Applicant: NEC LABORATORIES AMERICA, INC
Inventors: Asim KADAV, Farley LAI, Chhavi SHARMA
-
Publication number: 20210081673
Abstract: Aspects of the present disclosure describe systems, methods, and structures that provide action recognition with high-order interaction with spatio-temporal object tracking. Image and object features are organized into tracks, which advantageously facilitates many possible learnable embeddings and intra/inter-track interaction(s). Operationally, our systems, methods, and structures according to the present disclosure employ an efficient high-order interaction model to learn embeddings and intra/inter object track interaction across space and time for AR. Each frame is detected by an object detector to locate visual objects. Those objects are linked through time to form object tracks. The object tracks are then organized and combined with the embeddings as the input to our model. The model is trained to generate representative embeddings and discriminative video features through high-order interaction, which is formulated as an efficient matrix operation without iterative processing delay.
Type: Application
Filed: September 9, 2020
Publication date: March 18, 2021
Applicant: NEC LABORATORIES AMERICA, INC
Inventors: Farley LAI, Asim KADAV, Jie CHEN