Patents by Inventor Farley Lai
Farley Lai has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12131489
Abstract: A surveillance system is provided. The surveillance system is configured for (i) detecting and tracking persons locally for each camera input video stream using the common area anchor boxes and assigning each detected person a local track id, (ii) associating the same person in overlapping camera views with a global track id, and collecting associated track boxes as that person moves across different camera views over time using a priority queue and the local and global track ids, (iii) performing track data collection to derive a spatial transformation through matched track box spatial features of the same person over time for scene coverage, and (iv) learning a multi-camera tracker given visual features from matched track boxes of distinct people across cameras based on the derived spatial transformation.
Type: Grant
Filed: May 11, 2022
Date of Patent: October 29, 2024
Assignee: NEC Corporation
Inventors: Farley Lai, Asim Kadav, Likitha Lakshminarayanan
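The cross-camera association in step (ii) can be illustrated with a minimal sketch. The class name `GlobalRegistry`, the IoU-based matching rule, and the assumption that boxes are already warped into a shared ground plane are all illustrative simplifications, not the patented method.

```python
import heapq
from itertools import count

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

class GlobalRegistry:
    """Assign global track ids to (camera, local id) detections.

    Detections are consumed in timestamp order via a priority queue;
    overlapping boxes (assumed already in a shared coordinate frame)
    are merged under one global track id.
    """
    def __init__(self, iou_thresh=0.3):
        self.queue = []        # (timestamp, tie-breaker, camera, local_id, box)
        self.tie = count()
        self.global_ids = {}   # (camera, local_id) -> global track id
        self.last_box = {}     # global track id -> most recent box
        self.next_gid = count()
        self.iou_thresh = iou_thresh

    def push(self, t, camera, local_id, box):
        heapq.heappush(self.queue, (t, next(self.tie), camera, local_id, box))

    def flush(self):
        """Process queued detections in time order; return the association map."""
        while self.queue:
            _, _, cam, lid, box = heapq.heappop(self.queue)
            key = (cam, lid)
            if key not in self.global_ids:
                # Reuse a global id whose latest box overlaps enough.
                match = next((g for g, b in self.last_box.items()
                              if iou(box, b) >= self.iou_thresh), None)
                self.global_ids[key] = match if match is not None else next(self.next_gid)
            self.last_box[self.global_ids[key]] = box
        return self.global_ids
```

Two local tracks whose warped boxes overlap are thus collapsed into one global identity as the detections stream in by timestamp.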
-
Publication number: 20240273902
Abstract: Methods and systems of training a machine learning model include identifying an object or person related to an action in a first video. The object or person is copied from the first video to a second video to generate a third video. A machine learning model is trained using the first video and the third video.
Type: Application
Filed: February 12, 2024
Publication date: August 15, 2024
Inventors: Deep Patel, Giovanni Milione, Kai Li, Farley Lai, Erik Kruus
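The copy-from-first-to-second-video step can be sketched as below. Representing frames as nested pixel lists and using a fixed rectangular box for the actor region are assumptions for illustration; an actual system would likely use a segmentation mask and real image tensors.

```python
def copy_paste(src_video, dst_video, box):
    """Copy the pixels inside `box` (x1, y1, x2, y2) from each frame of
    src_video into the corresponding frame of dst_video, producing a
    third video. Frames are nested lists: frame[y][x] = pixel value."""
    x1, y1, x2, y2 = box
    out = []
    for src, dst in zip(src_video, dst_video):
        frame = [row[:] for row in dst]      # copy the background frame
        for y in range(y1, y2):
            frame[y][x1:x2] = src[y][x1:x2]  # paste the actor region
        out.append(frame)
    return out
```

The model is then trained on both the original first video and the composited third video, which share the action but differ in background.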
-
Patent number: 11741712
Abstract: A method for using a multi-hop reasoning framework to perform multi-step compositional long-term reasoning is presented. The method includes extracting feature maps and frame-level representations from a video stream by using a convolutional neural network (CNN), performing object representation learning and detection, linking objects through time via tracking to generate object tracks and image feature tracks, feeding the object tracks and the image feature tracks to a multi-hop transformer that hops over frames in the video stream while concurrently attending to one or more of the objects in the video stream until the multi-hop transformer arrives at a correct answer, and employing video representation learning and recognition from the objects and image context to locate a target object within the video stream.
Type: Grant
Filed: September 1, 2021
Date of Patent: August 29, 2023
Inventors: Asim Kadav, Farley Lai, Hans Peter Graf, Alexandru Niculescu-Mizil, Renqiang Min, Honglu Zhou
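The hopping idea can be caricatured in a few lines: each hop attends to the most relevant object track and folds it into the query before the next hop. The averaging update and the `score` callback are hypothetical stand-ins for the learned transformer attention.

```python
def multi_hop_attend(object_tracks, query, score, hops=3):
    """Toy multi-hop attention over tracked objects: each hop picks the
    track most relevant to the current query, then blends that track's
    feature vector into the query before the next hop.
    `object_tracks` maps track id -> feature vector;
    `score(query, feat)` returns a relevance value."""
    visited = []
    for _ in range(hops):
        best = max((t for t in object_tracks if t not in visited),
                   key=lambda t: score(query, object_tracks[t]), default=None)
        if best is None:
            break                      # fewer tracks than hops
        visited.append(best)
        feat = object_tracks[best]
        query = [(q + f) / 2 for q, f in zip(query, feat)]
    return visited, query
```

The sequence of visited tracks plays the role of the reasoning chain that ends at the target object.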
-
Publication number: 20230148017
Abstract: A method for compositional reasoning of group activity in videos with keypoint-only modality is presented. The method includes obtaining video frames from a video stream received from a plurality of video image capturing devices, extracting keypoints of all persons detected in the video frames to define keypoint data, tokenizing the keypoint data with time and segment information, clustering groups of keypoint persons in the video frames and passing the clustered groups through multi-scale prediction, and performing a prediction to provide a group activity prediction of a scene in the video frames.
Type: Application
Filed: October 5, 2022
Publication date: May 11, 2023
Inventors: Asim Kadav, Farley Lai, Hans Peter Graf, Honglu Zhou
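The tokenization step can be sketched as flattening per-frame, per-person keypoints into a token sequence that carries time and segment information. The dictionary token layout and field names are illustrative assumptions; a real model would map these to learned embeddings.

```python
def tokenize_keypoints(frames):
    """Flatten detected keypoints into a token sequence.

    `frames` maps frame index -> {person_id: [(x, y, kp_type), ...]}.
    Each token carries the coordinate plus time (frame index) and
    segment (person id) information, mirroring the time/segment
    tokenization the abstract describes.
    """
    tokens = []
    for t in sorted(frames):
        for person_id in sorted(frames[t]):
            for x, y, kp_type in frames[t][person_id]:
                tokens.append({
                    "x": x, "y": y,
                    "type": kp_type,       # e.g. "nose", "l_wrist"
                    "time": t,             # temporal position
                    "segment": person_id,  # which person the token belongs to
                })
    return tokens
```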
-
Patent number: 11620814
Abstract: Aspects of the present disclosure describe systems, methods and structures providing contextual grounding, a higher-order interaction technique to capture corresponding context between text entities and visual objects.
Type: Grant
Filed: September 8, 2020
Date of Patent: April 4, 2023
Inventors: Farley Lai, Asim Kadav, Ning Xie
-
Publication number: 20230086023
Abstract: A method for model training and deployment includes training, by a processor, a model to learn video representations with a self-supervised contrastive loss by performing progressive training in phases with an incremental number of positive instances from one or more video sequences, resetting the learning rate schedule in each of the phases, and inheriting model weights from a checkpoint from a previous training phase. The method further includes updating the trained model with the self-supervised contrastive loss given multiple positive instances obtained from Cascade K-Nearest Neighbor mining of the one or more video sequences by extracting features in different modalities to compute similarities between the one or more video sequences and selecting the top-k similar instances with features in different modalities. The method also includes fine-tuning the trained model for a downstream task.
Type: Application
Filed: September 8, 2022
Publication date: March 23, 2023
Inventors: Farley Lai, Asim Kadav, Cheng-En Wu
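The top-k positive mining step can be sketched minimally, collapsed here to a single modality for brevity. The function name `mine_positives` and the cosine ranking are assumptions; the abstract's cascade combines similarities from several modalities before selecting the top-k.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def mine_positives(anchor, candidates, k):
    """Rank candidate clips by feature similarity to the anchor and
    keep the top-k as additional positive instances for the
    contrastive loss. `candidates` maps clip id -> feature vector."""
    ranked = sorted(candidates,
                    key=lambda c: cosine(anchor, candidates[c]),
                    reverse=True)
    return ranked[:k]
```

The mined clips then join the anchor's own augmentations as positives when the contrastive loss is recomputed in the next phase.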
-
Patent number: 11600067
Abstract: Aspects of the present disclosure describe systems, methods, and structures that provide action recognition with high-order interaction with spatio-temporal object tracking. Image and object features are organized into tracks, which advantageously facilitates many possible learnable embeddings and intra/inter-track interaction(s). Operationally, our systems, methods, and structures according to the present disclosure employ an efficient high-order interaction model to learn embeddings and intra/inter object track interaction across space and time for AR. Each frame is detected by an object detector to locate visual objects. Those objects are linked through time to form object tracks. The object tracks are then organized and combined with the embeddings as the input to our model. The model is trained to generate representative embeddings and discriminative video features through high-order interaction, which is formulated as an efficient matrix operation without iterative processing delay.
Type: Grant
Filed: September 9, 2020
Date of Patent: March 7, 2023
Inventors: Farley Lai, Asim Kadav, Jie Chen
-
Publication number: 20230049770
Abstract: Methods and systems of training a neural network include training a feature extractor and a classifier using a first set of training data that includes one or more base cases. The classifier is trained with few-shot adaptation using a second set of training data, smaller than the first set of training data, while keeping parameters of the feature extractor constant.
Type: Application
Filed: July 12, 2022
Publication date: February 16, 2023
Inventors: Biplob Debnath, Srimat Chakradhar, Oliver Po, Asim Kadav, Farley Lai, Farhan Asif Chowdhury
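The two-stage recipe (frozen extractor, adapted head) can be sketched as follows. Using a nearest-centroid head and a plain callable as the "frozen" extractor are simplifications for illustration; the actual classifier architecture is not specified here.

```python
class FewShotClassifier:
    """Freeze a base-trained feature extractor; adapt only the
    classification head on a small second dataset. Here the head is a
    nearest-centroid classifier fitted to the few-shot examples."""

    def __init__(self, extractor):
        self.extract = extractor  # frozen: never updated after base training
        self.centroids = {}

    def adapt(self, few_shot_data):
        """few_shot_data: list of (raw_sample, label) pairs."""
        sums = {}
        for x, label in few_shot_data:
            f = self.extract(x)
            s, n = sums.get(label, ([0.0] * len(f), 0))
            sums[label] = ([a + b for a, b in zip(s, f)], n + 1)
        self.centroids = {lbl: [a / n for a in s]
                          for lbl, (s, n) in sums.items()}

    def predict(self, x):
        f = self.extract(x)
        dist = lambda c: sum((a - b) ** 2 for a, b in zip(f, c))
        return min(self.centroids, key=lambda lbl: dist(self.centroids[lbl]))
```

Because only the centroids change during adaptation, the extractor's parameters stay exactly as learned on the base cases.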
-
Publication number: 20220383522
Abstract: A surveillance system is provided. The surveillance system is configured for (i) detecting and tracking persons locally for each camera input video stream using the common area anchor boxes and assigning each detected person a local track id, (ii) associating the same person in overlapping camera views with a global track id, and collecting associated track boxes as that person moves across different camera views over time using a priority queue and the local and global track ids, (iii) performing track data collection to derive a spatial transformation through matched track box spatial features of the same person over time for scene coverage, and (iv) learning a multi-camera tracker given visual features from matched track boxes of distinct people across cameras based on the derived spatial transformation.
Type: Application
Filed: May 11, 2022
Publication date: December 1, 2022
Inventors: Farley Lai, Asim Kadav, Likitha Lakshminarayanan
-
Patent number: 11475590
Abstract: Aspects of the present disclosure describe systems, methods and structures for an efficient multi-person pose-tracking method that advantageously achieves state-of-the-art performance on PoseTrack datasets by only using keypoint information in a tracking step, without optical flow or convolution routines. As a consequence, our method has fewer parameters and FLOPs and achieves faster FPS. Our method benefits from our parameter-free tracking method that outperforms the commonly used bounding box propagation in top-down methods. Finally, we disclose tokenizing and embedding multi-person pose keypoint information in the transformer architecture that can be re-used for other pose tasks such as pose-based action recognition.
Type: Grant
Filed: September 9, 2020
Date of Patent: October 18, 2022
Inventors: Asim Kadav, Farley Lai, Hans Peter Graf, Michael Snower
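A keypoint-only tracking step can be sketched as greedy matching between consecutive frames by keypoint distance alone, with no flow or box propagation. The mean-Euclidean distance, the greedy order, and the `max_dist` gate are illustrative assumptions rather than the patented matching rule.

```python
def pose_distance(a, b):
    """Mean Euclidean distance between corresponding keypoints."""
    return sum(((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5
               for (ax, ay), (bx, by) in zip(a, b)) / len(a)

def track_step(prev, cur, next_id, max_dist=50.0):
    """Greedily match current-frame poses to previous tracks using
    keypoint distance only; unmatched poses start new tracks.
    `prev` maps track id -> keypoints; `cur` is a list of keypoint
    lists. Returns (assignments: pose index -> track id, next_id)."""
    assigned, used = {}, set()
    for i, kps in enumerate(cur):
        best = min((t for t in prev if t not in used),
                   key=lambda t: pose_distance(prev[t], kps), default=None)
        if best is not None and pose_distance(prev[best], kps) <= max_dist:
            assigned[i] = best
            used.add(best)
        else:
            assigned[i] = next_id
            next_id += 1
    return assigned, next_id
```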
-
Publication number: 20220319157
Abstract: A method for augmenting video sequences in a video reasoning system is presented. The method includes randomly subsampling a sequence of video frames captured from one or more video cameras, randomly reversing the subsampled sequence of video frames to define a plurality of sub-sequences of randomly reversed video frames, training, in a training mode, a video reasoning model with temporally augmented input, including the plurality of sub-sequences of randomly reversed video frames, to make predictions over temporally augmented target classes, updating parameters of the video reasoning model by a machine learning algorithm, and deploying, in an inference mode, the video reasoning model in the video reasoning system to make a final prediction related to a human action in the sequence of video frames.
Type: Application
Filed: April 4, 2022
Publication date: October 6, 2022
Inventors: Farley Lai, Asim Kadav
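The subsample-then-reverse augmentation can be sketched in a few lines. The strided subsampling scheme and the `p_reverse` parameter are assumptions for illustration; the abstract only specifies that both operations are random.

```python
import random

def temporally_augment(frames, sample_rate, p_reverse=0.5, rng=random):
    """Randomly subsample a frame sequence with a random phase, then
    reverse it with probability p_reverse. A reversed clip would be
    assigned a distinct (temporally augmented) target class so the
    model must attend to the arrow of time."""
    start = rng.randrange(sample_rate)       # random subsampling phase
    clip = frames[start::sample_rate]
    reversed_flag = rng.random() < p_reverse
    if reversed_flag:
        clip = clip[::-1]
    return clip, reversed_flag
```

At inference time the model runs on the unaugmented sequence, with the augmented target classes used only during training.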
-
Publication number: 20220237884
Abstract: A computer-implemented method is provided for action localization. The method includes converting one or more video frames into person keypoints and object keypoints. The method further includes embedding position, timestamp, instance, and type information with the person keypoints and object keypoints to obtain keypoint embeddings. The method also includes predicting, by a hierarchical transformer encoder using the keypoint embeddings, human actions and bounding box information of when and where the human actions occur in the one or more video frames.
Type: Application
Filed: January 27, 2022
Publication date: July 28, 2022
Inventors: Asim Kadav, Farley Lai, Hans Peter Graf, Yi Huang
-
Publication number: 20220101007
Abstract: A method for using a multi-hop reasoning framework to perform multi-step compositional long-term reasoning is presented. The method includes extracting feature maps and frame-level representations from a video stream by using a convolutional neural network (CNN), performing object representation learning and detection, linking objects through time via tracking to generate object tracks and image feature tracks, feeding the object tracks and the image feature tracks to a multi-hop transformer that hops over frames in the video stream while concurrently attending to one or more of the objects in the video stream until the multi-hop transformer arrives at a correct answer, and employing video representation learning and recognition from the objects and image context to locate a target object within the video stream.
Type: Application
Filed: September 1, 2021
Publication date: March 31, 2022
Inventors: Asim Kadav, Farley Lai, Hans Peter Graf, Alexandru Niculescu-Mizil, Renqiang Min, Honglu Zhou
-
Publication number: 20220083781
Abstract: A computer-implemented method is provided for compositional reasoning. The method includes producing a set of primitive predictions from an input sequence. Each of the primitive predictions is of a single action of a tracked subject to be composed in a complex action comprising multiple single actions. The method further includes performing contextual rule filtering of the primitive predictions to pass through filtered primitive predictions that interact with one or more entities of interest in the input sequence with respect to predefined contextual interaction criteria. The method includes performing, by a processor device, temporal rule matching by matching the filtered primitive predictions according to pre-defined temporal rules to identify complex event patterns in the sequence of primitive predictions.
Type: Application
Filed: September 1, 2021
Publication date: March 17, 2022
Inventors: Farley Lai, Asim Kadav, Anupriya Prasad
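A minimal form of the temporal rule matching step: treat a complex event as an ordered list of primitive action names that must appear as a (not necessarily contiguous) subsequence of the time-sorted prediction stream. The subsequence semantics is an assumed simplification of the pre-defined temporal rules.

```python
def match_complex_event(primitives, rule):
    """Return True if the filtered primitive predictions contain the
    complex pattern in `rule` as an in-order subsequence.
    `primitives` is a time-sorted list of (timestamp, action) pairs;
    `rule` is an ordered list of primitive action names."""
    it = iter(action for _, action in primitives)
    # `step in it` advances the iterator, enforcing temporal order.
    return all(step in it for step in rule)
```

Contextual rule filtering would run first, so only primitives involving the entities of interest reach this matcher.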
-
Patent number: 11250299
Abstract: A method is provided for determining entailment between an input premise and an input hypothesis of different modalities. The method includes extracting features from the input hypothesis and from an entirety of, and regions of interest in, the input premise. The method further includes deriving intra-modal relevant information while suppressing intra-modal irrelevant information, based on intra-modal interactions between elementary ones of the features of the input hypothesis and between elementary ones of the features of the input premise. The method also includes attaching cross-modal relevant information from the features of the input premise to the features of the input hypothesis to form a cross-modal representation, based on cross-modal interactions between pairs of different elementary features from different modalities.
Type: Grant
Filed: October 30, 2019
Date of Patent: February 15, 2022
Inventors: Farley Lai, Asim Kadav, Ning Xie
-
Patent number: 11087452
Abstract: A false alarm reduction system and method are provided for reducing false alarms in an automatic defect detection system. The false alarm reduction system includes a defect detection system generating a list of image boxes marking detected potential defects in an input image. The false alarm reduction system further includes a feature extractor transforming each of the image boxes in the list into a respective set of numerical features. The false alarm reduction system also includes a classifier computing, as a classification outcome for each of the image boxes, whether the detected potential defect is a true defect or a false alarm, responsive to the respective set of numerical features for each of the image boxes.
Type: Grant
Filed: January 16, 2019
Date of Patent: August 10, 2021
Inventors: Alexandru Niculescu-Mizil, Renqiang Min, Eric Cosatto, Farley Lai, Hans Peter Graf, Xavier Fontaine
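The detector -> feature extractor -> classifier pipeline can be sketched as below. The geometric features and the pluggable `classifier` callable are illustrative assumptions; a real extractor would also compute appearance features from the image crop inside each box.

```python
def box_features(box):
    """Transform an (x1, y1, x2, y2) detection box into a set of
    numerical features: area and aspect ratio as a toy example."""
    w, h = box[2] - box[0], box[3] - box[1]
    return [w * h, w / h if h else 0.0]

def reduce_false_alarms(boxes, classifier):
    """Keep only the detected boxes the classifier scores as true
    defects; the rest are suppressed as false alarms."""
    return [b for b in boxes if classifier(box_features(b))]
```

Any binary classifier (threshold rule, SVM, small neural network) can serve as the `classifier` callable.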
-
Publication number: 20210082144
Abstract: Aspects of the present disclosure describe systems, methods and structures for an efficient multi-person pose-tracking method that advantageously achieves state-of-the-art performance on PoseTrack datasets by only using keypoint information in a tracking step, without optical flow or convolution routines. As a consequence, our method has fewer parameters and FLOPs and achieves faster FPS. Our method benefits from our parameter-free tracking method that outperforms the commonly used bounding box propagation in top-down methods. Finally, we disclose tokenizing and embedding multi-person pose keypoint information in the transformer architecture that can be re-used for other pose tasks such as pose-based action recognition.
Type: Application
Filed: September 9, 2020
Publication date: March 18, 2021
Applicant: NEC LABORATORIES AMERICA, INC
Inventors: Asim KADAV, Farley LAI, Hans Peter GRAF, Michael SNOWER
-
Publication number: 20210081728
Abstract: Aspects of the present disclosure describe systems, methods and structures providing contextual grounding, a higher-order interaction technique to capture corresponding context between text entities and visual objects.
Type: Application
Filed: September 8, 2020
Publication date: March 18, 2021
Applicant: NEC LABORATORIES AMERICA, INC
Inventors: Farley LAI, Asim KADAV, Ning XIE
-
Publication number: 20210081672
Abstract: Aspects of the present disclosure describe systems, methods and structures including a network that recognizes action(s) from learned relationship(s) between various objects in video(s). Interaction(s) of objects over space and time is learned from a series of frames of the video. Object-like representations are learned directly from various 2D CNN layers by capturing the 2D CNN channels, resizing them to an appropriate dimension and then providing them to a transformer network that learns higher-order relationship(s) between them. To effectively learn object-like representations, we 1) combine channels from a first and last convolutional layer in the 2D CNN, and 2) optionally cluster the channel (feature map) representations so that channels representing the same object type are grouped together.
Type: Application
Filed: September 9, 2020
Publication date: March 18, 2021
Applicant: NEC LABORATORIES AMERICA, INC
Inventors: Asim KADAV, Farley LAI, Chhavi SHARMA
-
Publication number: 20210081673
Abstract: Aspects of the present disclosure describe systems, methods, and structures that provide action recognition with high-order interaction with spatio-temporal object tracking. Image and object features are organized into tracks, which advantageously facilitates many possible learnable embeddings and intra/inter-track interaction(s). Operationally, our systems, methods, and structures according to the present disclosure employ an efficient high-order interaction model to learn embeddings and intra/inter object track interaction across space and time for AR. Each frame is detected by an object detector to locate visual objects. Those objects are linked through time to form object tracks. The object tracks are then organized and combined with the embeddings as the input to our model. The model is trained to generate representative embeddings and discriminative video features through high-order interaction, which is formulated as an efficient matrix operation without iterative processing delay.
Type: Application
Filed: September 9, 2020
Publication date: March 18, 2021
Applicant: NEC LABORATORIES AMERICA, INC
Inventors: Farley LAI, Asim KADAV, Jie CHEN