Patents by Inventor Farley Lai

Farley Lai has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11741712
    Abstract: A method for using a multi-hop reasoning framework to perform multi-step compositional long-term reasoning is presented. The method includes extracting feature maps and frame-level representations from a video stream by using a convolutional neural network (CNN), performing object representation learning and detection, linking objects through time via tracking to generate object tracks and image feature tracks, feeding the object tracks and the image feature tracks to a multi-hop transformer that hops over frames in the video stream while concurrently attending to one or more of the objects in the video stream until the multi-hop transformer arrives at a correct answer, and employing video representation learning and recognition from the objects and image context to locate a target object within the video stream.
    Type: Grant
    Filed: September 1, 2021
    Date of Patent: August 29, 2023
    Inventors: Asim Kadav, Farley Lai, Hans Peter Graf, Alexandru Niculescu-Mizil, Renqiang Min, Honglu Zhou
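The multi-hop attention step in the abstract above can be pictured with a short sketch. This is a minimal, hypothetical PyTorch rendition of the idea, not the claimed method: a query repeatedly attends over per-frame object/image feature tracks for a fixed number of hops, refining itself each hop before being scored against candidate answers.

```python
import torch
import torch.nn as nn

class MultiHopAttention(nn.Module):
    """Toy multi-hop reader: a query attends over object-track features for
    several hops, refining itself after each hop (illustrative only)."""

    def __init__(self, dim: int, num_hops: int = 3, num_heads: int = 4):
        super().__init__()
        self.num_hops = num_hops
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.update = nn.GRUCell(dim, dim)

    def forward(self, query: torch.Tensor, tracks: torch.Tensor) -> torch.Tensor:
        # query:  (batch, dim)               e.g. an encoded question or target-object query
        # tracks: (batch, num_tracks, dim)   object tracks and image feature tracks over time
        q = query
        for _ in range(self.num_hops):
            attended, _ = self.attn(q.unsqueeze(1), tracks, tracks)  # one "hop" over the tracks
            q = self.update(attended.squeeze(1), q)                  # refine the query state
        return q  # final state is scored against candidate answers / target objects

model = MultiHopAttention(dim=256)
out = model(torch.randn(2, 256), torch.randn(2, 10, 256))
print(out.shape)  # torch.Size([2, 256])
```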
  • Publication number: 20230148017
    Abstract: A method for compositional reasoning of group activity in videos with keypoint-only modality is presented. The method includes obtaining video frames from a video stream received from a plurality of video image capturing devices, extracting keypoints of all persons detected in the video frames to define keypoint data, tokenizing the keypoint data with time and segment information, clustering groups of keypoint persons in the video frames and passing the clustered groups through multi-scale prediction, and performing a prediction to provide a group activity prediction of a scene in the video frames.
    Type: Application
    Filed: October 5, 2022
    Publication date: May 11, 2023
    Inventors: Asim Kadav, Farley Lai, Hans Peter Graf, Honglu Zhou
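A rough sense of the keypoint tokenization step can be given in code. The sketch below is one plausible formulation (the embedding sizes and the way time and segment information are added are assumptions, not the patented design): each person's keypoints in each frame become a token, and learned time and segment embeddings are added before the tokens reach a group-level model.

```python
import torch
import torch.nn as nn

class KeypointTokenizer(nn.Module):
    """Turn per-person keypoints into tokens carrying time and segment info
    (illustrative sketch; dimensions and composition are assumed)."""

    def __init__(self, num_joints=17, dim=128, max_frames=64, max_segments=8):
        super().__init__()
        self.proj = nn.Linear(num_joints * 2, dim)        # (x, y) per joint -> one token
        self.time_emb = nn.Embedding(max_frames, dim)     # which frame the token comes from
        self.seg_emb = nn.Embedding(max_segments, dim)    # which temporal segment it belongs to

    def forward(self, keypoints, frame_ids, segment_ids):
        # keypoints:  (batch, persons, frames, joints, 2)
        # frame_ids:  (frames,)    segment_ids: (frames,)
        b, p, t, j, _ = keypoints.shape
        tokens = self.proj(keypoints.reshape(b, p, t, j * 2))
        tokens = tokens + self.time_emb(frame_ids)[None, None] + self.seg_emb(segment_ids)[None, None]
        return tokens.reshape(b, p * t, -1)   # token sequence for a downstream transformer

tok = KeypointTokenizer()
out = tok(torch.rand(2, 5, 16, 17, 2), torch.arange(16), torch.zeros(16, dtype=torch.long))
print(out.shape)  # torch.Size([2, 80, 128])
```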
  • Patent number: 11620814
    Abstract: Aspects of the present disclosure describe systems, methods and structures providing contextual grounding—a higher-order interaction technique to capture corresponding context between text entities and visual objects.
    Type: Grant
    Filed: September 8, 2020
    Date of Patent: April 4, 2023
    Inventors: Farley Lai, Asim Kadav, Ning Xie
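Contextual grounding, as described in the abstract above, amounts to scoring text entities against visual objects while accounting for surrounding context. The snippet below is only a simplified cross-attention sketch of that idea; the module names and sizes are invented for illustration.

```python
import torch
import torch.nn as nn

class GroundingHead(nn.Module):
    """Score each text entity against each detected visual object after
    letting the text side gather visual context (simplified illustration)."""

    def __init__(self, dim=256, num_heads=4):
        super().__init__()
        self.text_to_vision = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, text_feats, obj_feats):
        # text_feats: (batch, num_entities, dim), obj_feats: (batch, num_objects, dim)
        contextual, _ = self.text_to_vision(text_feats, obj_feats, obj_feats)
        scores = contextual @ obj_feats.transpose(1, 2)    # (batch, entities, objects)
        return scores.softmax(dim=-1)                      # grounding distribution per entity

head = GroundingHead()
probs = head(torch.randn(2, 3, 256), torch.randn(2, 7, 256))
print(probs.shape)  # torch.Size([2, 3, 7])
```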
  • Publication number: 20230086023
    Abstract: A method for model training and deployment includes training, by a processor, a model to learn video representations with a self-supervised contrastive loss by performing progressive training in phases with an incremental number of positive instances from one or more video sequences, resetting the learning rate schedule in each of the phases, and inheriting model weights from a checkpoint from a previous training phase. The method further includes updating the trained model with the self-supervised contrastive loss given multiple positive instances obtained from Cascade K-Nearest Neighbor mining of the one or more video sequences by extracting features in different modalities to compute similarities between the one or more video sequences and selecting the top-k similar instances with features in different modalities. The method also includes fine-tuning the trained model for a downstream task.
    Type: Application
    Filed: September 8, 2022
    Publication date: March 23, 2023
    Inventors: Farley Lai, Asim Kadav, Cheng-En Wu
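The progressive-training idea above (phases with more positives per anchor, a reset learning-rate schedule, and weights inherited from the previous phase's checkpoint) can be outlined as a training loop. Everything below is a hypothetical stand-in: `make_loader` is an assumed data source, and the loss is a generic multi-positive InfoNCE formulation rather than the patented objective.

```python
import torch
import torch.nn.functional as F

def multi_positive_info_nce(anchor, positives, negatives, temperature=0.1):
    """Contrastive loss with several positives per anchor (one common formulation)."""
    pos = F.cosine_similarity(anchor[None], positives, dim=-1) / temperature   # (P,)
    neg = F.cosine_similarity(anchor[None], negatives, dim=-1) / temperature   # (N,)
    logits = torch.cat([pos, neg])
    # average the cross-entropy over each positive treated as the target
    return torch.stack([F.cross_entropy(logits[None], torch.tensor([i]))
                        for i in range(len(pos))]).mean()

def train_progressively(model, phases, make_loader, num_epochs=10):
    """Each phase uses more positives, resets the LR schedule, and starts
    from the previous phase's weights (hypothetical outline)."""
    state = None
    for num_positives in phases:                  # e.g. phases = [1, 2, 4]
        if state is not None:
            model.load_state_dict(state)          # inherit weights from the last checkpoint
        opt = torch.optim.SGD(model.parameters(), lr=0.03)
        sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=num_epochs)  # fresh schedule
        for _ in range(num_epochs):
            for anchor, positives, negatives in make_loader(num_positives):
                z_a = model(anchor).squeeze(0)    # anchor is a batch of one clip
                loss = multi_positive_info_nce(z_a, model(positives), model(negatives))
                opt.zero_grad(); loss.backward(); opt.step()
            sched.step()
        state = {k: v.clone() for k, v in model.state_dict().items()}
    return model
```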
  • Patent number: 11600067
    Abstract: Aspects of the present disclosure describe systems, methods, and structures that provide action recognition with high-order interaction with spatio-temporal object tracking. Image and object features are organized into tracks, which advantageously facilitates many possible learnable embeddings and intra/inter-track interaction(s). Operationally, our systems, methods, and structures according to the present disclosure employ an efficient high-order interaction model to learn embeddings and intra/inter object track interaction across space and time for AR. Each frame is processed by an object detector to locate visual objects. Those objects are linked through time to form object tracks. The object tracks are then organized and combined with the embeddings as the input to our model. The model is trained to generate representative embeddings and discriminative video features through high-order interaction which is formulated as an efficient matrix operation without iterative processing delay.
    Type: Grant
    Filed: September 9, 2020
    Date of Patent: March 7, 2023
    Inventors: Farley Lai, Asim Kadav, Jie Chen
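The "efficient matrix operation" for high-order interaction across tracks can be pictured as attention over a stacked track matrix, computed in one batched matrix product rather than an iterative loop. The code below is only an illustrative approximation using standard self-attention; the actual interaction model in the patent may differ.

```python
import torch
import torch.nn as nn

class TrackInteraction(nn.Module):
    """Self-attention over stacked object-track embeddings: every track
    interacts with every other track in a single matrix operation."""

    def __init__(self, dim=256, num_heads=8, num_classes=174):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.head = nn.Linear(dim, num_classes)      # action classes (placeholder count)

    def forward(self, tracks):
        # tracks: (batch, num_tracks * num_frames, dim) -- object and image features over time
        interacted, _ = self.attn(tracks, tracks, tracks)   # intra/inter-track interaction
        video_feat = interacted.mean(dim=1)                  # pooled discriminative video feature
        return self.head(video_feat)                         # action logits

model = TrackInteraction()
logits = model(torch.randn(2, 40, 256))
print(logits.shape)  # torch.Size([2, 174])
```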
  • Publication number: 20230049770
    Abstract: Methods and systems of training a neural network include training a feature extractor and a classifier using a first set of training data that includes one or more base cases. The classifier is trained with few-shot adaptation using a second set of training data, smaller than the first set of training data, while keeping parameters of the feature extractor constant.
    Type: Application
    Filed: July 12, 2022
    Publication date: February 16, 2023
    Inventors: Biplob Debnath, Srimat Chakradhar, Oliver Po, Asim Kadav, Farley Lai, Farhan Asif Chowdhury
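The two-stage recipe above (base training, then few-shot adaptation of only the classifier while the feature extractor stays frozen) maps directly onto a short PyTorch sketch. The model choices here (a ResNet-18 backbone, linear classifiers) are assumptions for illustration, not the claimed design.

```python
import torch
import torch.nn as nn
from torchvision import models

# Stage 1: feature extractor + classifier trained on the base cases (assumed done elsewhere).
backbone = models.resnet18(weights=None)
feature_dim = backbone.fc.in_features
backbone.fc = nn.Identity()                      # use the backbone purely as a feature extractor
base_classifier = nn.Linear(feature_dim, 10)     # base-class head

# Stage 2: few-shot adaptation -- freeze the extractor, train only a new head.
for p in backbone.parameters():
    p.requires_grad = False                      # keep feature-extractor parameters constant
novel_classifier = nn.Linear(feature_dim, 5)     # e.g. 5 novel classes with a few examples each
optimizer = torch.optim.Adam(novel_classifier.parameters(), lr=1e-3)

def adapt(few_shot_loader, epochs=20):
    backbone.eval()
    for _ in range(epochs):
        for images, labels in few_shot_loader:   # the small second training set
            with torch.no_grad():
                feats = backbone(images)
            loss = nn.functional.cross_entropy(novel_classifier(feats), labels)
            optimizer.zero_grad(); loss.backward(); optimizer.step()
```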
  • Publication number: 20220383522
    Abstract: A surveillance system is provided. The surveillance system is configured for (i) detecting and tracking persons locally for each camera input video stream using the common area anchor boxes and assigning each detected one of the persons a local track id, (ii) associating a same person in overlapping camera views to a global track id, and collecting associated track boxes as the same person moves in different camera views over time using a priority queue and the local track id and the global track id, (iii) performing track data collection to derive a spatial transformation through matched track box spatial features of a same person over time for scene coverage, and (iv) learning a multi-camera tracker given visual features from matched track boxes of distinct people across cameras based on the derived spatial transformation.
    Type: Application
    Filed: May 11, 2022
    Publication date: December 1, 2022
    Inventors: Farley Lai, Asim Kadav, Likitha Lakshminarayanan
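The local-to-global track association step above can be sketched without the learned components: each detection keeps its per-camera local track id, and tracks whose appearance features match across overlapping views share a single global id. The matching threshold and the use of cosine similarity are assumptions; the patent learns this association rather than thresholding it.

```python
import numpy as np
from itertools import count

_global_ids = count()
global_of = {}            # (camera_id, local_track_id) -> global track id
global_features = {}      # global track id -> representative appearance feature

def associate(camera_id, local_track_id, feature, threshold=0.7):
    """Assign a global id: reuse one if an existing global track looks similar,
    otherwise mint a new id (simplified stand-in for the learned tracker)."""
    key = (camera_id, local_track_id)
    if key in global_of:
        return global_of[key]
    best_gid, best_sim = None, threshold
    for gid, ref in global_features.items():
        sim = float(np.dot(feature, ref) / (np.linalg.norm(feature) * np.linalg.norm(ref)))
        if sim > best_sim:
            best_gid, best_sim = gid, sim
    gid = best_gid if best_gid is not None else next(_global_ids)
    global_of[key] = gid
    global_features.setdefault(gid, feature)
    return gid
```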
  • Patent number: 11475590
    Abstract: Aspects of the present disclosure describe systems, methods and structures for an efficient multi-person pose-tracking method that advantageously achieves state-of-the-art performance on PoseTrack datasets by only using keypoint information in a tracking step without optical flow or convolution routines. As a consequence, our method has fewer parameters and FLOPs and achieves faster FPS. Our method benefits from our parameter-free tracking method that outperforms commonly used bounding box propagation in top-down methods. Finally, we disclose tokenizing and embedding multi-person pose keypoint information in the transformer architecture that can be re-used for other pose tasks such as pose-based action recognition.
    Type: Grant
    Filed: September 9, 2020
    Date of Patent: October 18, 2022
    Inventors: Asim Kadav, Farley Lai, Hans Peter Graf, Michael Snower
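The keypoint-only tracking step can be approximated as assignment between poses in consecutive frames using nothing but keypoint distances (no optical flow, no convolutions). This is a simplified, parameter-free sketch, not the disclosed transformer-based matcher.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def pose_distance(pose_a, pose_b):
    """Mean Euclidean distance between corresponding keypoints.
    pose_*: (num_joints, 2) arrays of (x, y)."""
    return float(np.linalg.norm(pose_a - pose_b, axis=1).mean())

def match_poses(prev_poses, curr_poses, max_dist=50.0):
    """Assign current poses to previous tracks by keypoint distance only."""
    if not prev_poses or not curr_poses:
        return {}
    cost = np.array([[pose_distance(p, c) for c in curr_poses] for p in prev_poses])
    rows, cols = linear_sum_assignment(cost)
    return {c: r for r, c in zip(rows, cols) if cost[r, c] <= max_dist}

prev = [np.random.rand(17, 2) * 100, np.random.rand(17, 2) * 100]
curr = [p + np.random.randn(17, 2) for p in prev]       # slightly moved poses
print(match_poses(prev, curr))                           # e.g. {0: 0, 1: 1}
```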
  • Publication number: 20220319157
    Abstract: A method for augmenting video sequences in a video reasoning system is presented. The method includes randomly subsampling a sequence of video frames captured from one or more video cameras, randomly reversing the subsampled sequence of video frames to define a plurality of sub-sequences of randomly reversed video frames, training, in a training mode, a video reasoning model with temporally augmented input, including the plurality of sub-sequences of randomly reversed video frames, to make predictions over temporally augmented target classes, updating parameters of the video reasoning model by a machine learning algorithm, and deploying, in an inference mode, the video reasoning model in the video reasoning system to make a final prediction related to a human action in the sequence of video frames.
    Type: Application
    Filed: April 4, 2022
    Publication date: October 6, 2022
    Inventors: Farley Lai, Asim Kadav
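The augmentation itself (random subsampling followed by random reversal, with the target adjusted to a temporally augmented class) is straightforward to sketch. The label-offset convention below is an assumption about how reversed classes might be encoded, not the patented scheme.

```python
import random

def augment_sequence(frames, label, num_classes, sample_len=16):
    """Randomly subsample a frame sequence, then randomly reverse it.
    Reversed clips get an offset label so the model predicts over
    temporally augmented target classes (illustrative convention)."""
    idx = sorted(random.sample(range(len(frames)), min(sample_len, len(frames))))
    clip = [frames[i] for i in idx]
    if random.random() < 0.5:
        clip = clip[::-1]                       # reverse the subsampled clip
        label = label + num_classes             # "reversed" version of the original class
    return clip, label

frames = list(range(100))                       # stand-in for decoded video frames
clip, y = augment_sequence(frames, label=3, num_classes=174)
print(len(clip), y)
```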
  • Publication number: 20220237884
    Abstract: A computer-implemented method is provided for action localization. The method includes converting one or more video frames into person keypoints and object keypoints. The method further includes embedding position, timestamp, instance, and type information with the person keypoints and object keypoints to obtain keypoint embeddings. The method also includes predicting, by a hierarchical transformer encoder using the keypoint embeddings, human actions and bounding box information of when and where the human actions occur in the one or more video frames.
    Type: Application
    Filed: January 27, 2022
    Publication date: July 28, 2022
    Inventors: Asim Kadav, Farley Lai, Hans Peter Graf, Yi Huang
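The embedding step above combines several kinds of information into each keypoint token. A minimal sketch of one plausible composition follows; the embedding tables, the two prediction heads, and the flat (rather than hierarchical) transformer encoder are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class KeypointActionLocalizer(nn.Module):
    """Embed (position, timestamp, instance, type) per keypoint, encode with a
    transformer, and predict an action class plus a bounding box (sketch)."""

    def __init__(self, dim=128, num_actions=80, max_time=64, max_inst=16, num_types=2):
        super().__init__()
        self.pos_proj = nn.Linear(2, dim)               # (x, y) position
        self.time_emb = nn.Embedding(max_time, dim)     # timestamp (frame index)
        self.inst_emb = nn.Embedding(max_inst, dim)     # which person/object instance
        self.type_emb = nn.Embedding(num_types, dim)    # person keypoint vs. object keypoint
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.action_head = nn.Linear(dim, num_actions)
        self.box_head = nn.Linear(dim, 4)               # where the action occurs (x1, y1, x2, y2)

    def forward(self, xy, t, inst, kind):
        # all inputs: (batch, num_keypoints, ...) with xy holding 2-D positions
        tokens = self.pos_proj(xy) + self.time_emb(t) + self.inst_emb(inst) + self.type_emb(kind)
        enc = self.encoder(tokens).mean(dim=1)
        return self.action_head(enc), self.box_head(enc)

m = KeypointActionLocalizer()
logits, boxes = m(torch.rand(2, 50, 2), torch.zeros(2, 50, dtype=torch.long),
                  torch.zeros(2, 50, dtype=torch.long), torch.zeros(2, 50, dtype=torch.long))
print(logits.shape, boxes.shape)  # torch.Size([2, 80]) torch.Size([2, 4])
```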
  • Publication number: 20220101007
    Abstract: A method for using a multi-hop reasoning framework to perform multi-step compositional long-term reasoning is presented. The method includes extracting feature maps and frame-level representations from a video stream by using a convolutional neural network (CNN), performing object representation learning and detection, linking objects through time via tracking to generate object tracks and image feature tracks, feeding the object tracks and the image feature tracks to a multi-hop transformer that hops over frames in the video stream while concurrently attending to one or more of the objects in the video stream until the multi-hop transformer arrives at a correct answer, and employing video representation learning and recognition from the objects and image context to locate a target object within the video stream.
    Type: Application
    Filed: September 1, 2021
    Publication date: March 31, 2022
    Inventors: Asim Kadav, Farley Lai, Hans Peter Graf, Alexandru Niculescu-Mizil, Renqiang Min, Honglu Zhou
  • Publication number: 20220083781
    Abstract: A computer-implemented method is provided for compositional reasoning. The method includes producing a set of primitive predictions from an input sequence. Each of the primitive predictions is of a single action of a tracked subject to be composed in a complex action comprising multiple single actions. The method further includes performing contextual rule filtering of the primitive predictions to pass through filtered primitive predictions that interact with one or more entities of interest in the input sequence with respect to predefined contextual interaction criteria. The method includes performing, by a processor device, temporal rule matching by matching the filtered primitive predictions according to pre-defined temporal rules to identify complex event patterns in the sequence of primitive predictions.
    Type: Application
    Filed: September 1, 2021
    Publication date: March 17, 2022
    Inventors: Farley Lai, Asim Kadav, Anupriya Prasad
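The two rule stages above can be illustrated in plain Python: contextual filtering keeps only primitive predictions that interact with an entity of interest, and temporal matching looks for a predefined ordered pattern in what remains. The rule and record formats below are invented for illustration.

```python
def contextual_filter(primitives, entities_of_interest):
    """Keep primitive action predictions whose target entity matters
    (a stand-in for the predefined contextual interaction criteria)."""
    return [p for p in primitives if p["target"] in entities_of_interest]

def temporal_match(primitives, pattern):
    """Check whether the filtered actions occur in the order given by `pattern`
    (a simple subsequence match standing in for the temporal rules)."""
    actions = [p["action"] for p in sorted(primitives, key=lambda p: p["t"])]
    it = iter(actions)
    return all(step in it for step in pattern)

primitives = [
    {"t": 1, "action": "approach", "target": "door"},
    {"t": 3, "action": "wave",     "target": "camera"},
    {"t": 5, "action": "open",     "target": "door"},
    {"t": 9, "action": "enter",    "target": "door"},
]
filtered = contextual_filter(primitives, {"door"})
print(temporal_match(filtered, ["approach", "open", "enter"]))  # True -> complex event matched
```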
  • Patent number: 11250299
    Abstract: A method is provided for determining entailment between an input premise and an input hypothesis of different modalities. The method includes extracting features from the input hypothesis and an entirety of and regions of interest in the input premise. The method further includes deriving intra-modal relevant information while suppressing intra-modal irrelevant information, based on intra-modal interactions between elementary ones of the features of the input hypothesis and between elementary ones of the features of the input premise. The method also includes attaching cross-modal relevant information from the features of the input premise to the features of the input hypothesis to form a cross-modal representation, based on cross-modal interactions between pairs of different elementary features from different modalities.
    Type: Grant
    Filed: October 30, 2019
    Date of Patent: February 15, 2022
    Inventors: Farley Lai, Asim Kadav, Ning Xie
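The flow described above (intra-modal interaction to keep relevant features, then attachment of cross-modal information from the premise to the hypothesis) can be sketched with standard attention blocks. The three-way entailment head and all sizes are assumptions for illustration.

```python
import torch
import torch.nn as nn

class CrossModalEntailment(nn.Module):
    """Self-attention within each modality, cross-attention from the (visual)
    premise to the (textual) hypothesis, then a 3-way entailment head (sketch)."""

    def __init__(self, dim=256, num_heads=4):
        super().__init__()
        self.intra_p = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.intra_h = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.head = nn.Linear(2 * dim, 3)     # entailment / neutral / contradiction

    def forward(self, premise_feats, hypothesis_feats):
        # premise_feats: (batch, regions, dim); hypothesis_feats: (batch, words, dim)
        p, _ = self.intra_p(premise_feats, premise_feats, premise_feats)       # relevant premise info
        h, _ = self.intra_h(hypothesis_feats, hypothesis_feats, hypothesis_feats)
        h_cross, _ = self.cross(h, p, p)          # attach premise information to the hypothesis
        rep = torch.cat([h.mean(1), h_cross.mean(1)], dim=-1)
        return self.head(rep)

m = CrossModalEntailment()
print(m(torch.randn(2, 36, 256), torch.randn(2, 12, 256)).shape)  # torch.Size([2, 3])
```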
  • Patent number: 11087452
    Abstract: A false alarm reduction system and method are provided for reducing false alarms in an automatic defect detection system. The false alarm reduction system includes a defect detection system, generating a list of image boxes marking detected potential defects in an input image. The false alarm reduction system further includes a feature extractor, transforming each of the image boxes in the list into a respective set of numerical features. The false alarm reduction system also includes a classifier, computing as a classification outcome for the each of the image boxes whether the detected potential defect is a true defect or a false alarm responsive to the respective set of numerical features for each of the image boxes.
    Type: Grant
    Filed: January 16, 2019
    Date of Patent: August 10, 2021
    Inventors: Alexandru Niculescu-Mizil, Renqiang Min, Eric Cosatto, Farley Lai, Hans Peter Graf, Xavier Fontaine
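The pipeline above (detector boxes, per-box numerical features, a binary true-defect/false-alarm classifier) is simple enough to outline end to end. The hand-crafted features and the random-forest classifier below are placeholders, not the disclosed extractor or model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def box_features(image, box):
    """Turn one detected box into a small numeric feature vector
    (placeholder features: size, aspect ratio, mean/std intensity)."""
    x1, y1, x2, y2 = box
    patch = image[y1:y2, x1:x2]
    w, h = max(x2 - x1, 1), max(y2 - y1, 1)
    return np.array([w, h, w / h, patch.mean(), patch.std()])

# Classifier that decides true defect vs. false alarm from per-box features.
clf = RandomForestClassifier(n_estimators=100)

def fit_false_alarm_filter(images, boxes, labels):
    X = np.stack([box_features(img, b) for img, b in zip(images, boxes)])
    clf.fit(X, labels)                       # labels: 1 = true defect, 0 = false alarm

def keep_true_defects(image, candidate_boxes):
    X = np.stack([box_features(image, b) for b in candidate_boxes])
    return [b for b, y in zip(candidate_boxes, clf.predict(X)) if y == 1]
```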
  • Publication number: 20210081672
    Abstract: Aspects of the present disclosure describe systems, methods and structures including a network that recognizes action(s) from learned relationship(s) between various objects in video(s). Interaction(s) of objects over space and time is learned from a series of frames of the video. Object-like representations are learned directly from various 2D CNN layers by capturing the 2D CNN channels, resizing them to an appropriate dimension and then providing them to a transformer network that learns higher-order relationship(s) between them. To effectively learn object-like representations, we 1) combine channels from a first and last convolutional layer in the 2D CNN, and 2) optionally cluster the channel (feature map) representations so that channels representing the same object type are grouped together.
    Type: Application
    Filed: September 9, 2020
    Publication date: March 18, 2021
    Applicant: NEC LABORATORIES AMERICA, INC
    Inventors: Asim KADAV, Farley LAI, Chhavi SHARMA
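One way to picture the channel-as-object idea above: feature maps from an early and a late convolutional layer are resized to a common spatial size, flattened into per-channel tokens, and handed to a transformer encoder. The backbone, layer choices, and sizes are assumptions, and the optional clustering step is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models
from torchvision.models.feature_extraction import create_feature_extractor

backbone = create_feature_extractor(
    models.resnet18(weights=None), return_nodes={"layer1": "early", "layer4": "late"})

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=49, nhead=7, batch_first=True), num_layers=2)

def channel_tokens(frames):
    """frames: (batch, 3, H, W). Each CNN channel becomes one 'object-like' token."""
    feats = backbone(frames)
    tokens = []
    for name in ("early", "late"):
        f = F.adaptive_avg_pool2d(feats[name], (7, 7))   # resize to a common 7x7 grid
        tokens.append(f.flatten(2))                      # (batch, channels, 49)
    tokens = torch.cat(tokens, dim=1)                    # combine early + late channels
    return encoder(tokens)                               # higher-order relations across channels

out = channel_tokens(torch.randn(2, 3, 224, 224))
print(out.shape)  # torch.Size([2, 576, 49])
```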
  • Publication number: 20210081728
    Abstract: Aspects of the present disclosure describe systems, methods and structures providing contextual grounding—a higher-order interaction technique to capture corresponding context between text entities and visual objects.
    Type: Application
    Filed: September 8, 2020
    Publication date: March 18, 2021
    Applicant: NEC LABORATORIES AMERICA, INC
    Inventors: Farley LAI, Asim KADAV, Ning XIE
  • Publication number: 20210082144
    Abstract: Aspects of the present disclosure describe systems, methods and structures for an efficient multi-person pose-tracking method that advantageously achieves state-of-the-art performance on PoseTrack datasets by only using keypoint information in a tracking step without optical flow or convolution routines. As a consequence, our method has fewer parameters and FLOPs and achieves faster FPS. Our method benefits from our parameter-free tracking method that outperforms commonly used bounding box propagation in top-down methods. Finally, we disclose tokenizing and embedding multi-person pose keypoint information in the transformer architecture that can be re-used for other pose tasks such as pose-based action recognition.
    Type: Application
    Filed: September 9, 2020
    Publication date: March 18, 2021
    Applicant: NEC LABORATORIES AMERICA, INC
    Inventors: Asim KADAV, Farley LAI, Hans Peter GRAF, Michael SNOWER
  • Publication number: 20210081673
    Abstract: Aspects of the present disclosure describe systems, methods, and structures that provide action recognition with high-order interaction with spatio-temporal object tracking. Image and object features are organized into tracks, which advantageously facilitates many possible learnable embeddings and intra/inter-track interaction(s). Operationally, our systems, methods, and structures according to the present disclosure employ an efficient high-order interaction model to learn embeddings and intra/inter object track interaction across space and time for AR. Each frame is processed by an object detector to locate visual objects. Those objects are linked through time to form object tracks. The object tracks are then organized and combined with the embeddings as the input to our model. The model is trained to generate representative embeddings and discriminative video features through high-order interaction which is formulated as an efficient matrix operation without iterative processing delay.
    Type: Application
    Filed: September 9, 2020
    Publication date: March 18, 2021
    Applicant: NEC LABORATORIES AMERICA, INC
    Inventors: Farley LAI, Asim KADAV, Jie CHEN
  • Patent number: 10885627
    Abstract: Methods and systems for detecting and correcting anomalous inputs include training a neural network to embed high-dimensional input data into a low-dimensional space with an embedding that preserves neighbor relationships. Input data items are embedded into the low-dimensional space to form respective low-dimensional codes. An anomaly is determined among the high-dimensional input data based on the low-dimensional codes. The anomaly is corrected.
    Type: Grant
    Filed: April 1, 2019
    Date of Patent: January 5, 2021
    Inventors: Renqiang Min, Farley Lai, Eric Cosatto, Hans Peter Graf
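The detect-then-correct flow above can be approximated with off-the-shelf pieces: embed the data into a low-dimensional space, flag points whose neighborhoods look unusual, and replace them. The sketch below substitutes a generic pipeline (PCA codes plus a local-outlier score) for the patented neighbor-preserving neural embedding, so it is an analogy rather than the claimed method.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import LocalOutlierFactor, NearestNeighbors

def detect_and_correct(X, code_dim=8, contamination=0.05):
    """Embed high-dimensional rows of X into low-dimensional codes, flag
    anomalies from the codes, and 'correct' each anomaly by replacing it
    with its nearest normal neighbor (rough analogy to the patent's flow)."""
    codes = PCA(n_components=code_dim).fit_transform(X)              # low-dimensional codes
    flags = LocalOutlierFactor(contamination=contamination).fit_predict(codes)  # 1 normal, -1 anomaly
    normal_idx = np.where(flags == 1)[0]
    knn = NearestNeighbors(n_neighbors=1).fit(codes[normal_idx])
    X_corrected = X.copy()
    for i in np.where(flags == -1)[0]:
        _, j = knn.kneighbors(codes[i:i + 1])
        X_corrected[i] = X[normal_idx[j[0, 0]]]      # replace with the closest normal sample
    return X_corrected, flags
```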
  • Patent number: 10853937
    Abstract: A false alarm reduction system is provided that includes a processor cropping each input image at randomly chosen positions to form cropped images of a same size at different scales in different contexts. The system further includes a CONDA-GMM, having a first and a second conditional deep autoencoder for respectively (i) taking each cropped image without a respective center block as input for measuring a discrepancy between a reconstructed and a target center block, and (ii) taking an entirety of cropped images with the target center block. The CONDA-GMM constructs density estimates based on reconstruction error features and low-dimensional embedding representations derived from image encodings. The processor determines an anomaly existence based on a prediction of a likelihood of the anomaly existing in a framework of a CGMM, given the context being a representation of the cropped image with the center block removed and having a discrepancy above a threshold.
    Type: Grant
    Filed: January 16, 2019
    Date of Patent: December 1, 2020
    Assignee: NEC CORPORATION
    Inventors: Alexandru Niculescu-Mizil, Renqiang Min, Eric Cosatto, Farley Lai, Hans Peter Graf, Xavier Fontaine
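The core intuition above (reconstruct a crop's center block from its surrounding context and flag crops whose reconstruction falls far from the target) can be shown compactly. The small convolutional network and plain reconstruction-error score below stand in for the pair of conditional deep autoencoders and the Gaussian-mixture density estimate in the patent.

```python
import torch
import torch.nn as nn

class ContextEncoder(nn.Module):
    """Predict a crop from its context-only version with the center masked
    (a simplified stand-in for one conditional deep autoencoder)."""

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1))

    def forward(self, crop_without_center):
        return self.net(crop_without_center)      # full-crop reconstruction; only the center is scored

def anomaly_score(model, crop, center=slice(24, 40)):
    """Mask the center block, reconstruct it from context, and measure the
    discrepancy between the reconstructed and target center block."""
    masked = crop.clone()
    masked[..., center, center] = 0.0
    recon = model(masked)
    return ((recon - crop)[..., center, center] ** 2).mean().item()

model = ContextEncoder()
score = anomaly_score(model, torch.rand(1, 1, 64, 64))
print(score)  # larger discrepancy -> more likely an anomaly / true alert
```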