Patents by Inventor Samuel Schulter
Samuel Schulter has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12373484
Abstract: Systems and methods are provided for identifying and retrieving semantically similar images from a database. Semantic analysis is performed on an input query, utilizing a vision language model to identify semantic concepts associated with the query. A preliminary set of images is retrieved from the database for the identified semantic concepts. Relevant concepts are extracted from the images with a tokenizer by comparing them against a predefined label space. A ranked list of relevant concepts is generated based on occurrence frequency within the set. The preliminary set of images is refined when the user selects specific relevant concepts from the ranked list, which are combined with the input query. Additional semantic analysis is iteratively performed to retrieve further sets of images semantically similar to the combined query until a threshold condition is met.
Type: Grant
Filed: April 18, 2024
Date of Patent: July 29, 2025
Assignee: NEC Corporation
Inventors: Vijay Kumar Baikampady Gopalkrishna, Samuel Schulter, Manmohan Chandraker, Xiang Yu
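The concept-ranking and query-refinement steps above can be sketched in a few lines. This is a minimal illustration, not the patented method: `extract_concepts` stands in for the tokenizer, the label space is a plain set, and query refinement is simple string concatenation.

```python
from collections import Counter

def rank_concepts(images, label_space, extract_concepts):
    """Rank concepts found across a set of images by occurrence frequency."""
    counts = Counter()
    for img in images:
        counts.update(c for c in extract_concepts(img) if c in label_space)
    return [concept for concept, _ in counts.most_common()]

def refine_query(query, selected_concepts):
    """Combine the input query with the user-selected relevant concepts."""
    return " ".join([query, *selected_concepts])
```

In the iterative loop described by the abstract, the refined query would be re-embedded and used for the next retrieval round until the threshold condition is met.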
-
Publication number: 20250218162
Abstract: Systems and methods receive an annotated driving dataset including images capturing driving scenes and annotations including bounding boxes locating objects in the images. An image-caption dataset is obtained, including images of common scenes and captions describing the images. A specialized dataset includes data for specific rare or unseen categories. Problem-specific knowledge is used to generate a list of rare or unseen categories. Dataset tuning is performed by applying vision language model (VLM) sub-categorization, cut and paste, image generation, or caption filtering to the annotated driving dataset, the image-caption dataset, and the specialized dataset based on the problem-specific knowledge. A combined dataset includes the outputs of the dataset tuning and the annotated driving dataset. A machine learning model is trained using the combined dataset.
Type: Application
Filed: November 6, 2024
Publication date: July 3, 2025
Inventors: Samuel Schulter, Abhishek Aich, Manmohan Chandraker
-
Publication number: 20250182294
Abstract: Methods and systems for image segmentation include generating features at multiple scales from an input image using a backbone model. The features are encoded using a transformer encoder that creates a per-pixel embedding map from a high-resolution scale, with deformable attention layers operating on progressively higher-resolution scales. The features are decoded using a transformer decoder to generate a segmentation mask.
Type: Application
Filed: December 4, 2024
Publication date: June 5, 2025
Inventors: Abhishek Aich, Yumin Suh, Samuel Schulter
-
Publication number: 20250148757
Abstract: Systems and methods for a self-improving data engine for autonomous vehicles (SIDE) are presented. To train the SIDE, multi-modality dense captioning (MMDC) models can detect unrecognized classes from diversified descriptions of input images. A vision-language model (VLM) can generate textual features from the diversified descriptions and image features from the corresponding images. Curated features, including curated textual features and curated image features, can be obtained by comparing similarity scores between the textual features and top-ranked image features based on their likelihood scores. Annotations, including bounding boxes and labels, can be generated for the curated features by comparing the similarity scores of labels generated by a zero-shot classifier against the curated textual features. The SIDE can be trained using the curated features, annotations, and feedback.
Type: Application
Filed: October 30, 2024
Publication date: May 8, 2025
Inventors: Jong-Chyi Su, Sparsh Garg, Samuel Schulter, Manmohan Chandraker, Mingfu Liang
-
Publication number: 20250139527
Abstract: Systems and methods are presented for a self-improving model for agentic visual program synthesis. An agent can be continuously trained using an optimal training tuple to perform a corrective action on a monitored entity, which in turn generates new input data for the training. To train the agent, an input question can be decomposed into vision model tasks to generate task outputs. The task outputs can be corrected based on feedback to obtain corrected task outputs. The optimal training tuple can be generated by comparing an optimal tuple threshold with a similarity score of the input image, the input question, and the corrected task outputs.
Type: Application
Filed: October 29, 2024
Publication date: May 1, 2025
Inventors: Vijay Kumar Baikampady Gopalkrishna, Samuel Schulter, Manmohan Chandraker, Zaid Khan
-
Publication number: 20250118096
Abstract: Methods and systems for object detection include generating a negative description for an input image based on a positive description of the input image using a language model. A negative image is generated based on the input image and the negative description by replacing a portion of the input image that is described by the positive description with content that is described by the negative description, using a generative image model. An object detection model is trained with the input image, the positive description, the negative description, and the negative image.
Type: Application
Filed: October 2, 2024
Publication date: April 10, 2025
Inventors: Samuel Schulter, Abhishek Aich, Vijay Kumar Baikampady Gopalkrishna
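The pipeline above reduces to assembling a four-part training tuple. The sketch below is schematic only: `make_negative_desc` and `inpaint` are hypothetical stand-ins for the language model and the generative image model.

```python
def build_training_example(image, positive_desc, make_negative_desc, inpaint):
    """Assemble the (image, positive, negative, negative image) training tuple."""
    negative_desc = make_negative_desc(positive_desc)              # language model
    negative_image = inpaint(image, positive_desc, negative_desc)  # generative model
    return image, positive_desc, negative_desc, negative_image
```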
-
Publication number: 20250117947
Abstract: Methods and systems for segmentation include encoding an image using a backbone model to generate feature maps. An exit point is selected based on one of the feature maps. The feature maps are processed with a dynamic transformer encoder that includes multiple layers, exiting the encoder at the layer identified by the exit point. An output of the dynamic transformer encoder is decoded to produce a segmentation of the image.
Type: Application
Filed: September 23, 2024
Publication date: April 10, 2025
Inventors: Abhishek Aich, Yumin Suh, Samuel Schulter, Manyi Yao
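The early-exit idea can be illustrated with a toy loop, assuming (not from the abstract) that `choose_exit` maps a feature map to a layer index and that each layer is a callable:

```python
def dynamic_encode(features, layers, choose_exit):
    """Apply encoder layers in order, stopping at the chosen exit point."""
    exit_point = choose_exit(features)   # decided from a feature map
    x = features
    for i, layer in enumerate(layers):
        x = layer(x)
        if i == exit_point:
            break                        # remaining layers are skipped
    return x
```

Exiting early trades a small amount of accuracy for proportionally less compute, which is the usual motivation for dynamic-depth encoders.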
-
Publication number: 20250118053
Abstract: Systems and methods are presented for visual object detection using explicit negatives. To train an artificial intelligence model with explicit negatives, a data sampler can sample input data from a language-based dataset to select images with annotations. A negative generation engine can generate explicit negatives, representing sentences that include contradicting words semantically related to the annotations, by using an external knowledge base. A model trainer can minimize the classification loss of positive labels while decreasing the confidence scores of the explicit negatives for the artificial intelligence model. The negative generation engine can be optimized to generate the next explicit negatives. The artificial intelligence model can backpropagate using the positive labels and the next explicit negatives to generate a supervisory loss corresponding to the next explicit negatives. The artificial intelligence model can then detect objects in an input image.
Type: Application
Filed: October 3, 2024
Publication date: April 10, 2025
Inventors: Samuel Schulter, Vijay Kumar Baikampady Gopalkrishna, Yumin Suh
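The training objective (maximize positive-label confidence, suppress explicit-negative confidence) can be sketched as a simple combined loss. The (0, 1) score dictionary and the weight `alpha` are illustrative assumptions, not the claimed formulation:

```python
import math

def explicit_negative_loss(scores, positives, negatives, alpha=1.0):
    """Cross-entropy on positive labels plus a penalty on explicit negatives.

    scores maps each label or sentence to a confidence in (0, 1).
    """
    pos = -sum(math.log(scores[l]) for l in positives)        # maximize positives
    neg = -sum(math.log(1.0 - scores[l]) for l in negatives)  # suppress negatives
    return pos + alpha * neg
```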
-
Publication number: 20250118063
Abstract: Systems and methods include detecting one or more objects in an image and generating one or more captions for the image. One or more predicted categories of the detected objects are matched against the one or more captions. From the predicted categories, a category that is not successfully predicted in the image is identified. Data is curated to improve the category that is not successfully predicted. A perception model is finetuned using the curated data.
Type: Application
Filed: September 20, 2024
Publication date: April 10, 2025
Inventors: Jong-Chyi Su, Samuel Schulter, Sparsh Garg, Manmohan Chandraker, Mingfu Liang
-
Publication number: 20250118067
Abstract: Systems and methods include generating a detection output for an image over multiple iterations by applying dropout randomly to a different convolutional layer of a learning model in each iteration. The detection outputs are clustered on labels for each iteration. A total surface area of the clusters is computed over the iterations. A confidence is computed for the image using the total surface area of the clusters as an uncertainty score. A system is disabled if the confidence is below a threshold.
Type: Application
Filed: September 17, 2024
Publication date: April 10, 2025
Inventors: Sparsh Garg, Samuel Schulter, Yumin Suh
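A toy version of the uncertainty gate: boxes collected from repeated dropout passes are grouped per label, the area of each group's enclosing box measures its spread, and the system is disabled when the derived confidence is too low. The `(x1, y1, x2, y2)` box format and the area-to-confidence mapping are assumptions for illustration:

```python
def box_area(box):
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)

def cluster_bounds(boxes):
    """Smallest box enclosing every box in one label's cluster."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))

def confidence_gate(detections, threshold):
    """detections: label -> boxes gathered across dropout iterations."""
    total_area = sum(box_area(cluster_bounds(bs)) for bs in detections.values())
    confidence = 1.0 / (1.0 + total_area)   # wider spread -> lower confidence
    return confidence >= threshold          # False would disable the system
```

Intuitively, if dropout perturbations barely move the boxes, the clusters stay tight and confidence stays high; scattered boxes inflate the total area and trip the gate.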
-
Publication number: 20250118044
Abstract: Systems and methods for identifying novel objects in an image include detecting one or more objects in an image and generating one or more captions for the image. One or more predicted categories of the detected objects are matched against the one or more captions to identify, from the predicted categories, a category of a novel object in the image. An image feature and a text description feature are generated using a description of the novel object. A relevant image is selected using a similarity score between the image feature and the text description feature. A model is updated using the relevant image and the associated description of the novel object.
Type: Application
Filed: September 20, 2024
Publication date: April 10, 2025
Inventors: Jong-Chyi Su, Samuel Schulter, Sparsh Garg, Manmohan Chandraker, Mingfu Liang
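Two steps of the pipeline reduce to small set and similarity operations. The word-level caption matching and the unnormalized dot-product similarity below are deliberate simplifications of what a real detector and feature extractor would provide:

```python
def novel_categories(predicted, caption_words):
    """Caption words not covered by any predicted category."""
    return [w for w in caption_words if w not in predicted]

def select_relevant(images, text_feature, image_feature_of, min_similarity):
    """Keep images whose feature is similar enough to the text feature."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    return [img for img in images
            if dot(image_feature_of(img), text_feature) >= min_similarity]
```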
-
Publication number: 20250115276
Abstract: Methods and systems for object detection include generating a negative description for an input image of a road scene, based on a positive description of the input image, using a language model. A negative image is generated based on the input image and the negative description by replacing a portion of the input image that is described by the positive description with content that is described by the negative description, using a generative image model. An object detection model is trained with the input image, the positive description, the negative description, and the negative image. An object is identified within a driving scene using the trained object detection model. A driving action is performed in a self-driving vehicle responsive to the identified object.
Type: Application
Filed: October 2, 2024
Publication date: April 10, 2025
Inventors: Samuel Schulter, Vijay Kumar Baikampady Gopalkrishna, Yumin Suh
-
Patent number: 12254681
Abstract: Systems and methods are provided for multi-modal test-time adaptation. The method includes inputting a digital image into a pre-trained Camera Intra-modal Pseudo-label Generator, and inputting a point cloud set into a pre-trained Lidar Intra-modal Pseudo-label Generator. The method further includes applying a fast 2-dimensional (2D) model and a slow 2D model to the inputted digital image to apply pseudo-labels, and applying a fast 3-dimensional (3D) model and a slow 3D model to the inputted point cloud set to apply pseudo-labels. The method further includes fusing the pseudo-label predictions from the fast models and the slow models through an Inter-modal Pseudo-label Refinement module to obtain robust pseudo-labels, and measuring a prediction consistency for the pseudo-labels.
Type: Grant
Filed: September 6, 2022
Date of Patent: March 18, 2025
Assignee: NEC Corporation
Inventors: Yi-Hsuan Tsai, Bingbing Zhuang, Samuel Schulter, Buyu Liu, Sparsh Garg, Ramin Moslemi, Inkyu Shin
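The fusion and consistency steps can be illustrated schematically. Equal-weight averaging of per-class scores and top-class agreement are simplifying assumptions, not the refinement module claimed in the patent:

```python
def fuse_pseudo_labels(predictions):
    """Average per-class scores from the fast/slow 2D and 3D models."""
    classes = predictions[0].keys()
    return {c: sum(p[c] for p in predictions) / len(predictions) for c in classes}

def consistent(predictions):
    """True when every model agrees on the top-scoring class."""
    return len({max(p, key=p.get) for p in predictions}) == 1
```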
-
Patent number: 12205356
Abstract: Methods and systems for detecting faults include capturing an image of a scene using a camera. The image is embedded using a segmentation model that includes an image branch, having an image embedding layer that embeds images into a joint latent space, and a text branch, having a text embedding layer that embeds text into the joint latent space. Semantic information is generated for a region of the image corresponding to a predetermined static object using the embedded image. A fault of the camera is identified based on a discrepancy between the generated semantic information and the semantic information of the predetermined static object. The fault of the camera is corrected.
Type: Grant
Filed: March 23, 2023
Date of Patent: January 21, 2025
Assignee: NEC Corporation
Inventors: Samuel Schulter, Sparsh Garg, Manmohan Chandraker
-
Publication number: 20240379234
Abstract: Methods and systems for visual question answering include decomposing an initial question to generate a sub-question. The initial question and an image are applied to a visual question answering model to generate an answer and a confidence score. If the confidence score is determined to be below a threshold value, the sub-question is applied to the visual question answering model to generate a final answer.
Type: Application
Filed: May 9, 2024
Publication date: November 14, 2024
Inventors: Vijay Kumar Baikampady Gopalkrishnan, Samuel Schulter, Manmohan Chandraker
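The confidence-gated fallback is easy to sketch. The `(answer, confidence)` interface and the `decompose` helper are assumptions for illustration, not the claimed system:

```python
def answer_with_fallback(vqa, image, question, decompose, threshold=0.5):
    """Fall back to the decomposed sub-question when confidence is low."""
    answer, confidence = vqa(image, question)
    if confidence < threshold:
        answer, confidence = vqa(image, decompose(question))
    return answer
```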
-
Publication number: 20240378874
Abstract: Systems and methods are provided for multi-dataset panoptic segmentation, including processing received images from multiple datasets to extract multi-scale features using a backbone network, each of the multiple datasets including a unique label space; generating text embeddings for class names from the unique label space of each dataset; and integrating the text embeddings with visual features extracted from the received images to create a unified semantic space. A transformer-based segmentation model is trained using the unified semantic space to predict segmentation masks and classes for the received images, and a unified panoptic segmentation map is generated from the predicted segmentation masks and classes by performing inference using a panoptic inference algorithm.
Type: Application
Filed: May 9, 2024
Publication date: November 14, 2024
Inventors: Samuel Schulter, Abhishek Aich
-
Publication number: 20240378454
Abstract: Systems and methods are presented for optimizing models for open-vocabulary detection. Region proposals can be obtained by employing a pre-trained vision-language model and a pre-trained region proposal network. Object feature predictions can be obtained by employing a trained teacher neural network on the region proposals. Object feature predictions above a threshold can be filtered to obtain pseudo labels. A student neural network with a split-and-fusion detection head can be trained utilizing the region proposals, base ground-truth class labels, and the pseudo labels. The pseudo labels can be optimized by reducing their noise, employing the trained split-and-fusion detection head of the trained student neural network to obtain optimized object detections. An action can be performed relative to a scene layout based on the optimized object detections.
Type: Application
Filed: May 9, 2024
Publication date: November 14, 2024
Inventors: Samuel Schulter, Yumin Suh, Manmohan Chandraker, Vijay Kumar Baikampady Gopalkrishna
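The pseudo-label filtering step amounts to a simple threshold on teacher scores. The flat `(label, score)` prediction format below is an illustrative assumption:

```python
def filter_pseudo_labels(teacher_predictions, threshold):
    """Keep teacher predictions whose score clears the threshold."""
    return [(label, score) for label, score in teacher_predictions
            if score > threshold]
```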
-
Patent number: 12131422
Abstract: A method for achieving high-fidelity novel view synthesis and 3D reconstruction for large-scale scenes is presented. The method includes obtaining images from a video stream received from a plurality of video image capturing devices; grouping the images into different image clusters representing a large-scale 3D scene; training a neural radiance field (NeRF) and an uncertainty multilayer perceptron (MLP) for each of the image clusters to generate a plurality of NeRFs and a plurality of uncertainty MLPs for the large-scale 3D scene; applying a rendering loss and an entropy loss to the plurality of NeRFs; performing uncertainty-based fusion on the plurality of NeRFs to define a fused NeRF; jointly fine-tuning the plurality of NeRFs and the plurality of uncertainty MLPs; and, during inference, applying the fused NeRF for novel view synthesis of the large-scale 3D scene.
Type: Grant
Filed: October 11, 2022
Date of Patent: October 29, 2024
Assignee: NEC Corporation
Inventors: Bingbing Zhuang, Samuel Schulter, Yi-Hsuan Tsai, Buyu Liu, Nanbo Li
-
Publication number: 20240354583
Abstract: Methods and systems for training a model include annotating a subset of an unlabeled training dataset, which includes images of road scenes, with labels. A road defect detection model is iteratively trained, including adding pseudo-labels to the remaining examples from the unlabeled training dataset and training the road defect detection model based on the labels and the pseudo-labels.
Type: Application
Filed: March 25, 2024
Publication date: October 24, 2024
Inventors: Sparsh Garg, Samuel Schulter, Bingbing Zhuang, Manmohan Chandraker
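The iterative training loop described above follows the familiar self-training pattern. The `train` and `predict` callables below are hypothetical stand-ins for the road defect detector:

```python
def self_train(labeled, unlabeled, train, predict, rounds=3):
    """Train on labels, pseudo-label the rest, and retrain on the union."""
    model = train(labeled)
    for _ in range(rounds):
        pseudo = [(x, predict(model, x)) for x in unlabeled]
        model = train(labeled + pseudo)
    return model
```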
-
Publication number: 20240354336
Abstract: Systems and methods are provided for identifying and retrieving semantically similar images from a database. Semantic analysis is performed on an input query, utilizing a vision language model to identify semantic concepts associated with the query. A preliminary set of images is retrieved from the database for the identified semantic concepts. Relevant concepts are extracted from the images with a tokenizer by comparing them against a predefined label space. A ranked list of relevant concepts is generated based on occurrence frequency within the set. The preliminary set of images is refined when the user selects specific relevant concepts from the ranked list, which are combined with the input query. Additional semantic analysis is iteratively performed to retrieve further sets of images semantically similar to the combined query until a threshold condition is met.
Type: Application
Filed: April 18, 2024
Publication date: October 24, 2024
Inventors: Vijay Kumar Baikampady Gopalkrishna, Samuel Schulter, Manmohan Chandraker, Xiang Yu