Patents by Inventor Samuel Schulter

Samuel Schulter has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12205356
    Abstract: Methods and systems for detecting faults include capturing an image of a scene using a camera. The image is embedded using a segmentation model that includes an image branch having an image embedding layer that embeds images into a joint latent space and a text branch having a text embedding layer that embeds text into the joint latent space. Semantic information is generated for a region of the image corresponding to a predetermined static object using the embedded image. A fault of the camera is identified based on a discrepancy between the semantic information and semantic information of the predetermined static object. The fault of the camera is corrected.
    Type: Grant
    Filed: March 23, 2023
    Date of Patent: January 21, 2025
    Assignee: NEC Corporation
    Inventors: Samuel Schulter, Sparsh Garg, Manmohan Chandraker
  • Publication number: 20240378454
    Abstract: Systems and methods for optimizing models for open-vocabulary detection. Region proposals can be obtained by employing a pre-trained vision-language model and a pre-trained region proposal network. Object feature predictions can be obtained by employing a trained teacher neural network with the region proposals. Object feature predictions can be filtered above a threshold to obtain pseudo labels. A student neural network with a split-and-fusion detection head can be trained by utilizing the region proposals, base ground truth class labels and the pseudo labels. The pseudo labels can be optimized by reducing the noise from the pseudo labels by employing the trained split-and-fusion detection head of the trained student neural network to obtain optimized object detections. An action can be performed relative to a scene layout based on the optimized object detections.
    Type: Application
    Filed: May 9, 2024
    Publication date: November 14, 2024
    Inventors: Samuel Schulter, Yumin Suh, Manmohan Chandraker, Vijay Kumar Baikampady Gopalkrishna
  • Publication number: 20240379234
    Abstract: Methods and systems for visual question answering include decomposing an initial question to generate a sub-question. The initial question and an image are applied to a visual question answering model to generate an answer and a confidence score. It is determined that the confidence score is below a threshold value. The sub-question is applied to the visual question answering model, responsive to the determination that the confidence score is below a threshold value, to generate a final answer.
    Type: Application
    Filed: May 9, 2024
    Publication date: November 14, 2024
    Inventors: Vijay Kumar Baikampady Gopalkrishna, Samuel Schulter, Manmohan Chandraker
  • Publication number: 20240378874
    Abstract: Systems and methods are provided for multi-dataset panoptic segmentation, including processing received images from multiple datasets to extract multi-scale features using a backbone network, each of the multiple datasets including a unique label space, generating text-embeddings for class names from the unique label space for each of the multiple datasets, and integrating the text-embeddings with visual features extracted from the received images to create a unified semantic space. A transformer-based segmentation model is trained using the unified semantic space to predict segmentation masks and classes for the received images, and a unified panoptic segmentation map is generated from the predicted segmentation masks and classes by performing inference using a panoptic inference algorithm.
    Type: Application
    Filed: May 9, 2024
    Publication date: November 14, 2024
    Inventors: Samuel Schulter, Abhishek Aich
  • Patent number: 12131422
    Abstract: A method for achieving high-fidelity novel view synthesis and 3D reconstruction for large-scale scenes is presented. The method includes obtaining images from a video stream received from a plurality of video image capturing devices, grouping the images into different image clusters representing a large-scale 3D scene, training a neural radiance field (NeRF) and an uncertainty multilayer perceptron (MLP) for each of the image clusters to generate a plurality of NeRFs and a plurality of uncertainty MLPs for the large-scale 3D scene, applying a rendering loss and an entropy loss to the plurality of NeRFs, performing uncertainty-based fusion to the plurality of NeRFs to define a fused NeRF, and jointly fine-tuning the plurality of NeRFs and the plurality of uncertainty MLPs, and during inference, applying the fused NeRF for novel view synthesis of the large-scale 3D scene.
    Type: Grant
    Filed: October 11, 2022
    Date of Patent: October 29, 2024
    Assignee: NEC Corporation
    Inventors: Bingbing Zhuang, Samuel Schulter, Yi-Hsuan Tsai, Buyu Liu, Nanbo Li
  • Publication number: 20240355102
    Abstract: Systems and methods for traffic violation prediction. The systems and methods include obtaining a plurality of bounding boxes of road scene categories from an input dataset by employing a pre-trained detection model. A plurality of pseudo-labels of road scene categories for the plurality of bounding boxes can be obtained by employing the pre-trained detection model. A labeled dataset can be obtained by filtering the input dataset for images having the plurality of pseudo-labels and the plurality of bounding boxes. A traffic violation prediction model can be trained with both unlabeled and labeled datasets including the road scene categories obtained from the pre-trained detection model to predict simultaneous traffic violations of one or more riders in a road scene.
    Type: Application
    Filed: March 19, 2024
    Publication date: October 24, 2024
    Inventors: Sparsh Garg, Samuel Schulter
  • Publication number: 20240354336
    Abstract: Systems and methods are provided for identifying and retrieving semantically similar images from a database. Semantic analysis is performed on an input query utilizing a vision language model to identify semantic concepts associated with the input query. A preliminary set of images is retrieved from the database for semantic concepts identified. Relevant concepts are extracted for images with a tokenizer by comparing images against a predefined label space to identify relevant concepts. A ranked list of relevant concepts is generated based on occurrence frequency within the set. The preliminary set of images is refined based on selecting specific relevant concepts from the ranked list by the user by combining the input query with the specific relevant concepts. Additional semantic analysis is iteratively performed to retrieve additional sets of images semantically similar to the combined input query and selection of the specific relevant concepts until a threshold condition is met.
    Type: Application
    Filed: April 18, 2024
    Publication date: October 24, 2024
    Inventors: Vijay Kumar Baikampady Gopalkrishna, Samuel Schulter, Manmohan Chandraker, Xiang Yu
  • Publication number: 20240354583
    Abstract: Methods and systems for training a model include annotating a subset of an unlabeled training dataset, that includes images of road scenes, with labels. A road defect detection model is iteratively trained, including adding pseudo-labels to a remainder of examples from the unlabeled training dataset and training the road defect detection model based on the labels and the pseudo-labels.
    Type: Application
    Filed: March 25, 2024
    Publication date: October 24, 2024
    Inventors: Sparsh Garg, Samuel Schulter, Bingbing Zhuang, Manmohan Chandraker
  • Publication number: 20240354921
    Abstract: Systems and methods for road defect level prediction. A depth map is obtained from an image dataset received from input peripherals by employing a vision transformer model. A plurality of semantic maps is obtained from the image dataset by employing a semantic segmentation model to give pixel-wise segmentation results of road scenes to detect road pixels. Regions of interest (ROI) are detected by utilizing the road pixels. Road defect levels are predicted by fitting the ROI and the depth map into a road surface model to generate road points classified into road defect levels. The predicted road defect levels are visualized on a road map.
    Type: Application
    Filed: March 26, 2024
    Publication date: October 24, 2024
    Inventors: Sparsh Garg, Bingbing Zhuang, Samuel Schulter, Manmohan Chandraker
  • Publication number: 20240355090
    Abstract: Systems and methods are provided for matching one or more images using conditional similarity pseudo-labels, including analyzing an unlabeled dataset of images, accessing a foundational vision-language model trained on a plurality of image-text pairs, and defining a set of attributes each comprising multiple possible values for generating pseudo-labels based on notions of similarity (NoS). Text prompts are generated for each attribute value using a prompt template and encoding the text prompts using a text encoder of the foundational model. Each image in the dataset of images is processed through a vision encoder of the foundational model to obtain visual features, the visual features are compared against encoded text prompts to assign a pseudo-label for each attribute for each image, and a conditional similarity network (CSN) is trained with the pseudo-labeled images to generate a conditional similarity model.
    Type: Application
    Filed: April 18, 2024
    Publication date: October 24, 2024
    Inventors: Vijay Kumar Baikampady Gopalkrishna, Samuel Schulter, Xiang Yu, Manmohan Chandraker
  • Publication number: 20240160927
    Abstract: Systems and methods for performing multiple tasks with a single artificial intelligence model that can include training a supernet model for an application by splitting the application into tasks, and splitting the supernet model into subnets. The methods and systems can further assign the tasks computing budgets, and match the tasks to subnets by matching the computing budget of the tasks to the computing capacity of the subnets. Further, the methods and systems can perform the tasks with matching subnets to produce parameters that are used by the supernet to perform the application. The supernet combines all of the tasks to produce a model for the application and the supernet retains weights for the tasks to be used in subsequent applications.
    Type: Application
    Filed: November 7, 2023
    Publication date: May 16, 2024
    Inventors: Yumin Suh, Samuel Schulter, Xiang Yu, Abhishek Aich
  • Publication number: 20240152767
    Abstract: Systems and methods for training a visual question answer model include training a teacher model by performing image conditional visual question generation on a visual language model (VLM) and a targeted visual question answer dataset using images to generate question and answer pairs. Unlabeled images are pseudolabeled using the teacher model to decode synthetic question and answer pairs for the unlabeled images. The synthetic question and answer pairs for the unlabeled images are merged with real data from the targeted visual question answer dataset to generate a self-augmented training set. A student model is trained using the VLM and the self-augmented training set to return visual answers to text queries.
    Type: Application
    Filed: October 30, 2023
    Publication date: May 9, 2024
    Inventors: Vijay Kumar Baikampady Gopalkrishna, Samuel Schulter, Xiang Yu, Zaid Khan, Manmohan Chandraker
  • Publication number: 20240153250
    Abstract: Methods and systems for training a model include training a size estimation model to generate an estimated object size using a training dataset with differing levels of annotation. Two-dimensional object detection is performed on a training image to identify an object. The training image is cropped around the object. A category-level shape reconstruction is generated using a neural radiance field model. A normalized coordinate model is trained using the training image and ground truth information from the category-level shape reconstruction.
    Type: Application
    Filed: November 1, 2023
    Publication date: May 9, 2024
    Inventors: Bingbing Zhuang, Samuel Schulter, Buyu Liu, Zhixiang Min
  • Publication number: 20240153251
    Abstract: Methods and systems for training a model include performing two-dimensional object detection on a training image to identify an object. The training image is cropped around the object. A category-level shape reconstruction is generated using a neural radiance field model. A normalized coordinate model is trained using the training image and ground truth information from the category-level shape reconstruction.
    Type: Application
    Filed: November 1, 2023
    Publication date: May 9, 2024
    Inventors: Bingbing Zhuang, Samuel Schulter, Buyu Liu, Zhixiang Min
  • Publication number: 20240078816
    Abstract: A computer-implemented method for training a neural network to predict object categories without manual annotation is provided. The method includes feeding training datasets including at least images and data annotations to an object detection neural network, converting, by a text prompter, the data annotations into natural text inputs, converting, by a text embedder, the natural text inputs into embeddings, minimizing objective functions during training to adjust parameters of the object detection neural network, and predicting, by the object detection neural network, objects within images and videos.
    Type: Application
    Filed: August 11, 2023
    Publication date: March 7, 2024
    Inventors: Samuel Schulter, Vijay Kumar Baikampady Gopalkrishna, Yumin Suh, Shiyu Zhao
  • Publication number: 20240071105
    Abstract: Methods and systems for training a model include pre-training a backbone model with a pre-training decoder, using an unlabeled dataset with multiple distinct sensor data modalities that derive from different sensor types. The backbone model is fine-tuned with an output decoder after pre-training, using a labeled dataset with the multiple modalities.
    Type: Application
    Filed: August 22, 2023
    Publication date: February 29, 2024
    Inventors: Samuel Schulter, Bingbing Zhuang, Vijay Kumar Baikampady Gopalkrishna, Sparsh Garg, Zhixing Zhang
  • Publication number: 20240071092
    Abstract: A computer-implemented method for detecting objects within an advanced driver assistance system (ADAS) is provided. The method includes obtaining road scene datasets from a plurality of cameras, including at least road scene images and road scene data annotations, to be provided to an object detection neural network communicating with an open-vocabulary detector of a vehicle, converting, by a text prompter, the road scene data annotations into natural text inputs, converting, by a text embedder, the natural text inputs into embeddings, minimizing objective functions during training to adjust parameters of the object detection neural network, and detecting, by the object detection neural network, objects within the road scene datasets to provide alerts or notifications to a driver of the vehicle pertaining to the detected objects.
    Type: Application
    Filed: August 11, 2023
    Publication date: February 29, 2024
    Inventors: Samuel Schulter, Vijay Kumar Baikampady Gopalkrishna, Yumin Suh
  • Publication number: 20230281963
    Abstract: A method is provided for pretraining vision and language models that includes receiving image-text pairs, each including an image and a text describing the image. The method encodes an image into a set of feature vectors corresponding to input image patches and a CLS token which represents a global image feature. The method parses, by a text tokenizer, the text into a set of feature vectors as tokens for each word in the text. The method encodes the CLS token from the NN based visual encoder and the tokens from the text tokenizer into a set of features by a NN based text and multimodal encoder that shares weights for encoding both the CLS token and the tokens. The method accumulates the weights from multiple iterations as an exponential moving average of the weights during the pretraining until a predetermined error threshold is reduced to be under a threshold amount.
    Type: Application
    Filed: February 28, 2023
    Publication date: September 7, 2023
    Inventors: Vijay Kumar Baikampady Gopalkrishna, Xiang Yu, Samuel Schulter
  • Publication number: 20230281999
    Abstract: Methods and systems for identifying road hazards include capturing an image of a road scene using a camera. The image is embedded using a segmentation model that includes an image branch having an image embedding layer that embeds images into a joint latent space and a text branch having a text embedding layer that embeds text into the joint latent space. A mask is generated for an object within the image using the segmentation model. A probability is determined that the object matches a road hazard using the segmentation model. A signal is generated responsive to the probability to ameliorate a danger posed by the road hazard.
    Type: Application
    Filed: March 23, 2023
    Publication date: September 7, 2023
    Inventors: Samuel Schulter, Sparsh Garg
  • Publication number: 20230281858
    Abstract: A method for object detection obtains, from a set of RGB images lacking annotations, a set of regions that include potential objects, a bounding box, and an objectness score indicating a region prediction confidence. The method obtains, by a region scorer for each region in the set, a category from a fixed set of categories and a confidence for the category responsive to the objectness score. The method duplicates each region in the set to obtain a first and a second patch. The method encodes the patches to obtain an image vector. The method encodes a template sentence using the category to obtain a text vector for each category. The method compares the image vector to the text vector via a similarity function to obtain a similarity probability based on the confidence. The method defines a final set of pseudo labels based on the similarity probability being above a threshold.
    Type: Application
    Filed: February 21, 2023
    Publication date: September 7, 2023
    Inventors: Samuel Schulter, Vijay Kumar Baikampady Gopalkrishna
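
Several of the abstracts above rely on the same general idea: compare an image embedding with text embeddings of category names, then keep only predictions whose score is above a threshold as pseudo labels. The sketch below illustrates that general idea only and is not the method claimed in any of the listed filings. The function names, the hard-coded toy vectors, the softmax temperature, and the threshold value are all assumptions made for illustration; a real system would obtain the vectors from trained image and text encoders.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def softmax(scores, temperature=1.0):
    """Turn similarity scores into probabilities (temperature is an assumption)."""
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_labels(region_vectors, text_vectors, threshold=0.8, temperature=0.1):
    """Keep (region index, category, probability) when the best probability exceeds threshold.

    region_vectors: list of image embeddings, one per candidate region (hypothetical input).
    text_vectors: dict mapping category name to its text embedding (hypothetical input).
    """
    categories = list(text_vectors)
    kept = []
    for index, vector in enumerate(region_vectors):
        sims = [cosine(vector, text_vectors[name]) for name in categories]
        probs = softmax(sims, temperature)
        best = max(range(len(categories)), key=probs.__getitem__)
        if probs[best] > threshold:
            kept.append((index, categories[best], probs[best]))
    return kept


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for encoder outputs.
    texts = {"car": [1.0, 0.0], "person": [0.0, 1.0]}
    regions = [[1.0, 0.0], [1.0, 1.0]]  # second region is ambiguous
    for index, name, prob in pseudo_labels(regions, texts):
        print(index, name, round(prob, 3))
```

In this toy run the first region matches one category clearly and is kept, while the second region is equally similar to both categories and is dropped, which is the filtering behavior a threshold is meant to provide.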