Patents by Inventor VISHAL CHUDASAMA

VISHAL CHUDASAMA has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230281980
    Abstract: A system and method for end-to-end semi-supervised object detection is provided. The system retrieves labeled and unlabeled images from an image dataset and generates an input batch by application of image transformation(s) on the images. The system further generates a first result for each image of the input batch by application of a teacher neural network on the input batch. For an object in an unlabeled image of the batch, the first result includes candidate bounding boxes and corresponding confidence scores. The system determines a threshold score based on the scores and selects a foreground bounding box from the candidates. The system generates a second result by application of a student neural network on the unlabeled image and computes a training loss over the input batch based on the foreground bounding box and the second result. The system trains the student neural network based on the training loss. A minimal illustrative sketch of this teacher-student pseudo-labeling step follows this entry.
    Type: Application
    Filed: January 25, 2023
    Publication date: September 7, 2023
    Inventors: PANKAJ WASNIK, NAOYUKI ONOE, VISHAL CHUDASAMA, PURBAYAN KAR
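
The pseudo-labeling loop described in the abstract above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the claimed method: the mean-score threshold, the L1 box-regression loss, and the toy tensors are all stand-ins for details the abstract leaves open.

```python
# Hedged sketch of the teacher-student pseudo-labeling step. All function
# names, the thresholding rule, and the loss choice are illustrative
# assumptions, not the patented implementation.
import torch
import torch.nn.functional as F

def select_foreground_boxes(boxes, scores):
    """Keep candidate boxes whose score clears a data-dependent threshold.

    Assumption: the threshold is the mean of the candidate scores; the
    abstract only states that it is derived from the scores themselves.
    """
    threshold = scores.mean()
    keep = scores >= threshold
    return boxes[keep], scores[keep]

def unsupervised_loss(student_boxes, pseudo_boxes):
    """L1 regression loss between student predictions and teacher
    pseudo-labels (an illustrative stand-in for the training loss)."""
    n = min(len(student_boxes), len(pseudo_boxes))
    if n == 0:
        return torch.tensor(0.0, requires_grad=True)
    return F.l1_loss(student_boxes[:n], pseudo_boxes[:n])

# Toy example: 5 candidate boxes (x1, y1, x2, y2) with confidence scores
# produced by a hypothetical frozen teacher on one unlabeled image.
teacher_boxes = torch.rand(5, 4)
teacher_scores = torch.tensor([0.9, 0.2, 0.75, 0.4, 0.85])

pseudo_boxes, _ = select_foreground_boxes(teacher_boxes, teacher_scores)
student_boxes = torch.rand(len(pseudo_boxes), 4, requires_grad=True)

loss = unsupervised_loss(student_boxes, pseudo_boxes)
loss.backward()  # only the student receives gradients; the teacher is frozen
```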
  • Publication number: 20230154172
    Abstract: A system and method of emotion recognition in multimedia videos using a multi-modal fusion-based deep neural network is provided. The system includes circuitry and a memory configured to store a multimodal fusion network which includes one or more feature extractors, a network of transformer encoders, a fusion attention network, and an output network coupled to the fusion attention network. The system feeds a multimodal input, associated with an utterance depicted in one or more videos, to the one or more feature extractors. The system generates input embeddings as an output of the one or more feature extractors and further generates a set of emotion-relevant features based on the input embeddings. The system then generates a fused-feature representation of the set of emotion-relevant features and predicts an emotion label for the utterance based on the fused-feature representation. A minimal illustrative sketch of this fusion architecture follows this entry.
    Type: Application
    Filed: September 9, 2022
    Publication date: May 18, 2023
    Inventors: PANKAJ WASNIK, NAOYUKI ONOE, VISHAL CHUDASAMA
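
The fusion pipeline outlined in the abstract above, per-modality embeddings, transformer encoders, an attention-based fusion step, and an output classifier, can be sketched as follows. Dimensions, the modality count, and the softmax-weighted fusion rule are illustrative assumptions rather than the patented design.

```python
# Hedged sketch of a multimodal fusion attention network. The class name,
# hyperparameters, and fusion rule are assumptions for illustration only.
import torch
import torch.nn as nn

class FusionAttentionNet(nn.Module):
    def __init__(self, dim=64, n_modalities=3, n_emotions=7):
        super().__init__()
        # One transformer encoder per modality (text, audio, video assumed).
        self.encoders = nn.ModuleList(
            nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
                num_layers=1,
            )
            for _ in range(n_modalities)
        )
        # Fusion attention: score each modality, then mix features by weight.
        self.fusion_score = nn.Linear(dim, 1)
        self.output = nn.Linear(dim, n_emotions)

    def forward(self, inputs):
        # inputs: list of (batch, seq_len, dim) embeddings, one per modality,
        # standing in for the emotion-relevant features of one utterance.
        pooled = [enc(x).mean(dim=1) for enc, x in zip(self.encoders, inputs)]
        stacked = torch.stack(pooled, dim=1)             # (batch, M, dim)
        weights = self.fusion_score(stacked).softmax(1)  # (batch, M, 1)
        fused = (weights * stacked).sum(dim=1)           # fused representation
        return self.output(fused)                        # emotion logits

# Toy utterance: batch of 2, sequence length 10, three modalities.
model = FusionAttentionNet()
logits = model([torch.randn(2, 10, 64) for _ in range(3)])
pred = logits.argmax(dim=-1)  # predicted emotion label per utterance
```

The softmax over modality scores is one common way to realize a "fusion attention network"; the patent itself does not specify the weighting scheme, so any learned combination of the per-modality features could play the same role.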