Patents by Inventor Aniket BERA

Aniket BERA has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250014144
    Abstract: A neural operator-based architecture for performing synthetic frame generation is introduced. The architecture leverages the principles of physics to learn the features in the frames, independent of input resolution, through token mixing and global convolution in the Fourier spectral domain using the Fast Fourier Transform (FFT). The architecture overcomes a common limitation of models that use convolutional layers, namely variance to scale, and makes the model resolution-independent. This approach is particularly relevant where hardware and resource limitations prevent the capture of high-frame-rate videos.
    Type: Application
    Filed: June 28, 2024
    Publication date: January 9, 2025
    Inventors: Aniket Bera, Rashmi Bhaskara, Md Ashiqur Rahman, Hrishikesh Viswanath
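    The spectral-domain mixing described in this abstract is easiest to see in code. Below is a minimal sketch of one such layer, in the style of a Fourier neural operator: channel mixing is applied to a fixed set of low-frequency modes after an FFT, which is what decouples the layer from the input resolution. The class name, channel count, and number of retained modes are illustrative assumptions, not details taken from the filing.

      import torch
      import torch.fft

      class SpectralMixing2d(torch.nn.Module):
          """Sketch: token mixing as a global convolution in the Fourier domain."""
          def __init__(self, channels: int, modes: int):
              super().__init__()
              self.modes = modes  # number of low-frequency Fourier modes retained
              scale = 1.0 / (channels * channels)
              # Learned complex weights: one channel-mixing matrix per retained mode.
              self.weight = torch.nn.Parameter(
                  scale * torch.randn(channels, channels, modes, modes, dtype=torch.cfloat)
              )

          def forward(self, x: torch.Tensor) -> torch.Tensor:
              # x: (batch, channels, height, width) frame features
              b, c, h, w = x.shape
              x_ft = torch.fft.rfft2(x)  # FFT over the two spatial dimensions
              out_ft = torch.zeros(b, c, h, w // 2 + 1, dtype=torch.cfloat, device=x.device)
              m = self.modes
              # Mix channels only on the retained modes; the mixing is defined per
              # frequency mode, so it does not depend on the input resolution.
              out_ft[:, :, :m, :m] = torch.einsum(
                  "bixy,ioxy->boxy", x_ft[:, :, :m, :m], self.weight)
              return torch.fft.irfft2(out_ft, s=(h, w))

      # The same layer, with the same weights, runs at two different resolutions:
      layer = SpectralMixing2d(channels=8, modes=12)
      print(layer(torch.randn(1, 8, 64, 64)).shape)    # torch.Size([1, 8, 64, 64])
      print(layer(torch.randn(1, 8, 128, 128)).shape)  # torch.Size([1, 8, 128, 128])
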
  • Publication number: 20250005925
    Abstract: A novel multi-modal audio-video framework is disclosed. The framework advantageously leverages both audio and video information to accurately detect whether the video has been manipulated, e.g., detect whether the video is a so-called ‘deepfake.’ In a video-only pipeline, the framework adopts a vision encoder having a feature extractor and a Transformer encoder that leverages self-attention mechanisms to detect artifacts in a facial region of the video. Additionally, in a separate audio-video pipeline, the framework adopts an audio+lip encoder having a Transformer encoder that leverages cross-attention mechanisms to identify discrepancies between lip movements of the person and words spoken by the person in the video. These two modalities are used jointly to make an inference as to whether the video has been manipulated.
    Type: Application
    Filed: April 24, 2024
    Publication date: January 2, 2025
    Applicant: Purdue Research Foundation
    Inventors: Aniket Bera, Aaditya Kharel, Manas Aniruddha Paranjape
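    As a rough illustration of the two pipelines in this abstract, the sketch below pairs a self-attention video encoder with a cross-attention audio/lip encoder and concatenates their summaries for a real/fake decision. All module names, dimensions, and the pooling and fusion choices are assumptions; the filing does not specify them at this level.

      import torch
      import torch.nn as nn

      class TwoStreamDeepfakeDetector(nn.Module):
          def __init__(self, dim: int = 128):
              super().__init__()
              # Video-only pipeline: self-attention over per-frame facial features.
              enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
              self.video_encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
              # Audio-video pipeline: lip features attend to audio features
              # (cross-attention) to surface lip/speech mismatches.
              self.cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
              self.classifier = nn.Linear(2 * dim, 2)  # [real, fake] logits

          def forward(self, face_feats, lip_feats, audio_feats):
              # All inputs: (batch, time, dim) features from upstream extractors.
              v = self.video_encoder(face_feats).mean(dim=1)             # video summary
              a, _ = self.cross_attn(lip_feats, audio_feats, audio_feats)
              a = a.mean(dim=1)                                          # audio-lip summary
              return self.classifier(torch.cat([v, a], dim=-1))          # joint decision

      model = TwoStreamDeepfakeDetector()
      logits = model(torch.randn(2, 16, 128), torch.randn(2, 16, 128),
                     torch.randn(2, 40, 128))
      print(logits.shape)  # torch.Size([2, 2])
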
  • Patent number: 11861940
    Abstract: Systems, methods, apparatuses, and computer program products for recognizing human emotion in images or video. A method for recognizing perceived human emotion may include receiving a raw input. The raw input may be processed to generate input data corresponding to at least one context. Features from the raw input data may be extracted to obtain a plurality of feature vectors and inputs. The plurality of feature vectors and the inputs may be transmitted to a respective neural network. At least some of the plurality of feature vectors may be fused to obtain a feature encoding. Additional feature encodings may be computed from the plurality of feature vectors via the respective neural network. A multi-label emotion classification of a primary agent in the raw input may be performed based on the feature encoding and the additional feature encodings.
    Type: Grant
    Filed: June 16, 2021
    Date of Patent: January 2, 2024
    Assignee: UNIVERSITY OF MARYLAND, COLLEGE PARK
    Inventors: Trisha Mittal, Aniket Bera, Uttaran Bhattacharya, Rohan Chandra, Pooja Guhan, Dinesh Manocha
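    A hedged sketch of the fusion step this abstract describes: per-context feature vectors pass through separate encoders, the encodings are fused, and a multi-label head scores each emotion independently (sigmoid rather than softmax, since the labels are not mutually exclusive). The three contexts, layer sizes, and label count are placeholders, not values from the patent.

      import torch
      import torch.nn as nn

      class ContextEmotionClassifier(nn.Module):
          def __init__(self, feat_dim: int = 64, hidden: int = 128, num_labels: int = 6):
              super().__init__()
              # One encoder per context stream (e.g., the agent, the background,
              # and surrounding agents -- illustrative choices only).
              self.encoders = nn.ModuleList(
                  nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU()) for _ in range(3)
              )
              self.head = nn.Linear(3 * hidden, num_labels)

          def forward(self, context_feats):
              # context_feats: list of three (batch, feat_dim) feature vectors
              encodings = [enc(f) for enc, f in zip(self.encoders, context_feats)]
              fused = torch.cat(encodings, dim=-1)        # fused feature encoding
              return torch.sigmoid(self.head(fused))      # independent per-label scores

      model = ContextEmotionClassifier()
      scores = model([torch.randn(4, 64) for _ in range(3)])
      print(scores.shape)  # torch.Size([4, 6]) -- one probability per emotion label
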
  • Patent number: 11830291
    Abstract: Systems, methods, apparatuses, and computer program products for providing multimodal emotion recognition. The method may include receiving raw input from an input source. The method may also include extracting one or more feature vectors from the raw input. The method may further include determining an effectiveness of the one or more feature vectors. Further, the method may include performing, based on the determination, multiplicative fusion processing on the one or more feature vectors. The method may also include predicting, based on results of the multiplicative fusion processing, one or more emotions of the input source.
    Type: Grant
    Filed: February 10, 2021
    Date of Patent: November 28, 2023
    Assignee: UNIVERSITY OF MARYLAND, COLLEGE PARK
    Inventors: Trisha Mittal, Aniket Bera, Uttaran Bhattacharya, Rohan Chandra, Dinesh Manocha
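    The core of this abstract is multiplicative fusion. A minimal sketch of the generic idea follows: per-modality class probabilities are combined by an element-wise product, so a modality that assigns low confidence to a class suppresses that class in the joint prediction. The exact weighting and the effectiveness check described in the patent may differ; this shows only the multiplicative combination.

      import torch

      def multiplicative_fusion(per_modality_probs: torch.Tensor) -> torch.Tensor:
          # per_modality_probs: (num_modalities, batch, num_classes) probabilities
          fused = torch.prod(per_modality_probs, dim=0)    # element-wise product
          return fused / fused.sum(dim=-1, keepdim=True)   # renormalize per sample

      # Stand-in per-modality emotion scores (e.g., face, speech, text features):
      face  = torch.softmax(torch.randn(2, 4), dim=-1)
      voice = torch.softmax(torch.randn(2, 4), dim=-1)
      text  = torch.softmax(torch.randn(2, 4), dim=-1)
      print(multiplicative_fusion(torch.stack([face, voice, text])))
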
  • Publication number: 20230135769
    Abstract: Systems and methods of the present invention for gesture generation include: receiving a sequence of one or more word embeddings, one or more attributes, and a first emotive gesture of a virtual agent; providing the sequence of one or more word embeddings and the one or more attributes to a gesture generation machine learning model; and receiving a second emotive gesture of the virtual agent from the gesture generation machine learning model. The gesture generation machine learning model is configured to: produce, via an encoder, an output based on the one or more word embeddings; generate one or more encoded features based on the output and the one or more attributes; and produce, via a decoder, the second emotive gesture based on the one or more encoded features and the first (preceding) emotive gesture. Other aspects, embodiments, and features are also claimed and described.
    Type: Application
    Filed: October 31, 2022
    Publication date: May 4, 2023
    Inventors: Uttaran BHATTACHARYA, Aniket BERA, Dinesh MANOCHA, Abhishek BANERJEE, Pooja GUHAN, Nicholas REWKOWSKI
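    The encode-condition-decode loop in this abstract can be sketched as follows: word embeddings are encoded, attributes are appended to the encoding, and a decoder predicts the next gesture pose conditioned on the preceding one. The GRU encoder, linear decoder, and all dimensions are assumptions chosen for brevity, not the patent's architecture.

      import torch
      import torch.nn as nn

      class GestureGenerator(nn.Module):
          def __init__(self, word_dim: int = 50, attr_dim: int = 8,
                       pose_dim: int = 30, hidden: int = 64):
              super().__init__()
              self.encoder = nn.GRU(word_dim, hidden, batch_first=True)
              self.decoder = nn.Linear(hidden + attr_dim + pose_dim, pose_dim)

          def forward(self, word_embs, attrs, prev_gesture):
              # word_embs: (batch, seq, word_dim) sentence embeddings
              # attrs: (batch, attr_dim) attributes (e.g., intended emotion)
              # prev_gesture: (batch, pose_dim) pose of the preceding gesture
              _, h = self.encoder(word_embs)               # encode the utterance
              feats = torch.cat([h[-1], attrs, prev_gesture], dim=-1)  # encoded features
              return self.decoder(feats)                   # next emotive gesture pose

      gen = GestureGenerator()
      next_pose = gen(torch.randn(2, 12, 50), torch.randn(2, 8), torch.randn(2, 30))
      print(next_pose.shape)  # torch.Size([2, 30])
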
  • Publication number: 20220138472
    Abstract: A video is classified as real or fake by extracting facial features, including facial modalities and facial emotions, and speech features, including speech modalities and speech emotions, from the video. The facial and speech modalities are passed through first and second neural networks, respectively, to generate facial and speech modality embeddings. The facial and speech emotions are passed through third and fourth neural networks, respectively, to generate facial and speech emotion embeddings. A first distance, d1, between the facial modality embedding and the speech modality embedding is generated, together with a second distance, d2, between the facial emotion embedding and the speech emotion embedding. The video is classified as fake if a sum of the first distance and the second distance exceeds a threshold distance. The networks may be trained using real and fake video pairs for multiple subjects.
    Type: Application
    Filed: November 1, 2021
    Publication date: May 5, 2022
    Inventors: Trisha Mittal, Uttaran Bhattacharya, Rohan Chandra, Aniket Bera, Dinesh Manocha
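    The classification rule in this abstract reduces to a simple distance test once the four embeddings are available. A small sketch follows, assuming Euclidean distance and an arbitrary threshold; the patent leaves the metric and the threshold value to the trained system.

      import torch

      def classify_video(face_mod, speech_mod, face_emo, speech_emo,
                         threshold: float = 1.0) -> str:
          d1 = torch.norm(face_mod - speech_mod)  # facial vs. speech modality embeddings
          d2 = torch.norm(face_emo - speech_emo)  # facial vs. speech emotion embeddings
          return "fake" if (d1 + d2).item() > threshold else "real"

      emb = lambda: torch.randn(32)  # stand-ins for the four network outputs
      print(classify_video(emb(), emb(), emb(), emb()))
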
  • Publication number: 20210390288
    Abstract: Systems, methods, apparatuses, and computer program products for recognizing human emotion in images or video. A method for recognizing perceived human emotion may include receiving a raw input. The raw input may be processed to generate input data corresponding to at least one context. Features from the raw input data may be extracted to obtain a plurality of feature vectors and inputs. The plurality of feature vectors and the inputs may be transmitted to a respective neural network. At least some of the plurality of feature vectors may be fused to obtain a feature encoding. Additional feature encodings may be computed from the plurality of feature vectors via the respective neural network. A multi-label emotion classification of a primary agent in the raw input may be performed based on the feature encoding and the additional feature encodings.
    Type: Application
    Filed: June 16, 2021
    Publication date: December 16, 2021
    Inventors: Trisha MITTAL, Aniket BERA, Uttaran BHATTACHARYA, Rohan CHANDRA, Pooja GUHAN, Dinesh MANOCHA
  • Publication number: 20210342656
    Abstract: Systems, methods, apparatuses, and computer program products for providing multimodal emotion recognition. The method may include receiving raw input from an input source. The method may also include extracting one or more feature vectors from the raw input. The method may further include determining an effectiveness of the one or more feature vectors. Further, the method may include performing, based on the determination, multiplicative fusion processing on the one or more feature vectors. The method may also include predicting, based on results of the multiplicative fusion processing, one or more emotions of the input source.
    Type: Application
    Filed: February 10, 2021
    Publication date: November 4, 2021
    Inventors: Trisha MITTAL, Aniket BERA, Uttaran BHATTACHARYA, Rohan CHANDRA, Dinesh MANOCHA