Patents by Inventor Rameswar Panda

Rameswar Panda has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Neural architecture search of language models using knowledge distillation

Patent number: 12346665

Abstract: A neural architecture search method, system, and computer program product that determines, by a computing device, a best fit language model of a plurality of language models that is a best fit for interpretation of a corpus of natural language and interprets, by the computing device, the corpus of natural language using the best fit language model.

Type: Grant

Filed: February 14, 2022

Date of Patent: July 1, 2025

Assignee: International Business Machines Corporation

Inventors: Michele Merler, Aashka Trivedi, Rameswar Panda, Bishwaranjan Bhattacharjee, Taesun Moon, Avirup Sil
EFFICIENT TRANSFORMER TRAINING BASED ON SMALLER PRETRAINED MODELS

Publication number: 20250103875

Abstract: Parameters of a first transformer are accessed, and size dimensions of a second transformer that is to be trained and is larger than the first transformer are received. The parameters of the first transformer are linearly transformed using a combination of a width-growth operator and a depth-growth operator, wherein the linear transformation produces a set of new parameters, the set corresponding to the size dimensions of the second transformer. The second transformer is initialized with the set of new parameters.

Type: Application

Filed: September 25, 2023

Publication date: March 27, 2025

Inventors: Rameswar Panda, Peihao Wang, Leonid Karlinsky, Rogerio Schmidt Feris, David Cox, Yoon Hyung Kim
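
The abstract above describes initializing a larger transformer by linearly transforming the parameters of a smaller one. The following is only a minimal sketch of that idea, not the claimed method. It assumes weights stored as nested lists, a hypothetical expansion matrix `expand` for width growth, and a hypothetical mixing matrix `mix` that builds each new layer as a linear combination of the old layers.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def grow_width(w, expand_out, expand_in):
    """Width growth (assumed form): expand_out @ w @ expand_in^T."""
    return matmul(matmul(expand_out, w), transpose(expand_in))


def grow_depth(layers, mix):
    """Depth growth (assumed form): new layer i = sum_j mix[i][j] * layers[j]."""
    rows, cols = len(layers[0]), len(layers[0][0])
    return [
        [[sum(mix[i][j] * layers[j][r][c] for j in range(len(layers)))
          for c in range(cols)]
         for r in range(rows)]
        for i in range(len(mix))
    ]


# Two small 2x2 layers grown into three 3x3 layers.
small = [[[1.0, 2.0], [3.0, 4.0]], [[0.0, 1.0], [1.0, 0.0]]]
expand = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # hypothetical: copy two units, average into a third
mix = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]     # hypothetical: middle layer interpolates its neighbours

wide = [grow_width(w, expand, expand) for w in small]
large = grow_depth(wide, mix)  # parameters used to initialize the larger model
```

In this toy setup both operators are fixed by hand; the abstract does not say how the operators are obtained, so that part is left out.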
Adaptive selection of data modalities for efficient video recognition

Patent number: 12249147

Abstract: One embodiment of the invention provides a method for video recognition. The method comprises receiving an input video comprising a sequence of video segments over a plurality of data modalities. The method further comprises, for a video segment of the sequence, selecting one or more data modalities based on data representing the video segment. Each data modality selected is optimal for video recognition of the video segment. The method further comprises, for each data modality selected, providing at least one data input representing the video segment over the data modality selected to a machine learning model corresponding to the data modality selected, and generating a first type of prediction representative of the video segment via the machine learning model. The method further comprises determining a second type of prediction representative of the entire input video by aggregating all first type of predictions generated.

Type: Grant

Filed: March 11, 2021

Date of Patent: March 11, 2025

Assignee: International Business Machines Corporation

Inventors: Rameswar Panda, Richard Chen, Quanfu Fan, Rogerio Schmidt Feris
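
The flow in the abstract above (select modalities per segment, predict per modality, aggregate over the video) might be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: `select_modalities`, the toy per-modality models, and aggregation by simple averaging are all hypothetical stand-ins.

```python
def recognize_video(segments, select_modalities, models):
    """Aggregate per-segment, per-modality class scores into one video-level score list."""
    segment_predictions = []
    for segment in segments:
        # Only the selected modalities are processed for this segment.
        for modality in select_modalities(segment):
            segment_predictions.append(models[modality](segment[modality]))
    if not segment_predictions:
        raise ValueError("no modality was selected for any segment")
    num_classes = len(segment_predictions[0])
    count = len(segment_predictions)
    # Assumed aggregation: plain average of all segment-level predictions.
    return [sum(p[c] for p in segment_predictions) / count for c in range(num_classes)]


# Toy data: each segment maps a modality name to a single made-up signal value.
segments = [{"rgb": 0.75, "audio": 0.25}, {"rgb": 0.25, "audio": 1.0}]


def select_modalities(segment):
    """Hypothetical policy: keep modalities whose signal is at least 0.5."""
    return [name for name, value in segment.items() if value >= 0.5]


models = {
    "rgb": lambda x: [x, 1.0 - x],    # toy two-class scorer
    "audio": lambda x: [1.0 - x, x],  # toy two-class scorer
}

video_scores = recognize_video(segments, select_modalities, models)
```

Skipping unselected modalities is where the efficiency gain would come from, since their models are never run for that segment.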
MULTITASK PROMPT TUNING FOR PARAMETER-EFFICIENT TRANSFER LEARNING

Publication number: 20250005370

Abstract: A source task prompt of each of a plurality of source tasks is decomposed as a multiplication of a shared prompt matrix shared across source tasks and a low-rank task-specific matrix. Prompt distillation is performed to transfer multitask knowledge to the shared prompt matrix by distilling knowledge from the source task prompts. Low-rank multiplicative updates are performed to the shared prompt matrix to transfer the multitask knowledge to one or more target tasks. The one or more target tasks (e.g.

Type: Application

Filed: June 29, 2023

Publication date: January 2, 2025

Inventors: Rameswar Panda, Zhen Wang, Leonid Karlinsky, Rogerio Schmidt Feris, Yoon Hyung Kim
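
As a rough illustration of the decomposition named in the abstract above, the sketch below builds a task prompt from a shared prompt matrix and a low-rank task-specific matrix. The abstract says only "multiplication"; the elementwise product and the rank-one form `u v^T` used here are assumptions made for the example, as are all names and values.

```python
def outer(u, v):
    """Rank-one matrix u v^T as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]


def task_prompt(shared, u, v):
    """Task prompt as the elementwise product of the shared prompt and u v^T (assumed form)."""
    low_rank = outer(u, v)
    return [
        [s * w for s, w in zip(shared_row, low_rank_row)]
        for shared_row, low_rank_row in zip(shared, low_rank)
    ]


shared = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]  # 2 prompt tokens x 3 embedding dimensions
u = [1.0, 0.5]                               # hypothetical task-specific vector, one entry per token
v = [2.0, 1.0, 0.0]                          # hypothetical task-specific vector, one entry per dimension

prompt = task_prompt(shared, u, v)
```

Under this assumed form, each additional task needs only `len(u) + len(v)` parameters (5 here) rather than a full prompt matrix (6 here); the gap widens quickly at realistic prompt sizes.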
Interpretability-aware redundancy reduction for vision transformers

Patent number: 12154307

Abstract: A sequence of patch tokens representing an image can be received. A network can be trained to learn informative patch tokens and uninformative patch tokens in the sequence of patch tokens, in learning to recognize an object in the image. The sequence of patch tokens can be reduced by removing the uninformative patch tokens from the sequence of patch tokens. The reduced sequence of patch tokens can be input to an attention-based deep learning neural network. The attention-based deep learning neural network can be fine-tuned to recognize the object in the image using the reduced sequence of patch tokens.

Type: Grant

Filed: December 22, 2021

Date of Patent: November 26, 2024

Assignees: International Business Machines Corporation, Massachusetts Institute of Technology

Inventors: Bowen Pan, Rameswar Panda, Rogerio Schmidt Feris, Aude Jeanne Oliva
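
The token-reduction step in the abstract above can be pictured with a very small sketch. It assumes that some trained network has already produced an informativeness score per patch token; the scores, the threshold, and the function name are hypothetical.

```python
def reduce_tokens(tokens, scores, threshold=0.5):
    """Keep tokens whose score is at least `threshold`, preserving their order."""
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    return [token for token, score in zip(tokens, scores) if score >= threshold]


# Toy sequence: patch tokens are represented by labels instead of embedding vectors.
patch_tokens = ["sky", "sky", "dog_head", "dog_body", "grass"]
informativeness = [0.1, 0.2, 0.9, 0.8, 0.3]  # hypothetical scores from a learned network

reduced = reduce_tokens(patch_tokens, informativeness)
```

The reduced sequence is what would then be passed to the attention-based network, which shortens the sequence that attention has to process.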
Temporal contrastive learning for semi-supervised video action recognition

Patent number: 12067082

Abstract: A base pathway of a computerized two-pathway video action recognition model is trained using a plurality of labeled video samples. The base pathway is trained using a plurality of unlabeled video samples at a first framerate. An auxiliary pathway of the computerized two-pathway video action recognition model is trained using a plurality of the unlabeled video samples at a second framerate, the second framerate being slower than the first framerate, wherein the training of the base pathway and the training of the auxiliary pathway result in a trained computerized two-pathway video action recognition model. A candidate video is categorized using the trained computerized two-pathway video action recognition model and the categorized candidate video is stored in a computer-accessible video database system for information retrieval.

Type: Grant

Filed: October 29, 2021

Date of Patent: August 20, 2024

Assignees: International Business Machines Corporation, Indian Institute of Technology

Inventors: Rameswar Panda, Rogerio Schmidt Feris, Abir Das
TRANSLATING TEXT USING GENERATED VISUAL REPRESENTATIONS AND ARTIFICIAL INTELLIGENCE

Publication number: 20240127005

Abstract: Methods, systems, and computer program products for translating text using generated visual representations and artificial intelligence are provided herein. A computer-implemented method includes generating a tokenized form of at least a portion of input text in a first language; generating at least one visual representation of at least a portion of the input text using a first set of artificial intelligence techniques; generating a tokenized form of at least a portion of the at least one visual representation; and generating an output including a translated version of the input text into at least a second language by processing, using a second set of artificial intelligence techniques, at least a portion of the tokenized form of the at least a portion of the input text and at least a portion of the tokenized form of the at least a portion of the at least one visual representation.

Type: Application

Filed: September 28, 2022

Publication date: April 18, 2024

Inventors: Rameswar Panda, Yi Li, Richard Chen, Rogerio Schmidt Feris, Yoon Hyung Kim, David Cox
Dynamic multi-resolution processing for video classification

Patent number: 11954910

Abstract: Methods, apparatus, and systems for multi-resolution processing for video classification. A plurality of video frames of a video are obtained and a resolution for classifying each video frame of the plurality of video frames is determined by analyzing each video frame using a policy network. Based on the determined resolution, each video frame having a determined resolution is rescaled and each rescaled video frame is routed to a classifier of a backbone network that corresponds to the determined resolution. Each rescaled video frame is classified using the corresponding classifier of the backbone network to obtain a plurality of classifications and the classifications are averaged to determine an action classification of the video.

Type: Grant

Filed: December 26, 2020

Date of Patent: April 9, 2024

Assignees: International Business Machines Corporation, Massachusetts Institute of Technology

Inventors: Rameswar Panda, Yue Meng, Chung-Ching Lin, Rogerio Schmidt Feris, Aude Jeanne Oliva
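
The per-frame routing described in the abstract above could look roughly like the sketch below. It is not the patented system: frames are flat lists of pixel values, and `policy`, `rescale`, and the per-resolution classifiers are hypothetical toy functions chosen so the example runs without any dependencies.

```python
def classify_video(frames, policy, rescale, classifiers):
    """Pick a resolution per frame, classify at that resolution, and average the results."""
    per_frame = []
    for frame in frames:
        resolution = policy(frame)                # resolution chosen for this frame
        resized = rescale(frame, resolution)      # frame rescaled to that resolution
        per_frame.append(classifiers[resolution](resized))
    if not per_frame:
        raise ValueError("at least one frame is required")
    count = len(per_frame)
    return [sum(scores[c] for scores in per_frame) / count
            for c in range(len(per_frame[0]))]


def policy(frame):
    """Hypothetical policy: uniform frames get 2 pixels, others keep 4."""
    return 2 if len(set(frame)) == 1 else 4


def rescale(frame, resolution):
    """Toy rescaling by subsampling every k-th pixel."""
    return frame[::len(frame) // resolution]


def mean_scorer(pixels):
    """Toy two-class scorer based on mean intensity."""
    mean = sum(pixels) / len(pixels)
    return [mean, 1.0 - mean]


classifiers = {2: mean_scorer, 4: mean_scorer}  # one classifier per supported resolution
frames = [[1.0, 1.0, 1.0, 1.0], [1.0, 0.0, 0.0, 0.0]]

video_scores = classify_video(frames, policy, rescale, classifiers)
```

Here the first frame is processed with half as many pixels as the second, which is the kind of saving that per-frame resolution selection is meant to provide.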
Regional-to-local attention for vision transformers

Patent number: 11915474

Abstract: Techniques and apparatus for analyzing visual content using a visual transformer are described. An example technique includes generating a first set of tokens based on a visual content item. Each token in the first set of tokens is associated with a regional feature from a different region of a plurality of regions of the visual content item. A second set of tokens is generated based on the visual content item. Each token in the second set of tokens is associated with a local feature from one of the plurality of regions of the visual content item. At least one feature map is generated for the visual content item, based on analyzing the first set of tokens and the second set of tokens separately using a hierarchical vision transformer. At least one vision task is performed based on the at least one feature map.

Type: Grant

Filed: May 31, 2022

Date of Patent: February 27, 2024

Assignee: International Business Machines Corporation

Inventors: Richard Chen, Rameswar Panda, Quanfu Fan
Detecting hybrid-distance adversarial patches

Patent number: 11875489

Abstract: A hybrid-distance adversarial patch generator can be trained to generate a hybrid adversarial patch effective at multiple distances. The hybrid patch can be inserted into multiple sample images, each depicting an object, to simulate inclusion of the hybrid patch at multiple distances. The multiple sample images can then be used to train an object detection model to detect the objects.

Type: Grant

Filed: June 30, 2021

Date of Patent: January 16, 2024

Assignee: International Business Machines Corporation

Inventors: Quanfu Fan, Sijia Liu, Richard Chen, Rameswar Panda
REGIONAL-TO-LOCAL ATTENTION FOR VISION TRANSFORMERS

Publication number: 20230386197

Abstract: Techniques and apparatus for analyzing visual content using a visual transformer are described. An example technique includes generating a first set of tokens based on a visual content item. Each token in the first set of tokens is associated with a regional feature from a different region of a plurality of regions of the visual content item. A second set of tokens is generated based on the visual content item. Each token in the second set of tokens is associated with a local feature from one of the plurality of regions of the visual content item. At least one feature map is generated for the visual content item, based on analyzing the first set of tokens and the second set of tokens separately using a hierarchical vision transformer. At least one vision task is performed based on the at least one feature map.

Type: Application

Filed: May 31, 2022

Publication date: November 30, 2023

Inventors: Richard Chen, Rameswar Panda, Quanfu Fan
NEURAL ARCHITECTURE SEARCH OF LANGUAGE MODELS USING KNOWLEDGE DISTILLATION

Publication number: 20230259716

Abstract: A neural architecture search method, system, and computer program product that determines, by a computing device, a best fit language model of a plurality of language models that is a best fit for interpretation of a corpus of natural language and interprets, by the computing device, the corpus of natural language using the best fit language model.

Type: Application

Filed: February 14, 2022

Publication date: August 17, 2023

Inventors: Michele Merler, Aashka Trivedi, Rameswar Panda, Bishwaranjan Bhattacharjee, Taesun Moon, Avirup Sil
DYNAMIC NETWORK QUANTIZATION FOR EFFICIENT VIDEO INFERENCE

Publication number: 20230215174

Abstract: A recognition network is trained for a selected video frame at a desired highest precision using back-propagation and a policy network is trained using back-propagation from the trained recognition network. The recognition network is trained at a lower precision specified by a policy recommended for the selected video frame by the trained policy network. A frame of a given video is inputted to the trained policy network for determination of a precision policy for processing the frame. Video inferencing is performed utilizing the trained policy network and the trained recognition network based on the precision policy.

Type: Application

Filed: December 31, 2021

Publication date: July 6, 2023

Inventors: Rameswar Panda, Ximeng Sun, Richard Chen, Rogerio Schmidt Feris, Ekaterina Saenko
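
To make the idea of a per-frame precision policy in the abstract above more concrete, here is a small inference-side sketch. It assumes uniform quantization of values in [0, 1] and a hypothetical policy that maps a frame to a bit width; neither the quantizer nor the policy rule is taken from the application.

```python
def quantize(values, bits):
    """Uniformly quantize values in [0, 1] onto 2**bits evenly spaced levels."""
    steps = 2 ** bits - 1
    return [round(v * steps) / steps for v in values]


def precision_policy(frame):
    """Hypothetical policy: low-contrast frames get 2 bits, others get 8."""
    return 2 if max(frame) - min(frame) < 0.5 else 8


def process_frame(frame):
    """Quantize a frame's values at the precision chosen by the policy."""
    bits = precision_policy(frame)
    return bits, quantize(frame, bits)


flat_frame = [0.4, 0.5, 0.6, 0.7]   # low contrast, so the toy policy picks 2 bits
busy_frame = [0.0, 0.2, 0.9, 1.0]   # high contrast, so the toy policy picks 8 bits

flat_bits, flat_quantized = process_frame(flat_frame)
busy_bits, busy_quantized = process_frame(busy_frame)
```

In the application the policy is a trained network and the recognition network itself runs at the selected precision; the hand-written rule and input-only quantization here merely stand in for those pieces.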
FAIR SELECTIVE CLASSIFICATION VIA A VARIATIONAL MUTUAL INFORMATION UPPER BOUND FOR IMPOSING SUFFICIENCY

Publication number: 20230206114

Abstract: One or more group-specific aggregate losses, one or more group-agnostic aggregate losses, and a joint loss are computed. A regularizer loss is computed based on the one or more group-specific aggregate losses and the one or more group-agnostic aggregate losses. One or more group-specific models are trained based on the one or more group-specific aggregate losses. A feature extractor is updated based on the regularizer loss and a joint classifier is updated based on the joint loss.

Type: Application

Filed: December 29, 2021

Publication date: June 29, 2023

Inventors: Joshua Ka-Wing Lee, Yuheng Bu, Deepta Rajan, Prasanna Sattigeri, Subhro Das, Rameswar Panda, Gregory Wornell
INTERPRETABILITY-AWARE REDUNDANCY REDUCTION FOR VISION TRANSFORMERS

Publication number: 20230196710

Abstract: A sequence of patch tokens representing an image can be received. A network can be trained to learn informative patch tokens and uninformative patch tokens in the sequence of patch tokens, in learning to recognize an object in the image. The sequence of patch tokens can be reduced by removing the uninformative patch tokens from the sequence of patch tokens. The reduced sequence of patch tokens can be input to an attention-based deep learning neural network. The attention-based deep learning neural network can be fine-tuned to recognize the object in the image using the reduced sequence of patch tokens.

Type: Application

Filed: December 22, 2021

Publication date: June 22, 2023

Inventors: Bowen Pan, Rameswar Panda, Rogerio Schmidt Feris, Aude Jeanne Oliva
TEMPORAL CONTRASTIVE LEARNING FOR SEMI-SUPERVISED VIDEO ACTION RECOGNITION

Publication number: 20230138254

Abstract: A base pathway of a computerized two-pathway video action recognition model is trained using a plurality of labeled video samples. The base pathway is trained using a plurality of unlabeled video samples at a first framerate. An auxiliary pathway of the computerized two-pathway video action recognition model is trained using a plurality of the unlabeled video samples at a second framerate, the second framerate being slower than the first framerate, wherein the training of the base pathway and the training of the auxiliary pathway result in a trained computerized two-pathway video action recognition model. A candidate video is categorized using the trained computerized two-pathway video action recognition model and the categorized candidate video is stored in a computer-accessible video database system for information retrieval.

Type: Application

Filed: October 29, 2021

Publication date: May 4, 2023

Inventors: Rameswar Panda, Rogerio Schmidt Feris, Abir Das
ADAPTIVE REDUNDANCY REDUCTION FOR EFFICIENT VIDEO UNDERSTANDING

Publication number: 20230082448

Abstract: For each convolution layer of a plurality of convolution layers of a convolutional neural network (CNN), an input-dependent policy network is applied to determine: a first fraction of input feature maps to the given layer for which first corresponding output feature maps are to be fully computed by the layer; and a second fraction of input feature maps to the layer for which second corresponding output feature maps are not to be fully computed, but are to be reconstructed from the first corresponding output feature maps. The first corresponding output feature maps are fully computed and the second corresponding output feature maps are reconstructed. For a final one of the convolution layers of the plurality of convolution layers of the neural network, the first corresponding output feature maps and the second corresponding output feature maps are input to an output layer to obtain an inference result.

Type: Application

Filed: September 15, 2021

Publication date: March 16, 2023

Inventors: Bowen Pan, Rameswar Panda, Camilo Luciano Fosco, Rogerio Schmidt Feris, Aude Jeanne Oliva
DETECTING HYBRID-DISTANCE ADVERSARIAL PATCHES

Publication number: 20230005111

Abstract: A hybrid-distance adversarial patch generator can be trained to generate a hybrid adversarial patch effective at multiple distances. The hybrid patch can be inserted into multiple sample images, each depicting an object, to simulate inclusion of the hybrid patch at multiple distances. The multiple sample images can then be used to train an object detection model to detect the objects.

Type: Application

Filed: June 30, 2021

Publication date: January 5, 2023

Inventors: Quanfu Fan, Sijia Liu, Richard Chen, Rameswar Panda
Summarizing videos via side information

Patent number: 11538248

Abstract: Machine learning-based techniques for summarizing collections of data such as image and video data leveraging side information obtained from related (e.g., video) data are provided. In one aspect, a method for video summarization includes: obtaining related videos having content related to a target video; and creating a summary of the target video using information provided by the target video and side information provided by the related videos to select portions of the target video to include in the summary. The side information can include video data, still image data, text, comments, natural language descriptions, and combinations thereof.

Type: Grant

Filed: October 27, 2020

Date of Patent: December 27, 2022

Assignee: International Business Machines Corporation

Inventors: Rameswar Panda, Chuang Gan, Pin-Yu Chen, Bo Wu
ADAPTIVE SELECTION OF DATA MODALITIES FOR EFFICIENT VIDEO RECOGNITION

Publication number: 20220292285

Abstract: One embodiment of the invention provides a method for video recognition. The method comprises receiving an input video comprising a sequence of video segments over a plurality of data modalities. The method further comprises, for a video segment of the sequence, selecting one or more data modalities based on data representing the video segment. Each data modality selected is optimal for video recognition of the video segment. The method further comprises, for each data modality selected, providing at least one data input representing the video segment over the data modality selected to a machine learning model corresponding to the data modality selected, and generating a first type of prediction representative of the video segment via the machine learning model. The method further comprises determining a second type of prediction representative of the entire input video by aggregating all first type of predictions generated.

Type: Application

Filed: March 11, 2021

Publication date: September 15, 2022

Inventors: Rameswar Panda, Richard Chen, Quanfu Fan, Rogerio Schmidt Feris
