Patents by Inventor Rameswar Panda
Rameswar Panda has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250005370
Abstract: A source task prompt of each of a plurality of source tasks is decomposed as a multiplication of a shared prompt matrix shared across source tasks and a low-rank task-specific matrix. Prompt distillation is performed to transfer multitask knowledge to the shared prompt matrix by distilling knowledge from the source task prompts. Low-rank multiplicative updates are performed to the shared prompt matrix to transfer the multitask knowledge to one or more target tasks. The one or more target tasks (e.g. …
Type: Application
Filed: June 29, 2023
Publication date: January 2, 2025
Inventors: Rameswar Panda, Zhen Wang, Leonid Karlinsky, Rogerio Schmidt Feris, Yoon Hyung Kim
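To make the decomposition concrete, here is a minimal Python sketch of how one task's prompt could be assembled from the shared prompt matrix and a low-rank task-specific matrix. It assumes the "multiplication" is element-wise (Hadamard) and that the task-specific matrix is rank one, built as the outer product of two vectors; these choices, the toy sizes, and the names `task_prompt` and `outer` are illustrative assumptions, not details taken from the filing. Distillation and the multiplicative updates are not shown.

```python
from typing import List

Matrix = List[List[float]]


def outer(u: List[float], v: List[float]) -> Matrix:
    """Rank-one matrix u v^T with len(u) rows and len(v) columns."""
    return [[ui * vj for vj in v] for ui in u]


def task_prompt(shared: Matrix, u: List[float], v: List[float]) -> Matrix:
    """Task prompt = shared prompt (element-wise *) rank-one task matrix.

    Assumption: Hadamard product and rank one are illustrative choices.
    """
    low_rank = outer(u, v)
    if len(shared) != len(low_rank) or len(shared[0]) != len(low_rank[0]):
        raise ValueError("shared prompt and task matrix shapes differ")
    return [[s * w for s, w in zip(s_row, lr_row)]
            for s_row, lr_row in zip(shared, low_rank)]


# Shared prompt: 2 prompt tokens x 3 embedding dims (toy sizes).
shared = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0]]
# Task-specific factors: only len(u) + len(v) = 5 parameters for this task.
u, v = [1.0, 0.5], [2.0, 1.0, 0.0]
print(task_prompt(shared, u, v))  # [[2.0, 2.0, 0.0], [4.0, 2.5, 0.0]]
```

With this parameterization each additional task costs only `len(u) + len(v)` parameters on top of the shared matrix, which is the usual motivation for a low-rank task-specific factor.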
-
Patent number: 12154307
Abstract: A sequence of patch tokens representing an image can be received. A network can be trained to learn informative patch tokens and uninformative patch tokens in the sequence of patch tokens, in learning to recognize an object in the image. The sequence of patch tokens can be reduced by removing the uninformative patch tokens from the sequence of patch tokens. The reduced sequence of patch tokens can be input to an attention-based deep learning neural network. The attention-based deep learning neural network can be fine-tuned to recognize the object in the image using the reduced sequence of patch tokens.
Type: Grant
Filed: December 22, 2021
Date of Patent: November 26, 2024
Assignees: International Business Machines Corporation, Massachusetts Institute of Technology
Inventors: Bowen Pan, Rameswar Panda, Rogerio Schmidt Feris, Aude Jeanne Oliva
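The token-reduction step can be sketched as keeping the most informative patch tokens and dropping the rest before they reach the attention-based network. In this minimal sketch the informativeness scores are supplied directly (hypothetical values), and keeping a fixed fraction via `keep_ratio` is an assumption; the abstract only says uninformative tokens are removed. `prune_tokens` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def prune_tokens(tokens: Sequence[str], scores: Sequence[float],
                 keep_ratio: float) -> Tuple[List[str], List[int]]:
    """Drop low-scoring (uninformative) patch tokens, keeping spatial order."""
    if len(tokens) != len(scores):
        raise ValueError("need exactly one score per token")
    k = max(1, round(keep_ratio * len(tokens)))      # how many tokens survive
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:k])                        # restore original order
    return [tokens[i] for i in kept], kept


tokens = ["p0", "p1", "p2", "p3", "p4", "p5"]
scores = [0.9, 0.1, 0.7, 0.05, 0.8, 0.2]   # hypothetical learned scores
reduced, idx = prune_tokens(tokens, scores, keep_ratio=0.5)
print(reduced, idx)  # ['p0', 'p2', 'p4'] [0, 2, 4]
```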
-
Patent number: 12067082
Abstract: A base pathway of a computerized two-pathway video action recognition model is trained using a plurality of labeled video samples. The base pathway is trained using a plurality of unlabeled video samples at a first framerate. An auxiliary pathway of the computerized two-pathway video action recognition model is trained using a plurality of the unlabeled video samples at a second framerate, the second framerate being slower than the first framerate, wherein the training of the base pathway and the training of the auxiliary pathway result in a trained computerized two-pathway video action recognition model. A candidate video is categorized using the trained computerized two-pathway video action recognition model and the categorized candidate video is stored in a computer-accessible video database system for information retrieval.
Type: Grant
Filed: October 29, 2021
Date of Patent: August 20, 2024
Assignees: International Business Machines Corporation, Indian Institute of Technology
Inventors: Rameswar Panda, Rogerio Schmidt Feris, Abir Das
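The two pathways differ in the framerate at which the unlabeled clips are sampled. The sketch below, a hypothetical helper `sample_frames` with assumed strides of 2 and 8, only shows how a slower auxiliary framerate can be realized as a larger temporal stride over the same clip; the training of either pathway is not shown.

```python
from typing import List


def sample_frames(num_frames: int, stride: int, offset: int = 0) -> List[int]:
    """Frame indices for one pathway; a larger stride gives a slower framerate."""
    if stride < 1:
        raise ValueError("stride must be at least 1")
    return list(range(offset, num_frames, stride))


NUM_FRAMES = 32                                   # frames in one unlabeled clip
base_idx = sample_frames(NUM_FRAMES, stride=2)    # first (faster) framerate
aux_idx = sample_frames(NUM_FRAMES, stride=8)     # second, slower framerate
print(len(base_idx), len(aux_idx))  # 16 4
print(aux_idx)                      # [0, 8, 16, 24]
```

With a stride ratio of 4, the auxiliary pathway sees a quarter as many frames of the same clip, which is one simple way to realize a slower framerate.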
-
Publication number: 20240127005
Abstract: Methods, systems, and computer program products for translating text using generated visual representations and artificial intelligence are provided herein. A computer-implemented method includes generating a tokenized form of at least a portion of input text in a first language; generating at least one visual representation of at least a portion of the input text using a first set of artificial intelligence techniques; generating a tokenized form of at least a portion of the at least one visual representation; and generating an output including a translated version of the input text into at least a second language by processing, using a second set of artificial intelligence techniques, at least a portion of the tokenized form of the at least a portion of the input text and at least a portion of the tokenized form of the at least a portion of the at least one visual representation.
Type: Application
Filed: September 28, 2022
Publication date: April 18, 2024
Inventors: Rameswar Panda, Yi Li, Richard Chen, Rogerio Schmidt Feris, Yoon Hyung Kim, David Cox
-
Patent number: 11954910
Abstract: Methods, apparatus, and systems for multi-resolution processing for video classification. A plurality of video frames of a video are obtained and a resolution for classifying each video frame of the plurality of video frames is determined by analyzing each video frame using a policy network. Based on the determined resolution, each video frame having a determined resolution is rescaled and each rescaled video frame is routed to a classifier of a backbone network that corresponds to the determined resolution. Each rescaled video frame is classified using the corresponding classifier of the backbone network to obtain a plurality of classifications and the classifications are averaged to determine an action classification of the video.
Type: Grant
Filed: December 26, 2020
Date of Patent: April 9, 2024
Assignees: International Business Machines Corporation, Massachusetts Institute of Technology, MA
Inventors: Rameswar Panda, Yue Meng, Chung-Ching Lin, Rogerio Schmidt Feris, Aude Jeanne Oliva
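The routing-and-averaging logic in this abstract can be sketched in a few lines of Python. The policy, the per-resolution classifiers, and the nearest-neighbour `rescale` are hypothetical stand-ins for the policy network and backbone classifiers; only the control flow (pick a resolution per frame, rescale, classify with the matching classifier, average, take the top class) follows the abstract.

```python
from typing import Callable, Dict, List, Sequence

Frame = List[List[float]]  # toy square grayscale frame


def rescale(frame: Frame, size: int) -> Frame:
    """Nearest-neighbour resize of a square frame to size x size."""
    n = len(frame)
    return [[frame[r * n // size][c * n // size] for c in range(size)]
            for r in range(size)]


def classify_video(frames: Sequence[Frame],
                   policy: Callable[[Frame], int],
                   classifiers: Dict[int, Callable[[Frame], List[float]]]) -> int:
    """Route each frame to the classifier for its resolution; average the scores."""
    if not frames:
        raise ValueError("need at least one frame")
    totals: List[float] = []
    for frame in frames:
        res = policy(frame)                              # per-frame resolution
        scores = classifiers[res](rescale(frame, res))   # resolution-specific head
        totals = list(scores) if not totals else [t + s for t, s in zip(totals, scores)]
    avg = [t / len(frames) for t in totals]
    return max(range(len(avg)), key=avg.__getitem__)     # predicted action class


def policy(frame: Frame) -> int:
    # Hypothetical policy: low-contrast frames go to the cheap 2x2 resolution.
    flat = [x for row in frame for x in row]
    return 2 if max(flat) - min(flat) < 0.5 else 4


classifiers = {  # hypothetical per-resolution classifiers over 2 action classes
    2: lambda f: [0.6, 0.4],
    4: lambda f: [0.1, 0.9],
}
flat_frame = [[0.2] * 4 for _ in range(4)]
checker = [[float((r + c) % 2) for c in range(4)] for r in range(4)]
print(classify_video([flat_frame, checker], policy, classifiers))  # 1
```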
-
Patent number: 11915474
Abstract: Techniques and apparatus for analyzing visual content using a visual transformer are described. An example technique includes generating a first set of tokens based on a visual content item. Each token in the first set of tokens is associated with a regional feature from a different region of a plurality of regions of the visual content item. A second set of tokens is generated based on the visual content item. Each token in the second set of tokens is associated with a local feature from one of the plurality of regions of the visual content item. At least one feature map is generated for the visual content item, based on analyzing the first set of tokens and the second set of tokens separately using a hierarchical vision transformer. At least one vision task is performed based on the at least one feature map.
Type: Grant
Filed: May 31, 2022
Date of Patent: February 27, 2024
Assignee: International Business Machines Corporation
Inventors: Richard Chen, Rameswar Panda, Quanfu Fan
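A minimal sketch of the two token sets, assuming non-overlapping square regions and mean pooling as a stand-in for learned patch embeddings: each region contributes one regional token and several local tokens taken from finer patches inside it. `region_and_local_tokens` and its size parameters are hypothetical, and the hierarchical vision transformer that consumes the tokens is not shown.

```python
from typing import List, Tuple

Image = List[List[float]]


def mean_patch(img: Image, top: int, left: int, size: int) -> float:
    """Average of a size x size patch; a stand-in for a learned patch embedding."""
    vals = [img[r][c] for r in range(top, top + size)
            for c in range(left, left + size)]
    return sum(vals) / len(vals)


def region_and_local_tokens(img: Image, region: int,
                            local: int) -> Tuple[List[float], List[List[float]]]:
    """One regional token per region, plus that region's local tokens."""
    h, w = len(img), len(img[0])
    if h % region or w % region or region % local:
        raise ValueError("region must tile the image and local must tile region")
    regional: List[float] = []
    local_tokens: List[List[float]] = []
    for top in range(0, h, region):
        for left in range(0, w, region):
            regional.append(mean_patch(img, top, left, region))  # coarse token
            local_tokens.append([mean_patch(img, top + r, left + c, local)
                                 for r in range(0, region, local)
                                 for c in range(0, region, local)])  # fine tokens
    return regional, local_tokens


img = [[float(r * 4 + c) for c in range(4)] for r in range(4)]  # toy 4x4 image
regional, local_toks = region_and_local_tokens(img, region=2, local=1)
print(regional)       # [2.5, 4.5, 10.5, 12.5]
print(local_toks[0])  # [0.0, 1.0, 4.0, 5.0]
```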
-
Patent number: 11875489
Abstract: A hybrid-distance adversarial patch generator can be trained to generate a hybrid adversarial patch effective at multiple distances. The hybrid patch can be inserted into multiple sample images, each depicting an object, to simulate inclusion of the hybrid patch at multiple distances. The multiple sample images can then be used to train an object detection model to detect the objects.
Type: Grant
Filed: June 30, 2021
Date of Patent: January 16, 2024
Assignee: International Business Machines Corporation
Inventors: Quanfu Fan, Sijia Liu, Richard Chen, Rameswar Panda
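Inserting one patch at several simulated distances can be sketched as pasting it at different scales. This sketch assumes a pinhole-style model in which apparent size scales with `ref_distance / distance`, and uses nearest-neighbour resizing; `insert_patch` and the all-ones patch are hypothetical, and training the patch generator itself is not shown.

```python
from typing import List

Grid = List[List[float]]


def resize(patch: Grid, size: int) -> Grid:
    """Nearest-neighbour resize of a square patch to size x size."""
    n = len(patch)
    return [[patch[r * n // size][c * n // size] for c in range(size)]
            for r in range(size)]


def insert_patch(image: Grid, patch: Grid, distance: float,
                 ref_distance: float, top: int, left: int) -> Grid:
    """Paste `patch` scaled as if viewed from `distance`.

    Assumption: the patch has its native size at ref_distance and its apparent
    size shrinks as ref_distance / distance.
    """
    size = max(1, round(len(patch) * ref_distance / distance))
    scaled = resize(patch, size)
    out = [row[:] for row in image]  # copy so the sample image can be reused
    for r in range(size):
        for c in range(size):
            if 0 <= top + r < len(out) and 0 <= left + c < len(out[0]):
                out[top + r][left + c] = scaled[r][c]
    return out


image = [[0.0] * 8 for _ in range(8)]
patch = [[1.0] * 4 for _ in range(4)]  # stands in for the generated hybrid patch
for d in (1.0, 2.0, 4.0):              # simulated viewing distances
    sample = insert_patch(image, patch, d, ref_distance=1.0, top=2, left=2)
    print(d, sum(map(sum, sample)))    # 16.0, 4.0, 1.0 patch pixels respectively
```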
-
Publication number: 20230386197
Abstract: Techniques and apparatus for analyzing visual content using a visual transformer are described. An example technique includes generating a first set of tokens based on a visual content item. Each token in the first set of tokens is associated with a regional feature from a different region of a plurality of regions of the visual content item. A second set of tokens is generated based on the visual content item. Each token in the second set of tokens is associated with a local feature from one of the plurality of regions of the visual content item. At least one feature map is generated for the visual content item, based on analyzing the first set of tokens and the second set of tokens separately using a hierarchical vision transformer. At least one vision task is performed based on the at least one feature map.
Type: Application
Filed: May 31, 2022
Publication date: November 30, 2023
Inventors: Richard Chen, Rameswar Panda, Quanfu Fan
-
Publication number: 20230259716
Abstract: A neural architecture search method, system, and computer program product that determines, by a computing device, a best fit language model of a plurality of language models that is a best fit for interpretation of a corpus of natural language and interprets, by the computing device, the corpus of natural language using the best fit language model.
Type: Application
Filed: February 14, 2022
Publication date: August 17, 2023
Inventors: Michele Merler, Aashka Trivedi, Rameswar Panda, Bishwaranjan Bhattacharjee, Taesun Moon, Avirup Sil
-
Publication number: 20230215174
Abstract: A recognition network is trained for a selected video frame at a desired highest precision using back-propagation and a policy network is trained using back-propagation from the trained recognition network. The recognition network is trained at a lower precision specified by a policy recommended for the selected video frame by the trained policy network. A frame of a given video is inputted to the trained policy network for determination of a precision policy for processing the frame. Video inferencing is performed utilizing the trained policy network and the trained recognition network based on the precision policy.
Type: Application
Filed: December 31, 2021
Publication date: July 6, 2023
Inventors: Rameswar Panda, Ximeng Sun, Richard Chen, Rogerio Schmidt Feris, Ekaterina Saenko
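At inference time, the precision policy amounts to choosing a bit-width per frame and running recognition with correspondingly quantized weights. This minimal sketch assumes symmetric uniform quantization and a hand-written threshold `policy`; both, along with the names `quantize` and `infer_video`, are illustrative assumptions, and the back-propagation training of the two networks is not shown.

```python
from typing import Callable, List, Sequence, Tuple


def quantize(values: Sequence[float], bits: int, max_abs: float) -> List[float]:
    """Symmetric uniform quantization of values in [-max_abs, max_abs]."""
    if bits < 2:
        raise ValueError("need at least 2 bits for a signed grid")
    levels = 2 ** (bits - 1) - 1          # e.g. 127 for 8 bits, 7 for 4 bits
    step = max_abs / levels
    return [max(-levels, min(levels, round(v / step))) * step for v in values]


def infer_video(frames: Sequence[Sequence[float]], weights: Sequence[float],
                policy: Callable[[Sequence[float]], int]) -> List[Tuple[int, float]]:
    """Score each frame with weights quantized to the bit-width the policy picks."""
    max_abs = max(abs(w) for w in weights)
    results = []
    for feat in frames:
        bits = policy(feat)                       # per-frame precision decision
        qw = quantize(weights, bits, max_abs)
        results.append((bits, sum(f * w for f, w in zip(feat, qw))))
    return results


def policy(feat: Sequence[float]) -> int:
    # Hypothetical policy: "easy" frames with small activations run at 4 bits.
    return 4 if max(abs(x) for x in feat) < 1.0 else 8


weights = [0.5, -0.25, 0.9]
frames = [[0.1, 0.2, 0.3], [2.0, 1.0, 0.5]]
for bits, score in infer_video(frames, weights, policy):
    print(bits, round(score, 3))
```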
-
Publication number: 20230206114
Abstract: One or more group-specific aggregate losses, one or more group-agnostic aggregate losses, and a joint loss are computed. A regularizer loss is computed based on the one or more group-specific aggregate losses and the one or more group-agnostic aggregate losses. One or more group-specific models are trained based on the one or more group-specific aggregate losses. A feature extractor is updated based on the regularizer loss and a joint classifier is updated based on the joint loss.
Type: Application
Filed: December 29, 2021
Publication date: June 29, 2023
Inventors: Joshua Ka-Wing Lee, Yuheng Bu, Deepta Rajan, Prasanna Sattigeri, Subhro Das, Rameswar Panda, Gregory Wornell
-
Publication number: 20230196710
Abstract: A sequence of patch tokens representing an image can be received. A network can be trained to learn informative patch tokens and uninformative patch tokens in the sequence of patch tokens, in learning to recognize an object in the image. The sequence of patch tokens can be reduced by removing the uninformative patch tokens from the sequence of patch tokens. The reduced sequence of patch tokens can be input to an attention-based deep learning neural network. The attention-based deep learning neural network can be fine-tuned to recognize the object in the image using the reduced sequence of patch tokens.
Type: Application
Filed: December 22, 2021
Publication date: June 22, 2023
Inventors: Bowen Pan, Rameswar Panda, Rogerio Schmidt Feris, Aude Jeanne Oliva
-
Publication number: 20230138254
Abstract: A base pathway of a computerized two-pathway video action recognition model is trained using a plurality of labeled video samples. The base pathway is trained using a plurality of unlabeled video samples at a first framerate. An auxiliary pathway of the computerized two-pathway video action recognition model is trained using a plurality of the unlabeled video samples at a second framerate, the second framerate being slower than the first framerate, wherein the training of the base pathway and the training of the auxiliary pathway result in a trained computerized two-pathway video action recognition model. A candidate video is categorized using the trained computerized two-pathway video action recognition model and the categorized candidate video is stored in a computer-accessible video database system for information retrieval.
Type: Application
Filed: October 29, 2021
Publication date: May 4, 2023
Inventors: Rameswar Panda, Rogerio Schmidt Feris, Abir Das
-
Publication number: 20230082448
Abstract: For each convolution layer of a plurality of convolution layers of a convolutional neural network (CNN), apply an input-dependent policy network to determine: a first fraction of input feature maps to the given layer for which first corresponding output feature maps are to be fully computed by the layer; and a second fraction of input feature maps to the layer for which second corresponding output feature maps are not to be fully computed, but to be reconstructed from the first corresponding output feature maps. Fully compute the first corresponding output feature maps and reconstruct the second corresponding output feature maps. For a final one of the convolution layers of the plurality of convolution layers of the neural network, input the first corresponding output feature maps and the second corresponding output feature maps to an output layer to obtain an inference result.
Type: Application
Filed: September 15, 2021
Publication date: March 16, 2023
Inventors: Bowen Pan, Rameswar Panda, Camilo Luciano Fosco, Rogerio Schmidt Feris, Aude Jeanne Oliva
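One layer of the compute-or-reconstruct scheme can be sketched as follows. The sketch assumes the policy's output is a single fraction, that the first k maps are the ones fully computed, and that each remaining map is reconstructed as a fixed linear combination (`mix`) of the fully computed ones; these are illustrative assumptions, and `layer_forward` and `double` are hypothetical names.

```python
from typing import Callable, List, Sequence

FeatureMap = List[float]  # a flattened toy feature map


def layer_forward(inputs: Sequence[FeatureMap],
                  compute: Callable[[FeatureMap], FeatureMap],
                  fraction: float,
                  mix: Sequence[Sequence[float]]) -> List[FeatureMap]:
    """Fully compute a fraction of output maps and reconstruct the rest cheaply.

    `fraction` stands in for the policy network's per-input decision; each row
    of `mix` (hypothetical) holds linear weights over the fully computed maps.
    """
    n = len(inputs)
    k = min(n, max(1, round(fraction * n)))   # number of maps to fully compute
    if len(mix) < n - k or any(len(row) != k for row in mix[: n - k]):
        raise ValueError("mix needs n - k rows of k weights")
    computed = [compute(x) for x in inputs[:k]]            # expensive path
    reconstructed = [
        [sum(w * fm[i] for w, fm in zip(row, computed))
         for i in range(len(computed[0]))]
        for row in mix[: n - k]                             # cheap path
    ]
    return computed + reconstructed


double = lambda fm: [2 * v for v in fm]  # stand-in for a real convolution
maps = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
out = layer_forward(maps, double, fraction=0.5, mix=[[0.5, 0.5], [1.0, 0.0]])
print(out)  # [[2.0, 4.0], [6.0, 8.0], [4.0, 6.0], [2.0, 4.0]]
```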
-
Publication number: 20230005111
Abstract: A hybrid-distance adversarial patch generator can be trained to generate a hybrid adversarial patch effective at multiple distances. The hybrid patch can be inserted into multiple sample images, each depicting an object, to simulate inclusion of the hybrid patch at multiple distances. The multiple sample images can then be used to train an object detection model to detect the objects.
Type: Application
Filed: June 30, 2021
Publication date: January 5, 2023
Inventors: Quanfu Fan, Sijia Liu, Richard Chen, Rameswar Panda
-
Patent number: 11538248
Abstract: Machine learning-based techniques for summarizing collections of data such as image and video data leveraging side information obtained from related (e.g., video) data are provided. In one aspect, a method for video summarization includes: obtaining related videos having content related to a target video; and creating a summary of the target video using information provided by the target video and side information provided by the related videos to select portions of the target video to include in the summary. The side information can include video data, still image data, text, comments, natural language descriptions, and combinations thereof.
Type: Grant
Filed: October 27, 2020
Date of Patent: December 27, 2022
Assignee: International Business Machines Corporation
Inventors: Rameswar Panda, Chuang Gan, Pin-Yu Chen, Bo Wu
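A minimal sketch of using side information to pick summary segments: score each target-video segment by cosine similarity to the mean feature of the related videos and keep the top k in temporal order. Feature vectors, mean pooling, cosine scoring, and the names `summarize` and `cosine` are assumptions for illustration; the abstract does not specify how the side information is combined.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def cosine(a: Vec, b: Vec) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def summarize(target_segments: Sequence[Vec], related_features: Sequence[Vec],
              k: int) -> List[int]:
    """Indices of the k target segments most similar to the related videos."""
    dim = len(target_segments[0])
    side = [sum(f[i] for f in related_features) / len(related_features)
            for i in range(dim)]                    # mean side-information feature
    ranked = sorted(range(len(target_segments)),
                    key=lambda i: cosine(target_segments[i], side), reverse=True)
    return sorted(ranked[:k])                       # keep temporal order


segments = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.9, 0.1]]  # toy segment features
related = [[1.0, 0.0], [1.0, 0.2]]                            # toy related-video features
print(summarize(segments, related, k=2))  # [0, 3]
```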
-
Publication number: 20220292285
Abstract: One embodiment of the invention provides a method for video recognition. The method comprises receiving an input video comprising a sequence of video segments over a plurality of data modalities. The method further comprises, for a video segment of the sequence, selecting one or more data modalities based on data representing the video segment. Each data modality selected is optimal for video recognition of the video segment. The method further comprises, for each data modality selected, providing at least one data input representing the video segment over the data modality selected to a machine learning model corresponding to the data modality selected, and generating a first type of prediction representative of the video segment via the machine learning model. The method further comprises determining a second type of prediction representative of the entire input video by aggregating all of the generated predictions of the first type.
Type: Application
Filed: March 11, 2021
Publication date: September 15, 2022
Inventors: Rameswar Panda, Richard Chen, Quanfu Fan, Rogerio Schmidt Feris
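The two levels of prediction in this abstract can be sketched as follows: a per-segment prediction averaged over the selected modalities, then a video-level prediction aggregated over segments. The selector `select`, the single-modality `models`, and mean pooling at both levels are hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, Dict, List, Sequence


def recognize(segments: Sequence[Dict[str, float]],
              select: Callable[[Dict[str, float]], List[str]],
              models: Dict[str, Callable[[float], List[float]]]) -> List[float]:
    """Video-level class scores from per-segment, per-modality predictions."""
    segment_preds = []
    for seg in segments:
        chosen = select(seg)                         # modalities for this segment
        preds = [models[m](seg[m]) for m in chosen]  # one model per modality
        # First type of prediction: one score vector per segment.
        segment_preds.append([sum(col) / len(preds) for col in zip(*preds)])
    # Second type: aggregate over all segments (mean pooling is an assumption).
    return [sum(col) / len(segment_preds) for col in zip(*segment_preds)]


def select(seg: Dict[str, float]) -> List[str]:
    # Hypothetical selector: use modalities whose toy "signal" exceeds 0.5.
    return [m for m in ("rgb", "audio") if seg[m] > 0.5] or ["rgb"]


models = {  # hypothetical single-modality models over 2 classes
    "rgb": lambda x: [x, 1 - x],
    "audio": lambda x: [1 - x, x],
}
segments = [{"rgb": 0.9, "audio": 0.1}, {"rgb": 0.2, "audio": 0.8}]
print(recognize(segments, select, models))
```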
-
Publication number: 20220215198
Abstract: Methods, apparatus, and systems for multi-resolution processing for video classification. A plurality of video frames of a video are obtained and a resolution for classifying each video frame of the plurality of video frames is determined by analyzing each video frame using a policy network. Based on the determined resolution, each video frame having a determined resolution is rescaled and each rescaled video frame is routed to a classifier of a backbone network that corresponds to the determined resolution. Each rescaled video frame is classified using the corresponding classifier of the backbone network to obtain a plurality of classifications and the classifications are averaged to determine an action classification of the video.
Type: Application
Filed: December 26, 2020
Publication date: July 7, 2022
Inventors: Rameswar Panda, Yue Meng, Chung-Ching Lin, Rogerio Schmidt Feris, Aude Jeanne Oliva
-
Publication number: 20220129679
Abstract: Machine learning-based techniques for summarizing collections of data such as image and video data leveraging side information obtained from related (e.g., video) data are provided. In one aspect, a method for video summarization includes: obtaining related videos having content related to a target video; and creating a summary of the target video using information provided by the target video and side information provided by the related videos to select portions of the target video to include in the summary. The side information can include video data, still image data, text, comments, natural language descriptions, and combinations thereof.
Type: Application
Filed: October 27, 2020
Publication date: April 28, 2022
Inventors: Rameswar Panda, Chuang Gan, Pin-Yu Chen, Bo Wu
-
Publication number: 20220121924
Abstract: An embodiment includes identifying an initial plurality of sets of hyperparameter values at which to evaluate an objective function that relates hyperparameter values to performance values of a neural network. The embodiment also executes training processes on the neural network with the hyperparameters set to each of the initial sets of hyperparameter values such that the training process provides an initial set of the performance values for the objective function. The embodiment also generates an approximation of the objective function using splines at selected performance values. The embodiment approximates a point at which the approximation of the objective function reaches a maximum value, then determines an updated set of hyperparameter values associated with the maximum value. The embodiment then executes a runtime process using the neural network with the hyperparameters set to the updated set of hyperparameter values.
Type: Application
Filed: October 21, 2020
Publication date: April 21, 2022
Applicant: International Business Machines Corporation
Inventors: Ulrich Alfons Finkler, Michele Merler, Mayoore Selvarasa Jaiswal, Hui Wu, Rameswar Panda, Wei Zhang
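A one-dimensional sketch of the spline step: fit a natural cubic spline through observed (hyperparameter, performance) pairs, then search the spline on a dense grid for its maximum to propose the next value. Restricting to one hyperparameter, using a natural cubic spline and grid search, and the sample data are all assumptions; `natural_cubic_spline` and `argmax_on_grid` are hypothetical helpers written from the standard tridiagonal formulation.

```python
from bisect import bisect_right
from typing import Callable, Sequence


def natural_cubic_spline(xs: Sequence[float],
                         ys: Sequence[float]) -> Callable[[float], float]:
    """Natural cubic spline through (xs, ys); xs must be strictly increasing."""
    n = len(xs) - 1                      # number of intervals
    if n < 2 or len(ys) != len(xs):
        raise ValueError("need at least three (x, y) pairs")
    h = [xs[i + 1] - xs[i] for i in range(n)]
    # Tridiagonal system for interior second derivatives; M[0] = M[n] = 0.
    a = [h[i - 1] for i in range(1, n)]                  # sub-diagonal
    b = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]   # main diagonal
    c = [h[i] for i in range(1, n)]                      # super-diagonal
    d = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
         for i in range(1, n)]
    m = n - 1
    for i in range(1, m):                # Thomas algorithm: forward sweep
        w = a[i] / b[i - 1]
        b[i] -= w * c[i - 1]
        d[i] -= w * d[i - 1]
    interior = [0.0] * m
    interior[-1] = d[-1] / b[-1]
    for i in range(m - 2, -1, -1):       # back substitution
        interior[i] = (d[i] - c[i] * interior[i + 1]) / b[i]
    M = [0.0] + interior + [0.0]

    def spline(x: float) -> float:
        i = min(max(bisect_right(xs, x) - 1, 0), n - 1)  # interval containing x
        hi, dl, dr = h[i], x - xs[i], xs[i + 1] - x
        return (M[i] * dr ** 3 / (6 * hi) + M[i + 1] * dl ** 3 / (6 * hi)
                + (ys[i] / hi - M[i] * hi / 6) * dr
                + (ys[i + 1] / hi - M[i + 1] * hi / 6) * dl)

    return spline


def argmax_on_grid(f: Callable[[float], float], lo: float, hi: float,
                   steps: int = 1000) -> float:
    """Approximate the maximizer of f on [lo, hi] by dense grid search."""
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return max(grid, key=f)


# Hypothetical observations: log10(learning rate) -> validation accuracy.
xs = [-5.0, -4.0, -3.0, -2.0, -1.0]
ys = [0.60, 0.75, 0.82, 0.78, 0.50]
best = argmax_on_grid(natural_cubic_spline(xs, ys), xs[0], xs[-1])
print(f"next log10(learning rate) to evaluate: {best:.3f}")
```

With several hyperparameters, the same idea needs a multivariate spline or a per-dimension search, which this sketch does not attempt.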