Patents by Inventor Rameswar Panda

Rameswar Panda has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240127005
    Abstract: Methods, systems, and computer program products for translating text using generated visual representations and artificial intelligence are provided herein. A computer-implemented method includes generating a tokenized form of at least a portion of input text in a first language; generating at least one visual representation of at least a portion of the input text using a first set of artificial intelligence techniques; generating a tokenized form of at least a portion of the at least one visual representation; and generating an output including a translated version of the input text into at least a second language by processing, using a second set of artificial intelligence techniques, at least a portion of the tokenized form of the at least a portion of the input text and at least a portion of the tokenized form of the at least a portion of the at least one visual representation.
    Type: Application
    Filed: September 28, 2022
    Publication date: April 18, 2024
    Inventors: Rameswar Panda, Yi Li, Richard Chen, Rogerio Schmidt Feris, Yoon Hyung Kim, David Cox
  • Patent number: 11954910
    Abstract: Methods, apparatus, and systems for multi-resolution processing for video classification are described. A plurality of video frames of a video are obtained and a resolution for classifying each video frame of the plurality of video frames is determined by analyzing each video frame using a policy network. Based on the determined resolution, each video frame having a determined resolution is rescaled and each rescaled video frame is routed to a classifier of a backbone network that corresponds to the determined resolution. Each rescaled video frame is classified using the corresponding classifier of the backbone network to obtain a plurality of classifications and the classifications are averaged to determine an action classification of the video.
    Type: Grant
    Filed: December 26, 2020
    Date of Patent: April 9, 2024
    Assignees: International Business Machines Corporation, Massachusetts Institute of Technology
    Inventors: Rameswar Panda, Yue Meng, Chung-Ching Lin, Rogerio Schmidt Feris, Aude Jeanne Oliva
  • Patent number: 11915474
    Abstract: Techniques and apparatus for analyzing visual content using a visual transformer are described. An example technique includes generating a first set of tokens based on a visual content item. Each token in the first set of tokens is associated with a regional feature from a different region of a plurality of regions of the visual content item. A second set of tokens is generated based on the visual content item. Each token in the second set of tokens is associated with a local feature from one of the plurality of regions of the visual content item. At least one feature map is generated for the visual content item, based on analyzing the first set of tokens and the second set of tokens separately using a hierarchical vision transformer. At least one vision task is performed based on the at least one feature map.
    Type: Grant
    Filed: May 31, 2022
    Date of Patent: February 27, 2024
    Assignee: International Business Machines Corporation
    Inventors: Richard Chen, Rameswar Panda, Quanfu Fan
  • Patent number: 11875489
    Abstract: A hybrid-distance adversarial patch generator can be trained to generate a hybrid adversarial patch effective at multiple distances. The hybrid patch can be inserted into multiple sample images, each depicting an object, to simulate inclusion of the hybrid patch at multiple distances. The multiple sample images can then be used to train an object detection model to detect the objects.
    Type: Grant
    Filed: June 30, 2021
    Date of Patent: January 16, 2024
    Assignee: International Business Machines Corporation
    Inventors: Quanfu Fan, Sijia Liu, Richard Chen, Rameswar Panda
  • Publication number: 20230386197
    Abstract: Techniques and apparatus for analyzing visual content using a visual transformer are described. An example technique includes generating a first set of tokens based on a visual content item. Each token in the first set of tokens is associated with a regional feature from a different region of a plurality of regions of the visual content item. A second set of tokens is generated based on the visual content item. Each token in the second set of tokens is associated with a local feature from one of the plurality of regions of the visual content item. At least one feature map is generated for the visual content item, based on analyzing the first set of tokens and the second set of tokens separately using a hierarchical vision transformer. At least one vision task is performed based on the at least one feature map.
    Type: Application
    Filed: May 31, 2022
    Publication date: November 30, 2023
    Inventors: Richard Chen, Rameswar Panda, Quanfu Fan
  • Publication number: 20230259716
    Abstract: A neural architecture search method, system, and computer program product that determines, by a computing device, a best fit language model of a plurality of language models that is a best fit for interpretation of a corpus of natural language and interprets, by the computing device, the corpus of natural language using the best fit language model.
    Type: Application
    Filed: February 14, 2022
    Publication date: August 17, 2023
    Inventors: Michele Merler, Aashka Trivedi, Rameswar Panda, Bishwaranjan Bhattacharjee, Taesun Moon, Avirup Sil
  • Publication number: 20230215174
    Abstract: A recognition network is trained for a selected video frame at a desired highest precision using back-propagation and a policy network is trained using back-propagation from the trained recognition network. The recognition network is trained at a lower precision specified by a policy recommended for the selected video frame by the trained policy network. A frame of a given video is inputted to the trained policy network for determination of a precision policy for processing the frame. Video inferencing is performed utilizing the trained policy network and the trained recognition network based on the precision policy.
    Type: Application
    Filed: December 31, 2021
    Publication date: July 6, 2023
    Inventors: Rameswar Panda, Ximeng Sun, Richard Chen, Rogerio Schmidt Feris, Ekaterina Saenko
  • Publication number: 20230206114
    Abstract: One or more group-specific aggregate losses, one or more group-agnostic aggregate losses, and a joint loss are computed. A regularizer loss is computed based on the one or more group-specific aggregate losses and the one or more group-agnostic aggregate losses. One or more group-specific models are trained based on the one or more group-specific aggregate losses. A feature extractor is updated based on the regularizer loss and a joint classifier is updated based on the joint loss.
    Type: Application
    Filed: December 29, 2021
    Publication date: June 29, 2023
    Inventors: Joshua Ka-Wing Lee, Yuheng Bu, Deepta Rajan, Prasanna Sattigeri, Subhro Das, Rameswar Panda, Gregory Wornell
  • Publication number: 20230196710
    Abstract: A sequence of patch tokens representing an image can be received. A network can be trained to learn informative patch tokens and uninformative patch tokens in the sequence of patch tokens, in learning to recognize an object in the image. The sequence of patch tokens can be reduced by removing the uninformative patch tokens from the sequence of patch tokens. The reduced sequence of patch tokens can be input to an attention-based deep learning neural network. The attention-based deep learning neural network can be fine-tuned to recognize the object in the image using the reduced sequence of patch tokens.
    Type: Application
    Filed: December 22, 2021
    Publication date: June 22, 2023
    Inventors: Bowen Pan, Rameswar Panda, Rogerio Schmidt Feris, Aude Jeanne Oliva
  • Publication number: 20230138254
    Abstract: A base pathway of a computerized two-pathway video action recognition model is trained using a plurality of labeled video samples. The base pathway is trained using a plurality of unlabeled video samples at a first framerate. An auxiliary pathway of the computerized two-pathway video action recognition model is trained using a plurality of the unlabeled video samples at a second framerate, the second framerate being slower than the first framerate, wherein the training of the base pathway and the training of the auxiliary pathway result in a trained computerized two-pathway video action recognition model. A candidate video is categorized using the trained computerized two-pathway video action recognition model and the categorized candidate video is stored in a computer-accessible video database system for information retrieval.
    Type: Application
    Filed: October 29, 2021
    Publication date: May 4, 2023
    Inventors: Rameswar Panda, Rogerio Schmidt Feris, Abir Das
  • Publication number: 20230082448
    Abstract: For each convolution layer of a plurality of convolution layers of a convolutional neural network (CNN), an input-dependent policy network is applied to determine: a first fraction of input feature maps to the layer for which first corresponding output feature maps are to be fully computed by the layer; and a second fraction of input feature maps to the layer for which second corresponding output feature maps are not to be fully computed, but are to be reconstructed from the first corresponding output feature maps. The first corresponding output feature maps are fully computed and the second corresponding output feature maps are reconstructed. For a final one of the convolution layers of the plurality of convolution layers of the neural network, the first corresponding output feature maps and the second corresponding output feature maps are input to an output layer to obtain an inference result.
    Type: Application
    Filed: September 15, 2021
    Publication date: March 16, 2023
    Inventors: Bowen Pan, Rameswar Panda, Camilo Luciano Fosco, Rogerio Schmidt Feris, Aude Jeanne Oliva
  • Publication number: 20230005111
    Abstract: A hybrid-distance adversarial patch generator can be trained to generate a hybrid adversarial patch effective at multiple distances. The hybrid patch can be inserted into multiple sample images, each depicting an object, to simulate inclusion of the hybrid patch at multiple distances. The multiple sample images can then be used to train an object detection model to detect the objects.
    Type: Application
    Filed: June 30, 2021
    Publication date: January 5, 2023
    Inventors: Quanfu Fan, Sijia Liu, Richard Chen, Rameswar Panda
  • Patent number: 11538248
    Abstract: Machine learning-based techniques for summarizing collections of data such as image and video data leveraging side information obtained from related (e.g., video) data are provided. In one aspect, a method for video summarization includes: obtaining related videos having content related to a target video; and creating a summary of the target video using information provided by the target video and side information provided by the related videos to select portions of the target video to include in the summary. The side information can include video data, still image data, text, comments, natural language descriptions, and combinations thereof.
    Type: Grant
    Filed: October 27, 2020
    Date of Patent: December 27, 2022
    Assignee: International Business Machines Corporation
    Inventors: Rameswar Panda, Chuang Gan, Pin-Yu Chen, Bo Wu
  • Publication number: 20220292285
    Abstract: One embodiment of the invention provides a method for video recognition. The method comprises receiving an input video comprising a sequence of video segments over a plurality of data modalities. The method further comprises, for a video segment of the sequence, selecting one or more data modalities based on data representing the video segment. Each data modality selected is optimal for video recognition of the video segment. The method further comprises, for each data modality selected, providing at least one data input representing the video segment over the data modality selected to a machine learning model corresponding to the data modality selected, and generating a first type of prediction representative of the video segment via the machine learning model. The method further comprises determining a second type of prediction representative of the entire input video by aggregating all first type of predictions generated.
    Type: Application
    Filed: March 11, 2021
    Publication date: September 15, 2022
    Inventors: Rameswar Panda, Richard Chen, Quanfu Fan, Rogerio Schmidt Feris
  • Publication number: 20220215198
    Abstract: Methods, apparatus, and systems for multi-resolution processing for video classification are described. A plurality of video frames of a video are obtained and a resolution for classifying each video frame of the plurality of video frames is determined by analyzing each video frame using a policy network. Based on the determined resolution, each video frame having a determined resolution is rescaled and each rescaled video frame is routed to a classifier of a backbone network that corresponds to the determined resolution. Each rescaled video frame is classified using the corresponding classifier of the backbone network to obtain a plurality of classifications and the classifications are averaged to determine an action classification of the video.
    Type: Application
    Filed: December 26, 2020
    Publication date: July 7, 2022
    Inventors: Rameswar Panda, Yue Meng, Chung-Ching Lin, Rogerio Schmidt Feris, Aude Jeanne Oliva
  • Publication number: 20220129679
    Abstract: Machine learning-based techniques for summarizing collections of data such as image and video data leveraging side information obtained from related (e.g., video) data are provided. In one aspect, a method for video summarization includes: obtaining related videos having content related to a target video; and creating a summary of the target video using information provided by the target video and side information provided by the related videos to select portions of the target video to include in the summary. The side information can include video data, still image data, text, comments, natural language descriptions, and combinations thereof.
    Type: Application
    Filed: October 27, 2020
    Publication date: April 28, 2022
    Inventors: Rameswar Panda, Chuang Gan, Pin-Yu Chen, Bo Wu
  • Publication number: 20220121924
    Abstract: An embodiment includes identifying an initial plurality of sets of hyperparameter values at which to evaluate an objective function that relates hyperparameter values to performance values of a neural network. The embodiment also executes training processes on the neural network with the hyperparameters set to each of the initial sets of hyperparameter values such that the training process provides an initial set of the performance values for the objective function. The embodiment also generates an approximation of the objective function using splines at selected performance values. The embodiment approximates a point at which the approximation of the objective function reaches a maximum value, then determines an updated set of hyperparameter values associated with the maximum value. The embodiment then executes a runtime process using the neural network with the hyperparameters set to the updated set of hyperparameter values.
    Type: Application
    Filed: October 21, 2020
    Publication date: April 21, 2022
    Applicant: International Business Machines Corporation
    Inventors: Ulrich Alfons Finkler, Michele Merler, Mayoore Selvarasa Jaiswal, Hui Wu, Rameswar Panda, Wei Zhang
  • Patent number: 11210775
    Abstract: A sequence of frames of a video can be received. For a given frame in the sequence of frames, a gradient-embedded frame is generated corresponding to the given frame. The gradient-embedded frame incorporates motion information. The motion information can be represented as disturbance in the gradient-embedded frame. A plurality of such gradient-embedded frames can be generated corresponding to a plurality of the sequence of frames. Based on the plurality of gradient-embedded frames, a neural network such as a generative adversarial network is trained to learn to suppress the disturbance in the gradient-embedded frame and to generate a substitute frame. In the inference stage, an anomaly in a target video frame can be detected by comparing it to a corresponding substitute frame generated by the neural network.
    Type: Grant
    Filed: September 18, 2020
    Date of Patent: December 28, 2021
    Assignee: International Business Machines Corporation
    Inventors: Bo Wu, Chuang Gan, Dakuo Wang, Rameswar Panda
  • Patent number: 10915798
    Abstract: Disclosed herein are embodiments of systems, methods, and products for a webly supervised training of a convolutional neural network (CNN) to predict emotion in images. A computer may query one or more image repositories using search keywords generated based on the tertiary emotion classes of Parrott's emotion wheel. The computer may filter images received in response to the query to generate a weakly labeled training dataset. Labels associated with the images that are noisy or wrong may be cleaned prior to training of the CNN. The computer may iteratively train the CNN leveraging the hierarchy of emotion classes by increasing the complexity of the labels (tags) for each iteration. Such curriculum guided training may generate a trained CNN that is more accurate than conventionally trained neural networks.
    Type: Grant
    Filed: May 15, 2018
    Date of Patent: February 9, 2021
    Assignee: Adobe Inc.
    Inventors: Jianming Zhang, Rameswar Panda, Haoxiang Li, Joon-Young Lee, Xin Lu
  • Patent number: 10672115
    Abstract: Systems and methods are disclosed for processing an image to detect anomalous pixels. An image classification is received from a trained convolutional neural network (CNN) for an input image with a positive classification being defined to represent detection of an anomaly in the image and a negative classification being defined to represent absence of an anomaly. A backward propagation analysis of the input image for each layer of the CNN generates an attention mapping that includes a positive attention map and a negative attention map. A positive mask is generated based on intensity thresholds of the positive attention map and a negative mask is generated based on intensity thresholds of the negative attention map. An image of segmented anomalous pixels is generated based on an aggregation of the positive mask and the negative mask.
    Type: Grant
    Filed: December 6, 2017
    Date of Patent: June 2, 2020
    Assignee: Siemens Corporation
    Inventors: Rameswar Panda, Ziyan Wu, Arun Innanje, Ramesh Nair, Ti-chiun Chang, Jan Ernst
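
The mask-generation and aggregation steps in the abstract for patent 10672115 can be illustrated with a rough sketch. This is not the patented method: the function names, the threshold values, and in particular the aggregation rule (keep a pixel when the positive mask is set and the negative mask is not) are assumptions made for illustration, and the attention maps are small hand-written grids rather than outputs of a CNN.

```python
from typing import List

Grid = List[List[float]]
Mask = List[List[bool]]


def threshold_mask(attention: Grid, threshold: float) -> Mask:
    """Mark pixels whose attention intensity is at or above the threshold."""
    return [[value >= threshold for value in row] for row in attention]


def segment_anomalies(
    positive_attention: Grid,
    negative_attention: Grid,
    pos_threshold: float = 0.5,  # hypothetical value
    neg_threshold: float = 0.5,  # hypothetical value
) -> Mask:
    """Combine a positive and a negative mask into one anomaly mask."""
    positive_mask = threshold_mask(positive_attention, pos_threshold)
    negative_mask = threshold_mask(negative_attention, neg_threshold)
    # Assumed aggregation rule (not stated in the abstract): a pixel is
    # anomalous when it is in the positive mask and not in the negative mask.
    return [
        [p and not n for p, n in zip(p_row, n_row)]
        for p_row, n_row in zip(positive_mask, negative_mask)
    ]


# Toy 2x2 attention maps standing in for back-propagated CNN attention.
positive = [[0.9, 0.2], [0.7, 0.6]]
negative = [[0.1, 0.8], [0.9, 0.3]]
segmented = segment_anomalies(positive, negative)
print(segmented)
```

Other aggregation rules, such as a weighted combination of the two masks, would fit the abstract's wording equally well; the set-difference rule is used here only because it is the simplest to read.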