Patents by Inventor R. Manmatha

R. Manmatha has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Multiple questions, multiple answers service

Patent number: 12632744

Abstract: Techniques for answering multiple questions with multiple answers are described. An example includes taking in a plurality of questions and a single context including at least one of text, encoded audio data, or encoded visual data, generating embeddings for at least one of the plurality of questions, the single context, and question identifying information, encoding the generated embeddings, and decoding the encoded embeddings using trainable prompts to predict an answer for each of the plurality of questions.

Type: Grant

Filed: September 26, 2022

Date of Patent: May 19, 2026

Assignee: Amazon Technologies, Inc.

Inventors: Peng Tang, Srikar Appalaraju, R. Manmatha, Yusheng Xie, Vijay Mahadevan

Document information extraction using visual question answering and document type specific adapters

Patent number: 12494077

Abstract: Document type specific adapters of a document analysis system are used to provide for additional document types or document specialization when generating answers to user submitted questions targeting information included in a document image provided with the user submitted question. The document analysis system receives a visual question answering (VQA) prompt comprising a document image and a question defining information to be extracted from the document image, generates tokens based on the document image and the question, and adjusts encoding of the tokens using document type specific adapters of a transformer model to extract the information from the document image. A classifier of the document analysis system may determine whether the document image matches a document type supported by the document type specific adapters.

Type: Grant

Filed: June 26, 2023

Date of Patent: December 9, 2025

Assignee: Amazon Technologies, Inc.

Inventors: Peng Tang, Pengkai Zhu, Yusheng Xie, Bhavan Ashwinbhai Jasani, R. Manmatha, Vijay Mahadevan

Statistical model training systems

Patent number: 11868440

Abstract: Subsets of training data are selected for iterations of a statistical model through a training process. The selection can reduce the amount of data to be processed by selecting the training data that will likely have significant training value for the pass. This can include using a metric such as the loss or certainty to sample the data, such that easy to classify instances are used for training less frequently than harder to classify instances. A cutoff value or threshold can also, or alternatively, be used such that harder to classify instances are not selected for training until later in the process when the model may be more likely to benefit from training on those instances. Sampling can vary between passes for variety, and the cutoff value might also change such that all data instances are eligible for training selection by at least the last iteration.

Type: Grant

Filed: October 4, 2018

Date of Patent: January 9, 2024

Assignee: A9.com, Inc.

Inventors: Yash Patel, R. Manmatha, Alexander Smola, Son D. Tran, Sheng Zha
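
The selection step described in the abstract above can be illustrated with a minimal sketch. It is not the patented method: the function name, its parameters, and the use of per-instance loss as the sampling weight are assumptions made for illustration. Instances with higher loss are drawn more often, and an optional cutoff holds back the hardest instances until a later pass.

```python
import random


def select_training_subset(losses, fraction, cutoff=None, seed=0):
    """Pick instance indices for one training pass (hypothetical helper).

    losses   -- non-negative per-instance losses from the previous pass
    fraction -- share of the eligible instances to draw for this pass
    cutoff   -- optional loss ceiling; instances above it are held back
    """
    rng = random.Random(seed)
    # Keep only instances at or below the cutoff (all of them if no cutoff).
    eligible = [i for i, loss in enumerate(losses)
                if cutoff is None or loss <= cutoff]
    if not eligible:
        return []
    k = max(1, round(fraction * len(eligible)))
    # Weighted sampling without replacement: each instance gets the key
    # u ** (1 / weight), and the k largest keys are kept. Higher loss
    # pushes the key toward 1, so hard instances are chosen more often.
    eps = 1e-8  # keeps zero-loss instances selectable
    keyed = [(rng.random() ** (1.0 / (losses[i] + eps)), i) for i in eligible]
    keyed.sort(reverse=True)
    return sorted(i for _, i in keyed[:k])
```

Raising the cutoff from pass to pass and passing `cutoff=None` on the final pass makes every instance eligible by the last iteration, as the abstract describes.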

Compressed content object and action detection

Patent number: 11568545

Abstract: Various embodiments provide a framework that allows, as an alternative to resource-taxing decompression, efficient computation of feature maps using a compressed content data subset, such as video, by exploiting the motion information, such as a motion vector, present in the compressed video. This framework allows frame-specific object recognition and action detection algorithms to be applied to compressed video and other media files by executing only on I-frames in a Group of Pictures and linearly interpolating the results. Training and machine learning increase recognition accuracy. Yielding significant computational gains, this approach accelerates frame-wise feature extraction for I-frame/P-frame/P-frame videos as well as I-frame/P-frame/B-frame videos. The present techniques may also be used for segmentation to identify and label respective regions for objects in a video.

Type: Grant

Filed: December 27, 2019

Date of Patent: January 31, 2023

Assignee: A9.com, Inc.

Inventors: R. Manmatha, Hexiang Hu, Deva Ramanan
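
The "execute only on I-frames and linearly interpolate" idea from the abstract above can be sketched in a few lines. This is an illustrative sketch only: the function name, the list-of-scores representation, and the fixed Group of Pictures size are assumptions. It also ignores the motion vectors that the abstract says the framework exploits.

```python
def interpolate_scores(iframe_scores, gop_size):
    """Estimate per-frame scores from I-frame scores (hypothetical helper).

    iframe_scores -- one list of per-class scores for each I-frame, in order
    gop_size      -- frames per Group of Pictures; an I-frame starts each group
    """
    if not iframe_scores:
        return []
    frames = []
    # Walk consecutive I-frame pairs and blend their scores linearly
    # for every frame position inside the group between them.
    for start, end in zip(iframe_scores, iframe_scores[1:]):
        for t in range(gop_size):
            w = t / gop_size  # 0 at the first I-frame, approaching 1 at the next
            frames.append([(1 - w) * a + w * b for a, b in zip(start, end)])
    # The final I-frame has no successor, so its scores are used as-is.
    frames.append(list(iframe_scores[-1]))
    return frames
```

With two I-frames and `gop_size=4`, the detector runs twice and the function returns five per-frame score lists.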

Compressed content object and action detection

Publication number: 20210342924

Abstract: Various embodiments provide a framework that allows, as an alternative to resource-taxing decompression, efficient computation of feature maps using a compressed content data subset, such as video, by exploiting the motion information, such as a motion vector, present in the compressed video. This framework allows frame-specific object recognition and action detection algorithms to be applied to compressed video and other media files by executing only on I-frames in a Group of Pictures and linearly interpolating the results. Training and machine learning increase recognition accuracy. Yielding significant computational gains, this approach accelerates frame-wise feature extraction for I-frame/P-frame/P-frame videos as well as I-frame/P-frame/B-frame videos. The present techniques may also be used for segmentation to identify and label respective regions for objects in a video.

Type: Application

Filed: December 27, 2019

Publication date: November 4, 2021

Inventors: R. Manmatha, Hexiang Hu, Deva Ramanan

Computer vision using learnt lossy image compression representations

Patent number: 10984560

Abstract: Techniques for performing learnt image compression and object detection using compressed image data are described. A system may perform image compression using an image compression model that includes an encoder, an entropy model, and a decoder. The encoder, the entropy model, and the decoder may be jointly trained using machine learning based on training data. After training, the encoder and the decoder may be separated to encode image data to generate compressed image data or to decode compressed image data to generate reconstructed image data. In addition, the system may perform object detection using a compressed object detection model that processes compressed image data generated by the image compression model. For example, the compressed object detection model may perform partial decoding using a single layer of the decoder and perform compressed object detection on the partially decoded image data.

Type: Grant

Filed: March 29, 2019

Date of Patent: April 20, 2021

Assignee: Amazon Technologies, Inc.

Inventors: Srikar Appalaraju, R. Manmatha, Tal Hassner

Hierarchical auto-regressive image compression system

Patent number: 10965948

Abstract: The present application relates to a multi-stage encoder/decoder system that provides image compression using hierarchical auto-regressive models and saliency-based masks. The multi-stage encoder/decoder system includes a first stage and a second stage of a trained image compression network, such that the second stage, based on the image compression performed by the first stage, identifies certain redundancies that can be removed from the bit string to reduce the storage and bandwidth requirements. Additionally, by using saliency-based masks, distortions in different sections of the image can be weighted differently to further improve the image compression performance.

Type: Grant

Filed: December 13, 2019

Date of Patent: March 30, 2021

Assignee: Amazon Technologies, Inc.

Inventors: Srikar Appalaraju, Yash Patel, R. Manmatha

Learned lossy image compression codec

Patent number: 10909728

Abstract: Techniques for learned lossy image compression are described. A system may perform image compression using an image compression model that includes an encoder to compress an image and a decoder to reconstruct the image. The encoder and the decoder are trained using machine learning techniques. After training, the encoder can encode image data to generate compressed image data and the decoder can decode compressed image data to generate reconstructed image data.

Type: Grant

Filed: May 1, 2019

Date of Patent: February 2, 2021

Assignee: Amazon Technologies, Inc.

Inventors: Srikar Appalaraju, R. Manmatha, Yash Patel

Compressed content object and action detection

Publication number: 20200143457

Abstract: Various embodiments provide a framework that allows, as an alternative to resource-taxing decompression, efficient computation of feature maps using a compressed content data subset, such as video, by exploiting the motion information, such as a motion vector, present in the compressed video. This framework allows frame-specific object recognition and action detection algorithms to be applied to compressed video and other media files by executing only on I-frames in a Group of Pictures and linearly interpolating the results. Training and machine learning increase recognition accuracy. Yielding significant computational gains, this approach accelerates frame-wise feature extraction for I-frame/P-frame/P-frame videos as well as I-frame/P-frame/B-frame videos. The present techniques may also be used for segmentation to identify and label respective regions for objects in a video.

Type: Application

Filed: December 27, 2019

Publication date: May 7, 2020

Inventors: R. Manmatha, Hexiang Hu, Deva Ramanan

Compressed content object and action detection

Patent number: 10528819

Abstract: Various embodiments provide a framework that allows, as an alternative to resource-taxing decompression, efficient computation of feature maps using a compressed content data subset, such as video, by exploiting the motion information, such as a motion vector, present in the compressed video. This framework allows frame-specific object recognition and action detection algorithms to be applied to compressed video and other media files by executing only on I-frames in a Group of Pictures and linearly interpolating the results. Training and machine learning increase recognition accuracy. Yielding significant computational gains, this approach accelerates frame-wise feature extraction for I-frame/P-frame/P-frame videos as well as I-frame/P-frame/B-frame videos. The present techniques may also be used for segmentation to identify and label respective regions for objects in a video.

Type: Grant

Filed: November 20, 2017

Date of Patent: January 7, 2020

Assignee: A9.com, Inc.

Inventors: R. Manmatha, Hexiang Hu, Deva Ramanan

Item recommendation based on feature match

Patent number: 10109051

Abstract: Images may be analyzed to determine a visually cohesive color palette, for example by comparing a subset of the colors most frequently appearing in the image to a plurality of color schemes (e.g., complementary, analogous, etc.), and potentially modifying one or more of the subset of colors to more accurately fit the selected color scheme. Various regions of the image are selected and portions of the regions having one or more colors of the color palette are extracted and classified to generate and compare feature vectors of the patches to previously-determined feature vectors of items to identify visually similar items. The visually similar items are selected for presentation in various ways, such as by choosing an outfit of visually-similar apparel items based on the locations of the corresponding colors in the image, etc.

Type: Grant

Filed: June 29, 2016

Date of Patent: October 23, 2018

Assignee: A9.com, Inc.

Inventors: Aishwarya Natesh, Arnab Sanat Kumar Dhua, Ming Du, R. Manmatha, Colin Jon Taylor, Mehmet Nejat Tek

Text recognition and localization with deep learning

Patent number: 10032072

Abstract: Approaches provide for identifying text represented in image data as well as determining a location or region of the image data that includes the text represented in the image data. For example, a camera of a computing device can be used to capture a live camera view of one or more items. The live camera view can be presented to the user on a display screen of the computing device. An application executing on the computing device or at least in communication with the computing device can analyze the image data of the live camera view to identify text represented in the image data as well as determine locations or regions of the image that include the representations.

Type: Grant

Filed: June 21, 2016

Date of Patent: July 24, 2018

Assignee: A9.com, Inc.

Inventors: Son Dinh Tran, R. Manmatha