Patents by Inventor Raghavan Manmatha

Raghavan Manmatha has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Mitigating bias in multimodal models via query transformation

Patent number: 12229179

Abstract: The present disclosure generally relates to systems and methods for searching media content. In some implementation examples, a search system receives an input query, generates a query embedding of the input query, and generates a bias mitigation transformation associated with a sensitive attribute. Based on the query embedding and the bias mitigation transformation, the search system generates a transformed query embedding that suppresses at least a portion of the query embedding related to the sensitive attribute. Using the transformed query embedding, the search system executes a similarity search in a media embedding model to identify one or more media embeddings that are similar to the transformed query embedding and transmits the one or more media embeddings.
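
The abstract does not say how the bias mitigation transformation is computed. As a minimal sketch, assuming the sensitive attribute can be represented by a single direction in embedding space, one common choice is to project the query embedding onto the subspace orthogonal to that direction and then rank media embeddings by cosine similarity. The function names, the `attribute_direction` vector, and the toy embeddings below are hypothetical and are not taken from the patent.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def suppress_attribute(query, attribute_direction):
    """Remove the component of `query` that lies along `attribute_direction`.

    Assumption: orthogonal projection stands in for the patent's
    "bias mitigation transformation". The direction must be non-zero.
    """
    scale = dot(query, attribute_direction) / dot(attribute_direction, attribute_direction)
    return [q - scale * d for q, d in zip(query, attribute_direction)]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def search(query, media_embeddings, top_k=1):
    """Return the names of the `top_k` media embeddings most similar to `query`."""
    ranked = sorted(
        media_embeddings,
        key=lambda name: cosine(query, media_embeddings[name]),
        reverse=True,
    )
    return ranked[:top_k]


# Toy example: the third axis plays the role of the sensitive attribute.
transformed = suppress_attribute([1.0, 2.0, 3.0], [0.0, 0.0, 1.0])
```

After the projection, the transformed query has no component along the attribute direction, so that direction no longer influences the similarity ranking.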

Type: Grant

Filed: November 20, 2023

Date of Patent: February 18, 2025

Assignee: Amazon Technologies, Inc.

Inventors: Matthaeus Kleindessner, Christopher Michael Russell, Kailash Budhathoki, Ali Caner Turkmen, Siqi Deng, Varad Gunjal, Ashwin Swaminathan, Raghavan Manmatha, Hao Yang
CONTENT EXTRACTION USING RELATED ENTITY GROUP METADATA FROM REFERENCE OBJECTS

Publication number: 20240152510

Abstract: Representations of sets of descriptors of reference objects are stored in a repository, with individual descriptors including information about entities identified in the reference objects. In response to a request to extract content from a particular data object, a reference object which satisfies a similarity criterion with respect to the particular data object is identified from the repository using the descriptors. A structural comparison of the particular data object and the reference object is performed to determine an entity related to another entity identified in the particular data object.

Type: Application

Filed: December 18, 2023

Publication date: May 9, 2024

Applicant: Amazon Technologies, Inc.

Inventors: Srikar Appalaraju, Raghavan Manmatha, Bhargava Urala Kota
Content extraction using related entity group metadata from reference objects

Patent number: 11893012

Abstract: Representations of sets of descriptors of reference objects are stored in a repository, with individual descriptors including information about entities identified in the reference objects. In response to a request to extract content from a particular data object, a reference object which satisfies a similarity criterion with respect to the particular data object is identified from the repository using the descriptors. A structural comparison of the particular data object and the reference object is performed to determine an entity related to another entity identified in the particular data object.

Type: Grant

Filed: May 28, 2021

Date of Patent: February 6, 2024

Assignee: Amazon Technologies, Inc.

Inventors: Srikar Appalaraju, Raghavan Manmatha, Bhargava Urala Kota
Residual context refinement network architecture for optical character recognition

Patent number: 11308354

Abstract: Techniques for recognizing text in an image are described. An exemplary method may include receiving a request to recognize text in an image; extracting features from the image and generating a visual feature sequence from the extracted features; performing selective contextual refinement using at least one selective contextual refinement block of a stack of selective contextual refinement blocks to generate a text prediction by: generating a contextual feature map and combining the contextual feature map with the visual feature sequence into a visual feature space, and applying a selective decoder that utilizes a two-step attention on the visual feature space to generate a text prediction, wherein the two-step attention includes performing a 1-D self-attention computation to generate attentional features and decoding the attentional features to generate the text prediction; and outputting the generated text prediction.

Type: Grant

Filed: March 30, 2020

Date of Patent: April 19, 2022

Assignee: Amazon Technologies, Inc.

Inventors: Ron Litman, Oron Anschel, Shahar Tsiper, Roee Litman, Shai Mazor, Jonathan Wu, Raghavan Manmatha
Enhanced compression of video data

Patent number: 10659787

Abstract: Techniques are generally described for enhanced compression of video data. In various examples, the techniques may include receiving first video data representing a scene in an environment. The techniques may further include generating illumination map data representing illumination of the scene in the first video data. The techniques may further comprise generating reflectance map data representing a reflectance of at least one object in the first video data. In some examples, the techniques may include sending, to a second computing device, the illumination map data and the reflectance map data. The techniques may further include receiving second video data representing the scene. The techniques may include determining a first illumination difference between the second video data and the first video data. The techniques may comprise sending, to the second computing device, the first illumination difference.

Type: Grant

Filed: September 20, 2018

Date of Patent: May 19, 2020

Assignee: AMAZON TECHNOLOGIES, INC.

Inventors: Ilya Vladimirovich Brailovskiy, Raghavan Manmatha
Activation layers for deep learning networks

Patent number: 10366313

Abstract: Tasks such as object classification from image data can take advantage of a deep learning process using convolutional neural networks. These networks can include a convolutional layer followed by an activation layer, or activation unit, among other potential layers. Improved accuracy can be obtained by using a generalized linear unit (GLU) as an activation unit in such a network, where a GLU is linear for both positive and negative inputs, and is defined by a positive slope, a negative slope, and a bias. These parameters can be learned for each channel or a block of channels, and stacking those types of activation units can further improve accuracy.
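
The activation described here can be sketched as a piecewise-linear function. This is an illustration under stated assumptions, not the patented formulation: the abstract names a positive slope, a negative slope, and a bias, but does not say exactly where the bias enters, so the sketch simply adds it to both branches. The function names and the list-of-lists feature map are hypothetical. Note that "GLU" in this abstract means generalized linear unit, which is distinct from the gated linear unit that shares the acronym.

```python
def glu_activation(x, pos_slope, neg_slope, bias):
    """Piecewise-linear activation: one slope for x >= 0, another for x < 0.

    Assumption: the bias is added after scaling on both branches.
    """
    slope = pos_slope if x >= 0 else neg_slope
    return slope * x + bias


def apply_per_channel(feature_map, params):
    """Apply the activation with a separate (pos_slope, neg_slope, bias) per channel.

    `feature_map` is a list of channels, each a flat list of floats.
    `params` holds one parameter tuple per channel; in a real network these
    would be learned, here they are fixed for illustration.
    """
    return [
        [glu_activation(value, *channel_params) for value in channel]
        for channel, channel_params in zip(feature_map, params)
    ]


activated = apply_per_channel(
    [[1.0, -1.0], [2.0, -4.0]],
    [(1.0, 0.0, 0.0), (0.5, 0.5, 1.0)],
)
```

With parameters (1.0, 0.0, 0.0) the first channel behaves like a ReLU, which shows how this family of activations contains familiar units as special cases.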

Type: Grant

Filed: February 12, 2018

Date of Patent: July 30, 2019

Assignee: A9.COM, INC.

Inventors: Son Dinh Tran, Raghavan Manmatha
ACTIVATION LAYERS FOR DEEP LEARNING NETWORKS

Publication number: 20180197049

Abstract: Tasks such as object classification from image data can take advantage of a deep learning process using convolutional neural networks. These networks can include a convolutional layer followed by an activation layer, or activation unit, among other potential layers. Improved accuracy can be obtained by using a generalized linear unit (GLU) as an activation unit in such a network, where a GLU is linear for both positive and negative inputs, and is defined by a positive slope, a negative slope, and a bias. These parameters can be learned for each channel or a block of channels, and stacking those types of activation units can further improve accuracy.

Type: Application

Filed: February 12, 2018

Publication date: July 12, 2018

Inventors: Son Dinh Tran, Raghavan Manmatha
Object retrieval

Patent number: 10013633

Abstract: Various approaches enable a user to capture image information (e.g., still images or video) about an object of interest such as the sole of a shoe or other piece of footwear (e.g., a sandal) and receive information about items that are determined to match footwear based at least in part on the image information. For example, an image analysis service or other similar service can analyze the images to determine a type of shoe included within the images based at least in part on patterns or other distinguishing features of the sole of the shoe. The image analysis service can aggregate the results and can provide information about the results as a set of matches or results to be displayed to a user in response to a visual search query. The information can include, for example, descriptions, contact information, availability, location data, pricing information, and other such information.

Type: Grant

Filed: March 8, 2017

Date of Patent: July 3, 2018

Assignee: A9.COM, INC.

Inventors: Raghavan Manmatha, Wei-Hong Chuang
Content collection search with robust content matching

Patent number: 10007680

Abstract: Systems and approaches for searching a content collection corresponding to query content are provided. In particular, false positive match rates between the query content and the content collection may be reduced with a minimum content region test and/or a minimum features per scale test. For example, by correlating content descriptors of a content piece in the content collection with query descriptors of the query content, the content piece can be determined to match the query content when a particular region of the content piece and/or a particular region of a query descriptor have a proportionate size meeting or exceeding a specified minimum. Alternatively, or in addition, the false positive match rate between query content and a content piece can be reduced by comparing content descriptors and query descriptors of features at a plurality of scales. A content piece can be determined to match the query content according to descriptor proportion quotas for the plurality of scales.
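
One possible reading of the "minimum features per scale" test is sketched below. It assumes each matched descriptor is tagged with the scale at which its feature was detected, and that a match is accepted only if every scale contributes at least a minimum proportion of the matched descriptors. The function name, the integer scale labels, and the quota values are hypothetical; the abstract does not specify them.

```python
from collections import Counter


def meets_scale_quotas(matched_scales, quotas):
    """Check whether matched descriptors are spread across scales as required.

    `matched_scales` lists the scale label of every matched descriptor.
    `quotas` maps each scale label to the minimum proportion of matches
    that must come from that scale (an assumed interpretation of
    "descriptor proportion quotas").
    """
    total = len(matched_scales)
    if total == 0:
        return False  # no matched descriptors, so nothing can satisfy a quota
    counts = Counter(matched_scales)
    return all(counts[scale] / total >= minimum for scale, minimum in quotas.items())


# Matches at three scales, each contributing at least a quarter of the total.
balanced = meets_scale_quotas([0, 0, 1, 2], {0: 0.25, 1: 0.25, 2: 0.25})
# All matches concentrated in two scales; scale 2 contributes nothing.
skewed = meets_scale_quotas([0, 0, 0, 1], {0: 0.25, 1: 0.25, 2: 0.1})
```

The intuition is that a true match tends to produce agreeing features at several scales, while a false positive often comes from matches clustered at a single scale.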

Type: Grant

Filed: January 26, 2015

Date of Patent: June 26, 2018

Assignee: A9.COM, INC.

Inventors: Arnab Sanat Kumar Dhua, Sunil Ramesh, Max Delgadillo, Raghavan Manmatha
Activation layers for deep learning networks

Patent number: 9892344

Abstract: Tasks such as object classification from image data can take advantage of a deep learning process using convolutional neural networks. These networks can include a convolutional layer followed by an activation layer, or activation unit, among other potential layers. Improved accuracy can be obtained by using a generalized linear unit (GLU) as an activation unit in such a network, where a GLU is linear for both positive and negative inputs, and is defined by a positive slope, a negative slope, and a bias. These parameters can be learned for each channel or a block of channels, and stacking those types of activation units can further improve accuracy.

Type: Grant

Filed: November 30, 2015

Date of Patent: February 13, 2018

Assignee: A9.COM, INC.

Inventors: Son Dinh Tran, Raghavan Manmatha
Method and system for matching an image using normalized feature vectors

Patent number: 9721182

Abstract: A method, system and computer program product for encoding an image is provided. The image that needs to be represented is represented in the form of a Gaussian pyramid which is a scale-space representation of the image and includes several pyramid images. The feature points in the pyramid images are identified and a specified number of feature points are selected. The orientations of the selected feature points are obtained by using a set of orientation calculating algorithms. A patch is extracted around the feature point in the pyramid images based on the orientations of the feature point and the sampling factor of the pyramid image. The boundary patches in the pyramid images are extracted by padding the pyramid images with extra pixels. The feature vectors of the extracted patches are defined. These feature vectors are normalized so that the components in the feature vectors are less than a threshold.
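
The final normalization step can be sketched as follows. This is an assumption-laden illustration: the abstract only says components end up below a threshold, so the sketch scales the vector to unit length and then clamps each component's magnitude at the threshold. The function name and the default threshold of 0.2 are hypothetical.

```python
import math


def normalize_and_clamp(vector, threshold=0.2):
    """Scale `vector` to unit length, then cap each component's magnitude.

    Assumption: clamping after unit normalization stands in for the
    normalization the abstract describes. A zero vector is returned unchanged.
    """
    norm = math.sqrt(sum(v * v for v in vector))
    if norm == 0.0:
        return list(vector)
    return [max(-threshold, min(v / norm, threshold)) for v in vector]


descriptor = normalize_and_clamp([3.0, 4.0], threshold=0.7)
```

Capping large components limits the influence of a few dominant values, for example strong gradients caused by lighting changes, when feature vectors are later compared.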

Type: Grant

Filed: December 21, 2016

Date of Patent: August 1, 2017

Assignee: A9.com, Inc.

Inventors: Mark A. Ruzon, Raghavan Manmatha, Donald Tanguay
Object retrieval

Patent number: 9652838

Abstract: Various approaches enable a user to capture image information (e.g., still images or video) about an object of interest such as the sole of a shoe or other piece of footwear (e.g., a sandal) and receive information about items that are determined to match footwear based at least in part on the image information. For example, an image analysis service or other similar service can analyze the images to determine a type of shoe included within the images based at least in part on patterns or other distinguishing features of the sole of the shoe. The image analysis service can aggregate the results and can provide information about the results as a set of matches or results to be displayed to a user in response to a visual search query. The information can include, for example, descriptions, contact information, availability, location data, pricing information, and other such information.

Type: Grant

Filed: December 23, 2014

Date of Patent: May 16, 2017

Assignee: A9.com, Inc.

Inventors: Raghavan Manmatha, Wei-Hong Chuang
METHOD AND SYSTEM FOR MATCHING AN IMAGE USING NORMALIZED FEATURE VECTORS

Publication number: 20170103282

Abstract: A method, system and computer program product for encoding an image is provided. The image that needs to be represented is represented in the form of a Gaussian pyramid which is a scale-space representation of the image and includes several pyramid images. The feature points in the pyramid images are identified and a specified number of feature points are selected. The orientations of the selected feature points are obtained by using a set of orientation calculating algorithms. A patch is extracted around the feature point in the pyramid images based on the orientations of the feature point and the sampling factor of the pyramid image. The boundary patches in the pyramid images are extracted by padding the pyramid images with extra pixels. The feature vectors of the extracted patches are defined. These feature vectors are normalized so that the components in the feature vectors are less than a threshold.

Type: Application

Filed: December 21, 2016

Publication date: April 13, 2017

Inventors: Mark A. Ruzon, Raghavan Manmatha, Donald Tanguay
Method and system for detecting and recognizing text in images

Patent number: 9530069

Abstract: Various embodiments of the present invention relate to a method, system and computer program product for detecting and recognizing text in the images captured by cameras and scanners. First, a series of image-processing techniques is applied to detect text regions in the image. Subsequently, the detected text regions pass through different processing stages that reduce blurring and the negative effects of variable lighting. This results in the creation of multiple images that are versions of the same text region. Some of these multiple versions are sent to a character-recognition system. The resulting texts from each of the versions of the image sent to the character-recognition system are then combined to a single result, wherein the single result is detected text.

Type: Grant

Filed: February 3, 2015

Date of Patent: December 27, 2016

Assignee: A9.com, Inc.

Inventors: Raghavan Manmatha, Mark A. Ruzon
Method and system for matching an image using normalized feature vectors

Patent number: 9530076

Abstract: A method, system and computer program product for encoding an image is provided. The image that needs to be represented is represented in the form of a Gaussian pyramid which is a scale-space representation of the image and includes several pyramid images. The feature points in the pyramid images are identified and a specified number of feature points are selected. The orientations of the selected feature points are obtained by using a set of orientation calculating algorithms. A patch is extracted around the feature point in the pyramid images based on the orientations of the feature point and the sampling factor of the pyramid image. The boundary patches in the pyramid images are extracted by padding the pyramid images with extra pixels. The feature vectors of the extracted patches are defined. These feature vectors are normalized so that the components in the feature vectors are less than a threshold.

Type: Grant

Filed: February 16, 2015

Date of Patent: December 27, 2016

Assignee: A9.com, Inc.

Inventors: Mark A. Ruzon, Raghavan Manmatha, Donald Tanguay
Method and system for searching for information on a network in response to an image query sent by a user from a mobile communications device

Patent number: 9104700

Abstract: The present invention relates to a method and system for automatically searching for information on a network in response to an image query sent by a user. The image query includes an image that is captured by using a mobile communications device with a camera. The image is processed to detect the text present in it. The detected text is then recognized using OCR. Subsequently, the text is searched for matches in the corresponding domain database, selected from the various domain databases present in the network. Thereafter, selected matches and additional related information are sent to the user.

Type: Grant

Filed: January 27, 2014

Date of Patent: August 11, 2015

Assignee: A9.com, Inc.

Inventors: Gurumurthy D. Ramkumar, Raghavan Manmatha, Supratik Bhattacharyya, Gautam Bhargava, Mark A. Ruzon
CONTENT COLLECTION SEARCH WITH ROBUST CONTENT MATCHING

Publication number: 20150213061

Abstract: Systems and approaches for searching a content collection corresponding to query content are provided. In particular, false positive match rates between the query content and the content collection may be reduced with a minimum content region test and/or a minimum features per scale test. For example, by correlating content descriptors of a content piece in the content collection with query descriptors of the query content, the content piece can be determined to match the query content when a particular region of the content piece and/or a particular region of a query descriptor have a proportionate size meeting or exceeding a specified minimum. Alternatively, or in addition, the false positive match rate between query content and a content piece can be reduced by comparing content descriptors and query descriptors of features at a plurality of scales. A content piece can be determined to match the query content according to descriptor proportion quotas for the plurality of scales.

Type: Application

Filed: January 26, 2015

Publication date: July 30, 2015

Inventors: Arnab Sanat Kumar Dhua, Sunil Ramesh, Max Delgadillo, Raghavan Manmatha
METHOD AND SYSTEM FOR MATCHING AN IMAGE USING NORMALIZED FEATURE VECTORS

Publication number: 20150161480

Abstract: A method, system and computer program product for encoding an image is provided. The image that needs to be represented is represented in the form of a Gaussian pyramid which is a scale-space representation of the image and includes several pyramid images. The feature points in the pyramid images are identified and a specified number of feature points are selected. The orientations of the selected feature points are obtained by using a set of orientation calculating algorithms. A patch is extracted around the feature point in the pyramid images based on the orientations of the feature point and the sampling factor of the pyramid image. The boundary patches in the pyramid images are extracted by padding the pyramid images with extra pixels. The feature vectors of the extracted patches are defined. These feature vectors are normalized so that the components in the feature vectors are less than a threshold.

Type: Application

Filed: February 16, 2015

Publication date: June 11, 2015

Inventors: Mark A. Ruzon, Raghavan Manmatha, Donald Tanguay
METHOD AND SYSTEM FOR DETECTING AND RECOGNIZING TEXT IN IMAGES

Publication number: 20150154464

Abstract: Various embodiments of the present invention relate to a method, system and computer program product for detecting and recognizing text in the images captured by cameras and scanners. First, a series of image-processing techniques is applied to detect text regions in the image. Subsequently, the detected text regions pass through different processing stages that reduce blurring and the negative effects of variable lighting. This results in the creation of multiple images that are versions of the same text region. Some of these multiple versions are sent to a character-recognition system. The resulting texts from each of the versions of the image sent to the character-recognition system are then combined to a single result, wherein the single result is detected text.

Type: Application

Filed: February 3, 2015

Publication date: June 4, 2015

Inventors: Raghavan Manmatha, Mark A. Ruzon
Method and system for detecting and recognizing text in images

Patent number: 8977072

Abstract: Various embodiments of the present invention relate to a method, system and computer program product for detecting and recognizing text in the images captured by cameras and scanners. First, a series of image-processing techniques is applied to detect text regions in the image. Subsequently, the detected text regions pass through different processing stages that reduce blurring and the negative effects of variable lighting. This results in the creation of multiple images that are versions of the same text region. Some of these multiple versions are sent to a character-recognition system. The resulting texts from each of the versions of the image sent to the character-recognition system are then combined to a single result, wherein the single result is detected text.

Type: Grant

Filed: December 13, 2012

Date of Patent: March 10, 2015

Assignee: A9.com, Inc.

Inventors: Raghavan Manmatha, Mark A. Ruzon
