Patents by Inventor Sandeep Tata

Sandeep Tata has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Recommending a Document for a User to Access

Publication number: 20240346316

Abstract: A user device can send, to a server, a request for a set of documents likely to be opened by a user, determine a client-suggested document to present to the user and a potential motive for the user to open the client-suggested document, receive a suggestion message from the server, the suggestion message including a set of documents likely to be opened by the user and potential motives for the user to open documents in the set of documents, and present, on a display of the user device, visual representations of the client-suggested document, the potential motive for the user to open the client-suggested document, multiple documents included in the set of documents, and the potential motives for the user to open the multiple documents in the set of documents.

Type: Application

Filed: June 27, 2024

Publication date: October 17, 2024

Inventors: Sandeep Tata, Julian Gibbons, Divanshu Garg, Alexandre Mah, Alan Green, Cayden Meyer, Michael Smith, Reuben Kan, Alexandrin Popescul

Transferable Neural Architecture for Structured Data Extraction From Web Documents

Publication number: 20240126827

Abstract: Systems and methods for efficiently identifying and extracting machine-actionable structured data from web documents are provided. The technology employs neural network architectures which process the raw HTML content of a set of seed websites to create transferable models regarding information of interest. These models can then be applied to the raw HTML of other websites to identify similar information of interest. Data can thus be extracted across multiple websites in a functional, structured form that allows it to be used further by a processing system.

Type: Application

Filed: December 13, 2023

Publication date: April 18, 2024

Inventors: Ying Sheng, Yuchen Lin, Sandeep Tata, Nguyen Vo

Leveraging Machine Learning Models to Identify Missing or Incorrect Labels in Training or Testing Data

Publication number: 20240054390

Abstract: Labels are often over labeled by machine-learning models and under labeled by human labelers. A solution to the over and under labeling problem is to have both a machine-learning model and a human label a document, then send the document to a parser to determine the discrepancies. The discrepancies are then presented to a human to review and decide whether the machine-learning model identified labels are labels. The feedback is then given to the machine-learning model for further improvement in its confidence calculations which via a confidence threshold determine if the identified labels are presented.

Type: Application

Filed: August 19, 2022

Publication date: February 15, 2024

Inventors: James Bradley Wendt, Sandeep Tata, Lauro Ivo Beltrao Colaco Costa, Emmanouil Koukoumidis
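
The comparison step described in the abstract above can be illustrated with a minimal sketch. It assumes labels are represented as text spans, that the model attaches a confidence to each span, and that "discrepancies" are simple set differences; the function name `find_discrepancies` and its data shapes are hypothetical and not taken from the application.

```python
def find_discrepancies(model_labels, human_labels, threshold):
    """Compare model and human labels for one document.

    model_labels: dict mapping a labeled span to the model's confidence (assumed shape).
    human_labels: set of spans labeled by the human labeler (assumed shape).
    threshold: confidence at or above which a model label is presented.
    """
    # Only model labels that clear the confidence threshold are presented.
    presented = {span for span, conf in model_labels.items() if conf >= threshold}
    return {
        # Possible over-labeling by the model, to be confirmed by a reviewer.
        "model_only": presented - human_labels,
        # Possible misses by the model.
        "human_only": human_labels - presented,
    }


result = find_discrepancies(
    model_labels={"Total: $40": 0.9, "Page 2": 0.7, "Due: May 1": 0.3},
    human_labels={"Total: $40", "Due: May 1"},
    threshold=0.5,
)
```

In this sketch the reviewer would be shown `"Page 2"` as a model-only label and `"Due: May 1"` as a human-only label; how the review feedback updates the model's confidence calculation is not modeled here.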

System for Information Extraction from Form-Like Documents

Publication number: 20240046684

Abstract: The present disclosure is directed to extracting text from form-like documents. In particular, a computing system can obtain an image of a document that contains a plurality of portions of text. The computing system can extract one or more candidate text portions for each field type included in a target schema. The computing system can generate a respective input feature vector for each candidate for the field type. The computing system can generate a respective candidate embedding for the candidate text portion. The computing system can determine a respective score for each candidate text portion for the field type based at least in part on the respective candidate embedding for the candidate text portion. The computing system can assign one or more of the candidate text portions to the field type based on the respective scores.

Type: Application

Filed: October 19, 2023

Publication date: February 8, 2024

Inventors: Sandeep Tata, Bodhisattwa Prasad Majumder, Qi Zhao, James Bradley Wendt, Marc Najork, Navneet Potti
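
The score-and-assign steps in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how scores are computed, so the dot product against a per-field embedding, the sigmoid, the `threshold` parameter, and the names `score` and `assign_fields` are all hypothetical.

```python
import math


def score(candidate_embedding, field_embedding):
    """Assumed scoring rule: sigmoid of the dot product of the two embeddings."""
    dot = sum(c * f for c, f in zip(candidate_embedding, field_embedding))
    return 1.0 / (1.0 + math.exp(-dot))


def assign_fields(candidates, field_embeddings, threshold=0.5):
    """Assign the best-scoring candidate text portion to each field type.

    candidates: {field_type: [(text, candidate_embedding), ...]} (assumed shape).
    field_embeddings: {field_type: embedding} (assumed; not from the abstract).
    """
    assignment = {}
    for field_type, cands in candidates.items():
        scored = [(score(emb, field_embeddings[field_type]), text) for text, emb in cands]
        best = max(scored, default=None)
        # Leave the field unassigned when no candidate clears the threshold.
        if best is not None and best[0] >= threshold:
            assignment[field_type] = best[1]
    return assignment


fields = assign_fields(
    candidates={
        "invoice_date": [("2020-06-02", [1.0, 0.0]), ("2019-01-01", [-1.0, 0.0])],
        "total": [("$12.00", [0.0, -2.0])],
    },
    field_embeddings={"invoice_date": [2.0, 0.0], "total": [0.0, 1.0]},
)
```

Here `invoice_date` is assigned `"2020-06-02"`, while `total` stays unassigned because its only candidate scores below the assumed threshold.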

Transferable neural architecture for structured data extraction from web documents

Patent number: 11886533

Abstract: Systems and methods for efficiently identifying and extracting machine-actionable structured data from web documents are provided. The technology employs neural network architectures which process the raw HTML content of a set of seed websites to create transferable models regarding information of interest. These models can then be applied to the raw HTML of other websites to identify similar information of interest. Data can thus be extracted across multiple websites in a functional, structured form that allows it to be used further by a processing system.

Type: Grant

Filed: January 29, 2020

Date of Patent: January 30, 2024

Assignee: Google LLC

Inventors: Ying Sheng, Yuchen Lin, Sandeep Tata, Nguyen Vo

System for information extraction from form-like documents

Patent number: 11830269

Abstract: The present disclosure is directed to extracting text from form-like documents. In particular, a computing system can obtain an image of a document that contains a plurality of portions of text. The computing system can extract one or more candidate text portions for each field type included in a target schema. The computing system can generate a respective input feature vector for each candidate for the field type. The computing system can generate a respective candidate embedding for the candidate text portion. The computing system can determine a respective score for each candidate text portion for the field type based at least in part on the respective candidate embedding for the candidate text portion. The computing system can assign one or more of the candidate text portions to the field type based on the respective scores.

Type: Grant

Filed: July 18, 2022

Date of Patent: November 28, 2023

Assignee: GOOGLE LLC

Inventors: Sandeep Tata, Bodhisattwa Prasad Majumder, Qi Zhao, James Bradley Wendt, Marc Najork, Navneet Potti

A Transferable Neural Architecture for Structured Data Extraction From Web Documents

Publication number: 20230014465

Abstract: Systems and methods for efficiently identifying and extracting machine-actionable structured data from web documents are provided. The technology employs neural network architectures which process the raw HTML content of a set of seed websites to create transferable models regarding information of interest. These models can then be applied to the raw HTML of other websites to identify similar information of interest. Data can thus be extracted across multiple websites in a functional, structured form that allows it to be used further by a processing system.

Type: Application

Filed: January 29, 2020

Publication date: January 19, 2023

Inventors: Ying Sheng, Yuchen Lin, Sandeep Tata, Nguyen Vo

Systems and methods for active learning

Patent number: 11526752

Abstract: Provided are computing systems and methods directed to active learning and may provide advantages or improvements to active learning applications for skewed data sets. A challenge in training and developing high-quality models for many supervised learning scenarios is obtaining labeled training examples. Provided are systems and methods for active learning on a training dataset that includes both labeled and unlabeled datapoints. In particular, the systems and methods described herein can select (e.g., at each of a number of iterations) a number of the unlabeled datapoints for which labels should be obtained to gain additional labeled datapoints on which to train a machine-learned model (e.g., machine-learned classifier model). Generally, provided are cost-effective methods and systems for selecting data to improve machine-learned models in applications such as the identification of content items in text, images, and/or audio.

Type: Grant

Filed: January 23, 2020

Date of Patent: December 13, 2022

Assignee: GOOGLE LLC

Inventors: Qi Zhao, Abbas Kazerouni, Sandeep Tata, Jing Xie, Marc Najork
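
The iterative selection described in the abstract above can be sketched as a simple loop. The abstract does not name a selection criterion, so this sketch assumes plain uncertainty sampling (pick the points whose predicted probability is closest to 0.5); the names `select_for_labeling`, `active_learning_loop`, `train`, and `oracle` are hypothetical.

```python
def select_for_labeling(pool, predict_proba, batch_size):
    """Pick the unlabeled points the model is least certain about (assumed strategy)."""
    ranked = sorted(pool, key=lambda x: abs(predict_proba(x) - 0.5))
    return ranked[:batch_size]


def active_learning_loop(labeled, unlabeled, train, oracle, rounds, batch_size):
    """Repeatedly train, select a batch of unlabeled points, and obtain their labels.

    train: callable taking the labeled dict and returning a predict_proba callable.
    oracle: callable returning the true label for a datapoint (e.g. a human labeler).
    """
    labeled = dict(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        predict_proba = train(labeled)
        for x in select_for_labeling(pool, predict_proba, batch_size):
            labeled[x] = oracle(x)  # obtain a label for the selected point
            pool.remove(x)
    return labeled, pool


# Toy usage: the "model" is the identity function and the oracle thresholds at 0.5.
labeled, pool = active_learning_loop(
    labeled={0.0: 0, 1.0: 1},
    unlabeled=[0.1, 0.45, 0.9, 0.55, 0.3],
    train=lambda data: (lambda x: x),
    oracle=lambda x: int(x >= 0.5),
    rounds=1,
    batch_size=2,
)
```

With one round and a batch size of 2, the two points nearest the decision boundary (0.45 and 0.55) are sent for labeling and the rest remain in the pool. The skewed-data handling that the patent emphasizes is not modeled in this sketch.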

System for Information Extraction from Form-Like Documents

Publication number: 20220375245

Abstract: The present disclosure is directed to extracting text from form-like documents. In particular, a computing system can obtain an image of a document that contains a plurality of portions of text. The computing system can extract one or more candidate text portions for each field type included in a target schema. The computing system can generate a respective input feature vector for each candidate for the field type. The computing system can generate a respective candidate embedding for the candidate text portion. The computing system can determine a respective score for each candidate text portion for the field type based at least in part on the respective candidate embedding for the candidate text portion. The computing system can assign one or more of the candidate text portions to the field type based on the respective scores.

Type: Application

Filed: July 18, 2022

Publication date: November 24, 2022

Inventors: Sandeep Tata, Bodhisattwa Prasad Majumder, Qi Zhao, James Bradley Wendt, Marc Najork, Navneet Potti

System for information extraction from form-like documents

Patent number: 11393233

Abstract: The present disclosure is directed to extracting text from form-like documents. In particular, a computing system can obtain an image of a document that contains a plurality of portions of text. The computing system can extract one or more candidate text portions for each field type included in a target schema. The computing system can generate a respective input feature vector for each candidate for the field type. The computing system can generate a respective candidate embedding for the candidate text portion. The computing system can determine a respective score for each candidate text portion for the field type based at least in part on the respective candidate embedding for the candidate text portion. The computing system can assign one or more of the candidate text portions to the field type based on the respective scores.

Type: Grant

Filed: June 2, 2020

Date of Patent: July 19, 2022

Assignee: GOOGLE LLC

Inventors: Sandeep Tata, Bodhisattwa Prasad Majumder, Qi Zhao, James Bradley Wendt, Marc Najork, Navneet Potti

System for Information Extraction from Form-Like Documents

Publication number: 20210374395

Abstract: The present disclosure is directed to extracting text from form-like documents. In particular, a computing system can obtain an image of a document that contains a plurality of portions of text. The computing system can extract one or more candidate text portions for each field type included in a target schema. The computing system can generate a respective input feature vector for each candidate for the field type. The computing system can generate a respective candidate embedding for the candidate text portion. The computing system can determine a respective score for each candidate text portion for the field type based at least in part on the respective candidate embedding for the candidate text portion. The computing system can assign one or more of the candidate text portions to the field type based on the respective scores.

Type: Application

Filed: June 2, 2020

Publication date: December 2, 2021

Inventors: Sandeep Tata, Bodhisattwa Prasad Majumder, Qi Zhao, James Bradley Wendt, Marc Najork, Navneet Potti

Recommending a Document for a User to Access

Publication number: 20210019622

Abstract: A user device can send, to a server, a request for a set of documents likely to be opened by a user, determine a client-suggested document to present to the user and a potential motive for the user to open the client-suggested document, receive a suggestion message from the server, the suggestion message including a set of documents likely to be opened by the user and potential motives for the user to open documents in the set of documents, and present, on a display of the user device, visual representations of the client-suggested document, the potential motive for the user to open the client-suggested document, multiple documents included in the set of documents, and the potential motives for the user to open the multiple documents in the set of documents.

Type: Application

Filed: October 5, 2020

Publication date: January 21, 2021

Inventors: Alan Green, Cayden Meyer, Julian Gibbons, Alexandre Mah, Divanshu Garg, Reuben Kan, Michael Smith, Sandeep Tata, Alexandrin Popescul

Recommending a document for a user to access

Patent number: 10832130

Abstract: A user device can send, to a server, a request for a set of documents likely to be opened by a user, determine a client-suggested document to present to the user and a potential motive for the user to open the client-suggested document, receive a suggestion message from the server, the suggestion message including a set of documents likely to be opened by the user and potential motives for the user to open documents in the set of documents, and present, on a display of the user device, visual representations of the client-suggested document, the potential motive for the user to open the client-suggested document, multiple documents included in the set of documents, and the potential motives for the user to open the multiple documents in the set of documents.

Type: Grant

Filed: April 5, 2017

Date of Patent: November 10, 2020

Assignee: GOOGLE LLC

Inventors: Alan Green, Cayden Meyer, Julian Gibbons, Alexandre Mah, Divanshu Garg, Reuben Kan, Michael Smith, Sandeep Tata, Alexandrin Popescul

Systems and Methods for Active Learning

Publication number: 20200250527

Abstract: The present disclosure provides computing systems and methods directed to active learning and may provide advantages or improvements to active learning applications for skewed data sets. A challenge in training and developing high-quality models for many supervised learning scenarios is obtaining labeled training examples. This disclosure provides systems and methods for active learning on a training dataset that includes both labeled and unlabeled datapoints. In particular, the systems and methods described herein can select (e.g., at each of a number of iterations) a number of the unlabeled datapoints for which labels should be obtained to gain additional labeled datapoints on which to train a machine-learned model (e.g., machine-learned classifier model). Generally, the disclosure provides cost-effective methods and systems for selecting data to improve machine-learned models in applications such as the identification of content items in text, images, and/or audio.

Type: Application

Filed: January 23, 2020

Publication date: August 6, 2020

Inventors: Qi Zhao, Abbas Kazerouni, Sandeep Tata, Jing Xie, Marc Najork

Atomic incremental load for map-reduce systems on append-only file systems

Patent number: 10558615

Abstract: Augmenting data files in a repository of an append-only file system includes maintaining a companion metadata file for each corresponding data file in a map-reduce system using the append-only file system. Each companion metadata file tracks a logical end-of-file (EOF) for each data file. Global versioning of each companion metadata file is maintained. A map-reduce append job is performed for a set of data files using a current global version number for the companion metadata file. The map-reduce job includes multiple append tasks. For each successful append job, a logical EOF for each appended file is incremented to a new physical EOF. For each failed append task of the append job, a logical EOF is maintained for each failed append task by not incrementing the logical EOF for each failed append task.

Type: Grant

Filed: June 30, 2016

Date of Patent: February 11, 2020

Assignee: International Business Machines Corporation

Inventor: Sandeep Tata
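
The logical end-of-file bookkeeping in the abstract above can be illustrated with a minimal in-memory sketch. It is not the patented implementation: the `DataFile` class, the `run_append_job` function, and the `failing` parameter used to simulate failed tasks are hypothetical, and the global versioning of metadata files is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    """A hypothetical append-only data file together with its companion metadata."""
    data: bytearray = field(default_factory=bytearray)  # physical contents
    logical_eof: int = 0  # value that the companion metadata file would track

    def visible(self) -> bytes:
        # Readers only see bytes up to the logical EOF.
        return bytes(self.data[: self.logical_eof])


def run_append_job(files, payloads, failing=()):
    """Append payloads[name] to files[name]; names in `failing` simulate failed tasks.

    Returns the set of file names whose logical EOF was advanced.
    """
    committed = set()
    for name, payload in payloads.items():
        f = files[name]
        if name in failing:
            # Simulate a task that dies after writing half of its payload.
            f.data.extend(payload[: len(payload) // 2])
            continue  # the logical EOF is deliberately left unchanged
        f.data.extend(payload)
        f.logical_eof = len(f.data)  # advance to the new physical EOF
        committed.add(name)
    return committed


files = {"a": DataFile(), "b": DataFile()}
committed = run_append_job(files, {"a": b"hello", "b": b"world!"}, failing={"b"})
```

After this run, file `a` exposes its appended bytes, while file `b` holds three orphaned bytes past its logical EOF that readers never see. Handling those orphaned bytes on a later append is outside the scope of this sketch.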

Differentiated secondary index maintenance in log structured data stores

Patent number: 10078681

Abstract: There is provided a method for operating multi-node data stores. The method stores a data table in a first computing node and stores an index table in a second computing node. The index table provides keys used for accessing data in the first computing node and other multi-node data stores. The method performs operations that update or read the data table accessed from the first computing node and the index table accessed from the second computing node. The operations optimize between latency in updating or reading the data table and the index table and data consistency maintained between data entries in the data table and data entries pointed to by indices in the index table.

Type: Grant

Filed: December 21, 2015

Date of Patent: September 18, 2018

Assignee: International Business Machines Corporation

Inventors: Wei Tan, Sandeep Tata

Differentiated secondary index maintenance in log structured NoSQL data stores

Patent number: 10078682

Abstract: There are provided a system and a computer program product for operating multi-node data stores. The system stores a data table in a first computing node and stores an index table in a second computing node. The index table provides keys used for accessing data in the first computing node and other multi-node data stores. The system performs operations that update or read the data table accessed from the first computing node and the index table accessed from the second computing node. The operations optimize between latency in updating or reading the data table and the index table and data consistency maintained between data entries in the data table and data entries pointed to by indices in the index table.

Type: Grant

Filed: December 21, 2015

Date of Patent: September 18, 2018

Assignee: International Business Machines Corporation

Inventors: Wei Tan, Sandeep Tata

Recommending a Document for a User to Access

Publication number: 20180081503

Abstract: A user device can send, to a server, a request for a set of documents likely to be opened by a user, determine a client-suggested document to present to the user and a potential motive for the user to open the client-suggested document, receive a suggestion message from the server, the suggestion message including a set of documents likely to be opened by the user and potential motives for the user to open documents in the set of documents, and present, on a display of the user device, visual representations of the client-suggested document, the potential motive for the user to open the client-suggested document, multiple documents included in the set of documents, and the potential motives for the user to open the multiple documents in the set of documents.

Type: Application

Filed: April 5, 2017

Publication date: March 22, 2018

Inventors: Alan Green, Cayden Meyer, Julian Gibbons, Alexandre Mah, Divanshu Garg, Reuben Kan, Michael Smith, Sandeep Tata, Alexandrin Popescul

Atomic Incremental Load for Map-Reduce Systems on Append-Only File Systems

Publication number: 20160306799

Abstract: Augmenting data files in a repository of an append-only file system includes maintaining a companion metadata file for each corresponding data file in a map-reduce system using the append-only file system. Each companion metadata file tracks a logical end-of-file (EOF) for each data file. Global versioning of each companion metadata file is maintained. A map-reduce append job is performed for a set of data files using a current global version number for the companion metadata file. The map-reduce job includes multiple append tasks. For each successful append job, a logical EOF for each appended file is incremented to a new physical EOF. For each failed append task of the append job, a logical EOF is maintained for each failed append task by not incrementing the logical EOF for each failed append task.

Type: Application

Filed: June 30, 2016

Publication date: October 20, 2016

Inventor: Sandeep Tata

Atomic incremental load for map-reduce systems on append-only file systems

Patent number: 9424271

Abstract: Augmenting data files in a repository of an append-only file system comprises maintaining metadata corresponding to each data file for tracking a logical end-of-file (EOF) for each data file for appending. A global versioning mechanism for the metadata allows selecting the current version of the metadata to read for performing an append job for a set of data files. Each append job comprises multiple append tasks. For each successful append job, the global versioning mechanism increments a valid metadata version to use for each data file appended. Said valid metadata version indicates the logical EOF corresponding to a new physical EOF for each of the data files appended.

Type: Grant

Filed: August 30, 2012

Date of Patent: August 23, 2016

Assignee: International Business Machines Corporation

Inventor: Sandeep Tata
