Patents by Inventor Alok S. Kirpal

Alok S. Kirpal has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Method and system for form-filling crawl and associating rich keywords

Patent number: 8793239

Abstract: Techniques are provided for the efficient location, processing, and retrieval of local product information derived from web pages generally locatable through form queries submitted to web pages often referred to as the “deep” or “hidden” web. In an embodiment, information such as product information and dealer-location information is located on a web page form such as a dealer-locator form. After location of a suitable web page form, editorial wrapping is performed to create an automated information extraction process. Using the automated information extractor, deep-web crawling is performed. A grid-based extraction of individual business records is performed, and matching and ingestion are performed in conjunction with a business listing database. Finally, metadata tags are added to entries in the business listing database. Metadata tags also may be added to entries in other databases.

Type: Grant

Filed: October 8, 2009

Date of Patent: July 29, 2014

Assignee: Yahoo! Inc.

Inventors: Nilesh Dalvi, Raghu Ramakrishnan, Vinay Kakade, Arup Kumar Choudhury, Sathiya Keerthi Selvaraj, Philip Bohannon, Mani Abrol, David Ciemiewicz, Arun Shankar Iyer, Vipul Agarwal, Alok S. Kirpal

Method and System for Form-Filling Crawl and Associating Rich Keywords

Publication number: 20110087646

Abstract: Techniques are provided for the efficient location, processing, and retrieval of local product information derived from web pages generally locatable through form queries submitted to web pages often referred to as the “deep” or “hidden” web. In an embodiment, information such as product information and dealer-location information is located on a web page form such as a dealer-locator form. After location of a suitable web page form, editorial wrapping is performed to create an automated information extraction process. Using the automated information extractor, deep-web crawling is performed. A grid-based extraction of individual business records is performed, and matching and ingestion are performed in conjunction with a business listing database. Finally, metadata tags are added to entries in the business listing database. Metadata tags also may be added to entries in other databases.

Type: Application

Filed: October 8, 2009

Publication date: April 14, 2011

Inventors: Nilesh Dalvi, Raghu Ramakrishnan, Vinay Kakade, Arup Kumar Choudhury, Sathiya Keerthi Selvaraj, Philip Bohannon, Mani Abrol, David Ciemiewicz, Arun Shankar Iyer, Vipul Agarwal, Alok S. Kirpal

APPARATUS AND METHODS FOR CONCEPT-CENTRIC INFORMATION EXTRACTION

Publication number: 20100241639

Abstract: Disclosed are methods and apparatus for extracting (or annotating) structured information from web content. Web content of interest from a particular domain is represented as one or more tree instances having a plurality of branching nodes that each correspond to a web object such that the tree instances correspond to one or more structured data instances. The particular domain is associated with domain knowledge that includes one or more presentation rulesets that each specifies a particular structure for a set of data instances, a domain-specific concept labeler, one or more specified properties of the web objects in the tree instances, and a concept schema that specifies a representation of the data to be extracted from the web content. A structured data instance that conforms to the concept schema is extracted from the one or more tree instances based on the domain knowledge for the particular domain.

Type: Application

Filed: March 20, 2009

Publication date: September 23, 2010

Applicant: YAHOO! INC.

Inventors: Daniel Kifer, Srujana Merugu, Ankur Jain, Sathiya Keerthi Selvaraj, Alok S. Kirpal, Philip L. Bohannon, Raghu Ramakrishnan

AUTOMATIC EXTRACTION USING MACHINE LEARNING BASED ROBUST STRUCTURAL EXTRACTORS

Publication number: 20100223214

Abstract: A method and apparatus for automatically extracting information from a large number of documents through applying machine learning techniques and exploiting structural similarities among documents. A machine learning model is trained to have at least 50% accuracy. The trained machine learning model is used to identify information attributes in a sample of pages from a cluster of structurally similar documents. A structure-specific model of the cluster is created by compiling a list of top-K locations for each attribute identified by the trained machine learning model in the sample. These top-K lists are used to extract information from the pages of the cluster from which the sample of pages was taken.

Type: Application

Filed: February 27, 2009

Publication date: September 2, 2010

Inventors: Alok S. Kirpal, Sandeepkumar Bhuramal Satpal, Meghana Kshirsagar, Srinivasan H. Sengamedu
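
As an illustration only (this is not code from the application), the top-K idea in the abstract above can be sketched in Python. The sketch assumes the trained model's output on each sampled page is already available as a mapping from attribute name to a location string such as an XPath; the function names, the sample data, and the choice of k are all hypothetical.

```python
from collections import Counter, defaultdict


def build_topk_model(sample_predictions, k=2):
    """Compile, per attribute, the k locations predicted most often in the sample.

    sample_predictions: one dict per sampled page, mapping an attribute name
    to the location (here an XPath-like string) where a general-purpose
    model found it. Both the shape of this input and k are assumptions.
    """
    counts = defaultdict(Counter)
    for page_prediction in sample_predictions:
        for attribute, location in page_prediction.items():
            counts[attribute][location] += 1
    return {
        attribute: [location for location, _ in counter.most_common(k)]
        for attribute, counter in counts.items()
    }


def extract(page, model):
    """Read each attribute from the first top-K location that has text.

    page: dict mapping location -> text for one page of the same cluster.
    """
    record = {}
    for attribute, locations in model.items():
        for location in locations:
            text = page.get(location)
            if text:
                record[attribute] = text
                break
    return record


# Hypothetical predictions on four sampled pages of one cluster.
sample_predictions = [
    {"title": "/html/body/h1", "price": "/html/body/div[2]/span"},
    {"title": "/html/body/h1", "price": "/html/body/div[2]/span"},
    {"title": "/html/body/h1", "price": "/html/body/div[3]/span"},
    {"title": "/html/body/div[1]", "price": "/html/body/div[2]/span"},
]
model = build_topk_model(sample_predictions, k=2)

# A page outside the sample; the price sits at the second-ranked location.
new_page = {"/html/body/h1": "Acme Widget", "/html/body/div[3]/span": "$9.99"}
record = extract(new_page, model)
```

Keeping several ranked locations per attribute, rather than only the most frequent one, lets extraction fall back when a page in the cluster deviates slightly from the dominant layout.
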

SYNTACTIC AND/OR SEMANTIC ANALYSIS OF UNIFORM RESOURCE IDENTIFIERS

Publication number: 20090240638

Abstract: Subject matter disclosed herein may relate to analyses of uniform resource identifiers associated with web pages, and further may relate to gathering information about web pages by analyzing the uniform resource identifiers.

Type: Application

Filed: March 19, 2008

Publication date: September 24, 2009

Applicant: Yahoo! Inc.

Inventors: Alok S. Kirpal, Krishna Lecla Poola, Kirshna Prasad Chitrapura
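
The abstract above does not say which signals are read from a URI, so the following Python sketch is only one possible illustration of gathering information about a page from its URI alone. The chosen features (path depth, word tokens, query keys, a date-like path pattern), the function name, and the example URL are all assumptions.

```python
import re
from urllib.parse import urlsplit, parse_qs


def analyze_uri(uri):
    """Derive simple syntactic hints about a page from its URI alone."""
    parts = urlsplit(uri)
    # Non-empty path segments, e.g. ["2008", "03", "yahoo-search-update.html"].
    segments = [segment for segment in parts.path.split("/") if segment]
    # Split each segment on non-alphanumeric characters to get word tokens.
    tokens = [
        token.lower()
        for segment in segments
        for token in re.split(r"[^A-Za-z0-9]+", segment)
        if token
    ]
    return {
        "host": parts.hostname,
        "depth": len(segments),
        "tokens": tokens,
        "query_keys": sorted(parse_qs(parts.query)),
        # A /YYYY/MM/ pattern in the path is taken here as a hint of dated content.
        "looks_dated": bool(re.search(r"/(19|20)\d{2}/\d{1,2}/", parts.path)),
    }


info = analyze_uri(
    "http://news.example.com/2008/03/yahoo-search-update.html?ref=rss&id=42"
)
```

For the example URL, `info["tokens"]` holds the words found in the path and `info["looks_dated"]` is true, all without fetching the page itself.
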

BOOSTING EXTRACTION ACCURACY BY HANDLING TRAINING DATA BIAS

Publication number: 20090216739

Abstract: Methods and apparatus are described for use with information extraction techniques based on sequential models. Additional statistics are maintained during inference and employed to boost the accuracy of the extraction algorithm and mitigate the effects of training bias.

Type: Application

Filed: February 22, 2008

Publication date: August 27, 2009

Applicant: YAHOO! INC.

Inventors: Alok S. Kirpal, Meghana Kshirsagar

EXTRACTING ENTITIES FROM A WEB PAGE

Publication number: 20090182759

Abstract: A method for extracting entities from a web page includes first applying a high precision low recall (HPLR) technique on a first web page, producing one or more entities extracted from the first web page. Then a sequential model is trained using the one or more entities extracted from the first web page. The sequential model is then performed on a second web page, producing one or more entities extracted from the second web page.

Type: Application

Filed: January 11, 2008

Publication date: July 16, 2009

Applicant: YAHOO! INC.

Inventor: Alok S. Kirpal
