Patents by Inventor Meghana Kshirsagar

Meghana Kshirsagar has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

HIGH PRECISION WEB EXTRACTION USING SITE KNOWLEDGE

Publication number: 20100257440

Abstract: Techniques for high precision web extraction using site knowledge are provided. Portions of repeating text are identified in unlabeled web pages from a particular web site. Based on the portions of repeating text, the unlabeled web pages are partitioned into a set of segments. Multiple labels are assigned to respectively corresponding multiple attributes in the set of segments, where assigning the multiple labels comprises applying a classification model to each separate segment in the set of segments. First one or more labels are identified that were erroneously assigned to one or more attributes in the set of segments. Second one or more correct labels for the one or more attributes are determined. The first one or more labels in the set of segments are corrected by assigning the second one or more labels to the one or more attributes.

Type: Application

Filed: April 1, 2009

Publication date: October 7, 2010

Inventors: Meghana Kshirsagar, Rajeev Rastogi, Sandeepkumar Bhuramal Satpal, Srinivasan H. Sengamedu, Venu Satuluri
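
The steps in the abstract above (find repeating text, partition pages into segments, label each segment, then correct erroneous labels) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the application: pages are modeled as lists of text lines, "repeating text" means lines that occur on every page, `toy_classify` is a hypothetical stand-in for the classification model, and errors are detected by a simple majority vote across pages at each segment position.

```python
from collections import Counter

def repeating_lines(pages):
    """Assumption: a line that occurs on every page is site template text."""
    common = set(pages[0])
    for page in pages[1:]:
        common &= set(page)
    return common

def partition(page, repeated):
    """Split one page into segments, cutting at each repeated line."""
    segments, current = [], []
    for line in page:
        if line in repeated:
            if current:
                segments.append(current)
            current = []
        else:
            current.append(line)
    if current:
        segments.append(current)
    return segments

def label_and_correct(pages, classify):
    """Label every segment, then overwrite labels that disagree with the
    majority label at the same segment position (an assumed correction rule)."""
    repeated = repeating_lines(pages)
    labeled = [[classify(" ".join(seg)) for seg in partition(page, repeated)]
               for page in pages]
    corrected = [list(labels) for labels in labeled]
    width = min(len(labels) for labels in labeled)
    for i in range(width):
        majority, _ = Counter(labels[i] for labels in labeled).most_common(1)[0]
        for labels in corrected:
            labels[i] = majority
    return labeled, corrected

def toy_classify(text):
    """Hypothetical classifier: any digit means 'phone', otherwise 'name'."""
    return "phone" if any(ch.isdigit() for ch in text) else "name"

pages = [
    ["Name:", "Luigi's Pizza", "Phone:", "555-0100"],
    ["Name:", "Cafe 21", "Phone:", "555-0199"],
    ["Name:", "Noodle Bar", "Phone:", "555-0142"],
]
labeled, corrected = label_and_correct(pages, toy_classify)
```

In this toy data the classifier mislabels "Cafe 21" as a phone number because it contains digits; the other two pages agree that the first segment is a name, so the vote overrides the wrong label.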

AUTOMATIC EXTRACTION USING MACHINE LEARNING BASED ROBUST STRUCTURAL EXTRACTORS

Publication number: 20100223214

Abstract: A method and apparatus for automatically extracting information from a large number of documents through applying machine learning techniques and exploiting structural similarities among documents. A machine learning model is trained to have at least 50% accuracy. The trained machine learning model is used to identify information attributes in a sample of pages from a cluster of structurally similar documents. A structure-specific model of the cluster is created by compiling a list of top-K locations for each attribute identified by the trained machine learning model in the sample. These top-K lists are used to extract information from the pages of the cluster from which the sample of pages was taken.

Type: Application

Filed: February 27, 2009

Publication date: September 2, 2010

Inventors: Alok S. Kirpal, Sandeepkumar Bhuramal Satpal, Meghana Kshirsagar, Srinivasan H. Sengamedu
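
The top-K idea in the abstract above can be sketched roughly as follows. This is an illustration under stated assumptions, not the method claimed in the application: each page is modeled as a dict from a location string (here an XPath-like path) to its text, `toy_classify` is a hypothetical stand-in for the trained machine learning model, and extraction simply takes the first top-K location that exists on a page.

```python
from collections import Counter, defaultdict

def build_topk_model(sample_pages, classify, k=2):
    """Count where the classifier finds each attribute in the sample pages
    and keep the k most frequent locations per attribute."""
    counts = defaultdict(Counter)
    for page in sample_pages:
        for location, text in page.items():
            attribute = classify(text)
            if attribute is not None:
                counts[attribute][location] += 1
    return {attr: [loc for loc, _ in counter.most_common(k)]
            for attr, counter in counts.items()}

def extract(page, model):
    """Assumed extraction rule: for each attribute, use the first of its
    top-K locations that is present on the page."""
    record = {}
    for attribute, locations in model.items():
        for location in locations:
            if location in page:
                record[attribute] = page[location]
                break
    return record

def toy_classify(text):
    """Hypothetical classifier: text starting with '$' is a price."""
    return "price" if text.startswith("$") else None

sample_pages = [
    {"/html/body/h1": "Red Lamp", "/html/body/span[2]": "$10"},
    {"/html/body/h1": "Blue Chair", "/html/body/span[2]": "$25"},
    {"/html/body/h1": "Old Clock", "/html/body/span[3]": "$5"},
]
model = build_topk_model(sample_pages, toy_classify, k=2)

new_page = {"/html/body/h1": "Green Desk", "/html/body/span[3]": "$40"}
record = extract(new_page, model)
```

Keeping K locations rather than one lets extraction fall back to a less common position when a page in the cluster deviates slightly from the dominant layout, as `new_page` does here.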

BOOSTING EXTRACTION ACCURACY BY HANDLING TRAINING DATA BIAS

Publication number: 20090216739

Abstract: Methods and apparatus are described for use with information extraction techniques based on sequential models. Additional statistics are maintained during inference and employed to boost the accuracy of the extraction algorithm and mitigate the effects of training bias.

Type: Application

Filed: February 22, 2008

Publication date: August 27, 2009

Applicant: YAHOO! INC.

Inventors: Alok S. Kirpal, Meghana Kshirsagar
