Patents by Inventor V. G. Vinod Vydiswaran

V. G. Vinod Vydiswaran has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Structural clustering and template identification for electronic documents

Patent number: 8239387

Abstract: Subject matter disclosed herein may relate to clustering electronic documents, such as, for example, web pages, and may also relate to template identification for electronic documents.

Type: Grant

Filed: February 22, 2008

Date of Patent: August 7, 2012

Assignee: Yahoo! Inc.

Inventors: Amit Madaan, V. G. Vinod Vydiswaran, Rupesh R. Mehta

Techniques for inducing high quality structural templates for electronic documents

Patent number: 8046681

Abstract: Techniques are disclosed herein to automatically learn a template that describes a common structure present in documents in a training set. The structure of the template is compared to the structure of the documents (or at least a part of each document) in the training set, one by one, and generalized in response to differences between the template and the document to which the template is currently being compared. If the structure of any particular document is considered too dissimilar from the structure of the template, then the template is not modified. Various generalization operators are added to the template to generalize the template. One such generalization operator is an “OR”, which indicates that only one of “n” sub-trees below the “OR” operator in the template is allowed at the corresponding position in a document.
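
The following Python sketch illustrates the document-by-document generalization described in this abstract, including an "OR" operator whose children are the alternative sub-trees allowed at one position. It is not the patented method: the `Node` class, the positional alignment of children, the `overlap` similarity score, and the 0.5 dissimilarity threshold are hypothetical simplifications chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    tag: str  # element tag, or "OR" for the generalization operator
    children: List["Node"] = field(default_factory=list)


def matches(t: Node, d: Node) -> bool:
    """True if document sub-tree d is allowed by template sub-tree t."""
    if t.tag == "OR":
        # The document must match one of the OR's alternatives at this position.
        return any(matches(alt, d) for alt in t.children)
    if t.tag != d.tag or len(t.children) != len(d.children):
        return False
    return all(matches(tc, dc) for tc, dc in zip(t.children, d.children))


def size(n: Node) -> int:
    return 1 + sum(size(c) for c in n.children)


def overlap(t: Node, d: Node) -> int:
    """Count document nodes that line up with the template (positional alignment)."""
    if t.tag == "OR":
        return max((overlap(alt, d) for alt in t.children), default=0)
    if t.tag != d.tag:
        return 0
    return 1 + sum(overlap(tc, dc) for tc, dc in zip(t.children, d.children))


def generalize(t: Node, d: Node) -> Node:
    """Return a template that also accepts d, adding OR nodes where the two differ."""
    if matches(t, d):
        return t
    if t.tag == "OR":
        return Node("OR", t.children + [d])  # add d as a new alternative
    if t.tag == d.tag and len(t.children) == len(d.children):
        return Node(t.tag, [generalize(tc, dc) for tc, dc in zip(t.children, d.children)])
    return Node("OR", [t, d])


def induce(docs: List[Node], threshold: float = 0.5) -> Node:
    template = docs[0]
    for d in docs[1:]:  # compare the training documents one by one
        if overlap(template, d) / size(d) < threshold:
            continue  # too dissimilar: leave the template unchanged
        template = generalize(template, d)
    return template


a = Node("div", [Node("h1"), Node("p")])
b = Node("div", [Node("h1"), Node("ul")])
c = Node("table", [Node("tr")])
template = induce([a, b, c])  # div[h1, OR(p, ul)]; the table document is skipped
```

The `table` document shares no structure with the template, so it is skipped, mirroring the abstract's rule that a document that is too dissimilar leaves the template unmodified.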

Type: Grant

Filed: November 27, 2007

Date of Patent: October 25, 2011

Assignee: Yahoo! Inc.

Inventors: V. G. Vinod Vydiswaran, Rupesh R. Mehta, Amit Madaan

GENERATING DOCUMENT TEMPLATES THAT ARE ROBUST TO STRUCTURAL VARIATIONS

Publication number: 20100174715

Abstract: A template or wrapper tree for a document such as a web page is generalized from the bottom up (from leaf toward root of a logical tree structure of the template). At a given level in the tree, sub-trees are clustered and the clustered sub-trees are generalized, and the process is repeated at a next higher level in the tree, resulting in a generalized template or wrapper tree. This can be done by generating a nested pattern regular expression based on the sub-tree clusters, merging sub-trees based on the nested pattern regular expression, and then replacing sub-trees in a tree-based regular expression of the template or wrapper at the given level with the merged sub-trees. This process is repeated at a next higher level of the tree (progressing from leaf towards root) until the wrapper or tree-based regular expression that represents the template is fully generalized.
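
To illustrate the bottom-up idea, the sketch below serializes a tree from the leaves toward the root. At each level it treats sibling sub-trees with identical generalized patterns as one cluster and collapses consecutive repeats into a nested `(...)+` pattern before moving up to the parent. The exact-match clustering, the string notation, and the function names `collapse` and `generalize` are assumptions for this example; the abstract does not specify the clustering criterion or the regular-expression syntax.

```python
from typing import List, Tuple

Tree = Tuple[str, list]  # (tag, children)


def collapse(seq: List[str]) -> List[str]:
    """Merge consecutive repeats of a unit of sibling patterns into "(unit)+"."""
    out: List[str] = []
    i = 0
    while i < len(seq):
        best_len, best_reps = 1, 1
        # Try every unit length that could repeat at least twice from position i.
        for length in range(1, (len(seq) - i) // 2 + 1):
            unit = seq[i:i + length]
            reps = 1
            while seq[i + reps * length:i + (reps + 1) * length] == unit:
                reps += 1
            # Keep the repeating unit that covers the most siblings.
            if reps > 1 and reps * length > best_len * best_reps:
                best_len, best_reps = length, reps
        if best_reps > 1:
            out.append("(" + " ".join(seq[i:i + best_len]) + ")+")
        else:
            out.append(seq[i])
        i += best_len * best_reps
    return out


def generalize(tree: Tree) -> str:
    """Post-order walk: children are generalized before their parent (leaf to root)."""
    tag, children = tree
    if not children:
        return tag
    # Siblings with identical patterns form one cluster and are merged by collapse().
    child_patterns = [generalize(child) for child in children]
    return f"{tag}[{' '.join(collapse(child_patterns))}]"


span = ("span", [])
page = ("ul", [("li", [span, span]), ("li", [span, span, span])])
print(generalize(page))  # ul[(li[(span)+])+]
```

Because both `li` items are generalized to the same pattern before their parent `ul` is processed, list items containing different numbers of `span` elements still yield a single pattern, which is the kind of robustness to structural variation the title refers to.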

Type: Application

Filed: February 22, 2010

Publication date: July 8, 2010

Applicant: YAHOO! INC.

Inventors: Charu Tiwari, V. G. Vinod Vydiswaran

Generating document templates that are robust to structural variations

Patent number: 7668942

Abstract: A template or wrapper tree for a document such as a web page is generalized from the bottom up (from leaf toward root of a logical tree structure of the template). At a given level in the tree, sub-trees are clustered and the clustered sub-trees are generalized, and the process is repeated at a next higher level in the tree, resulting in a generalized template or wrapper tree. This can be done by generating a nested pattern regular expression based on the sub-tree clusters, merging sub-trees based on the nested pattern regular expression, and then replacing sub-trees in a tree-based regular expression of the template or wrapper at the given level with the merged sub-trees. This process is repeated at a next higher level of the tree (progressing from leaf towards root) until the wrapper or tree-based regular expression that represents the template is fully generalized.

Type: Grant

Filed: May 2, 2008

Date of Patent: February 23, 2010

Assignee: Yahoo! Inc.

Inventors: Charu Tiwari, V. G. Vinod Vydiswaran

GENERATING DOCUMENT TEMPLATES THAT ARE ROBUST TO STRUCTURAL VARIATIONS

Publication number: 20090276506

Abstract: A template or wrapper tree for a document such as a web page is generalized from the bottom up (from leaf toward root of a logical tree structure of the template). At a given level in the tree, sub-trees are clustered and the clustered sub-trees are generalized, and the process is repeated at a next higher level in the tree, resulting in a generalized template or wrapper tree. This can be done by generating a nested pattern regular expression based on the sub-tree clusters, merging sub-trees based on the nested pattern regular expression, and then replacing sub-trees in a tree-based regular expression of the template or wrapper at the given level with the merged subtrees. This process is repeated at a next higher level of the tree (progressing from leaf towards root) until the wrapper or tree-based regular expression that represents the template is fully generalized.

Type: Application

Filed: May 2, 2008

Publication date: November 5, 2009

Applicant: YAHOO! INC.

Inventors: Charu Tiwari, V. G. Vinod Vydiswaran

ADAPTIVE SAMPLING OF WEB PAGES FOR EXTRACTION

Publication number: 20090204889

Abstract: Techniques are provided for improving the recall rate of an information extraction system by automatically selecting pages to surface to a user for annotation based on variation data. Techniques are provided for generating the variation data during the construction of the template that is to be used for extraction. During template construction, data is stored to indicate which template-construction pages saw or made changes to nodes in the template. After interesting nodes have been identified in the template, the data stored during template construction is used to determine which pages made changes to interesting-variation nodes. Techniques are also provided for generating the variation data during the extraction phase, when the template is being used to extract information from pages. During the extraction phase, variation data is generated in response to detecting that extraction for a given page resulted in one or more empty attributes.
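
A minimal Python sketch of the two sources of variation data described above, assuming that page and template-node identifiers are plain strings. `VariationLog`, `pages_with_empty_attributes`, `select_for_annotation`, and the union-then-sort ordering are hypothetical; the abstract does not say how candidate pages are ranked or how many are surfaced to the user.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Set


class VariationLog:
    """Records which template-construction pages changed which template nodes."""

    def __init__(self) -> None:
        self.changed_by: DefaultDict[str, Set[str]] = defaultdict(set)

    def record_change(self, node_id: str, page_id: str) -> None:
        # Called during template construction whenever a page alters a node.
        self.changed_by[node_id].add(page_id)

    def pages_for(self, interesting_nodes: Iterable[str]) -> Set[str]:
        # Pages that changed any of the interesting-variation nodes.
        pages: Set[str] = set()
        for node_id in interesting_nodes:
            pages |= self.changed_by.get(node_id, set())
        return pages


def pages_with_empty_attributes(extractions: Dict[str, Dict[str, str]]) -> Set[str]:
    """Extraction-phase variation data: pages where some attribute came back empty."""
    return {page for page, attrs in extractions.items() if any(not v for v in attrs.values())}


def select_for_annotation(log: VariationLog, interesting_nodes: Iterable[str],
                          extractions: Dict[str, Dict[str, str]], budget: int) -> List[str]:
    candidates = log.pages_for(interesting_nodes) | pages_with_empty_attributes(extractions)
    return sorted(candidates)[:budget]  # deterministic order; the real ranking is unspecified


log = VariationLog()
log.record_change("price-node", "p1")
log.record_change("ad-node", "p2")
log.record_change("price-node", "p3")
extractions = {"p4": {"title": "Lamp", "price": ""}, "p5": {"title": "Desk", "price": "90"}}
print(select_for_annotation(log, ["price-node"], extractions, budget=10))  # ['p1', 'p3', 'p4']
```

Pages `p1` and `p3` are chosen because they changed the interesting node during template construction, and `p4` because its extraction produced an empty attribute.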

Type: Application

Filed: February 13, 2008

Publication date: August 13, 2009

Inventors: Rupesh R. Mehta, V. G. Vinod Vydiswaran

EXTRACTING INFORMATION BASED ON DOCUMENT STRUCTURE AND CHARACTERISTICS OF ATTRIBUTES

Publication number: 20090125529

Abstract: Techniques are disclosed herein for extracting attributes from documents such as web pages. A structure of a training document is compared with a structure of a template to determine a template node that structurally corresponds to a training-document node that has been annotated with an attribute. Filters can be learned by analyzing characteristics that the attribute possesses in the training document. To extract information for the attribute from a new document, a set of candidate nodes is first determined by identifying which nodes in the new document structurally map to the template node. The filters are applied to eliminate false positives from the candidate nodes. Information can then be extracted from the new document based on the remaining candidate nodes. Even if incremental changes are made to the structure of new documents, nodes that possess the attributes can still be reliably identified.
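
The extraction flow in this abstract (structural candidate selection, then learned filters) might be sketched as follows. This assumes that "structurally maps to the template node" can be approximated by an equal tag path, and that the learned filters are a text-length range plus a digit check; `DocNode`, `learn_filters`, and `extract` are hypothetical names, and both filters are illustrative stand-ins rather than filters described in the application.

```python
from dataclasses import dataclass
from typing import Callable, List

Filter = Callable[[str], bool]


@dataclass
class DocNode:
    path: str  # tag path from the root, used here as the structural mapping key
    text: str


def learn_filters(examples: List[str]) -> List[Filter]:
    """Learn simple filters from the annotated attribute values in training documents."""
    lo = min(len(e) for e in examples)
    hi = max(len(e) for e in examples)
    filters: List[Filter] = [lambda s: lo <= len(s) <= hi]  # observed length range
    if all(any(ch.isdigit() for ch in e) for e in examples):
        filters.append(lambda s: any(ch.isdigit() for ch in s))  # every example had digits
    return filters


def extract(doc: List[DocNode], template_path: str, filters: List[Filter]) -> List[str]:
    # Step 1: candidate nodes are those that map to the template node.
    candidates = [n for n in doc if n.path == template_path]
    # Step 2: the filters remove false positives among the candidates.
    return [n.text for n in candidates if all(f(n.text) for f in filters)]


filters = learn_filters(["$19.99", "$5.00"])
new_doc = [
    DocNode("div/span", "$7.50"),
    DocNode("div/span", "Add to cart"),
    DocNode("div/span", "Sale"),
    DocNode("div/p", "$3.00"),
]
print(extract(new_doc, "div/span", filters))  # ['$7.50']
```

Three nodes map to the template path, but only `$7.50` resembles the training values; `Add to cart` and `Sale` are discarded as false positives, and `$3.00` is never a candidate because it sits at a different path.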

Type: Application

Filed: November 12, 2007

Publication date: May 14, 2009

Inventors: V. G. Vinod Vydiswaran, Charu Tiwari, Arun Ramanujapuram
