Patents by Inventor V. G. Vinod Vydiswaran

V. G. Vinod Vydiswaran has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 8239387
    Abstract: Subject matter disclosed herein may relate to clustering electronic documents, such as, for example, web pages, and may also relate to template identification for electronic documents.
    Type: Grant
    Filed: February 22, 2008
    Date of Patent: August 7, 2012
    Assignee: Yahoo! Inc.
    Inventors: Amit Madaan, V. G. Vinod Vydiswaran, Rupesh R. Mehta
  • Patent number: 8046681
    Abstract: Techniques are disclosed herein to automatically learn a template that describes a common structure present in documents in a training set. The structure of the template is compared to the structure of the documents (or at least a part of each document) in the training set, one-by-one, and generalized in response to differences between the template and the document to which the template is currently being compared. If the structure of any particular document is considered too dissimilar from the structure of the template, then the template is not modified. Various generalization operators are added to the template to generalize the template. One such generalization operator is an “OR”, which indicates that only one of “n” sub-trees below the “OR” operator in the template is allowed at the corresponding position in a document.
    Type: Grant
    Filed: November 27, 2007
    Date of Patent: October 25, 2011
    Assignee: Yahoo! Inc.
    Inventors: V. G. Vinod Vydiswaran, Rupesh R. Mehta, Amit Madaan
  • Publication number: 20100174715
    Abstract: A template or wrapper tree for a document such as a web page is generalized from the bottom up (from leaf toward root of a logical tree structure of the template). At a given level in the tree, sub-trees are clustered and the clustered sub-trees are generalized, and the process is repeated at a next higher level in the tree, resulting in a generalized template or wrapper tree. This can be done by generating a nested pattern regular expression based on the sub-tree clusters, merging sub-trees based on the nested pattern regular expression, and then replacing sub-trees in a tree-based regular expression of the template or wrapper at the given level with the merged sub-trees. This process is repeated at a next higher level of the tree (progressing from leaf towards root) until the wrapper or tree-based regular expression that represents the template is fully generalized.
    Type: Application
    Filed: February 22, 2010
    Publication date: July 8, 2010
    Applicant: YAHOO! INC.
    Inventors: Charu Tiwari, V. G. Vinod Vydiswaran
  • Patent number: 7668942
    Abstract: A template or wrapper tree for a document such as a web page is generalized from the bottom up (from leaf toward root of a logical tree structure of the template). At a given level in the tree, sub-trees are clustered and the clustered sub-trees are generalized, and the process is repeated at a next higher level in the tree, resulting in a generalized template or wrapper tree. This can be done by generating a nested pattern regular expression based on the sub-tree clusters, merging sub-trees based on the nested pattern regular expression, and then replacing sub-trees in a tree-based regular expression of the template or wrapper at the given level with the merged sub-trees. This process is repeated at a next higher level of the tree (progressing from leaf towards root) until the wrapper or tree-based regular expression that represents the template is fully generalized.
    Type: Grant
    Filed: May 2, 2008
    Date of Patent: February 23, 2010
    Assignee: Yahoo! Inc.
    Inventors: Charu Tiwari, V. G. Vinod Vydiswaran
  • Publication number: 20090276506
    Abstract: A template or wrapper tree for a document such as a web page is generalized from the bottom up (from leaf toward root of a logical tree structure of the template). At a given level in the tree, sub-trees are clustered and the clustered sub-trees are generalized, and the process is repeated at a next higher level in the tree, resulting in a generalized template or wrapper tree. This can be done by generating a nested pattern regular expression based on the sub-tree clusters, merging sub-trees based on the nested pattern regular expression, and then replacing sub-trees in a tree-based regular expression of the template or wrapper at the given level with the merged sub-trees. This process is repeated at a next higher level of the tree (progressing from leaf towards root) until the wrapper or tree-based regular expression that represents the template is fully generalized.
    Type: Application
    Filed: May 2, 2008
    Publication date: November 5, 2009
    Applicant: YAHOO! INC.
    Inventors: Charu Tiwari, V. G. Vinod Vydiswaran
  • Publication number: 20090204889
    Abstract: Techniques are provided for improving the recall rate of an information extraction system by automatically selecting pages to surface to a user for annotation based on variation data. Techniques are provided for generating the variation data during the construction of the template that is to be used for extraction. During template construction, data is stored to indicate which template-construction pages saw or made changes to nodes in the template. After interesting nodes have been identified in the template, the data stored during template construction is used to determine which pages made changes to interesting-variation nodes. Techniques are also provided for generating the variation data during the extraction phase, when the template is being used to extract information from pages. During the extraction phase, variation data is generated in response to detecting that extraction for a given page resulted in one or more empty attributes.
    Type: Application
    Filed: February 13, 2008
    Publication date: August 13, 2009
    Inventors: Rupesh R. Mehta, V. G. Vinod Vydiswaran
  • Publication number: 20090125529
    Abstract: Techniques are disclosed herein for extracting attributes from documents such as web pages. A structure of a training document is compared with a structure of a template to determine a template-node that structurally corresponds to a training-document node that has been annotated with an attribute. Filters can be learned by analyzing characteristics that the attribute possesses in the training document. To extract information for the attribute from a new document, first a set of candidate nodes in a new document are determined by determining which nodes in the new document structurally map to the template node. The filters are applied to eliminate false positives from the candidate nodes. Information can then be extracted from the new document, based on remaining candidate nodes. Even if incremental changes are made to the structure of new documents, nodes that possess the attributes can still be reliably identified.
    Type: Application
    Filed: November 12, 2007
    Publication date: May 14, 2009
    Inventors: V. G. Vinod Vydiswaran, Charu Tiwari, Arun Ramanujapuram
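
The candidate-then-filter extraction described in the abstract of publication 20090125529 can be illustrated with a minimal sketch. This is not the method claimed in the application: the `Node` class, the use of a root-to-node tag path as the "structural mapping", and the hand-written `looks_like_price` filter are all assumptions made for illustration. In the abstract, filters are learned from annotated training documents rather than written by hand.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Tuple


@dataclass
class Node:
    """Hypothetical document-tree node: a tag, optional text, and children."""
    tag: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)


Path = Tuple[str, ...]


def walk(node: Node, prefix: Path = ()) -> Iterator[Tuple[Path, Node]]:
    """Yield (root-to-node tag path, node) for every node in the tree."""
    path = prefix + (node.tag,)
    yield path, node
    for child in node.children:
        yield from walk(child, path)


def candidates(doc: Node, template_path: Path) -> List[Node]:
    """Assumed structural mapping: a node is a candidate if its tag path
    equals the path of the annotated template node."""
    return [n for p, n in walk(doc) if p == template_path]


def extract(doc: Node, template_path: Path,
            filters: List[Callable[[str], bool]]) -> List[str]:
    """Keep only the candidates whose text passes every filter."""
    return [n.text for n in candidates(doc, template_path)
            if all(f(n.text) for f in filters)]


def looks_like_price(text: str) -> bool:
    """Hand-written stand-in for a learned filter (an assumption)."""
    return re.fullmatch(r"\$\d+(\.\d{2})?", text) is not None


# A new page where two nodes map to the same template position.
page = Node("html", children=[Node("body", children=[
    Node("div", children=[Node("span", "Free shipping"),
                          Node("span", "$19.99")]),
])])

price_path = ("html", "body", "div", "span")
prices = extract(page, price_path, [looks_like_price])
```

Here both `span` nodes are structural candidates, and the filter removes the false positive, leaving only the price text in `prices`.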