Patents by Inventor Rupesh R. Mehta

Rupesh R. Mehta has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 8832102
    Abstract: Exemplary methods and apparatuses are provided which may be implemented using one or more computing devices to allow for super clustering of clusters of electronic documents based, at least in part, on structural and static content features.
    Type: Grant
    Filed: January 12, 2010
    Date of Patent: September 9, 2014
    Assignee: Yahoo! Inc.
    Inventors: Rupesh R. Mehta, Srinivasan H. Sengamedu, Rajeev R. Rastogi
  • Patent number: 8239387
    Abstract: Subject matter disclosed herein may relate to clustering electronic documents, such as, for example, web pages, and may also relate to template identification for electronic documents.
    Type: Grant
    Filed: February 22, 2008
    Date of Patent: August 7, 2012
    Assignee: Yahoo! Inc.
    Inventors: Amit Madaan, V. G. Vinod Vydiswaran, Rupesh R. Mehta
  • Patent number: 8046681
    Abstract: Techniques are disclosed herein to automatically learn a template that describes a common structure present in documents in a training set. The structure of the template is compared to the structure of the documents (or at least a part of each document) in the training set, one-by-one, and generalized in response to differences between the template and the document to which the template is currently being compared. If the structure of any particular document is considered too dissimilar from the structure of the template, then the template is not modified. Various generalization operators are added to the template to generalize the template. One such generalization operator is an “OR”, which indicates that only one of “n” sub-trees below the “OR” operator in the template is allowed at the corresponding position in a document.
    Type: Grant
    Filed: November 27, 2007
    Date of Patent: October 25, 2011
    Assignee: Yahoo! Inc.
    Inventors: V. G. Vinod Vydiswaran, Rupesh R. Mehta, Amit Madaan
  • Publication number: 20110173197
    Abstract: Exemplary methods and apparatuses are provided which may be implemented using one or more computing devices to allow for super clustering of clusters of electronic documents based, at least in part, on structural and static content features.
    Type: Application
    Filed: January 12, 2010
    Publication date: July 14, 2011
    Applicant: Yahoo! Inc.
    Inventors: Rupesh R. Mehta, Srinivasan H. Sengamedu, Rajeev R. Rastogi
  • Publication number: 20110040770
    Abstract: An example of a method includes generating an attributed extensible markup language path (XPath) for an annotated entity in a web page. The method further includes determining a first node that satisfies the attributed XPath in the web page and is annotated. The method also includes identifying an attribute property that satisfies predefined criteria in the web page while traversing from the first node to a root node, the attribute property comprising an attribute value and an attribute name. Moreover, the method includes populating the attributed XPath with the attribute property that satisfies predefined criteria. The method also includes filtering the attributed XPath to generate a robust XPath, and extracting content from multiple web pages based on the robust XPath.
    Type: Application
    Filed: August 13, 2009
    Publication date: February 17, 2011
    Applicant: Yahoo! Inc.
    Inventors: Amit Madaan, Charu Tiwari, Rupesh R. Mehta
  • Publication number: 20100228738
    Abstract: A method and apparatus for improved sampling of documents for training sets input to information extraction systems is provided, which improves the recall and robustness of wrapper extraction. A passive sampling technique provides a list of documents to present for human annotation ordered by representativeness of the document based on structural and content statistics. Thus, the document with the most interesting attributes and which is most representative of the cluster of structurally similar documents to which the document pertains is presented for annotation first. The problem is mapped to the classical ‘Set-Cover’ problem and solved using a greedy approach. An active sampling technique refines and reorders the sample list produced by the passive sampling technique after initial annotations, based on the human annotation, spatial boundaries of the documents, and structural and content statistics.
    Type: Application
    Filed: March 4, 2009
    Publication date: September 9, 2010
    Inventors: Rupesh R. Mehta, Srinivasan H. Sengamedu
  • Publication number: 20090265611
    Abstract: Methods and apparatus are described which enable the efficient adaptation of web pages to mobile displays. The more important or relevant sections of a web page are identified and configured into a more compact form. Both layout preserving and high compaction techniques are described.
    Type: Application
    Filed: May 7, 2008
    Publication date: October 22, 2009
    Applicant: Yahoo! Inc.
    Inventors: Srinivasan H. Sengamedu, Rupesh R. Mehta
  • Publication number: 20090248707
    Abstract: Methods and systems are provided herein that may allow for pertinent information-type(s) of data to be located or otherwise identified within one or more documents, such as, for example, web page documents associated with one or more websites. For example, exemplary methods and systems are provided that may be used to determine if information may be more likely to be of an “informative” type of information or possibly more likely to be of a “noise” type of information.
    Type: Application
    Filed: March 25, 2008
    Publication date: October 1, 2009
    Applicant: Yahoo! Inc.
    Inventors: Rupesh R. Mehta, Amit Madaan
  • Publication number: 20090216708
    Abstract: Subject matter disclosed herein may relate to clustering electronic documents, such as, for example, web pages, and may also relate to template identification for electronic documents.
    Type: Application
    Filed: February 22, 2008
    Publication date: August 27, 2009
    Applicant: Yahoo! Inc.
    Inventors: Amit Madaan, V. G. Vinod Vydiswaran, Rupesh R. Mehta
  • Publication number: 20090204889
    Abstract: Techniques are provided for improving the recall rate of an information extraction system by automatically selecting pages to surface to a user for annotation based on variation data. Techniques are provided for generating the variation data during the construction of the template that is to be used for extraction. During template construction, data is stored to indicate which template-construction pages saw or made changes to nodes in the template. After interesting nodes have been identified in the template, the data stored during template construction is used to determine which pages made changes to interesting-variation nodes. Techniques are also provided for generating the variation data during the extraction phase, when the template is being used to extract information from pages. During the extraction phase, variation data is generated in response to detecting that extraction for a given page resulted in one or more empty attributes.
    Type: Application
    Filed: February 13, 2008
    Publication date: August 13, 2009
    Inventors: Rupesh R. Mehta, V. G. Vinod Vydiswaran
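
The abstract of publication 20100228738 above says that document sampling is mapped to the classical Set-Cover problem and solved with a greedy approach. The sketch below shows only the textbook greedy heuristic for Set-Cover, not the procedure claimed in the application. The function name `greedy_set_cover`, the page names, and the feature sets are all hypothetical and chosen purely for illustration. Each candidate page is treated as the set of structural or content features it exhibits, and pages are ordered so that each pick covers the most features not yet covered.

```python
def greedy_set_cover(universe, candidates):
    """Textbook greedy Set-Cover heuristic (illustrative only).

    universe:   iterable of features that should be covered.
    candidates: dict mapping a page name to the set of features it exhibits.
    Returns (order, uncovered): the pages in pick order and any features
    that no candidate page could cover.
    """
    uncovered = set(universe)
    remaining = dict(candidates)
    order = []
    while uncovered and remaining:
        # Pick the page covering the most still-uncovered features.
        # Iterating over sorted names makes ties resolve alphabetically.
        best = max(sorted(remaining), key=lambda name: len(remaining[name] & uncovered))
        gain = remaining[best] & uncovered
        if not gain:
            break  # no remaining page adds anything new
        order.append(best)
        uncovered -= gain
        del remaining[best]
    return order, uncovered


# Hypothetical example data: four pages and the features each one shows.
features = {"title", "price", "image", "rating", "review"}
pages = {
    "page_a": {"title", "price"},
    "page_b": {"title", "price", "image", "rating"},
    "page_c": {"review", "title"},
    "page_d": {"image"},
}
order, uncovered = greedy_set_cover(features, pages)
print(order, uncovered)
```

In this toy input the heuristic picks `page_b` first because it covers four features, then `page_c` for the one feature still missing. The greedy heuristic is an approximation and does not guarantee a minimum-size cover in general.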