Patents by Inventor Sandeepkumar Bhuramal Satpal

Sandeepkumar Bhuramal Satpal has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 8768926
    Abstract: Web pages are efficiently categorized in a data processor without analyzing the content of the web pages. According to at least one embodiment, data is maintained that represents sample URLs grouped into a plurality of clusters. The sample URLs of a cluster are used to produce a URL regular expression pattern (“URL-regex”) that differentiates the sample URLs of the cluster from the sample URLs of other clusters and that covers at least a specified percentage of the sample URLs in the cluster. The process of producing a URL-regex is repeated for each of the clusters, producing a URL-regex for each cluster. Web pages are then categorized into one of the clusters by determining which of the URL-regex patterns produced for the clusters match URLs that refer to the web pages. Thus, a web page may be categorized based on a URL that refers to the web page without having to obtain and analyze the content of the web page.
    Type: Grant
    Filed: January 5, 2010
    Date of Patent: July 1, 2014
    Assignee: Yahoo! Inc.
    Inventors: Ashwin Tengli, Rajeev Rastogi, Jeyashankher Ramamirtham, Srinivasan H. Sengamedu, Sandeepkumar Bhuramal Satpal
  • Publication number: 20110167063
    Abstract: Web pages are efficiently categorized in a data processor without analyzing the content of the web pages. According to at least one embodiment, data is maintained that represents sample URLs grouped into a plurality of clusters. The sample URLs of a cluster are used to produce a URL regular expression pattern (“URL-regex”) that differentiates the sample URLs of the cluster from the sample URLs of other clusters and that covers at least a specified percentage of the sample URLs in the cluster. The process of producing a URL-regex is repeated for each of the clusters, producing a URL-regex for each cluster. Web pages are then categorized into one of the clusters by determining which of the URL-regex patterns produced for the clusters match URLs that refer to the web pages. Thus, a web page may be categorized based on a URL that refers to the web page without having to obtain and analyze the content of the web page.
    Type: Application
    Filed: January 5, 2010
    Publication date: July 7, 2011
    Inventors: Ashwin Tengli, Rajeev Rastogi, Jeyashankher Ramamirtham, Srinivasan H. Sengamedu, Sandeepkumar Bhuramal Satpal
  • Publication number: 20100257440
    Abstract: Techniques for high-precision web extraction using site knowledge are provided. Portions of repeating text are identified in unlabeled web pages from a particular web site. Based on the portions of repeating text, the unlabeled web pages are partitioned into a set of segments. Labels are assigned to the corresponding attributes in the set of segments by applying a classification model to each segment separately. One or more labels that were erroneously assigned to attributes in the set of segments are then identified, and the correct labels for those attributes are determined. The erroneous labels are corrected by assigning the correct labels to those attributes.
    Type: Application
    Filed: April 1, 2009
    Publication date: October 7, 2010
    Inventors: Meghana Kshirsagar, Rajeev Rastogi, Sandeepkumar Bhuramal Satpal, Srinivasan H. Sengamedu, Venu Satuluri
  • Publication number: 20100223214
    Abstract: A method and apparatus are provided for automatically extracting information from a large number of documents by applying machine learning techniques and exploiting structural similarities among documents. A machine learning model is trained to have at least 50% accuracy. The trained machine learning model is used to identify information attributes in a sample of pages from a cluster of structurally similar documents. A structure-specific model of the cluster is created by compiling a list of top-K locations for each attribute identified by the trained machine learning model in the sample. These top-K lists are used to extract information from the pages of the cluster from which the sample of pages was taken.
    Type: Application
    Filed: February 27, 2009
    Publication date: September 2, 2010
    Inventors: Alok S. Kirpal, Sandeepkumar Bhuramal Satpal, Meghana Kshirsagar, Srinivasan H. Sengamedu
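
The URL-based categorization described in the first entry (patent 8768926) can be illustrated with a minimal Python sketch. It is not the patented method: the abstract does not say how a URL-regex is produced, so the patterns here are written by hand. The sample URLs, the cluster names, the 0.9 coverage threshold, and the helper names (coverage, is_acceptable, categorize) are all hypothetical, and "differentiates" is interpreted here as "matches no sample URL of any other cluster". The sketch only checks a candidate pattern against the two conditions named in the abstract and then categorizes a page from its URL alone.

```python
import re

# Hypothetical sample URLs grouped into clusters (illustrative data only).
CLUSTERS = {
    "product": [
        "http://shop.example.com/item/123",
        "http://shop.example.com/item/456",
        "http://shop.example.com/item/789",
    ],
    "review": [
        "http://shop.example.com/reviews/123",
        "http://shop.example.com/reviews/456",
    ],
}

# Hand-written candidate patterns, one per cluster. How such patterns are
# generated is not described in the abstract, so this is an assumption.
CANDIDATES = {
    "product": r"^http://shop\.example\.com/item/\d+$",
    "review": r"^http://shop\.example\.com/reviews/\d+$",
}


def coverage(pattern, urls):
    """Return the fraction of urls matched by pattern."""
    rx = re.compile(pattern)
    return sum(1 for url in urls if rx.search(url)) / len(urls)


def is_acceptable(name, pattern, clusters, min_coverage=0.9):
    """Check the two conditions from the abstract for one cluster.

    The pattern must cover at least min_coverage of the cluster's own
    sample URLs and (our interpretation) match no sample URL elsewhere.
    """
    rx = re.compile(pattern)
    other_urls = [
        url
        for other, urls in clusters.items()
        if other != name
        for url in urls
    ]
    covers_own = coverage(pattern, clusters[name]) >= min_coverage
    matches_other = any(rx.search(url) for url in other_urls)
    return covers_own and not matches_other


def categorize(url, patterns):
    """Return the first cluster whose pattern matches url, else None."""
    for name, pattern in patterns.items():
        if re.search(pattern, url):
            return name
    return None
```

For example, categorize("http://shop.example.com/item/42", CANDIDATES) selects the "product" cluster without fetching the page, while a pattern as broad as r"^http://shop\.example\.com/" would be rejected by is_acceptable because it also matches the sample URLs of the "review" cluster.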