Patents by Inventor Rajesh M. Desai

Rajesh M. Desai has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 7701944
    Abstract: The present invention relates to a method for configuring a policy management protocol for a web crawler, the method further comprising the steps of determining a web space that is to be crawled by a web crawler, wherein the web space is comprised of an IP address and/or a range of IP addresses, and determining additional hostnames that are associated with the IP address and/or range of IP addresses. The method further comprises the steps of configuring the web crawler to crawl the IP address and/or range of IP addresses, and determine additional hostnames that are associated with the IP address or range of IP addresses, and performing a web crawling function upon the determined additional hostnames by the web crawler.
    Type: Grant
    Filed: January 19, 2007
    Date of Patent: April 20, 2010
    Assignee: International Business Machines Corporation
    Inventors: Varun Bhagwan, Rajesh M. Desai, Piyoosh Jalan
  • Publication number: 20080295148
    Abstract: The present invention relates to a method for configuring a policy management protocol for a web crawler, the method further comprising the steps of determining a web space that is to be crawled by a web crawler, wherein the web space is comprised of an IP address and/or a range of IP addresses, and determining additional hostnames that are associated with the IP address and/or range of IP addresses. The method further comprises the steps of configuring the web crawler to crawl the IP address and/or range of IP addresses, and determine additional hostnames that are associated with the IP address or range of IP addresses, and performing a web crawling function upon the determined additional hostnames by the web crawler.
    Type: Application
    Filed: May 30, 2008
    Publication date: November 27, 2008
    Applicant: International Business Machines Corporation
    Inventors: Varun Bhagwan, Rajesh M. Desai, Piyoosh Jalan
  • Publication number: 20080235163
    Abstract: As part of the normal crawling process, a crawler parses a page and computes a de-tagged hash, called a fingerprint, of the page content. A lookup structure consisting of the host hash (hash of the host portion of the URL) and the fingerprint of the page is maintained. Before the crawler writes a page to a store, this lookup structure is consulted. If the lookup structure already contains the tuple (i.e., host hash and fingerprint), then the page is not written to the store. Thus, a lot of duplicates are eliminated at the crawler itself, saving CPU and disk cycles which would otherwise be needed during current duplicate elimination processes.
    Type: Application
    Filed: March 22, 2007
    Publication date: September 25, 2008
    Inventors: Srinivasan Balasubramanian, Rajesh M. Desai, Piyoosh Jalan
  • Publication number: 20080175243
    Abstract: The present invention relates to a method for configuring a policy management protocol for a web crawler, the method further comprising the steps of determining a web space that is to be crawled by a web crawler, wherein the web space is comprised of an IP address and/or a range of IP addresses, and determining additional hostnames that are associated with the IP address and/or range of IP addresses. The method further comprises the steps of configuring the web crawler to crawl the IP address and/or range of IP addresses, and determine additional hostnames that are associated with the IP address or range of IP addresses, and performing a web crawling function upon the determined additional hostnames by the web crawler.
    Type: Application
    Filed: January 19, 2007
    Publication date: July 24, 2008
    Applicant: International Business Machines Corporation
    Inventors: Varun Bhagwan, Rajesh M. Desai, Piyoosh Jalan
  • Patent number: 7389310
    Abstract: A scale-out supercomputing environment includes a plurality of interconnected nodes arranged in a three-dimensional cubic grid and configured to perform a method of duplicate detection. The method includes at least computing a fingerprint of at least one document in the supercomputing environment to generate data packets from the at least one document and to generate a fixed size tuple of information from the at least one document, distributing the data packets to each node of the plurality of nodes to ensure all elements of the fixed size tuple fit into memory of the plurality of nodes, applying localized detection techniques to data packets on each node of the plurality of nodes to remove data packet duplicates, redistributing the data packets to each node of the plurality of nodes based on the document fingerprint, and performing a global merge of results of the localized detection techniques.
    Type: Grant
    Filed: March 10, 2008
    Date of Patent: June 17, 2008
    Assignee: International Business Machines Corporation
    Inventors: Varun Bhagwan, Rajesh M. Desai, Daniel F. Gruhl
  • Patent number: 7363329
    Abstract: A method for duplicate detection on web-scale data in a supercomputing environment includes computing a hash of at least one document in a computer system to generate data packets from the at least one document and to generate a fixed size tuple of information from the at least one document, distributing the data packets to each node of the plurality of nodes, applying localized detection techniques to data packets on each node of the plurality of nodes to remove data packet duplicates, redistributing the data packets to each node of the plurality of nodes based on the document fingerprint, reapplying the localized detection techniques on each node to the redistributed packets to remove exact data packet duplicates, and performing a global merge of results of the localized detection techniques in a distributed fashion.
    Type: Grant
    Filed: November 13, 2007
    Date of Patent: April 22, 2008
    Assignee: International Business Machines Corporation
    Inventors: Varun Bhagwan, Rajesh M. Desai, Daniel F. Gruhl
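
The crawler-side duplicate elimination described in publication 20080235163 (a lookup structure keyed by host hash and page fingerprint, consulted before a page is written to the store) can be illustrated with a minimal Python sketch. This is not the patented implementation: the names `fingerprint`, `host_hash`, and `DedupStore`, the regex-based tag stripping, the choice of SHA-1, and the in-memory set are all assumptions made for illustration only.

```python
import hashlib
import re
from urllib.parse import urlsplit

# Assumption: a crude regex stands in for real HTML de-tagging.
TAG_RE = re.compile(r"<[^>]+>")


def fingerprint(html: str) -> str:
    """Hash of the page content with tags removed and whitespace collapsed."""
    text = TAG_RE.sub(" ", html)
    text = " ".join(text.split())
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def host_hash(url: str) -> str:
    """Hash of the host portion of the URL."""
    host = (urlsplit(url).hostname or "").lower()
    return hashlib.sha1(host.encode("utf-8")).hexdigest()


class DedupStore:
    """Hypothetical page store that skips pages already seen on the same host."""

    def __init__(self) -> None:
        self.seen = set()   # lookup structure of (host hash, fingerprint) tuples
        self.pages = {}     # stand-in for the real page store

    def write(self, url: str, html: str) -> bool:
        """Store the page unless its (host hash, fingerprint) tuple is known.

        Returns True if the page was written, False if it was a duplicate.
        """
        key = (host_hash(url), fingerprint(html))
        if key in self.seen:
            return False
        self.seen.add(key)
        self.pages[url] = html
        return True


store = DedupStore()
store.write("http://example.com/a", "<p>Hello</p>")      # written
store.write("http://example.com/b", "<div>Hello</div>")  # skipped: same host, same de-tagged text
store.write("http://other.example/a", "<p>Hello</p>")    # written: different host
```

Because the key includes the host hash, identical content on two different hosts is kept, while repeated content within one host is dropped before it reaches the store.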