Patents by Inventor Keith H. Randall

Keith H. Randall has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10621241
    Abstract: Systems and methods for scheduling document crawling are provided in which a list of document identifiers is obtained. Each respective document identifier identifies a corresponding document on a network. For each respective document identifier in the list of document identifiers, a content change frequency of the corresponding document is determined and a first score for the document identifier that is a function of the determined content change frequency of the corresponding document is also determined. The first score is compared against a threshold value. The document is scheduled for crawling based on the result of the comparison. The content change frequency for a respective document identifier is determined by comparing information stored for successive downloads of the document corresponding to the document identifier.
    Type: Grant
    Filed: July 7, 2014
    Date of Patent: April 14, 2020
    Assignee: GOOGLE LLC
    Inventor: Keith H. Randall
  • Publication number: 20140324818
    Abstract: Systems and methods for scheduling document crawling are provided in which a list of document identifiers is obtained. Each respective document identifier identifies a corresponding document on a network. For each respective document identifier in the list of document identifiers, a content change frequency of the corresponding document is determined and a first score for the document identifier that is a function of the determined content change frequency of the corresponding document is also determined. The first score is compared against a threshold value. The document is scheduled for crawling based on the result of the comparison. The content change frequency for a respective document identifier is determined by comparing information stored for successive downloads of the document corresponding to the document identifier.
    Type: Application
    Filed: July 7, 2014
    Publication date: October 30, 2014
    Inventor: Keith H. Randall
  • Patent number: 8799883
    Abstract: The present invention relates generally to a system and method for measuring application memory use, and more particularly to measuring heap usage of each of a plurality of applications running inside a single heap. Preferred embodiments of the present invention work by traversing a set of objects in a heap. During this traversal, sets of strongly connected components are identified. Additionally, representative objects of the sets of strongly connected components are identified and a topological sort order of the objects is established. Further, during a second traversal of the objects, the topological sort order is used to identify one or more applications responsible for each of the strongly connected component sets. And, in the process, the resource usage of each application is computed.
    Type: Grant
    Filed: January 31, 2003
    Date of Patent: August 5, 2014
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventor: Keith H. Randall
  • Patent number: 8775403
    Abstract: A scheduler for a search engine crawler includes a history log containing document identifiers (e.g., URLs) corresponding to documents (e.g., web pages) on a network (e.g., Internet). The scheduler is configured to process each document identifier in a set of the document identifiers by determining a content change frequency of the document corresponding to the document identifier, determining a first score for the document identifier that is a function of the determined content change frequency of the corresponding document, comparing the first score against a threshold value, and scheduling the corresponding document for indexing based on the results of the comparison.
    Type: Grant
    Filed: April 17, 2012
    Date of Patent: July 8, 2014
    Assignee: Google Inc.
    Inventor: Keith H. Randall
  • Publication number: 20120317089
    Abstract: A scheduler for a search engine crawler includes a history log containing document identifiers (e.g., URLs) corresponding to documents (e.g., web pages) on a network (e.g., Internet). The scheduler is configured to process each document identifier in a set of the document identifiers by determining a content change frequency of the document corresponding to the document identifier, determining a first score for the document identifier that is a function of the determined content change frequency of the corresponding document, comparing the first score against a threshold value, and scheduling the corresponding document for indexing based on the results of the comparison.
    Type: Application
    Filed: April 17, 2012
    Publication date: December 13, 2012
    Inventor: Keith H. Randall
  • Patent number: 8161033
    Abstract: A scheduler for a search engine crawler includes a history log containing document identifiers (e.g., URLs) corresponding to documents (e.g., web pages) on a network (e.g., Internet). The scheduler is configured to process each document identifier in a set of the document identifiers by determining a content change frequency of the document corresponding to the document identifier, determining a first score for the document identifier that is a function of the determined content change frequency of the corresponding document, comparing the first score against a threshold value, and scheduling the corresponding document for indexing based on the results of the comparison.
    Type: Grant
    Filed: May 25, 2010
    Date of Patent: April 17, 2012
    Assignee: Google Inc.
    Inventor: Keith H. Randall
  • Publication number: 20100241621
    Abstract: A scheduler for a search engine crawler includes a history log containing document identifiers (e.g., URLs) corresponding to documents (e.g., web pages) on a network (e.g., Internet). The scheduler is configured to process each document identifier in a set of the document identifiers by determining a content change frequency of the document corresponding to the document identifier, determining a first score for the document identifier that is a function of the determined content change frequency of the corresponding document, comparing the first score against a threshold value, and scheduling the corresponding document for indexing based on the results of the comparison.
    Type: Application
    Filed: May 25, 2010
    Publication date: September 23, 2010
    Inventor: Keith H. Randall
  • Patent number: 7725452
    Abstract: A scheduler for a search engine crawler includes a history log containing document identifiers (e.g., URLs) corresponding to documents (e.g., web pages) on a network (e.g., Internet). The scheduler is configured to process each document identifier in a set of the document identifiers by determining a content change frequency of the document corresponding to the document identifier, determining a first score for the document identifier that is a function of the determined content change frequency of the corresponding document, comparing the first score against a threshold value, and scheduling the corresponding document for indexing based on the results of the comparison.
    Type: Grant
    Filed: May 20, 2004
    Date of Patent: May 25, 2010
    Assignee: Google Inc.
    Inventor: Keith H. Randall
  • Patent number: 7568034
    Abstract: A method of distributing files operates in a system having a master and a plurality of slaves, interconnected by a communications network. Each slave determines a current file length for each of a plurality of files and sends slave status information to the master, the slave status information including the current file length for each file. The master schedules copy operations based on the slave status information. The master stores bandwidth capability information indicating data transmission bandwidth capabilities for the resources required to transmit data between the slaves, and also stores bandwidth usage information indicating a total allocated bandwidth for each resource. For each scheduled copy operation, an amount of data transmission bandwidth is allocated and the stored bandwidth usage information is updated accordingly. The master only schedules copy operations that do not cause the total allocated bandwidth of any resource to exceed the bandwidth capability of that resource.
    Type: Grant
    Filed: July 3, 2003
    Date of Patent: July 28, 2009
    Assignee: Google Inc.
    Inventors: Daniel Dulitz, Sanjay Ghemawat, Bwolen Po-Jen Yang, Keith H. Randall, Anurag Acharya
  • Patent number: 7028039
    Abstract: A web crawler system includes a central processing unit for performing computations in accordance with stored procedures and a network interface for accessing remotely located computers via a network. A web crawler module downloads pages from remotely located servers via the network interface. A first link processing module obtains page link information from the downloaded page; the page link information includes for each downloaded page a row of page identifiers of other pages. A second link processing module encodes the rows of page identifiers in a space efficient manner. It arranges the rows of page identifiers in a particular order. For each respective row it identifies a prior row, if any, that best matches the respective row in accordance with predefined row match criteria, determines a set of deletes representing page identifiers in the identified prior row not in the respective row, and determines a set of adds representing page identifiers in the respective row not in the identified prior row.
    Type: Grant
    Filed: January 18, 2001
    Date of Patent: April 11, 2006
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Michael Burrows, Keith H. Randall, Raymond P. Stata, Rajiv G. Wickremesinghe
  • Publication number: 20040154016
    Abstract: The present invention relates generally to a system and method for measuring application memory use, and more particularly to measuring heap usage of each of a plurality of applications running inside a single heap. Preferred embodiments of the present invention work by traversing a set of objects in a heap. During this traversal, sets of strongly connected components are identified. Additionally, representative objects of the sets of strongly connected components are identified and a topological sort order of the objects is established. Further, during a second traversal of the objects, the topological sort order is used to identify one or more applications responsible for each of the strongly connected component sets. And, in the process, the resource usage of each application is computed.
    Type: Application
    Filed: January 31, 2003
    Publication date: August 5, 2004
    Inventor: Keith H. Randall
  • Publication number: 20030237079
    Abstract: A system and method that establishes a list of one or more possible field pairs, which comprise an array field and an integer field of an object included in a computer program. A portion of the computer program is then scanned for references to possible field pairs included in the list. Each possible field pair corresponding to an invalid combination of references is removed from the list. An invalid combination of references precludes confirmation of an invariant relationship of a given possible field pair. The field pairs remaining on the list after this removal process are considered actual field pairs. Next, the invariant relationship of the field pairs remaining on the list is confirmed. Machine code is then generated for the computer program such that array bounds checks corresponding to a given field pair are not included in the machine code if the invariant relationship is confirmed.
    Type: Application
    Filed: January 31, 2003
    Publication date: December 25, 2003
    Inventors: Aneesh Aggarwal, Keith H. Randall
  • Publication number: 20020138509
    Abstract: A web crawler system includes a central processing unit for performing computations in accordance with stored procedures and a network interface for accessing remotely located computers via a network. A web crawler module downloads pages from remotely located servers via the network interface. A first link processing module obtains page link information from the downloaded page; the page link information includes for each downloaded page a row of page identifiers of other pages. A second link processing module encodes the rows of page identifiers in a space efficient manner. It arranges the rows of page identifiers in a particular order. For each respective row it identifies a prior row, if any, that best matches the respective row in accordance with predefined row match criteria, determines a set of deletes representing page identifiers in the identified prior row not in the respective row, and determines a set of adds representing page identifiers in the respective row not in the identified prior row.
    Type: Application
    Filed: January 18, 2001
    Publication date: September 26, 2002
    Inventors: Michael Burrows, Keith H. Randall, Raymond P. Stata, Rajiv G. Wickremesinghe
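
The row encoding described in the web crawler abstracts (each row of page identifiers stored as a set of deletes and a set of adds relative to a best-matching prior row) can be illustrated with a minimal Python sketch. This is not the patented implementation. The function names `encode_rows` and `decode_rows`, the `window` parameter, and the match criterion (smallest number of deletes plus adds among the most recent rows) are all assumptions made for illustration; the abstracts say only that the criteria are "predefined".

```python
from typing import List, Optional, Tuple

Row = List[int]
# (index of the reference row or None, deletes, adds)
EncodedRow = Tuple[Optional[int], List[int], List[int]]


def encode_rows(rows: List[Row], window: int = 8) -> List[EncodedRow]:
    """Encode each row as deletes/adds against a prior row.

    Assumed match criterion: among the previous `window` rows, pick the
    one with the fewest deletes plus adds, and use it only if that is
    cheaper than listing the row's identifiers outright.
    """
    encoded: List[EncodedRow] = []
    for i, row in enumerate(rows):
        current = set(row)
        best_ref: Optional[int] = None
        best_cost = len(current)  # cost of encoding with no reference row
        for j in range(max(0, i - window), i):
            prior = set(rows[j])
            cost = len(prior - current) + len(current - prior)
            if cost < best_cost:
                best_ref, best_cost = j, cost
        base = set(rows[best_ref]) if best_ref is not None else set()
        deletes = sorted(base - current)  # in the prior row, not in this row
        adds = sorted(current - base)     # in this row, not in the prior row
        encoded.append((best_ref, deletes, adds))
    return encoded


def decode_rows(encoded: List[EncodedRow]) -> List[Row]:
    """Rebuild the rows (as sorted identifier lists) from the encoding."""
    rows: List[Row] = []
    for ref, deletes, adds in encoded:
        base = set(rows[ref]) if ref is not None else set()
        rows.append(sorted((base - set(deletes)) | set(adds)))
    return rows


if __name__ == "__main__":
    links = [[1, 2, 3], [1, 2, 3, 4], [2, 3, 4], [10, 11]]
    for entry in encode_rows(links):
        print(entry)
```

In this sketch, rows that share most of their identifiers with a recent row shrink to a short list of differences, while a row with nothing in common with its neighbors is stored in full with no reference row.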