Patents by Inventor Keith H. Randall

Keith H. Randall has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Scheduler for search engine crawler

Patent number: 10621241

Abstract: Systems and methods for scheduling document crawling are provided in which a list of document identifiers is obtained. Each respective document identifier identifies a corresponding document on a network. For each respective document identifier in the list of document identifiers, a content change frequency of the corresponding document is determined and a first score for the document identifier that is a function of the determined content change frequency of the corresponding document is also determined. The first score is compared against a threshold value. The document is scheduled for crawling based on the result of the comparison. The content change frequency for a respective document identifier is determined by comparing information stored for successive downloads of the document corresponding to the document identifier.

Type: Grant

Filed: July 7, 2014

Date of Patent: April 14, 2020

Assignee: GOOGLE LLC

Inventor: Keith H. Randall
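
The scheduling steps described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `CrawlHistory` record, the use of stored checksums as the "information stored for successive downloads", the linear `first_score` function, and the default threshold of 0.5 are all assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CrawlHistory:
    """Hypothetical record: checksums stored on successive downloads of one document."""
    url: str
    checksums: List[str] = field(default_factory=list)


def content_change_frequency(history: CrawlHistory) -> float:
    """Fraction of successive download pairs whose stored checksum differs."""
    pairs = list(zip(history.checksums, history.checksums[1:]))
    if not pairs:
        return 0.0  # fewer than two downloads: no evidence of change yet
    changed = sum(1 for before, after in pairs if before != after)
    return changed / len(pairs)


def first_score(change_frequency: float, weight: float = 1.0) -> float:
    """Assumed scoring function: a simple linear function of change frequency."""
    return weight * change_frequency


def schedule_for_crawl(histories: List[CrawlHistory], threshold: float = 0.5) -> List[str]:
    """Return the document identifiers whose score meets the threshold."""
    return [
        h.url
        for h in histories
        if first_score(content_change_frequency(h)) >= threshold
    ]
```

With one document whose checksum changes on every download and one whose checksum never changes, only the first would be scheduled under these assumptions.
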
Scheduler for Search Engine Crawler

Publication number: 20140324818

Abstract: Systems and methods for scheduling document crawling are provided in which a list of document identifiers is obtained. Each respective document identifier identifies a corresponding document on a network. For each respective document identifier in the list of document identifiers, a content change frequency of the corresponding document is determined and a first score for the document identifier that is a function of the determined content change frequency of the corresponding document is also determined. The first score is compared against a threshold value. The document is scheduled for crawling based on the result of the comparison. The content change frequency for a respective document identifier is determined by comparing information stored for successive downloads of the document corresponding to the document identifier.

Type: Application

Filed: July 7, 2014

Publication date: October 30, 2014

Inventor: Keith H. Randall
System and method of measuring application resource usage

Patent number: 8799883

Abstract: The present invention relates generally to a system and method for measuring application memory use, and more particularly to measuring heap usage of each of a plurality of applications running inside a single heap. Preferred embodiments of the present invention work by traversing a set of objects in a heap. During this traversal, sets of strongly connected components are identified. Additionally, representative objects of the sets of strongly connected components are identified and a topological sort order of the objects is established. Further, during a second traversal of the objects, the topological sort order is used to identify one or more applications responsible for each of the strongly connected component sets. And, in the process, the resource usage of each application is computed.

Type: Grant

Filed: January 31, 2003

Date of Patent: August 5, 2014

Assignee: Hewlett-Packard Development Company, L.P.

Inventor: Keith H. Randall
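
The two-pass idea in the abstract above (find strongly connected components, then walk them in topological order to attribute them to applications) might look something like the sketch below. It uses Tarjan's algorithm as a stand-in, since the abstract does not name one. The `graph`, `sizes`, and `roots` inputs are hypothetical, and charging a shared component in full to every application that reaches it is an assumption, not something the abstract specifies.

```python
def tarjan_scc(graph):
    """Return strongly connected components in reverse topological order.

    `graph` maps each object to the objects it references.
    The last member of each component is its representative (the Tarjan root).
    """
    index, low = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = 0

    def visit(node):
        nonlocal counter
        index[node] = low[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for succ in graph.get(node, ()):
            if succ not in index:
                visit(succ)
                low[node] = min(low[node], low[succ])
            elif succ in on_stack:
                low[node] = min(low[node], index[succ])
        if low[node] == index[node]:
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            components.append(component)

    for node in graph:
        if node not in index:
            visit(node)
    return components


def heap_usage_by_app(graph, sizes, roots):
    """Charge each component's size to every application that can reach it.

    `roots` maps an application name to its root objects (hypothetical input).
    """
    components = tarjan_scc(graph)
    comp_of = {obj: i for i, comp in enumerate(components) for obj in comp}
    owners = [set() for _ in components]
    for app, app_roots in roots.items():
        for obj in app_roots:
            owners[comp_of[obj]].add(app)

    usage = {app: 0 for app in roots}
    # Reversing Tarjan's output gives a topological order (referrers first),
    # so a component's owner set is final by the time it is charged.
    for i in reversed(range(len(components))):
        component_size = sum(sizes[obj] for obj in components[i])
        for app in owners[i]:
            usage[app] += component_size
        for obj in components[i]:
            for succ in graph.get(obj, ()):
                owners[comp_of[succ]] |= owners[i]
    return usage
```

The recursive traversal is only suitable for small graphs; a real heap walk would need an iterative traversal to avoid exhausting the call stack.
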
Scheduler for search engine crawler

Patent number: 8775403

Abstract: A scheduler for a search engine crawler includes a history log containing document identifiers (e.g., URLs) corresponding to documents (e.g., web pages) on a network (e.g., Internet). The scheduler is configured to process each document identifier in a set of the document identifiers by determining a content change frequency of the document corresponding to the document identifier, determining a first score for the document identifier that is a function of the determined content change frequency of the corresponding document, comparing the first score against a threshold value, and scheduling the corresponding document for indexing based on the results of the comparison.

Type: Grant

Filed: April 17, 2012

Date of Patent: July 8, 2014

Assignee: Google Inc.

Inventor: Keith H. Randall
Scheduler for Search Engine Crawler

Publication number: 20120317089

Abstract: A scheduler for a search engine crawler includes a history log containing document identifiers (e.g., URLs) corresponding to documents (e.g., web pages) on a network (e.g., Internet). The scheduler is configured to process each document identifier in a set of the document identifiers by determining a content change frequency of the document corresponding to the document identifier, determining a first score for the document identifier that is a function of the determined content change frequency of the corresponding document, comparing the first score against a threshold value, and scheduling the corresponding document for indexing based on the results of the comparison.

Type: Application

Filed: April 17, 2012

Publication date: December 13, 2012

Inventor: Keith H. Randall
Scheduler for search engine crawler

Patent number: 8161033

Abstract: A scheduler for a search engine crawler includes a history log containing document identifiers (e.g., URLs) corresponding to documents (e.g., web pages) on a network (e.g., Internet). The scheduler is configured to process each document identifier in a set of the document identifiers by determining a content change frequency of the document corresponding to the document identifier, determining a first score for the document identifier that is a function of the determined content change frequency of the corresponding document, comparing the first score against a threshold value, and scheduling the corresponding document for indexing based on the results of the comparison.

Type: Grant

Filed: May 25, 2010

Date of Patent: April 17, 2012

Assignee: Google Inc.

Inventor: Keith H. Randall
Scheduler for Search Engine Crawler

Publication number: 20100241621

Abstract: A scheduler for a search engine crawler includes a history log containing document identifiers (e.g., URLs) corresponding to documents (e.g., web pages) on a network (e.g., Internet). The scheduler is configured to process each document identifier in a set of the document identifiers by determining a content change frequency of the document corresponding to the document identifier, determining a first score for the document identifier that is a function of the determined content change frequency of the corresponding document, comparing the first score against a threshold value, and scheduling the corresponding document for indexing based on the results of the comparison.

Type: Application

Filed: May 25, 2010

Publication date: September 23, 2010

Inventor: Keith H. Randall
Scheduler for search engine crawler

Patent number: 7725452

Abstract: A scheduler for a search engine crawler includes a history log containing document identifiers (e.g., URLs) corresponding to documents (e.g., web pages) on a network (e.g., Internet). The scheduler is configured to process each document identifier in a set of the document identifiers by determining a content change frequency of the document corresponding to the document identifier, determining a first score for the document identifier that is a function of the determined content change frequency of the corresponding document, comparing the first score against a threshold value, and scheduling the corresponding document for indexing based on the results of the comparison.

Type: Grant

Filed: May 20, 2004

Date of Patent: May 25, 2010

Assignee: Google Inc.

Inventor: Keith H. Randall
System and method for data distribution

Patent number: 7568034

Abstract: A method of distributing files operates in a system having a master and a plurality of slaves, interconnected by a communications network. Each slave determines a current file length for each of a plurality of files and sends slave status information to the master, the slave status information including the current file length for each file. The master schedules copy operations based on the slave status information. The master stores bandwidth capability information indicating data transmission bandwidth capabilities for the resources required to transmit data between the slaves, and also stores bandwidth usage information indicating a total allocated bandwidth for each resource. For each scheduled copy operation, an amount of data transmission bandwidth is allocated and the stored bandwidth usage information is updated accordingly. The master only schedules copy operations that do not cause the total allocated bandwidth of any resource to exceed the bandwidth capability of that resource.

Type: Grant

Filed: July 3, 2003

Date of Patent: July 28, 2009

Assignee: Google Inc.

Inventors: Daniel Dulitz, Sanjay Ghemawat, Bwolen Po-Jen Yang, Keith H. Randall, Anurag Acharya
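
The bandwidth admission rule in the abstract above can be illustrated with a small sketch. The `BandwidthLedger` class, its method names, and the resource names in the usage example are hypothetical; the sketch covers only the capacity check and bookkeeping, not how the master chooses which copies to attempt.

```python
class BandwidthLedger:
    """Tracks per-resource bandwidth capability and total allocated bandwidth."""

    def __init__(self, capacity):
        # capacity: mapping of resource name -> maximum bandwidth (any consistent unit)
        self.capacity = dict(capacity)
        self.allocated = {resource: 0 for resource in self.capacity}

    def try_schedule(self, resources, bandwidth):
        """Allocate `bandwidth` on every resource a copy needs, or allocate nothing.

        Returns True if the copy operation was scheduled.
        """
        for resource in resources:
            if self.allocated[resource] + bandwidth > self.capacity[resource]:
                return False  # would exceed this resource's capability
        for resource in resources:
            self.allocated[resource] += bandwidth
        return True

    def release(self, resources, bandwidth):
        """Return bandwidth to the pool once a copy operation finishes."""
        for resource in resources:
            self.allocated[resource] -= bandwidth
```

For example, with `BandwidthLedger({"link1": 100, "switch": 150})`, a copy needing 80 units on both resources is admitted, while a second copy needing 30 units on both is refused because `link1` would reach 110.
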
System and method for storing connectivity information in a web database

Patent number: 7028039

Abstract: A web crawler system includes a central processing unit for performing computations in accordance with stored procedures and a network interface for accessing remotely located computers via a network. A web crawler module downloads pages from remotely located servers via the network interface. A first link processing module obtains page link information from the downloaded page; the page link information includes for each downloaded page a row of page identifiers of other pages. A second link processing module encodes the rows of page identifiers in a space efficient manner. It arranges the rows of page identifiers in a particular order. For each respective row it identifies a prior row, if any, that best matches the respective row in accordance with predefined row match criteria, determines a set of deletes representing page identifiers in the identified prior row not in the respective row, and determines a set of adds representing page identifiers in the respective row not in the identified prior row.

Type: Grant

Filed: January 18, 2001

Date of Patent: April 11, 2006

Assignee: Hewlett-Packard Development Company, L.P.

Inventors: Michael Burrows, Keith H. Randall, Raymond P. Stata, Rajiv G. Wickremesinghe
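
The row encoding described in the abstract above can be sketched as a reference-plus-delta scheme. The abstract leaves the "predefined row match criteria" open, so this sketch assumes the best match is the prior row with the fewest combined adds and deletes, searched within a hypothetical `window` of preceding rows, and that a row is stored without a reference when no prior row makes it cheaper.

```python
def encode_rows(rows, window=8):
    """Encode each row as (reference row index or None, deletes, adds)."""
    encoded = []
    for i, row in enumerate(rows):
        current = set(row)
        # Baseline: no reference row, so every identifier is an "add".
        best = (None, [], sorted(current))
        best_cost = len(current)
        for j in range(max(0, i - window), i):
            prior = set(rows[j])
            deletes = prior - current  # in the prior row, not in this row
            adds = current - prior     # in this row, not in the prior row
            cost = len(deletes) + len(adds)
            if cost < best_cost:
                best = (j, sorted(deletes), sorted(adds))
                best_cost = cost
        encoded.append(best)
    return encoded


def decode_rows(encoded):
    """Rebuild the rows (as sorted lists) from their delta encoding."""
    rows = []
    for ref, deletes, adds in encoded:
        base = set(rows[ref]) if ref is not None else set()
        rows.append(sorted((base - set(deletes)) | set(adds)))
    return rows
```

Rows for pages that share most of their outgoing links then cost only a few identifiers each, and `decode_rows(encode_rows(rows))` recovers the original rows when each row is a sorted list of distinct identifiers.
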
System and method of measuring application resource usage

Publication number: 20040154016

Abstract: The present invention relates generally to a system and method for measuring application memory use, and more particularly to measuring heap usage of each of a plurality of applications running inside a single heap. Preferred embodiments of the present invention work by traversing a set of objects in a heap. During this traversal, sets of strongly connected components are identified. Additionally, representative objects of the sets of strongly connected components are identified and a topological sort order of the objects is established. Further, during a second traversal of the objects, the topological sort order is used to identify one or more applications responsible for each of the strongly connected component sets. And, in the process, the resource usage of each application is computed.

Type: Application

Filed: January 31, 2003

Publication date: August 5, 2004

Inventor: Keith H. Randall
System and method for identifying related fields

Publication number: 20030237079

Abstract: A system and method that establishes a list of one or more possible field pairs, which comprise an array field and an integer field of an object included in a computer program. A portion of the computer program is then scanned for references to possible field pairs included in the list. Each possible field pair corresponding to an invalid combination of references is removed from the list. An invalid combination of references precludes confirmation of an invariant relationship of a given possible field pair. The field pairs remaining on the list after this removal process are considered actual field pairs. Next, the invariant relationship of the field pairs remaining on the list is confirmed. Machine code is then generated for the computer program such that array bounds checks corresponding to a given field pair are not included in the machine code if the invariant relationship is confirmed.

Type: Application

Filed: January 31, 2003

Publication date: December 25, 2003

Inventors: Aneesh Aggarwal, Keith H. Randall
System and method for storing connectivity information in a web database

Publication number: 20020138509

Abstract: A web crawler system includes a central processing unit for performing computations in accordance with stored procedures and a network interface for accessing remotely located computers via a network. A web crawler module downloads pages from remotely located servers via the network interface. A first link processing module obtains page link information from the downloaded page; the page link information includes for each downloaded page a row of page identifiers of other pages. A second link processing module encodes the rows of page identifiers in a space efficient manner. It arranges the rows of page identifiers in a particular order. For each respective row it identifies a prior row, if any, that best matches the respective row in accordance with predefined row match criteria, determines a set of deletes representing page identifiers in the identified prior row not in the respective row, and determines a set of adds representing page identifiers in the respective row not in the identified prior row.

Type: Application

Filed: January 18, 2001

Publication date: September 26, 2002

Inventors: Michael Burrows, Keith H. Randall, Raymond P. Stata, Rajiv G. Wickremesinghe