Patents by Inventor Keith H. Randall
Keith H. Randall has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10621241
Abstract: Systems and methods for scheduling document crawling are provided in which a list of document identifiers is obtained. Each respective document identifier identifies a corresponding document on a network. For each respective document identifier in the list of document identifiers, a content change frequency of the corresponding document is determined and a first score for the document identifier that is a function of the determined content change frequency of the corresponding document is also determined. The first score is compared against a threshold value. The document is scheduled for crawling based on the result of the comparison. The content change frequency for a respective document identifier is determined by comparing information stored for successive downloads of the document corresponding to the document identifier.
Type: Grant
Filed: July 7, 2014
Date of Patent: April 14, 2020
Assignee: GOOGLE LLC
Inventor: Keith H. Randall
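The scheduling steps in this abstract can be illustrated with a minimal sketch. It is not the patented implementation: the DownloadRecord type, the use of content checksums as the "information stored for successive downloads", the identity score function, and the "score >= threshold" rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DownloadRecord:
    """Hypothetical per-download record: fetch time plus a content checksum."""
    timestamp: float
    checksum: str


def content_change_frequency(history):
    """Estimate changes per unit time by comparing successive downloads."""
    if len(history) < 2:
        return 0.0
    changes = sum(
        1 for prev, cur in zip(history, history[1:])
        if prev.checksum != cur.checksum
    )
    elapsed = history[-1].timestamp - history[0].timestamp
    return changes / elapsed if elapsed > 0 else 0.0


def schedule_for_crawl(histories, threshold, score_fn=lambda freq: freq):
    """Return the document identifiers whose score meets the threshold.

    score_fn maps a change frequency to a score; the identity default and
    the >= comparison are assumptions, not taken from the abstract.
    """
    scheduled = []
    for doc_id, history in histories.items():
        score = score_fn(content_change_frequency(history))
        if score >= threshold:
            scheduled.append(doc_id)
    return scheduled


histories = {
    "http://example.com/news": [
        DownloadRecord(0, "x"), DownloadRecord(10, "y"), DownloadRecord(20, "z"),
    ],
    "http://example.com/about": [
        DownloadRecord(0, "x"), DownloadRecord(10, "x"), DownloadRecord(20, "x"),
    ],
}
print(schedule_for_crawl(histories, threshold=0.05))
```

In this sketch a document that changed on every download is scheduled, while one that never changed is not.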
-
Publication number: 20140324818
Abstract: Systems and methods for scheduling document crawling are provided in which a list of document identifiers is obtained. Each respective document identifier identifies a corresponding document on a network. For each respective document identifier in the list of document identifiers, a content change frequency of the corresponding document is determined and a first score for the document identifier that is a function of the determined content change frequency of the corresponding document is also determined. The first score is compared against a threshold value. The document is scheduled for crawling based on the result of the comparison. The content change frequency for a respective document identifier is determined by comparing information stored for successive downloads of the document corresponding to the document identifier.
Type: Application
Filed: July 7, 2014
Publication date: October 30, 2014
Inventor: Keith H. RANDALL
-
Patent number: 8799883
Abstract: The present invention relates generally to a system and method for measuring application memory use, and more particularly to measuring heap usage of each of a plurality of applications running inside a single heap. Preferred embodiments of the present invention work by traversing a set of objects in a heap. During this traversal, sets of strongly connected components are identified. Additionally, representative objects of the sets of strongly connected components are identified and a topological sort order of the objects is established. Further, during a second traversal of the objects, the topological sort order is used to identify one or more applications responsible for each of the strongly connected component sets. And, in the process, the resource usage of each application is computed.
Type: Grant
Filed: January 31, 2003
Date of Patent: August 5, 2014
Assignee: Hewlett-Packard Development Company, L. P.
Inventor: Keith H Randall
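The two-traversal idea in this abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented method: the heap is modeled as a dict mapping each object to the objects it references (every object is assumed to appear as a key), Tarjan's algorithm is a swapped-in choice for finding strongly connected components, and charging a component's full size to every application that can reach it is an assumed accounting policy. The names heap_usage_by_app, sizes, and app_roots are hypothetical.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm; components come out in reverse topological order.

    Recursive for brevity, so very deep object graphs would need an
    iterative variant.
    """
    index, low = {}, {}
    stack, on_stack, sccs = [], set(), []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            component = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.append(w)
                if w == v:
                    break
            sccs.append(component)

    for v in graph:
        if v not in index:
            visit(v)
    return sccs


def heap_usage_by_app(graph, sizes, app_roots):
    """First pass finds components; second pass walks them in topological
    order, pushing the set of responsible applications along references."""
    sccs = strongly_connected_components(graph)
    component_of = {obj: i for i, comp in enumerate(sccs) for obj in comp}
    owners = [set() for _ in sccs]
    for app, roots in app_roots.items():
        for root in roots:
            owners[component_of[root]].add(app)
    # Reversing Tarjan's output visits referrers before the things they reference.
    for i in reversed(range(len(sccs))):
        for obj in sccs[i]:
            for target in graph.get(obj, ()):
                j = component_of[target]
                if j != i:
                    owners[j] |= owners[i]
    usage = {app: 0 for app in app_roots}
    for i, comp in enumerate(sccs):
        size = sum(sizes[obj] for obj in comp)
        for app in owners[i]:
            usage[app] += size  # assumed policy: shared objects count for each app
    return usage


graph = {"a1": ["x"], "x": ["y"], "y": ["x"], "b1": ["y"], "z": []}
sizes = {"a1": 1, "x": 10, "y": 20, "b1": 2, "z": 5}
print(heap_usage_by_app(graph, sizes, {"A": ["a1"], "B": ["b1"]}))
```

Here "x" and "y" form one cycle reachable from both applications, so its 30 units are charged to each, while the unreferenced "z" is charged to neither.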
-
Patent number: 8775403
Abstract: A scheduler for a search engine crawler includes a history log containing document identifiers (e.g., URLs) corresponding to documents (e.g., web pages) on a network (e.g., Internet). The scheduler is configured to process each document identifier in a set of the document identifiers by determining a content change frequency of the document corresponding to the document identifier, determining a first score for the document identifier that is a function of the determined content change frequency of the corresponding document, comparing the first score against a threshold value, and scheduling the corresponding document for indexing based on the results of the comparison.
Type: Grant
Filed: April 17, 2012
Date of Patent: July 8, 2014
Assignee: Google Inc.
Inventor: Keith H. Randall
-
Publication number: 20120317089
Abstract: A scheduler for a search engine crawler includes a history log containing document identifiers (e.g., URLs) corresponding to documents (e.g., web pages) on a network (e.g., Internet). The scheduler is configured to process each document identifier in a set of the document identifiers by determining a content change frequency of the document corresponding to the document identifier, determining a first score for the document identifier that is a function of the determined content change frequency of the corresponding document, comparing the first score against a threshold value, and scheduling the corresponding document for indexing based on the results of the comparison.
Type: Application
Filed: April 17, 2012
Publication date: December 13, 2012
Inventor: Keith H. Randall
-
Patent number: 8161033
Abstract: A scheduler for a search engine crawler includes a history log containing document identifiers (e.g., URLs) corresponding to documents (e.g., web pages) on a network (e.g., Internet). The scheduler is configured to process each document identifier in a set of the document identifiers by determining a content change frequency of the document corresponding to the document identifier, determining a first score for the document identifier that is a function of the determined content change frequency of the corresponding document, comparing the first score against a threshold value, and scheduling the corresponding document for indexing based on the results of the comparison.
Type: Grant
Filed: May 25, 2010
Date of Patent: April 17, 2012
Assignee: Google Inc.
Inventor: Keith H. Randall
-
Publication number: 20100241621
Abstract: A scheduler for a search engine crawler includes a history log containing document identifiers (e.g., URLs) corresponding to documents (e.g., web pages) on a network (e.g., Internet). The scheduler is configured to process each document identifier in a set of the document identifiers by determining a content change frequency of the document corresponding to the document identifier, determining a first score for the document identifier that is a function of the determined content change frequency of the corresponding document, comparing the first score against a threshold value, and scheduling the corresponding document for indexing based on the results of the comparison.
Type: Application
Filed: May 25, 2010
Publication date: September 23, 2010
Inventor: Keith H. Randall
-
Patent number: 7725452
Abstract: A scheduler for a search engine crawler includes a history log containing document identifiers (e.g., URLs) corresponding to documents (e.g., web pages) on a network (e.g., Internet). The scheduler is configured to process each document identifier in a set of the document identifiers by determining a content change frequency of the document corresponding to the document identifier, determining a first score for the document identifier that is a function of the determined content change frequency of the corresponding document, comparing the first score against a threshold value, and scheduling the corresponding document for indexing based on the results of the comparison.
Type: Grant
Filed: May 20, 2004
Date of Patent: May 25, 2010
Assignee: Google Inc.
Inventor: Keith H. Randall
-
Patent number: 7568034
Abstract: A method of distributing files operates in a system having a master and a plurality of slaves, interconnected by a communications network. Each slave determines a current file length for each of a plurality of files and sends slave status information to the master, the slave status information including the current file length for each file. The master schedules copy operations based on the slave status information. The master stores bandwidth capability information indicating data transmission bandwidth capabilities for the resources required to transmit data between the slaves, and also stores bandwidth usage information indicating a total allocated bandwidth for each resource. For each scheduled copy operation, an amount of data transmission bandwidth is allocated and the stored bandwidth usage information is updated accordingly. The master only schedules copy operations that do not cause the total allocated bandwidth of any resource to exceed the bandwidth capability of that resource.
Type: Grant
Filed: July 3, 2003
Date of Patent: July 28, 2009
Assignee: Google Inc.
Inventors: Daniel Dulitz, Sanjay Ghemawat, Bwolen Po-Jen Yang, Keith H. Randall, Anurag Acharya
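The bandwidth admission rule in this abstract can be sketched in a few lines. This is an illustrative assumption of how such bookkeeping might look, not the patented method: the request tuples, the resource names, and the first-come greedy ordering are all hypothetical.

```python
def schedule_copies(requests, capacity):
    """Admit copy operations without overcommitting any resource.

    requests: iterable of (copy_id, resources_used, bandwidth) tuples.
    capacity: dict mapping each resource to its bandwidth capability.
    Returns the admitted copy ids and the resulting per-resource allocation.
    """
    allocated = {resource: 0 for resource in capacity}
    scheduled = []
    for copy_id, resources, bandwidth in requests:
        fits = all(
            allocated[r] + bandwidth <= capacity[r] for r in resources
        )
        if fits:
            # Record the allocation so later requests see the updated usage.
            for r in resources:
                allocated[r] += bandwidth
            scheduled.append(copy_id)
    return scheduled, allocated


capacity = {"link1": 100, "link2": 50}
requests = [
    ("c1", ["link1", "link2"], 40),
    ("c2", ["link2"], 20),
    ("c3", ["link1"], 60),
]
print(schedule_copies(requests, capacity))
```

In this example "c2" is skipped because it would push "link2" past its capability, while "c3" still fits on "link1".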
-
Patent number: 7028039
Abstract: A web crawler system includes a central processing unit for performing computations in accordance with stored procedures and a network interface for accessing remotely located computers via a network. A web crawler module downloads pages from remotely located servers via the network interface. A first link processing module obtains page link information from the downloaded page; the page link information includes for each downloaded page a row of page identifiers of other pages. A second link processing module encodes the rows of page identifiers in a space efficient manner. It arranges the rows of page identifiers in a particular order. For each respective row it identifies a prior row, if any, that best matches the respective row in accordance with predefined row match criteria, determines a set of deletes representing page identifiers in the identified prior row not in the respective row, and determines a set of adds representing page identifiers in the respective row not in the identified prior row.
Type: Grant
Filed: January 18, 2001
Date of Patent: April 11, 2006
Assignee: Hewlett-Packard Development Company, L.P.
Inventors: Michael Burrows, Keith H. Randall, Raymond P. Stata, Rajiv G. Wickremesinghe
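The row encoding described in this abstract can be sketched as a delta against a prior row. The match criterion used here (the smallest combined number of adds and deletes among the previous few rows, controlled by a hypothetical window parameter) is an assumption; the abstract only says "predefined row match criteria".

```python
def encode_rows(rows, window=8):
    """Encode each row as (reference_row_index, deletes, adds).

    A reference of None means no prior row helped, so the whole row is
    stored as adds.
    """
    encoded = []
    for i, row in enumerate(rows):
        current = set(row)
        best = (None, [], sorted(current))
        best_cost = len(current)
        for j in range(max(0, i - window), i):
            prior = set(rows[j])
            deletes = sorted(prior - current)  # in the prior row, not in this one
            adds = sorted(current - prior)     # in this row, not in the prior one
            cost = len(deletes) + len(adds)
            if cost < best_cost:
                best, best_cost = (j, deletes, adds), cost
        encoded.append(best)
    return encoded


def decode_rows(encoded):
    """Rebuild the sorted rows from their delta encoding."""
    rows = []
    for reference, deletes, adds in encoded:
        base = set(rows[reference]) if reference is not None else set()
        rows.append(sorted((base - set(deletes)) | set(adds)))
    return rows


rows = [[1, 2, 3], [1, 2, 3, 4], [7, 8], [2, 3, 4]]
encoded = encode_rows(rows)
print(encoded)
print(decode_rows(encoded) == rows)
```

Rows that share most of their page identifiers with a nearby row shrink to a short list of differences, while an unrelated row such as [7, 8] is stored in full.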
-
Publication number: 20040154016
Abstract: The present invention relates generally to a system and method for measuring application memory use, and more particularly to measuring heap usage of each of a plurality of applications running inside a single heap. Preferred embodiments of the present invention work by traversing a set of objects in a heap. During this traversal, sets of strongly connected components are identified. Additionally, representative objects of the sets of strongly connected components are identified and a topological sort order of the objects is established. Further, during a second traversal of the objects, the topological sort order is used to identify one or more applications responsible for each of the strongly connected component sets. And, in the process, the resource usage of each application is computed.
Type: Application
Filed: January 31, 2003
Publication date: August 5, 2004
Inventor: Keith H. Randall
-
Publication number: 20030237079
Abstract: A system and method that establishes a list of one or more possible field pairs, which comprise an array field and an integer field of an object included in a computer program. A portion of the computer program is then scanned for references to possible field pairs included in the list. Each possible field pair corresponding to an invalid combination of references is removed from the list. An invalid combination of references precludes confirmation of an invariant relationship of a given possible field pair. The field pairs remaining on the list after this removal process are considered actual field pairs. Next, the invariant relationship of the field pairs remaining on the list is confirmed. Machine code is then generated for the computer program such that array bounds checks corresponding to a given field pair are not included in the machine code if the invariant relationship is confirmed.
Type: Application
Filed: January 31, 2003
Publication date: December 25, 2003
Inventors: Aneesh Aggarwal, Keith H. Randall
-
Publication number: 20020138509
Abstract: A web crawler system includes a central processing unit for performing computations in accordance with stored procedures and a network interface for accessing remotely located computers via a network. A web crawler module downloads pages from remotely located servers via the network interface. A first link processing module obtains page link information from the downloaded page; the page link information includes for each downloaded page a row of page identifiers of other pages. A second link processing module encodes the rows of page identifiers in a space efficient manner. It arranges the rows of page identifiers in a particular order. For each respective row it identifies a prior row, if any, that best matches the respective row in accordance with predefined row match criteria, determines a set of deletes representing page identifiers in the identified prior row not in the respective row, and determines a set of adds representing page identifiers in the respective row not in the identified prior row.
Type: Application
Filed: January 18, 2001
Publication date: September 26, 2002
Inventors: Michael Burrows, Keith H. Randall, Raymond P. Stata, Rajiv G. Wickremesinghe