Abstract: Systems and methods are provided for ranking document data retrieved from a data source in response to a search request. A ranking system retrieves document data from documents in the data source that each includes at least one key term that matches a search term in the search request. For each document, a term frequency value is calculated based on a number of occurrences of the key term in the document. Prefix and suffix term rules are used to determine whether a particular occurrence of the key term in a particular document should be included in determining a term weight value for that particular occurrence of the key term. A relevancy ranking value is determined for each document based on the corresponding term frequency and term weight values. The document data is displayed according to each document's corresponding relevancy ranking value.
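The ranking flow described above can be sketched as follows. This is a minimal illustration, not the claimed method: the abstract does not say what form the prefix and suffix term rules take or how term frequency and term weight are combined. The sketch assumes the rules are sets of neighboring words that exclude an occurrence from the weight, and that the relevancy value is the sum of the two. The names `TermRules`, `rank_documents`, and `weight_per_occurrence` are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TermRules:
    # Assumption: a rule is a set of neighboring words. An occurrence of the
    # key term preceded by an excluded prefix word, or followed by an excluded
    # suffix word, does not contribute to the term weight.
    excluded_prefixes: set = field(default_factory=set)
    excluded_suffixes: set = field(default_factory=set)


def rank_documents(documents, key_term, rules, weight_per_occurrence=1.0):
    """Return (doc_id, relevancy) pairs, highest relevancy first."""
    key = key_term.lower()
    scored = []
    for doc_id, text in documents.items():
        tokens = re.findall(r"\w+", text.lower())
        positions = [i for i, tok in enumerate(tokens) if tok == key]
        if not positions:
            continue  # only documents containing the key term are retrieved

        # Term frequency: every occurrence of the key term counts.
        term_frequency = len(positions)

        # Term weight: only occurrences that pass the prefix/suffix rules count.
        term_weight = 0.0
        for i in positions:
            prev_tok = tokens[i - 1] if i > 0 else None
            next_tok = tokens[i + 1] if i + 1 < len(tokens) else None
            if prev_tok in rules.excluded_prefixes:
                continue
            if next_tok in rules.excluded_suffixes:
                continue
            term_weight += weight_per_occurrence

        # Assumed combination: a plain sum of the two values.
        scored.append((doc_id, term_frequency + term_weight))

    # Display order: descending relevancy, ties broken by document id.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


docs = {
    "a": "Apple pie and apple juice",
    "b": "apple",
    "c": "banana",
}
ranking = rank_documents(docs, "apple", TermRules(excluded_suffixes={"juice"}))
```

In this example, document `a` contains the key term twice, but the occurrence followed by "juice" is excluded from the weight, so only one occurrence is weighted.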
Abstract: Systems and methods are provided for crawling and indexing documents stored in a data storage system. A crawler system processes multiple jobs that each correspond to crawling documents in the data storage system. Each job includes priority data and crawling instructions. The crawler system stores each job in a priority queue in a sequence based on the priority data. The crawler system assigns each job in the priority queue to a next available processing module for processing based on the stored sequence. Before processing each job, the crawler system determines whether to segment the job into smaller steps based on the corresponding crawling instructions. If the job is segmented, one of the smaller steps is processed to crawl a group of the documents in the data storage system. The remaining steps are stored in the priority queue to wait for processing.
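The queueing and segmentation behavior described above can be sketched as follows. This is a single-threaded illustration under stated assumptions, not the claimed system: a lower priority number is served first, the crawling instructions are reduced to a single `max_batch` size, and one loop stands in for the pool of processing modules. The names `Job`, `CrawlerQueue`, `submit`, and `process_next` are hypothetical.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class Job:
    priority: int   # assumption: a lower number is processed first
    sequence: int   # tie-breaker that preserves submission order
    documents: list = field(compare=False)
    max_batch: int = field(compare=False, default=2)  # stand-in for crawling instructions


class CrawlerQueue:
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def submit(self, priority, documents, max_batch=2):
        """Store a job in the priority queue according to its priority data."""
        job = Job(priority, next(self._counter), list(documents), max_batch)
        heapq.heappush(self._heap, job)

    def process_next(self, crawl):
        """Process the next job; return False when the queue is empty."""
        if not self._heap:
            return False
        job = heapq.heappop(self._heap)

        # Segment the job into steps of at most max_batch documents.
        steps = [
            job.documents[i:i + job.max_batch]
            for i in range(0, len(job.documents), job.max_batch)
        ]
        if steps:
            # Process only the first step now.
            for doc in steps[0]:
                crawl(doc)
            # The remaining steps go back into the queue to wait their turn.
            for step in steps[1:]:
                self.submit(job.priority, step, job.max_batch)
        return True


queue = CrawlerQueue()
queue.submit(priority=2, documents=["a1", "a2", "a3"], max_batch=2)
queue.submit(priority=1, documents=["b1"])

crawled = []
while queue.process_next(crawled.append):
    pass
```

Because the remaining steps re-enter the queue rather than running immediately, a higher-priority job submitted in the meantime would be served before them, which is one plausible motivation for segmenting large jobs.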