Abstract: A method receives a second data set that is different from a first data set. Using an operation estimator, the method generates a total number of operations based on the second data set. Using a resource cost estimator, the method also generates an aggregate resource cost for the total number of operations based on the second data set. The method generates a simulation driver file including a sequence of operations from the total number of operations and a resource cost for each operation in the sequence of operations from the aggregate resource cost. The method simulates the sequence of operations by performing: requesting an amount of resource used by a respective operation on a simulated distributed computing system; reserving the amount of resource when available in the simulated distributed computing system without executing the respective operation; and calculating a time period associated with a simulated execution time of the respective operation.
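The request, reserve, and time-calculation steps described above can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the abstract: the `Operation` and `SimulatedCluster` types, the even split of the aggregate cost across operations in `build_driver`, the linear `seconds_per_unit` timing model, and the choice to raise an error when the resource is not available. The abstract does not specify any of these details.

```python
from dataclasses import dataclass


@dataclass
class Operation:
    name: str
    resource_cost: float  # hypothetical unit, e.g. memory or CPU share


class SimulatedCluster:
    """Tracks reserved capacity only; no operation is ever executed."""

    def __init__(self, capacity: float):
        self.capacity = capacity
        self.reserved = 0.0

    def try_reserve(self, amount: float) -> bool:
        # Reserve only when the requested amount is currently available.
        if self.reserved + amount <= self.capacity:
            self.reserved += amount
            return True
        return False

    def release(self, amount: float) -> None:
        self.reserved -= amount


def build_driver(total_ops: int, aggregate_cost: float) -> list:
    # Assumption: the aggregate cost is split evenly across operations.
    per_op = aggregate_cost / total_ops
    return [Operation(f"op-{i}", per_op) for i in range(total_ops)]


def simulate(ops, cluster, seconds_per_unit=1.0):
    """Return (name, start, end) tuples of simulated times, one per operation."""
    clock = 0.0
    timeline = []
    for op in ops:
        # Request the resource; reserve it without running the operation.
        if not cluster.try_reserve(op.resource_cost):
            raise RuntimeError(f"not enough resource for {op.name}")
        # Assumption: simulated time grows linearly with resource cost.
        duration = op.resource_cost * seconds_per_unit
        timeline.append((op.name, clock, clock + duration))
        clock += duration
        cluster.release(op.resource_cost)
    return timeline
```

In this sketch the operations run one after another, so the end time of the last entry in the returned timeline is the simulated duration of the whole sequence. A fuller model would presumably allow concurrent reservations and queue operations that cannot reserve yet.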
Abstract: In one embodiment, a method selects a new job to schedule for execution on a data processing system. The new job includes a classification. Performance information is retrieved for a set of current jobs that are being executed in the data processing system, where the current jobs are assigned to queues and are each classified with a current classification. The method analyzes the performance information to determine when one or more current jobs in the set of current jobs should be re-classified due to the resource usage of the respective current job when being executed in the data processing system, and re-classifies those current jobs in the queues. Then, the new job is assigned to one of the queues based on the classification of the new job and the classifications of the jobs in the queues, including the re-classified classifications for the one or more current jobs.
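The re-classify-then-assign flow described above can be sketched in Python as follows. The abstract does not define the classifications, the performance metric, or the assignment policy, so all of these are assumptions made for illustration: the labels `"light"` and `"heavy"`, a single usage fraction per job with a `heavy_threshold` cutoff, and a policy that places the new job in the queue currently holding the fewest jobs of the same classification.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: str
    classification: str  # hypothetical labels: "light" or "heavy"


@dataclass
class JobQueue:
    name: str
    jobs: list = field(default_factory=list)


def reclassify(queues, usage, heavy_threshold=0.75):
    """Update labels of current jobs from observed usage; return changed ids."""
    changed = []
    for queue in queues:
        for job in queue.jobs:
            if job.job_id not in usage:
                continue  # no performance information for this job
            observed = "heavy" if usage[job.job_id] >= heavy_threshold else "light"
            if observed != job.classification:
                job.classification = observed
                changed.append(job.job_id)
    return changed


def assign(new_job, queues):
    """Assumed policy: pick the queue with the fewest jobs of the same class."""
    def same_class_count(queue):
        return sum(1 for j in queue.jobs
                   if j.classification == new_job.classification)

    target = min(queues, key=same_class_count)
    target.jobs.append(new_job)
    return target
```

For example, suppose queue `q1` holds job `a` labeled "light" and queue `q2` holds job `b` labeled "heavy", and the observed usage is 0.9 for `a` and 0.2 for `b`. Calling `reclassify` flips both labels, so a new "heavy" job is then assigned to `q2` rather than `q1`, which is where it would have gone under the stale classifications.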