Method for dynamic scheduling in a distributed environment
A method and system are provided for assigning programs in a workflow to one or more nodes for execution. Prior to the assignment, a priority of execution of each program is calculated in relation to its dependency upon data received and data transmitted. Based upon the calculated priority and the state of each of the nodes, the programs in the workflow are dynamically assigned to one or more nodes for execution. In addition to the node assignment based upon priority, applicability of preemptive execution of the programs in the workflow is determined, so that, in response to the determination, the programs in the workflow may be executed non-preemptively at a selected node.
1. Technical Field
This invention relates to a method and system for dynamically scheduling programs for execution on one or more nodes.
2. Description of the Prior Art
A directed acyclic graph (DAG) includes a set of nodes connected by a set of edges. Each node represents a task, and the weight of the node is the execution time of the task. Each edge represents a message transferred from one node to another node, with its weight being the transmission time of the message. Scheduling programs for execution on processors is a crucial component of a parallel processing system. There are generally two categories of prior art schedulers using DAGs: centralized and decentralized (not shown). An example of a centralized scheduler (10) is shown in
In the decentralized scheduler, a plurality of independent schedulers are provided. The benefit associated with the decentralized scheduler is its scalability in a multinode system. However, the negative aspect of the decentralized scheduler is the complexity of control and communication among the schedulers required to efficiently allocate resources in a sequential manner, so as to reduce operation and transmission costs associated with transferring data across nodes for execution of dependent programs. Accordingly, there is an increased communication cost associated with a decentralized scheduler.
There is therefore a need for a method and system to efficiently assign resources based upon a plurality of execution requests for a set of programs having execution dependency with costs associated with data transfer and processing accounted for in a dynamic manner.
SUMMARY OF THE INVENTION
This invention comprises a method and system for dynamically scheduling execution of a program among two or more processor nodes.
In one aspect of the invention a method is provided for assigning resources to a plurality of processing nodes. Priority of execution dependency of a program is decided. In response to the decision, the program is dynamically assigned to a node based upon the priority and in accordance with a state of each node in a multinode system. Preemptive execution of the program is determined, and the program is executed at a designated node non-preemptively in response to a positive determination.
In another aspect of the invention, a system is provided with a plurality of operating nodes, and a scheduling manager to decide priority of execution dependency of a program. A global scheduler is also provided to dynamically assign the program to one of the nodes based upon the priority and a state of each node in the system. In addition, a program manager is provided to determine applicability of preemptive execution of the program, and to non-preemptively execute the program at a designated node in response to a positive determination.
In a further aspect of the invention, an article is provided with a computer-readable signal-bearing medium with a plurality of operating nodes in the medium. Means in the medium are provided for deciding priority of execution dependency of a program. In addition, means in the medium are provided for dynamically assigning the program to one of the nodes based upon the priority and a state of each node in the system. Means in the medium are provided for determining applicability of preemptive execution of the program, and to non-preemptively execute the program at a designated node in response to a positive determination.
Other features and advantages of this invention will become apparent from the following detailed description of the presently preferred embodiment of the invention, taken in conjunction with the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
A grid environment (50) is shown in
As mentioned above, the workflow analysis (112) of
Following the sorting process at step (178), priority is assigned to each group (180). The process of assigning priority to each group is applied recursively for each program constituting the strongly connected component group (182) by returning to step (172). The decision of priority Pi is given to group Gi,s and the priority Pi,j is given to the jth group Gi,j in a range of Pi<Pi,j<Pj+1, such that Pi,j<Pj+i in the sequence acquired by topologically sorting the DAG created by excluding the input into Gi,s as the root. The purpose of normalizing the priority of each program is to enable programs in different program sets to be executed with the same presence. That is, when there are nodes for computing and the program sets have an equal total computation time, in situations when program sets request execution at the same time, the computation can be ended at the same time given the equal computation time between the sets. However, in a case where a program set includes a preferential request, the request includes a weight value. The priority assigned to the program is then multiplied by the weight value and applied to the scheduling method described above. Accordingly, it is required that the programs within the groups be recursively split into strongly connected components to decide the priority.
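The recursive priority assignment described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the implementation of the described embodiment: the function names, the use of Tarjan's algorithm to find strongly connected components, the normalization of priorities to the interval [0, 1), the convention that a smaller value means earlier execution, and the rule for choosing the root of a group are all hypothetical choices made for this example.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm. Returns SCCs in reverse topological order."""
    index, low, stack, on_stack, sccs = {}, {}, [], set(), []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:
            # v is the first-visited member of its component; pop the component.
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            sccs.append(comp)

    for v in graph:
        if v not in index:
            visit(v)
    return sccs


def assign_priorities(graph, lo=0.0, hi=1.0, out=None):
    """Give every program a priority in [lo, hi); smaller runs earlier (assumption)."""
    if out is None:
        out = {}
    # Reversing Tarjan's output yields a topological order of the groups.
    groups = list(reversed(strongly_connected_components(graph)))
    if not groups:
        return out
    step = (hi - lo) / len(groups)
    for i, group in enumerate(groups):
        p_i = lo + i * step
        if len(group) == 1:
            out[group[0]] = p_i
            continue
        members = set(group)
        # Root (assumption): a member fed from outside the group, else the smallest.
        root = next((w for v, succs in graph.items() if v not in members
                     for w in succs if w in members), min(group))
        # Exclude the input into the root so the group can be sorted again.
        sub = {v: [w for w in graph.get(v, ()) if w in members and w != root]
               for v in group}
        # Sub-priorities stay between this group's priority and the next group's.
        assign_priorities(sub, p_i, p_i + step, out)
    return out


# Hypothetical workflow: B and C form a cycle fed by A and feeding D.
workflow = {"A": ["B"], "B": ["C"], "C": ["B", "D"], "D": []}
priorities = assign_priorities(workflow)
```

In this sketch the cyclic group {B, C} receives a priority slot between A and D, and its members are ordered inside that slot after the edge into the root B is dropped. A weight value for a preferential request, as described above, could be applied by multiplying each resulting priority by that weight.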
Following the assignment of priority to a group of programs, as well as each program within a group (152), a test is conducted to determine if the program or set of programs can be assigned to a logical node to minimize the transfer of data between programs when analyzing execution dependency (154). The determination at step (154) is based upon whether the computation and/or transmission costs can be estimated.
Following the process of calculating the costs associated with execution of a program or group of programs, each of the programs or program groups is assigned to one or more logical nodes (156). The assignment to the logical nodes is stored (158) in the workflow database (64) of the global scheduler (60) and is utilized for scheduling execution of associated programs on actual nodes.
Thereafter, the execution condition of the next program is checked, and the program is submitted to the queue (254). Step (254) includes providing a priority parameter, pi, to a newly executable program. The priority parameter is defined as pi={bi, di, mi}, where bi is the priority given to the entire program, di is the priority based on the dependency relation of each program in the execution dependency, and mi is the priority based on the correspondence relation between the logical node assignment and the actual node assignment. The priority mi is highest when the node to be assigned matches the actual node mapped from the logical node for the program. The next highest priority applies when the logical node has not been assigned to an actual node, and the lowest priority applies when the node to be assigned to the program(s) differs from the mapped assignment. The entries in the wait queue are sorted based upon the priority parameters. The sorting is made in the following order of precedence: mi, then di, then bi, i.e. after the sorting based on mi is complete, the sorting is then based on di, followed by sorting based on bi. Following step (254), a node capable of executing a program or a set of programs is selected (256). The node selection process is based upon previously calculated costs, priority, and availability. A test is then conducted (258) to determine if the node selected at step (256) exists. A negative response to the test at step (258) will result in a return to step (252). However, a positive response to the test at step (258) will result in selection of a program or a set of programs for transfer from the logical node assignment to the physical node (260). A test is then conducted to determine if the program(s) exist (262). If the response to the test at step (262) is negative, the scheduling returns to step (256). However, if the response to the test at step (262) is positive, a new map is created and the program is assigned to the actual node (264).
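The wait queue ordering described above can be sketched as a sort on a composite key. The following Python example is only an illustration: the names `Match`, `Entry`, `m_priority`, and `sort_wait_queue` are hypothetical, and the convention that smaller values of bi and di sort first is an assumption, since the description does not fix a numeric direction for those two components.

```python
from dataclasses import dataclass
from enum import IntEnum


class Match(IntEnum):
    """Levels of the m component; a smaller value sorts first."""
    MATCHED = 0      # candidate node equals the actual node mapped from the logical node
    UNMAPPED = 1     # the logical node has not yet been mapped to an actual node
    MISMATCHED = 2   # candidate node differs from the mapped actual node


@dataclass
class Entry:
    name: str
    b: int             # priority given to the entire program
    d: int             # priority from the dependency relation
    logical_node: str  # logical node assigned during workflow analysis


def m_priority(entry, candidate_node, logical_to_actual):
    """Compute the m component for one entry relative to a candidate node."""
    actual = logical_to_actual.get(entry.logical_node)
    if actual is None:
        return Match.UNMAPPED
    return Match.MATCHED if actual == candidate_node else Match.MISMATCHED


def sort_wait_queue(queue, candidate_node, logical_to_actual):
    """Sort by m first, then d, then b (smaller d and b first is an assumption)."""
    return sorted(
        queue,
        key=lambda e: (m_priority(e, candidate_node, logical_to_actual), e.d, e.b),
    )


# Hypothetical queue: logical nodes L1 and L2 are already mapped, L3 is not.
mapping = {"L1": "n1", "L2": "n2"}
queue = [
    Entry("a", b=0, d=0, logical_node="L2"),
    Entry("b", b=0, d=5, logical_node="L3"),
    Entry("c", b=1, d=3, logical_node="L1"),
    Entry("d", b=2, d=1, logical_node="L1"),
]
ordered = sort_wait_queue(queue, "n1", mapping)
```

Using a tuple as the sort key gives the stated precedence directly: entries are compared on the m component, ties are broken on d, and remaining ties on b.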
Thereafter, required data transmission is requested for the program input (266), the program is submitted to the physical node's local queue (268), and a program assignment event is generated (270) followed by a return to step (260). Accordingly, the process of scheduling and executing a program includes mapping the program to an actual node for execution.
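The mapping of programs from logical nodes to actual nodes in steps (256) through (270) can be sketched as a simple dispatch loop. The Python example below is a simplified illustration rather than the described scheduler: the `dispatch` function, the per-node slot count used to decide whether a node is capable of executing a program, and the event tuples are all assumptions introduced for this example, and the wait queue is assumed to be already sorted by priority.

```python
def dispatch(wait_queue, free_slots, logical_to_actual):
    """Assign queued (program, logical_node) pairs to actual nodes.

    wait_queue: priority-sorted list of (program, logical_node); mutated in place.
    free_slots: dict mapping actual node -> number of programs it can accept (assumption).
    logical_to_actual: dict mapping logical node -> actual node; updated in place.
    """
    local_queues = {node: [] for node in free_slots}
    events = []
    for node in free_slots:                                   # step 256: select a node
        while free_slots[node] > 0:
            # Step 260: first program whose logical node is unmapped
            # or already mapped to this actual node.
            pick = next(
                (e for e in wait_queue
                 if logical_to_actual.get(e[1], node) == node),
                None,
            )
            if pick is None:                                  # step 262: none exists
                break                                         # back to node selection
            program, logical = pick
            logical_to_actual[logical] = node                 # step 264: create the map
            events.append(("transfer_requested", program, node))  # step 266
            local_queues[node].append(program)                # step 268: local queue
            events.append(("assigned", program, node))        # step 270: assignment event
            wait_queue.remove(pick)
            free_slots[node] -= 1
    return local_queues, events


# Hypothetical input: two logical nodes, two actual nodes with two slots each.
queue = [("p1", "L1"), ("p2", "L1"), ("p3", "L2"), ("p4", "L2")]
mapping = {}
queues, log = dispatch(queue, {"n1": 2, "n2": 2}, mapping)
```

Once a logical node is mapped to an actual node, later programs from the same logical node follow it there, which is what keeps dependent programs together and limits data transfer between nodes.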
The global scheduler dynamically assigns resources while optimizing overhead. Assignment of a workflow to a logical node is employed to mitigate communication and transmission costs associated with execution of a plurality of programs in the workflow by a plurality of nodes in the system. The priority of each program is normalized, and the programs are sorted in order of priority. Accordingly, the use of the global scheduler in conjunction with logical node assignments supports cost-effective assignment of programs in a workflow to an optimal node.
Alternative Embodiments
It will be appreciated that, although specific embodiments of the invention have been described herein for purposes of illustration, various modifications may be made without departing from the spirit and scope of the invention. In particular, the assignment of programs in a workflow to a logical node to determine communication and transmission costs may be removed to allow the programs to be forwarded directly to a node having a local queue. Accordingly, the scope of protection of this invention is limited only by the following claims and their equivalents.
Claims
1. A method for assigning resources to a plurality of processing nodes comprising:
- deciding priority of execution dependency of a program;
- dynamically assigning said program to a node based upon said priority and in accordance with a state of each node in a multinode system;
- determining preemptive execution of said program; and
- executing said program at a designated node non-preemptively in response to a positive determination.
2. The method of claim 1, wherein said priority is based upon a criterion selected from a group consisting of: topological sorting method, and a shortest path length from a start program.
3. The method of claim 1, wherein the step of deciding priority of execution dependency of a program includes normalizing said priority.
4. The method of claim 1, wherein the step of deciding priority of execution dependency of a program includes assigning said program to a logical node based upon an estimated computation and transmission cost.
5. The method of claim 4, further comprising storing said logical node assignment in a workflow database of a global scheduler based upon said estimated costs.
6. The method of claim 1, wherein the step of dynamically assigning said program to a node includes assigning said program to a physical node at time of execution.
7. The method of claim 1, wherein the step of dynamically assigning said program to a node includes assigning said program to a logical node based upon estimated computation and transmission costs.
8. A system comprising:
- a plurality of operating nodes;
- a scheduling manager adapted to decide priority of execution dependency of a program;
- a global scheduler adapted to dynamically assign said program to a node based upon said priority and a state of each node in said system; and
- a program manager adapted to determine applicability of preemptive execution of said program, and to non-preemptively execute said program at a designated node in response to a positive determination.
9. The system of claim 8, wherein said priority is based upon a criterion selected from a group consisting of: topological sorting method, and a shortest path length from a start program.
10. The system of claim 8, wherein said scheduling manager is adapted to normalize said priority.
11. The system of claim 8, wherein said scheduling manager is adapted to assign said program to a logical node based upon estimated computation and transmission costs.
12. The system of claim 11, further comprising a workflow database adapted to store said logical node assignment based upon said estimated costs.
13. The system of claim 8, wherein said global scheduler is adapted to assign said program to a physical node at time of execution.
14. An article comprising:
- a computer-readable signal-bearing medium;
- a plurality of operating nodes in said medium;
- means in said medium for deciding priority of execution dependency of a program;
- means in said medium for dynamically assigning said program to one of said nodes based upon said priority and a state of each node in said system; and
- means in said medium for determining applicability of preemptive execution of said program, and to non-preemptively execute said program at a designated node in response to a positive determination.
15. The article of claim 14, wherein said medium is selected from a group consisting of: a recordable data storage medium, and a modulated carrier signal.
16. The article of claim 14, wherein said means for deciding priority of execution dependency includes criteria selected from a group consisting of: topological sorting method, and a shortest path length from a start program.
17. The article of claim 14, wherein said means for dynamically assigning said program to a node based upon said priority and a state of each node in said system normalizes priority of execution.
18. The article of claim 14, wherein said means for deciding priority of execution dependency of a program includes assigning said program to a node based upon said priority and a state of each node in said system assigns said program to a logical node based upon estimated computation and transmission costs.
19. The article of claim 18, further comprising means in the medium for storing said logical node assignment based upon said estimated costs.
20. The article of claim 14, wherein said means for dynamically assigning said program to one of said nodes based upon said priority and state of each node includes assigning said program to a physical node at time of execution.
Type: Application
Filed: Nov 22, 2004
Publication Date: May 25, 2006
Inventors: Masaaki Taniguchi (Yamato-shi), Harunobu Kubo (Yamato-shi)
Application Number: 10/994,852
International Classification: G06F 9/46 (20060101);