SERVER FARM AND METHOD FOR OPERATING THE SAME

A method for operating a server farm with a plurality of servers operably connected with each other includes: receiving a job request of a computational task to be handled by the server farm; determining, from the plurality of servers, one or more servers operable to accept the job request; determining a respective effective energy efficiency value associated with at least the one or more servers; and assigning the computational task to a server with the highest effective energy efficiency value. The effective energy efficiency value is defined by a service rate of the respective server divided by a difference between an energy consumption rate value when the respective server is busy and an energy consumption rate value when the respective server is idle. The present invention also relates to a server farm operated by the method.

Skip to: Description  ·  Claims  · Patent History  ·  Patent History
Description
TECHNICAL FIELD

The present invention relates to a system and method for operating a server farm, and particularly, although not exclusively, to an asymptotically optimal job assignment method for operating an energy-efficient processor sharing server farms.

BACKGROUND

Data centers with server farms are essential to the functioning of computer systems in different applications and sectors in the modern economy. Generally, server farms in data centers include a large number of servers that consume power during operation to process and handle jobs or computational tasks. These servers account for the major portion of energy consumption of data centers.

Since excessive power consumption in server farm may increase operation cost and cause environmental concerns, various approaches have been proposed to optimize energy utilization in server farms. In one example, speed scaling is applied to control server speed. In another example, right-sizing of server farms is applied by powering servers on/off according to traffic load.

Rapid improvements in computer hardware have resulted in frequent upgrades of parts of the server farms, and this has led to server farms with different computer resources (heterogeneous servers) being deployed. The heterogeneity of servers in server farm significantly complicates the optimization of energy utilization. Therefore, there remains a need for server farm designers and/or operators to devise an optimal strategy in operating and managing server farms so as to conserve energy and maximize the effective energy efficiency of server farms.

SUMMARY OF THE INVENTION

In accordance with a first aspect of the present invention, there is provided a method for operating a server farm with a plurality of servers operably connected with each other, the method comprising the steps of: receiving a job request of a computational task to be handled by the server farm; determining, from the plurality of servers, one or more servers operable to accept the job request; determining a respective effective energy efficiency value associated with at least the one or more servers; and assigning the computational task to a server with the highest effective energy efficiency value; wherein the effective energy efficiency value is defined by: a service rate of the respective server divided by a difference between an energy consumption rate value when the respective server is busy (performing computational tasks) and an energy consumption rate value when the respective server is idle (not performing computational tasks). Preferably, the method steps can be in different order as listed as long as they could be logically rearranged. For example, the job request could be received after the one or more servers operable to accept the job request are determined. Optionally, the respective effective energy efficiency values associated with all of the servers, instead of only those operable to accept the job request, are determined.

In one embodiment of the first aspect, the method further comprises sorting the one or more servers according to the respective determined effective energy efficiency values. The sorting could be in ascending or descending order.

In one embodiment of the first aspect, the step of determining from the plurality of servers one or more servers operable to accept the job request comprises determining, from the plurality of servers, all servers operable to accept the job request.

In one embodiment of the first aspect, the plurality of servers cannot be powered off during operation of the server farm. In one embodiment of the first aspect, the plurality of servers cannot be powered off during operation.

In one embodiment of the first aspect, assignment of computation tasks in the server farm is substantially independent of an arrival rate of computation tasks at the server farm.

In one embodiment of the first aspect, assignment of computation tasks in the server farm is substantially independent of a respective size of the computation tasks received at the server farm.

In one embodiment of the first aspect, the plurality of servers each includes a finite buffer for queuing job requests.

In one embodiment of the first aspect, the one or more servers operable to accept the job request each has at least one vacancy in their respective buffer.

In one embodiment of the first aspect, the server farm is heterogeneous in that some or all of the plurality of servers can have different server speeds, energy consumption rates, and/or buffer sizes.

In one embodiment of the first aspect, the server farm is a non-jockeying server farm in which computational task being handled by one of the plurality of servers cannot be reassigned to other servers.

In accordance with a second aspect of the present invention, there is provided a system for operating a server farm with a plurality of servers operably connected with each other, the system comprising one or more processors arranged to: receive a job request of a computational task to be handled by the server farm; determine, from the plurality of servers, one or more servers operable to accept the job request; determine a respective effective energy efficiency value associated with at least the one or more servers; and assign the computational task to a server with the highest effective energy efficiency value; wherein the effective energy efficiency value is defined by: a service rate of the respective server divided by a difference between an energy consumption rate value when the respective server is busy (performing computational tasks) and an energy consumption rate value when the respective server is idle (not performing computational tasks).

In one embodiment of the second aspect, the one or more processors may be incorporated in one or more servers in the server farm. In another embodiment, the one or more processors may be arranged external to the server farm, but are operably connected with the servers in the server farm.

In accordance with a third aspect of the present invention, there is provided a server farm comprising: a plurality of servers operably connected with each other; one or more processor operably connected with the plurality of server, the one or more processor being arranged to: receive a job request of a computational task to be handled by the server farm; determine, from the plurality of servers, one or more servers operable to accept the job request; determine a respective effective energy efficiency value associated with at least the one or more servers; and assign the computational task to a server with the highest effective energy efficiency value; wherein the effective energy efficiency value is defined by: a service rate of the respective server divided by a difference between an energy consumption rate value when the respective server is busy (performing computational tasks) and an energy consumption rate value when the respective server is idle (not performing computational tasks).

In one embodiment of the third aspect, the one or more processor is further operable to sort the one or more servers according to the respective determined effective energy efficiency values.

In one embodiment of the third aspect, the one or more processor is further operable to: determine, from the plurality of servers, all servers operable to accept the job request.

In one embodiment of the third aspect, the plurality of servers cannot be powered off during operation of the server farm.

In one embodiment of the third aspect, the one or more processor is arranged such that assignment of computation tasks in the server farm is substantially independent of an arrival rate of computation tasks at the server farm.

In one embodiment of the third aspect, the one or more processor is arranged such that assignment of computation tasks in the server farm is substantially independent of a respective size of the computation tasks received at the server farm.

In one embodiment of the third aspect, the plurality of servers each includes a finite buffer for queuing job requests; and wherein the one or more servers operable to accept the job request each has at least one vacancy in their respective buffer.

In one embodiment of the third aspect, the server farm is heterogeneous in that the plurality of servers can have different server speeds, energy consumption rates, and/or buffer sizes.

In one embodiment of the third aspect, the server farm is a non-jockeying server farm in which computational task being handled by one of the plurality of servers cannot be reassigned to other servers.

In one embodiment of the third aspect, the one or more processors are incorporated in at least one of the plurality of servers.

In accordance with a fourth aspect of the present invention, there is provided a non-transient computer readable medium for storing computer instructions that, when executed by one or more processors, causes the one or more processors to perform a method for operating a server farm with a plurality of servers operably connected with each other, the method comprising the steps of: receiving a job request of a computational task to be handled by the server farm; determining, from the plurality of servers, one or more servers operable to accept the job request; determining a respective effective energy efficiency value associated with at least the one or more servers; and assigning the computational task to a server with the highest effective energy efficiency value; wherein the effective energy efficiency value is defined by: a service rate of the respective server divided by a difference between an energy consumption rate value when the respective server is busy (performing computational tasks) and an energy consumption rate value when the respective server is idle (not performing computational tasks).

It is an object of the present invention to address the above needs, to overcome or substantially ameliorate the above disadvantages or, more generally, to provide an improved method for assigning jobs in a large-scale server farm by taking into account the power consumed by the servers when idle.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the present invention will now be described, by way of example, with reference to the accompanying drawings in which:

FIG. 1 is a schematic diagram of an operation environment of a server farm in accordance with one embodiment of the present invention;

FIG. 2 is a functional block diagram of an information handling system in accordance with one embodiment of the present invention;

FIG. 3 is a flow diagram illustrating a job assignment method, also referred to as the most-energy-efficient-available-server-first accounting for idle power (MAIP) job assignment method, for operating a server farm in accordance with one embodiment of the present invention;

FIG. 4A is a graph of simulation results showing a relative difference in energy efficiency of a server farm implementing the MAIP job assignment method in one embodiment of the present invention to energy efficiency of a server farm implementing a most-energy-efficient-available-server-first neglecting idle power (MNIP) job assignment method;

FIG. 4B is a graph of simulation results showing job throughput of a server farm implementing the MAIP job assignment method in one embodiment of the present invention and that of a server farm implementing the MNIP job assignment method;

FIG. 4C is a graph of simulation results showing a relative difference in energy consumption rate of a server farm implementing the MAIP job assignment method in one embodiment of the present invention to a server farm implementing the MNIP job assignment method;

FIG. 5A is a graph of simulation results showing energy efficiency against the number of servers in a server farm implementing the MAIP job assignment method in one embodiment of the present invention and in a server farm implementing the MNIP job assignment method (with normalized offered traffic ρ=0.6);

FIG. 5B is a graph of simulation results showing job throughput against the number of servers in a server farm implementing the MAIP job assignment method in one embodiment of the present invention and in a server farm implementing the MNIP job assignment method (with normalized offered traffic ρ=0.6);

FIG. 5C is a graph of simulation results showing energy consumption rate against the number of servers in a server farm implementing the MAIP job assignment method in one embodiment of the present invention and in a server farm implementing the MNIP job assignment method (with normalized offered traffic ρ=0.6);

FIG. 6A is a graph of simulation results showing a cumulative distribution function of a relative difference of energy efficiency of a server farm implementing the MAIP job assignment method in one embodiment of the present invention to energy efficiency of a server farm implementing the MNIP job assignment method, for different server heterogeneity β (with normalized offered traffic ρ=0.4);

FIG. 6B is a graph of simulation results showing a cumulative distribution function of a relative difference of energy efficiency of a server farm implementing the MAIP job assignment method in one embodiment of the present invention to energy efficiency of a server farm implementing the MNIP job assignment method, for different server heterogeneity β (with normalized offered traffic ρ=0.6);

FIG. 6C is a graph of simulation results showing a cumulative distribution function of a relative difference of energy efficiency of a server farm implementing the MAIP job assignment method in one embodiment of the present invention to energy efficiency of a server farm implementing the MNIP job assignment method, for different server heterogeneity β (with normalized offered traffic ρ=0.8);

FIG. 7A is a graph of simulation results showing energy efficiency against the number of servers in a server farm implementing the MAIP job assignment method in one embodiment of the present invention and in a server farm implementing the Most Energy Efficient Server First (MEESF) job assignment method (with suspension period Δ=0);

FIG. 7B is a graph of simulation results showing energy efficiency against the number of servers in a server farm implementing the MAIP job assignment method in one embodiment of the present invention and in a server farm implementing the MEESF job assignment method (with suspension period Δ=0.0005);

FIG. 7C is a graph of simulation results showing energy efficiency against the number of servers in a server farm implementing the MAIP job assignment method in one embodiment of the present invention and in a server farm implementing the MEESF job assignment method (with suspension period Δ=0.01);

FIG. 8A is a graph of simulation results showing job throughput against the number of servers in a server farm implementing the MAIP job assignment method in one embodiment of the present invention and in a server farm implementing the Most Energy Efficient Server First (MEESF) job assignment method (with suspension period Δ=0);

FIG. 8B is a graph of simulation results showing job throughput against the number of servers in a server farm implementing the MAIP job assignment method in one embodiment of the present invention and in a server farm implementing the Most Energy Efficient Server First (MEESF) job assignment method (with suspension period Δ=0.0005);

FIG. 8C is a graph of simulation results showing job throughput against the number of servers in a server farm implementing the MAIP job assignment method in one embodiment of the present invention and in a server farm implementing the Most Energy Efficient Server First (MEESF) job assignment method (with suspension period Δ=0.01);

FIG. 9A is a graph of simulation results showing a cumulative distribution function of a relative difference of energy efficiency in a server farm implementing the MAIP job assignment method in one embodiment of the present invention, by comparing different job-size distribution to an exponential distribution (with normalized offered traffic ρ=0.4 and server heterogeneity β=1);

FIG. 9B is a graph of simulation results showing a cumulative distribution function of a relative difference of energy efficiency of a server farm implementing the MAIP job assignment method in one embodiment of the present invention, by comparing different job-size distribution to an exponential distribution (with normalized offered traffic ρ=0.6 and server heterogeneity β=1); and

FIG. 9C is a graph of simulation results showing a cumulative distribution function of a relative difference of energy efficiency of a server farm implementing the MAIP job assignment method in one embodiment of the present invention, by comparing different job-size distribution to an exponential distribution (with normalized offered traffic ρ=0.8 and server heterogeneity β=1).

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

FIG. 1 shows an operation environment 100 of a server farm in accordance with one embodiment of the present invention. The environment 100 includes a server farm 102 with servers 104 operably connected with each other, for example, through one or more communication links or buses (not shown). Each server 104 is generally an information handling system that is designed to perform computational tasks such as storing, managing, and/or processing information and data. In one embodiment, the server 104 may be any types of device or apparatus that is operable to store, send, receive, and/or forward information and data. The servers 104 in the environment 100 may optionally be connected to other networks, such as the Internet or a cloud-computing network (not shown), for exchanging, storing, or retrieving information. Each server is preferably operable independently. In one example, two or more of the servers 104 in the server farm 102 may be linked together to perform computation functions. The server farm could be a single server farm, or could be a cluster of server farms that are operably connected with each other.

The environment 100 in FIG. 1 includes computing devices that can be connected to the server over a communication network, such as the Internet or a cloud computing network, for bi-directional communication and information/data exchange. As shown in FIG. 1, the computing device may be a desktop computer 106A, a portable computing device 106B, a mobile device 106C, etc., and can be controlled by a user or by another computing device to perform computation functions. The computing device 106A, 106B, 106C is operable to transmit to a server management module a job request of a computational task to be performed by one or more servers 104 in the server farm 102. In one example, the computational task is to retrieve information stored on or accessible to the servers 104 in the farm 102. In the present embodiment, the server management module may be implemented by one or more processors in the one or more servers 104 of the server farm 102, or by other processing system outside the server farm 102. The server management module may be implemented entirely by hardware resources, entirely by software resources, or by both hardware and software resources. The server management module may be operated by an administration personnel for controlling operation of the server farm 102.

Referring to FIG. 2, there is shown a schematic diagram of an exemplary information handling system 200 that can be used as a server or other information processing systems in one embodiment of the present invention. Preferably, the server 200 may have different configurations, and it generally comprises suitable components necessary to receive, store and execute appropriate computer instructions or codes. The main components of the server 200 are a processing unit 202 and a memory unit 204. The processing unit 202 is a processor such as a CPU, an MCU, etc. The memory unit 204 may include a volatile memory unit (such as RAM), a non-volatile unit (such as ROM, EPROM, EEPROM and flash memory) or both. Preferably, the server 200 further includes one or more input devices 206 such as a keyboard, a mouse, a stylus, a microphone, a tactile input device (e.g., touch sensitive screen) and a video input device (e.g., camera). The server 200 may further include one or more output devices 208 such as one or more displays, speakers, disk drives, and printers. The displays may be a liquid crystal display, a light emitting display or any other suitable display that may or may not be touch sensitive. The server 200 may further include one or more disk drives 212 which may encompass solid state drives, hard disk drives, optical drives and/or magnetic tape drives. A suitable operating system may be installed in the server 200, e.g., on the disk drive 212 or in the memory unit 204 of the server 200. The memory unit 204 and the disk drive 212 may be operated by the processing unit 202. The server 200 also preferably includes a communication module 210 for establishing one or more communication links (not shown) with one or more other computing devices such as a server, personal computers, terminals, wireless or handheld computing devices. The communication module 210 may be a modem, a Network Interface Card (NIC), an integrated network interface, a radio frequency transceiver, an optical port, an infrared port, a USB connection, or other interfaces. The communication links may be wired or wireless for communicating commands, instructions, information and/or data. Preferably, the processing unit 202, the memory unit 204, and optionally the input devices 206, the output devices 208, the communication module 210 and the disk drives 212 are connected with each other through a bus, a Peripheral Component Interconnect (PCI) such as PCI Express, a Universal Serial Bus (USB), and/or an optical bus structure. In one embodiment, some of these components may be connected through a network such as the Internet or a cloud computing network. A person skilled in the art would appreciate that the server 200 shown in FIG. 2 is merely exemplary, and that different servers 200 may have different configurations and still be applicable in the present invention.

FIG. 3 shows a flow diagram illustrating a job assignment method 300 in accordance with one embodiment of the present invention, also referred to as the most-energy-efficient-available-server-first accounting for idle power (MAIP) job assignment method, for operating a server farm. The server farm in which the method operates may be similar to the server farm 102 of FIG. 1, with a number of servers operably connected with each other. Each of the servers in the server farm may include a finite buffer for queuing job requests. The job assignment method 300 may be implemented by one or more processors or servers within or outside the server farm. In one embodiment, the method 300 includes the step 302 of receiving a job request of a computational task to be handled by the server farm. The method 300 further includes the step 304 of determining, from all of the servers, one or more servers that is operable to accept the job request. Preferably, the one or more servers operable to accept the job request are servers that have at least one vacancy in their respective buffer. In a preferred embodiment, all servers operable to accept the job request are identified in step 304. After determining the servers operable to accept the job request, the method 300 proceeds to step 306, in which respective effective energy efficiency values associated with the one or more servers operable to accept the job request are determined. In the present invention, the effective energy efficiency value is defined as: a service rate of the respective server divided by a difference between an energy consumption rate value when the respective server is busy and an energy consumption rate value when the respective server is idle. Optionally, the method 300 further includes the step of sorting the one or more servers according to the respective determined effective energy efficiency values. After determining the respective effective energy efficiency values, the method proceeds to step 308, in which the computational task to be performed is assigned to a server with the highest effective energy efficiency value. Preferably, in method 300, assignment of computation tasks in the server farm is substantially independent of an arrival rate of computation tasks at the server farm/a respective size of the computation tasks received at the server farm.

A person skilled in the art would appreciate that the steps 302, 304, 306, 308 in method 300 need not be performed in the order as listed, but can be in any other order as long as it is logical. For example, steps 304 and 306 can be performed before step 302.

In a preferred embodiment, the server farm is heterogeneous in that the servers can have different server speeds, energy consumption rates, and/or buffer sizes. All servers in the farm, even the idle ones, have non-negligible energy consumption rate. Preferably, the server farm is a non-jockeying server farm in which computational task being handled by one of the servers cannot be reassigned to other servers. In one embodiment, the plurality of servers cannot be powered off during operation of the server farm. This could refer to, in practice, periods of operation during which no powering off of the servers takes place. In one embodiment, method 300 may be combined with a right-sizing technique by powering off idle servers, although frequent powering off/on increases wear and tear and the need for costly replacement and maintenance.

In a preferred embodiment, the processor sharing (PS) discipline is imposed on each queue of the servers, so that all jobs on the same queue share the processing capacity and are served at the same rate. This arrangement avoids unfair delays for those jobs that are preceded by extremely large jobs, making it an appropriate model for web server farms, where job-size distributions are highly variable. The finite buffer size queuing model with PS discipline can be applied in situations where a minimum service rate is required for processing a job in the system.

In one embodiment, the server farm is a large-scale realistically-dimensioned server farm that cannot reject a job if it has buffer space available. Although in situations where a server farm has some inefficient servers and rejection of some jobs might save energy, this is not permitted in some embodiments of the present invention.

An objective function of the optimization of the present invention is the energy efficiency of a server farm, defined as the ratio of the long-run expected throughput divided by the expected energy consumption rate. This objective function represents the amount of useful work (e.g., data rate, throughput, processes per second) per watt, and is well-accepted as a performance measure in ICT applications.

I—System Model

The following table (Table I) includes definition of some of the symbols used in the following description.

TABLE I Symbol Definition     Set of servers in the system K Number of servers in the system Bj Buffer size of server j μj Service rate of server j εj Energy consumption rate of server j when it is busy εj0 Energy consumption rate of server j when it is idle μj/(εjj0) Effective energy efficiency of server j λ Job arrival rate    ϕ Job throughput of the system under policy ϕ εϕ Energy consumption rate of the system under policy ϕ    ϕϕ Energy efficiency of the system under policy ϕ

The present embodiment considers a heterogeneous server farm modeled as a multi-queue system with reassignment of incomplete jobs (e.g. jobs being processed) disallowed. In this embodiment, the server farm has K≥2 servers, forming the set ={1, 2, . . . , K}. These servers are characterized by their service rates, energy consumption rates, and buffer sizes. For j ∈ , the service rate for server j is denoted by μj. The energy consumption rate of server j is εj when it is busy and εj0 when it is idle, where εjj0≥0. In the present invention, the ratio μj/(εj−εj0) is referred to as the effective energy efficiency of server j. In one embodiment, the buffer size of server j is denoted by Bj≥2.

Preferably, job arrivals follow a Poisson process with rate λ, indicating the average number of arrivals per time unit. An arriving job is assigned to one of the servers with at least one vacant slot in its buffer, subject to the control of an assignment policy ϕ. In one embodiment, if all buffers are full, the arriving job is lost.

In the present embodiment, it is assumed that job sizes are independent and identically distributed. The average size of jobs is normalized, without loss of generality, to one. Preferably, each server j serves its jobs at a total rate of μj using the PS service discipline.

The following consideration is limited to realistic cases by assuming that the ratio of the arrival rate to the total service rate, ρλ/Σj=1Kμj, is sufficiently large to be economically justifiable but not too large to violate the required quality of service (QoS). In the following, ρ is referred to as the normalized offered traffic.

The job throughput of the system under policy ϕ, which is equivalent to the long-run average job departure rate, is denoted by ϕ. The power consumption of the system under policy ϕ, which is equivalent to the long-run average energy consumption rate, is denoted by εϕ. By definition, ϕϕ is the energy efficiency of the system under policy ϕ.

II—MAIP Job Assignment Method

In the present embodiment, the server farm managing module makes decisions at arrival events to assign a new job to one of the servers (queues) in the server farm (queuing system). A server selected to accept new jobs is called a tagged server, while all other servers are untagged. If all of the servers are full, i.e., has no capacity to accept new job requests, then no server is tagged at that time and new arrivals are blocked until completion of some job in the system.

Preferably, MAIP is obtained by considering the effective energy efficiency of servers, taking into account the effect of idle power, i.e., energy consumption rate when the server is idle. Preferably, the method in the present embodiment always selects a server with the highest effective energy efficiency among all servers that are not full. Such a server is regarded as the most energy-efficient server available to accept new jobs.

A simple explanation of MAIP of the present embodiment is as follows. Consider a system with two servers only, where μ12=1, ε1=2, ε10=1, ε22.5, and ε20=2. It is clear that in this example ε12 and ε1020. If a job arrives when both servers are idle, the scheduler has two choices:

  • (1) Assigning the job to server 1 makes server 1 busy. And the energy consumption rate of the whole system becomes ε120=4.
  • (2) Assigning the job to server 2 makes server 2 busy. And the energy consumption rate of the whole system becomes ε210=3.5.

Since (ε120)>(ε210), which is equivalently (ε1−ε10)>(ε2−ε20), and since both servers have the same service rate, choosing server 2 for serving the job in this particular example turns out to be better in terms of the energy efficiency of the system, despite the fact that server 2 consumes more power when busy than server 1 does.

In examples where power consumption of idle servers in a system is not necessarily negligible, the energy used by the system can be categorized into two parts, a productive part and an unproductive part. The productive part contributes to job throughput, whereas the unproductive part is a waste of energy. For a server j, when it is idle, the service rate is 0 accompanied by an energy consumption rate of εj0; when it is busy, the service rate becomes μj and the energy consumption rate increases to εj. The additional service rate is considered as a reward at the cost of an additional energy consumption rate εj−εj0. In other words, if jobs are assigned to server j, the productive power used to support the service rate μj is effectively εj−εj0. In the design of MAIP in one embodiment of the present invention, productive power is the main consideration.

Since MAIP in the present embodiment aims for energy-efficient job assignment, in the following description, the servers are labeled according to their effective energy efficiency. In particular, in the context of MAIP, server i is defined to be more energy-efficient than server j if and only if μj/(εj−εj0)>μj/(εj−εj0). That is, for any pair of servers i and j, if i<j, then μj/(εj−εj0)≥μj/(εj−εj0). MAIP in the present embodiment operates by always selecting a server with the highest effective energy efficiency among all servers that contain at least one vacant slot in their buffers, where ties are broken arbitrarily. Advantageously, MAIP in the present embodiment is a simple approach that requires only binary state information (i.e., available or unavailable) from each server for its implementation.

III—Analysis A. Stochastic Process

Let j denote the set of all states of server j, where the state, nj is the number of jobs queuing or being served at server j. Thus, j={0, 1, . . . , Bj}, where Bj≥2 is the buffer size for server j. For server j, states 0, 1, . . . , Bj−1 are called controllable, and the state Bj is called uncontrollable. The set of controllable states for server j, in which the server is available to be tagged, is denoted by j(0,1)={0, 1, . . . , Bj−1} while, for the uncontrollable state in the set j(0)={Bj}, the server is forced to be untagged because it cannot accept jobs.

The vectors n=(n1, n2, . . . , nK) represents the state of the multi-queue system, nj j ∈. The set of all such states n is denoted by , the sets of uncontrollable and controllable states in are, respectively,


(0)={n ∈ |nj j(0), ∀ j ∈ },


(0, 1)={n ∈ |n ∉ (0)}.   (1)

Define Xϕ(t)=(X1ϕ(t), X2ϕ. . . XKϕ(t)) to be a vector of random variables representing the state at time t under policy ϕ of the stochastic process of the multi-queue system. Without loss of generality set the initial state Xϕ(0)=x(0), x(0) ∈ .

Decisions made on job arrivals rely on the values of X(t) just before an arrival occurs. Use ajϕ(i), j ∈ as an indicator of activity at time t under policy ϕ so that ajϕ(t)=1 if server j is tagged, and ajϕ(t)=0 otherwise. Then Σj=1Kajϕ(t)≤1 for all t>0. All job assignment policies considered in the present embodiment are stationary, and so ajϕ(n), n ∈ , is used to represent the action to be taken on the stochastic process when the system is in state n. A policy ϕ comprises those) aϕ(n)=(a1ϕ(n), a2ϕ(n), . . . , aKϕ(n)) for all n ∈ .

Define a mapping Rj: j→R, where Rj(nj)(nj j) is the reward rate of server j in state nj. Let j be the set of all such mappings Rj. Then, for a given vector of mappings R=(R1, R2, . . . , RK), the long-run average reward under policy ϕ is defined to be

γ φ ( R ) = lim t + 1 t { 0 t j R j ( X j φ ( u ) ) du } . ( 2 )

R is referred to as the reward rate function. Along similar lines, consider μj(nj) and εj(nj), the service rate and energy consumption rate of server j in state nj, respectively, as rewards; that is μj, εj j. As previously defined, μj(nj)=μj, εj(nj)=εj for nj>0, μj(0)=0 and εj(0)=εj0, where μj>0, εjj0≥0, j ∈ . For the vectors μ=(μ1, μ2, . . . , μK) and ε=(ε1, ε2, . . . , εK), the long-run average job service rate of the entire system is, then, γϕ(μ) and the long-run average energy consumption rate of the system is γϕ(ε). For simplicity, long-run average job service rate and the long-run average energy consumption rate are referred to, in this description, as the job throughput and energy consumption rate, respectively. Since the energy efficiency of the system is the ratio of job throughput to energy consumption rate, the problem of maximizing energy efficiency is encapsulated in

max φ γ φ ( μ ) γ φ ( ɛ ) . ( 3 )

Based on the definition given above, MAIP can be formally defined as follows.

a j MAIP ( n ) = { 1 , n { 0 , 1 } , j = min arg max j : n j j { 0 , 1 } μ j ɛ j - ɛ j 0 0 , otherwise . . ( 4 )

B. Whittle's Index

A well-known index theorem for SFABP was published in 1974 in J. C. Gittins and D. M. Jones, “A dynamic allocation index for the sequential design of experiments,” in Progress in Statistics, J. Gani, Ed. Amsterdam, NL: North-Holland, 1974, pp. 241-266. The optimal solution for the general multi-armed bandit problem (MABP) was published in 1979 in J. C. Gittins, “Bandit processes and dynamic allocation indices,” Journal of the Royal Statistical Society. Series B (Methodological), pp. 148-177,1979. Relaxing the constraint that only one machine (project/bandit/process) is played at a time, and only the played machine changes state, Whittle, in P. Whittle, “Restless bandits: Activity allocation in a changing world,” J. Appl. Probab., vol. 25, pp. 287-298, 1988, published a more general model, the restless multi-armed bandit (RMAB) and proposed as an index the so-called Whittle's index as an approximation for optimality.

The general definition of Whittle's index for the problem of the present embodiment is given here; a closed-form expression will be provided in Section C for the case when job sizes are exponentially distributed.

Based on Theorem 1 provided in Z. Rosberg, Y. Peng, J. Fu, J. Guo, E. W. M. Wong, and M. Zukerman, “Insensitive job assignment with throughput and energy criteria for processor-sharing server farms,” IEEE/ACM Trans. Netw., vol. 22, no. 4, pp. 1257-1270, August 2014, there exists a value e*>0, given by

e * = max φ { γ φ ( μ ) γ φ ( ɛ ) } , ( 5 )

the optimization problem in equation (4) can be written as

sup φ ( γ φ ( R ) : j { 1 , 2 , , K } : X j φ ( t ) j { 0 , 1 } a j φ ( t ) = 1 , t 0 } ( 6 )

where the reward rate function R=(R1, R2, . . . , RK), Rj j, Rj(nj)=μj(nj)−ε*εj(nj), j ∈ .

Following the Whittle's index approach, the problem in equation (6) can be relaxed as

sup φ lim t + 1 t { 0 t j R j ( X j φ ( u ) ) d u } , s . t . { j : X j φ ( t ) j { 0 , 1 } a j φ ( t ) } = 1. ( 7 )

This would mean that ajϕ(t) becomes random variables, and so that sometimes more than one server will be tagged simultaneously. This is unrealistic and is not preferable in the present invention.

The linear constraint in equation (8) is covered by the introduction of a Lagrange multiplier v.

inf v sup φ lim t + 1 t { 0 t [ j R j ( X j φ ( u ) ) - v j : X j φ ( t ) j { 0 , 1 } a j φ ( u ) ] d u } + v , ( 8 )

For a given v, equation (8) can be decomposed into K sub-problems:

sup φ lim t + 1 t { 0 t [ R j ( X j φ ( u ) ) - va j φ ( u ) ] d u } , ( 9 )

where ajϕ(u)=0 when Xjϕ(u) ∈j(0), for 0<u<t, j ∈.

In P. Whittle, “Restless bandits: Activity allocation in a changing world, ” J. Appl. Probab., vol. 25, pp. 287-298, 1988, Whittle defined a v-subsidy policy for a project (server) as an optimal solution for equation (9), which provides the set of states where the given project will be passive (untagged), and introduced the following definition.

Definition 1. Let D(v) be the set of passive states of a project under a v-subsidy policy. The project is indexable if D(v) increases monotonically from ∅ to the set of all possible states for the project as v increases from −∞ to +∞.

In particular, if a project (server) j is indexable and there is a v* satisfying nj∉ D(v) for v≤v* and nj ∈ D(v) otherwise then this v* is the value of Whittle's index for project (server) j at state nj. Whittle's index policy for the multi-queue system chooses a controllable server (a server in controllable states) with highest Whittle's index to be tagged (with others untagged) at each decision making epoch.

C. Indexability

The closed form of the optimal solution for equation (9) is given—it is equivalent to the Whittle's index policy for the case with exponentially distributed job sizes. The method of the present embodiment uses the theory of semi-Markov decision processes and the Hamilton-Jacobi-Bellman equation. Formulation in this way requires the exponential job size assumption, but in some embodiments the method of the present invention is not limited to such job size distribution.

Let Vjϕj,v(Rj, Rj) be, for policy ϕj, the expected value of the cumulative reward of a process for server j ∈ that starts from state nj j and ends when it first goes into an absorbing state nj0 ∈ , with reward rate Rj(nj)−vajϕj (nj). In particular, Vjϕj(nj0)=0 for any ϕj. Here, ϕj is a stationary policy for server j, which determines whether it is tagged or not according to its current state Xjϕj(t). Because state 0 is reachable from all other states, it can be assumed, without loss of generality, that nj0=0 for all j ∈ . For this section, define Rj(nj)=μj(nj)−ε*εj(nj), j ∈ where ε* is defined as in equation (5).

Now, let jH, j ∈ represent a process for server j that starts from state 0 until it reaches state 0 again, where ϕj is constrained to those policies satisfying ajϕj(0)=1. The set of all such policies is denoted by ΦjH. It follows from S. M. Ross, Applied probability models with optimization applications. Dover Publications (New York), 1992 that the average reward of process jH is equivalent to the long-run average reward of the system.

Now an application of the g-revised criterion in M. Ross, Applied probability models with optimization applications. Dover Publications (New York), 1992 yields the followed corollary to these two theorems.

Corollary 1. For a server j and a given v<+∞, let Rj(nj)=μj(nj)−ε*εj(nj)<+∞, there exists a real g, with Rjg(nj)−g such that if policy ϕ*j∈ ΦjH maximizes Vjϕj,v(nj, Rj3) then, ϕ*j also maximizes the long-run average reward of server j with reward rate Rj(nj)−ajϕ*j(nj)v, n, ∈ j, among all policies in ΦjH. In particular, this value of g, denoted by g*, is equivalent to the maximized long-run average reward.

In other words, by comparing the maximized average reward of process jH under policy ϕ*j and policy ϕj0 with ajϕj0(0)=0 (and all the actions for non-zero states are the same as ϕ*j), then the one with higher average reward is the optimal policy for equation (9). Note that, in the server farm model of the present embodiment, if ajϕj0(0)=0, the actions for non-zero states are meaningless since the corresponding server (queue) will never leave state 0.

The first step involves finding ϕ*j. Let Vjv(nj, Rjg)=supϕj Vjϕj,,v(nj, Rj0). The maximization of Vjϕj,v(nj, Rj0) can be written, using the Hamilton-Jacobi-Bellman equation, as

V j v ( n j , R j g ) = max { ( R j g ( n j ) - v ) τ j 1 ( n j ) + n j P j 1 ( n j , n ) V j v ( n , R j g ) , R j g ( n j ) τ j 0 ( n j ) + n j P j 0 ( n j , n ) V j v ( n , R j g ) } , ( 10 )

where τj1(nj) and τj0(nj), are the expected sojourn time in state nj for ajϕj (nj)=1, and ajϕj(nj)=0, respectively, and P1(nj, n) and P0(nj, n), nj, n ∈j, are the transition probability for ajϕj(nj)=1 and ajϕj(nj)=0, respectively.

For equation (10), there is a specific v, referred to as v*j(nj, Rjg), satisfying

v j * ( n j , R j g ) τ j 1 ( n j ) = n j P j 1 ( n j , n ) V j v ( n , R j g ) - n j P j 0 ( n j , n ) V j v ( n , R j g ) + R j g ( n j ) ( τ j 1 ( n j ) - τ j 0 ( n j ) ) . ( 11 )

For an indexable server j, a policy can be defined as follows:

  • if v<v*j(nj, Rjg), j will be tagged
  • if v>v*j(nj, Rjg), j will be untagged, and
  • if v=v*j(nj, Rjg), j can be either tagged or untagged. (12)

The v*(nj, Rjg), nj j, j ∈ , constitute Whittle's index in this context, and equation (12) defines the optimal solution for equation/problem (9). According to equation (11), although the value of v*j(nj, Rjg) may appear to rely on v, it can be shown that in the present embodiment, the value of v*j(nj, Rjg) can be expressed in close form and is independent of v, and that the server farm in the present embodiment is indexable according to the definition in P. Whittle, “Restless bandits: Activity allocation in a changing world, ” J. Appl. Probab., vol. 25, pp. 287-298,1988.

Proposition 1. For the system of the present embodiment defined in Section I, j ∈ ,

v j * ( n j , R j g ) = λ ( μ j - e * ɛ j - g ) μ j , n j = 1 , 2 , , B j - 1. ( 13 )

The optimal policy, denoted by ϕ*j, that maximizes vjϕj,v(nj, Rjg) also maximizes the average reward of process jH with the value of g specified in Corollary 1 among all policies in ΦjH. For the optimal v-subsidy policy, it remains to compare ϕ*jwith ajϕds *j(0)=1 and ϕj0 with ajϕ0j(0)=0.

Proposition 2. For the system of the present embodiment defined in Section I, j ∈ ,

v j * ( 0 , R j g ) = λ μ j ( μ j - e * ɛ j + e * ɛ j 0 ) . ( 14 )

The following Proposition 3 is a consequence of Propositions 1 and 2.

Proposition 3. For the system defined in Section I, if job-sizes are exponentially distributed then the Whittle's index of server j at state nj is:

v j * ( n j , R j g ) = λ ( 1 - e * ɛ j - ɛ j 0 μ j ) , n j = 0 , 1 , , B j - 1. ( 15 )

Evidently then, the system is indexable.

It is clear that the Whittle's index policy, which prioritizes servers with the highest index value at each decision making epoch, is similar to the MAIP method of the present embodiment defined in equation (4), when job sizes are exponentially distributed.

D. Asymptotic Optimality

This section serves to prove the asymptotic optimality of MAIP for the server farm of the present embodiment comprising multiple groups of identical servers, as the numbers of servers in these groups become large and when the job sizes are exponentially distributed (the number of servers is scaled under appropriate and reasonable conditions for large server farms).

The proof methodology disclosed in R. R. Weber and G. Weiss, “On an index policy for restless bandits, ” J. Appl. Probab., no. 3, pp. 637-648, September 1990 for the asymptotic optimality of index policies is applied to the problem of the present embodiment. However, this proof cannot be directly applied to the present problem because of the presence of uncontrollable states (since buffering spill-over creates dependencies between servers) in the server farm of the present embodiment. In the following, an additional server is defined, designated as server K+1, to handle the blocking case when all original servers are full; this server has only one state (server K+1 never changes state) with zero reward rate. In a preferred embodiment, this is a virtual server that is used only in the proof of the asymptotic optimality in this section. In particular, =1 and {0}=∅. Also, define = ∪(K+1) as the set of servers including this added zero-reward server. The set of controllable states of these K+1 servers is defined as z,85 {0, 1}=∪j∈K+{0, 1} and the set of uncontrollable states is {0}=∪j∈K+j{0}.

In this section, servers with identical buffer size, service rate, and energy consumption rate are grouped as a server group, and these server groups are labeled as server groups 1, 2, . . . {tilde over (K)}. For servers i, j of the same server group, i{0, 1}=j{0, 1} and i{0}=j{0}. For clarity of presentation, define j{0, 1} and j{0}, i=1, 2, . . . , {tilde over (K)} as, respectively, the sets of controllable and uncontrollable states of servers in server group i. States for different server groups are regarded as different states, that is, j{0, 1}j{0, 1}=∅; and j{0}j{0}=∅; for different server groups i and j; j=1, 2, . . . {tilde over (K)}. Let Zjϕ(t) be the random variable representing the proportion of servers in state i ∈ {0, 1}{0} at time t under policy ϕ. Again, states i ∈ {0, 1}{0} are labeled as 1, 2, . . . , I, where I=|{0, 1}{0}| and Zϕ(t) is used to denote the random vector (Z1ϕ(t), Z2ϕ(t), . . . , Z1ϕ(t)). Correspondingly, actions ajϕ(nj), njj, j ∈ + correspond to actions aϕ(i), i ∈ {0, 1}{0}.

Let z, z′ ∈RI be possible values of Zϕ(t), T>0, ϕ ∈ Φ. Transitions of the random vector Zϕ(t) from z to z′ can be written as z′=z+ei,i′, where ej, p is a vector of which the ith element is

+ 1 K + 1 ,

the i'th element is

- 1 K + 1

and otherwise is zero, is i, i′∈ {0, 1}{0}. In particular, for the server farm of the present embodiment defined in Section I, server j only appears in state i ∈ j; that is, the transition from z to z′=z+ei,i′, i∈ j{0, 1}j{0}, i′∈ j{0, 1}j{0}j, j′=1, 2, . . . , {tilde over (K)}, j≠j′ never occurs. In order to address such impossible transitions, the corresponding transition probabilities are set to zero. Then, order/sort the states i ∈ {0, 1} according to descending index values, where all states i ∈ {0} come after the controllable states, with aϕ(i)=0 for i ∈ {0}. Next, set the state i ∈ K+2{0, 1} of the zero-reward server, which is also a controllable state, to come after all the other controllable states but to precede the uncontrollable states. Because of the existence of the zero-reward server K+1, the number of servers in controllable states can always meet the constraint in equation (7). Note here that the state of server K+1 and the uncontrollable states are manually moved to certain positions without following their indices which are zero. It can be shown that such movements will not affect the long-run average performance of Whittle's index policy, which exists and is equivalent to MAIP in the present embodiment. The position of a state in the ordering i=1, 2, . . . , I is also defined as its label.

Let γOR(ϕ) be the long-run average reward of the original problem in equation (6) under policy ϕ, and γLR(ϕ) be the long-run average reward of the relaxed problem in equation (7) under policy ϕ. In addition, let

γ OR = max φ { γ OR ( φ ) } ,

the maximal long-run average reward of the original problem, and

γ LR = max φ { γ LR ( φ ) } ,

the maximal long-run average reward of the relaxed problem. From the definition of the system of the present embodiment, γLR(ϕ)/K, γOR(ϕ)/K≤maxj∈a+,njj, Rj(nj)<+∞, where Rj(nj) is the reward of server j in state nj as defined before. Then, γOR(index)/K≤γLR/K is obtained. It can be proved that, under Whittle's index policy, γOR(index)/K−γOR/K→0 when K is scaled in a certain way.

To demonstrate the asymptotic optimality, the following describes the stationary policies, including Whittle's index policy, in another way. Let μjϕ(z) ∈ [0, 1], z ∈ RI, i=1, 2, . . . I, be the probability for a server in state i ∈ {0, 1}{0} to be tagged (aϕ(i)=1) when Zϕ(t)=z. Then, 1−viϕ(z) is the probability for a server in state i to be untagged (aϕ(i)=0).

Define i+, i ∈ {0, 1}{0} as the set of states that precede state i in the ordering. Then, for Whittle's index policy, obtain

u i index ( z ) = 1 z i min { z i , max { 0 , 1 K + 1 - i i + i } } . ( 16 )

The multi-queue system of the present embodiment is stable, since any stationary policy will lead to an irreducible Markov chain for the associated process and the number of states is finite. Then, for a policy ϕ ∈ Φ, the vector Xϕ(t) converges as t→∞ in distribution to a random vector Xϕ. In the equilibrium region, let πjϕ be the steady state distribution of Xjϕ for server j, j ∈ +, under ϕ ∈ Φ, where πjϕ(i), i ∈ j, is the steady state probability of state i. For clarity of presentation, extend vector πjϕ, to a vector of length I, written πjϕ, of which the ith element is πjϕ(i), if i ∈ j, and otherwise, 0. The long-run expected value of Zϕ(t) is Σj=1K+1jϕ/(K+1). In the server farm embodiment defined in Section I, the long-run expected value of Zϕ(t) should be a member of the set

Z = { z R I | i { 0 , 1 } { 0 } z i 1 , i { 0 , 1 } { 0 } , z i 0 } . ( 17 )

Define q1(z, zi, zi′), and q0(z, zi, zi′), z ∈ Z, i ∈ {0, 1}{0}, as the average transition rate of the ith element in vector z from zi to zi′, under tagged and untagged action, respectively. Then, the average transition rate of the ith element of z under policy ϕ is given by


qϕ(z, zi, zi′)=uiϕ(z)q1(z, z, zi′)+(1−ujϕ(z))q0(z, zi , zi′).   (18)

Consider the following differential equation for a stochastic process, denoted by:

dz i φ ( t ) dt = z i [ z i ( t ) q φ ( z φ ( t ) , z i , z i φ ( t ) ) - z i φ ( t ) q φ ( z φ ( t ) , z i φ ( t ) , z i ) ] . ( 19 )

Because of the global balance at an equilibrium point of limt→+∞0tzϕ(u)du/t, if exists, denoted by zϕ, dzϕ(t)/dt|zϕ(t)=zϕ=0. Let OPT represent the optimal solution of the relaxed problem (7) and recall that index represents the Whittle's index policy. Since uiindex(zindex)=uiOPT(zindex), following the proof of Theorem 2 in R. R. Weber and G. Weiss, “On an index policy for restless bandits” J. Appl. Probab., no. 3, pp. 637-648, September 1990, it can be determined that dzOPT(t)/dt|zOPT(t)=zindex=0 and zindex=zOPT, if both zindex and zOPT exist. The existence of zindex and zOPT will be discussed later.

For a small δ>0, define Rδ,ϕ to be the average reward rates during the time period that |zϕ(t)−zϕ|≤δ under policy ϕ, and

R m K = sup φ lim sup t + R ( X φ ( t ) ) K < +

is an upper bound of the absolute value of the reward rate divided by K. Then,

γ LR K - γ OR ( index ) K lim δ 0 lim t + 1 t 0 t [ R m K P { z OPT ( u ) - z OPT > δ } + R m K P { z index ( u ) - z index > δ } + R _ δ , OPT K P { z OPT ( u ) - z OPT δ } - R _ δ , index K P { z index ( u ) - z index δ } ] du . ( 20 )

The server farm is decomposed into {tilde over (K)} server groups, with number of servers in the ith group denoted by Ki, i=1, 2, . . . {tilde over (K)}. Then, K=Σδ=1KKi.

Based on the proof provided in R. R. Weber and G. Weiss, “On an index policy for restless bandits,” J. Appl. Probab., no. 3, pp. 637-648, September 1990, for any Ki=Ki0n, Ki0=1, 2, . . . , {tilde over (K)}, n=1, 2, . . . , δ>0 and ϕ is set to be either index or OPT,

lim n + lim t + 1 t 0 t P { Z φ ( u ) - z φ ( u ) > δ } du = 0 ( 21 )

Then, as n→+∞, the existence of an equilibrium point of limt→+∞0tZϕ(u)du/t leads to the existence of zϕ=limt→+∞0tzϕ(u)du/t (using the Lipschitz continuity of the right side of Equation (17) as a function of zϕ(t)). The following is obtained:

lim n + lim δ 0 ( R _ δ , OPT K - R _ δ , index K ) = 0 , and ( 22 ) lim n + ( γ LR K - γ OR ( index ) K ) = 0. ( 23 )

Finally, γOR(index)/K−γOR/K→0 as n→+∞, that is, MAIP (Whittle's index policy) approaches the optimal solution in terms of energy efficiency as the number of servers in each server group tends to infinity at the appropriate rate.

IV—Numerical Results

In this section, the performance of the MAIP method of the present embodiment is evaluated by extensive numerical results obtained by simulation. All results in the following are presented in the form of an observed mean from multiple independent runs of the corresponding experiment. The confidence intervals at the 95% level based on the Student's t-distribution are maintained within ±5% of the observed mean. For convenience of describing the results, given two numerical quantities x>0 and y>0, the relative difference of x to y is defined as (x−y)/υ.

In all experiments performed, a system of servers divided into three server groups were utilized. Servers in each server group i, i=1, 2, 3, have the same buffer size, service rate, and energy consumption rate, denoted by Bii, εx and εi0, respectively. This can be considered as a realistic setting since in practice a server farm is likely to comprise multiple servers of the same type purchased at a time. If not otherwise specified, in the present embodiment job sizes are assumed to be exponentially distributed. Recall that the job throughput is the average job departure rate (jobs per second), the power consumption is the average energy consumption rate (Watt), and the energy efficiency is the ratio of job throughput to power consumption (jobs per Watt second). Also, the average job size has been normalized to one (Byte).

A. Effectiveness of Idle Power

To demonstrate the effect of idle power on job assignment, the following compares the MAIP method with a baseline method. The baseline method used is the “Most energy-efficient available server first Neglecting Idle Power” (MNIP) job assignment method. As its name suggests, the MNIP method neglects idle power and hence treats εj0−0 for all j ∈ K in the process of selecting servers for job assignment. The following compares MAIP with MNIP in terms of energy efficiency, job throughput, and energy consumption rate under various system parameters.

For the set of experiments in FIGS. 4A-4C, each server group has 15 servers, and where Bi=1.0, εj0j=0.4i−0.3 for i=1, 2, 3. μj and εj are randomly generated with μ1=6.80, ε1=6.86, μ2=3.64, ε23.72, μ3=2.87, and ε3=3.15. The normalized offered traffic ρ is varied from 0.01 to 0.9.

FIGS. 4A-4C show, by simulation, the comparison of MAIP method to MNIP method in terms of energy efficiency (FIG. 4A), job throughput (FIG. 4B), and energy consumption rate (FIG. 4C). It can be observed from FIG. 4B that in all cases, both methods have almost the same performance in job throughput. It can also be observed from FIGS. 4A and 4C that, in the case where ρ→0 and ρ→1, the two methods are close to each other in terms of both energy efficiency and power consumption. This may be because in such trivial and extreme cases the system is at all times either almost empty or almost fully occupied. However, in the realistic cases where ρ is not too large and not too small, MAIP significantly outperforms MNIP with a gain of over 45% in energy efficiency at ρ=0.4, 0.5.

The same settings as that for obtaining the results in FIGS. 4A-4C are used to obtain FIGS. 5A-5C, except that for FIGS. 5A-5C, ρ is fixed at 0.6 and the number of servers K is varied from 3 to 690. In this example, K is increased by increasing the number of servers in each of the three server groups. It can be observed from FIG. 5B that, under such a medium traffic load, the service capacity is sufficiently large, so that with both MAIP and MNIP almost all jobs can be admitted and hence the job throughput is almost identical to the arrival rate for all values of K. As a result, the job throughput increases almost linearly with respect to the number of servers K. On the other hand, it can be observed from FIG. 5C that, for both methods, the power consumption also increases almost linearly with respect to the number of servers K. As seen in FIG. 5A, however, the power consumption of MAIP increases at a significantly smaller rate than that of MNIP, which results in a substantial improvement of the energy efficiency by nearly 36% in all cases.

For the set of experiments for FIGS. 6A-6C, each server group has 15 servers, and where Bi=10 for i=1, 2, 3. A parameter β is introduced, where different values of β lead to different levels of server heterogeneity. In the present example, three different values for β are considered, i.e., β=0.5, 1, 1.5. Set εi0i=(0.4i−0.3) β for i=1, 2, 3. The set of service rates μj are randomly generated from the range [1, 10] and are arranged in a non-increasing order, i.e. μ1≥μ2≥μ3. For the set of energy consumption rates εj, two real numbers α1 and α2 are chosen randomly from [0.5, 1]. Then, with μ11=200, set μiii−1βμi−1i−1 for i=2, 3.

The results of FIGS. 6A-6C are obtained from 1000 experiments and are plotted in the form of cumulative distribution of the relative difference of MAIP to MNIP in terms of the energy efficiency. It can be observed from FIGS. 6A-6C that MAIP significantly outperforms MNIP by up to 60%. It can also be observed from FIGS. 6A and 6B that MAIP outperforms MNIP by more than 10% in nearly 100% of the experiments for the case β=0.5. In addition, it can be observed from FIGS. 6A-6C that as the level of server heterogeneity (i.e., the value of β) becomes higher, the performance improvement of MAIP over MNIP in general becomes larger, although the gain is decreasing when the normalized offered traffic ρ approaches 0.8, similar to what is observed in FIG. 4A.

B. Effect of Jockeying Cost

MAIP in the present embodiment is designed as a non-jockeying policy, which is more appropriate than jockeying policies for job assignment in a large-scale server farm. In general, jockeying policies suit a small server farm where the cost associated with jockeying is negligible. In large-scale systems, the cost associated with jockeying can be significant and may have a snowball effect on the system performance. The following demonstrates the benefits of MAIP in a server farm where jockeying costs are high, by comparing it with a jockeying policy known as Most Energy Efficient Server First (MEESF) proposed in J. Fu, J. Guo, E. W. M. Wong, and M. Zukerman, “Energy-efficient heuristics for insensitive job assignment in processor sharing server farms,” IEEE J. Sel. Areas Commun., vol. 33, no. 12, pp. 2878-2891, December 2015.

The settings of servers in each of the three server groups are based on the benchmark results of Dell PowerEdge rack servers R610 (August 2010), R620 (May 2012) and R630 (April 2015). Specifically, μ3 and ε3 is normalised to one, and the following settings are applied: μ13=3.5, ε13=1.2, ε101=0.2, μ23=1.4, ε22=0.2, and ε303=0.3. Also set Bj=10 for i=1, 2, 3, and ρ=0.6. The number of servers K is varied from 3 to 270, where K is increased by increasing the number of servers in each of the three server groups.

In the present example, assume that each jockeying action incurs a (constant) delay Δ. That is, when a job is reassigned from server i to server j, it will be suspended for a period Δ before resumed on server j. Clearly, when Δ>0, this is equivalent to increasing the size of the job and hence its service requirement. Accordingly, for a given system, a non-zero cost per jockeying action indeed increases the traffic load. In the present example, three different values of Δ are considered. The case where Δ=0 is for zero jockeying cost, the case where Δ=0.0005 indicates a relatively small cost per jockeying action, and the case where Δ=0.01 represents a large cost per jockeying action. The results are presented in FIGS. 7A-7C for the energy efficiency and in FIGS. 8A-8C for the job throughput.

For the case where Δ=0, it can be observed that FIG. 8A is similar to FIG. 5B. That is, under a medium traffic load, the service capacity is sufficiently large, so that both MAIP and MEESF yield a job throughput that is almost identical to the arrival rate for all values of K. It can be observed from FIG. 7A that, in this case, MEESF consistently outperforms MAIP in terms of the energy efficiency, even though for a very small margin.

For the case where Δ=0.0005, it can be observed from FIG. 8B that, since the cost per jockeying action is relatively small, the service capacity is sufficiently large so that the job throughput of MEESF is not affected. It can also be observed from FIG. 7B that, when the number of servers K is small, the energy efficiency of MEESF is better than that of MAIP, but when the number of servers K is large, the energy efficiency of MEESF is clearly degraded. This is because, as K increases, the number of jockeying actions required for a job on average increases. With a non-zero cost per jockeying action, it can substantially increase the power consumption, since it is required to use more of those less energy-efficient servers to meet the increased traffic load.

The effect is more profound when Δ is increased to 0.01. In this case, as shown in FIGS. 7C and 8C, the cost associated with jockeying is so high that both the job throughput and the energy efficiency of MEESF are significantly degraded, due to the substantially increased traffic load.

C. Sensitivity to Job-Size Distributions

The workload characterizations of many computer science applications, such as Web file sizes, IP flow durations, and the lifetimes of supercomputing jobs, are shown to exhibit heavy-tailed Pareto distributions. To determine whether the performance of MAIP is sensitive to the job-size distribution, three different distributions, in addition to the exponential distribution, are considered in the following. These distributions are deterministic, Pareto with the shape parameter set to 2.001 (Pareto-1 for short), and Pareto with the shape parameter set to 1.98 (Pareto-2 for short). In all cases, the mean was set to be one.

The same settings as that for obtaining the results in FIGS. 6A-6C are used to obtain FIGS. 9A-9C, with β=1. In each experiment, the energy efficiencies for MAIP in each case are obtained, and the relative differences of the one using each corresponding distribution to the one using the exponential distribution are computed. FIGS. 9A-9C plot the cumulative distribution of the relative difference results obtained from the 1000 experiments for each particular value of the normalized offered traffic. It can be observed from FIGS. 9A-9C that all relative difference results are between −5% and 0. As the confidence intervals of these simulation results are maintained within ±5% of the observed mean with a 95% confidence level, the energy efficiency of MAIP is not very sensitive to job-size distribution.

V—Conclusion

The embodiments of the MAIP job assignment method in the present invention as broadly described above address job assignment problem in a server farm comprising multiple processor sharing servers with different service rates, energy consumption rates and buffer sizes. The MAIP method in embodiments of the present invention takes into account of idle power, and can maximize the energy efficiency of the entire system, defined as the ratio of the long-run average throughput to the long-run average energy consumption rate, by effectively assigning jobs/requests to these servers.

Advantageously, the MAIP method only requires information of full/non-full states of servers, and can be implemented by using a binary variable for each server. Also, this method does not require any estimation or prediction of average arrival rate. MAIP has been proven to approach optimality as the numbers of servers in server groups tend to infinity and when job sizes are exponentially distributed. This asymptotic property is particularly appropriate to a large-scale server farm that is likely to purchase and upgrade a large number of servers with the same style and attributes at the same time. Also, the MAIP method is highly energy efficient in cases of exponential and Pareto job-size distributions, and so it is suitable for a server farm with highly varying job sizes. MAIP is also more appropriate than MEESF for a server farm with non-zero jockeying cost, and it is useful for a real large-scale system which has significant cost for job reassignment. Various other advantages of the methods of the present invention can be determined by a person skilled in the art upon considering the above description and the referenced drawings.

Although not required, the embodiments described with reference to the Figures can be implemented as an application programming interface (API) or as a series of libraries for use by a developer or can be included within another software application, such as a terminal or personal computer operating system or a portable computing device operating system. Generally, as program modules include routines, programs, objects, components and data files assisting in the performance of particular functions, the skilled person will understand that the functionality of the software application may be distributed across a number of routines, objects or components to achieve the same functionality desired herein.

It will also be appreciated that where the methods and systems of the present invention are either wholly implemented by computing system or partly implemented by computing systems then any appropriate computing system architecture may be utilized. This will include stand-alone computers, network computers and dedicated hardware devices. Where the terms “computing system” and “computing device” are used, these terms are intended to cover any appropriate arrangement of computer hardware capable of implementing the function described.

It will be appreciated by persons skilled in the art that numerous variations and/or modifications may be made to the invention as shown in the specific embodiments without departing from the spirit or scope of the invention as broadly described. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive.

Any reference to prior art contained herein is not to be taken as an admission that the information is common general knowledge, unless otherwise indicated.

Claims

1. A method for operating a server farm with a plurality of servers operably connected with each other, the method comprising the steps of:

receiving a job request of a computational task to be handled by the server farm;
determining, from the plurality of servers, one or more servers operable to accept the job request;
determining a respective effective energy efficiency value associated with at least the one or more servers; and
assigning the computational task to a server with the highest effective energy efficiency value;
wherein the effective energy efficiency value is defined by: a service rate of the respective server divided by a difference between an energy consumption rate value when the respective server is busy and an energy consumption rate value when the respective server is idle.

2. The method in accordance with claim 1, further comprising the step of:

sorting the one or more servers according to the respective determined effective energy efficiency values.

3. The method in accordance with claim 1, wherein the step of determining from the plurality of servers one or more servers operable to accept the job request comprises determining, from the plurality of servers, all servers operable to accept the job request.

4. The method in accordance with claim 1, wherein the plurality of servers cannot be powered off during operation of the server farm.

5. The method in accordance with claim 1, wherein assignment of computation tasks in the server farm is substantially independent of an arrival rate of computation tasks at the server farm.

6. The method in accordance with claim 1, wherein assignment of computation tasks in the server farm is substantially independent of a respective size of the computation tasks received at the server farm.

7. The method in accordance with claim 1, wherein the plurality of servers each includes a finite buffer for queuing job requests.

8. The method in accordance with claim 7, wherein the one or more servers operable to accept the job request each has at least one vacancy in their respective buffer.

9. The method in accordance with claim 1, wherein the server farm is heterogeneous in that the plurality of servers can have different server speeds, energy consumption rates, and/or buffer sizes.

10. The method in accordance with claim 1, wherein the server farm is a non-jockeying server farm in which computational task being handled by one of the plurality of servers cannot be reassigned to other servers.

11. A server farm comprising:

a plurality of servers operably connected with each other;
one or more processor operably connected with the plurality of server, the one or more processor being arranged to:
receive a job request of a computational task to be handled by the server farm;
determine, from the plurality of servers, one or more servers operable to accept the job request;
determine a respective effective energy efficiency value associated with at least the one or more servers; and
assign the computational task to a server with the highest effective energy efficiency value;
wherein the effective energy efficiency value is defined by: a service rate of the respective server divided by a difference between an energy consumption rate value when the respective server is busy and an energy consumption rate value when the respective server is idle.

12. The server farm in accordance with claim 11, wherein the one or more processor is further operable to:

sort the one or more servers according to the respective determined effective energy efficiency values.

13. The server farm in accordance with claim 11, wherein the one or more processor is further operable to: determine, from the plurality of servers, all servers operable to accept the job request.

14. The server farm in accordance with claim 11, wherein the plurality of servers cannot be powered off during operation of the server farm.

15. The server farm in accordance with claim 11, wherein the one or more processor is arranged such that assignment of computation tasks in the server farm is substantially independent of an arrival rate of computation tasks at the server farm.

16. The server farm in accordance with claim 11, wherein the one or more processor is arranged such that assignment of computation tasks in the server farm is substantially independent of a respective size of the computation tasks received at the server farm.

17. The server farm in accordance with claim 11, wherein the plurality of servers each includes a finite buffer for queuing job requests; and wherein the one or more servers operable to accept the job request each has at least one vacancy in their respective buffer.

18. The server farm in accordance with claim 11, wherein the server farm is heterogeneous in that the plurality of servers can have different server speeds, energy consumption rates, and/or buffer sizes.

19. The server farm in accordance with claim 11, wherein the server farm is a non-jockeying server farm in which computational task being handled by one of the plurality of servers cannot be reassigned to other servers.

20. The server farm in accordance with claim 11, wherein the one or more processors are incorporated in at least one of the plurality of servers.

Patent History
Publication number: 20180101416
Type: Application
Filed: Oct 11, 2016
Publication Date: Apr 12, 2018
Inventors: Jing Fu (New Territories), William Morgan (Balwyn), Jun Guo (New Territories), Moshe Zukerman (New Territories), Wing Ming Eric Wong (Kowloon)
Application Number: 15/290,106
Classifications
International Classification: G06F 9/50 (20060101);