MANAGING POOLS OF DYNAMIC RESOURCES

Info

Publication number: 20100083272
Type: Application
Filed: Oct 1, 2008
Publication Date: Apr 1, 2010
Patent Grant number: 9875141
Applicant: MICROSOFT CORPORATION (Redmond, WA)
Inventors: Joseph L. Hellerstein (Seattle, WA), Eric Lynn Eilebrecht (Woodinville, WA), Vance Morrison (Kirkland, WA), Paul Ringseth (Bellevue, WA)
Application Number: 12/243,859

Abstract

Computer systems attempt to manage resource pools of a dynamic number of similar resources and work tasks in order to optimize system performance. Work requests are received into the resource pool having a dynamic number of resource instances. An instance-throughput curve is determined that relates a number of resource instances in the resource pool to throughput of the work requests. A slope of a point on the instance-throughput curve is estimated with stochastic gradient approximation. The number of resource instances for the resource pool is selected when the estimated slope of the instance-throughput curve is zero.

Description

BACKGROUND

Resource pools can include a dynamic number of similar resources that are used to perform computing tasks or work requests. Each resource in the resource pool at a given time can be described as a resource instance, and the resource instances together make up the resource pool. Work requests enter the resource pool as a result of an application program interface call or by other means. When there are more work requests than resource instances available to do the requested work, as is often the case, the work requests are arranged in a work queue. As soon as a resource instance completes the work on a work request, the resource instance is made available for another work request in the queue. Examples of resource management systems include the .NET thread pool, server pools in the Tivoli Workload Scheduler and HP Global Workload Manager, and application instances in IBM WebSphere.

There are many examples of resource instances and work requests in computing. Such examples can include servers and applications. One particular example is the use of threads in a thread pool to process work requests in concurrent programs. Often, each thread can be assigned to a different processor or processing core. Also, multiple threads assigned to a single processor or processing core can be time multiplexed through the processor. A resource manager can be used to create threads when the amount of work is too high, or it can be used to destroy or make idle threads when the amount of work is too low.

There have been attempts to address the issue of finding the proper number of resource instances to perform a given set of tasks. One early model was to develop techniques that keep all of the processors fully utilized. In most cases, however, the designer will not know what resources will create a bottleneck in throughput and will not be able to compensate for it. System designers also use algorithms to determine whether resource instances, such as threads, can be created, made idle, or destroyed to optimize performance. Such algorithms typically focus primarily on the number of resource instances made available to perform the work requests in a queue.

In a typical model, however, the rate of work completion is related to both the nature of the work itself and the number of resource instances available, both of which can be unknown beforehand and can change constantly. Although many systems incorporate resource pools with resource instances that are managed dynamically, there remains a need to improve management in many of these systems.

SUMMARY

This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

One embodiment provides a method of managing a resource pool of dynamic resource instances. Work requests are received into the resource pool having a dynamic number of resource instances. An instance-throughput curve is determined that relates a number of resource instances in the resource pool to throughput of the work requests. A slope of a point on the instance-throughput curve is estimated with stochastic gradient approximation. The number of resource instances for the resource pool is selected when the estimated slope of the instance-throughput curve is zero.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings are included to provide a further understanding of embodiments and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments and together with the description serve to explain principles of embodiments. Other embodiments and many of the intended advantages of embodiments will be readily appreciated, as they become better understood by reference to the following detailed description. The elements of the drawings are not necessarily to scale relative to each other. Like reference numerals designate corresponding similar parts.

FIG. 1 is a block diagram illustrating one of many possible examples of computing devices implementing the features of the present disclosure.

FIG. 2 is a block diagram illustrating an example resource management system in the example computing system of FIG. 1.

FIG. 3 is a block diagram illustrating an example resource instance controller of the resource management system of FIG. 2.

FIG. 4 is a flow diagram illustrating an example method for use with the resource management system of FIG. 2.

DETAILED DESCRIPTION

In the following Detailed Description, reference is made to the accompanying drawings, which form a part hereof, and in which is shown by way of illustration specific embodiments in which the invention may be practiced. It is to be understood that other embodiments may be utilized and structural or logical changes may be made without departing from the scope of the present invention. The following detailed description, therefore, is not to be taken in a limiting sense, and the scope of the present invention is defined by the appended claims. It is also to be understood that features of the various exemplary embodiments described herein may be combined with each other, unless specifically noted otherwise.

FIG. 1 illustrates an exemplary computer system that can be employed as an operating environment and that includes a computing device, such as computing device 100. In a basic configuration, computing device 100 typically includes a processor architecture having at least two processing units (i.e., processors 102) and memory 104. Depending on the exact configuration and type of computing device, memory 104 may be volatile (such as random access memory (RAM)), non-volatile (such as read only memory (ROM), flash memory, etc.), or some combination of the two. This basic configuration is illustrated in FIG. 1 by dashed line 106. The computing device can take one or more of several forms. Such forms include a personal computer, a server, a handheld device, a consumer electronic device (such as a video game console), or another form.

Computing device 100 can also have additional features/functionality. For example, computing device 100 may also include additional storage (removable and/or non-removable) including, but not limited to, magnetic or optical disks or solid state memory, or flash storage devices such as removable storage 108 and non-removable storage 110. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any suitable method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Memory 104, removable storage 108 and non-removable storage 110 are all examples of computer storage media. Computer storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, universal serial bus (USB) flash drive, flash memory card, or other flash storage devices, or any other medium that can be used to store the desired information and that can be accessed by computing device 100. Any such computer storage media may be part of computing device 100.

Computing device 100 includes one or more communication connections 114 that allow computing device 100 to communicate with other computers/applications 115. Computing device 100 may also include input device(s) 112, such as keyboard, pointing device (e.g., mouse), pen, voice input device, touch input device, etc. Computing device 100 may also include output device(s) 111, such as a display, speakers, printer, etc.

The computing device 100 can be configured to run an operating system software program and one or more software applications, which make up a system platform. In one example, the computing device 100 includes a software component referred to as a managed environment. The managed environment can be included as part of the operating system or can be included later as a software download. The managed environment typically includes pre-coded solutions to common programming problems to aid software developers to create software programs such as applications to run in the managed environment, and it also typically includes a virtual machine that allows the software applications to run in the managed environment so that the programmers need not consider the capabilities of the specific processors 102.

FIG. 2 illustrates an example resource management system 200 that can operate within the computing device 100. The system 200 includes a resource pool 202 operably coupled to a type of resource management, for example a resource instance controller 204. The resource pool 202 includes a dynamic number of resource instances 206, such as the illustrated resource instances 1 to N, where the number of resource instances N can continuously change in response to work requests 208. Work requests 208 enter the resource pool 202 as a result of an application program interface call (such as QueueUserWorkItem() in the .NET ThreadPool, available from Microsoft Corp.), message receipt, or other means. The work requests 208 are placed in a work queue 210. A work request 208 waits in the work queue 210 until a resource instance 206 is available to execute the work request. Once the work request 208 is executed, a completed work 212 is passed from the resource pool 202.
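As an illustration only, the relationship between work requests 208, the work queue 210, and the resource instances 206 can be sketched in Python with a fixed number of threads draining a queue. The function name run_pool and its parameters are hypothetical and do not appear in the disclosure, and the sketch deliberately omits the dynamic adjustment of N.

```python
import queue
import threading


def run_pool(work_requests, instances):
    """Execute callables from a work queue using `instances` worker threads.

    Hypothetical helper: each thread plays the role of one resource instance.
    """
    work_queue = queue.Queue()
    for request in work_requests:
        work_queue.put(request)

    completed = []
    lock = threading.Lock()

    def worker():
        # A resource instance takes the next waiting request until none remain.
        while True:
            try:
                request = work_queue.get_nowait()
            except queue.Empty:
                return
            result = request()
            with lock:
                completed.append(result)

    threads = [threading.Thread(target=worker) for _ in range(instances)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return completed


# Ten work requests served by three resource instances.
results = run_pool([lambda i=i: i * i for i in range(10)], instances=3)
```

The order of the completed work depends on thread scheduling, so callers of this sketch should not rely on it.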

The resource instance controller 204 dynamically adjusts the number N of resource instances 206 based on two competing concerns. A first concern is that having too few resource instances 206 causes additional wait time for work requests 208. A second concern is that having too many resource instances 206 is inefficient because of the overhead for each resource instance 206. Also, too many resource instances can reduce throughput. For example, having too many threads can cause memory thrashing, excessive context switching, or both. In order for the resource instance controller 204 to dynamically adjust the number N of resource instances 206 in the example, the resource pool 202 provides measurement information to the resource instance controller 204 through a measurement interface 214. Such measurement information can include work queue counts, actual resource instances, and throughput, which can be defined as the number of work completions per unit of time in one example. The resource instance controller 204 dynamically adjusts the desired number of resource instances 206 (i.e., provides control settings) through a control interface 216 on the resource pool 202.
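One possible shape for the measurement interface 214 and the control interface 216 is sketched below in Python. The class names, field names, and method names are assumptions chosen for illustration; the disclosure specifies only that the measurement information can include work queue counts, actual resource instances, and throughput.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One sample from the measurement interface (field names are hypothetical)."""
    queue_length: int        # work requests waiting in the work queue
    actual_instances: int    # resource instances currently in the pool
    completions: int         # work requests completed during the interval
    interval_seconds: float  # length of the measurement interval

    @property
    def throughput(self) -> float:
        """Work completions per unit of time."""
        return self.completions / self.interval_seconds


class ResourcePool:
    """Toy pool exposing a measurement interface and a control interface."""

    def __init__(self, instances: int) -> None:
        self.desired_instances = instances
        self.actual_instances = instances
        self.queue_length = 0
        self.completions = 0

    def measure(self, interval_seconds: float) -> Measurement:
        """Measurement interface: report counts and reset the completion counter."""
        sample = Measurement(self.queue_length, self.actual_instances,
                             self.completions, interval_seconds)
        self.completions = 0
        return sample

    def set_desired_instances(self, n: int) -> None:
        """Control interface: record the new control setting (at least one)."""
        self.desired_instances = max(1, n)


pool = ResourcePool(instances=4)
pool.completions = 120
sample = pool.measure(interval_seconds=0.5)   # 120 completions in 0.5 s
pool.set_desired_instances(6)
```

Keeping the desired and actual instance counts as separate fields reflects that the pool may take time to reach a new control setting.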

The resource instance controller 204 receives the measurement data and implements a process to dynamically adjust the desired number of resource instances. This process faces the problem of scheduling without a detailed knowledge of the resources involved. One extreme approach to solving this problem is to assume knowledge of the resources involved, which is difficult to implement because this information is difficult to obtain in practice. Another extreme is to perform extensive searching, such as recursive searching, to determine the knowledge of the resources involved. Despite this effort, there is no way to obtain this information in a reliable way.

One embodiment of resource instance controller 204 makes an assumption that there is a unimodal relationship between the number N of resource instances 206 and throughput, a relationship referred to as the instance-throughput curve. A unimodal instance-throughput curve is broadly consistent with observed data for throughputs in virtual memory systems in general and in the .NET thread pool in particular. The resource instance controller 204 can exploit the unimodal structure of the instance-throughput curve to determine a new control setting that is provided to the resource pool 202.

Given the discovered nature of the instance-throughput curve, one example of the resource instance controller 204 uses stochastic gradient approximation to exploit the unimodal structure of the instance-throughput curve. The resource instance controller in this example employs stochastic gradient approximation to optimize the number of resource instances 206 in the resource pool 202. Stochastic gradient approximation optimizes a stochastic function such as the instance-throughput curve using deterministic techniques. With stochastic gradient approximation, the resource instance controller 204 uses the measurement information from the measurement interface 214 to estimate the slope of the instance-throughput curve at a point corresponding with the number N of resource instances 206 in the resource pool 202. The resource instance controller 204 makes adjustments to the number N of resource instances and re-estimates the slope of the instance-throughput curve until the slope is estimated to be zero. A zero slope indicates that the number N of resource instances 206 optimizes throughput.
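A minimal Python sketch of one slope estimate and one adjustment follows. It assumes a finite-difference slope between two control settings and a fixed gain; the function names, the gain value, and the bounds n_min and n_max are hypothetical choices for illustration and are not specified in the disclosure.

```python
def estimate_slope(n_prev, tput_prev, n_curr, tput_curr):
    """Finite-difference slope of throughput with respect to instance count."""
    if n_curr == n_prev:
        raise ValueError("two distinct control settings are required")
    return (tput_curr - tput_prev) / (n_curr - n_prev)


def next_setting(n_curr, slope, gain=0.2, n_min=1, n_max=1000):
    """Move the instance count in the direction of the estimated slope.

    `gain` scales the slope into a step; a zero step leaves the setting as is.
    """
    step = round(gain * slope)
    return max(n_min, min(n_max, n_curr + step))


slope = estimate_slope(4, 100.0, 6, 130.0)   # throughput rose by 15 per instance
new_n = next_setting(6, slope)               # positive slope, so add instances
```

A positive slope increases N, a negative slope decreases it, and a slope that rounds to a zero step leaves N unchanged, which corresponds to the stopping condition described above.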

The number of resource instances at the point on the instance-throughput curve where the slope is zero indicates resource optimization under the empirically supported assumption that the curve is unimodal. One type of unimodal curve is concave. Stochastic gradient approximation is employed in one embodiment because throughput is largely stochastic rather than deterministic as resource instances are added or taken away.

The effectiveness of stochastic gradient approximation largely depends on accurately estimating the slope of the instance-throughput curve. Several problems with accurately estimating the slope have been determined to potentially arise in practice. A first known problem is that measurements can be stationary but have high variability due to variations in execution times, the number of work completions, and other factors. A second known problem is that the shape of the instance-throughput curve can change over time due to changes in workloads, resources, or both. A third known problem is that the resource instance controller 204 itself introduces variability in that it changes the number of resource instances to maximize throughput.

FIG. 3 illustrates an example resource instance controller 204 that can be employed to apply stochastic gradient approximation and address at least the above three problems that can arise in employing stochastic gradient approximation. The resource instance controller 204 includes a state 302, which is used to retain information to be used in invocations of the resource instance controller. The state 302 is coupled to several components such as an input filter 304, a condition checker 306, a state updater 308, and an action taker 310.

The input filter 304 is coupled to the measurement interface 214 of the resource pool 202, and receives the measurement information. The input filter 304 includes a mechanism to avoid using measurements when the resource pool is in transition. The input filter 304 compares the desired number of resource instances 206 specified in the last control action with the actual number of resource instances as measured through the resource pool measurement interface 214. If the actual number of resource instances is less than the desired number of resource instances and the work queue 210 is not empty, then the measurement is discarded because the resource pool 202 is in transition to the desired number of resource instances. The measurement is also discarded if the desired number of resource instances is less than the actual number of resource instances. Thus, the mechanism of the input filter 304 is used to address the problem that the resource instance controller 204 introduces variability in that it changes the number of resource instances to maximize throughput.
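The two discard rules of the input filter 304 can be written as a short predicate. The following Python sketch is illustrative only; the function name in_transition and its parameter names are hypothetical.

```python
def in_transition(desired: int, actual: int, queue_length: int) -> bool:
    """Return True when a measurement should be discarded.

    Rule 1: the pool is still growing toward the desired count while
            work is waiting in the queue.
    Rule 2: the pool still holds more instances than desired.
    """
    if actual < desired and queue_length > 0:
        return True
    if desired < actual:
        return True
    return False


# Still ramping up from 4 to 6 instances with queued work: discard.
discard_growing = in_transition(desired=6, actual=4, queue_length=10)
# Pool has settled at the desired count: keep the measurement.
discard_settled = in_transition(desired=6, actual=6, queue_length=10)
```

Note that a pool below its desired size with an empty queue is not treated as in transition, which follows the rule as stated above.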

The input filter 304 can also include at least two additional mechanisms. One mechanism detects changes in the instance-throughput curve. The mechanism tests if a throughput measurement, such as work completions, measured at a time t_{n+1} is significantly different from measurements at times t_1, ..., t_n at the same control setting, such as desired resource instances. In one example, the mechanism uses the techniques of change-point detection, which is a statistical technique used in manufacturing, to detect changes in the instance-throughput curve. Another mechanism removes additional transients due to control actions. In practice, there may be a warm-up or cool-down period in the resource pool 202 even after it has instantiated the desired number of resource instances. Accordingly, more recent throughput measurements can differ significantly from earlier measurements for the same control setting. The input filter 304 can eliminate, or reduce the significance of, the earlier measurements. Thus, the additional two mechanisms in the input filter are used to address the problem that the shape of the instance-throughput curve can change over time due to changes in workloads, resources, or both.
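The disclosure does not specify which change-point test is used. As a simplified stand-in, the Python sketch below flags a new throughput measurement when it lies more than a chosen number of sample standard deviations from the mean of earlier measurements at the same control setting. The function name is_change_point and the threshold of 3.0 are assumptions for illustration.

```python
from statistics import mean, stdev


def is_change_point(history, new_value, threshold=3.0):
    """Flag `new_value` if it deviates strongly from earlier measurements.

    `history` holds throughput measurements at times t_1, ..., t_n for one
    control setting; `new_value` is the measurement at t_{n+1}.
    """
    if len(history) < 2:
        return False              # too little data to judge a change
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0.0:
        return new_value != mu    # any deviation from a constant history
    return abs(new_value - mu) / sigma > threshold


history = [100.0, 102.0, 98.0, 101.0, 99.0]
ordinary = is_change_point(history, 101.0)   # within normal variation
shifted = is_change_point(history, 140.0)    # far outside normal variation
```

A production controller would likely use a more robust statistical test; this sketch shows only the comparison of one new measurement against the earlier ones.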

The condition checker 306 is coupled to the input filter 304. The condition checker implements a state machine to determine how the resource instance controller 204 adjusts the number of resource instances 206. In one example, the state machine includes an initializing state and a climbing state. In the initializing state, the resource instance controller 204 establishes one point for a first tangent line, which is tangent to the instance-throughput curve. In the climbing state, the resource instance controller 204 establishes a second point for a second tangent line on the instance-throughput curve. The states are determined by the number of control settings for which there are throughput measurements. For the initializing state, there is only one current control setting. For the climbing state, there is a current and previous control setting. If the resource instance controller 204 is in the climbing state and there is a significant difference between the current and previous control setting, then stochastic gradient approximation is employed to establish a new current control setting.

The condition checker 306 can be employed to address at least two of the known problems described above. The problem of measurements being stationary but having high variability due to variations in execution times, the number of work completions, and other factors is addressed because the resource instance controller 204 can remain in the initializing state until there is a sufficiently small variance for the mean throughput at the current control setting. The resource instance controller does not attempt stochastic gradient approximation until there is a sufficiently small variance for the mean throughput at the current control setting. The condition checker 306 can also be used to address that the shape of the instance-throughput curve can change over time because the resource instance controller 204 can delete all history and return to the initializing state when it detects a change point.
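The two-state behavior of the condition checker 306 can be sketched as follows in Python. The stability test based on the relative standard error of the mean, the limit of 0.05, and the names State, mean_is_stable, and check_condition are all assumptions made for illustration; the disclosure states only that the variance of the mean throughput must be sufficiently small.

```python
from enum import Enum
from math import sqrt
from statistics import mean, stdev


class State(Enum):
    INITIALIZING = "initializing"   # only a current control setting is known
    CLIMBING = "climbing"           # current and previous settings are known


def mean_is_stable(samples, max_relative_error=0.05, min_samples=3):
    """True when the standard error of the mean is small relative to the mean."""
    if len(samples) < min_samples:
        return False
    mu = mean(samples)
    if mu == 0:
        return False
    standard_error = stdev(samples) / sqrt(len(samples))
    return standard_error / abs(mu) <= max_relative_error


def check_condition(current_samples, previous_samples, change_detected):
    """Return (state, ready), where `ready` means a control action may be taken."""
    if change_detected:
        # Caller is expected to delete all history and start over.
        return State.INITIALIZING, False
    state = State.CLIMBING if previous_samples else State.INITIALIZING
    return state, mean_is_stable(current_samples)


first = check_condition([100.0, 101.0, 99.0], [], change_detected=False)
noisy = check_condition([100.0, 160.0, 40.0], [90.0, 91.0], change_detected=False)
reset = check_condition([100.0, 101.0, 99.0], [90.0, 91.0], change_detected=True)
```

In this sketch the state is derived from whether a previous control setting has measurements, which mirrors the statement that the states are determined by the number of control settings with throughput measurements.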

The state updater 308 is coupled to the condition checker 306, and receives and provides information to the state 302. The state updater 308 includes a mechanism that updates information retained between invocations of the resource instance controller 204. This information includes measurement histories and control setting histories.

The action taker 310 is coupled to the state updater 308 and provides an output to the control interface 216 of the resource pool 202. The action taker 310 is used to determine the new control setting. In one example, the new control setting is calculated using stochastic gradient approximation. The calculation estimates the slope of the instance-throughput curve that relates resource instances to throughputs based on the throughput measurements in the most recent history, or previous history, and the current throughput measurement, or the current history. Employing stochastic gradient approximation, a new control setting is calculated. The action taker 310 also assesses the performance achieved at a control setting. If comparable performance is achieved at two different control settings, then the resource instance controller 204 takes an action that minimizes the number of resource instances 206.
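The rule of preferring fewer resource instances when performance is comparable can be sketched as below. What counts as "comparable" is not defined in the disclosure; this Python sketch assumes a relative tolerance of 2 percent, and the function name prefer_fewer_instances is hypothetical.

```python
def prefer_fewer_instances(n_a, tput_a, n_b, tput_b, tolerance=0.02):
    """Choose between two control settings given their mean throughputs.

    If the throughputs differ by no more than `tolerance` (relative to the
    larger one), the setting with fewer instances is chosen; otherwise the
    setting with higher throughput is chosen.
    """
    best = max(tput_a, tput_b)
    if abs(tput_a - tput_b) <= tolerance * best:
        return min(n_a, n_b)
    return n_a if tput_a > tput_b else n_b


# Throughputs of 200 and 201 are comparable, so the smaller pool is chosen.
tie_choice = prefer_fewer_instances(8, 200.0, 10, 201.0)
# Throughput of 130 clearly beats 100, so the larger pool is chosen.
clear_choice = prefer_fewer_instances(4, 100.0, 6, 130.0)
```

Choosing the smaller pool on a tie reduces the per-instance overhead without a meaningful loss of throughput.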

FIG. 4 illustrates an example method 400 of managing the resource pool 202 of resource instances 206, which method can be embodied in the resource instance controller 204 of FIG. 2. As work requests 208 are received into the resource pool 202, information regarding the resource pool is included in measurement information that is provided at 402 to the resource instance controller 204. In one example, the resource instance controller 204 is provided with periodic measurement information updates. The measurement information updates can be retained as a measurement history, which can include a sequence of updates of measurement information related to control settings. Examples of a measurement history can include measurement information related to a current control setting, a previous control setting, and so on.

Based on the measurement information, the resource instance controller 204 determines an instance-throughput curve at 404 relating the number of resource instances N in the resource pool 202 to throughput. In one example, the measurement history is used to determine the instance-throughput curve.

Stochastic gradient approximation is employed to estimate the slope of the curve, and is used to find the point where the slope is zero at 406. This point can be determined using the current control setting and the previous control setting. The point on the curve where the slope is zero corresponds with a selected number of resource instances estimated to maximize throughput. A new control setting is generated based on the selected number of resource instances at 408 and provided to the resource pool 202.
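The steps of method 400 can be exercised end to end against a simulated pool. The Python sketch below is an illustration under stated assumptions, not the implementation of the disclosure: the concave curve with its peak at 20 instances, the Gaussian noise, the gain of 1.0, the averaging of 25 samples per setting, and the names simulated_throughput, measure, and tune are all hypothetical.

```python
import random
from statistics import mean


def simulated_throughput(n, rng):
    """Hypothetical concave instance-throughput curve with its peak at n = 20."""
    return 100.0 - 0.25 * (n - 20) ** 2 + rng.gauss(0.0, 0.5)


def measure(n, rng, samples=25):
    """Average several noisy throughput samples at one control setting (402)."""
    return mean(simulated_throughput(n, rng) for _ in range(samples))


def tune(start, rng, gain=1.0, max_rounds=50):
    """Adjust the instance count until the estimated slope yields a zero step."""
    prev_n, prev_tput = start, measure(start, rng)
    curr_n = start + 1                        # exploratory second setting
    for _ in range(max_rounds):
        curr_tput = measure(curr_n, rng)
        # Two points on the curve give a slope estimate (404, 406).
        slope = (curr_tput - prev_tput) / (curr_n - prev_n)
        step = round(gain * slope)
        if step == 0:                         # slope estimated to be zero
            break
        new_n = max(1, curr_n + step)         # new control setting (408)
        if new_n == curr_n:                   # clamped at the lower bound
            break
        prev_n, prev_tput = curr_n, curr_tput
        curr_n = new_n
    return curr_n


final_from_below = tune(5, random.Random(42))
final_from_above = tune(30, random.Random(7))
print(final_from_below, final_from_above)
```

Because the measurements are noisy, the loop is expected to settle near, rather than exactly at, the instance count that maximizes throughput.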

Although specific embodiments have been illustrated and described herein, it will be appreciated by those of ordinary skill in the art that a variety of alternate and/or equivalent implementations may be substituted for the specific embodiments shown and described without departing from the scope of the present invention. This application is intended to cover any adaptations or variations of the specific embodiments discussed herein. Therefore, it is intended that this invention be limited only by the claims and the equivalents thereof.

Claims

1. A computer-readable storage medium storing computer-executable instructions for controlling a computer system, the computer-executable instructions comprising:

a resource instance controller configured to:

manage a resource pool of dynamic resource instances in response to an amount of work requests;

retain a measurement history of a current control setting including:

a current throughput measurement for the resource pool, and a previous control setting including a previous throughput measurement for the resource pool;

relate the resource instances in the resource pool to throughput of the resource pool as an instance-throughput curve;

assume a unimodal structure of the instance-throughput curve;

exploit the unimodal structure of the instance-throughput curve to compute a new control setting for the resource pool, wherein the new control setting is computed with the current throughput measurement and the previous throughput measurement; and

provide the new control setting to the resource pool.

2. The computer readable storage medium of claim 1 wherein the resource instance controller interfaces with the resource pool through a measurement interface configured to provide measurement information to the resource instance controller and through a control interface configured to provide the new control setting to the resource pool.

3. The computer readable storage medium of claim 2 wherein the measurement information includes work queue counts, actual resource instances, and throughput.

4. The computer readable storage medium of claim 2 wherein the resource instance controller comprises:

an input filter configured to receive the measurement information and to avoid using the measurement information when the resource pool is in transition.

5. The computer readable storage medium of claim 4 wherein the input filter is configured to detect changes in the instance-throughput curve.

6. The computer readable storage medium of claim 5 wherein the input filter is configured to detect changes in the instance-throughput curve with change point detection.

7. The computer readable storage medium of claim 2 wherein the resource instance controller comprises:

a condition checker configured to implement a state machine having an initializing state and a climbing state.

8. The computer readable storage medium of claim 1 wherein stochastic gradient approximation is used to exploit the unimodal structure of the instance-throughput curve to compute the new control setting.

9. The computer readable storage medium of claim 8 wherein the resource instance controller establishes in the initializing state a first point in a line tangent to the curve relating resource instances in the resource pool to throughput of the resource pool, and wherein the resource instance controller establishes in the climbing state a second point for a second line tangent to the curve.

10. A method of managing a resource pool of dynamic resource instances, the method comprising:

receiving a plurality of work requests into the resource pool having a dynamic number of resource instances;

determining an instance-throughput curve relating a number of resource instances in the resource pool to throughput of the work requests;

estimating a slope of a point on the instance-throughput curve with stochastic gradient approximation; and

selecting the number of resource instances when the estimated slope of the instance-throughput curve is zero for the resource pool.

11. The method of claim 10 wherein the plurality of work requests are placed in a work queue.

12. The method of claim 10 wherein the dynamic resource instances include similar resource instances.

13. The method of claim 12 wherein the similar resource instances include threads in a thread pool.

14. The method of claim 13 wherein the dynamic number of threads in the thread pool are added and taken away as a result of creating threads and destroying threads.

15. The method of claim 10 comprising:

assuming the instance-throughput curve is unimodal.

16. The method of claim 15 wherein the assumed unimodal instance-throughput curve is concave.

17. The method of claim 10 wherein the estimated slope of the instance-throughput curve being zero indicates that the number of resource instances optimizes throughput.

18. A managed environment operating on a computing device and configured to operate an application to manage a pool of dynamic resource instances, the managed environment comprising:

a resource pool including a dynamic number of resource instances having a work queue configured to receive a plurality of work requests, the resource pool configured to provide measurement information including throughput of the work requests related to a number of resource instances in the resource pool; and

a resource instance controller configured to receive the measurement information and to apply stochastic gradient approximation to manage the number of resource instances in the resource pool.

19. The managed environment of claim 18 wherein the resource instance controller calculates a throughput to instance curve based on the measurement information and applies the stochastic hill climbing on the throughput to instance curve.

20. The managed environment of claim 18 wherein applying stochastic gradient approximation optimizes the number of resource instances in the resource pool.