Kernel Policy Optimization for Computing Workloads

Info

Publication number: 20170262314
Type: Application
Filed: Mar 8, 2016
Publication Date: Sep 14, 2017
Patent Grant number: 10228973
Inventors: Yan Cui (Los Angeles, CA), Karthik Prasanna (Santa Monica, CA), Andres Rangel (Santa Monica, CA)
Application Number: 15/064,355

Abstract

In one embodiment, during execution of a current workload being processed by the kernel, the method searches policy storage for a similar workload that has been previously optimized, wherein the sensor is used to compare the current workload to workloads in the policy storage. When the similar workload is found, the method optimizes different parameter values in a parameter value range specified in the policy specification for the parameter while the current workload is being processed by the kernel, wherein a parameter setting for the similar workload is used in the optimizing. When the similar workload is not found, the method optimizes different parameter values based on the parameter value range specified in the policy specification for the parameter while the current workload is being processed by the kernel. The method then evaluates the optimizing based on the optimization target.

Description

BACKGROUND

In computer systems, kernels may implement a small, fixed set of policies, such as disk input/output (I/O) scheduling policies. To provide flexibility, the kernel can expose many parameters that allow users to select a policy or adjust a specific setting of the policy. Adjustable parameters allow users to optimize the policy to achieve a desired performance. However, typically, users lack the domain knowledge of how to adjust the parameters optimally. In this case, most users often use default parameter settings, which may not be optimal for performing the workloads being processed by the computer system.

SUMMARY

In one embodiment, a method receives a specification that specifies a parameter to be optimized, a sensor that is used to evaluate workloads performed by a kernel of the computing device, and an optimization target. During execution of a current workload being processed by the kernel, the method searches policy storage for a similar workload that has been previously optimized, wherein the sensor is used to compare the current workload to workloads in the policy storage. When the similar workload is found, the method optimizes different parameter values in a parameter value range specified in the policy specification for the parameter while the current workload is being processed by the kernel, wherein a parameter setting for the similar workload is used in the optimizing. When the similar workload is not found, the method optimizes different parameter values based on the parameter value range specified in the policy specification for the parameter while the current workload is being processed by the kernel. The method then evaluates the optimizing based on the optimization target.

In one embodiment, a non-transitory computer-readable storage medium contains instructions, that when executed, control a computer system to be configured for: receiving a specification that specifies a parameter to be optimized, a sensor that is used to evaluate workloads performed by a kernel of the computing device, and an optimization target; during execution of a current workload being processed by the kernel, searching policy storage for a similar workload that has been previously optimized, wherein the sensor is used to compare the current workload to workloads in the policy storage; when the similar workload is found, optimizing different parameter values in a parameter value range specified in the policy specification for the parameter while the current workload is being processed by the kernel, wherein a parameter setting for the similar workload is used in the optimizing; when the similar workload is not found, optimizing different parameter values based on the parameter value range specified in the policy specification for the parameter while the current workload is being processed by the kernel; and evaluating the optimizing based on the optimization target.

In one embodiment, an apparatus includes: one or more computer processors; and a non-transitory computer-readable storage medium comprising instructions, that when executed, control the one or more computer processors to be configured for: receiving a specification that specifies a parameter to be optimized, a sensor that is used to evaluate workloads performed by a kernel of the computing device, and an optimization target; during execution of a current workload being processed by the kernel, searching policy storage for a similar workload that has been previously optimized, wherein the sensor is used to compare the current workload to workloads in the policy storage; when the similar workload is found, optimizing different parameter values in a parameter value range specified in the policy specification for the parameter while the current workload is being processed by the kernel, wherein a parameter setting for the similar workload is used in the optimizing; when the similar workload is not found, optimizing different parameter values based on the parameter value range specified in the policy specification for the parameter while the current workload is being processed by the kernel; and evaluating the optimizing based on the optimization target.

The following detailed description and accompanying drawings provide a better understanding of the nature and advantages of particular embodiments.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 depicts a simplified system for optimizing parameters for workloads being processed by a kernel according to one embodiment.

FIG. 2 depicts a more detailed example of an optimizer according to one embodiment.

FIG. 3A depicts an example of a data structure for a parameter specification that can be stored in a policy cache according to one embodiment.

FIG. 3B shows an example of the data structure that has been annotated and stored in the policy cache according to one embodiment.

FIG. 4 shows a more detailed example of the policy cache and the search engine according to one embodiment.

FIG. 5 depicts a simplified flowchart of a method for optimizing parameters according to one embodiment.

DETAILED DESCRIPTION

Described herein are techniques for a kernel policy optimization system. In the following description, for purposes of explanation, numerous examples and specific details are set forth in order to provide a thorough understanding of particular embodiments. Particular embodiments as defined by the claims may include some or all of the features in these examples alone or in combination with other features described below, and may further include modifications and equivalents of the features and concepts described herein.

Particular embodiments provide an optimizer that can optimally generate parameter settings while processing workloads of a computer system. The optimization may be performed in a kernel, such as a monolithic kernel, of the computer system. The optimizer may receive a policy specification via an application programming interface (API). The policy specification defines how the optimizer can tune parameters for a workload. For example, the optimizer may select a policy and also tune the parameter settings for the policy when a workload is processed by the kernel. The optimizer also uses a policy cache that can store previous parameter settings for past workloads. The policy cache may store settings that have been fully optimized and can be reused. Also, the policy cache may store intermediate results that include settings that may not have been completely optimized to achieve the optimal parameter settings. However, by storing these intermediate results, the next time a similar workload runs, the optimizer can resume the optimization. Also, the optimizer uses a hierarchical search engine that can perform the optimization by searching for good parameter settings. In one embodiment, the hierarchical search engine operates at two levels where it first detects which subsystem is bottlenecked and then searches for a good parameter setting for that subsystem.

FIG. 1 depicts a simplified system 100 for optimizing parameters for workloads being processed by a kernel 102 according to one embodiment. System 100 includes a computer system 104 that includes kernel 102. Kernel 102, which may be referred to as an operating system, is processing operations from an application 106 that is being executed on kernel 102. Although one application is shown, it will be understood that kernel 102 may be processing operations for many applications 106. Additionally, computer system 104 includes hardware 112 (e.g., computer processors, storage, etc.) that kernel 102 runs on. Also, although one computer system 104 is shown, multiple computer systems may be optimized.

In one embodiment, kernel 102 may be a monolithic kernel. A monolithic kernel may be an operating system architecture where the entire operating system is working in the kernel space. The monolithic kernel can dynamically load and unload executable modules at runtime. In this case, one or more modules for an optimizer 110 are loaded into kernel 102. Although a monolithic kernel is described, other kernel architectures may be used, such as a microkernel. Modules can be loaded into either type of kernel.

Kernel 102 may implement a set of policies, which may be fixed. When policies are fixed, allowing the parameters of the policies to be adjusted allows the performance of computer system 104 to be optimized. For example, policies may include I/O scheduling policies. The scheduling policies are used when kernel 102 schedules operations based on application 106 executing on kernel 102. In one example, scheduling policies include the deadline, completely fair queuing (CFQ), and NOOP policies. These policies may have between 0 and 12 parameters, each of which can have different values. The policies may be kernel level or operating system level policies. In one embodiment, the policies at the kernel level are optimized, and not parameters at the application level.

Kernel 102 includes an optimizer 110 that can automatically optimize parameters for kernel 102. The parameters may select which policies are used and also be the parameters for a selected policy. Optimizer 110 optimizes parameters while workloads are being performed by kernel 102. A workload may be an application that is executing, such as an I/O scheduling operation for application 106. Optimizer 110 may optimize parameters for kernel 102 when the workload is executing. Optimizer 110 may first select a policy for the workload and then optimize the parameter values for the policy. If multiple policies do not exist, then optimizer 110 may just optimize parameter values. In one embodiment, optimizer 110 may select multiple policies and/or test multiple parameter values before selecting the optimal policy and parameter settings. This may also be referred to as a tuning process.

The different parts of optimizer 110 will now be described in more detail. FIG. 2 depicts a more detailed example of optimizer 110 according to one embodiment. Optimizer 110 includes an application programming interface (API) 202, parameters 204, a policy cache 206, and a search engine 208. Each of these entities may be kernel modules stored in kernel 102.

API 202 allows users using application 106 to describe how parameters can be optimized when workloads are executing. For example, using API 202, users can define metadata of the parameters that annotate the parameters with additional information, such as where the parameters are stored in memory (e.g., such that optimizer 110 can modify them) and the value ranges of the parameters. Optimizer 110 uses the value ranges to limit the search through different parameter values. Also, users can define sensors that capture the characteristics of the workloads, such as the average I/O request size, which optimizer 110 uses to identify workloads. Users may define optimization targets that optimizer 110 uses to measure and evaluate the performance of computer system 104; for example, the number of I/O requests per unit of time can be used to measure the I/O scheduling performance. API 202 allows users to pass their domain knowledge to optimizer 110 such that it can automatically optimize the parameters during runtime.

When a user specifies parameter specifications via API 202, the values for parameters 204 may be stored in policy cache 206. Optimizer 110 may store different values for parameters 204 from the specifications received from API 202. The parameters can be used for any workloads that are processed by kernel 102. For example, optimizer 110 may select one of the policies based on a parameter that defines which policies can be used. Then, optimizer 110 may optimize the parameters in the policy while the workload executes. As will be described in more detail below, parameters 204 may be organized into subsystems of kernel 102. A subsystem may be a category of operations, a device, or part of kernel 102. For example, subsystems include a storage disk, printer, and I/O scheduling.

Optimizer 110 may optimize subsystems and is not workload specific. Each workload has the same set of kernel parameters optimized. However, workloads differ in how they perform when optimizer 110 optimizes different parameters. For example, when application 106 is a word processing application that is being used to write a document, from the point of view of the operating system, the performance is better if optimizer 110 optimizes the file writing subsystem instead of other subsystems (optimizer 110 can still optimize the other subsystems, but file writing is the bottleneck here). But when a user is watching video, which may use a lot of central processing unit (CPU) resources, it makes more sense to optimize the CPU subsystem. These two workloads, the file writing workload and the video watching workload, have the same set of parameters from the point of view of kernel 102. However, optimizer 110 may optimize different subsystems with different parameter values.

Policy cache 206 is also used to store parameter settings that have been found during the optimization. This allows the re-use of optimal parameter settings when they are found. Workloads may be stable or repetitive over time. For example, kernel 102 may perform the same or similar workloads repetitively. When optimizer 110 finds a good policy and good parameter settings for a workload, optimizer 110 can store the policy and parameter settings in policy cache 206. This allows the re-use of optimal parameter settings when a similar workload is run again. A second use of policy cache 206 is to store an intermediate result before the search for a good parameter setting is finished for a workload. In one embodiment, since there are many settings to search, optimizer 110 may find a good setting only after the workload is repeated many times. Storing the intermediate settings for a policy in policy cache 206 saves time because optimizer 110 does not have to restart the optimization from the very beginning each time.

Search engine 208 may search for a policy and optimal parameters for a workload. A workload may be considered the same as an application 106. That is, kernel 102 is performing a workload for an application. In one embodiment, search engine 208 may be hierarchical. At a top level, search engine 208 uses a threshold to detect which kernel subsystem (e.g., a subsystem such as I/O scheduling or memory page replacement) to first optimize. For example, search engine 208 may first optimize a kernel subsystem that is considered a biggest bottleneck on the computer system (e.g., using the most computing resources). That is, this kernel subsystem is using more than a threshold of resources or is using the most resources out of the subsystems. Then, for the subsystem, search engine 208 performs the optimization process. For example, search engine 208 searches through policies to select one or more policies. Then, for one or more of the policies, search engine 208 searches for different values for the parameters. For each policy and parameter setting, search engine 208 measures the computer system's performance and selects the best-performing setting. Optimizer 110 may use an algorithm that performs the search through the parameter settings. Although selecting a policy is described, it will be understood that optimizer 110 may just optimize a parameter setting if a policy does not need to be selected.

In one embodiment, optimizer 110 may work best on steady, repeatable workloads. This gives optimizer 110 enough time to search through many policy and parameter settings to find an optimized setting. The search process may take time, which may make it difficult to optimize for flash workloads that last for a very short period of time. If steady, repeatable workloads are optimized, then search engine 208 can have time to search for the appropriate policy and parameters to optimize the workloads. The more steady or repeatable a workload, the more likely search engine 208 can find settings for parameters for a policy for the workload. Many workloads are repeatable, especially in server environments, because the events that cause flash workloads are rare. Flash workloads may also be optimized; however, since flash workloads are rare and do not reoccur frequently, search engine 208 cannot leverage stored intermediate results for them as it does for workloads that repeat.

Each portion of optimizer 110 will now be described in more detail. In order, the API, policy cache, and then the search engine will be described.

API

API 202 provides data structures and methods that can be used to specify parameter specifications. Users may specify parameters for kernel subsystems. This allows the user to not have to specify optimization parameters on the application level or for specific applications. For a parameter specification, one data structure describes the metadata of a parameter. For example, the parameter specification may be used to specify which policies can be used. Also, the parameter specification may be used to specify information for a parameter. This may include the unique name of the parameter, the kernel subsystem this parameter belongs to, the value range of the parameter, and how to adjust the value of the parameter in the search (e.g., linearly by adding or subtracting a number or exponentially by doubling or halving). Additionally, the data structure provides a setter method for setting the value of the parameter and a getter method for accessing the value of the parameter. Also, the data structure includes a field that points to where the parameter is stored in memory.

A user can enter information for the data structure using API 202. In one embodiment, a user may describe parameters that control the I/O scheduling policy for a storage disk. There may be three disk I/O scheduling policies and the minimum value and the maximum value of this parameter may be 0 and 2, respectively. Another parameter may control the maximum number of disk requests in a disk queue.

FIG. 3A depicts an example of a data structure 300 for a parameter specification that can be stored in policy cache 206 according to one embodiment. At 302, the unique name of the parameter is shown and at 304, the kernel subsystem this parameter belongs to is shown. At 306, the value range can be specified. At 308, how to adjust the value of the parameter in a search can be specified. For example, the search can be linear by adding or subtracting 1 or exponential by doubling or halving. At 310, a user can set how to access the value of the parameter in memory and also, at 312, how to set the value of the parameter.
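As a rough illustration only, the fields at 302 through 312 could be sketched in Python as follows. The class name ParameterSpec, the Step enumeration, the field names, the set_clamped helper, and the dictionary standing in for the parameter's memory location are all hypothetical and are not taken from the figures; an actual embodiment would be a kernel-level data structure.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Step(Enum):
    LINEAR = "linear"            # add or subtract a fixed amount
    EXPONENTIAL = "exponential"  # double or halve


@dataclass
class ParameterSpec:
    name: str                      # 302: unique name of the parameter
    subsystem: str                 # 304: kernel subsystem it belongs to
    min_value: int                 # 306: lower end of the value range
    max_value: int                 # 306: upper end of the value range
    step: Step                     # 308: how to adjust the value in a search
    getter: Callable[[], int]      # 310: how to access the value
    setter: Callable[[int], None]  # 312: how to set the value

    def set_clamped(self, value: int) -> int:
        """Set the parameter, keeping it inside the declared range."""
        value = max(self.min_value, min(self.max_value, value))
        self.setter(value)
        return value


# Hypothetical stand-in for the memory location of the parameter.
_store = {"scheduler": 0}

scheduler = ParameterSpec(
    name="sda_scheduler",
    subsystem="disk",
    min_value=0,
    max_value=2,
    step=Step.LINEAR,
    getter=lambda: _store["scheduler"],
    setter=lambda v: _store.__setitem__("scheduler", v),
)
```

Clamping in the setter wrapper is one way the value range could limit the search; the description does not state where the range is enforced.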

API 202 allows a user to enter information for a second data structure at 313 that captures the dependencies between parameters. In some cases, a parameter may be active only when another parameter has a certain value. For example, a quantum parameter in a disk I/O subsystem is only active when the scheduler parameter is set to CFQ. Thus, it is useless to optimize the quantum parameter unless the scheduler is set to the value of CFQ. A user may specify the dependencies when providing a parameter in a second data structure.
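A minimal sketch of how such a dependency could be recorded and checked is given below. The Dependency class, the is_active function, and the numeric encoding of CFQ as 1 are assumptions made for illustration and are not part of the second data structure at 313.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Dependency:
    parameter: str       # the dependent parameter, e.g. "quantum"
    depends_on: str      # the controlling parameter, e.g. "scheduler"
    required_value: int  # value of the controlling parameter that activates it


def is_active(name: str, deps: List[Dependency],
              current_values: Dict[str, int]) -> bool:
    """A parameter is active when every dependency declared for it holds.

    A parameter with no declared dependencies is always active.
    """
    return all(
        current_values.get(d.depends_on) == d.required_value
        for d in deps
        if d.parameter == name
    )


CFQ = 1  # assumed numeric encoding of the CFQ scheduler value
deps = [Dependency("quantum", "scheduler", CFQ)]
```

With this sketch, the quantum parameter would be skipped whenever the scheduler parameter holds any value other than the assumed CFQ encoding.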

At 314, a third data structure includes the methods that can be executed by search engine 208 in the optimization process to support a kernel subsystem. For example, at 315, the unique name of the subsystem is provided. Then, at 316, a get sensors method collects characteristics of a workload that can be used to distinguish between different workloads. The get sensors method may specify the size of the sensors and also the similarity thresholds that can define whether workloads are similar. The sensors hold the concrete values of the workload characteristics such as the average size of all I/O requests. The similarity threshold specifies a percentage variance such that two sets of values that are within the similarity threshold may be considered the same.

At 318, a second get-optimization target method returns a value representing the kernel subsystem performance on the workload. Optimizer 110 seeks to maximize this value when searching through different parameter settings. A user can choose to provide workload-specific optimization targets.

At 320, a third method may be a bottleneck check method that checks whether the subsystem is currently a bottleneck. For example, the method may compare the percentage of time the subsystem is busy with a threshold, such as 80%.
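The three methods at 316, 318, and 320 could be grouped as in the following sketch. The SubsystemOps class, its field names, and the sample values are hypothetical; only the 80% busy threshold and the idea of sensors, an optimization target, and a bottleneck check come from the description.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SubsystemOps:
    name: str                                     # 315: unique subsystem name
    get_sensors: Callable[[], List[float]]        # 316: workload characteristics
    similarity_threshold: float                   # 316: e.g. 0.20 for 20%
    get_optimization_target: Callable[[], float]  # 318: value to maximize
    busy_fraction: Callable[[], float]            # assumed source of busy time
    bottleneck_threshold: float = 0.80            # 320: example threshold

    def is_bottleneck(self) -> bool:
        """320: report whether the subsystem is busy beyond the threshold."""
        return self.busy_fraction() > self.bottleneck_threshold


# Hypothetical disk subsystem with fixed sample readings.
disk = SubsystemOps(
    name="disk",
    get_sensors=lambda: [8.0],                # e.g. average I/O request size
    similarity_threshold=0.20,
    get_optimization_target=lambda: 1200.0,   # e.g. I/O requests per second
    busy_fraction=lambda: 0.93,
)
```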

FIG. 3B shows an example of the data structure 320 that has been annotated and stored in policy cache 206 according to one embodiment. This data structure is a parameter for an I/O scheduling subsystem. This parameter may control the I/O scheduling policy for a disk. Since there are three disk I/O scheduling policies, the minimum value and the maximum value of this parameter are 0 and 2, respectively. That is, the value of 0 uses the deadline I/O scheduling policy, the value of 1 uses the cfq I/O scheduling policy, and the value of 2 uses the noop I/O scheduling policy.

In this case, at 321, the name is a disk name that is based on the storage disk name. At 322, the subsystem is “disk”. This may be a storage subsystem in the kernel. Also, at 324, the range of values for the parameter is set at a minimum value of 0 and a maximum value of 2. At 326, the parameter can be registered for the I/O scheduling policy for the disk.

Policy Cache and Search Engine

Policy cache 206 may store optimized parameter settings and allow reuse of these parameter settings on similar workloads. This may greatly reduce the time spent searching for good parameter settings. Also, policy cache 206 can help make the search incremental. For example, when a workload runs for a time shorter than what it would take search engine 208 to find a good parameter setting, search engine 208 stores the intermediate search result in policy cache 206 such that next time a similar workload runs, search engine 208 can resume the search.

In one embodiment, policy cache 206 includes different sub-caches for each kernel subsystem because each subsystem's parameters are typically independent of another subsystem's parameters. A sub-cache may be organized as a list of workload signature and parameter setting pairs. That is, the workload signature captures characteristics of the workload and the optimized parameter settings. Even though the parameter definition was for a subsystem, policy cache 206 can store parameters settings that are optimized when a workload executes. Search engine 208 may compute the workload signature using the sensors that were provided in the policies.

To search a sub-cache to see if a workload signature exists in the sub-cache, search engine 208 uses the sensors to compute the signature of the workload, referred to as s₁. Search engine 208 then scans the list and compares the signature with each signature s₂ on the list to see if signature s₁ is within the similarity threshold of signature s₂. For example, if signature s₁ has a value of 8 and signature s₂ has a value of 10, and the similarity threshold is 20%, then search engine 208 considers that these two fields have similar values because 8 is within 20% of 10. In one embodiment, a signature may include multiple fields and if search engine 208 determines that all fields of the two signatures are similar, optimizer 110 considers the two workloads similar and reuses the setting associated with the workload signature s₂ for the workload signature s₁. In one example, search engine 208 may measure the size of I/O requests for a workload. If the average size is within a threshold of the average size for a stored setting, then that setting is used.
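The comparison above can be sketched as follows. The function names and the representation of a sub-cache as a list of (signature, setting) tuples are assumptions for illustration; the threshold is taken relative to the stored value s₂, which matches the example of 8 being within 20% of 10.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Signature = Sequence[float]
Setting = Dict[str, int]


def fields_similar(a: float, b: float, threshold: float) -> bool:
    """True when a is within the given fraction of the stored value b."""
    return abs(a - b) <= threshold * abs(b)


def signatures_similar(s1: Signature, s2: Signature, threshold: float) -> bool:
    """All corresponding fields must be similar for the signatures to match."""
    return len(s1) == len(s2) and all(
        fields_similar(a, b, threshold) for a, b in zip(s1, s2)
    )


def lookup(sub_cache: List[Tuple[Signature, Setting]],
           s1: Signature, threshold: float) -> Optional[Setting]:
    """Scan the list and return the setting of the first similar signature."""
    for s2, setting in sub_cache:
        if signatures_similar(s1, s2, threshold):
            return setting
    return None


# Hypothetical sub-cache with one stored workload signature.
sub_cache = [((10.0,), {"scheduler": 1})]
reused = lookup(sub_cache, (8.0,), 0.20)
```

Returning the first match is only one option; a closest-match policy would replace the early return with a comparison over all matching entries.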

In one embodiment, there may be multiple entries matching a given signature. Search engine 208 may select one of these signatures, such as a first match or may analyze which entry is closest to the signature percentage-wise. When intermediate search results need to be stored, search engine 208 may store additional information used by search engine 208 in addition to the current parameter setting. When a similar workload runs, search engine 208 uses the additional information to resume the search. The additional information may include a flag that indicates that the parameter has been searched, and what setting values have been tried and the results. The search may continue for another parameter as the optimization continues. Search engine 208 may then optimize this next parameter and store intermediate results. When all parameters for the workload have been searched and optimized, the parameter settings for all parameters may be used for the workload and stored.
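One way to picture the additional information stored with an intermediate result is sketched below. The ParamProgress class and its fields are hypothetical; the description only states that a searched flag, the values tried, and their results may be stored.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ParamProgress:
    searched: bool = False  # flag: this parameter has been fully searched
    # Maps each tried parameter value to the measured optimization target.
    results: Dict[int, float] = field(default_factory=dict)

    def untried(self, lo: int, hi: int) -> List[int]:
        """Values in the range that still need to be tested on resume."""
        return [v for v in range(lo, hi + 1) if v not in self.results]

    def best(self) -> Optional[int]:
        """Best value seen so far, or None if nothing has been tried."""
        if not self.results:
            return None
        return max(self.results, key=self.results.get)


# A first run of the workload tried two of three scheduler values.
progress = ParamProgress()
progress.results[0] = 100.0
progress.results[1] = 150.0
remaining = progress.untried(0, 2)
```

When a similar workload runs again, a search could apply the best value so far and continue with only the remaining values.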

In one embodiment, search engine 208 may perform a search hierarchically. For example, search engine 208 first detects which subsystem of kernel 102 is bottlenecked and then searches for a good parameter setting for the bottlenecked subsystem. This may be a setting for one parameter or for multiple parameters. Once a subsystem is no longer bottlenecked, search engine 208 may detect that a second subsystem is bottlenecked. Then, search engine 208 moves to optimize the second subsystem by searching for a parameter setting for that subsystem. This focuses on one subsystem at a time as most likely, there is only one bottleneck at a time. If there is another bottleneck after the first bottleneck is removed, then search engine 208 may attempt to optimize that subsystem. In one embodiment, to detect whether a subsystem is bottlenecked, search engine 208 invokes each subsystem's bottleneck method. This method may detect whether or not the subsystem is using computing resources that are over a threshold.
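The top level of the hierarchical search can be sketched as a selection over per-subsystem utilization figures. The function name, the dictionary input, and the 0.80 default are assumptions for illustration; a real embodiment would invoke each subsystem's bottleneck method instead of reading a dictionary.

```python
from typing import Dict, Optional


def pick_bottleneck(busy_by_subsystem: Dict[str, float],
                    threshold: float = 0.80) -> Optional[str]:
    """Return the most heavily used subsystem above the threshold, if any."""
    over = {name: busy for name, busy in busy_by_subsystem.items()
            if busy > threshold}
    if not over:
        return None  # nothing is bottlenecked, so there is nothing to optimize
    return max(over, key=over.get)


# Hypothetical utilization readings for three subsystems.
first = pick_bottleneck({"disk": 0.93, "cpu": 0.40, "memory": 0.85})
```

Once the selected subsystem drops below the threshold, calling the function again would surface the next bottleneck, which mirrors the one-subsystem-at-a-time behavior described above.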

Once search engine 208 determines that a subsystem is bottlenecked, search engine 208 may first determine a sub-cache for the kernel subsystem in policy cache 206. Then, for the list of workload signatures in the sub-cache, search engine 208 determines a signature that matches the current workload. If the signature has a good parameter setting, search engine 208 may apply that parameter setting to the kernel subsystem that is executing the workload. Even though a good parameter setting may be found, search engine 208 may still attempt to optimize the parameters.

When a similar workload does not exist in policy cache 206, search engine 208 searches for a good parameter setting. In one embodiment, search engine 208 may select a parameter and iterate through the values to find the best value for the parameter. In one embodiment, search engine 208 optimizes multiple parameters for a subsystem, but one parameter at a time. This allows search engine 208 to evaluate the performance change due to changing a value for the parameter. The optimization targets can be used to determine when a good value is found. Search engine 208 may then combine the best values into a parameter setting file. The parameter setting file may store the intermediate parameter settings for one or more parameters that were searched. This allows the intermediate results to persist and be used again. The parameters may be largely independent so that they can be searched separately and the effects of the parameters on performance may be largely monotonic. That is, if increasing the value of the parameter improves performance, then, search engine 208 should keep increasing the value.
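A minimal sketch of optimizing one parameter at a time is shown below. It scans every value in each range, which is a simplification; the function name, the dictionary-based setting, and the synthetic measure function are assumptions, and measure stands in for the optimization target, with larger values being better.

```python
from typing import Callable, Dict, Tuple

Setting = Dict[str, int]


def optimize_one_at_a_time(ranges: Dict[str, Tuple[int, int]],
                           start: Setting,
                           measure: Callable[[Setting], float]) -> Setting:
    """Tune each parameter in turn while holding the others fixed."""
    setting = dict(start)
    for name, (lo, hi) in ranges.items():
        best_value, best_score = setting[name], measure(setting)
        for value in range(lo, hi + 1):
            trial = {**setting, name: value}  # change only this parameter
            score = measure(trial)
            if score > best_score:
                best_value, best_score = value, score
        setting[name] = best_value  # keep the best value before moving on
    return setting


# Synthetic target with independent parameters, peaking at a=2 and b=5.
def measure(s: Setting) -> float:
    return -((s["a"] - 2) ** 2) - ((s["b"] - 5) ** 2)


result = optimize_one_at_a_time({"a": (0, 3), "b": (1, 8)},
                                {"a": 0, "b": 1}, measure)
```

Because only one parameter changes per trial, any change in the measured target can be attributed to that parameter, which is the reason given above for searching parameters separately.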

In one embodiment, search engine 208 may prioritize optimizing parameters that have more limited values. That is, search engine 208 selects a parameter with the smallest number of possible values and optimizes that parameter first. Since the number of possible values is smaller, the difference in the values may often have a larger impact on performance. Thus, once search engine 208 finds a good value for a parameter at the beginning of the search process, kernel 102 experiences better application performance for the rest of the search, which improves the average application performance. Also, search engine 208 respects the dependencies between parameters. As discussed above, users can specify that a parameter is active only when another parameter has a certain value. Search engine 208 does not try to optimize a parameter if it is not active.
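The ordering rule can be sketched as a sort over the active parameters by the number of values in their ranges. The function name, the parameter names, and the ranges below are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def search_order(ranges: Dict[str, Tuple[int, int]],
                 active: Callable[[str], bool]) -> List[str]:
    """Order active parameters from fewest to most possible values."""
    candidates = [name for name in ranges if active(name)]
    return sorted(candidates,
                  key=lambda name: ranges[name][1] - ranges[name][0] + 1)


# Hypothetical disk parameters; "quantum" is treated as inactive here.
ranges = {"queue_depth": (1, 128), "scheduler": (0, 2), "quantum": (1, 16)}
order = search_order(ranges, active=lambda name: name != "quantum")
```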

FIG. 4 shows a more detailed example of policy cache 206 and search engine 208 according to one embodiment. Policy cache 206 includes a number of sub-caches 402-1-402-n. Each sub-cache may be associated with a different subsystem. Also, each subsystem may have individual parameters. An example of a subsystem may be a part of the kernel (operating system). For example, subsystems are resource controllers that allocate varying levels of system resources to their users. Also, each subsystem may include a number of parameters 404-1-404-N, respectively.

Search engine 208 includes a bottleneck analysis manager 406 that can determine when subsystems are bottlenecked. In one embodiment, bottleneck analysis manager 406 may detect a subsystem is bottlenecked by invoking each subsystem's bottleneck method that may have been defined in the API policy specification. Each subsystem's bottleneck method may then return a number that indicates how much computing resources are being used by that subsystem. Bottleneck analysis manager 406 can then determine which subsystem is bottlenecked.

Once the bottlenecked subsystem is identified, a workload identifier 408 determines if a similar workload exists in a sub-cache 402 that is associated with a workload being performed by the subsystem. For example, if the subsystem is associated with sub-cache 402-1, workload identifier 408 uses the sensors defined for the current workload to generate a signature s₁. Then, workload identifier 408 analyzes signatures s₂ within sub-cache 402-1 to determine if there is a match. For example, the sub-cache may be organized as a list of workload signatures and parameter setting pairs. When a workload in the sub-cache includes a signature s₂ that is similar to the current signature s₁, workload identifier 408 selects that workload s₂ and retrieves the parameter setting. This is considered a warm start because parameter settings from a previous search are being used. When no signatures s₂ in sub-cache 402-1 match the current workload signature s₁, then the optimization starts cold. That is, there are no parameter values to use to start off with from prior workloads being optimized.

Parameter search engine 410 then performs the optimization by searching through different parameter values for the current workload s₁. For example, when a similar workload s₂ already exists, parameter search engine 410 may start using those parameter settings. In one embodiment, these parameter settings may provide good performance for the current workload s₁. Parameter search engine 410, however, may continue to adjust the parameters to determine if performance is better or worse. For example, parameter search engine 410 may use the additional information attached to the workload s₂ to determine which parameter settings have been tested already. These settings may not be tested again. Then, parameter search engine 410 may adjust a value of one parameter and determine if performance becomes better or worse. If performance becomes worse, then parameter search engine 410 may start decrementing the parameter values to determine if performance becomes better or worse. If performance becomes better, then parameter search engine 410 may continue to adjust the value in the same direction to see if performance continues to improve. If neither increasing nor decreasing the value improves performance, then parameter search engine 410 may determine that the original parameter setting is optimal. This process may be performed for multiple parameters.
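The warm-start adjustment described above resembles a one-dimensional hill climb, sketched below under stated assumptions. The function name, the unit step, and the tried dictionary (standing in for the additional information stored with workload s₂) are hypothetical; measure represents the optimization target, with larger values being better.

```python
from typing import Callable, Dict, Optional, Tuple


def hill_climb(start: int, lo: int, hi: int,
               measure: Callable[[int], float],
               tried: Optional[Dict[int, float]] = None
               ) -> Tuple[int, Dict[int, float]]:
    """Adjust one parameter from a cached starting value.

    Values already present in `tried` are not measured again.
    """
    tried = dict(tried or {})

    def score(value: int) -> float:
        if value not in tried:
            tried[value] = measure(value)
        return tried[value]

    best = start
    for direction in (+1, -1):       # try increasing first, then decreasing
        value = best + direction
        moved = False
        while lo <= value <= hi and score(value) > score(best):
            best, moved = value, True  # keep going while performance improves
            value += direction
        if moved:
            break                    # no need to try the other direction
    return best, tried


# Synthetic target that peaks at a parameter value of 6.
best, history = hill_climb(4, 0, 10, lambda v: -((v - 6) ** 2))
```

If neither direction improves on the starting value, the function returns the starting value unchanged, which corresponds to concluding that the original setting is optimal. The returned history could be stored back as the intermediate result.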

In the case of a cold start, where a parameter setting for the current workload is not found in the sub-cache 402, parameter search engine 410 may start from a selected value. The selected value may be a random value, the minimum value, the maximum value, or a median value. Then, parameter search engine 410 performs a search. The type of search, such as an incremental or exponential search, may have been specified in the policy specification. With either type of search, parameter search engine 410 may increase the parameter value to determine if performance is better or worse. If performance is not better, parameter search engine 410 may then decrease the value. Parameter search engine 410 may then sift through these values for the parameter in the optimization range to determine which parameter value is optimal.
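A cold-start search can be sketched as below. The sketch assumes integer parameter values and interprets "incremental" as a fixed step and "exponential" as a step that doubles each time; both interpretations, along with the names pick_start, candidates, and cold_search, are assumptions made for illustration. For simplicity it sweeps all generated candidates and keeps the best one rather than stopping early.

```python
import random
from typing import Callable, Iterator


def pick_start(lo: int, hi: int, how: str = "min") -> int:
    """Choose the starting value: minimum, maximum, median, or random."""
    if how == "min":
        return lo
    if how == "max":
        return hi
    if how == "median":
        return (lo + hi) // 2
    if how == "random":
        return random.randint(lo, hi)
    raise ValueError(f"unknown start rule: {how}")


def candidates(start: int, lo: int, hi: int,
               mode: str = "incremental", step: int = 1) -> Iterator[int]:
    """Yield `start`, then values above it, then values below it."""
    if mode not in ("incremental", "exponential"):
        raise ValueError(f"unknown search mode: {mode}")
    yield start
    for sign in (1, -1):
        delta = step
        while lo <= start + sign * delta <= hi:
            yield start + sign * delta
            # Fixed-size steps, or offsets that double each time.
            delta = delta + step if mode == "incremental" else delta * 2


def cold_search(measure: Callable[[int], float], lo: int, hi: int,
                how: str = "min", mode: str = "incremental",
                step: int = 1) -> int:
    """Return the candidate with the highest measured score."""
    start = pick_start(lo, hi, how)
    return max(candidates(start, lo, hi, mode, step), key=measure)
```

An exponential search visits far fewer values across a wide range, at the cost of possibly stepping over the best setting.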

Parameter search engine 410 uses the optimization target to determine whether performance is better or worse. For example, the optimization target may be the number of I/O requests. If the number of requests is smaller, then the setting is considered better in one example. Once parameter search engine 410 determines optimal results, parameter search engine 410 stores the results as a workload and parameter setting pair in sub-cache 402-1.

FIG. 5 depicts a simplified flowchart 500 of a method for optimizing parameters according to one embodiment. At 502, optimizer 110 receives a specification that specifies a parameter to be optimized at API 202. The specification may also include a sensor that is used to evaluate workloads performed by kernel 102 and optimization targets.

At 504, optimizer 110 selects a subsystem to optimize. For example, optimizer 110 determines which subsystem is currently bottlenecked the most in kernel 102. Then, at 506, optimizer 110 searches policy cache 206 for a parameter setting from a similar workload as a current workload being performed by the subsystem. Optimizer 110 uses the sensor to determine when a workload is similar. In some cases, a similar workload may be found and in some cases, no previous workload is similar to the current workload.

At 506, optimizer 110 determines if a similar workload and parameter setting are found. When a parameter setting is found, at 508, optimizer 110 uses the parameter setting to test different parameter values for the workload, starting with the parameter setting from the prior workload. That is, optimizer 110 finishes the optimization process for the current workload when the optimization process was not previously finished. This tests different parameter values while the kernel is processing the workload.

When a parameter setting is not found, at 510, optimizer 110 tests different parameter values for the workload based on a parameter value range specified in the policy specification. This starts the optimization process cold, but also tests different parameter values while the kernel is processing the workload.

In both the warm and cold starts, at 512, optimizer 110 evaluates the optimizing based on the optimization target. That is, optimizer 110 may change a value and determine if performance improves. This process continues until optimizer 110 finds an optimal performance value in the parameter value range.
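The flow of steps 504 through 512 can be summarized in a short sketch. All names here (optimize_once, usage, policy_cache, is_similar, measure) are hypothetical, and the sketch makes simplifying assumptions: a single integer parameter, a cold start from the minimum of the range, and a step size of one.

```python
def optimize_once(usage, policy_cache, signature, value_range, measure, is_similar):
    """One pass of the optimization flow for a single integer parameter."""
    # 504: select the subsystem with the highest resource usage.
    subsystem = max(usage, key=usage.get)
    sub_cache = policy_cache.setdefault(subsystem, [])

    # 506: look for a setting stored for a similar workload.
    start = next((setting for sig, setting in sub_cache
                  if is_similar(sig, signature)), None)
    warm = start is not None
    lo, hi = value_range
    if not warm:
        start = lo  # 510: cold start (assumed here to begin at the minimum)

    # 508/510 and 512: test neighboring values against the optimization target.
    best, best_score = start, measure(start)
    for direction in (1, -1):
        improved = False
        value = best + direction
        while lo <= value <= hi:
            score = measure(value)
            if score <= best_score:
                break
            best, best_score, improved = value, score, True
            value += direction
        if improved:
            break

    # Store the result so that later, similar workloads start warm.
    sub_cache.append((signature, best))
    return subsystem, warm, best
```

Calling the function twice with the same signature illustrates the difference between the two paths: the first call starts cold, and the second call starts warm from the stored setting.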

I/O Scheduling Example

In one embodiment, API 202 may be used to annotate four parameters that may have a performance impact on computer system 104. The first parameter specifies which I/O scheduling policy to use for a disk. For example, kernel 102 may support three policies: (1) completely fair queuing, (2) deadline-based scheduling, and (3) first-come/first-served. To change this parameter, a method is specified such that search engine 208 can change the parameter during runtime. The second parameter specifies the maximum number of requests on a disk queue. The third parameter specifies how many memory pages to read ahead. The fourth parameter specifies a time slice allocated to each workload when the I/O scheduling policy is set to completely fair queuing.

To identify workloads, API 202 proceeds to set five sensors: the number of concurrent processes; the read/write ratio of the request; the average I/O request size; the average seek distance between two consecutive disk requests; and the average time between two consecutive requests. These sensors may capture typical workload characteristics that significantly affect performance. Search engine 208 may use these characteristics to search for a similar workload in policy cache 206. API 202 is also used to set the optimization target to be the number of sectors read or written per second.
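The I/O scheduling policy described above could be expressed as a small data structure. This is a sketch only: the class names, parameter names, sensor names, and all candidate values are hypothetical and are not taken from the description or from any real kernel interface. The requires field models the stated condition that the time slice applies only under completely fair queuing.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Parameter:
    name: str
    values: Tuple  # candidate values (illustrative only)
    requires: Optional[Tuple[str, object]] = None  # (parameter name, required value)


@dataclass(frozen=True)
class PolicySpec:
    parameters: Tuple[Parameter, ...]
    sensors: Tuple[str, ...]
    target: str  # metric to maximize


IO_SPEC = PolicySpec(
    parameters=(
        Parameter("scheduler", ("cfq", "deadline", "fcfs")),
        Parameter("max_queue_requests", (64, 128, 256, 512)),
        Parameter("read_ahead_pages", (8, 32, 128)),
        Parameter("cfq_time_slice_ms", (10, 50, 100),
                  requires=("scheduler", "cfq")),
    ),
    sensors=(
        "concurrent_processes",
        "read_write_ratio",
        "avg_request_size",
        "avg_seek_distance",
        "avg_inter_request_time",
    ),
    target="sectors_per_second",
)


def tunable(spec: PolicySpec, current: Dict[str, object]) -> List[str]:
    """Names of parameters whose dependency holds under `current` settings."""
    names = []
    for p in spec.parameters:
        if p.requires is None or current.get(p.requires[0]) == p.requires[1]:
            names.append(p.name)
    return names
```

With the scheduler set to deadline-based scheduling, tunable omits the time-slice parameter, so the search does not spend time on a setting that has no effect.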

Once the policy is input via API 202, search engine 208 may determine when a workload matches the signature provided by the five sensors. When this occurs, search engine 208 may search through policy cache 206 for parameter settings that are associated with the workload. Search engine 208 may perform different optimizations. For example, policy cache 206 may not include any information that matches a workload for the policy. That is, policy cache 206 is cold in that no optimization for a workload that matches the policy is available. Search engine 208 may then search through a number of parameter settings and determine which parameter settings provide the best performance. In one embodiment, search engine 208 may take a certain amount of time to decide on a parameter setting, such as between 100 and 150 seconds. Also, there may be a warm cache where policy cache 206 includes intermediate results that search engine 208 can use to continue the optimization. When the settings are found, search engine 208 may adjust the settings and measure the performance based on the optimization targets. This search may take less time than a cold search.

CPU Scheduling Example

In a CPU scheduling subsystem, API 202 may be used to specify three parameters. For example, a first parameter specifies the length of time for scheduling each runnable process once; a second parameter specifies the minimum time a process is guaranteed to run when scheduled; and a third parameter describes the ability of processes being woken up to preempt the current process. A larger value of the third parameter makes it more difficult to preempt the current process.

To identify workloads, API 202 is used to specify one sensor: the number of retired instructions executed in the user space. The optimization target may be the same as the sensor. When a workload is executed, search engine 208 determines that the workload is associated with this policy based on the number of retired instructions being executed in the user space. Then, search engine 208 determines the three parameters and can analyze the performance of computer system 104 as the parameters are changed. In one embodiment, search engine 208 may adjust the CPU scheduling parameters to preempt threads more often so that a few threads with work to do can make good progress while futile work-stealing requests that waste many CPU cycles are not scheduled. In this case, during execution of a subsystem, many threads have no work to do but are busy trying to steal work from other threads without yielding the CPU. This wastes many CPU cycles. Search engine 208 thus determines a parameter setting that limits the scheduling of these threads that have no work to do.

Particular embodiments may be implemented in a non-transitory computer-readable storage medium for use by or in connection with the instruction execution system, apparatus, system, or machine. The computer-readable storage medium contains instructions for controlling a computer system to perform a method described by particular embodiments. The computer system may include one or more computing devices. The instructions, when executed by one or more computer processors, may be configured to perform that which is described in particular embodiments.

As used in the description herein and throughout the claims that follow, “a”, “an”, and “the” include plural references unless the context clearly dictates otherwise. Also, as used in the description herein and throughout the claims that follow, the meaning of “in” includes “in” and “on” unless the context clearly dictates otherwise.

The above description illustrates various embodiments along with examples of how aspects of particular embodiments may be implemented. The above examples and embodiments should not be deemed to be the only embodiments, and are presented to illustrate the flexibility and advantages of particular embodiments as defined by the following claims. Based on the above disclosure and the following claims, other arrangements, embodiments, implementations and equivalents may be employed without departing from the scope hereof as defined by the claims.

Claims

1. A method comprising:

receiving, by a computing device, a specification that specifies a parameter to be optimized, a sensor that is used to evaluate workloads performed by a kernel of the computing device, and an optimization target;

during execution of a current workload being processed by the kernel, searching, by the computing device, policy storage for a similar workload that has been previously optimized, wherein the sensor is used to compare the current workload to workloads in the policy storage;

when the similar workload is found, optimizing, by the computing device, different parameter values in a parameter value range specified in the policy specification for the parameter while the current workload is being processed by the kernel, wherein a parameter setting for the similar workload is used in the optimizing;

when the similar workload is not found, optimizing, by the computing device, different parameter values based on the parameter value range specified in the policy specification for the parameter while the current workload is being processed by the kernel; and

evaluating, by the computing device, the optimizing based on the optimization target.

2. The method of claim 1, wherein the parameter setting is an intermediate result from a prior optimization that has not been completed yet.

3. The method of claim 2, wherein the optimizing completes the prior optimization.

4. The method of claim 1, wherein the different parameter values are for different policies.

5. The method of claim 1, wherein the different parameter values are for a parameter value for a policy used by the kernel to process the current workload.

6. The method of claim 1, further comprising:

selecting a subsystem in the kernel to optimize; and

selecting the parameter from the subsystem for optimizing.

7. The method of claim 6, wherein selecting the subsystem comprises:

analyzing resource usage for a plurality of subsystems; and

selecting the subsystem from the plurality of subsystems based on the resource usage.

8. The method of claim 7, wherein the subsystem is selected that has a largest resource usage.

9. The method of claim 1, wherein optimizing comprises:

changing a parameter value for the parameter; and

evaluating a performance of the computing device based on the changing.

10. The method of claim 1, wherein:

the similar workload is selected from a group of similar workload signatures, and

the similar workload has a similar workload signature that is within a threshold of a current workload signature for the current workload.

11. The method of claim 1, further comprising:

evaluating a dependency for the parameter; and

optimizing the parameter only when a dependency is not violated.

12. The method of claim 1, wherein the sensor specifies a characteristic of the current workload.

13. The method of claim 1, wherein the multiple parameter values are tested while the kernel is processing the workload.

14. The method of claim 1, wherein another parameter is optimized after the parameter optimizing is evaluated.

15. The method of claim 1, wherein the parameter is selected from a plurality of parameters due to the parameter having the parameter value range of possible values that is a smallest number of possible values.

16. A non-transitory computer-readable storage medium containing instructions, that when executed, control a computer system to be configured for:

receiving a specification that specifies a parameter to be optimized, a sensor that is used to evaluate workloads performed by a kernel of the computing device, and an optimization target;

during execution of a current workload being processed by the kernel, searching policy storage for a similar workload that has been previously optimized, wherein the sensor is used to compare the current workload to workloads in the policy storage;

when the similar workload is found, optimizing different parameter values in a parameter value range specified in the policy specification for the parameter while the current workload is being processed by the kernel, wherein a parameter setting for the similar workload is used in the optimizing;

when the similar workload is not found, optimizing different parameter values based on the parameter value range specified in the policy specification for the parameter while the current workload is being processed by the kernel; and

evaluating the optimizing based on the optimization target.

17. The non-transitory computer-readable storage medium of claim 16, wherein:

the parameter setting is an intermediate result from a prior optimization that has not been completed yet, and

the optimizing completes the prior optimization.

18. The non-transitory computer-readable storage medium of claim 16, further configured for:

selecting a subsystem in the kernel to optimize; and

selecting the parameter from the subsystem for optimizing.

19. The non-transitory computer-readable storage medium of claim 16, wherein the different parameter values are tested while the kernel is processing the workload.

20. An apparatus comprising:

one or more computer processors; and

a non-transitory computer-readable storage medium comprising instructions, that when executed, control the one or more computer processors to be configured for:

receiving a specification that specifies a parameter to be optimized, a sensor that is used to evaluate workloads performed by a kernel of the computing device, and an optimization target;

during execution of a current workload being processed by the kernel, searching policy storage for a similar workload that has been previously optimized, wherein the sensor is used to compare the current workload to workloads in the policy storage;

when the similar workload is found, optimizing different parameter values in a parameter value range specified in the policy specification for the parameter while the current workload is being processed by the kernel, wherein a parameter setting for the similar workload is used in the optimizing;

when the similar workload is not found, optimizing different parameter values based on the parameter value range specified in the policy specification for the parameter while the current workload is being processed by the kernel; and

evaluating the optimizing based on the optimization target.