INTERACTIVE PHYSICAL DESIGN TUNING
An architecture for providing interactive sessions for physical database design is described, allowing users to readily try different options, identify problems, and obtain physical designs in a flexible way. Embodiments based on a .NET assembly and modifications to a database management system (DBMS) are also described.
Automated physical design tuning involves a database management system (DBMS) recommending a set of physical structures that increase the performance of an underlying database. The physical design problem has traditionally been stated as follows: given a workload W and a storage budget B, find the set of physical structures, or configuration, that fits in B and results in the lowest execution cost for W. Most modern commercial DBMSs have some facilities for automated design tuning. In general, however, it has not been possible to include in the tuning process information beyond the basic inputs of this problem statement.
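As a minimal sketch of the traditional problem statement above, the following Python fragment searches exhaustively for the cheapest configuration that fits in a storage budget. The candidate indexes, their sizes, the per-query costs, and the function names are all hypothetical values chosen for illustration; a real tool would obtain costs from the DBMS optimizer and would not enumerate every subset.

```python
from itertools import combinations

# Hypothetical candidate indexes: name -> storage size (arbitrary units).
CANDIDATES = {"idx_a": 40, "idx_b": 30, "idx_ab": 60}

# Hypothetical costs: each query has a base cost, plus a (lower) cost for
# every index that can help it. These numbers are made up for illustration.
WORKLOAD = {
    "q1": {"base": 100, "idx_a": 20, "idx_ab": 25},
    "q2": {"base": 80, "idx_b": 10, "idx_ab": 30},
}


def workload_cost(configuration):
    """Total cost of the workload W under a configuration (set of index names)."""
    total = 0
    for costs in WORKLOAD.values():
        usable = [costs[i] for i in configuration if i in costs]
        # Each query uses its cheapest available alternative.
        total += min([costs["base"]] + usable)
    return total


def best_configuration(budget):
    """Return (configuration, cost) with the lowest cost that fits in budget B."""
    best, best_cost = frozenset(), workload_cost(frozenset())
    names = sorted(CANDIDATES)
    for k in range(1, len(names) + 1):
        for combo in combinations(names, k):
            size = sum(CANDIDATES[i] for i in combo)
            cost = workload_cost(combo)
            if size <= budget and cost < best_cost:
                best, best_cost = frozenset(combo), cost
    return best, best_cost
```

Calling best_configuration with a larger budget can only keep or lower the returned cost, which is the trade-off a user explores when varying B.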
For instance, it has not been possible to tune a given workload for maximum performance under a storage constraint while at the same time ensuring that no query degrades by more than 10% with respect to the original configuration. As another example, it has not been possible to enforce that the clustered index on a table T cannot be defined over certain columns of T that would introduce hot-spots (without specifying which of the remaining columns should be chosen). As yet another example, in order to decrease contention during query processing, there has been no way to prevent any single column of a table from appearing in more than, say, three indexes (the more indexes a column appears in, the more contention arises due to exclusive locks during updates). While some new approaches allow more flexibility in the specification of a physical design tuning problem, existing solutions require the whole specification to be provided upfront, without the possibility of interaction.
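The three example constraints above can be expressed as simple checks over a candidate configuration. The following Python sketch is illustrative only: the function names, argument shapes, and default thresholds are assumptions made here, not part of any DBMS interface.

```python
from collections import Counter


def max_degradation_ok(base_costs, new_costs, limit=0.10):
    """True if no query costs more than (1 + limit) times its original cost."""
    return all(new_costs[q] <= base_costs[q] * (1 + limit) for q in base_costs)


def clustered_index_ok(clustered_columns, forbidden_columns):
    """True if the clustered index avoids every hot-spot column."""
    return set(clustered_columns).isdisjoint(forbidden_columns)


def column_usage_ok(indexes, max_indexes_per_column=3):
    """True if no (table, column) pair appears in too many indexes.

    `indexes` is an iterable of (table_name, column_names) pairs.
    """
    usage = Counter()
    for table, columns in indexes:
        for column in set(columns):
            usage[(table, column)] += 1
    return all(count <= max_indexes_per_column for count in usage.values())
```

A tuning loop could reject any configuration for which one of these checks returns False, which is the kind of extra information the basic problem statement cannot carry.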
Described herein are techniques for flexible and interactive physical design tuning.
SUMMARY
The following summary is included only to introduce some concepts discussed in the Detailed Description below. This summary is not comprehensive and is not intended to delineate the scope of the claimed subject matter, which is set forth by the claims presented at the end.
An architecture for providing interactive sessions for physical database design is described, allowing users to readily try different options, identify problems, and obtain physical designs in a flexible way. Embodiments based on a .NET assembly and modifications to a database management system (DBMS) are also described.
Many of the attendant features will be explained below with reference to the following detailed description considered in connection with the accompanying drawings.
The present description will be better understood from the following detailed description read in light of the accompanying drawings, wherein like reference numerals are used to designate like parts in the accompanying description.
Embodiments discussed below relate to interactive physical design tuning of databases. For background,
The scenarios mentioned in the Background above show that the state-of-the-art techniques for physical design tuning are inflexible. Referring to
Embodiments described below shift the design approach and allow tuning sessions to be highly interactive. Current monolithic architectures in physical design tools force users to specify the whole problem upfront and prevent users from making changes a posteriori or in general interacting with the system. Explanation will begin with description of an architecture for interactive sessions, followed by a review of Windows PowerShell as an infrastructure component that can support the architecture. Explanation will proceed with description of interactive tuning processes, followed by presentation of illustrative examples.
Layered Architecture for Physical Design Tuning
A low-level API layer 156 may expose, in formats that are simple to consume (e.g., XML), the functionality of the Core DBMS layer 150 (and also the DBMS itself). As an example, the low-level APIs may expose primitives to manipulate a what-if mode of the DBMS and may also expose rich explain modes which, after optimizing queries, surface optimization information used at higher levels of the DBMS. The explain mode may provide useful information about the optimization of a query, such as the final plan obtained by the optimizer, cardinality estimates for intermediate results, access path requests, etc. (it may be thought of as an extension to existing modes in relational systems, such as showplans in Microsoft SQL Server). The low-level API layer 156 may also encapsulate existing DBMS functionality, such as mechanisms that monitor and gather workloads.
A high-level API layer 158 is provided to facilitate access to the low-level APIs 156 and the Core DBMS layer 150. Physical design tools previously built on top of the low-level APIs exposed only rigid functionality (e.g., point to a workload, set the storage constraint, and optimize). The high-level API layer 158 exposes the internal representations and mechanisms in a modular way. Basic concepts such as queries, indexes, databases, tables, and access-path requests are exposed as instantiable classes. In addition to these data structures, the high-level API layer 158 exposes simple, composable algorithms sometimes found in previous tuning tools. For instance, this layer may expose mechanisms to merge two indexes, or to obtain the best set of indexes for a single query. These primitive data structures and algorithms are not necessarily meant to be consumed by DBAs, but instead provide a foundational abstraction on top of which applications can be built, as explained next. In one embodiment, described later, the high-level API layer 158 may be implemented as a .NET assembly 160, which is executed by a .NET VM 162 (Virtual Machine), sometimes called a managed code environment.
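The text mentions a primitive that merges two indexes but does not define it. The Python sketch below assumes one common definition, in which the merged index keeps the columns of the first index in order and appends the columns of the second that are not already present. The class name, fields, and this merge rule are assumptions for illustration and are not taken from the described assembly.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Index:
    """Hypothetical stand-in for an index exposed as an instantiable class."""
    table: str
    columns: Tuple[str, ...]

    def merge(self, other: "Index") -> "Index":
        """Merge under the assumed rule: own columns first, then new ones."""
        if self.table != other.table:
            raise ValueError("can only merge indexes on the same table")
        extra = tuple(c for c in other.columns if c not in self.columns)
        return Index(self.table, self.columns + extra)
```

Under this rule the operation is order-sensitive: merging (a, b) with (a, c) yields (a, b, c), while merging in the opposite order yields (a, c, b). A single merged index can answer requests served by either input at a lower storage cost than keeping both.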
Front-ends 164 are based on both the low-level APIs 156 and high-level APIs 158 and deliver functionality to end users. One example of a front-end 164 is an interactive scripting platform to interact with physical database designs. The scripting language understands and works with the data structures and algorithms exposed by the underlying layers and allows users to write interactive scripts to tune the physical design of a database. Common tasks, such as minimizing the cost for a single storage constraint (or other functionality provided by previous physical design tools), can be implemented as pre-existing scripts that can be accessed using graphical user interfaces by relatively inexperienced DBAs.
As mentioned, a front-end 164 can be implemented by a scripting environment. For example, Windows PowerShell 166 (tm), available from Microsoft Corporation, is a scripting language that can be used as a front-end 164 in the architecture. A prototype implementation of the architecture using Windows PowerShell 166 will also be described.
Windows PowerShell
Windows PowerShell is an interactive, extensible scripting language that integrates with the Microsoft .NET Framework. It provides an environment to perform administrative tasks by execution of cmdlets (i.e., commandlets, which are basic operations), scripts (which are compositions of cmdlets), stand-alone applications, or by directly instantiating regular .NET classes. The main features of Windows PowerShell include tight integration with .NET, strict naming conventions, object pipelines, and data providers.
Windows PowerShell integrates with the .NET framework and leverages the .NET framework to represent data. Windows PowerShell understands .NET classes natively. Thus, new classes written in the .NET framework are easily available as first-class citizens in Windows PowerShell.
Windows PowerShell uses strict naming conventions. Cmdlets in Windows PowerShell follow a verb-noun naming convention, and parameters are passed in a unified manner. Some examples of such built-in cmdlets are Start-Service, which starts an OS (operating system) service on the current machine; Get-Process, which returns a list of processes currently executing; Clear-Host, which clears the screen; and Get-ChildItem, which, when invoked in a file system directory, returns all its subdirectories or files. There are also aliases for the common cmdlets.
PowerShell also provides facilities to construct object pipelines. Similar to Unix shells, cmdlets can be pipelined using the “|” operator. However, unlike Unix shells, which typically pipeline strings, Windows PowerShell pipelines .NET objects. For instance, a script that pipes Get-Process into Sort-Object, then Select-Object, and finally Stop-Process
obtains the list of all running processes and pipes the result (which is a list of System.Diagnostics.Process .NET objects) to the Sort-Object cmdlet, which understands the semantics of the objects and sorts them by the property Handles in descending order. In turn, the result of this cmdlet (i.e., an ordered list of processes) is passed to the Select-Object cmdlet, which takes the first five processes and passes them to the next cmdlet in the pipeline, Stop-Process, which terminates them. As a second example, a script can return the number of lines that contain the word “constraint” in any LaTeX file in the current directory that is smaller than 100,000 bytes.
Such a script gets all files in the current path that have a “tex” extension and keeps only those that are smaller than 100,000 bytes. Then, each file is processed by first getting its content (which returns a list of .NET string objects) and selecting only those strings that contain the word “constraint”. The combined result of this subscript is a list of strings, which is measured, and the count is returned. To shorten a script, aliases (e.g., Get-ChildItem becomes “dir”, Where-Object becomes “?”, Foreach-Object becomes “%”) and positional cmdlet parameters can be used. For instance, it is not necessary to explicitly write -Path after dir.
PowerShell has the ability to expose hierarchical data models by means of data providers, which are then accessed and manipulated using a common set of cmdlets. As an example, the file system is one such provider. When situated in some node inside the file system provider, a user can obtain the subdirectories or files in the current location using Get-ChildItem, access the contents of elements using Get-Content, and navigate the provider using Set-Location (aliased as cd). In addition, Windows PowerShell natively exposes the registry and the environment variables as providers. There are also third-party providers that give a unified surface to access, query, and modify Active Directory, SharePoint, and SQL Server, among others.
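To make the object-pipeline idea described earlier in this section concrete, the following Python sketch mimics the two pipelines in spirit. It is an analogy written in Python, not PowerShell, and it is not a reconstruction of the original scripts; the Proc class, the function names, and the in-memory file dictionary are hypothetical. The point it illustrates is that each stage receives structured objects and reads their properties, rather than parsing strings.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Proc:
    """Hypothetical stand-in for a process object with a Handles property."""
    name: str
    handles: int


def top_by_handles(procs: Iterable[Proc], n: int = 5) -> List[Proc]:
    """Sort objects by a property in descending order, then keep the first n."""
    ordered = sorted(procs, key=lambda p: p.handles, reverse=True)
    return ordered[:n]


def count_matching_lines(files: Dict[str, str], word: str,
                         max_bytes: int = 100_000, suffix: str = ".tex") -> int:
    """Count lines containing `word` in small files with the given suffix.

    `files` maps a file name to its text; a real script would read from disk.
    """
    small = (text for name, text in files.items()
             if name.endswith(suffix) and len(text.encode("utf-8")) < max_bytes)
    return sum(1 for text in small for line in text.splitlines() if word in line)
```

The sketch stops short of terminating anything: the final Stop-Process stage of the first pipeline is deliberately not emulated.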
The next section describes how to take advantage of the different features of Windows PowerShell to provide an interactive experience for physical design tuning.
Interactive Physical Design Tuning
A prototype implementation that enables interactive physical design tuning sessions will now be described. The architecture of this implementation is described first, followed by discussion of examples of how the implementation can be used.
The Core DBMS 150 and Low-level APIs 156 are implemented by instrumenting a database server, for instance Microsoft SQL Server 152. Some components (e.g., what-if optimization) are already part of this particular database server, while others (e.g., access-path request interception) were added.
High-Level APIs
The high-level API layer 158 is implemented by introducing a new .NET assembly 160 that encapsulates and exposes classes and algorithms relevant to physical design tuning. Among the classes that the assembly exposes are Database, Table, Index, Column, Query, Configuration, and Request classes. These are rich in functionality, so for instance the Index class may have methods that return merged and reduced indexes and methods that create hypothetical versions of the index in the database. The Query class may have methods that evaluate (optimize) it under a given configuration, and methods that return its set of access-path requests.
Additionally, as part of the .NET assembly 160, a sophisticated caching mechanism may be built to avoid optimizing the same query multiple times in the database server. Instead, each query remembers previous optimizations and, if asked again to optimize itself with a previously seen configuration, it returns the cached values without doing the expensive work again. Because these classes are exposed in an assembly, the definitions thereof can be loaded directly into Windows PowerShell which may be used to explore, in interactive form, the physical design of a database, as illustrated in
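Separately from any interactive session, the caching behavior described in this paragraph can be sketched in Python as a query object that memoizes optimizer results per configuration. The class name CachedQuery, the optimize callable, and the choice of a frozenset of index names as the cache key are assumptions for illustration; the source does not describe how the assembly keys or stores its cache.

```python
class CachedQuery:
    """Hypothetical query wrapper that remembers previous optimizations."""

    def __init__(self, text, optimize):
        self.text = text
        # Assumed callable: (query_text, configuration) -> estimated cost.
        # In a real system this would be an expensive what-if optimizer call.
        self._optimize = optimize
        self._cache = {}

    def evaluate(self, configuration):
        """Return the cost under `configuration`, optimizing at most once."""
        key = frozenset(configuration)  # order of indexes does not matter
        if key not in self._cache:
            self._cache[key] = self._optimize(self.text, key)
        return self._cache[key]
```

Because the key ignores ordering, asking for the same set of indexes twice, even listed in a different order, results in a single call to the optimizer.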
While such direct exploration is useful, calling the .NET methods directly can be inefficient and may be a time-consuming way to tune a database design. Using the capabilities of Windows PowerShell, functionality such as a provider, visualizations, cmdlets, and scripts can be provided. PowerShell providers are .NET programs that allow a user to work with data stores as though they were mounted drives or file systems, which simplifies accessing external data outside the PowerShell environment. A PowerShell provider can be implemented that exposes the information about a tuning session in a hierarchical and intuitive object model.
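The hierarchical object model that such a provider exposes can be pictured as a tree that is navigated by path, much like a file system. The Python sketch below is only an analogy: the tree layout, the database name "tpch", and the function get_child_items are hypothetical, and the source does not specify how the provider organizes a tuning session.

```python
# Hypothetical session tree: database -> kinds of objects -> named objects.
SESSION = {
    "tpch": {
        "tables": {"orders": {}, "lineitem": {}},
        "queries": {"q1": {}},
        "configurations": {"base": {}},
    }
}


def get_child_items(tree, path):
    """Return the sorted child names of the node addressed by a '/' path."""
    node = tree
    for part in (p for p in path.split("/") if p):
        node = node[part]  # raises KeyError if the path does not exist
    return sorted(node)
```

For example, get_child_items(SESSION, "/tpch/tables") lists the tables of the hypothetical database, in the same way that a directory listing shows the contents of a folder.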
In addition to a provider, the bare .NET classes and methods of the .NET assembly may be provided with composable cmdlets.
Scripts are another feature of the implementation.
Other common algorithms may be similarly implemented, such as the relaxation-based tuning approach in “Automatic physical database tuning: A relaxation-based approach” (N. Bruno and S. Chaudhuri, in Proceedings of the ACM International Conference on Management of Data (SIGMOD), 2005). One embodiment implements a version that handles the constraint language described in “Constrained physical design tuning” (N. Bruno and S. Chaudhuri, in Proceedings of the International Conference on Very Large Databases (VLDB), 2008). This script is called TuneConstrained-Workload and takes as inputs a workload, a timeout, and a set of constraints. Such a script may be implemented by using the .NET classes exported by the high-level APIs and may be implemented as a PowerShell script in fewer than 100 lines of code.
A Sample Interactive Tuning Session
Embodiments and features discussed above can be realized in the form of information stored in volatile or non-volatile computer or device readable media. This is deemed to include at least media such as optical storage (e.g., CD-ROM), magnetic media, flash ROM, or any current or future means of storing digital information. The stored information can be in the form of machine executable instructions (e.g., compiled executable binary code), source code, bytecode, or any other information that can be used to enable or configure computing devices to perform the various embodiments discussed above. This is also deemed to include at least volatile memory such as RAM and/or virtual memory storing information such as CPU instructions during execution of a program carrying out an embodiment, as well as non-volatile media storing information that allows a program or executable to be loaded and executed. The embodiments and features can be performed on any type of computing device, including portable devices, workstations, servers, mobile wireless devices, and so on.
Claims
1. A computer-implemented method to allow a user to interactively explore a physical database design, the method comprising:
- accessing a database management system (DBMS) including a database engine;
- allowing a user to interactively input commands in arbitrary order, the commands including: specification commands specifying a database of the DBMS; specification commands specifying one or more queries directed to the database; specification commands specifying one or more physical configurations of the database, a specified physical configuration comprising at least a set of indices of tables of the database; and execution commands obtaining performance and/or optimization information from the DBMS, the performance and/or optimization information generated by an API of the DBMS that provides access to what-if functions of the DBMS that analyze query execution without fully executing queries.
2. A computer-implemented method according to claim 1, wherein the commands are entered via a command line interpreter that receives the interactively inputted commands and which allows the user to create arbitrary variables and objects to be used as parameters of the commands.
3. A computer-implemented method according to claim 2, further comprising providing an interface between the DBMS and the command line interpreter, wherein the command line interpreter communicates with objects of the module and the module communicates with an API of the DBMS.
4. A computer-implemented method according to claim 1, the DBMS comprising a query optimizer, a low-level API communicating with the query optimizer, the device comprising an assembly executed by a managed code environment, the assembly communicating with the low level API, the device further comprising a front-end program to which the user enters the commands.
5. A computer-implemented method according to claim 4, wherein in response to some of the commands the program creates or alters objects that are instances of classes implemented by the assembly without communicating with the DBMS.
6. A computer-implemented method according to claim 5, wherein in response to some of the commands the program communicates one or more properties of the objects of the assembly via the low-level API to the DBMS and in response receives the performance and/or optimization information from the DBMS.
7. A computer-implemented method according to claim 1, wherein the commands are entered via a program separate from the database, when an execution command is executed by the user, information entered by the specification commands is used to obtain the performance and/or optimization information from the DBMS, and when an execution command is executed the information entered by the specification commands continues to be maintained by the program and after a first execution command is executed and performance and/or optimization information is obtained by the program, a second execution command can be entered at any arbitrary time using the stored information of the first execution command.
8. A computer-implemented method according to claim 7, wherein the program comprises a shell environment with general programming features that allow the user to arbitrarily name variables and assign values to the variables.
9. One or more computer-readable storage media storing information to enable a computing device to perform a process, the process comprising:
- establishing a connection between a component and a DBMS, the component configured to enable a user to interact with a tuning API of a DBMS, the tuning API receiving hypothetical scenarios and submitting them to the DBMS and returning corresponding physical database design configurations generated by the DBMS, the physical database design configurations comprising configurations of a database managed by the DBMS, wherein the interactive component is configured to allow the user to interactively build and modify hypothetical scenarios and at any arbitrary time while doing so allows the user to submit one of the hypothetical scenarios to the DBMS via the tuning API, wherein the interactive component maintains the hypothetical scenarios between submissions of same via the tuning API.
10. One or more computer-readable storage media according to claim 9, wherein the interactive component comprises an intermediary between an application program and the DBMS, and the user builds and modifies the hypothetical scenarios by interacting with the application program.
11. One or more computer-readable storage media according to claim 9, wherein the application program includes an implementation of a scripting language, the user interacting with the application program to formulate commands in the scripting language.
12. One or more computer-readable storage media according to claim 9, wherein the component exposes objects that represent structures of the database and exposes physical design tuning functions, the commands in the scripting language referencing the objects and the design tuning functions exposed by the component.
13. One or more computer-readable storage media according to claim 9 wherein the user interaction is with a programming environment in which the user can create and manipulate arbitrary variables that are used to define the arbitrary scenarios.
14. One or more computer-readable storage media according to claim 13, wherein the exposed objects include objects representing databases, tables, configurations comprised of at least indices, and queries.
15. One or more computer-readable storage media according to claim 9, wherein the physical database design configurations are generated by optimization analysis performed by the DBMS.
16. One or more computer-readable storage media storing information configured to enable a computing device to perform a process, the process performed by one or more components separate from and in communication with a DBMS, the process comprising:
- providing commands invoked by a user to instantiate objects and designate variables pointing to the objects;
- providing what-if commands executable by the user to invoke what-if analysis of a DBMS; and
- providing an interaction component that receives interactive input from a user and provides corresponding output to a display device, wherein the interactive input allows the user to invoke the what-if commands which reference the objects.
17. One or more computer-readable storage media according to claim 16, further comprising: providing math commands for arbitrary math computations and text commands for arbitrary text computations, and wherein the interaction component allows the user to construct arbitrary compound commands by interactively specifying, for a compound command, a what-if command, a math or text command, where the compound command references or provides output to one of the variables, and where output of one of the compound command's commands is provided as input to another of the compound command's commands.
18. One or more computer-readable storage media according to claim 16, wherein at least one of the components comprises an assembly executed by a managed code environment, the one of the components interfacing with one or more APIs of the DBMS and implementing classes representing database structures of which the objects instantiated by the user are instances.
19. One or more computer-readable storage media according to claim 16, wherein the user invokes a command to specify a workload, and the user repeatedly invokes commands to refine a hypothetical configuration of the database and repeatedly invokes the what-if commands between refinements of the hypothetical configuration of the database.
20. One or more computer-readable storage media according to claim 16, further comprising providing a visualization command that is included as part of the compound command and receives output of one of the commands of the compound commands and generates corresponding graphical representation of the output.
Type: Application
Filed: Jun 15, 2009
Publication Date: Dec 16, 2010
Patent Grant number: 8214402
Applicant: MICROSOFT CORPORATION (Redmond, WA)
Inventors: Surajit Chaudhuri (Redmond, WA), Nicolas Bruno (Redmond, WA)
Application Number: 12/484,564
International Classification: G06F 17/30 (20060101);