Predicting physical memory attributes by compiler analysis of code blocks
Predicting physical memory attributes by compiler analysis of code blocks includes receiving source code including at least one code block, and identifying a buffer associated with the at least one code block. Buffer access characteristics associated with the buffer are determined from the at least one code block. The buffer access characteristics are mapped to physical memory attributes associated with one or more physical memories of a computing system. Executable program code including a system call associated with memory allocation is generated based upon the physical memory attribute values.
The present invention relates generally to a method, system, and computer program product for compiler analysis of code blocks. More particularly, the present invention relates to a method, system, and computer program product for predicting physical memory attributes by compiler analysis of code blocks.
BACKGROUND
A compiler is a computer program that typically transforms or translates source computer code written in a source computer programming language to another programming language and/or a target program or application executable by a computing system. Typically, a compiler receives source code and performs preprocessing, lexical analysis, syntax analysis, semantic analysis, code optimization, and code generation to produce executable code in the form of a program or application. The program or application is compiled to execute on a target platform having one or more processors and one or more memory types or technologies capable of storing the program or application during execution.
SUMMARY
The illustrative embodiments provide a method, system, and computer program product. An embodiment of a computer-implemented method includes receiving source code including at least one code block, and identifying a buffer associated with the at least one code block. The embodiment further includes determining buffer access characteristics associated with the buffer from the at least one code block. The embodiment further includes mapping the buffer access characteristics to physical memory attributes associated with one or more physical memories of a computing system. The embodiment further includes generating executable program code including a system call associated with memory allocation based upon the physical memory attribute values.
In another embodiment, the buffer access characteristics include one or more of an interleave value of the buffer, a frequency of access of the buffer, or a size of the buffer. In another embodiment, identifying the buffer includes inserting semantic actions in the at least one code block to detect the buffer.
In another embodiment, determining the buffer access characteristics further includes determining loop iterations and loop limits of an iterative loop within the at least one code block. In another embodiment, determining the buffer access characteristics further includes inserting instrumentation within the at least one code block to capture loop iterations and loop limits of an iterative loop within the at least one code block during execution of the at least one code block.
Another embodiment further includes generating an output file including buffer information from the instrumentation, and processing the output file to compute the buffer access characteristics.
Another embodiment further includes selecting a memory type of the one or more physical memories based upon the mapping of the buffer access characteristics to physical memory attribute values.
In another embodiment, the determining of the buffer access characteristics associated with the buffer includes semantic analysis of the at least one code block.
In another embodiment, identifying the buffer associated with the at least one code block further includes identifying a memory pointer in the at least one code block.
In another embodiment, the physical memory attributes include one or more of a latency, a bandwidth, a power consumption, a device compute ability, a persistence, or a size of a memory device.
An embodiment includes a computer usable program product. The computer usable program product includes one or more computer-readable storage devices, and program instructions stored on at least one of the one or more storage devices.
An embodiment includes a computer system. The computer system includes one or more processors, one or more computer-readable memories, and one or more computer-readable storage devices, and program instructions stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories.
Certain novel features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as a preferred mode of use, further objectives and advantages thereof, will best be understood by reference to the following detailed description of the illustrative embodiments when read in conjunction with the accompanying drawings, wherein:
The illustrative embodiments described herein are directed to predicting physical memory attributes by compiler analysis of code blocks. Embodiments recognize that a computing system having a memory subsystem with heterogeneous standards and technologies is currently realizable and existent. For example, dynamic random-access memory (DRAM), spin-transfer torque-magnetoresistive random-access memory (STT-MRAM), phase-change memory (PCM) and other memory technologies may be included in the same memory subsystem. Further, DRAM of different standards, such as double data rate fifth-generation (DDR5), double data rate fourth-generation (DDR4), and low power DDR (LPDDR) DRAM, may be included in the same system.
Embodiments recognize that enterprise servers and other computing systems may include co-existing applications with various resource needs and systems demands. Embodiments further recognize that in such computing systems, it would be highly beneficial to allocate, map, and migrate applications on-demand to a memory subsystem of the computing system that aptly suits the needs of the particular application. Memory allocation requests and memory devices have associated properties such as latency (e.g., speed of random access), bandwidth (e.g., speed of contiguous data stream processing), reliability (e.g., probable rate of memory failure), power consumption (e.g., power consumption in holding data in memory), device compute (e.g., non-CPU compute capabilities of the device), density (e.g., compactness of allocated memory), and persistence (e.g., volatile (DRAM) or non-volatile (STT-MRAM, PCM) memory technology).
Various embodiments described herein provide for solving compiler specific problems and platform specific issues with respect to memory maps by compiler analysis of source code blocks to determine physical memory most suitable for allocation with the code blocks. An embodiment includes mapping application code blocks by a compiler into intermediate triplet values identifying characteristics of the memory requirements of the code block such as access pattern, frequency, and size. The embodiment further includes converting the triplets into memory attribute values to be provided as input arguments to memory allocation based system calls (e.g., mbind/madvise) to allocate particular memory from among different memory types for the application blocks best suited for the code block based upon the memory attribute values. Accordingly, physical memory placement choices for an application virtual memory buffer are improved.
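As a minimal sketch of the final step described above, the following C fragment passes an access-pattern hint to madvise on Linux. The enum, the function names, and the mapping from pattern to advice value are assumptions made for illustration and are not taken from the embodiments. Note that madvise only hints paging behavior; binding a buffer to a particular memory node would use mbind, which is not shown here.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes madvise() and MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical access-pattern classes; the names are illustrative only. */
enum access_pattern { ACCESS_SERIAL, ACCESS_MEDIUM, ACCESS_RANDOM };

/* Assumed mapping from an access pattern to a madvise(2) advice value. */
static int advice_for_pattern(enum access_pattern p)
{
    switch (p) {
    case ACCESS_SERIAL: return MADV_SEQUENTIAL;
    case ACCESS_RANDOM: return MADV_RANDOM;
    default:            return MADV_NORMAL;
    }
}

/* Allocate an anonymous buffer and pass the hint to the kernel.
 * Returns NULL if the mapping or the hint fails. */
static void *alloc_with_hint(size_t size, enum access_pattern p)
{
    assert(size > 0);
    void *buf = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;
    if (madvise(buf, size, advice_for_pattern(p)) != 0) {
        munmap(buf, size);
        return NULL;
    }
    return buf;
}
```

A caller would release the buffer with munmap(buf, size) once it is no longer needed.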
In an embodiment, a compiler deduces a single code identification set from a given instance of an iterative code block, such as a “for { }”, “do { } while”, or “while { }” code block, of any nested level looking at a possible range (e.g., start/end) for a buffer address, effective loop index increment at each nested level, and overall possible calling frequency of the whole iterative code block. In an embodiment, the compiler creates and stores multiple code identification sets for all instances of iterative access for a given virtual memory buffer in a runtime managed data structure. In the embodiment, the compiler analyzes the data structure to determine a final and effective code identification set for the virtual memory buffer. In particular embodiments, the multi value code identification set for a given virtual memory buffer is analyzed based upon a relative weight for each instance of iteration in proportion to the size and frequency of access.
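One possible reading of the relative weighting described above can be sketched in C as follows. The structure, its field names, and the choice of size multiplied by frequency as the weight are assumptions for illustration; the embodiments do not specify the exact formula.

```c
#include <assert.h>
#include <stddef.h>

/* One code identification set for a single iterative access instance.
 * Field names are illustrative and not taken from the source. */
struct code_id_set {
    double size;        /* span of the buffer touched by the loop */
    double interleave;  /* effective loop index increment */
    double frequency;   /* calling frequency of the whole block */
};

/* Combine n sets into one effective set. Each instance is weighted by
 * size * frequency (an assumed weighting). Returns all zeros when the
 * total weight is zero. */
static struct code_id_set combine_sets(const struct code_id_set *s, size_t n)
{
    struct code_id_set out = {0.0, 0.0, 0.0};
    double total_w = 0.0;

    assert(s != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        double w = s[i].size * s[i].frequency;
        out.size += w * s[i].size;
        out.interleave += w * s[i].interleave;
        out.frequency += w * s[i].frequency;
        total_w += w;
    }
    if (total_w > 0.0) {
        out.size /= total_w;
        out.interleave /= total_w;
        out.frequency /= total_w;
    }
    return out;
}
```

With this weighting, a large and frequently executed loop dominates the effective set, while a small one-off access contributes little.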
In an embodiment, the compiler maps an effective code identification set for a given virtual memory buffer into effective memory attribute values. In the embodiment, the memory attribute values are provided as an argument into a memory allocation system call (e.g., mbind/madvise), which in turn involves examining kernel exported memory attribute values such as bandwidth, latency, or power consumption.
In an embodiment, a compiler receives source program code and inserts semantic actions in the source code to recognize buffers of interest and copies to the buffers in one or more blocks of the source code. In one or more embodiments, a semantic action includes code inserted during semantic parsing of the source code to perform an action such as additional processing of parsed code to generate intermediate code. In the embodiment, the compiler sets the buffer of interest characteristics in the semantic action as attributes of program variables and statements. In the embodiment, the compiler further inserts semantic actions to compute loop iterations and limits within the source code that can be detected statically, and sets these loop characteristics as program variables and statements.
In the embodiment, the compiler traverses a program control flow graph and computes, using semantic attributes, the loop iterations and limits that can be detected statically, and sets these as attributes of program statements. In the embodiment, for cases in which the iterations and limits cannot be determined statically, the compiler traverses the program control flow graph and inserts instrumentation to capture the iteration and limits information dynamically. In one or more embodiments, instrumentation refers to one or more commands to collect and output measurement data during runtime of the program code.
In the embodiment, the program code is run with representative smaller inputs, and the instrumentation produces an output file containing the addresses of buffers of interest within the source code. In the embodiment, the compiler processes the instrumentation output and computes memory attributes such as size, interleave, and frequency information for each of the buffers of interest. In the embodiment, the compiler selects a physical memory from different types of technologies of physical memory present in a computing system for a buffer of interest and allocates the physical memory within output application code for the buffer of interest based upon the memory attributes of the buffer of interest.
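The instrumentation step could look roughly like the following C sketch. The probe function, its name, and the one-line-per-access output format are hypothetical; the sketch records byte offsets relative to the buffer start rather than raw addresses, which is an assumption made to keep the output easy to post-process.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical probe: writes one "<buffer-id> <byte-offset>" line per
 * access to the instrumentation output file. */
static void record_access(FILE *out, const char *buf_id,
                          const void *base, const void *addr)
{
    assert(out != NULL && buf_id != NULL && base != NULL && addr != NULL);
    ptrdiff_t off = (const char *)addr - (const char *)base;
    fprintf(out, "%s %td\n", buf_id, off);
}

/* Instrumented form of a simple write loop over a 10-element buffer:
 * the probe call is what the compiler would insert before each access. */
static void instrumented_loop(FILE *out, char *tmp)
{
    for (int i = 0; i < 10; i++) {
        record_access(out, "tmp", tmp, &tmp[i]);
        tmp[i] = 10;
    }
}
```

Running the instrumented loop against an open output file yields ten lines whose offsets increase by one, from which size, interleave, and frequency can later be computed.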
An embodiment can be implemented as a software application. The application implementing an embodiment can be configured as a modification of an existing compiler, as a separate application that operates in conjunction with an existing compiler, a standalone application, or some combination thereof.
The illustrative embodiments are described with respect to certain types of tools and platforms, procedures and algorithms, services, devices, data processing systems, environments, components, programming languages, compilers, and applications only as examples. Any specific manifestations of these and other similar artifacts are not intended to be limiting to the invention. Any suitable manifestation of these and other similar artifacts can be selected within the scope of the illustrative embodiments.
Furthermore, the illustrative embodiments may be implemented with respect to any type of data, data source, or access to a data source over a data network. Any type of data storage device may provide the data to an embodiment of the invention, either locally at a data processing system or over a data network, within the scope of the invention. Where an embodiment is described using a mobile device, any type of data storage device suitable for use with the mobile device may provide the data to such embodiment, either locally at the mobile device or over a data network, within the scope of the illustrative embodiments.
The illustrative embodiments are described using specific code, designs, architectures, protocols, layouts, schematics, and tools only as examples and are not limiting to the illustrative embodiments. Furthermore, the illustrative embodiments are described in some instances using particular software, tools, and data processing environments only as an example for the clarity of the description. The illustrative embodiments may be used in conjunction with other comparable or similarly purposed structures, systems, applications, or architectures. For example, other comparable mobile devices, structures, systems, applications, or architectures therefor, may be used in conjunction with such embodiment of the invention within the scope of the invention. An illustrative embodiment may be implemented in hardware, software, or a combination thereof.
The examples in this disclosure are used only for the clarity of the description and are not limiting to the illustrative embodiments. Additional data, operations, actions, tasks, activities, and manipulations will be conceivable from this disclosure and the same are contemplated within the scope of the illustrative embodiments.
Any advantages listed herein are only examples and are not intended to be limiting to the illustrative embodiments. Additional or different advantages may be realized by specific illustrative embodiments. Furthermore, a particular illustrative embodiment may have some, all, or none of the advantages listed above.
With reference to the figures and in particular with reference to
Clients or servers are only example roles of certain data processing systems connected to network 102 and are not intended to exclude other configurations or roles for these data processing systems. Server 104, server 106, and server 140 couple to network 102 along with storage unit 108. Software applications may execute on any computer in data processing environment 100. Clients 110, 112, and 114 are also coupled to network 102. A data processing system, such as server 104 or 106, or client 110, 112, or 114 may contain data and may have software applications or software tools executing thereon.
Only as an example, and without implying any limitation to such architecture,
Device 132 is an example of a device described herein. For example, device 132 can take the form of a smartphone, a tablet computer, a laptop computer, client 110 in a stationary or a portable form, a wearable computing device, or any other suitable device. Any software application described as executing in another data processing system in
Servers 104, 106, and 140, storage unit 108, and clients 110, 112, and 114, and device 132 may couple to network 102 using wired connections, wireless communication protocols, or other suitable data connectivity. Clients 110, 112, and 114 may be, for example, personal computers or network computers.
In the depicted example, server 104 may provide data, such as boot files, operating system images, and applications to clients 110, 112, and 114. Clients 110, 112, and 114 may be clients to server 104 in this example. Clients 110, 112, 114, or some combination thereof, may include their own data, boot files, operating system images, and applications. Data processing environment 100 may include additional servers, clients, and other devices that are not shown. Server 104 includes a compiler application 105 that may be configured to implement one or more of the functions described herein for predicting physical memory attributes by compiler analysis of code blocks and producing executable code in accordance with one or more embodiments.
Server 140 includes a processor 142 and a memory subsystem 144. Processor 142 is configured for executing an application 146 stored within memory subsystem 144 as described herein with respect to various embodiments. In one or more embodiments, application 146 is produced by compiler application 105 and received from server 104. In one or more embodiments, memory subsystem 144 includes one or more memory devices and may include different memory devices having different types or technologies. In one or more embodiments, server 106 implements a serverless cloud computing environment. Storage unit 108 includes one or more databases 109 that may be configured to store source code or other data associated with compiler application 105 such as one or more tables as described herein.
In the depicted example, data processing environment 100 may be the Internet. Network 102 may represent a collection of networks and gateways that use the Transmission Control Protocol/Internet Protocol (TCP/IP) and other protocols to communicate with one another. At the heart of the Internet is a backbone of data communication links between major nodes or host computers, including thousands of commercial, governmental, educational, and other computer systems that route data and messages. Of course, data processing environment 100 also may be implemented as a number of different types of networks, such as for example, an intranet, a local area network (LAN), or a wide area network (WAN).
Among other uses, data processing environment 100 may be used for implementing a client-server environment in which the illustrative embodiments may be implemented. A client-server environment enables software applications and data to be distributed across a network such that an application functions by using the interactivity between a client data processing system and a server data processing system. Data processing environment 100 may also employ a service oriented architecture where interoperable software components distributed across a network may be packaged together as coherent business applications. Data processing environment 100 may also take the form of a cloud, and employ a cloud computing model of service delivery for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g. networks, network bandwidth, servers, processing, memory, storage, applications, virtual machines, and services) that can be rapidly provisioned and released with minimal management effort or interaction with a provider of the service.
With reference to
Data processing system 200 is also representative of a data processing system or a configuration therein, such as data processing system 132 in
In the depicted example, data processing system 200 employs a hub architecture including North Bridge and memory controller hub (NB/MCH) 202 and South Bridge and input/output (I/O) controller hub (SB/ICH) 204. Processing unit 206, main memory 208, and graphics processor 210 are coupled to North Bridge and memory controller hub (NB/MCH) 202. Processing unit 206 may contain one or more processors and may be implemented using one or more heterogeneous processor systems. Processing unit 206 may be a multi-core processor. Graphics processor 210 may be coupled to NB/MCH 202 through an accelerated graphics port (AGP) in certain implementations.
In the depicted example, local area network (LAN) adapter 212 is coupled to South Bridge and I/O controller hub (SB/ICH) 204. Audio adapter 216, keyboard and mouse adapter 220, modem 222, read only memory (ROM) 224, universal serial bus (USB) and other ports 232, and PCI/PCIe devices 234 are coupled to South Bridge and I/O controller hub 204 through bus 238. Hard disk drive (HDD) or solid-state drive (SSD) 226 and CD-ROM 230 are coupled to South Bridge and I/O controller hub 204 through bus 240. PCI/PCIe devices 234 may include, for example, Ethernet adapters, add-in cards, and PC cards for notebook computers. PCI uses a card bus controller, while PCIe does not. ROM 224 may be, for example, a flash binary input/output system (BIOS). Hard disk drive 226 and CD-ROM 230 may use, for example, an integrated drive electronics (IDE), serial advanced technology attachment (SATA) interface, or variants such as external-SATA (eSATA) and micro-SATA (mSATA). A super I/O (SIO) device 236 may be coupled to South Bridge and I/O controller hub (SB/ICH) 204 through bus 238.
Memories, such as main memory 208, ROM 224, or flash memory (not shown), are some examples of computer usable storage devices. Hard disk drive or solid state drive 226, CD-ROM 230, and other similarly usable devices are some examples of computer usable storage devices including a computer usable storage medium.
An operating system runs on processing unit 206. The operating system coordinates and provides control of various components within data processing system 200 in
Instructions for the operating system, the object-oriented programming system, and applications or programs, such as application 105 in
Furthermore, in one case, code 226A may be downloaded over network 201A from remote system 201B, where similar code 201C is stored on a storage device 201D. In another case, code 226A may be downloaded over network 201A to remote system 201B, where downloaded code 201C is stored on a storage device 201D.
The hardware in
In some illustrative examples, data processing system 200 may be a personal digital assistant (PDA), which is generally configured with flash memory to provide non-volatile memory for storing operating system files and/or user-generated data. A bus system may comprise one or more buses, such as a system bus, an I/O bus, and a PCI bus. Of course, the bus system may be implemented using any type of communications fabric or architecture that provides for a transfer of data between different components or devices attached to the fabric or architecture.
A communications unit may include one or more devices used to transmit and receive data, such as a modem or a network adapter. A memory may be, for example, main memory 208 or a cache, such as the cache found in North Bridge and memory controller hub 202. A processing unit may include one or more processors or CPUs.
The depicted examples in
Where a computer or data processing system is described as a virtual machine, a virtual device, or a virtual component, the virtual machine, virtual device, or the virtual component operates in the manner of data processing system 200 using virtualized manifestation of some or all components depicted in data processing system 200. For example, in a virtual machine, virtual device, or virtual component, processing unit 206 is manifested as a virtualized instance of all or some number of hardware processing units 206 available in a host data processing system, main memory 208 is manifested as a virtualized instance of all or some portion of main memory 208 that may be available in the host data processing system, and disk 226 is manifested as a virtualized instance of all or some portion of disk 226 that may be available in the host data processing system. The host data processing system in such cases is represented by data processing system 200.
With reference to
Compiler application 302 is configured to receive source code 304. Compiler application 302 includes a lexical analyzer 306, a syntax analyzer 308, a semantic analyzer 310, a high level optimizer 312, and a low level optimizer 314. Compiler application 302 further includes a dynamic memory access table 316, a buffer access characteristic table 318, a physical memory attributes table 320, and a buffer access characteristic to physical memory attributes mapping table 322.
Lexical analyzer 306 is configured to perform lexical analysis on source code 304 using lexical grammar rules to tokenize portions of source code 304 to produce a sequence of tokens including, for example, symbols, keywords, identifiers, etc. Syntax analyzer 308 is configured to parse and perform syntax analysis on source code 304 to verify according to a syntax grammar that a valid sequence of tokens is produced from the lexical analysis.
Semantic analyzer 310 is configured to perform semantic analysis to gather semantic information from the parsed code and insert semantic actions for recognizing buffers of interest and determining associated attributes as discussed with respect to one or more embodiments. Semantic analyzer 310 is further configured to capture attributes such as size, existence in loops, loop iteration counts, and system call type used to allocate memory.
High level optimizer 312 is configured to consider the static attributes of buffers of interest calculated by semantic analyzer 310 and to compute the corresponding memory attributes that can be mapped to the most suitable physical memory available in the system using dynamic memory access table 316. High level optimizer 312 is further configured to perform high level optimization on the semantically analyzed code to generate optimized intermediate code. Low level optimizer 314 is configured to perform low level optimization on the optimized intermediate code to generate application code 324.
Dynamic memory access table 316 is configured to store a table including identifiers of the buffers of interest in association with determined memory attributes including size, interleave, and frequency information for each of the buffers of interest as further described herein. Buffer access characteristic table 318 is configured to store a table including classification levels of memory attributes as further described herein. Physical memory attributes table 320 is configured to store a table representing rankings of a number of physical memory attributes as further described herein. Buffer access characteristic to physical memory attributes mapping table 322 is configured to store a table including a mapping of buffer access character tuples to physical memory tuples as further described herein.
With reference to
Semantic analyzer 408 performs semantic analysis to gather semantic information from the parsed code and inserts semantic actions for recognizing buffers of interest and capturing loop modification factors. Semantic analyzer 408 generates semantic actions for recognizing buffers of interest including detecting copies of pointers into temporary pointers. Semantic analyzer 408 further computes compile time attributes for the buffers of interest. Semantic analyzer 408 further inserts semantic actions for capturing loop modification factors of one or more loops within source code 402 indicative of a modification of an iterative loop within source code 402.
Semantic analyzer 408 further computes static compile time attributes for each of the buffers of interest such as size, existence in loop, loop iteration counts, and system call type used to allocate memory. High level optimizer 410 traverses a control flow graph to access the attributes and computes buffer characteristics such as a median buffer size value, an access median interleave value, and an access median frequency value. The median buffer size value indicates a required memory size of the buffer of interest. The access median interleave value indicates a median interleave of an iterative loop of the buffer of interest, and the access median frequency value indicates a median frequency of access of the buffer of interest. For cases in which static compile time attributes cannot be computed, the compiler inserts instrumentation to compute the attributes including median buffer size, median interleave, and median frequency dynamically. In such cases, source code 402 is instrumented in locations in which buffers of interest are accessed to compute the size of the buffers, interleave, and frequency. In particular embodiments, the instrumentation records accesses of addresses within the buffer of interest. High level optimizer 410 further allocates physical memory for the buffers of interest according to the memory attributes computed by either static or dynamic analysis.
After performing the operations of computing the memory attributes, mapping the memory attributes to physical memory attributes, and generating instructions to allocate physical memory for buffers of interest according to computed memory attributes, high level optimizer 410 performs high level optimization on the semantically analyzed code to generate optimized intermediate code. Low level optimizer 412 performs low level optimization on the optimized intermediate code to generate application code 414.
With reference to
With reference to
With reference to
With reference to
The compiler then performs dynamic runtime analysis to dynamically determine attributes of buffers of interest that cannot be determined statically and allocates memory for the buffers of interest. The runtime dynamic analysis may continue until all buffers of interest have been analyzed.
With reference to
With reference to
In the example, the compiler application determines a memory buffer start “tmp=orig_ptr;” and a memory buffer end “orig_ptr+10” within code block 800. The compiler further identifies a sequential increase in single steps of the loop index “i++” and write access to the memory buffer “tmp[i]=10;”.
With reference to
With reference to
In the example, memory diagram 1000 shows an original pointer “orig_ptr” having a median buffer size of 12 bytes, an access median interleave of 0, and an access median frequency of 1. Memory diagram 1000 further shows a first temporary pointer “tmp_ptr” having a median buffer size of 1 byte, an access median interleave of 0, and an access median frequency of 1. Memory diagram 1000 still further shows a second temporary pointer “tmp_ptr1” having a median buffer size of 1 byte, an access median interleave of 0, and an access median frequency of 1.
In an embodiment, for repeated random access cases, the compiler application identifies memory pointers referenced repeatedly with no block wise access for the memory buffer (for loop, while loop, do while loop, etc.). In the embodiment, the compiler application counts references by their occurrence inside loops such as for, while and do while loops to determine an access median frequency for a code block. In various embodiments, repeated random access can also occur in complex data structure assignments either in the same loop or in a function call within a loop. In such situations, the compiler application may count references by their occurrence inside loops such as for, while, and do while loops to determine access median frequency for a buffer.
In an embodiment, the compiler application may estimate median interleave within a buffer by dynamic profiling to capture load addresses and use the captured load addresses to estimate median interleave. In a particular embodiment, the compiler application inserts instrumentation into the source code, runs the application with a smaller training input, and records offsets of load addresses to estimate median interleave.
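As a sketch of how recorded offsets might be turned into a median interleave, the following C function takes the median of the absolute differences between consecutive offsets. Treating interleave as the distance between consecutive accesses is an assumption, and the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Three-way comparison of two long values for qsort(). */
static int cmp_long(const void *a, const void *b)
{
    long x = *(const long *)a;
    long y = *(const long *)b;
    return (x > y) - (x < y);
}

/* Median of the absolute differences between consecutive recorded
 * offsets. Returns 0 when fewer than two offsets are given or when
 * the scratch allocation fails. */
static long median_interleave(const long *offsets, size_t n)
{
    assert(offsets != NULL || n == 0);
    if (n < 2)
        return 0;

    size_t m = n - 1;
    long *d = malloc(m * sizeof *d);
    if (d == NULL)
        return 0;

    for (size_t i = 0; i < m; i++) {
        long diff = offsets[i + 1] - offsets[i];
        d[i] = diff < 0 ? -diff : diff;
    }
    qsort(d, m, sizeof *d, cmp_long);

    /* For an even count, average the two middle values. */
    long med = (m % 2) ? d[m / 2] : (d[m / 2 - 1] + d[m / 2]) / 2;
    free(d);
    return med;
}
```

Using the median rather than the mean keeps an occasional large jump, such as a loop wrapping back to the buffer start, from distorting the estimate.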
In another embodiment, for repeated sequential access cases, the compiler application may identify text within the source code having repeated reference to memory pointers and block wise access to the memory buffer for an iterative loop (e.g., for loop, while loop, do while loop, etc.).
Examples of code blocks and corresponding attributes for repeated sequential access are as follows:
Example 1
-
- tmp=orig_ptr;
- for (i=0; i<10; i++)
- tmp[i]=10;
- Access Memory Pointer: tmp
- Access Median Size: 10
- Access Median Interleave: 1
- Access Median Frequency: 1
-
- tmp=orig_ptr;
- for (i=0; i<10; i=i+2)
- tmp[i]=10;
- Access Memory Pointer: tmp
- Access Median Size: 10
- Access Median Interleave: 2
- Access Median Frequency: 1
-
- tmp=orig_ptr;
- for (i=0; i<100; i=i+5)
- tmp[i]=10;
- Access Memory Pointer: tmp
- Access Median Size: 100
- Access Median Interleave: 5
- Access Median Frequency: 1
-
- tmp=orig_ptr;
- for (j=0; j<20; j++) {
- for (i=0; i<10; i++)
- tmp[i]=10;
- for (i=0; i<10; i++)
- }
- Access Memory Pointer: tmp
- Access Median Size: 10
- Access Median Interleave: 1
- Access Median Frequency: 20
-
- tmp=orig_ptr;
- for (j=0; j<20; j++) {
- for (i=0; i<100; i=i+5)
- tmp[i]=10;
- for (i=0; i<100; i=i+5)
- }
- Access Memory Pointer: tmp
- Access Median Size: 100
- Access Median Interleave: 5
- Access Median Frequency: 20
-
- tmp=orig_ptr;
- for (j=0; j<20; j=j+5) {
- for (i=0; i<100; i=i+5)
- tmp[i]=10;
- for (i=0; i<100; i=i+5)
- }
- Access Memory Pointer: tmp
- Access Median Size: 100
- Access Median Interleave: 5
- Access Median Frequency: 5
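The relationship between loop bounds and reported attributes in the listings above can be sketched in C as follows. This is an illustrative reading of the examples only: the structure access_attrs and the helper attrs_from_loop are hypothetical names, and the rule that size follows the loop bound, interleave follows the index increment, and frequency follows the number of passes made by an enclosing loop is an assumption inferred from the listings rather than a stated algorithm.

```c
#include <assert.h>

/* Attributes reported for one buffer, named after the listings above. */
struct access_attrs {
    long size;        /* Access Median Size */
    long interleave;  /* Access Median Interleave */
    long frequency;   /* Access Median Frequency */
};

/*
 * Hypothetical helper: derive attributes for a loop of the form
 *     for (i = 0; i < limit; i = i + step) tmp[i] = ...;
 * that an enclosing loop repeats `repeats` times. Pass repeats == 1
 * when there is no enclosing loop.
 */
struct access_attrs attrs_from_loop(long limit, long step, long repeats)
{
    assert(limit > 0 && step > 0 && repeats > 0);

    struct access_attrs a;
    a.size = limit;         /* the listings report the loop bound as size */
    a.interleave = step;    /* the index increment */
    a.frequency = repeats;  /* number of passes over the buffer */
    return a;
}
```

For instance, attrs_from_loop(10, 1, 1) yields size 10, interleave 1, and frequency 1, matching the first listing, and attrs_from_loop(100, 5, 20) yields size 100, interleave 5, and frequency 20.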
In one or more embodiments, the compiler application accounts for repeated access if a reference either directly appears inside a loop as discussed above or if the reference occurs in a function which is called from within a loop. For example, if pointer access occurs in a function which is at a depth of N in a call tree of the application, the parent function having the loop may be at a call tree depth of N−1, . . . , 1.
With reference to
In an embodiment, the compiler application updates dynamic memory access table 1100 by evaluating buffers of interest continuously during runtime (block 1102), characterizing the buffers of interest by determining buffer access characteristics including access pattern, frequency, and size information of the buffer of interest (block 1104), and evaluating memory attributes of the buffer of interest (block 1106).
With reference to
Buffer access character table 1200 classifies each of the three components of the buffer access character into multiple levels and provides a bit representation of each classification level (two bits for the access pattern and size components, and one bit for the frequency component). In buffer access character table 1200, the access pattern component is classified as serial access represented by “00”, medium access represented by “10”, or random access represented by “11”; the frequency component is classified as low represented by “0” or high represented by “1”; and the size component is classified as low represented by “00”, medium represented by “01”, or high represented by “11”. As an example, if a buffer of interest of source code is serial, highly frequent, and spanning across medium size locations, the buffer access character is represented by <00-1-01>.
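The encoding described for the buffer access character can be sketched in C as follows. The enumerator values mirror the bit codes given in the text, while the type names and the function format_access_character are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Enumerator values are the bit codes given in the text. */
enum access_pattern { PATTERN_SERIAL = 0x0, PATTERN_MEDIUM = 0x2, PATTERN_RANDOM = 0x3 };
enum access_freq    { FREQ_LOW = 0x0, FREQ_HIGH = 0x1 };
enum access_size    { SIZE_LOW = 0x0, SIZE_MEDIUM = 0x1, SIZE_HIGH = 0x3 };

/*
 * Hypothetical formatter: write the buffer access character as
 * <pattern-frequency-size>, e.g. <00-1-01>, into buf.
 * Returns the number of characters written (excluding the terminator),
 * as reported by snprintf.
 */
int format_access_character(enum access_pattern p, enum access_freq f,
                            enum access_size s, char *buf, size_t len)
{
    static const char *const bits2[4] = { "00", "01", "10", "11" };

    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "<%s-%d-%s>",
                    bits2[(unsigned)p & 3u],
                    f == FREQ_HIGH ? 1 : 0,
                    bits2[(unsigned)s & 3u]);
}
```

Calling format_access_character(PATTERN_SERIAL, FREQ_HIGH, SIZE_MEDIUM, buf, sizeof buf) with a 16-byte buffer produces the string <00-1-01>, the example given above.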
With reference to
In the illustrated embodiment, physical memory attribute table 1300 maps particular types of memory to each of the physical memory attributes using a value indicative of a ranking of that physical memory attribute for the particular type of memory. The possible values are “00”, “01”, “10”, and “11”, with “00” representing the worst ranking for the physical memory attribute in that segment and “11” representing the best ranking for the physical memory attribute in that segment. An “x” indicates that the particular physical memory attribute is not relevant for determining a suitable memory type. In the illustrated embodiment, the memory types include high bandwidth memory (HBM), low-latency DRAM (LLDRAM), DDR4 DRAM, DDR5 DRAM, non-volatile dual in-line memory module (NVDIMM), STT MRAM, phase change memory (PCM), 3D XPoint (3D XP), hybrid memory cube (HMC), and low-power DDR (LPDDR). Although the embodiment of
As an example, a DDR5 DRAM based memory is represented as <01-01-11-10-x-11-00> representative of a latency value of “01”, a bandwidth value of “01”, a reliability value of “11”, a power consumption value of “10”, a device compute ability value of “x”, a density value of “11”, and a persistence value of “00”.
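One possible in-memory form of such a table row is sketched below in C. The structure phys_mem_attrs, the sentinel RANK_NA standing for “x”, and the formatter are hypothetical names introduced for illustration; only the DDR5 DRAM values are taken from the example above.

```c
#include <assert.h>
#include <stdio.h>

#define RANK_NA (-1)  /* stands for "x": attribute not relevant */

/* One row of the table: rankings 0..3 (binary 00..11) or RANK_NA. */
struct phys_mem_attrs {
    const char *type;
    int latency, bandwidth, reliability, power, compute, density, persistence;
};

/* Map a ranking to its two-bit string, or "x" when not relevant. */
static const char *rank_str(int r)
{
    static const char *const bits2[4] = { "00", "01", "10", "11" };

    if (r == RANK_NA)
        return "x";
    assert(r >= 0 && r <= 3);
    return bits2[r];
}

/* Write the row in the <..-..-..> notation used in the text. */
int format_phys_mem_attrs(const struct phys_mem_attrs *m, char *buf, size_t len)
{
    assert(m != NULL && buf != NULL && len > 0);
    return snprintf(buf, len, "<%s-%s-%s-%s-%s-%s-%s>",
                    rank_str(m->latency), rank_str(m->bandwidth),
                    rank_str(m->reliability), rank_str(m->power),
                    rank_str(m->compute), rank_str(m->density),
                    rank_str(m->persistence));
}

/* The DDR5 DRAM example from the text. */
const struct phys_mem_attrs ddr5_example = {
    "DDR5 DRAM", 1, 1, 3, 2, RANK_NA, 3, 0
};
```

Formatting ddr5_example into a 32-byte buffer yields <01-01-11-10-x-11-00>.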
With reference to
For attributes computed dynamically using instrumentation, the compiler implements the optimization as a profile-directed feedback optimization (e.g., using the compiler flag -fprofile-generate in a GNU Compiler Collection (GCC) compiler) in one or more embodiments. In this type of optimization, a program is first compiled with a profile generate option, the binary code is run with a smaller training input, and a profile is generated. The application is then compiled with a profile use option (e.g., using the compiler flag -fprofile-use in a GCC compiler), which considers the profile information, including the data generated by the dynamic instrumentation, to accurately determine memory attributes for buffers of interest and to generate the final optimized binary.
With reference to
In block 1506, compiler application 105 parses and performs syntax analysis on the source code to verify, according to a syntax grammar, that a valid sequence of tokens is produced from the lexical analysis, and generates a syntax tree. In block 1508, compiler application 105 initiates semantic analysis of the source code. In block 1510, compiler application 105 inserts semantic actions for recognizing buffers of interest and sets buffer of interest characteristics in the semantic action as semantic attributes of program variables and statements. In block 1512, compiler application 105 inserts semantic actions to compute loop iterations and limits within the source code that can be detected statically, and sets these loop characteristics as semantic attributes of program variables and statements.
In block 1514, compiler application 105 traverses a program control flow of the source code and computes, using semantic attributes, the loop iteration and limits that can be detected statically and sets these as attributes of program statements. In block 1516, compiler application 105 traverses the program control flow and inserts instrumentation to capture the iteration and limits information dynamically for cases in which the iterations and limits cannot be determined statically.
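The kind of instrumentation contemplated in block 1516 can be sketched in C as follows, assuming a loop whose trip count depends on its input and therefore cannot be determined statically. The record loop_profile and the hook profile_access are hypothetical; an actual compiler would insert its own instrumentation and record format.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-buffer record filled in by inserted instrumentation. */
struct loop_profile {
    size_t iterations;  /* number of times the loop body ran */
    size_t max_index;   /* highest index touched (observed loop limit) */
};

/* Hypothetical hook inserted before each access to the buffer. */
static void profile_access(struct loop_profile *p, size_t index)
{
    p->iterations++;
    if (index > p->max_index)
        p->max_index = index;
}

/*
 * Copies src into tmp until the first zero byte or until cap elements
 * have been copied. The trip count depends on the contents of src, so
 * it is captured at run time. The profile_access call is the inserted
 * instrumentation; the rest is the original loop.
 */
void fill_until_zero(char *tmp, const char *src, size_t cap,
                     struct loop_profile *p)
{
    assert(tmp != NULL && src != NULL && p != NULL);

    size_t i = 0;
    while (i < cap && src[i] != 0) {
        profile_access(p, i);
        tmp[i] = src[i];
        i++;
    }
}
```

Running the instrumented loop on a small training input such as "abc" records three iterations and a highest index of 2, from which loop iteration and limit information for the buffer can then be derived.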
In block 1518, compiler application 105 runs the program code with representative smaller inputs. In block 1520, compiler application 105 generates an output file containing the addresses of buffers of interest within the source code. In block 1522, compiler application 105 processes the output file and computes memory attributes including size, interleave, and frequency information for each of the buffers of interest. In a particular embodiment, for a use case in which all of the attributes can be detected statically (e.g., if the program code only has “for” loops with constant loop counts), blocks 1516, 1518, and 1522 may be omitted and process 1500 may flow from block 1514 to block 1524.
In block 1524, compiler application 105 selects a physical memory from one or more different types of technologies of physical memory present in memory subsystem 144 for a buffer of interest and allocates the physical memory within output application code for the buffer of interest based upon the memory attributes of the buffer of interest and physical memory attributes of the selected physical memory.
In block 1526, compiler application 105 performs high level optimization on the semantically analyzed code to generate optimized intermediate code and low level optimization on the optimized intermediate code to generate executable application code 146. In block 1528, compiler application 105 outputs executable application code 146. In one or more embodiments, server 104 sends executable application code 146 to server 140, and server 140 stores application code 146 within the selected memory of memory subsystem 144 for execution by processor 142. Process 1500 then ends.
Thus, a computer implemented method, system or apparatus, and computer program product are provided in the illustrative embodiments for predicting physical memory attributes by compiler analysis of code blocks and other related features, functions, or operations. Where an embodiment or a portion thereof is described with respect to a type of device, the computer implemented method, system or apparatus, the computer program product, or a portion thereof, are adapted or configured for use with a suitable and comparable manifestation of that type of device.
Where an embodiment is described as implemented in an application, the delivery of the application in a Software as a Service (SaaS) model is contemplated within the scope of the illustrative embodiments. In a SaaS model, the capability of the application implementing an embodiment is provided to a user by executing the application in a cloud infrastructure. The user can access the application using a variety of client devices through a thin client interface such as a web browser (e.g., web-based e-mail), or other light-weight client-applications. The user does not manage or control the underlying cloud infrastructure including the network, servers, operating systems, or the storage of the cloud infrastructure. In some cases, the user may not even manage or control the capabilities of the SaaS application. In some other cases, the SaaS implementation of the application may permit a possible exception of limited user-specific application configuration settings.
The present invention may be a system, a method, and/or a computer program product at any possible technical detail level of integration. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.
The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, configuration data for integrated circuitry, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++, or the like, and procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.
Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the Figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.
Claims
1. A computer-implemented method, the method comprising:
- receiving source code including at least one code block;
- identifying a buffer associated with the at least one code block;
- determining buffer access characteristics associated with the buffer from the at least one code block;
- mapping the buffer access characteristics to physical memory attributes associated with one or more physical memories of a computing system; and
- generating executable program code including a system call associated with memory allocation based upon the physical memory attribute values.
2. The computer-implemented method of claim 1, wherein the buffer access characteristics include one or more of an interleave value of the buffer, a frequency of access of the buffer, or a size of the buffer.
3. The computer-implemented method of claim 1, wherein identifying the buffer includes inserting semantic actions in the at least one source block to detect the buffer.
4. The computer-implemented method of claim 1, wherein determining the buffer access characteristics further includes determining loop iterations and loop limits of an iterative loop within the at least one code block.
5. The computer-implemented method of claim 1, wherein determining the buffer access characteristics further includes inserting instrumentation within the at least one code block to capture loop iterations and loop limits of an iterative loop within the at least one code block during execution of the at least one code block.
6. The computer-implemented method of claim 5, further comprising:
- generating an output file including buffer information from the instrumentation; and
- processing the output file to compute the buffer access characteristics.
7. The computer-implemented method of claim 1, further comprising:
- selecting a memory type of the one or more physical memories based upon the mapping of the buffer access characteristics to physical memory attribute values.
8. The computer-implemented method of claim 1, wherein the determining of the buffer access characteristics associated with the buffer includes semantic analysis of the at least one code block.
9. The computer-implemented method of claim 1, wherein identifying the buffer associated with the at least one code block further includes identifying a memory pointer in the at least one code block.
10. The computer-implemented method of claim 1, wherein the physical memory attributes include one or more of a latency, a bandwidth, a power consumption, a device compute ability, a persistence, or a size of a memory device.
11. A computer usable program product comprising one or more computer-readable storage devices, and program instructions stored on at least one of the one or more storage devices, the stored program instructions comprising:
- program instructions to receive source code including at least one code block;
- program instructions to identify a buffer associated with the at least one code block;
- program instruction to determine buffer access characteristics associated with the buffer from the at least one code block;
- program instructions to map the buffer access characteristics to physical memory attributes associated with one or more physical memories of a computing system; and
- program instructions to generate executable program code including a system call associated with memory allocation based upon the physical memory attribute values.
12. The computer usable program product of claim 11, wherein the buffer access characteristics include one or more of an interleave value of the buffer, a frequency of access of the buffer, or a size of the buffer.
13. The computer usable program product of claim 11, wherein the program instructions to identify the buffer includes program instructions to insert semantic actions in the at least one source block to detect the buffer.
14. The computer usable program product of claim 11, wherein the program instructions to determine the buffer access characteristics further includes program instructions to determine loop iterations and loop limits of an iterative loop within the at least one code block.
15. The computer usable program product of claim 11, wherein the program instructions to determine the buffer access characteristics further includes program instructions to insert instrumentation within the at least one code block to capture loop iterations and loop limits of an iterative loop within the at least one code block during execution of the at least one code block.
16. The computer usable program product of claim 15, further comprising:
- program instructions to generate an output file including buffer information from the instrumentation; and
- program instructions to process the output file to compute the buffer access characteristics.
17. The computer usable program product of claim 11, further comprising:
- program instructions to select a memory type of the one or more physical memories based upon the mapping of the buffer access characteristics to physical memory attribute values.
18. The computer usable program product of claim 11, wherein the computer usable code is stored in a computer readable storage device in a data processing system, and wherein the computer usable code is transferred over a network from a remote data processing system.
19. The computer usable program product of claim 11, wherein the computer usable code is stored in a computer readable storage device in a server data processing system, and wherein the computer usable code is downloaded over a network to a remote data processing system for use in a computer readable storage device associated with the remote data processing system.
20. A computer system comprising one or more processors, one or more computer-readable memories, and one or more computer-readable storage devices, and program instructions stored on at least one of the one or more storage devices for execution by at least one of the one or more processors via at least one of the one or more memories, the stored program instructions comprising:
- program instructions to receive source code including at least one code block;
- program instructions to identify a buffer associated with the at least one code block;
- program instruction to determine buffer access characteristics associated with the buffer from the at least one code block;
- program instructions to map the buffer access characteristics to physical memory attributes associated with one or more physical memories of a computing system; and
- program instructions to generate executable program code including a system call associated with memory allocation based upon the physical memory attribute values.
Type: Grant
Filed: Mar 5, 2019
Date of Patent: Oct 13, 2020
Patent Publication Number: 20200285453
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION (Armonk, NY)
Inventors: Saravanan Sethuraman (Karnataka), Anshuman Khandual (Karnataka), Archana Ravindar (Karnataka), Venkata K Tavva (Telangana)
Primary Examiner: Hossain M Morshed
Application Number: 16/292,737
International Classification: G06F 8/41 (20180101);