METHOD OF OPTIMIZING SCALAR REGISTER ALLOCATION AND A SYSTEM THEREOF

Info

Publication number: 20220164190
Type: Application
Filed: Feb 9, 2022
Publication Date: May 26, 2022
Applicant: Blaize, Inc. (El Dorado Hills, CA)
Inventors: Pathikonda Datta Nagraj (Pradesh), Aravind Rajulapudi (Telangana), Ravi Korsa (Telangana)
Application Number: 17/668,204

Abstract

The present disclosure relates to a system and a method of optimizing scalar register allocation by a processor. The method comprises receiving, as input, an intermediate code and information about one or more available physical registers in a memory of the processor. The method further comprises allocating one or more virtual registers based on the received information, wherein each virtual register has the size of an available physical register. The method also comprises mapping one or more groups of 8-bit locations of the one or more virtual registers to one or more register classes. The method further comprises identifying a plurality of scalar variables from the input intermediate code, and dynamically assigning the one or more available physical registers to the identified scalar variables using the one or more register classes.

Description

TECHNICAL FIELD

The present disclosure relates to computer systems in general and more particularly, to compilers that optimize scalar register allocation.

BACKGROUND

Modern computer systems have evolved into extremely powerful devices, with advances in both hardware and software improving their performance. Modern software has also become far more complex than early computer software; many modern programs contain anywhere from tens to millions of lines of code or instructions. The execution time of a computer program is closely associated with the quantity and complexity of the instructions that are executed. Thus, as the quantity and complexity of computer instructions increase, the execution time of the computer program increases as well.

Many computer programs are written in a high-level language and converted by a compiler into a stream of machine code instructions that are eventually executed on a computer system. The manner in which the compiler allocates available physical registers to scalar variables in the computer program affects the execution time of the computer program. Presently, the available solutions for scalar register allocation allocate one complete register for each instruction. However, because one complete register is allocated for each instruction, only a few threads can run in parallel, which slows down the execution of the program. Thus, it is desirable to have a compiler that optimizes scalar register allocation and increases the number of threads that can run in parallel, thereby increasing the execution speed of the program.

The information disclosed in this background of the disclosure section is only for enhancement of understanding of the general background of the disclosure and should not be taken as an acknowledgement or any form of suggestion that this information forms prior art already known to a person skilled in the art.

SUMMARY

Embodiments of the present disclosure relate to a method of optimizing scalar register allocation by a processor. The method comprises receiving, as input, an intermediate code and information about one or more available physical registers in a memory of the processor. The method further comprises allocating one or more virtual registers based on the received information, wherein each virtual register has the size of an available physical register. The method also comprises mapping one or more groups of 8-bit locations of the one or more virtual registers to one or more register classes. The method further comprises identifying a plurality of scalar variables from the input intermediate code, and dynamically assigning the one or more available physical registers to the identified scalar variables using the one or more register classes.

Another aspect of the present disclosure relates to a system to optimize scalar register allocation. The system comprises a memory and a processor coupled to the memory. The processor is configured to receive, as input, an intermediate code and information about one or more available physical registers in the memory of the processor. The processor is further configured to allocate one or more virtual registers based on the received information, wherein each virtual register has the size of an available physical register. The processor is also configured to map one or more groups of 8-bit locations of the one or more virtual registers to one or more register classes. The processor is further configured to identify a plurality of scalar variables from the input intermediate code, and dynamically assign the one or more available physical registers to the identified scalar variables using the one or more register classes.

Yet another aspect of the present disclosure relates to a non-transitory computer readable medium comprising instructions that, when executed, cause one or more processors to receive, as input, an intermediate code and information about one or more available physical registers in the memory of the processor. The one or more processors are further configured to allocate one or more virtual registers based on the received information, wherein each virtual register has the size of an available physical register. The one or more processors are also configured to map one or more groups of 8-bit locations of the one or more virtual registers to one or more register classes. The one or more processors are further configured to identify a plurality of scalar variables from the input intermediate code, and dynamically assign the one or more available physical registers to the identified scalar variables using the one or more register classes.

The aforementioned aspects of the present disclosure may overcome one or more of the shortcomings of the prior art. Additional features and advantages may be realized through the techniques of the present disclosure. Other embodiments and aspects of the disclosure are described in detail herein and are considered a part of the claimed disclosure.

The foregoing summary is illustrative only and is not intended to be in any way limiting. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features will become apparent by reference to the drawings and the following detailed description.

BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS

The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The same numbers are used throughout the figures to reference like features and components. Some embodiments of device or system and/or methods in accordance with embodiments of the present subject matter are now described, by way of example only, and with reference to the accompanying figures, in which:

FIG. 1 illustrates an exemplary architecture of an optimized compiler in accordance with some embodiments of the present disclosure;

FIG. 2 is a block diagram of an exemplary computing system to optimize scalar register allocation, in accordance with some embodiments of the present disclosure;

FIG. 3 shows a flow chart of an exemplary method of optimizing scalar register allocation process in accordance with some embodiments of the present disclosure;

FIG. 4A shows an example scenario of register class allocation, in accordance with some embodiments of the present disclosure; and

FIG. 4B shows an example scenario of scalar register allocation, in accordance with some embodiments of the present disclosure.

DETAILED DESCRIPTION

In the present document, the word “exemplary” is used herein to mean “serving as an example, instance, or illustration.” Any embodiment or implementation of the present subject matter described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments.

While the disclosure is susceptible to various modifications and alternative forms, specific embodiments thereof have been shown by way of example in the drawings and will be described in detail below. It should be understood, however, that it is not intended to limit the disclosure to the particular forms disclosed; on the contrary, the disclosure is to cover all modifications, equivalents, and alternatives falling within the spirit and the scope of the disclosure.

The terms “comprises”, “comprising”, or any other variations thereof, are intended to cover a non-exclusive inclusion, such that a setup, device or method that comprises a list of components or steps does not include only those components or steps but may include other components or steps not expressly listed or inherent to such setup or device or method. In other words, one or more elements in a device or system or apparatus preceded by “comprises . . . a” does not, without more constraints, preclude the existence of other elements or additional elements in the device or system or apparatus.

In the following detailed description of the embodiments of the disclosure, reference is made to the accompanying drawings that form a part hereof, and in which are shown by way of illustration specific embodiments in which the disclosure may be practiced. These embodiments are described in sufficient detail to enable those skilled in the art to practice the disclosure, and it is to be understood that other embodiments may be utilized and that changes may be made without departing from the scope of the present disclosure. The following description is, therefore, not to be taken in a limiting sense.

FIG. 1 illustrates an exemplary architecture of an optimized compiler in accordance with some embodiments of the present disclosure.

As shown in FIG. 1, an optimized compiler 100 is broadly divided into the phases of compiler design, comprising a front end 102, a middle end 104, and an optimized back end 106. The front end 102 typically comprises a lexical analyser, a syntax analyser, a semantic analyser, and an intermediate code generator. The lexical analyser converts a source code 150 into tokens and removes white-spaces and comments. The syntax analyser constructs a parse tree by taking in the tokens one by one, using a context-free grammar to construct the parse tree. The semantic analyser verifies the parse tree and produces a verified parse tree. The intermediate code generator generates an intermediate representation (IR) code 152 of the source code 150.

The middle end 104 comprises a code optimizer which performs optimization on the IR code 152 in order to improve the performance and quality of a target code. The middle end 104 commonly transforms the IR code 152 into an optimized IR code 154 so that it consumes fewer resources and runs faster. The optimized back end 106 comprises a target code generator that converts the optimized IR code 154 into a target code 158 based on the target CPU architecture.

The middle end 104, as shown in FIG. 1, receives the intermediate representation (IR) code 152 of the source code 150 and generates an optimized IR code 154, and the optimized IR code 154 is further translated to the target code 158. Intermediate code can be represented in a number of ways, such as high-level IR and low-level IR. The high-level IR is very close to the source code and is less effective for target machine optimization. The low-level IR is close to the target machine, which makes it suitable for register and memory allocation, instruction set selection, etc. An IR code that is closer to low-level IR eliminates the need for a complete new compiler for every unique machine by keeping the analysis portion the same for all the compilers. Modern compiler infrastructures are designed around a low-level IR that serves as a portable, high-level assembly language that can be optimized with a variety of transformations over multiple passes. The present invention relates to a transformation method and device which can enhance the performance of a compiler through optimized scalar register allocation performed by the optimized back end 106.

The optimized back end 106 implements an optimized scalar register allocation which translates the optimized IR code 154 to the target code 158, wherein the target code 158 is produced with less compilation time and requires less computing capacity when executed on the machine.

The optimized scalar register allocation mentioned above identifies a plurality of scalar variables in the optimized IR code 154, and dynamically assigns the one or more available physical registers to the identified scalar variables.

FIG. 2 is a block diagram of a computing system to optimize scalar register allocation, in accordance with some embodiments of the present disclosure.

In various embodiments the system 200 includes one or more processors 202 and one or more graphics processors 208, and may be a single processor desktop system, a multiprocessor workstation system, or a server system having a large number of processors 202 or processor cores 207. In one embodiment, the system 200 is a processing platform incorporated within a system-on-a-chip (SoC) integrated circuit for use in mobile, handheld, or embedded devices.

In one embodiment the system 200 can include or be incorporated within a server-based gaming platform, a game console, including a game and media console, a mobile gaming console, a handheld game console, or an online game console. In some embodiments the system 200 is a mobile phone, smart phone, tablet computing device or mobile Internet device. The processing system 200 can also include, couple with, or be integrated within a wearable device, such as a smart watch wearable device, smart eyewear device, augmented reality device, or virtual reality device. In some embodiments, the computing system 200 is a television or set top box device having one or more processors 202 and a graphical interface generated by one or more graphics processors 208.

In some embodiments, the one or more processors 202 each include one or more processor cores 207 to process instructions which, when executed, perform operations for system and user software. In some embodiments, each of the one or more processor cores 207 is configured to process a specific instruction set 209. Processor core 207 may also include other processing devices, such as a Digital Signal Processor (DSP).

In some embodiments, the processor 202 includes cache memory 204. Depending on the architecture, the processor 202 can have a single internal cache or multiple levels of internal cache. In some embodiments, the cache memory 204 is shared among various components of the processor 202. In some embodiments, the processor 202 also uses an external cache (e.g., a Level-3 (L3) cache or Last Level Cache (LLC)) (not shown), which may be shared among processor cores 207 using known cache coherency techniques.

In some embodiments, one or more processor(s) 202 are coupled with one or more interface bus(es) 210 to transmit communication signals such as address, data, or control signals between processor 202 and other components in the system 200. The interface bus 210, in one embodiment, can be a processor bus, such as a version of the Direct Media Interface (DMI) bus. However, processor busses are not limited to the DMI bus, and may include one or more Peripheral Component Interconnect buses (e.g., PCI, PCI Express), memory busses, or other types of interface busses. In one embodiment the processor(s) 202 include an integrated memory controller 216, a platform controller hub 230, and a scalar register allocator 240. The memory controller 216 facilitates communication between a memory device and other components of the system 200, while the platform controller hub (PCH) 230 provides connections to I/O devices via a local I/O bus. The scalar register allocator 240 comprises a receiving module 242, a virtual register allocator module 244, a mapping module 246, a scalar identification module 248, a register assignor module 250, and other module(s) 252. The other module(s) 252 may perform various miscellaneous functionalities of the system 200. It will be appreciated that the aforementioned modules may be represented as a single module or a combination of different modules. The one or more modules may be implemented in the form of system software executed by the processor 202.

The memory device 220 can be a dynamic random-access memory (DRAM) device, a static random-access memory (SRAM) device, flash memory device, phase-change memory device, or some other memory device having suitable performance to serve as process memory. In one embodiment the memory device 220 can operate as system memory for the system 200, to store data 222 and instructions 221 for use when the one or more processors 202 execute an application or process. Memory controller 216 also couples with an optional external graphics processor 212, which may communicate with the one or more graphics processors 208 in processors 202 to perform graphics and media operations. In some embodiments a display device 211 can connect to the processor(s) 202. The display device 211 can be one or more of an internal display device, as in a mobile electronic device or a laptop device, or an external display device attached via a display interface (e.g., DisplayPort, etc.). In one embodiment the display device 211 can be a head mounted display (HMD) such as a stereoscopic display device for use in virtual reality (VR) applications or augmented reality (AR) applications.

In some embodiments, the platform controller hub 230 enables peripherals to connect to memory device 220 and processor 202 via a high-speed I/O bus. The I/O peripherals include, but are not limited to, an audio controller 246, a network controller 234, a firmware interface 228, a wireless transceiver 226, touch sensors 225, and a data storage device 224 (e.g., hard disk drive, flash memory, etc.). The data storage device 224 can connect via a storage interface (e.g., SATA) or via a peripheral bus, such as a Peripheral Component Interconnect bus (e.g., PCI, PCI Express). The touch sensors 225 can include touch screen sensors, pressure sensors, or fingerprint sensors. The wireless transceiver 226 can be a Wi-Fi transceiver, a Bluetooth transceiver, or a mobile network transceiver such as a 3G, 4G, or Long-Term Evolution (LTE) transceiver. The firmware interface 228 enables communication with system firmware, and can be, for example, a unified extensible firmware interface (UEFI). The network controller 234 can enable a network connection to a wired network. In some embodiments, a high-performance network controller (not shown) couples with the interface bus 210. The audio controller 246, in one embodiment, is a multi-channel high-definition audio controller. In one embodiment the system 200 includes an optional legacy I/O controller 240 for coupling legacy (e.g., Personal System 2 (PS/2)) devices to the system. The platform controller hub 230 can also connect to one or more Universal Serial Bus (USB) controllers 242 to connect input devices, such as keyboard and mouse 243 combinations, a camera 244, or other USB input devices.

In operation, the computing system 200 enables optimization of scalar register allocation using the scalar register allocator 240. In order to optimize scalar register allocation, the receiving module 242 is configured to receive, as input, an intermediate code and information about the one or more available physical registers in a memory of the processor 202. For example, the Blaize Graph Stream Processor has a total of 64 scalar registers, each 512 bits in size. Of the 64 scalar registers, the user is given access to 32 physical registers. Accordingly, the receiving module 242 is configured to receive information about the 32 available physical registers.

The virtual register allocator module 244 is coupled to the receiving module 242. The virtual register allocator module 244 is configured to allocate the one or more virtual registers based on the received information. Each of the one or more virtual registers has the same size as each of the available physical registers.

The mapping module 246 is coupled to the virtual register allocator module 244. The mapping module 246 is configured to map one or more groups of 8-bit locations of the one or more virtual registers to one or more register classes. For example, when the size of the virtual register is 64 bits, each group of 8 bits is mapped to a first register class. Similarly, each set of two groups of 8 bits (i.e., 16 bits) is mapped to a second register class, and each set of four groups of 8 bits (i.e., 32 bits) is mapped to a third register class.
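The mapping of 8-bit groups to register classes can be sketched as follows. This is a minimal illustration only, not the disclosed implementation: the function name `register_classes` and its parameters are hypothetical, the labels R0 . . . R7 are borrowed from the Example later in this description, and the assumption that each class member starts at a multiple of its own width is inferred from that Example (R0R1, R2R3, and so on).

```python
def register_classes(register_bits, group_bits=8, groups_per_class=(1, 2, 4)):
    """Enumerate register classes over the 8-bit groups of one virtual register.

    Hypothetical helper: class members are assumed to be aligned, i.e. a member
    covering `width` groups starts at a group index that is a multiple of `width`.
    """
    if register_bits % group_bits != 0:
        raise ValueError("register size must be a whole number of groups")
    n_groups = register_bits // group_bits
    classes = {}
    for width in groups_per_class:
        classes[width] = [
            "".join(f"R{g}" for g in range(start, start + width))
            for start in range(0, n_groups - width + 1, width)
        ]
    return classes


# A 64-bit virtual register has eight 8-bit groups, R0 through R7.
classes = register_classes(64)
for width, members in classes.items():
    print(f"{width * 8}-bit class:", members)
```

Under these assumptions a 64-bit virtual register yields eight members in the first class, four in the second, and two in the third.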

The scalar identification module 248 is coupled to the receiving module 242. The scalar identification module 248 is configured to identify the plurality of scalar variables from the intermediate code. In order to identify the plurality of scalar variables, the scalar identification module 248 is configured to receive an intermediate representation (IR) code of source code and analyse each basic block of the IR code to classify each instruction of the IR code as either a vector instruction or a scalar instruction. Each basic block consists of LOAD, STORE and arithmetic logical and multiply (ALM) instructions. The step of analyzing each basic block of the IR code includes analyzing each of the LOAD, STORE, and ALM instructions to classify it as either a scalar instruction or a vector instruction. For the LOAD instructions, the information contained in the instruction is analysed. If the information contained in the instruction is either an absolute or an indirect value, then the instruction is classified as a scalar instruction. For the STORE and ALM instructions, the sources of the instruction are analysed, wherein the sources of the instruction may be defined as the operands on which the operation is executed. If none of the sources is marked as vector, then the corresponding STORE/ALM instruction is classified as a scalar instruction.
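The classification rule described above can be sketched as a single pass over a basic block. This is only an illustration under stated assumptions: the `Instr` record, its field names, and the `"per_thread"` address kind are hypothetical and do not come from the disclosure, and treating every instruction that fails the scalar test as a vector instruction is an inference from the two-way classification.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    """Hypothetical IR instruction record used only for this sketch."""
    op: str                  # "LOAD", "STORE", or "ALM"
    dest: str = ""           # name of the value produced, if any
    sources: tuple = ()      # names of the operands
    address_kind: str = ""   # LOAD only: e.g. "absolute" or "indirect"


def classify_block(block):
    """Label each instruction of a basic block as "scalar" or "vector"."""
    vector_values = set()    # values produced by vector instructions so far
    kinds = []
    for ins in block:
        if ins.op == "LOAD":
            # A LOAD carrying an absolute or indirect value is scalar.
            is_scalar = ins.address_kind in ("absolute", "indirect")
        elif ins.op in ("STORE", "ALM"):
            # STORE/ALM is scalar when none of its sources is marked as vector.
            is_scalar = not any(s in vector_values for s in ins.sources)
        else:
            raise ValueError(f"unexpected opcode: {ins.op}")
        if ins.dest:
            if is_scalar:
                vector_values.discard(ins.dest)
            else:
                vector_values.add(ins.dest)
        kinds.append("scalar" if is_scalar else "vector")
    return kinds


block = [
    Instr("LOAD", dest="a", address_kind="absolute"),
    Instr("LOAD", dest="v", address_kind="per_thread"),  # assumed vector load
    Instr("ALM", dest="t", sources=("a", "a")),
    Instr("ALM", dest="u", sources=("a", "v")),
    Instr("STORE", sources=("u",)),
    Instr("STORE", sources=("t",)),
]
kinds = classify_block(block)
print(kinds)
```

The set of vector-marked values is carried forward through the block so that an ALM or STORE instruction inherits the vector marking from any of its sources.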

The register assignor module 250 is coupled to the scalar identification module 248 and the mapping module 246. The register assignor module 250 is configured to dynamically assign the one or more available physical registers to the identified scalar variables using the one or more register classes. In one embodiment, in order to assign the one or more available physical registers, the register assignor module 250 is configured to allocate the one or more register classes to the identified scalar variables. In one embodiment, the one or more register classes are allocated based on, for example, associated computer hardware constraints, the types of the scalar variables, and the number of available physical registers. After the allocation of the one or more register classes, the register assignor module 250 assigns the one or more available physical registers to each of the identified scalar variables, based on the allocated one or more register classes and the type of each scalar variable.

It will be appreciated that the system 200 shown is exemplary and not limiting, as other types of data processing systems that are differently configured may also be used. For example, an instance of the memory controller 216, platform controller hub 230, and scalar register allocator 240 may be integrated into a discrete external graphics processor, such as the external graphics processor 212. In one embodiment the platform controller hub 230 and/or memory controller 216 may be external to the one or more processor(s) 202. For example, the system 200 can include an external memory controller 216 and platform controller hub 230, which may be configured as a memory controller hub and peripheral controller hub within a system chipset that is in communication with the processor(s) 202.

FIG. 3 illustrates a flow chart of an exemplary method of optimizing scalar register allocation, in accordance with some embodiments of the present disclosure. The method 300 comprises one or more blocks implemented by the computing system 200 for enabling optimization of scalar register allocation. The method 300 may be described in the general context of computer processor executable instructions. Generally, computer processor executable instructions can include scalar instructions, vector instructions, comparison and selection-based instructions, etc.

The order in which the method 300 is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method 300. Additionally, individual blocks may be deleted from the method 300 without departing from the spirit and scope of the subject matter described herein. Furthermore, the method 300 can be implemented in any suitable hardware having parallel processing capability, software executed by a processor, firmware, or a combination thereof.

At block 302, an intermediate code and information about one or more available physical registers are received by the receiving module 242. The receiving module 242 is configured to receive, as input, an intermediate code and information about the one or more available physical registers in a memory of the processor. For example, the Blaize Graph Stream Processor has a total of 64 scalar registers, each 512 bits in size. Of the 64 scalar registers, the user is given access to 32 physical registers. Accordingly, the receiving module 242 is configured to receive information about the 32 available physical registers.

At block 304, one or more virtual registers are allocated by the virtual register allocator module 244 based on the received information. The virtual register allocator module 244 is configured to allocate the one or more virtual registers based on the received information.

At block 306, one or more groups of 8-bit locations of the one or more virtual registers are mapped to one or more register classes by the mapping module 246. The mapping module 246 is configured to map the one or more groups of 8-bit locations of the one or more virtual registers to the one or more register classes. For example, when the size of the virtual register is 64 bits, each group of 8 bits is mapped to a first register class. Similarly, each set of two groups of 8 bits (i.e., 16 bits) is mapped to a second register class, and each set of four groups of 8 bits (i.e., 32 bits) is mapped to a third register class.

At block 308, a plurality of scalar variables from the intermediate code is identified by the scalar identification module 248. The scalar identification module 248 is configured to identify the plurality of scalar variables from the intermediate code. In order to identify the plurality of scalar variables, the scalar identification module 248 is configured to receive an intermediate representation (IR) code of source code and analyse each basic block of the IR code to classify each instruction of the IR code as either a vector instruction or a scalar instruction. Each basic block consists of LOAD, STORE and arithmetic logical and multiply (ALM) instructions. The step of analyzing each basic block of the IR code includes analyzing each of the LOAD, STORE, and ALM instructions to classify it as either a scalar instruction or a vector instruction. For the LOAD instructions, the information contained in the instruction is analysed. If the information contained in the instruction is either an absolute or an indirect value, then the instruction is classified as a scalar instruction. For the STORE and ALM instructions, the sources of the instruction are analysed, wherein the sources of the instruction may be defined as the operands on which the operation is executed. If none of the sources is marked as vector, then the corresponding STORE/ALM instruction is classified as a scalar instruction.

At block 310, the one or more available physical registers are assigned dynamically to the identified scalar variables by the register assignor module 250. The register assignor module 250 is configured to dynamically assign the one or more available physical registers to the identified scalar variables using the one or more register classes. In one embodiment, in order to assign the one or more available physical registers, the register assignor module 250 is configured to allocate the one or more register classes to the identified scalar variables. In one embodiment, the one or more register classes are allocated based on, for example, associated computer hardware constraints, the types of the scalar variables, and the number of available physical registers. After the allocation of the one or more register classes, the register assignor module 250 assigns the one or more available physical registers to each of the identified scalar variables, based on the allocated one or more register classes and the type of each scalar variable. In an exemplary embodiment, in order to allocate the one or more register classes to the identified scalar variables, the register assignor module 250 is configured to greedily allocate the first available contiguous one or more groups of 8-bit locations based on a data type associated with each identified scalar variable.
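The greedy allocation of the first available contiguous groups can be sketched as below. This is a minimal sketch for a single virtual register, not the disclosed implementation. The names `GROUPS_BY_TYPE` and `allocate` are hypothetical; the widths of one group for `char` and two groups for `int` follow the Example in this description, and the restriction to aligned start positions is an assumption inferred from the register classes listed there.

```python
# Assumed widths in 8-bit groups, following the Example in this description
# (char uses the first register class, int the second).
GROUPS_BY_TYPE = {"char": 1, "int": 2}


def allocate(variables, register_bits=64, group_bits=8):
    """Greedy first-fit assignment of scalar variables to 8-bit groups.

    `variables` is a list of (name, type) pairs, handled in the given order.
    Each variable takes the first free, aligned run of groups that matches
    the width of its data type.
    """
    n_groups = register_bits // group_bits
    free = [True] * n_groups
    assignment = {}
    for name, ctype in variables:
        width = GROUPS_BY_TYPE[ctype]
        for start in range(0, n_groups - width + 1, width):
            if all(free[start:start + width]):
                for g in range(start, start + width):
                    free[g] = False
                assignment[name] = "".join(
                    f"R{g}" for g in range(start, start + width)
                )
                break
        else:
            # A fuller allocator would move on to the next virtual register.
            raise RuntimeError(f"no free {width * group_bits}-bit slot for {name}")
    return assignment


result = allocate([("a", "char"), ("b", "int"), ("c", "char")])
print(result)
```

With the three variables of the Example (char a, int b, char c) processed in that order, this sketch places a in R0, b in R2R3, and c in R1, so that all three share one 64-bit register.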

Example

In one example, if the scalar identification module 248 identifies 3 scalar variables in the intermediate code, then the scalar register allocator 240 performs optimized allocation of physical registers as explained in the following paragraphs.

The receiving module 242 is configured to receive an intermediate code and information about one or more available physical registers in a memory of the processor. For example, if the processor 202 has 16 available physical registers, each 64 bits in size, the receiving module 242 is configured to receive the intermediate code and information about the 16 available 64-bit physical registers.

The virtual register allocator module 244 is configured to allocate the one or more virtual registers of 64-bit size.

The mapping module 246 is configured to map the one or more groups of 8-bit locations of the one or more virtual registers to one or more register classes R0, R1, . . . R7 as shown in FIG. 4A. The mapping module 246 is configured to map each group of 8 bits to the first register class (i.e., R0, R1, R2, . . . R7). The mapping module 246 is configured to map each set of two groups of 8 bits to the second register class (i.e., R0R1, R2R3, R4R5, and R6R7). Further, the mapping module 246 is configured to map each set of four groups of 8 bits to the third register class (i.e., R0R1R2R3 and R4R5R6R7).

The register assignor module 250 is configured to dynamically assign a first scalar variable (char a) to the first register class (i.e., R0). Further, the register assignor module 250 is configured to dynamically assign a second scalar variable (int b) to the second register class (i.e., R2R3). Finally, the register assignor module 250 is configured to dynamically assign a third scalar variable (char c) to the first register class (i.e., R1) as shown in FIG. 4B.

Thus, by using the optimized scalar register allocation, the performance of the compiler is enhanced by allocating multiple scalar variables in a single physical register. Further, by implementing the optimized scalar register allocation, the number of threads that can run in parallel is increased, thereby increasing the execution speed of the program.
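The saving in physical registers can be illustrated with simple arithmetic. The helper below is hypothetical and deliberately simplified: it compares one whole register per scalar variable against an idealized lower bound for packed allocation, and it ignores any alignment padding that a real allocator might introduce.

```python
import math


def registers_needed(sizes_in_groups, groups_per_register=8):
    """Return (whole, packed_lower_bound) register counts.

    whole: one complete register per scalar variable.
    packed_lower_bound: fewest registers that could hold all variables if
    their 8-bit groups were packed with no wasted space (an idealized bound).
    """
    whole = len(sizes_in_groups)
    packed_lower_bound = math.ceil(sum(sizes_in_groups) / groups_per_register)
    return whole, packed_lower_bound


# The Example's variables: char a (1 group), int b (2 groups), char c (1 group).
whole, packed = registers_needed([1, 2, 1])
print(whole, packed)
```

For the three variables of the Example, whole-register allocation uses three 64-bit registers, whereas the packed layout fits in one.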

The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for the convenience of the description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope and spirit of the disclosed embodiments. Also, the words “comprising,” “having,” “containing,” and “including,” and other similar forms are intended to be equivalent in meaning and be open ended in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items or meant to be limited to only the listed item or items. It must also be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise.

Furthermore, one or more computer-readable storage media may be utilized in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory on which information or data readable by a processor may be stored. Thus, a computer-readable storage medium may store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term “computer-readable medium” should be understood to include tangible items and to exclude carrier waves and transient signals, i.e., to be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, non-volatile memory, hard drives, CD ROMs, DVDs, flash drives, disks, and any other known physical storage media.

Finally, the language used in the specification has been principally selected for readability and instructional purposes, and it may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, the disclosure of the embodiments of the disclosure is intended to be illustrative, but not limiting, of the scope of the disclosure.

With respect to the use of substantially any plural and/or singular terms herein, those having skill in the art can translate from the plural to the singular and/or from the singular to the plural as is appropriate to the context and/or application. The various singular/plural permutations may be expressly set forth herein for sake of clarity.

Claims

1. A method of optimizing scalar register allocation by a processor, comprising:

receiving an intermediate code and information about one or more available physical registers in a memory of the processor, as input;

allocating one or more virtual registers based on the received information, wherein each virtual register has the size of each of the one or more available physical registers;

mapping one or more groups of 8-bit locations of the one or more virtual registers to one or more register classes;

identifying a plurality of scalar variables from the input intermediate code; and

dynamically assigning the one or more available physical registers to the identified plurality of scalar variables using the one or more register classes.

2. The method as claimed in claim 1, wherein each of the one or more register classes includes different data types, each data type having a different size.

3. The method as claimed in claim 1, wherein mapping the one or more groups of 8-bit locations of the one or more virtual registers to the one or more register classes is based on a size of the available physical registers.

4. The method as claimed in claim 1, wherein assigning the one or more available physical registers comprises:

allocating the one or more register classes to the identified scalar variables, wherein the one or more register classes are allocated based on associated computer hardware, type of scalar variables, and number of the available physical registers; and

assigning the one or more available physical registers to each of the identified scalar variables, based on the allocated one or more register classes and type of scalar variable.

5. The method as claimed in claim 4, wherein allocating the one or more register classes to the identified scalar variables comprises greedily allocating the first available one or more continuous groups of 8-bit locations based on a data type associated with each identified scalar variable, wherein the data type is one of byte, short, int, long, char, float, and double.

6. A system to optimize scalar register allocation, the system comprising:

a memory; and

a processor, coupled to the memory, and configured to:

receive an intermediate code and information about one or more available physical registers in the memory of the processor, as input;

allocate one or more virtual registers based on the received information, wherein each virtual register has the size of each available physical register;

map one or more groups of 8-bit locations of the one or more virtual registers to one or more register classes;

identify a plurality of scalar variables from the input intermediate code; and

dynamically assign the one or more available physical registers to the identified scalar variables using the one or more register classes.

7. The system as claimed in claim 6, wherein the processor is configured to map the one or more groups of 8-bit locations to the one or more register classes that include different data types, wherein each data type has a different size.

8. The system as claimed in claim 6, wherein the processor is configured to map the one or more groups of 8-bit locations of the one or more virtual registers to the one or more register classes based on a size of the available physical registers.

9. The system as claimed in claim 6, wherein to assign the one or more available physical registers, the processor is configured to:

allocate the one or more register classes to the identified scalar variables, wherein the one or more register classes are allocated based on associated computer hardware, type of scalar variables, and number of the available physical registers; and

assign the one or more available physical registers to each of the identified scalar variables, based on the allocated one or more register classes and type of scalar variable.

10. The system as claimed in claim 9, wherein to allocate the one or more register classes to the identified scalar variables, the processor is configured to greedily allocate the first available one or more continuous groups of 8-bit locations based on a data type associated with each identified scalar variable, wherein the data type is one of byte, short, int, long, char, float, and double.

11. A non-transitory computer readable medium comprising instructions that, when executed, cause one or more processors to:

receive an intermediate code and information about one or more available physical registers in a memory of the processor, as input;

allocate one or more virtual registers based on the received information, wherein each virtual register has the size of each of the one or more available physical registers;

map one or more groups of 8-bit locations of the one or more virtual registers to one or more register classes;

identify a plurality of scalar variables from the input intermediate code; and

dynamically assign the one or more available physical registers to the identified plurality of scalar variables using the one or more register classes.

12. The non-transitory computer readable medium as claimed in claim 11, wherein the one or more processors are configured to map the one or more groups of 8-bit locations to the one or more register classes that include different data types, wherein each data type has a different size.

13. The non-transitory computer readable medium as claimed in claim 11, wherein the one or more processors are configured to map the one or more groups of 8-bit locations of the one or more virtual registers to the one or more register classes based on a size of the available physical registers.

14. The non-transitory computer readable medium as claimed in claim 11, wherein the one or more processors assign the one or more available physical registers by performing steps comprising:

allocating the one or more register classes to the identified scalar variables, wherein the one or more register classes are allocated based on associated computer hardware, type of scalar variables, and number of the available physical registers; and

assigning the one or more available physical registers to each of the identified scalar variables, based on the allocated one or more register classes and type of scalar variable.

15. The non-transitory computer readable medium as claimed in claim 13, wherein the one or more processors are configured to allocate the one or more register classes to the identified scalar variables by greedily allocating the first available one or more continuous groups of 8-bit locations based on a data type associated with each identified scalar variable, wherein the data type is one of byte, short, int, long, char, float, and double.