Patents by Inventor Zehra N. Sura
Zehra N. Sura has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20160283209Abstract: In one embodiment, a computer-implemented method includes receiving source code to be compiled into an executable file for an unaligned instruction set architecture (ISA). Aligned assembled code is generated, by a computer processor. The aligned assembled code complies with an aligned ISA and includes aligned processor code for a processor and aligned accelerator code for an accelerator. A first linking pass is performed on the aligned assembled code, including relocating a first relocation target in the aligned accelerator code that refers to a first object outside the aligned accelerator code. Unaligned assembled code is generated in accordance with the unaligned ISA and includes unaligned accelerator code for the accelerator and unaligned processor code for the processor. A second linking pass is performed on the unaligned assembled code, including relocating a second relocation target outside the unaligned accelerator code that refers to an object in the unaligned accelerator code.Type: ApplicationFiled: March 25, 2015Publication date: September 29, 2016Inventors: Carlo Bertolli, John K. O'Brien, Olivier H. Sallenave, Zehra N. Sura
-
Publication number: 20160283158Abstract: An aspect includes a table of contents (TOC) that was generated by a compiler being received at an accelerator device. The TOC includes an address of global data in a host memory space. The global data is copied from the address in the host memory space to an address in the device memory space. The address in the host memory space is obtained from the received TOC. The received TOC is updated to indicate that global data is stored at the address in the device memory space. A kernel that accesses the global data from the address in the device memory space is executed. The address in the device memory space is obtained based on contents of the updated TOC. When the executing is completed, the global data from the address in the device memory space is copied to the address in the host memory space.Type: ApplicationFiled: March 25, 2015Publication date: September 29, 2016Inventors: Carlo Bertolli, John K. O'Brien, Olivier H. Sallenave, Zehra N. Sura
-
Publication number: 20160283208Abstract: Embodiments relate to program structure-based blocking. An aspect includes receiving source code corresponding to a computer program by a compiler of a computer system. Another aspect includes determining a prefetching section in the source code by a marking module of the compiler. Yet another aspect includes performing, by a blocking module of the compiler, blocking of instructions located in the prefetching section into instruction blocks, such that the instruction blocks of the prefetching section only contain instructions that are located in the prefetching section.Type: ApplicationFiled: June 17, 2015Publication date: September 29, 2016Inventors: Carlo Bertolli, Alexandre E. Eichenberger, John K. O'Brien, Zehra N. Sura
-
Publication number: 20160283210Abstract: Embodiments relate to program structure-based blocking. An aspect includes receiving source code corresponding to a computer program by a compiler of a computer system. Another aspect includes determining a prefetching section in the source code by a marking module of the compiler. Yet another aspect includes performing, by a blocking module of the compiler, blocking of instructions located in the prefetching section into instruction blocks, such that the instruction blocks of the prefetching section only contain instructions that are located in the prefetching section.Type: ApplicationFiled: March 25, 2015Publication date: September 29, 2016Inventors: Carlo Bertolli, Alexandre E. Eichenberger, John K. O'Brien, Zehra N. Sura
-
Publication number: 20160283248Abstract: In one embodiment, a computer-implemented method includes tracking a size of a load-store queue (LSQ) during compile time of a program. The size of the LSQ is time-varying and indicates how many memory access instructions of the program are on the LSQ. The method further includes scheduling, by a computer processor, a plurality of memory access instructions of the program based on the size of the LSQ.Type: ApplicationFiled: March 26, 2015Publication date: September 29, 2016Inventors: Tong Chen, Alexandre E. Eichenberger, Arpith C. Jacob, Zehra N. Sura
-
Publication number: 20160283144Abstract: An aspect includes a table of contents (TOC) that was generated by a compiler being received at an accelerator device. The TOC includes an address of global data in a host memory space. The global data is copied from the address in the host memory space to an address in the device memory space. The address in the host memory space is obtained from the received TOC. The received TOC is updated to indicate that global data is stored at the address in the device memory space. A kernel that accesses the global data from the address in the device memory space is executed. The address in the device memory space is obtained based on contents of the updated TOC. When the executing is completed, the global data from the address in the device memory space is copied to the address in the host memory space.Type: ApplicationFiled: June 22, 2015Publication date: September 29, 2016Inventors: Carlo Bertolli, John K. O'Brien, Olivier H. Sallenave, Zehra N. Sura
-
Publication number: 20160283212Abstract: In one embodiment, a computer-implemented method includes tracking a size of a load-store queue (LSQ) during compile time of a program. The size of the LSQ is time-varying and indicates how many memory access instructions of the program are on the LSQ. The method further includes scheduling, by a computer processor, a plurality of memory access instructions of the program based on the size of the LSQ.Type: ApplicationFiled: June 19, 2015Publication date: September 29, 2016Inventors: Tong Chen, Alexandre E. Eichenberger, Arpith C. Jacob, Zehra N. Sura
-
Patent number: 9229715Abstract: A monitor bit per hardware thread in a memory location may be allocated, in a multiprocessing computer system having a plurality of hardware threads, the plurality of hardware threads sharing the memory location, and each of the allocated monitor bit corresponding to one of the plurality of hardware threads. A condition bit may be allocated for each of the plurality of hardware threads, the condition bit being allocated in each context of the plurality of hardware threads. In response to detecting the memory location being accessed, it is determined whether a monitor bit corresponding to a hardware thread in the memory location is set. In response to determining that the monitor bit corresponding to a hardware thread is set in the memory location, a condition bit corresponding to a thread accessing the memory location is set in the hardware thread's context.Type: GrantFiled: May 21, 2013Date of Patent: January 5, 2016Assignee: International Business Machines CorporationInventors: Michael K. Gschwind, John K. O'Brien, Valentina Salapura, Zehra N. Sura
-
Patent number: 9183063Abstract: An active memory system includes a computer and an active memory device including layers of memory forming a three-dimensional memory device and individual columns of chips forming vaults in communication with a processing element and logic. The processing element is configured to communicate to the chips and other processing elements. The active memory system also includes a compiler configured to implement a method. The method includes dividing a power budget for the active memory device into a discrete number of power tokens, each of the power tokens having an equal value of units of power. The method also includes determining a power requirement for executing a code segment on the processing element of the active memory device based on characteristics of the processing element and the code segment. The method further includes allocating, to the processing element at runtime, one or more power tokens to satisfy the power requirement.Type: GrantFiled: November 19, 2012Date of Patent: November 10, 2015Assignee: International Business Machines CorporationInventors: Hans M. Jacobson, Ravi Nair, John K. P. O'Brien, Zehra N. Sura
-
Publication number: 20150269073Abstract: According to one embodiment, a method of creating compiler-generated memory mapping hints in a computer system includes analyzing code, by a compiler of the computer system, to identify data access patterns in the code. System configuration information defining data processing system characteristics of a target system for the code is accessed. The data processing system characteristics include a plurality of processing resources and memory domain characteristics relative to the processing resources. A preferred allocation of data in memory domains of the target system is determined based on mapping the code to one or more selected processing resources and mapping the data to one or more of the memory domains based on the memory domain characteristics relative to the one or more selected processing resources. The preferred allocation is stored as compiler-generated memory mapping hints in a format accessible by a physical memory mapping resource of the target system.Type: ApplicationFiled: March 19, 2014Publication date: September 24, 2015Applicant: International Business Machines CorporationInventors: Kathryn M. O'Brien, John K. O'Brien, Zehra N. Sura
-
Patent number: 9110734Abstract: A heterogeneous processing system includes a compiler for performing power-constrained code generation and scheduling of work in the heterogeneous processing system. The compiler produces source code that is executable by a computer. The compiler performs a method. The method includes dividing a power budget for the heterogeneous processing system into a discrete number of power tokens. Each of the power tokens has an equal value of units of power. The method also includes determining a power requirement for executing a code segment on a processing element of the heterogeneous processing system. The determining is based on characteristics of the processing element and the code segment. The method further includes allocating, to the processing element at runtime, at least one of the power tokens to satisfy the power requirement.Type: GrantFiled: November 12, 2012Date of Patent: August 18, 2015Assignee: International Business Machines CorporationInventors: Hans M. Jacobson, Ravi Nair, John K. P. O'Brien, Zehra N. Sura
-
Patent number: 8997071Abstract: A compiler implemented by a computer performs optimized division of work across heterogeneous processors. The compiler divides source code into code sections and characterizes each of the code sections based on pre-defined criteria. Each of the code sections is characterized as at least one of: allocate to a main processor, allocate to a processing element, allocate to one of a parameterized main processor and a parameterized processing element, and indeterminate. The compiler analyzes side-effects and costs of executing the code sections on allocated processors, and transforms the code sections based on results of the analyzing. The transforming includes re-characterizing the code sections for alternate execution in a runtime environment.Type: GrantFiled: September 10, 2012Date of Patent: March 31, 2015Assignee: International Business Machines CorporationInventors: Tong Chen, John K. P. O'Brien, Zehra N. Sura
-
Patent number: 8762968Abstract: Prefetching irregular memory references into a software controlled cache is provided. A compiler analyzes source code to identify at least one of a plurality of loops that contain an irregular memory reference. The compiler determines if the irregular memory reference within the at least one loop is a candidate for optimization. Responsive to an indication that the irregular memory reference may be optimized, the compiler determines if the irregular memory reference is valid for prefetching. Responsive to an indication that the irregular memory reference is valid for prefetching, a store statement for an address of the irregular memory reference is inserted into the at least one loop. A runtime library call is inserted into a prefetch runtime library for the irregular memory reference. Data associated with the irregular memory reference is prefetched into the software controlled cache when the runtime library call is invoked.Type: GrantFiled: June 27, 2012Date of Patent: June 24, 2014Assignee: International Business Machines CorporationInventors: Tong Chen, Marc Gonzalez tallada, Zehra N. Sura, Tao Zhang
-
Publication number: 20140136858Abstract: An active memory system includes a computer and an active memory device including layers of memory forming a three-dimensional memory device and individual columns of chips forming vaults in communication with a processing element and logic. The processing element is configured to communicate to the chips and other processing elements. The active memory system also includes a compiler configured to implement a method. The method includes dividing a power budget for the active memory device into a discrete number of power tokens, each of the power tokens having an equal value of units of power. The method also includes determining a power requirement for executing a code segment on the processing element of the active memory device based on characteristics of the processing element and the code segment. The method further includes allocating, to the processing element at runtime, one or more power tokens to satisfy the power requirement.Type: ApplicationFiled: November 19, 2012Publication date: May 15, 2014Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Hans M. Jacobson, Ravi Nair, John K.P. O'Brien, Zehra N. Sura
-
Publication number: 20140136857Abstract: A heterogeneous processing system includes a compiler for performing power-constrained code generation and scheduling of work in the heterogeneous processing system. The compiler produces source code that is executable by a computer. The compiler performs a method. The method includes dividing a power budget for the heterogeneous processing system into a discrete number of power tokens. Each of the power tokens has an equal value of units of power. The method also includes determining a power requirement for executing a code segment on a processing element of the heterogeneous processing system. The determining is based on characteristics of the processing element and the code segment. The method further includes allocating, to the processing element at runtime, at least one of the power tokens to satisfy the power requirement.Type: ApplicationFiled: November 12, 2012Publication date: May 15, 2014Applicant: International Business Machines CorporationInventors: Hans M. Jacobson, Ravi Nair, John K.P. O'Brien, Zehra N. Sura
-
Publication number: 20140068581Abstract: A compiler implemented by a computer performs optimized division of work across heterogeneous processors. The compiler divides source code into code sections and characterizes each of the code sections based on pre-defined criteria. Each of the code sections is characterized as at least one of: allocate to a main processor, allocate to a processing element, allocate to one of a parameterized main processor and a parameterized processing element, and indeterminate. The compiler analyzes side-effects and costs of executing the code sections on allocated processors, and transforms the code sections based on results of the analyzing. The transforming includes re-characterizing the code sections for alternate execution in a runtime environment.Type: ApplicationFiled: August 30, 2012Publication date: March 6, 2014Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Tong Chen, John K.P. O'Brien, Zehra N. Sura
-
Publication number: 20140068582Abstract: A compiler implemented by a computer performs optimized division of work across heterogeneous processors. The compiler divides source code into code sections and characterizes each of the code sections based on pre-defined criteria. Each of the code sections is characterized as at least one of: allocate to a main processor, allocate to a processing element, allocate to one of a parameterized main processor and a parameterized processing element, and indeterminate. The compiler analyzes side-effects and costs of executing the code sections on allocated processors, and transforms the code sections based on results of the analyzing. The transforming includes re-characterizing the code sections for alternate execution in a runtime environment.Type: ApplicationFiled: September 10, 2012Publication date: March 6, 2014Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Tong Chen, John K.P. O'Brien, Zehra N. Sura
-
Patent number: 8561043Abstract: Mechanisms are provided for optimizing irregular memory references in computer code. These mechanisms may parse the computer code to identify memory references in the computer code. These mechanisms may further classify the memory references in the computer code as either regular memory references or irregular memory references. Moreover, the mechanisms may transform the computer code, by a compiler, to generate transformed computer code in which irregular memory references access a storage of a software cache of a data processing system through a transactional cache mechanism of the software cache.Type: GrantFiled: March 28, 2008Date of Patent: October 15, 2013Assignees: International Business Machines Corporation, Barcelona Supercomputing CenterInventors: Eduard Ayguade, Tong Chen, Alexandre E. Eichenberger, Marc Gonzalez Tallada, Xavier Martorell, John K. O'Brien, Kathryn M. O'Brien, Zehra N. Sura, Tao Zhang
-
Patent number: 8561044Abstract: Mechanisms for optimized code generation targeting a high locality software cache are provided. Original computer code is parsed to identify memory references in the original computer code. Memory references are classified as either regular memory references or irregular memory references. Regular memory references are controlled by a high locality cache mechanism. Original computer code is transformed, by a compiler, to generate transformed computer code in which the regular memory references are grouped into one or more memory reference streams, each memory reference stream having a leading memory reference, a trailing memory reference, and one or more middle memory references.Type: GrantFiled: October 7, 2008Date of Patent: October 15, 2013Assignee: International Business Machines CorporationInventors: Tong Chen, Alexandre E. Eichenberger, Marc Gonzalez Tallada, John K. O'Brien, Kathryn M. O'Brien, Zehra N. Sura, Tao Zhang
-
Publication number: 20130263145Abstract: A monitor bit per hardware thread in a memory location may be allocated, in a multiprocessing computer system having a plurality of hardware threads, the plurality of hardware threads sharing the memory location, and each of the allocated monitor bit corresponding to one of the plurality of hardware threads. A condition bit may be allocated for each of the plurality of hardware threads, the condition bit being allocated in each context of the plurality of hardware threads. In response to detecting the memory location being accessed, it is determined whether a monitor bit corresponding to a hardware thread in the memory location is set. In response to determining that the monitor bit corresponding to a hardware thread is set in the memory location, a condition bit corresponding to a thread accessing the memory location is set in the hardware thread's context.Type: ApplicationFiled: May 21, 2013Publication date: October 3, 2013Applicant: International Business Machines CorporationInventors: Michael K. Gschwind, John K. O'Brien, Valentina Salapura, Zehra N. Sura