Patents by Inventor Zehra N. Sura

Zehra N. Sura has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

UNALIGNED INSTRUCTION RELOCATION

Publication number: 20160283209

Abstract: In one embodiment, a computer-implemented method includes receiving source code to be compiled into an executable file for an unaligned instruction set architecture (ISA). Aligned assembled code is generated, by a computer processor. The aligned assembled code complies with an aligned ISA and includes aligned processor code for a processor and aligned accelerator code for an accelerator. A first linking pass is performed on the aligned assembled code, including relocating a first relocation target in the aligned accelerator code that refers to a first object outside the aligned accelerator code. Unaligned assembled code is generated in accordance with the unaligned ISA and includes unaligned accelerator code for the accelerator and unaligned processor code for the processor. A second linking pass is performed on the unaligned assembled code, including relocating a second relocation target outside the unaligned accelerator code that refers to an object in the unaligned accelerator code.

Type: Application

Filed: March 25, 2015

Publication date: September 29, 2016

Inventors: Carlo Bertolli, John K. O'Brien, Olivier H. Sallenave, Zehra N. Sura
ACCESSING GLOBAL DATA FROM ACCELERATOR DEVICES

Publication number: 20160283158

Abstract: An aspect includes a table of contents (TOC) that was generated by a compiler being received at an accelerator device. The TOC includes an address of global data in a host memory space. The global data is copied from the address in the host memory space to an address in the device memory space. The address in the host memory space is obtained from the received TOC. The received TOC is updated to indicate that global data is stored at the address in the device memory space. A kernel that accesses the global data from the address in the device memory space is executed. The address in the device memory space is obtained based on contents of the updated TOC. When the executing is completed, the global data from the address in the device memory space is copied to the address in the host memory space.
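
The copy-in, TOC-update, execute, copy-out sequence described in this abstract can be sketched roughly as follows. This is only an illustration, not the patented implementation: the function name `run_kernel_with_globals`, the use of dictionaries to stand in for the host and device memory spaces, the one-cell-per-global layout, and the `device_base` address are all assumptions made for the example.

```python
def run_kernel_with_globals(toc, host_mem, device_mem, kernel, device_base=0x1000):
    """Run `kernel` on the device with global data mirrored from the host.

    toc        : dict mapping a global's name to its address (hypothetical TOC model)
    host_mem   : dict modelling the host memory space (address -> value)
    device_mem : dict modelling the device memory space (address -> value)
    kernel     : callable taking (toc, device_mem)
    """
    # Remember the host addresses from the received TOC for the copy-back step.
    host_addrs = dict(toc)

    # Copy each global from host to device and update the TOC to point
    # at the device-side address.
    next_addr = device_base
    for symbol, host_addr in host_addrs.items():
        device_mem[next_addr] = host_mem[host_addr]
        toc[symbol] = next_addr
        next_addr += 1

    # The kernel looks up device addresses through the updated TOC.
    kernel(toc, device_mem)

    # When execution completes, copy the global data back to the host.
    for symbol, host_addr in host_addrs.items():
        host_mem[host_addr] = device_mem[toc[symbol]]


def increment_counter(toc, mem):
    # Example kernel: increments a global named "counter" (hypothetical name).
    mem[toc["counter"]] += 1


host_mem = {0x10: 5}
device_mem = {}
toc = {"counter": 0x10}
run_kernel_with_globals(toc, host_mem, device_mem, increment_counter)
```

In this sketch the kernel never sees a host address; it reaches the global only through the updated TOC entry.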

Type: Application

Filed: March 25, 2015

Publication date: September 29, 2016

Inventors: Carlo Bertolli, John K. O'Brien, Olivier H. Sallenave, Zehra N. Sura
PROGRAM STRUCTURE-BASED BLOCKING

Publication number: 20160283208

Abstract: Embodiments relate to program structure-based blocking. An aspect includes receiving source code corresponding to a computer program by a compiler of a computer system. Another aspect includes determining a prefetching section in the source code by a marking module of the compiler. Yet another aspect includes performing, by a blocking module of the compiler, blocking of instructions located in the prefetching section into instruction blocks, such that the instruction blocks of the prefetching section only contain instructions that are located in the prefetching section.
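
One way to picture the blocking constraint in this abstract is a pass that groups instructions into fixed-size blocks but never lets a block straddle the boundary of a prefetching section. The sketch below assumes a flat instruction list in which each instruction is already marked as inside or outside a prefetching section; the function name `block_instructions`, the `block_size` limit, and the tuple representation are hypothetical and do not come from the application.

```python
def block_instructions(instrs, block_size):
    """Group instructions into blocks without mixing section membership.

    instrs     : list of (name, in_prefetch_section) tuples
    block_size : maximum number of instructions per block (assumed limit)
    Returns a list of (in_prefetch_section, [names]) blocks.
    """
    blocks = []
    current = []
    current_flag = None
    for name, in_section in instrs:
        # Close the current block when the section membership changes
        # or when the block has reached its size limit.
        if current and (in_section != current_flag or len(current) == block_size):
            blocks.append((current_flag, current))
            current = []
        current_flag = in_section
        current.append(name)
    if current:
        blocks.append((current_flag, current))
    return blocks


example = [("a", False), ("b", True), ("c", True), ("d", True), ("e", False)]
result = block_instructions(example, block_size=2)
```

Here every block tagged as belonging to the prefetching section contains only instructions from that section, which is the property the abstract calls out.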

Type: Application

Filed: June 17, 2015

Publication date: September 29, 2016

Inventors: Carlo Bertolli, Alexandre E. Eichenberger, John K. O'Brien, Zehra N. Sura
PROGRAM STRUCTURE-BASED BLOCKING

Publication number: 20160283210

Abstract: Embodiments relate to program structure-based blocking. An aspect includes receiving source code corresponding to a computer program by a compiler of a computer system. Another aspect includes determining a prefetching section in the source code by a marking module of the compiler. Yet another aspect includes performing, by a blocking module of the compiler, blocking of instructions located in the prefetching section into instruction blocks, such that the instruction blocks of the prefetching section only contain instructions that are located in the prefetching section.

Type: Application

Filed: March 25, 2015

Publication date: September 29, 2016

Inventors: Carlo Bertolli, Alexandre E. Eichenberger, John K. O'Brien, Zehra N. Sura
SCHEDULERS WITH LOAD-STORE QUEUE AWARENESS

Publication number: 20160283248

Abstract: In one embodiment, a computer-implemented method includes tracking a size of a load-store queue (LSQ) during compile time of a program. The size of the LSQ is time-varying and indicates how many memory access instructions of the program are on the LSQ. The method further includes scheduling, by a computer processor, a plurality of memory access instructions of the program based on the size of the LSQ.
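
A minimal sketch of the idea, under simplifying assumptions: the scheduler issues one instruction per cycle, instructions have no data dependences, and every memory access occupies a modelled LSQ slot for a fixed `mem_latency` cycles. The function name `schedule` and these parameters are illustrative and are not taken from the application.

```python
from collections import deque


def schedule(instrs, lsq_capacity, mem_latency):
    """Order instructions while tracking a modelled load-store queue.

    instrs       : list of (name, is_memory_access) tuples
    lsq_capacity : number of LSQ slots (must be at least 1)
    mem_latency  : cycles a memory access stays on the LSQ (assumed fixed)
    Returns the issue order; None marks a stall cycle.
    """
    if lsq_capacity < 1 or mem_latency < 1:
        raise ValueError("lsq_capacity and mem_latency must be positive")

    pending = list(instrs)
    in_flight = deque()  # retirement cycles of memory ops on the modelled LSQ
    order = []
    cycle = 0
    while pending:
        # Retire memory accesses that have completed by this cycle.
        while in_flight and in_flight[0] <= cycle:
            in_flight.popleft()
        lsq_size = len(in_flight)  # time-varying LSQ size seen by the scheduler

        # Pick the first instruction that will not overflow the LSQ.
        pick = None
        for i, (_, is_mem) in enumerate(pending):
            if not is_mem or lsq_size < lsq_capacity:
                pick = i
                break

        if pick is None:
            order.append(None)  # only memory ops remain and the LSQ is full
        else:
            name, is_mem = pending.pop(pick)
            if is_mem:
                in_flight.append(cycle + mem_latency)
            order.append(name)
        cycle += 1
    return order


program = [("ld1", True), ("ld2", True), ("ld3", True), ("add", False), ("st1", True)]
issued = schedule(program, lsq_capacity=2, mem_latency=3)
```

With a two-entry LSQ, the third load is deferred and the independent `add` is hoisted into the slot that would otherwise stall.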

Type: Application

Filed: March 26, 2015

Publication date: September 29, 2016

Inventors: Tong Chen, Alexandre E. Eichenberger, Arpith C. Jacob, Zehra N. Sura
ACCESSING GLOBAL DATA FROM ACCELERATOR DEVICES

Publication number: 20160283144

Abstract: An aspect includes a table of contents (TOC) that was generated by a compiler being received at an accelerator device. The TOC includes an address of global data in a host memory space. The global data is copied from the address in the host memory space to an address in the device memory space. The address in the host memory space is obtained from the received TOC. The received TOC is updated to indicate that global data is stored at the address in the device memory space. A kernel that accesses the global data from the address in the device memory space is executed. The address in the device memory space is obtained based on contents of the updated TOC. When the executing is completed, the global data from the address in the device memory space is copied to the address in the host memory space.

Type: Application

Filed: June 22, 2015

Publication date: September 29, 2016

Inventors: Carlo Bertolli, John K. O'Brien, Olivier H. Sallenave, Zehra N. Sura
SCHEDULERS WITH LOAD-STORE QUEUE AWARENESS

Publication number: 20160283212

Abstract: In one embodiment, a computer-implemented method includes tracking a size of a load-store queue (LSQ) during compile time of a program. The size of the LSQ is time-varying and indicates how many memory access instructions of the program are on the LSQ. The method further includes scheduling, by a computer processor, a plurality of memory access instructions of the program based on the size of the LSQ.

Type: Application

Filed: June 19, 2015

Publication date: September 29, 2016

Inventors: Tong Chen, Alexandre E. Eichenberger, Arpith C. Jacob, Zehra N. Sura
Method and apparatus for efficient inter-thread synchronization for helper threads

Patent number: 9229715

Abstract: A monitor bit per hardware thread in a memory location may be allocated, in a multiprocessing computer system having a plurality of hardware threads, the plurality of hardware threads sharing the memory location, and each of the allocated monitor bit corresponding to one of the plurality of hardware threads. A condition bit may be allocated for each of the plurality of hardware threads, the condition bit being allocated in each context of the plurality of hardware threads. In response to detecting the memory location being accessed, it is determined whether a monitor bit corresponding to a hardware thread in the memory location is set. In response to determining that the monitor bit corresponding to a hardware thread is set in the memory location, a condition bit corresponding to a thread accessing the memory location is set in the hardware thread's context.
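
The monitor-bit and condition-bit mechanism can be modelled in software as below. This is one possible reading of the abstract, not the hardware design itself: it assumes a single condition bit per thread context, and the class and function names (`MonitoredLocation`, `ThreadContext`, `access`) are hypothetical.

```python
class MonitoredLocation:
    """A shared memory location with one monitor bit per hardware thread."""

    def __init__(self, num_threads):
        self.value = 0
        self.monitor_bits = [False] * num_threads


class ThreadContext:
    """Per-thread context holding a condition bit (assumed: one bit per context)."""

    def __init__(self):
        self.condition_bit = False


def access(location, contexts, new_value=None):
    """Model an access to the location by some thread.

    On every access, each thread whose monitor bit is set on the location
    has the condition bit in its own context set.
    """
    if new_value is not None:
        location.value = new_value
    for tid, monitoring in enumerate(location.monitor_bits):
        if monitoring:
            contexts[tid].condition_bit = True
    return location.value


contexts = [ThreadContext(), ThreadContext()]
loc = MonitoredLocation(num_threads=2)
loc.monitor_bits[1] = True          # helper thread 1 asks to be notified
access(loc, contexts, new_value=42) # another thread writes the location
```

A helper thread can then test its own condition bit instead of repeatedly polling the shared location.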

Type: Grant

Filed: May 21, 2013

Date of Patent: January 5, 2016

Assignee: International Business Machines Corporation

Inventors: Michael K. Gschwind, John K. O'Brien, Valentina Salapura, Zehra N. Sura
Power-constrained compiler code generation and scheduling of work in a heterogeneous processing system

Patent number: 9183063

Abstract: An active memory system includes a computer and an active memory device including layers of memory forming a three-dimensional memory device and individual columns of chips forming vaults in communication with a processing element and logic. The processing element is configured to communicate to the chips and other processing elements. The active memory system also includes a compiler configured to implement a method. The method includes dividing a power budget for the active memory device into a discrete number of power tokens, each of the power tokens having an equal value of units of power. The method also includes determining a power requirement for executing a code segment on the processing element of the active memory device based on characteristics of the processing element and the code segment. The method further includes allocating, to the processing element at runtime, one or more power tokens to satisfy the power requirement.

Type: Grant

Filed: November 19, 2012

Date of Patent: November 10, 2015

Assignee: International Business Machines Corporation

Inventors: Hans M. Jacobson, Ravi Nair, John K. P. O'Brien, Zehra N. Sura
COMPILER-GENERATED MEMORY MAPPING HINTS

Publication number: 20150269073

Abstract: According to one embodiment, a method of creating compiler-generated memory mapping hints in a computer system includes analyzing code, by a compiler of the computer system, to identify data access patterns in the code. System configuration information defining data processing system characteristics of a target system for the code is accessed. The data processing system characteristics include a plurality of processing resources and memory domain characteristics relative to the processing resources. A preferred allocation of data in memory domains of the target system is determined based on mapping the code to one or more selected processing resources and mapping the data to one or more of the memory domains based on the memory domain characteristics relative to the one or more selected processing resources. The preferred allocation is stored as compiler-generated memory mapping hints in a format accessible by a physical memory mapping resource of the target system.

Type: Application

Filed: March 19, 2014

Publication date: September 24, 2015

Applicant: International Business Machines Corporation

Inventors: Kathryn M. O'Brien, John K. O'Brien, Zehra N. Sura
Power-constrained compiler code generation and scheduling of work in a heterogeneous processing system

Patent number: 9110734

Abstract: A heterogeneous processing system includes a compiler for performing power-constrained code generation and scheduling of work in the heterogeneous processing system. The compiler produces source code that is executable by a computer. The compiler performs a method. The method includes dividing a power budget for the heterogeneous processing system into a discrete number of power tokens. Each of the power tokens has an equal value of units of power. The method also includes determining a power requirement for executing a code segment on a processing element of the heterogeneous processing system. The determining is based on characteristics of the processing element and the code segment. The method further includes allocating, to the processing element at runtime, at least one of the power tokens to satisfy the power requirement.
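
The token scheme in this abstract can be illustrated with a small allocator. The sketch assumes power is measured in integer milliwatts and that a request is either fully satisfied or refused; the class name `PowerTokenPool` and its methods are hypothetical, and the way a power requirement is estimated from the processing element and code segment is left outside the example.

```python
class PowerTokenPool:
    """Divide a power budget into equal-valued tokens and hand them out."""

    def __init__(self, budget_mw, token_count):
        # Each token represents the same number of units of power.
        self.token_value_mw = budget_mw // token_count
        self.free_tokens = token_count

    def tokens_needed(self, requirement_mw):
        # Round up so the allocated tokens always cover the requirement.
        return -(-requirement_mw // self.token_value_mw)

    def allocate(self, requirement_mw):
        """Return the number of tokens granted, or 0 if the budget is exhausted."""
        needed = self.tokens_needed(requirement_mw)
        if needed > self.free_tokens:
            return 0
        self.free_tokens -= needed
        return needed

    def release(self, tokens):
        # Tokens return to the pool when the code segment finishes.
        self.free_tokens += tokens


pool = PowerTokenPool(budget_mw=100_000, token_count=20)
granted = pool.allocate(12_000)   # requirement for one code segment
refused = pool.allocate(90_000)   # exceeds what is left in the pool
```

Because tokens have a fixed value, a 12,000 mW requirement is rounded up to three 5,000 mW tokens rather than allocated exactly.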

Type: Grant

Filed: November 12, 2012

Date of Patent: August 18, 2015

Assignee: International Business Machines Corporation

Inventors: Hans M. Jacobson, Ravi Nair, John K. P. O'Brien, Zehra N. Sura
Optimized division of work among processors in a heterogeneous processing system

Patent number: 8997071

Abstract: A compiler implemented by a computer performs optimized division of work across heterogeneous processors. The compiler divides source code into code sections and characterizes each of the code sections based on pre-defined criteria. Each of the code sections is characterized as at least one of: allocate to a main processor, allocate to a processing element, allocate to one of a parameterized main processor and a parameterized processing element, and indeterminate. The compiler analyzes side-effects and costs of executing the code sections on allocated processors, and transforms the code sections based on results of the analyzing. The transforming includes re-characterizing the code sections for alternate execution in a runtime environment.

Type: Grant

Filed: September 10, 2012

Date of Patent: March 31, 2015

Assignee: International Business Machines Corporation

Inventors: Tong Chen, John K. P. O'Brien, Zehra N. Sura
Prefetching irregular data references for software controlled caches

Patent number: 8762968

Abstract: Prefetching irregular memory references into a software controlled cache is provided. A compiler analyzes source code to identify at least one of a plurality of loops that contain an irregular memory reference. The compiler determines if the irregular memory reference within the at least one loop is a candidate for optimization. Responsive to an indication that the irregular memory reference may be optimized, the compiler determines if the irregular memory reference is valid for prefetching. Responsive to an indication that the irregular memory reference is valid for prefetching, a store statement for an address of the irregular memory reference is inserted into the at least one loop. A runtime library call is inserted into a prefetch runtime library for the irregular memory reference. Data associated with the irregular memory reference is prefetched into the software controlled cache when the runtime library call is invoked.

Type: Grant

Filed: June 27, 2012

Date of Patent: June 24, 2014

Assignee: International Business Machines Corporation

Inventors: Tong Chen, Marc Gonzalez Tallada, Zehra N. Sura, Tao Zhang
POWER-CONSTRAINED COMPILER CODE GENERATION AND SCHEDULING OF WORK IN A HETEROGENEOUS PROCESSING SYSTEM

Publication number: 20140136858

Abstract: An active memory system includes a computer and an active memory device including layers of memory forming a three-dimensional memory device and individual columns of chips forming vaults in communication with a processing element and logic. The processing element is configured to communicate to the chips and other processing elements. The active memory system also includes a compiler configured to implement a method. The method includes dividing a power budget for the active memory device into a discrete number of power tokens, each of the power tokens having an equal value of units of power. The method also includes determining a power requirement for executing a code segment on the processing element of the active memory device based on characteristics of the processing element and the code segment. The method further includes allocating, to the processing element at runtime, one or more power tokens to satisfy the power requirement.

Type: Application

Filed: November 19, 2012

Publication date: May 15, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Hans M. Jacobson, Ravi Nair, John K.P. O'Brien, Zehra N. Sura
POWER-CONSTRAINED COMPILER CODE GENERATION AND SCHEDULING OF WORK IN A HETEROGENEOUS PROCESSING SYSTEM

Publication number: 20140136857

Abstract: A heterogeneous processing system includes a compiler for performing power-constrained code generation and scheduling of work in the heterogeneous processing system. The compiler produces source code that is executable by a computer. The compiler performs a method. The method includes dividing a power budget for the heterogeneous processing system into a discrete number of power tokens. Each of the power tokens has an equal value of units of power. The method also includes determining a power requirement for executing a code segment on a processing element of the heterogeneous processing system. The determining is based on characteristics of the processing element and the code segment. The method further includes allocating, to the processing element at runtime, at least one of the power tokens to satisfy the power requirement.

Type: Application

Filed: November 12, 2012

Publication date: May 15, 2014

Applicant: International Business Machines Corporation

Inventors: Hans M. Jacobson, Ravi Nair, John K.P. O'Brien, Zehra N. Sura
OPTIMIZED DIVISION OF WORK AMONG PROCESSORS IN A HETEROGENEOUS PROCESSING SYSTEM

Publication number: 20140068581

Abstract: A compiler implemented by a computer performs optimized division of work across heterogeneous processors. The compiler divides source code into code sections and characterizes each of the code sections based on pre-defined criteria. Each of the code sections is characterized as at least one of: allocate to a main processor, allocate to a processing element, allocate to one of a parameterized main processor and a parameterized processing element, and indeterminate. The compiler analyzes side-effects and costs of executing the code sections on allocated processors, and transforms the code sections based on results of the analyzing. The transforming includes re-characterizing the code sections for alternate execution in a runtime environment.

Type: Application

Filed: August 30, 2012

Publication date: March 6, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Tong Chen, John K.P. O'Brien, Zehra N. Sura
OPTIMIZED DIVISION OF WORK AMONG PROCESSORS IN A HETEROGENEOUS PROCESSING SYSTEM

Publication number: 20140068582

Abstract: A compiler implemented by a computer performs optimized division of work across heterogeneous processors. The compiler divides source code into code sections and characterizes each of the code sections based on pre-defined criteria. Each of the code sections is characterized as at least one of: allocate to a main processor, allocate to a processing element, allocate to one of a parameterized main processor and a parameterized processing element, and indeterminate. The compiler analyzes side-effects and costs of executing the code sections on allocated processors, and transforms the code sections based on results of the analyzing. The transforming includes re-characterizing the code sections for alternate execution in a runtime environment.

Type: Application

Filed: September 10, 2012

Publication date: March 6, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Tong Chen, John K.P. O'Brien, Zehra N. Sura
Data transfer optimized software cache for irregular memory references

Patent number: 8561043

Abstract: Mechanisms are provided for optimizing irregular memory references in computer code. These mechanisms may parse the computer code to identify memory references in the computer code. These mechanisms may further classify the memory references in the computer code as either regular memory references or irregular memory references. Moreover, the mechanisms may transform the computer code, by a compiler, to generate transformed computer code in which irregular memory references access a storage of a software cache of a data processing system through a transactional cache mechanism of the software cache.

Type: Grant

Filed: March 28, 2008

Date of Patent: October 15, 2013

Assignees: International Business Machines Corporation, Barcelona Supercomputing Center

Inventors: Eduard Ayguade, Tong Chen, Alexandre E. Eichenberger, Marc Gonzalez Tallada, Xavier Martorell, John K. O'Brien, Kathryn M. O'Brien, Zehra N. Sura, Tao Zhang
Optimized code generation targeting a high locality software cache

Patent number: 8561044

Abstract: Mechanisms for optimized code generation targeting a high locality software cache are provided. Original computer code is parsed to identify memory references in the original computer code. Memory references are classified as either regular memory references or irregular memory references. Regular memory references are controlled by a high locality cache mechanism. Original computer code is transformed, by a compiler, to generate transformed computer code in which the regular memory references are grouped into one or more memory reference streams, each memory reference stream having a leading memory reference, a trailing memory reference, and one or more middle memory references.

Type: Grant

Filed: October 7, 2008

Date of Patent: October 15, 2013

Assignee: International Business Machines Corporation

Inventors: Tong Chen, Alexandre E. Eichenberger, Marc Gonzalez Tallada, John K. O'Brien, Kathryn M. O'Brien, Zehra N. Sura, Tao Zhang
METHOD AND APPARATUS FOR EFFICIENT INTER-THREAD SYNCHRONIZATION FOR HELPER THREADS

Publication number: 20130263145

Abstract: A monitor bit per hardware thread in a memory location may be allocated, in a multiprocessing computer system having a plurality of hardware threads, the plurality of hardware threads sharing the memory location, and each of the allocated monitor bit corresponding to one of the plurality of hardware threads. A condition bit may be allocated for each of the plurality of hardware threads, the condition bit being allocated in each context of the plurality of hardware threads. In response to detecting the memory location being accessed, it is determined whether a monitor bit corresponding to a hardware thread in the memory location is set. In response to determining that the monitor bit corresponding to a hardware thread is set in the memory location, a condition bit corresponding to a thread accessing the memory location is set in the hardware thread's context.

Type: Application

Filed: May 21, 2013

Publication date: October 3, 2013

Applicant: International Business Machines Corporation

Inventors: Michael K. Gschwind, John K. O'Brien, Valentina Salapura, Zehra N. Sura
