Patents by Inventor Zehra N. Sura

Zehra N. Sura has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20160283209
    Abstract: In one embodiment, a computer-implemented method includes receiving source code to be compiled into an executable file for an unaligned instruction set architecture (ISA). Aligned assembled code is generated, by a computer processor. The aligned assembled code complies with an aligned ISA and includes aligned processor code for a processor and aligned accelerator code for an accelerator. A first linking pass is performed on the aligned assembled code, including relocating a first relocation target in the aligned accelerator code that refers to a first object outside the aligned accelerator code. Unaligned assembled code is generated in accordance with the unaligned ISA and includes unaligned accelerator code for the accelerator and unaligned processor code for the processor. A second linking pass is performed on the unaligned assembled code, including relocating a second relocation target outside the unaligned accelerator code that refers to an object in the unaligned accelerator code.
    Type: Application
    Filed: March 25, 2015
    Publication date: September 29, 2016
    Inventors: Carlo Bertolli, John K. O'Brien, Olivier H. Sallenave, Zehra N. Sura
  • Publication number: 20160283158
    Abstract: An aspect includes a table of contents (TOC) that was generated by a compiler being received at an accelerator device. The TOC includes an address of global data in a host memory space. The global data is copied from the address in the host memory space to an address in the device memory space. The address in the host memory space is obtained from the received TOC. The received TOC is updated to indicate that global data is stored at the address in the device memory space. A kernel that accesses the global data from the address in the device memory space is executed. The address in the device memory space is obtained based on contents of the updated TOC. When the executing is completed, the global data from the address in the device memory space is copied to the address in the host memory space.
    Type: Application
    Filed: March 25, 2015
    Publication date: September 29, 2016
    Inventors: Carlo Bertolli, John K. O'Brien, Olivier H. Sallenave, Zehra N. Sura
  • Publication number: 20160283208
    Abstract: Embodiments relate to program structure-based blocking. An aspect includes receiving source code corresponding to a computer program by a compiler of a computer system. Another aspect includes determining a prefetching section in the source code by a marking module of the compiler. Yet another aspect includes performing, by a blocking module of the compiler, blocking of instructions located in the prefetching section into instruction blocks, such that the instruction blocks of the prefetching section only contain instructions that are located in the prefetching section.
    Type: Application
    Filed: June 17, 2015
    Publication date: September 29, 2016
    Inventors: Carlo Bertolli, Alexandre E. Eichenberger, John K. O'Brien, Zehra N. Sura
  • Publication number: 20160283210
    Abstract: Embodiments relate to program structure-based blocking. An aspect includes receiving source code corresponding to a computer program by a compiler of a computer system. Another aspect includes determining a prefetching section in the source code by a marking module of the compiler. Yet another aspect includes performing, by a blocking module of the compiler, blocking of instructions located in the prefetching section into instruction blocks, such that the instruction blocks of the prefetching section only contain instructions that are located in the prefetching section.
    Type: Application
    Filed: March 25, 2015
    Publication date: September 29, 2016
    Inventors: Carlo Bertolli, Alexandre E. Eichenberger, John K. O'Brien, Zehra N. Sura
  • Publication number: 20160283248
    Abstract: In one embodiment, a computer-implemented method includes tracking a size of a load-store queue (LSQ) during compile time of a program. The size of the LSQ is time-varying and indicates how many memory access instructions of the program are on the LSQ. The method further includes scheduling, by a computer processor, a plurality of memory access instructions of the program based on the size of the LSQ.
    Type: Application
    Filed: March 26, 2015
    Publication date: September 29, 2016
    Inventors: Tong Chen, Alexandre E. Eichenberger, Arpith C. Jacob, Zehra N. Sura
  • Publication number: 20160283144
    Abstract: An aspect includes a table of contents (TOC) that was generated by a compiler being received at an accelerator device. The TOC includes an address of global data in a host memory space. The global data is copied from the address in the host memory space to an address in the device memory space. The address in the host memory space is obtained from the received TOC. The received TOC is updated to indicate that global data is stored at the address in the device memory space. A kernel that accesses the global data from the address in the device memory space is executed. The address in the device memory space is obtained based on contents of the updated TOC. When the executing is completed, the global data from the address in the device memory space is copied to the address in the host memory space.
    Type: Application
    Filed: June 22, 2015
    Publication date: September 29, 2016
    Inventors: Carlo Bertolli, John K. O'Brien, Olivier H. Sallenave, Zehra N. Sura
  • Publication number: 20160283212
    Abstract: In one embodiment, a computer-implemented method includes tracking a size of a load-store queue (LSQ) during compile time of a program. The size of the LSQ is time-varying and indicates how many memory access instructions of the program are on the LSQ. The method further includes scheduling, by a computer processor, a plurality of memory access instructions of the program based on the size of the LSQ.
    Type: Application
    Filed: June 19, 2015
    Publication date: September 29, 2016
    Inventors: Tong Chen, Alexandre E. Eichenberger, Arpith C. Jacob, Zehra N. Sura
  • Patent number: 9229715
    Abstract: A monitor bit per hardware thread in a memory location may be allocated, in a multiprocessing computer system having a plurality of hardware threads, the plurality of hardware threads sharing the memory location, and each of the allocated monitor bit corresponding to one of the plurality of hardware threads. A condition bit may be allocated for each of the plurality of hardware threads, the condition bit being allocated in each context of the plurality of hardware threads. In response to detecting the memory location being accessed, it is determined whether a monitor bit corresponding to a hardware thread in the memory location is set. In response to determining that the monitor bit corresponding to a hardware thread is set in the memory location, a condition bit corresponding to a thread accessing the memory location is set in the hardware thread's context.
    Type: Grant
    Filed: May 21, 2013
    Date of Patent: January 5, 2016
    Assignee: International Business Machines Corporation
    Inventors: Michael K. Gschwind, John K. O'Brien, Valentina Salapura, Zehra N. Sura
  • Patent number: 9183063
    Abstract: An active memory system includes a computer and an active memory device including layers of memory forming a three-dimensional memory device and individual columns of chips forming vaults in communication with a processing element and logic. The processing element is configured to communicate to the chips and other processing elements. The active memory system also includes a compiler configured to implement a method. The method includes dividing a power budget for the active memory device into a discrete number of power tokens, each of the power tokens having an equal value of units of power. The method also includes determining a power requirement for executing a code segment on the processing element of the active memory device based on characteristics of the processing element and the code segment. The method further includes allocating, to the processing element at runtime, one or more power tokens to satisfy the power requirement.
    Type: Grant
    Filed: November 19, 2012
    Date of Patent: November 10, 2015
    Assignee: International Business Machines Corporation
    Inventors: Hans M. Jacobson, Ravi Nair, John K. P. O'Brien, Zehra N. Sura
  • Publication number: 20150269073
    Abstract: According to one embodiment, a method of creating compiler-generated memory mapping hints in a computer system includes analyzing code, by a compiler of the computer system, to identify data access patterns in the code. System configuration information defining data processing system characteristics of a target system for the code is accessed. The data processing system characteristics include a plurality of processing resources and memory domain characteristics relative to the processing resources. A preferred allocation of data in memory domains of the target system is determined based on mapping the code to one or more selected processing resources and mapping the data to one or more of the memory domains based on the memory domain characteristics relative to the one or more selected processing resources. The preferred allocation is stored as compiler-generated memory mapping hints in a format accessible by a physical memory mapping resource of the target system.
    Type: Application
    Filed: March 19, 2014
    Publication date: September 24, 2015
    Applicant: International Business Machines Corporation
    Inventors: Kathryn M. O'Brien, John K. O'Brien, Zehra N. Sura
  • Patent number: 9110734
    Abstract: A heterogeneous processing system includes a compiler for performing power-constrained code generation and scheduling of work in the heterogeneous processing system. The compiler produces source code that is executable by a computer. The compiler performs a method. The method includes dividing a power budget for the heterogeneous processing system into a discrete number of power tokens. Each of the power tokens has an equal value of units of power. The method also includes determining a power requirement for executing a code segment on a processing element of the heterogeneous processing system. The determining is based on characteristics of the processing element and the code segment. The method further includes allocating, to the processing element at runtime, at least one of the power tokens to satisfy the power requirement.
    Type: Grant
    Filed: November 12, 2012
    Date of Patent: August 18, 2015
    Assignee: International Business Machines Corporation
    Inventors: Hans M. Jacobson, Ravi Nair, John K. P. O'Brien, Zehra N. Sura
  • Patent number: 8997071
    Abstract: A compiler implemented by a computer performs optimized division of work across heterogeneous processors. The compiler divides source code into code sections and characterizes each of the code sections based on pre-defined criteria. Each of the code sections is characterized as at least one of: allocate to a main processor, allocate to a processing element, allocate to one of a parameterized main processor and a parameterized processing element, and indeterminate. The compiler analyzes side-effects and costs of executing the code sections on allocated processors, and transforms the code sections based on results of the analyzing. The transforming includes re-characterizing the code sections for alternate execution in a runtime environment.
    Type: Grant
    Filed: September 10, 2012
    Date of Patent: March 31, 2015
    Assignee: International Business Machines Corporation
    Inventors: Tong Chen, John K. P. O'Brien, Zehra N. Sura
  • Patent number: 8762968
    Abstract: Prefetching irregular memory references into a software controlled cache is provided. A compiler analyzes source code to identify at least one of a plurality of loops that contain an irregular memory reference. The compiler determines if the irregular memory reference within the at least one loop is a candidate for optimization. Responsive to an indication that the irregular memory reference may be optimized, the compiler determines if the irregular memory reference is valid for prefetching. Responsive to an indication that the irregular memory reference is valid for prefetching, a store statement for an address of the irregular memory reference is inserted into the at least one loop. A runtime library call is inserted into a prefetch runtime library for the irregular memory reference. Data associated with the irregular memory reference is prefetched into the software controlled cache when the runtime library call is invoked.
    Type: Grant
    Filed: June 27, 2012
    Date of Patent: June 24, 2014
    Assignee: International Business Machines Corporation
    Inventors: Tong Chen, Marc Gonzalez tallada, Zehra N. Sura, Tao Zhang
  • Publication number: 20140136858
    Abstract: An active memory system includes a computer and an active memory device including layers of memory forming a three-dimensional memory device and individual columns of chips forming vaults in communication with a processing element and logic. The processing element is configured to communicate to the chips and other processing elements. The active memory system also includes a compiler configured to implement a method. The method includes dividing a power budget for the active memory device into a discrete number of power tokens, each of the power tokens having an equal value of units of power. The method also includes determining a power requirement for executing a code segment on the processing element of the active memory device based on characteristics of the processing element and the code segment. The method further includes allocating, to the processing element at runtime, one or more power tokens to satisfy the power requirement.
    Type: Application
    Filed: November 19, 2012
    Publication date: May 15, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Hans M. Jacobson, Ravi Nair, John K.P. O'Brien, Zehra N. Sura
  • Publication number: 20140136857
    Abstract: A heterogeneous processing system includes a compiler for performing power-constrained code generation and scheduling of work in the heterogeneous processing system. The compiler produces source code that is executable by a computer. The compiler performs a method. The method includes dividing a power budget for the heterogeneous processing system into a discrete number of power tokens. Each of the power tokens has an equal value of units of power. The method also includes determining a power requirement for executing a code segment on a processing element of the heterogeneous processing system. The determining is based on characteristics of the processing element and the code segment. The method further includes allocating, to the processing element at runtime, at least one of the power tokens to satisfy the power requirement.
    Type: Application
    Filed: November 12, 2012
    Publication date: May 15, 2014
    Applicant: International Business Machines Corporation
    Inventors: Hans M. Jacobson, Ravi Nair, John K.P. O'Brien, Zehra N. Sura
  • Publication number: 20140068581
    Abstract: A compiler implemented by a computer performs optimized division of work across heterogeneous processors. The compiler divides source code into code sections and characterizes each of the code sections based on pre-defined criteria. Each of the code sections is characterized as at least one of: allocate to a main processor, allocate to a processing element, allocate to one of a parameterized main processor and a parameterized processing element, and indeterminate. The compiler analyzes side-effects and costs of executing the code sections on allocated processors, and transforms the code sections based on results of the analyzing. The transforming includes re-characterizing the code sections for alternate execution in a runtime environment.
    Type: Application
    Filed: August 30, 2012
    Publication date: March 6, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Tong Chen, John K.P. O'Brien, Zehra N. Sura
  • Publication number: 20140068582
    Abstract: A compiler implemented by a computer performs optimized division of work across heterogeneous processors. The compiler divides source code into code sections and characterizes each of the code sections based on pre-defined criteria. Each of the code sections is characterized as at least one of: allocate to a main processor, allocate to a processing element, allocate to one of a parameterized main processor and a parameterized processing element, and indeterminate. The compiler analyzes side-effects and costs of executing the code sections on allocated processors, and transforms the code sections based on results of the analyzing. The transforming includes re-characterizing the code sections for alternate execution in a runtime environment.
    Type: Application
    Filed: September 10, 2012
    Publication date: March 6, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Tong Chen, John K.P. O'Brien, Zehra N. Sura
  • Patent number: 8561043
    Abstract: Mechanisms are provided for optimizing irregular memory references in computer code. These mechanisms may parse the computer code to identify memory references in the computer code. These mechanisms may further classify the memory references in the computer code as either regular memory references or irregular memory references. Moreover, the mechanisms may transform the computer code, by a compiler, to generate transformed computer code in which irregular memory references access a storage of a software cache of a data processing system through a transactional cache mechanism of the software cache.
    Type: Grant
    Filed: March 28, 2008
    Date of Patent: October 15, 2013
    Assignees: International Business Machines Corporation, Barcelona Supercomputing Center
    Inventors: Eduard Ayguade, Tong Chen, Alexandre E. Eichenberger, Marc Gonzalez Tallada, Xavier Martorell, John K. O'Brien, Kathryn M. O'Brien, Zehra N. Sura, Tao Zhang
  • Patent number: 8561044
    Abstract: Mechanisms for optimized code generation targeting a high locality software cache are provided. Original computer code is parsed to identify memory references in the original computer code. Memory references are classified as either regular memory references or irregular memory references. Regular memory references are controlled by a high locality cache mechanism. Original computer code is transformed, by a compiler, to generate transformed computer code in which the regular memory references are grouped into one or more memory reference streams, each memory reference stream having a leading memory reference, a trailing memory reference, and one or more middle memory references.
    Type: Grant
    Filed: October 7, 2008
    Date of Patent: October 15, 2013
    Assignee: International Business Machines Corporation
    Inventors: Tong Chen, Alexandre E. Eichenberger, Marc Gonzalez Tallada, John K. O'Brien, Kathryn M. O'Brien, Zehra N. Sura, Tao Zhang
  • Publication number: 20130263145
    Abstract: A monitor bit per hardware thread in a memory location may be allocated, in a multiprocessing computer system having a plurality of hardware threads, the plurality of hardware threads sharing the memory location, and each of the allocated monitor bit corresponding to one of the plurality of hardware threads. A condition bit may be allocated for each of the plurality of hardware threads, the condition bit being allocated in each context of the plurality of hardware threads. In response to detecting the memory location being accessed, it is determined whether a monitor bit corresponding to a hardware thread in the memory location is set. In response to determining that the monitor bit corresponding to a hardware thread is set in the memory location, a condition bit corresponding to a thread accessing the memory location is set in the hardware thread's context.
    Type: Application
    Filed: May 21, 2013
    Publication date: October 3, 2013
    Applicant: International Business Machines Corporation
    Inventors: Michael K. Gschwind, John K. O'Brien, Valentina Salapura, Zehra N. Sura