Patents by Inventor John Kevin Patrick O'Brien

John Kevin Patrick O'Brien has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 7962906
    Abstract: A compiler includes a mechanism for employing multiple synergistic processors to execute long vectors. The compiler receives a single source program. The compiler identifies vectorizable loop code in the single source program and extracts the vectorizable loop code from the single source program. The compiler then compiles the extracted vectorizable loop code for a plurality of synergistic processors. The compiler also compiles a remainder of the single source program for a principal processor to form an executable main program such that the executable main program controls operation of the executable vectorizable loop code on the plurality of synergistic processors.
    Type: Grant
    Filed: March 15, 2007
    Date of Patent: June 14, 2011
    Assignee: International Business Machines Corporation
    Inventors: John Kevin Patrick O'Brien, Kathryn M. O'Brien, Daniel Arthur Prener
  • Patent number: 7870544
    Abstract: A “kill” intrinsic that may be used in programs for designating specific data objects as having been “killed” by a preceding action is provided. The concept of a data object being “killed” is that the compiler is informed that no operations (e.g., loads and stores) on that data object, or its aliases, can be moved across the point in the program flow where the data object is designated as having been “killed.” The “kill” intrinsic limits the reordering capability of an optimization scheduler of a compiler with regard to operations performed on “killed” data objects. The “kill” intrinsic may be used with DMA operations. Data objects being DMA'ed from a local store of a processor may be “killed” through use of the “kill” intrinsic prior to submitting the DMA request. Data objects being DMA'ed to the local store of the processor may be “killed” after verifying the transfer completes.
    Type: Grant
    Filed: April 5, 2006
    Date of Patent: January 11, 2011
    Assignee: International Business Machines Corporation
    Inventors: Daniel A. Brokenshire, John Kevin Patrick O'Brien
  • Patent number: 7784037
    Abstract: A compiler implemented software cache is provided in which non-aliased explicitly fetched data are excluded are provided. With the mechanisms of the illustrative embodiments, a compiler uses a forward data flow analysis to prove that there is no alias between the cached data and explicitly fetched data. Explicitly fetched data that has no alias in the cached data are excluded from the software cache. Explicitly fetched data that has aliases in the cached data are allowed to be stored in the software cache. In this way, there is no runtime overhead to maintain the correctness of the two copies of data. Moreover, the number of lines of the software cache that must be protected from eviction is decreased. This leads to a decrease in the amount of computation cycles required by the cache miss handler when evicting cache lines during cache miss handling.
    Type: Grant
    Filed: April 14, 2006
    Date of Patent: August 24, 2010
    Assignee: International Business Machines Corporation
    Inventors: Tong Chen, John Kevin Patrick O'Brien, Kathryn O'Brien, Byoungro So, Zehra N. Sura, Tao Zhang
  • Patent number: 7765360
    Abstract: Mechanisms for performing useful computations during a software cache reload operation are provided. With the illustrative embodiments, in order to perform software caching, a compiler takes original source code, and while compiling the source code, inserts explicit cache lookup instructions into appropriate portions of the source code where cacheable variables are referenced. In addition, the compiler inserts a cache miss handler routine that is used to branch execution of the code to a cache miss handler if the cache lookup instructions result in a cache miss. The cache miss handler, prior to performing a wait operation for waiting for the data to be retrieved from the backing store, branches execution to an independent subroutine identified by a compiler. The independent subroutine is executed while the data is being retrieved from the backing store such that useful work is performed.
    Type: Grant
    Filed: October 1, 2008
    Date of Patent: July 27, 2010
    Assignee: International Business Machines Corporation
    Inventors: John Kevin Patrick O'Brien, Kathryn O'Brien
  • Publication number: 20090158019
    Abstract: The present invention provides for a system for computer program code size partitioning for multiple memory multi-processor systems. At least one system parameter of a computer system comprising one or more disparate processing nodes is identified. Computer program code comprising a program to be run on the computer system is received. A program representation based on received computer program code is generated. At least one single-entry-single-exit (SESE) region is identified based on the whole program representation. At least one SESE region of less than a certain size (store-size-specific) is identified based on identified SESE regions and the at least one system parameter. Each store-size-specific SESE region is grouped into a node-specific subroutine. The non node-specific parts of the computer program code are modified based on the partitioning into node-specific subroutines. The modified computer program code including each node-specific subroutine is compiled based on a specified node characteristic.
    Type: Application
    Filed: December 17, 2008
    Publication date: June 18, 2009
    Applicant: International Business Machines Corporation
    Inventors: Kathryn M. O'Brien, John Kevin Patrick O'Brien
  • Publication number: 20090119652
    Abstract: The present invention provides for a system for computer program functional partitioning for heterogeneous multi-processing systems. At least one system parameter of a computer system comprising one or more disparate processing nodes is identified. Computer program code comprising a program to be run on the computer system is received. A whole program representation is generated based on received computer program code. At least one single-entry-single-exit (SESE) region is identified based on the whole program representation. At least one node-specific SESE region is identified based on identified SESE regions and the at least one system parameter. Each node-specific SESE region is grouped into a node-specific subroutine. Each node-specific subroutine is compiled based on a specified node characteristic. The computer program code is modified based on the node-specific subroutines and the modified computer program code is compiled.
    Type: Application
    Filed: January 8, 2009
    Publication date: May 7, 2009
    Applicant: International Business Machines Corporation
    Inventors: Kathryn M. O'Brien, John Kevin Patrick O'Brien
  • Patent number: 7512699
    Abstract: A method for managing position independent code using a software framework is presented. A software framework provides the ability to cache multiple plug-in's which are loaded in a processor's local storage. A processor receives a command or data stream from another processor, which includes information corresponding to a particular plug-in. The processor uses the plug-in identifier to load the plug-in from shared memory into local memory before it is required in order to minimize latency. When the data stream requests the processor to use the plug-in, the processor retrieves a location offset corresponding to the plug-in and applies the plug-in to the data stream. A plug-in manager manages an entry point table that identifies memory locations corresponding to each plug-in and, therefore, plug-ins may be placed anywhere in a processor's local memory.
    Type: Grant
    Filed: November 12, 2004
    Date of Patent: March 31, 2009
    Assignee: International Business Machines Corporation
    Inventors: Michael Stan Gowen, Barry L Minor, Mark Richard Nutter, John Kevin Patrick O'Brien
  • Patent number: 7512745
    Abstract: Garbage collection in heterogeneous multiprocessor systems is provided. In some illustrative embodiments, garbage collection operations are distributed across a plurality of the processors in the heterogeneous multiprocessor system. Portions of a global mark queue are assigned to processors of the heterogeneous multiprocessor system along with corresponding chunks of a shared memory. The processors perform garbage collection on their assigned portions of the global mark queue and corresponding chunk of shared memory marking memory object references as reachable or adding memory object references to a non-local mark stack. The marked memory objects are merged with a global mark stack and memory object references in the non-local mark stack are merged with a “to be traced” portion of the global mark queue for re-checking using a garbage collection operation.
    Type: Grant
    Filed: April 28, 2006
    Date of Patent: March 31, 2009
    Assignee: International Business Machines Corporation
    Inventors: Michael K. Gschwind, John Kevin Patrick O'Brien, Kathryn M. O'Brien
  • Publication number: 20090055588
    Abstract: Mechanisms for performing useful computations during a software cache reload operation are provided. With the illustrative embodiments, in order to perform software caching, a compiler takes original source code, and while compiling the source code, inserts explicit cache lookup instructions into appropriate portions of the source code where cacheable variables are referenced. In addition, the compiler inserts a cache miss handler routine that is used to branch execution of the code to a cache miss handler if the cache lookup instructions result in a cache miss. The cache miss handler, prior to performing a wait operation for waiting for the data to be retrieved from the backing store, branches execution to an independent subroutine identified by a compiler. The independent subroutine is executed while the data is being retrieved from the backing store such that useful work is performed.
    Type: Application
    Filed: October 1, 2008
    Publication date: February 26, 2009
    Applicant: International Business Machines Corporation
    Inventors: John Kevin Patrick O'Brien, Kathryn O'Brien
  • Patent number: 7493452
    Abstract: A method to efficiently pre-fetch and batch compiler-assisted software cache accesses is provided. The method reduces the overhead associated with software cache directory accesses. With the method, the local memory address of the cache line that stores the pre-fetched data is itself cached, such as in a register or well known location in local memory, so that a later data access does not need to perform address translation and software cache operations and can instead access the data directly from the software cache using the cached local memory address. This saves processor cycles that would otherwise be required to perform the address translation a second time when the data is to be used. Moreover, the system and method directly enable software cache accesses to be effectively decoupled from address translation in order to increase the overlap between computation and communication.
    Type: Grant
    Filed: August 18, 2006
    Date of Patent: February 17, 2009
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, John Kevin Patrick O'Brien, Kathryn M. O'Brien
  • Patent number: 7487496
    Abstract: The present invention provides for a method for computer program functional partitioning for heterogeneous multi-processing systems. At least one system parameter of a computer system comprising one or more disparate processing nodes is identified. Computer program code comprising a program to be run on the computer system is received. A whole program representation is generated based on received computer program code. At least one single-entry-single-exit (SESE) region is identified based on the whole program representation. At least one node-specific SESE region is identified based on identified SESE regions and the at least one system parameter. Each node-specific SESE region is grouped into a node-specific subroutine. Each node-specific subroutine is compiled based on a specified node characteristic. The computer program code is modified based on the node-specific subroutines and the modified computer program code is compiled.
    Type: Grant
    Filed: December 2, 2004
    Date of Patent: February 3, 2009
    Assignee: International Business Machines Corporation
    Inventors: Kathryn M. O'Brien, John Kevin Patrick O'Brien
  • Patent number: 7478376
    Abstract: The present invention provides for a method for computer program code size partitioning for multiple memory multi-processor systems. At least one system parameter of a computer system comprising one or more disparate processing nodes is identified. Computer program code comprising a program to be run on the computer system is received. A program representation based on received computer program code is generated. At least one single-entry-single-exit (SESE) region is identified based on the whole program representation. At least one SESE region of less than a certain size (store-size-specific) is identified based on identified SESE regions and the at least one system parameter. Each store-size-specific SESE region is grouped into a node-specific subroutine. The non node-specific parts of the computer program code are modified based on the partitioning into node-specific subroutines. The modified computer program code including each node-specific subroutine is compiled based on a specified node characteristic.
    Type: Grant
    Filed: December 2, 2004
    Date of Patent: January 13, 2009
    Assignee: International Business Machines Corporation
    Inventors: Kathryn M. O'Brien, John Kevin Patrick O'Brien
  • Patent number: 7461205
    Abstract: Mechanisms for performing useful computations during a software cache reload operation are provided. With the illustrative embodiments, in order to perform software caching, a compiler takes original source code, and while compiling the source code, inserts explicit cache lookup instructions into appropriate portions of the source code where cacheable variables are referenced. In addition, the compiler inserts a cache miss handler routine that is used to branch execution of the code to a cache miss handler if the cache lookup instructions result in a cache miss. The cache miss handler, prior to performing a wait operation for waiting for the data to be retrieved from the backing store, branches execution to an independent subroutine identified by a compiler. The independent subroutine is executed while the data is being retrieved from the backing store such that useful work is performed.
    Type: Grant
    Filed: June 1, 2006
    Date of Patent: December 2, 2008
    Assignee: International Business Machines Corporation
    Inventors: John Kevin Patrick O'Brien, Kathryn O'Brien
  • Publication number: 20080282064
    Abstract: A system and method for speculative assistance to a thread in a heterogeneous processing environment is provided. A first set of instructions is identified in a source code representation (e.g., a source code file) that is suitable for speculative execution. The identified set of instructions are analyzed to determine the processing requirements. Based on the analysis, a processor type is identified that will be used to execute the identified first set of instructions based. The processor type is selected from more than one processor types that are included in the heterogeneous processing environment. The heterogeneous processing environment includes more than one heterogeneous processing cores in a single silicon substrate. The various processing cores can utilize different instruction set architectures (ISAs). An object code representation is then generated for the identified first set of instructions with the object code representation being adapted to execute on the determined type of processor.
    Type: Application
    Filed: May 7, 2007
    Publication date: November 13, 2008
    Inventors: Michael Norman Day, Michael Karl Gschwind, John Kevin Patrick O'Brien, Kathryn O'Brien
  • Publication number: 20080256521
    Abstract: An apparatus and method for partitioning programs between a general purpose core and one or more accelerators are provided. With the apparatus and method, a compiler front end is provided for converting a program source code in a corresponding high level programming language into an intermediate code representation. This intermediate code representation is provided to an interprocedural optimizer which determines which core processor or accelerator each portion of the program should execute on and partitions the program into sub-programs based on this set of decisions. The interprocedural optimizer may further add instructions to the partitions to coordinate and synchronize the sub-programs as required. Each sub-program is compiled on an appropriate compiler backend for the instruction set architecture of the particular core processor or accelerator selected to execute the sub-program. The compiled sub-programs and then linked to thereby generate an executable program.
    Type: Application
    Filed: May 27, 2008
    Publication date: October 16, 2008
    Applicant: International Business Machines Corporation
    Inventors: John Kevin Patrick O'Brien, Kathryn M. O'Brien, Daniel A. Prener
  • Publication number: 20080229291
    Abstract: A compiler implemented software cache apparatus and method in which non-aliased explicitly fetched data are excluded are provided. With the mechanisms of the illustrative embodiments, a compiler uses a forward data flow analysis to prove that there is no alias between the cached data and explicitly fetched data. Explicitly fetched data that has no alias in the cached data are excluded from the software cache. Explicitly fetched data that has aliases in the cached data are allowed to be stored in the software cache. In this way, there is no runtime overhead to maintain the correctness of the two copies of data. Moreover, the number of lines of the software cache that must be protected from eviction is decreased. This leads to a decrease in the amount of computation cycles required by the cache miss handler when evicting cache lines during cache miss handling.
    Type: Application
    Filed: May 28, 2008
    Publication date: September 18, 2008
    Applicant: International Business Machines Corporation
    Inventors: Tong Chen, John Kevin Patrick O'Brien, Kathryn O'Brien, Byoungro So, Zehra N. Sura, Tao Zhang
  • Publication number: 20080229298
    Abstract: A compiler includes a mechanism for employing multiple synergistic processors to execute long vectors. The compiler receives a single source program. The compiler identifies vectorizable loop code in the single source program and extracts the vectorizable loop code from the single source program. The compiler then compiles the extracted vectorizable loop code for a plurality of synergistic processors. The compiler also compiles a remainder of the single source program for a principal processor to form an executable main program such that the executable main program controls operation of the executable vectorizable loop code on the plurality of synergistic processors.
    Type: Application
    Filed: March 15, 2007
    Publication date: September 18, 2008
    Inventors: John Kevin Patrick O'Brien, Kathryn M. O'Brien, Daniel Arthur Prener
  • Publication number: 20080229295
    Abstract: A “kill” intrinsic that may be used in programs for designating specific data objects as having been “killed” by a preceding action is provided. The concept of a data object being “killed” is that the compiler is informed that no operations (e.g., loads and stores) on that data object, or its aliases, can be moved across the point in the program flow where the data object is designated as having been “killed.” The “kill” intrinsic limits the reordering capability of an optimization scheduler of a compiler with regard to operations performed on “killed” data objects. The “kill” intrinsic may be used with DMA operations. Data objects being DMA'ed from a local store of a processor may be “killed” through use of the “kill” intrinsic prior to submitting the DMA request. Data objects being DMA'ed to the local store of the processor may be “killed” after verifying the transfer completes.
    Type: Application
    Filed: May 29, 2008
    Publication date: September 18, 2008
    Applicant: International Business Machines Corporation
    Inventors: Daniel A. Brokenshire, John Kevin Patrick O'Brien
  • Publication number: 20080222391
    Abstract: An apparatus and method for optimizing scalar code executed on a single instruction multiple data (SIMD) engine is provided that aligns the slots of SIMD registers. With the apparatus and method, a compiler is provided that parses source code and, for each statement in the program, generates an expression tree. The compiler inspects all storage inputs to scalar operations in the expression tree to determine their alignment in the SIMD registers. This alignment is propagated up the expression tree from the leaves. When the alignments of two operands in the expression tree are the same, the resulting alignment is the shared value. When the alignments of two operands in the expression tree are different, one operand is shifted. For shifted operands, a shift operation is inserted in the expression tree. The executable code is then generated for the expression tree and shifts are inserted where indicated.
    Type: Application
    Filed: May 27, 2008
    Publication date: September 11, 2008
    Applicant: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, John Kevin Patrick O'Brien
  • Publication number: 20080201699
    Abstract: Vectorizing misaligned references in compiled code for SIMD architectures that support only aligned loads and stores is presented. In the framework presented herein, a loop is first simdized as if the memory unit imposes no alignment constraints. The compiler then inserts data reorganization operations to satisfy the actual alignment requirement of the hardware. Finally, the code generation algorithm generates SIMD codes based on the data reorganization graph, addressing realistic issues such as runtime alignments, unknown loop bounds, residue iteration counts, and multiple statements with arbitrary alignment combinations. Beyond generating a valid simdization, a preferred embodiment further improves the quality of the generated codes. Four stream-shift placement policies are disclosed, which minimize the number of data reorganization generated by the alignment handling.
    Type: Application
    Filed: April 23, 2008
    Publication date: August 21, 2008
    Inventors: Alexandre E. Eichenberger, John Kevin Patrick O'Brien, Peng Wu