Patents by Inventor Spiros Kalogeropulos

Spiros Kalogeropulos has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 7849453
    Abstract: One embodiment of the present invention provides a system that generates code for software scouting the regions of a program. During operation, the system receives source code for a program. The system then compiles the source code. In the first step of the compilation process, the system identifies a first set of loops from a hierarchy of loops in the source code, wherein each loop in the first set of loops contains at least one effective prefetch candidate. Then, from the first set of loops, the system identifies a second set of loops where scout-mode prefetching is profitable. Next, for each loop in the second set of loops, the system produces executable code for a helper-thread which contains a prefetch instruction for each effective prefetch candidate. At runtime the helper-thread is executed in parallel with the main thread in advance of where the main thread is executing to prefetch data items for the main thread.
    Type: Grant
    Filed: November 9, 2005
    Date of Patent: December 7, 2010
    Assignee: Oracle America, Inc.
    Inventors: Partha P. Tirumalai, Yonghong Song, Spiros Kalogeropulos
  • Publication number: 20100153959
    Abstract: A system and method for automatically controlling run-time parallelization of a software application. A buffer is allocated during execution of program code of an application. When a point in program code near a parallelized region is reached, demand information is stored in the buffer in response to reaching a predetermined first checkpoint. Subsequently, the demand information is read from the buffer in response to reaching a predetermined second checkpoint. Allocation information corresponding to the read demand information is computed and stored the in the buffer for the application to later access. The allocation information is read from the buffer in response to reaching a predetermined third checkpoint, and the parallelized region of code is executed in a manner corresponding to the allocation information.
    Type: Application
    Filed: December 15, 2008
    Publication date: June 17, 2010
    Inventors: Yonghong Song, Spiros Kalogeropulos, Partha P. Tirumalai
  • Publication number: 20100146480
    Abstract: A system and method for automatic efficient parallelization of code combined with hardware transactional memory support. A software application may contain a transaction synchronization region (TSR) utilizing lock and unlock transaction synchronization function calls for a shared region of memory within a shared memory. The TSR is replaced with two portions of code. The first portion comprises hardware transactional memory primitives in place of lock and unlock function calls. Also, the first portion ensures no other transaction is accessing the shared region without disabling existing hardware transactional memory support. The second portion performs a fail routine, which utilizes lock and unlock transaction synchronization primitives in response to an indication that a failure occurs within said first portion.
    Type: Application
    Filed: December 10, 2008
    Publication date: June 10, 2010
    Inventors: Spiros Kalogeropulos, Yonghong Song, Partha P. Tirumalai
  • Publication number: 20100146495
    Abstract: A computing system has an amount of shared cache, and performs runtime automatic parallelization wherein when a parallelized loop is encountered, a main thread shares the workload with at least one other non-main thread. A method for providing interprocedural prefetching includes compiling source code to produce compiled code having a main thread including a parallelized loop. Prior to the parallelized loop in the main thread, the main thread includes prefetching instructions for the at least one other non-main thread that shares the workload of the parallelized loop. As a result, the main thread prefetches data into the shared cache for use by the at least one other non-main thread.
    Type: Application
    Filed: December 10, 2008
    Publication date: June 10, 2010
    Applicant: SUN MICROSYSTEMS, INC.
    Inventors: Yonghong Song, Spiros Kalogeropulos, Partha P. Tirumalai
  • Patent number: 7681188
    Abstract: One embodiment of the present invention provides a system that facilitates locked prefetch scheduling in general cyclic regions of a computer program. The system operates by first receiving a source code for the computer program and compiling the source code into intermediate code. The system then performs a trace detection on the intermediate code. Next, the system inserts prefetch instructions and corresponding locks into the intermediate code. Finally, the system generates executable code from the intermediate code, wherein a lock for a given prefetch instruction prevents subsequent prefetches from being issued until the data value returns for the given prefetch instruction.
    Type: Grant
    Filed: April 29, 2005
    Date of Patent: March 16, 2010
    Assignee: Sun Microsystems, Inc.
    Inventors: Partha P. Tirumalai, Spiros Kalogeropulos, Yonghong Song
  • Publication number: 20090288075
    Abstract: A system and method for speculatively parallelizing non-countable loops in a multi-threaded application. A multi-core processor receives instructions for a multi-threaded application. The application may contain non-countable loops. Non-countable loops have an iteration count value that cannot be determined prior to the execution of the non-countable loop, a loop index value that cannot be non-speculatively determined prior to the execution of an iteration of the non-countable loop, and control that is not transferred out of the loop body by a code line in the loop body. The compiler replaces the non-countable loop with a parallelized loop pattern that uses outlined function calls defined in a parallelization library (PL) in order to speculatively execute iterations of the parallelized loop. The parallelized loop pattern is configured to squash and re-execute any speculative thread of the parallelized loop pattern that is signaled to have a transaction failure.
    Type: Application
    Filed: May 19, 2008
    Publication date: November 19, 2009
    Inventors: Yonghong Song, Spiros Kalogeropulos, Partha P. Tirumalai
  • Publication number: 20090276758
    Abstract: A compilation method and mechanism for parallelizing program code. A method for compilation includes analyzing source code and identifying candidate code for parallelization. Having identified one or more suitable candidates, the profitability of parallelizing the candidate code is determined. If the profitability determination meets a predetermined criteria, then the candidate code may be parallelized. If, however, the profitability determination does not meet the predetermined criteria, then the candidate code may not be parallelized. Candidate code may comprises a loop, and determining profitability of parallelization may include computing a probability of transaction failure for the loop. Additionally, a determination of an execution time of a parallelized version of the loop is made. If the determined execution time is less than an execution time of a non-parallelized version of said loop by at least a given amount, then the loop may be parallelized.
    Type: Application
    Filed: May 1, 2008
    Publication date: November 5, 2009
    Inventors: Yonghong Song, Spiros Kalogeropulos, Partha P. Tirumalai
  • Publication number: 20090276766
    Abstract: A compilation method and mechanism for parallelizing program code. A method for compilation includes analyzing source code and identifying candidate code for parallelization. The method includes parallelizing the candidate code, in response to determining said profitability meets a predetermined criteria; and generating object code corresponding to the source code. The generated object code includes both a non-parallelized version of the candidate code and a parallelized version of the candidate code. During execution of the object code, a dynamic selection between execution of the non-parallelized version of the candidate code and the parallelized version of the candidate code is made. Changing execution from said parallelized version of the candidate code to the non-parallelized version of the candidate code, may be in response to determining a transaction failure count meets a pre-determined threshold.
    Type: Application
    Filed: May 1, 2008
    Publication date: November 5, 2009
    Inventors: Yonghong Song, Spiros Kalogeropulos, Partha P. Tirumalai
  • Publication number: 20090235237
    Abstract: Parallelize a computer program by scoping program variables at compile time and inserting code into the program. Identify as value predictable variables, variables that are: defined only once in a loop of the program; not defined in any inner loop of the loop; and used in the loop. Optionally also: identify a code block in the program that contains a variable assignment, and then traverse a path backwards from the block through a control flow graph of the program. Name in a set all blocks along the path until a loop header block. For each block in the set, determine program blocks that logically succeed the block and are not in the first set. Identify all paths between the block and the determined blocks as failure paths, and insert code into the failure paths. When executed at run time of the program, the inserted code fails the corresponding path.
    Type: Application
    Filed: March 11, 2008
    Publication date: September 17, 2009
    Applicant: Sun Microsystems, Inc.
    Inventors: Yonghong Song, Xiangyun Kong, Spiros Kalogeropulos, Partha P. Tirumalai
  • Publication number: 20090217253
    Abstract: A computer program is speculatively parallelized with transactional memory by scoping program variables at compile time, and inserting code into the program at compile time. Determinations of the scoping can be based on whether scalar variables being scoped are involved in inter-loop non-reduction data dependencies, are used outside loops in which they were defined, and at what point in a loop a scalar variable is defined. The inserted code can include instructions for execution at a run time of the program to determine loop boundaries of the program, and issue checkpoint instructions and commit instructions that encompass transaction regions in the program. A transaction region can include an original function of the program and a spin-waiting loop with a non-transactional load, wherein the spin-waiting loop is configured to wait for a previous thread to commit before the current transaction commits.
    Type: Application
    Filed: February 22, 2008
    Publication date: August 27, 2009
    Applicant: Sun Microsystems, Inc.
    Inventors: Yonghong Song, Xiangyun Kong, Spiros Kalogeropulos, Partha P. Tirumalai
  • Publication number: 20090144746
    Abstract: Methods and apparatus provide for a workload adjuster to estimate the startup cost of one or more non-main threads of loop execution and to estimate the amount of workload to be migrated between different threads. Upon deciding to parallelize the execution of a loop, the workload adjuster creates a scheduling policy with a workload for a main thread and workloads for respective non-main threads. The scheduling policy distributes iterations of a parallelized loop to the workload of the main thread and iterations of the parallelized loop to the workloads of the non-main threads. The workload adjuster evaluates a start-up cost of the workload of a non-main thread and, based on the start-up cost, migrates a portion of the workload for that non-main thread to the main thread's workload.
    Type: Application
    Filed: December 4, 2007
    Publication date: June 4, 2009
    Inventors: Yonghong Song, Spiros Kalogeropulos, Partha Pal Tirumalai
  • Patent number: 7458067
    Abstract: One embodiment of the present invention provides a system that facilitates optimizing computer program performance by using steered execution. The system operates by first receiving source code for a computer program, and then compiling a portion of this source code with a first set of optimizations to generate a first compiled portion. The system also compiles the same portion of the source code with a second set of optimizations to generate a second compiled portion. Remaining source code is compiled to generate a third compiled portion. Additionally, a rule is generated for selecting between the first compiled portion and the second compiled portion. Finally, the first compiled portion, the second compiled portion, the third compiled portion, and the rule are combined into an executable output file.
    Type: Grant
    Filed: March 18, 2005
    Date of Patent: November 25, 2008
    Assignee: Sun Microsystems, Inc.
    Inventors: Partha P. Tirumalai, Spiros Kalogeropulos, Yonghong Song, Kurt J. Goebel
  • Publication number: 20080141268
    Abstract: A method and mechanism for using threads in a computing system. A multithreaded computing system is configured to execute a first thread and a second thread. The first and second threads are configured to operate in a producer-consumer relationship. The second thread is configured to execute utility type functions in advance of the first thread reaching the functions in the program code. The second thread executes in parallel with the first thread and produces results from the execution which are made available for consumption by the first thread. Analysis of the program code is performed to identify such utility functions and modify the program code to support execution of the functions by the second thread.
    Type: Application
    Filed: December 12, 2006
    Publication date: June 12, 2008
    Inventors: Partha P. Tirumalai, Yonghong Song, Spiros Kalogeropulos
  • Patent number: 7383402
    Abstract: Prefetch information is generated for multi-block indirect memory access chains. A method may include selecting a chain of indirect memory accesses of a procedure, the chain comprising a head access that does not depend for its address on another prefetch candidate memory access within the procedure and an indirect access that depends for its address on the head access. The method may further include determining a prefetch-ahead value for the chain, and generating a load operation corresponding to the head access that specifies a target memory address that is dependent upon the prefetch-ahead value and an address of the head access. The method may further include, for a terminal indirect access of the chain, generating a respective prefetch operation that is dependent for its address computation on results of preceding load operations in the same manner as its corresponding terminal indirect access depends upon preceding accesses in the chain.
    Type: Grant
    Filed: June 5, 2006
    Date of Patent: June 3, 2008
    Assignee: Sun Microsystems, Inc.
    Inventors: Spiros Kalogeropulos, Yonghong Song, Partha P. Tirumalai
  • Patent number: 7383401
    Abstract: A method and system for identifying multi-block indirect memory access chains. A method may include identifying basic blocks between an entry point and an exit point of a procedure, where the procedure includes a control statement governing its execution. It may be determined whether a probability of execution of a given basic block relative to the control statement equals or exceeds a first threshold value. If so, a respective set of one or more chains of indirect memory accesses may be generated, where each chain includes at least a respective head memory access that does not depend for its memory address computation on another memory access within the given basic block. Chains may be joined across basic blocks dependent upon whether the relative execution probabilities of the blocks exceed a threshold value.
    Type: Grant
    Filed: June 5, 2006
    Date of Patent: June 3, 2008
    Assignee: Sun Microsystems, Inc.
    Inventors: Spiros Kalogeropulos, Yonghong Song, Partha P. Tirumalai
  • Publication number: 20080127134
    Abstract: A method and mechanism for producing and executing self-steering program code. A method comprises analyzing program code and identifying portions which may be amenable to optimization. Having identified such a portion of code, at least one optimized version of the identified code is added to the program code. Additionally, a selection mechanism is added to the program code which is configured to select between two or more versions of the portion of code during runtime. The modified program code is then compiled with the added optimized version and the selection mechanism. During execution, monitoring of behavior of the code may be enabled or disabled. Based upon such monitored behavior, a different version of the code may be selected for execution.
    Type: Application
    Filed: October 27, 2006
    Publication date: May 29, 2008
    Applicant: Sun Microsystems, Inc.
    Inventors: Partha P. Tirumalai, Kurt J. Goebel, Yonghong Song, Spiros Kalogeropulos
  • Publication number: 20070283106
    Abstract: Prefetch information is generated for multi-block indirect memory access chains. A method may include selecting a chain of indirect memory accesses of a procedure, the chain comprising a head access that does not depend for its address on another prefetch candidate memory access within the procedure and an indirect access that depends for its address on the head access. The method may further include determining a prefetch-ahead value for the chain, and generating a load operation corresponding to the head access that specifies a target memory address that is dependent upon the prefetch-ahead value and an address of the head access. The method may further include, for a terminal indirect access of the chain, generating a respective prefetch operation that is dependent for its address computation on results of preceding load operations in the same manner as its corresponding terminal indirect access depends upon preceding accesses in the chain.
    Type: Application
    Filed: June 5, 2006
    Publication date: December 6, 2007
    Applicant: Sun Microsystems, Inc.
    Inventors: Spiros Kalogeropulos, Yonghong Song, Partha P. Tirumalai
  • Publication number: 20070283105
    Abstract: A method and system for identifying multi-block indirect memory access chains. A method may include identifying basic blocks between an entry point and an exit point of a procedure, where the procedure includes a control statement governing its execution. It may be determined whether a probability of execution of a given basic block relative to the control statement equals or exceeds a first threshold value. If so, a respective set of one or more chains of indirect memory accesses may be generated, where each chain includes at least a respective head memory access that does not depend for its memory address computation on another memory access within the given basic block. Chains may be joined across basic blocks dependent upon whether the relative execution probabilities of the blocks exceed a threshold value.
    Type: Application
    Filed: June 5, 2006
    Publication date: December 6, 2007
    Applicant: Sun Microsystems, Inc.
    Inventors: Spiros Kalogeropulos, Yonghong Song, Partha P. Tirumalai
  • Publication number: 20070271565
    Abstract: A method and mechanism for using threads in a computing system. A multithreaded computing system is configured to execute a first thread and a second thread. Responsive to the first thread detecting a launch point for a function, the first thread is configured to provide an indication to the second thread that the second thread may begin execution of a given function. The launch point of the function precedes an actual call point of the function in an execution sequence. The second thread is configured to initiate execution of the function in response to the indication. The function includes one or more inputs and the second thread uses anticipated values for each of the one or more inputs. When the first thread reaches a call point for the function, the first thread is configured to use a results of the second thread's execution, in response to determining the anticipated values used by the second thread were correct.
    Type: Application
    Filed: May 18, 2006
    Publication date: November 22, 2007
    Inventors: Partha P. Tirumalai, Yonghong Song, Spiros Kalogeropulos
  • Patent number: 7257810
    Abstract: One embodiment of the present invention provides a system that generates code to perform anticipatory prefetching for data references. During operation, the system receives code to be executed on a computer system. Next, the system analyzes the code to identify data references to be prefetched. This analysis can involve: using a two-phase marking process in which blocks that are certain to execute are considered before other blocks; and analyzing complex array subscripts. Next, the system inserts prefetch instructions into the code in advance of the identified data references. This insertion can involve: dealing with non-constant or unknown stride values; moving prefetch instructions into preceding basic blocks; and issuing multiple prefetches for the same data reference.
    Type: Grant
    Filed: November 2, 2001
    Date of Patent: August 14, 2007
    Assignee: Sun Microsystems, Inc.
    Inventors: Partha P Tirumalai, Spiros Kalogeropulos, Mahadevan Rajagopalan, Yonghong Song, Vikram Rao