Patents by Inventor Spiros Kalogeropulos
Spiros Kalogeropulos has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 7849453Abstract: One embodiment of the present invention provides a system that generates code for software scouting the regions of a program. During operation, the system receives source code for a program. The system then compiles the source code. In the first step of the compilation process, the system identifies a first set of loops from a hierarchy of loops in the source code, wherein each loop in the first set of loops contains at least one effective prefetch candidate. Then, from the first set of loops, the system identifies a second set of loops where scout-mode prefetching is profitable. Next, for each loop in the second set of loops, the system produces executable code for a helper-thread which contains a prefetch instruction for each effective prefetch candidate. At runtime the helper-thread is executed in parallel with the main thread in advance of where the main thread is executing to prefetch data items for the main thread.Type: GrantFiled: November 9, 2005Date of Patent: December 7, 2010Assignee: Oracle America, Inc.Inventors: Partha P. Tirumalai, Yonghong Song, Spiros Kalogeropulos
-
Publication number: 20100153959Abstract: A system and method for automatically controlling run-time parallelization of a software application. A buffer is allocated during execution of program code of an application. When a point in program code near a parallelized region is reached, demand information is stored in the buffer in response to reaching a predetermined first checkpoint. Subsequently, the demand information is read from the buffer in response to reaching a predetermined second checkpoint. Allocation information corresponding to the read demand information is computed and stored the in the buffer for the application to later access. The allocation information is read from the buffer in response to reaching a predetermined third checkpoint, and the parallelized region of code is executed in a manner corresponding to the allocation information.Type: ApplicationFiled: December 15, 2008Publication date: June 17, 2010Inventors: Yonghong Song, Spiros Kalogeropulos, Partha P. Tirumalai
-
Publication number: 20100146480Abstract: A system and method for automatic efficient parallelization of code combined with hardware transactional memory support. A software application may contain a transaction synchronization region (TSR) utilizing lock and unlock transaction synchronization function calls for a shared region of memory within a shared memory. The TSR is replaced with two portions of code. The first portion comprises hardware transactional memory primitives in place of lock and unlock function calls. Also, the first portion ensures no other transaction is accessing the shared region without disabling existing hardware transactional memory support. The second portion performs a fail routine, which utilizes lock and unlock transaction synchronization primitives in response to an indication that a failure occurs within said first portion.Type: ApplicationFiled: December 10, 2008Publication date: June 10, 2010Inventors: Spiros Kalogeropulos, Yonghong Song, Partha P. Tirumalai
-
Publication number: 20100146495Abstract: A computing system has an amount of shared cache, and performs runtime automatic parallelization wherein when a parallelized loop is encountered, a main thread shares the workload with at least one other non-main thread. A method for providing interprocedural prefetching includes compiling source code to produce compiled code having a main thread including a parallelized loop. Prior to the parallelized loop in the main thread, the main thread includes prefetching instructions for the at least one other non-main thread that shares the workload of the parallelized loop. As a result, the main thread prefetches data into the shared cache for use by the at least one other non-main thread.Type: ApplicationFiled: December 10, 2008Publication date: June 10, 2010Applicant: SUN MICROSYSTEMS, INC.Inventors: Yonghong Song, Spiros Kalogeropulos, Partha P. Tirumalai
-
Patent number: 7681188Abstract: One embodiment of the present invention provides a system that facilitates locked prefetch scheduling in general cyclic regions of a computer program. The system operates by first receiving a source code for the computer program and compiling the source code into intermediate code. The system then performs a trace detection on the intermediate code. Next, the system inserts prefetch instructions and corresponding locks into the intermediate code. Finally, the system generates executable code from the intermediate code, wherein a lock for a given prefetch instruction prevents subsequent prefetches from being issued until the data value returns for the given prefetch instruction.Type: GrantFiled: April 29, 2005Date of Patent: March 16, 2010Assignee: Sun Microsystems, Inc.Inventors: Partha P. Tirumalai, Spiros Kalogeropulos, Yonghong Song
-
Publication number: 20090288075Abstract: A system and method for speculatively parallelizing non-countable loops in a multi-threaded application. A multi-core processor receives instructions for a multi-threaded application. The application may contain non-countable loops. Non-countable loops have an iteration count value that cannot be determined prior to the execution of the non-countable loop, a loop index value that cannot be non-speculatively determined prior to the execution of an iteration of the non-countable loop, and control that is not transferred out of the loop body by a code line in the loop body. The compiler replaces the non-countable loop with a parallelized loop pattern that uses outlined function calls defined in a parallelization library (PL) in order to speculatively execute iterations of the parallelized loop. The parallelized loop pattern is configured to squash and re-execute any speculative thread of the parallelized loop pattern that is signaled to have a transaction failure.Type: ApplicationFiled: May 19, 2008Publication date: November 19, 2009Inventors: Yonghong Song, Spiros Kalogeropulos, Partha P. Tirumalai
-
Publication number: 20090276758Abstract: A compilation method and mechanism for parallelizing program code. A method for compilation includes analyzing source code and identifying candidate code for parallelization. Having identified one or more suitable candidates, the profitability of parallelizing the candidate code is determined. If the profitability determination meets a predetermined criteria, then the candidate code may be parallelized. If, however, the profitability determination does not meet the predetermined criteria, then the candidate code may not be parallelized. Candidate code may comprises a loop, and determining profitability of parallelization may include computing a probability of transaction failure for the loop. Additionally, a determination of an execution time of a parallelized version of the loop is made. If the determined execution time is less than an execution time of a non-parallelized version of said loop by at least a given amount, then the loop may be parallelized.Type: ApplicationFiled: May 1, 2008Publication date: November 5, 2009Inventors: Yonghong Song, Spiros Kalogeropulos, Partha P. Tirumalai
-
Publication number: 20090276766Abstract: A compilation method and mechanism for parallelizing program code. A method for compilation includes analyzing source code and identifying candidate code for parallelization. The method includes parallelizing the candidate code, in response to determining said profitability meets a predetermined criteria; and generating object code corresponding to the source code. The generated object code includes both a non-parallelized version of the candidate code and a parallelized version of the candidate code. During execution of the object code, a dynamic selection between execution of the non-parallelized version of the candidate code and the parallelized version of the candidate code is made. Changing execution from said parallelized version of the candidate code to the non-parallelized version of the candidate code, may be in response to determining a transaction failure count meets a pre-determined threshold.Type: ApplicationFiled: May 1, 2008Publication date: November 5, 2009Inventors: Yonghong Song, Spiros Kalogeropulos, Partha P. Tirumalai
-
Publication number: 20090235237Abstract: Parallelize a computer program by scoping program variables at compile time and inserting code into the program. Identify as value predictable variables, variables that are: defined only once in a loop of the program; not defined in any inner loop of the loop; and used in the loop. Optionally also: identify a code block in the program that contains a variable assignment, and then traverse a path backwards from the block through a control flow graph of the program. Name in a set all blocks along the path until a loop header block. For each block in the set, determine program blocks that logically succeed the block and are not in the first set. Identify all paths between the block and the determined blocks as failure paths, and insert code into the failure paths. When executed at run time of the program, the inserted code fails the corresponding path.Type: ApplicationFiled: March 11, 2008Publication date: September 17, 2009Applicant: Sun Microsystems, Inc.Inventors: Yonghong Song, Xiangyun Kong, Spiros Kalogeropulos, Partha P. Tirumalai
-
Publication number: 20090217253Abstract: A computer program is speculatively parallelized with transactional memory by scoping program variables at compile time, and inserting code into the program at compile time. Determinations of the scoping can be based on whether scalar variables being scoped are involved in inter-loop non-reduction data dependencies, are used outside loops in which they were defined, and at what point in a loop a scalar variable is defined. The inserted code can include instructions for execution at a run time of the program to determine loop boundaries of the program, and issue checkpoint instructions and commit instructions that encompass transaction regions in the program. A transaction region can include an original function of the program and a spin-waiting loop with a non-transactional load, wherein the spin-waiting loop is configured to wait for a previous thread to commit before the current transaction commits.Type: ApplicationFiled: February 22, 2008Publication date: August 27, 2009Applicant: Sun Microsystems, Inc.Inventors: Yonghong Song, Xiangyun Kong, Spiros Kalogeropulos, Partha P. Tirumalai
-
Publication number: 20090144746Abstract: Methods and apparatus provide for a workload adjuster to estimate the startup cost of one or more non-main threads of loop execution and to estimate the amount of workload to be migrated between different threads. Upon deciding to parallelize the execution of a loop, the workload adjuster creates a scheduling policy with a workload for a main thread and workloads for respective non-main threads. The scheduling policy distributes iterations of a parallelized loop to the workload of the main thread and iterations of the parallelized loop to the workloads of the non-main threads. The workload adjuster evaluates a start-up cost of the workload of a non-main thread and, based on the start-up cost, migrates a portion of the workload for that non-main thread to the main thread's workload.Type: ApplicationFiled: December 4, 2007Publication date: June 4, 2009Inventors: Yonghong Song, Spiros Kalogeropulos, Partha Pal Tirumalai
-
Patent number: 7458067Abstract: One embodiment of the present invention provides a system that facilitates optimizing computer program performance by using steered execution. The system operates by first receiving source code for a computer program, and then compiling a portion of this source code with a first set of optimizations to generate a first compiled portion. The system also compiles the same portion of the source code with a second set of optimizations to generate a second compiled portion. Remaining source code is compiled to generate a third compiled portion. Additionally, a rule is generated for selecting between the first compiled portion and the second compiled portion. Finally, the first compiled portion, the second compiled portion, the third compiled portion, and the rule are combined into an executable output file.Type: GrantFiled: March 18, 2005Date of Patent: November 25, 2008Assignee: Sun Microsystems, Inc.Inventors: Partha P. Tirumalai, Spiros Kalogeropulos, Yonghong Song, Kurt J. Goebel
-
Publication number: 20080141268Abstract: A method and mechanism for using threads in a computing system. A multithreaded computing system is configured to execute a first thread and a second thread. The first and second threads are configured to operate in a producer-consumer relationship. The second thread is configured to execute utility type functions in advance of the first thread reaching the functions in the program code. The second thread executes in parallel with the first thread and produces results from the execution which are made available for consumption by the first thread. Analysis of the program code is performed to identify such utility functions and modify the program code to support execution of the functions by the second thread.Type: ApplicationFiled: December 12, 2006Publication date: June 12, 2008Inventors: Partha P. Tirumalai, Yonghong Song, Spiros Kalogeropulos
-
Patent number: 7383402Abstract: Prefetch information is generated for multi-block indirect memory access chains. A method may include selecting a chain of indirect memory accesses of a procedure, the chain comprising a head access that does not depend for its address on another prefetch candidate memory access within the procedure and an indirect access that depends for its address on the head access. The method may further include determining a prefetch-ahead value for the chain, and generating a load operation corresponding to the head access that specifies a target memory address that is dependent upon the prefetch-ahead value and an address of the head access. The method may further include, for a terminal indirect access of the chain, generating a respective prefetch operation that is dependent for its address computation on results of preceding load operations in the same manner as its corresponding terminal indirect access depends upon preceding accesses in the chain.Type: GrantFiled: June 5, 2006Date of Patent: June 3, 2008Assignee: Sun Microsystems, Inc.Inventors: Spiros Kalogeropulos, Yonghong Song, Partha P. Tirumalai
-
Patent number: 7383401Abstract: A method and system for identifying multi-block indirect memory access chains. A method may include identifying basic blocks between an entry point and an exit point of a procedure, where the procedure includes a control statement governing its execution. It may be determined whether a probability of execution of a given basic block relative to the control statement equals or exceeds a first threshold value. If so, a respective set of one or more chains of indirect memory accesses may be generated, where each chain includes at least a respective head memory access that does not depend for its memory address computation on another memory access within the given basic block. Chains may be joined across basic blocks dependent upon whether the relative execution probabilities of the blocks exceed a threshold value.Type: GrantFiled: June 5, 2006Date of Patent: June 3, 2008Assignee: Sun Microsystems, Inc.Inventors: Spiros Kalogeropulos, Yonghong Song, Partha P. Tirumalai
-
Publication number: 20080127134Abstract: A method and mechanism for producing and executing self-steering program code. A method comprises analyzing program code and identifying portions which may be amenable to optimization. Having identified such a portion of code, at least one optimized version of the identified code is added to the program code. Additionally, a selection mechanism is added to the program code which is configured to select between two or more versions of the portion of code during runtime. The modified program code is then compiled with the added optimized version and the selection mechanism. During execution, monitoring of behavior of the code may be enabled or disabled. Based upon such monitored behavior, a different version of the code may be selected for execution.Type: ApplicationFiled: October 27, 2006Publication date: May 29, 2008Applicant: Sun Microsystems, Inc.Inventors: Partha P. Tirumalai, Kurt J. Goebel, Yonghong Song, Spiros Kalogeropulos
-
Publication number: 20070283106Abstract: Prefetch information is generated for multi-block indirect memory access chains. A method may include selecting a chain of indirect memory accesses of a procedure, the chain comprising a head access that does not depend for its address on another prefetch candidate memory access within the procedure and an indirect access that depends for its address on the head access. The method may further include determining a prefetch-ahead value for the chain, and generating a load operation corresponding to the head access that specifies a target memory address that is dependent upon the prefetch-ahead value and an address of the head access. The method may further include, for a terminal indirect access of the chain, generating a respective prefetch operation that is dependent for its address computation on results of preceding load operations in the same manner as its corresponding terminal indirect access depends upon preceding accesses in the chain.Type: ApplicationFiled: June 5, 2006Publication date: December 6, 2007Applicant: Sun Microsystems, Inc.Inventors: Spiros Kalogeropulos, Yonghong Song, Partha P. Tirumalai
-
Publication number: 20070283105Abstract: A method and system for identifying multi-block indirect memory access chains. A method may include identifying basic blocks between an entry point and an exit point of a procedure, where the procedure includes a control statement governing its execution. It may be determined whether a probability of execution of a given basic block relative to the control statement equals or exceeds a first threshold value. If so, a respective set of one or more chains of indirect memory accesses may be generated, where each chain includes at least a respective head memory access that does not depend for its memory address computation on another memory access within the given basic block. Chains may be joined across basic blocks dependent upon whether the relative execution probabilities of the blocks exceed a threshold value.Type: ApplicationFiled: June 5, 2006Publication date: December 6, 2007Applicant: Sun Microsystems, Inc.Inventors: Spiros Kalogeropulos, Yonghong Song, Partha P. Tirumalai
-
Publication number: 20070271565Abstract: A method and mechanism for using threads in a computing system. A multithreaded computing system is configured to execute a first thread and a second thread. Responsive to the first thread detecting a launch point for a function, the first thread is configured to provide an indication to the second thread that the second thread may begin execution of a given function. The launch point of the function precedes an actual call point of the function in an execution sequence. The second thread is configured to initiate execution of the function in response to the indication. The function includes one or more inputs and the second thread uses anticipated values for each of the one or more inputs. When the first thread reaches a call point for the function, the first thread is configured to use a results of the second thread's execution, in response to determining the anticipated values used by the second thread were correct.Type: ApplicationFiled: May 18, 2006Publication date: November 22, 2007Inventors: Partha P. Tirumalai, Yonghong Song, Spiros Kalogeropulos
-
Patent number: 7257810Abstract: One embodiment of the present invention provides a system that generates code to perform anticipatory prefetching for data references. During operation, the system receives code to be executed on a computer system. Next, the system analyzes the code to identify data references to be prefetched. This analysis can involve: using a two-phase marking process in which blocks that are certain to execute are considered before other blocks; and analyzing complex array subscripts. Next, the system inserts prefetch instructions into the code in advance of the identified data references. This insertion can involve: dealing with non-constant or unknown stride values; moving prefetch instructions into preceding basic blocks; and issuing multiple prefetches for the same data reference.Type: GrantFiled: November 2, 2001Date of Patent: August 14, 2007Assignee: Sun Microsystems, Inc.Inventors: Partha P Tirumalai, Spiros Kalogeropulos, Mahadevan Rajagopalan, Yonghong Song, Vikram Rao