Loop Compiling Patents (Class 717/150)

Dynamically controlling a prefetching range of a software controlled cache

Patent number: 8146064

Abstract: Dynamically controlling a prefetching range of a software controlled cache is provided. A compiler analyzes source code to identify at least one of a plurality of loops that contain irregular memory references. For each irregular memory reference in the source code, the compiler determines whether the irregular memory reference is a candidate for optimization. Responsive to identifying an irregular memory reference that may be optimized, the compiler determines whether the irregular memory reference is valid for prefetching. If the irregular memory reference is valid for prefetching, a store statement for an address of the irregular memory reference is inserted into the at least one loop. A runtime library call is inserted into a prefetch runtime library to dynamically prefetch the irregular memory references. Data associated with the irregular memory references are dynamically prefetched into the software controlled cache when the runtime library call is invoked.

Type: Grant

Filed: April 4, 2008

Date of Patent: March 27, 2012

Assignee: International Business Machines Corporation

Inventors: Tong Chen, Marc Gonzalez Tallada, Zehra N. Sura, Tao Zhang
Efficient data reorganization to satisfy data alignment constraints

Patent number: 8146067

Abstract: Vectorizing misaligned references in compiled code for SIMD architectures that support only aligned loads and stores is presented. In the framework presented herein, a loop is first simdized as if the memory unit imposes no alignment constraints. The compiler then inserts data reorganization operations to satisfy the actual alignment requirement of the hardware. Finally, the code generation algorithm generates SIMD codes based on the data reorganization graph, addressing realistic issues such as runtime alignments, unknown loop bounds, residue iteration counts, and multiple statements with arbitrary alignment combinations. Beyond generating a valid simdization, a preferred embodiment further improves the quality of the generated codes. Four stream-shift placement policies are disclosed, which minimize the number of data reorganization generated by the alignment handling.

Type: Grant

Filed: April 23, 2008

Date of Patent: March 27, 2012

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, John Kevin Patrick O'Brien, Peng Wu
Pipelined parallelization of multi-dimensional loops with multiple data dependencies

Patent number: 8146071

Abstract: A mechanism for folding all the data dependencies in a loop into a single, conservative dependence. This mechanism leads to one pair of synchronization primitives per loop. This mechanism does not require complicated, multi-stage compile time analysis. This mechanism considers only the data dependence information in the loop. The low synchronization cost balances the loss in parallelism due to the reduced overlap between iterations. Additionally, a novel scheme is presented to implement required synchronization to enforce data dependences in a DOACROSS loop. The synchronization is based on an iteration vector, which identifies a spatial position in the iteration space of the loop. Multiple iterations executing in parallel have their own iteration vector for synchronization where they update their position in the iteration space. As no sequential updates to the synchronization variable exist, this method exploits a greater degree of parallelism.

Type: Grant

Filed: September 18, 2007

Date of Patent: March 27, 2012

Assignee: International Business Machines Corporation

Inventors: Raul Esteban Silvera, Priya Unnikrishnan
Software pipelining using one or more vector registers

Patent number: 8136107

Abstract: A method for managing multiple values assigned to a variable during various stages of a software pipelined process executed in a computing environment. The method comprises allocating two or more slots in a vector register to two or more values associated with said variable during two or more stages of a pipeline process; and rotating values in each slot responsive to an instruction.
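
The slot rotation described in this abstract can be pictured with a small simulation. This is only a sketch under stated assumptions: a Python `deque` stands in for the vector register, the function name `pipelined_uses` and the three-stage default are hypothetical, and `None` marks an empty slot (so it cannot be used as a data value here).

```python
from collections import deque


def pipelined_uses(values, stages=3):
    """Simulate one variable whose value is produced in the first stage and
    consumed in the last stage, (stages - 1) iterations later."""
    register = deque([None] * stages, maxlen=stages)  # one slot per live value
    consumed = []
    # The extra None entries let the pipeline drain after the last real value.
    for new_value in list(values) + [None] * (stages - 1):
        register.rotate(1)        # the "rotate" instruction: slot k moves to slot k + 1
        register[0] = new_value   # first stage writes the newest value into slot 0
        oldest = register[stages - 1]
        if oldest is not None:    # last stage reads the oldest live value
            consumed.append(oldest)
    return consumed
```

Because each value stays in the register until it reaches the last slot, no separate scalar register per stage is needed in this model; the rotation plays the role that register renaming or copying would otherwise play.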

Type: Grant

Filed: October 24, 2007

Date of Patent: March 13, 2012

Assignee: International Business Machines Corporation

Inventor: Ayal Zaks
Mechanism to restrict parallelization of loops

Patent number: 8104030

Abstract: A computer implemented method, computer usable program code, and a system for parallelizing a loop. A parameter that will be used to limit parallelization of the loop is identified. The parameter specifies a minimum number of loop iterations that a thread should execute. The parameter can be adjusted based on a parallel performance factor. A parallel performance factor is a factor that influences the performance of parallel code. A number of threads from a plurality of threads is selected for processing iterations of the loop based on the parameter. The number of threads is selected prior to execution of the first iteration of the loop.
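
One plausible reading of the thread selection step is sketched below. The function names, the integer division rule, and the way the performance factor scales the minimum are all assumptions made for illustration; the abstract does not give a formula.

```python
def select_thread_count(total_iterations, available_threads, min_iterations_per_thread):
    """Choose a thread count before the first loop iteration runs."""
    if total_iterations <= 0 or min_iterations_per_thread <= 0:
        return 1
    # Largest count that still gives every thread at least the minimum work.
    by_work = total_iterations // min_iterations_per_thread
    return max(1, min(available_threads, by_work))


def adjust_minimum(base_minimum, performance_factor):
    """Scale the minimum by a hypothetical parallel performance factor
    (a value above 1 models costlier thread start-up or synchronization)."""
    return max(1, round(base_minimum * performance_factor))
```

With a minimum of 100 iterations per thread, a 250-iteration loop would get two threads even if eight are available, and a 50-iteration loop would stay serial.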

Type: Grant

Filed: December 21, 2005

Date of Patent: January 24, 2012

Assignee: International Business Machines Corporation

Inventors: Raul Esteban Silvera, Priya Unnikrishnan, Guansong Zhang
Selective code generation optimization for an advanced dual-representation polyhedral loop transformation framework

Patent number: 8087010

Abstract: Mechanisms for selective code generation optimization for an advanced dual-representation polyhedral loop transformation framework are provided. The mechanisms of the illustrative embodiments address the weaknesses of the known polyhedral loop transformation based approaches by providing mechanisms for performing code generation transformations on individual statement instances in an intermediate representation generated by the polyhedral loop transformation optimization of the source code. These code generation transformations have the important property that they do not change program order of the statements in the intermediate representation. This property allows the result of the code generation transformations to be provided back to the polyhedral loop transformation mechanisms in a program statement view, via a new re-entrance path of the illustrative embodiments, for additional optimization.

Type: Grant

Filed: September 26, 2007

Date of Patent: December 27, 2011

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, John K. P. O'Brien, Kathryn M. O'Brien, Nicolas T. Vasilache
Domain stretching for an advanced dual-representation polyhedral loop transformation framework

Patent number: 8087011

Abstract: Mechanisms for domain stretching for an advanced dual-representation polyhedral loop transformation framework are provided. The mechanisms of the illustrative embodiments address the weaknesses of the known polyhedral loop transformation based approaches by providing mechanisms for performing code generation transformations on individual statement instances in an intermediate representation generated by the polyhedral loop transformation optimization of the source code. These code generation transformations have the important property that they do not change program order of the statements in the intermediate representation. This property allows the result of the code generation transformations to be provided back to the polyhedral loop transformation mechanisms in a program statement view, via a new re-entrance path of the illustrative embodiments, for additional optimization.

Type: Grant

Filed: September 26, 2007

Date of Patent: December 27, 2011

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, John K. P. O'Brien, Kathryn M. O'Brien, Nicolas T. Vasilache
Method and system for generating addresses for a processor

Patent number: 8051272

Abstract: A method for generating addresses for a processor is provided. The addresses are for use by an application that may be executed by the processor. The application comprises a plurality of instructions, and each instruction comprises at least one line. The method includes storing a plurality of predetermined addresses and, for each line of each instruction, generating at least one address for the processor based on the predetermined addresses.

Type: Grant

Filed: September 15, 2006

Date of Patent: November 1, 2011

Assignee: Samsung Electronics Co., Ltd.

Inventor: Eran Pisek
Automatic Parallelization in a Tracing Just-in-Time Compiler System

Publication number: 20110265067

Abstract: A tracing just-in-time (TJIT) compiler system is described for performing parallelization of code in a runtime phase in the execution of code. Upon detecting a hot loop during the execution of the code, the compiler system extracts trace information from sequentially recorded traces. In a first phase, the compiler system uses the trace information to identify at least one group of operation components that can be operated on in a parallel manner. In a second phase, the compiler system provides instructions which allocate the group of operation components to plural processing resources. A native code generator module carries out those instructions by recompiling native code that directs the operation of a native system to perform parallel processing. The compiler system terminates a group if it encounters program data in a loop iteration that is not consistent with previously encountered predicated information (upon which it records a new trace in a sequential manner).

Type: Application

Filed: April 21, 2010

Publication date: October 27, 2011

Applicant: Microsoft Corporation

Inventors: Wolfram Schulte, Nikolai Tillmann, Michal J. Moskal, Manuel A. Fahndrich, Daniel JP Leijen, Barend H. Venter
Run-Time parallelization of loops in computer programs using bit vectors

Patent number: 8028281

Abstract: Parallelization of loops is performed for loops having indirect loop index variables and embedded conditional statements in the loop body. Loops having any finite number of array variables in the loop body, and any finite number of indirect loop index variables can be parallelized. There are two particular limitations of the described techniques: (i) that there are no cross-iteration dependencies in the loop other than through the indirect loop index variables; and (ii) that the loop index variables (either direct or indirect) are not redefined in the loop body.

Type: Grant

Filed: January 5, 2007

Date of Patent: September 27, 2011

Assignee: International Business Machines Corporation

Inventor: Rajendra K. Bera
Method and apparatus for efficiently processing array operation in computer system

Patent number: 8024717

Abstract: An apparatus and a method for processing an array in a loop in a computer system, including: applying loop unrolling to a multi-dimensional array included in a loop based on a predetermined unrolling factor to generate a plurality of unrolled multi-dimensional arrays; and transforming each of the plurality of unrolled multi-dimensional arrays into a one-dimensional array having an array subscript expression in a form of an affine function with respect to a loop counter variable.
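
The combination of unrolling and flattening can be illustrated as follows. This is a sketch only: the row-major layout, the fixed unrolling factor of 2, and both function names are assumptions, and the patented transformation may differ in detail. The point it shows is that each unrolled reference ends up with a one-dimensional subscript that is affine in the loop counter `i`.

```python
def column_sum_2d(a, col):
    """Reference version: two-dimensional subscripts inside the loop."""
    total = 0
    for i in range(len(a)):
        total += a[i][col]
    return total


def column_sum_unrolled_flat(a, col):
    """Unrolled by 2, reading from a flattened (row-major) copy of the array."""
    rows = len(a)
    width = len(a[0]) if rows else 0
    flat = [x for row in a for x in row]
    total = 0
    main = rows - rows % 2
    for i in range(0, main, 2):
        total += flat[width * i + col]            # copy 0: width*i + col
        total += flat[width * i + (width + col)]  # copy 1: width*i + (width + col)
    for i in range(main, rows):                   # leftover iteration, if any
        total += flat[width * i + col]
    return total
```

Both unrolled references share the coefficient `width` on `i` and differ only in their constant term, which is what makes the subscripts easy for later address-generation passes to handle.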

Type: Grant

Filed: July 26, 2006

Date of Patent: September 20, 2011

Assignee: Samsung Electronics Co., Ltd.

Inventors: Dong-Hoon Yoo, Hee Seok Kim, Jeong Wook Kim, Soo Jung Ryu
Workload partitioning in a parallel system with heterogeneous alignment constraints

Patent number: 8006238

Abstract: A process, compiler, computer program product and system for workload partitioning in a heterogeneous system. The process includes determining heterogeneous alignment constraints in the workload, partitioning a portion of tasks to a processing element sensitive to alignment constraints, and partitioning a remaining portion of tasks to a processing element not sensitive to alignment constraints.

Type: Grant

Filed: September 26, 2006

Date of Patent: August 23, 2011

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, John Kevin Patrick O'Brien, Kathryn M. O'Brien, Tong Chen
System for automatically generating optimized codes

Patent number: 7979852

Abstract: The inventive system for automatically generating optimized codes (19) which are operational on a predefined hardware platform (90) comprises at least one processor which is (91) based on code sources (17) provided by users and comprises means (51, 52) for receiving symbolic code sequences or standard sequences (1) representative for the processor (91) behavior in terms of performance for a predetermined application area, means (53) for receiving static parameters (2), means (55) for receiving dynamic parameters (7), an analysing device (10) for defining optimization rules (9) on the basis of performance tests and measures determined on the basis of the standard sequences (1) and the static (2) and dynamic (7) parameters, a device (80) for optimizing and generating the code receiving the standard sequences (1) and the optimization rules (9) for examining the code sources (17) of the users, detecting optimizable loops, decomposing into cores and for assembling and injecting the codes in such a way that the op

Type: Grant

Filed: January 13, 2005

Date of Patent: July 12, 2011

Assignees: Commissariat a l'Energie Atomique et Aux Energies Alternatives, Caps Entreprise, University of Versailles Saint-Quentin-En-Yvelines

Inventors: François Bodin, William Jalby, Xavier Le Pasteur, Christophe Lemuet, Eric Courtois, Jean Papadopoulo, Pierre Leca
Compiler method for employing multiple autonomous synergistic processors to simultaneously operate on longer vectors of data

Patent number: 7962906

Abstract: A compiler includes a mechanism for employing multiple synergistic processors to execute long vectors. The compiler receives a single source program. The compiler identifies vectorizable loop code in the single source program and extracts the vectorizable loop code from the single source program. The compiler then compiles the extracted vectorizable loop code for a plurality of synergistic processors. The compiler also compiles a remainder of the single source program for a principal processor to form an executable main program such that the executable main program controls operation of the executable vectorizable loop code on the plurality of synergistic processors.

Type: Grant

Filed: March 15, 2007

Date of Patent: June 14, 2011

Assignee: International Business Machines Corporation

Inventors: John Kevin Patrick O'Brien, Kathryn M. O'Brien, Daniel Arthur Prener
Compiler method for extracting and accelerator template program

Patent number: 7926046

Abstract: This invention describes a compilation method of extracting and implementing an accelerator control program from an application source code in a processor based system. The application source code comprises arrays and loops. The input application source code is sequential, with loop, branch and call control structures, while the generated output of this invention has parallel execution semantics. The compilation method comprises the step of performing loop nest analysis, transformations and backend processes. The step of loop nest analysis consists of dependence analysis and pointer analysis. Dependence analysis determines the conflicts between the various references to arrays in the loop, and pointer analysis determines if two pointer references in a loop are in conflict. Transformations convert the loops from their original sequential execution semantics to parallel execution semantics. The back-end process determines the parameters and memory map of the accelerator and the hardware dependent software.

Type: Grant

Filed: July 7, 2006

Date of Patent: April 12, 2011

Inventors: Soorgoli Ashok Halambi, Sarang Ramchandra Shelke, Bhramar Bhushan Vatsa, Dibyapran Sanyal, Nishant Manohar Nakate, Ramanujan K Valmiki, Sai Pramod Kumar Atmakuru, William C Salefski, Vidya Praveen
Single-chip multiprocessor with clock cycle-precise program scheduling of parallel execution

Patent number: 7895587

Abstract: A single-chip multiprocessor system and operation method of this system based on a static macro-scheduling of parallel streams for multiprocessor parallel execution. The single-chip multiprocessor system has buses for direct exchange between the processor register files and access to their store addresses and data. Each explicit parallelism architecture processor of this system has an interprocessor interface providing the synchronization signals exchange, data exchange at the register file level and access to store addresses and data of other processors. The single-chip multiprocessor system uses ILP to increase the performance. Synchronization of the streams parallel execution is ensured using special operations setting a sequence of streams and stream fragments execution prescribed by the program algorithm.

Type: Grant

Filed: September 8, 2006

Date of Patent: February 22, 2011

Assignee: Elbrus International

Inventors: Boris A. Babaian, Yuli Kh. Sakhin, Vladimir Yu. Volkonskiy, Sergey A. Rozhkov, Vladimir V. Tikhorsky, Feodor A. Gruzdov, Leonid N. Nazarov, Mikhail L. Chudakov
Array value substitution and propagation with loop transformations through static analysis

Patent number: 7890942

Abstract: A method and system for substituting array values (i.e., expressions) in a program at compile time. An initialization of an array is identified in a loop. The initialization is an assignment of an expression (i.e., a constant or a function of an induction variable) to elements of the array. The expression is stored in a table that associates the expression with the array and indices of the array. An assignment statement is detected that is to assign at least one element of the initialized elements. The expression is retrieved from the table based on the expression being associated with the array and corresponding indices. The expression is substituted for the at least one element so that the expression is to be assigned by the assignment statement. The process of substituting array values is extended to interprocedural analysis.
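
The table lookup at the heart of this abstract can be sketched in a few lines. The representation is an assumption: a Python dict keyed by (array name, index) stands in for the compiler's table, a callable stands in for the stored expression, and the function names are hypothetical.

```python
def record_initialization(table, array, indices, expression):
    """Remember that array[i] was initialized to expression(i) for each i in indices."""
    for index in indices:
        table[(array, index)] = expression


def substitute_read(table, array, index):
    """Return the value a later read of array[index] can be replaced with,
    or None when the element was not initialized by a recorded loop."""
    expression = table.get((array, index))
    if expression is None:
        return None
    return expression(index)
```

For a loop such as `for i in range(8): a[i] = 2 * i + 1`, a later statement `b = a[3]` could then be rewritten as `b = 7` without touching the array at run time.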

Type: Grant

Filed: August 15, 2006

Date of Patent: February 15, 2011

Assignee: International Business Machines Corporation

Inventor: Rohini Nair
Method, system, and program of a compiler to parallelize source code

Patent number: 7882498

Abstract: Provided are a method, system, and program for parallelizing source code with a compiler. Source code including source code statements is received. The source code statements are processed to determine a dependency of the statements. Multiple groups of statements are determined from the determined dependency of the statements, wherein statements in one group are dependent on one another. At least one directive is inserted in the source code, wherein each directive is associated with one group of statements. Resulting threaded code is generated including the inserted at least one directive. The group of statements to which the directive in the resulting threaded code applies are processed as a separate task. Each group of statements designated by the directive to be processed as a separate task may be processed concurrently with respect to other groups of statements.
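
Forming groups of mutually dependent statements is essentially a connected-components problem. The sketch below uses a small union-find for that step; the statement labels, the pair-based dependence list, and the function name are hypothetical, and the directive insertion itself is not shown.

```python
def group_statements(statements, dependences):
    """Partition statements so that dependent statements share a group.
    `dependences` is a list of (statement, statement) pairs."""
    parent = {s: s for s in statements}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving keeps the trees shallow
            s = parent[s]
        return s

    for a, b in dependences:
        parent[find(a)] = find(b)          # merge the two groups

    groups = {}
    for s in statements:                   # source order is kept inside each group
        groups.setdefault(find(s), []).append(s)
    return sorted(groups.values())
```

Each returned group would then receive one directive, so that groups with no dependence between them can run as separate tasks.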

Type: Grant

Filed: March 31, 2006

Date of Patent: February 1, 2011

Assignee: Intel Corporation

Inventors: Guilherme D. Ottoni, Xinmin Tian, Hong Wang, Richard A. Hankins, Wei Li, John Shen
Method, system, and computer program product to generate test instruction streams while guaranteeing loop termination

Patent number: 7877742

Abstract: A method, system, and computer program product for generating terminating, pseudo-random test instruction streams, including forward and backward branching instructions. A first instruction stream is generated, including at least one backward branching instruction and at least one forward branching instruction. Each backward branching instruction is preceded by at least one forward branching instruction, which is used to guarantee termination of the loop formed by the backward branching instruction. Backward branching targets are resolved when the backward branching instruction is inserted into the first instruction stream. Forward branching targets remain unresolved in the first instruction stream. A set of potential branch targets is determined for each forward branching instruction. For each forward branching instruction, a branch target is randomly selected from the set of potential branch targets for that forward branching instruction.

Type: Grant

Filed: June 8, 2006

Date of Patent: January 25, 2011

Assignee: International Business Machines Corporation

Inventors: Ali Y. Duale, Theodore J. Bohizic, Dennis W. Wittig
Method of partially copying first and last private arrays for parallelized loops based on array data flow

Patent number: 7877739

Abstract: A computer-implemented method for determining whether an array within a loop can be privatized for that loop is presented. The method calculates the array sections that require first or last privatization and copies only those sections, reducing the privatization overhead of the known solutions.

Type: Grant

Filed: October 9, 2006

Date of Patent: January 25, 2011

Assignee: International Business Machines Corporation

Inventors: Roch G. Archambault, Erik P. Charlebois, Guansong Zhang
Method and apparatus for software scouting regions of a program

Patent number: 7849453

Abstract: One embodiment of the present invention provides a system that generates code for software scouting the regions of a program. During operation, the system receives source code for a program. The system then compiles the source code. In the first step of the compilation process, the system identifies a first set of loops from a hierarchy of loops in the source code, wherein each loop in the first set of loops contains at least one effective prefetch candidate. Then, from the first set of loops, the system identifies a second set of loops where scout-mode prefetching is profitable. Next, for each loop in the second set of loops, the system produces executable code for a helper-thread which contains a prefetch instruction for each effective prefetch candidate. At runtime the helper-thread is executed in parallel with the main thread in advance of where the main thread is executing to prefetch data items for the main thread.

Type: Grant

Filed: November 9, 2005

Date of Patent: December 7, 2010

Assignee: Oracle America, Inc.

Inventors: Partha P. Tirumalai, Yonghong Song, Spiros Kalogeropulos
Seamless extension of shareable interface mechanism to servlet and extended applet model for inter-application communication

Patent number: 7836456

Abstract: In a smart card system in which applications executing in different execution contexts are allowed to communicate with each other only through shareable interface objects (SIO's), a registry mechanism is provided to mediate in inter-application communication between legacy applets, extended applets and servlets. A request by a client application for a SIO of a server application in a different execution context is routed to the registry mechanism by the system. Dependent on what types the client and server applications are, the registry mechanism provides call interfaces as would be expected by the applications to enable passing the SIO from the server application to the client application. In one embodiment, servlets and extended applets may also register and unregister their SIOs dynamically with the registration mechanism.

Type: Grant

Filed: October 31, 2006

Date of Patent: November 16, 2010

Assignee: Oracle America, Inc.

Inventors: Thierry P. Violleau, Tanjore Ravishankar, Matthew R. Hill
Method and system for analyzing array access to a pointer that is referenced as an array

Patent number: 7836434

Abstract: Methods, systems, and articles of manufacture consistent with the present invention provide an improved technique for analyzing statements that use pointer or array syntax to access dynamically-allocated arrays to determine whether the statement generates a reference that is outside the bounds of the array's allocated memory. Statements that use pointer or array syntax to access dynamically-allocated arrays can be bounds checked either statically (at compile-time) or dynamically (at run-time). Methods and systems in accordance with the present invention determine at compile-time if an array reference can be determined to always be in bounds or definitely out of bounds at least once, and if not, insert code into the program to check the array bounds dynamically at run-time before the access of the array reference.
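
The three-way decision in this abstract can be sketched as a classifier plus the check that would be inserted in the undecided case. The names and the string results are hypothetical, and the sketch assumes that a known index range means the loop really visits both its lowest and highest index.

```python
def classify_access(index_range, length):
    """Classify an array reference at compile time.
    index_range: (low, high) inclusive when known, else None.
    length: allocated element count when known, else None."""
    if index_range is None or length is None:
        return "runtime check"
    low, high = index_range
    if 0 <= low and high < length:
        return "always in bounds"
    return "out of bounds at least once"


def checked_read(array, index):
    """Stand-in for the check inserted before an access that could not be
    decided statically."""
    if not 0 <= index < len(array):
        raise IndexError(f"index {index} outside allocated range 0..{len(array) - 1}")
    return array[index]
```

Only references classified as "runtime check" pay for the dynamic test; the other two classes are resolved, or reported, at compile time.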

Type: Grant

Filed: May 4, 2004

Date of Patent: November 16, 2010

Assignee: Oracle America, Inc.

Inventor: Michael L. Boucher
METHODS AND APPARATUSES FOR COMPILER-CREATING HELPER THREADS FOR MULTI-THREADING

Publication number: 20100281471

Abstract: Methods and apparatuses for compiler-created helper thread for multi-threading are described herein. In one embodiment, exemplary process includes identifying a region of a main thread that likely has one or more delinquent loads, the one or more delinquent loads representing loads which likely suffer cache misses during an execution of the main thread, analyzing the region for one or more helper threads with respect to the main thread, and generating code for the one or more helper threads, the one or more helper threads being speculatively executed in parallel with the main thread to perform one or more tasks for the region of the main thread. Other methods and apparatuses are also described.

Type: Application

Filed: December 31, 2009

Publication date: November 4, 2010

Inventors: Shih-Wei Liao, Xinmin Tian, Gerolf F. Hoflehner, Hong Wang, Daniel M. Lavery, Perry Wang, Dongkeun Kim, Milind Girkar, John P. Shen
Compiler apparatus

Patent number: 7827542

Abstract: A compiler apparatus that improves the performance of loop processing. The compiler apparatus translates a C program that includes a loop into a machine language program, and includes: a movement judgment unit that judges whether or not an instruction which is positioned outside of the loop of the C program can be moved into the loop, based on a state of live ranges of variables used in the instruction; a movement execution unit that moves the instruction into the loop in the case where the movement judgment unit judges that the instruction can be moved into the loop, thereby generating an intermediate program; and a translation unit that translates the intermediate program into the machine language program.

Type: Grant

Filed: September 25, 2006

Date of Patent: November 2, 2010

Assignee: Panasonic Corporation

Inventors: Hajime Ogawa, Ryoko Miyachi, Toshiyuki Sakata
CONCURRENT MUTATION OF ISOLATED OBJECT GRAPHS

Publication number: 20100275191

Abstract: Fine-grained parallelism within isolated object graphs is used to provide safe concurrent operations within the isolated object graphs. One example provides an abstraction labeled IsolatedObjectGraph that encapsulates at least one object graph, but often two or more object graphs, rooted by an instance of a type member. By encapsulating the object graph, no references from outside of the object graph are allowed to objects inside of the object graph. Also, the encapsulated object graph does not contain references to objects outside of the graphs. The isolated object graphs provide for safe data parallel operations, including safe data parallel mutations such as for-each loops. In an example, the ability to isolate the object graph is provided through type permissions.

Type: Application

Filed: April 24, 2009

Publication date: October 28, 2010

Applicant: MICROSOFT CORPORATION

Inventors: John J. Duffy, Niklas Gustafsson, Vance Morrison
Using a concurrent partial inspector loop with speculative parallelism

Patent number: 7823141

Abstract: A method for executing a loop in an application that includes executing iterations in a first segment of the loop by a base thread, logging memory transactions that occur during execution of iterations in the first segment by a co-inspector thread to obtain a co-inspector log, executing iterations in a second segment of the loop by a co-thread to obtain temporary results, logging memory transactions that occur during execution of iterations in the second segment to obtain a co-thread log, and comparing the co-inspector log and the co-thread log to determine whether a thread interdependency exists.
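
The final comparison step can be sketched as a set intersection over the two logs. The log format (a list of `(operation, address)` pairs) and the choice to flag only first-segment writes that the second segment read are assumptions; a real implementation may also need to consider other conflict kinds.

```python
def has_interdependency(co_inspector_log, co_thread_log):
    """Return True when the speculative second segment read an address that
    the first segment wrote, meaning its temporary results may be stale."""
    written_by_first = {addr for op, addr in co_inspector_log if op == "write"}
    read_by_second = {addr for op, addr in co_thread_log if op == "read"}
    return bool(written_by_first & read_by_second)
```

When the function returns False, the co-thread's temporary results could be committed under this model; otherwise the second segment would have to be re-executed after the first.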

Type: Grant

Filed: September 30, 2005

Date of Patent: October 26, 2010

Assignee: Oracle America, Inc.

Inventors: Phyllis E. Gustafson, Michael H. Paleczny, Christopher A. Vick, Olaf Manczak, Jay R. Freeman, Yuguang Wu
Automatic minimal build dependency determination and building an executable with source code

Patent number: 7818730

Abstract: The present invention provides a method and system for building an executable using only the necessary source modules or a reduced set of source modules. The complete list of necessary source modules can be determined by checking for dependency of any already identified necessary source modules. Hence, if any of the source modules belongs to a library, the entire library will not need to be compiled in order to use any necessary source module to build the executable. The present invention has the advantage that the executable takes shorter time to build and the executable is smaller in memory size so that it is easier to be ported to a target system. The present invention may also be used to minimize or reduce the memory needed to load a model so that only the elements/blocks that are used in the model are loaded into memory when a model loads.
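
Determining the complete list of necessary source modules amounts to a transitive closure over the dependency relation. The sketch below assumes the dependencies are already available as a dict from module name to the modules it uses; the function name and the file names in the usage note are hypothetical.

```python
def necessary_modules(roots, depends_on):
    """Collect every module reachable from the root modules."""
    needed = set()
    pending = list(roots)
    while pending:
        module = pending.pop()
        if module in needed:
            continue                      # already handled; also guards against cycles
        needed.add(module)
        pending.extend(depends_on.get(module, ()))
    return needed
```

If `main.c` uses only `libmath/sqrt.c`, which in turn uses `libmath/abs.c`, then an unrelated `libmath/sin.c` in the same library never enters the result and is never compiled.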

Type: Grant

Filed: June 30, 2006

Date of Patent: October 19, 2010

Assignee: The Math Works, Inc.

Inventors: Anthony Robert Ryan, James Carrick
Automated safe secure techniques for eliminating undefined behavior in computer software

Patent number: 7818729

Abstract: Automated (e.g., compiler implemented) techniques provide safe secure software development including techniques for testing and verifying software for determining and/or certifying that the software has certain characteristics and/or complies with certain properties. In another illustrative implementation, methods are provided whereby the consumer can verify, to any desired level of certainty, that software as delivered truly has the specified properties, and that the compiler used to produce that software can be trusted to provide those assurances.

Type: Grant

Filed: June 23, 2006

Date of Patent: October 19, 2010

Inventors: Thomas S. Plum, David M. Keaton
Method for loop reformulation

Patent number: 7814468

Abstract: A method for loop reformulation is provided such that a single exit ill-formed loop (SEIFL) can be reformulated into a reformulated code block that contains a transformed well-formed loop (TWFL). A SEIFL loop is a loop that can exit from the loop body of the loop. After the loop reformulation, the TWFL of the reformulated code block can only exit from the end of the loop. The reformulated code block will replace the SEIFL in the compiler's internal representation (IR) such that a more efficient executable machine code can be generated by optimizing the reformulated compiler's IR.

Type: Grant

Filed: April 20, 2005

Date of Patent: October 12, 2010

Assignee: Oracle America, Inc.

Inventors: Yonghong Song, Xiangyun Kong
Array compression method

Patent number: 7805413

Abstract: A program stored in a storage device is read. Partial compression of an array in a loop nest in the program is performed by replacing each element that is local only to the loop nest, within the entire program, with a scalar variable. Access to the original array is inserted into the program for a non-local element.
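
A before-and-after pair may make the partial compression easier to picture. In this sketch (function names hypothetical, input assumed non-empty) only element 0 of the temporary array is read after the loop, so it is treated as the single non-local element and keeps its array storage, while every other element is replaced by a scalar.

```python
def original(values):
    tmp = [0] * len(values)
    total = 0
    for i, v in enumerate(values):
        tmp[i] = v * v
        total += tmp[i] + 1
    return total, tmp[0]          # only tmp[0] is read outside the loop


def compressed(values):
    tmp = [0]                     # storage kept only for the non-local element
    total = 0
    for i, v in enumerate(values):
        t = v * v                 # scalar replaces the loop-local array element
        if i == 0:
            tmp[0] = t            # inserted access to the original array
        total += t + 1
    return total, tmp[0]
```

Both versions return the same result, but the compressed one needs one array slot instead of `len(values)`.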

Type: Grant

Filed: December 22, 2003

Date of Patent: September 28, 2010

Assignee: Fujitsu Limited

Inventor: Akira Hosoi
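A minimal sketch of the partial array compression described above, assuming a temporary array `tmp` whose elements are used only inside the loop except for the last one, which is read afterwards. All names here are hypothetical, and the inputs are assumed to be non-empty lists of equal length.

```python
def original(a, b):
    """tmp[i] is written and read inside the loop; only tmp[n-1] is used later."""
    n = len(a)
    tmp = [0.0] * n
    out = [0.0] * n
    for i in range(n):
        tmp[i] = a[i] + b[i]
        out[i] = tmp[i] * 2.0
    return out, tmp[n - 1]


def partially_compressed(a, b):
    """Loop-local elements become the scalar t; the non-local element still
    goes through the original array."""
    n = len(a)
    tmp = [0.0] * n            # kept only for the non-local element
    out = [0.0] * n
    for i in range(n):
        t = a[i] + b[i]        # scalar replaces the loop-local tmp[i]
        out[i] = t * 2.0
        if i == n - 1:
            tmp[i] = t         # inserted access to the original array
    return out, tmp[n - 1]
```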
COMPILER, COMPILE METHOD, AND PROCESSOR CORE CONTROL METHOD AND PROCESSOR

Publication number: 20100235611

Abstract: A compiler that compiles source code and is implemented in a plurality of processor cores includes a parallel loop processing detection unit configured to detect from the source code a loop processing code for execution of an internal processing operation for a given number of repeating times, and an independent parallel loop processing code in the internal processing operation performed for each repetition to be concurrently processed, and a dynamic parallel conversion unit configured to generate a control core code for control of the number of repeating times in the parallel loop processing code and a parallel processing code for changing the number of repeating times corresponding to the control from the control core code.

Type: Application

Filed: March 18, 2010

Publication date: September 16, 2010

Applicant: FUJITSU LIMITED

Inventor: Koichiro YAMASHITA
Estimating a dominant resource used by a computer program

Patent number: 7797692

Abstract: A system that estimates a dominant computational resource which is used by a computer program. During operation, for each basic block in the computer program, the system determines a nesting level for the basic block. Next, the system selects basic blocks with nesting levels greater than a specified threshold. For each selected basic block, the system analyzes the basic block to estimate the dominant computational resource used by the basic block. The system then uses the estimated dominant computational resources for the selected basic blocks to estimate the dominant computational resource for the computer program.

Type: Grant

Filed: May 12, 2006

Date of Patent: September 14, 2010

Assignee: Google Inc.

Inventor: Grzegorz J. Czajkowski
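The estimation steps in the abstract above can be sketched roughly as follows. The `BasicBlock` fields, the per-resource operation counts, and the majority vote used to combine per-block estimates are all assumptions for illustration; the abstract does not say how the per-block estimates are combined.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class BasicBlock:
    name: str
    nesting_level: int
    op_counts: dict            # hypothetical: resource name -> operation count


def dominant_resource_of_block(block):
    """Estimate a block's dominant resource as the one with the most operations."""
    return max(block.op_counts, key=block.op_counts.get)


def estimate_dominant_resource(blocks, threshold):
    """Select blocks nested deeper than threshold, then take a majority vote."""
    selected = [b for b in blocks if b.nesting_level > threshold]
    votes = Counter(dominant_resource_of_block(b) for b in selected)
    if not votes:
        return None            # no block is nested deeply enough
    return votes.most_common(1)[0][0]
```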
Systems and methods for affine-partitioning programs onto multiple processing units

Patent number: 7793278

Abstract: Systems and methods perform affine partitioning on a code stream to produce code segments that may be parallelized. The code segments include copies of the original code stream with conditional inserted that aid in parallelizing code. The conditional is formed by determining the constraints on a processor variable determined by the affine partitioning and applying the constraints to the original code stream.

Type: Grant

Filed: September 30, 2005

Date of Patent: September 7, 2010

Assignee: Intel Corporation

Inventors: Zhao Hui Du, Shih-Wei Liao, Gansha Wu, Guei-Yuan Lueh
Profiling of performance behaviour of executed loops

Patent number: 7784040

Abstract: A method and system for profiling performance behaviour of executed loops. For each invocation of a loop, a count of a measured event is incremented. A display is provided for a loop (209) showing the number of measured events for each of the loop's invocations. The code of a loop is instrumented (204) to obtain the count of loop invocations (207) and the occurrences of the measured event (208).

Type: Grant

Filed: November 15, 2005

Date of Patent: August 24, 2010

Assignee: International Business Machines Corporation

Inventors: Gad Haber, Marcel Zalmanovici
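As a rough illustration of per-invocation loop profiling, the sketch below instruments an inner loop so that each invocation gets its own count of a measured event. The `LoopProfile` class and the choice of "even element seen" as the measured event are hypothetical; the patent does not prescribe this interface.

```python
class LoopProfile:
    """Keeps one event counter per invocation of the profiled loop."""

    def __init__(self):
        self.events_per_invocation = []

    def begin_invocation(self):
        self.events_per_invocation.append(0)

    def record_event(self):
        self.events_per_invocation[-1] += 1


def sum_of_evens(rows, profile):
    total = 0
    for row in rows:
        profile.begin_invocation()       # the inner loop is invoked once per row
        for x in row:
            if x % 2 == 0:
                profile.record_event()   # measured event: an even element
                total += x
    return total
```

After a run, `profile.events_per_invocation` holds the series that a display could plot, one value per invocation of the loop.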
Model checking with bounded context switches

Patent number: 7779382

Abstract: Validity of one or more assertions for any concurrent execution of a plurality of software instructions with at most k-1 context switches can be determined. Validity checking can account for execution of the software instructions in an unbounded stack depth scenario. A finite data domain representation can be used. The software instructions can be represented by a pushdown system. Validity checking can account for thread creation during execution of the plurality of software instructions.

Type: Grant

Filed: December 10, 2004

Date of Patent: August 17, 2010

Assignee: Microsoft Corporation

Inventors: Niels Jakob Rehof, Shaz Qadeer
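To make the context-switch bound concrete, here is a brute-force sketch that explores every interleaving of a few threads with at most k-1 context switches and checks an assertion after each step. It is an explicit-state enumeration over straight-line threads, not the pushdown-system technique the abstract refers to. The step encoding and the helper names `read` and `write` are assumptions.

```python
def check_bounded(threads, initial_state, assertion, k):
    """Return True if assertion holds in every state reachable with at most
    k-1 context switches. Each thread is a list of functions state -> state."""
    n = len(threads)

    def explore(state, pcs, current, switches):
        if not assertion(state):
            return False
        for t in range(n):
            if pcs[t] == len(threads[t]):
                continue                      # thread t has finished
            cost = 0 if current is None or t == current else 1
            if switches + cost > k - 1:
                continue                      # would exceed the switch bound
            new_state = threads[t][pcs[t]](dict(state))
            new_pcs = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
            if not explore(new_state, new_pcs, t, switches + cost):
                return False
        return True

    return explore(dict(initial_state), (0,) * n, None, 0)


def read(tmp):
    """Step that copies the shared x into a thread-local name."""
    return lambda s: {**s, tmp: s["x"]}


def write(tmp):
    """Step that stores local + 1 back into x and marks the thread done."""
    return lambda s: {**s, "x": s[tmp] + 1, "done": s["done"] + 1}
```

With two threads that each perform a non-atomic increment, the lost-update bug needs two context switches to show up, so the check passes for k = 2 and fails for k = 3.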
Non-Localized Constraints for Automated Program Generation

Publication number: 20100205589

Abstract: A method and a system for non-locally constraining a plurality of related but separated program entities (e.g., a loop operation and a related accumulation operation within the loop's scope) such that any broad program transformation affecting both will have the machinery to assure that the changes to both entities will preserve the invariant properties of and dependencies among them. For example, if a program transform alters one entity (e.g., re-expresses an accumulation operation as a vector operation incorporating some or all of the loop's iteration) the constraint will provide the machinery to assure a compensating alteration of the other entities (e.g., the loop operation is reduced to reflect the vectorization of the accumulation operation). One realization of this method comprises specialized instances of the related entities that while retaining their roles as program entities (i.e., operators), also contain data and machinery to define the non-local constraint relationship.

Type: Application

Filed: April 25, 2010

Publication date: August 12, 2010

Inventor: Ted James Biggerstaff
Method and system for performing reassociation in software loops

Patent number: 7774766

Abstract: Various embodiments of the present invention relate to methods and systems for optimizing an intermediate code in a compilation logic. The intermediate code is optimized by performing reassociation in software loops. The intermediate code includes at least one critical recurrence cycle. The performance of reassociation in software loops can reduce a critical recurrence cycle in them, which can speed up their execution. The subject method can include the determination of one or more critical recurrence cycles in a software loop. The method can also include the determination of at least one edge in a critical recurrence cycle, with respect to which reassociation can be performed, if one or more pre-determined criteria are met. The method can further include performing reassociation of a dependee and a dependent of an edge. In an embodiment, when one or more pre-determined criteria are met, the logic of the software loop is maintained after performing reassociation of the dependee and the dependent of the edge.

Type: Grant

Filed: September 29, 2005

Date of Patent: August 10, 2010

Assignee: Intel Corporation

Inventors: Kalyan Muthukumar, Daniel M Lavery
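The effect of reassociation on a recurrence can be seen in a small accumulation loop. In the first version below, both additions depend on the previous value of `s`. In the second, one addition is moved off the recurrence so that only a single operation per iteration depends on the prior iteration. This is a sketch with hypothetical function names, using integers so both versions produce identical results; with floating-point values, reassociation can change rounding.

```python
def accumulate_original(a, b):
    s = 0
    for x, y in zip(a, b):
        s = (s + x) + y        # two dependent additions on the recurrence
    return s


def accumulate_reassociated(a, b):
    s = 0
    for x, y in zip(a, b):
        t = x + y              # independent of s, off the recurrence cycle
        s = s + t              # one addition left on the recurrence
    return s
```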
Methods And Apparatus For Local Memory Compaction

Publication number: 20100192138

Abstract: Methods, apparatus and computer software product for local memory compaction are provided. In an exemplary embodiment, a processor in connection with a memory compaction module identifies inefficiencies in array references contained within received source code, allocates a local array and maps the data from the inefficient array reference to the local array in a manner which improves the memory size requirements for storing and accessing the data. In another embodiment, a computer software product implementing a local memory compaction module is provided. In a further embodiment a computing apparatus is provided. The computing apparatus is configured to improve the efficiency of data storage in array references. This Abstract is provided for the sole purpose of complying with the Abstract requirement rules. This Abstract is submitted with the explicit understanding that it will not be used to interpret or to limit the scope or the meaning of the claims.

Type: Application

Filed: February 4, 2009

Publication date: July 29, 2010

Inventors: Allen K. Leung, Benoit J. Meister, David E. Wohlford, Nicolas T. Vasilache, Richard A. Lethin
Compiler with cache utilization optimizations

Patent number: 7765534

Abstract: A compiling program with cache utilization optimizations employs an inter-procedural global analysis of the data access patterns of compile units to be processed. The global analysis determines sufficient information to allow intelligent application of optimization techniques to be employed to enhance the operation and utilization of the available cache systems on target hardware.

Type: Grant

Filed: April 30, 2004

Date of Patent: July 27, 2010

Assignee: International Business Machines Corporation

Inventors: Roch G. Archambault, Robert J. Blainey, Yaoqing Gao
Generating efficient parallel code using partitioning, coalescing, and degenerative loop and guard removal

Patent number: 7757222

Abstract: Code is affine partitioned to generate affine partitioning mappings. Parallel code is generated based on the affine partitioning mappings. Generating the parallel code includes coalescing loops in the parallel code generated from the affine partitioning mappings to generate coalesced parallel code and optimizing the coalesced parallel code.

Type: Grant

Filed: September 30, 2005

Date of Patent: July 13, 2010

Assignee: Intel Corporation

Inventors: Shih-wei Liao, Zhao Hui Du, Bu Qi Cheng, Gansha Wu, Guei-Yuan Lueh
Method and program for generating execution code for performing parallel processing

Patent number: 7739530

Abstract: Provided is a method of reliably reducing power consumption of a computer, while promoting prompt compilation of a source code and execution of an output code. The method according to this invention includes the steps of: reading a code which is preset and analyzing an amount of operation of the CPU and an access amount with respect to the cache memory based on the code; obtaining an execution rate of the CPU and an access rate with respect to the cache memory based on the amount of operation and the access amount; determining an area in which the access rate with respect to the cache memory is higher than the execution rate of the CPU, based on the code; adding a code for enabling the power consumption reduction function to the area; and generating an execution code executable on the computer, based on the code.

Type: Grant

Filed: February 16, 2007

Date of Patent: June 15, 2010

Assignee: Hitachi, Ltd.

Inventors: Koichi Takayama, Naonobu Sukegawa
Efficient generation of SIMD code in presence of multi-threading and other false sharing conditions and in machines having memory protection support

Patent number: 7730463

Abstract: A computer implemented method, system and computer program product for automatically generating SIMD code. The method begins by analyzing data to be accessed by a targeted loop including at least one statement, where each statement has at least one memory reference, to determine if memory accesses are safe. If memory accesses are safe, the targeted loop is simdized. If not safe, it is determined if a scheme can be applied in which safety need not be guaranteed. If such a scheme can be applied, the targeted loop is simdized according to the scheme. If such a scheme cannot be applied, it is determined if padding is appropriate. If padding is appropriate, the data is padded and the targeted loop is simdized. If padding is not appropriate, non-simdized code is generated based on the targeted loop for handling boundary conditions, the targeted loop is simdized and combined with the non-simdized code.

Type: Grant

Filed: February 21, 2006

Date of Patent: June 1, 2010

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu, Peng Zhao
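The decision cascade in the abstract above can be written out as a small function. The three boolean inputs stand in for analyses the compiler would perform, and the returned strings are hypothetical labels, not terms from the patent.

```python
def choose_simd_strategy(accesses_safe, unguarded_scheme_applies, padding_appropriate):
    """Pick a code-generation strategy in the order the abstract describes."""
    if accesses_safe:
        return "simdize"
    if unguarded_scheme_applies:
        # A scheme in which safety need not be guaranteed.
        return "simdize with scheme"
    if padding_appropriate:
        return "pad data, then simdize"
    # Fall back: scalar code for boundary conditions plus a simdized loop.
    return "simdize with non-simdized boundary code"
```

Each check is reached only when every earlier one has failed, so the last branch always produces a loop that combines SIMD and scalar code.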
Method and apparatus for modulo scheduled loop execution in a processor architecture

Patent number: 7725696

Abstract: A processor method and apparatus that allows for the overlapped execution of multiple iterations of a loop while allowing the compiler to include only a single copy of the loop body in the code while automatically managing which iterations are active. Since the prologue and epilogue are implicitly created and maintained within the hardware in the invention, a significant reduction in code size can be achieved compared to software-only modulo scheduling. Furthermore, loops with iteration counts less than the number of concurrent iterations present in the kernel are also automatically handled. This hardware enhanced scheme achieves the same performance as the fully-specified standard method. Furthermore, the hardware reduces the power requirement as the entire fetch unit can be deactivated for a portion of the loop's execution.

Type: Grant

Filed: October 4, 2007

Date of Patent: May 25, 2010

Inventors: Wen-mei W. Hwu, Matthew C. Merten
Efficient protocol for encoding software pipelined loop when PC trace is enabled

Patent number: 7721267

Abstract: A software pipelined loop tracing method involves inhibiting an output of trace data at a start of a software pipelined loop (SPLOOP). A skip in an output trace packet is indicated if the SPLOOP is skipped, and the SPLOOP is indicated at a cycle of an epilog state in the output trace packet if the SPLOOP is not skipped. An iteration count indication SPLOOP information and a position within a SPLOOP, is maintained. A periodic SPLOOP marker (PerSP) coinciding with a sync point is output if the SPLOOP is active.

Type: Grant

Filed: May 16, 2006

Date of Patent: May 18, 2010

Assignee: Texas Instruments Incorporated

Inventor: Manisha Agarwala
Method for predicate promotion in a software loop

Patent number: 7712091

Abstract: A method and system for optimizing the execution of a software loop is provided. The method involves the determination of an edge in a critical recurrence cycle in the software loop. The edge is a dependency link between two instructions and contains a dependee and a dependent. The dependee is an instruction that produces a result, and the dependent is an instruction that uses the result. The method further involves performing predicate promotion of at least one of the dependee and the dependent if one or more pre-determined conditions are met.

Type: Grant

Filed: September 30, 2005

Date of Patent: May 4, 2010

Assignee: Intel Corporation

Inventors: Kalyan Muthukumar, Robyn A. Sampson, Daniel Lavery
Architecture for a computer-based development environment with self-contained components and a threading model

Patent number: 7707543

Abstract: A method, a device and a system arrangement are disclosed for generating self-contained software components having in each case synchronous and/or asynchronous interfaces with an internal threading model. The concept disclosed enables all necessary synchronization mechanisms to be provided automatically. The concept is based on an asynchronous operation manager used to divert callbacks from a called component into a calling component.

Type: Grant

Filed: November 22, 2005

Date of Patent: April 27, 2010

Assignee: Siemens Aktiengesellschaft

Inventors: Detlef Becker, Karlheinz Dorn, Vladyslav Ukis, Hans-Martin Von Stockhausen
Compiler apparatus with flexible optimization

Patent number: 7698696

Abstract: A compiler comprises an analysis unit that detects directives (options and pragmas) from a user to the compiler, an optimization unit that is made up of a processing unit (a global region allocation unit, a software pipelining unit, a loop unrolling unit, an “if” conversion unit, and a pair instruction generation unit) that performs individual optimization processing designated by options and pragmas from a user, following the directives and the like from the analysis unit, etc. The global region allocation unit performs optimization processing, following designation of the maximum data size of variables to be allocated to a global region, designation of variables to be allocated to the global region, and options and pragmas regarding designation of variables not to be allocated in the global region.

Type: Grant

Filed: June 30, 2003

Date of Patent: April 13, 2010

Assignee: Panasonic Corporation

Inventors: Hajime Ogawa, Taketo Heishi, Toshiyuki Sakata, Shuichi Takayama, Shohei Michimoto, Tomoo Hamada, Ryoko Miyachi
Instruction stream control

Patent number: 7689735

Abstract: An interface requests instructions from a data store storing instructions of an application to be processed by a data processor, and receives and transmits the instructions to the data processor. The interface includes: an input that receives the instructions from the data store via at least one input bus; a buffer that stores received instructions; an output that outputs instructions to the data processing apparatus via the output bus; a control signal input that receives a control signal; and a buffer controller that controls the buffer to request an instruction subsequent to a previously received instruction within an instruction stream of the application from the data store in response to detection of no control signal on the control signal input and to detection of available buffer storage capacity.

Type: Grant

Filed: October 3, 2005

Date of Patent: March 30, 2010

Assignee: ARM Limited

Inventors: Martinus Cornelis Wezelenburg, Dirk Duerinckx, Jan Guffens
Splitting the computation space to optimize parallel code

Patent number: 7689980

Abstract: Linear transformations of statements in code are performed to generate linear expressions associated with the statements. Parallel code is generated using the linear expressions. Generating the parallel code includes splitting the computation-space of the statements into intervals and generating parallel code for the intervals.

Type: Grant

Filed: September 30, 2005

Date of Patent: March 30, 2010

Assignee: Intel Corporation

Inventors: Zhao Hui Du, Shih-wei Liao, Gansha Wu, Guei-Yuan Lueh
