Including Loop Patents (Class 717/160)
-
Publication number: 20110161923Abstract: The system includes a command set defining a plurality of navigation commands for an audiovisual reproduction apparatus and a human-oriented scripting program for automatically authoring a navigation structure for use in a stand alone audiovisual product playable in the audiovisual reproduction apparatus. The scripting program includes an iterative loop with a variable adjusted according to the iterations of the loop. The scripting program is operable to automatically, for each iteration of the loop; select from the plurality of navigation commands a navigation command defined according to the variable as adjusted for each iteration of the loop; and add the navigation command to an intermediate representation of the navigation structure. An associated method is also provided.Type: ApplicationFiled: April 19, 2005Publication date: June 30, 2011Applicant: ZOOtech LimitedInventor: Stuart Green
-
Patent number: 7962906Abstract: A compiler includes a mechanism for employing multiple synergistic processors to execute long vectors. The compiler receives a single source program. The compiler identifies vectorizable loop code in the single source program and extracts the vectorizable loop code from the single source program. The compiler then compiles the extracted vectorizable loop code for a plurality of synergistic processors. The compiler also compiles a remainder of the single source program for a principal processor to form an executable main program such that the executable main program controls operation of the executable vectorizable loop code on the plurality of synergistic processors.Type: GrantFiled: March 15, 2007Date of Patent: June 14, 2011Assignee: International Business Machines CorporationInventors: John Kevin Patrick O'Brien, Kathryn M. O'Brien, Daniel Arthur Prener
-
Patent number: 7937695Abstract: Based on operations within an uncounted loop of source code, one or more calculations are generated for determining, at runtime, an expected number of iterations through which the uncounted loop can iterate before encountering an exception corresponding to at least one target exception check. A copy of the uncounted loop omitting each target exception check is generated. The uncounted loop, the copy of the uncounted loop, and the one or more calculations are arranged in compiled code so that at runtime program flow enters the copy of the uncounted loop. If a maximum number of iterations of the copy of the uncounted loop is reached, program flow proceeds from the copy of the uncounted loop to the uncounted loop. The maximum number of iterations is no more than the smallest member of a set consisting of the expected number of iterations for each target exception check.Type: GrantFiled: April 27, 2007Date of Patent: May 3, 2011Assignee: International Business Machines CorporationInventor: Mark Graham Stoodley
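As an illustration only, the loop-versioning idea in this abstract can be sketched in Python. The function names, the sentinel-terminated loop, and the single explicit bounds check are assumptions made for the example and are not taken from the patent; Python also performs its own index checks, so the explicit check merely stands in for a compiler-inserted one.

```python
def checked_loop(data, i, sentinel, total=0):
    """Original uncounted loop: every iteration carries the exception check."""
    while True:
        if i >= len(data):                 # target exception check
            raise IndexError("ran past the end of data")
        if data[i] == sentinel:
            return total, i
        total += data[i]
        i += 1


def versioned_loop(data, i, sentinel):
    """Enter a check-free copy first, then continue in the checked loop."""
    total = 0
    # Runtime calculation: iterations guaranteed not to trip the check.
    safe_iters = max(0, len(data) - i)
    for _ in range(safe_iters):            # copy of the loop without the check
        if data[i] == sentinel:
            return total, i
        total += data[i]
        i += 1
    # Iteration budget used up: flow proceeds to the original loop.
    return checked_loop(data, i, sentinel, total)
```

Both versions return the same result or raise the same exception; the copy simply avoids testing the bound on iterations where it cannot fail.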
-
Patent number: 7908256Abstract: A computer-implementable method, system and computer-usable medium. One or more objects among a plurality of objects can be processed utilizing a data-processing apparatus/system. One or more lock reservations can be applied among a group of lock reservations over multiple sequential lock operations with respect to the particular object. Thereafter, the lock reservation can be cancelled with respect to the last monitor exit operation in order to eliminate lock operations where traditional lock coarsening cannot be applied.Type: GrantFiled: November 30, 2007Date of Patent: March 15, 2011Assignee: International Business Machines CorporationInventors: Nikola Grcevski, Peter Burka
-
Publication number: 20110047534Abstract: A system and method for optimization of code with non-adjacent loops. A compiler builds a node tree, which is not a control flow graph, that represents parent-child relationships of nodes of a computer program. Each node represents a control flow statement or a straight-line block of statements of the computer program. If a non-adjacent loop pair of nodes satisfy predetermined conditions, the compiler may perform legal code transformations on the computer program and corresponding node transformations on the node tree. These transformations may make adjacent this pair of loop nodes. The compiler may be configured to perform legal code transformations, such as head and tail duplication, code motion, and if-merging, in order to make adjacent these two loop nodes. Then loop fusion may be performed on this loop pair in order to increase instruction level parallelism (ILP) within an optimized version of the original source code.Type: ApplicationFiled: August 22, 2009Publication date: February 24, 2011Inventors: Mei Ye, Dinesh Suresh, Dz-ching Ju, Michael Lai
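A minimal sketch of the end result described here, assuming a single independent statement sits between two loops over the same range. The function names are hypothetical, and the node-tree analysis that decides legality is not modeled.

```python
def unfused(a, b):
    """Two loops over the same range, separated by an unrelated statement."""
    doubled = []
    for x in a:
        doubled.append(x * 2)
    scale = len(b) + 1            # statement that keeps the loops non-adjacent
    shifted = []
    for x in a:
        shifted.append(x + scale)
    return doubled, shifted


def fused(a, b):
    """Code motion hoists the statement, then the adjacent loops are fused."""
    scale = len(b) + 1            # moved above the first loop (it reads only b)
    doubled, shifted = [], []
    for x in a:                   # single fused loop body
        doubled.append(x * 2)
        shifted.append(x + scale)
    return doubled, shifted
```

The hoist is legal in this sketch because the moved statement neither reads nor writes anything the first loop touches.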
-
Patent number: 7890942Abstract: A method and system for substituting array values (i.e., expressions) in a program at compile time. An initialization of an array is identified in a loop. The initialization is an assignment of an expression (i.e., a constant or a function of an induction variable to elements of the array). The expression is stored in a table that associates the expression with the array and indices of the array. An assignment statement is detected that is to assign at least one element of the initialized elements. The expression is retrieved from the table based on the expression being associated with the array and corresponding indices. The expression is substituted for the at least one element so that the expression is to be assigned by the assignment statement. The process of substituting array values is extended to interprocedural analysis.Type: GrantFiled: August 15, 2006Date of Patent: February 15, 2011Assignee: International Business Machines CorporationInventor: Rohini Nair
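The substitution described in this abstract can be pictured with a small Python sketch. The helper `init_expr` plays the role of the table entry and is a hypothetical name; the example assumes the array has at least three elements and that nothing overwrites it between initialization and use.

```python
def sum_from_array(n):
    """Original form: initialize the array in a loop, then read two elements."""
    a = [0] * n
    for i in range(n):
        a[i] = 3 * i + 1          # expression is a function of induction variable i
    return a[2] + a[n - 1]


def init_expr(i):
    """Hypothetical table entry recorded for array a: its initializing expression."""
    return 3 * i + 1


def sum_substituted(n):
    """After substitution: reads of a[2] and a[n - 1] become the expression."""
    covered = range(n)            # indices the initialization loop is known to cover
    assert 2 in covered and (n - 1) in covered
    return init_expr(2) + init_expr(n - 1)
```

Once every read is substituted, the array and its initialization loop may become dead code that a later pass can remove.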
-
Patent number: 7890943Abstract: Instructions that have no dependence constraint between them and other instructions in a loop of a critical section may be moved out of the critical section so that the size of the critical section may be reduced. A flow graph of a program including the critical section may be generated, which includes loops. The flow graph may be transformed based on which any unnecessary instructions in loops may be moved out of the critical section. Subsequently, the original flow graph of the critical section may be recovered from the transformed flow graph.Type: GrantFiled: March 30, 2007Date of Patent: February 15, 2011Assignee: Intel CorporationInventors: Xiaofeng Guo, Jinquan Dai, Long Li
-
Patent number: 7890940Abstract: To collect frequencies with which processes of a program are executed at high speed. A compiler apparatus for optimizing a program based on frequencies with which each process is executed has a loop process detection portion for detecting a repeatedly executed loop process of the program, a loop process frequency collection portion for collecting loop process frequencies with which the loop process is executed in the program, an in-loop process frequency collection portion for collecting in-loop process frequencies with which, as against times of execution of loop process, each of a plurality of in-loop processes included in the loop process is executed, an in-loop execution information generating portion for generating in-loop execution information indicating the frequencies with which each of the plurality of in-loop processes is executed in the case where the program is executed, and an optimization portion for optimizing the program based on the in-loop execution information.Type: GrantFiled: January 11, 2008Date of Patent: February 15, 2011Assignee: International Business Machines CorporationInventors: Hideaki Komatsu, Toshio Suganuma, Toshiaki Yasue
-
Publication number: 20110029962Abstract: A method for vectorization of a block of code is provided. The method comprises receiving a first block of code as input; and converting the first block of code into at least a second block of code and a third block of code. The first block of code accesses a first set of memory addresses that are potentially misaligned. The second block of code performs conditional leaping address incrementation to selectively access a first subset of the first set of memory addresses. The third block of code accesses a second subset of the first set of memory addresses starting from an aligned memory address, simultaneously accessing multiple memory addresses at a time. No memory address belongs to both the first subset and the second subset of memory addresses.Type: ApplicationFiled: July 28, 2009Publication date: February 3, 2011Applicant: International Business Machines CorporationInventors: Dorit Nuzman, Ira Rosen, Ayal Zaks
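The sketch below shows the general shape of splitting a potentially misaligned loop into a scalar part and an aligned, multi-element part. It uses ordinary loop peeling rather than the conditional leaping address incrementation claimed in the abstract, and the vector length `VL`, the flat `mem` list, and the scalar epilogue are all assumptions made for the example.

```python
VL = 4  # assumed vector length, in elements


def scale_scalar(mem, base, n, k):
    """Reference loop: one element per iteration, any starting address."""
    for i in range(n):
        mem[base + i] *= k


def scale_peeled(mem, base, n, k):
    """Peel scalar iterations until the address is aligned, then go VL-wide."""
    i = 0
    # Scalar prologue: handles the addresses before the first aligned one.
    while i < n and (base + i) % VL != 0:
        mem[base + i] *= k
        i += 1
    # Aligned body: each step touches VL consecutive addresses at once.
    while i + VL <= n:
        addr = base + i
        mem[addr:addr + VL] = [v * k for v in mem[addr:addr + VL]]
        i += VL
    # Scalar epilogue for any leftover elements (an assumption of this sketch).
    while i < n:
        mem[base + i] *= k
        i += 1
```

Each address is visited by exactly one of the three loops, which mirrors the abstract's requirement that the subsets do not overlap.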
-
Patent number: 7882498Abstract: Provided are a method, system, and program for parallelizing source code with a compiler. Source code including source code statements is received. The source code statements are processed to determine a dependency of the statements. Multiple groups of statements are determined from the determined dependency of the statements, wherein statements in one group are dependent on one another. At least one directive is inserted in the source code, wherein each directive is associated with one group of statements. Resulting threaded code is generated including the inserted at least one directive. The group of statements to which the directive in the resulting threaded code applies are processed as a separate task. Each group of statements designated by the directive to be processed as a separate task may be processed concurrently with respect to other groups of statements.Type: GrantFiled: March 31, 2006Date of Patent: February 1, 2011Assignee: Intel CorporationInventors: Guilherme D. Ottoni, Xinmin Tian, Hong Wang, Richard A. Hankins, Wei Li, John Shen
-
Patent number: 7877739Abstract: A computer-implemented method for determining whether an array within a loop can be privatized for that loop is presented. The method calculates the array sections that require first or last privatization and copies only those sections, reducing the privatization overhead of the known solutions.Type: GrantFiled: October 9, 2006Date of Patent: January 25, 2011Assignee: International Business Machines CorporationInventors: Roch G. Archambault, Erik P. Charlebois, Guansong Zhang
-
Patent number: 7873954Abstract: Stack signature marking segments are inserted into re-entrant programming source code modules prior to compilation of the modules at each code module entry point and at each code module exit point, followed by producing one or more executable programs from the programming source code modules. Upon execution of instances of the executable programs, the inserted segments assign unique, non-duplicated module identifier values to the instances of the code modules, generate an instance count for each instantiation of executable code module in the stack signature for each object instance dynamically created during runtime of a re-entrant executable code module, and push onto a processing stack the module identifier values and the instance counts within stack frames allocated to each of the executable program instances.Type: GrantFiled: May 12, 2006Date of Patent: January 18, 2011Assignee: International Business Machines CorporationInventors: Lorin Ullmann, Allen Chester Wynn
-
Using transactional memory for precise exception handling in aggressive dynamic binary optimizations
Patent number: 7865885Abstract: Dynamic optimization of application code is performed by selecting a portion of the application code as a possible transaction. A transaction has a property that when it is executed, it is either atomically committed or atomically aborted. Determining whether to convert the selected portion of the application code to a transaction includes determining whether to apply at least one of a group of code optimizations to the portion of the application code. If it is determined to apply at least one of the code optimizations of the group of optimizations to the portion of application code, then the optimization is applied to the portion of the code and the portion of the code is converted to a transaction.Type: GrantFiled: September 27, 2006Date of Patent: January 4, 2011Assignee: Intel CorporationInventors: Youfeng Wu, Cheng Wang, Ho-seop Kim
-
Patent number: 7865886Abstract: A method and apparatus for blocking nested loops having feedback or feedforward indexing. An embodiment of a method includes receiving a computer code segment, the segment including a first inner loop and a second outer loop, the inner loop being within the outer loop and the inner loop having a one-dimensional iteration space that is independent of the outer loop. The first loop is indexed by a variable I over a contiguous one-dimensional iteration space and addresses one or more data arrays with a shift in the index. The method further includes dividing a two-dimensional iteration space of the first loop and the second loop into multiple contiguous windows, where the second loop uses only one window of the plurality of windows during each iteration and the plurality of windows cover the iteration space. The method includes modifying the computer code segment by adding a third outer loop outside the second loop of the segment, the third loop encompassing the first loop and the second loop.Type: GrantFiled: November 28, 2005Date of Patent: January 4, 2011Assignee: Intel CorporationInventor: Hans-Joachim Plum
-
Patent number: 7856629Abstract: A compiler apparatus, which can perform software pipelining optimization that has a considerable effect of reducing the number of execution cycles taken to complete a loop process, converts a source program into a machine program for a processor which is capable of parallel processing. The compiler apparatus is composed of: a parsing unit operable to parse the source program and then to convert the source program into an intermediate program which is described in an intermediate language; an optimization unit operable to optimize the intermediate program; and a conversion unit operable to convert the optimized intermediate program into the machine language program, wherein the optimization unit is operable to execute software pipelining, by inserting a transfer instruction, which is used for transferring data between operands, into a loop process included in the intermediate program so that a data dependence relation is changed.Type: GrantFiled: May 24, 2006Date of Patent: December 21, 2010Assignee: Panasonic CorporationInventors: Shohei Michimoto, Taketo Heishi, Hajime Ogawa, Teruo Kawabata
-
Publication number: 20100318980Abstract: Described is an analysis tool/techniques for determining the computational complexity of a computer program, including when the program includes procedures having nested loops and/or multi-path loops. First, multi-path loops are converted into code-fragments consisting of simpler loops via a transformation called control flow refinement. Progress invariants are determined for appropriate locations in the procedure to represent relationships between a state that can arise at that program location and the previous state at that location. A bound finding mechanism (such as one based on pattern matching) is then used to compute loop bounds from progress invariants. These bounds are then composed appropriately to determine a precise bound for the enclosing procedure.Type: ApplicationFiled: June 13, 2009Publication date: December 16, 2010Applicant: Microsoft CorporationInventors: Sumit Gulwani, Sagar Jain, Eric J. Koskinen
-
Publication number: 20100318979Abstract: A system and method of compiling program code, wherein the program code includes an operation on an array of data elements stored in memory of a computer system. The program code is scanned for an equation which may have recurring data points. The equation is then replaced with vectorized machine executable code, wherein the machine executable code comprises a nested loop and wherein the nested loop comprises an exterior loop and a virtual interior loop. The exterior loop decomposes the equation into a plurality of loops of length N, wherein N is an integer greater than one. The virtual interior loop executes vector operations corresponding to the N length loop to form a result vector resident in memory, wherein the virtual interior loop includes a vector atomic memory operation (AMO) instruction.Type: ApplicationFiled: June 12, 2009Publication date: December 16, 2010Applicant: Cray Inc.Inventor: Terry D. Greyzck
-
Patent number: 7849453Abstract: One embodiment of the present invention provides a system that generates code for software scouting the regions of a program. During operation, the system receives source code for a program. The system then compiles the source code. In the first step of the compilation process, the system identifies a first set of loops from a hierarchy of loops in the source code, wherein each loop in the first set of loops contains at least one effective prefetch candidate. Then, from the first set of loops, the system identifies a second set of loops where scout-mode prefetching is profitable. Next, for each loop in the second set of loops, the system produces executable code for a helper-thread which contains a prefetch instruction for each effective prefetch candidate. At runtime the helper-thread is executed in parallel with the main thread in advance of where the main thread is executing to prefetch data items for the main thread.Type: GrantFiled: November 9, 2005Date of Patent: December 7, 2010Assignee: Oracle America, Inc.Inventors: Partha P. Tirumalai, Yonghong Song, Spiros Kalogeropulos
-
Patent number: 7827542Abstract: A compiler apparatus that improves the performance of loop processing. The compiler apparatus translates a C program that includes a loop into a machine language program, and includes: a movement judgment unit that judges whether or not an instruction which is positioned outside of the loop of the C program can be moved into the loop, based on a state of live ranges of variables used in the instruction; a movement execution unit that moves the instruction into the loop in the case where the movement judgment unit judges that the instruction can be moved into the loop, thereby generating an intermediate program; and a translation unit that translates the intermediate program into the machine language program.Type: GrantFiled: September 25, 2006Date of Patent: November 2, 2010Assignee: Panasonic CorporationInventors: Hajime Ogawa, Ryoko Miyachi, Toshiyuki Sakata
-
Patent number: 7823141Abstract: A method for executing a loop in an application that includes executing iterations in a first segment of the loop by a base thread, logging memory transactions that occur during execution of iterations in the first segment by a co-inspector thread to obtain a co-inspector log, executing iterations in a second segment of the loop by a co-thread to obtain temporary results, logging memory transactions that occur during execution of iterations in the second segment to obtain a co-thread log, and comparing the co-inspector log and the co-thread log to determine whether a thread interdependency exists.Type: GrantFiled: September 30, 2005Date of Patent: October 26, 2010Assignee: Oracle America, Inc.Inventors: Phyllis E. Gustafson, Michael H. Paleczny, Christopher A. Vick, Olaf Manczak, Jay R. Freeman, Yuguang Wu
-
Patent number: 7814468Abstract: A method for loop reformulation is provided such that a single exit ill-formed loop (SEIFL) can be reformulated into a reformulated code block that contains a transformed well-formed loop (TWFL). A SEIFL loop is a loop that can exit from the loop body of the loop. After the loop reformulation, the TWFL of the reformulated code block can only exit from the end of the loop. The reformulated code block will replace the SEIFL in the compiler's internal representation (IR) such that a more efficient executable machine code can be generated by optimizing the reformulated compiler's IR.Type: GrantFiled: April 20, 2005Date of Patent: October 12, 2010Assignee: Oracle America, Inc.Inventors: Yonghong Song, Xiangyun Kong
-
Publication number: 20100257516Abstract: A method, apparatus and program product are provided for parallelizing analysis and optimization in a compiler. A plurality of basic blocks and a subset of data points of a computer program is prepared for processing by a main thread selected from a plurality of hardware threads. The plurality of prepared basic blocks and subset of data points are placed in a shared data structure by the main thread. A prepared basic block of the plurality of prepared basic blocks and/or a tuple associated with the subset of data points is concurrently retrieved from the shared data structure by a work thread selected from the plurality of hardware threads. A compiler analysis or optimization is performed on the prepared basic block or tuple by the work thread.Type: ApplicationFiled: April 2, 2009Publication date: October 7, 2010Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Robert R. Roediger, William J. Schmidt
-
Patent number: 7810032Abstract: A method and system for computing statistical parameters for sets of data items, by executing instructions of a computer program that is coded within a spreadsheet. Each set is generated in a time sequence that is specific to each set. For each time sequence, each data item is one data value or a pair of data values. The data items appear one-at-a-time in only one cell structure of the spreadsheet at each time in the time sequence. The one cell structure is a single cell or two cells. A loop of iterations is performed for each set. In each iteration, a command is responded to by updating the statistical parameters based on the latest data item in the one cell structure in the spreadsheet. The updated statistical parameters are stored in a parameter field of the spreadsheet assigned to each statistical parameter.Type: GrantFiled: September 13, 2005Date of Patent: October 5, 2010Assignee: International Business Machines CorporationInventors: Frederic Bauchot, Gerard Marmigere
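Outside a spreadsheet, the per-iteration update described here amounts to keeping running accumulators and folding in one data item at a time. The sketch below assumes single-value data items and tracks count, mean, and population variance through running sums; the class and attribute names are hypothetical and the choice of statistics is an assumption.

```python
class RunningStats:
    """Statistics updated one data item at a time, without storing the items."""

    def __init__(self):
        self.n = 0            # number of items seen so far
        self.total = 0.0      # running sum of values
        self.total_sq = 0.0   # running sum of squared values

    def update(self, x):
        """One loop iteration: fold the latest data item into the accumulators."""
        self.n += 1
        self.total += x
        self.total_sq += x * x

    @property
    def mean(self):
        return self.total / self.n

    @property
    def variance(self):
        """Population variance, computed from the running sums."""
        return self.total_sq / self.n - self.mean ** 2


stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:   # items arriving one at a time
    stats.update(value)
```

The sum-of-squares form is the simplest to show; for long or large-magnitude sequences a numerically steadier update would usually be preferred.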
-
Patent number: 7805413Abstract: A program stored in a storage device is read. Partial compression, in the element in an array in a loop nest in the program, is performed by replacing an element local only in the loop nest in the entire program with a scalar variable. Access to an original array is inserted into a program for a non-local element.Type: GrantFiled: December 22, 2003Date of Patent: September 28, 2010Assignee: Fujitsu LimitedInventor: Akira Hosoi
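A small sketch of this kind of partial array contraction, assuming a temporary array whose elements are used only inside the loop except for element 0, which is read afterwards. The function names and the choice of which element is non-local are assumptions made for illustration.

```python
def with_temp_array(a):
    """Original form: every intermediate value is stored in tmp."""
    n = len(a)
    tmp = [0] * n
    b = [0] * n
    for i in range(n):
        tmp[i] = a[i] * 2
        b[i] = tmp[i] + 1
    return b, tmp[0]              # only tmp[0] is read outside the loop


def with_scalar(a):
    """Loop-local elements live in a scalar; the non-local one is written back."""
    n = len(a)
    tmp = [0] * n                 # original array kept for the non-local element
    b = [0] * n
    for i in range(n):
        t = a[i] * 2              # scalar replaces the loop-local tmp[i]
        if i == 0:
            tmp[0] = t            # inserted access to the original array
        b[i] = t + 1
    return b, tmp[0]
```

Both functions assume a non-empty input, since element 0 is read after the loop.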
-
Publication number: 20100235819Abstract: In embodiments, prior to compilation into machine code, a preprocessor generates directives by processing a source code and/or bytecode representation of a program and/or selecting default directives. The preprocessor embeds the directives in a bytecode representation of the program or a separate stream associated with the bytecode representation of the program. A just-in-time compiler may compile the bytecode representation into machine code directed by the embedded directives in one pass and/or a bytecode interpreter may interpret the bytecode representation of the program. In some embodiments, a computing device generates bytecodes during execution of a program, selects default directives, and embeds the default directives in the bytecodes or a separate stream associated with the bytecodes prior to compilation of the bytecodes into machine code.Type: ApplicationFiled: March 10, 2009Publication date: September 16, 2010Applicant: Sun Microsystems, Inc.Inventor: John Robert Rose
-
Patent number: 7797692Abstract: A system that estimates a dominant computational resource which is used by a computer program. During operation, for each basic block in the computer program, the system determines a nesting level for the basic block. Next, the system selects basic blocks with nesting levels greater than a specified threshold. For each selected basic block, the system analyzes the basic block to estimate the dominant computational resource used by the basic block. The system then uses the estimated dominant computational resources for the selected basic blocks to estimate the dominant computational resource for the computer program.Type: GrantFiled: May 12, 2006Date of Patent: September 14, 2010Assignee: Google Inc.Inventor: Grzegorz J. Czajkowski
-
Patent number: 7793278Abstract: Systems and methods perform affine partitioning on a code stream to produce code segments that may be parallelized. The code segments include copies of the original code stream with a conditional inserted that aids in parallelizing code. The conditional is formed by determining the constraints on a processor variable determined by the affine partitioning and applying the constraints to the original code stream.Type: GrantFiled: September 30, 2005Date of Patent: September 7, 2010Assignee: Intel CorporationInventors: Zhao Hui Du, Shih-Wei Liao, Gansha Wu, Guei-Yuan Lueh
-
Patent number: 7788659Abstract: The present invention is a method of eliminating loops from a computer program by receiving the program, graphing its function and control, identifying its entry point, and identifying groups of loops connected to its entry point. Stop if there are no such groups. Otherwise, selecting a group of loops. Then, identifying the selected group's entry point. If the selected group includes no group of loops having a different entry point then replacing it with a recursive or non-recursive function, reconfiguring each connection entering and exiting the selected group to preserve their functionality, and returning to the fifth step. Otherwise, identifying groups of loops in the selected group connected to, but having different entry points and returning to the loop selection step.Type: GrantFiled: February 27, 2007Date of Patent: August 31, 2010Assignee: United States of America as represented by the Director, the National Security AgencyInventor: Francis S. Rimlinger
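The core replacement step, turning a loop into a recursive function with the same entry and exit behavior, can be shown on a single loop. This sketch does not model the graph analysis or the handling of nested loop groups described in the abstract; Euclid's algorithm is used only as a convenient example.

```python
def gcd_loop(a, b):
    """Loop form: control returns to the loop entry until b reaches zero."""
    while b != 0:
        a, b = b, a % b
    return a


def gcd_recursive(a, b):
    """Loop-free form: the back edge becomes a recursive call."""
    if b == 0:                    # loop exit condition
        return a
    return gcd_recursive(b, a % b)
```

The loop-carried variables become the function's parameters, and the loop exit becomes the base case.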
-
Publication number: 20100218196Abstract: Methods, apparatus and computer software product for source code optimization are provided. In an exemplary embodiment, a first custom computing apparatus is used to optimize the execution of source code on a second computing apparatus. In this embodiment, the first custom computing apparatus contains a memory, a storage medium and at least one processor with at least one multi-stage execution unit. The second computing apparatus contains at least two multi-stage execution units that allow for parallel execution of tasks. The first custom computing apparatus optimizes the code for parallelism, locality of operations and contiguity of memory accesses on the second computing apparatus. This Abstract is provided for the sole purpose of complying with the Abstract requirement rules. This Abstract is submitted with the explicit understanding that it will not be used to interpret or to limit the scope or the meaning of the claims.Type: ApplicationFiled: April 16, 2010Publication date: August 26, 2010Inventors: Allen K. Leung, Benoit Meister, Nicolas T. Vasilache, David E. Wohlford, Cedric Bastoul, Peter Szilagyi, Richard A. Lethin
-
Publication number: 20100205592Abstract: A system and method for discovering a set of possible iteration sequences for a given loop in a software program is described, to transform the loop representation. In a program containing a loop, the loop is partitioned into a plurality of portions based on splitting criteria. Labels are associated with the portions, and an initial loop automaton is constructed that represents the loop iterations as a regular language over the labels corresponding to the portions in the program. Subsequences of the labels are analyzed to determine infeasibility of the subsequences permitted in the automaton. The automaton is refined by removing all infeasible subsequences to discover a set of possible iteration sequences in the loop. The resulting loop automaton is used in a subsequent program verification or analysis technique to find violations of correctness properties in programs.Type: ApplicationFiled: February 8, 2010Publication date: August 12, 2010Applicant: NEC Laboratories America, Inc.Inventors: SRIRAM SANKARANARAYANAN, Aarti Gupta, Gogul Balakrishnan
-
Patent number: 7774766Abstract: Various embodiments of the present invention relate to methods and systems for optimizing an intermediate code in a compilation logic. The intermediate code is optimized by performing reassociation in software loops. The intermediate code includes at least one critical recurrence cycle. The performance of reassociation in software loops can reduce a critical recurrence cycle in them, which can speed up their execution. The subject method can include the determination of one or more critical recurrence cycles in a software loop. The method can also include the determination of at least one edge in a critical recurrence cycle, with respect to which reassociation can be performed, if one or more pre-determined criteria are met. The method can further include performing reassociation of a dependee and a dependent of an edge. In an embodiment, when one or more pre-determined criteria are met, the logic of the software loop is maintained after performing reassociation of the dependee and the dependent of the edge.Type: GrantFiled: September 29, 2005Date of Patent: August 10, 2010Assignee: Intel CorporationInventors: Kalyan Muthukumar, Daniel M Lavery
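To illustrate what reassociation does to a recurrence, the sketch below regroups an accumulation so that only one addition depends on the previous iteration. The function names are hypothetical and the selection criteria mentioned in the abstract are not modeled.

```python
def sum_chain(a, b):
    """Two dependent additions sit on the loop-carried recurrence for s."""
    s = 0
    for x, y in zip(a, b):
        s = (s + x) + y
    return s


def sum_reassociated(a, b):
    """After reassociation only one addition per iteration depends on s."""
    s = 0
    for x, y in zip(a, b):
        t = x + y                 # independent of s, off the critical cycle
        s = s + t                 # the only operation left on the recurrence
    return s
```

With integers the two forms are exactly equal. With floating-point values, regrouping can change rounding, which is one reason a compiler would apply such a transformation only when its criteria are met.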
-
Patent number: 7757222Abstract: Code is affine partitioned to generate affine partitioning mappings. Parallel code is generated based on the affine partitioning mappings. Generating the parallel code includes coalescing loops in the parallel code generated from the affine partitioning mappings to generate coalesced parallel code and optimizing the coalesced parallel code.Type: GrantFiled: September 30, 2005Date of Patent: July 13, 2010Assignee: Intel CorporationInventors: Shih-wei Liao, Zhao Hui Du, Bu Qi Cheng, Gansha Wu, Guei-Yuan Lueh
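Loop coalescing in its generic form merges a loop nest into one loop over the combined iteration space. The sketch below shows only that generic transformation on a rectangular nest; the affine partitioning mappings from which the patent derives its loops are not modeled, and the function names are hypothetical.

```python
def nested(rows, cols):
    """Two-level loop nest over a rows x cols iteration space."""
    visited = []
    for i in range(rows):
        for j in range(cols):
            visited.append((i, j))
    return visited


def coalesced(rows, cols):
    """Single loop over rows * cols; the original indices are recovered."""
    visited = []
    for k in range(rows * cols):
        i, j = divmod(k, cols)    # k = i * cols + j
        visited.append((i, j))
    return visited
```

A single loop with a known trip count is often easier to divide evenly among parallel workers than a nest.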
-
Publication number: 20100175056Abstract: A compiler comprises an analysis unit that detects directives (options and pragmas) from a user to the compiler, an optimization unit that is made up of a processing unit (a global region allocation unit, a software pipelining unit, a loop unrolling unit, an “if” conversion unit, and a pair instruction generation unit) that performs individual optimization processing designated by options and pragmas from a user, following the directives and the like from the analysis unit, etc. The global region allocation unit performs optimization processing, following designation of the maximum data size of variables to be allocated to a global region, designation of variables to be allocated to the global region, and options and pragmas regarding designation of variables not to be allocated in the global region.Type: ApplicationFiled: February 16, 2010Publication date: July 8, 2010Inventors: Hajime OGAWA, Taketo Heishi, Toshiyuki Sakata, Shuichi Takayama, Shohei Michimoto, Tomoo Hamada, Ryoko Miyachi
-
Patent number: 7747993Abstract: A method of ordering instructions. The method can include placing a first instruction that consumes a value of an object before a second instruction that produces the value of the object such that the first instruction is processed before the second instruction and a physical location is allocated to the value of the object upon processing the first instruction.Type: GrantFiled: December 30, 2004Date of Patent: June 29, 2010Assignee: Michigan Technological UniversityInventor: Soner Onder
-
Publication number: 20100146495Abstract: A computing system has an amount of shared cache, and performs runtime automatic parallelization wherein when a parallelized loop is encountered, a main thread shares the workload with at least one other non-main thread. A method for providing interprocedural prefetching includes compiling source code to produce compiled code having a main thread including a parallelized loop. Prior to the parallelized loop in the main thread, the main thread includes prefetching instructions for the at least one other non-main thread that shares the workload of the parallelized loop. As a result, the main thread prefetches data into the shared cache for use by the at least one other non-main thread.Type: ApplicationFiled: December 10, 2008Publication date: June 10, 2010Applicant: SUN MICROSYSTEMS, INC.Inventors: Yonghong Song, Spiros Kalogeropulos, Partha P. Tirumalai
-
Patent number: 7721267
Abstract: A software pipelined loop tracing method involves inhibiting an output of trace data at a start of a software pipelined loop (SPLOOP). A skip in an output trace packet is indicated if the SPLOOP is skipped, and the SPLOOP is indicated at a cycle of an epilog state in the output trace packet if the SPLOOP is not skipped. An iteration count, indicating SPLOOP information and a position within a SPLOOP, is maintained. A periodic SPLOOP marker (PerSP) coinciding with a sync point is output if the SPLOOP is active.
Type: Grant
Filed: May 16, 2006
Date of Patent: May 18, 2010
Assignee: Texas Instruments Incorporated
Inventor: Manisha Agarwala
-
Publication number: 20100122069
Abstract: A macroscalar processor architecture is described herein. In one embodiment, a processor receives instructions of a program loop having a vector block and a sequence block intended to be executed after the vector block, where the processor includes multiple slices and each of the slices is capable of executing an instruction of an iteration of the program loop substantially in parallel. For each iteration of the program loop, the processor executes an instruction of the sequence block using one of the slices while executing instructions of the vector block using a remainder of the slices substantially in parallel. Other methods and apparatuses are also described.
Type: Application
Filed: November 6, 2009
Publication date: May 13, 2010
Inventor: Jeffry E. Gonion
-
Patent number: 7712091
Abstract: A method and system for optimizing the execution of a software loop is provided. The method involves the determination of an edge in a critical recurrence cycle in the software loop. The edge is a dependency link between two instructions and contains a dependee and a dependent. The dependee is an instruction that produces a result, and the dependent is an instruction that uses the result. The method further involves performing predicate promotion of at least one of the dependee and the dependent if one or more pre-determined conditions are met.
Type: Grant
Filed: September 30, 2005
Date of Patent: May 4, 2010
Assignee: Intel Corporation
Inventors: Kalyan Muthukumar, Robyn A. Sampson, Daniel Lavery
-
Patent number: 7702856
Abstract: The prefetch distance to be used by a prefetch instruction may not always be correctly calculated using compile-time information. In one embodiment, the present invention generates prefetch distance calculation code to dynamically calculate a prefetch distance used by a prefetch instruction at run-time.
Type: Grant
Filed: November 9, 2005
Date of Patent: April 20, 2010
Assignee: Intel Corporation
Inventors: Rakesh Krishnaiyer, Somnath Ghosh, Abhay Kanhere
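
The abstract above does not state how the distance is computed. As a rough illustration only, the sketch below uses a common heuristic (memory latency divided by the cost of one loop iteration, rounded up) and evaluates it from values known only at run time. The function name `prefetch_distance` and both parameters are hypothetical and are not taken from the patent.

```python
import math


def prefetch_distance(memory_latency_cycles: int, cycles_per_iteration: int) -> int:
    """Return how many iterations ahead a prefetch should be issued.

    Assumed heuristic (not from the patent): the prefetch must be issued
    early enough that the data arrives before the iteration that uses it,
    so the distance is ceil(latency / cost of one iteration), at least 1.
    """
    if memory_latency_cycles < 0 or cycles_per_iteration <= 0:
        raise ValueError("latency must be >= 0 and iteration cost must be > 0")
    return max(1, math.ceil(memory_latency_cycles / cycles_per_iteration))


# Example: the iteration cost is only known once the loop's inputs are known.
distance = prefetch_distance(memory_latency_cycles=200, cycles_per_iteration=30)
indices_to_prefetch = [i + distance for i in range(4)]  # element touched ahead of iteration i
```

A compiler following this idea would presumably emit the equivalent arithmetic before the loop rather than a fixed constant, so the same binary adapts to different loop costs.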
-
Patent number: 7698696
Abstract: A compiler comprises an analysis unit that detects directives (options and pragmas) from a user to the compiler, an optimization unit that is made up of a processing unit (a global region allocation unit, a software pipelining unit, a loop unrolling unit, an “if” conversion unit, and a pair instruction generation unit) that performs individual optimization processing designated by options and pragmas from a user, following the directives and the like from the analysis unit, etc. The global region allocation unit performs optimization processing, following designation of the maximum data size of variables to be allocated to a global region, designation of variables to be allocated to the global region, and options and pragmas regarding designation of variables not to be allocated in the global region.
Type: Grant
Filed: June 30, 2003
Date of Patent: April 13, 2010
Assignee: Panasonic Corporation
Inventors: Hajime Ogawa, Taketo Heishi, Toshiyuki Sakata, Shuichi Takayama, Shohei Michimoto, Tomoo Hamada, Ryoko Miyachi
-
Patent number: 7689980
Abstract: Linear transformations of statements in code are performed to generate linear expressions associated with the statements. Parallel code is generated using the linear expressions. Generating the parallel code includes splitting the computation-space of the statements into intervals and generating parallel code for the intervals.
Type: Grant
Filed: September 30, 2005
Date of Patent: March 30, 2010
Assignee: Intel Corporation
Inventors: Zhao Hui Du, Shih-wei Liao, Gansha Wu, Guei-Yuan Lueh
-
Publication number: 20100070956
Abstract: Methods, apparatus and computer software product for source code optimization are provided. In an exemplary embodiment, a first custom computing apparatus is used to optimize the execution of source code on a second computing apparatus. In this embodiment, the first custom computing apparatus contains a memory, a storage medium and at least one processor with at least one multi-stage execution unit. The second computing apparatus contains at least two multi-stage execution units that allow for parallel execution of tasks. The first custom computing apparatus optimizes the code for both parallelism and locality of operations on the second computing apparatus. This Abstract is provided for the sole purpose of complying with the Abstract requirement rules. This Abstract is submitted with the explicit understanding that it will not be used to interpret or to limit the scope or the meaning of the claims.
Type: Application
Filed: September 16, 2009
Publication date: March 18, 2010
Inventors: Allen Leung, Nicolas T. Vasilache, Benoit Meister, Richard A. Lethin
-
Patent number: 7669194
Abstract: A mechanism for minimizing effective memory latency without unnecessary cost through fine-grained software-directed data prefetching using integrated high-level and low-level code analysis and optimizations is provided. The mechanism identifies and classifies streams, identifies data that is most likely to incur a cache miss, exploits effective hardware prefetching to determine the proper number of streams to be prefetched, exploits effective data prefetching on different types of streams in order to eliminate redundant prefetching and avoid cache pollution, and uses high-level transformations with integrated lower level cost analysis in the instruction scheduler to schedule prefetch instructions effectively.
Type: Grant
Filed: August 26, 2004
Date of Patent: February 23, 2010
Assignee: International Business Machines Corporation
Inventors: Roch Georges Archambault, Robert James Blainey, Yaoqing Gao, Allan Russell Martin, James Lawrence McInnes, Francis Patrick O'Connell
-
Patent number: 7665079
Abstract: It is one object of the present invention to provide a program execution method for performing greater optimization. A program execution apparatus according to the present invention performs a transfer from an interpreter process to a compiled code process in the course of the execution of a method. At this time, if no problem occurs when a transfer point is moved to the top of a loop, the transfer point for code is so moved. And when a transfer point is located inside a loop, a point that post-dominates the top of the loop and the transfer point is copied to a position immediately preceding the loop. Then, information for generating recalculation code is provided for the transfer point, and a recalculation is performed.
Type: Grant
Filed: November 8, 2000
Date of Patent: February 16, 2010
Assignee: International Business Machines Corporation
Inventors: Toshiaki Yasue, Kazunori Ogata, Kazuaki Ishizaki, Hideaki Komatsu
-
Patent number: 7665078
Abstract: A method for optimizing a code sequence by tuning the representations of an instruction set based on the frequency of operations performed by the code sequence. For example, the number of bit symbols used to represent a code sequence may be reduced using the present invention.
Type: Grant
Filed: August 21, 2003
Date of Patent: February 16, 2010
Assignee: Gateway, Inc.
Inventor: Frank Liebenow
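
The abstract above does not say which encoding scheme is used. To illustrate the general idea of frequency-tuned representations, the sketch below substitutes a standard Huffman code, which gives shorter bit strings to operations that occur more often. The function `tune_encoding` and the opcode names are hypothetical.

```python
import heapq
from collections import Counter


def tune_encoding(code_sequence):
    """Assign bit strings to operations so frequent ones get shorter codes.

    Uses Huffman coding as a stand-in technique; the patent abstract does
    not name a specific scheme.
    """
    freq = Counter(code_sequence)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (count, unique tie-breaker, partial code table).
    heap = [(count, i, {op: ""}) for i, (op, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, codes1 = heapq.heappop(heap)
        c2, _, codes2 = heapq.heappop(heap)
        # Merge the two rarest groups, prefixing one bit to tell them apart.
        merged = {op: "0" + bits for op, bits in codes1.items()}
        merged.update({op: "1" + bits for op, bits in codes2.items()})
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]


sequence = ["LOAD"] * 5 + ["ADD"] * 2 + ["JMP"]
codes = tune_encoding(sequence)
total_bits = sum(len(codes[op]) for op in sequence)
```

For this sequence a fixed-width encoding of three distinct operations needs 2 bits per operation (16 bits in total), so any saving comes from the most frequent operation receiving a 1-bit code.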
-
Publication number: 20100023700
Abstract: Reducing coherency problems in a data processing system is provided. Source code that is to be compiled is received and analyzed to identify at least one of a plurality of loops that contain a memory reference. A determination is made as to whether the memory reference is an access to a global memory that should be handled by a direct buffer. Responsive to an indication that the memory reference is an access to the global memory that should be handled by the direct buffer, the memory reference is marked for direct buffer transformation. The direct buffer transformation is then applied to the memory reference.
Type: Application
Filed: July 22, 2008
Publication date: January 28, 2010
Applicant: International Business Machines Corporation
Inventors: Tong Chen, John K. O'Brien, Tao Zhang
-
Publication number: 20100023932
Abstract: A mechanism for efficient software cache accessing with handle reuse is provided. The mechanism groups references in source code into a reference stream with the reference stream having a size equal to or less than a size of a software cache line. The source code is transformed into optimized code by modifying the source code to include code for performing at most two cache lookup operations for the reference stream to obtain two cache line handles. Moreover, the transformation involves inserting code to resolve references in the reference stream based on the two cache line handles. The optimized code may be output for generation of executable code.
Type: Application
Filed: July 22, 2008
Publication date: January 28, 2010
Applicant: International Business Machines Corporation
Inventors: Alexandre E. Eichenberger, Marc Gonzalez Tallada, John K. O'Brien
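
The "at most two lookups" bound follows from simple arithmetic: a contiguous stream no longer than one cache line can straddle at most one line boundary. The sketch below shows that reasoning only. It models a handle as a plain line index, and `LINE_SIZE`, `line_handles`, and `resolve` are hypothetical names, not the patent's mechanism.

```python
LINE_SIZE = 128  # assumed software cache line size in bytes


def line_handles(start_addr, stream_bytes, line_size=LINE_SIZE):
    """Return the one or two line indices a reference stream can touch."""
    if not 0 < stream_bytes <= line_size:
        raise ValueError("stream must be non-empty and no larger than one line")
    first = start_addr // line_size
    last = (start_addr + stream_bytes - 1) // line_size
    # Because stream_bytes <= line_size, last is either first or first + 1.
    return (first,) if first == last else (first, last)


def resolve(addr, handles, line_size=LINE_SIZE):
    """Map an address to (handle, offset) using only the precomputed handles."""
    line = addr // line_size
    if line not in handles:
        raise KeyError("address lies outside the reference stream")
    return line, addr % line_size


handles = line_handles(start_addr=100, stream_bytes=64)  # bytes 100..163
location = resolve(130, handles)
```

Every reference in the stream is then resolved by offset arithmetic against one of the two handles, with no further lookup.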
-
Patent number: 7647577
Abstract: Provided are methods for transforming a flowchart to an equivalent tree diagram, methods for transforming an equivalent tree diagram to a flowchart, methods for verifying reorganization of a flowchart, methods for editing a flowchart, methods for creating a flowchart and a flowchart editor. A flowchart includes one or more logic structures and one or more processing activities in said one or more logic structures. The method for transforming a flowchart to an equivalent tree diagram comprises: traversing said flowchart; transforming said one or more logic structures in said flowchart to one or more branching nodes in said tree diagram; and transforming one or more processing activities in said logic structures of said flowchart to one or more leaf nodes below corresponding branching nodes in said tree diagram. Further, editing of a flowchart and verification of reorganization of a flowchart are performed by utilizing an equivalent tree diagram.
Type: Grant
Filed: May 27, 2005
Date of Patent: January 12, 2010
Assignee: International Business Machines Corporation
Inventors: Jian Wang, Jun Zhu, Sheng Ye, Jing Li, Hai Qi Liang, Ying Liu, Ying Nan Zuo
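
The flowchart-to-tree mapping described above can be sketched as a recursive traversal. This is a minimal illustration that assumes the flowchart is already available as nested logic structures; the class names `Activity`, `LogicStructure`, and `TreeNode` and the function `to_tree` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Activity:
    """A processing activity inside a logic structure."""
    name: str


@dataclass
class LogicStructure:
    """A logic structure such as a sequence, branch, or loop."""
    kind: str
    body: list = field(default_factory=list)


@dataclass
class TreeNode:
    label: str
    children: list = field(default_factory=list)


def to_tree(element):
    """Logic structures become branching nodes; activities become leaves."""
    if isinstance(element, Activity):
        return TreeNode(element.name)
    return TreeNode(element.kind, [to_tree(child) for child in element.body])


flow = LogicStructure("sequence", [
    Activity("read input"),
    LogicStructure("loop", [Activity("process item")]),
])
tree = to_tree(flow)
```

Each leaf ends up below the branching node of the logic structure that contained the activity, which matches the placement the abstract describes.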
-
Publication number: 20090328020
Abstract: Interface optimization is provided using a closed system in which all the individual software components in the system are known to the compiler at a single point in time. This knowledge enables significant opportunities to optimize the implementation of interfaces on a set of implemented objects. When code is compiled, because the compiler knows the full list of interfaces and the objects which implement the interfaces, it can improve execution and working set (i.e., recently referenced pages in a program's virtual address space) when implementing the interfaces on objects. This improvement may be realized by reducing the size of interface lookup tables which map each interface to the object types which implement that particular interface.
Type: Application
Filed: June 28, 2008
Publication date: December 31, 2009
Applicant: Microsoft Corporation
Inventors: Jeffrey E. Stall, Jonathon Michael Stall
-
Publication number: 20090328021
Abstract: One embodiment of the invention is a method for fusing a first loop nested in a first IF statement with a second loop nested in a second IF statement without the use of modified and referenced (mod-ref) information to determine if certain conditional statements in the IF statements retain variable values.
Type: Application
Filed: June 30, 2008
Publication date: December 31, 2009
Inventors: John L. Ng, Robert Cox, Dmitry V. Budanov
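
For readers unfamiliar with the transformation, the sketch below shows only the before and after shape of fusing two loops guarded by IF statements. It assumes both IFs test the same condition, the first loop does not change it, and the loops share an iteration space with no dependence that forbids fusion. It does not show the patent's analysis for establishing those facts without mod-ref information. The function names are hypothetical.

```python
def unfused(flag, xs):
    """Two separate loops, each nested in its own IF statement."""
    doubled, incremented = [], []
    if flag:
        for x in xs:
            doubled.append(x * 2)
    if flag:
        for x in xs:
            incremented.append(x + 1)
    return doubled, incremented


def fused(flag, xs):
    """Same computation after fusing the loops under a single IF."""
    doubled, incremented = [], []
    if flag:
        for x in xs:
            doubled.append(x * 2)
            incremented.append(x + 1)
    return doubled, incremented


same_when_true = fused(True, [1, 2, 3]) == unfused(True, [1, 2, 3])
same_when_false = fused(False, [1, 2, 3]) == unfused(False, [1, 2, 3])
```

The fused form evaluates the condition once and walks the data once, which is the usual motivation for loop fusion.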