Including Loop Patents (Class 717/160)
  • Publication number: 20110161923
    Abstract: The system includes a command set defining a plurality of navigation commands for an audiovisual reproduction apparatus and a human-oriented scripting program for automatically authoring a navigation structure for use in a standalone audiovisual product playable in the audiovisual reproduction apparatus. The scripting program includes an iterative loop with a variable adjusted according to the iterations of the loop. The scripting program is operable to automatically, for each iteration of the loop: select from the plurality of navigation commands a navigation command defined according to the variable as adjusted for that iteration; and add the navigation command to an intermediate representation of the navigation structure. An associated method is also provided.
    Type: Application
    Filed: April 19, 2005
    Publication date: June 30, 2011
    Applicant: ZOOtech Limited
    Inventor: Stuart Green
  • Patent number: 7962906
    Abstract: A compiler includes a mechanism for employing multiple synergistic processors to execute long vectors. The compiler receives a single source program. The compiler identifies vectorizable loop code in the single source program and extracts the vectorizable loop code from the single source program. The compiler then compiles the extracted vectorizable loop code for a plurality of synergistic processors. The compiler also compiles a remainder of the single source program for a principal processor to form an executable main program such that the executable main program controls operation of the executable vectorizable loop code on the plurality of synergistic processors.
    Type: Grant
    Filed: March 15, 2007
    Date of Patent: June 14, 2011
    Assignee: International Business Machines Corporation
    Inventors: John Kevin Patrick O'Brien, Kathryn M. O'Brien, Daniel Arthur Prener
  • Patent number: 7937695
    Abstract: Based on operations within an uncounted loop of source code, one or more calculations are generated for determining, at runtime, an expected number of iterations through which the uncounted loop can iterate before encountering an exception corresponding to at least one target exception check. A copy of the uncounted loop omitting each target exception check is generated. The uncounted loop, the copy of the uncounted loop, and the one or more calculations are arranged in compiled code so that at runtime program flow enters the copy of the uncounted loop. If a maximum number of iterations of the copy of the uncounted loop is reached, program flow proceeds from the copy of the uncounted loop to the uncounted loop. The maximum number of iterations is no more than the smallest member of a set consisting of the expected number of iterations for each target exception check.
    Type: Grant
    Filed: April 27, 2007
    Date of Patent: May 3, 2011
    Assignee: International Business Machines Corporation
    Inventor: Mark Graham Stoodley
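The abstract above describes versioning an uncounted loop so that a check-free copy runs for a precomputed number of iterations before control falls back to the original, checked loop. Below is a minimal Python sketch of that general idea, assuming a single target check (an explicit array-bounds test) and a positive stride. The names `find_sentinel`, `checked_get`, and `BoundsError` are hypothetical, and Python still bounds-checks every subscript internally, so the sketch shows only the control flow, not the speedup.

```python
class BoundsError(Exception):
    """Raised by the explicit (target) bounds check."""


def checked_get(a, i):
    # The target exception check that the fast copy of the loop omits.
    if not 0 <= i < len(a):
        raise BoundsError(i)
    return a[i]


def find_sentinel(a, start, step, sentinel):
    """Return the first index start, start+step, ... whose element is sentinel.

    The loop is "uncounted": its trip count depends on where the sentinel
    is, not on a fixed bound. Assumes step > 0.
    """
    if step <= 0:
        raise ValueError("sketch assumes a positive stride")
    i = start
    # Runtime calculation: iterations that can run before the bounds check
    # could fail (0 if start is already out of range).
    safe = max(0, (len(a) - 1 - start) // step + 1) if start >= 0 else 0

    # Fast copy of the loop: no bounds checks, capped at `safe` iterations.
    for _ in range(safe):
        if a[i] == sentinel:
            return i
        i += step

    # Cap reached: continue in the original loop with its checks, so the
    # exception is raised exactly where the unversioned loop would raise it.
    while True:
        if checked_get(a, i) == sentinel:
            return i
        i += step
```

For example, `find_sentinel([4, 7, 0, 9], 0, 1, 0)` stays entirely in the fast copy, while a search for a missing value exhausts the fast copy and then raises `BoundsError` from the checked loop.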
  • Patent number: 7908256
    Abstract: A computer-implementable method, system and computer-usable medium are disclosed. One or more objects among a plurality of objects can be processed utilizing a data-processing apparatus/system. One or more lock reservations, among a group of lock reservations, can be applied over multiple sequential lock operations with respect to a particular object. Thereafter, the lock reservation can be cancelled with respect to the last monitor exit operation in order to eliminate lock operations where traditional lock coarsening cannot be applied.
    Type: Grant
    Filed: November 30, 2007
    Date of Patent: March 15, 2011
    Assignee: International Business Machines Corporation
    Inventors: Nikola Grcevski, Peter Burka
  • Publication number: 20110047534
    Abstract: A system and method for optimization of code with non-adjacent loops. A compiler builds a node tree, which is not a control flow graph, that represents parent-child relationships of nodes of a computer program. Each node represents a control flow statement or a straight-line block of statements of the computer program. If a non-adjacent pair of loop nodes satisfies predetermined conditions, the compiler may perform legal code transformations on the computer program and corresponding node transformations on the node tree. These transformations may make this pair of loop nodes adjacent. The compiler may be configured to perform legal code transformations, such as head and tail duplication, code motion, and if-merging, in order to make these two loop nodes adjacent. Loop fusion may then be performed on this loop pair in order to increase instruction level parallelism (ILP) within an optimized version of the original source code.
    Type: Application
    Filed: August 22, 2009
    Publication date: February 24, 2011
    Inventors: Mei Ye, Dinesh Suresh, Dz-ching Ju, Michael Lai
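The Python sketch below illustrates, on a hand-written example, the source-level effect the abstract describes: a statement that does not depend on the first loop sits between two loops, code motion hoists it so the loops become adjacent, and the two loops are then fused into one. The function names are hypothetical, and the transformation is applied by hand rather than by a compiler working on a node tree.

```python
def original(a):
    b = [0] * len(a)
    for i in range(len(a)):  # loop 1
        b[i] = a[i] * 2
    scale = 3  # independent statement separating the two loops
    c = [0] * len(a)
    for i in range(len(a)):  # loop 2 reads b[i], written by loop 1
        c[i] = b[i] + scale
    return b, c


def fused(a):
    # Code motion: `scale` and the allocation of `c` do not depend on loop 1,
    # so they are hoisted above it, which makes the two loops adjacent.
    scale = 3
    b = [0] * len(a)
    c = [0] * len(a)
    # Fusion is legal here because iteration i of loop 2 only reads b[i],
    # which iteration i of loop 1 has already written.
    for i in range(len(a)):
        b[i] = a[i] * 2
        c[i] = b[i] + scale
    return b, c
```

Both versions compute the same arrays; the fused loop touches `b[i]` while it is still hot, and its body offers two independent-enough operations per iteration for the scheduler to overlap.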
  • Patent number: 7890942
    Abstract: A method and system for substituting array values (i.e., expressions) in a program at compile time. An initialization of an array is identified in a loop. The initialization is an assignment of an expression (i.e., a constant or a function of an induction variable) to elements of the array. The expression is stored in a table that associates the expression with the array and indices of the array. An assignment statement that assigns at least one of the initialized elements is detected. The expression is retrieved from the table based on the expression being associated with the array and corresponding indices. The expression is substituted for the at least one element so that the expression is assigned by the assignment statement. The process of substituting array values is extended to interprocedural analysis.
    Type: Grant
    Filed: August 15, 2006
    Date of Patent: February 15, 2011
    Assignee: International Business Machines Corporation
    Inventor: Rohini Nair
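A minimal Python sketch of the table-driven substitution described in the abstract, over a hypothetical three-statement IR (tuples for a loop initialization, an element read, and an element write). The table maps an array to the index range and the expression, a function of the induction variable, that initialized it; a later read of an element in that range is replaced by the expression evaluated at that index. The IR, the conservative invalidation on any store, and the omission of interprocedural analysis are all simplifying assumptions.

```python
def substitute_array_values(stmts):
    """Replace reads of loop-initialized array elements with their values.

    Hypothetical IR:
      ("init", arr, lo, hi, f)   # for i in range(lo, hi): arr[i] = f(i)
      ("load", dst, arr, idx)    # dst = arr[idx], idx a compile-time constant
      ("store", arr, idx, val)   # arr[idx] = val
      ("const", dst, value)      # dst = value (produced by this pass)
    """
    table = {}  # arr -> (lo, hi, f)
    out = []
    for s in stmts:
        kind = s[0]
        if kind == "init":
            _, arr, lo, hi, f = s
            table[arr] = (lo, hi, f)
            out.append(s)
        elif kind == "load":
            _, dst, arr, idx = s
            entry = table.get(arr)
            if entry is not None and entry[0] <= idx < entry[1]:
                # Substitute the initializing expression for arr[idx].
                out.append(("const", dst, entry[2](idx)))
            else:
                out.append(s)
        elif kind == "store":
            # Conservatively forget the array once any element is rewritten.
            table.pop(s[1], None)
            out.append(s)
        else:
            out.append(s)
    return out
```

With `("init", "A", 0, 10, lambda i: 2 * i + 1)` followed by `("load", "x", "A", 3)`, the load becomes `("const", "x", 7)`; a read outside the initialized range, or one after a store to `A`, is left untouched.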
  • Patent number: 7890943
    Abstract: Instructions that have no dependence constraint between them and other instructions in a loop of a critical section may be moved out of the critical section so that the size of the critical section may be reduced. A flow graph of a program including the critical section may be generated, which includes loops. The flow graph may be transformed, and based on the transformed flow graph, any unnecessary instructions in loops may be moved out of the critical section. Subsequently, the original flow graph of the critical section may be recovered from the transformed flow graph.
    Type: Grant
    Filed: March 30, 2007
    Date of Patent: February 15, 2011
    Assignee: Intel Corporation
    Inventors: Xiaofeng Guo, Jinquan Dai, Long Li
  • Patent number: 7890940
    Abstract: To collect, at high speed, the frequencies with which processes of a program are executed. A compiler apparatus for optimizing a program based on frequencies with which each process is executed has a loop process detection portion for detecting a repeatedly executed loop process of the program, a loop process frequency collection portion for collecting loop process frequencies with which the loop process is executed in the program, an in-loop process frequency collection portion for collecting in-loop process frequencies with which each of a plurality of in-loop processes included in the loop process is executed, relative to the number of times the loop process is executed, an in-loop execution information generating portion for generating in-loop execution information indicating the frequencies with which each of the plurality of in-loop processes is executed when the program is executed, and an optimization portion for optimizing the program based on the in-loop execution information.
    Type: Grant
    Filed: January 11, 2008
    Date of Patent: February 15, 2011
    Assignee: International Business Machines Corporation
    Inventors: Hideaki Komatsu, Toshio Suganuma, Toshiaki Yasue
  • Publication number: 20110029962
    Abstract: A method for vectorization of a block of code is provided. The method comprises receiving a first block of code as input, and converting the first block of code into at least a second block of code and a third block of code. The first block of code accesses a first set of memory addresses that are potentially misaligned. The second block of code performs conditional leaping address incrementation to selectively access a first subset of the first set of memory addresses. The third block of code accesses a second subset of the first set of memory addresses starting from an aligned memory address, accessing multiple memory addresses at a time. No memory address belongs to both the first subset and the second subset of memory addresses.
    Type: Application
    Filed: July 28, 2009
    Publication date: February 3, 2011
    Applicant: International Business Machines Corporation
    Inventors: Dorit Nuzman, Ira Rosen, Ayal Zaks
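The split described above resembles classic loop peeling for alignment. The Python sketch below shows that conventional scheme, not the patent's conditional leaping technique: assuming a vector width `V` in elements and a hypothetical base `offset` standing in for the address of element 0, a scalar prologue handles elements until the address is V-aligned, aligned chunks of `V` elements form the vector body, and a scalar epilogue handles the remainder. The "vector" operation is simulated with a list slice, so nothing here actually uses SIMD hardware.

```python
def partition_for_alignment(n, offset, V):
    """Split indices 0..n-1 into a scalar subset and aligned V-element chunks.

    `offset` is the (assumed) address of element 0, in element units.
    The scalar subset and the chunks are disjoint and together cover 0..n-1.
    """
    peel = min(n, (-offset) % V)  # elements before the first aligned address
    body_end = peel + (n - peel) // V * V
    scalar = list(range(peel)) + list(range(body_end, n))
    chunks = [list(range(s, s + V)) for s in range(peel, body_end, V)]
    return scalar, chunks


def vector_add(a, b, offset, V=4):
    out = [0] * len(a)
    scalar, chunks = partition_for_alignment(len(a), offset, V)
    for i in scalar:  # scalar path: one element at a time
        out[i] = a[i] + b[i]
    for c in chunks:  # simulated vector path: V aligned elements at once
        lo, hi = c[0], c[-1] + 1
        out[lo:hi] = [x + y for x, y in zip(a[lo:hi], b[lo:hi])]
    return out
```

For `n = 10`, `offset = 3`, and `V = 4`, element 1 is the first aligned one, so the scalar subset is `[0, 9]` and the vector body processes indices 1 to 8 in two chunks.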
  • Patent number: 7882498
    Abstract: Provided are a method, system, and program for parallelizing source code with a compiler. Source code including source code statements is received. The source code statements are processed to determine a dependency of the statements. Multiple groups of statements are determined from the determined dependency of the statements, wherein statements in one group are dependent on one another. At least one directive is inserted in the source code, wherein each directive is associated with one group of statements. Resulting threaded code is generated including the inserted at least one directive. The group of statements to which the directive in the resulting threaded code applies is processed as a separate task. Each group of statements designated by the directive to be processed as a separate task may be processed concurrently with respect to other groups of statements.
    Type: Grant
    Filed: March 31, 2006
    Date of Patent: February 1, 2011
    Assignee: Intel Corporation
    Inventors: Guilherme D. Ottoni, Xinmin Tian, Hong Wang, Richard A. Hankins, Wei Li, John Shen
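The grouping step in the abstract can be pictured as finding connected components of a statement dependence graph. The Python sketch below assumes each statement is summarized by hypothetical read and write sets of variable names; two statements are treated as dependent if one writes a variable the other reads or writes, and union-find merges dependent statements into groups that could each become a separate task. Directive insertion and code generation are not shown, and statements within a group must still run in their original order.

```python
def dependence_groups(stmts):
    """Group statement indices so that dependent statements share a group.

    stmts: list of (reads, writes) pairs, each a set of variable names.
    Returns a sorted list of groups, each a list of statement indices.
    """
    parent = list(range(len(stmts)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(x, y):
        parent[find(x)] = find(y)

    for i, (ri, wi) in enumerate(stmts):
        for j in range(i + 1, len(stmts)):
            rj, wj = stmts[j]
            # Flow, output, or anti dependence between statements i and j.
            if wi & (rj | wj) or wj & ri:
                union(i, j)

    groups = {}
    for i in range(len(stmts)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

For statements `x = f(a)`, `y = g(b)`, `z = h(x)`, `w = k(y, c)`, and `v = m(d)`, this yields three independent groups: `[0, 2]`, `[1, 3]`, and `[4]`.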
  • Patent number: 7877739
    Abstract: A computer-implemented method for determining whether an array within a loop can be privatized for that loop is presented. The method calculates the array sections that require first or last privatization and copies only those sections, reducing the privatization overhead of the known solutions.
    Type: Grant
    Filed: October 9, 2006
    Date of Patent: January 25, 2011
    Assignee: International Business Machines Corporation
    Inventors: Roch G. Archambault, Erik P. Charlebois, Guansong Zhang
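A Python sketch of the privatization idea in the abstract, under simplifying assumptions: a scratch array `t` is fully rewritten (its first `k` elements) at the start of every iteration, so each parallel iteration can work on a private copy, and only the section that is read after the loop (here `t[:k]`, an assumed last-private section) is copied back from the final iteration instead of the whole array. The function names and the use of `concurrent.futures` are illustrative choices, not taken from the patent.

```python
from concurrent.futures import ThreadPoolExecutor


def serial(rows, t, k):
    out = []
    for row in rows:
        for j in range(k):
            t[j] = row[j] * 2  # t[:k] is written before it is read
        out.append(sum(t[:k]))
    return out  # the caller later reads t[:k]


def privatized(rows, t, k):
    def body(row):
        priv = [0] * k  # private copy of only the section the loop uses
        for j in range(k):
            priv[j] = row[j] * 2
        return sum(priv), priv

    with ThreadPoolExecutor() as pool:
        results = list(pool.map(body, rows))  # map preserves iteration order
    if results:
        # Last-private copy-out: only t[:k], taken from the last iteration.
        t[:k] = results[-1][1]
    return [s for s, _ in results]
```

Both functions return the same per-row sums and leave `t` in the same state; the elements of `t` beyond index `k - 1` are never copied.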
  • Patent number: 7873954
    Abstract: Stack signature marking segments are inserted into re-entrant programming source code modules prior to compilation of the modules at each code module entry point and at each code module exit point, followed by producing one or more executable programs from the programming source code modules. Upon execution of instances of the executable programs, the inserted segments assign unique, non-duplicated module identifier values to the instances of the code modules, generate an instance count for each instantiation of an executable code module in the stack signature for each object instance dynamically created during runtime of a re-entrant executable code module, and push onto a processing stack the module identifier values and the instance counts within stack frames allocated to each of the executable program instances.
    Type: Grant
    Filed: May 12, 2006
    Date of Patent: January 18, 2011
    Assignee: International Business Machines Corporation
    Inventors: Lorin Ullmann, Allen Chester Wynn
  • Patent number: 7865885
    Abstract: Dynamic optimization of application code is performed by selecting a portion of the application code as a possible transaction. A transaction has a property that when it is executed, it is either atomically committed or atomically aborted. Determining whether to convert the selected portion of the application code to a transaction includes determining whether to apply at least one of a group of code optimizations to the portion of the application code. If it is determined to apply at least one of the code optimizations of the group of optimizations to the portion of application code, then the optimization is applied to the portion of the code and the portion of the code is converted to a transaction.
    Type: Grant
    Filed: September 27, 2006
    Date of Patent: January 4, 2011
    Assignee: Intel Corporation
    Inventors: Youfeng Wu, Cheng Wang, Ho-seop Kim
  • Patent number: 7865886
    Abstract: A method and apparatus for blocking nested loops having feedback or feedforward indexing. An embodiment of a method includes receiving a computer code segment, the segment including a first inner loop and a second outer loop, the inner loop being within the outer loop and the inner loop having a one-dimensional iteration space that is independent of the outer loop. The first loop is indexed by a variable I over a contiguous one-dimensional iteration space and addresses one or more data arrays with a shift in the index. The method further includes dividing a two-dimensional iteration space of the first loop and the second loop into multiple contiguous windows, where the second loop uses only one window of the plurality of windows during each iteration and the plurality of windows cover the iteration space. The method includes modifying the computer code segment by adding a third outer loop outside the second loop of the segment, the third loop encompassing the first loop and the second loop.
    Type: Grant
    Filed: November 28, 2005
    Date of Patent: January 4, 2011
    Assignee: Intel Corporation
    Inventor: Hans-Joachim Plum
  • Patent number: 7856629
    Abstract: A compiler apparatus, which can perform software pipelining optimization that has a considerable effect of reducing the number of execution cycles taken to complete a loop process, converts a source program into a machine program for a processor which is capable of parallel processing. The compiler apparatus is composed of: a parsing unit operable to parse the source program and then to convert the source program into an intermediate program which is described in an intermediate language; an optimization unit operable to optimize the intermediate program; and a conversion unit operable to convert the optimized intermediate program into the machine language program, wherein the optimization unit is operable to execute software pipelining, by inserting a transfer instruction, which is used for transferring data between operands, into a loop process included in the intermediate program so that a data dependence relation is changed.
    Type: Grant
    Filed: May 24, 2006
    Date of Patent: December 21, 2010
    Assignee: Panasonic Corporation
    Inventors: Shohei Michimoto, Taketo Heishi, Hajime Ogawa, Teruo Kawabata
  • Publication number: 20100318980
    Abstract: Described is an analysis tool/techniques for determining the computational complexity of a computer program, including when the program includes procedures having nested loops and/or multi-path loops. First, multi-path loops are converted into code-fragments consisting of simpler loops via a transformation called control flow refinement. Progress invariants are determined for appropriate locations in the procedure to represent relationships between a state that can arise at that program location and the previous state at that location. A bound finding mechanism (such as one based on pattern matching) is then used to compute loop bounds from progress invariants. These bounds are then composed appropriately to determine a precise bound for the enclosing procedure.
    Type: Application
    Filed: June 13, 2009
    Publication date: December 16, 2010
    Applicant: Microsoft Corporation
    Inventors: Sumit Gulwani, Sagar Jain, Eric J. Koskinen
  • Publication number: 20100318979
    Abstract: A system and method of compiling program code, wherein the program code includes an operation on an array of data elements stored in memory of a computer system. The program code is scanned for an equation which may have recurring data points. The equation is then replaced with vectorized machine executable code, wherein the machine executable code comprises a nested loop and wherein the nested loop comprises an exterior loop and a virtual interior loop. The exterior loop decomposes the equation into a plurality of loops of length N, wherein N is an integer greater than one. The virtual interior loop executes vector operations corresponding to the N length loop to form a result vector resident in memory, wherein the virtual interior loop includes a vector atomic memory operation (AMO) instruction.
    Type: Application
    Filed: June 12, 2009
    Publication date: December 16, 2010
    Applicant: Cray Inc.
    Inventor: Terry D. Greyzck
  • Patent number: 7849453
    Abstract: One embodiment of the present invention provides a system that generates code for software scouting the regions of a program. During operation, the system receives source code for a program. The system then compiles the source code. In the first step of the compilation process, the system identifies a first set of loops from a hierarchy of loops in the source code, wherein each loop in the first set of loops contains at least one effective prefetch candidate. Then, from the first set of loops, the system identifies a second set of loops where scout-mode prefetching is profitable. Next, for each loop in the second set of loops, the system produces executable code for a helper-thread which contains a prefetch instruction for each effective prefetch candidate. At runtime the helper-thread is executed in parallel with the main thread in advance of where the main thread is executing to prefetch data items for the main thread.
    Type: Grant
    Filed: November 9, 2005
    Date of Patent: December 7, 2010
    Assignee: Oracle America, Inc.
    Inventors: Partha P. Tirumalai, Yonghong Song, Spiros Kalogeropulos
  • Patent number: 7827542
    Abstract: A compiler apparatus that improves the performance of loop processing. The compiler apparatus translates a C program that includes a loop into a machine language program, and includes: a movement judgment unit that judges whether or not an instruction which is positioned outside of the loop of the C program can be moved into the loop, based on a state of live ranges of variables used in the instruction; a movement execution unit that moves the instruction into the loop in the case where the movement judgment unit judges that the instruction can be moved into the loop, thereby generating an intermediate program; and a translation unit that translates the intermediate program into the machine language program.
    Type: Grant
    Filed: September 25, 2006
    Date of Patent: November 2, 2010
    Assignee: Panasonic Corporation
    Inventors: Hajime Ogawa, Ryoko Miyachi, Toshiyuki Sakata
  • Patent number: 7823141
    Abstract: A method for executing a loop in an application that includes executing iterations in a first segment of the loop by a base thread, logging memory transactions that occur during execution of iterations in the first segment by a co-inspector thread to obtain a co-inspector log, executing iterations in a second segment of the loop by a co-thread to obtain temporary results, logging memory transactions that occur during execution of iterations in the second segment to obtain a co-thread log, and comparing the co-inspector log and the co-thread log to determine whether a thread interdependency exists.
    Type: Grant
    Filed: September 30, 2005
    Date of Patent: October 26, 2010
    Assignee: Oracle America, Inc.
    Inventors: Phyllis E. Gustafson, Michael H. Paleczny, Christopher A. Vick, Olaf Manczak, Jay R. Freeman, Yuguang Wu
  • Patent number: 7814468
    Abstract: A method for loop reformulation is provided such that a single exit ill-formed loop (SEIFL) can be reformulated into a reformulated code block that contains a transformed well-formed loop (TWFL). A SEIFL loop is a loop that can exit from the loop body of the loop. After the loop reformulation, the TWFL of the reformulated code block can only exit from the end of the loop. The reformulated code block will replace the SEIFL in the compiler's internal representation (IR) such that a more efficient executable machine code can be generated by optimizing the reformulated compiler's IR.
    Type: Grant
    Filed: April 20, 2005
    Date of Patent: October 12, 2010
    Assignee: Oracle America, Inc.
    Inventors: Yonghong Song, Xiangyun Kong
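The reformulation in the abstract turns a loop that can leave from the middle of its body into one that leaves only through its loop test. The hand-written Python illustration below shows the before/after shape for a simple search loop; it assumes the exit condition can be carried in a flag, whereas a compiler would perform the rewrite on its internal representation rather than on source.

```python
def ill_formed(a, key):
    i = 0
    while i < len(a):
        if a[i] == key:
            break  # exit from the middle of the loop body
        i += 1
    return i


def well_formed(a, key):
    i = 0
    found = False
    # The exit condition is folded into the loop test, so the body
    # never leaves the loop directly; there is a single loop exit.
    while i < len(a) and not found:
        found = a[i] == key
        if not found:
            i += 1
    return i
```

Both functions return the index of the first match, or `len(a)` when there is none.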
  • Publication number: 20100257516
    Abstract: A method, apparatus and program product are provided for parallelizing analysis and optimization in a compiler. A plurality of basic blocks and a subset of data points of a computer program is prepared for processing by a main thread selected from a plurality of hardware threads. The plurality of prepared basic blocks and subset of data points are placed in a shared data structure by the main thread. A prepared basic block of the plurality of prepared basic blocks and/or a tuple associated with the subset of data points is concurrently retrieved from the shared data structure by a work thread selected from the plurality of hardware threads. A compiler analysis or optimization is performed on the prepared basic block or tuple by the work thread.
    Type: Application
    Filed: April 2, 2009
    Publication date: October 7, 2010
    Applicant: International Business Machines Corporation
    Inventors: Robert R. Roediger, William J. Schmidt
  • Patent number: 7810032
    Abstract: A method and system for computing statistical parameters for sets of data items, by executing instructions of a computer program that is coded within a spreadsheet. Each set is generated in a time sequence that is specific to each set. For each time sequence, each data item is one data value or a pair of data values. The data items appear one at a time in only one cell structure of the spreadsheet at each time in the time sequence. The one cell structure is a single cell or two cells. A loop of iterations is performed for each set. In each iteration, a command is responded to by updating the statistical parameters based on the latest data item in the one cell structure in the spreadsheet. The updated statistical parameters are stored in a parameter field of the spreadsheet assigned to each statistical parameter.
    Type: Grant
    Filed: September 13, 2005
    Date of Patent: October 5, 2010
    Assignee: International Business Machines Corporation
    Inventors: Frederic Bauchot, Gerard Marmigere
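Updating statistics one data item at a time, as described above, is commonly done with Welford's online algorithm. The Python sketch below uses it as a stand-in, since the abstract does not say which update formulas are used, and it omits the spreadsheet entirely: each call to the hypothetical `update` method plays the role of the command that fires when a new value appears in the monitored cell. Pairs of data values are not handled.

```python
import math


class RunningStats:
    """Count, mean, and sample variance, updated one value at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        # Welford's update: no earlier data items need to be stored.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

    @property
    def stdev(self):
        return math.sqrt(self.variance)
```

Feeding the values 2, 4, 4, 4, 5, 5, 7, 9 one by one gives a mean of 5 and a sample variance of 32/7, the same as a batch computation over the whole set.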
  • Patent number: 7805413
    Abstract: A program stored in a storage device is read. Partial compression of an array in a loop nest of the program is performed by replacing each array element that, considering the entire program, is local only to the loop nest with a scalar variable. For a non-local element, access to the original array is inserted into the program.
    Type: Grant
    Filed: December 22, 2003
    Date of Patent: September 28, 2010
    Assignee: Fujitsu Limited
    Inventor: Akira Hosoi
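A hand-written Python illustration of the kind of transformation the abstract describes, not the patent's algorithm. It assumes a temporary array `tmp` whose elements are each written and read within the same iteration of a loop nest (local to the nest), except for one element that is also read after the nest (non-local). The local uses become a scalar, and storage is kept only for the non-local element, modeled here as a small dictionary standing in for the compressed array.

```python
def before(a):
    n, m = len(a), len(a[0])
    tmp = [[0] * m for _ in range(n)]
    total = 0
    for i in range(n):
        for j in range(m):
            tmp[i][j] = a[i][j] * a[i][j]
            total += tmp[i][j]
    return total, tmp[n - 1][m - 1]  # only this element is used after the nest


def after(a):
    n, m = len(a), len(a[0])
    tmp = {}  # compressed storage: holds only the non-local element
    total = 0
    for i in range(n):
        for j in range(m):
            t = a[i][j] * a[i][j]  # scalar replaces the local tmp[i][j]
            total += t
            if (i, j) == (n - 1, m - 1):
                tmp[i, j] = t  # array access kept for the non-local element
    return total, tmp[n - 1, m - 1]
```

The transformed nest no longer allocates an `n` by `m` array, yet returns the same results as the original.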
  • Publication number: 20100235819
    Abstract: In embodiments, prior to compilation into machine code, a preprocessor generates directives by processing a source code and/or bytecode representation of a program and/or selecting default directives. The preprocessor embeds the directives in a bytecode representation of the program or a separate stream associated with the bytecode representation of the program. A just-in-time compiler may compile the bytecode representation into machine code directed by the embedded directives in one pass and/or a bytecode interpreter may interpret the bytecode representation of the program. In some embodiments, a computing device generates bytecodes during execution of a program, selects default directives, and embeds the default directives in the bytecodes or a separate stream associated with the bytecodes prior to compilation of the bytecodes into machine code.
    Type: Application
    Filed: March 10, 2009
    Publication date: September 16, 2010
    Applicant: Sun Microsystems, Inc.
    Inventor: John Robert Rose
  • Patent number: 7797692
    Abstract: A system that estimates a dominant computational resource which is used by a computer program. During operation, for each basic block in the computer program, the system determines a nesting level for the basic block. Next, the system selects basic blocks with nesting levels greater than a specified threshold. For each selected basic block, the system analyzes the basic block to estimate the dominant computational resource used by the basic block. The system then uses the estimated dominant computational resources for the selected basic blocks to estimate the dominant computational resource for the computer program.
    Type: Grant
    Filed: May 12, 2006
    Date of Patent: September 14, 2010
    Assignee: Google Inc.
    Inventor: Grzegorz J. Czajkowski
  • Patent number: 7793278
    Abstract: Systems and methods perform affine partitioning on a code stream to produce code segments that may be parallelized. The code segments include copies of the original code stream with conditionals inserted that aid in parallelizing code. Each conditional is formed by determining the constraints on a processor variable determined by the affine partitioning and applying the constraints to the original code stream.
    Type: Grant
    Filed: September 30, 2005
    Date of Patent: September 7, 2010
    Assignee: Intel Corporation
    Inventors: Zhao Hui Du, Shih-Wei Liao, Gansha Wu, Guei-Yuan Lueh
  • Patent number: 7788659
    Abstract: The present invention is a method of eliminating loops from a computer program by receiving the program, graphing its function and control, identifying its entry point, and identifying groups of loops connected to its entry point. Stop if there are no such groups. Otherwise, selecting a group of loops. Then, identifying the selected group's entry point. If the selected group includes no group of loops having a different entry point then replacing it with a recursive or non-recursive function, reconfiguring each connection entering and exiting the selected group to preserve their functionality, and returning to the fifth step. Otherwise, identifying groups of loops in the selected group connected to, but having different entry points and returning to the loop selection step.
    Type: Grant
    Filed: February 27, 2007
    Date of Patent: August 31, 2010
    Assignee: United States of America as represented by the Director, the National Security Agency
    Inventor: Francis S. Rimlinger
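For a single loop, the replacement the abstract describes amounts to turning the loop into a function that calls itself for the next iteration, with the loop-carried variables passed as arguments. The minimal Python sketch below uses Euclid's GCD loop as an assumed example; the graph analysis that finds loop groups and their entry points is not shown. CPython does not eliminate tail calls, so a loop with very many iterations would hit the interpreter's recursion limit in this form.

```python
def gcd_loop(a, b):
    while b != 0:
        a, b = b, a % b
    return a


def gcd_recursive(a, b):
    # The loop's exit test becomes the base case; the loop-carried
    # variables (a, b) become the arguments of the recursive call.
    if b == 0:
        return a
    return gcd_recursive(b, a % b)
```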
  • Publication number: 20100218196
    Abstract: Methods, apparatus and computer software product for source code optimization are provided. In an exemplary embodiment, a first custom computing apparatus is used to optimize the execution of source code on a second computing apparatus. In this embodiment, the first custom computing apparatus contains a memory, a storage medium and at least one processor with at least one multi-stage execution unit. The second computing apparatus contains at least two multi-stage execution units that allow for parallel execution of tasks. The first custom computing apparatus optimizes the code for parallelism, locality of operations and contiguity of memory accesses on the second computing apparatus.
    Type: Application
    Filed: April 16, 2010
    Publication date: August 26, 2010
    Inventors: Allen K. Leung, Benoit Meister, Nicolas T. Vasilache, David E. Wohlford, Cedric Bastoul, Peter Szilagyi, Richard A. Lethin
  • Publication number: 20100205592
    Abstract: A system and method for discovering a set of possible iteration sequences for a given loop in a software program is described, to transform the loop representation. In a program containing a loop, the loop is partitioned into a plurality of portions based on splitting criteria. Labels are associated with the portions, and an initial loop automaton is constructed that represents the loop iterations as a regular language over the labels corresponding to the portions in the program. Subsequences of the labels are analyzed to determine infeasibility of the subsequences permitted in the automaton. The automaton is refined by removing all infeasible subsequences to discover a set of possible iteration sequences in the loop. The resulting loop automaton is used in a subsequent program verification or analysis technique to find violations of correctness properties in programs.
    Type: Application
    Filed: February 8, 2010
    Publication date: August 12, 2010
    Applicant: NEC Laboratories America, Inc.
    Inventors: Sriram Sankaranarayanan, Aarti Gupta, Gogul Balakrishnan
  • Patent number: 7774766
    Abstract: Various embodiments of the present invention relate to methods and systems for optimizing an intermediate code in a compilation logic. The intermediate code is optimized by performing reassociation in software loops. The intermediate code includes at least one critical recurrence cycle. The performance of reassociation in software loops can reduce a critical recurrence cycle in them, which can speed up their execution. The subject method can include the determination of one or more critical recurrence cycles in a software loop. The method can also include the determination of at least one edge in a critical recurrence cycle, with respect to which reassociation can be performed, if one or more pre-determined criteria are met. The method can further include performing reassociation of a dependee and a dependent of an edge. In an embodiment, when one or more pre-determined criteria are met, the logic of the software loop is maintained after performing reassociation of the dependee and the dependent of the edge.
    Type: Grant
    Filed: September 29, 2005
    Date of Patent: August 10, 2010
    Assignee: Intel Corporation
    Inventors: Kalyan Muthukumar, Daniel M Lavery
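A small Python illustration of why reassociation can shorten a critical recurrence cycle. In `s = s + a[i] + b[i]`, Python (like C) evaluates `(s + a[i]) + b[i]`, so two dependent additions sit on the loop-carried cycle through `s`. Rewriting it as `s + (a[i] + b[i])` leaves only one, because `a[i] + b[i]` does not depend on the previous iteration and can be computed ahead of time. The cycle lengths are noted in comments rather than measured; Python itself gains nothing from this. With integers the results are identical, whereas for floating-point values reassociation can change rounding, which is one reason a compiler checks criteria before applying it.

```python
def sum_original(a, b):
    s = 0
    for i in range(len(a)):
        # (s + a[i]) + b[i]: both additions depend on s -> 2 adds on the cycle
        s = s + a[i] + b[i]
    return s


def sum_reassociated(a, b):
    s = 0
    for i in range(len(a)):
        # a[i] + b[i] is independent of s and can overlap with earlier
        # iterations; only the final addition is on the cycle -> 1 add
        s = s + (a[i] + b[i])
    return s
```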
  • Patent number: 7757222
    Abstract: Code is affine partitioned to generate affine partitioning mappings. Parallel code is generated based on the affine partitioning mappings. Generating the parallel code includes coalescing loops in the parallel code generated from the affine partitioning mappings to generate coalesced parallel code and optimizing the coalesced parallel code.
    Type: Grant
    Filed: September 30, 2005
    Date of Patent: July 13, 2010
    Assignee: Intel Corporation
    Inventors: Shih-wei Liao, Zhao Hui Du, Bu Qi Cheng, Gansha Wu, Guei-Yuan Lueh
  • Publication number: 20100175056
    Abstract: A compiler comprises an analysis unit that detects directives (options and pragmas) from a user to the compiler, an optimization unit that is made up of a processing unit (a global region allocation unit, a software pipelining unit, a loop unrolling unit, an “if” conversion unit, and a pair instruction generation unit) that performs individual optimization processing designated by options and pragmas from a user, following the directives and the like from the analysis unit, etc. The global region allocation unit performs optimization processing, following designation of the maximum data size of variables to be allocated to a global region, designation of variables to be allocated to the global region, and options and pragmas regarding designation of variables not to be allocated in the global region.
    Type: Application
    Filed: February 16, 2010
    Publication date: July 8, 2010
    Inventors: Hajime Ogawa, Taketo Heishi, Toshiyuki Sakata, Shuichi Takayama, Shohei Michimoto, Tomoo Hamada, Ryoko Miyachi
  • Patent number: 7747993
    Abstract: A method of ordering instructions. The method can include placing a first instruction that consumes a value of an object before a second instruction that produces the value of the object such that the first instruction is processed before the second instruction and a physical location is allocated to the value of the object upon processing the first instruction.
    Type: Grant
    Filed: December 30, 2004
    Date of Patent: June 29, 2010
    Assignee: Michigan Technological University
    Inventor: Soner Onder
  • Publication number: 20100146495
    Abstract: A computing system has an amount of shared cache, and performs runtime automatic parallelization wherein when a parallelized loop is encountered, a main thread shares the workload with at least one other non-main thread. A method for providing interprocedural prefetching includes compiling source code to produce compiled code having a main thread including a parallelized loop. Prior to the parallelized loop in the main thread, the main thread includes prefetching instructions for the at least one other non-main thread that shares the workload of the parallelized loop. As a result, the main thread prefetches data into the shared cache for use by the at least one other non-main thread.
    Type: Application
    Filed: December 10, 2008
    Publication date: June 10, 2010
    Applicant: SUN MICROSYSTEMS, INC.
    Inventors: Yonghong Song, Spiros Kalogeropulos, Partha P. Tirumalai
  • Patent number: 7721267
    Abstract: A software pipelined loop tracing method involves inhibiting the output of trace data at the start of a software pipelined loop (SPLOOP). If the SPLOOP is skipped, a skip is indicated in an output trace packet; if it is not skipped, the SPLOOP is indicated in the output trace packet at a cycle of its epilog state. An iteration count, SPLOOP information, and a position within the SPLOOP are maintained. A periodic SPLOOP marker (PerSP) coinciding with a sync point is output if the SPLOOP is active.
    Type: Grant
    Filed: May 16, 2006
    Date of Patent: May 18, 2010
    Assignee: Texas Instruments Incorporated
    Inventor: Manisha Agarwala
  • Publication number: 20100122069
    Abstract: A macroscalar processor architecture is described herein. In one embodiment, a processor receives instructions of a program loop having a vector block and a sequence block intended to be executed after the vector block, where the processor includes multiple slices and each of the slices is capable of executing an instruction of an iteration of the program loop substantially in parallel. For each iteration of the program loop, the processor executes an instruction of the sequence block using one of the slices while executing instructions of the vector block using a remainder of the slices substantially in parallel. Other methods and apparatuses are also described.
    Type: Application
    Filed: November 6, 2009
    Publication date: May 13, 2010
    Inventor: Jeffry E. Gonion
  • Patent number: 7712091
    Abstract: A method and system for optimizing the execution of a software loop is provided. The method involves the determination of an edge in a critical recurrence cycle in the software loop. The edge is a dependency link between two instructions and contains a dependee and a dependent. The dependee is an instruction that produces a result, and the dependent is an instruction that uses the result. The method further involves performing predicate promotion of at least one of the dependee and the dependent if one or more pre-determined conditions are met.
    Type: Grant
    Filed: September 30, 2005
    Date of Patent: May 4, 2010
    Assignee: Intel Corporation
    Inventors: Kalyan Muthukumar, Robyn A. Sampson, Daniel Lavery
  • Patent number: 7702856
    Abstract: The prefetch distance to be used by a prefetch instruction may not always be correctly calculated using compile-time information. In one embodiment, the present invention generates prefetch distance calculation code to dynamically calculate a prefetch distance used by a prefetch instruction at run-time.
    Type: Grant
    Filed: November 9, 2005
    Date of Patent: April 20, 2010
    Assignee: Intel Corporation
    Inventors: Rakesh Krishnaiyer, Somnath Ghosh, Abhay Kanhere
  • Patent number: 7698696
    Abstract: A compiler comprises an analysis unit that detects directives (options and pragmas) from a user to the compiler, an optimization unit that is made up of a processing unit (a global region allocation unit, a software pipelining unit, a loop unrolling unit, an “if” conversion unit, and a pair instruction generation unit) that performs individual optimization processing designated by options and pragmas from a user, following the directives and the like from the analysis unit, etc. The global region allocation unit performs optimization processing, following designation of the maximum data size of variables to be allocated to a global region, designation of variables to be allocated to the global region, and options and pragmas regarding designation of variables not to be allocated in the global region.
    Type: Grant
    Filed: June 30, 2003
    Date of Patent: April 13, 2010
    Assignee: Panasonic Corporation
    Inventors: Hajime Ogawa, Taketo Heishi, Toshiyuki Sakata, Shuichi Takayama, Shohei Michimoto, Tomoo Hamada, Ryoko Miyachi
  • Patent number: 7689980
    Abstract: Linear transformations of statements in code are performed to generate linear expressions associated with the statements. Parallel code is generated using the linear expressions. Generating the parallel code includes splitting the computation-space of the statements into intervals and generating parallel code for the intervals.
    Type: Grant
    Filed: September 30, 2005
    Date of Patent: March 30, 2010
    Assignee: Intel Corporation
    Inventors: Zhao Hui Du, Shih-wei Liao, Gansha Wu, Guei-Yuan Lueh
  • Publication number: 20100070956
    Abstract: Methods, apparatus and computer software product for source code optimization are provided. In an exemplary embodiment, a first custom computing apparatus is used to optimize the execution of source code on a second computing apparatus. In this embodiment, the first custom computing apparatus contains a memory, a storage medium and at least one processor with at least one multi-stage execution unit. The second computing apparatus contains at least two multi-stage execution units that allow for parallel execution of tasks. The first custom computing apparatus optimizes the code for both parallelism and locality of operations on the second computing apparatus. This Abstract is provided for the sole purpose of complying with the Abstract requirement rules. This Abstract is submitted with the explicit understanding that it will not be used to interpret or to limit the scope or the meaning of the claims.
    Type: Application
    Filed: September 16, 2009
    Publication date: March 18, 2010
    Inventors: Allen Leung, Nicolas T. Vasilache, Benoit Meister, Richard A. Lethin
  • Patent number: 7669194
    Abstract: A mechanism for minimizing effective memory latency without unnecessary cost through fine-grained software-directed data prefetching using integrated high-level and low-level code analysis and optimizations is provided. The mechanism identifies and classifies streams, identifies data that is most likely to incur a cache miss, exploits effective hardware prefetching to determine the proper number of streams to be prefetched, exploits effective data prefetching on different types of streams in order to eliminate redundant prefetching and avoid cache pollution, and uses high-level transformations with integrated lower level cost analysis in the instruction scheduler to schedule prefetch instructions effectively.
    Type: Grant
    Filed: August 26, 2004
    Date of Patent: February 23, 2010
    Assignee: International Business Machines Corporation
    Inventors: Roch Georges Archambault, Robert James Blainey, Yaoqing Gao, Allan Russell Martin, James Lawrence McInnes, Francis Patrick O'Connell
  • Patent number: 7665079
    Abstract: It is one object of the present invention to provide a program execution method that enables greater optimization. A program execution apparatus according to the present invention transfers execution from an interpreter process to a compiled code process during the execution of a method. If the transfer point can be moved to the top of a loop without causing a problem, it is moved there. When the transfer point is located inside a loop, a point that post-dominates the top of the loop and the transfer point is copied to a position immediately preceding the loop. Information for generating recalculation code is then provided for the transfer point, and a recalculation is performed.
    Type: Grant
    Filed: November 8, 2000
    Date of Patent: February 16, 2010
    Assignee: International Business Machines Corporation
    Inventors: Toshiaki Yasue, Kazunori Ogata, Kazuaki Ishizaki, Hideaki Komatsu
  • Patent number: 7665078
    Abstract: A method for optimizing a code sequence by tuning the representations of an instruction set based on the frequency of operations performed by the code sequence. For example, the number of bit symbols used to represent a code sequence may be reduced using the present invention.
    Type: Grant
    Filed: August 21, 2003
    Date of Patent: February 16, 2010
    Assignee: Gateway, Inc.
    Inventor: Frank Liebenow
  • Publication number: 20100023700
    Abstract: Reducing coherency problems in a data processing system is provided. Source code that is to be compiled is received and analyzed to identify at least one of a plurality of loops that contain a memory reference. A determination is made as to whether the memory reference is an access to a global memory that should be handled by a direct buffer. Responsive to an indication that the memory reference is an access to the global memory that should be handled by the direct buffer, the memory reference is marked for direct buffer transformation. The direct buffer transformation is then applied to the memory reference.
    Type: Application
    Filed: July 22, 2008
    Publication date: January 28, 2010
    Applicant: International Business Machines Corporation
    Inventors: Tong Chen, John K. O'Brien, Tao Zhang
  • Publication number: 20100023932
    Abstract: A mechanism for efficient software cache accessing with handle reuse is provided. The mechanism groups references in source code into a reference stream with the reference stream having a size equal to or less than a size of a software cache line. The source code is transformed into optimized code by modifying the source code to include code for performing at most two cache lookup operations for the reference stream to obtain two cache line handles. Moreover, the transformation involves inserting code to resolve references in the reference stream based on the two cache line handles. The optimized code may be output for generation of executable code.
    Type: Application
    Filed: July 22, 2008
    Publication date: January 28, 2010
    Applicant: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Marc Gonzalez Tallada, John K. O'Brien
  • Patent number: 7647577
    Abstract: Provided are methods for transforming a flowchart into an equivalent tree diagram, methods for transforming an equivalent tree diagram into a flowchart, methods for verifying the reorganization of a flowchart, methods for editing a flowchart, methods for creating a flowchart, and a flowchart editor. A flowchart includes one or more logic structures and one or more processing activities in said one or more logic structures. The method for transforming a flowchart into an equivalent tree diagram comprises: traversing said flowchart; transforming said one or more logic structures in said flowchart into one or more branching nodes in said tree diagram; and transforming one or more processing activities in said logic structures of said flowchart into one or more leaf nodes below the corresponding branching nodes in said tree diagram. Further, editing a flowchart and verifying the reorganization of a flowchart are performed using an equivalent tree diagram.
    Type: Grant
    Filed: May 27, 2005
    Date of Patent: January 12, 2010
    Assignee: International Business Machines Corporation
    Inventors: Jian Wang, Jun Zhu, Sheng Ye, Jing Li, Hai Qi Liang, Ying Liu, Ying Nan Zuo
  • Publication number: 20090328020
    Abstract: Interface optimization is provided using a closed system in which all the individual software components in the system are known to the compiler at a single point in time. This knowledge enables significant opportunities to optimize the implementation of interfaces on a set of implemented objects. When code is compiled, because the compiler knows the full list of interfaces and the objects which implement the interfaces, it can improve execution and working set (i.e., recently referenced pages in a program's virtual address space) when implementing the interfaces on objects. This improvement may be realized by reducing the size of interface lookup tables which map each interface to the object types which implement that particular interface.
    Type: Application
    Filed: June 28, 2008
    Publication date: December 31, 2009
    Applicant: Microsoft Corporation
    Inventors: Jeffrey E. Stall, Jonathon Michael Stall
  • Publication number: 20090328021
    Abstract: In one embodiment of the invention, a method is provided for fusing a first loop nested in a first IF statement with a second loop nested in a second IF statement without using modified and referenced (mod-ref) information to determine whether certain conditional statements in the IF statements retain variable values.
    Type: Application
    Filed: June 30, 2008
    Publication date: December 31, 2009
    Inventors: John L. Ng, Robert Cox, Dmitry V. Budanov
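The reassociation idea in the Intel abstract filed September 29, 2005 (Muthukumar and Lavery) can be illustrated with a minimal sketch, assuming a simple accumulation loop over integer lists; the function names are hypothetical and this is not the patented algorithm. In the original form the loop-carried value passes through two dependent additions per iteration; after reassociation only one addition sits on the recurrence, so the critical recurrence cycle is shorter.

```python
def accumulate_original(a, b):
    # s = (s + a[i]) + b[i]: both additions depend on the previous
    # value of s, so the loop-carried recurrence contains two adds.
    s = 0
    for x, y in zip(a, b):
        s = (s + x) + y
    return s


def accumulate_reassociated(a, b):
    # s = s + (a[i] + b[i]): the inner add does not depend on s and can
    # be scheduled early; only one add remains on the recurrence.
    s = 0
    for x, y in zip(a, b):
        s = s + (x + y)
    return s


a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
print(accumulate_original(a, b), accumulate_reassociated(a, b))  # 110 110
```

Integers are used deliberately: for floating-point data, reordering additions can change rounding, so a compiler would typically have to confirm that such a change is acceptable before reassociating.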
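Patent 7757222 mentions coalescing loops in generated parallel code. As a rough sketch of loop coalescing in general (not the affine-partitioning method of the patent), a two-level nest over an n x m space can be rewritten as one loop over n * m iterations, recovering the original indices with division and modulo; the function names are illustrative.

```python
def nested_pairs(n, m):
    # Original two-level loop nest over an n x m iteration space.
    return [(i, j) for i in range(n) for j in range(m)]


def coalesced_pairs(n, m):
    # Coalesced form: a single loop with one trip count (n * m); the
    # original (i, j) indices are recovered from the flat index k.
    pairs = []
    for k in range(n * m):
        i, j = divmod(k, m)
        pairs.append((i, j))
    return pairs


print(coalesced_pairs(2, 3))  # [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
```

A single flat loop is easier to divide evenly among threads than a nest whose outer trip count may be small, which is one common motivation for coalescing.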
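Patent 7689980 describes splitting the computation space of statements into intervals and generating parallel code for each interval. A simplified sketch, assuming a one-dimensional iteration space split into contiguous, nearly equal chunks (the patent derives its intervals from linear expressions, which is not modeled here); `split_intervals` and `parallel_sum_of_squares` are hypothetical helpers.

```python
from concurrent.futures import ThreadPoolExecutor


def split_intervals(lo, hi, parts):
    """Split the half-open range [lo, hi) into `parts` contiguous
    intervals whose sizes differ by at most one."""
    if parts <= 0:
        raise ValueError("parts must be positive")
    total = max(0, hi - lo)
    base, extra = divmod(total, parts)
    intervals = []
    start = lo
    for p in range(parts):
        # The first `extra` intervals absorb the remainder.
        size = base + (1 if p < extra else 0)
        intervals.append((start, start + size))
        start += size
    return intervals


def parallel_sum_of_squares(n, workers=3):
    # Each worker runs the loop body over its own interval only.
    def work(interval):
        start, end = interval
        return sum(i * i for i in range(start, end))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(work, split_intervals(0, n, workers)))


print(split_intervals(0, 10, 3))  # [(0, 4), (4, 7), (7, 10)]
```

In CPython the threads mainly illustrate the structure of per-interval code; actual speedup would need processes or a language without a global interpreter lock.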
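Patent 7702856 generates code that computes the prefetch distance at run time rather than fixing it at compile time. The sketch below assumes a common textbook heuristic, distance = ceil(miss latency / cycles per iteration), with the per-iteration cost supplied as a run-time value; the formula and function names are assumptions, not the patent's calculation.

```python
import math


def prefetch_distance(miss_latency_cycles, cycles_per_iteration):
    # Assumed heuristic: prefetch far enough ahead that the data arrives
    # roughly when the loop reaches it, and never less than one iteration.
    if cycles_per_iteration <= 0:
        raise ValueError("cycles_per_iteration must be positive")
    return max(1, math.ceil(miss_latency_cycles / cycles_per_iteration))


def prefetch_plan(n, distance):
    # For iteration i, the element to prefetch is i + distance, or None
    # once that index would fall past the end of the array.
    return [i + distance if i + distance < n else None for i in range(n)]


d = prefetch_distance(200, 30)
print(d)  # 7
print(prefetch_plan(10, d))  # [7, 8, 9, None, None, None, None, None, None, None]
```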
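Publication 20100023932 relies on the fact that a reference stream no larger than one software cache line needs at most two cache lookups. A small sketch of why, assuming byte addresses and a hypothetical 128-byte line: if the lowest and highest addresses differ by less than the line size, they fall in the same line or in adjacent lines, and every reference in between lies in one of those two lines.

```python
def resolve_stream(addresses, line_size=128):
    """Resolve each address to (line index, offset) using at most two
    line 'handles'; line_size is an assumed software cache line size."""
    lo, hi = min(addresses), max(addresses)
    if hi - lo >= line_size:
        raise ValueError("reference stream spans more than one line size")
    # Only the lines holding the lowest and highest addresses are looked up.
    first_line, last_line = lo // line_size, hi // line_size
    handles = {first_line, last_line}  # one or two lookups
    resolved = []
    for addr in addresses:
        line = addr // line_size
        # Invariant: every reference falls in one of the looked-up lines.
        assert line in handles
        resolved.append((line, addr % line_size))
    return len(handles), resolved


print(resolve_stream([120, 124, 128, 132]))
# (2, [(0, 120), (0, 124), (1, 0), (1, 4)])
```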
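Publication 20090328021 concerns fusing loops nested in two IF statements. The sketch shows only the effect of such a fusion on a toy example, assuming the condition is not modified by the first loop and each iteration of the second loop reads only what the same iteration of the first loop wrote; the patent's legality analysis (which avoids mod-ref information) is not modeled, and the names are hypothetical.

```python
def unfused(flag, a):
    b = [0] * len(a)
    c = [0] * len(a)
    if flag:
        for i in range(len(a)):
            b[i] = a[i] * 2
    if flag:  # same condition, unchanged by the first loop
        for i in range(len(a)):
            c[i] = b[i] + 1
    return b, c


def fused(flag, a):
    b = [0] * len(a)
    c = [0] * len(a)
    # One IF and one loop: legal here because `flag` is not written in
    # the first loop and c[i] reads only b[i] from the same iteration.
    if flag:
        for i in range(len(a)):
            b[i] = a[i] * 2
            c[i] = b[i] + 1
    return b, c


print(fused(True, [1, 2, 3]))  # ([2, 4, 6], [3, 5, 7])
```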
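Patent 7665078 tunes instruction representations to the frequency of operations so that fewer bits are needed. The abstract does not name an encoding scheme; the sketch below swaps in standard Huffman coding of opcodes as one familiar frequency-based technique and compares it with a fixed-width encoding. The opcode names and counts are made up for illustration.

```python
import heapq
import math
from collections import Counter


def huffman_code_lengths(ops):
    """Return {opcode: code length in bits} for a Huffman code built
    from the opcode frequencies in `ops`."""
    freq = Counter(ops)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # Heap entries: (weight, unique tie-breaker, {opcode: depth}).
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


ops = ["load"] * 5 + ["add"] * 2 + ["store", "branch"]
lengths = huffman_code_lengths(ops)
encoded_bits = sum(lengths[op] for op in ops)
fixed_bits = max(1, math.ceil(math.log2(len(set(ops))))) * len(ops)
print(encoded_bits, fixed_bits)  # 15 18
```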