Including Loop Patents (Class 717/160)
  • Patent number: 6721943
    Abstract: In general, the malloc-combining transformation optimization during compile-time of a source program engaged in dynamically constructing multi-dimensional arrays provides an effective method of improving cache locality by combining qualified malloc and free/realloc calls found in counted loops into a single system call, hoisting the single call out, and placing it immediately before the beginning of the counted loops. The improved cache locality that results from applying the malloc-combining optimization allows for prefetching array pointers and data elements of the dynamic arrays as if the dynamic arrays were static.
    Type: Grant
    Filed: March 30, 2001
    Date of Patent: April 13, 2004
    Assignee: Intel Corporation
    Inventors: Rakesh Krishnaiyer, Somnath Ghosh, Wei Li
  • Patent number: 6708331
    Abstract: The invention provides a scalable, automated, network friendly method for building parallel applications from embarrassingly parallel serial programs. Briefly, the steps of an exemplary method in this invention are as follows: First, the application loops with significant potential parallelism are identified. Second, from the set of loops identified, those loops which can statically be shown to not be parallelizable are disqualified. Next, the program is transformed into a parallel form in which the remaining identified loops are optimistically parallelized and packaged into per-iteration functions. Then, shared memory in the functions is relocated to a shared memory section available to all computers and references to the shared memory in the source code are transformed into indirect accesses. Finally, the per-iteration functions are spawned onto multiple computers, at run-time, where each computer is given a range of iterations.
    Type: Grant
    Filed: October 6, 2000
    Date of Patent: March 16, 2004
    Inventor: Leon Schwartz
  • Publication number: 20040015932
    Abstract: A method of converting a program, such as a graphic database, representing the geometry of a workpiece, into numeric control code in order to program a numeric machine control to operate a machine, such as a router. The machine control receives and processes the program according to a set of machine-specific attributes, including axis configuration and worktable size and layout. Operational attributes, such as feed rate and tool assignments, are specified. Optionally, multiple components or workpieces are nested into a cluster, and available off-fall sheets are matched to the cluster, so as to maximize the efficient use of material. Numeric code is then generated to permit the machine control to operate the machine.
    Type: Application
    Filed: May 22, 2001
    Publication date: January 22, 2004
    Inventor: Kenneth J. Susnjara
  • Publication number: 20040015681
    Abstract: A process and associated programs are described for structuring program code, comprising the steps of: procuring a syntax tree representative of an input program code; replacing at least some jump statements in the input program code by one-shot loops by introducing loop structure nodes directly in the syntax tree to depend from a common ancestor of the jump statement and the target thereof, the basic blocks in the same branches of the syntax tree as the jump statement and its target and the branches in between being moved to depend from the introduced loop structure node, the jump statement being replaced by a break or continue statement so that the syntax tree corresponds to an output program code having functionality substantially equivalent to that of the input program code.
    Type: Application
    Filed: April 29, 2003
    Publication date: January 22, 2004
    Applicant: Hewlett-Packard Development Company L.P.
    Inventor: Sylvain Reynaud
  • Publication number: 20040015933
    Abstract: Methods and apparatuses for backlash compensation. A dynamics inversion compensation scheme is designed for control of nonlinear discrete-time systems with input backlash. The techniques of this disclosure extend the dynamic inversion technique to discrete-time systems by using a filtered prediction, and show how to use a neural network (NN) for inverting the backlash nonlinearity in the feedforward path. The techniques provide a general procedure for using NN to determine the dynamics preinverse of an invertible discrete time dynamical system. A discrete-time tuning algorithm is given for the NN weights so that the backlash compensation scheme guarantees bounded tracking and backlash errors, and also bounded parameter estimates. A rigorous proof of stability and performance is given and a simulation example verifies performance. Unlike standard discrete-time adaptive control techniques, no certainty equivalence (CE) or linear-in-the-parameters (LIP) assumptions are needed.
    Type: Application
    Filed: October 2, 2001
    Publication date: January 22, 2004
    Applicant: Board of Regents, The University of Texas System
    Inventors: Javier Campos, Frank L. Lewis
  • Publication number: 20040015915
    Abstract: A system and method for processing a variable looping statement into a constant looping statement to enable loop unrolling. A lower bound and an upper bound of the loop index within the variable looping statement are determined. A constant looping statement is then formed using the lower bound and upper bound to define a range over which the loop index varies within the constant looping statement. The constant looping statement further includes a conditional statement that reflects conditions in the initial expression and/or the exit expression of the variable looping statement. The conditional statement controls execution of the body of the generated constant looping statement, which includes the body from the original variable looping statement. Loop unrolling may then be performed on the generated constant looping statement.
    Type: Application
    Filed: May 8, 2001
    Publication date: January 22, 2004
    Applicant: SUN MICROSYSTEMS, INC.
    Inventors: William K. Lam, David S. Allison
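    Sketch: a minimal illustration of the conversion described in the abstract above, not the applicants' implementation. The bounds LOWER_BOUND and UPPER_BOUND and the helper names are hypothetical; the sketch assumes the run-time start and stop values always fall within those bounds.

```python
# Hypothetical bounds on the loop index, assumed known ahead of time.
LOWER_BOUND = 0
UPPER_BOUND = 8


def variable_loop(start, stop, body):
    """Original form: the trip count depends on run-time values."""
    for i in range(start, stop):
        body(i)


def constant_loop(start, stop, body):
    """Converted form: fixed iteration range with a guarded body.

    The guard reproduces the initial and exit conditions of the
    variable loop, so the fixed-range loop can be unrolled freely.
    """
    for i in range(LOWER_BOUND, UPPER_BOUND):
        if start <= i < stop:
            body(i)


def collect(loop, start, stop):
    """Run a loop and record the index values its body sees."""
    seen = []
    loop(start, stop, seen.append)
    return seen
```

    Both forms visit the same index values as long as start and stop stay inside the assumed bounds; the constant form trades some wasted iterations for a trip count that is fixed.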
  • Publication number: 20040015934
    Abstract: A method is provided for processing nested loops that include a modulo-scheduled inner loop within an outer loop. The nested loop is scheduled to execute the epilog stage of the inner loop for a given iteration of the outer loop with the prolog stage of the inner loop for the next iteration of the outer loop. For one embodiment of the invention, this is accomplished by initializing an epilog counter for the inner loop to a value that bypasses draining the software pipeline. This causes the processor to exit the inner loop before it begins draining the inner loop pipeline. The inner loop pipeline is drained during the next iteration of the outer loop, while the inner loop pipeline fills for the next iteration of the outer loop.
    Type: Application
    Filed: May 9, 2002
    Publication date: January 22, 2004
    Inventors: Kalyan Muthukumar, Gautam B. Doshi
  • Publication number: 20040003386
    Abstract: The present invention is directed to a transformation technique for nested loops. A virtual iteration space may be determined based on an unroll factor (UF). The virtual iteration space, which includes the actual iteration space, is formed such that the virtual iteration space may be evenly divided by a selected UF. Once the virtual iteration space has been calculated or determined, the virtual iteration space is “cut” into regular portions by one or more unroll factors. Portions of the actual iteration space which do not fill the cut portions of the virtual iteration space or which fall outside these cuts which have been evenly divided by the unroll factor form a residue which is calculated. The portions of the actual iteration space which remain are also evenly divided by the unroll factor(s). An outer loop for this remaining portion of the actual iteration space is then unrolled. This unrolled portion forms a perfect nested loop.
    Type: Application
    Filed: November 14, 2002
    Publication date: January 1, 2004
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Arie Tal, Robert J. Blainey
  • Publication number: 20040003381
    Abstract: In a compiler, a source program analysis unit forms an intermediate program by analyzing a source program. A vectorization unit extracts logically vectorizable loops from the intermediate program, gives a SIMD expression to each loop regardless of whether or not the corresponding SIMD instruction exists, and vectorizes all the loops. A vector operation expansion unit performs unrolling expansion of a portion with no corresponding SIMD instruction, selection of an optimum vector length, etc. An instruction scheduling unit optimizes the intermediate program and assigns instructions. A code generation unit forms an object program from the intermediate program.
    Type: Application
    Filed: June 19, 2003
    Publication date: January 1, 2004
    Applicant: FUJITSU LIMITED
    Inventors: Kiyofumi Suzuki, Masaki Aoki, Hiroaki Sato
  • Patent number: 6671878
    Abstract: Disclosed herein is an instruction set scheduling system for scheduling instruction sets in a pipelined processing system. In particular, the scheduling system includes a binary search technique for ascertaining the minimum acceptable iteration interval amongst a range of possible iteration intervals for use by the modulo scheduler.
    Type: Grant
    Filed: March 24, 2000
    Date of Patent: December 30, 2003
    Inventor: Brian E. Bliss
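    Sketch: a minimal illustration of a binary search for the smallest acceptable iteration interval, as the abstract above describes in general terms. The predicate can_schedule is hypothetical and stands in for a modulo-scheduling attempt; the sketch assumes feasibility is monotonic (if an interval works, every larger one does too), which a real scheduler may not guarantee.

```python
def min_iteration_interval(lo, hi, can_schedule):
    """Return the smallest ii in [lo, hi] accepted by can_schedule, or None.

    can_schedule(ii) is a hypothetical callback reporting whether the
    loop can be modulo-scheduled at iteration interval ii.
    """
    if lo > hi or not can_schedule(hi):
        return None  # no acceptable interval in the range
    while lo < hi:
        mid = (lo + hi) // 2
        if can_schedule(mid):
            hi = mid       # mid works; look for something smaller
        else:
            lo = mid + 1   # mid fails; the answer must be larger
    return lo
```

    Compared with trying each interval in turn, the search needs a number of scheduling attempts that grows with the logarithm of the range rather than its size.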
  • Patent number: 6665864
    Abstract: The present invention eliminates redundant array range checks. A two-phased check is performed, namely a wide range check is performed by combining a plurality of array range checks, and a strict range check is performed only if the wide range check is unsuccessful, so as to reduce the number of range checks at execution time and allow execution at high speed. For instance, it is possible with a processor such as PowerPC, by using a flag, to invalidate a code for performing an array range check at high speed without increasing a code size. Consequently, the number of array range checks to be executed can be reduced so as to allow execution at high speed. Also, for instance, a plurality of array range checks can be combined without considering existence of instructions which cause a side effect. Consequently, the number of array range checks to be executed can be reduced so as to allow execution at high speed.
    Type: Grant
    Filed: December 28, 1999
    Date of Patent: December 16, 2003
    Assignee: International Business Machines Corporation
    Inventors: Motohiro Kawahito, Hideaki Komatsu, Toshiaki Yasue
  • Patent number: 6651246
    Abstract: Loop allocation for optimizing compilers includes the generation of a program dependence graph for a source code segment. Control dependence graph representations of the nested loops, from innermost to outermost, are generated and data dependence graph representations are generated for each level of nested loop as constrained by the control dependence graph. An interference graph is generated with the nodes of the data dependence graph. Weights are generated for the edges of the interference graph reflecting the affinity between statements represented by the nodes joined by the edges. Nodes in the interference graph are given weights reflecting resource usage by the statements associated with the nodes. The interference graph is partitioned using a profitability test based on the weights of edges and nodes and on a correctness test based on the reachability of nodes in the data dependence graph. Code is emitted based on the partitioned interference graph.
    Type: Grant
    Filed: May 18, 2000
    Date of Patent: November 18, 2003
    Assignee: International Business Machines Corporation
    Inventors: Roch Georges Archambault, Robert James Blainey
  • Publication number: 20030204840
    Abstract: An apparatus and method for one-pass profiling to concurrently generate a frequency profile and a stride profile to enable pre-fetching of irregular program data are described. In one embodiment, the method includes the selective generation of stride profile information according to partially generated frequency profile information to concurrently form a stride profile and a frequency profile during execution of a user program instrumented during a single profiling pass. Once the stride profile and frequency profile are generated, prefetch instructions are inserted into the user program utilizing the stride profile and the frequency profile. In one embodiment, the present invention utilizes profiling to identify regular stride patterns in irregular program code, which is referred to herein as stride profiling.
    Type: Application
    Filed: April 30, 2002
    Publication date: October 30, 2003
    Inventor: Youfeng Wu
  • Publication number: 20030200538
    Abstract: The method, system and programming language of the present invention, provide for program constructs, such as commands, declarations, variables, and statements, which have been developed to describe computations for an adaptive computing architecture, rather than provide instructions to a sequential microprocessor or DSP architecture. The invention includes program constructs that permit a programmer to define data flow graphs in software, to provide for operations to be executed in parallel, and to reference variable states and historical values in a straightforward manner. The preferred method, system, and programming language also includes mechanisms for efficiently referencing array variables, and enables the programmer to succinctly describe the direct data flow among matrices, nodes, and other configurations of computational elements and computational units forming the adaptive computing architecture.
    Type: Application
    Filed: April 23, 2002
    Publication date: October 23, 2003
    Applicant: QuickSilver Technology, Inc.
    Inventors: W. H. Carl Ebeling, Eugene B. Hogenauer
  • Patent number: 6634024
    Abstract: The present invention integrates data prefetching into a modulo scheduling technique to provide for the generation of assembly code having improved performance. Modulo scheduling can produce optimal steady state code for many important cases by sufficiently separating defining instructions (producers) from using instructions (consumers), thereby avoiding machine stall cycles and simultaneously maximizing processor utilization. Integrating data prefetching within modulo scheduling yields high performance assembly code by prefetching data from memory while at the same time using modulo scheduling to efficiently schedule the remaining operations. The invention integrates data prefetching into modulo scheduling by postponing prefetch insertion until after modulo scheduling is complete. Actual insertion of the prefetch instructions occurs in a postpass after the generation of appropriate prologue-kernel-epilogue code.
    Type: Grant
    Filed: June 27, 2001
    Date of Patent: October 14, 2003
    Assignee: Sun Microsystems, Inc.
    Inventors: Partha Pal Tirumalai, Rajagopalan Mahadevan
  • Patent number: 6631465
    Abstract: A method and apparatus that provides instruction re-alignment using a branch on a falsehood of a qualifying predicate. A complementary predicate related to a qualifying predicate is determined to be available. Instructions are re-aligned using a branch on a falsehood of the qualifying predicate if the complementary predicate is not available. Thus, a complementary predicate does not have to be generated to re-align instructions if no complementary predicate is available for the qualifying predicate.
    Type: Grant
    Filed: June 30, 2000
    Date of Patent: October 7, 2003
    Assignee: Intel Corporation
    Inventors: William Y. Chen, Dong-Yuan Chen
  • Publication number: 20030188302
    Abstract: A method and apparatus for detecting and decomposing component loops in a logic design is described. The invention first detects any component loops when the compiler schedules the processing order of the combinational logic components in the digital circuit design. To identify component loops, the compiler levelizes the design and sorts the combinational logic components, making sure that no true combinational logic loops exist. If the sorting fails, a component loop exists, and the compiler identifies such components and selects one or more of the components to be split. Next, the invention corrects the component loops by splitting a component into multiple sub-components. By splitting a component into multiple sub-components, the output of the split component no longer provides input to another component, and hence, the component loop is broken.
    Type: Application
    Filed: March 29, 2002
    Publication date: October 2, 2003
    Inventors: Liang T. Chen, Jeffrey Broughton, Derek Pappas
  • Patent number: 6622301
    Abstract: When converting a sequential execution source program into a parallel program to be executed by respective processors (nodes) of a distributed shared memory parallel computer, a compiler computer transforms the source program to increase a processing speed of the parallel program. First, a kernel loop having a longest sequential execution time is detected in the source program. Next, a data access pattern equal to that of the kernel loop is reproduced to generate a control code to control first touch data distribution. The first touch control code generated is inserted in the parallel program.
    Type: Grant
    Filed: February 8, 2000
    Date of Patent: September 16, 2003
    Assignee: Hitachi, Ltd.
    Inventors: Takashi Hirooka, Hiroshi Ohta, Takayoshi Iitsuka, Sumio Kikuchi
  • Publication number: 20030167457
    Abstract: The present invention provides a system and method for providing a graphic representation of code characteristics and optimizations performed. In architecture, the system includes an optimizer display tool that indicates at least one instruction characteristic in a program and comprises logic that acquires a block of code in the program, and logic for analyzing the block of code for the at least one instruction characteristic. The optimizer display tool further comprises logic for generating a unique graphical indicator for the at least one instruction characteristic, and logic for displaying the unique graphical indicator with the block of code to indicate that the at least one instruction characteristic is present in the block of code.
    Type: Application
    Filed: March 1, 2002
    Publication date: September 4, 2003
    Inventors: Carol L. Thompson, Cary A. Coutant
  • Patent number: 6615403
    Abstract: The present invention provides a mechanism for implementing compare speculation in software pipelined loops. A data dependency graph (DDG) is generated for a loop that includes a control compare instruction, a compare instruction and a non-speculative instruction that depends directly or indirectly on the compare instruction. A loop-carried edge between the control compare instruction and the compare instruction is replaced by a loop-carried edge between the control compare instruction and the non-speculative instruction. If the compare instruction is speculated when the loop is modulo-scheduled, any load instruction that depends on the compare is converted to a speculative load, and a loop-carried edge is added between the control compare and a check instruction associated with the speculative load. A loop-independent edge is also added between the check instruction and the non-speculative instruction if the non-speculative instruction also depends on the load.
    Type: Grant
    Filed: June 30, 2000
    Date of Patent: September 2, 2003
    Assignee: Intel Corporation
    Inventors: Kalyan Muthukumar, David A Helder
  • Publication number: 20030115579
    Abstract: An embodiment of the present invention provides an optimizer for optimizing source code to generate optimized source code having instructions for instructing a central processing unit (CPU) to iteratively compute values for a primary recurrence element. A computer programmed loop for computing the primary recurrence element and subsequent recurrence elements is an example of a case involving iteratively computing the primary recurrence element. The CPU is operatively coupled to fast operating memory (FOM) and operatively coupled to slow operating memory (SOM). SOM stores the generated optimized source code. The optimized source code includes instructions for instructing said CPU to store a computed value of the primary recurrence element in a storage location of FOM. The instructions also include instructions to consign the computed value of the primary recurrence element from the storage location to another storage location of the FOM.
    Type: Application
    Filed: December 5, 2002
    Publication date: June 19, 2003
    Applicant: International Business Machines Corporation
    Inventors: Roch Georges Archambault, Robert James Blainey, Charles Brian Hall, Yingwei Zhang
  • Publication number: 20030097652
    Abstract: A profile-based loop optimizer generates an execution frequency table for each loop that gives more detailed profile data that allows making a more intelligent decision regarding if and how to optimize each loop in the computer program. The execution frequency table contains entries that correlate a number of times a loop is executed each time the loop is entered with a count of the occurrences of each number during the execution of an instrumented instruction stream. The execution frequency table is used to determine whether there is one dominant mode that appears in the profile data, and if so, optimizes the loop according to the dominant mode. The optimizer may perform optimizations by peeling a loop, by unrolling a loop, and by performing both peeling and unrolling on a loop according to the profile data in the execution frequency table for the loop.
    Type: Application
    Filed: November 19, 2001
    Publication date: May 22, 2003
    Applicant: International Business Machines Corporation
    Inventors: Robert Ralph Roediger, William Jon Schmidt
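    Sketch: a minimal illustration of an execution frequency table and a dominant-mode test in the spirit of the abstract above. The function names and the 0.8 threshold are hypothetical; the abstract does not state how dominance is decided.

```python
from collections import Counter


def execution_frequency_table(trip_counts):
    """Map each observed trip count to how often it occurred.

    trip_counts holds one entry per loop entry: the number of
    iterations executed on that entry (from an instrumented run).
    """
    return Counter(trip_counts)


def dominant_mode(table, threshold=0.8):
    """Return the trip count that dominates the profile, or None.

    threshold is an assumed cutoff: a mode counts as dominant when it
    accounts for at least that share of all loop entries.
    """
    total = sum(table.values())
    if total == 0:
        return None
    trips, occurrences = table.most_common(1)[0]
    return trips if occurrences / total >= threshold else None
```

    An optimizer could, for example, peel or unroll by the returned trip count when a dominant mode exists and leave the loop alone when the result is None.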
  • Publication number: 20030097653
    Abstract: The present invention relates to parallel loop transformation methods for race detection during an execution of parallel programs which is one of the debugging methods for parallel loop programs. Using the information obtained from a static analysis of parallel loop bodies, the monitoring time for race detection is improved by transforming the loop bodies so that only the iterations necessary for race detection are dynamically selected during the execution. Specifically, in comparison to the conventional monitoring methods, which typically consume a long time since they monitor the full iterations for each parallel loop in parallel loop programs, the present invention can significantly reduce the execution time by monitoring two times of the execution paths irrespective of the parallelism of each parallel loop. As a result, the present invention allows a convenient race detection of parallel loop programs, thereby making the race detection more practical.
    Type: Application
    Filed: December 26, 2001
    Publication date: May 22, 2003
    Inventors: Jeong Si Kim, Dong Soo Han, Chan Su Yu
  • Patent number: 6567976
    Abstract: A compiler for compiling source code whereby the compiled source code is optimized by performing outer loop unrolling (a generalization of “unroll and jam”) on selected loop nests. The present invention allows any arbitrarily deep loop nests with non-varying loop bounds to be properly unrolled even in the presence of imperfectly nested code. This is accomplished for two-deep loop nests by transforming the code into multiple adjacent loop nests. In the transformed code, the imperfect code is isolated so that one of the adjacent loop nests has none, and thus can be unrolled and jammed. For three-deep or greater loop nests, the process is repeated recursively from the outer-most loop. The present invention also allows outer loop unrolling for two-deep loop nests with convex bounds, even with the presence of imperfectly nested code. This is accomplished by identifying strips of code which do not contain imperfectly nested code. An unroll and jam operation is executed for the identified strips.
    Type: Grant
    Filed: March 20, 1997
    Date of Patent: May 20, 2003
    Assignee: Silicon Graphics, Inc.
    Inventor: Michael Wolf
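    Sketch: a minimal illustration of plain unroll and jam on a perfectly nested two-deep loop, the basic operation the abstract above generalizes. It does not attempt the patent's handling of imperfectly nested code; the matrix-vector example and function names are hypothetical.

```python
def matvec_reference(a, x):
    """Straightforward two-deep loop nest: y = a * x."""
    n, m = len(a), len(x)
    y = [0] * n
    for i in range(n):
        for j in range(m):
            y[i] += a[i][j] * x[j]
    return y


def matvec_unroll_and_jam(a, x):
    """Same computation with the outer loop unrolled by 2 and jammed."""
    n, m = len(a), len(x)
    y = [0] * n
    main = n - n % 2                 # rows covered by the unrolled loop
    for i in range(0, main, 2):      # outer loop unrolled by a factor of 2
        for j in range(m):           # the two inner-loop copies fused into one
            y[i] += a[i][j] * x[j]
            y[i + 1] += a[i + 1][j] * x[j]
    for i in range(main, n):         # remainder row when n is odd
        for j in range(m):
            y[i] += a[i][j] * x[j]
    return y
```

    Jamming lets each x[j] be reused for two rows per inner iteration, which is the usual motivation for the transformation.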
  • Publication number: 20030088864
    Abstract: One embodiment of the present invention provides a system that generates code to perform anticipatory prefetching for data references. During operation, the system receives code to be executed on a computer system. Next, the system analyzes the code to identify data references to be prefetched. This analysis can involve: using a two-phase marking process in which blocks that are certain to execute are considered before other blocks; and analyzing complex array subscripts. Next, the system inserts prefetch instructions into the code in advance of the identified data references. This insertion can involve: dealing with non-constant or unknown stride values; moving prefetch instructions into preceding basic blocks; and issuing multiple prefetches for the same data reference.
    Type: Application
    Filed: November 2, 2001
    Publication date: May 8, 2003
    Inventors: Partha P. Tirumalai, Spiros Kalogeropulos, Mahadevan Rajagopalan, Yonghong Song, Vikram Rao
  • Patent number: 6539543
    Abstract: A method and apparatus for optimizing the compilation of a computer program by exposing parallelism are disclosed. The computer program contains steps which involve index expressions. The program also involves function calls. An index path in the program is identified by noting the steps involving index expressions. A non-hierarchical representation of the index path, including operations in the function calls, is created and interrogated with questions relating to memory accesses. The results of the interrogation are stored in or back annotated to a question data structure. The method and apparatus preferably involve the use of a signal flow graph which is completed using the information in the question data structure.
    Type: Grant
    Filed: November 29, 1999
    Date of Patent: March 25, 2003
    Assignee: Adelante Technologies, NV
    Inventors: Jan Guffens, Kurt Du Pont
  • Patent number: 6539541
    Abstract: A method of constructing and unrolling speculatively counted loops. The method of the present invention first locates a memory load instruction within the loop body of a loop. An advance load instruction is inserted into the preheader of the loop. The memory load instruction is replaced with a check instruction. The loop body is unrolled. A cleanup block is generated for said loop.
    Type: Grant
    Filed: August 20, 1999
    Date of Patent: March 25, 2003
    Assignee: Intel Corporation
    Inventor: Robert Y. Geva
  • Patent number: 6507947
    Abstract: A programmatic method transforms a nested loop in a high level programming language into a set of parallel processes, each a single time loop, such that the parallel processes satisfy a specified design constraint. Another programmatic method synthesizes a processor array from the set of parallel processes and a specified design constraint.
    Type: Grant
    Filed: August 20, 1999
    Date of Patent: January 14, 2003
    Assignee: Hewlett-Packard Company
    Inventors: Robert S. Schreiber, B. Ramakrishna Rau, Shail Aditya Gupta, Vinod K. Kathail, Sadun Anik
  • Publication number: 20020120923
    Abstract: A method for software pipelining of irregular conditional control loops including pre-processing the loops so they can be safely software pipelined. The pre-processing step ensures that each original instruction in the loop body can be over-executed as many times as necessary. During the pre-processing stage, each instruction in the loop body is processed in turn (N4). If the instruction can be safely speculatively executed, it is left alone (N6). If it could be safely speculatively executed except that it modifies registers that are live out of the loop, then the instruction can be pre-processed using predication or register copying (N7, N8, N9). Otherwise, predication must be applied (N10). Predication is the process of guarding an instruction. When the guard condition is true, the instruction executes as though it were unguarded. When the guard condition is false, the instruction is nullified.
    Type: Application
    Filed: December 8, 2000
    Publication date: August 29, 2002
    Inventors: Elana D. Granston, Joseph Zbiciak, Eric J. Stotzer
  • Patent number: 6438747
    Abstract: A parallel compiler maps iterations of a nested loop to processor elements in a parallel array and schedules a start time for each iteration such that the processor elements are fully utilized without being overloaded. The compiler employs an efficient and direct method for generating a set of iteration schedules that satisfy the following constraints: no more than one iteration is initiated per processor element in a specified initiation interval, and a new iteration begins on each processor element nearly every initiation interval. Since the iteration scheduling method efficiently generates a set of schedules, the compiler can select an iteration schedule that is optimized based on other criteria, such as memory bandwidth, local memory size of each processor element, estimated hardware cost of each processor element, etc.
    Type: Grant
    Filed: August 20, 1999
    Date of Patent: August 20, 2002
    Assignee: Hewlett-Packard Company
    Inventors: Robert S. Schreiber, Bantwal Ramakrishna Rau, Alain Darte
  • Patent number: 6421826
    Abstract: One embodiment of the present invention provides a system for compiling source code into executable code that performs prefetching for memory operations within regions of code that tend to generate cache misses. The system operates by compiling a source code module containing programming language instructions into an executable code module containing instructions suitable for execution by a processor. Next, the system runs the executable code module in a training mode on a representative workload and keeps statistics on cache miss rates for functions within the executable code module. These statistics are used to identify a set of “hot” functions that generate a large number of cache misses. Next, explicit prefetch instructions are scheduled in advance of memory operations within the set of hot functions.
    Type: Grant
    Filed: November 5, 1999
    Date of Patent: July 16, 2002
    Assignee: Sun Microsystems, Inc.
    Inventors: Nicolai Kosche, Peter C. Damron
  • Patent number: RE38365
    Abstract: In a parallel processor, a local area and an overlap area are assigned to the memory of each processing element (PE), and each PE makes calculations to update the data in both areas at the runtime. If the data in the overlap area is updated in processes closed in the PEs, the data transfer between adjacent PEs can be reduced and the parallel processes can be performed at a high speed.
    Type: Grant
    Filed: December 21, 1999
    Date of Patent: December 23, 2003
    Assignee: Fujitsu Limited
    Inventor: Tatsuya Shindo