Including Loop Patents (Class 717/160)
-
Patent number: 8671401
Abstract: Described is a technology by which a series of loop nests corresponding to source code are detected by a compiler, with the series of loop nests tiled together (thereby increasing the ratio of cache hits to misses in a multi-processor environment). The compiler transforms the series of loop nests into a plurality of tile loops within a controller loop, including using dependency analysis to determine which results from a tile loop need to be pre-computed before another tile loop. For dependency analysis, the compiler may use a directed acyclic graph as a high-level intermediate representation, and split the graph into sub-graphs each representing an array. The compiler uses descriptors processed from the graph to determine the controller loop and the tile loops within that controller loop.
Type: Grant
Filed: April 9, 2007
Date of Patent: March 11, 2014
Assignee: Microsoft Corporation
Inventors: Siddhartha Puri, Jaydeep P. Marathe
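The tiling idea in this abstract can be illustrated with a small sketch. It is not the patented method: the two loop nests, the names `untiled` and `tiled`, and the `tile` parameter are assumptions chosen for illustration. The second loop nest reads `b[i]` and `b[i + 1]`, so each tile of the second nest needs one extra element of `b` to be pre-computed.

```python
def untiled(a):
    """Two separate loop nests: the second reads results of the first."""
    n = len(a)
    b = [0] * n
    for i in range(n):              # loop nest 1
        b[i] = 2 * a[i]
    c = [0] * max(n - 1, 0)
    for i in range(n - 1):          # loop nest 2 reads b[i] and b[i + 1]
        c[i] = b[i] + b[i + 1]
    return c


def tiled(a, tile=4):
    """Same computation with both nests tiled inside one controller loop."""
    n = len(a)
    b = [0] * n
    c = [0] * max(n - 1, 0)
    done = 0                        # b[0:done] has been computed so far
    for lo in range(0, n - 1, tile):            # controller loop over tiles
        hi = min(lo + tile, n - 1)
        # Tile loop 1: c[lo:hi] needs b[lo:hi + 1], so pre-compute up to hi.
        for i in range(done, hi + 1):
            b[i] = 2 * a[i]
        done = hi + 1
        # Tile loop 2: consume the values that were just produced.
        for i in range(lo, hi):
            c[i] = b[i] + b[i + 1]
    return c
```

Both functions return the same list; the tiled version simply touches each block of `b` again soon after writing it, which is the locality effect the abstract refers to.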
-
Patent number: 8640112
Abstract: System and method for vectorizing combinations of program operations. Program code is received that includes a combination of individually vectorizable program portions that collectively implement a first computation. Each individually vectorizable program portion has at least one array input and at least one array output. The combination of individually vectorizable program portions is transformed into a single vectorizable program portion that is or includes a functional composition of the combination of individually vectorizable program portions. Vectorized executable code implementing the first computation is generated based on the single vectorizable program portion. The generated executable code is directed to SIMD (Single-Instruction-Multiple-Data) computing units of a target processor.
Type: Grant
Filed: March 30, 2011
Date of Patent: January 28, 2014
Assignee: National Instruments Corporation
Inventors: Haoran Yi, Brady C. Duggan, Robert E. Dye, Adam L. Bordelon, Jeffrey L. Kodosky
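As a rough illustration of replacing several element-wise portions with one composed portion, the sketch below fuses a chain of per-element functions into a single pass. The helper names (`compose`, `run_separately`, `run_fused`) are hypothetical, and plain Python lists stand in for arrays; no SIMD code generation is shown.

```python
from functools import reduce


def compose(*fns):
    """Return a function that applies fns left to right to one element."""
    return lambda x: reduce(lambda acc, f: f(acc), fns, x)


def run_separately(xs, fns):
    """One loop and one temporary array per vectorizable portion."""
    for f in fns:
        xs = [f(x) for x in xs]
    return xs


def run_fused(xs, fns):
    """A single loop over the functional composition of all portions."""
    g = compose(*fns)
    return [g(x) for x in xs]
```

The fused form makes one pass over the input and allocates no intermediate arrays, which is the property that makes a single vectorized loop attractive.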
-
Patent number: 8635606
Abstract: Technologies are generally described for runtime optimization adjusted dynamically according to changing costs of one or more system resources. Multicore systems may encounter dynamic variations in performance associated with the relative cost of related system resources. Furthermore, multicore systems can experience dramatic variations in resource availability and costs. A dynamic registry of system resource costs can be utilized to guide dynamic optimization. The relative scarcity of each resource can be updated dynamically within the registry of system resource costs. A runtime code generating loader and optimizer may be adapted to adjust optimization according to the resource cost registry. Information regarding system resource costs can support optimization tradeoffs based on resource cost functions.
Type: Grant
Filed: October 13, 2009
Date of Patent: January 21, 2014
Assignee: Empire Technology Development LLC
Inventor: Ezekiel John Joseph Kruglick
-
Patent number: 8627300
Abstract: Technologies are generally described for parallel dynamic optimization using multicore processors. A runtime compiler may be adapted to generate multiple instances of executable code from a portable intermediate software module. The various instances of executable code may be generated with variations of optimization parameters such that the code instances each express different optimization attempts. A multicore processor may be leveraged to simultaneously execute some, or all, of the various code instances. Preferred optimization parameters may be determined from the executable code instances that may correctly complete in the least time, or may use the least amount of memory, or that may prove superior according to some other fitness metric. Preferred optimization parameters may be used to seed future optimization attempts. Output generated from the preferred instances may be used as soon as the first instance correctly completes block.
Type: Grant
Filed: October 13, 2009
Date of Patent: January 7, 2014
Assignee: Empire Technology Development LLC
Inventor: Ezekiel John Joseph Kruglick
-
Patent number: 8627304
Abstract: A method for vectorization of a block of code is provided. The method comprises receiving a first block of code as input; and converting the first block of code into at least a second block of code and a third block of code. The first block of code accesses a first set of memory addresses that are potentially misaligned. The second block of code performs conditional leaping address incrementation to selectively access a first subset of the first set of memory addresses. The third block of code accesses a second subset of the first set of memory addresses starting from an aligned memory address, simultaneously accessing multiple memory addresses at a time. No memory address belongs to both the first subset and the second subset of memory addresses.
Type: Grant
Filed: July 28, 2009
Date of Patent: January 7, 2014
Assignee: International Business Machines Corporation
Inventors: Dorit Nuzman, Ira Rosen, Ayal Zaks
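The split into a block that handles the misaligned start and a block that works from an aligned address resembles ordinary loop peeling for alignment. The sketch below shows that generic technique only; it does not reproduce the "conditional leaping address incrementation" of the abstract. The vector width `VEC`, the function name `peeled_sum`, and the use of list indices as stand-ins for memory addresses are all assumptions.

```python
VEC = 4  # assumed vector width, in elements


def peeled_sum(data, start):
    """Sum data[start:] using a scalar prologue, aligned chunks, and a tail."""
    n = len(data)
    total = 0
    i = start
    # Prologue: scalar accesses until the index is a multiple of VEC.
    while i < n and i % VEC != 0:
        total += data[i]
        i += 1
    # Main body: each step touches VEC elements starting at an aligned index.
    while i + VEC <= n:
        total += sum(data[i:i + VEC])   # stands in for one vector load and add
        i += VEC
    # Epilogue: leftover elements that do not fill a whole vector.
    while i < n:
        total += data[i]
        i += 1
    return total
```

Every element is visited by exactly one of the three loops, mirroring the abstract's requirement that no address belongs to more than one subset.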
-
Device and method for automatically optimizing composite applications having orchestrated activities
Patent number: 8601454
Abstract: A device (D) is intended for optimizing composite applications comprising at least two orchestrated activities participating in at least one process. This device (D) comprises i) an analyzing means (AM) arranged for determining orchestrated activities contained in a composite application to be optimized and dependencies between these activities, and ii) an optimizing means (OM) arranged for determining a new orchestration between the determined activities which allows the composite application to execute requests of users in a minimal time, according to the determined dependencies and to predefined rules, and for outputting an optimized composite application based on the new orchestration.
Type: Grant
Filed: December 12, 2008
Date of Patent: December 3, 2013
Assignee: Alcatel Lucent
Inventor: Benoit Christophe
-
Patent number: 8601459
Abstract: A system and method for discovering a set of possible iteration sequences for a given loop in a software program is described, to transform the loop representation. In a program containing a loop, the loop is partitioned into a plurality of portions based on splitting criteria. Labels are associated with the portions, and an initial loop automaton is constructed that represents the loop iterations as a regular language over the labels corresponding to the portions in the program. Subsequences of the labels are analyzed to determine infeasibility of the subsequences permitted in the automaton. The automaton is refined by removing all infeasible subsequences to discover a set of possible iteration sequences in the loop. The resulting loop automaton is used in a subsequent program verification or analysis technique to find violations of correctness properties in programs.
Type: Grant
Filed: April 9, 2013
Date of Patent: December 3, 2013
Assignee: NEC Laboratories America, Inc.
Inventors: Sriram Sankaranarayanan, Aarti Gupta, Gogul Balakrishnan
-
Patent number: 8589901
Abstract: A system and method are configured to apply region level optimizations to a selected region of source code rather than loop level optimizations to a loop or loop nest. The region may include an outer loop, a plurality of inner loops and at least one control code. If the region includes an exceptional control flow statement and/or a procedure call, speculative region-level multi-versioning may be applied.
Type: Grant
Filed: December 22, 2010
Date of Patent: November 19, 2013
Inventors: Jin Lin, John L. Ng, Robert J. Cox, Xinmin Tian
-
Patent number: 8578358
Abstract: A macroscalar processor architecture is described herein. In one embodiment, a processor receives instructions of a program loop having a vector block and a sequence block intended to be executed after the vector block, where the processor includes multiple slices and each of the slices is capable of executing an instruction of an iteration of the program loop substantially in parallel. For each iteration of the program loop, the processor executes an instruction of the sequence block using one of the slices while executing instructions of the vector block using a remainder of the slices substantially in parallel. Other methods and apparatuses are also described.
Type: Grant
Filed: November 17, 2011
Date of Patent: November 5, 2013
Assignee: Apple Inc.
Inventor: Jeffry E. Gonion
-
Publication number: 20130290943
Abstract: According to one embodiment, a code optimizer is configured to receive first code having a program loop implemented with scalar instructions to store values of a first array to a second array based on values of a third array and to generate second code representing the program loop using at least one vector instruction. The second code includes a shuffle instruction to shuffle elements of the first array based on the third array using a shuffle table in a vector manner, a blend instruction to blend the shuffled elements of the first array using a blend table in a vector manner, and a store instruction to store the blended elements of the first array in the second array.
Type: Application
Filed: December 15, 2011
Publication date: October 31, 2013
Applicant: Intel Corporation
Inventors: Tal Uliel, Elmoustapha Ould-Ahmedvall, Bret T. Toll
-
Patent number: 8555030
Abstract: A device identifies array accesses of variables in a program code that includes multiple arrays, and identifies array access patterns for one of the array accesses. The device also determines an order of the array access patterns identified for the array accesses, and calculates, based on the order, distances between the array access patterns. The device further shares address calculations amongst the array accesses associated with array access patterns with one or more of the distances that are equivalent.
Type: Grant
Filed: July 14, 2011
Date of Patent: October 8, 2013
Assignee: Advanced Micro Devices, Inc.
Inventors: Tim J. Wilkens, Michael C. Berg
-
Patent number: 8555267
Abstract: A mechanism for performing register allocation based on priority spills and assignments is disclosed. A method of embodiments of the invention includes repetitively detecting fat points during a compilation process of a software program running on a virtual machine of a computer system, each fat point representing a program point having a high register pressure, the high register pressure occurs when a number of live program variables of the software program living at a given program point of the software program is greater than a number of available processor registers of the computer system. The method further includes choosing a fat point with a highest register pressure, selecting a live program variable having a lowest priority at the chosen fat point, and spilling the lowest priority live program variable to memory of the computer system.
Type: Grant
Filed: March 3, 2010
Date of Patent: October 8, 2013
Assignee: Red Hat, Inc.
Inventor: Vladimir Makarov
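The spill loop described in this abstract can be sketched as follows. This is a simplified model under stated assumptions, not the patented allocator: each variable has a single live interval given as `(first_point, last_point)`, a spill removes the variable from every program point, and the names `live_at`, `spill_by_fat_points`, and `priority` are hypothetical.

```python
def live_at(point, live_ranges, spilled):
    """Variables that are live at `point` and still held in registers."""
    return [v for v, (start, end) in live_ranges.items()
            if start <= point <= end and v not in spilled]


def spill_by_fat_points(live_ranges, priority, num_regs):
    """Return the variables spilled, in order, until pressure fits num_regs."""
    spilled = []
    last = max(end for _, end in live_ranges.values())
    while True:
        pressure = {p: live_at(p, live_ranges, spilled)
                    for p in range(last + 1)}
        # Program point with the highest register pressure.
        fattest = max(pressure, key=lambda p: len(pressure[p]))
        live = pressure[fattest]
        if len(live) <= num_regs:       # no fat point left
            return spilled
        # Spill the lowest-priority variable that is live at that point.
        spilled.append(min(live, key=lambda v: priority[v]))
```

For example, with live ranges `a: (0, 5)`, `b: (1, 3)`, `c: (2, 4)`, `d: (2, 2)`, priorities `a=10, b=1, c=5, d=7`, and two registers, point 2 is the fattest point twice in a row, and the sketch spills `b` and then `c`.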
-
Patent number: 8549507
Abstract: A loop coalescing method and a loop coalescing device are disclosed. The loop coalescing method comprises removing an inner-most loop from among nested loops, so that an outer operation provided outside of the inner-most loop is performed when a condition of a conditional statement is satisfied, generating a guard code by applying an if-conversion method to the conditional statement, and converting a guard by using an instruction calculating the guard of the guard code, the instruction calculating the guard using a register where information related to a period of time corresponding to the number of iterations of the inner-most loop is stored.
Type: Grant
Filed: August 22, 2007
Date of Patent: October 1, 2013
Assignee: Samsung Electronics Co., Ltd.
Inventors: Hee Seok Kim, Hong-Seok Kim, Chang-Woo Baek, Jeongwook Kim
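Loop coalescing in general can be illustrated by collapsing a two-level nest into one loop and guarding the operation that used to sit outside the inner loop. The sketch below shows only that general shape; the guard here is a plain comparison rather than the register-based guard instruction of the abstract, the names `nested` and `coalesced` are hypothetical, and the inner trip count `m` is assumed to be at least 1.

```python
def nested(n, m):
    """Original form: an outer operation follows each run of the inner loop."""
    log = []
    for i in range(n):
        for j in range(m):
            log.append(("inner", i, j))
        log.append(("outer", i))
    return log


def coalesced(n, m):
    """Single loop; the outer operation runs under a guard (assumes m >= 1)."""
    log = []
    for k in range(n * m):
        i, j = divmod(k, m)             # recover the original indices
        log.append(("inner", i, j))
        if j == m - 1:                  # guard: last inner iteration reached
            log.append(("outer", i))
    return log
```

Both functions record the same sequence of operations, so the coalesced loop preserves the original execution order.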
-
Patent number: 8549508
Abstract: A mechanism for performing instruction scheduling based on register pressure sensitivity is disclosed. A method of embodiments of the invention includes performing a preliminary register pressure minimization on program points during a compilation process of a software program running on a virtual machine of a computer system. The method further includes calculating a register pressure at each of the program points, detecting an instruction to be scheduled, and performing instruction scheduling of the instruction based on a current register pressure at a current scheduling point and potential register pressures at subsequent scheduling points.
Type: Grant
Filed: March 3, 2010
Date of Patent: October 1, 2013
Assignee: Red Hat, Inc.
Inventor: Vladimir Makarov
-
Patent number: 8549501
Abstract: Generating mixed-mode operations in the compilation of program code for processors having vector or SIMD processing units is disclosed. In a preferred embodiment of the present invention, program instructions making up the body of a loop are abstracted into virtual vector instructions. These virtual vector instructions are treated, for initial code optimization purposes, as vector instructions (i.e., instructions written for the vector unit). The virtual vector instructions are eventually expanded into native code for the target processor, at which time a determination is made for each virtual vector instruction as to whether to expand the virtual vector instruction into native vector instructions, into native scalar instructions, into calls to pre-defined library functions, or into a combination of these. A cost model is used to determine the optimal choice of expansion based on hardware/software constraints, performance costs/benefits, and other criteria.
Type: Grant
Filed: August 16, 2004
Date of Patent: October 1, 2013
Assignee: International Business Machines Corporation
Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
-
Patent number: 8543993
Abstract: A compiler that compiles source code and is implemented in a plurality of processor cores includes a parallel loop processing detection unit configured to detect from the source code a loop processing code for execution of an internal processing operation for a given number of repeating times, and an independent parallel loop processing code in the internal processing operation performed for each repetition to be concurrently processed, and a dynamic parallel conversion unit configured to generate a control core code for control of the number of repeating times in the parallel loop processing code and a parallel processing code for changing the number of repeating times corresponding to the control from the control core code.
Type: Grant
Filed: March 18, 2010
Date of Patent: September 24, 2013
Assignee: Fujitsu Limited
Inventor: Koichiro Yamashita
-
Publication number: 20130227537
Abstract: A system and method for discovering a set of possible iteration sequences for a given loop in a software program is described, to transform the loop representation. In a program containing a loop, the loop is partitioned into a plurality of portions based on splitting criteria. Labels are associated with the portions, and an initial loop automaton is constructed that represents the loop iterations as a regular language over the labels corresponding to the portions in the program. Subsequences of the labels are analyzed to determine infeasibility of the subsequences permitted in the automaton. The automaton is refined by removing all infeasible subsequences to discover a set of possible iteration sequences in the loop. The resulting loop automaton is used in a subsequent program verification or analysis technique to find violations of correctness properties in programs.
Type: Application
Filed: April 9, 2013
Publication date: August 29, 2013
Applicant: NEC Laboratories America, Inc.
Inventor: NEC Laboratories America, Inc.
-
Patent number: 8522226
Abstract: A system and method for discovering a set of possible iteration sequences for a given loop in a software program is described, to transform the loop representation. In a program containing a loop, the loop is partitioned into a plurality of portions based on splitting criteria. Labels are associated with the portions, and an initial loop automaton is constructed that represents the loop iterations as a regular language over the labels corresponding to the portions in the program. Subsequences of the labels are analyzed to determine infeasibility of the subsequences permitted in the automaton. The automaton is refined by removing all infeasible subsequences to discover a set of possible iteration sequences in the loop. The resulting loop automaton is used in a subsequent program verification or analysis technique to find violations of correctness properties in programs.
Type: Grant
Filed: February 8, 2010
Date of Patent: August 27, 2013
Assignee: NEC Laboratories America, Inc.
Inventors: Sriram Sankaranarayanan, Aarti Gupta, Gogul Balakrishnan
-
Patent number: 8516468
Abstract: In one embodiment of the invention, a method for fusing a first loop nested in a first IF statement with a second loop nested in a second IF statement without the use of modified and referenced (mod-ref) information to determine if certain conditional statements in the IF statements retain variable values.
Type: Grant
Filed: June 30, 2008
Date of Patent: August 20, 2013
Assignee: Intel Corporation
Inventors: John L. Ng, Robert Cox, Dmitry V. Budanov
-
Patent number: 8505002
Abstract: A data processing system is provided having a processor and analysing circuitry for identifying a SIMD instruction associated with a first SIMD instruction set and replacing it by a functionally-equivalent scalar representation and marking that functionally-equivalent scalar representation. The marked functionally-equivalent scalar representation is dynamically translated using translation circuitry upon execution of the program to generate one or more corresponding translated instructions corresponding to an instruction set architecture different from the first SIMD architecture corresponding to the identified SIMD instruction.
Type: Grant
Filed: September 27, 2007
Date of Patent: August 6, 2013
Assignees: ARM Limited, The Regents of the University of Michigan
Inventors: Sami Yehia, Krisztian Flautner, Nathan Clark, Amir Hormati, Scott Mahlke
-
Patent number: 8495606
Abstract: A system performs operations comprising creating a call graph for a program translated from source code, identifying redundant exception handling code in the program utilizing the call graph, and removing the redundant exception handling code. The operation of identifying redundant exception handling code may comprise identifying at least one function or callsite by determining that a first function in the at least one function's or callsite's callee chain throws an exception and that the exception is handled by a second function in the function's or callsite's callee chain or by determining that an exception is not thrown in the at least one function's or callsite's callee chain. The operation of removing the redundant exception handling code may comprise removing redundant exception handling code included in at least one function or callsite and/or removing at least one entry for the at least one function or callsite from an exception lookup table.
Type: Grant
Filed: November 14, 2008
Date of Patent: July 23, 2013
Assignee: Oracle America, Inc.
Inventors: Sheldon M. Lobo, Fu-Hwa Wang
-
Patent number: 8495607
Abstract: Mechanisms for aggressively optimizing computer code are provided. With these mechanisms, a compiler determines an optimization to apply to a portion of source code and determines if the optimization as applied to the portion of source code will result in unsafe optimized code that introduces a new source of exceptions being generated by the optimized code. In response to a determination that the optimization is an unsafe optimization, the compiler generates an aggressively compiled code version, in which the unsafe optimization is applied, and a conservatively compiled code version in which the unsafe optimization is not applied. The compiler stores both versions and provides them for execution. Mechanisms are provided for switching between these versions during execution in the event of a failure of the aggressively compiled code version. Moreover, predictive mechanisms are provided for predicting whether such a failure is likely.
Type: Grant
Filed: March 1, 2010
Date of Patent: July 23, 2013
Assignee: International Business Machines Corporation
Inventor: Michael K. Gschwind
-
Patent number: 8484623
Abstract: A method for determining the number and location of instrumentation probes to be inserted into a program is disclosed. The method advantageously inserts the minimum number of probes that are required to obtain execution coverage for every node in the program's control-flow graph. In addition, the method requires only one type of node marking and one bit to store each probe, and does not require the assignment of weights to arcs or nodes of the control-flow graph. In the illustrative embodiment, the nodes of a control-flow graph are partitioned into non-empty sets, where each non-empty set corresponds to a super nested block of the program.
Type: Grant
Filed: September 29, 2008
Date of Patent: July 9, 2013
Assignee: Avaya, Inc.
Inventors: Juan Jenny Li, David Mandel Weiss
-
Patent number: 8479185
Abstract: A method for compiling application source code that includes selecting multiple loops for parallelization. The multiple loops include a first loop and a second loop. The method further includes partitioning the first loop into a first set of chunks, partitioning the second loop into a second set of chunks, and calculating data dependencies between the first set of chunks and the second set of chunks. A first chunk of the second set of chunks is dependent on a first chunk of the first set of chunks. The method further includes inserting, into the first loop and prior to completing compilation, a precedent synchronization instruction for execution when execution of the first chunk of the first set of chunks completes, and completing the compilation of the application source code to create an application compiled code.
Type: Grant
Filed: December 9, 2010
Date of Patent: July 2, 2013
Assignee: Oracle International Corporation
Inventors: Spiros Kalogeropulos, Partha P. Tirumalai
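The chunk-level dependency calculation can be sketched for a simple access pattern. This is an illustration under assumptions, not the patented method: the first loop is assumed to write `a[i]` in iteration `i`, the second loop is assumed to read `a[i + d]` for each `d` in `read_offsets`, and the helper names `chunks` and `chunk_dependencies` are hypothetical. The synchronization instruction itself is not modeled; the result only says which first-loop chunks a second-loop chunk would have to wait for.

```python
def chunks(n, size):
    """Partition iterations 0..n-1 into half-open ranges of at most `size`."""
    return [(lo, min(lo + size, n)) for lo in range(0, n, size)]


def chunk_dependencies(n, size1, size2, read_offsets):
    """Map each loop-2 chunk index to the loop-1 chunk indices it depends on."""
    first = chunks(n, size1)
    deps = {}
    for c2, (lo, hi) in enumerate(chunks(n, size2)):
        # Elements of `a` read by this chunk of the second loop.
        needed = {i + d for i in range(lo, hi) for d in read_offsets
                  if 0 <= i + d < n}
        deps[c2] = [c1 for c1, (l1, h1) in enumerate(first)
                    if any(l1 <= x < h1 for x in needed)]
    return deps
```

With 8 iterations, first-loop chunks of 4, second-loop chunks of 2, and reads at offsets -1 and 0, the third chunk of the second loop depends on both chunks of the first loop, while every other chunk depends on only one.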
-
Patent number: 8479179
Abstract: A method for compiling a program including a loop is provided. In the program, the loop includes K instructions (K>2) and repeats for M times (M>2). The compiling method comprises following steps: performing resource conflict analysis to the K instructions in the loop; dividing the K instructions in the loop into a first combined instruction section, a connection instruction section and a second combined instruction section, wherein there is no resource conflict between the instructions in the first combined instruction section and the instructions in the second combined instruction section respectively; and compiling the program, wherein the instructions in the first combined instruction section in the cycle N (N=2, 3, . . . M) and the instructions in the second combined instruction section in the cycle N-1 are combined to be compiled respectively. A compiling apparatus and a computer system for realizing the above-mentioned compiling method are further provided.
Type: Grant
Filed: December 7, 2005
Date of Patent: July 2, 2013
Assignee: St-Ericsson SA
Inventors: Fan Wu, Yanmeng Sun
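Pairing the second section of iteration N-1 with the first section of iteration N is the familiar shape of software pipelining. The sketch below shows that shape on a two-step loop body; it omits the connection instruction section and any real resource model, and the functions `plain` and `pipelined` and the arithmetic in each section are assumptions for illustration.

```python
def plain(xs):
    """Original loop: both sections run back to back in each iteration."""
    out = []
    for x in xs:
        t = x * x          # first section (say, the multiply unit)
        out.append(t + 1)  # second section (say, the add unit), needs t
    return out


def pipelined(xs):
    """Restructured loop: section 2 of iteration n-1 sits next to section 1 of n."""
    if not xs:
        return []
    out = []
    t = xs[0] * xs[0]                  # prologue: first section, iteration 0
    for n in range(1, len(xs)):
        out.append(t + 1)              # second section, iteration n - 1
        t = xs[n] * xs[n]              # first section, iteration n
    out.append(t + 1)                  # epilogue: second section, last iteration
    return out
```

Inside the restructured loop the two statements belong to different iterations and do not depend on each other, so a machine with separate resources for each could issue them together.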
-
Publication number: 20130167130
Abstract: An illustrative embodiment of a computer-implemented process for shared data prefetching and coalescing optimization versions a loop containing one or more shared references into an optimized loop and an un-optimized loop, transforms the optimized loop into a set of loops, and stores shared access associated information of the loop using a prologue loop in the set of loops. The shared access associated information pertains to remote data and is collected using the prologue loop in absence of network communication and builds a hash table. An associated data structure is updated each time the hash table is entered, and is sorted to remove duplicate entries and create a reduced data structure. Patterns across entries of the reduced data structure are identified and entries are coalesced. Data associated with a coalesced entry is pre-fetched using a single communication and a local buffer is populated with the fetched data for reuse.
Type: Application
Filed: October 24, 2012
Publication date: June 27, 2013
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: International Business Machines Corporation
-
Patent number: 8473931
Abstract: A method, system and program product for optimizing emulation of a suspected malware. The method includes identifying, using an emulation optimizer tool, whether an instruction in a suspected malware being emulated by an emulation engine in a virtual environment signifies a long loop and, if so, generating a first hash for the loop. Further, the method includes ascertaining whether the first hash generated matches any long loop entries in a storage and, if so, calculating a second hash for the long loop. Furthermore, the method includes inspecting any long loop entries ascertained to find an entry having a respective second hash matching the second hash calculated. If an entry matching the second hash calculated is found, the method further includes updating one or more states of the emulation engine, such that execution of the long loop of the suspected malware is skipped, which optimizes emulation of the suspected malware.
Type: Grant
Filed: March 20, 2012
Date of Patent: June 25, 2013
Assignee: International Business Machines Corporation
Inventor: Ji Yan Wu
-
Patent number: 8468508
Abstract: An optimizing compiler device, a method, a computer program product which are capable of performing parallelization of irregular reductions. The method for performing parallelization of irregular reductions includes receiving, at a compiler, a program and selecting, at compile time, at least one unit of work (UW) from the program, each UW configured to operate on at least one reduction operation, where at least one reduction operation in the UW operates on a reduction variable whose address is determinable when running the program at a run-time. At run time, for each successive current UW, a list of reduction operations accessed by that unit of work is recorded. Further, it is determined at run time whether reduction operations accessed by a current UW conflict with any reduction operations recorded as having been accessed by prior selected units of work, and assigning the unit of work as a conflict free unit of work (CFUW) when no conflicts are found.
Type: Grant
Filed: October 9, 2009
Date of Patent: June 18, 2013
Assignee: International Business Machines Corporation
Inventors: Alexandre E. Eichenberger, Yangchun Luo, John K. O'Brien, Xiaotong Zhuang
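The run-time conflict check can be sketched as a greedy grouping of units of work. This is a simplified illustration, not the patented method: each unit of work is represented only by the set of reduction addresses it touches, conflicting units are deferred to a later group rather than handled in any more refined way, and the name `conflict_free_groups` is hypothetical.

```python
def conflict_free_groups(units):
    """Group units of work so that units in one group touch disjoint addresses.

    `units` is a list of sets of reduction addresses, one set per unit of work.
    Returns a list of groups, each a list of unit indices.
    """
    groups = []
    pending = list(range(len(units)))
    while pending:
        seen, group, deferred = set(), [], []
        for u in pending:
            if units[u] & seen:        # conflicts with an earlier selected unit
                deferred.append(u)
            else:                      # conflict-free unit of work
                group.append(u)
                seen |= units[u]       # record the addresses it accesses
        groups.append(group)
        pending = deferred
    return groups
```

Units within a group update disjoint reduction variables and could therefore run in parallel without atomics or locks; successive groups would run one after another.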
-
Patent number: 8458685
Abstract: A system and method of compiling program code, wherein the program code includes an operation on an array of data elements stored in memory of a computer system. The program code is scanned for an equation which may have recurring data points. The equation is then replaced with vectorized machine executable code, wherein the machine executable code comprises a nested loop and wherein the nested loop comprises an exterior loop and a virtual interior loop. The exterior loop decomposes the equation into a plurality of loops of length N, wherein N is an integer greater than one. The virtual interior loop executes vector operations corresponding to the N length loop to form a result vector resident in memory, wherein the virtual interior loop includes a vector atomic memory operation (AMO) instruction.
Type: Grant
Filed: June 12, 2009
Date of Patent: June 4, 2013
Assignee: Cray Inc.
Inventor: Terry D. Greyzck
-
Patent number: 8458682
Abstract: System and method for converting a class oriented data flow program to a structure oriented data flow program. A first data flow program is received, where the first data flow program is an object oriented program comprising instances of one or more classes, and wherein the first data flow program is executable to perform a first function. The first data flow program is automatically converted to a second data flow program, where the second data flow program does not include the instances of the one or more classes, and where the second data flow program is executable to perform the first function. The second data flow program is stored on a computer memory, where the second data flow program is configured to be deployed to a device, e.g., a programmable hardware element, and where the second data flow program is executable on the device to perform the first function.
Type: Grant
Filed: April 27, 2009
Date of Patent: June 4, 2013
Assignee: National Instruments Corporation
Inventors: Stephen R. Mercer, Akash B. Bhakta, Matthew E. Novacek
-
Patent number: 8453135
Abstract: A compiler selects a nested loop within software code that includes an outer loop and an inner loop. The outer loop includes an outer induction variable and the inner loop includes an inner induction variable. The compiler identifies a computation included in the nested loop that generates an irregular array access, which includes an expression of both the outer induction variable and the inner induction variable. Next, the compiler identifies a redundant calculation for the computation based upon the outer induction variable and the inner induction variable, and generates a temporary variable to correspond with the redundant calculation. The compiler replaces the computation with the temporary variable in the nested loop and, in turn, compiles the nested loop with the included temporary variable.
Type: Grant
Filed: March 11, 2010
Date of Patent: May 28, 2013
Assignee: Freescale Semiconductor, Inc.
Inventor: Abderrazek Zaafrani
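One simple reading of replacing a redundant calculation with a temporary variable is shown below. It is only a sketch: the index expression `(i * stride + j * j) % size` is an invented example of an irregular access that mixes both induction variables, the temporary holds just the part that depends on the outer variable, and the function names are hypothetical. The abstract may well cover redundancy patterns beyond this one.

```python
def direct(table, n, m, stride):
    """Recomputes i * stride on every inner iteration."""
    total = 0
    for i in range(n):
        for j in range(m):
            total += table[(i * stride + j * j) % len(table)]
    return total


def with_temporary(table, n, m, stride):
    """Same accesses, with the repeated part held in a temporary."""
    total = 0
    size = len(table)
    for i in range(n):
        base = i * stride              # temporary: fixed for the whole inner loop
        for j in range(m):
            total += table[(base + j * j) % size]
    return total
```

Both functions visit the same table entries in the same order, so the result is unchanged while one multiplication per inner iteration is saved.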
-
Patent number: 8453134
Abstract: Provided are a method, system, and article of manufacture improving data locality and parallelism by code replication and array contraction. Source code including an array of elements referenced using at least two indices is processed. The array is nested within multiple loops, wherein at least two of the loops perform iterations with respect to the indices of the array, wherein the index incremented in at least one innermost loop of the loops does not comprise a leftmost index in the array. The source code is transformed to object code by performing operations including fusing at least two innermost loops of the loops in object code generated by compiling the source code by replicating statements from at least one of the innermost loops into a fused innermost loop and performing loop interchange in the object code to have the fused innermost loop provide iterations with respect to the leftmost index in the array.
Type: Grant
Filed: June 4, 2008
Date of Patent: May 28, 2013
Assignee: Intel Corporation
Inventors: John L. Ng, Alexander Y. Ostanevich, Alexander L. Sushentsov
-
Patent number: 8453156
Abstract: A method and system to balance the load of a task-based multi-threaded application on a platform. When the work required by the multi-threaded application is represented as a task with a computational requirement that is proportional to the amount of the work, embodiments of the invention control the recursive binary task division of the task using auxiliary partitions to create subtasks of balanced loads to enhance resource utilization and to improve application performance. The task is binary partitioned recursively into a plurality of subtasks until the plurality of subtasks is equal to the plurality of resources available on the platform to execute the subtasks.
Type: Grant
Filed: March 30, 2009
Date of Patent: May 28, 2013
Assignee: Intel Corporation
Inventors: Wooyoung Kim, Michael Joseph Voss
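Recursive binary division that stops when the number of subtasks equals the number of resources can be sketched as below. The sketch assumes the task is a contiguous iteration range `[lo, hi)` whose cost is proportional to its length and that `workers` is at least 1; the function name `divide` and the proportional cut are illustrative choices, and the "auxiliary partitions" mentioned in the abstract are not modeled.

```python
def divide(lo, hi, workers):
    """Split the range [lo, hi) into exactly `workers` contiguous subtasks."""
    if workers == 1:
        return [(lo, hi)]
    left_workers = workers // 2
    # Cut the work in proportion to the workers on each side so that
    # loads stay balanced even when `workers` is not a power of two.
    mid = lo + (hi - lo) * left_workers // workers
    return (divide(lo, mid, left_workers)
            + divide(mid, hi, workers - left_workers))
```

For instance, `divide(0, 100, 3)` yields three subtasks of 33, 33, and 34 iterations, whereas always halving the range would leave one worker with 50 iterations and the other two with 25 each.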
-
Publication number: 20130125105Abstract: Control flow information and data flow information associated with a program containing a upc_forall loop are built. A shared reference map data structure using the control flow information and the data flow information is created. All local shared accesses are hashed to facilitate a constant access stride after being rewritten. All local shared references in a hash entry having a longest list are privatized. The upc_forall loop is rewritten into a for loop. Responsive to a determination that an unprocessed upc_forall loop does not exist, dead store elimination is run. The control flow information and the data flow information associated with the program containing the for loop is rebuilt.Type: ApplicationFiled: November 15, 2011Publication date: May 16, 2013Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Yaoqing Gao, Liangxiao Hu, Raul Esteban Silvera, Ettore Tiotto
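A small sketch of the loop-rewriting step alone, assuming the simplest UPC case: an integer affinity expression `i`, so that iteration `i` belongs to thread `i % THREADS`. The shared reference map, hashing, and privatization steps from the abstract are not modeled, and the function names are hypothetical.

```python
def forall_with_affinity_test(n, threads, mythread):
    """Every thread scans all iterations and keeps only the ones it owns."""
    return [i for i in range(n) if i % threads == mythread]


def rewritten_for(n, threads, mythread):
    """Plain strided loop that visits only the iterations the thread owns."""
    return list(range(mythread, n, threads))
```

The two forms visit the same iterations on each thread. The rewritten one drops the per-iteration affinity test and steps through the thread's elements with a constant stride.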
-
Publication number: 20130125104Abstract: According to one aspect of the present disclosure, a method and technique for reducing branch misprediction impact for nested loop code is disclosed. The method includes: responsive to identifying code having an outer loop and an inner loop, determining a quantity of iterations of the inner loop for an initial number of iterations of the outer loop; determining a number of processor cycles for executing the quantity of iterations of the inner loop for the initial number of iterations of the outer loop; determining whether the number of processor cycles is less than a threshold; and responsive to determining that the number of processor cycles is less than the threshold, fully unrolling the inner loop for the initial number of iterations of the outer loop.Type: ApplicationFiled: November 11, 2011Publication date: May 16, 2013Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Madhavi G. Valluri, Steven W. White
-
Patent number: 8443344Abstract: Approaches for generating a hardware definition from a program specified in a high-level language. In one approach, a first set of blocks of instructions in the high-level language program is identified. Each block in the first set is bounded by a respective loop designation in the high-level language. For each block in the first set, an associated respective second set of one or more blocks of the program is identified. Each block in the second set is outside the block in the first set. A hardware definition of the program is generated and stored. For each block in the first set, the hardware definition specifies power-reducing circuitry for one or more blocks in the associated second set. The power-reducing circuitry is controlled based on a status indication from the hardware definition of the block in the first set.Type: GrantFiled: September 25, 2008Date of Patent: May 14, 2013Assignee: Xilinx, Inc.Inventors: Prasanna Sundararajan, Tim Tuan
-
Patent number: 8443351Abstract: The subject disclosure pertains broadly to parallelization of workflow loops. More specifically, loop containers and related elements are cloned several times to match a desired number of parallel iterations or threads. The cloned containers are communicatively coupled or connected to a single enumerator component and can interact therewith to facilitate acquisition of collection elements. This arrangement, among other things, ensures that the correct number of iterations are executed as if the loop was processed sequentially.Type: GrantFiled: February 23, 2006Date of Patent: May 14, 2013Assignee: Microsoft CorporationInventors: J. Kirk Haselden, Sergei Ivanov
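The arrangement of cloned loop bodies around one shared enumerator can be sketched with Python threads. This is an illustration only: `parallel_foreach`, its `clones` parameter, and the lock-guarded iterator are assumptions, not the workflow engine's actual components.

```python
import threading


def parallel_foreach(items, body, clones=4):
    """Run `body` on every item using `clones` copies of the loop body."""
    enumerator = iter(items)   # the single shared enumerator
    lock = threading.Lock()    # serializes access to the enumerator

    def worker():
        while True:
            with lock:
                try:
                    item = next(enumerator)
                except StopIteration:
                    return     # collection exhausted, this clone stops
            body(item)         # run the loop body outside the lock

    threads = [threading.Thread(target=worker) for _ in range(clones)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

Because every clone pulls from the same enumerator, each element is handed out exactly once, so the total number of iterations matches a sequential run regardless of how many clones exist.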
-
Publication number: 20130117737 Abstract: One embodiment of the present invention sets forth a technique for reducing sign-extension instructions (SEIs) included in a computer program. The technique involves receiving intermediate code that is associated with the computer program and includes a first SEI that is included in a loop structure within the computer program, determining that the first SEI is eligible to be moved outside of the loop structure, inserting into a preheader of the loop a second SEI that, when executed by a processor, promotes an original value targeted by the first SEI from a smaller type to a larger type, and replacing the first SEI with one or more intermediate instructions that are eligible for additional compiler optimizations. Type: Application Filed: October 26, 2012 Publication date: May 9, 2013 Applicant: NVIDIA CORPORATION Inventor: NVIDIA Corporation
-
Patent number: 8434076Abstract: A system which combines sequential and iterative source code is provided. The system decides which type of processing would be most suitable for all portions of the source code, regardless of type. The system can adjust that decision based on the specific nature of the constructs within the source code, and can also adjust that decision based on the platform upon which the resulting executable program will run.Type: GrantFiled: December 12, 2007Date of Patent: April 30, 2013Assignee: Oracle International CorporationInventors: Anguel Novoselsky, Zhen Hua Liu
-
Patent number: 8429625Abstract: A method and system for processing generic formatted data, including first data describing a sequence of generic operations without any loops, in view of providing specific formatted data, for a determined platform including Q processor(s) and at least one memory, the platform configured to process, according, directly or indirectly, to specific formatted data, an object made up of elementary information of same type, each elementary information being represented by at least one numerical value.Type: GrantFiled: December 19, 2006Date of Patent: April 23, 2013Assignee: DXO LabsInventor: Bruno Liege
-
Patent number: 8418156Abstract: Generally, the present disclosure provides systems and methods to generate a two-stage commit (TSC) region which has two separate commit stages. Frequently executed code may be identified and combined for the TSC region. Binary optimization operations may be performed on the TSC region to enable the code to run more efficiently by, for example, reordering load and store instructions. In the first stage, load operations in the region may be committed atomically and in the second stage, store operations in the region may be committed atomically.Type: GrantFiled: December 16, 2009Date of Patent: April 9, 2013Assignee: Intel CorporationInventors: Cheng Wang, Youfeng Wu
-
Patent number: 8413127Abstract: A mechanism for minimizing effective memory latency without unnecessary cost through fine-grained software-directed data prefetching using integrated high-level and low-level code analysis and optimizations is provided. The mechanism identifies and classifies streams, identifies data that is most likely to incur a cache miss, exploits effective hardware prefetching to determine the proper number of streams to be prefetched, exploits effective data prefetching on different types of streams in order to eliminate redundant prefetching and avoid cache pollution, and uses high-level transformations with integrated lower level cost analysis in the instruction scheduler to schedule prefetch instructions effectively.Type: GrantFiled: December 22, 2009Date of Patent: April 2, 2013Assignee: International Business Machines CorporationInventors: Roch G. Archambault, Robert J. Blainey, Yaoqing Gao, Allan R. Martin, James L. McInnes, Francis Patrick O'Connell
-
Patent number: 8412914Abstract: A method for aggregating a program loop in a Macroscalar architecture includes identifying one or more instructions of the program loop having a branch instruction that causes the program loop to branch dependent upon a predicate condition after a memory write operation. The method also includes modifying at least one of the one or more instructions to cause a processor executing the one or more instructions to branch after the memory write operation executed as a vector block for iterations prior to and including an iteration during which the predicate condition is satisfied.Type: GrantFiled: November 17, 2011Date of Patent: April 2, 2013Assignee: Apple Inc.Inventor: Jeffry E. Gonion
-
Patent number: 8402447Abstract: Various technologies and techniques are disclosed for transforming a sequential loop into a parallel loop for use with a transactional memory system. Open ended and/or closed ended sequential loops can be transformed to parallel loops. For example, a section of code containing an original sequential loop is analyzed to determine a fixed number of iterations for the original sequential loop. The original sequential loop is transformed into a parallel loop that can generate transactions in an amount up to the fixed number of iterations. As another example, an open ended sequential loop can be transformed into a parallel loop that generates a separate transaction containing a respective work item for each iteration of a speculation pipeline. The parallel loop is then executed using the transactional memory system, with at least some of the separate transactions being executed on different threads.Type: GrantFiled: July 25, 2011Date of Patent: March 19, 2013Assignee: Microsoft CorporationInventors: John Joseph Duffy, Jan Gray, Yosseff Levanoni
-
Patent number: 8402450Abstract: A high level programming language provides a map transformation that takes a data parallel algorithm and a set of one or more input indexable types as arguments. The map transformation applies the data parallel algorithm to the set of input indexable types to generate an output indexable type, and returns the output indexable type. The map transformation may be used to fuse one or more data parallel algorithms with another data parallel algorithm.Type: GrantFiled: November 17, 2010Date of Patent: March 19, 2013Assignee: Microsoft CorporationInventors: Paul F. Ringseth, Yosseff Levanoni, Weirong Zhu
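A rough Python analogue of such a map transformation, with lists standing in for indexable types. The names `map_transform` and `fuse` are hypothetical and do not come from the language described in the abstract.

```python
def map_transform(algorithm, *inputs):
    """Apply `algorithm` element-wise across the inputs; return a new list."""
    return [algorithm(*args) for args in zip(*inputs)]


def fuse(outer, inner):
    """Compose two element-wise algorithms into a single one."""
    return lambda *args: outer(inner(*args))


xs, ys = [1, 2, 3], [10, 20, 30]

# Two passes: the intermediate list of sums is materialized.
two_pass = map_transform(lambda s: s * 2, map_transform(lambda x, y: x + y, xs, ys))

# One fused pass: same result, no intermediate list.
one_pass = map_transform(fuse(lambda s: s * 2, lambda x, y: x + y), xs, ys)
```

Fusing is safe here because each output element depends only on the input elements at the same index.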
-
Patent number: 8387036Abstract: A method for executing a computer program involving obtaining a statement of the source code, where the statement comprises a method call, and where the source code is composed in a statically-typed programming language. The method also involves, upon entry into a loop included in the computer program: incrementing an entry counter by one; and, for each iteration of the loop, incrementing an iteration counter by one, incrementing a local counter by one to obtain an incremented value of the local counter, incrementing a summation variable by the incremented value of the local counter, and executing the iteration of the loop.Type: GrantFiled: January 27, 2010Date of Patent: February 26, 2013Assignee: Oracle America, Inc.Inventor: John Rose
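The counter bookkeeping listed in this abstract can be written out directly. The sketch below assumes the local counter restarts at zero on each loop entry, which the abstract does not state, and uses hypothetical names (`LoopCounters`, `run_counted_loop`).

```python
from dataclasses import dataclass


@dataclass
class LoopCounters:
    entries: int = 0      # times the loop was entered
    iterations: int = 0   # iterations across all entries
    summation: int = 0    # running sum of the local counter values


def run_counted_loop(counters, items, body):
    counters.entries += 1
    local = 0  # assumption: the local counter restarts on each loop entry
    for item in items:
        counters.iterations += 1
        local += 1
        counters.summation += local
        body(item)
```

Under that assumption, an entry that runs n iterations adds n(n+1)/2 to the summation variable, so the three counters together say something about how trip counts are distributed, not just their total.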
-
Patent number: 8375375 Abstract: A method and system for auto-parallelization of zero-trip loops that substitutes a nested basic linear induction variable by exploiting a parallelizing compiler is provided. A max{0,N} variable is used for the loop iteration count when no information is known about the value of N, for a typical loop iterating from 1 to N, in which N is loop invariant. For the nested basic induction variables, an induction variable substitution process is applied to the nested loops, starting from the innermost loop and proceeding to the outermost one. The max operator is then removed through a copy propagation pass of the IBM compiler. In doing so, the loop dependency on the induction variable is eliminated, giving a parallelizing compiler the opportunity to parallelize the outermost loop. Type: Grant Filed: January 21, 2009 Date of Patent: February 12, 2013 Assignee: International Business Machines Corporation Inventors: Zhixing Ren, Raul Esteban Silvera, Guansong Zhang
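A single-loop sketch of why max{0,N} appears, assuming an induction variable `k` that advances by 2 per iteration. The nested case and the later removal of the max operator are not shown, and the function names are hypothetical.

```python
def sequential(n):
    """Original form: k carries a dependence from one iteration to the next."""
    k = 0
    written = []
    for i in range(1, n + 1):
        k += 2
        written.append(k)
    return written, k


def substituted(n):
    """After substitution: each iteration computes its own value from i."""
    written = [2 * i for i in range(1, n + 1)]
    # The loop runs max(0, n) times, so this is k's value after the loop
    # even when n <= 0 and the body never executes.
    k = 2 * max(0, n)
    return written, k
```

Writing the final value as `2 * n` would be wrong for a zero-trip loop with negative `n`. The `max` keeps the closed form correct when nothing is known about `n`.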
-
Patent number: 8359587 Abstract: A compilation method and mechanism for parallelizing program code. A method for compilation includes analyzing source code and identifying candidate code for parallelization. The method includes parallelizing the candidate code in response to determining that its profitability meets predetermined criteria, and generating object code corresponding to the source code. The generated object code includes both a non-parallelized version of the candidate code and a parallelized version of the candidate code. During execution of the object code, a dynamic selection between execution of the non-parallelized version of the candidate code and the parallelized version of the candidate code is made. Execution may change from the parallelized version of the candidate code to the non-parallelized version in response to determining that a transaction failure count meets a predetermined threshold. Type: Grant Filed: May 1, 2008 Date of Patent: January 22, 2013 Assignee: Oracle America, Inc. Inventors: Yonghong Song, Spiros Kalogeropulos, Partha P. Tirumalai
-
Publication number: 20120331453Abstract: A method for vectorization of a block of code is provided. The method comprises receiving a first block of code as input; and converting the first block of code into at least a second block of code and a third block of code. The first block of code accesses a first set of memory addresses that are potentially misaligned. The second block of code performs conditional leaping address incrementation to selectively access a first subset of the first set of memory addresses. The third block of code accesses a second subset of the first set of memory addresses starting from an aligned memory address, simultaneously accessing multiple memory addresses at a time. No memory address belongs to both the first subset and the second subset of memory addresses.Type: ApplicationFiled: September 7, 2012Publication date: December 27, 2012Applicant: INTERNATIONAL BUSINESS MACHINESInventors: Dorit Nuzman, Ira Rosen, Ayal Zaks
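For orientation, the sketch below shows conventional loop peeling for alignment, a simpler and well-known relative of what this abstract describes. It does not reproduce the "conditional leaping address incrementation" of the claimed method. List indices stand in for memory addresses, `width` is a hypothetical vector width, and the scalar tail loop is an addition for completeness.

```python
def peeled_sum(data, start, width):
    """Sum data[start:], splitting the work at the first aligned index."""
    total = 0
    i = start
    n = len(data)
    # Scalar prologue: potentially misaligned elements, one at a time.
    while i < n and i % width != 0:
        total += data[i]
        i += 1
    # Aligned body: `width` elements per step, starting at an aligned index.
    while i + width <= n:
        total += sum(data[i:i + width])
        i += width
    # Scalar tail: leftovers that do not fill a whole vector.
    while i < n:
        total += data[i]
        i += 1
    return total
```

As in the abstract, the prologue and the aligned body touch disjoint sets of elements, so nothing is read twice.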
-
Patent number: 8327344Abstract: Mechanisms are provided for analyzing and optimizing loops with conditional control flow in source code based on array reference safety. Mechanisms are provided for analyzing blocks of the source code to identify a conditional control flow loop having loop source code specifying a total access range for an array reference. A safe access range, of the total access range of the array reference in the loop source code, is identified over which a compiler-based optimization of the loop source code can be safely applied without introducing new exception conditions. The compiler-based optimization of the loop source code is performed based on the identified safe access range to generate optimized code. The optimized code is output for generation of executable code for execution on a processor.Type: GrantFiled: October 14, 2008Date of Patent: December 4, 2012Assignee: International Business Machines CorporationInventor: Michael K. Gschwind