Including Loop Patents (Class 717/160)
  • Patent number: 8640112
    Abstract: System and method for vectorizing combinations of program operations. Program code is received that includes a combination of individually vectorizable program portions that collectively implement a first computation. Each individually vectorizable program portion has at least one array input and at least one array output. The combination of individually vectorizable program portions is transformed into a single vectorizable program portion that is or includes a functional composition of the combination of individually vectorizable program portions. Vectorized executable code implementing the first computation is generated based on the single vectorizable program portion. The generated executable code is directed to SIMD (Single-Instruction-Multiple-Data) computing units of a target processor.
    Type: Grant
    Filed: March 30, 2011
    Date of Patent: January 28, 2014
    Assignee: National Instruments Corporation
    Inventors: Haoran Yi, Brady C. Duggan, Robert E. Dye, Adam L. Bordelon, Jeffrey L. Kodosky
  • Patent number: 8635606
    Abstract: Technologies are generally described for runtime optimization adjusted dynamically according to changing costs of one or more system resources. Multicore systems may encounter dynamic variations in performance associated with the relative cost of related system resources. Furthermore, multicore systems can experience dramatic variations in resource availability and costs. A dynamic registry of system resource costs can be utilized to guide dynamic optimization. The relative scarcity of each resource can be updated dynamically within the registry of system resource costs. A runtime code generating loader and optimizer may be adapted to adjust optimization according to the resource cost registry. Information regarding system resource costs can support optimization tradeoffs based on resource cost functions.
    Type: Grant
    Filed: October 13, 2009
    Date of Patent: January 21, 2014
    Assignee: Empire Technology Development LLC
    Inventor: Ezekiel John Joseph Kruglick
  • Patent number: 8627300
    Abstract: Technologies are generally described for parallel dynamic optimization using multicore processors. A runtime compiler may be adapted to generate multiple instances of executable code from a portable intermediate software module. The various instances of executable code may be generated with variations of optimization parameters such that the code instances each express different optimization attempts. A multicore processor may be leveraged to simultaneously execute some, or all, of the various code instances. Preferred optimization parameters may be determined from the executable code instances that may correctly complete in the least time, or may use the least amount of memory, or that may prove superior according to some other fitness metric. Preferred optimization parameters may be used to seed future optimization attempts. Output generated from the preferred instances may be used as soon as the first instance correctly completes block.
    Type: Grant
    Filed: October 13, 2009
    Date of Patent: January 7, 2014
    Assignee: Empire Technology Development LLC
    Inventor: Ezekiel John Joseph Kruglick
  • Patent number: 8627304
    Abstract: A method for vectorization of a block of code is provided. The method comprises receiving a first block of code as input; and converting the first block of code into at least a second block of code and a third block of code. The first block of code accesses a first set of memory addresses that are potentially misaligned. The second block of code performs conditional leaping address incrementation to selectively access a first subset of the first set of memory addresses. The third block of code accesses a second subset of the first set of memory addresses starting from an aligned memory address, simultaneously accessing multiple memory addresses at a time. No memory address belongs to both the first subset and the second subset of memory addresses.
    Type: Grant
    Filed: July 28, 2009
    Date of Patent: January 7, 2014
    Assignee: International Business Machines Corporation
    Inventors: Dorit Nuzman, Ira Rosen, Ayal Zaks
  • Patent number: 8601454
    Abstract: A device (D) is intended for optimizing composite applications comprising at least two orchestrated activities participating to at least one process. This device (D) comprises i) an analyzing means (AM) arranged for determining orchestrated activities contained into a composite application to be optimized and dependencies between these activities, and ii) an optimizing means (OM) arranged for determining a new orchestration between the determined activities which allows the composite application to execute requests of users in a minimal time, according to the determined dependencies and to predefined rules, and for outputting an optimized composite application based on the new orchestration.
    Type: Grant
    Filed: December 12, 2008
    Date of Patent: December 3, 2013
    Assignee: Alcatel Lucent
    Inventor: Benoit Christophe
  • Patent number: 8601459
    Abstract: A system and method for discovering a set of possible iteration sequences for a given loop in a software program is described, to transform the loop representation. In a program containing a loop, the loop is partitioned into a plurality of portions based on splitting criteria. Labels are associated with the portions, and an initial loop automaton is constructed that represents the loop iterations as a regular language over the labels corresponding to the portions in the program. Subsequences of the labels are analyzed to determine infeasibility of the subsequences permitted in the automaton. The automaton is refined by removing all infeasible subsequences to discover a set of possible iteration sequences in the loop. The resulting loop automaton is used in a subsequent program verification or analysis technique to find violations of correctness properties in programs.
    Type: Grant
    Filed: April 9, 2013
    Date of Patent: December 3, 2013
    Assignee: NEC Laboratories America, Inc.
    Inventors: Sriram Sankaranarayanan, Aarti Gupta, Gogul Balakrishnan
  • Patent number: 8589901
    Abstract: A system and method are configured to apply region level optimizations to a selected region of source code rather than loop level optimizations to a loop or loop nest. The region may include an outer loop, a plurality of inner loops and at least one control code. If the region includes an exceptional control flow statement and/or a procedure call, speculative region-level multi-versioning may be applied.
    Type: Grant
    Filed: December 22, 2010
    Date of Patent: November 19, 2013
    Inventors: Jin Lin, John L. Ng, Robert J. Cox, Xinmin Tian
  • Patent number: 8578358
    Abstract: A macroscalar processor architecture is described herein. In one embodiment, a processor receives instructions of a program loop having a vector block and a sequence block intended to be executed after the vector block, where the processor includes multiple slices and each of the slices is capable of executing an instruction of an iteration of the program loop substantially in parallel. For each iteration of the program loop, the processor executes an instruction of the sequence block using one of the slices while executing instructions of the vector block using a remainder of the slices substantially in parallel. Other methods and apparatuses are also described.
    Type: Grant
    Filed: November 17, 2011
    Date of Patent: November 5, 2013
    Assignee: Apple Inc.
    Inventor: Jeffry E. Gonion
  • Publication number: 20130290943
    Abstract: According to one embodiment, a code optimizer is configured to receive first code having a program loop implemented with scalar instructions to store values of a first array to a second array based on values of a third array and to generate second code representing the program loop using at least one vector instruction. The second code include a shuffle instruction to shuffle elements of the first array based on the third array using a shuffle table in a vector manner, a blend instruction to blend the shuffled elements of the first array using a blend table in a vector manner, and a store instruction to store the blended elements of the first array in the second array.
    Type: Application
    Filed: December 15, 2011
    Publication date: October 31, 2013
    Applicant: Intel Corporation
    Inventors: Tal Uliel, Elmoustapha Ould-Ahmedvall, Bret T. Toll
  • Patent number: 8555030
    Abstract: A device identifies array accesses of variables in a program code that includes multiple arrays, and identifies array access patterns for one of the array accesses. The device also determines an order of the array access patterns identified for the array accesses, and calculates, based on the order, distances between the array access patterns. The device further shares address calculations amongst the array accesses associated with array access patterns with one or more of the distances that are equivalent.
    Type: Grant
    Filed: July 14, 2011
    Date of Patent: October 8, 2013
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Tim J. Wilkens, Michael C. Berg
  • Patent number: 8555267
    Abstract: A mechanism for performing register allocation based on priority spills and assignments is disclosed. A method of embodiments of the invention includes repetitively detecting fat points during a compilation process of a software program running on a virtual machine of a computer system, each fat point representing a program point having a high register pressure, the high register pressure occurs when a number of live program variables of the software program living at a given program point of the software program is greater than a number of available processor registers of the computer system. The method further includes choosing a fat point with a highest register pressure, selecting a live program variable having a lowest priority at the chosen fat point, and spilling the lowest priority live program variable to memory of the computer system.
    Type: Grant
    Filed: March 3, 2010
    Date of Patent: October 8, 2013
    Assignee: Red Hat, Inc.
    Inventor: Vladimir Makarov
  • Patent number: 8549507
    Abstract: A loop coalescing method and a loop coalescing device are disclosed. The loop coalescing method comprises removing an inner-most loop from among nested loops, so that an outer operation provided outside of the inner-most loop is performed when a condition of a conditional statement is satisfied, generating a guard code by applying an if-conversion method to the conditional statement, and converting a guard by using an instruction calculating the guard of the guard code, the instruction calculating the guard using a register where information related to a period of time corresponding to the number of iterations of the inner-most loop is stored.
    Type: Grant
    Filed: August 22, 2007
    Date of Patent: October 1, 2013
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Hee Seok Kim, Hong-Seok Kim, Chang-Woo Baek, Jeongwook Kim
  • Patent number: 8549508
    Abstract: A mechanism for performing instruction scheduling based on register pressure sensitivity is disclosed. A method of embodiments of the invention includes performing a preliminary register pressure minimization on program points during a compilation process of a software program running on a virtual machine of a computer system. The method further includes calculating a register pressure at each of the program points, detecting an instruction to be scheduled, and performing instruction scheduling of the instruction based on a current register pressure at a current scheduling point and potential register pressures at subsequent scheduling points.
    Type: Grant
    Filed: March 3, 2010
    Date of Patent: October 1, 2013
    Assignee: Red Hat, Inc.
    Inventor: Vladimir Makarov
  • Patent number: 8549501
    Abstract: Generating mixed-mode operations in the compilation of program code for processors having vector or SIMD processing units is disclosed. In a preferred embodiment of the present invention, program instructions making up the body of a loop are abstracted into virtual vector instructions. These virtual vector instructions are treated, for initial code optimization purposes, as vector instructions (i.e., instructions written for the vector unit). The virtual vector instructions are eventually expanded into native code for the target processor, at which time a determination is made for each virtual vector instruction as to whether to expand the virtual vector instruction into native vector instructions, into native scalar instructions, into calls to pre-defined library functions, or into a combination of these. A cost model is used to determine the optimal choice of expansion based on hardware/software constraints, performance costs/benefits, and other criteria.
    Type: Grant
    Filed: August 16, 2004
    Date of Patent: October 1, 2013
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
  • Patent number: 8543993
    Abstract: A compiler compiling a source code and is implemented in a plurality of processor cores includes a parallel loop processing detection unit configured to detect from the source code a loop processing code for execution of an internal processing operation for a given number of repeating times, and an independent parallel loop processing code in the internal processing operation performed for each repetition to be concurrently processed, and a dynamic parallel conversion unit configured to generate a control core code for control of the number of repeating times in the parallel loop processing code and a parallel processing code for changing the number of repeating times corresponding to the control from the control core code.
    Type: Grant
    Filed: March 18, 2010
    Date of Patent: September 24, 2013
    Assignee: Fujitsu Limited
    Inventor: Koichiro Yamashita
  • Publication number: 20130227537
    Abstract: A system and method for discovering a set of possible iteration sequences for a given loop in a software program is described, to transform the loop representation. In a program containing a loop, the loop is partitioned into a plurality of portions based on splitting criteria. Labels are associated with the portions, and an initial loop automaton is constructed that represents the loop iterations as a regular language over the labels corresponding to the portions in the program. Subsequences of the labels are analyzed to determine infeasibility of the subsequences permitted in the automaton. The automaton is refined by removing all infeasible subsequences to discover a set of possible iteration sequences in the loop. The resulting loop automaton is used in a subsequent program verification or analysis technique to find violations of correctness properties in programs.
    Type: Application
    Filed: April 9, 2013
    Publication date: August 29, 2013
    Applicant: NEC Laboratories America, Inc.
    Inventor: NEC Laboratories America, Inc.
  • Patent number: 8522226
    Abstract: A system and method for discovering a set of possible iteration sequences for a given loop in a software program is described, to transform the loop representation. In a program containing a loop, the loop is partitioned into a plurality of portions based on splitting criteria. Labels are associated with the portions, and an initial loop automaton is constructed that represents the loop iterations as a regular language over the labels corresponding to the portions in the program. Subsequences of the labels are analyzed to determine infeasibility of the subsequences permitted in the automaton. The automaton is refined by removing all infeasible subsequences to discover a set of possible iteration sequences in the loop. The resulting loop automaton is used in a subsequent program verification or analysis technique to find violations of correctness properties in programs.
    Type: Grant
    Filed: February 8, 2010
    Date of Patent: August 27, 2013
    Assignee: NEC Laboratories America, Inc.
    Inventors: Sriram Sankaranarayanan, Aarti Gupta, Gogul Balakrishnan
  • Patent number: 8516468
    Abstract: In one embodiment of the invention, a method for fusing a first loop nested in a first IF statement with a second loop nested in a second IF statement without the use of modified and referenced (mod-ref) information to determine if certain conditional statements in the IF statements retain variable values.
    Type: Grant
    Filed: June 30, 2008
    Date of Patent: August 20, 2013
    Assignee: Intel Corporation
    Inventors: John L. Ng, Robert Cox, Dmitry V. Budanov
  • Patent number: 8505002
    Abstract: A data processing system is provided having a processor and analysing circuitry for identifying a SIMD instruction associated with a first SIMD instruction set and replacing it by a functionally-equivalent scalar representation and marking that functionally-equivalent scalar representation. The marked functionally-equivalent scalar representation is dynamically translated using translation circuitry upon execution of the program to generate one or more corresponding translated instructions corresponding to a instruction set architecture different from the first SIMD architecture corresponding to the identified SIMD instruction.
    Type: Grant
    Filed: September 27, 2007
    Date of Patent: August 6, 2013
    Assignees: ARM Limited, The Regents of the University of Michigan
    Inventors: Sami Yehia, Krisztian Flautner, Nathan Clark, Amir Hormati, Scott Mahlke
  • Patent number: 8495606
    Abstract: A system performs operations comprising creating a call graph for a program translated from source code, identifying redundant exception handling code in the program utilizing the call graph, and removing the redundant exception handling code. The operation of identifying redundant exception handling code may comprise identifying at least one function or callsite by determining that a first function in the at least one function's or callsite's callee chain throws an exception and that the exception is handled by a second function in the function's or callsite's callee chain or by determining that an exception is not thrown in the at least one function's or callsite's callee chain. The operation of removing the redundant exception handling code may comprise removing redundant exception handling code included in at least one function or callsite and/or removing at least one entry for the at least one function or callsite from an exception lookup table.
    Type: Grant
    Filed: November 14, 2008
    Date of Patent: July 23, 2013
    Assignee: Oracle America, Inc.
    Inventors: Sheldon M. Lobo, Fu-Hwa Wang
  • Patent number: 8495607
    Abstract: Mechanisms for aggressively optimizing computer code are provided. With these mechanisms, a compiler determines an optimization to apply to a portion of source code and determines if the optimization as applied to the portion of source code will result in unsafe optimized code that introduces a new source of exceptions being generated by the optimized code. In response to a determination that the optimization is an unsafe optimization, the compiler generates an aggressively compiled code version, in which the unsafe optimization is applied, and a conservatively compiled code version in which the unsafe optimization is not applied. The compiler stores both versions and provides them for execution. Mechanisms are provided for switching between these versions during execution in the event of a failure of the aggressively compiled code version. Moreover, predictive mechanisms are provided for predicting whether such a failure is likely.
    Type: Grant
    Filed: March 1, 2010
    Date of Patent: July 23, 2013
    Assignee: International Business Machines Corporation
    Inventor: Michael K. Gschwind
  • Patent number: 8484623
    Abstract: A method for determining the number and location of instrumentation probes to be inserted into a program is disclosed. The method advantageously inserts the minimum number of probes that are required to obtain execution coverage for every node in the program's control-flow graph. In addition, the method requires only type of node marking and one bit to store each probe, and does not require the assignment of weights to arcs or nodes of the control-flow graph. In the illustrative embodiment, the nodes of a control-flow graph are partitioned into non-empty sets, where each non-empty set corresponds to a super nested block of the program.
    Type: Grant
    Filed: September 29, 2008
    Date of Patent: July 9, 2013
    Assignee: Avaya, Inc.
    Inventors: Juan Jenny Li, David Mandel Weiss
  • Patent number: 8479185
    Abstract: A method for compiling application source code that includes selecting multiple loops for parallelization. The multiple loops include a first loop and a second loop. The method further includes partitioning the first loop into a first set of chunks, partitioning the second loop into a second set of chunks, and calculating data dependencies between the first set of chunks and the second set of chunks. A first chunk of the second set of chunks is dependent on a first chunk of the first set of chunks. The method further includes inserting, into the first loop and prior to completing compilation, a precedent synchronization instruction for execution when execution of the first chunk of the first set of chunks completes, and completing the compilation of the application source code to create an application compiled code.
    Type: Grant
    Filed: December 9, 2010
    Date of Patent: July 2, 2013
    Assignee: Oracle International Corporation
    Inventors: Spiros Kalogeropulos, Partha P. Tirumalai
  • Patent number: 8479179
    Abstract: A method for compiling a program including a loop is provided. In the program, the loop includes K instructions (K>2) and repeats for M times (M>2). The compiling method comprises following steps: performing resource conflict analysis to the K instructions in the loop; dividing the K instructions in the loop into a first combined instruction section, a connection instruction section and a second combined instruction section, wherein there is no resource conflict between the instructions in the first combined instruction section and the instructions in the second combined instruction section respectively; and compiling the program, wherein the instructions in the first combined instruction section in the cycle N (N=2, 3, . . . M) and the instructions in the second combined instruction section in the cycle N?1 are combined to be compiled respectively. A compiling apparatus and a computer system for realizing the above-mentioned compiling method are further provided.
    Type: Grant
    Filed: December 7, 2005
    Date of Patent: July 2, 2013
    Assignee: St-Ericsson SA
    Inventors: Fan Wu, Yanmeng Sun
  • Publication number: 20130167130
    Abstract: An illustrative embodiment of a computer-implemented process for shared data prefetching and coalescing optimization versions a loop containing one or more shared references into an optimized loop and an un-optimized loop, transforms the optimized loop into a set of loops, and stores shared access associated information of the loop using a prologue loop in the set of loops. The shared access associated information pertains to remote data and is collected using the prologue loop in absence of network communication and builds a hash table. An associated data structure is updated each time the hash table is entered, and is sorted to remove duplicate entries and create a reduced data structure. Patterns across entries of the reduced data structure are identified and entries are coalesced. Data associated with a coalesced entry is pre-fetched using a single communication and a local buffer is populated with the fetched data for reuse.
    Type: Application
    Filed: October 24, 2012
    Publication date: June 27, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: International Business Machines Corporation
  • Patent number: 8473931
    Abstract: A method, system and program product for optimizing emulation of a suspected malware. The method includes identifying, using an emulation optimizer tool, whether an instruction in a suspected malware being emulated by an emulation engine in a virtual environment signifies a long loop and, if so, generating a first hash for the loop. Further, the method includes ascertaining whether the first hash generated matches any long loop entries in a storage and, if so calculating a second hash for the long loop. Furthermore, the method includes inspecting any long loop entries ascertained to find an entry having a respective second hash matching the second hash calculated. If an entry matching the second hash calculated is found, the method further includes updating one or more states of the emulation engine, such that, execution of the long loop of the suspected malware is skipped, which optimizes emulation of the suspected malware.
    Type: Grant
    Filed: March 20, 2012
    Date of Patent: June 25, 2013
    Assignee: International Business Machines Corporation
    Inventor: Ji Yan Wu
  • Patent number: 8468508
    Abstract: An optimizing compiler device, a method, a computer program product which are capable of performing parallelization of irregular reductions. The method for performing parallelization of irregular reductions includes receiving, at a compiler, a program and selecting, at compile time, at least one unit of work (UW) from the program, each UW configured to operate on at least one reduction operation, where at least one reduction operation in the UW operates on a reduction variable whose address is determinable when running the program at a run-time. At run time, for each successive current UW, a list of reduction operations accessed by that unit of work is recorded. Further, it is determined at run time whether reduction operations accessed by a current UW conflict with any reduction operations recorded as having been accessed by prior selected units of work, and assigning the unit of work as a conflict free unit of work (CFUW) when no conflicts are found.
    Type: Grant
    Filed: October 9, 2009
    Date of Patent: June 18, 2013
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Yangchun Luo, John K. O'Brien, Xiaotong Zhuang
  • Patent number: 8458685
    Abstract: A system and method of compiling program code, wherein the program code includes an operation on an array of data elements stored in memory of a computer system. The program code is scanned for an equation which may have recurring data points. The equation is then replaced with vectorized machine executable code, wherein the machine executable code comprises a nested loop and wherein the nested loop comprises an exterior loop and a virtual interior loop. The exterior loop decomposes the equation into a plurality of loops of length N, wherein N is an integer greater than one. The virtual interior loop executes vector operations corresponding to the N length loop to form a result vector resident in memory, wherein the virtual interior loop includes a vector atomic memory operation (AMO) instruction.
    Type: Grant
    Filed: June 12, 2009
    Date of Patent: June 4, 2013
    Assignee: Cray Inc.
    Inventor: Terry D. Greyzck
  • Patent number: 8458682
    Abstract: System and method for converting a class oriented data flow program to a structure oriented data flow program. A first data flow program is received, where the first data flow program is an object oriented program comprising instances of one or more classes, and wherein the first data flow program is executable to perform a first function. The first data flow program is automatically converted to a second data flow program, where the second data flow program does not include the instances of the one or more classes, and where the second data flow program is executable to perform the first function. The second data flow program is stored on a computer memory, where the second data flow program is configured to be deployed to a device, e.g., a programmable hardware element, and where the second data flow program is executable on the device to perform the first function.
    Type: Grant
    Filed: April 27, 2009
    Date of Patent: June 4, 2013
    Assignee: National Instruments Corporation
    Inventors: Stephen R. Mercer, Akash B. Bhakta, Matthew E. Novacek
  • Patent number: 8453135
    Abstract: A compiler selects a nested loop within software code that includes an outer loop and an inner loop. The outer loop includes an outer induction variable and the inner loop includes an inner induction variable. The compiler identifies a computation included in the nested loop that generates an irregular array access, which includes an expression of both the outer induction variable and the inner induction variable. Next, the compiler identifies a redundant calculation for the computation based upon the outer induction variable and the inner induction variable, and generates a temporary variable to correspond with the redundant calculation. The compiler replaces the computation with the temporary variable in the nested loop and, in turn, compiles the nested loop with the included temporary variable.
    Type: Grant
    Filed: March 11, 2010
    Date of Patent: May 28, 2013
    Assignee: Freescale Semiconductor, Inc.
    Inventor: Abderrazek Zaafrani
  • Patent number: 8453134
    Abstract: Provided are a method, system, and article of manufacture improving data locality and parallelism by code replication and array contraction. Source code including an array of elements referenced using at least two indices is processed. The array is nested within multiple loops, wherein at least two of the loops perform iterations with respect to the indices of the array, wherein the index incremented in at least one innermost loop of the loops does not comprise a leftmost index in the array. The source code is transformed to object code by performing operations including fusing at least two innermost loops of the loops in object code generated by compiling the source code by replicating statements from at least one of the innermost loops into a fused innermost loop and performing loop interchange in the object code to have the fused innermost loop provide iterations with respect to the leftmost index in the array.
    Type: Grant
    Filed: June 4, 2008
    Date of Patent: May 28, 2013
    Assignee: Intel Corporation
    Inventors: John L. Ng, Alexander Y. Ostanevich, Alexander L. Sushentsov
  • Patent number: 8453156
    Abstract: A method and system to balance the load of a task-based multi-threaded application on a platform. When the work required by the multi-threaded application is represented as a task with a computational requirement that is proportional to the amount of the work, embodiments of the invention control the recursive binary task division of the task using auxiliary partitions to create subtasks of balanced loads to enhance resource utilization and to improve application performance. The task is binary partitioned recursively into a plurality of subtasks until the plurality of subtasks is equal to the plurality of resources available on the platform to execute the subtasks.
    Type: Grant
    Filed: March 30, 2009
    Date of Patent: May 28, 2013
    Assignee: Intel Corporation
    Inventors: Wooyoung Kim, Michael Joseph Voss
  • Publication number: 20130125105
    Abstract: Control flow information and data flow information associated with a program containing a upc_forall loop are built. A shared reference map data structure using the control flow information and the data flow information is created. All local shared accesses are hashed to facilitate a constant access stride after being rewritten. All local shared references in a hash entry having a longest list are privatized. The upc_forall loop is rewritten into a for loop. Responsive to a determination that an unprocessed upc_forall loop does not exist, dead store elimination is run. The control flow information and the data flow information associated with the program containing the for loop is rebuilt.
    Type: Application
    Filed: November 15, 2011
    Publication date: May 16, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Yaoqing Gao, Liangxiao Hu, Raul Esteban Silvera, Ettore Tiotto
  • Publication number: 20130125104
    Abstract: According to one aspect of the present disclosure, a method and technique for reducing branch misprediction impact for nested loop code is disclosed. The method includes: responsive to identifying code having an outer loop and an inner loop, determining a quantity of iterations of the inner loop for an initial number of iterations of the outer loop; determining a number of processor cycles for executing the quantity of iterations of the inner loop for the initial number of iterations of the outer loop; determining whether the number of processor cycles is less than a threshold; and responsive to determining that the number of processor cycles is less than the threshold, fully unrolling the inner loop for the initial number of iterations of the outer loop.
    Type: Application
    Filed: November 11, 2011
    Publication date: May 16, 2013
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Madhavi G. Valluri, Steven W. White
  • Patent number: 8443344
    Abstract: Approaches for generating a hardware definition from a program specified in a high-level language. In one approach, a first set of blocks of instructions in the high-level language program is identified. Each block in the first set is bounded by a respective loop designation in the high-level language. For each block in the first set, an associated respective second set of one or more blocks of the program is identified. Each block in the second set is outside the block in the first set. A hardware definition of the program is generated and stored. For each block in the first set, the hardware definition specifies power-reducing circuitry for one or more blocks in the associated second set. The power-reducing circuitry is controlled based on a status indication from the hardware definition of the block in the first set.
    Type: Grant
    Filed: September 25, 2008
    Date of Patent: May 14, 2013
    Assignee: Xilinx, Inc.
    Inventors: Prasanna Sundararajan, Tim Tuan
  • Patent number: 8443351
    Abstract: The subject disclosure pertains broadly to parallelization of workflow loops. More specifically, loop containers and related elements are cloned several times to match a desired number of parallel iterations or threads. The cloned containers are communicatively coupled or connected to a single enumerator component and can interact therewith to facilitate acquisition of collection elements. This arrangement, among other things, ensures that the correct number of iterations are executed as if the loop was processed sequentially.
    Type: Grant
    Filed: February 23, 2006
    Date of Patent: May 14, 2013
    Assignee: Microsoft Corporation
    Inventors: J. Kirk Haselden, Sergei Ivanov
  • Publication number: 20130117737
    Abstract: One embodiment of the present invention sets forth a technique for reducing sign-extension instructions (SEIs) included in a computer program, the technique involves receiving intermediate code that is associated with the computer program and includes a first SEI that is included in a loop structure within the computer program, determining that the first SEI is eligible to be moved outside of the loop structure, inserting into a preheader of the loop a second SEI that, when executed by a processor, promotes an original value targeted by the first SEI from a smaller type to a larger type, and replacing the first SEI with one or more intermediate instructions that are eligible for additional compiler optimizations.
    Type: Application
    Filed: October 26, 2012
    Publication date: May 9, 2013
    Applicant: NVIDIA CORPORATION
    Inventor: NVIDIA Corporation
  • Patent number: 8434076
    Abstract: A system which combines sequential and iterative source code is provided. The system decides which type of processing would be most suitable for all portions of the source code, regardless of type. The system can adjust that decision based on the specific nature of the constructs within the source code, and can also adjust that decision based on the platform upon which the resulting executable program will run.
    Type: Grant
    Filed: December 12, 2007
    Date of Patent: April 30, 2013
    Assignee: Oracle International Corporation
    Inventors: Anguel Novoselsky, Zhen Hua Liu
  • Patent number: 8429625
    Abstract: A method and system for processing generic formatted data, including first data describing a sequence of generic operations without any loops, in view of providing specific formatted data, for a determined platform including Q processor(s) and at least one memory, the platform configured to process, according, directly or indirectly, to specific formatted data, an object made up of elementary information of same type, each elementary information being represented by at least one numerical value.
    Type: Grant
    Filed: December 19, 2006
    Date of Patent: April 23, 2013
    Assignee: DXO Labs
    Inventor: Bruno Liege
  • Patent number: 8418156
    Abstract: Generally, the present disclosure provides systems and methods to generate a two-stage commit (TSC) region which has two separate commit stages. Frequently executed code may be identified and combined for the TSC region. Binary optimization operations may be performed on the TSC region to enable the code to run more efficiently by, for example, reordering load and store instructions. In the first stage, load operations in the region may be committed atomically and in the second stage, store operations in the region may be committed atomically.
    Type: Grant
    Filed: December 16, 2009
    Date of Patent: April 9, 2013
    Assignee: Intel Corporation
    Inventors: Cheng Wang, Youfeng Wu
  • Patent number: 8413127
    Abstract: A mechanism for minimizing effective memory latency without unnecessary cost through fine-grained software-directed data prefetching using integrated high-level and low-level code analysis and optimizations is provided. The mechanism identifies and classifies streams, identifies data that is most likely to incur a cache miss, exploits effective hardware prefetching to determine the proper number of streams to be prefetched, exploits effective data prefetching on different types of streams in order to eliminate redundant prefetching and avoid cache pollution, and uses high-level transformations with integrated lower level cost analysis in the instruction scheduler to schedule prefetch instructions effectively.
    Type: Grant
    Filed: December 22, 2009
    Date of Patent: April 2, 2013
    Assignee: International Business Machines Corporation
    Inventors: Roch G. Archambault, Robert J. Blainey, Yaoqing Gao, Allan R. Martin, James L. McInnes, Francis Patrick O'Connell
  • Patent number: 8412914
    Abstract: A method for aggregating a program loop in a Macroscalar architecture includes identifying one or more instructions of the program loop having a branch instruction that causes the program loop to branch dependent upon a predicate condition after a memory write operation. The method also includes modifying at least one of the one or more instructions to cause a processor executing the one or more instructions to branch after the memory write operation executed as a vector block for iterations prior to and including an iteration during which the predicate condition is satisfied.
    Type: Grant
    Filed: November 17, 2011
    Date of Patent: April 2, 2013
    Assignee: Apple Inc.
    Inventor: Jeffry E. Gonion
  • Patent number: 8402447
    Abstract: Various technologies and techniques are disclosed for transforming a sequential loop into a parallel loop for use with a transactional memory system. Open ended and/or closed ended sequential loops can be transformed to parallel loops. For example, a section of code containing an original sequential loop is analyzed to determine a fixed number of iterations for the original sequential loop. The original sequential loop is transformed into a parallel loop that can generate transactions in an amount up to the fixed number of iterations. As another example, an open ended sequential loop can be transformed into a parallel loop that generates a separate transaction containing a respective work item for each iteration of a speculation pipeline. The parallel loop is then executed using the transactional memory system, with at least some of the separate transactions being executed on different threads.
    Type: Grant
    Filed: July 25, 2011
    Date of Patent: March 19, 2013
    Assignee: Microsoft Corporation
    Inventors: John Joseph Duffy, Jan Gray, Yosseff Levanoni
  • Patent number: 8402450
    Abstract: A high level programming language provides a map transformation that takes a data parallel algorithm and a set of one or more input indexable types as arguments. The map transformation applies the data parallel algorithm to the set of input indexable types to generate an output indexable type, and returns the output indexable type. The map transformation may be used to fuse one or more data parallel algorithms with another data parallel algorithm.
    Type: Grant
    Filed: November 17, 2010
    Date of Patent: March 19, 2013
    Assignee: Microsoft Corporation
    Inventors: Paul F. Ringseth, Yosseff Levanoni, Weirong Zhu
  • Patent number: 8387036
    Abstract: A method for executing a computer program involving obtaining a statement of the source code, where the statement comprises a method call, and where the source code is composed in a statically-typed programming language. The method also involves, upon entry into a loop included in the computer program: incrementing an entry counter by one; and, for each iteration of the loop, incrementing an iteration counter by one, incrementing a local counter by one to obtain an incremented value of the local counter, incrementing a summation variable by the incremented value of the local counter, and executing the iteration of the loop.
    Type: Grant
    Filed: January 27, 2010
    Date of Patent: February 26, 2013
    Assignee: Oracle America, Inc.
    Inventor: John Rose
  • Patent number: 8375375
    Abstract: A method and system of auto parallelization of zero-trip loops that substitutes a nested basic linear induction variable by exploiting a parallelizing compiler is provided. Provided is a use of a max{0,N} variable for loop iterations in case of no information is known about the value of N, for a typical loop iterating from 1 to N, in which N is the loop invariant. For the nested basic induction variables, an induction variable substitution process is applied to the nested loops starting from the innermost loop to the outermost one. Then a removal of the max operator afterwards through a copy propagation pass of the IBM compiler is provided. In doing so, the loop dependency on the induction variable is eliminated and an opportunity for a parallelizing compiler to parallel the outermost loop is provided.
    Type: Grant
    Filed: January 21, 2009
    Date of Patent: February 12, 2013
    Assignee: International Business Machines Corporation
    Inventors: Zhixing Ren, Raul Esteban Silvera, Guansong Zhang
  • Patent number: 8359587
    Abstract: A compilation method and mechanism for parallelizing program code. A method for compilation includes analyzing source code and identifying candidate code for parallelization. The method includes parallelizing the candidate code, in response to determining said profitability meets a predetermined criteria; and generating object code corresponding to the source code. The generated object code includes both a non-parallelized version of the candidate code and a parallelized version of the candidate code. During execution of the object code, a dynamic selection between execution of the non-parallelized version of the candidate code and the parallelized version of the candidate code is made. Changing execution from said parallelized version of the candidate code to the non-parallelized version of the candidate code, may be in response to determining a transaction failure count meets a pre-determined threshold.
    Type: Grant
    Filed: May 1, 2008
    Date of Patent: January 22, 2013
    Assignee: Oracle America, Inc.
    Inventors: Yonghong Song, Spiros Kalogeropulos, Partha P. Tirumalai
  • Publication number: 20120331453
    Abstract: A method for vectorization of a block of code is provided. The method comprises receiving a first block of code as input; and converting the first block of code into at least a second block of code and a third block of code. The first block of code accesses a first set of memory addresses that are potentially misaligned. The second block of code performs conditional leaping address incrementation to selectively access a first subset of the first set of memory addresses. The third block of code accesses a second subset of the first set of memory addresses starting from an aligned memory address, simultaneously accessing multiple memory addresses at a time. No memory address belongs to both the first subset and the second subset of memory addresses.
    Type: Application
    Filed: September 7, 2012
    Publication date: December 27, 2012
    Applicant: INTERNATIONAL BUSINESS MACHINES
    Inventors: Dorit Nuzman, Ira Rosen, Ayal Zaks
  • Patent number: 8327344
    Abstract: Mechanisms are provided for analyzing and optimizing loops with conditional control flow in source code based on array reference safety. Mechanisms are provided for analyzing blocks of the source code to identify a conditional control flow loop having loop source code specifying a total access range for an array reference. A safe access range, of the total access range of the array reference in the loop source code, is identified over which a compiler-based optimization of the loop source code can be safely applied without introducing new exception conditions. The compiler-based optimization of the loop source code is performed based on the identified safe access range to generate optimized code. The optimized code is output for generation of executable code for execution on a processor.
    Type: Grant
    Filed: October 14, 2008
    Date of Patent: December 4, 2012
    Assignee: International Business Machines Corporation
    Inventor: Michael K. Gschwind
  • Patent number: 8327345
    Abstract: In response to receiving pre-processed code, a compiler identifies a code section that is not candidate for acceleration and identifying a code block specifying an iterated operation that is a candidate for acceleration. In response to identifying the code section, the compiler generates post-processed code containing one or more lower level instructions corresponding to the identified code section, and in response to identifying the code block, the compiler creates and outputs an operation data structure separate from the post-processed code that identifies the iterated operation. The compiler places a block computation command in the post-processed code that invokes processing of the operation data structure to perform the iterated operation and outputs the post-processed code.
    Type: Grant
    Filed: December 16, 2008
    Date of Patent: December 4, 2012
    Assignee: International Business Machines Corporation
    Inventors: Ravi K. Arimilli, Balaram Sinharoy