Including Loop Patents (Class 717/160)

Including scheduling instructions (Class 717/161)

Vectorizing combinations of program operations

Patent number: 8640112

Abstract: System and method for vectorizing combinations of program operations. Program code is received that includes a combination of individually vectorizable program portions that collectively implement a first computation. Each individually vectorizable program portion has at least one array input and at least one array output. The combination of individually vectorizable program portions is transformed into a single vectorizable program portion that is or includes a functional composition of the combination of individually vectorizable program portions. Vectorized executable code implementing the first computation is generated based on the single vectorizable program portion. The generated executable code is directed to SIMD (Single-Instruction-Multiple-Data) computing units of a target processor.

Type: Grant

Filed: March 30, 2011

Date of Patent: January 28, 2014

Assignee: National Instruments Corporation

Inventors: Haoran Yi, Brady C. Duggan, Robert E. Dye, Adam L. Bordelon, Jeffrey L. Kodosky
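As a rough illustration of the idea in the abstract for patent 8640112, a chain of element-wise array operations can be fused into one composed function that is applied in a single pass. This is only a sketch: `scale`, `offset`, and `compose` are hypothetical names, not taken from the patent, and plain Python lists stand in for SIMD lanes.

```python
from functools import reduce


def scale(x):
    """First element-wise step (hypothetical): multiply by 2."""
    return 2.0 * x


def offset(x):
    """Second element-wise step (hypothetical): add 1."""
    return x + 1.0


def compose(*steps):
    """Return one function that applies the steps left to right."""
    return lambda x: reduce(lambda acc, f: f(acc), steps, x)


def run_separately(data):
    # Two passes over the data with an intermediate array in between.
    tmp = [scale(x) for x in data]
    return [offset(x) for x in tmp]


def run_fused(data):
    # One pass over the data using the functional composition.
    fused = compose(scale, offset)
    return [fused(x) for x in data]
```

Both functions compute the same result; the fused form avoids the intermediate array, which is the property a vectorizing code generator would presumably exploit.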
Dynamic optimization using a resource cost registry

Patent number: 8635606

Abstract: Technologies are generally described for runtime optimization adjusted dynamically according to changing costs of one or more system resources. Multicore systems may encounter dynamic variations in performance associated with the relative cost of related system resources. Furthermore, multicore systems can experience dramatic variations in resource availability and costs. A dynamic registry of system resource costs can be utilized to guide dynamic optimization. The relative scarcity of each resource can be updated dynamically within the registry of system resource costs. A runtime code generating loader and optimizer may be adapted to adjust optimization according to the resource cost registry. Information regarding system resource costs can support optimization tradeoffs based on resource cost functions.

Type: Grant

Filed: October 13, 2009

Date of Patent: January 21, 2014

Assignee: Empire Technology Development LLC

Inventor: Ezekiel John Joseph Kruglick
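To make the resource cost registry of patent 8635606 concrete, here is a minimal sketch in which an optimizer picks between two code variants by pricing their resource usage against a registry that can change at run time. The class, the function, the variant names, and the linear cost function are all assumptions made for illustration; the patent does not specify them.

```python
class ResourceCostRegistry:
    """Hypothetical registry mapping a resource name to its current unit cost."""

    def __init__(self, costs):
        self._costs = dict(costs)

    def update(self, resource, cost):
        # Called when the relative scarcity of a resource changes.
        self._costs[resource] = cost

    def cost_of(self, usage):
        # Assumed cost function: a linear sum of unit cost times amount used.
        return sum(self._costs[r] * amount for r, amount in usage.items())


def pick_variant(registry, variants):
    """Return the name of the variant that is cheapest under current costs."""
    return min(variants, key=lambda name: registry.cost_of(variants[name]))


# Two hypothetical code variants and the resources each one consumes.
VARIANTS = {
    "unrolled": {"cpu": 1, "memory": 4},
    "compact": {"cpu": 3, "memory": 1},
}
```

With `cpu` priced at 2 and `memory` at 1, `unrolled` is cheaper; if memory becomes scarce and its cost rises to 3, the same call selects `compact`.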
Parallel dynamic optimization

Patent number: 8627300

Abstract: Technologies are generally described for parallel dynamic optimization using multicore processors. A runtime compiler may be adapted to generate multiple instances of executable code from a portable intermediate software module. The various instances of executable code may be generated with variations of optimization parameters such that the code instances each express different optimization attempts. A multicore processor may be leveraged to simultaneously execute some, or all, of the various code instances. Preferred optimization parameters may be determined from the executable code instances that may correctly complete in the least time, or may use the least amount of memory, or that may prove superior according to some other fitness metric. Preferred optimization parameters may be used to seed future optimization attempts. Output generated from the preferred instances may be used as soon as the first instance correctly completes block.

Type: Grant

Filed: October 13, 2009

Date of Patent: January 7, 2014

Assignee: Empire Technology Development LLC

Inventor: Ezekiel John Joseph Kruglick
Vectorization of program code

Patent number: 8627304

Abstract: A method for vectorization of a block of code is provided. The method comprises receiving a first block of code as input; and converting the first block of code into at least a second block of code and a third block of code. The first block of code accesses a first set of memory addresses that are potentially misaligned. The second block of code performs conditional leaping address incrementation to selectively access a first subset of the first set of memory addresses. The third block of code accesses a second subset of the first set of memory addresses starting from an aligned memory address, simultaneously accessing multiple memory addresses at a time. No memory address belongs to both the first subset and the second subset of memory addresses.

Type: Grant

Filed: July 28, 2009

Date of Patent: January 7, 2014

Assignee: International Business Machines Corporation

Inventors: Dorit Nuzman, Ira Rosen, Ayal Zaks
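The split into disjoint address subsets described in the abstract for patent 8627304 resembles classic loop peeling for alignment. The sketch below shows that simpler, well-known technique, not the patent's "conditional leaping address incrementation": a scalar prologue runs up to the first aligned address, aligned groups are then handled `width` addresses at a time, and a scalar epilogue (an addition of this sketch) covers any leftover. The function name and return shape are hypothetical.

```python
def split_for_alignment(start, count, width):
    """Split addresses [start, start + count) into scalar and vector parts.

    Returns (prologue, chunks, epilogue). The prologue and epilogue are lists
    of single addresses; chunks is a list of aligned groups of `width`
    consecutive addresses. No address appears in more than one part.
    """
    end = start + count
    # Round start up to the next multiple of width, but never past end.
    first_aligned = min(end, -(-start // width) * width)
    # Round end down to a multiple of width, but never before first_aligned.
    last_aligned = max(first_aligned, (end // width) * width)

    prologue = list(range(start, first_aligned))
    chunks = [list(range(a, a + width))
              for a in range(first_aligned, last_aligned, width)]
    epilogue = list(range(last_aligned, end))
    return prologue, chunks, epilogue
```

For example, with `start=3`, `count=10`, and `width=4`, address 3 is peeled off, addresses 4 to 11 form two aligned groups, and address 12 is left for the epilogue.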
Device and method for automatically optimizing composite applications having orchestrated activities

Patent number: 8601454

Abstract: A device (D) is intended for optimizing composite applications comprising at least two orchestrated activities participating in at least one process. This device (D) comprises i) an analyzing means (AM) arranged for determining orchestrated activities contained in a composite application to be optimized and dependencies between these activities, and ii) an optimizing means (OM) arranged for determining a new orchestration between the determined activities which allows the composite application to execute requests of users in a minimal time, according to the determined dependencies and to predefined rules, and for outputting an optimized composite application based on the new orchestration.

Type: Grant

Filed: December 12, 2008

Date of Patent: December 3, 2013

Assignee: Alcatel Lucent

Inventor: Benoit Christophe
Control structure refinement of loops using static analysis

Patent number: 8601459

Abstract: A system and method for discovering a set of possible iteration sequences for a given loop in a software program is described, to transform the loop representation. In a program containing a loop, the loop is partitioned into a plurality of portions based on splitting criteria. Labels are associated with the portions, and an initial loop automaton is constructed that represents the loop iterations as a regular language over the labels corresponding to the portions in the program. Subsequences of the labels are analyzed to determine infeasibility of the subsequences permitted in the automaton. The automaton is refined by removing all infeasible subsequences to discover a set of possible iteration sequences in the loop. The resulting loop automaton is used in a subsequent program verification or analysis technique to find violations of correctness properties in programs.

Type: Grant

Filed: April 9, 2013

Date of Patent: December 3, 2013

Assignee: NEC Laboratories America, Inc.

Inventors: Sriram Sankaranarayanan, Aarti Gupta, Gogul Balakrishnan
Speculative region-level loop optimizations

Patent number: 8589901

Abstract: A system and method are configured to apply region level optimizations to a selected region of source code rather than loop level optimizations to a loop or loop nest. The region may include an outer loop, a plurality of inner loops and at least one control code. If the region includes an exceptional control flow statement and/or a procedure call, speculative region-level multi-versioning may be applied.

Type: Grant

Filed: December 22, 2010

Date of Patent: November 19, 2013

Inventors: Jin Lin, John L. Ng, Robert J. Cox, Xinmin Tian
Macroscalar processor architecture

Patent number: 8578358

Abstract: A macroscalar processor architecture is described herein. In one embodiment, a processor receives instructions of a program loop having a vector block and a sequence block intended to be executed after the vector block, where the processor includes multiple slices and each of the slices is capable of executing an instruction of an iteration of the program loop substantially in parallel. For each iteration of the program loop, the processor executes an instruction of the sequence block using one of the slices while executing instructions of the vector block using a remainder of the slices substantially in parallel. Other methods and apparatuses are also described.

Type: Grant

Filed: November 17, 2011

Date of Patent: November 5, 2013

Assignee: Apple Inc.

Inventor: Jeffry E. Gonion
METHODS TO OPTIMIZE A PROGRAM LOOP VIA VECTOR INSTRUCTIONS USING A SHUFFLE TABLE AND A BLEND TABLE

Publication number: 20130290943

Abstract: According to one embodiment, a code optimizer is configured to receive first code having a program loop implemented with scalar instructions to store values of a first array to a second array based on values of a third array and to generate second code representing the program loop using at least one vector instruction. The second code includes a shuffle instruction to shuffle elements of the first array based on the third array using a shuffle table in a vector manner, a blend instruction to blend the shuffled elements of the first array using a blend table in a vector manner, and a store instruction to store the blended elements of the first array in the second array.

Type: Application

Filed: December 15, 2011

Publication date: October 31, 2013

Applicant: Intel Corporation

Inventors: Tal Uliel, Elmoustapha Ould-Ahmedvall, Bret T. Toll
Creating multiple versions for interior pointers and alignment of an array

Patent number: 8555030

Abstract: A device identifies array accesses of variables in a program code that includes multiple arrays, and identifies array access patterns for one of the array accesses. The device also determines an order of the array access patterns identified for the array accesses, and calculates, based on the order, distances between the array access patterns. The device further shares address calculations amongst the array accesses associated with array access patterns with one or more of the distances that are equivalent.

Type: Grant

Filed: July 14, 2011

Date of Patent: October 8, 2013

Assignee: Advanced Micro Devices, Inc.

Inventors: Tim J. Wilkens, Michael C. Berg
Performing register allocation of program variables based on priority spills and assignments

Patent number: 8555267

Abstract: A mechanism for performing register allocation based on priority spills and assignments is disclosed. A method of embodiments of the invention includes repetitively detecting fat points during a compilation process of a software program running on a virtual machine of a computer system, each fat point representing a program point having a high register pressure, the high register pressure occurs when a number of live program variables of the software program living at a given program point of the software program is greater than a number of available processor registers of the computer system. The method further includes choosing a fat point with a highest register pressure, selecting a live program variable having a lowest priority at the chosen fat point, and spilling the lowest priority live program variable to memory of the computer system.

Type: Grant

Filed: March 3, 2010

Date of Patent: October 8, 2013

Assignee: Red Hat, Inc.

Inventor: Vladimir Makarov
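The fat-point loop in the abstract for patent 8555267 can be sketched as follows. This is a simplified model under stated assumptions: each variable has one contiguous live range given as inclusive `(start, end)` program points, spilling removes the variable from every point, and `priority` is a plain number where lower means spill first. None of these names or representations come from the patent.

```python
def pressure_at(point, live_ranges, spilled):
    """Count the unspilled variables that are live at a program point."""
    return sum(1 for var, (start, end) in live_ranges.items()
               if var not in spilled and start <= point <= end)


def spill_by_priority(live_ranges, priority, num_regs):
    """Repeatedly spill the lowest-priority variable at the fattest point.

    Stops once no program point needs more than num_regs registers and
    returns the spilled variables in the order they were chosen.
    """
    first = min(start for start, _ in live_ranges.values())
    last = max(end for _, end in live_ranges.values())
    spilled = []
    while True:
        pressure = {p: pressure_at(p, live_ranges, spilled)
                    for p in range(first, last + 1)}
        worst = max(pressure, key=pressure.get)   # point with highest pressure
        if pressure[worst] <= num_regs:
            return spilled
        live_here = [v for v, (s, e) in live_ranges.items()
                     if v not in spilled and s <= worst <= e]
        spilled.append(min(live_here, key=priority.get))
```

A production allocator would typically spill only part of a live range; removing the whole variable keeps the sketch short.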
Loop coalescing method and loop coalescing device

Patent number: 8549507

Abstract: A loop coalescing method and a loop coalescing device are disclosed. The loop coalescing method comprises removing an inner-most loop from among nested loops, so that an outer operation provided outside of the inner-most loop is performed when a condition of a conditional statement is satisfied, generating a guard code by applying an if-conversion method to the conditional statement, and converting a guard by using an instruction calculating the guard of the guard code, the instruction calculating the guard using a register where information related to a period of time corresponding to the number of iterations of the inner-most loop is stored.

Type: Grant

Filed: August 22, 2007

Date of Patent: October 1, 2013

Assignee: Samsung Electronics Co., Ltd.

Inventors: Hee Seok Kim, Hong-Seok Kim, Chang-Woo Baek, Jeongwook Kim
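A small sketch may help picture the loop coalescing in patent 8549507: the inner loop is removed, a single flat loop runs over all iterations, and the operation that used to sit outside the inner loop is executed under a guard. Here the guard is derived from the flat iteration counter with `divmod`, which is an assumption of this sketch; the patent describes a dedicated instruction and register for that purpose. The sketch also assumes `cols >= 1`.

```python
def nested(rows, cols):
    """Original form: an inner loop followed by an outer operation."""
    trace = []
    for i in range(rows):
        for j in range(cols):
            trace.append(("inner", i, j))
        trace.append(("outer", i))
    return trace


def coalesced(rows, cols):
    """Coalesced form: one flat loop with a guarded outer operation."""
    trace = []
    for k in range(rows * cols):
        i, j = divmod(k, cols)
        trace.append(("inner", i, j))
        guard = (j == cols - 1)   # true on the last inner iteration of row i
        if guard:
            trace.append(("outer", i))
    return trace
```

Both functions produce the same trace, so the coalesced loop preserves the order of inner and outer operations.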
Mechanism for performing instruction scheduling based on register pressure sensitivity

Patent number: 8549508

Abstract: A mechanism for performing instruction scheduling based on register pressure sensitivity is disclosed. A method of embodiments of the invention includes performing a preliminary register pressure minimization on program points during a compilation process of a software program running on a virtual machine of a computer system. The method further includes calculating a register pressure at each of the program points, detecting an instruction to be scheduled, and performing instruction scheduling of the instruction based on a current register pressure at a current scheduling point and potential register pressures at subsequent scheduling points.

Type: Grant

Filed: March 3, 2010

Date of Patent: October 1, 2013

Assignee: Red Hat, Inc.

Inventor: Vladimir Makarov
Framework for generating mixed-mode operations in loop-level simdization

Patent number: 8549501

Abstract: Generating mixed-mode operations in the compilation of program code for processors having vector or SIMD processing units is disclosed. In a preferred embodiment of the present invention, program instructions making up the body of a loop are abstracted into virtual vector instructions. These virtual vector instructions are treated, for initial code optimization purposes, as vector instructions (i.e., instructions written for the vector unit). The virtual vector instructions are eventually expanded into native code for the target processor, at which time a determination is made for each virtual vector instruction as to whether to expand the virtual vector instruction into native vector instructions, into native scalar instructions, into calls to pre-defined library functions, or into a combination of these. A cost model is used to determine the optimal choice of expansion based on hardware/software constraints, performance costs/benefits, and other criteria.

Type: Grant

Filed: August 16, 2004

Date of Patent: October 1, 2013

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
Compiler, compile method, and processor core control method and processor

Patent number: 8543993

Abstract: A compiler that compiles source code and is implemented in a plurality of processor cores includes a parallel loop processing detection unit configured to detect from the source code a loop processing code for execution of an internal processing operation for a given number of repeating times, and an independent parallel loop processing code in the internal processing operation performed for each repetition to be concurrently processed, and a dynamic parallel conversion unit configured to generate a control core code for control of the number of repeating times in the parallel loop processing code and a parallel processing code for changing the number of repeating times corresponding to the control from the control core code.

Type: Grant

Filed: March 18, 2010

Date of Patent: September 24, 2013

Assignee: Fujitsu Limited

Inventor: Koichiro Yamashita
CONTROL STRUCTURE REFINEMENT OF LOOPS USING STATIC ANALYSIS

Publication number: 20130227537

Abstract: A system and method for discovering a set of possible iteration sequences for a given loop in a software program is described, to transform the loop representation. In a program containing a loop, the loop is partitioned into a plurality of portions based on splitting criteria. Labels are associated with the portions, and an initial loop automaton is constructed that represents the loop iterations as a regular language over the labels corresponding to the portions in the program. Subsequences of the labels are analyzed to determine infeasibility of the subsequences permitted in the automaton. The automaton is refined by removing all infeasible subsequences to discover a set of possible iteration sequences in the loop. The resulting loop automaton is used in a subsequent program verification or analysis technique to find violations of correctness properties in programs.

Type: Application

Filed: April 9, 2013

Publication date: August 29, 2013

Applicant: NEC Laboratories America, Inc.

Inventor: NEC Laboratories America, Inc.
Control structure refinement of loops using static analysis

Patent number: 8522226

Abstract: A system and method for discovering a set of possible iteration sequences for a given loop in a software program is described, to transform the loop representation. In a program containing a loop, the loop is partitioned into a plurality of portions based on splitting criteria. Labels are associated with the portions, and an initial loop automaton is constructed that represents the loop iterations as a regular language over the labels corresponding to the portions in the program. Subsequences of the labels are analyzed to determine infeasibility of the subsequences permitted in the automaton. The automaton is refined by removing all infeasible subsequences to discover a set of possible iteration sequences in the loop. The resulting loop automaton is used in a subsequent program verification or analysis technique to find violations of correctness properties in programs.

Type: Grant

Filed: February 8, 2010

Date of Patent: August 27, 2013

Assignee: NEC Laboratories America, Inc.

Inventors: Sriram Sankaranarayanan, Aarti Gupta, Gogul Balakrishnan
Multiversioning if statement merging and loop fusion

Patent number: 8516468

Abstract: In one embodiment of the invention, a method for fusing a first loop nested in a first IF statement with a second loop nested in a second IF statement without the use of modified and referenced (mod-ref) information to determine if certain conditional statements in the IF statements retain variable values.

Type: Grant

Filed: June 30, 2008

Date of Patent: August 20, 2013

Assignee: Intel Corporation

Inventors: John L. Ng, Robert Cox, Dmitry V. Budanov
Translation of SIMD instructions in a data processing system

Patent number: 8505002

Abstract: A data processing system is provided having a processor and analysing circuitry for identifying a SIMD instruction associated with a first SIMD instruction set and replacing it by a functionally-equivalent scalar representation and marking that functionally-equivalent scalar representation. The marked functionally-equivalent scalar representation is dynamically translated using translation circuitry upon execution of the program to generate one or more corresponding translated instructions corresponding to an instruction set architecture different from the first SIMD architecture corresponding to the identified SIMD instruction.

Type: Grant

Filed: September 27, 2007

Date of Patent: August 6, 2013

Assignees: ARM Limited, The Regents of the University of Michigan

Inventors: Sami Yehia, Krisztian Flautner, Nathan Clark, Amir Hormati, Scott Mahlke
Redundant exception handling code removal

Patent number: 8495606

Abstract: A system performs operations comprising creating a call graph for a program translated from source code, identifying redundant exception handling code in the program utilizing the call graph, and removing the redundant exception handling code. The operation of identifying redundant exception handling code may comprise identifying at least one function or callsite by determining that a first function in the at least one function's or callsite's callee chain throws an exception and that the exception is handled by a second function in the function's or callsite's callee chain or by determining that an exception is not thrown in the at least one function's or callsite's callee chain. The operation of removing the redundant exception handling code may comprise removing redundant exception handling code included in at least one function or callsite and/or removing at least one entry for the at least one function or callsite from an exception lookup table.

Type: Grant

Filed: November 14, 2008

Date of Patent: July 23, 2013

Assignee: Oracle America, Inc.

Inventors: Sheldon M. Lobo, Fu-Hwa Wang
Performing aggressive code optimization with an ability to rollback changes made by the aggressive optimizations

Patent number: 8495607

Abstract: Mechanisms for aggressively optimizing computer code are provided. With these mechanisms, a compiler determines an optimization to apply to a portion of source code and determines if the optimization as applied to the portion of source code will result in unsafe optimized code that introduces a new source of exceptions being generated by the optimized code. In response to a determination that the optimization is an unsafe optimization, the compiler generates an aggressively compiled code version, in which the unsafe optimization is applied, and a conservatively compiled code version in which the unsafe optimization is not applied. The compiler stores both versions and provides them for execution. Mechanisms are provided for switching between these versions during execution in the event of a failure of the aggressively compiled code version. Moreover, predictive mechanisms are provided for predicting whether such a failure is likely.

Type: Grant

Filed: March 1, 2010

Date of Patent: July 23, 2013

Assignee: International Business Machines Corporation

Inventor: Michael K. Gschwind
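The two-version scheme in the abstract for patent 8495607 can be illustrated with a toy case. In this sketch the "unsafe" optimization hoists a division out of a conditional, which introduces an exception the original code would not raise when no element takes the guarded path. The function names and the `try`/`except` switch are assumptions for illustration; the patent's own switching and prediction mechanisms are not reproduced here.

```python
def conservative(values, divisor):
    """Unoptimized version: divides only on the guarded path."""
    out = []
    for v in values:
        if v > 0:
            out.append(v / divisor)
    return out


def aggressive(values, divisor):
    """Optimized version: the reciprocal is hoisted out of the loop.

    The hoisted division can raise even when no value is positive,
    which is a new source of exceptions compared to conservative().
    """
    inv = 1.0 / divisor
    return [v * inv for v in values if v > 0]


def run_with_rollback(values, divisor):
    """Try the aggressive version and fall back if it fails."""
    try:
        return aggressive(values, divisor), "aggressive"
    except ZeroDivisionError:
        return conservative(values, divisor), "conservative"
```

With `values=[-1, -2]` and `divisor=0`, the aggressive version fails on the hoisted division while the conservative version returns an empty list, so the rollback path is taken.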
Efficient program instrumentation

Patent number: 8484623

Abstract: A method for determining the number and location of instrumentation probes to be inserted into a program is disclosed. The method advantageously inserts the minimum number of probes that are required to obtain execution coverage for every node in the program's control-flow graph. In addition, the method requires only one type of node marking and one bit to store each probe, and does not require the assignment of weights to arcs or nodes of the control-flow graph. In the illustrative embodiment, the nodes of a control-flow graph are partitioned into non-empty sets, where each non-empty set corresponds to a super nested block of the program.

Type: Grant

Filed: September 29, 2008

Date of Patent: July 9, 2013

Assignee: Avaya, Inc.

Inventors: Juan Jenny Li, David Mandel Weiss
Method and system for utilizing parallelism across loops

Patent number: 8479185

Abstract: A method for compiling application source code that includes selecting multiple loops for parallelization. The multiple loops include a first loop and a second loop. The method further includes partitioning the first loop into a first set of chunks, partitioning the second loop into a second set of chunks, and calculating data dependencies between the first set of chunks and the second set of chunks. A first chunk of the second set of chunks is dependent on a first chunk of the first set of chunks. The method further includes inserting, into the first loop and prior to completing compilation, a precedent synchronization instruction for execution when execution of the first chunk of the first set of chunks completes, and completing the compilation of the application source code to create an application compiled code.

Type: Grant

Filed: December 9, 2010

Date of Patent: July 2, 2013

Assignee: Oracle International Corporation

Inventors: Spiros Kalogeropulos, Partha P. Tirumalai
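The chunk-level dependency step in the abstract for patent 8479185 might look like the sketch below. It assumes iteration `i` of the first loop writes element `i`, and that a caller-supplied function `reads(i)` lists the elements iteration `i` of the second loop reads. Those conventions, and all names, are hypothetical.

```python
def chunk(n, size):
    """Partition iterations 0..n-1 into contiguous chunks of at most `size`."""
    return [range(lo, min(lo + size, n)) for lo in range(0, n, size)]


def chunk_dependencies(producer_chunks, consumer_chunks, reads):
    """Map each consumer chunk index to the producer chunks it depends on."""
    # Which producer chunk writes each element.
    owner = {i: c for c, ch in enumerate(producer_chunks) for i in ch}
    deps = {}
    for c, ch in enumerate(consumer_chunks):
        deps[c] = sorted({owner[j] for i in ch for j in reads(i) if j in owner})
    return deps
```

If the second loop reads elements `i - 1` and `i`, its first chunk depends only on the first producer chunk and can start as soon as that chunk signals completion, without waiting for the whole first loop.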
Compiling method, compiling apparatus and computer system for a loop in a program

Patent number: 8479179

Abstract: A method for compiling a program including a loop is provided. In the program, the loop includes K instructions (K>2) and repeats for M times (M>2). The compiling method comprises following steps: performing resource conflict analysis to the K instructions in the loop; dividing the K instructions in the loop into a first combined instruction section, a connection instruction section and a second combined instruction section, wherein there is no resource conflict between the instructions in the first combined instruction section and the instructions in the second combined instruction section respectively; and compiling the program, wherein the instructions in the first combined instruction section in the cycle N (N=2, 3, ... M) and the instructions in the second combined instruction section in the cycle N-1 are combined to be compiled respectively. A compiling apparatus and a computer system for realizing the above-mentioned compiling method are further provided.

Type: Grant

Filed: December 7, 2005

Date of Patent: July 2, 2013

Assignee: St-Ericsson SA

Inventors: Fan Wu, Yanmeng Sun
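The overlap described in the abstract for patent 8479179, where the first section of one iteration is combined with the second section of the previous iteration, is a form of software pipelining. The sketch below shows the reordering on a three-step loop body; the arithmetic in each section is arbitrary and chosen only so the two versions can be compared.

```python
def plain_loop(data):
    """Each iteration runs first, connection, and second sections in order."""
    out = []
    for x in data:
        a = x + 1            # first section
        b = a * 2            # connection section
        out.append(b - 3)    # second section
    return out


def pipelined_loop(data):
    """First section of the current iteration is paired with the second
    section of the previous iteration."""
    out = []
    pending = None                    # connection result awaiting its second section
    for x in data:
        a = x + 1                     # first section, current iteration
        if pending is not None:
            out.append(pending - 3)   # second section, previous iteration
        pending = a * 2               # connection section, current iteration
    if pending is not None:
        out.append(pending - 3)       # epilogue drains the last iteration
    return out
```

In Python the paired statements still run one after another; on hardware with independent resources they could be issued together, which is the point of the resource conflict analysis.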
Data Prefetching and Coalescing for Partitioned Global Address Space Languages

Publication number: 20130167130

Abstract: An illustrative embodiment of a computer-implemented process for shared data prefetching and coalescing optimization versions a loop containing one or more shared references into an optimized loop and an un-optimized loop, transforms the optimized loop into a set of loops, and stores shared access associated information of the loop using a prologue loop in the set of loops. The shared access associated information pertains to remote data and is collected using the prologue loop in absence of network communication and builds a hash table. An associated data structure is updated each time the hash table is entered, and is sorted to remove duplicate entries and create a reduced data structure. Patterns across entries of the reduced data structure are identified and entries are coalesced. Data associated with a coalesced entry is pre-fetched using a single communication and a local buffer is populated with the fetched data for reuse.

Type: Application

Filed: October 24, 2012

Publication date: June 27, 2013

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor: International Business Machines Corporation
Method, system and program product for optimizing emulation of a suspected malware

Patent number: 8473931

Abstract: A method, system and program product for optimizing emulation of a suspected malware. The method includes identifying, using an emulation optimizer tool, whether an instruction in a suspected malware being emulated by an emulation engine in a virtual environment signifies a long loop and, if so, generating a first hash for the loop. Further, the method includes ascertaining whether the first hash generated matches any long loop entries in a storage and, if so calculating a second hash for the long loop. Furthermore, the method includes inspecting any long loop entries ascertained to find an entry having a respective second hash matching the second hash calculated. If an entry matching the second hash calculated is found, the method further includes updating one or more states of the emulation engine, such that, execution of the long loop of the suspected malware is skipped, which optimizes emulation of the suspected malware.

Type: Grant

Filed: March 20, 2012

Date of Patent: June 25, 2013

Assignee: International Business Machines Corporation

Inventor: Ji Yan Wu
Parallelization of irregular reductions via parallel building and exploitation of conflict-free units of work at runtime

Patent number: 8468508

Abstract: An optimizing compiler device, a method, a computer program product which are capable of performing parallelization of irregular reductions. The method for performing parallelization of irregular reductions includes receiving, at a compiler, a program and selecting, at compile time, at least one unit of work (UW) from the program, each UW configured to operate on at least one reduction operation, where at least one reduction operation in the UW operates on a reduction variable whose address is determinable when running the program at a run-time. At run time, for each successive current UW, a list of reduction operations accessed by that unit of work is recorded. Further, it is determined at run time whether reduction operations accessed by a current UW conflict with any reduction operations recorded as having been accessed by prior selected units of work, and assigning the unit of work as a conflict free unit of work (CFUW) when no conflicts are found.

Type: Grant

Filed: October 9, 2009

Date of Patent: June 18, 2013

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, Yangchun Luo, John K. O'Brien, Xiaotong Zhuang
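The run-time conflict check in the abstract for patent 8468508 can be approximated with a greedy pass: each unit of work is described by the set of reduction addresses it touches, and a new group is started whenever a unit overlaps with addresses already recorded. This greedy grouping, and the representation of a unit as a Python set, are simplifying assumptions of the sketch rather than the patented method.

```python
def build_conflict_free_groups(units):
    """Group consecutive units of work so that no two units in a group
    touch the same reduction address.

    `units` is a list of sets of addresses; the result is a list of groups,
    each a list of unit indices that could safely run in parallel.
    """
    groups = []
    current, touched = [], set()
    for idx, addrs in enumerate(units):
        if touched & addrs:
            # Conflict with a previously recorded unit: close the group.
            groups.append(current)
            current, touched = [], set()
        current.append(idx)
        touched |= addrs
    if current:
        groups.append(current)
    return groups
```

Units within one group update disjoint reduction variables, so they need no synchronization among themselves.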
Vector atomic memory operation vector update system and method

Patent number: 8458685

Abstract: A system and method of compiling program code, wherein the program code includes an operation on an array of data elements stored in memory of a computer system. The program code is scanned for an equation which may have recurring data points. The equation is then replaced with vectorized machine executable code, wherein the machine executable code comprises a nested loop and wherein the nested loop comprises an exterior loop and a virtual interior loop. The exterior loop decomposes the equation into a plurality of loops of length N, wherein N is an integer greater than one. The virtual interior loop executes vector operations corresponding to the N length loop to form a result vector resident in memory, wherein the virtual interior loop includes a vector atomic memory operation (AMO) instruction.

Type: Grant

Filed: June 12, 2009

Date of Patent: June 4, 2013

Assignee: Cray Inc.

Inventor: Terry D. Greyzck
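A histogram update is a common example of an equation with recurring data points, which is the case the abstract for patent 8458685 addresses. The sketch below strip-mines the loop into strips of length `n` and applies each strip through `atomic_add_vector`, a hypothetical stand-in for a vector atomic memory operation; it is emulated here by a sequential loop so that repeated indices within one strip still accumulate correctly.

```python
def histogram_scalar(indices, bins):
    """Reference version: one update per element."""
    counts = [0] * bins
    for i in indices:
        counts[i] += 1
    return counts


def atomic_add_vector(memory, index_vec, value_vec):
    """Emulated vector atomic add: every lane's update is applied, so
    duplicate indices within the vector are not lost."""
    for i, v in zip(index_vec, value_vec):
        memory[i] += v


def histogram_strip_mined(indices, bins, n=4):
    """Exterior loop over strips of length n; each strip is one vector op."""
    counts = [0] * bins
    for lo in range(0, len(indices), n):
        strip = indices[lo:lo + n]
        atomic_add_vector(counts, strip, [1] * len(strip))
    return counts
```

A plain vector scatter would lose updates when a strip contains the same index twice, which is why an atomic operation is needed in this setting.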
Conversion of a class oriented data flow program to a structure oriented data flow program with dynamic interpretation of data types

Patent number: 8458682

Abstract: System and method for converting a class oriented data flow program to a structure oriented data flow program. A first data flow program is received, where the first data flow program is an object oriented program comprising instances of one or more classes, and wherein the first data flow program is executable to perform a first function. The first data flow program is automatically converted to a second data flow program, where the second data flow program does not include the instances of the one or more classes, and where the second data flow program is executable to perform the first function. The second data flow program is stored on a computer memory, where the second data flow program is configured to be deployed to a device, e.g., a programmable hardware element, and where the second data flow program is executable on the device to perform the first function.

Type: Grant

Filed: April 27, 2009

Date of Patent: June 4, 2013

Assignee: National Instruments Corporation

Inventors: Stephen R. Mercer, Akash B. Bhakta, Matthew E. Novacek
Computation reuse for loops with irregular accesses

Patent number: 8453135

Abstract: A compiler selects a nested loop within software code that includes an outer loop and an inner loop. The outer loop includes an outer induction variable and the inner loop includes an inner induction variable. The compiler identifies a computation included in the nested loop that generates an irregular array access, which includes an expression of both the outer induction variable and the inner induction variable. Next, the compiler identifies a redundant calculation for the computation based upon the outer induction variable and the inner induction variable, and generates a temporary variable to correspond with the redundant calculation. The compiler replaces the computation with the temporary variable in the nested loop and, in turn, compiles the nested loop with the included temporary variable.

Type: Grant

Filed: March 11, 2010

Date of Patent: May 28, 2013

Assignee: Freescale Semiconductor, Inc.

Inventor: Abderrazek Zaafrani
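As a minimal illustration of the computation reuse in patent 8453135, consider a nested loop whose body depends only on `i + j`. That sum takes just `n + m - 1` distinct values, so the results can be computed once into a temporary and reused. The function `expensive` and the choice of `i + j` as the shared expression are assumptions of this sketch, which also assumes `n >= 1` and `m >= 1`.

```python
def expensive(k):
    """Placeholder for a costly computation that depends only on k."""
    return k * k + 1


def direct(n, m):
    """Recomputes expensive(i + j) for every (i, j) pair."""
    total, calls = 0, 0
    for i in range(n):
        for j in range(m):
            total += expensive(i + j)
            calls += 1
    return total, calls


def with_reuse(n, m):
    """Computes each distinct value of expensive(i + j) once."""
    temp = [expensive(k) for k in range(n + m - 1)]   # temporary variable
    total = 0
    for i in range(n):
        for j in range(m):
            total += temp[i + j]
    return total, len(temp)
```

For `n=4` and `m=5` the direct version evaluates `expensive` 20 times and the reuse version 8 times, with the same total.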
Improving data locality and parallelism by code replication

Patent number: 8453134

Abstract: Provided are a method, system, and article of manufacture improving data locality and parallelism by code replication and array contraction. Source code including an array of elements referenced using at least two indices is processed. The array is nested within multiple loops, wherein at least two of the loops perform iterations with respect to the indices of the array, wherein the index incremented in at least one innermost loop of the loops does not comprise a leftmost index in the array. The source code is transformed to object code by performing operations including fusing at least two innermost loops of the loops in object code generated by compiling the source code by replicating statements from at least one of the innermost loops into a fused innermost loop and performing loop interchange in the object code to have the fused innermost loop provide iterations with respect to the leftmost index in the array.

Type: Grant

Filed: June 4, 2008

Date of Patent: May 28, 2013

Assignee: Intel Corporation

Inventors: John L. Ng, Alexander Y. Ostanevich, Alexander L. Sushentsov
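
Illustrative sketch for patent 8453134: the abstract combines fusing innermost loops with loop interchange so that the fused innermost loop iterates over the leftmost array index. The Python below shows only that loop restructuring; statement replication and array contraction are not modeled. The function names and the `src` input are hypothetical. In column-major storage (as in Fortran) the leftmost subscript is the contiguous one; the nested Python lists here only demonstrate the loop shape, not the memory layout.

```python
def original(src):
    """Two separate innermost loops that step the rightmost subscript j."""
    n, m = len(src), len(src[0])
    a = [[0] * m for _ in range(n)]
    b = [[0] * m for _ in range(n)]
    for i in range(n):                 # i is the leftmost subscript
        for j in range(m):
            a[i][j] = src[i][j] + 1
        for j in range(m):
            b[i][j] = a[i][j] * 2
    return a, b


def fused_and_interchanged(src):
    """One fused innermost loop that now steps the leftmost subscript i."""
    n, m = len(src), len(src[0])
    a = [[0] * m for _ in range(n)]
    b = [[0] * m for _ in range(n)]
    for j in range(m):
        for i in range(n):             # fused body, leftmost index innermost
            a[i][j] = src[i][j] + 1
            b[i][j] = a[i][j] * 2
    return a, b
```

The transformation is legal in this sketch because each `b[i][j]` depends only on `a[i][j]` from the same iteration.
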
Method and system to perform load balancing of a task-based multi-threaded application

Patent number: 8453156

Abstract: A method and system to balance the load of a task-based multi-threaded application on a platform. When the work required by the multi-threaded application is represented as a task with a computational requirement that is proportional to the amount of the work, embodiments of the invention control the recursive binary task division of the task using auxiliary partitions to create subtasks of balanced loads to enhance resource utilization and to improve application performance. The task is binary partitioned recursively into a plurality of subtasks until the plurality of subtasks is equal to the plurality of resources available on the platform to execute the subtasks.

Type: Grant

Filed: March 30, 2009

Date of Patent: May 28, 2013

Assignee: Intel Corporation

Inventors: Wooyoung Kim, Michael Joseph Voss
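
Illustrative sketch for patent 8453156: the abstract describes recursive binary division of a task until the number of subtasks equals the number of available resources. The Python below is one plausible way to do that for a range of work items; the patent's "auxiliary partitions" are not specified in the abstract, so the proportional split used here is an assumption, and `split_task` is a hypothetical name.

```python
def split_task(lo, hi, workers):
    """Recursively halve the range [lo, hi) until there is one subtask per worker."""
    if workers == 1:
        return [(lo, hi)]
    left_workers = workers // 2
    right_workers = workers - left_workers
    # Assumption: give each side work in proportion to the workers it will feed,
    # so an odd worker count still yields roughly equal subtasks.
    mid = lo + (hi - lo) * left_workers // workers
    return split_task(lo, mid, left_workers) + split_task(mid, hi, right_workers)
```

For example, `split_task(0, 100, 3)` yields three contiguous ranges of 33, 33, and 34 items.
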
UNIFIED PARALLEL C WORK-SHARING LOOP CONSTRUCT TRANSFORMATION

Publication number: 20130125105

Abstract: Control flow information and data flow information associated with a program containing a upc_forall loop are built. A shared reference map data structure using the control flow information and the data flow information is created. All local shared accesses are hashed to facilitate a constant access stride after being rewritten. All local shared references in a hash entry having a longest list are privatized. The upc_forall loop is rewritten into a for loop. Responsive to a determination that an unprocessed upc_forall loop does not exist, dead store elimination is run. The control flow information and the data flow information associated with the program containing the for loop is rebuilt.

Type: Application

Filed: November 15, 2011

Publication date: May 16, 2013

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Yaoqing Gao, Liangxiao Hu, Raul Esteban Silvera, Ettore Tiotto
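
Illustrative sketch for publication 20130125105: the abstract mentions rewriting a `upc_forall` loop into a plain `for` loop. The Python below models only that step, assuming the simplest case of an integer affinity expression equal to the loop index, where a thread executes iteration `i` when `i % THREADS == MYTHREAD`. The shared reference map, hashing, and privatization steps are not shown, and the function names are hypothetical.

```python
def forall_iterations(n, threads, my_thread):
    """upc_forall-style view: scan every i, keep those with affinity to this thread."""
    return [i for i in range(n) if i % threads == my_thread]


def for_iterations(n, threads, my_thread):
    """Rewritten plain loop: start at my_thread and stride by the thread count."""
    return list(range(my_thread, n, threads))
```

Each thread visits the same iterations in both forms, but the rewritten loop avoids the per-iteration affinity test.
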
REDUCING BRANCH MISPREDICTION IMPACT IN NESTED LOOP CODE

Publication number: 20130125104

Abstract: According to one aspect of the present disclosure, a method and technique for reducing branch misprediction impact for nested loop code is disclosed. The method includes: responsive to identifying code having an outer loop and an inner loop, determining a quantity of iterations of the inner loop for an initial number of iterations of the outer loop; determining a number of processor cycles for executing the quantity of iterations of the inner loop for the initial number of iterations of the outer loop; determining whether the number of processor cycles is less than a threshold; and responsive to determining that the number of processor cycles is less than the threshold, fully unrolling the inner loop for the initial number of iterations of the outer loop.

Type: Application

Filed: November 11, 2011

Publication date: May 16, 2013

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Madhavi G. Valluri, Steven W. White
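
Illustrative sketch for publication 20130125104: the abstract describes a threshold test that decides whether to fully unroll the inner loop for the first few outer iterations. The Python below models only that decision. The fixed `cycles_per_iteration` cost model and the name `should_fully_unroll` are assumptions; the abstract does not say how processor cycles are determined.

```python
def should_fully_unroll(inner_trip_counts, cycles_per_iteration, cycle_threshold):
    """Decide whether to fully unroll the inner loop for the initial outer iterations.

    inner_trip_counts holds the inner-loop iteration count for each of the
    initial outer iterations being considered.
    """
    total_iterations = sum(inner_trip_counts)
    estimated_cycles = total_iterations * cycles_per_iteration  # assumed cost model
    return estimated_cycles < cycle_threshold
```

For a triangular nest such as `for i in range(n): for j in range(i)`, the first four outer iterations have inner trip counts `[0, 1, 2, 3]`.
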
Methods for identifying gating opportunities from a high-level language program and generating a hardware definition

Patent number: 8443344

Abstract: Approaches for generating a hardware definition from a program specified in a high-level language. In one approach, a first set of blocks of instructions in the high-level language program is identified. Each block in the first set is bounded by a respective loop designation in the high-level language. For each block in the first set, an associated respective second set of one or more blocks of the program is identified. Each block in the second set is outside the block in the first set. A hardware definition of the program is generated and stored. For each block in the first set, the hardware definition specifies power-reducing circuitry for one or more blocks in the associated second set. The power-reducing circuitry is controlled based on a status indication from the hardware definition of the block in the first set.

Type: Grant

Filed: September 25, 2008

Date of Patent: May 14, 2013

Assignee: Xilinx, Inc.

Inventors: Prasanna Sundararajan, Tim Tuan
Parallel loops in a workflow

Patent number: 8443351

Abstract: The subject disclosure pertains broadly to parallelization of workflow loops. More specifically, loop containers and related elements are cloned several times to match a desired number of parallel iterations or threads. The cloned containers are communicatively coupled or connected to a single enumerator component and can interact therewith to facilitate acquisition of collection elements. This arrangement, among other things, ensures that the correct number of iterations are executed as if the loop was processed sequentially.

Type: Grant

Filed: February 23, 2006

Date of Patent: May 14, 2013

Assignee: Microsoft Corporation

Inventors: J. Kirk Haselden, Sergei Ivanov
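
Illustrative sketch for patent 8443351: the abstract describes cloned loop containers that all draw from a single enumerator so that each collection element is processed exactly once. The Python below models that arrangement with threads sharing one iterator. `run_parallel_loop`, `container`, and the lock-based coordination are assumptions made for illustration, not the workflow engine's actual design.

```python
import threading


def run_parallel_loop(items, body, clones):
    """Run body(item) once per item using `clones` workers that share one enumerator."""
    enumerator = iter(items)           # the single shared enumerator
    lock = threading.Lock()
    results = []

    def container():
        while True:
            with lock:                 # only one clone advances the enumerator at a time
                try:
                    item = next(enumerator)
                except StopIteration:
                    return
            value = body(item)         # loop body runs outside the lock
            with lock:
                results.append(value)

    workers = [threading.Thread(target=container) for _ in range(clones)]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results
```

The order of `results` is not deterministic, but the number of iterations matches a sequential run.
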
DEMAND-DRIVEN ALGORITHM TO REDUCE SIGN-EXTENSION INSTRUCTIONS INCLUDED IN LOOPS OF A 64-BIT COMPUTER PROGRAM

Publication number: 20130117737

Abstract: One embodiment of the present invention sets forth a technique for reducing sign-extension instructions (SEIs) included in a computer program. The technique involves receiving intermediate code that is associated with the computer program and includes a first SEI that is included in a loop structure within the computer program, determining that the first SEI is eligible to be moved outside of the loop structure, inserting into a preheader of the loop a second SEI that, when executed by a processor, promotes an original value targeted by the first SEI from a smaller type to a larger type, and replacing the first SEI with one or more intermediate instructions that are eligible for additional compiler optimizations.

Type: Application

Filed: October 26, 2012

Publication date: May 9, 2013

Applicant: NVIDIA CORPORATION

Inventor: NVIDIA Corporation
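
Illustrative sketch for publication 20130117737: the abstract describes moving a sign-extension instruction (SEI) from a loop body into the loop preheader. The Python below simulates that with explicit 32-bit masking. `sext32`, the address arithmetic, and both loop functions are hypothetical, and the two loops agree only under the assumption that the 32-bit counter never wraps past 0x7FFFFFFF during the loop.

```python
MASK32 = 0xFFFFFFFF


def sext32(value):
    """Interpret the low 32 bits of value as signed (models one SEI)."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def loop_with_sext(base, start, count):
    """Each iteration sign-extends the 32-bit induction variable."""
    addresses = []
    i32 = start & MASK32
    for _ in range(count):
        addresses.append(base + sext32(i32))   # SEI inside the loop
        i32 = (i32 + 1) & MASK32
    return addresses


def loop_with_hoisted_sext(base, start, count):
    """The extension runs once in the preheader; the loop uses a wide counter."""
    i64 = sext32(start)                        # SEI moved to the preheader
    addresses = []
    for _ in range(count):
        addresses.append(base + i64)
        i64 += 1
    return addresses
```
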
Efficient compilation and execution of imperative-query languages

Patent number: 8434076

Abstract: A system which combines sequential and iterative source code is provided. The system decides which type of processing would be most suitable for all portions of the source code, regardless of type. The system can adjust that decision based on the specific nature of the constructs within the source code, and can also adjust that decision based on the platform upon which the resulting executable program will run.

Type: Grant

Filed: December 12, 2007

Date of Patent: April 30, 2013

Assignee: Oracle International Corporation

Inventors: Anguel Novoselsky, Zhen Hua Liu
Digital data processing method and system

Patent number: 8429625

Abstract: A method and system for processing generic formatted data, including first data describing a sequence of generic operations without any loops, in view of providing specific formatted data, for a determined platform including Q processor(s) and at least one memory, the platform configured to process, according, directly or indirectly, to specific formatted data, an object made up of elementary information of same type, each elementary information being represented by at least one numerical value.

Type: Grant

Filed: December 19, 2006

Date of Patent: April 23, 2013

Assignee: DXO Labs

Inventor: Bruno Liege
Two-stage commit (TSC) region for dynamic binary optimization in X86

Patent number: 8418156

Abstract: Generally, the present disclosure provides systems and methods to generate a two-stage commit (TSC) region which has two separate commit stages. Frequently executed code may be identified and combined for the TSC region. Binary optimization operations may be performed on the TSC region to enable the code to run more efficiently by, for example, reordering load and store instructions. In the first stage, load operations in the region may be committed atomically and in the second stage, store operations in the region may be committed atomically.

Type: Grant

Filed: December 16, 2009

Date of Patent: April 9, 2013

Assignee: Intel Corporation

Inventors: Cheng Wang, Youfeng Wu
Fine-grained software-directed data prefetching using integrated high-level and low-level code analysis optimizations

Patent number: 8413127

Abstract: A mechanism for minimizing effective memory latency without unnecessary cost through fine-grained software-directed data prefetching using integrated high-level and low-level code analysis and optimizations is provided. The mechanism identifies and classifies streams, identifies data that is most likely to incur a cache miss, exploits effective hardware prefetching to determine the proper number of streams to be prefetched, exploits effective data prefetching on different types of streams in order to eliminate redundant prefetching and avoid cache pollution, and uses high-level transformations with integrated lower level cost analysis in the instruction scheduler to schedule prefetch instructions effectively.

Type: Grant

Filed: December 22, 2009

Date of Patent: April 2, 2013

Assignee: International Business Machines Corporation

Inventors: Roch G. Archambault, Robert J. Blainey, Yaoqing Gao, Allan R. Martin, James L. McInnes, Francis Patrick O'Connell
Macroscalar processor architecture

Patent number: 8412914

Abstract: A method for aggregating a program loop in a Macroscalar architecture includes identifying one or more instructions of the program loop having a branch instruction that causes the program loop to branch dependent upon a predicate condition after a memory write operation. The method also includes modifying at least one of the one or more instructions to cause a processor executing the one or more instructions to branch after the memory write operation executed as a vector block for iterations prior to and including an iteration during which the predicate condition is satisfied.

Type: Grant

Filed: November 17, 2011

Date of Patent: April 2, 2013

Assignee: Apple Inc.

Inventor: Jeffry E. Gonion
Parallelizing sequential frameworks using transactions

Patent number: 8402447

Abstract: Various technologies and techniques are disclosed for transforming a sequential loop into a parallel loop for use with a transactional memory system. Open ended and/or closed ended sequential loops can be transformed to parallel loops. For example, a section of code containing an original sequential loop is analyzed to determine a fixed number of iterations for the original sequential loop. The original sequential loop is transformed into a parallel loop that can generate transactions in an amount up to the fixed number of iterations. As another example, an open ended sequential loop can be transformed into a parallel loop that generates a separate transaction containing a respective work item for each iteration of a speculation pipeline. The parallel loop is then executed using the transactional memory system, with at least some of the separate transactions being executed on different threads.

Type: Grant

Filed: July 25, 2011

Date of Patent: March 19, 2013

Assignee: Microsoft Corporation

Inventors: John Joseph Duffy, Jan Gray, Yosseff Levanoni
Map transformation in data parallel code

Patent number: 8402450

Abstract: A high level programming language provides a map transformation that takes a data parallel algorithm and a set of one or more input indexable types as arguments. The map transformation applies the data parallel algorithm to the set of input indexable types to generate an output indexable type, and returns the output indexable type. The map transformation may be used to fuse one or more data parallel algorithms with another data parallel algorithm.

Type: Grant

Filed: November 17, 2010

Date of Patent: March 19, 2013

Assignee: Microsoft Corporation

Inventors: Paul F. Ringseth, Yosseff Levanoni, Weirong Zhu
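
Illustrative sketch for patent 8402450: the abstract describes a map transformation that applies a data parallel algorithm to indexable inputs and returns an indexable output, which allows algorithms to be fused. The Python below is a minimal model of that idea using a lazy indexable view. `Mapped` and `map_transform` are hypothetical names and do not correspond to any real library API.

```python
class Mapped:
    """Read-only indexable view whose elements are computed on demand."""

    def __init__(self, kernel, *inputs):
        self.kernel = kernel
        self.inputs = inputs

    def __len__(self):
        return min(len(x) for x in self.inputs)

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        # Apply the kernel to the i-th element of every input.
        return self.kernel(*(x[i] for x in self.inputs))


def map_transform(kernel, *inputs):
    """Return an indexable output without materializing intermediate arrays."""
    return Mapped(kernel, *inputs)
```

Because the result is itself indexable, one map can feed another, for example `map_transform(lambda s: s * 2, map_transform(lambda a, b: a + b, xs, ys))`, and each output element is computed in a single fused pass.
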
Method and system for execution profiling using loop count variance

Patent number: 8387036

Abstract: A method for executing a computer program involving obtaining a statement of the source code, where the statement comprises a method call, and where the source code is composed in a statically-typed programming language. The method also involves, upon entry into a loop included in the computer program: incrementing an entry counter by one; and, for each iteration of the loop, incrementing an iteration counter by one, incrementing a local counter by one to obtain an incremented value of the local counter, incrementing a summation variable by the incremented value of the local counter, and executing the iteration of the loop.

Type: Grant

Filed: January 27, 2010

Date of Patent: February 26, 2013

Assignee: Oracle America, Inc.

Inventor: John Rose
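
Illustrative sketch for patent 8387036: the abstract lists four counters updated around a loop. The Python below maintains them and shows one way they could yield the mean and variance of the loop's trip count. The abstract does not state the variance formula, so the derivation in `variance` is an inference, as is the assumption that the local counter resets on each loop entry. `LoopProfile` is a hypothetical name.

```python
class LoopProfile:
    def __init__(self):
        self.entries = 0       # entry counter
        self.iterations = 0    # iteration counter
        self.summation = 0     # summation variable

    def run(self, trip_count, body=lambda: None):
        self.entries += 1
        local = 0              # local counter (assumed to reset per entry)
        for _ in range(trip_count):
            self.iterations += 1
            local += 1
            self.summation += local
            body()

    def mean(self):
        return self.iterations / self.entries

    def variance(self):
        # An entry with n iterations adds 1 + 2 + ... + n = n(n+1)/2 to summation,
        # so the sum of n^2 over all entries is 2*summation - iterations.
        sum_of_squares = 2 * self.summation - self.iterations
        return sum_of_squares / self.entries - self.mean() ** 2
```
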
Auto parallelization of zero-trip loops through the induction variable substitution

Patent number: 8375375

Abstract: A method and system of auto parallelization of zero-trip loops that substitutes a nested basic linear induction variable by exploiting a parallelizing compiler is provided. A max{0,N} variable is used for the loop iteration count when no information is known about the value of N, for a typical loop iterating from 1 to N, in which N is loop invariant. For the nested basic induction variables, an induction variable substitution process is applied to the nested loops starting from the innermost loop to the outermost one. The max operator is then removed through a copy propagation pass of the IBM compiler. In doing so, the loop dependency on the induction variable is eliminated and an opportunity for a parallelizing compiler to parallelize the outermost loop is provided.

Type: Grant

Filed: January 21, 2009

Date of Patent: February 12, 2013

Assignee: International Business Machines Corporation

Inventors: Zhixing Ren, Raul Esteban Silvera, Guansong Zhang
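
Illustrative sketch for patent 8375375: the abstract describes substituting a basic linear induction variable and using max{0,N} as the iteration count so that zero-trip loops stay correct. The Python below shows this for a single, non-nested loop only. The function names and the `step` parameter are hypothetical, and the later removal of the max operator by copy propagation is not modeled.

```python
def sequential(k0, step, n):
    """Loop from 1 to n that carries a basic linear induction variable k."""
    k = k0
    visited = []
    for _ in range(1, n + 1):
        k += step               # loop-carried dependence on k
        visited.append(k)
    return visited, k


def substituted(k0, step, n):
    """Closed form: each iteration computes k from i alone."""
    visited = [k0 + step * i for i in range(1, n + 1)]
    k_final = k0 + step * max(0, n)   # max guards the zero-trip case (n <= 0)
    return visited, k_final
```

Without the `max`, a negative `n` would give a wrong final value of `k` even though the loop body never runs.
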
Runtime profitability control for speculative automatic parallelization

Patent number: 8359587

Abstract: A compilation method and mechanism for parallelizing program code. A method for compilation includes analyzing source code and identifying candidate code for parallelization. The method includes parallelizing the candidate code, in response to determining said profitability meets a predetermined criteria; and generating object code corresponding to the source code. The generated object code includes both a non-parallelized version of the candidate code and a parallelized version of the candidate code. During execution of the object code, a dynamic selection between execution of the non-parallelized version of the candidate code and the parallelized version of the candidate code is made. Changing execution from said parallelized version of the candidate code to the non-parallelized version of the candidate code may be in response to determining a transaction failure count meets a predetermined threshold.

Type: Grant

Filed: May 1, 2008

Date of Patent: January 22, 2013

Assignee: Oracle America, Inc.

Inventors: Yonghong Song, Spiros Kalogeropulos, Partha P. Tirumalai
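
Illustrative sketch for patent 8359587: the abstract describes object code that carries both a parallelized and a non-parallelized version of the candidate code, with a runtime switch to the non-parallelized version once a transaction failure count meets a threshold. The Python below models only that switch. `VersionSelector` is a hypothetical name, and the convention that the parallel version returns a `(succeeded, result)` pair is an assumption standing in for a real transaction failure signal.

```python
class VersionSelector:
    def __init__(self, parallel_version, sequential_version, failure_threshold):
        self.parallel_version = parallel_version
        self.sequential_version = sequential_version
        self.failure_threshold = failure_threshold
        self.failures = 0
        self.use_parallel = True

    def run(self, *args):
        if self.use_parallel:
            succeeded, result = self.parallel_version(*args)
            if succeeded:
                return result
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.use_parallel = False   # stop selecting the parallel version
        # Fall back to (or stay on) the non-parallelized version.
        return self.sequential_version(*args)
```
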
VECTORIZATION OF PROGRAM CODE

Publication number: 20120331453

Abstract: A method for vectorization of a block of code is provided. The method comprises receiving a first block of code as input; and converting the first block of code into at least a second block of code and a third block of code. The first block of code accesses a first set of memory addresses that are potentially misaligned. The second block of code performs conditional leaping address incrementation to selectively access a first subset of the first set of memory addresses. The third block of code accesses a second subset of the first set of memory addresses starting from an aligned memory address, simultaneously accessing multiple memory addresses at a time. No memory address belongs to both the first subset and the second subset of memory addresses.

Type: Application

Filed: September 7, 2012

Publication date: December 27, 2012

Applicant: INTERNATIONAL BUSINESS MACHINES

Inventors: Dorit Nuzman, Ira Rosen, Ayal Zaks
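
Illustrative sketch for publication 20120331453: the abstract splits potentially misaligned accesses into disjoint subsets, one of which starts at an aligned address and is accessed several addresses at a time. The Python below shows conventional loop peeling for alignment as a related, simpler technique. It does not reproduce the "conditional leaping address incrementation" named in the abstract, and `split_for_alignment` and its parameters are hypothetical.

```python
def split_for_alignment(start, count, width):
    """Split addresses [start, start + count) into scalar and aligned vector parts.

    Returns (prologue, chunks, epilogue): prologue and epilogue are scalar
    accesses, and each chunk is `width` consecutive addresses starting at a
    multiple of `width`.
    """
    end = start + count
    # Round start up to the next multiple of width, without passing end.
    first_aligned = min(end, -(-start // width) * width)
    last_aligned = first_aligned + (end - first_aligned) // width * width
    prologue = list(range(start, first_aligned))
    chunks = [list(range(a, a + width))
              for a in range(first_aligned, last_aligned, width)]
    epilogue = list(range(last_aligned, end))
    return prologue, chunks, epilogue
```

Every address lands in exactly one of the three parts, mirroring the abstract's requirement that no address belongs to both subsets.
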
Array reference safety analysis in the presence of loops with conditional control flow

Patent number: 8327344

Abstract: Mechanisms are provided for analyzing and optimizing loops with conditional control flow in source code based on array reference safety. Mechanisms are provided for analyzing blocks of the source code to identify a conditional control flow loop having loop source code specifying a total access range for an array reference. A safe access range, of the total access range of the array reference in the loop source code, is identified over which a compiler-based optimization of the loop source code can be safely applied without introducing new exception conditions. The compiler-based optimization of the loop source code is performed based on the identified safe access range to generate optimized code. The optimized code is output for generation of executable code for execution on a processor.

Type: Grant

Filed: October 14, 2008

Date of Patent: December 4, 2012

Assignee: International Business Machines Corporation

Inventor: Michael K. Gschwind
Computation table for block computation

Patent number: 8327345

Abstract: In response to receiving pre-processed code, a compiler identifies a code section that is not a candidate for acceleration and identifies a code block specifying an iterated operation that is a candidate for acceleration. In response to identifying the code section, the compiler generates post-processed code containing one or more lower level instructions corresponding to the identified code section, and in response to identifying the code block, the compiler creates and outputs an operation data structure separate from the post-processed code that identifies the iterated operation. The compiler places a block computation command in the post-processed code that invokes processing of the operation data structure to perform the iterated operation and outputs the post-processed code.

Type: Grant

Filed: December 16, 2008

Date of Patent: December 4, 2012

Assignee: International Business Machines Corporation

Inventors: Ravi K. Arimilli, Balaram Sinharoy
