Including Loop Patents (Class 717/160)

Including scheduling instructions (Class 717/161)

Type and length abstraction for data types

Patent number: 8959501

Abstract: Embodiments are directed to implementing a generic SIMD data type in software code. In an embodiment, a computer system accesses a portion of software code that includes an algorithm with a generic SIMD data type that includes a variable number of elements. The algorithm with the generic SIMD data type is to be processed by a specific processor that includes various specific hardware features. The computer system determines at runtime a portion of customized processor-specific code that is to be used with the specified processor based on the generic SIMD data type, wherein the runtime determination resolves the number of elements that are to be used with the specified processor. The computer system also processes the software code including the algorithm with the generic SIMD data type using the determined, customized processor-specific code.

Type: Grant

Filed: December 14, 2010

Date of Patent: February 17, 2015

Assignee: Microsoft Corporation

Inventors: Carol Thompson Eidt, David L. Detlefs
Optimizing libraries for validating C++ programs using symbolic execution

Patent number: 8943487

Abstract: Particular embodiments optimize a C++ function comprising one or more loops for symbolic execution, comprising for each loop, if there is a branching condition within the loop, then rewrite the loop to move the branching condition outside the loop. Particular embodiments may further optimize the C++ function through simplified symbolic expressions and adding constructs forcing delayed interpretation of symbolic expressions during the symbolic execution.

Type: Grant

Filed: January 20, 2011

Date of Patent: January 27, 2015

Assignee: Fujitsu Limited

Inventors: Guodong Li, Sreeranga P. Rajan, Indradeep Ghosh
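
Illustrative note on patent 8943487: the abstract above describes rewriting a loop so that a branching condition inside it is moved outside. A minimal sketch of that kind of rewrite follows, assuming the condition does not change between iterations. The patent targets C++ and symbolic execution; Python is used here only for brevity, and the function names and the doubling computation are hypothetical, not taken from the patent.

```python
def scale_branch_inside(values, double):
    """Original form: the branch on `double` is tested on every iteration."""
    out = []
    for v in values:
        if double:              # condition evaluated inside the loop
            out.append(v * 2)
        else:
            out.append(v)
    return out


def scale_branch_outside(values, double):
    """Rewritten form: the branch is tested once, outside the loop."""
    out = []
    if double:
        for v in values:        # loop body with no branch
            out.append(v * 2)
    else:
        for v in values:        # loop body with no branch
            out.append(v)
    return out
```

Both functions return the same list for the same inputs; the second simply leaves no branch inside either loop body, which is the property the abstract says simplifies symbolic execution.
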
Loop invariant method expression hoisting

Patent number: 8935684

Abstract: A system, method and computer-readable medium are disclosed for improving the performance of a compiler. A set of source code instructions are processed to generate a plurality of source code instruction subsets, each of which is respectively associated with a mathematical operator. The source code subsets are then reordered to “hoist,” or place, a source code instruction subset associated with a product operator before a source code instruction subset associated with a summation operator. The plurality of source code instruction subsets are iteratively reordered until no source code instruction subset associated with a summation operator precedes a source code instruction subset associated with a product operator. A compiler is then used to compile the resulting reordered plurality of source code instruction subsets into a set of optimized object code instructions.

Type: Grant

Filed: December 13, 2012

Date of Patent: January 13, 2015

Assignee: Advanced Micro Devices, Inc.

Inventor: Mohammed Javed Absar
Reconfigurable processor and method for processing a nested loop

Patent number: 8930929

Abstract: A reconfigurable processor which merges an inner loop and an outer loop which are included in a nested loop and allocates the merged loop to processing elements in parallel, thereby reducing processing time to process the nested loop. The reconfigurable processor may extract loop execution frequency information from the inner loop and the outer loop of the nested loop, and may merge the inner loop and the outer loop based on the extracted loop execution frequency information.

Type: Grant

Filed: April 14, 2011

Date of Patent: January 6, 2015

Assignee: Samsung Electronics Co., Ltd.

Inventors: Min-Wook Ahn, Dong-Hoon Yoo, Jin-Seok Lee, Bernhard Egger, Tai-Song Jin, Won-Sub Kim, Hee-Jin Ahn
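
Illustrative note on patent 8930929: the abstract above describes merging the inner and outer loops of a nested loop into one loop. The sketch below shows only the merge itself for fixed trip counts, as an assumption about what such a merged loop could look like in software. It does not model the processing elements or the execution frequency analysis, and the names `nested_sum` and `merged_sum` are hypothetical.

```python
def nested_sum(rows, cols, f):
    """Original form: an outer loop over i and an inner loop over j."""
    total = 0
    for i in range(rows):
        for j in range(cols):
            total += f(i, j)
    return total


def merged_sum(rows, cols, f):
    """Merged form: one loop over rows * cols iterations.

    The pair (i, j) is recovered from the single index k, so each
    iteration of the merged loop is independent of the loop nesting.
    """
    total = 0
    for k in range(rows * cols):
        i, j = divmod(k, cols)   # i = k // cols, j = k % cols
        total += f(i, j)
    return total
```

Because the merged loop has a single flat index, its iterations could in principle be divided among parallel workers by index range.
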
METHODS AND SYSTEMS TO VECTORIZE SCALAR COMPUTER PROGRAM LOOPS HAVING LOOP-CARRIED DEPENDENCES

Publication number: 20150007154

Abstract: Methods and systems to convert scalar computer program loops having loop carried dependences to vector computer program loops are disclosed. One example method and system generates a first predicate set associated with a first conditionally executed statement. The first predicate set contains a first set of predicates that cause a variable to be defined in a scalar computer program loop at or before the variable is defined by the first conditionally executed statement. The method and system also generates a second predicate set associated with the first conditionally executed statement. The second predicate set contains a second set of predicates that cause the variable to be used in the scalar computer program loop at or before the variable is defined by the first conditionally executed statement.

Type: Application

Filed: March 15, 2013

Publication date: January 1, 2015

Inventors: Jayashankar Bharadwaj, Nalini Vasudevan, Albert Hartono, Sara S. Baghsorkhi
Compiler for X86-based many-core coprocessors

Patent number: 8918770

Abstract: A system and method for compiling includes, for a parallelizable code portion of an application stored on a computer readable storage medium, determining one or more variables that are to be transferred to and/or from a coprocessor if the parallelizable code portion were to be offloaded. A start location and an end location are determined for at least one of the one or more variables as a size in memory. The parallelizable code portion is transformed by inserting an offload construct around the parallelizable code portion and passing the one or more variables and the size as arguments of the offload construct such that the parallelizable code portion is offloaded to a coprocessor at runtime.

Type: Grant

Filed: August 24, 2012

Date of Patent: December 23, 2014

Assignee: NEC Laboratories America, Inc.

Inventors: Nishkam Ravi, Tao Bao, Ozcan Ozturk, Srimat Chakradhar
Optimization of declarative queries

Patent number: 8914782

Abstract: Source code is generated that includes one or more iterator-based expressions such as declarative queries. The source code is translated into an intermediate language that classifies operators making up the iterator-based expressions into classes based on whether the operators are aggregating, element-wise, or sink operators. The intermediate language, including the identified classes, is processed using an automaton to replace the iterator-based expressions with one or more equivalent non-iterator-based expressions. Where an iterator-based expression is nested, the nested expression is processed using an equivalent number of nested automatons. The resulting optimized source code may be compiled and executed using fewer virtual function calls than the equivalent non-optimized source code.

Type: Grant

Filed: November 10, 2010

Date of Patent: December 16, 2014

Assignee: Microsoft Corporation

Inventors: Michael Isard, Yuan Yu, Derek Gordon Murray
Use of vectorization instruction sets

Patent number: 8904366

Abstract: In one embodiment, the invention is a method and apparatus for use of vectorization instruction sets. One embodiment of a method for generating vector instructions includes receiving source code written in a high-level programming language, wherein the source code includes at least one high-level instruction that performs multiple operations on a plurality of vector operands, and compiling the high-level instruction(s) into one or more low-level instructions, wherein the low-level instructions are in an instruction set of a specific computer architecture.

Type: Grant

Filed: May 15, 2009

Date of Patent: December 2, 2014

Assignee: International Business Machines Corporation

Inventors: Henrique Andrade, Bugra Gedik, Hua Yong Wang, Kun-Lung Wu
COMPUTER-READABLE RECORDING MEDIUM, COMPILING METHOD, AND INFORMATION PROCESSING APPARATUS

Publication number: 20140344795

Abstract: A compiler determines executability of loop fusion, for each of a plurality of loops existing in a code to be processed, based on performance information of a system where the code to be processed is executed and based on operands and number of data transfers executed inside each of the loops. Then, the compiler executes fusion of loop processing in accordance with a determination result of executability of the loop fusion.

Type: Application

Filed: April 17, 2014

Publication date: November 20, 2014

Applicant: FUJITSU LIMITED

Inventors: Tomoko Nikko, Shuichi Chiba
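
Illustrative note on publication 20140344795: the abstract above concerns deciding whether to fuse loops. The sketch below shows only what loop fusion does to two loops over the same range; it does not attempt the performance-based decision the abstract describes. The function names and the arithmetic are hypothetical.

```python
def separate_loops(a):
    """Two loops over the same index range, run one after the other."""
    b = [0] * len(a)
    c = [0] * len(a)
    for i in range(len(a)):
        b[i] = a[i] + 1
    for i in range(len(a)):
        c[i] = b[i] * 2      # reads the b[i] written by the first loop
    return b, c


def fused_loop(a):
    """Fused form: both statements execute in a single loop."""
    b = [0] * len(a)
    c = [0] * len(a)
    for i in range(len(a)):
        b[i] = a[i] + 1
        c[i] = b[i] * 2      # b[i] was written earlier in this iteration
    return b, c
```

Fusion is valid here because iteration i of the second loop depends only on iteration i of the first.
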
Automatic asynchronous offload to many-core coprocessors

Patent number: 8893103

Abstract: Methods and systems for asynchronous offload to many-core coprocessors include splitting a loop in an input source code into a sampling sub-part, a many integrated core (MIC) sub-part, and a central processing unit (CPU) sub-part; executing the sampling sub-part with a processor to determine loop characteristics including memory- and processor-operations executed by the loop; identifying optimal split boundaries based on the loop characteristics such that the MIC sub-part will complete in a same amount of time when executed on a MIC processor as the CPU sub-part will take when executed on a CPU; and modifying the input source code to split the loop at the identified boundaries, such that the MIC sub-part is executed on a MIC processor and the CPU sub-part is concurrently executed on a CPU.

Type: Grant

Filed: July 12, 2013

Date of Patent: November 18, 2014

Assignee: NEC Laboratories America, Inc.

Inventors: Nishkam Ravi, Yi Yang, Srimat Chakradhar
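
Illustrative note on patent 8893103: the abstract above describes splitting a loop so that the MIC sub-part and the CPU sub-part finish in the same amount of time. Under the simplifying assumption that each device has a constant per-iteration time (for example, as measured by a sampling run), the balanced split satisfies k * t_mic = (n - k) * t_cpu. The sketch below solves that equation; the function name `balanced_split` and its parameters are hypothetical and are not the patent's method.

```python
def balanced_split(n_iters, cpu_time_per_iter, mic_time_per_iter):
    """Return (mic_iters, cpu_iters) so both parts take about equal time.

    Solves k * mic_time = (n - k) * cpu_time for k, assuming constant
    per-iteration times on each device.
    """
    if n_iters < 0 or cpu_time_per_iter <= 0 or mic_time_per_iter <= 0:
        raise ValueError("iteration count must be >= 0 and times must be > 0")
    total_rate = cpu_time_per_iter + mic_time_per_iter
    mic_iters = round(n_iters * cpu_time_per_iter / total_rate)
    return mic_iters, n_iters - mic_iters
```

For example, with 1000 iterations, 3.0 time units per iteration on the CPU and 1.0 on the MIC processor, the split assigns 750 iterations to the MIC processor and 250 to the CPU, so each part takes 750 time units.
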
Loop control flow diversion

Patent number: 8887142

Abstract: Loop control flow diversion supports thread synchronization, garbage collection, and other situations involving suspension of long-running loops. Divertible loops have a loop body, a loop top, an indirection cell containing a loop top address, and a loop jump instruction sequence which references the indirection cell. In normal execution, control flows through the indirection cell to the loop top. After the indirection cell is altered, however, execution flow is diverted to a point away from the loop top. Operations such as garbage collection are performed while the loop (and hence the thread(s) using the loop) is thus diverted. The kernel or another thread then restores the loop top address into the indirection cell, and execution flow again continues through the restored indirection cell to the loop top.

Type: Grant

Filed: March 10, 2010

Date of Patent: November 11, 2014

Assignee: Microsoft Corporation

Inventors: Scott Mosier, Michael McKenzie Magruder, Frank V. Peschel-Gallee
APPARATUS AND METHOD FOR TRANSLATING MULTITHREAD PROGRAM CODE

Publication number: 20140331216

Abstract: A method and apparatus for translating a multithread program code are provided. The method includes: dividing a multithread program code into a plurality of statements according to a synchronization point; generating at least one loop group by combining one or more adjacent statements based on a number of instructions included in the plurality of statements; expanding or renaming variables in each of the plurality of statements so that each statement included in the at least one loop group is executed with respect to a work item of a different work group; and enclosing each of the generated at least one loop group respectively with a work item coalescing loop.

Type: Application

Filed: May 2, 2014

Publication date: November 6, 2014

Applicant: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Seong-Gun KIM, Dong-Hoon YOO, Jin-Seok LEE, Seok-Joong HWANG
Compiler device, compiler program, and loop parallelization method

Patent number: 8881124

Abstract: According to the conventional loop parallelization method, when a loop in which a value of a loop-carried dependency variable can be calculated in all of the iterations without sequentially executing the loop from the start, it is determined that DOALL parallelization is not applicable due to the loop-carried dependency variable. Accordingly, the loop is sequentially executed or parallelized by using DOACROSS parallelization that executes a loop including a loop-carried dependency variable. That is, there is a problem that an expression including a loop-carried dependency cannot be parallelized and efficiently processed with use of a multi-processor. By generating initial value calculating codes, the loop-carried dependency in a source code prior to parallelization can be solved, and by dividing a loop included in the source code into subloops that can be executed in parallel, the multi-processor can efficiently process the source code.

Type: Grant

Filed: December 13, 2011

Date of Patent: November 4, 2014

Assignee: Panasonic Corporation

Inventor: Daisuke Baba
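
Illustrative note on patent 8881124: the abstract above describes generating initial value calculating code so that a loop with a loop-carried dependency can be divided into subloops that run in parallel. The sketch below assumes the simplest such case, a variable incremented by a constant each iteration, whose value at the start of any subloop has a closed form. The function names are hypothetical, and the subloops are run one after another here only to keep the example short.

```python
def sequential(n, x0, step):
    """Original loop: x carries a value from one iteration to the next."""
    out = []
    x = x0
    for _ in range(n):
        x = x + step             # loop-carried dependency on x
        out.append(x)
    return out


def subloop(start, end, x0, step):
    """Run iterations [start, end) using a computed initial value of x."""
    x = x0 + start * step        # initial value, no earlier iterations needed
    out = []
    for _ in range(start, end):
        x = x + step
        out.append(x)
    return out


def split_into_subloops(n, x0, step, parts):
    """Divide the loop into `parts` subloops that do not depend on each other."""
    bounds = [n * p // parts for p in range(parts + 1)]
    out = []
    for start, end in zip(bounds, bounds[1:]):
        out.extend(subloop(start, end, x0, step))   # each call is independent
    return out
```

Since each `subloop` call needs only its own bounds and the original inputs, the calls could be dispatched to separate workers.
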
Apparatus and method for scheduling instruction

Patent number: 8869129

Abstract: An apparatus and method for scheduling an instruction are provided. The apparatus includes an analyzer configured to analyze dependency of a plurality of recurrence loops and a scheduler configured to schedule the recurrence loops based on the analyzed dependencies. When scheduling a plurality of recurrence loops, the apparatus first schedules a dominant loop whose loop head has no dependency on another loop among the recurrence loops.

Type: Grant

Filed: November 2, 2009

Date of Patent: October 21, 2014

Assignee: Samsung Electronics Co., Ltd.

Inventors: Tae-wook Oh, Won-sub Kim, Bernhard Egger
System and method for compiling machine-executable code generated from a sequentially ordered plurality of processor instructions

Patent number: 8856768

Abstract: A method and system are provided for deriving a resultant software program from an originating software program having overlapping branches, wherein the resultant software project has either no overlapping branches or fewer overlapping branches than the originating software program. A preferred embodiment of the invented method generates a resultant software program that has no overlapping branches. The resultant software is more easily converted into programming reconfigurable logic than the originating software program. Separate and individually applicable aspects of the invented method are used to eliminate all four possible states of two overlapping branches, i.e., forward branch overlapping forward branch, back branch overlapping back branch, and each of the two possible and distinguishable states of forward branch and back branch overlap. One or more elements of each aspect of the invention may be performed by one or more computers or processors, or by means of a computer or a communications network.

Type: Grant

Filed: January 30, 2012

Date of Patent: October 7, 2014

Inventor: Robert Keith Mykland
Adaptive instruction prefetching and fetching memory system apparatus and method for microprocessor system

Patent number: 8856769

Abstract: A method and system of the instruction packing and scaling are designed for simultaneously enhancing energy efficiency by concurrent and advanced prefetching/fetching instructions via the small and/or banked caches and for improving the performance of microprocessors by reducing the fraction of program and by employing the simple and fast caches. The invention is also designed for converting high fraction code to simplified, branch-reduced, and hidden code during compilation time, for storing packed/scaled code to concurrently accessible the plurality of caches and main memories, and for reverting the code to the native instructions during the instruction prefetch and fetch operations. Consequently, the invention does not forward many flow control instructions including procedure callers/returns and unconditional branches to microprocessors.

Type: Grant

Filed: October 23, 2012

Date of Patent: October 7, 2014

Inventor: Yong-Kyu Jung
Loop detection apparatus, loop detection method, and loop detection program

Patent number: 8856762

Abstract: A loop detection method, system, and article of manufacture for determining whether a sequence of unit processes continuously executed among unit processes in a program is a loop by means of computational processing performed by a computer. The method includes: reading address information on the sequence of unit processes; comparing an address of a unit process as a loop starting point candidate with an address of a last unit process in the sequence of unit processes; reading call stack information on the sequence of unit processes; comparing a call stack upon execution of the unit process as the loop starting point candidate with a call stack upon execution of the last unit process; outputting a determination result indicating that the sequence of unit processes forms a loop if the respective comparison results of the addresses and the call stacks match with each other.

Type: Grant

Filed: November 21, 2011

Date of Patent: October 7, 2014

Assignee: International Business Machines Corporation

Inventor: Hiroshige Hayashizaki
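
Illustrative note on patent 8856762: the abstract above describes deciding whether a sequence of unit processes forms a loop by comparing both the address and the call stack of a loop starting point candidate with those of the last unit process. The sketch below is one possible reading of that check, in which the sequence is reported as a loop only when both comparisons match. The `UnitProcess` type and the `forms_loop` function are hypothetical names.

```python
from typing import NamedTuple, Sequence, Tuple


class UnitProcess(NamedTuple):
    address: int                    # address of the executed unit process
    call_stack: Tuple[int, ...]     # return addresses active when it executed


def forms_loop(sequence: Sequence[UnitProcess], candidate: UnitProcess) -> bool:
    """Report a loop when the candidate matches the last unit process
    in both address and call stack."""
    if not sequence:
        return False
    last = sequence[-1]
    same_address = candidate.address == last.address
    same_stack = candidate.call_stack == last.call_stack
    return same_address and same_stack
```

Comparing call stacks as well as addresses distinguishes a true return to the loop start from a second call that reaches the same address through a different call path.
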
System using a unique marker with each software code-block

Patent number: 8850410

Abstract: A system and method for improving software maintainability, performance, and/or security by associating a unique marker to each software code-block; the system comprising of a plurality of processors, a plurality of code-blocks, and a marker associated with each code-block. The system may also include a special hardware register (code-block marker hardware register) in each processor for identifying the markers of the code-blocks executed by the processor, without changing any of the plurality of code-blocks.

Type: Grant

Filed: January 29, 2010

Date of Patent: September 30, 2014

Assignee: International Business Machines Corporation

Inventors: Ramanjaneya S. Burugula, Joefon Jann, Pratap C. Pattnaik
Data prefetching and coalescing for partitioned global address space languages

Patent number: 8839219

Abstract: An illustrative embodiment of a computer-implemented process for shared data prefetching and coalescing optimization versions a loop containing one or more shared references into an optimized loop and an un-optimized loop, transforms the optimized loop into a set of loops, and stores shared access associated information of the loop using a prologue loop in the set of loops. The shared access associated information pertains to remote data and is collected using the prologue loop in absence of network communication and builds a hash table. An associated data structure is updated each time the hash table is entered, and is sorted to remove duplicate entries and create a reduced data structure. Patterns across entries of the reduced data structure are identified and entries are coalesced. Data associated with a coalesced entry is pre-fetched using a single communication and a local buffer is populated with the fetched data for reuse.

Type: Grant

Filed: October 24, 2012

Date of Patent: September 16, 2014

Assignee: International Business Machines Corporation

Inventors: Michail Alvanos, Ettore Tiotto
Method, system and program product for optimizing emulation of a suspected malware

Patent number: 8826245

Abstract: A method, system and program product for optimizing emulation of a suspected malware. The method includes identifying, using an emulation optimizer tool, whether an instruction in a suspected malware being emulated by an emulation engine in a virtual environment signifies a long loop and, if so, generating a first hash for the loop. Further, the method includes ascertaining whether the first hash generated matches any long loop entries in a storage and, if so calculating a second hash for the long loop. Furthermore, the method includes inspecting any long loop entries ascertained to find an entry having a respective second hash matching the second hash calculated. If an entry matching the second hash calculated is found, the method further includes updating one or more states of the emulation engine, such that, execution of the long loop of the suspected malware is skipped, which optimizes emulation of the suspected malware.

Type: Grant

Filed: May 23, 2013

Date of Patent: September 2, 2014

Assignee: International Business Machines Corporation

Inventor: Ji Yan Wu
Memory disambiguation hardware to support software binary translation

Patent number: 8826257

Abstract: A method of memory disambiguation hardware to support software binary translation is provided. This method includes unrolling a set of instructions to be executed within a processor, the set of instructions having a number of memory operations. An original relative order of memory operations is determined. Then, possible reordering problems are detected and identified in software. The reordering problem being when a first memory operation has been reordered prior to and aliases to a second memory operation with respect to the original order of memory operations. The reordering problem is addressed and a relative order of memory operations to the processor is communicated.

Type: Grant

Filed: March 30, 2012

Date of Patent: September 2, 2014

Assignee: Intel Corporation

Inventors: Muawya M. Al-Otoom, Paul Caprioli, Abhay S. Kanhere, Arvind Krishnaswamy, Omar M. Shaikh
Efficient software cache accessing with handle reuse

Patent number: 8819651

Abstract: A mechanism for efficient software cache accessing with handle reuse is provided. The mechanism groups references in source code into a reference stream with the reference stream having a size equal to or less than a size of a software cache line. The source code is transformed into optimized code by modifying the source code to include code for performing at most two cache lookup operations for the reference stream to obtain two cache line handles. Moreover, the transformation involves inserting code to resolve references in the reference stream based on the two cache line handles. The optimized code may be output for generation of executable code.

Type: Grant

Filed: July 22, 2008

Date of Patent: August 26, 2014

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, Marc Gonzalez Tallada, John K. O'Brien
VECTORIZATION IN AN OPTIMIZING COMPILER

Publication number: 20140237460

Abstract: An optimizing compiler includes a vectorization mechanism that optimizes a computer program by substituting code that includes one or more vector instructions (vectorized code) for one or more scalar instructions. The cost of the vectorized code is compared to the cost of the code with only scalar instructions. When the cost of the vectorized code is less than the cost of the code with only scalar instructions, the vectorization mechanism determines whether the vectorized code will likely result in processor stalls. If not, the vectorization mechanism substitutes the vectorized code for the code with only scalar instructions. When the vectorized code will likely result in processor stalls, the vectorization mechanism does not substitute the vectorized code, and the code with only scalar instructions remains in the computer program.

Type: Application

Filed: March 9, 2013

Publication date: August 21, 2014

Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventor: William J. Schmidt
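
Illustrative note on publication 20140237460: the abstract above describes a two-step decision, first comparing the cost of vectorized code with the cost of scalar code and then checking whether the vectorized code is likely to cause processor stalls. A minimal sketch of that decision follows. The function name, the numeric costs, and the boolean stall flag are hypothetical inputs; how a compiler would estimate them is not shown.

```python
def choose_code(scalar_cost, vector_cost, likely_stalls):
    """Return "vector" only if it is cheaper and unlikely to stall."""
    if vector_cost < scalar_cost:
        # Cheaper on paper; still reject if stalls are likely.
        if not likely_stalls:
            return "vector"
    return "scalar"
```
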
Dynamic optimization of mobile services

Patent number: 8813044

Abstract: A method, system, and article of manufacture are disclosed for transforming a definition of a process for delivering a service. This service process definition is comprised of computer readable code. The method comprises the steps of expressing a given set of assumptions in a computer readable code; and transforming said process definition by using a processing unit to apply said assumptions to said process definition to change the configuration of the process definition. The process definition may be transformed by using factors relating to the specific context in or for which the process definition is executed. Also, the process definition may be transformed by identifying, in a flow diagram for the service process definition, flows to which the assumptions apply, and applying program rewriting techniques to those identified flows.

Type: Grant

Filed: September 6, 2012

Date of Patent: August 19, 2014

Assignee: International Business Machines Corporation

Inventors: David F. Bantz, Steven J. Mastrianni, James R. Moulic, Dennis G. Shea
Program generation device, program production method, and program

Patent number: 8806466

Abstract: A program generation apparatus references a source program including a loop for executing a block N times (N≥2) and having such dependence that a variable defined in a statement in the block pertaining to ith execution (1≤i<N) is referenced by a statement in the block pertaining to jth execution (i<j≤N), calculates equivalent representations of variables in the block pertaining to the ith execution and the block pertaining to any other execution than the ith execution, specifies, with respect to each representation of a target variable causing the dependence, a representation of a variable not causing the dependence that is equivalent to the representation of the target variable, and generates a program being for executing the block M times (M≤N) and including a statement including the specified representation in place of each representation of the target variable.

Type: Grant

Filed: July 4, 2011

Date of Patent: August 12, 2014

Assignee: Panasonic Corporation

Inventors: Akira Tanaka, Hiroyuki Morishita, Akihiko Inoue
Program parallelization device and program product

Patent number: 8799881

Abstract: According to one embodiment, a parallelizing unit divides a loop into first and second processes based on a program to be converted and division information. The first and second processes respectively have termination control information, loop control information, and change information. The parallelizing unit inserts into the first process a determination process determining whether the second process is terminated at execution of an (n−1)th iteration of the second process when the second process is subsequent to the first process or determining whether the second process is terminated at execution of an nth iteration of the second process when the second process precedes the first process. The parallelizing unit inserts into the second process a control process controlling execution of the second process based on the result of determination notified by the determination process.

Type: Grant

Filed: July 12, 2011

Date of Patent: August 5, 2014

Assignee: Kabushiki Kaisha Toshiba

Inventors: Nobuaki Tojo, Hidenori Matsuzaki
Loop parallelization based on loop splitting or index array

Patent number: 8793675

Abstract: Methods and apparatus to provide loop parallelization based on loop splitting and/or index array are described. In one embodiment, one or more split loops, corresponding to an original loop, are generated based on the mis-speculation information. In another embodiment, a plurality of subloops are generated from an original loop based on an index array. Other embodiments are also described.

Type: Grant

Filed: December 24, 2010

Date of Patent: July 29, 2014

Assignee: Intel Corporation

Inventors: Jin Lin, Nishkam Ravi, Xinmin Tian, John L. Ng, Renat V. Valiullin
SPECULATIVE MEMORY DISAMBIGUATION ANALYSIS AND OPTIMIZATION WITH HARDWARE SUPPORT

Publication number: 20140189667

Abstract: Methods and apparatus to provide speculative memory disambiguation analysis and optimization with hardware support are described. In one embodiment, input code is analyzed to determine one or more memory locations to be accessed by the input program and output code is generated based on the input code and one or more assumptions about invariance of the one or more memory locations. The output code is generated also based on hardware transactional memory support and hardware dynamic disambiguation support. Other embodiments are also described.

Type: Application

Filed: December 29, 2012

Publication date: July 3, 2014

Inventors: Abhay S. Kanhere, Suriya Subramanian, Saurabh S. Shukla
AUTOMATIC PIPELINE COMPOSITION

Publication number: 20140189666

Abstract: A method and apparatus for automatic pipeline composition are provided herein. Syntax elements may be manually inserted into the code, or automatically injected into the code. The syntax elements may specify hints such as data type parameters to independent functions allowing the functions to be automatically coalesced into a single loop, providing optimized data accesses to be coalesced for each function in the pipeline within the single loop. A run-time system produces optimized machine code for a target processor using syntax elements to guide the optimizations. Additionally, the pipeline may be executed. The pipeline includes the coalesced functions and data accesses.

Type: Application

Filed: December 27, 2012

Publication date: July 3, 2014

Inventors: Scott A. Krig, Michael D. Jeronimo
Dynamic optimization of mobile services

Patent number: 8769507

Abstract: A method, system, and article of manufacture are disclosed for transforming a definition of a process for delivering a service on a specified computing device. This service process definition is comprised of computer readable code. The method comprises the steps of expressing a given set of assumptions in a computer readable code; and transforming the definition by using a processing unit to apply the assumptions to the definition of the process to change the way in which the process operates. The definition of the process may be transformed by using factors relating to the specific context in or for which the definition is executed. Also, the definition may be transformed by identifying, in a flow diagram for the process, flows to which the assumptions apply, and applying program rewriting techniques to those identified flows.

Type: Grant

Filed: May 14, 2009

Date of Patent: July 1, 2014

Assignee: International Business Machines Corporation

Inventors: David F. Bantz, Steven J. Mastrianni, James R. Moulic, Dennis G. Shea
Loop Invariant Method Expression Hoisting

Publication number: 20140173576

Abstract: A system, method and computer-readable medium are disclosed for improving the performance of a compiler. A set of source code instructions are processed to generate a plurality of source code instruction subsets, each of which is respectively associated with a mathematical operator. The source code subsets are then reordered to “hoist,” or place, a source code instruction subset associated with a product operator before a source code instruction subset associated with a summation operator. The plurality of source code instruction subsets are iteratively reordered until no source code instruction subset associated with a summation operator precedes a source code instruction subset associated with a product operator. A compiler is then used to compile the resulting reordered plurality of source code instruction subsets into a set of optimized object code instructions.

Type: Application

Filed: December 13, 2012

Publication date: June 19, 2014

Inventor: Mohammed Javed Absar
Throughput-aware software pipelining for highly multi-threaded systems

Patent number: 8752036

Abstract: Embodiments of the invention provide systems and methods for throughput-aware software pipelining in compilers to produce optimal code for single-thread and multi-thread execution on multi-threaded systems. A loop is identified within source code as a candidate for software pipelining. An attempt is made to generate pipelined code (e.g., generate an instruction schedule and a set of register assignments) for the loop in satisfaction of throughput-aware pipelining criteria, like maximum register count, minimum trip count, target core pipeline resource utilization, maximum code size, etc. If the attempt fails to generate code in satisfaction of the criteria, embodiments adjust one or more settings (e.g., by reducing scalarity or latency settings being used to generate the instruction schedule).

Type: Grant

Filed: October 31, 2011

Date of Patent: June 10, 2014

Assignee: Oracle International Corporation

Inventors: Spiros Kalogeropulos, Partha Tirumalai
Reducing branch misprediction impact in nested loop code

Patent number: 8745607

Abstract: According to one aspect of the present disclosure, a method and technique for reducing branch misprediction impact for nested loop code is disclosed. The method includes: responsive to identifying code having an outer loop and an inner loop, determining a quantity of iterations of the inner loop for an initial number of iterations of the outer loop; determining a number of processor cycles for executing the quantity of iterations of the inner loop for the initial number of iterations of the outer loop; determining whether the number of processor cycles is less than a threshold; and responsive to determining that the number of processor cycles is less than the threshold, fully unrolling the inner loop for the initial number of iterations of the outer loop.

Type: Grant

Filed: November 11, 2011

Date of Patent: June 3, 2014

Assignee: International Business Machines Corporation

Inventors: Madhavi G. Valluri, Steven W. White
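
The decision step in the abstract above (count inner-loop iterations, estimate cycles, compare against a threshold) might be sketched as follows. The function name, the flat per-iteration cycle estimate, and the parameter names are hypothetical; the patent does not specify how cycles are estimated.

```python
def should_fully_unroll(inner_trip_counts, cycles_per_iteration, cycle_threshold):
    """Decide whether to fully unroll the inner loop for the initial outer iterations.

    inner_trip_counts: inner-loop iteration counts, one per initial outer iteration.
    cycles_per_iteration: assumed cost of one inner iteration (a hypothetical estimate).
    cycle_threshold: unroll only if the estimated total stays below this value.
    """
    total_inner_iterations = sum(inner_trip_counts)
    estimated_cycles = total_inner_iterations * cycles_per_iteration
    return estimated_cycles < cycle_threshold
```

For example, three outer iterations with inner trip counts 1, 2, and 3 at 4 cycles each give an estimate of 24 cycles, which is below a threshold of 100, so the inner loop would be unrolled.
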
Method and system for implementing parallel execution in a computing system and in a circuit simulator

Patent number: 8738348

Abstract: A method and mechanism for implementing a general purpose scripting language that supports parallel execution is described. In one approach, parallel execution is provided in a seamless and high-level approach rather than requiring or expecting a user to have low-level programming expertise with parallel processing languages/functions. Also described is a system and method for performing circuit simulation. The present approach provides methods and systems that create reusable and independent measurements for use with circuit simulators. Also disclosed are parallelizable measurements having looping constructs that can be run without interference between parallel iterations. Reusability is enhanced by having parameterized measurements. Revisions and history of the operating parameters of circuit designs subject to simulation are tracked.

Type: Grant

Filed: June 15, 2012

Date of Patent: May 27, 2014

Assignee: Cadence Design Systems, Inc.

Inventor: Kenneth S. Kundert
Parallelizing non-countable loops with hardware transactional memory

Patent number: 8739141

Abstract: A system and method for speculatively parallelizing non-countable loops in a multi-threaded application. A multi-core processor receives instructions for a multi-threaded application. The application may contain non-countable loops. Non-countable loops have an iteration count value that cannot be determined prior to the execution of the non-countable loop, a loop index value that cannot be non-speculatively determined prior to the execution of an iteration of the non-countable loop, and control that is not transferred out of the loop body by a code line in the loop body. The compiler replaces the non-countable loop with a parallelized loop pattern that uses outlined function calls defined in a parallelization library (PL) in order to speculatively execute iterations of the parallelized loop. The parallelized loop pattern is configured to squash and re-execute any speculative thread of the parallelized loop pattern that is signaled to have a transaction failure.

Type: Grant

Filed: May 19, 2008

Date of Patent: May 27, 2014

Assignee: Oracle America, Inc.

Inventors: Yonghong Song, Spiros Kalogeropulos, Partha P. Tirumalai
Loop transformation for computer compiler optimization

Patent number: 8732679

Abstract: A new computer-compiler architecture includes code analysis processes in which loops present in an intermediate instruction set are transformed into more efficient loops prior to fully executing the intermediate instruction set. The compiler architecture starts by generating the equivalent intermediate instructions for the original high level source code. For each loop in the intermediate instructions, a total cycle cost is calculated using a cycle cost table associated with the compiler. The compiler then generates intermediate code for replacement loops in which all conversion instructions are removed. The cycle costs for these new transformed loops are then compared against the total cycle cost for the original loops. If the total cycle costs exceed the new cycle costs, the compiler will replace the original loops in the intermediate instructions with the new transformed loops prior to generation of final code using the instruction set of the processor.

Type: Grant

Filed: March 16, 2010

Date of Patent: May 20, 2014

Assignee: QUALCOMM Incorporated

Inventors: Sumesh Udayakumaran, Chihong Zhang
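
The cost comparison in the abstract above can be sketched with a toy cycle cost table. The table values, the opcode names, and the simplification that a replacement loop is the original minus its conversion instructions are all assumptions for illustration; a real replacement loop may add other instructions.

```python
# Hypothetical cycle cost table; real values depend on the target processor.
CYCLE_COST = {"load": 2, "store": 2, "add": 1, "mul": 3, "convert": 4}


def loop_cost(instructions, trip_count):
    """Total cycle cost of a loop body executed trip_count times."""
    return trip_count * sum(CYCLE_COST[op] for op in instructions)


def choose_loop(original, trip_count):
    """Return the cheaper of the original loop and a variant without conversions."""
    transformed = [op for op in original if op != "convert"]
    if loop_cost(original, trip_count) > loop_cost(transformed, trip_count):
        return transformed  # the original costs more, so replace it
    return original
```
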
Unrolling quantifications to control in-degree and/or out-degree of automaton

Patent number: 8726256

Abstract: Apparatus, systems, and methods for a compiler are disclosed. One such compiler parses a human readable expression into a syntax tree and converts the syntax tree into an automaton having in-transitions and out-transitions. Converting can include unrolling the quantification as a function of in-degree limitations, wherein in-degree limitations include a limit on the number of transitions into a state of the automaton. The compiler can also convert the automaton into an image for programming a parallel machine, and publish the image. Additional apparatus, systems, and methods are disclosed.

Type: Grant

Filed: January 24, 2012

Date of Patent: May 13, 2014

Assignee: Micron Technology, Inc.

Inventors: Junjuan Xu, Paul Glendenning
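
Unrolling a quantification, as mentioned in the abstract above, can be pictured at the regular-expression level: a counted repetition is expanded into explicit copies. The sketch below is an assumption-laden illustration only. It does not model in-degree or out-degree limits, the function name is hypothetical, and `atom` is assumed to be a single character or an already grouped sub-pattern.

```python
import re


def unroll_quantifier(atom, low, high):
    """Expand atom{low,high} into an equivalent pattern without counted repetition."""
    required = atom * low  # mandatory copies
    optional = ""
    for _ in range(high - low):
        # Nest each optional copy inside the previous one.
        optional = "(?:" + atom + optional + ")?"
    return required + optional


pattern = unroll_quantifier("a", 2, 4)
# Lengths of "aaa..." strings (0 to 6) that the unrolled pattern accepts.
matches = [n for n in range(7) if re.fullmatch(pattern, "a" * n)]
```

Here `pattern` is `aa(?:a(?:a)?)?`, which accepts the same strings as `a{2,4}`.
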
Pipelined loop parallelization with pre-computations

Patent number: 8726251

Abstract: Embodiments of the invention provide systems and methods for automatically parallelizing loops with non-speculative pipelined execution of chunks of iterations with pre-computation of selected values. Non-DOALL loops are identified and divided into chunks. The chunks are assigned to separate logical threads, which may be further assigned to hardware threads. As a thread performs its runtime computations, subsequent threads attempt to pre-compute their respective chunks of the loop. These pre-computations may result in a set of assumed initial values and pre-computed final variable values associated with each chunk. As subsequent pre-computed chunks are reached at runtime, those assumed initial values can be verified to determine whether to proceed with runtime computation of the chunk or to avoid runtime execution and instead use the pre-computed final variable values.

Type: Grant

Filed: March 29, 2011

Date of Patent: May 13, 2014

Assignee: Oracle International Corporation

Inventors: Spiros Kalogeropulos, Partha Pal Tirumalai
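
The verify-or-recompute step in the abstract above can be sketched sequentially. The running-maximum loop body, the single shared assumed initial value, and all function names are assumptions for illustration; real threads are omitted, so this shows only the bookkeeping, not the parallel execution.

```python
def run_chunk(state, chunk):
    """Sequential loop body: a running maximum carried across iterations."""
    for x in chunk:
        state = max(state, x)
    return state


def precompute(chunks, assumed_initial):
    """Pre-compute each chunk's final value from an assumed initial value."""
    return [(assumed_initial, run_chunk(assumed_initial, c)) for c in chunks]


def execute(chunks, initial, precomputed):
    """Reuse a pre-computed result when its assumed initial value is verified."""
    state, reused = initial, 0
    for chunk, (assumed, final) in zip(chunks, precomputed):
        if state == assumed:
            state = final  # assumption verified: skip runtime execution
            reused += 1
        else:
            state = run_chunk(state, chunk)  # assumption failed: compute normally
    return state, reused
```

With chunks `[[9, 1], [2, 3], [4, 5]]`, an assumed initial value of 9, and a true initial value of 0, the first chunk is computed at runtime and the remaining two reuse their pre-computed results.
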
Vectorization of program code

Patent number: 8713549

Abstract: A method for vectorization of a block of code is provided. The method comprises receiving a first block of code as input; and converting the first block of code into at least a second block of code and a third block of code. The first block of code accesses a first set of memory addresses that are potentially misaligned. The second block of code performs conditional leaping address incrementation to selectively access a first subset of the first set of memory addresses. The third block of code accesses a second subset of the first set of memory addresses starting from an aligned memory address, simultaneously accessing multiple memory addresses at a time. No memory address belongs to both the first subset and the second subset of memory addresses.

Type: Grant

Filed: September 7, 2012

Date of Patent: April 29, 2014

Assignee: International Business Machines Corporation

Inventors: Dorit Nuzman, Ira Rosen, Ayal Zaks
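
The split described in the abstract above, where one part of the access range is handled element by element and another starts at an aligned address and is accessed several elements at a time, can be sketched on plain indices. The function name and the extra scalar tail for leftover elements are assumptions; the sketch does not model the conditional leaping address incrementation named in the abstract.

```python
def split_for_alignment(start, length, vector_width):
    """Split [start, start + length) into a scalar prefix and aligned vector chunks.

    Returns (peeled, vector_chunks, tail); the three index sets are disjoint.
    """
    end = start + length
    # Round start up to the next multiple of vector_width, capped at end.
    first_aligned = min(end, -(-start // vector_width) * vector_width)
    peeled = list(range(start, first_aligned))
    full = (end - first_aligned) // vector_width
    last_full = first_aligned + full * vector_width
    vector_chunks = [list(range(i, i + vector_width))
                     for i in range(first_aligned, last_full, vector_width)]
    tail = list(range(last_full, end))  # leftover elements (an added assumption)
    return peeled, vector_chunks, tail
```
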
Adaptive instruction prefetching and fetching memory system apparatus and method for microprocessor system

Publication number: 20140115569

Abstract: A method and system for instruction packing and scaling are designed for simultaneously enhancing energy efficiency by concurrent and advanced prefetching/fetching of instructions via small and/or banked caches and for improving the performance of microprocessors by reducing the fraction of program and by employing simple and fast caches. The invention is also designed for converting high fraction code to simplified, branch-reduced, and hidden code during compilation time, for storing packed/scaled code in a plurality of concurrently accessible caches and main memories, and for reverting the code to the native instructions during the instruction prefetch and fetch operations. Consequently, the invention does not forward many flow control instructions, including procedure callers/returns and unconditional branches, to microprocessors.

Type: Application

Filed: October 23, 2012

Publication date: April 24, 2014

Inventor: Yong-Kyu Jung
Systems and methods for parallelization of program code, interactive data visualization, and graphically-augmented code editing

Publication number: 20140115560

Abstract: A system for providing a computer configured to read an immutable value for a variable; read the value of the variable at a specific timestamp, thereby providing an ability to create looping constructs; set a current or next value of a loop variable as a function of previous or current loop variable values; read a set of all values that a variable will assume; push or scatter the values into unordered collections; and reduce the collections into a single value.

Type: Application

Filed: October 21, 2013

Publication date: April 24, 2014

Inventor: Luke Hutchison
Leveraging multicore systems when compiling procedures

Patent number: 8701098

Abstract: A method, apparatus and program product are provided for parallelizing analysis and optimization in a compiler. A plurality of basic blocks and a subset of data points of a computer program is prepared for processing by a main thread selected from a plurality of hardware threads. The plurality of prepared basic blocks and subset of data points are placed in a shared data structure by the main thread. A prepared basic block of the plurality of prepared basic blocks and/or a tuple associated with the subset of data points is concurrently retrieved from the shared data structure by a work thread selected from the plurality of hardware threads. A compiler analysis or optimization is performed on the prepared basic block or tuple by the work thread.

Type: Grant

Filed: April 2, 2009

Date of Patent: April 15, 2014

Assignee: International Business Machines Corporation

Inventors: Robert R. Roediger, William J. Schmidt
Accelerating generic loop iterators using speculative execution

Patent number: 8701099

Abstract: A method, a system and a computer program product for effectively accelerating loop iterators using speculative execution of iterators. An Efficient Loop Iterator (ELI) utility detects initiation of a target program and initiates/spawns a speculative iterator thread at the start of the basic code block ahead of the code block that initiates a nested loop. The ELI utility assigns the iterator thread to a dedicated processor in a multi-processor system. The speculative thread runs/executes ahead of the execution of the nested loop and calculates indices in a corresponding multidimensional array. The iterator thread adds all the precomputed indices to a single queue. As a result, the ELI utility effectively enables a multidimensional loop to be replaced by a single dimensional loop. At the beginning of (or during) each iteration of the iterator, the ELI utility “dequeues” an entry from the queue to use the entry to access the array upon which the ELI utility iterates.

Type: Grant

Filed: November 2, 2010

Date of Patent: April 15, 2014

Assignee: International Business Machines Corporation

Inventors: Ganesh Bikshandi, Dibyendu Das, Smruti Ranjan Sarangi
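
The idea in the abstract above, where a separate iterator thread pre-computes array indices into a single queue so that the nested loop becomes one flat loop, might look roughly like this. The function names are hypothetical, and the thread is joined before consumption for simplicity, whereas the patent describes the iterator thread running ahead of the loop.

```python
from collections import deque
from threading import Thread


def fill_index_queue(rows, cols, queue):
    """Iterator thread: pre-compute every (i, j) index of the nested loop."""
    for i in range(rows):
        for j in range(cols):
            queue.append((i, j))


def sum_with_queue(matrix):
    """Sum a non-empty 2-D list using a single loop over pre-computed indices."""
    rows, cols = len(matrix), len(matrix[0])
    queue = deque()
    worker = Thread(target=fill_index_queue, args=(rows, cols, queue))
    worker.start()
    worker.join()  # simplification: wait for all indices before consuming
    total = 0
    while queue:  # single-dimensional loop replaces the nested loop
        i, j = queue.popleft()  # dequeue the next pre-computed index pair
        total += matrix[i][j]
    return total
```
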
Scalable property-sensitive points-to analysis for program code

Patent number: 8694971

Abstract: A novel system, computer program product, and method are disclosed for transforming a program to facilitate points-to analysis. The method begins with accessing at least a portion of program code, such as JavaScript. In one example, a method with at least one dynamic property correlation is identified for extraction. When a method m is identified for extraction with the dynamic property correlation, a body of the loop l in the method m is extracted. A new method mp is created to include the body of the loop l with the variable i as a parameter. The loop l is substituted in the program code of the method m with the new method mp to create a transformed program code.

Type: Grant

Filed: October 5, 2011

Date of Patent: April 8, 2014

Assignee: International Business Machines Corporation

Inventors: Satish Chandra, Julian Dolby, Manu Sridharan, Frank Tip
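
The extraction described in the abstract above, where the body of loop l in method m is moved into a new method mp that takes the loop variable i as a parameter, can be sketched as below. The abstract concerns JavaScript; Python is used here only for illustration, and the property-copying loop and all function names are assumptions.

```python
def copy_properties_original(source, target):
    """Method m: the loop l copies each property i from source to target."""
    for i in source:
        target[i] = source[i]


def copy_one_property(source, target, i):
    """New method mp: the extracted loop body, with i as a parameter."""
    target[i] = source[i]


def copy_properties_transformed(source, target):
    """Method m after the transformation: the loop body is a call to mp."""
    for i in source:
        copy_one_property(source, target, i)
```

Both versions produce the same target; the transformed one gives an analysis a separate method invocation per value of i.
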
Loop vectorization methods and apparatus

Publication number: 20140096119

Abstract: Loop vectorization methods and apparatus are disclosed. An example method includes setting a dynamic adjustment value of a vectorization loop; executing the vectorization loop to vectorize a loop by grouping iterations of the loop into one or more vectors; identifying a dependency between iterations of the loop; and setting the dynamic adjustment value based on the identified dependency.

Type: Application

Filed: September 28, 2012

Publication date: April 3, 2014

Inventors: Nalini Vasudevan, Jayashankar Bharadwaj, Christopher J. Hughes, Milind B. Girkar, Mark J. Charney, Robert Valentine, Victor W. Lee, Daehyun Kim, Albert Hartono, Sara S. Baghsorkhi
Static profitability control for speculative automatic parallelization

Patent number: 8677337

Abstract: A compilation method and mechanism for parallelizing program code. A method for compilation includes analyzing source code and identifying candidate code for parallelization. Having identified one or more suitable candidates, the profitability of parallelizing the candidate code is determined. If the profitability determination meets predetermined criteria, then the candidate code may be parallelized. If, however, the profitability determination does not meet the predetermined criteria, then the candidate code may not be parallelized. Candidate code may comprise a loop, and determining profitability of parallelization may include computing a probability of transaction failure for the loop. Additionally, a determination of an execution time of a parallelized version of the loop is made. If the determined execution time is less than an execution time of a non-parallelized version of said loop by at least a given amount, then the loop may be parallelized.

Type: Grant

Filed: May 1, 2008

Date of Patent: March 18, 2014

Assignee: Oracle America, Inc.

Inventors: Yonghong Song, Spiros Kalogeropulos, Partha P. Tirumalai
Processors and compiling methods for processors

Patent number: 8677330

Abstract: A compiling method compiles an object program to be executed by a processor having a plurality of execution units operable in parallel. In the method a first availability chain is created from a producer instruction (p1), scheduled for execution by a first one of the execution units (20: AGU), to a first consumer instruction (c1), scheduled for execution by a second one of the execution units (22: EXU) and requiring a value produced by the said producer instruction. The first availability chain comprises at least one move instruction (mv1-mv3) for moving the required value from a first point (20: ARF) accessible by the first execution unit to a second point (22: DRF) accessible by the second execution unit.

Type: Grant

Filed: June 9, 2010

Date of Patent: March 18, 2014

Assignee: Altera Corporation

Inventors: Marcio Merino Fernandes, Raymond Malcolm Livesley
Data dependence testing for loop fusion with code replication, array contraction, and loop interchange

Patent number: 8677338

Abstract: Methods and apparatus for data dependence testing for loop fusion, e.g., with code replication, array contraction, and/or loop interchange, are described. In one embodiment, a compiler may optimize code for efficient execution during run-time by testing for dependencies associated with improving memory locality through code replication in loops that enable various loop transformations. Other embodiments are also described.

Type: Grant

Filed: June 4, 2008

Date of Patent: March 18, 2014

Assignee: Intel Corporation

Inventors: John L. Ng, Rakesh Krishnaiyer, Alexander Y. Ostanevich
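
For readers unfamiliar with the transformations named in the abstract above, the sketch below shows loop fusion followed by array contraction on a case where no iteration depends on another. It illustrates only the transformations, not the patent's dependence tests; the function names and arithmetic are assumptions.

```python
def unfused(a):
    """Two separate loops communicating through a temporary array."""
    tmp = [0] * len(a)
    for i in range(len(a)):
        tmp[i] = a[i] * 2
    out = [0] * len(a)
    for i in range(len(a)):
        out[i] = tmp[i] + 1
    return out


def fused_contracted(a):
    """One fused loop; the temporary array is contracted to a scalar."""
    out = [0] * len(a)
    for i in range(len(a)):
        t = a[i] * 2  # replaces tmp[i]; legal because only iteration i reads it
        out[i] = t + 1
    return out
```

The fusion is valid here because the second loop reads `tmp[i]` only in the same iteration that wrote it, which is the kind of property a dependence test has to establish.
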
Tiling across loop nests with possible recomputation

Patent number: 8671401

Abstract: Described is a technology by which a series of loop nests corresponding to source code are detected by a compiler, with the series of loop nests tiled together (thereby increasing the ratio of cache hits to misses in a multi-processor environment). The compiler transforms the series of loop nests into a plurality of tile loops within a controller loop, including using dependency analysis to determine which results from a tile loop need to be pre-computed before another tile loop. For dependency analysis, the compiler may use a directed acyclic graph as a high-level intermediate representation, and split the graph into sub-graphs each representing an array. The compiler uses descriptors processed from the graph to determine the controller loop and the tile loops within that controller loop.

Type: Grant

Filed: April 9, 2007

Date of Patent: March 11, 2014

Assignee: Microsoft Corporation

Inventors: Siddhartha Puri, Jaydeep P. Marathe
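
A small picture of tiling two loop nests together, with recomputation at tile boundaries, is sketched below for the abstract above. The two-point stencil, the one-dimensional arrays, and the function names are assumptions chosen to keep the example short; the patent's graph-based dependency analysis is not modeled.

```python
def two_nests(a):
    """Original form: the first loop fills b, the second consumes b."""
    b = [x * 2 for x in a]
    return [b[i] + b[i + 1] for i in range(len(a) - 1)]


def tiled(a, tile):
    """Both loops run tile by tile inside one controller loop."""
    n = len(a) - 1
    c = [0] * n
    for start in range(0, n, tile):  # controller loop over tiles
        stop = min(start + tile, n)
        # First tile loop: values of b this tile needs, including one
        # boundary element that the next tile recomputes.
        b_tile = [a[i] * 2 for i in range(start, stop + 1)]
        for i in range(start, stop):  # second tile loop
            c[i] = b_tile[i - start] + b_tile[i - start + 1]
    return c
```

With a tile size of 2 and five input elements, the middle element of b is computed twice, once per tile, in exchange for each tile working on a small, cache-friendly slice.
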
Compiler apparatus

Patent number: RE45199

Abstract: A compiler apparatus, which can perform software pipelining optimization that has a considerable effect of reducing the number of execution cycles taken to complete a loop process, converts a source program into a machine program for a processor which is capable of parallel processing. The compiler apparatus is composed of: a parsing unit operable to parse the source program and then to convert the source program into an intermediate program which is described in an intermediate language; an optimization unit operable to optimize the intermediate program; and a conversion unit operable to convert the optimized intermediate program into the machine language program, wherein the optimization unit is operable to execute software pipelining, by inserting a transfer instruction, which is used for transferring data between operands, into a loop process included in the intermediate program so that a data dependence relation is changed.

Type: Grant

Filed: September 14, 2012

Date of Patent: October 14, 2014

Assignee: Panasonic Corporation

Inventors: Shohei Michimoto, Taketo Heishi, Hajime Ogawa, Teruo Kawabata
