Including Loop Patents (Class 717/160)
  • Patent number: 8959501
    Abstract: Embodiments are directed to implementing a generic SIMD data type in software code. In an embodiment, a computer system accesses a portion of software code that includes an algorithm with a generic SIMD data type that includes a variable number of elements. The algorithm with the generic SIMD data type is to be processed by a specific processor that includes various specific hardware features. The computer system determines at runtime a portion of customized processor-specific code that is to be used with the specified processor based on the generic SIMD data type, wherein the runtime determination resolves the number of elements that are to be used with the specified processor. The computer system also processes the software code including the algorithm with the generic SIMD data type using the determined, customized processor-specific code.
    Type: Grant
    Filed: December 14, 2010
    Date of Patent: February 17, 2015
    Assignee: Microsoft Corporation
    Inventors: Carol Thompson Eidt, David L. Detlefs
  • Patent number: 8943487
    Abstract: Particular embodiments optimize a C++ function comprising one or more loops for symbolic execution, comprising for each loop, if there is a branching condition within the loop, then rewrite the loop to move the branching condition outside the loop. Particular embodiments may further optimize the C++ function through simplified symbolic expressions and adding constructs forcing delayed interpretation of symbolic expressions during the symbolic execution.
    Type: Grant
    Filed: January 20, 2011
    Date of Patent: January 27, 2015
    Assignee: Fujitsu Limited
    Inventors: Guodong Li, Sreeranga P. Rajan, Indradeep Ghosh
  • Patent number: 8935684
    Abstract: A system, method and computer-readable medium are disclosed for improving the performance of a compiler. A set of source code instructions are processed to generate a plurality of source code instruction subsets, each of which is respectively associated with a mathematical operator. The source code subsets are then reordered to “hoist,” or place, a source code instruction subset associated with a product operator before a source code instruction subset associated with a summation operator. The plurality of source code instruction subsets are iteratively reordered until no source code instruction subset associated with a summation operator precedes a source code instruction subset associated with a product operator. A compiler is then used to compile the resulting reordered plurality of source code instruction subsets into a set of optimized object code instructions.
    Type: Grant
    Filed: December 13, 2012
    Date of Patent: January 13, 2015
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Mohammed Javed Absar
  • Patent number: 8930929
    Abstract: A reconfigurable processor which merges an inner loop and an outer loop which are included in a nested loop and allocates the merged loop to processing elements in parallel, thereby reducing processing time to process the nested loop. The reconfigurable processor may extract loop execution frequency information from the inner loop and the outer loop of the nested loop, and may merge the inner loop and the outer loop based on the extracted loop execution frequency information.
    Type: Grant
    Filed: April 14, 2011
    Date of Patent: January 6, 2015
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Min-Wook Ahn, Dong-Hoon Yoo, Jin-Seok Lee, Bernhard Egger, Tai-Song Jin, Won-Sub Kim, Hee-Jin Ahn
  • Publication number: 20150007154
    Abstract: Methods and systems to convert scalar computer program loops having loop carried dependences to vector computer program loops are disclosed. One example method and system generates a first predicate set associated with a first conditionally executed statement. The first predicate set contains a first set of predicates that cause a variable to be defined in a scalar computer program loop at or before the variable is defined by the first conditionally executed statement. The method and system also generates a second predicate set associated with the first conditionally executed statement. The second predicate set contains a second set of predicates that cause the variable to be used in the scalar computer program loop at or before the variable is defined by the first conditionally executed statement.
    Type: Application
    Filed: March 15, 2013
    Publication date: January 1, 2015
    Inventors: Jayashankar Bharadwaj, Nalini Vasudevan, Albert Hartono, Sara S. Baghsorkhi
  • Patent number: 8918770
    Abstract: A system and method for compiling includes, for a parallelizable code portion of an application stored on a computer readable storage medium, determining one or more variables that are to be transferred to and/or from a coprocessor if the parallelizable code portion were to be offloaded. A start location and an end location are determined for at least one of the one or more variables as a size in memory. The parallelizable code portion is transformed by inserting an offload construct around the parallelizable code portion and passing the one or more variables and the size as arguments of the offload construct such that the parallelizable code portion is offloaded to a coprocessor at runtime.
    Type: Grant
    Filed: August 24, 2012
    Date of Patent: December 23, 2014
    Assignee: NEC Laboratories America, Inc.
    Inventors: Nishkam Ravi, Tao Bao, Ozcan Ozturk, Srimat Chakradhar
  • Patent number: 8914782
    Abstract: Source code is generated that includes one or more iterator-based expressions such as declarative queries. The source code is translated into an intermediate language that classifies operators making up the iterator-based expressions into classes based on whether the operators are aggregating, element-wise, or sink operators. The intermediate language, including the identified classes, is processed using an automaton to replace the iterator-based expressions with one or more equivalent non-iterator-based expressions. Where an iterator-based expression is nested, the nested expression is processed using an equivalent number of nested automatons. The resulting optimized source code may be compiled and executed using fewer virtual function calls than the equivalent non-optimized source code.
    Type: Grant
    Filed: November 10, 2010
    Date of Patent: December 16, 2014
    Assignee: Microsoft Corporation
    Inventors: Michael Isard, Yuan Yu, Derek Gordon Murray
  • Patent number: 8904366
    Abstract: In one embodiment, the invention is a method and apparatus for use of vectorization instruction sets. One embodiment of a method for generating vector instructions includes receiving source code written in a high-level programming language, wherein the source code includes at least one high-level instruction that performs multiple operations on a plurality of vector operands, and compiling the high-level instruction(s) into one or more low-level instructions, wherein the low-level instructions are in an instruction set of a specific computer architecture.
    Type: Grant
    Filed: May 15, 2009
    Date of Patent: December 2, 2014
    Assignee: International Business Machines Corporation
    Inventors: Henrique Andrade, Bugra Gedik, Hua Yong Wang, Kun-Lung Wu
  • Publication number: 20140344795
    Abstract: A compiler determines executability of loop fusion, for each of a plurality of loops existing in a code to be processed, based on performance information of a system where the code to be processed is executed and based on operands and number of data transfers executed inside each of the loops. Then, the compiler executes fusion of loop processing in accordance with a determination result of executability of the loop fusion.
    Type: Application
    Filed: April 17, 2014
    Publication date: November 20, 2014
    Applicant: FUJITSU LIMITED
    Inventors: Tomoko Nikko, Shuichi Chiba
  • Patent number: 8893103
    Abstract: Methods and systems for asynchronous offload to many-core coprocessors include splitting a loop in an input source code into a sampling sub-part, a many integrated core (MIC) sub-part, and a central processing unit (CPU) sub-part; executing the sampling sub-part with a processor to determine loop characteristics including memory- and processor-operations executed by the loop; identifying optimal split boundaries based on the loop characteristics such that the MIC sub-part will complete in a same amount of time when executed on a MIC processor as the CPU sub-part will take when executed on a CPU; and modifying the input source code to split the loop at the identified boundaries, such that the MIC sub-part is executed on a MIC processor and the CPU sub-part is concurrently executed on a CPU.
    Type: Grant
    Filed: July 12, 2013
    Date of Patent: November 18, 2014
    Assignee: NEC Laboratories America, Inc.
    Inventors: Nishkam Ravi, Yi Yang, Srimat Chakradhar
  • Patent number: 8887142
    Abstract: Loop control flow diversion supports thread synchronization, garbage collection, and other situations involving suspension of long-running loops. Divertible loops have a loop body, a loop top, an indirection cell containing a loop top address, and a loop jump instruction sequence which references the indirection cell. In normal execution, control flows through the indirection cell to the loop top. After the indirection cell is altered, however, execution flow is diverted to a point away from the loop top. Operations such as garbage collection are performed while the loop (and hence the thread(s) using the loop) is thus diverted. The kernel or another thread then restores the loop top address into the indirection cell, and execution flow again continues through the restored indirection cell to the loop top.
    Type: Grant
    Filed: March 10, 2010
    Date of Patent: November 11, 2014
    Assignee: Microsoft Corporation
    Inventors: Scott Mosier, Michael McKenzie Magruder, Frank V. Peschel-Gallee
  • Publication number: 20140331216
    Abstract: A method and apparatus for translating a multithread program code are provided. The method includes: dividing a multithread program code into a plurality of statements according to a synchronization point; generating at least one loop group by combining one or more adjacent statements based on a number of instructions included in the plurality of statements; expanding or renaming variables in each of the plurality of statements so that each statement included in the at least one loop group is executed with respect to a work item of a different work group; and enclosing each of the generated at least one loop group respectively with a work item coalescing loop.
    Type: Application
    Filed: May 2, 2014
    Publication date: November 6, 2014
    Applicant: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Seong-Gun KIM, Dong-Hoon YOO, Jin-Seok LEE, Seok-Joong HWANG
  • Patent number: 8881124
    Abstract: According to the conventional loop parallelization method, when a loop in which a value of a loop-carried dependency variable can be calculated in all of the iterations without sequentially executing the loop from the start, it is determined that DOALL parallelization is not applicable due to the loop-carried dependency variable. Accordingly, the loop is sequentially executed or parallelized by using DOACROSS parallelization that executes a loop including a loop-carried dependency variable. That is, there is a problem that an expression including a loop-carried dependency cannot be parallelized and efficiently processed with use of a multi-processor. By generating initial value calculating codes, the loop-carried dependency in a source code prior to parallelization can be solved, and by dividing a loop included in the source code into subloops that can be executed in parallel, the multi-processor can efficiently process the source code.
    Type: Grant
    Filed: December 13, 2011
    Date of Patent: November 4, 2014
    Assignee: Panasonic Corporation
    Inventor: Daisuke Baba
  • Patent number: 8869129
    Abstract: An apparatus and method for scheduling an instruction are provided. The apparatus includes an analyzer configured to analyze dependency of a plurality of recurrence loops and a scheduler configured to schedule the recurrence loops based on the analyzed dependencies. When scheduling a plurality of recurrence loops, the apparatus first schedules a dominant loop whose loop head has no dependency on another loop among the recurrence loops.
    Type: Grant
    Filed: November 2, 2009
    Date of Patent: October 21, 2014
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Tae-wook Oh, Won-sub Kim, Bernhard Egger
  • Patent number: 8856768
    Abstract: A method and system are provided for deriving a resultant software program from an originating software program having overlapping branches, wherein the resultant software project has either no overlapping branches or fewer overlapping branches than the originating software program. A preferred embodiment of the invented method generates a resultant software program that has no overlapping branches. The resultant software is more easily converted into programming reconfigurable logic than the originating software program. Separate and individually applicable aspects of the invented method are used to eliminate all four possible states of two overlapping branches, i.e., forward branch overlapping forward branch, back branch overlapping back branch, and each of the two possible and distinguishable states of forward branch and back branch overlap. One or more elements of each aspect of the invention may be performed by one or more computers or processors, or by means of a computer or a communications network.
    Type: Grant
    Filed: January 30, 2012
    Date of Patent: October 7, 2014
    Inventor: Robert Keith Mykland
  • Patent number: 8856769
    Abstract: A method and system of the instruction packing and scaling are designed for simultaneously enhancing energy efficiency by concurrent and advanced prefetching/fetching instructions via the small and/or banked caches and for improving the performance of microprocessors by reducing the fraction of program and by employing the simple and fast caches. The invention is also designed for converting high fraction code to simplified, branch-reduced, and hidden code during compilation time, for storing packed/scaled code to concurrently accessible the plurality of caches and main memories, and for reverting the code to the native instructions during the instruction prefetch and fetch operations. Consequently, the invention does not forward many flow control instructions including procedure callers/returns and unconditional branches to microprocessors.
    Type: Grant
    Filed: October 23, 2012
    Date of Patent: October 7, 2014
    Inventor: Yong-Kyu Jung
  • Patent number: 8856762
    Abstract: A loop detection method, system, and article of manufacture for determining whether a sequence of unit processes continuously executed among unit processes in a program is a loop by means of computational processing performed by a computer. The method includes: reading address information on the sequence of unit processes; comparing an address of a unit process as a loop starting point candidate with an address of a last unit process in the sequence of unit processes; reading call stack information on the sequence of unit processes; comparing a call stack upon execution of the unit process as the loop starting point candidate with a call stack upon execution of the last unit process; outputting a determination result indicating that the sequence of unit processes forms a loop if the respective comparison results of the addresses and the call stacks match with each other.
    Type: Grant
    Filed: November 21, 2011
    Date of Patent: October 7, 2014
    Assignee: International Business Machines Corporation
    Inventor: Hiroshige Hayashizaki
  • Patent number: 8850410
    Abstract: A system and method for improving software maintainability, performance, and/or security by associating a unique marker to each software code-block; the system comprising of a plurality of processors, a plurality of code-blocks, and a marker associated with each code-block. The system may also include a special hardware register (code-block marker hardware register) in each processor for identifying the markers of the code-blocks executed by the processor, without changing any of the plurality of code-blocks.
    Type: Grant
    Filed: January 29, 2010
    Date of Patent: September 30, 2014
    Assignee: International Business Machines Corporation
    Inventors: Ramanjaneya S. Burugula, Joefon Jann, Pratap C. Pattnaik
  • Patent number: 8839219
    Abstract: An illustrative embodiment of a computer-implemented process for shared data prefetching and coalescing optimization versions a loop containing one or more shared references into an optimized loop and an un-optimized loop, transforms the optimized loop into a set of loops, and stores shared access associated information of the loop using a prologue loop in the set of loops. The shared access associated information pertains to remote data and is collected using the prologue loop in absence of network communication and builds a hash table. An associated data structure is updated each time the hash table is entered, and is sorted to remove duplicate entries and create a reduced data structure. Patterns across entries of the reduced data structure are identified and entries are coalesced. Data associated with a coalesced entry is pre-fetched using a single communication and a local buffer is populated with the fetched data for reuse.
    Type: Grant
    Filed: October 24, 2012
    Date of Patent: September 16, 2014
    Assignee: International Business Machines Corporation
    Inventors: Michail Alvanos, Ettore Tiotto
  • Patent number: 8826245
    Abstract: A method, system and program product for optimizing emulation of a suspected malware. The method includes identifying, using an emulation optimizer tool, whether an instruction in a suspected malware being emulated by an emulation engine in a virtual environment signifies a long loop and, if so, generating a first hash for the loop. Further, the method includes ascertaining whether the first hash generated matches any long loop entries in a storage and, if so calculating a second hash for the long loop. Furthermore, the method includes inspecting any long loop entries ascertained to find an entry having a respective second hash matching the second hash calculated. If an entry matching the second hash calculated is found, the method further includes updating one or more states of the emulation engine, such that, execution of the long loop of the suspected malware is skipped, which optimizes emulation of the suspected malware.
    Type: Grant
    Filed: May 23, 2013
    Date of Patent: September 2, 2014
    Assignee: International Business Machines Corporation
    Inventor: Ji Yan Wu
  • Patent number: 8826257
    Abstract: A method of memory disambiguation hardware to support software binary translation is provided. This method includes unrolling a set of instructions to be executed within a processor, the set of instructions having a number of memory operations. An original relative order of memory operations is determined. Then, possible reordering problems are detected and identified in software. The reordering problem being when a first memory operation has been reordered prior to and aliases to a second memory operation with respect to the original order of memory operations. The reordering problem is addressed and a relative order of memory operations to the processor is communicated.
    Type: Grant
    Filed: March 30, 2012
    Date of Patent: September 2, 2014
    Assignee: Intel Corporation
    Inventors: Muawya M. Al-Otoom, Paul Caprioli, Abhay S. Kanhere, Arvind Krishnaswamy, Omar M. Shaikh
  • Patent number: 8819651
    Abstract: A mechanism for efficient software cache accessing with handle reuse is provided. The mechanism groups references in source code into a reference stream with the reference stream having a size equal to or less than a size of a software cache line. The source code is transformed into optimized code by modifying the source code to include code for performing at most two cache lookup operations for the reference stream to obtain two cache line handles. Moreover, the transformation involves inserting code to resolve references in the reference stream based on the two cache line handles. The optimized code may be output for generation of executable code.
    Type: Grant
    Filed: July 22, 2008
    Date of Patent: August 26, 2014
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Marc Gonzalez Tallada, John K. O'Brien
  • Publication number: 20140237460
    Abstract: An optimizing compiler includes a vectorization mechanism that optimizes a computer program by substituting code that includes one or more vector instructions (vectorized code) for one or more scalar instructions. The cost of the vectorized code is compared to the cost of the code with only scalar instructions. When the cost of the vectorized code is less than the cost of the code with only scalar instructions, the vectorization mechanism determines whether the vectorized code will likely result in processor stalls. If not, the vectorization mechanism substitutes the vectorized code for the code with only scalar instructions. When the vectorized code will likely result in processor stalls, the vectorization mechanism does not substitute the vectorized code, and the code with only scalar instructions remains in the computer program.
    Type: Application
    Filed: March 9, 2013
    Publication date: August 21, 2014
    Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventor: William J. Schmidt
  • Patent number: 8813044
    Abstract: A method, system, and article of manufacture are disclosed for transforming a definition of a process for delivering a service. This service process definition is comprised of computer readable code. The method comprises the steps of expressing a given set of assumptions in a computer readable code; and transforming said process definition by using a processing unit to apply said assumptions to said process definition to change the configuration of the process definition. The process definition may be transformed by using factors relating to the specific context in or for which the process definition is executed. Also, the process definition may be transformed by identifying, in a flow diagram for the service process definition, flows to which the assumptions apply, and applying program rewriting techniques to those identified flows.
    Type: Grant
    Filed: September 6, 2012
    Date of Patent: August 19, 2014
    Assignee: International Business Machines Corporation
    Inventors: David F. Bantz, Steven J. Mastrianni, James R. Moulic, Dennis G. Shea
  • Patent number: 8806466
    Abstract: A program generation apparatus references a source program including a loop for executing a block N times (N≥2) and having such dependence that a variable defined in a statement in the block pertaining to ith execution (1≤i<N) is referenced by a statement in the block pertaining to jth execution (i<j≤N), calculates equivalent representations of variables in the block pertaining to the ith execution and the block pertaining to any other execution than the ith execution, specifies, with respect to each representation of a target variable causing the dependence, a representation of a variable not causing the dependence that is equivalent to the representation of the target variable, and generates a program being for executing the block M times (M≤N) and including a statement including the specified representation in place of each representation of the target variable.
    Type: Grant
    Filed: July 4, 2011
    Date of Patent: August 12, 2014
    Assignee: Panasonic Corporation
    Inventors: Akira Tanaka, Hiroyuki Morishita, Akihiko Inoue
  • Patent number: 8799881
    Abstract: According to one embodiment, a parallelizing unit divides a loop into first and second processes based on a program to be converted and division information. The first and second processes respectively have termination control information, loop control information, and change information. The parallelizing unit inserts into the first process a determination process determining whether the second process is terminated at execution of an (n-1)th iteration of the second process when the second process is subsequent to the first process or determining whether the second process is terminated at execution of an nth iteration of the second process when the second process precedes the first process. The parallelizing unit inserts into the second process a control process controlling execution of the second process based on the result of determination notified by the determination process.
    Type: Grant
    Filed: July 12, 2011
    Date of Patent: August 5, 2014
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Nobuaki Tojo, Hidenori Matsuzaki
  • Patent number: 8793675
    Abstract: Methods and apparatus to provide loop parallelization based on loop splitting and/or index array are described. In one embodiment, one or more split loops, corresponding to an original loop, are generated based on the mis-speculation information. In another embodiment, a plurality of subloops are generated from an original loop based on an index array. Other embodiments are also described.
    Type: Grant
    Filed: December 24, 2010
    Date of Patent: July 29, 2014
    Assignee: Intel Corporation
    Inventors: Jin Lin, Nishkam Ravi, Xinmin Tian, John L. Ng, Renat V. Valiullin
  • Publication number: 20140189667
    Abstract: Methods and apparatus to provide speculative memory disambiguation analysis and optimization with hardware support are described. In one embodiment, input code is analyzed to determine one or more memory locations to be accessed by the input program and output code is generated based on the input code and one or more assumptions about invariance of the one or more memory locations. The output code is generated also based on hardware transactional memory support and hardware dynamic disambiguation support. Other embodiments are also described.
    Type: Application
    Filed: December 29, 2012
    Publication date: July 3, 2014
    Inventors: Abhay S. Kanhere, Suriya Subramanian, Saurabh S. Shukla
  • Publication number: 20140189666
    Abstract: A method and apparatus for automatic pipeline are provided herein. Syntax elements may be manually inserted into the code, or automatically injected into the code. The syntax elements may specify hints such as data type parameters to independent functions allowing the functions to be automatically coalesced into a single loop, providing optimized data accesses to be coalesced for each function in the pipeline within the single loop. A run-time system produces optimized machine code for a target processor using syntax elements to guide the optimizations. Additionally, the pipeline may be executed. The pipeline includes the coalesced functions and data accesses.
    Type: Application
    Filed: December 27, 2012
    Publication date: July 3, 2014
    Inventors: Scott A. Krig, Michael D. Jeronimo
  • Patent number: 8769507
    Abstract: A method, system, and article of manufacture are disclosed for transforming a definition of a process for delivering a service on a specified computing device. This service process definition is comprised of computer readable code. The method comprises the steps of expressing a given set of assumptions in a computer readable code; and transforming the definition by using a processing unit to apply the assumptions to the definition of the process to change the way in which the process operates. The definition of the process may be transformed by using factors relating to the specific context in or for which the definition is executed. Also, the definition may be transformed by identifying, in a flow diagram for the process, flows to which the assumptions apply, and applying program rewriting techniques to those identified flows.
    Type: Grant
    Filed: May 14, 2009
    Date of Patent: July 1, 2014
    Assignee: International Business Machines Corporation
    Inventors: David F. Bantz, Steven J. Mastrianni, James R. Moulic, Dennis G. Shea
  • Publication number: 20140173576
    Abstract: A system, method and computer-readable medium are disclosed for improving the performance of a compiler. A set of source code instructions are processed to generate a plurality of source code instruction subsets, each of which is respectively associated with a mathematical operator. The source code subsets are then reordered to “hoist,” or place, a source code instruction subset associated with a product operator before a source code instruction subset associated with a summation operator. The plurality of source code instruction subsets are iteratively reordered until no source code instruction subset associated with a summation operator precedes a source code instruction subset associated with a product operator. A compiler is then used to compile the resulting reordered plurality of source code instruction subsets into a set of optimized object code instructions.
    Type: Application
    Filed: December 13, 2012
    Publication date: June 19, 2014
    Inventor: Mohammed Javed Absar
  • Patent number: 8752036
    Abstract: Embodiments of the invention provide systems and methods for throughput-aware software pipelining in compilers to produce optimal code for single-thread and multi-thread execution on multi-threaded systems. A loop is identified within source code as a candidate for software pipelining. An attempt is made to generate pipelined code (e.g., generate an instruction schedule and a set of register assignments) for the loop in satisfaction of throughput-aware pipelining criteria, like maximum register count, minimum trip count, target core pipeline resource utilization, maximum code size, etc. If the attempt fails to generate code in satisfaction of the criteria, embodiments adjust one or more settings (e.g., by reducing scalarity or latency settings being used to generate the instruction schedule).
    Type: Grant
    Filed: October 31, 2011
    Date of Patent: June 10, 2014
    Assignee: Oracle International Corporation
    Inventors: Spiros Kalogeropulos, Partha Tirumalai
  • Patent number: 8745607
    Abstract: According to one aspect of the present disclosure, a method and technique for reducing branch misprediction impact for nested loop code is disclosed. The method includes: responsive to identifying code having an outer loop and an inner loop, determining a quantity of iterations of the inner loop for an initial number of iterations of the outer loop; determining a number of processor cycles for executing the quantity of iterations of the inner loop for the initial number of iterations of the outer loop; determining whether the number of processor cycles is less than a threshold; and responsive to determining that the number of processor cycles is less than the threshold, fully unrolling the inner loop for the initial number of iterations of the outer loop.
    Type: Grant
    Filed: November 11, 2011
    Date of Patent: June 3, 2014
    Assignee: International Business Machines Corporation
    Inventors: Madhavi G. Valluri, Steven W. White
  • Patent number: 8738348
    Abstract: A method and mechanism for implementing a general purpose scripting language that supports parallel execution is described. In one approach, parallel execution is provided in a seamless and high-level approach rather than requiring or expecting a user to have low-level programming expertise with parallel processing languages/functions. Also described is a system and method for performing circuit simulation. The present approach provides methods and systems that create reusable and independent measurements for use with circuit simulators. Also disclosed are parallelizable measurements having looping constructs that can be run without interference between parallel iterations. Reusability is enhanced by having parameterized measurements. Revisions and history of the operating parameters of circuit designs subject to simulation are tracked.
    Type: Grant
    Filed: June 15, 2012
    Date of Patent: May 27, 2014
    Assignee: Cadence Design Systems, Inc.
    Inventor: Kenneth S. Kundert
  • Patent number: 8739141
    Abstract: A system and method for speculatively parallelizing non-countable loops in a multi-threaded application. A multi-core processor receives instructions for a multi-threaded application. The application may contain non-countable loops. Non-countable loops have an iteration count value that cannot be determined prior to the execution of the non-countable loop, a loop index value that cannot be non-speculatively determined prior to the execution of an iteration of the non-countable loop, and control that is not transferred out of the loop body by a code line in the loop body. The compiler replaces the non-countable loop with a parallelized loop pattern that uses outlined function calls defined in a parallelization library (PL) in order to speculatively execute iterations of the parallelized loop. The parallelized loop pattern is configured to squash and re-execute any speculative thread of the parallelized loop pattern that is signaled to have a transaction failure.
    Type: Grant
    Filed: May 19, 2008
    Date of Patent: May 27, 2014
    Assignee: Oracle America, Inc.
    Inventors: Yonghong Song, Spiros Kalogeropulos, Partha P. Tirumalai
  • Patent number: 8732679
    Abstract: A new computer-compiler architecture includes code analysis processes in which loops present in an intermediate instruction set are transformed into more efficient loops prior to fully executing the intermediate instruction set. The compiler architecture starts by generating the equivalent intermediate instructions for the original high level source code. For each loop in the intermediate instructions, a total cycle cost is calculated using a cycle cost table associated with the compiler. The compiler then generates intermediate code for replacement loops in which all conversion instructions are removed. The cycle costs for these new transformed loops are then compared against the total cycle cost for the original loops. If the total cycle costs exceed the new cycle costs, the compiler will replace the original loops in the intermediate instructions with the new transformed loops prior to generation of final code using the instruction set of the processor.
    Type: Grant
    Filed: March 16, 2010
    Date of Patent: May 20, 2014
    Assignee: QUALCOMM Incorporated
    Inventors: Sumesh Udayakumaran, Chihong Zhang
  • Patent number: 8726256
    Abstract: Apparatus, systems, and methods for a compiler are disclosed. One such compiler parses a human readable expression into a syntax tree and converts the syntax tree into an automaton having in-transitions and out-transitions. Converting can include unrolling the quantification as a function of in-degree limitations, wherein the in-degree limitations include a limit on the number of transitions into a state of the automaton. The compiler can also convert the automaton into an image for programming a parallel machine and publish the image. Additional apparatus, systems, and methods are disclosed.
    Type: Grant
    Filed: January 24, 2012
    Date of Patent: May 13, 2014
    Assignee: Micron Technology, Inc.
    Inventors: Junjuan Xu, Paul Glendenning
  • Patent number: 8726251
    Abstract: Embodiments of the invention provide systems and methods for automatically parallelizing loops with non-speculative pipelined execution of chunks of iterations with pre-computation of selected values. Non-DOALL loops are identified and divided into chunks. The chunks are assigned to separate logical threads, which may be further assigned to hardware threads. As a thread performs its runtime computations, subsequent threads attempt to pre-compute their respective chunks of the loop. These pre-computations may result in a set of assumed initial values and pre-computed final variable values associated with each chunk. As subsequent pre-computed chunks are reached at runtime, those assumed initial values can be verified to determine whether to proceed with runtime computation of the chunk or to avoid runtime execution and instead use the pre-computed final variable values.
    Type: Grant
    Filed: March 29, 2011
    Date of Patent: May 13, 2014
    Assignee: Oracle International Corporation
    Inventors: Spiros Kalogeropulos, Partha Pal Tirumalai
  • Patent number: 8713549
    Abstract: A method for vectorization of a block of code is provided. The method comprises receiving a first block of code as input; and converting the first block of code into at least a second block of code and a third block of code. The first block of code accesses a first set of memory addresses that are potentially misaligned. The second block of code performs conditional leaping address incrementation to selectively access a first subset of the first set of memory addresses. The third block of code accesses a second subset of the first set of memory addresses starting from an aligned memory address, simultaneously accessing multiple memory addresses at a time. No memory address belongs to both the first subset and the second subset of memory addresses.
    Type: Grant
    Filed: September 7, 2012
    Date of Patent: April 29, 2014
    Assignee: International Business Machines Corporation
    Inventors: Dorit Nuzman, Ira Rosen, Ayal Zaks
  • Publication number: 20140115569
    Abstract: A method and system of instruction packing and scaling are designed both to enhance energy efficiency through concurrent and advanced prefetching/fetching of instructions via small and/or banked caches and to improve the performance of microprocessors by reducing the fraction of program and by employing simple and fast caches. The invention is also designed for converting high fraction code to simplified, branch-reduced, and hidden code during compilation time, for storing packed/scaled code in a plurality of concurrently accessible caches and main memories, and for reverting the code to the native instructions during the instruction prefetch and fetch operations. Consequently, the invention does not forward many flow control instructions including procedure callers/returns and unconditional branches to microprocessors.
    Type: Application
    Filed: October 23, 2012
    Publication date: April 24, 2014
    Inventor: Yong-Kyu Jung
  • Publication number: 20140115560
    Abstract: A system for providing a computer configured to read an immutable value for a variable; read the value of the variable at a specific timestamp, thereby providing an ability to create looping constructs; set a current or next value of a loop variable as a function of previous or current loop variable values; read a set of all values that a variable will assume; push or scatter the values into unordered collections; and reduce the collections into a single value.
    Type: Application
    Filed: October 21, 2013
    Publication date: April 24, 2014
    Inventor: Luke Hutchison
  • Patent number: 8701098
    Abstract: A method, apparatus and program product are provided for parallelizing analysis and optimization in a compiler. A plurality of basic blocks and a subset of data points of a computer program are prepared for processing by a main thread selected from a plurality of hardware threads. The plurality of prepared basic blocks and subset of data points are placed in a shared data structure by the main thread. A prepared basic block of the plurality of prepared basic blocks and/or a tuple associated with the subset of data points is concurrently retrieved from the shared data structure by a work thread selected from the plurality of hardware threads. A compiler analysis or optimization is performed on the prepared basic block or tuple by the work thread.
    Type: Grant
    Filed: April 2, 2009
    Date of Patent: April 15, 2014
    Assignee: International Business Machines Corporation
    Inventors: Robert R. Roediger, William J. Schmidt
  • Patent number: 8701099
    Abstract: A method, a system and a computer program product for effectively accelerating loop iterators using speculative execution of iterators. An Efficient Loop Iterator (ELI) utility detects initiation of a target program and initiates/spawns a speculative iterator thread at the start of the basic code block ahead of the code block that initiates a nested loop. The ELI utility assigns the iterator thread to a dedicated processor in a multi-processor system. The speculative thread runs/executes ahead of the execution of the nested loop and calculates indices in a corresponding multidimensional array. The iterator thread adds all the precomputed indices to a single queue. As a result, the ELI utility effectively enables a multidimensional loop to be replaced by a single dimensional loop. At the beginning of (or during) each iteration of the iterator, the ELI utility “dequeues” an entry from the queue to use the entry to access the array upon which the ELI utility iterates.
    Type: Grant
    Filed: November 2, 2010
    Date of Patent: April 15, 2014
    Assignee: International Business Machines Corporation
    Inventors: Ganesh Bikshandi, Dibyendu Das, Smruti Ranjan Sarangi
  • Patent number: 8694971
    Abstract: A novel system, computer program product, and method are disclosed for transforming a program to facilitate points-to analysis. The method begins with accessing at least a portion of program code, such as JavaScript. In one example, a method with at least one dynamic property correlation is identified for extraction. When a method m is identified for extraction with the dynamic property correlation, a body of the loop l in the method m is extracted. A new method mp is created to include the body of the loop l with the variable i as a parameter. The loop l is substituted in the program code of the method m with the new method mp to create a transformed program code.
    Type: Grant
    Filed: October 5, 2011
    Date of Patent: April 8, 2014
    Assignee: International Business Machines Corporation
    Inventors: Satish Chandra, Julian Dolby, Manu Sridharan, Frank Tip
  • Publication number: 20140096119
    Abstract: Loop vectorization methods and apparatus are disclosed. An example method includes setting a dynamic adjustment value of a vectorization loop; executing the vectorization loop to vectorize a loop by grouping iterations of the loop into one or more vectors; identifying a dependency between iterations of the loop; and setting the dynamic adjustment value based on the identified dependency.
    Type: Application
    Filed: September 28, 2012
    Publication date: April 3, 2014
    Inventors: Nalini Vasudevan, Jayashankar Bharadwaj, Christopher J. Hughes, Milind B. Girkar, Mark J. Charney, Robert Valentine, Victor W. Lee, Daehyun Kim, Albert Hartono, Sara S. Baghsorkhi
  • Patent number: 8677337
    Abstract: A compilation method and mechanism for parallelizing program code. A method for compilation includes analyzing source code and identifying candidate code for parallelization. Having identified one or more suitable candidates, the profitability of parallelizing the candidate code is determined. If the profitability determination meets predetermined criteria, then the candidate code may be parallelized. If, however, the profitability determination does not meet the predetermined criteria, then the candidate code may not be parallelized. Candidate code may comprise a loop, and determining profitability of parallelization may include computing a probability of transaction failure for the loop. Additionally, a determination of an execution time of a parallelized version of the loop is made. If the determined execution time is less than an execution time of a non-parallelized version of said loop by at least a given amount, then the loop may be parallelized.
    Type: Grant
    Filed: May 1, 2008
    Date of Patent: March 18, 2014
    Assignee: Oracle America, Inc.
    Inventors: Yonghong Song, Spiros Kalogeropulos, Partha P. Tirumalai
  • Patent number: 8677330
    Abstract: A compiling method compiles an object program to be executed by a processor having a plurality of execution units operable in parallel. In the method, a first availability chain is created from a producer instruction (p1), scheduled for execution by a first one of the execution units (20: AGU), to a first consumer instruction (c1), scheduled for execution by a second one of the execution units (22: EXU) and requiring a value produced by said producer instruction. The first availability chain comprises at least one move instruction (mv1-mv3) for moving the required value from a first point (20: ARF) accessible by the first execution unit to a second point (22: DRF) accessible by the second execution unit.
    Type: Grant
    Filed: June 9, 2010
    Date of Patent: March 18, 2014
    Assignee: Altera Corporation
    Inventors: Marcio Merino Fernandes, Raymond Malcolm Livesley
  • Patent number: 8677338
    Abstract: Methods and apparatus for data dependence testing for loop fusion, e.g., with code replication, array contraction, and/or loop interchange, are described. In one embodiment, a compiler may optimize code for efficient execution during run-time by testing for dependencies associated with improving memory locality through code replication in loops that enable various loop transformations. Other embodiments are also described.
    Type: Grant
    Filed: June 4, 2008
    Date of Patent: March 18, 2014
    Assignee: Intel Corporation
    Inventors: John L. Ng, Rakesh Krishnaiyer, Alexander Y. Ostanevich
  • Patent number: 8671401
    Abstract: Described is a technology by which a series of loop nests corresponding to source code are detected by a compiler, with the series of loop nests tiled together (thereby increasing the ratio of cache hits to misses in a multi-processor environment). The compiler transforms the series of loop nests into a plurality of tile loops within a controller loop, including using dependency analysis to determine which results from a tile loop need to be pre-computed before another tile loop. For dependency analysis, the compiler may use a directed acyclic graph as a high-level intermediate representation, and split the graph into sub-graphs each representing an array. The compiler uses descriptors processed from the graph to determine the controller loop and the tile loops within that controller loop.
    Type: Grant
    Filed: April 9, 2007
    Date of Patent: March 11, 2014
    Assignee: Microsoft Corporation
    Inventors: Siddhartha Puri, Jaydeep P. Marathe
  • Patent number: RE45199
    Abstract: A compiler apparatus, which can perform software pipelining optimization that has a considerable effect of reducing the number of execution cycles taken to complete a loop process, converts a source program into a machine language program for a processor which is capable of parallel processing. The compiler apparatus is composed of: a parsing unit operable to parse the source program and then to convert the source program into an intermediate program which is described in an intermediate language; an optimization unit operable to optimize the intermediate program; and a conversion unit operable to convert the optimized intermediate program into the machine language program, wherein the optimization unit is operable to execute software pipelining, by inserting a transfer instruction, which is used for transferring data between operands, into a loop process included in the intermediate program so that a data dependence relation is changed.
    Type: Grant
    Filed: September 14, 2012
    Date of Patent: October 14, 2014
    Assignee: Panasonic Corporation
    Inventors: Shohei Michimoto, Taketo Heishi, Hajime Ogawa, Teruo Kawabata
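
Several abstracts in this list describe cost-based decisions about whether to replace a loop with a transformed version. As an illustration only, the cycle-cost comparison outlined in the abstract of patent 8732679 might be sketched as follows. This is a minimal sketch, not the patented implementation: the opcode names, the cost values in CYCLE_COST, the WIDEN mapping, the fixed trip count, and all function names are hypothetical and chosen purely for the example.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical cycle cost table; a real compiler's table is target-specific.
CYCLE_COST = {
    "load": 3, "store": 3,
    "add16": 1, "add32": 1,
    "mul16": 2, "mul32": 4,
    "convert": 2,
}

# Hypothetical mapping from a narrow operation to the wider operation that
# makes the surrounding conversion instructions unnecessary.
WIDEN = {"add16": "add32", "mul16": "mul32"}


@dataclass(frozen=True)
class Loop:
    body: Tuple[str, ...]  # opcodes executed once per iteration
    trip_count: int        # assumed known at compile time for simplicity


def total_cycle_cost(loop: Loop) -> int:
    """Sum the table cost of the body and scale by the trip count."""
    return loop.trip_count * sum(CYCLE_COST[op] for op in loop.body)


def without_conversions(loop: Loop) -> Loop:
    """Build a replacement loop: drop conversions, widen the arithmetic."""
    body = tuple(WIDEN.get(op, op) for op in loop.body if op != "convert")
    return Loop(body, loop.trip_count)


def choose_loop(original: Loop) -> Loop:
    """Keep the replacement only if the original costs strictly more."""
    candidate = without_conversions(original)
    if total_cycle_cost(original) > total_cycle_cost(candidate):
        return candidate
    return original


if __name__ == "__main__":
    cheap = Loop(("load", "convert", "add16", "convert", "store"), 100)
    costly = Loop(("load", "convert", "mul16", "mul16", "mul16", "store"), 10)
    print(choose_loop(cheap).body)      # conversions removed
    print(choose_loop(costly).body)     # original kept: wide multiplies cost more
```

With these made-up costs, removing the conversions pays off for the addition loop but not for the multiply-heavy loop, where the wider multiplies outweigh the saved conversions. That is why the comparison is made per loop rather than applied unconditionally.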