Loop Compiling Patents (Class 717/150)
  • Patent number: 10095434
    Abstract: A compilation system can compile a program to be executed using an event driven tasks (EDT) system that requires knowledge of dependencies between program statement instances, and generate the required dependencies efficiently when a tiling transformation is applied. To this end, the system may use pre-tiling dependencies and can derive post-tiling dependencies via an analysis of the tiling to be applied.
    Type: Grant
    Filed: January 4, 2016
    Date of Patent: October 9, 2018
    Assignee: RESERVOIR LABS, INC.
    Inventors: Muthu M. Baskaran, Thomas Henretty, Ann Johnson, Athanasios Konstantinidis, M. H. Langston, Janice O. McMahon, Benoit J. Meister, Paul D. Mountcastle, Aale Naqvi, Benoit Pradelle, Tahina Ramananandro, Sanket Tavarageri, Richard A. Lethin
  • Patent number: 10095494
    Abstract: A system can generate and impose constraints on a compiler/scheduler so as to specifically minimize the footprints of one or more program variables. The constraints can be based on scopes of the variables and/or on dependence distances between statements specifying operations that use the one or more program variables.
    Type: Grant
    Filed: August 28, 2015
    Date of Patent: October 9, 2018
    Assignee: Reservoir Labs, Inc.
    Inventors: Benoit J. Meister, Muthu M. Baskaran, Richard A. Lethin
  • Patent number: 9760352
    Abstract: A program optimization method, executed by an arithmetic processing device, includes collecting profile information including a runtime analysis result by causing a computer to execute an original program to be optimized, calculating a calculation wait time based on the profile information, and generating a tuned-up program, when the calculation wait time is longer than a first threshold, by inserting an SIMD operation control line that performs an SIMD operation for an instruction in IF statement in the loop when an SIMD instruction ratio in the loop in the original program is lower than a second threshold.
    Type: Grant
    Filed: July 15, 2015
    Date of Patent: September 12, 2017
    Assignee: FUJITSU LIMITED
    Inventors: Shusaku Nakashima, Toshiya Naito
  • Patent number: 9507568
    Abstract: A high level programming language provides a nested communication operator that partitions a computational space. An indexable type with a rank and element type defines the computational space. The nested communication operator partitions a specified dimension of an index indexable type into segments specified by a segmentation vector and returns an output indexable type that represents the segments. By doing so, the nested communication operator allows data parallel algorithms to operate on the segments as individual units.
    Type: Grant
    Filed: December 9, 2010
    Date of Patent: November 29, 2016
    Assignee: Microsoft Technology Licensing, LLC
    Inventor: Paul F. Ringseth
  • Patent number: 9430196
    Abstract: In certain embodiments, source code that includes at least one request to send an asynchronous message between state machines is obtained at a computing system. The computing system determines whether the asynchronous message can be implemented as an inlined message. Executable machine code corresponding to the source code is compiled such that the asynchronous message is compiled as an inlined message.
    Type: Grant
    Filed: October 16, 2014
    Date of Patent: August 30, 2016
    Assignee: Cisco Technology, Inc.
    Inventors: Swaha Das Miller, Jonathan Gregory Rossie, Jr., Jamie Taylor
  • Patent number: 9430203
    Abstract: A storage unit stores source code including loop processing that is written with an array referenced by an index, a loop variable, and a parameter. A computing unit generates a conditional expression indicating that the index of the array satisfies a predetermined condition, using the loop variable and the parameter. The computing unit generates determination information on the parameter, by eliminating the loop variable from the conditional expression through formula manipulation. Then, the computing unit generates object code corresponding to the source code in accordance with the determination information.
    Type: Grant
    Filed: October 24, 2014
    Date of Patent: August 30, 2016
    Assignee: FUJITSU LIMITED
    Inventors: Kuninori Ishii, Masanori Yamanaka, Masaki Arai
  • Patent number: 9430201
    Abstract: Methods are disclosed of compiling a software application having multiple functions. At least one of the functions is identified as a targeted function having a significant contribution to performance of the software application. A code version of the targeted function is generated with one of multiple machine models corresponding to different target utilizations for a target architecture, specifically corresponding to the one with the greatest of the different target utilizations. The generated code version of the targeted function is matched with an application thread of the target architecture.
    Type: Grant
    Filed: August 21, 2014
    Date of Patent: August 30, 2016
    Assignee: Oracle International Corporation
    Inventors: Spiros Kalogeropulos, Partha Tirumalai
  • Patent number: 9424092
    Abstract: Heterogeneous thread scheduling techniques are described in which a processing workload is distributed to heterogeneous processing cores of a processing system. The heterogeneous thread scheduling may be implemented based upon a combination of periodic assessments of system-wide power management considerations used to control states of the processing cores and higher frequency thread-by-thread placement decisions that are made in accordance with thread specific policies. In one or more implementations, a system workload context is periodically analyzed for a processing system having heterogeneous cores including power efficient cores and performance oriented cores. Based on the periodic analysis, cores states are set for some of the heterogeneous cores to control activation of the power efficient cores and performance oriented cores for thread scheduling.
    Type: Grant
    Filed: September 26, 2014
    Date of Patent: August 23, 2016
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Neeraj Kumar Singh, Tristan A. Brown, Jeremiah S. Samli, Jason S. Wohlgemuth, Youssef Maged Barakat
  • Patent number: 9158506
    Abstract: Loop abstraction includes determining an original loop within the source code. The original loop includes a control statement and a loop body such that the original loop causes the loop body to be repeatedly executed based on the control statement. Further, output variables in the original loop and a number of blocks associated with the original loop are identified. The number of blocks is indicative of a count of unconditionally executed statement sets in which at least one output variable is computed. An abstract loop corresponding to the original loop is generated by adding a modified expression for accelerated assignment for each output variable in a subset of the output variables, and replacing the control statement with a bounded control statement. The original loop is replaced with the abstract loop for generating an abstract source code for the model checking.
    Type: Grant
    Filed: February 27, 2015
    Date of Patent: October 13, 2015
    Assignee: Tata Consultancy Services Limited
    Inventors: Priyanka Dilip Darke, Bharti Dewrao Chimdyalwar, Venkatesh R, Ulka Aniruddha Shrotri
  • Patent number: 9141357
    Abstract: A compiler determines executability of loop fusion, for each of a plurality of loops existing in a code to be processed, based on performance information of a system where the code to be processed is executed and based on operands and number of data transfers executed inside each of the loops. Then, the compiler executes fusion of loop processing in accordance with a determination result of executability of the loop fusion.
    Type: Grant
    Filed: April 17, 2014
    Date of Patent: September 22, 2015
    Assignee: FUJITSU LIMITED
    Inventors: Tomoko Nikko, Shuichi Chiba
  • Patent number: 9122513
    Abstract: A graph analytics appliance can be employed to extract data from a graph database in an efficient manner. The graph analytics appliance includes a router, a worklist scheduler, a processing unit, and an input/output unit. The router receives an abstraction program including a plurality of parallel algorithms for a query request from an abstraction program compiler residing on computational node or the graph analytics appliance. The worklist scheduler generates a prioritized plurality of parallel threads for executing the query request from the plurality of parallel algorithms. The processing unit executes multiple threads selected from the prioritized plurality of parallel threads. The input/output unit communicates with a graph database.
    Type: Grant
    Filed: November 28, 2012
    Date of Patent: September 1, 2015
    Assignee: International Business Machines Corporation
    Inventors: Arpith C. Jacob, Jude A. Rivers
  • Patent number: 9116738
    Abstract: A graph analytics appliance can be employed to extract data from a graph database in an efficient manner. The graph analytics appliance includes a router, a worklist scheduler, a processing unit, and an input/output unit. The router receives an abstraction program including a plurality of parallel algorithms for a query request from an abstraction program compiler residing on computational node or the graph analytics appliance. The worklist scheduler generates a prioritized plurality of parallel threads for executing the query request from the plurality of parallel algorithms. The processing unit executes multiple threads selected from the prioritized plurality of parallel threads. The input/output unit communicates with a graph database.
    Type: Grant
    Filed: November 13, 2012
    Date of Patent: August 25, 2015
    Assignee: International Business Machines Corporation
    Inventors: Arpith C. Jacob, Jude A. Rivers
  • Patent number: 9043769
    Abstract: The present invention relates to a method for compiling code for a multi-core processor, comprising: detecting and optimizing a loop, partitioning the loop into partitions executable and mappable on physical hardware with optimal instruction level parallelism, optimizing the loop iterations and/or loop counter for ideal mapping on hardware, chaining the loop partitions generating a list representing the execution sequence of the partitions.
    Type: Grant
    Filed: December 28, 2010
    Date of Patent: May 26, 2015
    Assignee: Hyperion Core Inc.
    Inventor: Martin Vorbach
  • Patent number: 9038041
    Abstract: A stream processing platform that provides fast execution of stream processing applications within a safe runtime environment. The platform includes a stream compiler that converts a representation of a stream processing application into executable program modules for a safe environment. The platform allows users to specify aspects of the program that contribute to generation of modules that execute as intended. A user may specify aspects to control a type of implementation for loops, order of execution for parallel paths, whether multiple instances of an operation can be performed in parallel or whether certain operations should be executed in separate threads. In addition, the stream compiler may generate executable modules in a way that cause a safe runtime environment to allocate memory or otherwise operate efficiently.
    Type: Grant
    Filed: December 22, 2006
    Date of Patent: May 19, 2015
    Assignee: TIBCO Software, Inc.
    Inventors: Jonathan Salz, Richard S. Tibbetts
  • Patent number: 9038042
    Abstract: Loop instructions are analyzed and assigned stage numbers based on dependencies between them and machine resources available. The loop instructions are selectively executed based on their stage numbers, thereby eliminating the need for explicit loop set-up and tear-down instructions. On a Single Instruction, Multiple Data machine, the final instance of each instruction may be executed on a subset of the processing elements or vector elements, dependent on the number of iterations of the original loop.
    Type: Grant
    Filed: June 29, 2012
    Date of Patent: May 19, 2015
    Assignee: ANALOG DEVICES, INC.
    Inventors: Michael G. Perkins, Andrew J. Higham
  • Publication number: 20150135171
    Abstract: A storage unit stores source code including loop processing that is written with an array referenced by an index, a loop variable, and a parameter. A computing unit generates a conditional expression indicating that the index of the array satisfies a predetermined condition, using the loop variable and the parameter. The computing unit generates determination information on the parameter, by eliminating the loop variable from the conditional expression through formula manipulation. Then, the computing unit generates object code corresponding to the source code in accordance with the determination information.
    Type: Application
    Filed: October 24, 2014
    Publication date: May 14, 2015
    Inventors: Kuninori ISHII, Masanori YAMANAKA, MASAKI ARAI
  • Patent number: 9009671
    Abstract: Crash notification between debuggers, including: initiating, by a first debugger, a first debug session of a first application; detecting, by the first debugger, an error condition in the first application; determining, by the first debugger, whether any variables utilized by the first application are related to variables utilized by a second application, wherein the second application is being debugged in a second debug session by a second debugger; and communicating, by the first debugger to a second debugger, information associated with the error condition in the first application.
    Type: Grant
    Filed: February 13, 2013
    Date of Patent: April 14, 2015
    Assignee: International Business Machines Corporation
    Inventors: Cary L. Bates, Justin K. King, Lee Nee, Michelle A. Schlicht
  • Patent number: 8997042
    Abstract: The current application is directed to flexible and run-time-modifiable implementation of crosscutting functionalities, including code instrumentation, error logging, and other such crosscutting functionalities. These crosscutting functionalities generally violate, or run counter to, modern code-development strategies and programming-language features that seek to partition logic into hierarchically organized compartments and modules with related functionalities, attribute values, and other common features. One feature of the methods and systems for implementing crosscutting functionalities to which the current application is directed is an intelligent switch that can be controlled, at run time, to alter invocation and behavior of crosscutting-functionality implementations, including data-collection instrumentation, error logging, and other crosscutting-functionality implementations.
    Type: Grant
    Filed: October 15, 2012
    Date of Patent: March 31, 2015
    Assignee: Pivotal Software, Inc.
    Inventors: John Victor Kew, Jonathan Travis
  • Publication number: 20150058832
    Abstract: System and methods for the parallelization of software applications are described. In some embodiments, a compiler may automatically identify within source code dependencies of a function called by another function. A persistent database may be generated to store identified dependencies. When calls the function are encountered within the source code, the persistent database may be checked, and a parallelized implementation of the function may be employed dependent upon the dependency indicated in the persistent database.
    Type: Application
    Filed: November 4, 2014
    Publication date: February 26, 2015
    Inventor: Jeffry E. Gonion
  • Patent number: 8966459
    Abstract: A compiling method compiles an object program to be executed by a processor having a plurality of execution units operable in parallel. In the method a first availability chain is created from a producer instruction (p1), scheduled for execution by a first one of the execution units (20: AGU), to a first consumer instruction (c1), scheduled for execution by a second one of the execution units (22: EXU) and requiring a value produced by the said producer instruction. The first availability chain comprises at least one move instruction (mv1-mv3) for moving the required value from a first point (20: ARF) accessible by the first execution unit to a second point (22: DRF) accessible by the second execution unit.
    Type: Grant
    Filed: February 20, 2014
    Date of Patent: February 24, 2015
    Assignee: Altera Corporation
    Inventors: Marcio Merino Fernandes, Raymond Malcolm Livesley
  • Patent number: 8959496
    Abstract: A tracing just-in-time (TJIT) compiler system is described for performing parallelization of code in a runtime phase in the execution of code. Upon detecting a hot loop during the execution of the code, the compiler system extracts trace information from sequentially recorded traces. In a first phase, the compiler system uses the trace information to identify at least one group of operation components that can be operated on in a parallel manner. In a second phase, the compiler system provides instructions which allocate the group of operation components to plural processing resources. A native code generator module carries out those instructions by recompiling native code that directs the operation of a native system to perform parallel processing. The compiler system terminates a group if it encounters program data in a loop iteration that is not consistent with previously encountered predicated information (upon which it records a new trace in a sequential manner).
    Type: Grant
    Filed: April 21, 2010
    Date of Patent: February 17, 2015
    Assignee: Microsoft Corporation
    Inventors: Wolfram Schulte, Nikolai Tillmann, Michal J. Moskal, Manuel A. Fahndrich, Daniel J P Leijen, Barend H. Venter
  • Patent number: 8959501
    Abstract: Embodiments are directed to implementing a generic SIMD data type in software code. In an embodiment, a computer system accesses a portion of software code that includes an algorithm with a generic SIMD data type that includes a variable number of elements. The algorithm with the generic SIMD data type is to be processed by a specific processor that includes various specific hardware features. The computer system determines at runtime a portion of customized processor-specific code that is to be used with the specified processor based on the generic SIMD data type, wherein the runtime determination resolves the number of elements that are to be used with the specified processor. The computer system also processes the software code including the algorithm with the generic SIMD data type using the determined, customized processor-specific code.
    Type: Grant
    Filed: December 14, 2010
    Date of Patent: February 17, 2015
    Assignee: Microsoft Corporation
    Inventors: Carol Thompson Eidt, David L. Detlefs
  • Patent number: 8949809
    Abstract: A system and associated method for automatically pipeline parallelizing a nested loop in sequential code over a predefined number of threads. Pursuant to task dependencies of the nested loop, each subloop of the nested loop are allocated to a respective thread. Combinations of stage partitions executing the nested loop are configured for parallel execution of a subloop where permitted. For each combination of stage partitions, a respective bottleneck is calculated and a combination with a minimum bottleneck is selected for parallelization.
    Type: Grant
    Filed: March 1, 2012
    Date of Patent: February 3, 2015
    Assignee: International Business Machines Corporation
    Inventors: Pradeep Varma, Manish Gupta, Monika Gupta, Naga Praveen Kumar Katta
  • Patent number: 8949807
    Abstract: A device receives, via a technical computing environment, a program that includes a parallel construct and a command to be executed by graphical processing units, and analyzes the program. The device also creates, based on the parallel construct and the analysis, one or more instances of the command to be executed in parallel by the graphical processing units, and transforms, via the technical computing environment, the one or more command instances into one or more command instances that are executable by the graphical processing units. The device further allocates the one or more transformed command instances to the graphical processing units for parallel execution, and receives, from the graphical processing units, one or more results associated with parallel execution of the one or more transformed command instances by the graphical processing units.
    Type: Grant
    Filed: September 30, 2013
    Date of Patent: February 3, 2015
    Assignee: The MathWorks, Inc.
    Inventors: Halldor N. Stefansson, Edric Ellis
  • Patent number: 8949808
    Abstract: Systems and methods for the vectorization of software applications are described. In some embodiments, a compiler may automatically generate both scalar and vector versions of a function from a single source code description. A vector interface may be exposed in a persistent dependency database that is associated with the function. This may allow a compiler to make vector function calls from within vectorized loops, rather than making multiple serialized scalar function calls from within a vectorized loop. This may in turn facilitate the vectorization of hierarchical code, which may improve application performance when vector execution resources are available.
    Type: Grant
    Filed: September 23, 2010
    Date of Patent: February 3, 2015
    Assignee: Apple Inc.
    Inventor: Jeffry E. Gonion
  • Patent number: 8930926
    Abstract: Methods, apparatus and computer software product for source code optimization are provided. In an exemplary embodiment, a first custom computing apparatus is used to optimize the execution of source code on a second computing apparatus. In this embodiment, the first custom computing apparatus contains a memory, a storage medium and at least one processor with at least one multi-stage execution unit. The second computing apparatus contains at least two multi-stage execution units that allow for parallel execution of tasks. The first custom computing apparatus optimizes the code for parallelism, locality of operations and contiguity of memory accesses on the second computing apparatus. This Abstract is provided for the sole purpose of complying with the Abstract requirement rules. This Abstract is submitted with the explicit understanding that it will not be used to interpret or to limit the scope or the meaning of the claims.
    Type: Grant
    Filed: April 16, 2010
    Date of Patent: January 6, 2015
    Assignee: Reservoir Labs, Inc.
    Inventors: Cedric Bastoul, Richard A. Lethin, Allen K. Leung, Benoit J. Meister, Peter Szilagyi, Nicolas T. Vasilache, David E. Wohlford
  • Patent number: 8914782
    Abstract: Source code is generated that includes one or more iterator-based expressions such as declarative queries. The source code is translated into an intermediate language that classifies operators making up the iterator-based expressions into classes based on whether the operators are aggregating, element-wise, or sink operators. The intermediate language, including the identified classes, is processed using an automaton to replace the iterator-based expressions with one or more equivalent non-iterator-based expressions. Where an iterator-based expression is nested, the nested expression is processed using an equivalent number of nested automatons. The resulting optimized source code may be compiled and executed using fewer virtual function calls than the equivalent non-optimized source code.
    Type: Grant
    Filed: November 10, 2010
    Date of Patent: December 16, 2014
    Assignee: Microsoft Corporation
    Inventors: Michael Isard, Yuan Yu, Derek Gordon Murray
  • Patent number: 8898794
    Abstract: One embodiment of a computer-implemented data structure synchronization mechanism comprises an interface for accessing a data structure and storing ownership data in a shared memory location. The method further comprises denying write operations if the thread attempting the write operation is not designated as the owner thread by said ownership data. The method further comprises denying requests to modify the ownership data if the thread making the request is not designated as the owner thread by said ownership data. The method further comprises effecting a write fence in the context of the thread making the request to modify ownership data prior to modifying the ownership data. Other embodiments are described.
    Type: Grant
    Filed: September 6, 2011
    Date of Patent: November 25, 2014
    Inventor: Andrei Teodor Borac
  • Publication number: 20140344793
    Abstract: An apparatus and method for executing code are provided. The apparatus includes a memory manager that allocates a stack in memory to store processed data that needs to be retained; a loop generator that divides program code programmed to be processed in parallel into regions based on a barrier function, transforms a region that includes the processed data that needs to be retained in the stack into a first coalescing loop, and transforms a region that uses the processed data stored in the stack into a second coalescing loop such that the transformed program code may be serially processed; and a loop changer that reverses a processing order of the second coalescing loop in comparison to a processing order of the first coalescing loop.
    Type: Application
    Filed: March 31, 2014
    Publication date: November 20, 2014
    Applicant: Samsung Electronics Co., Ltd.
    Inventors: Jin-Seok LEE, Seong-Gun KIM, Dong-Hoon YOO, Seok-Joong HWANG
  • Patent number: 8893104
    Abstract: The aspects enable a computing device to allocate memory space to variables during runtime compilation of a software application. A compiler may be modified to identify operations that can be performed on either a main pipe or an alternative pipe, identify chains of related operations that can be performed on either the main pipe or the alternative pipe, identify points in the execution of code at which the number of live values will exceed the number of registers, and choosing a chain of operations as a candidate to be moved to the alternative pipe in order to reduce the number of live values at identified points in the execution of code. The entire chosen chain of operations may be moved to the alternative pipe. The alternative pipe may perform the computations and return the results to the main pipe for execution.
    Type: Grant
    Filed: March 1, 2012
    Date of Patent: November 18, 2014
    Assignee: QUALCOMM Incorporated
    Inventors: Christopher A. Vick, Gregory M. Wright
  • Patent number: 8881124
    Abstract: According to the conventional loop parallelization method, when a loop in which a value of a loop-carried dependency variable can be calculated in all of the iterations without sequentially executing the loop from the start, it is determined that DOALL parallelization is not applicable due to the loop-carried dependency variable. Accordingly, the loop is sequentially executed or parallelized by using DOACROSS parallelization that executes a loop including a loop-carried dependency variable. That is, there is a problem that an expression including a loop-carried dependency cannot be parallelized and efficiently processed with use of a multi-processor. By generating initial value calculating codes, the loop-carried dependency in a source code prior to parallelization can be solved, and by dividing a loop included in the source code into subloops that can be executed in parallel, the multi-processor can efficiently process the source code.
    Type: Grant
    Filed: December 13, 2011
    Date of Patent: November 4, 2014
    Assignee: Panasonic Corporation
    Inventor: Daisuke Baba
  • Publication number: 20140325495
    Abstract: A computer implemented method entails identifying code regions in an application from which offloadable tasks can be generated by a compiler for heterogenous computing system with processor and accelerator memory, including adding relaxed semantics to a directive based language in the heterogenous computing for allowing a suggesting rather than specifying a parallel code region as an offloadable candidate, and identifying one or more offloadable tasks in a neighborhood of code region marked by the directive.
    Type: Application
    Filed: April 25, 2014
    Publication date: October 30, 2014
    Applicant: NEC Laboratories America, Inc.
    Inventors: Nishkam Ravi, Yi Yang, Srimat Chakradhar
  • Patent number: 8869127
    Abstract: Disclosed is a novel computer implemented system, on demand service, computer program product and a method that provides a set of lock usages that improves concurrency resulting in execution performance of the software application by reducing lock contention through refactoring. More specifically, disclosed is a method to refactor a software application. The method starts with accessing at least a portion of a software application that can execute in an operating environment where there are more two or more threads of execution. Next, a determination is made if there is at least one lock used in the software application to enforce limits on accessing a resource. In response to determining that there is a lock with a first type of construct with a given set of features, the software application is refactored with the lock to preserve behavior of the software application.
    Type: Grant
    Filed: January 3, 2011
    Date of Patent: October 21, 2014
    Assignee: International Business Machines Corporation
    Inventors: Julian Dolby, Manu Sridharan, Frank Tip, Max Schaefer
  • Patent number: 8850410
    Abstract: A system and method for improving software maintainability, performance, and/or security by associating a unique marker to each software code-block; the system comprising of a plurality of processors, a plurality of code-blocks, and a marker associated with each code-block. The system may also include a special hardware register (code-block marker hardware register) in each processor for identifying the markers of the code-blocks executed by the processor, without changing any of the plurality of code-blocks.
    Type: Grant
    Filed: January 29, 2010
    Date of Patent: September 30, 2014
    Assignee: International Business Machines Corporation
    Inventors: Ramanjaneya S. Burugula, Joefon Jann, Pratap C. Pattnaik
  • Patent number: 8843910
    Abstract: A facility for identifying functionally distinct memory access reorderings for a multithreaded program is described. The facility monitors execution of the program to detect, for each of one or more memory locations, an order in which the memory location was accessed by the threads of the program, each access being at least one of a read access and a write access. Among a number of possible memory access reorderings of a read access by a reading thread to a location and a write access by a writing thread to the same location where the write access preceded the read access, the facility identifies as functionally distinct memory access reorderings those possible memory access reorderings where the reading thread could have become newly aware of changed state of the writing thread as a result of the indicated read access.
    Type: Grant
    Filed: March 14, 2011
    Date of Patent: September 23, 2014
    Assignee: F5 Networks, Inc.
    Inventors: Andrew M. Schwerin, Peter J. Godman, Kaya Bekiroglu
  • Patent number: 8839219
    Abstract: An illustrative embodiment of a computer-implemented process for shared data prefetching and coalescing optimization versions a loop containing one or more shared references into an optimized loop and an un-optimized loop, transforms the optimized loop into a set of loops, and stores shared access associated information of the loop using a prologue loop in the set of loops. The shared access associated information pertains to remote data and is collected using the prologue loop in absence of network communication and builds a hash table. An associated data structure is updated each time the hash table is entered, and is sorted to remove duplicate entries and create a reduced data structure. Patterns across entries of the reduced data structure are identified and entries are coalesced. Data associated with a coalesced entry is pre-fetched using a single communication and a local buffer is populated with the fetched data for reuse.
    Type: Grant
    Filed: October 24, 2012
    Date of Patent: September 16, 2014
    Assignee: International Business Machines Corporation
    Inventors: Michail Alvanos, Ettore Tiotto
  • Patent number: 8826252
    Abstract: A system and method of compiling program code, wherein the program code includes an operation on an array of data elements stored in memory of a computer system. The program code is scanned for an equation which operates on data of lengths other than the limited number of vector supported data lengths. The equation is then replaced with vectorized machine executable code, wherein the machine executable code comprises a nested loop and wherein the nested loop comprises an exterior loop and a virtual interior loop. The exterior loop decomposes the equation into a plurality of loops of length N, wherein N is an integer greater than one. The virtual interior loop executes vector operations corresponding to the N length loop to form a result vector of length N, wherein the virtual interior loop includes one or more vector atomic memory operation (AMO) instructions, used to resolve false conflicts.
    Type: Grant
    Filed: June 12, 2009
    Date of Patent: September 2, 2014
    Assignee: Cray Inc.
    Inventor: Terry D. Greyzck
  • Patent number: 8813044
    Abstract: A method, system, and article of manufacture are disclosed for transforming a definition of a process for delivering a service. This service process definition is comprised of computer readable code. The method comprises the steps of expressing a given set of assumptions in a computer readable code; and transforming said process definition by using a processing unit to apply said assumptions to said process definition to change the configuration of the process definition. The process definition may be transformed by using factors relating to the specific context in or for which the process definition is executed. Also, the process definition may be transformed by identifying, in a flow diagram for the service process definition, flows to which the assumptions apply, and applying program rewriting techniques to those identified flows.
    Type: Grant
    Filed: September 6, 2012
    Date of Patent: August 19, 2014
    Assignee: International Business Machines Corporation
    Inventors: David F. Bantz, Steven J. Mastrianni, James R. Moulic, Dennis G. Shea
  • Patent number: 8813053
    Abstract: Systems and methods for parallel incomplete LU (ILU) factorization in distributed sparse linear systems, which order nodes underlying the equations in the system(s) by dividing nodes into interior nodes and boundary nodes and assigning no more than three codes to distinguish the boundary nodes. Each code determines an ordering of the nodes, which in turn determines the order in which the equations will be factored and the solution performed.
    Type: Grant
    Filed: September 25, 2012
    Date of Patent: August 19, 2014
    Assignee: Landmark Graphics Corporation
    Inventors: Qinghua Wang, James William Watts, III
  • Publication number: 20140229926
    Abstract: Apparatus, systems, and methods for a compiler are disclosed. One such compiler parses a human readable expression into a syntax tree and converts the syntax tree into an automaton having in-transitions and out-transitions. Converting can include unrolling the quantification as a function of in-degree limitations wherein in-degree limitations includes a limit on the number of transitions into a state of the automaton. The compiler can also convert the automaton into an image for programming a parallel machine, and publishes the image. Additional apparatus, systems, and methods are disclosed.
    Type: Application
    Filed: April 14, 2014
    Publication date: August 14, 2014
    Applicant: Micron Technology, Inc.
    Inventors: Junjuan Xu, Paul Glendenning
  • Patent number: 8806466
    Abstract: A program generation apparatus references a source program including a loop for executing a block N times (N?2) and having such dependence that a variable defined in a statement in the block pertaining to ith execution (1?i<N) is referenced by a statement in the block pertaining to jth execution (i<j?N), calculates equivalent representations of variables in the block pertaining to the ith execution and the block pertaining to any other execution than the ith execution, specifies, with respect to each representation of a target variable causing the dependence, a representation of a variable not causing the dependence that is equivalent to the representation of the target variable, and generates a program being for executing the block M times (M?N) and including a statement including the specified representation in place of each representation of the target variable.
    Type: Grant
    Filed: July 4, 2011
    Date of Patent: August 12, 2014
    Assignee: Panasonic Corporation
    Inventors: Akira Tanaka, Hiroyuki Morishita, Akihiko Inoue
  • Patent number: 8799629
    Abstract: A method of executing a loop over an integer index range of indices in a parallel manner includes assigning a plurality of index subsets of the integer index range to a corresponding plurality of threads, and defining for each index subset a start point of the index subset, an end point of the index subset, and a boundary point of the index subset positioned between the start point and the end point of the index subset. A portion of the index subset between the start point and the boundary point represents a private range and the portion of the index subset between the boundary point and the end point represents a public range. Loop code is executed by each thread based on the index subset of the integer index range assigned to the thread.
    Type: Grant
    Filed: December 4, 2008
    Date of Patent: August 5, 2014
    Assignee: Microsoft Corporation
    Inventors: Huseyin S. Yildiz, Stephen S. Toub, Paul Ringseth, John Duffy
  • Patent number: 8799881
    Abstract: According to one embodiment, a parallelizing unit divides a loop into first and second processes based on a program to be converted and division information. The first and second processes respectively have termination control information, loop control information, and change information. The parallelizing unit inserts into the first process a determination process determining whether the second process is terminated at execution of an (n?1)th iteration of the second process when the second process is subsequent to the first process or determining whether the second process is terminated at execution of an nth iteration of the second process when the second process precedes the first process. The parallelizing unit inserts into the second process a control process controlling execution of the second process based on the result of determination notified by the determination process.
    Type: Grant
    Filed: July 12, 2011
    Date of Patent: August 5, 2014
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Nobuaki Tojo, Hidenori Matsuzaki
  • Patent number: 8793675
    Abstract: Methods and apparatus to provide loop parallelization based on loop splitting and/or index array are described. In one embodiment, one or more split loops, corresponding to an original loop, are generated based on the mis-speculation information. In another embodiment, a plurality of subloops are generated from an original loop based on an index array. Other embodiments are also described.
    Type: Grant
    Filed: December 24, 2010
    Date of Patent: July 29, 2014
    Assignee: Intel Corporation
    Inventors: Jin Lin, Nishkam Ravi, Xinmin Tian, John L. Ng, Renat V. Valiullin
  • Patent number: 8769507
    Abstract: A method, system, and article of manufacture are disclosed for transforming a definition of a process for delivering a service on a specified computing device. This service process definition is comprised of computer readable code. The method comprises the steps of expressing a given set of assumptions in a computer readable code; and transforming the definition by using a processing unit to apply the assumptions to the definition of the process to change the way in which the process operates. The definition of the process may be transformed by using factors relating to the specific context in or for which the definition is executed. Also, the definition may be transformed by identifying, in a flow diagram for the process, flows to which the assumptions apply, and applying program rewriting techniques to those identified flows.
    Type: Grant
    Filed: May 14, 2009
    Date of Patent: July 1, 2014
    Assignee: International Business Machines Corporation
    Inventors: David F. Bantz, Steven J. Mastrianni, James R. Moulic, Dennis G. Shea
  • Patent number: 8762968
    Abstract: Prefetching irregular memory references into a software controlled cache is provided. A compiler analyzes source code to identify at least one of a plurality of loops that contain an irregular memory reference. The compiler determines if the irregular memory reference within the at least one loop is a candidate for optimization. Responsive to an indication that the irregular memory reference may be optimized, the compiler determines if the irregular memory reference is valid for prefetching. Responsive to an indication that the irregular memory reference is valid for prefetching, a store statement for an address of the irregular memory reference is inserted into the at least one loop. A runtime library call is inserted into a prefetch runtime library for the irregular memory reference. Data associated with the irregular memory reference is prefetched into the software controlled cache when the runtime library call is invoked.
    Type: Grant
    Filed: June 27, 2012
    Date of Patent: June 24, 2014
    Assignee: International Business Machines Corporation
    Inventors: Tong Chen, Marc Gonzalez tallada, Zehra N. Sura, Tao Zhang
  • Publication number: 20140173575
    Abstract: A compiling method compiles an object program to be executed by a processor having a plurality of execution units operable in parallel. In the method a first availability chain is created from a producer instruction (p1), scheduled for execution by a first one of the execution units (20: AGU), to a first consumer instruction (c1), scheduled for execution by a second one of the execution units (22: EXU) and requiring a value produced by the said producer instruction. The first availability chain comprises at least one move instruction (mv1-mv3) for moving the required value from a first point (20: ARF) accessible by the first execution unit to a second point (22: DRF) accessible by the second execution unit.
    Type: Application
    Filed: February 20, 2014
    Publication date: June 19, 2014
    Applicant: Altera Corporation
    Inventors: Marcio Merino Fernandes, Raymond Malcolm Livesley
  • Patent number: 8745607
    Abstract: According to one aspect of the present disclosure, a method and technique for reducing branch misprediction impact for nested loop code is disclosed. The method includes: responsive to identifying code having an outer loop and an inner loop, determining a quantity of iterations of the inner loop for an initial number of iterations of the outer loop; determining a number of processor cycles for executing the quantity of iterations of the inner loop for the initial number of iterations of the outer loop; determining whether the number of processor cycles is less than a threshold; and responsive to determining that the number of processor cycles is less than the threshold, fully unrolling the inner loop for the initial number of iterations of the outer loop.
    Type: Grant
    Filed: November 11, 2011
    Date of Patent: June 3, 2014
    Assignee: International Business Machines Corporation
    Inventors: Madhavi G. Valluri, Steven W. White
  • Patent number: 8745360
    Abstract: Embodiments of a method for performing parallel operations in a computer system when one or more conditional dependencies may be present, where a given conditional dependency includes a dependency associated with at least two data elements based on a pair of conditions. During operation, a processor receives instructions for generating one or more predicate values based on actual dependencies, where a given predicate value indicates data elements that may be safely evaluated in parallel, and where the given actual dependency occurs when the pair of conditions matches one or more criteria. Then, the processor executes the instructions for generating the one or more predicate values.
    Type: Grant
    Filed: September 24, 2008
    Date of Patent: June 3, 2014
    Assignee: Apple Inc.
    Inventors: Jeffry E. Gonion, Keith E. Diefendorff
  • Patent number: 8739026
    Abstract: The following is iteratively performed a number of times. Whether the markup language schema has an error is determined. Where the markup language schema has an error, the markup language schema is modified to attempt to correct the error.
    Type: Grant
    Filed: October 23, 2011
    Date of Patent: May 27, 2014
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventor: Yaron Naveh