Loop Compiling Patents (Class 717/150)
  • Patent number: 8949807
    Abstract: A device receives, via a technical computing environment, a program that includes a parallel construct and a command to be executed by graphical processing units, and analyzes the program. The device also creates, based on the parallel construct and the analysis, one or more instances of the command to be executed in parallel by the graphical processing units, and transforms, via the technical computing environment, the one or more command instances into one or more command instances that are executable by the graphical processing units. The device further allocates the one or more transformed command instances to the graphical processing units for parallel execution, and receives, from the graphical processing units, one or more results associated with parallel execution of the one or more transformed command instances by the graphical processing units.
    Type: Grant
    Filed: September 30, 2013
    Date of Patent: February 3, 2015
    Assignee: The MathWorks, Inc.
    Inventors: Halldor N. Stefansson, Edric Ellis
  • Patent number: 8930926
    Abstract: Methods, apparatus and computer software product for source code optimization are provided. In an exemplary embodiment, a first custom computing apparatus is used to optimize the execution of source code on a second computing apparatus. In this embodiment, the first custom computing apparatus contains a memory, a storage medium and at least one processor with at least one multi-stage execution unit. The second computing apparatus contains at least two multi-stage execution units that allow for parallel execution of tasks. The first custom computing apparatus optimizes the code for parallelism, locality of operations and contiguity of memory accesses on the second computing apparatus. This Abstract is provided for the sole purpose of complying with the Abstract requirement rules. This Abstract is submitted with the explicit understanding that it will not be used to interpret or to limit the scope or the meaning of the claims.
    Type: Grant
    Filed: April 16, 2010
    Date of Patent: January 6, 2015
    Assignee: Reservoir Labs, Inc.
    Inventors: Cedric Bastoul, Richard A. Lethin, Allen K. Leung, Benoit J. Meister, Peter Szilagyi, Nicolas T. Vasilache, David E. Wohlford
  • Patent number: 8914782
    Abstract: Source code is generated that includes one or more iterator-based expressions such as declarative queries. The source code is translated into an intermediate language that classifies operators making up the iterator-based expressions into classes based on whether the operators are aggregating, element-wise, or sink operators. The intermediate language, including the identified classes, is processed using an automaton to replace the iterator-based expressions with one or more equivalent non-iterator-based expressions. Where an iterator-based expression is nested, the nested expression is processed using an equivalent number of nested automatons. The resulting optimized source code may be compiled and executed using fewer virtual function calls than the equivalent non-optimized source code.
    Type: Grant
    Filed: November 10, 2010
    Date of Patent: December 16, 2014
    Assignee: Microsoft Corporation
    Inventors: Michael Isard, Yuan Yu, Derek Gordon Murray
  • Patent number: 8898794
    Abstract: One embodiment of a computer-implemented data structure synchronization mechanism comprises an interface for accessing a data structure and storing ownership data in a shared memory location. The method further comprises denying write operations if the thread attempting the write operation is not designated as the owner thread by said ownership data. The method further comprises denying requests to modify the ownership data if the thread making the request is not designated as the owner thread by said ownership data. The method further comprises effecting a write fence in the context of the thread making the request to modify ownership data prior to modifying the ownership data. Other embodiments are described.
    Type: Grant
    Filed: September 6, 2011
    Date of Patent: November 25, 2014
    Inventor: Andrei Teodor Borac
  • Publication number: 20140344793
    Abstract: An apparatus and method for executing code are provided. The apparatus includes a memory manager that allocates a stack in memory to store processed data that needs to be retained; a loop generator that divides program code programmed to be processed in parallel into regions based on a barrier function, transforms a region that includes the processed data that needs to be retained in the stack into a first coalescing loop, and transforms a region that uses the processed data stored in the stack into a second coalescing loop such that the transformed program code may be serially processed; and a loop changer that reverses a processing order of the second coalescing loop in comparison to a processing order of the first coalescing loop.
    Type: Application
    Filed: March 31, 2014
    Publication date: November 20, 2014
    Applicant: Samsung Electronics Co., Ltd.
    Inventors: Jin-Seok LEE, Seong-Gun KIM, Dong-Hoon YOO, Seok-Joong HWANG
  • Patent number: 8893104
    Abstract: The aspects enable a computing device to allocate memory space to variables during runtime compilation of a software application. A compiler may be modified to identify operations that can be performed on either a main pipe or an alternative pipe, identify chains of related operations that can be performed on either the main pipe or the alternative pipe, identify points in the execution of code at which the number of live values will exceed the number of registers, and choose a chain of operations as a candidate to be moved to the alternative pipe in order to reduce the number of live values at identified points in the execution of code. The entire chosen chain of operations may be moved to the alternative pipe. The alternative pipe may perform the computations and return the results to the main pipe for execution.
    Type: Grant
    Filed: March 1, 2012
    Date of Patent: November 18, 2014
    Assignee: QUALCOMM Incorporated
    Inventors: Christopher A. Vick, Gregory M. Wright
  • Patent number: 8881124
    Abstract: According to the conventional loop parallelization method, when a loop in which a value of a loop-carried dependency variable can be calculated in all of the iterations without sequentially executing the loop from the start, it is determined that DOALL parallelization is not applicable due to the loop-carried dependency variable. Accordingly, the loop is sequentially executed or parallelized by using DOACROSS parallelization that executes a loop including a loop-carried dependency variable. That is, there is a problem that an expression including a loop-carried dependency cannot be parallelized and efficiently processed with use of a multi-processor. By generating initial value calculating codes, the loop-carried dependency in a source code prior to parallelization can be solved, and by dividing a loop included in the source code into subloops that can be executed in parallel, the multi-processor can efficiently process the source code.
    Type: Grant
    Filed: December 13, 2011
    Date of Patent: November 4, 2014
    Assignee: Panasonic Corporation
    Inventor: Daisuke Baba
  • Publication number: 20140325495
    Abstract: A computer implemented method entails identifying code regions in an application from which offloadable tasks can be generated by a compiler for heterogenous computing system with processor and accelerator memory, including adding relaxed semantics to a directive based language in the heterogenous computing for allowing a suggesting rather than specifying a parallel code region as an offloadable candidate, and identifying one or more offloadable tasks in a neighborhood of code region marked by the directive.
    Type: Application
    Filed: April 25, 2014
    Publication date: October 30, 2014
    Applicant: NEC Laboratories America, Inc.
    Inventors: Nishkam Ravi, Yi Yang, Srimat Chakradhar
  • Patent number: 8869127
    Abstract: Disclosed is a novel computer implemented system, on demand service, computer program product and a method that provides a set of lock usages that improves concurrency resulting in execution performance of the software application by reducing lock contention through refactoring. More specifically, disclosed is a method to refactor a software application. The method starts with accessing at least a portion of a software application that can execute in an operating environment where there are two or more threads of execution. Next, a determination is made if there is at least one lock used in the software application to enforce limits on accessing a resource. In response to determining that there is a lock with a first type of construct with a given set of features, the software application is refactored with the lock to preserve behavior of the software application.
    Type: Grant
    Filed: January 3, 2011
    Date of Patent: October 21, 2014
    Assignee: International Business Machines Corporation
    Inventors: Julian Dolby, Manu Sridharan, Frank Tip, Max Schaefer
  • Patent number: 8850410
    Abstract: A system and method for improving software maintainability, performance, and/or security by associating a unique marker to each software code-block; the system comprising of a plurality of processors, a plurality of code-blocks, and a marker associated with each code-block. The system may also include a special hardware register (code-block marker hardware register) in each processor for identifying the markers of the code-blocks executed by the processor, without changing any of the plurality of code-blocks.
    Type: Grant
    Filed: January 29, 2010
    Date of Patent: September 30, 2014
    Assignee: International Business Machines Corporation
    Inventors: Ramanjaneya S. Burugula, Joefon Jann, Pratap C. Pattnaik
  • Patent number: 8843910
    Abstract: A facility for identifying functionally distinct memory access reorderings for a multithreaded program is described. The facility monitors execution of the program to detect, for each of one or more memory locations, an order in which the memory location was accessed by the threads of the program, each access being at least one of a read access and a write access. Among a number of possible memory access reorderings of a read access by a reading thread to a location and a write access by a writing thread to the same location where the write access preceded the read access, the facility identifies as functionally distinct memory access reorderings those possible memory access reorderings where the reading thread could have become newly aware of changed state of the writing thread as a result of the indicated read access.
    Type: Grant
    Filed: March 14, 2011
    Date of Patent: September 23, 2014
    Assignee: F5 Networks, Inc.
    Inventors: Andrew M. Schwerin, Peter J. Godman, Kaya Bekiroglu
  • Patent number: 8839219
    Abstract: An illustrative embodiment of a computer-implemented process for shared data prefetching and coalescing optimization versions a loop containing one or more shared references into an optimized loop and an un-optimized loop, transforms the optimized loop into a set of loops, and stores shared access associated information of the loop using a prologue loop in the set of loops. The shared access associated information pertains to remote data and is collected using the prologue loop in absence of network communication and builds a hash table. An associated data structure is updated each time the hash table is entered, and is sorted to remove duplicate entries and create a reduced data structure. Patterns across entries of the reduced data structure are identified and entries are coalesced. Data associated with a coalesced entry is pre-fetched using a single communication and a local buffer is populated with the fetched data for reuse.
    Type: Grant
    Filed: October 24, 2012
    Date of Patent: September 16, 2014
    Assignee: International Business Machines Corporation
    Inventors: Michail Alvanos, Ettore Tiotto
  • Patent number: 8826252
    Abstract: A system and method of compiling program code, wherein the program code includes an operation on an array of data elements stored in memory of a computer system. The program code is scanned for an equation which operates on data of lengths other than the limited number of vector supported data lengths. The equation is then replaced with vectorized machine executable code, wherein the machine executable code comprises a nested loop and wherein the nested loop comprises an exterior loop and a virtual interior loop. The exterior loop decomposes the equation into a plurality of loops of length N, wherein N is an integer greater than one. The virtual interior loop executes vector operations corresponding to the N length loop to form a result vector of length N, wherein the virtual interior loop includes one or more vector atomic memory operation (AMO) instructions, used to resolve false conflicts.
    Type: Grant
    Filed: June 12, 2009
    Date of Patent: September 2, 2014
    Assignee: Cray Inc.
    Inventor: Terry D. Greyzck
  • Patent number: 8813044
    Abstract: A method, system, and article of manufacture are disclosed for transforming a definition of a process for delivering a service. This service process definition is comprised of computer readable code. The method comprises the steps of expressing a given set of assumptions in a computer readable code; and transforming said process definition by using a processing unit to apply said assumptions to said process definition to change the configuration of the process definition. The process definition may be transformed by using factors relating to the specific context in or for which the process definition is executed. Also, the process definition may be transformed by identifying, in a flow diagram for the service process definition, flows to which the assumptions apply, and applying program rewriting techniques to those identified flows.
    Type: Grant
    Filed: September 6, 2012
    Date of Patent: August 19, 2014
    Assignee: International Business Machines Corporation
    Inventors: David F. Bantz, Steven J. Mastrianni, James R. Moulic, Dennis G. Shea
  • Patent number: 8813053
    Abstract: Systems and methods for parallel incomplete LU (ILU) factorization in distributed sparse linear systems, which order nodes underlying the equations in the system(s) by dividing nodes into interior nodes and boundary nodes and assigning no more than three codes to distinguish the boundary nodes. Each code determines an ordering of the nodes, which in turn determines the order in which the equations will be factored and the solution performed.
    Type: Grant
    Filed: September 25, 2012
    Date of Patent: August 19, 2014
    Assignee: Landmark Graphics Corporation
    Inventors: Qinghua Wang, James William Watts, III
  • Publication number: 20140229926
    Abstract: Apparatus, systems, and methods for a compiler are disclosed. One such compiler parses a human readable expression into a syntax tree and converts the syntax tree into an automaton having in-transitions and out-transitions. Converting can include unrolling the quantification as a function of in-degree limitations wherein in-degree limitations includes a limit on the number of transitions into a state of the automaton. The compiler can also convert the automaton into an image for programming a parallel machine, and publishes the image. Additional apparatus, systems, and methods are disclosed.
    Type: Application
    Filed: April 14, 2014
    Publication date: August 14, 2014
    Applicant: Micron Technology, Inc.
    Inventors: Junjuan Xu, Paul Glendenning
  • Patent number: 8806466
    Abstract: A program generation apparatus references a source program including a loop for executing a block N times (N≥2) and having such dependence that a variable defined in a statement in the block pertaining to ith execution (1≤i<N) is referenced by a statement in the block pertaining to jth execution (i<j≤N), calculates equivalent representations of variables in the block pertaining to the ith execution and the block pertaining to any other execution than the ith execution, specifies, with respect to each representation of a target variable causing the dependence, a representation of a variable not causing the dependence that is equivalent to the representation of the target variable, and generates a program being for executing the block M times (M≤N) and including a statement including the specified representation in place of each representation of the target variable.
    Type: Grant
    Filed: July 4, 2011
    Date of Patent: August 12, 2014
    Assignee: Panasonic Corporation
    Inventors: Akira Tanaka, Hiroyuki Morishita, Akihiko Inoue
  • Patent number: 8799629
    Abstract: A method of executing a loop over an integer index range of indices in a parallel manner includes assigning a plurality of index subsets of the integer index range to a corresponding plurality of threads, and defining for each index subset a start point of the index subset, an end point of the index subset, and a boundary point of the index subset positioned between the start point and the end point of the index subset. A portion of the index subset between the start point and the boundary point represents a private range and the portion of the index subset between the boundary point and the end point represents a public range. Loop code is executed by each thread based on the index subset of the integer index range assigned to the thread.
    Type: Grant
    Filed: December 4, 2008
    Date of Patent: August 5, 2014
    Assignee: Microsoft Corporation
    Inventors: Huseyin S. Yildiz, Stephen S. Toub, Paul Ringseth, John Duffy
  • Patent number: 8799881
    Abstract: According to one embodiment, a parallelizing unit divides a loop into first and second processes based on a program to be converted and division information. The first and second processes respectively have termination control information, loop control information, and change information. The parallelizing unit inserts into the first process a determination process determining whether the second process is terminated at execution of an (n−1)th iteration of the second process when the second process is subsequent to the first process or determining whether the second process is terminated at execution of an nth iteration of the second process when the second process precedes the first process. The parallelizing unit inserts into the second process a control process controlling execution of the second process based on the result of determination notified by the determination process.
    Type: Grant
    Filed: July 12, 2011
    Date of Patent: August 5, 2014
    Assignee: Kabushiki Kaisha Toshiba
    Inventors: Nobuaki Tojo, Hidenori Matsuzaki
  • Patent number: 8793675
    Abstract: Methods and apparatus to provide loop parallelization based on loop splitting and/or index array are described. In one embodiment, one or more split loops, corresponding to an original loop, are generated based on the mis-speculation information. In another embodiment, a plurality of subloops are generated from an original loop based on an index array. Other embodiments are also described.
    Type: Grant
    Filed: December 24, 2010
    Date of Patent: July 29, 2014
    Assignee: Intel Corporation
    Inventors: Jin Lin, Nishkam Ravi, Xinmin Tian, John L. Ng, Renat V. Valiullin
  • Patent number: 8769507
    Abstract: A method, system, and article of manufacture are disclosed for transforming a definition of a process for delivering a service on a specified computing device. This service process definition is comprised of computer readable code. The method comprises the steps of expressing a given set of assumptions in a computer readable code; and transforming the definition by using a processing unit to apply the assumptions to the definition of the process to change the way in which the process operates. The definition of the process may be transformed by using factors relating to the specific context in or for which the definition is executed. Also, the definition may be transformed by identifying, in a flow diagram for the process, flows to which the assumptions apply, and applying program rewriting techniques to those identified flows.
    Type: Grant
    Filed: May 14, 2009
    Date of Patent: July 1, 2014
    Assignee: International Business Machines Corporation
    Inventors: David F. Bantz, Steven J. Mastrianni, James R. Moulic, Dennis G. Shea
  • Patent number: 8762968
    Abstract: Prefetching irregular memory references into a software controlled cache is provided. A compiler analyzes source code to identify at least one of a plurality of loops that contain an irregular memory reference. The compiler determines if the irregular memory reference within the at least one loop is a candidate for optimization. Responsive to an indication that the irregular memory reference may be optimized, the compiler determines if the irregular memory reference is valid for prefetching. Responsive to an indication that the irregular memory reference is valid for prefetching, a store statement for an address of the irregular memory reference is inserted into the at least one loop. A runtime library call is inserted into a prefetch runtime library for the irregular memory reference. Data associated with the irregular memory reference is prefetched into the software controlled cache when the runtime library call is invoked.
    Type: Grant
    Filed: June 27, 2012
    Date of Patent: June 24, 2014
    Assignee: International Business Machines Corporation
    Inventors: Tong Chen, Marc Gonzalez Tallada, Zehra N. Sura, Tao Zhang
  • Publication number: 20140173575
    Abstract: A compiling method compiles an object program to be executed by a processor having a plurality of execution units operable in parallel. In the method a first availability chain is created from a producer instruction (p1), scheduled for execution by a first one of the execution units (20: AGU), to a first consumer instruction (c1), scheduled for execution by a second one of the execution units (22: EXU) and requiring a value produced by the said producer instruction. The first availability chain comprises at least one move instruction (mv1-mv3) for moving the required value from a first point (20: ARF) accessible by the first execution unit to a second point (22: DRF) accessible by the second execution unit.
    Type: Application
    Filed: February 20, 2014
    Publication date: June 19, 2014
    Applicant: Altera Corporation
    Inventors: Marcio Merino Fernandes, Raymond Malcolm Livesley
  • Patent number: 8745360
    Abstract: Embodiments of a method for performing parallel operations in a computer system when one or more conditional dependencies may be present, where a given conditional dependency includes a dependency associated with at least two data elements based on a pair of conditions. During operation, a processor receives instructions for generating one or more predicate values based on actual dependencies, where a given predicate value indicates data elements that may be safely evaluated in parallel, and where the given actual dependency occurs when the pair of conditions matches one or more criteria. Then, the processor executes the instructions for generating the one or more predicate values.
    Type: Grant
    Filed: September 24, 2008
    Date of Patent: June 3, 2014
    Assignee: Apple Inc.
    Inventors: Jeffry E. Gonion, Keith E. Diefendorff
  • Patent number: 8745607
    Abstract: According to one aspect of the present disclosure, a method and technique for reducing branch misprediction impact for nested loop code is disclosed. The method includes: responsive to identifying code having an outer loop and an inner loop, determining a quantity of iterations of the inner loop for an initial number of iterations of the outer loop; determining a number of processor cycles for executing the quantity of iterations of the inner loop for the initial number of iterations of the outer loop; determining whether the number of processor cycles is less than a threshold; and responsive to determining that the number of processor cycles is less than the threshold, fully unrolling the inner loop for the initial number of iterations of the outer loop.
    Type: Grant
    Filed: November 11, 2011
    Date of Patent: June 3, 2014
    Assignee: International Business Machines Corporation
    Inventors: Madhavi G. Valluri, Steven W. White
  • Patent number: 8738348
    Abstract: A method and mechanism for implementing a general purpose scripting language that supports parallel execution is described. In one approach, parallel execution is provided in a seamless and high-level approach rather than requiring or expecting a user to have low-level programming expertise with parallel processing languages/functions. Also described is a system and method for performing circuit simulation. The present approach provides methods and systems that create reusable and independent measurements for use with circuit simulators. Also disclosed are parallelizable measurements having looping constructs that can be run without interference between parallel iterations. Reusability is enhanced by having parameterized measurements. Revisions and history of the operating parameters of circuit designs subject to simulation are tracked.
    Type: Grant
    Filed: June 15, 2012
    Date of Patent: May 27, 2014
    Assignee: Cadence Design Systems, Inc.
    Inventor: Kenneth S. Kundert
  • Patent number: 8739026
    Abstract: The following is iteratively performed a number of times. Whether the markup language schema has an error is determined. Where the markup language schema has an error, the markup language schema is modified to attempt to correct the error.
    Type: Grant
    Filed: October 23, 2011
    Date of Patent: May 27, 2014
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventor: Yaron Naveh
  • Patent number: 8739141
    Abstract: A system and method for speculatively parallelizing non-countable loops in a multi-threaded application. A multi-core processor receives instructions for a multi-threaded application. The application may contain non-countable loops. Non-countable loops have an iteration count value that cannot be determined prior to the execution of the non-countable loop, a loop index value that cannot be non-speculatively determined prior to the execution of an iteration of the non-countable loop, and control that is not transferred out of the loop body by a code line in the loop body. The compiler replaces the non-countable loop with a parallelized loop pattern that uses outlined function calls defined in a parallelization library (PL) in order to speculatively execute iterations of the parallelized loop. The parallelized loop pattern is configured to squash and re-execute any speculative thread of the parallelized loop pattern that is signaled to have a transaction failure.
    Type: Grant
    Filed: May 19, 2008
    Date of Patent: May 27, 2014
    Assignee: Oracle America, Inc.
    Inventors: Yonghong Song, Spiros Kalogeropulos, Partha P. Tirumalai
  • Patent number: 8732679
    Abstract: A new computer-compiler architecture includes code analysis processes in which loops present in an intermediate instruction set are transformed into more efficient loops prior to fully executing the intermediate instruction set. The compiler architecture starts by generating the equivalent intermediate instructions for the original high level source code. For each loop in the intermediate instructions, a total cycle cost is calculated using a cycle cost table associated with the compiler. The compiler then generates intermediate code for replacement loops in which all conversion instructions are removed. The cycle costs for these new transformed loops are then compared against the total cycle cost for the original loops. If the total cycle costs exceed the new cycle costs, the compiler will replace the original loops in the intermediate instructions with the new transformed loops prior to generation of final code using the instruction set of the processor.
    Type: Grant
    Filed: March 16, 2010
    Date of Patent: May 20, 2014
    Assignee: QUALCOMM Incorporated
    Inventors: Sumesh Udayakumaran, Chihong Zhang
  • Patent number: 8732732
    Abstract: Systems and methods that enhance and balance a late binding and an early binding in a programming language, via supplying an option component to opt-in (or opt-out) late binding, and wherein a late binding is triggered based on a static type for the variable (e.g., object or a type/string.) Additionally, the variable is enabled to have different static types at different regions (e.g., a program fragment) of the programming language.
    Type: Grant
    Filed: June 25, 2013
    Date of Patent: May 20, 2014
    Assignee: Microsoft Corporation
    Inventors: Henricus Johannes Maria Meijer, Brian C. Beckman, David N. Schach, Amanda Silver, Paul A. Vick, Peter F. Drayton, Avner Y. Aharoni, Ralf Lammel
  • Patent number: 8726249
    Abstract: A bootup device and method for an application program on a mobile equipment to improve the bootup speed of the application program on the mobile equipment. The bootup device has an application management module that boots up a virtual machine module based on the application program to be run. The virtual machine module loads codes of the application program and Just in Time (JIT) compilation results of a bootup process of the application program into a memory, searches, in the JIT compilation results, for local JIT compiled codes corresponding to the bootup process code segment to be executed, and executes the found local JIT compiled codes when executing each bootup process code segment of the application program. A storage management module stores and reads the codes of the application program and the JIT compilation results obtained from the JIT compilation of the bootup process of the application program.
    Type: Grant
    Filed: February 21, 2011
    Date of Patent: May 13, 2014
    Assignee: ZTE Corporation
    Inventors: Youpeng Gu, Lifeng Xu, Wei Hu, Sheng Zhong, Wei Wang, Zemin Wang
  • Patent number: 8726256
    Abstract: Apparatus, systems, and methods for a compiler are disclosed. One such compiler parses a human readable expression into a syntax tree and converts the syntax tree into an automaton having in-transitions and out-transitions. Converting can include unrolling the quantification as a function of in-degree limitations wherein in-degree limitations includes a limit on the number of transitions into a state of the automaton. The compiler can also convert the automaton into an image for programming a parallel machine, and publishes the image. Additional apparatus, systems, and methods are disclosed.
    Type: Grant
    Filed: January 24, 2012
    Date of Patent: May 13, 2014
    Assignee: Micron Technology, Inc.
    Inventors: Junjuan Xu, Paul Glendenning
  • Patent number: 8726251
    Abstract: Embodiments of the invention provide systems and methods for automatically parallelizing loops with non-speculative pipelined execution of chunks of iterations with pre-computation of selected values. Non-DOALL loops are identified and divided into chunks. The chunks are assigned to separate logical threads, which may be further assigned to hardware threads. As a thread performs its runtime computations, subsequent threads attempt to pre-compute their respective chunks of the loop. These pre-computations may result in a set of assumed initial values and pre-computed final variable values associated with each chunk. As subsequent pre-computed chunks are reached at runtime, those assumed initial values can be verified to determine whether to proceed with runtime computation of the chunk or to avoid runtime execution and instead use the pre-computed final variable values.
    Type: Grant
    Filed: March 29, 2011
    Date of Patent: May 13, 2014
    Assignee: Oracle International Corporation
    Inventors: Spiros Kalogeropulos, Partha Pal Tirumalai
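
The verify-then-reuse idea in the abstract above can be sketched in a few lines. This is a single-threaded simulation under stated assumptions: the loop body `step`, the guessed initial value, and all function names are hypothetical, and the pre-computation that the abstract assigns to other threads is done up front here.

```python
def step(state, x):
    """Hypothetical loop body with a carried dependence: a zero resets the sum."""
    return 0 if x == 0 else state + x


def run_chunk(initial, chunk):
    """Execute one chunk of iterations starting from a given carried value."""
    state = initial
    for x in chunk:
        state = step(state, x)
    return state


def run_chunked(data, chunk_size, guess=0):
    """Pre-compute each chunk from an assumed initial value, then verify at runtime."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    # In the scheme described above this would happen on other threads.
    precomputed = [(guess, run_chunk(guess, c)) for c in chunks]

    state, reused = 0, 0
    for chunk, (assumed, final) in zip(chunks, precomputed):
        if state == assumed:
            state = final          # assumption verified: skip runtime execution
            reused += 1
        else:
            state = run_chunk(state, chunk)  # assumption failed: compute normally
    return state, reused
```

The result is always the same as running the loop sequentially; only the amount of work skipped depends on how often the assumed initial value turns out to be right.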
  • Patent number: 8719806
    Abstract: The latencies associated with retrieving instruction information for a main thread are decreased through the use of a simultaneous helper thread. The helper thread is a speculative prefetch thread to perform instruction prefetch and/or trace pre-build for the main thread.
    Type: Grant
    Filed: September 10, 2010
    Date of Patent: May 6, 2014
    Assignee: Intel Corporation
    Inventors: Hong Wang, Tor M. Aamodt, Pedro Marcuello, Jared W. Stark, IV, John P. Shen, Antonio Gonzalez, Per Hammarlund, Gerolf F. Hoflehner, Perry H. Wang, Steve Shih-wei Liao
  • Patent number: 8707281
    Abstract: One or more computer-readable media store executable instructions that, when executed by processing logic, perform parallel processing. The media store one or more instructions for initiating a single programming language, and identifying, via the single programming language, one or more data distribution schemes for executing a program. The media also store one or more instructions for transforming, via the single programming language, the program into a parallel program with an optimum data distribution scheme selected from the one or more identified data distribution schemes, and allocating the parallel program to two or more labs for parallel execution. The media further store one or more instructions for receiving one or more results associated with the parallel execution of the parallel program from the two or more labs, and providing the one or more results to the program.
    Type: Grant
    Filed: July 23, 2012
    Date of Patent: April 22, 2014
    Assignee: The MathWorks, Inc.
    Inventors: Piotr R. Luszczek, John N. Little, Jocelyn Luke Martin, Halldor N. Stefansson, Edric Ellis, Penelope L. Anderson, Brett Baker, Loren Dean, Roy E. Lurie
  • Patent number: 8677318
    Abstract: A computer implemented method, data processing system, computer usable program code, and active repository are provided for management of a software service. A request is received to deploy the software service in a computer network. A dependency analysis is performed for the requested software service to determine component software services and physical resources necessary to deploy and manage a new software service as a composite in response to the software service being the new software service. An active object is created to manage the new software service using an active template based on the analysis. The new software service is deployed in the computer network using the active object. The new software service is managed using the active object.
    Type: Grant
    Filed: March 24, 2008
    Date of Patent: March 18, 2014
    Assignee: International Business Machines Corporation
    Inventors: Ajay Mohindra, Vijay K. Naik
  • Patent number: 8677335
    Abstract: Disclosed herein are methods and systems for using on stack replacement for optimization of software. A source code is compiled into an unoptimized code on a computing device. The unoptimized code is then executed on a computing device. A hot count is incremented. It is then determined whether a function within the unoptimized code is hot. If a function is determined to be hot, an OSR triggering code is inserted at a back edge of each loop within the function. The OSR triggering code is configured to trigger OSR at a loop depth that is less than the hot count.
    Type: Grant
    Filed: December 6, 2011
    Date of Patent: March 18, 2014
    Assignee: Google Inc.
    Inventors: Kevin Millikin, Mads Sig Ager, Kasper Verdich Lund, Florian Schneider
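
As a rough illustration of the abstract above, the following toy simulation shows the general shape of on-stack replacement: a per-function hot count, a check at the loop back edge, and a hand-off of live loop state to faster code. The threshold, the `Profile` class, and `optimized_tail` are assumptions made for this sketch; the loop-depth condition mentioned in the abstract is not modeled.

```python
HOT_THRESHOLD = 3  # hypothetical: calls needed before the function counts as hot


class Profile:
    """Per-function profiling data (names are illustrative)."""

    def __init__(self):
        self.hot_count = 0
        self.osr_events = 0


def optimized_tail(i, n, total):
    """Stand-in for optimized code entered mid-loop.

    Finishes the remaining iterations i..n-1 in closed form.
    """
    return total + (n - i) * (i + n - 1) // 2


def sum_below(n, profile):
    """'Unoptimized' loop that sums 0..n-1 and may hand off at its back edge."""
    profile.hot_count += 1
    hot = profile.hot_count >= HOT_THRESHOLD
    total, i = 0, 0
    while i < n:
        total += i
        i += 1
        # Back edge: once the function is hot, transfer the live values
        # (i, total) to the optimized version instead of looping again.
        if hot:
            profile.osr_events += 1
            return optimized_tail(i, n, total)
    return total
```

Every call returns the same sum whether or not the hand-off happens, which is the property that makes replacing code in the middle of a running loop safe.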
  • Patent number: 8677330
    Abstract: A compiling method compiles an object program to be executed by a processor having a plurality of execution units operable in parallel. In the method a first availability chain is created from a producer instruction (p1), scheduled for execution by a first one of the execution units (20: AGU), to a first consumer instruction (c1), scheduled for execution by a second one of the execution units (22: EXU) and requiring a value produced by the said producer instruction. The first availability chain comprises at least one move instruction (mv1-mv3) for moving the required value from a first point (20: ARF) accessible by the first execution unit to a second point (22: DRF) accessible by the second execution unit.
    Type: Grant
    Filed: June 9, 2010
    Date of Patent: March 18, 2014
    Assignee: Altera Corporation
    Inventors: Marcio Merino Fernandes, Raymond Malcolm Livesley
  • Patent number: 8677332
    Abstract: Systems and methods for compiling one or more code blocks written in programming language are provided. In some aspects, display associated with application is provided. Display includes plurality of graphical objects. That each of plurality of graphical objects is associated with child code block in one-to-one association between graphical objects and child code blocks is determined. Each child code block is written in programming language. The child code blocks associated with plurality of graphical objects are transformed into single parent code block. Parent code block, upon compiling, is configured to be reused across execution contexts and to allow injection of global scope. Parent code block, upon specific execution, includes execution context for specified child code block. Parent code block is configured to receive indication of specified child code block for initiating execution of parent code block. Parent code block is compiled.
    Type: Grant
    Filed: July 24, 2012
    Date of Patent: March 18, 2014
    Assignee: Google Inc.
    Inventors: John Hjelmstad, Malte Ubl
  • Patent number: 8671401
    Abstract: Described is a technology by which a series of loop nests corresponding to source code are detected by a compiler, with the series of loop nests tiled together (thereby increasing the ratio of cache hits to misses in a multi-processor environment). The compiler transforms the series of loop nests into a plurality of tile loops within a controller loop, including using dependency analysis to determine which results from a tile loop need to be pre-computed before another tile loop. For dependency analysis, the compiler may use a directed acyclic graph as a high-level intermediate representation, and split the graph into sub-graphs each representing an array. The compiler uses descriptors processed from the graph to determine the controller loop and the tile loops within that controller loop.
    Type: Grant
    Filed: April 9, 2007
    Date of Patent: March 11, 2014
    Assignee: Microsoft Corporation
    Inventors: Siddhartha Puri, Jaydeep P. Marathe
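
The controller-loop and tile-loop structure described in the abstract above can be sketched by hand for a pair of simple one-dimensional loops. The two loops, the tile size, and the function names are assumptions chosen for illustration; a compiler would derive the one-element overlap from dependency analysis rather than have it hard-coded.

```python
def untiled(a):
    """Two separate loops: the second reads b[i] and b[i + 1] (clamped at the end)."""
    n = len(a)
    b = [2 * x for x in a]
    return [b[i] + b[min(i + 1, n - 1)] for i in range(n)]


def tiled(a, tile=4):
    """Same computation as tile loops inside one controller loop."""
    n = len(a)
    b = [0] * n
    c = [0] * n
    done = 0  # number of b elements already computed
    for start in range(0, n, tile):        # controller loop
        stop = min(start + tile, n)
        # The second loop reads b[i + 1], so b must be pre-computed one
        # element past the end of the current tile.
        need = min(stop + 1, n)
        for i in range(done, need):        # tile loop for the first nest
            b[i] = 2 * a[i]
        done = need
        for i in range(start, stop):       # tile loop for the second nest
            c[i] = b[i] + b[min(i + 1, n - 1)]
    return c
```

Each tile of `b` is consumed right after it is produced, which is the locality effect the abstract attributes to tiling the loop nests together.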
  • Patent number: 8640112
    Abstract: System and method for vectorizing combinations of program operations. Program code is received that includes a combination of individually vectorizable program portions that collectively implement a first computation. Each individually vectorizable program portion has at least one array input and at least one array output. The combination of individually vectorizable program portions is transformed into a single vectorizable program portion that is or includes a functional composition of the combination of individually vectorizable program portions. Vectorized executable code implementing the first computation is generated based on the single vectorizable program portion. The generated executable code is directed to SIMD (Single-Instruction-Multiple-Data) computing units of a target processor.
    Type: Grant
    Filed: March 30, 2011
    Date of Patent: January 28, 2014
    Assignee: National Instruments Corporation
    Inventors: Haoran Yi, Brady C. Duggan, Robert E. Dye, Adam L. Bordelon, Jeffrey L. Kodosky
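
A minimal sketch of the composition step in the abstract above, assuming each vectorizable portion is a pure elementwise function: instead of running one pass per portion with intermediate arrays, the portions are composed into one function applied in a single pass. The helper names are hypothetical, and plain Python lists stand in for SIMD lanes.

```python
from functools import reduce


def compose(*elementwise_fns):
    """Build one elementwise function equal to applying the given ones in order."""
    def fused(x):
        return reduce(lambda acc, f: f(acc), elementwise_fns, x)
    return fused


def run_separately(xs, fns):
    """One full pass (and one intermediate list) per elementwise portion."""
    for f in fns:
        xs = [f(x) for x in xs]
    return xs


def run_fused(xs, fns):
    """A single pass over the input using the composed function."""
    g = compose(*fns)
    return [g(x) for x in xs]
```

Both versions produce the same output; the fused form has a single loop body, which is the shape a vectorizer would then map onto SIMD units.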
  • Patent number: 8635606
    Abstract: Technologies are generally described for runtime optimization adjusted dynamically according to changing costs of one or more system resources. Multicore systems may encounter dynamic variations in performance associated with the relative cost of related system resources. Furthermore, multicore systems can experience dramatic variations in resource availability and costs. A dynamic registry of system resource costs can be utilized to guide dynamic optimization. The relative scarcity of each resource can be updated dynamically within the registry of system resource costs. A runtime code generating loader and optimizer may be adapted to adjust optimization according to the resource cost registry. Information regarding system resource costs can support optimization tradeoffs based on resource cost functions.
    Type: Grant
    Filed: October 13, 2009
    Date of Patent: January 21, 2014
    Assignee: Empire Technology Development LLC
    Inventor: Ezekiel John Joseph Kruglick
  • Publication number: 20140019949
    Abstract: A method of program compilation to improve parallelism during the linking of the program by a compiler. The method includes converting statements of the program to canonical form, constructing an abstract syntax tree (AST) for each procedure in the program, and traversing the program to construct a graph by making each non-control flow statement and each control structure into at least one node of the graph.
    Type: Application
    Filed: June 27, 2013
    Publication date: January 16, 2014
    Inventor: Loring Craymer
  • Patent number: 8627300
    Abstract: Technologies are generally described for parallel dynamic optimization using multicore processors. A runtime compiler may be adapted to generate multiple instances of executable code from a portable intermediate software module. The various instances of executable code may be generated with variations of optimization parameters such that the code instances each express different optimization attempts. A multicore processor may be leveraged to simultaneously execute some, or all, of the various code instances. Preferred optimization parameters may be determined from the executable code instances that may correctly complete in the least time, or may use the least amount of memory, or that may prove superior according to some other fitness metric. Preferred optimization parameters may be used to seed future optimization attempts. Output generated from the preferred instances may be used as soon as the first instance correctly completes block.
    Type: Grant
    Filed: October 13, 2009
    Date of Patent: January 7, 2014
    Assignee: Empire Technology Development LLC
    Inventor: Ezekiel John Joseph Kruglick
  • Publication number: 20140007061
    Abstract: Loop instructions are analyzed and assigned stage numbers based on dependencies between them and machine resources available. The loop instructions are selectively executed based on their stage numbers, thereby eliminating the need for explicit loop set-up and tear-down instructions. On a Single Instruction, Multiple Data machine, the final instance of each instruction may be executed on a subset of the processing elements or vector elements, dependent on the number of iterations of the original loop.
    Type: Application
    Filed: June 29, 2012
    Publication date: January 2, 2014
    Applicant: Analog Devices, Inc.
    Inventors: Michael G. Perkins, Andrew J. Higham
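
The stage-number idea in the abstract above resembles predicated software pipelining, and a small simulation makes the mechanism concrete. It assumes each loop instruction is a one-argument function with a stage number equal to its position in a list; on kernel pass `t`, the instruction in stage `s` works on original iteration `t - s` and is skipped when that iteration does not exist. All names are hypothetical.

```python
def run_pipelined(src, stages):
    """Run a staged loop body with no separate set-up or tear-down code.

    stages[s] is the instruction assigned stage number s; stage 0 consumes
    the source element and later stages consume the value in flight.
    """
    n = len(src)
    num_stages = len(stages)
    in_flight = [None] * n   # value carried between stages, per iteration
    trace = []               # (kernel pass, stage, original iteration)
    for t in range(n + num_stages - 1):
        for s in reversed(range(num_stages)):
            it = t - s
            # Selective execution: the stage number decides whether this
            # instruction has a valid iteration to work on in this pass.
            if 0 <= it < n:
                operand = src[it] if s == 0 else in_flight[it]
                in_flight[it] = stages[s](operand)
                trace.append((t, s, it))
    return in_flight, trace
```

With three stages and four iterations the kernel runs six passes; the first two behave like a prologue and the last two like an epilogue, purely because out-of-range iterations are skipped.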
  • Patent number: 8612949
    Abstract: Methods and apparatuses for compiler-created helper thread for multi-threading are described herein. In one embodiment, exemplary process includes identifying a region of a main thread that likely has one or more delinquent loads, the one or more delinquent loads representing loads which likely suffer cache misses during an execution of the main thread, analyzing the region for one or more helper threads with respect to the main thread, and generating code for the one or more helper threads, the one or more helper threads being speculatively executed in parallel with the main thread to perform one or more tasks for the region of the main thread. Other methods and apparatuses are also described.
    Type: Grant
    Filed: December 31, 2009
    Date of Patent: December 17, 2013
    Assignee: Intel Corporation
    Inventors: Shih-wei Liao, Xinmin Tian, Gerolf F. Hoflehner, Hong Wang, Daniel M. Lavery, Perry Wang, Dongkeun Kim, Milind Girkar, John P. Shen
  • Patent number: 8601459
    Abstract: A system and method for discovering a set of possible iteration sequences for a given loop in a software program is described, to transform the loop representation. In a program containing a loop, the loop is partitioned into a plurality of portions based on splitting criteria. Labels are associated with the portions, and an initial loop automaton is constructed that represents the loop iterations as a regular language over the labels corresponding to the portions in the program. Subsequences of the labels are analyzed to determine infeasibility of the subsequences permitted in the automaton. The automaton is refined by removing all infeasible subsequences to discover a set of possible iteration sequences in the loop. The resulting loop automaton is used in a subsequent program verification or analysis technique to find violations of correctness properties in programs.
    Type: Grant
    Filed: April 9, 2013
    Date of Patent: December 3, 2013
    Assignee: NEC Laboratories America, Inc.
    Inventors: Sriram Sankaranarayanan, Aarti Gupta, Gogul Balakrishnan
  • Patent number: 8572593
    Abstract: Simplifying determination of whether application specific parameters are setup for optimal performance of associated applications. In an embodiment, a monitor program associated with an application specific parameter is identified and executed to cause retrieval of a current value of the parameter. The retrieved current value is then compared with a recommended value for the parameter to determine whether the parameter is setup for optimal performance of the application. The result of comparison may be displayed to the user. Another aspect provides for downloading of the recommended values and the monitor programs associated with application specific parameters from an external system (such as a vendor system). One more aspect enables the user to execute a correction program to correct the value of the parameter for optimal performance of the application.
    Type: Grant
    Filed: September 11, 2007
    Date of Patent: October 29, 2013
    Assignee: Oracle International Corporation
    Inventor: Venkata Naga Ravikiran Vedula
  • Patent number: 8549499
    Abstract: A method of dynamic parallelization for programs in systems having at least two processors includes examining computer code of a program to be performed by the system, determining a largest possible parallel region in the computer code, classifying data to be used by the program based on a usage pattern and initiating multiple, concurrent processes to perform the program. The multiple, concurrent processes ensure a baseline performance that is at least as efficient as a sequential performance of the computer code.
    Type: Grant
    Filed: June 18, 2007
    Date of Patent: October 1, 2013
    Assignee: University of Rochester
    Inventors: Chen Ding, Xipeng Shen, Ruke Huang
  • Patent number: 8549501
    Abstract: Generating mixed-mode operations in the compilation of program code for processors having vector or SIMD processing units is disclosed. In a preferred embodiment of the present invention, program instructions making up the body of a loop are abstracted into virtual vector instructions. These virtual vector instructions are treated, for initial code optimization purposes, as vector instructions (i.e., instructions written for the vector unit). The virtual vector instructions are eventually expanded into native code for the target processor, at which time a determination is made for each virtual vector instruction as to whether to expand the virtual vector instruction into native vector instructions, into native scalar instructions, into calls to pre-defined library functions, or into a combination of these. A cost model is used to determine the optimal choice of expansion based on hardware/software constraints, performance costs/benefits, and other criteria.
    Type: Grant
    Filed: August 16, 2004
    Date of Patent: October 1, 2013
    Assignee: International Business Machines Corporation
    Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu