Loop Compiling Patents (Class 717/150)

Saving and loading graphical processing unit (GPU) arrays providing high computational capabilities in a computing environment

Patent number: 8949807

Abstract: A device receives, via a technical computing environment, a program that includes a parallel construct and a command to be executed by graphical processing units, and analyzes the program. The device also creates, based on the parallel construct and the analysis, one or more instances of the command to be executed in parallel by the graphical processing units, and transforms, via the technical computing environment, the one or more command instances into one or more command instances that are executable by the graphical processing units. The device further allocates the one or more transformed command instances to the graphical processing units for parallel execution, and receives, from the graphical processing units, one or more results associated with parallel execution of the one or more transformed command instances by the graphical processing units.

Type: Grant

Filed: September 30, 2013

Date of Patent: February 3, 2015

Assignee: The MathWorks, Inc.

Inventors: Halldor N. Stefansson, Edric Ellis

System, methods and apparatus for program optimization for multi-threaded processor architectures

Patent number: 8930926

Abstract: Methods, apparatus and computer software product for source code optimization are provided. In an exemplary embodiment, a first custom computing apparatus is used to optimize the execution of source code on a second computing apparatus. In this embodiment, the first custom computing apparatus contains a memory, a storage medium and at least one processor with at least one multi-stage execution unit. The second computing apparatus contains at least two multi-stage execution units that allow for parallel execution of tasks. The first custom computing apparatus optimizes the code for parallelism, locality of operations and contiguity of memory accesses on the second computing apparatus. This Abstract is provided for the sole purpose of complying with the Abstract requirement rules. This Abstract is submitted with the explicit understanding that it will not be used to interpret or to limit the scope or the meaning of the claims.

Type: Grant

Filed: April 16, 2010

Date of Patent: January 6, 2015

Assignee: Reservoir Labs, Inc.

Inventors: Cedric Bastoul, Richard A. Lethin, Allen K. Leung, Benoit J. Meister, Peter Szilagyi, Nicolas T. Vasilache, David E. Wohlford

Optimization of declarative queries

Patent number: 8914782

Abstract: Source code is generated that includes one or more iterator-based expressions such as declarative queries. The source code is translated into an intermediate language that classifies operators making up the iterator-based expressions into classes based on whether the operators are aggregating, element-wise, or sink operators. The intermediate language, including the identified classes, is processed using an automaton to replace the iterator-based expressions with one or more equivalent non-iterator-based expressions. Where an iterator-based expression is nested, the nested expression is processed using an equivalent number of nested automatons. The resulting optimized source code may be compiled and executed using fewer virtual function calls than the equivalent non-optimized source code.

Type: Grant

Filed: November 10, 2010

Date of Patent: December 16, 2014

Assignee: Microsoft Corporation

Inventors: Michael Isard, Yuan Yu, Derek Gordon Murray
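
Illustration (not part of the patent record): the abstract for patent 8914782 describes replacing iterator-based expressions with equivalent non-iterator-based ones. The sketch below only shows what such a pair might look like for one sample query. The function names, the sample query, and the labelling of filter and map as element-wise operators are assumptions made for this example; the automaton-driven rewriting itself is not modelled.

```python
from typing import Iterable


def query_with_iterators(values: Iterable[int]) -> int:
    """Iterator-based form: every stage is a separate lazy iterator."""
    evens = filter(lambda v: v % 2 == 0, values)   # treated here as element-wise
    squares = map(lambda v: v * v, evens)          # treated here as element-wise
    return sum(squares)                            # aggregating operator


def query_fused(values: Iterable[int]) -> int:
    """Hand-written equivalent: one loop, no intermediate iterators."""
    total = 0
    for v in values:
        if v % 2 == 0:
            total += v * v
    return total
```

Both functions return the same result for the same input; the fused form avoids the per-element calls through chained iterator objects.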
Efficient and secure data structure synchronization

Patent number: 8898794

Abstract: One embodiment of a computer-implemented data structure synchronization mechanism comprises an interface for accessing a data structure and storing ownership data in a shared memory location. The method further comprises denying write operations if the thread attempting the write operation is not designated as the owner thread by said ownership data. The method further comprises denying requests to modify the ownership data if the thread making the request is not designated as the owner thread by said ownership data. The method further comprises effecting a write fence in the context of the thread making the request to modify ownership data prior to modifying the ownership data. Other embodiments are described.

Type: Grant

Filed: September 6, 2011

Date of Patent: November 25, 2014

Inventor: Andrei Teodor Borac
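
Illustration (not part of the patent record): the abstract for patent 8898794 describes denying writes and ownership changes that do not come from the designated owner thread. A minimal sketch of that rule follows. The class name OwnedList and its methods are hypothetical, and the write fence mentioned in the abstract is not reproduced, since Python does not expose one.

```python
import threading


class OwnedList:
    """A list that only its current owner thread may modify (hypothetical API)."""

    def __init__(self):
        self._items = []
        self._owner = threading.get_ident()   # ownership data

    def _check_owner(self):
        if threading.get_ident() != self._owner:
            raise PermissionError("caller is not the owner thread")

    def append(self, value):
        self._check_owner()                   # deny writes from non-owners
        self._items.append(value)

    def transfer(self, new_owner_ident):
        self._check_owner()                   # only the owner may hand over ownership
        # The abstract calls for a write fence at this point; none is issued here.
        self._owner = new_owner_ident

    def snapshot(self):
        return list(self._items)
```

A thread that did not create the list and has not been given ownership through transfer() gets a PermissionError when it calls append().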
APPARATUS AND METHOD FOR EXECUTING CODE

Publication number: 20140344793

Abstract: An apparatus and method for executing code are provided. The apparatus includes a memory manager that allocates a stack in memory to store processed data that needs to be retained; a loop generator that divides program code programmed to be processed in parallel into regions based on a barrier function, transforms a region that includes the processed data that needs to be retained in the stack into a first coalescing loop, and transforms a region that uses the processed data stored in the stack into a second coalescing loop such that the transformed program code may be serially processed; and a loop changer that reverses a processing order of the second coalescing loop in comparison to a processing order of the first coalescing loop.

Type: Application

Filed: March 31, 2014

Publication date: November 20, 2014

Applicant: Samsung Electronics Co., Ltd.

Inventors: Jin-Seok LEE, Seong-Gun KIM, Dong-Hoon YOO, Seok-Joong HWANG
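
Illustration (not part of the publication record): the abstract for publication 20140344793 describes splitting parallel code at a barrier into two coalescing loops and running the second in reverse order. One reading is that the reversal lets values saved on a stack by the first loop be popped by the matching work-item in the second. The sketch below follows that reading; the function name and the per-item arithmetic are assumptions.

```python
def run_serialized(num_items):
    """Serially emulate work-items whose code is split in two by a barrier."""
    stack = []                 # per-item data that must survive the barrier
    out = [0] * num_items

    # First coalescing loop: region before the barrier, items 0 .. n-1.
    for item in range(num_items):
        private = item * 10    # hypothetical per-item value
        stack.append(private)

    # Second coalescing loop: region after the barrier, in reverse order,
    # so each pop returns the value pushed by the same work-item.
    for item in reversed(range(num_items)):
        private = stack.pop()
        out[item] = private + 1
    return out
```

Running the second loop in forward order would pair each work-item with another item's saved value, which is why the order is reversed when a plain stack is used.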
Method and apparatus for register spill minimization

Patent number: 8893104

Abstract: The aspects enable a computing device to allocate memory space to variables during runtime compilation of a software application. A compiler may be modified to identify operations that can be performed on either a main pipe or an alternative pipe, identify chains of related operations that can be performed on either the main pipe or the alternative pipe, identify points in the execution of code at which the number of live values will exceed the number of registers, and choosing a chain of operations as a candidate to be moved to the alternative pipe in order to reduce the number of live values at identified points in the execution of code. The entire chosen chain of operations may be moved to the alternative pipe. The alternative pipe may perform the computations and return the results to the main pipe for execution.

Type: Grant

Filed: March 1, 2012

Date of Patent: November 18, 2014

Assignee: QUALCOMM Incorporated

Inventors: Christopher A. Vick, Gregory M. Wright

Compiler device, compiler program, and loop parallelization method

Patent number: 8881124

Abstract: According to the conventional loop parallelization method, when a loop in which a value of a loop-carried dependency variable can be calculated in all of the iterations without sequentially executing the loop from the start, it is determined that DOALL parallelization is not applicable due to the loop-carried dependency variable. Accordingly, the loop is sequentially executed or parallelized by using DOACROSS parallelization that executes a loop including a loop-carried dependency variable. That is, there is a problem that an expression including a loop-carried dependency cannot be parallelized and efficiently processed with use of a multi-processor. By generating initial value calculating codes, the loop-carried dependency in a source code prior to parallelization can be solved, and by dividing a loop included in the source code into subloops that can be executed in parallel, the multi-processor can efficiently process the source code.

Type: Grant

Filed: December 13, 2011

Date of Patent: November 4, 2014

Assignee: Panasonic Corporation

Inventor: Daisuke Baba
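
Illustration (not part of the patent record): the abstract for patent 8881124 describes generating initial value calculating code so that a loop with a loop-carried dependency variable can be divided into subloops that run in parallel. The sketch below assumes the simplest such case, a variable that grows by a fixed step each iteration, so its value at the start of any subloop has a closed form. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def sequential(n, x0, step):
    x, out = x0, []
    for _ in range(n):
        x += step                 # loop-carried dependency on x
        out.append(x)
    return out


def subloop(start, end, x0, step):
    x = x0 + start * step         # initial value calculating code for this subloop
    out = []
    for _ in range(start, end):
        x += step
        out.append(x)
    return out


def parallel(n, x0, step, chunks=4):
    bounds = [(n * k // chunks, n * (k + 1) // chunks) for k in range(chunks)]
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(lambda b: subloop(b[0], b[1], x0, step), bounds))
    return [v for part in parts for v in part]
```

Because each subloop computes its own starting value, no subloop has to wait for the previous one to finish.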
Semi-Automatic Restructuring of Offloadable Tasks for Accelerators

Publication number: 20140325495

Abstract: A computer implemented method entails identifying code regions in an application from which offloadable tasks can be generated by a compiler for heterogenous computing system with processor and accelerator memory, including adding relaxed semantics to a directive based language in the heterogenous computing for allowing a suggesting rather than specifying a parallel code region as an offloadable candidate, and identifying one or more offloadable tasks in a neighborhood of code region marked by the directive.

Type: Application

Filed: April 25, 2014

Publication date: October 30, 2014

Applicant: NEC Laboratories America, Inc.

Inventors: Nishkam Ravi, Yi Yang, Srimat Chakradhar

Refactoring programs for flexible locking

Patent number: 8869127

Abstract: Disclosed is a novel computer implemented system, on demand service, computer program product and a method that provides a set of lock usages that improves concurrency resulting in execution performance of the software application by reducing lock contention through refactoring. More specifically, disclosed is a method to refactor a software application. The method starts with accessing at least a portion of a software application that can execute in an operating environment where there are more two or more threads of execution. Next, a determination is made if there is at least one lock used in the software application to enforce limits on accessing a resource. In response to determining that there is a lock with a first type of construct with a given set of features, the software application is refactored with the lock to preserve behavior of the software application.

Type: Grant

Filed: January 3, 2011

Date of Patent: October 21, 2014

Assignee: International Business Machines Corporation

Inventors: Julian Dolby, Manu Sridharan, Frank Tip, Max Schaefer

System using a unique marker with each software code-block

Patent number: 8850410

Abstract: A system and method for improving software maintainability, performance, and/or security by associating a unique marker to each software code-block; the system comprising of a plurality of processors, a plurality of code-blocks, and a marker associated with each code-block. The system may also include a special hardware register (code-block marker hardware register) in each processor for identifying the markers of the code-blocks executed by the processor, without changing any of the plurality of code-blocks.

Type: Grant

Filed: January 29, 2010

Date of Patent: September 30, 2014

Assignee: International Business Machines Corporation

Inventors: Ramanjaneya S. Burugula, Joefon Jann, Pratap C. Pattnaik

Identifying a set of functionally distinct reorderings in a multithreaded program

Patent number: 8843910

Abstract: A facility for identifying functionally distinct memory access reorderings for a multithreaded program is described. The facility monitors execution of the program to detect, for each of one or more memory locations, an order in which the memory location was accessed by the threads of the program, each access being at least one of a read access and a write access. Among a number of possible memory access reorderings of a read access by a reading thread to a location and a write access by a writing thread to the same location where the write access preceded the read access, the facility identifies as functionally distinct memory access reorderings those possible memory access reorderings where the reading thread could have become newly aware of changed state of the writing thread as a result of the indicated read access.

Type: Grant

Filed: March 14, 2011

Date of Patent: September 23, 2014

Assignee: F5 Networks, Inc.

Inventors: Andrew M. Schwerin, Peter J. Godman, Kaya Bekiroglu

Data prefetching and coalescing for partitioned global address space languages

Patent number: 8839219

Abstract: An illustrative embodiment of a computer-implemented process for shared data prefetching and coalescing optimization versions a loop containing one or more shared references into an optimized loop and an un-optimized loop, transforms the optimized loop into a set of loops, and stores shared access associated information of the loop using a prologue loop in the set of loops. The shared access associated information pertains to remote data and is collected using the prologue loop in absence of network communication and builds a hash table. An associated data structure is updated each time the hash table is entered, and is sorted to remove duplicate entries and create a reduced data structure. Patterns across entries of the reduced data structure are identified and entries are coalesced. Data associated with a coalesced entry is pre-fetched using a single communication and a local buffer is populated with the fetched data for reuse.

Type: Grant

Filed: October 24, 2012

Date of Patent: September 16, 2014

Assignee: International Business Machines Corporation

Inventors: Michail Alvanos, Ettore Tiotto
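
Illustration (not part of the patent record): the abstract for patent 8839219 describes sorting collected shared-access entries, removing duplicates, and coalescing entries that form a pattern so they can be fetched in one communication. The sketch below reduces this to the simplest pattern, runs of consecutive indices. The function name and the (start, count) representation are assumptions; the hash table and prologue loop are not modelled.

```python
def coalesce(indices):
    """Sort, drop duplicates, and merge consecutive indices into (start, count) runs."""
    runs = []
    for idx in sorted(set(indices)):
        if runs and idx == runs[-1][0] + runs[-1][1]:
            # idx directly follows the last run, so extend that run by one.
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)
        else:
            runs.append((idx, 1))
    return runs
```

Each resulting run stands for one bulk fetch in place of one fetch per index.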
Using vector atomic memory operation to handle data of different lengths

Patent number: 8826252

Abstract: A system and method of compiling program code, wherein the program code includes an operation on an array of data elements stored in memory of a computer system. The program code is scanned for an equation which operates on data of lengths other than the limited number of vector supported data lengths. The equation is then replaced with vectorized machine executable code, wherein the machine executable code comprises a nested loop and wherein the nested loop comprises an exterior loop and a virtual interior loop. The exterior loop decomposes the equation into a plurality of loops of length N, wherein N is an integer greater than one. The virtual interior loop executes vector operations corresponding to the N length loop to form a result vector of length N, wherein the virtual interior loop includes one or more vector atomic memory operation (AMO) instructions, used to resolve false conflicts.

Type: Grant

Filed: June 12, 2009

Date of Patent: September 2, 2014

Assignee: Cray Inc.

Inventor: Terry D. Greyzck

Dynamic optimization of mobile services

Patent number: 8813044

Abstract: A method, system, and article of manufacture are disclosed for transforming a definition of a process for delivering a service. This service process definition is comprised of computer readable code. The method comprises the steps of expressing a given set of assumptions in a computer readable code; and transforming said process definition by using a processing unit to apply said assumptions to said process definition to change the configuration of the process definition. The process definition may be transformed by using factors relating to the specific context in or for which the process definition is executed. Also, the process definition may be transformed by identifying, in a flow diagram for the service process definition, flows to which the assumptions apply, and applying program rewriting techniques to those identified flows.

Type: Grant

Filed: September 6, 2012

Date of Patent: August 19, 2014

Assignee: International Business Machines Corporation

Inventors: David F. Bantz, Steven J. Mastrianni, James R. Moulic, Dennis G. Shea

Systems and methods for improved parallel ILU factorization in distributed sparse linear systems

Patent number: 8813053

Abstract: Systems and methods for parallel incomplete LU (ILU) factorization in distributed sparse linear systems, which order nodes underlying the equations in the system(s) by dividing nodes into interior nodes and boundary nodes and assigning no more than three codes to distinguish the boundary nodes. Each code determines an ordering of the nodes, which in turn determines the order in which the equations will be factored and the solution performed.

Type: Grant

Filed: September 25, 2012

Date of Patent: August 19, 2014

Assignee: Landmark Graphics Corporation

Inventors: Qinghua Wang, James William Watts, III

UNROLLING QUANTIFICATIONS TO CONTROL IN-DEGREE AND/OR OUT-DEGREE OF AUTOMATON

Publication number: 20140229926

Abstract: Apparatus, systems, and methods for a compiler are disclosed. One such compiler parses a human readable expression into a syntax tree and converts the syntax tree into an automaton having in-transitions and out-transitions. Converting can include unrolling the quantification as a function of in-degree limitations wherein in-degree limitations includes a limit on the number of transitions into a state of the automaton. The compiler can also convert the automaton into an image for programming a parallel machine, and publishes the image. Additional apparatus, systems, and methods are disclosed.

Type: Application

Filed: April 14, 2014

Publication date: August 14, 2014

Applicant: Micron Technology, Inc.

Inventors: Junjuan Xu, Paul Glendenning

Program generation device, program production method, and program

Patent number: 8806466

Abstract: A program generation apparatus references a source program including a loop for executing a block N times (N≥2) and having such dependence that a variable defined in a statement in the block pertaining to ith execution (1≤i<N) is referenced by a statement in the block pertaining to jth execution (i<j≤N), calculates equivalent representations of variables in the block pertaining to the ith execution and the block pertaining to any other execution than the ith execution, specifies, with respect to each representation of a target variable causing the dependence, a representation of a variable not causing the dependence that is equivalent to the representation of the target variable, and generates a program being for executing the block M times (M≤N) and including a statement including the specified representation in place of each representation of the target variable.

Type: Grant

Filed: July 4, 2011

Date of Patent: August 12, 2014

Assignee: Panasonic Corporation

Inventors: Akira Tanaka, Hiroyuki Morishita, Akihiko Inoue

Parallel execution of a loop

Patent number: 8799629

Abstract: A method of executing a loop over an integer index range of indices in a parallel manner includes assigning a plurality of index subsets of the integer index range to a corresponding plurality of threads, and defining for each index subset a start point of the index subset, an end point of the index subset, and a boundary point of the index subset positioned between the start point and the end point of the index subset. A portion of the index subset between the start point and the boundary point represents a private range and the portion of the index subset between the boundary point and the end point represents a public range. Loop code is executed by each thread based on the index subset of the integer index range assigned to the thread.

Type: Grant

Filed: December 4, 2008

Date of Patent: August 5, 2014

Assignee: Microsoft Corporation

Inventors: Huseyin S. Yildiz, Stephen S. Toub, Paul Ringseth, John Duffy
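
Illustration (not part of the patent record): the abstract for patent 8799629 describes index subsets defined by a start point, an end point, and a boundary point that separates a private range from a public range. The sketch below shows one way to build such subsets. The class and function names, the even split across threads, and the private_fraction parameter are assumptions; the abstract does not say how the boundary is chosen or how the public range is used.

```python
from dataclasses import dataclass


@dataclass
class IndexSubset:
    start: int
    boundary: int
    end: int

    def private_range(self):
        return range(self.start, self.boundary)   # between start and boundary

    def public_range(self):
        return range(self.boundary, self.end)     # between boundary and end


def partition(lo, hi, threads, private_fraction=0.5):
    """Split [lo, hi) into one IndexSubset per thread."""
    subsets = []
    for t in range(threads):
        start = lo + (hi - lo) * t // threads
        end = lo + (hi - lo) * (t + 1) // threads
        boundary = start + int((end - start) * private_fraction)
        subsets.append(IndexSubset(start, boundary, end))
    return subsets
```

For example, partition(0, 100, 4) gives the first thread the subset with start 0, boundary 12, and end 25.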
Program parallelization device and program product

Patent number: 8799881

Abstract: According to one embodiment, a parallelizing unit divides a loop into first and second processes based on a program to be converted and division information. The first and second processes respectively have termination control information, loop control information, and change information. The parallelizing unit inserts into the first process a determination process determining whether the second process is terminated at execution of an (n−1)th iteration of the second process when the second process is subsequent to the first process or determining whether the second process is terminated at execution of an nth iteration of the second process when the second process precedes the first process. The parallelizing unit inserts into the second process a control process controlling execution of the second process based on the result of determination notified by the determination process.

Type: Grant

Filed: July 12, 2011

Date of Patent: August 5, 2014

Assignee: Kabushiki Kaisha Toshiba

Inventors: Nobuaki Tojo, Hidenori Matsuzaki

Loop parallelization based on loop splitting or index array

Patent number: 8793675

Abstract: Methods and apparatus to provide loop parallelization based on loop splitting and/or index array are described. In one embodiment, one or more split loops, corresponding to an original loop, are generated based on the mis-speculation information. In another embodiment, a plurality of subloops are generated from an original loop based on an index array. Other embodiments are also described.

Type: Grant

Filed: December 24, 2010

Date of Patent: July 29, 2014

Assignee: Intel Corporation

Inventors: Jin Lin, Nishkam Ravi, Xinmin Tian, John L. Ng, Renat V. Valiullin

Dynamic optimization of mobile services

Patent number: 8769507

Abstract: A method, system, and article of manufacture are disclosed for transforming a definition of a process for delivering a service on a specified computing device. This service process definition is comprised of computer readable code. The method comprises the steps of expressing a given set of assumptions in a computer readable code; and transforming the definition by using a processing unit to apply the assumptions to the definition of the process to change the way in which the process operates. The definition of the process may be transformed by using factors relating to the specific context in or for which the definition is executed. Also, the definition may be transformed by identifying, in a flow diagram for the process, flows to which the assumptions apply, and applying program rewriting techniques to those identified flows.

Type: Grant

Filed: May 14, 2009

Date of Patent: July 1, 2014

Assignee: International Business Machines Corporation

Inventors: David F. Bantz, Steven J. Mastrianni, James R. Moulic, Dennis G. Shea

Prefetching irregular data references for software controlled caches

Patent number: 8762968

Abstract: Prefetching irregular memory references into a software controlled cache is provided. A compiler analyzes source code to identify at least one of a plurality of loops that contain an irregular memory reference. The compiler determines if the irregular memory reference within the at least one loop is a candidate for optimization. Responsive to an indication that the irregular memory reference may be optimized, the compiler determines if the irregular memory reference is valid for prefetching. Responsive to an indication that the irregular memory reference is valid for prefetching, a store statement for an address of the irregular memory reference is inserted into the at least one loop. A runtime library call is inserted into a prefetch runtime library for the irregular memory reference. Data associated with the irregular memory reference is prefetched into the software controlled cache when the runtime library call is invoked.

Type: Grant

Filed: June 27, 2012

Date of Patent: June 24, 2014

Assignee: International Business Machines Corporation

Inventors: Tong Chen, Marc Gonzalez Tallada, Zehra N. Sura, Tao Zhang

PROCESSORS AND COMPILING METHODS FOR PROCESSORS

Publication number: 20140173575

Abstract: A compiling method compiles an object program to be executed by a processor having a plurality of execution units operable in parallel. In the method a first availability chain is created from a producer instruction (p1), scheduled for execution by a first one of the execution units (20: AGU), to a first consumer instruction (c1), scheduled for execution by a second one of the execution units (22: EXU) and requiring a value produced by the said producer instruction. The first availability chain comprises at least one move instruction (mv1-mv3) for moving the required value from a first point (20: ARF) accessible by the first execution unit to a second point (22: DRF) accessible by the second execution unit.

Type: Application

Filed: February 20, 2014

Publication date: June 19, 2014

Applicant: Altera Corporation

Inventors: Marcio Merino Fernandes, Raymond Malcolm Livesley

Generating predicate values based on conditional data dependency in vector processors

Patent number: 8745360

Abstract: Embodiments of a method for performing parallel operations in a computer system when one or more conditional dependencies may be present, where a given conditional dependency includes a dependency associated with at least two data elements based on a pair of conditions. During operation, a processor receives instructions for generating one or more predicate values based on actual dependencies, where a given predicate value indicates data elements that may be safely evaluated in parallel, and where the given actual dependency occurs when the pair of conditions matches one or more criteria. Then, the processor executes the instructions for generating the one or more predicate values.

Type: Grant

Filed: September 24, 2008

Date of Patent: June 3, 2014

Assignee: Apple Inc.

Inventors: Jeffry E. Gonion, Keith E. Diefendorff

Reducing branch misprediction impact in nested loop code

Patent number: 8745607

Abstract: According to one aspect of the present disclosure, a method and technique for reducing branch misprediction impact for nested loop code is disclosed. The method includes: responsive to identifying code having an outer loop and an inner loop, determining a quantity of iterations of the inner loop for an initial number of iterations of the outer loop; determining a number of processor cycles for executing the quantity of iterations of the inner loop for the initial number of iterations of the outer loop; determining whether the number of processor cycles is less than a threshold; and responsive to determining that the number of processor cycles is less than the threshold, fully unrolling the inner loop for the initial number of iterations of the outer loop.

Type: Grant

Filed: November 11, 2011

Date of Patent: June 3, 2014

Assignee: International Business Machines Corporation

Inventors: Madhavi G. Valluri, Steven W. White
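
Illustration (not part of the patent record): the abstract for patent 8745607 describes fully unrolling an inner loop when the processor cycles needed for its iterations, over an initial number of outer-loop iterations, fall below a threshold. The sketch below expresses only that decision. The function name and the flat cycles-per-iteration cost model are assumptions.

```python
def should_fully_unroll(inner_trip_counts, cycles_per_inner_iteration, cycle_threshold):
    """Decide whether to fully unroll the inner loop.

    inner_trip_counts holds the inner-loop iteration count for each of the
    initial outer-loop iterations being considered.
    """
    total_iterations = sum(inner_trip_counts)
    cycles = total_iterations * cycles_per_inner_iteration   # assumed linear cost
    return cycles < cycle_threshold
```

With trip counts [1, 2, 3] and 4 cycles per iteration, the estimate is 24 cycles, so a threshold of 32 permits unrolling and a threshold of 24 does not.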
Method and system for implementing parallel execution in a computing system and in a circuit simulator

Patent number: 8738348

Abstract: A method and mechanism for implementing a general purpose scripting language that supports parallel execution is described. In one approach, parallel execution is provided in a seamless and high-level approach rather than requiring or expecting a user to have low-level programming expertise with parallel processing languages/functions. Also described is a system and method for performing circuit simulation. The present approach provides methods and systems that create reusable and independent measurements for use with circuit simulators. Also disclosed are parallelizable measurements having looping constructs that can be run without interference between parallel iterations. Reusability is enhanced by having parameterized measurements. Revisions and history of the operating parameters of circuit designs subject to simulation are tracked.

Type: Grant

Filed: June 15, 2012

Date of Patent: May 27, 2014

Assignee: Cadence Design Systems, Inc.

Inventor: Kenneth S. Kundert

Markup language schema error correction

Patent number: 8739026

Abstract: The following is iteratively performed a number of times. Whether the markup language schema has an error is determined. Where the markup language schema has an error, the markup language schema is modified to attempt to correct the error.

Type: Grant

Filed: October 23, 2011

Date of Patent: May 27, 2014

Assignee: Hewlett-Packard Development Company, L.P.

Inventor: Yaron Naveh

Parallelizing non-countable loops with hardware transactional memory

Patent number: 8739141

Abstract: A system and method for speculatively parallelizing non-countable loops in a multi-threaded application. A multi-core processor receives instructions for a multi-threaded application. The application may contain non-countable loops. Non-countable loops have an iteration count value that cannot be determined prior to the execution of the non-countable loop, a loop index value that cannot be non-speculatively determined prior to the execution of an iteration of the non-countable loop, and control that is not transferred out of the loop body by a code line in the loop body. The compiler replaces the non-countable loop with a parallelized loop pattern that uses outlined function calls defined in a parallelization library (PL) in order to speculatively execute iterations of the parallelized loop. The parallelized loop pattern is configured to squash and re-execute any speculative thread of the parallelized loop pattern that is signaled to have a transaction failure.

Type: Grant

Filed: May 19, 2008

Date of Patent: May 27, 2014

Assignee: Oracle America, Inc.

Inventors: Yonghong Song, Spiros Kalogeropulos, Partha P. Tirumalai

Loop transformation for computer compiler optimization

Patent number: 8732679

Abstract: A new computer-compiler architecture includes code analysis processes in which loops present in an intermediate instruction set are transformed into more efficient loops prior to fully executing the intermediate instruction set. The compiler architecture starts by generating the equivalent intermediate instructions for the original high level source code. For each loop in the intermediate instructions, a total cycle cost is calculated using a cycle cost table associated with the compiler. The compiler then generates intermediate code for replacement loops in which all conversion instructions are removed. The cycle costs for these new transformed loops are then compared against the total cycle cost for the original loops. If the total cycle costs exceed the new cycle costs, the compiler will replace the original loops in the intermediate instructions with the new transformed loops prior to generation of final code using the instruction set of the processor.

Type: Grant

Filed: March 16, 2010

Date of Patent: May 20, 2014

Assignee: QUALCOMM Incorporated

Inventors: Sumesh Udayakumaran, Chihong Zhang
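
Illustration (not part of the patent record): the abstract for patent 8732679 describes comparing the total cycle cost of an original loop against a replacement loop with conversion instructions removed, using a cycle cost table. The sketch below mirrors that comparison on a toy instruction list. The opcode names, the cost values, and the function names are hypothetical, and the code does not check whether removing a conversion preserves program semantics.

```python
# Hypothetical cycle cost table; real values depend on the target processor.
CYCLE_COST = {"load": 2, "add": 1, "convert": 3, "store": 2}


def loop_cost(instructions, trip_count):
    """Total cycle cost of a loop body executed trip_count times."""
    return trip_count * sum(CYCLE_COST[op] for op in instructions)


def choose_loop(original, trip_count):
    """Return the cheaper of the original body and a conversion-free body."""
    transformed = [op for op in original if op != "convert"]
    if loop_cost(original, trip_count) > loop_cost(transformed, trip_count):
        return transformed
    return original
```

A loop body with no conversion instructions costs the same either way, so the original is kept.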
Type inference and type-directed late binding

Patent number: 8732732

Abstract: Systems and methods that enhance and balance a late binding and an early binding in a programming language, via supplying an option component to opt-in (or opt-out) late binding, and wherein a late binding is triggered based on a static type for the variable (e.g., object or a type/string.) Additionally, the variable is enabled to have different static types at different regions (e.g., a program fragment) of the programming language.

Type: Grant

Filed: June 25, 2013

Date of Patent: May 20, 2014

Assignee: Microsoft Corporation

Inventors: Henricus Johannes Maria Meijer, Brian C. Beckman, David N. Schach, Amanda Silver, Paul A. Vick, Peter F. Drayton, Avner Y. Aharoni, Ralf Lammel

Bootup method and device for application program in mobile equipment

Patent number: 8726249

Abstract: A bootup device and method for an application program on a mobile equipment to improve the bootup speed of the application program on the mobile equipment. The bootup device has an application management module, that boots up a virtual machine module based on the application program to be run. A virtual machine module, loads codes of the application program and Just in Time (JIT) compilation results of a bootup process of the application program into a memory, search, in the JIT compilation results, for local JIT compiled codes corresponding to the bootup process code segment to be executed, and executes the found local JIT compiled codes when executing each bootup process code segment of the application program. A storage management module, store and reads the codes of the application program and the JIT compilation results obtained from the JIT compilation of the bootup process of the application program.

Type: Grant

Filed: February 21, 2011

Date of Patent: May 13, 2014

Assignee: ZTE Corporation

Inventors: Youpeng Gu, Lifeng Xu, Wei Hu, Sheng Zhong, Wei Wang, Zemin Wang

Unrolling quantifications to control in-degree and/or out-degree of automaton

Patent number: 8726256

Abstract: Apparatus, systems, and methods for a compiler are disclosed. One such compiler parses a human readable expression into a syntax tree and converts the syntax tree into an automaton having in-transitions and out-transitions. Converting can include unrolling a quantification as a function of in-degree limitations, wherein the in-degree limitations include a limit on the number of transitions into a state of the automaton. The compiler can also convert the automaton into an image for programming a parallel machine and publish the image. Additional apparatus, systems, and methods are disclosed.

Type: Grant

Filed: January 24, 2012

Date of Patent: May 13, 2014

Assignee: Micron Technology, Inc.

Inventors: Junjuan Xu, Paul Glendenning
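To illustrate what unrolling a quantification means in the abstract of patent 8726256, a counted quantifier such as a{2,4} can be expanded into mandatory copies followed by nested optional copies. The following is only a minimal sketch on regular-expression strings: the helper name `unroll` is hypothetical, and the in-degree limits and the automaton conversion described in the abstract are not modeled.

```python
import re


def unroll(atom: str, lo: int, hi: int) -> str:
    """Expand atom{lo,hi} into an equivalent pattern with no counted quantifier.

    Hypothetical helper for illustration; not taken from the patent.
    """
    if not 0 <= lo <= hi:
        raise ValueError("need 0 <= lo <= hi")
    group = f"(?:{atom})"
    # Mandatory part: lo back-to-back copies of the atom.
    required = group * lo
    # Optional part: nested optional copies, one per extra repetition allowed.
    optional = ""
    for _ in range(hi - lo):
        optional = f"(?:{group}{optional})?"
    return required + optional


pattern = unroll("a", 2, 4)
# The unrolled pattern accepts exactly the strings that a{2,4} accepts.
accepted = [n for n in range(7) if re.fullmatch(pattern, "a" * n)]
```

Each nested optional group corresponds to one additional state in an unrolled automaton, which is why the amount of unrolling affects how many transitions enter a given state.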
Pipelined loop parallelization with pre-computations

Patent number: 8726251

Abstract: Embodiments of the invention provide systems and methods for automatically parallelizing loops with non-speculative pipelined execution of chunks of iterations with pre-computation of selected values. Non-DOALL loops are identified and divided into chunks. The chunks are assigned to separate logical threads, which may be further assigned to hardware threads. As a thread performs its runtime computations, subsequent threads attempt to pre-compute their respective chunks of the loop. These pre-computations may result in a set of assumed initial values and pre-computed final variable values associated with each chunk. As subsequent pre-computed chunks are reached at runtime, those assumed initial values can be verified to determine whether to proceed with runtime computation of the chunk or to avoid runtime execution and instead use the pre-computed final variable values.

Type: Grant

Filed: March 29, 2011

Date of Patent: May 13, 2014

Assignee: Oracle International Corporation

Inventors: Spiros Kalogeropulos, Partha Pal Tirumalai
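The verify-or-recompute step in the abstract of patent 8726251 can be sketched in a few lines. This is a single-threaded simulation under stated assumptions: the loop body `step`, the guessed initial value `guess`, and the function names are all hypothetical, and the pre-computation that the patent assigns to later threads is simply done up front here.

```python
def step(state: int, x: int) -> int:
    # Hypothetical loop body with a loop-carried dependency (running maximum).
    return max(state, x)


def run_chunk(init: int, chunk: list) -> int:
    state = init
    for x in chunk:
        state = step(state, x)
    return state


def pipelined(data: list, chunk_size: int, init: int, guess: int):
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    # Pre-computation: each chunk is run from an assumed initial value,
    # yielding (assumed initial value, pre-computed final value).
    precomputed = [(guess, run_chunk(guess, c)) for c in chunks]
    state, reused = init, 0
    for chunk, (assumed, final) in zip(chunks, precomputed):
        if assumed == state:
            # Assumption verified: skip runtime execution of this chunk.
            state = final
            reused += 1
        else:
            # Assumption failed: fall back to runtime computation.
            state = run_chunk(state, chunk)
    return state, reused


data = [9, 1, 2, 3, 4, 5, 6, 7]
result, reused = pipelined(data, chunk_size=2, init=0, guess=9)
```

In this example the first chunk must be executed at runtime because the guess does not match the real initial value, while the remaining three chunks reuse their pre-computed results.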
Speculative multi-threading for instruction prefetch and/or trace pre-build

Patent number: 8719806

Abstract: The latencies associated with retrieving instruction information for a main thread are decreased through the use of a simultaneous helper thread. The helper thread is a speculative prefetch thread to perform instruction prefetch and/or trace pre-build for the main thread.

Type: Grant

Filed: September 10, 2010

Date of Patent: May 6, 2014

Assignee: Intel Corporation

Inventors: Hong Wang, Tor M. Aamodt, Pedro Marcuello, Jared W. Stark, IV, John P. Shen, Antonio Gonzalez, Per Hammarlund, Gerolf F. Hoflehner, Perry H. Wang, Steve Shih-wei Liao
Performing parallel processing of distributed arrays

Patent number: 8707281

Abstract: One or more computer-readable media store executable instructions that, when executed by processing logic, perform parallel processing. The media store one or more instructions for initiating a single programming language, and identifying, via the single programming language, one or more data distribution schemes for executing a program. The media also store one or more instructions for transforming, via the single programming language, the program into a parallel program with an optimum data distribution scheme selected from the one or more identified data distribution schemes, and allocating the parallel program to two or more labs for parallel execution. The media further store one or more instructions for receiving one or more results associated with the parallel execution of the parallel program from the two or more labs, and providing the one or more results to the program.

Type: Grant

Filed: July 23, 2012

Date of Patent: April 22, 2014

Assignee: The MathWorks, Inc.

Inventors: Piotr R. Luszczek, John N. Little, Jocelyn Luke Martin, Halldor N. Stefansson, Edric Ellis, Penelope L. Anderson, Brett Baker, Loren Dean, Roy E. Lurie
Management of composite software services

Patent number: 8677318

Abstract: A computer implemented method, data processing system, computer usable program code, and active repository are provided for management of a software service. A request is received to deploy the software service in a computer network. A dependency analysis is performed for the requested software service to determine component software services and physical resources necessary to deploy and manage a new software service as a composite, in response to the software service being the new software service. An active object is created to manage the new software service using an active template based on the analysis. The new software service is deployed in the computer network using the active object. The new software service is managed using the active object.

Type: Grant

Filed: March 24, 2008

Date of Patent: March 18, 2014

Assignee: International Business Machines Corporation

Inventors: Ajay Mohindra, Vijay K. Naik
Performing on-stack replacement for outermost loops

Patent number: 8677335

Abstract: Disclosed herein are methods and systems for using on stack replacement for optimization of software. A source code is compiled into an unoptimized code on a computing device. The unoptimized code is then executed on a computing device. A hot count is incremented. It is then determined whether a function within the unoptimized code is hot. If a function is determined to be hot, an OSR triggering code is inserted at a back edge of each loop within the function. The OSR triggering code is configured to trigger OSR at a loop depth that is less than the hot count.

Type: Grant

Filed: December 6, 2011

Date of Patent: March 18, 2014

Assignee: Google Inc.

Inventors: Kevin Millikin, Mads Sig Ager, Kasper Verdich Lund, Florian Schneider
Processors and compiling methods for processors

Patent number: 8677330

Abstract: A compiling method compiles an object program to be executed by a processor having a plurality of execution units operable in parallel. In the method a first availability chain is created from a producer instruction (p1), scheduled for execution by a first one of the execution units (20: AGU), to a first consumer instruction (c1), scheduled for execution by a second one of the execution units (22: EXU) and requiring a value produced by the said producer instruction. The first availability chain comprises at least one move instruction (mv1-mv3) for moving the required value from a first point (20: ARF) accessible by the first execution unit to a second point (22: DRF) accessible by the second execution unit.

Type: Grant

Filed: June 9, 2010

Date of Patent: March 18, 2014

Assignee: Altera Corporation

Inventors: Marcio Merino Fernandes, Raymond Malcolm Livesley
Executing multiple child code blocks via a single compiled parent code block

Patent number: 8677332

Abstract: Systems and methods for compiling one or more code blocks written in programming language are provided. In some aspects, display associated with application is provided. Display includes plurality of graphical objects. That each of plurality of graphical objects is associated with child code block in one-to-one association between graphical objects and child code blocks is determined. Each child code block is written in programming language. The child code blocks associated with plurality of graphical objects are transformed into single parent code block. Parent code block, upon compiling, is configured to be reused across execution contexts and to allow injection of global scope. Parent code block, upon specific execution, includes execution context for specified child code block. Parent code block is configured to receive indication of specified child code block for initiating execution of parent code block. Parent code block is compiled.

Type: Grant

Filed: July 24, 2012

Date of Patent: March 18, 2014

Assignee: Google Inc.

Inventors: John Hjelmstad, Malte Ubl
Tiling across loop nests with possible recomputation

Patent number: 8671401

Abstract: Described is a technology by which a series of loop nests corresponding to source code are detected by a compiler, with the series of loop nests tiled together, (thereby increasing the ratio of cache hits to misses in a multi-processor environment). The compiler transforms the series of loop nests into a plurality of tile loops within a controller loop, including using dependency analysis to determine which results from a tile loop need to be pre-computed before another tile loop. For dependency analysis, the compiler may use a directed acyclic graph as a high-level intermediate representation, and split the graph into sub-graphs each representing an array. The compiler uses descriptors processed from the graph to determine the controller loop and the tile loops within that controller loop.

Type: Grant

Filed: April 9, 2007

Date of Patent: March 11, 2014

Assignee: Microsoft Corporation

Inventors: Siddhartha Puri, Jaydeep P. Marathe
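As a rough illustration of the idea in the abstract of patent 8671401, the sketch below tiles two dependent loop nests together under one controller loop. The arrays, the tile size, and the function names are assumptions made for the example; the patent's dependency analysis over a directed acyclic graph is not modeled. The consumer needs one producer value beyond its own tile, so that value is recomputed in neighboring tiles rather than shared.

```python
def untiled(a: list) -> list:
    n = len(a)
    # Loop nest 1: producer.
    b = [2 * a[i] for i in range(n)]
    # Loop nest 2: consumer reads b[i] and b[i + 1].
    return [b[i] + b[i + 1] for i in range(n - 1)]


def tiled(a: list, tile: int = 4) -> list:
    n = len(a)
    c = [0] * max(n - 1, 0)
    for start in range(0, n - 1, tile):  # controller loop over tiles
        end = min(start + tile, n - 1)
        # Tile loop 1: produce only the b values this tile needs. The extra
        # element b[end] is also produced by the next tile (recomputation).
        b_tile = [2 * a[i] for i in range(start, end + 1)]
        # Tile loop 2: consume while b_tile is still small and recently used.
        for i in range(start, end):
            c[i] = b_tile[i - start] + b_tile[i - start + 1]
    return c


values = list(range(10))
same = tiled(values, tile=4) == untiled(values)
```

The tiled version never materializes the full intermediate array, which is the property that improves cache behavior on large inputs.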
Vectorizing combinations of program operations

Patent number: 8640112

Abstract: System and method for vectorizing combinations of program operations. Program code is received that includes a combination of individually vectorizable program portions that collectively implement a first computation. Each individually vectorizable program portion has at least one array input and at least one array output. The combination of individually vectorizable program portions is transformed into a single vectorizable program portion that is or includes a functional composition of the combination of individually vectorizable program portions. Vectorized executable code implementing the first computation is generated based on the single vectorizable program portion. The generated executable code is directed to SIMD (Single-Instruction-Multiple-Data) computing units of a target processor.

Type: Grant

Filed: March 30, 2011

Date of Patent: January 28, 2014

Assignee: National Instruments Corporation

Inventors: Haoran Yi, Brady C. Duggan, Robert E. Dye, Adam L. Bordelon, Jeffrey L. Kodosky
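The functional composition mentioned in the abstract of patent 8640112 can be pictured as fusing several element-wise array operations into one kernel applied in a single pass. The sketch below is plain scalar Python with hypothetical names (`compose`, `separate`, `fused`); it shows only the transformation of the program structure, not the generation of SIMD code.

```python
from functools import reduce


def compose(*fs):
    """Return a function that applies fs left to right to one element."""
    return lambda x: reduce(lambda acc, f: f(acc), fs, x)


def scale(x: float) -> float:
    return x * 3.0


def offset(x: float) -> float:
    return x + 1.0


def separate(xs: list) -> list:
    # Two individually vectorizable portions, each array in and array out,
    # with a temporary array between them.
    tmp = [scale(x) for x in xs]
    return [offset(t) for t in tmp]


def fused(xs: list) -> list:
    # One vectorizable portion: the composition applied in a single pass.
    kernel = compose(scale, offset)
    return [kernel(x) for x in xs]


xs = [0.0, 1.0, 2.5]
equal = separate(xs) == fused(xs)
```

A single fused loop gives a code generator one loop body to map onto vector units and removes the intermediate array.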
Dynamic optimization using a resource cost registry

Patent number: 8635606

Abstract: Technologies are generally described for runtime optimization adjusted dynamically according to changing costs of one or more system resources. Multicore systems may encounter dynamic variations in performance associated with the relative cost of related system resources. Furthermore, multicore systems can experience dramatic variations in resource availability and costs. A dynamic registry of system resource costs can be utilized to guide dynamic optimization. The relative scarcity of each resource can be updated dynamically within the registry of system resource costs. A runtime code generating loader and optimizer may be adapted to adjust optimization according to the resource cost registry. Information regarding system resource costs can support optimization tradeoffs based on resource cost functions.

Type: Grant

Filed: October 13, 2009

Date of Patent: January 21, 2014

Assignee: Empire Technology Development LLC

Inventor: Ezekiel John Joseph Kruglick
Method and System for Automated Improvement of Parallelism in Program Compilation

Publication number: 20140019949

Abstract: A method of program compilation to improve parallelism during the linking of the program by a compiler. The method includes converting statements of the program to canonical form, constructing an abstract syntax tree (AST) for each procedure in the program, and traversing the program to construct a graph by making each non-control flow statement and each control structure into at least one node of the graph.

Type: Application

Filed: June 27, 2013

Publication date: January 16, 2014

Inventor: Loring Craymer
Parallel dynamic optimization

Patent number: 8627300

Abstract: Technologies are generally described for parallel dynamic optimization using multicore processors. A runtime compiler may be adapted to generate multiple instances of executable code from a portable intermediate software module. The various instances of executable code may be generated with variations of optimization parameters such that the code instances each express different optimization attempts. A multicore processor may be leveraged to simultaneously execute some, or all, of the various code instances. Preferred optimization parameters may be determined from the executable code instances that may correctly complete in the least time, or may use the least amount of memory, or that may prove superior according to some other fitness metric. Preferred optimization parameters may be used to seed future optimization attempts. Output generated from the preferred instances may be used as soon as the first instance correctly completes block.

Type: Grant

Filed: October 13, 2009

Date of Patent: January 7, 2014

Assignee: Empire Technology Development LLC

Inventor: Ezekiel John Joseph Kruglick
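The abstract of patent 8627300 describes running several differently optimized instances of the same code at once and using the output of the first one to complete correctly. A minimal sketch of that race follows, with assumptions stated plainly: the three variants and the `race` function are hypothetical, Python threads stand in for processor cores, and "correctly completes" is reduced to "finishes without raising an exception".

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def variant_loop(n: int) -> int:
    total = 0
    for i in range(n):
        total += i * i
    return total


def variant_builtin(n: int) -> int:
    return sum(i * i for i in range(n))


def variant_closed_form(n: int) -> int:
    # Sum of squares 0^2 + ... + (n-1)^2 in closed form.
    return (n - 1) * n * (2 * n - 1) // 6


VARIANTS = {
    "loop": variant_loop,
    "builtin": variant_builtin,
    "closed_form": variant_closed_form,
}


def race(n: int):
    """Run every variant concurrently; return (winner name, its output)."""
    with ThreadPoolExecutor(max_workers=len(VARIANTS)) as pool:
        futures = {pool.submit(fn, n): name for name, fn in VARIANTS.items()}
        for fut in as_completed(futures):
            if fut.exception() is None:
                return futures[fut], fut.result()
    raise RuntimeError("no variant completed")


winner, output = race(1000)
```

The winner's name could then be recorded as the preferred setting to seed later optimization attempts, as the abstract suggests. Which variant wins depends on scheduling, so only the output value is deterministic.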
Staged loop instructions

Publication number: 20140007061

Abstract: Loop instructions are analyzed and assigned stage numbers based on dependencies between them and machine resources available. The loop instructions are selectively executed based on their stage numbers, thereby eliminating the need for explicit loop set-up and tear-down instructions. On a Single Instruction, Multiple Data machine, the final instance of each instruction may be executed on a subset of the processing elements or vector elements, dependent on the number of iterations of the original loop.

Type: Application

Filed: June 29, 2012

Publication date: January 2, 2014

Applicant: Analog Devices, Inc.

Inventors: Michael G. Perkins, Andrew J. Higham
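The stage-number scheme in the abstract of publication 20140007061 can be sketched as a single loop in which each instruction runs only when its stage is active, so no separate set-up or tear-down code is needed. The three-instruction body (load, multiply, store), the stage assignment, and the function name below are assumptions for illustration; the partial execution of the final instance on a SIMD machine is not modeled.

```python
def staged_loop(src: list, scale: int) -> list:
    n = len(src)
    loaded = [None] * n
    product = [None] * n
    out = [None] * n

    def load(i):
        loaded[i] = src[i]

    def multiply(i):
        product[i] = loaded[i] * scale

    def store(i):
        out[i] = product[i]

    # (stage number, instruction): a later stage depends on an earlier one.
    instructions = [(0, load), (1, multiply), (2, store)]
    stages = 3
    # One loop of n + stages - 1 trips replaces prologue, kernel and epilogue.
    for trip in range(n + stages - 1):
        for stage, op in instructions:
            i = trip - stage  # original iteration this instruction works on
            if 0 <= i < n:    # instruction is selectively executed
                op(i)
    return out


result = staged_loop([1, 2, 3], scale=10)
```

In the first trips only the low-numbered stages fire, and in the last trips only the high-numbered ones do, which reproduces the fill and drain phases of a software-pipelined loop.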
Methods and apparatuses for compiler-creating helper threads for multi-threading

Patent number: 8612949

Abstract: Methods and apparatuses for compiler-created helper threads for multi-threading are described herein. In one embodiment, an exemplary process includes identifying a region of a main thread that likely has one or more delinquent loads, the one or more delinquent loads representing loads which likely suffer cache misses during an execution of the main thread, analyzing the region for one or more helper threads with respect to the main thread, and generating code for the one or more helper threads, the one or more helper threads being speculatively executed in parallel with the main thread to perform one or more tasks for the region of the main thread. Other methods and apparatuses are also described.

Type: Grant

Filed: December 31, 2009

Date of Patent: December 17, 2013

Assignee: Intel Corporation

Inventors: Shih-wei Liao, Xinmin Tian, Gerolf F. Hoflehner, Hong Wang, Daniel M. Lavery, Perry Wang, Dongkeun Kim, Milind Girkar, John P. Shen
Control structure refinement of loops using static analysis

Patent number: 8601459

Abstract: A system and method for discovering a set of possible iteration sequences for a given loop in a software program is described, to transform the loop representation. In a program containing a loop, the loop is partitioned into a plurality of portions based on splitting criteria. Labels are associated with the portions, and an initial loop automaton is constructed that represents the loop iterations as a regular language over the labels corresponding to the portions in the program. Subsequences of the labels are analyzed to determine infeasibility of the subsequences permitted in the automaton. The automaton is refined by removing all infeasible subsequences to discover a set of possible iteration sequences in the loop. The resulting loop automaton is used in a subsequent program verification or analysis technique to find violations of correctness properties in programs.

Type: Grant

Filed: April 9, 2013

Date of Patent: December 3, 2013

Assignee: NEC Laboratories America, Inc.

Inventors: Sriram Sankaranarayanan, Aarti Gupta, Gogul Balakrishnan
Simplifying determination of whether application specific parameters are setup for optimal performance of associated applications

Patent number: 8572593

Abstract: Simplifying determination of whether application specific parameters are setup for optimal performance of associated applications. In an embodiment, a monitor program associated with an application specific parameter is identified and executed to cause retrieval of a current value of the parameter. The retrieved current value is then compared with a recommended value for the parameter to determine whether the parameter is setup for optimal performance of the application. The result of comparison may be displayed to the user. Another aspect provides for downloading of the recommended values and the monitor programs associated with application specific parameters from an external system (such as a vendor system). One more aspect enables the user to execute a correction program to correct the value of the parameter for optimal performance of the application.

Type: Grant

Filed: September 11, 2007

Date of Patent: October 29, 2013

Assignee: Oracle International Corporation

Inventor: Venkata Naga Ravikiran Vedula
Parallel programming using possible parallel regions and its language profiling compiler, run-time system and debugging support

Patent number: 8549499

Abstract: A method of dynamic parallelization for programs in systems having at least two processors includes examining computer code of a program to be performed by the system, determining a largest possible parallel region in the computer code, classifying data to be used by the program based on a usage pattern and initiating multiple, concurrent processes to perform the program. The multiple, concurrent processes ensure a baseline performance that is at least as efficient as a sequential performance of the computer code.

Type: Grant

Filed: June 18, 2007

Date of Patent: October 1, 2013

Assignee: University of Rochester

Inventors: Chen Ding, Xipeng Shen, Ruke Huang
Framework for generating mixed-mode operations in loop-level simdization

Patent number: 8549501

Abstract: Generating mixed-mode operations in the compilation of program code for processors having vector or SIMD processing units is disclosed. In a preferred embodiment of the present invention, program instructions making up the body of a loop are abstracted into virtual vector instructions. These virtual vector instructions are treated, for initial code optimization purposes, as vector instructions (i.e., instructions written for the vector unit). The virtual vector instructions are eventually expanded into native code for the target processor, at which time a determination is made for each virtual vector instruction as to whether to expand the virtual vector instruction into native vector instructions, into native scalar instructions, into calls to pre-defined library functions, or into a combination of these. A cost model is used to determine the optimal choice of expansion based on hardware/software constraints, performance costs/benefits, and other criteria.

Type: Grant

Filed: August 16, 2004

Date of Patent: October 1, 2013

Assignee: International Business Machines Corporation

Inventors: Alexandre E. Eichenberger, Kai-Ting Amy Wang, Peng Wu
