Including Scheduling Instructions Patents (Class 717/161)
-
Patent number: 8918770
Abstract: A system and method for compiling includes, for a parallelizable code portion of an application stored on a computer readable storage medium, determining one or more variables that are to be transferred to and/or from a coprocessor if the parallelizable code portion were to be offloaded. A start location and an end location are determined for at least one of the one or more variables as a size in memory. The parallelizable code portion is transformed by inserting an offload construct around the parallelizable code portion and passing the one or more variables and the size as arguments of the offload construct such that the parallelizable code portion is offloaded to a coprocessor at runtime.
Type: Grant
Filed: August 24, 2012
Date of Patent: December 23, 2014
Assignee: NEC Laboratories America, Inc.
Inventors: Nishkam Ravi, Tao Bao, Ozcan Ozturk, Srimat Chakradhar
-
Patent number: 8898115
Abstract: A computer-implemented method for using data archiving to expedite server migration may include: 1) archiving data from at least one source computing system to an archiving system in accordance with an archiving policy, 2) altering metadata associated with the archived data on the archiving system so that the metadata references a desired target computing system instead of the source computing system, and then, upon bringing the target computing system online, 3) restoring at least a portion of the archived data from the archiving system to the target computing system. Various other methods, systems, and configured computer-readable media are also disclosed.
Type: Grant
Filed: February 12, 2013
Date of Patent: November 25, 2014
Assignee: Symantec Corporation
Inventors: Laxmikant Gunda, Praveen Rakshe
-
Patent number: 8898648
Abstract: A profiling tool identifies a code region with a false sharing potential. A static analysis tool classifies variables and arrays in the identified code region. A mapping detection library correlates memory access instructions in the identified code region with variables and arrays in the identified code region while a processor is running the identified code region. The mapping detection library identifies one or more instructions at risk, in the identified code region, which are subject to an analysis by a false sharing detection library. A false sharing detection library performs a run-time analysis of the one or more instructions at risk while the processor is re-running the identified code region. The false sharing detection library determines, based on the performed run-time analysis, whether two different portions of the cache memory line are accessed by the generated binary code.
Type: Grant
Filed: November 30, 2012
Date of Patent: November 25, 2014
Assignee: International Business Machines Corporation
Inventors: I-Hsin Chung, Guojing Cong, Hiroki Murata, Yasushi Negishi, Hui-Fang Wen
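Illustration (not taken from the patent): the final check in this abstract, whether two different portions of one cache line are accessed, might be sketched as below. The access-record format, the 64-byte line size, and the function name are assumptions made for this example only.

```python
from collections import defaultdict

CACHE_LINE_SIZE = 64  # assumed line size in bytes


def find_false_sharing(accesses, line_size=CACHE_LINE_SIZE):
    """Return cache-line indices that look falsely shared.

    accesses: iterable of (thread_id, address, is_write) records
    (a hypothetical trace format). A line is flagged when two different
    threads touch different offsets within it and at least one of the
    two accesses is a write.
    """
    by_line = defaultdict(list)
    for thread_id, address, is_write in accesses:
        by_line[address // line_size].append(
            (thread_id, address % line_size, is_write)
        )

    suspects = []
    for line, records in sorted(by_line.items()):
        flagged = False
        for i, (t1, off1, w1) in enumerate(records):
            for t2, off2, w2 in records[i + 1:]:
                if t1 != t2 and off1 != off2 and (w1 or w2):
                    flagged = True
        if flagged:
            suspects.append(line)
    return suspects
```

Two threads writing the same offset are treated here as true sharing and are not flagged; only accesses to distinct portions of a line count.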
-
Patent number: 8893104
Abstract: The aspects enable a computing device to allocate memory space to variables during runtime compilation of a software application. A compiler may be modified to identify operations that can be performed on either a main pipe or an alternative pipe, identify chains of related operations that can be performed on either the main pipe or the alternative pipe, identify points in the execution of code at which the number of live values will exceed the number of registers, and choose a chain of operations as a candidate to be moved to the alternative pipe in order to reduce the number of live values at identified points in the execution of code. The entire chosen chain of operations may be moved to the alternative pipe. The alternative pipe may perform the computations and return the results to the main pipe for execution.
Type: Grant
Filed: March 1, 2012
Date of Patent: November 18, 2014
Assignee: QUALCOMM Incorporated
Inventors: Christopher A. Vick, Gregory M. Wright
-
Patent number: 8893095
Abstract: There are provided methods and computer program products for generating code for an architecture encoding an extended register specification. A method for generating code for a fixed-width instruction set includes identifying a non-contiguous register specifier. The method further includes generating a fixed-width instruction word that includes the non-contiguous register specifier.
Type: Grant
Filed: July 26, 2012
Date of Patent: November 18, 2014
Assignee: International Business Machines Corporation
Inventors: Michael Karl Gschwind, Robert Kevin Montoye, Brett Olsson, John-David Wellman
-
Patent number: 8869129
Abstract: An apparatus and method for scheduling an instruction are provided. The apparatus includes an analyzer configured to analyze dependency of a plurality of recurrence loops and a scheduler configured to schedule the recurrence loops based on the analyzed dependencies. When scheduling a plurality of recurrence loops, the apparatus first schedules a dominant loop whose loop head has no dependency on another loop among the recurrence loops.
Type: Grant
Filed: November 2, 2009
Date of Patent: October 21, 2014
Assignee: Samsung Electronics Co., Ltd.
Inventors: Tae-wook Oh, Won-sub Kim, Bernhard Egger
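Illustration (not taken from the patent): the "dominant loop first" rule can be read as an ordering problem over loop-head dependencies. The sketch below assumes the dependencies are given as a mapping from each loop to the loops its head depends on, and extends the rule to later rounds in topological fashion; both the data layout and that extension are assumptions.

```python
def order_recurrence_loops(head_deps):
    """Order recurrence loops so that dominant loops come first.

    head_deps maps each loop name to the set of loops its loop head
    depends on (hypothetical input format). In each round, the loops
    whose heads have no remaining dependency are scheduled.
    """
    remaining = {loop: set(deps) for loop, deps in head_deps.items()}
    order = []
    while remaining:
        # Dominant loops: heads with no unscheduled dependency on another loop.
        dominant = sorted(loop for loop, deps in remaining.items() if not deps)
        if not dominant:
            raise ValueError("cyclic dependency between loop heads")
        for loop in dominant:
            order.append(loop)
            del remaining[loop]
        for deps in remaining.values():
            deps.difference_update(dominant)
    return order
```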
-
Patent number: 8856313
Abstract: A system and method for determining data usage based on provenance information, in a stream-processing system, includes progressively setting usage information for output stream data objects (SDOs), determining input SDOs that an output SDO depends on, based on a provenance dependency function; recursively feeding back the usage information for a subset of SDOs that can be discarded; and discarding the subset of SDOs. A system and method for data retention based on usage information, in a stream-processing system, includes managing retention of SDOs by deleting SDOs that are determined to be of null usage; and enhancing retention characteristics of SDOs that are deemed to have usage.
Type: Grant
Filed: November 13, 2007
Date of Patent: October 7, 2014
Assignee: International Business Machines Corporation
Inventors: Lisa Amini, Chitra Venkatramani
-
Patent number: 8850410
Abstract: A system and method for improving software maintainability, performance, and/or security by associating a unique marker to each software code-block; the system comprising a plurality of processors, a plurality of code-blocks, and a marker associated with each code-block. The system may also include a special hardware register (code-block marker hardware register) in each processor for identifying the markers of the code-blocks executed by the processor, without changing any of the plurality of code-blocks.
Type: Grant
Filed: January 29, 2010
Date of Patent: September 30, 2014
Assignee: International Business Machines Corporation
Inventors: Ramanjaneya S. Burugula, Joefon Jann, Pratap C. Pattnaik
-
Patent number: 8832669
Abstract: Generating decode time instruction optimization (DTIO) object code that enables a DTIO enabled processor to optimize execution of DTIO instructions. A code sequence configured to facilitate DTIO in a DTIO enabled processor is identified by a computer. The code sequence includes an internal representation (IR) of a first instruction and an IR of a second instruction. The second instruction is dependent on the first instruction. A schedule associated with at least one of the IR of the first instruction and the IR of the second instruction is modified. The modifying includes generating a modified schedule that is configured to place the first instruction next to the second instruction. An object file is generated based on the modified schedule. The object file includes the first instruction placed next to the second instruction. The object file is emitted.
Type: Grant
Filed: August 5, 2013
Date of Patent: September 9, 2014
Assignee: International Business Machines Corporation
Inventors: Robert J. Blainey, Michael K. Gschwind, James L. McInnes, Steven J. Munroe
-
Patent number: 8831791
Abstract: Illustrative embodiments provide a computer implemented method, a data processing system, and a computer program product for adjusting cooling settings. The computer implemented method comprises analyzing a set of instructions of an application to determine a number of degrees by which a set of instructions will raise a temperature of at least one processor core. The computer implemented method further calculates a cooling setting for at least one cooling system for the at least one processor core. The computer implemented method adjusts the at least one cooling system based on the cooling setting. The step of analyzing the set of instructions is performed before the set of instructions is executed on the at least one processor core. The step of adjusting the at least one cooling system is performed before the set of instructions is executed on the at least one processor core.
Type: Grant
Filed: June 21, 2012
Date of Patent: September 9, 2014
Assignee: International Business Machines Corporation
Inventors: Robert L. Angell, David W. Cosby, Robert R. Friedlander, James R. Kraemer
-
Patent number: 8826257
Abstract: A method of memory disambiguation hardware to support software binary translation is provided. This method includes unrolling a set of instructions to be executed within a processor, the set of instructions having a number of memory operations. An original relative order of memory operations is determined. Then, possible reordering problems are detected and identified in software. A reordering problem occurs when a first memory operation has been reordered prior to, and aliases to, a second memory operation with respect to the original order of memory operations. The reordering problem is addressed and a relative order of memory operations is communicated to the processor.
Type: Grant
Filed: March 30, 2012
Date of Patent: September 2, 2014
Assignee: Intel Corporation
Inventors: Muawya M. Al-Otoom, Paul Caprioli, Abhay S. Kanhere, Arvind Krishnaswamy, Omar M. Shaikh
-
Patent number: 8819618
Abstract: A device receives a model that includes model elements scheduled to execute in time slots on a hardware device. The device identifies time slots, of the time slots, that are unoccupied or underutilized by the model elements, and identifies a set of model elements that can be moved to the unoccupied time slots without affecting a behavior of the model. The device calculates a combined execution time of the model elements, determines whether the combined execution time of the model elements is less than or equal to a duration of a first time slot of the time slots, and schedules the model elements for execution in the first time slot when the combined execution time of the model elements is less than or equal to the duration of the first time slot.
Type: Grant
Filed: September 26, 2012
Date of Patent: August 26, 2014
Assignee: The MathWorks, Inc.
Inventors: David MacLay, Matej Urbas
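Illustration (not taken from the patent): the fit test in this abstract reduces to comparing a summed execution time against a slot duration. The sketch below assumes integer microsecond timings and a list of (name, execution time) pairs; the function name and data layout are hypothetical.

```python
def schedule_into_slot(elements, slot_duration_us):
    """Return the element names to place in the slot, or None if they do not fit.

    elements: list of (name, execution_time_us) pairs for model elements
    already known to be movable without changing model behavior.
    """
    combined = sum(exec_time for _, exec_time in elements)
    if combined <= slot_duration_us:
        return [name for name, _ in elements]
    return None
```

Integer microseconds are used so that the less-than-or-equal comparison is exact.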
-
Patent number: 8813044
Abstract: A method, system, and article of manufacture are disclosed for transforming a definition of a process for delivering a service. This service process definition is comprised of computer readable code. The method comprises the steps of expressing a given set of assumptions in a computer readable code; and transforming said process definition by using a processing unit to apply said assumptions to said process definition to change the configuration of the process definition. The process definition may be transformed by using factors relating to the specific context in or for which the process definition is executed. Also, the process definition may be transformed by identifying, in a flow diagram for the service process definition, flows to which the assumptions apply, and applying program rewriting techniques to those identified flows.
Type: Grant
Filed: September 6, 2012
Date of Patent: August 19, 2014
Assignee: International Business Machines Corporation
Inventors: David F. Bantz, Steven J. Mastrianni, James R. Moulic, Dennis G. Shea
-
Patent number: 8806466
Abstract: A program generation apparatus references a source program including a loop for executing a block N times (N≥2) and having such dependence that a variable defined in a statement in the block pertaining to ith execution (1≤i<N) is referenced by a statement in the block pertaining to jth execution (i<j≤N), calculates equivalent representations of variables in the block pertaining to the ith execution and the block pertaining to any other execution than the ith execution, specifies, with respect to each representation of a target variable causing the dependence, a representation of a variable not causing the dependence that is equivalent to the representation of the target variable, and generates a program being for executing the block M times (M?N) and including a statement including the specified representation in place of each representation of the target variable.
Type: Grant
Filed: July 4, 2011
Date of Patent: August 12, 2014
Assignee: Panasonic Corporation
Inventors: Akira Tanaka, Hiroyuki Morishita, Akihiko Inoue
-
Patent number: 8769507
Abstract: A method, system, and article of manufacture are disclosed for transforming a definition of a process for delivering a service on a specified computing device. This service process definition is comprised of computer readable code. The method comprises the steps of expressing a given set of assumptions in a computer readable code; and transforming the definition by using a processing unit to apply the assumptions to the definition of the process to change the way in which the process operates. The definition of the process may be transformed by using factors relating to the specific context in or for which the definition is executed. Also, the definition may be transformed by identifying, in a flow diagram for the process, flows to which the assumptions apply, and applying program rewriting techniques to those identified flows.
Type: Grant
Filed: May 14, 2009
Date of Patent: July 1, 2014
Assignee: International Business Machines Corporation
Inventors: David F. Bantz, Steven J. Mastrianni, James R. Moulic, Dennis G. Shea
-
Patent number: 8752036
Abstract: Embodiments of the invention provide systems and methods for throughput-aware software pipelining in compilers to produce optimal code for single-thread and multi-thread execution on multi-threaded systems. A loop is identified within source code as a candidate for software pipelining. An attempt is made to generate pipelined code (e.g., generate an instruction schedule and a set of register assignments) for the loop in satisfaction of throughput-aware pipelining criteria, like maximum register count, minimum trip count, target core pipeline resource utilization, maximum code size, etc. If the attempt fails to generate code in satisfaction of the criteria, embodiments adjust one or more settings (e.g., by reducing scalarity or latency settings being used to generate the instruction schedule).
Type: Grant
Filed: October 31, 2011
Date of Patent: June 10, 2014
Assignee: Oracle International Corporation
Inventors: Spiros Kalogeropulos, Partha Tirumalai
-
Patent number: 8745608
Abstract: A scheduler of a reconfigurable array, a method of scheduling commands, and a computing apparatus are provided. To perform a loop operation in a reconfigurable array, a recurrence node, a producer node, and a predecessor node are detected from a data flow graph of the loop operation such that resources are assigned to such nodes so as to increase the loop operating speed. Also, a dedicated path having a fixed delay may be added to the assigned resources.
Type: Grant
Filed: February 1, 2010
Date of Patent: June 3, 2014
Assignee: Samsung Electronics Co., Ltd.
Inventors: Won-sub Kim, Tae-wook Oh, Bernhard Egger
-
Patent number: 8738348
Abstract: A method and mechanism for implementing a general purpose scripting language that supports parallel execution is described. In one approach, parallel execution is provided in a seamless and high-level approach rather than requiring or expecting a user to have low-level programming expertise with parallel processing languages/functions. Also described is a system and method for performing circuit simulation. The present approach provides methods and systems that create reusable and independent measurements for use with circuit simulators. Also disclosed are parallelizable measurements having looping constructs that can be run without interference between parallel iterations. Reusability is enhanced by having parameterized measurements. Revisions and history of the operating parameters of circuit designs subject to simulation are tracked.
Type: Grant
Filed: June 15, 2012
Date of Patent: May 27, 2014
Assignee: Cadence Design Systems, Inc.
Inventor: Kenneth S. Kundert
-
Patent number: 8738570
Abstract: A file cloning mechanism allows for quickly creating copies (clones) of files within a filesystem, such as when a user makes a copy of a file. In exemplary embodiments, a clone of a source object is at least initially represented by a structure containing references to various elements of the source object (e.g., indirect onodes, direct onodes, and data blocks). Both read-only and mutable clones can be created. The source file and the clone initially share such elements and continue to share unmodified elements as changes are made to the source file or mutable clone. None of the user data blocks or the metadata blocks describing the data stream (i.e., the indirect/direct onodes) associated with the source file need to be copied at the time the clone is created. At appropriate times, cloned files may be “de-cloned.”
Type: Grant
Filed: November 21, 2011
Date of Patent: May 27, 2014
Assignee: Hitachi Data Systems Engineering UK Limited
Inventors: Daniel J. N. Picken, Neil Berrington
-
Patent number: 8726251
Abstract: Embodiments of the invention provide systems and methods for automatically parallelizing loops with non-speculative pipelined execution of chunks of iterations with pre-computation of selected values. Non-DOALL loops are identified and divided into chunks. The chunks are assigned to separate logical threads, which may be further assigned to hardware threads. As a thread performs its runtime computations, subsequent threads attempt to pre-compute their respective chunks of the loop. These pre-computations may result in a set of assumed initial values and pre-computed final variable values associated with each chunk. As subsequent pre-computed chunks are reached at runtime, those assumed initial values can be verified to determine whether to proceed with runtime computation of the chunk or to avoid runtime execution and instead use the pre-computed final variable values.
Type: Grant
Filed: March 29, 2011
Date of Patent: May 13, 2014
Assignee: Oracle International Corporation
Inventors: Spiros Kalogeropulos, Partha Pal Tirumalai
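Illustration (not taken from the patent): the verify-or-recompute decision for pre-computed chunks can be shown with a sequential simulation. The sketch assumes a loop that carries a single value through a `step` function and a caller-supplied guess for each chunk's initial value; real embodiments would run the pre-computation on separate threads.

```python
def run_with_precompute(chunks, step, initial, guesses):
    """Simulate chunked execution with pre-computed final values.

    chunks:  list of lists of loop inputs, one list per chunk
    step:    function (carried_value, item) -> new carried_value
    guesses: assumed initial carried value for each chunk (hypothetical input)
    Returns (final_value, number_of_chunks_whose_precomputation_was_reused).
    """
    # Pre-computation phase: each chunk is run from its assumed initial value.
    precomputed = []
    for chunk, guess in zip(chunks, guesses):
        value = guess
        for item in chunk:
            value = step(value, item)
        precomputed.append((guess, value))

    # Runtime phase: verify the assumption, then reuse or recompute.
    state = initial
    reused = 0
    for chunk, (assumed, final) in zip(chunks, precomputed):
        if state == assumed:
            state = final  # assumption held: skip runtime execution
            reused += 1
        else:
            for item in chunk:  # assumption failed: compute the chunk now
                state = step(state, item)
    return state, reused
```

With a running maximum as the carried value, a chunk whose guess matches the value actually reaching it is skipped entirely; a wrong guess only costs the normal runtime computation.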
-
Patent number: 8701099
Abstract: A method, a system and a computer program product for effectively accelerating loop iterators using speculative execution of iterators. An Efficient Loop Iterator (ELI) utility detects initiation of a target program and initiates/spawns a speculative iterator thread at the start of the basic code block ahead of the code block that initiates a nested loop. The ELI utility assigns the iterator thread to a dedicated processor in a multi-processor system. The speculative thread runs/executes ahead of the execution of the nested loop and calculates indices in a corresponding multidimensional array. The iterator thread adds all the precomputed indices to a single queue. As a result, the ELI utility effectively enables a multidimensional loop to be replaced by a single dimensional loop. At the beginning of (or during) each iteration of the iterator, the ELI utility “dequeues” an entry from the queue to use the entry to access the array upon which the ELI utility iterates.
Type: Grant
Filed: November 2, 2010
Date of Patent: April 15, 2014
Assignee: International Business Machines Corporation
Inventors: Ganesh Bikshandi, Dibyendu Das, Smruti Ranjan Sarangi
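Illustration (not taken from the patent): the queue-based idea, an iterator thread computing indices ahead of a consumer that runs a single flat loop, might look roughly like this. The two-dimensional shape, the sentinel object, and the function name are assumptions; Python threads only illustrate the structure and would not give the speedup of a dedicated processor.

```python
import queue
import threading


def iterate_with_index_queue(rows, cols, visit):
    """Visit every (i, j) of a rows x cols array through one flat loop.

    A helper thread precomputes the index pairs and enqueues them; the
    consumer dequeues one entry per iteration and calls visit(i, j).
    """
    indices = queue.Queue()
    done = object()  # sentinel marking the end of the index stream

    def produce_indices():
        for i in range(rows):
            for j in range(cols):
                indices.put((i, j))
        indices.put(done)

    producer = threading.Thread(target=produce_indices)
    producer.start()

    # Single-dimensional loop replacing the nested loop.
    while True:
        entry = indices.get()
        if entry is done:
            break
        visit(*entry)
    producer.join()
```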
-
Patent number: 8689202
Abstract: A method of automatically extracting information from an architecture description. A memory resident directed acyclic graph data structure comprising nodes representing instructions and edges whose weights represent dependencies between pairs of instructions is constructed. A list of ready nodes is maintained in the directed acyclic graph. A list of nodes not scheduled is maintained. And, it is determined whether the next instruction to be scheduled is to be taken from the list of ready nodes or from the list of nodes not yet scheduled.
Type: Grant
Filed: March 30, 2005
Date of Patent: April 1, 2014
Assignee: Synopsys, Inc.
Inventors: Gunnar Braun, Andreas Hoffmann, Volker Grieve, Manuel Hohenauer, Rainer Leupers
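Illustration (not taken from the patent): a textbook-style list scheduler over a dependency DAG with a ready list and an unscheduled list. The single-issue machine, the program-order tie-break, and the use of edge weights as latencies in cycles are all assumptions of this sketch.

```python
def list_schedule(nodes, edges):
    """Assign an issue cycle to every instruction of a dependency DAG.

    nodes: instruction names in program order
    edges: dict mapping (pred, succ) to a latency in cycles
    Assumes an acyclic graph and one instruction issued per cycle.
    """
    preds = {n: [] for n in nodes}
    for (pred, succ), latency in edges.items():
        preds[succ].append((pred, latency))

    unscheduled = list(nodes)  # nodes not yet scheduled
    issue = {}                 # node -> issue cycle
    cycle = 0
    while unscheduled:
        # Ready nodes: every predecessor has issued and its latency has elapsed.
        ready = [
            n for n in unscheduled
            if all(p in issue and issue[p] + lat <= cycle for p, lat in preds[n])
        ]
        if ready:
            chosen = ready[0]  # earliest in program order
            issue[chosen] = cycle
            unscheduled.remove(chosen)
        cycle += 1  # an empty ready list means a stall cycle
    return issue
```

In the test case used with this sketch, an independent multiply fills the cycle that an add would otherwise spend waiting on a two-cycle load.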
-
Patent number: 8677338
Abstract: Methods and apparatus for data dependence testing for loop fusion, e.g., with code replication, array contraction, and/or loop interchange, are described. In one embodiment, a compiler may optimize code for efficient execution during run-time by testing for dependencies associated with improving memory locality through code replication in loops that enable various loop transformations. Other embodiments are also described.
Type: Grant
Filed: June 4, 2008
Date of Patent: March 18, 2014
Assignee: Intel Corporation
Inventors: John L. Ng, Rakesh Krishnaiyer, Alexander Y. Ostanevich
-
Patent number: 8635606
Abstract: Technologies are generally described for runtime optimization adjusted dynamically according to changing costs of one or more system resources. Multicore systems may encounter dynamic variations in performance associated with the relative cost of related system resources. Furthermore, multicore systems can experience dramatic variations in resource availability and costs. A dynamic registry of system resource costs can be utilized to guide dynamic optimization. The relative scarcity of each resource can be updated dynamically within the registry of system resource costs. A runtime code generating loader and optimizer may be adapted to adjust optimization according to the resource cost registry. Information regarding system resource costs can support optimization tradeoffs based on resource cost functions.
Type: Grant
Filed: October 13, 2009
Date of Patent: January 21, 2014
Assignee: Empire Technology Development LLC
Inventor: Ezekiel John Joseph Kruglick
-
Patent number: 8635627
Abstract: A method, medium and apparatus for storing and restoring a register context for a fast context switching between tasks is disclosed. The method, medium and apparatus may improve overall operating speed of a system by increasing the speed of context switching. The method may include adding an update code for updating information of live registers to a task file that includes a code of a task to perform a specified function, converting the task file having the update code added thereto into a run file, updating the information of the live registers with the update code during running of the task using the run file, and storing a live register context according to the updated information of the registers.
Type: Grant
Filed: December 12, 2006
Date of Patent: January 21, 2014
Assignee: Samsung Electronics Co., Ltd.
Inventors: Jung-keun Park, Keun-soo Yim, Woon-gee Kim, Jeong-joon Yoo, Kyoung-ho Kang, Chae-seok Im, Jae-don Lee
-
Patent number: 8627300
Abstract: Technologies are generally described for parallel dynamic optimization using multicore processors. A runtime compiler may be adapted to generate multiple instances of executable code from a portable intermediate software module. The various instances of executable code may be generated with variations of optimization parameters such that the code instances each express different optimization attempts. A multicore processor may be leveraged to simultaneously execute some, or all, of the various code instances. Preferred optimization parameters may be determined from the executable code instances that may correctly complete in the least time, or may use the least amount of memory, or that may prove superior according to some other fitness metric. Preferred optimization parameters may be used to seed future optimization attempts. Output generated from the preferred instances may be used as soon as the first instance correctly completes block.
Type: Grant
Filed: October 13, 2009
Date of Patent: January 7, 2014
Assignee: Empire Technology Development LLC
Inventor: Ezekiel John Joseph Kruglick
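Illustration (not taken from the patent): racing several variants of the same computation and taking the first correct result can be sketched with the standard `concurrent.futures` module. The variants here are plain Python callables standing in for differently optimized code instances, and the correctness predicate is supplied by the caller; both are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def first_correct(variants, arg, is_correct):
    """Run all variants concurrently and return (name, result) of the
    first one to finish with a result accepted by is_correct.

    variants: dict mapping a variant name to a callable taking arg.
    Returns (None, None) if no variant produces an accepted result.
    """
    with ThreadPoolExecutor(max_workers=len(variants)) as pool:
        futures = {pool.submit(fn, arg): name for name, fn in variants.items()}
        for future in as_completed(futures):
            result = future.result()
            if is_correct(result):
                return futures[future], result
    return None, None
```

The winning name could then seed later optimization attempts. Note that leaving the `with` block still waits for the slower variants to finish; a real system would cancel or detach them.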
-
Patent number: 8615746
Abstract: Compiling code for an enhanced application binary interface (ABI) including identifying, by a computer, a code sequence configured to perform a variable address reference table function including an access to a variable at an offset outside of a location in a variable address reference table. The code sequence includes an internal representation (IR) of a first instruction and an IR of a second instruction. The second instruction is dependent on the first instruction. A scheduler cost function associated with at least one of the IR of the first instruction and the IR of the second instruction is modified. The modifying includes generating a modified scheduler cost function that is configured to place the first instruction next to the second instruction. An object file is generated responsive to the modified scheduler cost function. The object file includes the first instruction placed next to the second instruction. The object file is emitted.
Type: Grant
Filed: April 30, 2012
Date of Patent: December 24, 2013
Assignee: International Business Machines Corporation
Inventors: Robert J. Blainey, Michael K. Gschwind, James L. McInnes, Steven J. Munroe
-
Patent number: 8615745
Abstract: A code sequence made up of multiple instructions and specifying an offset from a base address is identified in an object file. The offset from the base address corresponds to an offset location in a memory configured for storing an address of a variable or data. The identified code sequence is configured to perform a memory reference function or a memory address computation function. It is determined that the offset location is within a specified distance of the base address and that a replacement of the identified code sequence with a replacement code sequence will not alter program semantics. The identified code sequence in the object file is replaced with the replacement code sequence that includes a no-operation (NOP) instruction or has fewer instructions than the identified code sequence. Linked executable code is generated based on the object file and the linked executable code is emitted.
Type: Grant
Filed: October 3, 2011
Date of Patent: December 24, 2013
Assignee: International Business Machines Corporation
Inventors: Robert J. Blainey, Michael Gschwind, James L. McInnes, Steven J. Munroe
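Illustration (not taken from the patent): a link-time rewrite of a two-instruction address sequence when the offset is close enough to the base. The instruction tuples, the mnemonics `add_high`, `load`, and `nop`, and the signed 16-bit distance are hypothetical choices for this sketch and do not describe any particular instruction set.

```python
def relax_sequence(offset):
    """Return an instruction sequence that loads from base + offset.

    If the offset fits a signed 16-bit displacement, the high-part add is
    unnecessary and is replaced by a NOP, so the sequence keeps its length.
    Otherwise the offset is split into a high part and a sign-extended
    low part such that (hi << 16) + lo == offset.
    """
    if -0x8000 <= offset <= 0x7FFF:
        return [("nop",), ("load", "base", offset)]
    lo = ((offset & 0xFFFF) ^ 0x8000) - 0x8000  # sign-extended low 16 bits
    hi = (offset - lo) >> 16
    return [("add_high", "tmp", "base", hi), ("load", "tmp", lo)]
```

Keeping a NOP in place of the removed instruction preserves code addresses, which is one reason a linker might prefer it over shortening the sequence.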
-
Patent number: 8612958
Abstract: A compiler, which corresponds to a recent processor having a multithread function, that enables execution of efficient instruction scheduling and allows a programmer to control the instruction scheduling includes: an instruction scheduling directive receiving unit which receives, from a programmer, a directive for specifying an instruction scheduling method; and an instruction scheduling unit which executes, conforming to one of instruction scheduling methods, instruction scheduling of rearranging intermediate codes corresponding to the source program. The instruction scheduling unit selects one of instruction scheduling methods according to the directive received by the instruction scheduling directive receiving unit, and executes instruction scheduling conforming to the selected instruction scheduling method.
Type: Grant
Filed: June 17, 2011
Date of Patent: December 17, 2013
Assignee: Panasonic Corporation
Inventors: Taketo Heishi, Shohei Michimoto, Teruo Kawabata
-
Patent number: 8612955
Abstract: A dataflow instruction set architecture and execution model, referred to as WaveScalar, which is designed for scalable, low-complexity/high-performance processors, while efficiently providing traditional memory semantics through a mechanism called wave-ordered memory. Wave-ordered memory enables “real-world” programs, written in any language, to be run on the WaveScalar architecture, as well as any out-of-order execution unit. Because it is software-controlled, wave-ordered memory can be disabled to obtain greater parallelism. WaveScalar also includes a software-controlled tag management system.
Type: Grant
Filed: January 22, 2008
Date of Patent: December 17, 2013
Assignee: University of Washington
Inventors: Mark H. Oskin, Steven J. Swanson, Susan J. Eggers
-
Patent number: 8612957
Abstract: A computer implemented method for scheduling multithreaded programming instructions based on a dependency graph, wherein the dependency graph organizes the programming instructions logically based on blocks, nodes, and super blocks, and wherein programming instructions that could be executed outside of a critical section may be executed outside of the critical section by inserting dependency relationships in the dependency graph.
Type: Grant
Filed: January 26, 2006
Date of Patent: December 17, 2013
Assignee: Intel Corporation
Inventors: Xiaofeng Guo, Jinquan Dai, Long Li
-
Patent number: 8584106
Abstract: A runtime system implemented in accordance with the present invention provides an application platform for parallel-processing computer systems. Such a runtime system enables users to leverage the computational power of parallel-processing computer systems to accelerate/optimize numeric and array-intensive computations in their application programs. This enables greatly increased performance of high-performance computing (HPC) applications.
Type: Grant
Filed: February 9, 2012
Date of Patent: November 12, 2013
Assignee: Google Inc.
Inventors: Matthew N. Papakipos, Brian K. Grant, Christopher G. Demetriou, Morgan S. McGuire
-
Patent number: 8572557
Abstract: To obtain a program code generation support device, method, and the like, capable of generating a new program code, in particular, generating in accordance with an incorporating apparatus, by performing a further optimization on a program code. The device includes storage means for storing, as data, an optimization rule that is composed of a conversion condition for converting data of a program code and a conversion content thereof, and code optimization means that includes a code analysis unit for analyzing the program code, a condition search unit for searching for a part matching the conversion condition in the program code through a collation with the optimization rule stored in the storage means on the basis of the analyzed program code, and an optimization unit for generating data of a new program code by converting the part matching the conversion condition on the basis of the conversion content.
Type: Grant
Filed: August 1, 2011
Date of Patent: October 29, 2013
Assignee: Mitsubishi Electric Corporation
Inventors: Takahiro Ito, Shigeki Suzuki, Yoshiko Ochiai, Noriyuki Kushiro, Yoshiaki Koizumi
-
Patent number: 8561047
Abstract: When a modification is applied to a statically linked executable program file, in the executable program file, an old object is replaced with a new object by adding the new object to a bottom of already-existing objects without changing the location of the old object, and the reference relationship of symbols among objects is updated and resolved and thereby a modification is applied.
Type: Grant
Filed: January 18, 2011
Date of Patent: October 15, 2013
Assignee: Fujitsu Limited
Inventors: Masaki Kobayashi, Masanori Iwazaki
-
Patent number: 8561046
Abstract: A system and method for automatically parallelizing a computer program for multi-threaded execution. A compiler identifies and parallelizes non-DOALL parallel regions, such as loops, within a computer program. The compiler determines enhanced helper thread instructions based upon the main body instructions of the non-DOALL region. These helper thread instructions are inserted ahead of the main body instructions within each of the plurality of threads, rather than within a single main thread. Next, synchronization instructions are inserted in one or more threads such that the main body of work of each thread is performed in a pipelined manner. The helper thread instructions within each thread may reduce the total execution time of each thread.
Type: Grant
Filed: September 14, 2009
Date of Patent: October 15, 2013
Assignee: Oracle America, Inc.
Inventors: Yonghong Song, Spiros Kalogeropulos, Partha P. Tirumalai
-
Patent number: 8555267Abstract: A mechanism for performing register allocation based on priority spills and assignments is disclosed. A method of embodiments of the invention includes repetitively detecting fat points during a compilation process of a software program running on a virtual machine of a computer system, each fat point representing a program point having a high register pressure, where the high register pressure occurs when a number of live program variables of the software program living at a given program point of the software program is greater than a number of available processor registers of the computer system. The method further includes choosing a fat point with a highest register pressure, selecting a live program variable having a lowest priority at the chosen fat point, and spilling the lowest priority live program variable to memory of the computer system.Type: GrantFiled: March 3, 2010Date of Patent: October 8, 2013Assignee: Red Hat, Inc.Inventor: Vladimir Makarov
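The fat-point loop described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `spill_by_fat_points`, the liveness sets, and the priority table are hypothetical, and a spilled variable is simply dropped from every program point, whereas a real allocator would insert loads and stores that leave short live ranges behind.

```python
def spill_by_fat_points(live_sets, priority, num_regs):
    """Spill lowest-priority variables until no program point needs more than num_regs.

    live_sets: list of sets, the variables live at each program point (hypothetical input).
    priority:  dict mapping variable -> number; higher means "keep in a register".
    Returns the spilled variables in the order they were chosen.
    """
    live = [set(s) for s in live_sets]  # work on a copy of the liveness data
    spilled = []
    while True:
        # Register pressure at a point is the number of variables live there.
        pressures = [len(s) for s in live]
        fattest = max(range(len(live)), key=lambda i: pressures[i], default=None)
        if fattest is None or pressures[fattest] <= num_regs:
            return spilled  # no fat point remains
        # Pick the lowest-priority variable at the fattest point (name breaks ties).
        victim = min(live[fattest], key=lambda v: (priority[v], v))
        spilled.append(victim)
        # Simplifying assumption: the spilled variable no longer occupies a register anywhere.
        for s in live:
            s.discard(victim)


live_sets = [{"a", "b"}, {"a", "b", "c"}, {"b", "c", "d"}]
priority = {"a": 3, "b": 1, "c": 2, "d": 4}
spilled = spill_by_fat_points(live_sets, priority, num_regs=2)
```

With two registers, spilling `b` alone is enough here, because it is live at both points where three variables compete.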
-
Patent number: 8549507Abstract: A loop coalescing method and a loop coalescing device are disclosed. The loop coalescing method comprises removing an inner-most loop from among nested loops, so that an outer operation provided outside of the inner-most loop is performed when a condition of a conditional statement is satisfied, generating a guard code by applying an if-conversion method to the conditional statement, and converting a guard by using an instruction calculating the guard of the guard code, the instruction calculating the guard using a register where information related to a period of time corresponding to the number of iterations of the inner-most loop is stored.Type: GrantFiled: August 22, 2007Date of Patent: October 1, 2013Assignee: Samsung Electronics Co., Ltd.Inventors: Hee Seok Kim, Hong-Seok Kim, Chang-Woo Baek, Jeongwook Kim
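As a rough illustration of the idea in this abstract, the sketch below flattens a two-level loop nest into a single loop and runs the outer operation under a guard. The abstract describes computing the guard from a register tied to the inner loop's iteration count; here a plain countdown variable stands in for that register, which is an assumption. The function names are hypothetical, and the sketch assumes the inner trip count `m` is at least 1.

```python
def nested(n, m):
    """Reference form: an inner loop plus an operation in the outer loop body."""
    trace = []
    for i in range(n):
        for j in range(m):
            trace.append(("inner", i, j))
        trace.append(("outer", i))
    return trace


def coalesced(n, m):
    """Single flattened loop; the outer operation runs only when the guard is true."""
    trace = []
    i, j = 0, 0
    remaining = m  # countdown standing in for the iteration-count register
    for _ in range(n * m):
        trace.append(("inner", i, j))
        remaining -= 1
        guard = remaining == 0  # true on the last inner iteration for this i
        if guard:
            trace.append(("outer", i))
            i, j, remaining = i + 1, 0, m
        else:
            j += 1
    return trace
```

Both functions produce the same sequence of operations for `m >= 1`, which is the property a coalescing transformation has to preserve.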
-
Patent number: 8549508Abstract: A mechanism for performing instruction scheduling based on register pressure sensitivity is disclosed. A method of embodiments of the invention includes performing a preliminary register pressure minimization on program points during a compilation process of a software program running on a virtual machine of a computer system. The method further includes calculating a register pressure at each of the program points, detecting an instruction to be scheduled, and performing instruction scheduling of the instruction based on a current register pressure at a current scheduling point and potential register pressures at subsequent scheduling points.Type: GrantFiled: March 3, 2010Date of Patent: October 1, 2013Assignee: Red Hat, Inc.Inventor: Vladimir Makarov
-
Patent number: 8543992Abstract: A method of compiling code that includes partitioning instructions in the code among a plurality of processors based on memory access latency associated with the instructions is disclosed. According to one aspect of the invention, partitioning instructions includes partitioning memory access dependence chains. Other embodiments are described and claimed.Type: GrantFiled: December 17, 2005Date of Patent: September 24, 2013Assignee: Intel CorporationInventors: Xiaodan Jiang, Jinquan Dai
-
Patent number: 8516466Abstract: Various embodiments for optimizing automated system-managed storage (SMS) operations in a computing storage environment. An execution of at least one automatic class selection (ACS) routine is monitored to determine at least one frequently used instruction. The ACS routine is modified for at least one predetermined time interval. The at least one frequently used instruction is moved to a higher execution priority of the modified ACS routine.Type: GrantFiled: June 30, 2010Date of Patent: August 20, 2013Assignee: International Business Machines CorporationInventors: Harold S. Huber, David C. Reed, Max D. Smith
-
Patent number: 8505002Abstract: A data processing system is provided having a processor and analysing circuitry for identifying a SIMD instruction associated with a first SIMD instruction set and replacing it by a functionally-equivalent scalar representation and marking that functionally-equivalent scalar representation. The marked functionally-equivalent scalar representation is dynamically translated using translation circuitry upon execution of the program to generate one or more corresponding translated instructions corresponding to an instruction set architecture different from the first SIMD architecture corresponding to the identified SIMD instruction.Type: GrantFiled: September 27, 2007Date of Patent: August 6, 2013Assignees: ARM Limited, The Regents of the University of MichiganInventors: Sami Yehia, Krisztian Flautner, Nathan Clark, Amir Hormati, Scott Mahlke
-
Patent number: 8473935Abstract: Pre-compiling postdominating functions. Some embodiments may be practiced in a computing environment including a runtime compilation. For example, one method includes acts for compiling functions. The method includes determining that a function of an application has been called. A control flow graph is used to determine one or more postdominance relationships between the function and one or more other functions. The one or more other functions are assigned to be pre-compiled based on the postdominance relationship.Type: GrantFiled: April 21, 2008Date of Patent: June 25, 2013Assignee: Microsoft CorporationInventor: Matthew B. Grice
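A node B postdominates a node A when every path from A to the exit passes through B, so once A has been called, B is certain to run later. The sketch below, in Python, computes postdominator sets with the standard iterative data-flow formulation and uses them to pick pre-compilation candidates. It is only an illustration: the graph, the function names, and the synthetic `"exit"` node are hypothetical, and every node is assumed to reach the exit.

```python
def postdominators(succ, exit_node):
    """Return {node: set of nodes that postdominate it} for a flow graph.

    succ maps each node to a list of its successors.
    """
    nodes = set(succ) | {s for targets in succ.values() for s in targets}
    pdom = {n: set(nodes) for n in nodes}  # start from "everything postdominates"
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            succs = succ.get(n, [])
            # A node is postdominated by itself plus whatever postdominates all successors.
            common = set.intersection(*(pdom[s] for s in succs)) if succs else set()
            new = {n} | common
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom


def functions_to_precompile(succ, exit_node, called):
    """Functions guaranteed to run after `called`, excluding itself and the exit marker."""
    return postdominators(succ, exit_node)[called] - {called, exit_node}


# Hypothetical function-level flow graph: parse branches to fast or slow, both reach emit.
graph = {
    "main": ["parse"],
    "parse": ["fast", "slow"],
    "fast": ["emit"],
    "slow": ["emit"],
    "emit": ["exit"],
    "exit": [],
}
candidates = functions_to_precompile(graph, "exit", "main")
```

In this graph `fast` and `slow` are not candidates after `main` is called, because only one of them runs on any given path.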
-
Patent number: 8458682Abstract: System and method for converting a class oriented data flow program to a structure oriented data flow program. A first data flow program is received, where the first data flow program is an object oriented program comprising instances of one or more classes, and wherein the first data flow program is executable to perform a first function. The first data flow program is automatically converted to a second data flow program, where the second data flow program does not include the instances of the one or more classes, and where the second data flow program is executable to perform the first function. The second data flow program is stored on a computer memory, where the second data flow program is configured to be deployed to a device, e.g., a programmable hardware element, and where the second data flow program is executable on the device to perform the first function.Type: GrantFiled: April 27, 2009Date of Patent: June 4, 2013Assignee: National Instruments CorporationInventors: Stephen R. Mercer, Akash B. Bhakta, Matthew E. Novacek
-
Patent number: 8453134Abstract: Provided are a method, system, and article of manufacture improving data locality and parallelism by code replication and array contraction. Source code including an array of elements referenced using at least two indices is processed. The array is nested within multiple loops, wherein at least two of the loops perform iterations with respect to the indices of the array, wherein the index incremented in at least one innermost loop of the loops does not comprise a leftmost index in the array. The source code is transformed to object code by performing operations including fusing at least two innermost loops of the loops in object code generated by compiling the source code by replicating statements from at least one of the innermost loops into a fused innermost loop and performing loop interchange in the object code to have the fused innermost loop provide iterations with respect to the leftmost index in the array.Type: GrantFiled: June 4, 2008Date of Patent: May 28, 2013Assignee: Intel CorporationInventors: John L. Ng, Alexander Y. Ostanevich, Alexander L. Sushentsov
-
Patent number: 8453135Abstract: A compiler selects a nested loop within software code that includes an outer loop and an inner loop. The outer loop includes an outer induction variable and the inner loop includes an inner induction variable. The compiler identifies a computation included in the nested loop that generates an irregular array access, which includes an expression of both the outer induction variable and the inner induction variable. Next, the compiler identifies a redundant calculation for the computation based upon the outer induction variable and the inner induction variable, and generates a temporary variable to correspond with the redundant calculation. The compiler replaces the computation with the temporary variable in the nested loop and, in turn, compiles the nested loop with the included temporary variable.Type: GrantFiled: March 11, 2010Date of Patent: May 28, 2013Assignee: Freescale Semiconductor, Inc.Inventor: Abderrazek Zaafrani
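One simple instance of replacing a repeated calculation with a temporary is sketched below in Python. The index expression `(i * i) % p + j` is a made-up example of an access that depends on both induction variables, and the function names are hypothetical; the abstract does not say which expressions or redundancy analysis the compiler actually uses. Here the part of the index that depends only on the outer variable is computed once per outer iteration.

```python
def gather_naive(table, n, m, p):
    """Recomputes (i * i) % p on every inner iteration."""
    out = []
    for i in range(n):
        for j in range(m):
            out.append(table[(i * i) % p + j])
    return out


def gather_with_temp(table, n, m, p):
    """Same accesses, with the repeated sub-expression held in a temporary."""
    out = []
    for i in range(n):
        base = (i * i) % p  # temporary: depends only on the outer induction variable
        for j in range(m):
            out.append(table[base + j])
    return out
```

The two versions read the same elements in the same order; the second performs the multiply and modulo `n` times instead of `n * m` times.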
-
Patent number: 8448156Abstract: A runtime system implemented in accordance with the present invention provides an application platform for parallel-processing computer systems. Such a runtime system enables users to leverage the computational power of parallel-processing computer systems to accelerate/optimize numeric and array-intensive computations in their application programs. This enables greatly increased performance of high-performance computing (HPC) applications.Type: GrantFiled: February 27, 2012Date of Patent: May 21, 2013Assignee: Google Inc.Inventors: Christopher G. Demetriou, Matthew N. Papakipos
-
Patent number: 8443349Abstract: A runtime system implemented in accordance with the present invention provides an application platform for parallel-processing computer systems. Such a runtime system enables users to leverage the computational power of parallel-processing computer systems to accelerate/optimize numeric and array-intensive computations in their application programs. This enables greatly increased performance of high-performance computing (HPC) applications.Type: GrantFiled: February 9, 2012Date of Patent: May 14, 2013Assignee: Google Inc.Inventors: Matthew N. Papakipos, Brian K. Grant, Morgan S. McGuire, Christopher G. Demetriou
-
Publication number: 20130111453Abstract: Embodiments of the invention provide systems and methods for throughput-aware software pipelining in compilers to produce optimal code for single-thread and multi-thread execution on multi-threaded systems. A loop is identified within source code as a candidate for software pipelining. An attempt is made to generate pipelined code (e.g., generate an instruction schedule and a set of register assignments) for the loop in satisfaction of throughput-aware pipelining criteria, like maximum register count, minimum trip count, target core pipeline resource utilization, maximum code size, etc. If the attempt fails to generate code in satisfaction of the criteria, embodiments adjust one or more settings (e.g., by reducing scalarity or latency settings being used to generate the instruction schedule).Type: ApplicationFiled: October 31, 2011Publication date: May 2, 2013Applicant: Oracle International CorporationInventors: Spiros Kalogeropulos, Partha Tirumalai
-
Patent number: 8418156Abstract: Generally, the present disclosure provides systems and methods to generate a two-stage commit (TSC) region which has two separate commit stages. Frequently executed code may be identified and combined for the TSC region. Binary optimization operations may be performed on the TSC region to enable the code to run more efficiently by, for example, reordering load and store instructions. In the first stage, load operations in the region may be committed atomically and in the second stage, store operations in the region may be committed atomically.Type: GrantFiled: December 16, 2009Date of Patent: April 9, 2013Assignee: Intel CorporationInventors: Cheng Wang, Youfeng Wu
-
Patent number: RE45199Abstract: A compiler apparatus, which can perform software pipelining optimization that has a considerable effect of reducing the number of execution cycles taken to complete a loop process, converts a source program into a machine program for a processor which is capable of parallel processing. The compiler apparatus is composed of: a parsing unit operable to parse the source program and then to convert the source program into an intermediate program which is described in an intermediate language; an optimization unit operable to optimize the intermediate program; and a conversion unit operable to convert the optimized intermediate program into the machine language program, wherein the optimization unit is operable to execute software pipelining, by inserting a transfer instruction, which is used for transferring data between operands, into a loop process included in the intermediate program so that a data dependence relation is changed.Type: GrantFiled: September 14, 2012Date of Patent: October 14, 2014Assignee: Panasonic CorporationInventors: Shohei Michimoto, Taketo Heishi, Hajime Ogawa, Teruo Kawabata