Including Scheduling Instructions Patents (Class 717/161)
-
Patent number: 7765536
Abstract: A Veil program analyzes the source code and data of a target program and determines how best to distribute the target program and data among the processors of a multi-processor computing system. The Veil program analyzes source code loops, data sizes and types to prepare a set of distribution attempts, whereby each distribution is run under a run-time evaluation wrapper and evaluated to determine the optimal distribution.
Type: Grant
Filed: December 21, 2005
Date of Patent: July 27, 2010
Assignee: Management Services Group, Inc.
Inventors: Robert Stephen Gordy, Terry Spitzer
-
Patent number: 7761667
Abstract: A mechanism is provided that identifies instructions that access storage and may be candidates for cache prefetching. The mechanism augments these instructions so that any given instance of the instruction operates in one of four modes, namely normal, unexecuted, data gathering, and validation. In the normal mode, the instruction merely performs the function specified in the software runtime environment. An instruction in unexecuted mode, upon the next execution, is placed in data gathering mode. When an instruction in the data gathering mode is encountered, the mechanism of the present invention collects data to discover potential fixed storage access patterns. When an instruction is in validation mode, the mechanism of the present invention validates the presumed fixed storage access patterns.
Type: Grant
Filed: August 12, 2008
Date of Patent: July 20, 2010
Assignee: International Business Machines Corporation
Inventors: Christopher Michael Donawa, Allan Henry Kielstra
-
Patent number: 7752611
Abstract: Various embodiments that may be used in performing speculative code motion for memory latency hiding are disclosed. One embodiment comprises extracting an asynchronous signal from a memory access instruction in a program to represent a latency of the memory access instruction, and generating a wait instruction to wait for the asynchronous signal.
Type: Grant
Filed: December 10, 2005
Date of Patent: July 6, 2010
Assignee: Intel Corporation
Inventors: Long Li, Jinquan Dai, Zhiyuan Lv
-
Patent number: 7747993
Abstract: A method of ordering instructions. The method can include placing a first instruction that consumes a value of an object before a second instruction that produces the value of the object such that the first instruction is processed before the second instruction and a physical location is allocated to the value of the object upon processing the first instruction.
Type: Grant
Filed: December 30, 2004
Date of Patent: June 29, 2010
Assignee: Michigan Technological University
Inventor: Soner Onder
-
Patent number: 7730470
Abstract: A system for binary code instrumentation to reduce effective memory latency comprises a processor and memory coupled to the processor. The memory comprises program instructions executable by the processor to implement a code analyzer configured to analyze an instruction stream of compiled code executable at an execution engine to identify, for a given memory reference instruction in the stream that references data at a memory address calculated during an execution of the instruction stream, an earliest point in time during the execution at which sufficient data is available at the execution engine to calculate the memory address. The code analyzer generates an indication of whether the given memory reference instruction is suitable for a prefetch operation based on a difference in time between the earliest point in time and a time at which the given memory reference instruction is executed during the execution.
Type: Grant
Filed: February 27, 2006
Date of Patent: June 1, 2010
Assignee: Oracle America, Inc.
Inventors: Ilya A. Sharapov, Andrew J. Over
-
Patent number: 7712091
Abstract: A method and system for optimizing the execution of a software loop is provided. The method involves the determination of an edge in a critical recurrence cycle in the software loop. The edge is a dependency link between two instructions and contains a dependee and a dependent. The dependee is an instruction that produces a result, and the dependent is an instruction that uses the result. The method further involves performing predicate promotion of at least one of the dependee and the dependent if one or more pre-determined conditions are met.
Type: Grant
Filed: September 30, 2005
Date of Patent: May 4, 2010
Assignee: Intel Corporation
Inventors: Kalyan Muthukumar, Robyn A. Sampson, Daniel Lavery
-
Publication number: 20100107147
Abstract: A compiler allocates an unroll_group_number conferred based on a sequence in which a loop body is replicated by loop unrolling to each loop body during loop unrolling based on the optimized number of loop unrolling. The allocated unroll_group_number is added to each instruction included in each loop body. A priority of an instruction is adjusted based on the allocated unroll_group_number during instruction scheduling.
Type: Application
Filed: September 11, 2009
Publication date: April 29, 2010
Inventor: Byung-chang Cha
-
Patent number: 7702856
Abstract: The prefetch distance to be used by a prefetch instruction may not always be correctly calculated using compile-time information. In one embodiment, the present invention generates prefetch distance calculation code to dynamically calculate a prefetch distance used by a prefetch instruction at run-time.
Type: Grant
Filed: November 9, 2005
Date of Patent: April 20, 2010
Assignee: Intel Corporation
Inventors: Rakesh Krishnaiyer, Somnath Ghosh, Abhay Kanhere
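
To illustrate why a prefetch distance may need run-time information, the sketch below uses a common textbook heuristic (latency divided by the time of one loop iteration, rounded up). It is not taken from the patent; the function name, the parameters, and the `max_distance` cap are all assumptions made for illustration.

```python
import math


def prefetch_distance(memory_latency_cycles: int,
                      cycles_per_iteration: int,
                      max_distance: int = 64) -> int:
    """Return how many iterations ahead to prefetch (hypothetical helper).

    Assumption: the data must be requested early enough that the memory
    latency is covered by the work of the intervening loop iterations.
    """
    if cycles_per_iteration <= 0:
        raise ValueError("cycles_per_iteration must be positive")
    if memory_latency_cycles <= 0:
        return 0
    # Round up so the latency is fully covered, then cap the distance
    # to avoid prefetching far beyond the end of the data.
    distance = math.ceil(memory_latency_cycles / cycles_per_iteration)
    return min(max_distance, distance)
```

If `cycles_per_iteration` depends on a value known only at run-time (an inner trip count, for example), the distance cannot be fixed by the compiler, which is the situation the abstract describes.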
-
Patent number: 7698696
Abstract: A compiler comprises an analysis unit that detects directives (options and pragmas) from a user to the compiler, an optimization unit that is made up of a processing unit (a global region allocation unit, a software pipelining unit, a loop unrolling unit, an “if” conversion unit, and a pair instruction generation unit) that performs individual optimization processing designated by options and pragmas from a user, following the directives and the like from the analysis unit, etc. The global region allocation unit performs optimization processing, following designation of the maximum data size of variables to be allocated to a global region, designation of variables to be allocated to the global region, and options and pragmas regarding designation of variables not to be allocated in the global region.
Type: Grant
Filed: June 30, 2003
Date of Patent: April 13, 2010
Assignee: Panasonic Corporation
Inventors: Hajime Ogawa, Taketo Heishi, Toshiyuki Sakata, Shuichi Takayama, Shohei Michimoto, Tomoo Hamada, Ryoko Miyachi
-
Patent number: 7689980
Abstract: Linear transformations of statements in code are performed to generate linear expressions associated with the statements. Parallel code is generated using the linear expressions. Generating the parallel code includes splitting the computation-space of the statements into intervals and generating parallel code for the intervals.
Type: Grant
Filed: September 30, 2005
Date of Patent: March 30, 2010
Assignee: Intel Corporation
Inventors: Zhao Hui Du, Shih-wei Liao, Gansha Wu, Guei-Yuan Lueh
-
Patent number: 7685588
Abstract: Embodiments of the present invention provide for platform independence, low intrusiveness, and optimal memory usage of the binary instrumentation process by means of employing one procedure (interceptor function) implemented in a high-level programming language to intercept an arbitrary number of functions or blocks of code. Each time a function or code block needs to be intercepted a new copy of the procedure from a provided memory region may be associated with the address of the function or block of code by means of a memory region descriptor and an intercepted function address table. Once activated, the interceptor function may retrieve its current address and, by searching memory region descriptors, determine the region the current address belongs to; the region's base address may then be obtained. A reference to the intercepted function address table may be fetched from the region descriptor; and an index to the intercepted function address table may be computed.
Type: Grant
Filed: March 28, 2005
Date of Patent: March 23, 2010
Assignee: Intel Corporation
Inventors: Sergey N. Zheltov, Stanislav V. Bratanov, Dmitry Eremin
-
Publication number: 20100070956
Abstract: Methods, apparatus and computer software product for source code optimization are provided. In an exemplary embodiment, a first custom computing apparatus is used to optimize the execution of source code on a second computing apparatus. In this embodiment, the first custom computing apparatus contains a memory, a storage medium and at least one processor with at least one multi-stage execution unit. The second computing apparatus contains at least two multi-stage execution units that allow for parallel execution of tasks. The first custom computing apparatus optimizes the code for both parallelism and locality of operations on the second computing apparatus. This Abstract is provided for the sole purpose of complying with the Abstract requirement rules. This Abstract is submitted with the explicit understanding that it will not be used to interpret or to limit the scope or the meaning of the claims.
Type: Application
Filed: September 16, 2009
Publication date: March 18, 2010
Inventors: Allen Leung, Nicolas T. Vasilache, Benoit Meister, Richard A. Lethin
-
Patent number: 7681188
Abstract: One embodiment of the present invention provides a system that facilitates locked prefetch scheduling in general cyclic regions of a computer program. The system operates by first receiving a source code for the computer program and compiling the source code into intermediate code. The system then performs a trace detection on the intermediate code. Next, the system inserts prefetch instructions and corresponding locks into the intermediate code. Finally, the system generates executable code from the intermediate code, wherein a lock for a given prefetch instruction prevents subsequent prefetches from being issued until the data value returns for the given prefetch instruction.
Type: Grant
Filed: April 29, 2005
Date of Patent: March 16, 2010
Assignee: Sun Microsystems, Inc.
Inventors: Partha P. Tirumalai, Spiros Kalogeropulos, Yonghong Song
-
Patent number: 7673296
Abstract: A method of scheduling optional instructions in a compiler targets a processor. The scheduling includes indicating a limit on the additional processor computations that are available for executing an optional code, generating one or more required instructions corresponding to a source code and one or more optional instructions corresponding to the optional code used with the source code and scheduling all of the one or more required instructions with as many of the one or more optional instructions as possible without exceeding the indicated limit on the additional processor computations for executing the optional code.
Type: Grant
Filed: July 28, 2004
Date of Patent: March 2, 2010
Assignee: Hewlett-Packard Development Company, L.P.
Inventors: Jean-Francois Collard, Alan H. Karp
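
The idea of scheduling every required instruction plus as many optional ones as a budget allows can be sketched as a simple greedy selection. This is only an illustration under stated assumptions, not the patented method: the function name, the per-instruction `cost` values, and the cheapest-first ordering are hypothetical.

```python
def select_instructions(required, optional, extra_budget):
    """Pick all required instructions plus optional ones within a budget.

    required:     list of instruction names (always kept)
    optional:     list of (name, cost) pairs; cost is an assumed measure
                  of the additional processor computation needed
    extra_budget: limit on the total cost of optional instructions
    Returns (chosen_names, spent_cost).
    """
    chosen = list(required)
    spent = 0
    # Taking the cheapest optional instructions first maximizes how many
    # of them fit under the limit.
    for name, cost in sorted(optional, key=lambda pair: pair[1]):
        if spent + cost > extra_budget:
            break
        chosen.append(name)
        spent += cost
    return chosen, spent
```

For example, with required instructions `["ld", "add"]`, optional checks costing 3, 1 and 2, and a budget of 3, the two cheapest checks are kept and the most expensive one is dropped.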
-
Patent number: 7669194
Abstract: A mechanism for minimizing effective memory latency without unnecessary cost through fine-grained software-directed data prefetching using integrated high-level and low-level code analysis and optimizations is provided. The mechanism identifies and classifies streams, identifies data that is most likely to incur a cache miss, exploits effective hardware prefetching to determine the proper number of streams to be prefetched, exploits effective data prefetching on different types of streams in order to eliminate redundant prefetching and avoid cache pollution, and uses high-level transformations with integrated lower level cost analysis in the instruction scheduler to schedule prefetch instructions effectively.
Type: Grant
Filed: August 26, 2004
Date of Patent: February 23, 2010
Assignee: International Business Machines Corporation
Inventors: Roch Georges Archambault, Robert James Blainey, Yaoqing Gao, Allan Russell Martin, James Lawrence McInnes, Francis Patrick O'Connell
-
Patent number: 7657882
Abstract: A dataflow instruction set architecture and execution model, referred to as WaveScalar, which is designed for scalable, low-complexity/high-performance processors, while efficiently providing traditional memory semantics through a mechanism called wave-ordered memory. Wave-ordered memory enables “real-world” programs, written in any language, to be run on the WaveScalar architecture, as well as any out-of-order execution unit. Because it is software-controlled, wave-ordered memory can be disabled to obtain greater parallelism. WaveScalar also includes a software-controlled tag management system.
Type: Grant
Filed: January 21, 2005
Date of Patent: February 2, 2010
Assignee: University of Washington
Inventors: Mark H. Oskin, Steven J. Swanson, Susan J. Eggers
-
Patent number: 7657880
Abstract: The latencies associated with retrieving instruction information for a main thread are decreased through the use of a simultaneous helper thread. The helper thread is permitted to execute Store instructions. Store blocker logic operates to prevent data associated with a Store instruction in a helper thread from being committed to memory. Dependence blocker logic operates to prevent data associated with a Store instruction in a speculative helper thread from being bypassed to a Load instruction in a non-speculative thread.
Type: Grant
Filed: August 1, 2003
Date of Patent: February 2, 2010
Assignee: Intel Corporation
Inventors: Hong Wang, Tor Aamodt, Per Hammarlund, John Shen, Xinmin Tian, Milind Girkar, Perry Wang, Steve Shih-wei Liao
-
Patent number: 7657883
Abstract: A dispatch scheduler in a multithreading microprocessor is disclosed. Each of N concurrently executing threads has one of P priorities. P N-bit round-robin vectors are generated, each being a 1-bit left-rotated and subsequently sign-extended version of an N-bit 1-hot input vector indicating the last thread selected for dispatching at the priority. N P-input muxes each receive a corresponding one of the N bits of each of the P round-robin vectors and select the input specified by the thread priority. Selection logic selects an instruction for dispatching from the thread having a dispatch value greater than or equal to any of the threads left thereof in the N-bit input vectors. The dispatch value of each of the threads comprises a least-significant bit equal to the corresponding P-input mux output, a most-significant bit that is true if the instruction is dispatchable, and middle bits comprising the priority of the thread.
Type: Grant
Filed: March 22, 2005
Date of Patent: February 2, 2010
Assignee: MIPS Technologies, Inc.
Inventor: Michael Gottlieb Jensen
-
Patent number: 7647586
Abstract: A system and method for providing exceptional flow control in protected code through watchpoints is described. Code is generated. The generated code includes a sequence of normal operations and is subject to protection against copying during execution of the generated code. Execution points within the generated code are identified. A watchpoint corresponding to each of the execution points is set. An exception handler associated with each watchpoint is defined and includes operations exceptional to the normal operations sequence that are performed upon a triggering of each watchpoint during execution of the generated code.
Type: Grant
Filed: August 13, 2004
Date of Patent: January 12, 2010
Assignee: Sun Microsystems, Inc.
Inventors: Dean R. E. Long, Christopher J. Plummer, Nedim Fresko
-
Patent number: 7647473
Abstract: An instruction processing method for checking an arrangement of basic instructions in a very long instruction word (VLIW) instruction, suitable for language processing systems, an assembler and a compiler, used for processors which execute variable length VLIW instructions designed based on variable length VLIW architecture.
Type: Grant
Filed: January 24, 2002
Date of Patent: January 12, 2010
Assignee: Fujitsu Limited
Inventors: Teruhiko Kamigata, Hideo Miyake
-
Patent number: 7631305
Abstract: Methods and products for processing a software kernel of instructions are disclosed. The software kernel has stages representing a loop nest. The software kernel is processed by partitioning iterations of an outermost loop into groups with each group representing iterations of the outermost loop, running the software kernel and rotating a register file for each stage of the software kernel preceding an innermost loop to generate code to prepare for filling and executing instructions in software pipelines for a current group, running the software kernel for each stage of the software kernel in the innermost loop to generate code to fill the software pipelines for the current group with the register file being rotated after at least one run of the software kernel for the innermost loop, and repeatedly running the software kernel to unroll inner loops to generate code to further fill the software pipelines for the current group.
Type: Grant
Filed: September 20, 2004
Date of Patent: December 8, 2009
Assignee: University of Delaware
Inventors: Hongbo Rong, Guang R. Gao, Alban Douillet, Ramaswamy Govindarajan
-
Patent number: 7617496
Abstract: A macroscalar processor architecture is described herein. In one embodiment, a processor receives instructions of a program loop having a vector block and a sequence block intended to be executed after the vector block, where the processor includes multiple slices and each of the slices is capable of executing an instruction of an iteration of the program loop substantially in parallel. For each iteration of the program loop, the processor executes an instruction of the sequence block using one of the slices while executing instructions of the vector block using a remainder of the slices substantially in parallel. Other methods and apparatuses are also described.
Type: Grant
Filed: September 1, 2005
Date of Patent: November 10, 2009
Assignee: Apple Inc.
Inventor: Jeffry E. Gonion
-
Patent number: 7617495
Abstract: Disclosed are embodiments of a compiler, methods, and system for resource-aware scheduling of instructions. A list scheduling approach is augmented to take into account resource constraints when determining priority for scheduling of instructions. Other embodiments are also described and claimed.
Type: Grant
Filed: March 24, 2004
Date of Patent: November 10, 2009
Assignee: Intel Corporation
Inventors: Kalyan Muthukumar, Daniel M. Lavery, Gerolf F. Hoflehner, Chu-cheow Lim, Jean-Francois Collard
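
A list scheduler whose priority also reflects resource constraints can be sketched as follows. This is a generic illustration, not the scheme claimed in the patent: the priority (critical-path height first, then a "pressure" ratio of demand to available units as a tie-break), the unit-latency model, and all names are assumptions. The dependence graph is assumed to be acyclic and every resource class is assumed to have at least one unit.

```python
def list_schedule(instrs, deps, units):
    """Cycle-by-cycle list scheduling with unit latencies (a sketch).

    instrs: dict name -> resource class, e.g. {"a": "mem"}
    deps:   dict name -> set of predecessor names
    units:  dict resource class -> units available per cycle
    Returns dict name -> issue cycle.
    """
    succs = {n: [] for n in instrs}
    for n, preds in deps.items():
        for p in preds:
            succs[p].append(n)

    height = {}

    def h(n):
        # Longest dependence chain starting at n (critical-path height).
        if n not in height:
            height[n] = 1 + max((h(s) for s in succs[n]), default=0)
        return height[n]

    # Assumed resource term: instructions competing for scarce units
    # get a higher priority among equally critical candidates.
    demand = {}
    for r in instrs.values():
        demand[r] = demand.get(r, 0) + 1
    pressure = {r: demand[r] / units[r] for r in demand}

    def priority(n):
        return (h(n), pressure[instrs[n]])

    cycle_of = {}
    cycle = 0
    while len(cycle_of) < len(instrs):
        free = dict(units)
        ready = [n for n in instrs
                 if n not in cycle_of
                 and all(p in cycle_of and cycle_of[p] < cycle
                         for p in deps.get(n, ()))]
        for n in sorted(ready, key=priority, reverse=True):
            r = instrs[n]
            if free[r] > 0:          # issue only if a unit is still free
                free[r] -= 1
                cycle_of[n] = cycle
        cycle += 1
    return cycle_of
```

With two memory operations competing for a single memory unit, the one on the longer dependence chain is issued first and the other waits a cycle.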
-
Patent number: 7613599
Abstract: An integrated design environment (IDE) is disclosed for forming virtual embedded systems. The IDE includes a design language for forming finite state machine models of hardware components that are coupled to simulators of processor cores, preferably instruction set accurate simulators. A software debugger interface permits a software application to be loaded and executed on the virtual embedded system. A virtual test bench may be coupled to the simulation to serve as a human-machine interface. In one embodiment, the IDE is provided as a web-based service for the evaluation, development and procurement phases of an embedded system project. IP components, such as processor cores, may be evaluated using a virtual embedded system. In one embodiment, a virtual embedded system is used as an executable specification for the procurement of a good or service related to an embedded system.
Type: Grant
Filed: June 1, 2001
Date of Patent: November 3, 2009
Assignee: Synopsys, Inc.
Inventors: Stephen L Bade, Shay Ben-Chorin, Paul Caamano, Marcelo E Montoreano, Ani Taggu, Filip C Theon, Dean C Wills
-
Publication number: 20090254892
Abstract: A compiling method for compiling software which is adapted to output an intermediate result at a given timing, the compiling method includes extracting, by a computer, a process block related to parallel processing and conditional branch from a processing sequence included in a source code of a software which is processed time-sequentially, and generating, by the computer, an execution code by restructuring the process block that is extracted.
Type: Application
Filed: June 10, 2009
Publication date: October 8, 2009
Applicant: FUJITSU LIMITED
Inventor: Koichiro Yamashita
-
Publication number: 20090254895
Abstract: Prefetching irregular memory references into a software controlled cache is provided. A compiler analyzes source code to identify at least one of a plurality of loops that contain an irregular memory reference. The compiler determines if the irregular memory reference within the at least one loop is a candidate for optimization. Responsive to an indication that the irregular memory reference may be optimized, the compiler determines if the irregular memory reference is valid for prefetching. Responsive to an indication that the irregular memory reference is valid for prefetching, a store statement for an address of the irregular memory reference is inserted into the at least one loop. A runtime library call is inserted into a prefetch runtime library for the irregular memory reference. Data associated with the irregular memory reference is prefetched into the software controlled cache when the runtime library call is invoked.
Type: Application
Filed: April 4, 2008
Publication date: October 8, 2009
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Tong Chen, Marc Gonzalez Tallada, Zehra N. Sura, Tao Zhang
-
Patent number: 7590976
Abstract: The present invention relates to a compiler program, a computer-readable storage medium storing such a compiler program, a compiling method and a compiling unit, and an object thereof is to automatically generate a reentrant object program. In order to accomplish this object, an address saving program generator 16a generates an address saving program for saving a data area address of a calling program module; an address setting program generator 16b generates an address setting program for setting a data area address of an other program module; a transferring program generator 16c generates a transferring program for the transfer from the calling program module to the other program module; an address resetting program generator 16d generates an address resetting program for reading and resetting the saved data area address; and an accessing program generator 16e generates an accessing program for accessing a data area for the other program module using a relative address from the set data area address.
Type: Grant
Filed: December 26, 2003
Date of Patent: September 15, 2009
Assignee: Panasonic Corporation
Inventors: Masaki Kawai, Takuji Kawamoto, Shusuke Haruna, Yutaka Fujihara
-
Publication number: 20090217253
Abstract: A computer program is speculatively parallelized with transactional memory by scoping program variables at compile time, and inserting code into the program at compile time. Determinations of the scoping can be based on whether scalar variables being scoped are involved in inter-loop non-reduction data dependencies, are used outside loops in which they were defined, and at what point in a loop a scalar variable is defined. The inserted code can include instructions for execution at a run time of the program to determine loop boundaries of the program, and issue checkpoint instructions and commit instructions that encompass transaction regions in the program. A transaction region can include an original function of the program and a spin-waiting loop with a non-transactional load, wherein the spin-waiting loop is configured to wait for a previous thread to commit before the current transaction commits.
Type: Application
Filed: February 22, 2008
Publication date: August 27, 2009
Applicant: Sun Microsystems, Inc.
Inventors: Yonghong Song, Xiangyun Kong, Spiros Kalogeropulos, Partha P. Tirumalai
-
Patent number: 7581215
Abstract: We present a technique to perform dependence analysis on more complex array subscripts than the linear form of the enclosing loop indices. For such complex array subscripts, we decouple the original iteration space and the dependence test iteration space and link them through index-association functions. The dependence analysis is performed in the dependence test iteration space to determine whether the dependence exists in the original iteration space. The dependence distance in the original iteration space is determined by the distance in the dependence test iteration space and the property of index-association functions. For certain non-linear expressions, we show how to transform them to an equivalent set of linear expressions. The latter can be used in dependence tests with traditional techniques. We also show how our advanced dependence analysis technique can help parallelize some otherwise hard-to-parallelize loops.
Type: Grant
Filed: June 24, 2004
Date of Patent: August 25, 2009
Assignee: Sun Microsystems, Inc.
Inventors: Yonghong Song, Xiangyun Kong
-
Patent number: 7581210
Abstract: One embodiment disclosed relates to a method of compiling a program to be executed on a target microprocessor with multiple functional units of a same type. The method includes opportunistically scheduling a redundant operation on one of the functional units that would otherwise be idle during a cycle.
Type: Grant
Filed: September 10, 2003
Date of Patent: August 25, 2009
Assignee: Hewlett-Packard Development Company, L.P.
Inventors: Dale John Shidla, Andrew Harvey Barr, Ken Gary Pomaranski
-
Patent number: 7571435
Abstract: A method (and structure) for executing linear algebra subroutines, includes, for an execution code controlling operation of a floating point unit (FPU) performing the linear algebra subroutine execution, unrolling instructions to preload data into a floating point register (FReg) of the FPU. The unrolling generates an instruction to load data into the FReg and the instruction is inserted into a sequence of instructions that execute the linear algebra subroutine on the FPU.
Type: Grant
Filed: September 29, 2003
Date of Patent: August 4, 2009
Assignee: International Business Machines Corporation
Inventors: Fred Gehrung Gustavson, John A. Gunnels
-
Patent number: 7565343
Abstract: Fixed-length data (560) contained in a database (560) are segmented into a number of pieces of data that are searchable at a time and searching is performed at high speed. As means for it, a pointer table (500), a secondary pointer table, a local table, and a fixed-length-data table are provided, and when more segmentation is required, a table having a numeric-value comparing function is further provided. As means for performing efficient configuration/management of the tables and for performing management that does not interfere with a search operation, an empty-house table (700), an empty-room table (720), a room-management table (740), and a structure-management table (760) may be provided.
Type: Grant
Filed: March 31, 2004
Date of Patent: July 21, 2009
Assignee: IPT Corporation
Inventor: Shinpei Watanabe
-
Patent number: 7546592
Abstract: A method, computer program product, and a data processing system for scheduling instructions in a data processing system are provided. Dependencies among a plurality of nodes are analyzed to determine if any of the plurality of nodes uses a constrained resource. Each of the plurality of nodes represents an instruction in a set of instructions. A subset of the plurality of nodes is designated as resource-constrained nodes. An attempt is made to generate a schedule with the subset of the plurality of nodes scheduled with priority with respect to any of the plurality of nodes not included in the subset.
Type: Grant
Filed: July 21, 2005
Date of Patent: June 9, 2009
Assignee: International Business Machines Corporation
Inventor: Allan Russell Martin
-
Publication number: 20090138864
Abstract: A method and apparatus for automatic second-order predictive commoning is provided by the present invention. During an analysis phase, the intermediate representation of a program code is analyzed to identify opportunities for second-order predictive commoning optimization. The analyzed information is used by the present invention to apply transformations to the program code, such that the number of memory accesses and the number of computations are reduced for loop iterations and performance of program code is improved.
Type: Application
Filed: January 30, 2009
Publication date: May 28, 2009
Applicant: International Business Machines Corporation
Inventors: Arie Tal, Dina Tal
-
Patent number: 7539884
Abstract: A method of power-gating instruction scheduling for leakage power reduction comprises receiving a program, generating a control-flow graph dividing the program into a plurality of blocks, analyzing utilization of power-gated components of a processor executing the program, generating the first power-gating instruction placement comprising power-off instructions and power-on instructions to shut down the inactive power-gated components, generating the second power-gating instruction placement by merging the power-off instructions as one compound power-off instruction and merging the power-on instructions as one compound power-on instruction, and inserting power-gating instructions into the program in accordance with the second power-gating instruction placement.
Type: Grant
Filed: July 27, 2006
Date of Patent: May 26, 2009
Assignee: Industrial Technology Research Institute
Inventors: Yi-Ping You, Chung Wen Huang, Jeng Kuen Lee, Chi-Lung Wang, Kuo Yu Chuang
-
Patent number: 7523449
Abstract: A method for adaptive runtime reconfiguration of a co-processor instruction set, in a computer system with at least a main processor communicatively connected to at least one reconfigurable co-processor, includes the steps of configuring the co-processor to implement an instruction set comprising one or more co-processor instructions, issuing a co-processor instruction to the co-processor, and determining whether the instruction is implemented in the co-processor. For an instruction not implemented in the co-processor instruction set, raising a stall signal to delay the main processor, determining whether there is enough space in the co-processor for the non-implemented instruction, and if there is enough space for said instruction, reconfiguring the instruction set of the co-processor by adding the non-implemented instruction to the co-processor instruction set. The stall signal is cleared and the instruction is executed.
Type: Grant
Filed: August 23, 2006
Date of Patent: April 21, 2009
Assignee: International Business Machines Corporation
Inventors: Sameh W. Asaad, Richard Gerard Hofmann
-
Patent number: 7516481
Abstract: A program development supporting apparatus that groups a plurality of events each executed in an information processor to divide the events into a plurality of parallel execution units to be executed in parallel with each other has a directional graph acquisition section that acquires directional graph data expressing each of the plurality of events as a vertex and a restriction on the execution order between two of the plurality of events as a directional branch, an inverse chain partial set extraction section that traces the directional branch from each event in the forward direction to extract from the directional graph data an inverse partial set that is a combination of the events having such a relationship that any one of the events cannot be reached from the other events, and a parallel execution unit assignment section that assigns the plurality of events belonging to the inverse partial set to units different from each other in the parallel execution units.
Type: Grant
Filed: December 2, 2004
Date of Patent: April 7, 2009
Assignee: International Business Machines Corporation
Inventor: Toshiyuki Fujikura
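
The notion of a set of events in which no event can be reached from any other (an antichain of the ordering graph) can be illustrated with a small greedy sketch. It is not the extraction procedure of the patent and does not guarantee a largest such set; the function names and the adjacency-list graph format are assumptions.

```python
def reachable(graph, start):
    """Return every event reachable from start by following edges forward."""
    seen = set()
    stack = list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def greedy_antichain(graph, events):
    """Greedily collect events that are mutually unreachable.

    graph:  dict event -> list of events that must run after it
    events: candidate events, visited in the given order (the result
            depends on this order; the greedy choice is an assumption)
    """
    reach = {e: reachable(graph, e) for e in events}
    chosen = []
    for e in events:
        if all(e not in reach[c] and c not in reach[e] for c in chosen):
            chosen.append(e)
    return chosen


def assign_units(antichain):
    """Give each mutually independent event its own parallel unit index."""
    return {event: unit for unit, event in enumerate(antichain)}
```

For a diamond-shaped graph where `A` precedes `B` and `C`, and both precede `D`, visiting the events in the order `B, C, D, A` selects `B` and `C`, which can then be placed in two different parallel execution units.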
-
Patent number: 7509634
Abstract: A translator receives a source code that is described using a process designation (such as a line-by-line process designation, a line data extraction designation, and a broadcast designation) to be performed on line data of an image on a line by line basis, parses and optimizes the source code, and then generates an SIMD macro code that is an intermediate form taking into consideration the use of an SIMD instruction set. A simplifier generates, from the SIMD macro code, a simplified SIMD macro code, namely, a composite macro code into which a series of codes having the relationship between the definition and the reference of the same virtual SIMD register is organized. A machine code generator generates, from the simplified SIMD macro code, a machine code that efficiently uses an SIMD instruction.
Type: Grant
Filed: November 12, 2003
Date of Patent: March 24, 2009
Assignee: NEC Corporation
Inventor: Shorin Kyo
-
Patent number: 7506331
Abstract: A method, apparatus, and computer instructions for processing instructions. A data dependency graph is built. The data dependency graph is analyzed for recurrences, and unpipelined instructions that lie outside of the recurrences are expanded.
Type: Grant
Filed: August 30, 2004
Date of Patent: March 17, 2009
Assignee: International Business Machines Corporation
Inventors: Roch Georges Archambault, Robert Frederick Enenkel, Robert William Hay, Allan Russell Martin, James Lawrence McInnes, Ronald Ian McIntosh, Mark Peter Mendell
-
Patent number: 7506326
Abstract: An improved method, apparatus, and computer instructions for generating instructions to process multiple similar expressions. Parameters are identified for the expressions in the original instructions, to form a set of identified parameters typically including the operations performed, the types of data used, and the data sizes. Each type of execution unit that can execute the instructions needed to process the expressions using the set of identified parameters is identified, wherein a set of identified execution unit types is formed. An execution unit type from the set of identified execution unit types is selected to meet a performance goal. The new instructions are generated for the selected execution unit type to process the expressions, and the original instructions for the expressions are discarded.
Type: Grant
Filed: March 7, 2005
Date of Patent: March 17, 2009
Assignee: International Business Machines Corporation
Inventor: Ronald Ian McIntosh
-
Publication number: 20090064121
Abstract: Systems, methods and computer products for implementing shadow versioning to improve data dependence analysis for instruction scheduling. Exemplary embodiments include a method to identify loops within the code to be compiled, for each loop initializing a dependence matrix, for each loop shadow identifying symbols that are accessed by the loop, examining dependencies, storing, comparing and classifying the dependence vectors, generating new shadow symbols, replacing the old shadow symbols with the new shadow symbols, generating alias relationships between the newly created shadow symbols, scheduling instructions and compiling the code.
Type: Application
Filed: August 29, 2007
Publication date: March 5, 2009
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Roch G. Archambault, Yaoqing Gao, Raul E. Silvera, Peng Zhao
-
Patent number: 7493609
Abstract: A method and apparatus for automatic second-order predictive commoning is provided by the present invention. During an analysis phase, the intermediate representation of a program code is analyzed to identify opportunities for second-order predictive commoning optimization. The analyzed information is used by the present invention to apply transformations to the program code, such that the number of memory accesses and the number of computations are reduced for loop iterations and performance of program code is improved.
Type: Grant
Filed: August 30, 2004
Date of Patent: February 17, 2009
Assignee: International Business Machines Corporation
Inventors: Arie Tal, Dina Tal
-
Patent number: 7493611
Abstract: A scheduling algorithm is provided for selecting the placement of instructions with internal slack into a schedule of instructions within a loop. The algorithm achieves this by pinning nodes with internal slack to corresponding nodes on the critical path of the code that have similar properties in terms of the data dependency graph, such as earliest time and latest time. The effect is that nodes with internal slack are more often optimally placed in the schedule, reducing the need for rotating registers or register copy instructions. The benefit of the present invention can primarily be seen when performing instruction scheduling or software pipelining on loop code, but can also apply to other forms of instruction scheduling when greater control of placement of nodes with internal slack is desired.
Type: Grant
Filed: August 30, 2004
Date of Patent: February 17, 2009
Assignee: International Business Machines Corporation
Inventor: Allan Russell Martin
-
Patent number: 7487336
Abstract: The present disclosure relates to the allocation of registers and the scheduling of instructions, and, more specifically, to the classifying of operands and allocation of registers to local operands.
Type: Grant
Filed: December 12, 2003
Date of Patent: February 3, 2009
Assignee: Intel Corporation
Inventors: Jayashankar Bharadwaj, Tatiana Shpeisman, Ali-Reza Adl-Tabatabai
-
Patent number: 7478379
Abstract: A technique of ordering machine instructions to reduce spill code. For each machine instruction that is ready for scheduling, an amount is determined by which the size of a committed set of machine instructions would increase upon the scheduling of the machine instruction. The machine instruction for which the determined amount is smallest is then scheduled. The currently committed instructions may be determined to be the machine instructions that are already scheduled as well as the machine instructions that are descendent from already scheduled machine instructions. The result is that new computations upon which a target processor will embark tend to be deferred. Bit vectors may be employed for efficiency during the assessment of candidate instructions that are ready for scheduling. The technique may be triggered when the risk of registers becoming overcommitted becomes high, as may occur when the number of available processor registers drops below a certain threshold.
Type: Grant
Filed: May 6, 2004
Date of Patent: January 13, 2009
Assignee: International Business Machines Corporation
Inventors: Damien Bonaventure, James Lawrence McInnes
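Illustrative note (not part of the patent text): the selection rule in the abstract above can be sketched as a list scheduler that, at each step, picks the ready instruction whose scheduling would add the fewest instructions to the committed set. The sketch assumes instructions are numbered 0..n-1, dependences form an acyclic graph given as edges, and ties are broken by lowest index. The names `descendant_masks` and `schedule_min_commit` are hypothetical. Python integers stand in for the bit vectors the abstract mentions, and the register-pressure trigger is not modeled.

```python
def descendant_masks(succs, n):
    """For each instruction, build a bitmask of itself plus all its descendants."""
    masks = [None] * n

    def visit(v):
        if masks[v] is None:
            m = 1 << v
            for s in succs[v]:
                m |= visit(s)
            masks[v] = m
        return masks[v]

    for v in range(n):
        visit(v)
    return masks


def schedule_min_commit(n, edges):
    """Order instructions so the committed set grows as slowly as possible.

    Committed set = scheduled instructions plus all of their descendants.
    Assumption: the dependence graph is acyclic and small enough for recursion.
    """
    succs = [[] for _ in range(n)]
    pending = [0] * n  # count of not-yet-scheduled predecessors
    for u, v in edges:
        succs[u].append(v)
        pending[v] += 1

    masks = descendant_masks(succs, n)
    committed = 0
    ready = [v for v in range(n) if pending[v] == 0]
    order = []
    while ready:
        # Growth = number of instructions newly committed by scheduling v.
        best = min(ready, key=lambda v: (bin(masks[v] & ~committed).count("1"), v))
        ready.remove(best)
        order.append(best)
        committed |= masks[best]
        for s in succs[best]:
            pending[s] -= 1
            if pending[s] == 0:
                ready.append(s)
    return order


if __name__ == "__main__":
    # Instruction 0 feeds three consumers; instruction 4 feeds only one.
    print(schedule_min_commit(6, [(0, 1), (0, 2), (0, 3), (4, 5)]))
```

In the example, instruction 4 commits only two instructions while instruction 0 commits four, so the short chain 4, 5 is finished before the wide fan-out from 0 is started, which is the deferral effect the abstract describes.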
-
Publication number: 20090013316
Abstract: A method, apparatus, and computer instructions for scheduling instructions for execution. Identify a series of instructions in a loop, wherein the series of instructions has a cyclic data dependency. Determine whether the series of instructions is a uniform series of instructions. Schedule execution of the uniform series of instructions within the loop to optimize execution of the loop in response to the identified series of instructions being the uniform series of instructions.
Type: Application
Filed: September 19, 2008
Publication date: January 8, 2009
Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor: Allan Russell Martin
-
Publication number: 20080313622
Abstract: The present invention relates to a device and a method of accessing a storage of a storage device by reading or writing data to said storage device, wherein said accessing is controlled by an external data controller via a low level in a software stack of said device, and wherein said storage device is accessed without hampering the operation of functionalities at a higher level of said software stack of said device, said device comprises an intermediate storage, and the method comprises the steps of storing commands related to said accessing of data in said intermediate storage as a command queue, executing said commands in said command queue when allowed by a command queue scheduler, said scheduler scheduling in dependence of at least one of the functionalities at said higher level of said software stack. Thereby full control is obtained on storage medium requests by the scheduler.
Type: Application
Filed: May 26, 2005
Publication date: December 18, 2008
Inventor: Jozef Pieter Van Gassel
-
Patent number: 7448031
Abstract: Methods and apparatus to compile a software program to manage parallel μcaches are disclosed. In an example method, a compiler attempts to schedule a software program such that load instructions in a first set of load instructions have a first predetermined latency greater than the latency of the first cache. The compiler also marks a second set of load instructions with a latency less than the first predetermined latency to access the first cache. The compiler attempts to schedule the software program such that the load instructions in a third set have at least a second predetermined latency greater than the latency of the second cache. The compiler identifies a fourth set of load instructions in the scheduled software program having less than the second predetermined latency and marks the fourth set of load instructions to access the second cache.
Type: Grant
Filed: December 17, 2003
Date of Patent: November 4, 2008
Assignee: Intel Corporation
Inventor: Youfeng Wu
-
Patent number: 7447732
Abstract: A system, method and article of manufacture for return code management in autonomic systems and more particularly for managing execution of operations in data processing systems on the basis of return code tracking. One embodiment provides a method for managing execution of an operation in a data processing system. The method comprises tracking return codes received from previous executions of the operation in the data processing system, determining an execution behavior of the operation from the tracked return codes, and managing a subsequent execution of the operation on the basis of the determined execution behavior.
Type: Grant
Filed: May 23, 2003
Date of Patent: November 4, 2008
Assignee: International Business Machines Corporation
Inventors: Eric L. Barsness, John M. Santosuosso
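Illustrative note (not part of the patent text): the three steps named in the abstract above (track return codes, derive an execution behavior, manage the next execution) can be sketched as a small tracker class. Everything specific here is an assumption made for illustration: the class name `ReturnCodeTracker`, the sliding window, the failure-ratio threshold, the convention that return code 0 means success, and the policy of skipping an operation classified as failing. The patent abstract does not specify any of these.

```python
from collections import defaultdict, deque


class ReturnCodeTracker:
    """Tracks recent return codes per operation and gates later executions."""

    def __init__(self, window=5, failure_threshold=0.6):
        self.window = window
        self.failure_threshold = failure_threshold
        # One bounded history of return codes per operation name.
        self.history = defaultdict(lambda: deque(maxlen=window))

    def record(self, op_name, return_code):
        """Step 1: track the return code of one execution."""
        self.history[op_name].append(return_code)

    def behavior(self, op_name):
        """Step 2: classify the operation from its tracked return codes."""
        codes = self.history[op_name]
        if len(codes) < self.window:
            return "unknown"
        failures = sum(1 for c in codes if c != 0)  # assumption: 0 means success
        if failures / len(codes) >= self.failure_threshold:
            return "failing"
        return "healthy"

    def execute(self, op_name, operation):
        """Step 3: run the operation unless its behavior says to skip it.

        Returns the operation's return code, or None if it was skipped.
        """
        if self.behavior(op_name) == "failing":
            return None
        code = operation()
        self.record(op_name, code)
        return code


if __name__ == "__main__":
    tracker = ReturnCodeTracker(window=3, failure_threshold=0.5)
    for _ in range(4):
        print(tracker.execute("backup", lambda: 1), tracker.behavior("backup"))
```

One limitation of this sketch is that an operation classified as failing is never retried, because skipped executions add nothing to the history; a fuller policy would need some way to probe the operation again.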
-
Patent number: 7444628
Abstract: A method, apparatus, and computer instructions for scheduling instructions for execution. Identify a series of instructions in a loop, wherein the series of instructions has a cyclic data dependency. Determine whether the series of instructions is a uniform series of instructions. Schedule execution of the uniform series of instructions within the loop to optimize execution of the loop in response to the identified series of instructions being the uniform series of instructions.
Type: Grant
Filed: August 30, 2004
Date of Patent: October 28, 2008
Assignee: International Business Machines Corporation
Inventor: Allan Russell Martin