Abstract: The described embodiments provide a processor for generating a result vector with incremented or decremented values from an input vector. During operation, the processor receives an input vector and a control vector. The processor then copies a value contained in a selected element of the input vector. The processor next generates the result vector, which involves writing an incremented or decremented value to the result vector, depending on the value of the control vector and the embodiment. In addition, a predicate vector can be used to control the values that are written to the result vector.
Type:
Grant
Filed:
June 30, 2009
Date of Patent:
June 24, 2014
Assignee:
Apple Inc.
Inventors:
Jeffry E. Gonion, Keith E. Diefendorff, Jr.
Abstract: According to one embodiment of the invention, software operating as a state machine may be implemented within a digital device to support out-of-order processing of events by the state machine. Upon execution of the software by a processor, the following operations are performed. First, a determination is made if an incoming event is a transition, and if so, if the transition is not a transition associated with the current state of the state machine, but rather, is out-of-order from a predetermined order of transitions supported by the state machine. Upon determining that the transition is out-of-order, a determination is made whether the transition is to a reachable state such as a state prior to the current state of the state machine or to a future state from the current state. If so, the transition is allowed to be undertaken.
Type:
Grant
Filed:
December 23, 2009
Date of Patent:
June 24, 2014
Assignee:
Drumright Group, LLC.
Inventors:
Michael Allen Latta, Christian W. Stassen, Himansu Desai
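The out-of-order check described in the state machine abstract above can be illustrated with a small sketch. The transition graph, the state names, and the helper names `reachable` and `accept_transition` are hypothetical and chosen only for illustration; the abstract does not specify how reachability is computed, so a breadth-first search is assumed here.

```python
from collections import deque

# Hypothetical transition graph: state -> states that follow it in the
# predetermined order. None of these names come from the patent.
TRANSITIONS = {
    "idle": ["connecting"],
    "connecting": ["connected"],
    "connected": ["closing"],
    "closing": ["closed"],
    "closed": [],
    "error": [],  # isolated state: nothing leads to it or from it
}


def reachable(graph, start, goal):
    """Return True if goal can be reached from start by following edges."""
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            return True
        for nxt in graph.get(state, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def accept_transition(graph, current, target):
    """Allow in-order transitions, plus out-of-order ones to reachable states."""
    if target in graph.get(current, []):
        return True  # in-order: associated with the current state
    # Out-of-order: accept a future state (reachable from the current state)
    # or a prior state (one from which the current state is reachable).
    return reachable(graph, current, target) or reachable(graph, target, current)
```

With this graph, a jump from `idle` to `closing` (a future state) and a jump from `closing` back to `connecting` (a prior state) are both accepted, while a jump from `idle` to the isolated `error` state is rejected.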
Abstract: Methods, parallel computers, and computer program products for requesting shared variable directory (SVD) information from a plurality of threads in a parallel computer are provided. Embodiments include a runtime optimizer detecting that a first thread requires a plurality of updated SVD information associated with shared resource data stored in a plurality of memory partitions. Embodiments also include a runtime optimizer broadcasting, in response to detecting that the first thread requires the updated SVD information, a gather operation message header to the plurality of threads. The gather operation message header indicates an SVD key corresponding to the required updated SVD information and a local address associated with the first thread to receive a plurality of updated SVD information associated with the SVD key. Embodiments also include the runtime optimizer receiving at the local address, the plurality of updated SVD information from the plurality of threads.
Type:
Application
Filed:
December 18, 2012
Publication date:
June 19, 2014
Applicant:
INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors:
CHARLES J. ARCHER, JAMES E. CAREY, PHILIP J. SANDERS, BRIAN E. SMITH
Abstract: In an apparatus which includes a plurality of processing modules connected via a ring-shape bus, if a plurality of pieces of pipeline processing to be processed in a different order is allocated to a plurality of processing modules, the transfer efficiency may decrease when an amount of data transferred from one of the processing modules to a post-stage module exceeds a processing capacity of the post-stage module. Accordingly, a module positioned on the preceding side in the pipeline processing controls a transmission interval of processed data so that the post-stage module can receive the data processed by the preceding module.
Abstract: Systems and methods for scheduling instructions for execution on a multi-core processor reorder the execution of different threads to ensure that instructions specified as having localized memory access behavior are executed over one or more sequential clock cycles to benefit from memory access locality. At compile time, code sequences including memory access instructions that may be localized are delineated into separate batches. A scheduling unit ensures that multiple parallel threads are processed over one or more sequential scheduling cycles to execute the batched instructions. The scheduling unit waits to schedule execution of instructions that are not included in the particular batch until execution of the batched instructions is done so that memory access locality is maintained for the particular batch. In between the separate batches, instructions that are not included in a batch are scheduled so that threads executing non-batched instructions are also processed and not starved.
Type:
Application
Filed:
December 10, 2012
Publication date:
June 12, 2014
Applicant:
NVIDIA CORPORATION
Inventors:
Olivier GIROUX, Jack Hilaire CHOQUETTE, Xiaogang QIU, Robert J. STOLL
Abstract: A system and method for controlling messaging between a first processor and a second processor is disclosed. The second processor controls one or more peripheral devices on behalf of a plurality of predetermined tasks being executed by the first processor. The system includes a message control module that receives an input message intended for the second processor from the first processor and maintains a message history based on the received input message and previously received input messages. The message history indicates which peripheral devices of the system are to be on and which tasks of the plurality of tasks requested the peripheral devices to be on. The message control module is further configured to generate an output message that includes output instructions for the second processor based on the message history and an output duration based on the message history. The second processor executes the output instructions.
Type:
Grant
Filed:
March 31, 2011
Date of Patent:
June 10, 2014
Assignees:
DENSO International America, Inc., Denso Corporation
Inventors:
Wan-ping Yang, Koji Shinoda, Hiroaki Shibata
Abstract: A data processing apparatus is provided comprising first processing circuitry, second processing circuitry and shared processing circuitry. The first processing circuitry and second processing circuitry are configured to operate in different first and second power domains respectively and the shared processing circuitry is configured to operate in a shared power domain. The data processing apparatus forms a uni-processing environment for executing a single instruction stream in which either the first processing circuitry and the shared processing circuitry operate together to execute the instruction stream or the second processing circuitry and the shared processing circuitry operate together to execute the single instruction stream. Execution flow transfer circuitry is provided for transferring at least one bit of processing-state restoration information between the two hybrid processing units.
Abstract: A method, apparatus and computer program product are therefore provided to enable context aware logging. In this regard, the method, apparatus, and computer program product may record events that occur in one or more applications, where the events are due to user input. These events may be associated with time values and data describing application contexts, such that the events may be used to generate an input log that also records application semantics and statuses. A variety of operations may be performed using this input log, including recreation of an application state by playing back the log, the ability to suspend or resume a user session, the ability to perform undo or pause operations, the ability to analyze user inputs to train or audit users, testing of users, troubleshooting of errors, and enabling multi-user collaboration.
Abstract: In an embodiment, the present invention includes a processor having an execution logic to execute instructions and a control transfer termination (CTT) logic coupled to the execution logic. This logic is to cause a CTT fault to be raised if a target instruction of a control transfer instruction is not a CTT instruction. Other embodiments are described and claimed.
Type:
Application
Filed:
November 30, 2012
Publication date:
June 5, 2014
Inventors:
Vedyvas Shanbhogue, Jason W. Brandt, Uday R. Savagaonkar, Ravi L. Sahita
Abstract: A Very Long Instruction Word (VLIW) processor having an instruction set with a reduced size resulting in a small number of bits being necessary to specify registers. The VLIW processor includes a register file, and first through third operation units, and executes a very long instruction word. Further, the very long instruction word includes a register specifying field which specifies at least one of the registers in the register file and a plurality of instructions. The operand of each instruction includes bits src1, src2, and dst, which indicate whether or not the registers specified by the register specifying field are to be used as the source register and the destination register.
Abstract: In a processor core, high latency operations are tracked in entries of a data structure associated with an execution unit of the processor core. In the execution unit, execution of an instruction dependent on a high latency operation tracked by an entry of the data structure is speculatively finished prior to completion of the high latency operation. Speculatively finishing the instruction includes reporting an identifier of the entry to completion logic of the processor core and removing the instruction from an execution pipeline of the execution unit. The completion logic records dependence of the instruction on the high latency operation and commits execution results of the instruction to an architected state of the processor only after successful completion of the high latency operation.
Type:
Application
Filed:
November 16, 2012
Publication date:
May 22, 2014
Applicant:
INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors:
SUNDEEP CHADHA, BRYAN LLOYD, DUNG Q. NGUYEN, DAVID S. RAY, BENJAMIN W. STOLT
Abstract: A processor includes a physical register file having physical registers and an execution unit to perform an arithmetic operation to generate a result mapped to a physical register, wherein the processor delays a write of the result to the physical register file until the result is qualified as valid. A method includes mapping the same physical register both to store load data of a load-execute operation and to subsequently store a result of an arithmetic operation of the load-execute operation, and writing the load data into the physical register. The method further includes, in a first clock cycle, executing the arithmetic operation to generate the result, and, in a second clock cycle, providing the result as a source operand for a dependent operation. The method includes, in a third clock cycle, enabling a write of the result to the physical register file responsive to the result qualifying as valid.
Type:
Application
Filed:
November 9, 2012
Publication date:
May 15, 2014
Applicant:
Advanced Micro Devices, Inc.
Inventors:
Ganesh Venkataramanan, Debjit Das Sarma, Betty A. McDaniel, Gregory W. Smaus, Francesco Spadini
Abstract: Disclosed are methods and devices, among which is a method for configuring an electronic device. In one embodiment, an electronic device may include one or more memory locations having stored values representative of the capabilities of the device. According to an example configuration method, a configuring system may access the device capabilities from the one or more memory locations and configure the device based on the accessed device capabilities.
Abstract: A programming language may include hint instructions that may notify a programming idiom accelerator that a programming idiom is coming. An idiom begin hint exposes the programming idiom to the programming idiom accelerator. Thus, the programming idiom accelerator need not perform pattern matching or other forms of analysis to recognize a sequence of instructions. Rather, the programmer may insert idiom hint instructions, such as an idiom begin hint, to expose the idiom to the programming idiom accelerator. Similarly, an idiom end hint may mark the end of the programming idiom.
Type:
Grant
Filed:
February 1, 2008
Date of Patent:
May 13, 2014
Assignee:
International Business Machines Corporation
Inventors:
Ravi K. Arimilli, Satya P. Sharma, Randal C. Swanberg
Abstract: Various systems, processes, products, and techniques may be used to manage thread transitions. In particular implementations, a system and process for managing thread transitions may include the ability to determine that a transition is to be made regarding the relative use of two data register sets and determine, based on the transition determination, whether to move thread data in at least one of the data register sets to second-level registers. The system and process may also include the ability to move the thread data from at least one data register set to second-level registers based on the move determination.
Type:
Grant
Filed:
February 23, 2011
Date of Patent:
May 13, 2014
Assignee:
International Business Machines Corporation
Inventors:
Christopher M. Abernathy, Mary D. Brown, Susan E. Eisen, James A. Kahle, Hung Q. Le, Dung Q. Nguyen
Abstract: A system and method are described for providing hints to a processing unit that subsequent operations are likely. Responsively, the processing unit takes steps to prepare for the likely subsequent operations. Where the hints are more likely than not to be correct, the processing unit operates more efficiently. For example, in an embodiment, the processing unit consumes less power. In another embodiment, subsequent operations are performed more quickly because the processing unit is prepared to efficiently handle the subsequent operations.
Type:
Application
Filed:
November 7, 2012
Publication date:
May 8, 2014
Applicant:
NVIDIA CORPORATION
Inventors:
David Conrad TANNENBAUM, Ming Y. SIU, Stuart F. OBERMAN, Colin SPRINKLE, Srinivasan IYER, Ian Chi Yan KWONG
Abstract: A method and apparatus for picking load or store instructions is presented. Some embodiments of the method include determining that the entry in the queue includes an instruction that is ready to be executed by the processor based on at least one instruction-based event and concurrently determining cancel conditions based on global events of the processor. Some embodiments also include selecting the instruction for execution when the cancel conditions are not satisfied.
Abstract: A processor includes an initiating hardware thread, which initiates a first assist hardware thread to execute a first code segment. Next, the initiating hardware thread sets an assist thread executing indicator in response to initiating the first assist hardware thread. The set assist thread executing indicator indicates whether assist hardware threads are executing. A second assist hardware thread initiates and begins executing a second code segment. In turn, the initiating hardware thread detects a change in the assist thread executing indicator, which signifies that both the first assist hardware thread and the second assist hardware thread terminated. As such, the initiating hardware thread evaluates assist hardware thread results in response to both of the assist hardware threads terminating.
Type:
Grant
Filed:
January 23, 2013
Date of Patent:
May 6, 2014
Assignee:
International Business Machines Corporation
Inventors:
Richard Louis Arndt, Giles Roger Frazier, Ronald P. Hall
Abstract: A microprocessor pipeline arrangement 1 includes a plurality of functional units 2, 3, 4, 5 and 6. Each functional unit 2, 3, 4, 5, 6 also has access to a respective cache memory 7, 8, 9, 10, 11. Threads for processing are received by the first functional unit 2 from an external source 12, and output by an end functional unit 6 of the pipeline to an output target 13. If a thread encounters a cache-miss on its passage through the pipeline, the thread is allowed to continue to pass through the pipeline in the normal manner. However, when the thread reaches the end of the pipeline, it is sent via a loopback path 14 back to the beginning of the pipeline to be sent through the pipeline again. In this way, any thread that has not completed its processing on passing through the pipeline can be sent through the pipeline again to allow the processing of the thread to be completed.
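The loopback behavior in the pipeline abstract above can be modeled in a few lines. This is a simplified sketch under stated assumptions: each stage needs exactly one address, a miss starts a cache fill that has completed by the time the thread comes around again, and later stages still do their work after an earlier miss. The function name `run_thread` and its parameters are hypothetical.

```python
def run_thread(addresses, caches, max_passes=10):
    """Send one thread through the pipeline until every stage has completed.

    addresses[i] is the address stage i needs; caches[i] is stage i's cache,
    modeled as a set of resident addresses. Returns the number of passes used.
    """
    done = [False] * len(addresses)
    passes = 0
    while not all(done) and passes < max_passes:
        passes += 1
        for stage, addr in enumerate(addresses):
            if done[stage]:
                continue  # this stage finished on an earlier pass
            if addr in caches[stage]:
                done[stage] = True  # cache hit: the stage's work completes
            else:
                # Cache miss: start the fill and let the thread keep moving.
                # Assumption: the fill is complete before the next pass.
                caches[stage].add(addr)
        # Reaching this point with unfinished stages corresponds to taking
        # the loopback path to the start of the pipeline.
    return passes
```

For example, with `caches = [{1}, set(), {3}]` and `addresses = [1, 2, 3]`, the thread misses once in the middle stage and needs two passes; a second thread using the same addresses then finishes in a single pass.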
Abstract: An image processing apparatus which makes it possible to select a plurality of instructions at a time, and connect a plurality of documents together so that they can be processed as one document. The image processing apparatus has a reading unit, which reads an image on an original to generate image data, and performs processing according to an instruction defining reading processing to be performed, as well as processing on the generated image data. The selected plurality of instructions are analyzed, and based on the analysis result, the selected plurality of instructions are connected together to create a new instruction.
Abstract: A processor with a register file mapper can use a hasher to improve the distribution of mappings within a mapping structure. The hasher generates a value based, at least in part, on a thread identifier and logical register identifier. The hash value is used as an index value into the mapping structure. The hashing algorithm is chosen to provide a more even distribution of mappings within the mapping structure, reducing the amount of data written from a first level register file to a second level register file.
Type:
Application
Filed:
October 31, 2012
Publication date:
May 1, 2014
Applicant:
INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor:
International Business Machines Corporation
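The effect of hashing a thread identifier together with a logical register identifier, as in the register file mapper abstract above, can be shown with a toy comparison. The XOR-and-shift hash below is an assumption made for illustration and is not the hashing algorithm from the application; the function names and the sizes (4 threads, 8 registers, 32 entries) are likewise hypothetical.

```python
def naive_index(thread_id, logical_reg, num_entries):
    """Index by logical register only: every thread lands on the same slots."""
    return logical_reg % num_entries


def hashed_index(thread_id, logical_reg, num_entries):
    """Hypothetical hash: fold the thread id into the upper index bits."""
    return (logical_reg ^ (thread_id << 3)) % num_entries


def slots_used(index_fn, threads=4, regs=8, num_entries=32):
    """Count distinct mapping-structure slots touched by all thread/register pairs."""
    return len({index_fn(t, r, num_entries)
                for t in range(threads)
                for r in range(regs)})
```

In this toy setup the naive index crowds all 32 thread/register pairs into 8 slots, while the hashed index spreads them over all 32, which is the kind of more even distribution the abstract refers to.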
Abstract: One embodiment of the present invention enables threads executing on a processor to locally generate and execute work within that processor by way of work queues and command blocks. A device driver, as an initialization procedure for establishing memory objects that enable the threads to locally generate and execute work, generates a work queue, and sets a GP_GET pointer of the work queue to the first entry in the work queue. The device driver also, during the initialization procedure, sets a GP_PUT pointer of the work queue to the last free entry included in the work queue, thereby establishing a range of entries in the work queue into which new work generated by the threads can be loaded and subsequently executed by the processor. The threads then populate command blocks with generated work and point entries in the work queue to the command blocks to effect processor execution of the work stored in the command blocks.
Type:
Application
Filed:
October 26, 2012
Publication date:
May 1, 2014
Applicant:
NVIDIA CORPORATION
Inventors:
Ignacio LLAMAS, Craig Ross DUTTWEILER, Jeffrey A. BOLZ, Daniel Elliot WEXLER
Abstract: Systems and methods for instruction entity allocation and scheduling on multi-processors are provided. In at least one embodiment, a method for generating an execution schedule for a plurality of instruction entities for execution on a plurality of processing units comprises arranging the plurality of instruction entities into a sorted order and allocating instruction entities in the plurality of instruction entities to individual processing units in the plurality of processing units. The method further comprises scheduling instances of the instruction entities in scheduled time windows in the execution schedule, wherein the instances of the instruction entities are scheduled in scheduled time windows according to the sorted order of the plurality of instruction entities and organizing the execution schedule into execution groups.
Abstract: Emulation of source machine instructions is provided in which target machine CPU condition codes are employed to produce emulated condition code settings without the use, encoding or generation of branching instructions.
Type:
Grant
Filed:
January 30, 2007
Date of Patent:
April 29, 2014
Assignee:
International Business Machines Corporation
Inventors:
Reid T. Copeland, Patrick R. Doyle, Charles B. Hall, Andrew Johnson, Ali I. Sheikh
Abstract: A parallel processing computing system includes an ordered set of m memory banks and a processor core. The ordered set of m memory banks includes a first and a last memory bank, wherein m is an integer greater than 1. The processor core implements n virtual processors, a pipeline having p ordered stages, including a memory operation stage, and a virtual processor selector function.
Abstract: A processor includes an initiating hardware thread, which initiates a first assist hardware thread to execute a first code segment. Next, the initiating hardware thread sets an assist thread executing indicator in response to initiating the first assist hardware thread. The set assist thread executing indicator indicates whether assist hardware threads are executing. A second assist hardware thread initiates and begins executing a second code segment. In turn, the initiating hardware thread detects a change in the assist thread executing indicator, which signifies that both the first assist hardware thread and the second assist hardware thread terminated. As such, the initiating hardware thread evaluates assist hardware thread results in response to both of the assist hardware threads terminating.
Type:
Grant
Filed:
September 20, 2010
Date of Patent:
April 29, 2014
Assignee:
International Business Machines Corporation
Inventors:
Richard Louis Arndt, Giles Roger Frazier, Ronald P. Hall
Abstract: A data processing architecture includes multiple processors connected in series between a load balancer and reorder logic. The load balancer is configured to receive data and distribute the data across the processors. Appropriate ones of the processors are configured to process the data. The reorder logic is configured to receive the data processed by the processors, reorder the data, and output the reordered data.
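The distribute, process, and reorder flow in the abstract above can be sketched as follows. This is a minimal software simulation, not the hardware described: round-robin distribution, sequence-number tagging, and a heap-based reorder buffer are all assumptions, and the names `distribute`, `reorder`, and `run` are hypothetical.

```python
import heapq


def distribute(items, num_processors):
    """Load balancer: tag each item with a sequence number, assign round-robin."""
    lanes = [[] for _ in range(num_processors)]
    for seq, item in enumerate(items):
        lanes[seq % num_processors].append((seq, item))
    return lanes


def reorder(tagged_results):
    """Reorder logic: release results strictly in sequence-number order."""
    heap, expected = [], 0
    for seq, value in tagged_results:
        heapq.heappush(heap, (seq, value))
        # Emit every buffered result that is next in line.
        while heap and heap[0][0] == expected:
            yield heapq.heappop(heap)[1]
            expected += 1


def run(items, num_processors, work):
    lanes = distribute(items, num_processors)
    # Simulate out-of-order completion by draining the lanes last-to-first.
    completed = [(seq, work(item))
                 for lane in reversed(lanes)
                 for seq, item in lane]
    return list(reorder(completed))
```

Calling `run([1, 2, 3, 4, 5], 2, lambda x: x * 10)` yields the results in the original input order even though the simulated processors finish out of order.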
Abstract: A method and system for providing a memory access check on a processor including the steps of detecting accesses to a memory device including level-1 cache using a wakeup unit. The method includes invalidating level-1 cache ranges corresponding to a guard page, and configuring a plurality of wakeup address compare (WAC) registers to allow access to selected WAC registers. The method selects one of the plurality of WAC registers, and sets up a WAC register related to the guard page. The method configures the wakeup unit to interrupt on access of the selected WAC register. The method detects access of the memory device using the wakeup unit when a guard page is violated. The method generates an interrupt to the core using the wakeup unit, and determines the source of the interrupt. The method detects the activated WAC registers assigned to the violated guard page, and initiates a response.
Type:
Grant
Filed:
January 29, 2010
Date of Patent:
April 29, 2014
Assignee:
International Business Machines Corporation
Inventors:
Thomas M. Gooding, David L. Satterfield, Burkhard Steinmacher-Burow
Abstract: The present disclosure provides a processor, and associated method, for performing parallel processing within a register. An exemplary processor may include a processing element having a compute unit and a register file. The register file includes a register that is divisible into lanes for parallel processing. The processor may further include a mask register and a predicate register. The mask register and the predicate register respectively include a number of mask bits and predicate bits equal to a maximum number of divisible lanes of the register. A state of the mask bits and predicate bits is set to respectively achieve enabling/disabling of the lanes from executing an instruction and conditional performance of an operation defined by the instruction. Further, the processor is operable to perform a reduction operation across the lanes of the processing element and/or generate an address for each of the lanes of the processing element.
Type:
Application
Filed:
January 10, 2013
Publication date:
April 24, 2014
Applicant:
Analog Devices Technology
Inventors:
Kaushal Sanghai, Michael G. Perkins, Andrew J. Higham
Abstract: According to an example embodiment, a processor such as a digital signal processor (DSP), is provided with a register acting as a predicate counter. The predicate counter may include more than two useful values, and in addition to acting as a condition for executing an instruction, may also keep track of nesting levels within a loop or conditional branch. In some cases, the predicate counter may be configured to operate in single-instruction, multiple data (SIMD) mode, or SIMD-within-a-register (SWAR) mode.
Type:
Application
Filed:
August 9, 2013
Publication date:
April 24, 2014
Applicant:
ANALOG DEVICES TECHNOLOGY
Inventors:
Andrew J. Higham, Boris Lemer, Kaushal Sanghai, Michael G. Perkins, John L. Redford, Michael S. Allen
Abstract: A set of helper thread binaries is created to retrieve data used by a set of main thread binaries. The set of helper thread binaries and the set of main thread binaries are partitioned according to common instruction boundaries. As a first partition in the set of main thread binaries executes within a first core, a second partition in the set of helper thread binaries executes within a second core, thus “warming up” the cache in the second core. When the first partition of the main thread binaries completes execution, a second partition of the main thread binaries moves to the second core, and executes using the warmed up cache in the second core.
Type:
Grant
Filed:
February 1, 2008
Date of Patent:
April 22, 2014
Assignee:
International Business Machines Corporation
Inventors:
Ravi K. Arimilli, Juan C. Rubio, Balaram Sinharoy
Abstract: A processor in a data processing system executes a permutation instruction which identifies a first source register, at least one other source register, and a destination register. The first source register stores at least one in-range index value for the at least one other source register and at least one out-of-range index value for the at least one other source register. The at least one other source register stores a plurality of vector element values, wherein each in-range index value indicates which vector element value of the at least one other source register is to be stored into a corresponding vector element of the destination register. Each out-of-range index value is used to indicate which one of at least two predetermined constant values is to be stored into a corresponding vector element of the destination register. Partial table lookups using a permutation instruction shorten the time required to retrieve data.
Type:
Grant
Filed:
October 12, 2007
Date of Patent:
April 15, 2014
Assignee:
Freescale Semiconductor, Inc.
Inventors:
William C. Moyer, Imran Ahmed, Dan E. Tamir
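The in-range and out-of-range index handling of the permutation instruction above can be sketched in software. The abstract does not say how an out-of-range index chooses among the predetermined constants, so this sketch assumes two constants selected by the low bit of the index; that rule, the constant values, and the function name `permute` are hypothetical.

```python
def permute(indices, source, constants=(0x00, 0xFF)):
    """Build a destination vector from an index vector and a source vector.

    In-range indices select an element of source. Out-of-range indices select
    one of two constants (assumption: the low bit of the index picks which).
    """
    result = []
    for idx in indices:
        if 0 <= idx < len(source):
            result.append(source[idx])  # in-range: ordinary table lookup
        else:
            result.append(constants[idx & 1])  # out-of-range: constant fill
    return result
```

For instance, `permute([2, 0, 8, 9], [10, 11, 12, 13])` looks up the first two elements in the source and fills the last two with the constants. A table larger than one register could then be covered by several such partial lookups whose results are merged, which is presumably the use the abstract has in mind.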
Abstract: A processor and a processor control method which efficiently perform an operation on data using a register, are provided. The register may include a data type field and a data field. The processor may generate the data type bits and store the generated data type bits in the data type field.
Abstract: Systems and methods are provided for speculatively elevating a privilege level at which instructions are executed. In an embodiment, this is accomplished by identification of a privilege elevation instruction (e.g., SYSCALL) at an early pipeline stage and speculatively executing subsequent instructions with elevated privileges.
Abstract: An information processing system records an execution of a program instruction. A determination is made that a thread has entered a program unit. Another determination is made that the thread is associated with at least one attribute that matches a set of thread recording criteria. An instruction recording mechanism for the thread is dynamically activated in response to the at least one attribute of the thread matching the set of thread recording criteria.
Type:
Application
Filed:
December 13, 2013
Publication date:
April 10, 2014
Applicant:
International Business Machines Corporation
Inventors:
Christopher D. FILACHEK, Mei Hui WANG, Joshua B. WISNIEWSKI
Abstract: A pipelined processing device includes: a pipeline controller configured to receive at least one instruction associated with an operation from each of a plurality of subcontrollers, and input the at least one instruction into a pipeline; and a pipeline counter configured to receive an active time value from each of the plurality of subcontrollers, the active time value indicating at least a portion of a time taken to process the at least one instruction, the pipeline controller configured to route the active time value to a shared pipeline storage for performance analysis.
Type:
Application
Filed:
December 3, 2013
Publication date:
April 3, 2014
Applicant:
International Business Machines Corporation
Inventors:
Ekaterina M. Ambroladze, Deanna Postles Dunn Berger, Michael Fee, Christine C. Jones, Arthur J. O'Neill, Diana Lynn Orf, Robert J. Sonnelitter
Abstract: An apparatus and method are described for performing efficient gather operations in a pipelined processor. For example, a processor according to one embodiment of the invention comprises: gather setup logic to execute one or more gather setup operations in anticipation of one or more gather operations, the gather setup operations to determine one or more addresses of vector data elements to be gathered by the gather operations; and gather logic to execute the one or more gather operations to gather the vector data elements using the one or more addresses determined by the gather setup operations.
Type:
Application
Filed:
September 28, 2012
Publication date:
April 3, 2014
Inventors:
Edward T. Grochowski, Dennis R. Bradford, George Z. Chrysos, Andrew T. Forsyth, Michael D. Upton, Lisa K. Wu
Abstract: A system serialization capability is provided to facilitate processing in those environments that allow multiple processors to update the same resources. The system serialization capability is used to facilitate processing in a multi-processing environment in which guests and hosts use locks to provide serialization. The system serialization capability includes a diagnose instruction which is issued after the host acquires a lock, eliminating the need for the guest to acquire the lock.
Type:
Application
Filed:
December 3, 2013
Publication date:
April 3, 2014
Applicant:
INTERNATIONAL BUSINESS MACHINES CORPORATION
Abstract: A processor executes a mask update instruction to perform updates to a first mask register and a second mask register. A register file within the processor includes the first mask register and the second mask register. The processor includes execution circuitry to execute the mask update instruction. In response to the mask update instruction, the execution circuitry is to invert a given number of mask bits in the first mask register, and also to invert the given number of mask bits in the second mask register.
Type:
Application
Filed:
September 28, 2012
Publication date:
April 3, 2014
Inventors:
Mikhail Plotnikov, Andrey Naraikin, Christopher Hughes
Abstract: A processor including a circuit unit includes a state information holding unit, a direction controller, a direction generator, and a direction execution unit. The state information holding unit holds state information indicating a state of the circuit unit. The direction controller decodes a first direction for generating a control direction that is contained in a program. The direction generator generates a second direction when the first direction decoded by the direction controller is a direction for generating the second direction for reading the state information from the state information holding unit. The direction execution unit reads the state information from the state information holding unit based on the second direction generated by the direction generator so as to store the state information in a register unit that is capable of being read from a program.
Abstract: A processor includes a processing unit including a storage module having stored thereon a physical reference list for storing identifications of physical registers that have been referenced by multiple logical registers, and a reclamation module for reclaiming physical registers to a free list based on a count of each of the physical registers on the physical reference list.
Type:
Application
Filed:
September 28, 2012
Publication date:
April 3, 2014
Inventors:
Vijaykumar Vijay Kadgi, James D. Hadley, Avinash Sodani, Matthew C. Merten, Morris Marden, Joseph A. McMahon, Grace C. Lee, Laura A. Knauth, Robert S. Chappell, Fariborz Tabesh
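One way to picture count-based reclamation is the Python sketch below. It is a simplified model under stated assumptions: the class `RegisterReclaimer`, its method names, and the rule "a register returns to the free list only when no entry for it remains on the reference list" are illustrative and are not taken from the application.

```python
from collections import Counter


class RegisterReclaimer:
    """Toy model of a free list plus a physical reference list (hypothetical)."""

    def __init__(self, num_physical: int) -> None:
        self.free_list = list(range(num_physical))
        # Physical reference list: number of extra logical registers that
        # reference a physical register beyond the first one.
        self.reference_list = Counter()

    def allocate(self) -> int:
        """Take a physical register from the free list."""
        return self.free_list.pop(0)

    def share(self, preg: int) -> None:
        """Record that one more logical register now maps to `preg`."""
        self.reference_list[preg] += 1

    def release(self, preg: int) -> bool:
        """Drop one logical reference; return True if `preg` was reclaimed."""
        if self.reference_list[preg] > 0:
            self.reference_list[preg] -= 1
            if self.reference_list[preg] == 0:
                del self.reference_list[preg]
            return False          # still referenced by another logical register
        self.free_list.append(preg)
        return True


if __name__ == "__main__":
    r = RegisterReclaimer(2)
    p = r.allocate()
    r.share(p)                    # two logical registers now reference p
    print(r.release(p), r.release(p), r.free_list)
```

The first release only decrements the count; the second finds no remaining entry and returns the register to the free list.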
Abstract: A method, a system and a computer program product for controlling the hardware priority of hardware threads in a data processing system. A Thread Priority Control (TPC) utility assigns a primary level and one or more secondary levels of hardware priority to a hardware thread. When a hardware thread initiates execution in the absence of a system call, the TPC utility enables execution based on the primary level. When the hardware thread initiates execution within a system call, the TPC utility dynamically adjusts execution from the primary level to the secondary level associated with the system call. The TPC utility adjusts hardware priority levels in order to: (a) raise the hardware priority of one hardware thread relative to another; (b) reduce energy consumed by the hardware thread; and (c) fulfill requirements of time critical hardware sections.
Type:
Grant
Filed:
October 30, 2008
Date of Patent:
April 1, 2014
Assignee:
International Business Machines Corporation
Inventors:
Vaijayanthimala K. Anand, Joerg Droste, Bruce Mealey, Bret Ronald Olszewski
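The primary and secondary priority levels can be modeled roughly as follows. This Python sketch is illustrative only: the class `HardwareThread`, the system call names, and the fallback to the primary level for a system call with no assigned secondary level are assumptions, not details from the patent.

```python
class HardwareThread:
    """Toy model of a thread with primary and per-system-call priority levels."""

    def __init__(self, primary: int, secondary: dict[str, int]) -> None:
        self.primary = primary
        self.secondary = dict(secondary)   # system call name -> priority level
        self.current_syscall = None

    def enter_syscall(self, name: str) -> None:
        self.current_syscall = name

    def exit_syscall(self) -> None:
        self.current_syscall = None

    def effective_priority(self) -> int:
        """Primary level outside system calls, secondary level inside one."""
        if self.current_syscall is None:
            return self.primary
        # Assumption: fall back to the primary level when the system call
        # has no secondary level of its own.
        return self.secondary.get(self.current_syscall, self.primary)


if __name__ == "__main__":
    t = HardwareThread(primary=4, secondary={"read": 6})
    print(t.effective_priority())
    t.enter_syscall("read")
    print(t.effective_priority())
```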
Abstract: A processor includes a plurality of execution units. At least one of the execution units is configured to execute a complex instruction that requires multiple instruction cycles to execute, and to enforce atomic execution of the complex instruction during a first portion of the multiple instruction cycles required to execute the complex instruction. The at least one of the execution units is further configured to enable execution of the complex instruction to be interrupted for execution of a different instruction by the at least one execution unit during execution of a second portion of the multiple instruction cycles. The first portion and the second portion are non-overlapping.
Abstract: A technique for optimizing program instruction execution throughput in a central processing unit core (CPU). The CPU implements a simultaneous multithreading (SMT) operational mode wherein program instructions associated with at least two software threads are executed in parallel as hardware threads while sharing one or more hardware resources used by the CPU, such as cache memory, translation lookaside buffers, functional execution units, etc. As part of the SMT mode, the CPU implements an autothread (AT) operational mode. During the AT operational mode, a determination is made whether there is a resource conflict between the hardware threads that undermines instruction execution throughput. If a resource conflict is detected, the CPU adjusts the relative instruction execution rates of the hardware threads based on relative priorities of the software threads.
Type:
Application
Filed:
November 29, 2013
Publication date:
March 27, 2014
Applicant:
International Business Machines Corporation
Inventors:
Amit Merchant, Dipankar Sarma, Vaidyanathan Srinivasan
Abstract: One or more embodiments may provide a method for performing a replay. The method includes initiating execution of a program, the program having a plurality of sets of instructions, and each set of instructions has a number of chunks of instructions. The method also includes intercepting, by a virtual machine unit executing on a processor, an instruction of a chunk of the number of chunks before execution. The method further includes determining, by a replay module executing on the processor, whether the chunk is an active chunk, and responsive to the chunk being the active chunk, executing the instruction.
Type:
Application
Filed:
September 27, 2012
Publication date:
March 27, 2014
Inventors:
Justin E. Gottschlich, Klaus Danne, Cristiano L. Pereira, Gilles A. Pokam, Rolf Kassa, Shiliang Hu, Tim Kranich
Abstract: A processor includes a plurality of execution units. At least one of the execution units is configured to repeatedly execute a first instruction based on a first field of the first instruction indicating that the first instruction is to be iteratively executed.
Abstract: An arithmetic processor includes a first pipeline unit configured to execute a first instruction that is input; a second pipeline unit configured to execute a second instruction that is input; a registration unit into which an aborted instruction is registered, the aborted instruction being the first instruction when the first pipeline unit is unable to complete the first instruction or the second instruction when the second pipeline unit is unable to complete the second instruction; a determination unit configured to make a determination as to which one of the first pipeline unit and the second pipeline unit is operating under a lower load; and an input unit configured to input, in the first pipeline unit or the second pipeline unit that is determined as operating under the lower load by the determination unit, the aborted instruction that is registered in the registration unit.
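The re-dispatch of aborted instructions to the less loaded pipeline might look like the Python sketch below. It is a rough model under assumptions: load is measured as the number of pending instructions, ties go to the first pipeline, and the names `Pipeline` and `reissue_aborted` are hypothetical.

```python
from collections import deque


class Pipeline:
    """Toy pipeline whose load is the number of pending instructions (assumed)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.pending = []

    @property
    def load(self) -> int:
        return len(self.pending)


def reissue_aborted(registration: deque, first: Pipeline, second: Pipeline) -> list:
    """Input each registered aborted instruction into the lower-load pipeline.

    Returns the names of the pipelines chosen, in order.
    """
    chosen = []
    while registration:
        instr = registration.popleft()
        # Determination step: pick the pipeline under the lower load
        # (ties go to the first pipeline, which is an assumption).
        target = first if first.load <= second.load else second
        target.pending.append(instr)
        chosen.append(target.name)
    return chosen


if __name__ == "__main__":
    p1, p2 = Pipeline("first"), Pipeline("second")
    p1.pending.extend(["a", "b"])
    print(reissue_aborted(deque(["x", "y", "z"]), p1, p2))
```

Because the load is re-evaluated for each instruction, a burst of aborted instructions is spread across both pipelines rather than sent to one.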
Abstract: A system and method for controlling processor instruction execution. In one example, a method for synchronizing a number of instructions performed by processors includes instructing a first processor to iteratively execute instructions via a first set of iterations until a predetermined time period has elapsed. A number of instructions executed in each iteration of the first set of iterations is less than a number of instructions executed in a prior iteration of the first set of iterations. The method also includes instructing a second processor to iteratively execute instructions via a second set of iterations until the predetermined time period has elapsed. A number of instructions executed in each iteration of the second set of iterations is less than a number of instructions executed in a prior iteration of the second set of iterations. The method includes determining whether additional instructions are to be executed.
Type:
Application
Filed:
September 14, 2012
Publication date:
March 20, 2014
Applicant:
General Electric Company
Inventors:
William David Smith, II, Safayet Nizam Uddin Ahmed, Jon Marc Diekema
Abstract: Provided are an apparatus for reconfiguring, a mapping method, and a scheduling method in a reconfigurable multi-processor system. A single function is mapped to a reconfigurable processor. When a task is created in the reconfigurable multi-processor system, a function of the task is dynamically mapped to a host processor or a reconfigurable processor, thereby removing temporal sharing between functions on the reconfigurable processor and thus reducing the number of times reconfiguration is performed. The overhead of the reconfigurable processor is minimized and the reconfigurable processor is optimized for a dynamic multi-application environment.
Type:
Grant
Filed:
October 31, 2007
Date of Patent:
March 18, 2014
Assignee:
Samsung Electronics Co., Ltd.
Inventors:
Chae-Seok Im, Gyu-Sang Choi, Jung-Keun Park
Abstract: An instruction fusion calculation device of the present invention includes an instruction fusion detection circuit, an instruction fusion circuit, and a calculator. The instruction fusion detection circuit determines whether a preceding instruction and a subsequent instruction that have a flow dependence relationship between them can be fused. The instruction fusion circuit fuses into one instruction the preceding instruction and the subsequent instruction that the instruction fusion detection circuit has determined can be fused. The calculator executes the fused instruction produced by the instruction fusion circuit to output the calculation result, and also outputs, as an intermediate result, at least one of the calculation results obtained by executing the preceding instruction and the subsequent instruction.
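The detection and execution steps can be illustrated with the Python sketch below. It is a simplified model, not the claimed circuit: the three-operand `Instr` format, the operation table `OPS`, and the rule that any flow-dependent pair is fusible are assumptions. A real detector would presumably also check which opcode pairs the calculator supports.

```python
import operator
from dataclasses import dataclass

# Hypothetical operation table for the sketch.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}


@dataclass(frozen=True)
class Instr:
    op: str
    dst: str
    src1: str
    src2: str


def can_fuse(prev: Instr, nxt: Instr) -> bool:
    """Flow (read-after-write) dependence: nxt reads the register prev writes."""
    return prev.dst in (nxt.src1, nxt.src2)


def execute_fused(prev: Instr, nxt: Instr, regs: dict) -> tuple:
    """Execute the fused pair; return (final result, intermediate result)."""
    intermediate = OPS[prev.op](regs[prev.src1], regs[prev.src2])
    operands = dict(regs)
    operands[prev.dst] = intermediate     # forward prev's result to nxt
    final = OPS[nxt.op](operands[nxt.src1], operands[nxt.src2])
    return final, intermediate


if __name__ == "__main__":
    mul = Instr("mul", "r3", "r1", "r2")
    add = Instr("add", "r4", "r3", "r1")
    if can_fuse(mul, add):
        print(execute_fused(mul, add, {"r1": 2, "r2": 5}))
```

Returning the intermediate value alongside the final one reflects the abstract's requirement that the preceding instruction's result remains available after fusion.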