Instruction Issuing Patents (Class 712/214)

Simultaneous issuance of multiple instructions (Class 712/215)

Code coverage and confidence determination

Patent number: 9292419

Abstract: A device receives code for a technical computing environment, and receives conditions for executing the code. The device performs a static analysis of the code, based on the conditions, to generate static analysis information for the code, and executes the code in the technical computing environment based on the conditions. The device determines coverage information associated with the executing code, where the coverage information provides a measure of completeness associated with the executing code. The device compares the static analysis information and the coverage information to determine confidence information associated with the coverage information, and outputs or stores the coverage information and the confidence information.

Type: Grant

Filed: May 29, 2014

Date of Patent: March 22, 2016

Assignee: The MathWorks, Inc.

Inventors: Kiran K. Kintali, Anand Krishnamoorthi, Ebrahim Mestchian, Richard M. McKeever
Optimal deallocation of instructions from a unified pick queue

Patent number: 9286075

Abstract: Systems and methods for efficient out-of-order dynamic deallocation of entries within a shared storage resource in a processor. A processor comprises a unified pick queue that includes an array configured to dynamically allocate any entry of a plurality of entries for a decoded and renamed instruction. This instruction may correspond to any available active threads supported by the processor. The processor includes circuitry configured to determine whether an instruction corresponding to an allocated entry of the plurality of entries is dependent on a speculative instruction and whether the instruction has a fixed instruction execution latency. In response to determining the instruction is not dependent on a speculative instruction, the instruction has a fixed instruction execution latency, and said latency has transpired, the circuitry may deallocate the instruction from the allocated entry.

Type: Grant

Filed: September 30, 2009

Date of Patent: March 15, 2016

Assignee: Oracle America, Inc.

Inventors: Matthew B. Smittle, Robert T. Golla
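
The deallocation condition in the abstract above (not dependent on a speculative instruction, fixed execution latency, latency elapsed) can be sketched as a simple predicate. This is only an illustrative model: the `PickQueueEntry` fields and the `can_deallocate` name are assumptions made for the example, not terms from the patent.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PickQueueEntry:
    # Hypothetical fields, chosen for illustration only.
    depends_on_speculative: bool   # waits on a speculative instruction?
    fixed_latency: Optional[int]   # None models a variable-latency instruction
    cycles_since_pick: int = 0     # cycles elapsed since the instruction was picked


def can_deallocate(entry: PickQueueEntry) -> bool:
    """Return True only when all three conditions from the abstract hold."""
    if entry.depends_on_speculative:
        return False               # may need to be replayed, keep the entry
    if entry.fixed_latency is None:
        return False               # latency unknown, keep the entry
    return entry.cycles_since_pick >= entry.fixed_latency
```

For instance, an entry with a fixed latency of 3 cycles becomes eligible once 3 cycles have passed, provided it does not depend on a speculative instruction.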
Data processing apparatus having a parallel processing circuit including a plurality of processing modules, and method for controlling the same

Patent number: 9286078

Abstract: In an apparatus which includes a plurality of processing modules connected via a ring-shape bus, if a plurality of pieces of pipeline processing to be processed in a different order is allocated to a plurality of processing modules, the transfer efficiency may decrease when an amount of data transferred from one of the processing modules to a post-stage module exceeds a processing capacity of the post-stage module. Accordingly, a module positioned on the preceding side in the pipeline processing controls a transmission interval of processed data so that the post-stage module can receive the data processed by the preceding module.

Type: Grant

Filed: April 28, 2014

Date of Patent: March 15, 2016

Assignee: CANON KABUSHIKI KAISHA

Inventors: Hiroyasu Watanabe, Hirowo Inoue, Hisashi Ishikawa
Flush operations in a processor

Patent number: 9268575

Abstract: Methods and apparatuses are provided for flush operations in a processor. The apparatus comprises an out-of-order execution unit for processing instructions issued in-order from an instruction decoder for first and second threads and being configured to identify an errored instruction in a first thread. A retire unit includes a retire queue for receiving completed instructions from the out-of-order execution unit, the retire unit being configured to retire older in-order first thread instructions until the errored instruction would be the next instruction to be retired, and then flushing the errored instruction and all later in-order first thread instructions from the retire queue. The method comprises determining that an errored instruction is being processed by an out-of-order execution unit of a processor and continuing to process to completion instructions earlier in-order from the errored instruction until the completion of the errored instruction.

Type: Grant

Filed: June 30, 2011

Date of Patent: February 23, 2016

Assignee: ADVANCED MICRO DEVICES, INC.

Inventors: Jay E. Fleischman, Emil Talpes, Debjit DasSarma
Load latency speculation in an out-of-order computer processor

Patent number: 9262160

Abstract: Load latency speculation in an out-of-order computer processor, including: issuing a load instruction for execution, wherein the load instruction has a predetermined expected execution latency; issuing a dependent instruction wakeup signal on an instruction wakeup bus, wherein the dependent instruction wakeup signal indicates that the load instruction will be completed upon the expiration of the expected execution latency; determining, upon the expiration of the expected execution latency, whether the load instruction has completed; and responsive to determining that the load instruction has not completed upon the expiration of the expected execution latency, issuing a negative dependent instruction wakeup signal on the instruction wakeup bus, wherein the negative dependent instruction wakeup signal indicates that the load instruction has not completed upon the expiration of the expected execution latency.

Type: Grant

Filed: March 5, 2013

Date of Patent: February 16, 2016

Assignee: International Business Machines Corporation

Inventors: Timothy H. Heil, Andrew D. Hilton, Adam J. Muff
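
The wakeup protocol described in the abstract above can be modeled in a few lines. The sketch below assumes a single load and represents the wakeup bus as a list of `(cycle, signal)` pairs; the function name, the parameter names, and the signal strings are hypothetical and chosen only for illustration.

```python
def wakeup_signals(issue_cycle, expected_latency, actual_completion_cycle):
    """Return the (cycle, signal) pairs placed on a hypothetical wakeup bus.

    A positive wakeup is sent speculatively at issue time. If the load has
    not completed when the expected latency expires, a negative wakeup
    follows so that dependent instructions know the data is not ready.
    """
    signals = [(issue_cycle, "wakeup")]
    deadline = issue_cycle + expected_latency
    if actual_completion_cycle > deadline:
        # Speculation failed: the load missed its expected latency.
        signals.append((deadline, "negative_wakeup"))
    return signals
```

A load issued at cycle 10 with an expected latency of 4 that actually completes at cycle 20 would produce a wakeup at cycle 10 and a negative wakeup at cycle 14.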
Load latency speculation in an out-of-order computer processor

Patent number: 9256428

Abstract: Load latency speculation in an out-of-order computer processor, including: issuing a load instruction for execution, wherein the load instruction has a predetermined expected execution latency; issuing a dependent instruction wakeup signal on an instruction wakeup bus, wherein the dependent instruction wakeup signal indicates that the load instruction will be completed upon the expiration of the expected execution latency; determining, upon the expiration of the expected execution latency, whether the load instruction has completed; and responsive to determining that the load instruction has not completed upon the expiration of the expected execution latency, issuing a negative dependent instruction wakeup signal on the instruction wakeup bus, wherein the negative dependent instruction wakeup signal indicates that the load instruction has not completed upon the expiration of the expected execution latency.

Type: Grant

Filed: February 6, 2013

Date of Patent: February 9, 2016

Assignee: International Business Machines Corporation

Inventors: Timothy H. Heil, Andrew D. Hilton, Adam J. Muff
Identifying thread progress information by monitoring transitions between interesting states

Patent number: 9218199

Abstract: Embodiments relate to a method, apparatus, and program product for capturing thread specific state timing information. The method includes associating a time field and a time valid field to a thread data structure and setting a current time state by determining a previous time state and updating it according to a previously identified method for setting time states. The method further includes determining status of a time valid bit to see if it is set to valid or invalid. When the status is valid, it is made available for reporting.

Type: Grant

Filed: November 23, 2012

Date of Patent: December 22, 2015

Assignee: International Business Machines Corporation

Inventors: Michael H. Dawson, Trent A. Gray-Donald
Hardware recovery in multi-threaded processor

Patent number: 9213608

Abstract: A computer system includes a simultaneous multi-threading processor and memory in operable communication with the processor. The processor is configured to perform a method including running multiple threads simultaneously, detecting a hardware error in one or more hardware structures of the processing circuit, and identifying one or more victim threads of the multiple threads. The processor is further configured to identify a plurality of hardware structures associated with execution of the one or more victim threads, isolate the one or more victim threads from the rest of the multiple threads by preventing access to the plurality of hardware structures by the multiple threads, flush the one or more victim threads by resetting hardware states of the plurality of hardware structures, and restore the one or more victim threads by restoring the plurality of hardware structures to a known safe state.

Type: Grant

Filed: March 8, 2013

Date of Patent: December 15, 2015

Assignee: International Business Machines Corporation

Inventors: Fadi Y. Busaba, Steven R. Carlough, Christopher A. Krygowski, Brian R. Prasky, Chung-Lung K. Shum
Lighted multi-plane window well enhancement system

Patent number: 9192243

Abstract: The invention utilizes a number of thin, flexible layers placed in parallel to each other and the back wall of a window well. The inner layer is translucent such that when lit, the lighting shines through the layer and illuminates the middle and outer layers or shines through all the layers and through the window well and into the room. Alternatively, the inner layer is reflective so that it reflects light directed towards it into the other layers. The various layers can be variably transparent, translucent and/or opaque and can be shaded. By spacing the layers apart, depth can be imparted to a viewer. The outermost layer can have the appearance of a metallic film and/or artistic representations. This outer film can be stenciled such that patterns of light shine through in various places.

Type: Grant

Filed: May 17, 2012

Date of Patent: November 24, 2015

Inventor: Sharon Slaughter
Mitigating busy time in a high performance cache

Patent number: 9158694

Abstract: A method, information processing device, and computer program product mitigate busy time in a hierarchical store-through memory cache structure. A cache directory associated with a memory cache is divided into a plurality of portions each associated with a portion of the memory cache. Simultaneous cache lookup operations and cache write operations between the plurality of portions of the cache directory are supported. Two or more store commands are simultaneously processed in a shared cache pipeline communicatively coupled to the plurality of portions of the cache directory.

Type: Grant

Filed: October 31, 2012

Date of Patent: October 13, 2015

Assignee: International Business Machines Corporation

Inventors: Deanna P. Berger, Michael F. Fee, Christine C. Jones, Arthur J. O'Neill, Diana L. Orf, Robert J. Sonnelitter, III
Work units for content processing

Patent number: 9152932

Abstract: A system may create work units, each work unit including at least one of an input port or output port, each work unit configured to modify data that is received via the input port. In addition, the system may compose a workflow by connecting an output port of a first of the work units to an input port of a second of the work units, receive a work order, select the workflow in response to the work order, decompose the workflow into constituent work units, instantiate tasks that correspond to the constituent work units, and execute a work unit process for each of the tasks.

Type: Grant

Filed: December 17, 2010

Date of Patent: October 6, 2015

Assignee: VERIZON PATENT AND LICENSING INC.

Inventors: Mohammad Reza Shafiee, Hongfang Li, Wei Liu, Anurag Gupta, Ashutosh K. Sureka, Satya S. Raju
Hardware recovery in multi-threaded processor

Patent number: 9152510

Abstract: A computer system includes a simultaneous multi-threading processor and memory in operable communication with the processor. The processor is configured to perform a method including running multiple threads simultaneously, detecting a hardware error in one or more hardware structures of the processing circuit, and identifying one or more victim threads of the multiple threads. The processor is further configured to identify a plurality of hardware structures associated with execution of the one or more victim threads, isolate the one or more victim threads from the rest of the multiple threads by preventing access to the plurality of hardware structures by the multiple threads, flush the one or more victim threads by resetting hardware states of the plurality of hardware structures, and restore the one or more victim threads by restoring the plurality of hardware structures to a known safe state.

Type: Grant

Filed: July 13, 2012

Date of Patent: October 6, 2015

Assignee: International Business Machines Corporation

Inventors: Fadi Y. Busaba, Steven R. Carlough, Christopher A. Krygowski, Brian R. Prasky, Chung-Lung K. Shum
Eliminating redundant masking operations in instruction processing circuits, and related processor systems, methods, and computer-readable media

Patent number: 9146741

Abstract: Eliminating redundant masking operations in instruction processing circuits and related processor systems, methods, and computer-readable media are disclosed. In one embodiment, a first instruction in an instruction stream indicating an operation writing a value to a first register is detected by an instruction processing circuit, the value having a value size less than a size of the first register. The circuit also detects a second instruction in the instruction stream indicating a masking operation on the first register. The masking operation is eliminated upon a determination that the masking operation indicates a read operation and a write operation on the first register and has an identity mask size equal to or greater than the value size. In this manner, the elimination of the masking operation avoids potential read-after-write hazards and improves performance of a CPU by removing redundant operations from an execution pipeline.

Type: Grant

Filed: October 19, 2012

Date of Patent: September 29, 2015

Assignee: QUALCOMM Incorporated

Inventors: Melinda J. Brown, Michael William Morrow, James Norris Dieffenderfer, Brian Michael Stempel, Michael Scott McIlvaine
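
The elimination test in the abstract above can be illustrated with a small predicate. This sketch assumes the "identity mask size" is the number of contiguous low-order 1 bits in the mask, and that registers are identified by name; the function names and parameters are hypothetical.

```python
def identity_mask_size(mask: int) -> int:
    """Count the contiguous low-order 1 bits of a non-negative mask
    (assumed meaning of 'identity mask size')."""
    size = 0
    while mask & 1:
        size += 1
        mask >>= 1
    return size


def can_eliminate_mask(written_reg, value_bits, mask_src, mask_dst, mask):
    """Return True if the masking operation is redundant.

    The mask must both read and write the register that was just written,
    and its identity mask must cover at least the written value's size.
    """
    same_register = mask_src == written_reg and mask_dst == written_reg
    return same_register and identity_mask_size(mask) >= value_bits
```

For example, after an 8-bit value is written to `r1`, an operation equivalent to `r1 = r1 & 0xFF` changes nothing and could be dropped, whereas `r1 = r1 & 0x0F` could not.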
Data processing system with latency tolerance execution

Patent number: 9141391

Abstract: In a processor having an instruction unit, a decode/issue unit, and execution queues configured to provide instructions to correspondingly different types of execution units, a method comprises maintaining a duplicate free list for the execution queues. The duplicate free list includes a plurality of duplicate dependent instruction indicators that indicate when a duplicate instruction for a dependent instruction is stored in at least one of the execution queues. One of the duplicate dependent instruction indicators is assigned to an execution queue for a dependent instruction. The dependent instruction is executed only when the one of the duplicate dependent instruction indicators is reset.

Type: Grant

Filed: March 14, 2012

Date of Patent: September 22, 2015

Assignee: Freescale Semiconductor, Inc.

Inventors: Thang M. Tran, Trinh Huy Nguyen
Pointer chasing prediction

Patent number: 9116817

Abstract: A system and method for efficient scheduling of dependent load instructions. A processor includes both an execution core and a scheduler that issues instructions to the execution core. The execution core includes a load-store unit (LSU). The scheduler determines a first condition is satisfied, wherein the first condition comprises result data for a first load instruction is predicted eligible for LSU-internal forwarding. The scheduler determines a second condition is satisfied, wherein the second condition comprises a second load instruction younger in program order than the first load instruction is dependent on the first load instruction. In response to each of the first condition and the second condition being satisfied, the scheduler can issue the second load instruction earlier than it otherwise would. The LSU internally forwards the received result data from the first load instruction to address generation logic for the second load instruction.

Type: Grant

Filed: May 9, 2013

Date of Patent: August 25, 2015

Assignee: Apple Inc.

Inventor: Stephan G. Meier
Systems and methods for handling instructions of in-order and out-of-order execution queues

Patent number: 9110656

Abstract: A processor includes a first execution queue configured to provide instructions of a first instruction type to a first execution unit, and a second execution queue configured to provide instructions of a second instruction type to a second execution unit. A first instruction of the second instruction type is received. The first instruction is decoded by the decode/issue unit to determine operands of the first instruction. The operands of the first instruction are determined to include a dependency on a second instruction of the first instruction type stored in a first entry of the first execution queue. The first instruction is stored in a first entry of the second execution queue. A synchronization indicator corresponding to the first instruction in a second entry of the first execution queue is set immediately adjacent the first entry of the first execution queue, which indicates that the first instruction is stored in another execution queue.

Type: Grant

Filed: August 16, 2011

Date of Patent: August 18, 2015

Assignee: Freescale Semiconductor, Inc.

Inventors: Thang M. Tran, Trinh Huy H. Nguyen
Controlling an order for processing data elements during vector processing

Patent number: 9098265

Abstract: A data processing apparatus includes a register bank having a plurality of registers for storing vectors being processed; a pipelined processor for processing the stream of vector instructions; the pipelined processor comprising circuitry configured to detect data dependencies for the vectors processed by the stream of vector instructions and stored in the plurality of registers and to determine constraints on timing of execution for the vector instructions such that no register data hazards arise. Register data hazards arise where two accesses to a same register, at least one of said accesses being a write, occur in an order different to an order of said instruction stream such that an access occurring later in said instruction stream starts before an access occurring earlier in said instruction stream has completed. The pipelined processor includes data element hazard determination circuitry.

Type: Grant

Filed: July 11, 2012

Date of Patent: August 4, 2015

Assignee: ARM Limited

Inventor: Alastair David Reid
MFENCE and LFENCE micro-architectural implementation method and system

Patent number: 9098268

Abstract: A system and method for fencing memory accesses. Memory loads can be fenced, or all memory access can be fenced. The system receives a fencing instruction that separates memory access instructions into older accesses and newer accesses. A buffer within the memory ordering unit is allocated to the instruction. The access instructions newer than the fencing instruction are stalled. The older access instructions are gradually retired. When all older memory accesses are retired, the fencing instruction is dispatched from the buffer.

Type: Grant

Filed: September 14, 2012

Date of Patent: August 4, 2015

Assignee: Intel Corporation

Inventors: Salvador Palanca, Stephen A. Fischer, Subramaniam Maiyuran, Shekoufeh Qawami
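
The ordering that the fencing abstract above describes (older accesses retire, then the fence is dispatched, and only then do newer accesses proceed) can be sketched as an event trace. The model below is a deliberately simplified assumption: it takes a list of instruction names in program order containing exactly one `"FENCE"` marker, and the event labels are hypothetical.

```python
def fence_order(instrs):
    """Return the event order implied by a single fence.

    instrs: instruction names in program order, with exactly one "FENCE".
    Accesses newer than the fence are stalled until it has been dispatched.
    """
    i = instrs.index("FENCE")
    older, newer = instrs[:i], instrs[i + 1:]
    events = [("retire", op) for op in older]        # drain older accesses first
    events.append(("dispatch", "FENCE"))             # fence leaves its buffer
    events += [("release", op) for op in newer]      # stalled accesses resume
    return events
```

This ignores the difference between fencing only loads and fencing all memory accesses, which the abstract mentions but does not detail.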
Storing in other queue when reservation station instruction queue reserved for immediate source operand instruction execution unit is full

Patent number: 9043581

Abstract: A processing apparatus includes an execution unit which performs computation on two operand inputs each being selectable between read data from a register and an immediate value. The processing apparatus also includes another execution unit which performs computation on two operand inputs, one of which is selectable between read data from a register and an immediate value, and the other of which is an immediate value. A control unit determines, based on a received instruction specifying a computation on two operands, whether each of the two operands specifies read data from a register or an immediate value. Depending on the determination result, the control unit causes one of the execution units to execute the computation specified by the received instruction.

Type: Grant

Filed: November 14, 2011

Date of Patent: May 26, 2015

Assignee: FUJITSU LIMITED

Inventor: Masaki Ukai
Enhanced instruction scheduling during compilation of high level source code for improved executable code

Patent number: 9043582

Abstract: Systems and methods for static code scheduling are disclosed. A method can include receiving an intermediate representation of source code, building a directed acyclic graph (DAG) for the intermediate representation, and creating chains of dependent instructions from the DAG for cluster formation. The chains are merged into clusters and each node in the DAG is marked with an identifier of a cluster it is part of to generate a marked instruction DAG. Instruction DAG scheduling is then performed using information about the clusters to generate an ordered intermediate representation of the source code.

Type: Grant

Filed: September 14, 2012

Date of Patent: May 26, 2015

Assignee: Qualcomm Innovation Center, Inc.

Inventor: Sergei Larin
Issue policy control within a multi-threaded in-order superscalar processor

Patent number: 9032188

Abstract: A multi-threaded in-order superscalar processor 2 includes an issue stage 12 including issue circuitry 22, 24 for selecting instructions to be issued to execution units 14, 16 in dependence upon a currently selected issue policy. A plurality of different issue policies are provided by associated different policy circuitry 28, 30, 32 and a selection between which of these instances of the policy circuitry 28, 30, 32 is active is made by policy selecting circuitry 34 in dependence upon detected dynamic behavior of the processor 2.

Type: Grant

Filed: March 27, 2008

Date of Patent: May 12, 2015

Assignee: ARM Limited

Inventors: Emre Özer, Stuart David Biles
Efficient parallel computation of dependency problems

Patent number: 9032377

Abstract: A computing method includes accepting a definition of a computing task, which includes multiple Processing Elements (PEs) having execution dependencies. The computing task is compiled for concurrent execution on a multiprocessor device, by arranging the PEs in a series of two or more invocations of the multiprocessor device, including assigning the PEs to the invocations depending on the execution dependencies. The multiprocessor device is invoked to run software code that executes the series of the invocations, so as to produce a result of the computing task.

Type: Grant

Filed: June 2, 2013

Date of Patent: May 12, 2015

Assignee: Rocketick Technologies Ltd.

Inventors: Shay Mizrachi, Uri Tal, Tomer Ben-David, Ishay Geller, Ido Kasher, Ronen Gal
Detecting and reissuing of loop instructions in reorder structure

Patent number: 9026769

Abstract: A processor for processing loop instructions can include an instruction reorder structure and a loop processing controller. The instruction reorder structure is configured to store decoded instructions according to program order and issue the decoded instructions for execution out of program order. The loop processing controller is configured to detect a loop in the decoded instructions stored in the instruction reorder structure and cause the instruction reorder structure to reissue the decoded instructions that form the loop for re-execution.

Type: Grant

Filed: January 24, 2012

Date of Patent: May 5, 2015

Assignee: Marvell International Ltd.

Inventors: Sujat Jamil, R. Frank O'Bleness, Joseph Delgross, Tom Hameenanttila
Systems and Methods for Register Allocation

Publication number: 20150113251

Abstract: Systems and methods are provided for register allocation. An original code block and a target code block associated with a branch of an execution loop are determined. An original allocation of a plurality of physical registers to one or more original variables associated with the original code block is detected. A target allocation of the plurality of physical registers to one or more target variables associated with the target code block is determined. One or more temporary registers are selected from the plurality of physical registers based at least in part on the original allocation and the target allocation. The original allocation is changed to the target allocation using the selected temporary registers. Specifically, one or more instructions are generated to change the original allocation to the target allocation using the selected temporary registers. The instructions are executed using one or more processors.

Type: Application

Filed: September 12, 2014

Publication date: April 23, 2015

Inventors: Ningsheng Jian, Yuheng Zhang, Liping Gao, Haitao Huang, Xinyu Qi
Region-weighted accounting of multi-threaded processor core according to dispatch state

Patent number: 9015449

Abstract: An approach is provided in which a thread is selected from multiple active threads, along with a corresponding weighting value. Computational logic determines whether one of the multiple threads is dispatching an instruction and, if so, computes a dispatch weighting value using the selected weighting value and a dispatch factor that indicates a weighting adjustment of the selected weighting value. In turn, a resource utilization value of the selected thread is computed using the dispatch weighting value.

Type: Grant

Filed: March 27, 2011

Date of Patent: April 21, 2015

Assignee: International Business Machines Corporation

Inventors: James Wilson Bishop, Michael J. Genden, Steven Bradford Herndon, Philip Lee Vitale
Instruction type issue throttling upon reaching threshold by adjusting counter increment amount for issued cycle and decrement amount for not issued cycle

Patent number: 9009451

Abstract: A system and method for reducing power consumption through issue throttling of selected problematic instructions. A power throttle unit within a processor maintains instruction issue counts for associated instruction types. The instruction types may be a subset of supported instruction types executed by an execution core within the processor. The instruction types may be chosen based on high power consumption estimates for processing instructions of these types. The power throttle unit may determine a given instruction issue count exceeds a given threshold. In response, the power throttle unit may select given instruction types to limit a respective issue rate. The power throttle unit may choose an issue rate for each one of the selected given instruction types and limit an associated issue rate to a chosen issue rate. The selection of given instruction types and associated issue rate limits is programmable.

Type: Grant

Filed: October 31, 2011

Date of Patent: April 14, 2015

Assignee: Apple Inc.

Inventors: Daniel C. Murray, Andrew J. Beaumont-Smith, John H. Mylius, Peter J. Bannon, Toshi Takayanagi, Jung Wook Cho
Virtual load store queue having a dynamic dispatch window with a unified structure

Publication number: 20150095618

Abstract: An out of order processor. The processor includes a virtual load store queue for allocating a plurality of loads and a plurality of stores, wherein more loads and more stores can be accommodated beyond an actual physical size of the load store queue of the processor; wherein the processor allocates other instructions besides loads and stores beyond the actual physical size limitation of the load/store queue; and wherein the other instructions can be dispatched and executed even though intervening loads or stores do not have spaces in the load store queue.

Type: Application

Filed: December 11, 2014

Publication date: April 2, 2015

Inventor: Mohammad A. ABDALLAH
Global register protection in a multi-threaded processor

Patent number: 8996847

Abstract: Global register protection in a multi-threaded processor is described. In an embodiment, global resources within a multi-threaded processor are protected by performing checks, before allowing a thread to write to a global resource, to determine whether the thread has write access to the particular global resource. The check involves accessing one or more local control registers or a global control field within the multi-threaded processor and in an example, a local register associated with each other thread in the multi-threaded processor is accessed and checked to see whether it contains an identifier for the particular global resource. Only if none of the accessed local resources contain such an identifier, is the instruction issued and the thread allowed to write to the global resource. Otherwise, the instruction is blocked and an exception may be raised to alert the program that issued the instruction that the write failed.

Type: Grant

Filed: February 28, 2013

Date of Patent: March 31, 2015

Assignee: Imagination Technologies Limited

Inventors: Guixin Wang, Hugh Jackson, Robert Graham Isherwood
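
The write-access check in the example from the abstract above (inspect the local register of every other thread for the global resource's identifier) can be sketched as follows. The dictionary layout and the names `may_write_global` and `local_control` are assumptions for illustration, not structures defined by the patent.

```python
def may_write_global(writer, resource, local_control):
    """Return True if `writer` may write to the global `resource`.

    local_control maps a thread id to the set of global-resource
    identifiers held in that thread's (hypothetical) local control register.
    The write is allowed only if no *other* thread lists the resource.
    """
    return all(
        resource not in claimed
        for thread_id, claimed in local_control.items()
        if thread_id != writer
    )
```

With `local_control = {0: {"G1"}, 1: set()}`, thread 0 may write `G1` but thread 1 may not; in the blocked case the abstract notes that an exception may be raised.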
Technique for reducing voltage droop by throttling instruction issue rate

Publication number: 20150089198

Abstract: An issue control unit is configured to control the rate at which an instruction issue unit issues instructions to an execution pipeline in order to avoid spikes in power drawn by that execution pipeline. The issue control unit maintains a history buffer that reflects, for N previous cycles, the number of instructions issued during each of those N cycles. If the total number of instructions issued during the N previous cycles exceeds a threshold value, then the issue control unit throttles the instruction issue unit from issuing instructions during a subsequent cycle. In addition, the issue control unit increases the threshold value in proportion to the number of previously issued instructions and based on a variety of configurable parameters. Accordingly, the issue control unit maintains granular control over the rate with which the instruction issue unit “ramps up” to a maximum instruction issue rate.

Type: Application

Filed: September 20, 2013

Publication date: March 26, 2015

Applicant: NVIDIA CORPORATION

Inventors: Peter SOMMERS, Peter NELSON, Aniket NAIK, John H. EDMONDSON
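
The history-buffer check in the abstract above can be sketched with a fixed-length queue. This is a minimal model under stated assumptions: the class and method names are hypothetical, the threshold is held constant (the abstract also describes raising it over time, which is omitted here), and a throttled cycle issues nothing at all.

```python
from collections import deque


class IssueThrottle:
    """Toy model of an issue control unit with an N-cycle history buffer."""

    def __init__(self, n_cycles, threshold):
        # One slot per previous cycle, each holding that cycle's issue count.
        self.history = deque([0] * n_cycles, maxlen=n_cycles)
        self.threshold = threshold

    def step(self, requested):
        """Advance one cycle and return how many instructions actually issue."""
        throttled = sum(self.history) > self.threshold
        issued = 0 if throttled else requested
        self.history.append(issued)   # oldest cycle drops out automatically
        return issued
```

With a 2-cycle window and a threshold of 3, a unit that requests 2 instructions every cycle issues 2, 2, 0, 2: the third cycle is suppressed because the previous two cycles issued 4 in total.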
Method, system, and computer program product for optimizing runtime branch selection in a flow process

Patent number: 8984259

Abstract: A method, system, and computer program product for optimizing runtime branch selection in a flow process are provided. The method includes gathering performance metrics of flow branch behavior for executed flows in a runtime system over time and using aggregated performance metrics for the behavior to determine an optimal ordering of branches for a currently running flow. The optimal ordering is determined by identifying one or more branch points in the flow, generating ordering permutations for at least a portion of the branches in the branch point for the flow to identify any permutations that have not been executed, gathering metrics for permutation(s) of the branch point in the flow, comparing the metrics to performance metrics of executed flows having substantially similar flow branch behavior, and identifying optimal branch ordering for the permutation(s) based upon the comparison. The method also includes executing the flow according to the optimal branch ordering.

Type: Grant

Filed: November 4, 2008

Date of Patent: March 17, 2015

Assignee: International Business Machines Corporation

Inventors: Brian Hulse, Callum P. Jackson, Christopher Kalus, Ian W. Parkinson, Robert W. Phippen, Amanda J. Watkinson
System and Method for an Asynchronous Processor with Token-Based Very Long Instruction Word Architecture

Publication number: 20150074379

Abstract: Embodiments are provided for an asynchronous processor with token-based very long instruction word architecture. The asynchronous processor comprises a memory configured to cache a plurality of instructions, a feedback engine configured to receive the instructions in bundles of instructions at a time (referred to as very long instruction word) and to decode the instructions, and a crossbar bus configured to transfer calculation information and results of the asynchronous processor. The apparatus further comprises a plurality of sets of execution units (XUs) between the feedback engine and the crossbar bus. Each set of the sets of XUs comprises a plurality of XUs arranged in series and configured to process a bundle of instructions received at each set from the feedback engine.

Type: Application

Filed: September 8, 2014

Publication date: March 12, 2015

Inventors: Yiqun Ge, Wuxian Shi, Qifan Zhang, Tao Huang, Wen Tong
Apparatus and method for early issue and recovery for a conditional load instruction having multiple outcomes

Patent number: 8977837

Abstract: At least one instruction of a sequence of program instructions has a plurality of alternative outcomes including at least a first outcome that is independent of at least one operand and a second outcome that is dependent on the at least one operand. The at least one operand is a value generated by a preceding instruction in the sequence. The at least one instruction is issued for execution independently of when the at least one operand is generated by the preceding instruction. Recovery circuitry is provided to perform a recovery operation in the event that the second outcome is to be executed for the at least one instruction and the at least one operand has not been generated by the preceding instruction when the at least one instruction is to be executed by said instruction execution circuitry.

Type: Grant

Filed: May 27, 2009

Date of Patent: March 10, 2015

Assignee: ARM Limited

Inventors: Robert Gregory McDonald, Paul Gilbert Meyer
Code section optimization by removing memory barrier instruction and enclosing within a transaction that employs hardware transaction memory

Patent number: 8972704

Abstract: A code section of a computer program to be executed by a computing device includes memory barrier instructions. Where the code section satisfies a threshold, the code section is modified, by enclosing the code section within a transaction that employs hardware transactional memory of the computing device, and removing the memory barrier instructions from the code section. Execution of the code section as has been enclosed within the transaction can be monitored to yield monitoring results. Where the monitoring results satisfy an abort threshold corresponding to excessive aborting of the execution of the code section as has been enclosed within the transaction, the code section is split into code sub-sections, and each code sub-section enclosed within a separate transaction that employs the hardware transactional memory. Splitting the code section into code sub-sections and enclosing each code sub-section within a separate transaction can decrease occurrence of the code section aborting during execution.

Type: Grant

Filed: December 15, 2011

Date of Patent: March 3, 2015

Assignee: International Business Machines Corporation

Inventors: Toshihiko Koju, Takuya Nakaike, Ali Ijaz Sheikh, Harold Wade Cain, III, Maged M. Michael
Systems and methods for handling instructions of in-order and out-of-order execution queues

Patent number: 8966229

Abstract: Processing systems and methods are disclosed that can include an instruction unit which provides instructions for execution by the processor; a decode/issue unit which decodes instructions received from the instruction unit and issues the instructions; and a plurality of execution queues coupled to the decode/issue unit, wherein each issued instruction from the decode/issue unit can be stored into an entry of at least one queue of the plurality of execution queues. The plurality of queues can comprise an independent execution queue, a dependent execution queue, and a plurality of execution units coupled to receive instructions for execution from the plurality of execution queues. The plurality of execution units can comprise a first execution unit, coupled to receive instructions from the dependent execution queue and the independent execution queue which have been selected for execution.

Type: Grant

Filed: August 18, 2011

Date of Patent: February 24, 2015

Assignee: Freescale Semiconductor, Inc.

Inventors: Thang M. Tran, Trinh Huy H. Nguyen
MFENCE and LFENCE micro-architectural implementation method and system

Patent number: 8959314

Abstract: A system and method for fencing memory accesses. Memory loads can be fenced, or all memory access can be fenced. The system receives a fencing instruction that separates memory access instructions into older accesses and newer accesses. A buffer within the memory ordering unit is allocated to the instruction. The access instructions newer than the fencing instruction are stalled. The older access instructions are gradually retired. When all older memory accesses are retired, the fencing instruction is dispatched from the buffer.

Type: Grant

Filed: July 15, 2013

Date of Patent: February 17, 2015

Assignee: Intel Corporation

Inventors: Salvador Palanca, Stephen Fischer, Subramaniam Maiyuran, Shekoufeh Qawami
Processor with cycle offsets and delay lines to allow scheduling of instructions through time

Patent number: 8954714

Abstract: An apparatus includes a processor. The processor includes two memories. The first memory stores one set of instructions. The second memory stores another set of instructions that are longer than the set of instructions in the first memory. An instruction in the set of instructions in the first memory is used as a pointer to a corresponding instruction in the set of instructions in the second memory.

Type: Grant

Filed: February 1, 2010

Date of Patent: February 10, 2015

Assignee: Altera Corporation

Inventor: Steven Perry
HANDLING PRECOMPILED BINARIES IN A HARDWARE ACCELERATED SOFTWARE TRANSACTIONAL MEMORY SYSTEM

Publication number: 20150040111

Abstract: A method and apparatus for enabling a Software Transactional Memory (STM) with precompiled binaries is herein described. Upon encountering an access operation in a transaction, an annotation field associated with a memory location referenced by the access is checked. In response to the memory location representing a previous similar access within the transaction, the access is performed without access barriers. However, if the annotation field is in a default state representing no previous access during a pendency of the transaction, then a mode of the processor is determined. If the processor mode is in implicit mode, an access handler/barrier is asynchronously executed. Conversely, in an explicit mode, a flag is set instead of asynchronously executing the handler. In addition, during compilation, convert explicit and convert implicit instructions are inserted to intelligently convert modes for precompiled and newly compiled binaries.

Type: Application

Filed: May 6, 2014

Publication date: February 5, 2015

Inventors: Bratin Saha, Ali-Reza Adl-Tabatabai, Quinn A. Jacobson
Threshold controlled limited out of order load execution

Patent number: 8949581

Abstract: A load scheduler capable of limited issuing of out of order load instructions is disclosed. The load scheduler uses a max skipping threshold which limits the number of skipping load instructions and a max skipped threshold which limits the number of skipped load instructions. An address tag for a skipping instruction is stored in a skipping load instruction tracking unit when a skipping load instruction is issued. When a skipped load instruction issues, the address tag of the skipped load instruction is compared to the address tag of the skipping instruction to determine whether the out of order issuing of the skipping load instruction caused a hazard that must be flushed.

Type: Grant

Filed: May 9, 2011

Date of Patent: February 3, 2015

Assignee: Applied Micro Circuits Corporation

Inventors: Matthew W. Ashcraft, John Gregory Favor
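
The two thresholds and the address-tag comparison in the abstract above can be sketched as follows. This is a loose illustration under stated assumptions, not the patented scheduler: the class `SkipTracker`, its method names, and the way `older_waiting` is counted against the max skipped threshold are all hypothetical.

```python
class SkipTracker:
    """Tracks loads that issued ahead of older loads (hypothetical model)."""

    def __init__(self, max_skipping, max_skipped):
        self.max_skipping = max_skipping  # cap on loads that may jump ahead
        self.max_skipped = max_skipped    # cap on older loads being bypassed
        self.skipping_tags = []           # address tags of loads that jumped ahead

    def try_skip(self, address_tag, older_waiting):
        """Let a younger load issue early only if both thresholds allow it."""
        if len(self.skipping_tags) >= self.max_skipping:
            return False
        if older_waiting > self.max_skipped:
            return False
        self.skipping_tags.append(address_tag)
        return True

    def issue_skipped(self, address_tag):
        """Issue an older, skipped load; True means a hazard that needs a flush."""
        return address_tag in self.skipping_tags


tracker = SkipTracker(max_skipping=2, max_skipped=3)
first = tracker.try_skip(0x40, older_waiting=1)
second = tracker.try_skip(0x80, older_waiting=1)
third = tracker.try_skip(0xC0, older_waiting=1)  # refused: skipping limit reached
hazard = tracker.issue_skipped(0x80)             # tag matches a skipping load
clean = tracker.issue_skipped(0x100)             # no matching tag, no flush
```
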
HYBRID TAG SCHEDULER

Publication number: 20150026436

Abstract: The present invention provides a method and apparatus for scheduling based on tags of different types. Some embodiments of the method include broadcasting a first tag to entries in a queue of a scheduler. The first tag is broadcast in response to a first instruction associated with a first entry in the queue being picked for execution. The first tag includes information identifying the first entry and information indicating a type of the first tag. Some embodiments of the method also include marking at least one second entry in the queue as ready to be picked for execution in response to at least one second tag associated with the at least one second entry in the queue matching the first tag.

Type: Application

Filed: July 17, 2013

Publication date: January 22, 2015

Applicant: Advanced Micro Devices, Inc.

Inventors: Michael Achenbach, Teik Tan, Gregory W. Smaus, Ganesh Venkataramanan, Emil Talpes
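
A minimal sketch of the tag broadcast and wakeup step described in the abstract above, written in Python. The `Tag` and `Entry` types, the `kind` strings, and the `broadcast` function are assumptions for illustration only; the abstract does not specify how tags are encoded.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Tag:
    entry: int  # identifies the queue entry that was picked
    kind: str   # type of the tag (hypothetical labels such as "int" or "fp")


@dataclass
class Entry:
    index: int
    waits_on: set = field(default_factory=set)  # source tags still outstanding
    ready: bool = False


def broadcast(queue, tag):
    """Broadcast `tag` to every entry and return the indices that became ready."""
    woken = []
    for e in queue:
        # A match requires both the entry identifier and the tag type to agree.
        if tag in e.waits_on:
            e.waits_on.discard(tag)
            if not e.waits_on:
                e.ready = True
                woken.append(e.index)
    return woken


queue = [
    Entry(0),
    Entry(1, {Tag(0, "int")}),
    Entry(2, {Tag(0, "int"), Tag(1, "fp")}),
    Entry(3, {Tag(0, "fp")}),
]
woken_first = broadcast(queue, Tag(0, "int"))
woken_second = broadcast(queue, Tag(1, "fp"))
```

In this sketch entry 3 stays asleep after the first broadcast because its tag has the same entry identifier but a different type.
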
Apparatus and method for heterogeneous chip multiprocessors via resource allocation and restriction

Patent number: 8924690

Abstract: A method and apparatus for heterogeneous chip multiprocessors (CMP) via resource restriction. In one embodiment, the method includes the accessing of a resource utilization register to identify a resource utilization policy. Once accessed, a processor controller ensures that the processor core utilizes a shared resource in a manner specified by the resource utilization policy. In one embodiment, each processor core within a CMP includes an instruction issue throttle resource utilization register, an instruction fetch throttle resource utilization register and other like ways of restricting its utilization of shared resources within a minimum and maximum utilization level. In one embodiment, resource restriction provides a flexible manner for allocating current and power resources to processor cores of a CMP that can be controlled by hardware or software. Other embodiments are described and claimed.

Type: Grant

Filed: May 29, 2012

Date of Patent: December 30, 2014

Assignee: Intel Corporation

Inventors: Tryggve Fossum, George Chrysos, Todd A. Dutton
Speculative scheduling of memory instructions in out-of-order processor based on addressing mode comparison

Patent number: 8918625

Abstract: A processor that executes instructions out of program order is described. In some implementations, a processor detects whether a second memory operation is dependent on a first memory operation prior to memory address calculation. If the processor detects that the second memory operation is not dependent on the first memory operation, the processor is configured to allow the second memory operation to be scheduled. If the processor detects that the second memory operation is dependent on the first memory operation, the processor is configured to prevent the second memory operation from being scheduled until the first memory operation has been scheduled to reduce the likelihood of having to reexecute the second memory operation.

Type: Grant

Filed: November 15, 2011

Date of Patent: December 23, 2014

Assignee: Marvell International Ltd.

Inventors: R. Frank O'Bleness, Sujat Jamil, Tom Hameenanttila
ACCELERATED REVERSAL OF SPECULATIVE STATE CHANGES AND RESOURCE RECOVERY

Publication number: 20140372732

Abstract: A method includes undoing, in reverse program order, changes in a state of a processing device caused by speculative instructions previously dispatched for execution in the processing device and concurrently deallocating resources previously allocated to the speculative instructions in response to interruption of dispatch of instructions due to a flush of the speculative instructions. A processor device comprises a retire queue to store entries for instructions that are awaiting retirement and a finite state machine. The finite state machine is to interrupt dispatch of instructions in response to a flush of speculative instructions previously dispatched for execution in the processing device and to undo, in reverse program order, changes in a state of the processing device caused by the speculative instructions while concurrently deallocating resources previously allocated to the speculative instructions.

Type: Application

Filed: June 14, 2013

Publication date: December 18, 2014

Inventors: Jay Fleischman, Michael Estlick
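
The reverse-order undo described in the abstract above can be sketched with a simple register-rename model in Python. The rename map, the free list, and the `RetireEntry` fields are assumptions introduced for illustration; the abstract does not say which state or resources are involved.

```python
from collections import namedtuple

# Hypothetical retire-queue record: which architectural register was renamed,
# the mapping it had before, and the physical register allocated to it.
RetireEntry = namedtuple("RetireEntry", "arch_reg prev_phys new_phys")


def flush(rename_map, free_list, retire_queue, keep):
    """Undo speculative entries beyond `keep`, youngest first."""
    while len(retire_queue) > keep:
        entry = retire_queue.pop()                       # youngest entry
        rename_map[entry.arch_reg] = entry.prev_phys     # undo the state change
        free_list.append(entry.new_phys)                 # release in the same step


rename_map = {"r1": 30}
retire_queue = [RetireEntry("r1", 10, 20), RetireEntry("r1", 20, 30)]
free_list = []
flush(rename_map, free_list, retire_queue, keep=0)
```

Walking youngest-first matters here: undoing the two entries in program order would leave `r1` mapped to 20 rather than restoring the original mapping of 10.
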
TECHNIQUES FOR SCHEDULING OPERATIONS AT AN INSTRUCTION PIPELINE

Publication number: 20140351562

Abstract: A dispatch stage of a processor core dispatches designated operations (e.g. load/store operations) to a temporary queue when the resources to execute the designated operations are not available. Once the resources become available to execute an operation at the temporary queue, the operation is transferred to a scheduler queue where it can be picked for execution. By dispatching the designated operations to the temporary queue, other operations behind the designated operations in a program order are made available for dispatch to the scheduler queue, thereby improving instruction throughput at the processor core.

Type: Application

Filed: May 23, 2013

Publication date: November 27, 2014

Applicant: Advanced Micro Devices, Inc.

Inventor: Francesco Spadini
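
The temporary-queue idea in the abstract above can be sketched as follows. This is an illustrative Python model under assumptions, not the described hardware: the `Dispatcher` class, the single `free_slots` counter standing in for execution resources, and the method names are all hypothetical.

```python
from collections import deque


class Dispatcher:
    """Toy dispatch stage with a temporary queue (hypothetical model)."""

    def __init__(self, free_slots):
        self.free_slots = free_slots  # stand-in for load/store resources
        self.temp_queue = deque()     # designated ops waiting for a resource
        self.scheduler_queue = []     # ops that can be picked for execution

    def dispatch(self, op, needs_slot):
        if needs_slot and self.free_slots == 0:
            # Park the op instead of stalling everything behind it.
            self.temp_queue.append(op)
            return
        if needs_slot:
            self.free_slots -= 1
        self.scheduler_queue.append(op)

    def release_slot(self):
        """A resource frees up; move the oldest parked op to the scheduler."""
        self.free_slots += 1
        if self.temp_queue:
            self.free_slots -= 1
            self.scheduler_queue.append(self.temp_queue.popleft())


d = Dispatcher(free_slots=1)
d.dispatch("load_a", needs_slot=True)
d.dispatch("load_b", needs_slot=True)   # no slot left: goes to the temporary queue
d.dispatch("add", needs_slot=False)     # still dispatches past the parked load
d.release_slot()
```
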
SINGLE-CYCLE INSTRUCTION PIPELINE SCHEDULING

Publication number: 20140325187

Abstract: A method includes allocating a first single-cycle instruction to a first pipeline that picks single-cycle instructions for execution in program order. The method further includes marking at least one source register of the first single-cycle instruction as ready for execution in the first pipeline in response to all older single-cycle instructions allocated to the first pipeline being ready and eligible to be picked for execution. An apparatus includes a decoder to decode a first single-cycle instruction and to allocate the first single-cycle instruction to a first pipeline. The apparatus further includes a scheduler to pick single-cycle instructions for execution by the first pipeline in program order and to mark at least one source register of the first single-cycle instruction as ready for execution in the first pipeline in response to determining that all older single-cycle instructions allocated to the first pipeline are ready and eligible.

Type: Application

Filed: April 24, 2013

Publication date: October 30, 2014

Applicant: Advanced Micro Devices, Inc.

Inventors: Michael D. Estlick, Jay E. Fleischman, Kevin A. Hurd, Mark M. Gibson, Kelvin D. Goveas, Brian M. Lay
Mitigating lookahead branch prediction latency by purposely stalling a branch instruction until a delayed branch prediction is received or a timeout occurs

Patent number: 8874885

Abstract: Embodiments relate to mitigation of lookahead branch prediction latency. An aspect includes receiving an instruction address in an instruction cache for fetching instructions in a microprocessor pipeline. Another aspect includes receiving the instruction address in a branch presence predictor coupled to the microprocessor pipeline. Another aspect includes determining, by the branch presence predictor, presence of a branch instruction in the instructions being fetched, wherein the branch instruction is predictable by the branch target buffer, and any indication of the instruction address not written to the branch target buffer is also not written to the branch presence predictor. Another aspect includes, based on receipt of an indication that the branch instruction is present from the branch presence predictor, holding the branch instruction.

Type: Grant

Filed: February 12, 2008

Date of Patent: October 28, 2014

Assignee: International Business Machines Corporation

Inventors: James J. Bonanno, David S. Hutton, Brian R. Prasky, Anthony Saporito
ALLOCATING STORE QUEUE ENTRIES TO STORE INSTRUCTIONS FOR EARLY STORE-TO-LOAD FORWARDING

Publication number: 20140310506

Abstract: The present invention provides a method and apparatus for allocating store queue entries to store instructions for early store-to-load forwarding. Some embodiments of the method include allocating an entry in a store queue to a store instruction in response to the store instruction being dispatched and prior to receiving a translation of a virtual address to a physical address associated with the store instruction. The entry includes storage for data to be written to the physical address by the store instruction.

Type: Application

Filed: April 11, 2013

Publication date: October 16, 2014

Applicant: Advanced Micro Devices, Inc.

Inventors: David A Kaplan, Daniel Hopper, Tarun Nakra
METHOD AND APPARATUS FOR PROVIDING AN INTERFACE BETWEEN A UICC AND A PROCESSOR IN AN ACCESS TERMINAL THAT SUPPORTS ASYNCHRONOUS COMMAND PROCESSING BY THE UICC

Publication number: 20140289500

Abstract: Techniques for providing an interface between a UICC and a processor, included in an access terminal, that supports asynchronous command processing by the UICC, are described. A first complex command, with a first processing time, may be received from the processor. An initial response to the first command, including a token, may be sent to the processor. The first command may be processed for the first processing time. At least one additional command, having a processing time shorter than the first processing time, may be received from the processor. Processing of the first command may be completed. Processing of a current one of the at least one additional command, which was being processed before, during, or after completion of the processing of the first command, may be completed. A response to the current one of the at least one additional command, including the token, may be sent to the processor.

Type: Application

Filed: September 19, 2013

Publication date: September 25, 2014

Applicant: QUALCOMM Incorporated

Inventors: Michele BERIONNE, Jose Alfredo Ruvalcaba, Younghwan Kang, Nicholas Matthias Beckmann
PROCESSOR WITH HYBRID PIPELINE CAPABLE OF OPERATING IN OUT-OF-ORDER AND IN-ORDER MODES

Publication number: 20140281402

Abstract: A method and circuit arrangement provide support for a hybrid pipeline that dynamically switches between out-of-order and in-order modes. The hybrid pipeline may selectively execute instructions from at least one instruction stream that require the high performance capabilities provided by out-of-order processing in the out-of-order mode. The hybrid pipeline may also execute instructions that have strict power requirements in the in-order mode where the in-order mode conserves more power compared to the out-of-order mode. Each stage in the hybrid pipeline may be activated and fully functional when the hybrid pipeline is in the out-of-order mode. However, stages in the hybrid pipeline not used for the in-order mode may be deactivated and bypassed by the instructions when the hybrid pipeline dynamically switches from the out-of-order mode to the in-order mode. The deactivated stages may then be reactivated when the hybrid pipeline dynamically switches from the in-order mode to the out-of-order mode.

Type: Application

Filed: March 13, 2013

Publication date: September 18, 2014

Applicant: International Business Machines Corporation

Inventors: Miguel Comparan, Andrew D. Hilton, Hans M. Jacobson, Brian M. Rogers, Robert A. Shearer, Ken V. Vu, Alfred T. Watson, III
CHAINING BETWEEN EXPOSED VECTOR PIPELINES

Publication number: 20140281403

Abstract: Embodiments include a method for chaining data in an exposed-pipeline processing element. The method includes separating a multiple instruction word into a first sub-instruction and a second sub-instruction, receiving the first sub-instruction and the second sub-instruction in the exposed-pipeline processing element. The method also includes issuing the first sub-instruction at a first time, issuing the second sub-instruction at a second time different than the first time, the second time being offset to account for a dependency of the second sub-instruction on a first result from the first sub-instruction, the first pipeline performing the first sub-instruction at a first clock cycle and communicating the first result from performing the first sub-instruction to a chaining bus coupled to the first pipeline and a second pipeline, the communicating at a second clock cycle subsequent to the first clock cycle that corresponds to a total number of latch pipeline stages in the first pipeline.

Type: Application

Filed: August 14, 2013

Publication date: September 18, 2014

Applicant: International Business Machines Corporation

Inventors: Thomas W. Fox, Bruce M. Fleischer, Hans M. Jacobson, Ravi Nair
Evict on write, a management strategy for a prefetch unit and/or first level cache in a multiprocessor system with speculative execution

Patent number: 8838906

Abstract: In a multiprocessor system with at least two levels of cache, a speculative thread may run on a core processor in parallel with other threads. When the thread seeks to do a write to main memory, this access is to be written through the first level cache to the second level cache. After the write-through, the corresponding line is deleted from the first level cache and/or prefetch unit, so that any further accesses to the same location in main memory have to be retrieved from the second level cache. The second level cache keeps track of multiple versions of data, where more than one speculative thread is running in parallel, while the first level cache does not have any of the versions during speculation. A switch allows choosing between modes of operation of a speculation blind first level cache.

Type: Grant

Filed: January 4, 2011

Date of Patent: September 16, 2014

Assignee: International Business Machines Corporation

Inventors: Alan Gara, Martin Ohmacht
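
The evict-on-write behavior in the abstract above can be sketched with two dictionaries standing in for the cache levels. This Python model is an assumption-laden illustration: the class `TwoLevelCache` and its methods are hypothetical, and it omits the multi-version tracking that the abstract assigns to the second level cache.

```python
class TwoLevelCache:
    """Toy model of evict on write (hypothetical; single version per address)."""

    def __init__(self):
        self.l1 = {}  # first level cache: holds no speculative versions
        self.l2 = {}  # second level cache: receives every speculative write

    def read(self, addr):
        # On an L1 miss, fetch the line from L2 and keep a copy in L1.
        if addr not in self.l1:
            self.l1[addr] = self.l2.get(addr)
        return self.l1[addr]

    def speculative_write(self, addr, value):
        # Write through to L2, then delete the line from L1 so the next
        # access to this address has to be served by L2.
        self.l2[addr] = value
        self.l1.pop(addr, None)


cache = TwoLevelCache()
cache.l2[0x10] = 1
before = cache.read(0x10)           # fills L1 from L2
cache.speculative_write(0x10, 2)
evicted = 0x10 not in cache.l1      # line was removed from L1 by the write
after = cache.read(0x10)            # refetched from L2
```
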
