Instruction Issuing Patents (Class 712/214)
  • Patent number: 9292419
    Abstract: A device receives code for a technical computing environment, and receives conditions for executing the code. The device performs a static analysis of the code, based on the conditions, to generate static analysis information for the code, and executes the code in the technical computing environment based on the conditions. The device determines coverage information associated with the executing code, where the coverage information provides a measure of completeness associated with the executing code. The device compares the static analysis information and the coverage information to determine confidence information associated with the coverage information, and outputs or stores the coverage information and the confidence information.
    Type: Grant
    Filed: May 29, 2014
    Date of Patent: March 22, 2016
    Assignee: The MathWorks, Inc.
    Inventors: Kiran K. Kintali, Anand Krishnamoorthi, Ebrahim Mestchian, Richard M. McKeever
  • Patent number: 9286075
    Abstract: Systems and methods for efficient out-of-order dynamic deallocation of entries within a shared storage resource in a processor. A processor comprises a unified pick queue that includes an array configured to dynamically allocate any entry of a plurality of entries for a decoded and renamed instruction. This instruction may correspond to any available active threads supported by the processor. The processor includes circuitry configured to determine whether an instruction corresponding to an allocated entry of the plurality of entries is dependent on a speculative instruction and whether the instruction has a fixed instruction execution latency. In response to determining the instruction is not dependent on a speculative instruction, the instruction has a fixed instruction execution latency, and said latency has transpired, the circuitry may deallocate the instruction from the allocated entry.
    Type: Grant
    Filed: September 30, 2009
    Date of Patent: March 15, 2016
    Assignee: Oracle America, Inc.
    Inventors: Matthew B. Smittle, Robert T. Golla
  • Patent number: 9286078
    Abstract: In an apparatus which includes a plurality of processing modules connected via a ring-shape bus, if a plurality pieces of pipeline processing to be processed in a different order is allocated to a plurality of processing modules, the transfer efficiency may decrease when an amount of data transferred from one of the processing modules to a post-stage module exceeds a processing capacity of the post-stage module. Accordingly, a module positioned on the preceding side in the pipeline processing controls a transmission interval of processed data so that the post-stage module can receive the data processed by the preceding module.
    Type: Grant
    Filed: April 28, 2014
    Date of Patent: March 15, 2016
    Assignee: CANON KABUSHIKI KAISHA
    Inventors: Hiroyasu Watanabe, Hirowo Inoue, Hisashi Ishikawa
  • Patent number: 9268575
    Abstract: Methods and apparatuses are provided for flush operations in a processor. The apparatus comprises an out-of-order execution unit for processing instructions issued in-order from an instruction decoder for first and second threads and being configured to identify an errored instruction in a first thread. A retire unit includes a retire queue for receiving completed instructions from the out-of-order execution unit, the retire unit being configured retire older in-order first thread instructions until the errored instruction would be the next instruction to be retired, and then flushing the errored instruction and all later in-order first thread instructions from the retire queue. The method comprises determining that an errored instruction is being processed by an out-of-order execution unit of a processor and continuing to process to completion instructions earlier in-order from the errored instruction until the completion of the errored instruction.
    Type: Grant
    Filed: June 30, 2011
    Date of Patent: February 23, 2016
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Jay E. Fleischman, Emil Talpes, Debjit DasSarma
  • Patent number: 9262160
    Abstract: Load latency speculation in an out-of-order computer processor, including: issuing a load instruction for execution, wherein the load instruction has a predetermined expected execution latency; issuing a dependent instruction wakeup signal on an instruction wakeup bus, wherein the dependent instruction wakeup signal indicates that the load instruction will be completed upon the expiration of the expected execution latency; determining, upon the expiration of the expected execution latency, whether the load instruction has completed; and responsive to determining that the load instruction has not completed upon the expiration of the expected execution latency, issuing a negative dependent instruction wakeup signal on the instruction wakeup bus, wherein the negative dependent instruction wakeup signal indicates that the load instruction has not completed upon the expiration of the expected execution latency.
    Type: Grant
    Filed: March 5, 2013
    Date of Patent: February 16, 2016
    Assignee: International Business Machines Corporation
    Inventors: Timothy H. Heil, Andrew D. Hilton, Adam J. Muff
  • Patent number: 9256428
    Abstract: Load latency speculation in an out-of-order computer processor, including: issuing a load instruction for execution, wherein the load instruction has a predetermined expected execution latency; issuing a dependent instruction wakeup signal on an instruction wakeup bus, wherein the dependent instruction wakeup signal indicates that the load instruction will be completed upon the expiration of the expected execution latency; determining, upon the expiration of the expected execution latency, whether the load instruction has completed; and responsive to determining that the load instruction has not completed upon the expiration of the expected execution latency, issuing a negative dependent instruction wakeup signal on the instruction wakeup bus, wherein the negative dependent instruction wakeup signal indicates that the load instruction has not completed upon the expiration of the expected execution latency.
    Type: Grant
    Filed: February 6, 2013
    Date of Patent: February 9, 2016
    Assignee: International Business Machines Corporation
    Inventors: Timothy H. Heil, Andrew D. Hilton, Adam J. Muff
  • Patent number: 9218199
    Abstract: Embodiments relate to a method, apparatus and program product and for capturing thread specific state timing information. The method includes associating a time field and a time valid field to a thread data structure and setting a current time state by determining a previous time state and updating it according to a previously identified method for setting time states. The method further includes determining status of a time valid bit to see if it is set to valid or invalid. When the status is valid, it is made available for reporting.
    Type: Grant
    Filed: November 23, 2012
    Date of Patent: December 22, 2015
    Assignee: International Business Machines Corporation
    Inventors: Michael H. Dawson, Trent A. Gray-Donald
  • Patent number: 9213608
    Abstract: A computer system includes a simultaneous multi-threading processor and memory in operable communication with the processor. The processor is configured to perform a method including running multiple threads simultaneously, detecting a hardware error in one or more hardware structures of the processing circuit, and identifying one or more victim threads of the multiple threads. The processor is further configured to identify a plurality of hardware structures associated with execution of the one or more victim threads, isolate the one or more victim threads from the rest of the multiple threads by preventing access to the plurality of hardware structures by the multiple threads, flush the one or more victim threads by resetting hardware states of the plurality of hardware structures, and restore the one or more victim threads by restoring the plurality of hardware structures to a known safe state.
    Type: Grant
    Filed: March 8, 2013
    Date of Patent: December 15, 2015
    Assignee: International Business Machines Corporation
    Inventors: Fadi Y. Busaba, Steven R. Carlough, Christopher A. Krygowski, Brian R. Prasky, Chung-Lung K. Shum
  • Patent number: 9192243
    Abstract: The invention utilizes a number of thin, flexible layers placed in parallel to each other and the back wall of a window well. The inner layer is translucent such that when lit, the lighting shines through the layer and illuminates the middle and outer layers or shines through all the layers and through the window well and into the room. Alternatively, the inner layer is reflective so that it reflects light directed towards it into the other layers. The various layers can be variably transparent, translucent and/or opaque and can be shaded. By spacing the layers apart, depth can be imparted to a viewer. The outermost layer can have the appearance of a metallic film and/or artistic representations. This outer film can be stenciled such that patterns of light shine through in various places.
    Type: Grant
    Filed: May 17, 2012
    Date of Patent: November 24, 2015
    Inventor: Sharon Slaughter
  • Patent number: 9158694
    Abstract: A method, information processing device, and computer program product mitigate busy time in a hierarchical store-through memory cache structure. A cache directory associated with a memory cache is divided into a plurality of portions each associated with a portion memory cache. Simultaneous cache lookup operations and cache write operations between the plurality of portions of the cache directory are supported. Two or more store commands are simultaneously processed in a shared cache pipeline communicatively coupled to the plurality of portions of the cache directory.
    Type: Grant
    Filed: October 31, 2012
    Date of Patent: October 13, 2015
    Assignee: International Business Machines Corporation
    Inventors: Deanna P. Berger, Michael F. Fee, Christine C. Jones, Arthur J. O'Neill, Diana L. Orf, Robert J. Sonnelitter, III
  • Patent number: 9152932
    Abstract: A system may create work units, each work unit including at least one of an input port or output port, each work unit configured to modify data that is received via the input port. In addition, the system may compose a workflow by connecting an output port of a first of the work units to an input port of a second of the work units, receive a work order, select the workflow in response to the work order, decompose the workflow into constituent work units, instantiate tasks that correspond to the constituent work units, and execute a work unit process for each of the tasks.
    Type: Grant
    Filed: December 17, 2010
    Date of Patent: October 6, 2015
    Assignee: VERIZON PATENT AND LICENSING INC.
    Inventors: Mohammad Reza Shafiee, Hongfang Li, Wei Liu, Anurag Gupta, Ashutosh K. Sureka, Satya S. Raju
  • Patent number: 9152510
    Abstract: A computer system includes a simultaneous multi-threading processor and memory in operable communication with the processor. The processor is configured to perform a method including running multiple threads simultaneously, detecting a hardware error in one or more hardware structures of the processing circuit, and identifying one or more victim threads of the multiple threads. The processor is further configured to identify a plurality of hardware structures associated with execution of the one or more victim threads, isolate the one or more victim threads from the rest of the multiple threads by preventing access to the plurality of hardware structures by the multiple threads, flush the one or more victim threads by resetting hardware states of the plurality of hardware structures, and restore the one or more victim threads by restoring the plurality of hardware structures to a known safe state.
    Type: Grant
    Filed: July 13, 2012
    Date of Patent: October 6, 2015
    Assignee: International Business Machines Corporation
    Inventors: Fadi Y. Busaba, Steven R. Carlough, Christopher A. Krygowski, Brian R. Prasky, Chung-Lung K. Shum
  • Patent number: 9146741
    Abstract: Eliminating redundant masking operations in instruction processing circuits and related processor systems, methods, and computer-readable media are disclosed. In one embodiment, a first instruction in an instruction stream indicating an operation writing a value to a first register is detected by an instruction processing circuit, the value having a value size less than a size of the first register. The circuit also detects a second instruction in the instruction stream indicating a masking operation on the first register. The masking operation is eliminated upon a determination that the masking operation indicates a read operation and a write operation on the first register and has an identity mask size equal to or greater than the value size. In this manner, the elimination of the masking operation avoids potential read-after-write hazards and improves performance of a CPU by removing redundant operations from an execution pipeline.
    Type: Grant
    Filed: October 19, 2012
    Date of Patent: September 29, 2015
    Assignee: QUALCOMM Incorporated
    Inventors: Melinda J. Brown, Michael William Morrow, James Norris Dieffenderfer, Brian Michael Stempel, Michael Scott McIlvaine
  • Patent number: 9141391
    Abstract: In a processor having an instruction unit, a decode/issue unit, and execution queues configured to provide instructions to correspondingly different types execution units, a method comprises maintaining a duplicate free list for the execution queues. The duplicate free list includes a plurality of duplicate dependent instruction indicators that indicate when a duplicate instruction for a dependent instruction is stored in at least one of the execution queues. One of the duplicate dependent instruction indicators is assigned to an execution queue for a dependent instruction. The dependent instruction is executed only when the one of the duplicate dependent instruction indicators is reset.
    Type: Grant
    Filed: March 14, 2012
    Date of Patent: September 22, 2015
    Assignee: Freescale Semiconductor, Inc.
    Inventors: Thang M. Tran, Trinh Huy Nguyen
  • Patent number: 9116817
    Abstract: A system and method for efficient scheduling of dependent load instructions. A processor includes both an execution core and a scheduler that issues instructions to the execution core. The execution core includes a load-store unit (LSU). The scheduler determines a first condition is satisfied, wherein the first condition comprises result data for a first load instruction is predicted eligible for LSU-internal forwarding. The scheduler determines a second condition is satisfied, wherein the second condition comprises a second load instruction younger in program order than the first load instruction is dependent on the first load instruction. In response to each of the first condition and the second condition being satisfied, the scheduler can issue the second load instruction earlier than it otherwise would. The LSU internally forwards the received result data from the first load instruction to address generation logic for the second load instruction.
    Type: Grant
    Filed: May 9, 2013
    Date of Patent: August 25, 2015
    Assignee: Apple Inc.
    Inventor: Stephan G. Meier
  • Patent number: 9110656
    Abstract: A processor configured to provide instructions of a first instruction type to a first execution unit, and a second execution queue configured to provide instructions of a second instruction type to a second execution unit. A first instruction of the second instruction type is received. The first instruction is decoded by the decode/issue unit to determine operands of the first instruction. The operands of the first instruction are determined to include a dependency on a second instruction of the first instruction type stored in a first entry of the first execution queue. The first instruction is stored in a first entry of the second execution queue. A synchronization indicator corresponding to the first instruction in a second entry of the first execution queue is set immediately adjacent the first entry of the first execution queue, which indicates that the first instruction is stored in another execution queue.
    Type: Grant
    Filed: August 16, 2011
    Date of Patent: August 18, 2015
    Assignee: Freescale Semiconductor, Inc.
    Inventors: Thang M. Tran, Trinh Huy H. Nguyen
  • Patent number: 9098265
    Abstract: A data processing apparatus includes a register bank having a plurality of registers for storing vectors being processed; a pipelined processor for processing the stream of vector instructions; the pipelined processor comprising circuitry configured to detect data dependencies for the vectors processed by the stream of vector instructions and stored in the plurality of registers and to determine constraints on timing of execution for the vector instructions such that no register data hazards arise. Register data hazards arise where two accesses to a same register, at least one of said accesses being a write, occur in an order different to an order of said instruction stream such that an access occurring later in said instruction stream starts before an access occurring earlier in said instruction stream has completed. The pipelined processor includes data element hazard determination circuitry.
    Type: Grant
    Filed: July 11, 2012
    Date of Patent: August 4, 2015
    Assignee: ARM Limited
    Inventor: Alastair David Reid
  • Patent number: 9098268
    Abstract: A system and method for fencing memory accesses. Memory loads can be fenced, or all memory access can be fenced. The system receives a fencing instruction that separates memory access instructions into older accesses and newer accesses. A buffer within the memory ordering unit is allocated to the instruction. The access instructions newer than the fencing instruction are stalled. The older access instructions are gradually retired. When all older memory accesses are retired, the fencing instruction is dispatched from the buffer.
    Type: Grant
    Filed: September 14, 2012
    Date of Patent: August 4, 2015
    Assignee: Intel Corporation
    Inventors: Salvador Palanca, Stephen A. Fischer, Subramaniam Maiyuran, Shekoufeh Qawami
  • Patent number: 9043581
    Abstract: A processing apparatus includes an execution unit which performs computation on two operand inputs each being selectable between read data from a register and an immediate value. The processing apparatus also includes another execution unit which performs computation on two operand inputs, one of which is selectable between read data from a register and an immediate value, and the other of which is an immediate value. A control unit determines, based on a received instruction specifying a computation on two operands, whether each of the two operands specifies read data from a register or an immediate value. Depending on the determination result, the control unit causes one of the execution units to execute the computation specified by the received instruction.
    Type: Grant
    Filed: November 14, 2011
    Date of Patent: May 26, 2015
    Assignee: FUJITSU LIMITED
    Inventor: Masaki Ukai
  • Patent number: 9043582
    Abstract: Systems and methods for static code scheduling are disclosed. A method can include receiving an intermediate representation of source code, building a directed acyclic graph (DAG) for the intermediate representation, and creating chains of dependent instructions from the DAG for cluster formation. The chains are merged into clusters and each node in the DAG is marked with an identifier of a cluster it is part of to generate a marked instruction DAG. Instruction DAG scheduling is then performed using information about the clusters to generate an ordered intermediate representation of the source code.
    Type: Grant
    Filed: September 14, 2012
    Date of Patent: May 26, 2015
    Assignee: Qualcomm Innovation Center, Inc.
    Inventor: Sergei Larin
  • Patent number: 9032188
    Abstract: A multi-threaded in-order superscalar processor 2 includes an issue stage 12 including issue circuitry 22, 24 for selecting instructions to be issued to execution units 14, 16 in dependence upon a currently selected issue policy. A plurality of different issue policies are provided by associated different policy circuitry 28, 30, 32 and a selection between which of these instances of the policy circuitry 28, 30, 32 is active is made by policy selecting circuitry 34 in dependence upon detected dynamic behavior of the processor 2.
    Type: Grant
    Filed: March 27, 2008
    Date of Patent: May 12, 2015
    Assignee: ARM Limited
    Inventors: Emre Özer, Stuart David Biles
  • Patent number: 9032377
    Abstract: A computing method includes accepting a definition of a computing task, which includes multiple Processing Elements (PEs) having execution dependencies. The computing task is compiled for concurrent execution on a multiprocessor device, by arranging the PEs in a series of two or more invocations of the multiprocessor device, including assigning the PEs to the invocations depending on the execution dependencies. The multiprocessor device is invoked to run software code that executes the series of the invocations, so as to produce a result of the computing task.
    Type: Grant
    Filed: June 2, 2013
    Date of Patent: May 12, 2015
    Assignee: Rocketick Technologies Ltd.
    Inventors: Shay Mizrachi, Uri Tal, Tomer Ben-David, Ishay Geller, Ido Kasher, Ronen Gal
  • Patent number: 9026769
    Abstract: A processor for processing loop instructions can include an instruction reorder structure and a loop processing controller. The instruction reorder structure is configured to store decoded instructions according to program order and issue the decoded instructions for execution out of program order. The loop processing controller is configured to detect a loop in the decoded instructions stored in the instruction reorder structure and cause the instruction reorder structure to reissue the decoded instructions that form the loop for re-execution.
    Type: Grant
    Filed: January 24, 2012
    Date of Patent: May 5, 2015
    Assignee: Marvell International Ltd.
    Inventors: Sujat Jamil, R. Frank O'Bleness, Joseph Delgross, Tom Hameenanttila
  • Publication number: 20150113251
    Abstract: System and methods are provided for register allocation. An original code block and a target code block associated with a branch of an execution loop are determined. An original allocation of a plurality of physical registers to one or more original variables associated with the original code block is detected. A target allocation of the plurality of physical registers to one or more target variables associated with the target code block is determined. One or more temporary registers are selected from the plurality of physical registers based at least in part on the original allocation and the target allocation. The original allocation is changed to the target allocation using the selected temporary registers. Specifically, one or more instructions are generated to change the original allocation to the target allocation using the selected temporary registers. The instructions are executed using one or more processors.
    Type: Application
    Filed: September 12, 2014
    Publication date: April 23, 2015
    Inventors: Ningsheng Jian, Yuheng Zhang, Liping Gao, Haitao Huang, Xinyu Qi
  • Patent number: 9015449
    Abstract: An approach is provided in which a thread is selected from multiple active threads, along with a corresponding weighting value. Computational logic determines whether one of the multiple threads is dispatching an instruction and, if so, computes a dispatch weighting value using the selected weighting value and a dispatch factor that indicates a weighting adjustment of the selected weighting value. In turn, a resource utilization value of the selected thread is computed using the dispatch weighting value.
    Type: Grant
    Filed: March 27, 2011
    Date of Patent: April 21, 2015
    Assignee: International Business Machines Corporation
    Inventors: James Wilson Bishop, Michael J. Genden, Steven Bradford Herndon, Philip Lee Vitale
  • Patent number: 9009451
    Abstract: A system and method for reducing power consumption through issue throttling of selected problematic instructions. A power throttle unit within a processor maintains instruction issue counts for associated instruction types. The instruction types may be a subset of supported instruction types executed by an execution core within the processor. The instruction types may be chosen based on high power consumption estimates for processing instructions of these types. The power throttle unit may determine a given instruction issue count exceeds a given threshold. In response, the power throttle unit may select given instruction types to limit a respective issue rate. The power throttle unit may choose an issue rate for each one of the selected given instruction types and limit an associated issue rate to a chosen issue rate. The selection of given instruction types and associated issue rate limits is programmable.
    Type: Grant
    Filed: October 31, 2011
    Date of Patent: April 14, 2015
    Assignee: Apple Inc.
    Inventors: Daniel C. Murray, Andrew J. Beaumont-Smith, John H. Mylius, Peter J. Bannon, Toshi Takayanagi, Jung Wook Cho
  • Publication number: 20150095618
    Abstract: An out of order processor. The processor includes a virtual load store queue for allocating a plurality of loads and a plurality of stores, wherein more loads and more stores can be accommodated beyond an actual physical size of the load store queue of the processor; wherein the processor allocates other instructions besides loads and stores beyond the actual physical size limitation of the load/store queue; and wherein the other instructions can be dispatched and executed even though intervening loads or stores do not have spaces in the load store queue.
    Type: Application
    Filed: December 11, 2014
    Publication date: April 2, 2015
    Inventor: Mohammad A. ABDALLAH
  • Patent number: 8996847
    Abstract: Global register protection in a multi-threaded processor is described. In an embodiment, global resources within a multi-threaded processor are protected by performing checks, before allowing a thread to write to a global resource, to determine whether the thread has write access to the particular global resource. The check involves accessing one or more local control registers or a global control field within the multi-threaded processor and in an example, a local register associated with each other thread in the multi-threaded processor is accessed and checked to see whether it contains an identifier for the particular global resource. Only if none of the accessed local resources contain such an identifier, is the instruction issued and the thread allowed to write to the global resource. Otherwise, the instruction is blocked and an exception may be raised to alert the program that issued the instruction that the write failed.
    Type: Grant
    Filed: February 28, 2013
    Date of Patent: March 31, 2015
    Assignee: Imagination Technologies Limited
    Inventors: Guixin Wang, Hugh Jackson, Robert Graham Isherwood
  • Publication number: 20150089198
    Abstract: An issue control unit is configured to control the rate at which an instruction issue unit issues instructions to an execution pipeline in order to avoid spikes in power drawn by that execution pipeline. The issue control unit maintains a history buffer that reflects, for N previous cycles, the number of instructions issued during each of those N cycles. If the total number of instructions issued during the N previous cycles exceeds a threshold value, then the issue control unit throttles the instruction issue unit from issuing instructions during a subsequent cycle. In addition, the issue control unit increases the threshold value in proportion to the number of previously issued instructions and based on a variety of configurable parameters. Accordingly, the issue control unit maintains granular control over the rate with which the instruction issue unit “ramps up” to a maximum instruction issue rate.
    Type: Application
    Filed: September 20, 2013
    Publication date: March 26, 2015
    Applicant: NVIDIA CORPORATION
    Inventors: Peter SOMMERS, Peter NELSON, Aniket NAIK, John H. EDMONDSON
  • Patent number: 8984259
    Abstract: A method, system, and computer program product for optimizing runtime branch selection in a flow process are provided. The method includes gathering performance metrics of flow branch behavior for executed flows in a runtime system over time and using aggregated performance metrics for the behavior to determine an optimal ordering of branches for a currently running flow. The optimal ordering is determined by identifying one or more branch points in the flow, generating ordering permutations for at least a portion of the branches in the branch point for the flow to identify any permutations that have not been executed, gathering metrics for permutation(s) of the branch point in the flow, comparing the metrics to performance metrics of executed flows having substantially similar flow branch behavior, and identifying optimal branch ordering for the permutation(s) based upon the comparison. The method also includes executing the flow according to the optimal branch ordering.
    Type: Grant
    Filed: November 4, 2008
    Date of Patent: March 17, 2015
    Assignee: International Business Machines Corporation
    Inventors: Brian Hulse, Callum P. Jackson, Christopher Kalus, Ian W. Parkinson, Robert W. Phippen, Amanda J. Watkinson
  • Publication number: 20150074379
    Abstract: Embodiments are provided for an asynchronous processor with token-based very long instruction word architecture. The asynchronous processor comprises a memory configured to cache a plurality of instructions, a feedback engine configured to receive the instructions in bundles of instructions at a time (referred to as very long instruction word) and to decode the instructions, and a crossbar bus configured to transfer calculation information and results of the asynchronous processor. The apparatus further comprises a plurality of sets of execution units (XUs) between the feedback engine and the crossbar bus. Each set of the sets of XUs comprises a plurality of XUs arranged in series and configured to process a bundle of instructions received at the each set from the feedback engine.
    Type: Application
    Filed: September 8, 2014
    Publication date: March 12, 2015
    Inventors: Yiqun Ge, Wuxian Shi, Qifan Zhang, Tao Huang, Wen Tong
  • Patent number: 8977837
    Abstract: At least one instruction of a sequence of program instructions has a plurality of alternative outcomes including at least a first outcome that is independent of at least one operand and a second outcome that is dependent on the at least one operand. The at least one operand is a value generated by a preceding instruction in the sequence. The at least one instruction is issued for execution independently of when the at least one operand is generated by the preceding instruction. Recovery circuitry is provided to perform a recovery operation in the event that the second outcome is to be executed for the at least one instruction and the at least one operand has not been generated by the preceding instruction when the at least one instruction is to be executed by said instruction execution circuitry.
    Type: Grant
    Filed: May 27, 2009
    Date of Patent: March 10, 2015
    Assignee: ARM Limited
    Inventors: Robert Gregory McDonald, Paul Gilbert Meyer
  • Patent number: 8972704
    Abstract: A code section of a computer program to be executed by a computing device includes memory barrier instructions. Where the code section satisfies a threshold, the code section is modified, by enclosing the code section within a transaction that employs hardware transactional memory of the computing device, and removing the memory barrier instructions from the code section. Execution of the code section as has been enclosed within the transaction can be monitored to yield monitoring results. Where the monitoring results satisfy an abort threshold corresponding to excessive aborting of the execution of the code section as has been enclosed within the transaction, the code section is split into code sub-sections, and each code sub-section enclosed within a separate transaction that employs the hardware transactional memory. Splitting the code section sections and enclosing each code sub-section within a separate transaction can decrease occurrence of the code section aborting during execution.
    Type: Grant
    Filed: December 15, 2011
    Date of Patent: March 3, 2015
    Assignee: International Business Machines Corporation
    Inventors: Toshihiko Koju, Takuya Nakaike, Ali Ijaz Sheikh, Harold Wade Cain, III, Maged M. Michael
  • Patent number: 8966229
    Abstract: Processing systems and methods are disclosed that can include an instruction unit which provides instructions for execution by the processor; a decode/issue unit which decodes instructions received from the instruction unit and issues the instructions; and a plurality of execution queues coupled to the decode/issue unit, wherein each issued instruction from the decode/issue unit can be stored into an entry of at least one queue of the plurality of execution queues. The plurality of queues can comprise an independent execution queue, a dependent execution queue, and a plurality of execution units coupled to receive instructions for execution from the plurality of execution queues. The plurality of execution units can comprise a first execution unit, coupled to receive instructions from the dependent execution queue and the independent execution queue which have been selected for execution.
    Type: Grant
    Filed: August 18, 2011
    Date of Patent: February 24, 2015
    Assignee: Freescale Semiconductor, Inc.
    Inventors: Thang M. Tran, Trinh Huy H. Nguyen
  • Patent number: 8959314
    Abstract: A system and method for fencing memory accesses. Memory loads can be fenced, or all memory access can be fenced. The system receives a fencing instruction that separates memory access instructions into older accesses and newer accesses. A buffer within the memory ordering unit is allocated to the instruction. The access instructions newer than the fencing instruction are stalled. The older access instructions are gradually retired. When all older memory accesses are retired, the fencing instruction is dispatched from the buffer.
    Type: Grant
    Filed: July 15, 2013
    Date of Patent: February 17, 2015
    Assignee: Intel Corporation
    Inventors: Salvador Palanca, Stephen Fischer, Subramaniam Maiyuran, Shekoufeh Qawami
  • Patent number: 8954714
    Abstract: An apparatus includes a processor. The processor includes two memories. The first memory stores one set of instructions. The second memory stores another set of instructions that are longer than the set of instructions in the first memory. An instruction in the set of instructions in the first memory is used as a pointer to a corresponding instruction in the set of instructions in the second memory.
    Type: Grant
    Filed: February 1, 2010
    Date of Patent: February 10, 2015
    Assignee: Altera Corporation
    Inventor: Steven Perry
  • Publication number: 20150040111
    Abstract: A method and apparatus for enabling a Software Transactional Memory (STM) with precompiled binaries is herein described. Upon encountering an access operation in a transaction, an annotation field associated with a memory location referenced by the access is checked. In response to the memory location representing a previous similar access within the transaction, the access is performed without access barriers. However, if the annotation field is in a default state representing no previous access during a pendancy of the transaction, then a mode of the processor is determined. If the processor mode is in implicit mode, an access handler/barrier is asynchronously executed. Conversely, in an explicit mode, a flag is set instead of asynchronously executing the handler. In addition, during compilation convert explicit and convert implicit instructions are inserted to intelligently convert modes for precompiled and newly compiled binaries.
    Type: Application
    Filed: May 6, 2014
    Publication date: February 5, 2015
    Inventors: Bratin Saha, Ali-Reza Adl-Tabatabai, Quinn A. Jacobson
  • Patent number: 8949581
    Abstract: A load scheduler capable of limited issuing of out of order load instruction is disclosed. The load scheduler uses a max skipping threshold which limits the number of skipping load instructions and a max skipped threshold which limits the number of skipped load instructions. An address tag for a skipping instruction is stored in a skipping load instruction tracking unit when a skipping load instruction is issued. When a skipped load instruction issues, the address tag of the skipped load instruction is compared to the address tag of the skipping instruction to determine if a hazard from the out of order issuing of the skipping load instruction caused a hazard and must be flushed.
    Type: Grant
    Filed: May 9, 2011
    Date of Patent: February 3, 2015
    Assignee: Applied Micro Circuits Corporation
    Inventors: Matthew W. Ashcraft, John Gregory Favor
  • Publication number: 20150026436
    Abstract: The present invention provides a method and apparatus for scheduling based on tags of different types. Some embodiments of the method include broadcasting a first tag to entries in a queue of a scheduler. The first tag is broadcast in response to a first instruction associated with a first entry in the queue being picked for execution. The first tag includes information identifying the first entry and information indicating a type of the first tag. Some embodiments of the method also include marking at least one second entry in the queue is ready to be picked for execution in response to at least one second tag associated with at least one second entry in the queue matching the first tag.
    Type: Application
    Filed: July 17, 2013
    Publication date: January 22, 2015
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Michael Achenbach, Teik Tan, Gregory W. Smaus, Ganesh Venkataramanan, Emil Talpes
  • Patent number: 8924690
    Abstract: A method and apparatus for heterogeneous chip multiprocessors (CMP) via resource restriction. In one embodiment, the method includes the accessing of a resource utilization register to identify a resource utilization policy. Once accessed, a processor controller ensures that the processor core utilizes a shared resource in a manner specified by the resource utilization policy. In one embodiment, each processor core within a CMP includes an instruction issue throttle resource utilization register, an instruction fetch throttle resource utilization register and other like ways of restricting its utilization of shared resources within a minimum and maximum utilization level. In one embodiment, resource restriction provides a flexible manner for allocating current and power resources to processor cores of a CMP that can be controlled by hardware or software. Other embodiments are described and claimed.
    Type: Grant
    Filed: May 29, 2012
    Date of Patent: December 30, 2014
    Assignee: Intel Corporation
    Inventors: Tryggve Fossum, George Chrysos, Todd A. Dutton
  • Patent number: 8918625
    Abstract: A processor that executes instructions out of program order is described. In some implementations, a processor detects whether a second memory operation is dependent on a first memory operation prior to memory address calculation. If the processor detects that the second memory operation is not dependent on the first memory operation, the processor is configured to allow the second memory operation to be scheduled. If the processor detects that the second memory operation is dependent on the first memory operation, the processor is configured to prevent the second memory operation from being scheduled until the first memory operation has been scheduled to reduce the likelihood of having to reexecute the second memory operation.
    Type: Grant
    Filed: November 15, 2011
    Date of Patent: December 23, 2014
    Assignee: Marvell International Ltd.
    Inventors: R. Frank O'Bleness, Sujat Jamil, Tom Hameenanttila
  • Publication number: 20140372732
    Abstract: A method includes undoing, in reverse program order, changes in a state of a processing device caused by speculative instructions previously dispatched for execution in the processing device and concurrently deallocating resources previously allocated to the speculative instructions in response to interruption of dispatch of instructions due to a flush of the speculative instructions. A processor device comprises a retire queue to store entries for instructions that are awaiting retirement and a finite state machine. The finite state machine is to interrupt dispatch of instructions in response to a flush of speculative instructions previously dispatched for execution in the processing device and to undo, in reverse program order, changes in a state of the processing device caused by the speculative instructions while concurrently deallocating resources previously allocated to the speculative instructions.
    Type: Application
    Filed: June 14, 2013
    Publication date: December 18, 2014
    Inventors: Jay Fleischman, Michael Estlick
  • Publication number: 20140351562
    Abstract: A dispatch stage of a processor core dispatches designated operations (e.g. load/store operations) to a temporary queue when the resources to execute the designated operations are not available. Once the resources become available to execute an operation at the temporary queue, the operation is transferred to a scheduler queue where it can be picked for execution. By dispatching the designated operations to the temporary queue, other operations behind the designated operations in a program order are made available for dispatch to the scheduler queue, thereby improving instruction throughput at the processor core.
    Type: Application
    Filed: May 23, 2013
    Publication date: November 27, 2014
    Applicant: Advanced Micro Devices, Inc.
    Inventor: Francesco Spadini
  • Publication number: 20140325187
    Abstract: A method includes allocating a first single-cycle instruction to a first pipeline that picks single-cycle instructions for execution in program order. The method further includes marking at least one source register of the first single-cycle instruction as ready for execution in the first pipeline in response to all older single-cycle instructions allocated to the first pipeline being ready and eligible to be picked for execution. An apparatus includes a decoder to decode a first single-cycle instruction and to allocate the first single-cycle instruction to a first pipeline. The apparatus further includes a scheduler to pick single-cycle instructions for execution by the first pipeline in program order and to mark at least one source register of the first single-cycle instruction as ready for execution in the first pipeline in response to determining that all older single-cycle instructions allocated to the first pipeline are ready and eligible.
    Type: Application
    Filed: April 24, 2013
    Publication date: October 30, 2014
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Michael D. Estlick, Jay E. Fleischman, Kevin A. Hurd, Mark M. Gibson, Kelvin D. Goveas, Brian M. Lay
  • Patent number: 8874885
    Abstract: Embodiments relate to mitigation of lookahead branch predication latency. An aspect includes receiving an instruction address in an instruction cache for fetching instructions in a microprocessor pipeline. Another aspect includes receiving the instruction address in a branch presence predictor coupled to the microprocessor pipeline. Another aspect includes determining, by the branch presence predictor, presence of a branch instruction in the instructions being fetched, wherein the branch instruction is predictable by the branch target buffer, and any indication of the instruction address not written to the branch target buffer is also not written to the branch presence predictor. Another aspect includes, based on receipt of an indication that the branch instruction is present from the branch presence predictor, holding the branch instruction.
    Type: Grant
    Filed: February 12, 2008
    Date of Patent: October 28, 2014
    Assignee: International Business Machines Corporation
    Inventors: James J. Bonanno, David S. Hutton, Brian R. Prasky, Anthony Saporito
  • Publication number: 20140310506
    Abstract: The present invention provides a method and apparatus for allocating store queue entries to store instructions for early store-to-load forwarding. Some embodiments of the method include allocating an entry in a store queue to a store instruction in response to the store instruction being dispatched and prior to receiving a translation of a virtual address to a physical address associated with the store instruction. The entry includes storage for data to be written to the physical address by the store instruction.
    Type: Application
    Filed: April 11, 2013
    Publication date: October 16, 2014
    Applicant: Advanced Micro Devices, Inc.
    Inventors: David A Kaplan, Daniel Hopper, Tarun Nakra
  • Publication number: 20140289500
    Abstract: Techniques for providing an interface between a UICC and a processor, included in an access terminal, that supports asynchronous command processing by the UICC, are described. A first complex command, with a first processing time, may be received from the processor. An initial response to the first command, including a token, may be sent to the processor. The first command may be processed for the first processing time. At least one additional command, having a processing time shorter than the first processing time, may be received from the processor. Processing of the first command may be completed. Processing of a current one of the at least one additional command, which was being processed before, during, or after completion of the processing of the first command, may be completed. A response to the current one of the at least one additional command, including the token, may be sent to the processor.
    Type: Application
    Filed: September 19, 2013
    Publication date: September 25, 2014
    Applicant: QUALCOMM Incorporated
    Inventors: Michele BERIONNE, Jose Alfredo Ruvalcaba, Younghwan Kang, Nicholas Matthias Beckmann
  • Publication number: 20140281402
    Abstract: A method and circuit arrangement provide support for a hybrid pipeline that dynamically switches between out-of-order and in-order modes. The hybrid pipeline may selectively execute instructions from at least one instruction stream that require the high performance capabilities provided by out-of-order processing in the out-of-order mode. The hybrid pipeline may also execute instructions that have strict power requirements in the in-order mode where the in-order mode conserves more power compared to the out-of-order mode. Each stage in the hybrid pipeline may be activated and fully functional when the hybrid pipeline is in the out-of-order mode. However, stages in the hybrid pipeline not used for the in-order mode may be deactivated and bypassed by the instructions when the hybrid pipeline dynamically switches from the out-of-order mode to the in-order mode. The deactivated stages may then be reactivated when the hybrid pipeline dynamically switches from the in-order mode to the out-of-order mode.
    Type: Application
    Filed: March 13, 2013
    Publication date: September 18, 2014
    Applicant: International Business Machines Corporation
    Inventors: Miguel Comparan, Andrew D. Hilton, Hans M. Jacobson, Brian M. Rogers, Robert A. Shearer, Ken V. Vu, Alfred T. Watson, III
  • Publication number: 20140281403
    Abstract: Embodiments include a method for chaining data in an exposed-pipeline processing element. The method includes separating a multiple instruction word into a first sub-instruction and a second sub-instruction, receiving the first sub-instruction and the second sub-instruction in the exposed-pipeline processing element. The method also includes issuing the first sub-instruction at a first time, issuing the second sub-instruction at a second time different than the first time, the second time being offset to account for a dependency of the second sub-instruction on a first result from the first sub-instruction, the first pipeline performing the first sub-instruction at a first clock cycle and communicating the first result from performing the first sub-instruction to a chaining bus coupled to the first pipeline and a second pipeline, the communicating at a second clock cycle subsequent to the first clock cycle that corresponds to a total number of latch pipeline stages in the first pipeline.
    Type: Application
    Filed: August 14, 2013
    Publication date: September 18, 2014
    Applicant: International Business Machines Corporation
    Inventors: Thomas W. Fox, Bruce M. Fleischer, Hans M. Jacobson, Ravi Nair
  • Patent number: 8838906
    Abstract: In a multiprocessor system with at least two levels of cache, a speculative thread may run on a core processor in parallel with other threads. When the thread seeks to do a write to main memory, this access is to be written through the first level cache to the second level cache. After the write though, the corresponding line is deleted from the first level cache and/or prefetch unit, so that any further accesses to the same location in main memory have to be retrieved from the second level cache. The second level cache keeps track of multiple versions of data, where more than one speculative thread is running in parallel, while the first level cache does not have any of the versions during speculation. A switch allows choosing between modes of operation of a speculation blind first level cache.
    Type: Grant
    Filed: January 4, 2011
    Date of Patent: September 16, 2014
    Assignee: International Business Machines Corporation
    Inventors: Alan Gara, Martin Ohmacht