Simultaneous Issuance Of Multiple Instructions Patents (Class 712/215)
  • Patent number: 10747541
    Abstract: Instructions are executed in a pipeline. Storage accessible to the pipeline stores branch prediction information characterizing results of branch instructions previously executed. A predicted branch result is provided, for at least some branch instructions, based on a selected predictor of multiple predictors. An actual branch result is provided based on an executed branch instruction, and the branch prediction information is updated based on the actual branch result. The predictors include: a first predictor that determines the predicted branch result based on at least a portion of the branch prediction information; and a second predictor that determines the predicted branch result independently from the branch prediction information.
    Type: Grant
    Filed: January 25, 2018
    Date of Patent: August 18, 2020
    Assignee: Marvell Asia Pte, Ltd.
    Inventors: Shubhendu Sekhar Mukherjee, David Kravitz, Edward J. McLellan
  • Patent number: 10732976
    Abstract: A processor includes an instruction pipeline. The pipeline can be operated alternatively in a multi-thread mode and in a single-thread mode. In the multi-thread mode, the instruction pipeline processes multiple threads in an interleaved or simultaneous manner. In the single-thread mode, the pipeline processes a single thread. The instruction pipeline comprises multiple functional units, each of which is reserved for one thread among the multiple threads when the pipeline is in the multi-thread mode and reserved for one context layer among multiple context layers when the instruction pipeline is in the single-thread mode.
    Type: Grant
    Filed: January 10, 2013
    Date of Patent: August 4, 2020
    Assignee: NXP USA, Inc.
    Inventors: Alistair Robertson, Jeffrey W. Scott
  • Patent number: 10719325
    Abstract: Very long instruction word (VLIW) instruction processing using a reduced-width processor is disclosed. In a particular embodiment, a VLIW processor includes a control circuit configured to receive a VLIW packet that includes a first number of instructions and to distribute the instructions to a second number of instruction execution paths. The first number is greater than the second number. The VLIW processor also includes physical registers configured to store results of executing the instructions and a register renaming circuit that is coupled to the control circuit.
    Type: Grant
    Filed: November 7, 2017
    Date of Patent: July 21, 2020
    Assignee: Qualcomm Incorporated
    Inventors: Peter Sassone, Christopher Koob, Suresh Kumar Venkumahanti
  • Patent number: 10705851
    Abstract: A method for scheduling micro-instructions, performed by a first qualifier, is provided. The method includes the following steps: detecting a write-back signal broadcasted by a second qualifier; determining whether a value of a first load-detection counting logic is to be synchronized with a value of a second load-detection counting logic carried by the write-back signal according to content of the write-back signal; determining whether execution statuses of all load micro-instructions are cache hit when the synchronized value of the first load-detection counting logic reaches a predetermined value; and driving a release circuit to remove a micro-instruction in a reservation station queue when the execution statuses of the all load micro-instructions are cache hit and the micro-instruction has been dispatched to an arithmetic and logic unit for execution.
    Type: Grant
    Filed: October 2, 2018
    Date of Patent: July 7, 2020
    Assignee: SHANGHAI ZHAOXIN SEMICONDUCTOR CO., LTD.
    Inventor: Xiaolong Fei
  • Patent number: 10698729
    Abstract: The invention relates to a method for organizing tasks, in at least some nodes of a computer cluster, comprising: First, launching two containers on each of said nodes, a standard container and a priority container, next, for all or part of said nodes with two containers, at each node, while a priority task does not occur, assigning one or more available resources of the node to the standard container thereof in order to execute a standard task, the priority container thereof not executing any task, when a priority task occurs, dynamically switching only a portion of the resources from the standard container thereof to the priority container thereof, such that, the priority task is executed in the priority container with the switched portion of the resources, and the standard task continues to be executed, without being halted, in the standard container with the non-switched portion of the resources.
    Type: Grant
    Filed: December 16, 2015
    Date of Patent: June 30, 2020
    Assignee: BULL SAS
    Inventors: Yann Maupu, Thomas Cadeau, Matthieu Daniel
  • Patent number: 10678542
    Abstract: Systems, apparatuses, and methods for implementing a non-shifting reservation station. A dispatch unit may write an operation into any entry of a reservation station. The reservation station may include an age matrix for determining the relative ages of the operations stored in the entries of the reservation station. The reservation station may include selection logic which is configured to pick the oldest ready operation from the reservation station based on the values stored in the age matrix. The selection logic may utilize control logic to mask off columns of an age matrix corresponding to non-ready operation so as to determine which operation is the oldest ready operation in the reservation station. Also, the reservation station may be configured to dequeue operations early when these operations do not have load dependency.
    Type: Grant
    Filed: July 24, 2015
    Date of Patent: June 9, 2020
    Assignee: Apple Inc.
    Inventors: Ian D. Kountanis, Mahesh K. Reddy
  • Patent number: 10620960
    Abstract: An apparatus and method are provided for performing branch prediction. The apparatus has processing circuitry for executing instructions out-of-order with respect to original program order, and event counting prediction circuitry for maintaining event count values for branch instructions, for use in making branch outcome predictions for those branch instructions. Further, checkpointing storage stores state information of the apparatus at a plurality of checkpoints to enable the state information to be restored for a determined one of those checkpoints in response to a flush event. The event counting prediction circuitry has training storage with a first number of training entries, each training entry being associated with a branch instruction.
    Type: Grant
    Filed: August 20, 2018
    Date of Patent: April 14, 2020
    Assignee: Arm Limited
    Inventors: Houdhaifa Bouzguarrou, Guillaume Bolbenes, Vincenzo Consales
  • Patent number: 10613993
    Abstract: Program code intended to be copied into the cache memory of a microprocessor is transferred encrypted between the random-access memory and the processor, and the decryption is carried out at the level of the cache memory. A checksum may be inserted into the cache lines in order to allow integrity verification, and this checksum is then replaced with a specific instruction before delivery of an instruction word to the central unit of the microprocessor.
    Type: Grant
    Filed: January 30, 2015
    Date of Patent: April 7, 2020
    Assignee: STMICROELECTRONICS SA
    Inventor: Bruno Fel
  • Patent number: 10606590
    Abstract: Technical solutions are described for out-of-order (OoO) execution of one or more instructions by a processing unit. An example method includes looking up, by a load-store unit (LSU), an entry in an effective address directory (EAD) for an effective address (EA) of an operand of an instruction to be launched. Further, the method includes, in response to the EA being present in the EAD, launching, by the LSU, the instruction with the RA from the EAD, and in response to the EA not being present in the EAD, looking up, by the LSU, the EA in an effective real table (ERT) entry, and launching the instruction with the RA from the ERT entry. Further, in response to the ERT entry to be removed, the ERT entry including an ERT index and a mapping between the EA and the RA, removing the entry of the EA from the EAD.
    Type: Grant
    Filed: October 6, 2017
    Date of Patent: March 31, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Bryan Lloyd, Balaram Sinharoy
  • Patent number: 10606593
    Abstract: Technical solutions are described for out-of-order (OoO) execution of one or more instructions by a processing unit. An example method includes looking up, by a load-store unit (LSU), an entry in an effective address directory (EAD) for an effective address (EA) of an operand of an instruction to be launched. Further, the method includes, in response to the EA being present in the EAD, launching, by the LSU, the instruction with the RA from the EAD, and in response to the EA not being present in the EAD, looking up, by the LSU, the EA in an effective real table (ERT) entry, and launching the instruction with the RA from the ERT entry. Further, in response to the ERT entry to be removed, the ERT entry including an ERT index and a mapping between the EA and the RA, removing the entry of the EA from the EAD.
    Type: Grant
    Filed: November 29, 2017
    Date of Patent: March 31, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Bryan Lloyd, Balaram Sinharoy
  • Patent number: 10592298
    Abstract: A system and method for processing a data packet. The method comprises initiating processing of a received plurality of data packets by CPU cores; tracking, by a scale management routine, processing queues for the CPU cores and their load. In response to an average size of a processing queue being lower than a first pre-determined queue threshold, and a CPU core load being lower than a first pre-determined load threshold, preventing adding new data packets to the processing queue, monitoring emptying of processing queues for each processing CPU core. In response to an average size of a processing queue or a CPU core load being above a second pre-determined upper queue threshold or the second pre-determined load threshold, transmitting all data from processing queues for each processing CPU core to a memory buffer, increasing the number of processing cores by one; and initiating data packet processing.
    Type: Grant
    Filed: January 26, 2018
    Date of Patent: March 17, 2020
    Assignee: NFWARE, INC.
    Inventors: Alexander Britkin, Viacheslav Morozov, Igor Pavlov
  • Patent number: 10585669
    Abstract: A system and method suppresses occurrence of stalling caused by data dependency other than register dependency in an out-of-order processor. A stall reducing method includes a handler for detecting a stall occurring during execution of execution code using a performance monitoring unit, and for identifying, based on dependencies, a second instruction on which a first instruction is data dependent, the stall based on this dependency. A profiler registers the second instruction as profile information. An optimization module inserts a thread yield instruction in an appropriate position inside execution code or an original code file based on the profile information, and outputs optimized execution code.
    Type: Grant
    Filed: July 31, 2018
    Date of Patent: March 10, 2020
    Assignee: International Business Machines Corporation
    Inventor: Takeshi Ogasawara
  • Patent number: 10564976
    Abstract: Aspects of the invention include tracking dependencies between instructions in an issue queue. The tracking includes, for each instruction in the issue queue, identifying whether the instruction is dependent on each of a threshold number of instructions added to the issue queue prior to the instruction. The tracking also includes identifying whether the instruction is dependent on one or more other instructions added to the issue queue prior to the instruction that are not included in the each of the threshold number of instructions. A dependency between the instruction and each of the other instructions is tracked as a plurality of groups by indicating that a dependency exists between the instruction and one of the groups based on identifying a dependency between the instruction and at least one instruction in the group. Instructions are issued from the issue queue based at least in part on the tracking.
    Type: Grant
    Filed: November 30, 2017
    Date of Patent: February 18, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Joel A. Silberman, Balaram Sinharoy
  • Patent number: 10558460
    Abstract: Systems and techniques are disclosed for general purpose register dynamic allocation based on latency associated with of instructions in processor threads. A streaming processor can include a general purpose registers configured to stored data associated with threads, and a thread scheduler configured to receive allocation information for the general purpose registers, the information describing general purpose registers that are to be assigned as persistent general purpose registers (pGPRs) and volatile general purpose registers (vGPRs). The plurality of general purpose registers can be allocated according to the received information. The streaming processor can include the general purpose registers allocated according to the received information, the allocated based on execution latencies of instructions included in the threads.
    Type: Grant
    Filed: December 14, 2016
    Date of Patent: February 11, 2020
    Assignee: QUALCOMM Incorporated
    Inventors: Yun Du, Liang Han, Lin Chen, Chihong Zhang, Hongjiang Shang, Jing Wu, Zilin Ying, Chun Yu, Guofang Jiao, Andrew Gruber, Eric Demers
  • Patent number: 10552130
    Abstract: A method of providing by a code optimization service an optimized version of a code unit to a managed runtime environment is disclosed. Information related to one or more runtime conditions associated with the managed runtime environment that is executing in a different process than that of the code optimization service is obtained, wherein the one or more runtime conditions are subject to change during the execution of the code unit. The optimized version of the code unit and a corresponding set of one or more speculative assumptions are provided to the managed runtime environment, wherein the optimized version of the code unit produces the same logical results as the code unit unless at least one of the set of one or more speculative assumptions is not true, wherein the set of one or more speculative assumptions are based on the information related to the one or more runtime conditions.
    Type: Grant
    Filed: June 8, 2018
    Date of Patent: February 4, 2020
    Assignee: Azul Systems, Inc.
    Inventors: Gil Tene, Philip Reames
  • Patent number: 10445097
    Abstract: Apparatus and methods are disclosed for decoding targets from an instruction and transmitting data to those targets in accordance with a current instruction. Multimodal target hardware is used in conjunction with one or more of the routers so as to route data to an appropriate target. The data can be one or more operands or a predicate and the targets can include operand buffers, broadcast channels, and general registers. In this way, operands, for example, can be directed for use with multiple subsequent instructions, and there are multiple modes for distributing the operands to the multiple instructions.
    Type: Grant
    Filed: March 17, 2016
    Date of Patent: October 15, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Douglas C. Burger, Aaron L. Smith
  • Patent number: 10437603
    Abstract: The disclosed inventions include a processor apparatus and method that enable a general purpose processor to achieve twice the operating frequency of typical processor implementations with a modest increase in area and a modest increase in energy per operation. The invention relies upon exploiting multiple independent streams of execution. Low area and low energy memory arrays used for register files operate a modest frequency. Instructions can be issued at a rate higher than this frequency by including logic that guarantees the spacing between instructions from the same thread are spaced wider than the time to access the register file. The result of the invention is the ability to overlap long latency structures, which allows using lower energy structures, thereby reducing energy per operation.
    Type: Grant
    Filed: February 20, 2018
    Date of Patent: October 8, 2019
    Inventor: Kevin Sean Halle
  • Patent number: 10379866
    Abstract: An electronic apparatus generating compiled data used in a very long instruction word (VLIW) processor including a plurality of function units is provided. The electronic apparatus includes a storage and a processor configured to control the storage to store the compiled data in which a plurality of VLIW instructions are compiled, identify a VLIW instruction from the compiled data; and update, if a multi-cycle no operation (nop) instruction for the plurality of function units is identified within a cycle corresponding to a latency of the identified VLIW instruction and if an end cycle of another VLIW instruction is within the cycle corresponding to the latency of the identified VLIW instruction, the compiled data by including information on a cycle difference between an end cycle of the identified VLIW instruction and the end cycle of the another VLIW instruction in the multi-cycle nop instruction.
    Type: Grant
    Filed: July 19, 2017
    Date of Patent: August 13, 2019
    Assignee: Samsung Electronics Co., Ltd
    Inventors: Jong-hun Lee, Jae-un Park, Si-hoon Song, Myung-sun Kim
  • Patent number: 10346165
    Abstract: A load/store unit including a memory queue configured to store a plurality of memory instructions and state information indicating whether each memory instruction of the plurality of memory instructions can be performed independently, with, separately, or after older pending instructions; and a state-selection circuit configured to set a state information of each memory instruction of the plurality of memory instructions in view of an older pending instruction in the memory queue.
    Type: Grant
    Filed: October 31, 2014
    Date of Patent: July 9, 2019
    Assignee: Avago Technologies International Sales Pte. Limited
    Inventor: Tariq Kurd
  • Patent number: 10346174
    Abstract: Operation of a multi-slice processor that includes a plurality of execution slices and a plurality of load/store slices, where the multi-slice processor is configured to dynamically cancel partial load operations by, among other steps, receiving a load instruction requesting multiple portions of data; receiving a load instruction requesting multiple portions of data; determining that a load of one portion of the requested multiple portions is unavailable to be issued; and responsive to determining that the load of the one portion of the requested multiple portions is unavailable to be issued, delaying issuance of the load instruction.
    Type: Grant
    Filed: March 24, 2016
    Date of Patent: July 9, 2019
    Assignee: International Business Machines Corporation
    Inventors: Elizabeth A. McGlone, Jennifer L. Molnar
  • Patent number: 10331455
    Abstract: An electronic apparatus generating compiled data used in a very long instruction word (VLIW) processor including a plurality of function units is provided. The electronic apparatus includes a storage and a processor configured to control the storage to store the compiled data in which a plurality of VLIW instructions are compiled, identify a VLIW instruction from the compiled data; and update, if a multi-cycle no operation (nop) instruction for the plurality of function units is identified within a cycle corresponding to a latency of the identified VLIW instruction and if an end cycle of another VLIW instruction is within the cycle corresponding to the latency of the identified VLIW instruction, the compiled data by including information on a cycle difference between an end cycle of the identified VLIW instruction and the end cycle of the another VLIW instruction in the multi-cycle nop instruction.
    Type: Grant
    Filed: July 19, 2017
    Date of Patent: June 25, 2019
    Assignee: Samsung Electronics Co., Ltd
    Inventors: Jong-hun Lee, Jae-un Park, Si-hoon Song, Myung-sun Kim
  • Patent number: 10324724
    Abstract: Methods and apparatuses relating to a fusion manager to fuse instructions are described. In one embodiment, a hardware processor includes a hardware binary translator to translate an instruction stream into a translated instruction stream, a hardware fusion manager to fuse multiple instructions of the translated instruction stream into a single fused instruction, a hardware decode unit to decode the single fused instruction into a decoded, single fused instruction, and a hardware execution unit to execute the decoded, single fused instruction.
    Type: Grant
    Filed: December 16, 2015
    Date of Patent: June 18, 2019
    Assignee: Intel Corporation
    Inventors: Patrick P. Lai, Tyler N. Sondag, Sebastian Winkel, Polychronis Xekalakis, Ethan Schuchman, Jayesh Iyer
  • Patent number: 10282207
    Abstract: Operation of a multi-slice processor that includes execution slices and load/store slices coupled via a results bus includes: receiving, by an execution slice, a producer instruction, including: storing, in an entry of an issue queue, the producer instruction; and storing, in a register, an issue queue entry identifier representing the entry of the issue queue in which the producer instruction is stored; receiving, by the execution slice, a source instruction, the source instruction dependent upon the result of the producer instruction, including: storing, in another entry of the issue queue, the source instruction and the issue queue entry identifier of the producer instruction; determining in dependence upon the issue queue entry identifier of the producer instruction that the producer instruction has issued from the issue queue; and responsive to the determination that the producer instruction has issued from the issue queue, issuing the source instruction from the issue queue.
    Type: Grant
    Filed: February 18, 2016
    Date of Patent: May 7, 2019
    Assignee: International Business Machines Corporation
    Inventors: Brian D. Barrick, Sundeep Chadha, Michael J. Genden, Jerry Y. Lu, Dung Q. Nguyen, Nasrin Sultana, David R. Terry, David S. Walder
  • Patent number: 10268482
    Abstract: Operation of a multi-slice processor that includes execution slices and load/store slices coupled via a results bus includes: receiving, by an execution slice, a producer instruction, including: storing, in an entry of an issue queue, the producer instruction; and storing, in a register, an issue queue entry identifier representing the entry of the issue queue in which the producer instruction is stored; receiving, by the execution slice, a source instruction, the source instruction dependent upon the result of the producer instruction, including: storing, in another entry of the issue queue, the source instruction and the issue queue entry identifier of the producer instruction; determining in dependence upon the issue queue entry identifier of the producer instruction that the producer instruction has issued from the issue queue; and responsive to the determination that the producer instruction has issued from the issue queue, issuing the source instruction from the issue queue.
    Type: Grant
    Filed: December 15, 2015
    Date of Patent: April 23, 2019
    Assignee: International Business Machines Corporation
    Inventors: Brian D. Barrick, Sundeep Chadha, Michael J. Genden, Jerry Y. Lu, Dung Q. Nguyen, Nasrin Sultana, David R. Terry, David S. Walder
  • Patent number: 10269088
    Abstract: A mechanism is described for facilitating thread execution arbitration for thread scheduling relating to graphics processors at computing devices. A method of embodiments, as described herein, includes assigning priority levels to threads based on stall signals communicated from the one or more shared function units to one or more execution units of a processor including a graphics processor, and selecting a first thread to be scheduled and a second thread to be ignored based on the stall signals.
    Type: Grant
    Filed: April 21, 2017
    Date of Patent: April 23, 2019
    Assignee: INTEL CORPORATION
    Inventors: Joydeep Ray, Abhishek R. Appu, Subramaniam M. Maiyuran, Eric J. Hoekstra, Prasoonkumar Surti, Balaji Vembu, Altug Koker
  • Patent number: 10248421
    Abstract: Operation of a multi-slice processor that includes execution slices and load/store slices coupled via a results bus, including: for a target instruction targeting a logical register, determining whether an entry in a general purpose register representing the logical register is pending a flush; if the entry in the general purpose register representing the logical register is pending a flush: cancelling the flush in the entry of the general purpose register; storing the target instruction in the entry of the general purpose register representing the logical register, and if an entry in a history buffer targeting the logical register is pending a restore, cancelling the restore for the entry of the history buffer.
    Type: Grant
    Filed: February 16, 2016
    Date of Patent: April 2, 2019
    Assignee: International Business Machines Corporation
    Inventors: Salma Ayub, Brian D. Barrick, Joshua W. Bowman, Sundeep Chadha, Cliff Kucharski, Dung Q. Nguyen, David R. Terry, Jing Zhang
  • Patent number: 10241790
    Abstract: Operation of a multi-slice processor that includes execution slices and load/store slices coupled via a results bus, including: for a target instruction targeting a logical register, determining whether an entry in a general purpose register representing the logical register is pending a flush; if the entry in the general purpose register representing the logical register is pending a flush: cancelling the flush in the entry of the general purpose register; storing the target instruction in the entry of the general purpose register representing the logical register, and if an entry in a history buffer targeting the logical register is pending a restore, cancelling the restore for the entry of the history buffer.
    Type: Grant
    Filed: December 15, 2015
    Date of Patent: March 26, 2019
    Assignee: International Business Machines Corporation
    Inventors: Salma Ayub, Brian D. Barrick, Joshua W. Bowman, Sundeep Chadha, Cliff Kucharski, Dung Q. Nguyen, David R. Terry, Jing Zhang
  • Patent number: 10241557
    Abstract: A processor includes a mechanism for disabling a memory array of a branch prediction unit. The processor may include a next fetch prediction unit that may include a number of entries. Each entry may correspond to a next instruction fetch group and may store an indication of whether or not the corresponding the next fetch group includes a conditional branch instruction. In response to an indication that the next fetch group does not include a conditional branch instruction, the fetch prediction unit may be configured to disable, in a next instruction execution cycle, the memory array of the branch prediction unit.
    Type: Grant
    Filed: December 12, 2013
    Date of Patent: March 26, 2019
    Assignee: Apple Inc.
    Inventors: Conrado Blasco, Ronald P Hall, Ramesh B Gunna, Ian D Kountanis, Shyam Sundar, André Seznec
  • Patent number: 10235181
    Abstract: An out-of-order (OOO) processor includes ready logic that provides a signal indicating an instruction is ready when all operands for the instruction are ready, or when all operands are either ready or are marked back-to-back to a current instruction. By marking a second instruction that consumes an operand as ready when it is back-to-back with a first instruction that produces the operand, but the first instruction has not yet produced the operand, latency due to missed cycles in executing back-to-back instructions is minimized.
    Type: Grant
    Filed: February 3, 2017
    Date of Patent: March 19, 2019
    Assignee: International Business Machines Corporation
    Inventor: Brian W. Thompto
  • Patent number: 10235232
    Abstract: A processor includes an indicator configured to indicate a first mode or a second mode and a functional unit configured to perform computations with a full degree of accuracy when the indicator indicates the first mode and to perform computations with less than the full degree of accuracy when the indicator indicates the second mode.
    Type: Grant
    Filed: October 23, 2014
    Date of Patent: March 19, 2019
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD
    Inventors: G. Glenn Henry, Terry Parks, Rodney E. Hooker
  • Patent number: 10229066
    Abstract: A data processing apparatus is provided including queue circuitry to respond to control signals each associated with a memory access instruction, and to queue a plurality of requests for data, each associated with a reference to a storage location. Resolution circuitry acquires a request for data, and issues the request for data, the resolution circuitry having a resolution circuitry limit. When a current capacity of the resolution circuitry is below the resolution circuitry limit, the resolution circuitry acquires the request for data by receiving the request for data from the queue circuitry, stores the request for data in association with the storage location, issues the request for data, and causes a result of issuing the request for data to be provided to said storage location.
    Type: Grant
    Filed: September 30, 2016
    Date of Patent: March 12, 2019
    Assignee: ARM Limited
    Inventors: Miles Robert Dooley, Matthew Andrew Rafacz, Huzefa Moiz Sanjeliwala, Michael Filippo
  • Patent number: 10228982
    Abstract: A mechanism is provided for allocating a hyper-threaded processor to nodes of multi-tenant distributed software systems. Responsive to receiving a request to provision a node of the multi-tenant distributed software system on the host data processing system, a cluster to which the node belongs is identified. Responsive to the node being a second type of node, responsive to determining that another second type of node in the same cluster has been provisioned on the host data processing system, and responsive to the number of unallocated VPs on different physical processors from that of the other second type of node being greater than or equal to the requested number of VPs for the second type of node, the requested number of VPs for the second type of node is allocated each to a different physical processor from that of the other second type of node.
    Type: Grant
    Filed: January 25, 2018
    Date of Patent: March 12, 2019
    Assignee: International Business Machines Corporation
    Inventors: Rachit Arora, Dharmesh K. Jain, Padmanabhan Krishnan, Shrinivas S. Kulkarni, Subin Shekhar
  • Patent number: 10223126
    Abstract: An out-of-order (OOO) processor includes ready logic that provides a signal indicating an instruction is ready when all operands for the instruction are ready, or when all operands are either ready or are marked back-to-back to a current instruction. By marking a second instruction that consumes an operand as ready when it is back-to-back with a first instruction that produces the operand, but the first instruction has not yet produced the operand, latency due to missed cycles in executing back-to-back instructions is minimized.
    Type: Grant
    Filed: January 6, 2017
    Date of Patent: March 5, 2019
    Assignee: International Business Machines Corporation
    Inventor: Brian W. Thompto
  • Patent number: 10216547
    Abstract: A mechanism is provided for allocating a hyper-threaded processor to nodes of multi-tenant distributed software systems. Responsive to receiving a request to provision a node of the multi-tenant distributed software system on the host data processing system, a cluster to which the node belongs is identified. Responsive to the node being a second type of node, responsive to determining that another second type of node in the same cluster has been provisioned on the host data processing system, and responsive to the number of unallocated VPs on different physical processors from that of the other second type of node being greater than or equal to the requested number of VPs for the second type of node, the requested number of VPs for the second type of node is allocated each to a different physical processor from that of the other second type of node.
    Type: Grant
    Filed: November 22, 2016
    Date of Patent: February 26, 2019
    Assignee: International Business Machines Corporation
    Inventors: Rachit Arora, Dharmesh K. Jain, Padmanabhan Krishnan, Shrinivas S. Kulkarni, Subin Shekhar
  • Patent number: 10169187
    Abstract: A performance monitor including a saturating counter provides a relative measure of event frequency without requiring a minimum polling rate or periodic reset to avoid or account for counter overflow. The saturating counter is incremented upon detection of an event and decremented if an event is not detected within a predetermined period. The period of detecting may be programmable and may be determined by real time clock, processor or instruction cycles. Multiple event types may be selected from for detection and input to a single counter, or alternatively multiple event counters may be provided for various event types. The saturating counter may additionally be periodically reset in a selected operating mode, in combination with the decrementing action performed on the counter.
    Type: Grant
    Filed: August 18, 2010
    Date of Patent: January 1, 2019
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Venkat Rajeev Indukuru, Alexander Erik Mericas
  • Patent number: 10140129
    Abstract: A processor having one or more processing cores is described. Each of the one or more processing cores has front end logic circuitry and a plurality of processing units. The front end logic circuitry is to fetch respective instructions of threads and decode the instructions into respective micro-code and input operand and resultant addresses of the instructions. Each of the plurality of processing units is to be assigned at least one of the threads, is coupled to said front end unit, and has a respective buffer to receive and store microcode of its assigned at least one of the threads.
    Type: Grant
    Filed: December 28, 2012
    Date of Patent: November 27, 2018
    Assignee: Intel Corporation
    Inventors: Ilan Pardo, Dror Markovich, Oren Ben-Kiki, Yuval Yosef
  • Patent number: 10114644
    Abstract: A decoding logic method is arranged to execute a zero-overhead loop in an embedded digital signal processor (DSP). In the method, instruction data is fetched from a memory, and a plurality of instruction tokens, which are derived from the instruction data, are stored in a token buffer. A first portion of one or more instruction tokens from the token buffer are passed to a first decode module, which may be an instruction decode module, and a second portion of the one or more instruction tokens from the token buffer are passed to a second decode module, which may be a loop decode module. The second decode module detects a special loop instruction token, and based on the detection of the special loop instruction token, a loop counter is conditionally tested. Using the first decode module, at least one instruction token of an iterative algorithm is assembled into a single instruction, which is executable in a single execution cycle.
    Type: Grant
    Filed: July 26, 2016
    Date of Patent: October 30, 2018
    Assignee: STMICROELECTRONICS (BEIJING) R&D CO. LTD
    Inventors: PengFei Zhu, Xiao Kang Jiao
  • Patent number: 10114638
    Abstract: In one embodiment, command message generation and execution using a machine code-instruction is performed. One embodiment includes a particular machine executing a single machine-code instruction including a reference into a command-message-building data structure stored in memory. This executing the single machine-code instruction includes generating a command message and initiating communication of the command message to a hardware accelerator, including copying command information from the command-message-building data structure based on the reference into the command message. The hardware accelerator receives and executes the command message. In one embodiment, the command message is message-switched from a processor to a hardware accelerator, such as, but not limited to, a memory controller, a table lookup unit, or a prefix lookup unit. In one embodiment, a plurality of threads share the command-message-building data structure.
    Type: Grant
    Filed: December 15, 2014
    Date of Patent: October 30, 2018
    Assignee: Cisco Technology, Inc.
    Inventor: Donald Edward Steiss
  • Patent number: 10095637
    Abstract: Techniques for improving execution of a lock instruction are provided herein. A lock instruction and younger instructions are allowed to speculatively retire prior to the store portion of the lock instruction committing its value to memory. These instructions thus do not have to wait for the lock instruction to complete before retiring. In the event that the processor detects a violation of the atomic or fencing properties of the lock instruction prior to committing the value of the lock instruction, the processor rolls back state and executes the lock instruction in a slow mode in which younger instructions are not allowed to retire until the stored value of the lock instruction is committed. Speculative retirement of these instructions results in increased processing speed, as instructions no longer need to wait to retire after execution of a lock instruction.
    Type: Grant
    Filed: September 15, 2016
    Date of Patent: October 9, 2018
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Gregory W. Smaus, John M. King, Michael D. Achenbach, Kevin M. Lepak, Matthew A. Rafacz, Noah Bamford
  • Patent number: 10089081
    Abstract: A method and apparatus for generating a signal processing pipeline based, at least in part, on manipulation of a GUI, the signal processing pipeline comprising one or more signal processing operations.
    Type: Grant
    Filed: December 31, 2014
    Date of Patent: October 2, 2018
    Assignee: Excalibur IP, LLC
    Inventors: Guangxin Yang, Ji Zhou, Shuo Yang, Yan Xia, Delu Zhu
  • Patent number: 10083039
    Abstract: A processor core having multiple parallel instruction execution slices and coupled to multiple dispatch queues by a dispatch routing network provides flexible and efficient use of internal resources. The configuration of the execution slices is selectable so that capabilities of the processor core can be adjusted according to execution requirements for the instruction streams. Two or more execution slices can be combined as super-slices to handle wider data, wider operands and/or vector operations, according to one or more mode control signal that also serves as a configuration control signal. The mode control signal is also used to partition clusters of the execution slices within the processor core according to whether single-threaded or multi-threaded operation is selected, and additionally according to a number of hardware threads that are active.
    Type: Grant
    Filed: January 30, 2018
    Date of Patent: September 25, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Lee Evan Eisen, Hung Qui Le, Jentje Leenstra, Jose Eduardo Moreira, Bruce Joseph Ronchetti, Brian William Thompto, Albert James Van Norstrand, Jr.
  • Patent number: 10078516
    Abstract: Techniques are disclosed for back-to-back issue of instructions in a processor. A first instruction is stored in a queue position in an issue queue. The issue queue stores instructions in a corresponding queue position. The first instruction includes a target instruction tag and at least a source instruction tag. The target instruction tag is stored in a table storing a plurality of target instruction tags associated with a corresponding instruction. Each stored target instruction tag specifies a logical register that stores a target operand. Upon determining, based on the source instruction tag associated with the first instruction and the target instruction tag associated with a second instruction, that the first instruction is dependent on the second instruction, a pointer to the first instruction is associated with the second instruction. The pointer is used to wake up the first instruction upon issue of the second instruction.
    Type: Grant
    Filed: August 24, 2015
    Date of Patent: September 18, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jeffrey C. Brownscheidle, Sundeep Chadha, Maureen A. Delaney, Dung Q. Nguyen
  • Patent number: 9983875
    Abstract: Operation of a multi-slice processor that includes a plurality of execution slices, a plurality of load/store slices, and an instruction sequencing unit, where operation includes: receiving, at a load/store slice, a load instruction to be issued; determining, at the load/store slice, that the load instruction has not completed and is to be reissued; and responsive to determining that the load instruction is to be reissued, delaying a signal, from the load/store slice to the instruction sequencing unit, that allows the instruction sequencing unit to issue one or more instructions dependent upon the load instruction.
    Type: Grant
    Filed: March 4, 2016
    Date of Patent: May 29, 2018
    Assignee: International Business Machines Corporation
    Inventors: Sundeep Chadha, David A. Hrusecky, Elizabeth A. McGlone, Jennifer L. Molnar
  • Patent number: 9971600
    Abstract: Techniques are disclosed for back-to-back issue of instructions in a processor. A first instruction is stored in a queue position in an issue queue. The issue queue stores instructions in a corresponding queue position. The first instruction includes a target instruction tag and at least a source instruction tag. The target instruction tag is stored in a table storing a plurality of target instruction tags associated with a corresponding instruction. Each stored target instruction tag specifies a logical register that stores a target operand. Upon determining, based on the source instruction tag associated with the first instruction and the target instruction tag associated with a second instruction, that the first instruction is dependent on the second instruction, a pointer to the first instruction is associated with the second instruction. The pointer is used to wake up the first instruction upon issue of the second instruction.
    Type: Grant
    Filed: June 26, 2015
    Date of Patent: May 15, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Jeffrey C. Brownscheidle, Sundeep Chadha, Maureen A. Delaney, Dung Q. Nguyen
  • Patent number: 9971601
    Abstract: Dynamic resource allocation is provided in which additional resources, such as additional architected registers, are provided to an instruction, if it is determined that resources in addition to those configured to be provided to the instruction are to be used for the particular instruction. An instruction to be executed is dispatched on a pipe of a pipeline and that pipe is configured to have a set number of architected registers for use by the instruction. However, if one or more other architected registers are needed, those additional architected registers are dynamically allocated to the instruction by assigning one or more source ports of an additional pipe to the instruction.
    Type: Grant
    Filed: February 13, 2015
    Date of Patent: May 15, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Gregory W. Alexander, Brian D. Barrick, Fadi Y. Busaba, Wen H. Li, Edward T. Malley
  • Patent number: 9870229
    Abstract: Embodiments of the present invention provide systems and methods for mapping the architected state of one or more threads to a set of distributed physical register files to enable independent execution of one or more threads in a multiple slice processor. In one embodiment, a system is disclosed including a plurality of dispatch queues which receive instructions from one or more threads and an even number of parallel execution slices, each parallel execution slice containing a register file. A routing network directs an output from the dispatch queues to the parallel execution slices and the parallel execution slices independently execute the one or more threads.
    Type: Grant
    Filed: September 29, 2015
    Date of Patent: January 16, 2018
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Sam G. Chu, Markus Kaltenbach, Hung Q. Le, Jentje Leenstra, Jose E. Moreira, Dung Q. Nguyen, Brian W. Thompto
  • Patent number: 9811377
    Abstract: A method for executing multithreaded instructions grouped into blocks. The method includes receiving an incoming instruction sequence using a global front end; grouping the instructions to form instruction blocks, wherein the instructions of the instruction blocks are interleaved with multiple threads; scheduling the instructions of the instruction block to execute in accordance with the multiple threads; and tracking execution of the multiple threads to enforce fairness in an execution pipeline.
    Type: Grant
    Filed: March 14, 2014
    Date of Patent: November 7, 2017
    Assignee: INTEL CORPORATION
    Inventor: Mohammad Abdallah
  • Patent number: 9811343
    Abstract: A method, system, and computer program product synchronize a group of workitems executing an instruction stream on a processor. The processor is yielded by a first workitem responsive to a synchronization instruction in the instruction stream. A first one of a plurality of program counters is updated to point to a next instruction following the synchronization instruction in the instruction stream to be executed by the first workitem. A second workitem is run on the processor after the yielding.
    Type: Grant
    Filed: May 26, 2017
    Date of Patent: November 7, 2017
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Lee W. Howes, Benedict R. Gaster, Michael C. Houston
  • Patent number: 9747238
    Abstract: A computer processor including a plurality of functional units that performs operations that produce result operands at different characteristic latencies over multiple cycles. An interconnect network provides data paths for transfer of operand data between functional units. The interconnect network includes first and second crossbar parts. The first crossbar part is configured to route result operands produced with the lowest characteristic latency to any other functional unit. The second crossbar part is configured to route result operands with higher characteristic latency relative to the lowest characteristic latency to the first crossbar part where such result operands are in turn routed to any functional unit. In another aspect, the functional units can be organized as multiple slots where each slot can produce multiple result operands of different characteristic latencies in the same cycle, and wherein each slot employs separate result registers for each characteristic latency present on the slot.
    Type: Grant
    Filed: June 23, 2014
    Date of Patent: August 29, 2017
    Assignee: MILL COMPUTING, INC.
    Inventors: Roger Rawson Godard, Arthur David Kahlich, Sebastien Paul Maurice Mirolo, David Arthur Yost
  • Patent number: 9720697
    Abstract: In an embodiment, a method is provided. The method includes managing user-level threads on a first instruction sequencer in response to executing user-level instructions on a second instruction sequencer that is under control of an application level program. A first user-level thread is run on the second instruction sequencer and contains one or more user level instructions. A first user level instruction has at least 1) a field that makes reference to one or more instruction sequencers or 2) implicitly references with a pointer to code that specifically addresses one or more instruction sequencers when the code is executed.
    Type: Grant
    Filed: September 10, 2012
    Date of Patent: August 1, 2017
    Assignee: INTEL CORPORATION
    Inventors: Hong Wang, John Shen, Ed Grochowski, James Paul Held, Bryant Bigbee, Shivnandan D. Kaushik, Gautham Chinya, Xiang Zou, Per Hammarlund, Xinmin Tian, Anil Aggarwal, Scott Dion Rodgers, Prashant Sethi, Baiju V. Patel, Richard Andrew Hankins