Simultaneous Issuance Of Multiple Instructions Patents (Class 712/215)
  • Patent number: 11663011
    Abstract: Very long instruction word (VLIW) instruction processing using a reduced-width processor is disclosed. In a particular embodiment, a VLIW processor includes a control circuit configured to receive a VLIW packet that includes a first number of instructions and to distribute the instructions to a second number of instruction execution paths. The first number is greater than the second number. The VLIW processor also includes physical registers configured to store results of executing the instructions and a register renaming circuit that is coupled to the control circuit.
    Type: Grant
    Filed: July 7, 2020
    Date of Patent: May 30, 2023
    Assignee: Qualcomm Incorporated
    Inventors: Peter Sassone, Christopher Koob, Suresh Kumar Venkumahanti
  • Patent number: 11579878
    Abstract: An apparatus is disclosed. The apparatus includes one or more processors comprising register sharing circuitry to receive meta-information indicating a number of threads that are to be disabled and provide an indication that an associated thread is disabled, a plurality of General Purpose Register Files (GRFs), wherein one or more of the plurality of GRFs is associated with one of the plurality of threads and a plurality of multiplexers coupled to the one or more GRFs to receive the indication from the register sharing circuitry and disable thread access to an associated GRF based on an indication that a thread is to be disabled.
    Type: Grant
    Filed: May 22, 2020
    Date of Patent: February 14, 2023
    Assignee: Intel Corporation
    Inventors: Pratik J. Ashar, Supratim Pal, Subramaniam Maiyuran, Wei-Yu Chen, Guei-Yuan Lueh
  • Patent number: 11544062
    Abstract: An apparatus and method for pairing store operations. For example, one embodiment of a processor comprises: a grouping eligibility checker to evaluate a plurality of store instructions based on a set of grouping rules to determine whether two or more of the plurality of store instructions are eligible for grouping; and a dispatcher to simultaneously dispatch a first group of store instructions of the plurality of store instructions determined to be eligible for grouping by the grouping eligibility checker.
    Type: Grant
    Filed: March 28, 2020
    Date of Patent: January 3, 2023
    Assignee: Intel Corporation
    Inventors: Raanan Sade, Igor Yanover, Stanislav Shwartsman, Muhammad Taher, David Zysman, Liron Zur, Yiftach Gilad
  • Patent number: 11531552
    Abstract: Systems and methods are disclosed for allocating resources to contexts in block-based processor architectures. In one example of the disclosed technology, a processor is configured to spatially allocate resources between multiple contexts being executed by the processor, including caches, functional units, and register files. In a second example of the disclosed technology, a processor is configured to temporally allocate resources between multiple contexts, for example, on a clock cycle basis, including caches, register files, and branch predictors. Each context is guaranteed access to its allocated resources to avoid starvation from contexts competing for resources of the processor. A results buffer can be used for folding larger instruction blocks into portions that can be mapped to smaller-sized instruction windows. The results buffer stores operand results that can be passed to subsequent portions of an instruction block.
    Type: Grant
    Filed: February 6, 2017
    Date of Patent: December 20, 2022
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Gagan Gupta, Douglas C. Burger
  • Patent number: 11429393
    Abstract: An apparatus for data processing and a method of data processing are provided. Data processing operations are performed in response to instructions which reference architectural registers using physical registers to store data values when performing the data processing operations. Mappings between the architectural registers and the physical registers are stored, and when a data hazard condition is identified with respect to out-of-order program execution of an instruction, an architectural register specified in the instruction is remapped to an available physical register. A reorder buffer stores an entry for each destination architectural register specified by the instruction, entries being stored in program order, and an entry specifies a destination architectural register and an original physical register to which the destination architectural register was mapped before the architectural register remapped to an available physical register.
    Type: Grant
    Filed: November 11, 2015
    Date of Patent: August 30, 2022
    Assignee: ARM LIMITED
    Inventors: Vladimir Vasekin, Ian Michael Caulfield, Chiloda Ashan Senarath Pathirane
  • Patent number: 11422822
    Abstract: Techniques are disclosed relating to sharing datapath circuitry among multiple SIMD groups. In some embodiments, pipeline circuitry is configured to perform operations specified by instructions of first and second assigned SIMD groups. The pipeline circuitry may include first and second front-end circuitry configured to decode instructions of the respective SIMD groups. The pipeline circuitry may include shared execution circuitry configured to perform operations specified by the first and second assigned SIMD groups and arbitration circuitry configured to select an instruction from among at least the first and second front-end circuitry for assignment to the shared execution circuitry in a current cycle. The arbitration circuitry may select an instruction based on one or more of: stall counts, whether available instructions are being speculatively executed, whether ones of available instructions target a particular portion of the shared execution circuitry, numbers of execution cycles, and SIMD group ages.
    Type: Grant
    Filed: May 8, 2020
    Date of Patent: August 23, 2022
    Assignee: Apple Inc.
    Inventors: Robert D. Kenney, Jason N. Dale
  • Patent number: 11403388
    Abstract: An extracting unit randomly extracts a block from among the blocks of instruction strings constituting the byte code of a first program and, at the time of execution of the first program, extracts the blocks which are invariably executed before the randomly-extracted block. A dividing unit randomly divides, into a plurality of blocks, the instruction strings constituting the byte code of a second program which enables detection of tampering of the first program. An inserting unit inserts the plurality of blocks, which are obtained by division by the dividing unit, at different positions in the block extracted by the extracting unit, while maintaining the execution sequence written in the second program.
    Type: Grant
    Filed: September 22, 2017
    Date of Patent: August 2, 2022
    Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION
    Inventors: Fumihiro Kanei, Mitsuaki Akiyama, Yuta Takata, Takeshi Yagi
  • Patent number: 11379243
    Abstract: A microprocessor with a multistep-ahead branch predictor is shown. The branch predictor is coupled to an instruction cache and has an N-stage pipelined architecture, which is configured to perform branch prediction to control the instruction fetching of the instruction cache. The branch predictor performs branch prediction for (N?1) instruction-address blocks in parallel, wherein the (N?1) instruction-address blocks include a starting instruction-address block and (N?2) subsequent instruction-address blocks. The branch predictor is thereby ahead of branch prediction of the starting instruction-address block. The branch predictor stores reference information about branch prediction in at least one memory and performs a parallel search of the memory for the branch prediction of the (N-1) instruction-address blocks.
    Type: Grant
    Filed: October 29, 2020
    Date of Patent: July 5, 2022
    Assignee: SHANGHAI ZHAOXIN SEMICONDUCTOR CO., LTD.
    Inventors: Fangong Gong, Mengchen Yang, Guohua Chen
  • Patent number: 11327903
    Abstract: An apparatus has memory access circuitry to perform a tag-guarded memory access operation in response to a target address. The tag-guarded memory access operation comprises: comparing an address tag associated with the target address with a guard tag stored in a memory system in association with a block of one or more memory locations comprising an addressed location identified by the target address, and generating an indication of whether a match is detected between the guard tag and the address tag. An instruction decoder decodes a multiple guard tag setting instruction to control the memory access circuitry to trigger memory accesses to update the guard tags associated with at least two consecutive blocks of one or more memory locations.
    Type: Grant
    Filed: December 10, 2018
    Date of Patent: May 10, 2022
    Assignee: Arm Limited
    Inventor: Graeme Peter Barnes
  • Patent number: 11308057
    Abstract: Described herein is a system and method for multiplexer tree (muxtree) indexing. Muxtree indexing performs hashing and row reduction in parallel by use of each select bit only once in a particular path of the muxtree. The muxtree indexing generates a different final index as compared to conventional hashed indexing but still results in a fair hash, where all table entries get used with equal distribution with uniformly random selects.
    Type: Grant
    Filed: November 28, 2017
    Date of Patent: April 19, 2022
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Steven R. Havlir, Patrick J. Shyvers
  • Patent number: 11275614
    Abstract: A computer system includes a processor, main memory, and controller. The processor includes a plurality of hardware threads configured to execute a plurality of software threads. The main memory includes a first register table configured to contain a current set of architected registers for the currently running software threads. The controller is configured to change a first number of the architected registers assigned to a given one of the software threads to a second number of architected registers when a result of monitoring current usage of the registers by the software threads indicates that the change will improve performance of the computer system. The processor includes a second register table configured to contain a subset of the architected registers and a mapping table for each software thread indicating whether the architected registers referenced by the corresponding software thread are located in the first register table or the second register table.
    Type: Grant
    Filed: September 27, 2019
    Date of Patent: March 15, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Harold W. Cain, III, Hubertus Franke, Charles R. Johns, Hung Q. Le, Ravi Nair, James A. Kahle
  • Patent number: 11256511
    Abstract: A method of performing instruction scheduling during execution in a processor includes receiving, at an execution unit of the processor, an initial assignment of an assigned execution resource among two or more execution resources to execute an operation. An instruction includes two or more operations. Based on determining that the assigned execution resource is not available, the method also includes determining, at the execution unit, whether another execution resource among the two or more execution resources is available to execute the operation. Based on determining that the other execution resource is available, the method further includes executing the operation with the other execution resource.
    Type: Grant
    Filed: May 20, 2019
    Date of Patent: February 22, 2022
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Cedric Lichtenau, Stefan Payer, Kerstin Claudia Schelm, Anthony Saporito, Gregory William Alexander
  • Patent number: 11218322
    Abstract: Techniques and apparatuses for issuance of license upgrades for hardware components in the field, as well as the hardware components, are described. In one embodiment, for example an apparatus may include processor circuitry and memory in communication with the processor circuitry, wherein the memory contains a configuration data block and license data block, the configuration data block being read from the memory via a licensing apparatus and the licensing data block being written to the memory by the licensing apparatus. The processor may include executable code to process the licensing data block to facilitate an upgrade of the capabilities of the processor circuitry.
    Type: Grant
    Filed: September 28, 2017
    Date of Patent: January 4, 2022
    Assignee: INTEL CORPORATION
    Inventors: Sergiu D. Ghetie, Neeraj S. Upasani, Chukwunenye S. Nnebe, Won Lee, Shaila R. Murty, Arkadiusz Berent, Vasuki Chilukuri, David T. Mayo, Scott P. Bobholz, Vinila Rose, Wojciech S. Powiertowski
  • Patent number: 11132228
    Abstract: A computing device and a method of allocating vector register files in a simultaneously-multithreaded (SMT) processor core are provided. A request for a first number (M) of vector register files is received from a borrower thread of the processor core. One or more available donor threads of the processor core are identified. A second number (N) of the vector register files, of the identified one or more available donor threads, are assigned to the borrower thread, where N is ?M. The borrower thread is parameterized to create a virtualized vector register file for the borrower thread, based on a width of the N vector register files of the identified one or more donor threads.
    Type: Grant
    Filed: March 21, 2018
    Date of Patent: September 28, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Mauricio Serrano, Giles Frazier, Silvia Melitta Mueller
  • Patent number: 11132486
    Abstract: Systems and method are provided that include a standard cell with multiple input and output storage elements, such as flip flops, latches, etc., with some combination logic interconnected between them. In embodiments, the slave latches on input flip flops are replaced with a fewer number latches at a downstream node(s) of the combination logic resulting in improved performance, area and power, while maintaining functionality at the interface pins of the standard cell. The process of inferring such a standard cell from a behavioral description, such as RTL, of a design or remapping equivalent sub-circuits from a netlist to such a standard cell is also described.
    Type: Grant
    Filed: May 21, 2020
    Date of Patent: September 28, 2021
    Assignee: Taiwan Semiconductor Manufacturing Company, Ltd.
    Inventors: Guru Prasad, Sachin Kumar
  • Patent number: 11126438
    Abstract: In one embodiment, a reservation station of a processor includes: a plurality of first lanes having a plurality of entries to store information for instructions having in-order dependencies; a variable latency tracking table including a second plurality of entries to store information for instructions having a variable latency; and a scheduler circuit to access a head entry of the plurality of first lanes to schedule, for execution on at least one execution unit, at least one instruction from the head entry of at least one of the plurality of first lanes. Other embodiments are described and claimed.
    Type: Grant
    Filed: June 26, 2019
    Date of Patent: September 21, 2021
    Assignee: Intel Corporation
    Inventors: Srikanth Srinivasan, Thomas Mullins, Ammon Christiansen, James Hadley, Robert S. Chappell, Sean Mirkes
  • Patent number: 11106363
    Abstract: A nonvolatile memory device includes a nonvolatile memory, a volatile memory being a cache memory of the nonvolatile memory, and a first controller configured to control the nonvolatile memory. The nonvolatile memory device further includes a second controller configured to receive a device write command and an address, and transmit, to the volatile memory through a first bus, a first read command and the address and a first write command and the address sequentially, and transmit a second write command and the address to the first controller through a second bus, in response to the reception of the device write command and the address.
    Type: Grant
    Filed: May 17, 2019
    Date of Patent: August 31, 2021
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Youngjin Cho, Sungyong Seo, Sun-Young Lim, Uksong Kang, Chankyung Kim, Duckhyun Chang, JinHyeok Choi
  • Patent number: 11055099
    Abstract: A method and system of the branch look-ahead (BLA) instruction disassembling, assembling, and delivering are designed for improving speed of branch prediction and instruction fetch of microprocessor systems by reducing the amount of clock cycles required to deliver branch instructions to a branch predictor located inside the microprocessors. The invention is also designed for reducing run-length of the instructions found between branch instructions by disassembling the instructions in a basic block as a BLA instruction and a single or plurality of non-BLA instructions from the software/assembly program. The invention is also designed for dynamically reassembling the BLA and the non-BLA instructions and delivering them to a single or plurality of microprocessors in a compatible sequence. In particular, the reassembled instructions are concurrently delivered to a single or plurality of microprocessors in a timely and precise manner while providing compatibility of the software/assembly program.
    Type: Grant
    Filed: February 17, 2019
    Date of Patent: July 6, 2021
    Inventor: Yong-Kyu Jung
  • Patent number: 11048515
    Abstract: Disclosed herein are systems and method for instruction tightly-coupled memory (iTIM) and instruction cache (iCache) access prediction. A processor may use a predictor to enable access to the iTIM or the iCache and a particular way (a memory structure) based on a location state and program counter value. The predictor may determine whether to stay in an enabled memory structure, move to and enable a different memory structure, or move to and enable both memory structures. Stay and move predictions may be based on whether a memory structure boundary crossing has occurred due to sequential instruction processing, branch or jump instruction processing, branch resolution, and cache miss processing. The program counter and a location state indicator may use feedback and be updated each instruction-fetch cycle to determine which memory structure(s) needs to be enabled for the next instruction fetch.
    Type: Grant
    Filed: August 28, 2019
    Date of Patent: June 29, 2021
    Assignee: SiFive, Inc.
    Inventors: Krste Asanovic, Andrew Waterman
  • Patent number: 10990745
    Abstract: An integrated circuit includes a first bit flip-flop and a second flip-flop. The first flip-flop has a first driving capability. The second flip-flop has a second driving capability different from the first driving capability. The first flip-flop and the second flip-flop are part of a multibit flip-flop configured to share at least a first clock pin. The first clock pin is configured to receive the first clock signal.
    Type: Grant
    Filed: September 3, 2019
    Date of Patent: April 27, 2021
    Assignee: TAIWAN SEMICONDUCTOR MANUFACTURING COMPANY LTD.
    Inventors: Sheng-Hsiung Chen, Shao-Huan Wang, Wen-Hao Chen, Chun-Yao Ku, Hung-Chih Ou
  • Patent number: 10956160
    Abstract: A processor and method are described for a multi-level reservation station.
    Type: Grant
    Filed: March 27, 2019
    Date of Patent: March 23, 2021
    Assignee: Intel Corporation
    Inventors: Mark Dechene, Srikanth Srinivasan, Matthew Merten, Ammon Christiansen
  • Patent number: 10942745
    Abstract: Fast issuance and execution of a multi-width instruction across multiple slices in a parallel slice processor core is supported in part through the use of an early notification signal passed between issue logic associated with multiple slices handling that multi-width instruction coupled with an issuance of a different instruction by the originating issue logic for the early notification signal.
    Type: Grant
    Filed: September 25, 2018
    Date of Patent: March 9, 2021
    Assignee: International Business Machines Corporation
    Inventors: Salma Ayub, Jeffrey C. Brownscheidle, Sundeep Chadha, Dung Q. Nguyen, Tu-An T. Nguyen, Salim A. Shah, Brian W. Thompto
  • Patent number: 10936321
    Abstract: An approach is disclosed that that in one or more embodiments includes receiving an indicator to issue an out-of-order instruction or a type of out-of-order instruction in-order; receiving a first instruction; determining whether the first instruction corresponds to the indicated out-of-order instruction or the type of out-of-order instruction; writing, in response to determining that the first instruction corresponds to the indicated out-of-order instruction or the type of out-of-order instruction, an instruction identifier and a dependent instruction opcode into a first queue and an issue queue of the processor; receiving at least one subsequent instruction; determining whether an instruction opcode of the subsequent instructions matches the dependent instruction opcode of the first instruction; and writing, in response to determining the instruction opcode of the subsequent instruction matches the dependent instruction opcode of the instruction, a dependent instruction identifier for the subsequent instruc
    Type: Grant
    Filed: February 1, 2019
    Date of Patent: March 2, 2021
    Assignee: International Business Machines Corporation
    Inventors: Kurt A. Feiste, Joshua W. Bowman, Christopher M. Mueller, Dung Q. Nguyen, Deepak K. Singh, Brian W. Thompto
  • Patent number: 10929140
    Abstract: Aspects of the invention include tracking dependencies between instructions in an issue queue. The tracking includes, for each instruction in the issue queue, identifying whether the instruction is dependent on each of a threshold number of instructions added to the issue queue prior to the instruction. The tracking also includes identifying whether the instruction is dependent on one or more other instructions in a group of instructions in the issue queue that were added to the issue queue prior to the instruction and that are not included in the threshold number of instructions that are tracked individually. A dependency between the instruction and the one or more other instructions in the group of instructions is tracked using a single summary bit that is set to indicate that a dependency exists between the instruction and the group of instructions. Instructions are issued from the issue queue based at least in part on the tracking.
    Type: Grant
    Filed: November 30, 2017
    Date of Patent: February 23, 2021
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Joel A. Silberman, Balaram Sinharoy
  • Patent number: 10884743
    Abstract: A method of activating scheduling instructions within a parallel processing unit is described. The method comprises decoding, in an instruction decoder, an instruction in a scheduled task in an active state and checking, by an instruction controller, if a swap flag is set in the decoded instruction. If the swap flag in the decoded instruction is set, a scheduler is triggered to de-activate the scheduled task by changing the scheduled task from the active state to a non-active state.
    Type: Grant
    Filed: June 18, 2018
    Date of Patent: January 5, 2021
    Assignee: Imagination Technologies Limited
    Inventors: Simon Nield, Yoong-Chert Foo, Adam de Grasse, Luca Iuliano
  • Patent number: 10860327
    Abstract: A method for scheduling micro-instructions, performed by a qualifier, is provided. The method includes the following steps: detecting a load write-back signal broadcasted by a load execution unit; determining whether to trigger a load-detection counting logic according to content of the load write-back signal; determining whether an execution status of a load micro-instruction is cache hit when the triggered load-detection counting logic reaches a predetermined value; and driving a release circuit to remove the first micro-instruction in a reservation station queue when the execution status of the load micro-instruction is cache hit and the first micro-instruction has been dispatched to an arithmetic and logic unit for execution.
    Type: Grant
    Filed: October 2, 2018
    Date of Patent: December 8, 2020
    Assignee: SHANGHAI ZHAOXIN SEMICONDUCTOR CO., LTD.
    Inventor: Xiaolong Fei
  • Patent number: 10853076
    Abstract: An apparatus is provided to perform branch prediction in respect of a plurality of instructions divided into a plurality of blocks. Receiving circuitry receives references to at least two blocks in the plurality of blocks. Branch prediction circuitry performs at least two branch predictions at a time. The branch predictions are performed in respect of the at least two blocks and the at least two blocks are non-contiguous.
    Type: Grant
    Filed: February 21, 2018
    Date of Patent: December 1, 2020
    Assignee: Arm Limited
    Inventors: Houdhaifa Bouzguarrou, Guillaume Bolbenes, Eddy Lapeyre, Luc Orion
  • Patent number: 10831496
    Abstract: The present disclosure relates to a method to execute successive dependent instructions from an instruction stream in a processor. In an embodiment, the invention relates to a method to execute successive dependent instructions from an instruction stream in a processor. The method may include identifying a first instruction and a second instruction. A given operand of a second instruction is an output of the first instruction of the pair. The first instruction is older than the second instruction. The method may include loading the operands of the first instruction and the second instruction. The method may include executing the first instruction and the second instruction.
    Type: Grant
    Filed: February 28, 2019
    Date of Patent: November 10, 2020
    Assignee: International Business Machines Corporation
    Inventors: Maarten J. Boersma, Michael Klaus Kroener, Niels Fricke, Razvan Peter Figuli, Nandor Szirmak, Dung Q. Nguyen
  • Patent number: 10831232
    Abstract: A computer architecture suitable for out-of-order processors manages the problem of timing slack, in which an instruction completes before its clock cycle, by recycling that slack to allow the next succeeding instruction allowing that instruction to begin execution earlier. This recycling mechanism is enabled through the use of a transparent gating between execution units which allows data transfer before clock cycle boundaries and, in some cases, by aggressively issuing children instructions contemporaneously with their parent instruction after a grandparent instruction is issued.
    Type: Grant
    Filed: February 15, 2019
    Date of Patent: November 10, 2020
    Assignee: Wisconsin Alumni Research Foundation
    Inventors: Gokul Subramanian Ravi, Mikko H. Lipasti
  • Patent number: 10747541
    Abstract: Instructions are executed in a pipeline. Storage accessible to the pipeline stores branch prediction information characterizing results of branch instructions previously executed. A predicted branch result is provided, for at least some branch instructions, based on a selected predictor of multiple predictors. An actual branch result is provided based on an executed branch instruction, and the branch prediction information is updated based on the actual branch result. The predictors include: a first predictor that determines the predicted branch result based on at least a portion of the branch prediction information; and a second predictor that determines the predicted branch result independently from the branch prediction information.
    Type: Grant
    Filed: January 25, 2018
    Date of Patent: August 18, 2020
    Assignee: Marvell Asia Pte, Ltd.
    Inventors: Shubhendu Sekhar Mukherjee, David Kravitz, Edward J. McLellan
  • Patent number: 10732976
    Abstract: A processor includes an instruction pipeline. The pipeline can be operated alternatively in a multi-thread mode and in a single-thread mode. In the multi-thread mode, the instruction pipeline processes multiple threads in an interleaved or simultaneous manner. In the single-thread mode, the pipeline processes a single thread. The instruction pipeline comprises multiple functional units, each of which is reserved for one thread among the multiple threads when the pipeline is in the multi-thread mode and reserved for one context layer among multiple context layers when the instruction pipeline is in the single-thread mode.
    Type: Grant
    Filed: January 10, 2013
    Date of Patent: August 4, 2020
    Assignee: NXP USA, Inc.
    Inventors: Alistair Robertson, Jeffrey W. Scott
  • Patent number: 10719325
    Abstract: Very long instruction word (VLIW) instruction processing using a reduced-width processor is disclosed. In a particular embodiment, a VLIW processor includes a control circuit configured to receive a VLIW packet that includes a first number of instructions and to distribute the instructions to a second number of instruction execution paths. The first number is greater than the second number. The VLIW processor also includes physical registers configured to store results of executing the instructions and a register renaming circuit that is coupled to the control circuit.
    Type: Grant
    Filed: November 7, 2017
    Date of Patent: July 21, 2020
    Assignee: Qualcomm Incorporated
    Inventors: Peter Sassone, Christopher Koob, Suresh Kumar Venkumahanti
  • Patent number: 10705851
    Abstract: A method for scheduling micro-instructions, performed by a first qualifier, is provided. The method includes the following steps: detecting a write-back signal broadcasted by a second qualifier; determining whether a value of a first load-detection counting logic is to be synchronized with a value of a second load-detection counting logic carried by the write-back signal according to content of the write-back signal; determining whether execution statuses of all load micro-instructions are cache hit when the synchronized value of the first load-detection counting logic reaches a predetermined value; and driving a release circuit to remove a micro-instruction in a reservation station queue when the execution statuses of the all load micro-instructions are cache hit and the micro-instruction has been dispatched to an arithmetic and logic unit for execution.
    Type: Grant
    Filed: October 2, 2018
    Date of Patent: July 7, 2020
    Assignee: SHANGHAI ZHAOXIN SEMICONDUCTOR CO., LTD.
    Inventor: Xiaolong Fei
  • Patent number: 10698729
    Abstract: The invention relates to a method for organizing tasks, in at least some nodes of a computer cluster, comprising: First, launching two containers on each of said nodes, a standard container and a priority container, next, for all or part of said nodes with two containers, at each node, while a priority task does not occur, assigning one or more available resources of the node to the standard container thereof in order to execute a standard task, the priority container thereof not executing any task, when a priority task occurs, dynamically switching only a portion of the resources from the standard container thereof to the priority container thereof, such that, the priority task is executed in the priority container with the switched portion of the resources, and the standard task continues to be executed, without being halted, in the standard container with the non-switched portion of the resources.
    Type: Grant
    Filed: December 16, 2015
    Date of Patent: June 30, 2020
    Assignee: BULL SAS
    Inventors: Yann Maupu, Thomas Cadeau, Matthieu Daniel
  • Patent number: 10678542
    Abstract: Systems, apparatuses, and methods for implementing a non-shifting reservation station. A dispatch unit may write an operation into any entry of a reservation station. The reservation station may include an age matrix for determining the relative ages of the operations stored in the entries of the reservation station. The reservation station may include selection logic which is configured to pick the oldest ready operation from the reservation station based on the values stored in the age matrix. The selection logic may utilize control logic to mask off columns of an age matrix corresponding to non-ready operation so as to determine which operation is the oldest ready operation in the reservation station. Also, the reservation station may be configured to dequeue operations early when these operations do not have load dependency.
    Type: Grant
    Filed: July 24, 2015
    Date of Patent: June 9, 2020
    Assignee: Apple Inc.
    Inventors: Ian D. Kountanis, Mahesh K. Reddy
  • Patent number: 10620960
    Abstract: An apparatus and method are provided for performing branch prediction. The apparatus has processing circuitry for executing instructions out-of-order with respect to original program order, and event counting prediction circuitry for maintaining event count values for branch instructions, for use in making branch outcome predictions for those branch instructions. Further, checkpointing storage stores state information of the apparatus at a plurality of checkpoints to enable the state information to be restored for a determined one of those checkpoints in response to a flush event. The event counting prediction circuitry has training storage with a first number of training entries, each training entry being associated with a branch instruction.
    Type: Grant
    Filed: August 20, 2018
    Date of Patent: April 14, 2020
    Assignee: Arm Limited
    Inventors: Houdhaifa Bouzguarrou, Guillaume Bolbenes, Vincenzo Consales
  • Patent number: 10613993
    Abstract: Program code intended to be copied into the cache memory of a microprocessor is transferred encrypted between the random-access memory and the processor, and the decryption is carried out at the level of the cache memory. A checksum may be inserted into the cache lines in order to allow integrity verification, and this checksum is then replaced with a specific instruction before delivery of an instruction word to the central unit of the microprocessor.
    Type: Grant
    Filed: January 30, 2015
    Date of Patent: April 7, 2020
    Assignee: STMICROELECTRONICS SA
    Inventor: Bruno Fel
  • Patent number: 10606593
    Abstract: Technical solutions are described for out-of-order (OoO) execution of one or more instructions by a processing unit. An example method includes looking up, by a load-store unit (LSU), an entry in an effective address directory (EAD) for an effective address (EA) of an operand of an instruction to be launched. Further, the method includes, in response to the EA being present in the EAD, launching, by the LSU, the instruction with the RA from the EAD, and in response to the EA not being present in the EAD, looking up, by the LSU, the EA in an effective real table (ERT) entry, and launching the instruction with the RA from the ERT entry. Further, in response to the ERT entry to be removed, the ERT entry including an ERT index and a mapping between the EA and the RA, removing the entry of the EA from the EAD.
    Type: Grant
    Filed: November 29, 2017
    Date of Patent: March 31, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Bryan Lloyd, Balaram Sinharoy
  • Patent number: 10606590
    Abstract: Technical solutions are described for out-of-order (OoO) execution of one or more instructions by a processing unit. An example method includes looking up, by a load-store unit (LSU), an entry in an effective address directory (EAD) for an effective address (EA) of an operand of an instruction to be launched. Further, the method includes, in response to the EA being present in the EAD, launching, by the LSU, the instruction with the RA from the EAD, and in response to the EA not being present in the EAD, looking up, by the LSU, the EA in an effective real table (ERT) entry, and launching the instruction with the RA from the ERT entry. Further, in response to the ERT entry to be removed, the ERT entry including an ERT index and a mapping between the EA and the RA, removing the entry of the EA from the EAD.
    Type: Grant
    Filed: October 6, 2017
    Date of Patent: March 31, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Bryan Lloyd, Balaram Sinharoy
  • Patent number: 10592298
    Abstract: A system and method for processing a data packet. The method comprises initiating processing of a received plurality of data packets by CPU cores; tracking, by a scale management routine, processing queues for the CPU cores and their load. In response to an average size of a processing queue being lower than a first pre-determined queue threshold, and a CPU core load being lower than a first pre-determined load threshold, preventing adding new data packets to the processing queue, monitoring emptying of processing queues for each processing CPU core. In response to an average size of a processing queue or a CPU core load being above a second pre-determined upper queue threshold or the second pre-determined load threshold, transmitting all data from processing queues for each processing CPU core to a memory buffer, increasing the number of processing cores by one; and initiating data packet processing.
    Type: Grant
    Filed: January 26, 2018
    Date of Patent: March 17, 2020
    Assignee: NFWARE, INC.
    Inventors: Alexander Britkin, Viacheslav Morozov, Igor Pavlov
  • Patent number: 10585669
    Abstract: A system and method suppresses occurrence of stalling caused by data dependency other than register dependency in an out-of-order processor. A stall reducing method includes a handler for detecting a stall occurring during execution of execution code using a performance monitoring unit, and for identifying, based on dependencies, a second instruction on which a first instruction is data dependent, the stall based on this dependency. A profiler registers the second instruction as profile information. An optimization module inserts a thread yield instruction in an appropriate position inside execution code or an original code file based on the profile information, and outputs optimized execution code.
    Type: Grant
    Filed: July 31, 2018
    Date of Patent: March 10, 2020
    Assignee: International Business Machines Corporation
    Inventor: Takeshi Ogasawara
  • Patent number: 10564976
    Abstract: Aspects of the invention include tracking dependencies between instructions in an issue queue. The tracking includes, for each instruction in the issue queue, identifying whether the instruction is dependent on each of a threshold number of instructions added to the issue queue prior to the instruction. The tracking also includes identifying whether the instruction is dependent on one or more other instructions added to the issue queue prior to the instruction that are not included in the each of the threshold number of instructions. A dependency between the instruction and each of the other instructions is tracked as a plurality of groups by indicating that a dependency exists between the instruction and one of the groups based on identifying a dependency between the instruction and at least one instruction in the group. Instructions are issued from the issue queue based at least in part on the tracking.
    Type: Grant
    Filed: November 30, 2017
    Date of Patent: February 18, 2020
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Joel A. Silberman, Balaram Sinharoy
  • Patent number: 10558460
    Abstract: Systems and techniques are disclosed for general purpose register dynamic allocation based on latency associated with of instructions in processor threads. A streaming processor can include a general purpose registers configured to stored data associated with threads, and a thread scheduler configured to receive allocation information for the general purpose registers, the information describing general purpose registers that are to be assigned as persistent general purpose registers (pGPRs) and volatile general purpose registers (vGPRs). The plurality of general purpose registers can be allocated according to the received information. The streaming processor can include the general purpose registers allocated according to the received information, the allocated based on execution latencies of instructions included in the threads.
    Type: Grant
    Filed: December 14, 2016
    Date of Patent: February 11, 2020
    Assignee: QUALCOMM Incorporated
    Inventors: Yun Du, Liang Han, Lin Chen, Chihong Zhang, Hongjiang Shang, Jing Wu, Zilin Ying, Chun Yu, Guofang Jiao, Andrew Gruber, Eric Demers
  • Patent number: 10552130
    Abstract: A method of providing by a code optimization service an optimized version of a code unit to a managed runtime environment is disclosed. Information related to one or more runtime conditions associated with the managed runtime environment that is executing in a different process than that of the code optimization service is obtained, wherein the one or more runtime conditions are subject to change during the execution of the code unit. The optimized version of the code unit and a corresponding set of one or more speculative assumptions are provided to the managed runtime environment, wherein the optimized version of the code unit produces the same logical results as the code unit unless at least one of the set of one or more speculative assumptions is not true, wherein the set of one or more speculative assumptions are based on the information related to the one or more runtime conditions.
    Type: Grant
    Filed: June 8, 2018
    Date of Patent: February 4, 2020
    Assignee: Azul Systems, Inc.
    Inventors: Gil Tene, Philip Reames
  • Patent number: 10445097
    Abstract: Apparatus and methods are disclosed for decoding targets from an instruction and transmitting data to those targets in accordance with a current instruction. Multimodal target hardware is used in conjunction with one or more of the routers so as to route data to an appropriate target. The data can be one or more operands or a predicate and the targets can include operand buffers, broadcast channels, and general registers. In this way, operands, for example, can be directed for use with multiple subsequent instructions, and there are multiple modes for distributing the operands to the multiple instructions.
    Type: Grant
    Filed: March 17, 2016
    Date of Patent: October 15, 2019
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Douglas C. Burger, Aaron L. Smith
  • Patent number: 10437603
    Abstract: The disclosed inventions include a processor apparatus and method that enable a general purpose processor to achieve twice the operating frequency of typical processor implementations with a modest increase in area and a modest increase in energy per operation. The invention relies upon exploiting multiple independent streams of execution. Low area and low energy memory arrays used for register files operate a modest frequency. Instructions can be issued at a rate higher than this frequency by including logic that guarantees the spacing between instructions from the same thread are spaced wider than the time to access the register file. The result of the invention is the ability to overlap long latency structures, which allows using lower energy structures, thereby reducing energy per operation.
    Type: Grant
    Filed: February 20, 2018
    Date of Patent: October 8, 2019
    Inventor: Kevin Sean Halle
  • Patent number: 10379866
    Abstract: An electronic apparatus generating compiled data used in a very long instruction word (VLIW) processor including a plurality of function units is provided. The electronic apparatus includes a storage and a processor configured to control the storage to store the compiled data in which a plurality of VLIW instructions are compiled, identify a VLIW instruction from the compiled data; and update, if a multi-cycle no operation (nop) instruction for the plurality of function units is identified within a cycle corresponding to a latency of the identified VLIW instruction and if an end cycle of another VLIW instruction is within the cycle corresponding to the latency of the identified VLIW instruction, the compiled data by including information on a cycle difference between an end cycle of the identified VLIW instruction and the end cycle of the another VLIW instruction in the multi-cycle nop instruction.
    Type: Grant
    Filed: July 19, 2017
    Date of Patent: August 13, 2019
    Assignee: Samsung Electronics Co., Ltd
    Inventors: Jong-hun Lee, Jae-un Park, Si-hoon Song, Myung-sun Kim
  • Patent number: 10346165
    Abstract: A load/store unit including a memory queue configured to store a plurality of memory instructions and state information indicating whether each memory instruction of the plurality of memory instructions can be performed independently, with, separately, or after older pending instructions; and a state-selection circuit configured to set a state information of each memory instruction of the plurality of memory instructions in view of an older pending instruction in the memory queue.
    Type: Grant
    Filed: October 31, 2014
    Date of Patent: July 9, 2019
    Assignee: Avago Technologies International Sales Pte. Limited
    Inventor: Tariq Kurd
  • Patent number: 10346174
    Abstract: Operation of a multi-slice processor that includes a plurality of execution slices and a plurality of load/store slices, where the multi-slice processor is configured to dynamically cancel partial load operations by, among other steps, receiving a load instruction requesting multiple portions of data; receiving a load instruction requesting multiple portions of data; determining that a load of one portion of the requested multiple portions is unavailable to be issued; and responsive to determining that the load of the one portion of the requested multiple portions is unavailable to be issued, delaying issuance of the load instruction.
    Type: Grant
    Filed: March 24, 2016
    Date of Patent: July 9, 2019
    Assignee: International Business Machines Corporation
    Inventors: Elizabeth A. McGlone, Jennifer L. Molnar
  • Patent number: 10331455
    Abstract: An electronic apparatus generating compiled data used in a very long instruction word (VLIW) processor including a plurality of function units is provided. The electronic apparatus includes a storage and a processor configured to control the storage to store the compiled data in which a plurality of VLIW instructions are compiled, identify a VLIW instruction from the compiled data; and update, if a multi-cycle no operation (nop) instruction for the plurality of function units is identified within a cycle corresponding to a latency of the identified VLIW instruction and if an end cycle of another VLIW instruction is within the cycle corresponding to the latency of the identified VLIW instruction, the compiled data by including information on a cycle difference between an end cycle of the identified VLIW instruction and the end cycle of the another VLIW instruction in the multi-cycle nop instruction.
    Type: Grant
    Filed: July 19, 2017
    Date of Patent: June 25, 2019
    Assignee: Samsung Electronics Co., Ltd
    Inventors: Jong-hun Lee, Jae-un Park, Si-hoon Song, Myung-sun Kim