Simultaneous Issuance Of Multiple Instructions Patents (Class 712/215)

Coprocessors with bypass optimization, variable grid architecture, and fused vector operations

Patent number: 12174785

Abstract: In an embodiment, a coprocessor may include a bypass indication which identifies execution circuitry that is not used by a given processor instruction, and thus may be bypassed. The corresponding circuitry may be disabled during execution, preventing evaluation when the output of the circuitry will not be used for the instruction. In another embodiment, the coprocessor may implement a grid of processing elements in rows and columns, where a given coprocessor instruction may specify an operation that causes up to all of the processing elements to operate on vectors of input operands to produce results. Implementations of the coprocessor may implement a portion of the processing elements. The coprocessor control circuitry may be designed to operate with the full grid or partial grid, reissuing instructions in the partial grid case to perform the requested operation. In still another embodiment, the coprocessor may be able to fuse vector mode operations.

Type: Grant

Filed: July 20, 2022

Date of Patent: December 24, 2024

Assignee: Apple Inc.

Inventors: Aditya Kesiraju, Andrew J. Beaumont-Smith, Boris S. Alvarez-Heredia, Pradeep Kanapathipillai, Ran A. Chachick
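
The partial-grid case in the abstract above can be pictured with a small sketch. It assumes, purely for illustration, that the control circuitry covers the full logical grid by reissuing the instruction once per tile of the smaller physical grid; the function name, parameters, and tiling order are hypothetical and not taken from the patent.

```python
from typing import List, Tuple

def reissue_passes(full_rows: int, full_cols: int,
                   impl_rows: int, impl_cols: int) -> List[Tuple[int, int]]:
    """Return the (row, col) origin of each pass needed to cover the full grid.

    Assumption: the implemented grid is stepped across the full grid tile by tile.
    """
    if impl_rows <= 0 or impl_cols <= 0:
        raise ValueError("implemented grid must have positive dimensions")
    passes = []
    for r in range(0, full_rows, impl_rows):
        for c in range(0, full_cols, impl_cols):
            passes.append((r, c))  # one reissue of the instruction per tile
    return passes

# A full 8x8 implementation needs a single issue; a 4x8 implementation needs two.
full_grid = reissue_passes(8, 8, 8, 8)
half_grid = reissue_passes(8, 8, 4, 8)
```

In this sketch the number of passes grows as the implemented grid shrinks, which is the trade-off the abstract describes between hardware size and reissue count.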

Dynamic insights extraction and trend prediction

Patent number: 12086601

Abstract: Techniques for process execution trend prediction and visualization are disclosed. The disclosed system receives a process execution request to be executed on a set of targets. The request may include request characteristics, such as a request type and computations to be performed during execution. The system analyzes the request characteristics to determine the computations to execute and initiates request execution on the targets. Based on the analysis, the system generates predictions regarding the execution, including an estimated completion time. During execution, the system displays various attributes of the execution in a dynamically updating visualization. The system also provides real-time recommendations on how the process can be optimized, such as to reduce execution time and errors.

Type: Grant

Filed: March 16, 2022

Date of Patent: September 10, 2024

Assignee: Oracle International Corporation

Inventor: Anadi Upadhyaya

Computer system and program execution method

Patent number: 12061936

Abstract: A synchronous core processing unit executes the same program as a program executed by another computer for an execution unit at a synchronization timing synchronized with a synchronous core processing unit of the other computer, and migrates the program being executed for which migration is requested according to characteristics of the program to a quasi-synchronous core processing unit. The quasi-synchronous core processing unit executes the program migrated from the synchronous core processing unit, and then migrates the program to the synchronous core processing unit. The synchronous core processing unit outputs, to an output comparison machine, an execution result obtained by executing the program migrated from the quasi-synchronous core processing unit at the synchronization timing.

Type: Grant

Filed: March 5, 2020

Date of Patent: August 13, 2024

Assignee: HITACHI, LTD.

Inventors: Takuma Nomizu, Hidehiro Kawai, Masaaki Ogawa

Processor with multiple fetch and decode pipelines

Patent number: 12039337

Abstract: A processor employs a plurality of fetch and decode pipelines by dividing an instruction stream into instruction blocks with identified boundaries. The processor includes a branch predictor that generates branch predictions. Each branch prediction corresponds to a branch instruction and includes a prediction that the corresponding branch is to be taken or not taken. In addition, each branch prediction identifies both an end of the current branch prediction window and the start of another branch prediction window. Using these known boundaries, the processor provides different sequential fetch streams to different ones of the plurality of fetch and decode pipelines, which concurrently process the instructions of the different fetch streams, thereby improving overall instruction throughput at the processor.

Type: Grant

Filed: September 25, 2020

Date of Patent: July 16, 2024

Assignee: Advanced Micro Devices, Inc.

Inventors: Robert B. Cohen, Tzu-Wei Lin, Anthony J. Bybell, Bill Kai Chiu Kwan, Frank C. Galloway

System and method of VLIW instruction processing using reduced-width VLIW processor

Patent number: 11663011

Abstract: Very long instruction word (VLIW) instruction processing using a reduced-width processor is disclosed. In a particular embodiment, a VLIW processor includes a control circuit configured to receive a VLIW packet that includes a first number of instructions and to distribute the instructions to a second number of instruction execution paths. The first number is greater than the second number. The VLIW processor also includes physical registers configured to store results of executing the instructions and a register renaming circuit that is coupled to the control circuit.

Type: Grant

Filed: July 7, 2020

Date of Patent: May 30, 2023

Assignee: Qualcomm Incorporated

Inventors: Peter Sassone, Christopher Koob, Suresh Kumar Venkumahanti
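
As a minimal sketch of the idea in the abstract above, a packet holding more instructions than there are execution paths can be split into successive issue groups. The function name and the in-order chunking policy are assumptions for illustration; the sketch ignores dependencies within the packet and the register renaming circuit the abstract mentions.

```python
from typing import List

def schedule_packet(packet: List[str], num_paths: int) -> List[List[str]]:
    """Split a VLIW packet into issue groups no wider than num_paths.

    Assumption: instructions are issued in packet order, num_paths at a time.
    """
    if num_paths <= 0:
        raise ValueError("num_paths must be positive")
    return [packet[i:i + num_paths] for i in range(0, len(packet), num_paths)]

# A four-instruction packet on a two-path processor takes two issue groups.
groups = schedule_packet(["add", "mul", "load", "store"], num_paths=2)
```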

Register sharing mechanism to equally allocate disabled thread registers to active threads

Patent number: 11579878

Abstract: An apparatus is disclosed. The apparatus includes one or more processors comprising: register sharing circuitry to receive meta-information indicating a number of threads that are to be disabled and provide an indication that an associated thread is disabled; a plurality of General Purpose Register Files (GRFs), wherein one or more of the plurality of GRFs is associated with one of a plurality of threads; and a plurality of multiplexers coupled to the one or more GRFs to receive the indication from the register sharing circuitry and disable thread access to an associated GRF based on an indication that a thread is to be disabled.

Type: Grant

Filed: May 22, 2020

Date of Patent: February 14, 2023

Assignee: Intel Corporation

Inventors: Pratik J. Ashar, Supratim Pal, Subramaniam Maiyuran, Wei-Yu Chen, Guei-Yuan Lueh

Apparatus and method for store pairing with reduced hardware requirements

Patent number: 11544062

Abstract: An apparatus and method for pairing store operations. For example, one embodiment of a processor comprises: a grouping eligibility checker to evaluate a plurality of store instructions based on a set of grouping rules to determine whether two or more of the plurality of store instructions are eligible for grouping; and a dispatcher to simultaneously dispatch a first group of store instructions of the plurality of store instructions determined to be eligible for grouping by the grouping eligibility checker.

Type: Grant

Filed: March 28, 2020

Date of Patent: January 3, 2023

Assignee: Intel Corporation

Inventors: Raanan Sade, Igor Yanover, Stanislav Shwartsman, Muhammad Taher, David Zysman, Liron Zur, Yiftach Gilad
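
The eligibility check and grouped dispatch described in the abstract above might look roughly like the following. The grouping rule used here (two stores of equal size to adjacent addresses) is a hypothetical stand-in, since the abstract does not list the actual rules; `Store`, `eligible`, and `dispatch_groups` are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Store:
    addr: int  # byte address written by the store
    size: int  # number of bytes written

def eligible(a: Store, b: Store) -> bool:
    # Hypothetical grouping rule: same size, and b starts where a ends.
    return a.size == b.size and b.addr == a.addr + a.size

def dispatch_groups(stores: List[Store]) -> List[Tuple[Store, ...]]:
    """Walk the stores in order, pairing neighbours that pass the check."""
    groups: List[Tuple[Store, ...]] = []
    i = 0
    while i < len(stores):
        if i + 1 < len(stores) and eligible(stores[i], stores[i + 1]):
            groups.append((stores[i], stores[i + 1]))  # dispatched together
            i += 2
        else:
            groups.append((stores[i],))  # dispatched alone
            i += 1
    return groups

s0, s1, s2 = Store(0x100, 8), Store(0x108, 8), Store(0x200, 4)
paired = dispatch_groups([s0, s1, s2])
```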

Executing multiple programs simultaneously on a processor core

Patent number: 11531552

Abstract: Systems and methods are disclosed for allocating resources to contexts in block-based processor architectures. In one example of the disclosed technology, a processor is configured to spatially allocate resources between multiple contexts being executed by the processor, including caches, functional units, and register files. In a second example of the disclosed technology, a processor is configured to temporally allocate resources between multiple contexts, for example, on a clock cycle basis, including caches, register files, and branch predictors. Each context is guaranteed access to its allocated resources to avoid starvation from contexts competing for resources of the processor. A results buffer can be used for folding larger instruction blocks into portions that can be mapped to smaller-sized instruction windows. The results buffer stores operand results that can be passed to subsequent portions of an instruction block.

Type: Grant

Filed: February 6, 2017

Date of Patent: December 20, 2022

Assignee: Microsoft Technology Licensing, LLC

Inventors: Gagan Gupta, Douglas C. Burger

Apparatus and method for supporting out-of-order program execution of instructions

Patent number: 11429393

Abstract: An apparatus for data processing and a method of data processing are provided. Data processing operations are performed in response to instructions which reference architectural registers using physical registers to store data values when performing the data processing operations. Mappings between the architectural registers and the physical registers are stored, and when a data hazard condition is identified with respect to out-of-order program execution of an instruction, an architectural register specified in the instruction is remapped to an available physical register. A reorder buffer stores an entry for each destination architectural register specified by the instruction, entries being stored in program order, and an entry specifies a destination architectural register and an original physical register to which the destination architectural register was mapped before the architectural register was remapped to an available physical register.

Type: Grant

Filed: November 11, 2015

Date of Patent: August 30, 2022

Assignee: ARM LIMITED

Inventors: Vladimir Vasekin, Ian Michael Caulfield, Chiloda Ashan Senarath Pathirane
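
A small sketch can show why a reorder buffer entry that records the original physical register is useful: it lets a remapping be undone in reverse program order. This is a simplification under stated assumptions: every destination is remapped (the abstract remaps only when a data hazard is identified), and the `Renamer` class and its methods are hypothetical names.

```python
class Renamer:
    def __init__(self, num_arch: int, num_phys: int):
        self.map = list(range(num_arch))              # architectural -> physical
        self.free = list(range(num_arch, num_phys))   # available physical registers
        self.rob = []                                 # (arch, original_phys), program order

    def rename_dest(self, arch: int) -> int:
        """Remap a destination register, recording its previous mapping."""
        new_phys = self.free.pop(0)
        self.rob.append((arch, self.map[arch]))
        self.map[arch] = new_phys
        return new_phys

    def rollback(self, count: int) -> None:
        """Undo the youngest `count` remappings using the recorded originals."""
        for _ in range(count):
            arch, original = self.rob.pop()
            self.free.insert(0, self.map[arch])  # release the newer register
            self.map[arch] = original

renamer = Renamer(num_arch=4, num_phys=8)
first = renamer.rename_dest(2)
second = renamer.rename_dest(2)
```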

Multi-channel data path circuitry

Patent number: 11422822

Abstract: Techniques are disclosed relating to sharing datapath circuitry among multiple SIMD groups. In some embodiments, pipeline circuitry is configured to perform operations specified by instructions of first and second assigned SIMD groups. The pipeline circuitry may include first and second front-end circuitry configured to decode instructions of the respective SIMD groups. The pipeline circuitry may include shared execution circuitry configured to perform operations specified by the first and second assigned SIMD groups and arbitration circuitry configured to select an instruction from among at least the first and second front-end circuitry for assignment to the shared execution circuitry in a current cycle. The arbitration circuitry may select an instruction based on one or more of: stall counts, whether available instructions are being speculatively executed, whether ones of available instructions target a particular portion of the shared execution circuitry, numbers of execution cycles, and SIMD group ages.

Type: Grant

Filed: May 8, 2020

Date of Patent: August 23, 2022

Assignee: Apple Inc.

Inventors: Robert D. Kenney, Jason N. Dale

Assignment device, assignment method, and assignment program

Patent number: 11403388

Abstract: An extracting unit randomly extracts a block from among the blocks of instruction strings constituting the byte code of a first program and, at the time of execution of the first program, extracts the blocks which are invariably executed before the randomly-extracted block. A dividing unit randomly divides, into a plurality of blocks, the instruction strings constituting the byte code of a second program which enables detection of tampering of the first program. An inserting unit inserts the plurality of blocks, which are obtained by division by the dividing unit, at different positions in the block extracted by the extracting unit, while maintaining the execution sequence written in the second program.

Type: Grant

Filed: September 22, 2017

Date of Patent: August 2, 2022

Assignee: NIPPON TELEGRAPH AND TELEPHONE CORPORATION

Inventors: Fumihiro Kanei, Mitsuaki Akiyama, Yuta Takata, Takeshi Yagi

Microprocessor with multistep-ahead branch predictor

Patent number: 11379243

Abstract: A microprocessor with a multistep-ahead branch predictor is shown. The branch predictor is coupled to an instruction cache and has an N-stage pipelined architecture, which is configured to perform branch prediction to control the instruction fetching of the instruction cache. The branch predictor performs branch prediction for (N-1) instruction-address blocks in parallel, wherein the (N-1) instruction-address blocks include a starting instruction-address block and (N-2) subsequent instruction-address blocks. The branch predictor is thereby ahead of branch prediction of the starting instruction-address block. The branch predictor stores reference information about branch prediction in at least one memory and performs a parallel search of the memory for the branch prediction of the (N-1) instruction-address blocks.

Type: Grant

Filed: October 29, 2020

Date of Patent: July 5, 2022

Assignee: SHANGHAI ZHAOXIN SEMICONDUCTOR CO., LTD.

Inventors: Fangong Gong, Mengchen Yang, Guohua Chen

Multiple guard tag setting instruction

Patent number: 11327903

Abstract: An apparatus has memory access circuitry to perform a tag-guarded memory access operation in response to a target address. The tag-guarded memory access operation comprises: comparing an address tag associated with the target address with a guard tag stored in a memory system in association with a block of one or more memory locations comprising an addressed location identified by the target address, and generating an indication of whether a match is detected between the guard tag and the address tag. An instruction decoder decodes a multiple guard tag setting instruction to control the memory access circuitry to trigger memory accesses to update the guard tags associated with at least two consecutive blocks of one or more memory locations.

Type: Grant

Filed: December 10, 2018

Date of Patent: May 10, 2022

Assignee: Arm Limited

Inventor: Graeme Peter Barnes
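
The tag comparison and the multi-block tag update in the abstract above can be modelled in a few lines. The 16-byte block size, the `TaggedMemory` class, and its method names are assumptions made for this sketch; the abstract only speaks of blocks of one or more memory locations.

```python
BLOCK_BYTES = 16  # assumed block size covered by one guard tag

class TaggedMemory:
    def __init__(self, num_blocks: int):
        self.guard_tags = [0] * num_blocks  # one guard tag per block

    def check(self, address: int, address_tag: int) -> bool:
        """Tag-guarded access: does the address tag match the block's guard tag?"""
        return self.guard_tags[address // BLOCK_BYTES] == address_tag

    def set_multiple_guard_tags(self, address: int, tag: int, count: int) -> None:
        """Model of a multiple-tag-setting operation over consecutive blocks."""
        first = address // BLOCK_BYTES
        for block in range(first, first + count):
            self.guard_tags[block] = tag

mem = TaggedMemory(num_blocks=8)
mem.set_multiple_guard_tags(0x20, tag=5, count=3)  # blocks 2, 3, and 4
```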

System and method for multiplexer tree indexing

Patent number: 11308057

Abstract: Described herein is a system and method for multiplexer tree (muxtree) indexing. Muxtree indexing performs hashing and row reduction in parallel by use of each select bit only once in a particular path of the muxtree. The muxtree indexing generates a different final index as compared to conventional hashed indexing but still results in a fair hash, where all table entries get used with equal distribution with uniformly random selects.

Type: Grant

Filed: November 28, 2017

Date of Patent: April 19, 2022

Assignee: Advanced Micro Devices, Inc.

Inventors: Steven R. Havlir, Patrick J. Shyvers

Dynamic update of the number of architected registers assigned to software threads using spill counts

Patent number: 11275614

Abstract: A computer system includes a processor, main memory, and controller. The processor includes a plurality of hardware threads configured to execute a plurality of software threads. The main memory includes a first register table configured to contain a current set of architected registers for the currently running software threads. The controller is configured to change a first number of the architected registers assigned to a given one of the software threads to a second number of architected registers when a result of monitoring current usage of the registers by the software threads indicates that the change will improve performance of the computer system. The processor includes a second register table configured to contain a subset of the architected registers and a mapping table for each software thread indicating whether the architected registers referenced by the corresponding software thread are located in the first register table or the second register table.

Type: Grant

Filed: September 27, 2019

Date of Patent: March 15, 2022

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Harold W. Cain, III, Hubertus Franke, Charles R. Johns, Hung Q. Le, Ravi Nair, James A. Kahle

Instruction scheduling during execution in a processor

Patent number: 11256511

Abstract: A method of performing instruction scheduling during execution in a processor includes receiving, at an execution unit of the processor, an initial assignment of an assigned execution resource among two or more execution resources to execute an operation. An instruction includes two or more operations. Based on determining that the assigned execution resource is not available, the method also includes determining, at the execution unit, whether another execution resource among the two or more execution resources is available to execute the operation. Based on determining that the other execution resource is available, the method further includes executing the operation with the other execution resource.

Type: Grant

Filed: May 20, 2019

Date of Patent: February 22, 2022

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Cedric Lichtenau, Stefan Payer, Kerstin Claudia Schelm, Anthony Saporito, Gregory William Alexander
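
The fallback step in the abstract above, using the initially assigned resource when it is free and otherwise looking for another, can be sketched as follows. The function name, the resource labels, and the first-available selection order are hypothetical.

```python
from typing import Iterable, Optional, Set

def pick_resource(assigned: str, busy: Set[str],
                  resources: Iterable[str]) -> Optional[str]:
    """Return the resource that will execute the operation, or None if all are busy."""
    if assigned not in busy:
        return assigned  # initial assignment is available
    for candidate in resources:
        if candidate != assigned and candidate not in busy:
            return candidate  # fall back to another available resource
    return None

units = ["alu0", "alu1"]
fallback = pick_resource("alu0", {"alu0"}, units)
```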

System and method for reconfiguring and deploying soft stock-keeping units

Patent number: 11218322

Abstract: Techniques and apparatuses for issuance of license upgrades for hardware components in the field, as well as the hardware components, are described. In one embodiment, for example, an apparatus may include processor circuitry and memory in communication with the processor circuitry, wherein the memory contains a configuration data block and a licensing data block, the configuration data block being read from the memory via a licensing apparatus and the licensing data block being written to the memory by the licensing apparatus. The processor may include executable code to process the licensing data block to facilitate an upgrade of the capabilities of the processor circuitry.

Type: Grant

Filed: September 28, 2017

Date of Patent: January 4, 2022

Assignee: INTEL CORPORATION

Inventors: Sergiu D. Ghetie, Neeraj S. Upasani, Chukwunenye S. Nnebe, Won Lee, Shaila R. Murty, Arkadiusz Berent, Vasuki Chilukuri, David T. Mayo, Scott P. Bobholz, Vinila Rose, Wojciech S. Powiertowski

Systems and methods for multi-bit memory with embedded logic

Patent number: 11132486

Abstract: Systems and methods are provided that include a standard cell with multiple input and output storage elements, such as flip flops, latches, etc., with some combination logic interconnected between them. In embodiments, the slave latches on input flip flops are replaced with a fewer number of latches at a downstream node(s) of the combination logic, resulting in improved performance, area and power, while maintaining functionality at the interface pins of the standard cell. The process of inferring such a standard cell from a behavioral description, such as RTL, of a design or remapping equivalent sub-circuits from a netlist to such a standard cell is also described.

Type: Grant

Filed: May 21, 2020

Date of Patent: September 28, 2021

Assignee: Taiwan Semiconductor Manufacturing Company, Ltd.

Inventors: Guru Prasad, Sachin Kumar

SMT processor to create a virtual vector register file for a borrower thread from a number of donated vector register files

Patent number: 11132228

Abstract: A computing device and a method of allocating vector register files in a simultaneously-multithreaded (SMT) processor core are provided. A request for a first number (M) of vector register files is received from a borrower thread of the processor core. One or more available donor threads of the processor core are identified. A second number (N) of the vector register files, of the identified one or more available donor threads, are assigned to the borrower thread, where N is ≤ M. The borrower thread is parameterized to create a virtualized vector register file for the borrower thread, based on a width of the N vector register files of the identified one or more donor threads.

Type: Grant

Filed: March 21, 2018

Date of Patent: September 28, 2021

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Mauricio Serrano, Giles Frazier, Silvia Melitta Mueller

System, apparatus and method for a hybrid reservation station for a processor

Patent number: 11126438

Abstract: In one embodiment, a reservation station of a processor includes: a plurality of first lanes having a plurality of entries to store information for instructions having in-order dependencies; a variable latency tracking table including a second plurality of entries to store information for instructions having a variable latency; and a scheduler circuit to access a head entry of the plurality of first lanes to schedule, for execution on at least one execution unit, at least one instruction from the head entry of at least one of the plurality of first lanes. Other embodiments are described and claimed.

Type: Grant

Filed: June 26, 2019

Date of Patent: September 21, 2021

Assignee: Intel Corporation

Inventors: Srikanth Srinivasan, Thomas Mullins, Ammon Christiansen, James Hadley, Robert S. Chappell, Sean Mirkes

Nonvolatile memory device and operation method thereof

Patent number: 11106363

Abstract: A nonvolatile memory device includes a nonvolatile memory, a volatile memory being a cache memory of the nonvolatile memory, and a first controller configured to control the nonvolatile memory. The nonvolatile memory device further includes a second controller configured to receive a device write command and an address, and transmit, to the volatile memory through a first bus, a first read command and the address and a first write command and the address sequentially, and transmit a second write command and the address to the first controller through a second bus, in response to the reception of the device write command and the address.

Type: Grant

Filed: May 17, 2019

Date of Patent: August 31, 2021

Assignee: SAMSUNG ELECTRONICS CO., LTD.

Inventors: Youngjin Cho, Sungyong Seo, Sun-Young Lim, Uksong Kang, Chankyung Kim, Duckhyun Chang, JinHyeok Choi

Branch look-ahead instruction disassembling, assembling, and delivering system apparatus and method for microprocessor system

Patent number: 11055099

Abstract: A method and system for branch look-ahead (BLA) instruction disassembling, assembling, and delivering are designed to improve the speed of branch prediction and instruction fetch in microprocessor systems by reducing the number of clock cycles required to deliver branch instructions to a branch predictor located inside the microprocessors. The invention is also designed to reduce the run-length of the instructions found between branch instructions by disassembling the instructions in a basic block into a BLA instruction and one or more non-BLA instructions from the software/assembly program. The invention is also designed to dynamically reassemble the BLA and non-BLA instructions and deliver them to one or more microprocessors in a compatible sequence. In particular, the reassembled instructions are concurrently delivered to one or more microprocessors in a timely and precise manner while preserving compatibility with the software/assembly program.

Type: Grant

Filed: February 17, 2019

Date of Patent: July 6, 2021

Inventor: Yong-Kyu Jung

Way predictor and enable logic for instruction tightly-coupled memory and instruction cache

Patent number: 11048515

Abstract: Disclosed herein are systems and methods for instruction tightly-coupled memory (iTIM) and instruction cache (iCache) access prediction. A processor may use a predictor to enable access to the iTIM or the iCache and a particular way (a memory structure) based on a location state and program counter value. The predictor may determine whether to stay in an enabled memory structure, move to and enable a different memory structure, or move to and enable both memory structures. Stay and move predictions may be based on whether a memory structure boundary crossing has occurred due to sequential instruction processing, branch or jump instruction processing, branch resolution, and cache miss processing. The program counter and a location state indicator may use feedback and be updated each instruction-fetch cycle to determine which memory structure(s) needs to be enabled for the next instruction fetch.

Type: Grant

Filed: August 28, 2019

Date of Patent: June 29, 2021

Assignee: SiFive, Inc.

Inventors: Krste Asanovic, Andrew Waterman

Integrated circuit and method of forming same and a system

Patent number: 10990745

Abstract: An integrated circuit includes a first flip-flop and a second flip-flop. The first flip-flop has a first driving capability. The second flip-flop has a second driving capability different from the first driving capability. The first flip-flop and the second flip-flop are part of a multibit flip-flop configured to share at least a first clock pin. The first clock pin is configured to receive a first clock signal.

Type: Grant

Filed: September 3, 2019

Date of Patent: April 27, 2021

Assignee: TAIWAN SEMICONDUCTOR MANUFACTURING COMPANY LTD.

Inventors: Sheng-Hsiung Chen, Shao-Huan Wang, Wen-Hao Chen, Chun-Yao Ku, Hung-Chih Ou

Method and apparatus for a multi-level reservation station with instruction recirculation

Patent number: 10956160

Abstract: A processor and method are described for a multi-level reservation station.

Type: Grant

Filed: March 27, 2019

Date of Patent: March 23, 2021

Assignee: Intel Corporation

Inventors: Mark Dechene, Srikanth Srinivasan, Matthew Merten, Ammon Christiansen

Fast multi-width instruction issue in parallel slice processor

Patent number: 10942745

Abstract: Fast issuance and execution of a multi-width instruction across multiple slices in a parallel slice processor core is supported in part through the use of an early notification signal passed between issue logic associated with multiple slices handling that multi-width instruction coupled with an issuance of a different instruction by the originating issue logic for the early notification signal.

Type: Grant

Filed: September 25, 2018

Date of Patent: March 9, 2021

Assignee: International Business Machines Corporation

Inventors: Salma Ayub, Jeffrey C. Brownscheidle, Sundeep Chadha, Dung Q. Nguyen, Tu-An T. Nguyen, Salim A. Shah, Brian W. Thompto

Instruction chaining

Patent number: 10936321

Abstract: An approach is disclosed that in one or more embodiments includes receiving an indicator to issue an out-of-order instruction or a type of out-of-order instruction in-order; receiving a first instruction; determining whether the first instruction corresponds to the indicated out-of-order instruction or the type of out-of-order instruction; writing, in response to determining that the first instruction corresponds to the indicated out-of-order instruction or the type of out-of-order instruction, an instruction identifier and a dependent instruction opcode into a first queue and an issue queue of the processor; receiving at least one subsequent instruction; determining whether an instruction opcode of the subsequent instruction matches the dependent instruction opcode of the first instruction; and writing, in response to determining the instruction opcode of the subsequent instruction matches the dependent instruction opcode of the instruction, a dependent instruction identifier for the subsequent instruc

Type: Grant

Filed: February 1, 2019

Date of Patent: March 2, 2021

Assignee: International Business Machines Corporation

Inventors: Kurt A. Feiste, Joshua W. Bowman, Christopher M. Mueller, Dung Q. Nguyen, Deepak K. Singh, Brian W. Thompto

Scalable dependency matrix with a single summary bit in an out-of-order processor

Patent number: 10929140

Abstract: Aspects of the invention include tracking dependencies between instructions in an issue queue. The tracking includes, for each instruction in the issue queue, identifying whether the instruction is dependent on each of a threshold number of instructions added to the issue queue prior to the instruction. The tracking also includes identifying whether the instruction is dependent on one or more other instructions in a group of instructions in the issue queue that were added to the issue queue prior to the instruction and that are not included in the threshold number of instructions that are tracked individually. A dependency between the instruction and the one or more other instructions in the group of instructions is tracked using a single summary bit that is set to indicate that a dependency exists between the instruction and the group of instructions. Instructions are issued from the issue queue based at least in part on the tracking.

Type: Grant

Filed: November 30, 2017

Date of Patent: February 23, 2021

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Joel A. Silberman, Balaram Sinharoy
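
The split between individually tracked dependencies and a single summary bit, as described in the abstract above, can be illustrated with a short sketch. It assumes the individually tracked instructions are the `threshold` entries added to the queue immediately before the instruction in question, which is one reading of the abstract; the function name and the use of queue positions are hypothetical.

```python
from typing import List, Set, Tuple

def track(deps: Set[int], index: int, threshold: int) -> Tuple[List[bool], bool]:
    """Summarise the dependencies of the instruction at queue position `index`.

    deps holds the positions of older instructions it depends on.
    Returns one bit per individually tracked older instruction, plus a
    single summary bit covering every instruction older than that window.
    """
    window_start = max(0, index - threshold)
    individual = [j in deps for j in range(window_start, index)]
    summary = any(j < window_start for j in deps)
    return individual, summary

# Position 7 depends on positions 1 and 6; only positions 4..6 are tracked individually.
bits, summary_bit = track({1, 6}, index=7, threshold=3)
```

With this layout each entry needs `threshold` bits plus one, regardless of queue depth, at the cost of treating all older dependencies as a single group.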

Scheduling tasks using swap flags

Patent number: 10884743

Abstract: A method of activating scheduling instructions within a parallel processing unit is described. The method comprises decoding, in an instruction decoder, an instruction in a scheduled task in an active state and checking, by an instruction controller, if a swap flag is set in the decoded instruction. If the swap flag in the decoded instruction is set, a scheduler is triggered to de-activate the scheduled task by changing the scheduled task from the active state to a non-active state.

Type: Grant

Filed: June 18, 2018

Date of Patent: January 5, 2021

Assignee: Imagination Technologies Limited

Inventors: Simon Nield, Yoong-Chert Foo, Adam de Grasse, Luca Iuliano
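
A minimal model of the swap-flag check in the abstract above: decode the next instruction of an active task and, if its swap flag is set, move the task to a non-active state. The `Instr` and `Task` types, the state strings, and the `step` function are illustrative assumptions rather than structures from the patent.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Instr:
    op: str
    swap: bool = False  # swap flag carried in the instruction encoding

@dataclass
class Task:
    instrs: List[Instr]
    pc: int = 0
    state: str = "active"

def step(task: Task) -> str:
    """Decode one instruction; de-activate the task if its swap flag is set."""
    instr = task.instrs[task.pc]   # instruction decoder
    task.pc += 1
    if instr.swap:                 # instruction controller check
        task.state = "non-active"  # scheduler de-activates the task
    return instr.op

task = Task([Instr("add"), Instr("load", swap=True)])
first_op = step(task)
state_after_first = task.state
second_op = step(task)
```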

Methods for scheduling that determine whether to remove a dependent micro-instruction from a reservation station queue based on determining a cache hit/miss status of a load micro-instruction once a count reaches a predetermined value and an apparatus using the same

Patent number: 10860327

Abstract: A method for scheduling micro-instructions, performed by a qualifier, is provided. The method includes the following steps: detecting a load write-back signal broadcasted by a load execution unit; determining whether to trigger a load-detection counting logic according to content of the load write-back signal; determining whether an execution status of a load micro-instruction is cache hit when the triggered load-detection counting logic reaches a predetermined value; and driving a release circuit to remove the first micro-instruction in a reservation station queue when the execution status of the load micro-instruction is cache hit and the first micro-instruction has been dispatched to an arithmetic and logic unit for execution.

Type: Grant

Filed: October 2, 2018

Date of Patent: December 8, 2020

Assignee: SHANGHAI ZHAOXIN SEMICONDUCTOR CO., LTD.

Inventor: Xiaolong Fei

Performing at least two branch predictions for non-contiguous instruction blocks at the same time using a prediction mapping

Patent number: 10853076

Abstract: An apparatus is provided to perform branch prediction in respect of a plurality of instructions divided into a plurality of blocks. Receiving circuitry receives references to at least two blocks in the plurality of blocks. Branch prediction circuitry performs at least two branch predictions at a time. The branch predictions are performed in respect of the at least two blocks and the at least two blocks are non-contiguous.

Type: Grant

Filed: February 21, 2018

Date of Patent: December 1, 2020

Assignee: Arm Limited

Inventors: Houdhaifa Bouzguarrou, Guillaume Bolbenes, Eddy Lapeyre, Luc Orion

Method to execute successive dependent instructions from an instruction stream in a processor

Patent number: 10831496

Abstract: The present disclosure relates to a method to execute successive dependent instructions from an instruction stream in a processor. The method may include identifying a pair of instructions, a first instruction and a second instruction, in which a given operand of the second instruction is an output of the first instruction and the first instruction is older than the second instruction. The method may include loading the operands of the first instruction and the second instruction. The method may include executing the first instruction and the second instruction.

Type: Grant

Filed: February 28, 2019

Date of Patent: November 10, 2020

Assignee: International Business Machines Corporation

Inventors: Maarten J. Boersma, Michael Klaus Kroener, Niels Fricke, Razvan Peter Figuli, Nandor Szirmak, Dung Q. Nguyen
Computer architecture allowing recycling of instruction slack time

Patent number: 10831232

Abstract: A computer architecture suitable for out-of-order processors manages the problem of timing slack, in which an instruction completes before the end of its clock cycle, by recycling that slack so that the next succeeding instruction can begin execution earlier. This recycling mechanism is enabled through the use of transparent gating between execution units, which allows data transfer before clock cycle boundaries, and, in some cases, by aggressively issuing children instructions contemporaneously with their parent instruction after a grandparent instruction is issued.

Type: Grant

Filed: February 15, 2019

Date of Patent: November 10, 2020

Assignee: Wisconsin Alumni Research Foundation

Inventors: Gokul Subramanian Ravi, Mikko H. Lipasti
Managing predictor selection for branch prediction

Patent number: 10747541

Abstract: Instructions are executed in a pipeline. Storage accessible to the pipeline stores branch prediction information characterizing results of branch instructions previously executed. A predicted branch result is provided, for at least some branch instructions, based on a selected predictor of multiple predictors. An actual branch result is provided based on an executed branch instruction, and the branch prediction information is updated based on the actual branch result. The predictors include: a first predictor that determines the predicted branch result based on at least a portion of the branch prediction information; and a second predictor that determines the predicted branch result independently from the branch prediction information.

Type: Grant

Filed: January 25, 2018

Date of Patent: August 18, 2020

Assignee: Marvell Asia Pte, Ltd.

Inventors: Shubhendu Sekhar Mukherjee, David Kravitz, Edward J. McLellan
Integrated circuit processor and method of operating the integrated circuit processor in different modes of differing thread counts

Patent number: 10732976

Abstract: A processor includes an instruction pipeline. The pipeline can be operated alternatively in a multi-thread mode and in a single-thread mode. In the multi-thread mode, the instruction pipeline processes multiple threads in an interleaved or simultaneous manner. In the single-thread mode, the pipeline processes a single thread. The instruction pipeline comprises multiple functional units, each of which is reserved for one thread among the multiple threads when the pipeline is in the multi-thread mode and reserved for one context layer among multiple context layers when the instruction pipeline is in the single-thread mode.

Type: Grant

Filed: January 10, 2013

Date of Patent: August 4, 2020

Assignee: NXP USA, Inc.

Inventors: Alistair Robertson, Jeffrey W. Scott
System and method of VLIW instruction processing using reduced-width VLIW processor

Patent number: 10719325

Abstract: Very long instruction word (VLIW) instruction processing using a reduced-width processor is disclosed. In a particular embodiment, a VLIW processor includes a control circuit configured to receive a VLIW packet that includes a first number of instructions and to distribute the instructions to a second number of instruction execution paths. The first number is greater than the second number. The VLIW processor also includes physical registers configured to store results of executing the instructions and a register renaming circuit that is coupled to the control circuit.

Type: Grant

Filed: November 7, 2017

Date of Patent: July 21, 2020

Assignee: Qualcomm Incorporated

Inventors: Peter Sassone, Christopher Koob, Suresh Kumar Venkumahanti
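Illustrative sketch (not taken from the patent): the abstract for patent 10719325 describes a packet holding more instructions than there are execution paths. One simple way to picture this is to issue the packet in several passes. The function name `distribute_packet` and the in-order, pass-by-pass policy are assumptions made for illustration; the register renaming circuit mentioned in the abstract is not modeled.

```python
from typing import List, Sequence


def distribute_packet(packet: Sequence[str], num_paths: int) -> List[List[str]]:
    """Split a VLIW packet into issue groups of at most num_paths instructions.

    Hypothetical helper: each inner list is one pass over the execution paths.
    """
    if num_paths <= 0:
        raise ValueError("num_paths must be positive")
    # Walk the packet in order, taking num_paths instructions per pass.
    return [list(packet[i:i + num_paths]) for i in range(0, len(packet), num_paths)]
```

With a four-instruction packet and two paths, the sketch produces two passes of two instructions each.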
Scheduling that determines whether to remove a dependent micro-instruction from a reservation station queue based on determining cache hit/miss status of one or more load micro-instructions once a count reaches a predetermined value

Patent number: 10705851

Abstract: A method for scheduling micro-instructions, performed by a first qualifier, is provided. The method includes the following steps: detecting a write-back signal broadcast by a second qualifier; determining whether a value of a first load-detection counting logic is to be synchronized with a value of a second load-detection counting logic carried by the write-back signal according to content of the write-back signal; determining whether execution statuses of all load micro-instructions are cache hit when the synchronized value of the first load-detection counting logic reaches a predetermined value; and driving a release circuit to remove a micro-instruction in a reservation station queue when the execution statuses of all the load micro-instructions are cache hit and the micro-instruction has been dispatched to an arithmetic and logic unit for execution.

Type: Grant

Filed: October 2, 2018

Date of Patent: July 7, 2020

Assignee: SHANGHAI ZHAOXIN SEMICONDUCTOR CO., LTD.

Inventor: Xiaolong Fei
Method for organizing tasks in the nodes of a computer cluster, associated task organizer and cluster

Patent number: 10698729

Abstract: The invention relates to a method for organizing tasks, in at least some nodes of a computer cluster, comprising: first, launching two containers on each of said nodes, a standard container and a priority container; next, for all or part of said nodes with two containers, at each node: while no priority task occurs, assigning one or more available resources of the node to the standard container thereof in order to execute a standard task, the priority container thereof not executing any task; and when a priority task occurs, dynamically switching only a portion of the resources from the standard container thereof to the priority container thereof, such that the priority task is executed in the priority container with the switched portion of the resources, and the standard task continues to be executed, without being halted, in the standard container with the non-switched portion of the resources.

Type: Grant

Filed: December 16, 2015

Date of Patent: June 30, 2020

Assignee: BULL SAS

Inventors: Yann Maupu, Thomas Cadeau, Matthieu Daniel
Non-shifting reservation station

Patent number: 10678542

Abstract: Systems, apparatuses, and methods for implementing a non-shifting reservation station. A dispatch unit may write an operation into any entry of a reservation station. The reservation station may include an age matrix for determining the relative ages of the operations stored in the entries of the reservation station. The reservation station may include selection logic which is configured to pick the oldest ready operation from the reservation station based on the values stored in the age matrix. The selection logic may utilize control logic to mask off columns of the age matrix corresponding to non-ready operations so as to determine which operation is the oldest ready operation in the reservation station. Also, the reservation station may be configured to dequeue operations early when these operations do not have a load dependency.

Type: Grant

Filed: July 24, 2015

Date of Patent: June 9, 2020

Assignee: Apple Inc.

Inventors: Ian D. Kountanis, Mahesh K. Reddy
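Illustrative sketch (not taken from the patent): the age-matrix selection described in the abstract for patent 10678542 can be modeled in a few lines. The class name `AgeMatrixStation`, its methods, and the row/column convention (`older[i][j]` is set when entry `j` is older than entry `i`) are assumptions chosen for illustration; the early-dequeue behavior is not modeled.

```python
from typing import List, Optional


class AgeMatrixStation:
    """Toy non-shifting reservation station; names and layout are hypothetical."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.valid: List[bool] = [False] * size
        self.ready: List[bool] = [False] * size
        # older[i][j] is True when entry j holds an operation older than entry i.
        self.older: List[List[bool]] = [[False] * size for _ in range(size)]

    def write(self, entry: int, ready: bool = False) -> None:
        """Place a new operation into any free entry; nothing is shifted."""
        if self.valid[entry]:
            raise ValueError("entry is occupied")
        for j in range(self.size):
            # Every operation already in the station is older than the newcomer.
            self.older[entry][j] = self.valid[j]
            # The newcomer is older than nobody, so its column is cleared.
            self.older[j][entry] = False
        self.valid[entry] = True
        self.ready[entry] = ready

    def pick_oldest_ready(self) -> Optional[int]:
        """Return the ready entry that no other ready entry is older than."""
        for i in range(self.size):
            if not (self.valid[i] and self.ready[i]):
                continue
            # Columns belonging to non-ready entries are masked off here.
            blocked = any(
                self.older[i][j] and self.valid[j] and self.ready[j]
                for j in range(self.size)
            )
            if not blocked:
                return i
        return None

    def dequeue(self, entry: int) -> None:
        """Free an entry and erase its row and column in the age matrix."""
        self.valid[entry] = False
        self.ready[entry] = False
        for j in range(self.size):
            self.older[entry][j] = False
            self.older[j][entry] = False
```

Because age is kept in the matrix rather than in the position of an entry, an operation can be written into whichever slot is free, which is the point of the non-shifting design.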
Apparatus and method for performing branch prediction

Patent number: 10620960

Abstract: An apparatus and method are provided for performing branch prediction. The apparatus has processing circuitry for executing instructions out-of-order with respect to original program order, and event counting prediction circuitry for maintaining event count values for branch instructions, for use in making branch outcome predictions for those branch instructions. Further, checkpointing storage stores state information of the apparatus at a plurality of checkpoints to enable the state information to be restored for a determined one of those checkpoints in response to a flush event. The event counting prediction circuitry has training storage with a first number of training entries, each training entry being associated with a branch instruction.

Type: Grant

Filed: August 20, 2018

Date of Patent: April 14, 2020

Assignee: Arm Limited

Inventors: Houdhaifa Bouzguarrou, Guillaume Bolbenes, Vincenzo Consales
Method for protecting a program code, corresponding system and processor

Patent number: 10613993

Abstract: Program code intended to be copied into the cache memory of a microprocessor is transferred encrypted between the random-access memory and the processor, and the decryption is carried out at the level of the cache memory. A checksum may be inserted into the cache lines in order to allow integrity verification, and this checksum is then replaced with a specific instruction before delivery of an instruction word to the central unit of the microprocessor.

Type: Grant

Filed: January 30, 2015

Date of Patent: April 7, 2020

Assignee: STMICROELECTRONICS SA

Inventor: Bruno Fel
Effective address based load store unit in out of order processors

Patent number: 10606593

Abstract: Technical solutions are described for out-of-order (OoO) execution of one or more instructions by a processing unit. An example method includes looking up, by a load-store unit (LSU), an entry in an effective address directory (EAD) for an effective address (EA) of an operand of an instruction to be launched. Further, the method includes, in response to the EA being present in the EAD, launching, by the LSU, the instruction with the real address (RA) from the EAD, and in response to the EA not being present in the EAD, looking up, by the LSU, the EA in an effective real table (ERT) entry, and launching the instruction with the RA from the ERT entry. Further, in response to the ERT entry being removed, the ERT entry including an ERT index and a mapping between the EA and the RA, removing the entry of the EA from the EAD.

Type: Grant

Filed: November 29, 2017

Date of Patent: March 31, 2020

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Bryan Lloyd, Balaram Sinharoy
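Illustrative sketch (not taken from the patent): the two-level lookup in the abstract for patent 10606593 resembles a small directory backed by a larger table. The class `LoadStoreUnit`, its method names, and the use of plain dictionaries keyed by full addresses are assumptions for illustration. Filling the EAD after an ERT hit is also an assumption; the abstract does not say how the EAD is populated.

```python
from typing import Dict, Optional


class LoadStoreUnit:
    """Toy model of an EAD backed by an ERT; structure is hypothetical."""

    def __init__(self) -> None:
        self.ead: Dict[int, int] = {}  # effective address -> real address
        self.ert: Dict[int, int] = {}  # effective address -> real address

    def translate(self, ea: int) -> Optional[int]:
        """Return the real address used to launch the instruction, if known."""
        if ea in self.ead:
            # EA present in the EAD: launch with the RA stored there.
            return self.ead[ea]
        ra = self.ert.get(ea)
        if ra is not None:
            # Assumption: remember the mapping in the EAD for later lookups.
            self.ead[ea] = ra
        return ra

    def remove_ert_entry(self, ea: int) -> None:
        """Removing an ERT entry also removes the matching EAD entry."""
        self.ert.pop(ea, None)
        self.ead.pop(ea, None)
```

Keeping the EAD consistent on ERT removal means a stale real address can never be returned from the faster structure.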
Effective address based load store unit in out of order processors

Patent number: 10606590

Abstract: Technical solutions are described for out-of-order (OoO) execution of one or more instructions by a processing unit. An example method includes looking up, by a load-store unit (LSU), an entry in an effective address directory (EAD) for an effective address (EA) of an operand of an instruction to be launched. Further, the method includes, in response to the EA being present in the EAD, launching, by the LSU, the instruction with the real address (RA) from the EAD, and in response to the EA not being present in the EAD, looking up, by the LSU, the EA in an effective real table (ERT) entry, and launching the instruction with the RA from the ERT entry. Further, in response to the ERT entry being removed, the ERT entry including an ERT index and a mapping between the EA and the RA, removing the entry of the EA from the EAD.

Type: Grant

Filed: October 6, 2017

Date of Patent: March 31, 2020

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Bryan Lloyd, Balaram Sinharoy
Method for distributing load in a multi-core system

Patent number: 10592298

Abstract: A system and method for processing a data packet. The method comprises initiating processing of a received plurality of data packets by CPU cores; tracking, by a scale management routine, processing queues for the CPU cores and their load. In response to an average size of a processing queue being lower than a first pre-determined queue threshold, and a CPU core load being lower than a first pre-determined load threshold, preventing adding new data packets to the processing queue, monitoring emptying of processing queues for each processing CPU core. In response to an average size of a processing queue or a CPU core load being above a second pre-determined upper queue threshold or the second pre-determined load threshold, transmitting all data from processing queues for each processing CPU core to a memory buffer, increasing the number of processing cores by one; and initiating data packet processing.

Type: Grant

Filed: January 26, 2018

Date of Patent: March 17, 2020

Assignee: NFWARE, INC.

Inventors: Alexander Britkin, Viacheslav Morozov, Igor Pavlov
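Illustrative sketch (not taken from the patent): the threshold logic in the abstract for patent 10592298 can be read as a three-way decision. The names `Action`, `Thresholds`, and `decide` are hypothetical, and the sketch covers only the decision itself, not the queue draining or the transfer to a memory buffer.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    SCALE_DOWN = "scale_down"  # stop feeding a queue and let it drain
    SCALE_UP = "scale_up"      # buffer queued data and add one core
    HOLD = "hold"              # keep the current number of cores


@dataclass(frozen=True)
class Thresholds:
    low_queue: float
    low_load: float
    high_queue: float
    high_load: float


def decide(avg_queue: float, core_load: float, t: Thresholds) -> Action:
    """Pick a scaling action from average queue size and core load."""
    # Either value above its upper threshold is enough to scale up.
    if avg_queue > t.high_queue or core_load > t.high_load:
        return Action.SCALE_UP
    # Both values must be below their lower thresholds to scale down.
    if avg_queue < t.low_queue and core_load < t.low_load:
        return Action.SCALE_DOWN
    return Action.HOLD
```

The asymmetry (OR for scaling up, AND for scaling down) follows the wording of the abstract.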
Reducing stalling in a simultaneous multithreading processor by inserting thread switches for instructions likely to stall

Patent number: 10585669

Abstract: A system and method suppresses occurrence of stalling caused by data dependency other than register dependency in an out-of-order processor. A stall reducing method includes a handler for detecting a stall occurring during execution of execution code using a performance monitoring unit, and for identifying, based on dependencies, a second instruction on which a first instruction is data dependent, the stall based on this dependency. A profiler registers the second instruction as profile information. An optimization module inserts a thread yield instruction in an appropriate position inside execution code or an original code file based on the profile information, and outputs optimized execution code.

Type: Grant

Filed: July 31, 2018

Date of Patent: March 10, 2020

Assignee: International Business Machines Corporation

Inventor: Takeshi Ogasawara
Scalable dependency matrix with multiple summary bits in an out-of-order processor

Patent number: 10564976

Abstract: Aspects of the invention include tracking dependencies between instructions in an issue queue. The tracking includes, for each instruction in the issue queue, identifying whether the instruction is dependent on each of a threshold number of instructions added to the issue queue prior to the instruction. The tracking also includes identifying whether the instruction is dependent on one or more other instructions added to the issue queue prior to the instruction that are not included in the threshold number of instructions. A dependency between the instruction and each of the other instructions is tracked as a plurality of groups by indicating that a dependency exists between the instruction and one of the groups based on identifying a dependency between the instruction and at least one instruction in the group. Instructions are issued from the issue queue based at least in part on the tracking.

Type: Grant

Filed: November 30, 2017

Date of Patent: February 18, 2020

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Joel A. Silberman, Balaram Sinharoy
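Illustrative sketch (not taken from the patent): the abstract for patent 10564976 tracks recent producers exactly and older producers as groups. The class `IssueQueue`, the fixed `group_size`, and the rule that a group bit is satisfied only when every tracked member of the group has issued are assumptions for illustration; the abstract does not state how group bits are resolved.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass
class Entry:
    precise: Set[int] = field(default_factory=set)  # exact producer positions
    groups: Set[int] = field(default_factory=set)   # one summary bit per group
    issued: bool = False


class IssueQueue:
    """Toy dependency tracker; instructions are numbered in arrival order."""

    def __init__(self, threshold: int, group_size: int) -> None:
        self.threshold = threshold
        self.group_size = group_size
        self.entries: List[Entry] = []

    def add(self, producers: Iterable[int]) -> int:
        pos = len(self.entries)
        entry = Entry()
        for p in producers:
            if not 0 <= p < pos:
                raise ValueError("producers must be older instructions")
            if pos - p <= self.threshold:
                entry.precise.add(p)                    # tracked individually
            else:
                entry.groups.add(p // self.group_size)  # tracked by summary bit
        self.entries.append(entry)
        return pos

    def is_ready(self, pos: int) -> bool:
        e = self.entries[pos]
        limit = pos - self.threshold  # group members lie outside the precise window
        for g in e.groups:
            start = g * self.group_size
            end = min(start + self.group_size, limit)
            if not all(m.issued for m in self.entries[start:end]):
                return False
        return not e.issued and all(self.entries[p].issued for p in e.precise)

    def issue(self, pos: int) -> None:
        if not self.is_ready(pos):
            raise RuntimeError("instruction is not ready")
        self.entries[pos].issued = True
```

The grouping trades precision for storage: an instruction may wait on a group member it does not actually depend on, but it never issues before a real producer.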
General purpose register allocation in streaming processor

Patent number: 10558460

Abstract: Systems and techniques are disclosed for general purpose register dynamic allocation based on latency associated with instructions in processor threads. A streaming processor can include general purpose registers configured to store data associated with threads, and a thread scheduler configured to receive allocation information for the general purpose registers, the information describing general purpose registers that are to be assigned as persistent general purpose registers (pGPRs) and volatile general purpose registers (vGPRs). The plurality of general purpose registers can be allocated according to the received information. The streaming processor can include the general purpose registers allocated according to the received information, the allocation based on execution latencies of instructions included in the threads.

Type: Grant

Filed: December 14, 2016

Date of Patent: February 11, 2020

Assignee: QUALCOMM Incorporated

Inventors: Yun Du, Liang Han, Lin Chen, Chihong Zhang, Hongjiang Shang, Jing Wu, Zilin Ying, Chun Yu, Guofang Jiao, Andrew Gruber, Eric Demers
Code optimization conversations for connected managed runtime environments

Patent number: 10552130

Abstract: A method of providing by a code optimization service an optimized version of a code unit to a managed runtime environment is disclosed. Information related to one or more runtime conditions associated with the managed runtime environment that is executing in a different process than that of the code optimization service is obtained, wherein the one or more runtime conditions are subject to change during the execution of the code unit. The optimized version of the code unit and a corresponding set of one or more speculative assumptions are provided to the managed runtime environment, wherein the optimized version of the code unit produces the same logical results as the code unit unless at least one of the set of one or more speculative assumptions is not true, wherein the set of one or more speculative assumptions are based on the information related to the one or more runtime conditions.

Type: Grant

Filed: June 8, 2018

Date of Patent: February 4, 2020

Assignee: Azul Systems, Inc.

Inventors: Gil Tene, Philip Reames
Multimodal targets in a block-based processor

Patent number: 10445097

Abstract: Apparatus and methods are disclosed for decoding targets from an instruction and transmitting data to those targets in accordance with a current instruction. Multimodal target hardware is used in conjunction with one or more of the routers so as to route data to an appropriate target. The data can be one or more operands or a predicate and the targets can include operand buffers, broadcast channels, and general registers. In this way, operands, for example, can be directed for use with multiple subsequent instructions, and there are multiple modes for distributing the operands to the multiple instructions.

Type: Grant

Filed: March 17, 2016

Date of Patent: October 15, 2019

Assignee: Microsoft Technology Licensing, LLC

Inventors: Douglas C. Burger, Aaron L. Smith
Super-thread processor

Patent number: 10437603

Abstract: The disclosed inventions include a processor apparatus and method that enable a general purpose processor to achieve twice the operating frequency of typical processor implementations with a modest increase in area and a modest increase in energy per operation. The invention relies upon exploiting multiple independent streams of execution. Low area and low energy memory arrays used for register files operate at a modest frequency. Instructions can be issued at a rate higher than this frequency by including logic that guarantees that instructions from the same thread are spaced wider than the time to access the register file. The result of the invention is the ability to overlap long latency structures, which allows using lower energy structures, thereby reducing energy per operation.

Type: Grant

Filed: February 20, 2018

Date of Patent: October 8, 2019

Inventor: Kevin Sean Halle
