Instruction Issuing Patents (Class 712/214)
-
Patent number: 11599358
Abstract: Methods and systems relating to improved processing architectures with pre-staged instructions are disclosed herein. A disclosed processor includes an instruction memory, at least one functional processing unit, a bus, a set of instruction registers configured to be loaded, using the bus, with a set of pre-staged instructions from the instruction memory, and a logic circuit configured to provide the set of pre-staged instructions from the set of instruction registers to the at least one functional processing unit in response to receiving an instruction from the instruction memory.
Type: Grant
Filed: August 12, 2021
Date of Patent: March 7, 2023
Assignee: Tenstorrent Inc.
Inventors: Miles Robert Dooley, Milos Trajkovic, Rakesh Shaji Lal, Stanislav Sokorac
-
Patent number: 11586267
Abstract: Embodiments of the present disclosure relate to managing power provided to a semiconductor circuit to prevent undervoltage conditions. A measured voltage value describing a measured supply voltage at a first subcircuit of a semiconductor circuit can be received, the measured voltage value having a first resolution. A selected metric indicative of a supply voltage present at the first subcircuit can be received, the selected metric having a second resolution higher than the first resolution. The selected metric is calibrated to obtain a calibrated metric when a transition of the measured voltage value occurs.
Type: Grant
Filed: December 19, 2018
Date of Patent: February 21, 2023
Assignee: International Business Machines Corporation
Inventors: Thomas Strach, Preetham M. Lobo, Tobias Webel
-
Patent number: 11579944
Abstract: In one embodiment, a processor includes: a plurality of cores each comprising a multi-threaded core to concurrently execute a plurality of threads; and a control circuit to concurrently enable at least one of the plurality of cores to operate in a single-threaded mode and at least one other of the plurality of cores to operate in a multi-threaded mode. Other embodiments are described and claimed.
Type: Grant
Filed: November 14, 2018
Date of Patent: February 14, 2023
Assignee: Intel Corporation
Inventors: Daniel J. Ragland, Guy M. Therien, Ankush Varma, Eric J. DeHaemer, David T. Mayo, Ariel Gur, Yoav Ben-Raphael, Mark P. Seconi
-
Patent number: 11567764
Abstract: Methods and systems relating to improved processing architectures with pre-staged instructions are disclosed herein. A disclosed processor includes an instruction memory, at least one functional processing unit, a bus, a set of instruction registers configured to be loaded, using the bus, with a set of pre-staged instructions from the instruction memory, and a logic circuit configured to provide the set of pre-staged instructions from the set of instruction registers to the at least one functional processing unit in response to receiving an instruction from the instruction memory.
Type: Grant
Filed: August 12, 2021
Date of Patent: January 31, 2023
Assignee: Tenstorrent Inc.
Inventors: Miles Robert Dooley, Milos Trajkovic, Rakesh Shaji Lal, Stanislav Sokorac
-
Patent number: 11561882
Abstract: An apparatus and method are provided for generating and processing a trace stream indicative of instruction execution by processing circuitry. An apparatus has an input interface for receiving instruction execution information from the processing circuitry indicative of a sequence of instructions executed by the processing circuitry, and trace generation circuitry for generating from the instruction execution information a trace stream comprising a plurality of trace elements indicative of execution by the processing circuitry of instruction flow changing instructions within the sequence.
Type: Grant
Filed: August 9, 2017
Date of Patent: January 24, 2023
Assignee: Arm Limited
Inventors: François Christopher Jacques Botman, Thomas Christopher Grocutt, John Michael Horley, Michael John Williams, Michael John Gibbs
-
Patent number: 11526361
Abstract: Devices and techniques for variable pipeline length in a barrel-multithreaded processor are described herein. A completion time for an instruction can be determined prior to insertion into a pipeline of a processor. A conflict between the instruction and a different instruction based on the completion time can be detected. Here, the different instruction is already in the pipeline, and the conflict is detected when the completion time equals the previously determined completion time for the different instruction. A difference between the completion time and an unconflicted completion time can then be calculated and completion of the instruction delayed by the difference.
Type: Grant
Filed: October 20, 2020
Date of Patent: December 13, 2022
Assignee: Micron Technology, Inc.
Inventor: Tony Brewer
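The conflict-resolution idea in the abstract above can be illustrated with a minimal sketch: before an instruction enters the pipeline, its natural completion time is checked against completion times already claimed by in-flight instructions, and the instruction is delayed until a free completion slot is found. The function name and data model are illustrative assumptions, not Micron's actual design.

```python
# Hypothetical model of completion-time conflict detection in a
# variable-pipeline-length barrel-multithreaded processor.

def schedule_completion(completion_time, claimed_times):
    """Return (scheduled_time, delay) for an instruction whose natural
    completion time may collide with already-claimed completion times."""
    scheduled = completion_time
    # A conflict exists while another in-flight instruction already
    # completes in that cycle; push back one cycle at a time.
    while scheduled in claimed_times:
        scheduled += 1
    claimed_times.add(scheduled)
    return scheduled, scheduled - completion_time

claimed = {10, 11}                      # cycles taken by in-flight instructions
t, delay = schedule_completion(10, claimed)  # collides twice -> cycle 12, delay 2
```

The delay returned here corresponds to the "difference between the completion time and an unconflicted completion time" that the abstract applies to the incoming instruction.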
-
Patent number: 11513802
Abstract: An electronic device includes a processor having a micro-operation queue, multiple scheduler entries, and scheduler compression logic. When a pair of micro-operations in the micro-operation queue is compressible in accordance with one or more compressibility rules, the scheduler compression logic acquires the pair of micro-operations from the micro-operation queue and stores information from both micro-operations of the pair of micro-operations into different portions in a single scheduler entry. In this way, the scheduler compression logic compresses the pair of micro-operations into the single scheduler entry.
Type: Grant
Filed: September 27, 2020
Date of Patent: November 29, 2022
Assignee: Advanced Micro Devices, Inc.
Inventors: Michael W. Boyer, John Kalamatianos, Pritam Majumder
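A small sketch can make the compression step concrete: walk the micro-operation queue and pack adjacent pairs that satisfy a "compressibility" predicate into a single scheduler entry holding both halves, while incompressible micro-ops get their own entry. The predicate shown is an invented example; the patent's actual compressibility rules are not specified here.

```python
# Illustrative sketch (not AMD's actual rules) of packing micro-op pairs
# into single scheduler entries.

def compressible(a, b):
    # Assumed example rule: two ops can share an entry if they target the
    # same execution unit and neither has more than two source operands.
    return a["unit"] == b["unit"] and len(a["srcs"]) <= 2 and len(b["srcs"]) <= 2

def fill_scheduler(queue):
    entries = []
    i = 0
    while i < len(queue):
        if i + 1 < len(queue) and compressible(queue[i], queue[i + 1]):
            entries.append((queue[i], queue[i + 1]))  # one entry, two portions
            i += 2
        else:
            entries.append((queue[i], None))          # uncompressed entry
            i += 1
    return entries

uop_queue = [
    {"unit": "alu", "srcs": [1, 2]},
    {"unit": "alu", "srcs": [3]},
    {"unit": "mem", "srcs": [1]},
]
entries = fill_scheduler(uop_queue)  # two ALU ops share one entry; mem op alone
```

The payoff described in the abstract is effective scheduler depth: three queued micro-ops occupy only two scheduler entries here.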
-
Patent number: 11507412
Abstract: A disclosed example apparatus includes memory; and processor circuitry to: identify a lock-protected section of instructions in the memory; replace lock/unlock instructions with transactional lock acquire and transactional lock release instructions to form a transactional process; and execute the transactional process in a speculative execution.
Type: Grant
Filed: April 28, 2020
Date of Patent: November 22, 2022
Assignee: Intel Corporation
Inventors: Keqiang Wu, Jiwei Lu, Koichi Yamada, Yong-Fong Lee
-
Patent number: 11500632
Abstract: In a processor device according to the present invention, a memory access unit reads data to be processed from an external memory and writes the data to a first register group that a plurality of processors does not access among a plurality of register groups. A control unit sequentially makes each of the plurality of processors implement a same instruction, in parallel with changing an address of a register group that stores the data to be processed. A scheduler, based on specified scenario information, specifies an instruction to be implemented and a register group to be accessed for the plurality of processors, and specifies a register group to be written to among the plurality of register groups and data to be processed that is to be written for the memory access unit.
Type: Grant
Filed: April 23, 2019
Date of Patent: November 15, 2022
Assignee: ArchiTek Corporation
Inventor: Shuichi Takada
-
Patent number: 11481216
Abstract: Techniques for executing an atomic command in a distributed computing network are provided. A core cluster, including a plurality of processing cores that do not natively issue atomic commands to the distributed computing network, is coupled to a translation unit. To issue an atomic command, a core requests a location in the translation unit to write an opcode and operands for the atomic command. The translation unit identifies a location (a "window") that is not in use by another atomic command and indicates the location to the processing core. The processing core writes the opcode and operands into the window and indicates to the translation unit that the atomic command is ready. The translation unit generates an atomic command and issues the command to the distributed computing network for execution. After execution, the distributed computing network provides a response to the translation unit, which provides that response to the core.
Type: Grant
Filed: September 10, 2018
Date of Patent: October 25, 2022
Assignee: Advanced Micro Devices, Inc.
Inventor: Stanley Ames Lackey, Jr.
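The window protocol above follows a request/write/issue handshake that can be modeled in a few lines. All class and method names below are assumptions made for illustration; the sketch only captures the flow of reserving a free window, writing the opcode and operands, and freeing the window once the command is issued.

```python
# Hypothetical model of the translation-unit "window" handshake for
# issuing atomic commands from cores that cannot issue them natively.

class TranslationUnit:
    def __init__(self, n_windows):
        self.windows = [None] * n_windows     # None = free window

    def request_window(self):
        """Core asks for a window; return its index, or None if all busy."""
        for i, w in enumerate(self.windows):
            if w is None:
                self.windows[i] = "reserved"
                return i
        return None

    def write_and_issue(self, idx, opcode, operands):
        """Core has written opcode/operands and marked the command ready;
        the unit forms the atomic command, issues it, and frees the window."""
        self.windows[idx] = None
        return {"opcode": opcode, "operands": operands}

tu = TranslationUnit(n_windows=2)
win = tu.request_window()                       # window 0 reserved
resp = tu.write_and_issue(win, "fetch_add", [5])
```

In the real design the response comes back from the distributed computing network; here the issue step returns it directly to keep the sketch self-contained.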
-
Patent number: 11422817
Abstract: A method and apparatus for executing an instruction are provided. In the method, an instruction queue is first generated, and an instruction from the instruction queue in preset order is acquired. Then, a sending step is executed, including: determining a type of the acquired instruction; determining, in response to determining that the acquired instruction is an arithmetic instruction, an executing component for executing the arithmetic instruction from an executing component set; and sending the arithmetic instruction to the determined executing component. Last, in response to determining that the acquired instruction is a blocking instruction, a next instruction is acquired after receiving a signal indicating that an instruction associated with the blocking instruction has been completely executed.
Type: Grant
Filed: July 1, 2019
Date of Patent: August 23, 2022
Assignee: Kunlunxin Technology (Beijing) Company Limited
Inventors: Jing Wang, Wei Qi, Yupeng Li, Xiaozhang Gong
-
Patent number: 11392537
Abstract: Exemplary reach-based explicit dataflow processors and related computer-readable media and methods. The reach-based explicit dataflow processors are configured to support execution of producer instructions encoded with explicit naming of consumer instructions intended to consume the values produced by the producer instructions. The reach-based explicit dataflow processors are configured to make available produced values as inputs to explicitly named consumer instructions as a result of processing producer instructions. The reach-based explicit dataflow processors support execution of a producer instruction that explicitly names a consumer instruction based on using the producer instruction as a relative reference point from the producer instruction.
Type: Grant
Filed: March 18, 2019
Date of Patent: July 19, 2022
Assignee: Microsoft Technology Licensing, LLC
Inventors: Gagan Gupta, Michael Scott McIlvaine, Rodney Wayne Smith, Thomas Philip Speier, David Tennyson Harper, III
-
Patent number: 11392387
Abstract: Predicting load-based control independent (CI), register data independent (DI) (CIRDI) instructions as CI memory data dependent (DD) (CIMDD) instructions for replay in speculative misprediction recovery in a processor. The processor predicts if a source of a load-based CIRDI instruction will be forwarded by a store-based instruction (i.e., "store-forwarded"). If a load-based CIRDI instruction is predicted as store-forwarded, the load-based CIRDI instruction is considered a CIMDD instruction and is replayed in misprediction recovery. If a load-based CIRDI instruction is not predicted as store-forwarded, the processor considers such load-based CIRDI instruction as a pending load-based CIRDI instruction. If this pending load-based CIRDI instruction is determined in execution to be store-forwarded, the instruction pipeline is flushed and the pending load-based CIRDI instruction is also replayed in misprediction recovery.
Type: Grant
Filed: November 4, 2020
Date of Patent: July 19, 2022
Assignee: Microsoft Technology Licensing, LLC
Inventors: Vignyan Reddy Kothinti Naresh, Arthur Perais, Rami Mohammad Al Sheikh, Shivam Priyadarshi
-
Patent number: 11366691
Abstract: A method of scheduling instructions within a parallel processing unit is described. The method comprises decoding, in an instruction decoder, an instruction in a scheduled task in an active state, and checking, by an instruction controller, if an ALU targeted by the decoded instruction is a primary instruction pipeline. If the targeted ALU is a primary instruction pipeline, a list associated with the primary instruction pipeline is checked to determine whether the scheduled task is already included in the list. If the scheduled task is already included in the list, the decoded instruction is sent to the primary instruction pipeline.
Type: Grant
Filed: December 1, 2020
Date of Patent: June 21, 2022
Assignee: Imagination Technologies Limited
Inventors: Simon Nield, Yoong-Chert Foo, Adam de Grasse, Luca Iuliano
-
Patent number: 11360536
Abstract: The vector data path is divided into smaller vector lanes. A register such as a memory mapped control register stores a vector lane number (VLX) indicating the number of vector lanes to be powered. A decoder converts this VLX into a vector lane control word, each bit controlling the ON or OFF state of the corresponding vector lane. This number of contiguous least significant vector lanes are powered. In the preferred embodiment the stored data VLX indicates that 2^VLX contiguous least significant vector lanes are to be powered. Thus the number of vector lanes powered is limited to an integral power of 2. This manner of coding produces a very compact controlling bit field while obtaining substantially all the power saving advantage of individually controlling the power of all vector lanes.
Type: Grant
Filed: August 3, 2020
Date of Patent: June 14, 2022
Assignee: Texas Instruments Incorporated
Inventors: Timothy David Anderson, Duc Quang Bui
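The decode described in the abstract is a simple bit-manipulation step: expand the stored VLX field into a one-bit-per-lane control word in which the 2^VLX least significant, contiguous lanes are powered. The function name and the 16-lane width below are assumptions for illustration.

```python
# Sketch of decoding a vector lane number (VLX) into a lane control word,
# where the 2**VLX least significant lanes are powered ON.

def vlx_to_lane_control(vlx, total_lanes=16):
    powered = 1 << vlx                  # 2**VLX lanes are enabled
    assert powered <= total_lanes
    return (1 << powered) - 1           # contiguous low-order mask, 1 bit per lane

mask = vlx_to_lane_control(3)           # VLX = 3 -> 8 lanes -> 0b11111111
```

This shows why the encoding is compact: a 4-bit VLX field controls up to 16 lanes, at the cost of restricting the powered-lane count to powers of two.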
-
Patent number: 11327766
Abstract: A method of instruction dispatch routing comprises receiving an instruction for dispatch to one of a plurality of issue queues; determining a priority status of the instruction; selecting a rotation order based on the priority status, wherein a first rotation order is associated with priority instructions and a second rotation order, different from the first rotation order, is associated with non-priority instructions; selecting an issue queue of the plurality of issue queues based on the selected rotation order; and dispatching the instruction to the selected issue queue.
Type: Grant
Filed: July 31, 2020
Date of Patent: May 10, 2022
Assignee: International Business Machines Corporation
Inventors: Eric Mark Schwarz, Brian W. Thompto, Kurt A. Feiste, Michael Joseph Genden, Dung Q. Nguyen, Susan E. Eisen
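The routing policy above can be sketched with two independent round-robin rotations over the same issue queues, one consulted for priority instructions and one for non-priority instructions. The class name and the choice of reversed order for the second rotation are illustrative assumptions; the patent only requires that the two rotation orders differ.

```python
# Hypothetical sketch of priority-based dispatch routing with two
# distinct rotation orders over the issue queues.
from itertools import cycle

class DispatchRouter:
    def __init__(self, queues):
        # Assumed example: priority ops rotate forward through the queues,
        # non-priority ops rotate through them in reverse order.
        self.priority_rotation = cycle(queues)
        self.normal_rotation = cycle(reversed(queues))

    def route(self, is_priority):
        rotation = self.priority_rotation if is_priority else self.normal_rotation
        return next(rotation)

router = DispatchRouter(["IQ0", "IQ1", "IQ2"])
first = router.route(is_priority=True)    # priority rotation starts at IQ0
second = router.route(is_priority=False)  # non-priority rotation starts at IQ2
```

Because each class advances its own rotation, a burst of non-priority instructions cannot skew which queues the priority instructions land in.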
-
Patent number: 11327760
Abstract: A method for grouping computer instructions includes receiving a set of computer instructions, grouping the set of computer instructions by register dependencies, identifying a plurality of single-definition-use flow (SDF) bundles based on a burstization criteria and a chaining criteria; and based on the SDF bundles, transforming the set of computer instructions. The transformation may include splitting one of the set of computer instructions and setting a burst parameter for the one of the set of computer instructions. The transformation may include grouping a plurality of the set of computer instructions and replacing a pair of register file accesses with a pair of temporary register accesses.
Type: Grant
Filed: April 9, 2020
Date of Patent: May 10, 2022
Assignee: HUAWEI TECHNOLOGIES CO., LTD.
Inventors: Andrew Siu Doug Lee, Ahmed Mohammed Elshafiey Mohammed Eltantawy
-
Patent number: 11327791
Abstract: An apparatus provides an issue queue having a first section and a second section. Each entry in each section stores operation information identifying an operation to be performed. Allocation circuitry allocates each item of received operation information to an entry in the first section or the second section. Selection circuitry selects from the issue queue, during a given selection iteration, an operation from amongst the operations whose required source operands are available. Availability update circuitry updates source operand availability for each entry whose operation information identifies as a source operand a destination operand of the selected operation in the given selection iteration. A deferral mechanism inhibits from selection, during a next selection iteration, any operation associated with an entry in the second section whose source operands are now available due to that operation having as a source operand the destination operand of the selected operation in the given selection iteration.
Type: Grant
Filed: August 21, 2019
Date of Patent: May 10, 2022
Assignee: Arm Limited
Inventors: Michael David Achenbach, Robert Greg McDonald, Nicholas Andrew Pfister, Kelvin Domnic Goveas, Michael Filippo, Abhishek Raja, Zachary Allen Kingsbury
-
Patent number: 11321019
Abstract: An event-processing unit (EPU) for processing tokens associated with a state or state transition, herein also referred to as an event, of an external device is disclosed. The EPU allows token-processing schemes in which the processing of incoming tokens and the further handling of a processing result by the EPU are determined not only by the token identifier, but also by the payload data of the incoming token or by data in the data memory. A flag-processing capability of a processing-control stage allows applying flag-processing operations such as logical operations to data obtained as a processing result of an ALU-processing operation. The result of these operations determines a subsequent handling of ALU-result data by the EPU. Thus, whether or not the ALU-result data is written to the data memory also influences the processing of any subsequent incoming tokens for which that data is used in the ALU-processing operation.
Type: Grant
Filed: September 11, 2020
Date of Patent: May 3, 2022
Assignee: ACCEMIC TECHNOLOGIES GMBH
Inventor: Alexander Weiss
-
Patent number: 11301252
Abstract: A data processing apparatus is provided comprising: a plurality of input lanes and a plurality of corresponding output lanes. Processing circuitry executes a first vector instruction and a second vector instruction. The first vector instruction specifies a target of output data from the corresponding output lanes that is specified as a source of input data to the input lanes by the second vector instruction. Mask circuitry stores a first mask that defines a first set of the output lanes that are valid for the first vector instruction, and stores a second mask that defines a second set of the output lanes that are valid for the second vector instruction. The first set and the second set are mutually exclusive. Issue circuitry begins processing of the second vector instruction at a lane index prior to completion of the first vector instruction at the lane index.
Type: Grant
Filed: January 15, 2020
Date of Patent: April 12, 2022
Assignee: Arm Limited
Inventor: Kim Richard Schuttenberg
-
Patent number: 11275644
Abstract: Techniques facilitating voltage droop reduction and/or mitigation in a processor core are provided. In one example, a system can comprise a memory that stores, and a processor that executes, computer executable components. The computer executable components can comprise an observation component that detects one or more events at a first stage of a processor pipeline. An event of the one or more events can be a defined event determined to increase a level of power consumed during a second stage of the processor pipeline. The computer executable components can also comprise an instruction component that applies a voltage droop mitigation countermeasure prior to the increase of the level of power consumed during the second stage of the processor pipeline and a feedback component that provides a notification to the instruction component that indicates a success or a failure of a result of the voltage droop mitigation countermeasure.
Type: Grant
Filed: December 6, 2019
Date of Patent: March 15, 2022
Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventors: Giora Biran, Pradip Bose, Alper Buyuktosunoglu, Pierce I-Jen Chuang, Preetham M. Lobo, Ramon Bertran Monfort, Phillip John Restle, Christos Vezyrtzis, Tobias Webel
-
Patent number: 11269646
Abstract: Apparatuses and methods for instruction scheduling in an out-of-order decoupled access-execute processor are disclosed. The instructions for the decoupled access-execute processor comprise access instructions and execute instructions, where access instructions comprise load instructions and instructions which provide operand values to load instructions. Schedule patterns of groups of linked execute instructions are monitored, where the execute instructions in a group of linked execute instructions are linked by data dependencies. On the basis of an identified repeating schedule pattern, configurable execution circuitry adopts a configuration to perform the operations defined by the group of linked execute instructions of the repeating schedule pattern.
Type: Grant
Filed: March 29, 2021
Date of Patent: March 8, 2022
Assignee: Arm Limited
Inventors: Mbou Eyole, Michiel Willem Van Tol
-
Patent number: 11212590
Abstract: Approaches for performing all DOCSIS downstream and upstream data forwarding functions using executable software. DOCSIS data forwarding functions may be performed by classifying one or more packets, of a plurality of received packets, to a particular DOCSIS system component, and then processing the one or more packets classified to the same DOCSIS system component on a single CPU core. The one or more packets may be forwarded between a sequence of one or more software stages. The software stages may each be configured to execute on separate logical cores or on a single logical core.
Type: Grant
Filed: July 10, 2017
Date of Patent: December 28, 2021
Assignee: Harmonic, Inc.
Inventors: Adam Levy, Pavlo Shcherbyna, Alex Muller, Vladyslav Buslov, Victoria Sinitsky, Michael W. Patrick, Nitin Sasi Kumar
-
Patent number: 11188341
Abstract: In one embodiment, an apparatus includes: a plurality of execution lanes to perform parallel execution of instructions; and a unified symbolic store address buffer coupled to the plurality of execution lanes, the unified symbolic store address buffer comprising a plurality of entries each to store a symbolic store address for a store instruction to be executed by at least some of the plurality of execution lanes. Other embodiments are described and claimed.
Type: Grant
Filed: March 26, 2019
Date of Patent: November 30, 2021
Assignee: Intel Corporation
Inventors: Jeffrey J. Cook, Srikanth T. Srinivasan, Jonathan D. Pearce, David B. Sheffield
-
Patent number: 11188681
Abstract: An approach is provided in which an information handling system loads into a processor a set of binary code that has been encrypted based upon a unique key of the processor. The processor includes an instruction decoder that transforms the set of encrypted binary code into a set of instruction control signals using the unique key. In turn, the processor executes a set of instructions based on the set of instruction control signals.
Type: Grant
Filed: April 8, 2019
Date of Patent: November 30, 2021
Assignee: International Business Machines Corporation
Inventors: Guy M. Cohen, Shai Halevi, Lior Horesh
-
Patent number: 11182167
Abstract: A method to determine an oldest instruction in an instruction queue of a processor with multiple instruction threads, wherein each of the multiple instruction threads has a unique thread identifier. The method includes tagging each instruction thread, of the multiple instruction threads, in the instruction queue with a unique tag number according to a round-robin scheme, wherein the unique tag number includes the unique thread identifier for each instruction thread and a round number in the round-robin scheme. The method further includes selecting, for each instruction thread, of the multiple instruction threads, the instruction thread with a lowest tag number from the multiple instruction threads in the instruction queue that are tagged with an oldest round number from the round-robin scheme.
Type: Grant
Filed: March 15, 2019
Date of Patent: November 23, 2021
Assignee: International Business Machines Corporation
Inventors: Arni Ingimundarson, Maarten J. Boersma, Niels Fricke
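The tagging scheme above amounts to a lexicographic age comparison: each queued instruction carries a (round number, thread identifier) tag, and the oldest instruction is the one with the lowest thread identifier within the oldest round. The tag layout and field names below are illustrative assumptions.

```python
# Sketch of selecting the oldest instruction by round-robin tag:
# compare by round number first, then by thread identifier.

def oldest(instructions):
    """instructions: list of dicts carrying 'round' and 'tid' tag fields."""
    return min(instructions, key=lambda ins: (ins["round"], ins["tid"]))

queue = [
    {"round": 2, "tid": 1, "op": "add"},
    {"round": 1, "tid": 3, "op": "mul"},   # in the oldest round
    {"round": 1, "tid": 0, "op": "ld"},    # oldest round, lowest thread id
]
winner = oldest(queue)                      # -> the "ld" instruction
```

Encoding the round number in the high-order bits of a single tag would give the same ordering with one integer comparison in hardware.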
-
Patent number: 11175916
Abstract: A system and method for a lightweight fence is described. In particular, micro-operations including a fencing micro-operation are dispatched to a load queue. The fencing micro-operation allows micro-operations younger than the fencing micro-operation to execute, where the micro-operations are related to a type of fencing micro-operation. The fencing micro-operation is executed if the fencing micro-operation is the oldest memory access micro-operation, where the oldest memory access micro-operation is related to the type of fencing micro-operation. The fencing micro-operation determines whether micro-operations younger than the fencing micro-operation have load ordering violations and, if load ordering violations are detected, the fencing micro-operation signals the retire queue that instructions younger than the fencing micro-operation should be flushed. The instructions to be flushed should include all micro-operations with load ordering violations.
Type: Grant
Filed: December 19, 2017
Date of Patent: November 16, 2021
Assignee: Advanced Micro Devices, Inc.
Inventors: Gregory W. Smaus, John M. King
-
Patent number: 11163576
Abstract: A system and method for efficiently preventing visible side-effects in the memory hierarchy during speculative execution is disclosed. Hiding the side-effects of executed instructions in the whole memory hierarchy is both expensive, in terms of performance and energy, and complicated. A system and method is disclosed to hide the side-effects of speculative loads in the cache(s) until the earliest time these speculative loads become non-speculative. A refinement is disclosed where loads that hit in the L1 cache are allowed to proceed by keeping their side-effects on the L1 cache hidden until these loads become non-speculative, and all other speculative loads that miss in the cache(s) are prevented from executing until they become non-speculative. To limit the performance deterioration caused by these delayed loads, a system and method is disclosed that augments the cache(s) with a value predictor or a re-computation engine that supplies predicted or recomputed values to the loads that missed in the cache(s).
Type: Grant
Filed: March 20, 2020
Date of Patent: November 2, 2021
Assignee: ETA SCALE AB
Inventors: Christos Sakalis, Stefanos Kaxiras, Alberto Ros, Alexandra Jimborean, Magnus Själander
-
Patent number: 11150961
Abstract: Methods, systems and apparatuses for graph processing are disclosed. One graph streaming processor includes a thread manager, wherein the thread manager is operative to dispatch operation of the plurality of threads of a plurality of thread processors before dependencies of the dependent threads have been resolved, maintain a scorecard of operation of the plurality of threads of the plurality of thread processors, and provide an indication to at least one of the plurality of thread processors when a dependency between the at least one of the plurality of threads that a request has or has not been satisfied. Further, a producer thread provides a response to the dependency when the dependency has been satisfied, and each of the plurality of thread processors is operative to provide processing updates to the thread manager, and provide queries to the thread manager upon reaching a dependency.
Type: Grant
Filed: February 8, 2019
Date of Patent: October 19, 2021
Assignee: Blaize, Inc.
Inventors: Lokesh Agarwal, Sarvendra Govindammagari, Venkata Ganapathi Puppala, Satyaki Koneru
-
Patent number: 11144317
Abstract: An AC parallelization circuit includes a transmitting circuit configured to transmit a stop signal to instruct a device for executing calculation in an iteration immediately preceding an iteration for which a concerned device is responsible to stop the calculation in loop-carried dependency calculation; and an estimating circuit configured to generate, as a result of executing the calculation in the preceding iteration, an estimated value to be provided to an arithmetic circuit when the transmitting circuit transmits the stop signal.
Type: Grant
Filed: August 20, 2020
Date of Patent: October 12, 2021
Assignee: FUJITSU LIMITED
Inventor: Hisanao Akima
-
Patent number: 11119774
Abstract: A system and/or method for processing information is disclosed that has at least one processor; a register file associated with the processor, the register file sliced into a plurality of STF blocks having a plurality of STF entries, and in an embodiment, each STF block is further partitioned into a plurality of sub-blocks, each sub-block having a different portion of the plurality of STF entries; and a plurality of execution units configured to read data from and write data to the register file, where the plurality of execution units are arranged in one or more execution slices. In one or more embodiments, the system is configured so that each execution slice has a plurality of STF blocks, and alternatively or additionally, each of the plurality of execution units in a single execution slice is assigned to write to one, and preferably only one, of the plurality of STF blocks.
Type: Grant
Filed: September 6, 2019
Date of Patent: September 14, 2021
Assignee: International Business Machines Corporation
Inventors: Brian W. Thompto, Dung Q. Nguyen, Hung Q. Le, Sam Gat-Shang Chu
-
Patent number: 11115964
Abstract: A system and method of auto-detection of WLAN packets includes transmitting in a 60 GHz frequency band a wireless packet comprising a first header, a second header, a payload, and a training field, the first header carrying a plurality of bits, a logical value of a subset of the plurality of bits in the first header indicating the presence of the second header in the wireless packet.
Type: Grant
Filed: February 12, 2016
Date of Patent: September 7, 2021
Assignee: Huawei Technologies Co., Ltd.
Inventors: Yan Xin, Osama Aboul-Magd, Jung Hoon Suh
-
Patent number: 11112846
Abstract: Embodiments of the present disclosure relate to detecting undervoltage conditions at a subcircuit. A power supply current of a first subcircuit is determined over a first number of previous clock cycles. A cross current flowing between the first subcircuit and a second subcircuit is determined over the first number of previous clock cycles. An estimated momentary supply voltage present at the first subcircuit is then determined based on the power supply current of the first subcircuit over the first number of previous clock cycles and the cross current flowing between the first subcircuit and the second subcircuit over the first number of previous clock cycles.
Type: Grant
Filed: December 19, 2018
Date of Patent: September 7, 2021
Assignee: International Business Machines Corporation
Inventors: Thomas Strach, Preetham M. Lobo, Tobias Webel
-
Patent number: 11100390
Abstract: A deep neural network (DNN) processor is configured to execute layer descriptors in layer descriptor lists. The descriptors define instructions for performing a forward pass of a DNN by the DNN processor. The layer descriptors can also be utilized to manage the flow of descriptors through the DNN module. For example, layer descriptors can define dependencies upon other descriptors. Descriptors defining a dependency will not execute until the descriptors upon which they are dependent have completed. Layer descriptors can also define a "fence," or barrier, function that can be used to prevent the processing of upstream layer descriptors until the processing of all downstream layer descriptors is complete. The fence bit guarantees that there are no other layer descriptors in the DNN processing pipeline before the layer descriptor that has the fence to be asserted is processed.
Type: Grant
Filed: April 11, 2018
Date of Patent: August 24, 2021
Assignee: Microsoft Technology Licensing, LLC
Inventors: Chad Balling McBride, Amol Ashok Ambardekar, Kent D. Cedola, George Petre, Larry Marvin Wall, Boris Bobrov
-
Patent number: 11093245
Abstract: A computer system and a memory access technology are provided. In the computer system, when load/store instructions having a dependency relationship are processed, dependency information between a producer load/store instruction and a consumer load/store instruction can be obtained from a processor. A consumer load/store request is sent to a memory controller in the computer system based on the obtained dependency information, so that the memory controller can terminate a dependency relationship between load/store requests in the memory controller locally based on the dependency information in the received consumer load/store request, and execute the consumer load/store request.
Type: Grant
Filed: June 12, 2019
Date of Patent: August 17, 2021
Assignee: Huawei Technologies Co., Ltd.
Inventors: Lei Fang, Xi Chen, Weiguang Cai
-
Patent number: 11086628Abstract: A system and method for load queue (LDQ) and store queue (STQ) entry allocations at address generation time that maintains age-order of instructions is described. In particular, writing LDQ and STQ entries is delayed until address generation time. This allows the load and store operations to dispatch, and younger operations (which may not be store and load operations) to also dispatch and execute their instructions. The address generation of the load or store operation is held at an address generation scheduler queue (AGSQ) until a load or store queue entry is available for the operation. The tracking of load queue entries or store queue entries is effectively done in the AGSQ instead of at the decode engine. The LDQ and STQ depth is not visible from the decode engine's perspective, which increases the effective processing and queue depth.Type: GrantFiled: August 15, 2016Date of Patent: August 10, 2021Assignee: Advanced Micro Devices, Inc.Inventor: John M. King
-
Patent number: 11080055Abstract: Techniques are disclosed relating to arbitration among register file accesses. In some embodiments, an apparatus includes a register file configured to store operands for multiple client circuits and arbitration circuitry configured to select from among multiple received requests to access the register file. In some embodiments, the apparatus includes first interface circuitry configured to provide access requests from a first client circuit to the arbitration circuitry and supplemental interface circuitry configured to receive unsuccessful requests from the first client circuit and provide the received unsuccessful requests to the arbitration circuitry. The supplemental interface circuitry may provide additional catch-up bandwidth to clients that lose arbitration, which may result in fairness during bandwidth shortages.Type: GrantFiled: August 22, 2019Date of Patent: August 3, 2021Assignee: Apple Inc.Inventors: Robert D. Kenney, Terence M. Potter
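A behavioral sketch of the supplemental-interface idea above, assuming a simple replay queue (`RegisterFileArbiter` and its retry policy are illustrative, not the patented circuit): requests that lose arbitration are captured and replayed ahead of fresh requests, giving losing clients catch-up bandwidth.

```python
from collections import deque

class RegisterFileArbiter:
    def __init__(self):
        self.retry = deque()          # supplemental interface: lost requests

    def arbitrate(self, requests):
        """Pick one winner per cycle; losers are queued for replay.

        Replayed requests win over fresh ones, modeling the catch-up
        bandwidth for clients that previously lost arbitration.
        """
        if self.retry:
            self.retry.extend(requests)   # fresh requests queue behind
            return self.retry.popleft()
        if not requests:
            return None
        winner, losers = requests[0], list(requests[1:])
        self.retry.extend(losers)
        return winner
```

Over three cycles with requests `["A", "B"]`, `["C"]`, `[]`, the arbiter grants A, then the replayed B, then C.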
-
Patent number: 11074079Abstract: A method of providing instructions to computer processing apparatus for improved event handling comprises the following. Instructions for execution on the computer processing apparatus are provided to an event processor generator. These instructions comprise a plurality of functional steps, a set of dependencies between the functional steps, and configuration data. The event processor generator creates instances of the functional steps from the instructions and represents the instances as directed acyclic graphs. The event processor generator identifies a plurality of event types and topologically sorts the directed acyclic graphs to determine a topologically ordered event path for each event type. The event processor generator then provides a revised set of instructions for execution on the computer processing apparatus in which original instructions have been replaced by instructions requiring each event type to be executed according to its topologically ordered event path.Type: GrantFiled: November 17, 2017Date of Patent: July 27, 2021Inventor: Greg Higgins
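The core operation here is a standard topological sort of the functional-step DAG. A self-contained sketch using Kahn's algorithm (step names and the `deps` mapping are illustrative):

```python
from collections import defaultdict, deque

def topo_order(steps, deps):
    """Return a topologically ordered event path.

    steps: iterable of step names.
    deps:  mapping of step -> list of steps it depends on.
    """
    indegree = {s: 0 for s in steps}
    out = defaultdict(list)
    for step, requires in deps.items():
        for producer in requires:
            out[producer].append(step)
            indegree[step] += 1
    ready = deque(s for s in steps if indegree[s] == 0)
    order = []
    while ready:
        s = ready.popleft()
        order.append(s)
        for dependent in out[s]:
            indegree[dependent] -= 1
            if indegree[dependent] == 0:
                ready.append(dependent)
    if len(order) != len(indegree):
        raise ValueError("dependency cycle: not a DAG")
    return order
```

For example, steps `parse -> enrich -> publish` come out in dependency order regardless of input order.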
-
Patent number: 11036514Abstract: A method and apparatus for performing an indexed data dependency instruction wakeup is disclosed. A scheduler may issue one or more instruction operations from a number of entries therein, including a first instruction operation. In a second entry, a comparison operation may be performed between a dependency index and an index of the first instruction operation. A match between the index of the first instruction and the dependency index in the second entry indicates a dependency of the corresponding instruction on the first instruction, and further indicates that the first instruction operation has issued. The dependency may be determined based solely on the match between the dependency index and the index of the first instruction. Responsive to determining that the first instruction operation has issued in the second entry, an indication that a corresponding second instruction operation is ready to issue may be provided.Type: GrantFiled: August 23, 2016Date of Patent: June 15, 2021Assignee: Apple Inc.Inventors: Sean M. Reynolds, Gokul V. Ganesan
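A software model of the indexed wakeup described above (class and method names are assumptions, not the patented circuit): each entry carries the index of its producer, and when an instruction issues, its index is broadcast and any entry whose dependency index matches is marked ready — the match alone establishes both the dependency and the fact that the producer has issued.

```python
class SchedulerEntry:
    def __init__(self, index, dep_index=None):
        self.index = index
        self.dep_index = dep_index        # index of the producer, if any
        self.ready = dep_index is None    # no dependency -> ready at once

class Scheduler:
    def __init__(self):
        self.entries = []

    def add(self, entry):
        self.entries.append(entry)

    def issue_one(self):
        """Issue the oldest ready entry and broadcast its index."""
        for i, e in enumerate(self.entries):
            if e.ready:
                issued = self.entries.pop(i)
                for other in self.entries:
                    # Indexed wakeup: a match on the dependency index
                    # means the producer has issued; mark consumer ready.
                    if other.dep_index == issued.index:
                        other.ready = True
                return issued.index
        return None
```

With a producer (index 1) and a dependent consumer (index 2, `dep_index=1`), the consumer becomes issuable only after the producer issues.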
-
Patent number: 11023243Abstract: Latency-based instruction reservation clustering in a scheduler circuit in a processor is disclosed. The scheduler circuit includes a plurality of latency-based reservation circuits each having an assigned producer instruction cycle latency. Producer instructions with the same cycle latency can be clustered in the same latency-based reservation circuit. Thus, the number of reservation entries is distributed among the plurality of latency-based reservation circuits to avoid or reduce an increase in the number of scheduling path connections and complexity in each reservation circuit to avoid or reduce an increase in scheduling latency. The scheduling path connections are reduced for a given number of reservation entries over a non-clustered pick circuit, because signals (e.g., wake-up signals, pick-up signals) used for scheduling instructions in each latency-based reservation circuit do not have to have the same clock cycle latency so as to not impact performance.Type: GrantFiled: July 22, 2019Date of Patent: June 1, 2021Assignee: Microsoft Technology Licensing, LLCInventors: Yusuf Cagatay Tekmen, Shivam Priyadarshi, Rodney Wayne Smith
-
Patent number: 10996954Abstract: A calculation processing apparatus includes a storing device that stores a plurality of memory access instructions decoded by a decoder and outputs the stored memory access instructions to a cache memory, a determiner that determines whether the storing device has capacity to store the plurality of memory access instructions, and an inhibitor. When the determiner determines that the storing device cannot store a first memory access instruction included in the plurality of memory access instructions, the inhibitor inhibits execution of a second memory access instruction, included in the plurality of memory access instructions and subsequent to the first memory access instruction, for a predetermined time period, regardless of the determiner's result for the second memory access instruction. The calculation processing apparatus thereby inhibits a switch of the order of a store instruction and a load instruction.Type: GrantFiled: October 7, 2019Date of Patent: May 4, 2021Assignee: FUJITSU LIMITEDInventors: Sota Sakashita, Yasunobu Akizuki
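A rough behavioral sketch of that inhibit window, under assumed semantics (the function name, buffer capacity, and window length are illustrative): when the buffer rejects an access, the following accesses are inhibited for a fixed number of cycles regardless of their own capacity check, so a later load cannot slip ahead of an earlier rejected store.

```python
def schedule_accesses(accesses, capacity, inhibit_cycles=2):
    """Model one access attempt per cycle against a bounded buffer.

    Returns a list of (op, outcome) pairs where outcome is
    "accepted", "rejected" (buffer full), or "inhibited" (blocked by
    the window opened when an earlier access was rejected).
    """
    buffer, outcomes = [], []
    inhibit_until, cycle = -1, 0
    for op in accesses:
        cycle += 1
        if cycle <= inhibit_until:
            outcomes.append((op, "inhibited"))   # blocked unconditionally
            continue
        if len(buffer) >= capacity:
            outcomes.append((op, "rejected"))
            inhibit_until = cycle + inhibit_cycles  # open the window
        else:
            buffer.append(op)
            outcomes.append((op, "accepted"))
    return outcomes
```

With capacity 1, a rejected second store inhibits the two following loads even though the buffer check is never consulted for them.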
-
Patent number: 10996994Abstract: A plurality of ordered lists of dispatch queues corresponding to a plurality of processing entities are maintained, wherein each dispatch queue includes one or more task control blocks or is empty. A determination is made as to whether a primary dispatch queue of a processing entity is empty in an ordered list of dispatch queues for the processing entity. In response to determining that the primary dispatch queue of the processing entity is empty, a task control block is selected for processing by the processing entity from another dispatch queue of the ordered list of dispatch queues for the processing entity, wherein the dispatch queue from which the task control block is selected meets a threshold criterion for the processing entity.Type: GrantFiled: February 22, 2019Date of Patent: May 4, 2021Assignee: International Business Machines CorporationInventors: Seamus J. Burke, Trung N. Nguyen, Louis A. Rasor
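A small sketch of the selection policy above, with an assumed threshold criterion (queue depth — the patent leaves the criterion abstract, so this is illustrative): serve the primary queue first, and steal from a later queue in the ordered list only when it is busy enough to justify stealing.

```python
def select_task(ordered_queues, steal_threshold=2):
    """Pick the next task control block for one processing entity.

    ordered_queues[0] is the primary dispatch queue; the rest are
    fallbacks consulted in order when the primary is empty.
    """
    primary = ordered_queues[0]
    if primary:
        return primary.pop(0)
    for q in ordered_queues[1:]:
        if len(q) >= steal_threshold:   # assumed threshold criterion
            return q.pop(0)
    return None
```

With an empty primary queue, a one-deep fallback is skipped and the first queue at or above the threshold is tapped.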
-
Patent number: 10990406Abstract: An instruction execution device includes a processor. The processor includes an instruction translator, a reorder buffer, an architecture register, and an execution unit. The instruction translator receives a macro-instruction and translates the macro-instruction into a first micro-instruction, a second micro-instruction and a third micro-instruction. The instruction translator marks the first micro-instruction and the second micro-instruction with the same atomic operation flag. The execution unit executes the first micro-instruction to generate a first execution result and to store the first execution result in a temporary register. The execution unit executes the second micro-instruction to generate a second execution result and to store the second execution result in the architecture register. The execution unit executes the third micro-instruction to read the first execution result from the temporary register and to store the first execution result in the architecture register.Type: GrantFiled: September 26, 2019Date of Patent: April 27, 2021Assignee: SHANGHAI ZHAOXIN SEMICONDUCTOR CO., LTD.Inventors: Penghao Zou, Zhi Zhang
-
Patent number: 10983800Abstract: A processor core having multiple parallel instruction execution slices and coupled to multiple dispatch queues provides flexible and efficient use of internal resources. The configuration of the execution slices is selectable so that capabilities of the processor core can be adjusted according to execution requirements for the instruction streams. A plurality of load-store slices coupled to the execution slices provides access to a plurality of cache slices that partition the lowest level of cache memory among the load-store slices.Type: GrantFiled: June 6, 2018Date of Patent: April 20, 2021Assignee: International Business Machines CorporationInventors: Lee Evan Eisen, Hung Qui Le, Jentje Leenstra, Jose Eduardo Moreira, Bruce Joseph Ronchetti, Brian William Thompto, Albert James Van Norstrand, Jr.
-
Patent number: 10977762Abstract: One embodiment provides for a general-purpose graphics processing unit comprising multiple processing elements having a single instruction, multiple thread (SIMT) architecture configured to perform hardware multithreading during execution of a plurality of thread groups. The plurality of thread groups can include one or more sub-groups of threads, with a first sub-group associated with a first thread group and a second sub-group associated with a second thread group. Data dependencies can be used to trigger the launch of threads, such that when a first thread in the second sub-group has a data dependency upon a first thread in the first sub-group, circuitry in the general-purpose graphics processing unit can launch at least the first thread in the second sub-group to execute in response to satisfaction of the data dependency.Type: GrantFiled: March 30, 2020Date of Patent: April 13, 2021Assignee: Intel CorporationInventors: Balaji Vembu, Altug Koker, Joydeep Ray
-
Patent number: 10936402Abstract: Aspects include copying a plurality of input data into a buffer of a processor configured to perform speculatively executing pipelined streaming of the input data. A bit counter maintains a difference in a number of input bits from the input data entering a pipeline of the processor and a number of the input bits consumed in the pipeline. The pipeline is flushed based on detecting an error. A portion of the input data is recirculated from the buffer into the pipeline based on a value of the bit counter.Type: GrantFiled: November 26, 2018Date of Patent: March 2, 2021Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Bulent Abali, Bartholomew Blaner, John J. Reilly
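A sketch of the bit-counter recovery mechanism above (the `StreamPipeline` class and its method names are assumptions): the counter tracks bits entered minus bits consumed, so on an error the pipeline can be flushed and exactly the unconsumed tail replayed from the input buffer.

```python
class StreamPipeline:
    def __init__(self, data_bits):
        self.buffer = list(data_bits)   # copy of all input data
        self.entered = 0                # bits that entered the pipeline
        self.consumed = 0               # bits the pipeline has consumed

    def feed(self, n):
        self.entered += n

    def consume(self, n):
        self.consumed += n

    def flush_and_recirculate(self):
        """On error: flush in-flight bits, return the bits to replay."""
        outstanding = self.entered - self.consumed   # the bit counter
        self.entered = self.consumed                 # pipeline now empty
        return self.buffer[self.consumed:self.consumed + outstanding]
```

After feeding 6 bits of `"10110100"` and consuming 4, a flush replays exactly the 2 in-flight bits.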
-
Patent number: 10915328Abstract: An apparatus and method for offloading iterative, parallel work to a data parallel cluster. For example, one embodiment of a processor comprises: a host processor to execute a primary thread; a data parallel cluster coupled to the host processor over a high speed interconnect, the data parallel cluster comprising a plurality of execution lanes to perform parallel execution of one or more secondary threads related to the primary thread; and a data parallel cluster controller integral to the host processor to offload processing of the one or more secondary threads to the data parallel cluster in response to one of the cores executing a parallel processing call instruction from the primary thread.Type: GrantFiled: December 14, 2018Date of Patent: February 9, 2021Assignee: Intel CorporationInventors: Jonathan Pearce, David Sheffield, Srikanth Srinivasan, Jeffrey Cook, Deborah Marr
-
Patent number: 10915317Abstract: The present disclosure relates to a computing device with a multiple pipeline architecture. The multiple pipeline architecture comprises a first and a second pipeline that run concurrently, where the first pipeline runs at least one cycle ahead of the second pipeline. Special number detection is utilized on the first pipeline, where a special number is a numerical value which yields a predictable result. Upon the detection of a special number, a computation is optimized.Type: GrantFiled: December 10, 2018Date of Patent: February 9, 2021Assignee: ALIBABA GROUP HOLDING LIMITEDInventors: Liang Han, Xiaowei Jiang
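A toy model of special-number detection for a multiply, using common special cases (0 and 1) as an assumed example of values that yield predictable results; the split into "leading check" and "trailing compute" mirrors the one-cycle-ahead pipeline only conceptually.

```python
def multiply_with_special_detect(a, b):
    """Return (result, path) where path says whether the full multiply ran.

    Leading pipeline: special-number check, one cycle ahead.
    Trailing pipeline: the full computation, skipped when predictable.
    """
    if a == 0 or b == 0:
        return 0.0, "skipped"          # 0 * x is predictable
    if a == 1:
        return float(b), "skipped"     # 1 * x == x
    if b == 1:
        return float(a), "skipped"
    return float(a) * float(b), "computed"
```

Operands containing a special number bypass the multiplier entirely; ordinary operands take the full path.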
-
Patent number: 10908880Abstract: An integrated circuit for processing audio signals from a microphone assembly, combinations thereof and methods therefor, including a multi-issue processor configured to execute multiple instructions concurrently and connectable to a memory with a plurality of locations each represented by a corresponding index. Bit-reversal is performed on a sequence of audio data bits stored in memory by concurrently performing a load or store operation related to a first index and determining whether to perform a load operation for a second index.Type: GrantFiled: October 18, 2019Date of Patent: February 2, 2021Assignee: Knowles Electronics, LLCInventor: Leonardo Rub
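The operation described is the classic bit-reversal permutation used before an FFT. A self-contained sketch (the pairing of a swap's two memory operations stands in, loosely, for the concurrent load/store the multi-issue processor performs):

```python
def bit_reverse_index(i, bits):
    """Reverse the low `bits` bits of index i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def bit_reverse_permute(data):
    """In-place bit-reversal reordering of a buffer of 2**n samples."""
    n = len(data)
    bits = n.bit_length() - 1
    for i in range(n):
        j = bit_reverse_index(i, bits)
        if j > i:                      # swap each index pair exactly once
            data[i], data[j] = data[j], data[i]
    return data
```

For an 8-sample buffer, index 1 (`001`) swaps with 4 (`100`) and 3 (`011`) with 6 (`110`), while palindromic indices stay put.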
-
Patent number: 10884753Abstract: Aspects include monitoring a number of instructions of a first type dispatched to a first shared port of an issue queue of a processor and determining whether the number of instructions of the first type dispatched to the first shared port exceeds a port selection threshold. An instruction of a third type is dispatched to a second shared port of the issue queue associated with a plurality of instructions of a second type based on determining that the number of instructions of the first type dispatched to the first shared port exceeds the port selection threshold. The instruction of the third type is dispatched to the first shared port of the issue queue associated with a plurality of instructions of the first type based on determining that the number of instructions of the first type dispatched to the first shared port does not exceed the port selection threshold.Type: GrantFiled: November 30, 2017Date of Patent: January 5, 2021Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATIONInventors: Balaram Sinharoy, Joel A. Silberman, Brian W. Thompto
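A sketch of the port-selection rule above (`PortSelector`, the port names, and the instruction-type labels are illustrative): a third-type instruction goes to the second shared port when the first port's count of first-type instructions exceeds the threshold, and to the first port otherwise.

```python
class PortSelector:
    def __init__(self, threshold):
        self.threshold = threshold
        self.first_port_count = 0      # type-1 instructions seen on port 0

    def dispatch(self, kind):
        if kind == "type1":
            self.first_port_count += 1
            return "port0"             # first shared port
        if kind == "type2":
            return "port1"             # second shared port
        # Third type: steer away from a saturated first port.
        if self.first_port_count > self.threshold:
            return "port1"
        return "port0"
```

After three type-1 dispatches with a threshold of 2, a type-3 instruction is steered to the second port; on a cold selector it stays on the first.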