Patents Examined by Shawn Doman

Stateful microcode branching

Patent number: 11977890

Abstract: Stateful microbranch instructions, including: generating, based on an instruction, a first one or more microinstructions including a stateful microbranch instruction, wherein the stateful microbranch instruction includes: an address of a next instruction after the instruction; a branch target address; one or more microcode attributes; and executing the first one or more microinstructions.

Type: Grant

Filed: December 30, 2021

Date of Patent: May 7, 2024

Assignee: ADVANCED MICRO DEVICES, INC.

Inventors: Magiting M. Talisayon, Luca Schiano, Neil N. Marketkar, Yueh-Chuan Tzeng

Hierarchical thread scheduling based on multiple barriers

Patent number: 11977895

Abstract: Examples described herein relate to a graphics processing unit (GPU) coupled to the memory device, the GPU configured to: execute an instruction thread; determine if a dual directional signal barrier is associated with the instruction thread; and based on clearance of the dual directional signal barrier for a particular signal barrier identifier and a mode of operation, indicate a clearance of the dual directional signal barrier for the mode of operation, wherein the dual directional signal barrier is to provide a single barrier to gate activity of one or more producers based on activity of one or more consumers or gate activity of one or more consumers based on activity of one or more producers.

Type: Grant

Filed: December 22, 2020

Date of Patent: May 7, 2024

Assignee: Intel Corporation

Inventors: Sabareesh Ganapathy, Fangwen Fu, Hong Jiang, James Valerio

Use of multiple different variants of floating point number formats in floating point operations on a per-operand basis

Patent number: 11966740

Abstract: A processor comprising: a register file comprising a group of operand registers for holding data values, each operand register being a fixed number of bits in length for holding a respective data value of that length; and processing logic comprising floating point logic for performing floating point operations on data values in the register file, the floating point logic is configured to process the fixed number of bits in the respective data value according to a floating point format comprising a set of mantissa bits and a set of exponent bits. The processing logic is operable to select between a plurality of different variants of the floating point format, at least some of the variants having a different size sets of mantissa bits and exponent bits relative to one another.

Type: Grant

Filed: August 10, 2021

Date of Patent: April 23, 2024

Assignee: Graphcore Limited

Inventors: Mrudula Gore, Alan Alexander

Instruction scheduling in a processor using operation source parent tracking

Patent number: 11934834

Abstract: Instruction scheduling in a processor using operation source parent tracking. A source parent is a producer instruction whose execution generates a produced value consumed by a consumer instruction. The processor is configured to track identifying operation source parent information for instructions processed in a pipeline and providing such operation source parent information to a scheduling circuit along with the associated consumer instruction. The scheduling circuit is configured to perform instruction scheduling using operation source parent tracking on received instruction(s) to be scheduled for execution. The processor is configured to compare sources and destinations for each of the instructions to be scheduled based on the operation source parent information to determine instructions ready for scheduling for execution.

Type: Grant

Filed: October 19, 2021

Date of Patent: March 19, 2024

Assignee: Ampere Computing LLC

Inventors: Sean Philip Mirkes, Jason Anthony Bessette

Stream data unit with multiple head registers

Patent number: 11934833

Abstract: A streaming engine employed in a digital signal processor specifies a fixed read only data stream. Once fetched the data stream is stored in two head registers for presentation to functional units in the fixed order. Data use by the functional unit is preferably controlled using the input operand fields of the corresponding instruction. A first read only operand coding supplies data from the first head register. A first read/advance operand coding supplies data from the first head register and also advances the stream to the next sequential data elements. Corresponding second read only operand coding and second read/advance operand coding operate similarly with the second head register. A third read only operand coding supplies double width data from both head registers.

Type: Grant

Filed: December 21, 2021

Date of Patent: March 19, 2024

Assignee: Texas Instruments Incorporated

Inventor: Joseph Zbiciak

Tracing synchronization activity of a processing unit

Patent number: 11907772

Abstract: A device comprising: a processing unit comprising at least one processor configured to: participate in barrier synchronisations, each of which separates a compute phase of the at least one processor from an exchange phase for the at least one processor; and exchange sync messages with a sync controller hardware unit so as to co-ordinate each of the barrier synchronisations; and sync trace circuitry configured to: receive one or more of the sync messages; and in response to each of the one or more of the sync messages, provide sync trace information for output from the device, the sync trace information comprising timing information associated with the respective sync message.

Type: Grant

Filed: August 24, 2021

Date of Patent: February 20, 2024

Assignee: GRAPHCORE LIMITED

Inventor: Daniel John Pelham Wilkinson

Loop execution in a reconfigurable compute fabric using flow controllers for respective synchronous flows

Patent number: 11907718

Abstract: Various examples are directed to systems and methods for executing a loop in a reconfigurable compute fabric. A first flow controller may initiate a first thread at a first synchronous flow to execute a first portion of a first iteration of the loop. A second flow controller may receive a first asynchronous message instructing the second flow controller to initiate a first thread at a second synchronous flow to execute a second portion of the first iteration. The second flow controller may determine that the first iteration of the loop is the last iteration of the loop to be executed and initiate the first thread at the second synchronous flow with a last iteration flag set.

Type: Grant

Filed: August 18, 2021

Date of Patent: February 20, 2024

Assignee: Micron Technology, Inc.

Inventors: Douglas Vanesko, Bryan Hornung, Patrick Estep

Stack pointer instruction buffer for zero-cycle loads

Patent number: 11900118

Abstract: An apparatus includes a rescue buffer circuit, a store queue circuit, and a control circuit. The rescue buffer circuit may be configured to retain address information related to store instructions. The store queue circuit may be configured to buffer dependency information related to a particular store instruction until the particular store instruction is released to be executed. The control circuit may be configured to cause a subset of the dependency information for the particular store instruction to be written to the rescue buffer circuit. The rescue buffer circuit may be configured to retain the subset after the dependency information has been released from the store queue circuit, and to perform a subsequent load instruction corresponding to a memory location associated with the particular store instruction using the subset of the dependency information from the rescue buffer circuit.

Type: Grant

Filed: August 5, 2022

Date of Patent: February 13, 2024

Assignee: Apple Inc.

Inventors: John D. Pape, Francesco Spadini, Zhaoxiang Jin

Method of debugging a processor that executes vertices of an application, each vertex being assigned to a programming thread of the processor

Patent number: 11893390

Abstract: A method for debugging a processor which is executing vertices of a software application is described. Each vertex is assigned to a programming thread of the processor. The processor has debug hardware for raising exceptions in certain break conditions. The method comprises inspecting a vertex identifier, comparing the vertex identifier and raising an instruction exception event for the programming thread if the vertex identifier assigned to the thread matches the vertex break identifier in the debug hardware. Exceptions are raised based on identified vertices, rather than just individual instructions or instruction addresses.

Type: Grant

Filed: July 13, 2022

Date of Patent: February 6, 2024

Assignee: GRAPHCORE LIMITED

Inventors: Alan Graham Alexander, Richard Luke Southwell Osborne, Matthew David Fyles

Synchronization mechanisms for a multi-core processor using wait commands having either a blocking or a non-blocking state

Patent number: 11892972

Abstract: Systems, apparatuses and methods suitable for optimizing synchronization mechanisms for multi-core processors are provided. The synchronizing mechanisms may be optimized by receiving a command stream which comprises a plurality of commands including one or more wait commands, wherein each wait command has an associated state and one or more associated conditions; sequentially processing each command in the command stream until a wait command is reached; checking the state associated with the wait command to be processed, wherein if said state is a blocking state, further processing of commands in the command stream is paused until each of said wait command's associated conditions are met, and wherein if said state is a non-blocking state, the next command in the command stream is retrieved and processed.

Type: Grant

Filed: September 8, 2021

Date of Patent: February 6, 2024

Assignee: Arm Limited

Inventors: Aaron Debattista, Jared Corey Smolens
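
The command-stream behavior described in the abstract for patent 11892972 above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the `Wait` class, the `process` function, the `after_polls` helper, and the `max_polls` guard are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union


@dataclass
class Wait:
    """Hypothetical wait command: a state flag plus its conditions."""
    blocking: bool
    conditions: List[Callable[[], bool]] = field(default_factory=list)


def after_polls(n: int) -> Callable[[], bool]:
    """Illustrative condition: False for the first n checks, then True."""
    remaining = {"left": n}

    def cond() -> bool:
        remaining["left"] -= 1
        return remaining["left"] < 0

    return cond


def process(stream: List[Union[str, Wait]], max_polls: int = 1000) -> list:
    """Process commands sequentially and return a log of what happened.

    Ordinary commands are modeled as plain strings (an assumption).
    """
    log = []
    for cmd in stream:
        if isinstance(cmd, Wait):
            if cmd.blocking:
                # Blocking state: pause until every condition is met.
                polls = 0
                while not all(c() for c in cmd.conditions):
                    polls += 1
                    if polls >= max_polls:
                        raise TimeoutError("wait conditions never met")
                log.append(("wait-blocked", polls))
            else:
                # Non-blocking state: move straight on to the next command.
                log.append(("wait-passed", 0))
            continue
        log.append(("run", cmd))
    return log


demo_log = process([
    "a",
    Wait(blocking=True, conditions=[after_polls(2)]),
    "b",
    Wait(blocking=False, conditions=[lambda: False]),
    "c",
])
```

In this model the non-blocking wait lets command "c" run even though its condition is never satisfied, while the blocking wait holds up "b" until its condition becomes true.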

Shifting input values within input buffer of neural network inference circuit

Patent number: 11886979

Abstract: Some embodiments provide a method for a neural network inference circuit that executes a neural network. The method loads a first set of inputs into an input buffer and computes a first dot product between the first set of inputs and a set of weights. The method shifts the first set of inputs in the buffer while loading a second set of inputs into the buffer such that a first subset of the first set of inputs is removed from the buffer, a second subset of the first set of inputs is moved to new locations in the buffer, and a second set of inputs are loaded into locations in the buffer vacated by the shifting. The method computes a second dot product between (i) the second set of inputs and the second subset of the first set of inputs and (ii) the set of weights.

Type: Grant

Filed: March 15, 2019

Date of Patent: January 30, 2024

Assignee: PERCEIVE CORPORATION

Inventors: Kenneth Duong, Jung Ko, Steven L. Teig
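
The shift-and-load step in the abstract for patent 11886979 above resembles a sliding window over the inputs. The sketch below models it with a Python list. The function names, the shift direction (toward the front of the buffer), and the sample values are assumptions made for illustration, not details taken from the patent.

```python
def dot(values, weights):
    """Dot product of two equal-length sequences."""
    return sum(v * w for v, w in zip(values, weights))


def shift_and_load(buffer, new_inputs):
    """Shift the buffer in place while loading new inputs.

    The first len(new_inputs) entries are removed, the surviving entries
    move to new locations at the front, and the new inputs fill the
    locations vacated at the back.
    """
    k = len(new_inputs)
    if k > len(buffer):
        raise ValueError("cannot load more inputs than the buffer holds")
    buffer[:] = buffer[k:] + list(new_inputs)


weights = [1, 0, -1, 2]
buffer = [1, 2, 3, 4]           # first set of inputs
first = dot(buffer, weights)    # first dot product

shift_and_load(buffer, [5, 6])  # second set of inputs
second = dot(buffer, weights)   # uses [3, 4] from the first set plus [5, 6]
```

Only the two new values are loaded for the second dot product; the two retained values are reused from the buffer.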

Dependency skipping in a load-compare-jump sequence of instructions by incorporating compare functionality into the jump instruction and auto-finishing the compare instruction

Patent number: 11886883

Abstract: A method of performing instructions in a computer processor architecture includes determining that a load instruction is being dispatched. Destination related data of the load instruction is written into a mapper of the architecture. A determination that a compare immediate instruction is being dispatched is made. A determination that a branch conditional instruction is being dispatched is made. The branch conditional instruction is configured to wait until the load instruction produces a result before the branch conditional instruction issues and executes. The branch conditional instruction skips waiting for a finish of the compare immediate instruction.

Type: Grant

Filed: August 26, 2021

Date of Patent: January 30, 2024

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Nicholas R. Orzol, Mehul Patel, Dung Q. Nguyen, Brian D. Barrick, Richard J. Eickemeyer, John B Griswell, Jr., Balaram Sinharoy, Brian W. Thompto, Ophir Erez

Deep learning accelerator and random access memory with separate memory access connections

Patent number: 11887647

Abstract: Systems, devices, and methods related to a Deep Learning Accelerator and memory are described. An integrated circuit may be configured to execute instructions with matrix operands and configured with: random access memory configured to store instructions executable by the Deep Learning Accelerator and store matrices of an Artificial Neural Network; a connection between the random access memory and the Deep Learning Accelerator; a first interface to a memory controller of a Central Processing Unit; and a second interface to a direct memory access controller. While the Deep Learning Accelerator is using the random access memory to process current input to the Artificial Neural Network in generating current output from the Artificial Neural Network, the direct memory access controller may concurrently load next input into the random access memory; and at the same time, the Central Processing Unit may concurrently retrieve prior output from the random access memory.

Type: Grant

Filed: April 9, 2020

Date of Patent: January 30, 2024

Assignee: Micron Technology, Inc.

Inventors: Poorna Kale, Jaime Cummins

Management of thrashing in a GPU

Patent number: 11875197

Abstract: Systems, apparatuses, and methods for managing a number of wavefronts permitted to concurrently execute in a processing system. An apparatus includes a register file with a plurality of registers and a plurality of compute units configured to execute wavefronts. A control unit of the apparatus is configured to allow a first number of wavefronts to execute concurrently on the plurality of compute units. The control unit is configured to allow no more than a second number of wavefronts to execute concurrently on the plurality of compute units, wherein the second number is less than the first number, in response to detection that thrashing of the register file is above a threshold. The control unit is configured to detect said thrashing based at least in part on a number of registers in use by executing wavefronts that spill to memory.

Type: Grant

Filed: December 29, 2020

Date of Patent: January 16, 2024

Assignee: Advanced Micro Devices, Inc.

Inventors: Bradford Michael Beckmann, Steven Tony Tye, Brian L. Sumner, Nicolai Hähnle
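
The throttling rule in the abstract for patent 11875197 above reduces to a threshold comparison. The sketch below shows that decision only. The function name, parameter names, and sample numbers are hypothetical, and the spilled-register count stands in for whatever thrashing measure the hardware actually computes.

```python
def allowed_wavefronts(spilled_registers, threshold, first_limit, second_limit):
    """Return how many wavefronts may execute concurrently.

    first_limit applies normally. The smaller second_limit applies when
    the thrashing measure (here, the number of in-use registers that
    spill to memory) is above the threshold.
    """
    if second_limit >= first_limit:
        raise ValueError("second_limit must be less than first_limit")
    if spilled_registers > threshold:
        return second_limit
    return first_limit


normal = allowed_wavefronts(spilled_registers=10, threshold=64,
                            first_limit=40, second_limit=16)
throttled = allowed_wavefronts(spilled_registers=200, threshold=64,
                               first_limit=40, second_limit=16)
```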

Dot product operations on sparse matrix elements

Patent number: 11842423

Abstract: Embodiments described herein include software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. One embodiment provides for data aware sparsity via compressed bitstreams. One embodiment provides for block sparse dot product instructions. One embodiment provides for a depth-wise adapter for a systolic array.

Type: Grant

Filed: December 15, 2020

Date of Patent: December 12, 2023

Assignee: Intel Corporation

Inventors: Abhishek Appu, Subramaniam Maiyuran, Mike Macpherson, Fangwen Fu, Jiasheng Chen, Varghese George, Vasanth Ranganathan, Ashutosh Garg, Joydeep Ray

Managing out-of-order retirement of instructions based on received instructions indicating start or stop to out-of-order retirement

Patent number: 11842198

Abstract: Retiring instructions out-of-order includes: receiving processor instructions comprising two or more and fewer than all processor instructions generated based on a program, where the processor instructions include a first instruction and a second instruction such that the first instruction precedes the second instruction in a program order of the program; receiving a start instruction that immediately precedes the processor instructions and indicates that the processor instructions are to be retired out-of-order; receiving a stop instruction that immediately succeeds the processor instructions and indicates a stop to out-of-order instruction retirement; and, in response to completing execution of the second instruction before completing execution of the first instruction, retiring the second instruction before retiring the first instruction.

Type: Grant

Filed: November 1, 2021

Date of Patent: December 12, 2023

Assignee: Marvell Asia Pte, Ltd.

Inventor: Shubhendu Sekhar Mukherjee

Systolic multiply delayed accumulate processor architecture

Patent number: 11842169

Abstract: Systems and methods are provided to perform multiplication-delayed-addition operations in a systolic array to increase clock speeds, reduce circuit area, and/or reduce dynamic power consumption. Each processing element in the systolic array can have a pipeline configured to perform a multiplication during a first systolic interval and to perform an accumulation during a second systolic interval. The multiplication result from the first systolic interval can be stored in a delay register for use by the accumulator during the second systolic interval. A skip detection circuit can be used to skip one or more of the multiplication, storing in the delay register, and the addition during skip conditions for improved energy efficiency.

Type: Grant

Filed: September 25, 2019

Date of Patent: December 12, 2023

Assignee: Amazon Technologies, Inc.

Inventor: Thomas Elmer
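
The two-interval pipeline in the abstract for patent 11842169 above can be modeled as a processing element that multiplies in one step and accumulates the stored product in the next. This is a behavioral sketch only. The class and attribute names are hypothetical, and treating a zero operand as the skip condition is an assumption, since the abstract does not say what the skip conditions are.

```python
class ProcessingElement:
    """Toy model of one systolic-array element with a delay register."""

    def __init__(self):
        self.delay = None  # product stored during the previous interval
        self.acc = 0       # running accumulation
        self.skipped = 0   # number of multiplies skipped

    def step(self, a=None, b=None):
        """Advance one systolic interval.

        Accumulate the product stored in the previous interval, then
        multiply the new operands (if any) and store the result.
        """
        if self.delay is not None:
            self.acc += self.delay
            self.delay = None
        if a is None or b is None:
            return  # no new operands: drain the pipeline
        if a == 0 or b == 0:
            self.skipped += 1  # assumed skip condition: nothing to add
            return
        self.delay = a * b


pe = ProcessingElement()
for a, b in [(2, 3), (0, 5), (4, 1)]:
    pe.step(a, b)
pe.step()  # one extra interval flushes the last stored product
```

The extra draining step is needed because each product reaches the accumulator one interval after it is computed.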

Method and apparatus for dynamically simplifying processor instructions

Patent number: 11816488

Abstract: There are provided methods and devices for dynamically simplifying processor instructions. A method includes receiving, at a computing device, processor instructions and determining, by the computing device, if instruction simplification is enabled for an instruction being processed. The method further includes determining, by the computing device, from an instruction simplification table if the instruction is capable of being simplified and scheduling, by the computing device, a simplified instruction based on the determination from the instruction simplification table. A device includes a processor, and a non-transient computer readable memory having stored thereon instructions which when executed by the processor configure the device to execute the methods disclosed herein.

Type: Grant

Filed: November 10, 2021

Date of Patent: November 14, 2023

Assignee: HUAWEI TECHNOLOGIES CO., LTD.

Inventors: Henry Fangli Kao, Shehab Yomn Abdellatif Elsayed, Tomasz Sebastian Czajkowski, Reza Azimi, Ehsan Amiri

Method of converting extended instructions based on an emulation flag and retirement of corresponding microinstructions, device and system using the same

Patent number: 11816487

Abstract: An instruction conversion device, an instruction conversion method, an instruction conversion system, and a processor are provided. The instruction conversion device includes a monitor for determining whether a ready-for-execution instruction is an instruction that belongs to a new instruction set or an extended instruction set, wherein the new instruction set and the extended instruction set have the same type of the instruction set architecture as that of the processor. If the ready-for-execution instruction is determined as an extended instruction, this extended instruction is converted into a converted instruction sequence by means of the conversion system, this converted instruction sequence is then sent to the processor for executions, thereby extending the lifespans of the electronic devices embodied with old-version processors.

Type: Grant

Filed: September 10, 2021

Date of Patent: November 14, 2023

Assignee: Shanghai Zhaoxin Semiconductor Co., Ltd.

Inventors: Weilin Wang, Yingbing Guan, Mengchen Yang

Neuron cache-based hardware branch prediction

Patent number: 11803386

Abstract: A branch prediction system includes a neuron cache and logic coupled to the neuron cache. The neuron cache includes one or more weights of a neural network model trained for one or more selected code sections, and the logic is to be used with the neuron cache to predict a target address for a branch instruction of the one or more selected code sections.

Type: Grant

Filed: September 16, 2021

Date of Patent: October 31, 2023

Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION

Inventors: Satish Kumar Sadasivam, Shruti Saxena, Puneeth A. H. Bhat
