Patents Examined by Shawn Doman
  • Patent number: 11977895
    Abstract: Examples described herein relate to a graphics processing unit (GPU) coupled to the memory device, the GPU configured to: execute an instruction thread; determine if a dual directional signal barrier is associated with the instruction thread; and based on clearance of the dual directional signal barrier for a particular signal barrier identifier and a mode of operation, indicate a clearance of the dual directional signal barrier for the mode of operation, wherein the dual directional signal barrier is to provide a single barrier to gate activity of one or more producers based on activity of one or more consumers or gate activity of one or more consumers based on activity of one or more producers.
    Type: Grant
    Filed: December 22, 2020
    Date of Patent: May 7, 2024
    Assignee: Intel Corporation
    Inventors: Sabareesh Ganapathy, Fangwen Fu, Hong Jiang, James Valerio
  • Patent number: 11977890
    Abstract: Stateful microbranch instructions, including: generating, based on an instruction, a first one or more microinstructions including a stateful microbranch instruction, wherein the stateful microbranch instruction includes: an address of a next instruction after the instruction; a branch target address; one or more microcode attributes; and executing the first one or more microinstructions.
    Type: Grant
    Filed: December 30, 2021
    Date of Patent: May 7, 2024
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Magiting M. Talisayon, Luca Schiano, Neil N. Marketkar, Yueh-Chuan Tzeng
  • Patent number: 11977896
    Abstract: An apparatus, method and computer program, the apparatus comprising processing circuitry to execute instructions, issue circuitry to issue the instructions for execution by the processing circuitry, and candidate instruction storage circuitry to store a plurality of condition-dependent instructions, each specifying at least one condition. The issue circuitry is configured to issue a given condition-dependent instruction in response to a determination or a prediction of the at least one condition specified by the given condition-dependent instruction being met, and when the given condition-dependent instruction is a sequence-start instruction, the issue circuitry is responsive to the determination or prediction to issue a sequence of instructions comprising the sequence-start instruction and at least one subsequent instruction.
    Type: Grant
    Filed: September 12, 2022
    Date of Patent: May 7, 2024
    Assignee: Arm Limited
    Inventors: Matthew James Walker, Mbou Eyole, Giacomo Gabrielli, Balaji Venu, Wei Wang
  • Patent number: 11966740
    Abstract: A processor comprising: a register file comprising a group of operand registers for holding data values, each operand register being a fixed number of bits in length for holding a respective data value of that length; and processing logic comprising floating point logic for performing floating point operations on data values in the register file, the floating point logic is configured to process the fixed number of bits in the respective data value according to a floating point format comprising a set of mantissa bits and a set of exponent bits. The processing logic is operable to select between a plurality of different variants of the floating point format, at least some of the variants having a different size sets of mantissa bits and exponent bits relative to one another.
    Type: Grant
    Filed: August 10, 2021
    Date of Patent: April 23, 2024
    Assignee: Graphcore Limited
    Inventors: Mrudula Gore, Alan Alexander
  • Patent number: 11934834
    Abstract: Instruction scheduling in a processor using operation source parent tracking. A source parent is a producer instruction whose execution generates a produced value consumed by a consumer instruction. The processor is configured to track identifying operation source parent information for instructions processed in a pipeline and providing such operation source parent information to a scheduling circuit along with the associated consumer instruction. The scheduling circuit is configured to perform instruction scheduling using operation source parent tracking on received instruction(s) to be scheduled for execution. The processor is configured to compare sources and destinations for each of the instructions to be scheduled based on the operation source parent information to determine instructions ready for scheduling for execution.
    Type: Grant
    Filed: October 19, 2021
    Date of Patent: March 19, 2024
    Assignee: Ampere Computing LLC
    Inventors: Sean Philip Mirkes, Jason Anthony Bessette
  • Patent number: 11934833
    Abstract: A streaming engine employed in a digital signal processor specifies a fixed read only data stream. Once fetched the data stream is stored in two head registers for presentation to functional units in the fixed order. Data use by the functional unit is preferably controlled using the input operand fields of the corresponding instruction. A first read only operand coding supplies data from the first head register. A first read/advance operand coding supplies data from the first head register and also advances the stream to the next sequential data elements. Corresponding second read only operand coding and second read/advance operand coding operate similarly with the second head register. A third read only operand coding supplies double width data from both head registers.
    Type: Grant
    Filed: December 21, 2021
    Date of Patent: March 19, 2024
    Assignee: Texas Instruments Incorporated
    Inventor: Joseph Zbiciak
  • Patent number: 11907772
    Abstract: A device comprising: a processing unit comprising at least one processor configured to: participate in barrier synchronisations, each of which separates a compute phase of the at least one processor from an exchange phase for the at least one processor; and exchange sync messages with a sync controller hardware unit so as to co-ordinate each of the barrier synchronisations; and sync trace circuitry configured to: receive one or more of the sync messages; and in response to each of the one or more of the sync messages, provide sync trace information for output from the device, the sync trace information comprising timing information associated with the respective sync message.
    Type: Grant
    Filed: August 24, 2021
    Date of Patent: February 20, 2024
    Assignee: GRAPHCORE LIMITED
    Inventor: Daniel John Pelham Wilkinson
  • Patent number: 11907718
    Abstract: Various examples are directed to systems and methods for executing a loop in a reconfigurable compute fabric. A first flow controller may initiate a first thread at a first synchronous flow to execute a first portion of a first iteration of the loop. A second flow controller may receive a first asynchronous message instructing the second flow controller to initiate a first thread at a second synchronous flow to execute a second portion of the first iteration. The second flow controller may determine that the first iteration of the loop is the last iteration of the loop to be executed and initiate the first thread at the second synchronous flow with a last iteration flag set.
    Type: Grant
    Filed: August 18, 2021
    Date of Patent: February 20, 2024
    Assignee: Micron Technology, Inc.
    Inventors: Douglas Vanesko, Bryan Hornung, Patrick Estep
  • Patent number: 11900118
    Abstract: An apparatus includes a rescue buffer circuit, a store queue circuit, and a control circuit. The rescue buffer circuit may be configured to retain address information related to store instructions. The store queue circuit may be configured to buffer dependency information related to a particular store instruction until the particular store instruction is released to be executed. The control circuit may be configured to cause a subset of the dependency information for the particular store instruction to be written to the rescue buffer circuit. The rescue buffer circuit may be configured to retain the subset after the dependency information has been released from the store queue circuit, and to perform a subsequent load instruction corresponding to a memory location associated with the particular store instruction using the subset of the dependency information from the rescue buffer circuit.
    Type: Grant
    Filed: August 5, 2022
    Date of Patent: February 13, 2024
    Assignee: Apple Inc.
    Inventors: John D. Pape, Francesco Spadini, Zhaoxiang Jin
  • Patent number: 11893390
    Abstract: A method for debugging a processor which is executing vertices of a software application is described. Each vertex is assigned to a programming thread of the processor. The processor has debug hardware for raising exceptions in certain break conditions. The method comprises inspecting a vertex identifier, comparing the vertex identifier and raising an instruction exception event for the programming thread if the vertex identifier assigned to the thread matches the vertex break identifier in the debug hardware. Exceptions are raised based on identified vertices, rather than just individual instructions or instruction addresses.
    Type: Grant
    Filed: July 13, 2022
    Date of Patent: February 6, 2024
    Assignee: GRAPHCORE LIMITED
    Inventors: Alan Graham Alexander, Richard Luke Southwell Osborne, Matthew David Fyles
  • Patent number: 11892972
    Abstract: Systems, apparatuses and methods suitable for optimizing synchronization mechanisms for multi-core processors are provided. The synchronizing mechanisms may be optimized by receiving a command stream which comprises a plurality of commands including one or more wait commands, wherein each wait command has an associated state and one or more associated conditions; sequentially processing each command in the command stream until a wait command is reached; checking the state associated with the wait command to be processed, wherein if said state is a blocking state, further processing of commands in the command stream is paused until each of said wait command's associated conditions are met, and wherein if said state is a non-blocking state, the next command in the command stream is retrieved and processed.
    Type: Grant
    Filed: September 8, 2021
    Date of Patent: February 6, 2024
    Assignee: Arm Limited
    Inventors: Aaron Debattista, Jared Corey Smolens
  • Patent number: 11887647
    Abstract: Systems, devices, and methods related to a Deep Learning Accelerator and memory are described. An integrated circuit may be configured to execute instructions with matrix operands and configured with: random access memory configured to store instructions executable by the Deep Learning Accelerator and store matrices of an Artificial Neural Network; a connection between the random access memory and the Deep Learning Accelerator; a first interface to a memory controller of a Central Processing Unit; and a second interface to a direct memory access controller. While the Deep Learning Accelerator is using the random access memory to process current input to the Artificial Neural Network in generating current output from the Artificial Neural Network, the direct memory access controller may concurrently load next input into the random access memory; and at the same time, the Central Processing Unit may concurrently retrieve prior output from the random access memory.
    Type: Grant
    Filed: April 9, 2020
    Date of Patent: January 30, 2024
    Assignee: Micron Technology, Inc.
    Inventors: Poorna Kale, Jaime Cummins
  • Patent number: 11886979
    Abstract: Some embodiments provide a method for a neural network inference circuit that executes a neural network. The method loads a first set of inputs into an input buffer and computes a first dot product between the first set of inputs and a set of weights. The method shifts the first set of inputs in the buffer while loading a second set of inputs into the buffer such that a first subset of the first set of inputs is removed from the buffer, a second subset of the first set of inputs is moved to new locations in the buffer, and a second set of inputs are loaded into locations in the buffer vacated by the shifting. The method computes a second dot product between (i) the second set of inputs and the second subset of the first set of inputs and (ii) the set of weights.
    Type: Grant
    Filed: March 15, 2019
    Date of Patent: January 30, 2024
    Assignee: PERCEIVE CORPORATION
    Inventors: Kenneth Duong, Jung Ko, Steven L. Teig
  • Patent number: 11886883
    Abstract: A method of performing instructions in a computer processor architecture includes determining that a load instruction is being dispatched. Destination related data of the load instruction is written into a mapper of the architecture. A determination that a compare immediate instruction is being dispatched is made. A determination that a branch conditional instruction is being dispatched is made. The branch conditional instruction is configured to wait until the load instruction produces a result before the branch conditional instruction issues and executes. The branch conditional instruction skips waiting for a finish of the compare immediate instruction.
    Type: Grant
    Filed: August 26, 2021
    Date of Patent: January 30, 2024
    Assignee: INTERNATIONAL BUSINESS MACHINES CORPORATION
    Inventors: Nicholas R. Orzol, Mehul Patel, Dung Q. Nguyen, Brian D. Barrick, Richard J. Eickemeyer, John B Griswell, Jr., Balaram Sinharoy, Brian W. Thompto, Ophir Erez
  • Patent number: 11875197
    Abstract: Systems, apparatuses, and methods for managing a number of wavefronts permitted to concurrently execute in a processing system. An apparatus includes a register file with a plurality of registers and a plurality of compute units configured to execute wavefronts. A control unit of the apparatus is configured to allow a first number of wavefronts to execute concurrently on the plurality of compute units. The control unit is configured to allow no more than a second number of wavefronts to execute concurrently on the plurality of compute units, wherein the second number is less than the first number, in response to detection that thrashing of the register file is above a threshold. The control unit is configured to detect said thrashing based at least in part on a number of registers in use by executing wavefronts that spill to memory.
    Type: Grant
    Filed: December 29, 2020
    Date of Patent: January 16, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Bradford Michael Beckmann, Steven Tony Tye, Brian L. Sumner, Nicolai Hähnle
  • Patent number: 11842198
    Abstract: Retiring instructions out-of-order includes: receiving processor instructions comprising two or more and fewer than all processor instructions generated based on a program, where the processor instructions include a first instruction and a second instruction such that the first instruction precedes the second instruction in a program order of the program; receiving a start instruction that immediately precedes the processor instructions and indicates that the processor instructions are to be retired out-of-order; receiving a stop instruction immediately that succeeds the processor instructions and indicates a stop to out-of-order instruction retirement; and, in response to completing execution of the second instruction before completing execution of the first instruction, retiring the second instruction before retiring the first instruction.
    Type: Grant
    Filed: November 1, 2021
    Date of Patent: December 12, 2023
    Assignee: Marvell Asia Pte, Ltd.
    Inventor: Shubhendu Sekhar Mukherjee
  • Patent number: 11842169
    Abstract: Systems and methods are provided to perform multiplication-delayed-addition operations in a systolic array to increase clock speeds, reduce circuit area, and/or reduce dynamic power consumption. Each processing element in the systolic array can have a pipeline configured to perform a multiplication during a first systolic interval and to perform an accumulation during a second systolic interval. The multiplication result from the first systolic interval can be stored in a delay register for use by the accumulator during the second systolic interval. A skip detection circuit can be used to skip one or more of the multiplication, storing in the delay register, and the addition during skip conditions for improved energy efficiency.
    Type: Grant
    Filed: September 25, 2019
    Date of Patent: December 12, 2023
    Assignee: Amazon Technologies, Inc.
    Inventor: Thomas Elmer
  • Patent number: 11842423
    Abstract: Embodiments described herein include software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. One embodiment provides for data aware sparsity via compressed bitstreams. One embodiment provides for block sparse dot product instructions. One embodiment provides for a depth-wise adapter for a systolic array.
    Type: Grant
    Filed: December 15, 2020
    Date of Patent: December 12, 2023
    Assignee: Intel Corporation
    Inventors: Abhishek Appu, Subramaniam Maiyuran, Mike Macpherson, Fangwen Fu, Jiasheng Chen, Varghese George, Vasanth Ranganathan, Ashutosh Garg, Joydeep Ray
  • Patent number: 11816487
    Abstract: An instruction conversion device, an instruction conversion method, an instruction conversion system, and a processor are provided. The instruction conversion device includes a monitor for determining whether a ready-for-execution instruction is an instruction that belongs to a new instruction set or an extended instruction set, wherein the new instruction set and the extended instruction set have the same type of the instruction set architecture as that of the processor. If the ready-for-execution instruction is determined as an extended instruction, this extended instruction is converted into a converted instruction sequence by means of the conversion system, this converted instruction sequence is then sent to the processor for executions, thereby extending the lifespans of the electronic devices embodied with old-version processors.
    Type: Grant
    Filed: September 10, 2021
    Date of Patent: November 14, 2023
    Assignee: Shanghai Zhaoxin Semiconductor Co., Ltd.
    Inventors: Weilin Wang, Yingbing Guan, Mengchen Yang
  • Patent number: 11816488
    Abstract: There is provided methods and devices for dynamically simplifying processor instructions. A method includes receiving, at a computing device, processor instructions and determining, by the computing device, if instruction simplification is enabled for an instruction being processed. The method further includes determining, by the computing device, from an instruction simplification table if the instruction is capable of being simplified and scheduling, by the computing device, a simplified instruction based on the determination from the instruction simplification table. A device includes a processor, and a non-transient computer readable memory having stored thereon instructions which when executed by the processor configure the device to execute the methods disclosed herein.
    Type: Grant
    Filed: November 10, 2021
    Date of Patent: November 14, 2023
    Assignee: HUAWEI TECHNOLOGIES CO., LTD.
    Inventors: Henry Fangli Kao, Shehab Yomn Abdellatif Elsayed, Tomasz Sebastian Czajkowski, Reza Azimi, Ehsan Amiri