Patents Assigned to Advanced Micro Device
  • Publication number: 20240020173
    Abstract: Methods and systems are disclosed for distribution of a workload among nodes of a NUMA architecture. Techniques disclosed include receiving the workload and data batches, the data batches to be processed by the workload. Techniques disclosed further include assigning workload processes to the nodes according to a determined distribution, and, then, executing the workload according to the determined distribution. The determined distribution is selected out of a set of distributions, so that the execution time of the workload, when executed according to the determined distribution, is minimal.
    Type: Application
    Filed: July 12, 2022
    Publication date: January 18, 2024
    Applicant: Advanced Micro Devices, Inc.
    Inventor: Aditya Chatterjee
  • Patent number: 11875197
    Abstract: Systems, apparatuses, and methods for managing a number of wavefronts permitted to concurrently execute in a processing system. An apparatus includes a register file with a plurality of registers and a plurality of compute units configured to execute wavefronts. A control unit of the apparatus is configured to allow a first number of wavefronts to execute concurrently on the plurality of compute units. The control unit is configured to allow no more than a second number of wavefronts to execute concurrently on the plurality of compute units, wherein the second number is less than the first number, in response to detection that thrashing of the register file is above a threshold. The control unit is configured to detect said thrashing based at least in part on a number of registers in use by executing wavefronts that spill to memory.
    Type: Grant
    Filed: December 29, 2020
    Date of Patent: January 16, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Bradford Michael Beckmann, Steven Tony Tye, Brian L. Sumner, Nicolai Hähnle
  • Patent number: 11875875
    Abstract: Methods and systems are disclosed for calibrating, by a memory interface system, an interface with dynamic random-access memory (DRAM) using a dynamically changing training clock. Techniques disclosed comprise receiving a system clock having a clock signal at a first pulse rate. Then, during the training of the interface, techniques disclosed comprise generating a training clock from the clock signal at the first pulse rate, the training clock having a clock signal at a second pulse rate, and sending, based on the generated training clock, command signals, including address data, to the DRAM.
    Type: Grant
    Filed: December 29, 2021
    Date of Patent: January 16, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Anwar Kashem, Craig Daniel Eaton, Pouya Najafi Ashtiani
  • Patent number: 11876718
    Abstract: Graded throttling for network-on-chip traffic, including: calculating, by an agent of a network-on-chip, a number of outstanding transactions issued by the agent; determining that the number of outstanding transactions meets a threshold; and implementing, by the agent, in response to the number of outstanding transactions meeting the threshold, a traffic throttling policy.
    Type: Grant
    Filed: October 6, 2022
    Date of Patent: January 16, 2024
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventor: Narendra Kamat
  • Patent number: 11875425
    Abstract: Implementing heterogeneous wavefronts on a graphics processing unit (GPU) is disclosed. A scheduler assigns heterogeneous wavefronts for execution on a compute unit of a processing device. The heterogeneous wavefronts include different types of wavefronts such as vector compute wavefronts and service-level wavefronts that vary in resource requirements and instruction sets. As one example, heterogeneous wavefronts may include scalar wavefronts and vector compute wavefronts that execute on scalar units and vector units, respectively. Distinct sets of instructions are executed for the heterogeneous wavefronts on the compute unit. Heterogeneous wavefronts are processed in the same pipeline of the processing device.
    Type: Grant
    Filed: December 28, 2020
    Date of Patent: January 16, 2024
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Sooraj Puthoor, Bradford Beckmann, Nuwan Jayasena, Anthony Gutierrez
  • Patent number: 11874739
    Abstract: A memory module includes one or more programmable ECC engines that may be programed by a host processing element with a particular ECC implementation. As used herein, the term “ECC implementation” refers to ECC functionality for performing error detection and subsequent processing, for example using the results of the error detection to perform error correction and to encode corrupted data that cannot be corrected, etc. The approach allows an SoC designer or company to program and reprogram ECC engines in memory modules in a secure manner without having to disclose the particular ECC implementations used by the ECC engines to memory vendors or third parties.
    Type: Grant
    Filed: September 25, 2020
    Date of Patent: January 16, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Sudhanva Gurumurthi, Vilas Sridharan, Shaizeen Aga, Nuwan Jayasena, Michael Ignatowski, Shrikanth Ganapathy, John Kalamatianos
  • Patent number: 11874783
    Abstract: A coherent memory fabric includes a plurality of coherent master controllers and a coherent slave controller. The plurality of coherent master controllers each include a response data buffer. The coherent slave controller is coupled to the plurality of coherent master controllers. The coherent slave controller, responsive to determining a selected coherent block read command is guaranteed to have only one data response, sends a target request globally ordered message to the selected coherent master controller and transmits responsive data. The selected coherent master controller, responsive to receiving the target request globally ordered message, blocks any coherent probes to an address associated with the selected coherent block read command until receipt of the responsive data is acknowledged by a requesting client.
    Type: Grant
    Filed: December 21, 2021
    Date of Patent: January 16, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Vydhyanathan Kalyanasundharam, Amit P. Apte, Eric Christopher Morton, Ganesh Balakrishnan, Ann M. Ling
  • Patent number: 11874774
    Abstract: A method includes, in response to each write request of a plurality of write requests received at a memory-side cache device coupled with a memory device, writing payload data specified by the write request to the memory-side cache device, and when a first bandwidth availability condition is satisfied, performing a cache write-through by writing the payload data to the memory device, and recording an indication that the payload data written to the memory-side cache device matches the payload data written to the memory device.
    Type: Grant
    Filed: September 24, 2020
    Date of Patent: January 16, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Ravindra N. Bhargava, Ganesh Balakrishnan, Joe Sargunaraj, Chintan S. Patel, Girish Balaiah Aswathaiya, Vydhyanathan Kalyanasundharam
  • Patent number: 11869140
    Abstract: Improvements to graphics processing pipelines are disclosed. More specifically, the vertex shader stage, which performs vertex transformations, and the hull or geometry shader stages, are combined. If tessellation is disabled and geometry shading is enabled, then the graphics processing pipeline includes a combined vertex and graphics shader stage. If tessellation is enabled, then the graphics processing pipeline includes a combined vertex and hull shader stage. If tessellation and geometry shading are both disabled, then the graphics processing pipeline does not use a combined shader stage. The combined shader stages improve efficiency by reducing the number of executing instances of shader programs and associated resources reserved.
    Type: Grant
    Filed: April 19, 2021
    Date of Patent: January 9, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Mangesh P. Nijasure, Randy W. Ramsey, Todd Martin
  • Patent number: 11868306
    Abstract: A processing system includes a processing unit and a memory device. The memory device includes a processing-in-memory (PIM) module that performs processing operations on behalf of the processing unit. An instruction set architecture (ISA) of the PIM module has fewer instructions than an ISA of the processing unit. Instructions received from the processing unit are translated such that processing resources of the PIM module are virtualized. As a result, the PIM module concurrently performs processing operations for multiple threads or applications of the processing unit.
    Type: Grant
    Filed: September 13, 2022
    Date of Patent: January 9, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Michael L. Chu, Ashwin Aji, Muhammad Amber Hassaan
  • Patent number: 11868809
    Abstract: A processor includes a task scheduling unit and a compute unit coupled to the task scheduling unit. The task scheduling unit performs a task dependency assessment of a task dependency graph and task data requirements that correspond to each task of the plurality of tasks. Based on the task dependency assessment, the task scheduling unit schedules a first task of the plurality of tasks and a second proxy object of a plurality of proxy objects specified by the task data requirements such that a memory transfer of the second proxy object of the plurality of proxy objects occurs while the first task is being executed.
    Type: Grant
    Filed: January 11, 2023
    Date of Patent: January 9, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Muhammad Amber Hassaan, Anirudh Mohan Kaushik, Sooraj Puthoor, Gokul Subramanian Ravi, Bradford Beckmann, Ashwin Aji
  • Patent number: 11868818
    Abstract: Techniques for selectively executing a lock instruction speculatively or non-speculatively based on lock address prediction and/or temporal lock prediction. including methods an devices for locking an entry in a memory device. In some techniques, a lock instruction executed by a thread for a particular memory entry of a memory device is detected. Whether contention occurred for the particular memory entry during an earlier speculative lock is detected on a condition that the lock instruction comprises a speculative lock instruction. The lock is executed non-speculatively if contention occurred for the particular memory entry during an earlier speculative lock. The lock is executed speculatively if contention did not occur for the particular memory entry during an earlier speculative lock.
    Type: Grant
    Filed: September 22, 2016
    Date of Patent: January 9, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Gregory W. Smaus, John M. King, Matthew A. Rafacz, Matthew M. Crum
  • Patent number: 11868221
    Abstract: Techniques for performing cache operations are provided. The techniques include tracking performance events for a plurality of test sets of a cache, detecting a replacement policy change trigger event associated with a test set of the plurality of test sets, and in response to the replacement policy change trigger event, operating non-test sets of the cache according to a replacement policy associated with the test set.
    Type: Grant
    Filed: September 30, 2021
    Date of Patent: January 9, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: John Kelley, Vanchinathan Venkataramani, Paul J. Moyer
  • Patent number: 11868778
    Abstract: Compacted addressing for transaction layer packets, including: determining, for a first epoch, one or more low entropy address bits in a plurality of first transaction layer packets; removing, from one or more memory addresses of one or more second transaction layer packets, the one or more low entropy address bits; and sending the one or more second transaction layer packets.
    Type: Grant
    Filed: July 23, 2020
    Date of Patent: January 9, 2024
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Ganesh Dasika, Sergey Blagodurov, Seyedmohammad Seyedzadehdelcheh
  • Patent number: 11868777
    Abstract: Processor-guided execution of offloaded instructions using fixed function operations is disclosed. Instructions designated for remote execution by a target device are received by a processor. Each instruction includes, as an operand, a target register in the target device. The target register may be an architected virtual register. For each of the plurality of instructions, the processor transmits an offload request in the order that the instructions are received. The offload request includes the instruction designated for remote execution. The target device may be, for example, a processing-in-memory device or an accelerator coupled to a memory.
    Type: Grant
    Filed: December 16, 2020
    Date of Patent: January 9, 2024
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: John Kalamatianos, Michael T. Clark, Marius Evers, William L. Walker, Paul Moyer, Jay Fleischman, Jagadish B. Kotra
  • Patent number: 11868254
    Abstract: An electronic device includes a cache, a memory, and a controller. The controller stores an epoch counter value in metadata for a location in the memory when a cache block evicted from the cache is stored in the location. The controller also controls how the cache block is retained in the cache based at least in part on the epoch counter value when the cache block is subsequently retrieved from the location and stored in the cache.
    Type: Grant
    Filed: September 30, 2021
    Date of Patent: January 9, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventor: Nuwan Jayasena
  • Patent number: 11869874
    Abstract: A stacked die system includes at least three dies. A first die has a same design as a second die. The first die includes a first circuit, and the second die includes a corresponding second circuit. A signal is received at the first die and sent to the third die via the second die. The signal is routed through either the first circuit or the second circuit but not both. Accordingly, an operation is performed on the signal prior to the signal reaching the third die but the operation is not performed by both the first circuit and the second circuit.
    Type: Grant
    Filed: December 14, 2020
    Date of Patent: January 9, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: John Wuu, David Johnson
  • Patent number: 11868759
    Abstract: Shader source code performance prediction is described. In accordance with the described techniques, an update to shader source code for implementing a shader is received. A prediction of performance of the shader on a processing unit is generated based on the update to the shader source code. Feedback about the update is output. The feedback includes the prediction of performance of the shader. In one or more implementations, generating the prediction of performance of the shader includes compiling the shader source code with the update to generate a representation of the shader, inputting the representation of the shader to one or more machine learning models, and receiving the prediction of performance of the shader as an output from the one or more machine learning models.
    Type: Grant
    Filed: December 8, 2021
    Date of Patent: January 9, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Amit Ben-Moshe, Ian Charles Colbert
  • Publication number: 20240004645
    Abstract: An intermediate representation (IR) controller is described that, for a given intermediate representation (IR) primitive, selects a hardware compute unit of a plurality of hardware compute units. In a non-limiting example, the IR controller receives an input that specifies an IR primitive, a device mask indicating a type of hardware circuitry to be used to process the primitive, and a goal vector specifying a goal in the processing of the primitive. The IR controller also collects data describing power consumption by respective hardware compute units and completion times for processing respective IR primitives. This data is maintained as implementation profiles that describe operation of respective hardware compute units in processing respective IR primitives, e.g., as histograms. The implementation profiles are then leveraged by the IR controller to select hardware compute units for execution of subsequent IR primitives.
    Type: Application
    Filed: June 30, 2022
    Publication date: January 4, 2024
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Sergey Blagodurov, Ganesh Suryanarayanc Dasika
  • Publication number: 20240004801
    Abstract: An encryption circuit includes an iterative block cipher circuit. The iterative block cipher circuit has a counter input for a row index, a key input for receiving a secret key, and an output for providing an encrypted counter value in response to performing a block cipher process using the row index as a counter the secret key. The encryption circuit uses the iterative block cipher circuit during a row operation to a memory.
    Type: Application
    Filed: June 29, 2022
    Publication date: January 4, 2024
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Nuwan Jayasena, Shaizeen Dilawarhusen Aga, Vignesh Adhinarayanan