Patents by Inventor Bradford Beckmann

Bradford Beckmann has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240095180
    Abstract: The disclosed computer-implemented method for interpolating register-based lookup tables can include identifying, within a set of registers, a lookup table that has been encoded for storage within the set of registers. The method can also include receiving a request to look up a value in the lookup table and responding to the request by interpolating, from the encoded lookup table stored in the set of registers, a representation of the requested value. Various other methods, systems, and computer-readable media are also disclosed.
    Type: Application
    Filed: December 23, 2022
    Publication date: March 21, 2024
    Applicant: Advanced Micro Devices, Inc.
    Inventors: Gabriel H. Loh, Michael Estlick, Jay Fleischman, Michael J. Schulte, Bradford Beckmann, Yasuko Eckert
  • Patent number: 11875425
    Abstract: Implementing heterogeneous wavefronts on a graphics processing unit (GPU) is disclosed. A scheduler assigns heterogeneous wavefronts for execution on a compute unit of a processing device. The heterogeneous wavefronts include different types of wavefronts such as vector compute wavefronts and service-level wavefronts that vary in resource requirements and instruction sets. As one example, heterogeneous wavefronts may include scalar wavefronts and vector compute wavefronts that execute on scalar units and vector units, respectively. Distinct sets of instructions are executed for the heterogeneous wavefronts on the compute unit. Heterogeneous wavefronts are processed in the same pipeline of the processing device.
    Type: Grant
    Filed: December 28, 2020
    Date of Patent: January 16, 2024
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Sooraj Puthoor, Bradford Beckmann, Nuwan Jayasena, Anthony Gutierrez
  • Patent number: 11868809
    Abstract: A processor includes a task scheduling unit and a compute unit coupled to the task scheduling unit. The task scheduling unit performs a task dependency assessment of a task dependency graph and task data requirements that correspond to each task of the plurality of tasks. Based on the task dependency assessment, the task scheduling unit schedules a first task of the plurality of tasks and a second proxy object of a plurality of proxy objects specified by the task data requirements such that a memory transfer of the second proxy object of the plurality of proxy objects occurs while the first task is being executed.
    Type: Grant
    Filed: January 11, 2023
    Date of Patent: January 9, 2024
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Muhammad Amber Hassaan, Anirudh Mohan Kaushik, Sooraj Puthoor, Gokul Subramanian Ravi, Bradford Beckmann, Ashwin Aji
  • Publication number: 20230393855
    Abstract: An approach is provided for implementing register based single instruction, multiple data (SIMD) lookup table operations. According to the approach, an instruction set architecture (ISA) can support one or more SIMD instructions that enable vectors or multiple values in source data registers to be processed in parallel using a lookup table or truth table stored in one or more function registers. The SIMD instructions can be flexibly configured to support functions with inputs and outputs of various sizes and data formats. Various approaches are also described for supporting very large lookup tables that span multiple registers.
    Type: Application
    Filed: June 6, 2022
    Publication date: December 7, 2023
    Inventors: Gabriel H. Loh, Yasuko Eckert, Bradford Beckmann, Michael Estlick, Jay Fleischman
  • Patent number: 11740791
    Abstract: In some embodiments, a memory controller in a processor includes a base value cache, a compressor, and a metadata cache. The compressor is coupled to the base value cache and the metadata cache. The compressor compresses a data block using at least a base value and delta values. The compressor determines whether the size of the data block exceeds a data block threshold value. Based on the determination of whether the size of the compressed data block generated by the compressor exceeds the data block threshold value, the memory controller transfers only a set of the compressed delta values to memory for storage. A decompressor located in the lower level cache of the processor decompresses the compressed data block using the base value stored in the base value cache, metadata stored in the metadata cache and the delta values stored in memory.
    Type: Grant
    Filed: October 8, 2021
    Date of Patent: August 29, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Seyed Mohammad Seyedzadehdelcheh, Xianwei Zhang, Bradford Beckmann, Shomit N. Das
  • Patent number: 11734059
    Abstract: A processor includes a task scheduling unit and a compute unit coupled to the task scheduling unit. The task scheduling unit performs a task dependency assessment of a task dependency graph and task data requirements that correspond to each task of the plurality of tasks. Based on the task dependency assessment, the task scheduling unit schedules a first task of the plurality of tasks and a second proxy object of a plurality of proxy objects specified by the task data requirements such that a memory transfer of the second proxy object of the plurality of proxy objects occurs while the first task is being executed.
    Type: Grant
    Filed: March 19, 2020
    Date of Patent: August 22, 2023
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Muhammad Amber Hassaan, Anirudh Mohan Kaushik, Sooraj Puthoor, Gokul Subramanian Ravi, Bradford Beckmann, Ashwin Aji
  • Publication number: 20230229494
    Abstract: A processor includes a task scheduling unit and a compute unit coupled to the task scheduling unit. The task scheduling unit performs a task dependency assessment of a task dependency graph and task data requirements that correspond to each task of the plurality of tasks. Based on the task dependency assessment, the task scheduling unit schedules a first task of the plurality of tasks and a second proxy object of a plurality of proxy objects specified by the task data requirements such that a memory transfer of the second proxy object of the plurality of proxy objects occurs while the first task is being executed.
    Type: Application
    Filed: January 11, 2023
    Publication date: July 20, 2023
    Inventors: Muhammad Amber HASSAAN, Anirudh Mohan KAUSHIK, Sooraj PUTHOOR, Gokul Subramanian RAVI, Bradford BECKMANN, Ashwin AJI
  • Patent number: 11526449
    Abstract: A processing system limits the propagation of unnecessary memory updates by bypassing writing back dirty cache lines to other levels of a memory hierarchy in response to receiving an indication from software executing at a processor of the processing system that the value of the dirty cache line is dead (i.e., will not be read again or will not be read until after it has been overwritten). In response to receiving an indication from software that data is dead, a cache controller prevents propagation of the dead data to other levels of memory in response to eviction of the dead data or flushing of the cache at which the dead data is stored.
    Type: Grant
    Filed: August 31, 2020
    Date of Patent: December 13, 2022
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Johnathan Alsop, Pouya Fotouhi, Bradford Beckmann, Sergey Blagodurov
  • Patent number: 11487671
    Abstract: Wavefront loading in a processor is managed and includes monitoring a selected wavefront of a set of wavefronts. Reuse of memory access requests for the selected wavefront is counted. A cache hit rate in one or more caches of the processor is determined based on the counted reuse. Based on the cache hit rate, subsequent memory requests of other wavefronts of the set of wavefronts are modified by including a type of reuse of cache lines in requests to the caches. In the caches, storage of data in the caches is based on the type of reuse indicated by the subsequent memory access requests. Reused cache lines are protected by preventing cache line contents from being replaced by another cache line for a duration of processing the set of wavefronts. Caches are bypassed when streaming access requests are made.
    Type: Grant
    Filed: June 19, 2019
    Date of Patent: November 1, 2022
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Xianwei Zhang, John Kalamatianos, Bradford Beckmann
  • Patent number: 11481250
    Abstract: A first workgroup is preempted in response to threads in the first workgroup executing a first wait instruction including a first value of a signal and a first hint indicating a type of modification for the signal. The first workgroup is scheduled for execution on a processor core based on a first context after preemption in response to the signal having the first value. A second workgroup is scheduled for execution on the processor core based on a second context in response to preempting the first workgroup and in response to the signal having a second value. A third context it is prefetched into registers of the processor core based on the first hint and the second value. The first context is stored in a first portion of the registers and the second context is prefetched into a second portion of the registers prior to preempting the first workgroup.
    Type: Grant
    Filed: June 29, 2018
    Date of Patent: October 25, 2022
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Alexandru Dutu, Matthew David Sinclair, Bradford Beckmann, David A. Wood
  • Publication number: 20220206869
    Abstract: Virtualizing resources of a memory-based execution device is disclosed. A host processing system orchestrates the execution of two or more offload tasks on a remote execution device. The remote execution device includes a memory array coupled to a processing unit that is shared by concurrent processes on the host processing system. The host processing system provides time-multiplexed access to the processing unit by each concurrent process for completing offload tasks on the processing unit. The host processing system initiates a context switch on the remote execution device from a first offload task to a second offload task. The context state of the first offload task is saved on the remote execution device.
    Type: Application
    Filed: December 28, 2020
    Publication date: June 30, 2022
    Inventors: VAIBHAV RAMAKRISHNAN RAMACHANDRAN, ALEXANDRU DUTU, BRADFORD BECKMANN
  • Publication number: 20220207643
    Abstract: Implementing heterogenous wavefronts on a graphics processing unit (GPU) is disclosed. A schedule assigns heterogeneous wavefronts for execution on a compute unit of a processing device. The heterogeneous wavefronts include different types of wavefronts such as vector compute wavefronts service-level wavefronts that vary in resource requirements and instruction sets. As one example, heterogenous wavefronts may include scalar wavefronts and vector compute wavefronts that execute on scalar units and vector units, respectively. Distinct sets of instructions are executed for the heterogenous wavefronts on the compute unit. Heterogenous wavefronts are processed in the same pipeline of the processing device.
    Type: Application
    Filed: December 28, 2020
    Publication date: June 30, 2022
    Inventors: SOORAJ PUTHOOR, BRADFORD BECKMANN, NUWAN JAYASENA, ANTHONY GUTIERREZ
  • Publication number: 20220083233
    Abstract: In some embodiments, a memory controller in a processor includes a base value cache, a compressor, and a metadata cache. The compressor is coupled to the base value cache and the metadata cache. The compressor compresses a data block using at least a base value and delta values. The compressor determines whether the size of the data block exceeds a data block threshold value. Based on the determination of whether the size of the compressed data block generated by the compressor exceeds the data block threshold value, the memory controller transfers only a set of the compressed delta values to memory for storage. A decompressor located in the lower level cache of the processor decompresses the compressed data block using the base value stored in the base value cache, metadata stored in the metadata cache and the delta values stored in memory.
    Type: Application
    Filed: October 8, 2021
    Publication date: March 17, 2022
    Inventors: Seyed Mohammad SEYEDZADEHDELCHEH, Xianwei ZHANG, Bradford BECKMANN, Shomit N. DAS
  • Publication number: 20220066940
    Abstract: A processing system limits the propagation of unnecessary memory updates by bypassing writing back dirty cache lines to other levels of a memory hierarchy in response to receiving an indication from software executing at a processor of the processing system that the value of the dirty cache line is dead (i.e., will not be read again or will not be read until after it has been overwritten). In response to receiving an indication from software that data is dead, a cache controller prevents propagation of the dead data to other levels of memory in response to eviction of the dead data or flushing of the cache at which the dead data is stored.
    Type: Application
    Filed: August 31, 2020
    Publication date: March 3, 2022
    Inventors: Johnathan ALSOP, Pouya FOTOUHI, Bradford BECKMANN, Sergey BLAGODUROV
  • Publication number: 20210373975
    Abstract: A processing system monitors and synchronizes parallel execution of workgroups (WGs). One or more of the WGs perform (e.g., periodically or in response to a trigger such as an indication of oversubscription) a waiting atomic instruction. In response to a comparison between an atomic value produced as a result of the waiting atomic instruction and an expected value, WGs that fail to produce a correct atomic value are identified as being in a waiting state (e.g., waiting for a synchronization variable). Execution of WGs in the waiting state is prevented (e.g., by a context switch) until corresponding synchronization variables are released.
    Type: Application
    Filed: September 23, 2020
    Publication date: December 2, 2021
    Inventors: Alexandru DUTU, Matthew David SINCLAIR, Bradford BECKMANN, David A. WOOD
  • Patent number: 11144208
    Abstract: In some embodiments, a memory controller in a processor includes a base value cache, a compressor, and a metadata cache. The compressor is coupled to the base value cache and the metadata cache. The compressor compresses a data block using at least a base value and delta values. The compressor determines whether the size of the data block exceeds a data block threshold value. Based on the determination of whether the size of the compressed data block generated by the compressor exceeds the data block threshold value, the memory controller transfers only a set of the compressed delta values to memory for storage. A decompressor located in the lower level cache of the processor decompresses the compressed data block using the base value stored in the base value cache, metadata stored in the metadata cache and the delta values stored in memory.
    Type: Grant
    Filed: December 23, 2019
    Date of Patent: October 12, 2021
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: SeyedMohammad Seyedzadehdelcheh, Xianwei Zhang, Bradford Beckmann, Shomit N. Das
  • Publication number: 20210294646
    Abstract: A processor includes a task scheduling unit and a compute unit coupled to the task scheduling unit. The task scheduling unit performs a task dependency assessment of a task dependency graph and task data requirements that correspond to each task of the plurality of tasks. Based on the task dependency assessment, the task scheduling unit schedules a first task of the plurality of tasks and a second proxy object of a plurality of proxy objects specified by the task data requirements such that a memory transfer of the second proxy object of the plurality of proxy objects occurs while the first task is being executed.
    Type: Application
    Filed: March 19, 2020
    Publication date: September 23, 2021
    Inventors: Muhammad Amber HASSAAN, Anirudh Mohan KAUSHIK, Sooraj PUTHOOR, Gokul Subramanian RAVI, Bradford BECKMANN, Ashwin AJI
  • Publication number: 20210191620
    Abstract: In some embodiments, a memory controller in a processor includes a base value cache, a compressor, and a metadata cache. The compressor is coupled to the base value cache and the metadata cache. The compressor compresses a data block using at least a base value and delta values. The compressor determines whether the size of the data block exceeds a data block threshold value. Based on the determination of whether the size of the compressed data block generated by the compressor exceeds the data block threshold value, the memory controller transfers only a set of the compressed delta values to memory for storage. A decompressor located in the lower level cache of the processor decompresses the compressed data block using the base value stored in the base value cache, metadata stored in the metadata cache and the delta values stored in memory.
    Type: Application
    Filed: December 23, 2019
    Publication date: June 24, 2021
    Inventors: SeyedMohammad SEYEDZADEHDELCHEH, Xianwei ZHANG, Bradford BECKMANN, Shomit N. DAS
  • Patent number: 11042484
    Abstract: A processing system includes one or more first caches and one or more first lock tables associated with the one or more first caches. The processing system also includes one or more processing units that each include a plurality of compute units for concurrently executing work-groups of work items, a plurality of second caches associated with the plurality of compute units and configured in a hierarchy with the one or more first caches, and a plurality of second lock tables associated with the plurality of second caches. The first and second lock tables indicate locking states of addresses of cache lines in the corresponding first and second caches on a per-line basis.
    Type: Grant
    Filed: June 24, 2016
    Date of Patent: June 22, 2021
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Johnathan R. Alsop, Bradford Beckmann
  • Publication number: 20200401529
    Abstract: Wavefront loading in a processor is managed and includes monitoring a selected wavefront of a set of wavefronts. Reuse of memory access requests for the selected wavefront is counted. A cache hit rate in one or more caches of the processor is determined based on the counted reuse. Based on the cache hit rate, subsequent memory requests of other wavefronts of the set of wavefronts are modified by including a type of reuse of cache lines in requests to the caches. In the caches, storage of data in the caches is based on the type of reuse indicated by the subsequent memory access requests. Reused cache lines are protected by preventing cache line contents from being replaced by another cache line for a duration of processing the set of wavefronts. Caches are bypassed when streaming access requests are made.
    Type: Application
    Filed: June 19, 2019
    Publication date: December 24, 2020
    Inventors: Xianwei ZHANG, John KALAMATIANOS, Bradford BECKMANN