Patents by Inventor Bradford Beckmann
Bradford Beckmann has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250123846Abstract: A processing unit includes a plurality of processing cores and is configured to arrange a sparse matrix for parallel performance by the cores on different rows of the matrix at least in part by calculating a respective quantity of non-zero elements in each row, assigning each row to a respective collection according to the respective quantity of non-zero elements for the row, wherein the processing unit is configured to assign at least one first row of the sparse matrix to respective collections of in parallel with assigning at least one second row of the sparse matrix to respective collections, and performing at least one mathematical operation on at least a first collection of the plurality of collections in parallel with performing the at least one mathematical operation on at least a second collection of the plurality of collections.Type: ApplicationFiled: October 12, 2023Publication date: April 17, 2025Applicant: Advanced Micro Devices, Inc.Inventors: William Peter Ehrett, Muhammad Osama, Bradford Beckmann
-
Publication number: 20250103395Abstract: A computer-implemented method for dynamic resource management can include evaluating, by at least one processor, whether a priority of one or more processes associated with a request for one or more shared resources meets a threshold condition. The method can additionally include determining, by the at least one processor and in response to an evaluation that the priority meets the threshold condition, whether the one or more shared resources is available to meet the request. The method can further include completing, by the at least one processor and in response to a determination that the one or more shared resources is available, execution of the one or more processes. Various other methods, systems, and computer-readable media are also disclosed.Type: ApplicationFiled: September 27, 2023Publication date: March 27, 2025Applicant: Advanced Micro Devices, Inc.Inventors: Bradford Beckmann, Matthew David Sinclair, Vinay Bharadwaj Ramakrishnaiah, William Peter Ehrett
-
Publication number: 20240395289Abstract: Integrated circuit (IC) memory devices and methods for fabricating the same are provided. In one example, an integrated circuit (IC) memory device is provided that includes a substrate, at least two or more memory (IC) dies, and a non-memory IC die integrated in a chip package. The memory (IC) dies are stacked on the substrate to form a memory die stack. The non-memory IC die contains row segmentation logic having an output routed to corresponding wordline drivers of the memory IC dies through vertical wiring passing through the memory die stack.Type: ApplicationFiled: May 22, 2024Publication date: November 28, 2024Inventors: Vignesh ADHINARAYANAN, Hyung-Dong LEE, Bradford BECKMANN, Seyedmohammad SEYEDZADEHDELCHEH, Sergey BLAGODUROV
-
Patent number: 12131199Abstract: A processing system monitors and synchronizes parallel execution of workgroups (WGs). One or more of the WGs perform (e.g., periodically or in response to a trigger such as an indication of oversubscription) a waiting atomic instruction. In response to a comparison between an atomic value produced as a result of the waiting atomic instruction and an expected value, WGs that fail to produce a correct atomic value are identified as being in a waiting state (e.g., waiting for a synchronization variable). Execution of WGs in the waiting state is prevented (e.g., by a context switch) until corresponding synchronization variables are released.Type: GrantFiled: September 23, 2020Date of Patent: October 29, 2024Assignee: Advanced Micro Devices, Inc.Inventors: Alexandru Dutu, Matthew David Sinclair, Bradford Beckmann, David A. Wood
-
Publication number: 20240329984Abstract: An electronic device includes processing circuitry that executes a lookup table (LUT) vector instruction. Executing the lookup table vector instruction causes the processing circuitry to acquire a set of reference values by using each input value from an input vector as an index to acquire a reference value from a reference vector. The processing circuitry then provides the set of reference values for one or more subsequent operations. The processing circuitry can also use the set of reference values for controlling vector elements from among a set of vector elements for which a vector operation is performed.Type: ApplicationFiled: March 30, 2023Publication date: October 3, 2024Inventors: Yasuko Eckert, Vadim Vadimovich Nikiforov, Gabriel H. Loh, Bradford Beckmann
-
Publication number: 20240095180Abstract: The disclosed computer-implemented method for interpolating register-based lookup tables can include identifying, within a set of registers, a lookup table that has been encoded for storage within the set of registers. The method can also include receiving a request to look up a value in the lookup table and responding to the request by interpolating, from the encoded lookup table stored in the set of registers, a representation of the requested value. Various other methods, systems, and computer-readable media are also disclosed.Type: ApplicationFiled: December 23, 2022Publication date: March 21, 2024Applicant: Advanced Micro Devices, Inc.Inventors: Gabriel H. Loh, Michael Estlick, Jay Fleischman, Michael J. Schulte, Bradford Beckmann, Yasuko Eckert
-
Patent number: 11875425Abstract: Implementing heterogeneous wavefronts on a graphics processing unit (GPU) is disclosed. A scheduler assigns heterogeneous wavefronts for execution on a compute unit of a processing device. The heterogeneous wavefronts include different types of wavefronts such as vector compute wavefronts and service-level wavefronts that vary in resource requirements and instruction sets. As one example, heterogeneous wavefronts may include scalar wavefronts and vector compute wavefronts that execute on scalar units and vector units, respectively. Distinct sets of instructions are executed for the heterogeneous wavefronts on the compute unit. Heterogeneous wavefronts are processed in the same pipeline of the processing device.Type: GrantFiled: December 28, 2020Date of Patent: January 16, 2024Assignee: ADVANCED MICRO DEVICES, INC.Inventors: Sooraj Puthoor, Bradford Beckmann, Nuwan Jayasena, Anthony Gutierrez
-
Patent number: 11868809Abstract: A processor includes a task scheduling unit and a compute unit coupled to the task scheduling unit. The task scheduling unit performs a task dependency assessment of a task dependency graph and task data requirements that correspond to each task of the plurality of tasks. Based on the task dependency assessment, the task scheduling unit schedules a first task of the plurality of tasks and a second proxy object of a plurality of proxy objects specified by the task data requirements such that a memory transfer of the second proxy object of the plurality of proxy objects occurs while the first task is being executed.Type: GrantFiled: January 11, 2023Date of Patent: January 9, 2024Assignee: Advanced Micro Devices, Inc.Inventors: Muhammad Amber Hassaan, Anirudh Mohan Kaushik, Sooraj Puthoor, Gokul Subramanian Ravi, Bradford Beckmann, Ashwin Aji
-
Publication number: 20230393855Abstract: An approach is provided for implementing register based single instruction, multiple data (SIMD) lookup table operations. According to the approach, an instruction set architecture (ISA) can support one or more SIMD instructions that enable vectors or multiple values in source data registers to be processed in parallel using a lookup table or truth table stored in one or more function registers. The SIMD instructions can be flexibly configured to support functions with inputs and outputs of various sizes and data formats. Various approaches are also described for supporting very large lookup tables that span multiple registers.Type: ApplicationFiled: June 6, 2022Publication date: December 7, 2023Inventors: Gabriel H. Loh, Yasuko Eckert, Bradford Beckmann, Michael Estlick, Jay Fleischman
-
Patent number: 11740791Abstract: In some embodiments, a memory controller in a processor includes a base value cache, a compressor, and a metadata cache. The compressor is coupled to the base value cache and the metadata cache. The compressor compresses a data block using at least a base value and delta values. The compressor determines whether the size of the data block exceeds a data block threshold value. Based on the determination of whether the size of the compressed data block generated by the compressor exceeds the data block threshold value, the memory controller transfers only a set of the compressed delta values to memory for storage. A decompressor located in the lower level cache of the processor decompresses the compressed data block using the base value stored in the base value cache, metadata stored in the metadata cache and the delta values stored in memory.Type: GrantFiled: October 8, 2021Date of Patent: August 29, 2023Assignee: Advanced Micro Devices, Inc.Inventors: Seyed Mohammad Seyedzadehdelcheh, Xianwei Zhang, Bradford Beckmann, Shomit N. Das
-
Patent number: 11734059Abstract: A processor includes a task scheduling unit and a compute unit coupled to the task scheduling unit. The task scheduling unit performs a task dependency assessment of a task dependency graph and task data requirements that correspond to each task of the plurality of tasks. Based on the task dependency assessment, the task scheduling unit schedules a first task of the plurality of tasks and a second proxy object of a plurality of proxy objects specified by the task data requirements such that a memory transfer of the second proxy object of the plurality of proxy objects occurs while the first task is being executed.Type: GrantFiled: March 19, 2020Date of Patent: August 22, 2023Assignee: Advanced Micro Devices, Inc.Inventors: Muhammad Amber Hassaan, Anirudh Mohan Kaushik, Sooraj Puthoor, Gokul Subramanian Ravi, Bradford Beckmann, Ashwin Aji
-
Publication number: 20230229494Abstract: A processor includes a task scheduling unit and a compute unit coupled to the task scheduling unit. The task scheduling unit performs a task dependency assessment of a task dependency graph and task data requirements that correspond to each task of the plurality of tasks. Based on the task dependency assessment, the task scheduling unit schedules a first task of the plurality of tasks and a second proxy object of a plurality of proxy objects specified by the task data requirements such that a memory transfer of the second proxy object of the plurality of proxy objects occurs while the first task is being executed.Type: ApplicationFiled: January 11, 2023Publication date: July 20, 2023Inventors: Muhammad Amber HASSAAN, Anirudh Mohan KAUSHIK, Sooraj PUTHOOR, Gokul Subramanian RAVI, Bradford BECKMANN, Ashwin AJI
-
Patent number: 11526449Abstract: A processing system limits the propagation of unnecessary memory updates by bypassing writing back dirty cache lines to other levels of a memory hierarchy in response to receiving an indication from software executing at a processor of the processing system that the value of the dirty cache line is dead (i.e., will not be read again or will not be read until after it has been overwritten). In response to receiving an indication from software that data is dead, a cache controller prevents propagation of the dead data to other levels of memory in response to eviction of the dead data or flushing of the cache at which the dead data is stored.Type: GrantFiled: August 31, 2020Date of Patent: December 13, 2022Assignee: Advanced Micro Devices, Inc.Inventors: Johnathan Alsop, Pouya Fotouhi, Bradford Beckmann, Sergey Blagodurov
-
Patent number: 11487671Abstract: Wavefront loading in a processor is managed and includes monitoring a selected wavefront of a set of wavefronts. Reuse of memory access requests for the selected wavefront is counted. A cache hit rate in one or more caches of the processor is determined based on the counted reuse. Based on the cache hit rate, subsequent memory requests of other wavefronts of the set of wavefronts are modified by including a type of reuse of cache lines in requests to the caches. In the caches, storage of data in the caches is based on the type of reuse indicated by the subsequent memory access requests. Reused cache lines are protected by preventing cache line contents from being replaced by another cache line for a duration of processing the set of wavefronts. Caches are bypassed when streaming access requests are made.Type: GrantFiled: June 19, 2019Date of Patent: November 1, 2022Assignee: Advanced Micro Devices, Inc.Inventors: Xianwei Zhang, John Kalamatianos, Bradford Beckmann
-
Patent number: 11481250Abstract: A first workgroup is preempted in response to threads in the first workgroup executing a first wait instruction including a first value of a signal and a first hint indicating a type of modification for the signal. The first workgroup is scheduled for execution on a processor core based on a first context after preemption in response to the signal having the first value. A second workgroup is scheduled for execution on the processor core based on a second context in response to preempting the first workgroup and in response to the signal having a second value. A third context it is prefetched into registers of the processor core based on the first hint and the second value. The first context is stored in a first portion of the registers and the second context is prefetched into a second portion of the registers prior to preempting the first workgroup.Type: GrantFiled: June 29, 2018Date of Patent: October 25, 2022Assignee: Advanced Micro Devices, Inc.Inventors: Alexandru Dutu, Matthew David Sinclair, Bradford Beckmann, David A. Wood
-
Publication number: 20220206869Abstract: Virtualizing resources of a memory-based execution device is disclosed. A host processing system orchestrates the execution of two or more offload tasks on a remote execution device. The remote execution device includes a memory array coupled to a processing unit that is shared by concurrent processes on the host processing system. The host processing system provides time-multiplexed access to the processing unit by each concurrent process for completing offload tasks on the processing unit. The host processing system initiates a context switch on the remote execution device from a first offload task to a second offload task. The context state of the first offload task is saved on the remote execution device.Type: ApplicationFiled: December 28, 2020Publication date: June 30, 2022Inventors: VAIBHAV RAMAKRISHNAN RAMACHANDRAN, ALEXANDRU DUTU, BRADFORD BECKMANN
-
Publication number: 20220207643Abstract: Implementing heterogenous wavefronts on a graphics processing unit (GPU) is disclosed. A schedule assigns heterogeneous wavefronts for execution on a compute unit of a processing device. The heterogeneous wavefronts include different types of wavefronts such as vector compute wavefronts service-level wavefronts that vary in resource requirements and instruction sets. As one example, heterogenous wavefronts may include scalar wavefronts and vector compute wavefronts that execute on scalar units and vector units, respectively. Distinct sets of instructions are executed for the heterogenous wavefronts on the compute unit. Heterogenous wavefronts are processed in the same pipeline of the processing device.Type: ApplicationFiled: December 28, 2020Publication date: June 30, 2022Inventors: SOORAJ PUTHOOR, BRADFORD BECKMANN, NUWAN JAYASENA, ANTHONY GUTIERREZ
-
Publication number: 20220083233Abstract: In some embodiments, a memory controller in a processor includes a base value cache, a compressor, and a metadata cache. The compressor is coupled to the base value cache and the metadata cache. The compressor compresses a data block using at least a base value and delta values. The compressor determines whether the size of the data block exceeds a data block threshold value. Based on the determination of whether the size of the compressed data block generated by the compressor exceeds the data block threshold value, the memory controller transfers only a set of the compressed delta values to memory for storage. A decompressor located in the lower level cache of the processor decompresses the compressed data block using the base value stored in the base value cache, metadata stored in the metadata cache and the delta values stored in memory.Type: ApplicationFiled: October 8, 2021Publication date: March 17, 2022Inventors: Seyed Mohammad SEYEDZADEHDELCHEH, Xianwei ZHANG, Bradford BECKMANN, Shomit N. DAS
-
Publication number: 20220066940Abstract: A processing system limits the propagation of unnecessary memory updates by bypassing writing back dirty cache lines to other levels of a memory hierarchy in response to receiving an indication from software executing at a processor of the processing system that the value of the dirty cache line is dead (i.e., will not be read again or will not be read until after it has been overwritten). In response to receiving an indication from software that data is dead, a cache controller prevents propagation of the dead data to other levels of memory in response to eviction of the dead data or flushing of the cache at which the dead data is stored.Type: ApplicationFiled: August 31, 2020Publication date: March 3, 2022Inventors: Johnathan ALSOP, Pouya FOTOUHI, Bradford BECKMANN, Sergey BLAGODUROV
-
Publication number: 20210373975Abstract: A processing system monitors and synchronizes parallel execution of workgroups (WGs). One or more of the WGs perform (e.g., periodically or in response to a trigger such as an indication of oversubscription) a waiting atomic instruction. In response to a comparison between an atomic value produced as a result of the waiting atomic instruction and an expected value, WGs that fail to produce a correct atomic value are identified as being in a waiting state (e.g., waiting for a synchronization variable). Execution of WGs in the waiting state is prevented (e.g., by a context switch) until corresponding synchronization variables are released.Type: ApplicationFiled: September 23, 2020Publication date: December 2, 2021Inventors: Alexandru DUTU, Matthew David SINCLAIR, Bradford BECKMANN, David A. WOOD