Patents by Inventor Brian Emberling
Brian Emberling has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12645362
Abstract: A parallel processor assigns data for use by one or more tasks to a shared memory or memories associated with a plurality of compute units. A scheduler or other controller within or otherwise associated with the parallel processor assigns threads or groups of threads, which utilize the assigned data, to compute units as appropriate. Compute units utilize two sets of instructions, one specifying upper bits and one specifying lower bits of a memory address, to specify memory addresses that are larger than a number of bits an individual instruction can specify in a memory address field. Mode setting commands determine when and how lower bits in a memory address field of an instruction will be combined with upper bits in a previous instruction, e.g., through concatenation.
Type: Grant
Filed: September 25, 2024
Date of Patent: June 2, 2026
Assignees: ATI TECHNOLOGIES ULC, ADVANCED MICRO DEVICES, INC.
Inventors: Ahmed Mohammed ElShafiey Mohammed ElTantawy, Brian Emberling, Stanislav Mekhanoshin
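The upper/lower address scheme in this abstract can be illustrated with a small Python model. This is only a sketch under stated assumptions: the class name `AddressUnit`, its methods, and the 16-bit field width are hypothetical and do not come from the patent.

```python
class AddressUnit:
    """Toy model of combining upper bits from an earlier instruction
    with the lower bits in a later instruction's address field.
    All names and the field width are illustrative assumptions."""

    def __init__(self, lower_bits=16):
        self.lower_bits = lower_bits   # assumed width of the address field
        self.upper = 0                 # upper bits latched by an earlier instruction
        self.concat_mode = False       # set by a mode setting command

    def set_mode(self, concat):
        # Mode setting command: enable or disable concatenation.
        self.concat_mode = concat

    def set_upper(self, upper_bits):
        # First instruction: supplies the upper bits of the address.
        self.upper = upper_bits

    def resolve(self, lower_field):
        # Second instruction: supplies the lower bits in its address field.
        lower = lower_field & ((1 << self.lower_bits) - 1)
        if self.concat_mode:
            return (self.upper << self.lower_bits) | lower
        return lower


unit = AddressUnit(lower_bits=16)
unit.set_upper(0x12)
narrow = unit.resolve(0x3456)   # concatenation off: only the lower field is used
unit.set_mode(True)
wide = unit.resolve(0x3456)     # concatenation on: upper bits are prepended
```

With concatenation enabled, the resolved address is wider than the field a single instruction can encode, which is the effect the abstract describes.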
-
Patent number: 12646249
Abstract: A method, system, and computer-readable medium for executing a task is disclosed. The method includes receiving input data and computing instructions, launching a workgroup including wavefronts to execute the task, wherein the launching causes the wavefronts to process the input data by sharing intermediate results and resources, and adjusting the operation based on characteristics of the wavefronts. The characteristics include data dependencies, computational load, memory usage, and execution timing requirements. The wavefronts execute the task in stages, where each stage processes portions of input data and data generated by other wavefronts.
Type: Grant
Filed: July 2, 2024
Date of Patent: June 2, 2026
Assignee: Advanced Micro Devices, Inc.
Inventors: Brian Emberling, Michael Y. Chow
-
Publication number: 20260119386
Abstract: During execution of software a processing unit issues asynchronous operations such that there are multiple asynchronous operations, such as memory operations, in flight (that is, pending execution completion) from a single set of instructions, such as a wavefront or warp. In some cases, the processing unit executes other operations while the multiple asynchronous operations are pending. Performance monitor circuitry records information, such as asynchronous operation count information, register file scoreboard information, and the like, that allows a software engineer to identify which of a plurality of asynchronous operations caused a stall.
Type: Application
Filed: September 25, 2024
Publication date: April 30, 2026
Inventors: Joseph L. Greathouse, Brian Emberling, Nicholas Curtis, Rene Willibrordus van Oostrum
-
Publication number: 20260086718
Abstract: A parallel processor assigns data for use by one or more tasks to a shared memory or memories associated with a plurality of compute units. A scheduler or other controller within or otherwise associated with the parallel processor assigns threads or groups of threads, which utilize the assigned data, to compute units as appropriate. Compute units utilize two sets of instructions, one specifying upper bits and one specifying lower bits of a memory address, to specify memory addresses that are larger than a number of bits an individual instruction can specify in a memory address field. Mode setting commands determine when and how lower bits in a memory address field of an instruction will be combined with upper bits in a previous instruction, e.g., through concatenation.
Type: Application
Filed: September 25, 2024
Publication date: March 26, 2026
Inventors: Ahmed Mohammed ElShafiey Mohammed ElTantawy, Brian Emberling, Stanislav Mekhanoshin
-
Patent number: 12517771
Abstract: A disclosed technique includes executing, for a first wavefront, a barrier arrival notification instruction, for a first barrier, indicating arrival at a first barrier point; performing, for the first wavefront, work prior to the first barrier point; executing, for the first wavefront, a barrier check instruction; and executing, for the first wavefront, at a control flow path based on a result of the barrier check instruction.
Type: Grant
Filed: December 27, 2021
Date of Patent: January 6, 2026
Assignee: Advanced Micro Devices, Inc.
Inventors: Brian Emberling, Joseph L Greathouse
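The arrive-then-check sequence in this abstract resembles a split barrier, which can be sketched in Python as follows. The names `SplitBarrier` and `wavefront_step` are hypothetical, and the model runs sequentially rather than on real hardware; it only shows the ordering of the steps.

```python
class SplitBarrier:
    """Toy split barrier: arrival and completion check are separate steps.
    Names and structure are illustrative assumptions, not the patented design."""

    def __init__(self, num_wavefronts):
        self.num_wavefronts = num_wavefronts
        self.arrived = 0

    def arrive(self):
        # Barrier arrival notification: record arrival without blocking.
        self.arrived += 1

    def check(self):
        # Barrier check: report whether every participant has arrived.
        return self.arrived >= self.num_wavefronts


def wavefront_step(barrier, log, name):
    barrier.arrive()                                   # 1. notify arrival
    log.append(f"{name}: independent work")            # 2. keep working
    if barrier.check():                                # 3. check the barrier
        log.append(f"{name}: barrier complete")        # 4a. dependent path
        return True
    log.append(f"{name}: barrier pending")             # 4b. alternate path
    return False


barrier = SplitBarrier(num_wavefronts=2)
log = []
first = wavefront_step(barrier, log, "w0")    # only one arrival so far
second = wavefront_step(barrier, log, "w1")   # both wavefronts have arrived
```

The point of the split is that a wavefront does not have to stall between notifying arrival and learning the barrier's state; it chooses a control flow path from the check result instead.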
-
Patent number: 12299413
Abstract: A processing system executes wavefronts at multiple arithmetic logic unit (ALU) pipelines of a single instruction multiple data (SIMD) unit in a single execution cycle. The ALU pipelines each include a number of ALUs that execute instructions on wavefront operands that are collected from vector general process register (VGPR) banks at a cache and output results of the instructions executed on the wavefronts at a buffer. By storing wavefronts supplied by the VGPR banks at the cache, a greater number of wavefronts can be made available to the SIMD unit without increasing the VGPR bandwidth, enabling multiple ALU pipelines to execute instructions during a single execution cycle.
Type: Grant
Filed: January 16, 2024
Date of Patent: May 13, 2025
Assignee: ADVANCED MICRO DEVICES, INC.
Inventors: Bin He, Brian Emberling, Mark Leather, Michael Mantor
-
Patent number: 12229570
Abstract: Block data load with transpose techniques are described. In one example, an input is received, at a control unit, specifying an instruction to load a block of data to at least one memory module using a transpose operation. Responsive to the receiving the input by the control unit, the block of data is caused to be loaded to the at least one memory module by transposing the block of data to form a transposed block of data and storing the transposed block of data in the at least one memory.
Type: Grant
Filed: September 25, 2022
Date of Patent: February 18, 2025
Assignee: Advanced Micro Devices, Inc.
Inventors: Bin He, Michael John Mantor, Brian Emberling, Liang Huang, Chao Liu
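A block load with transpose can be pictured with a short Python sketch. The function name `load_block_transposed` and its parameters are hypothetical; the code only models the data movement (read a block, transpose it, store the result), not the hardware described in the patent.

```python
def load_block_transposed(source, row, col, height, width):
    """Read a height x width block from a 2D source starting at (row, col)
    and return it transposed, as it would be stored in the destination.
    Function name and signature are illustrative assumptions."""
    block = [r[col:col + width] for r in source[row:row + height]]
    # zip(*block) turns rows into columns, i.e. transposes the block.
    return [list(column) for column in zip(*block)]


source = [
    [1, 2, 3],
    [4, 5, 6],
]
stored = load_block_transposed(source, row=0, col=0, height=2, width=3)
```

Here a 2 x 3 block is stored as a 3 x 2 block, so data that was laid out along rows in the source becomes contiguous along columns in the destination.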
-
Publication number: 20240355044
Abstract: A method, system, and computer-readable medium for executing a task is disclosed. The method includes receiving input data and computing instructions, launching a workgroup including wavefronts to execute the task, wherein the launching causes the wavefronts to process the input data by sharing intermediate results and resources, and adjusting the operation based on characteristics of the wavefronts. The characteristics include data dependencies, computational load, memory usage, and execution timing requirements. The wavefronts execute the task in stages, where each stage processes portions of input data and data generated by other wavefronts.
Type: Application
Filed: July 2, 2024
Publication date: October 24, 2024
Applicant: Advanced Micro Devices, Inc.
Inventors: Brian Emberling, Michael Y. Chow
-
Publication number: 20240311199
Abstract: A program code executing on a processing system includes one or more instructions each identifying a workload that includes a plurality of waves and each identifying resource allocations for the plurality of waves of the workgroup. In response to receiving an instruction identifying a workload and resource allocations for the plurality of waves of the workgroup, a processor allocates a first set of processing resources to a compute unit of the processor based on the resource allocations for the plurality of waves. The compute unit then performs operations for the workgroup using the allocated set of processing resources.
Type: Application
Filed: March 13, 2023
Publication date: September 19, 2024
Inventors: Nicolai Haehnle, Mark Leather, Brian Emberling, Michael John Bedy, Daniel Schneider
-
Patent number: 12033275
Abstract: Methods and systems are disclosed for executing a collaborative task in a shader system. Techniques disclosed include receiving, by the system, input data and computing instructions associated with the collaborative task, as well as a configuration setting, causing the system to operate in a takeover mode. The system then launches, exclusively in one workgroup processor, a workgroup including wavefronts configured to execute the collaborative task.
Type: Grant
Filed: September 29, 2021
Date of Patent: July 9, 2024
Assignee: Advanced Micro Devices, Inc.
Inventors: Brian Emberling, Michael Y. Chow
-
Publication number: 20240168719
Abstract: A processing system executes wavefronts at multiple arithmetic logic unit (ALU) pipelines of a single instruction multiple data (SIMD) unit in a single execution cycle. The ALU pipelines each include a number of ALUs that execute instructions on wavefront operands that are collected from vector general process register (VGPR) banks at a cache and output results of the instructions executed on the wavefronts at a buffer. By storing wavefronts supplied by the VGPR banks at the cache, a greater number of wavefronts can be made available to the SIMD unit without increasing the VGPR bandwidth, enabling multiple ALU pipelines to execute instructions during a single execution cycle.
Type: Application
Filed: January 16, 2024
Publication date: May 23, 2024
Inventors: Bin HE, Brian EMBERLING, Mark LEATHER, Michael MANTOR
-
Publication number: 20240103879
Abstract: Block data load with transpose techniques are described. In one example, an input is received, at a control unit, specifying an instruction to load a block of data to at least one memory module using a transpose operation. Responsive to the receiving the input by the control unit, the block of data is caused to be loaded to the at least one memory module by transposing the block of data to form a transposed block of data and storing the transposed block of data in the at least one memory.
Type: Application
Filed: September 25, 2022
Publication date: March 28, 2024
Applicant: Advanced Micro Devices, Inc.
Inventors: Bin He, Michael John Mantor, Brian Emberling, Liang Huang, Chao Liu
-
Patent number: 11847462
Abstract: A software-based instruction scoreboard indicates dependencies between closely-issued instructions issued to an arithmetic logic unit (ALU) pipeline. The software-based instruction scoreboard inserts one or more control words into the command stream between the dependent instructions, which is then executed by the ALU pipeline. The control words identify the instruction(s) upon which the dependent instructions depend (parent instructions) so that the GPU hardware can ensure that the ALU pipeline does not stall while the dependent instruction waits for results from the parent instruction.
Type: Grant
Filed: December 15, 2020
Date of Patent: December 19, 2023
Assignee: Advanced Micro Devices, Inc.
Inventor: Brian Emberling
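One way to picture a software-based scoreboard is as a compiler pass that scans an instruction list and emits a control word ahead of each dependent instruction. The Python sketch below is a simplified illustration: `Instr`, `ControlWord`, `insert_control_words`, and the `window` parameter are all hypothetical names, and the control word encoding (a tuple of distances back to each parent) is an assumption, not the patent's format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    op: str
    dst: str
    srcs: tuple


@dataclass(frozen=True)
class ControlWord:
    # How many instructions back each parent instruction sits (assumed encoding).
    parent_distances: tuple


def insert_control_words(program, window=4):
    """Insert a ControlWord before every instruction that reads a register
    written by one of the previous `window` instructions.
    Simplification: every matching writer in the window is reported,
    even if a later instruction overwrote the same register."""
    out = []
    for i, ins in enumerate(program):
        distances = []
        for back in range(1, window + 1):
            j = i - back
            if j < 0:
                break
            if program[j].dst in ins.srcs:
                distances.append(back)
        if distances:
            out.append(ControlWord(tuple(distances)))
        out.append(ins)
    return out


program = [
    Instr("mul", "v0", ("v1", "v2")),
    Instr("add", "v3", ("v0", "v4")),   # depends on the mul one slot back
    Instr("mov", "v5", ("v6",)),        # independent
]
stream = insert_control_words(program)
```

In this sketch the dependency information travels in the instruction stream itself, so hardware that consumes the stream would not need to rediscover which earlier instruction a dependent instruction is waiting on.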
-
Publication number: 20230205608
Abstract: A disclosed technique includes executing, for a first wavefront, a barrier arrival notification instruction, for a first barrier, indicating arrival at a first barrier point; performing, for the first wavefront, work prior to the first barrier point; executing, for the first wavefront, a barrier check instruction; and executing, for the first wavefront, at a control flow path based on a result of the barrier check instruction.
Type: Application
Filed: December 27, 2021
Publication date: June 29, 2023
Applicant: Advanced Micro Devices, Inc.
Inventors: Brian Emberling, Joseph L. Greathouse
-
Patent number: 11675568
Abstract: A processing system executes wavefronts at multiple arithmetic logic unit (ALU) pipelines of a single instruction multiple data (SIMD) unit in a single execution cycle. The ALU pipelines each include a number of ALUs that execute instructions on wavefront operands that are collected from vector general process register (VGPR) banks at a cache and output results of the instructions executed on the wavefronts at a buffer. By storing wavefronts supplied by the VGPR banks at the cache, a greater number of wavefronts can be made available to the SIMD unit without increasing the VGPR bandwidth, enabling multiple ALU pipelines to execute instructions during a single execution cycle.
Type: Grant
Filed: December 14, 2020
Date of Patent: June 13, 2023
Assignee: Advanced Micro Devices, Inc.
Inventors: Bin He, Brian Emberling, Mark Leather, Michael Mantor
-
Publication number: 20230102767
Abstract: Methods and systems are disclosed for executing a collaborative task in a shader system. Techniques disclosed include receiving, by the system, input data and computing instructions associated with the collaborative task, as well as a configuration setting, causing the system to operate in a takeover mode. The system then launches, exclusively in one workgroup processor, a workgroup including wavefronts configured to execute the collaborative task.
Type: Application
Filed: September 29, 2021
Publication date: March 30, 2023
Applicant: Advanced Micro Devices, Inc.
Inventors: Brian Emberling, Michael Y. Chow
-
Publication number: 20230097279
Abstract: Methods and systems are disclosed for executing operations on single-instruction-multiple-data (SIMD) units. Techniques disclosed perform a dot product operation on input data during one computer cycle, including convolving the input data, generating intermediate data, and applying one or more transitional operations to the intermediate data to generate output data. Aspects described, wherein the input data is an input to a layer of a convolutional neural network and the generated output data is the output of the layer.
Type: Application
Filed: September 29, 2021
Publication date: March 30, 2023
Applicant: Advanced Micro Devices, Inc.
Inventors: Brian Emberling, Michael Mantor, Michael Y. Chow, Bin He
-
Patent number: 11386518
Abstract: The address of the draw or dispatch packet responsible for creating an exception is tied to a shader/wavefront back to the draw command from which it originated. In various embodiments, a method of operating a graphics pipeline and exception handling includes receiving, at a command processor of a graphics processing unit (GPU), an exception signal indicating an occurrence of a pipeline exception at a shader stage of a graphics pipeline. The shader stage generates an exception signal in response to a pipeline exception and transmits the exception signal to the command processor. The command processor determines, based on the exception signal, an address of a command packet responsible for the occurrence of the pipeline exception.
Type: Grant
Filed: September 24, 2019
Date of Patent: July 12, 2022
Assignee: ADVANCED MICRO DEVICES, INC.
Inventors: Michael Mantor, Alexander Fuad Ashkar, Randy Ramsey, Mangesh P. Nijasure, Brian Emberling
-
Publication number: 20220197649
Abstract: A processing unit includes a first memory device and a second memory device. The first memory device includes a first plurality of general purpose registers (GPRs) and the second memory device includes a second plurality of GPRs. The second memory device includes fewer GPRs than the first memory device. Program data is stored at the first memory device and the second memory device based on expected frequency of accesses associated with the program data.
Type: Application
Filed: December 21, 2021
Publication date: June 23, 2022
Inventors: Prasanna Balasundaram, Dipayan Karmakar, Brian Emberling
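Placement by expected access frequency can be sketched as a simple ranking step. The abstract does not say which register file receives the frequently accessed data, so the Python sketch below assumes the most frequently accessed values go to the smaller file; that policy, the function name `assign_registers`, and its parameters are all illustrative assumptions.

```python
def assign_registers(access_counts, small_file_size):
    """Split program values between a small and a large register file.
    Assumption (not stated in the abstract): the most frequently accessed
    values are placed in the smaller file. Ties are broken by name so the
    result is deterministic."""
    ranked = sorted(access_counts, key=lambda name: (-access_counts[name], name))
    small_file = ranked[:small_file_size]
    large_file = ranked[small_file_size:]
    return small_file, large_file


expected_accesses = {"a": 10, "b": 1, "c": 5}
small_file, large_file = assign_registers(expected_accesses, small_file_size=2)
```

A real compiler would derive the expected access counts from profiling or static analysis; here they are supplied directly to keep the example self-contained.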
-
Publication number: 20220188120
Abstract: A software-based instruction scoreboard indicates dependencies between closely-issued instructions issued to an arithmetic logic unit (ALU) pipeline. The software-based instruction scoreboard inserts one or more control words into the command stream between the dependent instructions, which is then executed by the ALU pipeline. The control words identify the instruction(s) upon which the dependent instructions depend (parent instructions) so that the GPU hardware can ensure that the ALU pipeline does not stall while the dependent instruction waits for results from the parent instruction.
Type: Application
Filed: December 15, 2020
Publication date: June 16, 2022
Inventor: Brian EMBERLING