Patents by Inventor Rex Eldon MCCRARY

Rex Eldon MCCRARY has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

RECONFIGURABLE VIRTUAL GRAPHICS AND COMPUTE PROCESSOR PIPELINE

Publication number: 20250191111

Abstract: A plurality of programmable processing cores is configured to process graphics primitives and corresponding data and a plurality of fixed-function hardware units. The processing cores and the fixed-function hardware units are configured to implement a configurable number of virtual pipelines. Each virtual pipeline includes a configurable number of fragments and an operational state of each virtual pipeline is specified by a different context. The configurable number of virtual pipelines can be modified from a first number to a second number that is different than the first number. An emulation of a fixed-function hardware unit can be instantiated on one or more of the graphics processing cores in response to detection of a bottleneck in a fixed-function hardware unit. One or more of the virtual pipelines can then be reconfigured to utilize the emulation instead of the fixed-function hardware unit.

Type: Application

Filed: February 25, 2025

Publication date: June 12, 2025

Inventors: Timour T. PALTASHEV, Michael MANTOR, Rex Eldon MCCRARY

CROSS GPU SCHEDULING OF DEPENDENT PROCESSES

Publication number: 20240394829

Abstract: A primary processing unit includes queues configured to store commands prior to execution in corresponding pipelines. The primary processing unit also includes a first table configured to store entries indicating dependencies between commands that are to be executed on different ones of a plurality of processing units that include the primary processing unit and one or more secondary processing units. The primary processing unit also includes a scheduler configured to release commands in response to resolution of the dependencies. In some cases, a first one of the secondary processing units schedules the first command for execution in response to resolution of a dependency on a second command executing in a second one of the secondary processing units. The second one of the secondary processing units notifies the primary processing unit in response to completing execution of the second command.
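
As a rough illustration of the dependency table and release-on-resolution behavior described in this abstract, here is a minimal Python sketch. It is not taken from the application: the class name `DependencyScheduler`, its methods, and the use of `(unit, command)` tuples as command identifiers are all hypothetical, and a real implementation would live in hardware or firmware rather than in a host-side object.

```python
from collections import defaultdict, deque


class DependencyScheduler:
    """Toy model: hold commands until every dependency has completed."""

    def __init__(self):
        self.pending = {}                 # command -> set of unresolved dependencies
        self.waiters = defaultdict(set)   # dependency -> commands waiting on it
        self.released = deque()           # commands released for execution, in order
        self.completed = set()            # commands reported as finished

    def submit(self, command, depends_on=()):
        # Only dependencies that have not completed yet are recorded in the table.
        unresolved = {d for d in depends_on if d not in self.completed}
        if not unresolved:
            self.released.append(command)
            return
        self.pending[command] = unresolved
        for dep in unresolved:
            self.waiters[dep].add(command)

    def notify_complete(self, command):
        """Called when any processing unit reports that `command` finished."""
        self.completed.add(command)
        for waiter in sorted(self.waiters.pop(command, ())):
            self.pending[waiter].discard(command)
            if not self.pending[waiter]:
                del self.pending[waiter]
                self.released.append(waiter)


# Command A on unit "gpu0" depends on command B running on unit "gpu1".
sched = DependencyScheduler()
sched.submit(("gpu1", "B"))
sched.submit(("gpu0", "A"), depends_on=[("gpu1", "B")])
sched.notify_complete(("gpu1", "B"))   # A is released only at this point
print(list(sched.released))
```

In this sketch a single table on one object stands in for the primary processing unit; the secondary units are modeled only by the notifications they send.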

Type: Application

Filed: May 29, 2024

Publication date: November 28, 2024

Inventor: Rex Eldon MCCRARY

DISTRIBUTED USER MODE PROCESSING

Publication number: 20240320042

Abstract: A first processing unit such as a graphics processing unit (GPU) includes pipelines that execute commands and a scheduler to schedule one or more first commands for execution by one or more of the pipelines. The one or more first commands are received from a user mode driver in a second processing unit such as a central processing unit (CPU). The scheduler schedules one or more second commands for execution in response to completing execution of the one or more first commands and without notifying the second processing unit. In some cases, the first processing unit includes a direct memory access (DMA) engine that writes blocks of information from the first processing unit to a memory. The one or more second commands program the DMA engine to write a block of information including results generated by executing the one or more first commands.

Type: Application

Filed: March 6, 2024

Publication date: September 26, 2024

Inventor: Rex Eldon MCCRARY

DISTRIBUTED USER MODE PROCESSING

Publication number: 20230094639

Abstract: A first processing unit such as a graphics processing unit (GPU) includes pipelines that execute commands and a scheduler to schedule one or more first commands for execution by one or more of the pipelines. The one or more first commands are received from a user mode driver in a second processing unit such as a central processing unit (CPU). The scheduler schedules one or more second commands for execution in response to completing execution of the one or more first commands and without notifying the second processing unit. In some cases, the first processing unit includes a direct memory access (DMA) engine that writes blocks of information from the first processing unit to a memory. The one or more second commands program the DMA engine to write a block of information including results generated by executing the one or more first commands.

Type: Application

Filed: September 16, 2022

Publication date: March 30, 2023

Inventor: Rex Eldon MCCRARY

THROTTLING SHADERS BASED ON RESOURCE USAGE IN A GRAPHICS PIPELINE

Publication number: 20220188963

Abstract: A processing system includes a graphics pipeline that executes a first shader of a first type and a second shader of a second type. In some cases, the first shader is a geometry shader and the second shader is a pixel shader. The processing system also includes buffers that hold primitives generated by the first shader and provide the primitives to the second shader. The processing system also includes a primitive hub that monitors fullness of the buffers. Launching of waves from the first shader is throttled based on the fullness of the buffers. A shader processor input (SPI) selectively throttles the waves launched by the geometry shader based on a signal from the primitive hub indicating the fullness, an indication of relative resource usage of geometry waves and pixel waves in the graphics pipeline, or an indication of lifetimes of the geometry waves.
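
The fullness-based throttling in this abstract can be pictured with a small simulation. The sketch below is an assumption-laden toy, not the method claimed in the application: the function name `simulate`, the fixed `produce` and `consume` rates, and the `high_watermark` threshold are all hypothetical, and only the buffer-fullness signal is modeled (relative resource usage and wave lifetimes are ignored).

```python
def simulate(steps, capacity=8, produce=3, consume=1, high_watermark=0.75):
    """Count launched and throttled geometry waves over a number of steps.

    Each launched geometry wave adds `produce` primitives to the buffer;
    each step the pixel side drains `consume` primitives.
    """
    fill = 0
    launched = 0
    throttled = 0
    for _ in range(steps):
        below_watermark = fill / capacity < high_watermark
        has_room = fill + produce <= capacity
        if below_watermark and has_room:
            fill += produce          # geometry wave launches and emits primitives
            launched += 1
        else:
            throttled += 1           # launch is held back this step
        fill = max(0, fill - consume)  # pixel shader consumes primitives
    return launched, throttled


launched, throttled = simulate(steps=6)
print(f"launched={launched} throttled={throttled}")
```

With a producer that outpaces the consumer, the loop alternates between launching and holding back waves once the buffer approaches the watermark.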

Type: Application

Filed: December 16, 2020

Publication date: June 16, 2022

Inventors: Nishank PATHAK, Randy Wayne RAMSEY, Tad LITWILLER, Rex Eldon MCCRARY

PREFETCHING FROM INDIRECT BUFFERS AT A PROCESSING UNIT

Publication number: 20220091847

Abstract: In response to executing a specified command packet, a processing unit prefetches commands stored at an indirect buffer to a command queue for execution, prior to executing a command that initiates execution of the commands stored at the indirect buffer. By prefetching the data prior to executing the indirect buffer execution command, the processing unit reduces delays in processing the commands stored at the indirect buffer.

Type: Application

Filed: September 23, 2020

Publication date: March 24, 2022

Inventors: Alexander Fuad ASHKAR, Harry J. WISE, Rex Eldon MCCRARY, Hans FERNLUND

DYNAMIC TRANSPARENT RECONFIGURATION OF A MULTI-TENANT GRAPHICS PROCESSING UNIT

Publication number: 20210272229

Abstract: An apparatus such as a graphics processing unit (GPU) includes shader engines and front end (FE) circuits. Subsets of the FE circuits are configured to schedule commands for execution on corresponding subsets of the shader engines. The apparatus also includes a set of physical paths configured to convey information from the FE circuits to a memory via the shader engines. Subsets of the physical paths are allocated to the subsets of the FE circuits and the corresponding subsets of the shader engines. The apparatus further includes a scheduler configured to receive a reconfiguration request and modify the set of physical paths based on the reconfiguration request. In some cases, the reconfiguration request is provided by a central processing unit (CPU) that requests the modification based on characteristics of applications generating the commands.

Type: Application

Filed: February 28, 2020

Publication date: September 2, 2021

Inventor: Rex Eldon MCCRARY

FULLY UTILIZED HARDWARE IN A MULTI-TENANCY GRAPHICS PROCESSING UNIT

Publication number: 20210272347

Abstract: An apparatus such as a graphics processing unit (GPU) includes a set of shader engines and a set of front end (FE) circuits. Subsets of the set of FE circuits schedule geometry workloads for subsets of the set of shader engines based on a mapping. The apparatus also includes a set of physical paths that convey information from the set of FE circuits to a memory via the set of shader engines. Subsets of the set of physical paths are allocated to the subsets of the set of FE circuits and the subsets of the set of shader engines based on the mapping. The mapping determines information stored in a set of registers used to configure the apparatus. In some cases, the set of registers store information indicating a spatial partitioning of the set of physical paths.

Type: Application

Filed: February 28, 2020

Publication date: September 2, 2021

Inventor: Rex Eldon MCCRARY

CROSS GPU SCHEDULING OF DEPENDENT PROCESSES

Publication number: 20210192672

Abstract: A primary processing unit includes queues configured to store commands prior to execution in corresponding pipelines. The primary processing unit also includes a first table configured to store entries indicating dependencies between commands that are to be executed on different ones of a plurality of processing units that include the primary processing unit and one or more secondary processing units. The primary processing unit also includes a scheduler configured to release commands in response to resolution of the dependencies. In some cases, a first one of the secondary processing units schedules the first command for execution in response to resolution of a dependency on a second command executing in a second one of the secondary processing units. The second one of the secondary processing units notifies the primary processing unit in response to completing execution of the second command.

Type: Application

Filed: December 19, 2019

Publication date: June 24, 2021

Inventor: Rex Eldon MCCRARY

AGGREGATED DOORBELLS FOR UNMAPPED QUEUES IN A GRAPHICS PROCESSING UNIT

Publication number: 20210191730

Abstract: A processing system includes a set of queues to store command buffers prior to execution in a corresponding plurality of pipelines. The processing system also includes one or more first doorbells and a second doorbell. The first doorbells map to one or more queues in the set of queues on a one-to-one basis. The second doorbell maps to a subset of the set of queues on a one-to-many basis. A doorbell monitor generates an interrupt in response to an empty queue in the subset becoming a non-empty queue. A scheduler polls the subset in response to the interrupt. The scheduler schedules a command buffer from the non-empty queue for execution or adds the command buffer to a pool for subsequent execution.
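
A minimal Python sketch of the one-to-many doorbell may help make the interrupt-then-poll flow concrete. Everything named here (`AggregatedDoorbell`, `ring`, `poll`, the `interrupt_pending` flag) is hypothetical and chosen for illustration; the sketch also assumes the scheduler drains every non-empty queue on each poll, which the abstract does not require.

```python
from collections import deque


class AggregatedDoorbell:
    """Toy model: one doorbell shared by several queues (one-to-many)."""

    def __init__(self, queue_ids):
        self.queues = {qid: deque() for qid in queue_ids}
        self.interrupt_pending = False

    def ring(self, qid, command_buffer):
        # The monitor raises an interrupt only on an empty -> non-empty transition.
        was_empty = not self.queues[qid]
        self.queues[qid].append(command_buffer)
        if was_empty:
            self.interrupt_pending = True

    def poll(self):
        """Scheduler side: on interrupt, scan the subset and collect buffers."""
        if not self.interrupt_pending:
            return []
        self.interrupt_pending = False
        scheduled = []
        for qid, queue in self.queues.items():
            while queue:
                scheduled.append((qid, queue.popleft()))
        return scheduled


doorbell = AggregatedDoorbell(["q0", "q1", "q2"])
doorbell.ring("q1", "cb0")
print(doorbell.poll())
```

Because the doorbell does not say which queue was written, the scheduler has to scan the whole subset; that scan is the cost traded for not dedicating one doorbell per queue.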

Type: Application

Filed: December 19, 2019

Publication date: June 24, 2021

Inventor: Rex Eldon MCCRARY

GANG SCHEDULING WITH AN ONBOARD GRAPHICS PROCESSING UNIT AND USER-BASED QUEUES

Publication number: 20210191793

Abstract: A processing unit such as a graphics processing unit (GPU) includes a set of queues that stores command buffers prior to execution in a corresponding plurality of pipelines. The processing unit also implements a kernel mode driver that allocates a first subset of the set of queues to a first application in response to receiving registration requests from the first application. The processing unit further includes a scheduler that schedules command buffers in the first subset of the set of queues for concurrent execution on a first subset of the set of pipelines. In some cases, an interrupt is generated in response to execution of a first command in a first command buffer in the first queue or the second queue. The interrupt includes an address indicating a location of a routine to be executed by a second subset of the plurality of pipelines.

Type: Application

Filed: December 19, 2019

Publication date: June 24, 2021

Inventor: Rex Eldon MCCRARY

DISTRIBUTED USER MODE PROCESSING

Publication number: 20210191771

Abstract: A first processing unit such as a graphics processing unit (GPU) includes pipelines that execute commands and a scheduler to schedule one or more first commands for execution by one or more of the pipelines. The one or more first commands are received from a user mode driver in a second processing unit such as a central processing unit (CPU). The scheduler schedules one or more second commands for execution in response to completing execution of the one or more first commands and without notifying the second processing unit. In some cases, the first processing unit includes a direct memory access (DMA) engine that writes blocks of information from the first processing unit to a memory. The one or more second commands program the DMA engine to write a block of information including results generated by executing the one or more first commands.

Type: Application

Filed: December 19, 2019

Publication date: June 24, 2021

Inventor: Rex Eldon MCCRARY

RECONFIGURABLE VIRTUAL GRAPHICS AND COMPUTE PROCESSOR PIPELINE

Publication number: 20210049729

Abstract: A graphics processing unit (GPU) includes a plurality of programmable processing cores configured to process graphics primitives and corresponding data and a plurality of fixed-function hardware units. The plurality of processing cores and the plurality of fixed-function hardware units are configured to implement a configurable number of virtual pipelines to concurrently process different command flows. Each virtual pipeline includes a configurable number of fragments and an operational state of each virtual pipeline is specified by a different context. The configurable number of virtual pipelines can be modified from a first number to a second number that is different than the first number. An emulation of a fixed-function hardware unit can be instantiated on one or more of the graphics processing cores in response to detection of a bottleneck in a fixed-function hardware unit. One or more of the virtual pipelines can then be reconfigured to utilize the emulation instead of the fixed-function hardware unit.

Type: Application

Filed: May 21, 2020

Publication date: February 18, 2021

Inventors: Timour T. PALTASHEV, Michael MANTOR, Rex Eldon MCCRARY

GRAPHICS CONTEXT BOUNCING

Publication number: 20200379767

Abstract: A method of context bouncing includes receiving, at a command processor of a graphics processing unit (GPU), a conditional execute packet providing a hash identifier corresponding to an encapsulated state. The encapsulated state includes one or more context state packets following the conditional execute packet. A command packet following the encapsulated state is executed based at least in part on determining whether the hash identifier of the encapsulated state matches one of a plurality of hash identifiers of active context states currently stored at the GPU.
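
One way to read this abstract is that a matching hash lets the command processor skip state it already holds. The Python sketch below illustrates that reading only; it is an assumption, not the claimed method. The packet tuples, the `run` function, and the choice to skip the encapsulated state packets on a hash match are all hypothetical.

```python
def run(packets, active_hashes):
    """Return the packets that are actually executed.

    packets: list of (kind, payload) tuples. A ("cond_exec", (hash_id, n_state))
    packet announces that the next n_state packets are an encapsulated state.
    active_hashes: set of hash identifiers for states currently resident.
    """
    executed = []
    i = 0
    while i < len(packets):
        kind, payload = packets[i]
        if kind == "cond_exec":
            hash_id, n_state = payload
            if hash_id in active_hashes:
                # State already resident: skip the encapsulated state packets.
                i += 1 + n_state
                continue
            # Unknown state: remember its hash and fall through to load it.
            active_hashes.add(hash_id)
            i += 1
            continue
        executed.append((kind, payload))
        i += 1
    return executed


stream = [
    ("cond_exec", (0xA1, 2)),
    ("state", "s0"),
    ("state", "s1"),
    ("draw", "d0"),
]
resident = set()
print(run(stream, resident))   # first pass loads the state, then draws
print(run(stream, resident))   # second pass finds the hash and only draws
```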

Type: Application

Filed: May 30, 2019

Publication date: December 3, 2020

Inventors: Rex Eldon MCCRARY, Yi LUO, Harry J. WISE, Alexander Fuad ASHKAR, Michael MANTOR

PROCESSOR MICROCODE WITH EMBEDDED JUMP TABLE

Publication number: 20200379792

Abstract: A processing unit employs microcode wherein the jump table associated with the microcode is embedded in the microcode itself. When the microcode is compiled based on a set of programmer instructions, the compiler prepares the jump table for the microcode and stores the jump table in the same file or other storage unit as the microcode. When the processing unit is initialized to execute a program, such as an operating system, the processing unit retrieves the microcode corresponding to the program from memory, stores the microcode in a cache or other memory module for execution, and automatically loads the embedded jump table from the microcode to a specified set of jump table registers, thereby preparing the processing unit to process received packets.
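
The idea of storing the jump table in the same file as the microcode can be sketched with a made-up binary layout. The format below (a 2-byte entry count, then 3-byte opcode/offset entries, then the code bytes) is purely an assumption for illustration, as are the names `build_image` and `load_image`; the application does not specify this layout.

```python
import struct

HEADER = struct.Struct("<H")    # number of jump-table entries
ENTRY = struct.Struct("<BH")    # opcode (1 byte), handler offset (2 bytes)


def build_image(handlers):
    """Compiler side: pack a jump table and handler code into one blob.

    handlers: dict mapping opcode -> bytes of handler code.
    """
    table = []
    code = b""
    for opcode, body in sorted(handlers.items()):
        table.append((opcode, len(code)))   # offset of this handler in the code
        code += body
    blob = HEADER.pack(len(table))
    for opcode, offset in table:
        blob += ENTRY.pack(opcode, offset)
    return blob + code


def load_image(blob):
    """Loader side: fill the jump-table 'registers' from the embedded table."""
    (count,) = HEADER.unpack_from(blob, 0)
    pos = HEADER.size
    registers = {}
    for _ in range(count):
        opcode, offset = ENTRY.unpack_from(blob, pos)
        registers[opcode] = offset
        pos += ENTRY.size
    return registers, blob[pos:]


image = build_image({0x10: b"\x01\x02", 0x20: b"\x03"})
registers, code = load_image(image)
print(registers, code)
```

Keeping the table next to the code means a single load step prepares both, so the table cannot drift out of sync with the microcode it indexes.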

Type: Application

Filed: May 31, 2019

Publication date: December 3, 2020

Inventors: Alexander Fuad ASHKAR, Rakan KHRAISHA, Rex Eldon MCCRARY, Harry J. WISE

Efficient State Management System

Publication number: 20090172677

Abstract: The present invention provides an efficient state management system for a complex ASIC, and applications thereof. In an embodiment, a computer-based system executes state-dependent processes. The computer-based system includes a command processor (CP) and a plurality of processing blocks. The CP receives commands in a command stream and manages a global state responsive to global context events in the command stream. The plurality of processing blocks receive the commands in the command stream and manage respective block states responsive to block context events in the command stream. Each respective processing block executes a process on data in a data stream based on the global state and the block state of the respective processing block.

Type: Application

Filed: December 22, 2008

Publication date: July 2, 2009

Applicant: Advanced Micro Devices, Inc.

Inventors: Michael MANTOR, Rex Eldon MCCRARY