Patents by Inventor Rex Eldon MCCRARY

Rex Eldon MCCRARY has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11144329
    Abstract: A processing unit employs microcode wherein the jump table associated with the microcode is embedded in the microcode itself. When the microcode is compiled based on a set of programmer instructions, the compiler prepares the jump table for the microcode and stores the jump table in the same file or other storage unit as the microcode. When the processing unit is initialized to execute a program, such as an operating system, the processing unit retrieves the microcode corresponding to the program from memory, stores the microcode in a cache or other memory module for execution, and automatically loads the embedded jump table from the microcode to a specified set of jump table registers, thereby preparing the processing unit to process received packets.
    Type: Grant
    Filed: May 31, 2019
    Date of Patent: October 12, 2021
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Alexander Fuad Ashkar, Rakan Khraisha, Rex Eldon McCrary, Harry J. Wise
  • Patent number: 11132204
    Abstract: A processing system includes a set of queues to store command buffers prior to execution in a corresponding plurality of pipelines. The processing system also includes one or more first doorbells and a second doorbell. The first doorbells map to one or more queues in the set of queues on a one-to-one basis. The second doorbell maps to a subset of the set of queues on a one-to-many basis. A doorbell monitor generates an interrupt in response to an empty queue in the subset becoming a non-empty queue. A scheduler polls the subset in response to the interrupt. The scheduler schedules a command buffer from the non-empty queue for execution or adds the command buffer to a pool for subsequent execution.
    Type: Grant
    Filed: December 19, 2019
    Date of Patent: September 28, 2021
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventor: Rex Eldon McCrary
  • Publication number: 20210272229
    Abstract: An apparatus such as a graphics processing unit (GPU) includes shader engines and front end (FE) circuits. Subsets of the FE circuits are configured to schedule commands for execution on corresponding subsets of the shader engines. The apparatus also includes a set of physical paths configured to convey information from the FE circuits to a memory via the shader engines. Subsets of the physical paths are allocated to the subsets of the FE circuits and the corresponding subsets of the shader engines. The apparatus further includes a scheduler configured to receive a reconfiguration request and modify the set of physical paths based on the reconfiguration request. In some cases, the reconfiguration request is provided by a central processing unit (CPU) that requests the modification based on characteristics of applications generating the commands.
    Type: Application
    Filed: February 28, 2020
    Publication date: September 2, 2021
    Inventor: Rex Eldon MCCRARY
  • Publication number: 20210272347
    Abstract: An apparatus such as a graphics processing unit (GPU) includes a set of shader engines and a set of front end (FE) circuits. Subsets of the set of FE circuits schedule geometry workloads for subsets of the set of shader engines based on a mapping. The apparatus also includes a set of physical paths that convey information from the set of FE circuits to a memory via the set of shader engines. Subsets of the set of physical paths are allocated to the subsets of the set of FE circuits and the subsets of the set of shader engines based on the mapping. The mapping determines information stored in a set of registers used to configure the apparatus. In some cases, the set of registers store information indicating a spatial partitioning of the set of physical paths.
    Type: Application
    Filed: February 28, 2020
    Publication date: September 2, 2021
    Inventor: Rex Eldon MCCRARY
  • Publication number: 20210191793
    Abstract: A processing unit such as a graphics processing unit (GPU) includes a set of queues that stores command buffers prior to execution in a corresponding plurality of pipelines. The processing unit also implements a kernel mode driver that allocates a first subset of the set of queues to a first application in response to receiving registration requests from the first application. The processing unit further includes a scheduler that schedules command buffers in the first subset of the set of queues for concurrent execution on a first subset of the set of pipelines. In some cases, an interrupt is generated in response to execution of a first command in a first command buffer in the first queue or the second queue. The interrupt includes an address indicating a location of a routine to be executed by a second subset of the plurality of pipelines.
    Type: Application
    Filed: December 19, 2019
    Publication date: June 24, 2021
    Inventor: Rex Eldon MCCRARY
  • Publication number: 20210191771
    Abstract: A first processing unit such as a graphics processing unit (GPU) includes pipelines that execute commands and a scheduler to schedule one or more first commands for execution by one or more of the pipelines. The one or more first commands are received from a user mode driver in a second processing unit such as a central processing unit (CPU). The scheduler schedules one or more second commands for execution in response to completing execution of the one or more first commands and without notifying the second processing unit. In some cases, the first processing unit includes a direct memory access (DMA) engine that writes blocks of information from the first processing unit to a memory. The one or more second commands program the DMA engine to write a block of information including results generated by executing the one or more first commands.
    Type: Application
    Filed: December 19, 2019
    Publication date: June 24, 2021
    Inventor: Rex Eldon MCCRARY
  • Publication number: 20210192672
    Abstract: A primary processing unit includes queues configured to store commands prior to execution in corresponding pipelines. The primary processing unit also includes a first table configured to store entries indicating dependencies between commands that are to be executed on different ones of a plurality of processing units that include the primary processing unit and one or more secondary processing units. The primary processing unit also includes a scheduler configured to release commands in response to resolution of the dependencies. In some cases, a first one of the secondary processing units schedules the first command for execution in response to resolution of a dependency on a second command executing in a second one of the secondary processing units. The second one of the secondary processing units notifies the primary processing unit in response to completing execution of the second command.
    Type: Application
    Filed: December 19, 2019
    Publication date: June 24, 2021
    Inventor: Rex Eldon MCCRARY
  • Publication number: 20210191730
    Abstract: A processing system includes a set of queues to store command buffers prior to execution in a corresponding plurality of pipelines. The processing system also includes one or more first doorbells and a second doorbell. The first doorbells map to one or more queues in the set of queues on a one-to-one basis. The second doorbell maps to a subset of the set of queues on a one-to-many basis. A doorbell monitor generates an interrupt in response to an empty queue in the subset becoming a non-empty queue. A scheduler polls the subset in response to the interrupt. The scheduler schedules a command buffer from the non-empty queue for execution or adds the command buffer to a pool for subsequent execution.
    Type: Application
    Filed: December 19, 2019
    Publication date: June 24, 2021
    Inventor: Rex Eldon MCCRARY
  • Patent number: 10955901
    Abstract: Systems, apparatuses, and methods for dynamically adjusting the power consumption of prefetch engines are disclosed. In one embodiment, a processor includes one or more prefetch engines, a draw completion engine, and a queue in between the one or more prefetch engines and the draw completion engine. If the number of packets stored in the queue is greater than a high watermark, then the processor reduces the power state of the prefetch engine(s). By decreasing the power state of the prefetch engine(s), power consumption is reduced. Additionally, this power consumption reduction is achieved without affecting performance, since the queue has a high occupancy and the draw completion engine can continue to read packets out of the queue. If the number of packets stored in the queue is less than a low watermark, then the processor increases the power state of the prefetch engine(s).
    Type: Grant
    Filed: September 29, 2017
    Date of Patent: March 23, 2021
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Alexander Fuad Ashkar, Angel E. Socarras, Rex Eldon McCrary
  • Publication number: 20210049729
    Abstract: A graphics processing unit (GPU) includes a plurality of programmable processing cores configured to process graphics primitives and corresponding data and a plurality of fixed-function hardware units. The plurality of processing cores and the plurality of fixed-function hardware units are configured to implement a configurable number of virtual pipelines to concurrently process different command flows. Each virtual pipeline includes a configurable number of fragments and an operational state of each virtual pipeline is specified by a different context. The configurable number of virtual pipelines can be modified from a first number to a second number that is different than the first number. An emulation of a fixed-function hardware unit can be instantiated on one or more of the graphics processing cores in response to detection of a bottleneck in a fixed-function hardware unit. One or more of the virtual pipelines can then be reconfigured to utilize the emulation instead of the fixed-function hardware unit.
    Type: Application
    Filed: May 21, 2020
    Publication date: February 18, 2021
    Inventors: Timour T. PALTASHEV, Michael MANTOR, Rex Eldon MCCRARY
  • Publication number: 20210011760
    Abstract: Systems, apparatuses, and methods for abstracting tasks in virtual memory identifier (VMID) containers are disclosed. A processor coupled to a memory executes a plurality of concurrent tasks including a first task. Responsive to detecting one or more instructions of the first task which correspond to a first operation, the processor retrieves a first identifier (ID) which is used to uniquely identify the first task, wherein the first ID is transparent to the first task. Then, the processor maps the first ID to a second ID and/or a third ID. The processor completes the first operation by using the second ID and/or the third ID to identify the first task to at least a first data structure. In one implementation, the first operation is a memory access operation and the first data structure is a set of page tables. Also, in one implementation, the second ID identifies a first application of the first task and the third ID identifies a first operating system (OS) of the first task.
    Type: Application
    Filed: July 24, 2020
    Publication date: January 14, 2021
    Inventors: Anirudh R. Acharya, Michael J. Mantor, Rex Eldon McCrary, Anthony Asaro, Jeffrey Gongxian Cheng, Mark Fowler
  • Publication number: 20200379792
    Abstract: A processing unit employs microcode wherein the jump table associated with the microcode is embedded in the microcode itself. When the microcode is compiled based on a set of programmer instructions, the compiler prepares the jump table for the microcode and stores the jump table in the same file or other storage unit as the microcode. When the processing unit is initialized to execute a program, such as an operating system, the processing unit retrieves the microcode corresponding to the program from memory, stores the microcode in a cache or other memory module for execution, and automatically loads the embedded jump table from the microcode to a specified set of jump table registers, thereby preparing the processing unit to process received packets.
    Type: Application
    Filed: May 31, 2019
    Publication date: December 3, 2020
    Inventors: Alexander Fuad ASHKAR, Rakan KHRAISHA, Rex Eldon MCCRARY, Harry J. WISE
  • Publication number: 20200379767
    Abstract: A method of context bouncing includes receiving, at a command processor of a graphics processing unit (GPU), a conditional execute packet providing a hash identifier corresponding to an encapsulated state. The encapsulated state includes one or more context state packets following the conditional execute packet. A command packet following the encapsulated state is executed based at least in part on determining whether the hash identifier of the encapsulated state matches one of a plurality of hash identifiers of active context states currently stored at the GPU.
    Type: Application
    Filed: May 30, 2019
    Publication date: December 3, 2020
    Inventors: Rex Eldon MCCRARY, Yi LUO, Harry J. WISE, Alexander Fuad ASHKAR, Michael MANTOR
  • Patent number: 10725822
    Abstract: Systems, apparatuses, and methods for abstracting tasks in virtual memory identifier (VMID) containers are disclosed. A processor coupled to a memory executes a plurality of concurrent tasks including a first task. Responsive to detecting one or more instructions of the first task which correspond to a first operation, the processor retrieves a first identifier (ID) which is used to uniquely identify the first task, wherein the first ID is transparent to the first task. Then, the processor maps the first ID to a second ID and/or a third ID. The processor completes the first operation by using the second ID and/or the third ID to identify the first task to at least a first data structure. In one implementation, the first operation is a memory access operation and the first data structure is a set of page tables. Also, in one implementation, the second ID identifies a first application of the first task and the third ID identifies a first operating system (OS) of the first task.
    Type: Grant
    Filed: July 31, 2018
    Date of Patent: July 28, 2020
    Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Anirudh R. Acharya, Michael J. Mantor, Rex Eldon McCrary, Anthony Asaro, Jeffrey Gongxian Cheng, Mark Fowler
  • Patent number: 10664942
    Abstract: A graphics processing unit (GPU) includes a plurality of programmable processing cores configured to process graphics primitives and corresponding data and a plurality of fixed-function hardware units. The plurality of processing cores and the plurality of fixed-function hardware units are configured to implement a configurable number of virtual pipelines to concurrently process different command flows. Each virtual pipeline includes a configurable number of fragments and an operational state of each virtual pipeline is specified by a different context. The configurable number of virtual pipelines can be modified from a first number to a second number that is different than the first number. An emulation of a fixed-function hardware unit can be instantiated on one or more of the graphics processing cores in response to detection of a bottleneck in a fixed-function hardware unit. One or more of the virtual pipelines can then be reconfigured to utilize the emulation instead of the fixed-function hardware unit.
    Type: Grant
    Filed: October 21, 2016
    Date of Patent: May 26, 2020
    Assignee: ADVANCED MICRO DEVICES, INC.
    Inventors: Timour T. Paltashev, Michael Mantor, Rex Eldon McCrary
  • Patent number: 10558489
    Abstract: Systems, apparatuses, and methods for suspending and restoring operations on a processor are disclosed. In one embodiment, a processor includes at least a control unit, multiple execution units, and multiple work creation units. In response to detecting a request to suspend a software application executing on the processor, the control unit sends requests to the plurality of work creation units to stop creating new work. The control unit waits until receiving acknowledgements from the work creation units prior to initiating a suspend operation. Once all work creation units have acknowledged that they have stopped creating new work, the control unit initiates the suspend operation. Also, when a restore operation is initiated, the control unit prevents any work creation units from launching new work-items until all previously in-flight work-items have been restored to the same work creation units and execution units to which they were previously allocated.
    Type: Grant
    Filed: February 21, 2017
    Date of Patent: February 11, 2020
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Alexander Fuad Ashkar, Michael J. Mantor, Randy Wayne Ramsey, Rex Eldon McCrary, Harry J. Wise
  • Publication number: 20200042348
    Abstract: Systems, apparatuses, and methods for abstracting tasks in virtual memory identifier (VMID) containers are disclosed. A processor coupled to a memory executes a plurality of concurrent tasks including a first task. Responsive to detecting one or more instructions of the first task which correspond to a first operation, the processor retrieves a first identifier (ID) which is used to uniquely identify the first task, wherein the first ID is transparent to the first task. Then, the processor maps the first ID to a second ID and/or a third ID. The processor completes the first operation by using the second ID and/or the third ID to identify the first task to at least a first data structure. In one implementation, the first operation is a memory access operation and the first data structure is a set of page tables. Also, in one implementation, the second ID identifies a first application of the first task and the third ID identifies a first operating system (OS) of the first task.
    Type: Application
    Filed: July 31, 2018
    Publication date: February 6, 2020
    Inventors: Anirudh R. Acharya, Michael J. Mantor, Rex Eldon McCrary, Anthony Asaro, Jeffrey Gongxian Cheng, Mark Fowler
  • Publication number: 20190129754
    Abstract: A system and method for providing security of sensitive information within chips using SIMD micro-architecture are described. A command processor within a parallel data processing unit, such as a graphics processing unit (GPU), schedules commands across multiple compute units based on state information. When the command processor determines a rescheduling condition is satisfied, it causes the overwriting of at least a portion of data stored in each of the one or more local memories used by the multiple compute units. The command processor also stores in the secure memory a copy of state information associated with a given group of commands and later checks it to ensure corruption by a malicious or careless program is prevented.
    Type: Application
    Filed: August 27, 2018
    Publication date: May 2, 2019
    Inventor: Rex Eldon McCrary
  • Publication number: 20190101973
    Abstract: Systems, apparatuses, and methods for dynamically adjusting the power consumption of prefetch engines are disclosed. In one embodiment, a processor includes one or more prefetch engines, a draw completion engine, and a queue in between the one or more prefetch engines and the draw completion engine. If the number of packets stored in the queue is greater than a high watermark, then the processor reduces the power state of the prefetch engine(s). By decreasing the power state of the prefetch engine(s), power consumption is reduced. Additionally, this power consumption reduction is achieved without affecting performance, since the queue has a high occupancy and the draw completion engine can continue to read packets out of the queue. If the number of packets stored in the queue is less than a low watermark, then the processor increases the power state of the prefetch engine(s).
    Type: Application
    Filed: September 29, 2017
    Publication date: April 4, 2019
    Inventors: Alexander Fuad Ashkar, Angel E. Socarras, Rex Eldon McCrary
  • Patent number: 10198849
    Abstract: Systems, apparatuses, and methods for preloading caches using a direct memory access (DMA) engine with a fast discard mode are disclosed. In one embodiment, a processor includes one or more compute units, a DMA engine, and one or more caches. When a shader program is detected in a sequence of instructions, the DMA engine is programmed to utilize a fast discard mode to prefetch the shader program from memory. By prefetching the shader program from memory, the one or more caches are populated with address translations and the shader program. Then, the DMA engine discards the shader program rather than writing the shader program to another location. Accordingly, when the shader program is invoked on the compute unit(s), the shader program and its translations are already preloaded in the cache(s).
    Type: Grant
    Filed: October 26, 2017
    Date of Patent: February 5, 2019
    Assignee: Advanced Micro Devices, Inc.
    Inventors: Alexander Fuad Ashkar, Rex Eldon McCrary, Harry J. Wise
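
Illustrative sketch (not taken from any of the filings): the abstract for patent 10955901 describes lowering the prefetch engines' power state when the queue between them and the draw completion engine holds more packets than a high watermark, and raising it when the queue holds fewer than a low watermark. The Python below is a minimal model of that rule under stated assumptions. The class name, the integer power levels, and the one-level-per-update step are hypothetical; the abstract does not specify how many power states exist or how far each adjustment moves.

```python
from dataclasses import dataclass


@dataclass
class PrefetchPowerController:
    """Toy model of watermark-based power control (names are hypothetical)."""

    low_watermark: int
    high_watermark: int
    power_state: int = 2  # assumption: larger number means higher power
    min_state: int = 0    # assumption: lowest available power state
    max_state: int = 3    # assumption: highest available power state

    def update(self, queue_occupancy: int) -> int:
        """Adjust the power state from the current number of queued packets."""
        if queue_occupancy > self.high_watermark:
            # Queue is well stocked, so the consumer will not starve:
            # step the prefetch engines down one level (assumed step size).
            self.power_state = max(self.min_state, self.power_state - 1)
        elif queue_occupancy < self.low_watermark:
            # Queue is running dry: step the prefetch engines up one level.
            self.power_state = min(self.max_state, self.power_state + 1)
        # Between the two watermarks the state is left unchanged.
        return self.power_state


if __name__ == "__main__":
    controller = PrefetchPowerController(low_watermark=4, high_watermark=12)
    for occupancy in (13, 8, 3, 0):
        state = controller.update(occupancy)
        print(f"occupancy={occupancy} -> power_state={state}")
```

Keeping the two thresholds apart leaves a band in which the power state does not change, so small fluctuations in queue occupancy do not cause repeated switching.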