Patents by Inventor Michael J. Mantor

Michael J. Mantor has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Graphics primitives and positions through memory buffers

Patent number: 12169896

Abstract: Systems, apparatuses, and methods for preemptively reserving buffer space for primitives and positions in a graphics pipeline are disclosed. A system includes a graphics pipeline frontend with any number of geometry engines coupled to corresponding shader engines. Each geometry engine launches shader wavefronts to execute on a corresponding shader engine. The geometry engine preemptively reserves buffer space for each wavefront prior to the wavefront being launched on the shader engine. When the shader engine executes a wavefront, the shader engine exports primitive and position data to the reserved buffer space. Multiple scan converters will consume the primitive and position data, with each scan converter consuming primitive and position data based on the screen coverage of the scan converter. After consuming the primitive and position data, the scan converters mark the buffer space as freed so that the geometry engine can then allocate the freed buffer space to subsequent shader wavefronts.

Type: Grant

Filed: September 29, 2021

Date of Patent: December 17, 2024

Assignee: Advanced Micro Devices, Inc.

Inventors: Todd Martin, Tad Robert Litwiller, Nishank Pathak, Randy Wayne Ramsey, Michael J. Mantor, Christopher J. Brennan, Mark M. Leather, Ryan James Cash
VMID as a GPU task container for virtualization

Patent number: 12153958

Abstract: Systems, apparatuses, and methods for abstracting tasks in virtual memory identifier (VMID) containers are disclosed. A processor coupled to a memory executes a plurality of concurrent tasks including a first task. Responsive to detecting one or more instructions of the first task which correspond to a first operation, the processor retrieves a first identifier (ID) which is used to uniquely identify the first task, wherein the first ID is transparent to the first task. Then, the processor maps the first ID to a second ID and/or a third ID. The processor completes the first operation by using the second ID and/or the third ID to identify the first task to at least a first data structure. In one implementation, the first operation is a memory access operation and the first data structure is a set of page tables. Also, in one implementation, the second ID identifies a first application of the first task and the third ID identifies a first operating system (OS) of the first task.

Type: Grant

Filed: October 7, 2022

Date of Patent: November 26, 2024

Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC

Inventors: Anirudh R. Acharya, Michael J. Mantor, Rex Eldon McCrary, Anthony Asaro, Jeffrey Gongxian Cheng, Mark Fowler
WAVE LEVEL MATRIX MULTIPLY INSTRUCTIONS

Publication number: 20240329998

Abstract: An apparatus and method for efficiently processing multiplication and accumulate operations for matrices in applications. In various implementations, a computing system includes a parallel data processing circuit and a memory. The memory stores the instructions (or translated commands) of a parallel data application. The circuitry of the parallel data processing circuit performs a matrix multiplication operation using source operands accessed only once from a vector register file and multiple instantiations of a vector processing circuit capable of performing multiple matrix multiplication operations corresponding to multiple different types of instructions. The multiplier circuit and the adder circuit of the vector processing circuit perform each of the fused multiply add (FMA) operation and the dot product (inner product) operation without independent, dedicated execution pipelines with one execution pipeline for the FMA operation and the other separate execution pipeline for the dot product operation.

Type: Application

Filed: March 28, 2024

Publication date: October 3, 2024

Inventors: Bin He, Michael J. Mantor, Brian D. Emberling
Stream processor with low power parallel matrix multiply pipeline

Patent number: 12067401

Abstract: Systems, apparatuses, and methods for implementing a low power parallel matrix multiply pipeline are disclosed. In one embodiment, a system includes at least first and second vector register files coupled to a matrix multiply pipeline. The matrix multiply pipeline comprises a plurality of dot product units. The dot product units are configured to calculate dot or outer products for first and second sets of operands retrieved from the first vector register file. The results of the dot or outer product operations are written back to the second vector register file. The second vector register file provides the results from the previous dot or outer product operations as inputs to subsequent dot or outer product operations. The dot product units receive the results from previous phases of the matrix multiply operation and accumulate these previous dot or outer product results with the current dot or outer product results.

Type: Grant

Filed: December 27, 2017

Date of Patent: August 20, 2024

Assignee: Advanced Micro Devices, Inc.

Inventors: Jiasheng Chen, Yunxiao Zou, Michael J. Mantor, Allen Rush
Inclusion of Dedicated Accelerators in Graph Nodes

Publication number: 20240202003

Abstract: Systems, apparatuses, and methods for implementing a hierarchical scheduling in fixed-function graphics pipeline are disclosed. In various implementations, a processor includes a pipeline comprising a plurality of fixed-function units and a scheduler. The scheduler is configured to schedule a first operation for execution by one or more fixed-function units of the pipeline by scheduling the first operation with a first unit of the pipeline, responsive to a first mode of operation and schedule a second operation for execution by a selected fixed-function unit of the pipeline by scheduling the second operation directly to the selected fixed-function unit, independent of a sequential arrangement of the one or more fixed-function units in the pipeline, responsive to a second mode of operation.

Type: Application

Filed: December 14, 2022

Publication date: June 20, 2024

Inventors: Matthäus G. Chajdas, Michael J. Mantor, Rex Eldon McCrary, Christopher J. Brennan, Robert Martin, Brian Kenneth Bennett
Work Graph Scheduler Implementation

Publication number: 20240111574

Abstract: Systems, apparatuses, and methods for implementing a hierarchical scheduler. In various implementations, a processor includes a global scheduler, and a plurality of independent local schedulers with each of the local schedulers coupled to a plurality of processors. In one implementation, the processor is a graphics processing unit and the processors are computation units. The processor further includes a shared cache that is shared by the plurality of local schedulers. Each of the local schedulers also includes a local cache used by the local scheduler and processors coupled to the local scheduler. To schedule work items for execution, the global scheduler is configured to store one or more work items in the shared cache and convey an indication to a first local scheduler of the plurality of local schedulers which causes the first local scheduler to retrieve the one or more work items from the shared cache.

Type: Application

Filed: September 29, 2022

Publication date: April 4, 2024

Inventors: Matthäus G. Chajdas, Michael J. Mantor, Rex Eldon McCrary, Christopher J. Brennan, Robert Martin, Dominik Baumeister, Fabian Robert Sebastian Wildgrube
Synchronization Method for Low Latency Communication for Efficient Scheduling

Publication number: 20240111575

Abstract: Systems, apparatuses, and methods for implementing a message passing system to schedule work in a computing system. In various implementations, a processor includes a global scheduler, and a plurality of local schedulers with each of the local schedulers coupled to a plurality of processors. The processor further includes a shared cache that is shared by the plurality of local schedulers. Also, a plurality of mailboxes are implemented to enable communication between the local schedulers and the global scheduler. To schedule work items for execution, the global scheduler is configured to store one or more work items in the shared cache and store an indication in a mailbox for a first local scheduler of the plurality of local schedulers. Responsive to detecting the message in the mailbox, the first local scheduler identifies a location of the one or more work items in the shared cache and retrieves them for scheduling locally.

Type: Application

Filed: September 29, 2022

Publication date: April 4, 2024

Inventors: Matthäus G. Chajdas, Michael J. Mantor, Rex Eldon McCrary, Christopher J. Brennan, Robert Martin, Dominik Baumeister, Fabian Robert Sebastian Wildgrube
Redundancy method and apparatus for shader column repair

Patent number: 11948223

Abstract: Methods and systems are described. A system includes a redundant shader pipe array that performs rendering calculations on data provided thereto and a shader pipe array that includes a plurality of shader pipes, each of which performs rendering calculations on data provided thereto. The system also includes a circuit that identifies a defective shader pipe of the plurality of shader pipes in the shader pipe array. In response to identifying the defective shader pipe, the circuit generates a signal. The system also includes a redundant shader switch. The redundant shader switch receives the generated signal, and, in response to receiving the generated signal, transfers the data for the defective shader pipe to the redundant shader pipe array.

Type: Grant

Filed: July 11, 2022

Date of Patent: April 2, 2024

Assignee: Advanced Micro Devices, Inc.

Inventors: Michael J. Mantor, Jeffrey T. Brady, Angel E. Socarras
Packed 16 bits instruction pipeline

Patent number: 11880683

Abstract: Systems, apparatuses, and methods for efficiently processing arithmetic operations are disclosed. A computing system includes a processor capable of executing single precision mathematical instructions on data sizes of M bits and half precision mathematical instructions on data sizes of N bits, which is less than M bits. At least two source operands with M bits indicated by a received instruction are read from a register file. If the instruction is a packed math instruction, at least a first source operand with a size of N bits less than M bits is selected from either a high portion or a low portion of one of the at least two source operands read from the register file. The instruction includes fields storing bits, each bit indicating the high portion or the low portion of a given source operand associated with a register identifier specified elsewhere in the instruction.

Type: Grant

Filed: October 31, 2017

Date of Patent: January 23, 2024

Assignee: Advanced micro devices, inc.

Inventors: Jiasheng Chen, Bin He, Yunxiao Zou, Michael J. Mantor, Radhakrishna Giduthuri, Eric J. Finger, Brian D. Emberling
Low power and low latency GPU coprocessor for persistent computing

Patent number: 11625807

Abstract: Systems, apparatuses, and methods for implementing a graphics processing unit (GPU) coprocessor are disclosed. The GPU coprocessor includes a SIMD unit with the ability to self-schedule sub-wave procedures based on input data flow events. A host processor sends messages targeting the GPU coprocessor to a queue. In response to detecting a first message in the queue, the GPU coprocessor schedules a first sub-task for execution. The GPU coprocessor includes an inter-lane crossbar and intra-lane biased indexing mechanism for a vector general purpose register (VGPR) file. The VGPR file is split into two files. The first VGPR file is a larger register file with one read port and one write port. The second VGPR file is a smaller register file with multiple read ports and one write port. The second VGPR introduces the ability to co-issue more than one instruction per clock cycle.

Type: Grant

Filed: February 22, 2021

Date of Patent: April 11, 2023

Assignee: Advanced Micro Devices, Inc.

Inventors: Jiasheng Chen, Timour Paltashev, Alexander Lyashevsky, Carl Kittredge Wakeland, Michael J. Mantor
GRAPHICS PRIMITIVES AND POSITIONS THROUGH MEMORY BUFFERS

Publication number: 20230097097

Abstract: Systems, apparatuses, and methods for preemptively reserving buffer space for primitives and positions in a graphics pipeline are disclosed. A system includes a graphics pipeline frontend with any number of geometry engines coupled to corresponding shader engines. Each geometry engine launches shader wavefronts to execute on a corresponding shader engine. The geometry engine preemptively reserves buffer space for each wavefront prior to the wavefront being launched on the shader engine. When the shader engine executes a wavefront, the shader engine exports primitive and position data to the reserved buffer space. Multiple scan converters will consume the primitive and position data, with each scan converter consuming primitive and position data based on the screen coverage of the scan converter. After consuming the primitive and position data, the scan converters mark the buffer space as freed so that the geometry engine can then allocate the freed buffer space to subsequent shader wavefronts.

Type: Application

Filed: September 29, 2021

Publication date: March 30, 2023

Inventors: Todd Martin, Tad Robert Litwiller, Nishank Pathak, Randy Wayne Ramsey, Michael J. Mantor, Christopher J. Brennan, Mark M. Leather, Ryan James Cash
VMID AS A GPU TASK CONTAINER FOR VIRTUALIZATION

Publication number: 20230055695

Abstract: Systems, apparatuses, and methods for abstracting tasks in virtual memory identifier (VMID) containers are disclosed. A processor coupled to a memory executes a plurality of concurrent tasks including a first task. Responsive to detecting one or more instructions of the first task which correspond to a first operation, the processor retrieves a first identifier (ID) which is used to uniquely identify the first task, wherein the first ID is transparent to the first task. Then, the processor maps the first ID to a second ID and/or a third ID. The processor completes the first operation by using the second ID and/or the third ID to identify the first task to at least a first data structure. In one implementation, the first operation is a memory access operation and the first data structure is a set of page tables. Also, in one implementation, the second ID identifies a first application of the first task and the third ID identifies a first operating system (OS) of the first task.

Type: Application

Filed: October 7, 2022

Publication date: February 23, 2023

Inventors: Anirudh R. Acharya, Michael J. Mantor, Rex Eldon McCrary, Anthony Asaro, Jeffrey Gongxian Cheng, Mark Fowler
REDUNDANCY METHOD AND APPARATUS FOR SHADER COLUMN REPAIR

Publication number: 20220343456

Abstract: Methods and systems are described. A system includes a redundant shader pipe array that performs rendering calculations on data provided thereto and a shader pipe array that includes a plurality of shader pipes, each of which performs rendering calculations on data provided thereto. The system also includes a circuit that identifies a defective shader pipe of the plurality of shader pipes in the shader pipe array. In response to identifying the defective shader pipe, the circuit generates a signal. The system also includes a redundant shader switch. The redundant shader switch receives the generated signal, and, in response to receiving the generated signal, transfers the data for the defective shader pipe to the redundant shader pipe array.

Type: Application

Filed: July 11, 2022

Publication date: October 27, 2022

Applicant: Advanced Micro Devices, Inc.

Inventors: Michael J. Mantor, Jeffrey T. Brady, Angel E. Socarras
VMID as a GPU task container for virtualization

Patent number: 11467870

Abstract: Systems, apparatuses, and methods for abstracting tasks in virtual memory identifier (VMID) containers are disclosed. A processor coupled to a memory executes a plurality of concurrent tasks including a first task. Responsive to detecting one or more instructions of the first task which correspond to a first operation, the processor retrieves a first identifier (ID) which is used to uniquely identify the first task, wherein the first ID is transparent to the first task. Then, the processor maps the first ID to a second ID and/or a third ID. The processor completes the first operation by using the second ID and/or the third ID to identify the first task to at least a first data structure. In one implementation, the first operation is a memory access operation and the first data structure is a set of page tables. Also, in one implementation, the second ID identifies a first application of the first task and the third ID identifies a first operating system (OS) of the first task.

Type: Grant

Filed: July 24, 2020

Date of Patent: October 11, 2022

Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC

Inventors: Anirudh R. Acharya, Michael J. Mantor, Rex Eldon McCrary, Anthony Asaro, Jeffrey Gongxian Cheng, Mark Fowler
Redundancy method and apparatus for shader column repair

Patent number: 11386520

Abstract: Methods and systems are described. A system includes a redundant shader pipe array that performs rendering calculations on data provided thereto and a shader pipe array that includes a plurality of shader pipes, each of which performs rendering calculations on data provided thereto. The system also includes a circuit that identifies a defective shader pipe of the plurality of shader pipes in the shader pipe array. In response to identifying the defective shader pipe, the circuit generates a signal. The system also includes a redundant shader switch. The redundant shader switch receives the generated signal, and, in response to receiving the generated signal, transfers the data for the defective shader pipe to the redundant shader pipe array.

Type: Grant

Filed: December 7, 2020

Date of Patent: July 12, 2022

Assignee: Advanced Micro Devices, Inc.

Inventors: Michael J. Mantor, Jeffrey T. Brady, Angel E. Socarras
LOW POWER AND LOW LATENCY GPU COPROCESSOR FOR PERSISTENT COMPUTING

Publication number: 20210201439

Abstract: Systems, apparatuses, and methods for implementing a graphics processing unit (GPU) coprocessor are disclosed. The GPU coprocessor includes a SIMD unit with the ability to self-schedule sub-wave procedures based on input data flow events. A host processor sends messages targeting the GPU coprocessor to a queue. In response to detecting a first message in the queue, the GPU coprocessor schedules a first sub-task for execution. The GPU coprocessor includes an inter-lane crossbar and intra-lane biased indexing mechanism for a vector general purpose register (VGPR) file. The VGPR file is split into two files. The first VGPR file is a larger register file with one read port and one write port. The second VGPR file is a smaller register file with multiple read ports and one write port. The second VGPR introduces the ability to co-issue more than one instruction per clock cycle.

Type: Application

Filed: February 22, 2021

Publication date: July 1, 2021

Inventors: Jiasheng Chen, Timour Paltashev, Alexander Lyashevsky, Carl Kittredge Wakeland, Michael J. Mantor
Stream processor with decoupled crossbar for cross lane operations

Patent number: 10970081

Abstract: Systems, apparatuses, and methods for implementing a decoupled crossbar for a stream processor are disclosed. In one embodiment, a system includes at least a multi-lane execution pipeline, a vector register file, and a crossbar. The system is configured to determine if a given instruction in an instruction stream requires a permutation on data operands retrieved from the vector register file. The system conveys the data operands to the multi-lane execution pipeline on a first path which includes the crossbar responsive to determining the given instruction requires a permutation on the data operands. The crossbar then performs the necessary permutation to route the data operands to the proper processing lanes. Otherwise, the system conveys the data operands to the multi-lane execution pipeline on a second path which bypasses the crossbar responsive to determining the given instruction does not require a permutation on the input operands.

Type: Grant

Filed: June 29, 2017

Date of Patent: April 6, 2021

Assignee: Advanced Micro Devices, Inc.

Inventors: Jiasheng Chen, Bin He, Mohammad Reza Hakami, Timothy Lottes, Justin David Smith, Michael J. Mantor, Derek Carson
REDUNDANCY METHOD AND APPARATUS FOR SHADER COLUMN REPAIR

Publication number: 20210090208

Abstract: Methods and systems are described. A system includes a redundant shader pipe array that performs rendering calculations on data provided thereto and a shader pipe array that includes a plurality of shader pipes, each of which performs rendering calculations on data provided thereto. The system also includes a circuit that identifies a defective shader pipe of the plurality of shader pipes in the shader pipe array. In response to identifying the defective shader pipe, the circuit generates a signal. The system also includes a redundant shader switch. The redundant shader switch receives the generated signal, and, in response to receiving the generated signal, transfers the data for the defective shader pipe to the redundant shader pipe array.

Type: Application

Filed: December 7, 2020

Publication date: March 25, 2021

Applicant: Advanced Micro Devices, Inc.

Inventors: Michael J. Mantor, Jeffrey T. Brady, Angel E. Socarras
Low power and low latency GPU coprocessor for persistent computing

Patent number: 10929944

Abstract: Systems, apparatuses, and methods for implementing a graphics processing unit (GPU) coprocessor are disclosed. The GPU coprocessor includes a SIMD unit with the ability to self-schedule sub-wave procedures based on input data flow events. A host processor sends messages targeting the GPU coprocessor to a queue. In response to detecting a first message in the queue, the GPU coprocessor schedules a first sub-task for execution. The GPU coprocessor includes an inter-lane crossbar and intra-lane biased indexing mechanism for a vector general purpose register (VGPR) file. The VGPR file is split into two files. The first VGPR file is a larger register file with one read port and one write port. The second VGPR file is a smaller register file with multiple read ports and one write port. The second VGPR introduces the ability to co-issue more than one instruction per clock cycle.

Type: Grant

Filed: November 23, 2016

Date of Patent: February 23, 2021

Assignee: Advanced Micro Devices, Inc.

Inventors: Jiasheng Chen, Timour Paltashev, Alexander Lyashevsky, Carl Kittredge Wakeland, Michael J. Mantor
VMID AS A GPU TASK CONTAINER FOR VIRTUALIZATION

Publication number: 20210011760

Abstract: Systems, apparatuses, and methods for abstracting tasks in virtual memory identifier (VMID) containers are disclosed. A processor coupled to a memory executes a plurality of concurrent tasks including a first task. Responsive to detecting one or more instructions of the first task which correspond to a first operation, the processor retrieves a first identifier (ID) which is used to uniquely identify the first task, wherein the first ID is transparent to the first task. Then, the processor maps the first ID to a second ID and/or a third ID. The processor completes the first operation by using the second ID and/or the third ID to identify the first task to at least a first data structure. In one implementation, the first operation is a memory access operation and the first data structure is a set of page tables. Also, in one implementation, the second ID identifies a first application of the first task and the third ID identifies a first operating system (OS) of the first task.

Type: Application

Filed: July 24, 2020

Publication date: January 14, 2021

Inventors: Anirudh R. Acharya, Michael J. Mantor, Rex Eldon McCrary, Anthony Asaro, Jeffrey Gongxian Cheng, Mark Fowler

1 2 3 4 next