Patents by Inventor Andrew M. Havlir

Andrew M. Havlir has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Software control techniques for graphics hardware that supports logical slots and reservation of graphics hardware based on a priority threshold

Patent number: 12175300

Abstract: Disclosed embodiments relate to software control of graphics hardware that supports logical slots. In some embodiments, a GPU includes circuitry that implements a plurality of logical slots and a set of graphics processor sub-units that each implement multiple distributed hardware slots. Control circuitry may determine mappings between logical slots and distributed hardware slots for different sets of graphics work. Various mapping aspects may be software-controlled. For example, software may specify one or more of the following: priority information for a set of graphics work, to retain the mapping after completion of the work, a distribution rule, a target group of sub-units, a sub-unit mask, a scheduling policy, to reclaim hardware slots from another logical slot, etc. Software may also query status of the work.

Type: Grant

Filed: August 11, 2021

Date of Patent: December 24, 2024

Assignee: Apple Inc.

Inventors: Andrew M. Havlir, Steven Fishwick, Melissa L. Velez
Compute Kernel Parsing with Limits in one or more Dimensions

Publication number: 20240345892

Abstract: Techniques are disclosed relating to dispatching compute work from a compute stream. In some embodiments, a graphics processor executes instructions of compute kernels. Workload parser circuitry may determine, for distribution to the graphics processor circuitry, a set of workgroups from a compute kernel that includes workgroups organized in multiple dimensions, including a first number of workgroups in a first dimension and a second number of workgroups in a second dimension. This may include determining multiple sub-kernels for the compute kernel, wherein a first sub-kernel includes, in the first dimension, a limited number of workgroups that is smaller than the first number of workgroups. The parser circuitry may iterate through workgroups in both the first and second dimensions to generate the set of workgroups, proceeding through the first sub-kernel before iterating through any of the other sub-kernels. Disclosed techniques may provide desirable shapes for batches of workgroups.

Type: Application

Filed: May 24, 2024

Publication date: October 17, 2024

Inventors: Andrew M. Havlir, Ajay Simha Modugala, Karl D. Mann
Logical slot to hardware slot mapping for graphics processors

Patent number: 12086644

Abstract: Disclosed techniques relate to work distribution in graphics processors. In some embodiments, an apparatus includes circuitry that implements a plurality of logical slots and a set of graphics processor sub-units that each implement multiple distributed hardware slots. The circuitry may determine different distribution rules for first and second sets of graphics work and map logical slots to distributed hardware slots based on the distribution rules. In various embodiments, disclosed techniques may advantageously distribute work efficiently across distributed shader processors for graphics kicks of various sizes.

Type: Grant

Filed: August 11, 2021

Date of Patent: September 10, 2024

Assignee: Apple Inc.

Inventors: Andrew M. Havlir, Steven Fishwick, David A. Gotwalt, Benjamin Bowman, Ralph C. Taylor, Melissa L. Velez, Mladen Wilder, Ali Rabbani Rankouhi, Fergus W. MacGarry
Compute kernel parsing with limits in one or more dimensions with iterating through workgroups in the one or more dimensions for execution

Patent number: 12020075

Abstract: Techniques are disclosed relating to dispatching compute work from a compute stream. In some embodiments, a graphics processor executes instructions of compute kernels. Workload parser circuitry may determine, for distribution to the graphics processor circuitry, a set of workgroups from a compute kernel that includes workgroups organized in multiple dimensions, including a first number of workgroups in a first dimension and a second number of workgroups in a second dimension. This may include determining multiple sub-kernels for the compute kernel, wherein a first sub-kernel includes, in the first dimension, a limited number of workgroups that is smaller than the first number of workgroups. The parser circuitry may iterate through workgroups in both the first and second dimensions to generate the set of workgroups, proceeding through the first sub-kernel before iterating through any of the other sub-kernels. Disclosed techniques may provide desirable shapes for batches of workgroups.

Type: Grant

Filed: September 11, 2020

Date of Patent: June 25, 2024

Assignee: Apple Inc.

Inventors: Andrew M. Havlir, Ajay Simha Modugala, Karl D. Mann
Instruction storage

Patent number: 11727530

Abstract: Techniques are disclosed relating to low-level instruction storage in a processing unit. In some embodiments, a graphics unit includes execution circuitry, decode circuitry, hazard circuitry, and caching circuitry. In some embodiments the execution circuitry is configured to execute clauses of graphics instructions. In some embodiments, the decode circuitry is configured to receive graphics instructions and a clause identifier for each received graphics instruction and to decode the received graphics instructions. In some embodiments, the caching circuitry includes a plurality of entries each configured to store a set of decoded instructions in the same clause. A given clause may be fetched and executed multiple times, e.g., for different SIMD groups, while stored in the caching circuitry.

Type: Grant

Filed: May 28, 2021

Date of Patent: August 15, 2023

Assignee: Apple Inc.

Inventors: Andrew M. Havlir, Dzung Q. Vu, Liang Kai Wang
Software Control Techniques for Graphics Hardware that Supports Logical Slots

Publication number: 20230051906

Abstract: Disclosed embodiments relate to software control of graphics hardware that supports logical slots. In some embodiments, a GPU includes circuitry that implements a plurality of logical slots and a set of graphics processor sub-units that each implement multiple distributed hardware slots. Control circuitry may determine mappings between logical slots and distributed hardware slots for different sets of graphics work. Various mapping aspects may be software-controlled. For example, software may specify one or more of the following: priority information for a set of graphics work, to retain the mapping after completion of the work, a distribution rule, a target group of sub-units, a sub-unit mask, a scheduling policy, to reclaim hardware slots from another logical slot, etc. Software may also query status of the work.

Type: Application

Filed: August 11, 2021

Publication date: February 16, 2023

Inventors: Andrew M. Havlir, Steven Fishwick, Melissa L. Velez
Logical Slot to Hardware Slot Mapping for Graphics Processors

Publication number: 20230050061

Abstract: Disclosed techniques relate to work distribution in graphics processors. In some embodiments, an apparatus includes circuitry that implements a plurality of logical slots and a set of graphics processor sub-units that each implement multiple distributed hardware slots. The circuitry may determine different distribution rules for first and second sets of graphics work and map logical slots to distributed hardware slots based on the distribution rules. In various embodiments, disclosed techniques may advantageously distribute work efficiently across distributed shader processors for graphics kicks of various sizes.

Type: Application

Filed: August 11, 2021

Publication date: February 16, 2023

Inventors: Andrew M. Havlir, Steven Fishwick, David A. Gotwalt, Benjamin Bowman, Ralph C. Taylor, Melissa L. Velez, Mladen Wilder, Ali Rabbani Rankouhi, Fergus W. MacGarry
Affinity-based Graphics Scheduling

Publication number: 20230047481

Abstract: Techniques are disclosed relating to affinity-based scheduling of graphics work. In disclosed embodiments, first and second groups of graphics processor sub-units may share respective first and second caches. Distribution circuitry may receive a software-specified set of graphics work and a software-indicated mapping of portions of the set of graphics work to groups of graphics processor sub-units. The distribution circuitry may assign subsets of the set of graphics work based on the mapping. This may improve cache efficiency, in some embodiments, by allowing graphics work that accesses the same memory areas to be assigned to the same group of sub-units that share a cache.

Type: Application

Filed: August 11, 2021

Publication date: February 16, 2023

Inventors: Andrew M. Havlir, Ajay Simha Modugala, Benjamin Bowman, Yunjun Zhang
Dynamic buffering control for compute work distribution

Patent number: 11500692

Abstract: Techniques are disclosed relating to dynamically adjusting buffering for distributing compute work in a graphics processor. In some embodiments, the graphics processor includes shader circuitry configured to process compute work from a compute kernel, multiple distributed workload parser circuits configured to send compute work to the shader circuitry, primary workload parser circuitry configured to send, via a communications fabric, compute work from the compute kernel to the distributed workload parser circuits, and buffer circuitry configured to buffer compute work received by one or more of the distributed workload parser circuits from the primary workload parser circuitry. In some embodiments, the graphics processor is configured to dynamically adjust a limit on the number of entries used in the buffer circuitry based on information indicating complexity of the compute kernel. This may advantageously maintain launch rates while reducing or avoiding workload imbalances, in some embodiments.

Type: Grant

Filed: September 15, 2020

Date of Patent: November 15, 2022

Assignee: Apple Inc.

Inventors: Andrew M. Havlir, Benjamin Bowman
Dynamic Buffering Control for Compute Work Distribution

Publication number: 20220083396

Abstract: Techniques are disclosed relating to dynamically adjusting buffering for distributing compute work in a graphics processor. In some embodiments, the graphics processor includes shader circuitry configured to process compute work from a compute kernel, multiple distributed workload parser circuits configured to send compute work to the shader circuitry, primary workload parser circuitry configured to send, via a communications fabric, compute work from the compute kernel to the distributed workload parser circuits, and buffer circuitry configured to buffer compute work received by one or more of the distributed workload parser circuits from the primary workload parser circuitry. In some embodiments, the graphics processor is configured to dynamically adjust a limit on the number of entries used in the buffer circuitry based on information indicating complexity of the compute kernel. This may advantageously maintain launch rates while reducing or avoiding workload imbalances, in some embodiments.

Type: Application

Filed: September 15, 2020

Publication date: March 17, 2022

Inventors: Andrew M. Havlir, Benjamin Bowman
Compute Kernel Parsing with Limits in one or more Dimensions

Publication number: 20220083377

Abstract: Techniques are disclosed relating to dispatching compute work from a compute stream. In some embodiments, a graphics processor executes instructions of compute kernels. Workload parser circuitry may determine, for distribution to the graphics processor circuitry, a set of workgroups from a compute kernel that includes workgroups organized in multiple dimensions, including a first number of workgroups in a first dimension and a second number of workgroups in a second dimension. This may include determining multiple sub-kernels for the compute kernel, wherein a first sub-kernel includes, in the first dimension, a limited number of workgroups that is smaller than the first number of workgroups. The parser circuitry may iterate through workgroups in both the first and second dimensions to generate the set of workgroups, proceeding through the first sub-kernel before iterating through any of the other sub-kernels. Disclosed techniques may provide desirable shapes for batches of workgroups.

Type: Application

Filed: September 11, 2020

Publication date: March 17, 2022

Inventors: Andrew M. Havlir, Ajay Simha Modugala, Karl D. Mann
Low latency fetch circuitry for compute kernels

Patent number: 11256510

Abstract: Techniques are disclosed relating to fetching items from a compute command stream that includes compute kernels. In some embodiments, stream fetch circuitry sequentially pre-fetches items from the stream and stores them in a buffer. In some embodiments, fetch parse circuitry iterate through items in the buffer using a fetch parse pointer to detect indirect-data-access items and/or redirect items in the buffer. The fetch parse circuitry may send detected indirect data accesses to indirect-fetch circuitry, which may buffer requests. In some embodiments, execute parse circuitry iterates through items in the buffer using an execute parse pointer (e.g., which may trail the fetch parse pointer) and outputs both item data from the buffer and indirect-fetch results from indirect-fetch circuitry for execution. In various embodiments, the disclosed techniques may reduce fetch latency for compute kernels.

Type: Grant

Filed: October 8, 2020

Date of Patent: February 22, 2022

Assignee: Apple Inc.

Inventors: Andrew M. Havlir, Jeffrey T. Brady
Completion signaling techniques in distributed processor

Patent number: 11250538

Abstract: Techniques are disclosed relating to tracking compute workgroup completions in a distributed processor. In some embodiments, an apparatus includes a plurality of shader processors configured to perform operations for compute workgroups included in compute kernels, a master workload parser circuit, a plurality of distributed workload parser circuits, and a communications fabric connected to the plurality of distributed workload parser circuits and the master workload parser circuit.

Type: Grant

Filed: March 9, 2020

Date of Patent: February 15, 2022

Assignee: Apple Inc.

Inventors: Andrew M. Havlir, Ajay Simha Modugala
Instruction Storage

Publication number: 20210358078

Abstract: Techniques are disclosed relating to low-level instruction storage in a processing unit. In some embodiments, a graphics unit includes execution circuitry, decode circuitry, hazard circuitry, and caching circuitry. In some embodiments the execution circuitry is configured to execute clauses of graphics instructions. In some embodiments, the decode circuitry is configured to receive graphics instructions and a clause identifier for each received graphics instruction and to decode the received graphics instructions. In some embodiments, the caching circuitry includes a plurality of entries each configured to store a set of decoded instructions in the same clause. A given clause may be fetched and executed multiple times, e.g., for different SIMD groups, while stored in the caching circuitry.

Type: Application

Filed: May 28, 2021

Publication date: November 18, 2021

Inventors: Andrew M. Havlir, Dzung Q. Vu, Liang Kai Wang
Completion Signaling Techniques in Distributed Processor

Publication number: 20210279832

Abstract: Techniques are disclosed relating to tracking compute workgroup completions in a distributed processor. In some embodiments, an apparatus includes a plurality of shader processors configured to perform operations for compute workgroups included in compute kernels, a master workload parser circuit, a plurality of distributed workload parser circuits, and a communications fabric connected to the plurality of distributed workload parser circuits and the master workload parser circuit.

Type: Application

Filed: March 9, 2020

Publication date: September 9, 2021

Inventors: Andrew M. Havlir, Ajay Simha Modugala
Dependency scheduling for control stream in parallel processor

Patent number: 11080101

Abstract: Techniques are disclosed relating to processing a control stream such as a compute control stream. In some embodiments, the control stream includes kernels and commands for multiple substreams. In some embodiments, multiple substream processors are each configured to: fetch and parse portions of the control stream corresponding to an assigned substream and, in response to a neighbor barrier command in the assigned substream that identifies another substream, communicate the identified other substream to a barrier clearing circuitry. In some embodiments, the barrier clearing circuitry is configured to determine whether to allow the assigned substream to proceed past the neighbor barrier command based on communication of a most-recently-completed command from a substream processor to which the other substream is assigned (e.g., based on whether the most-recently-completed command meets a command identifier communicated in the neighbor barrier command).

Type: Grant

Filed: March 22, 2019

Date of Patent: August 3, 2021

Assignee: Apple Inc.

Inventors: Andrew M. Havlir, Jason D. Carroll, Karl D. Mann
Instruction storage

Patent number: 11023997

Abstract: Techniques are disclosed relating to low-level instruction storage in a processing unit. In some embodiments, a graphics unit includes execution circuitry, decode circuitry, hazard circuitry, and caching circuitry. In some embodiments the execution circuitry is configured to execute clauses of graphics instructions. In some embodiments, the decode circuitry is configured to receive graphics instructions and a clause identifier for each received graphics instruction and to decode the received graphics instructions. In some embodiments, the caching circuitry includes a plurality of entries each configured to store a set of decoded instructions in the same clause. A given clause may be fetched and executed multiple times, e.g., for different SIMD groups, while stored in the caching circuitry.

Type: Grant

Filed: July 24, 2017

Date of Patent: June 1, 2021

Assignee: Apple Inc.

Inventors: Andrew M. Havlir, Dzung Q. Vu, Liang Kai Wang
Low Latency Fetch Circuitry for Compute Kernels

Publication number: 20210026638

Abstract: Techniques are disclosed relating to fetching items from a compute command stream that includes compute kernels. In some embodiments, stream fetch circuitry sequentially pre-fetches items from the stream and stores them in a buffer. In some embodiments, fetch parse circuitry iterate through items in the buffer using a fetch parse pointer to detect indirect-data-access items and/or redirect items in the buffer. The fetch parse circuitry may send detected indirect data accesses to indirect-fetch circuitry, which may buffer requests. In some embodiments, execute parse circuitry iterates through items in the buffer using an execute parse pointer (e.g., which may trail the fetch parse pointer) and outputs both item data from the buffer and indirect-fetch results from indirect-fetch circuitry for execution. In various embodiments, the disclosed techniques may reduce fetch latency for compute kernels.

Type: Application

Filed: October 8, 2020

Publication date: January 28, 2021

Inventors: Andrew M. Havlir, Jeffrey T. Brady
Techniques for context switching using distributed compute workload parsers

Patent number: 10901777

Abstract: Techniques are disclosed relating to context switching using distributed compute workload parsers. In some embodiments, an apparatus includes a plurality of shader units configured to perform operations for compute workgroups included in compute kernels, a plurality of distributed workload parser circuits each configured to dispatch workgroups to a respective set of the shader units, a communications fabric, and a master workload parser circuit configured to communicate with the distributed workload parser circuits via the communications fabric. In some embodiments, the master workload parser circuit maintains a first set of master state information that does not change for a compute kernel based on operations by the shader units and a second set of master state information that may be changed by operations specified by the kernel. In some embodiments, the master workload parser circuit performs a multi-phase state storage process in communications with the distributed workload parser circuits.

Type: Grant

Filed: September 26, 2018

Date of Patent: January 26, 2021

Assignee: Apple Inc.

Inventors: Andrew M. Havlir, Jeffrey T. Brady
Distributing compute work using distributed parser circuitry

Patent number: 10846131

Abstract: Techniques are disclosed relating to distributing work from compute kernels using distributed parser circuitry. In some embodiments, a master parser is configured to communicate with distributed parsers over a communications fabric. In some embodiments, the master parser generates batches of compute workgroups from a compute kernel and assigns batches to ones of the distributed workload parser circuits. In some embodiments, based on an indicated number of sequential workgroups to send to the same distributed workload parser, the master parser selects a distributed parser to receive a set of batches and avoids selecting a distributed workload parser whose queue would be filled by the batches. In some embodiments, this may avoid stalling for a kernel terminate command. In some embodiments, the master parser may adjust a batch size to avoid filling a distributed parser's queue. In some embodiments, the distributed parsers may use similar techniques when sending work to shader units.

Type: Grant

Filed: September 26, 2018

Date of Patent: November 24, 2020

Assignee: Apple Inc.

Inventors: Andrew M. Havlir, Jeffrey T. Brady, Jingfei Kong

1 2 3 4 next