Patents by Inventor Benjiman L. Goodman

Benjiman L. Goodman has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Fence enforcement techniques based on stall characteristics

Patent number: 11954492

Abstract: Techniques are disclosed relating to channel stalls or deactivations based on the latency of prior operations. In some embodiments, a processor includes a plurality of channel pipelines for a plurality of channels and a plurality of execution pipelines shared by the channel pipelines and configured to perform different types of operations provided by the channel pipelines. First scheduler circuitry may assign threads to channels and second scheduler circuitry may assign an operation from a given channel to a given execution pipeline based on decode of an operation for that channel. Dependency circuitry may, for a first operation that depends on a prior operation that uses one of the execution pipelines, determine, based on status information for the prior operation from the one of the execution pipelines, whether to stall the first operation or to deactivate a thread that includes the first operation from its assigned channel.

Type: Grant

Filed: November 10, 2022

Date of Patent: April 9, 2024

Assignee: Apple Inc.

Inventors: Benjiman L. Goodman, Dzung Q. Vu, Robert Kenney
Cache footprint management

Patent number: 11947462

Abstract: Techniques are disclosed relating to cache footprint management. In some embodiments, execution circuitry is configured to perform operations for instructions from multiple threads in parallel. Cache circuitry may store information operated on by threads executed by the execution circuitry. Scheduling circuitry may arbitrate among threads to schedule threads for execution by the execution circuitry. Tracking circuitry may determine one or more performance metrics for the cache circuitry. Control circuitry may, based on the one or more performance metrics meeting a threshold, reduce a limit on a number of threads considered for arbitration by the scheduling circuitry, to control a footprint of information stored by the cache circuitry. Disclosed techniques may advantageously reduce or avoid cache thrashing for certain processor workloads.

Type: Grant

Filed: March 3, 2022

Date of Patent: April 2, 2024

Assignee: Apple Inc.

Inventors: Yoong Chert Foo, Terence M. Potter, Donald R. DeSota, Benjiman L. Goodman, Aroun Demeure, Cheng Li, Winnie W. Yeung
Multi-stage Thread Scheduling

Publication number: 20240095065

Abstract: Techniques are disclosed relating to multi-stage thread scheduling. In some embodiments, processor circuitry includes multiple channel pipelines for multiple channels and multiple execution pipelines shared by the channel pipelines and configured to perform different types of operations provided by the channel pipelines. First scheduler circuitry may arbitrate among threads to assign threads to channels. Second scheduler circuitry may arbitrate among channels to assign an operation from a given channel to a given execution pipeline. The execution pipelines may provide backpressure information to the first scheduler circuitry based on execution status and the first scheduler circuitry may adjust priority of a thread for assignment to a channel based on the backpressure information. Disclosed techniques may reduce channel conflicts and starvation for execution resources.

Type: Application

Filed: November 10, 2022

Publication date: March 21, 2024

Inventors: Benjiman L. Goodman, Anjana Rajendran, Sheenam Jayaswal, Terence M. Potter, Yoong Chert Foo
Fence Enforcement Techniques based on Stall Characteristics

Publication number: 20240095035

Abstract: Techniques are disclosed relating to channel stalls or deactivations based on the latency of prior operations. In some embodiments, a processor includes a plurality of channel pipelines for a plurality of channels and a plurality of execution pipelines shared by the channel pipelines and configured to perform different types of operations provided by the channel pipelines. First scheduler circuitry may assign threads to channels and second scheduler circuitry may assign an operation from a given channel to a given execution pipeline based on decode of an operation for that channel. Dependency circuitry may, for a first operation that depends on a prior operation that uses one of the execution pipelines, determine, based on status information for the prior operation from the one of the execution pipelines, whether to stall the first operation or to deactivate a thread that includes the first operation from its assigned channel.

Type: Application

Filed: November 10, 2022

Publication date: March 21, 2024

Inventors: Benjiman L. Goodman, Dzung Q. Vu, Robert Kenney
Thread Channel Deactivation based on Instruction Cache Misses

Publication number: 20240095031

Abstract: Techniques are disclosed relating to instruction scheduling in the context of instruction cache misses. In some embodiments, first-stage scheduler circuitry is configured to assign threads to channels and second-stage scheduler circuitry is configured to assign an operation from a given channel to a given execution pipeline based on decode of an operation for that channel. In some embodiments, thread replacement circuitry is configured to, in response to an instruction cache miss for an operation of a first thread assigned to a first channel, deactivate the first thread from the first channel.

Type: Application

Filed: November 10, 2022

Publication date: March 21, 2024

Inventors: Justin Friesenhahn, Benjiman L. Goodman
Preemption Techniques for Memory-Backed Registers

Publication number: 20240095176

Abstract: Techniques are disclosed relating to thread preemption in the context of memory-backed registers. In some embodiments, a memory hierarchy includes one or more cache levels and one or more memory circuits. Execution circuitry may operate on operands in architectural registers to execute instructions of threads, where data for the architectural registers is stored and backed by the memory hierarchy. Control circuitry may, in response to a context switch indication for a given thread: flush and invalidate a set of architectural register data from a first cache level and store memory page information (e.g., a page catalog base address) associated with the set of architectural register data.

Type: Application

Filed: November 10, 2022

Publication date: March 21, 2024

Inventors: Benjiman L. Goodman, Yoong Chert Foo, Karl D. Mann, Terence M. Potter, Frank W. Liljeros, Jeffrey T. Brady
Private Memory Management using Utility Thread

Publication number: 20230385201

Abstract: Techniques are disclosed relating to private memory management using a mapping thread, which may be persistent. In some embodiments, a graphics processor is configured to generate a pool of private memory pages for a set of graphics work that includes multiple threads. The processor may maintain a translation table configured to map private memory addresses to virtual addresses based on identifiers of the threads. The processor may execute a mapping thread to receive a request to allocate a private memory page for a requesting thread, select a private memory page from the pool in response to the request, and map the selected page in the translation table for the requesting. The processor may then execute one or more instructions of the requesting thread to access a private memory space, wherein the execution includes translation of a private memory address to a virtual address based on the mapped page in the translation table.

Type: Application

Filed: July 31, 2023

Publication date: November 30, 2023

Inventors: Benjiman L. Goodman, Terence M. Potter, Anjana Rajendran, Mark I. Luffel, William V. Miller
Private memory management using utility thread

Patent number: 11714759

Abstract: Techniques are disclosed relating to private memory management using a mapping thread, which may be persistent. In some embodiments, a graphics processor is configured to generate a pool of private memory pages for a set of graphics work that includes multiple threads. The processor may maintain a translation table configured to map private memory addresses to virtual addresses based on identifiers of the threads. The processor may execute a mapping thread to receive a request to allocate a private memory page for a requesting thread, select a private memory page from the pool in response to the request, and map the selected page in the translation table for the requesting. The processor may then execute one or more instructions of the requesting thread to access a private memory space, wherein the execution includes translation of a private memory address to a virtual address based on the mapped page in the translation table.

Type: Grant

Filed: August 17, 2020

Date of Patent: August 1, 2023

Assignee: Apple Inc.

Inventors: Benjiman L. Goodman, Terence M. Potter, Anjana Rajendran, Mark I. Luffel, William V. Miller
Instruction-level context switch in SIMD processor

Patent number: 11360780

Abstract: Techniques are disclosed relating to context switching in a SIMD processor. In some embodiments, an apparatus includes pipeline circuitry configured to execute graphics instructions included in threads of a group of single-instruction multiple-data (SIMD) threads in a thread group. In some embodiments, context switch circuitry is configured to atomically: save, for the SIMD group, a program counter and information that indicates whether threads in the SIMD group are active using one or more context switch registers, set all threads to an active state for the SIMD group, and branch to handler code for the SIMD group. In some embodiments, the pipeline circuitry is configured to execute the handler code to save context information for the SIMD group and subsequently execute threads of another thread group. Disclosed techniques may allow instruction-level context switching even when some SIMD threads are non-active.

Type: Grant

Filed: January 22, 2020

Date of Patent: June 14, 2022

Assignee: Apple Inc.

Inventors: Benjiman L. Goodman, Terence M. Potter, Anjana Rajendran, Jeffrey T. Brady, Brian K. Reynolds, Jeffrey A. Lohman
Private Memory Management using Utility Thread

Publication number: 20220050790

Abstract: Techniques are disclosed relating to private memory management using a mapping thread, which may be persistent. In some embodiments, a graphics processor is configured to generate a pool of private memory pages for a set of graphics work that includes multiple threads. The processor may maintain a translation table configured to map private memory addresses to virtual addresses based on identifiers of the threads. The processor may execute a mapping thread to receive a request to allocate a private memory page for a requesting thread, select a private memory page from the pool in response to the request, and map the selected page in the translation table for the requesting. The processor may then execute one or more instructions of the requesting thread to access a private memory space, wherein the execution includes translation of a private memory address to a virtual address based on the mapped page in the translation table.

Type: Application

Filed: August 17, 2020

Publication date: February 17, 2022

Inventors: Benjiman L. Goodman, Terence M. Potter, Anjana Rajendran, Mark I. Luffel, William V. Miller
Thread-group-scoped gate instruction

Patent number: 11204774

Abstract: Techniques are disclosed relating to a thread-group-scoped gate instruction. In some embodiments, graphics processor circuitry is configured to execute, for multiple SIMD groups of a thread group, a graphics program that includes a gate instruction. During execution of the gate instruction for a first SIMD group, the processor accesses state information to determine that a threshold number of other SIMD groups in the thread group have not yet executed the gate instruction. Based on the determination, the processor executes a particular set of instructions of the graphics program for the first SIMD group (that is not executed by one or more other SIMD groups that reach the gate instruction after the first SIMD group). For example, the particular set of instructions may be a utility program that performs one or more operations for the entire thread group but is only executed by a subset of the SIMD groups.

Type: Grant

Filed: August 31, 2020

Date of Patent: December 21, 2021

Assignee: Apple Inc.

Inventors: Benjiman L. Goodman, Anjana Rajendran, Jeffrey A. Lohman, Terence M. Potter
Instruction-level Context Switch in SIMD Processor

Publication number: 20210224072

Abstract: Techniques are disclosed relating to context switching in a SIMD processor. In some embodiments, an apparatus includes pipeline circuitry configured to execute graphics instructions included in threads of a group of single-instruction multiple-data (SIMD) threads in a thread group. In some embodiments, context switch circuitry is configured to atomically: save, for the SIMD group, a program counter and information that indicates whether threads in the SIMD group are active using one or more context switch registers, set all threads to an active state for the SIMD group, and branch to handler code for the SIMD group. In some embodiments, the pipeline circuitry is configured to execute the handler code to save context information for the SIMD group and subsequently execute threads of another thread group. Disclosed techniques may allow instruction-level context switching even when some SIMD threads are non-active.

Type: Application

Filed: January 22, 2020

Publication date: July 22, 2021

Inventors: Benjiman L. Goodman, Terence M. Potter, Anjana Rajendran, Jeffrey T. Brady, Brian K. Reynolds, Jeffrey A. Lohman
GPU task scheduling

Patent number: 10902545

Abstract: Techniques are disclosed relating to scheduling tasks for graphics processing. In one embodiment, a graphics unit is configured to render a frame of graphics data using a plurality of pass groups and the frame of graphics data includes a plurality of frame portions. In this embodiment, the graphics unit includes scheduling circuitry configured to receive a plurality of tasks, maintain pass group information for each of the plurality of tasks, and maintain relative age information for the plurality of frame portions. In this embodiment, the scheduling circuitry is configured to select a task for execution based on the pass group information and the age information. In some embodiments, the scheduling circuitry is configured to select tasks from an oldest frame portion and current pass group before selecting other tasks. This scheduling approach may result in efficient execution of various different types of graphics workloads.

Type: Grant

Filed: December 17, 2014

Date of Patent: January 26, 2021

Assignee: Apple Inc.

Inventors: Robert D. Kenney, Benjiman L. Goodman, Terence M. Potter
Data alignment and formatting for graphics processing unit

Patent number: 10769746

Abstract: A data queuing and format apparatus is disclosed. A first selection circuit may be configured to selectively couple a first subset of data to a first plurality of data lines dependent upon control information, and a second selection circuit may be configured to selectively couple a second subset of data to a second plurality of data lines dependent upon the control information. A storage array may include multiple storage units, and each storage unit may be configured to receive data from one or more data lines of either the first or second plurality of data lines dependent upon the control information.

Type: Grant

Filed: September 25, 2014

Date of Patent: September 8, 2020

Assignee: Apple Inc.

Inventors: Liang Xia, Robert D. Kenney, Benjiman L. Goodman, Terence M. Potter
GPU resource priorities based on hardware utilization

Patent number: 10503546

Abstract: In general, techniques are disclosed for tracking and allocating graphics processor hardware over specified periods of time. More particularly, hardware sensors may be used to determine the utilization of graphics processor hardware after each of a number of specified intervals (referred to as “sample intervals”). The utilization values so captured may be combined after a first number of sample intervals (the combined interval referred to as an “epoch interval”) and used to determine a normalized utilization of the graphic processor's hardware resources. Normalized epoch utilization values have been adjusted to account for resources used by concurrently executing processes. In some embodiments, a lower priority process that obtains and fails to release resources that should be allocated to one or more higher priority processes may be detected, paused, and its hardware resources given to the higher priority processes.

Type: Grant

Filed: June 6, 2017

Date of Patent: December 10, 2019

Assignee: Apple Inc.

Inventors: Tatsuya Iwamoto, Kutty Banerjee, Benjiman L. Goodman, Terence M. Potter
Pipelining and Concurrency Techniques for Groups of Graphics Processing Work

Publication number: 20190244323

Abstract: Techniques are disclosed relating to processing groups of graphics work (which may be referred to as “kicks”) using a graphics processing pipeline. In some embodiments, a graphics processor includes multiple sets of configuration registers such that multiple kicks can be processed in the pipeline at the same time. In some embodiments, kicks are pipelined such that a subsequent kick ramps up use of hardware resources as a previous kick winds down. In some embodiments, the graphics processing may execute kicks concurrently and/or preemptively, e.g., based on a priority scheme. In some embodiments, the disclosed techniques may be used with pipelines that include front and back-end fixed function circuitry as well as shared programmable resources such as shader cores. In various embodiments, the disclosed techniques may improve overall performance and/or reduce latency for high-priority graphics tasks.

Type: Application

Filed: February 2, 2018

Publication date: August 8, 2019

Inventors: Benjiman L. Goodman, Christopher L. Spencer, Mark D. Earl, Robert S. Hartog, Timothy M. Kelley
Power saving with dynamic pulse insertion

Patent number: 10270434

Abstract: A method and apparatus for saving power in integrated circuits is disclosed. An IC includes functional circuit blocks which are not placed into a sleep mode when idle. A power management circuit may monitor the activity levels of the functional circuit blocks not placed into a sleep mode. When the power management circuit detects that an activity level of one of the non-sleep functional circuit blocks is less than a predefined threshold, it reduce the frequency of a clock signal provided thereto by scheduling only one pulse of a clock signal for every N pulses of the full frequency clock signal. The remaining N?1 pulses of the clock signal may be inhibited. If a high priority transaction inbound for the functional circuit block is detected, an inserted pulse of the clock signal may be provided to the functional unit irrespective of when a most recent regular pulse was provided.

Type: Grant

Filed: February 18, 2016

Date of Patent: April 23, 2019

Assignee: Apple Inc.

Inventors: James Wang, Benjiman L. Goodman, Liang-Kai Wang, Robert D. Kenney
GPU Resource Tracking

Publication number: 20180349146

Abstract: In general, techniques are disclosed for tracking and allocating graphics processor hardware over specified periods of time. More particularly, hardware sensors may be used to determine the utilization of graphics processor hardware after each of a number of specified intervals (referred to as “sample intervals”). The utilization values so captured may be combined after a first number of sample intervals (the combined interval referred to as an “epoch interval”) and used to determine a normalized utilization of the graphic processor's hardware resources. Normalized epoch utilization values have been adjusted to account for resources used by concurrently executing processes. In some embodiments, a lower priority process that obtains and fails to release resources that should be allocated to one or more higher priority processes may be detected, paused, and its hardware resources given to the higher priority processes.

Type: Application

Filed: June 6, 2017

Publication date: December 6, 2018

Inventors: Tatsuya Iwamoto, Kutty Banerjee, Benjiman L. Goodman, Terence M. Potter
PROCESSING CIRCUIT HARDWARE RESOURCE ALLOCATION SYSTEM

Publication number: 20180173560

Abstract: In various embodiments, hardware resources of a processing circuit may be allocated to a plurality of processes based on priorities of the processes. A hardware resource utilization sensor may detect a current utilization of the hardware resources by a process. A utilization accumulation circuit may determine a utilization of the hardware resources by the process over a particular amount of time. A target utilization of the hardware resources for the process may be determined based on the utilization of the hardware resources over the particular amount of time. A comparator circuit may compare the current utilization to the target utilization. A process priority adjustment circuit may adjust a priority of the process based on the comparison. Based on the adjusted priority, a different amount of hardware resources may be allocated to the processes.

Type: Application

Filed: December 21, 2016

Publication date: June 21, 2018

Inventors: Gokhan Avkarogullari, Terence M. Potter, Benjiman L. Goodman, Ralph C. Taylor, Kutty Banerjee
Texture state cache

Patent number: 9811875

Abstract: Techniques are disclosed relating to a cache configured to store state information for texture mapping. In one embodiment, a texture state cache includes a plurality of entries configured to store state information relating to one or more stored textures. In this embodiment, the texture state cache also includes texture processing circuitry configured to retrieve state information for one of the stored textures from one of the entries in the texture state cache and determine pixel attributes based on the texture and the retrieved state information. The state information may include texture state information and sampler state information, in some embodiments. The texture state cache may allow for reduced rending times and power consumption, in some embodiments.

Type: Grant

Filed: September 10, 2014

Date of Patent: November 7, 2017

Assignee: Apple Inc.

Inventors: Benjiman L. Goodman, Adam T. Moerschell, James S. Blomgren

1 2 3 4 5 … next