Patents by Inventor Benjiman L. Goodman

Benjiman L. Goodman has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11954492
    Abstract: Techniques are disclosed relating to channel stalls or deactivations based on the latency of prior operations. In some embodiments, a processor includes a plurality of channel pipelines for a plurality of channels and a plurality of execution pipelines shared by the channel pipelines and configured to perform different types of operations provided by the channel pipelines. First scheduler circuitry may assign threads to channels and second scheduler circuitry may assign an operation from a given channel to a given execution pipeline based on decode of an operation for that channel. Dependency circuitry may, for a first operation that depends on a prior operation that uses one of the execution pipelines, determine, based on status information for the prior operation from the one of the execution pipelines, whether to stall the first operation or to deactivate a thread that includes the first operation from its assigned channel.
    Type: Grant
    Filed: November 10, 2022
    Date of Patent: April 9, 2024
    Assignee: Apple Inc.
    Inventors: Benjiman L. Goodman, Dzung Q. Vu, Robert Kenney
  • Patent number: 11947462
    Abstract: Techniques are disclosed relating to cache footprint management. In some embodiments, execution circuitry is configured to perform operations for instructions from multiple threads in parallel. Cache circuitry may store information operated on by threads executed by the execution circuitry. Scheduling circuitry may arbitrate among threads to schedule threads for execution by the execution circuitry. Tracking circuitry may determine one or more performance metrics for the cache circuitry. Control circuitry may, based on the one or more performance metrics meeting a threshold, reduce a limit on a number of threads considered for arbitration by the scheduling circuitry, to control a footprint of information stored by the cache circuitry. Disclosed techniques may advantageously reduce or avoid cache thrashing for certain processor workloads.
    Type: Grant
    Filed: March 3, 2022
    Date of Patent: April 2, 2024
    Assignee: Apple Inc.
    Inventors: Yoong Chert Foo, Terence M. Potter, Donald R. DeSota, Benjiman L. Goodman, Aroun Demeure, Cheng Li, Winnie W. Yeung
  • Publication number: 20240095065
    Abstract: Techniques are disclosed relating to multi-stage thread scheduling. In some embodiments, processor circuitry includes multiple channel pipelines for multiple channels and multiple execution pipelines shared by the channel pipelines and configured to perform different types of operations provided by the channel pipelines. First scheduler circuitry may arbitrate among threads to assign threads to channels. Second scheduler circuitry may arbitrate among channels to assign an operation from a given channel to a given execution pipeline. The execution pipelines may provide backpressure information to the first scheduler circuitry based on execution status and the first scheduler circuitry may adjust priority of a thread for assignment to a channel based on the backpressure information. Disclosed techniques may reduce channel conflicts and starvation for execution resources.
    Type: Application
    Filed: November 10, 2022
    Publication date: March 21, 2024
    Inventors: Benjiman L. Goodman, Anjana Rajendran, Sheenam Jayaswal, Terence M. Potter, Yoong Chert Foo
  • Publication number: 20240095035
    Abstract: Techniques are disclosed relating to channel stalls or deactivations based on the latency of prior operations. In some embodiments, a processor includes a plurality of channel pipelines for a plurality of channels and a plurality of execution pipelines shared by the channel pipelines and configured to perform different types of operations provided by the channel pipelines. First scheduler circuitry may assign threads to channels and second scheduler circuitry may assign an operation from a given channel to a given execution pipeline based on decode of an operation for that channel. Dependency circuitry may, for a first operation that depends on a prior operation that uses one of the execution pipelines, determine, based on status information for the prior operation from the one of the execution pipelines, whether to stall the first operation or to deactivate a thread that includes the first operation from its assigned channel.
    Type: Application
    Filed: November 10, 2022
    Publication date: March 21, 2024
    Inventors: Benjiman L. Goodman, Dzung Q. Vu, Robert Kenney
  • Publication number: 20240095031
    Abstract: Techniques are disclosed relating to instruction scheduling in the context of instruction cache misses. In some embodiments, first-stage scheduler circuitry is configured to assign threads to channels and second-stage scheduler circuitry is configured to assign an operation from a given channel to a given execution pipeline based on decode of an operation for that channel. In some embodiments, thread replacement circuitry is configured to, in response to an instruction cache miss for an operation of a first thread assigned to a first channel, deactivate the first thread from the first channel.
    Type: Application
    Filed: November 10, 2022
    Publication date: March 21, 2024
    Inventors: Justin Friesenhahn, Benjiman L. Goodman
  • Publication number: 20240095176
    Abstract: Techniques are disclosed relating to thread preemption in the context of memory-backed registers. In some embodiments, a memory hierarchy includes one or more cache levels and one or more memory circuits. Execution circuitry may operate on operands in architectural registers to execute instructions of threads, where data for the architectural registers is stored and backed by the memory hierarchy. Control circuitry may, in response to a context switch indication for a given thread: flush and invalidate a set of architectural register data from a first cache level and store memory page information (e.g., a page catalog base address) associated with the set of architectural register data.
    Type: Application
    Filed: November 10, 2022
    Publication date: March 21, 2024
    Inventors: Benjiman L. Goodman, Yoong Chert Foo, Karl D. Mann, Terence M. Potter, Frank W. Liljeros, Jeffrey T. Brady
  • Publication number: 20230385201
    Abstract: Techniques are disclosed relating to private memory management using a mapping thread, which may be persistent. In some embodiments, a graphics processor is configured to generate a pool of private memory pages for a set of graphics work that includes multiple threads. The processor may maintain a translation table configured to map private memory addresses to virtual addresses based on identifiers of the threads. The processor may execute a mapping thread to receive a request to allocate a private memory page for a requesting thread, select a private memory page from the pool in response to the request, and map the selected page in the translation table for the requesting. The processor may then execute one or more instructions of the requesting thread to access a private memory space, wherein the execution includes translation of a private memory address to a virtual address based on the mapped page in the translation table.
    Type: Application
    Filed: July 31, 2023
    Publication date: November 30, 2023
    Inventors: Benjiman L. Goodman, Terence M. Potter, Anjana Rajendran, Mark I. Luffel, William V. Miller
  • Patent number: 11714759
    Abstract: Techniques are disclosed relating to private memory management using a mapping thread, which may be persistent. In some embodiments, a graphics processor is configured to generate a pool of private memory pages for a set of graphics work that includes multiple threads. The processor may maintain a translation table configured to map private memory addresses to virtual addresses based on identifiers of the threads. The processor may execute a mapping thread to receive a request to allocate a private memory page for a requesting thread, select a private memory page from the pool in response to the request, and map the selected page in the translation table for the requesting. The processor may then execute one or more instructions of the requesting thread to access a private memory space, wherein the execution includes translation of a private memory address to a virtual address based on the mapped page in the translation table.
    Type: Grant
    Filed: August 17, 2020
    Date of Patent: August 1, 2023
    Assignee: Apple Inc.
    Inventors: Benjiman L. Goodman, Terence M. Potter, Anjana Rajendran, Mark I. Luffel, William V. Miller
  • Patent number: 11360780
    Abstract: Techniques are disclosed relating to context switching in a SIMD processor. In some embodiments, an apparatus includes pipeline circuitry configured to execute graphics instructions included in threads of a group of single-instruction multiple-data (SIMD) threads in a thread group. In some embodiments, context switch circuitry is configured to atomically: save, for the SIMD group, a program counter and information that indicates whether threads in the SIMD group are active using one or more context switch registers, set all threads to an active state for the SIMD group, and branch to handler code for the SIMD group. In some embodiments, the pipeline circuitry is configured to execute the handler code to save context information for the SIMD group and subsequently execute threads of another thread group. Disclosed techniques may allow instruction-level context switching even when some SIMD threads are non-active.
    Type: Grant
    Filed: January 22, 2020
    Date of Patent: June 14, 2022
    Assignee: Apple Inc.
    Inventors: Benjiman L. Goodman, Terence M. Potter, Anjana Rajendran, Jeffrey T. Brady, Brian K. Reynolds, Jeffrey A. Lohman
  • Publication number: 20220050790
    Abstract: Techniques are disclosed relating to private memory management using a mapping thread, which may be persistent. In some embodiments, a graphics processor is configured to generate a pool of private memory pages for a set of graphics work that includes multiple threads. The processor may maintain a translation table configured to map private memory addresses to virtual addresses based on identifiers of the threads. The processor may execute a mapping thread to receive a request to allocate a private memory page for a requesting thread, select a private memory page from the pool in response to the request, and map the selected page in the translation table for the requesting. The processor may then execute one or more instructions of the requesting thread to access a private memory space, wherein the execution includes translation of a private memory address to a virtual address based on the mapped page in the translation table.
    Type: Application
    Filed: August 17, 2020
    Publication date: February 17, 2022
    Inventors: Benjiman L. Goodman, Terence M. Potter, Anjana Rajendran, Mark I. Luffel, William V. Miller
  • Patent number: 11204774
    Abstract: Techniques are disclosed relating to a thread-group-scoped gate instruction. In some embodiments, graphics processor circuitry is configured to execute, for multiple SIMD groups of a thread group, a graphics program that includes a gate instruction. During execution of the gate instruction for a first SIMD group, the processor accesses state information to determine that a threshold number of other SIMD groups in the thread group have not yet executed the gate instruction. Based on the determination, the processor executes a particular set of instructions of the graphics program for the first SIMD group (that is not executed by one or more other SIMD groups that reach the gate instruction after the first SIMD group). For example, the particular set of instructions may be a utility program that performs one or more operations for the entire thread group but is only executed by a subset of the SIMD groups.
    Type: Grant
    Filed: August 31, 2020
    Date of Patent: December 21, 2021
    Assignee: Apple Inc.
    Inventors: Benjiman L. Goodman, Anjana Rajendran, Jeffrey A. Lohman, Terence M. Potter
  • Publication number: 20210224072
    Abstract: Techniques are disclosed relating to context switching in a SIMD processor. In some embodiments, an apparatus includes pipeline circuitry configured to execute graphics instructions included in threads of a group of single-instruction multiple-data (SIMD) threads in a thread group. In some embodiments, context switch circuitry is configured to atomically: save, for the SIMD group, a program counter and information that indicates whether threads in the SIMD group are active using one or more context switch registers, set all threads to an active state for the SIMD group, and branch to handler code for the SIMD group. In some embodiments, the pipeline circuitry is configured to execute the handler code to save context information for the SIMD group and subsequently execute threads of another thread group. Disclosed techniques may allow instruction-level context switching even when some SIMD threads are non-active.
    Type: Application
    Filed: January 22, 2020
    Publication date: July 22, 2021
    Inventors: Benjiman L. Goodman, Terence M. Potter, Anjana Rajendran, Jeffrey T. Brady, Brian K. Reynolds, Jeffrey A. Lohman
  • Patent number: 10902545
    Abstract: Techniques are disclosed relating to scheduling tasks for graphics processing. In one embodiment, a graphics unit is configured to render a frame of graphics data using a plurality of pass groups and the frame of graphics data includes a plurality of frame portions. In this embodiment, the graphics unit includes scheduling circuitry configured to receive a plurality of tasks, maintain pass group information for each of the plurality of tasks, and maintain relative age information for the plurality of frame portions. In this embodiment, the scheduling circuitry is configured to select a task for execution based on the pass group information and the age information. In some embodiments, the scheduling circuitry is configured to select tasks from an oldest frame portion and current pass group before selecting other tasks. This scheduling approach may result in efficient execution of various different types of graphics workloads.
    Type: Grant
    Filed: December 17, 2014
    Date of Patent: January 26, 2021
    Assignee: Apple Inc.
    Inventors: Robert D. Kenney, Benjiman L. Goodman, Terence M. Potter
  • Patent number: 10769746
    Abstract: A data queuing and format apparatus is disclosed. A first selection circuit may be configured to selectively couple a first subset of data to a first plurality of data lines dependent upon control information, and a second selection circuit may be configured to selectively couple a second subset of data to a second plurality of data lines dependent upon the control information. A storage array may include multiple storage units, and each storage unit may be configured to receive data from one or more data lines of either the first or second plurality of data lines dependent upon the control information.
    Type: Grant
    Filed: September 25, 2014
    Date of Patent: September 8, 2020
    Assignee: Apple Inc.
    Inventors: Liang Xia, Robert D. Kenney, Benjiman L. Goodman, Terence M. Potter
  • Patent number: 10503546
    Abstract: In general, techniques are disclosed for tracking and allocating graphics processor hardware over specified periods of time. More particularly, hardware sensors may be used to determine the utilization of graphics processor hardware after each of a number of specified intervals (referred to as “sample intervals”). The utilization values so captured may be combined after a first number of sample intervals (the combined interval referred to as an “epoch interval”) and used to determine a normalized utilization of the graphic processor's hardware resources. Normalized epoch utilization values have been adjusted to account for resources used by concurrently executing processes. In some embodiments, a lower priority process that obtains and fails to release resources that should be allocated to one or more higher priority processes may be detected, paused, and its hardware resources given to the higher priority processes.
    Type: Grant
    Filed: June 6, 2017
    Date of Patent: December 10, 2019
    Assignee: Apple Inc.
    Inventors: Tatsuya Iwamoto, Kutty Banerjee, Benjiman L. Goodman, Terence M. Potter
  • Publication number: 20190244323
    Abstract: Techniques are disclosed relating to processing groups of graphics work (which may be referred to as “kicks”) using a graphics processing pipeline. In some embodiments, a graphics processor includes multiple sets of configuration registers such that multiple kicks can be processed in the pipeline at the same time. In some embodiments, kicks are pipelined such that a subsequent kick ramps up use of hardware resources as a previous kick winds down. In some embodiments, the graphics processing may execute kicks concurrently and/or preemptively, e.g., based on a priority scheme. In some embodiments, the disclosed techniques may be used with pipelines that include front and back-end fixed function circuitry as well as shared programmable resources such as shader cores. In various embodiments, the disclosed techniques may improve overall performance and/or reduce latency for high-priority graphics tasks.
    Type: Application
    Filed: February 2, 2018
    Publication date: August 8, 2019
    Inventors: Benjiman L. Goodman, Christopher L. Spencer, Mark D. Earl, Robert S. Hartog, Timothy M. Kelley
  • Patent number: 10270434
    Abstract: A method and apparatus for saving power in integrated circuits is disclosed. An IC includes functional circuit blocks which are not placed into a sleep mode when idle. A power management circuit may monitor the activity levels of the functional circuit blocks not placed into a sleep mode. When the power management circuit detects that an activity level of one of the non-sleep functional circuit blocks is less than a predefined threshold, it reduce the frequency of a clock signal provided thereto by scheduling only one pulse of a clock signal for every N pulses of the full frequency clock signal. The remaining N?1 pulses of the clock signal may be inhibited. If a high priority transaction inbound for the functional circuit block is detected, an inserted pulse of the clock signal may be provided to the functional unit irrespective of when a most recent regular pulse was provided.
    Type: Grant
    Filed: February 18, 2016
    Date of Patent: April 23, 2019
    Assignee: Apple Inc.
    Inventors: James Wang, Benjiman L. Goodman, Liang-Kai Wang, Robert D. Kenney
  • Publication number: 20180349146
    Abstract: In general, techniques are disclosed for tracking and allocating graphics processor hardware over specified periods of time. More particularly, hardware sensors may be used to determine the utilization of graphics processor hardware after each of a number of specified intervals (referred to as “sample intervals”). The utilization values so captured may be combined after a first number of sample intervals (the combined interval referred to as an “epoch interval”) and used to determine a normalized utilization of the graphic processor's hardware resources. Normalized epoch utilization values have been adjusted to account for resources used by concurrently executing processes. In some embodiments, a lower priority process that obtains and fails to release resources that should be allocated to one or more higher priority processes may be detected, paused, and its hardware resources given to the higher priority processes.
    Type: Application
    Filed: June 6, 2017
    Publication date: December 6, 2018
    Inventors: Tatsuya Iwamoto, Kutty Banerjee, Benjiman L. Goodman, Terence M. Potter
  • Publication number: 20180173560
    Abstract: In various embodiments, hardware resources of a processing circuit may be allocated to a plurality of processes based on priorities of the processes. A hardware resource utilization sensor may detect a current utilization of the hardware resources by a process. A utilization accumulation circuit may determine a utilization of the hardware resources by the process over a particular amount of time. A target utilization of the hardware resources for the process may be determined based on the utilization of the hardware resources over the particular amount of time. A comparator circuit may compare the current utilization to the target utilization. A process priority adjustment circuit may adjust a priority of the process based on the comparison. Based on the adjusted priority, a different amount of hardware resources may be allocated to the processes.
    Type: Application
    Filed: December 21, 2016
    Publication date: June 21, 2018
    Inventors: Gokhan Avkarogullari, Terence M. Potter, Benjiman L. Goodman, Ralph C. Taylor, Kutty Banerjee
  • Patent number: 9811875
    Abstract: Techniques are disclosed relating to a cache configured to store state information for texture mapping. In one embodiment, a texture state cache includes a plurality of entries configured to store state information relating to one or more stored textures. In this embodiment, the texture state cache also includes texture processing circuitry configured to retrieve state information for one of the stored textures from one of the entries in the texture state cache and determine pixel attributes based on the texture and the retrieved state information. The state information may include texture state information and sampler state information, in some embodiments. The texture state cache may allow for reduced rending times and power consumption, in some embodiments.
    Type: Grant
    Filed: September 10, 2014
    Date of Patent: November 7, 2017
    Assignee: Apple Inc.
    Inventors: Benjiman L. Goodman, Adam T. Moerschell, James S. Blomgren