Patents by Inventor Yoong-Chert Foo

Yoong-Chert Foo has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250147803
    Abstract: A method of synchronizing a group of scheduled tasks within a parallel processing unit into a known state is described. The method uses a synchronization instruction in a scheduled task which, in response to decoding of the instruction, triggers an instruction decoder to place the scheduled task into a non-active state and forward the decoded synchronization instruction to an atomic ALU for execution. When the atomic ALU executes the decoded synchronization instruction, it performs an operation and a check on data assigned to the group ID of the scheduled task and, if the check is passed, all scheduled tasks having that group ID are removed from the non-active state.
    Type: Application
    Filed: January 12, 2025
    Publication date: May 8, 2025
    Inventors: Ollie Mower, Yoong-Chert Foo
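    A minimal Python sketch of the group-synchronization idea described above, modelled in software. The class names, the decrement-then-check operation and the group sizes are illustrative assumptions, not details taken from the patent.
    ```python
    # Software model of group synchronization via an atomic counter per group ID.
    class Task:
        def __init__(self, task_id, group_id):
            self.task_id = task_id
            self.group_id = group_id
            self.active = True

    class AtomicSyncUnit:
        """Models the atomic operation-and-check performed on per-group data."""
        def __init__(self, group_sizes):
            self.remaining = dict(group_sizes)           # tasks still to reach the sync point
            self.waiting = {g: [] for g in group_sizes}  # tasks parked in the non-active state

        def execute_sync(self, task):
            task.active = False                  # decoder has placed the task in a non-active state
            self.waiting[task.group_id].append(task)
            self.remaining[task.group_id] -= 1   # atomic operation ...
            if self.remaining[task.group_id] == 0:   # ... followed by the check
                for t in self.waiting[task.group_id]:
                    t.active = True              # check passed: release the whole group
                self.waiting[task.group_id].clear()

    unit = AtomicSyncUnit({0: 3})
    tasks = [Task(i, 0) for i in range(3)]
    for t in tasks:
        unit.execute_sync(t)
    assert all(t.active for t in tasks)  # all tasks released once the last one arrives
    ```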
  • Publication number: 20250124643
    Abstract: Techniques are disclosed relating to processor communications fabrics. In some embodiments, a processor includes multiple client circuits and fabric circuitry that includes at least first and second instances of a tile. The tile may include: client inputs configured to interface with client circuits, tile inputs configured to interface with one or more other tile instances, and communication resources that include internal links and outputs. The communication resources may be assignable to the client inputs and tile inputs and may include resources to satisfy all requests, in a given cycle, from one or more non-stallable channels supported by the apparatus. The communication resources may also include additional resources allocable, in a given cycle, to one or more stallable channels supported by the apparatus. Control circuitry may, in a given cycle, assign communication resources of a given tile instance to at least a portion of the client inputs and tile inputs for a next cycle, based on priority information.
    Type: Application
    Filed: December 20, 2024
    Publication date: April 17, 2025
    Inventors: Adam J. Smith, Sergio V. Tota, Christopher G. Martin, Yoong Chert Foo, Terence M. Potter, Max J. Batley
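    The tile arbitration described above can be illustrated with a toy single-cycle arbiter. The split between links reserved for non-stallable channels and shared links for stallable ones, and the priority ordering, are assumptions made for this sketch.
    ```python
    # Toy single-cycle arbiter: non-stallable requests always get reserved links,
    # stallable requests compete for the remaining links by priority.
    def arbitrate(requests, reserved_links, shared_links):
        """requests: list of (input_name, stallable, priority) tuples."""
        grants, stalled = {}, []
        free_reserved, free_shared = reserved_links, shared_links
        for name, stallable, _ in requests:          # non-stallable channels first
            if not stallable:
                assert free_reserved > 0, "reserved resources must cover non-stallable demand"
                grants[name] = "reserved-link"
                free_reserved -= 1
        stallable_reqs = sorted((r for r in requests if r[1]), key=lambda r: r[2], reverse=True)
        for name, _, _ in stallable_reqs:            # then stallable channels by priority
            if free_shared > 0:
                grants[name] = "shared-link"
                free_shared -= 1
            else:
                stalled.append(name)                 # stallable channels may simply wait
        return grants, stalled

    grants, stalled = arbitrate(
        [("client0", False, 0), ("client1", True, 2), ("tile_in0", True, 1)],
        reserved_links=1, shared_links=1)
    print(grants, stalled)   # client1 wins the shared link; tile_in0 stalls this cycle
    ```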
  • Patent number: 12265474
    Abstract: Techniques are disclosed relating to dynamically allocating and mapping private memory for requesting circuitry. Disclosed circuitry may receive a private address and translate the private address to a virtual address (which an MMU may then translate to a physical address to actually access a storage element). In some embodiments, private memory allocation circuitry is configured to generate page table information and map private memory pages for requests if the page table information is not already set up. In various embodiments, this may advantageously allow dynamic private memory allocation, e.g., to efficiently allocate memory for graphics shaders with different types of workloads. Disclosed caching techniques for page table information may improve performance relative to traditional techniques. Further, disclosed embodiments may facilitate memory consolidation across a device such as a graphics processor.
    Type: Grant
    Filed: October 19, 2023
    Date of Patent: April 1, 2025
    Assignee: Apple Inc.
    Inventors: Justin A. Hensley, Karl D. Mann, Yoong Chert Foo, Terence M. Potter, Frank W. Liljeros, Ralph C. Taylor
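    A minimal sketch of the lazy private-to-virtual translation described above. The page size, the free-page pool and the class name are hypothetical; a real MMU would further translate the returned virtual address to a physical one.
    ```python
    # Lazy page-table setup: a private page is mapped the first time it is touched.
    PAGE_SIZE = 4096

    class PrivateMemoryAllocator:
        def __init__(self, virtual_page_pool):
            self.free_pages = list(virtual_page_pool)  # free virtual page base addresses
            self.page_table = {}                       # private page number -> virtual page base

        def translate(self, private_addr):
            page_no, offset = divmod(private_addr, PAGE_SIZE)
            if page_no not in self.page_table:
                # Page table information not set up yet: map a page dynamically.
                self.page_table[page_no] = self.free_pages.pop(0)
            return self.page_table[page_no] + offset

    alloc = PrivateMemoryAllocator(virtual_page_pool=[0x10000, 0x20000])
    va = alloc.translate(0x1234)               # first touch of page 1 maps it on demand
    assert va == 0x10000 + 0x234
    assert alloc.translate(0x1238) == va + 4   # later accesses reuse the existing mapping
    ```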
  • Publication number: 20250094357
    Abstract: Techniques are disclosed relating to eviction control for cache lines that store register data. In some embodiments, memory hierarchy circuitry is configured to provide memory backing for register operand data in one or more cache circuits. Lock circuitry may control a first set of lock indicators for a set of registers for a first thread, including to assert one or more lock indicators for registers that are indicated, by decode circuitry, as being utilized by decoded instructions of the first thread. The lock circuitry may preserve register operand data in the one or more cache circuits, including to prevent eviction of a given cache line from a cache circuit based on an asserted lock indicator. The lock circuitry may clear the first set of lock indicators in response to a reset event. Disclosed techniques may advantageously retain relevant register information in the cache with limited control circuit area.
    Type: Application
    Filed: November 27, 2024
    Publication date: March 20, 2025
    Inventors: Jonathan M. Redshaw, Winnie W. Yeung, Benjiman L. Goodman, David K. Li, Zelin Zhang, Yoong Chert Foo
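    A toy model of the lock-indicator mechanism in the abstract above. Mapping one register per cache line, and treating the reset event as an explicit call, are simplifying assumptions for illustration.
    ```python
    # Lock bits asserted at decode time block eviction of register-backing lines.
    class RegisterBackingCache:
        def __init__(self):
            self.lines = {}    # register name -> cached value
            self.locked = set()

        def on_decode(self, registers_used):
            # Decode circuitry reports the registers upcoming instructions use;
            # assert a lock indicator for each so their lines cannot be evicted.
            self.locked.update(registers_used)

        def evict(self, register):
            if register in self.locked:
                return False            # eviction blocked by an asserted lock indicator
            self.lines.pop(register, None)
            return True

        def on_reset_event(self):
            self.locked.clear()         # reset event: clear the whole set of lock indicators

    cache = RegisterBackingCache()
    cache.lines["r3"] = 42
    cache.on_decode({"r3"})
    assert cache.evict("r3") is False   # protected while the lock is asserted
    cache.on_reset_event()
    assert cache.evict("r3") is True    # evictable again once the locks are cleared
    ```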
  • Publication number: 20250061657
    Abstract: A method and system for generating two- or three-dimensional computer graphics images using multisample antialiasing (MSAA) are provided, which enable memory bandwidth to be conserved. For each of one or more pixels, it is determined whether all of a plurality of sample areas of that pixel are located within a particular primitive. For those pixels where it is determined that all the sample areas of that pixel are located within that primitive, a value is stored in a multisample memory for a smaller number of the sample areas of that pixel than the total number of the sample areas of that pixel, and data is stored indicating that all the sample areas of that pixel are located within that primitive.
    Type: Application
    Filed: October 31, 2024
    Publication date: February 20, 2025
    Inventors: Yoong Chert Foo, Salil Sahasrabudhe, Andrew Davy
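    The bandwidth saving described above amounts to storing one value plus a coverage flag for fully covered pixels instead of one value per sample. A small Python sketch, with the sample count and the coverage test chosen arbitrarily:
    ```python
    # Fully covered pixels store a single value and a flag; partially covered
    # pixels fall back to one value per sample area.
    SAMPLES_PER_PIXEL = 4

    def resolve_pixel(sample_coverage, colour):
        """sample_coverage: one boolean per sample area of the pixel."""
        if all(sample_coverage):
            # Every sample lies inside the primitive: write one value and note
            # that it applies to all sample areas of this pixel.
            return {"all_covered": True, "values": [colour]}
        return {"all_covered": False,
                "values": [colour if covered else None for covered in sample_coverage]}

    full = resolve_pixel([True] * SAMPLES_PER_PIXEL, colour=0xFF00FF)
    partial = resolve_pixel([True, False, True, True], colour=0xFF00FF)
    print(len(full["values"]), len(partial["values"]))   # 1 entry stored versus 4
    ```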
  • Publication number: 20250061536
    Abstract: A SIMD processing unit processes a plurality of tasks which each include up to a predetermined maximum number of work items. The work items of a task are arranged for executing a common sequence of instructions on respective data items. The data items are arranged into blocks, with some of the blocks including at least one invalid data item. Work items which relate to invalid data items are invalid work items. The SIMD processing unit comprises a group of processing lanes configured to execute instructions of work items of a particular task over a plurality of processing cycles. A control module assembles work items into the tasks based on the validity of the work items, so that invalid work items of the particular task are temporally aligned across the processing lanes. In this way the number of wasted processing slots due to invalid work items may be reduced.
    Type: Application
    Filed: October 7, 2024
    Publication date: February 20, 2025
    Inventors: John Howson, Jonathan Redshaw, Yoong Chert Foo
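    A toy packing routine illustrating the idea of temporally aligning invalid work items across lanes. The lane count, cycles per task and the sort-based packing rule are assumptions for this sketch, not the control module's actual algorithm.
    ```python
    # Pack work items so invalid ones share rows (cycles), letting whole rows be skipped.
    LANES = 4    # processing lanes executing in parallel
    CYCLES = 2   # rows per task -> 8 work-item slots per task

    def pack_into_task(work_items):
        """work_items: list of booleans, True = valid, False = invalid."""
        ordered = sorted(work_items, reverse=True)   # valid items first, invalid grouped at the end
        rows = [ordered[i * LANES:(i + 1) * LANES] for i in range(CYCLES)]
        skippable_rows = sum(1 for row in rows if row and not any(row))
        return rows, skippable_rows

    items = [True, False, True, False, True, False, True, False]
    rows, skippable = pack_into_task(items)
    print(rows)       # the invalid items all land in the second row
    print(skippable)  # one whole cycle of invalid slots can be skipped
    ```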
  • Patent number: 12229593
    Abstract: A method of synchronizing a group of scheduled tasks within a parallel processing unit into a known state is described. The method uses a synchronization instruction in a scheduled task which, in response to decoding of the instruction, triggers an instruction decoder to place the scheduled task into a non-active state and forward the decoded synchronization instruction to an atomic ALU for execution. When the atomic ALU executes the decoded synchronization instruction, it performs an operation and a check on data assigned to the group ID of the scheduled task and, if the check is passed, all scheduled tasks having that group ID are removed from the non-active state.
    Type: Grant
    Filed: October 13, 2022
    Date of Patent: February 18, 2025
    Assignee: Imagination Technologies Limited
    Inventors: Ollie Mower, Yoong-Chert Foo
  • Publication number: 20250021340
    Abstract: Methods and parallel processing units for avoiding inter-pipeline data hazards identified at compile time. For each identified inter-pipeline data hazard, the primary instruction and secondary instruction(s) thereof are identified as such and are linked by a counter which is used to track that inter-pipeline data hazard. When a primary instruction is output by the instruction decoder for execution, the value of the counter associated therewith is adjusted to indicate that there is a hazard related to the primary instruction, and when the primary instruction has been resolved by one of multiple parallel processing pipelines, the value of the counter associated therewith is adjusted to indicate that the hazard related to the primary instruction has been resolved.
    Type: Application
    Filed: February 12, 2024
    Publication date: January 16, 2025
    Inventors: Luca Iuliano, Simon Nield, Yoong-Chert Foo, Ollie Mower
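    A minimal model of the counter-based hazard tracking described above. The counter IDs, the increment/decrement convention and the rule that a secondary instruction waits for a zero counter are illustrative assumptions.
    ```python
    # One counter per compile-time-identified hazard links a primary instruction
    # to its secondary instruction(s).
    class HazardCounters:
        def __init__(self, n_counters):
            self.counters = [0] * n_counters

        def primary_issued(self, counter_id):
            self.counters[counter_id] += 1   # decoder output a primary: hazard now pending

        def primary_resolved(self, counter_id):
            self.counters[counter_id] -= 1   # a pipeline reports the primary as resolved

        def secondary_may_issue(self, counter_id):
            return self.counters[counter_id] == 0   # secondary waits while a hazard is pending

    hz = HazardCounters(n_counters=4)
    hz.primary_issued(counter_id=2)
    assert not hz.secondary_may_issue(2)   # hazard outstanding
    hz.primary_resolved(counter_id=2)
    assert hz.secondary_may_issue(2)       # safe to issue the linked secondary
    ```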
  • Patent number: 12190151
    Abstract: Techniques are disclosed relating to multi-stage thread scheduling. In some embodiments, processor circuitry includes multiple channel pipelines for multiple channels and multiple execution pipelines shared by the channel pipelines and configured to perform different types of operations provided by the channel pipelines. First scheduler circuitry may arbitrate among threads to assign threads to channels. Second scheduler circuitry may arbitrate among channels to assign an operation from a given channel to a given execution pipeline. The execution pipelines may provide backpressure information to the first scheduler circuitry based on execution status and the first scheduler circuitry may adjust priority of a thread for assignment to a channel based on the backpressure information. Disclosed techniques may reduce channel conflicts and starvation for execution resources.
    Type: Grant
    Filed: November 10, 2022
    Date of Patent: January 7, 2025
    Assignee: Apple Inc.
    Inventors: Benjiman L. Goodman, Anjana Rajendran, Sheenam Jayaswal, Terence M. Potter, Yoong Chert Foo
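    A two-stage scheduling sketch in the spirit of the abstract above: threads are first picked for channels with backpressure lowering their priority, then channels are arbitrated onto execution pipelines. The priority arithmetic and data shapes are assumptions for illustration.
    ```python
    # Stage 1: threads -> channels, demoted by backpressure from their target pipeline.
    def stage1_pick_thread(threads, backpressure):
        """threads: list of (name, base_priority, pipeline); backpressure: pipeline -> level."""
        def effective_priority(t):
            name, prio, pipe = t
            return prio - backpressure.get(pipe, 0)   # congested pipelines push threads back
        return max(threads, key=effective_priority)

    # Stage 2: channels -> execution pipelines, skipping pipelines that are busy.
    def stage2_pick_channel(channels, pipe_busy):
        """channels: list of (channel_id, pipeline)."""
        for channel_id, pipe in channels:
            if not pipe_busy.get(pipe, False):
                return channel_id
        return None   # every target pipeline is busy this cycle

    threads = [("t0", 5, "int"), ("t1", 3, "fp")]
    backpressure = {"int": 4}                            # the integer pipeline reports congestion
    print(stage1_pick_thread(threads, backpressure))     # t1 wins despite a lower base priority

    channels = [(0, "int"), (1, "fp")]
    print(stage2_pick_channel(channels, {"int": True}))  # channel 1 is selected
    ```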
  • Patent number: 12182037
    Abstract: Techniques are disclosed relating to eviction control for cache lines that store register data. In some embodiments, memory hierarchy circuitry is configured to provide memory backing for register operand data in one or more cache circuits. Lock circuitry may control a first set of lock indicators for a set of registers for a first thread, including to assert one or more lock indicators for registers that are indicated, by decode circuitry, as being utilized by decoded instructions of the first thread. The lock circuitry may preserve register operand data in the one or more cache circuits, including to prevent eviction of a given cache line from a cache circuit based on an asserted lock indicator. The lock circuitry may clear the first set of lock indicators in response to a reset event. Disclosed techniques may advantageously retain relevant register information in the cache with limited control circuit area.
    Type: Grant
    Filed: February 23, 2023
    Date of Patent: December 31, 2024
    Assignee: Apple Inc.
    Inventors: Jonathan M. Redshaw, Winnie W. Yeung, Benjiman L. Goodman, David K. Li, Zelin Zhang, Yoong Chert Foo
  • Patent number: 12159347
    Abstract: A method and system for generating two- or three-dimensional computer graphics images using multisample antialiasing (MSAA) are provided, which enable memory bandwidth to be conserved. For each of one or more pixels, it is determined whether all of a plurality of sample areas of that pixel are located within a particular primitive. For those pixels where it is determined that all the sample areas of that pixel are located within that primitive, a value is stored in a multisample memory for a smaller number of the sample areas of that pixel than the total number of the sample areas of that pixel, and data is stored indicating that all the sample areas of that pixel are located within that primitive.
    Type: Grant
    Filed: June 28, 2022
    Date of Patent: December 3, 2024
    Assignee: Imagination Technologies Limited
    Inventors: Yoong Chert Foo, Salil Sahasrabudhe, Andrew Davy
  • Patent number: 12112396
    Abstract: A SIMD processing unit processes a plurality of tasks which each include up to a predetermined maximum number of work items. The work items of a task are arranged for executing a common sequence of instructions on respective data items. The data items are arranged into blocks, with some of the blocks including at least one invalid data item. Work items which relate to invalid data items are invalid work items. The SIMD processing unit comprises a group of processing lanes configured to execute instructions of work items of a particular task over a plurality of processing cycles. A control module assembles work items into the tasks based on the validity of the work items, so that invalid work items of the particular task are temporally aligned across the processing lanes. In this way the number of wasted processing slots due to invalid work items may be reduced.
    Type: Grant
    Filed: August 21, 2023
    Date of Patent: October 8, 2024
    Assignee: Imagination Technologies Limited
    Inventors: John Howson, Jonathan Redshaw, Yoong Chert Foo
  • Publication number: 20240311187
    Abstract: A method of scheduling instructions within a parallel processing unit is described. The method comprises decoding, in an instruction decoder, an instruction in a scheduled task in an active state, and checking, by an instruction controller, if an ALU targeted by the decoded instruction is a primary instruction pipeline. If the targeted ALU is a primary instruction pipeline, a list associated with the primary instruction pipeline is checked to determine whether the scheduled task is already included in the list. If the scheduled task is already included in the list, the decoded instruction is sent to the primary instruction pipeline.
    Type: Application
    Filed: May 28, 2024
    Publication date: September 19, 2024
    Inventors: Simon Nield, Yoong-Chert Foo, Adam de Grasse, Luca Iuliano
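    A small sketch of the per-pipeline task list check described above. The abstract only covers the case where the task is already on the list; adding the task when it is absent is an assumption made here to keep the example self-contained.
    ```python
    # Dispatch gate: instructions targeting a primary pipeline consult that
    # pipeline's list of scheduled tasks before being sent.
    class InstructionController:
        def __init__(self, primary_pipelines):
            self.primary_pipelines = set(primary_pipelines)
            self.pipeline_lists = {p: [] for p in primary_pipelines}

        def dispatch(self, task_id, target_alu, instruction):
            if target_alu not in self.primary_pipelines:
                return f"{instruction} -> {target_alu}"   # non-primary path, no list check
            task_list = self.pipeline_lists[target_alu]
            if task_id in task_list:
                # Task already tracked by this primary pipeline: send the instruction.
                return f"{instruction} -> {target_alu}"
            task_list.append(task_id)                     # assumed handling of the other case
            return f"{instruction} -> {target_alu} (task {task_id} added to list)"

    ctrl = InstructionController(primary_pipelines={"ALU0"})
    print(ctrl.dispatch(task_id=7, target_alu="ALU0", instruction="fmul"))
    print(ctrl.dispatch(task_id=7, target_alu="ALU0", instruction="fadd"))  # task already listed
    ```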
  • Publication number: 20240289282
    Abstract: Techniques are disclosed relating to eviction control for cache lines that store register data. In some embodiments, memory hierarchy circuitry is configured to provide memory backing for register operand data in one or more cache circuits. Lock circuitry may control a first set of lock indicators for a set of registers for a first thread, including to assert one or more lock indicators for registers that are indicated, by decode circuitry, as being utilized by decoded instructions of the first thread. The lock circuitry may preserve register operand data in the one or more cache circuits, including to prevent eviction of a given cache line from a cache circuit based on an asserted lock indicator. The lock circuitry may clear the first set of lock indicators in response to a reset event. Disclosed techniques may advantageously retain relevant register information in the cache with limited control circuit area.
    Type: Application
    Filed: February 23, 2023
    Publication date: August 29, 2024
    Inventors: Jonathan M. Redshaw, Winnie W. Yeung, Benjiman L. Goodman, David K. Li, Zelin Zhang, Yoong Chert Foo
  • Publication number: 20240241751
    Abstract: A SIMD microprocessor is configured to execute programs divided into discrete phases. A scheduler is provided for scheduling instructions, and a plurality of resources are provided for executing instructions issued by the scheduler, wherein the scheduler is configured to schedule each phase of the program only after receiving an indication that execution of the preceding phase of the program has been completed. By splitting programs into multiple phases and providing a scheduler that is able to determine whether execution of a phase has been completed, each phase can be separately scheduled and the results of preceding phases can be used to inform the scheduling of subsequent phases. In one example, different numbers of threads and/or different numbers of data instances per thread may be processed for different phases of the same program.
    Type: Application
    Filed: April 1, 2024
    Publication date: July 18, 2024
    Inventor: Yoong Chert Foo
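    A minimal sketch of phase-by-phase scheduling: the next phase is only dispatched after the previous phase signals completion, and per-phase thread counts can differ. The phase names and thread counts below are illustrative, not from the patent.
    ```python
    # Each phase is scheduled only after the completion indication for the previous one.
    def run_program(phases, execute_phase):
        """phases: list of (phase_name, thread_count)."""
        completed = []
        for name, thread_count in phases:
            done = execute_phase(name, thread_count)   # results could inform later scheduling
            if not done:
                raise RuntimeError(f"phase {name} did not signal completion")
            completed.append(name)
        return completed

    def execute_phase(name, thread_count):
        print(f"running phase {name} with {thread_count} threads")
        return True   # completion indication back to the scheduler

    # A later phase may process a different number of threads than an earlier one.
    print(run_program([("visibility", 64), ("shading", 16)], execute_phase))
    ```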
  • Patent number: 12020067
    Abstract: A method of scheduling instructions within a parallel processing unit is described. The method comprises decoding, in an instruction decoder, an instruction in a scheduled task in an active state, and checking, by an instruction controller, if an ALU targeted by the decoded instruction is a primary instruction pipeline. If the targeted ALU is a primary instruction pipeline, a list associated with the primary instruction pipeline is checked to determine whether the scheduled task is already included in the list. If the scheduled task is already included in the list, the decoded instruction is sent to the primary instruction pipeline.
    Type: Grant
    Filed: May 17, 2022
    Date of Patent: June 25, 2024
    Assignee: Imagination Technologies Limited
    Inventors: Simon Nield, Yoong-Chert Foo, Adam de Grasse, Luca Iuliano
  • Publication number: 20240160472
    Abstract: A method of activating scheduling instructions within a parallel processing unit includes checking if an ALU targeted by a decoded instruction is full by checking a value of an ALU work fullness counter stored in the instruction controller and associated with the targeted ALU. If the targeted ALU is not full, the decoded instruction is sent to the targeted ALU for execution and the ALU work fullness counter associated with the targeted ALU is updated. If, however, the targeted ALU is full, a scheduler is triggered to de-activate the scheduled task by changing the scheduled task from the active state to a non-active state. When an ALU changes from being full to not being full, the scheduler is triggered to re-activate an oldest scheduled task waiting for the ALU by removing the oldest scheduled task from the non-active state.
    Type: Application
    Filed: January 8, 2024
    Publication date: May 16, 2024
    Inventors: Simon Nield, Yoong-Chert Foo, Adam de Grasse, Luca Iuliano
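    A software model of the fullness-counter gating described above. The capacity value and the queue of parked tasks are illustrative assumptions.
    ```python
    from collections import deque

    class ALUGate:
        def __init__(self, capacity):
            self.capacity = capacity
            self.fullness = 0        # ALU work fullness counter held by the instruction controller
            self.waiting = deque()   # tasks de-activated because the ALU was full

        def try_dispatch(self, task_id):
            if self.fullness < self.capacity:
                self.fullness += 1           # instruction sent; counter updated
                return True
            self.waiting.append(task_id)     # ALU full: scheduler de-activates the task
            return False

        def on_work_retired(self):
            was_full = self.fullness == self.capacity
            self.fullness -= 1
            if was_full and self.waiting:
                return self.waiting.popleft()   # full -> not full: re-activate the oldest waiter
            return None

    gate = ALUGate(capacity=1)
    assert gate.try_dispatch(task_id=1) is True
    assert gate.try_dispatch(task_id=2) is False   # task 2 is parked in a non-active state
    assert gate.on_work_retired() == 2             # oldest waiting task is re-activated
    ```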
  • Patent number: 11947462
    Abstract: Techniques are disclosed relating to cache footprint management. In some embodiments, execution circuitry is configured to perform operations for instructions from multiple threads in parallel. Cache circuitry may store information operated on by threads executed by the execution circuitry. Scheduling circuitry may arbitrate among threads to schedule threads for execution by the execution circuitry. Tracking circuitry may determine one or more performance metrics for the cache circuitry. Control circuitry may, based on the one or more performance metrics meeting a threshold, reduce a limit on a number of threads considered for arbitration by the scheduling circuitry, to control a footprint of information stored by the cache circuitry. Disclosed techniques may advantageously reduce or avoid cache thrashing for certain processor workloads.
    Type: Grant
    Filed: March 3, 2022
    Date of Patent: April 2, 2024
    Assignee: Apple Inc.
    Inventors: Yoong Chert Foo, Terence M. Potter, Donald R. DeSota, Benjiman L. Goodman, Aroun Demeure, Cheng Li, Winnie W. Yeung
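    The footprint control in the abstract above can be sketched as a feedback rule on the thread limit. The miss-rate metric, thresholds and step sizes below are illustrative assumptions.
    ```python
    # Shrink the number of threads considered for arbitration when the cache
    # metric crosses a threshold; relax the limit once pressure drops.
    def adjust_thread_limit(miss_rate, current_limit,
                            high_water=0.30, low_water=0.10,
                            min_limit=4, max_limit=64):
        if miss_rate >= high_water and current_limit > min_limit:
            return current_limit // 2                     # reduce the cache footprint
        if miss_rate <= low_water and current_limit < max_limit:
            return min(current_limit * 2, max_limit)      # grow back when the cache behaves
        return current_limit

    limit = 64
    for observed_miss_rate in [0.35, 0.32, 0.08]:
        limit = adjust_thread_limit(observed_miss_rate, limit)
        print(observed_miss_rate, "->", limit)   # 32, then 16, then back up to 32
    ```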
  • Patent number: 11947999
    Abstract: A SIMD microprocessor is configured to execute programs divided into discrete phases. A scheduler is provided for scheduling instructions, and a plurality of resources are provided for executing instructions issued by the scheduler, wherein the scheduler is configured to schedule each phase of the program only after receiving an indication that execution of the preceding phase of the program has been completed. By splitting programs into multiple phases and providing a scheduler that is able to determine whether execution of a phase has been completed, each phase can be separately scheduled and the results of preceding phases can be used to inform the scheduling of subsequent phases. In one example, different numbers of threads and/or different numbers of data instances per thread may be processed for different phases of the same program.
    Type: Grant
    Filed: February 29, 2020
    Date of Patent: April 2, 2024
    Assignee: Imagination Technologies Limited
    Inventor: Yoong Chert Foo
  • Patent number: 11941742
    Abstract: Techniques are disclosed relating to processor communications fabrics. In some embodiments, a processor includes multiple client circuits and fabric circuitry that includes at least first and second instances of a tile. The tile may include: client inputs configured to interface with client circuits, tile inputs configured to interface with one or more other tile instances, and communication resources assignable to the client inputs and tile inputs. The communication resources may include: multiple internal links, client outputs configured to interface with client circuits, and tile outputs configured to interface with one or more other tile instances. Control circuitry may, in a given cycle, assign communication resources of a given tile instance to at least a portion of the client inputs and tile inputs for a next cycle, based on priority information. The control circuitry may update priority information based on assignment results over multiple cycles.
    Type: Grant
    Filed: June 23, 2022
    Date of Patent: March 26, 2024
    Assignee: Apple Inc.
    Inventors: Adam J. Smith, Sergio V. Tota, Christopher G. Martin, Yoong Chert Foo, Terence M. Potter, Max J. Batley