Patents by Inventor Yoong-Chert Foo

Yoong-Chert Foo has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11947462
    Abstract: Techniques are disclosed relating to cache footprint management. In some embodiments, execution circuitry is configured to perform operations for instructions from multiple threads in parallel. Cache circuitry may store information operated on by threads executed by the execution circuitry. Scheduling circuitry may arbitrate among threads to schedule threads for execution by the execution circuitry. Tracking circuitry may determine one or more performance metrics for the cache circuitry. Control circuitry may, based on the one or more performance metrics meeting a threshold, reduce a limit on a number of threads considered for arbitration by the scheduling circuitry, to control a footprint of information stored by the cache circuitry. Disclosed techniques may advantageously reduce or avoid cache thrashing for certain processor workloads.
    Type: Grant
    Filed: March 3, 2022
    Date of Patent: April 2, 2024
    Assignee: Apple Inc.
    Inventors: Yoong Chert Foo, Terence M. Potter, Donald R. DeSota, Benjiman L. Goodman, Aroun Demeure, Cheng Li, Winnie W. Yeung
  • Patent number: 11947999
    Abstract: A SIMD microprocessor is configured to execute programs divided into discrete phases. A scheduler is provided for scheduling instructions. A plurality of resources are for executing instructions issued by the scheduler, wherein the scheduler is configured to schedule each phase of the program only after receiving an indication that execution of the preceding phase of the program has been completed. By splitting programs into multiple phases and providing a scheduler that is able to determine whether execution of a phase has been completed, each phase can be separately scheduled and the results of preceding phases can be used to inform the scheduling of subsequent phases. In one example, different numbers of threads and/or different numbers of data instances per thread may be processed for different phases of the same program.
    Type: Grant
    Filed: February 29, 2020
    Date of Patent: April 2, 2024
    Assignee: Imagination Technologies Limited
    Inventor: Yoong Chert Foo
  • Patent number: 11941742
    Abstract: Techniques are disclosed relating to processor communications fabrics. In some embodiments, a processor includes multiple client circuits and fabric circuitry that includes at least first and second instances of a tile. The tile may include: client inputs configured to interface with client circuits, tile inputs configured to interface with one or more other tile instances, and communication resources assignable to the client inputs and tile inputs. The communication resources may include: multiple internal links, client outputs configured to interface with client circuits, and tile outputs configured to interface with one or more other tile instances. Control circuitry may, in a given cycle, assign communication resources of a given tile instance to at least a portion of the client inputs and tile inputs for a next cycle, based on priority information. The control circuitry may update priority information based on assignment results over multiple cycles.
    Type: Grant
    Filed: June 23, 2022
    Date of Patent: March 26, 2024
    Assignee: Apple Inc.
    Inventors: Adam J. Smith, Sergio V. Tota, Christopher G. Martin, Yoong Chert Foo, Terence M. Potter, Max J. Batley
  • Publication number: 20240095176
    Abstract: Techniques are disclosed relating to thread preemption in the context of memory-backed registers. In some embodiments, a memory hierarchy includes one or more cache levels and one or more memory circuits. Execution circuitry may operate on operands in architectural registers to execute instructions of threads, where data for the architectural registers is stored and backed by the memory hierarchy. Control circuitry may, in response to a context switch indication for a given thread: flush and invalidate a set of architectural register data from a first cache level and store memory page information (e.g., a page catalog base address) associated with the set of architectural register data.
    Type: Application
    Filed: November 10, 2022
    Publication date: March 21, 2024
    Inventors: Benjiman L. Goodman, Yoong Chert Foo, Karl D. Mann, Terence M. Potter, Frank W. Liljeros, Jeffrey T. Brady
  • Publication number: 20240095975
    Abstract: A decoder decodes a plurality of texels from a received block of texture data encoded according to the Adaptive Scalable Texture Compression (ASTC) format. A parameter decode unit decodes configuration data for the received block of texture data, a colour decode unit decodes colour endpoint data for the plurality of texels in dependence on the configuration data, a weight decode unit decodes interpolation weight data for each of the plurality of texels in dependence on the configuration data, and at least one interpolator unit calculates a colour value for each of the plurality of texels using the interpolation weight data for that texel and a pair of colour endpoints from the colour endpoint data. At least one of the parameter decode unit, colour decode unit and weight decode unit decodes intermediate data from the received block that is common to the decoding of a subset of texels of that block and uses that decoded data as part of the decoding of at least two of the plurality of texels.
    Type: Application
    Filed: November 30, 2023
    Publication date: March 21, 2024
    Inventors: Kenneth Rovers, Yoong Chert Foo
  • Publication number: 20240095065
    Abstract: Techniques are disclosed relating to multi-stage thread scheduling. In some embodiments, processor circuitry includes multiple channel pipelines for multiple channels and multiple execution pipelines shared by the channel pipelines and configured to perform different types of operations provided by the channel pipelines. First scheduler circuitry may arbitrate among threads to assign threads to channels. Second scheduler circuitry may arbitrate among channels to assign an operation from a given channel to a given execution pipeline. The execution pipelines may provide backpressure information to the first scheduler circuitry based on execution status and the first scheduler circuitry may adjust priority of a thread for assignment to a channel based on the backpressure information. Disclosed techniques may reduce channel conflicts and starvation for execution resources.
    Type: Application
    Filed: November 10, 2022
    Publication date: March 21, 2024
    Inventors: Benjiman L. Goodman, Anjana Rajendran, Sheenam Jayaswal, Terence M. Potter, Yoong Chert Foo
  • Patent number: 11900122
    Abstract: Methods and parallel processing units for avoiding inter-pipeline data hazards identified at compile time. For each identified inter-pipeline data hazard, the primary instruction and secondary instruction(s) thereof are identified as such and are linked by a counter which is used to track that inter-pipeline data hazard. When a primary instruction is output by the instruction decoder for execution, the value of the counter associated therewith is adjusted to indicate that there is a hazard related to the primary instruction, and when the primary instruction has been resolved by one of multiple parallel processing pipelines, the value of the counter associated therewith is adjusted to indicate that the hazard related to the primary instruction has been resolved.
    Type: Grant
    Filed: July 10, 2023
    Date of Patent: February 13, 2024
    Assignee: Imagination Technologies Limited
    Inventors: Luca Iuliano, Simon Nield, Yoong-Chert Foo, Ollie Mower
  • Publication number: 20240045808
    Abstract: Techniques are disclosed relating to dynamically allocating and mapping private memory for requesting circuitry. Disclosed circuitry may receive a private address and translate the private address to a virtual address (which an MMU may then translate to a physical address to actually access a storage element). In some embodiments, private memory allocation circuitry is configured to generate page table information and map private memory pages for requests if the page table information is not already set up. In various embodiments, this may advantageously allow dynamic private memory allocation, e.g., to efficiently allocate memory for graphics shaders with different types of workloads. Disclosed caching techniques for page table information may improve performance relative to traditional techniques. Further, disclosed embodiments may facilitate memory consolidation across a device such as a graphics processor.
    Type: Application
    Filed: October 19, 2023
    Publication date: February 8, 2024
    Inventors: Justin A. Hensley, Karl D. Mann, Yoong Chert Foo, Terence M. Potter, Frank W. Liljeros, Ralph C. Taylor
  • Patent number: 11868807
    Abstract: A method of activating scheduling instructions within a parallel processing unit includes checking if an ALU targeted by a decoded instruction is full by checking a value of an ALU work fullness counter stored in the instruction controller and associated with the targeted ALU. If the targeted ALU is not full, the decoded instruction is sent to the targeted ALU for execution and the ALU work fullness counter associated with the targeted ALU is updated. If, however, the targeted ALU is full, a scheduler is triggered to de-activate the scheduled task by changing the scheduled task from the active state to a non-active state. When an ALU changes from being full to not being full, the scheduler is triggered to re-activate an oldest scheduled task waiting for the ALU by removing the oldest scheduled task from the non-active state.
    Type: Grant
    Filed: November 17, 2021
    Date of Patent: January 9, 2024
    Assignee: Imagination Technologies Limited
    Inventors: Simon Nield, Yoong-Chert Foo, Adam de Grasse, Luca Iuliano
  • Publication number: 20230419585
    Abstract: Techniques are disclosed relating to processor communications fabrics. In some embodiments, a processor includes multiple client circuits and fabric circuitry that includes at least first and second instances of a tile. The tile may include: client inputs configured to interface with client circuits, tile inputs configured to interface with one or more other tile instances, and communication resources assignable to the client inputs and tile inputs. The communication resources may include: multiple internal links, client outputs configured to interface with client circuits, and tile outputs configured to interface with one or more other tile instances. Control circuitry may, in a given cycle, assign communication resources of a given tile instance to at least a portion of the client inputs and tile inputs for a next cycle, based on priority information. The control circuitry may update priority information based on assignment results over multiple cycles.
    Type: Application
    Filed: June 23, 2022
    Publication date: December 28, 2023
    Inventors: Adam J. Smith, Sergio V. Tota, Christopher G. Martin, Yoong Chert Foo, Terence M. Potter, Max J. Batley
  • Publication number: 20230394615
    Abstract: A SIMD processing unit processes a plurality of tasks which each include up to a predetermined maximum number of work items. The work items of a task are arranged for executing a common sequence of instructions on respective data items. The data items are arranged into blocks, with some of the blocks including at least one invalid data item. Work items which relate to invalid data items are invalid work items. The SIMD processing unit comprises a group of processing lanes configured to execute instructions of work items of a particular task over a plurality of processing cycles. A control module assembles work items into the tasks based on the validity of the work items, so that invalid work items of the particular task are temporally aligned across the processing lanes. In this way the number of wasted processing slots due to invalid work items may be reduced.
    Type: Application
    Filed: August 21, 2023
    Publication date: December 7, 2023
    Inventors: John Howson, Jonathan Redshaw, Yoong Chert Foo
  • Patent number: 11836830
    Abstract: A decoder is configured to decode a plurality of texels from a received block of texture data encoded according to the Adaptive Scalable Texture Compression (ASTC) format, and includes a parameter decode unit configured to decode configuration data for the received block of texture data, a colour decode unit configured to decode colour endpoint data for the plurality of texels of the received block in dependence on the configuration data, a weight decode unit configured to decode interpolation weight data for each of the plurality of texels of the received block in dependence on the configuration data, and at least one interpolator unit configured to calculate a colour value for each of the plurality of texels of the received block using the interpolation weight data for that texel and a pair of colour endpoints from the colour endpoint data.
    Type: Grant
    Filed: January 25, 2023
    Date of Patent: December 5, 2023
    Assignee: Imagination Technologies Limited
    Inventors: Kenneth Rovers, Yoong Chert Foo
  • Patent number: 11829298
    Abstract: Techniques are disclosed relating to dynamically allocating and mapping private memory for requesting circuitry. Disclosed circuitry may receive a private address and translate the private address to a virtual address (which an MMU may then translate to a physical address to actually access a storage element). In some embodiments, private memory allocation circuitry is configured to generate page table information and map private memory pages for requests if the page table information is not already set up. In various embodiments, this may advantageously allow dynamic private memory allocation, e.g., to efficiently allocate memory for graphics shaders with different types of workloads. Disclosed caching techniques for page table information may improve performance relative to traditional techniques. Further, disclosed embodiments may facilitate memory consolidation across a device such as a graphics processor.
    Type: Grant
    Filed: February 28, 2020
    Date of Patent: November 28, 2023
    Assignee: Apple Inc.
    Inventors: Justin A. Hensley, Karl D. Mann, Yoong Chert Foo, Terence M. Potter, Frank W. Liljeros, Ralph C. Taylor
  • Publication number: 20230376348
    Abstract: A method of scheduling tasks within a GPU or other highly parallel processing unit is described which is both age-aware and wakeup event driven. Tasks which are received are added to an age-based task queue. Wakeup event bits for task types, or combinations of task types and data groups, are set in response to completion of a task dependency and these wakeup event bits are used to select an oldest task from the queue that satisfies predefined criteria.
    Type: Application
    Filed: July 31, 2023
    Publication date: November 23, 2023
    Inventors: Simon Nield, Adam de Grasse, Luca Iuliano, Ollie Mower, Yoong-Chert Foo
  • Publication number: 20230350689
    Abstract: Methods and parallel processing units for avoiding inter-pipeline data hazards identified at compile time. For each identified inter-pipeline data hazard, the primary instruction and secondary instruction(s) thereof are identified as such and are linked by a counter which is used to track that inter-pipeline data hazard. When a primary instruction is output by the instruction decoder for execution, the value of the counter associated therewith is adjusted to indicate that there is a hazard related to the primary instruction, and when the primary instruction has been resolved by one of multiple parallel processing pipelines, the value of the counter associated therewith is adjusted to indicate that the hazard related to the primary instruction has been resolved.
    Type: Application
    Filed: July 10, 2023
    Publication date: November 2, 2023
    Inventors: Luca Iuliano, Simon Nield, Yoong-Chert Foo, Ollie Mower
  • Publication number: 20230297425
    Abstract: A resource allocator receives a memory resource request for first memory resources in respect of a first-received task of a workgroup having a plurality of tasks. In response to receiving the memory resource request, the resource allocator allocates to the entire workgroup a block of memory portions of a shared memory that is sufficient in size for each task of the workgroup to receive memory resources in the block equivalent to the first memory resources.
    Type: Application
    Filed: May 23, 2023
    Publication date: September 21, 2023
    Inventors: Luca Iuliano, Simon Nield, Yoong-Chert Foo, Ollie Mower, Jonathan Redshaw
  • Patent number: 11734788
    Abstract: A SIMD processing unit processes a plurality of tasks which each include up to a predetermined maximum number of work items. The work items of a task are arranged for executing a common sequence of instructions on respective data items. The data items are arranged into blocks, with some of the blocks including at least one invalid data item. Work items which relate to invalid data items are invalid work items. The SIMD processing unit comprises a group of processing lanes configured to execute instructions of work items of a particular task over a plurality of processing cycles. A control module assembles work items into the tasks based on the validity of the work items, so that invalid work items of the particular task are temporally aligned across the processing lanes. In this way the number of wasted processing slots due to invalid work items may be reduced.
    Type: Grant
    Filed: October 29, 2021
    Date of Patent: August 22, 2023
    Assignee: Imagination Technologies Limited
    Inventors: John Howson, Jonathan Redshaw, Yoong Chert Foo
  • Patent number: 11720399
    Abstract: A method of scheduling tasks within a GPU or other highly parallel processing unit is described which is both age-aware and wakeup event driven. Tasks which are received are added to an age-based task queue. Wakeup event bits for task types, or combinations of task types and data groups, are set in response to completion of a task dependency and these wakeup event bits are used to select an oldest task from the queue that satisfies predefined criteria.
    Type: Grant
    Filed: December 1, 2021
    Date of Patent: August 8, 2023
    Assignee: Imagination Technologies Limited
    Inventors: Simon Nield, Adam de Grasse, Luca Iuliano, Ollie Mower, Yoong-Chert Foo
  • Patent number: 11698790
    Abstract: Methods and parallel processing units for avoiding inter-pipeline data hazards identified at compile time. For each identified inter-pipeline data hazard, the primary instruction and secondary instruction(s) thereof are identified as such and are linked by a counter which is used to track that inter-pipeline data hazard. When a primary instruction is output by the instruction decoder for execution, the value of the counter associated therewith is adjusted to indicate that there is a hazard related to the primary instruction, and when the primary instruction has been resolved by one of multiple parallel processing pipelines, the value of the counter associated therewith is adjusted to indicate that the hazard related to the primary instruction has been resolved.
    Type: Grant
    Filed: November 10, 2021
    Date of Patent: July 11, 2023
    Assignee: Imagination Technologies Limited
    Inventors: Luca Iuliano, Simon Nield, Yoong-Chert Foo, Ollie Mower
  • Publication number: 20230169701
    Abstract: A decoder is configured to decode a plurality of texels from a received block of texture data encoded according to the Adaptive Scalable Texture Compression (ASTC) format, and includes a parameter decode unit configured to decode configuration data for the received block of texture data, a colour decode unit configured to decode colour endpoint data for the plurality of texels of the received block in dependence on the configuration data, a weight decode unit configured to decode interpolation weight data for each of the plurality of texels of the received block in dependence on the configuration data, and at least one interpolator unit configured to calculate a colour value for each of the plurality of texels of the received block using the interpolation weight data for that texel and a pair of colour endpoints from the colour endpoint data.
    Type: Application
    Filed: January 25, 2023
    Publication date: June 1, 2023
    Inventors: Kenneth Rovers, Yoong Chert Foo
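
Illustrative sketch (not taken from any of the patents listed above): the abstract for patent 11720399 and publication 20230376348 describes scheduling that is both age-aware and wakeup event driven. A minimal software model of that idea might look like the following. The class and method names, the use of a plain task-type string as the wakeup key, and the choice to leave a wakeup bit set until it is explicitly cleared are all assumptions made for illustration; the patented hardware may behave differently.

```python
from dataclasses import dataclass


@dataclass
class Task:
    task_id: int
    task_type: str  # hypothetical label standing in for a "task type"


class AgeAwareScheduler:
    """Toy model: an age-ordered queue plus per-type wakeup event bits."""

    def __init__(self):
        self.queue = []           # arrival order, so index 0 is the oldest task
        self.wakeup_bits = set()  # task types whose dependency has completed

    def add_task(self, task):
        # Received tasks are appended, which keeps the queue sorted by age.
        self.queue.append(task)

    def signal_dependency_complete(self, task_type):
        # Completion of a dependency sets the wakeup bit for that task type.
        self.wakeup_bits.add(task_type)

    def clear_wakeup(self, task_type):
        # Assumption: bits are cleared explicitly rather than on selection.
        self.wakeup_bits.discard(task_type)

    def select_oldest_ready(self):
        # Scan from oldest to newest and take the first task whose wakeup
        # bit is set; return None when no queued task qualifies.
        for index, task in enumerate(self.queue):
            if task.task_type in self.wakeup_bits:
                return self.queue.pop(index)
        return None


if __name__ == "__main__":
    scheduler = AgeAwareScheduler()
    scheduler.add_task(Task(0, "vertex"))
    scheduler.add_task(Task(1, "pixel"))
    scheduler.add_task(Task(2, "pixel"))
    scheduler.signal_dependency_complete("pixel")
    print(scheduler.select_oldest_ready())
```

In this model the oldest "vertex" task is skipped while its wakeup bit is clear, and the oldest "pixel" task is selected instead, which is the age-aware but event-gated behaviour the abstract outlines.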