Patents by Inventor Yoong-Chert Foo

Yoong-Chert Foo has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Cache footprint management

Patent number: 11947462

Abstract: Techniques are disclosed relating to cache footprint management. In some embodiments, execution circuitry is configured to perform operations for instructions from multiple threads in parallel. Cache circuitry may store information operated on by threads executed by the execution circuitry. Scheduling circuitry may arbitrate among threads to schedule threads for execution by the execution circuitry. Tracking circuitry may determine one or more performance metrics for the cache circuitry. Control circuitry may, based on the one or more performance metrics meeting a threshold, reduce a limit on a number of threads considered for arbitration by the scheduling circuitry, to control a footprint of information stored by the cache circuitry. Disclosed techniques may advantageously reduce or avoid cache thrashing for certain processor workloads.

Type: Grant

Filed: March 3, 2022

Date of Patent: April 2, 2024

Assignee: Apple Inc.

Inventors: Yoong Chert Foo, Terence M. Potter, Donald R. DeSota, Benjiman L. Goodman, Aroun Demeure, Cheng Li, Winnie W. Yeung
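
Illustrative sketch (not taken from the patent): the abstract above describes reducing the number of threads considered for arbitration once a cache performance metric meets a threshold. The Python model below assumes the metric is a miss rate and that the limit is halved on each trigger; the class name, the constants, and the rule for raising the limit again are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ThreadLimiter:
    """Toy model: shrink the arbitration window when the miss rate is high."""
    max_threads: int = 16        # hypothetical hardware maximum
    min_threads: int = 2         # hypothetical floor so work still progresses
    miss_threshold: float = 0.5  # hypothetical miss-rate threshold
    limit: int = 16              # threads currently considered for arbitration

    def update(self, hits: int, misses: int) -> int:
        """Adjust the limit from one sampling window of cache counters."""
        total = hits + misses
        if total == 0:
            return self.limit
        if misses / total >= self.miss_threshold:
            # Metric meets the threshold: reduce the limit (as in the abstract).
            self.limit = max(self.min_threads, self.limit // 2)
        else:
            # Assumption: recover slowly when the cache behaves well.
            self.limit = min(self.max_threads, self.limit + 1)
        return self.limit

    def candidates(self, ready_threads):
        """Return only the threads the arbiter is allowed to consider."""
        return list(ready_threads)[: self.limit]
```

With fewer threads eligible for arbitration, fewer distinct working sets compete for the cache, which is the footprint effect the abstract refers to.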

Multi-phased and multi-threaded program execution based on SIMD ratio

Patent number: 11947999

Abstract: A SIMD microprocessor is configured to execute programs divided into discrete phases. A scheduler is provided for scheduling instructions. A plurality of resources are for executing instructions issued by the scheduler, wherein the scheduler is configured to schedule each phase of the program only after receiving an indication that execution of the preceding phase of the program has been completed. By splitting programs into multiple phases and providing a scheduler that is able to determine whether execution of a phase has been completed, each phase can be separately scheduled and the results of preceding phases can be used to inform the scheduling of subsequent phases. In one example, different numbers of threads and/or different numbers of data instances per thread may be processed for different phases of the same program.

Type: Grant

Filed: February 29, 2020

Date of Patent: April 2, 2024

Assignee: Imagination Technologies Limited

Inventor: Yoong Chert Foo

Tiled processor communication fabric

Patent number: 11941742

Abstract: Techniques are disclosed relating to processor communications fabrics. In some embodiments, a processor includes multiple client circuitry and fabric circuitry that includes at least first and second instances of a tile. The tile may include: client inputs configured to interface with client circuits, tile inputs configured to interface with one or more other tile instances, and communication resources assignable to the client inputs and tile inputs. The communications resources may include: multiple internal links, client outputs configured to interface with client circuits, and tile outputs configured to interface with one or more other tile instances. Control circuitry may, in a given cycle, assign communication resources of a given tile instance to at least a portion of the client inputs and tile inputs for a next cycle, based on priority information. The control circuitry may update priority information based on assignment results over multiple cycles.

Type: Grant

Filed: June 23, 2022

Date of Patent: March 26, 2024

Assignee: Apple Inc.

Inventors: Adam J. Smith, Sergio V. Tota, Christopher G. Martin, Yoong Chert Foo, Terence M. Potter, Max J. Batley

Preemption Techniques for Memory-Backed Registers

Publication number: 20240095176

Abstract: Techniques are disclosed relating to thread preemption in the context of memory-backed registers. In some embodiments, a memory hierarchy includes one or more cache levels and one or more memory circuits. Execution circuitry may operate on operands in architectural registers to execute instructions of threads, where data for the architectural registers is stored and backed by the memory hierarchy. Control circuitry may, in response to a context switch indication for a given thread: flush and invalidate a set of architectural register data from a first cache level and store memory page information (e.g., a page catalog base address) associated with the set of architectural register data.

Type: Application

Filed: November 10, 2022

Publication date: March 21, 2024

Inventors: Benjiman L. Goodman, Yoong Chert Foo, Karl D. Mann, Terence M. Potter, Frank W. Liljeros, Jeffrey T. Brady

MULTI-OUTPUT DECODER FOR TEXTURE DECOMPRESSION

Publication number: 20240095975

Abstract: A decoder decodes a plurality of texels from a received block of texture data encoded according to the Adaptive Scalable Texture Compression (ASTC) format. A parameter decode unit decodes configuration data for the received block of texture data, a colour decode unit decodes colour endpoint data for the plurality of texels in dependence on the configuration data, a weight decode unit decodes interpolation weight data for each of the plurality of texels in dependence on the configuration data, and at least one interpolator unit calculates a colour value for each of the plurality of texels using the interpolation weight data for that texel and a pair of colour endpoints from the colour endpoint data. At least one of the parameter decode unit, colour decode unit and weight decode unit decodes intermediate data from the received block that is common to the decoding of a subset of texels of that block and uses that decoded data as part of the decoding of at least two of the plurality of texels.

Type: Application

Filed: November 30, 2023

Publication date: March 21, 2024

Inventors: Kenneth Rovers, Yoong Chert Foo

Multi-stage Thread Scheduling

Publication number: 20240095065

Abstract: Techniques are disclosed relating to multi-stage thread scheduling. In some embodiments, processor circuitry includes multiple channel pipelines for multiple channels and multiple execution pipelines shared by the channel pipelines and configured to perform different types of operations provided by the channel pipelines. First scheduler circuitry may arbitrate among threads to assign threads to channels. Second scheduler circuitry may arbitrate among channels to assign an operation from a given channel to a given execution pipeline. The execution pipelines may provide backpressure information to the first scheduler circuitry based on execution status and the first scheduler circuitry may adjust priority of a thread for assignment to a channel based on the backpressure information. Disclosed techniques may reduce channel conflicts and starvation for execution resources.

Type: Application

Filed: November 10, 2022

Publication date: March 21, 2024

Inventors: Benjiman L. Goodman, Anjana Rajendran, Sheenam Jayaswal, Terence M. Potter, Yoong Chert Foo

Methods and systems for inter-pipeline data hazard avoidance

Patent number: 11900122

Abstract: Methods and parallel processing units for avoiding inter-pipeline data hazards identified at compile time. For each identified inter-pipeline data hazard the primary instruction and secondary instruction(s) thereof are identified as such and are linked by a counter which is used to track that inter-pipeline data hazard. When a primary instruction is output by the instruction decoder for execution the value of the counter associated therewith is adjusted to indicate that there is a hazard related to the primary instruction, and when the primary instruction has been resolved by one of multiple parallel processing pipelines the value of the counter associated therewith is adjusted to indicate that the hazard related to the primary instruction has been resolved.

Type: Grant

Filed: July 10, 2023

Date of Patent: February 13, 2024

Assignee: Imagination Technologies Limited

Inventors: Luca Iuliano, Simon Nield, Yoong-Chert Foo, Ollie Mower
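
Illustrative sketch (not taken from the patent): the counter mechanism in the abstract above can be modelled in a few lines of Python. The model assumes the counter is incremented when a primary instruction is issued, decremented when it resolves, and that a secondary instruction may issue only while its linked counter reads zero. The class and method names are hypothetical.

```python
class HazardCounters:
    """Toy model of counters linking primary and secondary instructions."""

    def __init__(self, num_counters: int):
        # One counter per hazard identified at compile time (assumed layout).
        self.counters = [0] * num_counters

    def issue_primary(self, counter_id: int) -> None:
        """Primary instruction leaves the decoder: mark the hazard as open."""
        self.counters[counter_id] += 1

    def resolve_primary(self, counter_id: int) -> None:
        """A pipeline has finished the primary: mark the hazard as resolved."""
        if self.counters[counter_id] == 0:
            raise ValueError("no outstanding primary instruction")
        self.counters[counter_id] -= 1

    def can_issue_secondary(self, counter_id: int) -> bool:
        """Assumption: secondaries stall while any linked primary is pending."""
        return self.counters[counter_id] == 0
```

Because the hazards are identified at compile time, the hardware in this model only tracks counters and never compares register operands between pipelines at run time.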

On-demand Memory Allocation

Publication number: 20240045808

Abstract: Techniques are disclosed relating to dynamically allocating and mapping private memory for requesting circuitry. Disclosed circuitry may receive a private address and translate the private address to a virtual address (which an MMU may then translate to a physical address to actually access a storage element). In some embodiments, private memory allocation circuitry is configured to generate page table information and map private memory pages for requests if the page table information is not already set up. In various embodiments, this may advantageously allow dynamic private memory allocation, e.g., to efficiently allocate memory for graphics shaders with different types of workloads. Disclosed caching techniques for page table information may improve performance relative to traditional techniques. Further, disclosed embodiments may facilitate memory consolidation across a device such as a graphics processor.

Type: Application

Filed: October 19, 2023

Publication date: February 8, 2024

Inventors: Justin A. Hensley, Karl D. Mann, Yoong Chert Foo, Terence M. Potter, Frank W. Liljeros, Ralph C. Taylor

Scheduling tasks using work fullness counter

Patent number: 11868807

Abstract: A method of activating scheduling instructions within a parallel processing unit includes checking if an ALU targeted by a decoded instruction is full by checking a value of an ALU work fullness counter stored in the instruction controller and associated with the targeted ALU. If the targeted ALU is not full, the decoded instruction is sent to the targeted ALU for execution and the ALU work fullness counter associated with the targeted ALU is updated. If, however, the targeted ALU is full, a scheduler is triggered to de-activate the scheduled task by changing the scheduled task from the active state to a non-active state. When an ALU changes from being full to not being full, the scheduler is triggered to re-activate an oldest scheduled task waiting for the ALU by removing the oldest scheduled task from the non-active state.

Type: Grant

Filed: November 17, 2021

Date of Patent: January 9, 2024

Assignee: Imagination Technologies Limited

Inventors: Simon Nield, Yoong-Chert Foo, Adam de Grasse, Luca Iuliano
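
Illustrative sketch (not taken from the patent): the flow in the abstract above (check a per-ALU work fullness counter, dispatch or de-activate, then re-activate the oldest waiting task when the ALU stops being full) can be modelled as follows. The capacities, the per-ALU waiting queues, and all names are assumptions made for the example.

```python
from collections import deque


class InstructionController:
    """Toy model of per-ALU work fullness counters."""

    def __init__(self, alu_capacity: dict):
        self.capacity = dict(alu_capacity)                      # assumed sizes
        self.fullness = {alu: 0 for alu in alu_capacity}        # the counters
        self.waiting = {alu: deque() for alu in alu_capacity}   # oldest first

    def dispatch(self, task: str, alu: str) -> bool:
        """Send work to the ALU, or de-activate the task if the ALU is full."""
        if self.fullness[alu] >= self.capacity[alu]:
            self.waiting[alu].append(task)  # task moves to a non-active state
            return False
        self.fullness[alu] += 1
        return True

    def complete(self, alu: str):
        """ALU retires one item; return the task to re-activate, if any."""
        if self.fullness[alu] == 0:
            raise ValueError("ALU has no outstanding work")
        was_full = self.fullness[alu] >= self.capacity[alu]
        self.fullness[alu] -= 1
        if was_full and self.waiting[alu]:
            return self.waiting[alu].popleft()  # oldest waiting task
        return None
```

A re-activated task still has to call `dispatch` again in this model; re-activation only makes it eligible for scheduling.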

Tiled Processor Communication Fabric

Publication number: 20230419585

Abstract: Techniques are disclosed relating to processor communications fabrics. In some embodiments, a processor includes multiple client circuitry and fabric circuitry that includes at least first and second instances of a tile. The tile may include: client inputs configured to interface with client circuits, tile inputs configured to interface with one or more other tile instances, and communication resources assignable to the client inputs and tile inputs. The communications resources may include: multiple internal links, client outputs configured to interface with client circuits, and tile outputs configured to interface with one or more other tile instances. Control circuitry may, in a given cycle, assign communication resources of a given tile instance to at least a portion of the client inputs and tile inputs for a next cycle, based on priority information. The control circuitry may update priority information based on assignment results over multiple cycles.

Type: Application

Filed: June 23, 2022

Publication date: December 28, 2023

Inventors: Adam J. Smith, Sergio V. Tota, Christopher G. Martin, Yoong Chert Foo, Terence M. Potter, Max J. Batley

TASK EXECUTION IN A SIMD PROCESSING UNIT WITH PARALLEL GROUPS OF PROCESSING LANES

Publication number: 20230394615

Abstract: A SIMD processing unit processes a plurality of tasks which each include up to a predetermined maximum number of work items. The work items of a task are arranged for executing a common sequence of instructions on respective data items. The data items are arranged into blocks, with some of the blocks including at least one invalid data item. Work items which relate to invalid data items are invalid work items. The SIMD processing unit comprises a group of processing lanes configured to execute instructions of work items of a particular task over a plurality of processing cycles. A control module assembles work items into the tasks based on the validity of the work items, so that invalid work items of the particular task are temporally aligned across the processing lanes. In this way the number of wasted processing slots due to invalid work items may be reduced.

Type: Application

Filed: August 21, 2023

Publication date: December 7, 2023

Inventors: John Howson, Jonathan Redshaw, Yoong Chert Foo

Multi-output decoder for texture decompression

Patent number: 11836830

Abstract: A decoder is configured to decode a plurality of texels from a received block of texture data encoded according to the Adaptive Scalable Texture Compression (ASTC) format, and includes a parameter decode unit configured to decode configuration data for the received block of texture data, a colour decode unit configured to decode colour endpoint data for the plurality of texels of the received block in dependence on the configuration data, a weight decode unit configured to decode interpolation weight data for each of the plurality of texels of the received block in dependence on the configuration data, and at least one interpolator unit configured to calculate a colour value for each of the plurality of texels of the received block using the interpolation weight data for that texel and a pair of colour endpoints from the colour endpoint data.

Type: Grant

Filed: January 25, 2023

Date of Patent: December 5, 2023

Assignee: Imagination Technologies Limited

Inventors: Kenneth Rovers, Yoong Chert Foo

On-demand memory allocation

Patent number: 11829298

Abstract: Techniques are disclosed relating to dynamically allocating and mapping private memory for requesting circuitry. Disclosed circuitry may receive a private address and translate the private address to a virtual address (which an MMU may then translate to a physical address to actually access a storage element). In some embodiments, private memory allocation circuitry is configured to generate page table information and map private memory pages for requests if the page table information is not already set up. In various embodiments, this may advantageously allow dynamic private memory allocation, e.g., to efficiently allocate memory for graphics shaders with different types of workloads. Disclosed caching techniques for page table information may improve performance relative to traditional techniques. Further, disclosed embodiments may facilitate memory consolidation across a device such as a graphics processor.

Type: Grant

Filed: February 28, 2020

Date of Patent: November 28, 2023

Assignee: Apple Inc.

Inventors: Justin A. Hensley, Karl D. Mann, Yoong Chert Foo, Terence M. Potter, Frank W. Liljeros, Ralph C. Taylor

Task Scheduling in a GPU Using Wakeup Event State Data

Publication number: 20230376348

Abstract: A method of scheduling tasks within a GPU or other highly parallel processing unit is described which is both age-aware and wakeup event driven. Tasks which are received are added to an age-based task queue. Wakeup event bits for task types, or combinations of task types and data groups, are set in response to completion of a task dependency and these wakeup event bits are used to select an oldest task from the queue that satisfies predefined criteria.

Type: Application

Filed: July 31, 2023

Publication date: November 23, 2023

Inventors: Simon Nield, Adam de Grasse, Luca Iuliano, Ollie Mower, Yoong-Chert Foo

METHODS AND SYSTEMS FOR INTER-PIPELINE DATA HAZARD AVOIDANCE

Publication number: 20230350689

Abstract: Methods and parallel processing units for avoiding inter-pipeline data hazards identified at compile time. For each identified inter-pipeline data hazard the primary instruction and secondary instruction(s) thereof are identified as such and are linked by a counter which is used to track that inter-pipeline data hazard. When a primary instruction is output by the instruction decoder for execution the value of the counter associated therewith is adjusted to indicate that there is a hazard related to the primary instruction, and when the primary instruction has been resolved by one of multiple parallel processing pipelines the value of the counter associated therewith is adjusted to indicate that the hazard related to the primary instruction has been resolved.

Type: Application

Filed: July 10, 2023

Publication date: November 2, 2023

Inventors: Luca Iuliano, Simon Nield, Yoong-Chert Foo, Ollie Mower

ALLOCATION OF MEMORY RESOURCES TO SIMD WORKGROUPS

Publication number: 20230297425

Abstract: A resource allocator receives a memory resource request for first memory resources in respect of a first-received task of a workgroup having a plurality of tasks. In response to receiving the memory resource request, the resource allocator allocates to the entire workgroup a block of memory portions of a shared memory that is sufficient in size for each task of the workgroup to receive memory resources in the block equivalent to the first memory resources.

Type: Application

Filed: May 23, 2023

Publication date: September 21, 2023

Inventors: Luca Iuliano, Simon Nield, Yoong-Chert Foo, Ollie Mower, Jonathan Redshaw
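
Illustrative sketch (not taken from the application): the abstract above describes reserving, on the first task's request, a block large enough for every task in the workgroup. The Python model below assumes a simple bump allocator over numbered memory portions and gives each task a contiguous slice indexed by its position in the workgroup; both choices, and all names, are hypothetical.

```python
class WorkgroupAllocator:
    """Toy model: the first request from a workgroup reserves space for all."""

    def __init__(self, total_portions: int):
        self.total = total_portions
        self.next_free = 0   # assumption: bump allocation, nothing is freed
        self.blocks = {}     # workgroup id -> (base portion, portions per task)

    def request(self, workgroup_id, task_index, tasks_in_group, portions):
        """Return the portions for one task, or None if memory is exhausted."""
        if workgroup_id not in self.blocks:
            # First-received task: size the block for the entire workgroup.
            size = tasks_in_group * portions
            if self.next_free + size > self.total:
                return None
            self.blocks[workgroup_id] = (self.next_free, portions)
            self.next_free += size
        base, per_task = self.blocks[workgroup_id]
        start = base + task_index * per_task
        return range(start, start + per_task)
```

Later tasks of the same workgroup find their slice already reserved, so in this model they cannot fail for lack of memory once the first request has succeeded.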

Task execution in a SIMD processing unit with parallel groups of processing lanes

Patent number: 11734788

Abstract: A SIMD processing unit processes a plurality of tasks which each include up to a predetermined maximum number of work items. The work items of a task are arranged for executing a common sequence of instructions on respective data items. The data items are arranged into blocks, with some of the blocks including at least one invalid data item. Work items which relate to invalid data items are invalid work items. The SIMD processing unit comprises a group of processing lanes configured to execute instructions of work items of a particular task over a plurality of processing cycles. A control module assembles work items into the tasks based on the validity of the work items, so that invalid work items of the particular task are temporally aligned across the processing lanes. In this way the number of wasted processing slots due to invalid work items may be reduced.

Type: Grant

Filed: October 29, 2021

Date of Patent: August 22, 2023

Assignee: Imagination Technologies Limited

Inventors: John Howson, Jonathan Redshaw, Yoong Chert Foo

Task scheduling in a GPU using wakeup event state data

Patent number: 11720399

Abstract: A method of scheduling tasks within a GPU or other highly parallel processing unit is described which is both age-aware and wakeup event driven. Tasks which are received are added to an age-based task queue. Wakeup event bits for task types, or combinations of task types and data groups, are set in response to completion of a task dependency and these wakeup event bits are used to select an oldest task from the queue that satisfies predefined criteria.

Type: Grant

Filed: December 1, 2021

Date of Patent: August 8, 2023

Assignee: Imagination Technologies Limited

Inventors: Simon Nield, Adam de Grasse, Luca Iuliano, Ollie Mower, Yoong-Chert Foo
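
Illustrative sketch (not taken from the patent): an age-aware, wakeup-driven selection like the one in the abstract above can be modelled with an ordered queue and a set of wakeup bits. The model assumes one bit per task type and that the "predefined criteria" reduce to "the task's type bit is set"; bits are never cleared here, which is a simplification. All names are hypothetical.

```python
class WakeupScheduler:
    """Toy model: pick the oldest queued task whose wakeup bit is set."""

    def __init__(self):
        self.queue = []      # (task name, task type), oldest first
        self.wakeup = set()  # task types whose wakeup event bit is set

    def add_task(self, name: str, task_type: str) -> None:
        """Received tasks join the back of the age-based queue."""
        self.queue.append((name, task_type))

    def dependency_completed(self, task_type: str) -> None:
        """Completion of a dependency sets the wakeup bit for a task type."""
        self.wakeup.add(task_type)

    def select(self):
        """Return the oldest eligible task and remove it, or None."""
        for i, (name, task_type) in enumerate(self.queue):
            if task_type in self.wakeup:
                del self.queue[i]
                return name
        return None
```

Scanning from the front of the queue is what makes the choice age-aware: among eligible tasks, the one received first is always selected.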

Queues for inter-pipeline data hazard avoidance

Patent number: 11698790

Abstract: Methods and parallel processing units for avoiding inter-pipeline data hazards identified at compile time. For each identified inter-pipeline data hazard the primary instruction and secondary instruction(s) thereof are identified as such and are linked by a counter which is used to track that inter-pipeline data hazard. When a primary instruction is output by the instruction decoder for execution the value of the counter associated therewith is adjusted to indicate that there is a hazard related to the primary instruction, and when the primary instruction has been resolved by one of multiple parallel processing pipelines the value of the counter associated therewith is adjusted to indicate that the hazard related to the primary instruction has been resolved.

Type: Grant

Filed: November 10, 2021

Date of Patent: July 11, 2023

Assignee: Imagination Technologies Limited

Inventors: Luca Iuliano, Simon Nield, Yoong-Chert Foo, Ollie Mower

Multi-Output Decoder for Texture Decompression

Publication number: 20230169701

Abstract: A decoder is configured to decode a plurality of texels from a received block of texture data encoded according to the Adaptive Scalable Texture Compression (ASTC) format, and includes a parameter decode unit configured to decode configuration data for the received block of texture data, a colour decode unit configured to decode colour endpoint data for the plurality of texels of the received block in dependence on the configuration data, a weight decode unit configured to decode interpolation weight data for each of the plurality of texels of the received block in dependence on the configuration data, and at least one interpolator unit configured to calculate a colour value for each of the plurality of texels of the received block using the interpolation weight data for that texel and a pair of colour endpoints from the colour endpoint data.

Type: Application

Filed: January 25, 2023

Publication date: June 1, 2023

Inventors: Kenneth Rovers, Yoong Chert Foo
