Patents by Inventor Yoong-Chert Foo

Yoong-Chert Foo has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Allocation of Memory Resources to SIMD Workgroups

Publication number: 20190087229

Abstract: A memory subsystem for use with a single-instruction multiple-data (SIMD) processor comprising a plurality of processing units configured for processing one or more workgroups each comprising a plurality of SIMD tasks, the memory subsystem comprising: a shared memory partitioned into a plurality of memory portions for allocation to tasks that are to be processed by the processor; and a resource allocator configured to, in response to receiving a memory resource request for first memory resources in respect of a first-received task of a workgroup, allocate to the workgroup a block of memory portions sufficient in size for each task of the workgroup to receive memory resources in the block equivalent to the first memory resources.

Type: Application

Filed: September 17, 2018

Publication date: March 21, 2019

Inventors: Luca Iuliano, Simon Nield, Yoong-Chert Foo, Ollie Mower, Jonathan Redshaw
Allocation of tiles to processing engines in a graphics processing system

Patent number: 10210651

Abstract: A graphics processing system processes primitive fragments using a rendering space which is sub-divided into tiles. The graphics processing system comprises processing engines configured to apply texturing and/or shading to primitive fragments. The graphics processing system also comprises a cache system for storing graphics data for primitive fragments, the cache system including multiple cache subsystems. Each of the cache subsystems is coupled to a respective set of one or more processing engines. The graphics processing system also comprises a tile allocation unit which operates in one or more allocation modes to allocate tiles to processing engines. The allocation mode(s) include a spatial allocation mode in which groups of spatially adjacent tiles are allocated to the processing engines according to a spatial allocation scheme, which ensures that each of the groups of spatially adjacent tiles is allocated to a set of processing engines which are coupled to the same cache subsystem.

Type: Grant

Filed: July 18, 2018

Date of Patent: February 19, 2019

Assignee: Imagination Technologies Limited

Inventors: Jonathan Redshaw, Yoong Chert Foo
SCHEDULING TASKS

Publication number: 20180365056

Abstract: A method of synchronizing a group of scheduled tasks within a parallel processing unit into a known state is described. The method uses a synchronization instruction in a scheduled task which triggers, in response to decoding of the instruction, an instruction decoder to place the scheduled task into a non-active state and forward the decoded synchronization instruction to an atomic ALU for execution. When the atomic ALU executes the decoded synchronization instruction, the atomic ALU performs an operation and check on data assigned to the group ID of the scheduled task and if the check is passed, all scheduled tasks having the particular group ID are removed from the non-active state.

Type: Application

Filed: June 18, 2018

Publication date: December 20, 2018

Inventors: Ollie Mower, Yoong-Chert Foo
Methods and Systems for Inter-Pipeline Data Hazard Avoidance

Publication number: 20180365016

Abstract: Methods and parallel processing units for avoiding inter-pipeline data hazards wherein inter-pipeline data hazards are identified at compile time. For each identified inter-pipeline data hazard the primary instruction and secondary instruction(s) thereof are identified as such and are linked by a counter which is used to track that inter-pipeline data hazard. Then when a primary instruction is output by the instruction decoder for execution the value of the counter associated therewith is adjusted (e.g. incremented) to indicate that there is hazard related to the primary instruction, and when primary instruction has been resolved by one of multiple parallel processing pipelines the value of the counter associated therewith is adjusted (e.g. decremented) to indicate that the hazard related to the primary instruction has been resolved.

Type: Application

Filed: June 15, 2018

Publication date: December 20, 2018

Inventors: Luca Iuliano, Simon Nield, Yoong-Chert Foo, Ollie Mower
SCHEDULING TASKS

Publication number: 20180365057

Abstract: A method of scheduling instructions within a parallel processing unit is described. The method comprises decoding, in an instruction decoder, an instruction in a scheduled task in an active state, and checking, by an instruction controller, if an ALU targeted by the decoded instruction is a primary instruction pipeline. If the targeted ALU is a primary instruction pipeline, a list associated with the primary instruction pipeline is checked to determine whether the scheduled task is already included in the list. If the scheduled task is already included in the list, the decoded instruction is sent to the primary instruction pipeline.

Type: Application

Filed: June 18, 2018

Publication date: December 20, 2018

Inventors: Simon Nield, Yoong-Chert Foo, Adam de Grasse, Luca Iuliano
SCHEDULING TASKS

Publication number: 20180365009

Abstract: A method of activating scheduling instructions within a parallel processing unit is described. The method comprises decoding, in an instruction decoder, an instruction in a scheduled task in an active state and checking, by an instruction controller, if a swap flag is set in the decoded instruction. If the swap flag in the decoded instruction is set, a scheduler is triggered to de-activate the scheduled task by changing the scheduled task from the active state to a non-active state.

Type: Application

Filed: June 18, 2018

Publication date: December 20, 2018

Inventors: Simon Nield, Yoong-Chert Foo, Adam de Grasse, Luca Iuliano
SCHEDULING TASKS

Publication number: 20180365058

Abstract: A method of activating scheduling instructions within a parallel processing unit is described. The method includes checking if an ALU targeted by a decoded instruction is full by checking a value of an ALU work fullness counter stored in the instruction controller and associated with the targeted ALU. If the targeted ALU is not full, the decoded instruction is sent to the targeted ALU for execution and the ALU work fullness counter associated with the targeted ALU is updated. If, however, the targeted ALU is full, a scheduler is triggered to de-activate the scheduled task by changing the scheduled task from the active state to a non-active state. When an ALU changes from being full to not being full, the scheduler is triggered to re-activate an oldest scheduled task waiting for the ALU by removing the oldest scheduled task from the non-active state.

Type: Application

Filed: June 18, 2018

Publication date: December 20, 2018

Inventors: Simon Nield, Yoong-Chert Foo, Adam de Grasse, Luca Iuliano
ALLOCATION OF TILES TO PROCESSING ENGINES IN A GRAPHICS PROCESSING SYSTEM

Publication number: 20180322684

Abstract: A graphics processing system processes primitive fragments using a rendering space which is sub-divided into tiles. The graphics processing system comprises processing engines configured to apply texturing and/or shading to primitive fragments. The graphics processing system also comprises a cache system for storing graphics data for primitive fragments, the cache system including multiple cache subsystems. Each of the cache subsystems is coupled to a respective set of one or more processing engines. The graphics processing system also comprises a tile allocation unit which operates in one or more allocation modes to allocate tiles to processing engines. The allocation mode(s) include a spatial allocation mode in which groups of spatially adjacent tiles are allocated to the processing engines according to a spatial allocation scheme, which ensures that each of the groups of spatially adjacent tiles is allocated to a set of processing engines which are coupled to the same cache subsystem.

Type: Application

Filed: July 18, 2018

Publication date: November 8, 2018

Inventors: Jonathan Redshaw, Yoong Chert Foo
Multi-Output Decoder for Texture Decompression

Publication number: 20180315218

Abstract: A decoder is configured to decode a plurality of texels from a received block of texture data encoded according to the Adaptive Scalable Texture Compression (ASTC) format, and includes a parameter decode unit configured to decode configuration data for the received block of texture data, a colour decode unit configured to decode colour endpoint data for the plurality of texels of the received block in dependence on the configuration data, a weight decode unit configured to decode interpolation weight data for each of the plurality of texels of the received block in dependence on the configuration data, and at least one interpolator unit configured to calculate a colour value for each of the plurality of texels of the received block using the interpolation weight data for that texel and a pair of colour endpoints from the colour endpoint data.

Type: Application

Filed: April 28, 2018

Publication date: November 1, 2018

Inventors: Kenneth Rovers, Yoong Chert Foo
Decoder Unit for Texture Decompression

Publication number: 20180315233

Abstract: A decoder unit is configured to decode a plurality of texels in accordance with a texel request, the plurality of texels being encoded across one or more blocks of encoded texture data each encoding a block of texels, and includes a first set of one or more decoders, each of the first set of decoders being configured to decode n texels from a single received block of encoded texture data; a second set of or more decoders, each of the second set of decoders being configured to decode p texels from a single received block of encoded texture data, where p<n; and control logic configured to allocate blocks of encoded texture data to the decoders in accordance with the texel request.

Type: Application

Filed: April 28, 2018

Publication date: November 1, 2018

Inventors: Yoong Chert Foo, Kenneth Rovers
Allocation of tiles to processing engines in a graphics processing system

Patent number: 10055877

Abstract: A graphics processing system processes primitive fragments using a rendering space which is sub-divided into tiles. The graphics processing system comprises processing engines configured to apply texturing and/or shading to primitive fragments. The graphics processing system also comprises a cache system for storing graphics data for primitive fragments, the cache system including multiple cache subsystems. Each of the cache subsystems is coupled to a respective set of one or more processing engines. The graphics processing system also comprises a tile allocation unit which operates in one or more allocation modes to allocate tiles to processing engines. The allocation mode(s) include a spatial allocation mode in which groups of spatially adjacent tiles are allocated to the processing engines according to a spatial allocation scheme, which ensures that each of the groups of spatially adjacent tiles is allocated to a set of processing engines which are coupled to the same cache subsystem.

Type: Grant

Filed: December 21, 2016

Date of Patent: August 21, 2018

Assignee: Imagination Technologies Limited

Inventors: Jonathan Redshaw, Yoong Chert Foo
Performance Profiling in Computer Graphics

Publication number: 20180158168

Abstract: A method of profiling the performance of a graphics unit when rendering a scene according to a graphics pipeline, includes executing stages of the graphics pipeline using one or more units of rendering circuitry to perform at least one rendering task that defines a portion of the work required to render the scene, the at least one rendering task associated with a set flag; propagating an indication of the flag through stages of the graphics pipeline as the scene is rendered so that work done as part of the at least one rendering task is associated with the set flag; changing the value of a counter associated with a unit of rendering circuitry in response to an occurrence of an event whilst that unit performs an item of work associated with the set flag; and reading the value of the counter to thereby measure the occurrences of the event caused by completing the at least one rendering task.

Type: Application

Filed: October 31, 2017

Publication date: June 7, 2018

Inventor: Yoong-Chert Foo
Task Scheduling in a GPU

Publication number: 20180088989

Abstract: A method of scheduling tasks within a GPU or other highly parallel processing unit is described which is both age-aware and wakeup event driven. Tasks which are received are added to an age-based task queue. Wakeup event bits for task types, or combinations of task types and data groups, are set in response to completion of a task dependency and these wakeup event bits are used to select an oldest task from the queue that satisfies predefined criteria.

Type: Application

Filed: September 25, 2017

Publication date: March 29, 2018

Inventors: Simon Nield, Adam de Grasse, Luca Iuliano, Ollie Mower, Yoong-Chert Foo
Allocation of Tiles to Processing Engines in a Graphics Processing System

Publication number: 20170178386

Abstract: A graphics processing system processes primitive fragments using a rendering space which is sub-divided into tiles. The graphics processing system comprises processing engines configured to apply texturing and/or shading to primitive fragments. The graphics processing system also comprises a cache system for storing graphics data for primitive fragments, the cache system including multiple cache subsystems. Each of the cache subsystems is coupled to a respective set of one or more processing engines. The graphics processing system also comprises a tile allocation unit which operates in one or more allocation modes to allocate tiles to processing engines. The allocation mode(s) include a spatial allocation mode in which groups of spatially adjacent tiles are allocated to the processing engines according to a spatial allocation scheme, which ensures that each of the groups of spatially adjacent tiles is allocated to a set of processing engines which are coupled to the same cache subsystem.

Type: Application

Filed: December 21, 2016

Publication date: June 22, 2017

Inventors: Jonathan Redshaw, Yoong Chert Foo
Task Execution in a SIMD Processing Unit with Parallel Groups of Processing Lanes

Publication number: 20170076420

Abstract: A SIMD processing unit processes a plurality of tasks which each include up to a predetermined maximum number of work items. The work items of a task are arranged for executing a common sequence of instructions on respective data items. The data items are arranged into blocks, with some of the blocks including at least one invalid data item. Work items which relate to invalid data items are invalid work items. The SIMD processing unit comprises a group of processing lanes configured to execute instructions of work items of a particular task over a plurality of processing cycles. A control module assembles work items into the tasks based on the validity of the work items, so that invalid work items of the particular task are temporally aligned across the processing lanes. In this way the number of wasted processing slots due to invalid work items may be reduced.

Type: Application

Filed: November 2, 2016

Publication date: March 16, 2017

Inventors: John Howson, Jonathan Redshaw, Yoong Chert Foo
Task execution in a SIMD processing unit with parallel groups of processing lanes

Patent number: 9513963

Abstract: A SIMD processing unit processes a plurality of tasks which each include up to a predetermined maximum number of work items. The work items of a task are arranged for executing a common sequence of instructions on respective data items. The data items are arranged into blocks, with some of the blocks including at least one invalid data item. Work items which relate to invalid data items are invalid work items. The SIMD processing unit comprises a group of processing lanes configured to execute instructions of work items of a particular task over a plurality of processing cycles. A control module assembles work items into the tasks based on the validity of the work items, so that invalid work items of the particular task are temporally aligned across the processing lanes. In this way the number of wasted processing slots due to invalid work items may be reduced.

Type: Grant

Filed: December 17, 2014

Date of Patent: December 6, 2016

Assignee: Imagination Technologies Limited

Inventors: John Howson, Jonathan Redshaw, Yoong Chert Foo
Multi-Phased and Multi-Threaded Program Execution Based on SIMD Ratio

Publication number: 20160179519

Abstract: A microprocessor is configured to execute programs divided into discrete phases. A scheduler is provided for scheduling instructions. A plurality of resources are for executing instructions issued by the scheduler, wherein the scheduler is configured to schedule each phase of the program only after receiving an indication that execution of the preceding phase of the program has been completed. By splitting programs into multiple phases and providing a scheduler that is able to determine whether execution of a phase has been completed, each phase can be separately scheduled and the results of preceding phases can be used to inform the scheduling of subsequent phases. In one example, different numbers of threads and/or different numbers of data instances per thread may be processed for different phases of the same program.

Type: Application

Filed: February 29, 2016

Publication date: June 23, 2016

Inventor: Yoong Chert Foo
Method and System for Multisample Antialiasing

Publication number: 20160163099

Abstract: A method and system for generating two or three dimensional computer graphics images using multisample antialiasing (MSAA) is provided, which enables memory bandwidth to be conserved. For each of one or more pixels it is determined whether all of a plurality of sample areas of that pixel are located within a particular primitive. For those pixels where it is determined that all the sample areas of that pixel are located within that primitive, a value is stored in a multisample memory for a smaller number of the sample areas of that pixel than the total number of the sample areas of that pixel and data is stored indicating that all the sample areas of that pixel are located within that primitive.

Type: Application

Filed: February 18, 2016

Publication date: June 9, 2016

Inventors: Yoong Chert Foo, Salil Sahasrabudhe, Andrew Davy
Multi-phased and multi-threaded program execution based on SIMD ratio

Patent number: 9304812

Abstract: A microprocessor is configured to execute programs divided into discrete phases. A scheduler is provided for scheduling instructions. A plurality of resources are for executing instructions issued by the scheduler, wherein the scheduler is configured to schedule each phase of the program only after receiving an indication that execution of the preceding phase of the program has been completed. By splitting programs into multiple phases and providing a scheduler that is able to determine whether execution of a phase has been completed, each phase can be separately scheduled and the results of preceding phases can be used to inform the scheduling of subsequent phases. In one example, different numbers of threads and/or different numbers of data instances per thread may be processed for different phases of the same program.

Type: Grant

Filed: May 19, 2011

Date of Patent: April 5, 2016

Assignee: Imagination Technologies Limited

Inventor: Yoong Chert Foo
Method and system for multisample antialiasing

Patent number: 9275492

Abstract: A method and system for generating two or three dimensional computer graphics images using multisample antialiasing (MSAA) is provided, which enables memory bandwidth to be conserved. For each of one or more pixels it is determined whether all of a plurality of sample areas of that pixel are located within a particular primitive. For those pixels where it is determined that all the sample areas of that pixel are located within that primitive, a value is stored in a multisample memory for a smaller number of the sample areas of that pixel than the total number of the sample areas of that pixel and data is stored indicating that all the sample areas of that pixel are located within that primitive.

Type: Grant

Filed: November 7, 2012

Date of Patent: March 1, 2016

Assignee: Imagination Technologies Limited

Inventors: Yoong Chert Foo, Salil Sahasrabudhe, Andrew Davy

prev 1 2 3 4 5 6 next