Patents by Inventor Shou Jen Lai

Shou Jen Lai has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Efficient work execution in a parallel computing system

Patent number: 11175920

Abstract: A computing device operative to perform parallel computations. The computing device includes a controller unit to assign workgroups to a set of batches. Each batch includes a program counter shared by M workgroups assigned to the batch, where M is a positive integer determined according to a configurable batch setting. Each batch further includes a set of thread processing units operative to execute, in parallel, a subset of work items in each of the M workgroups. Each batch further includes a spilling memory to store intermediate data of the M workgroups when one or more workgroups in the M workgroups encounters a synchronization barrier.

Type: Grant

Filed: April 25, 2019

Date of Patent: November 16, 2021

Assignee: MediaTek Inc.

Inventors: Shou-Jen Lai, Pei-Kuei Tsung, Po-Chun Fan, Sung-Fang Tsai

Parallel memory access to on-chip memory containing regions of different addressing schemes by threads executed on parallel processing units

Patent number: 10838656

Abstract: A system is provided to manage on-chip memory access for multiple threads. The system comprises multiple parallel processing units to execute the threads, and an on-chip memory including multiple memory units and each memory unit includes a first region and a second region. The first region and the second region have different memory addressing schemes for parallel access by the threads. The system further comprises an address decoder coupled to the parallel processing units and the on-chip memory. The address decoder is operative to activate access by the threads to memory locations in the first region or the second region according to decoded address signals from the parallel processing units.

Type: Grant

Filed: August 12, 2017

Date of Patent: November 17, 2020

Assignee: MediaTek Inc.

Inventors: Po-Chun Fan, Pei-Kuei Tsung, Sung-Fang Tsai, Chia-Hsien Chou, Shou-Jen Lai

Adaptive execution engine for convolution computing systems

Patent number: 10394929

Abstract: A system performs convolution computing in either a matrix mode or a filter mode. An analysis module generates a mode select signal to select the matrix mode or the filter mode based on results of analyzing convolution characteristics. The results include at least a comparison of resource utilization between the matrix mode and the filter mode. A convolution module includes processing elements, each of which further includes arithmetic computing circuitry. The convolution module is configured according to the matrix mode for performing matrix multiplications converted from convolution computations, and is configured according to the filter mode for performing the convolution computations.

Type: Grant

Filed: October 19, 2017

Date of Patent: August 27, 2019

Assignee: MediaTek, Inc.

Inventors: Sung-Fang Tsai, Pei-Kuei Tsung, Po-Chun Fan, Shou-Jen Lai
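
The matrix mode and filter mode named in the abstract above can be illustrated with a small sketch. This is not the patented design: it only shows that a convolution can be computed directly with a sliding window (filter mode) or by first unrolling patches into rows and multiplying by a flattened kernel (matrix mode), with the same result. The function names, the pure-Python list representation, and the `select_mode` rule (pick whichever mode has the higher estimated utilization) are assumptions made for illustration.

```python
def conv2d_filter_mode(image, kernel):
    """Direct sliding-window convolution (valid padding, kernel not flipped)."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(ow)]
            for r in range(oh)]


def conv2d_matrix_mode(image, kernel):
    """Same convolution expressed as a matrix product over unrolled patches."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    # Flatten the kernel into a single column vector.
    flat_kernel = [kernel[i][j] for i in range(kh) for j in range(kw)]
    # Unroll every input patch into one row of the patch matrix.
    patches = [[image[r + i][c + j] for i in range(kh) for j in range(kw)]
               for r in range(oh) for c in range(ow)]
    # Patch matrix times kernel vector gives one value per output position.
    products = [sum(p * k for p, k in zip(patch, flat_kernel))
                for patch in patches]
    return [products[r * ow:(r + 1) * ow] for r in range(oh)]


def select_mode(matrix_utilization, filter_utilization):
    """Hypothetical selector: choose the mode with higher estimated utilization."""
    return "matrix" if matrix_utilization >= filter_utilization else "filter"
```

Both functions return the same output for the same inputs; the choice between them affects only how the work maps onto processing elements, which is what the utilization comparison in the abstract is concerned with.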

EFFICIENT WORK EXECUTION IN A PARALLEL COMPUTING SYSTEM

Publication number: 20190250924

Abstract: A computing device operative to perform parallel computations. The computing device includes a controller unit to assign workgroups to a set of batches. Each batch includes a program counter shared by M workgroups assigned to the batch, where M is a positive integer determined according to a configurable batch setting. Each batch further includes a set of thread processing units operative to execute, in parallel, a subset of work items in each of the M workgroups. Each batch further includes a spilling memory to store intermediate data of the M workgroups when one or more workgroups in the M workgroups encounters a synchronization barrier.

Type: Application

Filed: April 25, 2019

Publication date: August 15, 2019

Inventors: Shou-Jen Lai, Pei-Kuei Tsung, Po-Chun Fan, Sung-Fang Tsai

Memory shuffle engine for efficient work execution in a parallel computing system

Patent number: 10324730

Abstract: A computing device performs parallel computations using a set of thread processing units and a memory shuffle engine. The memory shuffle engine includes a register array to store an array of data elements retrieved from a memory buffer, and an array of input selectors. According to a first control signal, each input selector transfers at least a first data element from a corresponding subset of the register array, which is coupled to the input selector via input lines, to one or more corresponding thread processing units. According to a second control signal, each input selector transfers at least a second data element from another subset of the register array, which is coupled to another input selector via other input lines, to the one or more corresponding thread processing units.

Type: Grant

Filed: October 4, 2016

Date of Patent: June 18, 2019

Assignee: MediaTek, Inc.

Inventors: Shou-Jen Lai, Pei-Kuei Tsung, Po-Chun Fan, Sung-Fang Tsai

ADAPTIVE EXECUTION ENGINE FOR CONVOLUTION COMPUTING SYSTEMS

Publication number: 20180173676

Abstract: A system performs convolution computing in either a matrix mode or a filter mode. An analysis module generates a mode select signal to select the matrix mode or the filter mode based on results of analyzing convolution characteristics. The results include at least a comparison of resource utilization between the matrix mode and the filter mode. A convolution module includes processing elements, each of which further includes arithmetic computing circuitry. The convolution module is configured according to the matrix mode for performing matrix multiplications converted from convolution computations, and is configured according to the filter mode for performing the convolution computations.

Type: Application

Filed: October 19, 2017

Publication date: June 21, 2018

Inventors: Sung-Fang Tsai, Pei-Kuei Tsung, Po-Chun Fan, Shou-Jen Lai

HYBRID MEMORY ACCESS TO ON-CHIP MEMORY BY PARALLEL PROCESSING UNITS

Publication number: 20180173463

Abstract: A system is provided to manage on-chip memory access for multiple threads. The system comprises multiple parallel processing units to execute the threads, and an on-chip memory including multiple memory units and each memory unit includes a first region and a second region. The first region and the second region have different memory addressing schemes for parallel access by the threads. The system further comprises an address decoder coupled to the parallel processing units and the on-chip memory. The address decoder is operative to activate access by the threads to memory locations in the first region or the second region according to decoded address signals from the parallel processing units.

Type: Application

Filed: August 12, 2017

Publication date: June 21, 2018

Inventors: Po-Chun Fan, Pei-Kuei Tsung, Sung-Fang Tsai, Chia-Hsien Chou, Shou-Jen Lai

Apparatus for performing tessellation operation and methods utilizing the same

Patent number: 9786098

Abstract: A rendering method executed by a graphics processing unit includes: loading a vertex shading command from a first command queue to a shader module; executing the vertex shading command for computing the varying of the vertices to perform a vertex shading operation by taking the vertices as first input data; storing first tessellation stage commands into a second command queue; loading the first tessellation stage commands to the shader module; and executing the first tessellation commands for computing first tessellation stage outputs to perform a first tessellation stage of the one or more tessellation stages by taking the varying of the vertices as second input data. The vertex shading command is stored into the first command queue by a first processing unit. The varying of the vertices and the first tessellation stage outputs are stored in a cache of the graphics processing unit.

Type: Grant

Filed: July 6, 2015

Date of Patent: October 10, 2017

Assignee: MEDIATEK INC.

Inventors: Pei-Kuei Tsung, Shou-Jen Lai, Yan-Hong Lu, Sung-Fang Tsai, Chien-Ping Lu

EFFICIENT WORK EXECUTION IN A PARALLEL COMPUTING SYSTEM

Publication number: 20170277567

Abstract: A computing device performs parallel computations using a set of thread processing units and a memory shuffle engine. The memory shuffle engine includes a register array to store an array of data elements retrieved from a memory buffer, and an array of input selectors. According to a first control signal, each input selector transfers at least a first data element from a corresponding subset of the register array, which is coupled to the input selector via input lines, to one or more corresponding thread processing units. According to a second control signal, each input selector transfers at least a second data element from another subset of the register array, which is coupled to another input selector via other input lines, to the one or more corresponding thread processing units.

Type: Application

Filed: October 4, 2016

Publication date: September 28, 2017

Inventors: Shou-Jen Lai, Pei-Kuei Tsung, Po-Chun Fan, Sung-Fang Tsai

HETEROGENEOUS COMPUTING SYSTEM WITH A SHARED COMPUTING UNIT AND SEPARATE MEMORY CONTROLS

Publication number: 20170262291

Abstract: A heterogeneous computing system described herein includes a parallel processing module shared among a set of heterogeneous processors. The processors have different processor types, and each processor includes an internal memory unit to store its current context. The parallel processing module includes multiple execution units. A switch module is coupled to the processors and the parallel processing module. The switch module is operative to select, according to a control signal, one of the processors to use the parallel processing module for executing an instruction with multiple data entries in parallel.

Type: Application

Filed: March 9, 2016

Publication date: September 14, 2017

Inventors: Shou-Jen Lai, Pei-Kuei Tsung, Sung-Fang Tsai

Graphic processing system and method thereof

Patent number: 9760969

Abstract: A graphic processing system and a method of graphic processing are provided. The graphic processing system has a collector, a plurality of slots, a scheduler, an arbiter and at least an arithmetic logic unit (ALU). The collector is configured to group a plurality of workitems into elementary wavefronts. Each of the elementary wavefronts comprises workitems configured to execute the same kernel code. The scheduler is configured to allocate the elementary wavefronts to the slots. Two or more of the elementary wavefronts exist at one slot to form one of a plurality of macro wavefronts. The arbiter is configured to select one of the macro wavefronts. The ALU is configured to execute workitems of at least an elementary wavefront of the selected macro wavefront and output results of execution of the workitems.

Type: Grant

Filed: March 9, 2015

Date of Patent: September 12, 2017

Assignee: MEDIATEK INC.

Inventors: Ming-Hao Liao, Shou-Jen Lai, Chia-Hsien Chou, Po-Chun Fan, Yan-Hong Lu, Chih-Chung Cheng, Hung-Yau Lin
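
The collector and scheduler roles described in the abstract above can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: work items are modeled as hypothetical `(kernel_name, item_id)` tuples, `wavefront_size` and `per_slot` are made-up parameters, and the arbiter and ALU stages are omitted.

```python
from collections import defaultdict


def collect(workitems, wavefront_size):
    """Group work items that run the same kernel into elementary wavefronts."""
    by_kernel = defaultdict(list)
    for kernel, item in workitems:
        by_kernel[kernel].append(item)
    wavefronts = []
    for kernel, items in by_kernel.items():
        # Split each kernel's items into fixed-size elementary wavefronts.
        for start in range(0, len(items), wavefront_size):
            wavefronts.append((kernel, items[start:start + wavefront_size]))
    return wavefronts


def schedule(wavefronts, per_slot):
    """Place up to per_slot elementary wavefronts in each slot.

    The wavefronts sharing a slot stand in for one macro wavefront.
    """
    return [wavefronts[start:start + per_slot]
            for start in range(0, len(wavefronts), per_slot)]
```

With `per_slot` set to 2 or more, each fully populated slot holds several elementary wavefronts, matching the abstract's description of a macro wavefront; a final slot may hold fewer if the wavefront count does not divide evenly.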

APPARATUS FOR PERFORMING TESSELLATION OPERATION AND METHODS UTILIZING THE SAME

Publication number: 20170011550

Abstract: A rendering method executed by a graphics processing unit includes: loading a vertex shading command from a first command queue to a shader module; executing the vertex shading command for computing the varying of the vertices to perform a vertex shading operation by taking the vertices as first input data; storing first tessellation stage commands into a second command queue; loading the first tessellation stage commands to the shader module; and executing the first tessellation commands for computing first tessellation stage outputs to perform a first tessellation stage of the one or more tessellation stages by taking the varying of the vertices as second input data. The vertex shading command is stored into the first command queue by a first processing unit. The varying of the vertices and the first tessellation stage outputs are stored in a cache of the graphics processing unit.

Type: Application

Filed: July 6, 2015

Publication date: January 12, 2017

Inventors: Pei-Kuei TSUNG, Shou-Jen LAI, Yan-Hong LU, Sung-Fang TSAI, Chien-Ping LU

GRAPHIC PROCESSING SYSTEM AND METHOD THEREOF

Publication number: 20160267621

Abstract: A graphic processing system and a method of graphic processing are provided. The graphic processing system has a collector, a plurality of slots, a scheduler, an arbiter and at least an arithmetic logic unit (ALU). The collector is configured to group a plurality of workitems into elementary wavefronts. Each of the elementary wavefronts comprises workitems configured to execute the same kernel code. The scheduler is configured to allocate the elementary wavefronts to the slots. Two or more of the elementary wavefronts exist at one slot to form one of a plurality of macro wavefronts. The arbiter is configured to select one of the macro wavefronts. The ALU is configured to execute workitems of at least an elementary wavefront of the selected macro wavefront and output results of execution of the workitems.

Type: Application

Filed: March 9, 2015

Publication date: September 15, 2016

Inventors: Ming-Hao Liao, Shou-Jen Lai, Chia-Hsien Chou, Po-Chun Fan, Yan-Hong Lu, Chih-Chung Cheng, Hung-Yau Lin

Apparatus and method for pixel block compression during rendering in computer graphics

Publication number: 20030016226

Abstract: An apparatus and method for pixel block compression during rendering in computer graphics is proposed. The method is to divide the image frame into a plurality of blocks and compute those blocks covered by a rendering triangle. If a block is not totally covered by the triangle, the method will read in and decompress the block for reference. Then, the system will render the blocks covered by the triangle and compress each block. At last, the system stores the compressed data stream into memory. The compression method is first to compute a plurality of initial seed colors for clustering the block of pixels. Then, each pixel within the block will be classified into groups with the corresponding initial seed colors. Those pixels with the same initial seed color are averaged to become a new final seed color. Therefore, the coded data comprise the index table and the final seed colors.

Type: Application

Filed: July 18, 2001

Publication date: January 23, 2003

Inventors: Chung-Yen Lu, Shou Jen Lai
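
The clustering step in the abstract above (classify each pixel by initial seed color, average each group into a final seed color, emit an index table plus the final seeds) can be sketched as below. This is a minimal illustration, not the patented method: the abstract does not say how the initial seed colors are computed, so the sketch takes them as an input, and the squared-distance classification rule and all function names are assumptions.

```python
def compress_block(pixels, initial_seeds):
    """Return (index_table, final_seeds) for one block of RGB pixels."""
    def dist2(a, b):
        # Squared Euclidean distance between two colors (assumed metric).
        return sum((x - y) ** 2 for x, y in zip(a, b))

    # Classify each pixel into the group of its nearest initial seed color.
    index_table = [min(range(len(initial_seeds)),
                       key=lambda k: dist2(p, initial_seeds[k]))
                   for p in pixels]

    # Average the pixels of each group to obtain the final seed colors.
    final_seeds = []
    for k, seed in enumerate(initial_seeds):
        members = [p for p, idx in zip(pixels, index_table) if idx == k]
        if members:
            final_seeds.append(tuple(sum(ch) / len(members)
                                     for ch in zip(*members)))
        else:
            # Empty group: keep the initial seed unchanged.
            final_seeds.append(tuple(float(c) for c in seed))
    return index_table, final_seeds


def decompress_block(index_table, final_seeds):
    """Rebuild an approximate block by looking up each pixel's final seed."""
    return [final_seeds[i] for i in index_table]
```

The coded data is the pair returned by `compress_block`: one small index per pixel plus a handful of colors, which is where the size reduction comes from. Decompression is lossy, since every pixel in a group is replaced by that group's averaged color.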