Patents by Inventor Jung Hau Foo

Jung Hau Foo has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250147905
    Abstract: A ring buffer storage method includes generating data of a first output according to Q input tokens of a large language model (LLM), and writing the data of the first output into last Q column vectors of an updated first cache tensor buffer matrix. A starting memory address of a first cache tensor buffer is shifted according to the number Q of input tokens of the LLM for updating the first cache tensor buffer. The first cache tensor buffer forms a first cache tensor buffer matrix. The updated first cache tensor buffer forms the updated first cache tensor buffer matrix. The first cache tensor buffer matrix includes a plurality of space segments. Each row of the first cache tensor buffer matrix includes C space segments. C is a cache size. The plurality of space segments have continuous memory addresses.
    Type: Application
    Filed: November 3, 2024
    Publication date: May 8, 2025
    Applicant: MediaTek Singapore Pte. Ltd.
    Inventors: Jung Hau FOO, Jia Yao Christopher LIM, Deep Yap, Kelvin Kae Wen TEH
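The cache update described in this abstract can be emulated with a short (copying) sketch: each step drops the oldest Q columns of the cache matrix and writes the Q newest output columns into the last Q column positions. Note the abstract's contribution is obtaining this effect *without* a copy, by shifting the buffer's starting memory address by Q segments; the names R, C, and Q follow the abstract, and everything else below is an illustrative assumption.

```python
import numpy as np

R, C, Q = 4, 8, 2                          # rows, cache size, tokens per step

# The "cache tensor buffer matrix": R rows of C contiguous space segments.
cache = np.arange(R * C, dtype=float).reshape(R, C)
new_cols = -np.ones((R, Q))                # data of the "first output" for Q tokens

cache = np.roll(cache, -Q, axis=1)         # oldest Q columns fall out of the window
cache[:, -Q:] = new_cols                   # write new data into the last Q columns
```

In the patented method the same window shift is achieved by advancing the buffer's starting address, so no element copies are needed for the old columns.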
  • Publication number: 20250053821
    Abstract: An auto-regressive method for a large language model includes receiving a hidden state associated with at least one token, generating key data, first value data, and query data according to a received hidden state, generating first positionally encoded key data by encoding the key data positionally, generating positionally encoded query data by encoding the query data positionally, performing first element-wise dot product operations according to the first positionally encoded key data, the positionally encoded query data, and second positionally encoded key data to generate an attention score, performing second element-wise dot product operations according to the first value data, the attention score, and second value data to generate an attention output, and adding the attention output and the hidden state to generate an updated hidden output.
    Type: Application
    Filed: July 11, 2024
    Publication date: February 13, 2025
    Applicant: MediaTek Singapore Pte. Ltd.
    Inventors: Jia Yao Christopher LIM, Kelvin Kae Wen TEH, Po-Yen LIN, Jung Hau FOO, Chia-Wei HSU, Yu-Lung LU, Hung-Jen CHEN, Chung-Li LU, Wai Mun WONG
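As a rough illustration of the decode step this abstract describes: a token's hidden state produces query, key, and value data; the query and the new key are positionally encoded; the attention score is computed from dot products against previously encoded (cached) keys; and the attention output is added back to the hidden state. The rotary-style encoding, projection matrices, and shapes below are assumptions for illustration, not the patent's implementation.

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Toy rotary positional encoding of an even-length vector (assumed form)."""
    half = x.shape[-1] // 2
    theta = pos / base ** (np.arange(half) / half)
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[:half], x[half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos])

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def decode_step(h, pos, Wq, Wk, Wv, k_cache, v_cache):
    q = rope(Wq @ h, pos)                        # positionally encoded query data
    k = rope(Wk @ h, pos)                        # first positionally encoded key data
    v = Wv @ h                                   # first value data
    keys = np.vstack([k_cache, k])               # cached ("second") keys + new key
    vals = np.vstack([v_cache, v])               # cached ("second") values + new value
    score = softmax(keys @ q / np.sqrt(len(q)))  # dot products -> attention score
    attn = score @ vals                          # weighted sum over value data
    return h + attn, keys, vals                  # residual add -> updated hidden output
```

With identity projections, an empty cache, and position 0 the step reduces to attending only to the current token, so the updated hidden output is simply twice the input hidden state.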
  • Publication number: 20250045523
    Abstract: An execution method of a machine learning model, comprising: generating output and a begin of sentence (BoS) cache of a BoS token using the machine learning model before or after performing model quantization on the machine learning model to generate a quantized model; and executing inference based on the quantized model, and during the inference, inputting the next token following the BoS token as a first input token, together with the BoS cache, into the quantized model to generate output and cache of the next token, wherein the next token is based on the output of the BoS token or based on an input content.
    Type: Application
    Filed: June 21, 2024
    Publication date: February 6, 2025
    Applicant: MEDIATEK INC.
    Inventors: Min-Yuan Tseng, Jung-Hau FOO
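A minimal sketch of the idea above: the BoS token's output and cache are computed once, outside the quantized inference loop, and quantized inference then starts directly from the first real input token with the precomputed BoS cache. The toy `toy_model` function below stands in for both the original and the quantized model; all names and the toy cache format are illustrative assumptions.

```python
def toy_model(token, cache):
    """Stand-in for one decode step: returns (output, updated cache)."""
    cache = cache + [token]          # the cache simply accumulates seen tokens
    output = sum(cache)              # toy "logits": sum of cached tokens
    return output, cache

BOS = 1

# Done once, before (or after) quantization: generate the BoS token's
# output and cache with the model.
bos_out, bos_cache = toy_model(BOS, [])

# Quantized inference then begins at the next token following BoS,
# reusing the precomputed BoS cache instead of re-running the BoS step.
first_token = 7                      # next token (from BoS output or input content)
out, cache = toy_model(first_token, bos_cache)
```

The saving is that the quantized model never has to recompute the BoS step itself; its cache is injected ready-made at the start of inference.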
  • Publication number: 20240004952
    Abstract: A bit-widths determination method selects bit-widths for mixed-precision neural network computing on a target hardware platform. An activation quantization sensitivity (AQS) value is calculated for each convolution layer in a neural network. The AQS value indicates the sensitivity of convolution output to quantized convolution input. One or more convolution layers are grouped into a quantization group, which is to be executed by a corresponding set of target hardware. A group AQS value is calculated for each quantization group based on the AQS values of the convolution layers in the quantization group. Then bit-widths supported by the target hardware platform are selected for the corresponding quantization groups. The bit-widths are selected to optimize, under a given constraint, a sensitivity metric that is calculated based on each quantization group's group AQS value.
    Type: Application
    Filed: June 29, 2022
    Publication date: January 4, 2024
    Inventors: Hantao Huang, Ziang Yang, Jia Yao Christopher Lim, Jung Hau Foo, Chia-Lin Yu
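The selection step described above can be sketched as a small search: each quantization group carries a group AQS value, each supported bit-width has a hardware cost, and the method picks the combination that optimizes an AQS-based sensitivity metric under a given constraint. The specific metric form (group AQS divided by bit-width) and the cost model below are assumptions chosen for illustration; the patent only specifies that the metric is calculated from the group AQS values.

```python
import itertools

def select_bitwidths(group_aqs, supported_bw, cost_per_bw, budget):
    """Exhaustively assign one supported bit-width per quantization group,
    minimizing an assumed sensitivity metric sum(aqs / bw) subject to a
    total hardware-cost budget (the "given constraint")."""
    best_combo, best_metric = None, float("inf")
    for combo in itertools.product(supported_bw, repeat=len(group_aqs)):
        if sum(cost_per_bw[bw] for bw in combo) > budget:
            continue                                  # violates the constraint
        metric = sum(aqs / bw for aqs, bw in zip(group_aqs, combo))
        if metric < best_metric:
            best_metric, best_combo = metric, combo
    return best_combo

# Two groups; the more quantization-sensitive group (higher AQS)
# should receive the wider bit-width under the budget.
choice = select_bitwidths(group_aqs=[5.0, 1.0],
                          supported_bw=[4, 8],
                          cost_per_bw={4: 1, 8: 2},
                          budget=3)
```

With these toy numbers the budget rules out 8-bit everywhere, and the search gives 8 bits to the sensitive group and 4 bits to the insensitive one.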