Patents by Inventor Jung Hau Foo

Jung Hau Foo has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250147905
    Abstract: A ring buffer storage method includes generating data of a first output according to Q input tokens of a large language model (LLM), and writing the data of the first output into last Q column vectors of an updated first cache tensor buffer matrix. A starting memory address of a first cache tensor buffer is shifted according to the number Q of input tokens of the LLM for updating the first cache tensor buffer. The first cache tensor buffer forms a first cache tensor buffer matrix. The updated first cache tensor buffer forms the updated first cache tensor buffer matrix. The first cache tensor buffer matrix includes a plurality of space segments. Each row of the first cache tensor buffer matrix includes C space segments. C is a cache size. The plurality of space segments have continuous memory addresses.
    Type: Application
    Filed: November 3, 2024
    Publication date: May 8, 2025
    Applicant: MediaTek Singapore Pte. Ltd.
    Inventors: Jung Hau FOO, Jia Yao Christopher LIM, Deep Yap, Kelvin Kae Wen TEH
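The cache update described in this abstract can be emulated with a short (copying) sketch: each step drops the oldest Q columns of the cache matrix and writes the Q newest output columns into the last Q column positions. Note the abstract's contribution is obtaining this effect *without* a copy, by shifting the buffer's starting memory address by Q segments; the names R, C, and Q follow the abstract, and everything else below is an illustrative assumption.

```python
import numpy as np

R, C, Q = 4, 8, 2                          # rows, cache size, tokens per step

# The "cache tensor buffer matrix": R rows of C contiguous space segments.
cache = np.arange(R * C, dtype=float).reshape(R, C)
new_cols = -np.ones((R, Q))                # data of the "first output" for Q tokens

cache = np.roll(cache, -Q, axis=1)         # oldest Q columns fall out of the window
cache[:, -Q:] = new_cols                   # write new data into the last Q columns
```

In the patented method the same window shift is achieved by advancing the buffer's starting address, so no element copies are needed for the old columns.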
  • Publication number: 20250053821
    Abstract: An auto-regressive method for a large language model includes receiving a hidden state associated with at least one token, generating key data, first value data, and query data according to a received hidden state, generating first positionally encoded key data by encoding the key data positionally, generating positionally encoded query data by encoding the query data positionally, performing first element-wise dot product operations according to the first positionally encoded key data, the positionally encoded query data, and second positionally encoded key data to generate an attention score, performing second element-wise dot product operations according to the first value data, the attention score, and second value data to generate an attention output, and adding the attention output and the hidden state to generate an updated hidden output.
    Type: Application
    Filed: July 11, 2024
    Publication date: February 13, 2025
    Applicant: MediaTek Singapore Pte. Ltd.
    Inventors: Jia Yao Christopher LIM, Kelvin Kae Wen TEH, Po-Yen LIN, Jung Hau FOO, Chia-Wei HSU, Yu-Lung LU, Hung-Jen CHEN, Chung-Li LU, Wai Mun WONG
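As a rough illustration of the decode step this abstract describes: a token's hidden state produces query, key, and value data; the query and the new key are positionally encoded; the attention score is computed from dot products against previously encoded (cached) keys; and the attention output is added back to the hidden state. The rotary-style encoding, projection matrices, and shapes below are assumptions for illustration, not the patent's implementation.

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Toy rotary positional encoding of an even-length vector (assumed form)."""
    half = x.shape[-1] // 2
    theta = pos / base ** (np.arange(half) / half)
    cos, sin = np.cos(theta), np.sin(theta)
    x1, x2 = x[:half], x[half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos])

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def decode_step(h, pos, Wq, Wk, Wv, k_cache, v_cache):
    q = rope(Wq @ h, pos)                        # positionally encoded query data
    k = rope(Wk @ h, pos)                        # first positionally encoded key data
    v = Wv @ h                                   # first value data
    keys = np.vstack([k_cache, k])               # cached ("second") keys + new key
    vals = np.vstack([v_cache, v])               # cached ("second") values + new value
    score = softmax(keys @ q / np.sqrt(len(q)))  # dot products -> attention score
    attn = score @ vals                          # weighted sum over value data
    return h + attn, keys, vals                  # residual add -> updated hidden output
```

With identity projections, an empty cache, and position 0 the step reduces to attending only to the current token, so the updated hidden output is simply twice the input hidden state.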
  • Publication number: 20250045523
    Abstract: An execution method of a machine learning model, comprising: generating output and a begin of sentence (BoS) cache of a BoS token using the machine learning model before or after performing model quantization on the machine learning model to generate a quantized model; and executing inference based on the quantized model, and during the inference, inputting the next token following the BoS token as a first input token, together with the BoS cache, into the quantized model to generate output and cache of the next token, wherein the next token is based on the output of the BoS token or based on an input content.
    Type: Application
    Filed: June 21, 2024
    Publication date: February 6, 2025
    Applicant: MEDIATEK INC.
    Inventors: Min-Yuan Tseng, Jung-Hau FOO
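A minimal sketch of the idea above: the BoS token's output and cache are computed once, outside the quantized inference loop, and quantized inference then starts directly from the first real input token with the precomputed BoS cache. The toy `toy_model` function below stands in for both the original and the quantized model; all names and the toy cache format are illustrative assumptions.

```python
def toy_model(token, cache):
    """Stand-in for one decode step: returns (output, updated cache)."""
    cache = cache + [token]          # the cache simply accumulates seen tokens
    output = sum(cache)              # toy "logits": sum of cached tokens
    return output, cache

BOS = 1

# Done once, before (or after) quantization: generate the BoS token's
# output and cache with the model.
bos_out, bos_cache = toy_model(BOS, [])

# Quantized inference then begins at the next token following BoS,
# reusing the precomputed BoS cache instead of re-running the BoS step.
first_token = 7                      # next token (from BoS output or input content)
out, cache = toy_model(first_token, bos_cache)
```

The saving is that the quantized model never has to recompute the BoS step itself; its cache is injected ready-made at the start of inference.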
  • Publication number: 20240004952
    Abstract: A bit-widths determination method selects bit-widths for mixed-precision neural network computing on a target hardware platform. An activation quantization sensitivity (AQS) value is calculated for each convolution layer in a neural network. The AQS value indicates the sensitivity of convolution output to quantized convolution input. One or more convolution layers are grouped into a quantization group, which is to be executed by a corresponding set of target hardware. A group AQS value is calculated for each quantization group based on the AQS values of the convolution layers in the quantization group. Then bit-widths supported by the target hardware platform are selected for the corresponding quantization groups. The bit-widths are selected to optimize, under a given constraint, a sensitivity metric that is calculated based on each quantization group's group AQS value.
    Type: Application
    Filed: June 29, 2022
    Publication date: January 4, 2024
    Inventors: Hantao Huang, Ziang Yang, Jia Yao Christopher Lim, Jung Hau Foo, Chia-Lin Yu
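The selection step described above can be sketched as a small search: each quantization group carries a group AQS value, each supported bit-width has a hardware cost, and the method picks the combination that optimizes an AQS-based sensitivity metric under a given constraint. The specific metric form (group AQS divided by bit-width) and the cost model below are assumptions chosen for illustration; the patent only specifies that the metric is calculated from the group AQS values.

```python
import itertools

def select_bitwidths(group_aqs, supported_bw, cost_per_bw, budget):
    """Exhaustively assign one supported bit-width per quantization group,
    minimizing an assumed sensitivity metric sum(aqs / bw) subject to a
    total hardware-cost budget (the "given constraint")."""
    best_combo, best_metric = None, float("inf")
    for combo in itertools.product(supported_bw, repeat=len(group_aqs)):
        if sum(cost_per_bw[bw] for bw in combo) > budget:
            continue                                  # violates the constraint
        metric = sum(aqs / bw for aqs, bw in zip(group_aqs, combo))
        if metric < best_metric:
            best_metric, best_combo = metric, combo
    return best_combo

# Two groups; the more quantization-sensitive group (higher AQS)
# should receive the wider bit-width under the budget.
choice = select_bitwidths(group_aqs=[5.0, 1.0],
                          supported_bw=[4, 8],
                          cost_per_bw={4: 1, 8: 2},
                          budget=3)
```

With these toy numbers the budget rules out 8-bit everywhere, and the search gives 8 bits to the sensitive group and 4 bits to the insensitive one.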