Patents by Inventor Mankit Lo
Mankit Lo has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20230262210
Abstract: A data compression method is provided for compressing an image. A coding module may select a plurality of pixels with a sequence order from the image, and compress the plurality of pixels to generate a plurality of compressed pixels. For a current pixel p[i] having a previous pixel p[i-1] and a next pixel p[i+1], the coding module generates a coding mode M[i+1] configured for compressing the p[i+1], and generates a fixed-rate compressed value c[i] corresponding to the p[i]. The coding module stores the c[i] in a compressed pixel, and c[i] encapsulates the coding mode M[i+1]. The coding module then stores the plurality of compressed pixels into a compressed image corresponding to the image.
Type: Application
Filed: January 30, 2023
Publication date: August 17, 2023
Applicant: VeriSilicon Holdings Co., Ltd.
Inventors: Lefan ZHONG, Mankit LO, Wei MIAO
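The look-ahead scheme in this abstract — each fixed-rate value c[i] carrying the coding mode M[i+1] for the *next* pixel — can be sketched as follows. The mode-selection rule, the bit widths, and the quantization are hypothetical stand-ins, not the patented method:

```python
# Sketch: each fixed-rate compressed value c[i] encapsulates the coding
# mode M[i+1] used for the next pixel. MODE_BITS/VALUE_BITS and the mode
# choice below are illustrative assumptions.

MODE_BITS = 2      # assumed: bits reserved for the next pixel's mode
VALUE_BITS = 6     # assumed: bits for the fixed-rate compressed value

def choose_mode(prev, nxt):
    # Hypothetical mode choice: small deltas favor delta coding (mode 1),
    # otherwise direct quantization (mode 0).
    return 1 if abs(nxt - prev) < 16 else 0

def compress(pixels):
    out = []
    for i, p in enumerate(pixels):
        nxt_mode = choose_mode(p, pixels[i + 1]) if i + 1 < len(pixels) else 0
        value = p >> 2                        # crude 8-bit -> 6-bit quantization
        c = (nxt_mode << VALUE_BITS) | value  # c[i] encapsulates M[i+1]
        out.append(c)
    return out

compressed = compress([10, 12, 200, 198])
# Every output word has the same width: fixed-rate compression.
assert all(c < (1 << (MODE_BITS + VALUE_BITS)) for c in compressed)
```

Because the mode bits ride along inside the previous fixed-rate word, a decoder always knows how to interpret the next pixel without any variable-rate side channel.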
-
Patent number: 11599334
Abstract: A device for performing multiply/accumulate operations processes values in first and second buffers having a first width using a computational pipeline with a second width, such as half the first width. A sequencer processes combinations of portions (high-high, low-low, high-low, low-high) of the values in the first and second buffers using a multiply/accumulate circuit and adds the accumulated result of each combination of portions to a group accumulator. Adding to the group accumulator may be preceded by left shifting the accumulated result (the first width for the high-high combination and the second width for the low-high and high-low combinations).
Type: Grant
Filed: June 9, 2020
Date of Patent: March 7, 2023
Assignees: VeriSilicon Microelectronics, VeriSilicon Holdings Co., Ltd.
Inventors: Mankit Lo, Meng Yue, Jin Zhang
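The four-combination decomposition described here is the standard identity for building a wide multiply out of a half-width datapath; a minimal sketch, with 16-bit operands on an assumed 8-bit pipeline:

```python
# Sketch of the half-width multiply scheme: two first-width (16-bit)
# operands are combined from four high/low partial products, each
# left-shifted as the abstract describes before entering the accumulator.

FIRST_WIDTH = 16
SECOND_WIDTH = FIRST_WIDTH // 2  # half the first width

def wide_multiply(a, b):
    mask = (1 << SECOND_WIDTH) - 1
    a_hi, a_lo = a >> SECOND_WIDTH, a & mask
    b_hi, b_lo = b >> SECOND_WIDTH, b & mask
    acc = 0
    acc += (a_hi * b_hi) << FIRST_WIDTH    # high-high: shift by first width
    acc += (a_hi * b_lo) << SECOND_WIDTH   # high-low:  shift by second width
    acc += (a_lo * b_hi) << SECOND_WIDTH   # low-high:  shift by second width
    acc += a_lo * b_lo                     # low-low:   no shift
    return acc

assert wide_multiply(0x1234, 0xABCD) == 0x1234 * 0xABCD
```

The sequencer in the claimed device iterates these four partial products on one narrow multiply/accumulate circuit instead of instantiating a full-width multiplier.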
-
Patent number: 11455781
Abstract: The present disclosure provides a data reading/writing method and system for 3D image processing, a storage medium and a terminal. The method includes the following steps: dividing a 3D image horizontally based on the vertical sliding technology, so that the 3D image is divided into at least two subimages; storing the processing data of each subimage in a circular buffer, and after a subimage is processed, retaining in the circular buffer the overlapping portion of data required by the next subimage; and dividing a multi-layer network of an image processing algorithm into at least two segments, so that the data between adjacent layers in each segment interact only through the buffer, not through DDR.
Type: Grant
Filed: September 25, 2019
Date of Patent: September 27, 2022
Assignees: VeriSilicon Microelectronics (Shanghai) Co., Ltd., VeriSilicon Holdings Co., Ltd., VeriSilicon Microelectronics (Nanjing) Co., Ltd.
Inventors: Zhonghao Cui, Mankit Lo, Ke Zhang, Huiming Zhang
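The vertical-sliding idea can be sketched as below: slices of the image live in a small circular buffer, and only the overlap rows needed by the next slice are retained rather than re-fetched from DDR. The slice height, overlap, and per-slice computation are illustrative assumptions:

```python
# Sketch: process an image in horizontal slices; the Python list stands in
# for the on-chip circular buffer, and sum() stands in for the per-slice
# network segment. Only the overlap rows survive between slices.

def process_in_slices(rows, slice_height, overlap):
    buffer = []                      # stands in for the circular buffer
    results = []
    i = 0
    while i < len(rows):
        # Fill the buffer up to one subimage, reusing retained overlap rows.
        need = slice_height - len(buffer)
        buffer.extend(rows[i:i + need])
        i += need
        results.append(sum(buffer))  # placeholder for processing the subimage
        buffer = buffer[-overlap:]   # retain only the overlap for the next slice
    return results

# 8 rows, slices of 4 with 1 overlap row -> three subimages.
slices = process_in_slices(list(range(8)), slice_height=4, overlap=1)
assert len(slices) == 3
```

Keeping the overlap on chip is what lets adjacent network layers exchange data through the buffer alone.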
-
Publication number: 20220269484
Abstract: Example accumulation systems and methods are described. In one implementation, data is received for processing. A multiplication operation is performed on the received data to generate multiplied data. An addition operation is performed on the multiplied data to generate a result. At least a portion of the least significant bits of the result are stored in a first region of an accumulation buffer of a convolution core, and at least a portion of the remaining bits of the result are stored in a shared memory that is separate from the convolution core.
Type: Application
Filed: February 19, 2021
Publication date: August 25, 2022
Inventor: Mankit Lo
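A minimal sketch of this split accumulation, assuming a 16-bit low region (the widths and the dict-based "memories" are illustrative, not from the publication):

```python
# Sketch: after each multiply-add, the low bits of the running sum stay in
# the convolution core's accumulation buffer while the remaining high bits
# spill to a separate shared memory.

LOW_BITS = 16
LOW_MASK = (1 << LOW_BITS) - 1

acc_buffer = {}     # stands in for the per-core accumulation buffer (LSBs)
shared_mem = {}     # stands in for the shared memory (remaining MSBs)

def accumulate(slot, a, b):
    total = ((shared_mem.get(slot, 0) << LOW_BITS)
             | acc_buffer.get(slot, 0)) + a * b   # multiply, then add
    acc_buffer[slot] = total & LOW_MASK           # LSBs stay in the core
    shared_mem[slot] = total >> LOW_BITS          # MSBs go to shared memory

for _ in range(1000):
    accumulate(0, 300, 400)
assert (shared_mem[0] << LOW_BITS) | acc_buffer[0] == 1000 * 300 * 400
```

The payoff of such a split is that the frequently-updated low bits occupy narrow, fast storage close to the multiplier, while the rarely-changing high bits can live in cheaper shared memory.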
-
Patent number: 11301214
Abstract: A circuit for performing multiply/accumulate operations evaluates a type of each value of a pair of input values. Signed values are split into sign and magnitude. One or more pairs of arguments are input to a multiplier such that the arguments have fewer bits than the magnitude of signed values or unsigned values. This may include splitting input values into multiple arguments and inputting multiple pairs of arguments to the multiplier for a single pair of input values.
Type: Grant
Filed: June 9, 2020
Date of Patent: April 12, 2022
Assignees: VeriSilicon Microelectronics (Shanghai) Co., Ltd., VeriSilicon Holdings Co., Ltd.
Inventors: Mankit Lo, Meng Yue, Jin Zhang
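The sign/magnitude split can be illustrated with a short sketch. The point of the split is that the magnitude of an n-bit two's-complement value fits in n-1 bits, so a narrower unsigned multiplier can handle it; the sign is reapplied afterwards:

```python
# Sketch: split each signed operand into (sign, magnitude), multiply the
# magnitudes as unsigned values, then reapply the combined sign.

def split_sign(x):
    return (1, -x) if x < 0 else (0, x)   # (sign bit, magnitude)

def signed_multiply(a, b):
    sa, ma = split_sign(a)
    sb, mb = split_sign(b)
    product = ma * mb                     # unsigned multiplier sees magnitudes
    return -product if sa ^ sb else product

assert signed_multiply(-7, 6) == -42
assert signed_multiply(-7, -6) == 42
```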
-
Publication number: 20210382690
Abstract: A device for performing multiply/accumulate operations processes values in first and second buffers having a first width using a computational pipeline with a second width, such as half the first width. A sequencer processes combinations of portions (high-high, low-low, high-low, low-high) of the values in the first and second buffers using a multiply/accumulate circuit and adds the accumulated result of each combination of portions to a group accumulator. Adding to the group accumulator may be preceded by left shifting the accumulated result (the first width for the high-high combination and the second width for the low-high and high-low combinations).
Type: Application
Filed: June 9, 2020
Publication date: December 9, 2021
Inventors: Mankit Lo, Meng Yue, Jin Zhang
-
Publication number: 20210382689
Abstract: A circuit for performing multiply/accumulate operations evaluates a type of each value of a pair of input values. Signed values are split into sign and magnitude. One or more pairs of arguments are input to a multiplier such that the arguments have fewer bits than the magnitude of signed values or unsigned values. This may include splitting input values into multiple arguments and inputting multiple pairs of arguments to the multiplier for a single pair of input values.
Type: Application
Filed: June 9, 2020
Publication date: December 9, 2021
Inventors: Mankit Lo, Meng Yue, Jin Zhang
-
Publication number: 20210318887
Abstract: A system performs matrix multiplication of a vector by a two-dimensional matrix by evaluating whether the vector includes zero values. Rows of the matrix are loaded into a first memory device from a second memory device. Rows corresponding to the indexes of the zero values are not loaded. A dot product of columns of the matrix and the input vector is performed and stored. The matrix may be stored in the second memory device such that only entries for non-zero entries are stored. The rows of the matrix may be reconstructed in the first memory device from these entries.
Type: Application
Filed: April 9, 2020
Publication date: October 14, 2021
Inventors: Mankit Lo, Wei-Lun Kao, Yizhong Yang
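The skipping works because a matrix row indexed by a zero vector entry contributes nothing to any column dot product, so it never needs to be fetched. A sketch, where `load_row` stands in for the transfer from the second memory device:

```python
# Sketch of zero-skipping vector-matrix multiply: rows whose index matches
# a zero entry of the input vector are never loaded at all.

def sparse_matvec(vector, load_row):
    # load_row(i) stands in for fetching row i from the second memory device.
    result = None
    for i, v in enumerate(vector):
        if v == 0:
            continue                      # zero entry: skip the row entirely
        row = load_row(i)
        if result is None:
            result = [0] * len(row)
        for j, m in enumerate(row):
            result[j] += v * m            # accumulate column dot products
    return result

matrix = [[1, 2], [3, 4], [5, 6]]
# Row 1 is skipped because vector[1] == 0.
assert sparse_matvec([2, 0, 1], lambda i: matrix[i]) == [7, 10]
```

For a sparse input vector this saves both the memory traffic for the skipped rows and the multiplies against them.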
-
Publication number: 20210295607
Abstract: The present disclosure provides a data reading/writing method and system for 3D image processing, a storage medium and a terminal. The method includes the following steps: dividing a 3D image horizontally based on the vertical sliding technology, so that the 3D image is divided into at least two subimages; storing the processing data of each subimage in a circular buffer, and after a subimage is processed, retaining in the circular buffer the overlapping portion of data required by the next subimage; and dividing a multi-layer network of an image processing algorithm into at least two segments, so that the data between adjacent layers in each segment interact only through the buffer, not through DDR.
Type: Application
Filed: September 25, 2019
Publication date: September 23, 2021
Applicants: VeriSilicon Microelectronics (Shanghai) Co., Ltd., VeriSilicon Holdings Co., Ltd., VeriSilicon Microelectronics (Nanjing) Co., Ltd.
Inventors: Zhonghao CUI, Mankit LO, Ke ZHANG, Huiming ZHANG
-
Patent number: 10585623
Abstract: A computer system includes a hardware buffer controller. Memory access requests to a buffer do not include an address within the buffer and threads accessing the buffer do not access or directly update any pointers to locations within the buffer. The memory access requests are addressed to the hardware buffer controller, which determines an address from its current state and issues a memory access command to that address. The hardware buffer controller updates its state in response to the memory access requests. The hardware buffer controller evaluates its state and outputs events to a thread scheduler in response to overflow or underflow conditions or near-overflow or near-underflow conditions. The thread scheduler may then block threads from issuing memory access requests to the hardware buffer controller. The buffer implemented may be a FIFO or other type of buffer.
Type: Grant
Filed: December 11, 2015
Date of Patent: March 10, 2020
Assignee: VIVANTE CORPORATION
Inventor: Mankit Lo
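A software sketch of the FIFO variant of such a controller: requests carry no address, the controller derives one from its own head/tail state, and it raises events toward a scheduler near the overflow/underflow boundaries. The threshold and the event callback are illustrative assumptions:

```python
# Sketch of a hardware buffer controller: threads never see addresses or
# pointers; the controller owns head/tail state and emits scheduler events
# on near-overflow / near-underflow.

class BufferController:
    def __init__(self, size, on_event, near=1):
        self.mem = [None] * size
        self.head = self.tail = self.count = 0
        self.size, self.near, self.on_event = size, near, on_event

    def write(self, value):               # request carries no address
        self.mem[self.tail] = value       # controller picks the address
        self.tail = (self.tail + 1) % self.size
        self.count += 1
        if self.count >= self.size - self.near:
            self.on_event("near-overflow")

    def read(self):
        value = self.mem[self.head]
        self.head = (self.head + 1) % self.size
        self.count -= 1
        if self.count <= self.near:
            self.on_event("near-underflow")
        return value

events = []
fifo = BufferController(4, events.append)
for v in (1, 2, 3):
    fifo.write(v)
assert fifo.read() == 1 and "near-overflow" in events
```

A real scheduler would react to these events by blocking producer or consumer threads before an actual overflow or underflow can occur.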
-
Patent number: 10242311
Abstract: A convolution engine, such as a convolution neural network, operates efficiently with respect to sparse kernels by implementing zero skipping. An input tile is loaded and accumulated sums are calculated for the input tile for non-zero coefficients by shifting the tile according to a row and column index of the coefficient in the kernel. Each coefficient is applied individually to the tile and the result is written to an accumulation buffer before moving to the next non-zero coefficient. A 3D or 4D convolution may be implemented in this manner with separate regions of the accumulation buffer storing accumulated sums for different indexes along one dimension. Images are completely processed and results for each image are stored in the accumulation buffer before moving to the next image.
Type: Grant
Filed: August 8, 2017
Date of Patent: March 26, 2019
Assignee: VIVANTE CORPORATION
Inventor: Mankit Lo
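A 2D sketch of this coefficient-at-a-time zero skipping: each non-zero kernel tap shifts the whole tile by its (row, column) offset and accumulates, and zero taps are skipped entirely:

```python
# Sketch: for each non-zero kernel coefficient, scale the shifted input
# tile into the accumulation buffer; zero coefficients cost nothing.

def conv2d_zero_skip(tile, kernel):
    th, tw = len(tile), len(tile[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = th - kh + 1, tw - kw + 1
    acc = [[0] * ow for _ in range(oh)]        # accumulation buffer
    for r in range(kh):
        for c in range(kw):
            k = kernel[r][c]
            if k == 0:
                continue                       # sparse kernel: skip zeros
            for y in range(oh):                # apply one coefficient to the
                for x in range(ow):            # shifted tile, then move on
                    acc[y][x] += k * tile[y + r][x + c]
    return acc

tile = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[0, 1], [1, 0]]                      # sparse: only two taps fire
assert conv2d_zero_skip(tile, kernel) == [[6, 8], [12, 14]]
```

With a mostly-zero kernel the work scales with the number of non-zero taps rather than the kernel area, which is the efficiency claim in the abstract.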
-
Patent number: 9977619
Abstract: A computer system processes instructions including an instruction code, source type, source address, destination type, and destination address. The source and destination type may indicate a memory device in which case data is read from the memory device at the source address and written to the destination address. One or both of the source type and destination type may include a transfer descriptor flag, in which case a transfer descriptor identified by the source or destination address is executed. A transfer descriptor referenced by a source address may be executed to obtain an intermediate result that is used for performing the operation indicated by the instruction code. The transfer descriptor referenced by a destination address may be executed to determine a location at which the result of the operation will be stored.
Type: Grant
Filed: November 6, 2015
Date of Patent: May 22, 2018
Assignee: Vivante Corporation
Inventor: Mankit Lo
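A toy interpreter can illustrate the operand typing described here; the descriptor table, memory model, and operation are hypothetical illustrations, not the patented encoding:

```python
# Sketch: a typed-operand instruction. A MEMORY source/destination is a
# plain address; a DESCRIPTOR operand names a transfer descriptor that is
# executed to produce the source value (or the destination location).

MEMORY, DESCRIPTOR = 0, 1

memory = {0: 10, 1: 20, 2: 0}
# Descriptor 5 gathers two memory cells into an intermediate result.
descriptors = {5: lambda: memory[0] + memory[1]}

def execute(op, src_type, src_addr, dst_type, dst_addr):
    # Source: execute the descriptor for an intermediate result, or load.
    src = descriptors[src_addr]() if src_type == DESCRIPTOR else memory[src_addr]
    result = op(src)
    # Destination: execute the descriptor to compute where to store, or
    # use the literal destination address.
    dst = descriptors[dst_addr]() if dst_type == DESCRIPTOR else dst_addr
    memory[dst] = result

execute(lambda x: x * 2, DESCRIPTOR, 5, MEMORY, 2)
assert memory[2] == 60        # (10 + 20) doubled
```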
-
Patent number: 9928117
Abstract: A computer system includes a hardware synchronization component (HSC). Multiple concurrent threads of execution issue instructions to update the state of the HSC. Multiple threads may update the state in the same clock cycle and a thread does not need to receive control of the HSC prior to updating its state. Instructions referencing the state received during the same clock cycle are aggregated and the state is updated according to the number of the instructions. The state is evaluated with respect to a threshold condition. If it is met, then the HSC outputs an event to a processor. The processor then identifies a thread impacted by the event and takes a predetermined action based on the event (e.g. blocking, branching, unblocking of the thread).
Type: Grant
Filed: December 11, 2015
Date of Patent: March 27, 2018
Assignee: Vivante Corporation
Inventor: Mankit Lo
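The aggregate-then-threshold behavior can be sketched in a few lines; the counter semantics and the event callback are illustrative assumptions:

```python
# Sketch of the hardware synchronization component: all updates arriving
# in one clock cycle are aggregated into a single state bump, and crossing
# the threshold emits an event toward the processor.

class HSC:
    def __init__(self, threshold, on_event):
        self.state = 0
        self.threshold = threshold
        self.on_event = on_event

    def clock_cycle(self, increments):
        # Instructions received this cycle are aggregated: no thread takes
        # exclusive control of the component before updating.
        self.state += sum(increments)
        if self.state >= self.threshold:
            self.on_event(self.state)     # e.g. unblock a waiting thread

events = []
hsc = HSC(threshold=3, on_event=events.append)
hsc.clock_cycle([1, 1])        # two threads update in the same cycle
hsc.clock_cycle([1])           # third update crosses the threshold
assert events == [3]
```

Aggregating same-cycle updates is what removes the usual acquire-update-release round trip of a software semaphore.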
-
Publication number: 20180046898
Abstract: A convolution engine, such as a convolution neural network, operates efficiently with respect to sparse kernels by implementing zero skipping. An input tile is loaded and accumulated sums are calculated for the input tile for non-zero coefficients by shifting the tile according to a row and column index of the coefficient in the kernel. Each coefficient is applied individually to the tile and the result is written to an accumulation buffer before moving to the next non-zero coefficient. A 3D or 4D convolution may be implemented in this manner with separate regions of the accumulation buffer storing accumulated sums for different indexes along one dimension. Images are completely processed and results for each image are stored in the accumulation buffer before moving to the next image.
Type: Application
Filed: August 8, 2017
Publication date: February 15, 2018
Inventor: Mankit Lo
-
Publication number: 20180046437
Abstract: A convolution engine, such as a convolution neural network, operates efficiently with respect to sparse kernels by implementing zero skipping. An input tile is loaded and accumulated sums are calculated for the input tile for non-zero coefficients by shifting the tile according to a row and column index of the coefficient in the kernel. Each coefficient is applied individually to the tile and the result is written to an accumulation buffer before moving to the next non-zero coefficient. A 3D or 4D convolution may be implemented in this manner with separate regions of the accumulation buffer storing accumulated sums for different indexes along one dimension. Images are completely processed and results for each image are stored in the accumulation buffer before moving to the next image.
Type: Application
Filed: August 8, 2017
Publication date: February 15, 2018
Inventor: Mankit Lo
-
Publication number: 20170168875
Abstract: A computer system includes a hardware synchronization component (HSC). Multiple concurrent threads of execution issue instructions to update the state of the HSC. Multiple threads may update the state in the same clock cycle and a thread does not need to receive control of the HSC prior to updating its state. Instructions referencing the state received during the same clock cycle are aggregated and the state is updated according to the number of the instructions. The state is evaluated with respect to a threshold condition. If it is met, then the HSC outputs an event to a processor. The processor then identifies a thread impacted by the event and takes a predetermined action based on the event (e.g. blocking, branching, unblocking of the thread).
Type: Application
Filed: December 11, 2015
Publication date: June 15, 2017
Inventor: Mankit Lo
-
Publication number: 20170168755
Abstract: A computer system includes a hardware buffer controller. Memory access requests to a buffer do not include an address within the buffer and threads accessing the buffer do not access or directly update any pointers to locations within the buffer. The memory access requests are addressed to the hardware buffer controller, which determines an address from its current state and issues a memory access command to that address. The hardware buffer controller updates its state in response to the memory access requests. The hardware buffer controller evaluates its state and outputs events to a thread scheduler in response to overflow or underflow conditions or near-overflow or near-underflow conditions. The thread scheduler may then block threads from issuing memory access requests to the hardware buffer controller. The buffer implemented may be a FIFO or other type of buffer.
Type: Application
Filed: December 11, 2015
Publication date: June 15, 2017
Inventor: Mankit Lo
-
Publication number: 20170131939
Abstract: A computer system processes instructions including an instruction code, source type, source address, destination type, and destination address. The source and destination type may indicate a memory device in which case data is read from the memory device at the source address and written to the destination address. One or both of the source type and destination type may include a transfer descriptor flag, in which case a transfer descriptor identified by the source or destination address is executed. A transfer descriptor referenced by a source address may be executed to obtain an intermediate result that is used for performing the operation indicated by the instruction code. The transfer descriptor referenced by a destination address may be executed to determine a location at which the result of the operation will be stored.
Type: Application
Filed: November 6, 2015
Publication date: May 11, 2017
Inventor: Mankit Lo
-
Patent number: 8077776
Abstract: Motion estimation is described. A first portion of a predicted frame is obtained. The first portion is for a first predicted value. A first subset of a reference frame is obtained. The first subset is for a first reference value. Twice the first predicted value is subtracted from the first reference value. The outcome of the subtracting is multiplied by the first reference value to produce a partial result. The partial result is used for indication of a degree of difference between the first portion and the first subset.
Type: Grant
Filed: December 15, 2006
Date of Patent: December 13, 2011
Assignee: Xilinx, Inc.
Inventor: Mankit Lo
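The algebra behind this partial result: the sum of squared differences expands as sum((r - p)^2) = sum((r - 2p) * r) + sum(p^2), and the last term depends only on the predicted block, so comparing candidate reference blocks by the partial sum ranks them the same as the full SSD. A sketch with illustrative 1D blocks:

```python
# Sketch: (r - 2*p) * r = r*r - 2*r*p, so summing it gives the SSD minus
# the constant sum(p*p). Ranking candidates by the partial result is
# therefore equivalent to ranking them by the full SSD.

def partial_ssd(reference, predicted):
    return sum((r - 2 * p) * r for r, p in zip(reference, predicted))

def full_ssd(reference, predicted):
    return sum((r - p) ** 2 for r, p in zip(reference, predicted))

pred = [10, 20, 30]
cand_a, cand_b = [11, 19, 33], [14, 25, 28]

# The two measures differ only by the constant sum(p*p) = 1400 here.
assert full_ssd(cand_a, pred) - partial_ssd(cand_a, pred) == 1400
assert ((partial_ssd(cand_a, pred) < partial_ssd(cand_b, pred))
        == (full_ssd(cand_a, pred) < full_ssd(cand_b, pred)))
```

Dropping the constant term saves one multiply-accumulate chain per candidate, which matters when the block search evaluates many reference subsets.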