Patents by Inventor Mankit Lo
Mankit Lo has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20230262210
Abstract: A data compression method is provided for compressing an image. A coding module may select a plurality of pixels with a sequence order from the image, and compress the plurality of pixels to generate a plurality of compressed pixels. For a current pixel p[i] having a previous pixel p[i-1] and a next pixel p[i+1], the coding module generates a coding mode M[i+1] configured for compressing the p[i+1], and generates a fixed-rate compressed value c[i] corresponding to the p[i]. The coding module stores the c[i] in a compressed pixel, and c[i] encapsulates the coding mode M[i+1]. The coding module then stores the plurality of compressed pixels into a compressed image corresponding to the image.
Type: Application
Filed: January 30, 2023
Publication date: August 17, 2023
Applicant: VeriSilicon Holdings Co., Ltd.
Inventors: Lefan ZHONG, Mankit LO, Wei MIAO
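The look-ahead scheme in this abstract — each fixed-rate value c[i] carrying the coding mode M[i+1] for the *next* pixel — can be sketched as follows. The mode-selection rule, the bit widths, and the quantization are hypothetical stand-ins, not the patented method:

```python
# Sketch: each fixed-rate compressed value c[i] encapsulates the coding
# mode M[i+1] used for the next pixel. MODE_BITS/VALUE_BITS and the mode
# choice below are illustrative assumptions.

MODE_BITS = 2      # assumed: bits reserved for the next pixel's mode
VALUE_BITS = 6     # assumed: bits for the fixed-rate compressed value

def choose_mode(prev, nxt):
    # Hypothetical mode choice: small deltas favor delta coding (mode 1),
    # otherwise direct quantization (mode 0).
    return 1 if abs(nxt - prev) < 16 else 0

def compress(pixels):
    out = []
    for i, p in enumerate(pixels):
        nxt_mode = choose_mode(p, pixels[i + 1]) if i + 1 < len(pixels) else 0
        value = p >> 2                        # crude 8-bit -> 6-bit quantization
        c = (nxt_mode << VALUE_BITS) | value  # c[i] encapsulates M[i+1]
        out.append(c)
    return out

compressed = compress([10, 12, 200, 198])
# Every output word has the same width: fixed-rate compression.
assert all(c < (1 << (MODE_BITS + VALUE_BITS)) for c in compressed)
```

Because the mode bits ride along inside the previous fixed-rate word, a decoder always knows how to interpret the next pixel without any variable-rate side channel.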
-
Patent number: 11599334
Abstract: A device for performing multiply/accumulate operations processes values in first and second buffers having a first width using a computational pipeline with a second width, such as half the first width. A sequencer processes combinations of portions (high-high, low-low, high-low, low-high) of the values in the first and second buffers using a multiply/accumulate circuit and adds the accumulated result of each combination of portions to a group accumulator. Adding to the group accumulator may be preceded by left shifting the accumulated result (the first width for the high-high combination and the second width for the low-high and high-low combinations).
Type: Grant
Filed: June 9, 2020
Date of Patent: March 7, 2023
Assignees: VeriSilicon Microelectronics, VeriSilicon Holdings Co., Ltd.
Inventors: Mankit Lo, Meng Yue, Jin Zhang
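The four-combination decomposition described here is the standard identity for building a wide multiply out of a half-width datapath; a minimal sketch, with 16-bit operands on an assumed 8-bit pipeline:

```python
# Sketch of the half-width multiply scheme: two first-width (16-bit)
# operands are combined from four high/low partial products, each
# left-shifted as the abstract describes before entering the accumulator.

FIRST_WIDTH = 16
SECOND_WIDTH = FIRST_WIDTH // 2  # half the first width

def wide_multiply(a, b):
    mask = (1 << SECOND_WIDTH) - 1
    a_hi, a_lo = a >> SECOND_WIDTH, a & mask
    b_hi, b_lo = b >> SECOND_WIDTH, b & mask
    acc = 0
    acc += (a_hi * b_hi) << FIRST_WIDTH    # high-high: shift by first width
    acc += (a_hi * b_lo) << SECOND_WIDTH   # high-low:  shift by second width
    acc += (a_lo * b_hi) << SECOND_WIDTH   # low-high:  shift by second width
    acc += a_lo * b_lo                     # low-low:   no shift
    return acc

assert wide_multiply(0x1234, 0xABCD) == 0x1234 * 0xABCD
```

The sequencer in the claimed device iterates these four partial products on one narrow multiply/accumulate circuit instead of instantiating a full-width multiplier.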
-
Patent number: 11455781
Abstract: The present disclosure provides a data reading/writing method and system for 3D image processing, a storage medium and a terminal. The method includes the following steps: dividing a 3D image horizontally based on the vertical sliding technology, so that the 3D image is divided into at least two subimages; storing the processing data of each subimage in a circular buffer, and after a subimage is processed, retaining in the circular buffer the overlapping portion of data required by the next subimage; and dividing a multi-layer network of an image processing algorithm into at least two segments, so that the data between adjacent layers in each segment interact only through the buffer, not through DDR.
Type: Grant
Filed: September 25, 2019
Date of Patent: September 27, 2022
Assignees: VeriSilicon Microelectronics (Shanghai) Co., Ltd., VeriSilicon Holdings Co., Ltd., VeriSilicon Microelectronics (Nanjing) Co., Ltd.
Inventors: Zhonghao Cui, Mankit Lo, Ke Zhang, Huiming Zhang
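The vertical-sliding idea can be sketched as below: slices of the image live in a small circular buffer, and only the overlap rows needed by the next slice are retained rather than re-fetched from DDR. The slice height, overlap, and per-slice computation are illustrative assumptions:

```python
# Sketch: process an image in horizontal slices; the Python list stands in
# for the on-chip circular buffer, and sum() stands in for the per-slice
# network segment. Only the overlap rows survive between slices.

def process_in_slices(rows, slice_height, overlap):
    buffer = []                      # stands in for the circular buffer
    results = []
    i = 0
    while i < len(rows):
        # Fill the buffer up to one subimage, reusing retained overlap rows.
        need = slice_height - len(buffer)
        buffer.extend(rows[i:i + need])
        i += need
        results.append(sum(buffer))  # placeholder for processing the subimage
        buffer = buffer[-overlap:]   # retain only the overlap for the next slice
    return results

# 8 rows, slices of 4 with 1 overlap row -> three subimages.
slices = process_in_slices(list(range(8)), slice_height=4, overlap=1)
assert len(slices) == 3
```

Keeping the overlap on chip is what lets adjacent network layers exchange data through the buffer alone.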
-
Publication number: 20220269484
Abstract: Example accumulation systems and methods are described. In one implementation, data is received for processing. A multiplication operation is performed on the received data to generate multiplied data. An addition operation is performed on the multiplied data to generate a result. At least a portion of the least significant bits of the result are stored in a first region of an accumulation buffer of a convolution core, and at least a portion of the remaining bits of the result are stored in a shared memory that is separate from the convolution core.
Type: Application
Filed: February 19, 2021
Publication date: August 25, 2022
Inventor: Mankit Lo
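A minimal sketch of this split accumulation, assuming a 16-bit low region (the widths and the dict-based "memories" are illustrative, not from the publication):

```python
# Sketch: after each multiply-add, the low bits of the running sum stay in
# the convolution core's accumulation buffer while the remaining high bits
# spill to a separate shared memory.

LOW_BITS = 16
LOW_MASK = (1 << LOW_BITS) - 1

acc_buffer = {}     # stands in for the per-core accumulation buffer (LSBs)
shared_mem = {}     # stands in for the shared memory (remaining MSBs)

def accumulate(slot, a, b):
    total = ((shared_mem.get(slot, 0) << LOW_BITS)
             | acc_buffer.get(slot, 0)) + a * b   # multiply, then add
    acc_buffer[slot] = total & LOW_MASK           # LSBs stay in the core
    shared_mem[slot] = total >> LOW_BITS          # MSBs go to shared memory

for _ in range(1000):
    accumulate(0, 300, 400)
assert (shared_mem[0] << LOW_BITS) | acc_buffer[0] == 1000 * 300 * 400
```

The payoff of such a split is that the frequently-updated low bits occupy narrow, fast storage close to the multiplier, while the rarely-changing high bits can live in cheaper shared memory.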
-
Patent number: 11301214
Abstract: A circuit for performing multiply/accumulate operations evaluates a type of each value of a pair of input values. Signed values are split into sign and magnitude. One or more pairs of arguments are input to a multiplier such that the arguments have fewer bits than the magnitude of signed values or unsigned values. This may include splitting input values into multiple arguments and inputting multiple pairs of arguments to the multiplier for a single pair of input values.
Type: Grant
Filed: June 9, 2020
Date of Patent: April 12, 2022
Assignees: VeriSilicon Microelectronics (Shanghai) Co., Ltd., VeriSilicon Holdings Co., Ltd.
Inventors: Mankit Lo, Meng Yue, Jin Zhang
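The sign/magnitude split can be illustrated with a short sketch. The point of the split is that the magnitude of an n-bit two's-complement value fits in n-1 bits, so a narrower unsigned multiplier can handle it; the sign is reapplied afterwards:

```python
# Sketch: split each signed operand into (sign, magnitude), multiply the
# magnitudes as unsigned values, then reapply the combined sign.

def split_sign(x):
    return (1, -x) if x < 0 else (0, x)   # (sign bit, magnitude)

def signed_multiply(a, b):
    sa, ma = split_sign(a)
    sb, mb = split_sign(b)
    product = ma * mb                     # unsigned multiplier sees magnitudes
    return -product if sa ^ sb else product

assert signed_multiply(-7, 6) == -42
assert signed_multiply(-7, -6) == 42
```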
-
Publication number: 20210382690
Abstract: A device for performing multiply/accumulate operations processes values in first and second buffers having a first width using a computational pipeline with a second width, such as half the first width. A sequencer processes combinations of portions (high-high, low-low, high-low, low-high) of the values in the first and second buffers using a multiply/accumulate circuit and adds the accumulated result of each combination of portions to a group accumulator. Adding to the group accumulator may be preceded by left shifting the accumulated result (the first width for the high-high combination and the second width for the low-high and high-low combinations).
Type: Application
Filed: June 9, 2020
Publication date: December 9, 2021
Inventors: Mankit Lo, Meng Yue, Jin Zhang
-
Publication number: 20210382689
Abstract: A circuit for performing multiply/accumulate operations evaluates a type of each value of a pair of input values. Signed values are split into sign and magnitude. One or more pairs of arguments are input to a multiplier such that the arguments have fewer bits than the magnitude of signed values or unsigned values. This may include splitting input values into multiple arguments and inputting multiple pairs of arguments to the multiplier for a single pair of input values.
Type: Application
Filed: June 9, 2020
Publication date: December 9, 2021
Inventors: Mankit Lo, Meng Yue, Jin Zhang
-
Publication number: 20210318887
Abstract: A system performs matrix multiplication of a vector by a two-dimensional matrix by evaluating whether the vector includes zero values. Rows of the matrix are loaded into a first memory device from a second memory device. Rows corresponding to the indexes of the zero values are not loaded. A dot product of columns of the matrix and the input vector is performed and stored. The matrix may be stored in the second memory device such that only entries for non-zero entries are stored. The rows of the matrix may be reconstructed in the first memory device from these entries.
Type: Application
Filed: April 9, 2020
Publication date: October 14, 2021
Inventors: Mankit Lo, Wei-Lun Kao, Yizhong Yang
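The skipping works because a matrix row indexed by a zero vector entry contributes nothing to any column dot product, so it never needs to be fetched. A sketch, where `load_row` stands in for the transfer from the second memory device:

```python
# Sketch of zero-skipping vector-matrix multiply: rows whose index matches
# a zero entry of the input vector are never loaded at all.

def sparse_matvec(vector, load_row):
    # load_row(i) stands in for fetching row i from the second memory device.
    result = None
    for i, v in enumerate(vector):
        if v == 0:
            continue                      # zero entry: skip the row entirely
        row = load_row(i)
        if result is None:
            result = [0] * len(row)
        for j, m in enumerate(row):
            result[j] += v * m            # accumulate column dot products
    return result

matrix = [[1, 2], [3, 4], [5, 6]]
# Row 1 is skipped because vector[1] == 0.
assert sparse_matvec([2, 0, 1], lambda i: matrix[i]) == [7, 10]
```

For a sparse input vector this saves both the memory traffic for the skipped rows and the multiplies against them.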
-
Publication number: 20210295607
Abstract: The present disclosure provides a data reading/writing method and system for 3D image processing, a storage medium and a terminal. The method includes the following steps: dividing a 3D image horizontally based on the vertical sliding technology, so that the 3D image is divided into at least two subimages; storing the processing data of each subimage in a circular buffer, and after a subimage is processed, retaining in the circular buffer the overlapping portion of data required by the next subimage; and dividing a multi-layer network of an image processing algorithm into at least two segments, so that the data between adjacent layers in each segment interact only through the buffer, not through DDR.
Type: Application
Filed: September 25, 2019
Publication date: September 23, 2021
Applicants: VeriSilicon Microelectronics (Shanghai) Co., Ltd., VeriSilicon Holdings Co., Ltd., VeriSilicon Microelectronics (Nanjing) Co., Ltd.
Inventors: Zhonghao CUI, Mankit LO, Ke ZHANG, Huiming ZHANG
-
Patent number: 10585623
Abstract: A computer system includes a hardware buffer controller. Memory access requests to a buffer do not include an address within the buffer and threads accessing the buffer do not access or directly update any pointers to locations within the buffer. The memory access requests are addressed to the hardware buffer controller, which determines an address from its current state and issues a memory access command to that address. The hardware buffer controller updates its state in response to the memory access requests. The hardware buffer controller evaluates its state and outputs events to a thread scheduler in response to overflow or underflow conditions or near-overflow or near-underflow conditions. The thread scheduler may then block threads from issuing memory access requests to the hardware buffer controller. The buffer implemented may be a FIFO or other type of buffer.
Type: Grant
Filed: December 11, 2015
Date of Patent: March 10, 2020
Assignee: VIVANTE CORPORATION
Inventor: Mankit Lo
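A software sketch of the FIFO variant of such a controller: requests carry no address, the controller derives one from its own head/tail state, and it raises events toward a scheduler near the overflow/underflow boundaries. The threshold and the event callback are illustrative assumptions:

```python
# Sketch of a hardware buffer controller: threads never see addresses or
# pointers; the controller owns head/tail state and emits scheduler events
# on near-overflow / near-underflow.

class BufferController:
    def __init__(self, size, on_event, near=1):
        self.mem = [None] * size
        self.head = self.tail = self.count = 0
        self.size, self.near, self.on_event = size, near, on_event

    def write(self, value):               # request carries no address
        self.mem[self.tail] = value       # controller picks the address
        self.tail = (self.tail + 1) % self.size
        self.count += 1
        if self.count >= self.size - self.near:
            self.on_event("near-overflow")

    def read(self):
        value = self.mem[self.head]
        self.head = (self.head + 1) % self.size
        self.count -= 1
        if self.count <= self.near:
            self.on_event("near-underflow")
        return value

events = []
fifo = BufferController(4, events.append)
for v in (1, 2, 3):
    fifo.write(v)
assert fifo.read() == 1 and "near-overflow" in events
```

A real scheduler would react to these events by blocking producer or consumer threads before an actual overflow or underflow can occur.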
-
Patent number: 10242311
Abstract: A convolution engine, such as a convolution neural network, operates efficiently with respect to sparse kernels by implementing zero skipping. An input tile is loaded and accumulated sums are calculated for the input tile for non-zero coefficients by shifting the tile according to a row and column index of the coefficient in the kernel. Each coefficient is applied individually to the tile and the result is written to an accumulation buffer before moving to the next non-zero coefficient. A 3D or 4D convolution may be implemented in this manner with separate regions of the accumulation buffer storing accumulated sums for different indexes along one dimension. Images are completely processed and results for each image are stored in the accumulation buffer before moving to the next image.
Type: Grant
Filed: August 8, 2017
Date of Patent: March 26, 2019
Assignee: VIVANTE CORPORATION
Inventor: Mankit Lo
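A 2D sketch of this coefficient-at-a-time zero skipping: each non-zero kernel tap shifts the whole tile by its (row, column) offset and accumulates, and zero taps are skipped entirely:

```python
# Sketch: for each non-zero kernel coefficient, scale the shifted input
# tile into the accumulation buffer; zero coefficients cost nothing.

def conv2d_zero_skip(tile, kernel):
    th, tw = len(tile), len(tile[0])
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = th - kh + 1, tw - kw + 1
    acc = [[0] * ow for _ in range(oh)]        # accumulation buffer
    for r in range(kh):
        for c in range(kw):
            k = kernel[r][c]
            if k == 0:
                continue                       # sparse kernel: skip zeros
            for y in range(oh):                # apply one coefficient to the
                for x in range(ow):            # shifted tile, then move on
                    acc[y][x] += k * tile[y + r][x + c]
    return acc

tile = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[0, 1], [1, 0]]                      # sparse: only two taps fire
assert conv2d_zero_skip(tile, kernel) == [[6, 8], [12, 14]]
```

With a mostly-zero kernel the work scales with the number of non-zero taps rather than the kernel area, which is the efficiency claim in the abstract.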
-
Patent number: 9977619
Abstract: A computer system processes instructions including an instruction code, source type, source address, destination type, and destination address. The source and destination type may indicate a memory device in which case data is read from the memory device at the source address and written to the destination address. One or both of the source type and destination type may include a transfer descriptor flag, in which case a transfer descriptor identified by the source or destination address is executed. A transfer descriptor referenced by a source address may be executed to obtain an intermediate result that is used for performing the operation indicated by the instruction code. The transfer descriptor referenced by a destination address may be executed to determine a location at which the result of the operation will be stored.
Type: Grant
Filed: November 6, 2015
Date of Patent: May 22, 2018
Assignee: Vivante Corporation
Inventor: Mankit Lo
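A toy interpreter can illustrate the operand typing described here; the descriptor table, memory model, and operation are hypothetical illustrations, not the patented encoding:

```python
# Sketch: a typed-operand instruction. A MEMORY source/destination is a
# plain address; a DESCRIPTOR operand names a transfer descriptor that is
# executed to produce the source value (or the destination location).

MEMORY, DESCRIPTOR = 0, 1

memory = {0: 10, 1: 20, 2: 0}
# Descriptor 5 gathers two memory cells into an intermediate result.
descriptors = {5: lambda: memory[0] + memory[1]}

def execute(op, src_type, src_addr, dst_type, dst_addr):
    # Source: execute the descriptor for an intermediate result, or load.
    src = descriptors[src_addr]() if src_type == DESCRIPTOR else memory[src_addr]
    result = op(src)
    # Destination: execute the descriptor to compute where to store, or
    # use the literal destination address.
    dst = descriptors[dst_addr]() if dst_type == DESCRIPTOR else dst_addr
    memory[dst] = result

execute(lambda x: x * 2, DESCRIPTOR, 5, MEMORY, 2)
assert memory[2] == 60        # (10 + 20) doubled
```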
-
Patent number: 9928117
Abstract: A computer system includes a hardware synchronization component (HSC). Multiple concurrent threads of execution issue instructions to update the state of the HSC. Multiple threads may update the state in the same clock cycle and a thread does not need to receive control of the HSC prior to updating its state. Instructions referencing the state received during the same clock cycle are aggregated and the state is updated according to the number of the instructions. The state is evaluated with respect to a threshold condition. If it is met, then the HSC outputs an event to a processor. The processor then identifies a thread impacted by the event and takes a predetermined action based on the event (e.g. blocking, branching, unblocking of the thread).
Type: Grant
Filed: December 11, 2015
Date of Patent: March 27, 2018
Assignee: Vivante Corporation
Inventor: Mankit Lo
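The aggregate-then-threshold behavior can be sketched in a few lines; the counter semantics and the event callback are illustrative assumptions:

```python
# Sketch of the hardware synchronization component: all updates arriving
# in one clock cycle are aggregated into a single state bump, and crossing
# the threshold emits an event toward the processor.

class HSC:
    def __init__(self, threshold, on_event):
        self.state = 0
        self.threshold = threshold
        self.on_event = on_event

    def clock_cycle(self, increments):
        # Instructions received this cycle are aggregated: no thread takes
        # exclusive control of the component before updating.
        self.state += sum(increments)
        if self.state >= self.threshold:
            self.on_event(self.state)     # e.g. unblock a waiting thread

events = []
hsc = HSC(threshold=3, on_event=events.append)
hsc.clock_cycle([1, 1])        # two threads update in the same cycle
hsc.clock_cycle([1])           # third update crosses the threshold
assert events == [3]
```

Aggregating same-cycle updates is what removes the usual acquire-update-release round trip of a software semaphore.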
-
Publication number: 20180046898
Abstract: A convolution engine, such as a convolution neural network, operates efficiently with respect to sparse kernels by implementing zero skipping. An input tile is loaded and accumulated sums are calculated for the input tile for non-zero coefficients by shifting the tile according to a row and column index of the coefficient in the kernel. Each coefficient is applied individually to the tile and the result is written to an accumulation buffer before moving to the next non-zero coefficient. A 3D or 4D convolution may be implemented in this manner with separate regions of the accumulation buffer storing accumulated sums for different indexes along one dimension. Images are completely processed and results for each image are stored in the accumulation buffer before moving to the next image.
Type: Application
Filed: August 8, 2017
Publication date: February 15, 2018
Inventor: Mankit Lo
-
Publication number: 20180046437
Abstract: A convolution engine, such as a convolution neural network, operates efficiently with respect to sparse kernels by implementing zero skipping. An input tile is loaded and accumulated sums are calculated for the input tile for non-zero coefficients by shifting the tile according to a row and column index of the coefficient in the kernel. Each coefficient is applied individually to the tile and the result is written to an accumulation buffer before moving to the next non-zero coefficient. A 3D or 4D convolution may be implemented in this manner with separate regions of the accumulation buffer storing accumulated sums for different indexes along one dimension. Images are completely processed and results for each image are stored in the accumulation buffer before moving to the next image.
Type: Application
Filed: August 8, 2017
Publication date: February 15, 2018
Inventor: Mankit Lo
-
Publication number: 20170168875
Abstract: A computer system includes a hardware synchronization component (HSC). Multiple concurrent threads of execution issue instructions to update the state of the HSC. Multiple threads may update the state in the same clock cycle and a thread does not need to receive control of the HSC prior to updating its state. Instructions referencing the state received during the same clock cycle are aggregated and the state is updated according to the number of the instructions. The state is evaluated with respect to a threshold condition. If it is met, then the HSC outputs an event to a processor. The processor then identifies a thread impacted by the event and takes a predetermined action based on the event (e.g. blocking, branching, unblocking of the thread).
Type: Application
Filed: December 11, 2015
Publication date: June 15, 2017
Inventor: Mankit Lo
-
Publication number: 20170168755
Abstract: A computer system includes a hardware buffer controller. Memory access requests to a buffer do not include an address within the buffer and threads accessing the buffer do not access or directly update any pointers to locations within the buffer. The memory access requests are addressed to the hardware buffer controller, which determines an address from its current state and issues a memory access command to that address. The hardware buffer controller updates its state in response to the memory access requests. The hardware buffer controller evaluates its state and outputs events to a thread scheduler in response to overflow or underflow conditions or near-overflow or near-underflow conditions. The thread scheduler may then block threads from issuing memory access requests to the hardware buffer controller. The buffer implemented may be a FIFO or other type of buffer.
Type: Application
Filed: December 11, 2015
Publication date: June 15, 2017
Inventor: Mankit Lo
-
Publication number: 20170131939
Abstract: A computer system processes instructions including an instruction code, source type, source address, destination type, and destination address. The source and destination type may indicate a memory device in which case data is read from the memory device at the source address and written to the destination address. One or both of the source type and destination type may include a transfer descriptor flag, in which case a transfer descriptor identified by the source or destination address is executed. A transfer descriptor referenced by a source address may be executed to obtain an intermediate result that is used for performing the operation indicated by the instruction code. The transfer descriptor referenced by a destination address may be executed to determine a location at which the result of the operation will be stored.
Type: Application
Filed: November 6, 2015
Publication date: May 11, 2017
Inventor: Mankit Lo
-
Patent number: 8077776
Abstract: Motion estimation is described. A first portion of a predicted frame is obtained. The first portion is for a first predicted value. A first subset of a reference frame is obtained. The first subset is for a first reference value. Twice the first predicted value is subtracted from the first reference value. The outcome of the subtracting is multiplied by the first reference value to produce a partial result. The partial result is used for indication of a degree of difference between the first portion and the first subset.
Type: Grant
Filed: December 15, 2006
Date of Patent: December 13, 2011
Assignee: Xilinx, Inc.
Inventor: Mankit Lo
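The algebra behind this partial result: the sum of squared differences expands as sum((r - p)^2) = sum((r - 2p) * r) + sum(p^2), and the last term depends only on the predicted block, so comparing candidate reference blocks by the partial sum ranks them the same as the full SSD. A sketch with illustrative 1D blocks:

```python
# Sketch: (r - 2*p) * r = r*r - 2*r*p, so summing it gives the SSD minus
# the constant sum(p*p). Ranking candidates by the partial result is
# therefore equivalent to ranking them by the full SSD.

def partial_ssd(reference, predicted):
    return sum((r - 2 * p) * r for r, p in zip(reference, predicted))

def full_ssd(reference, predicted):
    return sum((r - p) ** 2 for r, p in zip(reference, predicted))

pred = [10, 20, 30]
cand_a, cand_b = [11, 19, 33], [14, 25, 28]

# The two measures differ only by the constant sum(p*p) = 1400 here.
assert full_ssd(cand_a, pred) - partial_ssd(cand_a, pred) == 1400
assert ((partial_ssd(cand_a, pred) < partial_ssd(cand_b, pred))
        == (full_ssd(cand_a, pred) < full_ssd(cand_b, pred)))
```

Dropping the constant term saves one multiply-accumulate chain per candidate, which matters when the block search evaluates many reference subsets.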