Patents by Inventor Mankit Lo

Mankit Lo has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230262210
    Abstract: A data compression method is provided for compressing an image. A coding module may select a plurality of pixels with a sequence order from the image and compress them to generate a plurality of compressed pixels. For a current pixel p[i] having a previous pixel p[i-1] and a next pixel p[i+1], the coding module generates a coding mode M[i+1] configured for compressing p[i+1], and generates a fixed-rate compressed value c[i] corresponding to p[i]. The coding module stores c[i] in a compressed pixel, and c[i] encapsulates the coding mode M[i+1]. The coding module then stores the plurality of compressed pixels into a compressed image corresponding to the image.
    Type: Application
    Filed: January 30, 2023
    Publication date: August 17, 2023
    Applicant: VeriSilicon Holdings Co., Ltd.
    Inventors: Lefan ZHONG, Mankit LO, Wei MIAO
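The packing described above, in which each fixed-rate compressed value c[i] carries the coding mode M[i+1] chosen for the next pixel, can be sketched in Python. This is a minimal illustration, not the patented method: the 8-bit output width, the 2-bit mode field, and the mode-selection rule are all invented for the example.

```python
def compress_pixels(pixels, mode_bits=2):
    """Hypothetical sketch: each 8-bit compressed value c[i] packs a
    quantized value for pixel p[i] in its high bits and the coding
    mode M[i+1] for the next pixel in its low mode_bits."""
    compressed = []
    for i, p in enumerate(pixels):
        # Pick a mode for the NEXT pixel (trivial stand-in rule:
        # mode 0 for a small step, mode 1 for a large one).
        nxt = pixels[i + 1] if i + 1 < len(pixels) else p
        mode_next = 0 if abs(nxt - p) < 16 else 1
        c = (p >> mode_bits) << mode_bits  # fixed-rate value for p[i]
        c |= mode_next                     # encapsulate M[i+1] in c[i]
        compressed.append(c)
    return compressed
```

The decoder can read M[i+1] out of c[i] before it reaches p[i+1], which is the point of the encapsulation.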
  • Patent number: 11599334
    Abstract: A device for performing multiply/accumulate operations processes values in first and second buffers and having a first width using a computational pipeline with a second width, such as half the first width. A sequencer processes combinations of portions (high-high, low-low, high-low, low-high) of the values in the first and second buffers using a multiply/accumulate circuit and adds the accumulated result of each combination of portions to a group accumulator. Adding to the group accumulator may be preceded by left shifting the accumulated result (by the first width for the high-high combination and by the second width for the low-high and high-low combinations).
    Type: Grant
    Filed: June 9, 2020
    Date of Patent: March 7, 2023
    Assignees: VeriSilicon Microelectronics, VeriSilicon Holdings Co., Ltd.
    Inventors: Mankit Lo, Meng Yue, Jin Zhang
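The four-combination scheme in the abstract above is the standard decomposition of a full-width product into half-width pieces, with the shift amounts matching the abstract (full width for high-high, half width for the mixed terms). A rough Python model, with the widths and sequencing chosen only for illustration:

```python
def mac_split(a, b, width=16):
    """Compute a*b for unsigned width-bit values using only
    half-width multiplies, combined with the shifts described:
    high-high shifted by the first (full) width, high-low and
    low-high shifted by the second (half) width."""
    half = width // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    acc = 0
    acc += (a_hi * b_hi) << width   # high-high: shift by first width
    acc += (a_hi * b_lo) << half    # high-low:  shift by second width
    acc += (a_lo * b_hi) << half    # low-high:  shift by second width
    acc += a_lo * b_lo              # low-low:   no shift
    return acc
```

The algebra (a_hi*2^half + a_lo)(b_hi*2^half + b_lo) guarantees the four shifted partial products sum to the exact full-width product.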
  • Patent number: 11455781
    Abstract: The present disclosure provides a data reading/writing method and system for 3D image processing, a storage medium, and a terminal. The method includes the following steps: dividing a 3D image horizontally, based on a vertical sliding technique, into at least two subimages; storing the processing data of each subimage in a circular buffer; after a subimage is processed, retaining in the circular buffer the overlapping-portion data required by the next subimage; and dividing the multi-layer network of an image processing algorithm into at least two segments, so that the data between adjacent layers in each segment interacts only through the buffer, not through DDR.
    Type: Grant
    Filed: September 25, 2019
    Date of Patent: September 27, 2022
    Assignees: VeriSilicon Microelectronics (Shanghai) Co., Ltd., VeriSilicon Holdings Co., Ltd., VeriSilicon Microelectronics (Nanjing) Co., Ltd.
    Inventors: Zhonghao Cui, Mankit Lo, Ke Zhang, Huiming Zhang
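The slicing-with-retained-overlap step above can be sketched in a few lines. This is only a schematic of the data movement (a Python list stands in for the circular buffer, and the slice height and overlap are made-up parameters), not the patented buffering scheme:

```python
def sliding_slices(rows, slice_height, overlap):
    """Yield successive horizontal slices of `rows`, each prefixed
    with the `overlap` trailing rows of the previous slice: the data
    a vertical filter needs at the slice boundary, kept locally
    instead of being re-read from external memory (DDR)."""
    buf = []  # stands in for the circular buffer
    for i in range(0, len(rows), slice_height):
        slab = buf + rows[i:i + slice_height]
        yield slab
        buf = slab[-overlap:] if overlap else []
```

Because the overlap rows are retained locally, each input row crosses the external-memory interface exactly once.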
  • Publication number: 20220269484
    Abstract: Example accumulation systems and methods are described. In one implementation, data is received for processing. A multiplication operation is performed on the received data to generate multiplied data. An addition operation is performed on the multiplied data to generate a result. At least a portion of the least significant bits of the result is stored in a first region of an accumulation buffer of a convolution core, and at least a portion of the remaining bits of the result is stored in a shared memory that is separate from the convolution core.
    Type: Application
    Filed: February 19, 2021
    Publication date: August 25, 2022
    Inventor: Mankit Lo
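The low-bits/high-bits split in the abstract above can be modeled as an accumulator whose low bits live in a narrow on-core buffer while carries spill into a separate store. A hedged sketch (the 16-bit split and the class shape are illustrative only):

```python
class SplitAccumulator:
    """Sketch: a wide accumulator whose least significant bits stay
    in a narrow on-core buffer, while the remaining bits are kept in
    a separate (shared) memory and merged back on read-out."""
    def __init__(self, low_bits=16):
        self.low_bits = low_bits
        self.low = 0   # on-core accumulation buffer (LSBs)
        self.high = 0  # shared memory (remaining bits)

    def mac(self, a, b):
        s = self.low + a * b
        self.low = s & ((1 << self.low_bits) - 1)
        self.high += s >> self.low_bits  # carry out to shared memory

    def value(self):
        return (self.high << self.low_bits) | self.low
```

The payoff in hardware would be a narrower per-core buffer; here the split is exact because the carry is forwarded on every multiply-accumulate.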
  • Patent number: 11301214
    Abstract: A circuit for performing multiply/accumulate operations evaluates the type of each value of a pair of input values. Signed values are split into sign and magnitude. One or more pairs of arguments are input to a multiplier such that the arguments have fewer bits than the magnitude of signed values or unsigned values. This may include splitting input values into multiple arguments and inputting multiple pairs of arguments to the multiplier for a single pair of input values.
    Type: Grant
    Filed: June 9, 2020
    Date of Patent: April 12, 2022
    Assignees: VeriSilicon Microelectronics (Shanghai) Co., Ltd., VeriSilicon Holdings Co., Ltd.
    Inventors: Mankit Lo, Meng Yue, Jin Zhang
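The sign/magnitude handling described above can be sketched as a signed multiply routed through a multiplier that only takes narrow unsigned arguments. The 8-bit argument width and the two-piece split are assumptions for the example, not taken from the patent:

```python
def signed_narrow_mul(x, y, arg_bits=8):
    """Sketch: multiply two signed values using only unsigned
    products of arg_bits-wide pieces. Each input is split into sign
    and magnitude; each magnitude into two narrow arguments; four
    narrow unsigned products are shifted and combined."""
    sign = -1 if (x < 0) != (y < 0) else 1
    mx, my = abs(x), abs(y)
    mask = (1 << arg_bits) - 1
    x_lo, x_hi = mx & mask, mx >> arg_bits
    y_lo, y_hi = my & mask, my >> arg_bits
    prod = ((x_hi * y_hi) << (2 * arg_bits)) \
         + (((x_hi * y_lo) + (x_lo * y_hi)) << arg_bits) \
         + (x_lo * y_lo)
    return sign * prod
```

Separating the sign first keeps every multiplier argument unsigned, which is what lets the arguments be narrower than a sign-extended operand would be.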
  • Publication number: 20210382690
    Abstract: A device for performing multiply/accumulate operations processes values in first and second buffers and having a first width using a computational pipeline with a second width, such as half the first width. A sequencer processes combinations of portions (high-high, low-low, high-low, low-high) of the values in the first and second buffers using a multiply/accumulate circuit and adds the accumulated result of each combination of portions to a group accumulator. Adding to the group accumulator may be preceded by left shifting the accumulated result (by the first width for the high-high combination and by the second width for the low-high and high-low combinations).
    Type: Application
    Filed: June 9, 2020
    Publication date: December 9, 2021
    Inventors: Mankit Lo, Meng Yue, Jin Zhang
  • Publication number: 20210382689
    Abstract: A circuit for performing multiply/accumulate operations evaluates the type of each value of a pair of input values. Signed values are split into sign and magnitude. One or more pairs of arguments are input to a multiplier such that the arguments have fewer bits than the magnitude of signed values or unsigned values. This may include splitting input values into multiple arguments and inputting multiple pairs of arguments to the multiplier for a single pair of input values.
    Type: Application
    Filed: June 9, 2020
    Publication date: December 9, 2021
    Inventors: Mankit Lo, Meng Yue, Jin Zhang
  • Publication number: 20210318887
    Abstract: A system performs matrix multiplication of a vector by a two-dimensional matrix by evaluating whether the vector includes zero values. Rows of the matrix are loaded into a first memory device from a second memory device. Rows corresponding to the indexes of the zero values are not loaded. A dot product of columns of the matrix and the input vector is performed and stored. The matrix may be stored in the second memory device such that only entries for non-zero entries are stored. The rows of the matrix may be reconstructed in the first memory device from these entries.
    Type: Application
    Filed: April 9, 2020
    Publication date: October 14, 2021
    Inventors: Mankit Lo, Wei-Lun Kao, Yizhong Yang
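The row-skipping idea above can be sketched as a vector-matrix product that never fetches a matrix row whose corresponding vector entry is zero. A callback stands in for the second memory device; the load counter is just there to show which fetches were avoided:

```python
def sparse_vec_mat(x, fetch_row):
    """Sketch: compute x @ M while skipping the load of every row
    of M whose matching entry of x is zero. `fetch_row(i)` stands in
    for loading row i from the second memory device."""
    loads = 0
    result = None
    for i, xi in enumerate(x):
        if xi == 0:
            continue  # zero vector entry: skip the row load entirely
        row = fetch_row(i)
        loads += 1
        if result is None:
            result = [0] * len(row)
        for j, mij in enumerate(row):
            result[j] += xi * mij
    return result, loads
```

For sparse activation vectors this avoids both the memory traffic and the multiplies for the skipped rows.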
  • Publication number: 20210295607
    Abstract: The present disclosure provides a data reading/writing method and system for 3D image processing, a storage medium, and a terminal. The method includes the following steps: dividing a 3D image horizontally, based on a vertical sliding technique, into at least two subimages; storing the processing data of each subimage in a circular buffer; after a subimage is processed, retaining in the circular buffer the overlapping-portion data required by the next subimage; and dividing the multi-layer network of an image processing algorithm into at least two segments, so that the data between adjacent layers in each segment interacts only through the buffer, not through DDR.
    Type: Application
    Filed: September 25, 2019
    Publication date: September 23, 2021
    Applicants: VeriSilicon Microelectronics (Shanghai) Co., Ltd., VeriSilicon Holdings Co., Ltd., VeriSilicon Microelectronics (Nanjing) Co., Ltd.
    Inventors: Zhonghao CUI, Mankit LO, Ke ZHANG, Huiming ZHANG
  • Patent number: 10585623
    Abstract: A computer system includes a hardware buffer controller. Memory access requests to a buffer do not include an address within the buffer and threads accessing the buffer do not access or directly update any pointers to locations within the buffer. The memory access requests are addressed to the hardware buffer controller, which determines an address from its current state and issues a memory access command to that address. The hardware buffer controller updates its state in response to the memory access requests. The hardware buffer controller evaluates its state and outputs events to a thread scheduler in response to overflow or underflow conditions or near-overflow or near-underflow conditions. The thread scheduler may then block threads from issuing memory access requests to the hardware buffer controller. The buffer implemented may be a FIFO or other type of buffer.
    Type: Grant
    Filed: December 11, 2015
    Date of Patent: March 10, 2020
    Assignee: VIVANTE CORPORATION
    Inventor: Mankit Lo
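The controller described above can be modeled as a FIFO whose clients never see an address or pointer: push and pop compute the location internally and report the overflow/underflow events a thread scheduler could act on. A behavioral sketch only (the event strings and method names are invented):

```python
class BufferController:
    """Sketch: a FIFO whose clients issue accesses with no address.
    The controller derives the address from its own head/tail state
    and records overflow/underflow events for a scheduler."""
    def __init__(self, capacity):
        self.mem = [None] * capacity
        self.head = self.tail = self.count = 0
        self.events = []

    def push(self, value):
        if self.count == len(self.mem):
            self.events.append("overflow")  # scheduler could block writer
            return False
        self.mem[self.tail] = value         # address chosen by controller
        self.tail = (self.tail + 1) % len(self.mem)
        self.count += 1
        return True

    def pop(self):
        if self.count == 0:
            self.events.append("underflow")  # scheduler could block reader
            return None
        value = self.mem[self.head]
        self.head = (self.head + 1) % len(self.mem)
        self.count -= 1
        return value
```

Because no thread ever reads or updates the pointers, there is no pointer race to synchronize in software.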
  • Patent number: 10242311
    Abstract: A convolution engine, such as a convolution neural network, operates efficiently with respect to sparse kernels by implementing zero skipping. An input tile is loaded and accumulated sums are calculated for the input tile for non-zero coefficients by shifting the tile according to a row and column index of the coefficient in the kernel. Each coefficient is applied individually to the tile and the result written to an accumulation buffer before moving to the next non-zero coefficient. A 3D or 4D convolution may be implemented in this manner with separate regions of the accumulation buffer storing accumulated sums for different indexes along one dimension. Images are completely processed and results for each image are stored in the accumulation buffer before moving to the next image.
    Type: Grant
    Filed: August 8, 2017
    Date of Patent: March 26, 2019
    Assignee: VIVANTE CORPORATION
    Inventor: Mankit Lo
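The coefficient-at-a-time zero skipping described above can be shown on a 2D valid convolution: each non-zero coefficient is applied to the whole tile as one shifted pass into the accumulation buffer, and zero coefficients are skipped outright. A scalar Python sketch of the access pattern, not the engine itself:

```python
def conv2d_zero_skip(tile, kernel):
    """Sketch of zero skipping: for each non-zero kernel coefficient,
    shift the tile by the coefficient's (row, col) index and
    multiply-accumulate the whole shifted tile into the accumulation
    buffer. Zero coefficients cost nothing."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(tile) - kh + 1, len(tile[0]) - kw + 1
    acc = [[0] * w for _ in range(h)]  # accumulation buffer
    for r in range(kh):
        for c in range(kw):
            k = kernel[r][c]
            if k == 0:
                continue  # zero skipping: no pass for this coefficient
            for y in range(h):
                for x in range(w):
                    acc[y][x] += k * tile[y + r][x + c]
    return acc
```

With a sparse kernel the number of full-tile passes equals the number of non-zero coefficients, not the kernel size.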
  • Patent number: 9977619
    Abstract: A computer system processes instructions including an instruction code, source type, source address, destination type, and destination address. The source and destination type may indicate a memory device in which case data is read from the memory device at the source address and written to the destination address. One or both of the source type and destination type may include a transfer descriptor flag, in which case a transfer descriptor identified by the source or destination address is executed. A transfer descriptor referenced by a source address may be executed to obtain an intermediate result that is used for performing the operation indicated by the instruction code. The transfer descriptor referenced by a destination address may be executed to determine a location at which the result of the operation will be stored.
    Type: Grant
    Filed: November 6, 2015
    Date of Patent: May 22, 2018
    Assignee: Vivante Corporation
    Inventor: Mankit Lo
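The transfer-descriptor indirection above can be sketched as an interpreter in which a flagged source address names a descriptor executed to produce the operand, and a flagged destination address names one executed to produce the final location. Everything here (the instruction tuple shape, the "inc" opcode, the descriptor table) is invented to illustrate the control flow:

```python
def execute(instr, memory, descriptors):
    """Sketch: run one instruction of the form
    (opcode, (src_is_td, src), (dst_is_td, dst)).
    A set transfer-descriptor flag means the address names a
    descriptor: executed for the source to obtain an intermediate
    value, or for the destination to determine where to store."""
    op, (src_is_td, src), (dst_is_td, dst) = instr
    value = descriptors[src](memory) if src_is_td else memory[src]
    result = value + 1 if op == "inc" else value
    addr = descriptors[dst](memory) if dst_is_td else dst
    memory[addr] = result
```

The same opcode thus works unchanged whether its operands are plain memory cells or computed indirections.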
  • Patent number: 9928117
    Abstract: A computer system includes a hardware synchronization component (HSC). Multiple concurrent threads of execution issue instructions to update the state of the HSC. Multiple threads may update the state in the same clock cycle, and a thread does not need to receive control of the HSC prior to updating its state. Instructions referencing the state received during the same clock cycle are aggregated and the state is updated according to the number of the instructions. The state is evaluated with respect to a threshold condition. If it is met, then the HSC outputs an event to a processor. The processor then identifies a thread impacted by the event and takes a predetermined action based on the event (e.g. blocking, branching, unblocking of the thread).
    Type: Grant
    Filed: December 11, 2015
    Date of Patent: March 27, 2018
    Assignee: Vivante Corporation
    Inventor: Mankit Lo
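The aggregate-then-compare behavior above can be modeled per clock cycle: all updates arriving in one cycle are summed and applied at once, then the threshold condition is checked. A behavioral sketch with invented names and event shape:

```python
class SyncComponent:
    """Sketch: a shared counter that aggregates every update arriving
    in the same clock cycle (no thread needs exclusive control) and
    emits an event when a threshold condition is met."""
    def __init__(self, threshold):
        self.state = 0
        self.threshold = threshold
        self.events = []

    def clock(self, updates):
        # All of this cycle's updates are aggregated before applying,
        # so same-cycle writers never serialize on the component.
        self.state += sum(updates)
        if self.state >= self.threshold:
            self.events.append(("threshold", self.state))
```

A processor consuming `events` would then pick which waiting thread to block, branch, or unblock.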
  • Publication number: 20180046898
    Abstract: A convolution engine, such as a convolution neural network, operates efficiently with respect to sparse kernels by implementing zero skipping. An input tile is loaded and accumulated sums are calculated for the input tile for non-zero coefficients by shifting the tile according to a row and column index of the coefficient in the kernel. Each coefficient is applied individually to the tile and the result written to an accumulation buffer before moving to the next non-zero coefficient. A 3D or 4D convolution may be implemented in this manner with separate regions of the accumulation buffer storing accumulated sums for different indexes along one dimension. Images are completely processed and results for each image are stored in the accumulation buffer before moving to the next image.
    Type: Application
    Filed: August 8, 2017
    Publication date: February 15, 2018
    Inventor: Mankit Lo
  • Publication number: 20180046437
    Abstract: A convolution engine, such as a convolution neural network, operates efficiently with respect to sparse kernels by implementing zero skipping. An input tile is loaded and accumulated sums are calculated for the input tile for non-zero coefficients by shifting the tile according to a row and column index of the coefficient in the kernel. Each coefficient is applied individually to the tile and the result written to an accumulation buffer before moving to the next non-zero coefficient. A 3D or 4D convolution may be implemented in this manner with separate regions of the accumulation buffer storing accumulated sums for different indexes along one dimension. Images are completely processed and results for each image are stored in the accumulation buffer before moving to the next image.
    Type: Application
    Filed: August 8, 2017
    Publication date: February 15, 2018
    Inventor: Mankit Lo
  • Publication number: 20170168875
    Abstract: A computer system includes a hardware synchronization component (HSC). Multiple concurrent threads of execution issue instructions to update the state of the HSC. Multiple threads may update the state in the same clock cycle, and a thread does not need to receive control of the HSC prior to updating its state. Instructions referencing the state received during the same clock cycle are aggregated and the state is updated according to the number of the instructions. The state is evaluated with respect to a threshold condition. If it is met, then the HSC outputs an event to a processor. The processor then identifies a thread impacted by the event and takes a predetermined action based on the event (e.g. blocking, branching, unblocking of the thread).
    Type: Application
    Filed: December 11, 2015
    Publication date: June 15, 2017
    Inventor: Mankit Lo
  • Publication number: 20170168755
    Abstract: A computer system includes a hardware buffer controller. Memory access requests to a buffer do not include an address within the buffer and threads accessing the buffer do not access or directly update any pointers to locations within the buffer. The memory access requests are addressed to the hardware buffer controller, which determines an address from its current state and issues a memory access command to that address. The hardware buffer controller updates its state in response to the memory access requests. The hardware buffer controller evaluates its state and outputs events to a thread scheduler in response to overflow or underflow conditions or near-overflow or near-underflow conditions. The thread scheduler may then block threads from issuing memory access requests to the hardware buffer controller. The buffer implemented may be a FIFO or other type of buffer.
    Type: Application
    Filed: December 11, 2015
    Publication date: June 15, 2017
    Inventor: Mankit Lo
  • Publication number: 20170131939
    Abstract: A computer system processes instructions including an instruction code, source type, source address, destination type, and destination address. The source and destination type may indicate a memory device in which case data is read from the memory device at the source address and written to the destination address. One or both of the source type and destination type may include a transfer descriptor flag, in which case a transfer descriptor identified by the source or destination address is executed. A transfer descriptor referenced by a source address may be executed to obtain an intermediate result that is used for performing the operation indicated by the instruction code. The transfer descriptor referenced by a destination address may be executed to determine a location at which the result of the operation will be stored.
    Type: Application
    Filed: November 6, 2015
    Publication date: May 11, 2017
    Inventor: Mankit Lo
  • Patent number: 8077776
    Abstract: Motion estimation is described. A first portion of a predicted frame is obtained. The first portion is for a first predicted value. A first subset of a reference frame is obtained. The first subset is for a first reference value. Twice the first predicted value is subtracted from the first reference value. The outcome of the subtracting is multiplied by the first reference value to produce a partial result. The partial result is used for indication of a degree of difference between the first portion and the first subset.
    Type: Grant
    Filed: December 15, 2006
    Date of Patent: December 13, 2011
    Assignee: Xilinx, Inc.
    Inventor: Mankit Lo
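The partial result in the abstract, (r - 2p) * r summed over the block, relates to the sum of squared differences through the identity sum((p - r)^2) = sum(p^2) + sum((r - 2p) * r). Since sum(p^2) is fixed for a given predicted portion, the partial sum alone suffices to compare candidate reference subsets. A short Python check of the identity (the block values are arbitrary examples):

```python
def partial_distortion(pred_block, ref_block):
    """The partial result described: for each pair (p, r), subtract
    twice the predicted value from the reference value, multiply by
    the reference value, and sum over the block."""
    return sum((r - 2 * p) * r for p, r in zip(pred_block, ref_block))

def ssd(pred_block, ref_block):
    """Full sum of squared differences, for comparison."""
    return sum((p - r) ** 2 for p, r in zip(pred_block, ref_block))
```

Because sum(p^2) is a per-block constant, ranking candidates by partial_distortion gives the same ordering as ranking by ssd, with one fewer multiply per pixel pair.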