Patents by Inventor Kuen Hung Tsoi

Kuen Hung Tsoi has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240126684
    Abstract: A sparse data storage method for deep learning, a computer device and a storage medium. The method includes: obtaining an offset between the current non-zero data and the preceding non-zero data, and generating to-be-transmitted data from the current non-zero data and the offset, the to-be-transmitted data being stored in a first memory; obtaining the to-be-transmitted data, calculating an address increment from the offset, and deriving from the address increment the storage address at which the current non-zero data is to be stored in a second memory; and transmitting the current non-zero data to the second memory and storing it at that storage address. According to the embodiments, the power consumption and cost of deep learning operations can be reduced. (A sketch of this offset encoding follows this entry.)
    Type: Application
    Filed: July 31, 2023
    Publication date: April 18, 2024
    Applicant: Shenzhen Corerain Technologies Co., Ltd.
    Inventors: Kuen Hung TSOI, Xinyu Niu
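
The scheme is easy to model in software: each non-zero value is stored with its index distance from the previous non-zero value, so the receiver can rebuild absolute addresses by accumulating increments. A minimal Python sketch of that idea; the function names and list-based memories are illustrative, not from the patent:

```python
def encode_sparse(data):
    """Pack non-zero values as (offset, value) pairs.

    The offset is the index distance from the previous non-zero
    element, so the stream never stores absolute addresses.
    """
    pairs, prev_idx = [], -1
    for idx, value in enumerate(data):
        if value != 0:
            pairs.append((idx - prev_idx, value))
            prev_idx = idx
    return pairs

def decode_sparse(pairs, length):
    """Rebuild the dense array by accumulating address increments."""
    dense, addr = [0] * length, -1
    for offset, value in pairs:
        addr += offset          # address increment derived from the offset
        dense[addr] = value     # store at the reconstructed storage address
    return dense

assert decode_sparse(encode_sparse([0, 5, 0, 0, 7, 1]), 6) == [0, 5, 0, 0, 7, 1]
```
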
  • Publication number: 20230350974
    Abstract: The present application provides a quantized computation method and apparatus for depthwise convolution. The method includes: determining the n multipliers used for standard convolution in a preset part of the quantized computation; distributing the n multipliers equally between a first part and a second part of the depthwise convolution; in the depthwise convolution, computing a first result for a target pixel in a target block unit with a multiplier from the first part, and a second result for the same pixel with a multiplier from the second part; and obtaining the quantized results of the target block unit for the first and second parts from the first and second results of each target pixel. In this way, the multiplier resources are utilized to the maximum extent.
    Type: Application
    Filed: March 3, 2023
    Publication date: November 2, 2023
    Inventors: Xiayang Zhou, Kuen Hung Tsoi, Xinyu Niu
  • Publication number: 20230325307
    Abstract: Disclosed are an apparatus and a method for address generation, a data buffer, and an artificial intelligence chip. The apparatus includes address generating circuits, comprising N first address generating circuits and M second address generating circuits. The n-th first address generating circuit generates a first address y_n for each element of each first matrix required for computations on the n-th first dimension according to y_n = floor(a_n * x_n + b_n) * T_n, the first matrices being distributed along M second dimensions; the m-th second address generating circuit generates a second address y_m for each first matrix on the m-th second dimension according to y_m = floor(a_m * x_m + b_m) * T_m; and an address combining circuit generates the address for accessing each element of each first matrix by combining the second addresses on the M second dimensions with the first addresses on the N first dimensions. (A worked sketch of the formula follows this entry.)
    Type: Application
    Filed: March 16, 2023
    Publication date: October 12, 2023
    Inventors: Li Jiao, Kuen Hung Tsoi, Xinyu Niu
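
The per-dimension formula y = floor(a*x + b) * T can be checked directly in software. The sketch below assumes the combining circuit sums the per-dimension addresses, which the abstract does not specify; the (a, b, T) parameters follow the formula as stated:

```python
import math

def dim_address(params, x):
    """Per-dimension address: y = floor(a*x + b) * T (one circuit per dimension)."""
    a, b, t = params
    return math.floor(a * x + b) * t

def element_address(first_dims, second_dims, elem_coords, matrix_coords):
    """Combine per-dimension addresses, mirroring the address combining circuit."""
    addr = sum(dim_address(p, x) for p, x in zip(first_dims, elem_coords))
    addr += sum(dim_address(p, x) for p, x in zip(second_dims, matrix_coords))
    return addr

# Example: 2 first dimensions (within a matrix), 1 second dimension (across matrices).
first_dims = [(1.0, 0.0, 1), (1.0, 0.0, 8)]   # (a, b, T) per dimension: row stride 8
second_dims = [(1.0, 0.0, 64)]                 # matrix stride 64
print(element_address(first_dims, second_dims, (3, 2), (1,)))  # 3*1 + 2*8 + 1*64 = 83
```
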
  • Publication number: 20230325184
    Abstract: The present disclosure provides an artificial intelligence chip, an accelerator and an operation method, relating to the technical field of artificial intelligence. The chip comprises: a first operation circuit configured to execute a first operation and output a first operation result; a second operation circuit, connected in parallel with the first operation circuit, configured to execute a second operation identical to the first operation and output a second operation result; and a third operation circuit configured to, upon receiving the first and second operation results, execute a third operation, different from the first, on each of them and output the corresponding third operation results.
    Type: Application
    Filed: March 14, 2023
    Publication date: October 12, 2023
    Inventors: Jiadong Wang, Xinyu Niu, Kuen Hung Tsoi
  • Publication number: 20230307036
    Abstract: The present disclosure provides storage and access methods for parameters in a streaming AI accelerator chip, and relates to the technical field of artificial intelligence. The streaming-based data buffer comprises: a plurality of banks, different banks being configured to store different data; and a data read circuit configured to receive a read control signal and a read address corresponding to a computation task, and, in the case the read control signal corresponds to a first read mode, determine n banks from the plurality of banks based on the read control signal and read the first data required for the computation task in parallel from the n banks based on the read address, the first data comprising n pieces of data corresponding to the n banks in a one-to-one correspondence, where n ≥ 2 and n is a positive integer. (A sketch of the parallel bank read follows this entry.)
    Type: Application
    Filed: March 16, 2023
    Publication date: September 28, 2023
    Inventors: Chenglong Zeng, Kuen Hung Tsoi, Xinyu Niu
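
A toy software model of the first read mode: one read address, n selected banks, n pieces of data returned in one-to-one correspondence. The class and the bank-selection interface are illustrative assumptions:

```python
class BankedBuffer:
    """Toy model of a banked data buffer with a parallel read mode.

    Names and the bank-selection rule are illustrative; the patent only
    specifies that n >= 2 banks are read in parallel at one read address.
    """

    def __init__(self, num_banks, depth):
        self.banks = [[0] * depth for _ in range(num_banks)]

    def read_parallel(self, bank_ids, addr):
        # One read address, n banks: returns n pieces of data,
        # one per selected bank, in a one-to-one correspondence.
        return [self.banks[b][addr] for b in bank_ids]

buf = BankedBuffer(num_banks=4, depth=16)
buf.banks[0][3], buf.banks[2][3] = 11, 22
print(buf.read_parallel([0, 2], 3))  # [11, 22]
```
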
  • Publication number: 20230305976
    Abstract: A data flow-based neural network multi-engine synchronous calculation system includes: a plurality of calculation engines, each comprising a plurality of calculation modules and at least one cache module located at different layers, where each calculation module calculates an input calculation graph provided by the cache module or calculation module of the previous layer to obtain an output calculation graph; and at least one synchronization module, each configured to monitor the amount of input calculation graph data stored by the cache modules on the same layer of each calculation engine and, when that amount reaches a preset value for each cache module, control each cache module on that layer to output its stored input calculation graph to the calculation module on the next layer. (A sketch of this threshold-based synchronization follows this entry.)
    Type: Application
    Filed: June 4, 2021
    Publication date: September 28, 2023
    Inventors: Li Jiao, Yuanchao Li, Kuen Hung Tsoi, Xinyu Niu
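
The synchronization rule reduces to a threshold check across engines: a layer's cached inputs are released to the next layer only once every engine's cache holds its preset amount. A minimal sketch with illustrative names:

```python
def synchronize_layer(caches, thresholds):
    """Release a layer's caches to the next layer only when every engine's
    cache holds enough data (the preset value). Purely illustrative."""
    if all(len(c) >= t for c, t in zip(caches, thresholds)):
        released = [list(c) for c in caches]
        for c in caches:
            c.clear()
        return released          # forwarded to next-layer calculation modules
    return None                  # keep waiting; engines stay in lockstep

engine_caches = [[1, 2], [3]]
print(synchronize_layer(engine_caches, thresholds=[2, 1]))  # [[1, 2], [3]]
```
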
  • Publication number: 20230289065
    Abstract: A data flow control device in a streaming architecture chip includes at least one first data buffer module, at least one operation module and at least one second data buffer module. The second data buffer module sends a flow control count signal to the first data buffer module, informing it of the amount of data the second data buffer module can still accept. The first data buffer module then sends a data signal and a valid signal to the second data buffer module via the operation modules according to the flow control count signal, the valid signal indicating that the corresponding data signal is valid. (A sketch of this credit-style flow control follows this entry.)
    Type: Application
    Filed: March 6, 2023
    Publication date: September 14, 2023
    Inventors: Chenchen Lu, Kuen Hung Tsoi, Xinyu Niu
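
What the abstract describes is, in essence, credit-based flow control: the downstream buffer advertises how much it can accept, and the upstream buffer sends no more than that. A small Python sketch of the idea (class and method names are illustrative):

```python
class Downstream:
    """Receiver that advertises remaining capacity (the flow control count)."""
    def __init__(self, capacity):
        self.queue, self.capacity = [], capacity

    def credits(self):
        return self.capacity - len(self.queue)   # flow control count signal

    def accept(self, data):
        self.queue.append(data)

def send(upstream, downstream):
    """Sender transmits only as much as the advertised credits allow."""
    for _ in range(min(len(upstream), downstream.credits())):
        downstream.accept(upstream.pop(0))        # data signal + implicit valid

rx = Downstream(capacity=2)
tx = [10, 20, 30]
send(tx, rx)
print(rx.queue, tx)  # [10, 20] [30] -- the third item waits for new credits
```
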
  • Publication number: 20230267173
    Abstract: Disclosed are a chip, a method, an accelerator, and a system for pooling operations. The chip includes: a demultiplexer with a first input terminal and first and second output terminals, which routes a first matrix from the first input terminal to the first or second output terminal in response to a first control signal; a first memory connected to the first output terminal, which outputs the stored elements of the first matrix in response to a second control signal; a second memory connected to the second output terminal, which serially outputs the elements of a second matrix within the first matrix in response to a third control signal; and a computation circuit that performs a pooling operation on the second matrix from either memory to obtain an operation result in response to a fourth control signal. (A sketch of windowed pooling follows this entry.)
    Type: Application
    Filed: January 18, 2023
    Publication date: August 24, 2023
    Inventors: Jiadong Wang, Kuen Hung Tsoi, Xinyu Niu
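
The pooling computation itself is standard; the patent's contribution is in how the matrices are routed and stored. For reference, a plain max-pooling function over non-overlapping windows, where each window plays the role of the "second matrix" sliced from the first:

```python
def max_pool2d(matrix, window):
    """Plain max pooling over non-overlapping square windows."""
    h, w = len(matrix), len(matrix[0])
    out = []
    for i in range(0, h - window + 1, window):
        row = []
        for j in range(0, w - window + 1, window):
            # Each window is a small "second matrix" inside the first matrix.
            row.append(max(matrix[i + di][j + dj]
                           for di in range(window) for dj in range(window)))
        out.append(row)
    return out

print(max_pool2d([[1, 2, 5, 6],
                  [3, 4, 7, 8],
                  [9, 1, 2, 3],
                  [4, 5, 6, 7]], window=2))  # [[4, 8], [9, 7]]
```
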
  • Publication number: 20230267300
    Abstract: Disclosed are a data stream-based computation unit, an artificial intelligence chip, and an accelerator. The computation unit includes a plurality of computation circuits, each with a first input terminal and a second input terminal. The M first input terminals of the M computation circuits receive, on a one-to-one basis, M pieces of first data required for a computation task, where M ≥ 2 and M is a positive integer; the M second input terminals receive, on a one-to-one basis, M mutually distinct pieces of second data required for the computation task; and the M computation circuits perform the computation task in parallel on the basis of the M pieces of first data and the M pieces of second data, each computation circuit operating on one piece of first data and one piece of second data.
    Type: Application
    Filed: January 18, 2023
    Publication date: August 24, 2023
    Inventors: Chenchen Lu, Kuen Hung Tsoi, Xinyu Niu
  • Publication number: 20230252600
    Abstract: The present application discloses an image size adjustment structure, an adjustment method, and an image scaling method and device based on a streaming architecture. The image size adjustment structure includes a first multiplication operation unit, a second multiplication operation unit, a first data registering unit, a second data registering unit, a first addition operation unit and a second addition operation unit, with the input and output ports of each unit connected according to a specified data flow direction. This enables fast computation on image data and relieves the computational load on the CPU. (An interpolation sketch follows this entry.)
    Type: Application
    Filed: April 17, 2023
    Publication date: August 10, 2023
    Inventors: Jiantian Liang, Kuen Hung Tsoi, Xinyu Niu
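
The listed units (two multipliers, two registers, two adders) match what one linear-interpolation step consumes. As a point of reference, a standard bilinear resize built from that primitive; this is textbook interpolation, not necessarily the patent's exact datapath:

```python
def lerp(a, b, t):
    # One interpolation step: two multiplies plus adds, the same primitive
    # mix the patent's multiplier/adder units would stream.
    return a * (1.0 - t) + b * t

def resize_bilinear(img, out_h, out_w):
    """Resize a 2-D list of pixel values with bilinear interpolation."""
    in_h, in_w = len(img), len(img[0])
    out = [[0.0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        fy = y * (in_h - 1) / max(out_h - 1, 1)
        y0, ty = int(fy), fy - int(fy)
        y1 = min(y0 + 1, in_h - 1)
        for x in range(out_w):
            fx = x * (in_w - 1) / max(out_w - 1, 1)
            x0, tx = int(fx), fx - int(fx)
            x1 = min(x0 + 1, in_w - 1)
            top = lerp(img[y0][x0], img[y0][x1], tx)
            bot = lerp(img[y1][x0], img[y1][x1], tx)
            out[y][x] = lerp(top, bot, ty)
    return out

print(resize_bilinear([[0, 10], [20, 30]], 3, 3))
```
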
  • Publication number: 20230251979
    Abstract: The embodiments of the present application provide a data processing method and apparatus of an AI chip and a computer device. The data processing method of the AI chip includes: determining a target AI model for processing data to be processed; matching, in the AI chip, a data flow network corresponding to the target AI model and a data flow direction of the data flow network; and processing the data to be processed based on the data flow network and the data flow direction.
    Type: Application
    Filed: June 22, 2021
    Publication date: August 10, 2023
    Inventors: Kuen Hung Tsoi, Xinyu Niu
  • Publication number: 20230251827
    Abstract: A floating-point unit, a configuration method and device thereof, an artificial intelligence chip, and an accelerator. The floating-point unit is streaming-based and includes: a data input end; N multiplexers, each with a first input end, a second input end, and a first output end, the first input end of the 1st multiplexer being connected to the data input end and the first input end of the i-th multiplexer being connected to the first output end of the (i-1)-th multiplexer, where N ≥ 2 and 2 ≤ i ≤ N; N floating-point operation circuits, the 1st being connected between the data input end and the second input end of the 1st multiplexer, and the i-th being connected between the first output end of the (i-1)-th multiplexer and the second input end of the i-th multiplexer; and a data output end connected to the first output end of the N-th multiplexer. (A sketch of this mux-chained pipeline follows this entry.)
    Type: Application
    Filed: January 18, 2023
    Publication date: August 10, 2023
    Inventors: Jiantian Liang, Kuen Hung Tsoi, Xinyu Niu
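
The topology is a chain of N operation stages, each followed by a 2:1 multiplexer that either takes the stage's result or passes the previous value through, so any subset of stages can be applied. A behavioral sketch (stage functions and the select vector are illustrative):

```python
def fp_unit(x, stages, select):
    """Chain of N stages, each followed by a 2:1 mux that either takes the
    stage's result or bypasses it (passes the previous value through).

    `stages` are the floating-point operations; `select[i]` is True when the
    i-th mux picks the operated value. Illustrative of the topology only.
    """
    value = x
    for op, use_op in zip(stages, select):
        value = op(value) if use_op else value   # mux: operate or bypass
    return value

stages = [lambda v: v * 2.0, lambda v: v + 0.5, lambda v: v ** 2]
print(fp_unit(3.0, stages, select=[True, False, True]))   # (3*2)^2 = 36.0
print(fp_unit(3.0, stages, select=[False, True, False]))  # 3 + 0.5 = 3.5
```
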
  • Publication number: 20230205607
    Abstract: A data stream architecture-based accelerator includes a storage unit, a read-write address generation unit and a computing unit. The storage unit includes a plurality of banks. The read-write address generation unit is used for generating storage unit read-write addresses according to a preset read-write parallelism, determining target banks in the storage unit according to the storage unit read-write addresses and reading to-be-processed data from the target banks for operations in the computing unit. The computing unit includes a plurality of data paths and is configured to determine target data paths according to a preset computing parallelism so that the target data paths can perform operations on the to-be-processed data to obtain processed data, and then store the processed data into the target banks according to the storage unit read-write addresses.
    Type: Application
    Filed: December 26, 2022
    Publication date: June 29, 2023
    Inventors: Chenglong Zeng, Kuen Hung Tsoi, Xinyu Niu
  • Patent number: 11677902
    Abstract: Provided is a data processing system. The system includes a data source, a data receiver, a plurality of source code data frame buffer regions, a data processing module and a state register. The data source generates a data frame; the data receiver receives the data frame and writes it into one of the data frame buffer regions; each of the source code data frame buffer regions stores a data frame to be processed; the data processing module performs subsequent processing on the data; and the state register stores the state of the system and the states of the source code data frame buffer regions. (A multi-buffering sketch follows this entry.)
    Type: Grant
    Filed: October 9, 2018
    Date of Patent: June 13, 2023
    Assignee: Shenzhen Corerain Technologies Co., Ltd.
    Inventors: Xinyu Niu, Kuen Hung Tsoi
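
The buffer-region scheme generalizes double buffering: the receiver fills whichever region is free while the processor drains occupied ones, with a state word tracking occupancy. A sketch under those assumptions; the bitmask state layout is illustrative:

```python
class FrameBufferPool:
    """Receiver writes frames into free buffer regions while a processor
    drains them; a state word tracks which regions are occupied.
    Illustrative of multi-buffering, not the patent's exact register layout."""

    def __init__(self, num_regions):
        self.regions = [None] * num_regions
        self.state = 0                      # bit i set => region i holds a frame

    def receive(self, frame):
        for i in range(len(self.regions)):
            if not (self.state >> i) & 1:
                self.regions[i] = frame
                self.state |= 1 << i
                return i
        raise BufferError("all frame buffer regions busy")

    def process(self, i):
        frame, self.regions[i] = self.regions[i], None
        self.state &= ~(1 << i)            # mark region free again
        return frame

pool = FrameBufferPool(2)
a = pool.receive("frame-A")
pool.receive("frame-B")
print(pool.process(a), bin(pool.state))   # frame-A 0b10
```
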
  • Publication number: 20230139106
    Abstract: Provided are a conversion method and apparatus for a deep learning model, a server, and a storage medium. The method includes: parsing a target deep learning model into an intermediate representation of an instruction set computation graph; converting the intermediate representation of the instruction set computation graph into an intermediate representation of a data flow computation graph; adjusting the intermediate representation of the data flow computation graph to an intermediate representation of a customized architecture; and obtaining a converted target data flow network model corresponding to the target deep learning model according to the intermediate representation of the customized architecture.
    Type: Application
    Filed: January 5, 2021
    Publication date: May 4, 2023
    Inventors: Chao XIONG, Kuen Hung TSOI, Xinyu NIU
  • Publication number: 20230128529
    Abstract: An acceleration system includes: a direct memory accessor configured to store a computation graph, a first data stream lake buffer and a second data stream lake buffer, the first data stream lake buffer being configured to cache the computation graph; an arithmetic unit configured to operate on the i-th layer of computing nodes of the computation graph to obtain the (i+1)-th layer of computing nodes; and a first fan-out device configured to replicate the (i+1)-th layer of computing nodes and store the copies in the direct memory accessor and the second data stream lake buffer, respectively. The arithmetic unit then extracts the (i+1)-th layer of computing nodes from the second data stream lake buffer to obtain the (i+2)-th layer, and these steps repeat until the n-th layer of computing nodes is obtained, where 1 ≤ i ≤ n-3, n ≥ 4, and i and n are positive integers. (A sketch of this layer-by-layer loop follows this entry.)
    Type: Application
    Filed: December 22, 2022
    Publication date: April 27, 2023
    Inventors: Chenglong Zeng, Yuanchao Li, Kuen Hung Tsoi, Xinyu Niu
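
The control flow amounts to a loop in which each computed layer is fanned out twice: one copy goes to off-chip storage, the other feeds the next iteration. A minimal sketch; `compute` stands in for the arithmetic unit and the lists for the DMA and buffer:

```python
def run_layers(compute, first_layer, n):
    """Iterate layer-by-layer: each result is fanned out to both off-chip
    storage (DMA) and the on-chip buffer feeding the next iteration.
    Names are illustrative; the control flow mirrors the abstract."""
    dma, buffer = [first_layer], first_layer
    for _ in range(n - 1):
        nxt = compute(buffer)      # arithmetic unit: layer i -> layer i+1
        dma.append(nxt)            # fan-out copy 1: direct memory accessor
        buffer = nxt               # fan-out copy 2: second data stream lake buffer
    return dma

print(run_layers(lambda layer: [x + 1 for x in layer], [0, 0], n=4))
# [[0, 0], [1, 1], [2, 2], [3, 3]]
```
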
  • Publication number: 20230126978
    Abstract: Embodiments of the present disclosure provide an artificial intelligence (AI) chip and an AI chip-based data processing method. The AI chip includes: a data flow network for processing, on the basis of an AI algorithm, data to be processed. The data flow network includes: at least one calculation module, each configured to calculate, on the basis of one of at least one operation node corresponding to the AI algorithm, the data to be processed, and output a calculation result; and a next transfer module corresponding to each calculation module, connected to each calculation module, and configured to receive the calculation result output by each calculation module and process the calculation result, the data to be processed flowing in the data flow network according to a preset data flow direction.
    Type: Application
    Filed: December 20, 2022
    Publication date: April 27, 2023
    Inventors: Kuen Hung Tsoi, Xinyu Niu
  • Publication number: 20230128421
    Abstract: An embodiment of the present application discloses a neural network accelerator, including: a convolution calculation module, which is used to perform a convolution operation on an input data input into a preset neural network to obtain a first output data; a tail calculation module, which is used to perform a calculation on the first output data to obtain a second output data; a storage module, which is used to cache the input data and the second output data; and a first control module, which is used to transmit the first output data to the tail calculation module. The convolution calculation module includes a plurality of convolution calculation units, the tail calculation module includes a plurality of tail calculation units, the first control module includes a plurality of first control units, and at least two convolution calculation units are connected to one tail calculation unit through one first control unit.
    Type: Application
    Filed: December 21, 2022
    Publication date: April 27, 2023
    Inventors: Chenglong Zeng, Yuanchao Li, Kuen Hung Tsoi, Xinyu Niu
  • Publication number: 20230036414
    Abstract: Provided are a neural network acceleration circuit and method. The neural network acceleration circuit includes a data storage module, a data cache module, a computing module, and a delay processing module. The data storage module stores the input data required for a neural network computation. The data cache module caches the input data output by the data storage module. The computing module includes multiple computing units that operate on the input data output by the data cache module to obtain multiple groups of output data. The delay processing module performs delay processing on the multiple groups of output data separately so that all groups are output at the same time. (An output-alignment sketch follows this entry.)
    Type: Application
    Filed: December 16, 2020
    Publication date: February 2, 2023
    Inventors: Li JIAO, Yuanchao LI, Kuen Hung TSOI, Xinyu NIU
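
The delay processing can be pictured as padding each computing unit's output stream so that all groups emerge on the same cycle. A loose sketch of that alignment; the cycle accounting here is illustrative, not the patent's circuit:

```python
def align_outputs(groups, latencies):
    """Pad each compute unit's output stream so every group emerges at the
    same cycle, as the delay processing module does. Illustrative only."""
    target = max(latencies)
    return [[None] * (target - lat) + list(g)    # insert delay slots
            for g, lat in zip(groups, latencies)]

# Unit 0 finishes after 1 cycle, unit 1 after 3: pad unit 0 by 2 cycles.
print(align_outputs([[7, 8], [5, 6]], latencies=[1, 3]))
# [[None, None, 7, 8], [5, 6]]
```
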
  • Publication number: 20230035910
    Abstract: Provided are a method for the parallel processing of data, a device, and a storage medium. The method includes: identifying, from multiple first computing nodes, at least three first computing nodes which have a logical relationship and defining them as a first parallel node group, where the first parallel node group includes a first preceding node and at least two first subsequent nodes; acquiring a first input data model of the first preceding node and generating a first input tensor of the first preceding node; computing a first output tensor of the first preceding node according to the first input data model and the first input tensor; and acquiring a second input data model of the at least two first subsequent nodes and using the first output tensor as the second input tensor. (A sketch of the preceding/subsequent node flow follows this entry.)
    Type: Application
    Filed: December 23, 2020
    Publication date: February 2, 2023
    Inventors: Kai MA, Chao XIONG, Kuen Hung TSOI, Xinyu NIU
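
The data flow within a parallel node group is straightforward to sketch: the preceding node's output tensor becomes the shared input tensor of its parallel subsequent nodes. Plain functions stand in for computing nodes here:

```python
def run_parallel_group(preceding, subsequents, first_input):
    """A preceding node's output tensor is reused as the shared input tensor
    of its parallel subsequent nodes. Functions stand in for compute nodes."""
    second_input = preceding(first_input)            # first output tensor
    return [node(second_input) for node in subsequents]

double = lambda t: [x * 2 for x in t]
print(run_parallel_group(double, [sum, max], [1, 2, 3]))  # [12, 6]
```
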