Patents by Inventor Xinyu NIU
Xinyu NIU has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250147725
Abstract: The present application provides an accumulator, a method for the accumulator, and a chip circuit. The accumulator includes an accumulation execution module, a difference correction module, and a first register. The accumulation execution module performs accumulation operations on floating-point input data; the data truncated in each operation is retained as a truncation error and fed back to the difference correction module. The difference correction module superimposes the fed-back truncation error on the external input data and outputs the corrected input data to the accumulation execution module. The first register buffers the accumulation result of the accumulation execution module, and the result is sent back to the accumulation execution module through a feedback channel. This improves calculation efficiency while also improving the accumulation accuracy of floating-point numbers.
Type: Application
Filed: April 30, 2024
Publication date: May 8, 2025
Inventors: Chenchen Lu, Kuen Hung Tsoi, Xinyu Niu
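The error-feedback loop described here behaves much like compensated (Kahan) summation: the bits lost in each addition are carried forward and superimposed on the next input. A minimal Python sketch under that reading; all names are illustrative, not taken from the patent.

```python
def compensated_accumulate(values):
    total = 0.0       # accumulation result (the "first register")
    truncation = 0.0  # truncation error fed back by the correction module
    for x in values:
        corrected = x - truncation     # superimpose fed-back error on the input
        new_total = total + corrected  # accumulation execution
        # The low-order bits lost in this addition become the next truncation error.
        truncation = (new_total - total) - corrected
        total = new_total              # result buffered and fed back
    return total

vals = [1e16] + [1.0] * 16
print(sum(vals))                     # 1e16 -- the sixteen 1.0s are rounded away
print(compensated_accumulate(vals))  # 1.0000000000000016e16 -- error recovered
```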
-
Patent number: 12292841
Abstract: Embodiments of the present application provide a data processing method and apparatus for an AI chip, and a computer device. The data processing method of the AI chip includes: determining a target AI model for processing data to be processed; matching, in the AI chip, a data flow network corresponding to the target AI model and a data flow direction of the data flow network; and processing the data to be processed based on the data flow network and the data flow direction.
Type: Grant
Filed: June 22, 2021
Date of Patent: May 6, 2025
Assignee: Shenzhen Corerain Technologies Co., Ltd.
Inventors: Kuen Hung Tsoi, Xinyu Niu
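A software stand-in for the dispatch the abstract describes: pick the target model, look up the data flow network and flow direction matched to it, then stream the data through. The registry contents and stage names are hypothetical.

```python
NETWORK_REGISTRY = {
    # model name -> (ordered dataflow stages, flow direction); illustrative only
    "resnet50": (["conv", "bn", "relu", "pool", "fc"], "forward"),
}

def process(model_name, data):
    stages, direction = NETWORK_REGISTRY[model_name]  # match network + direction
    if direction == "backward":
        stages = list(reversed(stages))
    for stage in stages:                              # data flows stage to stage
        data = f"{stage}({data})"
    return data

print(process("resnet50", "x"))  # fc(pool(relu(bn(conv(x)))))
```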
-
Publication number: 20250138780
Abstract: The present application provides an addition tree computation device and method, and a computing device, for addition of floating-point numbers and fixed-point numbers. The data input module receives input data and calculation type instructions. According to the calculation type instructions, the transmission control module controls the first multiplexer to send floating-point numbers to the first entrance of the fusion calculation module, or to send fixed-point numbers to its second entrance. The fusion calculation module performs the addition operations, and the data normalization output module processes the operation results under the control of the transmission control module and outputs the final calculation results.
Type: Application
Filed: April 30, 2024
Publication date: May 1, 2025
Inventors: Chenchen Lu, Kuen Hung Tsoi, Xinyu Niu
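A minimal sketch of the routing above, with a function standing in for the multiplexer: the calculation-type instruction steers inputs to either the floating-point or the fixed-point entrance of one shared addition tree. The Q8.8 fixed-point format is an assumption for illustration.

```python
def adder_tree(values):
    """Pairwise (tree) reduction, the shape a hardware addition tree computes in."""
    while len(values) > 1:
        paired = [values[i] + values[i + 1] for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:
            paired.append(values[-1])     # odd element passes through this level
        values = paired
    return values[0]

def fused_add(inputs, calc_type):
    if calc_type == "float":              # first entrance: floating point
        return adder_tree([float(x) for x in inputs])
    elif calc_type == "fixed":            # second entrance: Q8.8 fixed point
        scaled = [int(round(x * 256)) for x in inputs]
        return adder_tree(scaled) / 256   # normalize the result on output
    raise ValueError(calc_type)

print(fused_add([1.5, 2.25, 0.75, 3.0], "float"))  # 7.5
print(fused_add([1.5, 2.25, 0.75, 3.0], "fixed"))  # 7.5
```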
-
Method for Verifying Correctness of Model Conversion Under Deployment Framework and Computing Device
Publication number: 20250124344
Abstract: A method for verifying correctness of model conversion under a deployment framework, and a computing device. The method includes: acquiring, under a training framework, a trained model to be converted; acquiring a first intermediate result of the trained model as contrast data; converting the trained model into a deployment model; loading the deployment model under the deployment framework; executing the deployment model and acquiring a second intermediate result; and comparing the second intermediate result of the deployment model with the contrast data of the trained model, so as to locate a correctness-related problem of the deployment model before the deployment model completes execution. Accordingly, a problem node can be located quickly and accurately.
Type: Application
Filed: June 6, 2024
Publication date: April 17, 2025
Applicant: Shenzhen Corerain Technologies Co., Ltd.
Inventors: Kuen Hung Tsoi, Xinyu Niu
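A hedged sketch of the comparison loop: capture intermediate results under the training framework, re-run under the deployment framework, and stop at the first node whose output diverges. Models here are plain lists of (name, function) layers; real frameworks would use hooks or dump APIs.

```python
def run_with_intermediates(layers, x):
    outs = {}
    for name, fn in layers:
        x = fn(x)
        outs[name] = x                                 # per-node intermediate result
    return outs

def locate_problem_node(trained, deployed, x, tol=1e-5):
    contrast = run_with_intermediates(trained, x)      # first intermediate results
    candidate = run_with_intermediates(deployed, x)    # second intermediate results
    for name, _ in trained:
        if abs(candidate[name] - contrast[name]) > tol:
            return name                                # first divergent node
    return None

trained  = [("scale", lambda v: v * 2.0), ("shift", lambda v: v + 1.0)]
deployed = [("scale", lambda v: v * 2.0), ("shift", lambda v: v + 1.5)]  # buggy conversion
print(locate_problem_node(trained, deployed, 3.0))  # shift
```
-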
Patent number: 12271326
Abstract: A data flow-based neural network multi-engine synchronous calculation system includes: a plurality of calculation engines, each including a plurality of calculation modules and at least one cache module located at different layers, where each calculation module calculates an input calculation graph provided by the cache module or calculation module of the previous layer to obtain an output calculation graph; and at least one synchronization module, each of which monitors the amount of input calculation graph data stored by the cache modules on the same layer in each calculation engine and, when that amount reaches the preset value corresponding to each cache module, directs the cache modules on that layer to output the stored input calculation graph to the calculation modules on the next layer.
Type: Grant
Filed: June 4, 2021
Date of Patent: April 8, 2025
Assignee: Shenzhen Corerain Technologies Co., Ltd.
Inventors: Li Jiao, Yuanchao Li, Kuen Hung Tsoi, Xinyu Niu
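A small simulation of the synchronization rule: a monitor releases a layer's caches to the next layer only once every engine's cache on that layer holds the preset amount of data, keeping the engines in lockstep. Class and method names are invented for the sketch.

```python
class LayerSync:
    def __init__(self, num_engines, preset):
        self.caches = [[] for _ in range(num_engines)]  # one cache per engine
        self.preset = preset                            # preset data amount

    def push(self, engine, item):
        self.caches[engine].append(item)

    def try_release(self):
        # Release only when every engine has buffered `preset` items.
        if all(len(c) >= self.preset for c in self.caches):
            batch = [c[:self.preset] for c in self.caches]
            for c in self.caches:
                del c[:self.preset]
            return batch        # forwarded to each engine's next-layer module
        return None             # hold everyone back until all engines catch up

sync = LayerSync(num_engines=2, preset=2)
sync.push(0, "a0"); sync.push(0, "a1"); sync.push(1, "b0")
print(sync.try_release())       # None -- engine 1 is still short
sync.push(1, "b1")
print(sync.try_release())       # [['a0', 'a1'], ['b0', 'b1']]
```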
-
Publication number: 20250103328
Abstract: The present disclosure provides a streaming-based computation circuit, method, and artificial intelligence chip. The computation circuit includes multiple groups of computation units, among them a first group and a second group, where the second group outputs a first matrix after each calculation; and a buffer unit configured to perform one or more first operations. The first operations include: buffering M first matrices consecutively output by the second group of computation units, concatenating the M first matrices into a second matrix whose number of elements is not greater than the calculation parallelism of the first computation unit in the first group, and consecutively outputting the second matrix to the first computation unit N times to perform N calculations.
Type: Application
Filed: April 30, 2024
Publication date: March 27, 2025
Inventors: Li Jiao, Kuen Hung Tsoi, Xinyu Niu
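A sketch of the buffering step, using flat lists as a stand-in for matrices: collect M consecutive outputs of the second group, concatenate them, and check that the result fits the first unit's parallelism before handing it over. Names are illustrative.

```python
class ConcatBuffer:
    def __init__(self, m, parallelism):
        self.m, self.parallelism = m, parallelism
        self.pending = []

    def push(self, first_matrix):
        self.pending.append(first_matrix)
        if len(self.pending) < self.m:
            return None                     # keep buffering
        second = [e for mat in self.pending for e in mat]  # concatenate M outputs
        assert len(second) <= self.parallelism             # must fit the compute unit
        self.pending.clear()
        return second                       # ready to send N times downstream

buf = ConcatBuffer(m=2, parallelism=8)
print(buf.push([1, 2]))     # None -- only one of M matrices buffered
print(buf.push([3, 4]))     # [1, 2, 3, 4] -- ready for the first compute unit
```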
-
Patent number: 12216611
Abstract: Embodiments of the present disclosure provide an artificial intelligence (AI) chip and an AI chip-based data processing method. The AI chip includes a data flow network for processing, on the basis of an AI algorithm, data to be processed. The data flow network includes: at least one calculation module, each configured to calculate the data to be processed on the basis of one of at least one operation node corresponding to the AI algorithm and to output a calculation result; and, for each calculation module, a next transfer module connected to it and configured to receive and process the calculation result it outputs, the data to be processed flowing in the data flow network according to a preset data flow direction.
Type: Grant
Filed: December 20, 2022
Date of Patent: February 4, 2025
Assignee: Shenzhen Corerain Technologies Co., Ltd.
Inventors: Kuen Hung Tsoi, Xinyu Niu
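A minimal software model of that topology: each calculation module computes one operation node and its transfer module hands the result to the next module, so data flows through in a preset direction. The modules here are placeholder functions.

```python
calc_modules = [lambda x: x * 2, lambda x: x + 3, lambda x: x ** 2]

def run_dataflow(x, modules):
    # Each module's result is handed to its transfer module, which forwards it
    # to the next calculation module along the preset flow direction.
    for calc in modules:
        x = calc(x)
    return x

print(run_dataflow(2, calc_modules))  # ((2*2)+3)**2 = 49
```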
-
Patent number: 12189601
Abstract: A data compression and decompression method, and an electronic device. The method includes the following steps: establishing an initial lookup table by using data with the same value in the dataset to be compressed as one index; sequentially building a Huffman tree corresponding to each index and adding a separator to obtain an encoding list containing a target encoding value and length; and adding the encoding list to the initial lookup table to obtain a target lookup table. For decompression, the bitstream data is split according to the separator, the target lookup table is searched in parallel, and the indexes are used to obtain the decompression result of the data to be decompressed. Embodiments can perform decompression in parallel to increase decompression speed, so that it meets an AI engine's real-time demand for a large amount of weight data bandwidth.
Type: Grant
Filed: July 31, 2023
Date of Patent: January 7, 2025
Assignee: Shenzhen Corerain Technologies Co., Ltd.
Inventors: Yuanchao Li, Kuen Hung Tsoi, Xinyu Niu
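A loose sketch of the scheme (it also matches the two related entries below): distinct values become lookup-table indexes, a Huffman tree assigns each one a code, and the bitstream is cut at separators so blocks can be decoded in parallel. The exact encoding-list layout is not public here, so the separator is modeled simply as a block boundary.

```python
import heapq
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def build_codes(data):
    # Standard Huffman construction over the distinct values (the indexes).
    heap = [(n, i, [sym]) for i, (sym, n) in enumerate(Counter(data).items())]
    heapq.heapify(heap)
    codes = {sym: "" for sym in set(data)}
    while len(heap) > 1:
        n1, _, s1 = heapq.heappop(heap)
        n2, i2, s2 = heapq.heappop(heap)
        for s in s1: codes[s] = "0" + codes[s]
        for s in s2: codes[s] = "1" + codes[s]
        heapq.heappush(heap, (n1 + n2, i2, s1 + s2))
    return codes

def decode_block(bits, table):
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in table:            # prefix-free, so first hit is the symbol
            out.append(table[cur]); cur = ""
    return out

data = [3, 3, 3, 7, 7, 9, 3]
codes = build_codes(data)
table = {c: s for s, c in codes.items()}          # target lookup table
blocks = ["".join(codes[s] for s in data[:4]),    # separator = block boundary
          "".join(codes[s] for s in data[4:])]
with ThreadPoolExecutor() as ex:                  # parallel decompression
    print(sum(ex.map(lambda b: decode_block(b, table), blocks), []))
    # [3, 3, 3, 7, 7, 9, 3]
```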
-
Patent number: 12147410
Abstract: A data compression and decompression method, and an electronic device. The method includes the following steps: establishing an initial lookup table by using data with the same value in the dataset to be compressed as one index; sequentially building a Huffman tree corresponding to each index and adding a separator to obtain an encoding list containing a target encoding value and length; and adding the encoding list to the initial lookup table to obtain a target lookup table. For decompression, the bitstream data is split according to the separator, the target lookup table is searched in parallel, and the indexes are used to obtain the decompression result of the data to be decompressed. Embodiments can perform decompression in parallel to increase decompression speed, so that it meets an AI engine's real-time demand for a large amount of weight data bandwidth.
Type: Grant
Filed: July 31, 2023
Date of Patent: November 19, 2024
Assignee: Shenzhen Corerain Technologies Co., Ltd.
Inventors: Yuanchao Li, Kuen Hung Tsoi, Xinyu Niu
-
Patent number: 12112043
Abstract: A data flow control device in a streaming architecture chip includes at least one first data buffer module, at least one operation module, and at least one second data buffer module. The second data buffer module sends a flow control count signal to the first data buffer module, informing the first data buffer module of the amount of data the second data buffer module can receive. According to the flow control count signal, the first data buffer module sends a data signal and a valid signal to the second data buffer module via the operation modules, the valid signal indicating that the corresponding data signal is valid.
Type: Grant
Filed: March 6, 2023
Date of Patent: October 8, 2024
Assignee: Shenzhen Corerain Technologies Co., Ltd.
Inventors: Chenchen Lu, Kuen Hung Tsoi, Xinyu Niu
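The flow control count signal reads like classic credit-based flow control, so here is a minimal software model under that assumption: the receiver grants credits for its free space, and the sender transmits only while it holds credits. All names are illustrative.

```python
class Receiver:
    def __init__(self, capacity):
        self.buffer, self.capacity = [], capacity

    def credits(self):                          # flow control count signal
        return self.capacity - len(self.buffer)

    def accept(self, data):
        self.buffer.append(data)

class Sender:
    def __init__(self, data):
        self.queue = list(data)

    def send(self, receiver):
        sent = 0
        while self.queue and receiver.credits() > 0:
            receiver.accept(self.queue.pop(0))  # data signal asserted valid
            sent += 1
        return sent

rx, tx = Receiver(capacity=2), Sender(["d0", "d1", "d2"])
print(tx.send(rx), rx.buffer)  # 2 ['d0', 'd1'] -- third item waits for credit
```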
-
Publication number: 20240311686
Abstract: A model compiling method and apparatus, and a model running system. The method includes: parsing a model file to obtain a first computational graph; determining runtime information of a first set of first operators according to a user input and the first computational graph; determining hardware configuration information of a first operator according to the runtime information of each first operator in the first set; and sending the hardware configuration information of the first operator to an execution device to cause the execution device to perform computation of the first operator.
Type: Application
Filed: August 31, 2022
Publication date: September 19, 2024
Applicant: Shenzhen Corerain Technologies Co., Ltd.
Inventors: Jiongkai Huang, Kuen-Hung Tsoi, Xinyu Niu
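A rough sketch of that compile flow, with every data structure hypothetical: parse the model file into a graph, derive runtime info from user input, turn it into hardware configuration, and ship the configuration to the execution device.

```python
def parse_model(model_file):
    return model_file["ops"]                  # first computational graph (a dict
                                              # stands in for the parsed file)
def runtime_info(graph, user_input):
    batch = user_input["batch_size"]          # runtime info from user input
    return [{"op": op, "batch": batch} for op in graph]

def hardware_config(info):
    # Invented rule: tile count scales with batch size.
    return [{"op": i["op"], "tiles": i["batch"] * 2} for i in info]

def send_to_device(configs):
    for cfg in configs:                       # device performs the computation
        print("configure:", cfg)

model = {"ops": ["conv", "relu"]}
send_to_device(hardware_config(runtime_info(parse_model(model), {"batch_size": 4})))
```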
-
Publication number: 20240220203
Abstract: A streaming-based compute unit and method, and an artificial intelligence chip, relating to the field of artificial intelligence. The compute unit includes N registers configured to perform N convolutions between N convolution windows and a convolution kernel. The jth convolution includes performing M multiplications on the M data in the jth convolution window and the M data in the convolution kernel to obtain M first computation results. The N convolutions comprise N multiplications performed sequentially and consecutively on at least one set of feature map data and convolution kernel data, where each feature map data set includes N data items taken from the N convolution windows at the same in-window position. The jth register stores a second computation result for the jth convolution window; after the ith multiplication in the jth convolution, the second computation result is updated to the sum of the first i first computation results of that convolution.
Type: Application
Filed: July 31, 2023
Publication date: July 4, 2024
Applicant: Shenzhen Corerain Technologies Co., Ltd.
Inventors: Li Jiao, Kuen Hung Tsoi, Xinyu Niu
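A software model of that register scheme for a 1-D convolution: N accumulator registers, one per overlapping window, each updated after every multiplication so that register j always holds the running partial sum of window j. Purely illustrative.

```python
def streaming_conv(feature_map, kernel):
    m = len(kernel)
    n = len(feature_map) - m + 1   # N overlapping convolution windows
    regs = [0] * n                 # one register per window (second results)
    for i in range(m):             # i-th multiplication of every convolution
        for j in range(n):         # same in-window position across N windows
            regs[j] += feature_map[j + i] * kernel[i]  # fold in first result
    return regs

print(streaming_conv([1, 2, 3, 4], [1, 10]))  # [21, 32, 43]
```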
-
Publication number: 20240220765
Abstract: A data processing method and apparatus for a neural network model, a device, and a storage medium are provided. The method includes: acquiring multiple neural network operators in a neural network model; fusing the multiple neural network operators according to a preset rule to obtain fused neural network operators; combining the fused neural network operators into computation instructions; and performing computation on the computation instructions by using a computation engine.
Type: Application
Filed: January 26, 2021
Publication date: July 4, 2024
Inventors: Jiongkai Huang, Kuen Hung Tsoi, Xinyu Niu
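A sketch of rule-based fusion as described: scan the operator list for a pattern named by a preset rule (here, conv followed by relu, an assumed rule rather than one quoted from the patent) and merge matches into a single fused operator before emitting instructions.

```python
FUSION_RULES = [("conv", "relu")]      # preset rule: fuse these adjacent pairs

def fuse(ops):
    fused, i = [], 0
    while i < len(ops):
        pair = tuple(ops[i:i + 2])
        if pair in FUSION_RULES:
            fused.append("_".join(pair))   # fused operator
            i += 2
        else:
            fused.append(ops[i])
            i += 1
    return fused

ops = ["conv", "relu", "pool", "conv", "relu"]
instructions = [f"exec {op}" for op in fuse(ops)]  # combined into instructions
print(instructions)  # ['exec conv_relu', 'exec pool', 'exec conv_relu']
```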
-
Publication number: 20240184763
Abstract: A data compression and decompression method, and an electronic device. The method includes the following steps: establishing an initial lookup table by using data with the same value in the dataset to be compressed as one index; sequentially building a Huffman tree corresponding to each index and adding a separator to obtain an encoding list containing a target encoding value and length; and adding the encoding list to the initial lookup table to obtain a target lookup table. For decompression, the bitstream data is split according to the separator, the target lookup table is searched in parallel, and the indexes are used to obtain the decompression result of the data to be decompressed. Embodiments can perform decompression in parallel to increase decompression speed, so that it meets an AI engine's real-time demand for a large amount of weight data bandwidth.
Type: Application
Filed: July 31, 2023
Publication date: June 6, 2024
Applicant: Shenzhen Corerain Technologies Co., Ltd.
Inventors: Yuanchao Li, Kuen Hung Tsoi, Xinyu Niu
-
Publication number: 20240126684
Abstract: A sparse data storage method for deep learning, a computer device, and a storage medium. The method includes: obtaining the offset between the current non-zero data and the previous non-zero data, and generating to-be-transmitted data from the current non-zero data and the offset, where the to-be-transmitted data is stored in a first memory; obtaining the to-be-transmitted data, calculating an address increment from the offset, and obtaining from the address increment the storage address at which the current non-zero data is to be stored in a second memory; and transmitting the current non-zero data to the second memory and storing it at that storage address. According to the embodiments, the power consumption and costs of deep learning operations can be reduced.
Type: Application
Filed: July 31, 2023
Publication date: April 18, 2024
Applicant: Shenzhen Corerain Technologies Co., Ltd.
Inventors: Kuen Hung Tsoi, Xinyu Niu
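A compact model of that storage scheme: each non-zero value travels with its offset from the previous non-zero, and the receiving side turns offsets back into absolute addresses in the second memory. Function names are illustrative.

```python
def encode_sparse(dense):
    pairs, last = [], -1
    for addr, v in enumerate(dense):
        if v != 0:
            pairs.append((v, addr - last))   # (non-zero data, offset)
            last = addr
    return pairs

def store_sparse(pairs, memory_size):
    memory, addr = [0] * memory_size, -1
    for value, offset in pairs:
        addr += offset                       # address increment from the offset
        memory[addr] = value                 # store at the computed address
    return memory

dense = [0, 5, 0, 0, 7, 0, 2]
pairs = encode_sparse(dense)
print(pairs)                                 # [(5, 2), (7, 3), (2, 2)]
print(store_sparse(pairs, len(dense)) == dense)  # True -- round-trips exactly
```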
-
Publication number: 20230350974
Abstract: The present application provides a quantization computation method and apparatus applied to depthwise convolution. The method includes: determining the n multipliers adopted for standard convolution in a preset part of the quantization computation; distributing the n multipliers equally between a first part and a second part of the depthwise convolution in the quantization computation; in the depthwise convolution, computing a first result of a target pixel point in a target block unit with one multiplier in the first part, and computing a second result of the target pixel point with one multiplier in the second part; and obtaining quantized results of the target block unit for the first part and the second part from the first result and the second result of each target pixel point. The present application thereby utilizes resources to the maximum extent.
Type: Application
Filed: March 3, 2023
Publication date: November 2, 2023
Inventors: Xiayang Zhou, Kuen Hung Tsoi, Xinyu Niu
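The abstract is terse, so this sketch takes one plausible reading: a pool of n multipliers normally dedicated to standard convolution is split evenly, and the two halves work on the same block's pixels concurrently, each producing a per-pixel result that goes into the quantized output. Everything here (the block layout, the fixed-point requantize step) is an assumption.

```python
def requantize(acc, multiplier, shift=16):
    # Fixed-point rescale, as in common int8 pipelines; assumed, not quoted.
    return (acc * multiplier) >> shift

def quantize_block(block, multipliers):
    half = len(multipliers) // 2            # equal split of the multiplier pool
    part1, part2 = block[:len(block) // 2], block[len(block) // 2:]
    out1 = [requantize(px, multipliers[i % half]) for i, px in enumerate(part1)]
    out2 = [requantize(px, multipliers[half + i % half]) for i, px in enumerate(part2)]
    return out1 + out2                      # combined result for the block unit

# 49152 / 2**16 = 0.75, so each accumulator value is scaled by 0.75.
print(quantize_block([65536, 131072, 196608, 262144], [49152] * 8))
# [49152, 98304, 147456, 196608]
```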
-
Patent number: 11797277
Abstract: A neural network model conversion method, a server, and a storage medium are provided according to embodiments of the present disclosure. The neural network model conversion method includes: parsing a neural network model to obtain initial model information; reconstructing the initial model information to obtain streaming model information; generating a target model information file according to the streaming model information; and running, under a streaming architecture, the neural network model according to the target model information file.
Type: Grant
Filed: October 22, 2019
Date of Patent: October 24, 2023
Assignee: Shenzhen Corerain Technologies Co., Ltd.
Inventors: Chao Xiong, Kuenhung Tsoi, Xinyu Niu
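An illustrative pipeline matching the abstract's steps: parse, reconstruct layer-by-layer info into a streaming form, and emit a target model information file. The field names and JSON format are invented for the sketch.

```python
import json

def parse_model(model):
    return {"layers": model["layers"]}       # initial model information

def reconstruct(info):
    # Streaming architectures care about stage order, so assign each layer a
    # pipeline stage index instead of a graph node id (an assumed convention).
    return {"stages": [{"stage": i, "op": op} for i, op in enumerate(info["layers"])]}

def generate_file(streaming_info, path="target_model.json"):
    with open(path, "w") as f:
        json.dump(streaming_info, f)         # target model information file
    return path

model = {"layers": ["conv", "relu", "fc"]}
print(generate_file(reconstruct(parse_model(model))))  # target_model.json
```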
-
Publication number: 20230325184
Abstract: The present disclosure provides an artificial intelligence chip, an accelerator, and an operation method, relating to the technical field of artificial intelligence. The chip comprises: a first operation circuit configured to execute a first operation and output a first operation result; a second operation circuit, connected in parallel with the first operation circuit, configured to execute a second operation identical to the first operation and output a second operation result; and a third operation circuit configured to, upon receiving the first operation result and the second operation result, execute on each of them a third operation different from the first operation and output a corresponding third operation result.
Type: Application
Filed: March 14, 2023
Publication date: October 12, 2023
Inventors: Jiadong Wang, Xinyu Niu, Kuen Hung Tsoi
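A minimal sketch of that topology: two identical operations run in parallel on different inputs, and a third, different operation is applied to each result. Threads stand in for the parallel circuits; the operations themselves are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

def first_op(x):   # the second circuit executes this identical operation
    return x * x

def third_op(y):   # a different operation, applied to each result
    return y + 1

with ThreadPoolExecutor(max_workers=2) as ex:
    r1, r2 = ex.map(first_op, [3, 4])   # first and second circuits in parallel
    print(third_op(r1), third_op(r2))   # 10 17
```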
-
Publication number: 20230325307
Abstract: Disclosed are an apparatus and a method for address generation, a data buffer, and an artificial intelligence chip. The apparatus includes address generating circuits comprising N first address generating circuits and M second address generating circuits. The n-th first address generating circuit generates the first address y_n of each element of each first matrix on the n-th first dimension according to y_n = floor(a_n * x_n + b_n) * T_n, the first matrices being distributed along M second dimensions; the m-th second address generating circuit generates the second address y_m of each first matrix on the m-th second dimension according to y_m = floor(a_m * x_m + b_m) * T_m; and an address combining circuit generates the address for accessing each element of each first matrix by combining the matrix's second addresses on the M second dimensions with the element's first addresses on the N first dimensions.
Type: Application
Filed: March 16, 2023
Publication date: October 12, 2023
Inventors: Li Jiao, Kuen Hung Tsoi, Xinyu Niu
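A direct transcription of the two address formulas with a toy combiner: element addresses inside a first matrix come from y_n = floor(a_n * x_n + b_n) * T_n, matrix addresses on the second dimensions from the analogous y_m formula, and the final access address is taken as their sum (the combining rule itself is an assumption).

```python
import math

def axis_address(x, a, b, t):
    # y = floor(a*x + b) * T, per axis, as in the abstract's formulas.
    return math.floor(a * x + b) * t

def element_address(first_coords, first_params, second_coords, second_params):
    inner = sum(axis_address(x, *p) for x, p in zip(first_coords, first_params))
    outer = sum(axis_address(x, *p) for x, p in zip(second_coords, second_params))
    return outer + inner          # address combining circuit (assumed: summation)

# One first dimension (stride-2 element access, a=2, b=0, T=1) and one second
# dimension (matrix tiling with pitch T=64).
print(element_address([3], [(2, 0, 1)], [1], [(1, 0, 64)]))  # 64 + 6 = 70
```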
-
Publication number: 20230305976
Abstract: A data flow-based neural network multi-engine synchronous calculation system includes: a plurality of calculation engines, each including a plurality of calculation modules and at least one cache module located at different layers, where each calculation module calculates an input calculation graph provided by the cache module or calculation module of the previous layer to obtain an output calculation graph; and at least one synchronization module, each of which monitors the amount of input calculation graph data stored by the cache modules on the same layer in each calculation engine and, when that amount reaches the preset value corresponding to each cache module, directs the cache modules on that layer to output the stored input calculation graph to the calculation modules on the next layer.
Type: Application
Filed: June 4, 2021
Publication date: September 28, 2023
Inventors: Li Jiao, Yuanchao Li, Kuen Hung Tsoi, Xinyu Niu