Patents by Inventor Xinyu NIU
Xinyu NIU has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20250147725
Abstract: The present application provides an accumulator, a method for the accumulator, and a chip circuit. The accumulator includes an accumulation execution module, a difference correction module, and a first register. The accumulation execution module performs accumulation operations on floating-point input data; the data truncated in each operation is retained as a truncation error and fed back to the difference correction module. The difference correction module superimposes the fed-back truncation error on the external input data and outputs the corrected input data to the accumulation execution module. The first register buffers the accumulation result of the accumulation execution module, and the result is sent back to the accumulation execution module through a feedback channel. This improves calculation efficiency while also improving the accumulation accuracy of floating-point numbers.
Type: Application
Filed: April 30, 2024
Publication date: May 8, 2025
Inventors: Chenchen Lu, Kuen Hung Tsoi, Xinyu Niu
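The error-feedback loop described here behaves much like compensated (Kahan) summation: the bits lost in each addition are carried forward and superimposed on the next input. A minimal Python sketch under that reading; all names are illustrative, not taken from the patent.

```python
def compensated_accumulate(values):
    total = 0.0       # accumulation result (the "first register")
    truncation = 0.0  # truncation error fed back by the correction module
    for x in values:
        corrected = x - truncation     # superimpose fed-back error on the input
        new_total = total + corrected  # accumulation execution
        # The low-order bits lost in this addition become the next truncation error.
        truncation = (new_total - total) - corrected
        total = new_total              # result buffered and fed back
    return total

vals = [1e16] + [1.0] * 16
print(sum(vals))                     # 1e16 -- the sixteen 1.0s are rounded away
print(compensated_accumulate(vals))  # 1.0000000000000016e16 -- error recovered
```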
-
Patent number: 12292841
Abstract: Embodiments of the present application provide a data processing method and apparatus for an AI chip, and a computer device. The data processing method of the AI chip includes: determining a target AI model for processing data to be processed; matching, in the AI chip, a data flow network corresponding to the target AI model and a data flow direction of the data flow network; and processing the data to be processed based on the data flow network and the data flow direction.
Type: Grant
Filed: June 22, 2021
Date of Patent: May 6, 2025
Assignee: Shenzhen Corerain Technologies Co., Ltd.
Inventors: Kuen Hung Tsoi, Xinyu Niu
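A software stand-in for the dispatch the abstract describes: pick the target model, look up the data flow network and flow direction matched to it, then stream the data through. The registry contents and stage names are hypothetical.

```python
NETWORK_REGISTRY = {
    # model name -> (ordered dataflow stages, flow direction); illustrative only
    "resnet50": (["conv", "bn", "relu", "pool", "fc"], "forward"),
}

def process(model_name, data):
    stages, direction = NETWORK_REGISTRY[model_name]  # match network + direction
    if direction == "backward":
        stages = list(reversed(stages))
    for stage in stages:                              # data flows stage to stage
        data = f"{stage}({data})"
    return data

print(process("resnet50", "x"))  # fc(pool(relu(bn(conv(x)))))
```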
-
Publication number: 20250138780
Abstract: The present application provides an addition tree computation device and method, and a computing device, for addition of floating-point numbers and fixed-point numbers. The data input module receives input data and calculation type instructions. According to the calculation type instructions, the transmission control module controls the first multiplexer to send floating-point numbers to the first entrance of the fusion calculation module, or to send fixed-point numbers to its second entrance. The fusion calculation module performs the addition operations, and the data normalization output module processes the operation results under the control of the transmission control module and outputs the final calculation results.
Type: Application
Filed: April 30, 2024
Publication date: May 1, 2025
Inventors: Chenchen Lu, Kuen Hung Tsoi, Xinyu Niu
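A minimal sketch of the routing above, with a function standing in for the multiplexer: the calculation-type instruction steers inputs to either the floating-point or the fixed-point entrance of one shared addition tree. The Q8.8 fixed-point format is an assumption for illustration.

```python
def adder_tree(values):
    """Pairwise (tree) reduction, the shape a hardware addition tree computes in."""
    while len(values) > 1:
        paired = [values[i] + values[i + 1] for i in range(0, len(values) - 1, 2)]
        if len(values) % 2:
            paired.append(values[-1])     # odd element passes through this level
        values = paired
    return values[0]

def fused_add(inputs, calc_type):
    if calc_type == "float":              # first entrance: floating point
        return adder_tree([float(x) for x in inputs])
    elif calc_type == "fixed":            # second entrance: Q8.8 fixed point
        scaled = [int(round(x * 256)) for x in inputs]
        return adder_tree(scaled) / 256   # normalize the result on output
    raise ValueError(calc_type)

print(fused_add([1.5, 2.25, 0.75, 3.0], "float"))  # 7.5
print(fused_add([1.5, 2.25, 0.75, 3.0], "fixed"))  # 7.5
```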
-
Method for Verifying Correctness of Model Conversion Under Deployment Framework and Computing Device
Publication number: 20250124344
Abstract: A method for verifying correctness of model conversion under a deployment framework, and a computing device. The method includes: acquiring, under a training framework, a trained model to be converted; acquiring a first intermediate result of the trained model as contrast data; converting the trained model into a deployment model; loading the deployment model under the deployment framework; executing the deployment model and acquiring a second intermediate result; and comparing the second intermediate result of the deployment model with the contrast data of the trained model, so as to locate a correctness-related problem of the deployment model before the deployment model completes execution. Accordingly, a problem node can be located quickly and accurately.
Type: Application
Filed: June 6, 2024
Publication date: April 17, 2025
Applicant: Shenzhen Corerain Technologies Co., Ltd.
Inventors: Kuen Hung Tsoi, Xinyu Niu
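A hedged sketch of the comparison loop: capture intermediate results under the training framework, re-run under the deployment framework, and stop at the first node whose output diverges. Models here are plain lists of (name, function) layers; real frameworks would use hooks or dump APIs.

```python
def run_with_intermediates(layers, x):
    outs = {}
    for name, fn in layers:
        x = fn(x)
        outs[name] = x                                 # per-node intermediate result
    return outs

def locate_problem_node(trained, deployed, x, tol=1e-5):
    contrast = run_with_intermediates(trained, x)      # first intermediate results
    candidate = run_with_intermediates(deployed, x)    # second intermediate results
    for name, _ in trained:
        if abs(candidate[name] - contrast[name]) > tol:
            return name                                # first divergent node
    return None

trained  = [("scale", lambda v: v * 2.0), ("shift", lambda v: v + 1.0)]
deployed = [("scale", lambda v: v * 2.0), ("shift", lambda v: v + 1.5)]  # buggy conversion
print(locate_problem_node(trained, deployed, 3.0))  # shift
```
-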
Patent number: 12271326
Abstract: A data flow-based neural network multi-engine synchronous calculation system includes: a plurality of calculation engines, each including a plurality of calculation modules and at least one cache module located at different layers, where each calculation module calculates an input calculation graph provided by the cache module or calculation module of the previous layer to obtain an output calculation graph; and at least one synchronization module, each of which monitors the amount of input calculation graph data stored by the cache modules on the same layer in each calculation engine and, when that amount reaches the preset value corresponding to each cache module, directs the cache modules on that layer to output the stored input calculation graph to the calculation modules on the next layer.
Type: Grant
Filed: June 4, 2021
Date of Patent: April 8, 2025
Assignee: Shenzhen Corerain Technologies Co., Ltd.
Inventors: Li Jiao, Yuanchao Li, Kuen Hung Tsoi, Xinyu Niu
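A small simulation of the synchronization rule: a monitor releases a layer's caches to the next layer only once every engine's cache on that layer holds the preset amount of data, keeping the engines in lockstep. Class and method names are invented for the sketch.

```python
class LayerSync:
    def __init__(self, num_engines, preset):
        self.caches = [[] for _ in range(num_engines)]  # one cache per engine
        self.preset = preset                            # preset data amount

    def push(self, engine, item):
        self.caches[engine].append(item)

    def try_release(self):
        # Release only when every engine has buffered `preset` items.
        if all(len(c) >= self.preset for c in self.caches):
            batch = [c[:self.preset] for c in self.caches]
            for c in self.caches:
                del c[:self.preset]
            return batch        # forwarded to each engine's next-layer module
        return None             # hold everyone back until all engines catch up

sync = LayerSync(num_engines=2, preset=2)
sync.push(0, "a0"); sync.push(0, "a1"); sync.push(1, "b0")
print(sync.try_release())       # None -- engine 1 is still short
sync.push(1, "b1")
print(sync.try_release())       # [['a0', 'a1'], ['b0', 'b1']]
```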
-
Publication number: 20250103328
Abstract: The present disclosure provides a streaming-based computation circuit, method, and artificial intelligence chip. The computation circuit includes multiple groups of computation units, among them a first group and a second group, where the second group outputs a first matrix after each calculation; and a buffer unit configured to perform one or more first operations. The first operations include: buffering M first matrices consecutively output by the second group of computation units, concatenating the M first matrices into a second matrix whose number of elements is not greater than the calculation parallelism of the first computation unit in the first group, and consecutively outputting the second matrix to the first computation unit N times to perform N calculations.
Type: Application
Filed: April 30, 2024
Publication date: March 27, 2025
Inventors: Li Jiao, Kuen Hung Tsoi, Xinyu Niu
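A sketch of the buffering step, using flat lists as a stand-in for matrices: collect M consecutive outputs of the second group, concatenate them, and check that the result fits the first unit's parallelism before handing it over. Names are illustrative.

```python
class ConcatBuffer:
    def __init__(self, m, parallelism):
        self.m, self.parallelism = m, parallelism
        self.pending = []

    def push(self, first_matrix):
        self.pending.append(first_matrix)
        if len(self.pending) < self.m:
            return None                     # keep buffering
        second = [e for mat in self.pending for e in mat]  # concatenate M outputs
        assert len(second) <= self.parallelism             # must fit the compute unit
        self.pending.clear()
        return second                       # ready to send N times downstream

buf = ConcatBuffer(m=2, parallelism=8)
print(buf.push([1, 2]))     # None -- only one of M matrices buffered
print(buf.push([3, 4]))     # [1, 2, 3, 4] -- ready for the first compute unit
```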
-
Patent number: 12216611
Abstract: Embodiments of the present disclosure provide an artificial intelligence (AI) chip and an AI chip-based data processing method. The AI chip includes a data flow network for processing, on the basis of an AI algorithm, data to be processed. The data flow network includes: at least one calculation module, each configured to calculate the data to be processed on the basis of one of at least one operation node corresponding to the AI algorithm and to output a calculation result; and, for each calculation module, a next transfer module connected to it and configured to receive and process the calculation result it outputs, the data to be processed flowing in the data flow network according to a preset data flow direction.
Type: Grant
Filed: December 20, 2022
Date of Patent: February 4, 2025
Assignee: Shenzhen Corerain Technologies Co., Ltd.
Inventors: Kuen Hung Tsoi, Xinyu Niu
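A minimal software model of that topology: each calculation module computes one operation node and its transfer module hands the result to the next module, so data flows through in a preset direction. The modules here are placeholder functions.

```python
calc_modules = [lambda x: x * 2, lambda x: x + 3, lambda x: x ** 2]

def run_dataflow(x, modules):
    # Each module's result is handed to its transfer module, which forwards it
    # to the next calculation module along the preset flow direction.
    for calc in modules:
        x = calc(x)
    return x

print(run_dataflow(2, calc_modules))  # ((2*2)+3)**2 = 49
```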
-
Patent number: 12189601
Abstract: A data compression and decompression method, and an electronic device. The method includes the following steps: establishing an initial lookup table by using data with the same value in the dataset to be compressed as one index; sequentially building a Huffman tree corresponding to each index and adding a separator to obtain an encoding list containing a target encoding value and length; and adding the encoding list to the initial lookup table to obtain a target lookup table. For decompression, the bitstream data is split according to the separator, the target lookup table is searched in parallel, and the indexes are used to obtain the decompression result of the data to be decompressed. Embodiments can perform decompression in parallel to increase decompression speed, so that it meets an AI engine's real-time demand for a large amount of weight data bandwidth.
Type: Grant
Filed: July 31, 2023
Date of Patent: January 7, 2025
Assignee: Shenzhen Corerain Technologies Co., Ltd.
Inventors: Yuanchao Li, Kuen Hung Tsoi, Xinyu Niu
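A loose sketch of the scheme (it also matches the two related entries below): distinct values become lookup-table indexes, a Huffman tree assigns each one a code, and the bitstream is cut at separators so blocks can be decoded in parallel. The exact encoding-list layout is not public here, so the separator is modeled simply as a block boundary.

```python
import heapq
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def build_codes(data):
    # Standard Huffman construction over the distinct values (the indexes).
    heap = [(n, i, [sym]) for i, (sym, n) in enumerate(Counter(data).items())]
    heapq.heapify(heap)
    codes = {sym: "" for sym in set(data)}
    while len(heap) > 1:
        n1, _, s1 = heapq.heappop(heap)
        n2, i2, s2 = heapq.heappop(heap)
        for s in s1: codes[s] = "0" + codes[s]
        for s in s2: codes[s] = "1" + codes[s]
        heapq.heappush(heap, (n1 + n2, i2, s1 + s2))
    return codes

def decode_block(bits, table):
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in table:            # prefix-free, so first hit is the symbol
            out.append(table[cur]); cur = ""
    return out

data = [3, 3, 3, 7, 7, 9, 3]
codes = build_codes(data)
table = {c: s for s, c in codes.items()}          # target lookup table
blocks = ["".join(codes[s] for s in data[:4]),    # separator = block boundary
          "".join(codes[s] for s in data[4:])]
with ThreadPoolExecutor() as ex:                  # parallel decompression
    print(sum(ex.map(lambda b: decode_block(b, table), blocks), []))
    # [3, 3, 3, 7, 7, 9, 3]
```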
-
Patent number: 12147410
Abstract: A data compression and decompression method, and an electronic device. The method includes the following steps: establishing an initial lookup table by using data with the same value in the dataset to be compressed as one index; sequentially building a Huffman tree corresponding to each index and adding a separator to obtain an encoding list containing a target encoding value and length; and adding the encoding list to the initial lookup table to obtain a target lookup table. For decompression, the bitstream data is split according to the separator, the target lookup table is searched in parallel, and the indexes are used to obtain the decompression result of the data to be decompressed. Embodiments can perform decompression in parallel to increase decompression speed, so that it meets an AI engine's real-time demand for a large amount of weight data bandwidth.
Type: Grant
Filed: July 31, 2023
Date of Patent: November 19, 2024
Assignee: Shenzhen Corerain Technologies Co., Ltd.
Inventors: Yuanchao Li, Kuen Hung Tsoi, Xinyu Niu
-
Patent number: 12112043
Abstract: A data flow control device in a streaming architecture chip includes at least one first data buffer module, at least one operation module, and at least one second data buffer module. The second data buffer module sends a flow control count signal to the first data buffer module, informing the first data buffer module of the amount of data the second data buffer module can receive. According to the flow control count signal, the first data buffer module sends a data signal and a valid signal to the second data buffer module via the operation modules, the valid signal indicating that the corresponding data signal is valid.
Type: Grant
Filed: March 6, 2023
Date of Patent: October 8, 2024
Assignee: Shenzhen Corerain Technologies Co., Ltd.
Inventors: Chenchen Lu, Kuen Hung Tsoi, Xinyu Niu
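The flow control count signal reads like classic credit-based flow control, so here is a minimal software model under that assumption: the receiver grants credits for its free space, and the sender transmits only while it holds credits. All names are illustrative.

```python
class Receiver:
    def __init__(self, capacity):
        self.buffer, self.capacity = [], capacity

    def credits(self):                          # flow control count signal
        return self.capacity - len(self.buffer)

    def accept(self, data):
        self.buffer.append(data)

class Sender:
    def __init__(self, data):
        self.queue = list(data)

    def send(self, receiver):
        sent = 0
        while self.queue and receiver.credits() > 0:
            receiver.accept(self.queue.pop(0))  # data signal asserted valid
            sent += 1
        return sent

rx, tx = Receiver(capacity=2), Sender(["d0", "d1", "d2"])
print(tx.send(rx), rx.buffer)  # 2 ['d0', 'd1'] -- third item waits for credit
```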
-
Publication number: 20240311686
Abstract: A model compiling method and apparatus, and a model running system. The method includes: parsing a model file to obtain a first computational graph; determining runtime information of a first set of first operators according to a user input and the first computational graph; determining hardware configuration information of a first operator according to the runtime information of each first operator in the first set; and sending the hardware configuration information of the first operator to an execution device to cause the execution device to perform computation of the first operator.
Type: Application
Filed: August 31, 2022
Publication date: September 19, 2024
Applicant: Shenzhen Corerain Technologies Co., Ltd.
Inventors: Jiongkai Huang, Kuen-Hung Tsoi, Xinyu Niu
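A rough sketch of that compile flow, with every data structure hypothetical: parse the model file into a graph, derive runtime info from user input, turn it into hardware configuration, and ship the configuration to the execution device.

```python
def parse_model(model_file):
    return model_file["ops"]                  # first computational graph (a dict
                                              # stands in for the parsed file)
def runtime_info(graph, user_input):
    batch = user_input["batch_size"]          # runtime info from user input
    return [{"op": op, "batch": batch} for op in graph]

def hardware_config(info):
    # Invented rule: tile count scales with batch size.
    return [{"op": i["op"], "tiles": i["batch"] * 2} for i in info]

def send_to_device(configs):
    for cfg in configs:                       # device performs the computation
        print("configure:", cfg)

model = {"ops": ["conv", "relu"]}
send_to_device(hardware_config(runtime_info(parse_model(model), {"batch_size": 4})))
```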
-
Publication number: 20240220203
Abstract: A streaming-based compute unit and method, and an artificial intelligence chip, relating to the field of artificial intelligence. The compute unit includes N registers configured to perform N convolutions between N convolution windows and a convolution kernel. The jth convolution includes performing M multiplications on the M data in the jth convolution window and the M data in the convolution kernel to obtain M first computation results. The N convolutions comprise N multiplications performed sequentially and consecutively on at least one set of feature map data and convolution kernel data, where each feature map data set includes N data items taken from the N convolution windows at the same in-window position. The jth register stores a second computation result for the jth convolution window; after the ith multiplication in the jth convolution, the second computation result is updated to the sum of the first i first computation results of that convolution.
Type: Application
Filed: July 31, 2023
Publication date: July 4, 2024
Applicant: Shenzhen Corerain Technologies Co., Ltd.
Inventors: Li Jiao, Kuen Hung Tsoi, Xinyu Niu
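A software model of that register scheme for a 1-D convolution: N accumulator registers, one per overlapping window, each updated after every multiplication so that register j always holds the running partial sum of window j. Purely illustrative.

```python
def streaming_conv(feature_map, kernel):
    m = len(kernel)
    n = len(feature_map) - m + 1   # N overlapping convolution windows
    regs = [0] * n                 # one register per window (second results)
    for i in range(m):             # i-th multiplication of every convolution
        for j in range(n):         # same in-window position across N windows
            regs[j] += feature_map[j + i] * kernel[i]  # fold in first result
    return regs

print(streaming_conv([1, 2, 3, 4], [1, 10]))  # [21, 32, 43]
```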
-
Publication number: 20240220765
Abstract: A data processing method and apparatus for a neural network model, a device, and a storage medium are provided. The method includes: acquiring multiple neural network operators in a neural network model; fusing the multiple neural network operators according to a preset rule to obtain fused neural network operators; combining the fused neural network operators into computation instructions; and performing computation on the computation instructions by using a computation engine.
Type: Application
Filed: January 26, 2021
Publication date: July 4, 2024
Inventors: Jiongkai Huang, Kuen Hung Tsoi, Xinyu Niu
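A sketch of rule-based fusion as described: scan the operator list for a pattern named by a preset rule (here, conv followed by relu, an assumed rule rather than one quoted from the patent) and merge matches into a single fused operator before emitting instructions.

```python
FUSION_RULES = [("conv", "relu")]      # preset rule: fuse these adjacent pairs

def fuse(ops):
    fused, i = [], 0
    while i < len(ops):
        pair = tuple(ops[i:i + 2])
        if pair in FUSION_RULES:
            fused.append("_".join(pair))   # fused operator
            i += 2
        else:
            fused.append(ops[i])
            i += 1
    return fused

ops = ["conv", "relu", "pool", "conv", "relu"]
instructions = [f"exec {op}" for op in fuse(ops)]  # combined into instructions
print(instructions)  # ['exec conv_relu', 'exec pool', 'exec conv_relu']
```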
-
Publication number: 20240184763
Abstract: A data compression and decompression method, and an electronic device. The method includes the following steps: establishing an initial lookup table by using data with the same value in the dataset to be compressed as one index; sequentially building a Huffman tree corresponding to each index and adding a separator to obtain an encoding list containing a target encoding value and length; and adding the encoding list to the initial lookup table to obtain a target lookup table. For decompression, the bitstream data is split according to the separator, the target lookup table is searched in parallel, and the indexes are used to obtain the decompression result of the data to be decompressed. Embodiments can perform decompression in parallel to increase decompression speed, so that it meets an AI engine's real-time demand for a large amount of weight data bandwidth.
Type: Application
Filed: July 31, 2023
Publication date: June 6, 2024
Applicant: Shenzhen Corerain Technologies Co., Ltd.
Inventors: Yuanchao Li, Kuen Hung Tsoi, Xinyu Niu
-
Publication number: 20240126684
Abstract: A sparse data storage method for deep learning, a computer device, and a storage medium. The method includes: obtaining the offset between the current non-zero data and the previous non-zero data, and generating to-be-transmitted data from the current non-zero data and the offset, where the to-be-transmitted data is stored in a first memory; obtaining the to-be-transmitted data, calculating an address increment from the offset, and obtaining from the address increment the storage address at which the current non-zero data is to be stored in a second memory; and transmitting the current non-zero data to the second memory and storing it at that storage address. According to the embodiments, the power consumption and costs of deep learning operations can be reduced.
Type: Application
Filed: July 31, 2023
Publication date: April 18, 2024
Applicant: Shenzhen Corerain Technologies Co., Ltd.
Inventors: Kuen Hung Tsoi, Xinyu Niu
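A compact model of that storage scheme: each non-zero value travels with its offset from the previous non-zero, and the receiving side turns offsets back into absolute addresses in the second memory. Function names are illustrative.

```python
def encode_sparse(dense):
    pairs, last = [], -1
    for addr, v in enumerate(dense):
        if v != 0:
            pairs.append((v, addr - last))   # (non-zero data, offset)
            last = addr
    return pairs

def store_sparse(pairs, memory_size):
    memory, addr = [0] * memory_size, -1
    for value, offset in pairs:
        addr += offset                       # address increment from the offset
        memory[addr] = value                 # store at the computed address
    return memory

dense = [0, 5, 0, 0, 7, 0, 2]
pairs = encode_sparse(dense)
print(pairs)                                 # [(5, 2), (7, 3), (2, 2)]
print(store_sparse(pairs, len(dense)) == dense)  # True -- round-trips exactly
```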
-
Publication number: 20230350974
Abstract: The present application provides a quantization computation method and apparatus applied to depthwise convolution. The method includes: determining the n multipliers adopted for standard convolution in a preset part of the quantization computation; distributing the n multipliers equally between a first part and a second part of the depthwise convolution in the quantization computation; in the depthwise convolution, computing a first result of a target pixel point in a target block unit with one multiplier in the first part, and computing a second result of the target pixel point with one multiplier in the second part; and obtaining quantized results of the target block unit for the first part and the second part from the first result and the second result of each target pixel point. The present application thereby utilizes resources to the maximum extent.
Type: Application
Filed: March 3, 2023
Publication date: November 2, 2023
Inventors: Xiayang Zhou, Kuen Hung Tsoi, Xinyu Niu
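The abstract is terse, so this sketch takes one plausible reading: a pool of n multipliers normally dedicated to standard convolution is split evenly, and the two halves work on the same block's pixels concurrently, each producing a per-pixel result that goes into the quantized output. Everything here (the block layout, the fixed-point requantize step) is an assumption.

```python
def requantize(acc, multiplier, shift=16):
    # Fixed-point rescale, as in common int8 pipelines; assumed, not quoted.
    return (acc * multiplier) >> shift

def quantize_block(block, multipliers):
    half = len(multipliers) // 2            # equal split of the multiplier pool
    part1, part2 = block[:len(block) // 2], block[len(block) // 2:]
    out1 = [requantize(px, multipliers[i % half]) for i, px in enumerate(part1)]
    out2 = [requantize(px, multipliers[half + i % half]) for i, px in enumerate(part2)]
    return out1 + out2                      # combined result for the block unit

# 49152 / 2**16 = 0.75, so each accumulator value is scaled by 0.75.
print(quantize_block([65536, 131072, 196608, 262144], [49152] * 8))
# [49152, 98304, 147456, 196608]
```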
-
Patent number: 11797277
Abstract: A neural network model conversion method, a server, and a storage medium are provided according to embodiments of the present disclosure. The neural network model conversion method includes: parsing a neural network model to obtain initial model information; reconstructing the initial model information to obtain streaming model information; generating a target model information file according to the streaming model information; and running, under a streaming architecture, the neural network model according to the target model information file.
Type: Grant
Filed: October 22, 2019
Date of Patent: October 24, 2023
Assignee: Shenzhen Corerain Technologies Co., Ltd.
Inventors: Chao Xiong, Kuenhung Tsoi, Xinyu Niu
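An illustrative pipeline matching the abstract's steps: parse, reconstruct layer-by-layer info into a streaming form, and emit a target model information file. The field names and JSON format are invented for the sketch.

```python
import json

def parse_model(model):
    return {"layers": model["layers"]}       # initial model information

def reconstruct(info):
    # Streaming architectures care about stage order, so assign each layer a
    # pipeline stage index instead of a graph node id (an assumed convention).
    return {"stages": [{"stage": i, "op": op} for i, op in enumerate(info["layers"])]}

def generate_file(streaming_info, path="target_model.json"):
    with open(path, "w") as f:
        json.dump(streaming_info, f)         # target model information file
    return path

model = {"layers": ["conv", "relu", "fc"]}
print(generate_file(reconstruct(parse_model(model))))  # target_model.json
```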
-
Publication number: 20230325184
Abstract: The present disclosure provides an artificial intelligence chip, an accelerator, and an operation method, relating to the technical field of artificial intelligence. The chip comprises: a first operation circuit configured to execute a first operation and output a first operation result; a second operation circuit, connected in parallel with the first operation circuit, configured to execute a second operation identical to the first operation and output a second operation result; and a third operation circuit configured to, upon receiving the first operation result and the second operation result, execute on each of them a third operation different from the first operation and output a corresponding third operation result.
Type: Application
Filed: March 14, 2023
Publication date: October 12, 2023
Inventors: Jiadong Wang, Xinyu Niu, Kuen Hung Tsoi
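A minimal sketch of that topology: two identical operations run in parallel on different inputs, and a third, different operation is applied to each result. Threads stand in for the parallel circuits; the operations themselves are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

def first_op(x):   # the second circuit executes this identical operation
    return x * x

def third_op(y):   # a different operation, applied to each result
    return y + 1

with ThreadPoolExecutor(max_workers=2) as ex:
    r1, r2 = ex.map(first_op, [3, 4])   # first and second circuits in parallel
    print(third_op(r1), third_op(r2))   # 10 17
```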
-
Publication number: 20230325307
Abstract: Disclosed are an apparatus and a method for address generation, a data buffer, and an artificial intelligence chip. The apparatus includes address generating circuits comprising N first address generating circuits and M second address generating circuits. The n-th first address generating circuit generates the first address y_n of each element of each first matrix on the n-th first dimension according to y_n = floor(a_n * x_n + b_n) * T_n, the first matrices being distributed along M second dimensions; the m-th second address generating circuit generates the second address y_m of each first matrix on the m-th second dimension according to y_m = floor(a_m * x_m + b_m) * T_m; and an address combining circuit generates the address for accessing each element of each first matrix by combining the matrix's second addresses on the M second dimensions with the element's first addresses on the N first dimensions.
Type: Application
Filed: March 16, 2023
Publication date: October 12, 2023
Inventors: Li Jiao, Kuen Hung Tsoi, Xinyu Niu
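A direct transcription of the two address formulas with a toy combiner: element addresses inside a first matrix come from y_n = floor(a_n * x_n + b_n) * T_n, matrix addresses on the second dimensions from the analogous y_m formula, and the final access address is taken as their sum (the combining rule itself is an assumption).

```python
import math

def axis_address(x, a, b, t):
    # y = floor(a*x + b) * T, per axis, as in the abstract's formulas.
    return math.floor(a * x + b) * t

def element_address(first_coords, first_params, second_coords, second_params):
    inner = sum(axis_address(x, *p) for x, p in zip(first_coords, first_params))
    outer = sum(axis_address(x, *p) for x, p in zip(second_coords, second_params))
    return outer + inner          # address combining circuit (assumed: summation)

# One first dimension (stride-2 element access, a=2, b=0, T=1) and one second
# dimension (matrix tiling with pitch T=64).
print(element_address([3], [(2, 0, 1)], [1], [(1, 0, 64)]))  # 64 + 6 = 70
```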
-
Publication number: 20230305976
Abstract: A data flow-based neural network multi-engine synchronous calculation system includes: a plurality of calculation engines, each including a plurality of calculation modules and at least one cache module located at different layers, where each calculation module calculates an input calculation graph provided by the cache module or calculation module of the previous layer to obtain an output calculation graph; and at least one synchronization module, each of which monitors the amount of input calculation graph data stored by the cache modules on the same layer in each calculation engine and, when that amount reaches the preset value corresponding to each cache module, directs the cache modules on that layer to output the stored input calculation graph to the calculation modules on the next layer.
Type: Application
Filed: June 4, 2021
Publication date: September 28, 2023
Inventors: Li Jiao, Yuanchao Li, Kuen Hung Tsoi, Xinyu Niu