Patents Assigned to Nanjing Iluvatar CoreX Technology Co., Ltd.
  • Publication number: 20200264781
    Abstract: Embodiments of the invention may provide a technical solution by reassigning memory access as a function of physical location information of a memory group. A physical location of an agent in a multi-agent system is first identified. Memory access requests from instructions of the agent are determined. In another embodiment, based on the physical location of the agent, a scheduler may determine a group of memory units having a physical location that is closest to the physical location of the agent. The scheduler may then assign the determined memory access requests to the group of memory units.
    Type: Application
    Filed: February 20, 2019
    Publication date: August 20, 2020
    Applicant: Nanjing Iluvatar CoreX Technology Co., Ltd. (DBA “Iluvatar CoreX Inc. Nanjing”)
    Inventors: Cheng Li, Pingping Shao, Pei Luo
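A minimal Python sketch of the locality-aware scheduling described in 20200264781 above: a scheduler picks the memory group physically closest to an agent and routes that agent's requests to it. All names (`MemoryGroup`, `Agent`, `assign_requests`) and the 1-D position model are illustrative assumptions, not from the patent.

```python
from dataclasses import dataclass

@dataclass
class MemoryGroup:
    name: str
    position: float  # physical location on the die (abstract units)

@dataclass
class Agent:
    name: str
    position: float

def assign_requests(agent, requests, groups):
    """Pick the memory group physically closest to the agent and
    route all of the agent's memory access requests to it."""
    closest = min(groups, key=lambda g: abs(g.position - agent.position))
    return closest, requests

groups = [MemoryGroup("bank0", 0.0), MemoryGroup("bank1", 5.0)]
agent = Agent("compute_unit_3", 4.2)
group, routed = assign_requests(agent, ["ld r1,[a]", "st [b],r2"], groups)
print(f"{agent.name} -> {group.name}: {routed}")
```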
  • Publication number: 20200264921
    Abstract: Embodiments of the invention may provide a technical solution by having a system for configuring a crypto memory engine. A memory is configured to store instructions for execution by a cryptography application. A graphics processing unit (GPU) is configured to execute the cryptography application, wherein the GPU is configured to identify a set of algorithms commonly used by the cryptography application. The identifying comprises identifying one or more input parameters for the set of commonly used algorithms. The identified set of commonly used algorithms is compiled and stored as a virtual memory module. The virtual memory module with the compiled set of commonly used algorithms is provided to the cryptography application.
    Type: Application
    Filed: February 20, 2019
    Publication date: August 20, 2020
    Applicant: Nanjing Iluvatar CoreX Technology Co., Ltd. (DBA "Iluvatar CoreX Inc. Nanjing")
    Inventors: Pei Luo, Pingping Shao, Cheng Li
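A loose Python sketch of the "virtual memory module" idea in 20200264921 above: routines the application calls frequently are identified by call count, bound once with their observed parameters, and served from a module thereafter. `VirtualModule`, `HOT_THRESHOLD`, and the XOR cipher stand-in are all assumptions for illustration.

```python
from collections import Counter

HOT_THRESHOLD = 2  # calls before a routine counts as "commonly used"

class VirtualModule:
    """Holds precompiled (here: pre-bound) crypto routines keyed by name."""
    def __init__(self):
        self.routines = {}

def xor_cipher(data: bytes, key: int) -> bytes:
    return bytes(b ^ key for b in data)

usage = Counter()
module = VirtualModule()

def call(name, fn, *args):
    usage[name] += 1
    if usage[name] >= HOT_THRESHOLD and name not in module.routines:
        # "Compile" the hot routine with its observed input parameters bound.
        module.routines[name] = lambda data, _f=fn, _a=args[1:]: _f(data, *_a)
    return fn(*args)

call("xor", xor_cipher, b"abc", 0x5A)
call("xor", xor_cipher, b"def", 0x5A)   # now cached in the module
print(module.routines["xor"](b"ghi"))   # served from the virtual module
```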
  • Publication number: 20200264891
    Abstract: Embodiments of the invention provide a technical solution by repurposing unused scalar registers as constant scalar registers. By using unused scalar registers, aspects of the invention may decrease the latency of scalar processing while reducing repeated work in the scalar pipeline. Embodiments of the invention further reduce the need for separate data store units, such as caches or other storage units.
    Type: Application
    Filed: February 20, 2019
    Publication date: August 20, 2020
    Applicant: Nanjing Iluvatar CoreX Technology Co., Ltd. (DBA “Iluvatar CoreX Inc. Nanjing”)
    Inventors: Cheng Li, Pingping Shao, Pei Luo
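A small Python sketch of the register repurposing in 20200264891 above: scalar registers a kernel never writes are bound to constants, and immediate operands are rewritten to read from those registers, avoiding a separate constant store. The register names and instruction format are invented for the demo.

```python
def find_unused_scalar_regs(instructions, reg_count=8):
    """Return scalar registers that no instruction writes."""
    written = {ins["dst"] for ins in instructions if "dst" in ins}
    return [f"s{i}" for i in range(reg_count) if f"s{i}" not in written]

def promote_constants(instructions, constants):
    """Bind each constant to an unused scalar register and rewrite
    immediate operands to read from that register instead."""
    free = find_unused_scalar_regs(instructions)
    binding = dict(zip(constants, free))
    for ins in instructions:
        ins["srcs"] = [binding.get(s, s) for s in ins.get("srcs", [])]
    return binding

prog = [{"dst": "s0", "srcs": ["s1", 42]},
        {"dst": "s1", "srcs": ["s0", 42]}]
print(promote_constants(prog, [42]))   # {42: 's2'}
print(prog)                            # immediates now read from s2
```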
  • Publication number: 20200264879
    Abstract: Embodiments of the invention may provide a technical solution by identifying a set of scalar and vector instructions. The set of scalar and vector instructions may be designated for execution in a kernel. A scalar instruction of the set of scalar and vector instructions is compared to a predefined set of scalar instructions. Based on the comparison, a secondary scalar pipeline is generated for the scalar instruction for processing. The remaining scalar instructions from the set of scalar and vector instructions are assigned to a first scalar pipeline. Vector instructions of the set of scalar and vector instructions are assigned to a vector pipeline.
    Type: Application
    Filed: February 20, 2019
    Publication date: August 20, 2020
    Applicant: Nanjing Iluvatar CoreX Technology Co., Ltd. (DBA "Iluvatar CoreX Inc. Nanjing")
    Inventors: Cheng Li, Pingping Shao, Pei Luo
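A rough Python sketch of the dispatch described in 20200264879 above: scalar instructions matching a predefined set go to a secondary scalar pipeline, remaining scalar instructions to the first scalar pipeline, and vector instructions to a vector pipeline. The instruction tuples and the contents of the predefined set are placeholders.

```python
PREDEFINED_SCALAR = {"s_mul64", "s_rotate"}   # assumed "special" scalar ops

def dispatch(instructions):
    primary, secondary, vector = [], [], []
    for kind, op in instructions:             # (kind, opcode) pairs
        if kind == "vector":
            vector.append(op)                 # vector pipeline
        elif op in PREDEFINED_SCALAR:
            secondary.append(op)              # secondary scalar pipeline
        else:
            primary.append(op)                # first scalar pipeline
    return primary, secondary, vector

kernel = [("scalar", "s_add"), ("scalar", "s_mul64"), ("vector", "v_fma")]
print(dispatch(kernel))  # (['s_add'], ['s_mul64'], ['v_fma'])
```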
  • Publication number: 20200264873
    Abstract: Embodiments of the invention provide a technical solution by modifying scalar units to support high-performance cryptography applications. Aspects of the invention provide a scalar unit having four 32-bit arithmetic logic units (ALUs). These four ALUs may be used independently as four individual lanes, each generating a 32-bit result; as such, the instructions per cycle (IPC) may be 4. In addition, the four 32-bit ALUs may be configured as two 64-bit ALUs, with two of the 32-bit ALUs in each group; this configuration may, in one embodiment, generate two 64-bit results each cycle. Moreover, the four 32-bit ALUs may be configured as one 128-bit ALU when they are combined into a single unit. Aspects of the invention thereby produce output from the set of four 32-bit scalar ALUs with a data width or format other than 32-bit.
    Type: Application
    Filed: February 20, 2019
    Publication date: August 20, 2020
    Applicant: Nanjing Iluvatar CoreX Technology Co., Ltd. (DBA "Iluvatar CoreX Inc. Nanjing")
    Inventors: Pei Luo, Pingping Shao, Cheng Li
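A behavioral Python sketch of the configurable ALU in 20200264873 above: four 32-bit lanes add independently, pair into two 64-bit lanes, or fuse into one 128-bit lane. This models only the result widths, not the hardware; the mode strings are invented.

```python
MASK32, MASK64, MASK128 = (1 << 32) - 1, (1 << 64) - 1, (1 << 128) - 1

def alu_add(a, b, mode):
    """a, b: lists of four 32-bit words (lane 0 = least significant)."""
    if mode == "4x32":                     # IPC of 4: four independent results
        return [(x + y) & MASK32 for x, y in zip(a, b)]
    packed_a = sum(w << (32 * i) for i, w in enumerate(a))
    packed_b = sum(w << (32 * i) for i, w in enumerate(b))
    if mode == "2x64":                     # two 64-bit results per cycle
        lo = ((packed_a & MASK64) + (packed_b & MASK64)) & MASK64
        hi = ((packed_a >> 64) + (packed_b >> 64)) & MASK64
        return [lo, hi]
    if mode == "1x128":                    # one wide result
        return [(packed_a + packed_b) & MASK128]
    raise ValueError(mode)

a = [0xFFFFFFFF, 0, 0, 0]
b = [1, 0, 0, 0]
print(alu_add(a, b, "4x32"))   # carry discarded per lane -> [0, 0, 0, 0]
print(alu_add(a, b, "2x64"))   # carry propagates within 64 bits -> [2**32, 0]
```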
  • Publication number: 20200210759
    Abstract: A computerized method identifies input and kernel similarity in a binarized neural network (BNN) across different applications as they are processed by processors such as a GPU. The input and kernel similarity in the BNN across different applications is analyzed to reduce computation redundancy and accelerate BNN inference. Computer-executable instructions stored on an on-chip arrangement receive a first data value from a data source for processing by the BNN at an inference phase. The computer-executable instructions further receive a second data value from the data source for processing by the BNN at the inference phase. The first data value is processed with bitwise operations. A difference between the first data value and the second data value is calculated and stored in the on-chip arrangement. The computer-executable instructions then apply the bitwise operations to the stored difference.
    Type: Application
    Filed: December 31, 2018
    Publication date: July 2, 2020
    Applicant: Nanjing Iluvatar CoreX Technology Co., Ltd. (DBA "Iluvatar CoreX Inc. Nanjing")
    Inventors: Tien-Pei Chou, Po-Wei Chou, Ching-En Lee, Cheng Fu
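A Python sketch of the redundancy-reuse idea in 20200210759 above for binarized networks: the second input is reduced to a bitwise XOR difference against the first, and only the flipped bits adjust the previous XNOR-popcount result. The word width and variable names are illustrative.

```python
N = 8                          # bits per binarized vector
FULL = (1 << N) - 1

def popcount(x): return bin(x).count("1")

def bnn_dot(x, w):
    """Standard BNN dot product: popcount of XNOR over N bits."""
    return popcount(~(x ^ w) & FULL)

def bnn_dot_incremental(prev_result, x_prev, x_new, w):
    """Reuse prev_result: only bits that changed between inputs are touched."""
    diff = (x_prev ^ x_new) & FULL           # the stored on-chip difference
    match = ~(x_prev ^ w) & FULL             # bits that matched previously
    gained = popcount(diff & ~match)         # mismatches that now match
    lost = popcount(diff & match)            # matches that now mismatch
    return prev_result + gained - lost

x1, x2, w = 0b10110010, 0b10110110, 0b11010110
full = bnn_dot(x1, w)
assert bnn_dot_incremental(full, x1, x2, w) == bnn_dot(x2, w)
print(full, bnn_dot(x2, w))
```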
  • Patent number: 10671288
    Abstract: A hierarchical sparse tensor compression method for artificial intelligence devices not only saves storage space for the neuron surface in DRAM but also adds a meta-surface to the mask block. When reading data, the mask is read first, the size of the non-zero data is calculated, and only the non-zero data are read, saving DRAM bandwidth. In the cache, only non-zero data are stored, so the required storage space is reduced; when processing data, only non-zero data are used. The method uses a bit mask to determine whether data is zero. The hierarchical compression scheme has three levels: tiles, lines, and points. Bitmasks and non-zero data are read from DRAM, and bandwidth is saved by not reading zero data. When processing data, tiles whose bit mask is zero may be easily removed.
    Type: Grant
    Filed: December 31, 2018
    Date of Patent: June 2, 2020
    Assignee: Nanjing Iluvatar CoreX Technology Co., Ltd.
    Inventors: Pingping Shao, Jiejun Chen, Yongliu Wang
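A compact Python sketch of the bitmask compression in patent 10671288 above: one bit per element marks non-zero, only non-zero values are stored and fetched, and an all-zero mask lets a whole tile be skipped. The flat-list layout stands in for the patent's tile/line/point hierarchy.

```python
def compress(values):
    mask = [1 if v != 0 else 0 for v in values]
    nonzero = [v for v in values if v != 0]       # only this hits "DRAM"
    return mask, nonzero

def decompress(mask, nonzero):
    it = iter(nonzero)
    return [next(it) if bit else 0 for bit in mask]

tile = [0, 3, 0, 0, 7, 0, 0, 1]
mask, nz = compress(tile)
print(mask, nz)                    # [0,1,0,0,1,0,0,1] [3, 7, 1]
assert decompress(mask, nz) == tile
# A tile whose mask is all zero can be elided entirely:
if not any(mask):
    print("tile elided")
```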
  • Publication number: 20200057722
    Abstract: A data reading and writing method based on a variable-length cache line. A lookup table stores the cache line information of each request. When a read task arrives at the cache, the cache line information is obtained according to the request index. If the request hits, the data in the cache is read and sent to the requester over multiple cycles; otherwise, the request is not in the cache, and read requests are created and sent. The offset, tag, and cache line size are recorded in the lookup table, and the request is sent to the DRAM. Once all the data has been returned and written to the cache, the corresponding record of the lookup table is set valid.
    Type: Application
    Filed: December 31, 2018
    Publication date: February 20, 2020
    Applicant: Nanjing Iluvatar CoreX Technology Co., Ltd. (DBA “Iluvatar CoreX Inc. Nanjing”)
    Inventors: Yongliu Wang, Pingping Shao, Chenggen Zheng, Jinshan Zheng
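A simplified Python sketch of the variable-length cache line in 20200057722 above: a lookup table keeps (offset, tag, size, valid) per request index; a hit serves the cached bytes, a miss records the entry, "fetches" from DRAM, then marks the record valid. The field names and byte-array backing store are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Record:
    offset: int = 0
    tag: int = 0
    size: int = 0
    valid: bool = False

DRAM = bytes(range(256))                 # stand-in backing store
lookup = {}                              # request index -> Record
cache = {}                               # request index -> data

def read(index, tag, offset, size):
    rec = lookup.get(index)
    if rec and rec.valid and rec.tag == tag:
        return cache[index]              # hit: serve from cache
    # Miss: record line info, issue the DRAM read, then validate.
    lookup[index] = Record(offset, tag, size, valid=False)
    cache[index] = DRAM[offset:offset + size]
    lookup[index].valid = True           # all data returned and written
    return cache[index]

print(read(0, tag=7, offset=16, size=12))   # miss, filled from DRAM
print(read(0, tag=7, offset=16, size=12))   # hit, served from cache
```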
  • Publication number: 20200042860
    Abstract: A data reuse method based on a convolutional neural network accelerator includes a tile scanning module receiving command information from a command module, the command information comprising the size of a CNN job to be divided into tile blocks. The tile scanning module generates the coordinates of each tile block according to the tile size and sends them to a memory request module; the memory request module generates memory read requests and sends them to the memory module; the memory module sequentially returns the tile block data to an input activation and weight buffer unit, which saves the received tile block data to implement data reuse and transmits it to the calculation processing unit (PE).
    Type: Application
    Filed: December 31, 2018
    Publication date: February 6, 2020
    Applicant: Nanjing Iluvatar CoreX Technology Co., Ltd. (DBA “Iluvatar CoreX Inc. Nanjing”)
    Inventors: Yile Sun, Pingping Shao, Yongliu Wang, Jinshan Zheng, Yunxiao Zou, Haihua Zhai
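A Python sketch of the tile-scanning flow in 20200042860 above: a scan module turns the job size and tile size into tile coordinates, a request module turns each coordinate into a memory read, and returned tiles land in a buffer the processing elements reuse. The module boundaries and the toy address layout are illustrative.

```python
def scan_tiles(job_h, job_w, tile_h, tile_w):
    """Tile scanning module: yield (row, col) coordinates covering the job."""
    for r in range(0, job_h, tile_h):
        for c in range(0, job_w, tile_w):
            yield (r, c)

def make_read_request(coord, tile_h, tile_w):
    """Memory request module: coordinate -> read request (toy layout)."""
    r, c = coord
    return {"row": r, "col": c, "h": tile_h, "w": tile_w}

buffer = []                              # input-activation/weight buffer
for coord in scan_tiles(job_h=4, job_w=6, tile_h=2, tile_w=3):
    req = make_read_request(coord, 2, 3)
    buffer.append(req)                   # tile kept resident for reuse by PEs
print(buffer)                            # 4 tiles covering the 4x6 job
```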
  • Publication number: 20200042189
    Abstract: A hierarchical sparse tensor compression method for artificial intelligence devices not only saves storage space for the neuron surface in DRAM but also adds a meta-surface to the mask block. When reading data, the mask is read first, the size of the non-zero data is calculated, and only the non-zero data are read, saving DRAM bandwidth. In the cache, only non-zero data are stored, so the required storage space is reduced; when processing data, only non-zero data are used. The method uses a bit mask to determine whether data is zero. The hierarchical compression scheme has three levels: tiles, lines, and points. Bitmasks and non-zero data are read from DRAM, and bandwidth is saved by not reading zero data. When processing data, tiles whose bit mask is zero may be easily removed.
    Type: Application
    Filed: December 31, 2018
    Publication date: February 6, 2020
    Applicant: Nanjing Iluvatar CoreX Technology Co., Ltd. (DBA “Iluvatar CoreX Inc. Nanjing”)
    Inventors: Pingping Shao, Jiejun Chen, Yongliu Wang
  • Publication number: 20200042881
    Abstract: A core computing unit processor and a processing method for an artificial intelligence device, wherein the processor is provided with a plurality of neurons and each neuron is composed of a plurality of multiplier-adder groups. Each multiplier-adder group includes a plurality of multiplier units supporting accumulation, maximum, and minimum operations. The number of multiplier-adder groups is the same in every neuron, and the number of multiplier units is the same in every group. The multiplier-adder groups within one neuron share the same input activation data but process different kernel weight data, while multiplier-adder groups of the same order in different neurons process the same kernel weight data, and there is no data exchange between the multiplier-adder groups.
    Type: Application
    Filed: December 31, 2018
    Publication date: February 6, 2020
    Applicant: Nanjing Iluvatar CoreX Technology Co., Ltd. (DBA "Iluvatar CoreX Inc. Nanjing")
    Inventors: Yunxiao Zou, Pingping Shao, Min Cai, Jinshan Zheng, Guangzhou Li
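A behavioral Python sketch of the neuron/multiplier-group layout in 20200042881 above: each neuron broadcasts one input activation to all of its multiplier-adder groups, and group k in every neuron applies kernel k. Dimensions and data are toy values.

```python
NEURONS, GROUPS = 2, 3               # 2 activations x 3 kernels

activations = [[1, 2, 4], [3, 0, 5]]          # one vector per neuron
kernels = [[1, 1, 1], [2, 0, 1], [0, 3, 2]]   # one vector per group order

def multiply_add(activation, kernel):
    """One multiplier-adder group: elementwise multiply, then accumulate."""
    return sum(a * w for a, w in zip(activation, kernel))

outputs = [[multiply_add(activations[n], kernels[g])  # group g of neuron n
            for g in range(GROUPS)]
           for n in range(NEURONS)]
print(outputs)   # [[7, 6, 14], [8, 11, 10]]
```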
  • Publication number: 20200042867
    Abstract: A hardware architecture that may include: a host, a frontal engine, a parietal engine, a renderer engine, an occipital engine, a temporal engine, and a memory. The frontal engine may obtain a 5D tensor from the host and divide it into several groups of tensors. These groups of tensors may be sent or transmitted to the parietal engine, and the parietal engine may further divide the groups into several tensors. The parietal engine may send these tensors to the renderer engine for execution and may send a partial amount of tensors to the occipital engine. The occipital engine may accumulate the partial amount of tensors and may execute them. The occipital engine may send the output feature as the final tensor to the temporal engine. The temporal engine may compress the final tensor before storing or saving it to the memory.
    Type: Application
    Filed: April 30, 2019
    Publication date: February 6, 2020
    Applicant: Nanjing Iluvatar CoreX Technology Co., Ltd. (DBA “Iluvatar CoreX Inc. Nanjing”)
    Inventor: Pingping Shao
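A high-level Python sketch of the engine chain in 20200042867 above: the frontal stage splits the host's tensor into groups, the parietal stage subdivides and forwards, the occipital stage accumulates partials, and the temporal stage compresses before memory. The engines are modeled as plain functions and the 5-D tensor by a flat list; the "rle0" compression tag is an invented stand-in.

```python
def frontal(tensor5d, group_size):
    """Split the host's tensor into groups along the outer axis."""
    return [tensor5d[i:i + group_size]
            for i in range(0, len(tensor5d), group_size)]

def parietal(group):
    """Subdivide a group into per-renderer tensors (here: its elements)."""
    return list(group)

def occipital(partials):
    """Accumulate partial tensors into the final output feature."""
    return sum(partials)

def temporal(final, memory):
    """Compress (here: just tag the value) and store to memory."""
    memory.append(("rle0", final))

memory = []
tensor5d = list(range(8))            # placeholder for a real 5-D tensor
for group in frontal(tensor5d, group_size=4):
    partials = parietal(group)
    final = occipital(partials)
    temporal(final, memory)
print(memory)                        # [('rle0', 6), ('rle0', 22)]
```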
  • Publication number: 20200042868
    Abstract: The present invention is a flexible data stream processor and processing method for an artificial intelligence device, including a frontal engine, a parietal engine group, an occipital engine, and a temporal engine. A tensor is divided into a plurality of tile blocks, each tile block is divided into several tiles, each tile is divided into several wave blocks, and each wave block is divided into several waves; waves with the same rendering features are processed in the same neuron block. AI work can be distributed across multiple parietal engines for parallel processing, with weight reuse, activation reuse, weight-station reuse, and partial-sum reuse.
    Type: Application
    Filed: December 31, 2018
    Publication date: February 6, 2020
    Applicant: Nanjing Iluvatar CoreX Technology Co., Ltd. (DBA "Iluvatar CoreX Inc. Nanjing")
    Inventors: Pingping Shao, Yile Sun, Ching-En Lee, Jinshan Zheng, Yunxiao Zou
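A Python sketch of the work subdivision in 20200042868 above: tensor to tile blocks to tiles to wave blocks to waves, each level a fixed-size chunking. The chunk sizes are arbitrary demo values, not from the patent, and the final grouping of waves by rendering feature is omitted.

```python
def chunk(seq, size):
    return [seq[i:i + size] for i in range(0, len(seq), size)]

def subdivide(tensor, tile_block=8, tile=4, wave_block=2, wave=1):
    waves = []
    for tb in chunk(tensor, tile_block):          # tile blocks
        for t in chunk(tb, tile):                 # tiles
            for wb in chunk(t, wave_block):       # wave blocks
                waves.extend(chunk(wb, wave))     # waves
    return waves

tensor = list(range(16))
waves = subdivide(tensor)
print(len(waves), waves[:4])     # 16 single-element waves, in scan order
# Waves with the same rendering features would be batched to one neuron
# block; here every wave is independent, so that grouping step is skipped.
```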