Patents by Inventor Yunji Chen

Yunji Chen has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20190065187
    Abstract: Aspects for vector operations in neural networks are described herein. The aspects may include a vector caching unit configured to store a vector, wherein the vector includes one or more elements. The aspects may further include a computation module that includes one or more comparers configured to compare the one or more elements to generate an output result that satisfies a predetermined condition included in an instruction.
    Type: Application
    Filed: October 25, 2018
    Publication date: February 28, 2019
    Inventors: Tian Zhi, Shaoli Liu, Qi Guo, Tianshi Chen, Yunji Chen
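
Functionally, the comparer array above behaves like an elementwise predicate over the vector. A minimal software sketch of that behavior, assuming a simple condition-code encoding carried by the instruction; the function name and NumPy usage are illustrative, not the patent's interface:

```python
import numpy as np

def vector_compare(vec, condition, operand):
    # One "comparer" per element: check the element against the
    # instruction's predetermined condition and scalar operand.
    ops = {"eq": np.equal, "ne": np.not_equal,
           "lt": np.less, "le": np.less_equal,
           "gt": np.greater, "ge": np.greater_equal}
    return ops[condition](vec, operand)

mask = vector_compare(np.array([1.0, 3.0, 2.0]), "ge", 2.0)
# mask -> array([False,  True,  True])
```
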
  • Publication number: 20190057063
    Abstract: Aspects for submatrix operations in neural networks are described herein. The aspects may include a controller unit configured to receive a submatrix instruction. The submatrix instruction may include a starting address of a submatrix of a matrix, a width of the submatrix, a height of the submatrix, and a stride that indicates a position of the submatrix relative to the matrix. The aspects may further include a computation module configured to select one or more values from the matrix as elements of the submatrix in accordance with the starting address of the matrix, the starting address of the submatrix, the width of the submatrix, the height of the submatrix, and the stride.
    Type: Application
    Filed: October 22, 2018
    Publication date: February 21, 2019
    Inventors: Shaoli Liu, Xiao Zhang, Yunji Chen, Tianshi Chen
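
A minimal sketch of the submatrix gather in 20190057063, assuming a row-major flat buffer and reading the stride as the row pitch separating consecutive submatrix rows inside the enclosing matrix (one plausible reading of "a position of the submatrix relative to the matrix"); all names are hypothetical:

```python
import numpy as np

def load_submatrix(flat, matrix_start, sub_start, width, height, stride):
    # Consecutive submatrix rows sit `stride` elements apart in the
    # enclosing matrix's flat, row-major storage.
    out = np.empty((height, width), dtype=flat.dtype)
    for r in range(height):
        base = matrix_start + sub_start + r * stride
        out[r] = flat[base : base + width]
    return out

flat = np.arange(16)                        # a 4x4 matrix stored flat
print(load_submatrix(flat, 0, 1, 2, 2, 4))  # [[1 2], [5 6]]
```
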
  • Publication number: 20190050736
    Abstract: Aspects for maxout layer operations in neural networks are described herein. The aspects may include a load/store unit configured to retrieve input data from a storage module. The input data may be formatted as a three-dimensional vector that includes one or more feature values stored in a feature dimension of the three-dimensional vector. The aspects may further include a pruning unit configured to divide the one or more feature values into one or more feature groups based on one or more data ranges and select a maximum feature value from each of the one or more feature groups. Further still, the pruning unit may be configured to delete, in each of the one or more feature groups, feature values other than the maximum feature value and update the input data with the one or more maximum feature values.
    Type: Application
    Filed: October 18, 2018
    Publication date: February 14, 2019
    Inventors: Dong Han, Qi Guo, Tianshi Chen, Yunji Chen
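
Functionally, the pruning unit above performs a maxout reduction over the feature dimension. A minimal NumPy sketch, assuming the feature dimension is the last axis and splits evenly into fixed-size groups (a simple stand-in for the patent's "data ranges"):

```python
import numpy as np

def maxout(features, group_size):
    # Split the feature dimension into groups, keep each group's maximum,
    # and drop (prune) every other feature value.
    h, w, c = features.shape                      # three-dimensional input
    grouped = features.reshape(h, w, c // group_size, group_size)
    return grouped.max(axis=-1)                   # one value per group
```
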
  • Publication number: 20190050369
    Abstract: A nonlinear function operation device and method are provided. The device may include a table-lookup module and a linear fitting module. The table-lookup module may be configured to acquire a first address of a slope value k and a second address of an intercept value b based on a floating-point number. The linear fitting module may be configured to obtain a linear function expressed as y = k×x + b based on the slope value k and the intercept value b, and substitute the floating-point number into the linear function to calculate a function value of the linear function, wherein the calculated function value is determined as the function value of a nonlinear function corresponding to the floating-point number.
    Type: Application
    Filed: October 18, 2018
    Publication date: February 14, 2019
    Inventors: Huiying Lan, Qi Guo, Yunji Chen, Tianshi Chen, Shangying Li, Zhen Li
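
The device in 20190050369 evaluates a nonlinear function as y = k×x + b with k and b fetched by address from tables. A minimal sketch of that scheme with uniformly spaced segments; the table construction and segment count are assumptions for illustration:

```python
import numpy as np

def make_tables(f, lo, hi, segments):
    # Precompute per-segment slopes k and intercepts b for a
    # piecewise-linear fit of f over [lo, hi].
    xs = np.linspace(lo, hi, segments + 1)
    k = (f(xs[1:]) - f(xs[:-1])) / (xs[1:] - xs[:-1])
    b = f(xs[:-1]) - k * xs[:-1]
    return xs, k, b

def pwl_eval(x, xs, k, b):
    # "Table lookup": map x to a segment address, then y = k*x + b.
    i = np.clip(np.searchsorted(xs, x) - 1, 0, len(k) - 1)
    return k[i] * x + b[i]

xs, k, b = make_tables(np.tanh, -4.0, 4.0, 64)
print(pwl_eval(0.7, xs, k, b), np.tanh(0.7))   # the two agree closely
```
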
  • Publication number: 20190026626
    Abstract: A neural network accelerator and an operation method thereof applicable in the field of neural network algorithms are disclosed. The neural network accelerator comprises an on-chip storage medium for storing data externally transmitted or for storing data generated during computing; an on-chip address index module for mapping to a correct storage address on the basis of an input index when an operation is performed; a core computing module for performing a neural network operation; and a multi-ALU device for obtaining input data from the core computing module or the on-chip storage medium to perform a nonlinear operation which cannot be completed by the core computing module. By introducing a multi-ALU design into the neural network accelerator, an operation speed of the nonlinear operation is increased, such that the neural network accelerator is more efficient.
    Type: Application
    Filed: August 9, 2016
    Publication date: January 24, 2019
    Inventors: Zidong Du, Qi Guo, Tianshi Chen, Yunji Chen
  • Publication number: 20190026246
    Abstract: The present invention is directed to the technical field of storage and discloses an on-chip data partitioning read-write method. The method comprises: a data partitioning step for storing on-chip data in different areas, storing it in an on-chip storage medium and an off-chip storage medium respectively based on a data partitioning strategy; a pre-operation step for processing the on-chip address index of the on-chip storage data in advance when implementing data splicing; and a data splicing step for splicing the on-chip storage data and the off-chip input data to obtain a representation of the original data based on a data splicing strategy. A corresponding on-chip data partitioning read-write system and device are also provided. Thus, reads and writes of repeated data can be realized efficiently, reducing memory-access bandwidth requirements and on-chip storage overhead while providing good flexibility.
    Type: Application
    Filed: August 9, 2016
    Publication date: January 24, 2019
    Inventors: Tianshi Chen, Zidong Du, Qi Guo, Yunji Chen
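
A toy model of the splicing step in 20190026246, assuming the pre-computed on-chip address index is a per-element record of which partition holds each original element and at what position; the index structure is hypothetical:

```python
def splice(on_chip, off_chip, index):
    # Rebuild the original data representation from its two partitions.
    # index[i] = ("on" | "off", position) for the i-th original element.
    return [on_chip[pos] if where == "on" else off_chip[pos]
            for where, pos in index]

index = [("on", 0), ("off", 0), ("on", 1)]
assert splice([10, 30], [20], index) == [10, 20, 30]
```
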
  • Publication number: 20190018766
    Abstract: The present disclosure may include a method that comprises: a data partitioning step for partitioning data on an on-chip and/or an off-chip storage medium into different data blocks according to a pre-determined data partitioning principle, wherein data with a reuse distance less than a pre-determined distance threshold value is partitioned into the same data block; and a data indexing step for successively loading different data blocks to at least one on-chip processing unit according to a pre-determined ordinal relation of a replacement policy, wherein repeated data in a loaded data block is subjected to on-chip repetitive addressing. Because data with a reuse distance below the threshold is partitioned into the same data block, a block can be loaded on chip once and then reused as many times as possible, so that access is more efficient.
    Type: Application
    Filed: August 9, 2016
    Publication date: January 17, 2019
    Inventors: Qi GUO, Tianshi CHEN, Yunji CHEN
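
A minimal sketch of the partitioning rule in 20190018766, approximating each address's reuse distance by the gap between its consecutive appearances in an access trace (true stack reuse distance counts distinct intervening accesses); addresses under the threshold form the block that is loaded on chip once and reused:

```python
def partition_by_reuse(trace, threshold):
    last_seen, dist = {}, {}
    for t, addr in enumerate(trace):
        if addr in last_seen:   # shortest observed gap ~ reuse distance
            dist[addr] = min(dist.get(addr, float("inf")), t - last_seen[addr])
        last_seen[addr] = t
    hot = {a for a, d in dist.items() if d < threshold}
    cold = set(last_seen) - hot
    return hot, cold            # hot block: load once, reuse many times

hot, cold = partition_by_reuse(["a", "b", "a", "c", "a", "b"], threshold=3)
# hot -> {"a"} (gap 2), cold -> {"b", "c"}
```
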
  • Publication number: 20180375789
    Abstract: A communication structure comprises: a central node that is a communication data center of a network-on-chip and used for broadcasting or multicasting communication data to a plurality of leaf nodes; a plurality of leaf nodes that are communication data nodes of the network-on-chip and used for transmitting the communication data to the central node; and forwarder modules for connecting the central node with the plurality of leaf nodes and forwarding the communication data, wherein the plurality of leaf nodes are divided into N groups, each group having the same number of leaf nodes, the central node is individually in communication connection with each group of leaf nodes by means of the forwarder modules, the communication structure is a fractal-tree structure, the communication structure constituted by each group of leaf nodes has self-similarity, and the forwarder modules comprise a central forwarder module, leaf forwarder modules, and intermediate forwarder modules.
    Type: Application
    Filed: June 17, 2016
    Publication date: December 27, 2018
    Inventors: Huiying LAN, Tao LUO, Shaoli LIU, Shijin ZHANG, Yunji CHEN
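
A schematic model of the broadcast path in 20180375789: the central node fans communication data out through forwarder modules, and each group repeats the same self-similar step down to its leaf nodes. The nested-list tree encoding is purely illustrative:

```python
def broadcast(node, data, delivered):
    # A leaf id receives the data; a forwarder fans out to its children.
    if isinstance(node, int):
        delivered.append((node, data))
    else:
        for child in node:
            broadcast(child, data, delivered)

tree = [[0, 1, 2, 3], [4, 5, 6, 7]]   # N = 2 groups of equal size
out = []
broadcast(tree, "weights", out)       # every leaf receives "weights"
```
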
  • Publication number: 20180329681
    Abstract: The present disclosure provides a quick operation device for a nonlinear function, and a method therefor. The device comprises: a domain conversion part for converting an input independent variable into a corresponding value in a table lookup range; a table lookup part for looking up a slope and an intercept of the corresponding piecewise linear fitting based on the input independent variable, or on an independent variable processed by the domain conversion part; and a linear fitting part for obtaining a final result by linear fitting, based on the slope and the intercept obtained through table lookup by the table lookup part. The present disclosure solves the problems of slow operation speed, large operation-device area, and high power consumption caused by the traditional method.
    Type: Application
    Filed: June 17, 2016
    Publication date: November 15, 2018
    Inventors: Shijin Zhang, Tao Luo, Shaoli Liu, Yunji Chen
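
The stage that distinguishes 20180329681 from the table-lookup scheme above is the domain conversion part. As one concrete example (the choice of function is an assumption, not the patent's), sigmoid's symmetry sigmoid(-x) = 1 - sigmoid(x) folds every input into a non-negative lookup range, halving the table; (xs, k, b) are tables built over [0, hi] with the make_tables helper sketched for 20190050369 above:

```python
import numpy as np

def sigmoid_via_lookup(x, xs, k, b):
    neg = x < 0
    xa = np.minimum(np.abs(x), xs[-1])               # domain conversion
    i = np.clip(np.searchsorted(xs, xa) - 1, 0, len(k) - 1)
    y = k[i] * xa + b[i]                             # lookup + linear fit
    return np.where(neg, 1.0 - y, y)                 # undo the folding
```
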
  • Publication number: 20180330239
    Abstract: A compression coding apparatus for an artificial neural network, including a memory interface unit, an instruction cache, a controller unit, and a computing unit, wherein the computing unit is configured to perform corresponding operations on data from the memory interface unit according to instructions from the controller unit. The computing unit mainly performs a three-step operation: step one multiplies the input neurons by the weight data; step two performs adder-tree computing, summing the weighted output neurons obtained in step one level by level via the adder tree, or adding a bias to the output neurons to get biased output neurons; step three applies an activation function to get the final output neurons. The present disclosure also provides a method for compression coding of a multi-layer neural network.
    Type: Application
    Filed: July 20, 2018
    Publication date: November 15, 2018
    Inventors: Tianshi CHEN, Shaoli LIU, Qi GUO, Yunji CHEN
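
The three-step operation in 20180330239 maps directly onto a dense layer. A minimal NumPy sketch, assuming W has shape (outputs, inputs) and modeling the adder tree as a sum; the activation choice is an assumption:

```python
import numpy as np

def forward_step(x, W, bias, act=np.tanh):
    weighted = W * x                        # step 1: input neurons x weights
    summed = weighted.sum(axis=1) + bias    # step 2: adder tree, then bias
    return act(summed)                      # step 3: activation function
```
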
  • Publication number: 20180329868
    Abstract: A computing device and related products are provided. The computing device is configured to perform machine learning calculations. The computing device includes an operation unit, a controller unit, and a storage unit. The storage unit includes a data input/output (I/O) unit, a register, and a cache. The technical solution provided by the present disclosure has the advantages of fast calculation speed and energy savings.
    Type: Application
    Filed: July 19, 2018
    Publication date: November 15, 2018
    Inventors: Tianshi Chen, Xiao Zhang, Shaoli Liu, Yunji Chen
  • Publication number: 20180321943
    Abstract: The present disclosure provides a data read-write scheduler and a reservation station for vector operations. The data read-write scheduler suspends instruction execution by providing a read-instruction cache module and a write-instruction cache module and detecting conflicting instructions based on the two modules. Once the required timing is satisfied, suspended instructions are re-executed, thereby resolving read-after-write and write-after-read conflicts between instructions and guaranteeing that correct data are provided to the vector operations component. The disclosed scheduler therefore has considerable value for promotion and application.
    Type: Application
    Filed: July 19, 2018
    Publication date: November 8, 2018
    Inventors: Dong HAN, Shaoli LIU, Yunji CHEN, Tianshi CHEN
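
A toy model of the conflict check in 20180321943, assuming the read- and write-instruction cache modules track the operands of in-flight instructions: an instruction that hits a read-after-write or write-after-read conflict is suspended and reissued after the earlier access completes. The instruction encoding is hypothetical:

```python
def has_conflict(instr, pending_reads, pending_writes):
    raw = any(src in pending_writes for src in instr["reads"])   # read-after-write
    war = any(dst in pending_reads for dst in instr["writes"])   # write-after-read
    return raw or war

instr = {"reads": ["v2"], "writes": ["v3"]}
assert has_conflict(instr, pending_reads={"v1"}, pending_writes={"v2"})
```
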
  • Publication number: 20180321911
    Abstract: The present disclosure discloses an adder device, a data accumulation method and a data processing device. The adder device comprises: a first adder module provided with an adder tree unit, composed of a multi-stage adder array, and a first control unit, wherein the adder tree unit accumulates data by means of step-by-step accumulation based on a control signal of the first control unit; a second adder module comprising a two-input addition/subtraction operation unit and a second control unit, and used for performing an addition or subtraction operation on input data; a shift operation module for performing a left shift operation on output data of the first adder module; an AND operation module for performing an AND operation on output data of the shift operation module and output data of the second adder module; and a controller module.
    Type: Application
    Filed: June 17, 2016
    Publication date: November 8, 2018
    Inventors: Zhen Li, Shaoli Liu, Shijin Zhang, Tao Luo, Cheng Qian, Yunji Chen, Tianshi Chen
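
The modules of 20180321911 compose into a single dataflow: adder-tree reduction, a left shift of its output, and an AND with the two-input add/subtract result. A minimal integer sketch of that composition (operand widths and control encoding are assumptions):

```python
def adder_device(tree_inputs, a, b, subtract, shift):
    tree_out = sum(tree_inputs)                # adder tree: step-by-step accumulation
    addsub_out = a - b if subtract else a + b  # two-input add/subtract unit
    return (tree_out << shift) & addsub_out    # shift module feeds the AND module
```
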
  • Publication number: 20180322381
    Abstract: Aspects for executing forward propagation of an artificial neural network are described herein. As an example, the aspects may include a plurality of computation modules connected via an interconnection unit; and a controller unit configured to decode an instruction into one or more groups of micro-instructions, wherein the plurality of computation modules are configured to perform respective groups of the micro-instructions.
    Type: Application
    Filed: July 19, 2018
    Publication date: November 8, 2018
    Inventors: Shaoli LIU, Qi GUO, Yunji CHEN, Tianshi CHEN
  • Publication number: 20180322392
    Abstract: An apparatus for executing backpropagation of an artificial neural network comprises an instruction caching unit, a controller unit, a direct memory access unit, an interconnection unit, a master computation module, and multiple slave computation modules. For each layer in a multilayer neural network, weighted summation may be performed on input gradient vectors to calculate an output gradient vector of this layer. The output gradient vector may be multiplied by a derivative value of a next-layer activation function on which forward operation is performed, so that a next-layer input gradient vector can be obtained. The input gradient vector may be multiplied by the corresponding input neuron from the forward operation to obtain the gradient of a weight value of this layer, and the weight value of this layer can be updated according to the obtained weight gradient.
    Type: Application
    Filed: July 18, 2018
    Publication date: November 8, 2018
    Inventors: Shaoli Liu, Qi Guo, Yunji Chen, Tianshi Chen
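
The per-layer recurrence in 20180322392 is the standard backward pass. A minimal NumPy sketch for one layer, assuming W has shape (outputs, inputs), grad_in is this layer's input gradient vector, x is the forward-pass input, and act_deriv is the next layer's activation derivative at its forward values:

```python
import numpy as np

def backprop_layer(grad_in, W, x, act_deriv):
    grad_out = (W.T @ grad_in) * act_deriv  # weighted sum x activation derivative
    grad_W = np.outer(grad_in, x)           # weight gradient from forward input
    return grad_out, grad_W                 # grad_W then updates this layer's W
```
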
  • Publication number: 20180321944
    Abstract: The present disclosure relates to a data ranking apparatus that comprises: a register group for storing the K temporarily ranked maximum or minimum data items in a data ranking process, the register group comprising a plurality of registers connected in parallel, where two adjacent registers unidirectionally transmit data from a low level to a high level; a comparator group, which comprises a plurality of comparators connected to the registers on a one-to-one basis, compares the magnitudes of a plurality of pieces of input data, and outputs the larger or smaller values to the corresponding registers; and a control circuit generating a plurality of flag bits applied to the registers, wherein the flag bits determine whether the registers accept data transmitted from the corresponding comparators or lower-level registers, and whether the registers transmit data to higher-level registers.
    Type: Application
    Filed: June 17, 2016
    Publication date: November 8, 2018
    Inventors: Daofu LIU, Shengyuan ZHOU, Yunji CHEN
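
A software model of the register/comparator chain in 20180321944, configured here for maxima: the K registers hold the current top-K in ascending order, and each arriving value shifts smaller entries one level down before taking its place; initialization with -inf is an assumption:

```python
def topk_insert(registers, value):
    # registers: current K maxima, ascending (low level -> high level).
    if value <= registers[0]:
        return                            # smaller than everything retained
    i = 0
    while i + 1 < len(registers) and registers[i + 1] < value:
        registers[i] = registers[i + 1]   # smaller entries shift down one level
        i += 1
    registers[i] = value

regs = [float("-inf")] * 3
for v in [5, 1, 9, 7, 3]:
    topk_insert(regs, v)
print(regs)   # [5, 7, 9]
```
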
  • Publication number: 20180321912
    Abstract: The present disclosure provides a data accumulation device and method, and a digital signal processing device. The device comprises: an accumulation tree module for accumulating input data in the form of a binary tree structure and outputting accumulated result data; a register module including a plurality of groups of registers, used for registering intermediate data generated by the accumulation tree module during an accumulation process as well as the accumulated result data; and a control circuit for generating a data gating signal that makes the accumulation tree module filter out input data not required to be accumulated, and a flag signal that performs the following control: selecting as output data either the result obtained after adding one or more pieces of intermediate data stored in the registers to the accumulated result, or the accumulated result directly. Thus, a plurality of groups of input data can be rapidly accumulated to a group of sums within a clock cycle.
    Type: Application
    Filed: June 17, 2016
    Publication date: November 8, 2018
    Inventors: Zhen LI, Shaoli LIU, Shijin ZHANG, Tao LUO, Cheng QIAN, Yunji CHEN, Tianshi CHEN
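
A minimal sketch of the gated binary-tree accumulation in 20180321912: the data gating signal filters out inputs that should not be accumulated, and the remaining values reduce pairwise, level by level. Zero-padding odd levels is an implementation assumption:

```python
def gated_tree_sum(values, gate):
    level = [v if g else 0 for v, g in zip(values, gate)]  # gating signal
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)               # pad so the level reduces in pairs
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]

assert gated_tree_sum([1, 2, 3, 4], [True, False, True, True]) == 8
```
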
  • Publication number: 20180314928
    Abstract: The present disclosure provides an operation apparatus and method for an acceleration chip for accelerating a deep neural network algorithm. The apparatus comprises: a vector addition processor module for performing addition or subtraction of a vector, and/or a vectorized operation of a pooling layer algorithm in a deep neural network algorithm; a vector function value arithmetic unit module for performing a vectorized operation of a non-linear evaluation in the deep neural network algorithm; and a vector multiplier-adder module for performing a multiply-add operation on the vector, wherein the three modules execute programmable instructions and interact with each other to calculate the values of the neurons, the network output result of the neural network, and the variation of the synaptic weights that represent the interaction strength between the neurons on an input layer and the neurons on an output layer.
    Type: Application
    Filed: June 17, 2016
    Publication date: November 1, 2018
    Inventors: Zhen Li, Shaoli Liu, Shijin Zhang, Tao Luo, Cheng Qian, Yunji Chen, Tianshi Chen
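
One reading of the vector addition processor's role in 20180314928 is that pooling becomes a handful of vector additions over shifted feature-map slices. A minimal average-pooling sketch in that spirit (the vectorization strategy is an assumption, not the patent's algorithm):

```python
import numpy as np

def avg_pool(fmap, k):
    h, w = fmap.shape
    out = np.zeros((h // k, w // k))
    for dy in range(k):
        for dx in range(k):      # k*k vector additions of strided slices
            out += fmap[dy : h - h % k : k, dx : w - w % k : k]
    return out / (k * k)         # scale the window sums into averages
```
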
  • Publication number: 20180262205
    Abstract: Aspects for converting floating-point numbers in a processor are described herein. As an example, the aspects may include receiving, by a floating-point number converter, an exponent bit length, a base value, and one or more first floating-point numbers of a first bit length. Further, the aspects may include calculating, by the floating-point number converter, one or more second floating-point numbers of a second bit length based on the exponent bit length and the base value, the one or more second floating-point numbers respectively corresponding to the one or more first floating-point numbers.
    Type: Application
    Filed: May 9, 2018
    Publication date: September 13, 2018
    Inventors: Zhen Li, Shaoli Liu, Tianshi Chen, Yunji Chen
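
A loose software model of the conversion in 20180262205: given an exponent bit length and a base, re-round a wider float into the narrower format by clamping its exponent to the representable range and rounding its significand. The IEEE-style bias formula and the mantissa width are assumptions for illustration:

```python
import math

def convert_float(x, exp_bits, mant_bits, base=2):
    if x == 0.0:
        return 0.0
    e = math.floor(math.log(abs(x), base))   # exponent in the given base
    bias = base ** (exp_bits - 1) - 1        # assumed IEEE-style bias
    e = max(-bias, min(bias + 1, e))         # clamp to the exponent range
    m = round(x / base ** e * base ** mant_bits) / base ** mant_bits
    return m * base ** e                     # the second, shorter float

print(convert_float(3.7, exp_bits=5, mant_bits=2))   # 3.5
```
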
  • Publication number: 20180260709
    Abstract: Aspects for modifying data in a multi-layer neural network (MNN) acceleration processor for neural networks are described herein. As an example, the aspects may include receiving one or more groups of input data and connection data. Further, the aspects may include modifying the one or more groups of input data based on the connection data. Further still, the aspects may include calculating one or more groups of output data based on the modified input data.
    Type: Application
    Filed: May 9, 2018
    Publication date: September 13, 2018
    Inventors: Shijin Zhang, Qi Guo, Yunji Chen, Tianshi Chen
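
A minimal sketch of the modification step in 20180260709, assuming the connection data is a 0/1 mask over input-to-output links: masking zeroes the pruned links before the output calculation. The shapes and names are illustrative:

```python
import numpy as np

def forward_pruned(x, W, conn):
    # conn[j, i] == 1 iff input i connects to output j; applying the mask
    # is the "modify input data based on connection data" step.
    return (W * conn) @ x

x = np.array([1.0, 2.0, 3.0])
W = np.ones((2, 3))
conn = np.array([[1, 0, 1], [0, 1, 0]])
print(forward_pruned(x, W, conn))   # [4. 2.]
```
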