Patents by Inventor Yujie Hu

Yujie Hu has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11106431
    Abstract: A computing device that implements a fast floating-point adder tree for neural network applications is disclosed. The fast floating-point adder tree comprises a data preparation module, a fast fixed-point Carry-Save Adder (CSA) tree, and a normalization module. The floating-point input data comprise a sign bit, an exponent part, and a fraction part. The data preparation module aligns the fraction parts of the input data and prepares the input data for subsequent processing. The fast adder uses a signed fixed-point CSA tree to quickly reduce a large number of fixed-point values to 2 output values and then uses a normal adder to add the 2 output values into one output value. The fast adder for a large number of operands is built from multiple levels of fast adders for a small number of operands. The output from the signed fixed-point Carry-Save Adder tree is converted to a selected floating-point format.
    Type: Grant
    Filed: February 22, 2020
    Date of Patent: August 31, 2021
    Assignee: DINOPLUSAI HOLDINGS LIMITED
    Inventors: Yutian Feng, Yujie Hu
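The carry-save reduction described in the abstract above can be sketched in software. This is a minimal illustrative model, not the patented circuit; the function names and the 32-bit width are assumptions. Each tree level applies 3:2 compressors until only two values remain, and a normal adder merges those two at the end.

```python
def csa_3to2(a, b, c, mask):
    """One 3:2 compressor: three operands in, (partial sum, shifted carry) out."""
    partial_sum = (a ^ b ^ c) & mask
    carry = (((a & b) | (a & c) | (b & c)) << 1) & mask
    return partial_sum, carry

def csa_tree_sum(values, width=32):
    """Reduce many fixed-point operands to two via CSA tree levels,
    then merge the final two values with a normal adder."""
    mask = (1 << width) - 1
    ops = [v & mask for v in values]
    while len(ops) > 2:
        reduced = []
        # Compress each group of three operands into two.
        for i in range(0, len(ops) - len(ops) % 3, 3):
            reduced.extend(csa_3to2(ops[i], ops[i + 1], ops[i + 2], mask))
        reduced.extend(ops[len(ops) - len(ops) % 3:])  # pass leftovers down
        ops = reduced
    return sum(ops) & mask  # the "normal adder" stage
```

Because each 3:2 step preserves the sum modulo 2^width, the result matches ordinary two's-complement addition of all operands.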
  • Publication number: 20210264257
    Abstract: An AI (Artificial Intelligence) processor for Neural Network (NN) processing shared by multiple users is disclosed. The AI processor comprises a Multiplier Unit (MXU), a Scalar Computing Unit (SCU), a unified buffer coupled to the MXU and the SCU to store data, and control circuitry coupled to the MXU, the SCU, and the unified buffer. The MXU comprises a plurality of Processing Elements (PEs) responsible for computing matrix multiplications. The SCU, coupled to the output of the MXU, is responsible for computing the activation function. The control circuitry is configured to perform space-division and time-division NN processing for a plurality of users. At one time instance, at least one of the MXU and the SCU is shared by two or more users, and at least one user is using a part of the MXU while another user is using a part of the SCU.
    Type: Application
    Filed: February 28, 2019
    Publication date: August 26, 2021
    Inventors: Yujie HU, Xiaosong WANG, Tong WU, Steven SERTILLANGE
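The space-division sharing at a single time instance described above can be illustrated with a small scheduling sketch. This is purely illustrative — the unit names and the `assign_partitions` helper are assumptions, not the patent's design: each unit's elements are split among users, so one user can occupy part of the MXU while another occupies part of the SCU.

```python
def assign_partitions(units, demands):
    """Space-division sharing at one time instance: split each hardware
    unit's elements (e.g. the MXU's PEs) among the users requesting them."""
    schedule = {}
    for unit, total in units.items():
        next_free = 0
        for user, count in demands.get(unit, []):
            # Give this user a contiguous slice of the unit's elements.
            schedule[(unit, user)] = range(next_free, next_free + count)
            next_free += count
        assert next_free <= total, f"{unit} is oversubscribed"
    return schedule
```

Time division would then amount to recomputing such a schedule at each time instance.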
  • Publication number: 20210141697
    Abstract: Embodiments described herein provide a mission-critical artificial intelligence (AI) processor (MAIP), which includes multiple types of HEs (hardware elements) comprising one or more HEs configured to perform operations associated with multi-layer NN (neural network) processing, at least one spare HE, a data buffer to store correctly computed data from a previous layer of the multi-layer NN processing, and fault-tolerance (FT) control logic. The FT control logic is configured to: determine a fault in the current-layer NN processing associated with an HE; cause the correctly computed data from the previous layer of the multi-layer NN processing to be copied or moved to said at least one spare HE; and cause said at least one spare HE to perform the current-layer NN processing using the correctly computed data from the previous layer.
    Type: Application
    Filed: February 25, 2019
    Publication date: May 13, 2021
    Inventors: Chung Kuang CHIN, Yujie HU, Tong WU, Clifford GOLD, Yick Kei WONG, Xiaosong WANG, Steven SERTILLANGE, Zongwei ZHU
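The fault-tolerance flow in the abstract above reduces to a simple control-flow sketch. This is an illustrative software model under assumed names (`run_layers`, an exception as the fault signal), not the MAIP hardware: because the output of the previous layer is buffered, only the failed layer needs to be redone on the spare hardware element.

```python
def run_layers(layers, x, primary_he, spare_he):
    """Run multi-layer NN processing with one spare hardware element (HE).
    On a fault, redo only the failed layer on the spare HE, starting from
    the buffered (correctly computed) output of the previous layer."""
    buffered = x  # data buffer: last correctly computed layer output
    for layer in layers:
        try:
            buffered = primary_he(layer, buffered)
        except RuntimeError:                 # fault detected in current layer
            buffered = spare_he(layer, buffered)
    return buffered
```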
  • Publication number: 20200272417
    Abstract: A computing device that implements a fast floating-point adder tree for neural network applications is disclosed. The fast floating-point adder tree comprises a data preparation module, a fast fixed-point Carry-Save Adder (CSA) tree, and a normalization module. The floating-point input data comprise a sign bit, an exponent part, and a fraction part. The data preparation module aligns the fraction parts of the input data and prepares the input data for subsequent processing. The fast adder uses a signed fixed-point CSA tree to quickly reduce a large number of fixed-point values to 2 output values and then uses a normal adder to add the 2 output values into one output value. The fast adder for a large number of operands is built from multiple levels of fast adders for a small number of operands. The output from the signed fixed-point Carry-Save Adder tree is converted to a selected floating-point format.
    Type: Application
    Filed: February 22, 2020
    Publication date: August 27, 2020
    Inventors: Yutian Feng, Yujie Hu
  • Patent number: 10747631
    Abstract: Embodiments described herein provide a mission-critical artificial intelligence (AI) processor (MAIP), which includes an instruction buffer, processing circuitry, a data buffer, command circuitry, and communication circuitry. During operation, the instruction buffer stores a first hardware instruction and a second hardware instruction. The processing circuitry executes the first hardware instruction, which computes an intermediate stage of an AI model. The data buffer stores data generated from executing the first hardware instruction. The command circuitry determines that the second hardware instruction is a hardware-initiated store instruction for transferring the data from the data buffer. Based on the hardware-initiated store instruction, the communication circuitry transfers the data from the data buffer to a memory device of a computing system, which includes the mission-critical processor, via a communication interface.
    Type: Grant
    Filed: June 5, 2018
    Date of Patent: August 18, 2020
    Assignee: DINOPLUSAI HOLDINGS LIMITED
    Inventors: Yujie Hu, Tong Wu, Xiaosong Wang, Zongwei Zhu, Chung Kuang Chin, Clifford Gold, Steven Sertillange, Yick Kei Wong
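The hardware-initiated store described above can be modeled as a tiny instruction loop. This is a minimal sketch with an assumed instruction encoding (`("compute", operand)` and `("hw_store",)` tuples), not the actual MAIP instruction set: compute results accumulate in the data buffer until a hardware-initiated store drains them to host memory over the communication interface, with no software store issued.

```python
def run_instructions(instructions, execute):
    """Execute a stream of hardware instructions. A hardware-initiated
    store transfers the data buffer to host memory without software help."""
    data_buffer, host_memory = [], []
    for ins in instructions:
        if ins[0] == "compute":          # intermediate stage of the AI model
            data_buffer.append(execute(ins[1]))
        elif ins[0] == "hw_store":       # hardware-initiated store
            host_memory.extend(data_buffer)
            data_buffer.clear()
    return data_buffer, host_memory
```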
  • Publication number: 20190279083
    Abstract: A computing device for fast weighted-sum calculation in neural networks is disclosed. The computing device comprises an array of processing elements configured to accept an input array. Each processing element comprises a plurality of multipliers and multiple levels of accumulators. A set of weights associated with the inputs and a target output are provided to a target processing element to compute the weighted sum for the target output. The device according to the present invention reduces the computation time from M clock cycles to log2(M) clock cycles, where M is the size of the input array.
    Type: Application
    Filed: April 19, 2018
    Publication date: September 12, 2019
    Inventors: Cliff Gold, Tong Wu, Yujie Hu, Chung Kuang Chin, Xiaosong Wang, Yick Kei Wong
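The log2(M) figure above follows from a balanced reduction tree: all M multiplications happen in parallel, then each accumulator level halves the number of partial sums per clock. A minimal sketch (the function name and the power-of-two restriction are assumptions for the illustration):

```python
def weighted_sum_tree(inputs, weights):
    """Weighted sum via a balanced accumulator tree. All M products are
    formed in parallel; each subsequent level (one clock cycle) halves the
    number of partial sums, so M values need log2(M) accumulation cycles."""
    m = len(inputs)
    assert m == len(weights) and m & (m - 1) == 0, "M must be a power of two"
    level = [x * w for x, w in zip(inputs, weights)]  # parallel multipliers
    cycles = 0
    while len(level) > 1:
        # One accumulator level: add neighbors pairwise.
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
        cycles += 1
    return level[0], cycles
```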
  • Publication number: 20190227887
    Abstract: Embodiments described herein provide a mission-critical artificial intelligence (AI) processor (MAIP), which includes an instruction buffer, processing circuitry, a data buffer, command circuitry, and communication circuitry. During operation, the instruction buffer stores a first hardware instruction and a second hardware instruction. The processing circuitry executes the first hardware instruction, which computes an intermediate stage of an AI model. The data buffer stores data generated from executing the first hardware instruction. The command circuitry determines that the second hardware instruction is a hardware-initiated store instruction for transferring the data from the data buffer. Based on the hardware-initiated store instruction, the communication circuitry transfers the data from the data buffer to a memory device of a computing system, which includes the mission-critical processor, via a communication interface.
    Type: Application
    Filed: June 5, 2018
    Publication date: July 25, 2019
    Applicant: DinoplusAI Holdings Limited
    Inventors: Yujie Hu, Tong Wu, Xiaosong Wang, Zongwei Zhu, Chung Kuang Chin, Clifford Gold, Steven Sertillange, Yick Kei Wong
  • Publication number: 20080127162
    Abstract: A cross-platform configuration system manages configuration information for application software. In one embodiment, a process includes, but is not limited to, storing configuration information in a configuration file using a cross-platform markup language, the configuration information including configuration data associated with the operating environment and user data associated with the application, and configuring the application by accessing the configuration file without using a registry of an operating environment in which the application is running.
    Type: Application
    Filed: November 29, 2006
    Publication date: May 29, 2008
    Inventors: Kui Xu, Yujie Hu, Ting Wang
  • Patent number: D1065059
    Type: Grant
    Filed: November 18, 2022
    Date of Patent: March 4, 2025
    Inventor: Yujie Hu