Patents by Inventor Zhou Hong

Zhou Hong has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11941396
    Abstract: The present disclosure provides a DIDT control method. The method includes, at each of a plurality of DIDT control modules: obtaining a local operation load of a local ALU in each clock cycle; obtaining a global operation load of a plurality of ALUs in each cycle period; determining an operation load index of the local ALU based on local historical load information and a local historical load weight set of the local ALU, and on global historical load information and a global historical load weight set of the plurality of ALUs, wherein the global historical load information includes a first number of the global operation loads and the local historical load information includes a second number of the local operation loads; and adjusting an operation load of the local ALU based on the operation load index of the local ALU and a predetermined load threshold to control a DIDT of the local ALU.
    Type: Grant
    Filed: October 3, 2022
    Date of Patent: March 26, 2024
    Assignee: Shanghai Biren Technology Co., Ltd
    Inventors: Zhou Hong, Yunya Fei, Hao Shu, ChengKun Sun
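The following is a minimal sketch of the load-index scheme described in patent 11941396 above. The weight values, history window sizes, and the throttle/full-rate policy are illustrative assumptions, not taken from the claims; the class and method names are hypothetical.

```python
from collections import deque

class DidtController:
    def __init__(self, local_weights, global_weights, load_threshold):
        # One weight per retained history sample (assumed convention:
        # index 0 weights the most recent sample).
        self.local_weights = local_weights
        self.global_weights = global_weights
        self.threshold = load_threshold
        self.local_history = deque(maxlen=len(local_weights))    # "second number"
        self.global_history = deque(maxlen=len(global_weights))  # "first number"

    def observe(self, local_load, global_load):
        self.local_history.appendleft(local_load)
        self.global_history.appendleft(global_load)

    def load_index(self):
        # Weighted sums of the local and global historical loads.
        local_term = sum(w * x for w, x in zip(self.local_weights, self.local_history))
        global_term = sum(w * x for w, x in zip(self.global_weights, self.global_history))
        return local_term + global_term

    def adjust(self):
        # Illustrative policy: throttle the local ALU when the index
        # exceeds the threshold, otherwise run at full rate.
        return "throttle" if self.load_index() > self.threshold else "full_rate"

ctrl = DidtController(local_weights=[0.5, 0.3, 0.2],
                      global_weights=[0.6, 0.4],
                      load_threshold=1.5)
ctrl.observe(local_load=0.9, global_load=0.8)
ctrl.observe(local_load=1.0, global_load=0.9)
print(ctrl.load_index(), ctrl.adjust())
```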
  • Patent number: 11915338
    Abstract: The disclosed technology generally relates to a graphics processing unit (GPU). In one aspect, a GPU includes a general purpose register (GPR) having registers, an arithmetic logic unit (ALU) configured to read pixels of an image independently of a shared memory, and a level 1 (L1) cache storing the pixels read by the ALU. The ALU can implement pixel mapping by fetching a quad of pixels, which includes pixels of first, second, third, and fourth pixel types, from the L1 cache, grouping the pixels of the different pixel types of the quad into four groups based on pixel type, and, for each group, separating the pixels included in the group into three regions that each have a set of pixels. The pixels for each group can then be loaded into the registers corresponding to the three regions.
    Type: Grant
    Filed: May 13, 2021
    Date of Patent: February 27, 2024
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Zhou Hong, Yufei Zhang
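Below is a toy model of the quad-grouping step described in patent 11915338. The pixel representation, the meaning of the four pixel types, and the round-robin three-region split are assumptions made for illustration only.

```python
from collections import defaultdict

def group_by_type(quads):
    """quads: lists of (pixel_type, value) pairs fetched from the L1 cache."""
    groups = defaultdict(list)
    for quad in quads:
        for pixel_type, value in quad:
            groups[pixel_type].append(value)
    return groups

def split_into_regions(pixels, n_regions=3):
    # Illustrative round-robin split of one group into three regions;
    # each region's pixels would be loaded into its own GPR registers.
    return [pixels[i::n_regions] for i in range(n_regions)]

# Six quads, each holding one pixel of each of the four types.
quads = [[(t, 4 * q + t) for t in range(4)] for q in range(6)]
for pixel_type, pixels in group_by_type(quads).items():
    print(pixel_type, split_into_regions(pixels))
```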
  • Patent number: 11908061
    Abstract: Methodologies and architectures are provided for inter-thread sharing of data in a general purpose register (GPR) of a multiprocessor apparatus. The data sharing is performed by a graphics processing unit (GPU) having at least one processing cluster including a plurality of processing cores (PCs) configured for parallel operation. Each PC of a cluster is configured to utilize a dedicated portion of the GPR. The GPU further includes a shared memory for the cluster, and a memory read/write hub coupled to the GPR and shared memory, the hub including a crossbar switch. A PC executes a move data instruction, including operands referencing a destination portion of the GPR and a source portion assigned to the PC, to retrieve data from the source portion. The memory read/write hub writes the data, via the crossbar switch, to the destination portion of the GPR without first writing the data to the shared memory.
    Type: Grant
    Filed: September 1, 2021
    Date of Patent: February 20, 2024
    Assignee: Huawei Technologies Co., Ltd.
    Inventors: Zhou Hong, Yufei Zhang
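A small software analogue of the move-data path in patent 11908061: a value travels between per-core GPR partitions through a crossbar without a shared-memory round trip. Partition sizes and the method names are illustrative assumptions.

```python
class GprCrossbar:
    def __init__(self, num_cores, regs_per_core):
        # One dedicated GPR partition per processing core.
        self.gpr = [[0] * regs_per_core for _ in range(num_cores)]

    def move(self, src_core, src_reg, dst_core, dst_reg):
        # The crossbar routes the value directly between partitions;
        # shared memory is never touched.
        self.gpr[dst_core][dst_reg] = self.gpr[src_core][src_reg]

xbar = GprCrossbar(num_cores=4, regs_per_core=8)
xbar.gpr[0][3] = 42          # value produced by core 0
xbar.move(src_core=0, src_reg=3, dst_core=2, dst_reg=5)
print(xbar.gpr[2][5])        # 42, with no shared-memory round trip
```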
  • Patent number: 11900175
    Abstract: The embodiments of the disclosure relate to a computing device, a computing equipment, and a programmable scheduling method for data loading and execution, in the field of computing. The computing device is coupled to a first computing core and a first memory. The computing device includes a scratchpad memory, a second computing core, a first hardware queue, a second hardware queue, and a synchronization unit. The second computing core is configured for acceleration in a specific field. The first hardware queue receives a load request from the first computing core. The second hardware queue receives an execution request from the first computing core. The synchronization unit is configured to coordinate the triggering of the load request and the execution request. In this manner, flexibility, throughput, and overall performance can be enhanced.
    Type: Grant
    Filed: November 11, 2021
    Date of Patent: February 13, 2024
    Assignee: Shanghai Biren Technology Co., Ltd
    Inventors: Zhou Hong, YuFei Zhang, ChengKun Sun, Lin Chen
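A minimal software model of the two-queue scheme in patent 11900175: load requests and execution requests arrive on separate queues, and a synchronization step releases an execution request only after the load it depends on completes. The token scheme and data structures are assumptions; this is a sketch, not the hardware behavior.

```python
from collections import deque

load_queue, exec_queue = deque(), deque()
completed_loads = set()   # state tracked by the "synchronization unit"
scratchpad = {}

def submit(load_id, data):
    # The first computing core issues a matched load/execute pair.
    load_queue.append((load_id, data))
    exec_queue.append(load_id)

def drain():
    while load_queue or exec_queue:
        if load_queue:                       # service a load into the scratchpad
            load_id, data = load_queue.popleft()
            scratchpad[load_id] = data
            completed_loads.add(load_id)
        if exec_queue and exec_queue[0] in completed_loads:
            load_id = exec_queue.popleft()   # dependency satisfied: execute
            print("executing on", scratchpad[load_id])

submit(0, [1, 2, 3])
submit(1, [4, 5, 6])
drain()
```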
  • Publication number: 20240048475
    Abstract: An information processing method, an interconnection device, and a computer-readable storage medium are provided. The interconnection device includes a request processing module configured for: receiving a data access request from at least one processor, wherein the data access request comprises a merge bit, a multicast group identifier (MGID), and a multicast transaction identifier (MTID); determining whether the data access request is a multicast request; if the data access request is determined to be a multicast request, determining, based on the MGID, the MTID, and a static routing policy of the multicast group, whether the interconnection device receives other multicast requests; and if it is determined that the interconnection device receives the other multicast requests, obtaining the other multicast requests, merging the multicast request with the other multicast requests into a merged request, and forwarding the merged request to a next-hop device of the interconnection device.
    Type: Application
    Filed: October 15, 2023
    Publication date: February 8, 2024
    Applicant: Shanghai Biren Technology Co., Ltd
    Inventors: Qin ZHENG, Zhou HONG, YuFei ZHANG, Lin CHEN, ChengKun SUN, Tong SUN, ChengPing LUO, HaiChuan WANG
  • Patent number: 11855878
    Abstract: An information processing method, an interconnection device, and a computer-readable storage medium are provided. The interconnection device includes a request processing module configured for: receiving a data access request from at least one processor, wherein the data access request comprises a merge bit, a multicast group identifier (MGID), and a multicast transaction identifier (MTID); determining whether the data access request is a multicast request; if the data access request is determined to be a multicast request, determining, based on the MGID, the MTID, and a static routing policy of the multicast group, whether the interconnection device receives other multicast requests; and if it is determined that the interconnection device receives the other multicast requests, obtaining the other multicast requests, merging the multicast request with the other multicast requests into a merged request, and forwarding the merged request to a next-hop device of the interconnection device.
    Type: Grant
    Filed: November 11, 2021
    Date of Patent: December 26, 2023
    Assignee: Shanghai Biren Technology Co., Ltd
    Inventors: Qin Zheng, Zhou Hong, YuFei Zhang, Lin Chen, ChengKun Sun, Tong Sun, ChengPing Luo, HaiChuan Wang
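The sketch below illustrates the request-merging idea shared by publication 20240048475 and patent 11855878 above: multicast requests carrying the same MGID and MTID are held until all expected sibling requests arrive, then merged and forwarded once. The payload field and the expected-count bookkeeping (standing in for the static routing policy) are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Request:
    mgid: int            # multicast group identifier
    mtid: int            # multicast transaction identifier
    merge: bool          # the "merge bit"
    payload: list

pending = {}             # (mgid, mtid) -> requests waiting to be merged

def on_request(req, expected, forward):
    if not req.merge:                 # not a multicast request: pass through
        forward(req)
        return
    key = (req.mgid, req.mtid)
    bucket = pending.setdefault(key, [])
    bucket.append(req)
    if len(bucket) == expected[key]:  # all siblings arrived: merge and forward
        merged = Request(req.mgid, req.mtid, True,
                         [p for r in bucket for p in r.payload])
        del pending[key]
        forward(merged)

expected = {(7, 1): 2}                # policy says two siblings reach this device
on_request(Request(7, 1, True, ["a"]), expected, print)
on_request(Request(7, 1, True, ["b"]), expected, print)
```

Merging at the interconnect keeps a single request on the wire toward the next hop, which is the bandwidth saving the abstract is after.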
  • Patent number: 11809516
    Abstract: The invention relates to an apparatus for vector computing incorporating matrix multiply and accumulation (MMA) calculation. The apparatus includes a streaming multiprocessor (SM), a block selector, and a general matrix multiply (GEMM) calculation unit. The register space is divided into physical blocks, each of which includes register groups. The SM includes a general-purpose register (GPR), and the GEMM calculation unit includes an instruction queue and an arithmetic logic unit (ALU). The ALU, coupled to the GPR, is arranged operably to perform MMA calculation according to a GEMM instruction stored in the instruction queue, and to store a calculation result in the GPR.
    Type: Grant
    Filed: July 2, 2021
    Date of Patent: November 7, 2023
    Assignee: Shanghai Biren Technology Co., Ltd
    Inventors: Zhou Hong, YuFei Zhang
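A numpy sketch of the GEMM flow in patent 11809516: an instruction in the queue names GPR locations for the A, B, and accumulator tiles, and the calculation unit performs the multiply-accumulate and writes the result back to the GPR. The instruction layout and GPR addressing are illustrative assumptions.

```python
import numpy as np
from collections import deque

gpr = {}                                   # register name -> matrix tile
instruction_queue = deque()

def issue_gemm(a_reg, b_reg, c_reg):
    instruction_queue.append(("GEMM", a_reg, b_reg, c_reg))

def step():
    op, a_reg, b_reg, c_reg = instruction_queue.popleft()
    assert op == "GEMM"
    # C += A @ B, operands read from and the result written back to the GPR.
    gpr[c_reg] = gpr[c_reg] + gpr[a_reg] @ gpr[b_reg]

gpr["r0"] = np.ones((4, 4))
gpr["r1"] = np.eye(4)
gpr["r2"] = np.zeros((4, 4))
issue_gemm("r0", "r1", "r2")
step()
print(gpr["r2"])
```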
  • Patent number: 11809221
    Abstract: An artificial intelligence chip and a data operation method are provided. The artificial intelligence chip receives a command carrying first data and address information and includes a chip memory, a computing processor, a base address register, and an extended address processor. The base address register is configured to access an extended address space in the chip memory. The extended address processor receives the command. The extended address processor determines an operation mode of the first data according to the address information. When the address information points to a first section of the extended address space, the extended address processor performs a first operation on the first data. When the address information points to a section other than the first section of the extended address space, the extended address processor notifies the computing processor of the operation mode and the computing processor performs a second operation on the first data.
    Type: Grant
    Filed: September 8, 2021
    Date of Patent: November 7, 2023
    Assignee: Shanghai Biren Technology Co., Ltd
    Inventors: Zhou Hong, Qin Zheng, ChengPing Luo, GuoFang Jiao, Song Zhao, XiangLiang Yu
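A sketch of the address-based dispatch in patent 11809221: writes landing in a designated first section of the extended address space are handled by the extended address processor itself, and anything else is handed to the computing processor. The section boundary and the two placeholder operations are assumptions.

```python
FIRST_SECTION = range(0x0000, 0x1000)      # assumed first-section bounds

chip_memory = {}

def computing_processor(addr, data):
    # Placeholder "second operation", performed by the computing processor.
    chip_memory[addr] = data * 2

def extended_address_processor(addr, data):
    if addr in FIRST_SECTION:
        chip_memory[addr] = data           # "first operation", handled locally
    else:
        computing_processor(addr, data)    # notify the computing processor

extended_address_processor(0x0010, 7)
extended_address_processor(0x2000, 7)
print(chip_memory)                         # {16: 7, 8192: 14}
```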
  • Patent number: 11811512
    Abstract: A multicast routing method and an interconnection device for a mesh network system, a mesh network system, and a configuration method thereof are provided. The method includes, at each internal interconnection device among multiple interconnection devices of each processing subsystem (an internal interconnection device being one with no link connected to an external processing subsystem): in response to receiving a multicast access request to a destination memory, determining the shortest paths from the internal interconnection device to the destination memory based on a topology structure of the mesh network system; in response to determining that the number of shortest paths is equal to one, routing the multicast access request to the destination memory along that shortest path; and in response to determining that the number of shortest paths is greater than one, determining a next-hop interconnection device for the multicast access request based on a second static routing policy.
    Type: Grant
    Filed: July 5, 2022
    Date of Patent: November 7, 2023
    Assignee: Shanghai Biren Technology Co., Ltd
    Inventors: Zhou Hong, Qin Zheng, Yuzhe Li
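On a 2D mesh, the routing rule in patent 11811512 can be pictured as below: when exactly one shortest direction remains, take it; when several shortest paths exist, break the tie with a static policy. The X-dimension-first tie-break is an assumption standing in for the patent's unspecified second static routing policy.

```python
def next_hop(current, destination):
    """current, destination: (x, y) mesh coordinates of interconnect nodes."""
    cx, cy = current
    dx, dy = destination
    moves = []
    if cx != dx:
        moves.append((cx + (1 if dx > cx else -1), cy))
    if cy != dy:
        moves.append((cx, cy + (1 if dy > cy else -1)))
    if len(moves) == 1:        # exactly one shortest direction: take it
        return moves[0]
    # More than one shortest path: static tie-break, assumed X-first.
    return moves[0]

hop = (0, 0)
while hop != (2, 2):
    hop = next_hop(hop, (2, 2))
    print(hop)                 # (1,0) (2,0) (2,1) (2,2): a deterministic route
```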
  • Patent number: 11748077
    Abstract: The invention relates to a method for compiling code adapted for secondary offloads in a graphics processing unit (GPU). The method, performed by a processing unit, includes: reconstructing execution codes in a first kernel into a second kernel. The second kernel includes an operation table including entries, and computation codes. The computation codes include a portion of the execution codes, and synchronization hooks, and each synchronization hook includes information indicating one entry of the operation table. An order of the portion of the execution codes and the synchronization hooks in the computation codes matches an order of the execution codes in the first kernel, thereby enabling a compute unit (CU) in the GPU to execute the computation codes, and an engine in the GPU to instruct a component inside or outside of the GPU to complete a designated operation in accordance with content of each entry in the operation table.
    Type: Grant
    Filed: July 2, 2021
    Date of Patent: September 5, 2023
    Assignee: Shanghai Biren Technology Co., Ltd
    Inventors: HaiChuan Wang, Song Zhao, GuoFang Jiao, ChengPing Luo, Zhou Hong
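A toy version of the kernel reconstruction in patent 11748077: offloadable operations are lifted out of the instruction stream into an operation table, and a synchronization hook pointing at the corresponding entry is left in their place, preserving the original order. The instruction format and the is_offloadable predicate are hypothetical.

```python
def reconstruct(execution_codes, is_offloadable):
    operation_table = []
    computation_codes = []
    for code in execution_codes:
        if is_offloadable(code):
            operation_table.append(code)                 # entry for the engine
            computation_codes.append(("sync_hook", len(operation_table) - 1))
        else:
            computation_codes.append(code)               # stays on the CU
    return computation_codes, operation_table

codes = ["mul", "dma_copy", "add", "reduce_offchip"]
comp, table = reconstruct(codes, lambda c: c in ("dma_copy", "reduce_offchip"))
print(comp)    # ['mul', ('sync_hook', 0), 'add', ('sync_hook', 1)]
print(table)   # ['dma_copy', 'reduce_offchip']
```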
  • Publication number: 20230267011
    Abstract: The invention relates to an apparatus for secondary offloads in a graphics processing unit (GPU). The apparatus includes an engine and a compute unit (CU). The CU is arranged operably to: fetch execution codes; when an execution code is suitable to be executed by the CU, execute it; and when an execution code is not suitable to be executed by the CU, generate a corresponding entry and send a request with the corresponding entry to the engine, instructing the engine to allow a component inside or outside of the GPU to complete an operation in accordance with the corresponding entry.
    Type: Application
    Filed: April 21, 2023
    Publication date: August 24, 2023
    Applicant: Shanghai Biren Technology Co., Ltd
    Inventors: HaiChuan WANG, Song ZHAO, GuoFang JIAO, ChengPing LUO, Zhou HONG
  • Patent number: 11669327
    Abstract: The embodiments of the disclosure relate to a computing device and a method for loading data. According to the method, the first processing unit sends a first instruction to the near-memory processing (NMP) unit. The first instruction includes a first address, a plurality of second addresses, and an operation type. In response to the first instruction, the NMP unit performs operations associated with the operation type on multiple data items at the multiple second addresses of the first memory, so as to generate the operation result. The NMP unit stores the operation result at the first address of the first memory. The first processing unit issues a flush instruction to make the operation result at the first address visible to the first processing unit. The first processing unit then issues a read instruction to read the operation result at the first address into the first processing unit.
    Type: Grant
    Filed: November 10, 2021
    Date of Patent: June 6, 2023
    Assignee: Shanghai Biren Technology Co., Ltd
    Inventors: Zhou Hong, YuFei Zhang
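A sketch of the flow in patent 11669327, reading NMP as near-memory processing (the abstract does not expand the acronym, so that reading is an assumption). The instruction fields follow the abstract; the operation table and the flush model are illustrative.

```python
first_memory = {0x10: 3, 0x14: 4, 0x18: 5}     # data items at the second addresses

OPS = {"sum": sum, "max": max}                 # assumed supported operation types

def nmp_execute(first_addr, second_addrs, op_type):
    # The NMP unit reads the data items and writes the result near memory.
    result = OPS[op_type](first_memory[a] for a in second_addrs)
    first_memory[first_addr] = result

cpu_cache = {}

def flush_then_read(addr):
    # The flush makes the NMP-written value visible; the read then pulls it
    # into the first processing unit.
    cpu_cache.pop(addr, None)                  # drop any stale cached copy
    cpu_cache[addr] = first_memory[addr]
    return cpu_cache[addr]

nmp_execute(first_addr=0x20, second_addrs=[0x10, 0x14, 0x18], op_type="sum")
print(flush_then_read(0x20))                   # 12
```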
  • Patent number: 11663044
    Abstract: The invention relates to an apparatus for secondary offloads in a graphics processing unit (GPU). The apparatus includes an engine and a compute unit (CU). The engine is arranged operably to store an operation table including entries. The CU is arranged operably to: fetch computation codes including execution codes and synchronization requests; execute each execution code; and send requests to the engine in accordance with the synchronization requests, instructing the engine to allow components inside or outside of the GPU to complete operations in accordance with the entries of the operation table.
    Type: Grant
    Filed: July 2, 2021
    Date of Patent: May 30, 2023
    Assignee: Shanghai Biren Technology Co., Ltd
    Inventors: HaiChuan Wang, Song Zhao, GuoFang Jiao, ChengPing Luo, Zhou Hong
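A runtime companion to the compile-time sketch under patent 11748077 above, following patent 11663044: the CU executes plain codes itself and, on a synchronization request, asks the engine to complete the operation recorded in the operation table. The handlers and code format are assumptions.

```python
def engine(entry):
    print("engine completing:", entry)          # e.g. a DMA or off-chip operation

def compute_unit(computation_codes, operation_table):
    for code in computation_codes:
        if isinstance(code, tuple) and code[0] == "sync_hook":
            engine(operation_table[code[1]])     # hand off, then continue
        else:
            print("CU executing:", code)

compute_unit(["mul", ("sync_hook", 0), "add"], ["dma_copy"])
```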
  • Publication number: 20230130460
    Abstract: The present disclosure provides a chipset and a manufacturing method thereof. The chipset includes a logic chip, an input/output chip, and an interposer. The logic chip includes a plurality of first bonding components disposed in the first device layer. The input/output chip includes a plurality of second bonding components disposed in the second device layer. The interposer includes a plurality of third bonding components disposed in the third device layer. The logic chip is directly bonded to the first portion of the plurality of third bonding components of the interposer in a pad-to-pad manner through the first portion of the plurality of first bonding components, and the input/output chip is directly bonded to the second portion of the plurality of third bonding components of the interposer in a pad-to-pad manner through the plurality of second bonding components.
    Type: Application
    Filed: October 3, 2022
    Publication date: April 27, 2023
    Applicant: Shanghai Biren Technology Co., Ltd
    Inventors: Shiqun GU, Zhou HONG, Linglan ZHANG, Zheng TIAN, Hongying ZHANG, Peng LIU
  • Publication number: 20230125700
    Abstract: The embodiments of the disclosure relate to a data processing method and a computing system. For each die: a first reduction engine is determined from among multiple reduction engines corresponding to multiple computing cores included in the current die; each computing core sends data to be reduced and a synchronization indicator to the first reduction engines in multiple dies; in response to receiving the data to be reduced and the synchronization indicators from the computing cores in multiple dies, the first reduction engine in the current die performs a reduction operation on the data to be reduced to generate a reduction computing result, and sends synchronization acknowledgments to the computing cores in the current die; and in response to receiving the synchronization acknowledgment, each computing core in the current die reads the reduction computing result from the first reduction engine in the current die.
    Type: Application
    Filed: October 19, 2022
    Publication date: April 27, 2023
    Applicant: Shanghai Biren Technology Co., Ltd
    Inventors: Zhou HONG, Lingjie XU, Chengkun SUN, Hao SHU, Lin CHEN, Wei LIANG, Chao MENG
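A single-process sketch of the per-die reduction flow in publication 20230125700: every core sends its contribution to the designated first reduction engine of each die; once a die's engine has heard from all cores, it reduces, and local cores read that die's copy of the result. The reduction operator (sum) and the bookkeeping are assumptions.

```python
class ReductionEngine:
    def __init__(self, expected_cores):
        self.expected = expected_cores
        self.received = {}
        self.result = None

    def send(self, core_id, data):
        self.received[core_id] = data
        if len(self.received) == self.expected:        # synchronization satisfied
            self.result = sum(self.received.values())  # the reduction operation

dies = [ReductionEngine(expected_cores=4) for _ in range(2)]
for core_id in range(4):                # each core broadcasts to every die
    for engine in dies:
        engine.send(core_id, data=core_id + 1)
for die_id, engine in enumerate(dies):  # local cores read their own die's copy
    print("die", die_id, "result:", engine.result)     # 10 on both dies
```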
  • Publication number: 20230131810
    Abstract: The present disclosure provides a DIDT control method. The method includes, at each of a plurality of DIDT control modules: obtaining a local operation load of a local ALU in each clock cycle; obtaining a global operation load of a plurality of ALUs in each cycle period; determining an operation load index of the local ALU based on local historical load information and a local historical load weight set of the local ALU, and on global historical load information and a global historical load weight set of the plurality of ALUs, wherein the global historical load information includes a first number of the global operation loads and the local historical load information includes a second number of the local operation loads; and adjusting an operation load of the local ALU based on the operation load index of the local ALU and a predetermined load threshold to control a DIDT of the local ALU.
    Type: Application
    Filed: October 3, 2022
    Publication date: April 27, 2023
    Applicant: Shanghai Biren Technology Co., Ltd
    Inventors: Zhou HONG, Yunya FEI, Hao SHU, ChengKun SUN
  • Publication number: 20230117626
    Abstract: A convolution apparatus including a data memory, a matrix unknit-knit device, and a convolution operation device, a convolution method, a matrix unknit-knit device, and a matrix unknit-knit method are provided. The matrix unknit-knit device unknits a first matrix stored in the data memory into s*s second matrices (or knits the s*s second matrices into the first matrix), where s is greater than 1. Pixels in each of the s*s subblocks in the first matrix serve one-to-one as pixels of the s*s second matrices. The convolution operation device unknits a convolution kernel of a convolution operation with a stride of s into s*s sub-kernels, uses each of the sub-kernels to perform a convolution operation with a stride of 1 on its corresponding second matrix, and accumulates the operation results of the second matrices as the operation result of performing the convolution operation with a stride of s on the first matrix.
    Type: Application
    Filed: October 3, 2022
    Publication date: April 20, 2023
    Applicant: Shanghai Biren Technology Co., Ltd
    Inventors: Hao SHU, Zhou HONG, Lin CHEN, Tong SUN, Zhu LIANG, ChengKun SUN
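The decomposition in publication 20230117626 can be checked numerically: a stride-s convolution equals the sum of the s*s stride-1 convolutions between the s*s subsampled ("unknitted") input matrices and the matching subsampled sub-kernels. Valid-mode cross-correlation is assumed as the convolution convention.

```python
import numpy as np

def corr2d(x, k, stride=1):
    # Plain valid-mode 2D cross-correlation with the given stride.
    oh = (x.shape[0] - k.shape[0]) // stride + 1
    ow = (x.shape[1] - k.shape[1]) // stride + 1
    return np.array([[np.sum(x[i*stride:i*stride+k.shape[0],
                               j*stride:j*stride+k.shape[1]] * k)
                      for j in range(ow)] for i in range(oh)])

rng = np.random.default_rng(0)
s = 2
x = rng.standard_normal((8, 8))
k = rng.standard_normal((4, 4))

direct = corr2d(x, k, stride=s)
# Unknit: x[a::s, b::s] is the (a, b) subsampled matrix, k[a::s, b::s]
# the matching sub-kernel; sum the s*s stride-1 results.
decomposed = sum(corr2d(x[a::s, b::s], k[a::s, b::s], stride=1)
                 for a in range(s) for b in range(s))
print(np.allclose(direct, decomposed))   # True
```

The identity holds because every kernel tap (u, v) is covered exactly once by the sub-kernel indexed (u mod s, v mod s).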
  • Publication number: 20230027355
    Abstract: A multicast routing method and an interconnection device for a mesh network system, a mesh network system, and a configuration method thereof are provided. The method includes, at each internal interconnection device among multiple interconnection devices of each processing subsystem (an internal interconnection device being one with no link connected to an external processing subsystem): in response to receiving a multicast access request to a destination memory, determining the shortest paths from the internal interconnection device to the destination memory based on a topology structure of the mesh network system; in response to determining that the number of shortest paths is equal to one, routing the multicast access request to the destination memory along that shortest path; and in response to determining that the number of shortest paths is greater than one, determining a next-hop interconnection device for the multicast access request based on a second static routing policy.
    Type: Application
    Filed: July 5, 2022
    Publication date: January 26, 2023
    Applicant: Shanghai Biren Technology Co., Ltd
    Inventors: Zhou HONG, Qin ZHENG, Yuzhe LI
  • Publication number: 20220398102
    Abstract: An artificial intelligence chip and a data operation method are provided. The artificial intelligence chip receives a command carrying first data and address information and includes a chip memory, a computing processor, a base address register, and an extended address processor. The base address register is configured to access an extended address space in the chip memory. The extended address processor receives the command. The extended address processor determines an operation mode of the first data according to the address information. When the address information points to a first section of the extended address space, the extended address processor performs a first operation on the first data. When the address information points to a section other than the first section of the extended address space, the extended address processor notifies the computing processor of the operation mode and the computing processor performs a second operation on the first data.
    Type: Application
    Filed: September 8, 2021
    Publication date: December 15, 2022
    Applicant: Shanghai Biren Technology Co., Ltd
    Inventors: Zhou HONG, Qin ZHENG, ChengPing LUO, GuoFang JIAO, Song ZHAO, XiangLiang YU
  • Patent number: D981620
    Type: Grant
    Filed: July 15, 2022
    Date of Patent: March 21, 2023
    Inventor: Zhou Hong