Patents by Inventor Zhou Hong
Zhou Hong has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11941396
Abstract: The present disclosure provides a DIDT control method. The method includes, at each of a plurality of DIDT control modules: obtaining a local operation load of a local ALU in each clock cycle; obtaining a global operation load of a plurality of ALUs in each cycle period; determining an operation load index of the local ALU based on local historical load information and a local historical load weight set of the local ALU and global historical load information and a global historical load weight set of the multiple ALUs, the global historical load information includes a first number of the global operation loads, the local historical load information includes a second number of the local operation loads; and adjusting an operation load of the local ALU based on the operation load index of the local ALU and a predetermined load threshold to control a DIDT of the local ALU.
Type: Grant
Filed: October 3, 2022
Date of Patent: March 26, 2024
Assignee: Shanghai Biren Technology Co., Ltd
Inventors: Zhou Hong, Yunya Fei, Hao Shu, ChengKun Sun
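The load-index computation this abstract describes can be read as a weighted combination of recent local and global operation loads, compared against a threshold to throttle the ALU. A minimal Python sketch follows; the function names, the linear weighting, and the step-down throttling rule are illustrative assumptions, not the patented implementation:

```python
def load_index(local_hist, local_weights, global_hist, global_weights):
    """Operation load index: weighted sum of the local ALU's recent
    loads plus a weighted sum of the recent global (all-ALU) loads."""
    assert len(local_hist) == len(local_weights)
    assert len(global_hist) == len(global_weights)
    local_term = sum(l * w for l, w in zip(local_hist, local_weights))
    global_term = sum(g * w for g, w in zip(global_hist, global_weights))
    return local_term + global_term

def adjust_load(current_load, index, threshold, step=1):
    """Throttle the local ALU when the index exceeds the threshold,
    limiting the rate of current change (di/dt)."""
    if index > threshold:
        return max(0, current_load - step)
    return current_load
```

The point of mixing local and global history is that a single ALU ramping up while the whole chip is already heavily loaded is what creates the worst di/dt transients.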
-
Patent number: 11915338
Abstract: The disclosed technology generally relates to a graphics processing unit (GPU). In one aspect, a GPU includes a general purpose register (GPR) having registers, an arithmetic logic unit (ALU) configured to read pixels of an image independently of a shared memory, and a level 1 (L1) cache storing the pixels read by the ALU. The ALU can implement pixel mapping by fetching a quad of pixels, which includes pixels of first, second, third, and fourth pixel types, from the L1 cache, grouping the pixels of the different pixel types of the quad into four groups based on pixel type, and, for each group, separating the pixels included in the group into three regions that each have a set of pixels. The pixels for each group can then be loaded into the registers corresponding to the three regions.
Type: Grant
Filed: May 13, 2021
Date of Patent: February 27, 2024
Assignee: Huawei Technologies Co., Ltd.
Inventors: Zhou Hong, Yufei Zhang
-
Patent number: 11908061
Abstract: Methodologies and architectures are provided for inter-thread sharing of data in a general purpose register (GPR) of a multiprocessor apparatus. The data sharing is performed by a graphics processing unit (GPU) having at least one processing cluster including a plurality of processing cores (PCs) configured for parallel operation. Each PC of a cluster is configured to utilize a dedicated portion of the GPR. The GPU further includes a shared memory for the cluster, and a memory read/write hub coupled to the GPR and shared memory, the hub including a crossbar switch. A PC executes a move data instruction, including operands referencing a destination portion of the GPR and a source portion assigned to the PC, to retrieve data from the source portion. The memory read/write hub writes the data, via the crossbar switch, to the destination portion of the GPR without first writing the data to the shared memory.
Type: Grant
Filed: September 1, 2021
Date of Patent: February 20, 2024
Assignee: Huawei Technologies Co., Ltd.
Inventors: Zhou Hong, Yufei Zhang
-
Patent number: 11900175
Abstract: The embodiments of the disclosure relate to a computing device, a computing equipment, and a programmable scheduling method for data loading and execution, and relate to the field of computing. The computing device is coupled to a first computing core and a first memory. The computing device includes a scratchpad memory, a second computing core, a first hardware queue, a second hardware queue, and a synchronization unit. The second computing core is configured for acceleration in a specific field. The first hardware queue receives a load request from the first computing core. The second hardware queue receives an execution request from the first computing core. The synchronization unit is configured to coordinate the triggering of the load request and the execution request. In this manner, flexibility, throughput, and overall performance can be enhanced.
Type: Grant
Filed: November 11, 2021
Date of Patent: February 13, 2024
Assignee: Shanghai Biren Technology Co., Ltd
Inventors: Zhou Hong, YuFei Zhang, ChengKun Sun, Lin Chen
-
Publication number: 20240048475
Abstract: An information processing method, an interconnection device, and a computer-readable storage medium are provided. The interconnection device includes a request processing module configured for: receiving a data access request from at least one processor, wherein the data access request comprises a merge bit, a multicast group identifier (MGID), and a multicast transaction identifier (MTID); determining whether the data access request is a multicast request; determining whether the interconnection device receives other multicast requests if it is determined that the data access request is a multicast request based on the MGID, the MTID, and a static routing policy of a multicast group; and obtaining the other multicast requests if it is determined that the interconnection device receives the other multicast requests, merging the multicast request with the other multicast requests into a merged request, and forwarding the merged request to a next-hop device of the interconnection device.
Type: Application
Filed: October 15, 2023
Publication date: February 8, 2024
Applicant: Shanghai Biren Technology Co., Ltd
Inventors: Qin ZHENG, Zhou HONG, YuFei ZHANG, Lin CHEN, ChengKun SUN, Tong SUN, ChengPing LUO, HaiChuan WANG
-
Patent number: 11855878
Abstract: An information processing method, an interconnection device, and a computer-readable storage medium are provided. The interconnection device includes a request processing module configured for: receiving a data access request from at least one processor, wherein the data access request comprises a merge bit, a multicast group identifier (MGID), and a multicast transaction identifier (MTID); determining whether the data access request is a multicast request; determining whether the interconnection device receives other multicast requests if it is determined that the data access request is a multicast request based on the MGID, the MTID, and a static routing policy of a multicast group; and obtaining the other multicast requests if it is determined that the interconnection device receives the other multicast requests, merging the multicast request with the other multicast requests into a merged request, and forwarding the merged request to a next-hop device of the interconnection device.
Type: Grant
Filed: November 11, 2021
Date of Patent: December 26, 2023
Assignee: Shanghai Biren Technology Co., Ltd
Inventors: Qin Zheng, Zhou Hong, YuFei Zhang, Lin Chen, ChengKun Sun, Tong Sun, ChengPing Luo, HaiChuan Wang
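The merging step in this abstract keys requests on the (MGID, MTID) pair: multicast requests carrying the same group and transaction identifiers are combined into one request before being forwarded to the next hop. A hypothetical Python sketch, where the record fields and the source-list merge rule are assumptions for illustration rather than the device's actual request format:

```python
from dataclasses import dataclass

@dataclass
class AccessRequest:
    mgid: int          # multicast group identifier
    mtid: int          # multicast transaction identifier
    merge_bit: bool    # request is eligible for merging
    sources: tuple     # processors that issued the request

def merge_requests(requests):
    """Merge all mergeable requests sharing the same (MGID, MTID) key
    into a single request; non-matching requests pass through as-is."""
    merged = {}
    for req in requests:
        key = (req.mgid, req.mtid)
        if req.merge_bit and key in merged:
            prev = merged[key]
            merged[key] = AccessRequest(req.mgid, req.mtid, True,
                                        prev.sources + req.sources)
        else:
            merged[key] = req
    return list(merged.values())
```

Merging at each interconnection hop means the destination memory sees one access per multicast transaction instead of one per requesting processor.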
-
Patent number: 11809516
Abstract: The invention relates to an apparatus for vector computing incorporating matrix multiply and accumulation (MMA) calculation. The apparatus includes a streaming multiprocessor (SM), a block selector, and a general matrix multiply (GEMM) calculation unit. The register space is divided into physical blocks, each of which includes register groups. The SM includes a general-purpose register (GPR), and the GEMM calculation unit includes an instruction queue and an arithmetic logic unit (ALU). The ALU coupled to the GPR is arranged operably to perform MMA calculation according to a GEMM instruction stored in the instruction queue, and store a calculation result in the GPR.
Type: Grant
Filed: July 2, 2021
Date of Patent: November 7, 2023
Assignee: Shanghai Biren Technology Co., Ltd
Inventors: Zhou Hong, YuFei Zhang
-
Patent number: 11809221
Abstract: An artificial intelligence chip and a data operation method are provided. The artificial intelligence chip receives a command carrying first data and address information and includes a chip memory, a computing processor, a base address register, and an extended address processor. The base address register is configured to access an extended address space in the chip memory. The extended address processor receives the command. The extended address processor determines an operation mode of the first data according to the address information. When the address information points to a first section of the extended address space, the extended address processor performs a first operation on the first data. When the address information points to a section other than the first section of the extended address space, the extended address processor notifies the computing processor of the operation mode and the computing processor performs a second operation on the first data.
Type: Grant
Filed: September 8, 2021
Date of Patent: November 7, 2023
Assignee: Shanghai Biren Technology Co., Ltd
Inventors: Zhou Hong, Qin Zheng, ChengPing Luo, GuoFang Jiao, Song Zhao, XiangLiang Yu
-
Patent number: 11811512
Abstract: A multicast routing method and an interconnection device for a mesh network system, a mesh network system and a configuration method thereof are provided. The method includes, at each internal interconnection device among multiple interconnection devices of each processing subsystem: in response to receiving a multicast access request to a destination memory, determining a shortest path from each internal interconnection device to the destination memory based on a topology structure of the mesh network system, where the internal interconnection device has no link connected to an external processing subsystem; in response to determining that the number of shortest paths is equal to one, routing the multicast access request to the destination memory along the shortest path; and in response to determining that the number of shortest paths is greater than one, determining a next-hop interconnection device for the multicast access request based on a second static routing policy.
Type: Grant
Filed: July 5, 2022
Date of Patent: November 7, 2023
Assignee: Shanghai Biren Technology Co., Ltd
Inventors: Zhou Hong, Qin Zheng, Yuzhe Li
-
Patent number: 11748077
Abstract: The invention relates to a method for compiling code adapted for secondary offloads in a graphics processing unit (GPU). The method, performed by a processing unit, includes: reconstructing execution codes in a first kernel into a second kernel. The second kernel includes an operation table including entries, and computation codes. The computation codes include a portion of the execution codes, and synchronization hooks, and each synchronization hook includes information indicating one entry of the operation table. An order of the portion of the execution codes and the synchronization hooks in the computation codes matches an order of the execution codes in the first kernel, thereby enabling a compute unit (CU) in the GPU to execute the computation codes, and an engine in the GPU to instruct a component inside or outside of the GPU to complete a designated operation in accordance with content of each entry in the operation table.
Type: Grant
Filed: July 2, 2021
Date of Patent: September 5, 2023
Assignee: Shanghai Biren Technology Co., Ltd
Inventors: HaiChuan Wang, Song Zhao, GuoFang Jiao, ChengPing Luo, Zhou Hong
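The reconstruction this abstract describes can be sketched as a single compile-time pass: codes the compute unit can execute stay inline, while offloadable codes become operation-table entries, each replaced in place by a synchronization hook that indexes its entry, so the original order is preserved. The predicate, tuple encoding, and all names below are illustrative assumptions, not the compiler's actual representation:

```python
def reconstruct_kernel(execution_codes, offloadable):
    """Split a first kernel's execution codes into computation codes
    plus an operation table. CU-executable codes stay inline; codes
    destined for the engine become table entries referenced by
    synchronization hooks, in the original program order."""
    op_table = []
    computation_codes = []
    for code in execution_codes:
        if offloadable(code):
            op_table.append(code)                            # engine-side entry
            computation_codes.append(("sync_hook", len(op_table) - 1))
        else:
            computation_codes.append(("exec", code))         # CU-side code
    return computation_codes, op_table
```

Because each hook carries only an index into the table, the CU can run the computation codes while the engine independently dispatches the table entries to components inside or outside the GPU.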
-
Publication number: 20230267011
Abstract: The invention relates to an apparatus for secondary offloads in a graphics processing unit (GPU). The apparatus includes an engine; and a compute unit (CU). The CU is arranged operably to: fetch execution codes; when each execution code is suitable to be executed by the CU, execute the execution code; and when each execution code is not suitable to be executed by the CU, generate a corresponding entry, and send a request with the corresponding entry to the engine for instructing the engine to allow a component inside or outside of the GPU to complete an operation in accordance with the corresponding entry.
Type: Application
Filed: April 21, 2023
Publication date: August 24, 2023
Applicant: Shanghai Biren Technology Co., Ltd
Inventors: HaiChuan WANG, Song ZHAO, GuoFang JIAO, ChengPing LUO, Zhou HONG
-
Patent number: 11669327
Abstract: The embodiments of the disclosure relate to a computing device and a method for loading data. According to the method, the first processing unit sends a first instruction to the NMP unit. The first instruction includes a first address, a plurality of second addresses, and an operation type. In response to the first instruction, the NMP unit performs operations associated with the operation type on multiple data items on the multiple second addresses of the first memory, so as to generate the operation result. The NMP unit stores the operation result to the first address of the first memory. The first processing unit issues a flush instruction to make the operation result on the first address visible to the first processing unit. The first processing unit issues a read instruction to read the operation result on the first address to the first processing unit.
Type: Grant
Filed: November 10, 2021
Date of Patent: June 6, 2023
Assignee: Shanghai Biren Technology Co., Ltd
Inventors: Zhou Hong, YuFei Zhang
-
Patent number: 11663044
Abstract: The invention relates to an apparatus for secondary offloads in a graphics processing unit (GPU). The apparatus includes an engine; and a compute unit (CU). The engine is arranged operably to store an operation table including entries. The CU is arranged operably to fetch computation codes including execution codes, and synchronization requests; execute each execution code; and send requests to the engine in accordance with the synchronization requests for instructing the engine to allow components inside or outside of the GPU to complete operations in accordance with the entries of the operation table.
Type: Grant
Filed: July 2, 2021
Date of Patent: May 30, 2023
Assignee: Shanghai Biren Technology Co., Ltd
Inventors: HaiChuan Wang, Song Zhao, GuoFang Jiao, ChengPing Luo, Zhou Hong
-
Publication number: 20230130460
Abstract: The present disclosure provides a chipset and a manufacturing method thereof. The chipset includes a logic chip, an input/output chip, and an interposer. The logic chip includes a plurality of first bonding components disposed in the first device layer. The input/output chip includes a plurality of second bonding components disposed in the second device layer. The interposer includes a plurality of third bonding components disposed in the third device layer. The logic chip is directly bonded to the first portion of the plurality of third bonding components of the interposer in a pad-to-pad manner through the first portion of the plurality of first bonding components, and the input/output chip is directly bonded to the second portion of the plurality of third bonding components of the interposer in a pad-to-pad manner through the plurality of second bonding components.
Type: Application
Filed: October 3, 2022
Publication date: April 27, 2023
Applicant: Shanghai Biren Technology Co., Ltd
Inventors: Shiqun GU, Zhou HONG, Linglan ZHANG, Zheng TIAN, Hongying ZHANG, Peng LIU
-
Publication number: 20230125700
Abstract: The embodiments of the disclosure relate to a data processing method and a computing system. For each die: a first reduction engine of multiple reduction engines corresponding to multiple computing cores included in a current die is determined; each computing core sends data to be reduced and a synchronization indicator to the first reduction engines in multiple dies; in response to receiving the data to be reduced and the synchronization indicators from the computing cores in multiple dies, the first reduction engine in the current die performs a reduction operation on the data to be reduced to generate a reduction computing result, and sends synchronization acknowledgments to the computing cores in the current die; and in response to receiving the synchronization acknowledgment, each computing core in the current die reads the reduction computing result from the first reduction engine in the current die.
Type: Application
Filed: October 19, 2022
Publication date: April 27, 2023
Applicant: Shanghai Biren Technology Co., Ltd
Inventors: Zhou HONG, Lingjie XU, Chengkun SUN, Hao SHU, Lin CHEN, Wei LIANG, Chao MENG
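The per-die flow in this abstract follows a collect-reduce-acknowledge pattern: the reduction engine waits until every computing core has submitted its contribution and synchronization indicator, then produces one reduced result and acknowledges the cores, which only then read the result. A toy single-die model in Python; the class shape, the use of sum as the reduction operation, and the None-until-acknowledged read are illustrative assumptions:

```python
class ReductionEngine:
    """Toy model of one die's first reduction engine: collects one
    contribution per computing core, reduces once all have arrived,
    then acknowledges the cores so they may read the result."""
    def __init__(self, num_cores):
        self.num_cores = num_cores
        self.pending = []        # (core_id, data) pairs received so far
        self.result = None
        self.acked = set()       # cores that received a sync acknowledgment

    def submit(self, core_id, data):
        """A core sends its data to be reduced plus a sync indicator."""
        self.pending.append((core_id, data))
        if len(self.pending) == self.num_cores:
            self.result = sum(d for _, d in self.pending)   # reduction op
            self.acked = {c for c, _ in self.pending}       # sync acks

    def read(self, core_id):
        """A core reads the result only after being acknowledged."""
        return self.result if core_id in self.acked else None
```

Centralizing the reduction in one engine per die keeps the cross-die traffic to a single exchange between engines rather than all-to-all traffic between cores.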
-
Publication number: 20230131810
Abstract: The present disclosure provides a DIDT control method. The method includes, at each of a plurality of DIDT control modules: obtaining a local operation load of a local ALU in each clock cycle; obtaining a global operation load of a plurality of ALUs in each cycle period; determining an operation load index of the local ALU based on local historical load information and a local historical load weight set of the local ALU and global historical load information and a global historical load weight set of the multiple ALUs, the global historical load information includes a first number of the global operation loads, the local historical load information includes a second number of the local operation loads; and adjusting an operation load of the local ALU based on the operation load index of the local ALU and a predetermined load threshold to control a DIDT of the local ALU.
Type: Application
Filed: October 3, 2022
Publication date: April 27, 2023
Applicant: Shanghai Biren Technology Co., Ltd
Inventors: Zhou HONG, Yunya FEI, Hao SHU, ChengKun SUN
-
Publication number: 20230117626
Abstract: A convolution apparatus including a data memory, a matrix unknit-knit device, and a convolution operation device, a convolution method, a matrix unknit-knit device, and a matrix unknit-knit method are provided. The matrix unknit-knit device unknits a first matrix stored in the data memory into s*s second matrices (or knits the s*s second matrices into the first matrix), where s is greater than 1. Pixels in each of s*s subblocks in the first matrix serve one-to-one as pixels of the s*s second matrices. The convolution operation device unknits a convolution kernel of a convolution operation with a stride of s into s*s sub-kernels, uses each of the sub-kernels to perform a convolution operation with a stride of 1 on one corresponding second matrix, and accumulates the operation results of the second matrices as the operation result of performing the convolution operation with a stride of s on the first matrix.
Type: Application
Filed: October 3, 2022
Publication date: April 20, 2023
Applicant: Shanghai Biren Technology Co., Ltd
Inventors: Hao SHU, Zhou HONG, Lin CHEN, Tong SUN, Zhu LIANG, ChengKun SUN
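The decomposition in this abstract is the standard identity that a stride-s convolution equals the sum of s*s stride-1 convolutions taken over the phase-shifted sub-matrices of the input and kernel. A pure-Python sketch under stated assumptions: "convolution" here means "valid" cross-correlation (the deep-learning convention), and the input and kernel dimensions are assumed to be multiples of s so that all sub-results share one shape; function names are illustrative:

```python
def corr2d(x, k):
    """Stride-1 'valid' cross-correlation of 2-D lists x and k."""
    H, W, kh, kw = len(x), len(x[0]), len(k), len(k[0])
    return [[sum(x[m + p][n + q] * k[p][q]
                 for p in range(kh) for q in range(kw))
             for n in range(W - kw + 1)]
            for m in range(H - kh + 1)]

def sub(mat, i, j, s):
    """Unknit: every s-th row starting at i, every s-th column at j."""
    return [row[j::s] for row in mat[i::s]]

def strided_corr2d(x, k, s):
    """Direct stride-s correlation: keep every s-th output, for reference."""
    full = corr2d(x, k)
    return [row[::s] for row in full[::s]]

def unknit_corr2d(x, k, s):
    """Stride-s correlation as the accumulated sum of s*s stride-1
    correlations between matching sub-matrices and sub-kernels."""
    acc = None
    for i in range(s):
        for j in range(s):
            part = corr2d(sub(x, i, j, s), sub(k, i, j, s))
            acc = part if acc is None else [
                [a + b for a, b in zip(ra, rb)]
                for ra, rb in zip(acc, part)]
    return acc
```

The payoff is hardware-friendly: a unit built only for stride-1 convolution can serve any stride s by unknitting once and accumulating s*s partial results.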
-
Publication number: 20230027355
Abstract: A multicast routing method and an interconnection device for a mesh network system, a mesh network system and a configuration method thereof are provided. The method includes, at each internal interconnection device among multiple interconnection devices of each processing subsystem: in response to receiving a multicast access request to a destination memory, determining a shortest path from each internal interconnection device to the destination memory based on a topology structure of the mesh network system, where the internal interconnection device has no link connected to an external processing subsystem; in response to determining that the number of shortest paths is equal to one, routing the multicast access request to the destination memory along the shortest path; and in response to determining that the number of shortest paths is greater than one, determining a next-hop interconnection device for the multicast access request based on a second static routing policy.
Type: Application
Filed: July 5, 2022
Publication date: January 26, 2023
Applicant: Shanghai Biren Technology Co., Ltd
Inventors: Zhou HONG, Qin ZHENG, Yuzhe LI
-
Publication number: 20220398102
Abstract: An artificial intelligence chip and a data operation method are provided. The artificial intelligence chip receives a command carrying first data and address information and includes a chip memory, a computing processor, a base address register, and an extended address processor. The base address register is configured to access an extended address space in the chip memory. The extended address processor receives the command. The extended address processor determines an operation mode of the first data according to the address information. When the address information points to a first section of the extended address space, the extended address processor performs a first operation on the first data. When the address information points to a section other than the first section of the extended address space, the extended address processor notifies the computing processor of the operation mode and the computing processor performs a second operation on the first data.
Type: Application
Filed: September 8, 2021
Publication date: December 15, 2022
Applicant: Shanghai Biren Technology Co., Ltd
Inventors: Zhou HONG, Qin ZHENG, ChengPing LUO, GuoFang JIAO, Song ZHAO, XiangLiang YU
-
Patent number: D981620
Type: Grant
Filed: July 15, 2022
Date of Patent: March 21, 2023
Inventor: Zhou Hong