Patents by Inventor Zhou Hong

Zhou Hong has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20220374280
    Abstract: A method for processing data using a computing array is provided. In the method, source data comprising multiple blocks is allocated to each of multiple computing nodes in a computing array. At a given computing node, in at least one iteration, multiple blocks are received from multiple other computing nodes in the array via multiple first-type computing devices in the node's set of computing devices. The first-type computing devices each execute a processing operation on the received blocks to generate multiple intermediate results, and the processing operation is then executed on the intermediate results to obtain a first part of the final result of applying the processing operation to the source data. A corresponding computer system is also provided. (An illustrative sketch follows this entry.)
    Type: Application
    Filed: May 18, 2022
    Publication date: November 24, 2022
    Applicant: Shanghai Biren Technology Co., Ltd
    Inventors: Zhou HONG, ChengPing LUO, Qin ZHENG
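To make the iteration pattern above concrete, here is a minimal Python sketch of a reduce-scatter-style aggregation under the abstract's scheme; the function name, the sum reduction, and the use of one integer per block are illustrative assumptions, not the patented method.

```python
# Hypothetical sketch: every node holds all blocks of the source data,
# receives its peers' copies of one block, and reduces them into its own
# part of the final result (a reduce-scatter-like pattern).
from typing import List

def reduce_scatter(node_data: List[List[int]]) -> List[int]:
    """node_data[n][b] is block b held at node n; node n computes block n."""
    num_nodes = len(node_data)
    result = []
    for node in range(num_nodes):
        # "First-type devices" each receive one peer's copy of this node's block
        received = [node_data[peer][node] for peer in range(num_nodes)]
        # Per-device intermediate results, then a final reduction over them
        result.append(sum(received))
    return result

if __name__ == "__main__":
    data = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]  # 3 nodes x 3 blocks
    print(reduce_scatter(data))  # [111, 222, 333]
```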
  • Publication number: 20220368619
    Abstract: The present disclosure provides a computing system, a computing processor, and a data processing method for the computing processor. The computing system includes multiple computing clusters, each computing cluster includes multiple computing nodes, and each computing node includes multiple computing processors. At least some of the computing clusters, at least some computing nodes in each cluster, and at least some computing processors of each node are connected through direct links. Each of at least some computing processors of a computing node is configured with a local routing table, based on which the processor determines the next direct link on a data packet's route from its data source to its data destination and forwards the packet through that link. (An illustrative sketch follows this entry.)
    Type: Application
    Filed: May 11, 2022
    Publication date: November 17, 2022
    Applicant: Shanghai Biren Technology Co., Ltd
    Inventors: Zhou HONG, Qin ZHENG, ChengPing LUO
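A hedged sketch of the local-routing-table idea above, assuming a table layout of destination → next direct link and a simple hop-by-hop loop; none of these structures come from the patent itself.

```python
# Each processor consults only its own local table to pick the next hop.
def route(packet_dst: str, start: str, local_tables: dict) -> list:
    """Follow each processor's local routing table until the packet arrives."""
    hops, current = [start], start
    while current != packet_dst:
        next_link = local_tables[current][packet_dst]  # local table lookup
        hops.append(next_link)
        current = next_link
    return hops

if __name__ == "__main__":
    # Three processors on direct links: A <-> B <-> C
    tables = {
        "A": {"B": "B", "C": "B"},
        "B": {"A": "A", "C": "C"},
        "C": {"A": "B", "B": "B"},
    }
    print(route("C", "A", tables))  # ['A', 'B', 'C']
```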
  • Publication number: 20220295080
    Abstract: The present disclosure relates to a method for computing, a computing device, and a computer-readable storage medium. The method includes: determining a pixel block set in a cache, where a first pixel block in the set comprises an m×n pixel matrix having a first padding setting relative to the original pixel data, m and n being positive integers; and storing the determined pixel block set in a buffer so that a second pixel block can be read from the buffer based on the initial buffer address of the first pixel block and an address offset associated with the second pixel block, where the second pixel block has a second padding setting relative to the original pixel data, and the first and second padding settings have the same offset amount in a first direction relative to the original pixel data. (An illustrative sketch follows this entry.)
    Type: Application
    Filed: March 10, 2022
    Publication date: September 15, 2022
    Applicant: Shanghai Biren Technology Co., Ltd
    Inventors: YuFei ZHANG, Zhou HONG
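The buffer addressing described above can be illustrated with a small calculation; the row-major block layout and every parameter name here are assumptions made for the example.

```python
# Once the first m x n block's initial buffer address is known, another
# block is read at base + offset rather than re-derived from pixel data.
def block_address(base: int, block_row: int, block_col: int,
                  m: int, n: int, blocks_per_row: int) -> int:
    """Address of the block at (block_row, block_col) relative to the base."""
    offset = (block_row * blocks_per_row + block_col) * (m * n)
    return base + offset

if __name__ == "__main__":
    # 4x4 pixel blocks, 8 blocks per buffer row, first block at address 0x1000
    print(hex(block_address(0x1000, 0, 1, 4, 4, 8)))  # 0x1010
```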
  • Publication number: 20220158929
    Abstract: An information processing method, an interconnection device, and a computer-readable storage medium are provided. The interconnection device includes a request processing module configured for: receiving a data access request from at least one processor, the data access request comprising a merge bit, a multicast group identifier (MGID), and a multicast transaction identifier (MTID); determining whether the data access request is a multicast request; if so, determining, based on the MGID, the MTID, and a static routing policy of the multicast group, whether the interconnection device will receive other multicast requests; and if it will, obtaining the other multicast requests, merging them with the multicast request into a merged request, and forwarding the merged request to the interconnection device's next-hop device. (An illustrative sketch follows this entry.)
    Type: Application
    Filed: November 11, 2021
    Publication date: May 19, 2022
    Applicant: Shanghai Biren Technology Co., Ltd
    Inventors: Qin ZHENG, Zhou HONG, YuFei ZHANG, Lin CHEN, ChengKun SUN, Tong SUN, ChengPing LUO, HaiChuan WANG
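A toy model of the request-merging step above, assuming requests are dictionaries with merge/MGID/MTID/source fields and that merging means collecting the requesters; the real logic is hardware, so this is only a behavioral sketch.

```python
# Requests sharing (MGID, MTID) are merged into one request, which would
# then be forwarded to the interconnection device's next-hop device.
from collections import defaultdict

def merge_multicast(requests):
    """Group requests by (MGID, MTID); merge each group into one request."""
    groups = defaultdict(list)
    for req in requests:
        if req["merge"]:                   # merge bit set: a multicast request
            groups[(req["mgid"], req["mtid"])].append(req)
    merged = []
    for (mgid, mtid), reqs in groups.items():
        merged.append({"mgid": mgid, "mtid": mtid,
                       "requesters": [r["src"] for r in reqs]})
    return merged

if __name__ == "__main__":
    reqs = [{"merge": True, "mgid": 7, "mtid": 1, "src": "cpu0"},
            {"merge": True, "mgid": 7, "mtid": 1, "src": "cpu1"}]
    print(merge_multicast(reqs))
```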
  • Publication number: 20220156128
    Abstract: The embodiments of the disclosure relate to a computing device, computing equipment, and a programmable scheduling method for data loading and execution in the field of computing. The computing device is coupled to a first computing core and a first memory, and includes a scratchpad memory, a second computing core, a first hardware queue, a second hardware queue, and a synchronization unit. The second computing core is configured for acceleration in a specific field. The first hardware queue receives a load request from the first computing core, and the second hardware queue receives an execution request from the first computing core. The synchronization unit is configured to coordinate the triggering of the load request and the execution request. In this manner, flexibility, throughput, and overall performance can be enhanced. (An illustrative sketch follows this entry.)
    Type: Application
    Filed: November 11, 2021
    Publication date: May 19, 2022
    Applicant: Shanghai Biren Technology Co., Ltd
    Inventors: Zhou HONG, YuFei ZHANG, ChengKun SUN, Lin CHEN
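To illustrate the two-queue-plus-synchronization arrangement above, here is a hypothetical software model; the tag-based matching between load and execution requests is an invented stand-in for the synchronization unit.

```python
# Load requests and execution requests arrive on separate queues; an
# execution request only fires once its matching load is in the scratchpad.
from collections import deque

def run(load_queue: deque, exec_queue: deque):
    scratchpad, done = {}, []
    while load_queue or exec_queue:
        if load_queue:
            tag, data = load_queue.popleft()
            scratchpad[tag] = data                 # load into scratchpad memory
        if exec_queue and exec_queue[0][0] in scratchpad:
            tag, op = exec_queue.popleft()         # synchronization satisfied
            done.append(op(scratchpad[tag]))       # run on the second core
    return done

if __name__ == "__main__":
    loads = deque([("t0", [1, 2, 3]), ("t1", [4, 5])])
    execs = deque([("t0", sum), ("t1", max)])
    print(run(loads, execs))  # [6, 5]
```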
  • Publication number: 20220147354
    Abstract: The embodiments of the disclosure relate to a computing device and a method for loading data. According to the method, the first processing unit sends a first instruction to the NMP unit. The first instruction includes a first address, a plurality of second addresses, and an operation type. In response to the first instruction, the NMP unit performs the operation associated with the operation type on the data items at the second addresses of the first memory to generate an operation result, and stores the result at the first address of the first memory. The first processing unit issues a flush instruction to make the result at the first address visible to it, then issues a read instruction to read the result from the first address. (An illustrative sketch follows this entry.)
    Type: Application
    Filed: November 10, 2021
    Publication date: May 12, 2022
    Applicant: Shanghai Biren Technology Co., Ltd
    Inventors: Zhou HONG, YuFei ZHANG
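A behavioral sketch of the NMP flow described above, assuming a dictionary stands in for the first memory and a small operation table for the operation types; the pending/flush split models result visibility, not the actual hardware protocol.

```python
# An instruction names an operation, source addresses, and a destination;
# the NMP unit computes near memory, and a flush makes the result visible.
OPS = {"sum": sum, "max": max}

class NMPUnit:
    def __init__(self, memory: dict):
        self.memory = memory          # stands in for the first memory
        self.pending = {}             # results not yet flushed/visible

    def execute(self, op: str, dst: int, srcs: list):
        values = [self.memory[a] for a in srcs]
        self.pending[dst] = OPS[op](values)        # operate near memory

    def flush(self, dst: int):
        self.memory[dst] = self.pending.pop(dst)   # make result visible

if __name__ == "__main__":
    mem = {0x10: 3, 0x18: 4, 0x20: 5, 0x00: None}
    nmp = NMPUnit(mem)
    nmp.execute("sum", dst=0x00, srcs=[0x10, 0x18, 0x20])
    nmp.flush(0x00)
    print(mem[0x00])  # host read after flush -> 12
```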
  • Publication number: 20220129272
    Abstract: The invention relates to an apparatus for secondary offloads in a graphics processing unit (GPU). The apparatus includes an engine and a compute unit (CU). The engine is arranged operably to store an operation table including entries. The CU is arranged operably to fetch computation codes, which include execution codes and synchronization requests; execute each execution code; and, in accordance with the synchronization requests, send requests instructing the engine to have components inside or outside of the GPU complete operations in accordance with the entries of the operation table. (An illustrative sketch follows this entry.)
    Type: Application
    Filed: July 2, 2021
    Publication date: April 28, 2022
    Applicant: Shanghai Biren Technology Co., Ltd
    Inventors: HaiChuan WANG, Song ZHAO, GuoFang JIAO, ChengPing LUO, Zhou HONG
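The division of labor above (the CU runs execution codes; the engine completes table-driven operations at synchronization requests) can be sketched as follows; the instruction encoding and the DMA example are assumptions.

```python
# The CU walks the computation codes; at each synchronization request it
# asks the engine to complete the operation named by a table entry.
def run_compute_codes(codes, operation_table, engine_ops):
    results = []
    for kind, payload in codes:
        if kind == "exec":
            results.append(payload())          # CU executes the code itself
        elif kind == "sync":
            entry = operation_table[payload]   # payload = table index
            results.append(engine_ops[entry["op"]](*entry["args"]))
    return results

if __name__ == "__main__":
    table = [{"op": "dma_copy", "args": ("src_buf", "dst_buf")}]
    engine = {"dma_copy": lambda s, d: f"engine copied {s} -> {d}"}
    codes = [("exec", lambda: 2 + 2), ("sync", 0)]
    print(run_compute_codes(codes, table, engine))
```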
  • Publication number: 20220129255
    Abstract: The invention relates to a method for compiling code adapted for secondary offloads in a graphics processing unit (GPU). The method, performed by a processing unit, includes reconstructing the execution codes in a first kernel into a second kernel. The second kernel includes an operation table containing entries, and computation codes. The computation codes include a portion of the execution codes plus synchronization hooks, each of which carries information indicating one entry of the operation table. The order of the execution codes and synchronization hooks in the computation codes matches the order of the execution codes in the first kernel, enabling a compute unit (CU) in the GPU to execute the computation codes while an engine in the GPU instructs a component inside or outside of the GPU to complete the designated operation in accordance with the content of each entry in the operation table. (An illustrative sketch follows this entry.)
    Type: Application
    Filed: July 2, 2021
    Publication date: April 28, 2022
    Applicant: Shanghai Biren Technology Co., Ltd
    Inventors: HaiChuan WANG, Song ZHAO, GuoFang JIAO, ChengPing LUO, Zhou HONG
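A hedged sketch of the compile-time reconstruction above, assuming a kernel is a list of instruction strings and that an `offloadable` predicate decides what moves into the operation table; the real compiler operates on compiled execution codes, not strings.

```python
# Split a first kernel into computation codes (order preserved) plus an
# operation table, with a synchronization hook per offloaded operation.
def reconstruct(first_kernel, offloadable):
    operation_table, computation_codes = [], []
    for code in first_kernel:
        if offloadable(code):
            operation_table.append(code)                      # engine's entry
            computation_codes.append(("sync_hook", len(operation_table) - 1))
        else:
            computation_codes.append(("exec", code))          # CU keeps it
    return {"operation_table": operation_table,
            "computation_codes": computation_codes}           # second kernel

if __name__ == "__main__":
    kernel = ["mul r0, r1, r2", "dma_copy buf0, buf1", "add r0, r0, r3"]
    second = reconstruct(kernel, offloadable=lambda c: c.startswith("dma"))
    print(second)
```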
  • Publication number: 20220121727
    Abstract: The invention relates to an apparatus for vector computing incorporating matrix multiply and accumulation (MMA) calculation. The apparatus includes a streaming multiprocessor (SM) and a block selector; the register space is divided into physical blocks, each of which includes register groups. The SM includes a general-purpose register (GPR) and a general matrix multiply (GEMM) calculation unit, and the GEMM calculation unit includes an instruction queue and an arithmetic logic unit (ALU). The ALU, which is coupled to the GPR, is arranged operably to perform the MMA calculation according to a GEMM instruction stored in the instruction queue and to store the calculation result in the GPR. (An illustrative sketch follows this entry.)
    Type: Application
    Filed: July 2, 2021
    Publication date: April 21, 2022
    Applicant: Shanghai Biren Technology Co., Ltd
    Inventors: Zhou HONG, YuFei ZHANG
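To ground the GEMM-unit description above, here is a small model that pops an instruction from the queue, computes D = A·B + C, and writes the result back to the GPR; the matrix shapes and four-operand instruction format are illustrative assumptions.

```python
# Pop GEMM instructions from the queue and perform MMA against the GPR.
from collections import deque

def gemm_unit(gpr: dict, queue: deque):
    while queue:
        a, b, c, d = queue.popleft()      # GPR names of A, B, C and result D
        A, B, C = gpr[a], gpr[b], gpr[c]
        rows, inner, cols = len(A), len(B), len(B[0])
        gpr[d] = [[C[i][j] + sum(A[i][k] * B[k][j] for k in range(inner))
                   for j in range(cols)] for i in range(rows)]

if __name__ == "__main__":
    gpr = {"A": [[1, 2], [3, 4]], "B": [[5, 6], [7, 8]],
           "C": [[0, 0], [0, 0]]}
    gemm_unit(gpr, deque([("A", "B", "C", "D")]))
    print(gpr["D"])  # [[19, 22], [43, 50]]
```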
  • Publication number: 20220121444
    Abstract: The invention relates to an apparatus for configuring cooperative warps in a vector computing system. The apparatus includes general-purpose registers (GPRs), an arithmetic logic unit (ALU), and a warp instruction scheduler. The warp instruction scheduler is arranged operably to allow each of a plurality of warps, in accordance with a software-defined configuration applied at execution time, to access the data of the whole or a designated portion of the GPRs through the ALU, and to complete the calculations of each warp through the ALU. (An illustrative sketch follows this entry.)
    Type: Application
    Filed: July 2, 2021
    Publication date: April 21, 2022
    Applicant: Shanghai Biren Technology Co., Ltd
    Inventors: Zhou HONG, YuFei ZHANG, ChengKun SUN, Lin CHEN, Hao SHU
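A toy model of the software-configured warp access above, assuming a (base, size) window per warp; the encoding of "whole or designated portion" is an assumption for illustration.

```python
# The scheduler grants each warp either the whole register file or a
# designated window; accesses are checked against that grant.
class WarpScheduler:
    def __init__(self, gpr_size: int):
        self.gpr = [0] * gpr_size
        self.windows = {}                          # warp id -> (base, size)

    def configure(self, warp: int, base: int = 0, size: int = None):
        self.windows[warp] = (base, size if size is not None else len(self.gpr))

    def write(self, warp: int, reg: int, value: int):
        base, size = self.windows[warp]
        assert reg < size, "register outside this warp's designated portion"
        self.gpr[base + reg] = value

if __name__ == "__main__":
    sched = WarpScheduler(gpr_size=8)
    sched.configure(warp=0, base=0, size=4)        # warp 0: registers 0..3
    sched.configure(warp=1)                        # warp 1: the whole GPR
    sched.write(0, 2, 42)
    sched.write(1, 7, 7)
    print(sched.gpr)  # [0, 0, 42, 0, 0, 0, 0, 7]
```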
  • Publication number: 20220012053
    Abstract: A method of storing data in general purpose registers (GPRs) includes packing a tile of data items, comprising multiple channels, into GPRs. The tile of data items is read from memory; at least two channels of the data are stored in a first GPR, and at least two additional channels are stored in a second GPR. Auxiliary data is loaded into a third GPR. The auxiliary data and the tile data can then be used together to perform convolution operations. (An illustrative sketch follows this entry.)
    Type: Application
    Filed: September 27, 2021
    Publication date: January 13, 2022
    Inventors: Lin Chen, Zhou Hong, Yufei Zhang
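The channel packing above can be sketched in a few lines; packing exactly two channels per register and representing registers as Python lists are assumptions made for the example.

```python
# Pack a multi-channel tile so that two channels share each GPR, with
# auxiliary data kept in its own register.
def pack_tile(tile, channels_per_gpr=2):
    """tile[c] is the data for channel c; returns a list of packed GPRs."""
    gprs = []
    for i in range(0, len(tile), channels_per_gpr):
        packed = []
        for channel in tile[i:i + channels_per_gpr]:
            packed.extend(channel)                 # two channels per GPR
        gprs.append(packed)
    return gprs

if __name__ == "__main__":
    tile = [[1, 1], [2, 2], [3, 3], [4, 4]]        # 4 channels of a tile
    gprs = pack_tile(tile)
    aux_gpr = [9, 9]                               # auxiliary data, own register
    print(gprs, aux_gpr)  # [[1, 1, 2, 2], [3, 3, 4, 4]] [9, 9]
```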
  • Publication number: 20210398339
    Abstract: Methodologies and architectures are provided for inter-thread sharing of data in a general purpose register (GPR) of a multiprocessor apparatus. In the described embodiments, such data sharing is performed by a graphics processing unit (GPU) having at least one processing cluster that includes a plurality of processing cores (PCs) configured for parallel operation, each PC utilizing a dedicated portion of the GPR. The GPU further includes a shared memory for the cluster and a memory read/write hub, containing a crossbar switch, coupled to the GPR and the shared memory. A PC executes a move data instruction whose operands reference a destination portion of the GPR and a source portion assigned to the PC, retrieving data from the source portion. The memory read/write hub writes the data, via the crossbar switch, to the destination portion of the GPR without first writing the data to the shared memory. (An illustrative sketch follows this entry.)
    Type: Application
    Filed: September 1, 2021
    Publication date: December 23, 2021
    Inventors: Zhou HONG, Yufei ZHANG
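A behavioral sketch of the move-data path above, assuming the GPR is one array with a fixed-size slice per processing core; the crossbar is modeled as a direct copy that never touches shared memory.

```python
# One GPR with a dedicated slice per core; move copies GPR-to-GPR directly.
class GPRHub:
    def __init__(self, num_cores: int, regs_per_core: int):
        self.regs_per_core = regs_per_core
        self.gpr = [0] * (num_cores * regs_per_core)

    def move(self, src_core: int, src_reg: int, dst_core: int, dst_reg: int):
        # The crossbar routes the value directly; shared memory is bypassed.
        src = src_core * self.regs_per_core + src_reg
        dst = dst_core * self.regs_per_core + dst_reg
        self.gpr[dst] = self.gpr[src]

if __name__ == "__main__":
    hub = GPRHub(num_cores=2, regs_per_core=4)
    hub.gpr[1] = 123                      # core 0 wrote its register 1
    hub.move(src_core=0, src_reg=1, dst_core=1, dst_reg=0)
    print(hub.gpr)  # [0, 123, 0, 0, 123, 0, 0, 0]
```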
  • Publication number: 20210349597
    Abstract: The present disclosure discloses a position detection system comprising a power module, a detection module, a control module, and a pulse module. The power module is configured to supply power. The detection module comprises a key detection circuit and a force detection circuit. The control module is electrically connected to the power module and the detection module, and is configured to process the key detection data detected by the key detection circuit and the force detection data detected by the force detection circuit and to output a key signal and a force signal. The pulse module is connected to the control module, and is configured to acquire start references of the key signal and the force signal output by the control module, convert the key signal and the force signal to digital signals by form coding, and send the digital signals to a tablet. The present disclosure further provides a digital stylus using the same.
    Type: Application
    Filed: July 16, 2021
    Publication date: November 11, 2021
    Inventors: Yong Luo, Peng Cheng Zhang, Shi Hua Wu, Zhou Hong Wang
  • Publication number: 20210272232
    Abstract: The disclosed technology relates to graphics processing units (GPUs). In one aspect, a GPU includes a general purpose register (GPR) comprising registers, an arithmetic logic unit (ALU) that reads pixels of an image independently of a shared memory, and a level 1 (L1) cache storing pixels, together implementing a pixel mapping that maps the pixels read from the L1 cache into the registers of the GPR. The pixel mapping separates the pixels of an image into three regions, each containing a set of pixels. The first and second sets of pixels are loaded horizontally into the registers corresponding to two of the three regions, and the third set is loaded vertically into the registers corresponding to the third region. The registers of each region are loaded as a contiguous, ordered run of registers in the GPR. (An illustrative sketch follows this entry.)
    Type: Application
    Filed: May 21, 2021
    Publication date: September 2, 2021
    Inventors: Zhou Hong, Yufei Zhang
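A toy version of the three-region mapping above; how the image is split into regions is not modeled, and the horizontal/vertical distinction is shown simply as row-major versus column-major loading.

```python
# Two regions load row by row (horizontally), the third column by column
# (vertically); the result is one contiguous, ordered run of registers.
def load_regions(region_a, region_b, region_c):
    gpr = []
    for region in (region_a, region_b):              # horizontal loads
        for row in region:
            gpr.extend(row)
    for col in zip(*region_c):                       # vertical load
        gpr.extend(col)
    return gpr

if __name__ == "__main__":
    a = [[1, 2]]; b = [[3, 4]]; c = [[5, 6], [7, 8]]
    print(load_regions(a, b, c))  # [1, 2, 3, 4, 5, 7, 6, 8]
```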
  • Publication number: 20210264560
    Abstract: The disclosed technology generally relates to a graphics processing unit (GPU). In one aspect, a GPU includes a general purpose register (GPR) having registers, an arithmetic logic unit (ALU) configured to read pixels of an image independently of a shared memory, and a level 1 (L1) cache storing the pixels read by the ALU. The ALU can implement a pixel mapping by fetching a quad of pixels, comprising pixels of first, second, third, and fourth pixel types, from the L1 cache; grouping the pixels of the quad into four groups by pixel type; and, for each group, separating the group's pixels into three regions that each hold a set of pixels. The pixels of each group can then be loaded into the registers corresponding to the three regions. (An illustrative sketch follows this entry.)
    Type: Application
    Filed: May 13, 2021
    Publication date: August 26, 2021
    Inventors: Zhou Hong, Yufei Zhang
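The quad-grouping step above can be sketched as a bucketing pass; the quad layout and the type labels are assumptions, and the subsequent three-region split is only noted in a comment.

```python
# Pixels fetched as quads carry one of four pixel types; the mapping first
# buckets them by type before any per-group region loading.
from collections import defaultdict

def group_quads(quads):
    """Each quad is four (pixel_type, value) pairs; bucket values by type."""
    groups = defaultdict(list)
    for quad in quads:
        for pixel_type, value in quad:
            groups[pixel_type].append(value)
    return dict(groups)   # each group would then be split into three regions

if __name__ == "__main__":
    quads = [[("tl", 1), ("tr", 2), ("bl", 3), ("br", 4)],
             [("tl", 5), ("tr", 6), ("bl", 7), ("br", 8)]]
    print(group_quads(quads))
    # {'tl': [1, 5], 'tr': [2, 6], 'bl': [3, 7], 'br': [4, 8]}
```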
  • Patent number: 10726516
    Abstract: A GPU comprises: a GPR comprising registers; an L1 cache coupled to the GPR and configured to implement a pixel mapping by segregating the pixels of an image into regions, the regions comprising a first region of first pixels and a second region of second pixels, loading the first pixels into the GPR in a horizontal manner, and loading the second pixels into the GPR in a vertical manner; and an ALU configured to read the first pixels and the second pixels independently of a shared memory. (An illustrative sketch follows this entry.)
    Type: Grant
    Filed: October 11, 2018
    Date of Patent: July 28, 2020
    Assignee: Futurewei Technologies, Inc.
    Inventors: Zhou Hong, Yufei Zhang
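For contrast with the three-region mapping earlier in this listing, here is a toy version of this grant's two-region scheme; again, the region split itself is not modeled.

```python
# First pixels enter the GPR in horizontal (row-major) order, second
# pixels in vertical (column-major) order.
def load_two_regions(first_pixels, second_pixels):
    gpr = [px for row in first_pixels for px in row]          # horizontal
    gpr += [px for col in zip(*second_pixels) for px in col]  # vertical
    return gpr

if __name__ == "__main__":
    first = [[1, 2], [3, 4]]
    second = [[5, 6], [7, 8]]
    print(load_two_regions(first, second))  # [1, 2, 3, 4, 5, 7, 6, 8]
```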
  • Publication number: 20200118238
    Abstract: A GPU comprises: a GPR comprising registers; an L1 cache coupled to the GPR and configured to implement a pixel mapping by segregating the pixels of an image into regions, the regions comprising a first region of first pixels and a second region of second pixels, loading the first pixels into the GPR in a horizontal manner, and loading the second pixels into the GPR in a vertical manner; and an ALU configured to read the first pixels and the second pixels independently of a shared memory.
    Type: Application
    Filed: October 11, 2018
    Publication date: April 16, 2020
    Inventors: Zhou Hong, Yufei Zhang
  • Patent number: 10606335
    Abstract: A dynamic voltage frequency scaling (DVFS) system is provided. The DVFS system includes a computation unit, a power management unit (PMU), a hardware activity monitor (HAM), and a hardware voltage monitor (HVM). The HAM monitors the working status and temperature information of the computation unit and determines, according to the working status, the temperature information, and the previous determination result, whether to update the operating voltage and frequency of the computation unit; when it determines to update them, the HAM generates a first control signal to the PMU to calibrate the operating voltage and frequency. The HVM detects timing information of the computation unit and determines, according to the detected timing information, whether to fine-tune the operating voltage; when it determines to fine-tune the operating voltage, the HVM generates a second control signal to the PMU to do so. (An illustrative sketch follows this entry.)
    Type: Grant
    Filed: December 12, 2014
    Date of Patent: March 31, 2020
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventors: Deming Gu, Zhou Hong
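A hedged control-loop sketch of the HAM/HVM split above; every threshold, step size, and unit here is invented for illustration and does not come from the patent.

```python
# Coarse re-targeting from activity/temperature (HAM), then a fine voltage
# trim from observed timing margin (HVM).
def dvfs_step(state, activity, temp_c, timing_slack_ns):
    # HAM: decide whether to update voltage and frequency
    if activity > 0.9 and temp_c < 85:
        state["freq_mhz"] += 100; state["volt_mv"] += 25  # first control signal
    elif activity < 0.3 or temp_c >= 85:
        state["freq_mhz"] -= 100; state["volt_mv"] -= 25
    # HVM: fine-tune voltage only, from timing information
    if timing_slack_ns < 0.1:
        state["volt_mv"] += 5                             # second control signal
    elif timing_slack_ns > 0.5:
        state["volt_mv"] -= 5
    return state

if __name__ == "__main__":
    s = {"freq_mhz": 800, "volt_mv": 900}
    print(dvfs_step(s, activity=0.95, temp_c=70, timing_slack_ns=0.05))
    # {'freq_mhz': 900, 'volt_mv': 930}
```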
  • Patent number: 10489164
    Abstract: An apparatus for enqueuing kernels on the device side is introduced, incorporating at least an MXU (Memory Access Unit) and a CSP (Command Stream Processor). The CSP, after receiving a first command from the MXU, executes commands of a ring buffer, thereby enabling an EU (Execution Unit) to direct the MXU to allocate space of the ring buffer for a first hardware thread and subsequently to write second commands of the first hardware thread into the allocated space according to an instruction of a kernel. (An illustrative sketch follows this entry.)
    Type: Grant
    Filed: May 6, 2019
    Date of Patent: November 26, 2019
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventors: Fengxia Wu, Tian Shen, Zhou Hong, Yuanfeng Wang
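The enqueue flow above can be modeled with a small ring buffer; the fixed-size ring, its head/tail bookkeeping, and the string commands are illustrative assumptions standing in for the MXU/CSP hardware.

```python
# Allocate ring space for a hardware thread, write its commands into that
# space, then drain the buffer on the consumer (CSP-like) side.
class RingBuffer:
    def __init__(self, size: int):
        self.slots, self.head, self.tail = [None] * size, 0, 0

    def allocate(self, count: int) -> int:
        assert self.tail + count - self.head <= len(self.slots), "ring full"
        start, self.tail = self.tail, self.tail + count
        return start                       # space reserved for one hw thread

    def write(self, start: int, commands: list):
        for i, cmd in enumerate(commands):
            self.slots[(start + i) % len(self.slots)] = cmd

    def drain(self):
        while self.head < self.tail:       # consumer executes in order
            cmd = self.slots[self.head % len(self.slots)]
            self.head += 1
            yield cmd

if __name__ == "__main__":
    ring = RingBuffer(8)
    base = ring.allocate(2)                # allocate for hardware thread 0
    ring.write(base, ["dispatch kernel_a", "signal done"])
    print(list(ring.drain()))
```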
  • Patent number: 10394574
    Abstract: An apparatus for enqueuing kernels on the device side is introduced, incorporating at least an MXU (Memory Access Unit) and a CSP (Command Stream Processor). The CSP, after receiving a first command from the MXU, executes commands of a ring buffer, thereby enabling an EU (Execution Unit) to direct the MXU to allocate space of the ring buffer for a first hardware thread and subsequently to write second commands of the first hardware thread into the allocated space according to an instruction of a kernel.
    Type: Grant
    Filed: June 2, 2016
    Date of Patent: August 27, 2019
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventors: Fengxia Wu, Tian Shen, Zhou Hong, Yuanfeng Wang