Patents by Inventor Zhou Hong

Zhou Hong has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20220374280
    Abstract: A method for processing data using a computing array is provided. In the method, source data comprising multiple blocks is allocated to each of multiple computing nodes in a computing array. At a given computing node, in at least one iteration, multiple blocks are received from multiple other computing nodes in the array via multiple first-type computing devices in the node's set of computing devices. The first-type computing devices each execute a processing operation on the received blocks to generate multiple intermediate results, and the processing operation is then executed on the intermediate results to obtain a first part of the final result of applying the processing operation to the source data. A corresponding computer system is also provided. (An illustrative sketch follows this entry.)
    Type: Application
    Filed: May 18, 2022
    Publication date: November 24, 2022
    Applicant: Shanghai Biren Technology Co., Ltd
    Inventors: Zhou HONG, ChengPing LUO, Qin ZHENG
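To make the iteration pattern above concrete, here is a minimal Python sketch of a reduce-scatter-style aggregation under the abstract's scheme; the function name, the sum reduction, and the use of one integer per block are illustrative assumptions, not the patented method.

```python
# Hypothetical sketch: every node holds all blocks of the source data,
# receives its peers' copies of one block, and reduces them into its own
# part of the final result (a reduce-scatter-like pattern).
from typing import List

def reduce_scatter(node_data: List[List[int]]) -> List[int]:
    """node_data[n][b] is block b held at node n; node n computes block n."""
    num_nodes = len(node_data)
    result = []
    for node in range(num_nodes):
        # "First-type devices" each receive one peer's copy of this node's block
        received = [node_data[peer][node] for peer in range(num_nodes)]
        # Per-device intermediate results, then a final reduction over them
        result.append(sum(received))
    return result

if __name__ == "__main__":
    data = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]  # 3 nodes x 3 blocks
    print(reduce_scatter(data))  # [111, 222, 333]
```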
  • Publication number: 20220368619
    Abstract: The present disclosure provides a computing system, a computing processor, and a data processing method for the computing processor. The computing system includes multiple computing clusters, each computing cluster includes multiple computing nodes, and each computing node includes multiple computing processors. At least some of the computing clusters, at least some computing nodes in each cluster, and at least some computing processors of each node are connected through direct links. Each of at least some computing processors of a computing node is configured with a local routing table, based on which the processor determines the next direct link on a data packet's route from its data source to its data destination and forwards the packet through that link. (An illustrative sketch follows this entry.)
    Type: Application
    Filed: May 11, 2022
    Publication date: November 17, 2022
    Applicant: Shanghai Biren Technology Co., Ltd
    Inventors: Zhou HONG, Qin ZHENG, ChengPing LUO
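A hedged sketch of the local-routing-table idea above, assuming a table layout of destination → next direct link and a simple hop-by-hop loop; none of these structures come from the patent itself.

```python
# Each processor consults only its own local table to pick the next hop.
def route(packet_dst: str, start: str, local_tables: dict) -> list:
    """Follow each processor's local routing table until the packet arrives."""
    hops, current = [start], start
    while current != packet_dst:
        next_link = local_tables[current][packet_dst]  # local table lookup
        hops.append(next_link)
        current = next_link
    return hops

if __name__ == "__main__":
    # Three processors on direct links: A <-> B <-> C
    tables = {
        "A": {"B": "B", "C": "B"},
        "B": {"A": "A", "C": "C"},
        "C": {"A": "B", "B": "B"},
    }
    print(route("C", "A", tables))  # ['A', 'B', 'C']
```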
  • Publication number: 20220295080
    Abstract: The present disclosure relates to a method for computing, a computing device, and a computer-readable storage medium. The method includes: determining a pixel block set in a cache, where a first pixel block in the set comprises an m×n pixel matrix having a first padding setting relative to the original pixel data, m and n being positive integers; and storing the determined pixel block set in a buffer so that a second pixel block can be read from the buffer based on the initial buffer address of the first pixel block and an address offset associated with the second pixel block, where the second pixel block has a second padding setting relative to the original pixel data, and the first and second padding settings have the same offset amount in a first direction relative to the original pixel data. (An illustrative sketch follows this entry.)
    Type: Application
    Filed: March 10, 2022
    Publication date: September 15, 2022
    Applicant: Shanghai Biren Technology Co., Ltd
    Inventors: YuFei ZHANG, Zhou HONG
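The buffer addressing described above can be illustrated with a small calculation; the row-major block layout and every parameter name here are assumptions made for the example.

```python
# Once the first m x n block's initial buffer address is known, another
# block is read at base + offset rather than re-derived from pixel data.
def block_address(base: int, block_row: int, block_col: int,
                  m: int, n: int, blocks_per_row: int) -> int:
    """Address of the block at (block_row, block_col) relative to the base."""
    offset = (block_row * blocks_per_row + block_col) * (m * n)
    return base + offset

if __name__ == "__main__":
    # 4x4 pixel blocks, 8 blocks per buffer row, first block at address 0x1000
    print(hex(block_address(0x1000, 0, 1, 4, 4, 8)))  # 0x1010
```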
  • Publication number: 20220158929
    Abstract: An information processing method, an interconnection device, and a computer-readable storage medium are provided. The interconnection device includes a request processing module configured for: receiving a data access request from at least one processor, the data access request comprising a merge bit, a multicast group identifier (MGID), and a multicast transaction identifier (MTID); determining whether the data access request is a multicast request; if so, determining, based on the MGID, the MTID, and a static routing policy of the multicast group, whether the interconnection device will receive other multicast requests; and if it will, obtaining the other multicast requests, merging them with the multicast request into a merged request, and forwarding the merged request to the interconnection device's next-hop device. (An illustrative sketch follows this entry.)
    Type: Application
    Filed: November 11, 2021
    Publication date: May 19, 2022
    Applicant: Shanghai Biren Technology Co., Ltd
    Inventors: Qin ZHENG, Zhou HONG, YuFei ZHANG, Lin CHEN, ChengKun SUN, Tong SUN, ChengPing LUO, HaiChuan WANG
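A toy model of the request-merging step above, assuming requests are dictionaries with merge/MGID/MTID/source fields and that merging means collecting the requesters; the real logic is hardware, so this is only a behavioral sketch.

```python
# Requests sharing (MGID, MTID) are merged into one request, which would
# then be forwarded to the interconnection device's next-hop device.
from collections import defaultdict

def merge_multicast(requests):
    """Group requests by (MGID, MTID); merge each group into one request."""
    groups = defaultdict(list)
    for req in requests:
        if req["merge"]:                   # merge bit set: a multicast request
            groups[(req["mgid"], req["mtid"])].append(req)
    merged = []
    for (mgid, mtid), reqs in groups.items():
        merged.append({"mgid": mgid, "mtid": mtid,
                       "requesters": [r["src"] for r in reqs]})
    return merged

if __name__ == "__main__":
    reqs = [{"merge": True, "mgid": 7, "mtid": 1, "src": "cpu0"},
            {"merge": True, "mgid": 7, "mtid": 1, "src": "cpu1"}]
    print(merge_multicast(reqs))
```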
  • Publication number: 20220156128
    Abstract: The embodiments of the disclosure relate to a computing device, computing equipment, and a programmable scheduling method for data loading and execution in the field of computing. The computing device is coupled to a first computing core and a first memory, and includes a scratchpad memory, a second computing core, a first hardware queue, a second hardware queue, and a synchronization unit. The second computing core is configured for acceleration in a specific field. The first hardware queue receives a load request from the first computing core, and the second hardware queue receives an execution request from the first computing core. The synchronization unit is configured to coordinate the triggering of the load request and the execution request. In this manner, flexibility, throughput, and overall performance can be enhanced. (An illustrative sketch follows this entry.)
    Type: Application
    Filed: November 11, 2021
    Publication date: May 19, 2022
    Applicant: Shanghai Biren Technology Co., Ltd
    Inventors: Zhou HONG, YuFei ZHANG, ChengKun SUN, Lin CHEN
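To illustrate the two-queue-plus-synchronization arrangement above, here is a hypothetical software model; the tag-based matching between load and execution requests is an invented stand-in for the synchronization unit.

```python
# Load requests and execution requests arrive on separate queues; an
# execution request only fires once its matching load is in the scratchpad.
from collections import deque

def run(load_queue: deque, exec_queue: deque):
    scratchpad, done = {}, []
    while load_queue or exec_queue:
        if load_queue:
            tag, data = load_queue.popleft()
            scratchpad[tag] = data                 # load into scratchpad memory
        if exec_queue and exec_queue[0][0] in scratchpad:
            tag, op = exec_queue.popleft()         # synchronization satisfied
            done.append(op(scratchpad[tag]))       # run on the second core
    return done

if __name__ == "__main__":
    loads = deque([("t0", [1, 2, 3]), ("t1", [4, 5])])
    execs = deque([("t0", sum), ("t1", max)])
    print(run(loads, execs))  # [6, 5]
```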
  • Publication number: 20220147354
    Abstract: The embodiments of the disclosure relate to a computing device and a method for loading data. According to the method, the first processing unit sends a first instruction to the NMP unit. The first instruction includes a first address, a plurality of second addresses, and an operation type. In response to the first instruction, the NMP unit performs the operation associated with the operation type on the data items at the second addresses of the first memory to generate an operation result, and stores the result at the first address of the first memory. The first processing unit issues a flush instruction to make the result at the first address visible to it, then issues a read instruction to read the result from the first address. (An illustrative sketch follows this entry.)
    Type: Application
    Filed: November 10, 2021
    Publication date: May 12, 2022
    Applicant: Shanghai Biren Technology Co., Ltd
    Inventors: Zhou HONG, YuFei ZHANG
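A behavioral sketch of the NMP flow described above, assuming a dictionary stands in for the first memory and a small operation table for the operation types; the pending/flush split models result visibility, not the actual hardware protocol.

```python
# An instruction names an operation, source addresses, and a destination;
# the NMP unit computes near memory, and a flush makes the result visible.
OPS = {"sum": sum, "max": max}

class NMPUnit:
    def __init__(self, memory: dict):
        self.memory = memory          # stands in for the first memory
        self.pending = {}             # results not yet flushed/visible

    def execute(self, op: str, dst: int, srcs: list):
        values = [self.memory[a] for a in srcs]
        self.pending[dst] = OPS[op](values)        # operate near memory

    def flush(self, dst: int):
        self.memory[dst] = self.pending.pop(dst)   # make result visible

if __name__ == "__main__":
    mem = {0x10: 3, 0x18: 4, 0x20: 5, 0x00: None}
    nmp = NMPUnit(mem)
    nmp.execute("sum", dst=0x00, srcs=[0x10, 0x18, 0x20])
    nmp.flush(0x00)
    print(mem[0x00])  # host read after flush -> 12
```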
  • Publication number: 20220129272
    Abstract: The invention relates to an apparatus for secondary offloads in a graphics processing unit (GPU). The apparatus includes an engine and a compute unit (CU). The engine is arranged operably to store an operation table including entries. The CU is arranged operably to fetch computation codes, which include execution codes and synchronization requests; execute each execution code; and, in accordance with the synchronization requests, send requests instructing the engine to have components inside or outside of the GPU complete operations in accordance with the entries of the operation table. (An illustrative sketch follows this entry.)
    Type: Application
    Filed: July 2, 2021
    Publication date: April 28, 2022
    Applicant: Shanghai Biren Technology Co., Ltd
    Inventors: HaiChuan WANG, Song ZHAO, GuoFang JIAO, ChengPing LUO, Zhou HONG
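The division of labor above (the CU runs execution codes; the engine completes table-driven operations at synchronization requests) can be sketched as follows; the instruction encoding and the DMA example are assumptions.

```python
# The CU walks the computation codes; at each synchronization request it
# asks the engine to complete the operation named by a table entry.
def run_compute_codes(codes, operation_table, engine_ops):
    results = []
    for kind, payload in codes:
        if kind == "exec":
            results.append(payload())          # CU executes the code itself
        elif kind == "sync":
            entry = operation_table[payload]   # payload = table index
            results.append(engine_ops[entry["op"]](*entry["args"]))
    return results

if __name__ == "__main__":
    table = [{"op": "dma_copy", "args": ("src_buf", "dst_buf")}]
    engine = {"dma_copy": lambda s, d: f"engine copied {s} -> {d}"}
    codes = [("exec", lambda: 2 + 2), ("sync", 0)]
    print(run_compute_codes(codes, table, engine))
```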
  • Publication number: 20220129255
    Abstract: The invention relates to a method for compiling code adapted for secondary offloads in a graphics processing unit (GPU). The method, performed by a processing unit, includes reconstructing the execution codes in a first kernel into a second kernel. The second kernel includes an operation table containing entries, and computation codes. The computation codes include a portion of the execution codes plus synchronization hooks, each of which carries information indicating one entry of the operation table. The order of the execution codes and synchronization hooks in the computation codes matches the order of the execution codes in the first kernel, enabling a compute unit (CU) in the GPU to execute the computation codes while an engine in the GPU instructs a component inside or outside of the GPU to complete the designated operation in accordance with the content of each entry in the operation table. (An illustrative sketch follows this entry.)
    Type: Application
    Filed: July 2, 2021
    Publication date: April 28, 2022
    Applicant: Shanghai Biren Technology Co., Ltd
    Inventors: HaiChuan WANG, Song ZHAO, GuoFang JIAO, ChengPing LUO, Zhou HONG
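A hedged sketch of the compile-time reconstruction above, assuming a kernel is a list of instruction strings and that an `offloadable` predicate decides what moves into the operation table; the real compiler operates on compiled execution codes, not strings.

```python
# Split a first kernel into computation codes (order preserved) plus an
# operation table, with a synchronization hook per offloaded operation.
def reconstruct(first_kernel, offloadable):
    operation_table, computation_codes = [], []
    for code in first_kernel:
        if offloadable(code):
            operation_table.append(code)                      # engine's entry
            computation_codes.append(("sync_hook", len(operation_table) - 1))
        else:
            computation_codes.append(("exec", code))          # CU keeps it
    return {"operation_table": operation_table,
            "computation_codes": computation_codes}           # second kernel

if __name__ == "__main__":
    kernel = ["mul r0, r1, r2", "dma_copy buf0, buf1", "add r0, r0, r3"]
    second = reconstruct(kernel, offloadable=lambda c: c.startswith("dma"))
    print(second)
```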
  • Publication number: 20220121727
    Abstract: The invention relates to an apparatus for vector computing incorporating matrix multiply and accumulation (MMA) calculation. The apparatus includes a streaming multiprocessor (SM) and a block selector; the register space is divided into physical blocks, each of which includes register groups. The SM includes a general-purpose register (GPR) and a general matrix multiply (GEMM) calculation unit, and the GEMM calculation unit includes an instruction queue and an arithmetic logic unit (ALU). The ALU, which is coupled to the GPR, is arranged operably to perform the MMA calculation according to a GEMM instruction stored in the instruction queue and to store the calculation result in the GPR. (An illustrative sketch follows this entry.)
    Type: Application
    Filed: July 2, 2021
    Publication date: April 21, 2022
    Applicant: Shanghai Biren Technology Co., Ltd
    Inventors: Zhou HONG, YuFei ZHANG
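To ground the GEMM-unit description above, here is a small model that pops an instruction from the queue, computes D = A·B + C, and writes the result back to the GPR; the matrix shapes and four-operand instruction format are illustrative assumptions.

```python
# Pop GEMM instructions from the queue and perform MMA against the GPR.
from collections import deque

def gemm_unit(gpr: dict, queue: deque):
    while queue:
        a, b, c, d = queue.popleft()      # GPR names of A, B, C and result D
        A, B, C = gpr[a], gpr[b], gpr[c]
        rows, inner, cols = len(A), len(B), len(B[0])
        gpr[d] = [[C[i][j] + sum(A[i][k] * B[k][j] for k in range(inner))
                   for j in range(cols)] for i in range(rows)]

if __name__ == "__main__":
    gpr = {"A": [[1, 2], [3, 4]], "B": [[5, 6], [7, 8]],
           "C": [[0, 0], [0, 0]]}
    gemm_unit(gpr, deque([("A", "B", "C", "D")]))
    print(gpr["D"])  # [[19, 22], [43, 50]]
```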
  • Publication number: 20220121444
    Abstract: The invention relates to an apparatus for configuring cooperative warps in a vector computing system. The apparatus includes general-purpose registers (GPRs), an arithmetic logic unit (ALU), and a warp instruction scheduler. The warp instruction scheduler is arranged operably to allow each of a plurality of warps, in accordance with a software-defined configuration applied at execution time, to access the data of the whole or a designated portion of the GPRs through the ALU, and to complete the calculations of each warp through the ALU. (An illustrative sketch follows this entry.)
    Type: Application
    Filed: July 2, 2021
    Publication date: April 21, 2022
    Applicant: Shanghai Biren Technology Co., Ltd
    Inventors: Zhou HONG, YuFei ZHANG, ChengKun SUN, Lin CHEN, Hao SHU
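A toy model of the software-configured warp access above, assuming a (base, size) window per warp; the encoding of "whole or designated portion" is an assumption for illustration.

```python
# The scheduler grants each warp either the whole register file or a
# designated window; accesses are checked against that grant.
class WarpScheduler:
    def __init__(self, gpr_size: int):
        self.gpr = [0] * gpr_size
        self.windows = {}                          # warp id -> (base, size)

    def configure(self, warp: int, base: int = 0, size: int = None):
        self.windows[warp] = (base, size if size is not None else len(self.gpr))

    def write(self, warp: int, reg: int, value: int):
        base, size = self.windows[warp]
        assert reg < size, "register outside this warp's designated portion"
        self.gpr[base + reg] = value

if __name__ == "__main__":
    sched = WarpScheduler(gpr_size=8)
    sched.configure(warp=0, base=0, size=4)        # warp 0: registers 0..3
    sched.configure(warp=1)                        # warp 1: the whole GPR
    sched.write(0, 2, 42)
    sched.write(1, 7, 7)
    print(sched.gpr)  # [0, 0, 42, 0, 0, 0, 0, 7]
```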
  • Publication number: 20220012053
    Abstract: A method of storing data in general purpose registers (GPRs) includes packing a tile of data items, comprising multiple channels, into GPRs. The tile of data items is read from memory; at least two channels of the data are stored in a first GPR, and at least two additional channels are stored in a second GPR. Auxiliary data is loaded into a third GPR. The auxiliary data and the tile data can then be used together to perform convolution operations. (An illustrative sketch follows this entry.)
    Type: Application
    Filed: September 27, 2021
    Publication date: January 13, 2022
    Inventors: Lin Chen, Zhou Hong, Yufei Zhang
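The channel packing above can be sketched in a few lines; packing exactly two channels per register and representing registers as Python lists are assumptions made for the example.

```python
# Pack a multi-channel tile so that two channels share each GPR, with
# auxiliary data kept in its own register.
def pack_tile(tile, channels_per_gpr=2):
    """tile[c] is the data for channel c; returns a list of packed GPRs."""
    gprs = []
    for i in range(0, len(tile), channels_per_gpr):
        packed = []
        for channel in tile[i:i + channels_per_gpr]:
            packed.extend(channel)                 # two channels per GPR
        gprs.append(packed)
    return gprs

if __name__ == "__main__":
    tile = [[1, 1], [2, 2], [3, 3], [4, 4]]        # 4 channels of a tile
    gprs = pack_tile(tile)
    aux_gpr = [9, 9]                               # auxiliary data, own register
    print(gprs, aux_gpr)  # [[1, 1, 2, 2], [3, 3, 4, 4]] [9, 9]
```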
  • Publication number: 20210398339
    Abstract: Methodologies and architectures are provided for inter-thread sharing of data in a general purpose register (GPR) of a multiprocessor apparatus. In the described embodiments, such data sharing is performed by a graphics processing unit (GPU) having at least one processing cluster that includes a plurality of processing cores (PCs) configured for parallel operation, each PC utilizing a dedicated portion of the GPR. The GPU further includes a shared memory for the cluster and a memory read/write hub, containing a crossbar switch, coupled to the GPR and the shared memory. A PC executes a move data instruction whose operands reference a destination portion of the GPR and a source portion assigned to the PC, retrieving data from the source portion. The memory read/write hub writes the data, via the crossbar switch, to the destination portion of the GPR without first writing the data to the shared memory. (An illustrative sketch follows this entry.)
    Type: Application
    Filed: September 1, 2021
    Publication date: December 23, 2021
    Inventors: Zhou HONG, Yufei ZHANG
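A behavioral sketch of the move-data path above, assuming the GPR is one array with a fixed-size slice per processing core; the crossbar is modeled as a direct copy that never touches shared memory.

```python
# One GPR with a dedicated slice per core; move copies GPR-to-GPR directly.
class GPRHub:
    def __init__(self, num_cores: int, regs_per_core: int):
        self.regs_per_core = regs_per_core
        self.gpr = [0] * (num_cores * regs_per_core)

    def move(self, src_core: int, src_reg: int, dst_core: int, dst_reg: int):
        # The crossbar routes the value directly; shared memory is bypassed.
        src = src_core * self.regs_per_core + src_reg
        dst = dst_core * self.regs_per_core + dst_reg
        self.gpr[dst] = self.gpr[src]

if __name__ == "__main__":
    hub = GPRHub(num_cores=2, regs_per_core=4)
    hub.gpr[1] = 123                      # core 0 wrote its register 1
    hub.move(src_core=0, src_reg=1, dst_core=1, dst_reg=0)
    print(hub.gpr)  # [0, 123, 0, 0, 123, 0, 0, 0]
```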
  • Publication number: 20210349597
    Abstract: The present disclosure discloses a position detection system comprising a power module, a detection module, a control module, and a pulse module. The power module is configured to supply power. The detection module comprises a key detection circuit and a force detection circuit. The control module is electrically connected to the power module and the detection module, and is configured to process the key detection data detected by the key detection circuit and the force detection data detected by the force detection circuit and to output a key signal and a force signal. The pulse module is connected to the control module, and is configured to acquire start references of the key signal and the force signal output by the control module, convert the key signal and the force signal to digital signals by form coding, and send the digital signals to a tablet. The present disclosure further provides a digital stylus using the same.
    Type: Application
    Filed: July 16, 2021
    Publication date: November 11, 2021
    Inventors: Yong Luo, Peng Cheng Zhang, Shi Hua Wu, Zhou Hong Wang
  • Publication number: 20210272232
    Abstract: The disclosed technology relates to graphics processing units (GPUs). In one aspect, a GPU includes a general purpose register (GPR) comprising registers, an arithmetic logic unit (ALU) that reads pixels of an image independently of a shared memory, and a level 1 (L1) cache storing pixels, together implementing a pixel mapping that maps the pixels read from the L1 cache into the registers of the GPR. The pixel mapping separates the pixels of an image into three regions, each containing a set of pixels. The first and second sets of pixels are loaded horizontally into the registers corresponding to two of the three regions, and the third set is loaded vertically into the registers corresponding to the third region. The registers of each region are loaded as a contiguous, ordered run of registers in the GPR. (An illustrative sketch follows this entry.)
    Type: Application
    Filed: May 21, 2021
    Publication date: September 2, 2021
    Inventors: Zhou Hong, Yufei Zhang
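A toy version of the three-region mapping above; how the image is split into regions is not modeled, and the horizontal/vertical distinction is shown simply as row-major versus column-major loading.

```python
# Two regions load row by row (horizontally), the third column by column
# (vertically); the result is one contiguous, ordered run of registers.
def load_regions(region_a, region_b, region_c):
    gpr = []
    for region in (region_a, region_b):              # horizontal loads
        for row in region:
            gpr.extend(row)
    for col in zip(*region_c):                       # vertical load
        gpr.extend(col)
    return gpr

if __name__ == "__main__":
    a = [[1, 2]]; b = [[3, 4]]; c = [[5, 6], [7, 8]]
    print(load_regions(a, b, c))  # [1, 2, 3, 4, 5, 7, 6, 8]
```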
  • Publication number: 20210264560
    Abstract: The disclosed technology generally relates to a graphics processing unit (GPU). In one aspect, a GPU includes a general purpose register (GPR) having registers, an arithmetic logic unit (ALU) configured to read pixels of an image independently of a shared memory, and a level 1 (L1) cache storing the pixels read by the ALU. The ALU can implement a pixel mapping by fetching a quad of pixels, comprising pixels of first, second, third, and fourth pixel types, from the L1 cache; grouping the pixels of the quad into four groups by pixel type; and, for each group, separating the group's pixels into three regions that each hold a set of pixels. The pixels of each group can then be loaded into the registers corresponding to the three regions. (An illustrative sketch follows this entry.)
    Type: Application
    Filed: May 13, 2021
    Publication date: August 26, 2021
    Inventors: Zhou Hong, Yufei Zhang
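The quad-grouping step above can be sketched as a bucketing pass; the quad layout and the type labels are assumptions, and the subsequent three-region split is only noted in a comment.

```python
# Pixels fetched as quads carry one of four pixel types; the mapping first
# buckets them by type before any per-group region loading.
from collections import defaultdict

def group_quads(quads):
    """Each quad is four (pixel_type, value) pairs; bucket values by type."""
    groups = defaultdict(list)
    for quad in quads:
        for pixel_type, value in quad:
            groups[pixel_type].append(value)
    return dict(groups)   # each group would then be split into three regions

if __name__ == "__main__":
    quads = [[("tl", 1), ("tr", 2), ("bl", 3), ("br", 4)],
             [("tl", 5), ("tr", 6), ("bl", 7), ("br", 8)]]
    print(group_quads(quads))
    # {'tl': [1, 5], 'tr': [2, 6], 'bl': [3, 7], 'br': [4, 8]}
```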
  • Patent number: 10726516
    Abstract: A GPU comprises: a GPR comprising registers; an L1 cache coupled to the GPR and configured to implement a pixel mapping by segregating the pixels of an image into regions, the regions comprising a first region of first pixels and a second region of second pixels, loading the first pixels into the GPR in a horizontal manner, and loading the second pixels into the GPR in a vertical manner; and an ALU configured to read the first pixels and the second pixels independently of a shared memory. (An illustrative sketch follows this entry.)
    Type: Grant
    Filed: October 11, 2018
    Date of Patent: July 28, 2020
    Assignee: Futurewei Technologies, Inc.
    Inventors: Zhou Hong, Yufei Zhang
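For contrast with the three-region mapping earlier in this listing, here is a toy version of this grant's two-region scheme; again, the region split itself is not modeled.

```python
# First pixels enter the GPR in horizontal (row-major) order, second
# pixels in vertical (column-major) order.
def load_two_regions(first_pixels, second_pixels):
    gpr = [px for row in first_pixels for px in row]          # horizontal
    gpr += [px for col in zip(*second_pixels) for px in col]  # vertical
    return gpr

if __name__ == "__main__":
    first = [[1, 2], [3, 4]]
    second = [[5, 6], [7, 8]]
    print(load_two_regions(first, second))  # [1, 2, 3, 4, 5, 7, 6, 8]
```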
  • Publication number: 20200118238
    Abstract: A GPU comprises: a GPR comprising registers; an L1 cache coupled to the GPR and configured to implement a pixel mapping by segregating the pixels of an image into regions, the regions comprising a first region of first pixels and a second region of second pixels, loading the first pixels into the GPR in a horizontal manner, and loading the second pixels into the GPR in a vertical manner; and an ALU configured to read the first pixels and the second pixels independently of a shared memory.
    Type: Application
    Filed: October 11, 2018
    Publication date: April 16, 2020
    Inventors: Zhou Hong, Yufei Zhang
  • Patent number: 10606335
    Abstract: A dynamic voltage frequency scaling (DVFS) system is provided. The DVFS system includes a computation unit, a power management unit (PMU), a hardware activity monitor (HAM), and a hardware voltage monitor (HVM). The HAM monitors the working status and temperature information of the computation unit and determines, according to the working status, the temperature information, and the previous determination result, whether to update the operating voltage and frequency of the computation unit; when it determines to update them, the HAM generates a first control signal to the PMU to calibrate the operating voltage and frequency. The HVM detects timing information of the computation unit and determines, according to the detected timing information, whether to fine-tune the operating voltage; when it determines to fine-tune the operating voltage, the HVM generates a second control signal to the PMU to do so. (An illustrative sketch follows this entry.)
    Type: Grant
    Filed: December 12, 2014
    Date of Patent: March 31, 2020
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventors: Deming Gu, Zhou Hong
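A hedged control-loop sketch of the HAM/HVM split above; every threshold, step size, and unit here is invented for illustration and does not come from the patent.

```python
# Coarse re-targeting from activity/temperature (HAM), then a fine voltage
# trim from observed timing margin (HVM).
def dvfs_step(state, activity, temp_c, timing_slack_ns):
    # HAM: decide whether to update voltage and frequency
    if activity > 0.9 and temp_c < 85:
        state["freq_mhz"] += 100; state["volt_mv"] += 25  # first control signal
    elif activity < 0.3 or temp_c >= 85:
        state["freq_mhz"] -= 100; state["volt_mv"] -= 25
    # HVM: fine-tune voltage only, from timing information
    if timing_slack_ns < 0.1:
        state["volt_mv"] += 5                             # second control signal
    elif timing_slack_ns > 0.5:
        state["volt_mv"] -= 5
    return state

if __name__ == "__main__":
    s = {"freq_mhz": 800, "volt_mv": 900}
    print(dvfs_step(s, activity=0.95, temp_c=70, timing_slack_ns=0.05))
    # {'freq_mhz': 900, 'volt_mv': 930}
```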
  • Patent number: 10489164
    Abstract: An apparatus for enqueuing kernels on the device side is introduced, incorporating at least an MXU (Memory Access Unit) and a CSP (Command Stream Processor). The CSP, after receiving a first command from the MXU, executes commands of a ring buffer, thereby enabling an EU (Execution Unit) to direct the MXU to allocate space of the ring buffer for a first hardware thread and subsequently to write second commands of the first hardware thread into the allocated space according to an instruction of a kernel. (An illustrative sketch follows this entry.)
    Type: Grant
    Filed: May 6, 2019
    Date of Patent: November 26, 2019
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventors: Fengxia Wu, Tian Shen, Zhou Hong, Yuanfeng Wang
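The enqueue flow above can be modeled with a small ring buffer; the fixed-size ring, its head/tail bookkeeping, and the string commands are illustrative assumptions standing in for the MXU/CSP hardware.

```python
# Allocate ring space for a hardware thread, write its commands into that
# space, then drain the buffer on the consumer (CSP-like) side.
class RingBuffer:
    def __init__(self, size: int):
        self.slots, self.head, self.tail = [None] * size, 0, 0

    def allocate(self, count: int) -> int:
        assert self.tail + count - self.head <= len(self.slots), "ring full"
        start, self.tail = self.tail, self.tail + count
        return start                       # space reserved for one hw thread

    def write(self, start: int, commands: list):
        for i, cmd in enumerate(commands):
            self.slots[(start + i) % len(self.slots)] = cmd

    def drain(self):
        while self.head < self.tail:       # consumer executes in order
            cmd = self.slots[self.head % len(self.slots)]
            self.head += 1
            yield cmd

if __name__ == "__main__":
    ring = RingBuffer(8)
    base = ring.allocate(2)                # allocate for hardware thread 0
    ring.write(base, ["dispatch kernel_a", "signal done"])
    print(list(ring.drain()))
```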
  • Patent number: 10394574
    Abstract: An apparatus for enqueuing kernels on the device side is introduced, incorporating at least an MXU (Memory Access Unit) and a CSP (Command Stream Processor). The CSP, after receiving a first command from the MXU, executes commands of a ring buffer, thereby enabling an EU (Execution Unit) to direct the MXU to allocate space of the ring buffer for a first hardware thread and subsequently to write second commands of the first hardware thread into the allocated space according to an instruction of a kernel.
    Type: Grant
    Filed: June 2, 2016
    Date of Patent: August 27, 2019
    Assignee: VIA ALLIANCE SEMICONDUCTOR CO., LTD.
    Inventors: Fengxia Wu, Tian Shen, Zhou Hong, Yuanfeng Wang