Patents by Inventor Guofang Jiao

Guofang Jiao has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11809221
    Abstract: An artificial intelligence chip and a data operation method are provided. The artificial intelligence chip receives a command carrying first data and address information and includes a chip memory, a computing processor, a base address register, and an extended address processor. The base address register is configured to access an extended address space in the chip memory. The extended address processor receives the command. The extended address processor determines an operation mode of the first data according to the address information. When the address information points to a first section of the extended address space, the extended address processor performs a first operation on the first data. When the address information points to a section other than the first section of the extended address space, the extended address processor notifies the computing processor of the operation mode and the computing processor performs a second operation on the first data.
    Type: Grant
    Filed: September 8, 2021
    Date of Patent: November 7, 2023
    Assignee: Shanghai Biren Technology Co., Ltd
    Inventors: Zhou Hong, Qin Zheng, ChengPing Luo, GuoFang Jiao, Song Zhao, XiangLiang Yu
  • Patent number: 11748077
    Abstract: The invention relates to a method for compiling code adapted for secondary offloads in a graphics processing unit (GPU). The method, performed by a processing unit, includes: reconstructing execution codes in a first kernel into a second kernel. The second kernel includes an operation table including entries, and computation codes. The computation codes include a portion of the execution codes, and synchronization hooks, and each synchronization hook includes information indicating one entry of the operation table. An order of the portion of the execution codes and the synchronization hooks in the computation codes matches an order of the execution codes in the first kernel, thereby enabling a compute unit (CU) in the GPU to execute the computation codes, and an engine in the GPU to instruct a component inside or outside of the GPU to complete a designated operation in accordance with content of each entry in the operation table.
    Type: Grant
    Filed: July 2, 2021
    Date of Patent: September 5, 2023
    Assignee: Shanghai Biren Technology Co., Ltd
    Inventors: HaiChuan Wang, Song Zhao, GuoFang Jiao, ChengPing Luo, Zhou Hong
  • Publication number: 20230267011
    Abstract: The invention relates to an apparatus for second offloads in a graphics processing unit (GPU). The apparatus includes an engine; and a compute unit (CU). The CU is arranged operably to: fetch execution codes; when each execution code is suitable to be executed by the CU, execute the execution code; and when each execution code is not suitable to be executed by the CU, generate a corresponding entry, and send a request with the corresponding entry to the engine for instructing the engine to allow a component inside or outside of the GPU to complete an operation in accordance with the corresponding entry.
    Type: Application
    Filed: April 21, 2023
    Publication date: August 24, 2023
    Applicant: Shanghai Biren Technology Co., Ltd
    Inventors: HaiChuan WANG, Song ZHAO, GuoFang JIAO, ChengPing LUO, Zhou HONG
  • Patent number: 11663044
    Abstract: The invention relates to an apparatus for second offloads in a graphics processing unit (GPU). The apparatus includes an engine; and a compute unit (CU). The engine is arranged operably to store an operation table including entries. The CU is arranged operably to fetch computation codes including execution codes, and synchronization requests; execute each execution code; and send requests to the engine in accordance with the synchronization requests for instructing the engine to allow components inside or outside of the GPU to complete operations in accordance with the entries of the operation table.
    Type: Grant
    Filed: July 2, 2021
    Date of Patent: May 30, 2023
    Assignee: Shanghai Biren Technology Co., Ltd
    Inventors: HaiChuan Wang, Song Zhao, GuoFang Jiao, ChengPing Luo, Zhou Hong
  • Publication number: 20220398102
    Abstract: An artificial intelligence chip and a data operation method are provided. The artificial intelligence chip receives a command carrying first data and address information and includes a chip memory, a computing processor, a base address register, and an extended address processor. The base address register is configured to access an extended address space in the chip memory. The extended address processor receives the command. The extended address processor determines an operation mode of the first data according to the address information. When the address information points to a first section of the extended address space, the extended address processor performs a first operation on the first data. When the address information points to a section other than the first section of the extended address space, the extended address processor notifies the computing processor of the operation mode and the computing processor performs a second operation on the first data.
    Type: Application
    Filed: September 8, 2021
    Publication date: December 15, 2022
    Applicant: Shanghai Biren Technology Co., Ltd
    Inventors: Zhou HONG, Qin ZHENG, ChengPing LUO, GuoFang JIAO, Song ZHAO, XiangLiang YU
  • Publication number: 20220164232
    Abstract: A method for managing resources, a computing device, and a computer-readable storage medium are provided. The method includes obtaining device information of multiple physical devices included in a computing node to confirm physical devices supporting a predetermined hardware resource management method; initializing at least one physical device among the physical devices supporting the predetermined hardware resource management method as a unified device view device; allocating a virtual storage address of the unified device view device, where the virtual storage address is mapped to a physical storage address of the physical device participating in the unified device view; transmitting data to the virtual storage address of the unified device view device; and issuing a computing task to the unified device view device via a task queue for using the physical device participating in the unified device view to execute the computing task.
    Type: Application
    Filed: November 4, 2021
    Publication date: May 26, 2022
    Applicant: Shanghai Biren Technology Co., Ltd
    Inventors: Long CHEN, HaiChuan WANG, GuoFang JIAO
  • Publication number: 20220129272
    Abstract: The invention relates to an apparatus for second offloads in a graphics processing unit (GPU). The apparatus includes an engine; and a compute unit (CU). The engine is arranged operably to store an operation table including entries. The CU is arranged operably to fetch computation codes including execution codes, and synchronization requests; execute each execution code; and send requests to the engine in accordance with the synchronization requests for instructing the engine to allow components inside or outside of the GPU to complete operations in accordance with the entries of the operation table.
    Type: Application
    Filed: July 2, 2021
    Publication date: April 28, 2022
    Applicant: Shanghai Biren Technology Co., Ltd
    Inventors: HaiChuan WANG, Song ZHAO, GuoFang JIAO, ChengPing LUO, Zhou HONG
  • Publication number: 20220129255
    Abstract: The invention relates to a method for compiling code adapted for secondary offloads in a graphics processing unit (GPU). The method, performed by a processing unit, includes: reconstructing execution codes in a first kernel into a second kernel. The second kernel includes an operation table including entries, and computation codes. The computation codes include a portion of the execution codes, and synchronization hooks, and each synchronization hook includes information indicating one entry of the operation table. An order of the portion of the execution codes and the synchronization hooks in the computation codes matches an order of the execution codes in the first kernel, thereby enabling a compute unit (CU) in the GPU to execute the computation codes, and an engine in the GPU to instruct a component inside or outside of the GPU to complete a designated operation in accordance with content of each entry in the operation table.
    Type: Application
    Filed: July 2, 2021
    Publication date: April 28, 2022
    Applicant: Shanghai Biren Technology Co., Ltd
    Inventors: HaiChuan WANG, Song ZHAO, GuoFang JIAO, ChengPing LUO, Zhou HONG
  • Publication number: 20220043688
    Abstract: Embodiments of this disclosure provide techniques for splitting a DAG computation model and constructing sub-DAG computation models for inter-node parallel processing. In particular, a method is provided where a plurality of processors split the DAG computation into a plurality of non-interdependent sub-nodes within each respective node of the DAG computation model. The plurality of processors includes at least two different processing unit types. The plurality of processors construct a plurality of sub-DAG computations, each sub-DAG computation including at least a non-interdependent sub-node from different nodes of the DAG computation. The plurality of processors process each of the plurality of sub-DAG computations in parallel.
    Type: Application
    Filed: April 28, 2019
    Publication date: February 10, 2022
    Inventors: Shouwen Lai, Guofang Jiao
  • Patent number: 10658335
    Abstract: An integrated circuit package and a system including the integrated circuit package as well as a process for assembling the integrated circuit package are provided. The integrated circuit package includes a first die manufactured on a first wafer utilizing a first node size, a second die manufactured on a second wafer utilizing a second node size, and a substrate coupled to the second die at a plurality of bump sites on a bottom surface of the second die. The first die may be mounted on a top surface of the second die utilizing a hybrid wafer bonding technique, micro bumps, or electrode-less plating.
    Type: Grant
    Filed: January 25, 2018
    Date of Patent: May 19, 2020
    Assignee: Futurewei Technologies, Inc.
    Inventors: Shiqun Gu, Yu Lin, Jinghua Zhu, Guofang Jiao
  • Patent number: 10558460
    Abstract: Systems and techniques are disclosed for general purpose register dynamic allocation based on latency associated with instructions in processor threads. A streaming processor can include general purpose registers configured to store data associated with threads, and a thread scheduler configured to receive allocation information for the general purpose registers, the information describing general purpose registers that are to be assigned as persistent general purpose registers (pGPRs) and volatile general purpose registers (vGPRs). The plurality of general purpose registers can be allocated according to the received information. The streaming processor can include the general purpose registers allocated according to the received information, the allocation being based on execution latencies of instructions included in the threads.
    Type: Grant
    Filed: December 14, 2016
    Date of Patent: February 11, 2020
    Assignee: QUALCOMM Incorporated
    Inventors: Yun Du, Liang Han, Lin Chen, Chihong Zhang, Hongjiang Shang, Jing Wu, Zilin Ying, Chun Yu, Guofang Jiao, Andrew Gruber, Eric Demers
  • Patent number: 10388060
    Abstract: According to one aspect of the present disclosure, there is provided a method that includes: determining a block size according to capabilities of a processor; dividing a first view into a plurality of first pixel blocks having the block size and a second view into a plurality of second pixel blocks having the block size; rasterizing a primitive object to produce a subset of the first pixel blocks for the first view and a subset of the second pixel blocks for the second view; and rendering the subsets of the first and second pixel blocks produced for the primitive object to produce a first image for the first view and a second image for the second view, where the rendering is interleaved between the subsets of the first and second pixel blocks occupied by the primitive object in the first and second views.
    Type: Grant
    Filed: August 28, 2017
    Date of Patent: August 20, 2019
    Assignee: Futurewei Technologies, Inc.
    Inventor: Guofang Jiao
  • Publication number: 20190179635
    Abstract: Aspects of the disclosure provide a circuit that includes a processing circuit, a memory directly coupled to the processing circuit via a dedicated data bus and a control circuit. The processing circuit includes a dot product engine. The dot product engine is configured to perform, in response to an instruction, an operation that includes dot product calculations on a weight input and a pixel sample input, and to store a result of the operation into the memory. The control circuit is configured to control the dot product engine to perform arithmetic operations that include the dot product calculations, and control the dot product engine to perform an accumulation of outputs of the dot product calculations and data received from the memory via the dedicated data bus to generate the result of the operation.
    Type: Application
    Filed: December 11, 2017
    Publication date: June 13, 2019
    Applicant: Futurewei Technologies, Inc.
    Inventors: Guofang Jiao, Zhou Hong, Chengkun Sun
  • Patent number: 10241799
    Abstract: Techniques are described for reordering commands to improve the speed at which at least one command stream may execute. Prior to distributing commands in the at least one command stream to multiple pipelines, a multimedia processor analyzes any inter-pipeline dependencies and determines the current execution state of the pipelines. The processor may, based on this information, reorder the at least one command stream by prioritizing commands that lack any current dependencies and therefore may be executed immediately by the appropriate pipeline. Such out of order execution of commands in the at least one command stream may increase the throughput of the multimedia processor by increasing the rate at which the command stream is executed.
    Type: Grant
    Filed: July 16, 2010
    Date of Patent: March 26, 2019
    Assignee: QUALCOMM Incorporated
    Inventors: Alexei V. Bourd, Guofang Jiao
  • Publication number: 20190066360
    Abstract: According to one aspect of the present disclosure, there is provided a method that includes: determining a block size according to capabilities of a processor; dividing a first view into a plurality of first pixel blocks having the block size and a second view into a plurality of second pixel blocks having the block size; rasterizing a primitive object to produce a subset of the first pixel blocks for the first view and a subset of the second pixel blocks for the second view; and rendering the subsets of the first and second pixel blocks produced for the primitive object to produce a first image for the first view and a second image for the second view, where the rendering is interleaved between the subsets of the first and second pixel blocks occupied by the primitive object in the first and second views.
    Type: Application
    Filed: August 28, 2017
    Publication date: February 28, 2019
    Inventor: Guofang Jiao
  • Publication number: 20180366442
    Abstract: An integrated circuit package and a system including the integrated circuit package as well as a process for assembling the integrated circuit package are provided. The integrated circuit package includes a first die manufactured on a first wafer utilizing a first node size, a second die manufactured on a second wafer utilizing a second node size, and a substrate coupled to the second die at a plurality of bump sites on a bottom surface of the second die. The first die may be mounted on a top surface of the second die utilizing a hybrid wafer bonding technique, micro bumps, or electrode-less plating.
    Type: Application
    Filed: January 25, 2018
    Publication date: December 20, 2018
    Inventors: Shiqun Gu, Yu Lin, Jinghua Zhu, Guofang Jiao
  • Publication number: 20180165092
    Abstract: Systems and techniques are disclosed for general purpose register dynamic allocation based on latency associated with instructions in processor threads. A streaming processor can include general purpose registers configured to store data associated with threads, and a thread scheduler configured to receive allocation information for the general purpose registers, the information describing general purpose registers that are to be assigned as persistent general purpose registers (pGPRs) and volatile general purpose registers (vGPRs). The plurality of general purpose registers can be allocated according to the received information. The streaming processor can include the general purpose registers allocated according to the received information, the allocation being based on execution latencies of instructions included in the threads.
    Type: Application
    Filed: December 14, 2016
    Publication date: June 14, 2018
    Inventors: Yun Du, Liang Han, Lin Chen, Chihong Zhang, Hongjiang Shang, Jing Wu, Zilin Ying, Chun Yu, Guofang Jiao, Andrew Gruber, Eric Demers
  • Patent number: 9852536
    Abstract: This disclosure describes techniques for performing high order filtering in a graphics processing unit (GPU). In examples of the disclosure, high order filtering may be implemented on a modified texture engine of a GPU using a single shader instruction. The modified texture engine may be configured to fetch all source pixels needed for the high order filtering and blend them together with pre-loaded filtering weights.
    Type: Grant
    Filed: August 5, 2014
    Date of Patent: December 26, 2017
    Assignee: QUALCOMM Incorporated
    Inventors: Liang Li, Guofang Jiao, Yunshan Kong, Javier Ignacio Girado
  • Patent number: 9799089
    Abstract: A method for processing data in a graphics processing unit including receiving a code block of instructions common to a plurality of groups of threads of a shader, executing the code block of instructions common to the plurality of groups of threads of the shader creating a result by a first group of threads of the plurality of groups of threads, storing the result of the code block of instructions common to the plurality of groups of threads of the shader in on-chip random access memory (RAM), the on-chip RAM accessible by each of the plurality of groups of threads, and upon a determination that storing the result of the code block of instructions common to the plurality of groups of threads of the shader has completed, returning the result of the code block of instructions common to the plurality of groups of threads of the shader from on-chip RAM.
    Type: Grant
    Filed: May 23, 2016
    Date of Patent: October 24, 2017
    Assignee: QUALCOMM Incorporated
    Inventors: Lin Chen, Yun Du, Andrew Evan Gruber, Guofang Jiao, Chun Yu, David Rigel Garcia Garcia
  • Patent number: 9697580
    Abstract: This disclosure describes an apparatus configured to process graphics data. The apparatus may include a fixed hardware pipeline configured to execute one or more functions on a current set of graphics data. The fixed hardware pipeline may include a plurality of stages including a bypassable portion of the plurality of stages. The apparatus may further include a shortcut circuit configured to route the current set of graphics data around the bypassable portion of the plurality of stages, and a controller positioned before the bypassable portion of the plurality of stages, the controller configured to selectively route the current set of graphics data to one of the shortcut circuit or the bypassable portion of the plurality of stages.
    Type: Grant
    Filed: November 10, 2014
    Date of Patent: July 4, 2017
    Assignee: QUALCOMM Incorporated
    Inventors: Liang Li, Andrew Evan Gruber, Guofang Jiao, Zhenyu Qi, Gregory Steve Pitarys, Scott William Nolan