Patents by Inventor Guofang Jiao
Guofang Jiao has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11809221
Abstract: An artificial intelligence chip and a data operation method are provided. The artificial intelligence chip receives a command carrying first data and address information and includes a chip memory, a computing processor, a base address register, and an extended address processor. The base address register is configured to access an extended address space in the chip memory. The extended address processor receives the command. The extended address processor determines an operation mode of the first data according to the address information. When the address information points to a first section of the extended address space, the extended address processor performs a first operation on the first data. When the address information points to a section other than the first section of the extended address space, the extended address processor notifies the computing processor of the operation mode and the computing processor performs a second operation on the first data.
Type: Grant
Filed: September 8, 2021
Date of Patent: November 7, 2023
Assignee: Shanghai Biren Technology Co., Ltd
Inventors: Zhou Hong, Qin Zheng, ChengPing Luo, GuoFang Jiao, Song Zhao, XiangLiang Yu
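The dispatch scheme this abstract describes can be sketched in a few lines. This is an illustrative model only, not the patented implementation: the section boundaries, address ranges, and operation names below are hypothetical.

```python
# Hypothetical section layout of the extended address space.
FIRST_SECTION = range(0x0000, 0x1000)
EXTENDED_SPACE = range(0x0000, 0x4000)

def dispatch(address: int, data: bytes) -> str:
    """Decide which unit handles the data, mimicking the abstract's flow."""
    if address not in EXTENDED_SPACE:
        raise ValueError("address outside the extended address space")
    if address in FIRST_SECTION:
        # The extended address processor performs the first operation itself.
        return "extended_address_processor: first operation"
    # Otherwise it notifies the computing processor, which performs
    # a second operation on the first data.
    return "computing_processor: second operation"
```

The key point the sketch captures is that the address information alone selects the operation mode, so the computing processor is only involved for addresses outside the first section.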
-
Patent number: 11748077
Abstract: The invention relates to a method for compiling code adapted for secondary offloads in a graphics processing unit (GPU). The method, performed by a processing unit, includes: reconstructing execution codes in a first kernel into a second kernel. The second kernel includes an operation table including entries, and computation codes. The computation codes include a portion of the execution codes, and synchronization hooks, and each synchronization hook includes information indicating one entry of the operation table. An order of the portion of the execution codes and the synchronization hooks in the computation codes matches an order of the execution codes in the first kernel, thereby enabling a compute unit (CU) in the GPU to execute the computation codes, and an engine in the GPU to instruct a component inside or outside of the GPU to complete a designated operation in accordance with content of each entry in the operation table.
Type: Grant
Filed: July 2, 2021
Date of Patent: September 5, 2023
Assignee: Shanghai Biren Technology Co., Ltd
Inventors: HaiChuan Wang, Song Zhao, GuoFang Jiao, ChengPing Luo, Zhou Hong
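The reconstruction step can be illustrated with a toy compiler pass. All names and the `cu_executable` flag are assumptions for illustration; the sketch only shows how execution codes that the CU cannot run become operation-table entries, with a synchronization hook left in their place so the original ordering is preserved.

```python
def reconstruct(first_kernel):
    """Split a kernel into (computation_codes, operation_table)."""
    computation_codes, operation_table = [], []
    for code in first_kernel:
        if code.get("cu_executable", True):
            # CU-executable codes stay in the computation codes as-is.
            computation_codes.append(code)
        else:
            # Non-CU codes become operation-table entries; a hook carrying
            # the entry index keeps the original ordering intact.
            operation_table.append(code)
            computation_codes.append(
                {"sync_hook": True, "entry": len(operation_table) - 1})
    return computation_codes, operation_table
```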
-
Publication number: 20230267011
Abstract: The invention relates to an apparatus for secondary offloads in a graphics processing unit (GPU). The apparatus includes an engine and a compute unit (CU). The CU is arranged operably to: fetch execution codes; when an execution code is suitable to be executed by the CU, execute the execution code; and when an execution code is not suitable to be executed by the CU, generate a corresponding entry, and send a request with the corresponding entry to the engine for instructing the engine to allow a component inside or outside of the GPU to complete an operation in accordance with the corresponding entry.
Type: Application
Filed: April 21, 2023
Publication date: August 24, 2023
Applicant: Shanghai Biren Technology Co., Ltd
Inventors: HaiChuan Wang, Song Zhao, GuoFang Jiao, ChengPing Luo, Zhou Hong
-
Patent number: 11663044
Abstract: The invention relates to an apparatus for secondary offloads in a graphics processing unit (GPU). The apparatus includes an engine and a compute unit (CU). The engine is arranged operably to store an operation table including entries. The CU is arranged operably to fetch computation codes including execution codes and synchronization requests; execute each execution code; and send requests to the engine in accordance with the synchronization requests for instructing the engine to allow components inside or outside of the GPU to complete operations in accordance with the entries of the operation table.
Type: Grant
Filed: July 2, 2021
Date of Patent: May 30, 2023
Assignee: Shanghai Biren Technology Co., Ltd
Inventors: HaiChuan Wang, Song Zhao, GuoFang Jiao, ChengPing Luo, Zhou Hong
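The runtime side of this engine/CU split can be modeled with a small sketch (names and data shapes hypothetical): the engine holds the operation table, while the CU executes plain codes itself and forwards each synchronization request to the engine, which completes the designated operation per the referenced table entry.

```python
class Engine:
    def __init__(self, operation_table):
        self.operation_table = operation_table
        self.completed = []

    def request(self, entry_index):
        # Stand-in for instructing a component inside or outside the GPU
        # to complete the operation described by this table entry.
        self.completed.append(self.operation_table[entry_index])

def run(computation_codes, engine):
    """CU loop: execute plain codes, hand sync requests to the engine."""
    executed = []
    for code in computation_codes:
        if code.get("sync_hook"):
            engine.request(code["entry"])
        else:
            executed.append(code["op"])
    return executed
```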
-
Publication number: 20220398102
Abstract: An artificial intelligence chip and a data operation method are provided. The artificial intelligence chip receives a command carrying first data and address information and includes a chip memory, a computing processor, a base address register, and an extended address processor. The base address register is configured to access an extended address space in the chip memory. The extended address processor receives the command. The extended address processor determines an operation mode of the first data according to the address information. When the address information points to a first section of the extended address space, the extended address processor performs a first operation on the first data. When the address information points to a section other than the first section of the extended address space, the extended address processor notifies the computing processor of the operation mode and the computing processor performs a second operation on the first data.
Type: Application
Filed: September 8, 2021
Publication date: December 15, 2022
Applicant: Shanghai Biren Technology Co., Ltd
Inventors: Zhou Hong, Qin Zheng, ChengPing Luo, GuoFang Jiao, Song Zhao, XiangLiang Yu
-
Publication number: 20220164232
Abstract: A method for managing resources, a computing device, and a computer-readable storage medium are provided. The method includes obtaining device information of multiple physical devices included in a computing node to confirm physical devices supporting a predetermined hardware resource management method; initializing at least one physical device among the physical devices supporting the predetermined hardware resource management method as a unified device view device; allocating a virtual storage address of the unified device view device, where the virtual storage address is mapped to a physical storage address of the physical device participating in the unified device view; transmitting data to the virtual storage address of the unified device view device; and issuing a computing task to the unified device view device via a task queue for using the physical device participating in the unified device view to execute the computing task.
Type: Application
Filed: November 4, 2021
Publication date: May 26, 2022
Applicant: Shanghai Biren Technology Co., Ltd
Inventors: Long Chen, HaiChuan Wang, GuoFang Jiao
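The flow above reads naturally as a small host-side API. The sketch below is a toy model under stated assumptions (the `supports_udv` capability flag, the even address split, and all names are invented for illustration): eligible devices are pooled behind one virtual view, and tasks submitted to the view's queue run on the participating devices.

```python
class UnifiedDeviceView:
    def __init__(self, devices):
        # Keep only devices supporting the (hypothetical) predetermined
        # hardware resource management method.
        self.devices = [d for d in devices if d["supports_udv"]]
        self.task_queue = []

    def allocate(self, size):
        """Map one virtual allocation onto per-device physical ranges."""
        per_dev = size // len(self.devices)
        return [(d["name"], per_dev) for d in self.devices]

    def submit(self, task):
        # Tasks go through the view's task queue; the participating
        # physical devices would execute them.
        self.task_queue.append(task)
```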
-
Publication number: 20220129255
Abstract: The invention relates to a method for compiling code adapted for secondary offloads in a graphics processing unit (GPU). The method, performed by a processing unit, includes: reconstructing execution codes in a first kernel into a second kernel. The second kernel includes an operation table including entries, and computation codes. The computation codes include a portion of the execution codes, and synchronization hooks, and each synchronization hook includes information indicating one entry of the operation table. An order of the portion of the execution codes and the synchronization hooks in the computation codes matches an order of the execution codes in the first kernel, thereby enabling a compute unit (CU) in the GPU to execute the computation codes, and an engine in the GPU to instruct a component inside or outside of the GPU to complete a designated operation in accordance with content of each entry in the operation table.
Type: Application
Filed: July 2, 2021
Publication date: April 28, 2022
Applicant: Shanghai Biren Technology Co., Ltd
Inventors: HaiChuan Wang, Song Zhao, GuoFang Jiao, ChengPing Luo, Zhou Hong
-
Publication number: 20220129272
Abstract: The invention relates to an apparatus for secondary offloads in a graphics processing unit (GPU). The apparatus includes an engine and a compute unit (CU). The engine is arranged operably to store an operation table including entries. The CU is arranged operably to fetch computation codes including execution codes and synchronization requests; execute each execution code; and send requests to the engine in accordance with the synchronization requests for instructing the engine to allow components inside or outside of the GPU to complete operations in accordance with the entries of the operation table.
Type: Application
Filed: July 2, 2021
Publication date: April 28, 2022
Applicant: Shanghai Biren Technology Co., Ltd
Inventors: HaiChuan Wang, Song Zhao, GuoFang Jiao, ChengPing Luo, Zhou Hong
-
Publication number: 20220043688
Abstract: Embodiments of this disclosure provide techniques for splitting a DAG computation model and constructing sub-DAG computation models for inter-node parallel processing. In particular, a method is provided where a plurality of processors split the DAG computation into a plurality of non-interdependent sub-nodes within each respective node of the DAG computation model. The plurality of processors includes at least two different processing unit types. The plurality of processors construct a plurality of sub-DAG computations, each sub-DAG computation including at least a non-interdependent sub-node from different nodes of the DAG computation. The plurality of processors process each of the plurality of sub-DAG computations in parallel.
Type: Application
Filed: April 28, 2019
Publication date: February 10, 2022
Inventors: Shouwen Lai, Guofang Jiao
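The splitting idea can be sketched as follows. This is a hedged illustration, not the claimed method: the round-robin split below is an assumed way to produce non-interdependent sub-nodes, and grouping the i-th sub-node of every node into sub-DAG i yields sub-DAGs that can run in parallel.

```python
def build_sub_dags(dag_nodes, num_splits):
    """dag_nodes: list of DAG nodes, each a list of independent work items.

    Returns num_splits sub-DAGs; sub-DAG i takes the i-th slice of every
    node, so each sub-DAG spans different nodes of the original DAG.
    """
    sub_dags = [[] for _ in range(num_splits)]
    for node in dag_nodes:
        for i in range(num_splits):
            # Round-robin the node's independent sub-nodes across sub-DAGs.
            sub_dags[i].append(node[i::num_splits])
    return sub_dags
```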
-
Patent number: 10658335
Abstract: An integrated circuit package and a system including the integrated circuit package as well as a process for assembling the integrated circuit package are provided. The integrated circuit package includes a first die manufactured on a first wafer utilizing a first node size, a second die manufactured on a second wafer utilizing a second node size, and a substrate coupled to the second die at a plurality of bump sites on a bottom surface of the second die. The first die may be mounted on a top surface of the second die utilizing a hybrid wafer bonding technique, micro bumps, or electrode-less plating.
Type: Grant
Filed: January 25, 2018
Date of Patent: May 19, 2020
Assignee: Futurewei Technologies, Inc.
Inventors: Shiqun Gu, Yu Lin, Jinghua Zhu, Guofang Jiao
-
Patent number: 10558460
Abstract: Systems and techniques are disclosed for dynamic allocation of general purpose registers based on the latency of instructions in processor threads. A streaming processor can include general purpose registers configured to store data associated with threads, and a thread scheduler configured to receive allocation information for the general purpose registers, the information describing which general purpose registers are to be assigned as persistent general purpose registers (pGPRs) and which as volatile general purpose registers (vGPRs). The general purpose registers can be allocated according to the received information, with the allocation based on the execution latencies of instructions included in the threads.
Type: Grant
Filed: December 14, 2016
Date of Patent: February 11, 2020
Assignee: QUALCOMM Incorporated
Inventors: Yun Du, Liang Han, Lin Chen, Chihong Zhang, Hongjiang Shang, Jing Wu, Zilin Ying, Chun Yu, Guofang Jiao, Andrew Gruber, Eric Demers
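A minimal sketch of such a latency-driven split, with a hypothetical cycle threshold (the patent does not specify one): registers whose producing instructions have long latencies become persistent GPRs, while short-latency ones become volatile GPRs that can be reclaimed sooner.

```python
LATENCY_THRESHOLD = 20  # cycles; an assumed cutoff for illustration

def allocate(instr_latencies):
    """Map register id -> 'pGPR' or 'vGPR' from producing-instruction latency."""
    return {reg: ("pGPR" if lat >= LATENCY_THRESHOLD else "vGPR")
            for reg, lat in instr_latencies.items()}
```

For example, a register written by a long-latency texture fetch would be held persistently, while one written by a short ALU op would be volatile.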
-
Patent number: 10388060
Abstract: According to one aspect of the present disclosure, there is provided a method that includes: determining a block size according to capabilities of a processor; dividing a first view into a plurality of first pixel blocks having the block size and a second view into a plurality of second pixel blocks having the block size; rasterizing a primitive object to produce a subset of the first pixel blocks for the first view and a subset of the second pixel blocks for the second view; and rendering the subsets of the first and second pixel blocks produced for the primitive object to produce a first image for the first view and a second image for the second view, where the rendering is interleaved between the subsets of the first and second pixel blocks occupied by the primitive object in the first and second views.
Type: Grant
Filed: August 28, 2017
Date of Patent: August 20, 2019
Assignee: Futurewei Technologies, Inc.
Inventor: Guofang Jiao
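The interleaving order can be illustrated with a toy scheduler (the view names and block representation are assumptions): blocks the primitive covers in the two views are rendered alternately, rather than one full view at a time.

```python
def interleave_render(first_blocks, second_blocks):
    """Yield (view, block) pairs, alternating between the two views."""
    order = []
    for a, b in zip(first_blocks, second_blocks):
        order.append(("view1", a))
        order.append(("view2", b))
    # Any leftover blocks from the longer subset follow at the end.
    longer = first_blocks if len(first_blocks) > len(second_blocks) else second_blocks
    tag = "view1" if longer is first_blocks else "view2"
    n = min(len(first_blocks), len(second_blocks))
    order.extend((tag, blk) for blk in longer[n:])
    return order
```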
-
Publication number: 20190179635
Abstract: Aspects of the disclosure provide a circuit that includes a processing circuit, a memory directly coupled to the processing circuit via a dedicated data bus, and a control circuit. The processing circuit includes a dot product engine. The dot product engine is configured to perform, in response to an instruction, an operation that includes dot product calculations on a weight input and a pixel sample input, and to store a result of the operation into the memory. The control circuit is configured to control the dot product engine to perform arithmetic operations that include the dot product calculations, and control the dot product engine to perform an accumulation of outputs of the dot product calculations and data received from the memory via the dedicated data bus to generate the result of the operation.
Type: Application
Filed: December 11, 2017
Publication date: June 13, 2019
Applicant: Futurewei Technologies, Inc.
Inventors: Guofang Jiao, Zhou Hong, Chengkun Sun
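The accumulate-from-memory step can be sketched numerically (the dictionary-as-memory and address scheme are illustrative stand-ins): the engine computes a dot product of weights and pixel samples, then adds the partial result previously stored in the directly coupled memory before writing the new result back.

```python
def dot_accumulate(weights, samples, memory, addr):
    """One pass of the dot-product engine with accumulation from memory."""
    dot = sum(w * s for w, s in zip(weights, samples))
    result = dot + memory.get(addr, 0)  # accumulate with stored partial sum
    memory[addr] = result               # store result back for the next pass
    return result
```

Repeated calls at the same address model multi-pass accumulation, e.g. summing partial products of a large convolution.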
-
Patent number: 10241799
Abstract: Techniques are described for reordering commands to improve the speed at which at least one command stream may execute. Prior to distributing commands in the at least one command stream to multiple pipelines, a multimedia processor analyzes any inter-pipeline dependencies and determines the current execution state of the pipelines. The processor may, based on this information, reorder the at least one command stream by prioritizing commands that lack any current dependencies and therefore may be executed immediately by the appropriate pipeline. Such out-of-order execution of commands in the at least one command stream may increase the throughput of the multimedia processor by increasing the rate at which the command stream is executed.
Type: Grant
Filed: July 16, 2010
Date of Patent: March 26, 2019
Assignee: QUALCOMM Incorporated
Inventors: Alexei V. Bourd, Guofang Jiao
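The prioritization rule can be sketched with a toy dependency model (commands and their dependency tracking below are assumptions, not the patent's representation): commands whose dependencies have already completed are moved ahead so a pipeline never sits idle behind a blocked command.

```python
def reorder(commands, completed):
    """commands: list of (name, deps). Ready commands come first, in order."""
    ready = [c for c in commands if set(c[1]) <= completed]
    blocked = [c for c in commands if not set(c[1]) <= completed]
    # Relative order within each group is preserved (stable reordering).
    return ready + blocked
```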
-
Publication number: 20190066360
Abstract: According to one aspect of the present disclosure, there is provided a method that includes: determining a block size according to capabilities of a processor; dividing a first view into a plurality of first pixel blocks having the block size and a second view into a plurality of second pixel blocks having the block size; rasterizing a primitive object to produce a subset of the first pixel blocks for the first view and a subset of the second pixel blocks for the second view; and rendering the subsets of the first and second pixel blocks produced for the primitive object to produce a first image for the first view and a second image for the second view, where the rendering is interleaved between the subsets of the first and second pixel blocks occupied by the primitive object in the first and second views.
Type: Application
Filed: August 28, 2017
Publication date: February 28, 2019
Inventor: Guofang Jiao
-
Publication number: 20180366442
Abstract: An integrated circuit package and a system including the integrated circuit package as well as a process for assembling the integrated circuit package are provided. The integrated circuit package includes a first die manufactured on a first wafer utilizing a first node size, a second die manufactured on a second wafer utilizing a second node size, and a substrate coupled to the second die at a plurality of bump sites on a bottom surface of the second die. The first die may be mounted on a top surface of the second die utilizing a hybrid wafer bonding technique, micro bumps, or electrode-less plating.
Type: Application
Filed: January 25, 2018
Publication date: December 20, 2018
Inventors: Shiqun Gu, Yu Lin, Jinghua Zhu, Guofang Jiao
-
Publication number: 20180165092
Abstract: Systems and techniques are disclosed for dynamic allocation of general purpose registers based on the latency of instructions in processor threads. A streaming processor can include general purpose registers configured to store data associated with threads, and a thread scheduler configured to receive allocation information for the general purpose registers, the information describing which general purpose registers are to be assigned as persistent general purpose registers (pGPRs) and which as volatile general purpose registers (vGPRs). The general purpose registers can be allocated according to the received information, with the allocation based on the execution latencies of instructions included in the threads.
Type: Application
Filed: December 14, 2016
Publication date: June 14, 2018
Inventors: Yun Du, Liang Han, Lin Chen, Chihong Zhang, Hongjiang Shang, Jing Wu, Zilin Ying, Chun Yu, Guofang Jiao, Andrew Gruber, Eric Demers
-
Patent number: 9852536
Abstract: This disclosure describes techniques for performing high order filtering in a graphics processing unit (GPU). In examples of the disclosure, high order filtering may be implemented on a modified texture engine of a GPU using a single shader instruction. The modified texture engine may be configured to fetch all source pixels needed for the high order filtering and blend them together with pre-loaded filtering weights.
Type: Grant
Filed: August 5, 2014
Date of Patent: December 26, 2017
Assignee: QUALCOMM Incorporated
Inventors: Liang Li, Guofang Jiao, Yunshan Kong, Javier Ignacio Girado
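The fetch-and-blend step can be illustrated in one dimension (tap count, clamping policy, and weights below are assumptions for the sketch): all source pixels in the filter footprint are gathered and blended with pre-loaded weights in a single step, mirroring the single-instruction flow the abstract describes.

```python
def filter_pixel(src, x, weights):
    """1-D example: blend len(weights) neighbors of src[x] with the weights."""
    half = len(weights) // 2
    # Fetch every source pixel in the footprint, clamping at the edges.
    taps = [src[min(max(x + i - half, 0), len(src) - 1)]
            for i in range(len(weights))]
    return sum(w * t for w, t in zip(weights, taps))
```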
-
Patent number: 9799089
Abstract: A method for processing data in a graphics processing unit including receiving a code block of instructions common to a plurality of groups of threads of a shader, executing the code block of instructions common to the plurality of groups of threads of the shader creating a result by a first group of threads of the plurality of groups of threads, storing the result of the code block of instructions common to the plurality of groups of threads of the shader in on-chip random access memory (RAM), the on-chip RAM accessible by each of the plurality of groups of threads, and upon a determination that storing the result of the code block of instructions common to the plurality of groups of threads of the shader has completed, returning the result of the code block of instructions common to the plurality of groups of threads of the shader from on-chip RAM.
Type: Grant
Filed: May 23, 2016
Date of Patent: October 24, 2017
Assignee: QUALCOMM Incorporated
Inventors: Lin Chen, Yun Du, Andrew Evan Gruber, Guofang Jiao, Chun Yu, David Rigel Garcia Garcia
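The sharing scheme can be sketched as follows. This is a toy model (the done flag and RAM layout are illustrative): only the first thread group executes the common code block, the result goes to on-chip RAM, and the other groups read it back once a completion flag indicates the store finished.

```python
on_chip_ram = {"result": None, "done": False}  # stand-in for on-chip RAM

def run_group(group_id, common_block):
    if group_id == 0 and not on_chip_ram["done"]:
        on_chip_ram["result"] = common_block()  # execute the common block once
        on_chip_ram["done"] = True              # signal the store completed
    # Every group returns the shared result once it is available.
    assert on_chip_ram["done"], "result not yet available"
    return on_chip_ram["result"]
```

Note the later groups never re-execute the block; they only read the stored result, which is the redundancy the method removes.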
-
Patent number: 9697580
Abstract: This disclosure describes an apparatus configured to process graphics data. The apparatus may include a fixed hardware pipeline configured to execute one or more functions on a current set of graphics data. The fixed hardware pipeline may include a plurality of stages including a bypassable portion of the plurality of stages. The apparatus may further include a shortcut circuit configured to route the current set of graphics data around the bypassable portion of the plurality of stages, and a controller positioned before the bypassable portion of the plurality of stages, the controller configured to selectively route the current set of graphics data to one of the shortcut circuit or the bypassable portion of the plurality of stages.
Type: Grant
Filed: November 10, 2014
Date of Patent: July 4, 2017
Assignee: QUALCOMM Incorporated
Inventors: Liang Li, Andrew Evan Gruber, Guofang Jiao, Zhenyu Qi, Gregory Steve Pitarys, Scott William Nolan
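The shortcut routing can be modeled with a small sketch (stage functions and the boolean routing decision are placeholders for the hardware controller): data always passes through the fixed stages, and the controller either sends it through the bypassable stages or around them via the shortcut.

```python
def run_pipeline(data, fixed_stages, bypassable_stages, bypass):
    """Run data through the pipeline; skip bypassable stages when bypass is set."""
    for stage in fixed_stages:
        data = stage(data)
    if not bypass:                 # controller routes into the bypassable stages
        for stage in bypassable_stages:
            data = stage(data)
    return data                    # with bypass set, the shortcut skips them
```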