Patents by Inventor Kaiyu Chen
Kaiyu Chen has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20220414053
Abstract: A processing apparatus includes a processing resource including a general-purpose parallel processing engine and a matrix accelerator. The matrix accelerator includes first circuitry to receive a command to perform operations associated with an instruction, second circuitry to configure the matrix accelerator according to a physical depth of a systolic array within the matrix accelerator and a logical depth associated with the instruction, third circuitry to read operands for the instruction from a register file associated with the systolic array, fourth circuitry to perform operations for the instruction via one or more passes through one or more physical pipeline stages of the systolic array based on a configuration performed by the second circuitry, and fifth circuitry to write output of the operations to the register file associated with the systolic array.
Type: Application
Filed: June 24, 2021
Publication date: December 29, 2022
Applicant: Intel Corporation
Inventors: Jorge Parra, Wei-yu Chen, Kaiyu Chen, Varghese George, Junjie Gu, Chandra Gurram, Guei-Yuan Lueh, Stephen Junkins, Subramaniam Maiyuran, Supratim Pal
-
Patent number: 10699362
Abstract: Embodiments provide support for divergent control flow in heterogeneous compute operations on a fused execution unit. One embodiment provides for a processing apparatus comprising a fused execution unit including multiple graphics execution units having a common instruction pointer; logic to serialize divergent function calls by the fused execution unit, the logic configured to compare a call target of execution channels within the fused execution unit and create multiple groups of channels, each group of channels associated with a single call target; and wherein the fused execution unit is to execute a first group of channels via a first execution unit and a second group of channels via a second execution unit.
Type: Grant
Filed: June 23, 2016
Date of Patent: June 30, 2020
Assignee: INTEL CORPORATION
Inventors: Pratik J. Ashar, Guei-Yuan Ken Lueh, Kaiyu Chen, Subramaniam Maiyuran, Brent A. Schwartz, Darin M. Starkey
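The grouping step in this abstract (compare the call target held by each execution channel, then form one group per distinct target) can be illustrated in software. The following is only a minimal Python sketch under stated assumptions: the function names `group_channels_by_target` and `assign_groups`, the optional `active_mask`, and the round-robin assignment of groups to execution units are all hypothetical and are not taken from the patent, which describes hardware logic.

```python
from collections import defaultdict


def group_channels_by_target(call_targets, active_mask=None):
    """Group channel indices by the call target each channel holds.

    call_targets: one target address per execution channel (illustrative).
    active_mask:  optional per-channel booleans; inactive channels are skipped.
    Returns a dict mapping each distinct target to its list of channels,
    in order of first appearance.
    """
    groups = defaultdict(list)
    for channel, target in enumerate(call_targets):
        if active_mask is not None and not active_mask[channel]:
            continue
        groups[target].append(channel)
    return dict(groups)


def assign_groups(groups, num_units=2):
    """Assign channel groups to execution units round-robin (an assumption).

    Each unit receives a list of (target, channels) pairs that it would
    execute one after another, which serializes the divergent calls.
    """
    schedule = [[] for _ in range(num_units)]
    for i, (target, channels) in enumerate(groups.items()):
        schedule[i % num_units].append((target, channels))
    return schedule
```

With four channels holding two distinct targets, `group_channels_by_target` yields two groups, and `assign_groups` places the first group on unit 0 and the second on unit 1, loosely mirroring the "first group via a first execution unit, second group via a second execution unit" wording.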
-
Patent number: 10692170
Abstract: Embodiments described herein provide a graphics processor in which dependency tracking hardware is simplified via the use of compiler provided software scoreboard information. In one embodiment the shader compiler for shader programs is configured to encode software scoreboard information into each instruction. Dependencies can be evaluated by the shader compiler and provided as scoreboard information with each instruction. The hardware can then use the provided information when scheduling instructions. In one embodiment, a software scoreboard synchronization instruction is provided to facilitate software dependency handling within a shader program. Using software to facilitate software dependency handling and synchronization can simplify hardware design, reducing the area consumed by the hardware. In one embodiment, dependencies can be evaluated by the shader compiler instead of the GPU hardware.
Type: Grant
Filed: June 11, 2019
Date of Patent: June 23, 2020
Assignee: Intel Corporation
Inventors: Subramaniam Maiyuran, Supratim Pal, Jorge E. Parra, Chandra S. Gurram, Ashwin J. Shivani, Ashutosh Garg, Brent A. Schwartz, Jorge F. Garcia Pabon, Darin M. Starkey, Shubh B. Shah, Guei-Yuan Lueh, Kaiyu Chen, Konrad Trifunovic, Buqi Cheng, Weiyu Chen
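To give a rough feel for what "dependencies evaluated by the compiler and provided with each instruction" could mean, here is a small Python sketch. It is an assumption-laden illustration, not the encoding used in the patent: the `Instr` type, the `annotate_dependencies` function, and the choice to track only read-after-write dependencies on a straight-line program are all hypothetical.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    """A toy instruction: one destination register and its source registers."""
    dst: str
    srcs: Tuple[str, ...]


def annotate_dependencies(program):
    """For each instruction, list the earlier instructions it must wait for.

    Only read-after-write dependencies are tracked: an instruction depends
    on the most recent earlier instruction that wrote one of its sources.
    The returned per-instruction lists stand in for scoreboard information
    that a compiler could attach to each instruction.
    """
    last_writer = {}   # register name -> index of its most recent writer
    annotations = []
    for index, instr in enumerate(program):
        deps = sorted({last_writer[r] for r in instr.srcs if r in last_writer})
        annotations.append(deps)
        last_writer[instr.dst] = index
    return annotations
```

A scheduler given these annotations would not need to rediscover the dependencies itself, which is the general idea the abstract credits with simplifying the hardware.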
-
Patent number: 10636112
Abstract: A processing apparatus is described. The apparatus includes a graphics processing unit (GPU), including a plurality of execution units to process graphics context data and a register file having a plurality of registers to store the graphics context data; and register renaming logic to facilitate re-use of register data by partitioning a first part and a second part, the first part to include thread-independent code and the second part to include thread-dependent code.
Type: Grant
Filed: March 28, 2018
Date of Patent: April 28, 2020
Assignee: Intel Corporation
Inventors: Slawomir Grajewski, Kaiyu Chen, Guei-Yuan Lueh, Subramaniam Maiyuran
-
Patent number: 10565670
Abstract: A processing apparatus is described. The apparatus includes a graphics processing unit (GPU), including a plurality of execution units to process graphics context data and a register file having a plurality of registers to store the graphics context data; and register renaming logic to facilitate dynamic renaming of the plurality of registers by logically partitioning the plurality of registers in the register file into a set of fixed registers and a set of shared registers.
Type: Grant
Filed: September 30, 2016
Date of Patent: February 18, 2020
Assignee: INTEL CORPORATION
Inventors: Kaiyu Chen, Guei-Yuan Lueh, Subramaniam Maiyuran
-
Patent number: 10515431
Abstract: Embodiments are generally directed to global optimal path determination utilizing parallel processing. An embodiment of an apparatus includes a central processing unit (CPU); a graphical processing unit (GPU), the GPU being capable of a plurality of processing threads; and a memory to store data for a system under evaluation, the system under evaluation including a set of nodes having a first endpoint, a second endpoint, and multiple paths between the first endpoint and the second endpoint. The apparatus is to determine a most energy efficient path between the first endpoint and the second endpoint utilizing parallel processing of a push and relabel graph cut algorithm. Performance of the push and relabel algorithm includes a plurality of process iterations, each process iteration including performance of a relabel operation, a push operation in a first direction, and a push operation in a second direction.
Type: Grant
Filed: December 12, 2017
Date of Patent: December 24, 2019
Assignee: INTEL CORPORATION
Inventors: Yuenian Yang, Kaiyu Chen, Andrew Kuzma
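For readers unfamiliar with push and relabel, the sketch below shows the two operations in their textbook sequential form, computing a maximum flow (equivalently, a minimum cut) between two endpoints. It is not the parallel GPU scheme of the abstract: there is no per-direction push phase and no energy model, and the function name `push_relabel_max_flow` and the dense capacity-matrix input are assumptions made for illustration.

```python
from collections import deque


def push_relabel_max_flow(capacity, source, sink):
    """Textbook sequential push-relabel on a dense capacity matrix.

    capacity[u][v] is the capacity of edge u -> v (0 if absent, no self-loops).
    Returns the value of the maximum flow from source to sink.
    """
    n = len(capacity)
    flow = [[0] * n for _ in range(n)]   # antisymmetric: flow[v][u] == -flow[u][v]
    height = [0] * n
    excess = [0] * n
    height[source] = n

    # Initial preflow: saturate every edge leaving the source.
    for v in range(n):
        cap = capacity[source][v]
        if cap > 0:
            flow[source][v] = cap
            flow[v][source] = -cap
            excess[v] += cap
            excess[source] -= cap

    active = deque(v for v in range(n)
                   if v not in (source, sink) and excess[v] > 0)

    while active:
        u = active.popleft()
        while excess[u] > 0:
            pushed = False
            for v in range(n):
                residual = capacity[u][v] - flow[u][v]
                # Push: move excess along a residual edge that goes downhill by 1.
                if residual > 0 and height[u] == height[v] + 1:
                    delta = min(excess[u], residual)
                    flow[u][v] += delta
                    flow[v][u] -= delta
                    excess[u] -= delta
                    was_inactive = excess[v] <= 0
                    excess[v] += delta
                    if was_inactive and v not in (source, sink):
                        active.append(v)
                    pushed = True
                    if excess[u] == 0:
                        break
            if not pushed:
                # Relabel: lift u just above its lowest residual neighbour.
                height[u] = 1 + min(height[v] for v in range(n)
                                    if capacity[u][v] - flow[u][v] > 0)

    return sum(flow[source][v] for v in range(n))
```

In the sequential version each node is discharged one at a time. The abstract instead describes iterations that each perform a relabel operation and pushes in two directions across many GPU threads; mapping the loop above onto that scheme is left out of this sketch.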
-
Publication number: 20190362460
Abstract: Embodiments described herein provide a graphics processor in which dependency tracking hardware is simplified via the use of compiler provided software scoreboard information. In one embodiment the shader compiler for shader programs is configured to encode software scoreboard information into each instruction. Dependencies can be evaluated by the shader compiler and provided as scoreboard information with each instruction. The hardware can then use the provided information when scheduling instructions. In one embodiment, a software scoreboard synchronization instruction is provided to facilitate software dependency handling within a shader program. Using software to facilitate software dependency handling and synchronization can simplify hardware design, reducing the area consumed by the hardware. In one embodiment, dependencies can be evaluated by the shader compiler instead of the GPU hardware.
Type: Application
Filed: June 11, 2019
Publication date: November 28, 2019
Applicant: Intel Corporation
Inventors: Subramaniam Maiyuran, Supratim Pal, Jorge E. Parra, Chandra S. Gurram, Ashwin J. Shivani, Ashutosh Garg, Brent A. Schwartz, Jorge F. Garcia Pabon, Darin M. Starkey, Shubh B. Shah, Guei-Yuan Lueh, Kaiyu Chen, Konrad Trifunovic, Buqi Cheng, Weiyu Chen
-
Publication number: 20190304056
Abstract: A processing apparatus is described. The apparatus includes a graphics processing unit (GPU), including a plurality of execution units to process graphics context data and a register file having a plurality of registers to store the graphics context data; and register renaming logic to facilitate re-use of register data by partitioning a first part and a second part, the first part to include thread-independent code and the second part to include thread-dependent code.
Type: Application
Filed: March 28, 2018
Publication date: October 3, 2019
Inventors: Slawomir Grajewski, Kaiyu Chen, Guei-Yuan Lueh, Subramaniam Maiyuran
-
Patent number: 10360654
Abstract: Embodiments described herein provide a graphics processor in which dependency tracking hardware is simplified via the use of compiler provided software scoreboard information. In one embodiment the shader compiler for shader programs is configured to encode software scoreboard information into each instruction. Dependencies can be evaluated by the shader compiler and provided as scoreboard information with each instruction. The hardware can then use the provided information when scheduling instructions. In one embodiment, a software scoreboard synchronization instruction is provided to facilitate software dependency handling within a shader program. Using software to facilitate software dependency handling and synchronization can simplify hardware design, reducing the area consumed by the hardware. In one embodiment, dependencies can be evaluated by the shader compiler instead of the GPU hardware.
Type: Grant
Filed: May 25, 2018
Date of Patent: July 23, 2019
Assignee: Intel Corporation
Inventors: Subramaniam Maiyuran, Supratim Pal, Jorge E. Parra, Chandra S. Gurram, Ashwin J. Shivani, Ashutosh Garg, Brent A. Schwartz, Jorge F. Garcia Pabon, Darin M. Starkey, Shubh B. Shah, Guei-Yuan Lueh, Kaiyu Chen, Konrad Trifunovic, Buqi Cheng, Weiyu Chen
-
Publication number: 20190180406
Abstract: Embodiments are generally directed to global optimal path determination utilizing parallel processing. An embodiment of an apparatus includes a central processing unit (CPU); a graphical processing unit (GPU), the GPU being capable of a plurality of processing threads; and a memory to store data for a system under evaluation, the system under evaluation including a set of nodes having a first endpoint, a second endpoint, and multiple paths between the first endpoint and the second endpoint. The apparatus is to determine a most energy efficient path between the first endpoint and the second endpoint utilizing parallel processing of a push and relabel graph cut algorithm. Performance of the push and relabel algorithm includes a plurality of process iterations, each process iteration including performance of a relabel operation, a push operation in a first direction, and a push operation in a second direction.
Type: Application
Filed: December 12, 2017
Publication date: June 13, 2019
Applicant: Intel Corporation
Inventors: Yuenian Yang, Kaiyu Chen, Andrew Kuzma
-
Patent number: 10282227
Abstract: Systems and methods may provide for inserting one or more preemption instructions while compiling a computer program. The one or more preemption instructions being inserted within a preemption window in the computer program reduces the number of live registers at each preemption instruction position. Further, the preemption instruction instructs which registers are to be saved at a particular program position, typically the registers that are live at that program position. The compiled program may be run in an execution unit. A preemption request may be made to the execution unit and executed at a next available preemption instruction in the program being run in the execution unit.
Type: Grant
Filed: November 18, 2014
Date of Patent: May 7, 2019
Assignee: Intel Corporation
Inventors: Guei-Yuan Lueh, Subramaniam Maiyuran, Wei-Yu Chen, Kaiyu Chen
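The link between preemption points and live registers can be made concrete with a standard backward liveness pass. The Python sketch below is an illustration under assumptions, not the patented method: it handles only straight-line code, and the names `live_before_each` and `pick_preemption_point`, the `(dst, srcs)` instruction shape, and the "fewest live registers wins" selection rule are hypothetical.

```python
def live_before_each(program, live_out=frozenset()):
    """Backward liveness over straight-line code.

    program:  list of (dst, srcs) pairs; dst may be None for instructions
              that write no register.
    live_out: registers still needed after the last instruction.
    Returns, for each instruction index, the set of registers that are live
    immediately before that instruction executes.
    """
    live = set(live_out)
    result = [None] * len(program)
    for i in range(len(program) - 1, -1, -1):
        dst, srcs = program[i]
        live.discard(dst)      # the value written here is not needed earlier
        live.update(srcs)      # the values read here must exist earlier
        result[i] = frozenset(live)
    return result


def pick_preemption_point(program, window_start, window_end,
                          live_out=frozenset()):
    """Pick the position in [window_start, window_end) with the fewest live
    registers, breaking ties by the earliest position.

    Returns (position, sorted list of registers to save at that position).
    """
    live = live_before_each(program, live_out)
    best = min(range(window_start, window_end),
               key=lambda i: (len(live[i]), i))
    return best, sorted(live[best])
```

The returned register list plays the role of the information a preemption instruction would carry about which registers to save at that program position.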
-
Publication number: 20180096446
Abstract: A processing apparatus is described. The apparatus includes a graphics processing unit (GPU), including a plurality of execution units to process graphics context data and a register file having a plurality of registers to store the graphics context data; and register renaming logic to facilitate dynamic renaming of the plurality of registers by logically partitioning the plurality of registers in the register file into a set of fixed registers and a set of shared registers.
Type: Application
Filed: September 30, 2016
Publication date: April 5, 2018
Inventors: Kaiyu Chen, Guei-Yuan Lueh, Subramaniam Maiyuran
-
Publication number: 20170372446
Abstract: Embodiments provide support for divergent control flow in heterogeneous compute operations on a fused execution unit. One embodiment provides for a processing apparatus comprising a fused execution unit including multiple graphics execution units having a common instruction pointer; logic to serialize divergent function calls by the fused execution unit, the logic configured to compare a call target of execution channels within the fused execution unit and create multiple groups of channels, each group of channels associated with a single call target; and wherein the fused execution unit is to execute a first group of channels via a first execution unit and a second group of channels via a second execution unit.
Type: Application
Filed: June 23, 2016
Publication date: December 28, 2017
Applicant: Intel Corporation
Inventors: Pratik J. Ashar, Guei-Yuan Ken Lueh, Kaiyu Chen, Subramaniam Maiyuran, Brent A. Schwartz, Darin M. Starkey
-
Publication number: 20160140686
Abstract: Systems and methods may provide for inserting one or more preemption instructions while compiling a computer program. The one or more preemption instructions being inserted within a preemption window in the computer program reduces the number of live registers at each preemption instruction position. Further, the preemption instruction instructs which registers are to be saved at a particular program position, typically the registers that are live at that program position. The compiled program may be run in an execution unit. A preemption request may be made to the execution unit and executed at a next available preemption instruction in the program being run in the execution unit.
Type: Application
Filed: November 18, 2014
Publication date: May 19, 2016
Applicant: INTEL CORPORATION
Inventors: GUEI-YUAN LUEH, SUBRAMANIAM MAIYURAN, WEI-YU CHEN, KAIYU CHEN
-
Publication number: 20130159680
Abstract: Methods, systems, and computer program products for the performance of arithmetic operations on large numbers. The addition of large numbers may be parallelized by adding corresponding sections of the numbers in parallel. The multiplication of large numbers may be accomplished by applying a multiplier to a multiplicand after the latter is divided into sections, where the multiplication of the sections is performed in parallel. Products for each section are saved in high and low order vectors, which may then be aligned and added. The comparison of two large numbers may be performed by comparing the numbers, section by section, in parallel. In an embodiment, these processes may be performed in a graphics processing unit (GPU) having multiple cores. In an embodiment, such a GPU may be integrated into a larger die that also incorporates one or more conventional central processing unit (CPU) cores.
Type: Application
Filed: December 19, 2011
Publication date: June 20, 2013
Inventors: Wei-yu Chen, Guei-yuan Lueh, Kaiyu Chen, Xiaozhu Kang
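The section-wise arithmetic in this abstract can be mimicked in plain Python to show which steps are independent per section. This is a sketch under assumptions rather than the claimed method: the 32-bit section width, the little-endian section order, the single-section multiplier, and all function names are illustrative choices, and list comprehensions stand in for the work that would run in parallel on a GPU.

```python
BITS = 32
MASK = (1 << BITS) - 1


def to_sections(value, count):
    """Split a non-negative integer into `count` little-endian 32-bit sections."""
    return [(value >> (BITS * i)) & MASK for i in range(count)]


def from_sections(sections):
    """Reassemble an integer from little-endian sections."""
    return sum(s << (BITS * i) for i, s in enumerate(sections))


def add_sections(a, b):
    """Add two equal-length section lists.

    Step 1 (independent per section, so parallelizable): section sums.
    Step 2 (sequential here): propagate carries between sections.
    """
    partial = [x + y for x, y in zip(a, b)]
    result, carry = [], 0
    for s in partial:
        s += carry
        result.append(s & MASK)
        carry = s >> BITS
    result.append(carry)
    return result


def mul_by_section(a, multiplier):
    """Multiply a sectioned multiplicand by a single-section multiplier.

    Each section product is split into a low-order and a high-order half;
    the high-order vector is shifted by one section to align it, then the
    two vectors are added.
    """
    products = [x * multiplier for x in a]          # independent per section
    low = [p & MASK for p in products] + [0]
    high = [0] + [p >> BITS for p in products]
    return add_sections(low, high)


def compare_sections(a, b):
    """Return 1, 0, or -1 for a > b, a == b, a < b (equal-length inputs).

    Per-section comparisons are independent; the most significant
    non-equal section decides the result.
    """
    per_section = [(x > y) - (x < y) for x, y in zip(a, b)]
    for c in reversed(per_section):
        if c:
            return c
    return 0
```

Carry propagation is written as a simple sequential loop for clarity; how carries are resolved efficiently in a parallel setting is outside the scope of this sketch.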
-
Patent number: 8084304
Abstract: A method for preventing gate oxide damage of a trench MOSFET during wafer processing while adding an ESD protection module atop the trench MOSFET includes: fabricating numerous trench MOSFETs on a wafer; adding a Si3N4 isolation layer, capable of preventing the LTO patterning process from damaging the gate oxide, atop the wafer; and adding numerous ESD protection modules atop the Si3N4 isolation layer.
Type: Grant
Filed: May 29, 2010
Date of Patent: December 27, 2011
Assignee: Alpha & Omega Semiconductor, Inc.
Inventors: Mengyu Pan, Zengyi He, Kaiyu Chen
-
Publication number: 20110018054
Abstract: A method and device structure are disclosed for preventing gate oxide damage of a trench MOSFET during wafer processing while adding an ESD protection module atop the trench MOSFET. The ESD protection module has a low temperature oxide (LTO) bottom layer whose patterning process is found to cause the gate oxide damage. The method includes: a) Fabricate numerous trench MOSFETs on a wafer. b) Add a Si3N4 isolation layer, capable of preventing the LTO patterning process from damaging the gate oxide, atop the wafer. c) Add numerous ESD protection modules atop the Si3N4 isolation layer. d) Remove those portions of the Si3N4 isolation layer that are not beneath the ESD protection modules. In one embodiment, hydrofluoric acid is used as a first etchant for patterning the LTO while hot phosphoric acid is used as a second etchant for removing portions of the Si3N4 isolation layer.
Type: Application
Filed: May 29, 2010
Publication date: January 27, 2011
Inventors: Mengyu Pan, Zengyi He, Kaiyu Chen
-
Patent number: 7728385
Abstract: A device structure is disclosed for preventing gate oxide damage of a trench MOSFET during wafer processing while adding an ESD protection module atop the trench MOSFET. The ESD protection module has a low temperature oxide (LTO) bottom layer whose patterning process was previously found to cause the gate oxide damage. The present invention structure includes a semiconductor substrate having an active area and a termination area; numerous trench MOSFET cells disposed in the active area; numerous electrostatic discharge (ESD) diodes disposed above the semiconductor substrate in the termination area; and an insulation layer comprising Oxide/Nitride/Oxide (ONO) sandwiched between the ESD diodes and the semiconductor substrate. In one embodiment, the active area does not contain the ONO insulation layer.
Type: Grant
Filed: July 22, 2009
Date of Patent: June 1, 2010
Assignee: Alpha & Omega Semiconductor, Ltd.
Inventors: Mengyu Pan, Zengyi He, Kaiyu Chen
-
Publication number: 20090278199
Abstract: A method and device structure are disclosed for preventing gate oxide damage of a trench MOSFET during wafer processing while adding an ESD protection module atop the trench MOSFET. The ESD protection module has a low temperature oxide (LTO) bottom layer whose patterning process is found to cause the gate oxide damage. The method includes: a) Fabricate numerous trench MOSFETs on a wafer. b) Add a Si3N4 isolation layer, capable of preventing the LTO patterning process from damaging the gate oxide, atop the wafer. c) Add numerous ESD protection modules atop the Si3N4 isolation layer. d) Remove those portions of the Si3N4 isolation layer that are not beneath the ESD protection modules. In one embodiment, hydrofluoric acid is used as a first etchant for patterning the LTO while hot phosphoric acid is used as a second etchant for removing portions of the Si3N4 isolation layer.
Type: Application
Filed: July 22, 2009
Publication date: November 12, 2009
Inventors: Mengyu Pan, Zengyi He, Kaiyu Chen
-
Patent number: 7585705
Abstract: A method and device structure are disclosed for preventing gate oxide damage of a trench MOSFET during wafer processing while adding an ESD protection module atop the trench MOSFET. The ESD protection module has a low temperature oxide (LTO) bottom layer whose patterning process is found to cause the gate oxide damage. The method includes: a) Fabricate numerous trench MOSFETs on a wafer. b) Add a Si3N4 isolation layer, capable of preventing the LTO patterning process from damaging the gate oxide, atop the wafer. c) Add numerous ESD protection modules atop the Si3N4 isolation layer. d) Remove those portions of the Si3N4 isolation layer that are not beneath the ESD protection modules. In one embodiment, hydrofluoric acid is used as a first etchant for patterning the LTO while hot phosphoric acid is used as a second etchant for removing portions of the Si3N4 isolation layer.
Type: Grant
Filed: November 29, 2007
Date of Patent: September 8, 2009
Assignee: Alpha & Omega Semiconductor, Inc.
Inventors: Mengyu Pan, Zengyi He, Kaiyu Chen