Patents by Inventor Kaiyu Chen

Kaiyu Chen has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

32-BIT CHANNEL-ALIGNED INTEGER MULTIPLICATION VIA MULTIPLE MULTIPLIERS PER-CHANNEL

Publication number: 20250037347

Abstract: Described herein is a graphics processor comprising an instruction cache and a plurality of processing elements coupled with the instruction cache. The processing elements include functional units configured to provide an integer pipeline that executes instructions performing operations on integer data elements. The integer pipeline includes a first multiplier and a second multiplier, both configured to execute operations for a single instruction.

Type: Application

Filed: July 25, 2023

Publication date: January 30, 2025

Applicant: Intel Corporation

Inventors: Jiasheng Chen, Supratim Pal, Kevin Hurd, Jorge E. Parra Osorio, Christopher Spencer, Takashi Nakagawa, Guei-Yuan Lueh, Pradeep K. Golconda, James Valerio, Mukundan Swaminathan, Nicholas Murphy, Clifford Gibson, Li-An Tang, Fangwen Fu, Kaiyu Chen, Buqi Cheng
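The abstract does not say how the two multipliers divide the work. One common approach, assumed here purely for illustration, is to split one 32-bit operand into 16-bit halves, give each half to its own 32x16 multiplier, and shift-add the two partial products. The function names and the 32x16 multiplier width are hypothetical, and only unsigned operands are modeled.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def multiplier_32x16(a: int, b16: int) -> int:
    """Hypothetical hardware unit: unsigned 32-bit x 16-bit -> up to 48-bit product."""
    assert 0 <= a <= MASK32 and 0 <= b16 <= MASK16
    return a * b16


def mul32_channel(a: int, b: int) -> tuple:
    """Unsigned 32 x 32 multiply for one channel using two 32x16 multipliers.

    Returns (low 32 bits, full 64-bit product)."""
    p_lo = multiplier_32x16(a, b & MASK16)          # first multiplier: low half of b
    p_hi = multiplier_32x16(a, (b >> 16) & MASK16)  # second multiplier: high half of b
    full = p_lo + (p_hi << 16)                      # align partial products and add
    return full & MASK32, full


def simd_mul32_lo(xs, ys):
    """Channel-aligned multiply: each channel keeps the low 32 bits of its product."""
    return [mul32_channel(a, b)[0] for a, b in zip(xs, ys)]
```

Keeping only the low 32 bits models a channel-aligned result that fits in the same 32-bit lane as its inputs.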
Systolic array of arbitrary physical and logical depth

Patent number: 12174783

Abstract: A processing apparatus includes a processing resource including a general-purpose parallel processing engine and a matrix accelerator. The matrix accelerator includes first circuitry to receive a command to perform operations associated with an instruction, second circuitry to configure the matrix accelerator according to a physical depth of a systolic array within the matrix accelerator and a logical depth associated with the instruction, third circuitry to read operands for the instruction from a register file associated with the systolic array, fourth circuitry to perform operations for the instruction via one or more passes through one or more physical pipeline stages of the systolic array based on a configuration performed by the second circuitry, and fifth circuitry to write output of the operations to the register file associated with the systolic array.

Type: Grant

Filed: June 24, 2021

Date of Patent: December 24, 2024

Assignee: Intel Corporation

Inventors: Jorge Parra, Wei-yu Chen, Kaiyu Chen, Varghese George, Junjie Gu, Chandra Gurram, Guei-Yuan Lueh, Stephen Junkins, Subramaniam Maiyuran, Supratim Pal
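The abstract separates the systolic array's physical depth (pipeline stages in hardware) from the logical depth an instruction asks for, and bridges the two with multiple passes. A minimal sketch of that idea, assuming each stage contributes one multiply-accumulate step of a dot product and the running sum is fed back in for the next pass; `systolic_dot` and `systolic_matmul` are hypothetical names, and register-file reads and writes are not modeled.

```python
from math import ceil


def systolic_dot(a_row, b_col, physical_depth):
    """Reduce a dot product of logical depth len(a_row) through a pipeline of
    `physical_depth` multiply-accumulate stages, taking as many passes as needed."""
    logical_depth = len(a_row)
    if len(b_col) != logical_depth or physical_depth < 1:
        raise ValueError("operand length mismatch or empty pipeline")
    passes = ceil(logical_depth / physical_depth)
    acc = 0  # partial sum carried from one pass into the next
    for p in range(passes):
        for stage in range(physical_depth):
            k = p * physical_depth + stage
            if k < logical_depth:  # stages past the logical depth just forward acc
                acc += a_row[k] * b_col[k]
    return acc, passes


def systolic_matmul(a, b, physical_depth):
    """C = A x B with every output element produced by systolic_dot."""
    cols = [[row[j] for row in b] for j in range(len(b[0]))]
    return [[systolic_dot(r, c, physical_depth)[0] for c in cols] for r in a]
```

With a physical depth of 2, a logical depth of 5 needs three passes; the result is the same as a single pass through a deeper array.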
SILICON-CARBIDE-BASED MOSFET DEVICE AND METHOD FOR MANUFACTURING SAME

Publication number: 20240222491

Abstract: A SiC-based MOSFET device and a method for manufacturing the same. The layout of the SiC-based MOSFET device is optimized to keep the JFET region while introducing shielding regions that extend into it, thereby largely retaining the current-flow area of the JFET region. Each shielding region is connected to a respective well region and extends into the JFET region along a diagonal direction of the cellular structure, effectively shielding high-electric-field regions when the device is reverse biased and significantly enhancing the device's reliability. The shielding regions and the well regions are formed simultaneously, so no additional process step is required and manufacturing complexity and cost do not increase. This approach achieves low on-resistance and prevents the loss of reliability that occurs when the electric field strength at the bottom of the gate oxide layer exceeds the critical breakdown field strength of the gate oxide layer.

Type: Application

Filed: December 20, 2023

Publication date: July 4, 2024

Applicant: Alkaid-Semi Technologies (Shanghai) Co., Ltd

Inventors: Kaiyu Chen, Xiaowen Wang

CROSS-THREAD REGISTER SHARING FOR MATRIX MULTIPLICATION COMPUTE

Publication number: 20240168807

Abstract: An apparatus to facilitate cross-thread register sharing for matrix multiplication compute is disclosed. The apparatus includes matrix acceleration hardware comprising a plurality of data processing units, wherein each of the data processing units is to: receive a decoded instruction for a first thread having a first register space, wherein the decoded instruction is for a matrix multiplication operation and comprises an indication to use a second register space of a second thread for an operand of the decoded instruction; access the second register space of the second thread to obtain data for the operand; and perform the matrix multiplication operation for the first thread using the operand data from the second register space of the second thread.

Type: Application

Filed: November 18, 2022

Publication date: May 23, 2024

Applicant: Intel Corporation

Inventors: Jorge Eduardo Parra Osorio, Guei-Yuan Lueh, Maxim Kazakov, Fangwen Fu, Supratim Pal, Kaiyu Chen
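A toy model of the cross-thread operand access described above. It assumes a register space can be modeled as a dictionary from register names to matrices, and that the instruction's cross-thread indication is a thread ID attached to operand B; `Thread`, `MatMulInstr`, `src_b_thread`, and `execute` are hypothetical names, not Intel's instruction encoding.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

Matrix = List[List[int]]


@dataclass
class Thread:
    tid: int
    regs: Dict[str, Matrix] = field(default_factory=dict)  # this thread's register space


@dataclass
class MatMulInstr:
    dst: str
    src_a: str
    src_b: str
    src_b_thread: Optional[int] = None  # hypothetical flag: read B from another thread


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def execute(instr: MatMulInstr, thread: Thread, threads: Dict[int, Thread]) -> None:
    a = thread.regs[instr.src_a]
    # Cross-thread operand: fetch B from the named thread's register space.
    owner = thread if instr.src_b_thread is None else threads[instr.src_b_thread]
    b = owner.regs[instr.src_b]
    thread.regs[instr.dst] = matmul(a, b)  # the result stays in the issuing thread
```

In this sketch only thread 1 holds B; thread 0 reads it during the multiply without keeping its own copy.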
SUPER JUNCTION TRENCH MOSFET AND METHOD FOR PREPARING SAME

Publication number: 20240145534

Abstract: A method for preparing a super junction trench MOSFET comprises: providing a substrate and forming a first trench in the substrate; depositing a first-stage epitaxial portion in the first trench while supplying a dopant gas and an etching gas, then stopping the supply of the dopant gas and the etching gas and performing an epitaxial process, whose high-temperature environment diffuses impurities from the first-stage epitaxial portion toward an upper portion of the first trench to form a second-stage epitaxial portion with a graded concentration; forming a well region, a trench gate, and an active region in the substrate around the first trench; forming an interlayer dielectric layer covering the column, the trench gate, and the active region; and electrically leading out the column, the trench gate, and the active region.

Type: Application

Filed: October 31, 2023

Publication date: May 2, 2024

Applicant: Alkaid-Semi Technologies (Shanghai) Co., Ltd

Inventor: Kaiyu Chen

SYSTOLIC ARRAY OF ARBITRARY PHYSICAL AND LOGICAL DEPTH

Publication number: 20220414053

Abstract: A processing apparatus includes a processing resource including a general-purpose parallel processing engine and a matrix accelerator. The matrix accelerator includes first circuitry to receive a command to perform operations associated with an instruction, second circuitry to configure the matrix accelerator according to a physical depth of a systolic array within the matrix accelerator and a logical depth associated with the instruction, third circuitry to read operands for the instruction from a register file associated with the systolic array, fourth circuitry to perform operations for the instruction via one or more passes through one or more physical pipeline stages of the systolic array based on a configuration performed by the second circuitry, and fifth circuitry to write output of the operations to the register file associated with the systolic array.

Type: Application

Filed: June 24, 2021

Publication date: December 29, 2022

Applicant: Intel Corporation

Inventors: Jorge Parra, Wei-yu Chen, Kaiyu Chen, Varghese George, Junjie Gu, Chandra Gurram, Guei-Yuan Lueh, Stephen Junkins, Subramaniam Maiyuran, Supratim Pal

Divergent control flow for fused EUs

Patent number: 10699362

Abstract: Embodiments provide support for divergent control flow in heterogeneous compute operations on a fused execution unit. One embodiment provides a processing apparatus comprising a fused execution unit that includes multiple graphics execution units sharing a common instruction pointer; and logic to serialize divergent function calls by the fused execution unit, the logic configured to compare the call targets of execution channels within the fused execution unit and create multiple groups of channels, each group associated with a single call target; wherein the fused execution unit is to execute a first group of channels via a first execution unit and a second group of channels via a second execution unit.

Type: Grant

Filed: June 23, 2016

Date of Patent: June 30, 2020

Assignee: Intel Corporation

Inventors: Pratik J. Ashar, Guei-Yuan Ken Lueh, Kaiyu Chen, Subramaniam Maiyuran, Brent A. Schwartz, Darin M. Starkey
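The grouping step in the abstract (compare call targets across channels, form one group per target, run groups on different EUs of the fused unit) can be sketched as follows. The round-robin assignment of groups to EUs and the `(step, eu, target, channels)` tuples are assumptions made for illustration; the abstract only states that a first group runs on a first EU and a second group on a second EU.

```python
def group_by_call_target(call_targets):
    """Map each distinct call target to the channels that call it.
    A target of None marks an inactive (masked-off) channel."""
    groups = {}
    for channel, target in enumerate(call_targets):
        if target is not None:
            groups.setdefault(target, []).append(channel)
    return groups


def schedule_divergent_call(call_targets, num_eus=2):
    """Serialize a divergent call on a fused EU: each group has exactly one
    call target; groups are dealt round-robin to the fused EUs (assumed policy)."""
    schedule = []
    for i, (target, channels) in enumerate(group_by_call_target(call_targets).items()):
        schedule.append((i // num_eus, i % num_eus, target, channels))  # (step, eu, target, channels)
    return schedule
```

With three distinct targets and two EUs, the first two groups run side by side and the third follows in a second step.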
Software scoreboard information and synchronization

Patent number: 10692170

Abstract: Embodiments described herein provide a graphics processor in which dependency-tracking hardware is simplified through the use of compiler-provided software scoreboard information. In one embodiment, the shader compiler is configured to encode software scoreboard information into each instruction: dependencies are evaluated by the shader compiler instead of the GPU hardware and provided as scoreboard information with each instruction, and the hardware then uses that information when scheduling instructions. In one embodiment, a software scoreboard synchronization instruction is provided to facilitate dependency handling within a shader program. Handling dependencies and synchronization in software can simplify the hardware design and reduce the area the hardware consumes.

Type: Grant

Filed: June 11, 2019

Date of Patent: June 23, 2020

Assignee: Intel Corporation

Inventors: Subramaniam Maiyuran, Supratim Pal, Jorge E. Parra, Chandra S. Gurram, Ashwin J. Shivani, Ashutosh Garg, Brent A. Schwartz, Jorge F. Garcia Pabon, Darin M. Starkey, Shubh B. Shah, Guei-Yuan Lueh, Kaiyu Chen, Konrad Trifunovic, Buqi Cheng, Weiyu Chen
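A simplified sketch of the compiler side of a software scoreboard: for each instruction in straight-line code, the compiler records how many instructions back its possibly unfinished producers sit, so hardware could stall on those distances instead of tracking dependencies itself. The distance encoding, the `window` cutoff, and the `Instr` type are assumptions; write-after-read hazards and the synchronization instruction are not modeled.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    op: str
    dst: str
    srcs: Tuple[str, ...]


def annotate_scoreboard(program, window=8):
    """Attach to each instruction the distances back to the earlier instructions
    it must wait for (RAW on sources, WAW on the destination). Producers more
    than `window` instructions back are assumed to have completed already."""
    annotated = []
    last_writer = {}  # register -> index of its most recent producer
    for i, ins in enumerate(program):
        deps = set()
        for reg in ins.srcs + (ins.dst,):
            j = last_writer.get(reg)
            if j is not None and i - j <= window:
                deps.add(i - j)
        annotated.append((ins, tuple(sorted(deps))))
        last_writer[ins.dst] = i
    return annotated
```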
Graphics processor register data re-use mechanism

Patent number: 10636112

Abstract: A processing apparatus is described. The apparatus includes a graphics processing unit (GPU) including a plurality of execution units to process graphics context data and a register file having a plurality of registers to store the graphics context data; and register renaming logic to facilitate re-use of register data by partitioning into a first part and a second part, the first part to include thread-independent code and the second part to include thread-dependent code.

Type: Grant

Filed: March 28, 2018

Date of Patent: April 28, 2020

Assignee: Intel Corporation

Inventors: Slawomir Grajewski, Kaiyu Chen, Guei-Yuan Lueh, Subramaniam Maiyuran
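The abstract does not specify what is partitioned; this sketch assumes straight-line code in SSA form (each register written once) and classifies instructions by taint from a thread-ID register. Values computed by the thread-independent part are identical for every thread, which is presumably what makes their register data reusable. The function name, the `(dst, srcs)` encoding, and the `tid` register name are hypothetical.

```python
def partition_thread_independent(program, thread_id_regs=("tid",)):
    """Split straight-line SSA code, given as (dst, srcs) pairs, into a
    thread-independent part and a thread-dependent part.

    An instruction is thread-dependent if any source is a thread-ID register or
    the result of a thread-dependent instruction. Because each register is
    written once (SSA assumption), running the independent part ahead of the
    dependent part preserves every data dependency."""
    tainted = set(thread_id_regs)
    independent, dependent = [], []
    for dst, srcs in program:
        if any(s in tainted for s in srcs):
            tainted.add(dst)
            dependent.append((dst, srcs))
        else:
            independent.append((dst, srcs))
    return independent, dependent
```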
Graphics processor register renaming mechanism

Patent number: 10565670

Abstract: A processing apparatus is described. The apparatus includes a graphics processing unit (GPU), including a plurality of execution units to process graphics context data and a register file having a plurality of registers to store the graphics context data; and register renaming logic to facilitate dynamic renaming of the plurality of registers by logically partitioning the plurality of registers in the register file into a set of fixed registers and a set of shared registers.

Type: Grant

Filed: September 30, 2016

Date of Patent: February 18, 2020

Assignee: Intel Corporation

Inventors: Kaiyu Chen, Guei-Yuan Lueh, Subramaniam Maiyuran
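A hypothetical model of a register file logically split into fixed and shared registers: each thread owns a fixed block, and logical registers beyond that block are renamed onto a shared pool on first use. The class name, the first-come allocation order, and raising an error when the pool runs out are illustrative assumptions, not details from the abstract.

```python
class RenamingRegisterFile:
    """Hypothetical model: `num_threads` threads each own `fixed_per_thread`
    physical registers; the remaining registers form a shared pool that
    higher-numbered logical registers are renamed onto on first use."""

    def __init__(self, total_regs, num_threads, fixed_per_thread):
        shared_start = num_threads * fixed_per_thread
        if shared_start > total_regs:
            raise ValueError("fixed partition exceeds the register file")
        self.fixed = fixed_per_thread
        self.free_shared = list(range(shared_start, total_regs))
        self.rename_map = {}  # (thread, logical reg) -> physical reg in the shared pool

    def physical(self, thread, reg):
        if reg < self.fixed:  # fixed register: static mapping into the thread's block
            return thread * self.fixed + reg
        key = (thread, reg)
        if key not in self.rename_map:
            if not self.free_shared:
                raise RuntimeError("shared register pool exhausted")
            self.rename_map[key] = self.free_shared.pop(0)
        return self.rename_map[key]

    def release(self, thread):
        """Return every shared register renamed for `thread` to the pool."""
        for key in [k for k in self.rename_map if k[0] == thread]:
            self.free_shared.append(self.rename_map.pop(key))
```

Releasing a finished thread's shared registers lets other threads rename onto them, which is the flexibility a fixed-only split would lack.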
Global optimal path determination utilizing parallel processing

Patent number: 10515431

Abstract: Embodiments are generally directed to global optimal path determination utilizing parallel processing. An embodiment of an apparatus includes a central processing unit (CPU); a graphics processing unit (GPU) capable of running a plurality of processing threads; and a memory to store data for a system under evaluation, the system under evaluation including a set of nodes having a first endpoint, a second endpoint, and multiple paths between the first endpoint and the second endpoint. The apparatus is to determine the most energy-efficient path between the first endpoint and the second endpoint through parallel processing of a push-relabel graph cut algorithm. Performing the push-relabel algorithm involves a plurality of process iterations, each including a relabel operation, a push operation in a first direction, and a push operation in a second direction.

Type: Grant

Filed: December 12, 2017

Date of Patent: December 24, 2019

Assignee: Intel Corporation

Inventors: Yuenian Yang, Kaiyu Chen, Andrew Kuzma
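The abstract relies on the push-relabel algorithm for max-flow/min-cut. The sketch below is the standard sequential FIFO variant on a dense capacity matrix, shown only to make the push and relabel operations concrete; it is not the patent's parallel GPU scheme, does not split pushes into two directions per iteration, and does not build the graph for a system under evaluation.

```python
from collections import deque


def push_relabel_max_flow(capacity, source, sink):
    """Sequential FIFO push-relabel on a dense capacity matrix.
    Returns the max-flow value, which equals the min-cut capacity."""
    n = len(capacity)
    flow = [[0] * n for _ in range(n)]
    height = [0] * n
    excess = [0] * n
    height[source] = n
    # Preflow: saturate every edge leaving the source.
    for v in range(n):
        if capacity[source][v] > 0:
            flow[source][v] = capacity[source][v]
            flow[v][source] = -capacity[source][v]
            excess[v] += capacity[source][v]
            excess[source] -= capacity[source][v]
    active = deque(v for v in range(n) if v not in (source, sink) and excess[v] > 0)
    while active:
        u = active.popleft()
        while excess[u] > 0:
            pushed = False
            for v in range(n):
                residual = capacity[u][v] - flow[u][v]
                if residual > 0 and height[u] == height[v] + 1:
                    # Push: move as much excess as the residual edge allows.
                    delta = min(excess[u], residual)
                    flow[u][v] += delta
                    flow[v][u] -= delta
                    excess[u] -= delta
                    excess[v] += delta
                    if v not in (source, sink) and excess[v] == delta:
                        active.append(v)  # v just became active
                    pushed = True
                    if excess[u] == 0:
                        break
            if not pushed:
                # Relabel: lift u just above its lowest residual neighbour.
                height[u] = 1 + min(height[v] for v in range(n)
                                    if capacity[u][v] - flow[u][v] > 0)
    return excess[sink]
```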
SOFTWARE SCOREBOARD INFORMATION AND SYNCHRONIZATION

Publication number: 20190362460

Abstract: Embodiments described herein provide a graphics processor in which dependency-tracking hardware is simplified through the use of compiler-provided software scoreboard information. In one embodiment, the shader compiler is configured to encode software scoreboard information into each instruction: dependencies are evaluated by the shader compiler instead of the GPU hardware and provided as scoreboard information with each instruction, and the hardware then uses that information when scheduling instructions. In one embodiment, a software scoreboard synchronization instruction is provided to facilitate dependency handling within a shader program. Handling dependencies and synchronization in software can simplify the hardware design and reduce the area the hardware consumes.

Type: Application

Filed: June 11, 2019

Publication date: November 28, 2019

Applicant: Intel Corporation

Inventors: Subramaniam Maiyuran, Supratim Pal, Jorge E. Parra, Chandra S. Gurram, Ashwin J. Shivani, Ashutosh Garg, Brent A. Schwartz, Jorge F. Garcia Pabon, Darin M. Starkey, Shubh B. Shah, Guei-Yuan Lueh, Kaiyu Chen, Konrad Trifunovic, Buqi Cheng, Weiyu Chen

GRAPHICS PROCESSOR REGISTER DATA RE-USE MECHANISM

Publication number: 20190304056

Abstract: A processing apparatus is described. The apparatus includes a graphics processing unit (GPU) including a plurality of execution units to process graphics context data and a register file having a plurality of registers to store the graphics context data; and register renaming logic to facilitate re-use of register data by partitioning into a first part and a second part, the first part to include thread-independent code and the second part to include thread-dependent code.

Type: Application

Filed: March 28, 2018

Publication date: October 3, 2019

Inventors: Slawomir Grajewski, Kaiyu Chen, Guei-Yuan Lueh, Subramaniam Maiyuran

Software scoreboard information and synchronization

Patent number: 10360654

Abstract: Embodiments described herein provide a graphics processor in which dependency-tracking hardware is simplified through the use of compiler-provided software scoreboard information. In one embodiment, the shader compiler is configured to encode software scoreboard information into each instruction: dependencies are evaluated by the shader compiler instead of the GPU hardware and provided as scoreboard information with each instruction, and the hardware then uses that information when scheduling instructions. In one embodiment, a software scoreboard synchronization instruction is provided to facilitate dependency handling within a shader program. Handling dependencies and synchronization in software can simplify the hardware design and reduce the area the hardware consumes.

Type: Grant

Filed: May 25, 2018

Date of Patent: July 23, 2019

Assignee: Intel Corporation

Inventors: Subramaniam Maiyuran, Supratim Pal, Jorge E. Parra, Chandra S. Gurram, Ashwin J. Shivani, Ashutosh Garg, Brent A. Schwartz, Jorge F. Garcia Pabon, Darin M. Starkey, Shubh B. Shah, Guei-Yuan Lueh, Kaiyu Chen, Konrad Trifunovic, Buqi Cheng, Weiyu Chen

GLOBAL OPTIMAL PATH DETERMINATION UTILIZING PARALLEL PROCESSING

Publication number: 20190180406

Abstract: Embodiments are generally directed to global optimal path determination utilizing parallel processing. An embodiment of an apparatus includes a central processing unit (CPU); a graphics processing unit (GPU) capable of running a plurality of processing threads; and a memory to store data for a system under evaluation, the system under evaluation including a set of nodes having a first endpoint, a second endpoint, and multiple paths between the first endpoint and the second endpoint. The apparatus is to determine the most energy-efficient path between the first endpoint and the second endpoint through parallel processing of a push-relabel graph cut algorithm. Performing the push-relabel algorithm involves a plurality of process iterations, each including a relabel operation, a push operation in a first direction, and a push operation in a second direction.

Type: Application

Filed: December 12, 2017

Publication date: June 13, 2019

Applicant: Intel Corporation

Inventors: Yuenian Yang, Kaiyu Chen, Andrew Kuzma

Efficient preemption for graphics processors

Patent number: 10282227

Abstract: Systems and methods may provide for inserting one or more preemption instructions while compiling a computer program. Inserting the preemption instructions within a preemption window in the computer program reduces the number of live registers at each preemption instruction position. Each preemption instruction specifies which registers are to be saved at its program position, typically the registers that are live there. The compiled program may be run in an execution unit, and a preemption request made to the execution unit is executed at the next available preemption instruction in the running program.

Type: Grant

Filed: November 18, 2014

Date of Patent: May 7, 2019

Assignee: Intel Corporation

Inventors: Guei-Yuan Lueh, Subramaniam Maiyuran, Wei-Yu Chen, Kaiyu Chen
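A small sketch of the placement idea: compute backward liveness over straight-line code, choose the position inside a preemption window with the fewest live registers, and report those registers as the ones to save. Straight-line code, the `(dst, srcs)` instruction encoding, and the function names are assumptions; a real compiler would run liveness over a control-flow graph.

```python
def live_sets(program, live_out=frozenset()):
    """Backward liveness for straight-line code given as (dst, srcs) pairs.
    Returns live[i], the registers live just before instruction i."""
    live = [frozenset()] * len(program)
    current = set(live_out)
    for i in range(len(program) - 1, -1, -1):
        dst, srcs = program[i]
        current.discard(dst)   # kill the definition first...
        current.update(srcs)   # ...then add the uses
        live[i] = frozenset(current)
    return live


def place_preemption(program, window, live_out=frozenset()):
    """Pick the position in [window[0], window[1]) with the fewest live
    registers; return (position, sorted registers to save there)."""
    live = live_sets(program, live_out)
    lo, hi = window
    pos = min(range(lo, hi), key=lambda i: len(live[i]))
    return pos, sorted(live[pos])
```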
Graphics Processor Register Renaming Mechanism

Publication number: 20180096446

Abstract: A processing apparatus is described. The apparatus includes a graphics processing unit (GPU), including a plurality of execution units to process graphics context data and a register file having a plurality of registers to store the graphics context data; and register renaming logic to facilitate dynamic renaming of the plurality of registers by logically partitioning the plurality of registers in the register file into a set of fixed registers and a set of shared registers.

Type: Application

Filed: September 30, 2016

Publication date: April 5, 2018

Inventors: Kaiyu Chen, Guei-Yuan Lueh, Subramaniam Maiyuran

Divergent Control Flow for Fused EUs

Publication number: 20170372446

Abstract: Embodiments provide support for divergent control flow in heterogeneous compute operations on a fused execution unit. One embodiment provides a processing apparatus comprising a fused execution unit that includes multiple graphics execution units sharing a common instruction pointer; and logic to serialize divergent function calls by the fused execution unit, the logic configured to compare the call targets of execution channels within the fused execution unit and create multiple groups of channels, each group associated with a single call target; wherein the fused execution unit is to execute a first group of channels via a first execution unit and a second group of channels via a second execution unit.

Type: Application

Filed: June 23, 2016

Publication date: December 28, 2017

Applicant: Intel Corporation

Inventors: Pratik J. Ashar, Guei-Yuan Ken Lueh, Kaiyu Chen, Subramaniam Maiyuran, Brent A. Schwartz, Darin M. Starkey

EFFICIENT PREEMPTION FOR GRAPHICS PROCESSORS

Publication number: 20160140686

Abstract: Systems and methods may provide for inserting one or more preemption instructions while compiling a computer program. Inserting the preemption instructions within a preemption window in the computer program reduces the number of live registers at each preemption instruction position. Each preemption instruction specifies which registers are to be saved at its program position, typically the registers that are live there. The compiled program may be run in an execution unit, and a preemption request made to the execution unit is executed at the next available preemption instruction in the running program.

Type: Application

Filed: November 18, 2014

Publication date: May 19, 2016

Applicant: Intel Corporation

Inventors: Guei-Yuan Lueh, Subramaniam Maiyuran, Wei-Yu Chen, Kaiyu Chen

SYSTEMS, METHODS, AND COMPUTER PROGRAM PRODUCTS FOR PARALLELIZING LARGE NUMBER ARITHMETIC

Publication number: 20130159680

Abstract: Methods, systems, and computer program products for the performance of arithmetic operations on large numbers. The addition of large numbers may be parallelized by adding corresponding sections of the numbers in parallel. The multiplication of large numbers may be accomplished by applying a multiplier to a multiplicand after the latter is divided into sections, where the multiplication of the sections is performed in parallel. Products for each section are saved in high and low order vectors, which may then be aligned and added. The comparison of two large numbers may be performed by comparing the numbers, section by section, in parallel. In an embodiment, these processes may be performed in a graphics processing unit (GPU) having multiple cores. In an embodiment, such a GPU may be integrated into a larger die that also incorporates one or more conventional central processing unit (CPU) cores.

Type: Application

Filed: December 19, 2011

Publication date: June 20, 2013

Inventors: Wei-yu Chen, Guei-yuan Lueh, Kaiyu Chen, Xiaozhu Kang
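The section-wise scheme described above can be sketched with Python lists standing in for GPU lanes. The 16-bit section width and the function names are assumptions, and the per-section steps are written as list comprehensions to mark where the parallelism would be; carry propagation is done sequentially here because the abstract does not say how carries are resolved.

```python
SECTION_BITS = 16  # assumed section width
BASE = 1 << SECTION_BITS
MASK = BASE - 1


def to_sections(x, n):
    """Split non-negative int x into n little-endian sections."""
    return [(x >> (SECTION_BITS * i)) & MASK for i in range(n)]


def from_sections(sections):
    return sum(s << (SECTION_BITS * i) for i, s in enumerate(sections))


def add_sections(a, b):
    """Add equal-length section vectors; the result has one extra section for the carry."""
    sums = [x + y for x, y in zip(a, b)]  # one independent add per section
    out, carry = [], 0
    for s in sums:                        # sequential carry propagation
        s += carry
        out.append(s & MASK)
        carry = s >> SECTION_BITS
    return out + [carry]


def mul_by_section(a, m):
    """Multiply section vector a by a multiplier m that fits in one section."""
    products = [x * m for x in a]                       # one independent multiply per section
    low = [p & MASK for p in products] + [0]            # low-order vector
    high = [0] + [p >> SECTION_BITS for p in products]  # high-order vector, aligned one section up
    # a * m < BASE ** (len(a) + 1), so the extra carry section is always zero.
    return add_sections(low, high)[:-1]


def compare_sections(a, b):
    """Return -1, 0 or 1; the most significant differing section decides."""
    diffs = [(x > y) - (x < y) for x, y in zip(a, b)]  # independent per-section compares
    for d in reversed(diffs):
        if d:
            return d
    return 0
```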