Patents by Inventor Supratim Pal

Supratim Pal has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

LARGE INTEGER MULTIPLICATION ENHANCEMENTS FOR GRAPHICS ENVIRONMENT

Publication number: 20250231764

Abstract: An apparatus to facilitate large integer multiplication enhancements in a graphics environment is disclosed. The apparatus includes a processor comprising processing resources, the processing resources comprising multiplier circuitry to: receive operands for a multiplication operation, wherein the multiplication operation is part of a chain of multiplication operations for a large integer multiplication; and issue a multiply and add (MAD) instruction for the multiplication operation utilizing at least one of a double precision multiplier or a 48 bit output, wherein the MAD instruction to generate an output in a single clock cycle of the processor.

Type: Application

Filed: January 16, 2025

Publication date: July 17, 2025

Applicant: Intel Corporation

Inventors: Supratim Pal, Li-An Tang, Changwon Rhee, Timothy R. Bauer, Alexander Lyashevsky, Jiasheng Chen
BINDLESS THREAD DISPATCH MID-THREAD PREEMPTION ON A GRAPHICS PROCESSOR

Publication number: 20250231769

Abstract: Techniques are provided to enable mid-thread (instruction level) preemption in a graphics processor without requiring software intervention by a graphics driver associated with the graphics processor. Hardware based mid-thread preemption is facilitated using the thread dispatch hardware of the graphics processor to trigger execution of a kernel program by the graphics processor that saves the thread state of preempted processing resources and facilitates the subsequent restoration of that thread state to the same or a different set of processing resources.

Type: Application

Filed: January 17, 2024

Publication date: July 17, 2025

Applicant: Intel Corporation

Inventors: Vasanth Ranganathan, James Valerio, Aditya Navale, Alan Curtis, Jeffery S. Boles, Hema Chand Nalluri, Vikranth Vemulapalli, Supratim Pal, Boris Kuznetsov, John A. Wiegert
Register file for systolic array

Patent number: 12346694

Abstract: A processing apparatus includes a general-purpose parallel processing engine including a set of multiple processing elements including a single precision floating-point unit, a double precision floating point unit, and an integer unit; a matrix accelerator including one or more systolic arrays; a first register file coupled with a first read control circuit, wherein the first read control circuit couples with the set of multiple processing elements and the matrix accelerator to arbitrate read requests to the first register file from the set of multiple processing elements and the matrix accelerator; and a second register file coupled with a second read control circuit, wherein the second read control circuit couples with the matrix accelerator to arbitrate read requests to the second register file from the matrix accelerator and limit access to the second register file by the set of multiple processing elements.

Type: Grant

Filed: June 25, 2021

Date of Patent: July 1, 2025

Assignee: Intel Corporation

Inventors: Chandra Gurram, Wei-yu Chen, Fangwen Fu, Sabareesh Ganapathy, Varghese George, Guei-Yuan Lueh, Subramaniam Maiyuran, Mike Macpherson, Supratim Pal, Jorge Parra
MULTIPLE REGISTER ALLOCATION SIZES FOR GPU HARDWARE THREADS

Publication number: 20250147762

Abstract: Described herein is a graphics processor having processing resources with configurable thread and register configurations. Program code can configure a number of registers and accumulators that will be used by hardware threads during execution of the program code by the graphics processor. Processing resources within the graphics processor can be configured to assign different numbers of registers and accumulators to hardware threads based on the configuration requested by program code to be executed by the processing resource.

Type: Application

Filed: November 8, 2023

Publication date: May 8, 2025

Applicant: Intel Corporation

Inventors: Vasanth Ranganathan, Gang Chen, Supratim Pal, Jorge Eduardo Parra Osorio, Arthur Hunter, Boris Kuznetsov, Deepak N K, Siva Kumar Seemakurthi, James Valerio, Shubham Dinesh Chavan, Abhishek Kumar Singh, Samir Pandya, Sandeep Tippannanavar Niranjan, Alan Curtis, Jain Philip, Maltesh Kulkarni, Fangwen Fu, John Wiegert, Brent Schwartz
DUAL PIPELINE PARALLEL SYSTOLIC ARRAY

Publication number: 20250117359

Abstract: A processing apparatus described herein includes a general-purpose parallel processing engine comprising a systolic array having multiple pipelines, each of the multiple pipelines including multiple pipeline stages, wherein the multiple pipelines include a first pipeline, a second pipeline, and a common input shared between the first pipeline and the second pipeline.

Type: Application

Filed: October 11, 2024

Publication date: April 10, 2025

Applicant: Intel Corporation

Inventors: Jorge Parra, Jiasheng Chen, Supratim Pal, Fangwen Fu, Sabareesh Ganapathy, Chandra Gurram, Chunhui Mei, Yue Qi
SYSTOLIC ARRAY OF ARBITRARY PHYSICAL AND LOGICAL DEPTH

Publication number: 20250117360

Abstract: A processing apparatus includes a processing resource including a general-purpose parallel processing engine and a matrix accelerator. The matrix accelerator includes first circuitry to receive a command to perform operations associated with an instruction, second circuitry to configure the matrix accelerator according to a physical depth of a systolic array within the matrix accelerator and a logical depth associated with the instruction, third circuitry to read operands for the instruction from a register file associated with the systolic array, fourth circuitry to perform operations for the instruction via one or more passes through one or more physical pipeline stages of the systolic array based on a configuration performed by the second circuitry, and fifth circuitry to write output of the operations to the register file associated with the systolic array.

Type: Application

Filed: October 30, 2024

Publication date: April 10, 2025

Applicant: Intel Corporation

Inventors: Jorge Parra, Wei-yu Chen, Kaiyu Chen, Varghese George, Junjie Gu, Chandra Gurram, Guei-Yuan Lueh, Stephen Junkins, Subramaniam Maiyuran, Supratim Pal
UNBLOCKING THE INTEGER PIPELINE DURING MATH PIPELINE PHASES IN A GRAPHICS ENVIRONMENT

Publication number: 20250085969

Abstract: An apparatus to facilitate unblocking the integer pipeline during math pipeline phases in a graphics environment is disclosed. The apparatus includes an execution resource comprising: a thread arbiter; a plurality of execution pipeline hardware circuitry comprising a math execution pipeline and an integer execution pipeline to share resources of the thread arbiter; arbitration hardware circuitry to determine whether the math execution pipeline is available for loading math operand data of a math instruction; and a math instruction staging buffer to store the math operand data responsive to the math execution pipeline not being available; wherein the integer execution pipeline is to receive integer operand data for an integer instruction while bypassing the math operand data in the math instruction staging buffer; and wherein the math execution pipeline is to receive, responsive to the math execution pipeline becoming available, the math operand data from the math instruction staging buffer.

Type: Application

Filed: September 7, 2023

Publication date: March 13, 2025

Applicant: Intel Corporation

Inventors: Shuai Mu, Supratim Pal, Pradeep Golconda, Srilakshmi Jammula, Jiasheng Chen
Supporting 8-bit floating point format operands in a computing architecture

Patent number: 12242846

Abstract: An apparatus to facilitate supporting 8-bit floating point format operands in a computing architecture is disclosed. The apparatus includes a processor comprising: a decoder to decode an instruction fetched for execution into a decoded instruction, wherein the decoded instruction is a matrix instruction that operates on 8-bit floating point operands to cause the processor to perform a parallel dot product operation; a controller to schedule the decoded instruction and provide input data for the 8-bit floating point operands in accordance with an 8-bit floating data format indicated by the decoded instruction; and systolic dot product circuitry to execute the decoded instruction using systolic layers, each systolic layer comprises one or more sets of interconnected multipliers, shifters, and adder, each set of multipliers, shifters, and adders to generate a dot product of the 8-bit floating point operands.

Type: Grant

Filed: March 27, 2024

Date of Patent: March 4, 2025

Assignee: INTEL CORPORATION

Inventors: Naveen Mellempudi, Subramaniam Maiyuran, Varghese George, Fangwen Fu, Shuai Mu, Supratim Pal, Wei Xiong
DISTRIBUTED REGISTER FILE CACHE TO REDUCE L1 BANDWIDTH REQUIREMENTS

Publication number: 20250068473

Abstract: Described herein is a graphics processor comprising a graphics processing cluster coupled with the memory interface, the graphics processing cluster including a plurality of processing resources, a processing resource of the plurality of processing resources including a register file including a first plurality of registers associated with a first hardware thread of a plurality of hardware threads of the processing resource and a second plurality of registers associated with a second hardware thread of the plurality of hardware threads of the processing resource and first circuitry configured to facilitate access to memory on behalf of the plurality of hardware threads and store metadata for memory access requests from the plurality of hardware threads.

Type: Application

Filed: August 22, 2023

Publication date: February 27, 2025

Applicant: Intel Corporation

Inventors: Jorge Eduardo Parra Osorio, Jiasheng Chen, Supratim Pal, James Valerio
INSTRUCTION ENCODING TO IMPLEMENT INCREASED REGISTER CAPACITY PER THREAD

Publication number: 20250068423

Abstract: Described herein is a graphics processor comprising first circuitry configured to execute a decoded instruction and second circuitry configured to second circuitry configured to decode an instruction into the decoded instruction. The second circuitry is configured to determine a number of registers within a register file that are available to a thread of the processing resource and decode the instruction based on that number of registers.

Type: Application

Filed: August 22, 2023

Publication date: February 27, 2025

Applicant: Intel Corporation

Inventors: Jorge Eduardo Parra Osorio, Jiasheng Chen, Supratim Pal, Vasanth Ranganathan, Guei-Yuan Lueh, James Valerio, Pradeep Golconda, Brent Schwartz, Fangwen Fu, Sabareesh Ganapathy, Peter Caday, Wei-Yu Chen, Po-Yu Chen, Timothy Bauer, Maxim Kazakov, Stanley Gambarin, Samir Pandya
Large integer multiplication enhancements for graphics environment

Patent number: 12236238

Abstract: An apparatus to facilitate large integer multiplication enhancements in a graphics environment is disclosed. The apparatus includes a processor comprising processing resources, the processing resources comprising multiplier circuitry to: receive operands for a multiplication operation, wherein the multiplication operation is part of a chain of multiplication operations for a large integer multiplication; and issue a multiply and add (MAD) instruction for the multiplication operation utilizing at least one of a double precision multiplier or a 48 bit output, wherein the MAD instruction to generate an output in a single clock cycle of the processor.

Type: Grant

Filed: June 25, 2021

Date of Patent: February 25, 2025

Assignee: INTEL CORPORATION

Inventors: Supratim Pal, Li-An Tang, Changwon Rhee, Timothy R. Bauer, Alexander Lyashevsky, Jiasheng Chen
FLOATING-POINT CONVERSION VIA AN INTEGER UNIT

Publication number: 20250036361

Abstract: Described herein is a graphics processor comprising a memory interface and a graphics processing cluster coupled with the memory interface. The graphics processing cluster includes a multi-lane parallel floating-point unit and a multi-lane parallel integer unit. The multi-lane parallel integer unit includes an integer pipeline including a plurality of parallel integer logic units configured to perform integer compute operations on a plurality of input data elements and a format conversion pipeline including a plurality of parallel format conversion units configured to convert a plurality of input data elements from a first one of a plurality of datatype formats to a second one of the plurality of datatype formats, the plurality of datatype formats including integer and floating-point formats.

Type: Application

Filed: July 25, 2023

Publication date: January 30, 2025

Applicant: Intel Corporation

Inventors: Supratim Pal, Jiasheng Chen, Kevin Hurd, Jorge E. Parra Osorio, Christopher Spencer, Guei-Yuan Lueh, Pradeep K. Golconda, Fangwen Fu, Wei Xiong, Hongzheng Li, James Valerio, Mukundan Swaminathan, Nicholas Murphy, Shuai Mu, Clifford Gibson, Buqi Cheng
AVOIDING THE USE OF A RESULT CROSSBAR WHEN DOWN CONVERTING TO PACKED REGISTER FORMATS

Publication number: 20250036412

Abstract: Described herein is a graphics processor comprising a memory interface and a graphics processing cluster coupled with the memory interface. The graphics processing cluster includes a plurality of processing resources. A processing resource of the plurality of processing resources includes a source crossbar communicatively coupled with a register file, the source crossbar to reorder data elements of a source operand and a format conversion pipeline to convert a plurality of input data elements specified by the source operand from a first format of a plurality of datatype formats to a second format of the plurality of datatype formats, the plurality of datatype formats including integer and floating-point formats.

Type: Application

Filed: July 25, 2023

Publication date: January 30, 2025

Applicant: Intel Corporation

Inventors: Supratim Pal, Jiasheng Chen, Christopher Spencer, Jorge E. Parra Osorio, Kevin Hurd, Guei-Yuan Lueh, Pradeep K. Golconda, Fangwen Fu, Wei Xiong, Hongzheng Li, James Valerio, Mukundan Swaminathan, Nicholas Murphy, Shuai Mu, Clifford Gibson, Buqi Cheng
32-BIT CHANNEL-ALIGNED INTEGER MULTIPLICATION VIA MULTIPLE MULTIPLIERS PER-CHANNEL

Publication number: 20250037347

Abstract: Described herein is a graphics processor comprising an instruction cache and a plurality of processing elements coupled with the instruction cache. The plurality of processing elements include functional units configured to provide an integer pipeline to execute instructions to perform operations on integer data elements. The integer pipeline including a first multiplier and a second multiplier, the first multiplier and the second multiplier configured to execute operations for a single instruction.

Type: Application

Filed: July 25, 2023

Publication date: January 30, 2025

Applicant: Intel Corporation

Inventors: Jiasheng Chen, Supratim Pal, Kevin Hurd, Jorge E. Parra Osorio, Christopher Spencer, Takashi Nakagawa, Guei-Yuan Lueh, Pradeep K. Golconda, James Valerio, Mukundan Swaminathan, Nicholas Murphy, Clifford Gibson, Li-An Tang, Fangwen Fu, Kaiyu Chen, Buqi Cheng
Multiple register allocation sizes for threads

Patent number: 12210905

Abstract: Provision of multiple register allocation sizes for threads is described. An example of a system includes one or more processors including a graphics processor, the graphics processor including at least a first local thread dispatcher (TDL) and multiple processing resources, each processing resource including a plurality of registers; and memory for storage of data for processing, wherein the one or more processors are to determine a register size for a first thread; identify one or more processing resources having sufficient register space for the first thread; select a processing resource of the one or more processing resources having sufficient register space to assign the first thread; select an available thread slot of the selected processing resource for the first thread; and allocate registers of the selected processing resource for the first thread.

Type: Grant

Filed: June 25, 2021

Date of Patent: January 28, 2025

Assignee: INTEL CORPORATION

Inventors: Chandra Gurram, Wei-Yu Chen, Vikranth Vemulapalli, Subramaniam Maiyuran, Jorge Eduardo Parra Osorio, Shuai Mu, Guei-Yuan Lueh, Supratim Pal
Using sparsity metadata to reduce systolic array power consumption

Patent number: 12190158

Abstract: A processing apparatus can include a general-purpose parallel processing engine comprising a matrix accelerator including a multi-stage systolic array, where each stage includes multiple processing elements associated with multiple processing channels. The multiple processing elements are configured to receive output sparsity metadata that is independent of input sparsity of input matrix elements and perform processing operations on the input matrix elements based on the output sparsity metadata.

Type: Grant

Filed: June 25, 2021

Date of Patent: January 7, 2025

Assignee: Intel Corporation

Inventors: Jorge Parra, Supratim Pal, Jiasheng Chen, Chandra Gurram
Dual pipeline parallel systolic array

Patent number: 12189571

Abstract: A processing apparatus described herein includes a general-purpose parallel processing engine comprising a systolic array having multiple pipelines, each of the multiple pipelines including multiple pipeline stages, wherein the multiple pipelines include a first pipeline, a second pipeline, and a common input shared between the first pipeline and the second pipeline.

Type: Grant

Filed: June 25, 2021

Date of Patent: January 7, 2025

Assignee: Intel Corporation

Inventors: Jorge Parra, Jiasheng Chen, Supratim Pal, Fangwen Fu, Sabareesh Ganapathy, Chandra Gurram, Chunhui Mei, Yue Qi
SCALABLE SPARSE MATRIX MULTIPLY ACCELERATION USING SYSTOLIC ARRAYS WITH FEEDBACK INPUTS

Publication number: 20240427847

Abstract: Described herein is a graphics processor including a plurality of processing clusters coupled with a host interface, each processing cluster comprising a plurality of multiprocessors, the plurality of multiprocessors interconnected via a data interconnect, and each multiprocessor comprising sparse matrix multiply acceleration hardware including a systolic processing array with feedback inputs.

Type: Application

Filed: June 27, 2024

Publication date: December 26, 2024

Applicant: Intel Corporation

Inventors: SUBRAMANIAM MAIYURAN, JORGE PARRA, SUPRATIM PAL, ASHUTOSH GARG, SHUBRA MARWAHA, CHANDRA GURRAM, DARIN STARKEY, DURGESH BORKAR, VARGHESE GEORGE
Systolic array of arbitrary physical and logical depth

Patent number: 12174783

Abstract: A processing apparatus includes a processing resource including a general-purpose parallel processing engine and a matrix accelerator. The matrix accelerator includes first circuitry to receive a command to perform operations associated with an instruction, second circuitry to configure the matrix accelerator according to a physical depth of a systolic array within the matrix accelerator and a logical depth associated with the instruction, third circuitry to read operands for the instruction from a register file associated with the systolic array, fourth circuitry to perform operations for the instruction via one or more passes through one or more physical pipeline stages of the systolic array based on a configuration performed by the second circuitry, and fifth circuitry to write output of the operations to the register file associated with the systolic array.

Type: Grant

Filed: June 24, 2021

Date of Patent: December 24, 2024

Assignee: Intel Corporation

Inventors: Jorge Parra, Wei-yu Chen, Kaiyu Chen, Varghese George, Junjie Gu, Chandra Gurram, Guei-Yuan Lueh, Stephen Junkins, Subramaniam Maiyuran, Supratim Pal
GRAPHICS PROCESSORS AND GRAPHICS PROCESSING UNITS HAVING DOT PRODUCT ACCUMULATE INSTRUCTION FOR HYBRID FLOATING POINT FORMAT

Publication number: 20240362180

Abstract: Graphics processors and graphics processing units having dot product accumulate instructions for a hybrid floating point format are disclosed. In one embodiment, a graphics multiprocessor comprises an instruction unit to dispatch instructions and a processing resource coupled to the instruction unit. The processing resource is configured to receive a dot product accumulate instruction from the instruction unit and to process the dot product accumulate instruction using a bfloat16 number (BF16) format.

Type: Application

Filed: April 26, 2024

Publication date: October 31, 2024

Applicant: Intel Corporation

Inventors: Subramaniam Maiyuran, Shubra Marwaha, Ashutosh Garg, Supratim Pal, Jorge Parra, Chandra Gurram, Varghese George, Darin Starkey, Guei-Yuan Lueh

1 2 3 4 5 … next