Patents by Inventor Jorge PARRA

Jorge PARRA has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

SYSTOLIC ARRAY HAVING SUPPORT FOR OUTPUT SPARSITY

Publication number: 20220413803

Abstract: A processing apparatus is described herein that includes a general-purpose parallel processing engine comprising a matrix accelerator including one or more systolic arrays, at least one of the one or more systolic arrays comprising multiple pipeline stages, each pipeline stage of the multiple pipeline stages including multiple processing elements, the multiple processing elements configured to perform processing operations on input matrix elements based on output sparsity metadata. The output sparsity metadata indicates to the multiple processing elements to bypass multiplication for a first row of elements of a second matrix and multiply a second row of elements of the second matrix with a column of matrix elements of a first matrix.

Type: Application

Filed: June 25, 2021

Publication date: December 29, 2022

Applicant: Intel Corporation

Inventors: Jorge Parra, Fangwen Fu, Subramaniam Maiyuran, Varghese George, Mike Macpherson, Supratim Pal, Chandra Gurram, Sabareesh Ganapathy, Sasikanth Avancha, Dharma Teja Vooturi, Naveen Mellempudi, Dipankar Das
SYSTOLIC ARRAY OF ARBITRARY PHYSICAL AND LOGICAL DEPTH

Publication number: 20220414053

Abstract: A processing apparatus includes a processing resource including a general-purpose parallel processing engine and a matrix accelerator. The matrix accelerator includes first circuitry to receive a command to perform operations associated with an instruction, second circuitry to configure the matrix accelerator according to a physical depth of a systolic array within the matrix accelerator and a logical depth associated with the instruction, third circuitry to read operands for the instruction from a register file associated with the systolic array, fourth circuitry to perform operations for the instruction via one or more passes through one or more physical pipeline stages of the systolic array based on a configuration performed by the second circuitry, and fifth circuitry to write output of the operations to the register file associated with the systolic array.

Type: Application

Filed: June 24, 2021

Publication date: December 29, 2022

Applicant: Intel Corporation

Inventors: Jorge Parra, Wei-yu Chen, Kaiyu Chen, Varghese George, Junjie Gu, Chandra Gurram, Guei-Yuan Lueh, Stephen Junkins, Subramaniam Maiyuran, Supratim Pal
REGISTER FILE FOR SYSTOLIC ARRAY

Publication number: 20220413851

Abstract: A processing apparatus includes a general-purpose parallel processing engine including a set of multiple processing elements including a single precision floating-point unit, a double precision floating point unit, and an integer unit; a matrix accelerator including one or more systolic arrays; a first register file coupled with a first read control circuit, wherein the first read control circuit couples with the set of multiple processing elements and the matrix accelerator to arbitrate read requests to the first register file from the set of multiple processing elements and the matrix accelerator; and a second register file coupled with a second read control circuit, wherein the second read control circuit couples with the matrix accelerator to arbitrate read requests to the second register file from the matrix accelerator and limit access to the second register file by the set of multiple processing elements.

Type: Application

Filed: June 25, 2021

Publication date: December 29, 2022

Applicant: Intel Corporation

Inventors: Chandra Gurram, Wei-yu Chen, Fangwen Fu, Sabareesh Ganapathy, Varghese George, Guei-Yuan Lueh, Subramaniam Maiyuran, Mike Macpherson, Supratim Pal, Jorge Parra
USING SPARSITY METADATA TO REDUCE SYSTOLIC ARRAY POWER CONSUMPTION

Publication number: 20220413924

Abstract: A processing apparatus can include a general-purpose parallel processing engine comprising a matrix accelerator including a multi-stage systolic array, where each stage includes multiple processing elements associated with multiple processing channels. The multiple processing elements are configured to receive output sparsity metadata that is independent of input sparsity of input matrix elements and perform processing operations on the input matrix elements based on the output sparsity metadata.

Type: Application

Filed: June 25, 2021

Publication date: December 29, 2022

Applicant: Intel Corporation

Inventors: Jorge Parra, Supratim Pal, Jiasheng Chen, Chandra Gurram
DUAL PIPELINE PARALLEL SYSTOLIC ARRAY

Publication number: 20220414054

Abstract: A processing apparatus described herein includes a general-purpose parallel processing engine comprising a systolic array having multiple pipelines, each of the multiple pipelines including multiple pipeline stages, wherein the multiple pipelines include a first pipeline, a second pipeline, and a common input shared between the first pipeline and the second pipeline.

Type: Application

Filed: June 25, 2021

Publication date: December 29, 2022

Applicant: Intel Corporation

Inventors: Jorge Parra, Jiasheng Chen, Supratim Pal, Fangwen Fu, Sabareesh Ganapathy, Chandra Gurram, Chunhui Mei, Yue Qi
GRAPHICS PROCESSORS AND GRAPHICS PROCESSING UNITS HAVING DOT PRODUCT ACCUMULATE INSTRUCTION FOR HYBRID FLOATING POINT FORMAT

Publication number: 20220365901

Abstract: Described herein is a graphics processing unit (GPU) comprising a first processing cluster to perform parallel processing operations, the parallel processing operations including a ray tracing operation and a matrix multiply operation; and a second processing cluster coupled to the first processing cluster, wherein the first processing cluster includes a floating-point unit to perform floating point operations, the floating-point unit is configured to process an instruction using a bfloat16 (BF16) format with a multiplier to multiply second and third source operands while an accumulator adds a first source operand with output from the multiplier.

Type: Application

Filed: May 27, 2022

Publication date: November 17, 2022

Applicant: Intel Corporation

Inventors: Subramaniam Maiyuran, Shubra Marwaha, Ashutosh Garg, Supratim Pal, Jorge Parra, Chandra Gurram, Varghese George, Darin Starkey, Guei-Yuan Lueh
SHARING REGISTER FILE USAGE BETWEEN FUSED PROCESSING RESOURCES

Publication number: 20220206795

Abstract: Embodiments described herein provide an apparatus comprising a plurality of processing resources including a first processing resource and a second processing resource, a shared local memory communicatively coupled to the first processing resource and the second processing resource, and a processor to receive an instruction to initiate a matrix multiplication operation, write a first set of matrix data into a first set of registers, and share the first set of matrix data between the first processing resource and the second processing resource for use in the matrix multiplication operation. Other embodiments may be described and claimed.

Type: Application

Filed: January 5, 2022

Publication date: June 30, 2022

Applicant: Intel Corporation

Inventors: SUBRAMANIAM MAIYURAN, VARGHESE GEORGE, JOYDEEP RAY, ASHUTOSH GARG, JORGE PARRA, SHUBH SHAH, SHUBRA MARWAHA
Graphics processors and graphics processing units having dot product accumulate instruction for hybrid floating point format

Patent number: 11361496

Abstract: Described herein is a graphics processing unit (GPU) comprising a single instruction, multiple thread (SIMT) multiprocessor comprising an instruction cache, a shared memory coupled with the instruction cache, and circuitry coupled with the shared memory and the instruction cache, the circuitry including multiple texture units, a first core including hardware to accelerate matrix operations, and a second core configured to receive an instruction having multiple operands in a bfloat16 (BF16) number format, wherein the multiple operands include a first source operand, a second source operand, and a third source operand, and the BF16 number format is a sixteen-bit floating point format having an eight-bit exponent and process the instruction, wherein to process the instruction includes to multiply the second source operand by the third source operand and add a first source operand to a result of the multiply.

Type: Grant

Filed: June 14, 2021

Date of Patent: June 14, 2022

Assignee: Intel Corporation

Inventors: Subramaniam Maiyuran, Shubra Marwaha, Ashutosh Garg, Supratim Pal, Jorge Parra, Chandra Gurram, Varghese George, Darin Starkey, Guei-Yuan Lueh
SPARSE MATRIX MULTIPLICATION ACCELERATION MECHANISM

Publication number: 20220171827

Abstract: An apparatus to facilitate acceleration of matrix multiplication operations. The apparatus comprises a systolic array including matrix multiplication hardware to perform multiply-add operations on received matrix data comprising data from a plurality of input matrices and sparse matrix acceleration hardware to detect zero values in the matrix data and perform one or more optimizations on the matrix data to reduce multiply-add operations to be performed by the matrix multiplication hardware.

Type: Application

Filed: November 16, 2021

Publication date: June 2, 2022

Applicant: Intel Corporation

Inventors: SUBRAMANIAM MAIYURAN, MATHEW NEVIN, JORGE PARRA, ASHUTOSH GARG, SHUBRA MARWAHA, SHUBH SHAH
SCALABLE SPARSE MATRIX MULTIPLY ACCELERATION USING SYSTOLIC ARRAYS WITH FEEDBACK INPUTS

Publication number: 20220156343

Abstract: Described herein is an accelerator device including a host interface, a fabric interconnect coupled with the host interface, and one or more hardware tiles coupled with the fabric interconnect, the one or more hardware tiles including sparse matrix multiply acceleration hardware including a systolic array with feedback inputs.

Type: Application

Filed: November 16, 2021

Publication date: May 19, 2022

Applicant: Intel Corporation

Inventors: SUBRAMANIAM MAIYURAN, JORGE PARRA, SUPRATIM PAL, ASHUTOSH GARG, SHUBRA MARWAHA, CHANDRA GURRAM, DARIN STARKEY, DURGESH BORKAR, VARGHESE GEORGE
Method and apparatus for approximation using polynomials

Patent number: 11327754

Abstract: Methods and apparatus for approximation using polynomial functions are disclosed. In one embodiment, a processor comprises decoding and execution circuitry. The decoding circuitry is to decode an instruction, where the instruction comprises a first operand specifying an output location and a second operand specifying a plurality of data element values to be computed. The execution circuitry is to execute the decoded instruction. The execution includes to compute a result for each of the plurality of data element values using a polynomial function to approximate a complex function, where the computation uses coefficients stored in a lookup location for the complex function, and where data element values within different data element value ranges use different sets of coefficients. The execution further includes to store results of the computation in the output location.

Type: Grant

Filed: March 27, 2019

Date of Patent: May 10, 2022

Assignee: INTEL CORPORATION

Inventors: Jorge Parra, Dan Baum, Robert S. Chappell, Michael Espig, Varghese George, Alexander Heinecke, Christopher Hughes, Subramaniam Maiyuran, Prasoonkumar Surti, Ronen Zohar, Elmoustapha Ould-Ahmed-Vall
GRAPHICS PROCESSORS AND GRAPHICS PROCESSING UNITS HAVING DOT PRODUCT ACCUMULATE INSTRUCTION FOR HYBRID FLOATING POINT FORMAT

Publication number: 20220129266

Abstract: Graphics processors and graphics processing units having dot product accumulate instructions for a hybrid floating point format are disclosed. In one embodiment, a graphics multiprocessor comprises an instruction unit to dispatch instructions and a processing resource coupled to the instruction unit. The processing resource is configured to receive a dot product accumulate instruction from the instruction unit and to process the dot product accumulate instruction using a bfloat16 number (BF16) format.

Type: Application

Filed: March 14, 2020

Publication date: April 28, 2022

Applicant: Intel Corporation

Inventors: Subramaniam Maiyuran, Shubra Marwaha, Ashutosh Garg, Supratim Pal, Jorge Parra, Chandra Gurram, Varghese George, Darin Starkey, Guei-Yuan Lueh
COMPUTING EFFICIENT CROSS CHANNEL OPERATIONS IN PARALLEL COMPUTING MACHINES USING SYSTOLIC ARRAYS

Publication number: 20220058158

Abstract: An apparatus to facilitate computing efficient cross channel operations in parallel computing machines using systolic arrays is disclosed. The apparatus includes a plurality of registers and one or more processing elements communicably coupled to the plurality of registers. The one or more processing elements include a systolic array circuit to perform cross-channel operations on source data received from a single source register of the plurality of registers, wherein the systolic array circuit is modified to: receive inputs from the single source register at different stages of the systolic array circuit; perform cross-channel operations at channels of the systolic array circuit; bypass disabled channels of the systolic array circuit, the disabled channels not used to compute the cross-channel operations; and broadcast a final result of a final stage of the systolic array circuit to all channels of a destination register.

Type: Application

Filed: November 3, 2021

Publication date: February 24, 2022

Applicant: Intel Corporation

Inventors: Subramaniam Maiyuran, Jorge Parra, Supratim Pal, Chandra Gurram
Sharing register file usage between fused processing resources

Patent number: 11221848

Abstract: Embodiments described herein provide an apparatus comprising a plurality of processing resources including a first processing resource and a second processing resource, a shared local memory communicatively coupled to the first processing resource and the second processing resource, and a processor to receive an instruction to initiate a matrix multiplication operation, write a first set of matrix data into a first set of registers, and share the first set of matrix data between the first processing resource and the second processing resource for use in the matrix multiplication operation. Other embodiments may be described and claimed.

Type: Grant

Filed: September 25, 2019

Date of Patent: January 11, 2022

Assignee: INTEL CORPORATION

Inventors: Subramaniam Maiyuran, Varghese George, Joydeep Ray, Ashutosh Garg, Jorge Parra, Shubh Shah, Shubra Marwaha
Scalable sparse matrix multiply acceleration using systolic arrays with feedback inputs

Patent number: 11204977

Abstract: Described herein is an accelerator device including a host interface, a fabric interconnect coupled with the host interface, and one or more hardware tiles coupled with the fabric interconnect, the one or more hardware tiles including sparse matrix multiply acceleration hardware including a systolic array with feedback inputs.

Type: Grant

Filed: June 26, 2020

Date of Patent: December 21, 2021

Assignee: Intel Corporation

Inventors: Subramaniam Maiyuran, Jorge Parra, Supratim Pal, Ashutosh Garg, Shubra Marwaha, Chandra Gurram, Darin Starkey, Durgesh Borkar, Varghese George
Sparse matrix multiplication acceleration mechanism

Patent number: 11188618

Abstract: An apparatus to facilitate acceleration of matrix multiplication operations. The apparatus comprises a systolic array including matrix multiplication hardware to perform multiply-add operations on received matrix data comprising data from a plurality of input matrices and sparse matrix acceleration hardware to detect zero values in the matrix data and perform one or more optimizations on the matrix data to reduce multiply-add operations to be performed by the matrix multiplication hardware.

Type: Grant

Filed: September 5, 2019

Date of Patent: November 30, 2021

Assignee: Intel Corporation

Inventors: Subramaniam Maiyuran, Mathew Nevin, Jorge Parra, Ashutosh Garg, Shubra Marwaha, Shubh Shah
COMPUTING EFFICIENT CROSS CHANNEL OPERATIONS IN PARALLEL COMPUTING MACHINES USING SYSTOLIC ARRAYS

Publication number: 20210365402

Abstract: An apparatus to facilitate computing efficient cross channel operations in parallel computing machines using systolic arrays is disclosed. The apparatus includes a plurality of registers and one or more processing elements communicably coupled to the plurality of registers. The one or more processing elements include a systolic array circuit to perform cross-channel operations on source data received from a single source register of the plurality of registers, the systolic array circuit modified to receive inputs from the single source register and route elements of the single source register to multiple channels in the systolic array circuit.

Type: Application

Filed: June 12, 2020

Publication date: November 25, 2021

Applicant: Intel Corporation

Inventors: Subramaniam Maiyuran, Jorge Parra, Supratim Pal, Chandra Gurram
Computing efficient cross channel operations in parallel computing machines using systolic arrays

Patent number: 11182337

Abstract: An apparatus to facilitate computing efficient cross channel operations in parallel computing machines using systolic arrays is disclosed. The apparatus includes a plurality of registers and one or more processing elements communicably coupled to the plurality of registers. The one or more processing elements include a systolic array circuit to perform cross-channel operations on source data received from a single source register of the plurality of registers, the systolic array circuit modified to receive inputs from the single source register and route elements of the single source register to multiple channels in the systolic array circuit.

Type: Grant

Filed: June 12, 2020

Date of Patent: November 23, 2021

Assignee: INTEL CORPORATION

Inventors: Subramaniam Maiyuran, Jorge Parra, Supratim Pal, Chandra Gurram
SCALABLE SPARSE MATRIX MULTIPLY ACCELERATION USING SYSTOLIC ARRAYS WITH FEEDBACK INPUTS

Publication number: 20210349966

Abstract: Described herein is an accelerator device including a host interface, a fabric interconnect coupled with the host interface, and one or more hardware tiles coupled with the fabric interconnect, the one or more hardware tiles including sparse matrix multiply acceleration hardware including a systolic array with feedback inputs.

Type: Application

Filed: June 26, 2020

Publication date: November 11, 2021

Applicant: Intel Corporation

Inventors: SUBRAMANIAM MAIYURAN, JORGE PARRA, SUPRATIM PAL, ASHUTOSH GARG, SHUBRA MARWAHA, CHANDRA GURRAM, DARIN STARKEY, DURGESH BORKAR, VARGHESE GEORGE
GRAPHICS PROCESSORS AND GRAPHICS PROCESSING UNITS HAVING DOT PRODUCT ACCUMULATE INSTRUCTION FOR HYBRID FLOATING POINT FORMAT

Publication number: 20210312697

Abstract: Described herein is a graphics processing unit (GPU) comprising a single instruction, multiple thread (SIMT) multiprocessor comprising an instruction cache, a shared memory coupled with the instruction cache, and circuitry coupled with the shared memory and the instruction cache, the circuitry including multiple texture units, a first core including hardware to accelerate matrix operations, and a second core configured to receive an instruction having multiple operands in a bfloat16 (BF16) number format, wherein the multiple operands include a first source operand, a second source operand, and a third source operand, and the BF16 number format is a sixteen-bit floating point format having an eight-bit exponent and process the instruction, wherein to process the instruction includes to multiply the second source operand by the third source operand and add a first source operand to a result of the multiply.

Type: Application

Filed: June 14, 2021

Publication date: October 7, 2021

Applicant: Intel Corporation

Inventors: Subramaniam Maiyuran, Shubra Marwaha, Ashutosh Garg, Supratim Pal, Jorge Parra, Chandra Gurram, Varghese George, Darin Starkey, Guei-Yuan Lueh

prev 1 2 3 next