Patents by Inventor Chandra Gurram
Chandra Gurram has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240362180Abstract: Graphics processors and graphics processing units having dot product accumulate instructions for a hybrid floating point format are disclosed. In one embodiment, a graphics multiprocessor comprises an instruction unit to dispatch instructions and a processing resource coupled to the instruction unit. The processing resource is configured to receive a dot product accumulate instruction from the instruction unit and to process the dot product accumulate instruction using a bfloat16 number (BF16) format.Type: ApplicationFiled: April 26, 2024Publication date: October 31, 2024Applicant: Intel CorporationInventors: Subramaniam Maiyuran, Shubra Marwaha, Ashutosh Garg, Supratim Pal, Jorge Parra, Chandra Gurram, Varghese George, Darin Starkey, Guei-Yuan Lueh
-
Publication number: 20240320000Abstract: An apparatus to facilitate utilizing structured sparsity in systolic arrays is disclosed. The apparatus includes a processor comprising a systolic array to receive data from a plurality of source registers, the data comprising unpacked source data, structured source data that is packed based on sparsity, and metadata corresponding to the structured source data; identify portions of the unpacked source data to multiply with the structured source data, the portions of the unpacked source data identified based on the metadata; and output, to a destination register, a result of multiplication of the portions of the unpacked source data and the structured source data.Type: ApplicationFiled: March 29, 2024Publication date: September 26, 2024Applicant: Intel CorporationInventors: Subramaniam Maiyuran, Jorge Parra, Ashutosh Garg, Chandra Gurram, Chunhui Mei, Durgesh Borkar, Shubra Marwaha, Supratim Pal, Varghese George, Wei Xiong, Yan Li, Yongsheng Liu, Dipankar Das, Sasikanth Avancha, Dharma Teja Vooturi, Naveen K. Mellempudi
-
Patent number: 12093213Abstract: An apparatus to facilitate computing efficient cross channel operations in parallel computing machines using systolic arrays is disclosed. The apparatus includes a plurality of registers and one or more processing elements communicably coupled to the plurality of registers. The one or more processing elements include a systolic array circuit to perform cross-channel operations on source data received from a single source register of the plurality of registers, wherein the systolic array circuit is modified to: receive inputs from the single source register at different stages of the systolic array circuit; perform cross-channel operations at channels of the systolic array circuit; bypass disabled channels of the systolic array circuit, the disabled channels not used to compute the cross-channel operations; and broadcast a final result of a final stage of the systolic array circuit to all channels of a destination register.Type: GrantFiled: May 1, 2023Date of Patent: September 17, 2024Assignee: INTEL CORPORATIONInventors: Subramaniam Maiyuran, Jorge Parra, Supratim Pal, Chandra Gurram
-
Patent number: 12039001Abstract: Described herein is a graphics processor including a plurality of processing clusters coupled with a host interface, each processing cluster comprising a plurality of multiprocessors, the plurality of multiprocessors interconnected via a data interconnect, and each multiprocessor comprising sparse matrix multiply acceleration hardware including a systolic processing array with feedback inputs.Type: GrantFiled: April 17, 2023Date of Patent: July 16, 2024Assignee: Intel CorporationInventors: Subramaniam Maiyuran, Jorge Parra, Supratim Pal, Ashutosh Garg, Shubra Marwaha, Chandra Gurram, Darin Starkey, Durgesh Borkar, Varghese George
-
Patent number: 12007935Abstract: Graphics processors and graphics processing units having dot product accumulate instructions for a hybrid floating point format are disclosed. In one embodiment, a graphics multiprocessor comprises an instruction unit to dispatch instructions and a processing resource coupled to the instruction unit. The processing resource is configured to receive a dot product accumulate instruction from the instruction unit and to process the dot product accumulate instruction using a bfloat16 number (BF16) format.Type: GrantFiled: March 14, 2020Date of Patent: June 11, 2024Assignee: Intel CorporationInventors: Subramaniam Maiyuran, Shubra Marwaha, Ashutosh Garg, Supratim Pal, Jorge Parra, Chandra Gurram, Varghese George, Darin Starkey, Guei-Yuan Lueh
-
Patent number: 11977885Abstract: An apparatus to facilitate utilizing structured sparsity in systolic arrays is disclosed. The apparatus includes a processor comprising a systolic array to receive data from a plurality of source registers, the data comprising unpacked source data, structured source data that is packed based on sparsity, and metadata corresponding to the structured source data; identify portions of the unpacked source data to multiply with the structured source data, the portions of the unpacked source data identified based on the metadata; and output, to a destination register, a result of multiplication of the portions of the unpacked source data and the structured source data.Type: GrantFiled: November 30, 2020Date of Patent: May 7, 2024Assignee: INTEL CORPORATIONInventors: Subramaniam Maiyuran, Jorge Parra, Ashutosh Garg, Chandra Gurram, Chunhui Mei, Durgesh Borkar, Shubra Marwaha, Supratim Pal, Varghese George, Wei Xiong, Yan Li, Yongsheng Liu, Dipankar Das, Sasikanth Avancha, Dharma Teja Vooturi, Naveen K. Mellempudi
-
Patent number: 11954063Abstract: Described herein is a graphics processing unit (GPU) configured to receive an instruction having multiple operands, where the instruction is a single instruction multiple data (SIMD) instruction configured to use a bfloat16 (BF16) number format and the BF16 number format is a sixteen-bit floating point format having an eight-bit exponent. The GPU can process the instruction using the multiple operands, where to process the instruction includes to perform a multiply operation, perform an addition to a result of the multiply operation, and apply a rectified linear unit function to a result of the addition.Type: GrantFiled: February 17, 2023Date of Patent: April 9, 2024Assignee: Intel CorporationInventors: Subramaniam Maiyuran, Shubra Marwaha, Ashutosh Garg, Supratim Pal, Jorge Parra, Chandra Gurram, Varghese George, Darin Starkey, Guei-Yuan Lueh
-
Publication number: 20230367740Abstract: An apparatus to facilitate computing efficient cross channel operations in parallel computing machines using systolic arrays is disclosed. The apparatus includes a plurality of registers and one or more processing elements communicably coupled to the plurality of registers. The one or more processing elements include a systolic array circuit to perform cross-channel operations on source data received from a single source register of the plurality of registers, wherein the systolic array circuit is modified to: receive inputs from the single source register at different stages of the systolic array circuit; perform cross-channel operations at channels of the systolic array circuit; bypass disabled channels of the systolic array circuit, the disabled channels not used to compute the cross-channel operations; and broadcast a final result of a final stage of the systolic array circuit to all channels of a destination register.Type: ApplicationFiled: May 1, 2023Publication date: November 16, 2023Applicant: Intel CorporationInventors: Subramaniam Maiyuran, Jorge Parra, Supratim Pal, Chandra Gurram
-
Publication number: 20230281272Abstract: Described herein is a graphics processor including a plurality of processing clusters coupled with a host interface, each processing cluster comprising a plurality of multiprocessors, the plurality of multiprocessors interconnected via a data interconnect, and each multiprocessor comprising sparse matrix multiply acceleration hardware including a systolic processing array with feedback inputs.Type: ApplicationFiled: April 17, 2023Publication date: September 7, 2023Applicant: Intel CorporationInventors: SUBRAMANIAM MAIYURAN, JORGE PARRA, SUPRATIM PAL, ASHUTOSH GARG, SHUBRA MARWAHA, CHANDRA GURRAM, DARIN STARKEY, DURGESH BORKAR, VARGHESE GEORGE
-
Patent number: 11709793Abstract: Described herein is a graphics processing unit (GPU) comprising a first processing cluster to perform parallel processing operations, the parallel processing operations including a ray tracing operation and a matrix multiply operation; and a second processing cluster coupled to the first processing cluster, wherein the first processing cluster includes a floating-point unit to perform floating point operations, the floating-point unit is configured to process an instruction using a bfloat16 (BF16) format with a multiplier to multiply second and third source operands while an accumulator adds a first source operand with output from the multiplier.Type: GrantFiled: May 27, 2022Date of Patent: July 25, 2023Assignee: Intel CorporationInventors: Subramaniam Maiyuran, Shubra Marwaha, Ashutosh Garg, Supratim Pal, Jorge Parra, Chandra Gurram, Varghese George, Darin Starkey, Guei-Yuan Lueh
-
Publication number: 20230195685Abstract: Described herein is a graphics processing unit (GPU) configured to receive an instruction having multiple operands, where the instruction is a single instruction multiple data (SIMD) instruction configured to use a bfloat16 (BF16) number format and the BF16 number format is a sixteen-bit floating point format having an eight-bit exponent. The GPU can process the instruction using the multiple operands, where to process the instruction includes to perform a multiply operation, perform an addition to a result of the multiply operation, and apply a rectified linear unit function to a result of the addition.Type: ApplicationFiled: February 17, 2023Publication date: June 22, 2023Applicant: Intel CorporationInventors: Subramaniam Maiyuran, Shubra Marwaha, Ashutosh Garg, Supratim Pal, Jorge Parra, Chandra Gurram, Varghese George, Darin Starkey, Guei-Yuan Lueh
-
Patent number: 11669490Abstract: An apparatus to facilitate computing efficient cross channel operations in parallel computing machines using systolic arrays is disclosed. The apparatus includes a plurality of registers and one or more processing elements communicably coupled to the plurality of registers. The one or more processing elements include a systolic array circuit to perform cross-channel operations on source data received from a single source register of the plurality of registers, wherein the systolic array circuit is modified to: receive inputs from the single source register at different stages of the systolic array circuit; perform cross-channel operations at channels of the systolic array circuit; bypass disabled channels of the systolic array circuit, the disabled channels not used to compute the cross-channel operations; and broadcast a final result of a final stage of the systolic array circuit to all channels of a destination register.Type: GrantFiled: November 3, 2021Date of Patent: June 6, 2023Assignee: INTEL CORPORATIONInventors: Subramaniam Maiyuran, Jorge Parra, Supratim Pal, Chandra Gurram
-
Patent number: 11636174Abstract: Described herein is an accelerator device including a host interface, a fabric interconnect coupled with the host interface, and one or more hardware tiles coupled with the fabric interconnect, the one or more hardware tiles including sparse matrix multiply acceleration hardware including a systolic array with feedback inputs.Type: GrantFiled: November 16, 2021Date of Patent: April 25, 2023Assignee: Intel CorporationInventors: Subramaniam Maiyuran, Jorge Parra, Supratim Pal, Ashutosh Garg, Shubra Marwaha, Chandra Gurram, Darin Starkey, Durgesh Borkar, Varghese George
-
Publication number: 20230088743Abstract: An apparatus to facilitate gathering payload from arbitrary registers for send messages in a graphics environment is disclosed. The apparatus includes processing resources comprising execution circuitry to receive a send gather message instruction identifying a number of registers to access for a send message and identifying IDs of a plurality of individual registers corresponding to the number of registers; decode a first phase of the send gather message instruction; based on decoding the first phase, cause a second phase of the send gather message instruction to bypass an instruction decode stage; and dispatch the first phase subsequently followed by dispatch of the second phase to a send pipeline. The apparatus can also perform an immediate move of the IDs of the plurality of individual registers to an architectural register of the execution circuitry and include a pointer to the architectural register in the send gather message instruction.Type: ApplicationFiled: September 22, 2021Publication date: March 23, 2023Applicant: Intel CorporationInventors: Supratim Pal, Chandra Gurram, Fan-Yin Tzeng, Subramaniam Maiyuran, Guei-Yuan Lueh, Timothy R. Bauer, Vikranth Vemulapalli, Wei-Yu Chen
-
Publication number: 20220413924Abstract: A processing apparatus can include a general-purpose parallel processing engine comprising a matrix accelerator including a multi-stage systolic array, where each stage includes multiple processing elements associated with multiple processing channels. The multiple processing elements are configured to receive output sparsity metadata that is independent of input sparsity of input matrix elements and perform processing operations on the input matrix elements based on the output sparsity metadata.Type: ApplicationFiled: June 25, 2021Publication date: December 29, 2022Applicant: Intel CorporationInventors: Jorge Parra, Supratim Pal, Jiasheng Chen, Chandra Gurram
-
Publication number: 20220413803Abstract: A processing apparatus is described herein that includes a general-purpose parallel processing engine comprising a matrix accelerator including one or more systolic arrays, at least one of the one or more systolic arrays comprising multiple pipeline stages, each pipeline stage of the multiple pipeline stages including multiple processing elements, the multiple processing elements configured to perform processing operations on input matrix elements based on output sparsity metadata. The output sparsity metadata indicates to the multiple processing elements to bypass multiplication for a first row of elements of a second matrix and multiply a second row of elements of the second matrix with a column of matrix elements of a first matrix.Type: ApplicationFiled: June 25, 2021Publication date: December 29, 2022Applicant: Intel CorporationInventors: Jorge Parra, Fangwen Fu, Subramaniam Maiyuran, Varghese George, Mike Macpherson, Supratim Pal, Chandra Gurram, Sabareesh Ganapathy, Sasikanth Avancha, Dharma Teja Vooturi, Naveen Mellempudi, Dipankar Das
-
Publication number: 20220414054Abstract: A processing apparatus described herein includes a general-purpose parallel processing engine comprising a systolic array having multiple pipelines, each of the multiple pipelines including multiple pipeline stages, wherein the multiple pipelines include a first pipeline, a second pipeline, and a common input shared between the first pipeline and the second pipeline.Type: ApplicationFiled: June 25, 2021Publication date: December 29, 2022Applicant: Intel CorporationInventors: Jorge Parra, Jiasheng Chen, Supratim Pal, Fangwen Fu, Sabareesh Ganapathy, Chandra Gurram, Chunhui Mei, Yue Qi
-
Publication number: 20220414053Abstract: A processing apparatus includes a processing resource including a general-purpose parallel processing engine and a matrix accelerator. The matrix accelerator includes first circuitry to receive a command to perform operations associated with an instruction, second circuitry to configure the matrix accelerator according to a physical depth of a systolic array within the matrix accelerator and a logical depth associated with the instruction, third circuitry to read operands for the instruction from a register file associated with the systolic array, fourth circuitry to perform operations for the instruction via one or more passes through one or more physical pipeline stages of the systolic array based on a configuration performed by the second circuitry, and fifth circuitry to write output of the operations to the register file associated with the systolic array.Type: ApplicationFiled: June 24, 2021Publication date: December 29, 2022Applicant: Intel CorporationInventors: Jorge Parra, Wei-yu Chen, Kaiyu Chen, Varghese George, Junjie Gu, Chandra Gurram, Guei-Yuan Lueh, Stephen Junkins, Subramaniam Maiyuran, Supratim Pal
-
Publication number: 20220413916Abstract: Provision of multiple register allocation sizes for threads is described. An example of a system includes one or more processors including a graphics processor, the graphics processor including at least a first local thread dispatcher (TDL) and multiple processing resources, each processing resource including a plurality of registers; and memory for storage of data for processing, wherein the one or more processors are to determine a register size for a first thread; identify one or more processing resources having sufficient register space for the first thread; select a processing resource of the one or more processing resources having sufficient register space to assign the first thread; select an available thread slot of the selected processing resource for the first thread; and allocate registers of the selected processing resource for the first thread.Type: ApplicationFiled: June 25, 2021Publication date: December 29, 2022Applicant: Intel CorporationInventors: Chandra Gurram, Wei-Yu Chen, Vikranth Vemulapalli, Subramaniam Maiyuran, Jorge Eduardo Parra Osorio, Shuai Mu, Guei-Yuan Lueh, Supratim Pal
-
Publication number: 20220413851Abstract: A processing apparatus includes a general-purpose parallel processing engine including a set of multiple processing elements including a single precision floating-point unit, a double precision floating point unit, and an integer unit; a matrix accelerator including one or more systolic arrays; a first register file coupled with a first read control circuit, wherein the first read control circuit couples with the set of multiple processing elements and the matrix accelerator to arbitrate read requests to the first register file from the set of multiple processing elements and the matrix accelerator; and a second register file coupled with a second read control circuit, wherein the second read control circuit couples with the matrix accelerator to arbitrate read requests to the second register file from the matrix accelerator and limit access to the second register file by the set of multiple processing elements.Type: ApplicationFiled: June 25, 2021Publication date: December 29, 2022Applicant: Intel CorporationInventors: Chandra Gurram, Wei-yu Chen, Fangwen Fu, Sabareesh Ganapathy, Varghese George, Guei-Yuan Lueh, Subramaniam Maiyuran, Mike Macpherson, Supratim Pal, Jorge Parra