Patents by Inventor SHUBRA MARWAHA
SHUBRA MARWAHA has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11954063Abstract: Described herein is a graphics processing unit (GPU) configured to receive an instruction having multiple operands, where the instruction is a single instruction multiple data (SIMD) instruction configured to use a bfloat16 (BF16) number format and the BF16 number format is a sixteen-bit floating point format having an eight-bit exponent. The GPU can process the instruction using the multiple operands, where to process the instruction includes to perform a multiply operation, perform an addition to a result of the multiply operation, and apply a rectified linear unit function to a result of the addition.Type: GrantFiled: February 17, 2023Date of Patent: April 9, 2024Assignee: Intel CorporationInventors: Subramaniam Maiyuran, Shubra Marwaha, Ashutosh Garg, Supratim Pal, Jorge Parra, Chandra Gurram, Varghese George, Darin Starkey, Guei-Yuan Lueh
-
Publication number: 20240053985Abstract: Embodiments described herein provide an apparatus comprising a plurality of processing resources including a first processing resource and a second processing resource, a shared local memory communicatively coupled to the first processing resource and the second processing resource, and a processor to receive an instruction to initiate a matrix multiplication operation, write a first set of matrix data into a first set of registers, and share the first set of matrix data between the first processing resource and the second processing resource for use in the matrix multiplication operation. Other embodiments may be described and claimed.Type: ApplicationFiled: October 11, 2023Publication date: February 15, 2024Applicant: Intel CorporationInventors: SUBRAMANIAM MAIYURAN, VARGHESE GEORGE, JOYDEEP RAY, ASHUTOSH GARG, JORGE PARRA, SHUBH SHAH, SHUBRA MARWAHA
-
Publication number: 20230297373Abstract: Embodiments described herein provided for an instruction and associated logic to enable GPGPU program code to access special purpose hardware logic to accelerate dot product operations. One embodiment provides for a graphics processing unit comprising a fetch unit to fetch a single instruction for execution, a decode unit to decode the single instruction into a decoded instruction, wherein the decoded instruction is to cause the graphics processing unit to perform a set of parallel dot product operations on elements of input matrices, and a systolic dot product unit to execute the decoded instruction across one or more parallel processor lanes using multiple systolic layers associated with multiple pipeline stages. The multiple pipeline stages include one or more sets of interconnected multipliers and adders to compute multiple concurrent dot products.Type: ApplicationFiled: April 26, 2023Publication date: September 21, 2023Applicant: Intel CorporationInventors: SUBRAMANIAM MAIYURAN, GUEI-YUAN LUEH, SUPRATIM PAL, ASHUTOSH GARG, CHANDRA S. GURRAM, JORGE E. PARRA, JUNJIE GU, KONRAD TRIFUNOVIC, HONG BIN LIAO, MIKE B. MACPHERSON, SHUBH B. SHAH, SHUBRA MARWAHA, STEPHEN JUNKINS, TIMOTHY R. BAUER, VARGHESE GEORGE, WEIYU CHEN
-
Publication number: 20230281272Abstract: Described herein is a graphics processor including a plurality of processing clusters coupled with a host interface, each processing cluster comprising a plurality of multiprocessors, the plurality of multiprocessors interconnected via a data interconnect, and each multiprocessor comprising sparse matrix multiply acceleration hardware including a systolic processing array with feedback inputs.Type: ApplicationFiled: April 17, 2023Publication date: September 7, 2023Applicant: Intel CorporationInventors: SUBRAMANIAM MAIYURAN, JORGE PARRA, SUPRATIM PAL, ASHUTOSH GARG, SHUBRA MARWAHA, CHANDRA GURRAM, DARIN STARKEY, DURGESH BORKAR, VARGHESE GEORGE
-
Patent number: 11709793Abstract: Described herein is a graphics processing unit (GPU) comprising a first processing cluster to perform parallel processing operations, the parallel processing operations including a ray tracing operation and a matrix multiply operation; and a second processing cluster coupled to the first processing cluster, wherein the first processing cluster includes a floating-point unit to perform floating point operations, the floating-point unit is configured to process an instruction using a bfloat16 (BF16) format with a multiplier to multiply second and third source operands while an accumulator adds a first source operand with output from the multiplier.Type: GrantFiled: May 27, 2022Date of Patent: July 25, 2023Assignee: Intel CorporationInventors: Subramaniam Maiyuran, Shubra Marwaha, Ashutosh Garg, Supratim Pal, Jorge Parra, Chandra Gurram, Varghese George, Darin Starkey, Guei-Yuan Lueh
-
Publication number: 20230195685Abstract: Described herein is a graphics processing unit (GPU) configured to receive an instruction having multiple operands, where the instruction is a single instruction multiple data (SIMD) instruction configured to use a bfloat16 (BF16) number format and the BF16 number format is a sixteen-bit floating point format having an eight-bit exponent. The GPU can process the instruction using the multiple operands, where to process the instruction includes to perform a multiply operation, perform an addition to a result of the multiply operation, and apply a rectified linear unit function to a result of the addition.Type: ApplicationFiled: February 17, 2023Publication date: June 22, 2023Applicant: Intel CorporationInventors: Subramaniam Maiyuran, Shubra Marwaha, Ashutosh Garg, Supratim Pal, Jorge Parra, Chandra Gurram, Varghese George, Darin Starkey, Guei-Yuan Lueh
-
Patent number: 11640297Abstract: Embodiments described herein provided for an instruction and associated logic to enable GPGPU program code to access special purpose hardware logic to accelerate dot product operations. One embodiment provides for a graphics processing unit comprising a fetch unit to fetch an instruction for execution and a decode unit to decode the instruction into a decoded instruction. The decoded instruction is a matrix instruction to cause the graphics processing unit to perform a parallel dot product operation. The GPGPU also includes systolic dot product circuitry to execute the decoded instruction across one or more SIMD lanes using multiple systolic layers, wherein to execute the decoded instruction, a dot product computed at a first systolic layer is to be output to a second systolic layer, wherein each systolic layer includes one or more sets of interconnected multipliers and adders, each set of multipliers and adders to generate a dot product.Type: GrantFiled: June 15, 2021Date of Patent: May 2, 2023Assignee: Intel CorporationInventors: Subramaniam Maiyuran, Guei-Yuan Lueh, Supratim Pal, Ashutosh Garg, Chandra S. Gurram, Jorge E. Parra, Junjie Gu, Konrad Trifunovic, Hong Bin Liao, Mike B. MacPherson, Shubh B. Shah, Shubra Marwaha, Stephen Junkins, Timothy R. Bauer, Varghese George, Weiyu Chen
-
Patent number: 11636174Abstract: Described herein is an accelerator device including a host interface, a fabric interconnect coupled with the host interface, and one or more hardware tiles coupled with the fabric interconnect, the one or more hardware tiles including sparse matrix multiply acceleration hardware including a systolic array with feedback inputs.Type: GrantFiled: November 16, 2021Date of Patent: April 25, 2023Assignee: Intel CorporationInventors: Subramaniam Maiyuran, Jorge Parra, Supratim Pal, Ashutosh Garg, Shubra Marwaha, Chandra Gurram, Darin Starkey, Durgesh Borkar, Varghese George
-
Publication number: 20220365901Abstract: Described herein is a graphics processing unit (GPU) comprising a first processing cluster to perform parallel processing operations, the parallel processing operations including a ray tracing operation and a matrix multiply operation; and a second processing cluster coupled to the first processing cluster, wherein the first processing cluster includes a floating-point unit to perform floating point operations, the floating-point unit is configured to process an instruction using a bfloat16 (BF16) format with a multiplier to multiply second and third source operands while an accumulator adds a first source operand with output from the multiplier.Type: ApplicationFiled: May 27, 2022Publication date: November 17, 2022Applicant: Intel CorporationInventors: Subramaniam Maiyuran, Shubra Marwaha, Ashutosh Garg, Supratim Pal, Jorge Parra, Chandra Gurram, Varghese George, Darin Starkey, Guei-Yuan Lueh
-
Patent number: 11416580Abstract: An apparatus to facilitate matrix multiplication operations. The apparatus comprises multiplication hardware to operate in a dot product mode, wherein a multiplication stage included in the multiplication hardware is configured as a dot product of a number of bit vectors (N) to perform N×N multiplication operations on a plurality of multiplicands and perform addition operations on results of the N×N multiplication operations.Type: GrantFiled: November 13, 2019Date of Patent: August 16, 2022Assignee: Intel CorporationInventors: Nevin Mathew, Shubra Marwaha, Ashutosh Garg
-
Publication number: 20220206795Abstract: Embodiments described herein provide an apparatus comprising a plurality of processing resources including a first processing resource and a second processing resource, a shared local memory communicatively coupled to the first processing resource and the second processing resource, and a processor to receive an instruction to initiate a matrix multiplication operation, write a first set of matrix data into a first set of registers, and share the first set of matrix data between the first processing resource and the second processing resource for use in the matrix multiplication operation. Other embodiments may be described and claimed.Type: ApplicationFiled: January 5, 2022Publication date: June 30, 2022Applicant: Intel CorporationInventors: SUBRAMANIAM MAIYURAN, VARGHESE GEORGE, JOYDEEP RAY, ASHUTOSH GARG, JORGE PARRA, SHUBH SHAH, SHUBRA MARWAHA
-
Patent number: 11361496Abstract: Described herein is a graphics processing unit (GPU) comprising a single instruction, multiple thread (SIMT) multiprocessor comprising an instruction cache, a shared memory coupled with the instruction cache, and circuitry coupled with the shared memory and the instruction cache, the circuitry including multiple texture units, a first core including hardware to accelerate matrix operations, and a second core configured to receive an instruction having multiple operands in a bfloat16 (BF16) number format, wherein the multiple operands include a first source operand, a second source operand, and a third source operand, and the BF16 number format is a sixteen-bit floating point format having an eight-bit exponent and process the instruction, wherein to process the instruction includes to multiply the second source operand by the third source operand and add a first source operand to a result of the multiply.Type: GrantFiled: June 14, 2021Date of Patent: June 14, 2022Assignee: Intel CorporationInventors: Subramaniam Maiyuran, Shubra Marwaha, Ashutosh Garg, Supratim Pal, Jorge Parra, Chandra Gurram, Varghese George, Darin Starkey, Guei-Yuan Lueh
-
Publication number: 20220171827Abstract: An apparatus to facilitate acceleration of matrix multiplication operations. The apparatus comprises a systolic array including matrix multiplication hardware to perform multiply-add operations on received matrix data comprising data from a plurality of input matrices and sparse matrix acceleration hardware to detect zero values in the matrix data and perform one or more optimizations on the matrix data to reduce multiply-add operations to be performed by the matrix multiplication hardware.Type: ApplicationFiled: November 16, 2021Publication date: June 2, 2022Applicant: Intel CorporationInventors: SUBRAMANIAM MAIYURAN, MATHEW NEVIN, JORGE PARRA, ASHUTOSH GARG, SHUBRA MARWAHA, SHUBH SHAH
-
Publication number: 20220156343Abstract: Described herein is an accelerator device including a host interface, a fabric interconnect coupled with the host interface, and one or more hardware tiles coupled with the fabric interconnect, the one or more hardware tiles including sparse matrix multiply acceleration hardware including a systolic array with feedback inputs.Type: ApplicationFiled: November 16, 2021Publication date: May 19, 2022Applicant: Intel CorporationInventors: SUBRAMANIAM MAIYURAN, JORGE PARRA, SUPRATIM PAL, ASHUTOSH GARG, SHUBRA MARWAHA, CHANDRA GURRAM, DARIN STARKEY, DURGESH BORKAR, VARGHESE GEORGE
-
Publication number: 20220129266Abstract: Graphics processors and graphics processing units having dot product accumulate instructions for a hybrid floating point format are disclosed. In one embodiment, a graphics multiprocessor comprises an instruction unit to dispatch instructions and a processing resource coupled to the instruction unit. The processing resource is configured to receive a dot product accumulate instruction from the instruction unit and to process the dot product accumulate instruction using a bfloat16 number (BF16) format.Type: ApplicationFiled: March 14, 2020Publication date: April 28, 2022Applicant: Intel CorporationInventors: Subramaniam Maiyuran, Shubra Marwaha, Ashutosh Garg, Supratim Pal, Jorge Parra, Chandra Gurram, Varghese George, Darin Starkey, Guei-Yuan Lueh
-
Patent number: 11221848Abstract: Embodiments described herein provide an apparatus comprising a plurality of processing resources including a first processing resource and a second processing resource, a shared local memory communicatively coupled to the first processing resource and the second processing resource, and a processor to receive an instruction to initiate a matrix multiplication operation, write a first set of matrix data into a first set of registers, and share the first set of matrix data between the first processing resource and the second processing resource for use in the matrix multiplication operation. Other embodiments may be described and claimed.Type: GrantFiled: September 25, 2019Date of Patent: January 11, 2022Assignee: INTEL CORPORATIONInventors: Subramaniam Maiyuran, Varghese George, Joydeep Ray, Ashutosh Garg, Jorge Parra, Shubh Shah, Shubra Marwaha
-
Patent number: 11204977Abstract: Described herein is an accelerator device including a host interface, a fabric interconnect coupled with the host interface, and one or more hardware tiles coupled with the fabric interconnect, the one or more hardware tiles including sparse matrix multiply acceleration hardware including a systolic array with feedback inputs.Type: GrantFiled: June 26, 2020Date of Patent: December 21, 2021Assignee: Intel CorporationInventors: Subramaniam Maiyuran, Jorge Parra, Supratim Pal, Ashutosh Garg, Shubra Marwaha, Chandra Gurram, Darin Starkey, Durgesh Borkar, Varghese George
-
Patent number: 11188618Abstract: An apparatus to facilitate acceleration of matrix multiplication operations. The apparatus comprises a systolic array including matrix multiplication hardware to perform multiply-add operations on received matrix data comprising data from a plurality of input matrices and sparse matrix acceleration hardware to detect zero values in the matrix data and perform one or more optimizations on the matrix data to reduce multiply-add operations to be performed by the matrix multiplication hardware.Type: GrantFiled: September 5, 2019Date of Patent: November 30, 2021Assignee: Intel CorporationInventors: Subramaniam Maiyuran, Mathew Nevin, Jorge Parra, Ashutosh Garg, Shubra Marwaha, Shubh Shah
-
Publication number: 20210349966Abstract: Described herein is an accelerator device including a host interface, a fabric interconnect coupled with the host interface, and one or more hardware tiles coupled with the fabric interconnect, the one or more hardware tiles including sparse matrix multiply acceleration hardware including a systolic array with feedback inputs.Type: ApplicationFiled: June 26, 2020Publication date: November 11, 2021Applicant: Intel CorporationInventors: SUBRAMANIAM MAIYURAN, JORGE PARRA, SUPRATIM PAL, ASHUTOSH GARG, SHUBRA MARWAHA, CHANDRA GURRAM, DARIN STARKEY, DURGESH BORKAR, VARGHESE GEORGE
-
Publication number: 20210312697Abstract: Described herein is a graphics processing unit (GPU) comprising a single instruction, multiple thread (SIMT) multiprocessor comprising an instruction cache, a shared memory coupled with the instruction cache, and circuitry coupled with the shared memory and the instruction cache, the circuitry including multiple texture units, a first core including hardware to accelerate matrix operations, and a second core configured to receive an instruction having multiple operands in a bfloat16 (BF16) number format, wherein the multiple operands include a first source operand, a second source operand, and a third source operand, and the BF16 number format is a sixteen-bit floating point format having an eight-bit exponent and process the instruction, wherein to process the instruction includes to multiply the second source operand by the third source operand and add a first source operand to a result of the multiply.Type: ApplicationFiled: June 14, 2021Publication date: October 7, 2021Applicant: Intel CorporationInventors: Subramaniam Maiyuran, Shubra Marwaha, Ashutosh Garg, Supratim Pal, Jorge Parra, Chandra Gurram, Varghese George, Darin Starkey, Guei-Yuan Lueh