Patents by Inventor Jorge PARRA

Jorge PARRA has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20220171827
    Abstract: An apparatus to facilitate acceleration of matrix multiplication operations. The apparatus comprises a systolic array including matrix multiplication hardware to perform multiply-add operations on received matrix data comprising data from a plurality of input matrices and sparse matrix acceleration hardware to detect zero values in the matrix data and perform one or more optimizations on the matrix data to reduce multiply-add operations to be performed by the matrix multiplication hardware.
    Type: Application
    Filed: November 16, 2021
    Publication date: June 2, 2022
    Applicant: Intel Corporation
    Inventors: SUBRAMANIAM MAIYURAN, MATHEW NEVIN, JORGE PARRA, ASHUTOSH GARG, SHUBRA MARWAHA, SHUBH SHAH
  • Publication number: 20220156343
    Abstract: Described herein is an accelerator device including a host interface, a fabric interconnect coupled with the host interface, and one or more hardware tiles coupled with the fabric interconnect, the one or more hardware tiles including sparse matrix multiply acceleration hardware including a systolic array with feedback inputs.
    Type: Application
    Filed: November 16, 2021
    Publication date: May 19, 2022
    Applicant: Intel Corporation
    Inventors: SUBRAMANIAM MAIYURAN, JORGE PARRA, SUPRATIM PAL, ASHUTOSH GARG, SHUBRA MARWAHA, CHANDRA GURRAM, DARIN STARKEY, DURGESH BORKAR, VARGHESE GEORGE
  • Patent number: 11327754
    Abstract: Methods and apparatus for approximation using polynomial functions are disclosed. In one embodiment, a processor comprises decoding and execution circuitry. The decoding circuitry is to decode an instruction, where the instruction comprises a first operand specifying an output location and a second operand specifying a plurality of data element values to be computed. The execution circuitry is to execute the decoded instruction. The execution includes to compute a result for each of the plurality of data element values using a polynomial function to approximate a complex function, where the computation uses coefficients stored in a lookup location for the complex function, and where data element values within different data element value ranges use different sets of coefficients. The execution further includes to store results of the computation in the output location.
    Type: Grant
    Filed: March 27, 2019
    Date of Patent: May 10, 2022
    Assignee: INTEL CORPORATION
    Inventors: Jorge Parra, Dan Baum, Robert S. Chappell, Michael Espig, Varghese George, Alexander Heinecke, Christopher Hughes, Subramaniam Maiyuran, Prasoonkumar Surti, Ronen Zohar, Elmoustapha Ould-Ahmed-Vall
  • Publication number: 20220129266
    Abstract: Graphics processors and graphics processing units having dot product accumulate instructions for a hybrid floating point format are disclosed. In one embodiment, a graphics multiprocessor comprises an instruction unit to dispatch instructions and a processing resource coupled to the instruction unit. The processing resource is configured to receive a dot product accumulate instruction from the instruction unit and to process the dot product accumulate instruction using a bfloat16 number (BF16) format.
    Type: Application
    Filed: March 14, 2020
    Publication date: April 28, 2022
    Applicant: Intel Corporation
    Inventors: Subramaniam Maiyuran, Shubra Marwaha, Ashutosh Garg, Supratim Pal, Jorge Parra, Chandra Gurram, Varghese George, Darin Starkey, Guei-Yuan Lueh
  • Publication number: 20220058158
    Abstract: An apparatus to facilitate computing efficient cross channel operations in parallel computing machines using systolic arrays is disclosed. The apparatus includes a plurality of registers and one or more processing elements communicably coupled to the plurality of registers. The one or more processing elements include a systolic array circuit to perform cross-channel operations on source data received from a single source register of the plurality of registers, wherein the systolic array circuit is modified to: receive inputs from the single source register at different stages of the systolic array circuit; perform cross-channel operations at channels of the systolic array circuit; bypass disabled channels of the systolic array circuit, the disabled channels not used to compute the cross-channel operations; and broadcast a final result of a final stage of the systolic array circuit to all channels of a destination register.
    Type: Application
    Filed: November 3, 2021
    Publication date: February 24, 2022
    Applicant: Intel Corporation
    Inventors: Subramaniam Maiyuran, Jorge Parra, Supratim Pal, Chandra Gurram
  • Patent number: 11221848
    Abstract: Embodiments described herein provide an apparatus comprising a plurality of processing resources including a first processing resource and a second processing resource, a shared local memory communicatively coupled to the first processing resource and the second processing resource, and a processor to receive an instruction to initiate a matrix multiplication operation, write a first set of matrix data into a first set of registers, and share the first set of matrix data between the first processing resource and the second processing resource for use in the matrix multiplication operation. Other embodiments may be described and claimed.
    Type: Grant
    Filed: September 25, 2019
    Date of Patent: January 11, 2022
    Assignee: INTEL CORPORATION
    Inventors: Subramaniam Maiyuran, Varghese George, Joydeep Ray, Ashutosh Garg, Jorge Parra, Shubh Shah, Shubra Marwaha
  • Patent number: 11204977
    Abstract: Described herein is an accelerator device including a host interface, a fabric interconnect coupled with the host interface, and one or more hardware tiles coupled with the fabric interconnect, the one or more hardware tiles including sparse matrix multiply acceleration hardware including a systolic array with feedback inputs.
    Type: Grant
    Filed: June 26, 2020
    Date of Patent: December 21, 2021
    Assignee: Intel Corporation
    Inventors: Subramaniam Maiyuran, Jorge Parra, Supratim Pal, Ashutosh Garg, Shubra Marwaha, Chandra Gurram, Darin Starkey, Durgesh Borkar, Varghese George
  • Patent number: 11188618
    Abstract: An apparatus to facilitate acceleration of matrix multiplication operations. The apparatus comprises a systolic array including matrix multiplication hardware to perform multiply-add operations on received matrix data comprising data from a plurality of input matrices and sparse matrix acceleration hardware to detect zero values in the matrix data and perform one or more optimizations on the matrix data to reduce multiply-add operations to be performed by the matrix multiplication hardware.
    Type: Grant
    Filed: September 5, 2019
    Date of Patent: November 30, 2021
    Assignee: Intel Corporation
    Inventors: Subramaniam Maiyuran, Mathew Nevin, Jorge Parra, Ashutosh Garg, Shubra Marwaha, Shubh Shah
  • Publication number: 20210365402
    Abstract: An apparatus to facilitate computing efficient cross channel operations in parallel computing machines using systolic arrays is disclosed. The apparatus includes a plurality of registers and one or more processing elements communicably coupled to the plurality of registers. The one or more processing elements include a systolic array circuit to perform cross-channel operations on source data received from a single source register of the plurality of registers, the systolic array circuit modified to receive inputs from the single source register and route elements of the single source register to multiple channels in the systolic array circuit.
    Type: Application
    Filed: June 12, 2020
    Publication date: November 25, 2021
    Applicant: Intel Corporation
    Inventors: Subramaniam Maiyuran, Jorge Parra, Supratim Pal, Chandra Gurram
  • Patent number: 11182337
    Abstract: An apparatus to facilitate computing efficient cross channel operations in parallel computing machines using systolic arrays is disclosed. The apparatus includes a plurality of registers and one or more processing elements communicably coupled to the plurality of registers. The one or more processing elements include a systolic array circuit to perform cross-channel operations on source data received from a single source register of the plurality of registers, the systolic array circuit modified to receive inputs from the single source register and route elements of the single source register to multiple channels in the systolic array circuit.
    Type: Grant
    Filed: June 12, 2020
    Date of Patent: November 23, 2021
    Assignee: INTEL CORPORATION
    Inventors: Subramaniam Maiyuran, Jorge Parra, Supratim Pal, Chandra Gurram
  • Publication number: 20210349966
    Abstract: Described herein is an accelerator device including a host interface, a fabric interconnect coupled with the host interface, and one or more hardware tiles coupled with the fabric interconnect, the one or more hardware tiles including sparse matrix multiply acceleration hardware including a systolic array with feedback inputs.
    Type: Application
    Filed: June 26, 2020
    Publication date: November 11, 2021
    Applicant: Intel Corporation
    Inventors: SUBRAMANIAM MAIYURAN, JORGE PARRA, SUPRATIM PAL, ASHUTOSH GARG, SHUBRA MARWAHA, CHANDRA GURRAM, DARIN STARKEY, DURGESH BORKAR, VARGHESE GEORGE
  • Publication number: 20210312697
    Abstract: Described herein is a graphics processing unit (GPU) comprising a single instruction, multiple thread (SIMT) multiprocessor comprising an instruction cache, a shared memory coupled with the instruction cache, and circuitry coupled with the shared memory and the instruction cache, the circuitry including multiple texture units, a first core including hardware to accelerate matrix operations, and a second core configured to receive an instruction having multiple operands in a bfloat16 (BF16) number format, wherein the multiple operands include a first source operand, a second source operand, and a third source operand, and the BF16 number format is a sixteen-bit floating point format having an eight-bit exponent and process the instruction, wherein to process the instruction includes to multiply the second source operand by the third source operand and add a first source operand to a result of the multiply.
    Type: Application
    Filed: June 14, 2021
    Publication date: October 7, 2021
    Applicant: Intel Corporation
    Inventors: Subramaniam Maiyuran, Shubra Marwaha, Ashutosh Garg, Supratim Pal, Jorge Parra, Chandra Gurram, Varghese George, Darin Starkey, Guei-Yuan Lueh
  • Publication number: 20210089301
    Abstract: Embodiments described herein provide an apparatus comprising a plurality of processing resources including a first processing resource and a second processing resource, a shared local memory communicatively coupled to the first processing resource and the second processing resource, and a processor to receive an instruction to initiate a matrix multiplication operation, write a first set of matrix data into a first set of registers, and share the first set of matrix data between the first processing resource and the second processing resource for use in the matrix multiplication operation. Other embodiments may be described and claimed.
    Type: Application
    Filed: September 25, 2019
    Publication date: March 25, 2021
    Applicant: Intel Corporation
    Inventors: SUBRAMANIAM MAIYURAN, VARGHESE GEORGE, JOYDEEP RAY, ASHUTOSH GARG, JORGE PARRA, SHUBH SHAH, SHUBRA MARWAHA
  • Publication number: 20210081201
    Abstract: An apparatus to facilitate utilizing structured sparsity in systolic arrays is disclosed. The apparatus includes a processor comprising a systolic array to receive data from a plurality of source registers, the data comprising unpacked source data, structured source data that is packed based on sparsity, and metadata corresponding to the structured source data; identify portions of the unpacked source data to multiply with the structured source data, the portions of the unpacked source data identified based on the metadata; and output, to a destination register, a result of multiplication of the portions of the unpacked source data and the structured source data.
    Type: Application
    Filed: November 30, 2020
    Publication date: March 18, 2021
    Applicant: Intel Corporation
    Inventors: Subramaniam Maiyuran, Jorge Parra, Ashutosh Garg, Chandra Gurram, Chunhui Mei, Durgesh Borkar, Shubra Marwaha, Supratim Pal, Varghese George, Wei Xiong, Yan Li, Yongsheng Liu, Dipankar Das, Sasikanth Avancha, Dharma Teja Vooturi, Naveen K. Mellempudi
  • Publication number: 20210073318
    Abstract: An apparatus to facilitate acceleration of matrix multiplication operations. The apparatus comprises a systolic array including matrix multiplication hardware to perform multiply-add operations on received matrix data comprising data from a plurality of input matrices and sparse matrix acceleration hardware to detect zero values in the matrix data and perform one or more optimizations on the matrix data to reduce multiply-add operations to be performed by the matrix multiplication hardware.
    Type: Application
    Filed: September 5, 2019
    Publication date: March 11, 2021
    Applicant: Intel Corporation
    Inventors: SUBRAMANIAM MAIYURAN, MATHEW NEVIN, JORGE PARRA, ASHUTOSH GARG, SHUBRA MARWAHA, SHUBH SHAH
  • Publication number: 20200310800
    Abstract: Methods and apparatus for approximation using polynomial functions are disclosed. In one embodiment, a processor comprises decoding and execution circuitry. The decoding circuitry is to decode an instruction, where the instruction comprises a first operand specifying an output location and a second operand specifying a plurality of data element values to be computed. The execution circuitry is to execute the decoded instruction. The execution includes to compute a result for each of the plurality of data element values using a polynomial function to approximate a complex function, where the computation uses coefficients stored in a lookup location for the complex function, and where data element values within different data element value ranges use different sets of coefficients. The execution further includes to store results of the computation in the output location.
    Type: Application
    Filed: March 27, 2019
    Publication date: October 1, 2020
    Inventors: Jorge PARRA, Dan BAUM, Robert CHAPPELL, Michael ESPIG, Varghese GEORGE, Alexander HEINECKE, Christopher HUGHES, Subramaniam MAIYURAN, Elmoustapha OULD-AHMED-VALL, Prasoonkumar SURTI, Ronen ZOHAR