Patents by Inventor Subramaniam Maiyuran

Subramaniam Maiyuran has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230064069
    Abstract: Methods, systems and apparatuses provide for graphics processor technology that generates attribute plane coefficients based on barycentric coefficients, wherein the attribute plane coefficients are generated on a per polygon basis, and interpolates one or more pixel attributes based on the attribute plane coefficients. In one example, the technology excludes the barycentric coefficients from one or more per pixel operations.
    Type: Application
    Filed: July 30, 2021
    Publication date: March 2, 2023
    Inventors: Eric Hoekstra, Prasoonkumar Surti, Abhishek R. Appu, Subramaniam Maiyuran, Kalyan Bhiravabhatla
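    Illustrative sketch (C++): a minimal, hypothetical model of the per-polygon attribute plane setup described above, where plane coefficients are derived once from the triangle's vertex positions and attribute values so that per-pixel interpolation needs no barycentric weights; the function names and values are assumptions for illustration, not the patented design.
      #include <cstdio>

      struct PlaneCoeffs { float A, B, C; };   // attribute(x, y) = A*x + B*y + C

      // Derive plane coefficients for one scalar attribute from the triangle's
      // vertex positions and per-vertex attribute values (done once per polygon).
      PlaneCoeffs attribute_plane(float x0, float y0, float a0,
                                  float x1, float y1, float a1,
                                  float x2, float y2, float a2) {
          float dx1 = x1 - x0, dy1 = y1 - y0, da1 = a1 - a0;
          float dx2 = x2 - x0, dy2 = y2 - y0, da2 = a2 - a0;
          float det = dx1 * dy2 - dx2 * dy1;   // twice the signed triangle area
          PlaneCoeffs p;
          p.A = (da1 * dy2 - da2 * dy1) / det;
          p.B = (da2 * dx1 - da1 * dx2) / det;
          p.C = a0 - p.A * x0 - p.B * y0;
          return p;
      }

      int main() {
          // Per-pixel interpolation is then a plane evaluation with no
          // barycentric weights involved.
          PlaneCoeffs p = attribute_plane(0, 0, 0.0f, 8, 0, 1.0f, 0, 8, 2.0f);
          float px = 4.0f, py = 2.0f;
          printf("attribute at (%.1f, %.1f) = %f\n", px, py, p.A * px + p.B * py + p.C);
          return 0;
      }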
  • Patent number: 11593910
    Abstract: Embodiments provide mechanisms to facilitate compute operations for deep neural networks. One embodiment comprises a graphics processing unit comprising one or more multiprocessors, at least one of the one or more multiprocessors including a register file to store a plurality of different types of operands and a plurality of processing cores. The plurality of processing cores includes a first set of processing cores of a first type and a second set of processing cores of a second type. The first set of processing cores are associated with a first memory channel and the second set of processing cores are associated with a second memory channel.
    Type: Grant
    Filed: May 11, 2022
    Date of Patent: February 28, 2023
    Assignee: Intel Corporation
    Inventors: Prasoonkumar Surti, Narayan Srinivasa, Feng Chen, Joydeep Ray, Ben J. Ashbaugh, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Sara S. Baghsorkhi, Justin E. Gottschlich, Altug Koker, Nadathur Rajagopalan Satish, Farshad Akhbari, Dukhwan Kim, Wenyin Fu, Travis T. Schluessler, Josh B. Mastronarde, Linda L. Hurd, John H. Feit, Jeffery S. Boles, Adam T. Lake, Karthik Vaidyanathan, Devan Burke, Subramaniam Maiyuran, Abhishek R. Appu
  • Patent number: 11593069
    Abstract: Embodiments described herein are generally directed to an improved vector normalization instruction. An embodiment of a method includes, responsive to receipt by a GPU of a single instruction specifying a vector normalization operation to be performed on V vectors: (i) generating V squared length values, N at a time, by a first processing unit by, for each of N sets of inputs representing the components of N of the V vectors, performing N parallel dot product operations on the N sets of inputs; and (ii) generating V sets of outputs representing the normalized components of the V vectors, N at a time, by a second processing unit by, for each N squared length values of the V squared length values, performing N parallel operations on the N squared length values, wherein each of the N parallel operations implements a combination of a reciprocal square root function and a vector scaling function.
    Type: Grant
    Filed: September 17, 2021
    Date of Patent: February 28, 2023
    Assignee: Intel Corporation
    Inventors: Abhishek Rhisheekesan, Supratim Pal, Shashank Lakshminarayana, Subramaniam Maiyuran
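    Illustrative sketch (C++): a scalar, hypothetical model of the two-stage flow in the abstract above, assuming N = 4 vectors per batch; stage one produces squared lengths via dot products and stage two applies a combined reciprocal-square-root and scaling step. The types and batch size are illustrative assumptions, not the hardware datapath.
      #include <array>
      #include <cmath>
      #include <cstdio>

      constexpr int N = 4;                     // vectors processed per batch (assumed)
      struct Vec3 { float x, y, z; };

      // Stage one: N dot products produce N squared lengths.
      std::array<float, N> squared_lengths(const std::array<Vec3, N>& v) {
          std::array<float, N> len2{};
          for (int i = 0; i < N; ++i)
              len2[i] = v[i].x * v[i].x + v[i].y * v[i].y + v[i].z * v[i].z;
          return len2;
      }

      // Stage two: reciprocal square root combined with vector scaling.
      std::array<Vec3, N> normalize_batch(const std::array<Vec3, N>& v,
                                          const std::array<float, N>& len2) {
          std::array<Vec3, N> out{};
          for (int i = 0; i < N; ++i) {
              float inv = 1.0f / std::sqrt(len2[i]);
              out[i] = { v[i].x * inv, v[i].y * inv, v[i].z * inv };
          }
          return out;
      }

      int main() {
          std::array<Vec3, N> v = {{ {3, 0, 4}, {1, 1, 1}, {0, 2, 0}, {5, 12, 0} }};
          std::array<Vec3, N> out = normalize_batch(v, squared_lengths(v));
          printf("first normalized vector: %f %f %f\n", out[0].x, out[0].y, out[0].z);
          return 0;
      }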
  • Publication number: 20230051190
    Abstract: Embodiments are generally directed to data prefetching for graphics data processing. An embodiment of an apparatus includes one or more processors including one or more graphics processing units (GPUs); and a plurality of caches to provide storage for the one or more GPUs, the plurality of caches including at least an L1 cache and an L3 cache, wherein the apparatus is to provide intelligent prefetching of data by a prefetcher of a first GPU of the one or more GPUs, including measuring a hit rate for the L1 cache; upon determining that the hit rate for the L1 cache is equal to or greater than a threshold value, limiting a prefetch of data to storage in the L3 cache; and upon determining that the hit rate for the L1 cache is less than the threshold value, allowing the prefetch of data to the L1 cache.
    Type: Application
    Filed: July 15, 2022
    Publication date: February 16, 2023
    Applicant: Intel Corporation
    Inventors: Vikranth Vemulapalli, Lakshminarayanan Striramassarma, Mike MacPherson, Aravindh Anantaraman, Ben Ashbaugh, Murali Ramadoss, William B. Sadler, Jonathan Pearce, Scott Janus, Brent Insko, Vasanth Ranganathan, Kamal Sinha, Arthur Hunter, Jr., Prasoonkumar Surti, Nicolas Galoppo von Borries, Joydeep Ray, Abhishek R. Appu, ElMoustapha Ould-Ahmed-Vall, Altug Koker, Sungye Kim, Subramaniam Maiyuran, Valentin Andrei
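    Illustrative sketch (C++): a simplified, hypothetical model of the hit-rate-gated prefetch policy described above; the statistics structure and the 0.90 threshold are assumptions for illustration only.
      #include <cstdint>

      enum class PrefetchTarget { L1, L3 };

      struct L1Stats { uint64_t hits = 0; uint64_t accesses = 0; };

      // Gate the prefetch destination on the measured L1 hit rate.
      PrefetchTarget choose_prefetch_target(const L1Stats& s, double threshold = 0.90) {
          double hit_rate = s.accesses ? double(s.hits) / double(s.accesses) : 0.0;
          if (hit_rate >= threshold)
              return PrefetchTarget::L3;   // L1 already serves demand well: stage data in L3 only
          return PrefetchTarget::L1;       // otherwise allow the prefetch to fill L1
      }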
  • Patent number: 11579878
    Abstract: An apparatus is disclosed. The apparatus includes one or more processors comprising register sharing circuitry to receive meta-information indicating a number of threads that are to be disabled and provide an indication that an associated thread is disabled, a plurality of General Purpose Register Files (GRFs), wherein one or more of the plurality of GRFs is associated with one of the plurality of threads and a plurality of multiplexers coupled to the one or more GRFs to receive the indication from the register sharing circuitry and disable thread access to an associated GRF based on an indication that a thread is to be disabled.
    Type: Grant
    Filed: May 22, 2020
    Date of Patent: February 14, 2023
    Assignee: Intel Corporation
    Inventors: Pratik J. Ashar, Supratim Pal, Subramaniam Maiyuran, Wei-Yu Chen, Guei-Yuan Lueh
  • Publication number: 20230029176
    Abstract: Methods and apparatus relating to scalar core integration in a graphics processor. In an example, an apparatus comprises a processor to receive a set of workload instructions for a graphics workload from a host complex, determine a first subset of operations in the set of instructions that is suitable for execution by a scalar processor complex of the graphics processing device and a second subset of operations in the set of instructions that is suitable for execution by a vector processor complex of the graphics processing device, assign the first subset of operations to the scalar processor complex for execution to generate a first set of outputs, and assign the second subset of operations to the vector processor complex for execution to generate a second set of outputs. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: July 19, 2022
    Publication date: January 26, 2023
    Applicant: Intel Corporation
    Inventors: Joydeep Ray, Aravindh Anantaraman, Abhishek R. Appu, Altug Koker, ElMoustapha Ould-Ahmed-Vall, Valentin Andrei, Subramaniam Maiyuran, Nicolas Galoppo von Borries, Varghese George, Mike MacPherson, Ben Ashbaugh, Murali Ramadoss, Vikranth Vemulapalli, William Sadler, Jonathan Pearce, Sungye Kim
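    Illustrative sketch (C++): a hypothetical model of the workload-partitioning decision described above, splitting a set of operations between a scalar complex and a vector complex; the Operation fields and the is_scalar_friendly() heuristic are invented placeholders, not the patent's interfaces.
      #include <vector>

      struct Operation { bool divergent_control_flow; int vector_width; };

      // Invented heuristic: narrow or branchy work suits the scalar complex,
      // wide uniform work suits the vector complex.
      bool is_scalar_friendly(const Operation& op) {
          return op.vector_width <= 1 || op.divergent_control_flow;
      }

      void partition_workload(const std::vector<Operation>& ops,
                              std::vector<Operation>& scalar_subset,
                              std::vector<Operation>& vector_subset) {
          for (const Operation& op : ops)
              (is_scalar_friendly(op) ? scalar_subset : vector_subset).push_back(op);
      }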
  • Patent number: 11562461
    Abstract: An apparatus to facilitate compute optimization is disclosed. The apparatus includes one or more processing units to provide a first set of shader operations associated with a shader stage of a graphics pipeline, a scheduler to schedule shader threads for processing, and a field-programmable gate array (FPGA) dynamically configured to provide a second set of shader operations associated with the shader stage of the graphics pipeline.
    Type: Grant
    Filed: November 18, 2021
    Date of Patent: January 24, 2023
    Assignee: Intel Corporation
    Inventors: Prasoonkumar Surti, Narayan Srinivasa, Feng Chen, Joydeep Ray, Ben J. Ashbaugh, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Sara S. Baghsorkhi, Justin E. Gottschlich, Altug Koker, Nadathur Rajagopalan Satish, Farshad Akhbari, Dukhwan Kim, Wenyin Fu, Travis T. Schluessler, Josh B. Mastronarde, Linda L. Hurd, John H. Feit, Jeffery S. Boles, Adam T. Lake, Karthik Vaidyanathan, Devan Burke, Subramaniam Maiyuran, Abhishek R. Appu
  • Patent number: 11561828
    Abstract: Accelerated synchronization operations using a fine grain dependency check are disclosed. A graphics multiprocessor includes a plurality of execution units and synchronization circuitry that is configured to determine availability of at least one execution unit. When at least one execution unit is available, the synchronization circuitry performs a fine grain dependency check for availability of dependent data or operands in shared local memory or cache.
    Type: Grant
    Filed: May 11, 2021
    Date of Patent: January 24, 2023
    Assignee: Intel Corporation
    Inventors: Subramaniam Maiyuran, Varghese George, Altug Koker, Aravindh Anantaraman, SungYe Kim, Valentin Andrei, Joydeep Ray
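    Illustrative sketch (C++): a toy, hypothetical model of the synchronization decision described above, in which operand-level readiness is only checked once an execution unit is available; the bit-vector sizes and structures are assumptions for illustration.
      #include <bitset>
      #include <cstddef>

      constexpr std::size_t kExecUnits = 16;   // assumed number of execution units
      constexpr std::size_t kOperands  = 64;   // assumed number of tracked operands

      struct SyncState {
          std::bitset<kExecUnits> eu_available;    // one bit per execution unit
          std::bitset<kOperands>  operand_ready;   // one bit per operand in SLM/cache
      };

      // Only when an execution unit is free is the fine grain check performed:
      // every operand the waiting work depends on must already be ready.
      bool can_dispatch(const SyncState& s, const std::bitset<kOperands>& needed) {
          if (s.eu_available.none()) return false;
          return (needed & ~s.operand_ready).none();
      }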
  • Publication number: 20220413916
    Abstract: Provision of multiple register allocation sizes for threads is described. An example of a system includes one or more processors including a graphics processor, the graphics processor including at least a first local thread dispatcher (TDL) and multiple processing resources, each processing resource including a plurality of registers; and memory for storage of data for processing, wherein the one or more processors are to determine a register size for a first thread; identify one or more processing resources having sufficient register space for the first thread; select a processing resource of the one or more processing resources having sufficient register space to assign the first thread; select an available thread slot of the selected processing resource for the first thread; and allocate registers of the selected processing resource for the first thread.
    Type: Application
    Filed: June 25, 2021
    Publication date: December 29, 2022
    Applicant: Intel Corporation
    Inventors: Chandra Gurram, Wei-Yu Chen, Vikranth Vemulapalli, Subramaniam Maiyuran, Jorge Eduardo Parra Osorio, Shuai Mu, Guei-Yuan Lueh, Supratim Pal
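    Illustrative sketch (C++): a hypothetical model of the allocation flow in the abstract above, in which a dispatcher picks a processing resource with enough free registers and a free thread slot; the structures stand in for a local thread dispatcher's bookkeeping and are not the patented implementation.
      #include <cstddef>
      #include <cstdint>
      #include <optional>
      #include <vector>

      struct ProcessingResource {
          uint32_t free_registers;        // registers currently unallocated
          std::vector<bool> slot_busy;    // one entry per hardware thread slot
      };

      struct Placement { std::size_t resource; std::size_t slot; };

      // Pick a resource with enough free registers and a free slot, then
      // claim the slot and allocate the requested register budget.
      std::optional<Placement> place_thread(std::vector<ProcessingResource>& res,
                                            uint32_t regs_needed) {
          for (std::size_t r = 0; r < res.size(); ++r) {
              if (res[r].free_registers < regs_needed) continue;
              for (std::size_t s = 0; s < res[r].slot_busy.size(); ++s) {
                  if (res[r].slot_busy[s]) continue;
                  res[r].slot_busy[s] = true;
                  res[r].free_registers -= regs_needed;
                  return Placement{r, s};
              }
          }
          return std::nullopt;   // no resource can host the thread right now
      }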
  • Publication number: 20220413803
    Abstract: A processing apparatus is described herein that includes a general-purpose parallel processing engine comprising a matrix accelerator including one or more systolic arrays, at least one of the one or more systolic arrays comprising multiple pipeline stages, each pipeline stage of the multiple pipeline stages including multiple processing elements, the multiple processing elements configured to perform processing operations on input matrix elements based on output sparsity metadata. The output sparsity metadata indicates to the multiple processing elements to bypass multiplication for a first row of elements of a second matrix and multiply a second row of elements of the second matrix with a column of matrix elements of a first matrix.
    Type: Application
    Filed: June 25, 2021
    Publication date: December 29, 2022
    Applicant: Intel Corporation
    Inventors: Jorge Parra, Fangwen Fu, Subramaniam Maiyuran, Varghese George, Mike Macpherson, Supratim Pal, Chandra Gurram, Sabareesh Ganapathy, Sasikanth Avancha, Dharma Teja Vooturi, Naveen Mellempudi, Dipankar Das
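    Illustrative sketch (C++): one possible reading of the sparsity gating described above, modeled in scalar code: a metadata bit per row of the second matrix lets the multiplications for that row be bypassed entirely; the layout, dimensions, and metadata encoding are assumptions for illustration, not the systolic-array datapath.
      #include <cstddef>
      #include <cstdint>

      // C (M x N) += A (M x K) * B (K x N), skipping every row k of B whose
      // metadata bit marks its multiplications as safe to bypass.
      void sparse_matmul(const float* A, const float* B, float* C,
                         const uint8_t* b_row_used,
                         std::size_t M, std::size_t K, std::size_t N) {
          for (std::size_t i = 0; i < M * N; ++i) C[i] = 0.0f;
          for (std::size_t k = 0; k < K; ++k) {
              if (!b_row_used[k]) continue;            // bypass this row of B entirely
              for (std::size_t i = 0; i < M; ++i)
                  for (std::size_t j = 0; j < N; ++j)
                      C[i * N + j] += A[i * K + k] * B[k * N + j];
          }
      }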
  • Publication number: 20220414818
    Abstract: Examples described herein relate to an apparatus comprising: at least one memory and at least one processor. In some example, the at least one processor is to: represent at least one vertex in a set of vertices of a first polygon using a first index; store the first index into the at least one memory; and indicate whether the first index is to be de-referenced based on a comparison between the first index and at least one other index, wherein: a first memory pointer is associated with the at least one vertex in the set of vertices of the first polygon and the first index comprises a number of bits that is less than a number of bits associated with the first memory pointer. In some examples, the number of bits of the first index is based on a size of a vertex window and wherein the vertex window comprises multiple vertices associated with one or more draw calls.
    Type: Application
    Filed: June 26, 2021
    Publication date: December 29, 2022
    Inventors: Raghavendra Kamath Miyar, Rajalakshmi Athimoolam, Subramaniam Maiyuran, Jorge F. Garcia Pabon, Rajarshi Bajpayee, Krishan Malik
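    Illustrative sketch (C++): a hypothetical model of the index-reuse idea described above, using a small sliding window of recently seen vertex pointers; a vertex already in the window is encoded as a short window index to be de-referenced later, otherwise the full pointer is emitted. The window size and types are assumptions.
      #include <cstddef>
      #include <cstdint>
      #include <deque>
      #include <optional>

      constexpr std::size_t kWindowSize = 64;   // assumed window size (6-bit index)

      // Returns a short window index when the vertex was seen recently; otherwise
      // records it in the window and signals that the full pointer must be emitted.
      std::optional<uint8_t> encode_vertex(std::deque<const void*>& window,
                                           const void* vertex_ptr) {
          for (std::size_t i = 0; i < window.size(); ++i)
              if (window[i] == vertex_ptr)
                  return static_cast<uint8_t>(i);      // index to be de-referenced later
          window.push_front(vertex_ptr);
          if (window.size() > kWindowSize) window.pop_back();
          return std::nullopt;                         // caller emits the full pointer
      }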
  • Publication number: 20220414053
    Abstract: A processing apparatus includes a processing resource including a general-purpose parallel processing engine and a matrix accelerator. The matrix accelerator includes first circuitry to receive a command to perform operations associated with an instruction, second circuitry to configure the matrix accelerator according to a physical depth of a systolic array within the matrix accelerator and a logical depth associated with the instruction, third circuitry to read operands for the instruction from a register file associated with the systolic array, fourth circuitry to perform operations for the instruction via one or more passes through one or more physical pipeline stages of the systolic array based on a configuration performed by the second circuitry, and fifth circuitry to write output of the operations to the register file associated with the systolic array.
    Type: Application
    Filed: June 24, 2021
    Publication date: December 29, 2022
    Applicant: Intel Corporation
    Inventors: Jorge Parra, Wei-yu Chen, Kaiyu Chen, Varghese George, Junjie Gu, Chandra Gurram, Guei-Yuan Lueh, Stephen Junkins, Subramaniam Maiyuran, Supratim Pal
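    Illustrative sketch (C++): a hypothetical illustration of the depth-configuration step described above, where an instruction's logical depth larger than the array's physical pipeline depth is folded into multiple passes; the depth values are placeholders.
      #include <cstdio>

      // An instruction's logical depth is folded into however many passes the
      // physical pipeline depth requires (ceiling division).
      int passes_needed(int logical_depth, int physical_depth) {
          return (logical_depth + physical_depth - 1) / physical_depth;
      }

      int main() {
          const int physical = 8;                  // placeholder physical stage count
          int logical_depths[] = {4, 8, 16, 24};   // placeholder per-instruction depths
          for (int logical : logical_depths)
              printf("logical depth %2d -> %d pass(es) through the %d-stage array\n",
                     logical, passes_needed(logical, physical), physical);
          return 0;
      }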
  • Publication number: 20220413851
    Abstract: A processing apparatus includes a general-purpose parallel processing engine including a set of multiple processing elements including a single precision floating-point unit, a double precision floating point unit, and an integer unit; a matrix accelerator including one or more systolic arrays; a first register file coupled with a first read control circuit, wherein the first read control circuit couples with the set of multiple processing elements and the matrix accelerator to arbitrate read requests to the first register file from the set of multiple processing elements and the matrix accelerator; and a second register file coupled with a second read control circuit, wherein the second read control circuit couples with the matrix accelerator to arbitrate read requests to the second register file from the matrix accelerator and limit access to the second register file by the set of multiple processing elements.
    Type: Application
    Filed: June 25, 2021
    Publication date: December 29, 2022
    Applicant: Intel Corporation
    Inventors: Chandra Gurram, Wei-yu Chen, Fangwen Fu, Sabareesh Ganapathy, Varghese George, Guei-Yuan Lueh, Subramaniam Maiyuran, Mike Macpherson, Supratim Pal, Jorge Parra
  • Publication number: 20220405096
    Abstract: Embodiments are directed to systems and methods for reuse of FMA execution unit hardware logic to provide native support for execution of get exponent, get mantissa, and/or scale instructions within a GPU. These new instructions may be used to implement branch-free emulation algorithms for mathematical functions and analytic functions (e.g., transcendental functions) by detecting and handling various special case inputs within a pre-processing stage of the FMA execution unit, which allows the main dataflow of the FMA execution unit to be bypassed for such special cases. Since special cases are handled by the FMA execution unit, library functions emulating various functions, including, but not limited to logarithm, exponential, and division operations may be implemented with significantly fewer lines of machine-level code, thereby providing improved performance for HPC applications.
    Type: Application
    Filed: June 22, 2021
    Publication date: December 22, 2022
    Applicant: Intel Corporation
    Inventors: Shuai Mu, Cristina S. Anderson, Subramaniam Maiyuran
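    Illustrative sketch (C++): a software reference for what get-exponent and get-mantissa operations return for normal single-precision inputs; special cases such as zeros, denormals, NaN, and infinity, which are what the abstract's pre-processing stage handles, are deliberately omitted, and the function names are illustrative.
      #include <cstdint>
      #include <cstdio>
      #include <cstring>

      // Exponent of a normal float, unbiased (e.g. 48 = 1.5 * 2^5 -> 5).
      int32_t get_exponent(float f) {
          uint32_t bits;  std::memcpy(&bits, &f, sizeof bits);
          return static_cast<int32_t>((bits >> 23) & 0xFF) - 127;
      }

      // Mantissa of a normal float, rescaled into [1.0, 2.0) by forcing the
      // exponent field to the bias value.
      float get_mantissa(float f) {
          uint32_t bits;  std::memcpy(&bits, &f, sizeof bits);
          bits = (bits & 0x007FFFFFu) | 0x3F800000u;
          float m;  std::memcpy(&m, &bits, sizeof m);
          return m;
      }

      int main() {
          float x = 48.0f;
          printf("exponent(%g) = %d, mantissa(%g) = %g\n",
                 x, get_exponent(x), x, get_mantissa(x));
          return 0;
      }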
  • Publication number: 20220383569
    Abstract: Methods, systems and apparatuses may provide for technology that selects a location of a pixel block based on a location of a graphics polygon, wherein the pixel block contains the graphics polygon. The technology may also convert the graphics polygon into a pixel-based representation within the pixel block during a single transaction.
    Type: Application
    Filed: May 27, 2021
    Publication date: December 1, 2022
    Inventors: Jorge Garcia Pabon, Subramaniam Maiyuran, Raghavendra Kamath Miyar, Vamsee Vardhan Chivukula, Krishan Malik, Abhishek Varshney
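    Illustrative sketch (C++): a hypothetical model of the block-placement step only, snapping a small polygon's bounding-box corner to an assumed 8x8 block grid; the single-transaction conversion to a pixel mask is not modeled, and the block size, types, and fit assumption are illustrative.
      #include <algorithm>
      #include <cmath>

      constexpr int kBlock = 8;   // assumed pixel-block dimension

      struct BlockOrigin { int x, y; };

      // Snap the polygon's minimum corner down to the block grid; assuming the
      // polygon's bounding box does not cross a block boundary, the block at
      // this origin contains the polygon.
      BlockOrigin place_block(const float* xs, const float* ys, int vertex_count) {
          float min_x = xs[0], min_y = ys[0];
          for (int i = 1; i < vertex_count; ++i) {
              min_x = std::min(min_x, xs[i]);
              min_y = std::min(min_y, ys[i]);
          }
          return { static_cast<int>(std::floor(min_x / kBlock)) * kBlock,
                   static_cast<int>(std::floor(min_y / kBlock)) * kBlock };
      }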
  • Patent number: 11514721
    Abstract: Systems, apparatuses, and methods may provide for technology to dynamically control a display in response to ocular characteristic measurements of at least one eye of a user.
    Type: Grant
    Filed: April 2, 2021
    Date of Patent: November 29, 2022
    Assignee: Intel Corporation
    Inventors: Radhakrishnan Venkataraman, James M. Holland, Sayan Lahiri, Pattabhiraman K, Kamal Sinha, Chandrasekaran Sakthivel, Daniel Pohl, Vivek Tiwari, Philip R. Laws, Subramaniam Maiyuran, Abhishek R. Appu, ElMoustapha Ould-Ahmed-Vall, Peter L. Doyle, Devan Burke
  • Patent number: 11507375
    Abstract: In an example, an apparatus comprises a plurality of execution units, and a first general register file (GRF) communicatively couple to the plurality of execution units, wherein the first GRF is shared by the plurality of execution units. Other embodiments are also disclosed and claimed.
    Type: Grant
    Filed: May 12, 2021
    Date of Patent: November 22, 2022
    Assignee: Intel Corporation
    Inventors: Abhishek R. Appu, Altug Koker, Joydeep Ray, Kamal Sinha, Kiran C. Veernapu, Subramaniam Maiyuran, Prasoonkumar Surti, Guei-Yuan Lueh, David Puffer, Supratim Pal, Eric J. Hoekstra, Travis T. Schluessler, Linda L. Hurd
  • Publication number: 20220365901
    Abstract: Described herein is a graphics processing unit (GPU) comprising a first processing cluster to perform parallel processing operations, the parallel processing operations including a ray tracing operation and a matrix multiply operation; and a second processing cluster coupled to the first processing cluster, wherein the first processing cluster includes a floating-point unit to perform floating point operations, the floating-point unit is configured to process an instruction using a bfloat16 (BF16) format with a multiplier to multiply second and third source operands while an accumulator adds a first source operand with output from the multiplier.
    Type: Application
    Filed: May 27, 2022
    Publication date: November 17, 2022
    Applicant: Intel Corporation
    Inventors: Subramaniam Maiyuran, Shubra Marwaha, Ashutosh Garg, Supratim Pal, Jorge Parra, Chandra Gurram, Varghese George, Darin Starkey, Guei-Yuan Lueh
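    Illustrative sketch (C++): a reference model of a BF16 multiply with FP32 accumulation consistent with the abstract above, treating bfloat16 as the upper 16 bits of an IEEE-754 float and assuming truncation rounding; this is a software illustration, not the hardware datapath.
      #include <cstdint>
      #include <cstdio>
      #include <cstring>

      // bfloat16 is the upper 16 bits of an IEEE-754 binary32 value.
      float bf16_to_float(uint16_t b) {
          uint32_t bits = static_cast<uint32_t>(b) << 16;
          float f;  std::memcpy(&f, &bits, sizeof f);
          return f;
      }

      uint16_t float_to_bf16(float f) {        // truncation rounding, for simplicity
          uint32_t bits;  std::memcpy(&bits, &f, sizeof bits);
          return static_cast<uint16_t>(bits >> 16);
      }

      // src0 + src1 * src2: BF16 multiply, FP32 accumulate.
      float bf16_fma(float src0, uint16_t src1, uint16_t src2) {
          return src0 + bf16_to_float(src1) * bf16_to_float(src2);
      }

      int main() {
          uint16_t a = float_to_bf16(1.5f), b = float_to_bf16(2.25f);
          printf("acc = %f\n", bf16_fma(10.0f, a, b));   // 10 + 1.5 * 2.25 = 13.375
          return 0;
      }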
  • Publication number: 20220366630
    Abstract: Position-based rendering apparatus and method for multi-die/GPU graphics processing. For example, one embodiment of a method comprises: distributing a plurality of graphics draws to a plurality of graphics processors; performing position-only shading using vertex data associated with tiles of a first draw on a first graphics processor, the first graphics processor responsively generating visibility data for each of the tiles; distributing subsets of the visibility data associated with different subsets of the tiles to different graphics processors; limiting geometry work to be performed on each tile by each graphics processor using the visibility data, each graphics processor to responsively generate rendered tiles; and wherein the rendered tiles are combined to generate a complete image frame.
    Type: Application
    Filed: July 28, 2022
    Publication date: November 17, 2022
    Applicant: Intel Corporation
    Inventors: Travis Schluessler, Zack Waters, Michael Apodaca, Daniel Johnston, Jason Surprise, Prasoonkumar Surti, Subramaniam Maiyuran, Peter Doyle, Saurabh Sharma, Ankur Shah, Murali Ramadoss
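    Illustrative sketch (C++): a high-level, hypothetical model of the visibility-distribution step described above, splitting per-tile visibility data across GPUs with a simple round-robin policy; the structures and the distribution policy are illustrative placeholders.
      #include <cstddef>
      #include <cstdint>
      #include <vector>

      struct TileVisibility {
          uint32_t tile_id;
          std::vector<uint32_t> visible_triangles;   // survivors of position-only shading
      };

      // Split the per-tile visibility data across the GPUs; each GPU later does
      // geometry work only for triangles its tiles mark as visible.
      std::vector<std::vector<TileVisibility>>
      distribute_tiles(const std::vector<TileVisibility>& all_tiles, std::size_t gpu_count) {
          std::vector<std::vector<TileVisibility>> per_gpu(gpu_count);
          for (const TileVisibility& t : all_tiles)
              per_gpu[t.tile_id % gpu_count].push_back(t);   // round-robin by tile id
          return per_gpu;
      }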
  • Patent number: 11494867
    Abstract: An apparatus to facilitate asynchronous execution at a processing unit. The apparatus includes one or more processors to detect independent task passes that may be executed out of order in a pipeline of the processing unit, schedule a first set of processing tasks to be executed at a first set of processing elements at the processing unit and schedule a second set of tasks to be executed at a second set of processing elements, wherein execution of the first set of tasks at the first set of processing elements is to be performed simultaneous and in parallel to execution of the second set of tasks at the second set of processing elements.
    Type: Grant
    Filed: December 8, 2020
    Date of Patent: November 8, 2022
    Assignee: Intel Corporation
    Inventors: Saurabh Sharma, Michael Apodaca, Aditya Navale, Travis Schluessler, Vamsee Vardhan Chivukula, Abhishek Venkatesh, Subramaniam Maiyuran
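    Illustrative sketch (C++): a hypothetical model of the scheduling idea described above, where two task passes detected as independent are executed simultaneously; plain threads stand in here for the two sets of processing elements.
      #include <functional>
      #include <thread>
      #include <vector>

      // Once two task passes are known to be independent, issue them to two
      // groups of processing elements at the same time.
      void run_independent_passes(const std::vector<std::function<void()>>& first_set,
                                  const std::vector<std::function<void()>>& second_set) {
          std::thread t1([&] { for (const auto& task : first_set)  task(); });
          std::thread t2([&] { for (const auto& task : second_set) task(); });
          t1.join();
          t2.join();
      }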