Patents by Inventor Subramaniam Maiyuran

Subramaniam Maiyuran has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230064069
    Abstract: Methods, systems and apparatuses provide for graphics processor technology that generates attribute plane coefficients based on barycentric coefficients, wherein the attribute plane coefficients are generated on a per polygon basis, and interpolates one or more pixel attributes based on the attribute plane coefficients. In one example, the technology excludes the barycentric coefficients from one or more per pixel operations.
    Type: Application
    Filed: July 30, 2021
    Publication date: March 2, 2023
    Inventors: Eric Hoekstra, Prasoonkumar Surti, Abhishek R. Appu, Subramaniam Maiyuran, Kalyan Bhiravabhatla
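    Illustrative sketch (C++): a minimal, hypothetical model of the per-polygon attribute plane setup described above, where plane coefficients are derived once from the triangle's vertex positions and attribute values so that per-pixel interpolation needs no barycentric weights; the function names and values are assumptions for illustration, not the patented design.
      #include <cstdio>

      struct PlaneCoeffs { float A, B, C; };   // attribute(x, y) = A*x + B*y + C

      // Derive plane coefficients for one scalar attribute from the triangle's
      // vertex positions and per-vertex attribute values (done once per polygon).
      PlaneCoeffs attribute_plane(float x0, float y0, float a0,
                                  float x1, float y1, float a1,
                                  float x2, float y2, float a2) {
          float dx1 = x1 - x0, dy1 = y1 - y0, da1 = a1 - a0;
          float dx2 = x2 - x0, dy2 = y2 - y0, da2 = a2 - a0;
          float det = dx1 * dy2 - dx2 * dy1;   // twice the signed triangle area
          PlaneCoeffs p;
          p.A = (da1 * dy2 - da2 * dy1) / det;
          p.B = (da2 * dx1 - da1 * dx2) / det;
          p.C = a0 - p.A * x0 - p.B * y0;
          return p;
      }

      int main() {
          // Per-pixel interpolation is then a plane evaluation with no
          // barycentric weights involved.
          PlaneCoeffs p = attribute_plane(0, 0, 0.0f, 8, 0, 1.0f, 0, 8, 2.0f);
          float px = 4.0f, py = 2.0f;
          printf("attribute at (%.1f, %.1f) = %f\n", px, py, p.A * px + p.B * py + p.C);
          return 0;
      }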
  • Patent number: 11593910
    Abstract: Embodiments provide mechanisms to facilitate compute operations for deep neural networks. One embodiment comprises a graphics processing unit comprising one or more multiprocessors, at least one of the one or more multiprocessors including a register file to store a plurality of different types of operands and a plurality of processing cores. The plurality of processing cores includes a first set of processing cores of a first type and a second set of processing cores of a second type. The first set of processing cores are associated with a first memory channel and the second set of processing cores are associated with a second memory channel.
    Type: Grant
    Filed: May 11, 2022
    Date of Patent: February 28, 2023
    Assignee: Intel Corporation
    Inventors: Prasoonkumar Surti, Narayan Srinivasa, Feng Chen, Joydeep Ray, Ben J. Ashbaugh, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Sara S. Baghsorkhi, Justin E. Gottschlich, Altug Koker, Nadathur Rajagopalan Satish, Farshad Akhbari, Dukhwan Kim, Wenyin Fu, Travis T. Schluessler, Josh B. Mastronarde, Linda L. Hurd, John H. Feit, Jeffery S. Boles, Adam T. Lake, Karthik Vaidyanathan, Devan Burke, Subramaniam Maiyuran, Abhishek R. Appu
  • Patent number: 11593069
    Abstract: Embodiments described herein are generally directed to an improved vector normalization instruction. An embodiment of a method includes, responsive to receipt by a GPU of a single instruction specifying a vector normalization operation to be performed on V vectors: (i) generating V squared length values, N at a time, by a first processing unit by, for each of N sets of inputs representing the components of N of the V vectors, performing N parallel dot product operations on the N sets of inputs; and (ii) generating V sets of outputs representing the normalized components of the V vectors, N at a time, by a second processing unit by, for each N squared length values of the V squared length values, performing N parallel operations on the N squared length values, wherein each of the N parallel operations implements a combination of a reciprocal square root function and a vector scaling function.
    Type: Grant
    Filed: September 17, 2021
    Date of Patent: February 28, 2023
    Assignee: Intel Corporation
    Inventors: Abhishek Rhisheekesan, Supratim Pal, Shashank Lakshminarayana, Subramaniam Maiyuran
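    Illustrative sketch (C++): a scalar, hypothetical model of the two-stage flow in the abstract above, assuming N = 4 vectors per batch; stage one produces squared lengths via dot products and stage two applies a combined reciprocal-square-root and scaling step. The types and batch size are illustrative assumptions, not the hardware datapath.
      #include <array>
      #include <cmath>
      #include <cstdio>

      constexpr int N = 4;                     // vectors processed per batch (assumed)
      struct Vec3 { float x, y, z; };

      // Stage one: N dot products produce N squared lengths.
      std::array<float, N> squared_lengths(const std::array<Vec3, N>& v) {
          std::array<float, N> len2{};
          for (int i = 0; i < N; ++i)
              len2[i] = v[i].x * v[i].x + v[i].y * v[i].y + v[i].z * v[i].z;
          return len2;
      }

      // Stage two: reciprocal square root combined with vector scaling.
      std::array<Vec3, N> normalize_batch(const std::array<Vec3, N>& v,
                                          const std::array<float, N>& len2) {
          std::array<Vec3, N> out{};
          for (int i = 0; i < N; ++i) {
              float inv = 1.0f / std::sqrt(len2[i]);
              out[i] = { v[i].x * inv, v[i].y * inv, v[i].z * inv };
          }
          return out;
      }

      int main() {
          std::array<Vec3, N> v = {{ {3, 0, 4}, {1, 1, 1}, {0, 2, 0}, {5, 12, 0} }};
          std::array<Vec3, N> out = normalize_batch(v, squared_lengths(v));
          printf("first normalized vector: %f %f %f\n", out[0].x, out[0].y, out[0].z);
          return 0;
      }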
  • Publication number: 20230051190
    Abstract: Embodiments are generally directed to data prefetching for graphics data processing. An embodiment of an apparatus includes one or more processors including one or more graphics processing units (GPUs); and a plurality of caches to provide storage for the one or more GPUs, the plurality of caches including at least an L1 cache and an L3 cache, wherein the apparatus is to provide intelligent prefetching of data by a prefetcher of a first GPU of the one or more GPUs, including measuring a hit rate for the L1 cache; upon determining that the hit rate for the L1 cache is equal to or greater than a threshold value, limiting a prefetch of data to storage in the L3 cache; and upon determining that the hit rate for the L1 cache is less than the threshold value, allowing the prefetch of data to the L1 cache.
    Type: Application
    Filed: July 15, 2022
    Publication date: February 16, 2023
    Applicant: Intel Corporation
    Inventors: Vikranth Vemulapalli, Lakshminarayanan Striramassarma, Mike MacPherson, Aravindh Anantaraman, Ben Ashbaugh, Murali Ramadoss, William B. Sadler, Jonathan Pearce, Scott Janus, Brent Insko, Vasanth Ranganathan, Kamal Sinha, Arthur Hunter, Jr., Prasoonkumar Surti, Nicolas Galoppo von Borries, Joydeep Ray, Abhishek R. Appu, ElMoustapha Ould-Ahmed-Vall, Altug Koker, Sungye Kim, Subramaniam Maiyuran, Valentin Andrei
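    Illustrative sketch (C++): a simplified, hypothetical model of the hit-rate-gated prefetch policy described above; the statistics structure and the 0.90 threshold are assumptions for illustration only.
      #include <cstdint>

      enum class PrefetchTarget { L1, L3 };

      struct L1Stats { uint64_t hits = 0; uint64_t accesses = 0; };

      // Gate the prefetch destination on the measured L1 hit rate.
      PrefetchTarget choose_prefetch_target(const L1Stats& s, double threshold = 0.90) {
          double hit_rate = s.accesses ? double(s.hits) / double(s.accesses) : 0.0;
          if (hit_rate >= threshold)
              return PrefetchTarget::L3;   // L1 already serves demand well: stage data in L3 only
          return PrefetchTarget::L1;       // otherwise allow the prefetch to fill L1
      }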
  • Patent number: 11579878
    Abstract: An apparatus is disclosed. The apparatus includes one or more processors comprising register sharing circuitry to receive meta-information indicating a number of threads that are to be disabled and provide an indication that an associated thread is disabled, a plurality of General Purpose Register Files (GRFs), wherein one or more of the plurality of GRFs is associated with one of the plurality of threads and a plurality of multiplexers coupled to the one or more GRFs to receive the indication from the register sharing circuitry and disable thread access to an associated GRF based on an indication that a thread is to be disabled.
    Type: Grant
    Filed: May 22, 2020
    Date of Patent: February 14, 2023
    Assignee: Intel Corporation
    Inventors: Pratik J. Ashar, Supratim Pal, Subramaniam Maiyuran, Wei-Yu Chen, Guei-Yuan Lueh
  • Publication number: 20230029176
    Abstract: Methods and apparatus relating to scalar core integration in a graphics processor. In an example, an apparatus comprises a processor to receive a set of workload instructions for a graphics workload from a host complex, determine a first subset of operations in the set of instructions that is suitable for execution by a scalar processor complex of the graphics processing device and a second subset of operations in the set of instructions that is suitable for execution by a vector processor complex of the graphics processing device, assign the first subset of operations to the scalar processor complex for execution to generate a first set of outputs, and assign the second subset of operations to the vector processor complex for execution to generate a second set of outputs. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: July 19, 2022
    Publication date: January 26, 2023
    Applicant: Intel Corporation
    Inventors: Joydeep Ray, Aravindh Anantaraman, Abhishek R. Appu, Altug Koker, ElMoustapha Ould-Ahmed-Vall, Valentin Andrei, Subramaniam Maiyuran, Nicolas Galoppo von Borries, Varghese George, Mike MacPherson, Ben Ashbaugh, Murali Ramadoss, Vikranth Vemulapalli, William Sadler, Jonathan Pearce, Sungye Kim
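    Illustrative sketch (C++): a hypothetical model of the workload-partitioning decision described above, splitting a set of operations between a scalar complex and a vector complex; the Operation fields and the is_scalar_friendly() heuristic are invented placeholders, not the patent's interfaces.
      #include <vector>

      struct Operation { bool divergent_control_flow; int vector_width; };

      // Invented heuristic: narrow or branchy work suits the scalar complex,
      // wide uniform work suits the vector complex.
      bool is_scalar_friendly(const Operation& op) {
          return op.vector_width <= 1 || op.divergent_control_flow;
      }

      void partition_workload(const std::vector<Operation>& ops,
                              std::vector<Operation>& scalar_subset,
                              std::vector<Operation>& vector_subset) {
          for (const Operation& op : ops)
              (is_scalar_friendly(op) ? scalar_subset : vector_subset).push_back(op);
      }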
  • Patent number: 11562461
    Abstract: An apparatus to facilitate compute optimization is disclosed. The apparatus includes one or more processing units to provide a first set of shader operations associated with a shader stage of a graphics pipeline, a scheduler to schedule shader threads for processing, and a field-programmable gate array (FPGA) dynamically configured to provide a second set of shader operations associated with the shader stage of the graphics pipeline.
    Type: Grant
    Filed: November 18, 2021
    Date of Patent: January 24, 2023
    Assignee: Intel Corporation
    Inventors: Prasoonkumar Surti, Narayan Srinivasa, Feng Chen, Joydeep Ray, Ben J. Ashbaugh, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Sara S. Baghsorkhi, Justin E. Gottschlich, Altug Koker, Nadathur Rajagopalan Satish, Farshad Akhbari, Dukhwan Kim, Wenyin Fu, Travis T. Schluessler, Josh B. Mastronarde, Linda L. Hurd, John H. Feit, Jeffery S. Boles, Adam T. Lake, Karthik Vaidyanathan, Devan Burke, Subramaniam Maiyuran, Abhishek R. Appu
  • Patent number: 11561828
    Abstract: Accelerated synchronization operations using a fine grain dependency check are disclosed. A graphics multiprocessor includes a plurality of execution units and synchronization circuitry that is configured to determine availability of at least one execution unit. When at least one execution unit is available, the synchronization circuitry performs a fine grain dependency check for availability of dependent data or operands in shared local memory or cache.
    Type: Grant
    Filed: May 11, 2021
    Date of Patent: January 24, 2023
    Assignee: Intel Corporation
    Inventors: Subramaniam Maiyuran, Varghese George, Altug Koker, Aravindh Anantaraman, SungYe Kim, Valentin Andrei, Joydeep Ray
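    Illustrative sketch (C++): a toy, hypothetical model of the synchronization decision described above, in which operand-level readiness is only checked once an execution unit is available; the bit-vector sizes and structures are assumptions for illustration.
      #include <bitset>
      #include <cstddef>

      constexpr std::size_t kExecUnits = 16;   // assumed number of execution units
      constexpr std::size_t kOperands  = 64;   // assumed number of tracked operands

      struct SyncState {
          std::bitset<kExecUnits> eu_available;    // one bit per execution unit
          std::bitset<kOperands>  operand_ready;   // one bit per operand in SLM/cache
      };

      // Only when an execution unit is free is the fine grain check performed:
      // every operand the waiting work depends on must already be ready.
      bool can_dispatch(const SyncState& s, const std::bitset<kOperands>& needed) {
          if (s.eu_available.none()) return false;
          return (needed & ~s.operand_ready).none();
      }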
  • Publication number: 20220413916
    Abstract: Provision of multiple register allocation sizes for threads is described. An example of a system includes one or more processors including a graphics processor, the graphics processor including at least a first local thread dispatcher (TDL) and multiple processing resources, each processing resource including a plurality of registers; and memory for storage of data for processing, wherein the one or more processors are to determine a register size for a first thread; identify one or more processing resources having sufficient register space for the first thread; select a processing resource of the one or more processing resources having sufficient register space to assign the first thread; select an available thread slot of the selected processing resource for the first thread; and allocate registers of the selected processing resource for the first thread.
    Type: Application
    Filed: June 25, 2021
    Publication date: December 29, 2022
    Applicant: Intel Corporation
    Inventors: Chandra Gurram, Wei-Yu Chen, Vikranth Vemulapalli, Subramaniam Maiyuran, Jorge Eduardo Parra Osorio, Shuai Mu, Guei-Yuan Lueh, Supratim Pal
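    Illustrative sketch (C++): a hypothetical model of the allocation flow in the abstract above, in which a dispatcher picks a processing resource with enough free registers and a free thread slot; the structures stand in for a local thread dispatcher's bookkeeping and are not the patented implementation.
      #include <cstddef>
      #include <cstdint>
      #include <optional>
      #include <vector>

      struct ProcessingResource {
          uint32_t free_registers;        // registers currently unallocated
          std::vector<bool> slot_busy;    // one entry per hardware thread slot
      };

      struct Placement { std::size_t resource; std::size_t slot; };

      // Pick a resource with enough free registers and a free slot, then
      // claim the slot and allocate the requested register budget.
      std::optional<Placement> place_thread(std::vector<ProcessingResource>& res,
                                            uint32_t regs_needed) {
          for (std::size_t r = 0; r < res.size(); ++r) {
              if (res[r].free_registers < regs_needed) continue;
              for (std::size_t s = 0; s < res[r].slot_busy.size(); ++s) {
                  if (res[r].slot_busy[s]) continue;
                  res[r].slot_busy[s] = true;
                  res[r].free_registers -= regs_needed;
                  return Placement{r, s};
              }
          }
          return std::nullopt;   // no resource can host the thread right now
      }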
  • Publication number: 20220413803
    Abstract: A processing apparatus is described herein that includes a general-purpose parallel processing engine comprising a matrix accelerator including one or more systolic arrays, at least one of the one or more systolic arrays comprising multiple pipeline stages, each pipeline stage of the multiple pipeline stages including multiple processing elements, the multiple processing elements configured to perform processing operations on input matrix elements based on output sparsity metadata. The output sparsity metadata indicates to the multiple processing elements to bypass multiplication for a first row of elements of a second matrix and multiply a second row of elements of the second matrix with a column of matrix elements of a first matrix.
    Type: Application
    Filed: June 25, 2021
    Publication date: December 29, 2022
    Applicant: Intel Corporation
    Inventors: Jorge Parra, Fangwen Fu, Subramaniam Maiyuran, Varghese George, Mike Macpherson, Supratim Pal, Chandra Gurram, Sabareesh Ganapathy, Sasikanth Avancha, Dharma Teja Vooturi, Naveen Mellempudi, Dipankar Das
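    Illustrative sketch (C++): one possible reading of the sparsity gating described above, modeled in scalar code: a metadata bit per row of the second matrix lets the multiplications for that row be bypassed entirely; the layout, dimensions, and metadata encoding are assumptions for illustration, not the systolic-array datapath.
      #include <cstddef>
      #include <cstdint>

      // C (M x N) += A (M x K) * B (K x N), skipping every row k of B whose
      // metadata bit marks its multiplications as safe to bypass.
      void sparse_matmul(const float* A, const float* B, float* C,
                         const uint8_t* b_row_used,
                         std::size_t M, std::size_t K, std::size_t N) {
          for (std::size_t i = 0; i < M * N; ++i) C[i] = 0.0f;
          for (std::size_t k = 0; k < K; ++k) {
              if (!b_row_used[k]) continue;            // bypass this row of B entirely
              for (std::size_t i = 0; i < M; ++i)
                  for (std::size_t j = 0; j < N; ++j)
                      C[i * N + j] += A[i * K + k] * B[k * N + j];
          }
      }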
  • Publication number: 20220414818
    Abstract: Examples described herein relate to an apparatus comprising: at least one memory and at least one processor. In some example, the at least one processor is to: represent at least one vertex in a set of vertices of a first polygon using a first index; store the first index into the at least one memory; and indicate whether the first index is to be de-referenced based on a comparison between the first index and at least one other index, wherein: a first memory pointer is associated with the at least one vertex in the set of vertices of the first polygon and the first index comprises a number of bits that is less than a number of bits associated with the first memory pointer. In some examples, the number of bits of the first index is based on a size of a vertex window and wherein the vertex window comprises multiple vertices associated with one or more draw calls.
    Type: Application
    Filed: June 26, 2021
    Publication date: December 29, 2022
    Inventors: Raghavendra Kamath Miyar, Rajalakshmi Athimoolam, Subramaniam Maiyuran, Jorge F. Garcia Pabon, Rajarshi Bajpayee, Krishan Malik
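    Illustrative sketch (C++): a hypothetical model of the index-reuse idea described above, using a small sliding window of recently seen vertex pointers; a vertex already in the window is encoded as a short window index to be de-referenced later, otherwise the full pointer is emitted. The window size and types are assumptions.
      #include <cstddef>
      #include <cstdint>
      #include <deque>
      #include <optional>

      constexpr std::size_t kWindowSize = 64;   // assumed window size (6-bit index)

      // Returns a short window index when the vertex was seen recently; otherwise
      // records it in the window and signals that the full pointer must be emitted.
      std::optional<uint8_t> encode_vertex(std::deque<const void*>& window,
                                           const void* vertex_ptr) {
          for (std::size_t i = 0; i < window.size(); ++i)
              if (window[i] == vertex_ptr)
                  return static_cast<uint8_t>(i);      // index to be de-referenced later
          window.push_front(vertex_ptr);
          if (window.size() > kWindowSize) window.pop_back();
          return std::nullopt;                         // caller emits the full pointer
      }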
  • Publication number: 20220414053
    Abstract: A processing apparatus includes a processing resource including a general-purpose parallel processing engine and a matrix accelerator. The matrix accelerator includes first circuitry to receive a command to perform operations associated with an instruction, second circuitry to configure the matrix accelerator according to a physical depth of a systolic array within the matrix accelerator and a logical depth associated with the instruction, third circuitry to read operands for the instruction from a register file associated with the systolic array, fourth circuitry to perform operations for the instruction via one or more passes through one or more physical pipeline stages of the systolic array based on a configuration performed by the second circuitry, and fifth circuitry to write output of the operations to the register file associated with the systolic array.
    Type: Application
    Filed: June 24, 2021
    Publication date: December 29, 2022
    Applicant: Intel Corporation
    Inventors: Jorge Parra, Wei-yu Chen, Kaiyu Chen, Varghese George, Junjie Gu, Chandra Gurram, Guei-Yuan Lueh, Stephen Junkins, Subramaniam Maiyuran, Supratim Pal
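    Illustrative sketch (C++): a hypothetical illustration of the depth-configuration step described above, where an instruction's logical depth larger than the array's physical pipeline depth is folded into multiple passes; the depth values are placeholders.
      #include <cstdio>

      // An instruction's logical depth is folded into however many passes the
      // physical pipeline depth requires (ceiling division).
      int passes_needed(int logical_depth, int physical_depth) {
          return (logical_depth + physical_depth - 1) / physical_depth;
      }

      int main() {
          const int physical = 8;                  // placeholder physical stage count
          int logical_depths[] = {4, 8, 16, 24};   // placeholder per-instruction depths
          for (int logical : logical_depths)
              printf("logical depth %2d -> %d pass(es) through the %d-stage array\n",
                     logical, passes_needed(logical, physical), physical);
          return 0;
      }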
  • Publication number: 20220413851
    Abstract: A processing apparatus includes a general-purpose parallel processing engine including a set of multiple processing elements including a single precision floating-point unit, a double precision floating point unit, and an integer unit; a matrix accelerator including one or more systolic arrays; a first register file coupled with a first read control circuit, wherein the first read control circuit couples with the set of multiple processing elements and the matrix accelerator to arbitrate read requests to the first register file from the set of multiple processing elements and the matrix accelerator; and a second register file coupled with a second read control circuit, wherein the second read control circuit couples with the matrix accelerator to arbitrate read requests to the second register file from the matrix accelerator and limit access to the second register file by the set of multiple processing elements.
    Type: Application
    Filed: June 25, 2021
    Publication date: December 29, 2022
    Applicant: Intel Corporation
    Inventors: Chandra Gurram, Wei-yu Chen, Fangwen Fu, Sabareesh Ganapathy, Varghese George, Guei-Yuan Lueh, Subramaniam Maiyuran, Mike Macpherson, Supratim Pal, Jorge Parra
  • Publication number: 20220405096
    Abstract: Embodiments are directed to systems and methods for reuse of FMA execution unit hardware logic to provide native support for execution of get exponent, get mantissa, and/or scale instructions within a GPU. These new instructions may be used to implement branch-free emulation algorithms for mathematical functions and analytic functions (e.g., transcendental functions) by detecting and handling various special case inputs within a pre-processing stage of the FMA execution unit, which allows the main dataflow of the FMA execution unit to be bypassed for such special cases. Since special cases are handled by the FMA execution unit, library functions emulating various functions, including, but not limited to logarithm, exponential, and division operations may be implemented with significantly fewer lines of machine-level code, thereby providing improved performance for HPC applications.
    Type: Application
    Filed: June 22, 2021
    Publication date: December 22, 2022
    Applicant: Intel Corporation
    Inventors: Shuai Mu, Cristina S. Anderson, Subramaniam Maiyuran
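    Illustrative sketch (C++): a software reference for what get-exponent and get-mantissa operations return for normal single-precision inputs; special cases such as zeros, denormals, NaN, and infinity, which are what the abstract's pre-processing stage handles, are deliberately omitted, and the function names are illustrative.
      #include <cstdint>
      #include <cstdio>
      #include <cstring>

      // Exponent of a normal float, unbiased (e.g. 48 = 1.5 * 2^5 -> 5).
      int32_t get_exponent(float f) {
          uint32_t bits;  std::memcpy(&bits, &f, sizeof bits);
          return static_cast<int32_t>((bits >> 23) & 0xFF) - 127;
      }

      // Mantissa of a normal float, rescaled into [1.0, 2.0) by forcing the
      // exponent field to the bias value.
      float get_mantissa(float f) {
          uint32_t bits;  std::memcpy(&bits, &f, sizeof bits);
          bits = (bits & 0x007FFFFFu) | 0x3F800000u;
          float m;  std::memcpy(&m, &bits, sizeof m);
          return m;
      }

      int main() {
          float x = 48.0f;
          printf("exponent(%g) = %d, mantissa(%g) = %g\n",
                 x, get_exponent(x), x, get_mantissa(x));
          return 0;
      }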
  • Publication number: 20220383569
    Abstract: Methods, systems and apparatuses may provide for technology that selects a location of a pixel block based on a location of a graphics polygon, wherein the pixel block contains the graphics polygon. The technology may also convert the graphics polygon into a pixel-based representation within the pixel block during a single transaction.
    Type: Application
    Filed: May 27, 2021
    Publication date: December 1, 2022
    Inventors: Jorge Garcia Pabon, Subramaniam Maiyuran, Raghavendra Kamath Miyar, Vamsee Vardhan Chivukula, Krishan Malik, Abhishek Varshney
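    Illustrative sketch (C++): a hypothetical model of the block-placement step only, snapping a small polygon's bounding-box corner to an assumed 8x8 block grid; the single-transaction conversion to a pixel mask is not modeled, and the block size, types, and fit assumption are illustrative.
      #include <algorithm>
      #include <cmath>

      constexpr int kBlock = 8;   // assumed pixel-block dimension

      struct BlockOrigin { int x, y; };

      // Snap the polygon's minimum corner down to the block grid; assuming the
      // polygon's bounding box does not cross a block boundary, the block at
      // this origin contains the polygon.
      BlockOrigin place_block(const float* xs, const float* ys, int vertex_count) {
          float min_x = xs[0], min_y = ys[0];
          for (int i = 1; i < vertex_count; ++i) {
              min_x = std::min(min_x, xs[i]);
              min_y = std::min(min_y, ys[i]);
          }
          return { static_cast<int>(std::floor(min_x / kBlock)) * kBlock,
                   static_cast<int>(std::floor(min_y / kBlock)) * kBlock };
      }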
  • Patent number: 11514721
    Abstract: Systems, apparatuses, and methods may provide for technology to dynamically control a display in response to ocular characteristic measurements of at least one eye of a user.
    Type: Grant
    Filed: April 2, 2021
    Date of Patent: November 29, 2022
    Assignee: Intel Corporation
    Inventors: Radhakrishnan Venkataraman, James M. Holland, Sayan Lahiri, Pattabhiraman K, Kamal Sinha, Chandrasekaran Sakthivel, Daniel Pohl, Vivek Tiwari, Philip R. Laws, Subramaniam Maiyuran, Abhishek R. Appu, ElMoustapha Ould-Ahmed-Vall, Peter L. Doyle, Devan Burke
  • Patent number: 11507375
    Abstract: In an example, an apparatus comprises a plurality of execution units, and a first general register file (GRF) communicatively couple to the plurality of execution units, wherein the first GRF is shared by the plurality of execution units. Other embodiments are also disclosed and claimed.
    Type: Grant
    Filed: May 12, 2021
    Date of Patent: November 22, 2022
    Assignee: Intel Corporation
    Inventors: Abhishek R. Appu, Altug Koker, Joydeep Ray, Kamal Sinha, Kiran C. Veernapu, Subramaniam Maiyuran, Prasoonkumar Surti, Guei-Yuan Lueh, David Puffer, Supratim Pal, Eric J. Hoekstra, Travis T. Schluessler, Linda L. Hurd
  • Publication number: 20220365901
    Abstract: Described herein is a graphics processing unit (GPU) comprising a first processing cluster to perform parallel processing operations, the parallel processing operations including a ray tracing operation and a matrix multiply operation; and a second processing cluster coupled to the first processing cluster, wherein the first processing cluster includes a floating-point unit to perform floating point operations, the floating-point unit is configured to process an instruction using a bfloat16 (BF16) format with a multiplier to multiply second and third source operands while an accumulator adds a first source operand with output from the multiplier.
    Type: Application
    Filed: May 27, 2022
    Publication date: November 17, 2022
    Applicant: Intel Corporation
    Inventors: Subramaniam Maiyuran, Shubra Marwaha, Ashutosh Garg, Supratim Pal, Jorge Parra, Chandra Gurram, Varghese George, Darin Starkey, Guei-Yuan Lueh
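    Illustrative sketch (C++): a reference model of a BF16 multiply with FP32 accumulation consistent with the abstract above, treating bfloat16 as the upper 16 bits of an IEEE-754 float and assuming truncation rounding; this is a software illustration, not the hardware datapath.
      #include <cstdint>
      #include <cstdio>
      #include <cstring>

      // bfloat16 is the upper 16 bits of an IEEE-754 binary32 value.
      float bf16_to_float(uint16_t b) {
          uint32_t bits = static_cast<uint32_t>(b) << 16;
          float f;  std::memcpy(&f, &bits, sizeof f);
          return f;
      }

      uint16_t float_to_bf16(float f) {        // truncation rounding, for simplicity
          uint32_t bits;  std::memcpy(&bits, &f, sizeof bits);
          return static_cast<uint16_t>(bits >> 16);
      }

      // src0 + src1 * src2: BF16 multiply, FP32 accumulate.
      float bf16_fma(float src0, uint16_t src1, uint16_t src2) {
          return src0 + bf16_to_float(src1) * bf16_to_float(src2);
      }

      int main() {
          uint16_t a = float_to_bf16(1.5f), b = float_to_bf16(2.25f);
          printf("acc = %f\n", bf16_fma(10.0f, a, b));   // 10 + 1.5 * 2.25 = 13.375
          return 0;
      }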
  • Publication number: 20220366630
    Abstract: Position-based rendering apparatus and method for multi-die/GPU graphics processing. For example, one embodiment of a method comprises: distributing a plurality of graphics draws to a plurality of graphics processors; performing position-only shading using vertex data associated with tiles of a first draw on a first graphics processor, the first graphics processor responsively generating visibility data for each of the tiles; distributing subsets of the visibility data associated with different subsets of the tiles to different graphics processors; limiting geometry work to be performed on each tile by each graphics processor using the visibility data, each graphics processor to responsively generate rendered tiles; and wherein the rendered tiles are combined to generate a complete image frame.
    Type: Application
    Filed: July 28, 2022
    Publication date: November 17, 2022
    Applicant: Intel Corporation
    Inventors: Travis Schluessler, Zack Waters, Michael Apodaca, Daniel Johnston, Jason Surprise, Prasoonkumar Surti, Subramaniam Maiyuran, Peter Doyle, Saurabh Sharma, Ankur Shah, Murali Ramadoss
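    Illustrative sketch (C++): a high-level, hypothetical model of the visibility-distribution step described above, splitting per-tile visibility data across GPUs with a simple round-robin policy; the structures and the distribution policy are illustrative placeholders.
      #include <cstddef>
      #include <cstdint>
      #include <vector>

      struct TileVisibility {
          uint32_t tile_id;
          std::vector<uint32_t> visible_triangles;   // survivors of position-only shading
      };

      // Split the per-tile visibility data across the GPUs; each GPU later does
      // geometry work only for triangles its tiles mark as visible.
      std::vector<std::vector<TileVisibility>>
      distribute_tiles(const std::vector<TileVisibility>& all_tiles, std::size_t gpu_count) {
          std::vector<std::vector<TileVisibility>> per_gpu(gpu_count);
          for (const TileVisibility& t : all_tiles)
              per_gpu[t.tile_id % gpu_count].push_back(t);   // round-robin by tile id
          return per_gpu;
      }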
  • Patent number: 11494867
    Abstract: An apparatus to facilitate asynchronous execution at a processing unit. The apparatus includes one or more processors to detect independent task passes that may be executed out of order in a pipeline of the processing unit, schedule a first set of processing tasks to be executed at a first set of processing elements at the processing unit and schedule a second set of tasks to be executed at a second set of processing elements, wherein execution of the first set of tasks at the first set of processing elements is to be performed simultaneous and in parallel to execution of the second set of tasks at the second set of processing elements.
    Type: Grant
    Filed: December 8, 2020
    Date of Patent: November 8, 2022
    Assignee: Intel Corporation
    Inventors: Saurabh Sharma, Michael Apodaca, Aditya Navale, Travis Schluessler, Vamsee Vardhan Chivukula, Abhishek Venkatesh, Subramaniam Maiyuran
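    Illustrative sketch (C++): a hypothetical model of the scheduling idea described above, where two task passes detected as independent are executed simultaneously; plain threads stand in here for the two sets of processing elements.
      #include <functional>
      #include <thread>
      #include <vector>

      // Once two task passes are known to be independent, issue them to two
      // groups of processing elements at the same time.
      void run_independent_passes(const std::vector<std::function<void()>>& first_set,
                                  const std::vector<std::function<void()>>& second_set) {
          std::thread t1([&] { for (const auto& task : first_set)  task(); });
          std::thread t2([&] { for (const auto& task : second_set) task(); });
          t1.join();
          t2.join();
      }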