Patents by Inventor Subramaniam Maiyuran

Subramaniam Maiyuran has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11321799
    Abstract: Examples described herein relate to a software and hardware optimization that manages scenarios where a write operation to a register is less than an entirety of the register. A compiler detects instructions that make partial writes to the same register, groups such instructions, and provides hints to hardware of the partial write. The execution unit combines the output data for grouped instructions and updates the destination register as a single write instead of multiple separate partial writes.
    Type: Grant
    Filed: December 24, 2019
    Date of Patent: May 3, 2022
    Assignee: Intel Corporation
    Inventors: Chandra S. Gurram, Gang Y. Chen, Subramaniam Maiyuran, Supratim Pal, Ashutosh Garg, Jorge E. Parra, Darin M. Starkey, Guei-Yuan Lueh, Wei-Yu Chen
  • Publication number: 20220129266
    Abstract: Graphics processors and graphics processing units having dot product accumulate instructions for a hybrid floating point format are disclosed. In one embodiment, a graphics multiprocessor comprises an instruction unit to dispatch instructions and a processing resource coupled to the instruction unit. The processing resource is configured to receive a dot product accumulate instruction from the instruction unit and to process the dot product accumulate instruction using a bfloat16 number (BF16) format.
    Type: Application
    Filed: March 14, 2020
    Publication date: April 28, 2022
    Applicant: Intel Corporation
    Inventors: Subramaniam Maiyuran, Shubra Marwaha, Ashutosh Garg, Supratim Pal, Jorge Parra, Chandra Gurram, Varghese George, Darin Starkey, Guei-Yuan Lueh
  • Publication number: 20220129265
    Abstract: Methods and apparatus relating to techniques for data compression. In an example, an apparatus comprises a processor to receive a data compression instruction for a memory segment; and in response to the data compression instruction, compress a sequence of identical memory values in response to a determination that the sequence of identical memory values has a length which exceeds a threshold. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: March 14, 2020
    Publication date: April 28, 2022
    Applicant: Intel Corporation
    Inventors: Abhishek R. Appu, Altug Koker, Aravindh Anantaraman, Elmoustapha Ould-Ahmed-Vall, Joydeep Ray, Mike Macpherson, Valentin Andrei, Nicolas Galoppo Von Borries, Varghese George, Subramaniam Maiyuran, Vasanth Ranganathan, Jayakrishna P S, K Pattabhiraman, Sudhakar Kamma
  • Publication number: 20220129271
    Abstract: Methods and apparatus relating to data initialization techniques. In an example, an apparatus comprises a processor to read one or more metadata codes which map to one or more cache lines in a cache memory and invoke a random number generator to generate random numerical data for the one or more cache lines in response to a determination that the one or more metadata codes indicate that the cache lines are to contain random numerical data. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: March 14, 2020
    Publication date: April 28, 2022
    Applicant: Intel Corporation
    Inventors: Abhishek R. Appu, Aravindh Anantaraman, Elmoustapha Ould-Ahmed-Vall, Valentin Andrei, Nicolas Galoppo Von Borries, Varghese George, Altug Koker, Mike Macpherson, Subramaniam Maiyuran, Joydeep Ray, Vasanth Ranganathan
  • Publication number: 20220129521
    Abstract: Embodiments described herein include software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. One embodiment provides techniques to optimize training and inference on a systolic array when using sparse data. One embodiment provides techniques to use decompression information when performing sparse compute operations. One embodiment enables the disaggregation of special function compute arrays via a shared register file. One embodiment enables packed data compress and expand operations on a GPGPU. One embodiment provides techniques to exploit block sparsity within the cache hierarchy of a GPGPU.
    Type: Application
    Filed: March 14, 2020
    Publication date: April 28, 2022
    Applicant: Intel Corporation
    Inventors: Prasoonkumar Surti, Subramaniam Maiyuran, Valentin Andrei, Abhishek Appu, Varghese George, Altug Koker, Mike Macpherson, Elmoustapha Ould-Ahmed-Vall, Vasanth Ranganathan, Joydeep Ray, Lakshminarayanan Striramassarma, SungYe Kim
  • Patent number: 11314515
    Abstract: Embodiments described herein provide for an instruction and associated logic to enable a vector multiply add instruction with automatic zero skipping for sparse input. One embodiment provides for a general-purpose graphics processor comprising logic to perform operations comprising fetching a hardware macro instruction having a predicate mask, a repeat count, and a set of initial operands, where the initial operands include a destination operand and multiple source operands. The hardware macro instruction is configured to perform one or more multiply/add operations on input data associated with a set of matrices.
    Type: Grant
    Filed: December 23, 2019
    Date of Patent: April 26, 2022
    Assignee: Intel Corporation
    Inventors: Supratim Pal, Sasikanth Avancha, Ishwar Bhati, Wei-Yu Chen, Dipankar Das, Ashutosh Garg, Chandra S. Gurram, Junjie Gu, Guei-Yuan Lueh, Subramaniam Maiyuran, Jorge E. Parra, Sudarshan Srinivasan, Varghese George
  • Publication number: 20220121421
    Abstract: Methods and apparatus relating to techniques for multi-tile memory management. In an example, an apparatus comprises a cache memory, a high-bandwidth memory, a shader core communicatively coupled to the cache memory and comprising a processing element to decompress a first data element extracted from an in-memory database in the cache memory and having a first bit length to generate a second data element having a second bit length, greater than the first bit length, and an arithmetic logic unit (ALU) to compare the data element to a target value provided in a query of the in-memory database. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: March 14, 2020
    Publication date: April 21, 2022
    Applicant: Intel Corporation
    Inventors: Abhishek R. Appu, Altug Koker, Aravindh Anantaraman, Elmoustapha Ould-Ahmed-Vall, Valentin Andrei, Nicolas Galoppo Von Borries, Varghese George, Mike Macpherson, Subramaniam Maiyuran, Joydeep Ray, Lakshminarayana Striramassarma, Scott Janus, Brent Insko, Vasanth Ranganathan, Kamal Sinha, Arthur Hunter, Prasoonkumar Surti, David Puffer, James Valerio, Ankur N. Shah
  • Publication number: 20220114108
    Abstract: Systems and methods for improving cache efficiency and utilization are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations and a cache controller of a cache memory that is coupled to the processing resources. The cache controller is configured to set an initial aging policy using an aging field based on age of cache lines within the cache memory and to determine whether a hint or an instruction to indicate a level of aging has been received.
    Type: Application
    Filed: March 14, 2020
    Publication date: April 14, 2022
    Applicant: Intel Corporation
    Inventors: Altug Koker, Joydeep Ray, Elmoustapha Ould-Ahmed-Vall, Abhishek Appu, Aravindh Anantaraman, Valentin Andrei, Durgaprasad Bilagi, Varghese George, Brent Insko, Sanjeev Jahagirdar, Scott Janus, Pattabhiraman K., SungYe Kim, Subramaniam Maiyuran, Vasanth Ranganathan, Lakshminarayanan Striramassarma, Xinmin Tian
  • Publication number: 20220114096
    Abstract: Multi-tile Memory Management for Detecting Cross Tile Access, Providing Multi-Tile Inference Scaling with multicasting of data via copy operation, and Providing Page Migration are disclosed herein. In one embodiment, a graphics processor for a multi-tile architecture includes a first graphics processing unit (GPU) having a memory and a memory controller, a second graphics processing unit (GPU) having a memory and a cross-GPU fabric to communicatively couple the first and second GPUs. The memory controller is configured to determine whether frequent cross tile memory accesses occur from the first GPU to the memory of the second GPU in the multi-GPU configuration and to send a message to initiate a data transfer mechanism when frequent cross tile memory accesses occur from the first GPU to the memory of the second GPU.
    Type: Application
    Filed: March 14, 2020
    Publication date: April 14, 2022
    Applicant: Intel Corporation
    Inventors: Lakshminarayanan Striramassarma, Prasoonkumar Surti, Varghese George, Ben Ashbaugh, Aravindh Anantaraman, Valentin Andrei, Abhishek Appu, Nicolas Galoppo Von Borries, Altug Koker, Mike Macpherson, Subramaniam Maiyuran, Nilay Mistry, Elmoustapha Ould-Ahmed-Vall, Selvakumar Panneer, Vasanth Ranganathan, Joydeep Ray, Ankur Shah, Saurabh Tangri
  • Publication number: 20220066931
    Abstract: Embodiments described herein provide techniques to enable the dynamic reconfiguration of memory on a general-purpose graphics processing unit. One embodiment described herein enables dynamic reconfiguration of cache memory bank assignments based on hardware statistics. One embodiment enables virtual memory address translation using mixed four kilobyte and sixty-four kilobyte pages within the same page table hierarchy and under the same page directory. One embodiment provides for a graphics processor and associated heterogeneous processing system having near and far regions of the same level of a cache hierarchy.
    Type: Application
    Filed: March 14, 2020
    Publication date: March 3, 2022
    Applicant: Intel Corporation
    Inventors: Joydeep Ray, Niranjan Cooray, Subramaniam Maiyuran, Altug Koker, Prasoonkumar Surti, Varghese George, Valentin Andrei, Abhishek Appu, Guadalupe Garcia, Pattabhiraman K, SungYe Kim, Sanjay Kumar, Pratik Marolia, Elmoustapha Ould-Ahmed-Vall, Vasanth Ranganathan, William Sadler, Lakshminarayanan Striramassarma
  • Publication number: 20220066737
    Abstract: Examples described herein relate to instructions to request performance of tanh and sigmoid instructions. For example, a compiler can generate native tanh instructions to perform tanh. In some examples, a tanh function can be compiled into instructions that include an instruction to perform either tanh(input) or tanh(input)/input depending on a value of the input to generate an intermediate output; an instruction to cause generation of a scale factor based on the input; and an instruction to cause performance of a multiplication operation on the intermediate output with the scale factor. For example, a sigmoid function can be compiled to cause a math pipeline to perform a range check and perform operations based on a range.
    Type: Application
    Filed: August 26, 2020
    Publication date: March 3, 2022
    Inventors: Shuai Mu, Cristina S. Anderson, Subramaniam Maiyuran
  • Publication number: 20220058158
    Abstract: An apparatus to facilitate computing efficient cross channel operations in parallel computing machines using systolic arrays is disclosed. The apparatus includes a plurality of registers and one or more processing elements communicably coupled to the plurality of registers. The one or more processing elements include a systolic array circuit to perform cross-channel operations on source data received from a single source register of the plurality of registers, wherein the systolic array circuit is modified to: receive inputs from the single source register at different stages of the systolic array circuit; perform cross-channel operations at channels of the systolic array circuit; bypass disabled channels of the systolic array circuit, the disabled channels not used to compute the cross-channel operations; and broadcast a final result of a final stage of the systolic array circuit to all channels of a destination register.
    Type: Application
    Filed: November 3, 2021
    Publication date: February 24, 2022
    Applicant: Intel Corporation
    Inventors: Subramaniam Maiyuran, Jorge Parra, Supratim Pal, Chandra Gurram
  • Publication number: 20220058053
    Abstract: One embodiment provides for a general-purpose graphics processing unit comprising a set of processing elements to execute one or more thread groups of a second kernel to be executed by the general-purpose graphics processor, an on-chip memory coupled to the set of processing elements, and a scheduler coupled with the set of processing elements, the scheduler to schedule the thread groups of the kernel to the set of processing elements, wherein the scheduler is to schedule a thread group of the second kernel to execute subsequent to a thread group of a first kernel, the thread group of the second kernel configured to access a region of the on-chip memory that contains data written by the thread group of the first kernel in response to a determination that the second kernel is dependent upon the first kernel.
    Type: Application
    Filed: September 13, 2021
    Publication date: February 24, 2022
    Applicant: Intel Corporation
    Inventors: Valentin Andrei, Aravindh Anantaraman, Abhishek R. Appu, Nicolas C. Galoppo von Borries, Altug Koker, SungYe Kim, Elmoustapha Ould-Ahmed-Vall, Mike Macpherson, Subramaniam Maiyuran, Vasanth Ranganathan, Joydeep Ray, Varghese George
  • Patent number: 11257274
    Abstract: An embodiment of an electronic processing system may include an application processor, persistent storage media communicatively coupled to the application processor, and a graphics subsystem communicatively coupled to the application processor. The system may include one or more of a draw call re-orderer communicatively coupled to the application processor and the graphics subsystem to re-order two or more draw calls, a workload re-orderer communicatively coupled to the application processor and the graphics subsystem to re-order two or more work items in an order independent mode, a queue primitive included in at least one of the two or more draw calls to define a producer stage and a consumer stage, and an order-independent executor communicatively coupled to the application processor and the graphics subsystem to provide tile-based order independent execution of a compute stage. Other embodiments are disclosed and claimed.
    Type: Grant
    Filed: May 29, 2020
    Date of Patent: February 22, 2022
    Assignee: Intel Corporation
    Inventors: Devan Burke, Adam T. Lake, Jeffery S. Boles, John H. Feit, Karthik Vaidyanathan, Abhishek R. Appu, Joydeep Ray, Subramaniam Maiyuran, Altug Koker, Balaji Vembu, Murali Ramadoss, Prasoonkumar Surti, Eric J. Hoekstra, Gabor Liktor, Jonathan Kennedy, Slawomir Grajewski, Elmoustapha Ould-Ahmed-Vall
  • Patent number: 11250627
    Abstract: An apparatus to facilitate graphics rendering is disclosed. The apparatus comprises sequencer hardware to operate in a tile mode to render objects, including performing batch formation to generate one or more batches of received objects, performing tile sequencing for each of the objects to compute tile fill intersects for each of the objects and performing a play sequencing of each of the objects.
    Type: Grant
    Filed: June 29, 2020
    Date of Patent: February 15, 2022
    Assignee: Intel Corporation
    Inventors: Subramaniam Maiyuran, Saurabh Sharma, Jorge F. Garcia Pabon, Raghavendra Kamath Miyar, Sudheendra Srivathsa, Justin Decell, Aditya Navale
  • Patent number: 11232533
    Abstract: Embodiments are generally directed to memory prefetching in a multiple GPU environment. An embodiment of an apparatus includes multiple processors including a host processor and multiple graphics processing units (GPUs) to process data, each of the GPUs including a prefetcher and a cache; and a memory for storage of data, the memory including a plurality of memory elements, wherein the prefetcher of each of the GPUs is to prefetch data from the memory to the cache of the GPU; and wherein the prefetcher of a GPU is prohibited from prefetching from a page that is not owned by the GPU or by the host processor.
    Type: Grant
    Filed: March 15, 2019
    Date of Patent: January 25, 2022
    Assignee: Intel Corporation
    Inventors: Joydeep Ray, Aravindh Anantaraman, Valentin Andrei, Abhishek R. Appu, Nicolas Galoppo von Borries, Varghese George, Altug Koker, Elmoustapha Ould-Ahmed-Vall, Mike Macpherson, Subramaniam Maiyuran
  • Patent number: 11227358
    Abstract: Apparatuses including general-purpose graphics processing units and graphics multiprocessors that exploit queues or transitional buffers for improved low-latency high-bandwidth on-die data retrieval are disclosed. In one embodiment, a graphics multiprocessor includes at least one compute engine to provide a request, a queue or transitional buffer, and logic coupled to the queue or transitional buffer. The logic is configured to cause a request to be transferred to a queue or transitional buffer for temporary storage without processing the request and to determine whether the queue or transitional buffer has a predetermined amount of storage capacity.
    Type: Grant
    Filed: March 15, 2019
    Date of Patent: January 18, 2022
    Assignee: Intel Corporation
    Inventors: Aravindh Anantaraman, Altug Koker, Varghese George, Subramaniam Maiyuran, SungYe Kim, Valentin Andrei
  • Patent number: 11221848
    Abstract: Embodiments described herein provide an apparatus comprising a plurality of processing resources including a first processing resource and a second processing resource, a shared local memory communicatively coupled to the first processing resource and the second processing resource, and a processor to receive an instruction to initiate a matrix multiplication operation, write a first set of matrix data into a first set of registers, and share the first set of matrix data between the first processing resource and the second processing resource for use in the matrix multiplication operation. Other embodiments may be described and claimed.
    Type: Grant
    Filed: September 25, 2019
    Date of Patent: January 11, 2022
    Assignee: Intel Corporation
    Inventors: Subramaniam Maiyuran, Varghese George, Joydeep Ray, Ashutosh Garg, Jorge Parra, Shubh Shah, Shubra Marwaha
  • Patent number: 11222392
    Abstract: An apparatus to facilitate compute optimization is disclosed. The apparatus includes a memory device including a first integrated circuit (IC) including a plurality of memory channels and a second IC including a plurality of processing units, each coupled to a memory channel in the plurality of memory channels.
    Type: Grant
    Filed: August 5, 2019
    Date of Patent: January 11, 2022
    Assignee: Intel Corporation
    Inventors: Prasoonkumar Surti, Narayan Srinivasa, Feng Chen, Joydeep Ray, Ben J. Ashbaugh, Nicolas C. Galoppo Von Borries, Eriko Nurvitadhi, Balaji Vembu, Tsung-Han Lin, Kamal Sinha, Rajkishore Barik, Sara S. Baghsorkhi, Justin E. Gottschlich, Altug Koker, Nadathur Rajagopalan Satish, Farshad Akhbari, Dukhwan Kim, Wenyin Fu, Travis T. Schluessler, Josh B. Mastronarde, Linda L. Hurd, John H. Feit, Jeffery S. Boles, Adam T. Lake, Karthik Vaidyanathan, Devan Burke, Subramaniam Maiyuran, Abhishek R. Appu
  • Publication number: 20210407194
    Abstract: An apparatus to facilitate graphics rendering is disclosed. The apparatus comprises sequencer hardware to operate in a tile mode to render objects, including performing batch formation to generate one or more batches of received objects, performing tile sequencing for each of the objects to compute tile fill intersects for each of the objects and performing a play sequencing of each of the objects.
    Type: Application
    Filed: June 29, 2020
    Publication date: December 30, 2021
    Applicant: Intel Corporation
    Inventors: Subramaniam Maiyuran, Saurabh Sharma, Jorge F. Garcia Pabon, Raghavendra Kamath Miyar, Sudheendra Srivathsa, Justin Decell, Aditya Navale
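
Illustrative sketch (not taken from any of the filings): the abstract of publication 20220129265 in the listing above describes compressing a sequence of identical memory values only when the sequence is longer than a threshold. The Python below is one possible software reading of that idea, under stated assumptions. The function names `compress_runs` and `expand_runs`, the token layout, and the default threshold of 3 are all hypothetical; the claimed apparatus is hardware and may work quite differently.

```python
from typing import List, Tuple, Union

# A token is either a literal value or a (value, run_length) pair.
# This layout is an assumption made for illustration only.
Token = Union[int, Tuple[int, int]]


def compress_runs(values: List[int], threshold: int = 3) -> List[Token]:
    """Collapse runs of identical values whose length exceeds `threshold`."""
    out: List[Token] = []
    i = 0
    while i < len(values):
        # Find the end of the run of identical values starting at index i.
        j = i
        while j < len(values) and values[j] == values[i]:
            j += 1
        run = j - i
        if run > threshold:
            # Long run: store it once together with its length.
            out.append((values[i], run))
        else:
            # Short run: keep the values as plain literals.
            out.extend(values[i:j])
        i = j
    return out


def expand_runs(tokens: List[Token]) -> List[int]:
    """Inverse of compress_runs: rebuild the original list of values."""
    out: List[int] = []
    for token in tokens:
        if isinstance(token, tuple):
            value, run = token
            out.extend([value] * run)
        else:
            out.append(token)
    return out


if __name__ == "__main__":
    segment = [0, 0, 0, 0, 0, 7, 7, 1]
    packed = compress_runs(segment, threshold=3)
    print(packed)
    print(expand_runs(packed) == segment)
```

In this sketch a run whose length merely equals the threshold is left uncompressed, matching the abstract's wording that the length must exceed the threshold.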