Patents by Inventor Supratim Pal
Supratim Pal has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 10692170Abstract: Embodiments described herein provide a graphics processor in which dependency tracking hardware is simplified via the use of compiler provided software scoreboard information. In one embodiment the shader compiler for shader programs is configured to encode software scoreboard information into each instruction. Dependencies can be evaluated by the shader compiler and provided as scoreboard information with each instruction. The hardware can then use the provided information when scheduling instructions. In one embodiment, a software scoreboard synchronization instruction is provided to facilitate software dependency handling within a shader program. Using software to facilitate software dependency handling and synchronization can simplify hardware design, reducing the area consumed by the hardware. In one embodiment, dependencies can be evaluated by the shader compiler instead of the GPU hardware.Type: GrantFiled: June 11, 2019Date of Patent: June 23, 2020Assignee: Intel CorporationInventors: Subramaniam Maiyuran, Supratim Pal, Jorge E. Parra, Chandra S. Gurram, Ashwin J. Shivani, Ashutosh Garg, Brent A. Schwartz, Jorge F. Garcia Pabon, Darin M. Starkey, Shubh B. Shah, Guei-Yuan Lueh, Kaiyu Chen, Konrad Trifunovic, Buqi Cheng, Weiyu Chen
-
Publication number: 20200073664Abstract: An apparatus to facilitate register sharing is disclosed.Type: ApplicationFiled: September 1, 2018Publication date: March 5, 2020Applicant: Intel CorporationInventors: PRATIK J. ASHAR, SUPRATIM PAL, SUBRAMANIAM MAIYURAN, WEI-YU CHEN, GUEI-YUAN LUEH
-
Publication number: 20200026514Abstract: In an example, an apparatus comprises a plurality of execution units, and a first general register file (GRF) communicatively couple to the plurality of execution units, wherein the first GRF is shared by the plurality of execution units. Other embodiments are also disclosed and claimed.Type: ApplicationFiled: July 30, 2019Publication date: January 23, 2020Applicant: Intel CorporationInventors: Abhishek R. Appu, Altug Koker, Joydeep Ray, Kamal Sinha, Kiran C. Veernapu, Subramaniam Maiyuran, Prasoonkumar Surti, Guei-Yuan Lueh, David Puffer, Supratim Pal, Eric J. Hoekstra, Travis T. Schluessler, Linda L. Hurd
-
Publication number: 20190362460Abstract: Embodiments described herein provide a graphics processor in which dependency tracking hardware is simplified via the use of compiler provided software scoreboard information. In one embodiment the shader compiler for shader programs is configured to encode software scoreboard information into each instruction. Dependencies can be evaluated by the shader compiler and provided as scoreboard information with each instruction. The hardware can then use the provided information when scheduling instructions. In one embodiment, a software scoreboard synchronization instruction is provided to facilitate software dependency handling within a shader program. Using software to facilitate software dependency handling and synchronization can simplify hardware design, reducing the area consumed by the hardware. In one embodiment, dependencies can be evaluated by the shader compiler instead of the GPU hardware.Type: ApplicationFiled: June 11, 2019Publication date: November 28, 2019Applicant: Intel CorporationInventors: Subramaniam Maiyuran, Supratim Pal, Jorge E. Parra, Chandra S. Gurram, Ashwin J. Shivani, Ashutosh Garg, Brent A. Schwartz, Jorge F. Garcia Pabon, Darin M. Starkey, Shubh B. Shah, Guei-Yuan Lueh, Kaiyu Chen, Konrad Trifunovic, Buqi Cheng, Weiyu Chen
-
Publication number: 20190324746Abstract: Embodiments described herein provided for an instruction and associated logic to enable GPGPU program code to access special purpose hardware logic to accelerate dot product operations. One embodiment provides for a graphics processing unit comprising a fetch unit to fetch an instruction for execution and a decode unit to decode the instruction into a decoded instruction. The decoded instruction is a matrix instruction to cause the graphics processing unit to perform a parallel dot product operation. The GPGPU also includes a systolic dot product unit to execute the decoded instruction across one or more SIMD lanes using multiple systolic layers, wherein to execute the decoded instruction, a dot product computed at a first systolic layer is to be output to a second systolic layer, wherein each systolic layer includes one or more sets of interconnected multipliers and adders, each set of multipliers and adders to generate a dot product.Type: ApplicationFiled: April 19, 2018Publication date: October 24, 2019Applicant: Intel CorporationInventors: SUBRAMANIAM MAIYURAN, GUEI-YUAN LUEH, SUPRATIM PAL, ASHUTOSH GARG, CHANDRA S. GURRAM, JORGE E. PARRA, JUNJIE GU, KONRAD TRIFUNOVIC, HONG BIN LIAO, MIKE B. MACPHERSON, SHUBH B. SHAH, SHUBRA MARWAHA, STEPHEN JUNKINS, TIMOTHY R. BAUER, VARGHESE GEORGE, WEIYU CHEN
-
Patent number: 10423415Abstract: Disclosed herein is an apparatus which comprises a plurality of execution units, and a first general register file (GRF) communicatively couple to the plurality of execution units, wherein the first GRF is shared by the plurality of execution units.Type: GrantFiled: April 1, 2017Date of Patent: September 24, 2019Assignee: INTEL CORPORATIONInventors: Abhishek R. Appu, Altug Koker, Joydeep Ray, Kamal Sinha, Kiran C. Veernapu, Subramaniam Maiyuran, Prasoonkumar Surti, Guei-Yuan Lueh, David Puffer, Supratim Pal, Eric J. Hoekstra, Travis T. Schluessler, Linda L. Hurd
-
Publication number: 20190265973Abstract: Methods and apparatus relating to techniques for fusing SIMD processing units. In an example, an apparatus comprises logic, at least partially comprising hardware logic, to receive an instruction set for execution on at least two graphics processing execution units, determine whether the instruction set requires data dependent addressing, and select between a synchronized execution environment for the at least two graphics processing units and an unsynchronized execution environment for the at least two graphics processing units based at least in part on the determination whether the instruction set requires data dependent addressing. Other embodiments are also disclosed and claimed.Type: ApplicationFiled: February 23, 2018Publication date: August 29, 2019Applicant: Intel CorporationInventors: Subramaniam Maiyuran, Supratim Pal, Ashutosh Garg, Darin M. Starkey, Guei-Yuan Lueh, Jorge E. Parra, Shubh B. Shah, Wei-Yu Chen, Vikranth Vemulapalli, Narsim Krishna, Brent A. Schwartz, Chandra S. Gurram, Wei Pan, Ashwin J. Shivani
-
Patent number: 10360654Abstract: Embodiments described herein provide a graphics processor in which dependency tracking hardware is simplified via the use of compiler provided software scoreboard information. In one embodiment the shader compiler for shader programs is configured to encode software scoreboard information into each instruction. Dependencies can be evaluated by the shader compiler and provided as scoreboard information with each instruction. The hardware can then use the provided information when scheduling instructions. In one embodiment, a software scoreboard synchronization instruction is provided to facilitate software dependency handling within a shader program. Using software to facilitate software dependency handling and synchronization can simplify hardware design, reducing the area consumed by the hardware. In one embodiment, dependencies can be evaluated by the shader compiler instead of the GPU hardware.Type: GrantFiled: May 25, 2018Date of Patent: July 23, 2019Assignee: Intel CorporationInventors: Subramaniam Maiyuran, Supratim Pal, Jorge E. Parra, Chandra S. Gurram, Ashwin J. Shivani, Ashutosh Garg, Brent A. Schwartz, Jorge F. Garcia Pabon, Darin M. Starkey, Shubh B. Shah, Guei-Yuan Lueh, Kaiyu Chen, Konrad Trifunovic, Buqi Cheng, Weiyu Chen
-
Patent number: 10152452Abstract: Techniques to suppress redundant reads to register addresses and to replicate read data are disclosed. The redundant reads are suppressed when multiple source operands specify the same register address to read. Additionally, the read data is replicated to a data stream or data location corresponding to the source operands where the data read was suppressed.Type: GrantFiled: May 29, 2015Date of Patent: December 11, 2018Assignee: INTEL CORPORATIONInventors: Supratim Pal, Subramaniam Maiyuran, Mark C. Davis
-
Publication number: 20180307487Abstract: An apparatus to facilitate control flow in a graphics processing system is disclosed. The apparatus includes logic a plurality of execution units to execute single instruction, multiple data (SIMD) and flow control logic to detect a diverging control flow in a plurality of SIMD channels and reduce the execution of the control flow to a subset of the SIMD channels.Type: ApplicationFiled: April 21, 2017Publication date: October 25, 2018Inventors: Subramaniam M. Maiyuran, Guei-Yuan Lueh, Supratim Pal, Gang Chen, Ananda V. Kommaraju, Joy Chandra, Altug Koker, Prasoonkumar Surti, David Puffer, Hong Bin Liao, Joydeep Ray, Abhishek R. Appu, Ankur N. Shah, Travis T. Schluessler, Jonathan Kennedy, Devan Burke
-
Publication number: 20180285106Abstract: In an example, an apparatus comprises a plurality of execution units, and a first general register file (GRF) communicatively couple to the plurality of execution units, wherein the first GRF is shared by the plurality of execution units. Other embodiments are also disclosed and claimed.Type: ApplicationFiled: April 1, 2017Publication date: October 4, 2018Inventors: Abhishek R. Appu, Altug Koker, Joydeep Ray, Kamal Sinha, Kiran C. Veernapu, Subramaniam Maiyuran, Prasoonkumar Surti, Guei-Yuan Lueh, David Puffer, Supratim Pal, Eric J. Hoekstra, Travis T. Schluessler, Linda L. Hurd
-
Patent number: 9880839Abstract: A processor is described having an instruction execution pipeline. The instruction execution pipeline has an instruction fetch stage to fetch an instruction specifying multiple target resultant registers. The instruction execution pipeline has an instruction decode stage to decode the instruction. The instruction execution pipeline has a functional unit to prepare resultant content specific to each of the multiple target resultant registers. The instruction execution pipeline has a write-back stage to write back said resultant content specific to each of said multiple target resultant registers.Type: GrantFiled: April 24, 2014Date of Patent: January 30, 2018Assignee: INTEL CORPORATIONInventors: Wei-Yu Chen, Guei-Yuan Lueh, Subramaniam Maiyuran, Supratim Pal
-
Patent number: 9632801Abstract: Conversion of an array of structures (AOS) to a structure of arrays (SOA) improves the efficiency of transfer from the AOS to the SOA. A similar technique can be used to convert efficiently from an SOA to an AOS. The controller performing the conversion computes a partition size as the highest common factor between the structure size of structures in AOS and the number of banks in a first memory device, and transfers data based on the partition size, rather than on the structure size. The controller can read a partition size number of elements from multiple different structures to ensure that full data transfer bandwidth is used for each transfer.Type: GrantFiled: April 9, 2014Date of Patent: April 25, 2017Assignee: Intel CorporationInventors: Supratim Pal, Murali Sundaresan
-
Publication number: 20170010894Abstract: Systems, apparatuses and methods may provide for associating a first instruction pointer with an IF block of a primary IF-ELSE conditional construct associated with a thread and activating a second instruction pointer in response to a dependency associated with the IF block. Additionally, the second instruction pointer may be associated with an ELSE block of the primary IF-ELSE conditional construct. In one example, the IF block and the ELSE block are executed, via the first instruction pointer and the second instruction pointer, one or more of independently from or parallel to one another.Type: ApplicationFiled: July 8, 2015Publication date: January 12, 2017Applicant: Intel CorporationInventors: Hema C. Nalluri, Supratim Pal, Subramaniam Maiyuran, Joy Chandra
-
Publication number: 20160350112Abstract: Techniques to suppress redundant reads to register addresses and to replicate read data are disclosed. The redundant reads are suppressed when multiple source operands specify the same register address to read. Additionally, the read data is replicated to a data stream or data location corresponding to the source operands where the data read was suppressed.Type: ApplicationFiled: May 29, 2015Publication date: December 1, 2016Applicant: Intel CorporationInventors: SUPRATIM PAL, SUBRAMANIAM MAIYURAN, MARK C. DAVIS
-
Patent number: 9245495Abstract: Systems, apparatus, articles, and methods are described including operations to generate a weighted look-up-table based at least in part on individual pixel input values within an active block region and on a plurality of contrast compensation functions. A second level compensation may be performed for a center pixel block of the active region based at least in part on the weighted look-up-table.Type: GrantFiled: December 21, 2012Date of Patent: January 26, 2016Assignee: INTEL CORPORATIONInventors: Niraj Gupta, Supratim Pal, Mahesh B. Chappalli, Yi-Jen Chiu, Hong Jiang
-
Publication number: 20150309800Abstract: A processor is described having an instruction execution pipeline. The instruction execution pipeline has an instruction fetch stage to fetch an instruction specifying multiple target resultant registers. The instruction execution pipeline has an instruction decode stage to decode the instruction. The instruction execution pipeline has a functional unit to prepare resultant content specific to each of the multiple target resultant registers. The instruction execution pipeline has a write-back stage to write back said resultant content specific to each of said multiple target resultant registers.Type: ApplicationFiled: April 24, 2014Publication date: October 29, 2015Inventors: WEI-YU CHEN, GUEI-YUAN LUEH, SUBRAMANIAM MAIYURAN, SUPRATIM PAL
-
Publication number: 20150294435Abstract: Conversion of an array of structures (AOS) to a structure of arrays (SOA) improves the efficiency of transfer from the AOS to the SOA. A similar technique can be used to convert efficiently from an SOA to an AOS. The controller performing the conversion computes a partition size as the highest common factor between the structure size of structures in AOS and the number of banks in a first memory device, and transfers data based on the partition size, rather than on the structure size. The controller can read a partition size number of elements from multiple different structures to ensure that full data transfer bandwidth is used for each transfer.Type: ApplicationFiled: April 9, 2014Publication date: October 15, 2015Applicant: Intel CorporationInventors: Supratim Pal, Murali Sundaresan
-
Publication number: 20140313243Abstract: Systems, apparatus, articles, and methods are described including operations to generate a weighted look-up-table based at least in part on individual pixel input values within an active block region and on a plurality of contrast compensation functions. A second level compensation may be performed for a center pixel block of the active region based at least in part on the weighted look-up-table.Type: ApplicationFiled: December 21, 2012Publication date: October 23, 2014Inventors: Niraj Gupta, Supratim Pal, Mahesh B. Chappalli, Yi-Jen Chiu, Hong Jiang