Patents by Inventor Chandra S. Gurram
Chandra S. Gurram has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11900502Abstract: Examples described herein relate to a software and hardware optimization that manages scenarios where a write operation to a register is less than an entirety of the register. A compiler detects instructions that make partial writes to the same register, groups such instructions, and provides hints to hardware of the partial write. The execution unit combines the output data for grouped instructions and updates the destination register as single write instead of multiple separate partial writes.Type: GrantFiled: May 2, 2022Date of Patent: February 13, 2024Assignee: Intel CorporationInventors: Chandra S. Gurram, Gang Y. Chen, Subramaniam Maiyuran, Supratim Pal, Ashutosh Garg, Jorge E. Parra, Darin M. Starkey, Guei-Yuan Lueh, Wei-Yu Chen
-
Publication number: 20230297373Abstract: Embodiments described herein provided for an instruction and associated logic to enable GPGPU program code to access special purpose hardware logic to accelerate dot product operations. One embodiment provides for a graphics processing unit comprising a fetch unit to fetch a single instruction for execution, a decode unit to decode the single instruction into a decoded instruction, wherein the decoded instruction is to cause the graphics processing unit to perform a set of parallel dot product operations on elements of input matrices, and a systolic dot product unit to execute the decoded instruction across one or more parallel processor lanes using multiple systolic layers associated with multiple pipeline stages. The multiple pipeline stages include one or more sets of interconnected multipliers and adders to compute multiple concurrent dot products.Type: ApplicationFiled: April 26, 2023Publication date: September 21, 2023Applicant: Intel CorporationInventors: SUBRAMANIAM MAIYURAN, GUEI-YUAN LUEH, SUPRATIM PAL, ASHUTOSH GARG, CHANDRA S. GURRAM, JORGE E. PARRA, JUNJIE GU, KONRAD TRIFUNOVIC, HONG BIN LIAO, MIKE B. MACPHERSON, SHUBH B. SHAH, SHUBRA MARWAHA, STEPHEN JUNKINS, TIMOTHY R. BAUER, VARGHESE GEORGE, WEIYU CHEN
-
Patent number: 11669329Abstract: Embodiments described herein provide for an instruction and associated logic to enable a vector multiply add instructions with automatic zero skipping for sparse input. One embodiment provides for a general-purpose graphics processor comprising logic to perform operations comprising fetching a hardware macro instruction having a predicate mask, a repeat count, and a set of initial operands, where the initial operands include a destination operand and multiple source operands. The hardware macro instruction is configured to perform one or more multiply/add operations on input data associated with a set of matrices.Type: GrantFiled: April 18, 2022Date of Patent: June 6, 2023Assignee: Intel CorporationInventors: Supratim Pal, Sasikanth Avancha, Ishwar Bhati, Wei-Yu Chen, Dipankar Das, Ashutosh Garg, Chandra S. Gurram, Junjie Gu, Guei-Yuan Lueh, Subramaniam Maiyuran, Jorge E. Parra, Sudarshan Srinivasan, Varghese George
-
Patent number: 11640297Abstract: Embodiments described herein provided for an instruction and associated logic to enable GPGPU program code to access special purpose hardware logic to accelerate dot product operations. One embodiment provides for a graphics processing unit comprising a fetch unit to fetch an instruction for execution and a decode unit to decode the instruction into a decoded instruction. The decoded instruction is a matrix instruction to cause the graphics processing unit to perform a parallel dot product operation. The GPGPU also includes systolic dot product circuitry to execute the decoded instruction across one or more SIMD lanes using multiple systolic layers, wherein to execute the decoded instruction, a dot product computed at a first systolic layer is to be output to a second systolic layer, wherein each systolic layer includes one or more sets of interconnected multipliers and adders, each set of multipliers and adders to generate a dot product.Type: GrantFiled: June 15, 2021Date of Patent: May 2, 2023Assignee: Intel CorporationInventors: Subramaniam Maiyuran, Guei-Yuan Lueh, Supratim Pal, Ashutosh Garg, Chandra S. Gurram, Jorge E. Parra, Junjie Gu, Konrad Trifunovic, Hong Bin Liao, Mike B. MacPherson, Shubh B. Shah, Shubra Marwaha, Stephen Junkins, Timothy R. Bauer, Varghese George, Weiyu Chen
-
Publication number: 20220326953Abstract: Embodiments described herein provide for an instruction and associated logic to enable a vector multiply add instructions with automatic zero skipping for sparse input. One embodiment provides for a general-purpose graphics processor comprising logic to perform operations comprising fetching a hardware macro instruction having a predicate mask, a repeat count, and a set of initial operands, where the initial operands include a destination operand and multiple source operands. The hardware macro instruction is configured to perform one or more multiply/add operations on input data associated with a set of matrices.Type: ApplicationFiled: April 18, 2022Publication date: October 13, 2022Applicant: Intel CorporationInventors: Supratim Pal, Sasikanth Avancha, Ishwar Bhati, Wei-Yu Chen, Dipankar Das, Ashutosh Garg, Chandra S. Gurram, Junjie Gu, Guei-Yuan Lueh, Subramaniam Maiyuran, Jorge E. Parra, Sudarshan Srinivasan, Varghese George
-
Publication number: 20220308877Abstract: A graphics processing apparatus includes a graphics processor and a constant cache. The graphics processor has a number of execution instances that will generate requests for constant data from the constant cache. The constant cache stores constants of multiple constant types. The constant cache has a single level of hierarchy to store the constant data. The constant cache has a banking structure based on the number of execution instances, where the execution instances generate requests for the constant data with unified messaging that is the same for the different types of constant data.Type: ApplicationFiled: March 26, 2021Publication date: September 29, 2022Inventors: Subramaniam MAIYURAN, Sudarshanram SHETTY, Travis SCHLUESSLER, Guei-Yuan LUEH, PingHang CHEUNG, Srividya KARUMURI, Chandra S. GURRAM, Shuai MU, Vikranth VEMULAPALLI
-
Publication number: 20220261949Abstract: Examples described herein relate to a software and hardware optimization that manages scenarios where a write operation to a register is less than an entirety of the register. A compiler detects instructions that make partial writes to the same register, groups such instructions, and provides hints to hardware of the partial write. The execution unit combines the output data for grouped instructions and updates the destination register as single write instead of multiple separate partial writes.Type: ApplicationFiled: May 2, 2022Publication date: August 18, 2022Inventors: Chandra S. GURRAM, Gang Y. CHEN, Subramaniam MAIYURAN, Supratim PAL, Ashutosh GARG, Jorge E. PARRA, Darin M. STARKEY, Guei-Yuan LUEH, Wei-Yu CHEN
-
Patent number: 11321799Abstract: Examples described herein relate to a software and hardware optimization that manages scenarios where a write operation to a register is less than an entirety of the register. A compiler detects instructions that make partial writes to the same register, groups such instructions, and provides hints to hardware of the partial write. The execution unit combines the output data for grouped instructions and updates the destination register as single write instead of multiple separate partial writes.Type: GrantFiled: December 24, 2019Date of Patent: May 3, 2022Assignee: Intel CorporationInventors: Chandra S. Gurram, Gang Y. Chen, Subramaniam Maiyuran, Supratim Pal, Ashutosh Garg, Jorge E. Parra, Darin M. Starkey, Guei-Yuan Lueh, Wei-Yu Chen
-
Patent number: 11314515Abstract: Embodiments described herein provide for an instruction and associated logic to enable a vector multiply add instructions with automatic zero skipping for sparse input. One embodiment provides for a general-purpose graphics processor comprising logic to perform operations comprising fetching a hardware macro instruction having a predicate mask, a repeat count, and a set of initial operands, where the initial operands include a destination operand and multiple source operands. The hardware macro instruction is configured to perform one or more multiply/add operations on input data associated with a set of matrices.Type: GrantFiled: December 23, 2019Date of Patent: April 26, 2022Assignee: Intel CorporationInventors: Supratim Pal, Sasikanth Avancha, Ishwar Bhati, Wei-Yu Chen, Dipankar Das, Ashutosh Garg, Chandra S. Gurram, Junjie Gu, Guei-Yuan Lueh, Subramaniam Maiyuran, Jorge E. Parra, Sudarshan Srinivasan, Varghese George
-
Publication number: 20210303299Abstract: Embodiments described herein provided for an instruction and associated logic to enable GPGPU program code to access special purpose hardware logic to accelerate dot product operations. One embodiment provides for a graphics processing unit comprising a fetch unit to fetch an instruction for execution and a decode unit to decode the instruction into a decoded instruction. The decoded instruction is a matrix instruction to cause the graphics processing unit to perform a parallel dot product operation. The GPGPU also includes systolic dot product circuitry to execute the decoded instruction across one or more SIMD lanes using multiple systolic layers, wherein to execute the decoded instruction, a dot product computed at a first systolic layer is to be output to a second systolic layer, wherein each systolic layer includes one or more sets of interconnected multipliers and adders, each set of multipliers and adders to generate a dot product.Type: ApplicationFiled: June 15, 2021Publication date: September 30, 2021Applicant: Intel CorporationInventors: SUBRAMANIAM MAIYURAN, GUEI-YUAN LUEH, SUPRATIM PAL, ASHUTOSH GARG, CHANDRA S. GURRAM, JORGE E. PARRA, JUNJIE GU, KONRAD TRIFUNOVIC, HONG BIN LIAO, MIKE B. MACPHERSON, SHUBH B. SHAH, SHUBRA MARWAHA, STEPHEN JUNKINS, TIMOTHY R. BAUER, VARGHESE GEORGE, WEIYU CHEN
-
Publication number: 20210191724Abstract: Embodiments described herein provide for an instruction and associated logic to enable a vector multiply add instructions with automatic zero skipping for sparse input. One embodiment provides for a general-purpose graphics processor comprising logic to perform operations comprising fetching a hardware macro instruction having a predicate mask, a repeat count, and a set of initial operands, where the initial operands include a destination operand and multiple source operands. The hardware macro instruction is configured to perform one or more multiply/add operations on input data associated with a set of matrices.Type: ApplicationFiled: December 23, 2019Publication date: June 24, 2021Applicant: Intel CorporationInventors: Supratim Pal, Sasikanth Avancha, Ishwar Bhati, Wei-Yu Chen, Dipankar Das, Ashutosh Garg, Chandra S. Gurram, Junjie Gu, Guei-Yuan Lueh, Subramaniam Maiyuran, Jorge E. Parra, Sudarshan Srinivasan, Varghese George
-
Publication number: 20210192673Abstract: Examples described herein relate to a software and hardware optimization that manages scenarios where a write operation to a register is less than an entirety of the register. A compiler detects instructions that make partial writes to the same register, groups such instructions, and provides hints to hardware of the partial write. The execution unit combines the output data for grouped instructions and updates the destination register as single write instead of multiple separate partial writes.Type: ApplicationFiled: December 24, 2019Publication date: June 24, 2021Inventors: Chandra S. GURRAM, Gang Y. CHEN, Subramaniam MAIYURAN, Supratim PAL, Ashutosh GARG, Jorge E. PARRA, Darin M. STARKEY, Guei-Yuan LUEH, Wei-Yu CHEN
-
Patent number: 11042370Abstract: Embodiments described herein provided for an instruction and associated logic to enable GPGPU program code to access special purpose hardware logic to accelerate dot product operations. One embodiment provides for a graphics processing unit comprising a fetch unit to fetch an instruction for execution and a decode unit to decode the instruction into a decoded instruction. The decoded instruction is a matrix instruction to cause the graphics processing unit to perform a parallel dot product operation. The GPGPU also includes a systolic dot product unit to execute the decoded instruction across one or more SIMD lanes using multiple systolic layers, wherein to execute the decoded instruction, a dot product computed at a first systolic layer is to be output to a second systolic layer, wherein each systolic layer includes one or more sets of interconnected multipliers and adders, each set of multipliers and adders to generate a dot product.Type: GrantFiled: April 19, 2018Date of Patent: June 22, 2021Assignee: Intel CorporationInventors: Subramaniam Maiyuran, Guei-Yuan Lueh, Supratim Pal, Ashutosh Garg, Chandra S. Gurram, Jorge E. Parra, Junjie Gu, Konrad Trifunovic, Hong Bin Liao, Mike B. Macpherson, Shubh B. Shah, Shubra Marwaha, Stephen Junkins, Timothy R. Bauer, Varghese George, Weiyu Chen
-
Patent number: 10983794Abstract: An processor to facilitate register sharing is disclosed. The processor includes a plurality of execution units (EUs), each including a General Purpose Register File (GRF) having a plurality of registers; and register sharing hardware to divide the plurality of registers into a first set of registers dedicated for execution of a first set of threads and a second set of registers shared for execution of a second set of threads.Type: GrantFiled: June 17, 2019Date of Patent: April 20, 2021Assignee: Intel CorporationInventors: Guei-Yuan Lueh, Subramaniam Maiyuran, Weiyu Chen, Konrad Trifunovic, Supratim Pal, Chandra S. Gurram, Jorge E. Parra, Pratik J. Ashar, Tomasz Bujewski
-
Publication number: 20200394041Abstract: An processor to facilitate register sharing is disclosed. The processor includes a plurality of execution units (EUs), each including a General Purpose Register File (GRF) having a plurality of registers; and register sharing hardware to divide the plurality of registers into a first set of registers dedicated for execution of a first set of threads and a second set of registers shared for execution of a second set of threads.Type: ApplicationFiled: June 17, 2019Publication date: December 17, 2020Applicant: Intel CorporationInventors: Guei-Yuan Lueh, Subramaniam Maiyuran, Weiyu Chen, Konrad Trifunovic, Supratim Pal, Chandra S. Gurram, Jorge E. Parra, Pratik J. Ashar, Tomasz Bujewski
-
Patent number: 10839478Abstract: A processor is disclosed. The processor includes an execution unit having a register file having one or more banks of registers to store operand values, an accumulator comprising a pool of registers to store operand values determined to cause a conflict at register banks within the register file and cache circuitry to control storage of the operand values determined to cause a conflict at the register banks from the register file to the pool of registers.Type: GrantFiled: April 8, 2019Date of Patent: November 17, 2020Assignee: Intel CorporationInventors: Guei-Yuan Lueh, Subramaniam Maiyuran, Wei-Yu Chen, Konrad Trifunovic, Supratim Pal, Chandra S. Gurram, Jorge E. Parra, Pratik J. Ashar, Tomasz Bujewski
-
Publication number: 20200320662Abstract: A processor is disclosed. The processor includes an execution unit having a register file having one or more banks of registers to store operand values, an accumulator comprising a pool of registers to store operand values determined to cause a conflict at register banks within the register file and cache circuitry to control storage of the operand values determined to cause a conflict at the register banks from the register file to the pool of registers.Type: ApplicationFiled: April 8, 2019Publication date: October 8, 2020Applicant: Intel CorporationInventors: Guei-Yuan Lueh, Subramaniam Maiyuran, Wei-Yu Chen, Konrad Trifunovic, Supratim Pal, Chandra S. Gurram, Jorge E. Parra, Pratik J. Ashar, Tomasz Bujewski
-
Patent number: 10769751Abstract: A processing apparatus is described. The apparatus includes a graphics processing unit (GPU), including a register file having a plurality of channels to store data and an execution unit to examine data at each of the plurality of channels, read a data value from a first of the plurality of channels upon a determination that each of the plurality of channels has the same data and execute a single input multi data (SIMD) instruction based on the data value.Type: GrantFiled: August 19, 2019Date of Patent: September 8, 2020Assignee: INTEL CORPORATIONInventors: Subramaniam Maiyuran, Jorge F. Garcia Pabon, Vikranth Vemulapalli, Chandra S. Gurram, Aditya Navale, Saurabh Sharma
-
Patent number: 10692170Abstract: Embodiments described herein provide a graphics processor in which dependency tracking hardware is simplified via the use of compiler provided software scoreboard information. In one embodiment the shader compiler for shader programs is configured to encode software scoreboard information into each instruction. Dependencies can be evaluated by the shader compiler and provided as scoreboard information with each instruction. The hardware can then use the provided information when scheduling instructions. In one embodiment, a software scoreboard synchronization instruction is provided to facilitate software dependency handling within a shader program. Using software to facilitate software dependency handling and synchronization can simplify hardware design, reducing the area consumed by the hardware. In one embodiment, dependencies can be evaluated by the shader compiler instead of the GPU hardware.Type: GrantFiled: June 11, 2019Date of Patent: June 23, 2020Assignee: Intel CorporationInventors: Subramaniam Maiyuran, Supratim Pal, Jorge E. Parra, Chandra S. Gurram, Ashwin J. Shivani, Ashutosh Garg, Brent A. Schwartz, Jorge F. Garcia Pabon, Darin M. Starkey, Shubh B. Shah, Guei-Yuan Lueh, Kaiyu Chen, Konrad Trifunovic, Buqi Cheng, Weiyu Chen
-
Publication number: 20200043124Abstract: A processing apparatus is described. The apparatus includes a graphics processing unit (GPU), including a register file having a plurality of channels to store data and an execution unit to examine data at each of the plurality of channels, read a data value from a first of the plurality of channels upon a determination that each of the plurality of channels has the same data and execute a single input multi data (SIMD) instruction based on the data value.Type: ApplicationFiled: August 19, 2019Publication date: February 6, 2020Applicant: Intel CorporationInventors: SUBRAMANIAM MAIYURAN, JORGE F. GARCIA PABON, VIKRANTH VEMULAPALLI, CHANDRA S. GURRAM, ADITYA NAVALE, SAURABH SHARMA