Patents by Inventor Vikranth Vemulapalli

Vikranth Vemulapalli is a named inventor on the patent filings listed below. The listing includes both pending patent applications and patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12386617
    Abstract: An apparatus to facilitate gathering payload from arbitrary registers for send messages in a graphics environment is disclosed. The apparatus includes processing resources comprising execution circuitry to receive a send gather message instruction identifying a number of registers to access for a send message and identifying IDs of a plurality of individual registers corresponding to the number of registers; decode a first phase of the send gather message instruction; based on decoding the first phase, cause a second phase of the send gather message instruction to bypass an instruction decode stage; and dispatch the first phase subsequently followed by dispatch of the second phase to a send pipeline. The apparatus can also perform an immediate move of the IDs of the plurality of individual registers to an architectural register of the execution circuitry and include a pointer to the architectural register in the send gather message instruction.
    Type: Grant
    Filed: September 22, 2021
    Date of Patent: August 12, 2025
    Assignee: Intel Corporation
    Inventors: Supratim Pal, Chandra Gurram, Fan-Yin Tzeng, Subramaniam Maiyuran, Guei-Yuan Lueh, Timothy R. Bauer, Vikranth Vemulapalli, Wei-Yu Chen
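    A minimal Python sketch of the two-phase issue described in this entry's abstract, assuming hypothetical class and field names; it only illustrates how the second phase, carrying the gathered register IDs, could bypass the decode stage once the first phase has been decoded.
    ```python
    from dataclasses import dataclass, field

    @dataclass
    class SendGatherInstruction:
        """Hypothetical model of a send-gather message instruction."""
        num_registers: int
        register_ids: list  # IDs of the individual registers to gather payload from

    @dataclass
    class SendPipeline:
        dispatched: list = field(default_factory=list)

        def dispatch(self, phase, payload):
            self.dispatched.append((phase, payload))

    def issue_send_gather(instr, pipeline):
        # Phase 1 goes through the normal decode stage.
        decoded = {"opcode": "send_gather", "count": instr.num_registers}
        pipeline.dispatch("phase1", decoded)
        # Phase 2 carries the register IDs and skips the decode stage,
        # since phase 1 already established the instruction semantics.
        pipeline.dispatch("phase2", {"register_ids": instr.register_ids})

    pipe = SendPipeline()
    issue_send_gather(SendGatherInstruction(4, [3, 7, 12, 20]), pipe)
    print(pipe.dispatched)
    ```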
  • Publication number: 20250232511
    Abstract: Techniques are provided to enable mid-thread (instruction level) preemption in a graphics processor without requiring software intervention by a graphics driver associated with the graphics processor. Hardware-based mid-thread preemption is facilitated using the thread dispatch hardware of the graphics processor to trigger execution of a kernel program by the graphics processor that saves the thread state of preempted processing resources and facilitates the subsequent restoration of that thread state to the same or a different set of processing resources.
    Type: Application
    Filed: January 17, 2024
    Publication date: July 17, 2025
    Applicant: Intel Corporation
    Inventors: Vasanth Ranganathan, James Valerio, Aditya Navale, Alan Curtis, Jeffery S. Boles, Hema Chand Nalluri, Vikranth Vemulapalli, Supratim Pal, Boris Kuznetsov, Brent A. Schwartz
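    The following is a hedged sketch of the hardware-triggered save/restore flow summarized above; the function names and state layout are assumptions, not the patented kernel program.
    ```python
    # Hypothetical sketch of hardware-triggered mid-thread preemption: a
    # save kernel captures per-resource thread state, and a restore step
    # replays it onto the same or a different processing resource.
    saved_state = {}

    def save_kernel(resource_id, thread_state):
        # Invoked by the thread dispatch hardware when a thread is preempted.
        saved_state[resource_id] = dict(thread_state)

    def restore_kernel(saved, target_resource_id):
        # Restores preempted state onto any available processing resource.
        return {"resource": target_resource_id, "state": dict(saved)}

    save_kernel(resource_id=2, thread_state={"ip": 0x40, "grf": [1, 2, 3]})
    restored = restore_kernel(saved_state[2], target_resource_id=5)
    print(restored)
    ```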
  • Publication number: 20250231769
    Abstract: Techniques are provided to enable mid-thread (instruction level) preemption in a graphics processor without requiring software intervention by a graphics driver associated with the graphics processor. Hardware-based mid-thread preemption is facilitated using the thread dispatch hardware of the graphics processor to trigger execution of a kernel program by the graphics processor that saves the thread state of preempted processing resources and facilitates the subsequent restoration of that thread state to the same or a different set of processing resources.
    Type: Application
    Filed: January 17, 2024
    Publication date: July 17, 2025
    Applicant: Intel Corporation
    Inventors: Vasanth Ranganathan, James Valerio, Aditya Navale, Alan Curtis, Jeffery S. Boles, Hema Chand Nalluri, Vikranth Vemulapalli, Supratim Pal, Boris Kuznetsov, John A. Wiegert
  • Publication number: 20250199858
    Abstract: Provision of multiple register allocation sizes for threads is described. An example of a system includes one or more processors including a graphics processor, the graphics processor including at least a first local thread dispatcher (TDL) and multiple processing resources, each processing resource including a plurality of registers; and memory for storage of data for processing, wherein the one or more processors are to determine a register size for a first thread; identify one or more processing resources having sufficient register space for the first thread; select a processing resource of the one or more processing resources having sufficient register space to assign the first thread; select an available thread slot of the selected processing resource for the first thread; and allocate registers of the selected processing resource for the first thread.
    Type: Application
    Filed: November 25, 2024
    Publication date: June 19, 2025
    Applicant: Intel Corporation
    Inventors: Chandra Gurram, Wei-Yu Chen, Vikranth Vemulapalli, Subramaniam Maiyuran, Jorge Eduardo Parra Osorio, Shuai Mu, Guei-Yuan Lueh, Supratim Pal
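    A small illustrative model of the dispatch flow in this abstract: find processing resources with sufficient free register space for the thread's register size, pick one, claim a thread slot, and allocate the registers. The selection policy (first qualifying resource) is an assumption.
    ```python
    # Hypothetical model of dispatching a thread with a variable register size.
    class ProcessingResource:
        def __init__(self, name, total_regs, thread_slots):
            self.name = name
            self.free_regs = total_regs
            self.free_slots = list(range(thread_slots))

        def can_host(self, reg_size):
            return self.free_regs >= reg_size and bool(self.free_slots)

        def allocate(self, reg_size):
            slot = self.free_slots.pop(0)
            self.free_regs -= reg_size
            return slot

    def dispatch_thread(resources, reg_size):
        candidates = [r for r in resources if r.can_host(reg_size)]
        if not candidates:
            return None  # no resource has sufficient register space
        chosen = candidates[0]  # selection policy is an assumption
        return chosen.name, chosen.allocate(reg_size)

    pool = [ProcessingResource("EU0", 128, 8), ProcessingResource("EU1", 256, 8)]
    print(dispatch_thread(pool, reg_size=192))  # only EU1 has enough registers
    ```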
  • Patent number: 12333306
    Abstract: A graphics processing apparatus includes a graphics processor and a constant cache. The graphics processor has a number of execution instances that will generate requests for constant data from the constant cache. The constant cache stores constants of multiple constant types. The constant cache has a single level of hierarchy to store the constant data. The constant cache has a banking structure based on the number of execution instances, where the execution instances generate requests for the constant data with unified messaging that is the same for the different types of constant data.
    Type: Grant
    Filed: March 26, 2021
    Date of Patent: June 17, 2025
    Assignee: Intel Corporation
    Inventors: Subramaniam Maiyuran, Sudarshanram Shetty, Travis Schluessler, Guei-Yuan Lueh, PingHang Cheung, Srividya Karumuri, Chandra S. Gurram, Shuai Mu, Vikranth Vemulapalli
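    A rough sketch of a banked constant cache whose bank count matches the number of execution instances and whose requests use one unified message form; the address-to-bank mapping shown is an assumption rather than the patented scheme.
    ```python
    # Illustrative banked constant cache model with unified request messaging.
    NUM_EXECUTION_INSTANCES = 16
    CACHE_LINE_BYTES = 64

    def bank_for_request(address):
        # Assumed mapping: hash the cache line index across the banks.
        line = address // CACHE_LINE_BYTES
        return line % NUM_EXECUTION_INSTANCES

    def constant_request(address, constant_type):
        # Unified messaging: every constant type uses the same request form.
        return {"bank": bank_for_request(address), "type": constant_type}

    print(constant_request(0x1040, "push_constant"))
    print(constant_request(0x20C0, "buffer_constant"))
    ```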
  • Patent number: 12293431
    Abstract: Embodiments described herein include software, firmware, and hardware logic that provides techniques to perform arithmetic on sparse data via a systolic processing unit. Embodiments described herein provide techniques to detect zero value elements within a vector or a set of packed data elements output by a processing resource and generate metadata to indicate a location of the zero value elements within the plurality of data elements.
    Type: Grant
    Filed: May 2, 2023
    Date of Patent: May 6, 2025
    Assignee: Intel Corporation
    Inventors: Joydeep Ray, Scott Janus, Varghese George, Subramaniam Maiyuran, Altug Koker, Abhishek Appu, Prasoonkumar Surti, Vasanth Ranganathan, Valentin Andrei, Ashutosh Garg, Yoav Harel, Arthur Hunter, Jr., SungYe Kim, Mike Macpherson, Elmoustapha Ould-Ahmed-Vall, William Sadler, Lakshminarayanan Striramassarma, Vikranth Vemulapalli
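    The zero-detection step described above can be illustrated with a short sketch that scans packed elements and emits a bitmask marking zero-valued lanes; the metadata format is an assumption.
    ```python
    # Scan a set of packed data elements and record which lanes are zero,
    # so downstream systolic arithmetic could skip them.
    def zero_metadata(elements):
        mask = 0
        for i, value in enumerate(elements):
            if value == 0:
                mask |= 1 << i  # bit i set means element i is zero
        return mask

    data = [3, 0, 0, 7, 0, 1, 0, 0]
    print(bin(zero_metadata(data)))  # 0b11010110
    ```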
  • Publication number: 20250103548
    Abstract: Systems and methods for improving cache efficiency and utilization are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations and a cache controller of a cache coupled to the processing resources. The cache controller is configured to control cache priority by determining whether default settings or an instruction will control cache operations for the cache.
    Type: Application
    Filed: November 14, 2024
    Publication date: March 27, 2025
    Applicant: Intel Corporation
    Inventors: Altug Koker, Joydeep Ray, Ben Ashbaugh, Jonathan Pearce, Abhishek Appu, Vasanth Ranganathan, Lakshminarayanan Striramassarma, Elmoustapha Ould-Ahmed-Vall, Aravindh Anantaraman, Valentin Andrei, Nicolas Galoppo Von Borries, Varghese George, Yoav Harel, Arthur Hunter, Jr., Brent Insko, Scott Janus, Pattabhiraman K, Mike Macpherson, Subramaniam Maiyuran, Marian Alin Petre, Murali Ramadoss, Shailesh Shah, Kamal Sinha, Prasoonkumar Surti, Vikranth Vemulapalli
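    A minimal sketch of the controller decision summarized in this abstract: use a per-instruction cache-control hint when present, otherwise fall back to the default settings. The hint fields are illustrative assumptions.
    ```python
    # Cache controller chooses between default settings and an
    # instruction-provided cache-control hint.
    DEFAULT_POLICY = {"priority": "normal", "allocate": True}

    def resolve_cache_policy(instruction_hint=None):
        if instruction_hint is not None:
            return instruction_hint  # the instruction controls cache operations
        return DEFAULT_POLICY        # default settings control cache operations

    print(resolve_cache_policy())
    print(resolve_cache_policy({"priority": "high", "allocate": False}))
    ```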
  • Publication number: 20250104179
    Abstract: A disaggregated processor package can be configured to accept interchangeable chiplets. Interchangeability is enabled by specifying a standard physical interconnect for chiplets that can enable the chiplet to interface with a fabric or bridge interconnect. Chiplets from different IP designers can conform to the common interconnect, enabling such chiplets to be interchangeable during assembly. The fabric and bridge interconnect logic on the chiplet can then be configured to conform to the actual interconnect layout of the on-board logic of the chiplet. Additionally, data from chiplets can be transmitted across an inter-chiplet fabric using encapsulation, such that the actual data being transferred is opaque to the fabric, further enabling interchangeability of the individual chiplets. With such an interchangeable design, cache or DRAM memory can be inserted into memory chiplet slots, while compute or graphics chiplets with a higher or lower core count can be inserted into logic chiplet slots.
    Type: Application
    Filed: October 3, 2024
    Publication date: March 27, 2025
    Applicant: Intel Corporation
    Inventors: Altug Koker, Lance Cheney, Eric Finley, Varghese George, Sanjeev Jahagirdar, Josh Mastronarde, Naveen Matam, Iqbal Rajwani, Lakshminarayanan Striramassarma, Melaku Teshome, Vikranth Vemulapalli, Binoj Xavier
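    A brief sketch of the encapsulation idea above, assuming hypothetical packet fields: the inter-chiplet fabric routes on the header alone and never interprets the payload, which is what keeps chiplets interchangeable.
    ```python
    from dataclasses import dataclass

    @dataclass
    class FabricPacket:
        source_slot: int
        dest_slot: int
        payload: bytes  # opaque to the fabric; only endpoint chiplets interpret it

    def route(packet, fabric_links):
        # The fabric routes purely on the header and never inspects the payload.
        return fabric_links[packet.dest_slot]

    links = {0: "memory_chiplet", 1: "compute_chiplet"}
    pkt = FabricPacket(source_slot=1, dest_slot=0, payload=b"\x01\x02\x03")
    print(route(pkt, links))
    ```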
  • Publication number: 20250061535
    Abstract: Embodiments described herein provide techniques to disaggregate an architecture of a system on a chip integrated circuit into multiple distinct chiplets that can be packaged onto a common chassis. In one embodiment, a graphics processing unit or parallel processor is composed from diverse silicon chiplets that are separately manufactured. A chiplet is an at least partially and distinctly packaged integrated circuit that includes distinct units of logic that can be assembled with other chiplets into a larger package. A diverse set of chiplets with different IP core logic can be assembled into a single device.
    Type: Application
    Filed: August 30, 2024
    Publication date: February 20, 2025
    Applicant: Intel Corporation
    Inventors: Naveen Matam, Lance Cheney, Eric Finley, Varghese George, Sanjeev Jahagirdar, Altug Koker, Josh Mastronarde, Iqbal Rajwani, Lakshminarayanan Striramassarma, Melaku Teshome, Vikranth Vemulapalli, Binoj Xavier
  • Patent number: 12210477
    Abstract: Systems and methods for improving cache efficiency and utilization are disclosed. In one embodiment, a graphics processor includes processing resources to perform graphics operations and a cache controller of a cache coupled to the processing resources. The cache controller is configured to control cache priority by determining whether default settings or an instruction will control cache operations for the cache.
    Type: Grant
    Filed: March 14, 2020
    Date of Patent: January 28, 2025
    Assignee: Intel Corporation
    Inventors: Altug Koker, Joydeep Ray, Ben Ashbaugh, Jonathan Pearce, Abhishek Appu, Vasanth Ranganathan, Lakshminarayanan Striramassarma, Elmoustapha Ould-Ahmed-Vall, Aravindh Anantaraman, Valentin Andrei, Nicolas Galoppo Von Borries, Varghese George, Yoav Harel, Arthur Hunter, Jr., Brent Insko, Scott Janus, Pattabhiraman K, Mike Macpherson, Subramaniam Maiyuran, Marian Alin Petre, Murali Ramadoss, Shailesh Shah, Kamal Sinha, Prasoonkumar Surti, Vikranth Vemulapalli
  • Patent number: 12210905
    Abstract: Provision of multiple register allocation sizes for threads is described. An example of a system includes one or more processors including a graphics processor, the graphics processor including at least a first local thread dispatcher (TDL) and multiple processing resources, each processing resource including a plurality of registers; and memory for storage of data for processing, wherein the one or more processors are to determine a register size for a first thread; identify one or more processing resources having sufficient register space for the first thread; select a processing resource of the one or more processing resources having sufficient register space to assign the first thread; select an available thread slot of the selected processing resource for the first thread; and allocate registers of the selected processing resource for the first thread.
    Type: Grant
    Filed: June 25, 2021
    Date of Patent: January 28, 2025
    Assignee: Intel Corporation
    Inventors: Chandra Gurram, Wei-Yu Chen, Vikranth Vemulapalli, Subramaniam Maiyuran, Jorge Eduardo Parra Osorio, Shuai Mu, Guei-Yuan Lueh, Supratim Pal
  • Patent number: 12141890
    Abstract: A disaggregated processor package can be configured to accept interchangeable chiplets. Interchangeability is enabled by specifying a standard physical interconnect for chiplets that can enable the chiplet to interface with a fabric or bridge interconnect. Chiplets from different IP designers can conform to the common interconnect, enabling such chiplets to be interchangeable during assembly. The fabric and bridge interconnect logic on the chiplet can then be configured to conform to the actual interconnect layout of the on-board logic of the chiplet. Additionally, data from chiplets can be transmitted across an inter-chiplet fabric using encapsulation, such that the actual data being transferred is opaque to the fabric, further enabling interchangeability of the individual chiplets. With such an interchangeable design, cache or DRAM memory can be inserted into memory chiplet slots, while compute or graphics chiplets with a higher or lower core count can be inserted into logic chiplet slots.
    Type: Grant
    Filed: March 2, 2022
    Date of Patent: November 12, 2024
    Assignee: Intel Corporation
    Inventors: Altug Koker, Lance Cheney, Eric Finley, Varghese George, Sanjeev Jahagirdar, Josh Mastronarde, Naveen Matam, Iqbal Rajwani, Lakshminarayanan Striramassarma, Melaku Teshome, Vikranth Vemulapalli, Binoj Xavier
  • Patent number: 12117962
    Abstract: Methods and apparatus relating to scalar core integration in a graphics processor. In an example, an apparatus comprises a processor to receive a set of workload instructions for a graphics workload from a host complex, determine a first subset of operations in the set of workload instructions that is suitable for execution by a scalar processor complex of the graphics processing device and a second subset of operations that is suitable for execution by a vector processor complex of the graphics processing device, assign the first subset of operations to the scalar processor complex for execution to generate a first set of outputs, and assign the second subset of operations to the vector processor complex for execution to generate a second set of outputs. Other embodiments are also disclosed and claimed.
    Type: Grant
    Filed: August 16, 2023
    Date of Patent: October 15, 2024
    Assignee: Intel Corporation
    Inventors: Joydeep Ray, Aravindh Anantaraman, Abhishek R. Appu, Altug Koker, Elmoustapha Ould-Ahmed-Vall, Valentin Andrei, Subramaniam Maiyuran, Nicolas Galoppo Von Borries, Varghese George, Mike Macpherson, Ben Ashbaugh, Murali Ramadoss, Vikranth Vemulapalli, William Sadler, Jonathan Pearce, Sungye Kim
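    An illustrative partition of a workload into scalar-suitable and vector-suitable subsets, as described above; the classification rule (operand width) is an assumption made for the sketch.
    ```python
    # Split incoming operations between the scalar and vector processor complexes.
    def partition_workload(operations):
        scalar_ops, vector_ops = [], []
        for op in operations:
            (scalar_ops if op["width"] == 1 else vector_ops).append(op)
        return scalar_ops, vector_ops

    workload = [
        {"name": "loop_counter_update", "width": 1},
        {"name": "vec_fma", "width": 16},
        {"name": "branch_condition", "width": 1},
    ]
    scalar, vector = partition_workload(workload)
    print([op["name"] for op in scalar], [op["name"] for op in vector])
    ```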
  • Patent number: 12112398
    Abstract: Embodiments described herein provide techniques to disaggregate an architecture of a system on a chip integrated circuit into multiple distinct chiplets that can be packaged onto a common chassis. In one embodiment, a graphics processing unit or parallel processor is composed from diverse silicon chiplets that are separately manufactured. A chiplet is an at least partially and distinctly packaged integrated circuit that includes distinct units of logic that can be assembled with other chiplets into a larger package. A diverse set of chiplets with different IP core logic can be assembled into a single device.
    Type: Grant
    Filed: September 20, 2023
    Date of Patent: October 8, 2024
    Assignee: Intel Corporation
    Inventors: Naveen Matam, Lance Cheney, Eric Finley, Varghese George, Sanjeev Jahagirdar, Altug Koker, Josh Mastronarde, Iqbal Rajwani, Lakshminarayanan Striramassarma, Melaku Teshome, Vikranth Vemulapalli, Binoj Xavier
  • Patent number: 12056789
    Abstract: Embodiments described herein provide techniques to disaggregate an architecture of a system on a chip integrated circuit into multiple distinct chiplets that can be packaged onto a common chassis. In one embodiment, a graphics processing unit or parallel processor is composed from diverse silicon chiplets that are separately manufactured. A chiplet is an at least partially and distinctly packaged integrated circuit that includes distinct units of logic that can be assembled with other chiplets into a larger package. A diverse set of chiplets with different IP core logic can be assembled into a single device.
    Type: Grant
    Filed: August 24, 2023
    Date of Patent: August 6, 2024
    Assignee: Intel Corporation
    Inventors: Naveen Matam, Lance Cheney, Eric Finley, Varghese George, Sanjeev Jahagirdar, Altug Koker, Josh Mastronarde, Iqbal Rajwani, Lakshminarayanan Striramassarma, Melaku Teshome, Vikranth Vemulapalli, Binoj Xavier
  • Publication number: 20240256456
    Abstract: Embodiments are generally directed to data prefetching for graphics data processing. An embodiment of an apparatus includes one or more processors including one or more graphics processing units (GPUs); and a plurality of caches to provide storage for the one or more GPUs, the plurality of caches including at least an L1 cache and an L3 cache, wherein the apparatus to provide intelligent prefetching of data by a prefetcher of a first GPU of the one or more GPUs including measuring a hit rate for the L1 cache; upon determining that the hit rate for the L1 cache is equal to or greater than a threshold value, limiting a prefetch of data to storage in the L3 cache, and upon determining that the hit rate for the L1 cache is less than a threshold value, allowing the prefetch of data to the L1 cache.
    Type: Application
    Filed: December 20, 2023
    Publication date: August 1, 2024
    Applicant: Intel Corporation
    Inventors: Vikranth Vemulapalli, Lakshminarayanan Striramassarma, Mike MacPherson, Aravindh Anantaraman, Ben Ashbaugh, Murali Ramadoss, William B. Sadler, Jonathan Pearce, Scott Janus, Brent Insko, Vasanth Ranganathan, Kamal Sinha, Arthur Hunter, Jr., Prasoonkumar Surti, Nicolas Galoppo von Borries, Joydeep Ray, Abhishek R. Appu, ElMoustapha Ould-Ahmed-Vall, Altug Koker, Sungye Kim, Subramaniam Maiyuran, Valentin Andrei
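    The prefetch policy stated in this abstract maps directly to a small sketch: when the L1 hit rate meets the threshold, prefetched data is limited to the L3 cache, otherwise prefetching into L1 is allowed. The threshold value is an assumption.
    ```python
    # Decide where prefetched graphics data may be stored based on L1 hit rate.
    L1_HIT_RATE_THRESHOLD = 0.90  # assumed value for illustration

    def prefetch_target(l1_hit_rate):
        if l1_hit_rate >= L1_HIT_RATE_THRESHOLD:
            return "L3"   # L1 is already effective; avoid polluting it
        return "L1"       # L1 is missing often; prefetch directly into it

    print(prefetch_target(0.95))  # -> L3
    print(prefetch_target(0.60))  # -> L1
    ```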
  • Patent number: 12045658
    Abstract: Apparatus and method for stack access throttling for synchronous ray tracing. For example, one embodiment of an apparatus comprises: ray tracing acceleration hardware to manage active ray tracing stack allocations to ensure that a size of the active ray tracing stack allocations remains within a threshold; and an execution unit to execute a thread to explicitly request a new ray tracing stack allocation from the ray tracing acceleration hardware, the ray tracing acceleration hardware to permit the new ray tracing stack allocation if the size of the active ray tracing stack allocations will remain within the threshold after permitting the new ray tracing stack allocation.
    Type: Grant
    Filed: January 31, 2022
    Date of Patent: July 23, 2024
    Assignee: Intel Corporation
    Inventors: Pawel Majewski, Prasoonkumar Surti, Karthik Vaidyanathan, Joshua Barczak, Vasanth Ranganathan, Vikranth Vemulapalli
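    A hedged sketch of the throttling rule above: a new ray tracing stack allocation is permitted only if the total size of active allocations stays within the threshold. Sizes and the threshold are illustrative.
    ```python
    # Ray tracing stack allocator that keeps active allocations within a threshold.
    class RayStackAllocator:
        def __init__(self, threshold_bytes):
            self.threshold = threshold_bytes
            self.active_bytes = 0

        def request_stack(self, size_bytes):
            if self.active_bytes + size_bytes <= self.threshold:
                self.active_bytes += size_bytes
                return True   # allocation permitted
            return False      # thread must wait or retry later

    alloc = RayStackAllocator(threshold_bytes=64 * 1024)
    print(alloc.request_stack(48 * 1024))  # True
    print(alloc.request_stack(32 * 1024))  # False: would exceed the threshold
    ```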
  • Publication number: 20240168764
    Abstract: An apparatus to facilitate supporting and load balancing multiple double precision pipelines in a graphics environment is disclosed. The apparatus includes a processing core having at least one processing resource comprising: a first double precision (DP) pipeline to support double float operations, the first DP pipeline comprising a first set of floating point units (FPUs) configured in a pipelined configuration to enable new instructions to be issued to the first DP pipeline before previous instructions are complete; and a second DP pipeline to support the double float operations, wherein the second DP pipeline comprises a second set of FPUs configured in a pipelined configuration to enable new instructions to be issued to the second DP pipeline before previous instructions are complete.
    Type: Application
    Filed: November 18, 2022
    Publication date: May 23, 2024
    Applicant: Intel Corporation
    Inventors: Supratim Pal, Jiasheng Chen, Vikranth Vemulapalli, Subramaniam Maiyuran
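    A hypothetical load-balancing sketch for the two double-precision pipelines described above, issuing each new double-float instruction to the pipeline with fewer in-flight operations; this policy is an assumption, since the abstract does not specify one.
    ```python
    # Two pipelined DP units: new instructions may issue before prior ones finish.
    class DPPipeline:
        def __init__(self, name):
            self.name = name
            self.in_flight = 0

        def issue(self, instruction):
            self.in_flight += 1  # pipelined: issue without waiting for completion
            return f"{instruction} -> {self.name}"

    def issue_double(instruction, pipe0, pipe1):
        target = pipe0 if pipe0.in_flight <= pipe1.in_flight else pipe1
        return target.issue(instruction)

    p0, p1 = DPPipeline("DP0"), DPPipeline("DP1")
    for instr in ["dfma r1", "dmul r2", "dadd r3"]:
        print(issue_double(instr, p0, p1))
    ```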
  • Publication number: 20240160478
    Abstract: An apparatus to facilitate increasing processing resources in processing cores of a graphics environment is disclosed. The apparatus includes a plurality of processing resources to execute one or more execution threads; a plurality of message arbiter-processing resource (MA-PR) routers, wherein a respective MA-PR router of the plurality of MA-PR routers corresponds to a pair of processing resources of the plurality of processing resources and is to arbitrate routing of a thread control message from a message arbiter between the pair of processing resources; a plurality of local shared cache (LSC) sequencers to provide an interface between at least one LSC of the processing core and the plurality of processing resources; and a plurality of instruction caches (ICs) to store instructions of the one or more execution threads, wherein a respective IC of the plurality of ICs interfaces with a portion of the plurality of processing resources.
    Type: Application
    Filed: November 15, 2022
    Publication date: May 16, 2024
    Applicant: Intel Corporation
    Inventors: Jiasheng Chen, Chunhui Mei, Ben J. Ashbaugh, Naveen Matam, Joydeep Ray, Timothy Bauer, Guei-Yuan Lueh, Vasanth Ranganathan, Prashant Chaudhari, Vikranth Vemulapalli, Nishanth Reddy Pendluru, Piotr Reiter, Jain Philip, Marek Rudniewski, Christopher Spencer, Parth Damani, Prathamesh Raghunath Shinde, John Wiegert, Fataneh Ghodrat
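    A minimal sketch of one MA-PR router from this abstract: it accepts a thread control message from the message arbiter and routes it to one of its pair of processing resources; the routing rule (an explicit target in the message) is an assumption.
    ```python
    # Router between the message arbiter and a pair of processing resources.
    class MAPRRouter:
        def __init__(self, resource_a, resource_b):
            self.pair = {resource_a: [], resource_b: []}

        def route(self, message):
            target = message["target"]
            if target not in self.pair:
                raise ValueError("message targets a resource outside this pair")
            self.pair[target].append(message)
            return target

    router = MAPRRouter("PR0", "PR1")
    print(router.route({"target": "PR1", "op": "thread_control"}))
    ```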
  • Patent number: 11892950
    Abstract: Embodiments are generally directed to data prefetching for graphics data processing. An embodiment of an apparatus includes one or more processors including one or more graphics processing units (GPUs); and a plurality of caches to provide storage for the one or more GPUs, the plurality of caches including at least an L1 cache and an L3 cache, wherein the apparatus to provide intelligent prefetching of data by a prefetcher of a first GPU of the one or more GPUs including measuring a hit rate for the L1 cache; upon determining that the hit rate for the L1 cache is equal to or greater than a threshold value, limiting a prefetch of data to storage in the L3 cache, and upon determining that the hit rate for the L1 cache is less than a threshold value, allowing the prefetch of data to the L1 cache.
    Type: Grant
    Filed: July 15, 2022
    Date of Patent: February 6, 2024
    Assignee: Intel Corporation
    Inventors: Vikranth Vemulapalli, Lakshminarayanan Striramassarma, Mike MacPherson, Aravindh Anantaraman, Ben Ashbaugh, Murali Ramadoss, William B. Sadler, Jonathan Pearce, Scott Janus, Brent Insko, Vasanth Ranganathan, Kamal Sinha, Arthur Hunter, Jr., Prasoonkumar Surti, Nicolas Galoppo von Borries, Joydeep Ray, Abhishek R. Appu, ElMoustapha Ould-Ahmed-Vall, Altug Koker, Sungye Kim, Subramaniam Maiyuran, Valentin Andrei