Patents by Inventor Omkar PARANJAPE

Omkar PARANJAPE has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11907717
    Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.
    Type: Grant
    Filed: February 8, 2023
    Date of Patent: February 20, 2024
    Assignee: NVIDIA Corporation
    Inventors: Andrew Kerr, Jack Choquette, Xiaogang Qiu, Omkar Paranjape, Poornachandra Rao, Shirish Gadre, Steven J. Heinrich, Manan Patel, Olivier Giroux, Alan Kaatz
  • Patent number: 11768686
    Abstract: In a streaming cache, multiple, dynamically sized tracking queues are employed. Request tracking information is distributed among the plural tracking queues to selectively enable out-of-order memory request returns. A dynamically controlled policy assigns pending requests to tracking queues, providing for example in-order memory returns in some contexts and/or for some traffic and out of order memory returns in other contexts and/or for other traffic.
    Type: Grant
    Filed: July 27, 2020
    Date of Patent: September 26, 2023
    Assignee: NVIDIA Corporation
    Inventors: Michael A Fetterman, Mark Gebhart, Shirish Gadre, Mitchell Hayenga, Steven Heinrich, Ramesh Jandhyala, Raghavan Madhavan, Omkar Paranjape, James Robertson, Jeff Schottmiller
  • Publication number: 20230185570
    Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.
    Type: Application
    Filed: February 8, 2023
    Publication date: June 15, 2023
    Inventors: Andrew KERR, Jack Choquette, Xiaogang Qiu, Omkar Paranjape, Poornachandra Rao, Shirish Gadre, Steven J. Heinrich, Manan Patel, Olivier Giroux, Alan Kaatz
  • Patent number: 11604649
    Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.
    Type: Grant
    Filed: June 30, 2021
    Date of Patent: March 14, 2023
    Assignee: NVIDIA Corporation
    Inventors: Andrew Kerr, Jack Choquette, Xiaogang Qiu, Omkar Paranjape, Poornachandra Rao, Shirish Gadre, Steven J. Heinrich, Manan Patel, Olivier Giroux, Alan Kaatz
  • Patent number: 11379944
    Abstract: A texture processing pipeline in a graphics processing unit generates the surface appearance for objects in a computer-generated scene. This texture processing pipeline determines, at multiple stages within the texture processing pipeline, whether texture operations and texture loads may be processed at an accelerated rate. At each stage that includes a decision point, the texture processing pipeline assumes that the current texture operation or texture load can be accelerated unless specific, known information indicates that the texture operation or texture load cannot be accelerated. As a result, the texture processing pipeline increases the number of texture operations and texture loads that are accelerated relative to the number of texture operations and texture loads that are not accelerated.
    Type: Grant
    Filed: June 23, 2020
    Date of Patent: July 5, 2022
    Assignee: NVIDIA CORPORATION
    Inventors: Michael Fetterman, Shirish Gadre, Mark Gebhart, Steven J. Heinrich, Ramesh Jandhyala, William Newhall, Omkar Paranjape, Stefano Pescador, Poorna Rao
  • Patent number: 11367160
    Abstract: A parallel processing unit (e.g., a GPU), in some examples, includes a hardware scheduler and hardware arbiter that launch graphics and compute work for simultaneous execution on a SIMD/SIMT processing unit. Each processing unit (e.g., a streaming multiprocessor) of the parallel processing unit operates in a graphics-greedy mode or a compute-greedy mode at respective times. The hardware arbiter, in response to a result of a comparison of at least one monitored performance or utilization metric to a user-configured threshold, can selectively cause the processing unit to run one or more compute work items from a compute queue when the processing unit is operating in the graphics-greedy mode, and cause the processing unit to run one or more graphics work items from a graphics queue when the processing unit is operating in the compute-greedy mode. Associated methods and systems are also described.
    Type: Grant
    Filed: August 2, 2018
    Date of Patent: June 21, 2022
    Assignee: NVIDIA CORPORATION
    Inventors: Rajballav Dash, Gregory Palmer, Gentaro Hirota, Lacky Shah, Jack Choquette, Emmett Kilgariff, Sriharsha Niverty, Milton Lei, Shirish Gadre, Omkar Paranjape, Lei Yang, Rouslan Dimitrov
  • Patent number: 11347668
    Abstract: A unified cache subsystem includes a data memory configured as both a shared memory and a local cache memory. The unified cache subsystem processes different types of memory transactions using different data pathways. To process memory transactions that target shared memory, the unified cache subsystem includes a direct pathway to the data memory. To process memory transactions that do not target shared memory, the unified cache subsystem includes a tag processing pipeline configured to identify cache hits and cache misses. When the tag processing pipeline identifies a cache hit for a given memory transaction, the transaction is rerouted to the direct pathway to data memory. When the tag processing pipeline identifies a cache miss for a given memory transaction, the transaction is pushed into a first-in first-out (FIFO) until miss data is returned from external memory. The tag processing pipeline is also configured to process texture-oriented memory transactions.
    Type: Grant
    Filed: July 6, 2020
    Date of Patent: May 31, 2022
    Assignee: NVIDIA Corporation
    Inventors: Xiaogang Qiu, Ronny Krashinsky, Steven Heinrich, Shirish Gadre, John Edmondson, Jack Choquette, Mark Gebhart, Ramesh Jandhyala, Poornachandra Rao, Omkar Paranjape, Michael Siu
  • Publication number: 20220027160
    Abstract: In a streaming cache, multiple, dynamically sized tracking queues are employed. Request tracking information is distributed among the plural tracking queues to selectively enable out-of-order memory request returns. A dynamically controlled policy assigns pending requests to tracking queues, providing for example in-order memory returns in some contexts and/or for some traffic and out of order memory returns in other contexts and/or for other traffic.
    Type: Application
    Filed: July 27, 2020
    Publication date: January 27, 2022
    Inventors: Michael A. FETTERMAN, Mark GEBHART, Shirish GADRE, Mitchell HAYENGA, Steven HEINRICH, Ramesh JANDHYALA, Raghavan MADHAVAN, Omkar PARANJAPE, James ROBERTSON, Jeff SCHOTTMILLER
  • Publication number: 20210398241
    Abstract: A texture processing pipeline in a graphics processing unit generates the surface appearance for objects in a computer-generated scene. This texture processing pipeline determines, at multiple stages within the texture processing pipeline, whether texture operations and texture loads may be processed at an accelerated rate. At each stage that includes a decision point, the texture processing pipeline assumes that the current texture operation or texture load can be accelerated unless specific, known information indicates that the texture operation or texture load cannot be accelerated. As a result, the texture processing pipeline increases the number of texture operations and texture loads that are accelerated relative to the number of texture operations and texture loads that are not accelerated.
    Type: Application
    Filed: June 23, 2020
    Publication date: December 23, 2021
    Inventors: Michael FETTERMAN, Shirish GADRE, Mark GEBHART, Steven J. HEINRICH, Ramesh JANDHYALA, William NEWHALL, Omkar PARANJAPE, Stefano PESCADOR, Poorna RAO
  • Publication number: 20210326137
    Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.
    Type: Application
    Filed: June 30, 2021
    Publication date: October 21, 2021
    Inventors: Andrew KERR, Jack CHOQUETTE, Xiaogang QIU, Omkar PARANJAPE, Poornachandra RAO, Shirish GADRE, Steven J. HEINRICH, Manan PATEL, Olivier GIROUX, Alan KAATZ
  • Patent number: 11080051
    Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.
    Type: Grant
    Filed: December 12, 2019
    Date of Patent: August 3, 2021
    Assignee: NVIDIA Corporation
    Inventors: Andrew Kerr, Jack Choquette, Xiaogang Qiu, Omkar Paranjape, Poornachandra Rao, Shirish Gadre, Steven J. Heinrich, Manan Patel, Olivier Giroux, Alan Kaatz
  • Publication number: 20210124582
    Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.
    Type: Application
    Filed: December 12, 2019
    Publication date: April 29, 2021
    Inventors: Andrew Kerr, Jack Choquette, Xiaogang Qiu, Omkar Paranjape, Poornachandra Rao, Shirish Gadre, Steven J. Heinrich, Manan Patel, Olivier Giroux, Alan Kaatz
  • Publication number: 20200401541
    Abstract: A unified cache subsystem includes a data memory configured as both a shared memory and a local cache memory. The unified cache subsystem processes different types of memory transactions using different data pathways. To process memory transactions that target shared memory, the unified cache subsystem includes a direct pathway to the data memory. To process memory transactions that do not target shared memory, the unified cache subsystem includes a tag processing pipeline configured to identify cache hits and cache misses. When the tag processing pipeline identifies a cache hit for a given memory transaction, the transaction is rerouted to the direct pathway to data memory. When the tag processing pipeline identifies a cache miss for a given memory transaction, the transaction is pushed into a first-in first-out (FIFO) until miss data is returned from external memory. The tag processing pipeline is also configured to process texture-oriented memory transactions.
    Type: Application
    Filed: July 6, 2020
    Publication date: December 24, 2020
    Inventors: Xiaogang QIU, Ronny KRASHINSKY, Steven HEINRICH, Shirish GADRE, John EDMONDSON, Jack CHOQUETTE, Mark GEBHART, Ramesh JANDHYALA, Poornachandra RAO, Omkar PARANJAPE, Michael SIU
  • Patent number: 10705994
    Abstract: A unified cache subsystem includes a data memory configured as both a shared memory and a local cache memory. The unified cache subsystem processes different types of memory transactions using different data pathways. To process memory transactions that target shared memory, the unified cache subsystem includes a direct pathway to the data memory. To process memory transactions that do not target shared memory, the unified cache subsystem includes a tag processing pipeline configured to identify cache hits and cache misses. When the tag processing pipeline identifies a cache hit for a given memory transaction, the transaction is rerouted to the direct pathway to data memory. When the tag processing pipeline identifies a cache miss for a given memory transaction, the transaction is pushed into a first-in first-out (FIFO) until miss data is returned from external memory. The tag processing pipeline is also configured to process texture-oriented memory transactions.
    Type: Grant
    Filed: May 4, 2017
    Date of Patent: July 7, 2020
    Assignee: NVIDIA Corporation
    Inventors: Xiaogang Qiu, Ronny Krashinsky, Steven Heinrich, Shirish Gadre, John Edmondson, Jack Choquette, Mark Gebhart, Ramesh Jandhyala, Poornachandra Rao, Omkar Paranjape, Michael Siu
  • Publication number: 20200043123
    Abstract: A parallel processing unit (e.g., a GPU), in some examples, includes a hardware scheduler and hardware arbiter that launch graphics and compute work for simultaneous execution on a SIMD/SIMT processing unit. Each processing unit (e.g., a streaming multiprocessor) of the parallel processing unit operates in a graphics-greedy mode or a compute-greedy mode at respective times. The hardware arbiter, in response to a result of a comparison of at least one monitored performance or utilization metric to a user-configured threshold, can selectively cause the processing unit to run one or more compute work items from a compute queue when the processing unit is operating in the graphics-greedy mode, and cause the processing unit to run one or more graphics work items from a graphics queue when the processing unit is operating in the compute-greedy mode. Associated methods and systems are also described.
    Type: Application
    Filed: August 2, 2018
    Publication date: February 6, 2020
    Inventors: Rajballav DASH, Gregory PALMER, Gentaro HIROTA, Lacky SHAH, Jack CHOQUETTE, Emmett KILGARIFF, Sriharsha NIVERTY, Milton LEI, Shirish GADRE, Omkar PARANJAPE, Lei YANG, Rouslan DIMITROV
  • Patent number: 10459861
    Abstract: A unified cache subsystem includes a data memory configured as both a shared memory and a local cache memory. The unified cache subsystem processes different types of memory transactions using different data pathways. To process memory transactions that target shared memory, the unified cache subsystem includes a direct pathway to the data memory. To process memory transactions that do not target shared memory, the unified cache subsystem includes a tag processing pipeline configured to identify cache hits and cache misses. When the tag processing pipeline identifies a cache hit for a given memory transaction, the transaction is rerouted to the direct pathway to data memory. When the tag processing pipeline identifies a cache miss for a given memory transaction, the transaction is pushed into a first-in first-out (FIFO) until miss data is returned from external memory. The tag processing pipeline is also configured to process texture-oriented memory transactions.
    Type: Grant
    Filed: September 26, 2017
    Date of Patent: October 29, 2019
    Assignee: NVIDIA CORPORATION
    Inventors: Xiaogang Qiu, Ronny Krashinsky, Steven Heinrich, Shirish Gadre, John Edmondson, Jack Choquette, Mark Gebhart, Ramesh Jandhyala, Poornachandra Rao, Omkar Paranjape, Michael Siu
  • Patent number: 10152329
    Abstract: One embodiment of the present disclosure sets forth an optimized way to execute pre-scheduled replay operations for divergent operations in a parallel processing subsystem. Specifically, a streaming multiprocessor (SM) includes a multi-stage pipeline configured to insert pre-scheduled replay operations into a multi-stage pipeline. A pre-scheduled replay unit detects whether the operation associated with the current instruction is accessing a common resource. If the threads are accessing data which are distributed across multiple cache lines, then the pre-scheduled replay unit inserts pre-scheduled replay operations behind the current instruction. The multi-stage pipeline executes the instruction and the associated pre-scheduled replay operations sequentially. If additional threads remain unserviced after execution of the instruction and the pre-scheduled replay operations, then additional replay operations are inserted via the replay loop, until all threads are serviced.
    Type: Grant
    Filed: February 9, 2012
    Date of Patent: December 11, 2018
    Assignee: NVIDIA CORPORATION
    Inventors: Michael Fetterman, Stewart Glenn Carlton, Jack Hilaire Choquette, Shirish Gadre, Olivier Giroux, Douglas J. Hahn, Steven James Heinrich, Eric Lyell Hill, Charles McCarver, Omkar Paranjape, Anjana Rajendran, Rajeshwaran Selvanesan
  • Publication number: 20180322078
    Abstract: A unified cache subsystem includes a data memory configured as both a shared memory and a local cache memory. The unified cache subsystem processes different types of memory transactions using different data pathways. To process memory transactions that target shared memory, the unified cache subsystem includes a direct pathway to the data memory. To process memory transactions that do not target shared memory, the unified cache subsystem includes a tag processing pipeline configured to identify cache hits and cache misses. When the tag processing pipeline identifies a cache hit for a given memory transaction, the transaction is rerouted to the direct pathway to data memory. When the tag processing pipeline identifies a cache miss for a given memory transaction, the transaction is pushed into a first-in first-out (FIFO) until miss data is returned from external memory. The tag processing pipeline is also configured to process texture-oriented memory transactions.
    Type: Application
    Filed: September 26, 2017
    Publication date: November 8, 2018
    Inventors: Xiaogang QIU, Ronny KRASHINSKY, Steven HEINRICH, Shirish GADRE, John EDMONDSON, Jack CHOQUETTE, Mark GEBHART, Ramesh JANDHYALA, Poornachandra RAO, Omkar PARANJAPE, Michael SIU
  • Publication number: 20180322077
    Abstract: A unified cache subsystem includes a data memory configured as both a shared memory and a local cache memory. The unified cache subsystem processes different types of memory transactions using different data pathways. To process memory transactions that target shared memory, the unified cache subsystem includes a direct pathway to the data memory. To process memory transactions that do not target shared memory, the unified cache subsystem includes a tag processing pipeline configured to identify cache hits and cache misses. When the tag processing pipeline identifies a cache hit for a given memory transaction, the transaction is rerouted to the direct pathway to data memory. When the tag processing pipeline identifies a cache miss for a given memory transaction, the transaction is pushed into a first-in first-out (FIFO) until miss data is returned from external memory. The tag processing pipeline is also configured to process texture-oriented memory transactions.
    Type: Application
    Filed: May 4, 2017
    Publication date: November 8, 2018
    Inventors: Xiaogang QIU, Ronny KRASHINSKY, Steven HEINRICH, Shirish GADRE, John EDMONDSON, Jack CHOQUETTE, Mark GEBHART, Ramesh JANDHYALA, Poornachandra RAO, Omkar PARANJAPE, Michael SIU
  • Patent number: 10095548
    Abstract: One embodiment of the present disclosure sets forth an effective way to maintain fairness and order in the scheduling of common resource access requests related to replay operations. Specifically, a streaming multiprocessor (SM) includes a total order queue (TOQ) configured to schedule the access requests over one or more execution cycles. Access requests are allowed to make forward progress when needed common resources have been allocated to the request. Where multiple access requests require the same common resource, priority is given to the older access request. Access requests may be placed in a sleep state pending availability of certain common resources. Deadlock may be avoided by allowing an older access request to steal resources from a younger resource request. One advantage of the disclosed technique is that older common resource access requests are not repeatedly blocked from making forward progress by newer access requests.
    Type: Grant
    Filed: May 21, 2012
    Date of Patent: October 9, 2018
    Assignee: NVIDIA CORPORATION
    Inventors: Michael Fetterman, Shirish Gadre, John H. Edmondson, Omkar Paranjape, Anjana Rajendran, Eric Lyell Hill, Rajeshwaran Selvanesan, Charles McCarver, Kevin Mitchell, Steven James Heinrich