Patents by Inventor Lacky Shah

Lacky Shah has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11954518
Abstract: Apparatuses, systems, and techniques to optimize processor resources at a user-defined level. In at least one embodiment, the priority of one or more tasks is adjusted to prevent one or more other dependent tasks from entering an idle state due to lack of resources to consume.
    Type: Grant
    Filed: December 20, 2019
    Date of Patent: April 9, 2024
Assignee: NVIDIA Corporation
    Inventors: Jonathon Evans, Lacky Shah, Phil Johnson, Jonah Alben, Brian Pharris, Greg Palmer, Brian Fahs
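    As a rough illustration of the idea in this abstract, the following C++ sketch boosts the priority of a producer task when a dependent consumer is about to starve. It is not the patented implementation; the `Task` type and `boost_producers` helper are invented for this sketch.
    ```cpp
    #include <algorithm>
    #include <iostream>
    #include <vector>

    // Illustrative model only: each task has a priority and a list of
    // tasks whose output it consumes (its producers).
    struct Task {
        int id;
        int priority;             // higher runs sooner
        std::vector<int> producers;
        int queued_inputs = 0;    // results currently available to consume
    };

    // If a consumer is about to idle (no queued inputs), raise the
    // priority of its producers so it is fed before it stalls.
    void boost_producers(std::vector<Task>& tasks, const Task& consumer) {
        if (consumer.queued_inputs > 0) return;  // not starving
        for (int p : consumer.producers)
            tasks[p].priority = std::max(tasks[p].priority, consumer.priority + 1);
    }

    int main() {
        std::vector<Task> tasks{{0, 1, {}, 0}, {1, 5, {0}, 0}};  // task 1 consumes task 0
        boost_producers(tasks, tasks[1]);
        std::cout << "producer priority now " << tasks[0].priority << "\n";  // prints 6
    }
    ```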
  • Publication number: 20230289190
    Abstract: This specification describes a programmatic multicast technique enabling one thread (for example, in a cooperative group array (CGA) on a GPU) to request data on behalf of one or more other threads (for example, executing on respective processor cores of the GPU). The multicast is supported by tracking circuitry that interfaces between multicast requests received from processor cores and the available memory. The multicast is designed to reduce cache (for example, layer 2 cache) bandwidth utilization enabling strong scaling and smaller tile sizes.
    Type: Application
    Filed: March 10, 2022
    Publication date: September 14, 2023
    Inventors: Apoorv PARLE, Ronny KRASHINSKY, John EDMONDSON, Jack CHOQUETTE, Shirish GADRE, Steve HEINRICH, Manan PATEL, Prakash Bangalore PRABHAKAR, JR., Ravi MANYAM, Wish GANDHI, Lacky SHAH, Alexander L. Minkin
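    The following C++ sketch models the coalescing idea from this abstract in software: the first requester of a line triggers the one real memory fetch, and the result is multicast to every waiter. The hardware tracking circuitry is not shown; `MulticastTracker` and its methods are invented names.
    ```cpp
    #include <cstdint>
    #include <iostream>
    #include <unordered_map>
    #include <vector>

    // Illustrative software model of request coalescing; not the
    // patented hardware design.
    struct MulticastTracker {
        std::unordered_map<uint64_t, std::vector<int>> waiters;  // line -> requester ids

        // Returns true if this request triggers the single real fetch.
        bool request(uint64_t line, int requester) {
            auto& w = waiters[line];
            w.push_back(requester);
            return w.size() == 1;  // first requester fetches on behalf of all
        }

        // Deliver the fetched data to every waiter for the line.
        void complete(uint64_t line) {
            for (int r : waiters[line])
                std::cout << "line " << line << " -> requester " << r << "\n";
            waiters.erase(line);
        }
    };

    int main() {
        MulticastTracker t;
        for (int r = 0; r < 4; ++r)
            if (t.request(0x80, r))
                std::cout << "requester " << r << " issues the one memory fetch\n";
        t.complete(0x80);  // one fetch, four deliveries: cache bandwidth saved
    }
    ```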
  • Publication number: 20230289189
Abstract: Distributed shared memory (DSMEM) comprises blocks of memory that are distributed or scattered across a processor (such as a GPU). Threads executing on a processing core local to one memory block are able to access a memory block local to a different processing core. In one embodiment, shared access to these DSMEM allocations distributed across a collection of processing cores is implemented by communications between the processing cores. Such distributed shared memory provides very low latency memory access for processing cores located in proximity to the memory blocks, and also provides a way for more distant processing cores to also access the memory blocks in a manner and using interconnects that do not interfere with the processing cores' access to main or global memory such as backed by an L2 cache.
    Type: Application
    Filed: March 10, 2022
    Publication date: September 14, 2023
    Inventors: Prakash BANGALORE PRABHAKAR, Gentaro HIROTA, Ronny KRASHINSKY, Ze LONG, Brian PHARRIS, Rajballav DASH, Jeff TUCKEY, Jerome F. DULUK, JR., Lacky SHAH, Luke DURANT, Jack CHOQUETTE, Eric WERNESS, Naman GOVIL, Manan PATEL, Shayani DEB, Sandeep NAVADA, John EDMONDSON, Greg PALMER, Wish GANDHI, Ravi MANYAM, Apoorv PARLE, Olivier GIROUX, Shirish GADRE, Steve HEINRICH
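    A minimal host-side C++ model of the addressing idea (not the hardware design): each core owns a local block, and any core can reach another core's block by (rank, offset) without touching a global-memory path. All names are invented for this sketch.
    ```cpp
    #include <array>
    #include <cstdint>
    #include <iostream>

    constexpr int kCores = 4;
    constexpr int kBlockWords = 8;

    // Illustrative model only of distributed shared memory.
    struct DistributedSharedMem {
        // One local memory block per processing core.
        std::array<std::array<uint32_t, kBlockWords>, kCores> block{};

        // Any core may address another core's block by (rank, offset),
        // bypassing the main/global memory path.
        uint32_t& at(int rank, int offset) { return block[rank][offset]; }
    };

    int main() {
        DistributedSharedMem dsmem;
        dsmem.at(2, 0) = 42;                  // "core 0" writes into core 2's block
        std::cout << dsmem.at(2, 0) << "\n";  // "core 2" reads its own block: 42
    }
    ```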
  • Publication number: 20230289215
Abstract: A new level of hierarchy—Cooperative Group Arrays (CGAs)—and an associated new hardware-based work distribution/execution model are described. A CGA is a grid of thread blocks (also referred to as cooperative thread arrays (CTAs)). CGAs provide co-scheduling, e.g., control over where CTAs are placed/executed in a processor (such as a GPU), relative to the memory required by an application and relative to each other. Hardware support for such CGAs guarantees concurrency and enables applications to see more data locality, reduced latency, and better synchronization between all the threads in tightly cooperating collections of CTAs programmably distributed across different (e.g., hierarchical) hardware domains or partitions.
    Type: Application
    Filed: March 10, 2022
    Publication date: September 14, 2023
    Inventors: Greg PALMER, Gentaro HIROTA, Ronny KRASHINSKY, Ze LONG, Brian PHARRIS, Rajballav DASH, Jeff TUCKEY, Jerome F. DULUK, JR., Lacky SHAH, Luke DURANT, Jack CHOQUETTE, Eric WERNESS, Naman GOVIL, Manan PATEL, Shayani DEB, Sandeep NAVADA, John EDMONDSON, Prakash BANGALORE PRABHAKAR, Wish GANDHI, Ravi MANYAM, Apoorv PARLE, Olivier GIROUX, Shirish GADRE, Steve HEINRICH
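    The hierarchy can be pictured with a small C++ model (invented names, not the hardware work-distribution model): a CGA is a set of CTAs that is either placed together, with guaranteed concurrency and nearby placement, or deferred as a whole.
    ```cpp
    #include <iostream>
    #include <vector>

    // Illustrative model of the hierarchy only (CGA -> CTA).
    struct CTA { int id; int placed_on = -1; };   // a thread block
    struct CGA { std::vector<CTA> ctas; };        // co-scheduled group of CTAs

    // Co-scheduling sketch: either every CTA of the CGA is placed on
    // adjacent processors of one partition, or the whole CGA waits.
    bool place_cga(CGA& cga, int first_proc, int procs_free) {
        if ((int)cga.ctas.size() > procs_free) return false;  // concurrency guarantee
        for (size_t i = 0; i < cga.ctas.size(); ++i)
            cga.ctas[i].placed_on = first_proc + (int)i;      // nearby: data locality
        return true;
    }

    int main() {
        CGA cga;
        cga.ctas = {{0}, {1}, {2}};
        std::cout << (place_cga(cga, /*first_proc=*/4, /*procs_free=*/3)
                      ? "CGA launched together\n" : "CGA deferred\n");
    }
    ```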
  • Patent number: 11367160
    Abstract: A parallel processing unit (e.g., a GPU), in some examples, includes a hardware scheduler and hardware arbiter that launch graphics and compute work for simultaneous execution on a SIMD/SIMT processing unit. Each processing unit (e.g., a streaming multiprocessor) of the parallel processing unit operates in a graphics-greedy mode or a compute-greedy mode at respective times. The hardware arbiter, in response to a result of a comparison of at least one monitored performance or utilization metric to a user-configured threshold, can selectively cause the processing unit to run one or more compute work items from a compute queue when the processing unit is operating in the graphics-greedy mode, and cause the processing unit to run one or more graphics work items from a graphics queue when the processing unit is operating in the compute-greedy mode. Associated methods and systems are also described.
    Type: Grant
    Filed: August 2, 2018
    Date of Patent: June 21, 2022
Assignee: NVIDIA Corporation
    Inventors: Rajballav Dash, Gregory Palmer, Gentaro Hirota, Lacky Shah, Jack Choquette, Emmett Kilgariff, Sriharsha Niverty, Milton Lei, Shirish Gadre, Omkar Paranjape, Lei Yang, Rouslan Dimitrov
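    The arbitration policy can be sketched in a few lines of C++ (invented names; the real arbiter is hardware): in graphics-greedy mode the unit prefers graphics work unless a monitored metric crosses the user-configured threshold, and symmetrically in compute-greedy mode.
    ```cpp
    #include <iostream>
    #include <string>

    // Illustrative sketch of the arbitration idea only.
    enum class Mode { GraphicsGreedy, ComputeGreedy };

    // Prefer the mode's own queue; when the monitored metric crosses the
    // user-configured threshold, let work from the other queue in.
    std::string pick_work(Mode mode, double metric, double user_threshold) {
        bool crossed = metric > user_threshold;
        if (mode == Mode::GraphicsGreedy)
            return crossed ? "run compute item" : "run graphics item";
        return crossed ? "run graphics item" : "run compute item";
    }

    int main() {
        std::cout << pick_work(Mode::GraphicsGreedy, 0.9, 0.75) << "\n";  // compute let in
        std::cout << pick_work(Mode::GraphicsGreedy, 0.5, 0.75) << "\n";  // graphics runs
    }
    ```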
  • Publication number: 20210191754
Abstract: Apparatuses, systems, and techniques to optimize processor resources at a user-defined level. In at least one embodiment, the priority of one or more tasks is adjusted to prevent one or more other dependent tasks from entering an idle state due to lack of resources to consume.
    Type: Application
    Filed: December 20, 2019
    Publication date: June 24, 2021
    Inventors: Jonathon Evans, Lacky Shah, Phil Johnson, Jonah Alben, Brian Pharris, Greg Palmer, Brian Fahs
  • Publication number: 20200043123
    Abstract: A parallel processing unit (e.g., a GPU), in some examples, includes a hardware scheduler and hardware arbiter that launch graphics and compute work for simultaneous execution on a SIMD/SIMT processing unit. Each processing unit (e.g., a streaming multiprocessor) of the parallel processing unit operates in a graphics-greedy mode or a compute-greedy mode at respective times. The hardware arbiter, in response to a result of a comparison of at least one monitored performance or utilization metric to a user-configured threshold, can selectively cause the processing unit to run one or more compute work items from a compute queue when the processing unit is operating in the graphics-greedy mode, and cause the processing unit to run one or more graphics work items from a graphics queue when the processing unit is operating in the compute-greedy mode. Associated methods and systems are also described.
    Type: Application
    Filed: August 2, 2018
    Publication date: February 6, 2020
    Inventors: Rajballav DASH, Gregory PALMER, Gentaro HIROTA, Lacky SHAH, Jack CHOQUETTE, Emmett KILGARIFF, Sriharsha NIVERTY, Milton LEI, Shirish GADRE, Omkar PARANJAPE, Lei YANG, Rouslan DIMITROV
  • Patent number: 10402323
    Abstract: In one embodiment of the present invention a cache unit organizes data stored in an attached memory to optimize accesses to compressed data. In operation, the cache unit introduces a layer of indirection between a physical address associated with a memory access request and groups of blocks in the attached memory. The layer of indirection—virtual tiles—enables the cache unit to selectively store compressed data that would conventionally be stored in separate physical tiles included in a group of blocks in a single physical tile. Because the cache unit stores compressed data associated with multiple physical tiles in a single physical tile and, more specifically, in adjacent locations within the single physical tile, the cache unit coalesces the compressed data into contiguous blocks. Subsequently, upon performing a read operation, the cache unit may retrieve the compressed data conventionally associated with separate physical tiles in a single read operation.
    Type: Grant
    Filed: October 28, 2015
    Date of Patent: September 3, 2019
Assignee: NVIDIA Corporation
    Inventors: Praveen Krishnamurthy, Peter B. Holmquist, Wishwesh Gandhi, Timothy Purcell, Karan Mehra, Lacky Shah
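    A toy C++ model of the indirection idea (invented names, not the patented cache unit): a virtual-tile table maps several logically separate tiles onto adjacent offsets of one physical tile, so a single read can return data for all of them.
    ```cpp
    #include <cstdint>
    #include <iostream>
    #include <unordered_map>
    #include <utility>

    // Illustrative model of virtual-tile indirection only.
    struct TileCache {
        // virtual tile -> (physical tile, byte offset inside it)
        std::unordered_map<uint32_t, std::pair<uint32_t, uint32_t>> vtile;

        void map(uint32_t v, uint32_t phys, uint32_t offset) { vtile[v] = {phys, offset}; }
        std::pair<uint32_t, uint32_t> locate(uint32_t v) { return vtile.at(v); }
    };

    int main() {
        TileCache c;
        // Two compressed tiles coalesced into adjacent slots of physical tile 7:
        c.map(/*virtual=*/100, /*physical=*/7, /*offset=*/0);
        c.map(/*virtual=*/101, /*physical=*/7, /*offset=*/128);
        auto [p0, o0] = c.locate(100);
        auto [p1, o1] = c.locate(101);
        std::cout << "tiles live in physical tiles " << p0 << " and " << p1
                  << " at offsets " << o0 << "," << o1 << ": one read fetches both\n";
    }
    ```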
  • Patent number: 9934145
    Abstract: In one embodiment of the present invention a cache unit organizes data stored in an attached memory to optimize accesses to compressed data. In operation, the cache unit introduces a layer of indirection between a physical address associated with a memory access request and groups of blocks in the attached memory. The layer of indirection—virtual tiles—enables the cache unit to selectively store compressed data that would conventionally be stored in separate physical tiles included in a group of blocks in a single physical tile. Because the cache unit stores compressed data associated with multiple physical tiles in a single physical tile and, more specifically, in adjacent locations within the single physical tile, the cache unit coalesces the compressed data into contiguous blocks. Subsequently, upon performing a read operation, the cache unit may retrieve the compressed data conventionally associated with separate physical tiles in a single read operation.
    Type: Grant
    Filed: October 28, 2015
    Date of Patent: April 3, 2018
    Assignee: NVIDIA Corporation
    Inventors: Praveen Krishnamurthy, Peter B. Holmquist, Wishwesh Gandhi, Timothy Purcell, Karan Mehra, Lacky Shah
  • Publication number: 20170123977
    Abstract: In one embodiment of the present invention a cache unit organizes data stored in an attached memory to optimize accesses to compressed data. In operation, the cache unit introduces a layer of indirection between a physical address associated with a memory access request and groups of blocks in the attached memory. The layer of indirection—virtual tiles—enables the cache unit to selectively store compressed data that would conventionally be stored in separate physical tiles included in a group of blocks in a single physical tile. Because the cache unit stores compressed data associated with multiple physical tiles in a single physical tile and, more specifically, in adjacent locations within the single physical tile, the cache unit coalesces the compressed data into contiguous blocks. Subsequently, upon performing a read operation, the cache unit may retrieve the compressed data conventionally associated with separate physical tiles in a single read operation.
    Type: Application
    Filed: October 28, 2015
    Publication date: May 4, 2017
    Inventors: Praveen KRISHNAMURTHY, Peter B. HOLMQUIST, Wishwesh GANDHI, Timothy PURCELL, Karan MEHRA, Lacky SHAH
  • Publication number: 20170123978
    Abstract: In one embodiment of the present invention a cache unit organizes data stored in an attached memory to optimize accesses to compressed data. In operation, the cache unit introduces a layer of indirection between a physical address associated with a memory access request and groups of blocks in the attached memory. The layer of indirection—virtual tiles—enables the cache unit to selectively store compressed data that would conventionally be stored in separate physical tiles included in a group of blocks in a single physical tile. Because the cache unit stores compressed data associated with multiple physical tiles in a single physical tile and, more specifically, in adjacent locations within the single physical tile, the cache unit coalesces the compressed data into contiguous blocks. Subsequently, upon performing a read operation, the cache unit may retrieve the compressed data conventionally associated with separate physical tiles in a single read operation.
    Type: Application
    Filed: October 28, 2015
    Publication date: May 4, 2017
    Inventors: Praveen KRISHNAMURTHY, Peter B. HOLMQUIST, Wishwesh GANDHI, Timothy PURCELL, Karan MEHRA, Lacky SHAH
  • Publication number: 20150149733
    Abstract: Method and system for supporting speculative modification in a data cache are provided and described. In one embodiment, a speculative cache buffer includes a plurality of cache lines and a plurality of state indicators. At least one of the cache lines is operable to receive an evicted cache line from a cache. The at least one of the cache lines is operable to return the evicted cache line to the cache if the cache requests the evicted cache line. Further, the plurality of state indicators is operable to indicate a state of a corresponding cache line of the cache lines.
    Type: Application
    Filed: January 14, 2011
    Publication date: May 28, 2015
    Inventors: Guillermo Rozas, Alexander Klaiber, David Dunn, Paul Serris, Lacky Shah
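    A small C++ model of the buffer described here (invented names, not the hardware): evicted lines are accepted with a per-line state indicator and handed back to the cache on request.
    ```cpp
    #include <cstdint>
    #include <optional>
    #include <unordered_map>

    // Illustrative model of a speculative cache buffer only.
    enum class LineState { Invalid, Clean, SpeculativelyModified };

    struct SpeculativeBuffer {
        struct Entry { uint64_t data; LineState state; };
        std::unordered_map<uint64_t, Entry> lines;  // address -> buffered line

        // The cache evicts a line into the buffer.
        void accept_eviction(uint64_t addr, uint64_t data, LineState s) {
            lines[addr] = {data, s};
        }

        // The cache asks for the line back; the buffer returns and drops it.
        std::optional<uint64_t> reclaim(uint64_t addr) {
            auto it = lines.find(addr);
            if (it == lines.end()) return std::nullopt;
            uint64_t d = it->second.data;
            lines.erase(it);
            return d;
        }
    };

    int main() {
        SpeculativeBuffer buf;
        buf.accept_eviction(0x1000, 7, LineState::SpeculativelyModified);
        return buf.reclaim(0x1000).value() == 7 ? 0 : 1;  // line returned to cache
    }
    ```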
  • Patent number: 8893249
Abstract: In a system that partitions an application program into page segments, a minimal portion of the application program is installed on a client system. The client prefetches page segments from the application server or the application server pushes additional page segments to the client. The application server begins streaming the requested page segments to the client when it receives a valid access token from the client. The client performs server load balancing across a plurality of application servers. If the client observes a non-response or slow-response condition from an application server or license server, it switches to another application server or license server.
    Type: Grant
    Filed: November 26, 2012
    Date of Patent: November 18, 2014
    Assignee: Numecent Holdings, Inc.
    Inventors: Daniel T. Arai, Sameer Panwar, Manuel E. Benitez, Anne M. Holler, Lacky Shah
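    The client-side failover behavior can be sketched as follows in C++ (invented names; token validation and real networking are elided): try servers in order and skip any observed to be slow or unresponsive.
    ```cpp
    #include <iostream>
    #include <string>
    #include <vector>

    // Illustrative sketch of client-side server selection only.
    struct Server { std::string name; bool responsive; };

    // Simple failover: pick the first server not observed to be slow
    // or down, per the load-balancing behavior the abstract describes.
    const Server* choose_server(const std::vector<Server>& servers) {
        for (const auto& s : servers)
            if (s.responsive) return &s;
        return nullptr;  // no server available
    }

    int main() {
        std::vector<Server> servers{{"app-1", false}, {"app-2", true}};
        if (const Server* s = choose_server(servers))
            std::cout << "streaming page segments from " << s->name << "\n";  // app-2
    }
    ```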
  • Patent number: 8438298
    Abstract: An intelligent network streaming and execution system for conventionally coded applications provides a system that partitions an application program into page segments by observing the manner in which the application program is conventionally installed. A minimal portion of the application program is installed on a client system and the user launches the application in the same ways that applications on other client file systems are started. An application program server streams the page segments to the client as the application program executes on the client and the client stores the page segments in a cache. Page segments are requested by the client from the application server whenever a page fault occurs from the cache for the application program. The client prefetches page segments from the application server or the application server pushes additional page segments to the client based on the pattern of page segment requests for that particular application.
    Type: Grant
    Filed: June 13, 2006
    Date of Patent: May 7, 2013
    Assignee: Endeavors Technologies, Inc.
    Inventors: Daniel T. Arai, Sameer Panwar, Manuel E. Benitez, Anne M. Holler, Lacky Shah
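    A minimal C++ sketch of the fetch-on-fault-plus-prefetch loop (invented names, and a deliberately naive next-segment prefetch rather than the patented per-application pattern):
    ```cpp
    #include <iostream>
    #include <set>
    #include <vector>

    // Illustrative model only: segments are fetched on a cache miss
    // ("page fault"), and a predicted next segment is prefetched.
    struct SegmentClient {
        std::set<int> cached;        // page segments already on the client
        std::vector<int> fetch_log;  // what was requested from the server

        void access(int segment) {
            if (cached.count(segment)) return;  // hit: run from local cache
            fetch_log.push_back(segment);       // miss: request from server
            cached.insert(segment);
            prefetch(segment + 1);              // naive pattern: next segment
        }
        void prefetch(int segment) {
            if (cached.count(segment)) return;
            fetch_log.push_back(segment);
            cached.insert(segment);
        }
    };

    int main() {
        SegmentClient c;
        c.access(3); c.access(4);  // segment 4 was prefetched: no second fault
        std::cout << "server requests: " << c.fetch_log.size() << "\n";  // prints 2
    }
    ```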
  • Patent number: 7873793
Abstract: Method and system for supporting speculative modification in a data cache are provided and described. A data cache comprises a plurality of cache lines. Each cache line includes a state indicator for indicating any one of a plurality of states, wherein the plurality of states includes a speculative state to enable keeping track of speculative modification to data in the respective cache line. The speculative state enables a speculative modification to the data in the respective cache line to be made permanent in response to a first operation performed upon reaching a particular instruction boundary during speculative execution of instructions. Further, the speculative state enables the speculative modification to the data in the respective cache line to be undone in response to a second operation performed upon failing to reach the particular instruction boundary during speculative execution of instructions.
    Type: Grant
    Filed: May 29, 2007
    Date of Patent: January 18, 2011
    Inventors: Guillermo Rozas, Alexander Klaiber, David Dunn, Paul Serris, Lacky Shah
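    The commit/rollback behavior of the speculative state can be modeled in C++ as follows (invented names, not the hardware): a shadow copy preserves the pre-speculation value so a rollback can undo the modification.
    ```cpp
    #include <cstdint>
    #include <iostream>

    // Illustrative model of per-line speculative state only.
    enum class State { Invalid, Valid, Speculative };

    struct Line {
        uint64_t data = 0, shadow = 0;  // shadow keeps the pre-speculation value
        State state = State::Invalid;

        void spec_write(uint64_t v) {   // speculative modification
            if (state != State::Speculative) { shadow = data; state = State::Speculative; }
            data = v;
        }
        // First operation: instruction boundary reached, make it permanent.
        void commit()   { if (state == State::Speculative) state = State::Valid; }
        // Second operation: boundary not reached, undo the modification.
        void rollback() { if (state == State::Speculative) { data = shadow; state = State::Valid; } }
    };

    int main() {
        Line l; l.data = 1; l.state = State::Valid;
        l.spec_write(2);
        l.rollback();                 // speculation failed: undo
        std::cout << l.data << "\n";  // prints 1
    }
    ```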
  • Patent number: 7725885
Abstract: The present invention relates to a mechanism for adaptive run time compilation of traces of program code to achieve efficiency in compilation and overall execution. The mechanism uses a combination of interpretation and compilation in executing byte code. The mechanism selects code segments for compilation based upon frequency of execution and ease of compilation. The inventive mechanism is able to compile small code segments, which can be a subset of a method or subroutine and comprise only single-path execution. The mechanism thereby achieves efficiency in compilation by having less total code to compile, all of it straight-line, single-path code.
    Type: Grant
    Filed: May 9, 2000
    Date of Patent: May 25, 2010
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Salil Pradhan, Lacky Shah
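    The selection policy can be sketched in C++ (invented names and an arbitrary threshold; the abstract does not specify the real heuristics): interpret by default, and compile a trace once it is single-path and has run hot often enough.
    ```cpp
    #include <iostream>
    #include <unordered_map>

    // Illustrative sketch of a hot-trace selection policy only.
    struct TraceSelector {
        std::unordered_map<int, int> exec_count;  // trace id -> times interpreted
        int hot_threshold = 10;                   // arbitrary for this sketch

        // True when the trace should be handed to the compiler: it is
        // cheap to compile (single path) and executed frequently.
        bool should_compile(int trace_id, bool single_path) {
            return single_path && ++exec_count[trace_id] >= hot_threshold;
        }
    };

    int main() {
        TraceSelector sel;
        for (int i = 0; i < 12; ++i)
            if (sel.should_compile(/*trace_id=*/7, /*single_path=*/true)) {
                std::cout << "compile trace 7 after " << i + 1 << " runs\n";
                break;
            }
    }
    ```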
  • Patent number: 7606979
Abstract: Method and system for conservatively managing store capacity available to a processor issuing stores are provided and described. In particular, a counter mechanism is utilized, wherein the counter mechanism is incremented or decremented based on the occurrence of particular events.
    Type: Grant
    Filed: December 12, 2006
    Date of Patent: October 20, 2009
    Inventors: Guillermo Rozas, Alexander Klaiber, David Dunn, Paul Serris, Lacky Shah
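    The counter mechanism can be modeled in a few lines of C++ (invented names, not the hardware): a free-slot counter is decremented when a store issues and incremented when one drains, stalling conservatively at zero.
    ```cpp
    #include <iostream>

    // Illustrative model of a conservative store-capacity counter only.
    struct StoreCapacity {
        int free_slots;
        explicit StoreCapacity(int n) : free_slots(n) {}

        bool try_issue_store() {                // event: processor issues a store
            if (free_slots == 0) return false;  // conservatively stall
            --free_slots;
            return true;
        }
        void store_retired() { ++free_slots; }  // event: store drains to memory
    };

    int main() {
        StoreCapacity sc(2);
        sc.try_issue_store();
        sc.try_issue_store();  // buffer now full
        std::cout << (sc.try_issue_store() ? "issued" : "stall: no capacity") << "\n";
        sc.store_retired();    // one slot drains
        std::cout << (sc.try_issue_store() ? "issued after drain" : "stall") << "\n";
    }
    ```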
  • Publication number: 20080178298
    Abstract: An intelligent network streaming and execution system for conventionally coded applications provides a system that partitions an application program into page segments by observing the manner in which the application program is conventionally installed. A minimal portion of the application program is installed on a client system and the user launches the application in the same ways that applications on other client file systems are started. An application program server streams the page segments to the client as the application program executes on the client and the client stores the page segments in a cache. Page segments are requested by the client from the application server whenever a page fault occurs from the cache for the application program. The client prefetches page segments from the application server or the application server pushes additional page segments to the client based on the pattern of page segment requests for that particular application.
    Type: Application
    Filed: June 13, 2006
    Publication date: July 24, 2008
    Inventors: Daniel T. Arai, Sameer Panwar, Manuel E. Benitez, Anne M. Holler, Lacky Shah
  • Patent number: 7225299
Abstract: Method and system for supporting speculative modification in a data cache are provided and described. A data cache comprises a plurality of cache lines. Each cache line includes a state indicator for indicating any one of a plurality of states, wherein the plurality of states includes a speculative state to enable keeping track of speculative modification to data in the respective cache line. The speculative state enables a speculative modification to the data in the respective cache line to be made permanent in response to a first operation performed upon reaching a particular instruction boundary during speculative execution of instructions. Further, the speculative state enables the speculative modification to the data in the respective cache line to be undone in response to a second operation performed upon failing to reach the particular instruction boundary during speculative execution of instructions.
    Type: Grant
    Filed: July 16, 2003
    Date of Patent: May 29, 2007
    Assignee: Transmeta Corporation
    Inventors: Guillermo Rozas, Alexander Klaiber, David Dunn, Paul Serris, Lacky Shah
  • Patent number: 7149851
Abstract: Method and system for conservatively managing store capacity available to a processor issuing stores are provided and described. In particular, a counter mechanism is utilized, wherein the counter mechanism is incremented or decremented based on the occurrence of particular events.
    Type: Grant
    Filed: August 21, 2003
    Date of Patent: December 12, 2006
    Assignee: Transmeta Corporation
    Inventors: Guillermo Rozas, Alexander Klaiber, David Dunn, Paul Serris, Lacky Shah