Patents by Inventor Wish GANDHI

Wish GANDHI has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11966480
    Abstract: Apparatuses, systems, and techniques for supporting fairness of multiple context sharing cryptographic hardware. An accelerator circuit includes a copy engine (CE) with AES-GCM hardware configured to perform both encryption and authentication of data transfers for multiple applications or multiple data streams in a single application or belonging to a single user. The CE splits a data transfer of a specified size into a set of partial transfers. The CE sequentially executes the set of partial transfers using a context for a period of time (e.g., a timeslice) for an application. The CE stores in a secure memory for the application one or more data for encryption or decryption (e.g., a hash key, a block counter, etc.) computed from a last partial transfer. The one or more data for encryption or decryption are retrieved and used when data transfers for the application is resumed by the CE.
    Type: Grant
    Filed: March 10, 2022
    Date of Patent: April 23, 2024
    Assignee: Nvidia Corporation
    Inventors: Adam Hendrickson, Vaishali Kulkarni, Gobikrishna Dhanuskodi, Naveen Cherukuri, Wish Gandhi, Raymond Wong
  • Publication number: 20230289190
    Abstract: This specification describes a programmatic multicast technique enabling one thread (for example, in a cooperative group array (CGA) on a GPU) to request data on behalf of one or more other threads (for example, executing on respective processor cores of the GPU). The multicast is supported by tracking circuitry that interfaces between multicast requests received from processor cores and the available memory. The multicast is designed to reduce cache (for example, layer 2 cache) bandwidth utilization enabling strong scaling and smaller tile sizes.
    Type: Application
    Filed: March 10, 2022
    Publication date: September 14, 2023
    Inventors: Apoorv PARLE, Ronny KRASHINSKY, John EDMONDSON, Jack CHOQUETTE, Shirish GADRE, Steve HEINRICH, Manan PATEL, Prakash Bangalore PRABHAKAR, JR., Ravi MANYAM, Wish GANDHI, Lacky SHAH, Alexander L. Minkin
  • Publication number: 20230289189
    Abstract: Distributed shared memory (DSMEM) comprises blocks of memory that are distributed or scattered across a processor (such as a GPU). Threads executing on a processing core local to one memory block are able to access a memory block local to a different processing core. In one embodiment, shared access to these DSMEM allocations distributed across a collection of processing cores is implemented by communications between the processing cores. Such distributed shared memory provides very low latency memory access for processing cores located in proximity to the memory blocks, and also provides a way for more distant processing cores to also access the memory blocks in a manner and using interconnects that do not interfere with the processing cores' access to main or global memory such as hacked by an L2 cache.
    Type: Application
    Filed: March 10, 2022
    Publication date: September 14, 2023
    Inventors: Prakash BANGALORE PRABHAKAR, Gentaro HIROTA, Ronny KRASHINSKY, Ze LONG, Brian PHARRIS, Rajballav DASH, Jeff TUCKEY, Jerome F. DULUK, JR., Lacky SHAH, Luke DURANT, Jack CHOQUETTE, Eric WERNESS, Naman GOVIL, Manan PATEL, Shayani DEB, Sandeep NAVADA, John EDMONDSON, Greg PALMER, Wish GANDHI, Ravi MANYAM, Apoorv PARLE, Olivier GIROUX, Shirish GADRE, Steve HEINRICH
  • Publication number: 20230289215
    Abstract: A new level(s) of hierarchy—Cooperate Group Arrays (CGAs)—and an associated new hardware-based work distribution/execution model is described. A CGA is a grid of thread blocks (also referred to as cooperative thread arrays (CTAs)). CGAs provide co-scheduling, e.g., control over where CTAs are placed/executed in a processor (such as a GPU), relative to the memory required by an application and relative to each other. Hardware support for such CGAs guarantees concurrency and enables applications to see more data locality, reduced latency, and better synchronization between all the threads in tightly cooperating collections of CTAs programmably distributed across different (e.g., hierarchical) hardware domains or partitions.
    Type: Application
    Filed: March 10, 2022
    Publication date: September 14, 2023
    Inventors: Greg PALMER, Gentaro HIROTA, Ronny KRASHINSKY, Ze LONG, Brian PHARRIS, Rajballav DASH, Jeff TUCKEY, Jerome F. DULUK, JR., Lacky SHAH, Luke DURANT, Jack CHOQUETTE, Eric WERNESS, Naman GOVIL, Manan PATEL, Shayani DEB, Sandeep NAVADA, John EDMONDSON, Prakash BANGALORE PRABHAKAR, Wish GANDHI, Ravi MANYAM, Apoorv PARLE, Olivier GIROUX, Shirish GADRE, Steve HEINRICH
  • Publication number: 20230289453
    Abstract: Apparatuses, systems, and techniques for supporting fairness of multiple context sharing cryptographic hardware. An accelerator circuit includes a copy engine (CE) with AES-GCM hardware configured to perform both encryption and authentication of data transfers for multiple applications or multiple data streams in a single application or belonging to a single user. The CE splits a data transfer of a specified size into a set of partial transfers. The CE sequentially executes the set of partial transfers using a context for a period of time (e.g., a timeslice) for an application. The CE stores in a secure memory for the application one or more data for encryption or decryption (e.g., a hash key, a block counter, etc.) computed from a last partial transfer. The one or more data for encryption or decryption are retrieved and used when data transfers for the application is resumed by the CE.
    Type: Application
    Filed: March 10, 2022
    Publication date: September 14, 2023
    Inventors: Adam Hendrickson, Vaishali Kulkarni, Gobikrishna Dhanuskodi, Naveen Cherukuri, Wish Gandhi, Raymond Wong
  • Patent number: 11698869
    Abstract: The subject application relates to computing an authentication tag for partial transfers scheduled across multiple direct memory access (DMA) engines. Apparatuses, systems, and techniques are described for computing an authentication tag for a data transfer when the data transfer is scheduled as partial transfers across a specified number of direct memory access (DMA) engines. An orchestration circuit stores partial authentication tags, computed by the DMA engines, and corresponding adjustment exponents during one or more rounds in which the partial transfers are scheduled and processed by the specified number of DMA engines. During a last round, a combined authentication tag can be computed based on the partial authentication tags and the corresponding adjustment exponents stored by the orchestration circuit during the rounds.
    Type: Grant
    Filed: March 10, 2022
    Date of Patent: July 11, 2023
    Assignee: NVIDIA Corporation
    Inventors: Vaishali Kulkarni, Naveen Cherukuri, Raymond Wong, Adam Hendrickson, Gobikrishna Dhanuskodi, Wish Gandhi