Patents by Inventor Tanmoy Mandal

Tanmoy Mandal has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230289211
    Abstract: A processor supports new thread group hierarchies by centralizing work distribution to provide hardware-guaranteed concurrent execution of thread groups in a thread group array through speculative launch and load balancing across processing cores. Efficiencies are realized by distributing grid rasterization among the processing cores.
    Type: Application
    Filed: March 10, 2022
    Publication date: September 14, 2023
    Inventors: Gentaro Hirota, Tanmoy Mandal, Jeff Tuckey, Kevin Stephano, Chen Mei, Shayani Deb, Naman Govil, Rajballav Dash, Ronny Krashinsky, Ze Long, Brian Pharris
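
A minimal C++ sketch of the launch policy this abstract describes, under the assumption that "hardware-guaranteed concurrent execution" means a thread group array is launched only when every group in it can be made resident at once, with groups spread across processing cores for load balance. All types and names below are illustrative assumptions, not NVIDIA's hardware interface.

```cpp
// Hypothetical simulation of a concurrency-guaranteed speculative launch:
// the whole thread group array (TGA) launches only if every group fits,
// and groups are placed on the least-loaded core (load balancing).
#include <algorithm>
#include <cstdio>
#include <vector>

struct Core { int free_slots; };

// Returns true if all `groups` fit concurrently; on success, assigns each
// group to the core with the most free slots at that moment.
bool speculative_launch(std::vector<Core>& cores, int groups) {
    int total_free = 0;
    for (const Core& c : cores) total_free += c.free_slots;
    if (total_free < groups) return false;  // cannot guarantee concurrency

    for (int g = 0; g < groups; ++g) {
        auto it = std::max_element(
            cores.begin(), cores.end(),
            [](const Core& a, const Core& b) { return a.free_slots < b.free_slots; });
        --it->free_slots;  // place group g on the least-loaded core
    }
    return true;  // every group in the array is resident simultaneously
}

int main() {
    std::vector<Core> cores{{2}, {3}, {1}};
    std::printf("launched: %s\n", speculative_launch(cores, 5) ? "yes" : "no");
}
```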
  • Patent number: 10915445
    Abstract: A method, computer readable medium, and system are disclosed for a distributed cache that provides multiple processing units with fast access to a portion of data, which is stored in local memory. The distributed cache is composed of multiple smaller caches, and each of the smaller caches is associated with at least one processing unit. In addition to a shared crossbar network through which data is transferred between processing units and the smaller caches, a dedicated connection is provided between two or more smaller caches that form a partner cache set. Transferring data through the dedicated connections reduces congestion on the shared crossbar network. Reducing congestion on the shared crossbar network increases the available bandwidth and allows the number of processing units to increase. A coherence protocol is defined for accessing data stored in the distributed cache and for transferring data between the smaller caches of a partner cache set.
    Type: Grant
    Filed: September 18, 2018
    Date of Patent: February 9, 2021
    Assignee: NVIDIA Corporation
    Inventors: Wishwesh Anil Gandhi, Tanmoy Mandal, Ravi Kiran Manyam, Supriya Shrihari Rao
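
The partner-cache idea above lends itself to a short illustration. The C++ sketch below models a lookup path that checks the local cache, then its partner over the dedicated connection, and only falls back to the shared crossbar on a double miss. Structures and names are assumptions made for this example, not the patented design, and the coherence protocol the abstract mentions is omitted.

```cpp
// Minimal model of a partner cache set: a local check, then the partner
// over a dedicated link, and the shared crossbar only on a double miss.
#include <cstdint>
#include <cstdio>
#include <optional>
#include <unordered_map>

struct SmallCache {
    std::unordered_map<uint64_t, int> lines;  // address -> cached data
    SmallCache* partner = nullptr;            // dedicated connection

    std::optional<int> lookup(uint64_t addr) {
        if (auto it = lines.find(addr); it != lines.end())
            return it->second;                     // local hit
        if (partner) {
            auto it = partner->lines.find(addr);
            if (it != partner->lines.end())
                return it->second;                 // partner hit: no crossbar traffic
        }
        return std::nullopt;                       // miss: go through the crossbar
    }
};

int main() {
    SmallCache a, b;
    a.partner = &b;
    b.partner = &a;                                // form a partner cache set
    b.lines[0x40] = 7;
    if (auto v = a.lookup(0x40)) std::printf("hit via partner: %d\n", *v);
}
```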
  • Publication number: 20200089611
    Abstract: A method, computer readable medium, and system are disclosed for a distributed cache that provides multiple processing units with fast access to a portion of data, which is stored in local memory. The distributed cache is composed of multiple smaller caches, and each of the smaller caches is associated with at least one processing unit. In addition to a shared crossbar network through which data is transferred between processing units and the smaller caches, a dedicated connection is provided between two or more smaller caches that form a partner cache set. Transferring data through the dedicated connections reduces congestion on the shared crossbar network. Reducing congestion on the shared crossbar network increases the available bandwidth and allows the number of processing units to increase. A coherence protocol is defined for accessing data stored in the distributed cache and for transferring data between the smaller caches of a partner cache set.
    Type: Application
    Filed: September 18, 2018
    Publication date: March 19, 2020
    Inventors: Wishwesh Anil Gandhi, Tanmoy Mandal, Ravi Kiran Manyam, Supriya Shrihari Rao
  • Patent number: 9715413
    Abstract: One embodiment of the present invention sets forth a technique for selecting a first processor included in a plurality of processors to receive work related to a compute task. The technique involves analyzing state data of each processor in the plurality of processors to identify one or more processors that have already been assigned one compute task and are eligible to receive work related to the one compute task, receiving, from each of the one or more processors identified as eligible, an availability value that indicates the capacity of the processor to receive new work, selecting a first processor to receive work related to the one compute task based on the availability values received from the one or more processors, and issuing, to the first processor via a cooperative thread array (CTA), the work related to the one compute task.
    Type: Grant
    Filed: January 18, 2012
    Date of Patent: July 25, 2017
    Assignee: NVIDIA Corporation
    Inventors: Karim M. Abdalla, Lacky V. Shah, Jerome F. Duluk, Jr., Timothy John Purcell, Tanmoy Mandal, Gentaro Hirota
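
As a rough illustration of the selection step this abstract describes, the C++ sketch below picks, from the processors already assigned the compute task and eligible for more of its work, the one reporting the highest availability value. Field names and the tie-breaking behavior are assumptions for this example.

```cpp
// Illustrative processor selection: filter to eligible processors, then
// choose the one with the greatest reported capacity for new work.
#include <cstdio>
#include <vector>

struct Processor {
    int id;
    bool eligible;     // already assigned this compute task?
    int availability;  // reported capacity to receive new work
};

// Returns the id of the chosen processor, or -1 if none is eligible.
int select_processor(const std::vector<Processor>& procs) {
    int best = -1, best_avail = -1;
    for (const Processor& p : procs) {
        if (p.eligible && p.availability > best_avail) {
            best = p.id;
            best_avail = p.availability;
        }
    }
    return best;  // work would then be issued to this processor as a CTA
}

int main() {
    std::vector<Processor> procs{{0, true, 2}, {1, false, 9}, {2, true, 5}};
    std::printf("selected processor %d\n", select_processor(procs));  // 2
}
```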
  • Patent number: 9069609
    Abstract: One embodiment of the present invention sets forth a technique for assigning a compute task to a first processor included in a plurality of processors. The technique involves analyzing each compute task in a plurality of compute tasks to identify one or more compute tasks that are eligible for assignment to the first processor, where each compute task is listed in a first table and is associated with a priority value and an allocation order that indicates the relative time at which the compute task was added to the first table. The technique further involves selecting a first compute task from the identified one or more compute tasks based on at least one of the priority value and the allocation order, and assigning the first compute task to the first processor for execution.
    Type: Grant
    Filed: January 18, 2012
    Date of Patent: June 30, 2015
    Assignee: NVIDIA Corporation
    Inventors: Karim M. Abdalla, Lacky V. Shah, Jerome F. Duluk, Jr., Timothy John Purcell, Tanmoy Mandal, Gentaro Hirota
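
The selection policy in this abstract can be sketched in a few lines of C++: each table entry carries a priority value and an allocation order, and the scheduler picks the highest-priority eligible task, breaking ties in favor of the entry added to the table earliest. The table layout, and the convention that a larger number means higher priority, are assumptions made here.

```cpp
// Sketch of priority-then-age task selection from a first table of tasks.
#include <cstdio>
#include <vector>

struct TaskEntry {
    int task_id;
    int priority;     // higher value = more urgent (assumed convention)
    int alloc_order;  // lower value = added to the table earlier
    bool eligible;    // eligible for assignment to this processor
};

// Returns the id of the task to assign, or -1 if none is eligible.
int select_task(const std::vector<TaskEntry>& table) {
    const TaskEntry* best = nullptr;
    for (const TaskEntry& t : table) {
        if (!t.eligible) continue;
        if (!best || t.priority > best->priority ||
            (t.priority == best->priority && t.alloc_order < best->alloc_order))
            best = &t;
    }
    return best ? best->task_id : -1;
}

int main() {
    std::vector<TaskEntry> table{
        {10, 1, 0, true}, {11, 3, 2, true}, {12, 3, 1, true}, {13, 5, 3, false}};
    std::printf("assign task %d\n", select_task(table));  // 12: tied priority, older
}
```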
  • Publication number: 20130185725
    Abstract: One embodiment of the present invention sets forth a technique for selecting a first processor included in a plurality of processors to receive work related to a compute task. The technique involves analyzing state data of each processor in the plurality of processors to identify one or more processors that have already been assigned one compute task and are eligible to receive work related to the one compute task, receiving, from each of the one or more processors identified as eligible, an availability value that indicates the capacity of the processor to receive new work, selecting a first processor to receive work related to the one compute task based on the availability values received from the one or more processors, and issuing, to the first processor via a cooperative thread array (CTA), the work related to the one compute task.
    Type: Application
    Filed: January 18, 2012
    Publication date: July 18, 2013
    Inventors: Karim M. Abdalla, Lacky V. Shah, Jerome F. Duluk, Jr., Timothy John Purcell, Tanmoy Mandal, Gentaro Hirota
  • Publication number: 20130185728
    Abstract: One embodiment of the present invention sets forth a technique for assigning a compute task to a first processor included in a plurality of processors. The technique involves analyzing each compute task in a plurality of compute tasks to identify one or more compute tasks that are eligible for assignment to the first processor, where each compute task is listed in a first table and is associated with a priority value and an allocation order that indicates the relative time at which the compute task was added to the first table. The technique further involves selecting a first compute task from the identified one or more compute tasks based on at least one of the priority value and the allocation order, and assigning the first compute task to the first processor for execution.
    Type: Application
    Filed: January 18, 2012
    Publication date: July 18, 2013
    Inventors: Karim M. Abdalla, Lacky V. Shah, Jerome F. Duluk, Jr., Timothy John Purcell, Tanmoy Mandal, Gentaro Hirota
  • Patent number: 8392669
    Abstract: One embodiment of the present invention sets forth a technique for efficiently and flexibly performing coalesced memory accesses for a thread group. For each read application request that services a thread group, the core interface generates one pending request table (PRT) entry and one or more memory access requests. The core interface determines the number of memory access requests and the size of each memory access request based on the spread of the memory access addresses in the application request. Each memory access request specifies the particular threads that the memory access request services. The PRT entry tracks the number of pending memory access requests. As the memory interface completes each memory access request, the core interface uses information in the memory access request and the corresponding PRT entry to route the returned data.
    Type: Grant
    Filed: November 26, 2008
    Date of Patent: March 5, 2013
    Assignee: NVIDIA Corporation
    Inventors: Lars Nyland, John R. Nickolls, Gentaro Hirota, Tanmoy Mandal
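
A rough C++ model of the pending request table (PRT) flow this abstract describes: the per-thread addresses of one thread-group read are coalesced into line-sized memory access requests based on their spread, a single PRT entry tracks how many are outstanding, and each completion decrements the count until the read can retire. The 128-byte line granularity and all names are assumptions for this sketch.

```cpp
// Coalesce one thread-group read into per-line requests, each tagged with
// the mask of threads it services, and track them with one PRT entry.
#include <cstdint>
#include <cstdio>
#include <map>
#include <vector>

constexpr uint64_t kLineBytes = 128;  // assumed coalescing granularity

struct MemRequest {
    uint64_t line_addr;
    uint32_t thread_mask;  // which threads this request services
};

struct PrtEntry { int pending; };

// Group per-thread addresses by cache line and open one PRT entry that
// counts the resulting memory access requests.
std::vector<MemRequest> issue_read(const std::vector<uint64_t>& addrs,
                                   PrtEntry& prt) {
    std::map<uint64_t, uint32_t> lines;  // line address -> thread mask
    for (size_t t = 0; t < addrs.size(); ++t)
        lines[addrs[t] / kLineBytes * kLineBytes] |= 1u << t;

    std::vector<MemRequest> reqs;
    for (const auto& [line, mask] : lines) reqs.push_back({line, mask});
    prt.pending = static_cast<int>(reqs.size());  // PRT tracks outstanding requests
    return reqs;
}

int main() {
    PrtEntry prt{};
    // Four threads touching two distinct 128-byte lines -> two requests.
    auto reqs = issue_read({0, 64, 128, 132}, prt);
    for (const auto& r : reqs)
        std::printf("request line 0x%llx mask 0x%x\n",
                    (unsigned long long)r.line_addr, r.thread_mask);
    while (prt.pending) --prt.pending;  // each completion decrements the entry
    std::printf("read retired\n");
}
```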
  • Patent number: 8086806
    Abstract: One embodiment of the present invention sets forth a technique for efficiently and flexibly performing coalesced memory accesses for a thread group. For each read application request that services a thread group, the core interface generates one pending request table (PRT) entry and one or more memory access requests. The core interface determines the number of memory access requests and the size of each memory access request based on the spread of the memory access addresses in the application request. Each memory access request specifies the particular threads that the memory access request services. The PRT entry tracks the number of pending memory access requests. As the memory interface completes each memory access request, the core interface uses information in the memory access request and the corresponding PRT entry to route the returned data.
    Type: Grant
    Filed: March 24, 2008
    Date of Patent: December 27, 2011
    Assignee: NVIDIA Corporation
    Inventors: Lars Nyland, John R. Nickolls, Gentaro Hirota, Tanmoy Mandal
  • Publication number: 20090240895
    Abstract: One embodiment of the present invention sets forth a technique for efficiently and flexibly performing coalesced memory accesses for a thread group. For each read application request that services a thread group, the core interface generates one pending request table (PRT) entry and one or more memory access requests. The core interface determines the number of memory access requests and the size of each memory access request based on the spread of the memory access addresses in the application request. Each memory access request specifies the particular threads that the memory access request services. The PRT entry tracks the number of pending memory access requests. As the memory interface completes each memory access request, the core interface uses information in the memory access request and the corresponding PRT entry to route the returned data.
    Type: Application
    Filed: March 24, 2008
    Publication date: September 24, 2009
    Inventors: Lars Nyland, John R. Nickolls, Gentaro Hirota, Tanmoy Mandal