Patents by Inventor Tanmoy Mandal
Tanmoy Mandal has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20230289211Abstract: A processor supports new thread group hierarchies by centralizing work distribution to provide hardware-guaranteed concurrent execution of thread groups in a thread group array through speculative launch and load balancing across processing cores. Efficiencies are realized by distributing grid rasterization among the processing cores.Type: ApplicationFiled: March 10, 2022Publication date: September 14, 2023Inventors: Gentaro HIROTA, Tanmoy MANDAL, Jeff TUCKEY, Kevin STEPHANO, Chen MEI, Shayani DEB, Naman GOVIL, Rajballav DASH, Ronny KRASHINSKY, Ze LONG, Brian PHARRIS
-
Patent number: 10915445Abstract: A method, computer readable medium, and system are disclosed for a distributed cache that provides multiple processing units with fast access to a portion of data, which is stored in local memory. The distributed cache is composed of multiple smaller caches, and each of the smaller caches is associated with at least one processing unit. In addition to a shared crossbar network through which data is transferred between processing units and the smaller caches, a dedicated connection is provided between two or more smaller caches that form a partner cache set. Transferring data through the dedicated connections reduces congestion on the shared crossbar network. Reducing congestion on the shared crossbar network increases the available bandwidth and allows the number of processing units to increase. A coherence protocol is defined for accessing data stored in the distributed cache and for transferring data between the smaller caches of a partner cache set.Type: GrantFiled: September 18, 2018Date of Patent: February 9, 2021Assignee: NVIDIA CorporationInventors: Wishwesh Anil Gandhi, Tanmoy Mandal, Ravi Kiran Manyam, Supriya Shrihari Rao
-
Publication number: 20200089611Abstract: A method, computer readable medium, and system are disclosed for a distributed cache that provides multiple processing units with fast access to a portion of data, which is stored in local memory. The distributed cache is composed of multiple smaller caches, and each of the smaller caches is associated with at least one processing unit. In addition to a shared crossbar network through which data is transferred between processing units and the smaller caches, a dedicated connection is provided between two or more smaller caches that form a partner cache set. Transferring data through the dedicated connections reduces congestion on the shared crossbar network. Reducing congestion on the shared crossbar network increases the available bandwidth and allows the number of processing units to increase. A coherence protocol is defined for accessing data stored in the distributed cache and for transferring data between the smaller caches of a partner cache set.Type: ApplicationFiled: September 18, 2018Publication date: March 19, 2020Inventors: Wishwesh Anil Gandhi, Tanmoy Mandal, Ravi Kiran Manyam, Supriya Shrihari Rao
-
Patent number: 9715413Abstract: One embodiment of the present invention sets forth a technique for selecting a first processor included in a plurality of processors to receive work related to a compute task. The technique involves analyzing state data of each processor in the plurality of processors to identify one or more processors that have already been assigned one compute task and are eligible to receive work related to the one compute task, receiving, from each of the one or more processors identified as eligible, an availability value that indicates the capacity of the processor to receive new work, selecting a first processor to receive work related to the one compute task based on the availability values received from the one or more processors, and issuing, to the first processor via a cooperative thread array (CTA), the work related to the one compute task.Type: GrantFiled: January 18, 2012Date of Patent: July 25, 2017Assignee: NVIDIA CorporationInventors: Karim M. Abdalla, Lacky V. Shah, Jerome F. Duluk, Jr., Timothy John Purcell, Tanmoy Mandal, Gentaro Hirota
-
Patent number: 9069609Abstract: One embodiment of the present invention sets forth a technique for assigning a compute task to a first processor included in a plurality of processors. The technique involves analyzing each compute task in a plurality of compute tasks to identify one or more compute tasks that are eligible for assignment to the first processor, where each compute task is listed in a first table and is associated with a priority value and an allocation order that indicates relative time at which the compute task was added to the first table. The technique further involves selecting a first task compute from the identified one or more compute tasks based on at least one of the priority value and the allocation order, and assigning the first compute task to the first processor for execution.Type: GrantFiled: January 18, 2012Date of Patent: June 30, 2015Assignee: NVIDIA CORPORATIONInventors: Karim M. Abdalla, Lacky V. Shah, Jerome F. Duluk, Jr., Timothy John Purcell, Tanmoy Mandal, Gentaro Hirota
-
Publication number: 20130185725Abstract: One embodiment of the present invention sets forth a technique for selecting a first processor included in a plurality of processors to receive work related to a compute task. The technique involves analyzing state data of each processor in the plurality of processors to identify one or more processors that have already been assigned one compute task and are eligible to receive work related to the one compute task, receiving, from each of the one or more processors identified as eligible, an availability value that indicates the capacity of the processor to receive new work, selecting a first processor to receive work related to the one compute task based on the availability values received from the one or more processors, and issuing, to the first processor via a cooperative thread array (CTA), the work related to the one compute task.Type: ApplicationFiled: January 18, 2012Publication date: July 18, 2013Inventors: Karim M. ABDALLA, Lacky V. Shah, Jerome F. Duluk, JR., Timothy John Purcell, Tanmoy Mandal, Gentaro Hirota
-
Publication number: 20130185728Abstract: One embodiment of the present invention sets forth a technique for assigning a compute task to a first processor included in a plurality of processors. The technique involves analyzing each compute task in a plurality of compute tasks to identify one or more compute tasks that are eligible for assignment to the first processor, where each compute task is listed in a first table and is associated with a priority value and an allocation order that indicates relative time at which the compute task was added to the first table. The technique further involves selecting a first task compute from the identified one or more compute tasks based on at least one of the priority value and the allocation order, and assigning the first compute task to the first processor for execution.Type: ApplicationFiled: January 18, 2012Publication date: July 18, 2013Inventors: Karim M. Abdalla, Lacky V. Shah, Jerome F. Duluk, JR., Timothy John Purcell, Tanmoy Mandal, Gentaro Hirota
-
Patent number: 8392669Abstract: One embodiment of the present invention sets forth a technique for efficiently and flexibly performing coalesced memory accesses for a thread group. For each read application request that services a thread group, the core interface generates one pending request table (PRT) entry and one or more memory access requests. The core interface determines the number of memory access requests and the size of each memory access request based on the spread of the memory access addresses in the application request. Each memory access request specifies the particular threads that the memory access request services. The PRT entry tracks the number of pending memory access requests. As the memory interface completes each memory access request, the core interface uses information in the memory access request and the corresponding PRT entry to route the returned data.Type: GrantFiled: November 26, 2008Date of Patent: March 5, 2013Assignee: NVIDIA CorporationInventors: Lars Nyland, John R. Nickolls, Gentaro Hirota, Tanmoy Mandal
-
Patent number: 8086806Abstract: One embodiment of the present invention sets forth a technique for efficiently and flexibly performing coalesced memory accesses for a thread group. For each read application request that services a thread group, the core interface generates one pending request table (PRT) entry and one or more memory access requests. The core interface determines the number of memory access requests and the size of each memory access request based on the spread of the memory access addresses in the application request. Each memory access request specifies the particular threads that the memory access request services. The PRT entry tracks the number of pending memory access requests. As the memory interface completes each memory access request, the core interface uses information in the memory access request and the corresponding PRT entry to route the returned data.Type: GrantFiled: March 24, 2008Date of Patent: December 27, 2011Assignee: NVIDIA CorporationInventors: Lars Nyland, John R. Nickolls, Gentaro Hirota, Tanmoy Mandal
-
Publication number: 20090240895Abstract: One embodiment of the present invention sets forth a technique for efficiently and flexibly performing coalesced memory accesses for a thread group. For each read application request that services a thread group, the core interface generates one pending request table (PRT) entry and one or more memory access requests. The core interface determines the number of memory access requests and the size of each memory access request based on the spread of the memory access addresses in the application request. Each memory access request specifies the particular threads that the memory access request services. The PRT entry tracks the number of pending memory access requests. As the memory interface completes each memory access request, the core interface uses information in the memory access request and the corresponding PRT entry to route the returned data.Type: ApplicationFiled: March 24, 2008Publication date: September 24, 2009Inventors: Lars Nyland, John R. Nickolls, Gentaro Hirota, Tanmoy Mandal