Patents by Inventor Rishkul Kulkarni

Rishkul Kulkarni has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Techniques for efficiently synchronizing multiple program threads

Patent number: 12271765

Abstract: Various embodiments include a parallel processing computer system that enables parallel instances of a program to synchronize at disparate addresses in memory. When the parallel program instances need to exchange data, the program instances synchronize based on a mask that identifies the program instances that are synchronizing. As each program instance reaches the point of synchronization, the program instance blocks and waits for all other program instances to reach the point of synchronization. When all program instances have reached the point of synchronization, at least one program instance executes a synchronous operation to exchange data. The program instances then continue execution at respective and disparate return addresses.
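The sketch below is a rough illustration of the masked synchronization this abstract describes. It simulates GPU lanes with CPU threads and is not NVIDIA's mechanism. The `masked_sync_demo` name, the lane count, the bit mask and the even/odd branch are all illustrative assumptions. Lanes reach the barrier from two different code locations. Exactly one participant performs the exchange, through the `action` callback of Python's `threading.Barrier`. Each lane then resumes on its own path. In CUDA itself, the public `__syncwarp(mask)` intrinsic exposes a comparable mask-based warp barrier.

```python
import threading


def masked_sync_demo(num_lanes: int, mask: int) -> dict[int, int]:
    """Lanes named in `mask` rendezvous, exchange data once, then resume on their own paths."""
    members = [lane for lane in range(num_lanes) if (mask >> lane) & 1]
    contributions: dict[int, int] = {}
    shared: dict[str, int] = {}
    results: dict[int, int] = {}

    def exchange() -> None:
        # Barrier action: runs in exactly one thread, after every lane in the mask
        # has arrived and before any of them is released.
        shared["sum"] = sum(contributions.values())

    barrier = threading.Barrier(len(members), action=exchange)

    def lane_body(lane: int) -> None:
        contributions[lane] = lane * 10  # the value this lane publishes
        if lane % 2 == 0:
            barrier.wait()  # sync point A in the code
            results[lane] = shared["sum"] + lane  # continue at "return address" A
        else:
            barrier.wait()  # sync point B: a different code location, same barrier
            results[lane] = shared["sum"] - lane  # continue at "return address" B

    threads = [threading.Thread(target=lane_body, args=(lane,)) for lane in members]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

For example, `masked_sync_demo(4, 0b1011)` synchronizes lanes 0, 1 and 3, leaves lane 2 out of the exchange, and returns `{0: 40, 1: 39, 3: 37}`.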

Type: Grant

Filed: June 3, 2021

Date of Patent: April 8, 2025

Assignee: NVIDIA CORPORATION

Inventors: Ajay Sudarshan Tirumala, Olivier Giroux, Peter Nelson, Gary M. Tarolli, Ankita Upreti, Konstantinos Kyriakopoulos, Divya Shanmughan, Rishkul Kulkarni

Application programming interface to wait on matrix multiply-accumulate

Patent number: 12204897

Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to cause one or more other computational operations to wait until a portion of matrix multiply-accumulate (MMA) operations have been performed.
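The following is a minimal CPU-side sketch of "waiting until a portion of MMA operations have been performed". It assumes a simple completion counter. The names `mma`, `CompletionCounter` and `run_tiles` are hypothetical, and plain nested-list multiplication stands in for a hardware MMA. Nothing here reflects the actual CUDA API that the patent claims.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Optional

Matrix = list[list[int]]


def mma(a: Matrix, b: Matrix, c: Matrix) -> Matrix:
    """Plain-Python D = A x B + C, a stand-in for one hardware MMA on a small tile."""
    return [[c[i][j] + sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


class CompletionCounter:
    """Counts finished MMAs; waiters block until at least `k` of them are done."""

    def __init__(self) -> None:
        self._done = 0
        self._cond = threading.Condition()

    def arrive(self) -> None:
        with self._cond:
            self._done += 1
            self._cond.notify_all()

    def wait_for(self, k: int, timeout: Optional[float] = None) -> bool:
        with self._cond:
            return self._cond.wait_for(lambda: self._done >= k, timeout)


def run_tiles(tiles, wait_count: int):
    counter = CompletionCounter()
    results = [None] * len(tiles)

    def work(idx: int) -> None:
        results[idx] = mma(*tiles[idx])
        counter.arrive()  # publish the result first, then count one more finished MMA

    with ThreadPoolExecutor(max_workers=len(tiles)) as pool:
        for idx in range(len(tiles)):
            pool.submit(work, idx)
        reached = counter.wait_for(wait_count, timeout=5.0)  # wait for a portion, not all
        ready_now = sum(r is not None for r in results)
    return reached, ready_now, results
```

Because each worker stores its tile before calling `arrive()`, a consumer released by `wait_for(2)` is guaranteed to see at least two finished tiles, while the remaining MMAs may still be in flight.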

Type: Grant

Filed: November 30, 2022

Date of Patent: January 21, 2025

Assignee: NVIDIA CORPORATION

Inventors: Harold Carter Edwards, Kyrylo Perelygin, Maciej Tyrlik, Gokul Ramaswamy Hirisave Chandra Shekhara, Balaji Krishna Yugandhar Atukuri, Rishkul Kulkarni, Konstantinos Kyriakopoulos, Edward H. Gornish, David Allan Berson, Bageshri Sathe, James Player, Aman Arora, Alan Kaatz, Andrew Kerr, Haicheng Wu, Cris Cecka, Vijay Thakkar, Sean Treichler, Jack H. Choquette, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Cody Addison, Girish Bhaskarrao Bharambe

Application programming interface to wait on matrix multiply-accumulate

Publication number: 20240168762

Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to cause one or more other computational operations to wait until a portion of matrix multiply-accumulate (MMA) operations have been performed.

Type: Application

Filed: November 30, 2022

Publication date: May 23, 2024

Inventors: Harold Carter Edwards, Kyrylo Perelygin, Maciej Tyrlik, Gokul Ramaswamy Hirisave Chandra Shekhara, Balaji Krishna Yugandhar Atukuri, Rishkul Kulkarni, Konstantinos Kyriakopoulos, Edward H. Gornish, David Allan Berson, Bageshri Sathe, James Player, Aman Arora, Alan Kaatz, Andrew Kerr, Haicheng Wu, Cris Cecka, Vijay Thakkar, Sean Treichler, Jack H. Choquette, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Cody Addison, Girish Bhaskarrao Bharambe

Application programming interface to indicate operations to be performed by corresponding streaming multiprocessors

Publication number: 20240168763

Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to cause two or more other computational operations to be performed by two or more streaming multiprocessors (SMs).
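To make "operations performed by two or more SMs" concrete, here is a toy model. Each simulated streaming multiprocessor is a single-threaded executor, and a launcher names the SM that each operation must run on. The round-robin placement, the `run_on_sms` name and the `sm<k>` worker names are hypothetical. The abstract does not say how operations are assigned.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_on_sms(ops, num_sms: int):
    """Send op i to simulated SM number (i % num_sms); return (result, worker name) pairs."""
    # One single-threaded executor per simulated SM, so each op runs only on the SM chosen for it.
    sms = [ThreadPoolExecutor(max_workers=1, thread_name_prefix=f"sm{k}") for k in range(num_sms)]
    try:
        futures = []
        for i, op in enumerate(ops):
            target = i % num_sms  # hypothetical round-robin placement policy
            futures.append(sms[target].submit(
                lambda op=op: (op(), threading.current_thread().name)))
        return [f.result() for f in futures]
    finally:
        for sm in sms:
            sm.shutdown(wait=True)
```

With three operations and two simulated SMs, the operations land on `sm0`, `sm1` and `sm0` respectively. Pinning each SM to one worker keeps the placement explicit rather than leaving it to a shared pool.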

Type: Application

Filed: November 30, 2022

Publication date: May 23, 2024

Inventors: Harold Carter Edwards, Kyrylo Perelygin, Maciej Tyrlik, Gokul Ramaswamy Hirisave Chandra Shekhara, Balaji Krishna Yugandhar Atukuri, Rishkul Kulkarni, Konstantinos Kyriakopoulos, Edward H. Gornish, David Allan Berson, Bageshri Sathe, James Player, Aman Arora, Alan Kaatz, Andrew Kerr, Haicheng Wu, Cris Cecka, Vijay Thakkar, Sean Treichler, Jack H. Choquette, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Cody Addison, Girish Bhaskarrao Bharambe

Application programming interface to synchronize matrix multiply-accumulate memory transactions

Publication number: 20240169022

Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to cause one or more other computational operations to wait until matrix multiply-accumulate (MMA) memory transactions are performed.
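As an illustration of making compute wait on MMA memory transactions, the following sketch treats operand loads as asynchronous tasks. It blocks on both of them before computing D = A x B + C. The `load_tile` and `mma_after_loads` names are hypothetical. A thread pool stands in for asynchronous GPU memory copies, so this shows only the ordering constraint and makes no claim about the claimed API.

```python
from concurrent.futures import ALL_COMPLETED, ThreadPoolExecutor, wait


def load_tile(src):
    """Stand-in for an asynchronous memory copy of one operand tile."""
    return [list(row) for row in src]


def mma_after_loads(a_src, b_src, c):
    with ThreadPoolExecutor(max_workers=2) as pool:
        loads = [pool.submit(load_tile, a_src), pool.submit(load_tile, b_src)]
        # Block until both memory transactions have completed before using the operands.
        wait(loads, return_when=ALL_COMPLETED)
    a, b = (f.result() for f in loads)
    # D = A x B + C on the now-resident operands.
    return [[c[i][j] + sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]
```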

Type: Application

Filed: November 30, 2022

Publication date: May 23, 2024

Inventors: Harold Carter Edwards, Kyrylo Perelygin, Maciej Tyrlik, Gokul Ramaswamy Hirisave Chandra Shekhara, Balaji Krishna Yugandhar Atukuri, Rishkul Kulkarni, Konstantinos Kyriakopoulos, Edward H. Gornish, David Allan Berson, Bageshri Sathe, James Player, Aman Arora, Alan Kaatz, Andrew Kerr, Haicheng Wu, Cris Cecka, Vijay Thakkar, Sean Treichler, Jack H. Choquette, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Cody Addison, Girish Bhaskarrao Bharambe

Application programming interface to indicate matrix multiply-accumulate

Publication number: 20240169023

Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to indicate whether matrix multiply-accumulate (MMA) memory operations are complete.
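Indicating whether memory operations are complete differs from waiting for them: a query returns immediately. The sketch below illustrates that distinction with Python futures. `mma_memory_ops_complete` and `poll_demo` are hypothetical names. An `Event` gate holds one simulated memory operation open so that the first query is deterministically `False`.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor


def mma_memory_ops_complete(pending: list[Future]) -> bool:
    """Non-blocking query: True only if every tracked memory operation has finished."""
    return all(f.done() for f in pending)


def poll_demo() -> tuple[bool, bool]:
    gate = threading.Event()
    with ThreadPoolExecutor(max_workers=2) as pool:
        pending = [pool.submit(gate.wait), pool.submit(lambda: None)]
        before = mma_memory_ops_complete(pending)  # first op is parked on the gate, so False
        gate.set()                                 # let the outstanding operation finish
        for f in pending:
            f.result()
        after = mma_memory_ops_complete(pending)   # everything has finished, so True
    return before, after
```

A caller could use such a query to overlap independent work with outstanding transfers and fall back to a blocking wait only when the result is `False`.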

Type: Application

Filed: November 30, 2022

Publication date: May 23, 2024

Inventors: Harold Carter Edwards, Kyrylo Perelygin, Maciej Tyrlik, Gokul Ramaswamy Hirisave Chandra Shekhara, Balaji Krishna Yugandhar Atukuri, Rishkul Kulkarni, Konstantinos Kyriakopoulos, Edward H. Gornish, David Allan Berson, Bageshri Sathe, James Player, Aman Arora, Alan Kaatz, Andrew Kerr, Haicheng Wu, Cris Cecka, Vijay Thakkar, Sean Treichler, Jack H. Choquette, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Cody Addison, Girish Bhaskarrao Bharambe

Scalarization of instructions for SIMT architectures

Publication number: 20240118899

Abstract: Apparatuses, systems, and techniques to adapt instructions in a SIMT architecture for execution on serial execution units. In at least one embodiment, a set of one or more threads is selected from a group of active threads associated with an instruction and the instruction is executed for the set of one or more threads on a serial execution unit.
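A common way to run a per-lane SIMT operation on a serial unit is a loop often called a "waterfall loop". The loop picks the lowest active lane as leader and selects every active lane that shares the leader's operand. It then executes the operation once and broadcasts the result to those lanes. This repeats until no active lanes remain. The sketch below shows that general pattern under assumed names (`scalarize`, integer operands, a bit-mask of active lanes). The abstract does not say this is the selection rule the application uses.

```python
def scalarize(active_mask: int, operands: list[int], scalar_op):
    """Run scalar_op once per distinct operand among active lanes.

    Returns (per-lane results, number of serial executions).
    """
    results: dict[int, int] = {}
    remaining = active_mask
    serial_issues = 0
    while remaining:
        leader = (remaining & -remaining).bit_length() - 1  # lowest still-active lane
        value = operands[leader]
        # Select every remaining lane whose operand matches the leader's.
        group = [lane for lane in range(len(operands))
                 if (remaining >> lane) & 1 and operands[lane] == value]
        out = scalar_op(value)  # one execution on the serial unit
        serial_issues += 1
        for lane in group:
            results[lane] = out          # broadcast to the selected lanes
            remaining &= ~(1 << lane)    # retire them from the active set
    return results, serial_issues
```

With six active lanes holding operands `5, 5, 7, 5, 7, 9`, the loop issues the operation three times instead of six, because lanes that share an operand are served by one serial execution.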

Type: Application

Filed: February 3, 2023

Publication date: April 11, 2024

Inventors: Aditya Avinash Atluri, Jack Choquette, Carter Edwards, Olivier Giroux, Praveen Kumar Kaushik, Ronny Krashinsky, Rishkul Kulkarni, Konstantinos Kyriakopoulos

Techniques for divergent thread group execution scheduling

Patent number: 11934867

Abstract: Warp sharding techniques to switch execution between divergent shards on instructions that trigger a long stall, thereby interleaving execution between diverged threads within a warp instead of across warps. The technique may be applied to mitigate pipeline stalls in applications with low warp occupancy and high divergence. Warp data cache locality may also be improved by concentrating memory accesses within a warp rather than spreading them across warps.
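The latency-hiding idea behind warp sharding can be shown with a toy issue model. It assumes one issue slot per cycle and treats an instruction marked `"long"` as stalling its shard for `latency` cycles. Divergent shards either run back to back, or the scheduler switches to another ready shard whenever the current one stalls. The `simulate` function, the `"long"` marker and the latency value are illustrative assumptions, not the patented scheduler. The cache-locality benefit the abstract mentions is not modeled.

```python
def simulate(shards: list[list[str]], latency: int, interleave: bool) -> int:
    """Cycles needed to issue every instruction of every shard on one issue slot.

    An instruction equal to "long" blocks its shard for `latency` cycles; anything
    else blocks it for 1 cycle. interleave=False runs the shards back to back;
    interleave=True switches to another ready shard whenever the current one stalls.
    """
    n = len(shards)
    pcs = [0] * n        # next instruction index per shard
    ready_at = [0] * n   # first cycle at which each shard may issue again
    current, cycle = 0, 0

    def finished(s: int) -> bool:
        return pcs[s] >= len(shards[s])

    while not all(finished(s) for s in range(n)):
        if interleave:
            ready = [s for s in range(n) if not finished(s) and ready_at[s] <= cycle]
            # Stay on the current shard while it can issue; otherwise switch.
            pick = current if current in ready else (ready[0] if ready else None)
        else:
            first = next(s for s in range(n) if not finished(s))
            pick = first if ready_at[first] <= cycle else None
        if pick is not None:
            instr = shards[pick][pcs[pick]]
            pcs[pick] += 1
            ready_at[pick] = cycle + (latency if instr == "long" else 1)
            current = pick
        cycle += 1  # advance one cycle (an idle cycle if nothing was issued)
    return cycle
```

With two shards of `["long", "alu", "long", "alu"]` and `latency=4`, the serialized schedule needs 20 cycles and the interleaved one needs 12. Both shards belong to the same warp, which is why this can help when few other warps are available to hide the stalls.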

Type: Grant

Filed: February 24, 2021

Date of Patent: March 19, 2024

Assignee: NVIDIA CORP.

Inventors: Sana Damani, Mark Stephenson, Ram Rangan, Daniel Robert Johnson, Rishkul Kulkarni

Techniques to selectively store data

Publication number: 20230305845

Abstract: Apparatuses, systems, and techniques to cause data to be selectively stored in one or more memory locations. In at least one embodiment, a processor is to cause data to be selectively stored in one or more memory locations based, at least in part, on one or more threads to use the data.
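The abstract does not name the memory locations or the selection criteria. The following is therefore a purely hypothetical placement policy that illustrates choosing a storage location from the set of threads that will use the data. In this sketch, data used by one thread stays thread-local, and data shared within a single thread block goes to block-shared memory. Anything else goes to global memory. The `Location` values, the `choose_location` name and the block-size parameter are assumptions.

```python
from enum import Enum


class Location(Enum):
    THREAD_LOCAL = "thread-local"
    BLOCK_SHARED = "block-shared"
    GLOBAL = "global"


def choose_location(consumer_tids: set[int], threads_per_block: int) -> Location:
    """Hypothetical policy: pick the narrowest location visible to every consumer thread."""
    if not consumer_tids:
        raise ValueError("data with no consumer threads need not be stored")
    if len(consumer_tids) == 1:
        return Location.THREAD_LOCAL
    blocks = {tid // threads_per_block for tid in consumer_tids}  # blocks the consumers live in
    return Location.BLOCK_SHARED if len(blocks) == 1 else Location.GLOBAL
```

Choosing the narrowest scope that every consumer can see is one plausible design goal, because narrower scopes are typically faster to access.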

Type: Application

Filed: March 31, 2022

Publication date: September 28, 2023

Inventors: Harold Carter Edwards, Stephen Anthony Bernard Jones, David Anthony Fontaine, Sebastian Piotr Jodlowski, Aditya Avinash Atluri, Andrew Robert Kerr, Michael Andrew Clark, Gonzalo Brito Gadeschi, Olivier Giroux, Jaydeep Marathe, Thibaut Lutz, Hariharan Sandanagobalane, Gokul Ramaswamy Hirisave Chandra Shekhara, Girish Bhaskarrao Bharambe, Rishkul Kulkarni, Konstantinos Kyriakopoulos

Techniques for divergent thread group execution scheduling

Publication number: 20220027194

Abstract: Warp sharding techniques to switch execution between divergent shards on instructions that trigger a long stall, thereby interleaving execution between diverged threads within a warp instead of across warps. The technique may be applied to mitigate pipeline stalls in applications with low warp occupancy and high divergence. Warp data cache locality may also be improved by concentrating memory accesses within a warp rather than spreading them across warps.

Type: Application

Filed: February 24, 2021

Publication date: January 27, 2022

Applicant: NVIDIA Corp.

Inventors: Sana Damani, Mark Stephenson, Ram Rangan, Daniel Robert Johnson, Rishkul Kulkarni