Patents by Inventor Rishkul Kulkarni
Rishkul Kulkarni has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20260161413
Abstract: Apparatuses, systems, and techniques to perform operations in a processor asynchronously. In at least one embodiment, a processor performs at least one tensor instruction concurrently with one or more other instructions based, at least in part, on one or more indicators of the tensor instruction being asynchronous.
Type: Application
Filed: December 9, 2024
Publication date: June 11, 2026
Inventors: Harold Carter Edwards, Vijay Harshad Thakkar, Gokul Ramaswamy Hirisave Chandra Shekhara, Edward H. Gornish, Rishkul Kulkarni, Maciej Piotr Tyrlik, Sean Jeffrey Treichler, Chao Li, Subhasmita Chakraborty, Daniel Joseph Lustig, Arjun Hans
-
Publication number: 20260064413
Abstract: Apparatuses, systems, and techniques to perform an instruction to use storage to store information to be used exclusively by one or more tensor operations. In at least one embodiment, a processor retrieves information from storage that exclusively stores matrix information in response to an instruction and performs a multiplication computation using said matrix information.
Type: Application
Filed: August 29, 2024
Publication date: March 5, 2026
Inventors: Harold Carter Edwards, Vijay Harshad Thakkar, Gokul Ramaswamy Hirisave Chandra Shekhara, Edward H. Gornish, Rishkul Kulkarni, Maciej Piotr Tyrlik, Sean Jeffrey Treichler, Chao Li
-
Publication number: 20260064803
Abstract: Apparatuses, systems, and techniques to perform a matrix multiply accumulate (MMA) instruction to cause a plurality of portions of an MMA operation to be performed using a corresponding plurality of MMA accelerators. In at least one embodiment, a processor retrieves a plurality of matrix information from a memory that exclusively stores and performs a multiplication computation using said matrix information.
Type: Application
Filed: August 29, 2024
Publication date: March 5, 2026
Inventors: Harold Carter Edwards, Vijay Harshad Thakkar, Gokul Ramaswamy Hirisave Chandra Shekhara, Edward H. Gornish, Rishkul Kulkarni, Maciej Piotr Tyrlik, Sean Jeffrey Treichler, Chao Li
-
Publication number: 20250383879
Abstract: Apparatuses, systems, and techniques to adapt instructions in a SIMT architecture for execution on serial execution units. In at least one embodiment, a predicate mask is initialized to identify a group of active threads associated with an instruction. The predicate mask is initialized with an inherited predicate of the instruction. The instruction is executed for a set of one or more threads selected from the group of active threads using a serial execution unit.
Type: Application
Filed: August 29, 2025
Publication date: December 18, 2025
Inventors: Aditya Avinash Atluri, Jack Choquette, Carter Edwards, Olivier Giroux, Praveen Kumar Kaushik, Ronny Krashinsky, Rishkul Kulkarni, Konstantinos Kyriakopoulos
-
Patent number: 12499065
Abstract: Apparatuses, systems, and techniques to cause one or more first storage address sizes to be converted into one or more second storage address sizes. In at least one embodiment, one or more circuits are to perform an application programming interface (API) to cause one or more first storage address sizes to be converted to one or more second storage address sizes based, at least in part, on one or more identifiers of one or more physical storage locations corresponding to either of the one or more first storage address sizes or the one or more second storage address sizes.
Type: Grant
Filed: January 25, 2024
Date of Patent: December 16, 2025
Assignee: NVIDIA Corporation
Inventors: Yashwardhan Narawane, Ze Long, Rishkul Kulkarni, Harold Carter Edwards, Vikram Dhar
-
Publication number: 20250355674
Abstract: Apparatuses, systems, and techniques to compile and modify software programs. In at least one embodiment, a software program is to be modified to initialize information to be used by one or more application programming interfaces (APIs).
Type: Application
Filed: May 17, 2024
Publication date: November 20, 2025
Inventors: Harold Carter Edwards, Stephen Jones, Michael Murphy, Advait Soman, Anis Ladram, Ze Long, Kyrylo Perelygin, Piotr Tomasz Ciolkosz, Kwang Hui Mark Theng, Rishkul Kulkarni, Girish Bhaskarrao Bharambe, Gregory Paul Smith
-
Publication number: 20250355647
Abstract: Apparatuses, systems, and techniques to compile and modify software programs. In at least one embodiment, a software program is to be modified to initialize information to be used by one or more application programming interfaces (APIs).
Type: Application
Filed: May 17, 2024
Publication date: November 20, 2025
Inventors: Harold Carter Edwards, Stephen Jones, Michael Murphy, Advait Soman, Anis Ladram, Ze Long, Kyrylo Perelygin, Piotr Tomasz Ciolkosz, Kwang Hui Mark Theng, Rishkul Kulkarni, Girish Bhaskarrao Bharambe, Gregory Paul Smith
-
Publication number: 20250335196
Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to cause one or more other computational operations to wait until a portion of matrix multiply-accumulate (MMA) operations have been performed.
Type: Application
Filed: December 11, 2024
Publication date: October 30, 2025
Inventors: Harold Carter Edwards, Kyrylo Perelygin, Maciej Tyrlik, Gokul Ramaswamy Hirisave Chandra Shekhara, Balaji Krishna Yugandhar Atukuri, Rishkul Kulkarni, Konstantinos Kyriakopoulos, Edward H. Gornish, David Allan Berson, Bageshri Sathe, James Player, Aman Arora, Alan Kaatz, Andrew Kerr, Haicheng Wu, Cris Cecka, Vijay Thakkar, Sean Treichler, Jack H. Choquette, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Cody Addison, Girish Bhaskarrao Bharambe
-
Patent number: 12405801
Abstract: Apparatuses, systems, and techniques to adapt instructions in a SIMT architecture for execution on serial execution units. In at least one embodiment, a set of one or more threads is selected from a group of active threads associated with an instruction and the instruction is executed for the set of one or more threads on a serial execution unit.
Type: Grant
Filed: February 3, 2023
Date of Patent: September 2, 2025
Assignee: NVIDIA Corporation
Inventors: Aditya Avinash Atluri, Jack Choquette, Carter Edwards, Olivier Giroux, Praveen Kumar Kaushik, Ronny Krashinsky, Rishkul Kulkarni, Konstantinos Kyriakopoulos
-
Patent number: 12271765
Abstract: Various embodiments include a parallel processing computer system that enables parallel instances of a program to synchronize at disparate addresses in memory. When the parallel program instances need to exchange data, the program instances synchronize based on a mask that identifies the program instances that are synchronizing. As each program instance reaches the point of synchronization, the program instance blocks and waits for all other program instances to reach the point of synchronization. When all program instances have reached the point of synchronization, at least one program instance executes a synchronous operation to exchange data. The program instances then continue execution at respective and disparate return addresses.
Type: Grant
Filed: June 3, 2021
Date of Patent: April 8, 2025
Assignee: NVIDIA CORPORATION
Inventors: Ajay Sudarshan Tirumala, Olivier Giroux, Peter Nelson, Gary M. Tarolli, Ankita Upreti, Konstantinos Kyriakopoulos, Divya Shanmughan, Rishkul Kulkarni
-
Patent number: 12204897
Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to cause one or more other computational operations to wait until a portion of matrix multiply-accumulate (MMA) operations have been performed.
Type: Grant
Filed: November 30, 2022
Date of Patent: January 21, 2025
Assignee: NVIDIA CORPORATION
Inventors: Harold Carter Edwards, Kyrylo Perelygin, Maciej Tyrlik, Gokul Ramaswamy Hirisave Chandra Shekhara, Balaji Krishna Yugandhar Atukuri, Rishkul Kulkarni, Konstantinos Kyriakopoulos, Edward H. Gornish, David Allan Berson, Bageshri Sathe, James Player, Aman Arora, Alan Kaatz, Andrew Kerr, Haicheng Wu, Cris Cecka, Vijay Thakkar, Sean Treichler, Jack H. Choquette, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Cody Addison, Girish Bhaskarrao Bharambe
-
Publication number: 20240168762
Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to cause one or more other computational operations to wait until a portion of matrix multiply-accumulate (MMA) operations have been performed.
Type: Application
Filed: November 30, 2022
Publication date: May 23, 2024
Inventors: Harold Carter Edwards, Kyrylo Perelygin, Maciej Tyrlik, Gokul Ramaswamy Hirisave Chandra Shekhara, Balaji Krishna Yugandhar Atukuri, Rishkul Kulkarni, Konstantinos Kyriakopoulos, Edward H. Gornish, David Allan Berson, Bageshri Sathe, James Player, Aman Arora, Alan Kaatz, Andrew Kerr, Haicheng Wu, Cris Cecka, Vijay Thakkar, Sean Treichler, Jack H. Choquette, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Cody Addison, Girish Bhaskarrao Bharambe
-
Publication number: 20240169022
Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to cause one or more other computational operations to wait until matrix multiply-accumulate (MMA) memory transactions are performed.
Type: Application
Filed: November 30, 2022
Publication date: May 23, 2024
Inventors: Harold Carter Edwards, Kyrylo Perelygin, Maciej Tyrlik, Gokul Ramaswamy Hirisave Chandra Shekhara, Balaji Krishna Yugandhar Atukuri, Rishkul Kulkarni, Konstantinos Kyriakopoulos, Edward H. Gornish, David Allan Berson, Bageshri Sathe, James Player, Aman Arora, Alan Kaatz, Andrew Kerr, Haicheng Wu, Cris Cecka, Vijay Thakkar, Sean Treichler, Jack H. Choquette, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Cody Addison, Girish Bhaskarrao Bharambe
-
Publication number: 20240169023
Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to indicate whether matrix multiply-accumulate (MMA) memory operations are complete.
Type: Application
Filed: November 30, 2022
Publication date: May 23, 2024
Inventors: Harold Carter Edwards, Kyrylo Perelygin, Maciej Tyrlik, Gokul Ramaswamy Hirisave Chandra Shekhara, Balaji Krishna Yugandhar Atukuri, Rishkul Kulkarni, Konstantinos Kyriakopoulos, Edward H. Gornish, David Allan Berson, Bageshri Sathe, James Player, Aman Arora, Alan Kaatz, Andrew Kerr, Haicheng Wu, Cris Cecka, Vijay Thakkar, Sean Treichler, Jack H. Choquette, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Cody Addison, Girish Bhaskarrao Bharambe
-
Publication number: 20240168763
Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to cause two or more other computational operations to be performed by two or more streaming multiprocessors (SMs).
Type: Application
Filed: November 30, 2022
Publication date: May 23, 2024
Inventors: Harold Carter Edwards, Kyrylo Perelygin, Maciej Tyrlik, Gokul Ramaswamy Hirisave Chandra Shekhara, Balaji Krishna Yugandhar Atukuri, Rishkul Kulkarni, Konstantinos Kyriakopoulos, Edward H. Gornish, David Allan Berson, Bageshri Sathe, James Player, Aman Arora, Alan Kaatz, Andrew Kerr, Haicheng Wu, Cris Cecka, Vijay Thakkar, Sean Treichler, Jack H. Choquette, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Cody Addison, Girish Bhaskarrao Bharambe
-
Publication number: 20240118899
Abstract: Apparatuses, systems, and techniques to adapt instructions in a SIMT architecture for execution on serial execution units. In at least one embodiment, a set of one or more threads is selected from a group of active threads associated with an instruction and the instruction is executed for the set of one or more threads on a serial execution unit.
Type: Application
Filed: February 3, 2023
Publication date: April 11, 2024
Inventors: Aditya Avinash Atluri, Jack Choquette, Carter Edwards, Olivier Giroux, Praveen Kumar Kaushik, Ronny Krashinsky, Rishkul Kulkarni, Konstantinos Kyriakopoulos
-
Patent number: 11934867
Abstract: Warp sharding techniques to switch execution between divergent shards on instructions that trigger a long stall, thereby interleaving execution between diverged threads within a warp instead of across warps. The technique may be applied to mitigate pipeline stalls in applications with low warp occupancy and high divergence. Warp data cache locality may also be improved by concentrating memory accesses within a warp rather than spreading them across warps.
Type: Grant
Filed: February 24, 2021
Date of Patent: March 19, 2024
Assignee: NVIDIA CORP.
Inventors: Sana Damani, Mark Stephenson, Ram Rangan, Daniel Robert Johnson, Rishkul Kulkarni
-
Publication number: 20230305845
Abstract: Apparatuses, systems, and techniques to cause data to be selectively stored in one or more memory locations. In at least one embodiment, a processor is to cause data to be selectively stored in one or more memory locations based, at least in part, on one or more threads to use the data.
Type: Application
Filed: March 31, 2022
Publication date: September 28, 2023
Inventors: Harold Carter Edwards, Stephen Anthony Bernard Jones, David Anthony Fontaine, Sebastian Piotr Jodlowski, Aditya Avinash Atluri, Andrew Robert Kerr, Michael Andrew Clark, Gonzalo Brito Gadeschi, Olivier Giroux, Jaydeep Marathe, Thibaut Lutz, Hariharan Sandanagobalane, Gokul Ramaswamy Hirisave Chandra Shekhara, Girish Bhaskarrao Bharambe, Rishkul Kulkarni, Konstantinos Kyriakopoulos
-
Publication number: 20220027194
Abstract: Warp sharding techniques to switch execution between divergent shards on instructions that trigger a long stall, thereby interleaving execution between diverged threads within a warp instead of across warps. The technique may be applied to mitigate pipeline stalls in applications with low warp occupancy and high divergence. Warp data cache locality may also be improved by concentrating memory accesses within a warp rather than spreading them across warps.
Type: Application
Filed: February 24, 2021
Publication date: January 27, 2022
Applicant: NVIDIA Corp.
Inventors: Sana Damani, Mark Stephenson, Ram Rangan, Daniel Robert Johnson, Rishkul Kulkarni