Patents by Inventor Alan Kaatz

Alan Kaatz has filed for patents to protect the following inventions. This listing includes both pending patent applications and patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12204897
    Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to cause one or more other computational operations to wait until a portion of matrix multiply-accumulate (MMA) operations have been performed.
    Type: Grant
    Filed: November 30, 2022
    Date of Patent: January 21, 2025
    Assignee: NVIDIA Corporation
    Inventors: Harold Carter Edwards, Kyrylo Perelygin, Maciej Tyrlik, Gokul Ramaswamy Hirisave Chandra Shekhara, Balaji Krishna Yugandhar Atukuri, Rishkul Kulkarni, Konstantinos Kyriakopoulos, Edward H. Gornish, David Allan Berson, Bageshri Sathe, James Player, Aman Arora, Alan Kaatz, Andrew Kerr, Haicheng Wu, Cris Cecka, Vijay Thakkar, Sean Treichler, Jack H. Choquette, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Cody Addison, Girish Bhaskarrao Bharambe
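The abstract above describes dependent operations waiting until a *portion* of matrix multiply-accumulate work has completed, rather than waiting for everything. As a rough conceptual sketch only (hypothetical names, not the patented mechanism or any NVIDIA API), a partial-completion barrier can be modeled with a counter that consumers wait on:

```python
import threading

class PartialCompletionBarrier:
    """Consumers wait until a given portion of producer tasks has completed."""
    def __init__(self):
        self._done = 0
        self._cv = threading.Condition()

    def arrive(self):
        with self._cv:
            self._done += 1
            self._cv.notify_all()

    def wait_for(self, threshold):
        with self._cv:
            self._cv.wait_for(lambda: self._done >= threshold)

def run_demo(n_tasks=8, threshold=4):
    barrier = PartialCompletionBarrier()
    results, lock = [], threading.Lock()

    def mma_tile(i):
        with lock:
            results.append(i * i)  # stand-in for one MMA tile computation
        barrier.arrive()           # signal that this portion is done

    threads = [threading.Thread(target=mma_tile, args=(i,)) for i in range(n_tasks)]
    for t in threads:
        t.start()
    barrier.wait_for(threshold)    # dependent work may begin at partial completion
    for t in threads:
        t.join()
    return sorted(results)
```

Here dependent work is released once `threshold` of the `n_tasks` producers have arrived, which is the scheduling freedom the abstract alludes to.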
  • Patent number: 12141082
    Abstract: A parallel processing unit comprises a plurality of processors, each coupled to memory access hardware circuitry. Each memory access hardware circuitry is configured to receive, from the coupled processor, a memory access request specifying a coordinate of a multidimensional data structure, wherein the memory access hardware circuitry is one of a plurality of such circuitry, each coupled to a respective one of the processors; and, in response to the memory access request, to translate the coordinate of the multidimensional data structure into plural memory addresses for the multidimensional data structure and, using the plural memory addresses, to asynchronously transfer at least a portion of the multidimensional data structure for processing by at least the coupled processor. The memory locations may be in the shared memory of the coupled processor and/or an external memory.
    Type: Grant
    Filed: March 10, 2022
    Date of Patent: November 12, 2024
    Assignee: NVIDIA Corporation
    Inventors: Alexander L. Minkin, Alan Kaatz, Olivier Giroux, Jack Choquette, Shirish Gadre, Manan Patel, John Tran, Ronny Krashinsky, Jeff Schottmiller
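The core idea in this abstract is translating a single multidimensional coordinate into the plural linear addresses that an asynchronous transfer must touch. A minimal sketch of that translation (hypothetical helper, 2-D row-major layout assumed; the hardware circuitry itself is far more involved):

```python
def coordinate_to_addresses(base, shape, strides, tile_origin, tile_shape, elem_size):
    """Translate a multidimensional tile coordinate into the list of linear
    byte addresses covering that tile (2-D, row-major, for brevity)."""
    addrs = []
    for r in range(tile_shape[0]):
        for c in range(tile_shape[1]):
            row, col = tile_origin[0] + r, tile_origin[1] + c
            # Out-of-bounds elements at the tensor edge are simply skipped.
            if row < shape[0] and col < shape[1]:
                addrs.append(base + (row * strides[0] + col * strides[1]) * elem_size)
    return addrs
```

For example, requesting the 2x2 tile at coordinate (1, 1) of a 4x4 float tensor yields the four addresses of its elements; a transfer engine could then move exactly those bytes without the processor computing any address itself.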
  • Publication number: 20240169472
    Abstract: Apparatuses, systems, and techniques to perform a tensor prefetch instruction to cause one or more tensors to be transformed and stored into one or more caches. In at least one embodiment, one or more circuits of a GPU are to perform a tensor prefetch instruction to cause one or more tensors to be transformed and stored into one or more GPU caches.
    Type: Application
    Filed: December 21, 2022
    Publication date: May 23, 2024
    Inventors: Harold Carter Edwards, Stephen Anthony Bernard Jones, Alexander Lev Minkin, Olivier Giroux, Gokul Ramaswamy Hirisave Chandra Shekhara, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Alan Kaatz, Andrew Robert Kerr, Jack H. Choquette
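This abstract combines prefetching with a transform applied in flight. As an illustrative sketch only (not the patented instruction), one such transform is transposing a tile while staging it, so that later column-wise accesses become contiguous:

```python
def prefetch_transpose(matrix):
    """Stage a matrix into a simulated cache, transposing it in flight so that
    subsequent column accesses read contiguous cached entries."""
    rows, cols = len(matrix), len(matrix[0])
    return [[matrix[r][c] for r in range(rows)] for c in range(cols)]
```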
  • Publication number: 20240169469
    Abstract: Apparatuses, systems, and techniques to transform information corresponding to one or more memory transactions. In at least one embodiment, one or more circuits are to perform an application programming interface (API) to cause information corresponding to one or more memory transactions resulting from performance of the API to be transformed.
    Type: Application
    Filed: December 21, 2022
    Publication date: May 23, 2024
    Inventors: Harold Carter Edwards, Stephen Anthony Bernard Jones, Alexander Lev Minkin, Olivier Giroux, Gokul Ramaswamy Hirisave Chandra Shekhara, Aditya Avinash Atluri, Apoorv Parle, Chao Li, Ronny Meir Krashinsky, Alan Kaatz, Andrew Robert Kerr, Jack H. Choquette
  • Publication number: 20240169471
    Abstract: Apparatuses, systems, and techniques to perform a graphics processing unit (GPU) prefetch instruction to cause a variable amount of information to be stored into one or more GPU caches. In at least one embodiment, one or more circuits of a GPU are to perform a GPU prefetch instruction to cause a variable amount of information to be stored into one or more GPU caches.
    Type: Application
    Filed: December 21, 2022
    Publication date: May 23, 2024
    Inventors: Harold Carter Edwards, Stephen Anthony Bernard Jones, Alexander Lev Minkin, Olivier Giroux, Gokul Ramaswamy Hirisave Chandra Shekhara, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Alan Kaatz, Andrew Robert Kerr, Jack H. Choquette
  • Publication number: 20240169022
    Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to cause one or more other computational operations to wait until matrix multiply-accumulate (MMA) memory transactions are performed.
    Type: Application
    Filed: November 30, 2022
    Publication date: May 23, 2024
    Inventors: Harold Carter Edwards, Kyrylo Perelygin, Maciej Tyrlik, Gokul Ramaswamy Hirisave Chandra Shekhara, Balaji Krishna Yugandhar Atukuri, Rishkul Kulkarni, Konstantinos Kyriakopoulos, Edward H. Gornish, David Allan Berson, Bageshri Sathe, James Player, Aman Arora, Alan Kaatz, Andrew Kerr, Haicheng Wu, Cris Cecka, Vijay Thakkar, Sean Treichler, Jack H. Choquette, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Cody Addison, Girish Bhaskarrao Bharambe
  • Publication number: 20240168829
    Abstract: Apparatuses, systems, and techniques to generate a tensor mapping. In at least one embodiment, one or more circuits are to perform an application programming interface (API) to cause a mapping from a first tensor to a second tensor to be generated.
    Type: Application
    Filed: December 21, 2022
    Publication date: May 23, 2024
    Inventors: Harold Carter Edwards, Stephen Anthony Bernard Jones, Alexander Lev Minkin, Olivier Giroux, Gokul Ramaswamy Hirisave Chandra Shekhara, Vishalkumar Ketankumar Mehta, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Alan Kaatz, Andrew Robert Kerr, Jack H. Choquette
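A tensor mapping of the kind this abstract describes relates element coordinates in one tensor to a tiled layout in another. A toy sketch of the arithmetic (hypothetical function, not the patented API): split each coordinate into a tile index and an offset within the tile.

```python
def tensor_map(coord, tile):
    """Split an element coordinate into (tile index, offset within tile)."""
    tile_idx = tuple(c // t for c, t in zip(coord, tile))
    offset = tuple(c % t for c, t in zip(coord, tile))
    return tile_idx, offset
```

For instance, with 4x4 tiles, element (5, 7) lives at offset (1, 3) inside tile (1, 1); a generated mapping descriptor lets hardware or software move whole tiles between the two tensors without per-element bookkeeping.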
  • Publication number: 20240169023
    Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to indicate whether matrix multiply-accumulate (MMA) memory operations are complete.
    Type: Application
    Filed: November 30, 2022
    Publication date: May 23, 2024
    Inventors: Harold Carter Edwards, Kyrylo Perelygin, Maciej Tyrlik, Gokul Ramaswamy Hirisave Chandra Shekhara, Balaji Krishna Yugandhar Atukuri, Rishkul Kulkarni, Konstantinos Kyriakopoulos, Edward H. Gornish, David Allan Berson, Bageshri Sathe, James Player, Aman Arora, Alan Kaatz, Andrew Kerr, Haicheng Wu, Cris Cecka, Vijay Thakkar, Sean Treichler, Jack H. Choquette, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Cody Addison, Girish Bhaskarrao Bharambe
  • Publication number: 20240168763
    Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to cause two or more other computational operations to be performed by two or more streaming multiprocessors (SMs).
    Type: Application
    Filed: November 30, 2022
    Publication date: May 23, 2024
    Inventors: Harold Carter Edwards, Kyrylo Perelygin, Maciej Tyrlik, Gokul Ramaswamy Hirisave Chandra Shekhara, Balaji Krishna Yugandhar Atukuri, Rishkul Kulkarni, Konstantinos Kyriakopoulos, Edward H. Gornish, David Allan Berson, Bageshri Sathe, James Player, Aman Arora, Alan Kaatz, Andrew Kerr, Haicheng Wu, Cris Cecka, Vijay Thakkar, Sean Treichler, Jack H. Choquette, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Cody Addison, Girish Bhaskarrao Bharambe
  • Publication number: 20240168762
    Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to cause one or more other computational operations to wait until a portion of matrix multiply-accumulate (MMA) operations have been performed.
    Type: Application
    Filed: November 30, 2022
    Publication date: May 23, 2024
    Inventors: Harold Carter Edwards, Kyrylo Perelygin, Maciej Tyrlik, Gokul Ramaswamy Hirisave Chandra Shekhara, Balaji Krishna Yugandhar Atukuri, Rishkul Kulkarni, Konstantinos Kyriakopoulos, Edward H. Gornish, David Allan Berson, Bageshri Sathe, James Player, Aman Arora, Alan Kaatz, Andrew Kerr, Haicheng Wu, Cris Cecka, Vijay Thakkar, Sean Treichler, Jack H. Choquette, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Cody Addison, Girish Bhaskarrao Bharambe
  • Publication number: 20240169470
    Abstract: Apparatuses, systems, and techniques to store information in a plurality of storage locations allocated to a graphics processing unit (GPU). In at least one embodiment, one or more circuits are to perform an application programming interface (API) to cause information to be stored in a plurality of storage locations allocated to a first GPU.
    Type: Application
    Filed: December 21, 2022
    Publication date: May 23, 2024
    Inventors: Harold Carter Edwards, Stephen Anthony Bernard Jones, Alexander Lev Minkin, Olivier Giroux, Gokul Ramaswamy Hirisave Chandra Shekhara, Vishalkumar Ketankumar Mehta, Aditya Avinash Atluri, Apoorv Parle, Chao Li, Ronny Meir Krashinsky, Alan Kaatz, Andrew Robert Kerr, Jack H. Choquette
  • Publication number: 20240168831
    Abstract: Apparatuses, systems, and techniques to cause a first tensor to be translated into a second tensor according to a tensor map. In at least one embodiment, one or more circuits are to perform an application programming interface (API) to cause a first tensor to be translated into a second tensor according to a tensor map.
    Type: Application
    Filed: December 21, 2022
    Publication date: May 23, 2024
    Inventors: Harold Carter Edwards, Stephen Anthony Bernard Jones, Alexander Lev Minkin, Olivier Giroux, Gokul Ramaswamy Hirisave Chandra Shekhara, Aditya Avinash Atluri, Apoorv Parle, Chao Li, Ronny Meir Krashinsky, Alan Kaatz, Andrew Robert Kerr, Jack H. Choquette
  • Publication number: 20240168830
    Abstract: Apparatuses, systems, and techniques to indicate storage locations of information to be mapped from a first tensor to a second tensor. In at least one embodiment, one or more circuits are to perform an application programming interface (API) to indicate one or more storage locations of information to be mapped from a first tensor to a second tensor.
    Type: Application
    Filed: December 21, 2022
    Publication date: May 23, 2024
    Inventors: Harold Carter Edwards, Stephen Anthony Bernard Jones, Alexander Lev Minkin, Olivier Giroux, Gokul Ramaswamy Hirisave Chandra Shekhara, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Alan Kaatz, Andrew Robert Kerr, Jack H. Choquette
  • Publication number: 20240168659
    Abstract: Apparatuses, systems, and techniques to transform and store information corresponding to one or more memory transactions. In at least one embodiment, one or more circuits are to perform an application programming interface (API) to cause information corresponding to one or more memory transactions resulting from performance of the API to be transformed and stored.
    Type: Application
    Filed: December 21, 2022
    Publication date: May 23, 2024
    Inventors: Harold Carter Edwards, Stephen Anthony Bernard Jones, Alexander Lev Minkin, Olivier Giroux, Gokul Ramaswamy Hirisave Chandra Shekhara, Aditya Avinash Atluri, Apoorv Parle, Chao Li, Ronny Meir Krashinsky, Alan Kaatz, Andrew Robert Kerr, Jack H. Choquette
  • Publication number: 20240168765
    Abstract: Apparatuses, systems, and techniques to perform a tensor prefetch instruction to cause one or more tensors to be stored into one or more caches. In at least one embodiment, one or more circuits of a GPU are to perform a tensor prefetch instruction to cause one or more tensors to be stored into one or more GPU caches.
    Type: Application
    Filed: December 21, 2022
    Publication date: May 23, 2024
    Inventors: Harold Carter Edwards, Stephen Anthony Bernard Jones, Alexander Lev Minkin, Olivier Giroux, Gokul Ramaswamy Hirisave Chandra Shekhara, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Alan Kaatz, Andrew Robert Kerr, Jack H. Choquette
  • Publication number: 20240161224
    Abstract: Apparatuses, systems, and techniques to cause a first tensor to be translated into a second tensor according to a tensor map without storing information about a memory transaction corresponding to the translation. In at least one embodiment, one or more circuits are to perform an application programming interface (API) to cause a first tensor to be translated into a second tensor according to a tensor map without storing information about one or more memory transactions corresponding to the translation.
    Type: Application
    Filed: December 21, 2022
    Publication date: May 16, 2024
    Inventors: Harold Carter Edwards, Stephen Anthony Bernard Jones, Alexander Lev Minkin, Olivier Giroux, Gokul Ramaswamy Hirisave Chandra Shekhara, Vishalkumar Ketankumar Mehta, Aditya Avinash Atluri, Apoorv Parle, Chao Li, Ronny Meir Krashinsky, Alan Kaatz, Andrew Robert Kerr, Jack H. Choquette
  • Publication number: 20240161222
    Abstract: Apparatuses, systems, and techniques to indicate how to generate image-to-column transformations. In at least one embodiment, one or more circuits are to perform an application programming interface (API) to indicate how to generate one or more image-to-column transformations.
    Type: Application
    Filed: December 21, 2022
    Publication date: May 16, 2024
    Inventors: Harold Carter Edwards, Stephen Anthony Bernard Jones, Alexander Lev Minkin, Olivier Giroux, Gokul Ramaswamy Hirisave Chandra Shekhara, Vishalkumar Ketankumar Mehta, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Alan Kaatz, Andrew Robert Kerr, Jack H. Choquette
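Image-to-column (im2col) transformations are a standard way to turn convolution into matrix multiplication: each kernel-sized patch of the image is unrolled into one row/column of a matrix. A minimal reference sketch (plain 2-D case, no channels or strides; illustrative only):

```python
def im2col(image, kh, kw):
    """Unroll each kh x kw patch of a 2-D image into a row, so convolution
    with a kh x kw kernel becomes a single matrix multiply."""
    h, w = len(image), len(image[0])
    cols = []
    for r in range(h - kh + 1):
        for c in range(w - kw + 1):
            patch = [image[r + i][c + j] for i in range(kh) for j in range(kw)]
            cols.append(patch)
    return cols
```

On a 3x3 image with a 2x2 kernel this produces four rows of four elements, one per valid patch position.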
  • Publication number: 20240161223
    Abstract: Apparatuses, systems, and techniques to cause a first tensor to be translated into a second tensor according to a tensor map. In at least one embodiment, one or more circuits are to perform an application programming interface (API) to cause a first tensor to be translated into a second tensor according to a tensor map.
    Type: Application
    Filed: December 21, 2022
    Publication date: May 16, 2024
    Inventors: Harold Carter Edwards, Stephen Anthony Bernard Jones, Alexander Lev Minkin, Olivier Giroux, Gokul Ramaswamy Hirisave Chandra Shekhara, Vishalkumar Ketankumar Mehta, Aditya Avinash Atluri, Apoorv Parle, Chao Li, Ronny Meir Krashinsky, Alan Kaatz, Andrew Robert Kerr, Jack H. Choquette
  • Patent number: 11907717
    Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.
    Type: Grant
    Filed: February 8, 2023
    Date of Patent: February 20, 2024
    Assignee: NVIDIA Corporation
    Inventors: Andrew Kerr, Jack Choquette, Xiaogang Qiu, Omkar Paranjape, Poornachandra Rao, Shirish Gadre, Steven J. Heinrich, Manan Patel, Olivier Giroux, Alan Kaatz
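The block-transfer technique above moves data from global memory into shared memory without routing it through registers, which also enables overlapping transfers with compute. A simplified double-buffering sketch of that overlap pattern (hypothetical helper in host code; the patent concerns the hardware instruction itself):

```python
def staged_copy_compute(global_mem, tile, consume):
    """Sketch of double-buffered block transfer: while one staging buffer is
    consumed, the next tile is fetched, so loads overlap with compute."""
    n_tiles = (len(global_mem) + tile - 1) // tile
    buffers = [None, None]
    buffers[0] = global_mem[0:tile]                    # prefetch the first tile
    out = []
    for t in range(n_tiles):
        if t + 1 < n_tiles:                            # issue the next copy early
            buffers[(t + 1) % 2] = global_mem[(t + 1) * tile:(t + 2) * tile]
        out.append(consume(buffers[t % 2]))            # compute on the ready buffer
    return out
```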
  • Publication number: 20230297643
    Abstract: Matrix multiplication operations can be implemented, at least in part, on one or more tensor cores of a parallel processing unit. An efficiency of the matrix multiplication operations can be improved in cases where one of the input operands or the output operand of the matrix multiplication operation is a square matrix having a triangular data pattern. In such cases, the number of computations performed by the tensor cores of the parallel processing unit can be reduced by dropping computations and/or masking out elements of the square matrix input operand on one side of the main diagonal of the square matrix. In other cases where the output operand exhibits the triangular data pattern, computations can be dropped or masked out for the invalid side of the main diagonal of the square matrix. In an embodiment, a library implementing the matrix multiplication operations is provided.
    Type: Application
    Filed: March 21, 2022
    Publication date: September 21, 2023
    Inventors: Aniket Shivam, Andrew Kerr, Haicheng Wu, Manish Gupta, Nikita Shustrov, Qing Yang, Alan Kaatz, Aditya Avinash Atluri
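The efficiency gain described above comes from skipping computations on the invalid side of a triangular matrix's main diagonal instead of multiplying by its zeros. A scalar reference sketch of the idea (illustrative only; the patent targets tensor cores and the accompanying library):

```python
def triangular_matmul(a, b, lower=True):
    """Multiply a x b where `a` is known triangular: the inner loop visits only
    the valid side of the main diagonal, skipping work on the zero side."""
    n, m = len(a), len(b[0])
    out = [[0] * m for _ in range(n)]
    for i in range(n):
        ks = range(0, i + 1) if lower else range(i, len(b))
        for k in ks:                      # masked-out k values are never touched
            for j in range(m):
                out[i][j] += a[i][k] * b[k][j]
    return out
```

For a lower-triangular `a`, row `i` performs only `i + 1` of the usual `n` inner products, roughly halving the work over the whole matrix.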