Patents by Inventor Alan Kaatz

Alan Kaatz has filed for patents to protect the following inventions. This listing includes both pending patent applications and patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12204897
    Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to cause one or more other computational operations to wait until a portion of matrix multiply-accumulate (MMA) operations have been performed.
    Type: Grant
    Filed: November 30, 2022
    Date of Patent: January 21, 2025
    Assignee: NVIDIA Corporation
    Inventors: Harold Carter Edwards, Kyrylo Perelygin, Maciej Tyrlik, Gokul Ramaswamy Hirisave Chandra Shekhara, Balaji Krishna Yugandhar Atukuri, Rishkul Kulkarni, Konstantinos Kyriakopoulos, Edward H. Gornish, David Allan Berson, Bageshri Sathe, James Player, Aman Arora, Alan Kaatz, Andrew Kerr, Haicheng Wu, Cris Cecka, Vijay Thakkar, Sean Treichler, Jack H. Choquette, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Cody Addison, Girish Bhaskarrao Bharambe
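The abstract above describes dependent operations waiting until a *portion* of matrix multiply-accumulate work has completed, rather than waiting for everything. As a rough conceptual sketch only (hypothetical names, not the patented mechanism or any NVIDIA API), a partial-completion barrier can be modeled with a counter that consumers wait on:

```python
import threading

class PartialCompletionBarrier:
    """Consumers wait until a given portion of producer tasks has completed."""
    def __init__(self):
        self._done = 0
        self._cv = threading.Condition()

    def arrive(self):
        with self._cv:
            self._done += 1
            self._cv.notify_all()

    def wait_for(self, threshold):
        with self._cv:
            self._cv.wait_for(lambda: self._done >= threshold)

def run_demo(n_tasks=8, threshold=4):
    barrier = PartialCompletionBarrier()
    results, lock = [], threading.Lock()

    def mma_tile(i):
        with lock:
            results.append(i * i)  # stand-in for one MMA tile computation
        barrier.arrive()           # signal that this portion is done

    threads = [threading.Thread(target=mma_tile, args=(i,)) for i in range(n_tasks)]
    for t in threads:
        t.start()
    barrier.wait_for(threshold)    # dependent work may begin at partial completion
    for t in threads:
        t.join()
    return sorted(results)
```

Here dependent work is released once `threshold` of the `n_tasks` producers have arrived, which is the scheduling freedom the abstract alludes to.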
  • Patent number: 12141082
    Abstract: A parallel processing unit comprises a plurality of processors, each coupled to memory access hardware circuitry. Each memory access hardware circuitry is configured to receive, from the coupled processor, a memory access request specifying a coordinate of a multidimensional data structure, wherein the memory access hardware circuitry is one of a plurality of such circuitry, each coupled to a respective one of the processors; and, in response to the memory access request, to translate the coordinate of the multidimensional data structure into plural memory addresses for the multidimensional data structure and, using the plural memory addresses, to asynchronously transfer at least a portion of the multidimensional data structure for processing by at least the coupled processor. The memory locations may be in the shared memory of the coupled processor and/or an external memory.
    Type: Grant
    Filed: March 10, 2022
    Date of Patent: November 12, 2024
    Assignee: NVIDIA Corporation
    Inventors: Alexander L. Minkin, Alan Kaatz, Olivier Giroux, Jack Choquette, Shirish Gadre, Manan Patel, John Tran, Ronny Krashinsky, Jeff Schottmiller
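The core idea in this abstract is translating a single multidimensional coordinate into the plural linear addresses that an asynchronous transfer must touch. A minimal sketch of that translation (hypothetical helper, 2-D row-major layout assumed; the hardware circuitry itself is far more involved):

```python
def coordinate_to_addresses(base, shape, strides, tile_origin, tile_shape, elem_size):
    """Translate a multidimensional tile coordinate into the list of linear
    byte addresses covering that tile (2-D, row-major, for brevity)."""
    addrs = []
    for r in range(tile_shape[0]):
        for c in range(tile_shape[1]):
            row, col = tile_origin[0] + r, tile_origin[1] + c
            # Out-of-bounds elements at the tensor edge are simply skipped.
            if row < shape[0] and col < shape[1]:
                addrs.append(base + (row * strides[0] + col * strides[1]) * elem_size)
    return addrs
```

For example, requesting the 2x2 tile at coordinate (1, 1) of a 4x4 float tensor yields the four addresses of its elements; a transfer engine could then move exactly those bytes without the processor computing any address itself.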
  • Publication number: 20240169472
    Abstract: Apparatuses, systems, and techniques to perform a tensor prefetch instruction to cause one or more tensors to be transformed and stored into one or more caches. In at least one embodiment, one or more circuits of a GPU are to perform a tensor prefetch instruction to cause one or more tensors to be transformed and stored into one or more GPU caches.
    Type: Application
    Filed: December 21, 2022
    Publication date: May 23, 2024
    Inventors: Harold Carter Edwards, Stephen Anthony Bernard Jones, Alexander Lev Minkin, Olivier Giroux, Gokul Ramaswamy Hirisave Chandra Shekhara, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Alan Kaatz, Andrew Robert Kerr, Jack H. Choquette
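This abstract combines prefetching with a transform applied in flight. As an illustrative sketch only (not the patented instruction), one such transform is transposing a tile while staging it, so that later column-wise accesses become contiguous:

```python
def prefetch_transpose(matrix):
    """Stage a matrix into a simulated cache, transposing it in flight so that
    subsequent column accesses read contiguous cached entries."""
    rows, cols = len(matrix), len(matrix[0])
    return [[matrix[r][c] for r in range(rows)] for c in range(cols)]
```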
  • Publication number: 20240169469
    Abstract: Apparatuses, systems, and techniques to transform information corresponding to one or more memory transactions. In at least one embodiment, one or more circuits are to perform an application programming interface (API) to cause information corresponding to one or more memory transactions resulting from performance of the API to be transformed.
    Type: Application
    Filed: December 21, 2022
    Publication date: May 23, 2024
    Inventors: Harold Carter Edwards, Stephen Anthony Bernard Jones, Alexander Lev Minkin, Olivier Giroux, Gokul Ramaswamy Hirisave Chandra Shekhara, Aditya Avinash Atluri, Apoorv Parle, Chao Li, Ronny Meir Krashinsky, Alan Kaatz, Andrew Robert Kerr, Jack H. Choquette
  • Publication number: 20240169471
    Abstract: Apparatuses, systems, and techniques to perform a graphics processing unit (GPU) prefetch instruction to cause a variable amount of information to be stored into one or more GPU caches. In at least one embodiment, one or more circuits of a GPU are to perform a GPU prefetch instruction to cause a variable amount of information to be stored into one or more GPU caches.
    Type: Application
    Filed: December 21, 2022
    Publication date: May 23, 2024
    Inventors: Harold Carter Edwards, Stephen Anthony Bernard Jones, Alexander Lev Minkin, Olivier Giroux, Gokul Ramaswamy Hirisave Chandra Shekhara, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Alan Kaatz, Andrew Robert Kerr, Jack H. Choquette
  • Publication number: 20240169022
    Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to cause one or more other computational operations to wait until matrix multiply-accumulate (MMA) memory transactions are performed.
    Type: Application
    Filed: November 30, 2022
    Publication date: May 23, 2024
    Inventors: Harold Carter Edwards, Kyrylo Perelygin, Maciej Tyrlik, Gokul Ramaswamy Hirisave Chandra Shekhara, Balaji Krishna Yugandhar Atukuri, Rishkul Kulkarni, Konstantinos Kyriakopoulos, Edward H. Gornish, David Allan Berson, Bageshri Sathe, James Player, Aman Arora, Alan Kaatz, Andrew Kerr, Haicheng Wu, Cris Cecka, Vijay Thakkar, Sean Treichler, Jack H. Choquette, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Cody Addison, Girish Bhaskarrao Bharambe
  • Publication number: 20240168829
    Abstract: Apparatuses, systems, and techniques to generate a tensor mapping. In at least one embodiment, one or more circuits are to perform an application programming interface (API) to cause a mapping from a first tensor to a second tensor to be generated.
    Type: Application
    Filed: December 21, 2022
    Publication date: May 23, 2024
    Inventors: Harold Carter Edwards, Stephen Anthony Bernard Jones, Alexander Lev Minkin, Olivier Giroux, Gokul Ramaswamy Hirisave Chandra Shekhara, Vishalkumar Ketankumar Mehta, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Alan Kaatz, Andrew Robert Kerr, Jack H. Choquette
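A tensor mapping of the kind this abstract describes relates element coordinates in one tensor to a tiled layout in another. A toy sketch of the arithmetic (hypothetical function, not the patented API): split each coordinate into a tile index and an offset within the tile.

```python
def tensor_map(coord, tile):
    """Split an element coordinate into (tile index, offset within tile)."""
    tile_idx = tuple(c // t for c, t in zip(coord, tile))
    offset = tuple(c % t for c, t in zip(coord, tile))
    return tile_idx, offset
```

For instance, with 4x4 tiles, element (5, 7) lives at offset (1, 3) inside tile (1, 1); a generated mapping descriptor lets hardware or software move whole tiles between the two tensors without per-element bookkeeping.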
  • Publication number: 20240169023
    Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to indicate whether matrix multiply-accumulate (MMA) memory operations are complete.
    Type: Application
    Filed: November 30, 2022
    Publication date: May 23, 2024
    Inventors: Harold Carter Edwards, Kyrylo Perelygin, Maciej Tyrlik, Gokul Ramaswamy Hirisave Chandra Shekhara, Balaji Krishna Yugandhar Atukuri, Rishkul Kulkarni, Konstantinos Kyriakopoulos, Edward H. Gornish, David Allan Berson, Bageshri Sathe, James Player, Aman Arora, Alan Kaatz, Andrew Kerr, Haicheng Wu, Cris Cecka, Vijay Thakkar, Sean Treichler, Jack H. Choquette, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Cody Addison, Girish Bhaskarrao Bharambe
  • Publication number: 20240168763
    Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to cause two or more other computational operations to be performed by two or more streaming multiprocessors (SMs).
    Type: Application
    Filed: November 30, 2022
    Publication date: May 23, 2024
    Inventors: Harold Carter Edwards, Kyrylo Perelygin, Maciej Tyrlik, Gokul Ramaswamy Hirisave Chandra Shekhara, Balaji Krishna Yugandhar Atukuri, Rishkul Kulkarni, Konstantinos Kyriakopoulos, Edward H. Gornish, David Allan Berson, Bageshri Sathe, James Player, Aman Arora, Alan Kaatz, Andrew Kerr, Haicheng Wu, Cris Cecka, Vijay Thakkar, Sean Treichler, Jack H. Choquette, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Cody Addison, Girish Bhaskarrao Bharambe
  • Publication number: 20240168762
    Abstract: Apparatuses, systems, and techniques to perform computational operations in response to one or more compute uniform device architecture (CUDA) programs. In at least one embodiment, one or more computational operations are to cause one or more other computational operations to wait until a portion of matrix multiply-accumulate (MMA) operations have been performed.
    Type: Application
    Filed: November 30, 2022
    Publication date: May 23, 2024
    Inventors: Harold Carter Edwards, Kyrylo Perelygin, Maciej Tyrlik, Gokul Ramaswamy Hirisave Chandra Shekhara, Balaji Krishna Yugandhar Atukuri, Rishkul Kulkarni, Konstantinos Kyriakopoulos, Edward H. Gornish, David Allan Berson, Bageshri Sathe, James Player, Aman Arora, Alan Kaatz, Andrew Kerr, Haicheng Wu, Cris Cecka, Vijay Thakkar, Sean Treichler, Jack H. Choquette, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Cody Addison, Girish Bhaskarrao Bharambe
  • Publication number: 20240169470
    Abstract: Apparatuses, systems, and techniques to store information in a plurality of storage locations allocated to a graphics processing unit (GPU). In at least one embodiment, one or more circuits are to perform an application programming interface (API) to cause information to be stored in a plurality of storage locations allocated to a first GPU.
    Type: Application
    Filed: December 21, 2022
    Publication date: May 23, 2024
    Inventors: Harold Carter Edwards, Stephen Anthony Bernard Jones, Alexander Lev Minkin, Olivier Giroux, Gokul Ramaswamy Hirisave Chandra Shekhara, Vishalkumar Ketankumar Mehta, Aditya Avinash Atluri, Apoorv Parle, Chao Li, Ronny Meir Krashinsky, Alan Kaatz, Andrew Robert Kerr, Jack H. Choquette
  • Publication number: 20240168831
    Abstract: Apparatuses, systems, and techniques to cause a first tensor to be translated into a second tensor according to a tensor map. In at least one embodiment, one or more circuits are to perform an application programming interface (API) to cause a first tensor to be translated into a second tensor according to a tensor map.
    Type: Application
    Filed: December 21, 2022
    Publication date: May 23, 2024
    Inventors: Harold Carter Edwards, Stephen Anthony Bernard Jones, Alexander Lev Minkin, Olivier Giroux, Gokul Ramaswamy Hirisave Chandra Shekhara, Aditya Avinash Atluri, Apoorv Parle, Chao Li, Ronny Meir Krashinsky, Alan Kaatz, Andrew Robert Kerr, Jack H. Choquette
  • Publication number: 20240168830
    Abstract: Apparatuses, systems, and techniques to indicate storage locations of information to be mapped from a first tensor to a second tensor. In at least one embodiment, one or more circuits are to perform an application programming interface (API) to indicate one or more storage locations of information to be mapped from a first tensor to a second tensor.
    Type: Application
    Filed: December 21, 2022
    Publication date: May 23, 2024
    Inventors: Harold Carter Edwards, Stephen Anthony Bernard Jones, Alexander Lev Minkin, Olivier Giroux, Gokul Ramaswamy Hirisave Chandra Shekhara, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Alan Kaatz, Andrew Robert Kerr, Jack H. Choquette
  • Publication number: 20240168659
    Abstract: Apparatuses, systems, and techniques to transform and store information corresponding to one or more memory transactions. In at least one embodiment, one or more circuits are to perform an application programming interface (API) to cause information corresponding to one or more memory transactions resulting from performance of the API to be transformed and stored.
    Type: Application
    Filed: December 21, 2022
    Publication date: May 23, 2024
    Inventors: Harold Carter Edwards, Stephen Anthony Bernard Jones, Alexander Lev Minkin, Olivier Giroux, Gokul Ramaswamy Hirisave Chandra Shekhara, Aditya Avinash Atluri, Apoorv Parle, Chao Li, Ronny Meir Krashinsky, Alan Kaatz, Andrew Robert Kerr, Jack H. Choquette
  • Publication number: 20240168765
    Abstract: Apparatuses, systems, and techniques to perform a tensor prefetch instruction to cause one or more tensors to be stored into one or more caches. In at least one embodiment, one or more circuits of a GPU are to perform a tensor prefetch instruction to cause one or more tensors to be stored into one or more GPU caches.
    Type: Application
    Filed: December 21, 2022
    Publication date: May 23, 2024
    Inventors: Harold Carter Edwards, Stephen Anthony Bernard Jones, Alexander Lev Minkin, Olivier Giroux, Gokul Ramaswamy Hirisave Chandra Shekhara, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Alan Kaatz, Andrew Robert Kerr, Jack H. Choquette
  • Publication number: 20240161224
    Abstract: Apparatuses, systems, and techniques to cause a first tensor to be translated into a second tensor according to a tensor map without storing information about a memory transaction corresponding to the translation. In at least one embodiment, one or more circuits are to perform an application programming interface (API) to cause a first tensor to be translated into a second tensor according to a tensor map without storing information about one or more memory transactions corresponding to the translation.
    Type: Application
    Filed: December 21, 2022
    Publication date: May 16, 2024
    Inventors: Harold Carter Edwards, Stephen Anthony Bernard Jones, Alexander Lev Minkin, Olivier Giroux, Gokul Ramaswamy Hirisave Chandra Shekhara, Vishalkumar Ketankumar Mehta, Aditya Avinash Atluri, Apoorv Parle, Chao Li, Ronny Meir Krashinsky, Alan Kaatz, Andrew Robert Kerr, Jack H. Choquette
  • Publication number: 20240161222
    Abstract: Apparatuses, systems, and techniques to indicate how to generate image-to-column transformations. In at least one embodiment, one or more circuits are to perform an application programming interface (API) to indicate how to generate one or more image-to-column transformations.
    Type: Application
    Filed: December 21, 2022
    Publication date: May 16, 2024
    Inventors: Harold Carter Edwards, Stephen Anthony Bernard Jones, Alexander Lev Minkin, Olivier Giroux, Gokul Ramaswamy Hirisave Chandra Shekhara, Vishalkumar Ketankumar Mehta, Aditya Avinash Atluri, Apoorv Parle, Ronny Meir Krashinsky, Alan Kaatz, Andrew Robert Kerr, Jack H. Choquette
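Image-to-column (im2col) transformations are a standard way to turn convolution into matrix multiplication: each kernel-sized patch of the image is unrolled into one row/column of a matrix. A minimal reference sketch (plain 2-D case, no channels or strides; illustrative only):

```python
def im2col(image, kh, kw):
    """Unroll each kh x kw patch of a 2-D image into a row, so convolution
    with a kh x kw kernel becomes a single matrix multiply."""
    h, w = len(image), len(image[0])
    cols = []
    for r in range(h - kh + 1):
        for c in range(w - kw + 1):
            patch = [image[r + i][c + j] for i in range(kh) for j in range(kw)]
            cols.append(patch)
    return cols
```

On a 3x3 image with a 2x2 kernel this produces four rows of four elements, one per valid patch position.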
  • Publication number: 20240161223
    Abstract: Apparatuses, systems, and techniques to cause a first tensor to be translated into a second tensor according to a tensor map. In at least one embodiment, one or more circuits are to perform an application programming interface (API) to cause a first tensor to be translated into a second tensor according to a tensor map.
    Type: Application
    Filed: December 21, 2022
    Publication date: May 16, 2024
    Inventors: Harold Carter Edwards, Stephen Anthony Bernard Jones, Alexander Lev Minkin, Olivier Giroux, Gokul Ramaswamy Hirisave Chandra Shekhara, Vishalkumar Ketankumar Mehta, Aditya Avinash Atluri, Apoorv Parle, Chao Li, Ronny Meir Krashinsky, Alan Kaatz, Andrew Robert Kerr, Jack H. Choquette
  • Patent number: 11907717
    Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.
    Type: Grant
    Filed: February 8, 2023
    Date of Patent: February 20, 2024
    Assignee: NVIDIA Corporation
    Inventors: Andrew Kerr, Jack Choquette, Xiaogang Qiu, Omkar Paranjape, Poornachandra Rao, Shirish Gadre, Steven J. Heinrich, Manan Patel, Olivier Giroux, Alan Kaatz
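The block-transfer technique above moves data from global memory into shared memory without routing it through registers, which also enables overlapping transfers with compute. A simplified double-buffering sketch of that overlap pattern (hypothetical helper in host code; the patent concerns the hardware instruction itself):

```python
def staged_copy_compute(global_mem, tile, consume):
    """Sketch of double-buffered block transfer: while one staging buffer is
    consumed, the next tile is fetched, so loads overlap with compute."""
    n_tiles = (len(global_mem) + tile - 1) // tile
    buffers = [None, None]
    buffers[0] = global_mem[0:tile]                    # prefetch the first tile
    out = []
    for t in range(n_tiles):
        if t + 1 < n_tiles:                            # issue the next copy early
            buffers[(t + 1) % 2] = global_mem[(t + 1) * tile:(t + 2) * tile]
        out.append(consume(buffers[t % 2]))            # compute on the ready buffer
    return out
```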
  • Publication number: 20230297643
    Abstract: Matrix multiplication operations can be implemented, at least in part, on one or more tensor cores of a parallel processing unit. An efficiency of the matrix multiplication operations can be improved in cases where one of the input operands or the output operand of the matrix multiplication operation is a square matrix having a triangular data pattern. In such cases, the number of computations performed by the tensor cores of the parallel processing unit can be reduced by dropping computations and/or masking out elements of the square matrix input operand on one side of the main diagonal of the square matrix. In other cases where the output operand exhibits the triangular data pattern, computations can be dropped or masked out for the invalid side of the main diagonal of the square matrix. In an embodiment, a library implementing the matrix multiplication operations is provided.
    Type: Application
    Filed: March 21, 2022
    Publication date: September 21, 2023
    Inventors: Aniket Shivam, Andrew Kerr, Haicheng Wu, Manish Gupta, Nikita Shustrov, Qing Yang, Alan Kaatz, Aditya Avinash Atluri
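The efficiency gain described above comes from skipping computations on the invalid side of a triangular matrix's main diagonal instead of multiplying by its zeros. A scalar reference sketch of the idea (illustrative only; the patent targets tensor cores and the accompanying library):

```python
def triangular_matmul(a, b, lower=True):
    """Multiply a x b where `a` is known triangular: the inner loop visits only
    the valid side of the main diagonal, skipping work on the zero side."""
    n, m = len(a), len(b[0])
    out = [[0] * m for _ in range(n)]
    for i in range(n):
        ks = range(0, i + 1) if lower else range(i, len(b))
        for k in ks:                      # masked-out k values are never touched
            for j in range(m):
                out[i][j] += a[i][k] * b[k][j]
    return out
```

For a lower-triangular `a`, row `i` performs only `i + 1` of the usual `n` inner products, roughly halving the work over the whole matrix.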