Patents by Inventor Alan Kaatz

Alan Kaatz has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11907717
    Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.
    Type: Grant
    Filed: February 8, 2023
    Date of Patent: February 20, 2024
    Assignee: NVIDIA Corporation
    Inventors: Andrew Kerr, Jack Choquette, Xiaogang Qiu, Omkar Paranjape, Poornachandra Rao, Shirish Gadre, Steven J. Heinrich, Manan Patel, Olivier Giroux, Alan Kaatz
  • Publication number: 20230297643
    Abstract: Matrix multiplication operations can be implemented, at least in part, on one or more tensor cores of a parallel processing unit. An efficiency of the matrix multiplication operations can be improved in cases where one of the input operands or the output operand of the matrix multiplication operation is a square matrix having a triangular data pattern. In such cases, the number of computations performed by the tensor cores of the parallel processing unit can be reduced by dropping computations and/or masking out elements of the square matrix input operand on one side of the main diagonal of the square matrix. In other cases where the output operand exhibits the triangular data pattern, computations can be dropped or masked out for the invalid side of the main diagonal of the square matrix. In an embodiment, a library implementing the matrix multiplication operations is provided.
    Type: Application
    Filed: March 21, 2022
    Publication date: September 21, 2023
    Inventors: Aniket Shivam, Andrew Kerr, Haicheng Wu, Manish Gupta, Nikita Shustrov, Qing Yang, Alan Kaatz, Aditya Avinash Atluri
  • Publication number: 20230289292
    Abstract: A parallel processing unit comprises a plurality of processors each being coupled to a memory access hardware circuitry. Each memory access hardware circuitry is configured to receive, from the coupled processor, a memory access request specifying a coordinate of a multidimensional data structure, wherein the memory access hardware circuit is one of a plurality of memory access circuitry each coupled to a respective one of the processors; and, in response to the memory access request, translate the coordinate of the multidimensional data structure into plural memory addresses for the multidimensional data structure and using the plural memory addresses, asynchronously transfer at least a portion of the multidimensional data structure for processing by at least the coupled processor. The memory locations may be in the shared memory of the coupled processor and/or an external memory.
    Type: Application
    Filed: March 10, 2022
    Publication date: September 14, 2023
    Inventors: Alexander L. Minkin, Alan Kaatz, Olivier Giroux, Jack Choquette, Shirish Gadre, Manan Patel, John Tran, Ronny Krashinsky, Jeff Schottmiller
  • Publication number: 20230289304
    Abstract: A parallel processing unit comprises a plurality of processors each being coupled to a memory access hardware circuitry. Each memory access hardware circuitry is configured to receive, from the coupled processor, a memory access request specifying a coordinate of a multidimensional data structure, wherein the memory access hardware circuit is one of a plurality of memory access circuitry each coupled to a respective one of the processors; and, in response to the memory access request, translate the coordinate of the multidimensional data structure into plural memory addresses for the multidimensional data structure and using the plural memory addresses, asynchronously transfer at least a portion of the multidimensional data structure for processing by at least the coupled processor. The memory locations may be in the shared memory of the coupled processor and/or an external memory.
    Type: Application
    Filed: March 10, 2022
    Publication date: September 14, 2023
    Inventors: Alexander L. Minkin, Alan Kaatz, Oliver Giroux, Jack Choquette, Shirish Gadre, Manan Patel, John Tran, Ronny Krashinsky, Jeff Schottmiller
  • Publication number: 20230185570
    Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.
    Type: Application
    Filed: February 8, 2023
    Publication date: June 15, 2023
    Inventors: Andrew KERR, Jack Choquette, Xiaogang Qiu, Omkar Paranjape, Poornachandra Rao, Shirish Gadre, Steven J. Heinrich, Manan Patel, Olivier Giroux, Alan Kaatz
  • Publication number: 20230140934
    Abstract: Apparatuses, systems, and techniques to perform a matrix multiplication using parallel processing. In at least one embodiment, a matrix multiplication is divided into a set of tiles, with each tile processed with a prolog task, a calculation task, and an epilog task. The prolog tasks are performed by a dedicated set of threads, with the remaining tasks performed in an interleaved manner using two or more thread groups.
    Type: Application
    Filed: March 8, 2022
    Publication date: May 11, 2023
    Inventors: Chao Li, Jing Li, Alan Kaatz, Ronny Meir Krashinsky, Albert Xu
  • Publication number: 20230110438
    Abstract: Apparatuses, systems, and techniques are presented to perform one or more operations. In at least one embodiment, one or more data values, to be used by one or more neural networks, are caused to be replaced by one or more invalid data values.
    Type: Application
    Filed: October 8, 2021
    Publication date: April 13, 2023
    Inventors: Pradeep Ramani, Alex Minkin, Alan Kaatz, Yang Xu, Ronny Krashinsky
  • Patent number: 11604649
    Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.
    Type: Grant
    Filed: June 30, 2021
    Date of Patent: March 14, 2023
    Assignee: NVIDIA Corporation
    Inventors: Andrew Kerr, Jack Choquette, Xiaogang Qiu, Omkar Paranjape, Poornachandra Rao, Shirish Gadre, Steven J. Heinrich, Manan Patel, Olivier Giroux, Alan Kaatz
  • Publication number: 20210326137
    Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.
    Type: Application
    Filed: June 30, 2021
    Publication date: October 21, 2021
    Inventors: Andrew KERR, Jack CHOQUETTE, Xiaogang QIU, Omkar PARANJAPE, Poornachandra RAO, Shirish GADRE, Steven J. HEINRICH, Manan PATEL, Olivier GIROUX, Alan KAATZ
  • Patent number: 11080051
    Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.
    Type: Grant
    Filed: December 12, 2019
    Date of Patent: August 3, 2021
    Assignee: NVIDIA Corporation
    Inventors: Andrew Kerr, Jack Choquette, Xiaogang Qiu, Omkar Paranjape, Poornachandra Rao, Shirish Gadre, Steven J. Heinrich, Manan Patel, Olivier Giroux, Alan Kaatz
  • Publication number: 20210124582
    Abstract: A technique for block data transfer is disclosed that reduces data transfer and memory access overheads and significantly reduces multiprocessor activity and energy consumption. Threads executing on a multiprocessor needing data stored in global memory can request and store the needed data in on-chip shared memory, which can be accessed by the threads multiple times. The data can be loaded from global memory and stored in shared memory using an instruction which directs the data into the shared memory without storing the data in registers and/or cache memory of the multiprocessor during the data transfer.
    Type: Application
    Filed: December 12, 2019
    Publication date: April 29, 2021
    Inventors: Andrew Kerr, Jack Choquette, Xiaogang Qiu, Omkar Paranjape, Poornachandra Rao, Shirish Gadre, Steven J. Heinrich, Manan Patel, Olivier Giroux, Alan Kaatz