Patents by Inventor Ronny Meir Krashinsky

Ronny Meir Krashinsky has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240036955
    Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to indicate one or more limitations of one or more attributes of one or more groups of blocks of one or more threads.
    Type: Application
    Filed: September 28, 2022
    Publication date: February 1, 2024
    Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
  • Publication number: 20240036916
    Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to indicate a maximum number of blocks of threads capable of being scheduled in parallel.
    Type: Application
    Filed: September 28, 2022
    Publication date: February 1, 2024
    Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
  • Publication number: 20240036917
    Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to indicate a maximum number of blocks of threads to be scheduled in parallel.
    Type: Application
    Filed: September 28, 2022
    Publication date: February 1, 2024
    Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
  • Publication number: 20240036956
    Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to indicate whether one or more threads within a group of blocks of threads have performed a barrier instruction and to cause performance of one or more threads within the group of blocks of threads to stop at least until all threads within the group of blocks have performed the barrier instruction.
    Type: Application
    Filed: September 28, 2022
    Publication date: February 1, 2024
    Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
  • Publication number: 20240036952
    Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to determine which of two or more blocks of threads are to be scheduled in parallel.
    Type: Application
    Filed: September 28, 2022
    Publication date: February 1, 2024
    Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
  • Publication number: 20240036953
    Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to indicate a scheduling policy of one or more blocks of one or more threads.
    Type: Application
    Filed: September 28, 2022
    Publication date: February 1, 2024
    Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
  • Publication number: 20240036915
    Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to determine a scheduling policy of one or more blocks of one or more threads.
    Type: Application
    Filed: September 28, 2022
    Publication date: February 1, 2024
    Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
  • Publication number: 20240036957
    Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to cause memory to be shared between two or more groups of blocks of threads.
    Type: Application
    Filed: September 28, 2022
    Publication date: February 1, 2024
    Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
  • Publication number: 20240036944
    Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to indicate whether one or more threads within two or more blocks of threads have performed a barrier instruction.
    Type: Application
    Filed: September 28, 2022
    Publication date: February 1, 2024
    Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
  • Publication number: 20240036945
    Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to cause performance of one or more threads within a group of blocks of threads to stop at least until all threads within the group of blocks have performed a barrier instruction.
    Type: Application
    Filed: September 28, 2022
    Publication date: February 1, 2024
    Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
  • Publication number: 20240036954
    Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to indicate one or more attributes of one or more groups of blocks of one or more threads.
    Type: Application
    Filed: September 28, 2022
    Publication date: February 1, 2024
    Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
  • Publication number: 20240036918
    Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to cause a kernel to be generated to cause two or more blocks of two or more threads to be scheduled in parallel.
    Type: Application
    Filed: September 28, 2022
    Publication date: February 1, 2024
    Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
  • Publication number: 20240036951
    Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to indicate two or more blocks of threads to be scheduled in parallel.
    Type: Application
    Filed: September 28, 2022
    Publication date: February 1, 2024
    Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
  • Publication number: 20230140934
    Abstract: Apparatuses, systems, and techniques to perform a matrix multiplication using parallel processing. In at least one embodiment, a matrix multiplication is divided into a set of tiles, with each tile processed with a prolog task, a calculation task, and an epilog task. The prolog tasks are performed by a dedicated set of threads, with the remaining tasks performed in an interleaved manner using two or more thread groups.
    Type: Application
    Filed: March 8, 2022
    Publication date: May 11, 2023
    Inventors: Chao Li, Jing Li, Alan Kaatz, Ronny Meir Krashinsky, Albert Xu
  • Publication number: 20220365750
    Abstract: Apparatuses, systems, and techniques to generate numbers. In at least one embodiment, one or more circuits are to cause one or more thirty-two bit floating point numbers to be truncated to generate one or more rounded numbers based, at least in part, on one or more rounding attributes.
    Type: Application
    Filed: May 16, 2022
    Publication date: November 17, 2022
    Inventors: Girish Bhaskarrao Bharambe, Kyrylo Perelygin, Advait Soman, Andrew Robert Kerr, Farhana Schuchman, Jaydeep Marathe, Stephen Anthony Bernard Jones, Ronny Meir Krashinsky, Jaewook Shin
  • Patent number: 11379420
    Abstract: Compressed data is oftentimes beneficial for reducing the computing resources required, for example, to transmit and store data. The compression of data is particularly useful when dealing with sparse data (data that includes numerous zeros or near-zero values) and only non-zero values above a certain threshold have significance. When dealing with compressed data, oftentimes the data needs to be decompressed for processing (e.g., by deep learning networks or other applications configured to operate on sparse, or other uncompressed data). Instructions are disclosed for supporting the decompression of compressed data by a processing unit such as a CPU and GPU.
    Type: Grant
    Filed: March 20, 2019
    Date of Patent: July 5, 2022
    Assignee: NVIDIA CORPORATION
    Inventors: Jorge Albericio Latorre, Jack H. Choquette, Manan Maheshkumar Patel, Jeffrey Pool, Ming Y. Siu, Ronny Meir Krashinsky, Ganesh Venkatesh
  • Publication number: 20200285618
    Abstract: Compressed data is oftentimes beneficial for reducing the computing resources required, for example, to transmit and store data. The compression of data is particularly useful when dealing with sparse data (data that includes numerous zeros or near-zero values) and only non-zero values above a certain threshold have significance. When dealing with compressed data, oftentimes the data needs to be decompressed for processing (e.g., by deep learning networks or other applications configured to operate on sparse, or other uncompressed data). Instructions are disclosed for supporting the decompression of compressed data by a processing unit such as a CPU and GPU.
    Type: Application
    Filed: March 20, 2019
    Publication date: September 10, 2020
    Inventors: Jorge Albericio Latorre, Jack H. Choquette, Manan Maheshkumar Patel, Jeffrey Pool, Ming Y. Siu, Ronny Meir Krashinsky, Ganesh Venkatesh
  • Patent number: 10067768
    Abstract: A method, system, and computer program product for executing divergent threads using a convergence barrier are disclosed. A first instruction in a program is executed by a plurality of threads, where the first instruction, when executed by a particular thread, indicates to a scheduler unit that the thread participates in a convergence barrier. A first path through the program is executed by a first divergent portion of the participating threads and a second path through the program is executed by a second divergent portion of the participating threads. The first divergent portion of the participating threads executes a second instruction in the program and transitions to a blocked state at the convergence barrier. The scheduler unit determines that all of the participating threads are synchronized at the convergence barrier and the convergence barrier is cleared.
    Type: Grant
    Filed: July 13, 2015
    Date of Patent: September 4, 2018
    Assignee: NVIDIA CORPORATION
    Inventors: Gregory Frederick Diamos, Richard Craig Johnson, Vinod Grover, Olivier Giroux, Jack H. Choquette, Michael Alan Fetterman, Ajay S. Tirumala, Peter Nelson, Ronny Meir Krashinsky
  • Patent number: 9971699
    Abstract: A method, computer readable medium, and system are disclosed for decoupling data pre-fetch from demand loads. The method includes the steps of receiving, by a processor, a set of instructions that includes a load instruction; and executing, by the processor, the load instruction to perform a load operation. The load operation loads data from a cache unit into a register file. The load instruction includes a no-update operator that prevents the cache unit from updating the cache state information in response to the load operation. The result is that the eviction policy for the cache unit responds to the order of pre-fetch memory access requests rather than the demand load operations.
    Type: Grant
    Filed: May 4, 2016
    Date of Patent: May 15, 2018
    Assignee: NVIDIA Corporation
    Inventors: Ronny Meir Krashinsky, Xiaogang Qiu
  • Publication number: 20170322887
    Abstract: A method, computer readable medium, and system are disclosed for decoupling data pre-fetch from demand loads. The method includes the steps of receiving, by a processor, a set of instructions that includes a load instruction; and executing, by the processor, the load instruction to perform a load operation. The load operation loads data from a cache unit into a register file. The load instruction includes a no-update operator that prevents the cache unit from updating the cache state information in response to the load operation. The result is that the eviction policy for the cache unit responds to the order of pre-fetch memory access requests rather than the demand load operations.
    Type: Application
    Filed: May 4, 2016
    Publication date: November 9, 2017
    Inventors: Ronny Meir Krashinsky, Xiaogang Qiu
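
The no-update load described in the last two entries above (patent 9971699 and publication 20170322887) can be illustrated with a toy model. This is a minimal sketch and not the patented implementation: it assumes a small fully associative cache with LRU replacement, and the names LRUCache, prefetch, load, and run are hypothetical. It shows only the idea stated in the abstract, namely that a demand load flagged as no-update leaves the replacement state alone, so eviction follows the order of the pre-fetch requests.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative cache with LRU replacement (hypothetical model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # first key is the next eviction candidate

    def prefetch(self, addr):
        """Bring a line in (or touch it) and mark it most recently used.

        Returns the evicted address, or None if nothing was evicted.
        """
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return None
        evicted = None
        if len(self.lines) >= self.capacity:
            evicted, _ = self.lines.popitem(last=False)
        self.lines[addr] = f"data@{addr}"
        return evicted

    def load(self, addr, no_update=False):
        """Demand load; with no_update=True a hit leaves the LRU order untouched."""
        if addr not in self.lines:
            # Assumption: a miss is filled like an ordinary access.
            self.prefetch(addr)
        elif not no_update:
            self.lines.move_to_end(addr)
        return self.lines[addr]


def run(no_update):
    """Prefetch A then B, demand-load A, prefetch C; return the evicted line."""
    cache = LRUCache(capacity=2)
    cache.prefetch("A")
    cache.prefetch("B")
    cache.load("A", no_update=no_update)
    return cache.prefetch("C")


if __name__ == "__main__":
    print(run(no_update=True))   # A
    print(run(no_update=False))  # B
```

With no_update=True the line prefetched first (A) is evicted first even though it was just read, so eviction tracks pre-fetch order. With an ordinary load, the read refreshes A and B is evicted instead.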