Patents by Inventor Ronny Meir Krashinsky

Ronny Meir Krashinsky has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

APPLICATION PROGRAMMING INTERFACE TO INDICATE ATTRIBUTE LIMITATIONS

Publication number: 20240036955

Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to indicate one or more limitations of one or more attributes of one or more groups of blocks of one or more threads.

Type: Application

Filed: September 28, 2022

Publication date: February 1, 2024

Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
APPLICATION PROGRAMMING INTERFACE TO INDICATE PARALLEL SCHEDULING MAXIMUM

Publication number: 20240036916

Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to indicate a maximum number of blocks of threads capable of being scheduled in parallel.

Type: Application

Filed: September 28, 2022

Publication date: February 1, 2024

Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
APPLICATION PROGRAMMING INTERFACE TO INDICATE BLOCK MAXIMUM

Publication number: 20240036917

Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to indicate a maximum number of blocks of threads to be scheduled in parallel.

Type: Application

Filed: September 28, 2022

Publication date: February 1, 2024

Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
APPLICATION PROGRAMMING INTERFACE TO INDICATE PERFORMANCE OF BARRIER INSTRUCTION AND STOP PERFORMANCE OF THREADS

Publication number: 20240036956

Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to indicate whether one or more threads within a group of blocks of threads have performed a barrier instruction and to cause performance of one or more threads within the group of blocks of threads to stop at least until all threads within the group of blocks have performed the barrier instruction.

Type: Application

Filed: September 28, 2022

Publication date: February 1, 2024

Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
APPLICATION PROGRAMMING INTERFACE TO SCHEDULE THREAD BLOCKS

Publication number: 20240036952

Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to determine which of two or more blocks of threads are to be scheduled in parallel.

Type: Application

Filed: September 28, 2022

Publication date: February 1, 2024

Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
APPLICATION PROGRAMMING INTERFACE TO INDICATE SCHEDULING POLICIES

Publication number: 20240036953

Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to indicate a scheduling policy of one or more blocks of one or more threads.

Type: Application

Filed: September 28, 2022

Publication date: February 1, 2024

Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
APPLICATION PROGRAMMING INTERFACE TO PERFORM A SCHEDULING POLICY

Publication number: 20240036915

Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to determine a scheduling policy of one or more blocks of one or more threads.

Type: Application

Filed: September 28, 2022

Publication date: February 1, 2024

Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
APPLICATION PROGRAMMING INTERFACE TO SHARE MEMORY BETWEEN GROUPS OF BLOCKS OF THREADS

Publication number: 20240036957

Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to cause memory to be shared between two or more groups of blocks of threads.

Type: Application

Filed: September 28, 2022

Publication date: February 1, 2024

Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
APPLICATION PROGRAMMING INTERFACE TO INDICATE PERFORMANCE OF BARRIER INSTRUCTION

Publication number: 20240036944

Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to indicate whether one or more threads within two or more blocks of threads have performed a barrier instruction.

Type: Application

Filed: September 28, 2022

Publication date: February 1, 2024

Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
APPLICATION PROGRAMMING INTERFACE TO STOP PERFORMANCE OF THREADS

Publication number: 20240036945

Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to cause performance of one or more threads within a group of blocks of threads to stop at least until all threads within the group of blocks have performed a barrier instruction.

Type: Application

Filed: September 28, 2022

Publication date: February 1, 2024

Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
APPLICATION PROGRAMMING INTERFACE TO INDICATE ATTRIBUTES OF GROUPS OF BLOCKS OF THREADS

Publication number: 20240036954

Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to indicate one or more attributes of one or more groups of blocks of one or more threads.

Type: Application

Filed: September 28, 2022

Publication date: February 1, 2024

Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
APPLICATION PROGRAMMING INTERFACE TO GENERATE KERNELS

Publication number: 20240036918

Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to cause a kernel to be generated to cause two or more blocks of two or more threads to be scheduled in parallel.

Type: Application

Filed: September 28, 2022

Publication date: February 1, 2024

Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
APPLICATION PROGRAMMING INTERFACE TO INDICATE THREAD BLOCKS

Publication number: 20240036951

Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to indicate two or more blocks of threads to be scheduled in parallel.

Type: Application

Filed: September 28, 2022

Publication date: February 1, 2024

Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
THREAD SPECIALIZATION FOR COLLABORATIVE DATA TRANSFER AND COMPUTATION

Publication number: 20230140934

Abstract: Apparatuses, systems, and techniques to perform a matrix multiplication using parallel processing. In at least one embodiment, a matrix multiplication is divided into a set of tiles, with each tile processed with a prolog task, a calculation task, and an epilog task. The prolog tasks are performed by a dedicated set of threads, with the remaining tasks performed in an interleaved manner using two or more thread groups.

Type: Application

Filed: March 8, 2022

Publication date: May 11, 2023

Inventors: Chao Li, Jing Li, Alan Kaatz, Ronny Meir Krashinsky, Albert Xu
DATATYPE CONVERSION TECHNIQUE

Publication number: 20220365750

Abstract: Apparatuses, systems, and techniques to generate numbers. In at least one embodiment, one or more circuits are to cause one or more thirty-two bit floating point numbers to be truncated to generate one or more rounded numbers based, at least in part, on one or more rounding attributes.

Type: Application

Filed: May 16, 2022

Publication date: November 17, 2022

Inventors: Girish Bhaskarrao Bharambe, Kyrylo Perelygin, Advait Soman, Andrew Robert Kerr, Farhana Schuchman, Jaydeep Marathe, Stephen Anthony Bernard Jones, Ronny Meir Krashinsky, Jaewook Shin
Decompression techniques for processing compressed data suitable for artificial neural networks

Patent number: 11379420

Abstract: Compressed data is oftentimes beneficial for reducing the computing resources required, for example, to transmit and store data. The compression of data is particularly useful when dealing with sparse data (data that includes numerous zeros or near-zero values) and only non-zero values above a certain threshold have significance. When dealing with compressed data, oftentimes the data needs to be decompressed for processing (e.g., by deep learning networks or other applications configured to operate on sparse, or other uncompressed data). Instructions are disclosed for supporting the decompression of compressed data by a processing unit such as a CPU and GPU.

Type: Grant

Filed: March 20, 2019

Date of Patent: July 5, 2022

Assignee: NVIDIA CORPORATION

Inventors: Jorge Albericio Latorre, Jack H. Choquette, Manan Maheshkumar Patel, Jeffrey Pool, Ming Y. Siu, Ronny Meir Krashinsky, Ganesh Venkatesh
DECOMPRESSION TECHNIQUES FOR PROCESSING COMPRESSED DATA SUITABLE FOR ARTIFICIAL NEURAL NETWORKS

Publication number: 20200285618

Abstract: Compressed data is oftentimes beneficial for reducing the computing resources required, for example, to transmit and store data. The compression of data is particularly useful when dealing with sparse data (data that includes numerous zeros or near-zero values) and only non-zero values above a certain threshold have significance. When dealing with compressed data, oftentimes the data needs to be decompressed for processing (e.g., by deep learning networks or other applications configured to operate on sparse, or other uncompressed data). Instructions are disclosed for supporting the decompression of compressed data by a processing unit such as a CPU and GPU.

Type: Application

Filed: March 20, 2019

Publication date: September 10, 2020

Inventors: Jorge Albericio Latorre, Jack H. Choquette, Manan Maheshkumar Patel, Jeffrey Pool, Ming Y. Siu, Ronny Meir Krashinsky, Ganesh Venkatesh
Execution of divergent threads using a convergence barrier

Patent number: 10067768

Abstract: A method, system, and computer program product for executing divergent threads using a convergence barrier are disclosed. A first instruction in a program is executed by a plurality of threads, where the first instruction, when executed by a particular thread, indicates to a scheduler unit that the thread participates in a convergence barrier. A first path through the program is executed by a first divergent portion of the participating threads and a second path through the program is executed by a second divergent portion of the participating threads. The first divergent portion of the participating threads executes a second instruction in the program and transitions to a blocked state at the convergence barrier. The scheduler unit determines that all of the participating threads are synchronized at the convergence barrier and the convergence barrier is cleared.

Type: Grant

Filed: July 13, 2015

Date of Patent: September 4, 2018

Assignee: NVIDIA CORPORATION

Inventors: Gregory Frederick Diamos, Richard Craig Johnson, Vinod Grover, Olivier Giroux, Jack H. Choquette, Michael Alan Fetterman, Ajay S. Tirumala, Peter Nelson, Ronny Meir Krashinsky
Method to control cache replacement for decoupled data fetch

Patent number: 9971699

Abstract: A method, computer readable medium, and system are disclosed for decoupling data pre-fetch from demand loads. The method includes the steps of receiving, by a processor, a set of instructions that includes a load instruction; and executing, by the processor, the load instruction to perform a load operation. The load operation loads data from a cache unit into a register file. The load instruction includes a no-update operator that prevents the cache unit from updating the cache state information in response to the load operation. The result is that the eviction policy for the cache unit responds to the order of pre-fetch memory access requests rather than the demand load operations.

Type: Grant

Filed: May 4, 2016

Date of Patent: May 15, 2018

Assignee: NVIDIA Corporation

Inventors: Ronny Meir Krashinsky, Xiaogang Qiu
METHOD TO CONTROL CACHE REPLACEMENT FOR DECOUPLED DATA FETCH

Publication number: 20170322887

Abstract: A method, computer readable medium, and system are disclosed for decoupling data pre-fetch from demand loads. The method includes the steps of receiving, by a processor, a set of instructions that includes a load instruction; and executing, by the processor, the load instruction to perform a load operation. The load operation loads data from a cache unit into a register file. The load instruction includes a no-update operator that prevents the cache unit from updating the cache state information in response to the load operation. The result is that the eviction policy for the cache unit responds to the order of pre-fetch memory access requests rather than the demand load operations.

Type: Application

Filed: May 4, 2016

Publication date: November 9, 2017

Inventors: Ronny Meir Krashinsky, Xiaogang Qiu

1 2 next