Patents by Inventor ZE LONG

ZE LONG has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

KERNEL LAUNCH DEPENDENCIES

Publication number: 20250085973

Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, one or more software kernels are caused to indicate one or more dependencies among two or more software kernels. In at least one embodiment, one or more software kernels are performed based on one or more kernel dependencies.

Type: Application

Filed: September 11, 2023

Publication date: March 13, 2025

Applicant: NVIDIA Corporation

Inventors: Ze Long, Stephen Anthony Bernard Jones, Pradeep Moorthy, Mark Theng
Distributed shared memory

Patent number: 12248788

Abstract: Distributed shared memory (DSMEM) comprises blocks of memory that are distributed or scattered across a processor (such as a GPU). Threads executing on a processing core local to one memory block are able to access a memory block local to a different processing core. In one embodiment, shared access to these DSMEM allocations distributed across a collection of processing cores is implemented by communications between the processing cores. Such distributed shared memory provides very low latency memory access for processing cores located in proximity to the memory blocks, and also provides a way for more distant processing cores to also access the memory blocks in a manner and using interconnects that do not interfere with the processing cores' access to main or global memory such as hacked by an L2 cache.

Type: Grant

Filed: March 10, 2022

Date of Patent: March 11, 2025

Assignee: NVIDIA Corporation

Inventors: Prakash Bangalore Prabhakar, Gentaro Hirota, Ronny Krashinsky, Ze Long, Brian Pharris, Rajballav Dash, Jeff Tuckey, Jerome F. Duluk, Jr., Lacky Shah, Luke Durant, Jack Choquette, Eric Werness, Naman Govil, Manan Patel, Shayani Deb, Sandeep Navada, John Edmondson, Greg Palmer, Wish Gandhi, Ravi Manyam, Apoorv Parle, Olivier Giroux, Shirish Gadre, Steve Heinrich
Unified Memory GPU with Localized Mode

Publication number: 20250078199

Abstract: A GPU can selectively confine software function execution and associated data storage resources to locally-connected processing/storage components, thereby minimizing latency and other overhead that would otherwise be needed to access more remote resources. The GPU can selectively permit other software function execution and associated data storage resources to range across non-locally-connected processing/storage components when more processing and/or storage resources are required.

Type: Application

Filed: August 29, 2023

Publication date: March 6, 2025

Inventors: Wish GANDHI, Ze LONG, Kun FANG
APPLICATION PROGRAMMING INTERFACE TO INDICATE PARALLEL SCHEDULING MAXIMUM

Publication number: 20240036916

Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to indicate a maximum number of blocks of threads capable of being scheduled in parallel.

Type: Application

Filed: September 28, 2022

Publication date: February 1, 2024

Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
APPLICATION PROGRAMMING INTERFACE TO GENERATE KERNELS

Publication number: 20240036918

Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to cause a kernel to be generated to cause two or more blocks of two or more threads to be scheduled in parallel.

Type: Application

Filed: September 28, 2022

Publication date: February 1, 2024

Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
APPLICATION PROGRAMMING INTERFACE TO INDICATE THREAD BLOCKS

Publication number: 20240036951

Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to indicate two or more blocks of threads to be scheduled in parallel.

Type: Application

Filed: September 28, 2022

Publication date: February 1, 2024

Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
APPLICATION PROGRAMMING INTERFACE TO INDICATE PERFORMANCE OF BARRIER INSTRUCTION

Publication number: 20240036944

Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to indicate whether one or more threads within two or more blocks of threads have performed a barrier instruction.

Type: Application

Filed: September 28, 2022

Publication date: February 1, 2024

Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
APPLICATION PROGRAMMING INTERFACE TO STOP PERFORMANCE OF THREADS

Publication number: 20240036945

Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to cause performance of one or more threads within a group of blocks of threads to stop at least until all threads within the group of blocks have performed a barrier instruction.

Type: Application

Filed: September 28, 2022

Publication date: February 1, 2024

Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
APPLICATION PROGRAMMING INTERFACE TO INDICATE SCHEDULING POLICIES

Publication number: 20240036953

Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to indicate a scheduling policy of one or more blocks of one or more threads.

Type: Application

Filed: September 28, 2022

Publication date: February 1, 2024

Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
APPLICATION PROGRAMMING INTERFACE TO INDICATE ATTRIBUTES OF GROUPS OF BLOCKS OF THREADS

Publication number: 20240036954

Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to indicate one or more attributes of one or more groups of blocks of one or more threads.

Type: Application

Filed: September 28, 2022

Publication date: February 1, 2024

Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
APPLICATION PROGRAMMING INTERFACE TO INDICATE PERFORMANCE OF BARRIER INSTRUCTION AND STOP PERFORMANCE OF THREADS

Publication number: 20240036956

Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to indicate whether one or more threads within a group of blocks of threads have performed a barrier instruction and to cause performance of one or more threads within the group of blocks of threads to stop at least until all threads within the group of blocks have performed the barrier instruction.

Type: Application

Filed: September 28, 2022

Publication date: February 1, 2024

Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
APPLICATION PROGRAMMING INTERFACE TO PERFORM A SCHEDULING POLICY

Publication number: 20240036915

Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to determine a scheduling policy of one or more blocks of one or more threads.

Type: Application

Filed: September 28, 2022

Publication date: February 1, 2024

Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
APPLICATION PROGRAMMING INTERFACE TO SHARE MEMORY BETWEEN GROUPS OF BLOCKS OF THREADS

Publication number: 20240036957

Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to cause memory to be shared between two or more groups of blocks of threads.

Type: Application

Filed: September 28, 2022

Publication date: February 1, 2024

Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
APPLICATION PROGRAMMING INTERFACE TO INDICATE ATTRIBUTE LIMITATIONS

Publication number: 20240036955

Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to indicate one or more limitations of one or more attributes of one or more groups of blocks of one or more threads.

Type: Application

Filed: September 28, 2022

Publication date: February 1, 2024

Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
APPLICATION PROGRAMMING INTERFACE TO SCHEDULE THREAD BLOCKS

Publication number: 20240036952

Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to determine which of two or more blocks of threads are to be scheduled in parallel.

Type: Application

Filed: September 28, 2022

Publication date: February 1, 2024

Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
APPLICATION PROGRAMMING INTERFACE TO INDICATE BLOCK MAXIMUM

Publication number: 20240036917

Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to indicate a maximum number of blocks of threads to be scheduled in parallel.

Type: Application

Filed: September 28, 2022

Publication date: February 1, 2024

Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
Distributed Shared Memory

Publication number: 20230289189

Abstract: Distributed shared memory (DSMEM) comprises blocks of memory that are distributed or scattered across a processor (such as a GPU). Threads executing on a processing core local to one memory block are able to access a memory block local to a different processing core. In one embodiment, shared access to these DSMEM allocations distributed across a collection of processing cores is implemented by communications between the processing cores. Such distributed shared memory provides very low latency memory access for processing cores located in proximity to the memory blocks, and also provides a way for more distant processing cores to also access the memory blocks in a manner and using interconnects that do not interfere with the processing cores' access to main or global memory such as hacked by an L2 cache.

Type: Application

Filed: March 10, 2022

Publication date: September 14, 2023

Inventors: Prakash BANGALORE PRABHAKAR, Gentaro HIROTA, Ronny KRASHINSKY, Ze LONG, Brian PHARRIS, Rajballav DASH, Jeff TUCKEY, Jerome F. DULUK, JR., Lacky SHAH, Luke DURANT, Jack CHOQUETTE, Eric WERNESS, Naman GOVIL, Manan PATEL, Shayani DEB, Sandeep NAVADA, John EDMONDSON, Greg PALMER, Wish GANDHI, Ravi MANYAM, Apoorv PARLE, Olivier GIROUX, Shirish GADRE, Steve HEINRICH
Cooperative Group Arrays

Publication number: 20230289215

Abstract: A new level(s) of hierarchy—Cooperate Group Arrays (CGAs)—and an associated new hardware-based work distribution/execution model is described. A CGA is a grid of thread blocks (also referred to as cooperative thread arrays (CTAs)). CGAs provide co-scheduling, e.g., control over where CTAs are placed/executed in a processor (such as a GPU), relative to the memory required by an application and relative to each other. Hardware support for such CGAs guarantees concurrency and enables applications to see more data locality, reduced latency, and better synchronization between all the threads in tightly cooperating collections of CTAs programmably distributed across different (e.g., hierarchical) hardware domains or partitions.

Type: Application

Filed: March 10, 2022

Publication date: September 14, 2023

Inventors: Greg PALMER, Gentaro HIROTA, Ronny KRASHINSKY, Ze LONG, Brian PHARRIS, Rajballav DASH, Jeff TUCKEY, Jerome F. DULUK, JR., Lacky SHAH, Luke DURANT, Jack CHOQUETTE, Eric WERNESS, Naman GOVIL, Manan PATEL, Shayani DEB, Sandeep NAVADA, John EDMONDSON, Prakash BANGALORE PRABHAKAR, Wish GANDHI, Ravi MANYAM, Apoorv PARLE, Olivier GIROUX, Shirish GADRE, Steve HEINRICH
Techniques for Scalable Load Balancing of Thread Groups in a Processor

Publication number: 20230289211

Abstract: A processor supports new thread group hierarchies by centralizing work distribution to provide hardware-guaranteed concurrent execution of thread groups in a thread group array through speculative launch and load balancing across processing cores. Efficiencies are realized by distributing grid rasterization among the processing cores.

Type: Application

Filed: March 10, 2022

Publication date: September 14, 2023

Inventors: Gentaro HIROTA, Tanmoy MANDAL, Jeff TUCKEY, Kevin STEPHANO, Chen MEI, Shayani DEB, Naman GOVIL, Rajballav DASH, Ronny KRASHINSKY, Ze LONG, Brian PHARRIS
EFFICIENTLY LAUNCHING TASKS ON A PROCESSOR

Publication number: 20230236878

Abstract: In various embodiments, scheduling dependencies associated with tasks executed on a processor are decoupled from data dependencies associated with the tasks. Before the completion of a first task that is executing in the processor, a scheduling dependency specifying that a second task is dependent on the first task is resolved based on a pre-exit trigger. In response to the resolution of the scheduling dependency, the second task is launched on the processor.

Type: Application

Filed: January 25, 2022

Publication date: July 27, 2023

Inventors: Jack Hilaire CHOQUETTE, Rajballav DASH, Shayani DEB, Gentaro HIROTA, Ronny M. KRASHINSKY, Ze LONG, Chen MEI, Manan PATEL, Ming Y. SIU

1 2 next