Patents by Inventor ZE LONG

ZE LONG has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250085973
    Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, one or more software kernels are caused to indicate one or more dependencies among two or more software kernels. In at least one embodiment, one or more software kernels are performed based on one or more kernel dependencies.
    Type: Application
    Filed: September 11, 2023
    Publication date: March 13, 2025
    Applicant: NVIDIA Corporation
    Inventors: Ze Long, Stephen Anthony Bernard Jones, Pradeep Moorthy, Mark Theng
  • Patent number: 12248788
    Abstract: Distributed shared memory (DSMEM) comprises blocks of memory that are distributed or scattered across a processor (such as a GPU). Threads executing on a processing core local to one memory block are able to access a memory block local to a different processing core. In one embodiment, shared access to these DSMEM allocations distributed across a collection of processing cores is implemented by communications between the processing cores. Such distributed shared memory provides very low latency memory access for processing cores located in proximity to the memory blocks, and also provides a way for more distant processing cores to also access the memory blocks in a manner and using interconnects that do not interfere with the processing cores' access to main or global memory such as hacked by an L2 cache.
    Type: Grant
    Filed: March 10, 2022
    Date of Patent: March 11, 2025
    Assignee: NVIDIA Corporation
    Inventors: Prakash Bangalore Prabhakar, Gentaro Hirota, Ronny Krashinsky, Ze Long, Brian Pharris, Rajballav Dash, Jeff Tuckey, Jerome F. Duluk, Jr., Lacky Shah, Luke Durant, Jack Choquette, Eric Werness, Naman Govil, Manan Patel, Shayani Deb, Sandeep Navada, John Edmondson, Greg Palmer, Wish Gandhi, Ravi Manyam, Apoorv Parle, Olivier Giroux, Shirish Gadre, Steve Heinrich
  • Publication number: 20250078199
    Abstract: A GPU can selectively confine software function execution and associated data storage resources to locally-connected processing/storage components, thereby minimizing latency and other overhead that would otherwise be needed to access more remote resources. The GPU can selectively permit other software function execution and associated data storage resources to range across non-locally-connected processing/storage components when more processing and/or storage resources are required.
    Type: Application
    Filed: August 29, 2023
    Publication date: March 6, 2025
    Inventors: Wish GANDHI, Ze LONG, Kun FANG
  • Publication number: 20240036956
    Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to indicate whether one or more threads within a group of blocks of threads have performed a barrier instruction and to cause performance of one or more threads within the group of blocks of threads to stop at least until all threads within the group of blocks have performed the barrier instruction.
    Type: Application
    Filed: September 28, 2022
    Publication date: February 1, 2024
    Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
  • Publication number: 20240036952
    Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to determine which of two or more blocks of threads are to be scheduled in parallel.
    Type: Application
    Filed: September 28, 2022
    Publication date: February 1, 2024
    Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
  • Publication number: 20240036957
    Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to cause memory to be shared between two or more groups of blocks of threads.
    Type: Application
    Filed: September 28, 2022
    Publication date: February 1, 2024
    Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
  • Publication number: 20240036955
    Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to indicate one or more limitations of one or more attributes of one or more groups of blocks of one or more threads.
    Type: Application
    Filed: September 28, 2022
    Publication date: February 1, 2024
    Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
  • Publication number: 20240036915
    Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to determine a scheduling policy of one or more blocks of one or more threads.
    Type: Application
    Filed: September 28, 2022
    Publication date: February 1, 2024
    Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
  • Publication number: 20240036953
    Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to indicate a scheduling policy of one or more blocks of one or more threads.
    Type: Application
    Filed: September 28, 2022
    Publication date: February 1, 2024
    Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
  • Publication number: 20240036951
    Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to indicate two or more blocks of threads to be scheduled in parallel.
    Type: Application
    Filed: September 28, 2022
    Publication date: February 1, 2024
    Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
  • Publication number: 20240036944
    Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to indicate whether one or more threads within two or more blocks of threads have performed a barrier instruction.
    Type: Application
    Filed: September 28, 2022
    Publication date: February 1, 2024
    Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
  • Publication number: 20240036918
    Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to cause a kernel to be generated to cause two or more blocks of two or more threads to be scheduled in parallel.
    Type: Application
    Filed: September 28, 2022
    Publication date: February 1, 2024
    Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
  • Publication number: 20240036917
    Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to indicate a maximum number of blocks of threads to be scheduled in parallel.
    Type: Application
    Filed: September 28, 2022
    Publication date: February 1, 2024
    Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
  • Publication number: 20240036954
    Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to indicate one or more attributes of one or more groups of blocks of one or more threads.
    Type: Application
    Filed: September 28, 2022
    Publication date: February 1, 2024
    Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
  • Publication number: 20240036945
    Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to cause performance of one or more threads within a group of blocks of threads to stop at least until all threads within the group of blocks have performed a barrier instruction.
    Type: Application
    Filed: September 28, 2022
    Publication date: February 1, 2024
    Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
  • Publication number: 20240036916
    Abstract: Apparatuses, systems, and techniques to execute CUDA programs. In at least one embodiment, an application programming interface is performed to indicate a maximum number of blocks of threads capable of being scheduled in parallel.
    Type: Application
    Filed: September 28, 2022
    Publication date: February 1, 2024
    Inventors: Ze Long, Kyrylo Perelygin, Harold Carter Edwards, Gokul Ramaswamy Hirisave Chandra Shekhara, Jaydeep Marathe, Ronny Meir Krashinsky, Girish Bhaskarrao Bharambe
  • Publication number: 20230289215
    Abstract: A new level(s) of hierarchy—Cooperate Group Arrays (CGAs)—and an associated new hardware-based work distribution/execution model is described. A CGA is a grid of thread blocks (also referred to as cooperative thread arrays (CTAs)). CGAs provide co-scheduling, e.g., control over where CTAs are placed/executed in a processor (such as a GPU), relative to the memory required by an application and relative to each other. Hardware support for such CGAs guarantees concurrency and enables applications to see more data locality, reduced latency, and better synchronization between all the threads in tightly cooperating collections of CTAs programmably distributed across different (e.g., hierarchical) hardware domains or partitions.
    Type: Application
    Filed: March 10, 2022
    Publication date: September 14, 2023
    Inventors: Greg PALMER, Gentaro HIROTA, Ronny KRASHINSKY, Ze LONG, Brian PHARRIS, Rajballav DASH, Jeff TUCKEY, Jerome F. DULUK, JR., Lacky SHAH, Luke DURANT, Jack CHOQUETTE, Eric WERNESS, Naman GOVIL, Manan PATEL, Shayani DEB, Sandeep NAVADA, John EDMONDSON, Prakash BANGALORE PRABHAKAR, Wish GANDHI, Ravi MANYAM, Apoorv PARLE, Olivier GIROUX, Shirish GADRE, Steve HEINRICH
  • Publication number: 20230289189
    Abstract: Distributed shared memory (DSMEM) comprises blocks of memory that are distributed or scattered across a processor (such as a GPU). Threads executing on a processing core local to one memory block are able to access a memory block local to a different processing core. In one embodiment, shared access to these DSMEM allocations distributed across a collection of processing cores is implemented by communications between the processing cores. Such distributed shared memory provides very low latency memory access for processing cores located in proximity to the memory blocks, and also provides a way for more distant processing cores to also access the memory blocks in a manner and using interconnects that do not interfere with the processing cores' access to main or global memory such as hacked by an L2 cache.
    Type: Application
    Filed: March 10, 2022
    Publication date: September 14, 2023
    Inventors: Prakash BANGALORE PRABHAKAR, Gentaro HIROTA, Ronny KRASHINSKY, Ze LONG, Brian PHARRIS, Rajballav DASH, Jeff TUCKEY, Jerome F. DULUK, JR., Lacky SHAH, Luke DURANT, Jack CHOQUETTE, Eric WERNESS, Naman GOVIL, Manan PATEL, Shayani DEB, Sandeep NAVADA, John EDMONDSON, Greg PALMER, Wish GANDHI, Ravi MANYAM, Apoorv PARLE, Olivier GIROUX, Shirish GADRE, Steve HEINRICH
  • Publication number: 20230289211
    Abstract: A processor supports new thread group hierarchies by centralizing work distribution to provide hardware-guaranteed concurrent execution of thread groups in a thread group array through speculative launch and load balancing across processing cores. Efficiencies are realized by distributing grid rasterization among the processing cores.
    Type: Application
    Filed: March 10, 2022
    Publication date: September 14, 2023
    Inventors: Gentaro HIROTA, Tanmoy MANDAL, Jeff TUCKEY, Kevin STEPHANO, Chen MEI, Shayani DEB, Naman GOVIL, Rajballav DASH, Ronny KRASHINSKY, Ze LONG, Brian PHARRIS
  • Publication number: 20230236878
    Abstract: In various embodiments, scheduling dependencies associated with tasks executed on a processor are decoupled from data dependencies associated with the tasks. Before the completion of a first task that is executing in the processor, a scheduling dependency specifying that a second task is dependent on the first task is resolved based on a pre-exit trigger. In response to the resolution of the scheduling dependency, the second task is launched on the processor.
    Type: Application
    Filed: January 25, 2022
    Publication date: July 27, 2023
    Inventors: Jack Hilaire CHOQUETTE, Rajballav DASH, Shayani DEB, Gentaro HIROTA, Ronny M. KRASHINSKY, Ze LONG, Chen MEI, Manan PATEL, Ming Y. SIU