Patents by Inventor Mark Hummel

Mark Hummel has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20240137410
    Abstract: Systems and techniques for performing multicast-reduction operations. In at least one embodiment, a network device receives first network data associated with a multicast operation to be collectively performed by at least a plurality of endpoints. The network device reserves resources to process second network data to be received from the endpoints, and sends the first network data to a plurality of additional network devices. The network device receives the second network data, and processes the second network data using the reserved resources.
    Type: Application
    Filed: December 19, 2023
    Publication date: April 25, 2024
    Inventors: Glenn Dearth, Mark Hummel, Nan Jiang, Gregory Thorson
  • Patent number: 11956306
    Abstract: Systems and techniques for performing multicast-reduction operations. In at least one embodiment, a network device receives first network data associated with a multicast operation to be collectively performed by at least a plurality of endpoints. The network device reserves resources to process second network data to be received from the endpoints, and sends the first network data to a plurality of additional network devices. The network device receives the second network data, and processes the second network data using the reserved resources.
    Type: Grant
    Filed: March 30, 2022
    Date of Patent: April 9, 2024
    Assignee: NVIDIA CORPORATION
    Inventors: Glenn Dearth, Mark Hummel, Nan Jiang, Gregory Thorson
  • Publication number: 20240098139
    Abstract: Systems and techniques for performing multicast-reduction operations. In at least one embodiment, a network device receives first network data associated with a multicast operation to be collectively performed by at least a plurality of endpoints. The network device reserves resources to process second network data to be received from the endpoints, and sends the first network data to a plurality of additional network devices. The network device receives the second network data, and processes the second network data using the reserved resources.
    Type: Application
    Filed: March 30, 2022
    Publication date: March 21, 2024
    Inventors: Glenn Dearth, Mark Hummel, Nan Jiang, Gregory Thorson
  • Publication number: 20240069736
    Abstract: Various embodiments include techniques for performing self-synchronizing remote memory operations in a multiprocessor computing system. During a remote memory operation in the multiprocessor computing system, a source processing unit transmits multiple segments of data to a destination processing. For each segment of data, the source processing unit transmits a remote memory operation to the destination processing unit that includes associated metadata that identifies the memory location of a corresponding synchronization object. The remote memory operation along with the metadata is transmitted as a single unit to the destination processing unit. The destination processing unit splits the operation into the remote memory operation and the memory synchronization operation. As a result, the source processing unit avoids the need to perform a separate memory synchronization operation, thereby reducing inter-processor communications and increasing performance of remote memory operations.
    Type: Application
    Filed: August 31, 2022
    Publication date: February 29, 2024
    Inventors: Srinivas Santosh Kumar MADUGULA, Olivier GIROUX, Wishwesh Anil GANDHI, Michael Allen PARKER, Raghuram L, Ivan TANASIC, Manan PATEL, Mark HUMMEL, Alexander L. MINKIN
  • Patent number: 11863390
    Abstract: Apparatuses, systems, and techniques are presented to configure computing resources to perform various tasks. In at least one embodiment, an approach presented herein can be used to verify whether a network of computing nodes is properly configured based, at least in part, on one or more expected data strings generated by the network of computing nodes.
    Type: Grant
    Filed: August 16, 2022
    Date of Patent: January 2, 2024
    Assignee: Nvidia Corporation
    Inventors: Miriam Menes, Eitan Zahavi, Gil Bloch, Ahmad Atamli, Meni Orenbach, Mark Hummel, Glenn Dearth
  • Publication number: 20230401159
    Abstract: A method and system for providing memory in a computer system. The method includes receiving a memory access request for a shared memory address from a processor, mapping the received memory access request to at least one virtual memory pool to produce a mapping result, and providing the mapping result to the processor.
    Type: Application
    Filed: August 24, 2023
    Publication date: December 14, 2023
    Applicants: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Anthony Asaro, Kevin Normoyle, Mark Hummel
  • Patent number: 11822491
    Abstract: Fabric Attached Memory (FAM) provides a pool of memory that can be accessed by one or more processors, such as a graphics processing unit(s) (GPU)(s), over a network fabric. In one instance, a technique is disclosed for using imperfect processors as memory controllers to allow memory, which is local to the imperfect processors, to be accessed by other processors as fabric attached memory. In another instance, memory address compaction is used within the fabric elements to fully utilize the available memory space.
    Type: Grant
    Filed: October 20, 2021
    Date of Patent: November 21, 2023
    Assignee: NVIDIA Corporation
    Inventors: John Feehrer, Denis Foley, Mark Hummel, Vyas Venkataraman, Ram Gummadi, Samuel H. Duncan, Glenn Dearth, Brian Kelleher
  • Patent number: 11741019
    Abstract: A method and system for allocating memory to a memory operation executed by a processor in a computer arrangement having a plurality of processors. The method includes receiving a memory operation from a processor that references an address in a shared memory, mapping the received memory operation to at least one virtual memory pool to produce a mapping result, and providing the mapping result to the processor.
    Type: Grant
    Filed: September 10, 2021
    Date of Patent: August 29, 2023
    Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Anthony Asaro, Kevin Normoyle, Mark Hummel
  • Publication number: 20230229524
    Abstract: In various examples, a single notification (e.g., a request for a memory access operation) that a processing element (PE) has reached a synchronization barrier may be propagated to multiple physical addresses (PAs) and/or devices associated with multiple processing elements. Thus, the notification may allow an indication that the processing element has reached the synchronization barrier to be recoded at multiple targets. Each notification may access the PAs of each PE and/or device of a barrier group to update a corresponding counter. The PEs and/or devices may poll or otherwise use the counter to determine when each PE of the group has reached the synchronization barrier. When a corresponding counter indicates synchronization at the synchronization barrier, a PE may proceed with performing a compute task asynchronously with one or more other PEs until a subsequent synchronization barrier may be reached.
    Type: Application
    Filed: January 18, 2022
    Publication date: July 20, 2023
    Inventors: Glenn Alan Dearth, Mark Hummel, Daniel Joseph Lustig
  • Publication number: 20230229599
    Abstract: In various examples, a memory model may support multicasting where a single request for a memory access operation may be propagated to multiple physical addresses associated with multiple processing elements (e.g., corresponding to respective local memory). Thus, the request may cause data to be read from and/or written to memory for each of the processing elements. In some examples, a memory model exposes multicasting to processes. This may include providing for separate multicast and unicast instructions or shared instructions with one or more parameters (e.g., indicating a virtual address) being used to indicate multicasting or unicasting. Additionally or alternatively, whether a request(s) is processed using multicasting or unicasting may be opaque to a process and/or application or may otherwise be determined by the system. One or more constraints may be imposed on processing requests using multicasting to maintain a coherent memory interface.
    Type: Application
    Filed: January 18, 2022
    Publication date: July 20, 2023
    Inventors: Glenn Alan Dearth, Mark Hummel, Daniel Joseph Lustig
  • Publication number: 20230224239
    Abstract: Apparatuses, systems, and techniques to multicast a transaction to a group of targets. In at least one embodiment, a set is selected from alternate sets of directives associated with the group of targets, and the transaction is transmitted to the group of targets in accordance with the selected set.
    Type: Application
    Filed: January 13, 2022
    Publication date: July 13, 2023
    Inventors: Glenn Dearth, Nan Jiang, Mark Hummel, Richard Reeves
  • Publication number: 20220043759
    Abstract: Fabric Attached Memory (FAM) provides a pool of memory that can be accessed by one or more processors, such as a graphics processing unit(s) (GPU)(s), over a network fabric. In one instance, a technique is disclosed for using imperfect processors as memory controllers to allow memory, which is local to the imperfect processors, to be accessed by other processors as fabric attached memory. In another instance, memory address compaction is used within the fabric elements to fully utilize the available memory space.
    Type: Application
    Filed: October 20, 2021
    Publication date: February 10, 2022
    Inventors: John FEEHRER, Denis FOLEY, Mark HUMMEL, Vyas VENKATARAMAN, Ram GUMMADI, Samuel H. DUNCAN, Glenn DEARTH, Brian KELLEHER
  • Patent number: 11233730
    Abstract: Introduced herein is a routing technique that, for example, routes a transaction to a destination port over a network that supports link aggregation and multi-port connection. In one embodiment, two tables that can be searched based on the target and supplemental routing IDs of the transaction are utilized to route the transaction to the proper port of the destination endpoint. In an embodiment, the first table provides a list of available ports at each hop/route point that can route the transaction to the destination endpoint, and the second table provides a supplemental routing ID that can select a specific group of ports from the first table that can correctly route the transaction to the proper port.
    Type: Grant
    Filed: December 2, 2019
    Date of Patent: January 25, 2022
    Assignee: Nvidia Corporation
    Inventors: Glenn Dearth, Mark Hummel
  • Publication number: 20210406196
    Abstract: A method and system for allocating memory to a memory operation executed by a processor in a computer arrangement having a plurality of processors. The method includes receiving a memory operation from a processor that references an address in a shared memory, mapping the received memory operation to at least one virtual memory pool to produce a mapping result, and providing the mapping result to the processor.
    Type: Application
    Filed: September 10, 2021
    Publication date: December 30, 2021
    Applicants: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Anthony Asaro, Kevin Normoyle, Mark Hummel
  • Patent number: 11182309
    Abstract: Fabric Attached Memory (FAM) provides a pool of memory that can be accessed by one or more processors, such as a graphics processing unit(s) (GPU)(s), over a network fabric. In one instance, a technique is disclosed for using imperfect processors as memory controllers to allow memory, which is local to the imperfect processors, to be accessed by other processors as fabric attached memory. In another instance, memory address compaction is used within the fabric elements to fully utilize the available memory space.
    Type: Grant
    Filed: November 4, 2019
    Date of Patent: November 23, 2021
    Assignee: NVIDIA Corporation
    Inventors: John Feehrer, Denis Foley, Mark Hummel, Vyas Venkataraman, Ram Gummadi, Samuel H. Duncan, Glenn Dearth, Brian Kelleher
  • Patent number: 11119944
    Abstract: A method and system for allocating memory to a memory operation executed by a processor in a computer arrangement having a plurality of processors. The method includes receiving a memory operation from a processor that receives a memory operation from a processor that references an address in a shared memory, mapping the received memory operation to at least one of a plurality of virtual memory pools to produce a mapping result, and providing the mapping result to the processor.
    Type: Grant
    Filed: June 17, 2019
    Date of Patent: September 14, 2021
    Assignees: Advanced Micro Devices, Inc., ATI Technologies ULC
    Inventors: Anthony Asaro, Kevin Normoyle, Mark Hummel
  • Patent number: 11082347
    Abstract: Multiple processors are often used in computing systems to solve very large, complex problems, such as those encountered in artificial intelligence. Such processors typically exchange data among each other via an interconnect fabric (such as, e.g., a group of network connections and switches) in solving such complex problems. The amount of data injected into the interconnect fabric by the processors can at times overwhelm the interconnect fabric preventing some of the processors from communicating with each other. To address this problem, techniques are disclosed to enable, for example, processors that are connected to an interconnect fabric to coordinate and control the amount of data injected so that the interconnect fabric does not get overwhelmed.
    Type: Grant
    Filed: February 15, 2019
    Date of Patent: August 3, 2021
    Assignee: Nvidia Corporation
    Inventors: Glenn Dearth, Nan Jiang, John Wortman, Alex Ishii, Mark Hummel, Rich Reeves
  • Patent number: 11038800
    Abstract: An endpoint in a network may make posted or non-posted write requests to another endpoint in the network. For a non-posted write request, the target endpoint provides a response to the requesting endpoint indicating that the write request has been serviced. For a posted write request, the target endpoint does not provide such an acknowledgment. Hence, posted write requests have lower overhead, but they suffer from potential synchronization and resiliency issues. While non-posted write requests do not have those issues, they cause increased load on the network because such requests require the target endpoint to acknowledge each write request. Introduced herein is a network operation technique that uses non-posted transactions while maintaining a load overhead of the network as a manageable level. The introduced technique reduces the load overhead of the non-posted write requests by collapsing and reducing a number of the responses.
    Type: Grant
    Filed: August 28, 2019
    Date of Patent: June 15, 2021
    Assignee: Nvidia Corporation
    Inventors: Glenn Dearth, Mark Hummel, Jonathan Owen, Mike Osborn, John Wortman, Rich Reeves
  • Publication number: 20210133123
    Abstract: Fabric Attached Memory (FAM) provides a pool of memory that can be accessed by one or more processors, such as a graphics processing unit(s) (GPU)(s), over a network fabric. In one instance, a technique is disclosed for using imperfect processors as memory controllers to allow memory, which is local to the imperfect processors, to be accessed by other processors as fabric attached memory. In another instance, memory address compaction is used within the fabric elements to fully utilize the available memory space.
    Type: Application
    Filed: November 4, 2019
    Publication date: May 6, 2021
    Inventors: John FEEHRER, Denis Foley, Mark Hummel, Vyas Venkataraman, Ram Gummadi, Samuel H. Duncan, Glenn Dearth, Brian Kelleher
  • Publication number: 20210067449
    Abstract: An endpoint in a network may make posted or non-posted write requests to another endpoint in the network. For a non-posted write request, the target endpoint provides a response to the requesting endpoint indicating that the write request has been serviced. For a posted write request, the target endpoint does not provide such an acknowledgment. Hence, posted write requests have lower overhead, but they suffer from potential synchronization and resiliency issues. While non-posted write requests do not have those issues, they cause increased load on the network because such requests require the target endpoint to acknowledge each write request. Introduced herein is a network operation technique that uses non-posted transactions while maintaining a load overhead of the network as a manageable level. The introduced technique reduces the load overhead of the non-posted write requests by collapsing and reducing a number of the responses.
    Type: Application
    Filed: August 28, 2019
    Publication date: March 4, 2021
    Inventors: Glenn Dearth, Mark Hummel, Jonathan Owen, Mike Osborn, John Wortman, Rich Reeves