Patents by Inventor Benjamin Klenk
Benjamin Klenk has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Publication number: 20240184927
Abstract: Messaging protocols used by components in a messaging system to exchange messages conventionally use a reliability mechanism to ensure that each message sent by a sender is received, without compromise, by the intended receiver. Typically, this reliability mechanism involves use of a returned acknowledgement message to the message sender, with automatic retransmission of the message by the sender when the acknowledgement message is not received (e.g. within a defined timeframe). However, existing acknowledgement-based reliability mechanisms require that a sender identifier be included in the message header, which increases the overhead of the message. The present disclosure provides an epoch-based reliability mechanism that allows the sender identifier to be omitted from the message header to minimize overhead and maximize the efficient use of the available bandwidth.
Type: Application
Filed: December 2, 2022
Publication date: June 6, 2024
Inventors: Benjamin Klenk, Al Davis, Larry Robert Dennison
-
Publication number: 20230327996
Abstract: Aggregation of small payloads from multiple packets may improve bandwidth efficiency of a network, particularly a high-performance compute cluster with thousands of network endpoints and distributed data. Aggregation is context-based and a packet header is reduced because the common components that are shared by the aggregated messages are included once within the header. Execution contexts are explicitly created and destroyed by application programs. Each participating endpoint stores context-specific properties until the context is destroyed, so that the properties are not included in the header. Aggregation may be performed at different hierarchical levels by switches and/or endpoints.
Type: Application
Filed: January 4, 2023
Publication date: October 12, 2023
Inventors: Benjamin Klenk, Alan Lynn Davis, Larry Robert Dennison
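The bandwidth argument in this abstract can be illustrated with a minimal sketch. It is not the patented design: the 16-byte header size, the `Message` and `AggregatedPacket` types, and the grouping by a single `context_id` are all assumptions made for illustration, and per-payload framing (such as length fields) is ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, List

HEADER_BYTES = 16  # assumed header size; the abstract gives no figure


@dataclass(frozen=True)
class Message:
    context_id: int  # hypothetical identifier of the execution context
    payload: bytes


@dataclass
class AggregatedPacket:
    context_id: int
    payloads: List[bytes] = field(default_factory=list)


def aggregate(messages: List[Message]) -> List[AggregatedPacket]:
    """Group small payloads that share a context into one packet each."""
    packets: Dict[int, AggregatedPacket] = {}
    for msg in messages:
        if msg.context_id not in packets:
            packets[msg.context_id] = AggregatedPacket(msg.context_id)
        packets[msg.context_id].payloads.append(msg.payload)
    return list(packets.values())


def wire_bytes_unaggregated(messages: List[Message]) -> int:
    """Every message pays for its own header."""
    return sum(HEADER_BYTES + len(m.payload) for m in messages)


def wire_bytes_aggregated(packets: List[AggregatedPacket]) -> int:
    """The shared header is paid once per aggregated packet."""
    return sum(HEADER_BYTES + sum(len(p) for p in pkt.payloads) for pkt in packets)
```

With three 2-byte messages spread over two contexts, the sketch sends two headers instead of three, which is where the saving comes from when payloads are small relative to the header.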
-
Patent number: 11502867
Abstract: A network device configured to perform scalable, in-network computations is described. The network device is configured to process pull requests and/or push requests from a plurality of endpoints connected to the network. A collective communication primitive from a particular endpoint can be received at a network device. The collective communication primitive is associated with a multicast region of a shared global address space and is mapped to a plurality of participating endpoints. The network device is configured to perform an in-network computation based on information received from the participating endpoints before forwarding a response to the collective communication primitive back to one or more of the participating endpoints.
Type: Grant
Filed: July 24, 2020
Date of Patent: November 15, 2022
Assignee: NVIDIA Corporation
Inventors: Benjamin Klenk, Nan Jiang, Larry Robert Dennison
-
Patent number: 11463272
Abstract: A network device configured to perform scalable, in-network computations is described. The network device is configured to process pull requests and/or push requests from a plurality of endpoints connected to the network. A collective communication primitive from a particular endpoint can be received at a network device. The collective communication primitive is associated with a multicast region of a shared global address space and is mapped to a plurality of participating endpoints. The network device is configured to perform an in-network computation based on information received from the participating endpoints before forwarding a response to the collective communication primitive back to one or more of the participating endpoints. The endpoints can inject pull requests (e.g., load commands) and/or push requests (e.g., store commands) into the network. A multicast capability enables tasks, such as a reduction operation, to be offloaded to hardware in the network device.
Type: Grant
Filed: October 6, 2021
Date of Patent: October 4, 2022
Assignee: NVIDIA Corporation
Inventors: Benjamin Klenk, Nan Jiang, Larry Robert Dennison, Gregory M. Thorson
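The reduction offload described in this abstract can be pictured with a small software model. This is only a sketch under stated assumptions: the `ReductionSwitch` class, its `push` method, and the choice of summation as the reduction are hypothetical, and a real network device would do this in hardware against a multicast region of the address space.

```python
from typing import Dict, Hashable, Iterable, Optional


class ReductionSwitch:
    """Toy model of a network device that reduces pushed values (assumed API)."""

    def __init__(self, participants: Iterable[Hashable]) -> None:
        # Endpoints mapped to the (hypothetical) multicast region.
        self.participants = set(participants)
        self.pending: Dict[Hashable, float] = {}

    def push(self, endpoint: Hashable, value: float) -> Optional[Dict[Hashable, float]]:
        """Record one contribution; respond only once all endpoints have pushed."""
        if endpoint not in self.participants:
            raise ValueError(f"{endpoint!r} is not a participating endpoint")
        self.pending[endpoint] = value
        if set(self.pending) != self.participants:
            return None  # still waiting for other endpoints
        result = sum(self.pending.values())  # the in-network computation
        self.pending.clear()
        # Forward the reduced value back to every participating endpoint.
        return {ep: result for ep in self.participants}
```

In this model the endpoints never exchange values with each other directly; each sends one push and receives one reduced response, which is the traffic pattern the abstract attributes to offloading the reduction into the network.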
-
Patent number: 11341369
Abstract: A technique for performing data parallel training of a neural network model is disclosed that incorporates batch normalization techniques using partial populations to generate normalization parameters. The technique involves processing, by each processor of a plurality of processors in parallel, a first portion of a sub-batch of training samples allocated to the processor to generate activations for the first portion of the sub-batch. Each processor analyzes the activations and transmits statistical measures for the first portion to an additional processor that reduces the statistical measures from multiple processors to generate normalization parameters for a partial population of the training samples that includes the first portion from each of the plurality of processors. The normalization parameters are then transmitted back to each of the processors to normalize the activations for both the first portion and a second portion of the sub-batch of training samples allocated to each processor.
Type: Grant
Filed: October 31, 2019
Date of Patent: May 24, 2022
Assignee: NVIDIA Corporation
Inventors: Larry Robert Dennison, Benjamin Klenk
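The data flow in this abstract can be sketched in plain Python. The sketch makes several assumptions that are not in the abstract: each activation is a single scalar, the statistical measures are a count, a sum, and a sum of squares, and the processors are simulated sequentially in one process. All function names are hypothetical.

```python
import math
from typing import List, Tuple

Stats = Tuple[int, float, float]  # (count, sum, sum of squares)


def local_stats(activations: List[float]) -> Stats:
    """Statistical measures one processor sends for its first portion."""
    return len(activations), sum(activations), sum(a * a for a in activations)


def reduce_stats(stats: List[Stats]) -> Tuple[float, float]:
    """Reduce per-processor measures into a mean and variance."""
    n = sum(s[0] for s in stats)
    total = sum(s[1] for s in stats)
    total_sq = sum(s[2] for s in stats)
    mean = total / n
    var = max(total_sq / n - mean * mean, 0.0)  # clamp rounding error
    return mean, var


def normalize(activations: List[float], mean: float, var: float,
              eps: float = 1e-5) -> List[float]:
    scale = 1.0 / math.sqrt(var + eps)
    return [(a - mean) * scale for a in activations]


def partial_population_batch_norm(sub_batches: List[List[float]],
                                  first_portion_size: int) -> List[List[float]]:
    """Normalize every sub-batch with parameters from the first portions only."""
    stats = [local_stats(sb[:first_portion_size]) for sb in sub_batches]
    mean, var = reduce_stats(stats)
    # The same parameters are applied to the first and second portions.
    return [normalize(sb, mean, var) for sb in sub_batches]
```

The point of the arrangement is that the reduction only has to wait for the first portion of each sub-batch, so in a real system the second portion could presumably be processed while the statistics are in flight.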
-
Patent number: 11336476
Abstract: A network device configured to perform scalable, in-network computations is described. The network device is configured to process pull requests and/or push requests from a plurality of endpoints connected to the network. A collective communication primitive from a particular endpoint can be received at a network device. The collective communication primitive is associated with a multicast region of a shared global address space and is mapped to a plurality of participating endpoints. The network device is configured to perform an in-network computation based on information received from the participating endpoints before forwarding a response to the collective communication primitive back to one or more of the participating endpoints. The endpoints can inject pull requests (e.g., load commands) and/or push requests (e.g., store commands) into the network. A multicast capability enables tasks, such as a reduction operation, to be offloaded to hardware in the network device.
Type: Grant
Filed: July 24, 2020
Date of Patent: May 17, 2022
Assignee: NVIDIA Corporation
Inventors: Benjamin Klenk, Nan Jiang, Larry Robert Dennison, Gregory M. Thorson
-
Publication number: 20220029845
Abstract: A network device configured to perform scalable, in-network computations is described. The network device is configured to process pull requests and/or push requests from a plurality of endpoints connected to the network. A collective communication primitive from a particular endpoint can be received at a network device. The collective communication primitive is associated with a multicast region of a shared global address space and is mapped to a plurality of participating endpoints. The network device is configured to perform an in-network computation based on information received from the participating endpoints before forwarding a response to the collective communication primitive back to one or more of the participating endpoints. The endpoints can inject pull requests (e.g., load commands) and/or push requests (e.g., store commands) into the network. A multicast capability enables tasks, such as a reduction operation, to be offloaded to hardware in the network device.
Type: Application
Filed: October 6, 2021
Publication date: January 27, 2022
Inventors: Benjamin Klenk, Nan Jiang, Larry Robert Dennison, Gregory M. Thorson
-
Patent number: 11171798
Abstract: A network device configured to perform scalable, in-network computations is described. The network device is configured to process pull requests and/or push requests from a plurality of endpoints connected to the network. A collective communication primitive from a particular endpoint can be received at a network device. The collective communication primitive is associated with a multicast region of a shared global address space and is mapped to a plurality of participating endpoints. The network device is configured to perform an in-network computation based on information received from the participating endpoints before forwarding a response to the collective communication primitive back to one or more of the participating endpoints. The endpoints can inject pull requests (e.g., load commands) and/or push requests (e.g., store commands) into the network. A multicast capability enables tasks, such as a reduction operation, to be offloaded to hardware in the network device.
Type: Grant
Filed: July 24, 2020
Date of Patent: November 9, 2021
Assignee: NVIDIA Corporation
Inventors: Benjamin Klenk, Nan Jiang, Larry Robert Dennison, Gregory M. Thorson
-
Patent number: 11170263
Abstract: A technique utilizing speculative execution and rollback for performing data parallel training of a neural network model is disclosed. Activations for a layer of the neural network model are normalized during a speculative normalization operation using estimated normalization parameters associated with a partial population of a set of training data allocated to a particular processor. Normalization parameters associated with the total population of the set of training data are generated by a distributed reduce operation in parallel with the speculative normalization operation. An optional rollback operation can revert the activations to a pre-normalization state if the estimated normalization parameters for the partial population are subsequently determined to be inaccurate compared to the normalization parameters for the population of the set of training data distributed across a plurality of processors.
Type: Grant
Filed: October 31, 2019
Date of Patent: November 9, 2021
Assignee: NVIDIA Corporation
Inventors: Larry Robert Dennison, Benjamin Klenk
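The speculate-then-check pattern in this abstract can be sketched as follows. Several parts are assumptions rather than anything the abstract specifies: the accuracy test is a simple absolute tolerance on mean and variance, the distributed reduce is replaced by a direct computation over all shards, the two steps run sequentially instead of in parallel, and the rollback is followed by renormalizing with the population parameters. The function names are hypothetical.

```python
import math
from typing import List, Tuple


def mean_var(values: List[float]) -> Tuple[float, float]:
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, var


def normalize(values: List[float], mean: float, var: float,
              eps: float = 1e-5) -> List[float]:
    scale = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * scale for v in values]


def speculative_normalize(local: List[float], all_shards: List[List[float]],
                          tolerance: float = 0.1) -> Tuple[List[float], bool]:
    """Return (normalized activations, whether a rollback happened)."""
    saved = list(local)  # pre-normalization state kept for rollback

    # Speculative step: use parameters estimated from the local partial population.
    est_mean, est_var = mean_var(local)
    speculative = normalize(local, est_mean, est_var)

    # Stand-in for the distributed reduce over the total population.
    population = [v for shard in all_shards for v in shard]
    true_mean, true_var = mean_var(population)

    # Assumed accuracy test: absolute tolerance on both parameters.
    accurate = (abs(est_mean - true_mean) <= tolerance
                and abs(est_var - true_var) <= tolerance)
    if accurate:
        return speculative, False

    # Rollback: discard the speculative result and redo from the saved state.
    return normalize(saved, true_mean, true_var), True
```

When the shards are statistically similar the speculative result is kept and the reduce latency is hidden; when they differ, the saved activations are renormalized with the population parameters.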
-
Publication number: 20210036877
Abstract: A network device configured to perform scalable, in-network computations is described. The network device is configured to process pull requests and/or push requests from a plurality of endpoints connected to the network. A collective communication primitive from a particular endpoint can be received at a network device. The collective communication primitive is associated with a multicast region of a shared global address space and is mapped to a plurality of participating endpoints. The network device is configured to perform an in-network computation based on information received from the participating endpoints before forwarding a response to the collective communication primitive back to one or more of the participating endpoints. The endpoints can inject pull requests (e.g., load commands) and/or push requests (e.g., store commands) into the network. A multicast capability enables tasks, such as a reduction operation, to be offloaded to hardware in the network device.
Type: Application
Filed: July 24, 2020
Publication date: February 4, 2021
Inventors: Benjamin Klenk, Nan Jiang, Larry Robert Dennison, Gregory M. Thorson
-
Publication number: 20210036881
Abstract: A network device configured to perform scalable, in-network computations is described. The network device is configured to process pull requests and/or push requests from a plurality of endpoints connected to the network. A collective communication primitive from a particular endpoint can be received at a network device. The collective communication primitive is associated with a multicast region of a shared global address space and is mapped to a plurality of participating endpoints. The network device is configured to perform an in-network computation based on information received from the participating endpoints before forwarding a response to the collective communication primitive back to one or more of the participating endpoints.
Type: Application
Filed: July 24, 2020
Publication date: February 4, 2021
Inventors: Benjamin Klenk, Nan Jiang, Larry Robert Dennison
-
Publication number: 20210037107
Abstract: A network device configured to perform scalable, in-network computations is described. The network device is configured to process pull requests and/or push requests from a plurality of endpoints connected to the network. A collective communication primitive from a particular endpoint can be received at a network device. The collective communication primitive is associated with a multicast region of a shared global address space and is mapped to a plurality of participating endpoints. The network device is configured to perform an in-network computation based on information received from the participating endpoints before forwarding a response to the collective communication primitive back to one or more of the participating endpoints. The endpoints can inject pull requests (e.g., load commands) and/or push requests (e.g., store commands) into the network. A multicast capability enables tasks, such as a reduction operation, to be offloaded to hardware in the network device.
Type: Application
Filed: July 24, 2020
Publication date: February 4, 2021
Inventors: Benjamin Klenk, Nan Jiang, Larry Robert Dennison, Gregory M. Thorson
-
Publication number: 20200160112
Abstract: A technique for performing data parallel training of a neural network model is disclosed that incorporates batch normalization techniques using partial populations to generate normalization parameters. The technique involves processing, by each processor of a plurality of processors in parallel, a first portion of a sub-batch of training samples allocated to the processor to generate activations for the first portion of the sub-batch. Each processor analyzes the activations and transmits statistical measures for the first portion to an additional processor that reduces the statistical measures from multiple processors to generate normalization parameters for a partial population of the training samples that includes the first portion from each of the plurality of processors. The normalization parameters are then transmitted back to each of the processors to normalize the activations for both the first portion and a second portion of the sub-batch of training samples allocated to each processor.
Type: Application
Filed: October 31, 2019
Publication date: May 21, 2020
Inventors: Larry Robert Dennison, Benjamin Klenk
-
Publication number: 20200160123
Abstract: A technique utilizing speculative execution and rollback for performing data parallel training of a neural network model is disclosed. Activations for a layer of the neural network model are normalized during a speculative normalization operation using estimated normalization parameters associated with a partial population of a set of training data allocated to a particular processor. Normalization parameters associated with the total population of the set of training data are generated by a distributed reduce operation in parallel with the speculative normalization operation. An optional rollback operation can revert the activations to a pre-normalization state if the estimated normalization parameters for the partial population are subsequently determined to be inaccurate compared to the normalization parameters for the population of the set of training data distributed across a plurality of processors.
Type: Application
Filed: October 31, 2019
Publication date: May 21, 2020
Inventors: Larry Robert Dennison, Benjamin Klenk