Patents by Inventor James Dinan
James Dinan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11677839
Abstract: Apparatuses, systems, and techniques are directed to automatic coalescing of GPU-initiated network communications. In one method, a communication engine receives, from a shared memory application executing on a first graphics processing unit (GPU), a first communication request to be processed that has a second GPU as its destination. The communication engine determines that the first communication request satisfies a coalescing criterion and stores the first communication request in association with a group of requests that have a common property. The communication engine coalesces the group of requests into a coalesced request and transports the coalesced request to the second GPU over a network.
Type: Grant
Filed: June 17, 2021
Date of Patent: June 13, 2023
Assignee: NVIDIA Corporation
Inventors: James Dinan, Akhil Langer, Sreeram Potluri
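The coalescing flow this abstract describes can be pictured as a small buffer keyed by a common property of the requests. The following is a minimal sketch under assumed simplifications (the class and field names are illustrative, not from the patent): requests sharing a destination and operation type accumulate until a group-size criterion is met, then ship as one network transaction.

```python
from collections import defaultdict

class CoalescingEngine:
    """Illustrative sketch, not the patented implementation: buffer
    communication requests that share a destination GPU and operation
    type, then flush each full group as one coalesced request."""

    def __init__(self, flush_threshold=4):
        self.flush_threshold = flush_threshold  # coalescing criterion: group size
        self.pending = defaultdict(list)        # (dest_gpu, op) -> queued payloads
        self.sent = []                          # coalesced requests "on the wire"

    def submit(self, dest_gpu, op, payload):
        key = (dest_gpu, op)
        self.pending[key].append(payload)
        # Coalescing criterion satisfied: the group is full, so merge it.
        if len(self.pending[key]) >= self.flush_threshold:
            self.flush(key)

    def flush(self, key):
        group = self.pending.pop(key, [])
        if group:
            # One network transaction carries the whole group.
            self.sent.append({"dest": key[0], "op": key[1], "payloads": group})

engine = CoalescingEngine(flush_threshold=3)
for i in range(3):
    engine.submit(dest_gpu=1, op="put", payload=i)

print(len(engine.sent))            # -> 1 (one coalesced request)
print(engine.sent[0]["payloads"])  # -> [0, 1, 2]
```

The payoff is fewer, larger network transactions for many small GPU-initiated sends, at the cost of buffering latency while a group fills.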
-
Patent number: 11645534
Abstract: An embodiment of a semiconductor package apparatus may include technology to embed one or more trigger operations in one or more messages related to collective operations for a neural network, and issue the one or more messages related to the collective operations to a hardware-based message scheduler in a desired order of execution. Other embodiments are disclosed and claimed.
Type: Grant
Filed: September 11, 2018
Date of Patent: May 9, 2023
Assignee: Intel Corporation
Inventors: Sayantan Sur, James Dinan, Maria Garzaran, Anupama Kurpad, Andrew Friedley, Nusrat Islam, Robert Zak
-
Publication number: 20220407920
Abstract: Apparatuses, systems, and techniques are directed to automatic coalescing of GPU-initiated network communications. In one method, a communication engine receives, from a shared memory application executing on a first graphics processing unit (GPU), a first communication request to be processed that has a second GPU as its destination. The communication engine determines that the first communication request satisfies a coalescing criterion and stores the first communication request in association with a group of requests that have a common property. The communication engine coalesces the group of requests into a coalesced request and transports the coalesced request to the second GPU over a network.
Type: Application
Filed: June 17, 2021
Publication date: December 22, 2022
Inventors: James Dinan, Akhil Langer, Sreeram Potluri
-
Publication number: 20220334948
Abstract: Methods, apparatus, systems and articles of manufacture to improve performance data collection are disclosed. An example apparatus includes a performance data comparator of a source node to collect the performance data of an application of the source node from the host fabric interface at a polling frequency; an interface to transmit a write back instruction to the host fabric interface, the write back instruction to cause data to be written to a memory address location of memory of the source node to trigger a wake up mode; and a frequency selector to: set the polling frequency to a first polling frequency for a sleep mode; and increase the polling frequency to a second polling frequency in response to the data in the memory address location identifying the wake mode.
Type: Application
Filed: July 1, 2022
Publication date: October 20, 2022
Inventors: David Ozog, Md. Wasi-ur Rahman, James Dinan
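The two-level polling scheme in this abstract can be sketched in a few lines. This is an illustrative model only (the class name, frequencies, and flag encoding are assumptions): the collector polls slowly while asleep, and a write-back from the fabric interface into a watched memory word switches it to the fast polling frequency.

```python
class FrequencySelector:
    """Illustrative sketch of the sleep/wake polling scheme: poll at a
    slow first frequency until a write-back to a watched memory word
    signals wake-up, then switch to a faster second frequency."""

    SLEEP_HZ = 1      # first (slow) polling frequency, in polls per second
    WAKE_HZ = 1000    # second (fast) polling frequency

    def __init__(self):
        self.watched_word = 0          # memory location the HFI writes back to
        self.polling_hz = self.SLEEP_HZ

    def hfi_write_back(self):
        # Stand-in for the host fabric interface writing the wake flag
        # into host memory.
        self.watched_word = 1

    def poll(self):
        # The poll loop notices the flag and raises its own frequency.
        if self.watched_word and self.polling_hz == self.SLEEP_HZ:
            self.polling_hz = self.WAKE_HZ
        return self.polling_hz

sel = FrequencySelector()
print(sel.poll())        # -> 1 (still in sleep mode)
sel.hfi_write_back()
print(sel.poll())        # -> 1000 (wake mode detected)
```

The design keeps the CPU mostly idle while no performance data is arriving, paying the fast-poll cost only after the hardware signals there is something to read.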
-
Patent number: 11194636
Abstract: Technologies for generating triggered conditional event operations include a host fabric interface (HFI) of a compute device configured to receive an operation execution command message associated with a triggered operation that has been fired, process the received operation execution command message to extract and store argument information from the received operation execution command, and increment an event counter associated with the fired triggered operation. The HFI is further configured to perform a triggered compare-and-generate event (TCAGE) operation as a function of the extracted argument information, determine whether to generate a triggering event, generate the triggering event as a function of the performed TCAGE operation, insert the generated triggering event into a triggered operation queue, and update the value of the event counter. Other embodiments are described herein.
Type: Grant
Filed: March 30, 2018
Date of Patent: December 7, 2021
Assignee: Intel Corporation
Inventors: Mario Flajslik, Keith D. Underwood, Timo Schneider, James Dinan
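The conditional-event chain described here can be modeled with counters that fire callbacks at thresholds. The sketch below is a simplified software analogue (the `Counter` and `tcage` names and the equality comparison are assumptions, not the patented hardware design): a command arrival bumps a counter, which fires a compare-and-generate-event step, which in turn generates an event that can fire further triggered operations.

```python
class Counter:
    """Event counter with an attached triggered-operation queue:
    each queued operation fires once the counter reaches its threshold."""

    def __init__(self):
        self.value = 0
        self.waiters = []   # (threshold, callback) pairs

    def trigger(self, threshold, callback):
        self.waiters.append((threshold, callback))

    def increment(self):
        self.value += 1
        ready = [(t, cb) for (t, cb) in self.waiters if self.value >= t]
        self.waiters = [(t, cb) for (t, cb) in self.waiters if self.value < t]
        for _, cb in ready:
            cb()

def tcage(argument, compare_value, target_counter):
    """Triggered compare-and-generate-event (sketch): compare the
    extracted argument and, on success, generate an event by bumping
    the target counter, which may fire downstream triggered operations."""
    if argument == compare_value:
        target_counter.increment()

events = Counter()
fired = []
events.trigger(threshold=1, callback=lambda: fired.append("downstream op"))

cmd = Counter()
# When the command counter reaches 1, run the TCAGE against argument 7.
cmd.trigger(threshold=1, callback=lambda: tcage(7, 7, events))
cmd.increment()   # command message arrives -> TCAGE fires -> event generated

print(fired)   # -> ['downstream op']
```

This shows the key property of the abstract: control flow (compare, branch, fire) is expressed entirely as counter updates, so the chain can run on the network interface without host involvement.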
-
Patent number: 11188394
Abstract: Technologies for synchronizing triggered operations include a host fabric interface (HFI) of a compute device configured to receive an operation execution command associated with a triggered operation that has been fired and determine whether the operation execution command includes an instruction to update a table entry of a table managed by the HFI. Additionally, the HFI is configured to issue, in response to a determination that the operation execution command includes the instruction to update the table entry, a triggered list enable (TLE) operation and a triggered list disable (TLD) operation to a table manager of the HFI, and to disable, in response to the TLD operation having been triggered, the identified table entry. The HFI is further configured to execute one or more command operations associated with the received operation execution command and re-enable, in response to the TLE operation having been triggered, the table entry. Other embodiments are described herein.
Type: Grant
Filed: March 30, 2018
Date of Patent: November 30, 2021
Assignee: Intel Corporation
Inventors: James Dinan, Mario Flajslik, Timo Schneider, Keith D. Underwood
-
Patent number: 11157336
Abstract: Technologies for extending triggered operations include a host fabric interface (HFI) of a compute device configured to detect a triggering event associated with a counter, increment the counter, and determine whether a value of the counter matches a trigger threshold of a triggered operation in a triggered operation queue associated with the counter. The HFI is further configured to execute one or more commands associated with the triggered operation upon determining that the value of the counter matches the trigger threshold, and determine, subsequent to the execution of the one or more commands, whether the triggered operation corresponds to a recurring triggered operation. The HFI is additionally configured to increment, in response to a determination that the triggered operation corresponds to a recurring triggered operation, the value of the trigger threshold by a threshold increment and re-insert the triggered operation into the triggered operation queue. Other embodiments are described herein.
Type: Grant
Filed: December 30, 2017
Date of Patent: October 26, 2021
Assignee: Intel Corporation
Inventors: James Dinan, Mario Flajslik, Timo Schneider, Keith D. Underwood
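The recurring-trigger mechanism is compact enough to sketch directly. The code below is an illustrative software model (names and data layout are assumptions): when the counter reaches an operation's threshold the operation runs; if it is recurring, its threshold advances by the increment and the operation goes back into the queue, so one posted descriptor fires indefinitely.

```python
class TriggeredOpQueue:
    """Illustrative sketch of recurring triggered operations: an op
    fires when the counter hits its threshold; a recurring op then has
    its threshold advanced by its increment and is re-inserted."""

    def __init__(self):
        self.counter = 0
        self.queue = []   # dicts with threshold, action, recurring, increment

    def add(self, threshold, action, recurring=False, increment=0):
        self.queue.append({"threshold": threshold, "action": action,
                           "recurring": recurring, "increment": increment})

    def event(self):
        self.counter += 1
        still_queued = []
        for op in self.queue:
            if self.counter == op["threshold"]:
                op["action"](self.counter)
                if op["recurring"]:
                    op["threshold"] += op["increment"]   # advance threshold
                    still_queued.append(op)              # re-insert
            else:
                still_queued.append(op)
        self.queue = still_queued

log = []
q = TriggeredOpQueue()
# Fire every 2 events, indefinitely, from a single posted operation.
q.add(threshold=2, action=lambda c: log.append(c), recurring=True, increment=2)
for _ in range(6):
    q.event()
print(log)   # -> [2, 4, 6]
```

The benefit over one-shot triggered operations is that periodic work (for example, one step per iteration of a communication pattern) needs only a single queued descriptor instead of one per firing.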
-
Publication number: 20210255910
Abstract: Systems, apparatuses and methods may provide for detecting an outbound communication and identifying a context of the outbound communication. Additionally, a completion status of the outbound communication may be tracked relative to the context. In one example, tracking the completion status includes incrementing a sent messages counter associated with the context in response to the outbound communication, detecting an acknowledgement of the outbound communication based on a network response to the outbound communication, incrementing a received acknowledgements counter associated with the context in response to the acknowledgement, comparing the sent messages counter to the received acknowledgements counter, and triggering a per-context memory ordering operation if the sent messages counter and the received acknowledgements counter have matching values.
Type: Application
Filed: May 21, 2020
Publication date: August 19, 2021
Applicant: Intel Corporation
Inventors: Mario Flajslik, James Dinan
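The counter-pair tracking described here reduces to a few lines per context. This is a minimal sketch with assumed names (`ContextTracker`, `fences`): each send bumps a per-context sent counter, each acknowledgement bumps the matching ack counter, and the memory-ordering operation fires the moment the two match.

```python
class ContextTracker:
    """Illustrative sketch of per-context completion tracking: one
    sent-messages counter and one received-acknowledgements counter per
    context; when they match, a per-context memory-ordering operation
    (recorded here in `fences`) is triggered."""

    def __init__(self):
        self.sent = {}
        self.acked = {}
        self.fences = []   # contexts whose ordering op has triggered

    def send(self, ctx):
        self.sent[ctx] = self.sent.get(ctx, 0) + 1

    def ack(self, ctx):
        self.acked[ctx] = self.acked.get(ctx, 0) + 1
        if self.acked[ctx] == self.sent.get(ctx, 0):
            self.fences.append(ctx)   # all outstanding messages complete

t = ContextTracker()
t.send("ctx0"); t.send("ctx0"); t.send("ctx1")
t.ack("ctx0")            # 1 of 2 -> no fence yet
t.ack("ctx1")            # 1 of 1 -> fence for ctx1
t.ack("ctx0")            # 2 of 2 -> fence for ctx0
print(t.fences)          # -> ['ctx1', 'ctx0']
```

Keeping the counters per context, rather than global, means one context's stragglers do not delay the ordering point of another.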
-
Patent number: 11023275
Abstract: Technologies for managing a queue on a compute device are disclosed. In the illustrative embodiment, the queue is managed by a host fabric interface of the compute device. Queue operations such as enqueuing data onto the queue and dequeuing data from the queue may be requested by remote compute devices by sending queue operations which may be processed by the host fabric interface. The host fabric interface may, in some embodiments, fully manage the queue without any assistance from the processor of the compute device. In other embodiments, the processor of the compute device may be responsible for certain tasks, such as garbage collection.
Type: Grant
Filed: February 9, 2017
Date of Patent: June 1, 2021
Assignee: Intel Corporation
Inventors: James Dinan, Mario Flajslik, Timo Schneider
-
Patent number: 10963183
Abstract: Technologies for fine-grained completion tracking of memory buffer accesses include a compute device. The compute device is to establish multiple counter pairs for a memory buffer. Each counter pair includes a locally managed offset and a completion counter. The compute device is also to receive a request from a remote compute device to access the memory buffer, assign one of the counter pairs to the request, advance the locally managed offset of the assigned counter pair by the amount of data to be read or written, and advance the completion counter of the assigned counter pair as the data is read from or written to the memory buffer. Other embodiments are also described and claimed.
Type: Grant
Filed: March 20, 2017
Date of Patent: March 30, 2021
Assignee: Intel Corporation
Inventors: James Dinan, Keith D. Underwood, Sayantan Sur, Charles A. Giefer, Mario Flajslik
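A single counter pair from this scheme can be sketched as follows; the class and method names are illustrative assumptions. The locally managed offset advances eagerly when a request claims space, the completion counter advances as data actually lands, and the two being equal signals that all claimed accesses have completed.

```python
class CounterPair:
    """Illustrative sketch of one (locally managed offset, completion
    counter) pair: the offset advances when buffer space is claimed for
    a request, the completion counter as bytes are transferred; the
    pair is quiescent when the two values are equal."""

    def __init__(self):
        self.offset = 0       # bytes claimed (locally managed offset)
        self.completed = 0    # bytes actually transferred

    def claim(self, nbytes):
        # Reserve the next nbytes of the buffer for an incoming request.
        start = self.offset
        self.offset += nbytes
        return start

    def complete(self, nbytes):
        # Called as the data is read from or written to the buffer.
        self.completed += nbytes

    def quiescent(self):
        return self.completed == self.offset

pair = CounterPair()
start = pair.claim(64)           # a remote access reserves 64 bytes
print(start, pair.quiescent())   # -> 0 False
pair.complete(64)                # the data lands
print(pair.quiescent())          # -> True
```

Using multiple such pairs per buffer, as the abstract describes, lets independent requests track completion without contending on one shared counter.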
-
Patent number: 10958589
Abstract: Technologies for offloaded management of communication are disclosed. In order to manage communication with information that may be available to applications in a compute device, the compute device may offload communication management to a host fabric interface using a credit management system. A credit limit is established, and each message to be sent is added to a queue with a corresponding number of credits required to send the message. The host fabric interface of the compute device may send out messages as credits become available and decrease the number of available credits based on the number of credits required to send a particular message. When an acknowledgement of receipt of a message is received, the number of credits required to send the corresponding message may be added back to an available credit pool.
Type: Grant
Filed: March 29, 2017
Date of Patent: March 23, 2021
Assignee: Intel Corporation
Inventors: James Dinan, Sayantan Sur, Mario Flajslik, Keith D. Underwood
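The credit loop in this abstract maps naturally onto a small state machine. The sketch below is an illustrative model (class name and costs are assumptions): each queued message carries a credit cost, messages go out only while credits remain, and an acknowledgement returns that message's credits to the pool, possibly unblocking the queue.

```python
from collections import deque

class CreditedSender:
    """Illustrative sketch of credit-based send offload: messages are
    sent in queue order while the credit pool covers their cost, and an
    acknowledgement returns the message's credits to the pool."""

    def __init__(self, credit_limit):
        self.credits = credit_limit
        self.queue = deque()    # (msg, cost) awaiting credits
        self.in_flight = {}     # msg -> cost, awaiting acknowledgement
        self.delivered = []     # stand-in for the network send

    def enqueue(self, msg, cost):
        self.queue.append((msg, cost))
        self._drain()

    def _drain(self):
        # Send head-of-queue messages while credits cover their cost.
        while self.queue and self.queue[0][1] <= self.credits:
            msg, cost = self.queue.popleft()
            self.credits -= cost
            self.in_flight[msg] = cost
            self.delivered.append(msg)

    def ack(self, msg):
        self.credits += self.in_flight.pop(msg)   # return the credits
        self._drain()                             # maybe unblock the queue

s = CreditedSender(credit_limit=3)
s.enqueue("a", 2)
s.enqueue("b", 2)          # blocked: only 1 credit remains
print(s.delivered)         # -> ['a']
s.ack("a")                 # credits return -> 'b' goes out
print(s.delivered)         # -> ['a', 'b']
```

Because the whole loop is driven by enqueue and acknowledgement events, it can run entirely on the fabric interface without waking the host processor.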
-
Patent number: 10693787
Abstract: Techniques are disclosed to throttle bandwidth imbalanced data transfers. In some examples, an example computer-implemented method may include splitting a payload of a data transfer operation over a network fabric into multiple chunk get operations, starting the execution of a threshold number of the chunk get operations, and scheduling the remaining chunk get operations for subsequent execution. The method may also include executing a scheduled chunk get operation in response to determining a completion of an executing chunk get operation. In some embodiments, the chunk get operations may be implemented as triggered operations.
Type: Grant
Filed: August 25, 2017
Date of Patent: June 23, 2020
Assignee: Intel Corporation
Inventors: Timo Schneider, Keith D. Underwood, Mario Flajslik, Sayantan Sur, James Dinan
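The chunk-and-window throttling can be sketched as a simple scheduler. This is an illustrative model with assumed names (`throttled_get`, `window`): the payload is split into chunk gets, a fixed window of them starts immediately, and each completion launches the next scheduled chunk, capping the number in flight.

```python
def throttled_get(payload_len, chunk_len, window):
    """Illustrative sketch of throttled chunked gets: split the payload
    into (offset, length) chunk-get operations, start only `window` of
    them, and launch the next scheduled chunk each time an executing
    chunk completes. Returns the chunks in completion order."""
    chunks = [(off, min(chunk_len, payload_len - off))
              for off in range(0, payload_len, chunk_len)]
    executing = list(chunks[:window])     # started immediately
    scheduled = list(chunks[window:])     # deferred for later execution
    completed = []
    while executing:
        done = executing.pop(0)           # a chunk get finishes
        completed.append(done)
        if scheduled:
            executing.append(scheduled.pop(0))   # start the next one
    return completed

order = throttled_get(payload_len=10, chunk_len=3, window=2)
print(order)   # -> [(0, 3), (3, 3), (6, 3), (9, 1)]
```

The abstract notes the chunk gets may be implemented as triggered operations, so in hardware the "launch on completion" step would be a counter threshold rather than the explicit loop modeled here.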
-
Patent number: 10671457
Abstract: Systems, apparatuses and methods may provide for detecting an outbound communication and identifying a context of the outbound communication. Additionally, a completion status of the outbound communication may be tracked relative to the context. In one example, tracking the completion status includes incrementing a sent messages counter associated with the context in response to the outbound communication, detecting an acknowledgement of the outbound communication based on a network response to the outbound communication, incrementing a received acknowledgements counter associated with the context in response to the acknowledgement, comparing the sent messages counter to the received acknowledgements counter, and triggering a per-context memory ordering operation if the sent messages counter and the received acknowledgements counter have matching values.
Type: Grant
Filed: March 27, 2015
Date of Patent: June 2, 2020
Assignee: Intel Corporation
Inventors: Mario Flajslik, James Dinan
-
Patent number: 10652353
Abstract: Technologies for communication with direct data placement include a number of computing nodes in communication over a network. Each computing node includes a many-core processor having an integrated host fabric interface (HFI) that maintains an association table (AT). In response to receiving a message from a remote device, the HFI determines whether the AT includes an entry associating one or more parameters of the message to a destination processor core. If so, the HFI causes a data transfer agent (DTA) of the destination core to receive the message data. The DTA may place the message data in a private cache of the destination core. Message parameters may include a destination process identifier or other network address and a virtual memory address range. The HFI may automatically update the AT based on communication operations generated by software executed by the processor cores. Other embodiments are described and claimed.
Type: Grant
Filed: September 24, 2015
Date of Patent: May 12, 2020
Assignee: Intel Corporation
Inventors: James Dinan, Venkata Krishnan, Srinivas Sridharan, David A. Webb
-
Patent number: 10574733
Abstract: Technologies for handling message passing interface receive operations include a compute node to determine a plurality of parameters of a receive entry to be posted and determine whether the plurality of parameters includes a wildcard entry. The compute node generates a hash based on at least one parameter of the plurality of parameters in response to determining that the plurality of parameters does not include the wildcard entry and appends the receive entry to a list in a bin of a posted receive data structure, wherein the bin is determined based on the generated hash. The compute node further tracks the wildcard entry in the posted receive data structure in response to determining that the plurality of parameters includes the wildcard entry and appends the receive entry to a wildcard list of the posted receive data structure in response to tracking the wildcard entry.
Type: Grant
Filed: September 18, 2015
Date of Patent: February 25, 2020
Assignee: Intel Corporation
Inventors: James Dinan, Mario Flajslik, Keith D. Underwood
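The binned posted-receive structure can be sketched as a hash table plus a wildcard side list. The code below is a deliberately simplified model (names are assumptions, and a real MPI match must respect global posting order across both lists, which this sketch does not): fully specified entries hash into bins; entries with a wildcard source or tag go to a separate list that is also searched.

```python
WILDCARD = None   # stand-in for MPI_ANY_SOURCE / MPI_ANY_TAG

class PostedReceives:
    """Illustrative sketch of the binned posted-receive structure:
    entries with fully specified parameters are hashed into bins;
    entries containing a wildcard go to a separate wildcard list."""

    def __init__(self, nbins=4):
        self.bins = [[] for _ in range(nbins)]
        self.wildcards = []

    def post(self, source, tag, buf):
        if source is WILDCARD or tag is WILDCARD:
            self.wildcards.append((source, tag, buf))
        else:
            bin_idx = hash((source, tag)) % len(self.bins)
            self.bins[bin_idx].append((source, tag, buf))

    def match(self, source, tag):
        # Simplified: check the hashed bin first, then the wildcard list.
        bucket = self.bins[hash((source, tag)) % len(self.bins)]
        for entry in bucket:
            if entry[0] == source and entry[1] == tag:
                bucket.remove(entry)
                return entry[2]
        for entry in self.wildcards:
            s, t, buf = entry
            if s in (WILDCARD, source) and t in (WILDCARD, tag):
                self.wildcards.remove(entry)
                return buf
        return None

pr = PostedReceives()
pr.post(source=3, tag=7, buf="exact")
pr.post(source=WILDCARD, tag=7, buf="any-source")
print(pr.match(3, 7))   # -> exact
print(pr.match(9, 7))   # -> any-source
```

The point of the binning is that a non-wildcard match only scans one short list instead of the full posted-receive queue, while wildcards retain their catch-all behavior.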
-
Patent number: 10554568
Abstract: Technologies for estimating network round-trip times include a sender computing node in network communication with a set of neighboring computing nodes. The sender computing node is configured to determine the set of neighboring computing nodes, as well as a plurality of subsets of the set of neighboring computing nodes. Accordingly, the sender computing node generates a message queue for each of the plurality of subsets, each message queue including a probe message for each neighboring node in the subset to which the message queue corresponds. The sender computing node is further configured to determine a round-trip time for each message queue (i.e., subset of neighboring computing nodes) based on a duration of time between the first probe message of the message queue being transmitted and an acknowledgment being received in response to the last probe message of the message queue being transmitted.
Type: Grant
Filed: September 25, 2015
Date of Patent: February 4, 2020
Assignee: Intel Corporation
Inventors: Mario Flajslik, James Dinan
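The batched probe measurement can be illustrated with a small simulation. This sketch is an assumption-laden model, not the patented method itself: per-node round-trip times are given as inputs, probes for a subset leave back to back, and the measured value runs from the first probe's transmission to the arrival of the last acknowledgement.

```python
def subset_rtt(link_rtts, subset, send_gap=0.0):
    """Illustrative sketch of the batched probe measurement: probes for
    every node in the subset are sent back to back (`send_gap` apart),
    and the measured duration runs from the first probe's transmission
    (t=0) until the last acknowledgement arrives.
    `link_rtts` maps node name -> simulated per-link RTT in seconds."""
    finish_times = []
    for i, node in enumerate(subset):
        sent_at = i * send_gap                  # probes leave in sequence
        finish_times.append(sent_at + link_rtts[node])
    return max(finish_times)                    # last ack ends the window

rtts = {"n1": 4.0, "n2": 6.0, "n3": 5.0}
print(subset_rtt(rtts, ["n1"]))               # -> 4.0
print(subset_rtt(rtts, ["n1", "n2", "n3"]))   # -> 6.0 (dominated by n2)
```

Measuring per subset rather than per node amortizes probing overhead: one timed window yields an estimate for a whole group of neighbors, dominated by the slowest link in it.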
-
Patent number: 10439946
Abstract: Technologies for endpoint congestion avoidance are disclosed. In order to avoid congestion caused by a network fabric that can transport data to a compute device faster than the compute device can store the data in a particular type of memory, the compute device may in the illustrative embodiment determine a suitable data transfer rate and communicate an indication of the data transfer rate to the remote compute device which is sending the data. The remote compute device may then send the data at the indicated data transfer rate, thus avoiding congestion.
Type: Grant
Filed: February 10, 2017
Date of Patent: October 8, 2019
Assignee: Intel Corporation
Inventors: James Dinan, Mario Flajslik, Robert C. Zak
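The sender's side of this rate negotiation amounts to pacing. The sketch below is an illustrative calculation only (function name and unit choices are assumptions): given the drain rate the receiver advertised, the sender computes the injection time of each message so that it never exceeds that rate.

```python
def paced_send(payload_units, drain_rate, message_size):
    """Illustrative sketch of sender-side pacing for endpoint congestion
    avoidance: the receiver advertises the rate (units/second) at which
    it can drain data into its slower memory, and the sender spaces its
    messages to match. Returns the injection time of each message."""
    interval = message_size / drain_rate          # seconds between messages
    n = -(-payload_units // message_size)         # ceil division: message count
    return [i * interval for i in range(n)]

# Receiver can absorb 2 units/s; sender must move 8 units in 2-unit messages.
times = paced_send(payload_units=8, drain_rate=2, message_size=2)
print(times)   # -> [0.0, 1.0, 2.0, 3.0]
```

Pacing at the receiver's advertised rate keeps the fabric's buffers from filling behind a slow memory tier, which is the congestion scenario the abstract describes.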
-
Publication number: 20190188111
Abstract: Methods, apparatus, systems and articles of manufacture to improve performance data collection are disclosed. An example apparatus includes a performance data comparator of a source node to collect the performance data of an application of the source node from the host fabric interface at a polling frequency; an interface to transmit a write back instruction to the host fabric interface, the write back instruction to cause data to be written to a memory address location of memory of the source node to trigger a wake up mode; and a frequency selector to: set the polling frequency to a first polling frequency for a sleep mode; and increase the polling frequency to a second polling frequency in response to the data in the memory address location identifying the wake mode.
Type: Application
Filed: February 26, 2019
Publication date: June 20, 2019
Inventors: David Ozog, Md. Wasi-ur Rahman, James Dinan
-
Publication number: 20190068501
Abstract: Techniques are disclosed to throttle bandwidth imbalanced data transfers. In some examples, an example computer-implemented method may include splitting a payload of a data transfer operation over a network fabric into multiple chunk get operations, starting the execution of a threshold number of the chunk get operations, and scheduling the remaining chunk get operations for subsequent execution. The method may also include executing a scheduled chunk get operation in response to determining a completion of an executing chunk get operation. In some embodiments, the chunk get operations may be implemented as triggered operations.
Type: Application
Filed: August 25, 2017
Publication date: February 28, 2019
Applicant: Intel Corporation
Inventors: Timo Schneider, Keith D. Underwood, Mario Flajslik, Sayantan Sur, James Dinan
-
Publication number: 20190050274
Abstract: Technologies for synchronizing triggered operations include a host fabric interface (HFI) of a compute device configured to receive an operation execution command associated with a triggered operation that has been fired and determine whether the operation execution command includes an instruction to update a table entry of a table managed by the HFI. Additionally, the HFI is configured to issue, in response to a determination that the operation execution command includes the instruction to update the table entry, a triggered list enable (TLE) operation and a triggered list disable (TLD) operation to a table manager of the HFI, and to disable, in response to the TLD operation having been triggered, the identified table entry. The HFI is further configured to execute one or more command operations associated with the received operation execution command and re-enable, in response to the TLE operation having been triggered, the table entry. Other embodiments are described herein.
Type: Application
Filed: March 30, 2018
Publication date: February 14, 2019
Inventors: James Dinan, Mario Flajslik, Timo Schneider, Keith D. Underwood