Patents by Inventor Sayantan Sur

Sayantan Sur has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11409673
    Abstract: Examples include a method of managing storage for triggered operations. The method includes receiving a request to allocate a triggered operation; if there is a free triggered operation, allocating the free triggered operation; if there is no free triggered operation, recovering one or more fired triggered operations, freeing one or more of the recovered triggered operations, and allocating one of the freed triggered operations; configuring the allocated triggered operation; and storing the configured triggered operation in a cache on an input/output (I/O) device for subsequent asynchronous execution of the configured triggered operation.
    Type: Grant
    Filed: February 14, 2019
    Date of Patent: August 9, 2022
    Assignee: Intel Corporation
    Inventors: Andrew Friedley, Sayantan Sur, Ravindra Babu Ganapathi, Travis Hamilton, Keith D. Underwood
  • Publication number: 20220210639
    Abstract: In an embodiment, at least one interface mechanism may be provided. The mechanism may permit, at least in part, at least one process allocate, at least in part, and/or configure, at least in part, at least one network-associated object. Such allocation and/or configuration, at least in part, may be in accordance with at least one parameter set that may correspond, at least in part, to at least one query issued by the at least one process via the mechanism. Many modifications are possible without departing from this embodiment.
    Type: Application
    Filed: December 10, 2021
    Publication date: June 30, 2022
    Applicant: Intel Corporation
    Inventors: William R. Magro, Todd M. Rimmer, Robert J. Woodruff, Mark S. Hefty, Sayantan Sur
  • Patent number: 11246027
    Abstract: In an embodiment, at least one interface mechanism may be provided. The mechanism may permit, at least in part, at least one process allocate, at least in part, and/or configure, at least in part, at least one network-associated object. Such allocation and/or configuration, at least in part, may be in accordance with at least one parameter set that may correspond, at least in part, to at least one query issued by the at least one process via the mechanism. Many modifications are possible without departing from this embodiment.
    Type: Grant
    Filed: September 29, 2016
    Date of Patent: February 8, 2022
    Assignee: Intel Corporation
    Inventors: William R. Magro, Todd M. Rimmer, Robert J. Woodruff, Mark S. Hefty, Sayantan Sur
  • Patent number: 11150967
    Abstract: Methods, software, and systems for improved data transfer operations using overlapped rendezvous memory registration. Techniques are disclosed for transferring data between a first process operating as a sender and a second process operating as a receiver. The sender sends a PUT request message to the receiver including payload data stored in a send buffer and first and second match indicia. The first match indicia is used to determine whether the PUT request is expected or unexpected. If the PUT request is unexpected, an RMA GET operation is performed using the second matching indicia to pull data from the send buffer and write the data to a memory region in the user space of the process associated with the receiver. If the PUT request message is expected, the data payload with the PUT request is written to a receive buffer on the receiver determined using the first match indicia.
    Type: Grant
    Filed: September 30, 2017
    Date of Patent: October 19, 2021
    Assignee: Intel Corporation
    Inventors: Sayantan Sur, Keith Underwood, Ravindra Babu Ganapathi, Andrew Friedley
  • Publication number: 20210271536
    Abstract: Algorithms for optimizing small message collectives with hardware supported triggered operations and associated methods, apparatus, and systems. The algorithms are implemented in a distributed compute environment comprising a plurality of ranks including a root, a plurality of intermediate nodes, and a plurality of leaf nodes, where each of the plurality of ranks comprising a compute platform having a communication interface including embedded logic for implementing the algorithms. Collectives are employed to transfer data between parent ranks and child ranks. In connection with the collectives, control messages are sent from children of a collective to the parent of the collective informing the parent that the children of the collective have free buffers ready to receive data. The parent employs a counter to determine that a control message has been received from each of its children indicating each child has a free buffer prior to sending data to the children in the collective.
    Type: Application
    Filed: December 23, 2020
    Publication date: September 2, 2021
    Inventors: Maria Garzaran, Nusrat Islam, Gengbin Zheng, Sayantan Sur
  • Patent number: 10963183
    Abstract: Technologies for fine-grained completion tracking of memory buffer accesses include a compute device. The compute device is to establish multiple counter pairs for a memory buffer. Each counter pair includes a locally managed offset and a completion counter. The compute device is also to receive a request from a remote compute device to access the memory buffer, assign one of the counter pairs to the request, advance the locally managed offset of the assigned counter pair by the amount of data to be read or written, and advance the completion counter of the assigned counter pair as the data is read from or written to the memory buffer. Other embodiments are also described and claimed.
    Type: Grant
    Filed: March 20, 2017
    Date of Patent: March 30, 2021
    Assignee: Intel Corporation
    Inventors: James Dinan, Keith D. Underwood, Sayantan Sur, Charles A. Giefer, Mario Flajslik
  • Patent number: 10958589
    Abstract: Technologies for offloaded management of communication are disclosed. In order to manage communication with information that may be available to applications in a compute device, the compute device may offload communication management to a host fabric interface using a credit management system. A credit limit is established, and each message to be sent is added to a queue with a corresponding number of credits required to send the message. The host fabric interface of the compute device may send out messages as credits become available and decrease the number of available credits based on the number of credits required to send a particular message. When an acknowledgement of receipt of a message is received, the number of credits required to send the corresponding message may be added back to an available credit pool.
    Type: Grant
    Filed: March 29, 2017
    Date of Patent: March 23, 2021
    Assignee: Intel Corporation
    Inventors: James Dinan, Sayantan Sur, Mario Flajslik, Keith D. Underwood
  • Publication number: 20210042254
    Abstract: Methods and apparatus for an accelerator controller hub (ACH). The ACH may be a stand-alone component or integrated on-die or on package in an accelerator such as a GPU. The ACH may include a host device link (HDL) interface, one or more Peripheral Component Interconnect Express (PCIe) interfaces, one or more high performance accelerator link (HPAL) interfaces, and a router, operatively coupled to each of the HDL interface, the one or more PCIe interfaces, and the one or more HPAL interfaces. The HDL interface is configured to be coupled to a host CPU via an HDL link and the one or more HPAL interfaces are configured to be coupled to one or more HPALs that are used to access high performance accelerator fabrics (HPAFs) such as NVlink fabrics and CCIX (Cache Coherent Interconnect for Accelerators) fabrics. Platforms including ACHs or accelerators with integrated ACHs support RDMA transfers using RDMA semantics to enable transfers between accelerator memory on initiators and targets without CPU involvement.
    Type: Application
    Filed: October 28, 2020
    Publication date: February 11, 2021
    Inventors: Pratik Marolia, Andrew Herdrich, Rajesh Sankaran, Rahul Pal, David Puffer, Sayantan Sur, Ajaya Durg
  • Patent number: 10846245
    Abstract: Examples include a computing system having an input/output (I/O) device including a plurality of counters, each counter operating as one of a completion counter and a trigger counter, a processing device; and a memory device. The memory device stores instructions that, in response to execution by the processing device, cause the processing device to represent a plurality of triggered operations of collective communication for high-performance computing executable by the I/O device as a directed acyclic graph stored in the memory device, with triggered operations represented as vertices of the directed acyclic graph and dependencies between triggered operations represented as edges of the directed acyclic graph; traverse the directed acyclic graph using a first process to identify and mark vertices that can share a completion counter; and traverse the directed acyclic graph using a second process to assign a completion counter and a trigger counter for each vertex.
    Type: Grant
    Filed: March 14, 2019
    Date of Patent: November 24, 2020
    Assignee: Intel Corporation
    Inventors: Nusrat Islam, Gengbin Zheng, Sayantan Sur, Maria Garzaran, Akhil Langer
  • Publication number: 20200358721
    Abstract: Examples described herein relate to receiving, at a network interface, an allocation of a first group of one or more buffers to store data to be processed by a Message Passing Interface (MPI) and based on a received packet including an indicator that permits the network interface to select a buffer for the received packet and store the received packet in the selected buffer, the network interface storing a portion of the received packet in a buffer of the first group of the one or more buffers. The indicator can permit the network interface to select a buffer for the received packet and store the received packet in the selected buffer irrespective of a tag and sender associated with the received packet.
    Type: Application
    Filed: July 30, 2020
    Publication date: November 12, 2020
    Inventors: Todd RIMMER, Sayantan SUR, Michael William HEINZ
  • Patent number: 10693787
    Abstract: Techniques are disclosed to throttle bandwidth imbalanced data transfers. In some examples, an example computer-implemented method may include splitting a payload of a data transfer operation over a network fabric into multiple chunk get operations, starting the execution of a threshold number of the chunk get operations, and scheduling the remaining chunk get operations for subsequent execution. The method may also include executing a scheduled chunk get operation in response determining a completion of an executing chunk get operation. In some embodiments, the chunk get operations may be implemented as triggered operations.
    Type: Grant
    Filed: August 25, 2017
    Date of Patent: June 23, 2020
    Assignee: Intel Corporation
    Inventors: Timo Schneider, Keith D. Underwood, Mario Flajslik, Sayantan Sur, James Dinan
  • Publication number: 20190213146
    Abstract: Examples include a computing system having an input/output (I/O) device including a plurality of counters, each counter operating as one of a completion counter and a trigger counter, a processing device; and a memory device. The memory device stores instructions that, in response to execution by the processing device, cause the processing device to represent a plurality of triggered operations of collective communication for high-performance computing executable by the I/O device as a directed acyclic graph stored in the memory device, with triggered operations represented as vertices of the directed acyclic graph and dependencies between triggered operations represented as edges of the directed acyclic graph; traverse the directed acyclic graph using a first process to identify and mark vertices that can share a completion counter; and traverse the directed acyclic graph using a second process to assign a completion counter and a trigger counter for each vertex.
    Type: Application
    Filed: March 14, 2019
    Publication date: July 11, 2019
    Inventors: Nusrat ISLAM, Gengbin ZHENG, Sayantan SUR, Maria GARZARAN, Akhil LANGER
  • Publication number: 20190179779
    Abstract: Examples include a method of managing storage for triggered operations. The method includes receiving a request to allocate a triggered operation; if there is a free triggered operation, allocating the free triggered operation; if there is no free triggered operation, recovering one or more fired triggered operations, freeing one or more of the recovered triggered operations, and allocating one of the freed triggered operations; configuring the allocated triggered operation; and storing the configured triggered operation in a cache on an input/output (I/O) device for subsequent asynchronous execution of the configured triggered operation.
    Type: Application
    Filed: February 14, 2019
    Publication date: June 13, 2019
    Inventors: Andrew FRIEDLEY, Sayantan SUR, Ravindra Babu GANAPATHI, Travis HAMILTON, Keith D. UNDERWOOD
  • Publication number: 20190102236
    Abstract: Methods, software, and systems for improved data transfer operations using overlapped rendezvous memory registration. Techniques are disclosed for transferring data between a first process operating as a sender and a second process operating as a receiver. The sender sends a PUT request message to the receiver including payload data stored in a send buffer and first and second match indicia. Subsequent to or in conjunction with sending the PUT request message, the send buffer is exposed on the sender. The first match indicia is used to determine whether the PUT request is expected or unexpected. If the PUT request is unexpected, an RMA GET operation is performed using the second matching indicia to pull data from the send buffer and write the data to a memory region in the user space of the process associated with the receiver. The RMA GET operation may be retried one or more times in the event that the send buffer has yet to be exposed.
    Type: Application
    Filed: September 30, 2017
    Publication date: April 4, 2019
    Inventors: Sayantan Sur, Keith Underwood, Ravindra Babu Ganapathi, Andrew Friedley
  • Publication number: 20190068501
    Abstract: Techniques are disclosed to throttle bandwidth imbalanced data transfers. In some examples, an example computer-implemented method may include splitting a payload of a data transfer operation over a network fabric into multiple chunk get operations, starting the execution of a threshold number of the chunk get operations, and scheduling the remaining chunk get operations for subsequent execution. The method may also include executing a scheduled chunk get operation in response determining a completion of an executing chunk get operation. In some embodiments, the chunk get operations may be implemented as triggered operations.
    Type: Application
    Filed: August 25, 2017
    Publication date: February 28, 2019
    Applicant: INTEL CORPORATION
    Inventors: TIMO SCHNEIDER, KEITH D. UNDERWOOD, MARIO FLAJSLIK, SAYANTAN SUR, JAMES DINAN
  • Publication number: 20190042946
    Abstract: An embodiment of a semiconductor package apparatus may include technology to embed one or more trigger operations in one or more messages related to collective operations for a neural network, and issue the one or more messages related to the collective operations to a hardware-based message scheduler in a desired order of execution. Other embodiments are disclosed and claimed.
    Type: Application
    Filed: September 11, 2018
    Publication date: February 7, 2019
    Applicant: Intel Corporation
    Inventors: Sayantan Sur, James Dinan, Maria Garzaran, Anupama Kurpad, Andrew Friedley, Nusrat Islam, Robert Zak
  • Publication number: 20180287954
    Abstract: Technologies for offloaded management of communication are disclosed. In order to manage communication with information that may be available to applications in a compute device, the compute device may offload communication management to a host fabric interface using a credit management system. A credit limit is established, and each message to be sent is added to a queue with a corresponding number of credits required to send the message. The host fabric interface of the compute device may send out messages as credits become available and decrease the number of available credits based on the number of credits required to send a particular message. When an acknowledgement of receipt of a message is received, the number of credits required to send the corresponding message may be added back to an available credit pool.
    Type: Application
    Filed: March 29, 2017
    Publication date: October 4, 2018
    Inventors: James Dinan, Sayantan Sur, Mario Flajslik, Keith D. Underwood
  • Publication number: 20180267742
    Abstract: Technologies for fine-grained completion tracking of memory buffer accesses include a compute device. The compute device is to establish multiple counter pairs for a memory buffer. Each counter pair includes a locally managed offset and a completion counter. The compute device is also to receive a request from a remote compute device to access the memory buffer, assign one of the counter pairs to the request, advance the locally managed offset of the assigned counter pair by the amount of data to be read or written, and advance the completion counter of the assigned counter pair as the data is read from or written to the memory buffer. Other embodiments are also described and claimed.
    Type: Application
    Filed: March 20, 2017
    Publication date: September 20, 2018
    Inventors: James Dinan, Keith D. Underwood, Sayantan Sur, Charles A. Giefer, Mario Flajslik
  • Publication number: 20180183857
    Abstract: Particular embodiments described herein provide for an electronic device that can be configured to consolidate data from one or more processes on a node, where the node is part of a first collection of nodes, communicate the consolidated data to a second node, where the second node is in the first collection of nodes, where the first collection of nodes is part of a first group of a collection of nodes, and communicate the consolidated data to a third node, wherein the third node is in a second collection of nodes, where the second collection of nodes is part of the first group of the collection of nodes. In an example, the node is part of a multi-tiered dragonfly topology network and the data is part of a gather or scatter process.
    Type: Application
    Filed: December 23, 2016
    Publication date: June 28, 2018
    Applicant: Intel Corporation
    Inventors: Akhil Langer, Sayantan Sur
  • Patent number: 9811403
    Abstract: In one embodiment, an apparatus includes: a plurality of queues having a plurality of first entries to store receive information for a process; a master queue having a plurality of second entries to store wild card receive information, where redundant information of the plurality of second entries is to be included in a plurality of redundant entries of the plurality of queues; and a control circuit to match an incoming receive operation within one of the plurality of queues. Other embodiments are described and claimed.
    Type: Grant
    Filed: June 22, 2016
    Date of Patent: November 7, 2017
    Assignee: Intel Corporation
    Inventor: Sayantan Sur