Patents by Inventor Sujoy Sen

Sujoy Sen has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20210326270
    Abstract: Examples described herein relate to a network interface device comprising circuitry to receive an access request with a target logical block address (LBA) and, based on a target media of the access request storing at least one object, translate the target LBA to an address and access content in the target media based on that address. In some examples, translating the target LBA to an address includes accessing a translation entry that maps the LBA to one or more of a physical address or a virtual address. In some examples, translating the target LBA to an address comprises requesting a software defined storage (SDS) stack to provide a translation of the LBA to one or more of a physical address or a virtual address, and storing the translation in a mapping table for access by the circuitry. In some examples, at least one entry that maps the LBA to one or more of a physical address or a virtual address is received before receipt of an access request.
    Type: Application
    Filed: June 26, 2021
    Publication date: October 21, 2021
    Inventors: Yi ZOU, Arun RAGHUNATH, Scott D. PETERSON, Sujoy SEN, Yadong LI
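
A minimal sketch, with illustrative class and table names not taken from the filing, of the translation flow this abstract describes: a NIC-side mapping table is consulted first, a miss falls back to the SDS stack, and entries may be pushed down before any access request arrives.

```python
# Hypothetical sketch of the LBA-translation flow: consult a NIC-side mapping
# table first, fall back to the software-defined storage (SDS) stack on a miss.

class SdsStack:
    """Stand-in for the SDS control plane that owns the authoritative mapping."""
    def __init__(self, mapping):
        self._mapping = mapping              # LBA -> physical/virtual address

    def translate(self, lba):
        return self._mapping[lba]

class NicTranslator:
    def __init__(self, sds):
        self.sds = sds
        self.table = {}                      # per-device translation cache

    def prefill(self, lba, address):
        # Entries may be pushed down before any access request arrives.
        self.table[lba] = address

    def access(self, lba):
        address = self.table.get(lba)
        if address is None:                  # miss: ask the SDS stack, then cache
            address = self.sds.translate(lba)
            self.table[lba] = address
        return address                       # caller accesses media at `address`

sds = SdsStack({0x10: 0xDEAD000, 0x11: 0xDEAD200})
nic = NicTranslator(sds)
nic.prefill(0x10, 0xDEAD000)                 # entry received before any request
assert nic.access(0x11) == 0xDEAD200         # resolved via the SDS stack, now cached
```
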
  • Publication number: 20210318961
    Abstract: Methods and apparatus for mitigating pooled memory cache miss latency with cache miss faults and transaction aborts. A compute platform coupled to one or more tiers of memory, such as remote pooled memory in a disaggregated environment, executes memory transactions to access objects stored in the one or more tiers. A determination is made as to whether a copy of the object is in a local cache on the platform; if it is, the object is accessed from the local cache. If the object is not in the local cache, a transaction abort may be generated if enabled for the transactions. Optionally, a cache miss page fault is generated if the object is in a cacheable region of a memory tier and the transaction abort is not enabled. Various mechanisms are provided to determine how to respond to a cache miss page fault, such as determining addresses of cache lines to prefetch from the memory tier storing the object(s), determining how much data to prefetch, and determining whether to perform a bulk transfer.
    Type: Application
    Filed: June 23, 2021
    Publication date: October 14, 2021
    Inventors: Scott D. PETERSON, Sujoy SEN, Francesc GUIM BERNAT
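
The decision sequence the abstract walks through (local-cache hit, transaction abort, cache-miss page fault) can be sketched as below; the names, policy knobs, and prefetch sizing are illustrative assumptions, not taken from the filing.

```python
# Illustrative-only decision logic: hit the local cache if possible; otherwise
# abort the transaction when aborts are enabled, or raise a cache-miss page
# fault whose handler decides what and how much to prefetch.

LOCAL_CACHE = {"obj_a": b"cached bytes"}
CACHEABLE_REGIONS = {"tier2"}

class TransactionAbort(Exception): pass
class CacheMissPageFault(Exception): pass

def handle_fault(obj_id, tier):
    # A handler might compute which cache lines to prefetch, how much data,
    # and whether a bulk transfer is worthwhile (sizes here are made up).
    prefetch_bytes = 4096 if tier == "tier2" else 64
    return f"prefetch {prefetch_bytes}B of {obj_id} from {tier}"

def load(obj_id, tier, aborts_enabled):
    if obj_id in LOCAL_CACHE:
        return LOCAL_CACHE[obj_id]           # copy found in the local cache
    if aborts_enabled:
        raise TransactionAbort(obj_id)       # abort enabled for the transaction
    if tier in CACHEABLE_REGIONS:
        raise CacheMissPageFault(handle_fault(obj_id, tier))
    return f"direct access to {obj_id} in {tier}"

print(load("obj_a", "tier2", aborts_enabled=False))   # local hit
try:
    load("obj_b", "tier2", aborts_enabled=False)
except CacheMissPageFault as fault:
    print(fault)                                      # handler's prefetch plan
```
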
  • Publication number: 20210318920
    Abstract: A method of offloading performance of a workload includes receiving, on a first computing system acting as an initiator, a first function call from a caller, the first function call to be executed by an accelerator on a second computing system acting as a target, the first computing system coupled to the second computing system by a network; determining a type of the first function call; and generating a list of parameter values of the first function call.
    Type: Application
    Filed: June 25, 2021
    Publication date: October 14, 2021
    Applicant: Intel Corporation
    Inventors: Pradeep Pappachan, Sujoy Sen, Joseph Grecco, Mukesh Gangadhar Bhavani Venkatesan, Reshma Lal
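
A rough, hypothetical sketch of the initiator-side steps the abstract names: determining the type of the incoming function call and generating the list of parameter values to ship to the target. The function table and wire format are assumptions.

```python
# Minimal marshalling sketch for offloading a function call to a remote
# accelerator; nothing here is the filing's actual format.

import json

FUNCTION_TABLE = {"vector_add": "arithmetic", "encrypt_block": "crypto"}

def marshal_call(name, *args):
    call_type = FUNCTION_TABLE.get(name, "unknown")   # determine call type
    params = [{"index": i, "value": v} for i, v in enumerate(args)]
    return json.dumps({"function": name, "type": call_type, "params": params})

wire_msg = marshal_call("vector_add", [1, 2, 3], [4, 5, 6])
print(wire_msg)   # payload the initiator would send to the target system
```
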
  • Publication number: 20210314245
    Abstract: Technologies for dynamically managing resources in disaggregated accelerators include an accelerator. The accelerator includes acceleration circuitry with multiple logic portions, each capable of executing a different workload. Additionally, the accelerator includes communication circuitry to receive a workload to be executed by a logic portion of the accelerator and a dynamic resource allocation logic unit to identify a resource utilization threshold associated with one or more shared resources of the accelerator to be used by a logic portion in the execution of the workload, limit, as a function of the resource utilization threshold, the utilization of the one or more shared resources by the logic portion as the logic portion executes the workload, and subsequently adjust the resource utilization threshold as the workload is executed. Other embodiments are also described and claimed.
    Type: Application
    Filed: April 20, 2021
    Publication date: October 7, 2021
    Inventors: Francesc GUIM BERNAT, Susanne M. BALLE, Rahul KHANNA, Sujoy SEN, Karthik KUMAR
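
One way to picture the resource-utilization-threshold mechanism, as a minimal sketch with an assumed adjustment policy (nothing here is from the filing itself):

```python
# A logic portion's per-step use of a shared resource is capped by a
# utilization threshold that is re-evaluated as the workload runs.

def run_workload(demands, threshold, adjust):
    """demands: per-step resource requests; adjust: threshold update policy."""
    granted = []
    for wanted in demands:
        allowed = min(wanted, threshold)       # limit use of the shared resource
        granted.append(allowed)
        threshold = adjust(threshold, wanted)  # adjust threshold during execution
    return granted

# Assumed policy: raise the cap when demand is persistently above it.
bump_if_starved = lambda cap, wanted: cap + 10 if wanted > cap else cap
print(run_workload([30, 80, 80, 20], threshold=50, adjust=bump_if_starved))
# -> [30, 50, 60, 20]: the cap throttles, then relaxes as demand persists
```
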
  • Patent number: 11137922
    Abstract: Technologies for providing accelerated functions as a service in a disaggregated architecture include a compute device that is to receive a request for an accelerated task. The task is associated with a kernel usable by an accelerator sled communicatively coupled to the compute device to execute the task. The compute device is further to determine, in response to the request and with a database indicative of kernels and associated accelerator sleds, an accelerator sled that includes an accelerator device configured with the kernel associated with the request. Additionally, the compute device is to assign the task to the determined accelerator sled for execution. Other embodiments are also described and claimed.
    Type: Grant
    Filed: September 29, 2017
    Date of Patent: October 5, 2021
    Assignee: Intel Corporation
    Inventors: Francesc Guim Bernat, Evan Custodio, Susanne M. Balle, Joe Grecco, Henry Mitchel, Rahul Khanna, Slawomir Putyrski, Sujoy Sen, Paul Dormitzer
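
The dispatch step lends itself to a short sketch: a database maps kernels to the accelerator sleds already configured with them, and an incoming accelerated task is assigned to one of those sleds. The names and tie-break rule below are assumptions.

```python
# Hypothetical kernel-to-sled database and assignment step.

KERNEL_DB = {
    "fft_kernel": ["sled-3", "sled-7"],   # sleds configured with this kernel
    "zip_kernel": ["sled-1"],
}

def assign_task(task_kernel, load=lambda sled: 0):
    sleds = KERNEL_DB.get(task_kernel)
    if not sleds:
        raise LookupError(f"no sled configured with {task_kernel}")
    return min(sleds, key=load)           # pick a configured sled for execution

print(assign_task("fft_kernel"))          # e.g. "sled-3"
```
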
  • Publication number: 20210271403
    Abstract: Technologies for dividing work across one or more accelerator devices include a compute device. The compute device is to determine a configuration of each of multiple accelerator devices of the compute device, receive a job to be accelerated from a requester device remote from the compute device, and divide the job into multiple tasks for a parallelization of the multiple tasks among the one or more accelerator devices, as a function of a job analysis of the job and the configuration of each accelerator device. The compute engine is further to schedule the tasks to the one or more accelerator devices based on the job analysis and execute the tasks on the one or more accelerator devices for the parallelization of the multiple tasks to obtain an output of the job.
    Type: Application
    Filed: May 14, 2021
    Publication date: September 2, 2021
    Inventors: Susanne M. Balle, Francesc Guim Bernat, Slawomir Putyrski, Joe Grecco, Henry Mitchel, Evan CUSTODIO, Rahul Khanna, Sujoy Sen
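
A small sketch of the division step, assuming a capacity-proportional split (the filing does not specify the split policy, so the configuration values below are illustrative):

```python
# Divide a job into tasks sized by each accelerator's configuration, one
# task list per device, for parallel execution.

ACCELERATORS = {"fpga0": 4, "fpga1": 2, "gpu0": 2}   # relative capacity units

def divide_job(job_items, configs):
    total = sum(configs.values())
    tasks, start = {}, 0
    for device, capacity in configs.items():
        share = len(job_items) * capacity // total    # proportional slice
        tasks[device] = job_items[start:start + share]
        start += share
    last = list(configs)[-1]
    tasks[last] += job_items[start:]                  # remainder to last device
    return tasks

print(divide_job(list(range(10)), ACCELERATORS))
# fpga0 gets about half the items; fpga1 and gpu0 split the rest.
```
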
  • Patent number: 11106361
    Abstract: Technologies for quality of service (QoS) management include a computing device having a physical storage volume and multiple processor cores. A management thread reads I/O counters that are each associated with a logical volume and a processor core. The logical volumes are backed by the physical storage volume. The management thread configures stop bits as a function of the I/O counters and multiple QoS parameters. Each stop bit is associated with a logical volume and a processor core. The QoS parameters include minimum guaranteed bandwidth and optional maximum allowed bandwidth for each logical volume. A worker thread reads the stop bit associated with a logical volume and a processor core, accesses the logical volume if the stop bit is not set, and updates the I/O counter associated with the logical volume and the processor core in response to accessing the logical volume. Other embodiments are described and claimed.
    Type: Grant
    Filed: May 20, 2019
    Date of Patent: August 31, 2021
    Assignee: Intel Corporation
    Inventors: Sujoy Sen, Siddhartha Kumar Panda, Jayaraj Puthenpurackal Rajappan, Kunal Sablok, Ramkumar Venkatachalam
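
This abstract is concrete enough to model directly. Below is a minimal sketch with assumed units and a max-bandwidth-only policy (the patent also covers minimum guaranteed bandwidth):

```python
# Management thread compares per-(volume, core) I/O counters against QoS
# parameters and sets stop bits; worker threads check the bit before I/O.

counters  = {("vol0", 0): 0, ("vol0", 1): 0}    # bytes moved per volume/core
stop_bits = {key: False for key in counters}
QOS = {"vol0": {"max_bw": 100}}                  # assumed units per interval

def management_pass():
    for (vol, core), count in counters.items():
        limit = QOS[vol]["max_bw"]
        stop_bits[(vol, core)] = count >= limit  # throttle once over budget

def worker_io(vol, core, nbytes):
    if stop_bits[(vol, core)]:
        return False                             # back off: volume throttled
    counters[(vol, core)] += nbytes              # account the access
    return True

worker_io("vol0", 0, 80)
management_pass()                                # 80 < 100: still allowed
print(worker_io("vol0", 0, 40))                  # True; counter now 120
management_pass()                                # over budget: stop bit set
print(worker_io("vol0", 0, 10))                  # False
```
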
  • Patent number: 11029870
    Abstract: Technologies for dividing work across one or more accelerator devices include a compute device. The compute device is to determine a configuration of each of multiple accelerator devices of the compute device, receive a job to be accelerated from a requester device remote from the compute device, and divide the job into multiple tasks for a parallelization of the multiple tasks among the one or more accelerator devices, as a function of a job analysis of the job and the configuration of each accelerator device. The compute engine is further to schedule the tasks to the one or more accelerator devices based on the job analysis and execute the tasks on the one or more accelerator devices for the parallelization of the multiple tasks to obtain an output of the job.
    Type: Grant
    Filed: September 30, 2017
    Date of Patent: June 8, 2021
    Assignee: Intel Corporation
    Inventors: Susanne M. Balle, Francesc Guim Bernat, Slawomir Putyrski, Joe Grecco, Henry Mitchel, Evan Custodio, Rahul Khanna, Sujoy Sen
  • Publication number: 20210149812
    Abstract: Examples described herein include an apparatus comprising: a network interface configured to: receive a request to copy data from a local memory to a remote memory; and, based on a configuration that the network interface is to manage a cache, store the data into the cache and record that the data is stored in the cache. In some examples, storing the data in the cache comprises storing the most recently evicted data from the local memory into the cache. In some examples, the network interface is to store data evicted from the local memory, but not stored in the cache, into one or more remote memories.
    Type: Application
    Filed: November 24, 2020
    Publication date: May 20, 2021
    Inventors: Sujoy SEN, Durgesh SRIVASTAVA, Thomas E. WILLIS, Bassam N. COURY, Marcelo CINTRA
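
A minimal sketch of the NIC-managed eviction cache, assuming an LRU-style structure and illustrative capacities; the real design is not specified at this level in the abstract.

```python
# Recently evicted local-memory data lands in a NIC cache; older evictions
# spill to remote memory.

from collections import OrderedDict

class NicCache:
    def __init__(self, capacity, remote):
        self.capacity, self.cache, self.remote = capacity, OrderedDict(), remote

    def on_evict(self, addr, data):
        self.cache[addr] = data                  # most recently evicted data
        self.cache.move_to_end(addr)
        if len(self.cache) > self.capacity:      # overflow goes to remote memory
            old_addr, old_data = self.cache.popitem(last=False)
            self.remote[old_addr] = old_data

remote_pool = {}
nic = NicCache(capacity=2, remote=remote_pool)
for addr in (0x100, 0x200, 0x300):
    nic.on_evict(addr, b"line")
print([hex(a) for a in nic.cache])    # ['0x200', '0x300'] still on the NIC
print([hex(a) for a in remote_pool])  # ['0x100'] spilled to a remote memory
```
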
  • Publication number: 20210141552
    Abstract: Technologies for providing accelerated functions as a service in a disaggregated architecture include a compute device that is to receive a request for an accelerated task. The task is associated with a kernel usable by an accelerator sled communicatively coupled to the compute device to execute the task. The compute device is further to determine, in response to the request and with a database indicative of kernels and associated accelerator sleds, an accelerator sled that includes an accelerator device configured with the kernel associated with the request. Additionally, the compute device is to assign the task to the determined accelerator sled for execution. Other embodiments are also described and claimed.
    Type: Application
    Filed: December 17, 2020
    Publication date: May 13, 2021
    Applicant: Intel Corporation
    Inventors: Francesc Guim Bernat, Evan Custodio, Susanne M. Balle, Joe Grecco, Henry Mitchel, Rahul Khanna, Slawomir Putyrski, Sujoy Sen, Paul Dormitzer
  • Publication number: 20210117246
    Abstract: An apparatus to facilitate disaggregated computing for a distributed confidential computing environment is disclosed. The apparatus includes one or more processors to facilitate receiving a manifest corresponding to graph nodes representing regions of memory of a remote client machine, the graph nodes corresponding to a command buffer and to associated data structures and kernels of the command buffer used to initialize a hardware accelerator and execute the kernels, and the manifest indicating a destination memory location of each of the graph nodes and dependencies of each of the graph nodes; identifying, based on the manifest, the command buffer and the associated data structures to copy to the host memory; identifying, based on the manifest, the kernels to copy to local memory of the hardware accelerator; and patching addresses in the command buffer copied to the host memory with updated addresses of corresponding locations in the host memory.
    Type: Application
    Filed: December 23, 2020
    Publication date: April 22, 2021
    Applicant: Intel Corporation
    Inventors: Reshma Lal, Pradeep Pappachan, Luis Kida, Soham Jayesh Desai, Sujoy Sen, Selvakumar Panneer, Robert Sharp
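
The address-patching step can be illustrated briefly; the manifest layout and command format below are assumptions, not the filing's structures.

```python
# After graph nodes are copied per the manifest, pointers in the command
# buffer that referred to client-side locations are rewritten to the
# corresponding host-memory addresses.

manifest = {                       # graph node -> (client addr, host addr)
    "data_a":  (0x1000, 0x9000),
    "kernel0": (0x2000, 0xA000),   # kernels go to accelerator-local memory
}

command_buffer = [("LOAD", 0x1000), ("EXEC", 0x2000)]

def patch(buffer, manifest):
    relocation = {src: dst for src, dst in manifest.values()}
    return [(op, relocation.get(addr, addr)) for op, addr in buffer]

for op, addr in patch(command_buffer, manifest):
    print(op, hex(addr))   # LOAD 0x9000, EXEC 0xa000: now host-memory addresses
```
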
  • Patent number: 10986005
    Abstract: Technologies for dynamically managing resources in disaggregated accelerators include an accelerator. The accelerator includes acceleration circuitry with multiple logic portions, each capable of executing a different workload. Additionally, the accelerator includes communication circuitry to receive a workload to be executed by a logic portion of the accelerator and a dynamic resource allocation logic unit to identify a resource utilization threshold associated with one or more shared resources of the accelerator to be used by a logic portion in the execution of the workload, limit, as a function of the resource utilization threshold, the utilization of the one or more shared resources by the logic portion as the logic portion executes the workload, and subsequently adjust the resource utilization threshold as the workload is executed. Other embodiments are also described and claimed.
    Type: Grant
    Filed: June 30, 2017
    Date of Patent: April 20, 2021
    Assignee: Intel Corporation
    Inventors: Francesc Guim Bernat, Susanne M. Balle, Rahul Khanna, Sujoy Sen, Karthik Kumar
  • Publication number: 20210105207
    Abstract: Examples described herein include one or more processors; a network interface; and a direct memory access (DMA) engine communicatively coupled to the one or more processors. In some examples, the DMA engine is to receive a DMA data access request and based on an address in the DMA data access request corresponding to a remote memory device, the DMA engine is to cause the network interface to generate at least one packet for transmission to the remote memory device. In some examples, the DMA data access request includes a source address, a destination address, and a length. In some examples, if the source address corresponds to a local memory device and the destination address corresponds to a remote memory device, the DMA engine is to cause the network interface to generate at least one packet for transmission to the remote memory device, wherein the at least one packet includes data stored at the source address.
    Type: Application
    Filed: November 24, 2020
    Publication date: April 8, 2021
    Inventors: Sujoy SEN, Durgesh SRIVASTAVA, Thomas E. WILLIS, Bassam N. COURY, Marcelo CINTRA
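
A minimal sketch of the routing decision: a DMA request whose destination falls in an assumed remote address range becomes a network packet carrying the source data; everything else is a local copy. The range and packet format are illustrative.

```python
# DMA engine decides, per address, between a local copy and NIC packet
# generation toward a remote memory device.

REMOTE_RANGE = range(0x8000_0000, 0xF000_0000)   # assumed remote window

def dma(src, dst, length, local_copy, send_packet):
    if dst in REMOTE_RANGE:
        # Destination is remote: read locally, ship the data over the NIC.
        send_packet({"dst": dst, "len": length, "payload": f"bytes@{hex(src)}"})
    else:
        local_copy(src, dst, length)

sent = []
dma(0x1000, 0x9000_0000, 64, local_copy=lambda *a: None, send_packet=sent.append)
print(sent)   # one packet bound for the remote memory device
```
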
  • Publication number: 20210103403
    Abstract: Methods and apparatus for end-to-end data plane offloading for distributed storage using protocol hardware and Protocol Independent Switch Architecture (PISA) devices. Hardware-based data plane forwarding is implemented in compute and storage switches, which are smart server switches running software in kernel and user space. The compute switch is coupled to one or more compute servers/nodes and the storage switch is coupled to one or more storage servers or storage arrays. The hardware-based data plane forwarding facilitates an end-to-end data plane between the compute server(s) and storage server(s)/array(s) that is offloaded to hardware. In one example the software comprises Ceph components used to implement control plane operations in connection with hardware-offloaded data plane operations; storage traffic employs the NVMe-oF protocol and the kernels include NVMe-oF modules. In one aspect the hardware-based data plane forwarding is implemented using programmable P4 switch chips.
    Type: Application
    Filed: November 9, 2020
    Publication date: April 8, 2021
    Inventors: Shaopeng He, Yadong Li, Ziye Yang, Changpeng Liu, Haitao Kang, Cunming Liang, Gang Cao, Scott Peterson, Sujoy Sen, Yi Zou, Arun Raghunath
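
Conceptually (this is not P4 code, and the table layout is an assumption), the offload splits into a control plane that installs match-action entries and a data plane that then forwards storage traffic by pure table lookup:

```python
# Control plane (e.g. Ceph components) programs the table; once entries
# exist, NVMe-oF traffic is forwarded without touching host software.

forwarding_table = {}   # (protocol, volume) -> output port

def control_plane_install(protocol, volume, port):
    forwarding_table[(protocol, volume)] = port

def data_plane_forward(packet):
    # Pure lookup, standing in for the switch chip's match-action stage.
    return forwarding_table.get((packet["proto"], packet["vol"]))

control_plane_install("nvme-of", "vol7", port=3)
print(data_plane_forward({"proto": "nvme-of", "vol": "vol7"}))   # -> 3
```
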
  • Patent number: 10970246
    Abstract: Technologies for network interface controllers (NICs) include a computing device having a NIC coupled to a root FPGA via an I/O link. The root FPGA is further coupled to multiple worker FPGAs by a serial link with each worker FPGA. The NIC may receive a remote direct memory access (RDMA) message from a remote host and send the RDMA message to the root FPGA via the I/O link. The root FPGA determines a target FPGA based on a memory address of the RDMA message. Each FPGA is associated with a part of a unified address space. If the target FPGA is a worker FPGA, the root FPGA sends the RDMA message to the worker FPGA via the corresponding serial link, and the worker FPGA processes the RDMA message. If the root FPGA is the target, the root FPGA may process the RDMA message. Other embodiments are described and claimed.
    Type: Grant
    Filed: May 3, 2019
    Date of Patent: April 6, 2021
    Assignee: Intel Corporation
    Inventors: Paul H. Dormitzer, Susanne M. Balle, Sujoy Sen, Evan Custodio
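
A minimal sketch of the unified-address routing, with assumed region sizes: the root FPGA owns the lowest region and forwards anything above it over the matching serial link.

```python
# Each FPGA owns one slice of a unified address space; the root FPGA routes
# an RDMA message by the region its memory address falls into.

REGION_SIZE = 0x1000_0000                           # assumed slice size
FPGAS = ["root", "worker1", "worker2", "worker3"]   # region i -> FPGAS[i]

def route_rdma(message_addr):
    target = FPGAS[message_addr // REGION_SIZE]
    if target == "root":
        return "process on root FPGA"
    return f"forward over serial link to {target}"

print(route_rdma(0x0000_1000))   # root handles it locally
print(route_rdma(0x2000_0000))   # forwarded to worker2
```
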
  • Publication number: 20210092069
    Abstract: Examples described herein relate to a network interface and at least one processor that is to indicate whether data is associated with a machine learning operation or a non-machine learning operation, to manage traversal of the data through one or more network elements to a destination network element, and to cause the network interface to include an indication in a packet of whether the packet includes machine learning data or non-machine learning data. In some examples, the indication in a packet of whether the packet includes machine learning data or non-machine learning data comprises a priority level, wherein one or more higher priority levels identify machine learning data. In some examples, for machine learning data, the priority level is based on whether the data is associated with inference, training, or re-training operations. In some examples, for machine learning data, the priority level is based on whether the data is associated with real-time or time-insensitive inference operations.
    Type: Application
    Filed: December 10, 2020
    Publication date: March 25, 2021
    Inventors: Malek MUSLEH, Anupama KURPAD, Roberto PENARANDA CEBRIAN, Allister ALEMANIA, Pedro YEBENES SEGURA, Curt E. BRUNS, Robert SOUTHWORTH, Sujoy SEN
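
The classification-to-priority mapping might look like the sketch below; the specific levels are assumptions, with the abstract only fixing that higher levels identify ML data and that operation type and latency sensitivity feed the choice.

```python
# Assign a packet priority level from the traffic's ML classification.

def priority(is_ml, operation=None, real_time=False):
    if not is_ml:
        return 0                             # non-ML data: lowest level
    if operation == "inference":
        return 3 if real_time else 2         # real-time inference ranked higher
    return 1                                 # training / re-training

pkt = {"payload": b"...", "prio": priority(True, "inference", real_time=True)}
print(pkt["prio"])   # 3: network elements can favor this packet end to end
```
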
  • Publication number: 20210081312
    Abstract: Examples described herein include a network interface controller comprising a memory interface and a network interface, the network interface controller configurable to provide access to local memory and remote memory to a requester, wherein the network interface controller is configured with an amount of memory of different memory access speeds for allocation to one or more requesters. In some examples, the network interface controller is to grant or deny a memory allocation request from a requester based on a configuration of an amount of memory for different memory access speeds for allocation to the requester. In some examples, the network interface controller is to grant or deny a memory access request from a requester based on a configuration of memory allocated to the requester. In some examples, the network interface controller is to regulate quality of service of memory access requests from requesters.
    Type: Application
    Filed: November 24, 2020
    Publication date: March 18, 2021
    Inventors: Bassam N. COURY, Sujoy SEN, Thomas E. WILLIS, Durgesh SRIVASTAVA
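
A minimal sketch of the grant/deny bookkeeping, assuming per-requester byte budgets for two illustrative speed classes:

```python
# NIC-side accounting: allocation requests are checked against the amount of
# memory configured for each requester and speed class.

budgets   = {"vm0": {"near": 4096, "far": 65536}}   # bytes per speed class
allocated = {"vm0": {"near": 0, "far": 0}}

def request_allocation(requester, speed_class, size):
    used = allocated[requester][speed_class]
    cap = budgets[requester][speed_class]
    if used + size > cap:
        return False                    # deny: would exceed configured amount
    allocated[requester][speed_class] = used + size
    return True                         # grant and account the allocation

print(request_allocation("vm0", "near", 4096))   # True: exactly fits
print(request_allocation("vm0", "near", 1))      # False: budget exhausted
```
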
  • Publication number: 20210075633
    Abstract: Examples described herein relate to a network interface. In some examples, the network interface is to access data designated for transmission in at least one packet to multiple memory nodes by inclusion of a multicast identifier of a memory node group, and to transmit the at least one packet to a destination network device, wherein the multicast identifier of the memory node group in the at least one packet is to cause an intermediate network device to multicast the packet to multiple memory nodes. In some examples, a memory node comprises a memory pool that includes one or more of: volatile memory, non-volatile memory, or persistent memory. In some examples, the intermediate network device comprises a switch configured to determine network addresses of memory nodes associated with the multicast identifier of the memory node group.
    Type: Application
    Filed: November 24, 2020
    Publication date: March 11, 2021
    Inventors: Sujoy SEN, Thomas E. WILLIS, Durgesh SRIVASTAVA, Marcelo CINTRA, Bassam N. COURY
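
The multicast expansion at the intermediate switch can be sketched with an assumed group table and packet fields:

```python
# The sender tags a write with a memory-node-group identifier; the switch
# maps that identifier to member nodes and replicates the packet.

GROUPS = {0x10: ["mem-node-a", "mem-node-b", "mem-node-c"]}  # switch config

def switch_forward(packet):
    members = GROUPS.get(packet["mcast_id"], [])
    return [{**packet, "dst": node} for node in members]     # one copy each

write = {"mcast_id": 0x10, "payload": b"replicated line"}
for copy in switch_forward(write):
    print(copy["dst"])    # the same write lands on all three memory nodes
```
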
  • Publication number: 20210073161
    Abstract: Technologies for providing I/O channel abstraction for accelerator device kernels include an accelerator device comprising circuitry to obtain availability data indicative of an availability of one or more accelerator device kernels in a system, including one or more physical communication paths to each accelerator device kernel. The circuitry is also configured to determine whether to establish a logical communication path between a kernel of the present accelerator device and another accelerator device kernel and establish, in response to a determination to establish the logical communication path as a function of the obtained availability data, the logical communication path between the kernel of the present accelerator device and the other accelerator device kernel.
    Type: Application
    Filed: November 3, 2020
    Publication date: March 11, 2021
    Inventors: Susanne M. BALLE, Evan CUSTODIO, Francesc GUIM BERNAT, Sujoy SEN, Slawomir PUTYRSKI, Paul DORMITZER, Joseph GRECCO
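
A rough model of the establish-or-not decision driven by availability data; the data structures here are assumptions, not the filing's:

```python
# Availability data lists physical communication paths to each remote
# kernel; a logical path is established only when a physical route exists.

availability = {                     # kernel -> physical communication paths
    "kernelB": ["pcie-switch-0", "nic0/rdma"],
    "kernelC": [],
}

logical_paths = {}

def maybe_connect(local_kernel, remote_kernel):
    routes = availability.get(remote_kernel, [])
    if not routes:
        return False                 # no physical path: don't establish
    logical_paths[(local_kernel, remote_kernel)] = routes[0]
    return True

print(maybe_connect("kernelA", "kernelB"))   # True: logical path established
print(maybe_connect("kernelA", "kernelC"))   # False: nothing reachable
```
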
  • Publication number: 20210073151
    Abstract: Examples described herein include at least one processor and a direct memory access (DMA) device. In some examples, the DMA device is to access a command from a memory region allocated to receive commands for execution by the DMA device, wherein the command is to access content from a local memory device or a remote memory node. In some examples, the DMA device is to determine whether the content is stored in a local memory device or a remote memory node based on a configuration that indicates whether a source address refers to a memory address associated with the local memory device or the remote memory node and whether a destination address refers to a memory address associated with the local memory device or the remote memory node. In some examples, the DMA device is to copy the content from a local memory device or copy the content to the local memory device using a memory interface.
    Type: Application
    Filed: November 24, 2020
    Publication date: March 11, 2021
    Inventors: Sujoy SEN, Durgesh SRIVASTAVA, Thomas E. WILLIS, Bassam N. COURY, Marcelo CINTRA
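
Finally, the command flow here can be sketched as a software-posted command region drained by the device, with an assumed configured local address range deciding local versus remote copies:

```python
# Software posts commands to a memory region; the DMA device consumes them
# and classifies each copy by its configured address ranges.

LOCAL = range(0x0000_0000, 0x8000_0000)    # assumed local address range

command_queue = [
    {"src": 0x0000_1000, "dst": 0x0000_2000, "len": 64},   # local -> local
    {"src": 0x0000_1000, "dst": 0x9000_0000, "len": 64},   # local -> remote node
]

def execute(cmd):
    side = "local" if cmd["dst"] in LOCAL else "remote"
    return f"copy {cmd['len']}B from {hex(cmd['src'])} to {side} {hex(cmd['dst'])}"

for cmd in command_queue:                  # device drains the command region
    print(execute(cmd))
```
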