Patents by Inventor Sujoy Sen

Sujoy Sen has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20210326270
    Abstract: Examples described herein relate to a network interface device comprising circuitry to receive an access request with a target logical block address (LBA) and, based on a target media of the access request storing at least one object, translate the target LBA to an address and access content in the target media based on that address. In some examples, translating the target LBA to an address includes accessing a translation entry that maps the LBA to one or more of a physical address or a virtual address. In some examples, translating the target LBA to an address comprises requesting a software defined storage (SDS) stack to provide a translation of the LBA to one or more of a physical address or a virtual address, and storing the translation in a mapping table for access by the circuitry. In some examples, at least one entry that maps the LBA to one or more of a physical address or a virtual address is received before receipt of an access request.
    Type: Application
    Filed: June 26, 2021
    Publication date: October 21, 2021
    Inventors: Yi ZOU, Arun RAGHUNATH, Scott D. PETERSON, Sujoy SEN, Yadong LI
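
A minimal sketch, with illustrative class and table names not taken from the filing, of the translation flow this abstract describes: a NIC-side mapping table is consulted first, a miss falls back to the SDS stack, and entries may be pushed down before any access request arrives.

```python
# Hypothetical sketch of the LBA-translation flow: consult a NIC-side mapping
# table first, fall back to the software-defined storage (SDS) stack on a miss.

class SdsStack:
    """Stand-in for the SDS control plane that owns the authoritative mapping."""
    def __init__(self, mapping):
        self._mapping = mapping              # LBA -> physical/virtual address

    def translate(self, lba):
        return self._mapping[lba]

class NicTranslator:
    def __init__(self, sds):
        self.sds = sds
        self.table = {}                      # per-device translation cache

    def prefill(self, lba, address):
        # Entries may be pushed down before any access request arrives.
        self.table[lba] = address

    def access(self, lba):
        address = self.table.get(lba)
        if address is None:                  # miss: ask the SDS stack, then cache
            address = self.sds.translate(lba)
            self.table[lba] = address
        return address                       # caller accesses media at `address`

sds = SdsStack({0x10: 0xDEAD000, 0x11: 0xDEAD200})
nic = NicTranslator(sds)
nic.prefill(0x10, 0xDEAD000)                 # entry received before any request
assert nic.access(0x11) == 0xDEAD200         # resolved via the SDS stack, now cached
```
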
  • Publication number: 20210318961
    Abstract: Methods and apparatus for mitigating pooled memory cache miss latency with cache miss faults and transaction aborts. A compute platform coupled to one or more tiers of memory, such as remote pooled memory in a disaggregated environment, executes memory transactions to access objects stored in the one or more tiers. A determination is made as to whether a copy of the object is in a local cache on the platform; if it is, the object is accessed from the local cache. If the object is not in the local cache, a transaction abort may be generated if enabled for the transactions. Optionally, a cache miss page fault is generated if the object is in a cacheable region of a memory tier and the transaction abort is not enabled. Various mechanisms are provided to determine how to respond to a cache miss page fault, such as determining addresses of cache lines to prefetch from the memory tier storing the object(s), determining how much data to prefetch, and determining whether to perform a bulk transfer.
    Type: Application
    Filed: June 23, 2021
    Publication date: October 14, 2021
    Inventors: Scott D. PETERSON, Sujoy SEN, Francesc GUIM BERNAT
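
The decision sequence the abstract walks through (local-cache hit, transaction abort, cache-miss page fault) can be sketched as below; the names, policy knobs, and prefetch sizing are illustrative assumptions, not taken from the filing.

```python
# Illustrative-only decision logic: hit the local cache if possible; otherwise
# abort the transaction when aborts are enabled, or raise a cache-miss page
# fault whose handler decides what and how much to prefetch.

LOCAL_CACHE = {"obj_a": b"cached bytes"}
CACHEABLE_REGIONS = {"tier2"}

class TransactionAbort(Exception): pass
class CacheMissPageFault(Exception): pass

def handle_fault(obj_id, tier):
    # A handler might compute which cache lines to prefetch, how much data,
    # and whether a bulk transfer is worthwhile (sizes here are made up).
    prefetch_bytes = 4096 if tier == "tier2" else 64
    return f"prefetch {prefetch_bytes}B of {obj_id} from {tier}"

def load(obj_id, tier, aborts_enabled):
    if obj_id in LOCAL_CACHE:
        return LOCAL_CACHE[obj_id]           # copy found in the local cache
    if aborts_enabled:
        raise TransactionAbort(obj_id)       # abort enabled for the transaction
    if tier in CACHEABLE_REGIONS:
        raise CacheMissPageFault(handle_fault(obj_id, tier))
    return f"direct access to {obj_id} in {tier}"

print(load("obj_a", "tier2", aborts_enabled=False))   # local hit
try:
    load("obj_b", "tier2", aborts_enabled=False)
except CacheMissPageFault as fault:
    print(fault)                                      # handler's prefetch plan
```
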
  • Publication number: 20210318920
    Abstract: A method of offloading performance of a workload includes receiving, on a first computing system acting as an initiator, a first function call from a caller, the first function call to be executed by an accelerator on a second computing system acting as a target, the first computing system coupled to the second computing system by a network; determining a type of the first function call; and generating a list of parameter values of the first function call.
    Type: Application
    Filed: June 25, 2021
    Publication date: October 14, 2021
    Applicant: Intel Corporation
    Inventors: Pradeep Pappachan, Sujoy Sen, Joseph Grecco, Mukesh Gangadhar Bhavani Venkatesan, Reshma Lal
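
A rough, hypothetical sketch of the initiator-side steps the abstract names: determining the type of the incoming function call and generating the list of parameter values to ship to the target. The function table and wire format are assumptions.

```python
# Minimal marshalling sketch for offloading a function call to a remote
# accelerator; nothing here is the filing's actual format.

import json

FUNCTION_TABLE = {"vector_add": "arithmetic", "encrypt_block": "crypto"}

def marshal_call(name, *args):
    call_type = FUNCTION_TABLE.get(name, "unknown")   # determine call type
    params = [{"index": i, "value": v} for i, v in enumerate(args)]
    return json.dumps({"function": name, "type": call_type, "params": params})

wire_msg = marshal_call("vector_add", [1, 2, 3], [4, 5, 6])
print(wire_msg)   # payload the initiator would send to the target system
```
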
  • Publication number: 20210314245
    Abstract: Technologies for dynamically managing resources in disaggregated accelerators include an accelerator. The accelerator includes acceleration circuitry with multiple logic portions, each capable of executing a different workload. Additionally, the accelerator includes communication circuitry to receive a workload to be executed by a logic portion of the accelerator and a dynamic resource allocation logic unit to identify a resource utilization threshold associated with one or more shared resources of the accelerator to be used by a logic portion in the execution of the workload, limit, as a function of the resource utilization threshold, the utilization of the one or more shared resources by the logic portion as the logic portion executes the workload, and subsequently adjust the resource utilization threshold as the workload is executed. Other embodiments are also described and claimed.
    Type: Application
    Filed: April 20, 2021
    Publication date: October 7, 2021
    Inventors: Francesc GUIM BERNAT, Susanne M. BALLE, Rahul KHANNA, Sujoy SEN, Karthik KUMAR
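
One way to picture the resource-utilization-threshold mechanism, as a minimal sketch with an assumed adjustment policy (nothing here is from the filing itself):

```python
# A logic portion's per-step use of a shared resource is capped by a
# utilization threshold that is re-evaluated as the workload runs.

def run_workload(demands, threshold, adjust):
    """demands: per-step resource requests; adjust: threshold update policy."""
    granted = []
    for wanted in demands:
        allowed = min(wanted, threshold)       # limit use of the shared resource
        granted.append(allowed)
        threshold = adjust(threshold, wanted)  # adjust threshold during execution
    return granted

# Assumed policy: raise the cap when demand is persistently above it.
bump_if_starved = lambda cap, wanted: cap + 10 if wanted > cap else cap
print(run_workload([30, 80, 80, 20], threshold=50, adjust=bump_if_starved))
# -> [30, 50, 60, 20]: the cap throttles, then relaxes as demand persists
```
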
  • Patent number: 11137922
    Abstract: Technologies for providing accelerated functions as a service in a disaggregated architecture include a compute device that is to receive a request for an accelerated task. The task is associated with a kernel usable by an accelerator sled communicatively coupled to the compute device to execute the task. The compute device is further to determine, in response to the request and with a database indicative of kernels and associated accelerator sleds, an accelerator sled that includes an accelerator device configured with the kernel associated with the request. Additionally, the compute device is to assign the task to the determined accelerator sled for execution. Other embodiments are also described and claimed.
    Type: Grant
    Filed: September 29, 2017
    Date of Patent: October 5, 2021
    Assignee: Intel Corporation
    Inventors: Francesc Guim Bernat, Evan Custodio, Susanne M. Balle, Joe Grecco, Henry Mitchel, Rahul Khanna, Slawomir Putyrski, Sujoy Sen, Paul Dormitzer
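
The dispatch step lends itself to a short sketch: a database maps kernels to the accelerator sleds already configured with them, and an incoming accelerated task is assigned to one of those sleds. The names and tie-break rule below are assumptions.

```python
# Hypothetical kernel-to-sled database and assignment step.

KERNEL_DB = {
    "fft_kernel": ["sled-3", "sled-7"],   # sleds configured with this kernel
    "zip_kernel": ["sled-1"],
}

def assign_task(task_kernel, load=lambda sled: 0):
    sleds = KERNEL_DB.get(task_kernel)
    if not sleds:
        raise LookupError(f"no sled configured with {task_kernel}")
    return min(sleds, key=load)           # pick a configured sled for execution

print(assign_task("fft_kernel"))          # e.g. "sled-3"
```
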
  • Publication number: 20210271403
    Abstract: Technologies for dividing work across one or more accelerator devices include a compute device. The compute device is to determine a configuration of each of multiple accelerator devices of the compute device, receive a job to be accelerated from a requester device remote from the compute device, and divide the job into multiple tasks for a parallelization of the multiple tasks among the one or more accelerator devices, as a function of a job analysis of the job and the configuration of each accelerator device. The compute engine is further to schedule the tasks to the one or more accelerator devices based on the job analysis and execute the tasks on the one or more accelerator devices for the parallelization of the multiple tasks to obtain an output of the job.
    Type: Application
    Filed: May 14, 2021
    Publication date: September 2, 2021
    Inventors: Susanne M. Balle, Francesc Guim Bernat, Slawomir Putyrski, Joe Grecco, Henry Mitchel, Evan CUSTODIO, Rahul Khanna, Sujoy Sen
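
A small sketch of the division step, assuming a capacity-proportional split (the filing does not specify the split policy, so the configuration values below are illustrative):

```python
# Divide a job into tasks sized by each accelerator's configuration, one
# task list per device, for parallel execution.

ACCELERATORS = {"fpga0": 4, "fpga1": 2, "gpu0": 2}   # relative capacity units

def divide_job(job_items, configs):
    total = sum(configs.values())
    tasks, start = {}, 0
    for device, capacity in configs.items():
        share = len(job_items) * capacity // total    # proportional slice
        tasks[device] = job_items[start:start + share]
        start += share
    last = list(configs)[-1]
    tasks[last] += job_items[start:]                  # remainder to last device
    return tasks

print(divide_job(list(range(10)), ACCELERATORS))
# fpga0 gets about half the items; fpga1 and gpu0 split the rest.
```
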
  • Patent number: 11106361
    Abstract: Technologies for quality of service (QoS) management include a computing device having a physical storage volume and multiple processor cores. A management thread reads I/O counters that are each associated with a logical volume and a processor core. The logical volumes are backed by the physical storage volume. The management thread configures stop bits as a function of the I/O counters and multiple QoS parameters. Each stop bit is associated with a logical volume and a processor core. The QoS parameters include minimum guaranteed bandwidth and optional maximum allowed bandwidth for each logical volume. A worker thread reads the stop bit associated with a logical volume and a processor core, accesses the logical volume if the stop bit is not set, and updates the I/O counter associated with the logical volume and the processor core in response to accessing the logical volume. Other embodiments are described and claimed.
    Type: Grant
    Filed: May 20, 2019
    Date of Patent: August 31, 2021
    Assignee: Intel Corporation
    Inventors: Sujoy Sen, Siddhartha Kumar Panda, Jayaraj Puthenpurackal Rajappan, Kunal Sablok, Ramkumar Venkatachalam
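
This abstract is concrete enough to model directly. Below is a minimal sketch with assumed units and a max-bandwidth-only policy (the patent also covers minimum guaranteed bandwidth):

```python
# Management thread compares per-(volume, core) I/O counters against QoS
# parameters and sets stop bits; worker threads check the bit before I/O.

counters  = {("vol0", 0): 0, ("vol0", 1): 0}    # bytes moved per volume/core
stop_bits = {key: False for key in counters}
QOS = {"vol0": {"max_bw": 100}}                  # assumed units per interval

def management_pass():
    for (vol, core), count in counters.items():
        limit = QOS[vol]["max_bw"]
        stop_bits[(vol, core)] = count >= limit  # throttle once over budget

def worker_io(vol, core, nbytes):
    if stop_bits[(vol, core)]:
        return False                             # back off: volume throttled
    counters[(vol, core)] += nbytes              # account the access
    return True

worker_io("vol0", 0, 80)
management_pass()                                # 80 < 100: still allowed
print(worker_io("vol0", 0, 40))                  # True; counter now 120
management_pass()                                # over budget: stop bit set
print(worker_io("vol0", 0, 10))                  # False
```
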
  • Patent number: 11029870
    Abstract: Technologies for dividing work across one or more accelerator devices include a compute device. The compute device is to determine a configuration of each of multiple accelerator devices of the compute device, receive a job to be accelerated from a requester device remote from the compute device, and divide the job into multiple tasks for a parallelization of the multiple tasks among the one or more accelerator devices, as a function of a job analysis of the job and the configuration of each accelerator device. The compute engine is further to schedule the tasks to the one or more accelerator devices based on the job analysis and execute the tasks on the one or more accelerator devices for the parallelization of the multiple tasks to obtain an output of the job.
    Type: Grant
    Filed: September 30, 2017
    Date of Patent: June 8, 2021
    Assignee: Intel Corporation
    Inventors: Susanne M. Balle, Francesc Guim Bernat, Slawomir Putyrski, Joe Grecco, Henry Mitchel, Evan Custodio, Rahul Khanna, Sujoy Sen
  • Publication number: 20210149812
    Abstract: Examples described herein include an apparatus comprising: a network interface configured to: receive a request to copy data from a local memory to a remote memory; and, based on a configuration that the network interface is to manage a cache, store the data into the cache and record that the data is stored in the cache. In some examples, storing the data in the cache comprises storing the most recently evicted data from the local memory into the cache. In some examples, the network interface is to store data evicted from the local memory, but not stored in the cache, into one or more remote memories.
    Type: Application
    Filed: November 24, 2020
    Publication date: May 20, 2021
    Inventors: Sujoy SEN, Durgesh SRIVASTAVA, Thomas E. WILLIS, Bassam N. COURY, Marcelo CINTRA
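
A minimal sketch of the NIC-managed eviction cache, assuming an LRU-style structure and illustrative capacities; the real design is not specified at this level in the abstract.

```python
# Recently evicted local-memory data lands in a NIC cache; older evictions
# spill to remote memory.

from collections import OrderedDict

class NicCache:
    def __init__(self, capacity, remote):
        self.capacity, self.cache, self.remote = capacity, OrderedDict(), remote

    def on_evict(self, addr, data):
        self.cache[addr] = data                  # most recently evicted data
        self.cache.move_to_end(addr)
        if len(self.cache) > self.capacity:      # overflow goes to remote memory
            old_addr, old_data = self.cache.popitem(last=False)
            self.remote[old_addr] = old_data

remote_pool = {}
nic = NicCache(capacity=2, remote=remote_pool)
for addr in (0x100, 0x200, 0x300):
    nic.on_evict(addr, b"line")
print([hex(a) for a in nic.cache])    # ['0x200', '0x300'] still on the NIC
print([hex(a) for a in remote_pool])  # ['0x100'] spilled to a remote memory
```
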
  • Publication number: 20210141552
    Abstract: Technologies for providing accelerated functions as a service in a disaggregated architecture include a compute device that is to receive a request for an accelerated task. The task is associated with a kernel usable by an accelerator sled communicatively coupled to the compute device to execute the task. The compute device is further to determine, in response to the request and with a database indicative of kernels and associated accelerator sleds, an accelerator sled that includes an accelerator device configured with the kernel associated with the request. Additionally, the compute device is to assign the task to the determined accelerator sled for execution. Other embodiments are also described and claimed.
    Type: Application
    Filed: December 17, 2020
    Publication date: May 13, 2021
    Applicant: Intel Corporation
    Inventors: Francesc Guim Bernat, Evan Custodio, Susanne M. Balle, Joe Grecco, Henry Mitchel, Rahul Khanna, Slawomir Putyrski, Sujoy Sen, Paul Dormitzer
  • Publication number: 20210117246
    Abstract: An apparatus to facilitate disaggregated computing for a distributed confidential computing environment is disclosed. The apparatus includes one or more processors to facilitate receiving a manifest corresponding to graph nodes representing regions of memory of a remote client machine, the graph nodes corresponding to a command buffer and to associated data structures and kernels of the command buffer used to initialize a hardware accelerator and execute the kernels, and the manifest indicating a destination memory location of each of the graph nodes and dependencies of each of the graph nodes; identifying, based on the manifest, the command buffer and the associated data structures to copy to the host memory; identifying, based on the manifest, the kernels to copy to local memory of the hardware accelerator; and patching addresses in the command buffer copied to the host memory with updated addresses of corresponding locations in the host memory.
    Type: Application
    Filed: December 23, 2020
    Publication date: April 22, 2021
    Applicant: Intel Corporation
    Inventors: Reshma Lal, Pradeep Pappachan, Luis Kida, Soham Jayesh Desai, Sujoy Sen, Selvakumar Panneer, Robert Sharp
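
The address-patching step can be illustrated briefly; the manifest layout and command format below are assumptions, not the filing's structures.

```python
# After graph nodes are copied per the manifest, pointers in the command
# buffer that referred to client-side locations are rewritten to the
# corresponding host-memory addresses.

manifest = {                       # graph node -> (client addr, host addr)
    "data_a":  (0x1000, 0x9000),
    "kernel0": (0x2000, 0xA000),   # kernels go to accelerator-local memory
}

command_buffer = [("LOAD", 0x1000), ("EXEC", 0x2000)]

def patch(buffer, manifest):
    relocation = {src: dst for src, dst in manifest.values()}
    return [(op, relocation.get(addr, addr)) for op, addr in buffer]

for op, addr in patch(command_buffer, manifest):
    print(op, hex(addr))   # LOAD 0x9000, EXEC 0xa000: now host-memory addresses
```
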
  • Patent number: 10986005
    Abstract: Technologies for dynamically managing resources in disaggregated accelerators include an accelerator. The accelerator includes acceleration circuitry with multiple logic portions, each capable of executing a different workload. Additionally, the accelerator includes communication circuitry to receive a workload to be executed by a logic portion of the accelerator and a dynamic resource allocation logic unit to identify a resource utilization threshold associated with one or more shared resources of the accelerator to be used by a logic portion in the execution of the workload, limit, as a function of the resource utilization threshold, the utilization of the one or more shared resources by the logic portion as the logic portion executes the workload, and subsequently adjust the resource utilization threshold as the workload is executed. Other embodiments are also described and claimed.
    Type: Grant
    Filed: June 30, 2017
    Date of Patent: April 20, 2021
    Assignee: Intel Corporation
    Inventors: Francesc Guim Bernat, Susanne M. Balle, Rahul Khanna, Sujoy Sen, Karthik Kumar
  • Publication number: 20210105207
    Abstract: Examples described herein include one or more processors; a network interface; and a direct memory access (DMA) engine communicatively coupled to the one or more processors. In some examples, the DMA engine is to receive a DMA data access request and based on an address in the DMA data access request corresponding to a remote memory device, the DMA engine is to cause the network interface to generate at least one packet for transmission to the remote memory device. In some examples, the DMA data access request includes a source address, a destination address, and a length. In some examples, if the source address corresponds to a local memory device and the destination address corresponds to a remote memory device, the DMA engine is to cause the network interface to generate at least one packet for transmission to the remote memory device, wherein the at least one packet includes data stored at the source address.
    Type: Application
    Filed: November 24, 2020
    Publication date: April 8, 2021
    Inventors: Sujoy SEN, Durgesh SRIVASTAVA, Thomas E. WILLIS, Bassam N. COURY, Marcelo CINTRA
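
A minimal sketch of the routing decision: a DMA request whose destination falls in an assumed remote address range becomes a network packet carrying the source data; everything else is a local copy. The range and packet format are illustrative.

```python
# DMA engine decides, per address, between a local copy and NIC packet
# generation toward a remote memory device.

REMOTE_RANGE = range(0x8000_0000, 0xF000_0000)   # assumed remote window

def dma(src, dst, length, local_copy, send_packet):
    if dst in REMOTE_RANGE:
        # Destination is remote: read locally, ship the data over the NIC.
        send_packet({"dst": dst, "len": length, "payload": f"bytes@{hex(src)}"})
    else:
        local_copy(src, dst, length)

sent = []
dma(0x1000, 0x9000_0000, 64, local_copy=lambda *a: None, send_packet=sent.append)
print(sent)   # one packet bound for the remote memory device
```
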
  • Publication number: 20210103403
    Abstract: Methods and apparatus for end-to-end data plane offloading for distributed storage using protocol hardware and Protocol Independent Switch Architecture (PISA) devices. Hardware-based data plane forwarding is implemented in compute and storage switches, which are smart server switches running software in kernel and user space. The compute switch is coupled to one or more compute servers/nodes and the storage switch is coupled to one or more storage servers or storage arrays. The hardware-based data plane forwarding facilitates an end-to-end data plane between the compute server(s) and storage server(s)/array(s) that is offloaded to hardware. In one example the software comprises Ceph components used to implement control plane operations in connection with hardware-offloaded data plane operations; storage traffic employs the NVMe-oF protocol and the kernels include NVMe-oF modules. In one aspect the hardware-based data plane forwarding is implemented using programmable P4 switch chips.
    Type: Application
    Filed: November 9, 2020
    Publication date: April 8, 2021
    Inventors: Shaopeng He, Yadong Li, Ziye Yang, Changpeng Liu, Haitao Kang, Cunming Liang, Gang Cao, Scott Peterson, Sujoy Sen, Yi Zou, Arun Raghunath
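
Conceptually (this is not P4 code, and the table layout is an assumption), the offload splits into a control plane that installs match-action entries and a data plane that then forwards storage traffic by pure table lookup:

```python
# Control plane (e.g. Ceph components) programs the table; once entries
# exist, NVMe-oF traffic is forwarded without touching host software.

forwarding_table = {}   # (protocol, volume) -> output port

def control_plane_install(protocol, volume, port):
    forwarding_table[(protocol, volume)] = port

def data_plane_forward(packet):
    # Pure lookup, standing in for the switch chip's match-action stage.
    return forwarding_table.get((packet["proto"], packet["vol"]))

control_plane_install("nvme-of", "vol7", port=3)
print(data_plane_forward({"proto": "nvme-of", "vol": "vol7"}))   # -> 3
```
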
  • Patent number: 10970246
    Abstract: Technologies for network interface controllers (NICs) include a computing device having a NIC coupled to a root FPGA via an I/O link. The root FPGA is further coupled to multiple worker FPGAs by a serial link with each worker FPGA. The NIC may receive a remote direct memory access (RDMA) message from a remote host and send the RDMA message to the root FPGA via the I/O link. The root FPGA determines a target FPGA based on a memory address of the RDMA message. Each FPGA is associated with a part of a unified address space. If the target FPGA is a worker FPGA, the root FPGA sends the RDMA message to the worker FPGA via the corresponding serial link, and the worker FPGA processes the RDMA message. If the root FPGA is the target, the root FPGA may process the RDMA message. Other embodiments are described and claimed.
    Type: Grant
    Filed: May 3, 2019
    Date of Patent: April 6, 2021
    Assignee: Intel Corporation
    Inventors: Paul H. Dormitzer, Susanne M. Balle, Sujoy Sen, Evan Custodio
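
A minimal sketch of the unified-address routing, with assumed region sizes: the root FPGA owns the lowest region and forwards anything above it over the matching serial link.

```python
# Each FPGA owns one slice of a unified address space; the root FPGA routes
# an RDMA message by the region its memory address falls into.

REGION_SIZE = 0x1000_0000                           # assumed slice size
FPGAS = ["root", "worker1", "worker2", "worker3"]   # region i -> FPGAS[i]

def route_rdma(message_addr):
    target = FPGAS[message_addr // REGION_SIZE]
    if target == "root":
        return "process on root FPGA"
    return f"forward over serial link to {target}"

print(route_rdma(0x0000_1000))   # root handles it locally
print(route_rdma(0x2000_0000))   # forwarded to worker2
```
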
  • Publication number: 20210092069
    Abstract: Examples described herein relate to a network interface and at least one processor that is to indicate whether data is associated with a machine learning operation or a non-machine learning operation, to manage traversal of the data through one or more network elements to a destination network element, and to cause the network interface to include an indication in a packet of whether the packet includes machine learning data or non-machine learning data. In some examples, the indication in a packet of whether the packet includes machine learning data or non-machine learning data comprises a priority level, wherein one or more higher priority levels identify machine learning data. In some examples, for machine learning data, the priority level is based on whether the data is associated with inference, training, or re-training operations. In some examples, for machine learning data, the priority level is based on whether the data is associated with real-time or time-insensitive inference operations.
    Type: Application
    Filed: December 10, 2020
    Publication date: March 25, 2021
    Inventors: Malek MUSLEH, Anupama KURPAD, Roberto PENARANDA CEBRIAN, Allister ALEMANIA, Pedro YEBENES SEGURA, Curt E. BRUNS, Robert SOUTHWORTH, Sujoy SEN
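
The classification-to-priority mapping might look like the sketch below; the specific levels are assumptions, with the abstract only fixing that higher levels identify ML data and that operation type and latency sensitivity feed the choice.

```python
# Assign a packet priority level from the traffic's ML classification.

def priority(is_ml, operation=None, real_time=False):
    if not is_ml:
        return 0                             # non-ML data: lowest level
    if operation == "inference":
        return 3 if real_time else 2         # real-time inference ranked higher
    return 1                                 # training / re-training

pkt = {"payload": b"...", "prio": priority(True, "inference", real_time=True)}
print(pkt["prio"])   # 3: network elements can favor this packet end to end
```
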
  • Publication number: 20210081312
    Abstract: Examples described herein include a network interface controller comprising a memory interface and a network interface, the network interface controller configurable to provide access to local memory and remote memory to a requester, wherein the network interface controller is configured with an amount of memory of different memory access speeds for allocation to one or more requesters. In some examples, the network interface controller is to grant or deny a memory allocation request from a requester based on a configuration of an amount of memory for different memory access speeds for allocation to the requester. In some examples, the network interface controller is to grant or deny a memory access request from a requester based on a configuration of memory allocated to the requester. In some examples, the network interface controller is to regulate quality of service of memory access requests from requesters.
    Type: Application
    Filed: November 24, 2020
    Publication date: March 18, 2021
    Inventors: Bassam N. COURY, Sujoy SEN, Thomas E. WILLIS, Durgesh SRIVASTAVA
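
A minimal sketch of the grant/deny bookkeeping, assuming per-requester byte budgets for two illustrative speed classes:

```python
# NIC-side accounting: allocation requests are checked against the amount of
# memory configured for each requester and speed class.

budgets   = {"vm0": {"near": 4096, "far": 65536}}   # bytes per speed class
allocated = {"vm0": {"near": 0, "far": 0}}

def request_allocation(requester, speed_class, size):
    used = allocated[requester][speed_class]
    cap = budgets[requester][speed_class]
    if used + size > cap:
        return False                    # deny: would exceed configured amount
    allocated[requester][speed_class] = used + size
    return True                         # grant and account the allocation

print(request_allocation("vm0", "near", 4096))   # True: exactly fits
print(request_allocation("vm0", "near", 1))      # False: budget exhausted
```
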
  • Publication number: 20210075633
    Abstract: Examples described herein relate to a network interface. In some examples, the network interface is to access data designated for transmission in at least one packet to multiple memory nodes by inclusion of a multicast identifier of a memory node group, and to transmit the at least one packet to a destination network device, wherein the multicast identifier of the memory node group in the at least one packet is to cause an intermediate network device to multicast the packet to multiple memory nodes. In some examples, a memory node comprises a memory pool that includes one or more of: volatile memory, non-volatile memory, or persistent memory. In some examples, the intermediate network device comprises a switch configured to determine network addresses of memory nodes associated with the multicast identifier of the memory node group.
    Type: Application
    Filed: November 24, 2020
    Publication date: March 11, 2021
    Inventors: Sujoy SEN, Thomas E. WILLIS, Durgesh SRIVASTAVA, Marcelo CINTRA, Bassam N. COURY
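
The multicast expansion at the intermediate switch can be sketched with an assumed group table and packet fields:

```python
# The sender tags a write with a memory-node-group identifier; the switch
# maps that identifier to member nodes and replicates the packet.

GROUPS = {0x10: ["mem-node-a", "mem-node-b", "mem-node-c"]}  # switch config

def switch_forward(packet):
    members = GROUPS.get(packet["mcast_id"], [])
    return [{**packet, "dst": node} for node in members]     # one copy each

write = {"mcast_id": 0x10, "payload": b"replicated line"}
for copy in switch_forward(write):
    print(copy["dst"])    # the same write lands on all three memory nodes
```
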
  • Publication number: 20210073161
    Abstract: Technologies for providing I/O channel abstraction for accelerator device kernels include an accelerator device comprising circuitry to obtain availability data indicative of an availability of one or more accelerator device kernels in a system, including one or more physical communication paths to each accelerator device kernel. The circuitry is also configured to determine whether to establish a logical communication path between a kernel of the present accelerator device and another accelerator device kernel and establish, in response to a determination to establish the logical communication path as a function of the obtained availability data, the logical communication path between the kernel of the present accelerator device and the other accelerator device kernel.
    Type: Application
    Filed: November 3, 2020
    Publication date: March 11, 2021
    Inventors: Susanne M. BALLE, Evan CUSTODIO, Francesc GUIM BERNAT, Sujoy SEN, Slawomir PUTYRSKI, Paul DORMITZER, Joseph GRECCO
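
A rough model of the establish-or-not decision driven by availability data; the data structures here are assumptions, not the filing's:

```python
# Availability data lists physical communication paths to each remote
# kernel; a logical path is established only when a physical route exists.

availability = {                     # kernel -> physical communication paths
    "kernelB": ["pcie-switch-0", "nic0/rdma"],
    "kernelC": [],
}

logical_paths = {}

def maybe_connect(local_kernel, remote_kernel):
    routes = availability.get(remote_kernel, [])
    if not routes:
        return False                 # no physical path: don't establish
    logical_paths[(local_kernel, remote_kernel)] = routes[0]
    return True

print(maybe_connect("kernelA", "kernelB"))   # True: logical path established
print(maybe_connect("kernelA", "kernelC"))   # False: nothing reachable
```
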
  • Publication number: 20210073151
    Abstract: Examples described herein include at least one processor and a direct memory access (DMA) device. In some examples, the DMA device is to access a command from a memory region allocated to receive commands for execution by the DMA device, wherein the command is to access content from a local memory device or a remote memory node. In some examples, the DMA device is to determine whether the content is stored in a local memory device or a remote memory node based on a configuration that indicates whether a source address refers to a memory address associated with the local memory device or the remote memory node and whether a destination address refers to a memory address associated with the local memory device or the remote memory node. In some examples, the DMA device is to copy the content from a local memory device or copy the content to the local memory device using a memory interface.
    Type: Application
    Filed: November 24, 2020
    Publication date: March 11, 2021
    Inventors: Sujoy SEN, Durgesh SRIVASTAVA, Thomas E. WILLIS, Bassam N. COURY, Marcelo CINTRA
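
Finally, the command flow here can be sketched as a software-posted command region drained by the device, with an assumed configured local address range deciding local versus remote copies:

```python
# Software posts commands to a memory region; the DMA device consumes them
# and classifies each copy by its configured address ranges.

LOCAL = range(0x0000_0000, 0x8000_0000)    # assumed local address range

command_queue = [
    {"src": 0x0000_1000, "dst": 0x0000_2000, "len": 64},   # local -> local
    {"src": 0x0000_1000, "dst": 0x9000_0000, "len": 64},   # local -> remote node
]

def execute(cmd):
    side = "local" if cmd["dst"] in LOCAL else "remote"
    return f"copy {cmd['len']}B from {hex(cmd['src'])} to {side} {hex(cmd['dst'])}"

for cmd in command_queue:                  # device drains the command region
    print(execute(cmd))
```
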