Patents by Inventor Rajesh Sankaran

Rajesh Sankaran has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11201838
    Abstract: In one embodiment, an input/output port includes a stateful transmit port having: a history storage to store a value corresponding to a transmit on change field of a prior data packet; a comparator to compare a transmit on change field of the data packet to the value stored in the history storage; and a selection circuit to output the data packet without the transmit on change field when the transmit on change field of the data packet matches the value. Other embodiments are described and claimed.
    Type: Grant
    Filed: September 25, 2019
    Date of Patent: December 14, 2021
    Assignee: Intel Corporation
    Inventors: Pratik Marolia, Rajesh Sankaran, Ishwar Agarwal, Nitish Paliwal
  • Patent number: 11169929
    Abstract: A processing device includes a core to execute instructions, and memory management circuitry coupled to, memory, the core and an I/O device that supports page faults. The memory management circuitry includes an express invalidations circuitry, and a page translation permission circuitry. The memory management circuitry is to, while the core is executing the instructions, receive a command to pause communication between the I/O device and the memory. In response to receiving the command to pause the communication, modify permissions of page translations by the page translation permission circuitry and transmit an invalidation request, by the express invalidations circuitry to the I/O device, to cause cached page translations in the I/O device to be invalidated.
    Type: Grant
    Filed: April 20, 2018
    Date of Patent: November 9, 2021
    Assignee: INTEL CORPORATION
    Inventors: Rupin Vakharwala, Amin Firoozshahian, Stephen Van Doren, Rajesh Sankaran, Mahesh Madhav, Omid Azizi, Andreas Kleen, Mahesh Maddury, Ashok Raj
  • Publication number: 20210342182
    Abstract: In one embodiment, a data mover accelerator is to receive, from a first agent having a first address space and a first process address space identifier (PASID) to identify the first address space, a first job descriptor comprising a second PASID selector to specify a second PASID to identify a second address space. In response to the first job descriptor, the data mover accelerator is to securely access the first address space and the second address space. Other embodiments are described and claimed.
    Type: Application
    Filed: June 23, 2020
    Publication date: November 4, 2021
    Inventors: SANJAY K. KUMAR, PHILIP LANTZ, RAJESH SANKARAN, NARAYAN RANGANATHAN, SAURABH GAYEN, DAVID A. KOUFATY, UTKARSH Y. KAKAIYA
  • Patent number: 11080088
    Abstract: A processor includes a processor core, a processor cache to store reporting data structures including a queue structure, and an interrupt posting circuit coupled to the processor core and the processing cache. The interrupt posting circuit receives an interrupt request directed to a virtual processor (VP) of a virtual machine (VM) executed by the processor core. The VM is managed by a virtual machine monitor (VMM) executed by the processor core. The interrupt posting circuit determines the VP is in an inactive state and records the interrupt request in a first posted data structure allocated by the VMM for the VP in main memory coupled to the processor. The interrupt posting circuit updates location information stored in the reporting data structures based on recording the interrupt request in the first posted data structure to generate updated location information that identifies a location of the interrupt request.
    Type: Grant
    Filed: December 19, 2018
    Date of Patent: August 3, 2021
    Assignee: Intel Corporation
    Inventors: Arumugam Thiyagarajah, Rajesh Sankaran, Dharmendra Thakkar
  • Patent number: 11055147
    Abstract: Techniques for scalable virtualization of an Input/Output (I/O) device are described. An electronic device composes a virtual device comprising one or more assignable interface (AI) instances of a plurality of AI instances of a hosting function exposed by the I/O device. The electronic device emulates device resources of the I/O device via the virtual device. The electronic device intercepts a request from the guest pertaining to the virtual device, and determines whether the request from the guest is a fast-path operation to be passed directly to one of the one or more AI instances of the I/O device or a slow-path operation that is to be at least partially serviced via software executed by the electronic device. For a slow-path operation, the electronic device services the request at least partially via the software executed by the electronic device.
    Type: Grant
    Filed: March 12, 2019
    Date of Patent: July 6, 2021
    Assignee: Intel Corporation
    Inventors: Utkarsh Y. Kakaiya, Rajesh Sankaran, Sanjay Kumar, Kun Tian, Philip Lantz
  • Publication number: 20210200545
    Abstract: An apparatus and method for hybrid software-hardware coherency.
    Type: Application
    Filed: December 27, 2019
    Publication date: July 1, 2021
    Inventors: Pratik Marolia, Rajesh Sankaran
  • Patent number: 11016894
    Abstract: Techniques and apparatus to manage cache coherency for different types of cache memory are described. In one embodiment, an apparatus may include at least one processor, at least one cache memory, and logic, at least a portion comprised in hardware, the logic to receive a memory operation request associated with the at least one cache memory, determine a cache status of the memory operation request, the cache status indicating one of a giant cache status or a small cache status, perform the memory operation request via a small cache coherence process responsive to the cache status being a small cache status, and perform the memory operation request via a giant cache coherence process responsive to the cache status being a small cache status. Other embodiments are described and claimed.
    Type: Grant
    Filed: August 7, 2017
    Date of Patent: May 25, 2021
    Assignee: INTEL CORPORATION
    Inventors: Rajesh Sankaran, Ishwar Agarwal, Stephen Van Doren
  • Publication number: 20210089466
    Abstract: Examples include an apparatus which accesses secure pages in a trust domain using secure lookups in first and second sets of page tables. For example, one embodiment of the processor comprises: a decoder to decode a plurality of instructions including instructions related to a trusted domain; execution circuitry to execute a first one or more of the instructions to establish a first trusted domain using a first trusted domain key, the trusted domain key to be used to encrypt memory pages within the first trusted domain; and the execution circuitry to execute a second one or more of the instructions to associate a first process address space identifier (PASID) with the first trusted domain, the first PASID to uniquely identify a first execution context associated with the first trusted domain.
    Type: Application
    Filed: August 5, 2020
    Publication date: March 25, 2021
    Inventors: Vedvyas SHANBHOGUE, Ravi SAHITA, Rajesh SANKARAN, Siddhartha CHHABRA, Abhishek BASAK, Krystof ZMUDZINSKI, Rupin VAKHARWALA
  • Publication number: 20210089316
    Abstract: Disclosed embodiments relate to deep learning implementations using systolic arrays and fused operations. In one example, a processor includes fetch and decode circuitry to fetch and decode an instruction having fields to specify an opcode and locations of a destination and N source matrices, the opcode indicating the processor is to load the N source matrices from memory, perform N convolutions on the N source matrices to generate N feature maps, and store results of the N convolutions in registers to be passed to an activation layer, wherein the processor is to perform the N convolutions and the activation layer with at most one memory load of each of the N source matrices. The processor further includes scheduling circuitry to schedule execution of the instruction and execution circuitry to execute the instruction as per the opcode.
    Type: Application
    Filed: September 25, 2019
    Publication date: March 25, 2021
    Applicant: Intel Corporation
    Inventors: William RASH, Subramaniam MAIYURAN, Varghese GEORGE, Bret L. TOLL, Rajesh SANKARAN, Robert S. CHAPPELL, Supratim PAL, Alexander F. HEINECKE, Elmoustapha OULD-AHMED-VALL, Gang CHEN
  • Publication number: 20210064525
    Abstract: A processor includes a hardware input/output (I/O) memory management unit (IOMMU) and a core, which executes an instruction to intercept a payload from a virtual machine (VM). The payload contains a guest bus device function (BDF) identifier, a guest address space identifier (ASID), and a guest address range. The core accesses, within a virtual machine control structure stored in memory, pointers to a first set of translation tables and a second set of translation tables. The core traverses the first set of translation tables to translate the guest BDF identifier to a host BDF identifier and traverses the second set of translation tables to translate the guest ASID to a host ASID. The core stores the host BDF identifier and the host ASID in the payload and submits, to the hardware IOMMU, an administrative command containing the payload to perform invalidation of the guest address range.
    Type: Application
    Filed: January 2, 2018
    Publication date: March 4, 2021
    Applicant: Intel Corporation
    Inventors: Kun Tian, Rajesh Sankaran, Sanjay Kumar, Ashok Raj
  • Publication number: 20210042254
    Abstract: Methods and apparatus for an accelerator controller hub (ACH). The ACH may be a stand-alone component or integrated on-die or on package in an accelerator such as a GPU. The ACH may include a host device link (HDL) interface, one or more Peripheral Component Interconnect Express (PCIe) interfaces, one or more high performance accelerator link (HPAL) interfaces, and a router, operatively coupled to each of the HDL interface, the one or more PCIe interfaces, and the one or more HPAL interfaces. The HDL interface is configured to be coupled to a host CPU via an HDL link and the one or more HPAL interfaces are configured to be coupled to one or more HPALs that are used to access high performance accelerator fabrics (HPAFs) such as NVlink fabrics and CCIX (Cache Coherent Interconnect for Accelerators) fabrics. Platforms including ACHs or accelerators with integrated ACHs support RDMA transfers using RDMA semantics to enable transfers between accelerator memory on initiators and targets without CPU involvement.
    Type: Application
    Filed: October 28, 2020
    Publication date: February 11, 2021
    Inventors: Pratik Marolia, Andrew Herdrich, Rajesh Sankaran, Rahul Pal, David Puffer, Sayantan Sur, Ajaya Durg
  • Publication number: 20210004328
    Abstract: Methods and apparatus implementing Hardware/Software co-optimization to improve performance and energy for inter-VM communication for NFVs and other producer-consumer workloads. The apparatus include multi-core processors with multi-level cache hierarchies including and L1 and L2 cache for each core and a shared last-level cache (LLC). One or more machine-level instructions are provided for proactively demoting cachelines from lower cache levels to higher cache levels, including demoting cachelines from L1/L2 caches to an LLC. Techniques are also provided for implementing hardware/software co-optimization in multi-socket NUMA architecture system, wherein cachelines may be selectively demoted and pushed to an LLC in a remote socket. In addition, techniques are disclosure for implementing early snooping in multi-socket systems to reduce latency when accessing cachelines on remote sockets.
    Type: Application
    Filed: September 21, 2020
    Publication date: January 7, 2021
    Inventors: Ren Wang, Andrew J. Herdrich, Yen-cheng Liu, Herbert H. Hum, Jong Soo Park, Christopher J. Hughes, Namakkal N. Venkatesan, Adrian C. Moga, Aamer Jaleel, Zeshan A. Chishti, Mesut A. Ergin, Jr-shian Tsai, Alexander W. Min, Tsung-yuan C. Tai, Christian Maciocco, Rajesh Sankaran
  • Publication number: 20210004338
    Abstract: Methods and apparatus for PASID-based routing extension for Scalable IOV systems. The system may include a Central Processing Unit (CPU) operatively coupled to a scalable Input/Output Virtualization (IOV) device via an in-line device such as a smart controller or accelerator. A Control Process Address Space Identifier (C-PASID) associated with a first memory space is implemented in an Assignable Device Interface (ADI) for the IOV device. The ADI also implements a Data PASID (D-PASID) associated with a second memory space in which data are stored. The C-PASID is used to fetch a descriptor in the first memory space and the D-PASID is employed to fetch data in the second memory space. A hub embedded on the in-line device or implemented as a discrete device is used to steer memory access requests and/or fetches to the CPU or to the in-line device using the C-PASID and D-PASID.
    Type: Application
    Filed: September 21, 2020
    Publication date: January 7, 2021
    Inventors: Pratik Marolia, Sanjay Kumar, Rajesh Sankaran, Utkarsh Y. Kakaiya
  • Publication number: 20210004334
    Abstract: Embodiment of this disclosure provides a mechanism to extend a workload instruction to include both untranslated and translated address space identifiers (ASIDs). In one embodiment, a processing device comprising a translation manager is provided. The translation manager receives a workload instruction from a guest application. The workload instruction comprises an untranslated (ASID) and a workload for an input/output (I/O) device. The untranslated ASID is translated to a translated ASID. The translated ASID inserted into a payload of the workload instruction. Thereupon, the payload is provided to a work queue of the I/O device to execute the workload based in part on at least one of: the translated ASID or the untranslated ASID.
    Type: Application
    Filed: March 28, 2018
    Publication date: January 7, 2021
    Inventors: Kun TIAN, Xiao ZHENG, Ashok RAJ, Sanjay KUMAR, Rajesh SANKARAN
  • Publication number: 20200409705
    Abstract: Disclosed embodiments relate to systems and methods to skip inconsequential matrix operations. In one example, a processor includes decode circuitry to decode an instruction having fields to specify an opcode and locations of first source, second source, and destination matrices, the opcode indicating that the processor is to multiply each element at row M and column K of the first source matrix with a corresponding element at row K and column N of the second source matrix, and accumulate a resulting product with previous contents of a corresponding element at row M and column N of the destination matrix, the processor to skip multiplications that, based on detected values of corresponding multiplicands, would generate inconsequential results, scheduling circuitry to schedule execution of the instruction; and execution circuitry to execute the instructions as per the opcode.
    Type: Application
    Filed: June 26, 2019
    Publication date: December 31, 2020
    Applicant: Intel Corporation
    Inventors: Elmoustapha OULD-AHMED-VALL, William RASH, Subramaniam MAIYURAN, Varghese GEORGE, Rajesh SANKARAN
  • Publication number: 20200409733
    Abstract: Graphics processing systems and methods are described. A graphics processing apparatus may comprise one or more graphics processing engines, a memory, a memory management unit (MMU) including a GPU second level page table and GPU dirty bit tracking, and a provisioning agent to receive a request from a virtual machine monitor (VMM) to provision a subcluster of graphics processing apparatuses, the subcluster including a plurality of graphics processing engines from a plurality of graphics processing apparatuses connected using a scale-up fabric, provision the scale-up fabric to route data within the subcluster of graphics processing apparatuses, and provision a plurality of resources on the graphics processing apparatus for the subcluster based on the request from the VMM.
    Type: Application
    Filed: June 28, 2019
    Publication date: December 31, 2020
    Inventors: Rajesh SANKARAN, Bret TOLL, William RASH, Subramaniam MAIYURAN, Gang CHEN, Varghese GEORGE
  • Publication number: 20200409844
    Abstract: Disclosed embodiments relate to an asynchronous cache-flush engine to manage platform coherent and memory-side caches. In one example, a system includes multiple interconnected sockets each including a cache flush engine (CFE), a core, and an associated cache hierarchy including a plurality of caches, one of the CFEs designated as a master CFE in a master socket, the master CFE to: receive a request specifying an opcode and a range, the opcode calling for a cache flush, execute the request to cause writeback and, if indicated by the request, invalidation of modified cache lines in the master socket falling within the range, and communicate a request to any other, slave sockets in the system each having a slave CFE to cause writeback and, if indicated by the request, invalidation of modified cache lines in the slave socket falling within the range.
    Type: Application
    Filed: June 26, 2019
    Publication date: December 31, 2020
    Inventors: Vivekananthan SANJEEPAN, Gideon GERZON, Ishwar AGARWAL, Rajesh SANKARAN, Andy RUDOFF
  • Publication number: 20200409585
    Abstract: A system for tracking memory access patterns to be used in making data placement and migration policies. The system includes a processing unit and a system memory. The system memory includes a local memory and a remote memory, each of which having mapped thereon, a plurality of memory pages. Each of the plurality of memory pages corresponds to one or more physical addresses. A set of attributes for each memory page is stored in a physical attribute table (PAT). The PAT is looked up and the attributes updated when a memory access is detected. Attributes stored in the PAT are used to control the movement of memory pages between the local memory and the remote memory. When the attributes in the PAT indicate a remote memory page is being accessed frequently by the processing unit, the remote memory page is moved from the remote memory to the local memory.
    Type: Application
    Filed: June 29, 2019
    Publication date: December 31, 2020
    Applicant: Intel Corporation
    Inventors: David Koufaty, Rajesh Sankaran, Rupin Vakharwala
  • Publication number: 20200341921
    Abstract: Embodiments of processors, methods, and systems for virtualizing interrupt prioritization and delivery are disclosed. In one embodiment, a processor includes instruction hardware and execution hardware. The instruction hardware is to receive a plurality of instructions, including a first instruction to transfer the processor from a root mode to a non-root mode for executing guest software in a virtual machine, wherein the processor is to return to the root mode upon the detection of any of a plurality of virtual machine exit events. The execution hardware is to execute the first instruction, execution of the first instruction to include determining a first virtual processor-priority value and storing the first virtual processor-priority value in a virtual copy of a processor-priority field, where the virtual copy of the processor-priority field is a virtual resource corresponding to a physical resource associated with an interrupt controller.
    Type: Application
    Filed: May 14, 2020
    Publication date: October 29, 2020
    Applicant: Intel Corporation
    Inventors: Gilbert Neiger, Rajesh Sankaran, Gideon Gerzon, Richard Uhlig, Sergiu Ghetie, Michael Neve de Mevergnies, Adil Karrar
  • Patent number: 10817441
    Abstract: The present disclosure is directed to systems and methods sharing memory circuitry between processor memory circuitry and accelerator memory circuitry in each of a plurality of peer-to-peer connected accelerator units. Each of the accelerator units includes virtual-to-physical address translation circuitry and migration circuitry. The virtual-to-physical address translation circuitry in each accelerator unit includes pages for each of at least some of the plurality of accelerator units. The migration circuitry causes the transfer of data between the processor memory circuitry and the accelerator memory circuitry in each of the plurality of accelerator circuits. The migration circuitry migrates and evicts data to/from accelerator memory circuitry based on statistical information associated with accesses to at least one of: processor memory circuitry or accelerator memory circuitry in one or more peer accelerator circuits.
    Type: Grant
    Filed: March 29, 2019
    Date of Patent: October 27, 2020
    Assignee: INTEL CORPORATION
    Inventors: Sanjay Kumar, David Koufaty, Philip Lantz, Pratik Marolia, Rajesh Sankaran, Koen Koning