Patents by Inventor Rajesh Sankaran

Rajesh Sankaran has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20210004334
    Abstract: Embodiment of this disclosure provides a mechanism to extend a workload instruction to include both untranslated and translated address space identifiers (ASIDs). In one embodiment, a processing device comprising a translation manager is provided. The translation manager receives a workload instruction from a guest application. The workload instruction comprises an untranslated (ASID) and a workload for an input/output (I/O) device. The untranslated ASID is translated to a translated ASID. The translated ASID inserted into a payload of the workload instruction. Thereupon, the payload is provided to a work queue of the I/O device to execute the workload based in part on at least one of: the translated ASID or the untranslated ASID.
    Type: Application
    Filed: March 28, 2018
    Publication date: January 7, 2021
    Inventors: Kun TIAN, Xiao ZHENG, Ashok RAJ, Sanjay KUMAR, Rajesh SANKARAN
  • Publication number: 20210004328
    Abstract: Methods and apparatus implementing Hardware/Software co-optimization to improve performance and energy for inter-VM communication for NFVs and other producer-consumer workloads. The apparatus include multi-core processors with multi-level cache hierarchies including and L1 and L2 cache for each core and a shared last-level cache (LLC). One or more machine-level instructions are provided for proactively demoting cachelines from lower cache levels to higher cache levels, including demoting cachelines from L1/L2 caches to an LLC. Techniques are also provided for implementing hardware/software co-optimization in multi-socket NUMA architecture system, wherein cachelines may be selectively demoted and pushed to an LLC in a remote socket. In addition, techniques are disclosure for implementing early snooping in multi-socket systems to reduce latency when accessing cachelines on remote sockets.
    Type: Application
    Filed: September 21, 2020
    Publication date: January 7, 2021
    Inventors: Ren Wang, Andrew J. Herdrich, Yen-cheng Liu, Herbert H. Hum, Jong Soo Park, Christopher J. Hughes, Namakkal N. Venkatesan, Adrian C. Moga, Aamer Jaleel, Zeshan A. Chishti, Mesut A. Ergin, Jr-shian Tsai, Alexander W. Min, Tsung-yuan C. Tai, Christian Maciocco, Rajesh Sankaran
  • Publication number: 20210004338
    Abstract: Methods and apparatus for PASID-based routing extension for Scalable IOV systems. The system may include a Central Processing Unit (CPU) operatively coupled to a scalable Input/Output Virtualization (IOV) device via an in-line device such as a smart controller or accelerator. A Control Process Address Space Identifier (C-PASID) associated with a first memory space is implemented in an Assignable Device Interface (ADI) for the IOV device. The ADI also implements a Data PASID (D-PASID) associated with a second memory space in which data are stored. The C-PASID is used to fetch a descriptor in the first memory space and the D-PASID is employed to fetch data in the second memory space. A hub embedded on the in-line device or implemented as a discrete device is used to steer memory access requests and/or fetches to the CPU or to the in-line device using the C-PASID and D-PASID.
    Type: Application
    Filed: September 21, 2020
    Publication date: January 7, 2021
    Inventors: Pratik Marolia, Sanjay Kumar, Rajesh Sankaran, Utkarsh Y. Kakaiya
  • Publication number: 20200409844
    Abstract: Disclosed embodiments relate to an asynchronous cache-flush engine to manage platform coherent and memory-side caches. In one example, a system includes multiple interconnected sockets each including a cache flush engine (CFE), a core, and an associated cache hierarchy including a plurality of caches, one of the CFEs designated as a master CFE in a master socket, the master CFE to: receive a request specifying an opcode and a range, the opcode calling for a cache flush, execute the request to cause writeback and, if indicated by the request, invalidation of modified cache lines in the master socket falling within the range, and communicate a request to any other, slave sockets in the system each having a slave CFE to cause writeback and, if indicated by the request, invalidation of modified cache lines in the slave socket falling within the range.
    Type: Application
    Filed: June 26, 2019
    Publication date: December 31, 2020
    Inventors: Vivekananthan SANJEEPAN, Gideon GERZON, Ishwar AGARWAL, Rajesh SANKARAN, Andy RUDOFF
  • Publication number: 20200409585
    Abstract: A system for tracking memory access patterns to be used in making data placement and migration policies. The system includes a processing unit and a system memory. The system memory includes a local memory and a remote memory, each of which having mapped thereon, a plurality of memory pages. Each of the plurality of memory pages corresponds to one or more physical addresses. A set of attributes for each memory page is stored in a physical attribute table (PAT). The PAT is looked up and the attributes updated when a memory access is detected. Attributes stored in the PAT are used to control the movement of memory pages between the local memory and the remote memory. When the attributes in the PAT indicate a remote memory page is being accessed frequently by the processing unit, the remote memory page is moved from the remote memory to the local memory.
    Type: Application
    Filed: June 29, 2019
    Publication date: December 31, 2020
    Applicant: Intel Corporation
    Inventors: David Koufaty, Rajesh Sankaran, Rupin Vakharwala
  • Publication number: 20200409733
    Abstract: Graphics processing systems and methods are described. A graphics processing apparatus may comprise one or more graphics processing engines, a memory, a memory management unit (MMU) including a GPU second level page table and GPU dirty bit tracking, and a provisioning agent to receive a request from a virtual machine monitor (VMM) to provision a subcluster of graphics processing apparatuses, the subcluster including a plurality of graphics processing engines from a plurality of graphics processing apparatuses connected using a scale-up fabric, provision the scale-up fabric to route data within the subcluster of graphics processing apparatuses, and provision a plurality of resources on the graphics processing apparatus for the subcluster based on the request from the VMM.
    Type: Application
    Filed: June 28, 2019
    Publication date: December 31, 2020
    Inventors: Rajesh SANKARAN, Bret TOLL, William RASH, Subramaniam MAIYURAN, Gang CHEN, Varghese GEORGE
  • Publication number: 20200409705
    Abstract: Disclosed embodiments relate to systems and methods to skip inconsequential matrix operations. In one example, a processor includes decode circuitry to decode an instruction having fields to specify an opcode and locations of first source, second source, and destination matrices, the opcode indicating that the processor is to multiply each element at row M and column K of the first source matrix with a corresponding element at row K and column N of the second source matrix, and accumulate a resulting product with previous contents of a corresponding element at row M and column N of the destination matrix, the processor to skip multiplications that, based on detected values of corresponding multiplicands, would generate inconsequential results, scheduling circuitry to schedule execution of the instruction; and execution circuitry to execute the instructions as per the opcode.
    Type: Application
    Filed: June 26, 2019
    Publication date: December 31, 2020
    Applicant: Intel Corporation
    Inventors: Elmoustapha OULD-AHMED-VALL, William RASH, Subramaniam MAIYURAN, Varghese GEORGE, Rajesh SANKARAN
  • Publication number: 20200341921
    Abstract: Embodiments of processors, methods, and systems for virtualizing interrupt prioritization and delivery are disclosed. In one embodiment, a processor includes instruction hardware and execution hardware. The instruction hardware is to receive a plurality of instructions, including a first instruction to transfer the processor from a root mode to a non-root mode for executing guest software in a virtual machine, wherein the processor is to return to the root mode upon the detection of any of a plurality of virtual machine exit events. The execution hardware is to execute the first instruction, execution of the first instruction to include determining a first virtual processor-priority value and storing the first virtual processor-priority value in a virtual copy of a processor-priority field, where the virtual copy of the processor-priority field is a virtual resource corresponding to a physical resource associated with an interrupt controller.
    Type: Application
    Filed: May 14, 2020
    Publication date: October 29, 2020
    Applicant: Intel Corporation
    Inventors: Gilbert Neiger, Rajesh Sankaran, Gideon Gerzon, Richard Uhlig, Sergiu Ghetie, Michael Neve de Mevergnies, Adil Karrar
  • Patent number: 10817441
    Abstract: The present disclosure is directed to systems and methods sharing memory circuitry between processor memory circuitry and accelerator memory circuitry in each of a plurality of peer-to-peer connected accelerator units. Each of the accelerator units includes virtual-to-physical address translation circuitry and migration circuitry. The virtual-to-physical address translation circuitry in each accelerator unit includes pages for each of at least some of the plurality of accelerator units. The migration circuitry causes the transfer of data between the processor memory circuitry and the accelerator memory circuitry in each of the plurality of accelerator circuits. The migration circuitry migrates and evicts data to/from accelerator memory circuitry based on statistical information associated with accesses to at least one of: processor memory circuitry or accelerator memory circuitry in one or more peer accelerator circuits.
    Type: Grant
    Filed: March 29, 2019
    Date of Patent: October 27, 2020
    Assignee: INTEL CORPORATION
    Inventors: Sanjay Kumar, David Koufaty, Philip Lantz, Pratik Marolia, Rajesh Sankaran, Koen Koning
  • Patent number: 10817425
    Abstract: Methods and apparatus implementing Hardware/Software co-optimization to improve performance and energy for inter-VM communication for NFVs and other producer-consumer workloads. The apparatus include multi-core processors with multi-level cache hierarchies including and L1 and L2 cache for each core and a shared last-level cache (LLC). One or more machine-level instructions are provided for proactively demoting cachelines from lower cache levels to higher cache levels, including demoting cachelines from L1/L2 caches to an LLC. Techniques are also provided for implementing hardware/software co-optimization in multi-socket NUMA architecture system, wherein cachelines may be selectively demoted and pushed to an LLC in a remote socket. In addition, techniques are disclosure for implementing early snooping in multi-socket systems to reduce latency when accessing cachelines on remote sockets.
    Type: Grant
    Filed: December 26, 2014
    Date of Patent: October 27, 2020
    Assignee: Intel Corporation
    Inventors: Ren Wang, Andrew J. Herdrich, Yen-cheng Liu, Herbert H. Hum, Jong Soo Park, Christopher J. Hughes, Namakkal N. Venkatesan, Adrian C. Moga, Aamer Jaleel, Zeshan A. Chishti, Mesut A. Ergin, Jr-shian Tsai, Alexander W. Min, Tsung-yuan C. Tai, Christian Maciocco, Rajesh Sankaran
  • Publication number: 20200319913
    Abstract: In one embodiment, an apparatus includes an input/output virtualization (IOV) device comprising: at least one function circuit to be shared by a plurality of virtual machines (VMs); and a plurality of assignable device interfaces (ADIs) coupled to the at least one function circuit, wherein each of the plurality of ADIs is to be associated with one of the plurality of VMs and comprises a first process address space identifier (PASID) field to store a first PASID to identify a descriptor queue stored in a host address space and a second PASID field to store a second PASID to identify a data buffer located in a VM address space. Other embodiments are described and claimed.
    Type: Application
    Filed: June 23, 2020
    Publication date: October 8, 2020
    Inventors: SANJAY K. KUMAR, RAJESH SANKARAN, UTKARSH Y. KAKAIYA, PRATIK M. MAROLIA
  • Publication number: 20200310993
    Abstract: The present disclosure is directed to systems and methods sharing memory circuitry between processor memory circuitry and accelerator memory circuitry in each of a plurality of peer-to-peer connected accelerator units. Each of the accelerator units includes physical-to-virtual address translation circuitry and migration circuitry. The physical-to-virtual address translation circuitry in each accelerator unit includes pages for each of at least some of the plurality of accelerator units. The migration circuitry causes the transfer of data between the processor memory circuitry and the accelerator memory circuitry in each of the plurality of accelerator circuits. The migration circuitry migrates and evicts data to/from accelerator memory circuitry based on statistical information associated with accesses to at least one of: processor memory circuitry or accelerator memory circuitry in one or more peer accelerator circuits.
    Type: Application
    Filed: March 29, 2019
    Publication date: October 1, 2020
    Applicant: INTEL CORPORATION
    Inventors: Sanjay Kumar, David Koufaty, Philip Lantz, Pratik Marolia, Rajesh Sankaran, Koen Koning
  • Publication number: 20200293365
    Abstract: Methods and apparatus relating to transactional page fault handling. In an example, an apparatus comprises a processor to divide an execution thread of a graphics workload into a set of transactions which are to be executed atomically, initiate the execution of the thread, and manage the execution of the thread according to one of a first protocol in response to a determination that a page fault occurred in the execution of a transaction, or a second protocol in response to a determination that a page fault did not occur in the execution of a transaction. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: March 15, 2019
    Publication date: September 17, 2020
    Applicant: Intel Corporation
    Inventors: NIRAN COORAY, WILLIAM B. SADLER, JONATHAN D. PEARCE, MARIAN ALIN PETRE, BEN ASHBAUGH, MURALI RAMADOSS, VIKRANTH VEMULAPALLI, ANKUR N SHAH, RAJESH SANKARAN
  • Patent number: 10761996
    Abstract: Examples include an apparatus which accesses secure pages in a trust domain using secure lookups in first and second sets of page tables. For example, one embodiment of the processor comprises: a decoder to decode a plurality of instructions including instructions related to a trusted domain; execution circuitry to execute a first one or more of the instructions to establish a first trusted domain using a first trusted domain key, the trusted domain key to be used to encrypt memory pages within the first trusted domain; and the execution circuitry to execute a second one or more of the instructions to associate a first process address space identifier (PASID) with the first trusted domain, the first PASID to uniquely identify a first execution context associated with the first trusted domain.
    Type: Grant
    Filed: September 28, 2018
    Date of Patent: September 1, 2020
    Assignee: Intel Corporation
    Inventors: Vedvyas Shanbhogue, Ravi Sahita, Rajesh Sankaran, Siddhartha Chhabra, Abhishek Basak, Krystof Zmudzinski, Rupin Vakharwala
  • Patent number: 10579382
    Abstract: An apparatus and method for scalable interrupt reporting.
    Type: Grant
    Filed: January 24, 2018
    Date of Patent: March 3, 2020
    Assignee: Intel Corporation
    Inventors: Rajesh Sankaran, Ankur Shah, Bryan White, Hema Nalluri, David Puffer, Murali Ramadoss, Altug Koker, Aditya Navale, Balaji Vembu
  • Publication number: 20200019515
    Abstract: Embodiments are directed to providing a secure address translation service. An embodiment of a system includes DRAM for storage of data, an IOMMU coupled to the DRAM, and a host-to-device link to couple the IOMMU with one or more devices and to operate as a translation agent on behalf of one or more devices in connection with memory operations relating to the DRAM, including receiving a translated request from a discrete device via the host-to-device link specifying a memory operation and a physical address within the DRAM pertaining to the memory operation, determining page access permissions assigned to a context of the discrete device for a physical page of the DRAM within which the physical address resides, allowing the memory operation to proceed when the page access permissions permit the memory operation, and blocking the memory operation when the page access permissions do not permit the memory operation.
    Type: Application
    Filed: September 25, 2019
    Publication date: January 16, 2020
    Applicant: Intel Corporation
    Inventors: David Koufaty, Rajesh Sankaran, Anna Trikalinou, Rupin Vakharwala
  • Publication number: 20200021540
    Abstract: In one embodiment, an input/output port includes a stateful transmit port having: a history storage to store a value corresponding to a transmit on change field of a prior data packet; a comparator to compare a transmit on change field of the data packet to the value stored in the history storage; and a selection circuit to output the data packet without the transmit on change field when the transmit on change field of the data packet matches the value. Other embodiments are described and claimed.
    Type: Application
    Filed: September 25, 2019
    Publication date: January 16, 2020
    Inventors: Pratik Marolia, Rajesh Sankaran, Ishwar Agarwal, Nitish Paliwal
  • Publication number: 20200012530
    Abstract: Techniques for scalable virtualization of an Input/Output (I/O) device are described. An electronic device composes a virtual device comprising one or more assignable interface (AI) instances of a plurality of AI instances of a hosting function exposed by the I/O device. The electronic device emulates device resources of the I/O device via the virtual device. The electronic device intercepts a request from the guest pertaining to the virtual device, and determines whether the request from the guest is a fast-path operation to be passed directly to one of the one or more AI instances of the I/O device or a slow-path operation that is to be at least partially serviced via software executed by the electronic device. For a slow-path operation, the electronic device services the request at least partially via the software executed by the electronic device.
    Type: Application
    Filed: March 12, 2019
    Publication date: January 9, 2020
    Inventors: Utkarsh Y. KAKAIYA, Rajesh SANKARAN, Sanjay KUMAR, Kun TIAN, Philip LANTZ
  • Publication number: 20190372914
    Abstract: Technologies for network interface controllers (NICs) include a compute sled and an accelerator sled in communication over a network. The accelerator sled configures a virtual switch endpoint associated with a remote direct memory access (RDMA) server instance that is associated with a field-programmable gate array (FPGA) of the accelerator sled. The accelerator sled updates local software defined networking (SDN) tables with a virtual tunnel associated with the virtual switch endpoint and a remote compute sled. A virtual switch of the accelerator sled switches virtual tunnel traffic from the remote compute sled to the RDMA server instance, which transfers data to or from the FPGA. The compute sled also updates a local SDN table with the virtual tunnel, and a virtual switch of the compute sled switches virtual tunnel traffic to or from the accelerator sled. Other embodiments are described and claimed.
    Type: Application
    Filed: August 14, 2019
    Publication date: December 5, 2019
    Inventors: Mrittika Ganguli, Sugesh Chandran, Parthasarathy Sarangam, Sujoy Sen, Susanne M. Balle, Rajesh Sankaran
  • Patent number: 10437616
    Abstract: Aspects of the embodiments are directed to systems and methods performed by a virtual shared work queue (VSWQ). The VSWQ can receive an enqueue command (ENQCMD/S) destined for a shared work queue of a peripheral device. The VSWQ can determine a value of a credit counter for the shared work queue, wherein a credit of the credit counter represents an availability of the shared work queue to accept the enqueue command. The VSWQ can respond to the enqueue command based on the value of the credit counter.
    Type: Grant
    Filed: December 31, 2016
    Date of Patent: October 8, 2019
    Assignee: Intel Corporation
    Inventors: Ishwar Agarwal, Rajesh Sankaran, Stephen Van Doren