Patents by Inventor Rajesh Sankaran
Rajesh Sankaran has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11016894Abstract: Techniques and apparatus to manage cache coherency for different types of cache memory are described. In one embodiment, an apparatus may include at least one processor, at least one cache memory, and logic, at least a portion comprised in hardware, the logic to receive a memory operation request associated with the at least one cache memory, determine a cache status of the memory operation request, the cache status indicating one of a giant cache status or a small cache status, perform the memory operation request via a small cache coherence process responsive to the cache status being a small cache status, and perform the memory operation request via a giant cache coherence process responsive to the cache status being a small cache status. Other embodiments are described and claimed.Type: GrantFiled: August 7, 2017Date of Patent: May 25, 2021Assignee: INTEL CORPORATIONInventors: Rajesh Sankaran, Ishwar Agarwal, Stephen Van Doren
-
Publication number: 20210089466Abstract: Examples include an apparatus which accesses secure pages in a trust domain using secure lookups in first and second sets of page tables. For example, one embodiment of the processor comprises: a decoder to decode a plurality of instructions including instructions related to a trusted domain; execution circuitry to execute a first one or more of the instructions to establish a first trusted domain using a first trusted domain key, the trusted domain key to be used to encrypt memory pages within the first trusted domain; and the execution circuitry to execute a second one or more of the instructions to associate a first process address space identifier (PASID) with the first trusted domain, the first PASID to uniquely identify a first execution context associated with the first trusted domain.Type: ApplicationFiled: August 5, 2020Publication date: March 25, 2021Inventors: Vedvyas SHANBHOGUE, Ravi SAHITA, Rajesh SANKARAN, Siddhartha CHHABRA, Abhishek BASAK, Krystof ZMUDZINSKI, Rupin VAKHARWALA
-
Publication number: 20210089316Abstract: Disclosed embodiments relate to deep learning implementations using systolic arrays and fused operations. In one example, a processor includes fetch and decode circuitry to fetch and decode an instruction having fields to specify an opcode and locations of a destination and N source matrices, the opcode indicating the processor is to load the N source matrices from memory, perform N convolutions on the N source matrices to generate N feature maps, and store results of the N convolutions in registers to be passed to an activation layer, wherein the processor is to perform the N convolutions and the activation layer with at most one memory load of each of the N source matrices. The processor further includes scheduling circuitry to schedule execution of the instruction and execution circuitry to execute the instruction as per the opcode.Type: ApplicationFiled: September 25, 2019Publication date: March 25, 2021Applicant: Intel CorporationInventors: William RASH, Subramaniam MAIYURAN, Varghese GEORGE, Bret L. TOLL, Rajesh SANKARAN, Robert S. CHAPPELL, Supratim PAL, Alexander F. HEINECKE, Elmoustapha OULD-AHMED-VALL, Gang CHEN
-
Publication number: 20210064525Abstract: A processor includes a hardware input/output (I/O) memory management unit (IOMMU) and a core, which executes an instruction to intercept a payload from a virtual machine (VM). The payload contains a guest bus device function (BDF) identifier, a guest address space identifier (ASID), and a guest address range. The core accesses, within a virtual machine control structure stored in memory, pointers to a first set of translation tables and a second set of translation tables. The core traverses the first set of translation tables to translate the guest BDF identifier to a host BDF identifier and traverses the second set of translation tables to translate the guest ASID to a host ASID. The core stores the host BDF identifier and the host ASID in the payload and submits, to the hardware IOMMU, an administrative command containing the payload to perform invalidation of the guest address range.Type: ApplicationFiled: January 2, 2018Publication date: March 4, 2021Applicant: Intel CorporationInventors: Kun Tian, Rajesh Sankaran, Sanjay Kumar, Ashok Raj
-
Publication number: 20210042254Abstract: Methods and apparatus for an accelerator controller hub (ACH). The ACH may be a stand-alone component or integrated on-die or on package in an accelerator such as a GPU. The ACH may include a host device link (HDL) interface, one or more Peripheral Component Interconnect Express (PCIe) interfaces, one or more high performance accelerator link (HPAL) interfaces, and a router, operatively coupled to each of the HDL interface, the one or more PCIe interfaces, and the one or more HPAL interfaces. The HDL interface is configured to be coupled to a host CPU via an HDL link and the one or more HPAL interfaces are configured to be coupled to one or more HPALs that are used to access high performance accelerator fabrics (HPAFs) such as NVlink fabrics and CCIX (Cache Coherent Interconnect for Accelerators) fabrics. Platforms including ACHs or accelerators with integrated ACHs support RDMA transfers using RDMA semantics to enable transfers between accelerator memory on initiators and targets without CPU involvement.Type: ApplicationFiled: October 28, 2020Publication date: February 11, 2021Inventors: Pratik Marolia, Andrew Herdrich, Rajesh Sankaran, Rahul Pal, David Puffer, Sayantan Sur, Ajaya Durg
-
Publication number: 20210004338Abstract: Methods and apparatus for PASID-based routing extension for Scalable IOV systems. The system may include a Central Processing Unit (CPU) operatively coupled to a scalable Input/Output Virtualization (IOV) device via an in-line device such as a smart controller or accelerator. A Control Process Address Space Identifier (C-PASID) associated with a first memory space is implemented in an Assignable Device Interface (ADI) for the IOV device. The ADI also implements a Data PASID (D-PASID) associated with a second memory space in which data are stored. The C-PASID is used to fetch a descriptor in the first memory space and the D-PASID is employed to fetch data in the second memory space. A hub embedded on the in-line device or implemented as a discrete device is used to steer memory access requests and/or fetches to the CPU or to the in-line device using the C-PASID and D-PASID.Type: ApplicationFiled: September 21, 2020Publication date: January 7, 2021Inventors: Pratik Marolia, Sanjay Kumar, Rajesh Sankaran, Utkarsh Y. Kakaiya
-
Publication number: 20210004328Abstract: Methods and apparatus implementing Hardware/Software co-optimization to improve performance and energy for inter-VM communication for NFVs and other producer-consumer workloads. The apparatus include multi-core processors with multi-level cache hierarchies including and L1 and L2 cache for each core and a shared last-level cache (LLC). One or more machine-level instructions are provided for proactively demoting cachelines from lower cache levels to higher cache levels, including demoting cachelines from L1/L2 caches to an LLC. Techniques are also provided for implementing hardware/software co-optimization in multi-socket NUMA architecture system, wherein cachelines may be selectively demoted and pushed to an LLC in a remote socket. In addition, techniques are disclosure for implementing early snooping in multi-socket systems to reduce latency when accessing cachelines on remote sockets.Type: ApplicationFiled: September 21, 2020Publication date: January 7, 2021Inventors: Ren Wang, Andrew J. Herdrich, Yen-cheng Liu, Herbert H. Hum, Jong Soo Park, Christopher J. Hughes, Namakkal N. Venkatesan, Adrian C. Moga, Aamer Jaleel, Zeshan A. Chishti, Mesut A. Ergin, Jr-shian Tsai, Alexander W. Min, Tsung-yuan C. Tai, Christian Maciocco, Rajesh Sankaran
-
Publication number: 20210004334Abstract: Embodiment of this disclosure provides a mechanism to extend a workload instruction to include both untranslated and translated address space identifiers (ASIDs). In one embodiment, a processing device comprising a translation manager is provided. The translation manager receives a workload instruction from a guest application. The workload instruction comprises an untranslated (ASID) and a workload for an input/output (I/O) device. The untranslated ASID is translated to a translated ASID. The translated ASID inserted into a payload of the workload instruction. Thereupon, the payload is provided to a work queue of the I/O device to execute the workload based in part on at least one of: the translated ASID or the untranslated ASID.Type: ApplicationFiled: March 28, 2018Publication date: January 7, 2021Inventors: Kun TIAN, Xiao ZHENG, Ashok RAJ, Sanjay KUMAR, Rajesh SANKARAN
-
Publication number: 20200409585Abstract: A system for tracking memory access patterns to be used in making data placement and migration policies. The system includes a processing unit and a system memory. The system memory includes a local memory and a remote memory, each of which having mapped thereon, a plurality of memory pages. Each of the plurality of memory pages corresponds to one or more physical addresses. A set of attributes for each memory page is stored in a physical attribute table (PAT). The PAT is looked up and the attributes updated when a memory access is detected. Attributes stored in the PAT are used to control the movement of memory pages between the local memory and the remote memory. When the attributes in the PAT indicate a remote memory page is being accessed frequently by the processing unit, the remote memory page is moved from the remote memory to the local memory.Type: ApplicationFiled: June 29, 2019Publication date: December 31, 2020Applicant: Intel CorporationInventors: David Koufaty, Rajesh Sankaran, Rupin Vakharwala
-
Publication number: 20200409733Abstract: Graphics processing systems and methods are described. A graphics processing apparatus may comprise one or more graphics processing engines, a memory, a memory management unit (MMU) including a GPU second level page table and GPU dirty bit tracking, and a provisioning agent to receive a request from a virtual machine monitor (VMM) to provision a subcluster of graphics processing apparatuses, the subcluster including a plurality of graphics processing engines from a plurality of graphics processing apparatuses connected using a scale-up fabric, provision the scale-up fabric to route data within the subcluster of graphics processing apparatuses, and provision a plurality of resources on the graphics processing apparatus for the subcluster based on the request from the VMM.Type: ApplicationFiled: June 28, 2019Publication date: December 31, 2020Inventors: Rajesh SANKARAN, Bret TOLL, William RASH, Subramaniam MAIYURAN, Gang CHEN, Varghese GEORGE
-
Publication number: 20200409844Abstract: Disclosed embodiments relate to an asynchronous cache-flush engine to manage platform coherent and memory-side caches. In one example, a system includes multiple interconnected sockets each including a cache flush engine (CFE), a core, and an associated cache hierarchy including a plurality of caches, one of the CFEs designated as a master CFE in a master socket, the master CFE to: receive a request specifying an opcode and a range, the opcode calling for a cache flush, execute the request to cause writeback and, if indicated by the request, invalidation of modified cache lines in the master socket falling within the range, and communicate a request to any other, slave sockets in the system each having a slave CFE to cause writeback and, if indicated by the request, invalidation of modified cache lines in the slave socket falling within the range.Type: ApplicationFiled: June 26, 2019Publication date: December 31, 2020Inventors: Vivekananthan SANJEEPAN, Gideon GERZON, Ishwar AGARWAL, Rajesh SANKARAN, Andy RUDOFF
-
Publication number: 20200409705Abstract: Disclosed embodiments relate to systems and methods to skip inconsequential matrix operations. In one example, a processor includes decode circuitry to decode an instruction having fields to specify an opcode and locations of first source, second source, and destination matrices, the opcode indicating that the processor is to multiply each element at row M and column K of the first source matrix with a corresponding element at row K and column N of the second source matrix, and accumulate a resulting product with previous contents of a corresponding element at row M and column N of the destination matrix, the processor to skip multiplications that, based on detected values of corresponding multiplicands, would generate inconsequential results, scheduling circuitry to schedule execution of the instruction; and execution circuitry to execute the instructions as per the opcode.Type: ApplicationFiled: June 26, 2019Publication date: December 31, 2020Applicant: Intel CorporationInventors: Elmoustapha OULD-AHMED-VALL, William RASH, Subramaniam MAIYURAN, Varghese GEORGE, Rajesh SANKARAN
-
Publication number: 20200341921Abstract: Embodiments of processors, methods, and systems for virtualizing interrupt prioritization and delivery are disclosed. In one embodiment, a processor includes instruction hardware and execution hardware. The instruction hardware is to receive a plurality of instructions, including a first instruction to transfer the processor from a root mode to a non-root mode for executing guest software in a virtual machine, wherein the processor is to return to the root mode upon the detection of any of a plurality of virtual machine exit events. The execution hardware is to execute the first instruction, execution of the first instruction to include determining a first virtual processor-priority value and storing the first virtual processor-priority value in a virtual copy of a processor-priority field, where the virtual copy of the processor-priority field is a virtual resource corresponding to a physical resource associated with an interrupt controller.Type: ApplicationFiled: May 14, 2020Publication date: October 29, 2020Applicant: Intel CorporationInventors: Gilbert Neiger, Rajesh Sankaran, Gideon Gerzon, Richard Uhlig, Sergiu Ghetie, Michael Neve de Mevergnies, Adil Karrar
-
Patent number: 10817425Abstract: Methods and apparatus implementing Hardware/Software co-optimization to improve performance and energy for inter-VM communication for NFVs and other producer-consumer workloads. The apparatus include multi-core processors with multi-level cache hierarchies including and L1 and L2 cache for each core and a shared last-level cache (LLC). One or more machine-level instructions are provided for proactively demoting cachelines from lower cache levels to higher cache levels, including demoting cachelines from L1/L2 caches to an LLC. Techniques are also provided for implementing hardware/software co-optimization in multi-socket NUMA architecture system, wherein cachelines may be selectively demoted and pushed to an LLC in a remote socket. In addition, techniques are disclosure for implementing early snooping in multi-socket systems to reduce latency when accessing cachelines on remote sockets.Type: GrantFiled: December 26, 2014Date of Patent: October 27, 2020Assignee: Intel CorporationInventors: Ren Wang, Andrew J. Herdrich, Yen-cheng Liu, Herbert H. Hum, Jong Soo Park, Christopher J. Hughes, Namakkal N. Venkatesan, Adrian C. Moga, Aamer Jaleel, Zeshan A. Chishti, Mesut A. Ergin, Jr-shian Tsai, Alexander W. Min, Tsung-yuan C. Tai, Christian Maciocco, Rajesh Sankaran
-
Patent number: 10817441Abstract: The present disclosure is directed to systems and methods sharing memory circuitry between processor memory circuitry and accelerator memory circuitry in each of a plurality of peer-to-peer connected accelerator units. Each of the accelerator units includes virtual-to-physical address translation circuitry and migration circuitry. The virtual-to-physical address translation circuitry in each accelerator unit includes pages for each of at least some of the plurality of accelerator units. The migration circuitry causes the transfer of data between the processor memory circuitry and the accelerator memory circuitry in each of the plurality of accelerator circuits. The migration circuitry migrates and evicts data to/from accelerator memory circuitry based on statistical information associated with accesses to at least one of: processor memory circuitry or accelerator memory circuitry in one or more peer accelerator circuits.Type: GrantFiled: March 29, 2019Date of Patent: October 27, 2020Assignee: INTEL CORPORATIONInventors: Sanjay Kumar, David Koufaty, Philip Lantz, Pratik Marolia, Rajesh Sankaran, Koen Koning
-
Publication number: 20200319913Abstract: In one embodiment, an apparatus includes an input/output virtualization (IOV) device comprising: at least one function circuit to be shared by a plurality of virtual machines (VMs); and a plurality of assignable device interfaces (ADIs) coupled to the at least one function circuit, wherein each of the plurality of ADIs is to be associated with one of the plurality of VMs and comprises a first process address space identifier (PASID) field to store a first PASID to identify a descriptor queue stored in a host address space and a second PASID field to store a second PASID to identify a data buffer located in a VM address space. Other embodiments are described and claimed.Type: ApplicationFiled: June 23, 2020Publication date: October 8, 2020Inventors: SANJAY K. KUMAR, RAJESH SANKARAN, UTKARSH Y. KAKAIYA, PRATIK M. MAROLIA
-
Publication number: 20200310993Abstract: The present disclosure is directed to systems and methods sharing memory circuitry between processor memory circuitry and accelerator memory circuitry in each of a plurality of peer-to-peer connected accelerator units. Each of the accelerator units includes physical-to-virtual address translation circuitry and migration circuitry. The physical-to-virtual address translation circuitry in each accelerator unit includes pages for each of at least some of the plurality of accelerator units. The migration circuitry causes the transfer of data between the processor memory circuitry and the accelerator memory circuitry in each of the plurality of accelerator circuits. The migration circuitry migrates and evicts data to/from accelerator memory circuitry based on statistical information associated with accesses to at least one of: processor memory circuitry or accelerator memory circuitry in one or more peer accelerator circuits.Type: ApplicationFiled: March 29, 2019Publication date: October 1, 2020Applicant: INTEL CORPORATIONInventors: Sanjay Kumar, David Koufaty, Philip Lantz, Pratik Marolia, Rajesh Sankaran, Koen Koning
-
Publication number: 20200293365Abstract: Methods and apparatus relating to transactional page fault handling. In an example, an apparatus comprises a processor to divide an execution thread of a graphics workload into a set of transactions which are to be executed atomically, initiate the execution of the thread, and manage the execution of the thread according to one of a first protocol in response to a determination that a page fault occurred in the execution of a transaction, or a second protocol in response to a determination that a page fault did not occur in the execution of a transaction. Other embodiments are also disclosed and claimed.Type: ApplicationFiled: March 15, 2019Publication date: September 17, 2020Applicant: Intel CorporationInventors: NIRAN COORAY, WILLIAM B. SADLER, JONATHAN D. PEARCE, MARIAN ALIN PETRE, BEN ASHBAUGH, MURALI RAMADOSS, VIKRANTH VEMULAPALLI, ANKUR N SHAH, RAJESH SANKARAN
-
Patent number: 10761996Abstract: Examples include an apparatus which accesses secure pages in a trust domain using secure lookups in first and second sets of page tables. For example, one embodiment of the processor comprises: a decoder to decode a plurality of instructions including instructions related to a trusted domain; execution circuitry to execute a first one or more of the instructions to establish a first trusted domain using a first trusted domain key, the trusted domain key to be used to encrypt memory pages within the first trusted domain; and the execution circuitry to execute a second one or more of the instructions to associate a first process address space identifier (PASID) with the first trusted domain, the first PASID to uniquely identify a first execution context associated with the first trusted domain.Type: GrantFiled: September 28, 2018Date of Patent: September 1, 2020Assignee: Intel CorporationInventors: Vedvyas Shanbhogue, Ravi Sahita, Rajesh Sankaran, Siddhartha Chhabra, Abhishek Basak, Krystof Zmudzinski, Rupin Vakharwala
-
Patent number: 10579382Abstract: An apparatus and method for scalable interrupt reporting.Type: GrantFiled: January 24, 2018Date of Patent: March 3, 2020Assignee: Intel CorporationInventors: Rajesh Sankaran, Ankur Shah, Bryan White, Hema Nalluri, David Puffer, Murali Ramadoss, Altug Koker, Aditya Navale, Balaji Vembu