Patents by Inventor Rajesh Sankaran

Rajesh Sankaran has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Techniques to provide cache coherency based on cache type

Patent number: 11016894

Abstract: Techniques and apparatus to manage cache coherency for different types of cache memory are described. In one embodiment, an apparatus may include at least one processor, at least one cache memory, and logic, at least a portion comprised in hardware, the logic to receive a memory operation request associated with the at least one cache memory, determine a cache status of the memory operation request, the cache status indicating one of a giant cache status or a small cache status, perform the memory operation request via a small cache coherence process responsive to the cache status being a small cache status, and perform the memory operation request via a giant cache coherence process responsive to the cache status being a small cache status. Other embodiments are described and claimed.

Type: Grant

Filed: August 7, 2017

Date of Patent: May 25, 2021

Assignee: INTEL CORPORATION

Inventors: Rajesh Sankaran, Ishwar Agarwal, Stephen Van Doren
APPARATUS AND METHOD FOR SECURE MEMORY ACCESS USING TRUST DOMAINS

Publication number: 20210089466

Abstract: Examples include an apparatus which accesses secure pages in a trust domain using secure lookups in first and second sets of page tables. For example, one embodiment of the processor comprises: a decoder to decode a plurality of instructions including instructions related to a trusted domain; execution circuitry to execute a first one or more of the instructions to establish a first trusted domain using a first trusted domain key, the trusted domain key to be used to encrypt memory pages within the first trusted domain; and the execution circuitry to execute a second one or more of the instructions to associate a first process address space identifier (PASID) with the first trusted domain, the first PASID to uniquely identify a first execution context associated with the first trusted domain.

Type: Application

Filed: August 5, 2020

Publication date: March 25, 2021

Inventors: Vedvyas SHANBHOGUE, Ravi SAHITA, Rajesh SANKARAN, Siddhartha CHHABRA, Abhishek BASAK, Krystof ZMUDZINSKI, Rupin VAKHARWALA
DEEP LEARNING IMPLEMENTATIONS USING SYSTOLIC ARRAYS AND FUSED OPERATIONS

Publication number: 20210089316

Abstract: Disclosed embodiments relate to deep learning implementations using systolic arrays and fused operations. In one example, a processor includes fetch and decode circuitry to fetch and decode an instruction having fields to specify an opcode and locations of a destination and N source matrices, the opcode indicating the processor is to load the N source matrices from memory, perform N convolutions on the N source matrices to generate N feature maps, and store results of the N convolutions in registers to be passed to an activation layer, wherein the processor is to perform the N convolutions and the activation layer with at most one memory load of each of the N source matrices. The processor further includes scheduling circuitry to schedule execution of the instruction and execution circuitry to execute the instruction as per the opcode.

Type: Application

Filed: September 25, 2019

Publication date: March 25, 2021

Applicant: Intel Corporation

Inventors: William RASH, Subramaniam MAIYURAN, Varghese GEORGE, Bret L. TOLL, Rajesh SANKARAN, Robert S. CHAPPELL, Supratim PAL, Alexander F. HEINECKE, Elmoustapha OULD-AHMED-VALL, Gang CHEN
HARDWARE-BASED VIRTUALIZATION OF INPUT/OUTPUT (I/O) MEMORY MANAGEMENT UNIT

Publication number: 20210064525

Abstract: A processor includes a hardware input/output (I/O) memory management unit (IOMMU) and a core, which executes an instruction to intercept a payload from a virtual machine (VM). The payload contains a guest bus device function (BDF) identifier, a guest address space identifier (ASID), and a guest address range. The core accesses, within a virtual machine control structure stored in memory, pointers to a first set of translation tables and a second set of translation tables. The core traverses the first set of translation tables to translate the guest BDF identifier to a host BDF identifier and traverses the second set of translation tables to translate the guest ASID to a host ASID. The core stores the host BDF identifier and the host ASID in the payload and submits, to the hardware IOMMU, an administrative command containing the payload to perform invalidation of the guest address range.

Type: Application

Filed: January 2, 2018

Publication date: March 4, 2021

Applicant: Intel Corporation

Inventors: Kun Tian, Rajesh Sankaran, Sanjay Kumar, Ashok Raj
ACCELERATOR CONTROLLER HUB

Publication number: 20210042254

Abstract: Methods and apparatus for an accelerator controller hub (ACH). The ACH may be a stand-alone component or integrated on-die or on package in an accelerator such as a GPU. The ACH may include a host device link (HDL) interface, one or more Peripheral Component Interconnect Express (PCIe) interfaces, one or more high performance accelerator link (HPAL) interfaces, and a router, operatively coupled to each of the HDL interface, the one or more PCIe interfaces, and the one or more HPAL interfaces. The HDL interface is configured to be coupled to a host CPU via an HDL link and the one or more HPAL interfaces are configured to be coupled to one or more HPALs that are used to access high performance accelerator fabrics (HPAFs) such as NVlink fabrics and CCIX (Cache Coherent Interconnect for Accelerators) fabrics. Platforms including ACHs or accelerators with integrated ACHs support RDMA transfers using RDMA semantics to enable transfers between accelerator memory on initiators and targets without CPU involvement.

Type: Application

Filed: October 28, 2020

Publication date: February 11, 2021

Inventors: Pratik Marolia, Andrew Herdrich, Rajesh Sankaran, Rahul Pal, David Puffer, Sayantan Sur, Ajaya Durg
PASID BASED ROUTING EXTENSION FOR SCALABLE IOV SYSTEMS

Publication number: 20210004338

Abstract: Methods and apparatus for PASID-based routing extension for Scalable IOV systems. The system may include a Central Processing Unit (CPU) operatively coupled to a scalable Input/Output Virtualization (IOV) device via an in-line device such as a smart controller or accelerator. A Control Process Address Space Identifier (C-PASID) associated with a first memory space is implemented in an Assignable Device Interface (ADI) for the IOV device. The ADI also implements a Data PASID (D-PASID) associated with a second memory space in which data are stored. The C-PASID is used to fetch a descriptor in the first memory space and the D-PASID is employed to fetch data in the second memory space. A hub embedded on the in-line device or implemented as a discrete device is used to steer memory access requests and/or fetches to the CPU or to the in-line device using the C-PASID and D-PASID.

Type: Application

Filed: September 21, 2020

Publication date: January 7, 2021

Inventors: Pratik Marolia, Sanjay Kumar, Rajesh Sankaran, Utkarsh Y. Kakaiya
HARDWARE/SOFTWARE CO-OPTIMIZATION TO IMPROVE PERFORMANCE AND ENERGY FOR INTER-VM COMMUNICATION FOR NFVS AND OTHER PRODUCER-CONSUMER WORKLOADS

Publication number: 20210004328

Abstract: Methods and apparatus implementing Hardware/Software co-optimization to improve performance and energy for inter-VM communication for NFVs and other producer-consumer workloads. The apparatus include multi-core processors with multi-level cache hierarchies including and L1 and L2 cache for each core and a shared last-level cache (LLC). One or more machine-level instructions are provided for proactively demoting cachelines from lower cache levels to higher cache levels, including demoting cachelines from L1/L2 caches to an LLC. Techniques are also provided for implementing hardware/software co-optimization in multi-socket NUMA architecture system, wherein cachelines may be selectively demoted and pushed to an LLC in a remote socket. In addition, techniques are disclosure for implementing early snooping in multi-socket systems to reduce latency when accessing cachelines on remote sockets.

Type: Application

Filed: September 21, 2020

Publication date: January 7, 2021

Inventors: Ren Wang, Andrew J. Herdrich, Yen-cheng Liu, Herbert H. Hum, Jong Soo Park, Christopher J. Hughes, Namakkal N. Venkatesan, Adrian C. Moga, Aamer Jaleel, Zeshan A. Chishti, Mesut A. Ergin, Jr-shian Tsai, Alexander W. Min, Tsung-yuan C. Tai, Christian Maciocco, Rajesh Sankaran
ADDRESS SPACE IDENTIFIER MANAGEMENT IN COMPLEX INPUT/OUTPUT VIRTUALIZATION ENVIRONMENTS

Publication number: 20210004334

Abstract: Embodiment of this disclosure provides a mechanism to extend a workload instruction to include both untranslated and translated address space identifiers (ASIDs). In one embodiment, a processing device comprising a translation manager is provided. The translation manager receives a workload instruction from a guest application. The workload instruction comprises an untranslated (ASID) and a workload for an input/output (I/O) device. The untranslated ASID is translated to a translated ASID. The translated ASID inserted into a payload of the workload instruction. Thereupon, the payload is provided to a work queue of the I/O device to execute the workload based in part on at least one of: the translated ASID or the untranslated ASID.

Type: Application

Filed: March 28, 2018

Publication date: January 7, 2021

Inventors: Kun TIAN, Xiao ZHENG, Ashok RAJ, Sanjay KUMAR, Rajesh SANKARAN
SYSTEM AND METHOD TO TRACK PHYSICAL ADDRESS ACCESSES BY A CPU OR DEVICE

Publication number: 20200409585

Abstract: A system for tracking memory access patterns to be used in making data placement and migration policies. The system includes a processing unit and a system memory. The system memory includes a local memory and a remote memory, each of which having mapped thereon, a plurality of memory pages. Each of the plurality of memory pages corresponds to one or more physical addresses. A set of attributes for each memory page is stored in a physical attribute table (PAT). The PAT is looked up and the attributes updated when a memory access is detected. Attributes stored in the PAT are used to control the movement of memory pages between the local memory and the remote memory. When the attributes in the PAT indicate a remote memory page is being accessed frequently by the processing unit, the remote memory page is moved from the remote memory to the local memory.

Type: Application

Filed: June 29, 2019

Publication date: December 31, 2020

Applicant: Intel Corporation

Inventors: David Koufaty, Rajesh Sankaran, Rupin Vakharwala
VIRTUALIZATION AND MULTI-TENANCY SUPPORT IN GRAPHICS PROCESSORS

Publication number: 20200409733

Abstract: Graphics processing systems and methods are described. A graphics processing apparatus may comprise one or more graphics processing engines, a memory, a memory management unit (MMU) including a GPU second level page table and GPU dirty bit tracking, and a provisioning agent to receive a request from a virtual machine monitor (VMM) to provision a subcluster of graphics processing apparatuses, the subcluster including a plurality of graphics processing engines from a plurality of graphics processing apparatuses connected using a scale-up fabric, provision the scale-up fabric to route data within the subcluster of graphics processing apparatuses, and provision a plurality of resources on the graphics processing apparatus for the subcluster based on the request from the VMM.

Type: Application

Filed: June 28, 2019

Publication date: December 31, 2020

Inventors: Rajesh SANKARAN, Bret TOLL, William RASH, Subramaniam MAIYURAN, Gang CHEN, Varghese GEORGE
ASYNCHRONOUS CACHE FLUSH ENGINE TO MANAGE PLATFORM COHERENT AND MEMORY SIDE CACHES

Publication number: 20200409844

Abstract: Disclosed embodiments relate to an asynchronous cache-flush engine to manage platform coherent and memory-side caches. In one example, a system includes multiple interconnected sockets each including a cache flush engine (CFE), a core, and an associated cache hierarchy including a plurality of caches, one of the CFEs designated as a master CFE in a master socket, the master CFE to: receive a request specifying an opcode and a range, the opcode calling for a cache flush, execute the request to cause writeback and, if indicated by the request, invalidation of modified cache lines in the master socket falling within the range, and communicate a request to any other, slave sockets in the system each having a slave CFE to cause writeback and, if indicated by the request, invalidation of modified cache lines in the slave socket falling within the range.

Type: Application

Filed: June 26, 2019

Publication date: December 31, 2020

Inventors: Vivekananthan SANJEEPAN, Gideon GERZON, Ishwar AGARWAL, Rajesh SANKARAN, Andy RUDOFF
SYSTEMS AND METHODS TO SKIP INCONSEQUENTIAL MATRIX OPERATIONS

Publication number: 20200409705

Abstract: Disclosed embodiments relate to systems and methods to skip inconsequential matrix operations. In one example, a processor includes decode circuitry to decode an instruction having fields to specify an opcode and locations of first source, second source, and destination matrices, the opcode indicating that the processor is to multiply each element at row M and column K of the first source matrix with a corresponding element at row K and column N of the second source matrix, and accumulate a resulting product with previous contents of a corresponding element at row M and column N of the destination matrix, the processor to skip multiplications that, based on detected values of corresponding multiplicands, would generate inconsequential results, scheduling circuitry to schedule execution of the instruction; and execution circuitry to execute the instructions as per the opcode.

Type: Application

Filed: June 26, 2019

Publication date: December 31, 2020

Applicant: Intel Corporation

Inventors: Elmoustapha OULD-AHMED-VALL, William RASH, Subramaniam MAIYURAN, Varghese GEORGE, Rajesh SANKARAN
VIRTUALIZING INTERRUPT PRIORITIZATION AND DELIVERY

Publication number: 20200341921

Abstract: Embodiments of processors, methods, and systems for virtualizing interrupt prioritization and delivery are disclosed. In one embodiment, a processor includes instruction hardware and execution hardware. The instruction hardware is to receive a plurality of instructions, including a first instruction to transfer the processor from a root mode to a non-root mode for executing guest software in a virtual machine, wherein the processor is to return to the root mode upon the detection of any of a plurality of virtual machine exit events. The execution hardware is to execute the first instruction, execution of the first instruction to include determining a first virtual processor-priority value and storing the first virtual processor-priority value in a virtual copy of a processor-priority field, where the virtual copy of the processor-priority field is a virtual resource corresponding to a physical resource associated with an interrupt controller.

Type: Application

Filed: May 14, 2020

Publication date: October 29, 2020

Applicant: Intel Corporation

Inventors: Gilbert Neiger, Rajesh Sankaran, Gideon Gerzon, Richard Uhlig, Sergiu Ghetie, Michael Neve de Mevergnies, Adil Karrar
Hardware/software co-optimization to improve performance and energy for inter-VM communication for NFVs and other producer-consumer workloads

Patent number: 10817425

Abstract: Methods and apparatus implementing Hardware/Software co-optimization to improve performance and energy for inter-VM communication for NFVs and other producer-consumer workloads. The apparatus include multi-core processors with multi-level cache hierarchies including and L1 and L2 cache for each core and a shared last-level cache (LLC). One or more machine-level instructions are provided for proactively demoting cachelines from lower cache levels to higher cache levels, including demoting cachelines from L1/L2 caches to an LLC. Techniques are also provided for implementing hardware/software co-optimization in multi-socket NUMA architecture system, wherein cachelines may be selectively demoted and pushed to an LLC in a remote socket. In addition, techniques are disclosure for implementing early snooping in multi-socket systems to reduce latency when accessing cachelines on remote sockets.

Type: Grant

Filed: December 26, 2014

Date of Patent: October 27, 2020

Assignee: Intel Corporation

Inventors: Ren Wang, Andrew J. Herdrich, Yen-cheng Liu, Herbert H. Hum, Jong Soo Park, Christopher J. Hughes, Namakkal N. Venkatesan, Adrian C. Moga, Aamer Jaleel, Zeshan A. Chishti, Mesut A. Ergin, Jr-shian Tsai, Alexander W. Min, Tsung-yuan C. Tai, Christian Maciocco, Rajesh Sankaran
Shared accelerator memory systems and methods

Patent number: 10817441

Abstract: The present disclosure is directed to systems and methods sharing memory circuitry between processor memory circuitry and accelerator memory circuitry in each of a plurality of peer-to-peer connected accelerator units. Each of the accelerator units includes virtual-to-physical address translation circuitry and migration circuitry. The virtual-to-physical address translation circuitry in each accelerator unit includes pages for each of at least some of the plurality of accelerator units. The migration circuitry causes the transfer of data between the processor memory circuitry and the accelerator memory circuitry in each of the plurality of accelerator circuits. The migration circuitry migrates and evicts data to/from accelerator memory circuitry based on statistical information associated with accesses to at least one of: processor memory circuitry or accelerator memory circuitry in one or more peer accelerator circuits.

Type: Grant

Filed: March 29, 2019

Date of Patent: October 27, 2020

Assignee: INTEL CORPORATION

Inventors: Sanjay Kumar, David Koufaty, Philip Lantz, Pratik Marolia, Rajesh Sankaran, Koen Koning
SYSTEM, APPARATUS AND METHOD FOR ACCESSING MULTIPLE ADDRESS SPACES VIA A VIRTUALIZATION DEVICE

Publication number: 20200319913

Abstract: In one embodiment, an apparatus includes an input/output virtualization (IOV) device comprising: at least one function circuit to be shared by a plurality of virtual machines (VMs); and a plurality of assignable device interfaces (ADIs) coupled to the at least one function circuit, wherein each of the plurality of ADIs is to be associated with one of the plurality of VMs and comprises a first process address space identifier (PASID) field to store a first PASID to identify a descriptor queue stored in a host address space and a second PASID field to store a second PASID to identify a data buffer located in a VM address space. Other embodiments are described and claimed.

Type: Application

Filed: June 23, 2020

Publication date: October 8, 2020

Inventors: SANJAY K. KUMAR, RAJESH SANKARAN, UTKARSH Y. KAKAIYA, PRATIK M. MAROLIA
SHARED ACCELERATOR MEMORY SYSTEMS AND METHODS

Publication number: 20200310993

Abstract: The present disclosure is directed to systems and methods sharing memory circuitry between processor memory circuitry and accelerator memory circuitry in each of a plurality of peer-to-peer connected accelerator units. Each of the accelerator units includes physical-to-virtual address translation circuitry and migration circuitry. The physical-to-virtual address translation circuitry in each accelerator unit includes pages for each of at least some of the plurality of accelerator units. The migration circuitry causes the transfer of data between the processor memory circuitry and the accelerator memory circuitry in each of the plurality of accelerator circuits. The migration circuitry migrates and evicts data to/from accelerator memory circuitry based on statistical information associated with accesses to at least one of: processor memory circuitry or accelerator memory circuitry in one or more peer accelerator circuits.

Type: Application

Filed: March 29, 2019

Publication date: October 1, 2020

Applicant: INTEL CORPORATION

Inventors: Sanjay Kumar, David Koufaty, Philip Lantz, Pratik Marolia, Rajesh Sankaran, Koen Koning
TRANSACTIONAL PAGE FAULT HANDLING

Publication number: 20200293365

Abstract: Methods and apparatus relating to transactional page fault handling. In an example, an apparatus comprises a processor to divide an execution thread of a graphics workload into a set of transactions which are to be executed atomically, initiate the execution of the thread, and manage the execution of the thread according to one of a first protocol in response to a determination that a page fault occurred in the execution of a transaction, or a second protocol in response to a determination that a page fault did not occur in the execution of a transaction. Other embodiments are also disclosed and claimed.

Type: Application

Filed: March 15, 2019

Publication date: September 17, 2020

Applicant: Intel Corporation

Inventors: NIRAN COORAY, WILLIAM B. SADLER, JONATHAN D. PEARCE, MARIAN ALIN PETRE, BEN ASHBAUGH, MURALI RAMADOSS, VIKRANTH VEMULAPALLI, ANKUR N SHAH, RAJESH SANKARAN
Apparatus and method for secure memory access using trust domains

Patent number: 10761996

Abstract: Examples include an apparatus which accesses secure pages in a trust domain using secure lookups in first and second sets of page tables. For example, one embodiment of the processor comprises: a decoder to decode a plurality of instructions including instructions related to a trusted domain; execution circuitry to execute a first one or more of the instructions to establish a first trusted domain using a first trusted domain key, the trusted domain key to be used to encrypt memory pages within the first trusted domain; and the execution circuitry to execute a second one or more of the instructions to associate a first process address space identifier (PASID) with the first trusted domain, the first PASID to uniquely identify a first execution context associated with the first trusted domain.

Type: Grant

Filed: September 28, 2018

Date of Patent: September 1, 2020

Assignee: Intel Corporation

Inventors: Vedvyas Shanbhogue, Ravi Sahita, Rajesh Sankaran, Siddhartha Chhabra, Abhishek Basak, Krystof Zmudzinski, Rupin Vakharwala
Method and apparatus for a scalable interrupt infrastructure

Patent number: 10579382

Abstract: An apparatus and method for scalable interrupt reporting.

Type: Grant

Filed: January 24, 2018

Date of Patent: March 3, 2020

Assignee: Intel Corporation

Inventors: Rajesh Sankaran, Ankur Shah, Bryan White, Hema Nalluri, David Puffer, Murali Ramadoss, Altug Koker, Aditya Navale, Balaji Vembu

prev 1 2 3 4 5 6 7 next