Patents by Inventor Namakkal N. Venkatesan

Namakkal N. Venkatesan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Processor and method implementing a cacheline demote machine instruction

Patent number: 11513957

Abstract: Methods and apparatus implementing hardware/software co-optimization to improve performance and energy for inter-VM communication for NFVs and other producer-consumer workloads. The apparatus include multi-core processors with multi-level cache hierarchies including an L1 and L2 cache for each core and a shared last-level cache (LLC). One or more machine-level instructions are provided for proactively demoting cachelines from lower cache levels to higher cache levels, including demoting cachelines from L1/L2 caches to an LLC. Techniques are also provided for implementing hardware/software co-optimization in multi-socket NUMA architecture systems, wherein cachelines may be selectively demoted and pushed to an LLC in a remote socket. In addition, techniques are disclosed for implementing early snooping in multi-socket systems to reduce latency when accessing cachelines on remote sockets.

Type: Grant

Filed: September 21, 2020

Date of Patent: November 29, 2022

Assignee: Intel Corporation

Inventors: Ren Wang, Andrew J. Herdrich, Yen-cheng Liu, Herbert H. Hum, Jong Soo Park, Christopher J. Hughes, Namakkal N. Venkatesan, Adrian C. Moga, Aamer Jaleel, Zeshan A. Chishti, Mesut A. Ergin, Jr-shian Tsai, Alexander W. Min, Tsung-yuan C. Tai, Christian Maciocco, Rajesh Sankaran

Techniques for flow control packet processing

Patent number: 11418495

Abstract: Techniques and apparatuses for processing data units are described. In one embodiment, for example, an apparatus for networking may include at least one memory, logic, at least a portion of the logic comprised in hardware coupled to the at least one memory, the logic to access an encrypted packet having an encrypted portion, determine at least one flow control segment of the encrypted portion, decrypt the at least one flow control segment to generate a partially-decrypted packet comprising a decrypted at least one flow control segment and an encrypted remainder portion, the remainder portion comprising a portion of the encrypted packet that does not include the decrypted at least one flow control segment, access process information in the decrypted at least one flow control segment, and process the partially-decrypted packet according to the process information. Other embodiments are described and claimed.

Type: Grant

Filed: September 26, 2017

Date of Patent: August 16, 2022

Inventors: John J. Browne, Chris Macnamara, Namakkal N. Venkatesan, Tomasz Kantecki, Declan W. Doherty

Oversubscribable resource allocation

Patent number: 11296956

Abstract: There is disclosed in one example a computing apparatus, including: a hardware platform configured to communicatively couple with a multi-tenant cloud service, the multi-tenant cloud service including an oversubscribable resource; and a service assurance for oversubscribable resource (SAOR) engine configured to: receive tenant subscriptions to the oversubscribable resource, wherein tenant subscriptions exceed available instances of the oversubscribable resource; receive per-tenant quality of service (QoS) metrics for the oversubscribable resource; receive an allocation request from a guest for allocation of an instance of the oversubscribable resource; compare the request to currently-available instances of the oversubscribable resource; determine that the oversubscribable resource has capacity to service the request according to the QoS metrics of the tenant; and allocate an instance of the oversubscribable resource to the guest.

Type: Grant

Filed: June 26, 2018

Date of Patent: April 5, 2022

Assignee: Intel Corporation

Inventors: Fan Zhang, Roger Keith Wiles, Xin Zeng, Cunming Liang, Namakkal N. Venkatesan
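
The allocation steps listed in the abstract above can be sketched roughly as follows. This is a minimal illustration only, not the patented implementation: the `Tenant` and `SaorEngine` classes, the `guaranteed` field (a stand-in for a per-tenant QoS metric), and the admission rule are all assumptions introduced here. The sketch also assumes every tenant subscribes before any allocation is made and omits releasing instances.

```python
from dataclasses import dataclass, field


@dataclass
class Tenant:
    name: str
    subscribed: int   # instances the tenant has subscribed to
    guaranteed: int   # hypothetical QoS metric: instances held in reserve for this tenant
    in_use: int = 0


@dataclass
class SaorEngine:
    capacity: int                                # available instances of the resource
    tenants: dict = field(default_factory=dict)

    def subscribe(self, tenant: Tenant) -> None:
        # Total subscriptions may exceed capacity; only the guarantees must fit.
        reserved = sum(t.guaranteed for t in self.tenants.values())
        if reserved + tenant.guaranteed > self.capacity:
            raise ValueError("guaranteed instances would exceed capacity")
        self.tenants[tenant.name] = tenant

    def allocate(self, name: str) -> bool:
        tenant = self.tenants[name]
        if tenant.in_use >= tenant.subscribed:
            return False  # tenant is already at its subscription limit
        # Compare the request to the currently available instances.
        free = self.capacity - sum(t.in_use for t in self.tenants.values())
        # Instances still owed to other tenants under their guarantees.
        owed = sum(
            max(t.guaranteed - t.in_use, 0)
            for t in self.tenants.values()
            if t.name != name
        )
        if free - owed < 1:
            return False  # granting this would eat into another tenant's guarantee
        tenant.in_use += 1
        return True


engine = SaorEngine(capacity=4)
engine.subscribe(Tenant("a", subscribed=3, guaranteed=1))
engine.subscribe(Tenant("b", subscribed=3, guaranteed=2))  # 6 subscribed, 4 available
results = [engine.allocate("a") for _ in range(3)]
```

With these assumed numbers, tenant "a" can burst to two instances, but its third request is refused because the remaining two instances are held back for tenant "b".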

Lazy increment for high frequency counters

Patent number: 11080202

Abstract: A computing apparatus, including: a processor; a pointer to a counter memory location; and a lazy increment counter engine to: receive a stimulus to update the counter; and lazy increment the counter including issuing a weakly-ordered increment directive to the pointer.

Type: Grant

Filed: September 30, 2017

Date of Patent: August 3, 2021

Assignee: Intel Corporation

Inventors: Niall D. McDonnell, Christopher MacNamara, John J. Browne, Andrew Cunningham, Brendan Ryan, Patrick Fleming, Namakkal N. Venkatesan, Bruce Richardson, Tomasz Kantecki, Sean Harte, Pierre Laurent

Multi-core communication acceleration using hardware queue device

Patent number: 10929323

Abstract: Apparatus and methods implementing a hardware queue management device for reducing inter-core data transfer overhead by offloading request management and data coherency tasks from the CPU cores. The apparatus include multi-core processors, a shared L3 or last-level cache (“LLC”), and a hardware queue management device to receive, store, and process inter-core data transfer requests. The hardware queue management device further comprises a resource management system to control the rate at which the cores may submit requests to reduce core stalls and dropped requests. Additionally, software instructions are introduced to optimize communication between the cores and the queue management device.

Type: Grant

Filed: October 14, 2019

Date of Patent: February 23, 2021

Assignee: Intel Corporation

Inventors: Ren Wang, Yipeng Wang, Andrew Herdrich, Jr-Shian Tsai, Tsung-Yuan C. Tai, Niall D. McDonnell, Hugh Wilkinson, Bradley A. Burres, Bruce Richardson, Namakkal N. Venkatesan, Debra Bernstein, Edwin Verplanke, Stephen R. Van Doren, An Yan, Andrew Cunningham, David Sonnier, Gage Eads, James T. Clee, Jamison D. Whitesell, Jerry Pirog, Jonathan Kenny, Joseph R. Hasting, Narender Vangati, Stephen Miller, Te K. Ma, William Burroughs

OVERSUBSCRIBABLE RESOURCE ALLOCATION

Publication number: 20210044503

Abstract: There is disclosed in one example a computing apparatus, including: a hardware platform configured to communicatively couple with a multi-tenant cloud service, the multi-tenant cloud service including an oversubscribable resource; and a service assurance for oversubscribable resource (SAOR) engine configured to: receive tenant subscriptions to the oversubscribable resource, wherein tenant subscriptions exceed available instances of the oversubscribable resource; receive per-tenant quality of service (QoS) metrics for the oversubscribable resource; receive an allocation request from a guest for allocation of an instance of the oversubscribable resource; compare the request to currently-available instances of the oversubscribable resource; determine that the oversubscribable resource has capacity to service the request according to the QoS metrics of the tenant; and allocate an instance of the oversubscribable resource to the guest.

Type: Application

Filed: June 28, 2018

Publication date: February 11, 2021

Applicant: Intel Corporation

Inventors: Fan Zhang, Roger Keith Wiles, Xin Zeng, Cunming Liang, Namakkal N. Venkatesan

HARDWARE/SOFTWARE CO-OPTIMIZATION TO IMPROVE PERFORMANCE AND ENERGY FOR INTER-VM COMMUNICATION FOR NFVS AND OTHER PRODUCER-CONSUMER WORKLOADS

Publication number: 20210004328

Abstract: Methods and apparatus implementing hardware/software co-optimization to improve performance and energy for inter-VM communication for NFVs and other producer-consumer workloads. The apparatus include multi-core processors with multi-level cache hierarchies including an L1 and L2 cache for each core and a shared last-level cache (LLC). One or more machine-level instructions are provided for proactively demoting cachelines from lower cache levels to higher cache levels, including demoting cachelines from L1/L2 caches to an LLC. Techniques are also provided for implementing hardware/software co-optimization in multi-socket NUMA architecture systems, wherein cachelines may be selectively demoted and pushed to an LLC in a remote socket. In addition, techniques are disclosed for implementing early snooping in multi-socket systems to reduce latency when accessing cachelines on remote sockets.

Type: Application

Filed: September 21, 2020

Publication date: January 7, 2021

Inventors: Ren Wang, Andrew J. Herdrich, Yen-cheng Liu, Herbert H. Hum, Jong Soo Park, Christopher J. Hughes, Namakkal N. Venkatesan, Adrian C. Moga, Aamer Jaleel, Zeshan A. Chishti, Mesut A. Ergin, Jr-shian Tsai, Alexander W. Min, Tsung-yuan C. Tai, Christian Maciocco, Rajesh Sankaran

Hardware/software co-optimization to improve performance and energy for inter-VM communication for NFVs and other producer-consumer workloads

Patent number: 10817425

Abstract: Methods and apparatus implementing hardware/software co-optimization to improve performance and energy for inter-VM communication for NFVs and other producer-consumer workloads. The apparatus include multi-core processors with multi-level cache hierarchies including an L1 and L2 cache for each core and a shared last-level cache (LLC). One or more machine-level instructions are provided for proactively demoting cachelines from lower cache levels to higher cache levels, including demoting cachelines from L1/L2 caches to an LLC. Techniques are also provided for implementing hardware/software co-optimization in multi-socket NUMA architecture systems, wherein cachelines may be selectively demoted and pushed to an LLC in a remote socket. In addition, techniques are disclosed for implementing early snooping in multi-socket systems to reduce latency when accessing cachelines on remote sockets.

Type: Grant

Filed: December 26, 2014

Date of Patent: October 27, 2020

Assignee: Intel Corporation

Inventors: Ren Wang, Andrew J. Herdrich, Yen-cheng Liu, Herbert H. Hum, Jong Soo Park, Christopher J. Hughes, Namakkal N. Venkatesan, Adrian C. Moga, Aamer Jaleel, Zeshan A. Chishti, Mesut A. Ergin, Jr-shian Tsai, Alexander W. Min, Tsung-yuan C. Tai, Christian Maciocco, Rajesh Sankaran

SOFTWARE-TRANSPARENT HARDWARE PREDICTOR FOR CORE-TO-CORE DATA TRANSFER OPTIMIZATION

Publication number: 20200285578

Abstract: Apparatus, method, and system for implementing a software-transparent hardware predictor for core-to-core data communication optimization are described herein. An embodiment of the apparatus includes a plurality of hardware processor cores each including a private cache; a shared cache that is communicatively coupled to and shared by the plurality of hardware processor cores; and a predictor circuit. The predictor circuit is to track activities relating to a plurality of monitored cache lines in the private cache of a producer hardware processor core (producer core) and to enable a cache line push operation upon determining a target hardware processor core (target core) based on the tracked activities. An execution of the cache line push operation is to cause a plurality of unmonitored cache lines in the private cache of the producer core to be moved to the private cache of the target core.

Type: Application

Filed: March 18, 2020

Publication date: September 10, 2020

Applicant: Intel Corporation

Inventors: Ren Wang, Joseph Nuzman, Samantika S. Sury, Andrew J. Herdrich, Namakkal N. Venkatesan, Anil Vasudevan, Tsung-Yuan C. Tai, Niall D. McDonnell

Software-transparent hardware predictor for core-to-core data transfer optimization

Patent number: 10635590

Abstract: Apparatus, method, and system for implementing a software-transparent hardware predictor for core-to-core data communication optimization are described herein. An embodiment of the apparatus includes a plurality of hardware processor cores each including a private cache; a shared cache that is communicatively coupled to and shared by the plurality of hardware processor cores; and a predictor circuit. The predictor circuit is to track activities relating to a plurality of monitored cache lines in the private cache of a producer hardware processor core (producer core) and to enable a cache line push operation upon determining a target hardware processor core (target core) based on the tracked activities. An execution of the cache line push operation is to cause a plurality of unmonitored cache lines in the private cache of the producer core to be moved to the private cache of the target core.

Type: Grant

Filed: September 29, 2017

Date of Patent: April 28, 2020

Assignee: Intel Corporation

Inventors: Ren Wang, Joseph Nuzman, Samantika S. Sury, Andrew J. Herdrich, Namakkal N. Venkatesan, Anil Vasudevan, Tsung-Yuan C. Tai, Niall D. McDonnell

MULTI-CORE COMMUNICATION ACCELERATION USING HARDWARE QUEUE DEVICE

Publication number: 20200042479

Abstract: Apparatus and methods implementing a hardware queue management device for reducing inter-core data transfer overhead by offloading request management and data coherency tasks from the CPU cores. The apparatus include multi-core processors, a shared L3 or last-level cache (“LLC”), and a hardware queue management device to receive, store, and process inter-core data transfer requests. The hardware queue management device further comprises a resource management system to control the rate at which the cores may submit requests to reduce core stalls and dropped requests. Additionally, software instructions are introduced to optimize communication between the cores and the queue management device.

Type: Application

Filed: October 14, 2019

Publication date: February 6, 2020

Applicant: Intel Corporation

Inventors: Ren Wang, Yipeng Wang, Andrew Herdrich, Jr-Shian Tsai, Tsung-Yuan C. Tai, Niall D. McDonnell, Hugh Wilkinson, Bradley A. Burres, Bruce Richardson, Namakkal N. Venkatesan, Debra Bernstein, Edwin Verplanke, Stephen R. Van Doren, An Yan, Andrew Cunningham, David Sonnier, Gage Eads, James T. Clee, Jamison D. Whitesell, Jerry Pirog, Jonathan Kenny, Joseph R. Hasting, Narender Vangati, Stephen Miller, Te K. Ma, William Burroughs

Packet flow classification

Patent number: 10455063

Abstract: Technologies for packet flow classification on a computing device include a hash table including a plurality of hash table buckets in which each hash table bucket maps a plurality of keys to corresponding traffic flows. The computing device performs packet flow classification on received data packets, where the packet flow classification includes a plurality of sequential classification stages and fetch classification operations and non-fetch classification operations are performed in each classification stage. The fetch classification operations include to prefetch a key of a first received data packet based on a set of packet fields of the first received data packet for use during a subsequent classification stage, prefetch a hash table bucket from the hash table based on a key signature of the prefetched key for use during another subsequent classification stage, and prefetch a traffic flow to be applied to the first received data packet based on the prefetched hash table bucket and the prefetched key.

Type: Grant

Filed: August 15, 2017

Date of Patent: October 22, 2019

Assignee: Intel Corporation

Inventors: Cristian Florin F. Dumitrescu, Namakkal N. Venkatesan, Pierre Laurent, Bruce Richardson
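
The three lookups named in the abstract above (key from packet fields, bucket from key signature, flow from bucket and key) can be illustrated with a plain, non-pipelined lookup. This is a simplified sketch under stated assumptions: it does not model the prefetching or the overlap of classification stages across packets that the abstract describes, and the `FlowTable` class, the 5-tuple field names, the CRC32 signature, and the table sizes are all hypothetical choices made for the example.

```python
import zlib
from typing import Optional

BUCKET_COUNT = 8         # assumed; a power of two so a mask selects the bucket
ENTRIES_PER_BUCKET = 4   # assumed number of keys each bucket can hold


def make_key(packet: dict) -> bytes:
    """Build the lookup key from a set of packet fields (a 5-tuple here)."""
    names = ("src", "dst", "sport", "dport", "proto")
    return "|".join(str(packet[n]) for n in names).encode()


def signature(key: bytes) -> int:
    """Key signature: a short hash used to pick a bucket and pre-filter entries."""
    return zlib.crc32(key)


class FlowTable:
    def __init__(self) -> None:
        # Each bucket maps several keys to flows as (signature, key, flow) entries.
        self.buckets = [[] for _ in range(BUCKET_COUNT)]

    def add(self, key: bytes, flow: str) -> None:
        sig = signature(key)
        bucket = self.buckets[sig & (BUCKET_COUNT - 1)]
        if len(bucket) >= ENTRIES_PER_BUCKET:
            raise OverflowError("bucket full")
        bucket.append((sig, key, flow))

    def classify(self, packet: dict) -> Optional[str]:
        key = make_key(packet)                            # step 1: key from packet fields
        sig = signature(key)
        bucket = self.buckets[sig & (BUCKET_COUNT - 1)]   # step 2: bucket from signature
        for entry_sig, entry_key, flow in bucket:         # step 3: flow from bucket + key
            if entry_sig == sig and entry_key == key:
                return flow
        return None


table = FlowTable()
known = {"src": "10.0.0.1", "dst": "10.0.0.2", "sport": 1234, "dport": 80, "proto": 6}
table.add(make_key(known), "web-flow")
unknown = dict(known, dport=443)
```

In a pipelined version, each of the three steps for one packet would be issued as a prefetch while other packets occupy the other stages, so the memory latency of one lookup overlaps with useful work on another.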

Multi-core communication acceleration using hardware queue device

Patent number: 10445271

Abstract: Apparatus and methods implementing a hardware queue management device for reducing inter-core data transfer overhead by offloading request management and data coherency tasks from the CPU cores. The apparatus include multi-core processors, a shared L3 or last-level cache (“LLC”), and a hardware queue management device to receive, store, and process inter-core data transfer requests. The hardware queue management device further comprises a resource management system to control the rate at which the cores may submit requests to reduce core stalls and dropped requests. Additionally, software instructions are introduced to optimize communication between the cores and the queue management device.

Type: Grant

Filed: January 4, 2016

Date of Patent: October 15, 2019

Assignee: Intel Corporation

Inventors: Ren Wang, Namakkal N. Venkatesan, Debra Bernstein, Edwin Verplanke, Stephen R. Van Doren, An Yan, Andrew Cunningham, David Sonnier, Gage Eads, James T. Clee, Jamison D. Whitesell, Yipeng Wang, Jerry Pirog, Jonathan Kenny, Joseph R. Hasting, Narender Vangati, Stephen Miller, Te K. Ma, William Burroughs, Andrew J. Herdrich, Jr-Shian Tsai, Tsung-Yuan C. Tai, Niall D. McDonnell, Hugh Wilkinson, Bradley A. Burres, Bruce Richardson

Network interface device facilitating transaction assurance

Patent number: 10334041

Abstract: A network interface device (NID) interfaced with a host machine communicates with a local link of the host machine to obtain transaction-specific data relied upon by the host machine to be delivered to a destination by the NID according to a reliable message delivery protocol. The NID conducts communications over a network in response to obtaining of the transaction-specific data, with the network communications including execution of the reliable message delivery protocol independent of any operability of the host machine.

Type: Grant

Filed: November 23, 2015

Date of Patent: June 25, 2019

Assignee: Intel Corporation

Inventors: Vadim Sukhomlinov, Kshitij A. Doshi, Namakkal N. Venkatesan, Roger Keith Wiles

Technologies for network device flow lookup management

Patent number: 10284470

Abstract: Technologies for managing network flow lookups of a network device include a network controller and a target device, each communicatively coupled to the network device. The network device includes a cache for a processor of the network device and a main memory. The network device additionally includes a multi-level hash table having a first-level hash table stored in the cache of the network device and a second-level hash table stored in the main memory of the network device. The network device is configured to determine whether to store a network flow hash corresponding to a network flow indicating the target device in the first-level or second-level hash table based on a priority of the network flow provided to the network device by the network controller.

Type: Grant

Filed: December 23, 2014

Date of Patent: May 7, 2019

Assignee: Intel Corporation

Inventors: Ren Wang, Namakkal N. Venkatesan, Aamer Jaleel, Tsung-Yuan C. Tai, Sameh Gobriel, Christian Maciocco
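
The priority-based placement described in the abstract above might look roughly like the following. This is a sketch under assumptions, not the patented design: two Python dictionaries stand in for the cache-resident first-level table and the main-memory second-level table, and the `TwoLevelFlowTable` class, the numeric priority threshold, and the fall-back to the second level when the first is full are all hypothetical.

```python
from typing import Optional


class TwoLevelFlowTable:
    def __init__(self, first_level_capacity: int, priority_threshold: int) -> None:
        self.first_level = {}    # stands in for the table kept in the processor cache
        self.second_level = {}   # stands in for the table kept in main memory
        self.first_level_capacity = first_level_capacity
        self.priority_threshold = priority_threshold

    def insert(self, flow_hash: int, target: str, priority: int) -> str:
        """Store a flow hash; the controller-supplied priority picks the level."""
        # Drop any earlier placement so a flow lives in exactly one level.
        self.first_level.pop(flow_hash, None)
        self.second_level.pop(flow_hash, None)
        high_priority = priority >= self.priority_threshold
        has_room = len(self.first_level) < self.first_level_capacity
        if high_priority and has_room:
            self.first_level[flow_hash] = target
            return "first"
        self.second_level[flow_hash] = target
        return "second"

    def lookup(self, flow_hash: int) -> Optional[str]:
        # Check the small, fast level first, then fall back to the large one.
        if flow_hash in self.first_level:
            return self.first_level[flow_hash]
        return self.second_level.get(flow_hash)


flows = TwoLevelFlowTable(first_level_capacity=1, priority_threshold=5)
placements = [
    flows.insert(0xA1, "port-1", priority=9),   # high priority, room available
    flows.insert(0xB2, "port-2", priority=2),   # low priority
    flows.insert(0xC3, "port-3", priority=9),   # high priority, first level full
]
```

A fuller version would need an eviction or demotion policy for the first level; the sketch simply spills to the second level once the first is full.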

Processors and methods for managing cache tiering with gather-scatter vector semantics

Patent number: 10268580

Abstract: Processors and methods implementing a machine instruction to perform cache line demotion on multiple cache lines to enable efficient sharing of cache lines between processor cores. One general aspect includes a processor comprising: a plurality of hardware processor cores, where each of the hardware processor cores to include a first cache. The processor also includes a second cache, communicatively coupled to and shared by the plurality of hardware processor cores. The processor to support a first machine instruction, the first machine instruction to include a vector register operand identifying a vector register which contains a plurality of data elements each used to identify a cache line. An execution of the first machine instruction by one of the plurality of hardware processor cores to cause a plurality of identified cache lines to be demoted, such that the demoted cache lines are moved from the first cache to the second cache.

Type: Grant

Filed: September 30, 2016

Date of Patent: April 23, 2019

Assignee: Intel Corporation

Inventors: Kshitij A. Doshi, Namakkal N. Venkatesan, Ren Wang, Andrew J. Herdrich

SOFTWARE-TRANSPARENT HARDWARE PREDICTOR FOR CORE-TO-CORE DATA TRANSFER OPTIMIZATION

Publication number: 20190102303

Abstract: Apparatus, method, and system for implementing a software-transparent hardware predictor for core-to-core data communication optimization are described herein. An embodiment of the apparatus includes a plurality of hardware processor cores each including a private cache; a shared cache that is communicatively coupled to and shared by the plurality of hardware processor cores; and a predictor circuit. The predictor circuit is to track activities relating to a plurality of monitored cache lines in the private cache of a producer hardware processor core (producer core) and to enable a cache line push operation upon determining a target hardware processor core (target core) based on the tracked activities. An execution of the cache line push operation is to cause a plurality of unmonitored cache lines in the private cache of the producer core to be moved to the private cache of the target core.

Type: Application

Filed: September 29, 2017

Publication date: April 4, 2019

Inventors: Ren Wang, Joseph Nuzman, Samantika S. Sury, Andrew J. Herdrich, Namakkal N. Venkatesan, Anil Vasudevan, Tsung-Yuan C. Tai, Niall D. McDonnell

LAZY INCREMENT FOR HIGH FREQUENCY COUNTERS

Publication number: 20190102312

Abstract: A computing apparatus, including: a processor; a pointer to a counter memory location; and a lazy increment counter engine to: receive a stimulus to update the counter; and lazy increment the counter including issuing a weakly-ordered increment directive to the pointer.

Type: Application

Filed: September 30, 2017

Publication date: April 4, 2019

Inventors: Niall D. McDonnell, Christopher MacNamara, John J. Browne, Andrew Cunningham, Brendan Ryan, Patrick Fleming, Namakkal N. Venkatesan, Bruce Richardson, Tomasz Kantecki, Sean Harte, Pierre Laurent

TECHNIQUES FOR FLOW CONTROL PACKET PROCESSING

Publication number: 20190097984

Abstract: Techniques and apparatuses for processing data units are described. In one embodiment, for example, an apparatus for networking may include at least one memory, logic, at least a portion of the logic comprised in hardware coupled to the at least one memory, the logic to access an encrypted packet having an encrypted portion, determine at least one flow control segment of the encrypted portion, decrypt the at least one flow control segment to generate a partially-decrypted packet comprising a decrypted at least one flow control segment and an encrypted remainder portion, the remainder portion comprising a portion of the encrypted packet that does not include the decrypted at least one flow control segment, access process information in the decrypted at least one flow control segment, and process the partially-decrypted packet according to the process information. Other embodiments are described and claimed.

Type: Application

Filed: September 26, 2017

Publication date: March 28, 2019

Applicant: Intel Corporation

Inventors: John J. Browne, Chris Macnamara, Namakkal N. Venkatesan, Tomasz Kantecki, Declan W. Doherty

Instruction and logic for software hints to improve hardware prefetcher effectiveness

Patent number: 10229060

Abstract: Embodiments provide for a processor comprising a cache, a prefetcher to select information according to a prefetcher algorithm and to send the selected information to the cache, and a prefetch tuning buffer including tuning state for the set of candidate prefetcher algorithms, wherein the prefetcher is to adjust operation of the prefetcher algorithm based on the tuning state.

Type: Grant

Filed: December 5, 2016

Date of Patent: March 12, 2019

Assignee: Intel Corporation

Inventors: Christopher B. Wilkerson, Ren Wang, Namakkal N. Venkatesan, Patrick Lu
