Patents by Inventor Namakkal N. Venkatesan

Namakkal N. Venkatesan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11513957
    Abstract: Methods and apparatus implementing Hardware/Software co-optimization to improve performance and energy for inter-VM communication for NFVs and other producer-consumer workloads. The apparatus include multi-core processors with multi-level cache hierarchies including and L1 and L2 cache for each core and a shared last-level cache (LLC). One or more machine-level instructions are provided for proactively demoting cachelines from lower cache levels to higher cache levels, including demoting cachelines from L1/L2 caches to an LLC. Techniques are also provided for implementing hardware/software co-optimization in multi-socket NUMA architecture system, wherein cachelines may be selectively demoted and pushed to an LLC in a remote socket. In addition, techniques are disclosure for implementing early snooping in multi-socket systems to reduce latency when accessing cachelines on remote sockets.
    Type: Grant
    Filed: September 21, 2020
    Date of Patent: November 29, 2022
    Assignee: Intel Corporation
    Inventors: Ren Wang, Andrew J. Herdrich, Yen-cheng Liu, Herbert H. Hum, Jong Soo Park, Christopher J. Hughes, Namakkal N. Venkatesan, Adrian C. Moga, Aamer Jaleel, Zeshan A. Chishti, Mesut A. Ergin, Jr-shian Tsai, Alexander W. Min, Tsung-yuan C. Tai, Christian Maciocco, Rajesh Sankaran
  • Patent number: 11418495
    Abstract: Techniques and apparatuses for processing data unit are described. In one embodiment, for example, an apparatus for networking may include at least one memory, logic, at least a portion of the logic comprised in hardware coupled to the at least one memory, the logic to access an encrypted packet having an encrypted portion, determine at least one flow control segment of the encrypted portion, decrypt the at least one flow control segment to generate a partially-decrypted packet comprising a decrypted at least one flow control segment and an encrypted remainder portion, the remainder portion comprising a portion of the encrypted packet that does not include the decrypted at least one flow control segment, access process information in the decrypted at least one flow control segment, and process the partially-decrypted packet according to the process information. Other embodiments are described and claimed.
    Type: Grant
    Filed: September 26, 2017
    Date of Patent: August 16, 2022
    Inventors: John J. Browne, Chris Macnamara, Namakkal N. Venkatesan, Tomasz Kantecki, Declan W. Doherty
  • Patent number: 11296956
    Abstract: There is disclosed in one example a computing apparatus, including: a hardware platform configured to communicatively couple with a multi-tenant cloud service, the multi-tenant cloud service including an oversubscribable resource; and a service assurance for oversubscribable resource (SAOR) engine configured to: receive tenant subscriptions to the oversubscribable resource, wherein tenant subscriptions exceed available instances of the oversubscribable resource; receive per-tenant quality of service (QoS) metrics for the oversubscribable resource; receive an allocation request from a guest for allocation of an instance of the oversubscribable resource; compare the request to currently-available instances of the oversubscribable resource; determine that the oversubscribable resource has capacity to service the request according to the QoS metrics of the tenant; and allocate an instance of the oversubscribable resource to the guest.
    Type: Grant
    Filed: June 26, 2018
    Date of Patent: April 5, 2022
    Assignee: Intel Corporation
    Inventors: Fan Zhang, Roger Keith Wiles, Xin Zeng, Cunming Liang, Namakkal N. Venkatesan
  • Patent number: 11080202
    Abstract: A computing apparatus, including: a processor; a pointer to a counter memory location; and a lazy increment counter engine to: receive a stimulus to update the counter; and lazy increment the counter including issuing a weakly-ordered increment directive to the pointer.
    Type: Grant
    Filed: September 30, 2017
    Date of Patent: August 3, 2021
    Assignee: Intel Corporation
    Inventors: Niall D. McDonnell, Christopher MacNamara, John J. Browne, Andrew Cunningham, Brendan Ryan, Patrick Fleming, Namakkal N. Venkatesan, Bruce Richardson, Tomasz Kantecki, Sean Harte, Pierre Laurent
  • Patent number: 10929323
    Abstract: Apparatus and methods implementing a hardware queue management device for reducing inter-core data transfer overhead by offloading request management and data coherency tasks from the CPU cores. The apparatus include multi-core processors, a shared L3 or last-level cache (“LLC”), and a hardware queue management device to receive, store, and process inter-core data transfer requests. The hardware queue management device further comprises a resource management system to control the rate in which the cores may submit requests to reduce core stalls and dropped requests. Additionally, software instructions are introduced to optimize communication between the cores and the queue management device.
    Type: Grant
    Filed: October 14, 2019
    Date of Patent: February 23, 2021
    Assignee: Intel Corporation
    Inventors: Ren Wang, Yipeng Wang, Andrew Herdrich, Jr-Shian Tsai, Tsung-Yuan C. Tai, Niall D. McDonnell, Hugh Wilkinson, Bradley A. Burres, Bruce Richardson, Namakkal N. Venkatesan, Debra Bernstein, Edwin Verplanke, Stephen R. Van Doren, An Yan, Andrew Cunningham, David Sonnier, Gage Eads, James T. Clee, Jamison D. Whitesell, Jerry Pirog, Jonathan Kenny, Joseph R. Hasting, Narender Vangati, Stephen Miller, Te K. Ma, William Burroughs
  • Publication number: 20210044503
    Abstract: There is disclosed in one example a computing apparatus, including: a hardware platform configured to communicatively couple with a multi-tenant cloud service, the multi-tenant cloud service including an oversubscribable resource; and a service assurance for oversubscribable resource (SAOR) engine configured to: receive tenant subscriptions to the oversubscribable resource, wherein tenant subscriptions exceed available instances of the oversubscribable resource; receive per-tenant quality of service (QoS) metrics for the oversubscribable resource; receive an allocation request from a guest for allocation of an instance of the oversubscribable resource; compare the request to currently-available instances of the oversubscribable resource; determine that the oversubscribable resource has capacity to service the request according to the QoS metrics of the tenant; and allocate an instance of the oversubscribable resource to the guest.
    Type: Application
    Filed: June 28, 2018
    Publication date: February 11, 2021
    Applicant: Intel Corporation
    Inventors: Fan Zhang, Roger Keith Wiles, Xin Zeng, Cunming Liang, Namakkal N. Venkatesan
  • Publication number: 20210004328
    Abstract: Methods and apparatus implementing Hardware/Software co-optimization to improve performance and energy for inter-VM communication for NFVs and other producer-consumer workloads. The apparatus include multi-core processors with multi-level cache hierarchies including and L1 and L2 cache for each core and a shared last-level cache (LLC). One or more machine-level instructions are provided for proactively demoting cachelines from lower cache levels to higher cache levels, including demoting cachelines from L1/L2 caches to an LLC. Techniques are also provided for implementing hardware/software co-optimization in multi-socket NUMA architecture system, wherein cachelines may be selectively demoted and pushed to an LLC in a remote socket. In addition, techniques are disclosure for implementing early snooping in multi-socket systems to reduce latency when accessing cachelines on remote sockets.
    Type: Application
    Filed: September 21, 2020
    Publication date: January 7, 2021
    Inventors: Ren Wang, Andrew J. Herdrich, Yen-cheng Liu, Herbert H. Hum, Jong Soo Park, Christopher J. Hughes, Namakkal N. Venkatesan, Adrian C. Moga, Aamer Jaleel, Zeshan A. Chishti, Mesut A. Ergin, Jr-shian Tsai, Alexander W. Min, Tsung-yuan C. Tai, Christian Maciocco, Rajesh Sankaran
  • Patent number: 10817425
    Abstract: Methods and apparatus implementing Hardware/Software co-optimization to improve performance and energy for inter-VM communication for NFVs and other producer-consumer workloads. The apparatus include multi-core processors with multi-level cache hierarchies including and L1 and L2 cache for each core and a shared last-level cache (LLC). One or more machine-level instructions are provided for proactively demoting cachelines from lower cache levels to higher cache levels, including demoting cachelines from L1/L2 caches to an LLC. Techniques are also provided for implementing hardware/software co-optimization in multi-socket NUMA architecture system, wherein cachelines may be selectively demoted and pushed to an LLC in a remote socket. In addition, techniques are disclosure for implementing early snooping in multi-socket systems to reduce latency when accessing cachelines on remote sockets.
    Type: Grant
    Filed: December 26, 2014
    Date of Patent: October 27, 2020
    Assignee: Intel Corporation
    Inventors: Ren Wang, Andrew J. Herdrich, Yen-cheng Liu, Herbert H. Hum, Jong Soo Park, Christopher J. Hughes, Namakkal N. Venkatesan, Adrian C. Moga, Aamer Jaleel, Zeshan A. Chishti, Mesut A. Ergin, Jr-shian Tsai, Alexander W. Min, Tsung-yuan C. Tai, Christian Maciocco, Rajesh Sankaran
  • Publication number: 20200285578
    Abstract: Apparatus, method, and system for implementing a software-transparent hardware predictor for core-to-core data communication optimization are described herein. An embodiment of the apparatus includes a plurality of hardware processor cores each including a private cache; a shared cache that is communicatively coupled to and shared by the plurality of hardware processor cores; and a predictor circuit. The predictor circuit is to track activities relating to a plurality of monitored cache lines in the private cache of a producer hardware processor core (producer core) and to enable a cache line push operation upon determining a target hardware processor core (target core) based on the tracked activities. An execution of the cache line push operation is to cause a plurality of unmonitored cache lines in the private cache of the producer core to be moved to the private cache of the target core.
    Type: Application
    Filed: March 18, 2020
    Publication date: September 10, 2020
    Applicant: Intel Corporation
    Inventors: Ren Wang, Joseph Nuzman, Samantika S. Sury, Andrew J. Herdrich, Namakkal N. Venkatesan, Anil Vasudevan, Tsung-Yuan C. Tai, Niall D. McDonnell
  • Patent number: 10635590
    Abstract: Apparatus, method, and system for implementing a software-transparent hardware predictor for core-to-core data communication optimization are described herein. An embodiment of the apparatus includes a plurality of hardware processor cores each including a private cache; a shared cache that is communicatively coupled to and shared by the plurality of hardware processor cores; and a predictor circuit. The predictor circuit is to track activities relating to a plurality of monitored cache lines in the private cache of a producer hardware processor core (producer core) and to enable a cache line push operation upon determining a target hardware processor core (target core) based on the tracked activities. An execution of the cache line push operation is to cause a plurality of unmonitored cache lines in the private cache of the producer core to be moved to the private cache of the target core.
    Type: Grant
    Filed: September 29, 2017
    Date of Patent: April 28, 2020
    Assignee: Intel Corporation
    Inventors: Ren Wang, Joseph Nuzman, Samantika S. Sury, Andrew J. Herdrich, Namakkal N. Venkatesan, Anil Vasudevan, Tsung-Yuan C. Tai, Niall D. McDonnell
  • Publication number: 20200042479
    Abstract: Apparatus and methods implementing a hardware queue management device for reducing inter-core data transfer overhead by offloading request management and data coherency tasks from the CPU cores. The apparatus include multi-core processors, a shared L3 or last-level cache (“LLC”), and a hardware queue management device to receive, store, and process inter-core data transfer requests. The hardware queue management device further comprises a resource management system to control the rate in which the cores may submit requests to reduce core stalls and dropped requests. Additionally, software instructions are introduced to optimize communication between the cores and the queue management device.
    Type: Application
    Filed: October 14, 2019
    Publication date: February 6, 2020
    Applicant: Intel Corporation
    Inventors: Ren Wang, Yipeng Wang, Andrew Herdrich, Jr-Shian Tsai, Tsung-Yuan C. Tai, Niall D. McDonnell, Hugh Wilkinson, Bradley A. Burres, Bruce Richardson, Namakkal N. Venkatesan, Debra Bernstein, Edwin Verplanke, Stephen R. Van Doren, An Yan, Andrew Cunningham, David Sonnier, Gage Eads, James T. Clee, Jamison D. Whitesell, Jerry Pirog, Jonathan Kenny, Joseph R. Hasting, Narender Vangati, Stephen Miller, Te K. Ma, William Burroughs
  • Patent number: 10455063
    Abstract: Technologies for packet flow classification on a computing device include a hash table including a plurality of hash table buckets in which each hash table bucket maps a plurality of keys to corresponding traffic flows. The computing device performs packet flow classification on received data packets, where the packet flow classification includes a plurality of sequential classification stages and fetch classification operations and non-fetch classification operations are performed in each classification stage. The fetch classification operations include to prefetch a key of a first received data packet based on a set of packet fields of the first received data packet for use during a subsequent classification stage, prefetch a hash table bucket from the hash table based on a key signature of the prefetched key for use during another subsequent classification stage, and prefetch a traffic flow to be applied to the first received data packet based on the prefetched hash table bucket and the prefetched key.
    Type: Grant
    Filed: August 15, 2017
    Date of Patent: October 22, 2019
    Assignee: Intel Corporation
    Inventors: Cristian Florin F. Dumitrescu, Namakkal N. Venkatesan, Pierre Laurent, Bruce Richardson
  • Patent number: 10445271
    Abstract: Apparatus and methods implementing a hardware queue management device for reducing inter-core data transfer overhead by offloading request management and data coherency tasks from the CPU cores. The apparatus include multi-core processors, a shared L3 or last-level cache (“LLC”), and a hardware queue management device to receive, store, and process inter-core data transfer requests. The hardware queue management device further comprises a resource management system to control the rate in which the cores may submit requests to reduce core stalls and dropped requests. Additionally, software instructions are introduced to optimize communication between the cores and the queue management device.
    Type: Grant
    Filed: January 4, 2016
    Date of Patent: October 15, 2019
    Assignee: Intel Corporation
    Inventors: Ren Wang, Namakkal N. Venkatesan, Debra Bernstein, Edwin Verplanke, Stephen R. Van Doren, An Yan, Andrew Cunningham, David Sonnier, Gage Eads, James T. Clee, Jamison D. Whitesell, Yipeng Wang, Jerry Pirog, Jonathan Kenny, Joseph R. Hasting, Narender Vangati, Stephen Miller, Te K. Ma, William Burroughs, Andrew J. Herdrich, Jr-Shian Tsai, Tsung-Yuan C. Tai, Niall D. McDonnell, Hugh Wilkinson, Bradley A. Burres, Bruce Richardson
  • Patent number: 10334041
    Abstract: A network interface device (NID) interfaced with a host machine communicates with a local link of the host machine to obtain transaction-specific data relied upon by the host machine to be delivered to a destination by the NID according to a reliable message delivery protocol. The NID conducts communications over a network in response to obtaining of the transaction-specific data, with the network communications including execution of the reliable message delivery protocol independent of any operability of the host machine.
    Type: Grant
    Filed: November 23, 2015
    Date of Patent: June 25, 2019
    Assignee: Intel Corporation
    Inventors: Vadim Sukhomlinov, Kshitij A. Doshi, Namakkal N. Venkatesan, Roger Keith Wiles
  • Patent number: 10284470
    Abstract: Technologies for managing network flow lookups of a network device include a network controller and a target device, each communicatively coupled to the network device. The network device includes a cache for a processor of the network device and a main memory. The network device additionally includes a multi-level hash table having a first-level hash table stored in the cache of the network device and a second-level hash table stored in the main memory of the network device. The network device is configured to determine whether to store a network flow hash corresponding to a network flow indicating the target device in the first-level or second-level hash table based on a priority of the network flow provided to the network device by the network controller.
    Type: Grant
    Filed: December 23, 2014
    Date of Patent: May 7, 2019
    Assignee: Intel Corporation
    Inventors: Ren Wang, Namakkal N. Venkatesan, Aamer Jaleel, Tsung-Yuan C. Tai, Sameh Gobriel, Christian Maciocco
  • Patent number: 10268580
    Abstract: Processors and methods implementing a machine instruction to perform cache line demotion on multiple cache lines to enable efficient sharing of cache lines between processor cores. One general aspect includes a processor comprising: a plurality of hardware processor cores, where each of the hardware processor cores to include a first cache. The processor also includes a second cache, communicatively coupled to and shared by the plurality of hardware processor cores. The processor to support a first machine instruction, the first machine instruction to include a vector register operand identifying a vector register which contains a plurality of data elements each used to identify a cache line. An execution of the first machine instruction by one of the plurality of hardware processor cores to cause a plurality of identified cache lines to be demoted, such that the demoted cache lines are moved from the first cache to the second cache.
    Type: Grant
    Filed: September 30, 2016
    Date of Patent: April 23, 2019
    Assignee: Intel Corporation
    Inventors: Kshitij A. Doshi, Namakkal N. Venkatesan, Ren Wang, Andrew J. Herdrich
  • Publication number: 20190102303
    Abstract: Apparatus, method, and system for implementing a software-transparent hardware predictor for core-to-core data communication optimization are described herein. An embodiment of the apparatus includes a plurality of hardware processor cores each including a private cache; a shared cache that is communicatively coupled to and shared by the plurality of hardware processor cores; and a predictor circuit. The predictor circuit is to track activities relating to a plurality of monitored cache lines in the private cache of a producer hardware processor core (producer core) and to enable a cache line push operation upon determining a target hardware processor core (target core) based on the tracked activities. An execution of the cache line push operation is to cause a plurality of unmonitored cache lines in the private cache of the producer core to be moved to the private cache of the target core.
    Type: Application
    Filed: September 29, 2017
    Publication date: April 4, 2019
    Inventors: Ren Wang, Joseph Nuzman, Samantika S. Sury, Andrew J. Herdrich, Namakkal N. Venkatesan, Anil Vasudevan, Tsung-Yuan C. Tai, Niall D. McDonnell
  • Publication number: 20190102312
    Abstract: A computing apparatus, including: a processor; a pointer to a counter memory location; and a lazy increment counter engine to: receive a stimulus to update the counter; and lazy increment the counter including issuing a weakly-ordered increment directive to the pointer.
    Type: Application
    Filed: September 30, 2017
    Publication date: April 4, 2019
    Inventors: Niall D. McDonnell, Christopher MacNamara, John J. Browne, Andrew Cunningham, Brendan Ryan, Patrick Fleming, Namakkal N. Venkatesan, Bruce Richardson, Tomasz Kantecki, Sean Harte, Pierre Laurent
  • Publication number: 20190097984
    Abstract: Techniques and apparatuses for processing data unit are described. In one embodiment, for example, an apparatus for networking may include at least one memory, logic, at least a portion of the logic comprised in hardware coupled to the at least one memory, the logic to access an encrypted packet having an encrypted portion, determine at least one flow control segment of the encrypted portion, decrypt the at least one flow control segment to generate a partially-decrypted packet comprising a decrypted at least one flow control segment and an encrypted remainder portion, the remainder portion comprising a portion of the encrypted packet that does not include the decrypted at least one flow control segment, access process information in the decrypted at least one flow control segment, and process the partially-decrypted packet according to the process information. Other embodiments are described and claimed.
    Type: Application
    Filed: September 26, 2017
    Publication date: March 28, 2019
    Applicant: INTEL CORPORATION
    Inventors: John J. Browne, Chris Macnamara, Namakkal N. Venkatesan, Tomasz Kantecki, Declan W. Doherty
  • Patent number: 10229060
    Abstract: Embodiments provide for a processor comprising a cache, a prefetcher to select information according to a prefetcher algorithm and to send the selected information to the cache, and a prefetch tuning buffer including tuning state for the set of candidate prefetcher algorithms, wherein the prefetcher is to adjust operation of the prefetcher algorithm based on the tuning state.
    Type: Grant
    Filed: December 5, 2016
    Date of Patent: March 12, 2019
    Assignee: INTEL CORPORATION
    Inventors: Christopher B. Wilkerson, Ren Wang, Namakkal N. Venkatesan, Patrick Lu