Patents by Inventor Namakkal N. Venkatesan
Namakkal N. Venkatesan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11513957Abstract: Methods and apparatus implementing Hardware/Software co-optimization to improve performance and energy for inter-VM communication for NFVs and other producer-consumer workloads. The apparatus include multi-core processors with multi-level cache hierarchies including and L1 and L2 cache for each core and a shared last-level cache (LLC). One or more machine-level instructions are provided for proactively demoting cachelines from lower cache levels to higher cache levels, including demoting cachelines from L1/L2 caches to an LLC. Techniques are also provided for implementing hardware/software co-optimization in multi-socket NUMA architecture system, wherein cachelines may be selectively demoted and pushed to an LLC in a remote socket. In addition, techniques are disclosure for implementing early snooping in multi-socket systems to reduce latency when accessing cachelines on remote sockets.Type: GrantFiled: September 21, 2020Date of Patent: November 29, 2022Assignee: Intel CorporationInventors: Ren Wang, Andrew J. Herdrich, Yen-cheng Liu, Herbert H. Hum, Jong Soo Park, Christopher J. Hughes, Namakkal N. Venkatesan, Adrian C. Moga, Aamer Jaleel, Zeshan A. Chishti, Mesut A. Ergin, Jr-shian Tsai, Alexander W. Min, Tsung-yuan C. Tai, Christian Maciocco, Rajesh Sankaran
-
Patent number: 11418495Abstract: Techniques and apparatuses for processing data unit are described. In one embodiment, for example, an apparatus for networking may include at least one memory, logic, at least a portion of the logic comprised in hardware coupled to the at least one memory, the logic to access an encrypted packet having an encrypted portion, determine at least one flow control segment of the encrypted portion, decrypt the at least one flow control segment to generate a partially-decrypted packet comprising a decrypted at least one flow control segment and an encrypted remainder portion, the remainder portion comprising a portion of the encrypted packet that does not include the decrypted at least one flow control segment, access process information in the decrypted at least one flow control segment, and process the partially-decrypted packet according to the process information. Other embodiments are described and claimed.Type: GrantFiled: September 26, 2017Date of Patent: August 16, 2022Inventors: John J. Browne, Chris Macnamara, Namakkal N. Venkatesan, Tomasz Kantecki, Declan W. Doherty
-
Patent number: 11296956Abstract: There is disclosed in one example a computing apparatus, including: a hardware platform configured to communicatively couple with a multi-tenant cloud service, the multi-tenant cloud service including an oversubscribable resource; and a service assurance for oversubscribable resource (SAOR) engine configured to: receive tenant subscriptions to the oversubscribable resource, wherein tenant subscriptions exceed available instances of the oversubscribable resource; receive per-tenant quality of service (QoS) metrics for the oversubscribable resource; receive an allocation request from a guest for allocation of an instance of the oversubscribable resource; compare the request to currently-available instances of the oversubscribable resource; determine that the oversubscribable resource has capacity to service the request according to the QoS metrics of the tenant; and allocate an instance of the oversubscribable resource to the guest.Type: GrantFiled: June 26, 2018Date of Patent: April 5, 2022Assignee: Intel CorporationInventors: Fan Zhang, Roger Keith Wiles, Xin Zeng, Cunming Liang, Namakkal N. Venkatesan
-
Patent number: 11080202Abstract: A computing apparatus, including: a processor; a pointer to a counter memory location; and a lazy increment counter engine to: receive a stimulus to update the counter; and lazy increment the counter including issuing a weakly-ordered increment directive to the pointer.Type: GrantFiled: September 30, 2017Date of Patent: August 3, 2021Assignee: Intel CorporationInventors: Niall D. McDonnell, Christopher MacNamara, John J. Browne, Andrew Cunningham, Brendan Ryan, Patrick Fleming, Namakkal N. Venkatesan, Bruce Richardson, Tomasz Kantecki, Sean Harte, Pierre Laurent
-
Patent number: 10929323Abstract: Apparatus and methods implementing a hardware queue management device for reducing inter-core data transfer overhead by offloading request management and data coherency tasks from the CPU cores. The apparatus include multi-core processors, a shared L3 or last-level cache (“LLC”), and a hardware queue management device to receive, store, and process inter-core data transfer requests. The hardware queue management device further comprises a resource management system to control the rate in which the cores may submit requests to reduce core stalls and dropped requests. Additionally, software instructions are introduced to optimize communication between the cores and the queue management device.Type: GrantFiled: October 14, 2019Date of Patent: February 23, 2021Assignee: Intel CorporationInventors: Ren Wang, Yipeng Wang, Andrew Herdrich, Jr-Shian Tsai, Tsung-Yuan C. Tai, Niall D. McDonnell, Hugh Wilkinson, Bradley A. Burres, Bruce Richardson, Namakkal N. Venkatesan, Debra Bernstein, Edwin Verplanke, Stephen R. Van Doren, An Yan, Andrew Cunningham, David Sonnier, Gage Eads, James T. Clee, Jamison D. Whitesell, Jerry Pirog, Jonathan Kenny, Joseph R. Hasting, Narender Vangati, Stephen Miller, Te K. Ma, William Burroughs
-
Publication number: 20210044503Abstract: There is disclosed in one example a computing apparatus, including: a hardware platform configured to communicatively couple with a multi-tenant cloud service, the multi-tenant cloud service including an oversubscribable resource; and a service assurance for oversubscribable resource (SAOR) engine configured to: receive tenant subscriptions to the oversubscribable resource, wherein tenant subscriptions exceed available instances of the oversubscribable resource; receive per-tenant quality of service (QoS) metrics for the oversubscribable resource; receive an allocation request from a guest for allocation of an instance of the oversubscribable resource; compare the request to currently-available instances of the oversubscribable resource; determine that the oversubscribable resource has capacity to service the request according to the QoS metrics of the tenant; and allocate an instance of the oversubscribable resource to the guest.Type: ApplicationFiled: June 28, 2018Publication date: February 11, 2021Applicant: Intel CorporationInventors: Fan Zhang, Roger Keith Wiles, Xin Zeng, Cunming Liang, Namakkal N. Venkatesan
-
Publication number: 20210004328Abstract: Methods and apparatus implementing Hardware/Software co-optimization to improve performance and energy for inter-VM communication for NFVs and other producer-consumer workloads. The apparatus include multi-core processors with multi-level cache hierarchies including and L1 and L2 cache for each core and a shared last-level cache (LLC). One or more machine-level instructions are provided for proactively demoting cachelines from lower cache levels to higher cache levels, including demoting cachelines from L1/L2 caches to an LLC. Techniques are also provided for implementing hardware/software co-optimization in multi-socket NUMA architecture system, wherein cachelines may be selectively demoted and pushed to an LLC in a remote socket. In addition, techniques are disclosure for implementing early snooping in multi-socket systems to reduce latency when accessing cachelines on remote sockets.Type: ApplicationFiled: September 21, 2020Publication date: January 7, 2021Inventors: Ren Wang, Andrew J. Herdrich, Yen-cheng Liu, Herbert H. Hum, Jong Soo Park, Christopher J. Hughes, Namakkal N. Venkatesan, Adrian C. Moga, Aamer Jaleel, Zeshan A. Chishti, Mesut A. Ergin, Jr-shian Tsai, Alexander W. Min, Tsung-yuan C. Tai, Christian Maciocco, Rajesh Sankaran
-
Patent number: 10817425Abstract: Methods and apparatus implementing Hardware/Software co-optimization to improve performance and energy for inter-VM communication for NFVs and other producer-consumer workloads. The apparatus include multi-core processors with multi-level cache hierarchies including and L1 and L2 cache for each core and a shared last-level cache (LLC). One or more machine-level instructions are provided for proactively demoting cachelines from lower cache levels to higher cache levels, including demoting cachelines from L1/L2 caches to an LLC. Techniques are also provided for implementing hardware/software co-optimization in multi-socket NUMA architecture system, wherein cachelines may be selectively demoted and pushed to an LLC in a remote socket. In addition, techniques are disclosure for implementing early snooping in multi-socket systems to reduce latency when accessing cachelines on remote sockets.Type: GrantFiled: December 26, 2014Date of Patent: October 27, 2020Assignee: Intel CorporationInventors: Ren Wang, Andrew J. Herdrich, Yen-cheng Liu, Herbert H. Hum, Jong Soo Park, Christopher J. Hughes, Namakkal N. Venkatesan, Adrian C. Moga, Aamer Jaleel, Zeshan A. Chishti, Mesut A. Ergin, Jr-shian Tsai, Alexander W. Min, Tsung-yuan C. Tai, Christian Maciocco, Rajesh Sankaran
-
Publication number: 20200285578Abstract: Apparatus, method, and system for implementing a software-transparent hardware predictor for core-to-core data communication optimization are described herein. An embodiment of the apparatus includes a plurality of hardware processor cores each including a private cache; a shared cache that is communicatively coupled to and shared by the plurality of hardware processor cores; and a predictor circuit. The predictor circuit is to track activities relating to a plurality of monitored cache lines in the private cache of a producer hardware processor core (producer core) and to enable a cache line push operation upon determining a target hardware processor core (target core) based on the tracked activities. An execution of the cache line push operation is to cause a plurality of unmonitored cache lines in the private cache of the producer core to be moved to the private cache of the target core.Type: ApplicationFiled: March 18, 2020Publication date: September 10, 2020Applicant: Intel CorporationInventors: Ren Wang, Joseph Nuzman, Samantika S. Sury, Andrew J. Herdrich, Namakkal N. Venkatesan, Anil Vasudevan, Tsung-Yuan C. Tai, Niall D. McDonnell
-
Patent number: 10635590Abstract: Apparatus, method, and system for implementing a software-transparent hardware predictor for core-to-core data communication optimization are described herein. An embodiment of the apparatus includes a plurality of hardware processor cores each including a private cache; a shared cache that is communicatively coupled to and shared by the plurality of hardware processor cores; and a predictor circuit. The predictor circuit is to track activities relating to a plurality of monitored cache lines in the private cache of a producer hardware processor core (producer core) and to enable a cache line push operation upon determining a target hardware processor core (target core) based on the tracked activities. An execution of the cache line push operation is to cause a plurality of unmonitored cache lines in the private cache of the producer core to be moved to the private cache of the target core.Type: GrantFiled: September 29, 2017Date of Patent: April 28, 2020Assignee: Intel CorporationInventors: Ren Wang, Joseph Nuzman, Samantika S. Sury, Andrew J. Herdrich, Namakkal N. Venkatesan, Anil Vasudevan, Tsung-Yuan C. Tai, Niall D. McDonnell
-
Publication number: 20200042479Abstract: Apparatus and methods implementing a hardware queue management device for reducing inter-core data transfer overhead by offloading request management and data coherency tasks from the CPU cores. The apparatus include multi-core processors, a shared L3 or last-level cache (“LLC”), and a hardware queue management device to receive, store, and process inter-core data transfer requests. The hardware queue management device further comprises a resource management system to control the rate in which the cores may submit requests to reduce core stalls and dropped requests. Additionally, software instructions are introduced to optimize communication between the cores and the queue management device.Type: ApplicationFiled: October 14, 2019Publication date: February 6, 2020Applicant: Intel CorporationInventors: Ren Wang, Yipeng Wang, Andrew Herdrich, Jr-Shian Tsai, Tsung-Yuan C. Tai, Niall D. McDonnell, Hugh Wilkinson, Bradley A. Burres, Bruce Richardson, Namakkal N. Venkatesan, Debra Bernstein, Edwin Verplanke, Stephen R. Van Doren, An Yan, Andrew Cunningham, David Sonnier, Gage Eads, James T. Clee, Jamison D. Whitesell, Jerry Pirog, Jonathan Kenny, Joseph R. Hasting, Narender Vangati, Stephen Miller, Te K. Ma, William Burroughs
-
Patent number: 10455063Abstract: Technologies for packet flow classification on a computing device include a hash table including a plurality of hash table buckets in which each hash table bucket maps a plurality of keys to corresponding traffic flows. The computing device performs packet flow classification on received data packets, where the packet flow classification includes a plurality of sequential classification stages and fetch classification operations and non-fetch classification operations are performed in each classification stage. The fetch classification operations include to prefetch a key of a first received data packet based on a set of packet fields of the first received data packet for use during a subsequent classification stage, prefetch a hash table bucket from the hash table based on a key signature of the prefetched key for use during another subsequent classification stage, and prefetch a traffic flow to be applied to the first received data packet based on the prefetched hash table bucket and the prefetched key.Type: GrantFiled: August 15, 2017Date of Patent: October 22, 2019Assignee: Intel CorporationInventors: Cristian Florin F. Dumitrescu, Namakkal N. Venkatesan, Pierre Laurent, Bruce Richardson
-
Patent number: 10445271Abstract: Apparatus and methods implementing a hardware queue management device for reducing inter-core data transfer overhead by offloading request management and data coherency tasks from the CPU cores. The apparatus include multi-core processors, a shared L3 or last-level cache (“LLC”), and a hardware queue management device to receive, store, and process inter-core data transfer requests. The hardware queue management device further comprises a resource management system to control the rate in which the cores may submit requests to reduce core stalls and dropped requests. Additionally, software instructions are introduced to optimize communication between the cores and the queue management device.Type: GrantFiled: January 4, 2016Date of Patent: October 15, 2019Assignee: Intel CorporationInventors: Ren Wang, Namakkal N. Venkatesan, Debra Bernstein, Edwin Verplanke, Stephen R. Van Doren, An Yan, Andrew Cunningham, David Sonnier, Gage Eads, James T. Clee, Jamison D. Whitesell, Yipeng Wang, Jerry Pirog, Jonathan Kenny, Joseph R. Hasting, Narender Vangati, Stephen Miller, Te K. Ma, William Burroughs, Andrew J. Herdrich, Jr-Shian Tsai, Tsung-Yuan C. Tai, Niall D. McDonnell, Hugh Wilkinson, Bradley A. Burres, Bruce Richardson
-
Patent number: 10334041Abstract: A network interface device (NID) interfaced with a host machine communicates with a local link of the host machine to obtain transaction-specific data relied upon by the host machine to be delivered to a destination by the NID according to a reliable message delivery protocol. The NID conducts communications over a network in response to obtaining of the transaction-specific data, with the network communications including execution of the reliable message delivery protocol independent of any operability of the host machine.Type: GrantFiled: November 23, 2015Date of Patent: June 25, 2019Assignee: Intel CorporationInventors: Vadim Sukhomlinov, Kshitij A. Doshi, Namakkal N. Venkatesan, Roger Keith Wiles
-
Patent number: 10284470Abstract: Technologies for managing network flow lookups of a network device include a network controller and a target device, each communicatively coupled to the network device. The network device includes a cache for a processor of the network device and a main memory. The network device additionally includes a multi-level hash table having a first-level hash table stored in the cache of the network device and a second-level hash table stored in the main memory of the network device. The network device is configured to determine whether to store a network flow hash corresponding to a network flow indicating the target device in the first-level or second-level hash table based on a priority of the network flow provided to the network device by the network controller.Type: GrantFiled: December 23, 2014Date of Patent: May 7, 2019Assignee: Intel CorporationInventors: Ren Wang, Namakkal N. Venkatesan, Aamer Jaleel, Tsung-Yuan C. Tai, Sameh Gobriel, Christian Maciocco
-
Patent number: 10268580Abstract: Processors and methods implementing a machine instruction to perform cache line demotion on multiple cache lines to enable efficient sharing of cache lines between processor cores. One general aspect includes a processor comprising: a plurality of hardware processor cores, where each of the hardware processor cores to include a first cache. The processor also includes a second cache, communicatively coupled to and shared by the plurality of hardware processor cores. The processor to support a first machine instruction, the first machine instruction to include a vector register operand identifying a vector register which contains a plurality of data elements each used to identify a cache line. An execution of the first machine instruction by one of the plurality of hardware processor cores to cause a plurality of identified cache lines to be demoted, such that the demoted cache lines are moved from the first cache to the second cache.Type: GrantFiled: September 30, 2016Date of Patent: April 23, 2019Assignee: Intel CorporationInventors: Kshitij A. Doshi, Namakkal N. Venkatesan, Ren Wang, Andrew J. Herdrich
-
Publication number: 20190102303Abstract: Apparatus, method, and system for implementing a software-transparent hardware predictor for core-to-core data communication optimization are described herein. An embodiment of the apparatus includes a plurality of hardware processor cores each including a private cache; a shared cache that is communicatively coupled to and shared by the plurality of hardware processor cores; and a predictor circuit. The predictor circuit is to track activities relating to a plurality of monitored cache lines in the private cache of a producer hardware processor core (producer core) and to enable a cache line push operation upon determining a target hardware processor core (target core) based on the tracked activities. An execution of the cache line push operation is to cause a plurality of unmonitored cache lines in the private cache of the producer core to be moved to the private cache of the target core.Type: ApplicationFiled: September 29, 2017Publication date: April 4, 2019Inventors: Ren Wang, Joseph Nuzman, Samantika S. Sury, Andrew J. Herdrich, Namakkal N. Venkatesan, Anil Vasudevan, Tsung-Yuan C. Tai, Niall D. McDonnell
-
Publication number: 20190102312Abstract: A computing apparatus, including: a processor; a pointer to a counter memory location; and a lazy increment counter engine to: receive a stimulus to update the counter; and lazy increment the counter including issuing a weakly-ordered increment directive to the pointer.Type: ApplicationFiled: September 30, 2017Publication date: April 4, 2019Inventors: Niall D. McDonnell, Christopher MacNamara, John J. Browne, Andrew Cunningham, Brendan Ryan, Patrick Fleming, Namakkal N. Venkatesan, Bruce Richardson, Tomasz Kantecki, Sean Harte, Pierre Laurent
-
Publication number: 20190097984Abstract: Techniques and apparatuses for processing data unit are described. In one embodiment, for example, an apparatus for networking may include at least one memory, logic, at least a portion of the logic comprised in hardware coupled to the at least one memory, the logic to access an encrypted packet having an encrypted portion, determine at least one flow control segment of the encrypted portion, decrypt the at least one flow control segment to generate a partially-decrypted packet comprising a decrypted at least one flow control segment and an encrypted remainder portion, the remainder portion comprising a portion of the encrypted packet that does not include the decrypted at least one flow control segment, access process information in the decrypted at least one flow control segment, and process the partially-decrypted packet according to the process information. Other embodiments are described and claimed.Type: ApplicationFiled: September 26, 2017Publication date: March 28, 2019Applicant: INTEL CORPORATIONInventors: John J. Browne, Chris Macnamara, Namakkal N. Venkatesan, Tomasz Kantecki, Declan W. Doherty
-
Patent number: 10229060Abstract: Embodiments provide for a processor comprising a cache, a prefetcher to select information according to a prefetcher algorithm and to send the selected information to the cache, and a prefetch tuning buffer including tuning state for the set of candidate prefetcher algorithms, wherein the prefetcher is to adjust operation of the prefetcher algorithm based on the tuning state.Type: GrantFiled: December 5, 2016Date of Patent: March 12, 2019Assignee: INTEL CORPORATIONInventors: Christopher B. Wilkerson, Ren Wang, Namakkal N. Venkatesan, Patrick Lu