Patents by Inventor Sreenivas Subramoney

Sreenivas Subramoney has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20180285268
    Abstract: In one embodiment, a processor comprises a processing core, a last level cache (LLC), and a mid-level cache. The mid-level cache is to determine that an idle indicator has been set, wherein the idle indicator is set based on an amount of activity at the LLC, and, based on that determination, identify a first cache line to be evicted from a first set of cache lines of the mid-level cache and send a request to write the first cache line to the LLC.
    Type: Application
    Filed: March 31, 2017
    Publication date: October 4, 2018
    Applicant: Intel Corporation
    Inventors: Kunal Kishore Korgaonkar, Ishwar S. Bhati, Huichu Liu, Jayesh Gaur, Sasikanth Manipatruni, Sreenivas Subramoney, Tanay Karnik, Hong Wang, Ian A. Young
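The flow described in this entry's abstract can be illustrated in Python. This is a minimal sketch, not the patented hardware logic; the class names, the per-interval activity counter, and the threshold value are all hypothetical.

```python
# Hypothetical sketch of idle-indicator-driven eviction from a mid-level
# cache (MLC) to the last level cache (LLC). Not the patented hardware logic.

IDLE_ACTIVITY_THRESHOLD = 2  # LLC requests per interval; illustrative value

class LastLevelCache:
    def __init__(self):
        self.activity = 0          # requests observed in the current interval
        self.idle_indicator = False
        self.lines = {}

    def end_interval(self):
        # Set the idle indicator based on the amount of LLC activity.
        self.idle_indicator = self.activity < IDLE_ACTIVITY_THRESHOLD
        self.activity = 0

    def write(self, tag, data):
        self.activity += 1
        self.lines[tag] = data

class MidLevelCache:
    def __init__(self, llc):
        self.llc = llc
        self.sets = {0: [("A", "dataA"), ("B", "dataB")]}  # one toy set

    def maybe_evict(self, set_index):
        # Only evict opportunistically when the LLC has signalled idleness.
        if not self.llc.idle_indicator:
            return
        if self.sets[set_index]:
            tag, data = self.sets[set_index].pop(0)  # pick a victim line
            self.llc.write(tag, data)                # request write to LLC

llc = LastLevelCache()
llc.end_interval()                 # no activity yet, so the LLC reports idle
mlc = MidLevelCache(llc)
mlc.maybe_evict(0)
print(llc.lines)                   # {'A': 'dataA'}
```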
  • Publication number: 20180285364
    Abstract: A processor may include a plurality of processing elements and a hardware accelerator for selecting data elements. The hardware accelerator may: access an input data set comprising a set of data elements, each data element having a score value; increment bin counters based on the score values of the set of data elements, each bin counter to count a number of data elements with an associated score value; determine a cumulative sum of count values for a sequence of bin counters, the sequence beginning with a first bin counter of the plurality of bin counters; identify a second bin counter in the sequence of bin counters at which the cumulative sum reaches a selection quantity N; and generate an output data set based on a comparison of the set of data elements to a threshold score associated with the second bin counter.
    Type: Application
    Filed: March 31, 2017
    Publication date: October 4, 2018
    Inventors: Mahesh Mamidipaka, Srivatsava Jandhyala, Anish N K, Nagadastagiri Reddy C, Sreenivas Subramoney
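The selection mechanism above is essentially a histogram (bin-counter) top-N cut. A minimal Python sketch follows; scanning bins from the highest score downward and the tie handling at the threshold are assumptions the abstract leaves open.

```python
# Histogram-based selection of (roughly) the N highest-scoring elements,
# as the abstract describes. Scanning from the highest score downward is
# an assumption; the patent leaves the bin ordering to the implementation.

def select_top_n(elements, scores, n, max_score):
    bins = [0] * (max_score + 1)
    for s in scores:
        bins[s] += 1                         # increment bin counters

    cumulative = 0
    threshold = 0
    for score in range(max_score, -1, -1):   # highest bin first
        cumulative += bins[score]
        if cumulative >= n:                  # cumulative sum reaches N
            threshold = score                # the "second bin counter"
            break

    # Output data set: elements whose score clears the threshold.
    return [e for e, s in zip(elements, scores) if s >= threshold]

elements = ["a", "b", "c", "d", "e"]
scores   = [ 3,   9,   1,   7,   7 ]
print(select_top_n(elements, scores, n=3, max_score=9))  # ['b', 'd', 'e']
```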
  • Publication number: 20180232235
    Abstract: A processor includes a memory to hold a buffer to store data dependencies comprising nodes and edges for each of a plurality of micro-operations. The nodes include a first node for dispatch, a second node for execution, and a third node for commit. A detector circuit is to queue, in the buffer, the nodes of a micro-operation; add, to determine a node weight for each of the nodes of the micro-operation, an edge weight to a previous node weight of a connected micro-operation that yields a maximum node weight for the node, wherein the node weight comprises a number of execution cycles of an out-of-order (OOO) pipeline of the processor and the edge weight comprises a number of execution cycles to execute the connected micro-operation; and identify, as a critical path, a path through the data dependencies that yields the maximum node weight for the micro-operation.
    Type: Application
    Filed: February 15, 2017
    Publication date: August 16, 2018
    Inventors: Jayesh Gaur, Pooja Roy, Sreenivas Subramoney, Hong Wang, Ronak Singhal
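The detector's recurrence is a longest-path computation over a dependence DAG of dispatch/execute/commit nodes. A minimal sketch, assuming a topologically ordered node list and a hypothetical graph encoding:

```python
# Longest-path ("critical path") computation over a micro-op dependence DAG,
# mirroring the node/edge-weight recurrence in the abstract. Weights are in
# execution cycles; nodes are assumed to arrive in topological order.

def critical_path(nodes, edges):
    # edges: {dst: [(src, edge_weight), ...]}
    weight, pred = {}, {}
    for node in nodes:
        best, best_src = 0, None
        for src, w in edges.get(node, []):
            # Add the edge weight to the predecessor's node weight and keep
            # whichever predecessor yields the maximum node weight.
            if weight[src] + w > best:
                best, best_src = weight[src] + w, src
        weight[node], pred[node] = best, best_src
    end = max(nodes, key=lambda n: weight[n])
    path, node = [], end
    while node is not None:             # walk predecessors back to the start
        path.append(node)
        node = pred[node]
    return list(reversed(path)), weight[end]

# Two uops, each with dispatch (D), execute (E), commit (C) nodes.
nodes = ["D1", "E1", "C1", "D2", "E2", "C2"]
edges = {
    "E1": [("D1", 1)], "C1": [("E1", 3)],
    "D2": [("D1", 1)], "E2": [("D2", 1), ("E1", 3)],  # E2 consumes E1's result
    "C2": [("E2", 2), ("C1", 1)],
}
print(critical_path(nodes, edges))  # (['D1', 'E1', 'E2', 'C2'], 6)
```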
  • Publication number: 20180232311
    Abstract: A processor includes a processing core and a cache controller including a read queue and a separate write queue. The read queue is to buffer read requests of the processing core to a non-volatile memory last level cache (NVM-LLC), and the write queue is to buffer write requests to the NVM-LLC. The cache controller is to detect whether the write queue is full. The cache controller further prioritizes a first order of sending requests to the NVM-LLC when the write queue contains an empty slot, the first order specifying a first pattern of sending the read requests before the write requests, and prioritizes a second order of sending requests to the NVM-LLC in response to a determination that the write queue is full, the second order specifying a second pattern of alternating between sending a write request from the write queue and a read request from the read queue.
    Type: Application
    Filed: February 13, 2017
    Publication date: August 16, 2018
    Inventors: Ishwar S. Bhati, Huichu Liu, Jayesh Gaur, Kunal Korgaonkar, Sasikanth Manipatruni, Sreenivas Subramoney, Tanay Karnik, Hong Wang, Ian A. Young
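The two dispatch orders can be modeled in a few lines. A minimal sketch; the queue capacity and the choice to start alternation with a write are hypothetical.

```python
# Hypothetical model of the two dispatch orders the abstract describes:
# reads before writes while the write queue has an empty slot, and
# alternation between writes and reads once the write queue is full.
from collections import deque

WRITE_QUEUE_CAPACITY = 2   # illustrative size

def dispatch_order(read_q, write_q):
    sent = []
    send_write_next = True                  # alternation starts with a write
    while read_q or write_q:
        write_full = len(write_q) >= WRITE_QUEUE_CAPACITY
        if not write_full:
            # First order: drain reads ahead of writes.
            q = read_q if read_q else write_q
        else:
            # Second order: alternate a write and a read.
            q = write_q if (send_write_next or not read_q) else read_q
            send_write_next = not send_write_next
        sent.append(q.popleft())
    return sent

reads = deque(["R0", "R1", "R2"])
writes = deque(["W0", "W1", "W2", "W3"])    # write queue starts full
print(dispatch_order(reads, writes))
# ['W0', 'R0', 'W1', 'R1', 'W2', 'R2', 'W3']
```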
  • Publication number: 20180203799
    Abstract: A memory-efficient last level cache (LLC) architecture is described. A processor implementing an LLC architecture may include a processor core, a last level cache (LLC) operatively coupled to the processor core, and a cache controller operatively coupled to the LLC. The cache controller is to monitor a bandwidth demand of a channel between the processor core and a dynamic random-access memory (DRAM) device associated with the LLC. The cache controller is further to perform a first defined number of consecutive reads from the DRAM device when the bandwidth demand exceeds a first threshold value and perform a second defined number of consecutive writes of modified lines from the LLC to the DRAM device when the bandwidth demand exceeds the first threshold value.
    Type: Application
    Filed: January 18, 2017
    Publication date: July 19, 2018
    Inventors: Jayesh Gaur, Ayan Mandal, Anant Nori, Sreenivas Subramoney
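The batching policy reads naturally as: above a demand threshold, issue fixed-length runs of reads and then of writebacks rather than interleaving. A minimal sketch with illustrative thresholds and batch lengths:

```python
# Hypothetical sketch of bandwidth-aware batching: when measured channel
# demand exceeds a threshold, issue fixed-length runs of consecutive reads
# and then consecutive writebacks, instead of interleaving them.
from collections import deque

BANDWIDTH_THRESHOLD = 0.8   # fraction of peak; illustrative
READ_BATCH = 2              # defined number of consecutive reads
WRITE_BATCH = 2             # defined number of consecutive writebacks

def schedule(reads, writebacks, bandwidth_demand):
    issued = []
    while reads or writebacks:
        if bandwidth_demand > BANDWIDTH_THRESHOLD:
            for _ in range(min(READ_BATCH, len(reads))):
                issued.append(reads.popleft())       # consecutive reads
            for _ in range(min(WRITE_BATCH, len(writebacks))):
                issued.append(writebacks.popleft())  # consecutive writes
        else:
            # Low demand: no batching pressure, just drain in FIFO order.
            q = reads if reads else writebacks
            issued.append(q.popleft())
    return issued

reads = deque(["R0", "R1", "R2"])
wbs = deque(["W0", "W1", "W2"])
print(schedule(reads, wbs, bandwidth_demand=0.9))
# ['R0', 'R1', 'W0', 'W1', 'R2', 'W2']
```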
  • Publication number: 20180189587
    Abstract: Aspects of the present disclosure relate to technologies (systems, devices, methods, etc.) for performing feature detection and/or feature tracking based on image data. In embodiments, the technologies include or leverage a SLAM hardware accelerator (SWA) that includes a feature detection component and optionally a feature tracking component. The feature detection component may be configured to perform feature detection on working data encompassed by a sliding window. The feature tracking component is configured to perform feature tracking operations to track one or more detected features, e.g., using normalized cross correlation (NCC) or another method.
    Type: Application
    Filed: November 29, 2017
    Publication date: July 5, 2018
    Applicant: Intel Corporation
    Inventors: Dipan Kumar Mandal, Om J. Omer, Lance E. Hacking, James Radford, Sreenivas Subramoney, Eagle Jones, Gautham N. Chinya
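The tracking step names normalized cross correlation (NCC); the textbook zero-mean NCC between two equal-size patches is shown below. This is the generic formula, not the accelerator's datapath.

```python
# Zero-mean normalized cross correlation (NCC) between two equal-size
# patches, the standard similarity measure named in the abstract.
import numpy as np

def ncc(patch, template):
    p = patch - patch.mean()
    t = template - template.mean()
    denom = np.sqrt((p * p).sum() * (t * t).sum())
    return float((p * t).sum() / denom) if denom else 0.0

a = np.array([[1.0, 2.0], [3.0, 4.0]])
print(ncc(a, a))          # 1.0  (a patch correlates perfectly with itself)
print(ncc(a, 5.0 - a))    # -1.0 (perfect inverse correlation)
```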
  • Publication number: 20180188994
    Abstract: Systems for page management using local page information are disclosed. The system may include a processor, including a memory controller, and a memory, including a row buffer. The memory controller may include circuitry to determine that a page stored in the row buffer has been idle for a time exceeding a predetermined threshold, determine whether the page is exempt from idle page closures, and, based on a determination that the page is exempt, refrain from closing the page. Associated methods are also disclosed.
    Type: Application
    Filed: December 29, 2016
    Publication date: July 5, 2018
    Inventors: Sriseshan Srikanth, Lavanya Subramanian, Sreenivas Subramoney
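A minimal sketch of the exemption check; the threshold value and the page representation are hypothetical.

```python
# Hypothetical sketch of exemption-aware idle page closure: an idle page
# held in the row buffer is closed only if it is not marked exempt.
IDLE_THRESHOLD_CYCLES = 100   # illustrative predetermined threshold

def maybe_close_page(page, now):
    idle_time = now - page["last_access"]
    if idle_time <= IDLE_THRESHOLD_CYCLES:
        return False                   # not idle long enough
    if page["exempt"]:
        return False                   # exempt: refrain from closing
    page["open"] = False               # close the row
    return True

page = {"open": True, "last_access": 0, "exempt": True}
print(maybe_close_page(page, now=500))   # False: idle but exempt
page["exempt"] = False
print(maybe_close_page(page, now=500))   # True: idle and not exempt
```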
  • Patent number: 10013352
    Abstract: Embodiments described include systems, apparatuses, and methods using sectored dynamic random access memory (DRAM) cache. An exemplary apparatus may include at least one hardware processor core and a sectored dynamic random access memory (DRAM) cache coupled to the at least one hardware processor core.
    Type: Grant
    Filed: September 26, 2014
    Date of Patent: July 3, 2018
    Assignee: Intel Corporation
    Inventors: Sreenivas Subramoney, Jayesh Gaur, Mukesh Agrawal, Mainak Chaudhuri
  • Publication number: 20180181329
    Abstract: Processor, apparatus, and method for reordering a stream of memory access requests to establish locality are described herein. One embodiment of a method includes: storing in a request queue memory access requests generated by a plurality of execution units, the memory access requests comprising a first request to access a first memory page in a memory and a second request to access a second memory page in the memory; maintaining a list of unique memory pages, each unique memory page associated with one or more memory access requests stored in the request queue and to be accessed by the one or more memory access requests; selecting a current memory page from the list of unique memory pages; and dispatching from the request queue to the memory, all memory access requests associated with the current memory page before any other memory access request in the request queue is dispatched.
    Type: Application
    Filed: December 28, 2016
    Publication date: June 28, 2018
    Inventors: Ishwar S. Bhati, Udit Dhawan, Jayesh Gaur, Sreenivas Subramoney
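A minimal sketch of the reordering: group pending requests by page, then drain all requests for the current page before moving on. First-seen page order stands in for the patent's page-selection policy.

```python
# Hypothetical sketch of locality-aware reordering: keep a list of unique
# pages with pending requests, pick a current page, and dispatch all of
# its requests before any request for another page.
from collections import OrderedDict

def reorder_by_page(requests):
    # requests: list of (page, request_id) in arrival order.
    pages = OrderedDict()
    for page, req in requests:
        pages.setdefault(page, []).append(req)   # list of unique pages

    dispatched = []
    for page, reqs in pages.items():             # one current page at a time
        dispatched.extend(reqs)                  # drain all of its requests
    return dispatched

stream = [("P0", "r0"), ("P1", "r1"), ("P0", "r2"), ("P1", "r3"), ("P0", "r4")]
print(reorder_by_page(stream))   # ['r0', 'r2', 'r4', 'r1', 'r3']
```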
  • Publication number: 20180173533
    Abstract: A processor may include a baseline branch predictor and an empirical branch bias override circuit. The baseline branch predictor may receive a branch instruction associated with a given address identifier, and generate, based on a global branch history, an initial prediction of a branch direction for the instruction. The empirical branch bias override circuit may determine, dependent on a direction of an observed branch direction bias in executed branch instruction instances associated with the address identifier, whether the initial prediction should be overridden, may determine, in response to determining that the initial prediction should be overridden, a final prediction that matches the observed branch direction bias, or may determine, in response to determining that the initial prediction should not be overridden, a final prediction that matches the initial prediction.
    Type: Application
    Filed: December 19, 2016
    Publication date: June 21, 2018
    Inventors: Niranjan K. Soundararajan, Sreenivas Subramoney, Rahul Pal, Ragavendra Natarajan
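A minimal sketch of the override: per branch address, track the observed taken fraction and override the baseline prediction only when the bias is strong. The counter scheme and threshold are hypothetical.

```python
# Hypothetical sketch of empirical bias override: per branch address,
# track the observed taken/not-taken bias and override the baseline
# predictor only when the bias is strong enough.
from collections import defaultdict

BIAS_THRESHOLD = 0.9   # illustrative confidence needed to override

history = defaultdict(lambda: {"taken": 0, "total": 0})

def record_outcome(addr, taken):
    h = history[addr]
    h["taken"] += int(taken)
    h["total"] += 1

def predict(addr, baseline_prediction):
    h = history[addr]
    if h["total"] == 0:
        return baseline_prediction          # nothing observed yet
    bias = h["taken"] / h["total"]
    if bias >= BIAS_THRESHOLD:
        return True                         # override: strongly taken
    if bias <= 1.0 - BIAS_THRESHOLD:
        return False                        # override: strongly not taken
    return baseline_prediction              # no clear bias: keep baseline

for _ in range(10):
    record_outcome(0x400A, taken=True)      # branch is empirically biased
print(predict(0x400A, baseline_prediction=False))  # True: bias overrides
```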
  • Publication number: 20180121353
    Abstract: Systems, methods, and processors to reduce redundant writes to memory. An embodiment of a system includes: a plurality of processors; a memory coupled to one or more of the plurality of processors; a cache coupled to the memory such that a dirty cache line evicted from the cache is written to the memory; and a redundant write detection circuitry coupled to the cache, wherein the redundant write detection circuitry is to control write access to the cache based on a redundancy check of data to be written to the cache. The system may include a first predictor circuitry to deactivate the redundant write detection circuitry responsive to a determination that power consumed by the redundancy check is greater than the power it saves, or a second predictor circuitry to deactivate the redundant write detection circuitry when memory bandwidth saved from performing the redundancy check is not being utilized by memory reads.
    Type: Application
    Filed: October 27, 2016
    Publication date: May 3, 2018
    Inventors: Jayesh Gaur, Sreenivas Subramoney, Leon Polishuk
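A minimal sketch of the redundancy check; the check_enabled flag stands in for the predictor circuits that deactivate the check when it is not worthwhile.

```python
# Hypothetical sketch of redundant write elimination: before writing a
# line, compare against the current contents and drop the write if the
# data is unchanged.
class Cache:
    def __init__(self):
        self.lines = {}        # tag -> (data, dirty)
        self.writes_elided = 0

    def write(self, tag, data, check_enabled=True):
        # check_enabled models the predictors that turn the check off.
        if check_enabled and tag in self.lines and self.lines[tag][0] == data:
            self.writes_elided += 1    # redundant: skip the write entirely
            return
        self.lines[tag] = (data, True)

cache = Cache()
cache.write(0x10, b"hello")
cache.write(0x10, b"hello")    # same data: detected as redundant
cache.write(0x10, b"world")    # different data: performed
print(cache.writes_elided)     # 1
```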
  • Publication number: 20180088944
    Abstract: A multi-core processor includes a plurality of cores to execute a plurality of threads and to monitor metrics for each of the plurality of threads during an interval, the metrics including stall cycle values, prefetches of a first type, and prefetches of a second type. The multi-core processor further includes criticality-aware thread prioritization (CATP) logic to compute a stall fraction for each of the plurality of threads during the interval using the stall cycle values, identify a thread with a highest stall fraction of the plurality of threads, determine the highest stall fraction is greater than a stall threshold, prioritize demand requests of the identified thread, compute a prefetch accuracy of the identified thread during the interval using the prefetches of the first type and the prefetches of the second type, determine the prefetch accuracy is greater than a prefetch threshold, and prioritize prefetch requests of the identified thread.
    Type: Application
    Filed: September 23, 2016
    Publication date: March 29, 2018
    Inventors: Lavanya Subramanian, Sreenivas Subramoney, Nithiyanandan Bashyam, Anant Nori
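A minimal sketch of the CATP decision per interval. Modeling the two prefetch types as useful versus issued prefetches (to form an accuracy) is an assumption, as are the thresholds.

```python
# Hypothetical sketch of criticality-aware thread prioritization (CATP):
# per interval, find the thread with the highest stall fraction; if it
# stalls enough, prioritize its demand requests, and additionally
# prioritize its prefetches if its prefetch accuracy is high enough.
STALL_THRESHOLD = 0.5
PREFETCH_THRESHOLD = 0.6

def prioritize(threads, interval_cycles):
    # threads: {tid: {"stall_cycles": int, "useful_pf": int, "total_pf": int}}
    fractions = {t: m["stall_cycles"] / interval_cycles
                 for t, m in threads.items()}
    critical = max(fractions, key=fractions.get)   # highest stall fraction
    decisions = {"demand": None, "prefetch": None}
    if fractions[critical] > STALL_THRESHOLD:
        decisions["demand"] = critical             # boost demand requests
        m = threads[critical]
        accuracy = m["useful_pf"] / max(m["total_pf"], 1)
        if accuracy > PREFETCH_THRESHOLD:
            decisions["prefetch"] = critical       # boost prefetches too
    return decisions

threads = {
    0: {"stall_cycles": 700, "useful_pf": 8, "total_pf": 10},
    1: {"stall_cycles": 200, "useful_pf": 1, "total_pf": 10},
}
print(prioritize(threads, interval_cycles=1000))
# {'demand': 0, 'prefetch': 0}
```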
  • Patent number: 9921839
    Abstract: A multi-core processor includes a plurality of cores to execute a plurality of threads and to monitor metrics for each of the plurality of threads during an interval, the metrics including stall cycle values, prefetches of a first type, and prefetches of a second type. The multi-core processor further includes criticality-aware thread prioritization (CATP) logic to compute a stall fraction for each of the plurality of threads during the interval using the stall cycle values, identify a thread with a highest stall fraction of the plurality of threads, determine the highest stall fraction is greater than a stall threshold, prioritize demand requests of the identified thread, compute a prefetch accuracy of the identified thread during the interval using the prefetches of the first type and the prefetches of the second type, determine the prefetch accuracy is greater than a prefetch threshold, and prioritize prefetch requests of the identified thread.
    Type: Grant
    Filed: September 23, 2016
    Date of Patent: March 20, 2018
    Assignee: Intel Corporation
    Inventors: Lavanya Subramanian, Sreenivas Subramoney, Nithiyanandan Bashyam, Anant Nori
  • Patent number: 9720829
    Abstract: Some implementations disclosed herein provide techniques for caching memory data and for managing cache retention. Different cache retention policies may be applied to different cached data streams such as those of a graphics processing unit. Actual performance of the cache with respect to the data streams may be observed, and the cache retention policies may be varied based on the observed actual performance.
    Type: Grant
    Filed: December 29, 2011
    Date of Patent: August 1, 2017
    Assignee: Intel Corporation
    Inventors: Suresh Srinivasan, Rakesh Ramesh, Sreenivas Subramoney, Jayesh Gaur
  • Publication number: 20160179387
    Abstract: A processor includes an execution unit, a memory subsystem, and a memory management unit (MMU). The MMU includes logic to evaluate a first bandwidth usage of the memory subsystem and logic to evaluate a second bandwidth usage between the processor and a memory. The memory is communicatively coupled to the memory subsystem. The memory subsystem is to implement a cache for the memory. The MMU further includes logic to evaluate a request of the memory subsystem, and, based upon the first bandwidth usage and the second bandwidth usage, fulfill the request by bypassing the memory subsystem.
    Type: Application
    Filed: December 16, 2015
    Publication date: June 23, 2016
    Inventors: Jayesh Gaur, Prasanna Rengasamy, Pradeep Ramachandran, Sreenivas Subramoney
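A minimal sketch of the bypass decision; the two utilization thresholds are hypothetical, and real hardware would measure both bandwidths over an interval rather than receive them as arguments.

```python
# Hypothetical sketch of bandwidth-aware cache bypass: fulfill a request
# directly from memory when the cache subsystem's bandwidth is the
# bottleneck and the processor-to-memory channel has headroom.
CACHE_BW_LIMIT = 0.9    # illustrative utilization thresholds
MEMORY_BW_LIMIT = 0.7

def route_request(cache_bw_usage, memory_bw_usage):
    # First bandwidth usage: the memory subsystem (cache) side.
    # Second bandwidth usage: the processor-to-memory channel.
    if cache_bw_usage > CACHE_BW_LIMIT and memory_bw_usage < MEMORY_BW_LIMIT:
        return "memory"     # bypass the memory subsystem (the cache)
    return "cache"          # default path through the cache

print(route_request(cache_bw_usage=0.95, memory_bw_usage=0.4))  # memory
print(route_request(cache_bw_usage=0.50, memory_bw_usage=0.4))  # cache
```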
  • Patent number: 9323678
    Abstract: In one embodiment, the present invention includes a method for identifying a memory request corresponding to a load instruction as a critical transaction if an instruction pointer of the load instruction is present in a critical instruction table associated with a processor core, sending the memory request to a system agent of the processor with a critical indicator to identify the memory request as a critical transaction, and prioritizing the memory request ahead of other pending transactions responsive to the critical indicator. Other embodiments are described and claimed.
    Type: Grant
    Filed: December 30, 2011
    Date of Patent: April 26, 2016
    Assignee: Intel Corporation
    Inventors: Amit Kumar, Sreenivas Subramoney
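A minimal sketch of the mechanism: requests whose load IP hits in the critical instruction table carry a critical indicator, and the system agent sorts them ahead of other pending transactions.

```python
# Hypothetical sketch of critical-load prioritization: memory requests
# whose load IP appears in a critical instruction table are tagged
# critical and scheduled ahead of other pending transactions.
critical_instruction_table = {0x401000, 0x402ABC}   # load IPs marked critical

def make_request(load_ip, address):
    return {
        "address": address,
        "critical": load_ip in critical_instruction_table,
    }

def schedule(pending):
    # Critical transactions first; ties keep arrival order (stable sort).
    return sorted(pending, key=lambda r: not r["critical"])

pending = [
    make_request(0x400F00, 0xDEAD0),    # not critical
    make_request(0x401000, 0xBEEF0),    # critical
]
print([hex(r["address"]) for r in schedule(pending)])
# ['0xbeef0', '0xdead0']
```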
  • Publication number: 20160092369
    Abstract: Embodiments described include systems, apparatuses, and methods using sectored dynamic random access memory (DRAM) cache. An exemplary apparatus may include at least one hardware processor core and a sectored dynamic random access memory (DRAM) cache coupled to the at least one hardware processor core.
    Type: Application
    Filed: September 26, 2014
    Publication date: March 31, 2016
    Inventors: Sreenivas Subramoney, Jayesh Gaur, Mukesh Agrawal, Mainak Chaudhuri
  • Patent number: 9251096
    Abstract: In an embodiment, a processor includes a cache data array including a plurality of physical ways, each physical way to store a baseline way and a victim way; a cache tag array including a plurality of tag groups, each tag group associated with a particular physical way and including a first tag associated with the baseline way stored in the particular physical way, and a second tag associated with the victim way stored in the particular physical way; and cache control logic to: select a first baseline way based on a replacement policy, select a first victim way based on an available capacity of a first physical way including the first victim way, and move a first data element from the first baseline way to the first victim way. Other embodiments are described and claimed.
    Type: Grant
    Filed: September 25, 2013
    Date of Patent: February 2, 2016
    Assignee: Intel Corporation
    Inventors: Sreenivas Subramoney, Jayesh Gaur, Alaa R. Alameldeen
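A heavily simplified sketch of the idea: each physical way holds a baseline slot and a victim slot with its own tag, so a block evicted from a baseline way can be parked in another way's victim slot instead of being dropped. The replacement policy is reduced here to a fixed choice.

```python
# Hypothetical sketch of the baseline/victim-way idea: each physical way
# carries two tags, and a block evicted from its baseline way can be
# retained in another way's victim slot if that way has spare capacity.
class PhysicalWay:
    def __init__(self):
        self.baseline = None    # (tag, data) for the baseline way
        self.victim = None      # (tag, data) for the co-resident victim way

    def has_victim_capacity(self):
        return self.victim is None

def evict_with_victim(ways, victim_way_index):
    # Replacement policy (which baseline to evict) is simplified to
    # "way 0"; the patent leaves the policy configurable.
    evicted = ways[0].baseline
    ways[0].baseline = None
    target = ways[victim_way_index]
    if target.has_victim_capacity():
        target.victim = evicted     # retained as a victim, not dropped
        return True
    return False                    # no capacity: the block is lost

ways = [PhysicalWay(), PhysicalWay()]
ways[0].baseline = ("tagA", "dataA")
print(evict_with_victim(ways, victim_way_index=1))  # True
print(ways[1].victim)                               # ('tagA', 'dataA')
```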
  • Patent number: 9195606
    Abstract: A cache memory eviction method includes maintaining thread-aware cache access data per cache block in a cache memory, wherein the cache access data is indicative of a number of times a cache block is accessed by a first thread, associating a cache block with one of a plurality of bins based on cache access data values of the cache block, and selecting a cache block to evict from a plurality of cache block candidates based, at least in part, upon the bins with which the cache block candidates are associated.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: November 24, 2015
    Assignee: Intel Corporation
    Inventors: Ragavendra Natarajan, Jayesh Gaur, Nithiyanandan Bashyam, Mainak Chaudhuri, Sreenivas Subramoney
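A minimal sketch of bin-based victim selection; the bin boundaries are illustrative, and breaking ties within a bin by raw access count is an assumption.

```python
# Hypothetical sketch of bin-based eviction: each block tracks how often
# its owning thread accessed it, blocks are mapped to bins by that count,
# and the victim is chosen from the lowest-populated (coldest) bin.
BIN_EDGES = [1, 4, 16]   # illustrative access-count boundaries

def bin_of(access_count):
    for i, edge in enumerate(BIN_EDGES):
        if access_count < edge:
            return i
    return len(BIN_EDGES)             # hottest bin

def choose_victim(blocks):
    # blocks: {tag: per-thread access count}; evict from the coldest bin.
    return min(blocks, key=lambda tag: (bin_of(blocks[tag]), blocks[tag]))

blocks = {"A": 20, "B": 2, "C": 0, "D": 7}
print(choose_victim(blocks))   # 'C': lowest bin, fewest accesses
```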
  • Publication number: 20150089126
    Abstract: In an embodiment, a processor includes a cache data array including a plurality of physical ways, each physical way to store a baseline way and a victim way; a cache tag array including a plurality of tag groups, each tag group associated with a particular physical way and including a first tag associated with the baseline way stored in the particular physical way, and a second tag associated with the victim way stored in the particular physical way; and cache control logic to: select a first baseline way based on a replacement policy, select a first victim way based on an available capacity of a first physical way including the first victim way, and move a first data element from the first baseline way to the first victim way. Other embodiments are described and claimed.
    Type: Application
    Filed: September 25, 2013
    Publication date: March 26, 2015
    Inventors: Sreenivas Subramoney, Jayesh Gaur, Alaa R. Alameldeen