Patents by Inventor Sreenivas Subramoney

Sreenivas Subramoney has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20180285268
    Abstract: In one embodiment, a processor comprises a processing core, a last level cache (LLC), and a mid-level cache. The mid-level cache is to determine that an idle indicator has been set, wherein the idle indicator is set based on an amount of activity at the LLC, and, based on that determination, identify a first cache line to be evicted from a first set of cache lines of the mid-level cache and send a request to write the first cache line to the LLC.
    Type: Application
    Filed: March 31, 2017
    Publication date: October 4, 2018
    Applicant: Intel Corporation
    Inventors: Kunal Kishore Korgaonkar, Ishwar S. Bhati, Huichu Liu, Jayesh Gaur, Sasikanth Manipatruni, Sreenivas Subramoney, Tanay Karnik, Hong Wang, Ian A. Young
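The flow described in this entry's abstract can be illustrated in Python. This is a minimal sketch, not the patented hardware logic; the class names, the per-interval activity counter, and the threshold value are all hypothetical.

```python
# Hypothetical sketch of idle-indicator-driven eviction from a mid-level
# cache (MLC) to the last level cache (LLC). Not the patented hardware logic.

IDLE_ACTIVITY_THRESHOLD = 2  # LLC requests per interval; illustrative value

class LastLevelCache:
    def __init__(self):
        self.activity = 0          # requests observed in the current interval
        self.idle_indicator = False
        self.lines = {}

    def end_interval(self):
        # Set the idle indicator based on the amount of LLC activity.
        self.idle_indicator = self.activity < IDLE_ACTIVITY_THRESHOLD
        self.activity = 0

    def write(self, tag, data):
        self.activity += 1
        self.lines[tag] = data

class MidLevelCache:
    def __init__(self, llc):
        self.llc = llc
        self.sets = {0: [("A", "dataA"), ("B", "dataB")]}  # one toy set

    def maybe_evict(self, set_index):
        # Only evict opportunistically when the LLC has signalled idleness.
        if not self.llc.idle_indicator:
            return
        if self.sets[set_index]:
            tag, data = self.sets[set_index].pop(0)  # pick a victim line
            self.llc.write(tag, data)                # request write to LLC

llc = LastLevelCache()
llc.end_interval()                 # no activity yet, so the LLC reports idle
mlc = MidLevelCache(llc)
mlc.maybe_evict(0)
print(llc.lines)                   # {'A': 'dataA'}
```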
  • Publication number: 20180285364
    Abstract: A processor may include a plurality of processing elements and a hardware accelerator for selecting data elements. The hardware accelerator may: access an input data set comprising a set of data elements, each data element having a score value; increment bin counters based on the score values of the set of data elements, each bin counter to count a number of data elements with an associated score value; determine a cumulative sum of count values for a sequence of bin counters, the sequence beginning with a first bin counter of the plurality of bin counters; identify a second bin counter in the sequence of bin counters at which the cumulative sum reaches a selection quantity N; and generate an output data set based on a comparison of the set of data elements to a threshold score associated with the second bin counter.
    Type: Application
    Filed: March 31, 2017
    Publication date: October 4, 2018
    Inventors: Mahesh Mamidipaka, Srivatsava Jandhyala, Anish N K, Nagadastagiri Reddy C, Sreenivas Subramoney
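The selection mechanism above is essentially a histogram (bin-counter) top-N cut. A minimal Python sketch follows; scanning bins from the highest score downward and the tie handling at the threshold are assumptions the abstract leaves open.

```python
# Histogram-based selection of (roughly) the N highest-scoring elements,
# as the abstract describes. Scanning from the highest score downward is
# an assumption; the patent leaves the bin ordering to the implementation.

def select_top_n(elements, scores, n, max_score):
    bins = [0] * (max_score + 1)
    for s in scores:
        bins[s] += 1                         # increment bin counters

    cumulative = 0
    threshold = 0
    for score in range(max_score, -1, -1):   # highest bin first
        cumulative += bins[score]
        if cumulative >= n:                  # cumulative sum reaches N
            threshold = score                # the "second bin counter"
            break

    # Output data set: elements whose score clears the threshold.
    return [e for e, s in zip(elements, scores) if s >= threshold]

elements = ["a", "b", "c", "d", "e"]
scores   = [ 3,   9,   1,   7,   7 ]
print(select_top_n(elements, scores, n=3, max_score=9))  # ['b', 'd', 'e']
```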
  • Publication number: 20180232235
    Abstract: A processor includes a memory to hold a buffer to store data dependencies comprising nodes and edges for each of a plurality of micro-operations. The nodes include a first node for dispatch, a second node for execution, and a third node for commit. A detector circuit is to queue, in the buffer, the nodes of a micro-operation; add, to determine a node weight for each of the nodes of the micro-operation, an edge weight to a previous node weight of a connected micro-operation that yields a maximum node weight for the node, wherein the node weight comprises a number of execution cycles of an out-of-order (OOO) pipeline of the processor and the edge weight comprises a number of execution cycles to execute the connected micro-operation; and identify, as a critical path, a path through the data dependencies that yields the maximum node weight for the micro-operation.
    Type: Application
    Filed: February 15, 2017
    Publication date: August 16, 2018
    Inventors: Jayesh Gaur, Pooja Roy, Sreenivas Subramoney, Hong Wang, Ronak Singhal
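The detector's recurrence is a longest-path computation over a dependence DAG of dispatch/execute/commit nodes. A minimal sketch, assuming a topologically ordered node list and a hypothetical graph encoding:

```python
# Longest-path ("critical path") computation over a micro-op dependence DAG,
# mirroring the node/edge-weight recurrence in the abstract. Weights are in
# execution cycles; nodes are assumed to arrive in topological order.

def critical_path(nodes, edges):
    # edges: {dst: [(src, edge_weight), ...]}
    weight, pred = {}, {}
    for node in nodes:
        best, best_src = 0, None
        for src, w in edges.get(node, []):
            # Add the edge weight to the predecessor's node weight and keep
            # whichever predecessor yields the maximum node weight.
            if weight[src] + w > best:
                best, best_src = weight[src] + w, src
        weight[node], pred[node] = best, best_src
    end = max(nodes, key=lambda n: weight[n])
    path, node = [], end
    while node is not None:             # walk predecessors back to the start
        path.append(node)
        node = pred[node]
    return list(reversed(path)), weight[end]

# Two uops, each with dispatch (D), execute (E), commit (C) nodes.
nodes = ["D1", "E1", "C1", "D2", "E2", "C2"]
edges = {
    "E1": [("D1", 1)], "C1": [("E1", 3)],
    "D2": [("D1", 1)], "E2": [("D2", 1), ("E1", 3)],  # E2 consumes E1's result
    "C2": [("E2", 2), ("C1", 1)],
}
print(critical_path(nodes, edges))  # (['D1', 'E1', 'E2', 'C2'], 6)
```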
  • Publication number: 20180232311
    Abstract: A processor includes a processing core and a cache controller including a read queue and a separate write queue. The read queue is to buffer read requests of the processing core to a non-volatile memory last level cache (NVM-LLC), and the write queue is to buffer write requests to the NVM-LLC. The cache controller is to detect whether the write queue is full. The cache controller further prioritizes a first order of sending requests to the NVM-LLC when the write queue contains an empty slot, the first order specifying a first pattern of sending the read requests before the write requests, and prioritizes a second order of sending requests to the NVM-LLC in response to a determination that the write queue is full, the second order specifying a second pattern of alternating between sending a write request from the write queue and a read request from the read queue.
    Type: Application
    Filed: February 13, 2017
    Publication date: August 16, 2018
    Inventors: Ishwar S. Bhati, Huichu Liu, Jayesh Gaur, Kunal Korgaonkar, Sasikanth Manipatruni, Sreenivas Subramoney, Tanay Karnik, Hong Wang, Ian A. Young
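The two dispatch orders can be modeled in a few lines. A minimal sketch; the queue capacity and the choice to start alternation with a write are hypothetical.

```python
# Hypothetical model of the two dispatch orders the abstract describes:
# reads before writes while the write queue has an empty slot, and
# alternation between writes and reads once the write queue is full.
from collections import deque

WRITE_QUEUE_CAPACITY = 2   # illustrative size

def dispatch_order(read_q, write_q):
    sent = []
    send_write_next = True                  # alternation starts with a write
    while read_q or write_q:
        write_full = len(write_q) >= WRITE_QUEUE_CAPACITY
        if not write_full:
            # First order: drain reads ahead of writes.
            q = read_q if read_q else write_q
        else:
            # Second order: alternate a write and a read.
            q = write_q if (send_write_next or not read_q) else read_q
            send_write_next = not send_write_next
        sent.append(q.popleft())
    return sent

reads = deque(["R0", "R1", "R2"])
writes = deque(["W0", "W1", "W2", "W3"])    # write queue starts full
print(dispatch_order(reads, writes))
# ['W0', 'R0', 'W1', 'R1', 'W2', 'R2', 'W3']
```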
  • Publication number: 20180203799
    Abstract: A memory-efficient last level cache (LLC) architecture is described. A processor implementing an LLC architecture may include a processor core, a last level cache (LLC) operatively coupled to the processor core, and a cache controller operatively coupled to the LLC. The cache controller is to monitor a bandwidth demand of a channel between the processor core and a dynamic random-access memory (DRAM) device associated with the LLC. The cache controller is further to perform a first defined number of consecutive reads from the DRAM device when the bandwidth demand exceeds a first threshold value and perform a second defined number of consecutive writes of modified lines from the LLC to the DRAM device when the bandwidth demand exceeds the first threshold value.
    Type: Application
    Filed: January 18, 2017
    Publication date: July 19, 2018
    Inventors: Jayesh Gaur, Ayan Mandal, Anant Nori, Sreenivas Subramoney
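The batching policy reads naturally as: above a demand threshold, issue fixed-length runs of reads and then of writebacks rather than interleaving. A minimal sketch with illustrative thresholds and batch lengths:

```python
# Hypothetical sketch of bandwidth-aware batching: when measured channel
# demand exceeds a threshold, issue fixed-length runs of consecutive reads
# and then consecutive writebacks, instead of interleaving them.
from collections import deque

BANDWIDTH_THRESHOLD = 0.8   # fraction of peak; illustrative
READ_BATCH = 2              # defined number of consecutive reads
WRITE_BATCH = 2             # defined number of consecutive writebacks

def schedule(reads, writebacks, bandwidth_demand):
    issued = []
    while reads or writebacks:
        if bandwidth_demand > BANDWIDTH_THRESHOLD:
            for _ in range(min(READ_BATCH, len(reads))):
                issued.append(reads.popleft())       # consecutive reads
            for _ in range(min(WRITE_BATCH, len(writebacks))):
                issued.append(writebacks.popleft())  # consecutive writes
        else:
            # Low demand: no batching pressure, just drain in FIFO order.
            q = reads if reads else writebacks
            issued.append(q.popleft())
    return issued

reads = deque(["R0", "R1", "R2"])
wbs = deque(["W0", "W1", "W2"])
print(schedule(reads, wbs, bandwidth_demand=0.9))
# ['R0', 'R1', 'W0', 'W1', 'R2', 'W2']
```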
  • Publication number: 20180189587
    Abstract: Aspects of the present disclosure relate to technologies (systems, devices, methods, etc.) for performing feature detection and/or feature tracking based on image data. In embodiments, the technologies include or leverage a SLAM hardware accelerator (SWA) that includes a feature detection component and optionally a feature tracking component. The feature detection component may be configured to perform feature detection on working data encompassed by a sliding window. The feature tracking component is configured to perform feature tracking operations to track one or more detected features, e.g., using normalized cross correlation (NCC) or another method.
    Type: Application
    Filed: November 29, 2017
    Publication date: July 5, 2018
    Applicant: Intel Corporation
    Inventors: Dipan Kumar Mandal, Om J. Omer, Lance E. Hacking, James Radford, Sreenivas Subramoney, Eagle Jones, Gautham N. Chinya
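The tracking step names normalized cross correlation (NCC); the textbook zero-mean NCC between two equal-size patches is shown below. This is the generic formula, not the accelerator's datapath.

```python
# Zero-mean normalized cross correlation (NCC) between two equal-size
# patches, the standard similarity measure named in the abstract.
import numpy as np

def ncc(patch, template):
    p = patch - patch.mean()
    t = template - template.mean()
    denom = np.sqrt((p * p).sum() * (t * t).sum())
    return float((p * t).sum() / denom) if denom else 0.0

a = np.array([[1.0, 2.0], [3.0, 4.0]])
print(ncc(a, a))          # 1.0  (a patch correlates perfectly with itself)
print(ncc(a, 5.0 - a))    # -1.0 (perfect inverse correlation)
```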
  • Publication number: 20180188994
    Abstract: Systems for page management using local page information are disclosed. The system may include a processor, including a memory controller, and a memory, including a row buffer. The memory controller may include circuitry to determine that a page stored in the row buffer has been idle for a time exceeding a predetermined threshold, determine whether the page is exempt from idle page closures, and, based on a determination that the page is exempt, refrain from closing the page. Associated methods are also disclosed.
    Type: Application
    Filed: December 29, 2016
    Publication date: July 5, 2018
    Inventors: Sriseshan Srikanth, Lavanya Subramanian, Sreenivas Subramoney
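A minimal sketch of the exemption check; the threshold value and the page representation are hypothetical.

```python
# Hypothetical sketch of exemption-aware idle page closure: an idle page
# held in the row buffer is closed only if it is not marked exempt.
IDLE_THRESHOLD_CYCLES = 100   # illustrative predetermined threshold

def maybe_close_page(page, now):
    idle_time = now - page["last_access"]
    if idle_time <= IDLE_THRESHOLD_CYCLES:
        return False                   # not idle long enough
    if page["exempt"]:
        return False                   # exempt: refrain from closing
    page["open"] = False               # close the row
    return True

page = {"open": True, "last_access": 0, "exempt": True}
print(maybe_close_page(page, now=500))   # False: idle but exempt
page["exempt"] = False
print(maybe_close_page(page, now=500))   # True: idle and not exempt
```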
  • Patent number: 10013352
    Abstract: Embodiments described include systems, apparatuses, and methods using sectored dynamic random access memory (DRAM) cache. An exemplary apparatus may include at least one hardware processor core and a sectored dynamic random access memory (DRAM) cache coupled to the at least one hardware processor core.
    Type: Grant
    Filed: September 26, 2014
    Date of Patent: July 3, 2018
    Assignee: Intel Corporation
    Inventors: Sreenivas Subramoney, Jayesh Gaur, Mukesh Agrawal, Mainak Chaudhuri
  • Publication number: 20180181329
    Abstract: Processor, apparatus, and method for reordering a stream of memory access requests to establish locality are described herein. One embodiment of a method includes: storing in a request queue memory access requests generated by a plurality of execution units, the memory access requests comprising a first request to access a first memory page in a memory and a second request to access a second memory page in the memory; maintaining a list of unique memory pages, each unique memory page associated with one or more memory access requests stored in the request queue and to be accessed by the one or more memory access requests; selecting a current memory page from the list of unique memory pages; and dispatching from the request queue to the memory, all memory access requests associated with the current memory page before any other memory access request in the request queue is dispatched.
    Type: Application
    Filed: December 28, 2016
    Publication date: June 28, 2018
    Inventors: Ishwar S. Bhati, Udit Dhawan, Jayesh Gaur, Sreenivas Subramoney
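A minimal sketch of the reordering: group pending requests by page, then drain all requests for the current page before moving on. First-seen page order stands in for the patent's page-selection policy.

```python
# Hypothetical sketch of locality-aware reordering: keep a list of unique
# pages with pending requests, pick a current page, and dispatch all of
# its requests before any request for another page.
from collections import OrderedDict

def reorder_by_page(requests):
    # requests: list of (page, request_id) in arrival order.
    pages = OrderedDict()
    for page, req in requests:
        pages.setdefault(page, []).append(req)   # list of unique pages

    dispatched = []
    for page, reqs in pages.items():             # one current page at a time
        dispatched.extend(reqs)                  # drain all of its requests
    return dispatched

stream = [("P0", "r0"), ("P1", "r1"), ("P0", "r2"), ("P1", "r3"), ("P0", "r4")]
print(reorder_by_page(stream))   # ['r0', 'r2', 'r4', 'r1', 'r3']
```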
  • Publication number: 20180173533
    Abstract: A processor may include a baseline branch predictor and an empirical branch bias override circuit. The baseline branch predictor may receive a branch instruction associated with a given address identifier, and generate, based on a global branch history, an initial prediction of a branch direction for the instruction. The empirical branch bias override circuit may determine, dependent on a direction of an observed branch direction bias in executed branch instruction instances associated with the address identifier, whether the initial prediction should be overridden, may determine, in response to determining that the initial prediction should be overridden, a final prediction that matches the observed branch direction bias, or may determine, in response to determining that the initial prediction should not be overridden, a final prediction that matches the initial prediction.
    Type: Application
    Filed: December 19, 2016
    Publication date: June 21, 2018
    Inventors: Niranjan K. Soundararajan, Sreenivas Subramoney, Rahul Pal, Ragavendra Natarajan
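A minimal sketch of the override: per branch address, track the observed taken fraction and override the baseline prediction only when the bias is strong. The counter scheme and threshold are hypothetical.

```python
# Hypothetical sketch of empirical bias override: per branch address,
# track the observed taken/not-taken bias and override the baseline
# predictor only when the bias is strong enough.
from collections import defaultdict

BIAS_THRESHOLD = 0.9   # illustrative confidence needed to override

history = defaultdict(lambda: {"taken": 0, "total": 0})

def record_outcome(addr, taken):
    h = history[addr]
    h["taken"] += int(taken)
    h["total"] += 1

def predict(addr, baseline_prediction):
    h = history[addr]
    if h["total"] == 0:
        return baseline_prediction          # nothing observed yet
    bias = h["taken"] / h["total"]
    if bias >= BIAS_THRESHOLD:
        return True                         # override: strongly taken
    if bias <= 1.0 - BIAS_THRESHOLD:
        return False                        # override: strongly not taken
    return baseline_prediction              # no clear bias: keep baseline

for _ in range(10):
    record_outcome(0x400A, taken=True)      # branch is empirically biased
print(predict(0x400A, baseline_prediction=False))  # True: bias overrides
```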
  • Publication number: 20180121353
    Abstract: Systems, methods, and processors to reduce redundant writes to memory. An embodiment of a system includes: a plurality of processors; a memory coupled to one or more of the plurality of processors; a cache coupled to the memory such that a dirty cache line evicted from the cache is written to the memory; and a redundant write detection circuitry coupled to the cache, wherein the redundant write detection circuitry is to control write access to the cache based on a redundancy check of data to be written to the cache. The system may include a first predictor circuitry to deactivate the redundant write detection circuitry responsive to a determination that power consumed by the redundancy check is greater than the power it saves, or a second predictor circuitry to deactivate the redundant write detection circuitry when memory bandwidth saved from performing the redundancy check is not being utilized by memory reads.
    Type: Application
    Filed: October 27, 2016
    Publication date: May 3, 2018
    Inventors: Jayesh Gaur, Sreenivas Subramoney, Leon Polishuk
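A minimal sketch of the redundancy check; the check_enabled flag stands in for the predictor circuits that deactivate the check when it is not worthwhile.

```python
# Hypothetical sketch of redundant write elimination: before writing a
# line, compare against the current contents and drop the write if the
# data is unchanged.
class Cache:
    def __init__(self):
        self.lines = {}        # tag -> (data, dirty)
        self.writes_elided = 0

    def write(self, tag, data, check_enabled=True):
        # check_enabled models the predictors that turn the check off.
        if check_enabled and tag in self.lines and self.lines[tag][0] == data:
            self.writes_elided += 1    # redundant: skip the write entirely
            return
        self.lines[tag] = (data, True)

cache = Cache()
cache.write(0x10, b"hello")
cache.write(0x10, b"hello")    # same data: detected as redundant
cache.write(0x10, b"world")    # different data: performed
print(cache.writes_elided)     # 1
```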
  • Publication number: 20180088944
    Abstract: A multi-core processor includes a plurality of cores to execute a plurality of threads and to monitor metrics for each of the plurality of threads during an interval, the metrics including stall cycle values, prefetches of a first type, and prefetches of a second type. The multi-core processor further includes criticality-aware thread prioritization (CATP) logic to compute a stall fraction for each of the plurality of threads during the interval using the stall cycle values, identify a thread with a highest stall fraction of the plurality of threads, determine the highest stall fraction is greater than a stall threshold, prioritize demand requests of the identified thread, compute a prefetch accuracy of the identified thread during the interval using the prefetches of the first type and the prefetches of the second type, determine the prefetch accuracy is greater than a prefetch threshold, and prioritize prefetch requests of the identified thread.
    Type: Application
    Filed: September 23, 2016
    Publication date: March 29, 2018
    Inventors: Lavanya Subramanian, Sreenivas Subramoney, Nithiyanandan Bashyam, Anant Nori
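A minimal sketch of the CATP decision per interval. Modeling the two prefetch types as useful versus issued prefetches (to form an accuracy) is an assumption, as are the thresholds.

```python
# Hypothetical sketch of criticality-aware thread prioritization (CATP):
# per interval, find the thread with the highest stall fraction; if it
# stalls enough, prioritize its demand requests, and additionally
# prioritize its prefetches if its prefetch accuracy is high enough.
STALL_THRESHOLD = 0.5
PREFETCH_THRESHOLD = 0.6

def prioritize(threads, interval_cycles):
    # threads: {tid: {"stall_cycles": int, "useful_pf": int, "total_pf": int}}
    fractions = {t: m["stall_cycles"] / interval_cycles
                 for t, m in threads.items()}
    critical = max(fractions, key=fractions.get)   # highest stall fraction
    decisions = {"demand": None, "prefetch": None}
    if fractions[critical] > STALL_THRESHOLD:
        decisions["demand"] = critical             # boost demand requests
        m = threads[critical]
        accuracy = m["useful_pf"] / max(m["total_pf"], 1)
        if accuracy > PREFETCH_THRESHOLD:
            decisions["prefetch"] = critical       # boost prefetches too
    return decisions

threads = {
    0: {"stall_cycles": 700, "useful_pf": 8, "total_pf": 10},
    1: {"stall_cycles": 200, "useful_pf": 1, "total_pf": 10},
}
print(prioritize(threads, interval_cycles=1000))
# {'demand': 0, 'prefetch': 0}
```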
  • Patent number: 9921839
    Abstract: A multi-core processor includes a plurality of cores to execute a plurality of threads and to monitor metrics for each of the plurality of threads during an interval, the metrics including stall cycle values, prefetches of a first type, and prefetches of a second type. The multi-core processor further includes criticality-aware thread prioritization (CATP) logic to compute a stall fraction for each of the plurality of threads during the interval using the stall cycle values, identify a thread with a highest stall fraction of the plurality of threads, determine the highest stall fraction is greater than a stall threshold, prioritize demand requests of the identified thread, compute a prefetch accuracy of the identified thread during the interval using the prefetches of the first type and the prefetches of the second type, determine the prefetch accuracy is greater than a prefetch threshold, and prioritize prefetch requests of the identified thread.
    Type: Grant
    Filed: September 23, 2016
    Date of Patent: March 20, 2018
    Assignee: Intel Corporation
    Inventors: Lavanya Subramanian, Sreenivas Subramoney, Nithiyanandan Bashyam, Anant Nori
  • Patent number: 9720829
    Abstract: Some implementations disclosed herein provide techniques for caching memory data and for managing cache retention. Different cache retention policies may be applied to different cached data streams such as those of a graphics processing unit. Actual performance of the cache with respect to the data streams may be observed, and the cache retention policies may be varied based on the observed actual performance.
    Type: Grant
    Filed: December 29, 2011
    Date of Patent: August 1, 2017
    Assignee: Intel Corporation
    Inventors: Suresh Srinivasan, Rakesh Ramesh, Sreenivas Subramoney, Jayesh Gaur
  • Publication number: 20160179387
    Abstract: A processor includes an execution unit, a memory subsystem, and a memory management unit (MMU). The MMU includes logic to evaluate a first bandwidth usage of the memory subsystem and logic to evaluate a second bandwidth usage between the processor and a memory. The memory is communicatively coupled to the memory subsystem. The memory subsystem is to implement a cache for the memory. The MMU further includes logic to evaluate a request of the memory subsystem, and, based upon the first bandwidth usage and the second bandwidth usage, fulfill the request by bypassing the memory subsystem.
    Type: Application
    Filed: December 16, 2015
    Publication date: June 23, 2016
    Inventors: Jayesh Gaur, Prasanna Rengasamy, Pradeep Ramachandran, Sreenivas Subramoney
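A minimal sketch of the bypass decision; the two utilization thresholds are hypothetical, and real hardware would measure both bandwidths over an interval rather than receive them as arguments.

```python
# Hypothetical sketch of bandwidth-aware cache bypass: fulfill a request
# directly from memory when the cache subsystem's bandwidth is the
# bottleneck and the processor-to-memory channel has headroom.
CACHE_BW_LIMIT = 0.9    # illustrative utilization thresholds
MEMORY_BW_LIMIT = 0.7

def route_request(cache_bw_usage, memory_bw_usage):
    # First bandwidth usage: the memory subsystem (cache) side.
    # Second bandwidth usage: the processor-to-memory channel.
    if cache_bw_usage > CACHE_BW_LIMIT and memory_bw_usage < MEMORY_BW_LIMIT:
        return "memory"     # bypass the memory subsystem (the cache)
    return "cache"          # default path through the cache

print(route_request(cache_bw_usage=0.95, memory_bw_usage=0.4))  # memory
print(route_request(cache_bw_usage=0.50, memory_bw_usage=0.4))  # cache
```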
  • Patent number: 9323678
    Abstract: In one embodiment, the present invention includes a method for identifying a memory request corresponding to a load instruction as a critical transaction if an instruction pointer of the load instruction is present in a critical instruction table associated with a processor core, sending the memory request to a system agent of the processor with a critical indicator to identify the memory request as a critical transaction, and prioritizing the memory request ahead of other pending transactions responsive to the critical indicator. Other embodiments are described and claimed.
    Type: Grant
    Filed: December 30, 2011
    Date of Patent: April 26, 2016
    Assignee: Intel Corporation
    Inventors: Amit Kumar, Sreenivas Subramoney
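A minimal sketch of the mechanism: requests whose load IP hits in the critical instruction table carry a critical indicator, and the system agent sorts them ahead of other pending transactions.

```python
# Hypothetical sketch of critical-load prioritization: memory requests
# whose load IP appears in a critical instruction table are tagged
# critical and scheduled ahead of other pending transactions.
critical_instruction_table = {0x401000, 0x402ABC}   # load IPs marked critical

def make_request(load_ip, address):
    return {
        "address": address,
        "critical": load_ip in critical_instruction_table,
    }

def schedule(pending):
    # Critical transactions first; ties keep arrival order (stable sort).
    return sorted(pending, key=lambda r: not r["critical"])

pending = [
    make_request(0x400F00, 0xDEAD0),    # not critical
    make_request(0x401000, 0xBEEF0),    # critical
]
print([hex(r["address"]) for r in schedule(pending)])
# ['0xbeef0', '0xdead0']
```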
  • Publication number: 20160092369
    Abstract: Embodiments described include systems, apparatuses, and methods using sectored dynamic random access memory (DRAM) cache. An exemplary apparatus may include at least one hardware processor core and a sectored dynamic random access memory (DRAM) cache coupled to the at least one hardware processor core.
    Type: Application
    Filed: September 26, 2014
    Publication date: March 31, 2016
    Inventors: Sreenivas Subramoney, Jayesh Gaur, Mukesh Agrawal, Mainak Chaudhuri
  • Patent number: 9251096
    Abstract: In an embodiment, a processor includes a cache data array including a plurality of physical ways, each physical way to store a baseline way and a victim way; a cache tag array including a plurality of tag groups, each tag group associated with a particular physical way and including a first tag associated with the baseline way stored in the particular physical way, and a second tag associated with the victim way stored in the particular physical way; and cache control logic to: select a first baseline way based on a replacement policy, select a first victim way based on an available capacity of a first physical way including the first victim way, and move a first data element from the first baseline way to the first victim way. Other embodiments are described and claimed.
    Type: Grant
    Filed: September 25, 2013
    Date of Patent: February 2, 2016
    Assignee: Intel Corporation
    Inventors: Sreenivas Subramoney, Jayesh Gaur, Alaa R. Alameldeen
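A heavily simplified sketch of the idea: each physical way holds a baseline slot and a victim slot with its own tag, so a block evicted from a baseline way can be parked in another way's victim slot instead of being dropped. The replacement policy is reduced here to a fixed choice.

```python
# Hypothetical sketch of the baseline/victim-way idea: each physical way
# carries two tags, and a block evicted from its baseline way can be
# retained in another way's victim slot if that way has spare capacity.
class PhysicalWay:
    def __init__(self):
        self.baseline = None    # (tag, data) for the baseline way
        self.victim = None      # (tag, data) for the co-resident victim way

    def has_victim_capacity(self):
        return self.victim is None

def evict_with_victim(ways, victim_way_index):
    # Replacement policy (which baseline to evict) is simplified to
    # "way 0"; the patent leaves the policy configurable.
    evicted = ways[0].baseline
    ways[0].baseline = None
    target = ways[victim_way_index]
    if target.has_victim_capacity():
        target.victim = evicted     # retained as a victim, not dropped
        return True
    return False                    # no capacity: the block is lost

ways = [PhysicalWay(), PhysicalWay()]
ways[0].baseline = ("tagA", "dataA")
print(evict_with_victim(ways, victim_way_index=1))  # True
print(ways[1].victim)                               # ('tagA', 'dataA')
```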
  • Patent number: 9195606
    Abstract: A cache memory eviction method includes maintaining thread-aware cache access data per cache block in a cache memory, wherein the cache access data is indicative of a number of times a cache block is accessed by a first thread, associating a cache block with one of a plurality of bins based on cache access data values of the cache block, and selecting a cache block to evict from a plurality of cache block candidates based, at least in part, upon the bins with which the cache block candidates are associated.
    Type: Grant
    Filed: March 15, 2013
    Date of Patent: November 24, 2015
    Assignee: Intel Corporation
    Inventors: Ragavendra Natarajan, Jayesh Gaur, Nithiyanandan Bashyam, Mainak Chaudhuri, Sreenivas Subramoney
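A minimal sketch of bin-based victim selection; the bin boundaries are illustrative, and breaking ties within a bin by raw access count is an assumption.

```python
# Hypothetical sketch of bin-based eviction: each block tracks how often
# its owning thread accessed it, blocks are mapped to bins by that count,
# and the victim is chosen from the lowest-populated (coldest) bin.
BIN_EDGES = [1, 4, 16]   # illustrative access-count boundaries

def bin_of(access_count):
    for i, edge in enumerate(BIN_EDGES):
        if access_count < edge:
            return i
    return len(BIN_EDGES)             # hottest bin

def choose_victim(blocks):
    # blocks: {tag: per-thread access count}; evict from the coldest bin.
    return min(blocks, key=lambda tag: (bin_of(blocks[tag]), blocks[tag]))

blocks = {"A": 20, "B": 2, "C": 0, "D": 7}
print(choose_victim(blocks))   # 'C': lowest bin, fewest accesses
```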
  • Publication number: 20150089126
    Abstract: In an embodiment, a processor includes a cache data array including a plurality of physical ways, each physical way to store a baseline way and a victim way; a cache tag array including a plurality of tag groups, each tag group associated with a particular physical way and including a first tag associated with the baseline way stored in the particular physical way, and a second tag associated with the victim way stored in the particular physical way; and cache control logic to: select a first baseline way based on a replacement policy, select a first victim way based on an available capacity of a first physical way including the first victim way, and move a first data element from the first baseline way to the first victim way. Other embodiments are described and claimed.
    Type: Application
    Filed: September 25, 2013
    Publication date: March 26, 2015
    Inventors: Sreenivas Subramoney, Jayesh Gaur, Alaa R. Alameldeen