Patents by Inventor Sreenivas Subramoney

Sreenivas Subramoney has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20140368524
    Abstract: Some implementations disclosed herein provide techniques for caching memory data and for managing cache retention. Different cache retention policies may be applied to different cached data streams such as those of a graphics processing unit. Actual performance of the cache with respect to the data streams may be observed, and the cache retention policies may be varied based on the observed actual performance.
    Type: Application
    Filed: December 29, 2011
    Publication date: December 18, 2014
    Inventors: Suresh Srinivasan, Ramesh K. Rakesh, Sreenivas Subramoney, Jayesh Gaur
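    The adaptive retention idea in this abstract can be sketched in Python. This is an illustrative model only, not the patent's mechanism: the stream names, the per-stream LRU budgets, and the rebalancing rule are all invented for the example.

    ```python
    from collections import OrderedDict

    class AdaptiveStreamCache:
        """Per-stream LRU cache whose retention budget adapts to observed hit rate.

        Sketch under assumed semantics: each data stream gets a retention budget,
        and rebalance() shifts budget toward streams that actually hit.
        """
        def __init__(self, capacity, streams):
            self.capacity = capacity
            # Start with an equal retention budget per stream.
            self.budget = {s: capacity // len(streams) for s in streams}
            self.lines = {s: OrderedDict() for s in streams}
            self.hits = {s: 0 for s in streams}
            self.accesses = {s: 0 for s in streams}

        def access(self, stream, addr):
            self.accesses[stream] += 1
            lines = self.lines[stream]
            if addr in lines:
                lines.move_to_end(addr)       # LRU update on hit
                self.hits[stream] += 1
                return True
            lines[addr] = True                # fill on miss
            while len(lines) > self.budget[stream]:
                lines.popitem(last=False)     # evict LRU within the stream's budget
            return False

        def rebalance(self):
            """Shift budget toward streams with better observed performance."""
            rates = {s: (self.hits[s] / self.accesses[s] if self.accesses[s] else 0.0)
                     for s in self.budget}
            total = sum(rates.values()) or 1.0
            for s in self.budget:
                self.budget[s] = max(1, int(self.capacity * rates[s] / total))
    ```

    A reuse-heavy stream (e.g. repeated texture fetches) would end up with most of the budget after rebalancing, while a streaming-access pattern with no reuse would shrink to the minimum.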
  • Publication number: 20140351524
    Abstract: A cache memory eviction method includes maintaining thread-aware cache access data per cache block in a cache memory, wherein the cache access data is indicative of a number of times a cache block is accessed by a first thread, associating a cache block with one of a plurality of bins based on cache access data values of the cache block, and selecting a cache block to evict from a plurality of cache block candidates based, at least in part, upon the bins with which the cache block candidates are associated.
    Type: Application
    Filed: March 15, 2013
    Publication date: November 27, 2014
    Inventors: Ragavendra Natarajan, Jayesh Gaur, Nithiyanandan Bashyam, Mainak Chaudhuri, Sreenivas Subramoney
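    The binned victim selection described above can be sketched as follows. The bin edges and tie-breaking rule here are illustrative choices, not values from the patent.

    ```python
    def choose_victim(candidates, access_counts, bin_edges=(1, 4, 16)):
        """Pick an eviction victim from cache-block candidates.

        Each block's per-thread access count places it in a bin; candidates in
        the lowest bin (least accessed) are preferred victims.
        """
        def bin_of(block):
            count = access_counts.get(block, 0)
            for i, edge in enumerate(bin_edges):
                if count < edge:
                    return i
            return len(bin_edges)

        # Evict the candidate in the lowest bin; break ties by lower raw count.
        return min(candidates, key=lambda b: (bin_of(b), access_counts.get(b, 0)))
    ```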
  • Patent number: 8667222
    Abstract: An apparatus and method are described for implementing an exclusive lower level cache (LLC) policy within a computer processor. For example, one embodiment of a computer processor comprises: a mid-level cache circuit (MLC) for storing a first set of cache lines containing instructions and/or data; a lower level cache circuit (LLC) for storing a second set of cache lines of instructions and/or data; and an insertion circuit for implementing a policy for inserting or replacing cache lines within the LLC based on values of use recency and use frequency associated with the lines.
    Type: Grant
    Filed: April 1, 2011
    Date of Patent: March 4, 2014
    Assignee: Intel Corporation
    Inventors: Jayesh Gaur, Mainak Chaudhuri, Sreenivas Subramoney
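    The recency/frequency-based insertion policy can be sketched as a small decision function. The thresholds and the age scale are assumptions for illustration, not the patent's actual policy.

    ```python
    def llc_insertion_age(used_recently, use_count, max_age=3):
        """Map an evicted MLC line's recency/frequency to an LLC insertion age.

        Lower age = retained longer in the LLC. A line that was both recently
        and frequently used in the MLC is inserted at the most-protected
        position; a line that showed no reuse is inserted eviction-ready.
        """
        if used_recently and use_count >= 2:
            return 0              # hot line: most-protected position
        if used_recently or use_count >= 2:
            return max_age // 2   # moderately useful
        return max_age            # likely dead: insert at eviction-ready position
    ```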
  • Publication number: 20130339595
    Abstract: In one embodiment, the present invention includes a method for identifying a memory request corresponding to a load instruction as a critical transaction if an instruction pointer of the load instruction is present in a critical instruction table associated with a processor core, sending the memory request to a system agent of the processor with a critical indicator to identify the memory request as a critical transaction, and prioritizing the memory request ahead of other pending transactions responsive to the critical indicator. Other embodiments are described and claimed.
    Type: Application
    Filed: December 30, 2011
    Publication date: December 19, 2013
    Inventors: Amit Kumar, Sreenivas Subramoney
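    The critical-transaction prioritization can be modeled with a priority queue keyed on membership in the critical instruction table. Class and field names are illustrative.

    ```python
    import heapq

    class MemoryRequestQueue:
        """Orders memory requests so critical-marked loads are serviced first.

        `critical_ips` stands in for the critical instruction table: load
        instruction pointers whose requests should carry the critical indicator.
        """
        def __init__(self, critical_ips):
            self.critical_ips = set(critical_ips)
            self._heap = []
            self._seq = 0   # FIFO tiebreaker within a priority class

        def enqueue(self, ip, addr):
            critical = ip in self.critical_ips
            # Priority 0 (critical) sorts ahead of priority 1 (normal).
            heapq.heappush(self._heap, (0 if critical else 1, self._seq, addr))
            self._seq += 1

        def dequeue(self):
            _, _, addr = heapq.heappop(self._heap)
            return addr
    ```

    A request from an instruction pointer in the table jumps ahead of earlier pending non-critical requests, while requests within the same class stay in arrival order.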
  • Publication number: 20130262779
    Abstract: Profiling and analyzing modules may be combined with hardware modules to identify a likelihood that a particular region of code in a computer program contains data that would benefit from prefetching. Those regions of code that would not benefit from prefetching may also be identified. Once a region of code has been identified, a hardware prefetcher may be selectively enabled or disabled when executing code in the identified code region. In some instances, once a processing device finishes executing code in the identified code region, the state of the hardware prefetcher may then be switched back to its original state. Systems, methods, and media are provided.
    Type: Application
    Filed: March 30, 2012
    Publication date: October 3, 2013
    Inventors: Jayaram Bobba, Ryan Carlson, Jeffrey Cook, Abhinav Das, Jason Horihan, Wei Li, Suresh Srinivas, Sreenivas Subramoney, Krishnaswamy Viswanathan
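    The enable/restore behavior around a profiled region can be sketched as a small state machine. The `benefit` map and region names are assumptions for the example.

    ```python
    class RegionAwarePrefetchControl:
        """Toggle a hardware-prefetcher flag when entering/leaving profiled regions.

        `benefit` maps region id -> whether profiling judged prefetching helpful
        in that region; on region exit the prefetcher is restored to the state
        it had before entry.
        """
        def __init__(self, benefit, enabled=True):
            self.benefit = benefit
            self.enabled = enabled
            self._saved = []

        def enter_region(self, region):
            self._saved.append(self.enabled)      # remember state to restore later
            if region in self.benefit:
                self.enabled = self.benefit[region]

        def exit_region(self):
            self.enabled = self._saved.pop()      # switch back to the original state
    ```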
  • Publication number: 20130166846
    Abstract: Some implementations disclosed herein provide techniques and arrangements for a hierarchy-aware replacement policy for a last-level cache. A detector may be used to provide the last-level cache with information about blocks in a lower-level cache. For example, the detector may receive a notification identifying a block evicted from the lower-level cache. The notification may include a category associated with the block. The detector may identify a request that caused the block to be filled into the lower-level cache. The detector may determine whether one or more statistics associated with the category satisfy a threshold. In response to determining that the one or more statistics associated with the category satisfy the threshold, the detector may send an indication to the last-level cache that the block is a candidate for eviction from the last-level cache.
    Type: Application
    Filed: December 20, 2012
    Publication date: June 27, 2013
    Inventors: Jayesh Gaur, Mainak Chaudhuri, Sreenivas Subramoney, Nithiyanandan Bashyam, Joseph Nuzman
  • Publication number: 20120254550
    Abstract: An apparatus and method are described for implementing an exclusive lower level cache (LLC) policy within a computer processor. For example, one embodiment of a computer processor comprises: a mid-level cache circuit (MLC) for storing a first set of cache lines containing instructions and/or data; a lower level cache circuit (LLC) for storing a second set of cache lines of instructions and/or data; and an insertion circuit for implementing a policy for inserting or replacing cache lines within the LLC based on values of use recency and use frequency associated with the lines.
    Type: Application
    Filed: April 1, 2011
    Publication date: October 4, 2012
    Inventors: Jayesh Gaur, Mainak Chaudhuri, Sreenivas Subramoney
  • Patent number: 7577947
    Abstract: Methods and apparatus to dynamically insert prefetch instructions are disclosed. In an example method, one or more samples associated with cache misses are identified from a performance monitoring unit in a processor system. Based on sample information associated with the one or more samples, delinquent information is generated. To dynamically insert one or more prefetch instructions, a prefetch point is identified based on the delinquent information.
    Type: Grant
    Filed: December 19, 2003
    Date of Patent: August 18, 2009
    Assignee: Intel Corporation
    Inventors: Sreenivas Subramoney, Mauricio J. Serrano, Richard L. Hudson, Ali-Reza Adl-Tabatabai
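    The first step of the method above, turning PMU miss samples into delinquent-load information, can be sketched like this. The 10% share threshold is an arbitrary illustrative value.

    ```python
    from collections import Counter

    def delinquent_loads(miss_samples, threshold=0.1):
        """Identify delinquent loads from PMU cache-miss samples.

        `miss_samples` is a sequence of instruction pointers, one per sampled
        miss. A load whose share of sampled misses exceeds `threshold` becomes
        a candidate around which a prefetch point would be chosen.
        """
        counts = Counter(miss_samples)
        total = sum(counts.values())
        return {ip for ip, n in counts.items() if n / total > threshold}
    ```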
  • Patent number: 7490117
    Abstract: Techniques are described for optimizing memory management in a processor system. The techniques may be implemented on processors that include on-chip performance monitoring and on systems where an external performance monitor is coupled to a processor. Processors that include a Performance Monitoring Unit (PMU) are examples. The PMU may store data on read and write cache misses, as well as data on translation lookaside buffer (TLB) misses. The data from the PMU is used to determine if any memory regions within a memory heap are delinquent memory regions, i.e., regions exhibiting high numbers of memory problems or stalls. If delinquent memory regions are found, the memory manager, such as a garbage collection routine, can efficiently optimize memory performance as well as the mutators' performance by improving the layout of objects in the heap. In this way, memory management routines may be focused based on dynamic and real-time memory performance data.
    Type: Grant
    Filed: December 31, 2003
    Date of Patent: February 10, 2009
    Assignee: Intel Corporation
    Inventors: Sreenivas Subramoney, Richard Hudson, Mauricio Serrano, Ali-Reza Adl-Tabatabai
  • Patent number: 7389385
    Abstract: Methods and apparatus to insert prefetch instructions based on garbage collector analysis and compiler analysis are disclosed. In an example method, one or more batches of samples associated with cache misses from a performance monitoring unit in a processor system are received. One or more samples from the one or more batches of samples based on delinquent information are selected. A performance impact indicator associated with the one or more samples is generated. Based on the performance indicator, at least one of a garbage collector analysis and a compiler analysis is initiated to identify one or more delinquent paths. Based on the at least one of the garbage collector analysis and the compiler analysis, one or more prefetch points to insert prefetch instructions are identified.
    Type: Grant
    Filed: December 19, 2003
    Date of Patent: June 17, 2008
    Assignee: Intel Corporation
    Inventors: Mauricio J. Serrano, Sreenivas Subramoney, Richard L. Hudson, Ali-Reza Adl-Tabatabai
  • Patent number: 7197521
    Abstract: An arrangement is provided for using bit vector toggling to achieve concurrent mark-sweep garbage collection in a managed runtime system. A heap may be divided into a number of heap blocks. Each heap block may contain a mark bit vector pointer, a sweep bit vector pointer, and two bit vectors, of which one may be initially pointed to by the mark bit vector pointer and used for marking and the other may be initially pointed to by the sweep bit vector pointer and used for sweeping. At the end of the marking phase for a heap block, the bit vector used for marking and the bit vector used for sweeping may be toggled so that the marking and sweeping phases may proceed concurrently, and both phases may proceed concurrently with the mutators.
    Type: Grant
    Filed: November 21, 2003
    Date of Patent: March 27, 2007
    Assignee: Intel Corporation
    Inventors: Sreenivas Subramoney, Richard Hudson
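    The bit-vector toggling scheme can be sketched with a small model of one heap block. Block size, word granularity, and method names are illustrative.

    ```python
    class HeapBlock:
        """Heap block with two bit vectors whose roles are swapped each GC cycle.

        One vector is filled by the marker while the other, holding the previous
        cycle's marks, is consumed by the sweeper, letting marking and sweeping
        proceed concurrently.
        """
        def __init__(self, words=64):
            self._vectors = [[0] * words, [0] * words]
            self.mark_idx = 0       # vector the marker writes
            self.sweep_idx = 1      # vector the sweeper reads

        def mark(self, word):
            self._vectors[self.mark_idx][word] = 1

        def sweep_live(self):
            """Words recorded live in the previous marking cycle."""
            return [w for w, bit in enumerate(self._vectors[self.sweep_idx]) if bit]

        def toggle(self):
            """End of marking: swap roles and clear the vector given to the marker."""
            self.mark_idx, self.sweep_idx = self.sweep_idx, self.mark_idx
            self._vectors[self.mark_idx] = [0] * len(self._vectors[self.mark_idx])
    ```

    After `toggle()`, the sweeper sees exactly the marks from the finished cycle while the marker starts over on a cleared vector, so neither phase blocks the other.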
  • Publication number: 20060143421
    Abstract: Techniques are described for optimizing memory management in a processor system. The techniques may be implemented on processors that include on-chip performance monitoring and on systems where an external performance monitor is coupled to a processor. Processors that include a Performance Monitoring Unit (PMU) are examples. The PMU may store data on read and write cache misses, as well as data on translation lookaside buffer (TLB) misses. The data from the PMU is used to determine if any memory regions within a memory heap are delinquent memory regions, i.e., regions exhibiting high numbers of memory problems or stalls. If delinquent memory regions are found, the memory manager, such as a garbage collection routine, can efficiently optimize memory performance as well as the mutators' performance by improving the layout of objects in the heap. In this way, memory management routines may be focused based on dynamic and real-time memory performance data.
    Type: Application
    Filed: December 31, 2003
    Publication date: June 29, 2006
    Applicant: INTEL CORPORATION
    Inventors: Sreenivas Subramoney, Richard Hudson, Mauricio Serrano, Ali-Reza Adl-Tabatabai
  • Patent number: 6950837
    Abstract: An improved moving garbage collection algorithm is described. The algorithm allows efficient use of non-temporal stores to reduce the required time for garbage collection. Non-temporal stores (or copies) are a CPU feature that allows data objects to be copied within main memory without interfering with or polluting the cache memory. The live objects copied to new memory locations will not be accessed again in the near future and therefore need not be copied to cache. This avoids copy operations and avoids taxing the CPU with cache determinations. In a preferred embodiment, the algorithm of the present invention exploits the fact that live data objects will be stored to consecutive new memory locations in order to perform streaming copies. Since each copy procedure has an associated CPU overhead, the process of streaming the copies reduces the degradation of system performance and thus reduces the time for garbage collection.
    Type: Grant
    Filed: June 19, 2001
    Date of Patent: September 27, 2005
    Assignee: Intel Corporation
    Inventors: Sreenivas Subramoney, Richard L. Hudson
  • Publication number: 20050198088
    Abstract: An arrangement is provided for using only one bit vector per heap block to improve the concurrency and parallelism of mark-sweep-compact garbage collection in a managed runtime system. A heap may be divided into a number of heap blocks. Each heap block has only one bit vector used for marking, compacting, and sweeping, and in that bit vector only one bit is needed per word or double word in that heap block. Both marking and sweeping phases may proceed concurrently with the execution of applications. Because all information needed for marking, compacting, and sweeping is contained in a bit vector for a heap block, multiple heap blocks may be marked, compacted, or swept in parallel through multiple garbage collection threads. Only a portion of heap blocks may be selected for compaction during each garbage collection to make the compaction incremental, reducing the disruptiveness of compaction to running applications and achieving a fine load balance of the garbage collection process.
    Type: Application
    Filed: March 3, 2004
    Publication date: September 8, 2005
    Inventors: Sreenivas Subramoney, Richard Hudson
  • Publication number: 20050138329
    Abstract: Methods and apparatus to dynamically insert prefetch instructions are disclosed. In an example method, one or more samples associated with cache misses are identified from a performance monitoring unit in a processor system. Based on sample information associated with the one or more samples, delinquent information is generated. To dynamically insert one or more prefetch instructions, a prefetch point is identified based on the delinquent information.
    Type: Application
    Filed: December 19, 2003
    Publication date: June 23, 2005
    Inventors: Sreenivas Subramoney, Mauricio Serrano, Richard Hudson, Ali-Reza Adl-Tabatabai
  • Publication number: 20050138294
    Abstract: Methods and apparatus to insert prefetch instructions based on garbage collector analysis and compiler analysis are disclosed. In an example method, one or more batches of samples associated with cache misses from a performance monitoring unit in a processor system are received. One or more samples from the one or more batches of samples based on delinquent information are selected. A performance impact indicator associated with the one or more samples is generated. Based on the performance indicator, at least one of a garbage collector analysis and a compiler analysis is initiated to identify one or more delinquent paths. Based on the at least one of the garbage collector analysis and the compiler analysis, one or more prefetch points to insert prefetch instructions are identified.
    Type: Application
    Filed: December 19, 2003
    Publication date: June 23, 2005
    Inventors: Mauricio Serrano, Sreenivas Subramoney, Richard Hudson, Ali-Reza Adl-Tabatabai
  • Publication number: 20050114413
    Abstract: An arrangement is provided for using bit vector toggling to achieve concurrent mark-sweep garbage collection in a managed runtime system. A heap may be divided into a number of heap blocks. Each heap block may contain a mark bit vector pointer, a sweep bit vector pointer, and two bit vectors, of which one may be initially pointed to by the mark bit vector pointer and used for marking and the other may be initially pointed to by the sweep bit vector pointer and used for sweeping. At the end of the marking phase for a heap block, the bit vector used for marking and the bit vector used for sweeping may be toggled so that the marking and sweeping phases may proceed concurrently, and both phases may proceed concurrently with the mutators.
    Type: Application
    Filed: November 21, 2003
    Publication date: May 26, 2005
    Inventors: Sreenivas Subramoney, Richard Hudson
  • Publication number: 20040205313
    Abstract: A method is provided including determining if an object has been published previously, identifying the object as public according to whether the object has been published previously, identifying objects reachable from the object as public according to whether the object has been published previously, and publishing the object. An apparatus for performing the method, and an article including a machine-accessible medium that provides instructions that, if executed by a processor, will cause the processor to perform the method are also provided.
    Type: Application
    Filed: April 10, 2003
    Publication date: October 14, 2004
    Inventors: Richard L. Hudson, Hong Wang, John Shen, Weldon Washburn, James M. Stichnoth, Sreenivas Subramoney, Chris Newburn
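    The transitive publication step described in the abstract can be sketched as a reachability traversal. The data-structure shape (an explicit object graph as a dict) is an assumption for the example.

    ```python
    def publish(obj, reachable_from, published):
        """Mark an object and everything reachable from it as public.

        `reachable_from` maps object -> directly referenced objects; `published`
        is the set of already-public objects. Objects published previously are
        not re-traversed.
        """
        if obj in published:
            return published          # already published: nothing to do
        stack = [obj]
        while stack:
            o = stack.pop()
            if o in published:
                continue
            published.add(o)
            stack.extend(reachable_from.get(o, []))
        return published
    ```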
  • Patent number: 6662274
    Abstract: A method for creating a mark stack for use in a moving garbage collection algorithm is described. The algorithm of the present invention creates a mark stack to implement a MGCA. The algorithm allows efficient use of cache memory prefetch features to reduce the required time to complete the mark stack and thus reduce the time required for garbage collection. Instructions are issued to prefetch data objects that will be examined in the future, so that by the time the scan pointer reaches a data object, the cache lines for that object are already filled. At some point after the data object is prefetched, the address location of associated data objects is likewise prefetched. Finally, the associated data objects located at the previously fetched addresses are prefetched. This reduces garbage collection time by continually supplying the garbage collector with a stream of preemptively prefetched data objects that require scanning.
    Type: Grant
    Filed: June 20, 2001
    Date of Patent: December 9, 2003
    Assignee: Intel Corporation
    Inventors: Sreenivas Subramoney, Richard L. Hudson
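    The prefetch-ahead scan loop described above can be sketched in Python. Real code would issue hardware prefetch hints; here the prefetches are merely recorded, and all names and the prefetch distance are illustrative.

    ```python
    def scan_mark_stack(mark_stack, children_of, prefetch_distance=4):
        """Scan a mark stack, noting (simulated) prefetches ahead of the scan pointer.

        The object `prefetch_distance` slots ahead of the scan pointer is
        "prefetched" so its cache lines would arrive before the pointer reaches
        it. Children discovered during scanning are pushed onto the stack.
        """
        prefetched = []
        scanned = []
        i = 0
        while i < len(mark_stack):
            ahead = i + prefetch_distance
            if ahead < len(mark_stack):
                prefetched.append(mark_stack[ahead])   # stand-in for a prefetch hint
            obj = mark_stack[i]
            scanned.append(obj)
            for child in children_of.get(obj, []):     # push newly found objects
                if child not in mark_stack:
                    mark_stack.append(child)
            i += 1
        return scanned, prefetched
    ```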
  • Publication number: 20020199065
    Abstract: A method for creating a mark stack for use in a moving garbage collection algorithm is described. The algorithm of the present invention creates a mark stack to implement a MGCA. The algorithm allows efficient use of cache memory prefetch features to reduce the required time to complete the mark stack and thus reduce the time required for garbage collection. Instructions are issued to prefetch data objects that will be examined in the future, so that by the time the scan pointer reaches a data object, the cache lines for that object are already filled. At some point after the data object is prefetched, the address location of associated data objects is likewise prefetched. Finally, the associated data objects located at the previously fetched addresses are prefetched. This reduces garbage collection time by continually supplying the garbage collector with a stream of preemptively prefetched data objects that require scanning.
    Type: Application
    Filed: June 20, 2001
    Publication date: December 26, 2002
    Inventors: Sreenivas Subramoney, Richard L. Hudson