Patents by Inventor Sreenivas Subramoney

Sreenivas Subramoney has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 10776270
    Abstract: A memory-efficient last level cache (LLC) architecture is described. A processor implementing a LLC architecture may include a processor core, a last level cache (LLC) operatively coupled to the processor core, and a cache controller operatively coupled to the LLC. The cache controller is to monitor a bandwidth demand of a channel between the processor core and a dynamic random-access memory (DRAM) device associated with the LLC. The cache controller is further to perform a first defined number of consecutive reads from the DRAM device when the bandwidth demand exceeds a first threshold value and perform a first defined number of consecutive writes of modified lines from the LLC to the DRAM device when the bandwidth demand exceeds the first threshold value.
    Type: Grant
    Filed: December 17, 2018
    Date of Patent: September 15, 2020
    Assignee: Intel Corporation
    Inventors: Jayesh Gaur, Ayan Mandal, Anant V. Nori, Sreenivas Subramoney
  • Publication number: 20200285580
    Abstract: In one embodiment, an apparatus comprises a processor and a memory controller. The processor is to identify a memory access operation associated with a memory location of a memory. The processor is further to determine that a cache memory does not contain data associated with the memory location. The processor is further to send a memory access notification to a memory controller via a first transmission path. The processor is further to send a memory access request to the memory controller via a second transmission path, wherein the second transmission path is slower than the first transmission path. The memory controller is to receive the memory access notification via the first transmission path, and send a memory activation request based on the memory access notification, wherein the memory activation request comprises a request to activate a memory bank associated with the memory location.
    Type: Application
    Filed: June 30, 2017
    Publication date: September 10, 2020
    Applicant: Intel Corporation
    Inventors: Lavanya Subramanian, Sreenivas Subramoney, Anant Vithal Nori
  • Patent number: 10754655
    Abstract: A processing device includes a branch IP table and branch predication circuitry coupled to the branch IP table. The branch predication circuitry to: determine a dynamic convergence point in a conditional branch of set of instructions; store the dynamic convergence point in the branch IP table; fetch a first and second speculative path of the conditional branch; while determining which of the first speculative path and the second speculative path is a taken path of the conditional branch and determining whether a dynamic convergence point is fetched corresponding to the stored dynamic convergence point, stall scheduling of instructions of the first speculative path and the second speculative path; and in response to determining that one of the first speculative path and the second speculative path is the taken path and the fetched dynamic convergence point corresponds to the stored convergence point, resume scheduling of the instructions of the taken path.
    Type: Grant
    Filed: June 28, 2018
    Date of Patent: August 25, 2020
    Assignee: Intel Corporation
    Inventors: Adarsh Chauhan, Hong Wang, Jayesh Gaur, Zeev Sperber, Sumeet Bandishte, Lihu Rappoport, Stanislav Shwartsman, Kamil Garifullin, Sreenivas Subramoney, Adi Yoaz
  • Patent number: 10719355
    Abstract: A processor including an execution unit, an instruction scheduler circuit to identify a first instruction of an instruction stream, identify a second instruction on which execution of the first instruction depends, and assign a first dispatch priority value to the first instruction and the second instruction, and a dispatch circuit to dispatch, based on the first dispatch priority value, the first instruction and the second instruction to an instruction execution circuit.
    Type: Grant
    Filed: February 7, 2018
    Date of Patent: July 21, 2020
    Assignee: Intel Corporation
    Inventors: Pooja Roy, Jayesh Gaur, Sreenivas Subramoney, Zeev Sperber, Alexandr Titov, Lihu Rappoport, Stanislav Shwartsman, Hong Wang, Adi Yoaz, Ronak Singhal, Robert S. Chappell
  • Publication number: 20200226203
    Abstract: A disclosed apparatus to multiply matrices includes a compute engine. The compute engine includes multipliers in a two dimensional array that has a plurality of array locations defined by columns and rows. The apparatus also includes a plurality of adders in columns. A broadcast interconnect between a cache and the multipliers broadcasts a first set of operand data elements to multipliers in the rows of the array. A unicast interconnect unicasts a second set of operands between a data buffer and the multipliers. The multipliers multiply the operands to generate a plurality of outputs, and the adders add the outputs generated by the multipliers.
    Type: Application
    Filed: March 27, 2020
    Publication date: July 16, 2020
    Inventors: Biji George, Om Ji Omer, Dipan Kumar Mandal, Cormac Brick, Lance Hacking, Sreenivas Subramoney, Belliappa Kuttanna
  • Patent number: 10713053
    Abstract: An apparatus and method for adaptive spatial accelerated prefetching. For example, one embodiment of an apparatus comprises: execution circuitry to execute instructions and process data; a Level 2 (L2) cache to store at least a portion of the data; and a prefetcher to prefetch data from a memory subsystem to the L2 cache in anticipation of the data being needed by the execution unit to execute one or more of the instructions, the prefetcher comprising a buffer to store one or more prefetched memory pages or portions thereof, and signature data indicating detected patterns of access to the one or more prefetched memory pages; wherein the prefetcher is to prefetch one or more cache lines based on the signature data.
    Type: Grant
    Filed: June 30, 2018
    Date of Patent: July 14, 2020
    Assignee: Intel Corporation
    Inventors: Rahul Bera, Anant Vithal Nori, Sreenivas Subramoney, Hong Wang
  • Publication number: 20200210339
    Abstract: System and method for prefetching pointer-referenced data. A method embodiment includes: tracking a plurality of load instructions which includes a first load instruction to access a first data that identifies a first memory location; detecting a second load instruction which accesses a second memory location for a second data, the second memory location matching the first memory location identified by the first data; responsive to the detecting, updating a list of pointer load instructions to include information identifying the first load instruction as a pointer load instruction; prefetching a third data for a third load instruction prior to executing the third load instruction; identifying the third load instruction as a pointer load instruction based on information from the list of pointer load instructions and responsively prefetching a fourth data from a fourth memory location, wherein the fourth memory location is identified by the third data.
    Type: Application
    Filed: December 27, 2018
    Publication date: July 2, 2020
    Inventors: Sreenivas Subramoney, Stanislav Shwartsman, Anant Nori, Shankar Balachandran, Elad Shtiegmann, Vineeth Mekkat, Manjunath Shevgoor, Sourabh Alurkar
  • Publication number: 20200201644
    Abstract: In one embodiment, an apparatus includes: a value prediction storage including a plurality of entries each to store address information of an instruction, a value prediction for the instruction and a confidence value for the value prediction; and a control circuit coupled to the value prediction storage. In response to an instruction address of a first instruction, the control circuit is to access a first entry of the value prediction storage to obtain a first value prediction associated with the first instruction and control execution of a second instruction based at least in part on the first value prediction. Other embodiments are described and claimed.
    Type: Application
    Filed: December 21, 2018
    Publication date: June 25, 2020
    Inventors: Sumeet Bandishte, Jayesh Gaur, Sreenivas Subramoney, Hong Wang
  • Publication number: 20200192670
    Abstract: In one embodiment, an apparatus includes a context-based prediction circuit to receive an instruction address for a branch instruction and a plurality of predictions associated with the branch instruction from a global prediction circuit. The context-based prediction circuit may include: a table having a plurality of entries each to store a context prediction value for a corresponding branch instruction; and a control circuit to generate, for the branch instruction, an index value to index into the table, the control circuit to generate the index value based at least in part on at least some of the plurality of predictions associated with the branch instruction and the instruction address for the branch instruction. Other embodiments are described and claimed.
    Type: Application
    Filed: December 17, 2018
    Publication date: June 18, 2020
    Inventors: Saurabh Gupta, Niranjan Soundararajan, Ragavendra Natarajan, Jared Warner Stark, IV, Lihu Rappoport, Sreenivas Subramoney
  • Publication number: 20200169383
    Abstract: A processor comprises a first register to store an encoded pointer to a memory location. First context information is stored in first bits of the encoded pointer and a slice of a linear address of the memory location is stored in second bits of the encoded pointer. The processor also includes circuitry to execute a memory access instruction to obtain a physical address of the memory location, access encrypted data at the memory location, derive a first tweak based at least in part on the encoded pointer, and generate a keystream based on the first tweak and a key. The circuitry is to further execute the memory access instruction to store state information associated with memory access instruction in a first buffer, and to decrypt the encrypted data based on the keystream. The keystream is to be generated at least partly in parallel with accessing the encrypted data.
    Type: Application
    Filed: January 29, 2020
    Publication date: May 28, 2020
    Applicant: Intel Corporation
    Inventors: David M. Durham, Michael LeMay, Michael E. Kounavis, Santosh Ghosh, Sergej Deutsch, Anant Vithal Nori, Jayesh Gaur, Sreenivas Subramoney, Karanvir S. Grewal
  • Patent number: 10664281
    Abstract: Methods and apparatuses relating to dynamic asymmetric scaling of branch predictor tables are described. Branch predictor circuits to perform dynamic asymmetric scaling of branch predictor tables are also described.
    Type: Grant
    Filed: September 29, 2018
    Date of Patent: May 26, 2020
    Assignee: INTEL CORPORATION
    Inventors: Ragavendra Natarajan, Niranjan Soundararajan, Saurabh Gupta, Sreenivas Subramoney
  • Patent number: 10642621
    Abstract: In one embodiment, a branch prediction circuit includes: a first bimodal predictor having a first plurality of entries each to store first prediction information for a corresponding branch instruction; a global predictor having a plurality of global entries each to store global prediction information for a corresponding branch instruction; a second bimodal predictor having a second plurality of entries each to store second prediction information for a corresponding branch instruction; a monitoring table having a plurality of monitoring entries each to store a counter value based on the second prediction information for a corresponding branch instruction; and a control circuit to allocate a global entry within the global predictor based at least in part on the counter value of a monitoring entry of the monitoring table for a corresponding branch instruction. Other embodiments are described and claimed.
    Type: Grant
    Filed: December 29, 2017
    Date of Patent: May 5, 2020
    Assignee: Intel Corporation
    Inventors: Ragavendra Natarajan, Niranjan Soundararajan, Sreenivas Subramoney
  • Publication number: 20200104137
    Abstract: Methods and apparatuses relating to dynamic asymmetric scaling of branch predictor tables are described. Branch predictor circuits to perform dynamic asymmetric scaling of branch predictor tables are also described.
    Type: Application
    Filed: September 29, 2018
    Publication date: April 2, 2020
    Inventors: Ragavendra Natarajan, NIiranjan Soundararajan, Saurabh Gupta, Sreenivas Subramoney
  • Patent number: 10579414
    Abstract: Embodiments of apparatuses, methods, and systems for misprediction-triggered local history-based branch prediction are described. In one embodiments, an apparatus includes a current pattern table and a local pattern table. The current pattern table has a plurality of entries, each entry in which to store a plurality of pattern lengths of a current pattern of one of a plurality of branch instructions. The local pattern table is to provide a first branch prediction based on the current pattern.
    Type: Grant
    Filed: April 1, 2017
    Date of Patent: March 3, 2020
    Assignee: Intel Corporation
    Inventors: Niranjan K. Soundararajan, Saurabh Gupta, Sreenivas Subramoney, Rahul Pal, Ragavendra Natarajan, Daniel Deng, Jared W. Stark, Ronak Singhal, Hong Wang
  • Patent number: 10559348
    Abstract: In one embodiment, an apparatus includes a memory array having a plurality of memory cells, a plurality of bitlines coupled to the plurality of memory cells, and a plurality of wordlines coupled to the plurality of memory cells. The memory array may further include a sense amplifier circuit to sense and amplify a value stored in a memory cell of the plurality of memory cells. The sense amplifier circuit may include: a buffer circuit to store the value, the buffer circuit coupled between a first internal node of the sense amplifier circuit and a second internal node of the sense amplifier circuit; and an equalization circuit to equalize the first internal node and the second internal node while the sense amplifier circuit is decoupled from the memory array. Other embodiments are described and claimed.
    Type: Grant
    Filed: May 16, 2018
    Date of Patent: February 11, 2020
    Assignee: Intel Corporation
    Inventors: Lavanya Subramanian, Kaushik Vaidyanathan, Anant Nori, Sreenivas Subramoney, Tanay Karnik
  • Publication number: 20200004542
    Abstract: A processing device includes a branch IP table and branch predication circuitry coupled to the branch IP table. The branch predication circuitry to: determine a dynamic convergence point in a conditional branch of set of instructions; store the dynamic convergence point in the branch IP table; fetch a first and second speculative path of the conditional branch; while determining which of the first speculative path and the second speculative path is a taken path of the conditional branch and determining whether a dynamic convergence point is fetched corresponding to the stored dynamic convergence point, stall scheduling of instructions of the first speculative path and the second speculative path; and in response to determining that one of the first speculative path and the second speculative path is the taken path and the fetched dynamic convergence point corresponds to the stored convergence point, resume scheduling of the instructions of the taken path.
    Type: Application
    Filed: June 28, 2018
    Publication date: January 2, 2020
    Inventors: Adarsh Chauhan, Jayesh Gaur, Zeev Sperber, Sumeet Bandishte, Lihu Rappoport, Stanislav Shwartsman, Kamil Garifullin, Sreenivas Subramoney, Adi Yoaz, Hong Wang
  • Patent number: 10496413
    Abstract: A processor includes a memory to hold a buffer to store data dependencies comprising nodes and edges for each of a plurality of micro-operations. The nodes include a first node for dispatch, a second node for execution, and a third node for commit. A detector circuit is to queue, in the buffer, the nodes of a micro-operation; add, to determine a node weight for each of the nodes of the micro-operation, an edge weight to a previous node weight of a connected micro-operation that yields a maximum node weight for the node, wherein the node weight comprises a number of execution cycles of an OOO pipeline of the processor and the edge weight comprises a number of execution cycles to execute the connected micro-operation; and identify, as a critical path, a path through the data dependencies that yields the maximum node weight for the micro-operation.
    Type: Grant
    Filed: February 15, 2017
    Date of Patent: December 3, 2019
    Assignee: Intel Corporation
    Inventors: Jayesh Gaur, Pooja Roy, Sreenivas Subramoney, Hong Wang, Ronak Singhal
  • Publication number: 20190355411
    Abstract: In one embodiment, an apparatus includes a memory array having a plurality of memory cells, a plurality of bitlines coupled to the plurality of memory cells, and a plurality of wordlines coupled to the plurality of memory cells. The memory array may further include a sense amplifier circuit to sense and amplify a value stored in a memory cell of the plurality of memory cells. The sense amplifier circuit may include: a buffer circuit to store the value, the buffer circuit coupled between a first internal node of the sense amplifier circuit and a second internal node of the sense amplifier circuit; and an equalization circuit to equalize the first internal node and the second internal node while the sense amplifier circuit is decoupled from the memory array. Other embodiments are described and claimed.
    Type: Application
    Filed: May 16, 2018
    Publication date: November 21, 2019
    Inventors: Lavanya Subramanian, Kaushik Vaidyanathan, Anant Nori, Sreenivas Subramoney, Tanay Karnik
  • Publication number: 20190310853
    Abstract: An apparatus and method for adaptive spatial accelerated prefetching. For example, one embodiment of an apparatus comprises: execution circuitry to execute instructions and process data; a Level 2 (L2) cache to store at least a portion of the data; and a prefetcher to prefetch data from a memory subsystem to the L2 cache in anticipation of the data being needed by the execution unit to execute one or more of the instructions, the prefetcher comprising a buffer to store one or more prefetched memory pages or portions thereof, and signature data indicating detected patterns of access to the one or more prefetched memory pages; wherein the prefetcher is to prefetch one or more cache lines based on the signature data.
    Type: Application
    Filed: June 30, 2018
    Publication date: October 10, 2019
    Inventors: RAHUL BERA, ANANT VITHAL NORI, SREENIVAS SUBRAMONEY, HONG WANG
  • Patent number: 10430198
    Abstract: One embodiment provides an apparatus. The apparatus includes a store direct dependent (SDD) branch prediction circuitry and an SDD management circuitry. The store direct dependent (SDD) branch prediction circuitry is to store an SDD branch table. The SDD branch table is to store at least one record. Each record includes a branch instruction pointer (IP) field, a load IP field, a store IP field, a comparison info field and at least one of a store value field and/or a predicted outcome field. The SDD management circuitry is to populate the SDD branch table at runtime and to override a baseline branch prediction associated with an incoming branch IP with an SDD branch prediction, if the SDD branch table contains a first record populated with the incoming branch IP and at least one of a store value and/or an SDD predicted outcome.
    Type: Grant
    Filed: January 12, 2018
    Date of Patent: October 1, 2019
    Assignee: Intel Corporation
    Inventors: Saurabh Gupta, Rahul Pal, Niranjan Soundararajan, Ragavendra Natarajan, Sreenivas Subramoney