Patents by Inventor Sreenivas Subramoney

Sreenivas Subramoney has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12020033
    Abstract: Apparatus and method for memorizing repeat function calls are described herein. An apparatus embodiment includes: uop buffer circuitry to identify a function for memorization based on retiring micro-operations (uops) from a processing pipeline; memorization retirement circuitry to generate a signature of the function which includes input and output data of the function; a memorization data structure to store the signature; and predictor circuitry to detect an instance of the function to be executed by the processing pipeline and to responsively exclude a first subset of uops associated with the instance from execution when a confidence level associated with the function is above a threshold. One or more instructions that are data-dependent on execution of the instance is then provided with the output data of the function from the memorization data structure.
    Type: Grant
    Filed: December 24, 2020
    Date of Patent: June 25, 2024
    Assignee: Intel Corporation
    Inventors: Niranjan Kumar Soundararajan, Sreenivas Subramoney, Jayesh Gaur, S R Swamy Saranam Chongala
  • Publication number: 20240202000
    Abstract: Techniques and mechanisms for efficiently saving and recovering state of a processor core. In an embodiment, a processor core fetches and decodes a first instruction to generate a first decoded instruction, wherein the first instruction comprises a first opcode which corresponds to one or more components of the processor core. Execution of the first instruction comprises saving microarchitectural state of the one or more components to a memory of the core. In another embodiment, a processor core fetches and decodes a second instruction to generate a second decoded instruction, wherein the second instruction comprises a second opcode which corresponds to the same one or more components. Execution of the second instruction comprises restoring the microarchitectural state from the memory to the one or more components.
    Type: Application
    Filed: December 19, 2022
    Publication date: June 20, 2024
    Applicant: Intel Corporation
    Inventors: Niranjan Soundararajan, Sreenivas Subramoney
  • Publication number: 20240201949
    Abstract: Systems, apparatuses and methods may provide for technology that includes a compute-in-memory (CiM) enabled memory array to conduct digital bit-serial multiply and accumulate (MAC) operations on multi-bit input data and weight data stored in the CiM enabled memory array, an adder tree coupled to the CiM enabled memory array, an accumulator coupled to the adder tree, and an input bit selection stage coupled to the CiM enabled memory array, wherein the input bit selection stage restricts serial bit selection on the multi-bit input data to non-zero values during the digital MAC operations.
    Type: Application
    Filed: February 28, 2024
    Publication date: June 20, 2024
    Inventors: Sagar Varma Sayyaparaju, Om Ji Omer, Sreenivas Subramoney
  • Patent number: 11972126
    Abstract: Technologies disclosed herein provide one example of a system that includes processor circuitry to be communicatively coupled to a memory circuitry. The processor circuitry is to receive a memory access request corresponding to an application for access to an address range in a memory allocation of the memory circuitry and to locate a metadata region within the memory allocation. The processor circuitry is also to, in response to a determination that the address range includes at least a portion of the metadata region, obtain first metadata stored in the metadata region, use the first metadata to determine an alternate memory address in a relocation region, and read, at the alternate memory address, displaced data from the portion of the metadata region included in the address range of the memory allocation. The address range includes one or more bytes of an expected allocation region of the memory allocation.
    Type: Grant
    Filed: September 10, 2021
    Date of Patent: April 30, 2024
    Assignee: Intel Corporation
    Inventors: David M. Durham, Michael D. LeMay, Sergej Deutsch, Joydeep Rakshit, Anant Vithal Nori, Jayesh Gaur, Sreenivas Subramoney
  • Patent number: 11949414
    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed to improve in-memory multiply and accumulate operations. An example apparatus includes a first multiplexer in a subarray of memory, the first multiplexer to receive first values representative of a column of a lookup table (LUT) including entries to represent products of four-bit numbers and return second values from an intersection of a row and the column of the LUT based on a first element of a first operand; shift and adder logic in the subarray, the shift and adder logic to shift the second values based on at least one of the first element of the first operand or a first element of a second operand; and accumulation storage in the subarray, the accumulation storage to store at least the shifted second values.
    Type: Grant
    Filed: December 22, 2020
    Date of Patent: April 2, 2024
    Assignee: INTEL CORPORATION
    Inventors: Gurpreet Singh Kalsi, Akshay Krishna Ramanathan, Kamlesh Pillai, Sreenivas Subramoney, Srivatsa Rangachar Srinivasa, Anirud Thyagharajan, Om Ji Omer, Saurabh Jain
  • Publication number: 20240103874
    Abstract: Methods and apparatus for instruction elimination through hardware driven memoization of loop instances. A hardware-based loop memoization technique learns repeating sequences of loops and transparently removes instructions for the loop instructions from instruction sequences while making their output available to dependent instructions as if the loop instructions had been executed. A path-based predictor is implemented at the front-end to predict these loop instances and remove their instructions from instruction sequences. A novel memoization prediction micro-operation (Uop) is inserted into the instruction sequence for instances of loops that are predicted to be memoized. The memoization prediction Uop is used to compare the input signature (expected set of input values for the loop) with the actual signature to determine correct and incorrect predictions.
    Type: Application
    Filed: September 23, 2022
    Publication date: March 28, 2024
    Inventors: Niranjan Kumar Soundararajan, Sreenivas Subramoney, Jayesh Gaur
  • Publication number: 20240103878
    Abstract: An example of an integrated circuit may include a first execution cluster, a second execution cluster that is one or more of narrower and shallower as compared to the first execution cluster, and circuitry to selectively steer instructions to the first execution cluster and the second execution cluster based on branch misprediction information. Other embodiments are disclosed and claimed.
    Type: Application
    Filed: September 26, 2022
    Publication date: March 28, 2024
    Applicant: Intel Corporation
    Inventors: Jayesh Gaur, Sufiyan Syed, Adithya Ranganathan, Sreenivas Subramoney
  • Patent number: 11941534
    Abstract: A system is provided that includes a bit vector-based distance counter circuitry configured to generate one or more bit vectors encoded with information about potential matches and edits between a read and a reference genome, wherein the read comprises an encoding of a fragment of deoxyribonucleic acid (DNA) encoded via bases G, A, T, C. The system further includes a bit vector-based traceback circuitry configured to divide the reference genome into one or more windows and to use the plurality of bit vectors to generate a traceback output for each of the one or more windows, wherein the traceback output comprises a match, a substitution, an insert, a delete, or a combination thereof, between the read and the one or more windows.
    Type: Grant
    Filed: December 28, 2019
    Date of Patent: March 26, 2024
    Assignee: Intel Corporation
    Inventors: Gurpreet Singh Kalsi, Anant V. Nori, Christopher Justin Hughes, Sreenivas Subramoney, Damla Senol
  • Patent number: 11874773
    Abstract: Systems, methods, and apparatuses relating to a dual spatial pattern prefetcher are described.
    Type: Grant
    Filed: December 28, 2019
    Date of Patent: January 16, 2024
    Assignee: Intel Corporation
    Inventors: Rahul Bera, Anant Vithal Nori, Sreenivas Subramoney
  • Publication number: 20230418757
    Abstract: Techniques and mechanisms for selectively increasing or decreasing an amount of cache resources which are to be available for use in the provisioning of decoded micro-operations in a processor. In an embodiment, a processor core comprises both a first cache which is dedicated to caching micro-operations, and a second cache which is coupled to receive data, or non-decoded instructions. The core further comprises circuitry to monitor one or more cache performance characteristics of the core. Based on the one or more cache performance characteristics, the circuitry performs an evaluation to determine whether to increase—or alternatively, to decrease—the size of a pool of one or more caches which are to be available to receive micro-operations. In another embodiment, the second cache is added to the pool based on an indication of an overutilization of the first cache.
    Type: Application
    Filed: June 22, 2022
    Publication date: December 28, 2023
    Applicant: Intel Corporation
    Inventors: Niranjan Soundararajan, Sreenivas Subramoney, Vishal Gupta, Neelu Shivprakash Kalani
  • Publication number: 20230409481
    Abstract: System and method for prefetching pointer-referenced data. A method embodiment includes: tracking a plurality of load instructions which includes a first load instruction to access a first data that identifies a first memory location; detecting a second load instruction which accesses a second memory location for a second data, the second memory location matching the first memory location identified by the first data; responsive to the detecting, updating a list of pointer load instructions to include information identifying the first load instruction as a pointer load instruction; prefetching a third data for a third load instruction prior to executing the third load instruction; identifying the third load instruction as a pointer load instruction based on information from the list of pointer load instructions and responsively prefetching a fourth data from a fourth memory location, wherein the fourth memory location is identified by the third data.
    Type: Application
    Filed: May 19, 2023
    Publication date: December 21, 2023
    Applicant: Intel Corporation
    Inventors: Sreenivas Subramoney, Stanislav Shwartsman, Anant Nori, Shankar Balachandran, Elad Shtiegmann, Vineeth Mekkat, Manjunath Shevgoor, Sourabh Alurkar
  • Patent number: 11847053
    Abstract: Systems, methods, and apparatuses relating to circuitry to implement a duplication resistant on-die irregular data prefetcher are described.
    Type: Grant
    Filed: March 27, 2020
    Date of Patent: December 19, 2023
    Assignee: Intel Corporation
    Inventors: Prathmesh Kallurkar, Anant Vithal Nori, Sreenivas Subramoney
  • Publication number: 20230333999
    Abstract: Systems, apparatuses and methods may provide for technology that includes a chip having a memory structure including compute hardware, a plurality of address decoders coupled to the compute hardware, and a hierarchical interconnect fabric coupled to the plurality of address decoders, and direct memory address (DMA) hardware positioned adjacent to one or more of the plurality of address decoders, wherein the DMA hardware is to conduct on-chip transfers of intermediate state data via the hierarchical interconnect fabric. Additionally, the chip may include logic to allocate address space in the chip to intermediate state data and store the intermediate state data to the allocated address space.
    Type: Application
    Filed: June 22, 2023
    Publication date: October 19, 2023
    Inventors: Om Ji Omer, Anirud Thyagharajan, Sreenivas Subramoney
  • Patent number: 11783170
    Abstract: Systems, apparatuses and methods may provide for technology that decodes data via an instruction that indicates a number of rulebooks to be processed, an input feature size, an output feature size, and a plurality of feature map base addresses, rearranges spatially distributed voxel output feature maps in the decoded data based on weight planes, and performs a channel-wise multiply-accumulate (MAC) operation on the rearranged spatially distributed voxel output feature maps to obtain an output, wherein the channel-wise MAC operation is performed as partial accumulations by a plurality of processing elements.
    Type: Grant
    Filed: January 25, 2023
    Date of Patent: October 10, 2023
    Assignee: INTEL CORPORATION
    Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Sreenivas Subramoney, Prashant Laddha, Om Ji Omer
  • Patent number: 11734174
    Abstract: Described is an low overhead method and apparatus to reconfigure a pair of buffered interconnect links to operate in one of these three modes—first mode (e.g., bandwidth mode), second mode (e.g., latency mode), and third mode (e.g., energy mode). In bandwidth mode, each link in the pair buffered interconnect links carries a unique signal from source to destination. In latency mode, both links in the pair carry the same signal from source to destination, where one link in the pair is “primary” and other is called the “assist”. Temporal alignment of transitions in this pair of buffered interconnects reduces the effective capacitance of primary, thereby reducing delay or latency. In energy mode, one link in the pair, the primary, alone carries a signal, while the other link in the pair is idle. An idle neighbor on one side reduces energy consumption of the primary.
    Type: Grant
    Filed: September 19, 2019
    Date of Patent: August 22, 2023
    Assignee: Intel Corporation
    Inventors: Huichu Liu, Tanay Karnik, Tejpal Singh, Yen-Cheng Liu, Lavanya Subramanian, Mahesh Kumashikar, Sri Harsha Choday, Sreenivas Subramoney, Kaushik Vaidyanathan, Daniel H. Morris, Uygar E. Avci, Ian A. Young
  • Patent number: 11693780
    Abstract: System and method for prefetching pointer-referenced data. A method embodiment includes: tracking a plurality of load instructions which includes a first load instruction to access a first data that identifies a first memory location; detecting a second load instruction which accesses a second memory location for a second data, the second memory location matching the first memory location identified by the first data; responsive to the detecting, updating a list of pointer load instructions to include information identifying the first load instruction as a pointer load instruction; prefetching a third data for a third load instruction prior to executing the third load instruction; identifying the third load instruction as a pointer load instruction based on information from the list of pointer load instructions and responsively prefetching a fourth data from a fourth memory location, wherein the fourth memory location is identified by the third data.
    Type: Grant
    Filed: August 2, 2021
    Date of Patent: July 4, 2023
    Assignee: Intel Corporation
    Inventors: Sreenivas Subramoney, Stanislav Shwartsman, Anant Nori, Shankar Balachandran, Elad Shtiegmann, Vineeth Mekkat, Manjunath Shevgoor, Sourabh Alurkar
  • Publication number: 20230205699
    Abstract: An apparatus includes memory circuitry including a first data structure and prefetch circuitry that is coupled to the memory circuitry. The prefetch circuitry is to store, in the first data structure, a first subregion entry corresponding to a first subregion of a memory region allocated to a program. The first subregion entry is to include a plurality of delta values. A first delta value of the plurality of delta values represents a first distance between two cache lines associated with consecutive memory accesses within a second subregion of the memory region. The prefetch circuitry is further to detect a first memory access of a first cache line in the first subregion, identify prefetch candidates based on the first cache line and the plurality of delta values, and issue at least one prefetch request based on at least two of the prefetch candidates to be prefetched into a cache.
    Type: Application
    Filed: December 24, 2021
    Publication date: June 29, 2023
    Applicant: Intel Corporation
    Inventors: Swaraj Sha, Anant Vithal Nori, Sreenivas Subramoney, Stanislav Shwartsman, Pavel I. Kryukov, Lihu Rappoport
  • Publication number: 20230195469
    Abstract: Techniques and mechanisms for a processor to determine an execution of instructions based on a prediction of a taken branch. In an embodiment, a first prediction unit generates each of multiple branch predictions in one cycle of successive branch prediction cycles. An indication of the branch predictions is provided to an execution pipeline, which prepares to execute an instruction based on the indication. Where a first one of the branch predictions is determined to be of a low confidence type, said first branch prediction is further indicated to a second prediction unit, which performs a second branch prediction based on the same branch instruction for which the first branch prediction was made. In another embodiment, the second prediction unit signals that a state of the execution pipeline is to be cleared, based on a determination that the first and second branch predictions are inconsistent with each other.
    Type: Application
    Filed: December 21, 2021
    Publication date: June 22, 2023
    Applicant: Intel Corporation
    Inventors: Sumeet Bandishte, Jayesh Gaur, Franck Sala, Alexey Yurievich Sivtsov, Jared Warner Stark, IV, Lihu Rappoport, Sreenivas Subramoney
  • Publication number: 20230195464
    Abstract: Methods and apparatus relating to throttling a code fetch for speculative code paths are described. In an embodiment, a first storage structure stores a reference to a code line in response to a request to be received from a cache. A second storage structure to store a reference to the code line in response to an update to an Instruction Dispatch Queue (IDQ). Logic circuitry controls additional code line fetch operations based at least in part on a comparison of a number of ongoing speculative code fetches and a determination that the code line is speculative. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: December 16, 2021
    Publication date: June 22, 2023
    Applicant: Intel Corporation
    Inventors: Anant Vithal Nori, Prathmesh Kallurkar, Sreenivas Subramoney, Niranjan Kumar Soundararajan
  • Publication number: 20230185718
    Abstract: Methods and apparatus relating to de-prioritizing speculative code lines in on-chip caches are described. In an embodiment, logic circuitry determines whether a storage structure includes a reference to a code miss request prior to transmission of the code miss request to a shared cache. The logic circuitry causes de-prioritization of a code line, corresponding to the code miss request, in the shared cache in response to an absence of the reference in the storage structure. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: December 14, 2021
    Publication date: June 15, 2023
    Applicant: Intel Corporation
    Inventors: Anant Vithal Nori, Prathmesh Kallurkar, Niranjan Kumar Soundararajan, Sreenivas Subramoney, Lihu Rappoport, Hanna Alam, Adrian Moga, Ronak Singhal