Patents by Inventor Sreenivas Subramoney

Sreenivas Subramoney has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11949414
    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed to improve in-memory multiply and accumulate operations. An example apparatus includes a first multiplexer in a subarray of memory, the first multiplexer to receive first values representative of a column of a lookup table (LUT) including entries to represent products of four-bit numbers and return second values from an intersection of a row and the column of the LUT based on a first element of a first operand; shift and adder logic in the subarray, the shift and adder logic to shift the second values based on at least one of the first element of the first operand or a first element of a second operand; and accumulation storage in the subarray, the accumulation storage to store at least the shifted second values.
    Type: Grant
    Filed: December 22, 2020
    Date of Patent: April 2, 2024
    Assignee: INTEL CORPORATION
    Inventors: Gurpreet Singh Kalsi, Akshay Krishna Ramanathan, Kamlesh Pillai, Sreenivas Subramoney, Srivatsa Rangachar Srinivasa, Anirud Thyagharajan, Om Ji Omer, Saurabh Jain
  • Publication number: 20240103878
    Abstract: An example of an integrated circuit may include a first execution cluster, a second execution cluster that is one or more of narrower and shallower as compared to the first execution cluster, and circuitry to selectively steer instructions to the first execution cluster and the second execution cluster based on branch misprediction information. Other embodiments are disclosed and claimed.
    Type: Application
    Filed: September 26, 2022
    Publication date: March 28, 2024
    Applicant: Intel Corporation
    Inventors: Jayesh Gaur, Sufiyan Syed, Adithya Ranganathan, Sreenivas Subramoney
  • Publication number: 20240103874
    Abstract: Methods and apparatus for instruction elimination through hardware driven memoization of loop instances. A hardware-based loop memoization technique learns repeating sequences of loops and transparently removes instructions for the loop instructions from instruction sequences while making their output available to dependent instructions as if the loop instructions had been executed. A path-based predictor is implemented at the front-end to predict these loop instances and remove their instructions from instruction sequences. A novel memoization prediction micro-operation (Uop) is inserted into the instruction sequence for instances of loops that are predicted to be memoized. The memoization prediction Uop is used to compare the input signature (expected set of input values for the loop) with the actual signature to determine correct and incorrect predictions.
    Type: Application
    Filed: September 23, 2022
    Publication date: March 28, 2024
    Inventors: Niranjan Kumar Soundararajan, Sreenivas Subramoney, Jayesh Gaur
  • Patent number: 11941534
    Abstract: A system is provided that includes a bit vector-based distance counter circuitry configured to generate one or more bit vectors encoded with information about potential matches and edits between a read and a reference genome, wherein the read comprises an encoding of a fragment of deoxyribonucleic acid (DNA) encoded via bases G, A, T, C. The system further includes a bit vector-based traceback circuitry configured to divide the reference genome into one or more windows and to use the plurality of bit vectors to generate a traceback output for each of the one or more windows, wherein the traceback output comprises a match, a substitution, an insert, a delete, or a combination thereof, between the read and the one or more windows.
    Type: Grant
    Filed: December 28, 2019
    Date of Patent: March 26, 2024
    Assignee: Intel Corporation
    Inventors: Gurpreet Singh Kalsi, Anant V. Nori, Christopher Justin Hughes, Sreenivas Subramoney, Damla Senol
  • Patent number: 11874773
    Abstract: Systems, methods, and apparatuses relating to a dual spatial pattern prefetcher are described.
    Type: Grant
    Filed: December 28, 2019
    Date of Patent: January 16, 2024
    Assignee: Intel Corporation
    Inventors: Rahul Bera, Anant Vithal Nori, Sreenivas Subramoney
  • Publication number: 20230418757
    Abstract: Techniques and mechanisms for selectively increasing or decreasing an amount of cache resources which are to be available for use in the provisioning of decoded micro-operations in a processor. In an embodiment, a processor core comprises both a first cache which is dedicated to caching micro-operations, and a second cache which is coupled to receive data, or non-decoded instructions. The core further comprises circuitry to monitor one or more cache performance characteristics of the core. Based on the one or more cache performance characteristics, the circuitry performs an evaluation to determine whether to increase—or alternatively, to decrease—the size of a pool of one or more caches which are to be available to receive micro-operations. In another embodiment, the second cache is added to the pool based on an indication of an overutilization of the first cache.
    Type: Application
    Filed: June 22, 2022
    Publication date: December 28, 2023
    Applicant: Intel Corporation
    Inventors: Niranjan Soundararajan, Sreenivas Subramoney, Vishal Gupta, Neelu Shivprakash Kalani
  • Publication number: 20230409481
    Abstract: System and method for prefetching pointer-referenced data. A method embodiment includes: tracking a plurality of load instructions which includes a first load instruction to access a first data that identifies a first memory location; detecting a second load instruction which accesses a second memory location for a second data, the second memory location matching the first memory location identified by the first data; responsive to the detecting, updating a list of pointer load instructions to include information identifying the first load instruction as a pointer load instruction; prefetching a third data for a third load instruction prior to executing the third load instruction; identifying the third load instruction as a pointer load instruction based on information from the list of pointer load instructions and responsively prefetching a fourth data from a fourth memory location, wherein the fourth memory location is identified by the third data.
    Type: Application
    Filed: May 19, 2023
    Publication date: December 21, 2023
    Applicant: Intel Corporation
    Inventors: Sreenivas Subramoney, Stanislav Shwartsman, Anant Nori, Shankar Balachandran, Elad Shtiegmann, Vineeth Mekkat, Manjunath Shevgoor, Sourabh Alurkar
  • Patent number: 11847053
    Abstract: Systems, methods, and apparatuses relating to circuitry to implement a duplication resistant on-die irregular data prefetcher are described.
    Type: Grant
    Filed: March 27, 2020
    Date of Patent: December 19, 2023
    Assignee: Intel Corporation
    Inventors: Prathmesh Kallurkar, Anant Vithal Nori, Sreenivas Subramoney
  • Publication number: 20230333999
    Abstract: Systems, apparatuses and methods may provide for technology that includes a chip having a memory structure including compute hardware, a plurality of address decoders coupled to the compute hardware, and a hierarchical interconnect fabric coupled to the plurality of address decoders, and direct memory address (DMA) hardware positioned adjacent to one or more of the plurality of address decoders, wherein the DMA hardware is to conduct on-chip transfers of intermediate state data via the hierarchical interconnect fabric. Additionally, the chip may include logic to allocate address space in the chip to intermediate state data and store the intermediate state data to the allocated address space.
    Type: Application
    Filed: June 22, 2023
    Publication date: October 19, 2023
    Inventors: Om Ji Omer, Anirud Thyagharajan, Sreenivas Subramoney
  • Patent number: 11783170
    Abstract: Systems, apparatuses and methods may provide for technology that decodes data via an instruction that indicates a number of rulebooks to be processed, an input feature size, an output feature size, and a plurality of feature map base addresses, rearranges spatially distributed voxel output feature maps in the decoded data based on weight planes, and performs a channel-wise multiply-accumulate (MAC) operation on the rearranged spatially distributed voxel output feature maps to obtain an output, wherein the channel-wise MAC operation is performed as partial accumulations by a plurality of processing elements.
    Type: Grant
    Filed: January 25, 2023
    Date of Patent: October 10, 2023
    Assignee: INTEL CORPORATION
    Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Sreenivas Subramoney, Prashant Laddha, Om Ji Omer
  • Patent number: 11734174
    Abstract: Described is an low overhead method and apparatus to reconfigure a pair of buffered interconnect links to operate in one of these three modes—first mode (e.g., bandwidth mode), second mode (e.g., latency mode), and third mode (e.g., energy mode). In bandwidth mode, each link in the pair buffered interconnect links carries a unique signal from source to destination. In latency mode, both links in the pair carry the same signal from source to destination, where one link in the pair is “primary” and other is called the “assist”. Temporal alignment of transitions in this pair of buffered interconnects reduces the effective capacitance of primary, thereby reducing delay or latency. In energy mode, one link in the pair, the primary, alone carries a signal, while the other link in the pair is idle. An idle neighbor on one side reduces energy consumption of the primary.
    Type: Grant
    Filed: September 19, 2019
    Date of Patent: August 22, 2023
    Assignee: Intel Corporation
    Inventors: Huichu Liu, Tanay Karnik, Tejpal Singh, Yen-Cheng Liu, Lavanya Subramanian, Mahesh Kumashikar, Sri Harsha Choday, Sreenivas Subramoney, Kaushik Vaidyanathan, Daniel H. Morris, Uygar E. Avci, Ian A. Young
  • Patent number: 11693780
    Abstract: System and method for prefetching pointer-referenced data. A method embodiment includes: tracking a plurality of load instructions which includes a first load instruction to access a first data that identifies a first memory location; detecting a second load instruction which accesses a second memory location for a second data, the second memory location matching the first memory location identified by the first data; responsive to the detecting, updating a list of pointer load instructions to include information identifying the first load instruction as a pointer load instruction; prefetching a third data for a third load instruction prior to executing the third load instruction; identifying the third load instruction as a pointer load instruction based on information from the list of pointer load instructions and responsively prefetching a fourth data from a fourth memory location, wherein the fourth memory location is identified by the third data.
    Type: Grant
    Filed: August 2, 2021
    Date of Patent: July 4, 2023
    Assignee: Intel Corporation
    Inventors: Sreenivas Subramoney, Stanislav Shwartsman, Anant Nori, Shankar Balachandran, Elad Shtiegmann, Vineeth Mekkat, Manjunath Shevgoor, Sourabh Alurkar
  • Publication number: 20230205699
    Abstract: An apparatus includes memory circuitry including a first data structure and prefetch circuitry that is coupled to the memory circuitry. The prefetch circuitry is to store, in the first data structure, a first subregion entry corresponding to a first subregion of a memory region allocated to a program. The first subregion entry is to include a plurality of delta values. A first delta value of the plurality of delta values represents a first distance between two cache lines associated with consecutive memory accesses within a second subregion of the memory region. The prefetch circuitry is further to detect a first memory access of a first cache line in the first subregion, identify prefetch candidates based on the first cache line and the plurality of delta values, and issue at least one prefetch request based on at least two of the prefetch candidates to be prefetched into a cache.
    Type: Application
    Filed: December 24, 2021
    Publication date: June 29, 2023
    Applicant: Intel Corporation
    Inventors: Swaraj Sha, Anant Vithal Nori, Sreenivas Subramoney, Stanislav Shwartsman, Pavel I. Kryukov, Lihu Rappoport
  • Publication number: 20230205692
    Abstract: Apparatus and method for leveraging simultaneous multithreading for bulk compute operations. For example, one embodiment of a processor comprises: a plurality of cores including a first core to simultaneously process instructions of a plurality of threads; a cache hierarchy coupled to the first core and the memory, the cache hierarchy comprising a Level 1 (L1) cache, a Level 2 (L2) cache, and a Level 3 (L3) cache; and a plurality of compute units coupled to the first core including a first compute unit associated with the L1 cache, a second compute unit associated with the L2 cache, and a third compute unit associated with the L3 cache, wherein the first core is to offload instructions for execution by the compute units, the first core to offload instructions from a first thread to the first compute unit, instructions from a second thread to the second compute unit, and instructions from a third thread to the third compute unit.
    Type: Application
    Filed: December 23, 2021
    Publication date: June 29, 2023
    Inventors: ANANT NORI, RAHUL BERA, SHANKAR BALACHANDRAN, JOYDEEP RAKSHIT, Om Ji OMER, SREENIVAS SUBRAMONEY, AVISHAII ABUHATZERA, BELLIAPPA KUTTANNA
  • Publication number: 20230195464
    Abstract: Methods and apparatus relating to throttling a code fetch for speculative code paths are described. In an embodiment, a first storage structure stores a reference to a code line in response to a request to be received from a cache. A second storage structure to store a reference to the code line in response to an update to an Instruction Dispatch Queue (IDQ). Logic circuitry controls additional code line fetch operations based at least in part on a comparison of a number of ongoing speculative code fetches and a determination that the code line is speculative. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: December 16, 2021
    Publication date: June 22, 2023
    Applicant: Intel Corporation
    Inventors: Anant Vithal Nori, Prathmesh Kallurkar, Sreenivas Subramoney, Niranjan Kumar Soundararajan
  • Publication number: 20230195469
    Abstract: Techniques and mechanisms for a processor to determine an execution of instructions based on a prediction of a taken branch. In an embodiment, a first prediction unit generates each of multiple branch predictions in one cycle of successive branch prediction cycles. An indication of the branch predictions is provided to an execution pipeline, which prepares to execute an instruction based on the indication. Where a first one of the branch predictions is determined to be of a low confidence type, said first branch prediction is further indicated to a second prediction unit, which performs a second branch prediction based on the same branch instruction for which the first branch prediction was made. In another embodiment, the second prediction unit signals that a state of the execution pipeline is to be cleared, based on a determination that the first and second branch predictions are inconsistent with each other.
    Type: Application
    Filed: December 21, 2021
    Publication date: June 22, 2023
    Applicant: Intel Corporation
    Inventors: Sumeet Bandishte, Jayesh Gaur, Franck Sala, Alexey Yurievich Sivtsov, Jared Warner Stark, IV, Lihu Rappoport, Sreenivas Subramoney
  • Publication number: 20230185718
    Abstract: Methods and apparatus relating to de-prioritizing speculative code lines in on-chip caches are described. In an embodiment, logic circuitry determines whether a storage structure includes a reference to a code miss request prior to transmission of the code miss request to a shared cache. The logic circuitry causes de-prioritization of a code line, corresponding to the code miss request, in the shared cache in response to an absence of the reference in the storage structure. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: December 14, 2021
    Publication date: June 15, 2023
    Applicant: Intel Corporation
    Inventors: Anant Vithal Nori, Prathmesh Kallurkar, Niranjan Kumar Soundararajan, Sreenivas Subramoney, Lihu Rappoport, Hanna Alam, Adrian Moga, Ronak Singhal
  • Publication number: 20230169319
    Abstract: Systems, apparatuses and methods may provide for technology that decodes data via an instruction that indicates a number of rulebooks to be processed, an input feature size, an output feature size, and a plurality of feature map base addresses, rearranges spatially distributed voxel output feature maps in the decoded data based on weight planes, and performs a channel-wise multiply-accumulate (MAC) operation on the rearranged spatially distributed voxel output feature maps to obtain an output, wherein the channel-wise MAC operation is performed as partial accumulations by a plurality of processing elements.
    Type: Application
    Filed: January 25, 2023
    Publication date: June 1, 2023
    Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Sreenivas Subramoney, Prashant Laddha, Om Ji Omer
  • Patent number: 11656971
    Abstract: A processor comprises a microarchitectural feature and dynamic tuning unit (DTU) circuitry. The processor executes a program for first and second execution windows with the microarchitectural feature disabled and enabled, respectively. The DTU circuitry automatically determines whether the processor achieved worse performance in the second execution window. In response to determining that the processor achieved worse performance in the second execution window, the DTU circuitry updates a usefulness state for a selected address of the program to denote worse performance. In response to multiple consecutive determinations that the processor achieved worse performance with the microarchitectural feature enabled, the DTU circuitry automatically updates the usefulness state to denote a confirmed bad state.
    Type: Grant
    Filed: January 24, 2022
    Date of Patent: May 23, 2023
    Assignee: Intel Corporation
    Inventors: Adarsh Chauhan, Jayesh Gaur, Franck Sala, Lihu Rappoport, Zeev Sperber, Adi Yoaz, Sreenivas Subramoney
  • Patent number: 11645078
    Abstract: Systems, methods, and apparatuses relating to hardware for auto-predication of critical branches. In one embodiment, a processor core includes a decoder to decode instructions into decoded instructions, an execution unit to execute the decoded instructions, a branch predictor circuit to predict a future outcome of a branch instruction, and a branch predication manager circuit to disable use of the predicted future outcome for a conditional critical branch comprising the branch instruction.
    Type: Grant
    Filed: December 28, 2019
    Date of Patent: May 9, 2023
    Assignee: Intel Corporation
    Inventors: Adarsh Chauhan, Franck Sala, Jayesh Gaur, Zeev Sperber, Lihu Rappoport, Adi Yoaz, Sreenivas Subramoney