Patents by Inventor Sreenivas Subramoney
Sreenivas Subramoney has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 11949414Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed to improve in-memory multiply and accumulate operations. An example apparatus includes a first multiplexer in a subarray of memory, the first multiplexer to receive first values representative of a column of a lookup table (LUT) including entries to represent products of four-bit numbers and return second values from an intersection of a row and the column of the LUT based on a first element of a first operand; shift and adder logic in the subarray, the shift and adder logic to shift the second values based on at least one of the first element of the first operand or a first element of a second operand; and accumulation storage in the subarray, the accumulation storage to store at least the shifted second values.Type: GrantFiled: December 22, 2020Date of Patent: April 2, 2024Assignee: INTEL CORPORATIONInventors: Gurpreet Singh Kalsi, Akshay Krishna Ramanathan, Kamlesh Pillai, Sreenivas Subramoney, Srivatsa Rangachar Srinivasa, Anirud Thyagharajan, Om Ji Omer, Saurabh Jain
-
Publication number: 20240103878Abstract: An example of an integrated circuit may include a first execution cluster, a second execution cluster that is one or more of narrower and shallower as compared to the first execution cluster, and circuitry to selectively steer instructions to the first execution cluster and the second execution cluster based on branch misprediction information. Other embodiments are disclosed and claimed.Type: ApplicationFiled: September 26, 2022Publication date: March 28, 2024Applicant: Intel CorporationInventors: Jayesh Gaur, Sufiyan Syed, Adithya Ranganathan, Sreenivas Subramoney
-
Publication number: 20240103874Abstract: Methods and apparatus for instruction elimination through hardware driven memoization of loop instances. A hardware-based loop memoization technique learns repeating sequences of loops and transparently removes instructions for the loop instructions from instruction sequences while making their output available to dependent instructions as if the loop instructions had been executed. A path-based predictor is implemented at the front-end to predict these loop instances and remove their instructions from instruction sequences. A novel memoization prediction micro-operation (Uop) is inserted into the instruction sequence for instances of loops that are predicted to be memoized. The memoization prediction Uop is used to compare the input signature (expected set of input values for the loop) with the actual signature to determine correct and incorrect predictions.Type: ApplicationFiled: September 23, 2022Publication date: March 28, 2024Inventors: Niranjan Kumar Soundararajan, Sreenivas Subramoney, Jayesh Gaur
-
Patent number: 11941534Abstract: A system is provided that includes a bit vector-based distance counter circuitry configured to generate one or more bit vectors encoded with information about potential matches and edits between a read and a reference genome, wherein the read comprises an encoding of a fragment of deoxyribonucleic acid (DNA) encoded via bases G, A, T, C. The system further includes a bit vector-based traceback circuitry configured to divide the reference genome into one or more windows and to use the plurality of bit vectors to generate a traceback output for each of the one or more windows, wherein the traceback output comprises a match, a substitution, an insert, a delete, or a combination thereof, between the read and the one or more windows.Type: GrantFiled: December 28, 2019Date of Patent: March 26, 2024Assignee: Intel CorporationInventors: Gurpreet Singh Kalsi, Anant V. Nori, Christopher Justin Hughes, Sreenivas Subramoney, Damla Senol
-
Patent number: 11874773Abstract: Systems, methods, and apparatuses relating to a dual spatial pattern prefetcher are described.Type: GrantFiled: December 28, 2019Date of Patent: January 16, 2024Assignee: Intel CorporationInventors: Rahul Bera, Anant Vithal Nori, Sreenivas Subramoney
-
Publication number: 20230418757Abstract: Techniques and mechanisms for selectively increasing or decreasing an amount of cache resources which are to be available for use in the provisioning of decoded micro-operations in a processor. In an embodiment, a processor core comprises both a first cache which is dedicated to caching micro-operations, and a second cache which is coupled to receive data, or non-decoded instructions. The core further comprises circuitry to monitor one or more cache performance characteristics of the core. Based on the one or more cache performance characteristics, the circuitry performs an evaluation to determine whether to increase—or alternatively, to decrease—the size of a pool of one or more caches which are to be available to receive micro-operations. In another embodiment, the second cache is added to the pool based on an indication of an overutilization of the first cache.Type: ApplicationFiled: June 22, 2022Publication date: December 28, 2023Applicant: Intel CorporationInventors: Niranjan Soundararajan, Sreenivas Subramoney, Vishal Gupta, Neelu Shivprakash Kalani
-
Publication number: 20230409481Abstract: System and method for prefetching pointer-referenced data. A method embodiment includes: tracking a plurality of load instructions which includes a first load instruction to access a first data that identifies a first memory location; detecting a second load instruction which accesses a second memory location for a second data, the second memory location matching the first memory location identified by the first data; responsive to the detecting, updating a list of pointer load instructions to include information identifying the first load instruction as a pointer load instruction; prefetching a third data for a third load instruction prior to executing the third load instruction; identifying the third load instruction as a pointer load instruction based on information from the list of pointer load instructions and responsively prefetching a fourth data from a fourth memory location, wherein the fourth memory location is identified by the third data.Type: ApplicationFiled: May 19, 2023Publication date: December 21, 2023Applicant: Intel CorporationInventors: Sreenivas Subramoney, Stanislav Shwartsman, Anant Nori, Shankar Balachandran, Elad Shtiegmann, Vineeth Mekkat, Manjunath Shevgoor, Sourabh Alurkar
-
Patent number: 11847053Abstract: Systems, methods, and apparatuses relating to circuitry to implement a duplication resistant on-die irregular data prefetcher are described.Type: GrantFiled: March 27, 2020Date of Patent: December 19, 2023Assignee: Intel CorporationInventors: Prathmesh Kallurkar, Anant Vithal Nori, Sreenivas Subramoney
-
Publication number: 20230333999Abstract: Systems, apparatuses and methods may provide for technology that includes a chip having a memory structure including compute hardware, a plurality of address decoders coupled to the compute hardware, and a hierarchical interconnect fabric coupled to the plurality of address decoders, and direct memory address (DMA) hardware positioned adjacent to one or more of the plurality of address decoders, wherein the DMA hardware is to conduct on-chip transfers of intermediate state data via the hierarchical interconnect fabric. Additionally, the chip may include logic to allocate address space in the chip to intermediate state data and store the intermediate state data to the allocated address space.Type: ApplicationFiled: June 22, 2023Publication date: October 19, 2023Inventors: Om Ji Omer, Anirud Thyagharajan, Sreenivas Subramoney
-
Patent number: 11783170Abstract: Systems, apparatuses and methods may provide for technology that decodes data via an instruction that indicates a number of rulebooks to be processed, an input feature size, an output feature size, and a plurality of feature map base addresses, rearranges spatially distributed voxel output feature maps in the decoded data based on weight planes, and performs a channel-wise multiply-accumulate (MAC) operation on the rearranged spatially distributed voxel output feature maps to obtain an output, wherein the channel-wise MAC operation is performed as partial accumulations by a plurality of processing elements.Type: GrantFiled: January 25, 2023Date of Patent: October 10, 2023Assignee: INTEL CORPORATIONInventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Sreenivas Subramoney, Prashant Laddha, Om Ji Omer
-
Patent number: 11734174Abstract: Described is an low overhead method and apparatus to reconfigure a pair of buffered interconnect links to operate in one of these three modes—first mode (e.g., bandwidth mode), second mode (e.g., latency mode), and third mode (e.g., energy mode). In bandwidth mode, each link in the pair buffered interconnect links carries a unique signal from source to destination. In latency mode, both links in the pair carry the same signal from source to destination, where one link in the pair is “primary” and other is called the “assist”. Temporal alignment of transitions in this pair of buffered interconnects reduces the effective capacitance of primary, thereby reducing delay or latency. In energy mode, one link in the pair, the primary, alone carries a signal, while the other link in the pair is idle. An idle neighbor on one side reduces energy consumption of the primary.Type: GrantFiled: September 19, 2019Date of Patent: August 22, 2023Assignee: Intel CorporationInventors: Huichu Liu, Tanay Karnik, Tejpal Singh, Yen-Cheng Liu, Lavanya Subramanian, Mahesh Kumashikar, Sri Harsha Choday, Sreenivas Subramoney, Kaushik Vaidyanathan, Daniel H. Morris, Uygar E. Avci, Ian A. Young
-
Patent number: 11693780Abstract: System and method for prefetching pointer-referenced data. A method embodiment includes: tracking a plurality of load instructions which includes a first load instruction to access a first data that identifies a first memory location; detecting a second load instruction which accesses a second memory location for a second data, the second memory location matching the first memory location identified by the first data; responsive to the detecting, updating a list of pointer load instructions to include information identifying the first load instruction as a pointer load instruction; prefetching a third data for a third load instruction prior to executing the third load instruction; identifying the third load instruction as a pointer load instruction based on information from the list of pointer load instructions and responsively prefetching a fourth data from a fourth memory location, wherein the fourth memory location is identified by the third data.Type: GrantFiled: August 2, 2021Date of Patent: July 4, 2023Assignee: Intel CorporationInventors: Sreenivas Subramoney, Stanislav Shwartsman, Anant Nori, Shankar Balachandran, Elad Shtiegmann, Vineeth Mekkat, Manjunath Shevgoor, Sourabh Alurkar
-
Publication number: 20230205699Abstract: An apparatus includes memory circuitry including a first data structure and prefetch circuitry that is coupled to the memory circuitry. The prefetch circuitry is to store, in the first data structure, a first subregion entry corresponding to a first subregion of a memory region allocated to a program. The first subregion entry is to include a plurality of delta values. A first delta value of the plurality of delta values represents a first distance between two cache lines associated with consecutive memory accesses within a second subregion of the memory region. The prefetch circuitry is further to detect a first memory access of a first cache line in the first subregion, identify prefetch candidates based on the first cache line and the plurality of delta values, and issue at least one prefetch request based on at least two of the prefetch candidates to be prefetched into a cache.Type: ApplicationFiled: December 24, 2021Publication date: June 29, 2023Applicant: Intel CorporationInventors: Swaraj Sha, Anant Vithal Nori, Sreenivas Subramoney, Stanislav Shwartsman, Pavel I. Kryukov, Lihu Rappoport
-
Publication number: 20230205692Abstract: Apparatus and method for leveraging simultaneous multithreading for bulk compute operations. For example, one embodiment of a processor comprises: a plurality of cores including a first core to simultaneously process instructions of a plurality of threads; a cache hierarchy coupled to the first core and the memory, the cache hierarchy comprising a Level 1 (L1) cache, a Level 2 (L2) cache, and a Level 3 (L3) cache; and a plurality of compute units coupled to the first core including a first compute unit associated with the L1 cache, a second compute unit associated with the L2 cache, and a third compute unit associated with the L3 cache, wherein the first core is to offload instructions for execution by the compute units, the first core to offload instructions from a first thread to the first compute unit, instructions from a second thread to the second compute unit, and instructions from a third thread to the third compute unit.Type: ApplicationFiled: December 23, 2021Publication date: June 29, 2023Inventors: ANANT NORI, RAHUL BERA, SHANKAR BALACHANDRAN, JOYDEEP RAKSHIT, Om Ji OMER, SREENIVAS SUBRAMONEY, AVISHAII ABUHATZERA, BELLIAPPA KUTTANNA
-
Publication number: 20230195464Abstract: Methods and apparatus relating to throttling a code fetch for speculative code paths are described. In an embodiment, a first storage structure stores a reference to a code line in response to a request to be received from a cache. A second storage structure to store a reference to the code line in response to an update to an Instruction Dispatch Queue (IDQ). Logic circuitry controls additional code line fetch operations based at least in part on a comparison of a number of ongoing speculative code fetches and a determination that the code line is speculative. Other embodiments are also disclosed and claimed.Type: ApplicationFiled: December 16, 2021Publication date: June 22, 2023Applicant: Intel CorporationInventors: Anant Vithal Nori, Prathmesh Kallurkar, Sreenivas Subramoney, Niranjan Kumar Soundararajan
-
Publication number: 20230195469Abstract: Techniques and mechanisms for a processor to determine an execution of instructions based on a prediction of a taken branch. In an embodiment, a first prediction unit generates each of multiple branch predictions in one cycle of successive branch prediction cycles. An indication of the branch predictions is provided to an execution pipeline, which prepares to execute an instruction based on the indication. Where a first one of the branch predictions is determined to be of a low confidence type, said first branch prediction is further indicated to a second prediction unit, which performs a second branch prediction based on the same branch instruction for which the first branch prediction was made. In another embodiment, the second prediction unit signals that a state of the execution pipeline is to be cleared, based on a determination that the first and second branch predictions are inconsistent with each other.Type: ApplicationFiled: December 21, 2021Publication date: June 22, 2023Applicant: Intel CorporationInventors: Sumeet Bandishte, Jayesh Gaur, Franck Sala, Alexey Yurievich Sivtsov, Jared Warner Stark, IV, Lihu Rappoport, Sreenivas Subramoney
-
Publication number: 20230185718Abstract: Methods and apparatus relating to de-prioritizing speculative code lines in on-chip caches are described. In an embodiment, logic circuitry determines whether a storage structure includes a reference to a code miss request prior to transmission of the code miss request to a shared cache. The logic circuitry causes de-prioritization of a code line, corresponding to the code miss request, in the shared cache in response to an absence of the reference in the storage structure. Other embodiments are also disclosed and claimed.Type: ApplicationFiled: December 14, 2021Publication date: June 15, 2023Applicant: Intel CorporationInventors: Anant Vithal Nori, Prathmesh Kallurkar, Niranjan Kumar Soundararajan, Sreenivas Subramoney, Lihu Rappoport, Hanna Alam, Adrian Moga, Ronak Singhal
-
Publication number: 20230169319Abstract: Systems, apparatuses and methods may provide for technology that decodes data via an instruction that indicates a number of rulebooks to be processed, an input feature size, an output feature size, and a plurality of feature map base addresses, rearranges spatially distributed voxel output feature maps in the decoded data based on weight planes, and performs a channel-wise multiply-accumulate (MAC) operation on the rearranged spatially distributed voxel output feature maps to obtain an output, wherein the channel-wise MAC operation is performed as partial accumulations by a plurality of processing elements.Type: ApplicationFiled: January 25, 2023Publication date: June 1, 2023Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Sreenivas Subramoney, Prashant Laddha, Om Ji Omer
-
Patent number: 11656971Abstract: A processor comprises a microarchitectural feature and dynamic tuning unit (DTU) circuitry. The processor executes a program for first and second execution windows with the microarchitectural feature disabled and enabled, respectively. The DTU circuitry automatically determines whether the processor achieved worse performance in the second execution window. In response to determining that the processor achieved worse performance in the second execution window, the DTU circuitry updates a usefulness state for a selected address of the program to denote worse performance. In response to multiple consecutive determinations that the processor achieved worse performance with the microarchitectural feature enabled, the DTU circuitry automatically updates the usefulness state to denote a confirmed bad state.Type: GrantFiled: January 24, 2022Date of Patent: May 23, 2023Assignee: Intel CorporationInventors: Adarsh Chauhan, Jayesh Gaur, Franck Sala, Lihu Rappoport, Zeev Sperber, Adi Yoaz, Sreenivas Subramoney
-
Patent number: 11645078Abstract: Systems, methods, and apparatuses relating to hardware for auto-predication of critical branches. In one embodiment, a processor core includes a decoder to decode instructions into decoded instructions, an execution unit to execute the decoded instructions, a branch predictor circuit to predict a future outcome of a branch instruction, and a branch predication manager circuit to disable use of the predicted future outcome for a conditional critical branch comprising the branch instruction.Type: GrantFiled: December 28, 2019Date of Patent: May 9, 2023Assignee: Intel CorporationInventors: Adarsh Chauhan, Franck Sala, Jayesh Gaur, Zeev Sperber, Lihu Rappoport, Adi Yoaz, Sreenivas Subramoney