Patents by Inventor Sreenivas Subramoney

Sreenivas Subramoney has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

Apparatus and method for hardware-based memoization of function calls to reduce instruction execution

Patent number: 12020033

Abstract: Apparatus and method for memoizing repeat function calls are described herein. An apparatus embodiment includes: uop buffer circuitry to identify a function for memoization based on retiring micro-operations (uops) from a processing pipeline; memoization retirement circuitry to generate a signature of the function which includes input and output data of the function; a memoization data structure to store the signature; and predictor circuitry to detect an instance of the function to be executed by the processing pipeline and to responsively exclude a first subset of uops associated with the instance from execution when a confidence level associated with the function is above a threshold. One or more instructions that are data-dependent on execution of the instance are then provided with the output data of the function from the memoization data structure.

Type: Grant

Filed: December 24, 2020

Date of Patent: June 25, 2024

Assignee: Intel Corporation

Inventors: Niranjan Kumar Soundararajan, Sreenivas Subramoney, Jayesh Gaur, S R Swamy Saranam Chongala
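
As a rough software analogy of the memoization flow in the abstract above, the sketch below keeps a table keyed by function, stores an input/output signature after each executed call, and elides execution only once a confidence counter exceeds a threshold. The names `MemoTable`, `CONF_THRESHOLD` and `CONF_MAX`, keying by function name instead of address, and the saturating-counter policy are all hypothetical illustration choices, not Intel's circuitry.

```python
from dataclasses import dataclass, field

CONF_THRESHOLD = 2   # hypothetical: elide execution once confidence exceeds this
CONF_MAX = 3         # hypothetical ceiling for the saturating confidence counter


@dataclass
class MemoEntry:
    inputs: tuple      # input half of the signature
    output: object     # output half of the signature
    confidence: int = 0


@dataclass
class MemoTable:
    entries: dict = field(default_factory=dict)
    skipped: int = 0   # calls whose execution was elided

    def call(self, fn, *args):
        key = fn.__name__          # stands in for the function's start address
        entry = self.entries.get(key)
        if (entry is not None and entry.inputs == args
                and entry.confidence > CONF_THRESHOLD):
            # Confident hit: skip execution and hand the stored output
            # to data-dependent consumers.
            self.skipped += 1
            return entry.output
        result = fn(*args)         # normal execution; "retirement" updates the table
        if entry is not None and entry.inputs == args and entry.output == result:
            entry.confidence = min(entry.confidence + 1, CONF_MAX)
        else:
            self.entries[key] = MemoEntry(args, result)
        return result


def square(x):
    return x * x


table = MemoTable()
results = [table.call(square, 7) for _ in range(6)]
print(results, table.skipped)   # the first four calls execute, the last two are elided
```

A real design would predict first and verify the input signature afterwards; the sketch checks inputs before skipping purely to stay simple.
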

DEVICE, METHOD AND SYSTEM TO CAPTURE OR RESTORE MICROARCHITECTURAL STATE OF A PROCESSOR CORE

Publication number: 20240202000

Abstract: Techniques and mechanisms for efficiently saving and recovering state of a processor core. In an embodiment, a processor core fetches and decodes a first instruction to generate a first decoded instruction, wherein the first instruction comprises a first opcode which corresponds to one or more components of the processor core. Execution of the first instruction comprises saving microarchitectural state of the one or more components to a memory of the core. In another embodiment, a processor core fetches and decodes a second instruction to generate a second decoded instruction, wherein the second instruction comprises a second opcode which corresponds to the same one or more components. Execution of the second instruction comprises restoring the microarchitectural state from the memory to the one or more components.

Type: Application

Filed: December 19, 2022

Publication date: June 20, 2024

Applicant: Intel Corporation

Inventors: Niranjan Soundararajan, Sreenivas Subramoney

SPARSITY-AWARE PERFORMANCE BOOST IN COMPUTE-IN-MEMORY CORES FOR DEEP NEURAL NETWORK ACCELERATION

Publication number: 20240201949

Abstract: Systems, apparatuses and methods may provide for technology that includes a compute-in-memory (CiM) enabled memory array to conduct digital bit-serial multiply and accumulate (MAC) operations on multi-bit input data and weight data stored in the CiM enabled memory array, an adder tree coupled to the CiM enabled memory array, an accumulator coupled to the adder tree, and an input bit selection stage coupled to the CiM enabled memory array, wherein the input bit selection stage restricts serial bit selection on the multi-bit input data to non-zero values during the digital MAC operations.

Type: Application

Filed: February 28, 2024

Publication date: June 20, 2024

Inventors: Sagar Varma Sayyaparaju, Om Ji Omer, Sreenivas Subramoney
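
The bit-serial MAC with zero-skipping described above can be modeled in a few lines. This is a minimal sketch assuming unsigned multi-bit inputs processed one bit position ("bit plane") per cycle, with a plane skipped when no input has a 1 in that position; the abstract does not say at what granularity zeros are skipped, so that policy and the name `bit_serial_mac` are hypothetical.

```python
def bit_serial_mac(inputs, weights, bits=8, skip_zero_bits=True):
    """Return (dot product, bit-serial cycles used) for unsigned inputs."""
    acc = 0        # accumulator
    cycles = 0
    for b in range(bits):                        # one input bit position per cycle
        column = [(x >> b) & 1 for x in inputs]  # bit b of every input
        if skip_zero_bits and not any(column):
            continue                             # all-zero bit plane: no cycle spent
        cycles += 1
        # Adder tree: sum the weights whose input bit is 1 ...
        partial = sum(w for bit, w in zip(column, weights) if bit)
        # ... then weight the partial sum by the bit position.
        acc += partial << b
    return acc, cycles


print(bit_serial_mac([3, 0, 5, 1], [2, -1, 4, 7]))                        # (33, 3)
print(bit_serial_mac([3, 0, 5, 1], [2, -1, 4, 7], skip_zero_bits=False))  # (33, 8)
```

Both calls produce the same dot product; skipping the five all-zero high bit planes is where the cycle savings come from.
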

Data relocation for inline metadata

Patent number: 11972126

Abstract: Technologies disclosed herein provide one example of a system that includes processor circuitry to be communicatively coupled to a memory circuitry. The processor circuitry is to receive a memory access request corresponding to an application for access to an address range in a memory allocation of the memory circuitry and to locate a metadata region within the memory allocation. The processor circuitry is also to, in response to a determination that the address range includes at least a portion of the metadata region, obtain first metadata stored in the metadata region, use the first metadata to determine an alternate memory address in a relocation region, and read, at the alternate memory address, displaced data from the portion of the metadata region included in the address range of the memory allocation. The address range includes one or more bytes of an expected allocation region of the memory allocation.

Type: Grant

Filed: September 10, 2021

Date of Patent: April 30, 2024

Assignee: Intel Corporation

Inventors: David M. Durham, Michael D. LeMay, Sergej Deutsch, Joydeep Rakshit, Anant Vithal Nori, Jayesh Gaur, Sreenivas Subramoney
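
The relocation lookup in this abstract amounts to an address translation that applies only inside the metadata region. The toy model below assumes a flat `bytearray` as memory and a hypothetical metadata layout whose first 8 bytes hold the relocation address; the class and parameter names are illustrative, not taken from the patent.

```python
import struct


class Allocation:
    """Toy allocation whose inline metadata region displaces application data
    into a separate relocation region."""

    def __init__(self, memory, base, size, md_offset, md_len, reloc_addr):
        assert md_len >= 8, "metadata must hold an 8-byte relocation pointer"
        self.mem = memory
        self.base, self.size = base, size
        self.md_start = base + md_offset
        self.md_end = self.md_start + md_len
        # Hypothetical metadata layout: first 8 bytes = relocation address.
        struct.pack_into("<Q", memory, self.md_start, reloc_addr)

    def _translate(self, addr):
        if self.md_start <= addr < self.md_end:
            # The byte's home is occupied by metadata: read the metadata to find
            # the alternate address where the displaced byte lives.
            (reloc,) = struct.unpack_from("<Q", self.mem, self.md_start)
            return reloc + (addr - self.md_start)
        return addr

    def _check(self, addr, length):
        assert self.base <= addr and addr + length <= self.base + self.size

    def write(self, addr, data):
        self._check(addr, len(data))
        for i, byte in enumerate(data):
            self.mem[self._translate(addr + i)] = byte

    def read(self, addr, length):
        self._check(addr, length)
        return bytes(self.mem[self._translate(addr + i)] for i in range(length))


mem = bytearray(256)
alloc = Allocation(mem, base=0, size=64, md_offset=16, md_len=8, reloc_addr=128)
alloc.write(0, bytes(range(64)))
print(alloc.read(14, 6).hex())   # bytes 16..19 are served from the relocation region
```

The application sees a contiguous 64-byte buffer, while bytes 16..23 physically live at 128..135.
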

Methods, apparatus, and articles of manufacture to improve in-memory multiply and accumulate operations

Patent number: 11949414

Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed to improve in-memory multiply and accumulate operations. An example apparatus includes a first multiplexer in a subarray of memory, the first multiplexer to receive first values representative of a column of a lookup table (LUT) including entries to represent products of four-bit numbers and return second values from an intersection of a row and the column of the LUT based on a first element of a first operand; shift and adder logic in the subarray, the shift and adder logic to shift the second values based on at least one of the first element of the first operand or a first element of a second operand; and accumulation storage in the subarray, the accumulation storage to store at least the shifted second values.

Type: Grant

Filed: December 22, 2020

Date of Patent: April 2, 2024

Assignee: Intel Corporation

Inventors: Gurpreet Singh Kalsi, Akshay Krishna Ramanathan, Kamlesh Pillai, Sreenivas Subramoney, Srivatsa Rangachar Srinivasa, Anirud Thyagharajan, Om Ji Omer, Saurabh Jain
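
The LUT-based multiply described above relies on the identity that a wide product is a sum of shifted 4-bit partial products. The sketch below shows that arithmetic in software, assuming unsigned operands whose width is a multiple of 4; `LUT`, `nibbles` and `lut_multiply` are hypothetical names, and the column/row selection only loosely mirrors the multiplexer described in the abstract.

```python
# 16 x 16 lookup table holding every product of two 4-bit numbers.
LUT = [[a * b for b in range(16)] for a in range(16)]


def nibbles(value, count):
    """Split an unsigned integer into `count` 4-bit elements, least significant first."""
    return [(value >> (4 * i)) & 0xF for i in range(count)]


def lut_multiply(x, y, width=8):
    """Multiply two unsigned `width`-bit integers using only 4-bit LUT lookups,
    shifts and additions (width must be a multiple of 4)."""
    assert width % 4 == 0 and 0 <= x < (1 << width) and 0 <= y < (1 << width)
    count = width // 4
    acc = 0                                            # accumulation storage
    for j, yj in enumerate(nibbles(y, count)):
        column = [LUT[row][yj] for row in range(16)]   # LUT column picked by an element of y
        for i, xi in enumerate(nibbles(x, count)):
            partial = column[xi]                       # row picked by an element of x
            acc += partial << (4 * (i + j))            # shift by both element positions, then add
    return acc


print(lut_multiply(200, 123))   # 24600
```
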

INSTRUCTION ELIMINATION THROUGH HARDWARE DRIVEN MEMOIZATION OF LOOP INSTANCES

Publication number: 20240103874

Abstract: Methods and apparatus for instruction elimination through hardware driven memoization of loop instances. A hardware-based loop memoization technique learns repeating sequences of loops and transparently removes the loop instructions from instruction sequences while making their output available to dependent instructions as if the loop instructions had been executed. A path-based predictor is implemented at the front end to predict these loop instances and remove their instructions from instruction sequences. A novel memoization prediction micro-operation (uop) is inserted into the instruction sequence for instances of loops that are predicted to be memoized. The memoization prediction uop is used to compare the input signature (expected set of input values for the loop) with the actual signature to determine correct and incorrect predictions.

Type: Application

Filed: September 23, 2022

Publication date: March 28, 2024

Inventors: Niranjan Kumar Soundararajan, Sreenivas Subramoney, Jayesh Gaur

SHORT PIPELINE FOR FAST RECOVERY FROM A BRANCH MISPREDICTION

Publication number: 20240103878

Abstract: An example of an integrated circuit may include a first execution cluster, a second execution cluster that is one or more of narrower and shallower as compared to the first execution cluster, and circuitry to selectively steer instructions to the first execution cluster and the second execution cluster based on branch misprediction information. Other embodiments are disclosed and claimed.

Type: Application

Filed: September 26, 2022

Publication date: March 28, 2024

Applicant: Intel Corporation

Inventors: Jayesh Gaur, Sufiyan Syed, Adithya Ranganathan, Sreenivas Subramoney

Genome sequence alignment system and method

Patent number: 11941534

Abstract: A system is provided that includes a bit vector-based distance counter circuitry configured to generate one or more bit vectors encoded with information about potential matches and edits between a read and a reference genome, wherein the read comprises an encoding of a fragment of deoxyribonucleic acid (DNA) encoded via bases G, A, T, C. The system further includes a bit vector-based traceback circuitry configured to divide the reference genome into one or more windows and to use the plurality of bit vectors to generate a traceback output for each of the one or more windows, wherein the traceback output comprises a match, a substitution, an insert, a delete, or a combination thereof, between the read and the one or more windows.

Type: Grant

Filed: December 28, 2019

Date of Patent: March 26, 2024

Assignee: Intel Corporation

Inventors: Gurpreet Singh Kalsi, Anant V. Nori, Christopher Justin Hughes, Sreenivas Subramoney, Damla Senol
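
The abstract describes bit-vector circuitry that encodes matches and edits between a read and a reference. For readers new to bit-vector alignment, the sketch below shows the textbook Wu-Manber extension of the Bitap (shift-and) algorithm, which keeps one bit vector per allowed edit count. This is a well-known technique from the same family, swapped in for illustration and not the patented circuitry: it reports only the fewest edits of the best match and omits the windowing and traceback output the abstract describes. The name `bitap_min_edits` is hypothetical.

```python
def bitap_min_edits(text: str, pattern: str, k: int):
    """Return the smallest d <= k such that `pattern` matches some substring of
    `text` with d edits (substitutions, insertions, deletions), or None."""
    m = len(pattern)
    assert 0 <= k < m, "this sketch assumes fewer allowed edits than pattern length"
    full = (1 << m) - 1          # keep vectors m bits wide
    accept = 1 << (m - 1)        # bit set => the whole pattern has been matched
    # Per-base match masks: bit i is set where pattern[i] == base.
    masks = {}
    for i, base in enumerate(pattern):
        masks[base] = masks.get(base, 0) | (1 << i)
    # R[d], bit i: pattern[0..i] matches a suffix of the text read so far with
    # at most d edits. Initially the first d pattern bases can be deleted.
    R = [(1 << d) - 1 for d in range(k + 1)]
    best = None
    for c in text:
        cm = masks.get(c, 0)
        old_prev = R[0]                          # R[d-1] before this base
        R[0] = ((R[0] << 1) | 1) & cm            # exact matches only
        for d in range(1, k + 1):
            old = R[d]
            R[d] = ((((old << 1) | 1) & cm)      # match
                    | ((old_prev << 1) | 1)      # substitution
                    | ((R[d - 1] << 1) | 1)      # deletion (skip a pattern base)
                    | old_prev) & full           # insertion (extra text base)
            old_prev = old
        for d in range(k + 1):                   # fewest edits for a match ending here
            if R[d] & accept:
                best = d if best is None else min(best, d)
                break
    return best


print(bitap_min_edits("ACGTACGT", "GTAC", 2))   # 0: exact occurrence
print(bitap_min_edits("ACGTACGT", "GTTC", 2))   # 1: one substitution
```
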

Apparatuses, methods, and systems for dual spatial pattern prefetcher

Patent number: 11874773

Abstract: Systems, methods, and apparatuses relating to a dual spatial pattern prefetcher are described.

Type: Grant

Filed: December 28, 2019

Date of Patent: January 16, 2024

Assignee: Intel Corporation

Inventors: Rahul Bera, Anant Vithal Nori, Sreenivas Subramoney

SELECTIVE PROVISIONING OF SUPPLEMENTARY MICRO-OPERATION CACHE RESOURCES

Publication number: 20230418757

Abstract: Techniques and mechanisms for selectively increasing or decreasing an amount of cache resources which are to be available for use in the provisioning of decoded micro-operations in a processor. In an embodiment, a processor core comprises both a first cache which is dedicated to caching micro-operations, and a second cache which is coupled to receive data, or non-decoded instructions. The core further comprises circuitry to monitor one or more cache performance characteristics of the core. Based on the one or more cache performance characteristics, the circuitry performs an evaluation to determine whether to increase—or alternatively, to decrease—the size of a pool of one or more caches which are to be available to receive micro-operations. In another embodiment, the second cache is added to the pool based on an indication of an overutilization of the first cache.

Type: Application

Filed: June 22, 2022

Publication date: December 28, 2023

Applicant: Intel Corporation

Inventors: Niranjan Soundararajan, Sreenivas Subramoney, Vishal Gupta, Neelu Shivprakash Kalani
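
The grow/shrink evaluation in this abstract can be pictured as a small policy function over monitored cache statistics. The sketch below is only an assumption-laden illustration: the abstract names no metrics or thresholds, so the miss-rate inputs, the three threshold constants, and the names `CacheStats` and `update_pool` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CacheStats:
    uop_cache_miss_rate: float      # misses in the dedicated micro-op cache
    second_cache_miss_rate: float   # misses in the cache that normally holds data/instructions


# Hypothetical thresholds; the abstract does not give values.
GROW_THRESHOLD = 0.20          # uop cache considered overutilized above this
SHRINK_THRESHOLD = 0.05        # uop demand considered low below this
SECOND_CACHE_PRESSURE = 0.10   # second cache's own traffic suffering above this


def update_pool(pool, stats):
    """Return the new set of caches available to receive micro-operations."""
    pool = set(pool)
    if "second_cache" not in pool and stats.uop_cache_miss_rate > GROW_THRESHOLD:
        pool.add("second_cache")        # overutilized: lend the second cache to uops
    elif "second_cache" in pool and (stats.uop_cache_miss_rate < SHRINK_THRESHOLD
                                     or stats.second_cache_miss_rate > SECOND_CACHE_PRESSURE):
        pool.discard("second_cache")    # demand fell or the second cache's own work suffers
    return pool


pool = {"uop_cache"}
pool = update_pool(pool, CacheStats(uop_cache_miss_rate=0.30, second_cache_miss_rate=0.02))
print(sorted(pool))   # ['second_cache', 'uop_cache']
```

Using separate grow and shrink thresholds adds hysteresis, so the pool does not oscillate when the miss rate hovers near a single cut-off.
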

SYSTEM, METHOD, AND APPARATUS FOR ENHANCED POINTER IDENTIFICATION AND PREFETCHING

Publication number: 20230409481

Abstract: System and method for prefetching pointer-referenced data. A method embodiment includes: tracking a plurality of load instructions which includes a first load instruction to access a first data that identifies a first memory location; detecting a second load instruction which accesses a second memory location for a second data, the second memory location matching the first memory location identified by the first data; responsive to the detecting, updating a list of pointer load instructions to include information identifying the first load instruction as a pointer load instruction; prefetching a third data for a third load instruction prior to executing the third load instruction; identifying the third load instruction as a pointer load instruction based on information from the list of pointer load instructions and responsively prefetching a fourth data from a fourth memory location, wherein the fourth memory location is identified by the third data.

Type: Application

Filed: May 19, 2023

Publication date: December 21, 2023

Applicant: Intel Corporation

Inventors: Sreenivas Subramoney, Stanislav Shwartsman, Anant Nori, Shankar Balachandran, Elad Shtiegmann, Vineeth Mekkat, Manjunath Shevgoor, Sourabh Alurkar
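
The pointer-identification rule in the abstract (a load whose address equals data returned by an earlier load marks the earlier load as a pointer load) can be modeled with two small tables. This is a sketch under assumptions: memory is a Python dict, PCs and addresses are plain integers, the tables are unbounded, and the class and method names are hypothetical.

```python
class PointerPrefetcher:
    """Toy model: learn which load PCs return pointers, then chase them."""

    def __init__(self):
        self.value_producer = {}   # loaded value -> PC of the load that returned it
        self.pointer_pcs = set()   # the "list of pointer load instructions"
        self.issued = []           # addresses sent to a hypothetical prefetch queue

    def on_load(self, pc, addr, value):
        # A load whose address equals data returned by an earlier load shows
        # that the earlier load produced a pointer.
        producer = self.value_producer.get(addr)
        if producer is not None:
            self.pointer_pcs.add(producer)
        self.value_producer[value] = pc

    def prefetch(self, pc, addr, memory):
        """Prefetch data for a load before it executes; if the load is a known
        pointer load, also prefetch the address its data points to."""
        self.issued.append(addr)
        data = memory[addr]
        if pc in self.pointer_pcs:
            self.issued.append(data)
        return data


memory = {0x100: 0x200, 0x200: 42, 0x300: 0x400, 0x400: 7}
pf = PointerPrefetcher()
pf.on_load(pc=0xA, addr=0x100, value=0x200)   # load A returns 0x200
pf.on_load(pc=0xB, addr=0x200, value=42)      # load B reads 0x200, so A is a pointer load
pf.prefetch(pc=0xA, addr=0x300, memory=memory)
print([hex(a) for a in pf.issued])            # ['0x300', '0x400']
```
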

Apparatuses, methods, and systems for a duplication resistant on-die irregular data prefetcher

Patent number: 11847053

Abstract: Systems, methods, and apparatuses relating to circuitry to implement a duplication resistant on-die irregular data prefetcher are described.

Type: Grant

Filed: March 27, 2020

Date of Patent: December 19, 2023

Assignee: Intel Corporation

Inventors: Prathmesh Kallurkar, Anant Vithal Nori, Sreenivas Subramoney

MAXIMIZING ON-CHIP DATA REUSE IN COMPUTE IN MEMORY AND COMPUTE NEAR MEMORY ARCHITECTURES

Publication number: 20230333999

Abstract: Systems, apparatuses and methods may provide for technology that includes a chip having a memory structure including compute hardware, a plurality of address decoders coupled to the compute hardware, and a hierarchical interconnect fabric coupled to the plurality of address decoders, and direct memory access (DMA) hardware positioned adjacent to one or more of the plurality of address decoders, wherein the DMA hardware is to conduct on-chip transfers of intermediate state data via the hierarchical interconnect fabric. Additionally, the chip may include logic to allocate address space in the chip to intermediate state data and store the intermediate state data to the allocated address space.

Type: Application

Filed: June 22, 2023

Publication date: October 19, 2023

Inventors: Om Ji Omer, Anirud Thyagharajan, Sreenivas Subramoney

Spatially sparse neural network accelerator for multi-dimension visual analytics

Patent number: 11783170

Abstract: Systems, apparatuses and methods may provide for technology that decodes data via an instruction that indicates a number of rulebooks to be processed, an input feature size, an output feature size, and a plurality of feature map base addresses, rearranges spatially distributed voxel output feature maps in the decoded data based on weight planes, and performs a channel-wise multiply-accumulate (MAC) operation on the rearranged spatially distributed voxel output feature maps to obtain an output, wherein the channel-wise MAC operation is performed as partial accumulations by a plurality of processing elements.

Type: Grant

Filed: January 25, 2023

Date of Patent: October 10, 2023

Assignee: Intel Corporation

Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Sreenivas Subramoney, Prashant Laddha, Om Ji Omer
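
"Rulebooks" here appear to be used in the sense common to sparse convolution libraries: per weight plane (kernel offset), a list of (input voxel, output voxel) pairs. Under that assumption, the sketch below builds submanifold-style rulebooks for 3-D voxel coordinates and performs the channel-wise MAC as partial accumulations, one weight plane at a time. It is plain Python with hypothetical names, not the accelerator's dataflow or its instruction format.

```python
def add_coords(a, b):
    return tuple(x + y for x, y in zip(a, b))


def build_rulebooks(coords, offsets):
    """One rulebook per kernel offset (weight plane): (input idx, output idx) pairs.
    Submanifold style: the output sites are the active input sites."""
    index = {c: i for i, c in enumerate(coords)}
    books = []
    for off in offsets:
        books.append([(index[add_coords(c, off)], o)
                      for o, c in enumerate(coords) if add_coords(c, off) in index])
    return books


def sparse_conv(features, weights, rulebooks):
    """features[i][cin]; weights[k][cin][cout]; returns out[o][cout]."""
    c_out = len(weights[0][0])
    out = [[0] * c_out for _ in features]
    for k, book in enumerate(rulebooks):        # process weight plane by weight plane
        w = weights[k]
        for i, o in book:
            for co in range(c_out):             # channel-wise MAC, accumulated as partial sums
                out[o][co] += sum(features[i][ci] * w[ci][co] for ci in range(len(w)))
    return out


coords = [(0, 0, 0), (1, 0, 0)]
offsets = [(-1, 0, 0), (0, 0, 0), (1, 0, 0)]
print(sparse_conv([[2], [3]], [[[10]], [[1]], [[100]]], build_rulebooks(coords, offsets)))
# [[302], [23]]
```

Only active voxels appear in any rulebook, so empty space costs neither storage nor MAC operations.
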

Low overhead, high bandwidth re-configurable interconnect apparatus and method

Patent number: 11734174

Abstract: Described is a low-overhead method and apparatus to reconfigure a pair of buffered interconnect links to operate in one of three modes: a first mode (e.g., bandwidth mode), a second mode (e.g., latency mode), and a third mode (e.g., energy mode). In bandwidth mode, each link in the pair of buffered interconnect links carries a unique signal from source to destination. In latency mode, both links in the pair carry the same signal from source to destination; one link in the pair is called the “primary” and the other the “assist”. Temporal alignment of transitions in this pair of buffered interconnects reduces the effective capacitance of the primary, thereby reducing delay or latency. In energy mode, the primary alone carries a signal while the other link in the pair is idle. An idle neighbor on one side reduces the energy consumption of the primary.

Type: Grant

Filed: September 19, 2019

Date of Patent: August 22, 2023

Assignee: Intel Corporation

Inventors: Huichu Liu, Tanay Karnik, Tejpal Singh, Yen-Cheng Liu, Lavanya Subramanian, Mahesh Kumashikar, Sri Harsha Choday, Sreenivas Subramoney, Kaushik Vaidyanathan, Daniel H. Morris, Uygar E. Avci, Ian A. Young

System, method, and apparatus for enhanced pointer identification and prefetching

Patent number: 11693780

Abstract: System and method for prefetching pointer-referenced data. A method embodiment includes: tracking a plurality of load instructions which includes a first load instruction to access a first data that identifies a first memory location; detecting a second load instruction which accesses a second memory location for a second data, the second memory location matching the first memory location identified by the first data; responsive to the detecting, updating a list of pointer load instructions to include information identifying the first load instruction as a pointer load instruction; prefetching a third data for a third load instruction prior to executing the third load instruction; identifying the third load instruction as a pointer load instruction based on information from the list of pointer load instructions and responsively prefetching a fourth data from a fourth memory location, wherein the fourth memory location is identified by the third data.

Type: Grant

Filed: August 2, 2021

Date of Patent: July 4, 2023

Assignee: Intel Corporation

Inventors: Sreenivas Subramoney, Stanislav Shwartsman, Anant Nori, Shankar Balachandran, Elad Shtiegmann, Vineeth Mekkat, Manjunath Shevgoor, Sourabh Alurkar

REGION AWARE DELTA PREFETCHER

Publication number: 20230205699

Abstract: An apparatus includes memory circuitry including a first data structure and prefetch circuitry that is coupled to the memory circuitry. The prefetch circuitry is to store, in the first data structure, a first subregion entry corresponding to a first subregion of a memory region allocated to a program. The first subregion entry is to include a plurality of delta values. A first delta value of the plurality of delta values represents a first distance between two cache lines associated with consecutive memory accesses within a second subregion of the memory region. The prefetch circuitry is further to detect a first memory access of a first cache line in the first subregion, identify prefetch candidates based on the first cache line and the plurality of delta values, and issue at least one prefetch request based on at least two of the prefetch candidates to be prefetched into a cache.

Type: Application

Filed: December 24, 2021

Publication date: June 29, 2023

Applicant: Intel Corporation

Inventors: Swaraj Sha, Anant Vithal Nori, Sreenivas Subramoney, Stanislav Shwartsman, Pavel I. Kryukov, Lihu Rappoport
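
The core idea in this abstract, deltas learned from consecutive accesses in one subregion and replayed from a trigger line in another, can be sketched as two functions. Assumptions: addresses are cache-line numbers, a subregion is a hypothetical 32 lines, deltas are applied cumulatively, and candidates stop at the subregion boundary. None of these parameters or names come from the abstract.

```python
SUBREGION_LINES = 32   # hypothetical subregion size, in cache lines


def learn_deltas(line_addrs):
    """Distances (in lines) between consecutive accessed cache lines."""
    return [b - a for a, b in zip(line_addrs, line_addrs[1:])]


def prefetch_candidates(trigger_line, deltas, max_candidates=4):
    """Apply learned deltas cumulatively from the trigger line, staying
    inside the trigger line's subregion."""
    base = trigger_line - trigger_line % SUBREGION_LINES
    candidates, line = [], trigger_line
    for d in deltas[:max_candidates]:
        line += d
        if not base <= line < base + SUBREGION_LINES:
            break                      # leaving the subregion: stop predicting
        candidates.append(line)
    return candidates


deltas = learn_deltas([2, 3, 5, 8])        # pattern seen in one subregion: [1, 2, 3]
print(prefetch_candidates(66, deltas))     # replayed in another subregion: [67, 69, 72]
```
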

DEVICE, METHOD, AND SYSTEM TO FACILITATE IMPROVED BANDWIDTH OF A BRANCH PREDICTION UNIT

Publication number: 20230195469

Abstract: Techniques and mechanisms for a processor to determine an execution of instructions based on a prediction of a taken branch. In an embodiment, a first prediction unit generates each of multiple branch predictions in one cycle of successive branch prediction cycles. An indication of the branch predictions is provided to an execution pipeline, which prepares to execute an instruction based on the indication. Where a first one of the branch predictions is determined to be of a low confidence type, said first branch prediction is further indicated to a second prediction unit, which performs a second branch prediction based on the same branch instruction for which the first branch prediction was made. In another embodiment, the second prediction unit signals that a state of the execution pipeline is to be cleared, based on a determination that the first and second branch predictions are inconsistent with each other.

Type: Application

Filed: December 21, 2021

Publication date: June 22, 2023

Applicant: Intel Corporation

Inventors: Sumeet Bandishte, Jayesh Gaur, Franck Sala, Alexey Yurievich Sivtsov, Jared Warner Stark, IV, Lihu Rappoport, Sreenivas Subramoney
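
The two-level flow in this abstract can be summarized as follows: a fast predictor acts immediately, low-confidence predictions are re-checked by a second predictor, and disagreement clears pipeline state. The sketch below models that control flow only. The predictors passed in are hypothetical stand-ins, and each branch is treated one at a time, ignoring the multiple-predictions-per-cycle bandwidth aspect.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    taken: bool
    confident: bool


def resolve(branches, fast_predict, slow_predict):
    """Return (branch, direction used, pipeline cleared?) for each branch."""
    outcome = []
    for br in branches:
        first = fast_predict(br)           # high-bandwidth predictor; pipeline acts on it at once
        direction, cleared = first.taken, False
        if not first.confident:
            second = slow_predict(br)      # low confidence: re-predict the same branch
            if second != first.taken:
                direction, cleared = second, True   # inconsistent: clear pipeline state
        outcome.append((br, direction, cleared))
    return outcome


def fast(br):
    # Hypothetical first-level predictor: parity picks direction, small ids are confident.
    return Prediction(taken=(br % 2 == 0), confident=(br < 10))


def slow(br):
    # Hypothetical second-level predictor that always predicts taken.
    return True


print(resolve([2, 3, 12, 13], fast, slow))
```

Only branch 13 is both low-confidence and contradicted, so it is the only one that triggers a clear.
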

Throttling Code Fetch For Speculative Code Paths

Publication number: 20230195464

Abstract: Methods and apparatus relating to throttling a code fetch for speculative code paths are described. In an embodiment, a first storage structure stores a reference to a code line in response to a request to be received from a cache. A second storage structure stores a reference to the code line in response to an update to an Instruction Dispatch Queue (IDQ). Logic circuitry controls additional code line fetch operations based at least in part on a comparison of a number of ongoing speculative code fetches and a determination that the code line is speculative. Other embodiments are also disclosed and claimed.

Type: Application

Filed: December 16, 2021

Publication date: June 22, 2023

Applicant: Intel Corporation

Inventors: Anant Vithal Nori, Prathmesh Kallurkar, Sreenivas Subramoney, Niranjan Kumar Soundararajan
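
One way to read the two storage structures in this abstract is as a before/after pair: lines recorded when the cache returns them and lines recorded once they reach the IDQ. The sketch below assumes that a line present in the first set but not yet in the second counts as an ongoing speculative fetch, and that a hypothetical limit caps those fetches. The abstract does not confirm this interpretation, and all names are illustrative.

```python
class CodeFetchThrottle:
    """Toy model of capping in-flight speculative code fetches."""

    def __init__(self, max_speculative=4):    # hypothetical limit
        self.returned = set()     # first structure: lines returned by the cache
        self.delivered = set()    # second structure: lines that reached the IDQ
        self.max_speculative = max_speculative

    def on_cache_return(self, line):
        self.returned.add(line)

    def on_idq_update(self, line):
        self.delivered.add(line)

    def speculative_in_flight(self):
        # Assumption: returned but not yet consumed by the IDQ = ongoing speculative fetch.
        return len(self.returned - self.delivered)

    def may_fetch(self, is_speculative):
        """Non-speculative fetches always proceed; speculative ones are throttled."""
        return not is_speculative or self.speculative_in_flight() < self.max_speculative


t = CodeFetchThrottle(max_speculative=2)
t.on_cache_return(0x40)
t.on_cache_return(0x80)
print(t.may_fetch(is_speculative=True))   # False: two lines still in flight
t.on_idq_update(0x40)
print(t.may_fetch(is_speculative=True))   # True: one slot freed
```
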

DE-PRIORITIZING SPECULATIVE CODE LINES IN ON-CHIP CACHES

Publication number: 20230185718

Abstract: Methods and apparatus relating to de-prioritizing speculative code lines in on-chip caches are described. In an embodiment, logic circuitry determines whether a storage structure includes a reference to a code miss request prior to transmission of the code miss request to a shared cache. The logic circuitry causes de-prioritization of a code line, corresponding to the code miss request, in the shared cache in response to an absence of the reference in the storage structure. Other embodiments are also disclosed and claimed.

Type: Application

Filed: December 14, 2021

Publication date: June 15, 2023

Applicant: Intel Corporation

Inventors: Anant Vithal Nori, Prathmesh Kallurkar, Niranjan Kumar Soundararajan, Sreenivas Subramoney, Lihu Rappoport, Hanna Alam, Adrian Moga, Ronak Singhal
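
De-prioritization in a shared cache is often modeled as inserting a line at the least-recently-used position so it is the first eviction candidate. The sketch below uses that common interpretation with an `OrderedDict` as one LRU set. The `tracked` set stands in for the abstract's storage structure; treating a line's absence from it as "speculative" follows the abstract, but the insertion-position policy and all names are assumptions.

```python
from collections import OrderedDict


class SharedCache:
    """One LRU set; iteration order runs from LRU (first) to MRU (last)."""

    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()

    def insert(self, line, low_priority=False):
        if line in self.lines:
            self.lines.move_to_end(line)               # hit: promote to MRU
            return
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)             # evict the LRU line
        self.lines[line] = True                        # normal fill at MRU
        if low_priority:
            self.lines.move_to_end(line, last=False)   # de-prioritized: fill at LRU


def handle_code_miss(line, tracked, cache):
    # Absence from the (hypothetical) tracking structure marks the line as speculative.
    cache.insert(line, low_priority=line not in tracked)


cache = SharedCache(ways=2)
tracked = {"A", "B"}
for line in ["A", "S", "B"]:       # "S" is an untracked, speculative code line
    handle_code_miss(line, tracked, cache)
print(list(cache.lines))           # ['A', 'B']: the speculative line was evicted first
```

Without de-prioritization, "S" would have been filled at MRU and the useful line "A" would have been evicted instead.
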
