Patents by Inventor Sreenivas Subramoney

Sreenivas Subramoney has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20220197794
    Abstract: An embodiment of an integrated circuit may comprise a core, a first level core cache memory coupled to the core, a shared core cache memory coupled to the core, a first cache controller coupled to the core and communicatively coupled to the first level core cache memory, a second cache controller coupled to the core and communicatively coupled to the shared core cache memory, and circuitry coupled to the core and communicatively coupled to the first cache controller and the second cache controller to determine if a workload has a large code footprint, and, if so determined, partition N ways of the shared core cache memory into first and second chunks of ways with the first chunk of M ways reserved for code cache lines from the workload and the second chunk of N minus M ways reserved for data cache lines from the workload, where N and M are positive integer values and N minus M is greater than zero. Other embodiments are disclosed and claimed.
    Type: Application
    Filed: December 22, 2020
    Publication date: June 23, 2022
    Applicant: Intel Corporation
    Inventors: Prathmesh Kallurkar, Anant Vithal Nori, Sreenivas Subramoney
  • Patent number: 11347828
    Abstract: A disclosed apparatus to multiply matrices includes a compute engine. The compute engine includes multipliers in a two dimensional array that has a plurality of array locations defined by columns and rows. The apparatus also includes a plurality of adders in columns. A broadcast interconnect between a cache and the multipliers broadcasts a first set of operand data elements to multipliers in the rows of the array. A unicast interconnect unicasts a second set of operands between a data buffer and the multipliers. The multipliers multiply the operands to generate a plurality of outputs, and the adders add the outputs generated by the multipliers.
    Type: Grant
    Filed: March 27, 2020
    Date of Patent: May 31, 2022
    Assignee: Intel Corporation
    Inventors: Biji George, Om Ji Omer, Dipan Kumar Mandal, Cormac Brick, Lance Hacking, Sreenivas Subramoney, Belliappa Kuttanna
  • Patent number: 11321089
    Abstract: Methods and apparatuses relating to instruction set architecture (ISA) based and automatic load tracking hardware for opportunistic re-steer of data-dependent flaky branches are described.
    Type: Grant
    Filed: June 27, 2020
    Date of Patent: May 3, 2022
    Assignee: Intel Corporation
    Inventors: Saurabh Gupta, Niranjan Soundararajan, Ragavendra Natarajan, Sreenivas Subramoney
  • Publication number: 20220129763
    Abstract: An embodiment of an integrated circuit may comprise a front end unit, and circuitry coupled to the front end unit, the circuitry to provide a high confidence, multiple branch offset predictor. For example, the circuitry may be configured to identify an entry in a multiple-taken-branch prediction table that corresponds to a conditional branch instruction, determine if a confidence level of the entry exceeds a threshold confidence level, and, if so determined, provide multiple taken branch predictions that stem from the conditional branch instruction from the entry in the multiple-taken-branch prediction table. Other embodiments are disclosed and claimed.
    Type: Application
    Filed: December 22, 2020
    Publication date: April 28, 2022
    Applicant: Intel Corporation
    Inventors: Sumeet Bandishte, Jayesh Gaur, Polychronis Xekalakis, Ariel Sabba, Deborah Marr, Sreenivas Subramoney
  • Publication number: 20220113974
    Abstract: A memory architecture includes processing circuits co-located with memory subarrays for performing computations within the memory architecture. The memory architecture includes a plurality of decoders in hierarchical levels that include a multicast capability for distributing data or compute operations to individual subarrays. The multicast may be configurable with respect to individual fan-outs at each hierarchical level. A computation workflow may be organized into a compute supertile representing one or more “supertiles” of input data to be processed in the compute supertile. The individual data tiles of the input data supertile may be used by multiple compute tiles executed by the processing circuits of the subarrays, and the data tiles multicast to the respective processing circuits for efficient data loading and parallel computation.
    Type: Application
    Filed: December 23, 2021
    Publication date: April 14, 2022
    Applicant: INTEL CORPORATION
    Inventors: Om Ji Omer, Gurpreet Singh Kalsi, Anirud Thyagharajan, Saurabh Jain, Kamlesh R. Pillai, Sreenivas Subramoney, Avishaii Abuhatzera
  • Publication number: 20220114234
    Abstract: A matrix processing engine is provided for efficient matrix computation performed by a dense matrix compute circuit (performing SIMD operations) and a scalar computing core (performing SISD operations). These two processing components operate together to produce output data tiles by feeding results of the dense SIMD operations to the scalar computing core using thread packing and an in-line buffer for accumulating and packing the dense result data. This permits the scalar computing to spawn threads to operate on the dense results as available and without requiring partial or intermediate data read/writes between the dense and scalar computations.
    Type: Application
    Filed: December 22, 2021
    Publication date: April 14, 2022
    Applicant: INTEL CORPORATION
    Inventors: Biji George, Sreenivas Subramoney, Om Ji Omer, Anoop Viswam
  • Publication number: 20220100514
    Abstract: Techniques for processing loops are described. An exemplary apparatus at least includes decoder circuitry to decode a single instruction, the single instruction to include a field for an opcode, the opcode to indicate execution circuitry is to perform an operation to configure execution of one or more loops, wherein the one or more loops are to include a plurality of configuration instructions and instructions that are to use metadata generated by ones of the plurality of configuration instructions; and execution circuitry to perform the operation as indicated by the opcode.
    Type: Application
    Filed: December 26, 2020
    Publication date: March 31, 2022
    Inventors: Anant NORI, Shankar BALACHANDRAN, Sreenivas SUBRAMONEY, Joydeep RAKSHIT, Vedvyas SHANBHOGUE, Avishaii ABUHATZERA, Belliappa KUTTANNA
  • Publication number: 20220091852
    Abstract: Methods and apparatus relating to Instruction Set Architecture (ISA) and/or microarchitecture for early pipeline re-steering using load address prediction to mitigate branch misprediction penalties are described. In an embodiment, decode circuitry decodes a load instruction and Load Address Predictor (LAP) circuitry issues a load prefetch request to memory for data for a load operation of the load instruction. Compute circuitry executes an outcome for a branch operation of the load instruction based on the data from the load prefetch request. And re-steering circuitry transmits a signal to cause flushing of data associated with the load instruction in response to a mismatch between the outcome for the branch operation and a stored prediction value for the branch. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: September 22, 2020
    Publication date: March 24, 2022
    Applicant: Intel Corporation
    Inventors: Saurabh Gupta, Niranjan Kumar Soundarajan, Sreenivas Subramoney, Ragavendra Natarajan
  • Patent number: 11256599
    Abstract: A processor comprises a microarchitectural feature and dynamic tuning unit (DTU) circuitry. The processor executes a program for first and second execution windows with the microarchitectural feature disabled and enabled, respectively. The DTU circuitry automatically determines whether the processor achieved worse performance in the second execution window. In response to determining that the processor achieved worse performance in the second execution window, the DTU circuitry updates a usefulness state for a selected address of the program to denote worse performance. In response to multiple consecutive determinations that the processor achieved worse performance with the microarchitectural feature enabled, the DTU circuitry automatically updates the usefulness state to denote a confirmed bad state.
    Type: Grant
    Filed: December 21, 2020
    Date of Patent: February 22, 2022
    Assignee: Intel Corporation
    Inventors: Adarsh Chauhan, Jayesh Gaur, Franck Sala, Lihu Rappoport, Zeev Sperber, Adi Yoaz, Sreenivas Subramoney
  • Patent number: 11238309
    Abstract: An example apparatus for selecting keypoints in image includes a keypoint detector to detect keypoints in a plurality of received images. The apparatus also includes a score calculator to calculate a keypoint score for each of the detected keypoints based on a descriptor score indicating descriptor invariance. The apparatus includes a keypoint selector to select keypoints based on the calculated keypoint scores. The apparatus also further includes a descriptor calculator to calculate descriptors for each of the selected keypoints. The apparatus also includes a descriptor matcher to match corresponding descriptors between images in the plurality of received images. The apparatus further also includes a feature tracker to track a feature in the plurality of images based on the matched descriptors.
    Type: Grant
    Filed: December 26, 2018
    Date of Patent: February 1, 2022
    Assignee: Intel Corporation
    Inventors: Dipan Kumar Mandal, Gurpreet Kalsi, Om J Omer, Prashant Laddha, Sreenivas Subramoney
  • Publication number: 20220012571
    Abstract: Apparatuses and articles of manufacture are disclosed. An example apparatus includes an activation function control and decode circuitry to populate an input buffer circuitry with an input data element bit subset of less than a threshold number of bits of the input data element retrieved from the memory circuitry. The activation function and control circuitry also populate a kernel weight buffer circuitry with a weight data element bit subset of less than the threshold number of bits of the weight data element retrieved from the memory circuitry. The apparatus also including a preprocessor circuitry to calculate a partial convolution value of at least a portion of the input data element bit subset and the weight data element bit subset to determine a predicted sign of the partial convolution value.
    Type: Application
    Filed: September 24, 2021
    Publication date: January 13, 2022
    Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Bharathwaj Suresh, Sreenivas Subramoney, Avishaii Abuhatzera
  • Publication number: 20210405896
    Abstract: Technologies disclosed herein provide one example of a system that includes processor circuitry to be communicatively coupled to a memory circuitry. The processor circuitry is to receive a memory access request corresponding to an application for access to an address range in a memory allocation of the memory circuitry and to locate a metadata region within the memory allocation. The processor circuitry is also to, in response to a determination that the address range includes at least a portion of the metadata region, obtain first metadata stored in the metadata region, use the first metadata to determine an alternate memory address in a relocation region, and read, at the alternate memory address, displaced data from the portion of the metadata region included in the address range of the memory allocation. The address range includes one or more bytes of an expected allocation region of the memory allocation.
    Type: Application
    Filed: September 10, 2021
    Publication date: December 30, 2021
    Applicant: Intel Corporation
    Inventors: David M. Durham, Michael D. LeMay, Sergej Deutsch, Joydeep Rakshit, Anant Vithal Nori, Jayesh Gaur, Sreenivas Subramoney
  • Patent number: 11188467
    Abstract: A method is described. The method includes receiving a read or write request for a cache line. The method includes directing the request to a set of logical super lines based on the cache line's system memory address. The method includes associating the request with a cache line of the set of logical super lines. The method includes, if the request is a write request: compressing the cache line to form a compressed cache line, breaking the cache line down into smaller data units and storing the smaller data units into a memory side cache. The method includes, if the request is a read request: reading smaller data units of the compressed cache line from the memory side cache and decompressing the cache line.
    Type: Grant
    Filed: September 28, 2017
    Date of Patent: November 30, 2021
    Assignee: Intel Corporation
    Inventors: Israel Diamand, Alaa R. Alameldeen, Sreenivas Subramoney, Supratik Majumder, Srinivas Santosh Kumar Madugula, Jayesh Gaur, Zvika Greenfield, Anant V. Nori
  • Publication number: 20210365377
    Abstract: System and method for prefetching pointer-referenced data. A method embodiment includes: tracking a plurality of load instructions which includes a first load instruction to access a first data that identifies a first memory location; detecting a second load instruction which accesses a second memory location for a second data, the second memory location matching the first memory location identified by the first data; responsive to the detecting, updating a list of pointer load instructions to include information identifying the first load instruction as a pointer load instruction; prefetching a third data for a third load instruction prior to executing the third load instruction; identifying the third load instruction as a pointer load instruction based on information from the list of pointer load instructions and responsively prefetching a fourth data from a fourth memory location, wherein the fourth memory location is identified by the third data.
    Type: Application
    Filed: August 2, 2021
    Publication date: November 25, 2021
    Applicant: Intel Corporation
    Inventors: Sreenivas Subramoney, Stanislav Shwartsman, Anant Nori, Shankar Balachandran, Elad Shtiegmann, Vineeth Mekkat, Manjunath Shevgoor, Sourabh Alurkar
  • Publication number: 20210326139
    Abstract: Methods and apparatuses relating to instruction set architecture (ISA) based and automatic load tracking hardware for opportunistic re-steer of data-dependent flaky branches are described.
    Type: Application
    Filed: June 27, 2020
    Publication date: October 21, 2021
    Inventors: SAURABH GUPTA, NIRANJAN SOUNDARARAJAN, RAGAVENDRA NATARAJAN, SREENIVAS SUBRAMONEY
  • Publication number: 20210303468
    Abstract: Systems, methods, and apparatuses relating to circuitry to implement a duplication resistant on-die irregular data prefetcher are described.
    Type: Application
    Filed: March 27, 2020
    Publication date: September 30, 2021
    Inventors: Prathmesh Kallurkar, Anant Vithal Nori, Sreenivas Subramoney
  • Patent number: 11080194
    Abstract: System and method for prefetching pointer-referenced data. A method embodiment includes: tracking a plurality of load instructions which includes a first load instruction to access a first data that identifies a first memory location; detecting a second load instruction which accesses a second memory location for a second data, the second memory location matching the first memory location identified by the first data; responsive to the detecting, updating a list of pointer load instructions to include information identifying the first load instruction as a pointer load instruction; prefetching a third data for a third load instruction prior to executing the third load instruction; identifying the third load instruction as a pointer load instruction based on information from the list of pointer load instructions and responsively prefetching a fourth data from a fourth memory location, wherein the fourth memory location is identified by the third data.
    Type: Grant
    Filed: December 27, 2018
    Date of Patent: August 3, 2021
    Assignee: Intel Corporation
    Inventors: Sreenivas Subramoney, Stanislav Shwartsman, Anant Nori, Shankar Balachandran, Elad Shtiegmann, Vineeth Mekkat, Manjunath Shevgoor, Sourabh Alurkar
  • Publication number: 20210232312
    Abstract: Disclosed Methods, Apparatus, and articles of manufacture to profile page tables for memory management are disclosed. An example apparatus includes a processor to execute computer readable instructions to: profile a first page at a first level of a page table as not part of a target group; and in response to profiling the first page as not part of the target group, label a data page at a second level that corresponds to the first page as not part of the target group, the second level being lower than the first level.
    Type: Application
    Filed: March 26, 2021
    Publication date: July 29, 2021
    Inventors: Aravinda Prasad, Sandeep Kumar, Sreenivas Subramoney, Andy Rudoff
  • Publication number: 20210201163
    Abstract: A system is provided that includes a bit vector-based distance counter circuitry configured to generate one or more bit vectors encoded with information about potential matches and edits between a read and a reference genome, wherein the read comprises an encoding of a fragment of deoxyribonucleic acid (DNA) encoded via bases G, A, T, C. The system further includes a bit vector-based traceback circuitry configured to divide the reference genome into one or more windows and to use the plurality of bit vectors to generate a traceback output for each of the one or more windows, wherein the traceback output comprises a match, a substitution, an insert, a delete, or a combination thereof, between the read and the one or more windows.
    Type: Application
    Filed: December 28, 2019
    Publication date: July 1, 2021
    Inventors: Gurpreet Singh Kalsi, Anant V. Nori, Christopher Justin Hughes, Sreenivas Subramoney, Damla Senol
  • Patent number: 11043256
    Abstract: Described are mechanisms and methods for amortizing the cost of address decode, row-decode and wordline firing across multiple read accesses (instead of just on one read access). Some or all memory locations that share a wordline (WL) may be read, by walking through column multiplexor addresses (instead of just reading out one column multiplexor address per WL fire or memory access). The mechanisms and methods disclosed herein may advantageously enable N distinct memory words to be read out if the array uses an N-to-1 column multiplexor. Since memories such as embedded DRAMs (eDRAMs) may undergo a destructive read, for a given WL fire, a design may be disposed to sense N distinct memory words and restore them in order.
    Type: Grant
    Filed: June 29, 2019
    Date of Patent: June 22, 2021
    Assignee: Intel Corporation
    Inventors: Kaushik Vaidyanathan, Huichu Liu, Tanay Karnik, Sreenivas Subramoney, Jayesh Gaur, Sudhanshu Shukla