Patents by Inventor Sreenivas Subramoney

Sreenivas Subramoney has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20250117329
    Abstract: Methods and apparatus relating to an instruction and/or micro-architecture support for decompression on core are described. In an embodiment, decode circuitry decodes a decompression instruction into a first micro operation and a second micro operation. The first micro operation causes one or more load operations to fetch data into one or more cachelines of a cache of a processor core. Decompression Engine (DE) circuitry decompresses the fetched data from the one or more cachelines of the cache of the processor core in response to the second micro operation. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: November 14, 2024
    Publication date: April 10, 2025
    Applicant: Intel Corporation
    Inventors: Jayesh Gaur, Adarsh Chauhan, Vinodh Gopal, Vedvyas Shanbhogue, Sreenivas Subramoney, Wajdi Feghali
  • Patent number: 12248696
    Abstract: Example compute-in-memory (CIM) or processor-in-memory (PIM) techniques use repurposed or dedicated static random access memory (SRAM) rows of an SRAM sub-array to store look-up-table (LUT) entries for use in multiply and accumulate (MAC) operations.
    Type: Grant
    Filed: June 7, 2021
    Date of Patent: March 11, 2025
    Assignee: Intel Corporation
    Inventors: Saurabh Jain, Srivatsa Rangachar Srinivasa, Akshay Krishna Ramanathan, Gurpreet Singh Kalsi, Kamlesh R. Pillai, Sreenivas Subramoney
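The LUT-based MAC idea above can be modeled in a few lines of Python. This is only an illustrative sketch under assumed parameters (a 4-bit input operand, one LUT per weight); the patent describes SRAM circuitry, and every name here is invented:

```python
# Hypothetical software model of the LUT-in-SRAM idea: dedicated SRAM rows
# hold precomputed multiples of a weight, so a "multiply" becomes a row
# lookup indexed by the input operand, and MAC reduces to lookup + add.

def build_weight_lut(weight, input_bits=4):
    # One LUT entry per possible input value (2**input_bits rows).
    return [weight * x for x in range(1 << input_bits)]

def lut_mac(inputs, weights, input_bits=4):
    acc = 0
    for x, w in zip(inputs, weights):
        lut = build_weight_lut(w, input_bits)  # stored once per weight in SRAM
        acc += lut[x]                          # multiply replaced by a row read
    return acc
```

Storing one precomputed-product row per possible input value turns each multiply into a single SRAM row read, which is the essence of the CIM/PIM trade-off the abstract describes.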
  • Patent number: 12242721
    Abstract: Methods, apparatus, and articles of manufacture to profile page tables for memory management are disclosed. An example apparatus includes a processor to execute computer-readable instructions to: profile a first page at a first level of a page table as not part of a target group; and, in response to profiling the first page as not part of the target group, label a data page at a second level that corresponds to the first page as not part of the target group, the second level being lower than the first level.
    Type: Grant
    Filed: March 26, 2021
    Date of Patent: March 4, 2025
    Assignee: Intel Corporation
    Inventors: Aravinda Prasad, Sandeep Kumar, Sreenivas Subramoney, Andy Rudoff
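The downward label propagation in this abstract can be sketched as follows. This is an invented two-level model (the names `upper_pages` and `is_target` are illustrative, not from the patent): profiling an upper-level page as outside the target group lets us label every data page it maps without profiling each one.

```python
# Illustrative model: a two-level page table where a negative profile at the
# upper level is propagated to all lower-level data pages it covers.

def profile_page_table(upper_pages, is_target):
    # upper_pages: dict of upper-level page -> list of data pages it maps.
    labels = {}
    for upper, data_pages in upper_pages.items():
        if not is_target(upper):
            # Propagate the negative label downward without touching each page.
            for dp in data_pages:
                labels[dp] = False
        else:
            # Only pages under a candidate upper-level page are profiled.
            for dp in data_pages:
                labels[dp] = is_target(dp)
    return labels
```

The saving is that data pages under a non-target upper-level page are never individually profiled.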
  • Patent number: 12216581
    Abstract: System and method for prefetching pointer-referenced data. A method embodiment includes: tracking a plurality of load instructions which includes a first load instruction to access a first data that identifies a first memory location; detecting a second load instruction which accesses a second memory location for a second data, the second memory location matching the first memory location identified by the first data; responsive to the detecting, updating a list of pointer load instructions to include information identifying the first load instruction as a pointer load instruction; prefetching a third data for a third load instruction prior to executing the third load instruction; identifying the third load instruction as a pointer load instruction based on information from the list of pointer load instructions and responsively prefetching a fourth data from a fourth memory location, wherein the fourth memory location is identified by the third data.
    Type: Grant
    Filed: May 19, 2023
    Date of Patent: February 4, 2025
    Assignee: Intel Corporation
    Inventors: Sreenivas Subramoney, Stanislav Shwartsman, Anant Nori, Shankar Balachandran, Elad Shtiegmann, Vineeth Mekkat, Manjunath Shevgoor, Sourabh Alurkar
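The detection-and-chaining flow in this abstract can be illustrated with a small software model. All class and field names here are invented; real hardware would track this per load PC in prefetcher tables:

```python
# Simplified model of pointer-load prefetching: if load B's address equals a
# value that load A previously fetched, A is a pointer load, and a later
# prefetch of A's data can chain-prefetch the address that data points to.

class PointerPrefetcher:
    def __init__(self):
        self.last_values = {}       # load PC -> last data value it loaded
        self.pointer_loads = set()  # PCs identified as pointer loads

    def on_load(self, pc, addr, memory):
        # Detection: does this load's address match data some load produced?
        for prod_pc, val in self.last_values.items():
            if val == addr:
                self.pointer_loads.add(prod_pc)
        data = memory[addr]
        self.last_values[pc] = data
        return data

    def prefetch(self, pc, addr, memory):
        data = memory[addr]          # prefetch the load's own data
        if pc in self.pointer_loads:
            return memory.get(data)  # chain: also prefetch the pointed-to data
        return None
```

For a linked-list traversal, this is what lets the fourth data (the next node) be fetched as soon as the third data (the pointer) arrives.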
  • Patent number: 12190114
    Abstract: In one embodiment, a processor includes a branch predictor to predict whether a branch instruction is to be taken and a branch target buffer (BTB) coupled to the branch predictor. The branch target buffer may be segmented into a first cache portion and a second cache portion, where, in response to an indication that the branch is to be taken, the BTB is to access an entry in one of the first cache portion and the second cache portion based at least in part on a type of the branch instruction, an occurrence frequency of the branch instruction, and spatial information regarding a distance between a target address of a target of the branch instruction and an address of the branch instruction. Other embodiments are described and claimed.
    Type: Grant
    Filed: December 22, 2020
    Date of Patent: January 7, 2025
    Assignee: Intel Corporation
    Inventors: Niranjan Kumar Soundararajan, Sreenivas Subramoney, S R Swamy Saranam Chongala
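A toy version of the portion-selection policy described above might look like this. The thresholds and the rule itself are invented for illustration; the patent only says the choice depends on branch type, occurrence frequency, and branch-to-target distance:

```python
# Hypothetical segmented-BTB placement policy: short-distance, frequently
# taken conditional branches go to the small first cache portion; everything
# else goes to the larger second portion.

def select_btb_portion(branch_type, taken_count, branch_addr, target_addr,
                       near_distance=4096, hot_threshold=16):
    distance = abs(target_addr - branch_addr)
    if (branch_type == "conditional" and taken_count >= hot_threshold
            and distance < near_distance):
        return "first"   # small, low-latency cache portion
    return "second"      # larger backing portion
```

Keeping hot, near-target branches in a small portion shortens the common-case BTB lookup while the second portion preserves capacity.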
  • Patent number: 12189559
    Abstract: Exemplary embodiments maintain the spatial locality of data processed by a sparse CNN by reordering the data. The reordering may be performed on individual data elements and on groups of co-located data elements, referred to herein as “chunks”. Thus, the data may be reordered into chunks, where each chunk contains data for spatially co-located data elements, and chunks may in turn be organized so that spatially co-located chunks are stored together. The use of chunks helps to reduce the need to re-fetch data during processing. Chunk sizes may be chosen based on the memory constraints of the processing logic (e.g., cache sizes).
    Type: Grant
    Filed: June 26, 2020
    Date of Patent: January 7, 2025
    Assignee: Intel Corporation
    Inventors: Anirud Thyagharajan, Prashant Laddha, Om Omer, Sreenivas Subramoney
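The two-level reordering (elements within chunks, then chunks among themselves) can be sketched for sparse 2D coordinates. The tiling scheme is an assumed simplification of the patent's chunking:

```python
# Minimal sketch of chunk-based reordering: group sparse 2D elements into
# fixed-size square tiles ("chunks"), then order by chunk first and by
# position within the chunk second, so co-located data is stored together.
# Chunk size would be chosen to fit the processing logic's cache.

def reorder_into_chunks(points, chunk=4):
    def tile(p):
        return (p[1] // chunk, p[0] // chunk)   # which chunk the point is in
    # Sort by chunk first, then by position within the chunk.
    return sorted(points, key=lambda p: (tile(p), p[1], p[0]))
```

After this reordering, a streaming pass touches each chunk's data exactly once instead of revisiting scattered cache lines.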
  • Publication number: 20250004772
    Abstract: Apparatus and method for a decompression hardware copy engine with efficient sequence overlapping copy. For example, one embodiment of an apparatus comprises: a plurality of processing cores, one or more of the plurality of processing cores to execute program code to produce a plurality of literals and sequences from a compressed data stream; and decompression acceleration circuitry to generate a decompressed data stream based on the plurality of literals and sequences, the decompression acceleration circuitry comprising: a sequence pre-processor circuit to process batches of sequences of the plurality of sequences and generate a plurality of copy instructions, the sequence pre-processor circuit to merge multiple copy operations corresponding to multiple sequences into a merged copy instruction; and a copy engine circuit to execute the copy instructions to produce the decompressed data stream.
    Type: Application
    Filed: June 30, 2023
    Publication date: January 2, 2025
    Inventors: Kamlesh Pillai, Vinodh Gopal, Gurpreet Singh Kalsi, Sreenivas Subramoney, Wajdi K. Feghali
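The sequence pre-processor's merge step can be illustrated with a small pass over copy operations. The `(src, dst, length)` copy format is invented for illustration; the point is that copies contiguous in both source and destination collapse into one larger copy instruction:

```python
# Illustrative merge pass: each decompression sequence yields a
# (src, dst, length) copy; a copy whose source and destination both continue
# the previous copy is folded into it, producing one merged copy instruction
# for the copy engine instead of several small ones.

def merge_copies(copies):
    merged = []
    for src, dst, length in copies:
        if merged:
            psrc, pdst, plen = merged[-1]
            if psrc + plen == src and pdst + plen == dst:
                merged[-1] = (psrc, pdst, plen + length)
                continue
        merged.append((src, dst, length))
    return merged
```

Fewer, larger copies keep the copy engine's datapath busy instead of paying per-sequence overhead.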
  • Publication number: 20250004766
    Abstract: Techniques for software-defined super core usage are described. In some examples, a first and a second processor core operate as a single virtual core, as configured by the operating system, to execute a first set of instruction segments of a single-threaded program and a second set of instruction segments of the program concurrently using a shared memory space, wherein the instruction segments include one or more of a store instruction to store live register data to be shared with another core and a load instruction to load live register data shared by another core.
    Type: Application
    Filed: June 28, 2024
    Publication date: January 2, 2025
    Inventors: Jayesh Gaur, Sumeet Bandishte, Anant Nori, Michael Chynoweth, Sreenivas Subramoney, Adi Yoaz, Anshuman Dhuliya
  • Patent number: 12182018
    Abstract: Methods and apparatus relating to an instruction and/or micro-architecture support for decompression on core are described. In an embodiment, decode circuitry decodes a decompression instruction into a first micro operation and a second micro operation. The first micro operation causes one or more load operations to fetch data into one or more cachelines of a cache of a processor core. Decompression Engine (DE) circuitry decompresses the fetched data from the one or more cachelines of the cache of the processor core in response to the second micro operation. Other embodiments are also disclosed and claimed.
    Type: Grant
    Filed: December 23, 2020
    Date of Patent: December 31, 2024
    Assignee: Intel Corporation
    Inventors: Jayesh Gaur, Adarsh Chauhan, Vinodh Gopal, Vedvyas Shanbhogue, Sreenivas Subramoney, Wajdi Feghali
  • Patent number: 12153925
    Abstract: An embodiment of an integrated circuit may comprise a core and a front end unit coupled to the core to decode one or more instructions, wherein the front end unit includes a first decode path, a second decode path, and circuitry to: predict a taken branch of a conditional branch instruction of the one or more instructions; decode a predicted path of the taken branch on the first decode path; determine if the conditional branch instruction corresponds to a hard-to-predict conditional branch instruction and if the second decode path is available; and, if so determined, decode an alternate path of a not-taken branch of the hard-to-predict conditional branch instruction on the second decode path. Other embodiments are disclosed and claimed.
    Type: Grant
    Filed: December 22, 2020
    Date of Patent: November 26, 2024
    Assignee: Intel Corporation
    Inventors: Niranjan Soundararajan, Sreenivas Subramoney
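The dual-path decode decision reduces to a small policy, modeled here in Python. The dictionary result stands in for decoded micro-ops on the two hardware decode paths; all names are invented:

```python
# Toy model of dual decode paths: the predicted (taken) path always decodes
# on path 1; for a hard-to-predict (H2P) branch, the not-taken alternate path
# is also decoded on path 2 when it is idle, so a misprediction finds its
# decoded instructions already available.

def decode_branch(taken_path, not_taken_path, is_h2p, path2_free):
    decoded = {"path1": list(taken_path)}        # predicted path
    if is_h2p and path2_free:
        decoded["path2"] = list(not_taken_path)  # alternate path, decoded early
    return decoded
```

The gating on "hard-to-predict" keeps the second path free for ordinary fetch bandwidth when the predictor is trustworthy.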
  • Patent number: 12140696
    Abstract: According to various embodiments, a radar device is described comprising a processor configured to generate a scene comprising an object based on a plurality of received wireless signals, generate a ground truth object parameter of the object, and generate a dataset representative of the scene; and a radar detector configured to determine an object parameter of the object using a machine learning algorithm and the dataset, determine an error value of the machine learning algorithm using a cost function, the object parameter, and the ground truth object parameter, and adjust the machine learning algorithm values to reduce the error value.
    Type: Grant
    Filed: July 14, 2021
    Date of Patent: November 12, 2024
    Assignee: Intel Corporation
    Inventors: Chulong Chen, Wenling Margaret Huang, Saiveena Kesaraju, Ivan Simões Gaspar, Pradyumna S. Singh, Biji George, Dipan Kumar Mandal, Om Ji Omer, Sreenivas Subramoney, Yuval Amizur, Leor Banin, Hao Chen, Nir Dvorecki, Shengbo Xu
  • Patent number: 12117908
    Abstract: Systems, apparatuses and methods may provide for technology that associates a unique identifier with an application, creates an entry in a metadata table, wherein the metadata table is at a fixed location in persistent system memory, populates the entry with the unique identifier, a user identifier, and a pointer to a root of a page table tree, and recovers in-use data pages after a system crash. In one example, the in-use data pages are recovered from the persistent system memory based on the metadata table and include one or more of application heap information or application stack information.
    Type: Grant
    Filed: December 4, 2020
    Date of Patent: October 15, 2024
    Assignee: Intel Corporation
    Inventors: Aravinda Prasad, Sreenivas Subramoney
  • Patent number: 12112171
    Abstract: Techniques for processing loops are described. An exemplary apparatus at least includes decoder circuitry to decode a single instruction, the single instruction to include a field for an opcode, the opcode to indicate execution circuitry is to perform an operation to configure execution of one or more loops, wherein the one or more loops are to include a plurality of configuration instructions and instructions that are to use metadata generated by ones of the plurality of configuration instructions; and execution circuitry to perform the operation as indicated by the opcode.
    Type: Grant
    Filed: December 26, 2020
    Date of Patent: October 8, 2024
    Assignee: Intel Corporation
    Inventors: Anant Nori, Shankar Balachandran, Sreenivas Subramoney, Joydeep Rakshit, Vedvyas Shanbhogue, Avishaii Abuhatzera, Belliappa Kuttanna
  • Patent number: 12066945
    Abstract: An embodiment of an integrated circuit may comprise a core, a first level core cache memory coupled to the core, a shared core cache memory coupled to the core, a first cache controller coupled to the core and communicatively coupled to the first level core cache memory, a second cache controller coupled to the core and communicatively coupled to the shared core cache memory, and circuitry coupled to the core and communicatively coupled to the first cache controller and the second cache controller to determine if a workload has a large code footprint, and, if so determined, partition N ways of the shared core cache memory into first and second chunks of ways with the first chunk of M ways reserved for code cache lines from the workload and the second chunk of N minus M ways reserved for data cache lines from the workload, where N and M are positive integer values and N minus M is greater than zero. Other embodiments are disclosed and claimed.
    Type: Grant
    Filed: December 22, 2020
    Date of Patent: August 20, 2024
    Assignee: Intel Corporation
    Inventors: Prathmesh Kallurkar, Anant Vithal Nori, Sreenivas Subramoney
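The partitioning decision in this abstract is simple enough to state directly in code. The footprint threshold and the choice of M are invented; the patent only requires 0 < M < N so that both chunks are non-empty:

```python
# Sketch of the way-partitioning decision: when the workload's code footprint
# is large, reserve M of the shared cache's N ways for code cache lines and
# the remaining N - M ways for data cache lines. Threshold and M are
# illustrative values, not from the patent.

def partition_ways(n_ways, code_footprint_kb, large_threshold_kb=512, m=3):
    assert 0 < m < n_ways                     # both chunks must be non-empty
    if code_footprint_kb >= large_threshold_kb:
        code_ways = list(range(m))            # first chunk: ways 0..M-1
        data_ways = list(range(m, n_ways))    # second chunk: ways M..N-1
        return code_ways, data_ways
    return None  # no partitioning for small code footprints
```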
  • Patent number: 12028094
    Abstract: Methods and apparatus relating to an Application Programming Interface (API) for fine-grained low-latency decompression within a processor core are described. In an embodiment, a decompression Application Programming Interface (API) receives an input handle to a data object. The data object includes compressed data and metadata. Decompression Engine (DE) circuitry decompresses the compressed data to generate uncompressed data. The DE circuitry decompresses the compressed data in response to invocation of a decompression instruction by the decompression API. The metadata comprises a first operand to indicate a location of the compressed data, a second operand to indicate a size of the compressed data, a third operand to indicate a location to which the data decompressed by the DE circuitry is to be stored, and a fourth operand to indicate a size of the decompressed data. Other embodiments are also disclosed and claimed.
    Type: Grant
    Filed: December 23, 2020
    Date of Patent: July 2, 2024
    Inventors: Jayesh Gaur, Adarsh Chauhan, Vinodh Gopal, Vedvyas Shanbhogue, Sreenivas Subramoney, Wajdi Feghali
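The four-operand metadata can be pictured as a descriptor handed to the engine. This sketch uses zlib as a software stand-in for the hardware Decompression Engine; the `Descriptor` layout and function names are invented for illustration:

```python
# Hypothetical software analogue of the four metadata operands described
# above, with zlib standing in for the DE circuitry.

import zlib
from dataclasses import dataclass

@dataclass
class Descriptor:
    src: bytearray     # operand 1: location of the compressed data
    src_size: int      # operand 2: size of the compressed data
    dst: bytearray     # operand 3: location for the decompressed output
    dst_size: int      # operand 4: expected size of the decompressed data

def de_decompress(desc):
    out = zlib.decompress(bytes(desc.src[:desc.src_size]))
    assert len(out) == desc.dst_size   # metadata must match the actual output
    desc.dst[:len(out)] = out
    return len(out)
```

Packaging location and size for both input and output is what lets the API invoke a single decompression instruction without further negotiation with the caller.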
  • Publication number: 20240211408
    Abstract: Apparatus and method for probabilistic cacheline replacement for accelerating address translation. For example, one embodiment of a processor comprises: a plurality of cores, each core to process instructions; a cache to be shared by a subset of the plurality of cores, the cache comprising an N-way set associative cache for storing page table entry (PTE) cachelines and non-PTE cachelines; and a cache manager to implement a PTE-aware eviction policy for evicting cachelines from the cache, the PTE-aware eviction policy to cause a reduction of evictions of PTE cachelines during non-PTE cacheline fills.
    Type: Application
    Filed: December 23, 2022
    Publication date: June 27, 2024
    Inventors: Joydeep Rakshit, Anant Vithal Nori, Sreenivas Subramoney, Hanna Alam, Joseph Nuzman
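One way to read "probabilistic cacheline replacement" here is a victim-selection policy that shields PTE lines with some probability. The shielding probability and set layout below are invented; only the bias against evicting PTE lines during non-PTE fills comes from the abstract:

```python
# Hypothetical PTE-aware eviction: on a fill for a non-PTE line, prefer a
# non-PTE victim with probability shield_prob, protecting page-table-entry
# cachelines; otherwise (and for PTE fills) fall back to plain LRU.

import random

def choose_victim(cache_set, fill_is_pte, shield_prob=0.9, rng=random):
    # cache_set: list of lines like {"tag": ..., "is_pte": bool}, LRU first.
    if not fill_is_pte and rng.random() < shield_prob:
        non_pte = [line for line in cache_set if not line["is_pte"]]
        if non_pte:
            return non_pte[0]        # LRU among the non-PTE lines
    return cache_set[0]              # plain LRU fallback
```

Making the shield probabilistic rather than absolute prevents PTE lines from monopolizing the set when the working set is mostly page-table data.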
  • Patent number: 12020033
    Abstract: Apparatus and method for memoizing repeat function calls are described herein. An apparatus embodiment includes: uop buffer circuitry to identify a function for memoization based on retiring micro-operations (uops) from a processing pipeline; memoization retirement circuitry to generate a signature of the function which includes input and output data of the function; a memoization data structure to store the signature; and predictor circuitry to detect an instance of the function to be executed by the processing pipeline and to responsively exclude a first subset of uops associated with the instance from execution when a confidence level associated with the function is above a threshold. One or more instructions that are data-dependent on execution of the instance are then provided with the output data of the function from the memoization data structure.
    Type: Grant
    Filed: December 24, 2020
    Date of Patent: June 25, 2024
    Assignee: Intel Corporation
    Inventors: Niranjan Kumar Soundararajan, Sreenivas Subramoney, Jayesh Gaur, S R Swamy Saranam Chongala
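A software analogue of this confidence-gated memoization is easy to sketch. The table structure, threshold, and update rule below are invented; the hardware works on uops and signatures rather than Python callables:

```python
# Simplified model: a signature maps (function, inputs) to an output plus a
# confidence counter; once confidence reaches the threshold, a detected
# instance skips execution and supplies the stored output to dependents.

class MemoTable:
    def __init__(self, threshold=2):
        self.table = {}       # (func name, inputs) -> [output, confidence]
        self.threshold = threshold

    def call(self, func, *args):
        key = (func.__name__, args)
        entry = self.table.get(key)
        if entry and entry[1] >= self.threshold:
            return entry[0]               # bypass execution entirely
        result = func(*args)              # execute and learn the signature
        if entry and entry[0] == result:
            entry[1] += 1                 # same output again: build confidence
        else:
            self.table[key] = [result, 1]
        return result
```

The confidence counter is what keeps a function with varying outputs from being wrongly short-circuited.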
  • Publication number: 20240202000
    Abstract: Techniques and mechanisms for efficiently saving and recovering state of a processor core. In an embodiment, a processor core fetches and decodes a first instruction to generate a first decoded instruction, wherein the first instruction comprises a first opcode which corresponds to one or more components of the processor core. Execution of the first instruction comprises saving microarchitectural state of the one or more components to a memory of the core. In another embodiment, a processor core fetches and decodes a second instruction to generate a second decoded instruction, wherein the second instruction comprises a second opcode which corresponds to the same one or more components. Execution of the second instruction comprises restoring the microarchitectural state from the memory to the one or more components.
    Type: Application
    Filed: December 19, 2022
    Publication date: June 20, 2024
    Applicant: Intel Corporation
    Inventors: Niranjan Soundararajan, Sreenivas Subramoney
  • Publication number: 20240201949
    Abstract: Systems, apparatuses and methods may provide for technology that includes a compute-in-memory (CiM) enabled memory array to conduct digital bit-serial multiply and accumulate (MAC) operations on multi-bit input data and weight data stored in the CiM enabled memory array, an adder tree coupled to the CiM enabled memory array, an accumulator coupled to the adder tree, and an input bit selection stage coupled to the CiM enabled memory array, wherein the input bit selection stage restricts serial bit selection on the multi-bit input data to non-zero values during the digital MAC operations.
    Type: Application
    Filed: February 28, 2024
    Publication date: June 20, 2024
    Inventors: Sagar Varma Sayyaparaju, Om Ji Omer, Sreenivas Subramoney
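The zero-skipping bit-serial MAC can be modeled directly: the input is consumed one bit at a time, but only set bits are selected, so zero bits cost no serial cycles. This is an illustrative software model, not the CiM array itself:

```python
# Sketch of the zero-skipping bit-serial MAC: each selected input bit adds
# the weight shifted by the bit position; the bit-selection stage emits only
# positions where the input bit is 1, so zero bits take no cycles.

def bit_serial_mac(inputs, weights):
    acc = 0
    cycles = 0
    for x, w in zip(inputs, weights):
        bit_pos = 0
        while x:
            if x & 1:                 # only non-zero bits are selected
                acc += w << bit_pos   # weight shifted by bit position
                cycles += 1           # one serial cycle per selected bit
            x >>= 1
            bit_pos += 1
    return acc, cycles
```

The returned cycle count makes the benefit visible: sparse or small inputs finish in proportion to their set bits, not their full bit width.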
  • Publication number: 20240143379
    Abstract: An apparatus is provided for enabling sequential prefetching inside a host, the apparatus comprising interface circuitry, machine-readable instructions, and processing circuitry to execute the machine-readable instructions. The machine-readable instructions comprise instructions to identify a first memory access pattern of an application in a guest virtual address space inside a virtual machine, where the application is running inside the virtual machine and the virtual machine is running on the host. The machine-readable instructions further comprise instructions to modify a layout of a guest physical address space, which corresponds to the guest virtual address space, to sequentialize a second memory access pattern in a host virtual address space. The second memory access pattern in the host virtual address space corresponds to the first memory access pattern of the application in the guest virtual address space.
    Type: Application
    Filed: September 13, 2023
    Publication date: May 2, 2024
    Inventors: Chandra Prakash, Aravinda Prasad, Sreenivas Subramoney
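The layout sequentialization can be reduced to a remapping sketch. All structures here are invented simplifications: given the order in which the guest touches its pages, place them on consecutive host-side pages so the host's sequential prefetcher can see the stream.

```python
# Illustrative model: rebuild the guest-physical layout so that pages are
# placed in the order the application first touches them, making the
# corresponding host-virtual accesses sequential.

def sequentialize_layout(access_pattern):
    # access_pattern: guest-virtual page numbers in observed access order.
    mapping = {}
    next_host_page = 0
    for gva_page in access_pattern:
        if gva_page not in mapping:
            mapping[gva_page] = next_host_page   # place pages in touch order
            next_host_page += 1
    return mapping
```

After remapping, the scattered guest pattern 7, 3, 9 becomes host pages 0, 1, 2, which a sequential prefetcher recognizes.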