Patents by Inventor Sreenivas Subramoney

Sreenivas Subramoney has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20210166114
    Abstract: Embodiments are generally directed to techniques for accelerating neural networks. Many embodiments include a hardware accelerator for a bi-directional multi-layered GRU and LC neural network. Some embodiments are particularly directed to a hardware accelerator that enables offloading of the entire LC+GRU network to the hardware accelerator. Various embodiments include a hardware accelerator with a plurality of matrix vector units to perform GRU steps in parallel with LC steps. For example, at least a portion of computation by a first matrix vector unit of a GRU step in a neural network may overlap at least a portion of computation by a second matrix vector unit of an output feature vector for the neural network. Several embodiments include overlapping computation associated with a layer of a neural network with data transfer associated with another layer of the neural network.
    Type: Application
    Filed: February 10, 2021
    Publication date: June 3, 2021
    Applicant: Intel Corporation
    Inventors: Gurpreet S Kalsi, Ramachandra Chakenalli Nanjegowda, Kamlesh R Pillai, Sreenivas Subramoney
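
A rough illustration of the overlap described in publication 20210166114 above. This is a hypothetical Python sketch, not the patented hardware: two matrix-vector units (MVUs) work in a pipelined schedule in which one unit computes the GRU step for time t while the other produces the output feature vector for time t-1. The GRU recurrence and output projection are heavily simplified.

```python
# Hypothetical behavioral model of two matrix-vector units working in an
# overlapped schedule: MVU0 runs GRU step t while MVU1 (conceptually in
# parallel) produces the output feature vector for step t-1.

def gru_step(x, h, W, U):
    # Simplified single-gate recurrence, for illustration only.
    return [sum(w * xi for w, xi in zip(Wr, x)) +
            sum(u * hi for u, hi in zip(Ur, h))
            for Wr, Ur in zip(W, U)]

def output_features(h, V):
    # Output projection of a hidden state (illustrative stand-in for the LC part).
    return [sum(v * hi for v, hi in zip(Vr, h)) for Vr in V]

def run_overlapped(xs, W, U, V, hidden):
    h, outputs = hidden, []
    pending = None                      # hidden state waiting on MVU1
    for x in xs:
        next_h = gru_step(x, h, W, U)   # MVU0: GRU step t
        if pending is not None:
            outputs.append(output_features(pending, V))  # MVU1: output for t-1
        pending, h = next_h, next_h
    outputs.append(output_features(pending, V))          # drain the pipeline
    return outputs

print(run_overlapped([[1.0, 0.0], [0.0, 1.0]],
                     W=[[0.5, 0.5], [0.1, 0.9]],
                     U=[[0.2, 0.0], [0.0, 0.2]],
                     V=[[1.0, -1.0]],
                     hidden=[0.0, 0.0]))
```
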
  • Publication number: 20210111722
    Abstract: Methods, apparatus, systems, and articles of manufacture are disclosed to improve in-memory multiply and accumulate operations. An example apparatus includes a first multiplexer in a subarray of memory, the first multiplexer to receive first values representative of a column of a lookup table (LUT) including entries to represent products of four-bit numbers and return second values from an intersection of a row and the column of the LUT based on a first element of a first operand; shift and adder logic in the subarray, the shift and adder logic to shift the second values based on at least one of the first element of the first operand or a first element of a second operand; and accumulation storage in the subarray, the accumulation storage to store at least the shifted second values.
    Type: Application
    Filed: December 22, 2020
    Publication date: April 15, 2021
    Inventors: Gurpreet Singh Kalsi, Akshay Krishna Ramanathan, Kamlesh Pillai, Sreenivas Subramoney, Srivatsa Rangachar Srinivasa, Anirud Thyagharajan, Om Ji Omer, Saurabh Jain
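
A minimal software sketch of the lookup-table idea in publication 20210111722 above: products of 4-bit numbers are precomputed into a 16x16 LUT, wider operands are split into 4-bit nibbles, and the looked-up partial products are shifted and accumulated. The nibble split and shift amounts are standard integer arithmetic used here for illustration; this is not the patented subarray design.

```python
# Hypothetical model of LUT-based multiply-accumulate: a 16x16 table holds
# products of 4-bit values; wider operands are split into 4-bit nibbles and
# the looked-up partial products are shifted and accumulated.

LUT = [[a * b for b in range(16)] for a in range(16)]  # products of 4-bit numbers

def nibbles(value, width_bits=16):
    """Split an unsigned integer into 4-bit digits, least significant first."""
    return [(value >> (4 * i)) & 0xF for i in range(width_bits // 4)]

def lut_multiply(x, y, width_bits=16):
    acc = 0
    for i, xn in enumerate(nibbles(x, width_bits)):
        for j, yn in enumerate(nibbles(y, width_bits)):
            # Shift amount depends on which nibble of each operand was used.
            acc += LUT[xn][yn] << (4 * (i + j))
    return acc

def lut_mac(pairs):
    """Accumulate products over a list of (x, y) operand pairs."""
    return sum(lut_multiply(x, y) for x, y in pairs)

assert lut_multiply(1234, 5678) == 1234 * 5678
assert lut_mac([(3, 7), (250, 4)]) == 3 * 7 + 250 * 4
```
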
  • Publication number: 20210110187
    Abstract: Systems, apparatuses and methods may provide for technology that decodes data via an instruction that indicates a number of rulebooks to be processed, an input feature size, an output feature size, and a plurality of feature map base addresses, rearranges spatially distributed voxel output feature maps in the decoded data based on weight planes, and performs a channel-wise multiply-accumulate (MAC) operation on the rearranged spatially distributed voxel output feature maps to obtain an output, wherein the channel-wise MAC operation is performed as partial accumulations by a plurality of processing elements.
    Type: Application
    Filed: December 22, 2020
    Publication date: April 15, 2021
    Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Sreenivas Subramoney, Prashant Laddha, Om Ji Omer
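
A loose sketch of the rulebook-driven, channel-wise MAC described in publication 20210110187 above, assuming a rulebook of (input voxel, output voxel, weight plane) triples. The data layout and per-voxel accumulation shown here are illustrative, not the patented instruction format or processing-element partitioning.

```python
# Hypothetical model of rulebook-driven sparse convolution: each rulebook entry
# names an input voxel, an output voxel, and the weight plane (kernel offset)
# relating them; partial products are accumulated channel-wise.

def sparse_conv(rulebook, in_feats, weights, out_channels):
    """
    rulebook : list of (in_voxel_id, out_voxel_id, weight_plane) tuples
    in_feats : dict voxel_id -> list of input-channel values
    weights  : dict weight_plane -> matrix [out_channels][in_channels]
    """
    out_feats = {}
    for in_id, out_id, plane in rulebook:
        x, w = in_feats[in_id], weights[plane]
        acc = out_feats.setdefault(out_id, [0.0] * out_channels)
        for oc in range(out_channels):              # channel-wise MAC,
            acc[oc] += sum(w[oc][ic] * x[ic]        # built up as partial sums
                           for ic in range(len(x)))
    return out_feats

# Tiny example: two active voxels contributing to one output voxel.
rb = [(0, 10, "center"), (1, 10, "left")]
feats = {0: [1.0, 2.0], 1: [0.5, 0.5]}
w = {"center": [[1.0, 0.0]], "left": [[0.0, 2.0]]}
print(sparse_conv(rb, feats, w, out_channels=1))   # {10: [2.0]}
```
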
  • Publication number: 20210109839
    Abstract: A processor comprises a microarchitectural feature and dynamic tuning unit (DTU) circuitry. The processor executes a program for first and second execution windows with the microarchitectural feature disabled and enabled, respectively. The DTU circuitry automatically determines whether the processor achieved worse performance in the second execution window. In response to determining that the processor achieved worse performance in the second execution window, the DTU circuitry updates a usefulness state for a selected address of the program to denote worse performance. In response to multiple consecutive determinations that the processor achieved worse performance with the microarchitectural feature enabled, the DTU circuitry automatically updates the usefulness state to denote a confirmed bad state.
    Type: Application
    Filed: December 21, 2020
    Publication date: April 15, 2021
    Inventors: Adarsh Chauhan, Jayesh Gaur, Franck Sala, Lihu Rappoport, Zeev Sperber, Adi Yoaz, Sreenivas Subramoney
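
A rough sketch of the dynamic-tuning idea in publication 20210109839 above: the microarchitectural feature is alternately disabled and enabled across execution windows, a per-address usefulness state is updated when the enabled window performs worse, and a run of consecutive regressions pins the address in a confirmed-bad state. The counter width and threshold below are assumptions.

```python
# Hypothetical model of a dynamic tuning unit (DTU): compare performance of
# paired execution windows (feature off vs. feature on) and track a per-address
# usefulness state; repeated regressions lock in a "confirmed bad" state.

CONFIRM_THRESHOLD = 3   # assumed number of consecutive regressions

class DynamicTuningUnit:
    def __init__(self):
        self.state = {}   # address -> {"bad_streak": int, "confirmed_bad": bool}

    def report_windows(self, address, perf_disabled, perf_enabled):
        entry = self.state.setdefault(address,
                                      {"bad_streak": 0, "confirmed_bad": False})
        if entry["confirmed_bad"]:
            return False                     # keep the feature off for this address
        if perf_enabled < perf_disabled:     # the feature made things worse
            entry["bad_streak"] += 1
            if entry["bad_streak"] >= CONFIRM_THRESHOLD:
                entry["confirmed_bad"] = True
        else:
            entry["bad_streak"] = 0          # regression streak broken
        return not entry["confirmed_bad"]

dtu = DynamicTuningUnit()
for ipc_off, ipc_on in [(2.0, 1.8), (2.0, 1.9), (2.0, 1.7)]:
    dtu.report_windows(0x401000, ipc_off, ipc_on)
print(dtu.state[0x401000])   # confirmed_bad becomes True after 3 regressions
```
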
  • Publication number: 20210090328
    Abstract: Systems, apparatuses and methods provide technology for optimizing processing of sparse data, such as 3D pointcloud data sets. The technology may include generating a locality-aware rulebook based on an input unstructured sparse data set, such as a 3D pointcloud data set, the locality-aware rulebook storing spatial neighborhood information for active voxels in the input unstructured sparse data set, computing an average receptive field (ARF) value based on the locality-aware rulebook, and determining, from a plurality of tile size and loop order combinations, a tile size and loop order combination for processing the unstructured sparse data based on the computed ARF value. The technology may also include providing the locality-aware rulebook and the tile size and loop order combination to a compute engine such as a neural network, the compute engine to process the unstructured sparse data using the locality-aware rulebook and the tile size and loop order combination.
    Type: Application
    Filed: December 7, 2020
    Publication date: March 25, 2021
    Inventors: Prashant Laddha, Anirud Thyagharajan, Om Ji Omer, Sreenivas Subramoney
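
A simplified sketch of the selection step described in publication 20210090328 above: compute an average receptive field (ARF) from the rulebook's neighborhood counts, then pick the tile size and loop order candidate whose estimated working set fits in cache. The cost model and candidate set are invented for illustration.

```python
# Hypothetical sketch: derive an average receptive field (ARF) from a
# locality-aware rulebook and use it to choose a tile size / loop order
# combination whose estimated working set fits in cache.

def average_receptive_field(rulebook):
    """rulebook: dict out_voxel_id -> list of contributing in_voxel_ids."""
    neighbours = [len(v) for v in rulebook.values()]
    return sum(neighbours) / len(neighbours)

def choose_tiling(rulebook, candidates, cache_bytes, bytes_per_voxel):
    arf = average_receptive_field(rulebook)
    best = None
    for tile_size, loop_order in candidates:
        # Assumed cost model: a tile touches roughly tile_size * ARF voxels.
        working_set = tile_size * arf * bytes_per_voxel
        if working_set <= cache_bytes and (best is None or tile_size > best[0]):
            best = (tile_size, loop_order)
    return best

rb = {0: [1, 2, 3], 1: [2, 3], 2: [0, 1, 2, 3]}
cands = [(64, "output-stationary"), (256, "weight-stationary"),
         (1024, "weight-stationary")]
print(choose_tiling(rb, cands, cache_bytes=512 * 1024, bytes_per_voxel=256))
# (256, 'weight-stationary'): largest candidate whose working set fits
```
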
  • Publication number: 20210089411
    Abstract: Systems, apparatuses and methods may provide for technology that associates a unique identifier with an application, creates an entry in a metadata table, wherein the metadata table is at a fixed location in persistent system memory, populates the entry with the unique identifier, a user identifier, and a pointer to a root of a page table tree, and recovers in-use data pages after a system crash. In one example, the in-use data pages are recovered from the persistent system memory based on the metadata table and include one or more of application heap information or application stack information.
    Type: Application
    Filed: December 4, 2020
    Publication date: March 25, 2021
    Inventors: Aravinda Prasad, Sreenivas Subramoney
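
The sketch below models the metadata-table idea from publication 20210089411 above in plain Python: each registered application gets an entry holding a unique identifier, a user identifier, and a pointer to the root of its page-table tree, and after a simulated crash the in-use pages are recovered by consulting the table. The structure names and the flat stand-in for a page-table tree are assumptions.

```python
# Hypothetical model of crash-recoverable heap/stack pages: a metadata table at
# a fixed location in persistent memory maps each application to the root of
# its page-table tree, so in-use pages can be found again after a crash.

METADATA_TABLE = {}        # stands in for a table at a fixed persistent address

def register_application(unique_id, user_id, page_table_root):
    METADATA_TABLE[unique_id] = {
        "user_id": user_id,
        "page_table_root": page_table_root,   # pointer to root of page-table tree
    }

def recover_in_use_pages(unique_id, persistent_page_tables):
    """Walk the (simulated) page-table tree recorded for this application."""
    root = METADATA_TABLE[unique_id]["page_table_root"]
    return sorted(persistent_page_tables[root])   # heap/stack pages still in use

# Before the "crash": the application registers and maps a few pages.
page_tables = {"rootA": {0x1000, 0x2000, 0x7000}}
register_application("app-42", user_id=1001, page_table_root="rootA")

# After the "crash": only the persistent structures survive; recover from them.
print(recover_in_use_pages("app-42", page_tables))   # [4096, 8192, 28672]
```
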
  • Publication number: 20210089467
    Abstract: Systems, apparatuses and methods may provide for technology that allocates a physical page for a virtual memory address associated with a fault, determines a size and layout of an address space containing the virtual memory address, and conducts a soft reservation of a set of contiguous physical memory pages based on the size and the layout of the address space.
    Type: Application
    Filed: December 7, 2020
    Publication date: March 25, 2021
    Inventors: Aravinda Prasad, Sreenivas Subramoney
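
A loose sketch of the fault-handling flow in publication 20210089467 above: on a page fault, allocate one physical page and softly reserve a run of contiguous physical pages sized from the faulting address space, so later faults in the same region can be served from the reservation. The reservation policy and cap shown are assumptions.

```python
# Hypothetical sketch of "soft reservation": on a fault, back the faulting
# virtual page with one physical page and softly reserve a contiguous run of
# physical frames sized from the address space, for later faults nearby.

PAGE = 4096
free_physical = list(range(0, 1024))     # free physical frame numbers
soft_reserved = {}                       # address-space id -> reserved frames
page_table = {}                          # (asid, virtual page) -> physical frame

def reservation_size(region_bytes):
    # Assumed policy: reserve the whole region, capped at 64 pages.
    return min(region_bytes // PAGE, 64)

def handle_fault(asid, vaddr, region_bytes):
    vpage = vaddr // PAGE
    reserve = soft_reserved.setdefault(asid, [])
    if not reserve:
        need = reservation_size(region_bytes)
        reserve.extend(free_physical[:need])     # soft-reserve contiguous frames
        del free_physical[:need]
    frame = reserve.pop(0)                       # allocate one physical page
    page_table[(asid, vpage)] = frame
    return frame

print(handle_fault(asid=1, vaddr=0x10000, region_bytes=16 * PAGE))  # frame 0
print(handle_fault(asid=1, vaddr=0x11000, region_bytes=16 * PAGE))  # frame 1, contiguous
```
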
  • Publication number: 20210089448
    Abstract: Described is a low-overhead method and apparatus to reconfigure a pair of buffered interconnect links to operate in one of three modes: a first mode (e.g., bandwidth mode), a second mode (e.g., latency mode), and a third mode (e.g., energy mode). In bandwidth mode, each link in the pair of buffered interconnect links carries a unique signal from source to destination. In latency mode, both links in the pair carry the same signal from source to destination, where one link in the pair is the "primary" and the other is the "assist". Temporal alignment of transitions in this pair of buffered interconnects reduces the effective capacitance of the primary, thereby reducing delay or latency. In energy mode, one link in the pair, the primary, alone carries a signal, while the other link in the pair is idle. An idle neighbor on one side reduces the energy consumption of the primary.
    Type: Application
    Filed: September 19, 2019
    Publication date: March 25, 2021
    Applicant: Intel Corporation
    Inventors: Huichu Liu, Tanay Karnik, Tejpal Singh, Yen-Cheng Liu, Lavanya Subramanian, Mahesh Kumashikar, Sri Harsha Chodav, Sreenivas Subramoney, Kaushik Vaidyanathan, Daniel H. Morris, Uygar E. Avci, Ian A. Young
  • Publication number: 20210089456
    Abstract: Systems, methods, and apparatuses relating to a dual spatial pattern prefetcher are described.
    Type: Application
    Filed: December 28, 2019
    Publication date: March 25, 2021
    Inventors: Rahul BERA, Anant Vithal NORI, Sreenivas SUBRAMONEY
  • Patent number: 10956327
    Abstract: Disclosed embodiments relate to systems and methods structured to mitigate cache conflicts through hardware assisted redirection of pages. In one example, a processor includes a translation cache to store a physical to slice mapping in response to a cache conflict mitigation request corresponding to a page; and a cache controller to determine whether the translation cache comprises the physical to slice mapping; determine whether one of a plurality of slices in a translation table comprises the physical to slice mapping if the translation cache does not comprise the physical to slice mapping, the translation table communicably coupled to a non-volatile memory; and if the translation table does not comprise the physical to slice mapping, redirect the cache conflict mitigation request to the non-volatile memory; and allocate a new physical to slice mapping for the page to one of the plurality of slices in the translation table.
    Type: Grant
    Filed: June 29, 2019
    Date of Patent: March 23, 2021
    Assignee: Intel Corporation
    Inventors: Adithya Nallan Chakravarthi, Anant Vithal Nori, Jayesh Gaur, Sreenivas Subramoney
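
A simplified walk-through of the lookup order in patent 10956327 above: check the translation cache for a physical-to-slice mapping, then the translation table backed by non-volatile memory, and allocate a new mapping if neither has one. The slice-selection policy and table shapes are assumptions.

```python
# Hypothetical model of hardware-assisted page redirection to mitigate cache
# conflicts: a physical page's cache-slice mapping is looked up in a small
# translation cache, then in an NVM-backed translation table, and a new
# mapping is allocated if neither holds one.

NUM_SLICES = 8
translation_cache = {}     # page -> slice (small, fast)
translation_table = {}     # page -> slice (backed by non-volatile memory)

def slice_for_page(page):
    if page in translation_cache:              # hit in the translation cache
        return translation_cache[page]
    if page in translation_table:              # hit in the NVM-backed table
        translation_cache[page] = translation_table[page]
        return translation_table[page]
    # Miss everywhere: allocate a new physical-to-slice mapping. The policy
    # below (least-loaded slice) is an assumption for illustration.
    load = [0] * NUM_SLICES
    for s in translation_table.values():
        load[s] += 1
    new_slice = load.index(min(load))
    translation_table[page] = new_slice
    translation_cache[page] = new_slice
    return new_slice

print(slice_for_page(0x1234))   # allocates a mapping on first use
print(slice_for_page(0x1234))   # served from the translation cache afterwards
```
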
  • Patent number: 10949208
    Abstract: In one embodiment, an apparatus includes a context-based prediction circuit to receive an instruction address for a branch instruction and a plurality of predictions associated with the branch instruction from a global prediction circuit. The context-based prediction circuit may include: a table having a plurality of entries each to store a context prediction value for a corresponding branch instruction; and a control circuit to generate, for the branch instruction, an index value to index into the table, the control circuit to generate the index value based at least in part on at least some of the plurality of predictions associated with the branch instruction and the instruction address for the branch instruction. Other embodiments are described and claimed.
    Type: Grant
    Filed: December 17, 2018
    Date of Patent: March 16, 2021
    Assignee: Intel Corporation
    Inventors: Saurabh Gupta, Niranjan Soundararajan, Ragavendra Natarajan, Jared Warner Stark, IV, Lihu Rappoport, Sreenivas Subramoney
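
A small sketch of the indexing idea from patent 10949208 above: fold the branch's instruction address together with the predictions produced by the global predictor to form an index into a table of context prediction values. The hash, table size, and counter update are illustrative assumptions.

```python
# Hypothetical model of a context-based predictor: the index into the table of
# context prediction values is derived from the branch's instruction address
# combined with the predictions already produced by a global predictor.

TABLE_BITS = 10
context_table = [2] * (1 << TABLE_BITS)   # 2-bit counters, initialized weakly taken

def table_index(branch_ip, global_predictions):
    ctx = 0
    for p in global_predictions:           # fold the global predictions in
        ctx = (ctx << 1) | (1 if p else 0)
    return (branch_ip ^ ctx) & ((1 << TABLE_BITS) - 1)

def predict(branch_ip, global_predictions):
    return context_table[table_index(branch_ip, global_predictions)] >= 2

def update(branch_ip, global_predictions, taken):
    i = table_index(branch_ip, global_predictions)
    context_table[i] = min(3, context_table[i] + 1) if taken else max(0, context_table[i] - 1)

ip, ctx = 0x400ABC, [True, False, True]
print(predict(ip, ctx))         # initial weakly-taken prediction
update(ip, ctx, taken=False)
update(ip, ctx, taken=False)
print(predict(ip, ctx))         # flips to not-taken after repeated updates
```
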
  • Publication number: 20210056030
    Abstract: A method is described. The method includes receiving a read or write request for a cache line. The method includes directing the request to a set of logical super lines based on the cache line's system memory address. The method includes associating the request with a cache line of the set of logical super lines. The method includes, if the request is a write request: compressing the cache line to form a compressed cache line, breaking the cache line down into smaller data units and storing the smaller data units into a memory side cache. The method includes, if the request is a read request: reading smaller data units of the compressed cache line from the memory side cache and decompressing the cache line.
    Type: Application
    Filed: November 6, 2020
    Publication date: February 25, 2021
    Inventors: Israel DIAMAND, Alaa R. ALAMELDEEN, Sreenivas SUBRAMONEY, Supratik MAJUMDER, Srinivas Santosh Kumar MADUGULA, Jayesh GAUR, Zvika GREENFIELD, Anant V. NORI
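
The sketch below mirrors the read/write flow in publication 20210056030 above: a request is routed to a set of logical super lines by its address, a write compresses the cache line and stores its pieces in the memory-side cache, and a read gathers the pieces and decompresses. zlib stands in for the unspecified compression scheme, and the set mapping and unit size are assumptions.

```python
# Hypothetical model of a compressed memory-side cache built on logical super
# lines: writes compress a 64-byte line and store it as smaller data units;
# reads gather the units and decompress. zlib is only a stand-in compressor.

import zlib

LINE = 64          # cache line size in bytes
UNIT = 16          # smaller data unit size in bytes (assumed)
NUM_SETS = 4       # number of logical super-line sets (assumed)

msc = {}           # (set, line address, unit index) -> bytes

def super_line_set(addr):
    return (addr // LINE) % NUM_SETS

def write_line(addr, data):
    assert len(data) == LINE
    s = super_line_set(addr)
    packed = zlib.compress(data)                        # compress the cache line
    units = [packed[i:i + UNIT] for i in range(0, len(packed), UNIT)]
    for idx, unit in enumerate(units):
        msc[(s, addr // LINE, idx)] = unit              # store smaller data units

def read_line(addr):
    s, line = super_line_set(addr), addr // LINE
    units, idx = [], 0
    while (s, line, idx) in msc:                        # read the smaller units back
        units.append(msc[(s, line, idx)])
        idx += 1
    return zlib.decompress(b"".join(units))             # decompress the cache line

original = bytes(range(32)) * 2
write_line(0x1000, original)
assert read_line(0x1000) == original
```
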
  • Patent number: 10915421
    Abstract: A processor comprises a microarchitectural feature and dynamic tuning unit (DTU) circuitry. The processor executes a program for first and second execution windows with the microarchitectural feature disabled and enabled, respectively. The DTU circuitry automatically determines whether the processor achieved worse performance in the second execution window. In response to determining that the processor achieved worse performance in the second execution window, the DTU circuitry updates a usefulness state for a selected address of the program to denote worse performance. In response to multiple consecutive determinations that the processor achieved worse performance with the microarchitectural feature enabled, the DTU circuitry automatically updates the usefulness state to denote a confirmed bad state.
    Type: Grant
    Filed: September 19, 2019
    Date of Patent: February 9, 2021
    Assignee: Intel Corporation
    Inventors: Adarsh Chauhan, Jayesh Gaur, Franck Sala, Lihu Rappoport, Zeev Sperber, Adi Yoaz, Sreenivas Subramoney
  • Publication number: 20210019149
    Abstract: Systems, methods, and apparatuses relating to hardware for auto-predication of critical branches. In one embodiment, a processor core includes a decoder to decode instructions into decoded instructions, an execution unit to execute the decoded instructions, a branch predictor circuit to predict a future outcome of a branch instruction, and a branch predication manager circuit to disable use of the predicted future outcome for a conditional critical branch comprising the branch instruction.
    Type: Application
    Filed: December 28, 2019
    Publication date: January 21, 2021
    Inventors: ADARSH CHAUHAN, Franck SALA, Jayesh GAUR, Zeev SPERBER, Lihu RAPPOPORT, Adi YOAZ, Sreenivas SUBRAMONEY
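
A rough sketch of the decision captured in publication 20210019149 above: for a branch identified as a conditional critical branch, the predicted outcome is not used; instead both paths are computed and the result is selected once the condition resolves, avoiding a misprediction flush. The criticality test and threshold are assumptions.

```python
# Hypothetical sketch of auto-predication of critical branches: if a conditional
# branch is deemed "critical", its predicted outcome is ignored; both paths are
# evaluated and the result is selected when the condition resolves.

MISPREDICT_THRESHOLD = 0.15   # assumed criticality cutoff

def is_critical(stats):
    return stats["on_critical_path"] and stats["mispredict_rate"] > MISPREDICT_THRESHOLD

def resolve_branch(stats, predicted_taken, actual_taken, taken_val, not_taken_val):
    value = taken_val if actual_taken else not_taken_val
    if is_critical(stats):
        # Predication: both sides are available, select with the resolved
        # condition; no pipeline flush is ever needed for this branch.
        return value, False
    # Normal speculation: a wrong prediction costs a flush.
    return value, predicted_taken != actual_taken

stats = {"on_critical_path": True, "mispredict_rate": 0.3}
value, flushed = resolve_branch(stats, predicted_taken=True, actual_taken=False,
                                taken_val=1, not_taken_val=2)
print(value, flushed)   # 2 False: no flush because the branch was predicated
```
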
  • Publication number: 20200411079
    Abstract: Described are mechanisms and methods for amortizing the cost of address decode, row-decode and wordline firing across multiple read accesses (instead of just on one read access). Some or all memory locations that share a wordline (WL) may be read, by walking through column multiplexor addresses (instead of just reading out one column multiplexor address per WL fire or memory access). The mechanisms and methods disclosed herein may advantageously enable N distinct memory words to be read out if the array uses an N-to-1 column multiplexor. Since memories such as embedded DRAMs (eDRAMs) may undergo a destructive read, for a given WL fire, a design may be disposed to sense N distinct memory words and restore them in order.
    Type: Application
    Filed: June 29, 2019
    Publication date: December 31, 2020
    Applicant: Intel Corporation
    Inventors: Kaushik Vaidyanathan, Huichu Liu, Tanay Karnik, Sreenivas Subramoney, Jayesh Gaur, Sudhanshu Shukla
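
A behavioral sketch of the amortization described in publication 20200411079 above: one wordline fire senses every word sharing that wordline, the column multiplexor addresses are walked to read each word out, and the row is restored afterwards because the eDRAM read is destructive. The array dimensions are arbitrary.

```python
# Hypothetical behavioral model of reading N words per wordline (WL) fire in an
# eDRAM with an N-to-1 column multiplexor: sense the whole row once, walk the
# column-mux addresses to read every word, then restore the destructively-read row.

N = 4                                    # N-to-1 column multiplexor (assumed)
array = {wl: [f"word{wl}_{c}" for c in range(N)] for wl in range(8)}

def read_row_amortized(wl):
    sensed = array[wl][:]                # one WL fire senses the whole row...
    array[wl] = [None] * N               # ...and the read is destructive
    words = []
    for col in range(N):                 # walk the column-mux addresses
        words.append(sensed[col])
    array[wl] = sensed                   # restore the row after reading it out
    return words

print(read_row_amortized(3))   # 4 distinct words from a single WL fire
```
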
  • Publication number: 20200409849
    Abstract: Disclosed embodiments relate to systems and methods structured to mitigate cache conflicts through hardware assisted redirection of pages. In one example, a processor includes a translation cache to store a physical to slice mapping in response to a cache conflict mitigation request corresponding to a page; and a cache controller to determine whether the translation cache comprises the physical to slice mapping; determine whether one of a plurality of slices in a translation table comprises the physical to slice mapping if the translation cache does not comprise the physical to slice mapping, the translation table communicably coupled to a non-volatile memory; and if the translation table does not comprise the physical to slice mapping, redirect the cache conflict mitigation request to the non-volatile memory; and allocate a new physical to slice mapping for the page to one of the plurality of slices in the translation table.
    Type: Application
    Filed: June 29, 2019
    Publication date: December 31, 2020
    Inventors: Adithya NALLAN CHAKRAVARTHI, Anant Vithal NORI, Jayesh GAUR, Sreenivas SUBRAMONEY
  • Patent number: 10866902
    Abstract: Processor, apparatus, and method for reordering a stream of memory access requests to establish locality are described herein. One embodiment of a method includes: storing in a request queue memory access requests generated by a plurality of execution units, the memory access requests comprising a first request to access a first memory page in a memory and a second request to access a second memory page in the memory; maintaining a list of unique memory pages, each unique memory page associated with one or more memory access requests stored in the request queue and is to be accessed by the one or more memory access requests; selecting a current memory page from the list of unique memory pages; and dispatching from the request queue to the memory, all memory access requests associated with the current memory page before any other memory access request in the request queue is dispatched.
    Type: Grant
    Filed: December 28, 2016
    Date of Patent: December 15, 2020
    Assignee: Intel Corporation
    Inventors: Ishwar S. Bhati, Udit Dhawan, Jayesh Gaur, Sreenivas Subramoney
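
A compact sketch of the dispatch policy in patent 10866902 above: requests sit in a queue, a list of the unique pages they touch is maintained, and all requests for the current page are dispatched before any request for another page. The queue and page bookkeeping are simplified for illustration.

```python
# Hypothetical model of reordering memory requests to establish page locality:
# keep a list of unique pages present in the request queue and dispatch every
# request for the current page before any request for another page.

PAGE = 4096

def dispatch_by_page(request_queue):
    """request_queue: list of (requester_id, address) in arrival order."""
    pending = list(request_queue)
    unique_pages = []                         # list of unique memory pages
    for _, addr in pending:
        page = addr // PAGE
        if page not in unique_pages:
            unique_pages.append(page)
    dispatched = []
    for page in unique_pages:                 # select a current page and drain it
        for req in [r for r in pending if r[1] // PAGE == page]:
            dispatched.append(req)
            pending.remove(req)
    return dispatched

reqs = [(0, 0x1000), (1, 0x9000), (2, 0x1040), (3, 0x9100), (4, 0x1080)]
print(dispatch_by_page(reqs))
# all requests to page 0x1 dispatched first, then all requests to page 0x9
```
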
  • Patent number: 10846084
    Abstract: Implementations of the disclosure implement timely and context-triggered (TACT) prefetching that targets particular load IPs in a program contributing to a threshold amount of the long latency accesses. A processing device comprising an execution unit; and a prefetcher circuit communicably coupled to the execution unit is provided. The prefetcher circuit is to detect a memory request for a target instruction pointer (IP) in a program to be executed by the execution unit. A trigger IP is identified to initiate a prefetch operation of memory data for the target IP. Thereupon, an association is determined between memory addresses of the trigger IP and the target IP. The association comprises a series of offsets representing a path between the trigger IP and an instance of the target IP in memory. Based on the association, an offset from the memory address of the trigger IP to prefetch the memory data is produced.
    Type: Grant
    Filed: January 3, 2018
    Date of Patent: November 24, 2020
    Assignee: Intel Corporation
    Inventors: Anant Vithal Nori, Sreenivas Subramoney, Shankar Balachandran, Hong Wang
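
A loose sketch of the trigger/target association in patent 10846084 above: for a long-latency target load IP, learn the address offset from a chosen trigger IP, and when the trigger executes again, issue a prefetch at the learned offset. Training is reduced here to a single running offset, whereas the abstract describes a series of offsets.

```python
# Hypothetical model of timely and context-triggered (TACT) prefetching: learn
# the address offset between a "trigger" load IP and a long-latency "target"
# load IP, then prefetch the target's data as soon as the trigger executes.

learned_offset = {}                       # (trigger_ip, target_ip) -> address offset
prefetch_queue = []

def train(trigger_ip, trigger_addr, target_ip, target_addr):
    # Assumed single-offset training; the patent describes a series of offsets.
    learned_offset[(trigger_ip, target_ip)] = target_addr - trigger_addr

def on_trigger_load(trigger_ip, addr, target_ip):
    key = (trigger_ip, target_ip)
    if key in learned_offset:             # trigger executed: prefetch for the target
        prefetch_queue.append(addr + learned_offset[key])

train(trigger_ip=0x400100, trigger_addr=0x7F0000,
      target_ip=0x400180, target_addr=0x7F0240)
on_trigger_load(trigger_ip=0x400100, addr=0x7F1000, target_ip=0x400180)
print(hex(prefetch_queue[0]))             # 0x7f1240, fetched ahead of the target load
```
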
  • Patent number: 10846093
    Abstract: In one embodiment, an apparatus includes: a value prediction storage including a plurality of entries each to store address information of an instruction, a value prediction for the instruction and a confidence value for the value prediction; and a control circuit coupled to the value prediction storage. In response to an instruction address of a first instruction, the control circuit is to access a first entry of the value prediction storage to obtain a first value prediction associated with the first instruction and control execution of a second instruction based at least in part on the first value prediction. Other embodiments are described and claimed.
    Type: Grant
    Filed: December 21, 2018
    Date of Patent: November 24, 2020
    Assignee: Intel Corporation
    Inventors: Sumeet Bandishte, Jayesh Gaur, Sreenivas Subramoney, Hong Wang
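
A small sketch of the value-prediction structure in patent 10846093 above: a table keyed by instruction address holds a predicted value and a confidence counter, and a dependent instruction may execute early with the predicted value only when confidence is high. The threshold and counter width are assumptions.

```python
# Hypothetical model of a value predictor: each entry keeps a predicted value
# and a confidence counter for an instruction address; dependent instructions
# use the prediction only when confidence passes a threshold.

CONFIDENCE_THRESHOLD = 3      # assumed
value_table = {}              # instruction address -> [predicted value, confidence]

def predict(ip):
    entry = value_table.get(ip)
    if entry and entry[1] >= CONFIDENCE_THRESHOLD:
        return entry[0]       # speculate: a dependent instruction can start early
    return None               # not confident enough to speculate

def update(ip, actual_value):
    entry = value_table.setdefault(ip, [actual_value, 0])
    if entry[0] == actual_value:
        entry[1] = min(entry[1] + 1, 7)            # prediction confirmed
    else:
        entry[0], entry[1] = actual_value, 0       # mispredicted: reset confidence

for _ in range(4):
    update(0x400ABC, 42)
print(predict(0x400ABC))      # 42 once the prediction is confident
```
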
  • Publication number: 20200327396
    Abstract: Exemplary embodiments maintain spatial locality of the data being processed by a sparse CNN. The spatial locality is maintained by reordering the data to preserve spatial locality. The reordering may be performed on data elements and on data for groups of co-located data elements referred to herein as “chunks”. Thus, the data may be reordered into chunks, where each chunk contains data for spatially co-located data elements, and in addition, chunks may be organized so that spatially located chunks are together. The use of chunks helps to reduce the need to re-fetch data during processing. Chunk sizes may be chosen based on the memory constraints of the processing logic (e.g., cache sizes).
    Type: Application
    Filed: June 26, 2020
    Publication date: October 15, 2020
    Applicant: Intel Corporation
    Inventors: Anirud Thyagharajan, Prashant Laddha, Om Omer, Sreenivas Subramoney
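
To make the chunking idea of publication 20200327396 above concrete, here is a minimal sketch: points are grouped into spatial chunks by quantizing their coordinates, and the chunks are visited in a spatially coherent order, so data for co-located points is processed together. Sizing chunks from cache capacity uses an assumed heuristic, not the patented policy.

```python
# Hypothetical sketch of locality-preserving reordering for sparse point data:
# group points into spatial chunks by quantized coordinates, then emit chunks
# in a spatially coherent order so co-located data is processed together.

from collections import defaultdict

def pick_chunk_size(cache_bytes, bytes_per_point, points_per_cell=32):
    # Assumed heuristic: size chunks so one chunk's worth of data fits in cache.
    cells = max(1, cache_bytes // (bytes_per_point * points_per_cell))
    return max(1, round(cells ** (1.0 / 3.0)))     # cells per chunk edge (3D)

def reorder_into_chunks(points, chunk_edge):
    chunks = defaultdict(list)
    for p in points:
        key = tuple(c // chunk_edge for c in p)    # chunk containing this point
        chunks[key].append(p)
    ordered = []
    for key in sorted(chunks):                     # visit chunks in coherent order
        ordered.extend(sorted(chunks[key]))        # co-located points stay together
    return ordered

edge = pick_chunk_size(cache_bytes=32 * 1024, bytes_per_point=16)   # -> 4
pts = [(9, 1, 0), (0, 0, 0), (8, 0, 0), (1, 1, 0)]
print(reorder_into_chunks(pts, edge))
# [(0, 0, 0), (1, 1, 0), (8, 0, 0), (9, 1, 0)]  grouped by 4x4x4 chunk
```
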