Patents by Inventor Sreenivas Subramoney

Sreenivas Subramoney has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230169319
    Abstract: Systems, apparatuses and methods may provide for technology that decodes data via an instruction that indicates a number of rulebooks to be processed, an input feature size, an output feature size, and a plurality of feature map base addresses, rearranges spatially distributed voxel output feature maps in the decoded data based on weight planes, and performs a channel-wise multiply-accumulate (MAC) operation on the rearranged spatially distributed voxel output feature maps to obtain an output, wherein the channel-wise MAC operation is performed as partial accumulations by a plurality of processing elements.
    Type: Application
    Filed: January 25, 2023
    Publication date: June 1, 2023
    Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Sreenivas Subramoney, Prashant Laddha, Om Ji Omer
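The channel-wise MAC with partial accumulation described in 20230169319 can be pictured with a minimal Python sketch. The PE count, channel slicing, and array shapes below are illustrative assumptions, not the patented dataflow:

```python
# Hypothetical model of a channel-wise MAC performed as partial
# accumulations by several processing elements (PEs).
import numpy as np

NUM_PES = 4          # processing elements performing partial accumulations
IN_CHANNELS = 16     # input feature size (per the instruction operands)
OUT_CHANNELS = 8     # output feature size
NUM_VOXELS = 10      # active voxels listed in one rulebook entry

rng = np.random.default_rng(0)
features = rng.standard_normal((NUM_VOXELS, IN_CHANNELS))   # voxel input features
weights = rng.standard_normal((IN_CHANNELS, OUT_CHANNELS))  # one weight plane

def channelwise_mac(features, weights, num_pes):
    """Each PE accumulates a disjoint slice of input channels;
    the partial sums are then combined into the final output map."""
    chunks = np.array_split(np.arange(features.shape[1]), num_pes)
    partial = np.zeros((num_pes, features.shape[0], weights.shape[1]))
    for pe, ch in enumerate(chunks):
        partial[pe] = features[:, ch] @ weights[ch, :]  # per-PE partial MAC
    return partial.sum(axis=0)  # reduce the partial accumulations

out = channelwise_mac(features, weights, NUM_PES)
assert np.allclose(out, features @ weights)  # matches a monolithic MAC
```

Splitting the accumulation channel-wise lets each PE work on an independent slice, with a single reduction combining the partials at the end.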
  • Patent number: 11656971
    Abstract: A processor comprises a microarchitectural feature and dynamic tuning unit (DTU) circuitry. The processor executes a program for first and second execution windows with the microarchitectural feature disabled and enabled, respectively. The DTU circuitry automatically determines whether the processor achieved worse performance in the second execution window. In response to determining that the processor achieved worse performance in the second execution window, the DTU circuitry updates a usefulness state for a selected address of the program to denote worse performance. In response to multiple consecutive determinations that the processor achieved worse performance with the microarchitectural feature enabled, the DTU circuitry automatically updates the usefulness state to denote a confirmed bad state.
    Type: Grant
    Filed: January 24, 2022
    Date of Patent: May 23, 2023
    Assignee: Intel Corporation
    Inventors: Adarsh Chauhan, Jayesh Gaur, Franck Sala, Lihu Rappoport, Zeev Sperber, Adi Yoaz, Sreenivas Subramoney
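A toy model of the dynamic tuning unit's usefulness tracking described in 11656971; the comparison metric (IPC), the streak counter, and a confirmation threshold of three consecutive worse windows are invented for illustration:

```python
# Toy model of the dynamic tuning unit (DTU) usefulness state.
# Threshold and state encoding are illustrative assumptions.
CONFIRM_THRESHOLD = 3  # consecutive "worse" windows before confirming bad

class UsefulnessEntry:
    def __init__(self):
        self.worse_streak = 0
        self.confirmed_bad = False

def update(entry, perf_disabled, perf_enabled):
    """Compare paired execution windows (feature off, then on)."""
    if entry.confirmed_bad:
        return  # the feature stays disabled for this address
    if perf_enabled < perf_disabled:       # worse with the feature enabled
        entry.worse_streak += 1
        if entry.worse_streak >= CONFIRM_THRESHOLD:
            entry.confirmed_bad = True     # denote a confirmed bad state
    else:
        entry.worse_streak = 0             # a good window resets the streak

entry = UsefulnessEntry()
for disabled_ipc, enabled_ipc in [(2.0, 1.8), (2.0, 1.7), (2.1, 1.9)]:
    update(entry, disabled_ipc, enabled_ipc)
print(entry.confirmed_bad)  # True after three consecutive worse windows
```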
  • Patent number: 11645078
    Abstract: Systems, methods, and apparatuses relating to hardware for auto-predication of critical branches. In one embodiment, a processor core includes a decoder to decode instructions into decoded instructions, an execution unit to execute the decoded instructions, a branch predictor circuit to predict a future outcome of a branch instruction, and a branch predication manager circuit to disable use of the predicted future outcome for a conditional critical branch comprising the branch instruction.
    Type: Grant
    Filed: December 28, 2019
    Date of Patent: May 9, 2023
    Assignee: Intel Corporation
    Inventors: Adarsh Chauhan, Franck Sala, Jayesh Gaur, Zeev Sperber, Lihu Rappoport, Adi Yoaz, Sreenivas Subramoney
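The predication manager's decision in 11645078 amounts to suppressing the predictor's output for branches classified as critical. A minimal sketch, assuming a simple criticality flag and a stub direction predictor (both hypothetical):

```python
# Illustrative model of the branch predication manager: for a critical
# conditional branch, the predicted outcome is not used and both paths
# proceed under predication. The criticality criteria are assumed.
from dataclasses import dataclass

@dataclass
class Branch:
    pc: int
    is_conditional: bool
    is_critical: bool  # e.g. frequently mispredicted and on the critical path

class Predictor:
    def predict(self, branch):
        return branch.pc % 2 == 0  # stand-in for a real direction predictor

def fetch_policy(branch, predictor):
    """Critical conditional branches are auto-predicated: the predicted
    future outcome is disabled and both paths are fetched."""
    if branch.is_conditional and branch.is_critical:
        return "predicate-both-paths"
    taken = predictor.predict(branch)
    return f"follow-prediction:{'taken' if taken else 'not-taken'}"

print(fetch_policy(Branch(0x400, True, True), Predictor()))   # predicated
print(fetch_policy(Branch(0x404, True, False), Predictor()))  # predicted
```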
  • Patent number: 11620818
    Abstract: Systems, apparatuses and methods may provide for technology that decodes data via an instruction that indicates a number of rulebooks to be processed, an input feature size, an output feature size, and a plurality of feature map base addresses, rearranges spatially distributed voxel output feature maps in the decoded data based on weight planes, and performs a channel-wise multiply-accumulate (MAC) operation on the rearranged spatially distributed voxel output feature maps to obtain an output, wherein the channel-wise MAC operation is performed as partial accumulations by a plurality of processing elements.
    Type: Grant
    Filed: December 22, 2020
    Date of Patent: April 4, 2023
    Assignee: Intel Corporation
    Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Sreenivas Subramoney, Prashant Laddha, Om Ji Omer
  • Publication number: 20230100693
    Abstract: In an embodiment, a processor may include an execution circuit to execute a plurality of instructions. The processor may also include a prediction circuit to: in response to a detection of a first target instruction in a program, identify a prediction data entry associated with a path history for the first target instruction, the identified prediction data entry to indicate an offset distance from the first target instruction to a predicted next taken branch of the program; and determine the predicted next taken branch of the program based on the offset distance indicated by the identified prediction data entry. Other embodiments are described and claimed.
    Type: Application
    Filed: September 24, 2021
    Publication date: March 30, 2023
    Inventors: Saurabh Gupta, Ragavendra Natarajan, Niranjan K. Soundararajan, Jared W. Stark, IV, Sreenivas Subramoney
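A rough Python model of the offset-distance predictor in 20230100693; the XOR index hash, table size, and byte-granular offsets are assumptions for illustration:

```python
# Sketch of the next-taken-branch predictor: a table indexed by the
# target PC and recent path history yields the offset distance to the
# predicted next taken branch. Hash and sizing are illustrative.
TABLE_SIZE = 1024
table = {}  # index -> offset to the predicted next taken branch

def index(pc, path_history):
    return (pc ^ path_history) % TABLE_SIZE

def train(pc, path_history, observed_offset):
    table[index(pc, path_history)] = observed_offset

def predict_next_taken(pc, path_history):
    """Return the predicted next taken branch, or None on a table miss."""
    off = table.get(index(pc, path_history))
    return None if off is None else pc + off

train(0x1000, 0xBEEF, 0x40)
print(hex(predict_next_taken(0x1000, 0xBEEF)))  # 0x1040
```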
  • Publication number: 20230103206
    Abstract: In an embodiment, a processor may include an execution circuit to execute a plurality of instructions, a cache, and a decode circuit. The decode circuit may be to: detect a branch instruction in a program, the branch instruction to cause execution to follow either a first path or a second path in the program; and in response to a determination that the branch instruction is a hard to predict (HTP) branch, cause first and second sets of instructions to be stored in the cache, where the first set of instructions is included in the first path, and where the second set of instructions is included in the second path. Other embodiments are described and claimed.
    Type: Application
    Filed: September 24, 2021
    Publication date: March 30, 2023
    Inventors: Niranjan Soundararajan, Sreenivas Subramoney, Neelu Shivprakash Kalani, Vishal Gupta
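A toy model of the dual-path caching idea in 20230103206, assuming a misprediction-rate threshold as the hard-to-predict test (the abstract does not specify one):

```python
# Toy model of dual-path caching for hard-to-predict (HTP) branches:
# on detecting an HTP branch at decode, instructions from both the
# taken and not-taken paths are installed in the cache. The HTP test
# and cache structure are illustrative assumptions.
cache = {}  # (path, branch address) -> decoded instructions

def is_hard_to_predict(mispredict_rate):
    return mispredict_rate > 0.25  # assumed threshold

def handle_branch(branch_pc, taken_path, not_taken_path, mispredict_rate):
    if is_hard_to_predict(mispredict_rate):
        # Store instructions from BOTH paths so a misprediction
        # recovery hits in the cache either way.
        cache[("taken", branch_pc)] = list(taken_path)
        cache[("not_taken", branch_pc)] = list(not_taken_path)

handle_branch(0x2000, ["add", "ld"], ["sub", "st"], mispredict_rate=0.4)
print(sorted(cache))  # both paths resident
```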
  • Patent number: 11575504
    Abstract: A processor comprises a first register to store an encoded pointer to a memory location. First context information is stored in first bits of the encoded pointer and a slice of a linear address of the memory location is stored in second bits of the encoded pointer. The processor also includes circuitry to execute a memory access instruction to obtain a physical address of the memory location, access encrypted data at the memory location, derive a first tweak based at least in part on the encoded pointer, and generate a keystream based on the first tweak and a key. The circuitry is to further execute the memory access instruction to store state information associated with the memory access instruction in a first buffer, and to decrypt the encrypted data based on the keystream. The keystream is to be generated at least partly in parallel with accessing the encrypted data.
    Type: Grant
    Filed: January 29, 2020
    Date of Patent: February 7, 2023
    Assignee: Intel Corporation
    Inventors: David M. Durham, Michael LeMay, Michael E. Kounavis, Santosh Ghosh, Sergej Deutsch, Anant Vithal Nori, Jayesh Gaur, Sreenivas Subramoney, Karanvir S. Grewal
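The decrypt flow in 11575504 can be sketched at a very high level: derive a tweak from the encoded pointer, expand (tweak, key) into a keystream, and XOR it with the data. The sketch below uses BLAKE2 purely as a stand-in keystream generator; it is not the cipher, pointer encoding, or key schedule of the patent:

```python
# Highly simplified model: tweak from pointer bits, keystream from
# (tweak, key), decryption by XOR. Keystream generation can overlap the
# memory access since it needs only the pointer, not the data.
import hashlib

def derive_tweak(encoded_pointer: int) -> bytes:
    return encoded_pointer.to_bytes(8, "little")  # tweak from pointer bits

def keystream(tweak: bytes, key: bytes, n: bytes | int) -> bytes:
    return hashlib.blake2b(tweak, key=key, digest_size=n).digest()

def decrypt(encoded_pointer, ciphertext, key):
    ks = keystream(derive_tweak(encoded_pointer), key, len(ciphertext))
    return bytes(c ^ k for c, k in zip(ciphertext, ks))

key = b"0123456789abcdef"
ptr = 0x00007FFF_DEAD_B000
plaintext = b"secret data"
ct = decrypt(ptr, plaintext, key)       # XOR keystream is its own inverse
assert decrypt(ptr, ct, key) == plaintext
```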
  • Publication number: 20220382514
    Abstract: Systems, apparatuses, and methods include technology that determines whether an operation is a floating-point based computation or an integer-based computation. When the operation is the floating-point based computation, the technology generates a map of the operation to integer-based compute engines to control the integer-based compute engines to execute the floating-point based computation. When the operation is the integer-based computation, the technology controls the integer-based compute engines to execute the integer-based computation.
    Type: Application
    Filed: June 6, 2022
    Publication date: December 1, 2022
    Inventors: Kamlesh Pillai, Gurpreet Singh Kalsi, Sreedevi Ambika, Om Omer, Sreenivas Subramoney
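One way to picture mapping a floating-point computation onto integer-based compute engines, as in 20220382514: split each operand into an integer-scaled mantissa and an exponent, and let the integer engine do the multiply. The 48-bit fixed-point precision here is an arbitrary choice for the sketch:

```python
# Illustrative mapping of a floating-point multiply onto integer-only
# compute: mantissas become scaled integers, exponents are handled
# separately, and an integer engine performs the multiply.
import math

FRAC_BITS = 48

def int_engine_mul(a: int, b: int) -> int:
    return a * b  # stands in for the integer-based compute engine

def fp_mul_on_int_engine(x: float, y: float) -> float:
    mx, ex = math.frexp(x)           # x = mx * 2**ex, mx in [0.5, 1)
    my, ey = math.frexp(y)
    ix = int(mx * (1 << FRAC_BITS))  # fixed-point mantissas
    iy = int(my * (1 << FRAC_BITS))
    prod = int_engine_mul(ix, iy)    # the integer multiply does the work
    return math.ldexp(prod, ex + ey - 2 * FRAC_BITS)

print(fp_mul_on_int_engine(3.5, -2.25))  # -7.875
```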
  • Publication number: 20220283954
    Abstract: Embodiments described herein are generally directed to maintaining contiguity of virtual to physical address mappings to exploit a contiguity-aware TLB. In an example, information regarding a migration set of one or more pages within a physical address space that have been identified for migration from a source tier of memory to a target tier of memory is received, in which the physical address space comprises a first contiguous region of physical memory addresses and a virtual memory area (VMA) includes a second contiguous region of virtual memory addresses corresponding to the first contiguous region. It is determined whether the migration would break contiguity of a mapping maintained by a contiguity-aware TLB between pages of the first contiguous region and pages of the second contiguous region. Responsive to an affirmative determination, discontinuities within the mapping resulting from the migration are minimized by intelligently increasing or decreasing the migration set.
    Type: Application
    Filed: May 26, 2022
    Publication date: September 8, 2022
    Applicant: Intel Corporation
    Inventors: Aravinda Prasad, Sreenivas Subramoney
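A simplified model of the contiguity check in 20220283954: migrating only part of a contiguous region leaves a hole in a region-granular TLB mapping, so the sketch grows the migration set to cover the whole region (shrinking it to empty would avoid the hole equally well). Region granularity and the grow-only policy are assumptions:

```python
# Toy contiguity check: a partial migration of a contiguous region
# breaks the single region-granular mapping a contiguity-aware TLB
# could otherwise keep.
def breaks_contiguity(region, migration_set):
    chosen = [p for p in migration_set if p in region]
    return 0 < len(chosen) < len(region)  # partial migration leaves a hole

def adjust_migration_set(region, migration_set):
    if not breaks_contiguity(region, migration_set):
        return set(migration_set)
    # Grow the set: migrate the whole contiguous region together so the
    # TLB can retain one region-granular mapping after migration.
    return set(migration_set) | set(region)

region = range(100, 108)  # one contiguous run of page numbers
print(sorted(adjust_migration_set(region, {103, 104})))  # grown to 100..107
```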
  • Publication number: 20220206925
    Abstract: A processor comprises a microarchitectural feature and dynamic tuning unit (DTU) circuitry. The processor executes a program for first and second execution windows with the microarchitectural feature disabled and enabled, respectively. The DTU circuitry automatically determines whether the processor achieved worse performance in the second execution window. In response to determining that the processor achieved worse performance in the second execution window, the DTU circuitry updates a usefulness state for a selected address of the program to denote worse performance. In response to multiple consecutive determinations that the processor achieved worse performance with the microarchitectural feature enabled, the DTU circuitry automatically updates the usefulness state to denote a confirmed bad state.
    Type: Application
    Filed: January 24, 2022
    Publication date: June 30, 2022
    Inventors: Adarsh Chauhan, Jayesh Gaur, Franck Sala, Lihu Rappoport, Zeev Sperber, Adi Yoaz, Sreenivas Subramoney
  • Publication number: 20220206816
    Abstract: Apparatus and method for memoizing repeat function calls are described herein. An apparatus embodiment includes: uop buffer circuitry to identify a function for memoization based on retiring micro-operations (uops) from a processing pipeline; memoization retirement circuitry to generate a signature of the function which includes input and output data of the function; a memoization data structure to store the signature; and predictor circuitry to detect an instance of the function to be executed by the processing pipeline and to responsively exclude a first subset of uops associated with the instance from execution when a confidence level associated with the function is above a threshold. One or more instructions that are data-dependent on execution of the instance are then provided with the output data of the function from the memoization data structure.
    Type: Application
    Filed: December 24, 2020
    Publication date: June 30, 2022
    Applicant: Intel Corporation
    Inventors: Niranjan Kumar Soundararajan, Sreenivas Subramoney, Jayesh Gaur, S R Swamy Saranam Chongala
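The memoization flow in 20220206816, reduced to a dictionary keyed by a function's input signature plus a confidence counter; the threshold and training policy are illustrative, not the patent's:

```python
# Sketch of hardware memoization: repeated (inputs -> outputs) signatures
# build confidence; past the threshold the function's uops are skipped
# and the recorded outputs are supplied to dependent instructions.
CONFIDENCE_THRESHOLD = 2
memo = {}        # (func_pc, inputs) -> outputs
confidence = {}  # func_pc -> repeat count

def execute(func_pc, fn, *inputs):
    key = (func_pc, inputs)
    if key in memo and confidence.get(func_pc, 0) >= CONFIDENCE_THRESHOLD:
        return memo[key]       # uops excluded; reuse the recorded outputs
    outputs = fn(*inputs)      # fall back to normal execution
    if memo.get(key) == outputs:
        confidence[func_pc] = confidence.get(func_pc, 0) + 1
    memo[key] = outputs
    return outputs

calls = 0
def heavy(x):
    global calls
    calls += 1
    return x * x

for _ in range(5):
    execute(0x5000, heavy, 12)
print(calls)  # 3 -> executed on the first three calls, then memoized
```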
  • Publication number: 20220197813
    Abstract: Methods and apparatus relating to techniques for increasing per core memory bandwidth by using forget store operations are described. In an embodiment, a cache stores a buffer. Execution circuitry executes an instruction. The instruction causes one or more cachelines in the cache to be marked based on a start address for the buffer and a size of the buffer. A marked cacheline in the cache is to be prevented from being written back to memory. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: December 23, 2020
    Publication date: June 23, 2022
    Applicant: Intel Corporation
    Inventors: Jayesh Gaur, Adarsh Chauhan, Vinodh Gopal, Vedvyas Shanbhogue, Sreenivas Subramoney, Wajdi Feghali
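A toy cache model of the forget-store idea in 20220197813: lines covering a marked buffer are dropped on eviction rather than written back. Line size and the bookkeeping structures are assumptions:

```python
# Toy model of "forget store": cachelines covering a scratch buffer are
# marked so that, even if dirty, they are dropped on eviction instead of
# written back, saving per-core memory bandwidth.
LINE = 64
marked = set()      # cacheline addresses exempt from writeback
dirty = set()
memory_writes = []

def forget_store(start, size):
    """Model of the instruction: mark every line of the buffer."""
    for addr in range(start - start % LINE, start + size, LINE):
        marked.add(addr)

def evict(line_addr):
    if line_addr in dirty and line_addr not in marked:
        memory_writes.append(line_addr)  # normal dirty writeback
    dirty.discard(line_addr)             # marked lines are simply dropped

dirty.update({0x1000, 0x1040})
forget_store(0x1000, 128)                # the buffer spans both lines
evict(0x1000)
evict(0x1040)
print(memory_writes)                     # [] -> no bandwidth spent
```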
  • Publication number: 20220197643
    Abstract: Methods and apparatus relating to speculative decompression within processor core caches are described. In an embodiment, decode circuitry decodes a decompression instruction into a first micro operation and a second micro operation. The first micro operation causes one or more load operations to fetch data into a plurality of cachelines of a cache of a processor core. Decompression Engine (DE) circuitry decompresses the fetched data from the plurality of cachelines of the cache of the processor core in response to the second micro operation. The decompression instruction causes the DE circuitry to perform an out-of-order decompression of the plurality of cachelines. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: December 23, 2020
    Publication date: June 23, 2022
    Applicant: Intel Corporation
    Inventors: Jayesh Gaur, Adarsh Chauhan, Vinodh Gopal, Vedvyas Shanbhogue, Sreenivas Subramoney, Wajdi Feghali
  • Publication number: 20220197650
    Abstract: An embodiment of an integrated circuit may comprise a core, a front end unit coupled to the core to decode one or more instructions, wherein the front end unit includes a first decode path, a second decode path, and circuitry to: predict a taken branch of a conditional branch instruction of the one or more instructions, decode a predicted path of the taken branch on the first decode path, determine if the conditional branch instruction corresponds to a hard-to-predict conditional branch instruction and if the second decode path is available and, if so determined, decode an alternate path of a not-taken branch of the hard-to-predict conditional branch instruction on the second decode path. Other embodiments are disclosed and claimed.
    Type: Application
    Filed: December 22, 2020
    Publication date: June 23, 2022
    Applicant: Intel Corporation
    Inventors: Niranjan Soundararajan, Sreenivas Subramoney
  • Publication number: 20220197794
    Abstract: An embodiment of an integrated circuit may comprise a core, a first level core cache memory coupled to the core, a shared core cache memory coupled to the core, a first cache controller coupled to the core and communicatively coupled to the first level core cache memory, a second cache controller coupled to the core and communicatively coupled to the shared core cache memory, and circuitry coupled to the core and communicatively coupled to the first cache controller and the second cache controller to determine if a workload has a large code footprint, and, if so determined, partition N ways of the shared core cache memory into first and second chunks of ways with the first chunk of M ways reserved for code cache lines from the workload and the second chunk of N minus M ways reserved for data cache lines from the workload, where N and M are positive integer values and N minus M is greater than zero. Other embodiments are disclosed and claimed.
    Type: Application
    Filed: December 22, 2020
    Publication date: June 23, 2022
    Applicant: Intel Corporation
    Inventors: Prathmesh Kallurkar, Anant Vithal Nori, Sreenivas Subramoney
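The way-partitioning policy in 20220197794, sketched with an assumed footprint threshold and an assumed M; the abstract leaves both to the implementation:

```python
# Sketch of way partitioning: with a large code footprint, the N ways of
# the shared cache split into M ways reserved for code lines and N - M
# ways reserved for data lines. Threshold and M are illustrative.
N_WAYS = 16

def partition(code_footprint_kb, large_threshold_kb=512, m_code_ways=6):
    if code_footprint_kb > large_threshold_kb:      # large code footprint
        return {"code": list(range(m_code_ways)),               # M ways
                "data": list(range(m_code_ways, N_WAYS))}       # N - M ways
    return {"code": list(range(N_WAYS)), "data": list(range(N_WAYS))}

def allocate_way(parts, is_code_line, victim_picker=min):
    ways = parts["code"] if is_code_line else parts["data"]
    return victim_picker(ways)  # stand-in for the real replacement policy

parts = partition(code_footprint_kb=900)
print(allocate_way(parts, is_code_line=True))   # a way from the code chunk
print(allocate_way(parts, is_code_line=False))  # a way from the data chunk
```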
  • Publication number: 20220197799
    Abstract: Methods and apparatus relating to an instruction and/or micro-architecture support for decompression on core are described. In an embodiment, decode circuitry decodes a decompression instruction into a first micro operation and a second micro operation. The first micro operation causes one or more load operations to fetch data into one or more cachelines of a cache of a processor core. Decompression Engine (DE) circuitry decompresses the fetched data from the one or more cachelines of the cache of the processor core in response to the second micro operation. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: December 23, 2020
    Publication date: June 23, 2022
    Applicant: Intel Corporation
    Inventors: Jayesh Gaur, Adarsh Chauhan, Vinodh Gopal, Vedvyas Shanbhogue, Sreenivas Subramoney, Wajdi Feghali
  • Publication number: 20220197659
    Abstract: Methods and apparatus relating to an Application Programming Interface (API) for fine grained low latency decompression within a processor core are described. In an embodiment, a decompression Application Programming Interface (API) receives an input handle to a data object. The data object includes compressed data and metadata. Decompression Engine (DE) circuitry decompresses the compressed data to generate uncompressed data. The DE circuitry decompresses the compressed data in response to invocation of a decompression instruction by the decompression API. The metadata comprises a first operand to indicate a location of the compressed data, a second operand to indicate a size of the compressed data, a third operand to indicate a location to which decompressed data by the DE circuitry is to be stored, and a fourth operand to indicate a size of the decompressed data. Other embodiments are also disclosed and claimed.
    Type: Application
    Filed: December 23, 2020
    Publication date: June 23, 2022
    Applicant: Intel Corporation
    Inventors: Jayesh Gaur, Adarsh Chauhan, Vinodh Gopal, Vedvyas Shanbhogue, Sreenivas Subramoney, Wajdi Feghali
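The shape of the decompression API in 20220197659, approximated in Python with zlib standing in for the on-core Decompression Engine; the handle type and field names are hypothetical, not the patent's actual interface:

```python
# Sketch of the API: the caller passes a handle to an object whose
# metadata carries the four operands (source location/size, destination
# location/size), and the API drives the decompression on the DE.
import zlib
from dataclasses import dataclass

@dataclass
class CompressedObject:
    src: bytes         # operands 1+2: location and size of compressed data
    dst_capacity: int  # operands 3+4: destination location and size

def de_decompress(handle: CompressedObject) -> bytes:
    out = zlib.decompress(handle.src)  # stand-in for the DE circuitry
    if len(out) > handle.dst_capacity:
        raise ValueError("decompressed data exceeds destination size")
    return out

payload = zlib.compress(b"fine grained low latency decompression" * 4)
obj = CompressedObject(src=payload, dst_capacity=4096)
print(len(de_decompress(obj)))
```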
  • Patent number: 11347828
    Abstract: A disclosed apparatus to multiply matrices includes a compute engine. The compute engine includes multipliers in a two dimensional array that has a plurality of array locations defined by columns and rows. The apparatus also includes a plurality of adders in columns. A broadcast interconnect between a cache and the multipliers broadcasts a first set of operand data elements to multipliers in the rows of the array. A unicast interconnect unicasts a second set of operands between a data buffer and the multipliers. The multipliers multiply the operands to generate a plurality of outputs, and the adders add the outputs generated by the multipliers.
    Type: Grant
    Filed: March 27, 2020
    Date of Patent: May 31, 2022
    Assignee: Intel Corporation
    Inventors: Biji George, Om Ji Omer, Dipan Kumar Mandal, Cormac Brick, Lance Hacking, Sreenivas Subramoney, Belliappa Kuttanna
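The broadcast/unicast dataflow in 11347828, modeled for a single row-vector times matrix step: the broadcast interconnect feeds one operand across a row of multipliers, the unicast interconnect delivers per-multiplier operands, and the column adders reduce the products. Dimensions are illustrative:

```python
# Minimal model of the 2D multiplier array with broadcast rows, unicast
# operands, and column adders.
import numpy as np

def array_matmul(a_row, b_matrix):
    """a_row (broadcast) times b_matrix (unicast), reduced per column."""
    rows, cols = b_matrix.shape
    products = np.empty((rows, cols))
    for r in range(rows):
        for c in range(cols):
            # a_row[r] arrives via the broadcast interconnect (the same
            # value across row r); b_matrix[r, c] arrives via unicast.
            products[r, c] = a_row[r] * b_matrix[r, c]
    return products.sum(axis=0)  # the column adders

a = np.array([1.0, 2.0, 3.0])
B = np.arange(12, dtype=float).reshape(3, 4)
assert np.allclose(array_matmul(a, B), a @ B)
```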
  • Patent number: 11321089
    Abstract: Methods and apparatuses relating to instruction set architecture (ISA) based and automatic load tracking hardware for opportunistic re-steer of data-dependent flaky branches are described.
    Type: Grant
    Filed: June 27, 2020
    Date of Patent: May 3, 2022
    Assignee: Intel Corporation
    Inventors: Saurabh Gupta, Niranjan Soundararajan, Ragavendra Natarajan, Sreenivas Subramoney
  • Publication number: 20220129763
    Abstract: An embodiment of an integrated circuit may comprise a front end unit, and circuitry coupled to the front end unit, the circuitry to provide a high-confidence, multiple-branch offset predictor. For example, the circuitry may be configured to identify an entry in a multiple-taken-branch prediction table that corresponds to a conditional branch instruction, determine if a confidence level of the entry exceeds a threshold confidence level, and, if so determined, provide multiple taken branch predictions that stem from the conditional branch instruction from the entry in the multiple-taken-branch prediction table. Other embodiments are disclosed and claimed.
    Type: Application
    Filed: December 22, 2020
    Publication date: April 28, 2022
    Applicant: Intel Corporation
    Inventors: Sumeet Bandishte, Jayesh Gaur, Polychronis Xekalakis, Ariel Sabba, Deborah Marr, Sreenivas Subramoney
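A sketch of the multiple-taken-branch predictor in 20220129763: one table entry stores several taken-branch offsets stemming from a conditional branch, gated by a confidence counter. The entry format and threshold are assumptions:

```python
# Toy multiple-taken-branch prediction table: each entry holds a list of
# offsets to successive taken branches plus a confidence counter that
# gates their use.
from dataclasses import dataclass, field

CONF_THRESHOLD = 4

@dataclass
class MTBEntry:
    offsets: list = field(default_factory=list)  # next taken-branch offsets
    confidence: int = 0

table = {}  # branch PC -> MTBEntry

def train(pc, observed_offsets):
    e = table.setdefault(pc, MTBEntry())
    if e.offsets == observed_offsets:
        e.confidence += 1  # the same multi-branch path seen again
    else:
        e.offsets, e.confidence = list(observed_offsets), 0

def predict(pc):
    e = table.get(pc)
    if e and e.confidence >= CONF_THRESHOLD:
        return [pc + off for off in e.offsets]  # multiple predictions at once
    return None  # fall back to single-branch prediction

for _ in range(5):
    train(0x7000, [0x20, 0x64, 0x9C])
print([hex(t) for t in predict(0x7000)])
```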