Patents by Inventor Karthikeyan Avudaiyappan

Karthikeyan Avudaiyappan has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11853717
    Abstract: Embodiments of the present disclosure include systems and methods for accelerating processing based on sparsity for neural network hardware processors. An input manager determines a pair of non-zero values from a pair of data streams in a plurality of pairs of data streams and retrieves the pair of non-zero values from the pair of data streams. A multiplier performs a multiplication operation on the pair of non-zero values and generates a product of the pair of non-zero values. An accumulator manager receives the product of the pair of non-zero values from the multiplier and sends the product of the pair of non-zero values to a corresponding accumulator in a plurality of accumulators.
    Type: Grant
    Filed: January 14, 2021
    Date of Patent: December 26, 2023
    Assignee: Microsoft Technology Licensing, LLC
    Inventors: Karthikeyan Avudaiyappan, Jeffrey Andrews
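
The granted claim above describes hardware, but its control flow is easy to model in software. Below is a minimal Python sketch of the sparsity-skipping idea, assuming one accumulator per stream pair; all names are illustrative, not taken from the patent.

    # Skip multiplies whenever either operand in a stream pair is zero;
    # only non-zero products are routed to an accumulator.
    def sparse_mac(stream_pairs, num_accumulators):
        """stream_pairs: list of (a_stream, b_stream) equal-length lists."""
        accumulators = [0.0] * num_accumulators
        for pair_idx, (a_stream, b_stream) in enumerate(stream_pairs):
            for a, b in zip(a_stream, b_stream):
                if a == 0 or b == 0:          # sparsity: no work for zeros
                    continue
                accumulators[pair_idx % num_accumulators] += a * b
        return accumulators

    print(sparse_mac([([0, 2, 3], [4, 0, 5])], 1))  # -> [15.0]
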
  • Publication number: 20230343374
    Abstract: Embodiments of the present disclosure include techniques for storing and retrieving data. In one embodiment, sub-matrices of data are stored as row slices and column slices. A fetch circuit determines if particular slices of one sub-matrix, when combined with corresponding slices of another sub-matrix, produce a zero result and need not be retrieved. In another embodiment, the present disclosure includes a memory circuit comprising memory banks and sub-banks. The sub-banks store slices of sub-matrices. A request moves between serially configured memory banks, and slices in different sub-banks may be retrieved at the same time.
    Type: Application
    Filed: April 26, 2022
    Publication date: October 26, 2023
    Inventors: Karthikeyan Avudaiyappan, Jeffrey A. Andrews
  • Publication number: 20230342291
    Abstract: Embodiments of the present disclosure include techniques for storing and retrieving data. In one embodiment, sub-matrices of data are stored as row slices and column slices. A fetch circuit determines if particular slices of one sub-matrix, when combined with corresponding slices of another sub-matrix, produce a zero result and need not be retrieved. In another embodiment, the present disclosure includes a memory circuit comprising memory banks and sub-banks. The sub-banks store slices of sub-matrices. A request moves between serially configured memory banks, and slices in different sub-banks may be retrieved at the same time.
    Type: Application
    Filed: April 26, 2022
    Publication date: October 26, 2023
    Inventors: Karthikeyan Avudaiyappan, Jeffrey A. Andrews
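
The two publications above share one abstract, so a single sketch suffices. Here is a rough Python model of the zero-slice fetch skip, under the assumption that a row slice of one sub-matrix pairs with a column slice of the other; function and variable names are invented.

    # If a row slice of A or the matching column slice of B is all zero,
    # their product is zero, so neither slice needs to be fetched.
    def dot_with_slice_skipping(a_rows, b_cols):
        fetched = 0
        result = [[0] * len(b_cols) for _ in a_rows]
        for i, row in enumerate(a_rows):
            for j, col in enumerate(b_cols):
                if not any(row) or not any(col):   # zero slice detected
                    continue                        # skip the fetch entirely
                fetched += 2                        # one fetch per slice
                result[i][j] = sum(x * y for x, y in zip(row, col))
        return result, fetched

    print(dot_with_slice_skipping([[1, 2], [0, 0]], [[3, 4], [0, 0]]))
    # -> ([[11, 0], [0, 0]], 2): only two slice fetches were needed
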
  • Publication number: 20230244448
    Abstract: Embodiments of the present disclosure include a multiply-accumulator (MAC) array circuit comprising an activation cache and a plurality of multiply-accumulator (MA) groups. The activation cache comprises cache lines configured to store sub-slices of an input activation array. The cache lines are coupled to particular MA groups. Activations stored in the cache lines may be used and reused across multiple MA groups.
    Type: Application
    Filed: February 1, 2022
    Publication date: August 3, 2023
    Inventors: Karthikeyan Avudaiyappan, Jeffrey A. Andrews
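
A small Python sketch of the reuse pattern described above, assuming each cache line is filled once on first touch and then read by several MA groups; the class and its interface are illustrative only.

    # Activation slices are loaded into a cache line once and reused by
    # every MAC group that needs them, instead of re-fetching per group.
    class ActivationCache:
        def __init__(self):
            self.lines = {}
            self.loads = 0

        def get(self, slice_id, load_fn):
            if slice_id not in self.lines:
                self.lines[slice_id] = load_fn(slice_id)  # fill on first use
                self.loads += 1
            return self.lines[slice_id]

    activations = {0: [1, 2]}
    cache = ActivationCache()
    weights = [[1, 1], [2, 2], [3, 3]]      # three MA groups share slice 0
    outputs = [sum(w * a for w, a in zip(wg, cache.get(0, activations.get)))
               for wg in weights]
    print(outputs, cache.loads)             # [3, 6, 9] 1 -> one load, reused
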
  • Publication number: 20230214185
    Abstract: Embodiments of the present disclosure include a multipurpose multiply-accumulator (MAC) array circuit comprising one or more input memories for receiving operands and a plurality of multiply-accumulator circuits each selectively coupled to the one or more input memories to receive at least a pair of operands and generate a result. Each of the plurality of multiply-accumulator circuits receives operands from the one or more input memories independently. Additionally, selection of operands from the one or more input memories is controlled based on at least an operation and/or data types, where different operation and/or data types configure the plurality of multiply-accumulator circuits to receive different pairs of operands from the one or more input memories to execute particular operation types.
    Type: Application
    Filed: December 28, 2021
    Publication date: July 6, 2023
    Inventors: Karthikeyan Avudaiyappan, Jeffrey A. Andrews
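
The operand-routing idea lends itself to a short sketch. The two routing rules below (one for a matrix-style operation, one elementwise) are invented for illustration; the publication itself does not specify them.

    # Operand selection is driven by the operation type: different types
    # steer different operand pairs from the input memories to each unit.
    def select_operands(op_type, mem_a, mem_b, unit_idx):
        if op_type == "matmul":        # all units share mem_a[0], stride mem_b
            return mem_a[0], mem_b[unit_idx]
        if op_type == "elementwise":   # unit i pairs mem_a[i] with mem_b[i]
            return mem_a[unit_idx], mem_b[unit_idx]
        raise ValueError(op_type)

    mem_a, mem_b = [2, 3, 4], [10, 20, 30]
    for op in ("matmul", "elementwise"):
        pairs = (select_operands(op, mem_a, mem_b, i) for i in range(3))
        print(op, [a * b for a, b in pairs])
    # matmul [20, 40, 60]; elementwise [20, 60, 120]
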
  • Publication number: 20220222043
    Abstract: Embodiments of the present disclosure include systems and methods for accelerating processing based on sparsity for neural network hardware processors. An input manager determines a pair of non-zero values from a pair of data streams in a plurality of pairs of data streams and retrieves the pair of non-zero values from the pair of data streams. A multiplier performs a multiplication operation on the pair of non-zero values and generates a product of the pair of non-zero values. An accumulator manager receives the product of the pair of non-zero values from the multiplier and sends the product of the pair of non-zero values to a corresponding accumulator in a plurality of accumulators.
    Type: Application
    Filed: January 14, 2021
    Publication date: July 14, 2022
    Inventors: Karthikeyan Avudaiyappan, Jeffrey Andrews
  • Publication number: 20220215234
    Abstract: Embodiments of the present disclosure include systems and methods for determining schedules for processing neural networks on hardware. A set of instructions for processing data through a neural network is received. Based on a hardware definition specifying a set of hardware units and the functions that each hardware unit in the set is configured to perform, a schedule of a set of operations to be performed by a subset of the hardware units to implement the set of instructions is determined. The schedule of the set of operations is distributed to the subset of the hardware units.
    Type: Application
    Filed: January 7, 2021
    Publication date: July 7, 2022
    Inventors: Karthikeyan Avudaiyappan, Jeffrey Andrews
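
A toy Python version of the scheduling step described above: a hardware definition maps each unit to the functions it can perform, and each instruction is assigned to a capable unit. The greedy least-loaded policy is an assumption made for this sketch, not something the publication specifies.

    def schedule(instructions, hardware_def):
        busy = {unit: 0 for unit in hardware_def}     # per-unit queue depth
        plan = []
        for op in instructions:
            capable = [u for u, fns in hardware_def.items() if op in fns]
            unit = min(capable, key=busy.get)         # least-loaded capable unit
            busy[unit] += 1
            plan.append((op, unit))
        return plan

    hw = {"mac0": {"matmul", "conv"}, "mac1": {"matmul"}, "vec0": {"relu"}}
    print(schedule(["conv", "matmul", "relu", "matmul"], hw))
    # [('conv', 'mac0'), ('matmul', 'mac1'), ('relu', 'vec0'), ('matmul', 'mac0')]
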
  • Patent number: 11314647
    Abstract: Methods and systems for managing synonyms in VIPT caches are disclosed. A method includes tracking lines of a copied cache using a directory, examining a specified bit of a virtual address that is associated with a load request and determining its status, and making an entry in one of a plurality of parts of the directory based on the status of the examined bit. The method further includes updating one of, and invalidating the other of, a cache line that is associated with the virtual address that is stored at a first index of the copied cache, and a cache line that is associated with a synonym of the virtual address that is stored at a second index of the copied cache, upon receiving a request to update a physical address associated with the virtual address.
    Type: Grant
    Filed: December 23, 2019
    Date of Patent: April 26, 2022
    Assignee: Intel Corporation
    Inventor: Karthikeyan Avudaiyappan
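
A rough Python model of the synonym rule may help. It assumes bit 12 is the aliasing-prone virtual-address bit and collapses many details (tags, sets, ways) into dictionaries; the structure of the update-one/invalidate-the-other step, not the hardware, is the point.

    SYNONYM_BIT = 12                    # assumed position of the aliasing bit

    directory = {0: {}, 1: {}}          # part -> {phys_addr: cache index}
    cache = {}                          # cache index -> (phys_addr, data)

    def on_load(vaddr, paddr, data):
        part = (vaddr >> SYNONYM_BIT) & 1
        index = vaddr & ((1 << (SYNONYM_BIT + 1)) - 1)   # VA-derived index
        directory[part][paddr] = index
        cache[index] = (paddr, data)

    def on_store(paddr, data):
        # update the copy tracked in one part, invalidate its synonym
        if paddr in directory[0]:
            cache[directory[0][paddr]] = (paddr, data)
        stale = directory[1].pop(paddr, None)
        if stale is not None:
            cache.pop(stale, None)

    on_load(0x0234, 0xAAAA, "v1")       # synonym with bit 12 clear
    on_load(0x1234, 0xAAAA, "v1")       # synonym with bit 12 set
    on_store(0xAAAA, "v2")
    print(cache)                        # only the updated copy remains
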
  • Patent number: 10884739
    Abstract: Systems and methods for load canceling in a processor that is connected to an external interconnect fabric are disclosed. As a part of a method for load canceling in a processor that is connected to an external bus, and responsive to a flush request and a corresponding cancellation of pending speculative loads from a load queue, the type of one or more of the pending speculative loads that are positioned in the instruction pipeline external to the processor is converted from load to prefetch. Data corresponding to one or more of the pending speculative loads that are positioned in the instruction pipeline external to the processor is accessed and returned to cache as prefetch data. The prefetch data is retired in a cache location of the processor.
    Type: Grant
    Filed: May 24, 2018
    Date of Patent: January 5, 2021
    Assignee: Intel Corporation
    Inventors: Karthikeyan Avudaiyappan, Mohammad Abdallah
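
The conversion step is simple to express in Python-flavored pseudocode; the request representation below is invented for the sketch.

    # On a flush, speculative loads already out on the external fabric are
    # retagged as prefetches: the returning data still warms the cache, but
    # no architectural state is written.
    def flush_pending_loads(pending):
        for req in pending:
            if req["type"] == "load":
                req["type"] = "prefetch"
        return pending

    def on_data_return(req, cache):
        cache[req["addr"]] = "filled"    # retire into cache only
        return cache

    pending = [{"addr": 0x100, "type": "load"}]
    print(on_data_return(flush_pending_loads(pending)[0], {}))
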
  • Patent number: 10698833
    Abstract: A method for supporting a plurality of requests for access to a data cache memory (“cache”) is disclosed. The method comprises accessing a first set of requests to access the cache, wherein the cache comprises a plurality of blocks. Further, responsive to the first set of requests to access the cache, the method comprises accessing a tag memory that maintains a plurality of copies of tags for each entry in the cache and identifying tags that correspond to individual requests of the first set. The method also comprises performing arbitration in a same clock cycle as the accessing and identifying of tags, wherein the arbitration comprises: (a) identifying a second set of requests to access the cache from the first set, wherein the second set accesses a same block within the cache; and (b) selecting each request from the second set to receive data from the same block.
    Type: Grant
    Filed: January 26, 2018
    Date of Patent: June 30, 2020
    Assignee: Intel Corporation
    Inventors: Karthikeyan Avudaiyappan, Sourabh Alurkar
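
The arbitration rule reduces to grouping: requests whose tags resolve to the same block are granted together, so one block read serves them all. A minimal sketch, with the parallel tag lookups abstracted to a dictionary:

    from collections import defaultdict

    def arbitrate(requests, tag_to_block):
        groups = defaultdict(list)
        for req in requests:
            groups[tag_to_block[req]].append(req)   # per-request tag copies
        return groups

    tags = {"rA": "blk0", "rB": "blk0", "rC": "blk1"}
    print(dict(arbitrate(["rA", "rB", "rC"], tags)))
    # {'blk0': ['rA', 'rB'], 'blk1': ['rC']} -> rA and rB share one read
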
  • Patent number: 10585804
    Abstract: Systems and methods for non-blocking implementation of cache flush instructions are disclosed. As a part of a method, data received in a write-back data holding buffer from a cache flushing operation is accessed, the data is flagged with a processor identifier and a serialization flag, and, responsive to the flagging, the cache is notified that the cache flush is completed. Subsequent to the notifying, access is provided to the data then present in the write-back data holding buffer to determine whether it is flagged.
    Type: Grant
    Filed: November 7, 2017
    Date of Patent: March 10, 2020
    Assignee: Intel Corporation
    Inventors: Karthikeyan Avudaiyappan, Mohammad Abdallah
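
A compact sketch of the non-blocking behavior, with an invented buffer class: the flush reports completion as soon as the data is parked and flagged, and a later serializing operation need only scan for still-flagged entries.

    class WriteBackBuffer:
        def __init__(self):
            self.entries = []

        def accept_flush(self, data, cpu_id):
            self.entries.append({"data": data, "cpu": cpu_id, "serial": True})
            return "flush-complete"          # cache is notified immediately

        def pending_for(self, cpu_id):
            return [e for e in self.entries
                    if e["cpu"] == cpu_id and e["serial"]]

    buf = WriteBackBuffer()
    print(buf.accept_flush("dirty-line-0", cpu_id=0))
    print(len(buf.pending_for(0)))           # 1 flagged entry still draining
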
  • Patent number: 10565113
    Abstract: Methods and systems for managing synonyms in VIPT caches are disclosed. A method includes tracking lines of a copied cache using a directory, examining a specified bit of a virtual address that is associated with a load request and determining its status, and making an entry in one of a plurality of parts of the directory based on the status of the examined bit. The method further includes updating one of, and invalidating the other of, a cache line that is associated with the virtual address that is stored at a first index of the copied cache, and a cache line that is associated with a synonym of the virtual address that is stored at a second index of the copied cache, upon receiving a request to update a physical address associated with the virtual address.
    Type: Grant
    Filed: August 13, 2015
    Date of Patent: February 18, 2020
    Assignee: Intel Corporation
    Inventor: Karthikeyan Avudaiyappan
  • Patent number: 10552334
    Abstract: A method and system acquire cache line data associated with a load from respective hierarchical cache data storage components. As a part of the method and system, a store queue is accessed for one or more portions of a cache line associated with the load, and, if the one or more portions of the cache line are held in the store queue, they are stored in a load queue location associated with the load. The load is completed if the one or more portions of the cache line stored in the load queue location include all portions of the cache line associated with the load.
    Type: Grant
    Filed: March 24, 2017
    Date of Patent: February 4, 2020
    Assignee: Intel Corporation
    Inventors: Karthikeyan Avudaiyappan, Paul G. Chan
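
In Python terms, with byte offsets standing in for cache-line portions (a representation chosen for the sketch, not taken from the patent):

    # Forward whatever bytes of the line the store queue holds into the
    # load-queue entry; the load completes only when the line is whole.
    def try_complete_load(bytes_needed, store_queue, load_entry):
        for offset, value in store_queue.items():
            if offset in bytes_needed:
                load_entry[offset] = value
        return set(load_entry) == bytes_needed

    sq = {0: 0xAA, 1: 0xBB}
    print(try_complete_load({0, 1}, sq, {}))      # True: full line forwarded
    print(try_complete_load({0, 1, 2}, sq, {}))   # False: byte 2 missing
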
  • Patent number: 10402322
    Abstract: Methods for read-after-write forwarding using a virtual address are disclosed. A method includes determining when a virtual address has been remapped from corresponding to a first physical address to a second physical address and determining if all stores occupying a store queue before the remapping have been retired from the store queue. Loads that are younger than the stores that occupied the store queue before the remapping are prevented from being dispatched and executed until the stores that occupied the store queue before the remapping have left the store queue and become globally visible.
    Type: Grant
    Filed: August 25, 2017
    Date of Patent: September 3, 2019
    Assignee: Intel Corporation
    Inventors: Karthikeyan Avudaiyappan, Paul Chan
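
The safety rule can be stated in a few lines. Ages are used as a stand-in for program order, an assumption made for this sketch:

    # A load younger than the remap may not dispatch while any store issued
    # under the old mapping is still in the store queue.
    def may_dispatch_load(load_age, store_queue_ages, remap_age):
        old_mapping_stores = [s for s in store_queue_ages if s < remap_age]
        return not (load_age > remap_age and old_mapping_stores)

    print(may_dispatch_load(load_age=5, store_queue_ages=[1, 2], remap_age=3))
    # False: pre-remap stores must retire and become globally visible first
    print(may_dispatch_load(load_age=5, store_queue_ages=[4], remap_age=3))
    # True: no stores from the old mapping remain
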
  • Patent number: 10346302
    Abstract: A method for maintaining the coherency of a store coalescing cache and a load cache is disclosed. As a part of the method, responsive to a write-back of an entry from a level one store coalescing cache to a level two cache, the entry is written into the level two cache and into the level one load cache. The writing of the entry into the level two cache and into the level one load cache is executed at the speed of access of the level two cache.
    Type: Grant
    Filed: July 19, 2017
    Date of Patent: July 9, 2019
    Assignee: Intel Corporation
    Inventors: Karthikeyan Avudaiyappan, Mohammad Abdallah
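
The invariant fits in one function: the write-back lands in both places as a single operation, so the load cache can never hold a stale copy of a coalesced line. A sketch with dictionaries standing in for the caches:

    def write_back(addr, data, l2, l1_load):
        l2[addr] = data          # write-back target
        l1_load[addr] = data     # kept coherent in the same operation
        return l2, l1_load

    l2, l1_load = write_back(0x40, "coalesced-line", {}, {})
    print(l2 == l1_load)         # True: both caches hold the identical line
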
  • Patent number: 10310987
    Abstract: Systems and methods for accessing a unified translation lookaside buffer (TLB) are disclosed. A method includes receiving an indicator of a level one translation lookaside buffer (L1TLB) miss corresponding to a request for a virtual address to physical address translation, searching a cache that includes virtual addresses and page sizes that correspond to translation table entries (TTEs) that have been evicted from the L1TLB, where a page size is identified, and searching a second level TLB and identifying a physical address that is contained in the second level TLB. Access is provided to the identified physical address.
    Type: Grant
    Filed: August 15, 2017
    Date of Patent: June 4, 2019
    Assignee: Intel Corporation
    Inventors: Karthikeyan Avudaiyappan, Mohammad Abdallah
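
A sketch of the two-step lookup. The tag of a second-level TLB entry depends on the page size, so the evicted-entry cache supplies that size first; the bit widths and the 4 KiB default below are assumptions for illustration.

    def l2_tlb_lookup(vaddr, evicted_cache, l2_tlb):
        page_size = evicted_cache.get(vaddr >> 21, 4096)   # assumed default
        vpn = vaddr // page_size                           # size-dependent tag
        return l2_tlb.get((vpn, page_size))

    evicted = {1: 2 * 1024 * 1024}                 # a 2 MiB page was evicted
    l2 = {(1, 2 * 1024 * 1024): 0xBEEF000}
    print(hex(l2_tlb_lookup(0x3A0000, evicted, l2)))   # 0xbeef000
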
  • Patent number: 10296432
    Abstract: Methods for invasive debug of a processor without processor execution of instructions are disclosed. As a part of a method, a memory mapped I/O of the processor is accessed using a debug bus and an operation is initiated that causes a debug port to gain access to registers of the processor using the memory mapped I/O. The invasive debug of the processor is executed from the debug port via registers of the processor.
    Type: Grant
    Filed: May 3, 2017
    Date of Patent: May 21, 2019
    Assignee: Intel Corporation
    Inventors: Karthikeyan Avudaiyappan, Brian McGee
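
The access path can be mimicked with a dictionary standing in for the memory-mapped window; the base address and register stride below are invented for the sketch.

    MMIO_REG_BASE = 0xF0000000          # hypothetical debug MMIO window

    class DebugPort:
        def __init__(self, mmio):
            self.mmio = mmio            # dict standing in for the MMIO bus

        def read_reg(self, n):          # no instructions run on the core
            return self.mmio[MMIO_REG_BASE + 8 * n]

        def write_reg(self, n, value):
            self.mmio[MMIO_REG_BASE + 8 * n] = value

    port = DebugPort({MMIO_REG_BASE + 8: 0x2A})
    print(port.read_reg(1))             # 42, read without core execution
    port.write_reg(1, 0x2B)             # invasive write, same path
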
  • Patent number: 10255187
    Abstract: A method for weak stream software data and instruction prefetching using a hardware data prefetcher is disclosed. The method includes determining, using a hardware data prefetcher, whether software includes software prefetch instructions, and accessing the software prefetch instructions if it does. Using the hardware data prefetcher, weak stream software data and instruction prefetching operations are executed based on the software prefetch instructions, free of training operations.
    Type: Grant
    Filed: May 3, 2016
    Date of Patent: April 9, 2019
    Assignee: Intel Corporation
    Inventors: Karthikeyan Avudaiyappan, Mohammad Abdallah
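
The contrast with a trained prefetcher is the point: the software hints drive issue directly. A sketch, with the stride and lookahead depth chosen arbitrarily:

    # Each software hint seeds a weak stream immediately: no training on
    # observed misses, just a fixed lookahead from the hinted address.
    def issue_prefetches(hints, stride, depth):
        return [addr + i * stride for addr in hints for i in range(1, depth + 1)]

    print([hex(a) for a in issue_prefetches([0x1000], stride=64, depth=3)])
    # ['0x1040', '0x1080', '0x10c0']
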
  • Patent number: 10248570
    Abstract: A method for predicting a way of a set associative shadow cache is disclosed. As a part of a method, a request to fetch a first far taken branch instruction of a first cache line from an instruction cache is received, and responsive to a hit in the instruction cache, a predicted way is selected from a way array using a way that corresponds to the hit in the instruction cache. A second cache line is selected from a shadow cache using the predicted way and the first cache line and the second cache line are forwarded in the same clock cycle.
    Type: Grant
    Filed: January 4, 2018
    Date of Patent: April 2, 2019
    Assignee: Intel Corporation
    Inventors: Mohammad Abdallah, Ravishankar Rao, Karthikeyan Avudaiyappan
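
A dictionary-level sketch of the prediction path; indexing the way array by the hitting way alone is a simplification assumed for this sketch.

    def fetch_pair(line_idx, icache_ways, way_array, shadow_cache):
        hit_way = icache_ways[line_idx]          # way that hit for line 1
        predicted_way = way_array[hit_way]       # predicted shadow-cache way
        return line_idx, shadow_cache[predicted_way]   # both, same cycle

    icache_ways = {0x10: 2}                      # line 0x10 hits in way 2
    way_array = {2: 1}                           # way 2 predicts shadow way 1
    shadow_cache = {1: 0x11}                     # shadow way 1 holds next line
    print(fetch_pair(0x10, icache_ways, way_array, shadow_cache))  # (16, 17)
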
  • Patent number: 10210101
    Abstract: Systems and methods for flushing a cache with modified data are disclosed. Responsive to a request to flush data from a cache with modified data to a next level cache that does not include the cache with modified data, the cache with modified data is accessed using an index and a way and an address associated with the index and the way is secured. Using the address, the cache with modified data is accessed a second time and an entry that is associated with the address is retrieved from the cache with modified data. The entry is placed into a location of the next level cache.
    Type: Grant
    Filed: November 27, 2017
    Date of Patent: February 19, 2019
    Assignee: Intel Corporation
    Inventors: Karthikeyan Avudaiyappan, Mohammad Abdallah
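
The two-pass structure is the essence: one access by index and way secures the address, and a second access by that address retrieves the entry. A minimal sketch with invented container names:

    def flush_line(tags, data, index, way, next_level):
        addr = tags[(index, way)]       # pass 1: secure the address
        entry = data[addr]              # pass 2: retrieve by address
        next_level[addr] = entry        # place into the next-level cache
        return next_level

    tags = {(3, 1): 0x80}
    data = {0x80: "modified-line"}
    print(flush_line(tags, data, 3, 1, {}))   # {128: 'modified-line'}
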