Abstract: Embodiments of the invention may execute a neural network (NN) by executing sub-tensor columns, each sub-tensor column including computations from portions of layers of the NN, performing its computations entirely within a first layer of cache (e.g., L2 in one embodiment), and saving its output entirely within a second layer of cache (e.g., L3 in one embodiment). Embodiments may include partitioning the execution of a NN into sub-tensor columns, each sub-tensor column including computations from portions of layers of the NN, performing its computations entirely within a first layer of cache, and saving its output entirely within a second layer of cache.
Type: Grant
Filed: December 10, 2021
Date of Patent: January 17, 2023
Assignee: Neuralmagic Ltd.
Inventors: Alexander Matveev, Nir Shavit, Govind Ramnarayan
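
The partitioning described in the abstract can be illustrated with a small sketch. This is not the patented implementation: it assumes a toy network of 1-D "valid" convolutions with stride 1, measures the inner-cache budget as a count of values rather than bytes, and uses hypothetical names throughout (`run_in_columns`, `pick_tile_width`, `budget_values`). Python cannot pin data to L2 or L3, so the sketch only shows the scheduling idea: each column runs a slice of every layer before the next column starts, and writes its result into a shared output buffer.

```python
def conv1d_valid(x, kernel):
    """1-D 'valid' cross-correlation with stride 1 (toy layer, an assumption)."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(len(x) - k + 1)]


def run_layer_by_layer(x, kernels):
    """Baseline: every layer processes the whole tensor before the next begins."""
    for kernel in kernels:
        x = conv1d_valid(x, kernel)
    return x


def input_span(out_start, out_stop, kernels):
    """Walk backwards through the layers to find the input slice [start, stop)
    needed to produce final outputs [out_start, out_stop)."""
    stop = out_stop
    for kernel in reversed(kernels):
        stop += len(kernel) - 1  # each valid conv needs k-1 extra inputs
    return out_start, stop


def column_working_set(tile_width, kernels):
    """Largest input-plus-output size of any single layer inside one column."""
    widths = [tile_width]  # widths[0] is the column's final output width
    for kernel in reversed(kernels):
        widths.append(widths[-1] + len(kernel) - 1)
    return max(a + b for a, b in zip(widths, widths[1:]))


def pick_tile_width(out_len, kernels, budget_values):
    """Widest column whose working set fits the hypothetical inner-cache budget."""
    for width in range(out_len, 0, -1):
        if column_working_set(width, kernels) <= budget_values:
            return width
    raise ValueError("budget too small for even a single-output column")


def run_in_columns(x, kernels, budget_values):
    """Execute the network one sub-tensor column at a time."""
    out_len = len(x) - sum(len(k) - 1 for k in kernels)
    width = pick_tile_width(out_len, kernels, budget_values)
    output = [0.0] * out_len  # stands in for the buffer kept in the outer cache
    for start in range(0, out_len, width):
        stop = min(start + width, out_len)
        lo, hi = input_span(start, stop, kernels)
        column = x[lo:hi]
        for kernel in kernels:  # all layers for this column before moving on
            column = conv1d_valid(column, kernel)
        output[start:stop] = column
    return output, width


if __name__ == "__main__":
    x = [float(i * i % 7) for i in range(20)]
    kernels = [[1.0, 0.0, -1.0], [0.5, 0.5], [1.0, 2.0, 1.0]]
    out, width = run_in_columns(x, kernels, budget_values=16)
    print("column width:", width)
    print("matches layer-by-layer:", out == run_layer_by_layer(x, kernels))
```

Neighbouring columns read overlapping input slices, so some work is recomputed at the column edges. The sketch accepts that cost in exchange for keeping each column's intermediate values within the budget.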