Patents Assigned to Neuralmagic Inc.
-
Patent number: 11960982
Abstract: A system and method may partition and/or execute a NN, by, for a graph including nodes and hyper edges, each node representing a data item in the NN and each hyper edge representing an operation in the NN, identifying a deep tensor column comprising a subset of the nodes and a subset of the hyper edges, such that the operations in the deep tensor column, when executed, use only data which fits within a preselected cache.
Type: Grant
Filed: October 21, 2022
Date of Patent: April 16, 2024
Assignee: NEURALMAGIC, INC.
Inventors: Alexander Matveev, Nir Shavit, Govind Ramnarayan, Tyler Michael Smith, Sage Moore
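The cache-fitting idea in this abstract can be illustrated with a minimal sketch (all names hypothetical, not the patented implementation): walk a chain of NN operations and greedily group consecutive operations into a "tensor column" whose combined working-set size stays within a cache budget.

```python
# Hypothetical sketch: partition a linear chain of NN ops into groups
# ("tensor columns") whose total working-set size fits a cache budget.

def partition_into_tensor_columns(op_sizes, cache_bytes):
    """op_sizes: per-op working-set sizes in bytes.
    Returns a list of columns, each a list of op indices whose total
    size fits within cache_bytes."""
    columns, current, used = [], [], 0
    for i, size in enumerate(op_sizes):
        if current and used + size > cache_bytes:
            columns.append(current)   # close the column at the cache limit
            current, used = [], 0
        current.append(i)
        used += size
    if current:
        columns.append(current)
    return columns
```

The real method operates on a hypergraph of data items and operations rather than a simple chain, but the constraint is the same: every column's data must fit in the preselected cache.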
-
Patent number: 11960934
Abstract: A method and system for computing one or more outputs of a neural network having a plurality of layers is provided. The method and system can include determining a plurality of sub-computations from total computations of the neural network to execute in parallel, wherein the computations to execute in parallel involve computations from multiple layers. The method and system can also include avoiding repeating overlapped computations and/or multiple memory reads and writes during execution.
Type: Grant
Filed: August 8, 2022
Date of Patent: April 16, 2024
Assignee: NEURALMAGIC, INC.
Inventors: Alexander Matveev, Nir Shavit
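The "avoid repeating overlapped computations" idea can be sketched in miniature (hypothetical toy network, not the patented method): compute one output of a 1-D stack of size-2 sliding-sum layers depth-first across all layers, memoizing intermediates so overlapped work is done once.

```python
# Toy sketch: a stack of 1-D layers where layer L's element i is the sum
# of layer L-1's elements i and i+1. Adjacent outputs overlap heavily;
# memoization ensures each intermediate value is computed only once.

def value(layer, i, inp, memo):
    """Value at position i of `layer` (layer 0 is the input)."""
    if layer == 0:
        return inp[i]
    key = (layer, i)
    if key not in memo:   # reuse overlapped sub-computations
        memo[key] = (value(layer - 1, i, inp, memo)
                     + value(layer - 1, i + 1, inp, memo))
    return memo[key]
```

Computing outputs this way, layer-crossing "pyramids" of sub-computations can run in parallel while the memo table prevents redundant work and redundant memory traffic.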
-
Patent number: 11797855
Abstract: A system and method of accelerating execution of a NN model, by at least one processor may include: receiving a first matrix A, representing elements of a kernel K of the NN model and a second matrix B, representing elements of an input I to kernel K; producing from matrix A, a group-sparse matrix A′, comprising G tensors of elements. The number of elements in each tensor is defined by, or equal to a number of entries in each index of an input tensor register used for a specific Single Instruction Multiple Data (SIMD) tensor operation, and all elements of A′ outside said G tensors are null. The system and method may further include executing kernel K on input I, by performing at least one computation of the SIMD tensor operation, having as operands elements of a tensor of the G tensors and corresponding elements of the B matrix.
Type: Grant
Filed: November 4, 2021
Date of Patent: October 24, 2023
Assignee: Neuralmagic, Inc.
Inventors: Alexander Matveev, Dan Alistarh, Justin Kopinsky, Rati Gelashvili, Mark Kurtz, Nir Shavit
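A minimal sketch of producing a group-sparse matrix row (hypothetical pruning criterion, not the patented procedure): keep the contiguous groups of `width` elements — `width` matching the SIMD register width — with the largest total magnitude, and zero everything else, so each surviving group maps onto one SIMD tensor operation.

```python
# Hypothetical sketch: group-sparsify one matrix row so that non-zero
# elements occur only in whole SIMD-width groups.

def group_sparsify(row, width, keep_groups):
    groups = [row[i:i + width] for i in range(0, len(row), width)]
    # Rank groups by total magnitude and keep the strongest ones.
    scored = sorted(range(len(groups)),
                    key=lambda g: sum(abs(x) for x in groups[g]),
                    reverse=True)
    keep = set(scored[:keep_groups])
    out = []
    for g, grp in enumerate(groups):
        out.extend(grp if g in keep else [0] * len(grp))
    return out
```

Because zeros occur only in whole groups, every remaining group can be loaded into a SIMD register and multiplied against the corresponding elements of B with no per-element sparsity bookkeeping.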
-
Patent number: 11636343
Abstract: Training a neural network (NN) may include training a NN N, and for S, a version of N to be sparsified (e.g. a copy of N), removing NN elements from S to create a sparsified version of S, and training S using outputs from N (e.g. "distillation"). A boosting or reintroduction phase may follow sparsification: training a NN may include for a trained NN N and S, a sparsified version of N, re-introducing NN elements previously removed from S, and training S using outputs from N. The boosting phase need not use a NN sparsified by "distillation." Training and sparsification, or training and reintroduction, may be performed iteratively or over repetitions.
Type: Grant
Filed: September 26, 2019
Date of Patent: April 25, 2023
Assignee: Neuralmagic Inc.
Inventor: Dan Alistarh
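The prune/reintroduce cycle can be illustrated with a toy sketch (magnitude-based criteria are an assumption here, not necessarily the patented ones): remove the smallest-magnitude weights from S, then "boost" by re-introducing a fraction of the removed ones before further training against N's outputs.

```python
# Toy sketch of the sparsify / reintroduce phases on a flat weight list.

def prune(weights, n_remove):
    """Zero out the n_remove smallest-magnitude weights."""
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    removed = set(order[:n_remove])
    return [0 if i in removed else w for i, w in enumerate(weights)], removed

def reintroduce(sparse, original, removed, n_back):
    """Boosting phase: restore the n_back largest previously removed weights."""
    order = sorted(removed, key=lambda i: abs(original[i]), reverse=True)
    back = set(order[:n_back])
    return [original[i] if i in back else w for i, w in enumerate(sparse)]
```

In the patented scheme, each phase is interleaved with training S on outputs from the dense teacher N, and the cycle may repeat.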
-
Patent number: 11544559
Abstract: A system and method of executing a convolution layer of a neural network may include: (a) selecting an output spatial position (OSP) of an output matrix data element of the convolution layer; (b) selecting, based on the selected OSP, a non-zero input element of an input matrix data element; (c) producing, based on the selected OSP, a vector of kernel elements from a kernel matrix data element; (d) performing a vectoral multiplication operation of the selected non-zero input element and the vector of kernel elements, and accumulating a product of the vectoral multiplication in a vector register of a processor; (e) repeating (c) and (d) with subsequent non-zero input elements and corresponding vectors of kernel elements to obtain an outcome of the convolution of the selected OSP; and (f) repeating (a) through (e) with subsequent selection of OSPs, to obtain an outcome of the convolution layer.
Type: Grant
Filed: January 8, 2020
Date of Patent: January 3, 2023
Assignee: Neuralmagic Inc.
Inventor: Justin Kopinsky
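A scalar 1-D sketch of the loop structure in steps (a)–(f) (an illustration only; the patented version performs the inner step as a vector multiply-accumulate in a vector register): skip zero input elements and, for each non-zero input, accumulate input × kernel into the output positions it contributes to.

```python
# Hypothetical 1-D sketch: sparse convolution driven by non-zero inputs.

def sparse_conv1d(inp, kernel):
    out = [0.0] * (len(inp) - len(kernel) + 1)
    for i, x in enumerate(inp):
        if x == 0:
            continue   # the sparsity win: zero inputs do no work
        for k, w in enumerate(kernel):
            o = i - k  # output position this (input, kernel) pair feeds
            if 0 <= o < len(out):
                out[o] += x * w
    return out
```

The inner loop over kernel elements is what the abstract vectorizes: one vectoral multiplication of the scalar input element against a whole vector of kernel elements, accumulated in a vector register.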
-
Publication number: 20220383068
Abstract: A method and system for computing one or more outputs of a neural network having a plurality of layers is provided. The method and system can include determining a plurality of sub-computations from total computations of the neural network to execute in parallel, wherein the computations to execute in parallel involve computations from multiple layers. The method and system can also include avoiding repeating overlapped computations and/or multiple memory reads and writes during execution.
Type: Application
Filed: August 8, 2022
Publication date: December 1, 2022
Applicant: Neuralmagic Inc.
Inventors: Alexander MATVEEV, Nir SHAVIT
-
Patent number: 11449363
Abstract: A method and system for computing one or more outputs of a neural network having a plurality of layers is provided. The method and system can include determining a plurality of sub-computations from total computations of the neural network to execute in parallel, wherein the computations to execute in parallel involve computations from multiple layers. The method and system can also include avoiding repeating overlapped computations and/or multiple memory reads and writes during execution.
Type: Grant
Filed: May 30, 2019
Date of Patent: September 20, 2022
Assignee: Neuralmagic Inc.
Inventors: Alexander Matveev, Nir Shavit
-
Publication number: 20220058486
Abstract: A system and method of accelerating execution of a NN model, by at least one processor may include: receiving a first matrix A, representing elements of a kernel K of the NN model and a second matrix B, representing elements of an input I to kernel K; producing from matrix A, a group-sparse matrix A′, comprising G tensors of elements. The number of elements in each tensor is defined by, or equal to a number of entries in each index of an input tensor register used for a specific Single Instruction Multiple Data (SIMD) tensor operation, and all elements of A′ outside said G tensors are null. The system and method may further include executing kernel K on input I, by performing at least one computation of the SIMD tensor operation, having as operands elements of a tensor of the G tensors and corresponding elements of the B matrix.
Type: Application
Filed: November 4, 2021
Publication date: February 24, 2022
Applicant: Neuralmagic Inc.
Inventors: Alexander MATVEEV, Dan ALISTARH, Justin KOPINSKY, Rati GELASHVILI, Mark KURTZ, Nir SHAVIT
-
Patent number: 11216732
Abstract: A system and method may generate code to be used when executing neural networks (NNs), for example convolutional neural networks (CNNs) which may include one or more convolutional layers. For at least one convolutional layer, for each non-zero element in a kernel tensor or matrix associated with the convolutional layer, instructions may be generated or issued. For example, for each non-zero element, a vector broadcast instruction may be generated, and a fused multiply-add (FMA) instruction may be generated, having as parameters a register representing a portion of the output for the convolutional layer, a register storing input data for the convolutional layer, and a register or reference to memory storing the non-zero element. The software or code produced may be executed during convolutional operations, for example as part of a larger application such as a NN inference application.
Type: Grant
Filed: February 18, 2021
Date of Patent: January 4, 2022
Assignee: NEURALMAGIC INC.
Inventors: Aleksandar Zlateski, Justin Kopinsky
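The code-generation idea can be sketched as follows (the instruction mnemonics and register names are illustrative placeholders, not the patent's actual output): walk the kernel and, for each non-zero element, emit a broadcast instruction and an FMA instruction; zero elements produce no code at all, so sparsity is baked into the generated program.

```python
# Hypothetical sketch: emit (broadcast, FMA) pseudo-instruction pairs
# only for the non-zero kernel elements.

def emit_sparse_fma(kernel_row):
    program = []
    for idx, w in enumerate(kernel_row):
        if w == 0:
            continue   # zero weights cost nothing at runtime
        program.append(("vbroadcast", "zmm1", w))
        program.append(("vfmadd", "zmm0", "zmm1", f"input[{idx}]"))
    return program
```

At inference time the generated sequence is executed directly, replacing a generic sparse-matrix loop with straight-line code specialized to the kernel's non-zero pattern.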
-
Patent number: 11195095
Abstract: A system and method of accelerating execution of a NN model, by at least one processor may include: receiving a first matrix A, representing elements of a kernel K of the NN model and a second matrix B, representing elements of an input I to kernel K; producing from matrix A, a group-sparse matrix A′, comprising G tensors of elements. The number of elements in each tensor is defined by, or equal to a number of entries in each index of an input tensor register used for a specific Single Instruction Multiple Data (SIMD) tensor operation, and all elements of A′ outside said G tensors are null. The system and method may further include executing kernel K on input I, by performing at least one computation of the SIMD tensor operation, having as operands elements of a tensor of the G tensors and corresponding elements of the B matrix.
Type: Grant
Filed: August 5, 2020
Date of Patent: December 7, 2021
Assignee: NEURALMAGIC INC.
Inventors: Alexander Matveev, Dan Alistarh, Justin Kopinsky, Rati Gelashvili, Mark Kurtz, Nir Shavit
-
Publication number: 20210216872
Abstract: A system and a method of training a neural network (NN) model may include receiving a pretrained NN model, that may include a plurality of layers, each associated with an activation matrix; selecting at least one layer; and performing an iterative training process on the layer. The iterative training process may include applying an activation threshold to the activation matrix of the layer; measuring an accuracy value of the NN model; retraining the layer, while using a bimodal regularization function of one or more activation matrices of the NN model; and repeating the applying, measuring and retraining, while each repetition uses different activation threshold values. This may be repeated until a maximal value of the activation threshold, where the NN model still converges, is found.
Type: Application
Filed: January 14, 2021
Publication date: July 15, 2021
Applicant: Neuralmagic Inc.
Inventors: Mark KURTZ, Dan ALISTARH
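The outer search loop of the iterative process can be sketched abstractly (the accuracy function stands in for the measure-and-retrain step; this is an illustration, not the published method): step the activation threshold upward and keep the largest value for which accuracy stays above a floor.

```python
# Hypothetical sketch: find the maximal activation threshold for which
# the model (represented here by a supplied accuracy function) still
# meets an accuracy floor.

def max_viable_threshold(accuracy_at, thresholds, floor):
    """thresholds is assumed sorted ascending; accuracy_at(t) is the
    model's accuracy after thresholding-and-retraining at t."""
    best = None
    for t in thresholds:
        if accuracy_at(t) >= floor:
            best = t
    return best
```

In the published scheme, each call to the accuracy function corresponds to a full apply-threshold / measure / retrain-with-bimodal-regularization cycle on the selected layer.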
-
Publication number: 20210201124
Abstract: A computer processor may include a number of cores, a shared cache shared among the cores, and a local cache associated with each core and used by that core only. Input data for a neural network (NN) layer may be partitioned into a set of tiles of size T×T, and the tile set may be partitioned into blocks of R tiles. For each block, a core may perform a transform operation on the tiles to produce transformed data matrices fitting in a local cache, and a set of multiply operations, each multiply operation using a transformed data matrix and a transformed kernel matrix from a set of transformed kernel matrices. The set of transformed kernel matrices may fit in the shared cache. The result of at least one of the multiply operations may be stored in a location used to store a transformed data matrix.
Type: Application
Filed: August 27, 2019
Publication date: July 1, 2021
Applicant: Neuralmagic Inc.
Inventor: Rati GELASHVILI
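The partitioning step can be sketched simply (hypothetical helper, illustrating only the tiling and blocking, not the transform or multiply stages): split an H×W input into T×T tiles by their top-left coordinates, then group the tiles into blocks of R for per-core processing.

```python
# Hypothetical sketch: partition an H x W input into T x T tiles
# (identified by top-left coordinates), then into blocks of R tiles.

def tile_blocks(h, w, t, r):
    tiles = [(y, x) for y in range(0, h, t) for x in range(0, w, t)]
    return [tiles[i:i + r] for i in range(0, len(tiles), r)]
```

R is chosen so each block's transformed data matrices fit in a core's local cache, while the transformed kernel matrices, shared by all blocks, fit in the shared cache.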
-
Publication number: 20210182676
Abstract: A system and method may generate code to be used when executing neural networks (NNs), for example convolutional neural networks (CNNs) which may include one or more convolutional layers. For at least one convolutional layer, for each non-zero element in a kernel tensor or matrix associated with the convolutional layer, instructions may be generated or issued. For example, for each non-zero element, a vector broadcast instruction may be generated, and a fused multiply-add (FMA) instruction may be generated, having as parameters a register representing a portion of the output for the convolutional layer, a register storing input data for the convolutional layer, and a register or reference to memory storing the non-zero element. The software or code produced may be executed during convolutional operations, for example as part of a larger application such as a NN inference application.
Type: Application
Filed: February 18, 2021
Publication date: June 17, 2021
Applicant: Neuralmagic Inc.
Inventors: Aleksandar ZLATESKI, Justin KOPINSKY
-
Patent number: 10963787
Abstract: A system and method may generate code to be used when executing neural networks (NNs), for example convolutional neural networks (CNNs) which may include one or more convolutional layers. For at least one convolutional layer, for each non-zero element in a kernel tensor or matrix associated with the convolutional layer, instructions may be generated or issued. For example, for each non-zero element, a vector broadcast instruction may be generated, and a fused multiply-add (FMA) instruction may be generated, having as parameters a register representing a portion of the output for the convolutional layer, a register storing input data for the convolutional layer, and a register or reference to memory storing the non-zero element. The software or code produced may be executed during convolutional operations, for example as part of a larger application such as a NN inference application.
Type: Grant
Filed: January 24, 2020
Date of Patent: March 30, 2021
Assignee: NEURALMAGIC INC.
Inventors: Aleksandar Zlateski, Justin Kopinsky
-
Publication number: 20210042624
Abstract: A system and method of accelerating execution of a NN model, by at least one processor may include: receiving a first matrix A, representing elements of a kernel K of the NN model and a second matrix B, representing elements of an input I to kernel K; producing from matrix A, a group-sparse matrix A′, comprising G tensors of elements. The number of elements in each tensor is defined by, or equal to a number of entries in each index of an input tensor register used for a specific Single Instruction Multiple Data (SIMD) tensor operation, and all elements of A′ outside said G tensors are null. The system and method may further include executing kernel K on input I, by performing at least one computation of the SIMD tensor operation, having as operands elements of a tensor of the G tensors and corresponding elements of the B matrix.
Type: Application
Filed: August 5, 2020
Publication date: February 11, 2021
Applicant: Neuralmagic Inc.
Inventors: Alexander MATVEEV, Dan ALISTARH, Justin KOPINSKY, Rati GELASHVILI, Mark KURTZ, Nir SHAVIT
-
Patent number: 10915816
Abstract: A system and method of inferring a neural network (NN) on one or more target computing devices. The NN may include a plurality of layers, where at least one layer includes one or more kernels. Embodiments may include: receiving a data structure representing the NN; analyzing the data structure to produce one or more tasks, where each task may include computations pertaining to a kernel of the NN; selecting a sparse version of at least one kernel and replacing the at least one kernel with the sparse version; and compiling the one or more tasks to produce one or more respective tensor columns. The one or more tensor columns are adapted to fit in respective one or more cache memories of the one or more target computing devices, and include task instruction code that represents at least one computation of the kernel of the NN.
Type: Grant
Filed: September 18, 2020
Date of Patent: February 9, 2021
Assignee: NEURALMAGIC INC.
Inventors: Alexander Matveev, Nir Shavit
-
Patent number: 10902318
Abstract: A system and method for executing a convolutional layer in convolutional neural networks is provided. The convolution is performed via a transformation that includes relocating input, relocating convolution filters, and performing an aggregate matrix multiply.
Type: Grant
Filed: November 6, 2018
Date of Patent: January 26, 2021
Assignee: NEURALMAGIC INC.
Inventors: Alexander Matveev, Nir Shavit
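The "relocate input, then one aggregate matrix multiply" structure resembles the classic im2col transformation, sketched here in 1-D (an illustration of the general pattern; the patented transform may differ): gather input patches into rows of a matrix, then compute all outputs as a single matrix product against the flattened filter.

```python
# Hypothetical 1-D im2col-style sketch: relocate input patches into rows,
# then reduce each row against the kernel (the aggregate multiply).

def im2col_conv1d(inp, kernel):
    rows = [inp[i:i + len(kernel)] for i in range(len(inp) - len(kernel) + 1)]
    return [sum(a * b for a, b in zip(row, kernel)) for row in rows]
```

Relocating data first turns the convolution's scattered memory accesses into a dense, cache-friendly matrix multiplication.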
-
Publication number: 20210004684
Abstract: A system and method of inferring a neural network (NN) on one or more target computing devices. The NN may include a plurality of layers, where at least one layer includes one or more kernels. Embodiments may include: receiving a data structure representing the NN; analyzing the data structure to produce one or more tasks, where each task may include computations pertaining to a kernel of the NN; selecting a sparse version of at least one kernel and replacing the at least one kernel with the sparse version; and compiling the one or more tasks to produce one or more respective tensor columns. The one or more tensor columns are adapted to fit in respective one or more cache memories of the one or more target computing devices, and include task instruction code that represents at least one computation of the kernel of the NN.
Type: Application
Filed: September 18, 2020
Publication date: January 7, 2021
Applicant: Neuralmagic Inc.
Inventors: Alexander MATVEEV, Nir SHAVIT
-
Patent number: 10832133
Abstract: A system and method of inferring a neural network (NN) on one or more target computing devices. The NN may include a plurality of layers, where at least one layer includes one or more kernels. Embodiments may include: receiving a data structure representing the NN; analyzing the data structure to produce one or more tasks, where each task may include computations pertaining to a kernel of the NN; selecting a sparse version of at least one kernel and replacing the at least one kernel with the sparse version; and compiling the one or more tasks to produce one or more respective tensor columns. The one or more tensor columns are adapted to fit in respective one or more cache memories of the one or more target computing devices, and include task instruction code that represents at least one computation of the kernel of the NN.
Type: Grant
Filed: January 24, 2020
Date of Patent: November 10, 2020
Assignee: NEURALMAGIC INC.
Inventors: Alexander Matveev, Nir Shavit
-
Publication number: 20200218978
Abstract: A system and method of executing a convolution layer of a neural network may include: (a) selecting an output spatial position (OSP) of an output matrix data element of the convolution layer; (b) selecting, based on the selected OSP, a non-zero input element of an input matrix data element; (c) producing, based on the selected OSP, a vector of kernel elements from a kernel matrix data element; (d) performing a vectoral multiplication operation of the selected non-zero input element and the vector of kernel elements, and accumulating a product of the vectoral multiplication in a vector register of a processor; (e) repeating (c) and (d) with subsequent non-zero input elements and corresponding vectors of kernel elements to obtain an outcome of the convolution of the selected OSP; and (f) repeating (a) through (e) with subsequent selection of OSPs, to obtain an outcome of the convolution layer.
Type: Application
Filed: January 8, 2020
Publication date: July 9, 2020
Applicant: Neuralmagic Inc.
Inventor: Justin KOPINSKY