Patents by Inventor Andreas Moshovos
Andreas Moshovos has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12182621
Abstract: A system and method for using sparsity to accelerate deep learning networks. The method includes: communicating a bit vector to a scheduler identifying which values in an input tensor are non-zero; for each lane of the input tensor, determining which values are to be communicated for multiply-accumulate (MAC) operations, the determination including directing performance of one of: communicating the current value in the lane; communicating the next value in the same lane where such value is non-zero; communicating a value from a step ahead in time where such value is non-zero; and communicating a value from a neighboring lane where such value is non-zero; and outputting the values of the MAC operations.
Type: Grant
Filed: July 16, 2021
Date of Patent: December 31, 2024
Assignee: The Governing Council of the University of Toronto
Inventors: Mostafa Mahmoud, Andreas Moshovos
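To make the scheduling policy concrete, here is a minimal Python sketch, not the patented hardware: the `schedule` function, the lookahead of one time step, the wrap-around neighbor lane, and the candidate priority order are all assumptions made for illustration.

```python
import numpy as np

def schedule(tensor):
    """Greedy software model of the promotion scheduler.

    tensor: (time steps, lanes) array of input values.
    Returns the value each MAC lane would consume per step.
    """
    nz = tensor != 0                      # bit vector sent to the scheduler
    consumed = np.zeros_like(nz)
    steps, lanes = tensor.shape
    out = np.zeros_like(tensor)
    for t in range(steps):
        for l in range(lanes):
            # candidate sources, in an assumed priority order: the current
            # value; the next value in the same lane (one step ahead);
            # a step-ahead value from the neighbor lane; the neighbor now.
            for ct, cl in ((t, l), (t + 1, l),
                           (t + 1, (l + 1) % lanes), (t, (l + 1) % lanes)):
                if ct < steps and nz[ct, cl] and not consumed[ct, cl]:
                    consumed[ct, cl] = True
                    out[t, l] = tensor[ct, cl]
                    break
    return out

acts = np.array([[0, 3, 0, 1],
                 [2, 0, 0, 0],
                 [0, 0, 4, 5]])
print(schedule(acts))   # zeros are skipped; later values fill idle MAC slots
```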
-
Patent number: 12118212
Abstract: A system and method for memory compression for deep learning networks. The method includes: compacting an input data stream by identifying a bit width necessary to accommodate the value from the input data stream with the highest magnitude; storing the least significant bits of the input data stream in a first memory store, the number of bits equal to the bit width, wherein if the value requires more bits than those currently left unused in the first memory store, the remaining bits are written into a second memory store; outputting the value of the first memory store, as a consecutive part of a compressed data stream, with an associated width of the data in the first memory store when the first memory store becomes full and copying the value of the second memory store to the first memory store; and decompressing the compressed data stream.
Type: Grant
Filed: November 10, 2022
Date of Patent: October 15, 2024
Assignee: The Governing Council of the University of Toronto
Inventors: Isak Edo Vivancos, Andreas Moshovos, Sayeh Sharifymoghaddam, Alberto Delmas Lascorz
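A rough software sketch of the packing scheme follows, under stated assumptions: unsigned integer inputs, groups of eight values sharing one bit width, and 32-bit memory stores; the `compress` function and the (packed word, width) record layout are invented for the example.

```python
STORE_BITS = 32  # assumed width of each memory store

def compress(values, group=8):
    """Pack unsigned values using only the bit width demanded by the
    largest magnitude in each group; emit (packed word, width) records."""
    out = []
    for i in range(0, len(values), group):
        chunk = values[i:i + group]
        width = max(1, max(v.bit_length() for v in chunk))
        first, used = 0, 0               # first memory store and bits used
        for v in chunk:
            if used + width > STORE_BITS:
                # the value straddles the store boundary: its low bits
                # complete the first store, which is emitted with its width,
                # and the spilled high bits become the new first store
                low = STORE_BITS - used
                first |= (v & ((1 << low) - 1)) << used
                out.append((first, width))
                first, used = v >> low, width - low
            else:
                first |= v << used
                used += width
        if used:
            out.append((first, width))   # flush the partially filled store
    return out

print(compress([3, 1, 0, 7, 2, 5, 6, 4, 1000, 2000]))
```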
-
Patent number: 11928566
Abstract: There is provided a system and method for compression and decompression of a data stream used by machine learning networks. The method includes: encoding each value in the data stream, including: determining a mapping to one of a plurality of non-overlapping ranges, each value encoded as a symbol representative of the range and a corresponding offset, and arithmetically coding the symbol using a probability count; storing a compressed data stream including the arithmetically coded symbols and the corresponding offsets; decoding the compressed data stream with arithmetic decoding using the probability count, the arithmetically decoded symbols using the offset bits to arrive at a decoded data stream; and communicating the decoded data stream for use by the machine learning networks.
Type: Grant
Filed: January 11, 2023
Date of Patent: March 12, 2024
Inventors: Alberto Delmas Lascorz, Andreas Moshovos
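The range/symbol/offset decomposition can be sketched briefly. The power-of-two range layout below is an assumption (the abstract leaves the ranges general), and the arithmetic coding stage itself is elided; the `Counter` stands in for the probability counts the coder would consume.

```python
from collections import Counter

def to_symbol(v):
    """Map v >= 0 to (symbol, offset, offset_bits). Symbol s >= 1 covers the
    range [2**(s-1), 2**s - 1]; symbol 0 covers the value 0."""
    if v == 0:
        return 0, 0, 0
    s = v.bit_length()
    base = 1 << (s - 1)
    return s, v - base, s - 1

def from_symbol(s, offset):
    # the decoded symbol plus its offset bits recover the exact value
    return 0 if s == 0 else (1 << (s - 1)) + offset

values = [0, 1, 5, 12, 0, 3, 7]
coded = [to_symbol(v) for v in values]
counts = Counter(s for s, _, _ in coded)   # probability counts for the coder
assert [from_symbol(s, off) for s, off, _ in coded] == values
print(coded, dict(counts))
```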
-
Publication number: 20240005213
Abstract: There is provided a system and method for compression and decompression of a data stream used by machine learning networks. The method includes: encoding each value in the data stream, including: determining a mapping to one of a plurality of non-overlapping ranges, each value encoded as a symbol representative of the range and a corresponding offset, and arithmetically coding the symbol using a probability count; storing a compressed data stream including the arithmetically coded symbols and the corresponding offsets; decoding the compressed data stream with arithmetic decoding using the probability count, the arithmetically decoded symbols using the offset bits to arrive at a decoded data stream; and communicating the decoded data stream for use by the machine learning networks.
Type: Application
Filed: September 14, 2023
Publication date: January 4, 2024
Inventors: Alberto Delmas Lascorz, Andreas Moshovos
-
Patent number: 11836971
Abstract: A processor-implemented method implementing a convolution neural network includes: determining a plurality of differential groups by grouping a plurality of raw windows of an input feature map into the plurality of differential groups; determining differential windows by performing, for each respective differential group of the differential groups, a differential operation between the raw windows of the respective differential group; determining a reference element of an output feature map corresponding to a reference raw window among the raw windows by performing a convolution operation between a kernel and the reference raw window; and determining remaining elements of the output feature map by performing a reference element summation operation based on the reference element and each of a plurality of convolution operation results determined by performing respective convolution operations between the kernel and each of the differential windows.
Type: Grant
Filed: August 23, 2019
Date of Patent: December 5, 2023
Assignee: Samsung Electronics Co., Ltd.
Inventors: Mostafa Mahmoud, Andreas Moshovos
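A one-dimensional sketch conveys the arithmetic; the claims cover 2-D feature maps, so the 1-D reduction, the group size of four, and the `diff_conv` name are simplifications for the example.

```python
import numpy as np

def diff_conv(x, kernel, group=4):
    """Convolve via differential groups: one full convolution per group
    (the reference window), then a cheap diff convolution per member."""
    k = len(kernel)
    n_windows = len(x) - k + 1
    out = np.empty(n_windows)
    for g in range(0, n_windows, group):
        ref = x[g:g + k]
        out[g] = ref @ kernel                 # reference element: full conv
        for j in range(g + 1, min(g + group, n_windows)):
            diff = x[j:j + k] - ref           # differential window
            out[j] = out[g] + diff @ kernel   # reference sum + diff conv
    return out

x = np.array([1., 2., 2., 2., 3., 3., 4., 5.])
kern = np.array([0.5, 0.25, 0.25])
assert np.allclose(diff_conv(x, kern), np.convolve(x, kern[::-1], mode="valid"))
print(diff_conv(x, kern))
```

When neighboring windows are similar, the differential windows are mostly zeros, which is what makes the diff convolutions cheap on sparsity-aware hardware.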
-
Publication number: 20230334285
Abstract: A method for memory storage including storing a neural network by storing values of the neural network each as a reference to a representative value; and, in some embodiments, storing additional values of the neural network. Each of the representative values can be generated by assigning each of the values of the neural network to a cluster and, for each cluster, selecting a centroid from the cluster. The method can include performing one or more multiply-accumulate operations A1B1 + … + AnBn on input vectors A and input vectors B, by accumulating input vectors A to an accumulated sum of input vectors A per input vector B having the same representative value and subsequently multiplying each of the accumulated sums of input vectors A by the representative value of the input vector B. A system is also described, and a method for configuring memory according to a data structure.
Type: Application
Filed: September 21, 2021
Publication date: October 19, 2023
Inventors: Andreas Moshovos, Ali Hadi Zadeh, Isak Edo Vivancos, Omar Mohamed Awad
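Here is a small sketch of the representative-value idea; the toy 1-D k-means quantizer and the cluster count are assumptions, and the point is the reordered multiply-accumulate that spends one multiply per representative value rather than per weight.

```python
import numpy as np

def quantize(weights, n_clusters=4, iters=10):
    """Toy 1-D k-means: returns each weight's cluster index and the
    centroid (representative value) per cluster."""
    centroids = np.linspace(weights.min(), weights.max(), n_clusters)
    for _ in range(iters):
        idx = np.abs(weights[:, None] - centroids[None, :]).argmin(axis=1)
        for c in range(n_clusters):
            if np.any(idx == c):
                centroids[c] = weights[idx == c].mean()
    return idx, centroids

def clustered_dot(a, idx, centroids):
    # accumulate the activations that share a representative weight first,
    # then perform a single multiply per representative value
    sums = np.zeros_like(centroids)
    np.add.at(sums, idx, a)
    return sums @ centroids

w = np.array([0.11, 0.52, 0.09, 0.48, 0.51, 0.10])
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
idx, reps = quantize(w)
assert np.isclose(clustered_dot(a, idx, reps), a @ reps[idx])
print(clustered_dot(a, idx, reps))
```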
-
Publication number: 20230297337
Abstract: A system and method for accelerating multiply-accumulate (MAC) floating-point units during training of deep learning networks. The method includes: receiving a first input data stream A and a second input data stream B; adding exponents of the first data stream A and the second data stream B in pairs to produce product exponents; determining a maximum exponent using a comparator; determining a number of bits by which each significand in the second data stream has to be shifted prior to accumulation by adding product exponent deltas to the corresponding term in the first data stream, and using an adder tree to reduce the operands in the second data stream into a single partial sum; adding the partial sum to a corresponding aligned value using the maximum exponent to determine accumulated values; and outputting the accumulated values.
Type: Application
Filed: July 19, 2021
Publication date: September 21, 2023
Inventors: Omar Mohamed Awad, Mostafa Mahmoud, Andreas Moshovos
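A software sketch of the alignment scheme, with an assumed fixed-point significand width and rounding: significand products are aligned to the maximum product exponent and reduced with plain integer addition, which stands in for the adder tree. The `mac` function and its parameters are invented for the example.

```python
import math

def mac(a_vals, b_vals, acc=0.0, frac_bits=24):
    """Accumulate sum(a*b) by aligning significand products to the
    maximum product exponent, then reducing with integer adds."""
    exps, sigs = [], []
    for a, b in zip(a_vals, b_vals):
        ma, ea = math.frexp(a)              # a = ma * 2**ea, 0.5 <= |ma| < 1
        mb, eb = math.frexp(b)
        exps.append(ea + eb)                # add exponents in pairs
        sigs.append(ma * mb)                # significand product
    emax = max(exps)                        # maximum exponent via comparison
    total = 0
    for e, s in zip(exps, sigs):
        shift = emax - e                    # alignment shift before the adder tree
        total += int(round(s * (1 << frac_bits))) >> shift
    return acc + total / (1 << frac_bits) * 2.0 ** emax

a = [1.5, -2.25, 0.5]
b = [2.0, 0.75, 4.0]
print(mac(a, b), sum(x * y for x, y in zip(a, b)))  # near-equal; alignment truncates
```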
-
Publication number: 20230273828
Abstract: A system and method for using sparsity to accelerate deep learning networks. The method includes: communicating a bit vector to a scheduler identifying which values in an input tensor are non-zero; for each lane of the input tensor, determining which values are to be communicated for multiply-accumulate (MAC) operations, the determination including directing performance of one of: communicating the current value in the lane; communicating the next value in the same lane where such value is non-zero; communicating a value from a step ahead in time where such value is non-zero; and communicating a value from a neighboring lane where such value is non-zero; and outputting the values of the MAC operations.
Type: Application
Filed: July 16, 2021
Publication date: August 31, 2023
Inventors: Mostafa Mahmoud, Andreas Moshovos
-
Publication number: 20230267376
Abstract: There is provided a system and method for compression and decompression of a data stream used by machine learning networks. The method includes: encoding each value in the data stream, including: determining a mapping to one of a plurality of non-overlapping ranges, each value encoded as a symbol representative of the range and a corresponding offset, and arithmetically coding the symbol using a probability count; storing a compressed data stream including the arithmetically coded symbols and the corresponding offsets; decoding the compressed data stream with arithmetic decoding using the probability count, the arithmetically decoded symbols using the offset bits to arrive at a decoded data stream; and communicating the decoded data stream for use by the machine learning networks.
Type: Application
Filed: January 11, 2023
Publication date: August 24, 2023
Inventors: Alberto Delmas Lascorz, Andreas Moshovos
-
Publication number: 20230186065
Abstract: A system for bit-serial computation in a neural network is described. The system may be embodied on an integrated circuit and include one or more bit-serial tiles for performing bit-serial computations in which each bit-serial tile receives input neurons and synapses, and communicates output neurons. Also included are an activation memory for storing the neurons and a dispatcher. The dispatcher reads neurons and synapses from memory and communicates either the neurons or the synapses bit-serially to the one or more bit-serial tiles. The other of the neurons or the synapses are communicated bit-parallelly to the one or more bit-serial tiles, or, according to a further embodiment, may also be communicated bit-serially to the one or more bit-serial tiles.
Type: Application
Filed: February 10, 2023
Publication date: June 15, 2023
Applicant: Samsung Electronics Co., Ltd.
Inventors: Patrick Judd, Jorge Albericio, Alberto Delmas Lascorz, Andreas Moshovos, Sayeh Sharify
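The bit-serial computation can be sketched in a few lines, assuming unsigned 8-bit activations streamed most-significant bit first while weights stay bit-parallel; the function name and bit width are choices made for the example. Note that narrower activations would take proportionally fewer serial cycles, which is the performance lever of this family of designs.

```python
def bit_serial_dot(activations, weights, bits=8):
    """Dot product with bit-serial activations and bit-parallel weights."""
    acc = 0
    for b in range(bits - 1, -1, -1):        # one serial cycle per bit, MSB first
        # each cycle adds the weights gated by the current activation bit
        gated = sum(w for a, w in zip(activations, weights) if (a >> b) & 1)
        acc = (acc << 1) + gated             # shift-and-add accumulator
    return acc

acts = [3, 0, 5, 1]
wts = [2, 7, 1, 4]
assert bit_serial_dot(acts, wts) == sum(a * w for a, w in zip(acts, wts))
print(bit_serial_dot(acts, wts))
```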
-
Publication number: 20230131251
Abstract: A system and method for memory compression for deep learning networks. The method includes: compacting an input data stream by identifying a bit width necessary to accommodate the value from the input data stream with the highest magnitude; storing the least significant bits of the input data stream in a first memory store, the number of bits equal to the bit width, wherein if the value requires more bits than those currently left unused in the first memory store, the remaining bits are written into a second memory store; outputting the value of the first memory store, as a consecutive part of a compressed data stream, with an associated width of the data in the first memory store when the first memory store becomes full and copying the value of the second memory store to the first memory store; and decompressing the compressed data stream.
Type: Application
Filed: November 10, 2022
Publication date: April 27, 2023
Inventors: Isak Edo Vivancos, Andreas Moshovos, Sayeh Sharifymoghaddam, Alberto Delmas Lascorz
-
Patent number: 11610100
Abstract: A system for bit-serial computation in a neural network is described. The system may be embodied on an integrated circuit and include one or more bit-serial tiles for performing bit-serial computations in which each bit-serial tile receives input neurons and synapses, and communicates output neurons. Also included are an activation memory for storing the neurons, a dispatcher, and a reducer. The dispatcher reads neurons and synapses from memory and communicates either the neurons or the synapses bit-serially to the one or more bit-serial tiles. The other of the neurons or the synapses are communicated bit-parallelly to the one or more bit-serial tiles, or, according to a further embodiment, may also be communicated bit-serially to the one or more bit-serial tiles. The reducer receives the output neurons from the one or more tiles, and communicates the output neurons to the activation memory.
Type: Grant
Filed: July 7, 2019
Date of Patent: March 21, 2023
Assignee: Samsung Electronics Co., Ltd.
Inventors: Patrick Judd, Jorge Albericio, Alberto Delmas Lascorz, Andreas Moshovos, Sayeh Sharifymoghaddam
-
Publication number: 20230070243
Abstract: There is provided a system and method for template matching for neural population pattern detection. The method includes: receiving neuron signal streams and serially associating a bit indicator with spikes from each neuron signal stream; serially determining a first summation (S1), a second summation (S2), and a third summation (S3) on the received neuron signals, the first summation including an element-wise multiply-sum using a time-dependent sliding indicator window on the received neuron signal streams and a template, the second summation including an accumulation using the time-dependent sliding indicator window, and the third summation including a sum of squares using the time-dependent sliding indicator window; and determining a correlation value associated with a match of the template with the received neuron signal streams, the correlation value determined by combining the first summation, the second summation, and the third summation with predetermined constants associated with the template.
Type: Application
Filed: July 20, 2022
Publication date: March 9, 2023
Inventors: Ameer Abd Elhadi, Ciaran Brochan Bannon, Andreas Moshovos, Hendrik Steenland
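One plausible reading of the three running sums is a sliding Pearson correlation, sketched below; the mapping of S1, S2, and S3 onto the correlation formula, the binary spike encoding, and the function name are assumptions for the example, not a statement of the patented datapath.

```python
import math

def sliding_correlation(stream, template):
    """Pearson correlation of a template against each sliding window,
    built from three running sums over the window."""
    n = len(template)
    t_sum = sum(template)                     # precomputed template constants
    t_sq = sum(t * t for t in template)
    t_den = n * t_sq - t_sum * t_sum
    out = []
    for i in range(len(stream) - n + 1):
        win = stream[i:i + n]
        s1 = sum(x * t for x, t in zip(win, template))  # element-wise multiply-sum
        s2 = sum(win)                                   # accumulation
        s3 = sum(x * x for x in win)                    # sum of squares
        den = (n * s3 - s2 * s2) * t_den
        out.append((n * s1 - s2 * t_sum) / math.sqrt(den) if den > 0 else 0.0)
    return out

spikes = [0, 1, 0, 0, 1, 1, 0, 1, 0, 0]     # bit indicators of spikes
template = [0, 1, 1, 0]
print(sliding_correlation(spikes, template))
```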
-
Publication number: 20220327367
Abstract: Described is a system, integrated circuit and method for reducing ineffectual computations in the processing of layers in a neural network. One or more tiles perform computations where each tile receives input neurons, offsets and synapses, and where each input neuron has an associated offset. Each tile generates output neurons, and there is also an activation memory for storing neurons in communication with the tiles via a dispatcher and an encoder. The dispatcher reads neurons from the activation memory and communicates the neurons to the tiles, and reads synapses from a memory and communicates the synapses to the tiles. The encoder receives the output neurons from the tiles, encodes them, and communicates the output neurons to the activation memory. The offsets are processed by the tiles in order to perform computations only on non-zero neurons. Optionally, synapses may be similarly processed to skip ineffectual operations.
Type: Application
Filed: June 22, 2022
Publication date: October 13, 2022
Applicant: Samsung Electronics Co., Ltd.
Inventors: Patrick Judd, Jorge Albericio, Alberto Delmas Lascorz, Andreas Moshovos, Sayeh Sharifymoghaddam
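A software analogue of the offset scheme, as a sketch under assumptions (the `encode`/`sparse_dot` names and the flat encoding are invented for the example): each non-zero neuron carries the offset of the synapse it pairs with, so only effectual multiply-accumulates are performed.

```python
def encode(neurons):
    """Keep only non-zero neurons, each paired with the offset of the
    synapse it must multiply; zero neurons generate no work at all."""
    return [(v, i) for i, v in enumerate(neurons) if v != 0]

def sparse_dot(encoded, synapses):
    # the tile multiplies each non-zero neuron with the synapse its
    # offset points at, skipping every ineffectual operation
    return sum(v * synapses[off] for v, off in encoded)

neurons = [0, 3, 0, 0, 2, 0, 1, 0]
synapses = [5, 1, 2, 8, 3, 9, 4, 7]
pairs = encode(neurons)
assert sparse_dot(pairs, synapses) == sum(n * s for n, s in zip(neurons, synapses))
print(pairs, sparse_dot(pairs, synapses))
```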
-
Patent number: 11423289
Abstract: Described is a system, integrated circuit and method for reducing ineffectual computations in the processing of layers in a neural network. One or more tiles perform computations where each tile receives input neurons, offsets and synapses, and where each input neuron has an associated offset. Each tile generates output neurons, and there is also an activation memory for storing neurons in communication with the tiles via a dispatcher and an encoder. The dispatcher reads neurons from the activation memory and communicates the neurons to the tiles, and reads synapses from a memory and communicates the synapses to the tiles. The encoder receives the output neurons from the tiles, encodes them, and communicates the output neurons to the activation memory. The offsets are processed by the tiles in order to perform computations only on non-zero neurons. Optionally, synapses may be similarly processed to skip ineffectual operations.
Type: Grant
Filed: June 14, 2017
Date of Patent: August 23, 2022
Assignee: Samsung Electronics Co., Ltd.
Inventors: Patrick Judd, Jorge Albericio, Alberto Delmas Lascorz, Andreas Moshovos, Sayeh Sharifymoghaddam
-
Publication number: 20220092382
Abstract: A method for memory storage including storing a neural network by storing values of the neural network each as a reference to a representative value; and, in some embodiments, storing additional values of the neural network. Each of the representative values can be generated by assigning each of the values of the neural network to a cluster and, for each cluster, selecting a centroid from the cluster. The method can include performing one or more multiply-accumulate operations A1B1 + … + AnBn on input vectors A and input vectors B, by accumulating input vectors A to an accumulated sum of input vectors A per input vector B having the same representative value and subsequently multiplying each of the accumulated sums of input vectors A by the representative value of the input vector B. A system is also described, as well as a method for configuring memory according to a data structure.
Type: Application
Filed: December 22, 2020
Publication date: March 24, 2022
Inventors: Andreas Moshovos, Ali Hadi Zadeh, Isak Edo Vivancos, Omar Mohamed Awad
-
Publication number: 20210125046
Abstract: Described is a neural network accelerator tile. It includes an activation memory interface for interfacing with an activation memory to receive a set of activation representations, a weight memory interface for interfacing with a weight memory to receive a set of weight representations, and a processing element. The processing element is configured to implement a one-hot encoder, a histogrammer, an aligner, a reducer, and an accumulation sub-element, which process the set of activation representations and the set of weight representations to produce a set of output representations.
Type: Application
Filed: April 25, 2019
Publication date: April 29, 2021
Inventors: Andreas Moshovos, Mostafa Mahmoud, Sayeh Sharifymoghaddam
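Reading the pipeline as a per-exponent histogram of one-hot product terms (an interpretation for illustration, not a statement of the patented design), here is a sketch with assumed unsigned 8-bit operands:

```python
def tile_mac(activations, weights, bits=8):
    """Multiply-accumulate via one-hot terms: every set bit pair of an
    activation and weight contributes a power of two; the histogrammer
    counts contributions per exponent, and the aligner/reducer shifts
    each bucket into the final accumulation."""
    hist = [0] * (2 * bits)                  # histogram buckets, one per exponent
    for a, w in zip(activations, weights):
        for i in range(bits):
            if not (a >> i) & 1:
                continue
            for j in range(bits):
                if (w >> j) & 1:             # one-hot term with exponent i + j
                    hist[i + j] += 1
    # align each bucket count by its exponent and reduce into one sum
    return sum(count << exp for exp, count in enumerate(hist))

acts = [3, 5, 0, 7]
wts = [2, 4, 9, 1]
assert tile_mac(acts, wts) == sum(a * w for a, w in zip(acts, wts))
print(tile_mac(acts, wts))
```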
-
Publication number: 20210004668
Abstract: Described is a neural network accelerator tile for exploiting input sparsity. The tile includes: a weight memory to supply each weight lane with a weight and weight selection metadata; an activation selection unit to receive a set of input activation values and rearrange them to supply each activation lane with a set of rearranged activation values; a set of multiplexers, at least one per pair of activation and weight lanes, where each multiplexer is configured to select a combination activation value for the activation lane from the lane's set of rearranged activation values based on the weight lane's weight selection metadata; and a set of combination units, at least one per multiplexer, where each combination unit is configured to combine the activation lane's combination value with the weight lane's weight to output a weight lane product.
Type: Application
Filed: February 15, 2019
Publication date: January 7, 2021
Inventors: Andreas Moshovos, Alberto Delmas Lascorz, Zisis Poulos, Dylan Malone Stuart, Patrick Judd, Sayeh Sharify, Mostafa Mahmoud, Milos Nikolic, Kevin Chong Man Siu, Jorge Albericio
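An offline software sketch of the weight-selection idea, with an assumed lookahead window of two (the `schedule_lanes`/`tile_dot` names and the greedy policy are invented for the example): later non-zero weights are promoted into zero slots, and the recorded source offset plays the role of the per-lane mux-select metadata.

```python
def schedule_lanes(weights, lookahead=2):
    """Promote later non-zero weights into earlier zero slots within a
    small window, recording the source offset as selection metadata."""
    slots, used = [], set()
    for i in range(len(weights)):
        if i in used:
            continue
        for j in range(i, min(i + lookahead + 1, len(weights))):
            if j not in used and weights[j] != 0:
                slots.append((weights[j], j))   # (weight, selection metadata)
                used.add(j)
                break
    return slots

def tile_dot(slots, activations):
    # each multiplexer uses the metadata to pick the matching activation
    return sum(w * activations[sel] for w, sel in slots)

weights = [0, 2, 0, 0, 5, 0, 0, 1]
acts = [3, 1, 4, 1, 5, 9, 2, 6]
slots = schedule_lanes(weights)
assert tile_dot(slots, acts) == sum(w * a for w, a in zip(weights, acts))
print(slots, tile_dot(slots, acts))
```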
-
Publication number: 20200125931
Abstract: A system for bit-serial computation in a neural network is described. The system may be embodied on an integrated circuit and include one or more bit-serial tiles for performing bit-serial computations in which each bit-serial tile receives input neurons and synapses, and communicates output neurons. Also included are an activation memory for storing the neurons, a dispatcher, and a reducer. The dispatcher reads neurons and synapses from memory and communicates either the neurons or the synapses bit-serially to the one or more bit-serial tiles. The other of the neurons or the synapses are communicated bit-parallelly to the one or more bit-serial tiles, or, according to a further embodiment, may also be communicated bit-serially to the one or more bit-serial tiles. The reducer receives the output neurons from the one or more tiles, and communicates the output neurons to the activation memory.
Type: Application
Filed: July 7, 2019
Publication date: April 23, 2020
Inventors: Patrick Judd, Jorge Albericio, Alberto Delmas Lascorz, Andreas Moshovos, Sayeh Sharify
-
Publication number: 20200065646
Abstract: A processor-implemented method implementing a convolution neural network includes: determining a plurality of differential groups by grouping a plurality of raw windows of an input feature map into the plurality of differential groups; determining differential windows by performing, for each respective differential group of the differential groups, a differential operation between the raw windows of the respective differential group; determining a reference element of an output feature map corresponding to a reference raw window among the raw windows by performing a convolution operation between a kernel and the reference raw window; and determining remaining elements of the output feature map by performing a reference element summation operation based on the reference element and each of a plurality of convolution operation results determined by performing respective convolution operations between the kernel and each of the differential windows.
Type: Application
Filed: August 23, 2019
Publication date: February 27, 2020
Applicants: Samsung Electronics Co., Ltd.; The Governing Council of the University of Toronto
Inventors: Mostafa Mahmoud, Andreas Moshovos