Patents by Inventor Andreas Moshovos
Andreas Moshovos has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).
-
Patent number: 12182621
Abstract: A system and method for using sparsity to accelerate deep learning networks. The method includes: communicating a bit vector to a scheduler identifying which values in an input tensor are non-zero; for each lane of the input tensor, determining which values are to be communicated for multiply-accumulate (MAC) operations, the determination including directing performance of one of: communicating the current value in the lane; communicating the next value in the same lane where such value is non-zero; communicating a value from a step ahead in time where such value is non-zero; and communicating a value from a neighboring lane where such value is non-zero; and outputting the values of the MAC operations.
Type: Grant
Filed: July 16, 2021
Date of Patent: December 31, 2024
Assignee: The Governing Council of the University of Toronto
Inventors: Mostafa Mahmoud, Andreas Moshovos
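To make the scheduling policy concrete, here is a minimal Python sketch, not the patented hardware: the `schedule` function, the lookahead of one time step, the wrap-around neighbor lane, and the candidate priority order are all assumptions made for illustration.

```python
import numpy as np

def schedule(tensor):
    """Greedy software model of the promotion scheduler.

    tensor: (time steps, lanes) array of input values.
    Returns the value each MAC lane would consume per step.
    """
    nz = tensor != 0                      # bit vector sent to the scheduler
    consumed = np.zeros_like(nz)
    steps, lanes = tensor.shape
    out = np.zeros_like(tensor)
    for t in range(steps):
        for l in range(lanes):
            # candidate sources, in an assumed priority order: the current
            # value; the next value in the same lane (one step ahead);
            # a step-ahead value from the neighbor lane; the neighbor now.
            for ct, cl in ((t, l), (t + 1, l),
                           (t + 1, (l + 1) % lanes), (t, (l + 1) % lanes)):
                if ct < steps and nz[ct, cl] and not consumed[ct, cl]:
                    consumed[ct, cl] = True
                    out[t, l] = tensor[ct, cl]
                    break
    return out

acts = np.array([[0, 3, 0, 1],
                 [2, 0, 0, 0],
                 [0, 0, 4, 5]])
print(schedule(acts))   # zeros are skipped; later values fill idle MAC slots
```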
-
Patent number: 12118212
Abstract: A system and method for memory compression for deep learning networks. The method includes: compacting an input data stream by identifying a bit width necessary to accommodate the value from the input data stream with the highest magnitude; storing the least significant bits of the input data stream in a first memory store, the number of bits equal to the bit width, wherein if the value requires more bits than those currently left unused in the first memory store, the remaining bits are written into a second memory store; outputting the value of the first memory store, as a consecutive part of a compressed data stream, with an associated width of the data in the first memory store when the first memory store becomes full and copying the value of the second memory store to the first memory store; and decompressing the compressed data stream.
Type: Grant
Filed: November 10, 2022
Date of Patent: October 15, 2024
Assignee: The Governing Council of the University of Toronto
Inventors: Isak Edo Vivancos, Andreas Moshovos, Sayeh Sharifymoghaddam, Alberto Delmas Lascorz
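A rough software sketch of the packing scheme follows, under stated assumptions: unsigned integer inputs, groups of eight values sharing one bit width, and 32-bit memory stores; the `compress` function and the (packed word, width) record layout are invented for the example.

```python
STORE_BITS = 32  # assumed width of each memory store

def compress(values, group=8):
    """Pack unsigned values using only the bit width demanded by the
    largest magnitude in each group; emit (packed word, width) records."""
    out = []
    for i in range(0, len(values), group):
        chunk = values[i:i + group]
        width = max(1, max(v.bit_length() for v in chunk))
        first, used = 0, 0               # first memory store and bits used
        for v in chunk:
            if used + width > STORE_BITS:
                # the value straddles the store boundary: its low bits
                # complete the first store, which is emitted with its width,
                # and the spilled high bits become the new first store
                low = STORE_BITS - used
                first |= (v & ((1 << low) - 1)) << used
                out.append((first, width))
                first, used = v >> low, width - low
            else:
                first |= v << used
                used += width
        if used:
            out.append((first, width))   # flush the partially filled store
    return out

print(compress([3, 1, 0, 7, 2, 5, 6, 4, 1000, 2000]))
```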
-
Patent number: 11928566
Abstract: There is provided a system and method for compression and decompression of a data stream used by machine learning networks. The method includes: encoding each value in the data stream, including: determining a mapping to one of a plurality of non-overlapping ranges, each value encoded as a symbol representative of the range and a corresponding offset, and arithmetically coding the symbol using a probability count; storing a compressed data stream including the arithmetically coded symbols and the corresponding offsets; decoding the compressed data stream with arithmetic decoding using the probability count, the arithmetically decoded symbols using the offset bits to arrive at a decoded data stream; and communicating the decoded data stream for use by the machine learning networks.
Type: Grant
Filed: January 11, 2023
Date of Patent: March 12, 2024
Inventors: Alberto Delmas Lascorz, Andreas Moshovos
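The range/symbol/offset decomposition can be sketched briefly. The power-of-two range layout below is an assumption (the abstract leaves the ranges general), and the arithmetic coding stage itself is elided; the `Counter` stands in for the probability counts the coder would consume.

```python
from collections import Counter

def to_symbol(v):
    """Map v >= 0 to (symbol, offset, offset_bits). Symbol s >= 1 covers the
    range [2**(s-1), 2**s - 1]; symbol 0 covers the value 0."""
    if v == 0:
        return 0, 0, 0
    s = v.bit_length()
    base = 1 << (s - 1)
    return s, v - base, s - 1

def from_symbol(s, offset):
    # the decoded symbol plus its offset bits recover the exact value
    return 0 if s == 0 else (1 << (s - 1)) + offset

values = [0, 1, 5, 12, 0, 3, 7]
coded = [to_symbol(v) for v in values]
counts = Counter(s for s, _, _ in coded)   # probability counts for the coder
assert [from_symbol(s, off) for s, off, _ in coded] == values
print(coded, dict(counts))
```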
-
Publication number: 20240005213
Abstract: There is provided a system and method for compression and decompression of a data stream used by machine learning networks. The method includes: encoding each value in the data stream, including: determining a mapping to one of a plurality of non-overlapping ranges, each value encoded as a symbol representative of the range and a corresponding offset, and arithmetically coding the symbol using a probability count; storing a compressed data stream including the arithmetically coded symbols and the corresponding offsets; decoding the compressed data stream with arithmetic decoding using the probability count, the arithmetically decoded symbols using the offset bits to arrive at a decoded data stream; and communicating the decoded data stream for use by the machine learning networks.
Type: Application
Filed: September 14, 2023
Publication date: January 4, 2024
Inventors: Alberto Delmas Lascorz, Andreas Moshovos
-
Patent number: 11836971
Abstract: A processor-implemented method implementing a convolution neural network includes: determining a plurality of differential groups by grouping a plurality of raw windows of an input feature map into the plurality of differential groups; determining differential windows by performing, for each respective differential group of the differential groups, a differential operation between the raw windows of the respective differential group; determining a reference element of an output feature map corresponding to a reference raw window among the raw windows by performing a convolution operation between a kernel and the reference raw window; and determining remaining elements of the output feature map by performing a reference element summation operation based on the reference element and each of a plurality of convolution operation results determined by performing respective convolution operations between the kernel and each of the differential windows.
Type: Grant
Filed: August 23, 2019
Date of Patent: December 5, 2023
Assignee: Samsung Electronics Co., Ltd.
Inventors: Mostafa Mahmoud, Andreas Moshovos
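A one-dimensional sketch conveys the arithmetic; the claims cover 2-D feature maps, so the 1-D reduction, the group size of four, and the `diff_conv` name are simplifications for the example.

```python
import numpy as np

def diff_conv(x, kernel, group=4):
    """Convolve via differential groups: one full convolution per group
    (the reference window), then a cheap diff convolution per member."""
    k = len(kernel)
    n_windows = len(x) - k + 1
    out = np.empty(n_windows)
    for g in range(0, n_windows, group):
        ref = x[g:g + k]
        out[g] = ref @ kernel                 # reference element: full conv
        for j in range(g + 1, min(g + group, n_windows)):
            diff = x[j:j + k] - ref           # differential window
            out[j] = out[g] + diff @ kernel   # reference sum + diff conv
    return out

x = np.array([1., 2., 2., 2., 3., 3., 4., 5.])
kern = np.array([0.5, 0.25, 0.25])
assert np.allclose(diff_conv(x, kern), np.convolve(x, kern[::-1], mode="valid"))
print(diff_conv(x, kern))
```

When neighboring windows are similar, the differential windows are mostly zeros, which is what makes the diff convolutions cheap on sparsity-aware hardware.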
-
Publication number: 20230334285
Abstract: A method for memory storage including storing a neural network by storing values of the neural network each as a reference to a representative value; and, in some embodiments, storing additional values of the neural network. Each of the representative values can be generated by assigning each of the values of the neural network to a cluster and, for each cluster, selecting a centroid from the cluster. The method can include performing one or more multiply-accumulate operations A1B1 + … + AnBn on input vectors A and input vectors B, by accumulating input vectors A to an accumulated sum of input vectors A per input vector B having the same representative value and subsequently multiplying each of the accumulated sums of input vectors A by the representative value of the input vector B. A system is also described, and a method for configuring memory according to a data structure.
Type: Application
Filed: September 21, 2021
Publication date: October 19, 2023
Inventors: Andreas Moshovos, Ali Hadi Zadeh, Isak Edo Vivancos, Omar Mohamed Awad
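Here is a small sketch of the representative-value idea; the toy 1-D k-means quantizer and the cluster count are assumptions, and the point is the reordered multiply-accumulate that spends one multiply per representative value rather than per weight.

```python
import numpy as np

def quantize(weights, n_clusters=4, iters=10):
    """Toy 1-D k-means: returns each weight's cluster index and the
    centroid (representative value) per cluster."""
    centroids = np.linspace(weights.min(), weights.max(), n_clusters)
    for _ in range(iters):
        idx = np.abs(weights[:, None] - centroids[None, :]).argmin(axis=1)
        for c in range(n_clusters):
            if np.any(idx == c):
                centroids[c] = weights[idx == c].mean()
    return idx, centroids

def clustered_dot(a, idx, centroids):
    # accumulate the activations that share a representative weight first,
    # then perform a single multiply per representative value
    sums = np.zeros_like(centroids)
    np.add.at(sums, idx, a)
    return sums @ centroids

w = np.array([0.11, 0.52, 0.09, 0.48, 0.51, 0.10])
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
idx, reps = quantize(w)
assert np.isclose(clustered_dot(a, idx, reps), a @ reps[idx])
print(clustered_dot(a, idx, reps))
```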
-
Publication number: 20230297337
Abstract: A system and method for accelerating multiply-accumulate (MAC) floating-point units during training of deep learning networks. The method includes: receiving a first input data stream A and a second input data stream B; adding exponents of the first data stream A and the second data stream B in pairs to produce product exponents; determining a maximum exponent using a comparator; determining a number of bits by which each significand in the second data stream has to be shifted prior to accumulation by adding product exponent deltas to the corresponding term in the first data stream, and using an adder tree to reduce the operands in the second data stream into a single partial sum; adding the partial sum to a corresponding aligned value using the maximum exponent to determine accumulated values; and outputting the accumulated values.
Type: Application
Filed: July 19, 2021
Publication date: September 21, 2023
Inventors: Omar Mohamed Awad, Mostafa Mahmoud, Andreas Moshovos
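A software sketch of the alignment scheme, with an assumed fixed-point significand width and rounding: significand products are aligned to the maximum product exponent and reduced with plain integer addition, which stands in for the adder tree. The `mac` function and its parameters are invented for the example.

```python
import math

def mac(a_vals, b_vals, acc=0.0, frac_bits=24):
    """Accumulate sum(a*b) by aligning significand products to the
    maximum product exponent, then reducing with integer adds."""
    exps, sigs = [], []
    for a, b in zip(a_vals, b_vals):
        ma, ea = math.frexp(a)              # a = ma * 2**ea, 0.5 <= |ma| < 1
        mb, eb = math.frexp(b)
        exps.append(ea + eb)                # add exponents in pairs
        sigs.append(ma * mb)                # significand product
    emax = max(exps)                        # maximum exponent via comparison
    total = 0
    for e, s in zip(exps, sigs):
        shift = emax - e                    # alignment shift before the adder tree
        total += int(round(s * (1 << frac_bits))) >> shift
    return acc + total / (1 << frac_bits) * 2.0 ** emax

a = [1.5, -2.25, 0.5]
b = [2.0, 0.75, 4.0]
print(mac(a, b), sum(x * y for x, y in zip(a, b)))  # near-equal; alignment truncates
```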
-
Publication number: 20230273828
Abstract: A system and method for using sparsity to accelerate deep learning networks. The method includes: communicating a bit vector to a scheduler identifying which values in an input tensor are non-zero; for each lane of the input tensor, determining which values are to be communicated for multiply-accumulate (MAC) operations, the determination including directing performance of one of: communicating the current value in the lane; communicating the next value in the same lane where such value is non-zero; communicating a value from a step ahead in time where such value is non-zero; and communicating a value from a neighboring lane where such value is non-zero; and outputting the values of the MAC operations.
Type: Application
Filed: July 16, 2021
Publication date: August 31, 2023
Inventors: Mostafa Mahmoud, Andreas Moshovos
-
Publication number: 20230267376
Abstract: There is provided a system and method for compression and decompression of a data stream used by machine learning networks. The method includes: encoding each value in the data stream, including: determining a mapping to one of a plurality of non-overlapping ranges, each value encoded as a symbol representative of the range and a corresponding offset, and arithmetically coding the symbol using a probability count; storing a compressed data stream including the arithmetically coded symbols and the corresponding offsets; decoding the compressed data stream with arithmetic decoding using the probability count, the arithmetically decoded symbols using the offset bits to arrive at a decoded data stream; and communicating the decoded data stream for use by the machine learning networks.
Type: Application
Filed: January 11, 2023
Publication date: August 24, 2023
Inventors: Alberto Delmas Lascorz, Andreas Moshovos
-
Publication number: 20230186065
Abstract: A system for bit-serial computation in a neural network is described. The system may be embodied on an integrated circuit and include one or more bit-serial tiles for performing bit-serial computations in which each bit-serial tile receives input neurons and synapses, and communicates output neurons. Also included are an activation memory for storing the neurons and a dispatcher. The dispatcher reads neurons and synapses from memory and communicates either the neurons or the synapses bit-serially to the one or more bit-serial tiles. The other of the neurons or the synapses are communicated bit-parallelly to the one or more bit-serial tiles, or, according to a further embodiment, may also be communicated bit-serially to the one or more bit-serial tiles.
Type: Application
Filed: February 10, 2023
Publication date: June 15, 2023
Applicant: Samsung Electronics Co., Ltd.
Inventors: Patrick Judd, Jorge Albericio, Alberto Delmas Lascorz, Andreas Moshovos, Sayeh Sharify
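The bit-serial computation can be sketched in a few lines, assuming unsigned 8-bit activations streamed most-significant bit first while weights stay bit-parallel; the function name and bit width are choices made for the example. Note that narrower activations would take proportionally fewer serial cycles, which is the performance lever of this family of designs.

```python
def bit_serial_dot(activations, weights, bits=8):
    """Dot product with bit-serial activations and bit-parallel weights."""
    acc = 0
    for b in range(bits - 1, -1, -1):        # one serial cycle per bit, MSB first
        # each cycle adds the weights gated by the current activation bit
        gated = sum(w for a, w in zip(activations, weights) if (a >> b) & 1)
        acc = (acc << 1) + gated             # shift-and-add accumulator
    return acc

acts = [3, 0, 5, 1]
wts = [2, 7, 1, 4]
assert bit_serial_dot(acts, wts) == sum(a * w for a, w in zip(acts, wts))
print(bit_serial_dot(acts, wts))
```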
-
Publication number: 20230131251
Abstract: A system and method for memory compression for deep learning networks. The method includes: compacting an input data stream by identifying a bit width necessary to accommodate the value from the input data stream with the highest magnitude; storing the least significant bits of the input data stream in a first memory store, the number of bits equal to the bit width, wherein if the value requires more bits than those currently left unused in the first memory store, the remaining bits are written into a second memory store; outputting the value of the first memory store, as a consecutive part of a compressed data stream, with an associated width of the data in the first memory store when the first memory store becomes full and copying the value of the second memory store to the first memory store; and decompressing the compressed data stream.
Type: Application
Filed: November 10, 2022
Publication date: April 27, 2023
Inventors: Isak Edo Vivancos, Andreas Moshovos, Sayeh Sharifymoghaddam, Alberto Delmas Lascorz
-
Patent number: 11610100
Abstract: A system for bit-serial computation in a neural network is described. The system may be embodied on an integrated circuit and include one or more bit-serial tiles for performing bit-serial computations in which each bit-serial tile receives input neurons and synapses, and communicates output neurons. Also included are an activation memory for storing the neurons, a dispatcher, and a reducer. The dispatcher reads neurons and synapses from memory and communicates either the neurons or the synapses bit-serially to the one or more bit-serial tiles. The other of the neurons or the synapses are communicated bit-parallelly to the one or more bit-serial tiles, or, according to a further embodiment, may also be communicated bit-serially to the one or more bit-serial tiles. The reducer receives the output neurons from the one or more tiles, and communicates the output neurons to the activation memory.
Type: Grant
Filed: July 7, 2019
Date of Patent: March 21, 2023
Assignee: Samsung Electronics Co., Ltd.
Inventors: Patrick Judd, Jorge Albericio, Alberto Delmas Lascorz, Andreas Moshovos, Sayeh Sharifymoghaddam
-
Publication number: 20230070243
Abstract: There is provided a system and method for template matching for neural population pattern detection. The method includes: receiving neuron signal streams and serially associating a bit indicator with spikes from each neuron signal stream; serially determining a first summation (S1), a second summation (S2), and a third summation (S3) on the received neuron signals, the first summation including an element-wise multiply-sum using a time-dependent sliding indicator window on the received neuron signal streams and a template, the second summation including an accumulation using the time-dependent sliding indicator window, and the third summation including a sum of squares using the time-dependent sliding indicator window; and determining a correlation value associated with a match of the template with the received neuron signal streams, the correlation value determined by combining the first summation, the second summation, and the third summation with predetermined constants associated with the template.
Type: Application
Filed: July 20, 2022
Publication date: March 9, 2023
Inventors: Ameer Abd Elhadi, Ciaran Brochan Bannon, Andreas Moshovos, Hendrik Steenland
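One plausible reading of the three running sums is a sliding Pearson correlation, sketched below; the mapping of S1, S2, and S3 onto the correlation formula, the binary spike encoding, and the function name are assumptions for the example, not a statement of the patented datapath.

```python
import math

def sliding_correlation(stream, template):
    """Pearson correlation of a template against each sliding window,
    built from three running sums over the window."""
    n = len(template)
    t_sum = sum(template)                     # precomputed template constants
    t_sq = sum(t * t for t in template)
    t_den = n * t_sq - t_sum * t_sum
    out = []
    for i in range(len(stream) - n + 1):
        win = stream[i:i + n]
        s1 = sum(x * t for x, t in zip(win, template))  # element-wise multiply-sum
        s2 = sum(win)                                   # accumulation
        s3 = sum(x * x for x in win)                    # sum of squares
        den = (n * s3 - s2 * s2) * t_den
        out.append((n * s1 - s2 * t_sum) / math.sqrt(den) if den > 0 else 0.0)
    return out

spikes = [0, 1, 0, 0, 1, 1, 0, 1, 0, 0]     # bit indicators of spikes
template = [0, 1, 1, 0]
print(sliding_correlation(spikes, template))
```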
-
Publication number: 20220327367
Abstract: Described is a system, integrated circuit and method for reducing ineffectual computations in the processing of layers in a neural network. One or more tiles perform computations where each tile receives input neurons, offsets and synapses, and where each input neuron has an associated offset. Each tile generates output neurons, and there is also an activation memory for storing neurons in communication with the tiles via a dispatcher and an encoder. The dispatcher reads neurons from the activation memory and communicates the neurons to the tiles, and reads synapses from a memory and communicates the synapses to the tiles. The encoder receives the output neurons from the tiles, encodes them, and communicates the output neurons to the activation memory. The offsets are processed by the tiles in order to perform computations only on non-zero neurons. Optionally, synapses may be similarly processed to skip ineffectual operations.
Type: Application
Filed: June 22, 2022
Publication date: October 13, 2022
Applicant: Samsung Electronics Co., Ltd.
Inventors: Patrick Judd, Jorge Albericio, Alberto Delmas Lascorz, Andreas Moshovos, Sayeh Sharifymoghaddam
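A software analogue of the offset scheme, as a sketch under assumptions (the `encode`/`sparse_dot` names and the flat encoding are invented for the example): each non-zero neuron carries the offset of the synapse it pairs with, so only effectual multiply-accumulates are performed.

```python
def encode(neurons):
    """Keep only non-zero neurons, each paired with the offset of the
    synapse it must multiply; zero neurons generate no work at all."""
    return [(v, i) for i, v in enumerate(neurons) if v != 0]

def sparse_dot(encoded, synapses):
    # the tile multiplies each non-zero neuron with the synapse its
    # offset points at, skipping every ineffectual operation
    return sum(v * synapses[off] for v, off in encoded)

neurons = [0, 3, 0, 0, 2, 0, 1, 0]
synapses = [5, 1, 2, 8, 3, 9, 4, 7]
pairs = encode(neurons)
assert sparse_dot(pairs, synapses) == sum(n * s for n, s in zip(neurons, synapses))
print(pairs, sparse_dot(pairs, synapses))
```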
-
Patent number: 11423289
Abstract: Described is a system, integrated circuit and method for reducing ineffectual computations in the processing of layers in a neural network. One or more tiles perform computations where each tile receives input neurons, offsets and synapses, and where each input neuron has an associated offset. Each tile generates output neurons, and there is also an activation memory for storing neurons in communication with the tiles via a dispatcher and an encoder. The dispatcher reads neurons from the activation memory and communicates the neurons to the tiles, and reads synapses from a memory and communicates the synapses to the tiles. The encoder receives the output neurons from the tiles, encodes them, and communicates the output neurons to the activation memory. The offsets are processed by the tiles in order to perform computations only on non-zero neurons. Optionally, synapses may be similarly processed to skip ineffectual operations.
Type: Grant
Filed: June 14, 2017
Date of Patent: August 23, 2022
Assignee: Samsung Electronics Co., Ltd.
Inventors: Patrick Judd, Jorge Albericio, Alberto Delmas Lascorz, Andreas Moshovos, Sayeh Sharifymoghaddam
-
Publication number: 20220092382
Abstract: A method for memory storage including storing a neural network by storing values of the neural network each as a reference to a representative value; and, in some embodiments, storing additional values of the neural network. Each of the representative values can be generated by assigning each of the values of the neural network to a cluster and, for each cluster, selecting a centroid from the cluster. The method can include performing one or more multiply-accumulate operations A1B1 + … + AnBn on input vectors A and input vectors B, by accumulating input vectors A to an accumulated sum of input vectors A per input vector B having the same representative value and subsequently multiplying each of the accumulated sums of input vectors A by the representative value of the input vector B. A system is also described, as well as a method for configuring memory according to a data structure.
Type: Application
Filed: December 22, 2020
Publication date: March 24, 2022
Inventors: Andreas Moshovos, Ali Hadi Zadeh, Isak Edo Vivancos, Omar Mohamed Awad
-
Publication number: 20210125046
Abstract: Described is a neural network accelerator tile. It includes an activation memory interface for interfacing with an activation memory to receive a set of activation representations, a weight memory interface for interfacing with a weight memory to receive a set of weight representations, and a processing element. The processing element is configured to implement a one-hot encoder, a histogrammer, an aligner, a reducer, and an accumulation sub-element, which process the set of activation representations and the set of weight representations to produce a set of output representations.
Type: Application
Filed: April 25, 2019
Publication date: April 29, 2021
Inventors: Andreas Moshovos, Mostafa Mahmoud, Sayeh Sharifymoghaddam
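Reading the pipeline as a per-exponent histogram of one-hot product terms (an interpretation for illustration, not a statement of the patented design), here is a sketch with assumed unsigned 8-bit operands:

```python
def tile_mac(activations, weights, bits=8):
    """Multiply-accumulate via one-hot terms: every set bit pair of an
    activation and weight contributes a power of two; the histogrammer
    counts contributions per exponent, and the aligner/reducer shifts
    each bucket into the final accumulation."""
    hist = [0] * (2 * bits)                  # histogram buckets, one per exponent
    for a, w in zip(activations, weights):
        for i in range(bits):
            if not (a >> i) & 1:
                continue
            for j in range(bits):
                if (w >> j) & 1:             # one-hot term with exponent i + j
                    hist[i + j] += 1
    # align each bucket count by its exponent and reduce into one sum
    return sum(count << exp for exp, count in enumerate(hist))

acts = [3, 5, 0, 7]
wts = [2, 4, 9, 1]
assert tile_mac(acts, wts) == sum(a * w for a, w in zip(acts, wts))
print(tile_mac(acts, wts))
```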
-
Publication number: 20210004668
Abstract: Described is a neural network accelerator tile for exploiting input sparsity. The tile includes: a weight memory to supply each weight lane with a weight and weight selection metadata; an activation selection unit to receive a set of input activation values and rearrange them to supply each activation lane with a set of rearranged activation values; a set of multiplexers, at least one per pair of activation and weight lanes, where each multiplexer is configured to select a combination activation value for the activation lane from the lane's set of rearranged activation values based on the weight lane's weight selection metadata; and a set of combination units, at least one per multiplexer, where each combination unit is configured to combine the activation lane's combination value with the weight lane's weight to output a weight lane product.
Type: Application
Filed: February 15, 2019
Publication date: January 7, 2021
Inventors: Andreas Moshovos, Alberto Delmas Lascorz, Zisis Poulos, Dylan Malone Stuart, Patrick Judd, Sayeh Sharify, Mostafa Mahmoud, Milos Nikolic, Kevin Chong Man Siu, Jorge Albericio
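An offline software sketch of the weight-selection idea, with an assumed lookahead window of two (the `schedule_lanes`/`tile_dot` names and the greedy policy are invented for the example): later non-zero weights are promoted into zero slots, and the recorded source offset plays the role of the per-lane mux-select metadata.

```python
def schedule_lanes(weights, lookahead=2):
    """Promote later non-zero weights into earlier zero slots within a
    small window, recording the source offset as selection metadata."""
    slots, used = [], set()
    for i in range(len(weights)):
        if i in used:
            continue
        for j in range(i, min(i + lookahead + 1, len(weights))):
            if j not in used and weights[j] != 0:
                slots.append((weights[j], j))   # (weight, selection metadata)
                used.add(j)
                break
    return slots

def tile_dot(slots, activations):
    # each multiplexer uses the metadata to pick the matching activation
    return sum(w * activations[sel] for w, sel in slots)

weights = [0, 2, 0, 0, 5, 0, 0, 1]
acts = [3, 1, 4, 1, 5, 9, 2, 6]
slots = schedule_lanes(weights)
assert tile_dot(slots, acts) == sum(w * a for w, a in zip(weights, acts))
print(slots, tile_dot(slots, acts))
```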
-
Publication number: 20200125931
Abstract: A system for bit-serial computation in a neural network is described. The system may be embodied on an integrated circuit and include one or more bit-serial tiles for performing bit-serial computations in which each bit-serial tile receives input neurons and synapses, and communicates output neurons. Also included are an activation memory for storing the neurons, a dispatcher, and a reducer. The dispatcher reads neurons and synapses from memory and communicates either the neurons or the synapses bit-serially to the one or more bit-serial tiles. The other of the neurons or the synapses are communicated bit-parallelly to the one or more bit-serial tiles, or, according to a further embodiment, may also be communicated bit-serially to the one or more bit-serial tiles. The reducer receives the output neurons from the one or more tiles, and communicates the output neurons to the activation memory.
Type: Application
Filed: July 7, 2019
Publication date: April 23, 2020
Inventors: Patrick Judd, Jorge Albericio, Alberto Delmas Lascorz, Andreas Moshovos, Sayeh Sharify
-
Publication number: 20200065646
Abstract: A processor-implemented method implementing a convolution neural network includes: determining a plurality of differential groups by grouping a plurality of raw windows of an input feature map into the plurality of differential groups; determining differential windows by performing, for each respective differential group of the differential groups, a differential operation between the raw windows of the respective differential group; determining a reference element of an output feature map corresponding to a reference raw window among the raw windows by performing a convolution operation between a kernel and the reference raw window; and determining remaining elements of the output feature map by performing a reference element summation operation based on the reference element and each of a plurality of convolution operation results determined by performing respective convolution operations between the kernel and each of the differential windows.
Type: Application
Filed: August 23, 2019
Publication date: February 27, 2020
Applicants: Samsung Electronics Co., Ltd.; The Governing Council of the University of Toronto
Inventors: Mostafa Mahmoud, Andreas Moshovos