Patents by Inventor Andreas Moshovos

Andreas Moshovos has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11928566
    Abstract: There is provided a system and method for compression and decompression of a data stream used by machine learning networks. The method includes: encoding each value in the data stream, including: determining a mapping to one of a plurality of non-overlapping ranges, each value encoded as a symbol representative of the range and a corresponding offset; and arithmetically coding the symbol using a probability count; storing a compressed data stream including the arithmetically coded symbols and the corresponding offsets; decoding the compressed data stream with arithmetic decoding using the probability count, where the arithmetically decoded symbols use the offset bits to arrive at a decoded data stream; and communicating the decoded data stream for use by the machine learning networks.
    Type: Grant
    Filed: January 11, 2023
    Date of Patent: March 12, 2024
    Inventors: Alberto Delmas Lascorz, Andreas Moshovos
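
As an illustration of the encoding this family of filings describes, here is a minimal Python sketch of the range-and-offset decomposition. The power-of-two ranges, the function names, and the use of a Counter as a stand-in for the arithmetic coder's probability counts are assumptions for illustration, not the patented implementation:

```python
from collections import Counter

# Hypothetical non-overlapping power-of-two ranges: {0}, [1,2), [2,4), [4,8), ...
def encode_value(v):
    symbol = v.bit_length()                 # index of the range holding v
    low = 0 if symbol == 0 else 1 << (symbol - 1)
    return symbol, v - low                  # symbol plus offset inside the range

def decode_value(symbol, offset):
    low = 0 if symbol == 0 else 1 << (symbol - 1)
    return low + offset

values = [0, 3, 17, 2, 255, 1]
encoded = [encode_value(v) for v in values]
# Per-symbol probability counts would drive the arithmetic coder (elided here).
probability_counts = Counter(symbol for symbol, _ in encoded)
assert [decode_value(s, o) for s, o in encoded] == values
```
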
  • Publication number: 20240005213
    Abstract: There is provided a system and method for compression and decompression of a data stream used by machine learning networks. The method includes: encoding each value in the data stream, including: determining a mapping to one of a plurality of non-overlapping ranges, each value encoded as a symbol representative of the range and a corresponding offset; and arithmetically coding the symbol using a probability count; storing a compressed data stream including the arithmetically coded symbols and the corresponding offsets; decoding the compressed data stream with arithmetic decoding using the probability count, where the arithmetically decoded symbols use the offset bits to arrive at a decoded data stream; and communicating the decoded data stream for use by the machine learning networks.
    Type: Application
    Filed: September 14, 2023
    Publication date: January 4, 2024
    Inventors: Alberto Delmas Lascorz, Andreas Moshovos
  • Patent number: 11836971
    Abstract: A processor-implemented method implementing a convolution neural network includes: determining a plurality of differential groups by grouping a plurality of raw windows of an input feature map into the plurality of differential groups; determining differential windows by performing, for each respective differential group of the differential groups, a differential operation between the raw windows of the respective differential group; determining a reference element of an output feature map corresponding to a reference raw window among the raw windows by performing a convolution operation between a kernel and the reference raw window; and determining remaining elements of the output feature map by performing a reference element summation operation based on the reference element and each of a plurality of convolution operation results determined by performing respective convolution operations between the kernel and each of the differential windows.
    Type: Grant
    Filed: August 23, 2019
    Date of Patent: December 5, 2023
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Mostafa Mahmoud, Andreas Moshovos
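
A toy 1-D sketch of the differential-convolution identity behind this abstract: because convolution is linear, conv(kernel, window) = conv(kernel, reference) + conv(kernel, window − reference). The grouping, window size, and kernel values are hypothetical, and the patent targets 2-D feature maps:

```python
import numpy as np

kernel = np.array([1.0, -2.0, 0.5])
feature_map = np.arange(10, dtype=float)

K = len(kernel)
raw_windows = [feature_map[i:i + K] for i in range(len(feature_map) - K + 1)]

group = raw_windows[:4]                    # one differential group
reference = group[0]                       # its reference raw window
reference_element = np.dot(kernel, reference)

# Remaining elements: reference result plus a convolution with each differential window.
outputs = [reference_element] + [
    reference_element + np.dot(kernel, window - reference) for window in group[1:]
]
assert np.allclose(outputs, [np.dot(kernel, w) for w in group])
```
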
  • Publication number: 20230334285
    Abstract: A method for memory storage including storing a neural network by storing values of the neural network each as a reference to a representative value; and, in some embodiments, storing additional values of the neural network. Each of the representative values can be generated by assigning each of the values of the neural network to a cluster; and for each cluster, selecting a centroid from the cluster. The method can include performing one or more multiply-accumulate operations A1B1 + ... + AnBn on input vectors A and input vectors B, by accumulating input vectors A to an accumulated sum of input vectors A per input vector B having the same representative value and subsequently multiplying each of the accumulated sums of input vectors A by the representative value of the input vector B. A system is also described, as well as a method for configuring memory according to a data structure.
    Type: Application
    Filed: September 21, 2021
    Publication date: October 19, 2023
    Inventors: Andreas Moshovos, Ali Hadi Zadeh, Isak Edo Vivancos, Omar Mohamed Awad
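
A minimal sketch of the representative-value multiply-accumulate this abstract describes: the B values are stored as references (indices) into a small centroid table, the A values sharing a centroid are accumulated first, and one multiply per centroid finishes the sum. The centroid table and sizes are made up; a real implementation would derive the centroids by clustering the network's values:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random(1000)
centroids = np.array([0.1, 0.4, 0.7, 1.0])           # representative values
indices = rng.integers(len(centroids), size=1000)    # B stored as references
B = centroids[indices]

accumulated = np.zeros(len(centroids))
for a, i in zip(A, indices):
    accumulated[i] += a                  # accumulate A per representative value
result = np.dot(accumulated, centroids)  # one multiply per centroid

assert np.isclose(result, np.dot(A, B))
```
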
  • Publication number: 20230297337
    Abstract: A system and method for accelerating multiply-accumulate (MAC) floating-point units during training of deep learning networks. The method includes: receiving a first input data stream A and a second input data stream B; adding exponents of the first data stream A and the second data stream B in pairs to produce product exponents; determining a maximum exponent using a comparator; determining a number of bits by which each significand in the second data stream has to be shifted prior to accumulation by adding product exponent deltas to the corresponding term in the first data stream and using an adder tree to reduce the operands in the second data stream into a single partial sum; adding the partial sum to a corresponding aligned value using the maximum exponent to determine accumulated values; and outputting the accumulated values.
    Type: Application
    Filed: July 19, 2021
    Publication date: September 21, 2023
    Inventors: Omar Mohamed Awad, Mostafa Mahmoud, Andreas Moshovos
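
A scalar Python sketch of the alignment scheme in this abstract: exponents are added in pairs, every significand product is shifted to the maximum product exponent, the aligned terms are reduced, and the result is renormalised. math.frexp stands in for the hardware's significand/exponent split; significand widths and rounding behaviour are not modelled:

```python
import math

def aligned_mac(A, B):
    significands, exponents = zip(*(
        (math.frexp(a)[0] * math.frexp(b)[0],   # significand product
         math.frexp(a)[1] + math.frexp(b)[1])   # product exponent (sum of exponents)
        for a, b in zip(A, B)))
    e_max = max(exponents)                       # maximum exponent
    # Align each term to e_max, reduce, then renormalise by the maximum exponent.
    partial_sum = sum(s * 2.0 ** (e - e_max) for s, e in zip(significands, exponents))
    return partial_sum * 2.0 ** e_max

A, B = [1.5, -0.25, 3.0], [2.0, 8.0, 0.5]
assert math.isclose(aligned_mac(A, B), sum(a * b for a, b in zip(A, B)))
```
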
  • Publication number: 20230273828
    Abstract: A system and method for using sparsity to accelerate deep learning networks. The method includes: communicating a bit vector to a scheduler identifying which values in an input tensor are non-zero; for each lane of the input tensor, determining which values are to be communicated for multiply-accumulate (MAC) operations, the determination including directing performance of one of: communicating the current value in the lane; communicating the next value in the same lane where such value is non-zero; communicating a value from a step ahead in time where such value is non-zero; and communicating a value from a neighboring lane where such value is non-zero; and outputting the values of the MAC operations.
    Type: Application
    Filed: July 16, 2021
    Publication date: August 31, 2023
    Inventors: Mostafa Mahmoud, Andreas Moshovos
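
A greedy software model of the lane scheduling idea above. The candidate set per slot (the lane's own value, the next value in the same lane, a neighbour lane's value one step ahead) is a guess at one possible promotion window, and the clean-up pass for values the window misses is a simplification of what extra processing steps would handle in hardware:

```python
import numpy as np

def scheduled_dot(A, B):
    lanes, steps = A.shape
    consumed = np.zeros_like(A, dtype=bool)
    total = 0.0
    for t in range(steps):
        for lane in range(lanes):
            # Candidate values for this lane's MAC slot, in priority order:
            # its own value, the next value in the same lane, a neighbour's.
            for l, s in [(lane, t), (lane, t + 1), ((lane + 1) % lanes, t + 1)]:
                if s < steps and A[l, s] != 0 and not consumed[l, s]:
                    consumed[l, s] = True
                    total += A[l, s] * B[l, s]   # the paired B value moves too
                    break
    # Values the small window could not promote get extra steps in hardware;
    # here we just fold them in directly.
    leftover = (~consumed) & (A != 0)
    return total + float(np.sum(A[leftover] * B[leftover]))

rng = np.random.default_rng(1)
A = rng.integers(0, 3, size=(4, 8)).astype(float)   # sparse-ish input tensor
B = rng.random((4, 8))
assert np.isclose(scheduled_dot(A, B), np.sum(A * B))
```
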
  • Publication number: 20230267376
    Abstract: There is provided a system and method for compression and decompression of a data stream used by machine learning networks. The method includes: encoding each value in the data stream, including: determining a mapping to one of a plurality of non-overlapping ranges, each value encoded as a symbol representative of the range and a corresponding offset; and arithmetically coding the symbol using a probability count; storing a compressed data stream including the arithmetically coded symbols and the corresponding offsets; decoding the compressed data stream with arithmetic decoding using the probability count, where the arithmetically decoded symbols use the offset bits to arrive at a decoded data stream; and communicating the decoded data stream for use by the machine learning networks.
    Type: Application
    Filed: January 11, 2023
    Publication date: August 24, 2023
    Inventors: Alberto Delmas Lascorz, Andreas Moshovos
  • Publication number: 20230186065
    Abstract: A system for bit-serial computation in a neural network is described. The system may be embodied on an integrated circuit and include one or more bit-serial tiles for performing bit-serial computations in which each bit-serial tile receives input neurons and synapses, and communicates output neurons. Also included is an activation memory for storing the neurons and a dispatcher. The dispatcher reads neurons and synapses from memory and communicates either the neurons or the synapses bit-serially to the one or more bit-serial tiles. The other of the neurons or the synapses are communicated bit-parallelly to the one or more bit-serial tiles, or according to a further embodiment, may also be communicated bit-serially to the one or more bit-serial tiles.
    Type: Application
    Filed: February 10, 2023
    Publication date: June 15, 2023
    Applicant: Samsung Electronics Co., Ltd.
    Inventors: Patrick Judd, Jorge Albericio, Alberto Delmas Lascorz, Andreas Moshovos, Sayeh Sharify
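
The bit-serial computation this family of filings describes can be modelled in a few lines: one operand streams in one bit per cycle while the other stays bit-parallel, so each serial bit contributes a shifted add. The 8-bit width and the software framing are assumptions:

```python
def bit_serial_dot(activations, synapses, bits=8):
    acc = 0
    for b in range(bits):                       # one activation bit per cycle
        for a, w in zip(activations, synapses):
            if (a >> b) & 1:                    # serial bit of the activation
                acc += w << b                   # bit-parallel synapse, shifted
    return acc

acts, syns = [3, 0, 255, 12], [7, 5, 1, 2]
assert bit_serial_dot(acts, syns) == sum(a * w for a, w in zip(acts, syns))
```
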
  • Publication number: 20230131251
    Abstract: A system and method for memory compression for deep learning networks. The method includes: compacting an input data stream by identifying a bit width necessary to accommodate the value from the input data stream with the highest magnitude; storing the least significant bits of the input data stream in a first memory store, the number of bits equal to the bit width, wherein if the value requires more bits than those currently left unused in the first memory store, the remaining bits are written into a second memory store; and outputting the value of the first memory store, as a consecutive part of a compressed data stream, with an associated width of the data in the first memory store when the first memory store becomes full, and copying the value of the second memory store to the first memory store; and decompressing the compressed data stream.
    Type: Application
    Filed: November 10, 2022
    Publication date: April 27, 2023
    Inventors: Isak Edo Vivancos, Andreas Moshovos, Sayeh Sharifymoghaddam, Alberto Delmas Lascorz
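
A simplified model of the compaction step: each group of values is stored with just enough bits for its largest magnitude, preceded by a small width field, mirroring the "bit width necessary to accommodate the highest magnitude" idea. The group size, the 5-bit width field, and the use of a single Python integer as the bit stream are illustrative; the first/second memory-store spill mechanism is not modelled:

```python
def compress(values, group=4, width_field=5):
    """Pack each group of values with just enough bits for its largest one."""
    stream, nbits = 0, 0
    for i in range(0, len(values), group):
        chunk = values[i:i + group]
        width = max(1, max(v.bit_length() for v in chunk))
        for field, field_width in [(width, width_field)] + [(v, width) for v in chunk]:
            stream |= field << nbits          # append field, least-significant first
            nbits += field_width
    return stream, nbits

def decompress(stream, count, group=4, width_field=5):
    values = []
    while len(values) < count:
        width = stream & ((1 << width_field) - 1)   # read the group's bit width
        stream >>= width_field
        for _ in range(min(group, count - len(values))):
            values.append(stream & ((1 << width) - 1))
            stream >>= width
    return values

data = [3, 200, 7, 0, 1, 1, 2, 3]
packed, total_bits = compress(data)
assert decompress(packed, len(data)) == data
```
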
  • Patent number: 11610100
    Abstract: A system for bit-serial computation in a neural network is described. The system may be embodied on an integrated circuit and include one or more bit-serial tiles for performing bit-serial computations in which each bit-serial tile receives input neurons and synapses, and communicates output neurons. Also included is an activation memory for storing the neurons and a dispatcher and a reducer. The dispatcher reads neurons and synapses from memory and communicates either the neurons or the synapses bit-serially to the one or more bit-serial tiles. The other of the neurons or the synapses are communicated bit-parallelly to the one or more bit-serial tiles, or according to a further embodiment, may also be communicated bit-serially to the one or more bit-serial tiles. The reducer receives the output neurons from the one or more tiles, and communicates the output neurons to the activation memory.
    Type: Grant
    Filed: July 7, 2019
    Date of Patent: March 21, 2023
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Patrick Judd, Jorge Albericio, Alberto Delmas Lascorz, Andreas Moshovos, Sayeh Sharifymoghaddam
  • Publication number: 20230070243
    Abstract: There is provided a system and method for template matching for neural population pattern detection. The method includes: receiving neuron signal streams and serially associating a bit indicator with spikes from each neuron signal stream; serially determining a first summation (S1), a second summation (S2), and a third summation (S3) on the received neuron signals, the first summation including an element-wise multiply-sum using a time-dependent sliding indicator window on the received neuron signal streams and a template, the second summation including an accumulation using the time-dependent sliding indicator window, and the third summation including a sum of squares using the time-dependent sliding indicator window; and determining a correlation value associated with a match of the template with the received neuron signal streams, the correlation value determined by combining the first summation, the second summation, and the third summation with predetermined constants associated with the template.
    Type: Application
    Filed: July 20, 2022
    Publication date: March 9, 2023
    Inventors: Ameer Abd Elhadi, Ciaran Brochan Bannon, Andreas Moshovos, Hendrik Steenland
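
The three running sums map directly onto the standard sliding Pearson correlation, which is presumably what the combination with the template constants computes. A numpy sketch, with window handling, names, and sizes chosen for illustration:

```python
import numpy as np

def template_correlation(signal, template):
    n = len(template)
    t_sum, t_sq = template.sum(), np.sum(template ** 2)
    t_norm = np.sqrt(n * t_sq - t_sum ** 2)     # predetermined template constant
    scores = []
    for i in range(len(signal) - n + 1):        # time-dependent sliding window
        w = signal[i:i + n]
        S1 = np.dot(w, template)                # element-wise multiply-sum
        S2 = w.sum()                            # accumulation
        S3 = np.sum(w ** 2)                     # sum of squares
        scores.append((n * S1 - t_sum * S2) / (np.sqrt(n * S3 - S2 ** 2) * t_norm))
    return np.array(scores)

rng = np.random.default_rng(0)
signal, template = rng.normal(size=64), rng.normal(size=8)
reference = [np.corrcoef(signal[i:i + 8], template)[0, 1] for i in range(57)]
assert np.allclose(template_correlation(signal, template), reference)
```
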
  • Publication number: 20220327367
    Abstract: Described is a system, integrated circuit and method for reducing ineffectual computations in the processing of layers in a neural network. One or more tiles perform computations where each tile receives input neurons, offsets and synapses, and where each input neuron has an associated offset. Each tile generates output neurons, and there is also an activation memory for storing neurons in communication with the tiles via a dispatcher and an encoder. The dispatcher reads neurons from the activation memory and communicates the neurons to the tiles and reads synapses from a memory and communicates the synapses to the tiles. The encoder receives the output neurons from the tiles, encodes them and communicates the output neurons to the activation memory. The offsets are processed by the tiles in order to perform computations only on non-zero neurons. Optionally, synapses may be similarly processed to skip ineffectual operations.
    Type: Application
    Filed: June 22, 2022
    Publication date: October 13, 2022
    Applicant: Samsung Electronics Co., Ltd.
    Inventors: Patrick Judd, Jorge Albericio, Alberto Delmas Lascorz, Andreas Moshovos, Sayeh Sharifymoghaddam
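
A minimal model of the offset mechanism in this family of filings: only non-zero neurons are stored, each with its position offset, and the tile uses the offset to fetch the matching synapse, so zero neurons never occupy a multiplier. Names and sizes are illustrative:

```python
def encode_nonzero(neurons):
    """Store only non-zero neurons, each paired with its position offset."""
    return [(value, i) for i, value in enumerate(neurons) if value != 0]

def tile_dot(encoded_neurons, synapses):
    # one multiply per non-zero neuron; the offset indexes the synapse
    return sum(value * synapses[offset] for value, offset in encoded_neurons)

neurons = [0, 5, 0, 0, 2, 0, 7, 0]
synapses = [1, 2, 3, 4, 5, 6, 7, 8]
assert tile_dot(encode_nonzero(neurons), synapses) == \
       sum(n * s for n, s in zip(neurons, synapses))
```
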
  • Patent number: 11423289
    Abstract: Described is a system, integrated circuit and method for reducing ineffectual computations in the processing of layers in a neural network. One or more tiles perform computations where each tile receives input neurons, offsets and synapses, and where each input neuron has an associated offset. Each tile generates output neurons, and there is also an activation memory for storing neurons in communication with the tiles via a dispatcher and an encoder. The dispatcher reads neurons from the activation memory and communicates the neurons to the tiles and reads synapses from a memory and communicates the synapses to the tiles. The encoder receives the output neurons from the tiles, encodes them and communicates the output neurons to the activation memory. The offsets are processed by the tiles in order to perform computations only on non-zero neurons. Optionally, synapses may be similarly processed to skip ineffectual operations.
    Type: Grant
    Filed: June 14, 2017
    Date of Patent: August 23, 2022
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Patrick Judd, Jorge Albericio, Alberto Delmas Lascorz, Andreas Moshovos, Sayeh Sharifymoghaddam
  • Publication number: 20220092382
    Abstract: A method for memory storage including storing a neural network by storing values of the neural network each as a reference to a representative value; and, in some embodiments, storing additional values of the neural network. Each of the representative values can be generated by assigning each of the values of the neural network to a cluster; and for each cluster, selecting a centroid from the cluster. The method can include performing one or more multiply-accumulate operations A1B1 + ... + AnBn on input vectors A and input vectors B, by accumulating input vectors A to an accumulated sum of input vectors A per input vector B having the same representative value and subsequently multiplying each of the accumulated sums of input vectors A by the representative value of the input vector B. A system is also described, as well as a method for configuring memory according to a data structure.
    Type: Application
    Filed: December 22, 2020
    Publication date: March 24, 2022
    Inventors: Andreas Moshovos, Ali Hadi Zadeh, Isak Edo Vivancos, Omar Mohamed Awad
  • Publication number: 20210125046
    Abstract: Described is a neural network accelerator tile. It includes an activation memory interface for interfacing with an activation memory to receive a set of activation representations and a weight memory interface for interfacing with a weight memory to receive a set of weight representations, and a processing element. The processing element is configured to implement a one-hot encoder, a histogrammer, an aligner, a reducer, and an accumulation sub-element which process the set of activation representations and the set of weight representations to produce a set of output representations.
    Type: Application
    Filed: April 25, 2019
    Publication date: April 29, 2021
    Inventors: Andreas Moshovos, Mostafa Mahmoud, Sayeh Sharifymoghaddam
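
One plausible reading of the sub-elements, sketched below: the one-hot encoder exposes each set bit of an activation, the histogrammer bins weights by that bit's shift amount, and the aligner/reducer shift and sum the bins. This interpretation is an assumption based on the abstract's wording, not a confirmed description of the design:

```python
def histogram_dot(activations, weights, bits=8):
    histogram = [0] * bits
    for a, w in zip(activations, weights):
        for b in range(bits):
            if (a >> b) & 1:            # one-hot encoder: a set bit of a
                histogram[b] += w       # histogrammer: bin weights per shift
    # aligner + reducer: shift each bin into place and accumulate
    return sum(h << b for b, h in enumerate(histogram))

acts, wts = [3, 128, 6, 1], [2, 1, 5, 9]
assert histogram_dot(acts, wts) == sum(a * w for a, w in zip(acts, wts))
```
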
  • Publication number: 20210004668
    Abstract: Described is a neural network accelerator tile for exploiting input sparsity. The tile includes a weight memory to supply each weight lane with a weight and a weight selection metadata, an activation selection unit to receive a set of input activation values and rearrange the set of input activation values to supply each activation lane with a set of rearranged activation values, a set of multiplexers including at least one multiplexer per pair of activation and weight lanes, where each multiplexer is configured to select a combination activation value for the activation lane from the activation lane set of rearranged activation values based on the weight lane weight selection metadata, and a set of combination units including at least one combination unit per multiplexer, where each combination unit is configured to combine the activation lane combination value with the weight lane weight to output a weight lane product.
    Type: Application
    Filed: February 15, 2019
    Publication date: January 7, 2021
    Inventors: Andreas Moshovos, Alberto Delmas Lascorz, Zisis Poulos, Dylan Malone Stuart, Patrick Judd, Sayeh Sharify, Mostafa Mahmoud, Milos Nikolic, Kevin Chong Man Siu, Jorge Albericio
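
A software caricature of the selection path in this abstract: zero weights are squeezed out ahead of time and each surviving weight carries selection metadata that the per-lane multiplexer uses to pick its activation from the rearranged values. The unlimited selection window here is a simplification; the patent's multiplexers choose from a small rearranged set:

```python
def schedule_weights(weights):
    """Offline step: keep non-zero weights plus metadata naming each
    weight's matching activation position."""
    return [(w, i) for i, w in enumerate(weights) if w != 0]

def lane_dot(scheduled, activations):
    # each combination unit multiplies the weight by the muxed activation
    return sum(w * activations[metadata] for w, metadata in scheduled)

weights = [0, 0, 4, 0, -1, 0, 0, 3]
activations = [5, 1, 2, 8, 7, 6, 9, 2]
assert lane_dot(schedule_weights(weights), activations) == \
       sum(w * a for w, a in zip(weights, activations))
```
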
  • Publication number: 20200125931
    Abstract: A system for bit-serial computation in a neural network is described. The system may be embodied on an integrated circuit and include one or more bit-serial tiles for performing bit-serial computations in which each bit-serial tile receives input neurons and synapses, and communicates output neurons. Also included is an activation memory for storing the neurons and a dispatcher and a reducer. The dispatcher reads neurons and synapses from memory and communicates either the neurons or the synapses bit-serially to the one or more bit-serial tiles. The other of the neurons or the synapses are communicated bit-parallelly to the one or more bit-serial tiles, or according to a further embodiment, may also be communicated bit-serially to the one or more bit-serial tiles. The reducer receives the output neurons from the one or more tiles, and communicates the output neurons to the activation memory.
    Type: Application
    Filed: July 7, 2019
    Publication date: April 23, 2020
    Inventors: Patrick Judd, Jorge Albericio, Alberto Delmas Lascorz, Andreas Moshovos, Sayeh Sharify
  • Publication number: 20200065646
    Abstract: A processor-implemented method implementing a convolution neural network includes: determining a plurality of differential groups by grouping a plurality of raw windows of an input feature map into the plurality of differential groups; determining differential windows by performing, for each respective differential group of the differential groups, a differential operation between the raw windows of the respective differential group; determining a reference element of an output feature map corresponding to a reference raw window among the raw windows by performing a convolution operation between a kernel and the reference raw window; and determining remaining elements of the output feature map by performing a reference element summation operation based on the reference element and each of a plurality of convolution operation results determined by performing respective convolution operations between the kernel and each of the differential windows.
    Type: Application
    Filed: August 23, 2019
    Publication date: February 27, 2020
    Applicants: Samsung Electronics Co., Ltd., The Governing Council of the University of Toronto
    Inventors: Mostafa Mahmoud, Andreas Moshovos
  • Patent number: 10387771
    Abstract: A system for bit-serial computation in a neural network is described. The system may be embodied on an integrated circuit and include one or more bit-serial tiles for performing bit-serial computations in which each bit-serial tile receives input neurons and synapses, and communicates output neurons. Also included is an activation memory for storing the neurons and a dispatcher and a reducer. The dispatcher reads neurons and synapses from memory and communicates either the neurons or the synapses bit-serially to the one or more bit-serial tiles. The other of the neurons or the synapses are communicated bit-parallelly to the one or more bit-serial tiles, or according to a further embodiment, may also be communicated bit-serially to the one or more bit-serial tiles. The reducer receives the output neurons from the one or more tiles, and communicates the output neurons to the activation memory.
    Type: Grant
    Filed: May 26, 2017
    Date of Patent: August 20, 2019
    Inventors: Patrick Judd, Jorge Albericio, Alberto Delmas Lascorz, Andreas Moshovos, Sayeh Sharify
  • Publication number: 20190205740
    Abstract: Described is a system, integrated circuit and method for reducing ineffectual computations in the processing of layers in a neural network. One or more tiles perform computations where each tile receives input neurons, offsets and synapses, and where each input neuron has an associated offset. Each tile generates output neurons, and there is also an activation memory for storing neurons in communication with the tiles via a dispatcher and an encoder. The dispatcher reads neurons from the activation memory and communicates the neurons to the tiles and reads synapses from a memory and communicates the synapses to the tiles. The encoder receives the output neurons from the tiles, encodes them and communicates the output neurons to the activation memory. The offsets are processed by the tiles in order to perform computations only on non-zero neurons. Optionally, synapses may be similarly processed to skip ineffectual operations.
    Type: Application
    Filed: June 14, 2017
    Publication date: July 4, 2019
    Applicant: The Governing Council of the University of Toronto
    Inventors: Patrick Judd, Jorge Albericio, Alberto Delmas Lascorz, Andreas Moshovos, Sayeh Sharify