Patents by Inventor Joseph HASSOUN

Joseph HASSOUN has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 12001929
    Abstract: According to one general aspect, an apparatus may include a machine learning system. The machine learning system may include a precision determination circuit configured to: determine a precision level of data, and divide the data into data subdivisions. The machine learning system may exploit sparsity during the computation of each subdivision. The machine learning system may include a load balancing circuit configured to select a load balancing technique, wherein the load balancing technique includes alternately loading a computation circuit with at least a first data/weight subdivision combination and a second data/weight subdivision combination. The load balancing circuit may be configured to load the computation circuit with a selected data subdivision and a selected weight subdivision based, at least in part, upon the load balancing technique.
    Type: Grant
    Filed: June 10, 2020
    Date of Patent: June 4, 2024
    Assignee: SAMSUNG ELECTRONICS CO., LTD.
    Inventors: Hamzah Abdelaziz, Joseph Hassoun, Ali Shafiee Ardestani
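
    A minimal Python sketch of the flow this abstract describes: values are split into low/high sub-word "subdivisions", zero sub-words are skipped to exploit sparsity, and the data/weight subdivision combinations are iterated in an alternating order. All names and the 4-bit sub-word width are illustrative assumptions, not taken from the patent.

    ```python
    # Hypothetical sketch of the abstract above: split 8-bit values into
    # 4-bit subdivisions, skip zero subdivisions (sparsity), and combine
    # the data/weight subdivision pairs with alignment shifts.

    def subdivide(value, sub_bits=4):
        """Split an unsigned integer into (low, high) sub-words."""
        mask = (1 << sub_bits) - 1
        return value & mask, (value >> sub_bits) & mask

    def multiply_subdivided(data, weight):
        """Multiply two 8-bit values via their 4-bit subdivisions,
        skipping any combination containing a zero sub-word."""
        total = 0
        for i, d in enumerate(subdivide(data)):
            for j, w in enumerate(subdivide(weight)):
                if d == 0 or w == 0:
                    continue  # exploit sparsity: no work for zero sub-words
                total += (d * w) << (4 * (i + j))  # align the partial product
        return total

    assert multiply_subdivided(0x35, 0x12) == 0x35 * 0x12
    ```
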
  • Publication number: 20240162917
    Abstract: A runtime bit-plane data-format optimizer for a processing element includes a sparsity-detector and a compression-converter. The sparsity-detector selects a bit-plane compression-conversion format during a runtime of the processing element using a performance model that is based on a first sparsity pattern of first bit-plane data stored in a memory exterior to the processing element and a second sparsity pattern of second bit-plane data that is to be stored in a memory within the processing element. The second sparsity pattern is based on a runtime configuration of the processing element. The first bit-plane data is stored using a first bit-plane compression format and the second bit-plane data is to be stored using a second bit-plane compression format. The compression-converter converts the first bit-plane compression format of the first data to the second bit-plane compression format of the second data.
    Type: Application
    Filed: January 12, 2023
    Publication date: May 16, 2024
    Inventors: Jong Hoon SHIN, Ardavan PEDRAM, Joseph HASSOUN
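
    A toy illustration of the idea above, with invented format names and an invented cost rule: 8-bit values are decomposed into bit-planes, each plane's sparsity is measured, and a compression format is chosen per plane. The real performance model is not described at this level in the abstract.

    ```python
    # Hypothetical sketch: decompose values into bit-planes and pick a
    # per-plane compression format from a toy density-based "performance
    # model". Format names and the 0.25 threshold are invented.

    def bit_planes(values, bits=8):
        """One bit-list per bit position (a 'bit-plane')."""
        return [[(v >> b) & 1 for v in values] for b in range(bits)]

    def choose_format(plane):
        density = sum(plane) / len(plane)
        return "run-length" if density < 0.25 else "raw"

    values = [0, 3, 0, 0, 12, 0, 1, 0]
    for b, plane in enumerate(bit_planes(values)):
        print(f"bit-plane {b}: {plane} -> {choose_format(plane)}")
    ```
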
  • Publication number: 20240160483
    Abstract: An accelerator core includes first and second buffers and at least one group of k processing elements (PEs). The first buffer receives at least one group of block-wise sparsified first elements. A block size (k,c) of each group of block-wise sparsified first elements includes k rows and c columns in which k is greater than or equal to 2, k times p equals K, and c times q equals C, in which K is an output channel dimension of a tensor of first elements, C is a number of input channels of the tensor of first elements, p is an integer and q is an integer. The second buffer receives second elements. Each respective group of PEs receives k rows of first elements from a block of first elements corresponding to the group of PEs, and receives second elements that correspond to first elements received from the first buffer.
    Type: Application
    Filed: January 13, 2023
    Publication date: May 16, 2024
    Inventors: Hamzah ABDELAZIZ, Joseph HASSOUN
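
    To make the (k, c) blocking concrete, here is a small NumPy sketch under assumed dimensions (K=4, C=8, k=2, c=4); the tiling and per-PE row distribution mirror the abstract, while the dimensions themselves are illustrative.

    ```python
    import numpy as np

    # Hypothetical illustration: tile a K x C weight tensor into p x q
    # blocks of size k x c, then hand each group of k PEs the k rows of
    # one block. Dimensions are invented for the example.

    K, C = 4, 8            # output channels, input channels
    k, c = 2, 4            # block dimensions (k >= 2)
    p, q = K // k, C // c  # so k*p == K and c*q == C

    weights = np.arange(K * C).reshape(K, C)
    blocks = weights.reshape(p, k, q, c).swapaxes(1, 2)  # (p, q, k, c)

    for bi in range(p):
        for bj in range(q):
            for row in range(k):  # one row per PE in the group
                print(f"block ({bi},{bj}) -> PE {row}: {blocks[bi, bj, row]}")
    ```
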
  • Publication number: 20240162916
    Abstract: A runtime data-format optimizer for a processing element includes a sparsity-detector and a compression-converter. The sparsity-detector selects a first compression-conversion format during a runtime of the processing element based on a performance model that is based on a first sparsity pattern of first data stored in a first memory that is exterior to the processing element and a second sparsity pattern of second data that is to be stored in a second memory within the processing element. The second sparsity pattern is based on a runtime configuration of the processing element. The first data is stored in the first memory using a first compression format and the second data is to be stored in the second memory using a second compression format. The compression-converter converts the first compression format of the first data to the second compression format of the second data based on the first compression-conversion format.
    Type: Application
    Filed: January 12, 2023
    Publication date: May 16, 2024
    Inventors: Jong Hoon SHIN, Ardavan PEDRAM, Joseph HASSOUN
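
    A minimal sketch of the select-then-convert flow, assuming a toy cost model and invented format names; the optimizer's actual performance model and compression formats are not specified at this level in the abstract.

    ```python
    # Hypothetical sketch: a "sparsity detector" picks the internal
    # compression format from a toy model of density and runtime
    # configuration, and a converter re-encodes the external data.

    def detect_format(values, runtime_lanes):
        density = sum(v != 0 for v in values) / len(values)
        # Invented rule: dense layouts win at high density or wide
        # runtime configurations; otherwise use index-value pairs.
        return "dense" if density > 0.5 or runtime_lanes >= 8 else "coo"

    def convert(values, fmt):
        if fmt == "dense":
            return list(values)
        return [(i, v) for i, v in enumerate(values) if v != 0]

    external = [0, 7, 0, 0, 2, 0, 0, 0]        # data in exterior memory
    fmt = detect_format(external, runtime_lanes=4)
    print(fmt, convert(external, fmt))          # coo [(1, 7), (4, 2)]
    ```
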
  • Publication number: 20240119270
    Abstract: A neural processing unit is reconfigurable to process a fine-grain structured sparsity weight arrangement selected from N:M=1:4, 2:4, 2:8 and 4:8 fine-grain structured weight sparsity arrangements. A weight buffer stores weight values and a weight multiplexer array outputs one or more weight values stored in the weight buffer as first operand values based on a selected fine-grain structured sparsity weight arrangement. An activation buffer stores activation values and an activation multiplexer array outputs one or more activation values stored in the activation buffer as second operand values based on the selected fine-grain structured weight sparsity arrangement, in which each respective second operand value and a corresponding first operand value form an operand value pair. A multiplier array outputs a product value for each operand value pair.
    Type: Application
    Filed: November 3, 2022
    Publication date: April 11, 2024
    Inventors: Jong Hoon SHIN, Ardavan PEDRAM, Joseph HASSOUN
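
    The following sketch shows the 2:4 case of N:M operand selection: compressed weights carry per-group position indices, and those indices act as the multiplexer selects that gather the matching activations. The compression helper and its magnitude-based pruning rule are assumptions for illustration.

    ```python
    # Hypothetical 2:4 structured-sparsity sketch: keep 2 weights per
    # group of 4 along with their positions, then "mux" the matching
    # activations so only 2 of every 4 products are computed.

    def compress_2of4(weights):
        groups = []
        for g in range(0, len(weights), 4):
            group = weights[g:g + 4]
            keep = sorted(range(4), key=lambda i: -abs(group[i]))[:2]
            groups.append(sorted((i, group[i]) for i in keep))
        return groups

    def sparse_dot(compressed, activations):
        total = 0.0
        for g, group in enumerate(compressed):
            for i, w in group:                    # mux: select activation i
                total += w * activations[4 * g + i]
        return total

    w = [0.9, 0.0, -0.4, 0.1, 0.0, 0.2, 0.0, -0.8]
    a = [1, 2, 3, 4, 5, 6, 7, 8]
    print(sparse_dot(compress_2of4(w), a))        # -5.5
    ```
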
  • Publication number: 20240095505
    Abstract: A neural processing unit is disclosed that supports dual-sparsity modes. A weight buffer is configured to store weight values in an arrangement selected from a structured weight sparsity arrangement or a random weight sparsity arrangement. A weight multiplexer array is configured to output one or more weight values stored in the weight buffer as first operand values based on the selected weight sparsity arrangement. An activation buffer is configured to store activation values. An activation multiplexer array includes inputs coupled to the activation buffer, and is configured to output one or more activation values stored in the activation buffer as second operand values, in which each respective second operand value and a corresponding first operand value form an operand value pair. A multiplier array is configured to output a product value for each operand value pair.
    Type: Application
    Filed: November 3, 2022
    Publication date: March 21, 2024
    Inventors: Jong Hoon SHIN, Ardavan PEDRAM, Joseph HASSOUN
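
    As a complement to the structured case sketched earlier, this toy example shows a random-sparsity path: nonzero weights are packed alongside a bitmap, and operand pairs are formed by walking the bitmap. The packing scheme is an assumption; the abstract does not specify one.

    ```python
    # Hypothetical random-sparsity sketch: pack nonzero weights with a
    # bitmap and pair each with its corresponding activation.

    def pack_random(weights):
        bitmap = [int(w != 0) for w in weights]
        packed = [w for w in weights if w != 0]
        return bitmap, packed

    def operand_pairs(bitmap, packed, activations):
        """Yield (weight, activation) pairs for the multiplier array."""
        it = iter(packed)
        for pos, bit in enumerate(bitmap):
            if bit:
                yield next(it), activations[pos]

    bitmap, packed = pack_random([0, 5, 0, -3, 0, 0, 2, 0])
    pairs = list(operand_pairs(bitmap, packed, [1, 2, 3, 4, 5, 6, 7, 8]))
    print(pairs)                         # [(5, 2), (-3, 4), (2, 7)]
    print(sum(w * a for w, a in pairs))  # 12
    ```
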
  • Patent number: 11861327
    Abstract: A processor for fine-grain sparse integer and floating-point operations and method of operation thereof are provided. In some embodiments, the method includes forming a first set of products and forming a second set of products. The forming of the first set of products may include: multiplying, in a first multiplier, a first activation value by a least significant sub-word and a most significant sub-word of a first weight to form a first partial product and a second partial product; and adding the first partial product and the second partial product. The forming of the second set of products may include: multiplying, in the first multiplier, a second activation value by a first sub-word and a second sub-word of a mantissa to form a third partial product and a fourth partial product; and adding the third partial product and the fourth partial product.
    Type: Grant
    Filed: December 22, 2020
    Date of Patent: January 2, 2024
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Ali Shafiee Ardestani, Joseph Hassoun
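
    A sketch of the shared-multiplier idea in the abstract, under assumed widths: one narrow multiplier handles both the sub-words of an integer weight and the sub-words of a floating-point mantissa, with shifts recombining the partial products.

    ```python
    # Hypothetical sketch: multiply an activation by the least/most
    # significant 4-bit sub-words of a word and add the aligned partial
    # products; the same routine serves an integer weight or a mantissa.

    SUB_BITS = 4

    def mul_by_subwords(activation, word):
        lo = word & ((1 << SUB_BITS) - 1)
        hi = word >> SUB_BITS
        p_lo = activation * lo                # first partial product
        p_hi = activation * hi                # second partial product
        return p_lo + (p_hi << SUB_BITS)      # add with alignment shift

    # Integer path: an 8-bit weight.
    assert mul_by_subwords(19, 0xB7) == 19 * 0xB7

    # Floating-point path: the same multiplier applied to a mantissa,
    # with the exponent handled separately (illustrative values).
    mantissa, exponent = 0x9C, 3
    assert (mul_by_subwords(21, mantissa) << exponent) == (21 * 0x9C) << 3
    ```
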
  • Publication number: 20230289584
    Abstract: A method of flattening channel data of an input feature map in an inference system includes retrieving pixel values of a channel of a plurality of channels of the input feature map from a memory and storing the pixel values in a buffer, extracting first values of a first region having a first size from among the pixel values stored in the buffer, the first region corresponding to an overlap region of a kernel of the inference system with channel data of the input feature map, rearranging second values corresponding to the overlap region of the kernel from among the first values in the first region, and identifying a first group of consecutive values from among the rearranged second values for supplying to a first dot-product circuit of the inference system.
    Type: Application
    Filed: May 18, 2023
    Publication date: September 14, 2023
    Inventors: Ali Shafiee Ardestani, Joseph Hassoun
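
    A small NumPy sketch of the flattening step under invented dimensions: the kernel's overlap region is extracted from a buffered channel and rearranged into the consecutive values a dot-product circuit would consume.

    ```python
    import numpy as np

    # Hypothetical illustration: extract the kernel overlap region from
    # one buffered channel and flatten it row-major. Sizes are invented.

    channel = np.arange(36).reshape(6, 6)   # one channel in the buffer
    KH = KW = 3                             # kernel height/width

    def flatten_region(buf, row, col):
        region = buf[row:row + KH, col:col + KW]  # overlap with the kernel
        return region.reshape(-1)                 # consecutive values

    kernel = np.ones(KH * KW)
    window = flatten_region(channel, 1, 2)
    print(window)                   # 9 consecutive values
    print(np.dot(kernel, window))   # one dot-product circuit's output
    ```
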
  • Patent number: 11687764
    Abstract: A method of flattening channel data of an input feature map in an inference system includes retrieving pixel values of a channel of a plurality of channels of the input feature map from a memory and storing the pixel values in a buffer, extracting first values of a first region having a first size from among the pixel values stored in the buffer, the first region corresponding to an overlap region of a kernel of the inference system with channel data of the input feature map, rearranging second values corresponding to the overlap region of the kernel from among the first values in the first region, and identifying a first group of consecutive values from among the rearranged second values for supplying to a first dot-product circuit of the inference system.
    Type: Grant
    Filed: June 12, 2020
    Date of Patent: June 27, 2023
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Ali Shafiee Ardestani, Joseph Hassoun
  • Patent number: 11681907
    Abstract: A computation unit for performing a computation of a neural network layer is disclosed. A number of processing element (PE) units are arranged in an array. First input values are provided in parallel in an input dimension of the array during a first processing period, and second input values are provided in parallel in the input dimension during a second processing period. Computations are performed by the PE units based on stored weight values. An adder coupled to a first set of the PE units generates a first sum of results of the computations by the first set of PE units during the first processing period, and generates a second sum of results of the computations during the second processing period. A first accumulator coupled to the adder stores the first sum, and further shifts the first sum to a second accumulator prior to storing the second sum.
    Type: Grant
    Filed: October 14, 2022
    Date of Patent: June 20, 2023
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Hamzah Abdelaziz, Joseph Hassoun, Ali Shafiee Ardestani
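
    The accumulate-and-shift behavior can be modeled in a few lines; the sketch below assumes a single set of three PEs and a two-deep accumulator chain, which are illustrative choices.

    ```python
    # Hypothetical model of the abstract above: each processing period,
    # an adder sums the PE products, the first accumulator latches the
    # sum, and the prior sum is shifted to the next accumulator.

    weights = [2, -1, 3]      # stored per-PE weight values
    accumulators = [0, 0]     # accumulators[0] feeds accumulators[1]

    def process_period(inputs):
        products = [w * x for w, x in zip(weights, inputs)]  # PE outputs
        total = sum(products)                                # adder
        accumulators[1] = accumulators[0]  # shift the prior sum onward
        accumulators[0] = total            # store the new sum
        return total

    process_period([1, 1, 1])  # first period: 2 - 1 + 3 = 4
    process_period([0, 2, 1])  # second period: 0 - 2 + 3 = 1
    print(accumulators)        # [1, 4]: first sum shifted to the second
    ```
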
  • Publication number: 20230047273
    Abstract: A computation unit for performing a computation of a neural network layer is disclosed. A number of processing element (PE) units are arranged in an array. First input values are provided in parallel in an input dimension of the array during a first processing period, and second input values are provided in parallel in the input dimension during a second processing period. Computations are performed by the PE units based on stored weight values. An adder coupled to a first set of the PE units generates a first sum of results of the computations by the first set of PE units during the first processing period, and generates a second sum of results of the computations during the second processing period. A first accumulator coupled to the adder stores the first sum, and further shifts the first sum to a second accumulator prior to storing the second sum.
    Type: Application
    Filed: October 14, 2022
    Publication date: February 16, 2023
    Inventors: Hamzah Abdelaziz, Joseph Hassoun, Ali Shafiee Ardestani
  • Patent number: 11579842
    Abstract: An N×N multiplier may include an N/2×N first multiplier, an N/2×N/2 second multiplier, and an N/2×N/2 third multiplier. The N×N multiplier receives two operands to multiply. The first, second and/or third multipliers are selectively disabled if an operand equals zero or has a small value. If the operands are both less than 2^(N/2), the second or the third multiplier is used to multiply the operands. If one operand is less than 2^(N/2) and the other operand is equal to or greater than 2^(N/2), the first multiplier is used, or the second and third multipliers are used, to multiply the operands. If both operands are equal to or greater than 2^(N/2), the first, second and third multipliers are used to multiply the operands.
    Type: Grant
    Filed: January 15, 2021
    Date of Patent: February 14, 2023
    Inventors: Ilia Ovsiannikov, Ali Shafiee Ardestani, Joseph Hassoun, Lei Wang
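
    For N=8 the decomposition reads as follows; the helper functions stand in for the three hardware multipliers, and "disabling" a unit appears as simply not calling it. This is a behavioral sketch, not the patented circuit.

    ```python
    # Hypothetical model for N = 8: one 4x8 multiplier plus two 4x4
    # multipliers, choosing units by comparing operands against 2^(N/2).

    N = 8
    HALF = 1 << (N // 2)                # 2^(N/2) = 16

    def mul_small(a, b): return a * b   # stands in for an N/2 x N/2 unit
    def mul_wide(a, b):  return a * b   # stands in for the N/2 x N unit

    def nxn_multiply(x, y):
        if x < HALF and y < HALF:
            return mul_small(x, y)      # second or third multiplier alone
        if x < HALF:
            return mul_wide(x, y)       # first multiplier alone
        if y < HALF:
            return mul_wide(y, x)
        # Both operands >= 2^(N/2): use all three multipliers.
        x_lo, x_hi = x % HALF, x // HALF
        y_lo, y_hi = y % HALF, y // HALF
        return (mul_wide(x_lo, y) + (mul_small(x_hi, y_lo) << (N // 2))
                + (mul_small(x_hi, y_hi) << N))

    for x, y in [(5, 9), (5, 200), (200, 5), (200, 201)]:
        assert nxn_multiply(x, y) == x * y
    ```
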
  • Publication number: 20220414421
    Abstract: A system and method for handling processing with outliers. In some embodiments, the method includes: reading a first activation and a second activation, each including a least significant part and a most significant part, multiplying a first weight and a second weight by the respective activations, the multiplying of the first weight by the first activation including multiplying the first weight by the least significant part of the first activation in a first multiplier, the multiplying of the second weight by the second activation including: multiplying the second weight by the least significant part of the second activation in a second multiplier, and multiplying the second weight by the most significant part of the second activation in a shared multiplier, the shared multiplier being associated with a plurality of rows of an array of activations.
    Type: Application
    Filed: October 4, 2021
    Publication date: December 29, 2022
    Inventors: Ali Shafiee Ardestani, Hamzah Ahmed Ali Abdelaziz, Joseph Hassoun
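
    A sketch of the outlier path under assumed widths: every activation's least significant part goes through a per-row multiplier, while the occasional nonzero most significant part is queued for one multiplier shared across rows.

    ```python
    # Hypothetical sketch: split activations into LSB/MSB parts; the
    # per-row multipliers handle LSBs, and rare nonzero MSBs (outliers)
    # are serialized through a single shared multiplier.

    LSB_BITS = 4

    def split(activation):
        return activation & ((1 << LSB_BITS) - 1), activation >> LSB_BITS

    def row_products(weights, activations):
        shared_jobs = []   # (row, weight, msb) queued for the shared unit
        partial = []
        for row, (w, a) in enumerate(zip(weights, activations)):
            lsb, msb = split(a)
            partial.append(w * lsb)        # per-row multiplier
            if msb:                        # outlier activation
                shared_jobs.append((row, w, msb))
        for row, w, msb in shared_jobs:    # shared multiplier
            partial[row] += (w * msb) << LSB_BITS
        return partial

    print(row_products([3, 2, 5], [7, 130, 9]))  # only 130 is an outlier
    ```
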
  • Publication number: 20220374766
    Abstract: An architecture and method are disclosed to reduce computation in a self-attention model. The self-attention model is trained using multiple sub-models; each sub-model receives an input sequence of tokens; each input sequence of tokens is scored within each sub-model to provide a token score for each sub-model; and each sub-model has a predetermined threshold score. Each sub-model prunes tokens from the input sequence with a score below the predetermined threshold score for the sub-model. The pruned sequence of each sub-model is used as the input sequence for the next sub-model. The predetermined threshold scores differ for each sub-model.
    Type: Application
    Filed: January 18, 2022
    Publication date: November 24, 2022
    Inventors: David Philip Lloyd THORSLEY, Sheng SHEN, Se Hoon KIM, Amir GHOLAMINEJAD, Woosuk KWON, Joseph HASSOUN, Kurt KEUTZER
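
    The cascade can be sketched directly; the scoring function and threshold values below are toys standing in for the attention-derived token scores the abstract refers to.

    ```python
    # Hypothetical sketch of the cascading prune: each sub-model scores
    # its input tokens and drops those below its own threshold, and the
    # survivors become the next sub-model's input sequence.

    def score(tokens):
        """Toy stand-in for attention-derived importance scores."""
        return {t: len(t) / 10 for t in tokens}

    def cascade_prune(tokens, thresholds):
        for i, thresh in enumerate(thresholds):  # one per sub-model
            scores = score(tokens)
            tokens = [t for t in tokens if scores[t] >= thresh]
            print(f"after sub-model {i}: {tokens}")
        return tokens

    cascade_prune(["the", "transformer", "is", "pruned", "gradually"],
                  thresholds=[0.3, 0.5, 0.8])
    ```
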
  • Patent number: 11507817
    Abstract: A computation unit for performing a computation of a neural network layer is disclosed. A number of processing element (PE) units are arranged in an array. First input values are provided in parallel in an input dimension of the array during a first processing period, and second input values are provided in parallel in the input dimension during a second processing period. Computations are performed by the PE units based on stored weight values. An adder coupled to a first set of the PE units generates a first sum of results of the computations by the first set of PE units during the first processing period, and generates a second sum of results of the computations during the second processing period. A first accumulator coupled to the adder stores the first sum, and further shifts the first sum to a second accumulator prior to storing the second sum.
    Type: Grant
    Filed: June 12, 2020
    Date of Patent: November 22, 2022
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Hamzah Abdelaziz, Joseph Hassoun, Ali Shafiee Ardestani
  • Publication number: 20220147312
    Abstract: A processor for fine-grain sparse integer and floating-point operations and method of operation thereof are provided. In some embodiments, the method includes forming a first set of products and forming a second set of products. The forming of the first set of products may include: multiplying, in a first multiplier, a first activation value by a least significant sub-word and a most significant sub-word of a first weight to form a first partial product and a second partial product; and adding the first partial product and the second partial product. The forming of the second set of products may include: multiplying, in the first multiplier, a second activation value by a first sub-word and a second sub-word of a mantissa to form a third partial product and a fourth partial product; and adding the third partial product and the fourth partial product.
    Type: Application
    Filed: December 22, 2020
    Publication date: May 12, 2022
    Inventors: Ali Shafiee Ardestani, Joseph Hassoun
  • Publication number: 20220114425
    Abstract: A system and method for performing sets of multiplications in a manner that accommodates outlier values. In some embodiments the method includes: forming a first set of products, each product of the first set of products being a product of a first activation value and a respective weight of a first plurality of weights. The forming of the first set of products may include multiplying, in a first multiplier, the first activation value and a least significant sub-word of a first weight to form a first partial product; multiplying, in a second multiplier, the first activation value and a least significant sub-word of a second weight; multiplying, in a third multiplier, the first activation value and a most significant sub-word of the first weight to form a second partial product; and adding the first partial product and the second partial product.
    Type: Application
    Filed: December 2, 2020
    Publication date: April 14, 2022
    Inventors: Ali Shafiee Ardestani, Joseph Hassoun
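
    A sketch of one activation fanned out across several weights, with an extra multiplier engaged only for a weight whose most significant sub-word is nonzero; the widths and the outlier criterion are assumptions for illustration.

    ```python
    # Hypothetical sketch: multiply one activation by each weight's
    # least significant sub-word in a dedicated multiplier, adding a
    # further multiplier's result when the weight has a nonzero most
    # significant sub-word (an outlier weight).

    SUB = 4

    def products_for_activation(activation, weights):
        results = []
        for w in weights:
            lsb, msb = w & ((1 << SUB) - 1), w >> SUB
            p = activation * lsb                  # first partial product
            if msb:                               # outlier weight
                p += (activation * msb) << SUB    # second partial product
            results.append(p)
        return results

    assert products_for_activation(9, [7, 0x53]) == [9 * 7, 9 * 0x53]
    ```
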
  • Publication number: 20210326682
    Abstract: A method of flattening channel data of an input feature map in an inference system includes retrieving pixel values of a channel of a plurality of channels of the input feature map from a memory and storing the pixel values in a buffer, extracting first values of a first region having a first size from among the pixel values stored in the buffer, the first region corresponding to an overlap region of a kernel of the inference system with channel data of the input feature map, rearranging second values corresponding to the overlap region of the kernel from among the first values in the first region, and identifying a first group of consecutive values from among the rearranged second values for supplying to a first dot-product circuit of the inference system.
    Type: Application
    Filed: June 12, 2020
    Publication date: October 21, 2021
    Inventors: Ali Shafiee Ardestani, Joseph Hassoun
  • Publication number: 20210326686
    Abstract: A computation unit for performing a computation of a neural network layer is disclosed. A number of processing element (PE) units are arranged in an array. First input values are provided in parallel in an input dimension of the array during a first processing period, and second input values are provided in parallel in the input dimension during a second processing period. Computations are performed by the PE units based on stored weight values. An adder coupled to a first set of the PE units generates a first sum of results of the computations by the first set of PE units during the first processing period, and generates a second sum of results of the computations during the second processing period. A first accumulator coupled to the adder stores the first sum, and further shifts the first sum to a second accumulator prior to storing the second sum.
    Type: Application
    Filed: June 12, 2020
    Publication date: October 21, 2021
    Inventors: Hamzah Abdelaziz, Joseph Hassoun, Ali Shafiee Ardestani
  • Publication number: 20210312325
    Abstract: According to one general aspect, an apparatus may include a machine learning system. The machine learning system may include a precision determination circuit configured to: determine a precision level of data, and divide the data into data subdivisions. The machine learning system may exploit sparsity during the computation of each subdivision. The machine learning system may include a load balancing circuit configured to select a load balancing technique, wherein the load balancing technique includes alternately loading a computation circuit with at least a first data/weight subdivision combination and a second data/weight subdivision combination. The load balancing circuit may be configured to load the computation circuit with a selected data subdivision and a selected weight subdivision based, at least in part, upon the load balancing technique.
    Type: Application
    Filed: June 10, 2020
    Publication date: October 7, 2021
    Inventors: Hamzah ABDELAZIZ, Joseph HASSOUN, Ali SHAFIEE ARDESTANI