Patents by Inventor Joseph H. Hassoun

Joseph H. Hassoun has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Patent number: 11954574
    Abstract: A neural processor. In some embodiments, the processor includes a first tile, a second tile, a memory, and a bus. The bus may be connected to the memory, the first tile, and the second tile. The first tile may include: a first weight register, a second weight register, an activations buffer, a first multiplier, and a second multiplier. The activations buffer may be configured to include: a first queue connected to the first multiplier and a second queue connected to the second multiplier. The first queue may include a first register and a second register adjacent to the first register, the first register being an output register of the first queue. The first tile may be configured: in a first state: to multiply, in the first multiplier, a first weight by an activation from the output register of the first queue, and in a second state: to multiply, in the first multiplier, the first weight by an activation from the second register of the first queue.
    Type: Grant
    Filed: June 19, 2019
    Date of Patent: April 9, 2024
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Ilia Ovsiannikov, Ali Shafiee Ardestani, Joseph H. Hassoun, Lei Wang, Sehwan Lee, JoonHo Song, Jun-Woo Jang, Yibing Michelle Wang, Yuecheng Li
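
The two-state read in patent 11954574 lets a multiplier pull an activation from one register deeper in its queue. A minimal sketch follows, assuming (the abstract does not say this) that the second state is entered to skip a zero activation sitting in the output register; `tile_multiply` and the deque stand-in for the hardware queue are illustrative, not the patented design.

```python
from collections import deque

def tile_multiply(weight, activation_queue):
    """Sketch of the two-state read: in the first state the multiplier
    consumes the activation in the queue's output register; in the second
    state (assumed here: when the output register holds a skippable zero)
    it consumes the activation in the adjacent second register instead."""
    head = activation_queue[0]                  # output register
    if head != 0:                               # first state
        product = weight * head
        activation_queue.popleft()
    else:                                       # second state
        product = weight * activation_queue[1]  # adjacent second register
        activation_queue.popleft()              # discard the skipped zero
        activation_queue.popleft()
    return product

queue = deque([0, 3, 5])                        # activations, head first
print(tile_multiply(2, queue))                  # -> 6 (zero head skipped)
```
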
  • Publication number: 20240095519
    Abstract: A neural network inference accelerator includes first and second neural processing units (NPUs) and a sparsity management unit. The first NPU receives activation and weight tensors based on an activation sparsity density and a weight sparsity density both being greater than a predetermined sparsity density. The second NPU receives activation and weight tensors based on at least one of the activation sparsity density and the weight sparsity density being less than or equal to the predetermined sparsity density. The sparsity management unit controls transfer of the activation tensor and the weight tensor based on the activation sparsity density and the weight sparsity density with respect to the predetermined sparsity density.
    Type: Application
    Filed: November 17, 2022
    Publication date: March 21, 2024
    Inventors: Ardavan PEDRAM, Ali SHAFIEE ARDESTANI, Jong Hoon SHIN, Joseph H. HASSOUN
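
A minimal sketch of the routing decision in this application, assuming density is measured as the fraction of non-zero elements and using an arbitrary 0.5 placeholder for the predetermined sparsity density; the NPU labels and the `route` helper are illustrative.

```python
import numpy as np

def density(t):
    """Fraction of non-zero elements in a tensor."""
    return np.count_nonzero(t) / t.size

def route(activation, weight, threshold=0.5):
    """Send the tensor pair to the dense-path NPU only when BOTH densities
    exceed the predetermined threshold, otherwise to the NPU that exploits
    sparsity, mirroring the sparsity management unit described above."""
    if density(activation) > threshold and density(weight) > threshold:
        return "NPU1"   # dense path
    return "NPU2"       # sparse path

a = np.array([1.0, 0.0, 2.0, 0.0])
w = np.array([0.5, 1.5, 0.0, 2.5])
print(route(a, w))      # -> "NPU2": activation density is only 0.5
```
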
  • Publication number: 20240095518
    Abstract: A memory system and a method are disclosed for training a neural network model. A decompressor unit decompresses an activation tensor to a first predetermined sparsity density based on the activation tensor being compressed, and decompresses a weight tensor to a second predetermined sparsity density based on the weight tensor being compressed. A buffer unit receives the activation tensor at the first predetermined sparsity density and the weight tensor at the second predetermined sparsity density. A neural processing unit receives the activation tensor and the weight tensor from the buffer unit and computes a result for the activation tensor and the weight tensor based on the first predetermined sparsity density of the activation tensor and based on the second predetermined sparsity density of the weight tensor.
    Type: Application
    Filed: November 16, 2022
    Publication date: March 21, 2024
    Inventors: Ardavan PEDRAM, Jong Hoon SHIN, Joseph H. HASSOUN
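
A sketch of the decompressor stage in this application, assuming a (non-zero values, bitmask) compression format, which the abstract does not specify; the resulting sparsity density is fixed by the mask.

```python
import numpy as np

def decompress(values, mask):
    """Expand a (non-zero values, bitmask) pair back into a dense tensor
    whose sparsity density equals the fraction of ones in the mask."""
    dense = np.zeros(mask.shape, dtype=values.dtype)
    dense[mask.astype(bool)] = values
    return dense

vals = np.array([3.0, 1.0])
mask = np.array([0, 1, 0, 1])           # density 2/4 = 0.5
print(decompress(vals, mask))           # -> [0. 3. 0. 1.]
```
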
  • Patent number: 11880760
    Abstract: A processor to perform inference on deep learning neural network models. In some embodiments, the processor includes: a first tile, a second tile, a memory, and a bus, the bus being connected to: the memory, the first tile, and the second tile, the first tile including: a first weight register, a second weight register, an activations cache, a shuffler, an activations buffer, a first multiplier, and a second multiplier, the activations buffer being configured to include: a first queue connected to the first multiplier, and a second queue connected to the second multiplier, the activations cache including a plurality of independent lanes, each of the independent lanes being randomly accessible, the first tile being configured: to receive a tensor including a plurality of two-dimensional arrays, each representing one color component of an image; and to perform a convolution of a kernel with one of the two-dimensional arrays.
    Type: Grant
    Filed: April 3, 2020
    Date of Patent: January 23, 2024
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Ilia Ovsiannikov, Ali Shafiee Ardestani, Hamzah Ahmed Ali Abdelaziz, Joseph H. Hassoun
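
The per-plane operation named at the end of the abstract, convolving a kernel with one two-dimensional color component, can be sketched directly; the valid-mode, cross-correlation-style loop below follows the textbook CNN convention and is not the tile's actual datapath.

```python
import numpy as np

def conv2d_plane(plane, kernel):
    """Valid-mode 2-D convolution (cross-correlation, per CNN convention)
    of a kernel with one color component of the input tensor."""
    kh, kw = kernel.shape
    h, w = plane.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(plane[i:i + kh, j:j + kw] * kernel)
    return out

rgb = np.random.rand(3, 8, 8)           # tensor of three color planes
k = np.ones((3, 3)) / 9.0               # 3x3 averaging kernel
print(conv2d_plane(rgb[0], k).shape)    # -> (6, 6)
```
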
  • Patent number: 11861328
    Abstract: A processor for fine-grain sparse integer and floating-point operations and method of operation thereof are provided. In some embodiments, the method includes forming a first set of products, and forming a second set of products. The forming of the first set of products may include: multiplying, in a first multiplier, a second multiplier, and a third multiplier, a first activation value by a first least significant sub-word, a second least significant sub-word, and a most significant sub-word; and adding a first resulting partial product and a second resulting partial product. The forming of the second set of products may include forming a first floating point product, the forming of the first floating point product including multiplying, in the first multiplier, a first sub-word of a mantissa of an activation value by a first sub-word of a mantissa of a weight, to form a third partial product.
    Type: Grant
    Filed: December 23, 2020
    Date of Patent: January 2, 2024
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Ali Shafiee Ardestani, Joseph H. Hassoun
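
The integer half of patent 11861328's scheme amounts to splitting the weight into sub-words, multiplying each by the activation in a small multiplier, and shift-adding the partial products. A sketch, assuming 4-bit sub-words and a 12-bit weight (widths the abstract does not fix):

```python
def subword_multiply(activation, weight, bits=4):
    """Split the weight into three sub-words, multiply the activation by
    each in a small multiplier, then shift and add the partial products."""
    mask = (1 << bits) - 1
    lsw1 = weight & mask                 # first least significant sub-word
    lsw2 = (weight >> bits) & mask       # second least significant sub-word
    msw = weight >> (2 * bits)           # most significant sub-word
    p1 = activation * lsw1
    p2 = activation * lsw2
    p3 = activation * msw
    return p1 + (p2 << bits) + (p3 << (2 * bits))

assert subword_multiply(7, 0xABC) == 7 * 0xABC
print(subword_multiply(7, 0xABC))
```
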
  • Publication number: 20230351151
    Abstract: A neural processor. In some embodiments, the processor includes a first tile, a second tile, a memory, and a bus. The bus may be connected to the memory, the first tile, and the second tile. The first tile may include: a first weight register, a second weight register, an activations buffer, a first multiplier, and a second multiplier. The activations buffer may be configured to include: a first queue connected to the first multiplier and a second queue connected to the second multiplier. The first queue may include a first register and a second register adjacent to the first register, the first register being an output register of the first queue. The first tile may be configured: in a first state: to multiply, in the first multiplier, a first weight by an activation from the output register of the first queue, and in a second state: to multiply, in the first multiplier, the first weight by an activation from the second register of the first queue.
    Type: Application
    Filed: July 10, 2023
    Publication date: November 2, 2023
    Inventors: Ilia Ovsiannikov, Ali Shafiee Ardestani, Joseph H. Hassoun, Lei Wang, Sehwan Lee, JoonHo Song, Jun-Woo Jang, Yibing Michelle Wang, Yuecheng Li
  • Patent number: 11783161
    Abstract: A neural processor. In some embodiments, the processor includes a first tile, a second tile, a memory, and a bus. The bus may be connected to the memory, the first tile, and the second tile. The first tile may include: a first weight register, a second weight register, an activations buffer, a first multiplier, and a second multiplier. The activations buffer may be configured to include: a first queue connected to the first multiplier and a second queue connected to the second multiplier. The first queue may include a first register and a second register adjacent to the first register, the first register being an output register of the first queue. The first tile may be configured: in a first state: to multiply, in the first multiplier, a first weight by an activation from the output register of the first queue, and in a second state: to multiply, in the first multiplier, the first weight by an activation from the second register of the first queue.
    Type: Grant
    Filed: June 19, 2019
    Date of Patent: October 10, 2023
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Ilia Ovsiannikov, Ali Shafiee Ardestani, Joseph H. Hassoun, Lei Wang, Sehwan Lee, JoonHo Song, Jun-Woo Jang, Yibing Michelle Wang, Yuecheng Li
  • Patent number: 11783162
    Abstract: A neural processor. In some embodiments, the processor includes a first tile, a second tile, a memory, and a bus. The bus may be connected to the memory, the first tile, and the second tile. The first tile may include: a first weight register, a second weight register, an activations buffer, a first multiplier, and a second multiplier. The activations buffer may be configured to include: a first queue connected to the first multiplier and a second queue connected to the second multiplier. The first queue may include a first register and a second register adjacent to the first register, the first register being an output register of the first queue. The first tile may be configured: in a first state: to multiply, in the first multiplier, a first weight by an activation from the output register of the first queue, and in a second state: to multiply, in the first multiplier, the first weight by an activation from the second register of the first queue.
    Type: Grant
    Filed: August 27, 2019
    Date of Patent: October 10, 2023
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Ilia Ovsiannikov, Ali Shafiee Ardestani, Joseph H. Hassoun, Lei Wang, Sehwan Lee, JoonHo Song, Jun-Woo Jang, Yibing Michelle Wang, Yuecheng Li
  • Patent number: 11775801
    Abstract: A neural processor. In some embodiments, the processor includes a first tile, a second tile, a memory, and a bus. The bus may be connected to the memory, the first tile, and the second tile. The first tile may include: a first weight register, a second weight register, an activations buffer, a first multiplier, and a second multiplier. The activations buffer may be configured to include: a first queue connected to the first multiplier and a second queue connected to the second multiplier. The first queue may include a first register and a second register adjacent to the first register, the first register being an output register of the first queue. The first tile may be configured: in a first state: to multiply, in the first multiplier, a first weight by an activation from the output register of the first queue, and in a second state: to multiply, in the first multiplier, the first weight by an activation from the second register of the first queue.
    Type: Grant
    Filed: August 27, 2019
    Date of Patent: October 3, 2023
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Ilia Ovsiannikov, Ali Shafiee Ardestani, Joseph H. Hassoun, Lei Wang, Sehwan Lee, JoonHo Song, Jun-Woo Jang, Yibing Michelle Wang, Yuecheng Li
  • Patent number: 11775256
    Abstract: An N×N multiplier may include an N/2×N first multiplier, an N/2×N/2 second multiplier, and an N/2×N/2 third multiplier. The N×N multiplier receives two operands to multiply. The first, second, and/or third multipliers are selectively disabled if an operand equals zero or has a small value. If both operands are less than 2^(N/2), the second or the third multiplier is used to multiply the operands. If one operand is less than 2^(N/2) and the other operand is equal to or greater than 2^(N/2), either the first multiplier is used, or the second and third multipliers are used, to multiply the operands. If both operands are equal to or greater than 2^(N/2), the first, second, and third multipliers are used to multiply the operands.
    Type: Grant
    Filed: January 12, 2023
    Date of Patent: October 3, 2023
    Inventors: Ilia Ovsiannikov, Ali Shafiee Ardestani, Joseph H. Hassoun, Lei Wang
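
The case analysis in patent 11775256's abstract rests on the schoolbook identity x·y = (xh·2^(N/2) + xl)·(yh·2^(N/2) + yl). The sketch below checks the arithmetic; how the partial products map onto the three hardware multipliers (and their selective disabling) is simplified away.

```python
def nxn_multiply(x, y, n=16):
    """Multiply two N-bit operands by splitting each into high and low
    N/2-bit halves and combining N/2-wide partial products with shifts."""
    half = n // 2
    lim = 1 << half                      # 2^(N/2)
    if x == 0 or y == 0:
        return 0                         # all multipliers disabled
    if x < lim and y < lim:
        return x * y                     # one N/2 x N/2 multiplier suffices
    xh, xl = x >> half, x & (lim - 1)
    yh, yl = y >> half, y & (lim - 1)
    return ((xh * yh) << n) + ((xh * yl + xl * yh) << half) + xl * yl

assert nxn_multiply(40000, 123) == 40000 * 123
print(nxn_multiply(40000, 123))
```
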
  • Patent number: 11775611
    Abstract: In some embodiments, a method of quantizing an artificial neural network includes dividing a quantization range for a tensor of the artificial neural network into a first region and a second region, and quantizing values of the tensor in the first region separately from values of the tensor in the second region. In some embodiments, linear or nonlinear quantization is applied to values of the tensor in the first region and the second region. In some embodiments, the method includes locating a breakpoint between the first region and the second region by substantially minimizing an expected quantization error over at least a portion of the quantization range. In some embodiments, the expected quantization error is minimized by solving analytically and/or searching numerically.
    Type: Grant
    Filed: March 11, 2020
    Date of Patent: October 3, 2023
    Inventors: Jun Fang, Joseph H. Hassoun, Ali Shafiee Ardestani, Hamzah Ahmed Ali Abdelaziz, Georgios Georgiadis, Hui Chen, David Philip Lloyd Thorsley
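
A sketch of the two-region idea in patent 11775611, assuming uniform grids in each region and a brute-force numeric breakpoint search minimizing mean squared error; the level counts and the candidate grid are illustrative.

```python
import numpy as np

def quantize_region(x, lo, hi, levels):
    """Uniform quantization of x over [lo, hi] with the given level count."""
    step = (hi - lo) / (levels - 1)
    return lo + np.round((np.clip(x, lo, hi) - lo) / step) * step

def two_region_quantize(x, breakpoint_, levels=16):
    """Quantize values below the breakpoint separately from values above
    it, each region getting its own uniform grid."""
    lo, hi = x.min(), x.max()
    out = np.empty_like(x)
    out[x < breakpoint_] = quantize_region(x[x < breakpoint_],
                                           lo, breakpoint_, levels)
    out[x >= breakpoint_] = quantize_region(x[x >= breakpoint_],
                                            breakpoint_, hi, levels)
    return out

def search_breakpoint(x, candidates, levels=16):
    """Numeric search for the breakpoint minimizing the expected (MSE)
    quantization error, one of the strategies the abstract mentions."""
    errs = [np.mean((x - two_region_quantize(x, b, levels)) ** 2)
            for b in candidates]
    return candidates[int(np.argmin(errs))]

x = np.random.laplace(size=10000)        # heavy-tailed, like real tensors
bps = np.linspace(x.min() + 1e-3, x.max() - 1e-3, 50)
print(search_breakpoint(x, bps))
```
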
  • Patent number: 11775802
    Abstract: A neural processor. In some embodiments, the processor includes a first tile, a second tile, a memory, and a bus. The bus may be connected to the memory, the first tile, and the second tile. The first tile may include: a first weight register, a second weight register, an activations buffer, a first multiplier, and a second multiplier. The activations buffer may be configured to include: a first queue connected to the first multiplier and a second queue connected to the second multiplier. The first queue may include a first register and a second register adjacent to the first register, the first register being an output register of the first queue. The first tile may be configured: in a first state: to multiply, in the first multiplier, a first weight by an activation from the output register of the first queue, and in a second state: to multiply, in the first multiplier, the first weight by an activation from the second register of the first queue.
    Type: Grant
    Filed: August 27, 2019
    Date of Patent: October 3, 2023
    Assignee: Samsung Electronics Co., Ltd.
    Inventors: Ilia Ovsiannikov, Ali Shafiee Ardestani, Joseph H. Hassoun, Lei Wang, Sehwan Lee, JoonHo Song, Jun-Woo Jang, Yibing Michelle Wang, Yuecheng Li
  • Publication number: 20230205488
    Abstract: A system and method for efficient processing for neural network inference operations. In some embodiments, the system includes a circuit configured to multiply a first number by a second number, the first number being represented as a sign bit, five exponent bits, and seven mantissa bits representing an eight-bit full mantissa.
    Type: Application
    Filed: January 6, 2022
    Publication date: June 29, 2023
    Inventors: Ling LI, Ali SHAFIEE ARDESTANI, Hamzah ABDELAZIZ, Joseph H. HASSOUN
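
Decoding the 13-bit format the abstract describes is straightforward once the implicit leading mantissa bit is restored. A sketch, assuming an IEEE-style exponent bias of 15 and a zero encoding, neither of which the abstract specifies:

```python
def decode_fp13(bits):
    """Decode 1 sign bit, 5 exponent bits, and 7 stored mantissa bits;
    an implicit leading 1 makes the eight-bit full mantissa. The bias of
    15 and the all-zero encoding of 0.0 are assumptions."""
    sign = (bits >> 12) & 0x1
    exp = (bits >> 7) & 0x1F
    mant = bits & 0x7F
    if exp == 0 and mant == 0:
        return 0.0                        # assumed encoding of zero
    full_mant = (1 << 7) | mant           # eight-bit full mantissa
    value = full_mant / 128.0 * 2.0 ** (exp - 15)
    return -value if sign else value

# 1.5 x 2^0: sign=0, exp=15 (bias 15), mantissa=0b1000000
print(decode_fp13((15 << 7) | 0b1000000))   # -> 1.5
```
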
  • Patent number: 11671111
    Abstract: A multichannel data packer includes a plurality of two-input multiplexers and a controller. The plurality of two-input multiplexers is arranged in 2^N rows and N columns, in which N is an integer greater than 1. Each input of a multiplexer in a first column receives a respective bit stream of 2^N channels of bit streams. Each respective bit stream has a bit-stream length based on the data in the bit stream. The multiplexers in a last column output 2^N channels of packed bit streams, each having the same bit-stream length. The controller controls the plurality of multiplexers so that the multiplexers in the last column output the 2^N channels of bit streams that each have the same bit-stream length.
    Type: Grant
    Filed: April 7, 2020
    Date of Patent: June 6, 2023
    Inventors: Ilia Ovsiannikov, Ali Shafiee Ardestani, Lei Wang, Joseph H. Hassoun
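
Functionally, the packer equalizes 2^N variable-length bit streams. The sketch below reproduces only that input/output relationship; the patented design realizes it with a 2^N × N butterfly of two-input multiplexers, and the zero-padding used here when the total bit count is not divisible by the channel count is an assumption.

```python
def pack_channels(streams):
    """Redistribute 2^N variable-length bit streams into 2^N streams of
    equal length (padding with zeros when lengths do not divide evenly)."""
    n_ch = len(streams)
    bits = [b for s in streams for b in s]    # flatten all channels
    per = -(-len(bits) // n_ch)               # ceiling division
    bits += [0] * (per * n_ch - len(bits))    # assumed zero padding
    return [bits[i * per:(i + 1) * per] for i in range(n_ch)]

packed = pack_channels([[1, 0, 1, 1], [0], [1, 1], [0]])
print([len(c) for c in packed])               # -> [2, 2, 2, 2]
```
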
  • Publication number: 20230153065
    Abstract: An N×N multiplier may include an N/2×N first multiplier, an N/2×N/2 second multiplier, and an N/2×N/2 third multiplier. The N×N multiplier receives two operands to multiply. The first, second, and/or third multipliers are selectively disabled if an operand equals zero or has a small value. If both operands are less than 2^(N/2), the second or the third multiplier is used to multiply the operands. If one operand is less than 2^(N/2) and the other operand is equal to or greater than 2^(N/2), either the first multiplier is used, or the second and third multipliers are used, to multiply the operands. If both operands are equal to or greater than 2^(N/2), the first, second, and third multipliers are used to multiply the operands.
    Type: Application
    Filed: January 12, 2023
    Publication date: May 18, 2023
    Inventors: Ilia OVSIANNIKOV, Ali SHAFIEE ARDESTANI, Joseph H. HASSOUN, Lei WANG
  • Publication number: 20230153569
    Abstract: A system and a method are disclosed for estimating a latency of a layer of a neural network. A host processing device adds an auxiliary layer to a selected layer of the neural network. A neural processing unit executes an inference operation over the selected layer and the auxiliary layer. A total latency is measured for the inference operation over the selected layer and the auxiliary layer, and an overhead latency associated with the auxiliary layer is measured for the inference operation. The overhead latency is subtracted from the total latency to generate an estimate of the latency of the layer. In one embodiment, the overhead latency is modeled by a linear regression on the size of the data input to the selected layer and the size of the data output from the auxiliary layer.
    Type: Application
    Filed: January 14, 2022
    Publication date: May 18, 2023
    Inventors: Jun FANG, Li YANG, David THORSLEY, Joseph H. HASSOUN
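
A sketch of the estimation recipe in this application: fit a linear regression of the auxiliary layer's overhead on input/output data sizes, then subtract the modeled overhead from the measured total. The measurement numbers below are illustrative, not benchmarks.

```python
import numpy as np

def fit_overhead_model(in_sizes, out_sizes, overheads):
    """Linear regression of overhead latency on input and output data
    sizes (plus an intercept), as the abstract describes."""
    X = np.column_stack([in_sizes, out_sizes, np.ones(len(in_sizes))])
    coef, *_ = np.linalg.lstsq(X, overheads, rcond=None)
    return coef

def estimate_layer_latency(total_latency, in_size, out_size, coef):
    """Subtract the modeled overhead of the auxiliary layer from the
    measured total to isolate the selected layer's latency."""
    overhead = coef @ np.array([in_size, out_size, 1.0])
    return total_latency - overhead

coef = fit_overhead_model([1e4, 2e4, 4e4], [1e4, 2e4, 4e4],
                          [0.11, 0.21, 0.41])
print(estimate_layer_latency(1.50, 3e4, 3e4, coef))   # ~1.19
```
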
  • Publication number: 20230107658
    Abstract: A system and method for training a neural network. In some embodiments the method includes training a full-sized network and a plurality of sub-networks, the training including performing a plurality of iterations of supervised co-training, the performing of each iteration including co-training the full-sized network and a subset of the sub-networks.
    Type: Application
    Filed: May 6, 2022
    Publication date: April 6, 2023
    Inventors: Li YANG, Jun FANG, David Philip Lloyd THORSLEY, Joseph H. HASSOUN, Hamzah Ahmed Ali ABDELAZIZ
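
A sketch of one co-training iteration, assuming a random subset of k sub-networks per iteration and modeling the sub-networks as separate modules for brevity (in weight-sharing schemes they would be slices of the full-sized network); `cotrain_step` and all hyperparameters are illustrative.

```python
import torch

def cotrain_step(full_net, sub_nets, optimizer, x, y, k=2):
    """One supervised co-training iteration: the full-sized network and a
    random subset of k sub-networks are trained together on the same
    labeled batch, summing their losses before a single optimizer step."""
    loss_fn = torch.nn.CrossEntropyLoss()
    optimizer.zero_grad()
    loss = loss_fn(full_net(x), y)                 # full-sized network
    for i in torch.randperm(len(sub_nets))[:k]:    # sampled subset
        loss = loss + loss_fn(sub_nets[i](x), y)   # co-trained sub-network
    loss.backward()
    optimizer.step()
    return loss.item()

full = torch.nn.Linear(8, 3)                       # stand-in networks
subs = [torch.nn.Linear(8, 3) for _ in range(4)]
params = list(full.parameters()) + [p for s in subs for p in s.parameters()]
opt = torch.optim.SGD(params, lr=0.1)
x, y = torch.randn(16, 8), torch.randint(0, 3, (16,))
print(cotrain_step(full, subs, opt, x, y))
```
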
  • Publication number: 20230103997
    Abstract: A vision transformer includes L layers, and H attention heads in each layer. An h′ of the attention heads include an attention mask added before a Softmax operation, and an h of the attention heads are unmasked, in which H = h′ + h. Each attention mask multiplies a Query vector and a Key vector to form element-wise products. At least one attention mask is a hard mask that selects the closest neighbors of a patch and ignores patches further away than the closest neighbors of the patch. Alternatively, at least one attention mask is a soft mask that multiplies weights of the closest neighbors of a patch by a magnification factor and passes weights of patches that are further away than the closest neighbors of the patch. A learnable bias may be added to diagonal elements of the at least one attention mask.
    Type: Application
    Filed: January 11, 2022
    Publication date: April 6, 2023
    Inventors: Ling LI, Ali SHAFIEE ARDESTANI, Joseph H. HASSOUN
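
A sketch of a hard-masked head, assuming "closest neighbors" means patches within one grid step of the query patch and using standard dot-product scores (the abstract's element-wise Query/Key products are folded into row-wise dot products here); the soft-mask variant is not shown.

```python
import numpy as np

def hard_masked_attention(q, k, v, positions, radius=1):
    """One hard-masked attention head: scores for patches further than
    `radius` grid steps from the query patch are set to -inf before the
    Softmax, so only the closest neighbors contribute."""
    scores = q @ k.T / np.sqrt(q.shape[-1])             # pre-Softmax scores
    dist = np.abs(positions[:, None, :] - positions[None, :, :]).max(-1)
    scores = np.where(dist <= radius, scores, -np.inf)  # hard mask
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)
    return weights @ v

side = 4                                                # 4x4 grid of patches
pos = np.array([(i, j) for i in range(side) for j in range(side)])
q, k, v = (np.random.rand(side * side, 8) for _ in range(3))
print(hard_masked_attention(q, k, v, pos).shape)        # -> (16, 8)
```
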
  • Publication number: 20230047025
    Abstract: A multichannel data packer includes a plurality of two-input multiplexers and a controller. The plurality of two-input multiplexers is arranged in 2^N rows and N columns, in which N is an integer greater than 1. Each input of a multiplexer in a first column receives a respective bit stream of 2^N channels of bit streams. Each respective bit stream has a bit-stream length based on the data in the bit stream. The multiplexers in a last column output 2^N channels of packed bit streams, each having the same bit-stream length. The controller controls the plurality of multiplexers so that the multiplexers in the last column output the 2^N channels of bit streams that each have the same bit-stream length.
    Type: Application
    Filed: October 19, 2022
    Publication date: February 16, 2023
    Inventors: Ilia OVSIANNIKOV, Ali SHAFIEE ARDESTANI, Lei WANG, Joseph H. HASSOUN
  • Publication number: 20230038891
    Abstract: A method is disclosed for reducing computation of a differentiable architecture search. An output node is formed having a channel dimension that is one-fourth of a channel dimension of a normal cell of a neural network architecture by averaging channel outputs of intermediate nodes of the normal cell. The output node is preprocessed using a 1×1 convolution to form channels of input nodes for a next layer of the cells in the neural network architecture. Forming the output node includes forming s groups of channel outputs of the intermediate nodes by dividing the channel outputs of the intermediate nodes by a splitting parameter s. An average channel output for each group of channel outputs is formed, and the output node is formed by concatenating the average channel output for each group of channels with channel outputs of the intermediate nodes of the normal cell.
    Type: Application
    Filed: September 8, 2021
    Publication date: February 9, 2023
    Inventors: Jun FANG, David Philip Lloyd THORSLEY, Chengyao SHEN, Joseph H. HASSOUN
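
A sketch of the group-averaging step that produces the reduced output node, assuming s = 4 for the abstract's one-fourth channel reduction; the concatenation with the intermediate nodes' channel outputs and the 1×1 convolution preprocessing are omitted for brevity.

```python
import numpy as np

def reduced_output_node(intermediate, s=4):
    """Split an intermediate node's channels into s groups, average within
    each group, and stack the averages, shrinking the channel dimension by
    a factor of s. Shapes are (channels, height, width)."""
    groups = np.split(intermediate, s, axis=0)        # s groups of channels
    return np.stack([g.mean(axis=0) for g in groups]) # one channel per group

node = np.random.rand(16, 8, 8)          # 16-channel intermediate node
out = reduced_output_node(node, s=4)
print(out.shape)                          # -> (4, 8, 8): 16/4 channels
```
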