Patents by Inventor Joseph H. Hassoun

Joseph H. Hassoun has filed for patents to protect the following inventions. This listing includes patent applications that are pending as well as patents that have already been granted by the United States Patent and Trademark Office (USPTO).

  • Publication number: 20230028226
    Abstract: A method is disclosed to reduce computation in a self-attention deep-learning model. A feature-map regularization term is added to a loss function while training the self-attention model. At least one low-magnitude feature is removed from at least one feature map of the self-attention model during inference. Weights of the self-attention model are quantized after the self-attention model has been trained. Adding the feature-map regularization term reduces activation values of feature maps, and removing the at least one low-magnitude feature from at least one feature map may be performed by setting the low-magnitude feature to be equal to zero based on the low-magnitude feature having a value that is less than a predetermined threshold. Feature maps of the self-attention model quantized and compressed.
    Type: Application
    Filed: September 14, 2021
    Publication date: January 26, 2023
    Inventors: David Philip Lloyd THORSLEY, Joseph H. HASSOUN, Jun FANG, Chengyao SHEN
  • Publication number: 20220405559
    Abstract: A neural network accelerator is disclosed that includes a multiplication unit, an adder-tree unit and an accumulator unit. The multiplication unit and the adder tree unit are configured to perform lattice-multiplication operations. The accumulator unit is coupled to an output of the adder tree to form dot-product values from the lattice-multiplication operations performed by the multiplication unit and the adder tree unit. The multiplication unit includes n multiplier units that perform lattice-multiplication-based operations and output product values. Each multiplier unit includes a plurality of multipliers. Each multiplier unit receives first and second multiplicands that each include a most significant nibble (MSN) and a least significant nibble (LSN). The multipliers in each multiplier unit receive different combinations of the MSNs and the LSNs of the multiplicands. The multiplication unit and the adder can provide mixed-precision dot-product computations.
    Type: Application
    Filed: August 31, 2021
    Publication date: December 22, 2022
    Inventors: Hamzah Ahmed Ali ABDELAZIZ, Ali SHAFIEE ARDESTANI, Joseph H. HASSOUN
  • Publication number: 20220405557
    Abstract: A system and a method is disclosed for processing input feature map (IFM) data of a current layer of a neural network model using an array of reconfigurable neural processing units (NPUs) and storing output feature map (OFM) data of the next layer of the neural network model at a location that does not involve a data transfer between memories of the NPUs according to the subject matter disclosed herein. The reconfigurable NPUs may be used to improve NPU utilization of NPUs of a neural processing system.
    Type: Application
    Filed: August 11, 2021
    Publication date: December 22, 2022
    Inventors: Jong Hoon SHIN, Ali SHAFIEE ARDESTANI, Joseph H. HASSOUN
  • Publication number: 20220405558
    Abstract: A core of neural processing units is configured to efficiently process a depthwise convolution by maximizing spatial feature-map locality using adder trees. Data paths of activations and weights are inverted, and 2-to-1 multiplexers are every 2/9 multipliers along a row of multipliers. During a depthwise convolution operation, the core is operated using a RS×HW dataflow to maximize the locality of feature maps. For a normal convolution operation, the data paths of activations and weights may be configured for a normal convolution configuration and in which multiplexers are idle.
    Type: Application
    Filed: August 12, 2021
    Publication date: December 22, 2022
    Inventors: Jong Hoon SHIN, Ali SHAFIEE ARDESTANI, Joseph H. HASSOUN
  • Publication number: 20220156569
    Abstract: A general matrix-matrix (GEMM) accelerator core includes first and second buffers, and a processing element (PE). The first buffer receives a elements of a matrix A of activation values. The second buffer receives b elements of a matrix B of weight values. The matrix B is preprocessed with a nonzero-valued b element replacing a zero-valued b element in a first row of the second buffer based on the zero-valued b element being in the first row of the second buffer. Metadata is generated that includes movement information of the nonzero-valued b element to replace the zero-valued b element. The PE receives b elements from a first row of the second buffer and a elements from the first buffer from locations in the first buffer that correspond to locations in the second buffer from where the b elements have been received by the PE as indicated by the metadata.
    Type: Application
    Filed: November 8, 2021
    Publication date: May 19, 2022
    Inventors: Jong Hoon SHIN, Ali SHAFIEE ARDESTANI, Joseph H. HASSOUN
  • Publication number: 20220156568
    Abstract: A general matrix-matrix (GEMM) accelerator core includes first and second buffers, a control logic circuit, and a first processing element (PE). The first buffer receives a elements of a first matrix A of activation values. The second buffer receives b elements of a second matrix B of weight values. The control logic circuit replaces a zero-valued a element in a first column of the first buffer with a nonzero-valued a element that is within a maximum borrowing distance of a location of the zero-valued a element in the first column of the first buffer. The PE receives a elements from the first column of the first buffer including the nonzero-valued element a selected to replace the zero-valued a element and receives b elements from locations in the second buffer that correspond to locations in the first buffer from where the a elements have been received by the PE.
    Type: Application
    Filed: November 8, 2021
    Publication date: May 19, 2022
    Inventors: Jong Hoon SHIN, Ali SHAFIEE ARDESTANI, Joseph H. HASSOUN
  • Publication number: 20220147313
    Abstract: A processor for fine-grain sparse integer and floating-point operations and method of operation thereof. In some embodiments, the method includes forming a first set of products, and forming a second set of products. The forming of the first set of products may include: multiplying, in a first multiplier, a second multiplier, and a third multiplier, the first activation value by a first least significant sub-word, a second least significant sub-word, and a most significant sub-word; and adding a first resulting partial product and a second resulting partial product. The forming of the second set of products may include forming a first floating point product, the forming of the first floating point product including multiplying, in the first multiplier, a first sub-word of a mantissa of an activation value by a first sub-word of a mantissa of a weight, to form a third partial product.
    Type: Application
    Filed: December 23, 2020
    Publication date: May 12, 2022
    Inventors: Ali Shafiee Ardestani, Joseph H. Hassoun
  • Publication number: 20210357748
    Abstract: A system and method for weight preprocessing. In some embodiments, the method includes performing intra-tile preprocessing of a first weight tensor to form a first pre-processed weight tensor, and performing inter-tile preprocessing of the first pre-processed weight tensor, to form a second pre-processed weight tensor. The intra-tile preprocessing may include moving a first element of a first weight tile of the first weight tensor by one position, within the first weight tile, in a lookahead direction or in a lookaside direction. The inter-tile preprocessing may include moving a first row of a weight tile of the first pre-processed weight tensor by one position in a lookahead direction or by one position in a lookaside direction.
    Type: Application
    Filed: August 11, 2020
    Publication date: November 18, 2021
    Inventors: Jong Hoon Shin, Ali Shafiee Ardestani, Hamzah Ahmed Ali Abdelaziz, Joseph H. Hassoun
  • Publication number: 20210319079
    Abstract: A dot-product architecture and method are disclosed for calculating floating-point dot-products of two vectors. The architecture includes an array of multiplier units that each include an integer logic that multiplies integer values of corresponding elements of the two vectors; an exponent logic that adds exponent values of the corresponding elements of the two vectors to form an unbiased exponent values, and a local shifter that forms a first shifted value by shifting a product-integer value by a number of bits in a predetermined direction based on a difference value between an unbiased exponent value corresponding to the product-integer value and a maximum unbiased exponent value for the array of multiplier units. An adder tree adds shifted values output from local shifters of the array of multiplier units to form an output, and an accumulator accumulates the output of the addition unit.
    Type: Application
    Filed: January 20, 2021
    Publication date: October 14, 2021
    Inventors: Hamzah Ahmed Ali ABDELAZIZ, Ali SHAFIEE ARDESTANI, Joseph H. HASSOUN
  • Publication number: 20210133278
    Abstract: A method of quantizing an artificial neural network may include dividing a quantization range for a tensor of the artificial neural network into a first region and a second region, and quantizing values of the tensor in the first region separately from values of the tensor in the second region. Linear or nonlinear quantization may be applied to values of the tensor in the first region and the second region. The method may include locating a breakpoint between the first region and the second region by substantially minimizing an expected quantization error over at least a portion of the quantization range. The expected quantization error may be minimized by solving analytically and/or searching numerically.
    Type: Application
    Filed: March 11, 2020
    Publication date: May 6, 2021
    Inventors: Jun FANG, Joseph H. HASSOUN, Ali SHAFIEE ARDESTANI, Hamzah Ahmed Ali ABDELAZIZ, Georgios GEORGIADIS, Hui CHEN, David Philip Lloyd THORSLEY
  • Publication number: 20200349420
    Abstract: A processor to perform inference on deep learning neural network models. In some embodiments, the process includes: a first tile, a second tile, a memory, and a bus, the bus being connected to: the memory, the first tile, and the second tile, the first tile including: a first weight register, a second weight register, an activations cache, a shuffler, an activations buffer, a first multiplier, and a second multiplier, the activations buffer being configured to include: a first queue connected to the first multiplier, and a second queue connected to the second multiplier, the activations cache including a plurality of independent lanes, each of the independent lanes being randomly accessible, the first tile being configured: to receive a tensor including a plurality of two-dimensional arrays, each representing one color component of the image; and to perform a convolution of a kernel with one of the two-dimensional arrays.
    Type: Application
    Filed: April 3, 2020
    Publication date: November 5, 2020
    Inventors: Ilia Ovsiannikov, Ali Shafiee Ardestani, Hamzah Ahmed Ali Abdelaziz, Joseph H. Hassoun
  • Publication number: 20200336272
    Abstract: A multichannel data packer includes a plurality of two-input multiplexers and a controller. The plurality of two-input multiplexers is arranged in 2N rows and N columns in which N is an integer greater than 1. Each input of a multiplexer in a first column receives a respective bit stream of 2N channels of bit streams. Each respective bit stream includes a bit-stream length based on data in the bit stream. The multiplexers in a last column output 2N channels of packed bit streams each having a same bit-stream length. The controller controls the plurality of multiplexers so that the multiplexers in the last column output the 2N channels of bit streams that each has the same bit-stream length.
    Type: Application
    Filed: April 7, 2020
    Publication date: October 22, 2020
    Inventors: Ilia OVSIANNIKOV, Lei WANG, Ali ARDESTANI SHAFIEE, Joseph H. HASSOUN
  • Publication number: 20200026978
    Abstract: A neural processor. In some embodiments, the processor includes a first tile, a second tile, a memory, and a bus. The bus may be connected to the memory, the first tile, and the second tile. The first tile may include: a first weight register, a second weight register, an activations buffer, a first multiplier, and a second multiplier. The activations buffer may be configured to include: a first queue connected to the first multiplier and a second queue connected to the second multiplier. The first queue may include a first register and a second register adjacent to the first register, the first register being an output register of the first queue. The first tile may be configured: in a first state: to multiply, in the first multiplier, a first weight by an activation from the output register of the first queue, and in a second state: to multiply, in the first multiplier, the first weight by an activation from the second register of the first queue.
    Type: Application
    Filed: August 27, 2019
    Publication date: January 23, 2020
    Inventors: Ilia Ovsiannikov, Ali Shafiee Ardestani, Joseph H. Hassoun, Lei Wang, Sehwan Lee, JoonHo Song, Jun-Woo Jang, Yibing Michelle Wang, Yuecheng Li
  • Publication number: 20200026979
    Abstract: A neural processor. In some embodiments, the processor includes a first tile, a second tile, a memory, and a bus. The bus may be connected to the memory, the first tile, and the second tile. The first tile may include: a first weight register, a second weight register, an activations buffer, a first multiplier, and a second multiplier. The activations buffer may be configured to include: a first queue connected to the first multiplier and a second queue connected to the second multiplier. The first queue may include a first register and a second register adjacent to the first register, the first register being an output register of the first queue. The first tile may be configured: in a first state: to multiply, in the first multiplier, a first weight by an activation from the output register of the first queue, and in a second state: to multiply, in the first multiplier, the first weight by an activation from the second register of the first queue.
    Type: Application
    Filed: August 27, 2019
    Publication date: January 23, 2020
    Inventors: Ilia Ovsiannikov, Ali Shafiee Ardestani, Joseph H. Hassoun, Lei Wang, Sehwan Lee, JoonHo Song, Jun-Woo Jang, Yibing Michelle Wang, Yuecheng Li
  • Publication number: 20200026980
    Abstract: A neural processor. In some embodiments, the processor includes a first tile, a second tile, a memory, and a bus. The bus may be connected to the memory, the first tile, and the second tile. The first tile may include: a first weight register, a second weight register, an activations buffer, a first multiplier, and a second multiplier. The activations buffer may be configured to include: a first queue connected to the first multiplier and a second queue connected to the second multiplier. The first queue may include a first register and a second register adjacent to the first register, the first register being an output register of the first queue. The first tile may be configured: in a first state: to multiply, in the first multiplier, a first weight by an activation from the output register of the first queue, and in a second state: to multiply, in the first multiplier, the first weight by an activation from the second register of the first queue.
    Type: Application
    Filed: August 27, 2019
    Publication date: January 23, 2020
    Inventors: Ilia Ovsiannikov, Ali Shafiee Ardestani, Joseph H. Hassoun, Lei Wang, Sehwan Lee, JoonHo Song, Jun-Woo Jang, Yibing Michelle Wang, Yuecheng Li
  • Publication number: 20190392287
    Abstract: A neural processor. In some embodiments, the processor includes a first tile, a second tile, a memory, and a bus. The bus may be connected to the memory, the first tile, and the second tile. The first tile may include: a first weight register, a second weight register, an activations buffer, a first multiplier, and a second multiplier. The activations buffer may be configured to include: a first queue connected to the first multiplier and a second queue connected to the second multiplier. The first queue may include a first register and a second register adjacent to the first register, the first register being an output register of the first queue. The first tile may be configured: in a first state: to multiply, in the first multiplier, a first weight by an activation from the output register of the first queue, and in a second state: to multiply, in the first multiplier, the first weight by an activation from the second register of the first queue.
    Type: Application
    Filed: June 19, 2019
    Publication date: December 26, 2019
    Inventors: Ilia Ovsiannikov, Ali Shafiee Ardestani, Joseph H. Hassoun, Lei Wang, Sehwan Lee, JoonHo Song, Jun-Woo Jang, Yibing Michelle Wang, Yuecheng Li
  • Patent number: 7317773
    Abstract: Method and apparatus for doubling the throughput rate of data transmission on a logic path comprising providing two latches that alternately receive successive bits of the data stream to be transmitted and a multiplexer having data transmission paths that are alternately clocked by two separate clocks, which clocks are substantially 180 degrees out of phase.
    Type: Grant
    Filed: July 9, 2004
    Date of Patent: January 8, 2008
    Assignee: Xilinx, Inc.
    Inventors: Steven P. Young, Suresh M. Menon, Ketan Sodha, Richard A. Carberry, Joseph H. Hassoun
  • Publication number: 20040239365
    Abstract: Method and apparatus for doubling the throughput rate of data transmission on a logic path comprising providing two latches that alternately receive successive bits of the data stream to be transmitted and a multiplexer having data transmission paths that are alternately clocked by two separate clocks, which clocks are substantially 180 degrees out of phase.
    Type: Application
    Filed: July 9, 2004
    Publication date: December 2, 2004
    Applicant: Xilinx, Inc.
    Inventors: Steven P. Young, Suresh M. Menon, Ketan Sodha, Richard A. Carberry, Joseph H. Hassoun
  • Patent number: 6820086
    Abstract: A linked list structure in a computing system includes a first entry and additional entries. Each additional entry includes a link reference to a prior entry in the linked list. The link reference for each additional entry all are stored within a content addressable memory. Each additional entry is accessible by performing a content search using the link reference to the prior entry. The linked list is traversed by accessing the first entry in the linked list. A second entry in the linked list is accessed by searching the content addressable memory with an index of the first entry. A third entry in the linked list is accessed by searching the content addressable memory with an index of the second entry.
    Type: Grant
    Filed: June 18, 1999
    Date of Patent: November 16, 2004
    Assignee: Hewlett-Packard Development Company, L.P.
    Inventors: Sorin Iacobovici, William R. Bryg, Joseph H. Hassoun
  • Patent number: 6777980
    Abstract: Method and apparatus for doubling the throughput rate of data transmission on a logic path comprising providing two latches that alternately receive successive bits of the data stream to be transmitted and a multiplexer having data transmission paths that are alternately clocked by two separate clocks, which clocks are substantially 180 degrees out of phase.
    Type: Grant
    Filed: January 15, 2003
    Date of Patent: August 17, 2004
    Assignee: Xilinx, Inc.
    Inventors: Steven P. Young, Suresh M. Menon, Ketan Sodha, Richard A. Carberry, Joseph H. Hassoun